Guidelines for Equipment Reliability
SEMATECH
Technology Transfer 92031014A-GEN
SEMATECH and the SEMATECH logo are registered service marks of SEMATECH, Inc.
Product names and company names used in this publication are for identification purposes only and may be trademarks or service
marks of their respective companies
© 1997 SEMATECH, Inc.

Guidelines for Equipment Reliability
Technology Transfer # 92031014A-GEN
SEMATECH
May 5, 1992
Abstract: This guideline was developed by a task force comprised of reliability experts and users of
reliability methodologies from the SEMI/SEMATECH member companies. The document was
written to address the needs of semiconductor equipment manufacturers and their customers. It
includes a description of the principles of a cost-effective reliability program, instructions on how
to get started, and details on what needs to be done. A large portion of the document is dedicated
to analysis and testing methodologies. These include: Failure Modes and Effects Analysis
(FMEA), Fault Tree Analysis (FTA), Component Failure Analysis (CFA), Human Reliability
Analysis (HRA); and Reliability Testing, Component Testing, Accelerated Testing (Sudden Death,
Step-Stress Testing), Burn-in Testing, Life Testing, Environmental Stress Screening, Qualification
Testing, and Acceptance Testing.
Keywords: Life Cycle Phases, Reliability Testing, RAMP, Failure, FRACAS, Failure Modes and Effects
Analysis, Quality Function Deployment (QFD), Design of Experiment, Cost of Ownership, Infant
Mortality, Reliability Qualification Testing (RQT), Taguchi, User’s Groups, Reliability Block
Diagram Modeling (RBD), Environmental Stress Screening (ESS), Fault Tree Analysis (FTA)
Authors: Dhudshia, Vallabh
Approvals: Vallabh Dhudshia, Project Manager & Author

Keith Erickson, Director
Dan McGowan, Technical Information Transfer Team Leader
Table of Contents
1 SUMMARY
2 THE RELIABILITY IMPROVEMENT PROCESS AND EQUIPMENT LIFE CYCLE
2.1 Introduction
2.2 The Equipment Life Cycle
2.3 Life Cycle Phases
2.4 Life Cycle Cost
2.5 The Reliability Improvement Process
2.6 Applying the Reliability Improvement Process
2.7 Summary
2.8 References
3 IMPLEMENTATION OF THE RELIABILITY IMPROVEMENT PROCESS
3.1 Introduction
3.2 Management’s Role
3.3 Applying the Reliability Improvement Process
3.4 Specific Applications of the Reliability Improvement Process
3.4.1 Starting with Equipment in the Design Phase
3.4.2 Starting with Equipment in the Prototype Phase
3.4.3 Starting with Equipment in the Pilot Production Phase
3.4.4 Starting with Equipment in the Production and Operation Phase
3.4.5 Starting with Equipment in Phase Out Phase
3.5 Functional Responsibilities
3.6 Where to Begin
3.7 Reliability Plans
3.8 Application of Resources and Communicating Value
3.9 Summary
3.10 References
4 ACTIVITIES AND TOOLS IN THE RELIABILITY IMPROVEMENT PROCESS
4.1 Introduction
4.2 Reliability Activities
List of Figures
Figure 2-1. Percent of Total Life Cycle Costs vs Locked-in Costs
Figure 2-2. Impact of a Reliability Program on Life Cycle Cost
Figure 2-3. Optimizing Life Cycle Costs
Figure 2-4. Decrease in Life Cycle Costs in New Generations of Equipment
Figure 2-5. The Reliability Improvement Process
Figure 2-6. Application of Reliability Improvement Process
Figure 3-1. Multiple Equipment and Their Life Cycle Phase Status
Figure 4-1. A Block Model Developed in RAMP for the SETEC Generic Wafer Handler System
Figure 4-2. An Estimate of the Cumulative Distribution Function for MTBF
Figure 4-3. A Pareto Diagram for Component Contribution to System Failure
Figure 4-4. A Revised Block Diagram for the SETEC Generic Wafer Handler System, Showing the Addition of the Redundant Wafer Sensor
Figure 4-5. An Estimate of the Cumulative Distribution Function for MTBF after Modifying the Generic Wafer Handler System
Figure 4-6. A Pareto Diagram for Component Contribution to System Failure after Modifying the Generic Wafer System
Further analysis reveals that C2 fails if parts 1 and 2 (P1 and P2) fail. C4 fails if parts 3 or 4 (P3 or P4) fail. The block diagram model now looks like:
List of Tables
Table 3-1. Reliability Improvement Process Applied at Six Different Starting Points
Table 3-2. Reliability Improvement Process Activities
Table 3-3. Reliability Improvement Process Activities for the Design Phase
Table 3-4. Reliability Improvement Process Activities for the Prototype Phase
Table 3-5. Reliability Improvement Process Activities for the Pilot Production Phase
Table 3-6. Reliability Improvement Process Activities for the Production and Operation Phase
Table 3-7. Reliability Improvement Process Activities for the Phase-Out Phase
Table 3-8. Design Phase Reliability Improvement Process Activities
Table 3-9. Prototype Phase Reliability Improvement Process Activities
Table 3-10. Pilot Production Phase Reliability Improvement Process Activities When Initiated in Pilot Production Phase
Table 3-11. Production and Operation Phase Reliability Improvement Process Activities When Initiated in Production and Operation Phase
Table 3-12. Phase Out Phase Reliability Improvement Process Activities When Initiated in Phase-Out Phase
Table 3-13. Current Product Line Status

Acknowledgements
To assist in the development of these guidelines, a task force of representatives from the
semiconductor industry was assembled to provide guidance on structure and content. Their
contributions and dedication to this effort have been excellent and beyond the call of duty. Our
thanks go to each of the task force members, reviewers, and contributors for their commitment to
such an ambitious effort. It has made the development of these guidelines both possible and more
enjoyable.
TASK FORCE MEMBERS
Sandia National Labs. - SETEC

Wallis Cramond Dr. Robert Cranwell Dr. Irving Hall
Dennis Huffman Dr. Ron Iman Teresa Sype
SEMATECH
Dr. Vallabh H. Dhudshia, Texas Instruments, Inc.
David Seekon, National Semiconductor Corp.
Mario Villacourt
SEMATECH Member Companies

Denny Johnson, International Business Machines (IBM)
Karl Koch, Digital Equipment Corp. (DEC)
John O’Reilly, DEC
Richard Talbot, IBM
Larry Waite, National Semiconductor Corp.
Chuck Woodard, IBM
SEMI/SEMATECH
Dr. Michael McGraw
SEMI/SEMATECH Member Companies

Ron Dornseif, Genus Dr. Ralph Dudley, Applied Materials
Jack Olivieri, MKS Instruments

REVIEWERS and CONTRIBUTORS
Samuel Becktel, Genus Dennis R. Hoffman, TI

Richard E. Howard, Luxtron Products Bob Holmstrom, ATEQ Corp.
Dr. Samuel Keene, IBM Dr. David J. Klinger, AT&T
Richard Gerstner, SEMATECH Dr. Jerry Brandwie, RI
David Troness, Intel Dr. Richard Prairie, SETEC
Sue Howell, SEMI/SEMATECH Debra Vogler, Varian Associates

The SEMATECH Perspective
Statement from Bill Spencer, CEO of SEMATECH:
Today’s competitive environment demands an increasing level of reliability in semiconductor
manufacturing equipment. The industry has made great strides in the last four years in improving
reliability. In fact, VLSI Research reports that in its annual customer survey, reliability has fallen
to sixth place on the list of biggest problems, after being number one for 10 years. VLSI is quick
to give SEMATECH credit for much of the improvement. And while the existence of
SEMATECH was a key element, the supplier industry should receive added praise for stepping
up and solving a major problem.
But, as with so much of this business today, reliability is a race without an end. And the formula
to improved reliability is to build it into every stage of development. This Reliability Guideline
will assist in development of a program to ensure consideration of reliability factors at every
stage of product development from inception through qualification.
The Guideline was developed by a task force comprised of reliability experts and users of
reliability methodologies from the SEMI/SEMATECH member companies. As a result, it offers
best-of-breed concepts and is written to meet the needs of semiconductor equipment
manufacturers and their customers. I’m sure it will prove an excellent tool.
William J. Spencer
President and Chief Executive Officer

Preface
These guidelines have been written for use by semiconductor equipment suppliers and customers.
They are intended as a road map that these groups can refer to for assistance in improving the
reliability of their semiconductor manufacturing equipment as part of a long-term strategy aimed
at regaining an increased worldwide market share.
Although there is an abundance of reliability information available in text books, military
handbooks and standards, and guidebooks directed at specific products, there is no concise,
single source document available for the semiconductor equipment industry. The purpose of
these guidelines is to fill this gap. To assist in this effort, a task force consisting of
representatives from the semiconductor industry was assembled to provide guidance in the
structure and content of these guidelines. The guidelines do not provide comprehensive
instruction on the details of reliability engineering; rather they provide a description of the
principles of a cost-effective reliability program, instructions on how to get started, and details on
what needs to be done. Descriptions of necessary program activities and reliability concepts are
provided along with references for those who desire additional information.
The focus of the guidelines is on hardware reliability, although software reliability is an
important aspect of reliability for a large segment of semiconductor manufacturing equipment.
Because other guidelines address software reliability, that topic is discussed only briefly
here.
The guidelines:
• Are intended to be of value to managers, reliability engineers, and designers
• Are not a "detailed how-to" document, but rather a "roadmap of how to"
• Are centered around a continuous improvement process referred to as the
Reliability Improvement Process
• Cover the entire equipment life cycle as it applies to the semiconductor equipment
industry
Even though emphasis is placed on designing in reliability, the guidelines show how to
incorporate reliability into every phase of the equipment life cycle.

The guidelines are broken into three sections:

• Section 2.0, The Reliability Improvement Process and Equipment Life Cycle,
describes the Reliability Improvement Process and the Equipment Life Cycle.
Life cycle phases are defined and discussed, as well as life cycle costs. The five
steps of the Reliability Improvement Process are defined and discussed.
• Section 3.0, Implementation of the Reliability Improvement Process, describes the
activities involved in applying each step of the Reliability Improvement Process to
each phase of the Equipment Life Cycle. The section associated with activities
provides information on applying the Reliability Improvement Process
continuously throughout the entire life cycle. Also discussed are the activities
associated with applying the Reliability Improvement Process during later phases
of the life cycle.
• Section 4.0, Activities and Tools in the Reliability Improvement Process, provides
a description of the activities and tools that are part of the Reliability
Improvement Process. Activities are grouped under engineering, data, and testing.
Specific tools used in the application of certain activities are also discussed.
Section 4.0 is meant to provide more information and guidance on activities and
tools used in the application of the Reliability Improvement Process.

1 SUMMARY
These guidelines focus on a continuous improvement process referred to as the Reliability
Improvement Process, and the Equipment Life Cycle. These two concepts are introduced and
discussed in Section 2.0 of the guidelines. Knowledge of the equipment life cycle is important
because it provides a basis for understanding how and where reliability engineering enters into
the process of designing, producing, and operating the equipment. In this document, the life
cycle has been broken into six distinct phases, each representing a unique portion of the life
cycle. These six life cycle phases are:
1. Concept and Feasibility Phase
2. Design Phase
3. Prototype (alpha-site) Phase
4. Pilot Production (beta-site) Phase
5. Production and Operation Phase
6. Phase-out Phase
These phases provide the framework for tracking reliability improvement throughout the
equipment life cycle and guidance on when and where to apply resources. Life cycle cost
concepts are introduced to help understand the impact on expenditures and cost of ownership
when reliability improvement is initiated at different phases of the life cycle.
The Reliability Improvement Process provides a means for systematically improving reliability
throughout the equipment life cycle. It is an iterative process of setting goals, evaluating,
comparing, and improving directed toward continuous reliability improvement. It consists of
five basic steps.
1. Establish reliability goals and requirements for equipment
2. Apply reliability engineering or improvement activities, as needed
3. Conduct an evaluation of the equipment or equipment design
4. Compare the results of the evaluation to the goals and requirements and make a
decision for the next step
5. Identify problems and root causes
The process then returns to Step 2, and repeats Steps 2 through 5 until goals and requirements are
met.
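The five steps can be read as a simple loop. The following Python sketch is illustrative only: the callback names (`apply_activities`, `evaluate`, `find_root_causes`), the use of MTBF in hours as the single goal metric, and the iteration cap are assumptions made for the example and are not part of the guideline.

```python
from typing import Callable, List


def reliability_improvement_process(
    goal_mtbf_hours: float,                        # Step 1: goal set up front
    apply_activities: Callable[[List[str]], None],
    evaluate: Callable[[], float],
    find_root_causes: Callable[[float, float], List[str]],
    max_iterations: int = 10,                      # assumed safety cap
) -> List[float]:
    """Repeat Steps 2-5 until the goal from Step 1 is met.

    Returns the evaluated MTBF after each iteration.
    """
    history: List[float] = []
    problems: List[str] = []            # nothing identified before the first pass
    for _ in range(max_iterations):
        apply_activities(problems)      # Step 2: apply improvement activities
        measured = evaluate()           # Step 3: evaluate equipment or design
        history.append(measured)
        if measured >= goal_mtbf_hours:  # Step 4: compare with goal and decide
            break
        problems = find_root_causes(measured, goal_mtbf_hours)  # Step 5
    return history


# Hypothetical stand-ins: each round of fixes raises MTBF by 50 hours.
state = {"mtbf": 100.0}


def _apply(problems: List[str]) -> None:
    if problems:
        state["mtbf"] += 50.0


def _evaluate() -> float:
    return state["mtbf"]


def _root_causes(measured: float, goal: float) -> List[str]:
    return [f"shortfall of {goal - measured:.0f} h"]


history = reliability_improvement_process(200.0, _apply, _evaluate, _root_causes)
print(history)
```

In this toy run the loop stops as soon as the evaluated value reaches the goal; in practice the decision in Step 4 may also weigh cost and schedule.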

The role of management in implementing the Reliability Improvement Process is introduced in
Section 3.0. Management has responsibilities in establishing and implementing the Reliability
Improvement Process. These responsibilities include establishing the right environment and
choosing individuals to champion the effort. Section 3.0 provides details on preparing for and
implementing the Reliability Improvement Process, including a discussion of the various
activities associated with each step of the Reliability Improvement Process and each phase of the
life cycle. The Reliability Improvement Process can be used for a piece of equipment regardless
of its placement in the life cycle. The discussion in Section 3.0 includes information on how to
select equipment for initiating reliability improvement, the importance of data, and the choice of
activities when resources are limited.
Activities and tools used in applying the Reliability Improvement Process are discussed in more
detail in Section 4.0. Three types of activities are listed: engineering, data-related, and testing.
Many of the activities require tools for implementation. These tools come from various
disciplines such as probability and statistics and reliability engineering. References that have
detailed information on the tool or activity are provided at the end of each activity in Section 4.0.
2 THE RELIABILITY IMPROVEMENT PROCESS AND EQUIPMENT LIFE CYCLE
2.1 Introduction
The reliability improvement process and the equipment life cycle form the basis for these
guidelines and are introduced in this section. The reliability improvement process is an iterative
process that provides:
• An effective and systematic way to include reliability in equipment design
• A structure for making reliability improvements throughout the equipment life
cycle
The reliability improvement process provides a means for making revolutionary advancements
when it is applied to equipment early in the design stage or during major design upgrades, and
for making evolutionary improvements to existing equipment.
Knowledge of the equipment life cycle is important because it provides:
• The framework for applying the reliability improvement process
• A basis for understanding the best practice for improving equipment reliability
and the cost of the improvement
Life cycle costs are introduced in this section to provide a perspective on the impact of initiating
the reliability improvement process early in the equipment life cycle. A thorough knowledge of
life cycle costs and life cycle phase relationships helps to achieve better equipment at lower total
costs.
2.2 The Equipment Life Cycle

The equipment life cycle begins when the idea for the equipment is conceived and ends when the
equipment is no longer useful. The life cycle consists of phases that describe the state of design,
process of development, and production of the equipment. A working knowledge of these phases
enables proper planning and execution of the activities and functions necessary for designing,
manufacturing, and operating reliable equipment in a cost-effective manner.
2.3 Life Cycle Phases

In this document, the life cycle has been divided into the six phases listed below. These
six phases can be grouped under three macro phases: Concept and Feasibility, Design and
Development, and Production and Operation. The three macro phases are sometimes used in
place of the six phases for illustrative purposes; this in no way impacts the concepts and
methodology presented.
1. Concept and Feasibility
2. Design
3. Prototype (alpha-site)
4. Pilot Production (beta-site)
5. Production and Operation
6. Phase-out
A discussion of each of the six life cycle phases follows.

1. Concept and Feasibility. The life cycle begins with this phase; the need for new
equipment is identified and alternative approaches to fulfilling that need are explored.
The need for new equipment may be based on existing equipment that can no longer
perform its intended function or on customer requirements for which the necessary
equipment does not exist.
[Phase indicator graphic; current phase: Concept/Feasibility]
During this phase, marketing and sales personnel, customer service representatives,
design and reliability engineers, and manufacturing engineers work together with the
customer to:
• Determine the need for new equipment
• Establish reliability goals
• Evaluate the feasibility of meeting these goals
• Estimate resource requirements
• Examine alternative design concepts

• Select those concepts to be studied in more detail during the design phase
• Estimate cost trade-offs
The concept and feasibility phase, and the design phase that follows, are the optimal
times for using design-for-reliability practices.
2. Design. The alternative design concepts selected during the concept and feasibility phase
are explored in more detail by the design engineers during this phase of the life cycle. A
design disclosure package is prepared and evaluated by all concerned parties. Reliability
and manufacturing engineers, as well as quality assurance and field service personnel, are
generally called on by the design engineers for input concerning parts selection,
components, serviceability, and manufacturing processes. Also, reliability goals set for
the equipment during the concept and feasibility phase are translated into requirements
very early in the design phase. Requirements are useful in making preliminary reliability
allocations to subsystems and components to understand cost impacts.
This phase of the life cycle can be separated into two parts: preliminary design and final
design.
[Phase indicator graphic; current phase: Design]
During the preliminary design process, design and reliability engineers:

• Modify goals to meet customer requirements
• Evaluate a number of design alternatives
• Make preliminary reliability allocations to subsystems and components
• Prepare a design disclosure package of requirements and specifications
• Estimate cost considerations
More than one design alternative may be selected for the final design phase if serious
questions remain about the best choice.
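As one illustration of a preliminary reliability allocation, the sketch below uses equal apportionment for subsystems in series with constant failure rates. The method, the subsystem names, and the 500-hour goal are assumptions chosen for the example; the guideline does not prescribe a particular allocation technique at this point.

```python
from typing import Dict, List


def allocate_equal(system_mtbf_goal_hours: float, subsystems: List[str]) -> Dict[str, float]:
    """Equal apportionment for a series system with constant failure rates.

    In a series system the failure rates add, so splitting the system
    failure rate (1 / MTBF) evenly over n subsystems means each subsystem
    must achieve an MTBF of n times the system goal.
    """
    if system_mtbf_goal_hours <= 0 or not subsystems:
        raise ValueError("need a positive goal and at least one subsystem")
    required = system_mtbf_goal_hours * len(subsystems)
    return {name: required for name in subsystems}


# Hypothetical subsystem names and system goal.
allocation = allocate_equal(
    500.0, ["wafer handler", "vacuum system", "controller", "gas delivery"]
)
for name, mtbf in allocation.items():
    print(f"{name}: {mtbf:.0f} h MTBF required")
```

Equal apportionment is only a starting point; allocations are normally revised as design detail and cost information become available, which is the update step described for the final design process.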
During the final design process, customer and supplier representatives, design and
reliability engineers, project managers, field service personnel, manufacturing engineers,
and quality assurance personnel:
• Update reliability allocations to subsystems and components
• Carry out design reviews
• Implement design-for-reliability practices
• Update the design disclosure package to reflect these reviews
• Select specific designs for prototype construction

• Estimate cost trade-offs and considerations

Several iterations of design review and redesign are usually required before a design is
ready for prototype construction. Design reviews are important in measuring the progress
against design requirements and gaining management approval to proceed with the
prototype phase of the life cycle. These reviews are carried out in parallel with the design
process and are often categorized as follows:
• Requirements Review - review the equipment’s design requirements
• Preliminary Design Review - evaluate the preliminary design against
requirements
• Critical Design Review - provide design to the customer(s) for review
3. Prototype. Specific designs selected during the design phase are built and tested during
this phase to determine if all design requirements will be met. The prototype phase provides
the first opportunity to validate the entire design, and is therefore commonly called
alpha-site evaluation. Selected customers are included in alpha-site evaluations and are
asked to provide feedback on all aspects of the equipment.
[Phase indicator graphic; current phase: Prototype (α-site)]
Multiple design alternatives may require prototyping and testing if serious questions exist
about the best overall choice. It is common for reliability engineers to have responsibility
for performing these tests. However, manufacturing personnel will have responsibility
for determining that parts and components conform to specifications within financial
guidelines.
During the prototype phase, design, reliability, test, and manufacturing engineers, as well
as quality assurance personnel:
• Build and test one or more prototypes of a design
• Present the test results for a pilot production design review
• Redesign as needed to fix weaknesses or make other desirable changes
• Conduct additional design reviews as appropriate
The design reviews should include another critical design review to give the customer an
opportunity to review the latest design being considered.
Concurrent with redesigns and design reviews, reliability engineers, quality assurance
personnel, and manufacturing engineers will develop quality assurance plans, design
inspection and testing programs, set up production facilities, and develop production
plans in preparation for the pilot production phase.

4. Pilot Production. This phase of the life cycle serves as a bridge between the prototype
phase and the production and operation phase. This is the first opportunity for the
equipment to be evaluated in an extended customer environment, and is therefore
commonly called beta-site evaluation. In fact, it may be the first time that the equipment
is exposed to a customer’s processes.
[Phase indicator graphic; current phase: Pilot Production (β-site)]
The purpose of the pilot production phase is to help identify and correct problems with
the equipment before full-scale production begins. Design and reliability engineers
should evaluate the actual level of equipment reliability and determine what needs to be
accomplished to meet requirements in a cost-effective manner.
During the pilot production phase, project management, reliability engineers, manufacturing
and test personnel, and customer service representatives:
• Qualify the equipment manufacturing process
• Establish field trials and customer applications of equipment
• Monitor the equipment’s performance
• Identify root causes of failures
• Implement a "corrective action" program for reliability problems
• Determine cost of ownership
Prior to the production and operation phase of the life cycle, reliability and design
engineers should evaluate equipment reliability and make the appropriate recommendations.
If the actual equipment reliability level is less than desired, specific reliability
improvement activities that were identified in the corrective action program should be
implemented. This is the last opportunity to make design changes and other
improvements before full-scale production.
Design reviews conducted at this point are often broken down into:
• Qualification Review - verify that the final design meets requirements
• Production Readiness Review - determine readiness for full
production
• Reliability Budget Review - verify the reliability goal allocations
If any design changes were made at this point, another critical design review may be
appropriate.

5. Production and Operation. This phase of the life cycle represents the time when units
are produced and sold. All major reliability problems should have been identified and
corrected prior to the production and operation phase. A formal program must be in place
for collecting and analyzing field service data and performance data for the customer’s
unit, as well as data on the associated cost impact.
[Phase indicator graphic; current phase: Production/Operation]
During the production and operation phase, field service personnel, management, quality
assurance personnel, and reliability engineers:
• Implement a field tracking and customer feedback and satisfaction
program
• Provide training and technical assistance to customers
• Document and employ installation testing and operation procedures
• Identify and report operation and maintenance problems
• Record failure data in a formal database
• Manage continuous improvement efforts
• Determine cost of ownership impacts
Recorded failure data should account for uncertainty due to variations in site, product
vintage, and customer procedures.
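One way to keep site and vintage visible in recorded failure data is sketched below. The field names, the sample values, and the use of MTBF (operating hours divided by number of failures) as the summary metric are assumptions for illustration; a real failure database would carry considerably more detail.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UnitLog:
    """Operating history for one unit (all field names are hypothetical)."""
    site: str
    vintage: str             # e.g. hardware revision or build year
    operating_hours: float
    failures: int


def mtbf_by_site(logs: List[UnitLog]) -> Dict[str, float]:
    """Pool hours and failures per site; MTBF = total hours / total failures."""
    hours: Dict[str, float] = defaultdict(float)
    fails: Dict[str, int] = defaultdict(int)
    for log in logs:
        hours[log.site] += log.operating_hours
        fails[log.site] += log.failures
    # Sites with no recorded failures are skipped rather than reported as infinite.
    return {site: hours[site] / fails[site] for site in hours if fails[site] > 0}


# Made-up sample data, for illustration only.
logs = [
    UnitLog("Site A", "rev 1", 1200.0, 4),
    UnitLog("Site A", "rev 2", 800.0, 1),
    UnitLog("Site B", "rev 1", 1000.0, 5),
    UnitLog("Site C", "rev 2", 600.0, 0),
]
result = mtbf_by_site(logs)
print(result)
```

Grouping by `vintage` instead of `site` follows the same pattern and helps separate differences caused by the equipment from those caused by the customer's procedures.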
After proper review, decisions are made for resource allocation for continuous improvement
in the reliability process. The supplier and customer should function as partners in
these efforts and may participate in user groups.
Once equipment is in the field, it is important to continually monitor reliability, analyze
failures and identify root causes, implement corrective actions, and eliminate known
causes of failures both for the current and the next generation of equipment.
6. Phase Out. The equipment product line is approaching the end of its useful life during
this final phase of the life cycle. The end of useful life naturally occurs earlier for the
supplier than it does for the customer. The end of useful equipment life for the customer
can occur due to obsolescence, wear, or a change in business plans. To remain
competitive, the supplier must make plans for the next generation of equipment before
phasing out current generation production.

[Phase indicator graphic; current phase: Phase Out]
The information gained during the six phases of the life cycle should be retained so that it
can be used to improve future generations of similar or new equipment.
This completes the life cycle for the current generation of equipment. Each new
generation of equipment would experience basically the same life cycle.
Supplier Cost Implications. The early life cycle phases typically represent the smallest portion
of those total life cycle costs borne by the supplier, yet generally represent the region where the
greatest impact on equipment reliability can be made. As a design moves toward completion,
design details become increasingly fixed. Thus, the cost in time and dollars to correct reliability
problems increases. Figure 2-1 shows that typically, toward the end of the design/development
macro phase of the life cycle, only 15% of the life cycle costs are consumed, but approximately
95% of the total life cycle costs have been determined (i.e., locked in).[2] Thus, changes made to
improve reliability after the design/development macro phase have little impact on overall life
cycle costs, but can be very expensive in terms of costly design changes, retrofits, service calls,
warranty claims, and customer goodwill. This is not meant to imply that equipment already in
the production/operation macro phase should be ignored in terms of improving reliability.
Reliability improvement activities should continue throughout the life cycle.

[Figure: percent of locked-in costs and percent of total costs plotted across the
Concept/Feasibility, Design/Development, and Production/Operation macro phases. Plotted
values: locked-in costs of 85% and 95%; total costs of 3%, 12%, Production (35%), and
Operation (50%).]
Source: Arsenault and Roberts, Reliability and Maintainability of Electronic Systems
Figure 2-1. Percent of Total Life Cycle Costs vs Locked-in Costs
Although reliability improvements made earlier in the life cycle can increase initial supplier
costs, they generally result in lower support costs for the supplier and lower operational costs for
the customer. Also, early improvement could reduce the supplier’s costs of production, warranty,
and service.
2.4 Life Cycle Cost

Two criteria used by semiconductor manufacturers to select equipment for a manufacturing step
or process are:
1. Technical
2. Economical[1]
The question asked for the technical criterion is, "Can a particular piece of equipment or
equipment line do the manufacturing step or process required?" The question asked for the
economical criterion is, "Does the result of the manufacturing process justify or support the cost
and on-going expense of a particular piece of equipment or equipment line?" It is increasingly
common for several pieces of equipment to be able to meet the technical criterion. Thus, the
economical criterion is becoming increasingly important. Customers consider not only the initial
purchase price, but the costs associated with equipment operations over its entire life (i.e., life
cycle costs).

Life cycle costs include both equipment supplier costs, which are passed on to the customer in
the purchase price of the equipment, and all costs incurred by the customer over the equipment
life. Supplier costs plus the supplier’s gross profit margin are referred to as acquisition costs, and
include:
• Research and development
• Marketing and sales
• Testing and manufacturing
• Supplier shipping and installation
• Supplier training and support
• Supplier service and spare parts
• Warranty costs
• Continuous improvement
Costs incurred by the customer are referred to as operational costs, and include:
• Customer installation and training
• Operating costs
• Customer service costs and spares inventory
• Customer performed maintenance
• Customer space costs
• Scheduled maintenance
• Equipment improvements and upgrades
• Down time and scrap costs
• Disposal costs
Life cycle cost implications for both the supplier and the customer are discussed in the following
paragraphs.
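As a rough illustration of how the two cost categories combine, the following Python sketch adds acquisition and operational costs into a total life cycle cost and compares two candidate tools. The class name, field names, and dollar figures are hypothetical and are not taken from this guideline.

```python
from dataclasses import dataclass


@dataclass
class LifeCycleCost:
    """Hypothetical cost summary for one piece of equipment (all values in dollars)."""
    acquisition: float  # supplier costs plus gross profit margin, paid in the purchase price
    operational: float  # costs incurred by the customer over the equipment life

    @property
    def total(self) -> float:
        # Total life cycle cost is the sum of the two categories.
        return self.acquisition + self.operational


# Illustrative numbers only: tool B costs more to buy but less to run.
tool_a = LifeCycleCost(acquisition=1_000_000, operational=3_000_000)
tool_b = LifeCycleCost(acquisition=1_200_000, operational=2_200_000)

lowest = min((tool_a, tool_b), key=lambda tool: tool.total)
print(f"Lowest total life cycle cost: ${lowest.total:,.0f}")
```

In this made-up case the tool with the higher purchase price has the lower total, which is why a comparison on purchase price alone can mislead.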

Customer Cost Implications. Improvements in reliability made by the supplier early in the
equipment life cycle may result in higher development costs being passed on to the customer in
the equipment acquisition costs. However, this can be more than offset as the customer benefits
by having lower operational costs with increased reliability and up time that results in greater
productivity.
Figure 2-2 illustrates how a reliability program impacts acquisition and operational costs. As this
figure indicates, acquisition costs may increase due to efforts to improve reliability.
[Figure: total life cycle costs, split into acquisition costs and operational costs, shown
for equipment with no formal reliability program and with a formal reliability program.]
Figure 2-2. Impact of a reliability program on life cycle cost
However, operational costs and, even more important, total life cycle costs decrease. It is
important for the customer to make equipment purchase decisions based on total life cycle costs
and not just on initial purchase price.

Optimizing Life Cycle Costs. Increasing acquisition costs to improve equipment reliability and
lower operational and total life cycle costs is clearly a recommended practice. However, there is
a point at which increasing acquisition costs to obtain higher levels of reliability is no longer
beneficial. Figure 2-3 shows an optimal point beyond which total life cycle costs begin
increasing with further improvements in reliability.
[Figure: acquisition costs, operational costs, and total life cycle costs plotted against
reliability, with the optimized cost point marked on the life cycle cost curve.]
Figure 2-3. Optimizing Life Cycle Costs
When this occurs, a more reliable technology is required for further improvement.
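The optimum can be located numerically once cost curves are available. The sketch below is only an illustration: the two cost functions and their constants are assumed shapes chosen so that acquisition cost rises and operational cost falls as reliability increases. They are not data from this guideline.

```python
def acquisition_cost(reliability: float, a: float = 1.0) -> float:
    # Assumed shape: cost grows without bound as reliability approaches 1.
    return a / (1.0 - reliability)


def operational_cost(reliability: float, b: float = 100.0) -> float:
    # Assumed shape: cost falls in proportion to the remaining unreliability.
    return b * (1.0 - reliability)


def total_cost(reliability: float) -> float:
    return acquisition_cost(reliability) + operational_cost(reliability)


# Search candidate reliability levels from 0.50 to 0.99 in steps of 0.01.
candidates = [i / 100 for i in range(50, 100)]
optimum = min(candidates, key=total_cost)

print(f"Lowest total cost at reliability {optimum:.2f}: {total_cost(optimum):.2f}")
```

Pushing reliability beyond the optimum still lowers operational cost, but the added acquisition cost more than cancels the saving.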
Reliability insights from a technology used in one generation of equipment should be
documented so they can be used to improve the next generation. Improvements in technology
transfer between equipment generations will generally produce a decrease in the life cycle costs
in each succeeding generation of equipment as shown in Figure 2-4.

[Figure: life cycle costs plotted against reliability for equipment Generations 1 through 4.]
Figure 2-4. Decrease in Life Cycle Costs in New Generations of Equipment
2.5 The Reliability Improvement Process

The reliability improvement process is an iterative process that is applied at each phase of the
equipment life cycle. It consists of five basic steps:
1. Establish reliability goals and requirements for equipment
2. Apply reliability engineering or improvement activities, as needed
3. Conduct an evaluation of the equipment or equipment design

4. Compare the results of the evaluation to the goals and requirements and make a
decision to move either to the next step or the next phase
5. Identify problems and root causes
The process then returns to Step 2, and Steps 2 through 5 are repeated until goals and
requirements are met.
The reliability improvement process steps are shown in the flowchart in Figure 2-5.
[Figure: flowchart. Step 1, Establish Goals/Requirements. Step 2, Reliability
Engineering/Improvements. Step 3, Conduct Evaluation. Step 4, Are Goals/Requirements Met?
If yes, Go/No Go Decision on Next Phase. If no, Step 5, Identify Problems & Root Causes.]
Figure 2-5. The Reliability Improvement Process
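The iterative structure of the five steps can be sketched as a simple loop. This is a minimal illustration, assuming a single numeric reliability measure (here a hypothetical MTBF in hours) and a fixed, made-up improvement factor per cycle; in practice each step involves engineering work rather than a multiplication.

```python
def improve_until_met(goal_mtbf, initial_mtbf, gain_per_cycle, max_cycles=20):
    """Repeat Steps 2 through 5 until the Step 1 goal is met.

    Returns the number of cycles used and the MTBF evaluated in each cycle.
    """
    mtbf = initial_mtbf
    history = []
    for cycle in range(1, max_cycles + 1):
        mtbf *= gain_per_cycle   # Step 2: apply reliability improvements (assumed gain)
        history.append(mtbf)     # Step 3: conduct evaluation
        if mtbf >= goal_mtbf:    # Step 4: are goals and requirements met?
            return cycle, history
        # Step 5: identify problems and root causes, then return to Step 2.
    raise RuntimeError("goal not met within the allowed number of cycles")


cycles, history = improve_until_met(goal_mtbf=500.0, initial_mtbf=200.0, gain_per_cycle=1.5)
print(cycles, history)
```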

1. Establish Reliability Goals and Requirements. The first step in the reliability
improvement process is to establish reliability goals and requirements. A distinction is made
between goals and requirements. Goals are more internally driven and may or may not be
met. Requirements, on the other hand, are more specific and are customer driven.
Requirements are usually included as deliverables in contractual agreements. Goals are
the starting point, but are modified to satisfy customer requirements early in the
equipment life cycle.
All goals have certain common characteristics. The following criteria can be used to
assist in establishing goals[3]:
• Attainability: Goals should be set at levels reasonably attainable within
the available time span. Large goals over long periods should be avoided
to maintain interest and commitment. Subgoals over shorter times are
more attainable and more cost effective.
• Supportability: Support and resources must be available at the time they
are needed to achieve goals. Advance planning is needed to determine the
resources and the extent to which they can or will be provided.
• Acceptability: Goals must be acceptable to those who will be actively
involved in pursuing these goals. Acceptance is influenced by relevance,
perceived importance, reasonableness, and desirability of outcome.
• Measurability: Goals provide standards against which performance may
be assessed and, therefore, should be selected for suitability and defined in
a way that enables measurement. To make them measurable, goals must
be defined qualitatively, quantitatively, and in terms of performance
parameters, values, and time scales.

2. Reliability Engineering and Improvements. Once goals and requirements have been
established, design-for-reliability practices or reliability improvement activities are
applied to enhance the reliability of equipment that is in any phase of the life cycle, or of
equipment already in existence.
There are some basic practices that can be applied to improve reliability. These include:
• Simplicity. Simplification of equipment configuration is one of the basic
principles of designing-for-reliability. Added parts or features increase the
number of failure modes. A common practice in simplification is referred
to as component integration (the use of a single component to perform
multiple functions).
• Redundancy. Another reliability improvement practice is to include more
than one way to accomplish a function by having certain components or
subassemblies in parallel, rather than in series. Beyond a certain point,
redundancy may be the only cost-effective way to design reliable
equipment.
• Proven Components and Methods. To the extent possible, designers
should use components and methods that have been shown to work in
similar applications. Using proven components can minimize analyses
and testing to verify reliability, thus reducing time and costs of
demonstrating reliability of the equipment.
• Derating. Derating is the practice of using components or materials at
environmental conditions or loads that are less severe than their limiting
condition. Under these conditions, the component or material is expected
to be more reliable.
• Eliminating Known Causes of Failure (Fault Avoidance). This can be
accomplished through screening and burn-in procedures to eliminate weak
components before equipment is actually shipped to the customer.

• Failure Detection Techniques. Reliability of equipment can be improved
by incorporating failure detection methods or self-healing devices such as
periodic maintenance schedules, monitoring procedures, and automatic sensing
and switching devices.
• Ergonomics or Human Factors Engineering. The activities of humans can
be very important to equipment reliability. The equipment design must
consider human factors aspects such as the person-machine interface,
human reliability, and maintainability.
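The effect of simplicity and redundancy can be illustrated with the standard series and parallel reliability formulas. The sketch below assumes that components fail independently and uses made-up component reliabilities; the function names are illustrative.

```python
from math import prod


def series_reliability(reliabilities):
    """All components must work: the product of the individual reliabilities."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """At least one component must work: one minus the chance that all fail."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


single = 0.90                                   # assumed reliability of one component
two_in_series = series_reliability([single, single])
two_in_parallel = parallel_reliability([single, single])

print(f"series: {two_in_series:.2f}, parallel: {two_in_parallel:.2f}")
```

Adding a part in series lowers reliability, which is the argument for simplicity, while adding the same part in parallel raises it, which is the argument for redundancy.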
3. Conduct Evaluation. The next step in the reliability improvement process is to conduct
an evaluation of the equipment or equipment design to assess its reliability level. A
powerful tool for conducting this evaluation is reliability modeling. For equipment in the
early phases of the life cycle, reliability modeling can be used to predict the equipment’s
performance to provide information for design changes or for evaluating design
alternatives. For equipment that is already in production or is operational in the field, reliability
modeling, combined with testing and failure data analysis, can be used to identify critical
components and help guide resource allocation and reliability improvement decisions.
There are a number of reliability prediction models. These include:

• Block diagram models. A block diagram is used to logically represent the
equipment being modeled by breaking it down into subsystems and
components. Equipment reliability is modeled using failure data on the
subsystems and components.
• State transition (Markov) models. Equipment reliability is modeled by
identifying the various operating conditions (states) that the equipment,
subsystem, or component can experience, and the probability of transition
from one state to another.
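A state transition model can be sketched with just two states, "up" and "down". The example below is a minimal illustration, assuming a discrete-time chain with constant, made-up transition probabilities per time step; real equipment models usually have more states.

```python
def step(state, p_fail, p_repair):
    """Advance the (up, down) probability pair by one time step."""
    up, down = state
    return (
        up * (1.0 - p_fail) + down * p_repair,   # stay up, or get repaired
        up * p_fail + down * (1.0 - p_repair),   # fail, or stay down
    )


def long_run_availability(p_fail, p_repair, steps=10_000):
    """Iterate from a fully 'up' start until the probabilities settle."""
    state = (1.0, 0.0)
    for _ in range(steps):
        state = step(state, p_fail, p_repair)
    return state[0]


p_fail, p_repair = 0.01, 0.19                    # assumed per-step probabilities
availability = long_run_availability(p_fail, p_repair)
closed_form = p_repair / (p_fail + p_repair)     # known result for a two-state chain

print(f"iterated: {availability:.4f}, closed form: {closed_form:.4f}")
```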
Other techniques for evaluating equipment reliability and identifying design weaknesses
include:

• Fault tree analysis (FTA). A "top down" approach beginning with an
undesirable event (usually equipment failure) at the top or system level
and identifying the events at subsequent lower levels that can cause the
undesirable top event.
• Failure modes and effects analysis (FMEA). A technique for
systematically identifying, analyzing, and documenting the possible failure
modes within a design and the effects of such failures on equipment
performance.
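A small fault tree can be evaluated numerically by combining event probabilities through AND and OR gates. The sketch below assumes independent basic events; the tree itself (a controller and two redundant pumps) and its probabilities are hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if every input event occurs."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical top event: equipment fails if the controller fails
# OR both redundant pumps fail.
p_controller = 0.02
p_pump = 0.10

p_both_pumps = and_gate([p_pump, p_pump])
p_top = or_gate([p_controller, p_both_pumps])

print(f"probability of top event: {p_top:.4f}")
```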
Testing is another tool for evaluating equipment reliability. Typically, three different
categories of testing are applied:
1. Component tests - useful in flushing out basic weaknesses in critical
components
2. Systems tests - intended to explore effects of component interactions
3. Reliability demonstration tests - used to demonstrate equipment capability
The above concepts are discussed in more depth in Sections 3.0 and 4.0.

4. Are Goals and Requirements Met? Results of the evaluation process are compared to
reliability goals and requirements. If goals and requirements are not met, the problems
and root causes should be identified as described in Step 5, and reliability improvement
activities should be initiated. If goals and requirements are met or exceeded, then
approval can be given to move to the next phase of the life cycle, or goals and requirements can
be updated and additional analyses carried out. For example, if the equipment is in the
concept and feasibility or design phase of the life cycle, sensitivity analyses can be
conducted to evaluate design and cost trade-offs such as:
• Design complexity versus reliability
• Maintainability versus reliability
• Increased costs versus reliability
If goals are, or can be, exceeded by a significant margin, then the supplier should
capitalize on the situation by turning it into a competitive leadership position.
Upon completing design trade-off studies, approval can be given to move to the next
phase of the equipment life cycle where the reliability improvement process is again
initiated.
5. Identify Problems and Root Causes. If reliability goals and requirements are not met,
the reasons need to be identified and corrective actions should be taken. Test data on
prototypes or actual equipment in the field can be used to supplement information on
equipment reliability generated from predictive modeling. Testing can also help to
identify causes of failure and any potential reliability problems.

A key tool useful for reporting and analyzing failure data is the failure reporting,
analysis, and corrective action system (FRACAS). This tool is discussed in more detail in
Sections 3.0 and 4.0.
Test data and all reported failures should be investigated to verify that a failure occurred.
Failure verification can be performed by subjecting the component to the same conditions
as those reported when the "failure" occurred.
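The kind of record a failure reporting system might keep can be sketched as follows. This is an illustration only: the field names, helper functions, and sample reports are assumptions and do not describe the FRACAS format defined elsewhere in this guideline.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FailureReport:
    equipment_id: str
    component: str
    verified: bool               # True once the reported failure has been confirmed
    root_cause: str = ""         # filled in after analysis
    corrective_action: str = ""  # filled in when an action is assigned


def open_reports(reports):
    """Verified failures that still lack a corrective action."""
    return [r for r in reports if r.verified and not r.corrective_action]


def failures_by_component(reports):
    """Count verified failures per component, most frequent first."""
    return Counter(r.component for r in reports if r.verified).most_common()


reports = [
    FailureReport("tool-01", "robot arm", True, "worn belt", "change belt supplier"),
    FailureReport("tool-02", "robot arm", True),
    FailureReport("tool-02", "RF generator", True, "loose connector"),
    FailureReport("tool-03", "robot arm", False),   # reported but not yet verified
]

print(failures_by_component(reports))
print(len(open_reports(reports)), "verified failures without a corrective action")
```

Counting only verified failures keeps unconfirmed reports from distorting the ranking of problem components.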
The reliability improvement process now returns to Step 2, where reliability improvement
and growth activities are initiated, or upgrades and modifications to reliability goals and
requirements are made. Reliability growth activities generally fall into the following
major categories:
• Strengthening the existing design, by testing or modeling (or both) to
identify optimal design changes to improve reliability. The process of
identifying weak areas can be aided by performing sensitivity studies using
the reliability model of the system.
• Redesigning part or all of the system (fault tolerance), which includes
studying ergonomics, enhancing software, adding redundancy, and
incorporating error detection techniques.
• Eliminating known causes of failure (fault avoidance), which includes
using screening and burn-in procedures to eliminate weak components,
derating parts, and using more reliable parts.
Steps 2 through 5 are repeated until goals and requirements are met. The process may
require several cycles of goal setting, evaluating, comparing, and improving. Approval
can then be given to move to the next phase of the life cycle, where the reliability
improvement process is again applied.

2.6 Applying the Reliability Improvement Process

Optimal benefits from use of the reliability improvement process are clearly realized when the
process is applied to equipment in the concept and feasibility phase of the life cycle and then
continuously applied thereafter. Benefits can also be realized when the improvement process is
applied to equipment that is in some advanced phase of its life cycle. It is important to address
equipment reliability throughout the life cycle. For example, reliability improvements may be
necessary:
• Following the Prototype Phase, because of design deficiencies or parts problems
uncovered during prototype testing
• Beginning the Pilot Production Phase, due to reliability related issues resulting
from manufacturing a new equipment line
• During the Production and Operation Phase, because feedback from field
personnel and customers indicates reliability problems due to unanticipated failure
mechanisms.
Activities
Some activities associated with applying the reliability improvement process to the equipment life
cycle remain basically the same from one phase of the life cycle to the next. Others, however,
vary because of the change in focus from phase to phase. For example, focus in the concept and
feasibility macro phase is primarily on "planning and allocating;" focus in the design and
development macro phase is primarily on "predicting and verifying;" and focus in the production
and operation macro phase is primarily on "evaluating and improving."
The activities also vary depending on whether the improvement process has been continuously
applied to equipment as it moved through its life cycle from concept and feasibility to phase out,
or whether it is being applied for the first time to equipment that is in some advanced phase. For
example, consider equipment in the prototype phase: If the reliability improvement process has
been applied continuously to the equipment in the concept and feasibility phase and in the design
phase, then the reliability goals and requirements already exist. Thus, the reliability goals and
requirements activity consists, primarily, of updating the goals and requirements; the primary
focus would be on prototype testing and corrective action activities. However, if the reliability
improvement process was applied to equipment for the first time during the prototype phase, then
developing reliability goals and requirements should be a major focus because these goals and
requirements do not exist. These concepts are discussed in more detail in Section 3.0.
Figure 2-6 provides a high-level view of the main activities associated with applying the
reliability improvement process to each of the three macro phases of the life cycle. This is
provided primarily to illustrate the flow from one macro phase to the next. A more detailed
discussion of applying the reliability improvement process to all six phases of the life cycle, and
a list of the associated activities, is presented in Section 3.0. Some of the activities will vary as
the reliability improvement process is tailored to a particular need or equipment line. However,
the reliability improvement process remains unchanged.

[Figure: the five-step reliability improvement flowchart repeated for each of the three
macro phases, with the main activities listed for each.
Concept/Feasibility: set reliability goals; create reliability program plan; develop
conceptual designs; develop preliminary model; evaluate conceptual designs; next phase
go/no go approval; identify problems and root causes; develop corrective actions.
Design/Development: translate goals into requirements; apply design-for-reliability
practices; carry out design reviews; upgrade reliability model; predict equipment
performance; next phase go/no go approval; identify problems and root causes; develop
corrective actions.
Production/Operation: revise goals/requirements; implement field tracking system; begin
customer feedback program; start corrective action program; upgrade reliability model;
identify problems and root causes; develop corrective actions; begin phase out activities.]
Figure 2-6. Application of Reliability Improvement Process

2.7 Summary
Knowledge of the equipment life cycle is important because it provides a basis for understanding
how and where reliability engineering enters into the process of designing, producing, and
operating the equipment. The equipment life cycle is broken into distinct phases, each
representing a unique portion of the equipment life. These phases provide the framework for
tracking reliability throughout the life cycle of the equipment and guidance on when and where to
apply resources. Awareness of life cycle costs helps equipment owners understand the impact on
expenditures and cost of ownership when reliability is initiated at different life cycle phases.
The reliability improvement process provides a means for systematically improving reliability
throughout the equipment life cycle. Optimal benefits are realized when reliability is designed
into a piece of equipment. However, it is important to improve reliability throughout the life of
the equipment to meet reliability goals and objectives.
The reliability improvement process is an iterative process of setting goals, then evaluating
(predicting), comparing results against those goals, and improving. Central to the reliability
improvement process are data collection and analysis, design improvements, and operations
and maintenance procedure improvements.
About Section 3.0
The next section provides details on preparing for and implementing the reliability improvement
process. It includes a discussion of the various activities associated with each step of the
improvement process and each phase of the life cycle. In preparation for this discussion, the
following questions may assist in assessing current reliability practices and focus.
1. Is the importance of reliability conveyed throughout the company?
2. Is the approach to reliability improvement reactive or proactive?
3. Is the equipment development process life cycle oriented?
4. Have specific goals and requirements been established for equipment
reliability and its growth?
5. Does the organization have technical and executive managers who
champion the reliability cause?
6. Is demonstrated achievement of reliability goals a part of the criteria for
deciding when equipment is ready for release to market?
7. Does the organization collect data that can readily be used in measuring and
providing guidance for equipment reliability performance?
8. Do indicators of reliability performance exist for all equipment?
9. Are these indicators routinely monitored to ensure achievement of
improvement goals?
10. Is a closed-loop failure reporting and corrective action system in place?

2.8 References
1. SI Staff, "Selecting a Product: The Task at Hand," Semiconductor International,
March 1991, pages 7-8.
2. J. E. Arsenault and J. A. Roberts, Reliability and Maintainability of Electronic
Systems, Potomac, MD: Computer Science Press, 1980.
3. W. Grant Ireson and Clyde F. Coombs, Jr., Editors in Chief, Handbook of Reliability
Engineering and Management, McGraw-Hill, 1988.

3 IMPLEMENTATION OF THE RELIABILITY IMPROVEMENT PROCESS
3.1 Introduction
To ensure that maximum benefits are achieved when implementing the reliability improvement
process, it is important to have an understanding of:
• Management’s role in the implementation process
• The activities associated with applying the process
• Functional responsibilities in the implementation process
• Where to start the process
• How to use limited resources and communicate the value of the process
Each of these topics is discussed in this section. Primary focus is given to applying the
reliability improvement process. Activities associated with applying the reliability improvement
process to equipment in the concept and feasibility phase and continuing throughout its life cycle
are discussed first. Later, the discussion focuses on activities associated with applying the
reliability improvement process to equipment in an advanced phase (other than concept and
feasibility) of the life cycle.
3.2 Management’s Role

Management plays a vital role in implementing the reliability improvement process. It has the
responsibility for establishing the right environment, and in choosing individuals to champion the
effort. The champions provide leadership and are accountable for the success of the reliability
improvement process.
Management’s Responsibility
One of management’s primary responsibilities is to convey the importance of reliability
throughout the company. Institutionalizing the reliability improvement process may require a
cultural change and even an organizational change. Therefore, management leadership and
commitment to this change is essential to ensure success. Success also depends on management’s
understanding of the activities involved in the reliability improvement process and on their
support of these activities.
Reliability Champions
Selection of reliability champions is critical to the success of the reliability improvement process.
Two reliability champions are recommended for moderate-to-large sized companies: an
executive champion and a technical champion. In a small company, these two roles may be
combined in one person.
Executive Champion. The role of the executive champion is to:
• Provide executive leadership in reliability improvement matters
• Promote reliability improvement throughout the company
• Provide assurance that the reliability improvement process is supported

• Work closely with the technical champion to develop reliability activities

• Mentor the reliability improvement process and ensure that accomplishments are
acknowledged
Depending on the size of the company, the executive champion could occupy any of a number of
upper management positions. The following are a few examples:
• President or vice president
• Chief operations officer
• Chief technical officer
• Corporate total quality management executive
Technical Champion. The technical champion establishes the reliability improvement process
and is held accountable for its success. The technical champion takes an active role in:
• Providing both managerial and technical leadership
• Ensuring the implementation of an effective cross-functional improvement
process
• Selecting the reliability activities to be performed and the tools that will be used
• Ensuring that the reliability improvement process is continuously applied
• Training participants in reliability concepts and tools
If not already experienced in reliability, the technical champion should be trained in reliability
principles. This training should include a full understanding of the equipment life cycle and life
cycle costs concepts as well as reliability improvement process activities. This ensures the
background necessary to provide proper guidance for application of the activities and tools
associated with implementing the reliability improvement process.
The technical champion could be the manager of, or chief engineer within, one of the following
organizations:
• Systems engineering
• Reliability engineering
• Product engineering
• Customer engineering
3.3 Applying the Reliability Improvement Process

The reliability improvement process can be applied continuously as equipment moves through its
life cycle phases. Activities associated with applying the process may vary as the equipment
moves from one phase of the life cycle to the next. This variation results from a change in focus
from phase to phase, and from the fact that an activity performed in one phase lays the
foundation for activities in subsequent phases. Activities will also vary depending on whether
the improvement process is applied continuously as equipment moves through its life cycle (from
concept and feasibility to phase out), or whether it is applied for the first time to equipment that
is in some advanced (other than concept and feasibility) phase.
The following table lists the sections that contain descriptions of the reliability improvement
process for each of the starting points (process applied for the first time):

Table 3-1. Reliability Improvement Process Applied at Six Different Starting Points
Starting Point (Life Cycle Phase in Which          Reference Section
the Process Is Applied for the First Time)
Concept and Feasibility                            Section 3.3.1
Design                                             Section 3.4.1
Prototype                                          Section 3.4.2
Pilot Production                                   Section 3.4.3
Production/Operation                               Section 3.4.4
Phase Out                                          Section 3.4.5

Starting with Equipment in the Concept and Feasibility Phase

The following paragraphs discuss the activities that are performed when the reliability
improvement process is first applied to equipment in the concept and feasibility phase and then
continuously applied in subsequent phases. The discussion for each life cycle phase concludes
with a list of objectives that will have been met as a result of applying the reliability
improvement process, and a table summarizing the activities associated with applying the process
to that phase of the life cycle.
Concept and Feasibility
Step 1. Establish Goals and Requirements. In the concept and feasibility phase, the focus of
Step 1 is on establishing goals to meet customer requirements. Later these goals may be revised,
and are eventually modified to reflect changes in customer requirements, or in response to
observations regarding equipment performance level.
Goals can be established based on:

• Customer Voice. When establishing reliability goals, it is important to consider
who the customers are and what aspects of reliability they regard as most
important. The supplier must fully understand customers’ needs, and be able to
translate these needs into equipment-specific information for setting goals.
• Competitive Benchmarking. Competitive benchmarking is a process used by
suppliers to measure and compare their products, services, and operations against
competitors and world class performers.
• Reverse Engineering. The systematic dismantling of equipment with a high
reliability ranking is referred to as reverse engineering. The results provide
information about the actual reliability of similar equipment and the
technology used to achieve that reliability.
• Warranty Requirements. To remain competitive, the reliability goals must support
the established warranty requirements.
• Equipment Maintenance. It is essential to discuss maintenance aspects of the
equipment with field personnel when establishing reliability goals. Improperly
addressing maintenance issues can lead to a design with very high user-perceived
reliability, but prohibitive maintenance costs.

Once goals have been established, a reliability program plan is created that documents how these
goals will be achieved. It defines:
• Activities to be performed
• Resources required to fulfill the activities
• Schedule for these activities
• Procedures by which the activities will be performed
• Organizations and interfaces required to perform the activities
The program plan provides management and the customer with a means of measuring progress
and assuring that requirements will be accomplished.
Step 2. Reliability Engineering and Improvements. In the concept and feasibility phase, Step
2 of the reliability improvement process focuses first on developing alternative design concepts.
All possible alternatives should be identified and evaluated to ensure that those selected for the
design phase are capable of fulfilling goals and requirements. Functional block diagrams are
used to develop the basic concepts for the equipment and to evaluate their feasibility. The
functional block diagram is updated as the concept changes.
The next step is to develop a preliminary model of the equipment using the functional block
diagrams. The initial model is created at a gross level; that is, the equipment is broken into a few
(approximately 10 to 20) major subsystems. This model is used to make initial predictions of the
equipment reliability (Step 3).
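A preliminary prediction from a gross-level model might be sketched as follows. This is only an illustration: it assumes that the major subsystems act in series and that each has a constant failure rate, neither of which is stated in this guideline. The subsystem names and MTBF values are hypothetical.

```python
# Hypothetical MTBF estimates in hours for a few major subsystems, taken
# from historical data or expert judgement. Names and values are illustrative.
subsystem_mtbf_hours = {
    "wafer handler": 2000.0,
    "process chamber": 1000.0,
    "vacuum system": 4000.0,
    "controller": 4000.0,
}


def predict_series_mtbf(mtbf_by_subsystem):
    """Assumed series model with constant failure rates: the equipment
    failure rate is the sum of the subsystem failure rates."""
    total_failure_rate = sum(1.0 / mtbf for mtbf in mtbf_by_subsystem.values())
    return 1.0 / total_failure_rate


predicted_mtbf = predict_series_mtbf(subsystem_mtbf_hours)
print(f"Predicted equipment MTBF: {predicted_mtbf:.0f} hours")
```

Under these assumptions the weakest subsystem dominates the result, which is why the gross model is useful even before detailed data exist.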
A reliability allocation is conducted to apportion the equipment reliability goal among the
individual major subsystems. This is done to make equipment reliability requirements more manageable and
to establish individual reliability requirements for each major subsystem. Since no detailed
information on the equipment is yet available, the allocation process is approximate; it is used to
guide the designer when developing various concepts.
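One simple way to carry out such an approximate allocation is to split the equipment failure-rate goal in proportion to weights assigned to each subsystem. The sketch below assumes a series model with constant failure rates; the weights (for example, relative complexity), the subsystem names, and the 500-hour goal are hypothetical and are not taken from this guideline.

```python
def allocate_failure_rate(system_mtbf_goal_hours, weights):
    """Split the equipment failure-rate goal among subsystems in
    proportion to hypothetical weights (e.g., relative complexity)."""
    system_rate = 1.0 / system_mtbf_goal_hours
    total_weight = sum(weights.values())
    return {name: system_rate * w / total_weight for name, w in weights.items()}


# Hypothetical weights: a larger weight means a larger share of the
# allowed failures, and therefore a lower allocated MTBF.
weights = {
    "wafer handler": 3,
    "process chamber": 4,
    "vacuum system": 2,
    "controller": 1,
}
allocated = allocate_failure_rate(500.0, weights)
for name, rate in allocated.items():
    print(f"{name}: allocated MTBF {1.0 / rate:.0f} hours")
```

The allocated failure rates sum to the equipment goal, so meeting every subsystem allocation implies meeting the equipment goal under the assumed model.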
In this phase, the equipment has not been built, so other sources of data are required. Historical
data can be used for those subsystems that are similar to previous generations of equipment. For
those subsystems for which no historical data is available, expert judgement can be used. Expert
judgement takes the opinion of individuals who are considered knowledgeable about a
subsystem or component and uses this knowledge to create initial reliability values.
Another reliability engineering activity available for identifying conceptual design weaknesses is
a failure modes and effects analysis (FMEA). This is a technique for systematically identifying,
analyzing and documenting the possible failure modes within a design and the effects of such
failures on equipment performance.
The process of setting up an FMEA is initiated in this step, but it is used later in Step 5 to help
identify problems and root causes.

Step 3. Conduct Evaluation. The subsystem failure data and the reliability prediction model
are used to evaluate the reliability of the conceptual design. A reality check assures that the
predicted reliability value makes sense. Evaluate the following:
• Predicted versus the anticipated reliability value
• Historical and expert opinion data used to calculate equipment reliability
• Reliability prediction model
Conceptual design review(s) of the concepts that will be carried to the design phase are
conducted at this point. These design reviews are also useful in evaluating the current level of
the predicted reliability of the concepts being considered.
Step 4. Are Goals and Requirements Met? A comparison is made between established goals
and the predicted reliability values. If the goals are not met, continue to Step 5 where problems
and root causes are identified. If the goals are met or exceeded, approval is eventually given to
move to the design phase of the life cycle, where goals may be modified to meet customer
requirements.
Step 5. Identify Problems and Root Causes. If goals are not met, problems and root causes
should be identified. Sensitivity analyses can be conducted to direct attention to those
subsystems that have the greatest impact on the equipment reliability.
If an FMEA was developed in Step 2, use it to examine the potential failure modes identified and
to establish possible root causes.
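A sensitivity analysis of this kind can be sketched by improving one subsystem at a time in the prediction model and observing the change in equipment MTBF. The example assumes a series model with constant failure rates and uses hypothetical subsystem names and values; the factor of 2 is an arbitrary choice for illustration.

```python
def series_mtbf(mtbf_by_subsystem):
    """Equipment MTBF under an assumed series, constant-failure-rate model."""
    return 1.0 / sum(1.0 / m for m in mtbf_by_subsystem.values())


def mtbf_if_improved(mtbf_by_subsystem, name, factor=2.0):
    """Equipment MTBF if the MTBF of one subsystem is multiplied by factor."""
    improved = dict(mtbf_by_subsystem)
    improved[name] *= factor
    return series_mtbf(improved)


# Hypothetical subsystem MTBF values in hours.
subsystem_mtbf_hours = {
    "wafer handler": 2000.0,
    "process chamber": 1000.0,
    "vacuum system": 4000.0,
    "controller": 4000.0,
}
baseline = series_mtbf(subsystem_mtbf_hours)
gains = {
    name: mtbf_if_improved(subsystem_mtbf_hours, name) - baseline
    for name in subsystem_mtbf_hours
}
# Subsystems listed first offer the largest gain in equipment MTBF.
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking)
```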
The reliability improvement process now returns to Step 2, where reliability improvement and
growth activities are initiated. These might include:
• Adding high-level redundancy
• Using proven high reliability components and parts
• Forming partnerships with sub-tier suppliers
• Derating
Once the conceptual design improvements have been selected and incorporated, both the
functional block diagram and the reliability prediction model are re-evaluated. The model and the
data used in the model are changed to reflect the conceptual design improvements. If an FMEA
was initiated, it is also updated to reflect design changes.
Steps 2 through 5 are repeated until goals are met and approval is given to move to the design
phase of the life cycle.
At the end of concept and feasibility phase, the following objectives have been met:
• Reliability goals have been established and allocated to major subsystems
• A reliability program plan has been initiated
• Conceptual designs that form the basis of the equipment design are determined
• Feasibility that selected conceptual designs will meet goals is demonstrated
Table 3-2 summarizes the activities associated with applying the reliability improvement process
to the concept and feasibility phase. There are three designators used for the activities:
E (engineering), D (data), and T (testing). Each designator, followed by a number, gives the
location of the activity in Section 3.0.
Table 3-2. Reliability Improvement Process Activities for the Concept and Feasibility Phase

Reliability Improvement Process Step / Activities

1. Establish Goals and Requirements
   - Establish reliability goals (E1)
   - Create reliability program plan (E2)
2. Reliability Engineering and Improvements
   - Develop functional block diagrams (E3)
   - Create preliminary reliability model (E4)
   - Allocate reliability goals (E5)
   - Collect historical failure data (D1)
   - Develop preliminary FMEA (E14)
   - Develop preliminary Life Cycle Cost (AT19)
3. Conduct Evaluation
   - Preliminary prediction of equipment reliability (E6)
   - Conceptual design review(s) (E7)
4. Are Goals and Requirements Met?
   - Compare goals to predicted reliability values
   - If goals are not met, continue to Step 5
   - If goals are met, move to design phase of life cycle
5. Identify Problems and Root Causes
   - Perform sensitivity analyses using reliability model (E8)

Design
Step 1. Establish Goals and Requirements. The reliability goals
established in the concept and feasibility phase of the life cycle are modified and become
reliability requirements in the design phase. Requirements need to be well-defined so that they
are understandable by design engineers and manufacturers. Requirements should be broad in
nature and be both qualitative (e.g., definition of responsibilities and program requirements) and
quantitative (e.g., mean time between failures and uptime).
[Life cycle phase indicator: Concept/Feasibility, Design (current), Prototype (α-site), Phase Out]

System level requirements are allocated to major subsystems and components.

Once reliability requirements have been established, the reliability program plan is updated to
reflect these requirements.
Step 2. Reliability Engineering and Improvements.
Design-for-reliability practices are applied at this step in the improvement process. Application
of design-for-reliability practices creates a proactive environment for the design team. Some of
the more basic practices include:
• Simplicity. Simplification of equipment configuration is one of the basic
principles of designing-for-reliability. Added parts or features increase the
number of failure modes. A common practice in simplification is referred to as
component integration, which is the use of a single component to perform
multiple functions.
• Proven Components. To the extent possible, designers should use components
that have been shown to work in similar applications. Using proven components
can minimize analyses and testing to demonstrate reliability of equipment.
• Derating. Derating is the practice of using components or materials at environmental
conditions or loads that are less severe than their limiting condition.
Under these conditions, the component or material is expected to be more reliable.
• Redundancy. Another reliability improvement practice is to include more than
one method for accomplishing a function by having certain components or
subassemblies in parallel, rather than in series. Beyond a certain point,
redundancy may be the only cost-effective way to design reliable equipment.
• Failure Detection. Reliability of equipment can be improved by incorporating
failure detection methods such as automatic sensing and switching devices.
• Ergonomics or Human Factors Engineering. The equipment design must
consider human factors aspects such as the person-machine interface, human
reliability, and maintainability.
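The effect of the simplicity and redundancy practices above can be illustrated with the usual series and parallel reliability formulas. The sketch assumes independent failures and active redundancy (all parallel units operating); the reliability value of 0.90 is hypothetical.

```python
def series(*reliabilities):
    """All elements must work: reliabilities multiply."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def parallel(*reliabilities):
    """At least one element must work: unreliabilities multiply."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= (1.0 - r)
    return 1.0 - all_fail


pump = 0.90  # hypothetical reliability of one pump over the mission time
print(f"Two pumps in series:   {series(pump, pump):.2f}")
print(f"Two pumps in parallel: {parallel(pump, pump):.2f}")
```

Adding a part in series lowers reliability, while adding the same part in parallel raises it, at the cost of the extra component.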
The functional block diagram is updated as the design develops. The gross reliability model,
which consists of major subsystems, is expanded. Each subsystem is broken into more detail.
For example, a wafer handler subsystem could be categorized into software, electronics, arm, and
casing components. The reliability allocated to a subsystem is further allocated to the component
level. As was the case in the concept and feasibility phase, this allocation is based on limited
information available during the early phases of the life cycle; it is used as a guide when
developing the various designs. As the design progresses, the allocation becomes finalized.
If an FMEA was not developed in the concept and feasibility phase of the life cycle, initiate it in
this phase.
As was the case in the concept and feasibility phase, equipment in the design phase has not yet
been built, so actual component failure data may not be available. Here again, historical data can
be used for those components that are similar to previous generations of equipment. Use
standard handbooks (such as MIL-HDBK-217[1] or NPRD-91 Handbook[2]), or expert opinion
to obtain data for those components where no historical data is available.

If a critical component is used for the first time and the life data is not available, run a simulated
life test to generate the life data under the expected use conditions.
Step 3. Conduct Evaluation. Use the subsystem and component failure data, and the updated
reliability prediction model, to evaluate the reliability of the current equipment design. As was
the case in the concept and feasibility phase, evaluate the following:
• Data sources and their validity
• Predicted versus the anticipated reliability value
• Historical and expert opinion data used in determining equipment reliability
• Reliability prediction model
Conduct design review(s) of the design(s) that will be carried to the prototype phase at this time.
These reviews are often broken down into:
• Requirements Review - review the equipment’s design requirements
• Preliminary Design Review - evaluate the preliminary design against requirements
• Critical Design Review - provide design to the customer(s) for review
Step 4. Are Goals and Requirements Met? Compare the reliability requirements and the
predicted reliability values. If requirements are not met, continue to Step 5 where problems and
root causes are identified. If requirements are met, approval is given to move to the prototype
phase of the life cycle.
Step 5. Identify Problems and Root Causes. If requirements are not met, sensitivity analyses
can be conducted to direct attention to those subsystems and components that have the greatest
impact on the equipment reliability. Evaluate the FMEA that was developed in Step 2 to
determine potential failure modes of the subsystems and components.
The process now returns to Step 2, where reliability improvement activities are initiated.
Steps 2 through 5 are repeated until requirements are met. Approval can then be given
to move to the prototype phase of the life cycle.
At the end of the design phase, the following objectives have been met:
• The core architecture of the equipment design has been finalized
• Design(s) have been chosen for prototype

Table 3-3 summarizes the activities associated with applying each step of the reliability
improvement process to the design phase.
Table 3-3. Reliability Improvement Process Activities for the Design Phase

Reliability Improvement Process Step / Activities

1. Establish Goals and Requirements
   - Modify goals to match customer requirements (E1)
   - Update reliability program plan (E2)
2. Reliability Engineering and Improvements
   - Apply design-for-reliability practices (E9)
   - Update functional block diagram (E3)
   - Expand reliability model to include more detailed subsystems (E4)
   - Allocate subsystem requirements to subsystem components (E5)
   - Collect failure data for components within subsystems (D1)
   - Evaluate reliability of purchased components (E11)
   - Run life test on new and critical components (AT18)
   - Update Life Cycle Cost (AT19)
   - Perform ergonomics and human factors studies (E12)
   - Conduct software reliability studies (E13)
   - Implement FMEA (E16)
3. Conduct Evaluation
   - Predict equipment reliability (E6)
   - Conduct design reviews (E7)
4. Are Goals and Requirements Met?
   - Compare reliability requirements to predicted values
   - If requirements are not met, continue to Step 5
   - If requirements are met, move to prototype phase of life cycle
5. Identify Problems and Root Causes
   - Perform sensitivity analyses (E8)
   - Evaluate FMEA (E14)

Prototype
Step 1. Establish Goals and Requirements. At this point in the life
cycle, requirements have been established and little remains to be done other than to upgrade
these as the design moves toward completion and prototypes are built. Modeling, as well as
failure data analyses, can be used to appraise current equipment reliability levels and evaluate
what levels are achievable.

[Life cycle phase indicator: Concept/Feasibility, Design, Prototype (α-site) (current), Phase Out]
As was the case in the previous two phases, the reliability program plan is updated.
Step 2. Reliability Engineering and Improvements. The
functional block diagram is again updated in the prototype phase to reflect any design changes.
Subsystems and components having the greatest impact on equipment reliability are further
expanded in the reliability prediction model. If reliability requirements were revised in Step 1,
re-allocation to major subsystems and components may be necessary. For those subsystems and
components that are modeled in more detail, reliability allocations need to be made to lower
levels. If more than one prototype is built, a reliability model for each prototype design may be
needed.
Conduct a test to generate subsystem and system level reliability data for each of the prototypes.
Aspects of the test program that are considered include:
• Test objectives
• Test parameters
• Test sample size
• Test duration
• Test environments
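Sample size and test duration are linked. As a rough planning aid, the sketch below computes the total failure-free test time needed to demonstrate an MTBF at a given confidence level. It assumes exponentially distributed times to failure (constant failure rate) and a zero-failure test plan, which are assumptions of this example and not requirements of the guideline. The 500-hour target, 80% confidence level, and three prototypes are hypothetical.

```python
import math


def zero_failure_test_hours(mtbf_to_demonstrate, confidence):
    """Total test hours, with no failures allowed, that demonstrate the
    given MTBF at the given confidence under an exponential assumption."""
    return -mtbf_to_demonstrate * math.log(1.0 - confidence)


total_hours = zero_failure_test_hours(500.0, 0.80)
units_on_test = 3  # hypothetical number of prototypes available
hours_per_unit = total_hours / units_on_test
print(f"Total test time: {total_hours:.0f} hours")
print(f"Per unit ({units_on_test} units): {hours_per_unit:.0f} hours")
```

Under the exponential assumption only the total accumulated test time matters, so adding units shortens the calendar time of the test.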
Component tests are useful for identifying basic weaknesses in critical components, whereas
system tests are useful in exploring the effects of component interactions. Results from
component tests alone should not be used for predicting system reliability performance, since
component tests rarely duplicate system interactions.
A failure reporting and corrective action system (FRACAS) can be initiated to record failure data
gathered during the testing program. The FRACAS is a closed-loop reporting system that is
useful in:
• Identifying failures and establishing a historical data base
• Analyzing failures to determine the cause
• Documenting the corrective action required to minimize reoccurrence of the
failures
Maximum benefits from a FRACAS are realized when it is implemented early in a test program
and is directly coupled to the modeling effort. Failures identified during in-house testing (e.g.,
prototype tests) are easier to analyze than failures in the field. Furthermore, it is more cost
effective to identify and correct failures earlier in the life cycle.
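The closed-loop nature of a FRACAS can be illustrated with a minimal record structure in which a report stays open until its root cause, corrective action, and verification are all recorded. The field names and sample entries below are hypothetical and are not prescribed by this guideline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureReport:
    """One FRACAS entry. Field names are hypothetical."""
    report_id: int
    subsystem: str
    failure_description: str
    root_cause: Optional[str] = None
    corrective_action: Optional[str] = None
    action_verified: bool = False

    def is_closed(self) -> bool:
        # The loop closes only when the cause is known and the corrective
        # action has been documented and verified.
        return (
            self.root_cause is not None
            and self.corrective_action is not None
            and self.action_verified
        )


reports = [
    FailureReport(1, "wafer handler", "arm stalls during transfer",
                  root_cause="worn belt",
                  corrective_action="change belt material",
                  action_verified=True),
    FailureReport(2, "controller", "intermittent reset"),
]
open_report_ids = [r.report_id for r in reports if not r.is_closed()]
print("Open reports:", open_report_ids)
```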

The actual failure modes uncovered during testing should be recorded in the FRACAS
and compared to the predicted failure modes established in the FMEA. Where differences occur,
the reasons should be identified.
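The comparison between predicted and observed failure modes amounts to two set differences, as the sketch below shows. The (subsystem, failure mode) pairs are hypothetical examples.

```python
# Failure modes predicted in the FMEA (hypothetical entries).
predicted_modes = {
    ("wafer handler", "arm misalignment"),
    ("vacuum system", "pump seal leak"),
    ("controller", "power supply failure"),
}
# Failure modes recorded in the FRACAS during testing (hypothetical entries).
observed_modes = {
    ("wafer handler", "arm misalignment"),
    ("controller", "software lockup"),
}

# Observed but not predicted: the FMEA missed these and should be updated.
unpredicted = observed_modes - predicted_modes
# Predicted but not yet observed: may need more test time or may be unlikely.
not_yet_observed = predicted_modes - observed_modes

print("Unpredicted:", sorted(unpredicted))
print("Not yet observed:", sorted(not_yet_observed))
```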
Step 3. Conduct Evaluation. Reliability of the various prototypes is evaluated
based on the test data.
Results of the prototype test are then presented for a design review prior to pilot production.
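A simple way to evaluate each prototype from test data is an MTBF point estimate: total operating hours divided by the number of failures. This is only a sketch; it assumes a constant failure rate and gives no confidence bounds. The prototype names, hours, and failure counts are hypothetical.

```python
def observed_mtbf(operating_hours, failure_count):
    """Point estimate: total operating hours divided by number of failures.
    Returns None when no failures were observed (estimate undefined)."""
    if failure_count == 0:
        return None
    return sum(operating_hours) / failure_count


# Hypothetical test results: hours accumulated per test run, failure count.
prototype_tests = {
    "prototype A": ([400.0, 350.0, 450.0], 3),
    "prototype B": ([600.0, 600.0], 1),
}
estimates = {
    name: observed_mtbf(hours, failures)
    for name, (hours, failures) in prototype_tests.items()
}
for name, mtbf in estimates.items():
    print(f"{name}: observed MTBF {mtbf:.0f} hours")
```

With so few failures the estimates are highly uncertain, so they are best read together with the test plan and the design review.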
Step 4. Are Goals and Requirements Met? Compare the
results of the testing of the prototype(s) to the requirements to see if they have been met. If the
requirements are not met, move to Step 5, where problems and root causes are identified. If
requirements are met, then a design review is performed, including a management go/no go
decision to continue to the pilot production phase of the life cycle.
Step 5. Identify Problems and Root Causes. A sensitivity
analysis is conducted to direct attention to those subsystems and components that have the
greatest impact on the equipment reliability. Root causes of the failures recorded in the
FRACAS are identified and corrective actions implemented. A more detailed failure analysis
might also be performed on those subsystems and components that are failing at a significantly
higher rate than previously anticipated.
The process now returns to Step 2, where improvement activities are initiated. If a FRACAS was
initiated, it might identify corrective actions that could be implemented to eliminate failures.
Other possibilities include:
• Derating
• Procedural changes
• Process changes
A preventive maintenance (PM) program can be developed for subsystems and components that
degrade equipment performance. Partnerships established with suppliers are continually
nurtured and purchased subsystems and components are continually evaluated. Human
capabilities and limitations are considered and changes are made to the equipment to eliminate
failures due to human errors. The software reliability program is continued. For critical
subsystems and components, the optimal operating range is found and the impact of the optimal
range on other components is evaluated.
Steps 2 through 5 are repeated until requirements are met. Approval can then be given to move
to the pilot production phase of the life cycle.
At the end of the prototype phase, the following objectives have been met:
• The prototype(s) has been tested and evaluated to determine its capability of
achieving the requirements. This includes redesigning and re-evaluating until a
go/no go decision is reached
• The core subsystem and component designs are finalized.
Table 3-4 summarizes the activities associated with applying the reliability improvement process
to the prototype phase.

Table 3-4. Reliability Improvement Process Activities for the Prototype Phase

Reliability Improvement Process Step / Activities

1. Establish Goals and Requirements
   - Update reliability requirements (E1)
   - Update reliability program plan (E2)
2. Reliability Engineering and Improvements
   - Update functional block diagram (E3)
   - Expand reliability model, as needed (E4)
   - Re-allocate subsystem and component reliability requirements (E5)
   - Establish test plan (T1)
   - Conduct prototype test (T2)
   - Establish FRACAS (E17)
   - Perform human reliability analysis (D2)
   - Develop preventive maintenance program (E10)
   - Continue to evaluate the reliability of purchased components (E11)
   - Perform ergonomics studies (E12)
3. Conduct Evaluation
   - Evaluate prototype reliability (T2)
   - Conduct design review(s) (E7)
4. Are Goals and Requirements Met?
   - If requirements are not met, continue to Step 5
   - If requirements are met, move to pilot production phase of life cycle
5. Identify Problems and Root Causes
   - Evaluate FRACAS to identify problems and root causes (E17)
   - Evaluate FMEA to identify potential failure modes (E14)
   - Perform failure analyses on critical components (E16)

Pilot Production
Step 1. Establish Goals and Requirements. During the pilot
production phase, upgrades are made to goals and requirements, as appropriate, and the reliability
program plan is updated to reflect these, as well as other, changes. Modeling and failure data
analyses are used to assess current and potential levels of equipment performance.
[Life cycle phase indicator: Concept/Feasibility, Design, Prototype (α-site), Pilot Production (β-site) (current), Phase Out]

Step 2. Reliability Engineering and Improvements.
Functional block diagrams and the reliability model are once again updated to reflect any changes
that occurred during the prototype phase. If a FRACAS was not implemented during the
prototype phase, then it should be done at this time.
The test program is evaluated and updated as needed. Any aspects of the test program that are
not clearly defined during the prototype phase should be established here. Additional tests that
should be implemented at this time are:
• Burn-in tests
• Reliability qualification tests (RQT)
Burn-in tests are useful in identifying weak components or subsystems prior to field use.
An RQT is useful in initial customer applications of the equipment to evaluate equipment
performance in actual operating environments. The RQT is also useful in verifying compliance
with contractual objectives: equipment is tested according to a predetermined plan,
under specified environmental conditions and pass/fail criteria, prior to a full-scale production
decision[3]. Testing equipment in an environment that represents usage throughout its service
life allows for establishing reasonable correlations between test results and actual field
experience.
The manufacturing processes should be qualified at this time to avoid the manufacturing
problems identified during the pilot production. Qualifying manufacturing processes before
full-scale production reduces manufacturing costs and prevents equipment performance
degradation[4]. Qualifying manufacturing processes includes:
• Performing a process capability study
• Establishing process control
• Monitoring the defect level
• Reducing the defect level
• Periodically assessing and controlling the processes[5]
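A process capability study is commonly summarized with the Cp and Cpk indices. The sketch below computes both from a sample of measurements; it assumes the process is stable and approximately normally distributed. The measurements and specification limits are hypothetical.

```python
import statistics


def process_capability(measurements, lower_spec, upper_spec):
    """Return (Cp, Cpk) using the sample mean and sample standard deviation."""
    mean = statistics.mean(measurements)
    sigma = statistics.stdev(measurements)
    cp = (upper_spec - lower_spec) / (6.0 * sigma)
    cpk = min(upper_spec - mean, mean - lower_spec) / (3.0 * sigma)
    return cp, cpk


# Hypothetical measurements of a machined dimension, in millimeters.
measurements = [9.9, 10.0, 10.1, 10.0, 10.0]
cp, cpk = process_capability(measurements, lower_spec=9.7, upper_spec=10.3)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

Cp compares the specification width to the process spread, while Cpk also penalizes a process that is off center, so Cpk never exceeds Cp.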
Both new and existing manufacturing processes should be requalified periodically to ensure
requirements are maintained. Personnel involved in the manufacturing process should be
properly trained before introduction of the equipment.
Step 3. Conduct Evaluation. The pilot production phase of the life cycle is
generally the first time equipment is evaluated in a customer environment. Thus, in addition to
reliability modeling and prototype testing, engineers should work closely with customer service and
field service personnel to evaluate initial customer applications of the equipment and its
performance in actual operating environments. A reliability qualification test (RQT) is
performed to verify compliance with contractual objectives.
Problems and failures occurring during testing should be carefully analyzed, and
recommendations for corrective action should be issued as part of the FRACAS. Failure modes
identified in the FMEA are compared to reported failures during testing. Differences that occur
should be analyzed.
Definitions of failures should be issued, and pass-fail criteria should be established. Failures
generally fall into four categories[5]:

1. Catastrophic/Hard failures - failures that are permanent. For equipment, these
failures reflect an irreversible physical change. These failures are easily identified
and replicated.
2. Marginal failures - failures that are due to dirty critical components or degraded
component performance. The equipment is operational, but the output is not within
the acceptable limits.
3. Intermittent failures - failures that only occur due to unstable equipment or
varying software conditions. Intermittent failures occur randomly and are difficult
to replicate.
4. Soft failures - failures that result from temporary environmental conditions. Like
intermittent failures, soft failures occur randomly and are difficult to replicate.
The pilot production phase provides the last opportunity to make design changes and other
improvements before full-scale production begins.
Step 4. Are Goals and Requirements Met? Results of field
testing are compared to requirements to determine if they are met. If requirements are not met,
the process moves to Step 5 where problems and root causes are identified. If requirements are
met, a design review is conducted, and a go/no go decision to continue to the production and
operation phase of the life cycle is made.
Step 5. Identify Problems and Root Causes. Sensitivity analyses,
as well as feedback from a FRACAS and FMEA, are used to direct attention to problem areas
and root causes. Techniques such as Pareto analysis help focus effort on the major
problems first and on lower-level problems later.
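A Pareto analysis of reported failures can be sketched as a ranked count with a running cumulative percentage. The failure log below is hypothetical; in practice the entries would come from the FRACAS.

```python
from collections import Counter

# Hypothetical failure log: one entry per reported failure, by cause.
failure_log = [
    "robot arm", "rf generator", "robot arm", "vacuum pump", "robot arm",
    "rf generator", "software", "robot arm", "rf generator", "robot arm",
]


def pareto(failures):
    """Return (cause, count, cumulative percent) rows, largest count first."""
    counts = Counter(failures).most_common()
    total = sum(count for _, count in counts)
    rows = []
    cumulative = 0
    for cause, count in counts:
        cumulative += count
        rows.append((cause, count, 100.0 * cumulative / total))
    return rows


pareto_rows = pareto(failure_log)
for cause, count, cumulative_pct in pareto_rows:
    print(f"{cause:<14}{count:>3}{cumulative_pct:>8.0f}%")
```

In this example the first two causes account for most of the reported failures, so they would be addressed first.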
The process now returns to Step 2, where improvement activities and corrective actions are
initiated. Steps 2 through 5 are repeated until requirements are met. Approval can then be given
to move to the production and operation phase of the life cycle.
At the end of the pilot production phase of the life cycle, the following objectives have been met:
• Capability of the pilot production design is tested and evaluated to determine if
the design can achieve the end use requirements in the customer’s operating
environment.
• The equipment design for full-scale production and deployment is finalized.
Table 3-5 summarizes the activities associated with applying the reliability improvement process
to the pilot production phase of the life cycle.

Table 3-5. Reliability Improvement Process Activities for the Pilot Production Phase

Reliability Improvement Process Step / Activities

1. Establish Goals and Requirements
   - Update reliability requirements, as needed (E1)
   - Update reliability program plan (E2)
2. Reliability Engineering and Improvements
   - Update functional block diagram, if needed (E3)
   - Update reliability model, if needed (E4)
   - Re-allocate reliability requirements, as needed (E5)
   - Upgrade testing program, as needed (T1)
   - Implement FRACAS, if not already done (E17)
   - Perform human reliability analyses (D2)
   - Perform software reliability studies (E13)
   - Perform ergonomic studies (E12)
   - Update preventive maintenance program, as needed (E10)
   - Continue to evaluate reliability of purchased components (E11)
3. Conduct Evaluation
   - Conduct tests of equipment (T2)
   - Evaluate equipment reliability (E6)
   - Conduct design review(s) (E7)
4. Are Goals and Requirements Met?
   - Compare reliability requirements to observed values
   - If requirements are met, move to production and operation phase of life cycle
5. Identify Problems and Root Causes
   - Evaluate FRACAS (E17)

Production and Operation
Step 1. Establish Goals and Requirements. Final updates to reliability requirements and the
reliability program plan are made at this point. All major reliability problems should have been
identified and corrected prior to full-scale production and deployment of the equipment.
[Life cycle phase indicator: Concept/Feasibility, Design, Prototype (α-site), Production/Operation (current), Phase Out]

Step 2. Reliability Engineering and Improvements. Functional block diagrams and the
reliability model are updated to reflect any design changes that occurred during the pilot
production phase. The FRACAS data base is updated to reflect failure modes uncovered during
pilot production testing. The observed failures are also used to update the reliability model.
A field tracking and customer feedback program is initiated to record operation and maintenance
problems in the field. This information should account for uncertainty due to variations in site,
equipment vintage, and customer procedures.
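One way to make site-to-site variation visible is to compute an observed MTBF for each site from the field tracking records, as sketched below. The record layout, site names, and values are hypothetical; the same grouping could be applied to equipment vintage or customer procedure.

```python
from collections import defaultdict

# Hypothetical field records: (site, operating hours, number of failures).
field_records = [
    ("site A", 1200.0, 2),
    ("site A", 800.0, 2),
    ("site B", 1500.0, 1),
]


def mtbf_by_site(records):
    """Observed MTBF per site; sites with no failures are omitted."""
    hours = defaultdict(float)
    failures = defaultdict(int)
    for site, operating_hours, failure_count in records:
        hours[site] += operating_hours
        failures[site] += failure_count
    return {
        site: hours[site] / failures[site]
        for site in hours
        if failures[site] > 0
    }


site_mtbf = mtbf_by_site(field_records)
for site, mtbf in site_mtbf.items():
    print(f"{site}: observed MTBF {mtbf:.0f} hours")
```

A large spread between sites suggests that differences in environment, vintage, or procedure should be investigated before the data are pooled.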
Step 3. Conduct Evaluation. Evaluation of the equipment’s performance at this
point consists primarily of feedback from maintenance records. However, the effect of
pending corrective actions should be taken into account when predicting the equipment’s future
performance.
Step 4. Are Goals and Requirements Met? Here again, if
requirements are not being met, then problems and root causes are identified in Step 5. If
requirements are being met, then it is important to continually monitor equipment performance
and to implement a process of continuous improvement until decisions are made to phase out the
current generation of equipment and begin development of the next generation.
Step 5. Identify Problems and Root Causes. Failures and
problems reported during full-scale production and deployment in the field are fed through the
FRACAS to verify the failure(s) and to identify root causes and corrective actions. Pareto
analyses can be used to prioritize problems.
The process now returns to Step 2, where improvements and corrective actions are implemented.
Steps 2 through 5 are repeated until requirements are met.
At the end of the equipment’s production and operation phase, the following objectives have been
met:
• The equipment is manufactured in a manner that uniformly meets the customer
and supplier requirements.
• Continuous improvement goals and requirements are established and
demonstrated.
Table 3-6 summarizes the activities associated with applying the reliability improvement process
to the production and operation phase of the life cycle.

Table 3-6. Reliability Improvement Process Activities for the Production and Operation Phase

Reliability Improvement Process Step / Activities

1. Establish Goals and Requirements
   - Final update of reliability requirements, if needed (E1)
   - Final update of reliability program plan (E2)
2. Reliability Engineering and Improvements
   - Update FRACAS data base (E17)
   - Implement field tracking, customer feedback (D1), and corrective action program
   - Update human reliability analyses (D2)
   - Update software reliability studies (E13)
   - Update ergonomic studies (E12)
   - Update preventive maintenance program, as needed (E10)
   - Continue to evaluate reliability of purchased components (E11)
   - Update Life Cycle Cost, if required (AT19)
3. Conduct Evaluation
   - Assess equipment reliability based on the field data (E6)
   - Evaluate feedback from field tracking and maintenance records (D1)
4. Are Goals and Requirements Met?
   - Compare requirements to observed values
   - If requirements are met:
     * Continually monitor equipment performance
     * Implement process of continuous improvement
     * Revise goals and requirements, as appropriate (E1)
     * Eventually phase out current generation equipment
5. Identify Problems and Root Causes
   - Perform failure analyses on field failures (E16)

Phase Out
Step 1. Establish Goals and Requirements. At this point in the life
cycle, there are no goals or requirements to establish. A general goal would be to set
requirements for subsystems and components to be carried over to the next generation of
equipment. Also, it is important to have documented and retained all the information gained
during the life cycle phases of the current generation of equipment so that similar mistakes will
not be repeated.

[Life cycle phase indicator: Concept/Feasibility, Design, Prototype (α-site), Phase Out (current)]
Step 2. Reliability Engineering and Improvements. There
are no reliability engineering or reliability improvements to be made at this point. Phase-out
alternatives should be offered to customers of current generation equipment. Possible
alternatives might include:
• Training and spare parts availability for current generation equipment
• Trade-ins on new generation equipment (customer discounts)
• Inventory of current generation equipment could be phased out in stages such as:
    • Stage 1 - where spare parts requirements are maintained
    • Stage 2 - where spare parts are sold to customers who still want them (last chance)
    • Stage 3 - where remaining spare parts are scrapped
Step 3. Conduct Evaluation. At this point, there is nothing to evaluate except the
past performance of the generation of equipment being phased out. The failure rate database of
the subsystems and components is carried over to the next generation of equipment for
future reliability modeling.
Step 4. Are Goals and Requirements Met? Since no goals or requirements have been
established, there are none to compare.
Step 5. Identify Problems and Root Causes. As previously mentioned, it is important to retain
all information on the performance of the equipment being phased out so that the information can
be used to improve future generations of similar or new equipment.
At the end of the phase-out phase of the life cycle, the following objectives have been met:
• The discontinuation of production and field support is planned and implemented
in a manner that satisfies both the customer and supplier needs.
• Subsystems and components carried over to the next generation of equipment are
evaluated for information that will cause an improvement in the next generation.
• A failure rate database has been developed for subsystems and components for the
next generation of equipment.
Table 3-7 summarizes the activities involved in applying the reliability improvement process to
the phase out phase of the life cycle.

Table 3-7. Reliability Improvement Process Activities for the Phase-Out Phase
Reliability Improvement Process Step and Activities
1. Establish Goals and Requirements
   - Set requirements for subsystems and components to be carried to next generation of equipment
   - Document and retain all information gathered during generation of equipment being phased out
2. Reliability Engineering and Improvements
   - Offer phase-out alternatives to customers of equipment being phased out
   - Phase out current generation equipment in stages
3. Conduct Evaluation
   - Assess reliability of the current generation (E6) and carry information to the next generation of equipment
4. Are Goals and Requirements Met?
   - There are no goals or requirements to meet
5. Identify Problems and Root Causes
   - Retain all information on equipment being phased out so that it can be used in future generations of equipment
3.4 Specific Applications of the Reliability Improvement Process

When applying the reliability improvement process for the first time to equipment in some
advanced phase (other than concept and feasibility) of the life cycle, the activities will vary from
those discussed earlier. This is because the activities that would have been performed in the
previous life cycle phase(s) have not been performed and must, to some extent, be made up.
The discussion in the following paragraphs is based on starting the reliability improvement
process in some phase of the life cycle other than the concept and feasibility phase, and then
continuously applying it throughout the remainder of the phases. For example, if the reliability
improvement process is being applied for the first time to equipment that is already in the
prototype phase of its life cycle, then activities associated with each step of the process for that
phase and all subsequent phases (pilot production, production and operation, and phase out) are
considered. The activities associated with applying the reliability improvement process to phases
beyond the phase in which the process is being initiated are, however, basically the same as those
discussed earlier. Furthermore, this discussion is similar to the earlier discussions that involved
the application of the improvement process. Therefore, not every process step in every
life cycle phase is discussed in detail. Only the differences are highlighted.
3.4.1 Starting with Equipment in the Design Phase
When equipment has reached the design phase, the basic concept has already been established
and is fixed in the minds of the design engineers. It is more difficult to incorporate customer
needs into the design in this phase than in the concept and feasibility phase. However, it is not
too late, and it is clearly important, to incorporate customer needs and requirements when
establishing reliability goals.
If a reliability program plan has not been initiated, do so at this time.

If the functional block diagrams and the corresponding reliability model were not initiated in the
concept and feasibility phase, develop them now. Equipment reliability requirements are then
allocated to individual major subsystems in the model. Failure data are collected for use in the
reliability model.
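A minimal sketch of such a reliability model follows, assuming that the major subsystems act in series (any subsystem failure stops the equipment) and that each has a constant failure rate. The subsystem names, failure rates, and the 500-hour requirement are hypothetical.

```python
import math

# Hypothetical subsystem failure rates (failures per hour), e.g. from collected failure data.
subsystem_failure_rates = {
    "wafer_handler": 0.0008,
    "process_chamber": 0.0005,
    "controller": 0.0002,
}

def series_mtbf(failure_rates):
    """MTBF of a series system with constant failure rates: 1 / sum of rates."""
    return 1.0 / sum(failure_rates.values())

def series_reliability(failure_rates, hours):
    """Probability that a series system runs `hours` without failure."""
    return math.exp(-sum(failure_rates.values()) * hours)

required_mtbf = 500.0  # hypothetical equipment-level requirement, hours
predicted_mtbf = series_mtbf(subsystem_failure_rates)
meets_requirement = predicted_mtbf >= required_mtbf
```

Real equipment may include redundancy or non-constant failure rates, in which case the series assumption would need to be replaced by a more detailed block diagram model.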
Table 3-8 summarizes the activities associated with applying the reliability improvement process
to equipment that is in the design phase. The activities listed in Table 3-8 are similar to those
listed in Table 3-3; the difference is in the activities listed under Steps 1 and 2.
Other activities associated with applying the reliability improvement process to the remainder of
the process steps and life cycle phases are identical to those discussed earlier and are listed in
Tables 3-4 through 3-7. Therefore, they are not listed again here.
Table 3-8. Design Phase Reliability Improvement Process Activities
Reliability Improvement Process Step and Activities
1. Establish Goals and Requirements
   - Establish reliability goals and requirements (E1)
   - Establish reliability program plan (E2)
2. Reliability Engineering and Improvements
   - Apply design-for-reliability practices (E9)
   - Develop functional block diagram (E3)
   - Develop reliability model (E4)
   - Allocate requirements to subsystems and components (E5)
   - Collect failure data for subsystems and components (D1)
   - Implement FMEA (E16)
   - Develop Life Cycle Cost (AT19)
3. Conduct Evaluation
   - Predict equipment reliability (E6)
   - Conduct design reviews (E7)
4. Are Goals and Requirements Met?
   - If requirements are met, move to prototype phase of life cycle
5. Identify Problems and Root Causes
   - Perform sensitivity analyses (E8)

3.4.2 Starting with Equipment in the Prototype Phase

For equipment already in the prototype phase of the life cycle, the design is fixed. There is little
opportunity to make major design changes due to cost and time constraints. However, it is still
important to set goals and to understand and establish customer requirements. Furthermore,
available failure data can be used to assess the current performance of the equipment for
establishing upgrades to goals and requirements.
If a reliability program plan has not been developed, create one that identifies and ties together
all of the reliability improvement process activities that will be performed during the prototype
phase and subsequent phases of the life cycle.
Develop the functional block diagrams and reliability models to better understand and predict the
reliability of equipment designs being prototyped. Update these models as the design changes,
but realize that the models may become more complex as the design evolves. Develop detailed
breakdowns of the subsystems that are significant contributors to system unreliability.
Allocate reliability requirements to the individual subsystems. The subsystem allocations are
then further divided into component allocations. The allocation process is used as a guide for
improving the reliability of the equipment components and subsystems.
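The top-down allocation described above can be sketched as follows, assuming constant failure rates so that an MTBF requirement converts to a failure-rate budget of 1/MTBF. The proportional weighting scheme, the names, and the numbers are hypothetical; other allocation methods are equally possible.

```python
def allocate(budget, weights):
    """Split a failure-rate budget among items in proportion to their weights."""
    total = sum(weights.values())
    return {name: budget * w / total for name, w in weights.items()}

# Hypothetical equipment requirement: MTBF of 400 hours -> failure-rate budget.
equipment_budget = 1.0 / 400.0

# Hypothetical relative weights (for example, complexity or historical failure share).
subsystem_weights = {"wafer_handler": 5, "process_chamber": 3, "controller": 2}
subsystem_alloc = allocate(equipment_budget, subsystem_weights)

# The wafer-handler allocation is divided again among its components.
handler_weights = {"motor": 1, "gripper": 3}
component_alloc = allocate(subsystem_alloc["wafer_handler"], handler_weights)
```

Because each level sums back to its parent budget, a subsystem or component that exceeds its allocation is immediately visible as a candidate for improvement.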
Table 3-9 summarizes the activities associated with applying the reliability improvement process
to equipment that is in the prototype phase. The activities associated with applying the reliability
improvement process to the remainder of the life cycle phases are identical to those discussed
earlier and listed in Tables 3-5 through 3-7. Therefore, details are not listed here.

Table 3-9. Prototype Phase Reliability Improvement Process Activities
Reliability Improvement Process Step and Activities
1. Establish Goals and Requirements
   - Establish reliability program plan (E2)
2. Reliability Engineering and Improvements
   - Create functional block diagram (E3)
   - Create reliability model (E4)
   - Allocate reliability requirements to subsystems and components (E5)
   - Establish test plan (T1)
   - Establish data collection program (D1)
   - Establish FMEA (E16)
   - Perform human reliability analysis (D2)
   - Develop preventive maintenance program (E10)
   - Continue to evaluate the reliability of purchased components (E11)
   - Develop Life Cycle Cost (AT19)
3. Conduct Evaluation
   - Test prototype(s) (T2)
   - Evaluate prototype reliability (E6)
   - Conduct design review(s) (E7)
4. Are Goals and Requirements Met?
   - If requirements are met, move to pilot production phase of life cycle
5. Identify Problems and Root Causes
   - Evaluate FRACAS (E17)
3.4.3 Starting with Equipment in the Pilot Production Phase
For equipment in the pilot production phase of the life cycle, the focus should be on appraising
the actual level of equipment reliability (from available data) and determining what levels are
desired and obtainable. This is still an important step in the context of customer
requirements.
A reliability program plan can still be created to identify and tie together all of the reliability
improvement process activities that will be performed during the pilot production phase and
subsequent phases of the equipment life cycle.

The majority of this effort should be directed at making needed design improvements once the
equipment is evaluated. It is not too late to incorporate some design-for-reliability practices.
The focus should be on reliability growth activities directed at the existing design.
A method for collecting, tracking, and storing reliability data should be established. A FRACAS
can be initiated and used to track reported failures during pilot production, and to identify
corrective actions necessary to eliminate these failures. It is still not too late to initiate an FMEA.
Ergonomic studies can be used very effectively at this point.
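To make the tracking idea concrete, the following sketch shows one possible shape for a FRACAS record and two simple queries: a ranking of subsystems by reported failures and a list of reports still awaiting corrective action. The field names and sample entries are hypothetical and are not a prescribed FRACAS format.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class FailureReport:
    """One reported failure; field names are hypothetical."""
    report_id: int
    subsystem: str
    root_cause: str = ""         # filled in once analysis is complete
    corrective_action: str = ""  # filled in once an action is identified

    @property
    def is_open(self):
        # A report stays open until a corrective action has been recorded.
        return not self.corrective_action

reports = [
    FailureReport(1, "wafer_handler", "worn gripper pad", "change pad material"),
    FailureReport(2, "wafer_handler", "worn gripper pad"),
    FailureReport(3, "controller"),
]

# Rank subsystems by number of reported failures (a simple Pareto ordering).
failures_by_subsystem = Counter(r.subsystem for r in reports).most_common()
open_reports = [r.report_id for r in reports if r.is_open]
```

Even a small structure like this keeps each failure tied to its cause and corrective action, which is the information later evaluation steps rely on.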
Table 3-10 summarizes the activities associated with applying the reliability improvement
process to equipment starting in the pilot production phase.
Table 3-10. Pilot Production Phase Reliability Improvement Process Activities When
Initiated in Pilot Production Phase
Reliability Improvement Process Step and Activities
1. Establish Goals and Requirements
   - Establish reliability program plan (E2)
2. Reliability Engineering and Improvements
   - Create functional block diagram (E3)
   - Create reliability model (E4)
   - Allocate reliability goals and requirements (E5)
   - Establish data collection and tracking system (D1)
   - Establish testing program (T1)
   - Establish FRACAS (E17)
   - Establish FMEA (E16)
   - Perform human reliability analyses (D2)
   - Perform ergonomic studies (E12)
   - Perform software reliability studies (E13)
   - Establish preventive maintenance program (E10)
   - Evaluate reliability of purchased components (E11)
3. Conduct Evaluation
   - Evaluate equipment reliability (E6)
   - Conduct tests of equipment (T2)
   - Conduct design review(s) (E7)
4. Are Goals and Requirements Met?
   - Compare goals and requirements to observed values
     - If requirements are not met, continue to Step 5
     - If requirements are met, move to production and operation phase
5. Identify Problems and Root Causes
   - Perform sensitivity analyses (E8)
   - Evaluate FRACAS (E17)
   - Evaluate FMEA (E14)
   - Perform failure analyses on critical components (E16)

3.4.4 Starting with Equipment in the Production and Operation Phase
For equipment in the production and operation phase of the life cycle, the design is fixed. There
is no opportunity to make major design changes at this time. Thus, the focus of Step 1 should be
on appraising the actual level of reliability of equipment in this phase, and evaluating the levels
that are desired and whether these levels are achievable. Upgrades to existing equipment can be
made based on failure data analyses.
Although rather late in the life cycle, creating a reliability program plan to track the activities to
be performed during this phase and the phase out period of the life cycle is still beneficial.
Efforts should focus on making needed improvements to the existing design and on reliability
growth activities since it is too late to design reliability into the system.
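The appraisal of actual versus desired reliability levels can be sketched as a simple comparison of observed field values against goals. The measures, goal values, and the "min"/"max" convention below are hypothetical choices made for illustration.

```python
def compare_to_goals(observed, goals):
    """Return the measures for which the observed value misses the goal.

    Each goal is (target, direction): "min" means the observed value must be
    at least the target, "max" means it must be at most the target.
    """
    shortfalls = {}
    for measure, (target, direction) in goals.items():
        value = observed[measure]
        ok = value >= target if direction == "min" else value <= target
        if not ok:
            shortfalls[measure] = (value, target)
    return shortfalls

# Hypothetical field results and goals, in hours.
observed = {"MTBF": 320.0, "MTTR": 3.5}
goals = {"MTBF": (400.0, "min"), "MTTR": (4.0, "max")}

shortfalls = compare_to_goals(observed, goals)
```

Any measure that appears in the result would be carried into the problem and root cause identification step.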
Table 3-11 summarizes the activities associated with applying the reliability improvement
process to equipment that is in the production and operation phase of the life cycle. The
activities associated with applying the improvement process to the phase–out phase of the life
cycle are identical to those discussed earlier and listed in Table 3-7 and, therefore, are not listed
here.

Table 3-11. Production and Operation Phase Reliability Improvement Process
Activities When Initiated in Production and Operation Phase
Reliability Improvement Process Step and Activities
1. Establish Goals and Requirements
   - Establish reliability goals and requirements (E1)
   - Establish reliability program plan (E2)
2. Reliability Engineering and Improvements
   - Develop functional block diagram (E3)
   - Create reliability model (E4)
   - Allocate goals and requirements (E5)
   - Establish FMEA (E14)
   - Implement field tracking and customer feedback program (D1)
   - Perform human reliability analyses (D2)
   - Perform software reliability studies (E13)
   - Establish preventive maintenance program (E10)
3. Conduct Evaluation
   - Assess equipment reliability using the field data (E6)
   - Evaluate feedback from field tracking and maintenance records (D1)
   - Use FRACAS to evaluate field failures (E17)
4. Are Goals and Requirements Met?
   - Compare goals and requirements to observed values
   - If requirements are met:
     * Continually monitor equipment performance
     * Implement process of continuous improvement
     * Eventually phase out current generation equipment
5. Identify Problems and Root Causes
   - Perform sensitivity analyses (E8)
   - Perform failure analyses (E16)
3.4.5 Starting with Equipment in the Phase-Out Phase
It is much too late to make any changes to the equipment during the phase-out phase. The goal in
this phase is limited to collecting the reliability data of the equipment in order to gain insight into
the next generation of equipment. This information can save tremendous amounts of time and
money in the concept and feasibility phase of the next generation.
There are no reliability engineering or reliability improvements to be made at this point. Phase-
out alternatives should be offered to customers of current generation equipment.
Table 3-12 summarizes the activities involved in applying the reliability improvement process to
equipment that is in the phase-out phase of the life cycle. This table is identical to Table 3-7.

Table 3-12. Phase-Out Phase Reliability Improvement Process Activities When
Initiated in Phase-Out Phase
Reliability Improvement Process Step and Activities
1. Establish Goals and Requirements
   - Set requirements for subsystems and components to be carried to next generation of equipment
   - Document and retain all information gathered during generation of equipment being phased out
2. Reliability Engineering and Improvements
   - Offer phase-out alternatives to customers of equipment being phased out
   - Phase out current generation equipment in stages
3. Conduct Evaluation
   - Create reliability model of subsystems and components carried to next generation equipment (E4)
4. Are Goals and Requirements Met?
   - There are no goals or requirements to meet
5. Identify Problems and Root Causes
   - Retain all information on equipment being phased out so that it can be used in future generations of equipment
3.5 Functional Responsibilities

The executive and technical reliability champions have responsibility for ownership of the
reliability improvement process. However, various groups are assigned responsibility for
implementing and maintaining the reliability improvement process activities during the life cycle
of a piece of equipment. The type of group that should be held accountable depends on the
particular life cycle phase and the activity being performed. Both managers and engineers are
given responsibility for activities.
Although a particular group has been assigned overall responsibility for an activity, other groups
may actually provide assistance or perform the activity. Because each company has a unique
management structure, the reliability champion’s responsibilities include choosing the
appropriate groups to assist, participate, and own each activity.
For companies that have a reliability engineering group, the following paragraphs present
recommended practices and organizational guidelines that will help make the reliability
improvement process activities successful.
Recommended practices for reliability engineers:
• The engineering group and designers (not the reliability engineers) are
accountable for the reliability of the design and the cost of poor reliability
• All designers are trained in basic reliability methods and tools by the reliability
group
• Reliability engineers are part of the design team
• Reliability engineers assist designers
• The reliability group is accountable for reliability planning, program development,
and assuring adherence to program policy

Organizational guidelines for reliability engineering group:
• The group reports to the development engineering manager, not to quality assurance
• The group reports to the systems engineering manager, not to field service
• Reliability engineer(s) report to the program manager of the equipment, along with
other members of the design team, not to operations
• The group exists as a separate peer group with engineering, not as part of sales
(Caution: this can lead to reliability engineers being accountable for reliability
and becoming isolated from the design team)
3.6 Where to Begin

One of the most difficult problems facing a company is where to begin. In an ideal environment,
a reliability program would evolve along with the formation of the company and the development
of its first product. A master plan for continuous reliability improvement would have been
established, and reliability activities would have been initiated as needed throughout the
equipment’s life cycle.
In a more typical situation, a company has an informal reliability effort. This effort may be
applied sporadically, based on the personal style and management priorities of the equipment
development manager. If the company’s equipment has poor reliability in the field, a major
engineering project may be initiated to fix specific reliability problems. Otherwise, the company
faces losing business to the competition.
The management team frequently does not recognize the need for or require development of a
core reliability program that ensures ongoing attention to reliability requirements for all
equipment. Even if management recognizes the need for the reliability process, they often find
themselves in a reactive mode with current equipment problems and limited resources. Often,
management may not be willing to wait for the benefits of a reliability program that is developed
at the same time as its next product.
Although each company’s situation is unique, there are some general guidelines that can be used
to determine where implementation of a reliability program would be most effective.
The first step involves assessing where in the life cycle each equipment line falls, and
determining its current reliability performance. The ultimate goal is to choose one equipment
line on which to focus reliability improvement activities. Obviously, the earlier in the equipment
life cycle reliability improvement activities are implemented, the greater the benefits.
It is likely that a supplier will be developing more than one equipment line at any given time.
For example, Figure 3-1 shows three equipment lines, each of which is in a different phase of
its life cycle.
• Equipment A is in full production
• Equipment B is in the design phase
• Equipment C is just beginning the concept and feasibility phase
Benefits can be gained by applying the reliability improvement process to any of these three
equipment lines. However, there are optimal situations to be aware of.

Figure 3-1. Multiple Equipment and Their Life Cycle Phase Status
[Figure: life cycle timelines for Equipment A, B, and C plotted against time, with a marker at
"Today". Each timeline runs through the Concept and Feasibility, Design, α-Site, β-Site,
Production and Operation, and Phase Out phases.]
Equipment C has the greatest potential for cost-effective improvements in reliability because it is
in the earliest phase of its life cycle. However, this does not mean that it is too late to improve
the reliability of Equipment A and B. Reliability improvements can and should be considered in
every phase of the life cycle. However, when starting a reliability improvement process, it is
generally advantageous to choose equipment that will show immediate successes. If sufficient
resources exist, address all equipment in all life cycle phases. Because it is unlikely that this is
the situation, the following priorities are recommended:
1. Equipment in the Production and Operation Phase. Although this is a reactive
strategy, it is the most customer oriented, and is capable of demonstrating quick
benefits. Another benefit of starting with equipment in this phase is that data on
the equipment in the field is available and can be used to determine current
reliability performance. If you are unable to determine your current situation, it
is difficult to set realistic goals and determine whether they have been met. It is
also important to assess the impact of upgrades to equipment in this phase using
the reliability model and existing failure data.

2. Equipment in the Design Phase. This is a proactive strategy and has the greatest
long-term benefits. In this phase, it is difficult to determine what the reliability
performance of the equipment will be unless the previous generation has a
database and a significant number of similar parts. If this information exists, it
can be used with modeling to evaluate potential performance of designs being
considered.
3. Equipment in the Prototype or Pilot Production Phase. Starting in these phases is a
reactive strategy whose benefits fall between those of the prior two. Some data is
available; therefore, the anticipated reliability performance of the
equipment in the field can be determined. The drawback with these phases is the
expense and time involved if major design changes are necessary.
4. Equipment in the Concept and Feasibility Phase. This is a proactive strategy and the
least expensive phase in which to act. Significant reliability improvements can be made to
equipment in this phase with minimal use of resources. However, as with the
design phase, the lack of data makes it difficult to determine reliability
performance.
5. In general, ignore equipment in or near Phase Out. Activities should be limited
to customer requests. However, if the product that is being phased out has future
generations that are significant to the company’s strategic plan, collecting data
and analyzing failures of the product will yield tremendous insight into
development of the next generation.
When making a choice, choose equipment that you know will have future generations. As
mentioned in Section 1.0, the cost of improving equipment reliability will decrease as it moves
from generation to generation.
Knowing the reliability performance of existing equipment is essential to evaluating current
equipment status and for setting reliability goals for current and future equipment. It is difficult to
set realistic and attainable performance goals without this knowledge. Table 3-13 illustrates the
type of reliability performance information that is available for the three equipment lines shown
in Figure 3-1.
Table 3-13. Current Product Line Status

Equipment                        A                         B                  C
Current Life Cycle Phase         Production and Operation  Design             Concept and Feasibility
Current Reliability Performance  Actual - MTBFp            Predicted - MTBFp  Goal - MTBFp
                                 Actual - MTTR             Predicted - MTTR   Goal - MTTR
Mean time between failures (MTBFp) and mean time to repair (MTTR) are the two measures of
reliability performance used in this illustration. SEMI Standard E10-90[6] provides several other
measures of reliability. Table 3-13 indicates that the MTBFp and MTTR values are known for
Equipment A. Actual data are not available for Equipment B and C because they are in early
stages of development. However, Equipment B has predicted values based on the design and
Equipment C has goals that it is targeted to meet.
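For equipment such as Equipment A, where field data exist, the two measures might be computed along the following lines. The formulas are simplified forms (productive hours divided by failures, and the average of repair durations); SEMI E10 remains the authoritative source for the definitions. The function names and data are hypothetical.

```python
def mtbf_p(productive_hours, failures):
    """Mean productive time between failures, in hours."""
    if failures == 0:
        return None  # no failures observed; a point estimate is undefined
    return productive_hours / failures

def mttr(repair_hours):
    """Mean time to repair: average of the individual repair durations."""
    if not repair_hours:
        return None
    return sum(repair_hours) / len(repair_hours)

# Hypothetical field data for Equipment A.
productive_hours = 1200.0
repair_durations = [2.0, 4.5, 2.5]   # one entry per failure, hours

actual_mtbf_p = mtbf_p(productive_hours, len(repair_durations))
actual_mttr = mttr(repair_durations)
```

With these inputs the sketch yields an MTBFp of 400 hours and an MTTR of 3 hours, the kind of "Actual" values shown for Equipment A.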
Reliability and design engineers determine current reliability performance by collecting and
analyzing data received from a number of sources, including
• Field service reports
• Customer feedback
• In-house testing
In situations where data is not available, but reliability performance needs to be determined,
preliminary engineering judgements, mathematical predictions, and consensus using the opinions
of experts can be used as a first cut at data values.
As discussed previously, one of the cornerstones of reliability improvement is the reliability data
reporting system. It is an organized means of gathering factual data about equipment
performance, both good and bad. Although useful data estimates can be determined during the
concept and feasibility phase as well as the design and development phases of the equipment life
cycle, the most meaningful data is collected during the production and operation phase, when the
equipment is operating in its intended environment. Nevertheless, information gathered in any
phase of the life cycle can be used to ensure that the reliability goals are attained with minimal
time and expense commitments.
Section 4.0 discusses in detail the activities associated with data collection and analysis. These
activities include determining:
• What data to collect
• How to use this data
• The most effective format to use when collecting data
• How to transform the data into failure rates
• How to get numerical values for human errors
It is important to note that an effective reliability improvement process includes a central
database that includes data collected for all equipment of the same model or type and accounts
for uncertainty due to variations in site, equipment vintage, and customer procedures.
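One way to picture such a central database is a set of per-site records for a single equipment model, from which both a pooled failure rate and the site-to-site spread can be read. The record layout and numbers below are hypothetical, and the min/max spread is only a crude stand-in for a proper uncertainty analysis.

```python
# Hypothetical records for one equipment model: (site, operating_hours, failures).
site_records = [
    ("site_1", 5000.0, 10),
    ("site_2", 5000.0, 5),
    ("site_3", 10000.0, 25),
]

def site_rates(records):
    """Failure rate (failures per hour) for each site."""
    return {site: n / hours for site, hours, n in records}

def pooled_rate(records):
    """Failure rate over all sites combined (total failures / total hours)."""
    total_hours = sum(hours for _, hours, _ in records)
    total_failures = sum(n for _, _, n in records)
    return total_failures / total_hours

rates = site_rates(site_records)
pooled = pooled_rate(site_records)
# The spread between sites gives a crude indication of site-to-site uncertainty.
spread = (min(rates.values()), max(rates.values()))
```

The same grouping could be repeated by equipment vintage or customer procedure to see which factor contributes most to the variation.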
3.7 Reliability Plans

The supplier should develop several reliability plans: a general company plan covering all
products, and a specific product plan for each individual equipment line.
The following six elements must be included in these plans:
1. Objectives
2. Constraints, limitations and requirements that exist at the time the plan is written
3. Basic assumptions made
4. Activities to be performed to meet objectives
5. Resources required to perform the planned activities
6. A schedule showing when the activities will be started and completed
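The six elements listed above can be treated as a checklist. The sketch below, with hypothetical field names and sample content, represents a plan as a simple record and reports which elements are still empty.

```python
from dataclasses import dataclass, fields

@dataclass
class ReliabilityPlan:
    """The six plan elements; field names are hypothetical."""
    objectives: list
    constraints: list
    assumptions: list
    activities: list
    resources: list
    schedule: dict  # activity -> (start, finish)

def missing_elements(plan):
    """Names of plan elements that have been left empty."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name)]

draft = ReliabilityPlan(
    objectives=["Reach MTBF goal for equipment line B"],
    constraints=["Two engineers available"],
    assumptions=[],
    activities=["FMEA", "Reliability model"],
    resources=["Reliability engineer", "Test stand"],
    schedule={},
)

incomplete = missing_elements(draft)
```

In this draft the assumptions and schedule are reported as missing, which flags the plan as not yet ready for review.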

General Company Plan

The overall reliability plan is tailored to the company and takes into account the company’s size
and available resources. The plan addresses the following issues:
• The company’s reliability policy
• Identification of reliability champions
• The overall strategy
• How reliability skills will be acquired within the company
• A description of organizational activities
Specific Product Plans
Each equipment line requires a reliability plan based on the life cycle phase of the equipment
line, reliability goals and requirements, schedule limitations, and resources available. The more
stringent the goals, the more activities, tools, and resources required to achieve the goals. Also,
the shorter the schedule, the more resources that must be applied over the scheduled period. The
plan will identify the specific reliability activities and tools that will be used for a specific
equipment line, and who (or which department) is responsible for performing them.
3.8 Application of Resources and Communicating Value

There are typically two difficult problems facing an organization at this point:
• Applying limited and already allocated resources to what appears to be a
monumental undertaking
• Communicating the value of the reliability improvement process to key decision
makers and participants
In an ideal environment, a master plan for continuous reliability improvement would have been
established and reliability activities would have been initiated as needed throughout the
equipment’s life cycle.
In a more typical situation, a company has an informal reliability effort. This effort may be
applied sporadically, based on the personal style and management priorities of the equipment
development manager. If the company’s equipment has poor reliability recorded in the field, a
major engineering project may be initiated to fix specific reliability problems. Otherwise, the
company faces losing business to the competition. The management team frequently does not
recognize the need for or require development of a core reliability plan that ensures ongoing
attention to reliability requirements for all equipment. Even if management recognizes the need
for the reliability process, they often find themselves in a reactive mode with current equipment
problems and limited resources. Often management may not be willing to wait for the benefits of
implementing a reliability improvement process that is developed at the same time as its next
product.
Ideally, once a piece of equipment has been selected for the reliability improvement process,
responsible individuals or groups would perform all the activities within the process steps. If
resources are limited, individuals or groups would perform selected activities. The choice of
activities depends on the company, and ultimately only the company’s people know what
resources can be successfully deployed and the best time frame for employing these activities.

However, the following items should be considered:

• Select activities that require various groups to work together on reliability
improvement. This extends ownership of the reliability mission and shows
success across multiple fronts.
• Initially choose activities that will give immediate benefits. Implementation of
the reliability improvement process requires a long-term sense of vision and
commitment. However, the engineer needs to "sell" management and participants
on the advantages of the activities. This generally requires some demonstration of
improvements almost immediately. If portions of an activity are already in place,
build on them.
• Specific reliability skills training should be taught to individuals as they become
directly involved and are ready to apply new skills to real issues.
• The vision of reliability for the equipment and the plan for how that reliability is
going to be met should be discussed early to orient everyone in the company to the
reliability effort.
The implementation of the reliability process, as described, occurs in a somewhat piecemeal
fashion. However, this approach offers an effective means of applying limited resources to real
and timely issues. When this approach is used, it is particularly important to have a technical
champion to manage the entire equipment reliability effort. This ensures that a coherent and
well-coordinated development effort occurs.
It is best to start small; start with one piece of equipment, implementing those activities that fit
best in your company. Attempting to implement the process for all equipment simultaneously
generally does not work. Once the reliability process for one piece of equipment is in place and
the next piece of equipment is targeted for reliability improvement, find those activities that
overlap. For example, if components or subsystems in the first piece of equipment are identical
or very similar in the next piece of equipment, combine databases and reliability models for those
parts.
Communicating Value
Communicating the value of the reliability effort to key decision makers and participants is
vitally important and can be accomplished in three ways:
1. Translate the reliability efforts and benefits to measures such as cost savings,
resource or cost avoidance, time to market, or market share gain.
2. Demonstrate a series of immediate short-term improvements and document
those improvements noting the benefits gained.
3. Develop a champion in senior management who will support your reliability
efforts when top level support is needed. The champion has the respect of
decision makers and also the authority to influence and encourage participants.
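As an example of the first approach, a reliability gain can be translated into a yearly cost figure. The sketch assumes that the expected number of failures per year is operating hours divided by MTBF and that each failure carries a fixed cost; all figures are hypothetical.

```python
def annual_failure_cost(operating_hours_per_year, mtbf_hours, cost_per_failure):
    """Expected yearly cost of failures, assuming failures/year = hours / MTBF."""
    return operating_hours_per_year / mtbf_hours * cost_per_failure

# Hypothetical figures for one piece of equipment.
hours_per_year = 6000.0
cost_per_failure = 2000.0   # repair plus lost production, in dollars

cost_before = annual_failure_cost(hours_per_year, 200.0, cost_per_failure)
cost_after = annual_failure_cost(hours_per_year, 300.0, cost_per_failure)
annual_savings = cost_before - cost_after
```

Under these assumptions, raising MTBF from 200 to 300 hours lowers the expected yearly failure cost from $60,000 to $40,000 per piece of equipment, a figure that decision makers can weigh directly against the cost of the reliability effort.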
3.9 Summary
The role management plays in the reliability improvement process is vital. Management has
unique responsibilities in the establishment and implementation of the process. Management
also assigns individuals to the role of reliability champions. The executive champion provides
reliability leadership with the full support of upper management. The technical champion
establishes the reliability improvement process and is responsible for its success.
The five steps of the reliability improvement process can be applied to a piece of equipment no
matter what phase it is in. This section discussed the activities associated with each step of the
reliability improvement process for each phase of the life cycle.
This section also included a discussion on how to select a piece of equipment to implement a
reliability program based on the life cycle phases. The section also covered the importance of
data, the choice of activities when resources are limited, rules for the reliability program plan,
and suggestions on how to communicate the value of the reliability effort to key decision makers
and participants in the reliability program.
Section 4 provides more detailed descriptions of the reliability-related activities and presents
some of the tools and techniques available in planning, developing, and implementing a
reliability improvement program.
3.10 References
1. MIL-HDBK-217E, Reliability Prediction of Electronic Equipment.
2. Non-Electronics Part Reliability Data, Reliability Analysis Center, Rome, NY,
1991.
3. RMS Committee, RMS, Reliability, Maintainability & Supportability
Guidebook, SAE G-11, Society of Automotive Engineers, Inc, Warrendale, PA,
1990.
4. DOD 4245.7-M, Transition from Development to Production, September, 1985.
5. William W. Everett, et al., Reliability by Design, A Guide to Reliability
Management, Issue 1, AT&T, Indianapolis, IN, November 1990.
6. SEMI E10-90, Guideline for Definition and Measurement of Equipment
Reliability, Availability, and Maintainability (RAM), SEMI 1990.

4 ACTIVITIES AND TOOLS IN THE RELIABILITY IMPROVEMENT PROCESS
4.1 Introduction
The previous two sections of these guidelines provided an overview of the reliability improvement
process and the equipment life cycle. This section provides a description of the activities and
tools that are part of the reliability improvement process. The reliability activities are grouped as:
• Engineering
• Data
• Testing
Engineering activities form the foundation of the reliability improvement process. Data activities
also play an important role because the engineering activities depend on data. Testing activities
provide a valuable source of data and information. There are three designators used for the
activities: E (engineering), D (data), and T (testing). These designators followed by a number
provide the location of the activity in this section.
Some of the activities stand alone; that is, they do not require any formally recognized tools of
the trade. However, many of the activities use standard methods and techniques, referred to here
as tools. These tools come from various academic disciplines such as probability and statistics,
and reliability engineering. The designator used for the tools is AT, followed by a number.
4.2 Reliability Activities

The following lists summarize the reliability activities that are discussed in this section:
Engineering Activities
E1 Reliability Goals
E2 Reliability Program Plan
E3 Functional Block Diagrams
E4 Equipment Reliability Modeling
E5 Reliability Goal Allocation
E6 Equipment Reliability Quantification
E7 Design Reviews
E8 Sensitivity Analysis
E9 Design for Reliability Practices
E10 Preventive Maintenance Program (PM)
E11 Reliability of Purchased Components
E12 Ergonomic Studies
E13 Software Reliability Studies
E14 Failure Modes and Effects Analysis (FMEA)
E15 Equipment Characterization
E16 Component Failure Analysis
E17 Failure Reporting and Corrective Action System (FRACAS)
Data Activities
D1 Data Collection and Data Base Management
D2 Human Reliability Analysis (HRA)
Testing Activities
T1 Test Plans
T2 Reliability Tests
Reliability Tools
The following list summarizes the reliability tools that are discussed in this section.
AT1 Accelerated Testing
AT2 Burn-In Testing
AT3 Cause & Effect (Fishbone) Diagram
AT4 Competitive Benchmarking
AT5 Design of Experiments (DOE)
AT6 Environmental Stress Screening (ESS)
AT7 Fault Tree Analysis (FTA)
AT8 Life Testing
AT9 Pareto Diagram
AT10 Process Capability
AT11 Quality Function Deployment (QFD)
AT12 Reliability, Analysis and Modeling Program (RAMP) Software
AT13 Reliability Development/Growth Testing (RD/GT)
AT14 Reliability Qualification Testing (RQT)
AT15 Reliability Block Diagram Modeling (RBD)
AT16 Repairable Systems Analysis
AT17 Taguchi Methodology
AT18 User Groups
AT19 Cost of Ownership Calculations
The following pages discuss each activity. Following the activity descriptions is a description of
the tools in enough detail that the reader can either use the tool or understand what it can be used
for. References are available at the end of each activity or tool that requires more detailed
descriptions. Much of the material used in the activity and tool descriptions comes directly from
the references. The purpose of this section is not to recreate work that has already been done
well, but rather to give the reader an opportunity to know what the activity or tool is about and
where to go for more information.

Engineering Activity E1: Reliability Goals

Reliability goals are used to focus attention toward producing reliable equipment and to serve as
standards against which reliability achievements can be measured. These goals define the design
requirements which in turn form the basis for design specifications.
Various sources are used to establish goals:
Customer Voice. Listening to the voice of the customer means understanding what the
customer wants and needs. Quality Function Deployment (QFD) is a tool developed to help
establish goals through customer involvement. Customers identify the qualities they need and
want in equipment, using their own words. These qualities are then translated by the supplier into
measurable technical goals. QFD is most useful when applied during the concept and feasibility
and design phases. However, it is important in every phase to understand what the customers
consider to be their primary needs and wants. It is also important to establish customer
partnerships to assure continued customer involvement.
Competitive Benchmarking. Competitive benchmarking is a process used by a company to
measure and compare their products, services, and operations against their toughest competitors
and those companies demonstrating world class performance.
Reverse Engineering. The systematic dismantling of a piece of equipment with a high
reliability ranking is called reverse engineering. The information obtained provides clues about
the actual reliability of similar equipment and the technology used to achieve that reliability.
Contractual Agreements. A contractual agreement is a formal document that contains an
explicit statement of the customer’s requirements for reliability and safety. No matter what phase
of the life cycle the equipment is in, the reliability and safety values agreed upon by the customer
and the supplier are set. An inability to maintain these values leads to a dissatisfied customer.
Warranty Requirements. To remain competitive, the reliability goals must support the
warranty requirements.
The following criteria are used to establish goals:
Attainable. Establishing reliability goals involves making those goals realistic for the given
technology constraints. However, these goals should still be a "stretch"; that is, they should be
challenging. Reliability goals and the time allotted to accomplish these goals must be carefully
correlated. Trade-offs may be necessary to match completion dates to the level of achievable
reliability.
Resources Available. It is important that resources be available at the time they are required. It
is best to stay with what can be realistically supported.
Equipment Perspective. Approach setting goals from an overall equipment perspective.
Attempt to optimize reliability, cost, time to market, resources, and maintainability while staying
within the overall equipment specifications and design constraints.
Measurable. It is difficult to determine if goals are being met if those goals are not defined
quantitatively.

Even though safety and maintainability goals are not addressed in these guidelines, some mention
of these goals is necessary because of the key interactive role they play with reliability. Designers
should identify safety, maintainability, and reliability goals at the same time. Since
maintainability is built into equipment, it is primarily addressed in the concept and feasibility and
design phases. Maintainability is achieved by carefully considering and balancing numerous
factors such as basic physical configuration and layout of the design, test provisions for quick
fault location, interchangeability of replaceable parts, adequate maintenance procedures, and skill
levels of technicians. As with reliability, pertinent data is collected to estimate the maintainability
measures and to ensure that the maintainability goals are being achieved.
It is important to remember that setting reliability goals is not a one-time affair; it is a continuous
process of gradual improvements that are made toward the goals over time.
Applicable Tools
AT4 Competitive Benchmarking
AT11 Quality Function Deployment (QFD)
References
Ireson, W., C. Coombs, Jr., editors, Handbook of Reliability Engineering and Management, New
York: McGraw-Hill, 1988, pp. 2.3-2.8.

Engineering Activity E2: Reliability Program Plan

The purpose of the reliability program plan is to identify and tie together all of the activities
required for the reliability improvement process. A company typically has several reliability
plans. The general company plan covers all equipment, while the equipment-specific plans are
developed for individual equipment. These plans include six elements:
• Objectives
• Constraints, limitations, and requirements (that exist at the time that the plan is
written)
• Basic assumptions
• Activities required to meet the objectives
• Resources necessary to perform the plan
• Schedule showing when the activities will start and be completed
Every plan needs clearly stated objectives that can be easily understood. Constraints, limitations,
or requirements may exist due to physical, monetary, or manpower limitations; identifying these
early may resolve problems before they arise. The basic assumptions stated in the reliability
program plan will differ from company to company and equipment to equipment; for example,
equipment A may be able to implement its reliability program within 6 months, while equipment B
requires 10 months. The heart of the reliability plan is the identification of the activities required
to meet the objectives. The plan also requires a statement of the resources required to implement
the plan, along with an explanation of why those resources are required, if it is not obvious. If the
resources are not available, then the plan should show how they will be obtained. Finally, a
schedule is developed showing when the selected activities will start and be completed.
General Company Plan. An overall reliability plan takes the company’s size and available
resources into account. The plan addresses the following issues:
• The company’s reliability policy
• Identification of reliability champions
• The overall strategy
• How reliability skills will be acquired within the company
• A description of organizational activities
Equipment Specific Plan. Each piece of equipment requires a reliability plan based on the
equipment’s life cycle phase, reliability goals, schedule, and available resources. The loftier the
goal, the more activities, tools, and resources are required to meet that goal. Also, the shorter the
schedule, the more resources must be applied over the scheduled period. The plan identifies the
specific reliability activities and tools to be used for particular equipment, and who (or what
department) is responsible for performing them.
References
MIL-STD-785B, Reliability Program For Systems And Equipment Development And Production,
Task 101, 15 September 1980.

Engineering Activity E3: Functional Block Diagrams
During this activity, the equipment is depicted by clear, abbreviated schematics that show the
major subsystems, components, or parts of the equipment and the critical support systems such as
power, actuation signals, control, and cooling. A functional block diagram is used to show how
the equipment subsystems, components, and parts interact with one another and with the support
systems. A block diagram provides a clear picture of how the equipment functions and can be
used to create a reliability model. It also helps create an understanding of what makes the
equipment work and what causes it to fail. If alternative concepts or designs have been created,
each one should have its own functional block diagram.
[Figure: functional block diagram of a hypothetical personal computer (PC)]
An example of a functional block diagram is given in the figure above. The functional block
diagram represents a hypothetical personal computer (PC). As can be seen from the diagram, the
PC has two hard disk drives (HD1, HD2) and two floppy drives (FD1, FD2). The keyboard, I/O
board, RAM card, disk controller card, and video control card all derive their power from the
power supply via the mother board. The CRT (monitor) is a separate unit with its own power
supply.
The need for schematics and flow diagrams is well recognized, but typically these are too
complex to use directly. It is important to construct diagrams that depict clearly and simply how
the equipment functions. Subsystems, components, support systems, and human actions that lead
to equipment failure should be obvious when the functional block diagram is constructed
properly.

Engineering Activity E4: Equipment Reliability Modeling

Equipment reliability modeling allows one to predict what the reliability
of a piece of equipment is or will be; it is particularly useful when the equipment is complex. To
be of the most value, the reliability model is created in the early life cycle phases; that is, concept
and feasibility, and design. However, the earlier in the equipment’s life the reliability model is
created, the more challenging it is for the model to realistically predict the equipment’s reliability.
The reliability of a piece of equipment is known with absolute certainty only after it has been
used in the field until it is worn out and its failure history has been faithfully recorded. Even
though one cannot predict the equipment reliability with absolute certainty, a reliability model
can predict the equipment’s reliability with enough confidence that changes to the equipment that
lead to improved reliability can be proposed.
No matter what phase of the life cycle a piece of equipment is in, reliability modeling has
numerous benefits; these include:
• Improving understanding of the equipment
• Allowing an early evaluation of design alternatives
• Identifying critical subsystems, components, and parts and their interactions
• Guiding resource allocations to portions of the equipment most needing
improvement.
In order to design and manufacture reliable equipment, it is important to understand how the
various subsystems, components, and parts fit together; how they affect one another during
normal operation; and what the reliability of the subsystems, components, and parts must be to
achieve the desired equipment reliability. Understanding these issues allows one to predict what
the future performance of an equipment design will be. Thus, design alternatives can be modeled
and the best designs chosen. Reliability modeling can consider subsystems, components, parts,
materials, and human errors; that is, anything that affects the equipment’s reliability, to determine
what the reliability of a piece of equipment will be. Target areas for improvement can then be
found. Reliability modeling also allows one to consider the natural variation inherent in
equipment. This variation is due to differences in how operators use the equipment, the
environment in which the equipment is used, differences in how the equipment was
manufactured, and so forth.
The reliability modeling described here is not concerned with time degradation; that is, the
equipment is neither being broken in nor at the point of wear out. This idea can be explained
more clearly by discussing the typical "bathtub" curve seen in many reliability texts. The
following figure shows a typical bathtub curve, also known as a failure rate curve, over the life of
a part, component, subsystem or the equipment. The early part of the curve, where the failure rate
is decreasing, is often called burn-in or the break-in stage. The later part of the curve, where the
failure rate is increasing, is typically called the wear-out stage. As was mentioned, the reliability
model discussed in these guidelines assumes that the components, parts, subsystems and the
equipment itself are in the constant failure rate portion of this curve. This allows one to assume
that all components, parts, subsystems, and the equipment have a constant failure rate; that is, the
failure rate does not change over time. The model also assumes that the components, parts, and
subsystems being modeled are repairable and that the repaired items are as good as new.
[Figure: bathtub curve showing failure rate versus time]
If a block diagram is used to model the equipment, the equipment model will consist of series
blocks (when the failure of one subsystem, component, or part causes the equipment to fail),
parallel blocks (when every subsystem, component, or part must fail for the equipment to fail) or
a combination of these.
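As a minimal sketch of how series and parallel blocks combine, the following Python fragment assumes independent blocks, each with a constant failure rate, so that the reliability of a block over an operating period of t hours is exp(-rate * t). The function names and the numeric failure rates are illustrative assumptions and do not come from these guidelines or from RAMP.

```python
import math

def block_reliability(failure_rate_per_hour, hours):
    """Reliability of one block with a constant failure rate (exponential model)."""
    return math.exp(-failure_rate_per_hour * hours)

def series(reliabilities):
    """Series blocks: the equipment works only if every block works."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result

def parallel(reliabilities):
    """Parallel blocks: the equipment fails only if every block fails."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= (1.0 - r)
    return 1.0 - all_fail

# Hypothetical failure rates (failures per hour) for three blocks.
hours = 100.0
r_a = block_reliability(0.001, hours)
r_b = block_reliability(0.002, hours)
r_c = block_reliability(0.002, hours)

# Block A in series with a redundant (parallel) pair B and C.
r_equipment = series([r_a, parallel([r_b, r_c])])
print(f"Equipment reliability over {hours:.0f} h: {r_equipment:.4f}")
```

A combination model is built by nesting the two functions, as in the last lines, where the parallel pair is treated as a single block in the series chain.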
The following paragraphs discuss how to create a reliability model. The first step involves clearly
defining what is meant by equipment failure. For example, one might define failure as any
occurrence that causes the equipment to be down for more than a given period of time (e.g., 6
minutes) or any occurrence that results in wafer damage. This step also involves identifying all of
the failure mechanisms that lead to the defined equipment failure. If, for example, equipment
failure is defined as a down time of 6 minutes or more, all failure mechanisms that cause the
equipment to be down at least 6 minutes are included in the reliability model. If equipment
failure is defined as any occurrence that results in wafer damage, all failure mechanisms that
result in wafer scrap are identified. Field data is often useful in defining what is meant by
equipment failure and in identifying mechanisms that lead to failure.
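The effect of the failure definition on which field events enter the model can be illustrated with a short sketch. The event log below is entirely hypothetical, and the tuple layout (description, downtime in minutes, wafer damaged) is an assumption made for this example only.

```python
# Hypothetical field log: (event description, downtime in minutes, wafer damaged?)
events = [
    ("robot arm misalignment", 12.0, False),
    ("operator reset", 2.0, False),
    ("vacuum pump trip", 45.0, False),
    ("handler jam", 4.0, True),
]

def failures_by_downtime(log, min_minutes=6.0):
    """Events that count as failures when failure means downtime >= min_minutes."""
    return [e for e in log if e[1] >= min_minutes]

def failures_by_wafer_damage(log):
    """Events that count as failures when failure means any wafer damage."""
    return [e for e in log if e[2]]

print([e[0] for e in failures_by_downtime(events)])
print([e[0] for e in failures_by_wafer_damage(events)])
```

The two definitions select different events from the same log, which is why the definition needs to be fixed before failure mechanisms are identified.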
The next step involves creating the reliability model. Fault trees and reliability block diagrams
are the tools that are used to do this. RAMP is a software package that has been created to help in
the documentation and analysis of a reliability model. It uses reliability block diagrams. RAMP
allows one to create the reliability model on a personal computer, provides a means of
documenting failures, and performs the Boolean algebra necessary to solve the model. A
reasonable starting point in the creation of the model is to initially create a coarse model made up
of the equipment’s major subsystems. If a block diagram is used as the modeling tool, the model
would consist of approximately 10 to 20 major subsystems; that is, in the model, one block
would represent each major subsystem. Later versions of the model add detail only to those
subsystems that are identified as being important; that is, only those subsystems that cause the
equipment to fail are broken down into components and parts. Adding detail to unimportant
subsystems for the sake of completeness simply increases the modeling effort without adding to
the usefulness of the results. Careful examination of field data helps determine the appropriate
level of detail for the model. In general, the model should not be more detailed than the available
information will support. If the modeling effort is for equipment not yet in the field, field data for
a previous generation of equipment can yield valuable insights into improvements in the next
generation.
Once the model is completed, it can be transformed into an equation for quantification, which is
discussed in engineering activity E6. The equipment reliability is calculated using the failure data
collected for the subsystems, components and parts.
The following paragraphs discuss tips that will make the modeling effort easier.
1. Think carefully about the subsystem divisions for the equipment being modeled.
The choice of subsystems will vary from company to company and equipment to
equipment; however, it is best to base the choice on functional considerations
not on parts count methods. Choose subsystems based on the functions they
perform. Group components and parts under the subsystems that make
functional sense.
2. Avoid parts list modeling. That is, do not represent the equipment as a collection
of parts. It is important to include failure modes such as operator errors, software
failures and failures that are the result of drifting out of specification. In
addition, valuable insight into the equipment is gained by thinking about failure
modes and interactions between different subsystems. Parts list modeling does
not encourage this kind of thinking.
3. It is best to begin by modeling an existing piece of equipment. Good reliability
modeling practice comes through experience. If the first model created is for
equipment that is well understood, the model can be validated in terms of the
failure rate and failure mechanisms. Also, introduction of a reliability modeling
program will almost always cause the data collection and data management
procedures to be revised. It is generally better to sort out data problems with an
existing system than with a new system.
4. No matter what phase of the life cycle the equipment is in, it is best to keep the
model as simple as possible. As the model becomes more complicated, it
becomes more difficult to interpret.
5. As the reliability process proceeds, continually change, expand, and improve the
model. This allows the model to be used throughout the life of the equipment.
Applicable Tools
References
Campbell, J.R., Iman, R., Longsine, D., Thompson, B., A Tutorial on Reliability Modeling Using
RAMP, Albuquerque, NM: SETEC, Sandia National Laboratories, SETEC91-030, 1991.
MIL-HDBK-217E, Reliability Prediction of Electronic Equipment, Griffiss AFB, NY: Rome Air
Development Center, October 1986.

Engineering Activity E5: Reliability Goal Allocation

The reliability goal allocation activity involves allocating or apportioning the equipment
reliability goals into individual subsystem, component, and part goals. The advantages of
allocating reliability goals include:
• Persuading equipment design and development personnel to understand the
relationships between parts, components, subsystems, and the overall equipment
reliability. This leads to an understanding of the basic reliability problems
inherent in a design.
• Persuading the design engineer to consider reliability equally with other
equipment parameters such as performance, cost, and weight characteristics.
• Ensuring adequate design, manufacturing methods, and testing procedures.
• Giving design engineers numerical goals for each portion of the design.
• Making it possible to specify a set of goals to a sub-tier supplier who is producing a
subsystem, component, or part of the equipment.
When starting the reliability allocation process, use the overall equipment goals and the
equipment reliability model along with a basic reliability allocation model. There are several
basic models available, such as the equal-apportionment technique, the ARINC apportionment
technique, and the AGREE allocation method; these are all described in Reliability in
Engineering Design, by Kapur and Lamberson.
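As a rough illustration of two of these models, the sketch below assumes a series model with constant failure rates. Equal apportionment gives each of n subsystems the n-th root of the equipment reliability goal; the ARINC-style function divides an equipment failure rate goal in proportion to each subsystem's predicted failure rate. The function names, subsystem names, and numbers are hypothetical, and the cited text should be consulted for the full methods.

```python
def equal_apportionment(goal_reliability, n_subsystems):
    """Equal apportionment: n series subsystems each receive the same goal."""
    return goal_reliability ** (1.0 / n_subsystems)

def arinc_apportionment(goal_failure_rate, predicted_rates):
    """ARINC-style apportionment: split the equipment failure rate goal in
    proportion to each subsystem's predicted failure rate (series model)."""
    total = sum(predicted_rates.values())
    return {name: goal_failure_rate * rate / total
            for name, rate in predicted_rates.items()}

# Hypothetical goal: 0.90 equipment reliability across four series subsystems.
print(equal_apportionment(0.90, 4))

# Hypothetical predicted failure rates (failures per hour) and equipment goal.
predicted = {"handler": 0.004, "vacuum": 0.002, "controller": 0.002}
allocated = arinc_apportionment(0.004, predicted)
print(allocated)
```

In both cases the allocated subsystem goals recombine to the equipment goal: the subsystem reliabilities multiply back to 0.90, and the allocated failure rates sum to the equipment failure rate goal.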
When allocating reliability goals, combine engineering judgement along with knowledge of:
• How the various subsystems, components, and parts are related
• The reliability of similar subsystems, components, or parts of previous equipment
• The complexity of the subsystems, components, or parts
• The importance of the subsystems, components, or parts to the equipment
reliability
In the early life cycle phases, there is a lack of detailed information; thus, the allocation process
is approximate. However, a tentative reliability allocation can be done to guide the design team.
If the allocated goals for a piece of equipment cannot be met using the current technology or if
the goals can be met too easily, the equipment is modified and the allocations reassigned. This
process is repeated until the allocations meet the equipment requirements.
It is important to mention here that reliability is only one of many design attributes that need to
be allocated. The allocation process is repeated for other attributes such as safety,
maintainability, and ease of use. A conscious decision is made in the trade-off between reliability
and other attributes.
Applicable Tools

References
Ireson, W., C. Coombs, Jr., editors, Handbook of Reliability Engineering and Management, NY,
McGraw-Hill, 1988, pp. 18.34-18.42.
Juran, J., F. Gryna, editors, Juran’s Quality Control Handbook, Fourth edition, NY, McGraw-
Hill, 1988, pp. 13.21-13.22.
Kapur, K., L. Lamberson, Reliability in Engineering Design, NY, John Wiley and Sons, 1977,
pp. 405-422.
Lloyd, D., M. Lipow, Reliability: Management, Methods, and Mathematics, Second edition,
Milwaukee, WI, The American Society for Quality Control, 1984, pp. 25-27, 267-270.
O’Connor, P., Practical Reliability Engineering, Third edition, NY, John Wiley and Sons, 1991,
p. 136.

Engineering Activity E6: Equipment Reliability Quantification

In this activity, the equipment reliability model developed using a reliability block diagram or a
fault tree is transformed into a simple equation which is then used to quantify the equipment
failure rate. The reliability block diagram modeling description given in AT15 includes general
information on how to write the Boolean equation for a block diagram. Boolean equations are
used to quantify the block diagram; that is, take the block diagram created for a piece of
equipment and translate it into a failure rate for that equipment, based on the individual failure
rates of the subsystems, components, and parts that make up that equipment. The reliability,
analysis and modeling program (RAMP) software has been developed specifically to solve
reliability block diagram models. The fault tree analysis description given in AT7 also includes
general information on how to translate the model created for the equipment into a Boolean
equation, which can then be quantified into the equipment failure rate.
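[Figure: reliability block diagram with blocks A, B, the parallel pair C and D, and E connected in series]
Equipment Failure = A + B + [ C * D ] + E
In this Boolean equation, + denotes OR and * denotes AND: the equipment fails if A, B, or E fails, or if both C and D fail.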
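The equation above can be quantified once a failure probability is assigned to each block. The sketch below assumes the five failure events are statistically independent and uses hypothetical probabilities for a fixed operating period; it is not the RAMP algorithm. It also shows the rare-event approximation, in which the terms are simply added.

```python
def equipment_failure_probability(p):
    """Exact probability of A + B + [C * D] + E for independent events."""
    p_cd = p["C"] * p["D"]  # AND: both C and D must fail
    survive = (1 - p["A"]) * (1 - p["B"]) * (1 - p_cd) * (1 - p["E"])
    return 1 - survive      # OR: any one term causes equipment failure

def rare_event_approximation(p):
    """Sum of the terms; an upper bound that is close when probabilities are small."""
    return p["A"] + p["B"] + p["C"] * p["D"] + p["E"]

# Hypothetical failure probabilities over one operating period.
p = {"A": 0.01, "B": 0.02, "C": 0.10, "D": 0.10, "E": 0.005}
print(equipment_failure_probability(p))
print(rare_event_approximation(p))
```

Because C and D are redundant, their product contributes far less to the result than either block would contribute on its own in a series arrangement.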
Applicable Tools

Engineering Activity E7: Design Reviews

Conducting a review of a proposed design is a way of evaluating a design and providing some
assurance that a new design will not create problems as the equipment proceeds through its life
cycle phases. There is generally not a separate design review for reliability. Instead, reliability
should be included as an integral part of the regular design review process, which consists of a
sequence of reviews at appropriate points in the equipment’s life.
Design reviews provide an opportunity to review the design, make decisions, and establish action
plans. They also ensure that:
• The design meets all performance requirements
• The design has been studied to identify possible weaknesses
• Alternative designs and components have been considered
• The design can be manufactured at a low cost
• The design has easy, low-cost field maintenance
Designers, reliability engineers, manufacturing engineers, field service, management and other
appropriate personnel participate in the reviews. The basis of the design review is the design as it
exists at the time of the review.
The purpose of reliability-oriented design reviews includes:
• Verifying the appropriateness of the reliability model
• Detecting potential design weaknesses and any other condition that could degrade
equipment reliability
• Verifying fault detection and diagnostic capabilities
• Determining equipment recovery strategies
• Verifying the quality of component and equipment reliability data
• Developing a parts derating strategy
• Verifying reliability qualification test results
• Evaluating electrical, mechanical, and thermal aspects of the design
• Determining the effects of reliability and maintainability engineering on the
design
• Determining the extent to which software affects the equipment reliability
• Reviewing the status of previous review actions
• Using the lessons learned from previous generation equipment
• Deciding what trade-offs should be made between process, manufacturing and
reliability.
To accomplish the purpose of the design review, reviews must occur during all life cycle
phases and numerous times during a phase. The reviews may be formal or informal. They may be
held within a single group, such as design, or include a cross-functional group of individuals, such
as design, manufacturing, and field service; they may even include management. Generally, the
reviews that include management are those that involve critical decisions.

Ingredients for a successful design review include:

• An emphasis on constructive input to designers, instead of criticism. The purpose
of a review is not to challenge the work of a designer, but to anticipate weak areas
in a design and eliminate them as early in the life cycle as possible.
• Avoiding the creation of an environment where the designer feels threatened. The
designer listens to the results of the review and, along with line management, has
the final decision on the design.
• Creating a design review team from a variety of areas. These areas may include
manufacturing, field service, reliability and quality engineering, procurement,
materials engineering, shipping, marketing, and design engineering personnel who
are not directly associated with the design under review. Customer involvement in
a post-design review meeting in which the program is reviewed may yield insight
into what the customer values in the equipment.
• Adequate planning for and emphasis on design review meetings. A formal agenda
and advance documentation are distributed.
• Focusing on the unproven and untried features of a design.
• Sufficient structure in the design review process. Identified design weaknesses are
documented and provisions are made for their elimination. Subsequent review
meetings include a discussion of these weaknesses.
• A realization that the design review may uncover areas of conflict between
departments.
• Management support. Management is responsible for emphasizing the importance
of a carefully planned design.
References
Everett, W., et al., Reliability by Design, A Guide to Reliability Management, Issue 1,
Indianapolis, IN, AT&T Bell Laboratories, November 1990, pp. 55-56.
Juran, J., F. Gryna, Juran’s Quality Control Handbook, Fourth edition, NY, McGraw-Hill, 1988,
pp. 13.7-13.11, 16.5-16.6.
Lloyd, D., M. Lipow, Reliability: Management, Methods, and Mathematics, Second edition,
Milwaukee, WI, The American Society for Quality Control, 1984, pp. 28-30.
O’Connor, P., Practical Reliability Engineering, Third edition, NY, John Wiley and Sons, 1991,
pp. 160-162.

Engineering Activity E8: Sensitivity Analysis

This activity discusses two types of analysis: uncertainty and sensitivity. Uncertainty analysis
provides a means of more accurately representing a line of equipment. Sensitivity analysis allows one to simulate
changes to subsystems, components, and parts and determine the impact of those changes on the
equipment.
Uncertainty is a means of addressing variability. Suppose that one tracks the failure history of a
piece of equipment at a particular customer’s location. By the time the equipment is worn out,
one has an accurate MTBF value for that unit. While this information is useful, it is unlikely that
an identical unit will have the same MTBF, even if that unit was manufactured at the same
factory by the same personnel and used at the same customer location. Uncertainty allows one to
assign a range of values to a subsystem, component or part failure rate which is then propagated
through the reliability model and yields a range over which the equipment MTBF will fall. This
provides a more accurate representation of the equipment line. Uncertainty is also useful in
pointing out those subsystems, components and parts that require more accurate data. If, for
example, the solution of the reliability model highlights a component that has a range of MTBF
from 10 hours to 1,000 hours, it is clear that one cannot predict with any confidence what the
reliability performance of that component will be from unit to unit. It may be that the failure rate
range is large because there is no field data available and the design team does not have a good
idea of what its failure rate will be. If this is the case, the team needs to find a way to get more
accurate information on this component. Another cause of the large range may be that the
component failure rate truly falls within this range, in which case, the team needs to address the
unit-to-unit variability. Once these sources of variability are identified, they can be reduced.
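The propagation of failure rate ranges described above can be sketched with a simple Monte Carlo simulation. This is only an illustration, not the method used by any particular tool: it assumes a series reliability model, constant failure rates, and a log-uniform draw within each range, and the component names and ranges are hypothetical.

```python
import math
import random


def simulate_equipment_mtbf(mtbf_ranges, trials=10_000, seed=0):
    """Propagate component MTBF ranges (hours) through a series model.

    Assumptions for illustration only: components are in series, failure
    rates are constant, and each MTBF is drawn log-uniformly from its range.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        total_rate = 0.0
        for low, high in mtbf_ranges.values():
            # Log-uniform draw, so a wide range is sampled evenly per decade.
            mtbf = math.exp(rng.uniform(math.log(low), math.log(high)))
            total_rate += 1.0 / mtbf
        samples.append(1.0 / total_rate)
    return sorted(samples)


def percentile(sorted_values, fraction):
    """Return the value at the given fraction (0 to 1) of a sorted list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


# Hypothetical components; the valve has the wide 10 to 1,000 hour range.
ranges = {"pump": (800.0, 1200.0), "valve": (10.0, 1000.0), "controller": (4000.0, 6000.0)}
samples = simulate_equipment_mtbf(ranges)
low_5, high_95 = percentile(samples, 0.05), percentile(samples, 0.95)
print(f"Equipment MTBF, 5th to 95th percentile: {low_5:.0f} to {high_95:.0f} hours")
```

In a run like this, the component with the widest range tends to dominate the spread of the equipment MTBF, which is one way of identifying where more accurate data are needed.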
The reliability model has already been created for the equipment in engineering activity E4. The
solution of the model has highlighted those subsystems, components and parts that are the largest
contributors to the unreliability of the equipment. The design team can now concentrate on
improving these items. Sensitivity analysis uses the reliability model to simulate changes to the
equipment. Changes could range from modifying individual component failure rates to
completely re-designing a subsystem. Component failure rate changes could be due to a
preventive maintenance procedure that will significantly reduce the likelihood of that failure
occurring or to changing suppliers of the component to one who has a more reliable product. If a
subsystem is re-designed, a reliability model can be created for it and used to replace the previous
subsystem model. The new model is then solved and the effect of the change on the overall
equipment reliability predicted. This allows the design team to make rational decisions on what
changes to make in the equipment that result in the most improved reliability for the least amount
of expenditure in time and money.
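As a minimal sketch of this kind of comparison, the example below re-solves a model for each proposed change and ranks the changes by MTBF gained per unit cost. The series model with constant failure rates, the component names, the failure rates, and the costs are all assumptions made for illustration.

```python
def equipment_mtbf(failure_rates):
    """MTBF (hours) of a series system with constant failure rates (per hour)."""
    return 1.0 / sum(failure_rates.values())


def rank_changes(baseline, candidates):
    """Rank candidate changes by MTBF gain per unit cost, best first.

    candidates: list of (description, component, new_failure_rate, cost).
    """
    base_mtbf = equipment_mtbf(baseline)
    ranked = []
    for description, component, new_rate, cost in candidates:
        trial = dict(baseline)          # copy so the baseline is untouched
        trial[component] = new_rate     # simulate the proposed change
        gain = equipment_mtbf(trial) - base_mtbf
        ranked.append((gain / cost, description, gain))
    ranked.sort(reverse=True)
    return ranked


# Hypothetical baseline failure rates (failures per hour).
baseline = {"pump": 1 / 1000, "valve": 1 / 200, "controller": 1 / 5000}
# Hypothetical changes: a PM procedure on the valve, a new pump supplier.
candidates = [
    ("PM on valve", "valve", 1 / 400, 5000.0),
    ("new pump supplier", "pump", 1 / 2000, 8000.0),
]
ranked = rank_changes(baseline, candidates)
for gain_per_cost, description, gain in ranked:
    print(f"{description}: +{gain:.1f} h MTBF, {gain_per_cost:.4f} h per unit cost")
```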
A software tool called RAMP has been designed to address both uncertainty and sensitivity. The
data base associated with RAMP allows one to input a range of values for a failure rate. A
sensitivity analysis is easy to perform using RAMP and yields results quickly.
Applicable Tools
AT19 Cost of Ownership Calculations

References
Campbell, J., R. Iman, D. Longsine, B. Thompson, A Tutorial on Reliability Modeling Using
RAMP, Albuquerque, NM, SETEC, Sandia National Laboratories, SETEC91-030, 1991, pp. 43-
50.

Engineering Activity E9: Design for Reliability Practices

Critical items of a design can be enhanced by applying reliability practices such as: derating;
design simplification; redundancy; procedural changes; process control; design for
maintainability; deployment considerations; and use of preferred and proven processes,
components, and materials. All of these principles are briefly discussed in the following
paragraphs.
Derating. Derating means that components are operated at less severe stresses than their rating
specifies; e.g., a capacitor rated at 300 V is used in a 200 V application. Components selected for
derating are typically critical to the reliability of the equipment and are chosen with ratings for
voltage, current, temperature, environment, and power dissipation that are significantly higher
than the application requires. Derating enhances reliability by:
• Reducing the likelihood that marginal components will fail during the life of the
equipment
• Reducing the effects of parameter variations
• Reducing long-term drift in parameter values
• Providing an allowance for uncertainty in stress calculations
• Providing some protection against transient stresses, such as voltage spikes
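The capacitor example above can be expressed as a stress ratio check. This is a small sketch only; the 0.7 limit is a hypothetical design rule chosen for illustration, not a value taken from this guideline.

```python
def stress_ratio(applied, rated):
    """Ratio of applied stress to rated stress (same units for both)."""
    return applied / rated


def is_derated(applied, rated, max_ratio):
    """True if the applied stress is within the allowed fraction of the rating."""
    return stress_ratio(applied, rated) <= max_ratio


MAX_RATIO = 0.7  # hypothetical derating limit, for illustration only

# A capacitor rated at 300 V used in a 200 V application.
ratio = stress_ratio(200.0, 300.0)
print(f"Stress ratio: {ratio:.2f}, derated: {is_derated(200.0, 300.0, MAX_RATIO)}")
```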
Design Simplification. Reducing the complexity of equipment is one way of improving its
reliability as well as its manufacturability and maintainability. The following are considered
when simplifying a design:
• Identify components that can be eliminated or combined with other components in
the equipment.
• Ensure that the simplification does not impose higher stresses on other components
in the equipment.
• Do not replace components known to be reliable with components that perform
complex or multiple functions unless the latter’s reliability is known.
G. Boothroyd and P. Dewhurst have developed a process that can be used to simplify a design.
This process involves reducing the number of individual components and ensuring that the
remaining components are easy to manufacture and assemble.
Redundancy. Unlike the goal of design simplification, redundancy actually increases the
number of components. However, the use of redundancy is one of the most effective ways to
improve reliability. The following are different types of redundancy:
• Pure Parallel Redundancy. In a pure parallel system, more than one component
or subsystem can perform the same function. If two components are in a pure
parallel system, failure of the equipment due to these components requires both
components to fail.
• Pure Parallel with Partial Redundancy. In this system, equipment works if some
of the components work. For example, the system consists of four independent
components where successful operation of the equipment occurs when any two of
the four are working.
• Standby with Changeover Redundancy. This system has one component
operating and one or more identical components in standby. When one component
fails, the next component takes over. The assumption here is that no repairs are
carried out on failed components until all of the components have failed; that is,
the first component and all of the standby components have failed.
• Standby with Several Operating Components. In this system, there are N
operating components and n components in standby. For example, the system
consists of 5 identical components, 3 of which must work for the system to be
successful. If one of the components fails, another takes its place. This continues
until there are none left to take over for a failed component; then repairs occur.
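The redundancy types above can be compared numerically with standard textbook formulas. The sketch below assumes independent components and, for the standby cases, identical components with constant failure rates, perfect changeover, and no failures while in standby; these assumptions and the example numbers are not from this guideline.

```python
import math


def parallel_reliability(reliabilities):
    """Pure parallel: the function is lost only if every component fails."""
    prob_all_fail = 1.0
    for r in reliabilities:
        prob_all_fail *= 1.0 - r
    return 1.0 - prob_all_fail


def k_of_n_reliability(k, n, r):
    """Partial redundancy: at least k of n identical components must work."""
    return sum(math.comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


def standby_reliability(failure_rate, hours, operating, spares):
    """Standby redundancy with `operating` active units and `spares` in standby.

    The system survives as long as no more than `spares` failures occur,
    and failures arrive at the rate operating * failure_rate.
    """
    x = operating * failure_rate * hours
    return math.exp(-x) * sum(x**j / math.factorial(j) for j in range(spares + 1))


two_parallel = parallel_reliability([0.9, 0.9])        # two components in parallel
two_of_four = k_of_n_reliability(2, 4, 0.9)            # any two of four must work
one_plus_one = standby_reliability(0.001, 1000, 1, 1)  # one operating, one standby
three_plus_two = standby_reliability(0.001, 100, 3, 2) # five components, three must work
print(two_parallel, two_of_four, one_plus_one, three_plus_two)
```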
Procedural changes. Procedural changes involve creating a new procedure or changing an
existing one to prevent reliability degradation. Examples include improving procedures for
handling electrostatic-sensitive parts or for aligning dimensionally critical components.
Process control. Process control involves modifying a manufacturing process that is degrading
reliability. The idea, simply stated, is that if the manufacturing process is understood and
controlled, the equipment will come out all right. J. Tunner discusses five basic steps which, if
followed, lead to total manufacturing process control:
1. Clearly defining what is required of the equipment
2. Understanding the production process
3. Improving the process so that acceptable equipment is manufactured
4. Controlling and monitoring the process itself
5. Searching out new quality improvement opportunities
These steps are applicable to any manufacturing operation. There are numerous tools that are
useful for process control: Cause & Effect (Fishbone) Diagrams, Design of Experiments, Pareto
Diagrams, Process Capability, and Taguchi Methodology. It is important to note that the success
of these steps depends on taking a team approach; that is, operators, engineers, scientists,
supervisors, and other key persons throughout the company are involved in all steps.
Design for Maintainability. Equipment maintainability is defined as a measure of the ease and
rapidity with which equipment can be restored to or maintained in an operational status. It is
important that maintained equipment is designed so that maintenance tasks are easily performed
and the skill level required for diagnosing, repairing, and scheduling maintenance is not too high.
Desirable features include:
• Making access and handling easy
• Using standard tools and equipment
• Eliminating the need for delicate adjustments or calibrations
The repairable system analysis tool is useful here in establishing maintenance policies and in
highlighting subsystems, components, and parts that need to be more maintainable.
While the designer has no control over the performance of the maintenance people, he or she can
directly affect the inherent maintainability of the equipment.
Deployment considerations. Reliability degradation during deployment is typically a result of
the interaction between people and the equipment or the equipment and the environment. Some
of these problems can be prevented if appropriate measures are taken. These measures include:
• Documenting deployment procedures
• Training personnel and users
• Testing during installation
• Providing technical assistance
• Identifying and correcting problems
• Establishing equipment change procedures
Improper handling of equipment during delivery and installation can degrade the inherent
reliability that has been designed into the equipment. To prevent problems associated with
handling the equipment, procedures specifically developed for storage and shipping, installation,
and handling and operation are created. In addition, training installation and maintenance
personnel and users in the installation, operation, and maintenance of the equipment can
significantly reduce reliability problems.
Specifying a plan for testing during installation will verify that the installed equipment operates
properly and according to specifications and that the equipment performance has not been
degraded as a result of shipping and handling. Providing appropriate technical assistance helps
customers solve problems. It is also important to identify and correct problems that occur during
shipping, installation, operation, and maintenance. Problems are reported and recorded, carefully
analyzed, and then reported to the design and manufacturing staff to prevent their recurrence.
Equipment change procedures are the methods by which the equipment is changed in the field to
meet or enhance the original performance specifications. These procedures are established to
assure the customer that any such changes maintain compatibility with existing equipment and do
not adversely affect customer requirements.
Use of preferred and proven processes, components, and materials. The reliability of a piece
of equipment depends on the reliability of its processes, components and materials. Concepts and
procedures for ensuring process, component and material reliability include:
• Selecting, specifying, qualifying, and controlling materials and processes
• Qualifying and requalifying components
• Conducting a supplier testing and reliability monitoring program
• Monitoring subcontractors and suppliers
• Screening and derating components and materials
Applicable Tools
AT3 Cause & Effect (Fishbone) Diagram
AT9 Pareto Diagram
AT17 Taguchi Methodology

References
Arsenault, J., J. Roberts, editors, Reliability & Maintainability of Electronic Systems, Potomac,
MD, Computer Science Press, 1980, pp. 280-293, 365-393.
Boothroyd, G., P. Dewhurst, Product Design For Assembly, Wakefield, RI:Boothroyd Dewhurst,
Inc.
Burgess, J.A., "Improving Product Reliability," Quality Progress, December 1987, pp. 47-54.
Davidson, J., editor, The Reliability of Mechanical Systems, London, Mechanical Engineering
Publications Limited for The Institution of Mechanical Engineers, 1988, pp. 47-57.
O’Connor, P., Practical Reliability Engineering, Third Edition, NY, John Wiley & Sons, 1991,
pp. 219-220, 117-125, 328-329.
Skrabec, Q. Jr., "The Transition for 100% Inspection to Process Control," Quality Progress,
April 1989, pp. 35-36.
Smith, J. R., "Reliability Analysis By Simulation," 41st Annual Quality Congress Transactions,
May 4-6, 1987, pp. 654-662.
Tunner, J., "Total Manufacturing Process Control-The High Road To Product Control," Quality
Progress, October 1987, pp. 43-50.
Vanderbei, K., et al., Reliability by Design, Indianapolis, IN:AT&T, 1990, pp. 105-114, 61-71.
MIL-STD-470B, Maintainability Program for System and Equipment, Irvine, CA:Global
Engineering Documents, 30 May 1989.

Engineering Activity E10: Preventive Maintenance Program

If a specific component used in a piece of equipment has a reliability value below what is
required, a method of circumventing this deficiency is through preventive maintenance (PM).
PM involves developing a maintenance schedule where, prior to failure,
• Components that are partially worn out or aged are replaced by new components
• Components that require adjustments or become contaminated are inspected,
readjusted, or cleaned, as required
Developing a PM program is advantageous because it:
• Is less expensive than having the equipment down at an undesirable time
• Helps alleviate degradation in equipment performance
• Yields insights into how the next generation of equipment can be improved
There are two tools that are useful in this activity: repairable systems analysis and user groups.
Repairable systems analysis can be used to compare different maintenance policies, predict future
numbers of repairs, and to highlight areas where preventive maintenance would improve the
equipment reliability. User groups are an effective means of maintaining open communication
between the equipment supplier and the equipment customer.
Applicable Tools
AT18 User Groups
References
MIL-HDBK-338-1A, Electronic Reliability Design Handbook, Oct. 1984, pp. 11-87 to 11-93,
12-47 to 12-49.

Engineering Activity E11: Reliability of Purchased Components

Evaluating the reliability of purchased subsystems, components and parts allows the customer to
choose those that are the best and that meet the reliability needs of their equipment. However, as
the article by E. Broeker implies, evaluating reliability is not limited to purchasing various
products and testing them to determine which has the best reliability. It also involves building a
customer-supplier relationship, one based on mutual respect, trust, and benefit. Unfortunately,
this takes time and requires management support from both the supplier and the customer. There
are several methods that can be used to perform evaluations on the supplier’s products when the
customer-supplier relationship is limited to purchasing subsystems, components and parts:
• Request reliability performance information or data from potential suppliers. Be
sure to obtain the basis for any reliability claims
• Include reliability performance requirements as part of the purchase contract
• Perform tests on the products
Developing a good supplier-customer relationship requires planning. The supplier should be
regarded as an extension of the customer’s process. Sharing information is often necessary if
suppliers are to provide products that meet all of the customer requirements. A supplier will not
be able to understand a customer’s problem without some knowledge of the customer’s processes
and procedures, and the same applies to the customer’s understanding of the supplier.
One of the first things the customer does before selecting a supplier is to establish clear, precise
requirements for the product to be provided. This provides a basis from which various suppliers
can be evaluated. Questions that may be asked include:
• Does the supplier understand what is required of the product?
• Can the supplier meet the schedule?
• What experience should the supplier have in making the product?
• How much will the supplier be involved in developing the customer’s equipment?
• How will conformance to requirements be measured?
• What sort of corrective action system will be required?
The answers to these questions help the customer evaluate which supplier can best meet its
needs.
It is also important to select suppliers who are capable of conforming to specified requirements.
Information on this ability comes from a review of past performance, tests of incoming materials,
and on-site evaluations. The on-site evaluation provides the opportunity to assess the supplier’s
methods of assuring reliability. The supplier may need to implement a reliability program similar
to the one the customer is using.
In order for the customer to produce high reliability equipment, their supplier must produce
reliable products. This is non-negotiable. Customers and suppliers must agree completely on the
required reliability of the product. This involves making sure that the supplier understands
exactly what the customer expects, explaining why the supplier’s product must meet
the expectations, and establishing precise methods for accepting the product.
One of the most important steps in the supplier reliability program is measurement and feedback.
Measurements provide a means of determining if the supplier is meeting the agreed upon
reliability requirements. Feedback gives the supplier the necessary information to improve the
product.
Finally, for every product that does not meet the reliability requirements, suppliers are asked
what corrective action they will take. They should be able to provide answers to the following
questions:
• What caused the product to not meet the reliability requirements?
• What changes need to be made to make the product meet the requirements?
• How will these changes be made foolproof?
• How will the customer know that these changes have been made?
It is the customer’s responsibility to ensure that the supplier works to find the root cause of each
failure to meet the requirements and takes the necessary action to permanently eliminate the
cause. The role of the supplier in improving reliability of the equipment is critical. For the
supplier to continuously improve the product’s reliability, the customer must demand it.
Applicable Tools
AT8 Life Testing
AT18 User Groups
References
Broeker, E., "Build a Better Supplier-Customer Relationship," Quality Progress, September
1989, pp. 67-68.
Juran, J., F. Gryna, editors, Juran’s Quality Control Handbook, Fourth edition, NY:McGraw-
Hill, 1988, pp. 15.1-15.46, 30.18-30.21.
Klock, J., "How to Manage 3,555 (or Fewer) Suppliers," Quality Progress, June 1990,
pp. 43-47.
Richardson, J., "Vendor Quality Assurance in a Process Industry," Quality Progress, November
1984, pp. 60-63.

Engineering Activity E12: Ergonomic Studies

Both reliability and ergonomics are concerned with predicting, measuring, and improving
equipment performance. Equipment failures are caused by human errors and equipment
malfunctions. Thus, the overall equipment reliability is evaluated from the viewpoint that the
equipment consists not only of the equipment and its associated procedures, but also includes the
people who use them. One must identify and plan for human reliability factors and their effects
on the overall equipment reliability. For example, when the interface between the human and the
equipment is complex, the possibility of human error increases, with an accompanying increase
in the probability of equipment failure. It is interesting to note that designing in reliability
frequently includes detecting and correcting equipment malfunctions, which is a task often
assigned to humans. Thus, the equipment performance can be enhanced or degraded, depending
on whether or not the malfunction indicators are presented so that they are understood readily.
Studying human response to audio and visual stimuli provides valuable guidance in the design of
equipment malfunction indicators.
[Figure: pictorial signs labeled Ground Transportation, Coffee Shop, First Aid, Baggage Claim, and Phones]
Ergonomics (or Human Factors Engineering) is a discipline concerned with designing equipment,
operations, and work environments to match human capabilities and limitations. Ultimately,
everything that one designs has an impact on the human in one way or another. Someone will
have to fabricate the equipment, package it, distribute it, unpack it and prepare it for use, operate
or use it, service and maintain it, and finally dispose of it. For this reason, designers should be
constantly alert to the human factors implications of their proposed design. Keep in mind that the
ultimate success of the equipment depends on how well the user performs the tasks associated
with it.
The intent of human factors engineering in this document is to focus on and resolve human-
equipment interface problems and solutions wherever or whatever they are. Philosophically, then,
human factors engineering is looking at a design from the standpoint of user efficiency, or total
human-equipment output effectiveness. Inherent in this philosophy are the following objectives:
• To make the user’s contribution to the equipment output as efficient as possible so
that the basic equipment output is not compromised by human failures.
• To make the combined user-equipment involvement as safe as possible so that
neither human nor equipment failures will compromise the user’s health or
damage the hardware. Inherent in this objective is the avoidance of injury to
others and of damage to adjacent hardware.
• To minimize the stress that the equipment imposes on the user as he or she uses,
operates, services, or maintains it. This includes such stresses as an undue energy
demand, frustration in trying to deal with the equipment at any point in the
human-equipment interaction, and worry about whether one is using the
equipment properly.
• To maximize the acceptability of the equipment, not only in terms of its
attractiveness, but also in terms of giving users the feeling that the equipment
allows them to use it efficiently and keep it in good working order with a
minimum of effort.
The methods of ergonomics are based on a logical and systematic process of: (1) establishing the
proper role of the human with the equipment, (2) designing the human-equipment interfaces to fit
the human’s capabilities and limitations, (3) evaluating and testing to see that the design does fit
these capabilities and limitations, and (4) properly training the human to operate the equipment.
If the equipment uses ergonomically sound human-equipment interfaces, the following items
have been accomplished:
• The equipment conforms to populational stereotypes and user expectations
• It is easy to learn how to operate the equipment
• Easily perceived displays and simple controls allow effective and efficient
communication between humans and the equipment
• The tasks allocated to humans and the equipment are based on known relative
strengths and weaknesses
• Relevant information is provided to the user by the equipment which avoids
reliance on the user’s memory
• Effective and efficient performance of equipment functions is facilitated
Whenever practicable, human engineering specialists should be used to help identify and solve
human engineering problems. However, this is not always possible. There are numerous human
factors references available; however, most of these references are directed to human factors or
human engineering specialists. The reference provided at the end of this activity has been
directed specifically toward the engineer or designer and provides a number of guidelines to
assist designers in doing their own human engineering. Its purpose is to provide a general
reference to key human factors questions and human-equipment interface design suggestions in a
form that engineers and designers can utilize with a minimum of searching or study.
References
Woodson, W., Human Factors Design Handbook Information and Guidelines for the Design of
Systems, Facilities, Equipment, and Products for Human Use, New York:McGraw-Hill Book
Company, 1981.

Engineering Activity E13: Software Reliability Studies

Software failures impact the ability of a piece of equipment to accomplish its intended function.
Therefore, the equipment’s reliability model must include appropriate software components; that
is, software reliability must be an integral part of equipment reliability concerns. In addition,
software must be managed to reduce these concerns within project constraints.
Software reliability is defined as the probability that software will perform its intended function
for a specified period of time, in a specified environment. Three key concepts are:
Failure. A failure is defined as an inability of equipment controlled by software to successfully
perform in accordance with its specified requirements. The source of the failure is an identified
software fault; the source of the fault is a human error.
Time. The measure of time includes calendar time, operational time, and central processing
unit (CPU) time. From a user perspective, calendar time and operational time are the most important.
From a modeling accuracy perspective, operational time and CPU time are the most important.
From a data collection perspective, calendar time is the easiest to collect while CPU time is the
most difficult.
Environment. The environment includes: the input domain scenario, profile, and tests being
conducted; the parts of the equipment being used during the tests; and the physical environment
in which the tests are being conducted. The actual operational environment of the software is of
the most interest. That is, the closer the test scenario, equipment configuration, and physical
environment are to the actual operating environment, the more accurate the software operational
reliability computations will be.
Software reliability management is concerned with meeting the software reliability goals by
building the software to satisfy requirements consistent with project constraints such as cost,
schedule, resources, and performance. Software reliability management has two complementary
elements: software design reliability and software operational reliability. Software design
reliability is concerned with improving the software life cycle processes and the individual
products; that is, the plans, specifications, code, and tests, that are the inputs and outputs of those
processes. An emphasis is placed on early defect prevention, fault detection and fault removal.
Software operational reliability is concerned with measuring how well the software performs or
is predicted to perform its intended function in its operational environment. The emphasis here is
on the use of testing and failure data measurements.
A checklist of activities that will improve software design reliability includes:
CHECK 1: Baseline Current Software Processes
− Define the current software development and support processes
CHECK 2: Identify Immediate Areas of Improvement
− Management
− Engineering
− Training
CHECK 3: Train Personnel in Priority Areas
− Software requirements
− Software testing
− Software configuration management
− Software inspections
The primary indicator of process improvement at this time is the use of software inspections to
identify and classify defects throughout the software life cycle. The intent is to find as many
defects as possible, conduct a root cause analysis to identify how the process might be improved
in order to reduce defects in the future, and measure the resources; that is, the time, personnel,
and costs, required to correct the defects. There is emerging research that is attempting to link the
early defect identification with the software operational reliability failure data.
A checklist of activities that will improve software operational reliability includes:
CHECK 1: Define equipment and software reliability goals
− Probability
− Failure intensity
− Fault density
CHECK 2: Analyze failure data from equipment test/operation
− Equipment identification data
Equipment Identification/Version : [name & version#]
Subsystem Identification/Version : [three characters]
Location of Equipment : [site name]
Software Release #/Version : [release #]
Software Component Version : [version #]
− Test execution data
Test Procedure/Sequence : [id#]
Test Start Date (Calendar) : [mo/da/yr]
Test Start Time (Operational) : [hh:mm:ss]
Test Start Time (Execution) : [hh:mm:ss]
Failure Time (Execution) : [hh:mm:ss]
Failure Time (Operational) : [hh:mm:ss]
Failure Date (Calendar) : [mo/da/yr]
Failure Classification : [1,2,3,4,5]
Problem Description : [text description]
Log File Data : [task logs]
− Failure identification data
Failure Identification # : [id #/ unique]
Failure Node : [component id#]
Failure Reference Id# : [previous failure]
Failure Correction Time (est) : [work days]
Failure Correction Time (act) : [work days]
Failure Correction Resources (est) : [person days]
Failure Correction Resources (act) : [person days]
− Management status data
Classification : [fatal|chg|info]
Priority : [1-high to 7-low]
Status : [open/closed & date]
Disposition : [acc/kill/def & date]
Scheduled Release : [release #]
CHECK 3: Apply failure classification scheme
Code  Severity                  Description of Failure
1.    Equipment Abort           A software or firmware problem that results in an
                                equipment abort or crash.
2.    Equipment Degraded,       A software or firmware problem that severely degrades
      No Work-around            the equipment and no alternative work-around exists;
                                restarts not acceptable.
3.    Equipment Degraded,       A software or firmware problem that severely degrades
      Work-around               the equipment and an alternative work-around exists;
                                process can continue with more operator action;
                                restarts not acceptable.
4.    Equipment Not Degraded    An indicated software or firmware problem that does not
                                severely degrade the equipment or any essential function;
                                restart acceptable.
5.    Minor Fault               All other minor problems/non-functional faults due to
                                software or firmware problems.
CHECK 4: Apply operational reliability model for the decision process
− Poisson process models are typical.
− When will software meet reliability goals?
− When can software release be delivered?
− What level of support will be required?
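To show the kind of calculation a Poisson process model supports, the sketch below uses the simplest case: a homogeneous Poisson process with constant failure intensity. This is an assumption made for illustration. The reliability growth models in the references let the intensity change as faults are removed, and the failure counts and goal used here are hypothetical.

```python
import math


def failure_intensity(failure_count, execution_hours):
    """Observed failures per hour of execution time."""
    return failure_count / execution_hours


def reliability(intensity, mission_hours):
    """Probability of no failure in mission_hours for a constant intensity."""
    return math.exp(-intensity * mission_hours)


def meets_goal(failure_count, execution_hours, goal_intensity):
    """True if the observed failure intensity is at or below the goal."""
    return failure_intensity(failure_count, execution_hours) <= goal_intensity


# Hypothetical test data: 4 failures observed in 200 hours of execution.
observed = failure_intensity(4, 200.0)
shift_reliability = reliability(observed, 8.0)  # an 8-hour shift
print(f"Intensity: {observed:.3f} per hour, 8-hour reliability: {shift_reliability:.3f}")
print("Goal met:", meets_goal(4, 200.0, goal_intensity=0.01))
```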

An example set of data collection, analysis, and reporting process flow steps includes:
STEP 1: Begin test sequence.
STEP 2: Collect equipment and execution data for each failure.
STEP 3: Send collected data to analysis personnel at end of test sequence.
STEP 4: Respond to queries from analysis personnel for more information.
STEP 5: Record failure and management status data.
STEP 6: Update software operational reliability data base.
STEP 7: Generate failure/fault count summary reports.
STEP 8: Update software operational reliability model.
STEP 9: Generate software operational reliability measures, graphs.
STEP 10: Provide summary of results to management on a regular basis.
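The record keeping in STEP 2 and the failure count summary in STEP 7 might be sketched as follows. The record holds only a subset of the data fields listed earlier, and the field names, types, and example values are hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FailureRecord:
    failure_id: str          # Failure Identification #
    equipment_id: str        # Equipment Identification/Version
    software_release: str    # Software Release #/Version
    failure_date: str        # Failure Date (Calendar), mo/da/yr
    execution_time_s: float  # Failure Time (Execution), in seconds
    classification: int      # Failure Classification, 1 to 5
    status: str = "open"     # Status

    def __post_init__(self):
        # Reject codes outside the five-level classification scheme.
        if self.classification not in (1, 2, 3, 4, 5):
            raise ValueError("classification must be 1, 2, 3, 4, or 5")


def failure_count_summary(records):
    """Count failures per classification code, including codes with no failures."""
    counts = Counter(rec.classification for rec in records)
    return {code: counts.get(code, 0) for code in range(1, 6)}


# Hypothetical records from one test sequence.
records = [
    FailureRecord("F-001", "etcher v2", "3.1", "03/14/92", 5400.0, 1),
    FailureRecord("F-002", "etcher v2", "3.1", "03/14/92", 9120.0, 3),
    FailureRecord("F-003", "etcher v2", "3.1", "03/15/92", 300.0, 3),
]
summary = failure_count_summary(records)
print(summary)
```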
The references provide more detail about software reliability.
References
Ireson, W., C. Coombs, Jr., editors, Handbook of Reliability Engineering, NY:McGraw-Hill,
1988.
Musa, J., A. Iannino, K. Okumoto, Software Reliability: Measurement, Prediction, Application,
NY:McGraw-Hill, 1987.
SETEC, "Software Reliability for SEMI/SEMATECH Companies (Draft)," SEMATECH,
SETEC-91-032, December 20, 1991.

Engineering Activity E14: Failure Modes and Effects Analysis (FMEA)

Failure modes and effects analysis (FMEA) is a technique for systematically identifying,
analyzing, and documenting the possible failure modes that exist for a piece of equipment and
the effects of such failures on the equipment’s performance. The term failure mode is used to
refer to the possible ways in which a component can fail. If the criticality of each failure mode is
analyzed, the analysis is called a failure modes, effects, and criticality analysis (FMECA). The
purpose of the criticality analysis is to rank each potential failure mode identified according to
the severity of the failure and its probability of occurrence, based on the best available data.
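A criticality ranking of this kind might be sketched as follows. The 1 to 10 scales, the use of severity multiplied by occurrence as the ranking score, and the example failure modes are all assumptions made for illustration; an actual FMECA would use whatever rating scheme the analysis team has agreed on.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    mode: str
    severity: int    # hypothetical scale: 1 (minor) to 10 (most severe)
    occurrence: int  # hypothetical scale: 1 (remote) to 10 (frequent)

    @property
    def criticality(self):
        # Assumed ranking score: severity times occurrence.
        return self.severity * self.occurrence


def rank_by_criticality(modes):
    """Return failure modes ordered from most to least critical."""
    return sorted(modes, key=lambda m: m.criticality, reverse=True)


# Hypothetical failure modes.
modes = [
    FailureMode("vacuum pump", "seal leak", severity=6, occurrence=4),
    FailureMode("gas valve", "stuck open", severity=9, occurrence=3),
    FailureMode("door sensor", "false trip", severity=2, occurrence=7),
]
ranked_modes = rank_by_criticality(modes)
for m in ranked_modes:
    print(f"{m.component}: {m.mode} (criticality {m.criticality})")
```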
FMEA
Equipment:                    Date:
Subsystem:                    Sheet:
Reference Drawing:            Fault Code #            Prepared By:
Columns: Subsystem/Module & Function | Potential Failure Mode | Potential Local Effect(s) of
Failure | Potential End Effect(s) of Failure | SEV | CR | Potential Cause(s) of Failure | OCC |
Current Controls/Fault Detection | Recommended Action(s)
The complexity of the equipment and the availability of data dictate the FMEA analysis approach
that will be used. There are two primary approaches for accomplishing an FMEA. One is the
hardware approach which lists individual hardware components and analyzes their possible
failure modes. The other is the functional approach which recognizes that every component is
designed to perform a number of functions that can be classified as outputs. These outputs are
listed and their failure modes are analyzed. For complex systems, a combination of the functional
and hardware approaches may be used. The FMEA may start at the highest equipment level and
proceed down to lower levels (top-down) or start at the lowest level and proceed to the highest
equipment level (bottom-up). The hardware approach is normally used when hardware
components can be uniquely identified from schematics, drawings, and other engineering and
design data. This approach is generally done bottom-up. The functional approach is normally
used when hardware components cannot be uniquely identified or when equipment complexity
requires analysis from the highest equipment level down through succeeding levels. This
approach is generally done top-down.
An FMEA analysis is used to:
• Ensure that all conceivable failure modes and their effects are understood
• Assist in the identification of design weaknesses
• Select design alternatives
• Select design improvements
• Prioritize corrective actions

• Select test programs

• Assist in troubleshooting existing equipment with operating problems
Since an FMEA concentrates on identifying possible component failures and their effects on the
equipment, design deficiencies can be identified and improvements can be made. Identification
of potential failures leads to a recommendation for an effective test program. Failure modes can
be prioritized according to their frequency so that concentrated effort can be placed on the higher
priority components; that is, on those components with the most failures. A limitation of the
FMEA analysis is that it considers each failure mode individually. If a single failure does not
affect the equipment but two or more failures do, the FMEA analysis is not well-suited to
assessing the combined effects of these failures on the equipment. As the equipment proceeds
through the life cycle phases, one may conduct a progressively more detailed FMEA analysis.
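The single-failure limitation can be illustrated with a small sketch. The two-pump cooling loop, the component names, and the success rule in `equipment_works` are hypothetical and chosen only for illustration; the equipment is assumed to keep running as long as the controller and at least one pump work.

```python
from itertools import combinations

# Hypothetical subsystem: two redundant cooling pumps and one controller.
COMPONENTS = ["pump_a", "pump_b", "controller"]

def equipment_works(failed):
    """Assumed success rule: the controller and at least one pump must work."""
    pumps_ok = not {"pump_a", "pump_b"} <= set(failed)
    return "controller" not in failed and pumps_ok

def failing_combinations(order):
    """Return every set of `order` simultaneous failures that stops the equipment."""
    return [set(c) for c in combinations(COMPONENTS, order)
            if not equipment_works(c)]

single = failing_combinations(1)   # what a one-at-a-time analysis would flag
double = failing_combinations(2)

# Double failures that do not already contain a critical single failure.
new_double = [d for d in double if not any(s <= d for s in single)]

print("single failures that stop the equipment:", single)
print("additional double failures:", new_double)
```

In this sketch neither pump failure alone stops the equipment, so an analysis that examines one failure mode at a time would flag only the controller. The pump pair appears only when combinations are examined, which is the kind of case a fault tree analysis is generally better suited to.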
An FMEA analysis consists of four steps:
1. Establishing the scope of the analysis
2. Collecting data
3. Preparing a components list
4. Preparing the FMEA worksheets
It is important to clearly state the scope of the FMEA analysis. Clearly identifying the boundaries
of the equipment so that no component within that equipment is left out is an important part of
the scope. Also included in the scope is the identification of underlying causes of failures and the
possible effects of these failures on the equipment. Failure detection, safeguards, frequency of the
failure, and the criticality of the effects of the failure information may also be included.
The type of information necessary to perform the analysis includes: equipment configurations,
designs, specifications, and operating procedures. Data may also be collected by interviewing:
design personnel; operations, testing, and maintenance personnel; component vendors; and
outside experts, to gather as much information as possible.
A list of all components in the equipment is prepared before examining the potential failure
modes of each of those components. Functions, operating conditions (such as temperature,
loads, and pressure), and environmental conditions of each component may be included in the
components list.
According to C. Sundararajan, the following questions are answered for every component of the
equipment.
1. How can the component fail? (There could be more than one mode of failure.)
2. What are the consequences (effects) of the failure?
3. How critical are the consequences?
4. How is the failure detected?
5. What are the safeguards against the failure?
How many of these questions are asked and which ones they are depends on the scope and
purpose of the analysis. When these questions are answered, all significant failure modes of the
different components are identified, their detection and safeguards are documented, and their
effects on the equipment are determined.

Findings of the FMEA analysis are recorded in a tabular format in FMEA worksheets. MIL-STD-
1629A describes the worksheets in detail.
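As an illustration only, a worksheet line can be represented as a record and the failure modes ranked by severity and probability of occurrence. The field names below follow the worksheet column headings; the 1 to 10 scales, the severity-times-occurrence ranking rule, and the example rows are assumptions made for this sketch and do not reproduce the criticality procedures of MIL-STD-1629A.

```python
from dataclasses import dataclass

@dataclass
class FmeaRow:
    """One worksheet line; field names follow the worksheet column headings."""
    item_function: str
    failure_mode: str
    local_effect: str
    end_effect: str
    severity: int      # SEV, assumed scale 1 (minor) to 10 (severe)
    cause: str
    occurrence: int    # OCC, assumed scale 1 (rare) to 10 (frequent)
    current_controls: str
    recommended_action: str

    def criticality(self) -> int:
        # Assumed ranking rule: severity times occurrence.
        return self.severity * self.occurrence

# Hypothetical worksheet entries.
rows = [
    FmeaRow("Wafer handler arm / transfer wafer", "Arm stalls", "Wafer not moved",
            "Equipment down", 7, "Motor bearing wear", 4, "Motor current alarm",
            "Add bearing inspection to maintenance"),
    FmeaRow("Gas line valve / meter process gas", "Valve sticks open", "Excess gas flow",
            "Scrapped wafers", 9, "Particle contamination", 2, "Flow sensor",
            "Add inlet filter"),
    FmeaRow("Cooling fan / cool power supply", "Fan stops", "Supply overheats",
            "Equipment shutdown", 5, "Dust buildup", 6, "Thermal cutout",
            "Add filter cleaning to maintenance"),
]

# Highest-ranked failure modes first, to prioritize corrective actions.
ranked = sorted(rows, key=lambda r: r.criticality(), reverse=True)
for r in ranked:
    print(f"{r.criticality():3d}  {r.item_function}: {r.failure_mode}")
```

A multiplicative score is only one possible ranking rule; it is used here because it combines the two factors named in the text in a single number.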
References
Sundararajan, C., Guide to Reliability Engineering Data, Analysis, Applications, Implementation,
and Management, NY:Van Nostrand Reinhold, 1991, pp. 146-152.
MIL-HDBK-338-1A, Electronic Reliability Design Handbook, Irvine, CA:Global Engineering
Documents, 12 October 1988, pp. 7-100 to 7-121.
MIL-STD-1629A, Procedures for Performing a Failure Mode, Effects, and Criticality Analysis,
Washington, DC:Department of Defense, 24 November 1980.

Engineering Activity E15: Equipment Characterization

The characterization of a piece of equipment involves identifying the optimal and extreme
operating ranges for:
• Electronic parameters, such as voltage, current, frequency
• Environmental parameters, such as temperature, humidity, vibration
• Mechanical adjustments, such as dial settings and clearances
• Faulty inputs, such as gas, water, electrical
• Operational characteristics, such as wafer handler arm velocity versus number of
broken wafers
During characterization, the preferred operating range of an individual component is determined
as well as the impact of this range on the other components in the equipment. Components that
do not have a range that interfaces properly or are totally incompatible with the other components
are replaced or redesigned. This process is continued until compatible ranges are established for
all components.
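The search for compatible ranges can be sketched as an interval intersection. The component names and voltage ranges below are hypothetical, and a real characterization would cover many parameters rather than one.

```python
def compatible_range(ranges):
    """Intersect the (low, high) operating ranges of all components.

    Returns (low, high) if a common range exists, otherwise None.
    """
    low = max(lo for lo, _ in ranges.values())
    high = min(hi for _, hi in ranges.values())
    return (low, high) if low <= high else None

def incompatible_components(ranges):
    """List components whose range fails to overlap at least one other component."""
    bad = []
    for name, (lo, hi) in ranges.items():
        for other, (olo, ohi) in ranges.items():
            if other != name and (hi < olo or ohi < lo):
                bad.append(name)
                break
    return bad

# Hypothetical preferred supply-voltage ranges in volts.
voltage = {"controller": (4.75, 5.25), "sensor": (4.5, 5.5), "motor_driver": (5.4, 6.0)}

print(compatible_range(voltage))          # no common range yet
print(incompatible_components(voltage))   # components involved in a conflict

voltage["motor_driver"] = (4.9, 6.0)      # after a hypothetical redesign
print(compatible_range(voltage))
```

Repeating the check after each replacement or redesign mirrors the iteration described above: the process stops once a common range exists for every parameter.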
Applicable Tools
AT8 Life Testing

Engineering Activity E16: Component Failure Analysis

The purpose of failure analysis is to determine what failed and why it failed. The root cause of
the failure is determined so that the correct change is made and the failure does not recur. Root
causes include operator or maintenance errors, overstressed parts, and factory assembly errors.
A useful tool for helping to determine root causes is a cause and effect diagram. Failure analysis
is most appropriate when either:
• A particular component type is failing at a significantly higher rate than previously
estimated or predicted
• The failure has a major impact on safety or performance or both.
All details concerning the failure are recorded. These include:
• The observed failure mode
• The relevant conditions under which the failure occurred
• How long the failed component was operating before failure (Operating time
should be estimated if it is possible that the component was inoperative a
significant amount of time before it was noticed.)
• Extenuating circumstances, such as damage occurring during troubleshooting
• Assembly drawings of mechanical components
• A copy of the component data sheet or specifications
• Circuit schematic diagrams for electronic components
• The undisturbed failed component within its assembly (This is particularly
important for mechanical or electromechanical components.)
The level or depth of the failure analysis depends on the level where the corrective action will be
taken. For instance, if a subassembly may be replaced by a more reliable one from another
supplier, it is only necessary to determine which subassembly failed. Otherwise, the specific
component that failed is found and replaced within the equipment.
The supplier of a failed component may perform the failure analysis at no cost since there is a
vested interest. Supplier application or product engineers can also be very helpful in pointing out
possible improvements to their components. Failure analysis laboratories are set up to analyze
components and can also be helpful. Major semiconductor suppliers and failure analysis
laboratories analyze components using visual microscopes, scanning electron microscopes,
dissection, and elemental analysis techniques.
Applicable Tools
AT3 Cause and Effect (Fishbone) Diagram

Engineering Activity E17: Failure Reporting, Analysis and Corrective Action

A Failure Reporting, Analysis and Corrective Action System (FRACAS) provides a closed-loop
feedback path by which data on failures occurring during field tests and operation are collected,
recorded, and analyzed to determine where problems are concentrated in the design. This
promotes continuous improvement in equipment reliability. A FRACAS is also used to track
internal test performance and provides a good historical basis for comparison to external
equipment performance.
[Figure: closed-loop FRACAS diagram. Recoverable labels: Design and Production; Test; Inspect; Correct; Reliable Product; Customer; Test Failure Report; Failure Reporting (Reports); Database; Analysis (Failure Investigation, Cause Investigation); Corrective Action (Analysis, Development, Implementation, Verification); Failure Review Board; Quality Assurance.]
Reprinted with permission ©1991 Society of Automotive Engineers, Inc.
A FRACAS is used to:

• Establish a closed-loop failure reporting system
• Establish procedures that are used to determine the cause of subsystem and
component failures
• Document the corrective actions taken
A closed-loop system is established because it allows one to collect, analyze, and
record failures down to a specified level; that is, to the subsystem, component, or part level.
Procedures are identified for initiating failure reports, analyzing failures, and feeding corrective
actions back into the design, manufacturing, and test processes. The closed-loop system includes
provisions to ensure that effective corrective actions are taken on a timely basis: a follow-up
audit reviews all open failure reports and the suspense dates for failure analysis and corrective
action, and delinquencies are reported to management. The failure cause for each failure is clearly
stated.
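A follow-up audit of this kind might be sketched as below. The record fields, report identifiers, and dates are hypothetical; an actual FRACAS data base would carry considerably more information per report.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class FailureReport:
    report_id: str
    component: str
    failure_cause: Optional[str]   # None until analysis states the cause
    suspense_date: date            # due date for the corrective action
    closed: bool = False

def audit(reports, today):
    """Return open reports that are past their suspense date (delinquent)
    and open reports that still lack a stated failure cause."""
    open_reports = [r for r in reports if not r.closed]
    delinquent = [r.report_id for r in open_reports if r.suspense_date < today]
    no_cause = [r.report_id for r in open_reports if r.failure_cause is None]
    return delinquent, no_cause

# Hypothetical failure reports.
reports = [
    FailureReport("FR-001", "RF generator", "Connector fatigue", date(1992, 3, 1), closed=True),
    FailureReport("FR-002", "Wafer handler", None, date(1992, 4, 15)),
    FailureReport("FR-003", "Vacuum pump", "Seal wear", date(1992, 6, 30)),
]

delinquent, no_cause = audit(reports, today=date(1992, 5, 5))
print("delinquent, report to management:", delinquent)
print("failure cause not yet stated:", no_cause)
```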
The objectives of a FRACAS are to:
• Assess historical reliability performance
• Develop a pattern of deficiencies
• Provide engineering data for corrective action
• Develop statistical data for
− component failure rates and downtime
− component selection suitability criteria
− component application reviews
− future designs and design reviews
− product improvement programs
− spares provisioning
− life cycle costing
• Develop contractual performance data
• Provide warranty information
• Furnish safety and regulatory compliance data
• Assess liability-claim information
References
A Reliability Guide to Failure Reporting, Analysis, and Corrective Action Systems, Milwaukee,
WI:American Society for Quality Control, 1977.
MIL-STD-785B, Reliability Program for Systems and Equipment Development and Production,
Task 104, Philadelphia, PA:Naval Publications and Forms Center, 1980.

Data Activity D1: Data Collection and Data
One of the building blocks for FRACAS is the collection of data and managing that data with a
data base management system. Together, they provide an organized way to gather factual data
about equipment performance - both good and bad.
Based on the reliability model for the equipment, a shopping list for data is established. Each
component or subsystem modeled in the fault tree or block diagram requires data in the form of a
failure probability or frequency. Several types of data are needed to determine the failure
probability and to assess product reliability:
• Cumulative operating time
• Number of failures
• Conditions present at the time of failure
There are three methods used for collecting reliability data. The first method involves the use of
a standardized reporting form that is filled out by engineers and technicians who are involved in
equipment testing, troubleshooting, and repair. These forms need to be simple to use and ask
only for needed information. An example of a reliability reporting form follows the description
of the three methods. So that those who collect the data better understand its final use and
importance, the data collection personnel, final test technicians, and field service engineers are
part of the team that designs the data collection form and are involved in analyzing the data.
The second method involves the use of customer database and equipment tracking information.
This requires an excellent ongoing customer-supplier relationship. Great care must be taken to
ensure compatibility between the supplier’s data and the data of multiple customers. Simply agreeing to
SEMI E10-90 specifications will not suffice, although basing the specifications on E10-90 makes
the data industry compatible. In addition, a standard way of attributing failures and assists to the
subsystems and components should be devised. Inclusion of key customer equipment engineers
in evaluating the validity of the data collected is very useful.
The third method is to use the on-board CPU power to monitor and track equipment status,
faults, and errors. Customers agree to allow the information to be downloaded to a floppy disk
and removed from the site. The ability to time stamp and match this information to customer
data base information provides useful data.

Example reliability reporting form (fields):
• Project/Model
• Part Name Affected
• Date Problem Found
• Part Number Affected
• Name of Major Component Affected
• Description of Problem (what, where, when, how many, etc.)
• Impact/Effect/Consequences of Problem
• Apparent Cause of Problem
• Remarks
• Reported By
• Date
• Referred Problem To

If there is no equipment in the field from which to collect data, there are several sources of data
available:
• Historical data
• Sub-tier supplier data
• In-house data
• Expert judgement
Historical data is data that has been collected for a previous generation of equipment or similar
equipment. The use of this data is limited to those subsystems and components that are similar
to those in current equipment. This data also requires that attention is paid to trends; that is, if
the subsystem or component had been undergoing improvements or if the methods of collecting
the data were changing, these must be accounted for. When a subsystem or component is
purchased from a supplier, that supplier should be able to supply the data that has been collected
for that part up to this point in time. Once a testing program exists for the equipment, in-house
data is available. For those subsystems and components that have none of the previous sources
of data available, expert judgement can be used to create initial reliability values. Expert
judgement takes the opinion of individuals who are considered to be knowledgeable about a
subsystem or component and uses this knowledge to create failure rates. It should be noted that
these sources of data do not always represent the environment and operating conditions that the
equipment will see in the field. Thus, the preferred source of data is always field data.
When collecting data, it is important to keep all of the data. This makes it possible to represent
the subsystem and component failure rates over a range of values and more accurately represents
the variety of environments and users that the subsystem and component will see.
It cannot be stressed enough that the validity of the reliability model and its predictions depend
on the validity of the data. A statement commonly used by software users is, "Garbage In,
Garbage Out," which is just as applicable here. As soon as possible replace historical and expert
judgement data with data collected during testing and operation in the field.
At this time it is important to discuss how the collected data is translated into failure rates that
are used to improve the equipment’s reliability. In a typical piece of equipment, some
components are under stress or used continuously while others are used cyclically. Thus, failure
rates can be defined as a function of time (per hour) or cycle (per wafer). In either case, the
collected data includes the number of cycles, wafers, or hours during which the failures occurred.
Failures are evaluated to assure that the failures were genuine and resulted in equipment
shutdown or lost production time. Once the evaluation is done, translating data into failure rates
is fundamentally simple. Suppose that a database includes 25 machines operating over a 9-month
period. If component A failed 20 times and the average operational time for the 25
machines was 70 percent (that is, the utilization factor is 0.70), the failure rate for component A
would be
\[ \lambda_A = \frac{20}{25 \times 9\ \text{mo} \times 30\ \text{days/mo} \times 24\ \text{hr/day} \times 0.70} = \frac{20}{113{,}400\ \text{hr}} \approx 1.8 \times 10^{-4}\ \text{failures/hr} \]
The corresponding MTBF is the reciprocal of this failure rate, about 5,670 hr.

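Suppose a second component, B, failed 12 times, but its failures relate to wafers processed, and the
machine averages 10 wafers/hr. The failure rate of component B would be
\[ \lambda_B = \frac{12}{25 \times 9\ \text{mo} \times 30\ \text{days/mo} \times 24\ \text{hr/day} \times 10\ \text{wafers/hr} \times 0.70} = \frac{12}{1{,}134{,}000\ \text{wafers}} \approx 1.06 \times 10^{-5}\ \text{failures/wafer processed} \]
Alternatively, expressed per hour, it would be
\[ \lambda_B = 1.06 \times 10^{-5}\ \text{failures/wafer} \times 10\ \text{wafers/hr} \approx 1.06 \times 10^{-4}\ \text{failures/hr} \]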
The key, of course, is knowing or estimating the utilization factor. This can be determined by
tabulating and averaging the operational times of all 25 machines. It can also come from groups
of machines, given general production information.
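The calculation can be sketched in a few lines. The function and parameter names are illustrative assumptions; the inputs are the figures of the example (25 machines, 9 months, utilization factor 0.70, 20 failures of component A, 12 failures of component B, 10 wafers/hr), and the sketch simply evaluates the bracketed expressions.

```python
def operating_hours(machines, months, utilization, days_per_month=30, hours_per_day=24):
    """Total operating hours accumulated by all machines in the database."""
    return machines * months * days_per_month * hours_per_day * utilization

def failure_rate_per_hour(failures, hours):
    """Time-based failure rate (failures per operating hour)."""
    return failures / hours

def failure_rate_per_wafer(failures, hours, wafers_per_hour):
    """Cycle-based failure rate (failures per wafer processed)."""
    return failures / (hours * wafers_per_hour)

hours = operating_hours(machines=25, months=9, utilization=0.70)

rate_a = failure_rate_per_hour(20, hours)              # component A, time based
rate_b_wafer = failure_rate_per_wafer(12, hours, 10)   # component B, cycle based
rate_b_hour = rate_b_wafer * 10                        # converted to a per-hour rate

print(f"operating hours: {hours:,.0f}")
print(f"A: {rate_a:.2e} failures/hr, MTBF about {1 / rate_a:,.0f} hr")
print(f"B: {rate_b_wafer:.2e} failures/wafer, {rate_b_hour:.2e} failures/hr")
```

Keeping the utilization factor as an explicit argument makes it easy to see how sensitive the resulting rates are to that estimate.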
Applicable Tools
AT18 User Groups
References
Bigelow, J., "Tailored Data Collection," Quality, August 1991, pp. 21-22.
SEMI E10-90, Guideline For Definition And Measurement Of Equipment Reliability, Availability,
and Maintainability (RAM), SEMI 1990, pp. 69-75.

Data Activity D2: Human Reliability Analysis (HRA)

Human reliability analysis (HRA) is a technique used to systematically identify, analyze,
quantify, and document the possible human failure modes within a design, and the effects of such
failures on the overall equipment reliability. Analyses of the behavior and needs of humans are
among the more controversial of the sciences; thus, it is no surprise that there are several
competing approaches to the handling and identification of people problems. The most widely
used quantitative HRA technique is the Technique for Human Error Rate Prediction (THERP),
developed at Sandia National Laboratories.
THERP is defined as a method to predict human error rates and to evaluate the degradation to a
man/machine system likely to be caused by human errors in association with equipment
functioning, operational procedures and practices, and other system and human characteristics
which influence system behavior.
There are five steps in applying the THERP model:
1. Define equipment failures
2. Identify related human operations and tasks related to each equipment failure
3. Estimate associated human error probabilities
4. Estimate the effects of the human errors on the equipment reliability
5. Recommend changes to the man/machine system and return to step 2
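Steps 3 and 4 can be illustrated with a deliberately simplified calculation. The sketch assumes that the steps of a task are independent, that any single error fails the task, and that no error is recovered; the full THERP method treats dependence and recovery explicitly, so this is not THERP itself. All probabilities and task names are hypothetical.

```python
from math import prod

def task_failure_probability(step_error_probs):
    """Probability that at least one step in the task is performed incorrectly,
    assuming independent steps and no error recovery (a simplification)."""
    return 1.0 - prod(1.0 - p for p in step_error_probs)

# Hypothetical human error probabilities for a maintenance task.
heps = {"read procedure": 0.003, "set gas valve": 0.01, "verify setpoint": 0.005}

p_human = task_failure_probability(heps.values())
hardware_reliability = 0.98                    # assumed, per task demand
combined = hardware_reliability * (1.0 - p_human)

print(f"task failure probability from human error: {p_human:.4f}")
print(f"reliability with human error included:     {combined:.4f}")
```

Recomputing `combined` after a proposed change to the procedure gives a rough sense of the benefit before returning to step 2.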
The NATO article listed below summarizes and explains the THERP model (and extols its
virtues). The article from Human Factors is an annotated bibliography of Sandia Laboratories
work in this area and will be very helpful to anyone trying to estimate the effects of human frailty
on a system. It also lists 44 sources of further information.
References
Ericson, D., editor, et al., Analysis of Core Damage Frequency: Internal Events Methodology,
NUREG/CR-4550, Volume 1, Revision 1, SAND86-2084, Albuquerque, NM:Sandia National
Laboratories, pp. 7-1 to 7-80.
Siegel, A., J. Wolf, A Technique for Evaluating Man-Machine Systems Design, Human Factors,
3:1, 1961.
Swain, A., H. Guttmann, Handbook of Human Reliability Analysis with Emphasis on Nuclear
Power Plant Applications, NUREG/CR-1278, SAND80-0200, Albuquerque, NM:Sandia
National Laboratories, August 1983.
Swain, A.D., Shortcuts in Human Reliability Analysis, Holland:Nordhoff Publishing Company,
NATO Advanced Study Institute on Generic Techniques in Systems Reliability Assessment,
1975.
MIL-HDBK-338-1A, Electronic Reliability Design Handbook, Vol. I of II, Irvine, CA:Global
Engineering Documents, 12 October 1988, pg. 7-100.

Test Activity T1: Test Plans

Testing activities are driven by the need to ensure that all goals and requirements for a piece of
equipment and its subsystems, components, and parts are achieved. Testing is the primary means
of generating enough data on critical components, parts, and subsystems to reduce the uncertainty
of data being fed into the equipment model. It also provides a predetermined strategy for testing
the equipment as a whole. The testing plan includes testing across all life cycle phases, and is
updated and refined as required. The plan changes as the equipment passes through each phase
of the life cycle and is updated at the time of transition from one phase to the next.
A testing plan encompasses all aspects of testing necessary to meet reliability goals. Since
testing is one of the basic tools in reliability improvement it is also a means of providing
continuous improvement. The testing plan includes procedures and criteria for:
• Testing equipment and subsystems
• Testing components and parts
• Reliability demonstration testing
For every test performed, there must be a clear definition of requirements so that the proper type
and number of tests are conducted, valid measurements are made, and the necessary data are
obtained. One good practice is to predict the expected results or level of performance based on
calculations or best engineering judgment. These predictions serve as a guide for monitoring the
tests and assessing the validity of the test results. During testing, it is not unusual to experience
unexpected failures. Some of these may be fluke conditions, but more often each failure is an
indication of a true problem. Thus, it is a good practice to include all test failures in the failure
statistics and investigations.
Specific tests should be planned to coordinate with the total testing program so that the derived
information has the maximum possible value for continuing application throughout later stages
of the program.
References
MIL-STD-781D, Reliability Testing for Engineering Development, Qualification, and
Production, Washington, DC:Department of Defense, 17 Oct 1986.
Arsenault, J., F. Roberts, editors, Reliability & Maintainability of Electronic Systems, Potomac,
MD:Computer Science Press, Inc., 1980, pp. 353-354.

Test Activity T2: Reliability Tests

Reliability testing consists of three categories of tests:
• Component and part tests
• Equipment and subsystem tests
• Reliability demonstration tests
Testing in these three categories is discussed in more detail in the following paragraphs.
Component testing involves testing individual components and parts to examine the relative
merits of alternative designs and to determine design margins. Such tests are also useful in
determining the validity of design and calculation methods.
Component testing forms an important part of development. At this stage, various components
and parts are tested over a wide range of conditions. This is done to ensure that the best of
several alternative designs will be chosen, and that the part or component will perform
satisfactorily at other than nominal conditions when integrated into the equipment. Problems
associated with component testing include realistically simulating equipment environments
(including parametric input and variation to the component or part) and determining the number of
tests required to demonstrate reliability. Thus, component testing is better suited to improving
reliability by optimum selection (that is, flushing out basic weaknesses in critical components
and parts) than to determining the absolute value of reliability.
There are several tools that are useful for testing components and parts. Accelerated testing can
be used to gather reliability data in a shorter period of time; it can be used with Environmental
Stress Screening (ESS) and Reliability Development/Growth Testing (RD/GT). ESS can be used
to stimulate failures by stressing the component or part to detect and remove early failures.
RD/GT is used to identify and correct failure modes and then to verify that the failure has been
eliminated. Life testing can be used to evaluate the useful life or reliability of a component or
part. Burn-In testing is used to screen out defects in the part or components during the respective
infant mortality periods. (See AT2).
Equipment testing involves testing of individual subsystems or the equipment itself. Equipment
testing is basic to reliability improvement. In order to achieve the best results, the equipment and
subsystems should be tested under conditions that closely simulate the expected operating
conditions. Equipment tests are intended to explore the effects of component and part
interactions under loading and environmental conditions of the real world. The tests are
conducted on an iterative basis; that is, they follow a test, fail, fix, and retest approach. This
approach is intended to find the failure mode for the weakest link and design it out, find the
second weakest link and design it out, and so on, until an adequate level of reliability
performance is achieved. Equipment tests are also performed to see whether certain
configurations are feasible or which of several are optimal with respect to performance, cost, and
modes of behavior under varying conditions.
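The test, fail, fix, and retest loop can be sketched numerically. The sketch assumes a series system with constant component failure rates, so that the system failure rate is the sum of the component rates, and it assumes that each fix halves the failure rate of the redesigned component. Both assumptions, the component names, and the rates are hypothetical.

```python
def system_mtbf(rates):
    """MTBF of a series system with constant component failure rates (per hour)."""
    return 1.0 / sum(rates.values())

def fix_weakest_links(rates, target_mtbf, improvement=0.5, max_rounds=20):
    """Repeatedly 'fix' the weakest link until the target MTBF is reached.

    `improvement` is an assumed factor by which each fix multiplies the
    failure rate of the component that was redesigned.
    """
    rates = dict(rates)
    history = []
    for _ in range(max_rounds):
        if system_mtbf(rates) >= target_mtbf:
            break
        weakest = max(rates, key=rates.get)      # highest failure rate
        rates[weakest] *= improvement            # design it out (partially)
        history.append((weakest, round(system_mtbf(rates), 1)))
    return rates, history

# Hypothetical failure rates observed in equipment tests (failures/hr).
observed = {"handler": 4e-3, "rf_source": 2e-3, "pump": 1e-3}
final, history = fix_weakest_links(observed, target_mtbf=300)
for component, mtbf in history:
    print(f"fixed {component:10s} -> system MTBF {mtbf} hr")
```

The weakest link changes as fixes are applied, which is why the loop re-identifies it on every pass rather than working through a fixed list.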
When testing on the equipment level, there is obviously no need to simulate internal
environments. The equipment has a lower reliability requirement relative to its components and
parts. This makes it easier to demonstrate an absolute reliability number, although doing so depends on
the cost and/or number of equipment and subsystems available for testing. If equipment testing is
started too soon, many failures will occur in components and parts that have not been sufficiently
proven out; this makes failure tracking difficult. Another disadvantage of starting equipment
testing too early is that if too many component and part failures occur, the remaining components
and parts will be subjected to too many start operations, which are perhaps more severe than
steady-state operation. Consequently, the observed failure distributions will give a false
impression compared with those expected in operation.
Equipment testing focuses on "Is the component or part reliable within the subsystem or
equipment?" Equipment testing does not eliminate component testing, but helps to pinpoint the
faulty components or parts, so that they may be replaced or modified by superior products.
Equipment testing is a way of realistically evaluating reliability as well as guiding component
and part improvement by systematically discovering problems and weaknesses.
There are several tools that are useful for testing subsystems and equipment. As with component
tests, accelerated testing can be used to gather reliability data in a shorter period of time. It can
also be used with Environmental Stress Screening (ESS) for subsystems and Reliability
Development/Growth Testing (RD/GT) for both subsystems and equipment. ESS is not done at
the equipment level; however, it is useful at the subsystem level. ESS can be used to stimulate
failures by stressing the subsystem to detect and remove early failures. RD/GT is used to identify
and correct failure modes and then to verify that the failure has been eliminated. Reliability
Qualification Testing (RQT) is used to verify that critical subsystems and the equipment meet
design goals and comply with contractual/program objectives. Life testing can be used to
evaluate the useful life or reliability of a subsystem or the equipment. Burn-In Testing is used to
screen out defects during a subsystem’s or equipment’s infant mortality period.
Reliability Demonstration Tests are used to demonstrate, often to the customer, that the
equipment is capable of meeting its specified performance and reliability for a stated period of
operation. This type of test can be very expensive and requires careful planning and execution.
The equipment and its associated subsystems, components, and parts that are going to be tested,
and the test conditions to be used must be closely controlled to ensure the validity of the final
results. It is often the practice to disassemble the items totally after the tests are completed to
inspect each one for wear, damage, or signs of impending failure.
A tool that is very useful for reliability demonstration tests is Reliability Qualification Testing
(RQT). RQT is used to verify that the equipment will meet design goals and comply with
contractual/program requirements.
Applicable Tools
AT1 Accelerated Testing
AT2 Burn-In Testing
AT8 Life Testing
AT13 Reliability Development/Growth Testing (RD/GT)
AT14 Reliability Qualification Testing (RQT)

References
Lloyd, D.K., M. Lipow, RELIABILITY: Management, Methods, and Mathematics, Second
Edition, Milwaukee, WI:The American Society for Quality Control, 1991, pp. 349-354.

Applicable Tool AT1: Accelerated Testing

Accelerated tests are performed when the test time necessary to provide adequate reliability
assurance under normal operating conditions is inordinately long, and therefore very expensive.
Gathering reliability data should not hold up the development of the equipment and it should be
as economical as practicable. Therefore, it is important to be able to accelerate reliability tests.
Reliability tests can be accelerated by increasing the sample size, provided that the item being
tested does not have wearout characteristics during its anticipated life. Increasing the sample size
is appropriate for small, cheap items that can be produced in quantity. Using the large sample
reduces the error in the reliability estimate for the population due to part-to-part variability.
However, large-sample reliability tests, to provide a high total operating time, should be
supported by some long duration testing if there is reason to suspect that failure modes exist
which have high times to first failure. Extrapolation of reliability data over long periods of time
must be treated with caution, and therefore whenever practicable, supporting long duration tests
should be considered.
A particular type of large sample test is sudden death testing, in which the sample is split into
subgroups and the time to first failure in each group is collected.
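A minimal sketch of the data reduction in a sudden death test follows. The failure times are hypothetical; in a real test each subgroup is stopped at its first failure, so the later times listed here would never be observed and are included only to show the test time saved.

```python
def sudden_death(groups):
    """Return the time to first failure in each subgroup."""
    return [min(times) for times in groups]

# Hypothetical failure times (hours) for 12 items split into 4 subgroups of 3.
groups = [
    [410, 950, 1220],
    [305, 780, 1500],
    [660, 700, 1340],
    [520, 890, 1010],
]

first_failures = sorted(sudden_death(groups))

# Unit-hours actually run: every item in a subgroup stops at the first failure.
unit_hours_run = sum(len(g) * min(g) for g in groups)
# Unit-hours needed if every item were run to failure instead.
unit_hours_full = sum(sum(g) for g in groups)

print("first-failure times:", first_failures)
print("unit-hours run:", unit_hours_run, "versus", unit_hours_full)
```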
Increasing the severity of the test is an obvious approach when large samples cannot be provided.
However, there are two problems:
1. What is the equivalent operating time under normal stress?
2. Are the failures induced under the accelerated test conditions the same as those
that might occur under normal conditions?
Another type of accelerated testing is step-stress testing. Step-stress testing is a technique
whereby the item is tested initially at normal stress, but after a certain time the stress is increased,
and stepwise increases are continued until the item fails.
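The step-stress procedure can be sketched as a stress profile plus a loop that runs until failure. The temperature values, step size, and dwell time are hypothetical, and `fails_at_stress` is a stand-in for the real item under test; the sketch also assumes the item fails as soon as a step reaches that level.

```python
def step_stress_profile(normal_stress, step, dwell_hours, max_steps):
    """Yield (start_hour, stress) pairs: begin at normal stress, then raise
    the stress by `step` after each dwell period."""
    for i in range(max_steps):
        yield i * dwell_hours, normal_stress + i * step

def run_step_stress(profile, fails_at_stress):
    """Walk the profile until the stress reaches the level at which the item
    fails; returns (start_hour, stress) of the failing step, or None."""
    for start_hour, stress in profile:
        if stress >= fails_at_stress:
            return start_hour, stress
    return None  # item survived every step

# Hypothetical temperature step-stress: 25 C normal, +15 C every 24 hours.
profile = step_stress_profile(normal_stress=25, step=15, dwell_hours=24, max_steps=8)
result = run_step_stress(profile, fails_at_stress=92)
print("failing step (start hour, stress):", result)
```

Capping the profile with `max_steps` is one simple way to keep the stress within a level at which the failure modes remain realistic.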
It is important in accelerated testing to ensure that unrealistic failure modes are not introduced by
the higher stresses. The physics of the materials being tested and analysis of failures should
indicate whether or not such failure modes are likely to occur or be stimulated. Obviously,
failure modes that can occur only at stresses well above the maximum operating stress will not be
of interest. For example, increasing temperature beyond a certain level may change the strength
of a material, so it is important that temperature increments are kept within limits. It is also
possible that interactions may occur between different stresses, so that the combined weakening
effect is greater than would be expected from a simple additive process.
References
Hall, I., W. Cramond, D. Huffman, Summary of the SETEC Accelerated Testing Workshop,
SETEC91-017, Albuquerque, NM:Sandia National Laboratories, 1991.
O’Connor, P., Practical Reliability Engineering, Third Edition, NY:John Wiley & Sons, 1991,
pp. 264-267.

Applicable Tool AT2: Burn-In Testing

Burn-in is a special type of test that might be better described as a pre-delivery operation of the
equipment. The following figure shows the life cycle failure probability as a curve that
resembles a cross-section of a common bathtub.
The left decreasing portion of the curve is the infant mortality period, where a disproportionate
number of failures occur early in the equipment’s lifetime. The flat part represents the constant
failure rate during the useful life of the equipment. The right increasing portion is the wear-out
period. It is useful to know, as closely as possible, where the infant mortality ends and the wear
out starts, even when burn-in tests are not performed.
Burn-in has proven to be an effective means of screening out defects during a component’s infant
mortality period. The typical burn-in test combines electrical stresses with temperature cycling
for short periods of time to activate temperature and voltage failure mechanism dependencies.
The two types of burn-in tests are static and dynamic. In static burn-in, a bias may be applied to
the device under test at very high temperatures. In dynamic burn-in, entire circuit cards may be
operated to simulate actual equipment operation.
Screening out the infant mortality failures results in more reliable components. Because most of
the failures occur during the infant mortality phase of the component’s life, this method of testing
results in reliability improvement of the equipment.
Burn-in tests are usually conducted on 100% of the production units to weed out production
errors related to minor variations in workmanship and process fluctuations that result from
engineering changes. Burn-in tests also discover some residual design errors. In these tests, the
stresses applied are usually within published performance constraints, and are applied for short
periods of time. Their purpose is to prevent production-related errors from being shipped.
Products that have undergone burn-in tests should be failure free.

References
Klinger, D., Y. Nakada, M. Menendez, AT&T Reliability Manual, NY:Van Nostrand Reinhold,
1990, pp. 52-57.
Punches, K., "Burn-In and Strife Testing," Quality Progress, May 1986, pp. 93-94.

Applicable Tool AT3: Cause & Effect (Fishbone) Diagram

The cause-and-effect diagram was invented by Dr. Kaoru Ishikawa to represent the relationship
between some effect, that is, a problem, and all the possible causes influencing it. The diagram is
also called an Ishikawa diagram, or a fishbone diagram because a well-detailed diagram will take
on the shape of fishbones. The main problem is indicated on a horizontal line and possible
causes of that problem are shown as branches. A common set of major categories for causes
consists of
• Personnel
• Work methods
• Materials
• Equipment
• Environment
For each cause ask, "Why does it happen?" and list responses as branches off the major causes.
The causes shown as branches can have sub-causes, indicated by sub-branches, and so on.
References
Ishikawa, K., Guide to Quality Control, White Plains, NY:Quality Resources, 1982, pp. 8-29.
O’Connor, P., Practical Reliability Engineering, Third Edition, New York:John Wiley & Sons,
1991, pp. 311-312
The Memory Jogger, Methuen, MA:GOAL/QPC, 1988, pp. 24-29.

Applicable Tool AT4: Competitive Benchmarking

Competitive Benchmarking is an ongoing formal process used by a company to measure and
compare their products, services, and operations against their toughest competitors and those
companies demonstrating world class performance. The aim of this process is to identify the
leading companies’ secrets and use this information to establish goals and priorities, and target
areas for improvement.
The process itself is straightforward and simple; Industry Week outlines the benchmarking
process with a list of 10 steps. However, the simplicity of the process belies its true power. One
aspect of benchmarking that sets it apart is that it directs a company’s focus outside their own
walls - aimed squarely at the marketplace and their competition. This leads to setting goals that
are geared toward being the best in the world, not just slightly better than last year.
Another benefit of benchmarking is that it can provide the blueprints for how a company can leap
ahead of even the best of its competitors. Improvements are not only in the equipment but in
secondary and supporting systems and processes.
Other benefits of benchmarking include:
• Identifying the keys for success for each area studied
• Providing specific quantitative targets
• Creating an awareness of state-of-the-art approaches
• Cultivating a culture where change, adaptation, and continuous improvement are
actively sought out
• Spotting emerging competitors and seeing where the company should be going in
the future

References
Altany, D., "Copycats," Industry Week, November 5, 1990, pp. 11-18.
Camp, R., Benchmarking: The Search For Best Practices That Lead To Superior Performance,
Milwaukee, WI:ASQC Quality Press, 1989.
Pryor, L., Beating The Competition: A Practical Guide To Benchmarking, Washington
DC:Kaiser Associates, 1988.
Competitive Benchmarking: What It Is And What It Can Do For You, Stamford, CONN:Xerox
Corporate Quality Office, Reference No. 700P90201, May 1987.

Applicable Tool AT5: Design of Experiments (DOE)

Design of Experiments (DOE) refers to a collection of methods, largely but not exclusively
statistical, for collecting and analyzing data under controlled conditions. This collection includes
methods for the design and analysis of simple experiments, as well as strategies for moving from
one experiment to the next based on previous results. The goal of all these methods is to
maximize the information contained in and available from relatively little data.
Experiments are performed for a variety of purposes, some exploratory, others confirmatory.
Exploratory experiments include those aimed at cause detection, as well as those designed to
accomplish the Taguchi goals of parameter design and tolerance design. Confirmatory
experiments include, for example, process qualification studies.
Regardless of an experiment’s purpose, the experimenter must face and deal with three issues
common to all experimental situations:
• Response variability
• Known but extraneous systematic effects
• Extraneous unknown effects
The general strategies used in experimental design to deal with these issues are replication,
blocking, and randomization, respectively.
Other aspects of experimental design include the:
• Selection of factors and determination of factor levels
• Selection of response
• Selection of the specific combination of factor levels at which to run the
experiment
• Precise specification of the experimental procedure to be followed
Each of these activities is governed by the experiment’s purpose.
Methods for analyzing experimental data can be either numerical or graphical. The most common
family of numerical techniques falls under the heading of analysis of variance (ANOVA) and includes
formal hypothesis tests, confidence intervals, and multiple comparison procedures. Graphical
methods of analysis include simple histograms and dot-frequency plots, normal probability plots
of effects and residuals, and Bayes plots.
An important family of experimental designs are the full-factorial and fractional-factorial.
Usually implemented with 2-level factors, they can be readily extended to multi-level factors. A
serious drawback of a multi-level factorial design is its expense: a full factorial with k factors at
L levels each requires L^k runs, so the number of experiments grows rapidly as levels are added. To
a large extent, this is the reason for the popularity of 2-level factorial designs in initial
screening experiments.
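To make the cost of adding levels concrete, the following Python sketch enumerates the runs of a full-factorial design. It is an illustration under stated assumptions: the helper name `full_factorial` and the coded levels (-1, 0, +1) are choices made here, not part of the guideline.

```python
from itertools import product


def full_factorial(levels_per_factor):
    """Return every combination of factor levels, one tuple per experimental run."""
    return list(product(*levels_per_factor))


# Three hypothetical factors at two coded levels each (-1 = low, +1 = high).
two_level = full_factorial([(-1, 1)] * 3)

# The same three factors at three coded levels each.
three_level = full_factorial([(-1, 0, 1)] * 3)

# Number of runs required by each design: 2**3 versus 3**3.
run_counts = (len(two_level), len(three_level))
```

With three factors, moving from two to three levels raises the run count from 8 to 27, before any replication is added.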
The purpose of factorial/fractional-factorial designs is:
• Screening and first pass optimization
• Investigating the effect of many factors simultaneously
• Assessing interactions or coupling of factor effects

In particular, in the presence of interactions, full-factorial and fractional-factorial designs are
superior to one-at-a-time strategies.
Fractional-factorial designs are useful for screening and are highly efficient for large numbers of
factors. However, one assumes that only low-order interactions are present. When the
experiment is run with center points, both full-factorial and fractional-factorial designs can signal
curvature or non-linearity. When used with steepest-ascent methods, factorial designs provide
efficient second order optimization.
The final stage of optimization can be achieved using response-surface methods. These methods
are usually based on a second degree polynomial model that allows estimation of curvature.
Although multi-level factorial designs could be used for fitting higher order surfaces, the family
of central-composite designs, built up from fractional-factorial or full-factorial designs by
adding selected axial points, offers a more economical alternative.
References
Box, G., W. Hunter, J. Hunter, Statistics for Experimenters, An Introduction to Design, Data
Analysis, and Model Building, New York:John Wiley and Sons, 1978.
Taguchi, G., Introduction To Quality Engineering: Designing Quality into Products and
Processes, White Plains, NY:UNIPUB/Kraus International Publications, 1987.

Applicable Tool AT6: Environmental Stress Screening (ESS)

Environmental Stress Screening (ESS) is a modern production tool used to increase reliability. It
has been particularly useful in the electronics industry. The methodology consists of the
application of environmental inputs; that is, electrical, thermal and mechanical stresses, to
equipment to accelerate the occurrence of potential failures. The environmental inputs are
chosen to maximize defect identification in a minimum amount of time without creating any new
defects.
ESS is used to stimulate failures by stressing subsystems, components, and parts to detect and
remove early failures due to weak subsystems, components, and parts; workmanship defects; and
other nonconformance anomalies. It is particularly useful in uncovering process-induced defects.
The stressing does not need to simulate the precise operating environment. However, the
subsystem, component, or part is cycled through its operational modes while simultaneously
being subjected to the required environmental stresses. Many stressors have been found to be
effective in ESS, including temperature cycling, random vibration, altitude, humidity, and sound.
Rapid thermal cycling and random vibration are the most commonly used environmental screens
and are effective in detecting most types of latent defects.
The type of test done at the component level will usually be different from the type of test done at
a higher or lower level. At lower levels, stronger screens can be used without damaging the
product. This is desirable because a stronger screen normally results in a higher defect detection
rate and lower repair costs later in the life cycle. At higher levels, ESS can be used to identify
intermittent failures effectively through power-on cycling. Equipment level screens are not
advisable because lower-cost screens tend to precipitate most screenable defects.
Even without ESS, some defects will be found before delivery to the customer. However, many
defects will remain and cause service difficulties, particularly early in service. With ESS, many
more of the existing errors are found at the factory, which leads to an improvement in in-service
reliability. If a pattern of defects is observed, changes are made in the manufacturing design,
manufacturing method, or both to eliminate the root cause of the problem. This means fewer
defects and even lower manufacturing costs.

References
Bailey, R., R. Gilbert, "STRIFE Testing for Reliability Improvement," PROCEEDINGS -
Institute of Environmental Sciences, Vol. 1, 1981, pp. 119 - 123.
Bird, C., "Unit Level Environmental Screening," PROCEEDINGS - Institute of Environmental
Sciences, May 1980, pp. 63 - 64.
Punches, K., "Burn-In and Strife Testing," Quality Progress, May 1986, pp. 93 - 94.
Tustin, W., "Shake and Bake the Bugs Out," Quality Progress, Sept. 1990, pp. 61-64.
MIL-STD-785B, Reliability Program For Systems And Equipment Development And Production,
Task 301 Environmental Stress Screening, 3 July 1986, pp. 301-1 to 301-2.
MIL-STD 810E, Environmental Test Methods And Engineering Guidelines, 14 July 1989.
RMS Committee, RMS Reliability, Maintainability & Supportability Guidebook, SAE G-11,
Warrendale, PA:Society of Automotive Engineers, Inc., 1990, pp. 203-209.

Applicable Tool AT7: Fault Tree Analysis

Fault Tree Analysis (FTA) is one of the most widely used and versatile methods of deductive
analysis. Deductive analysis constitutes reasoning from the general to the specific. For example,
the equipment has failed; now, what events have caused the failure of that equipment? This
approach is commonly called the "Sherlock Holmesian" approach. Holmes, faced with given
evidence, has the task of reconstructing the events leading up to the crime. Indeed, all successful
detectives are experts in deductive analysis.
[Figure: fault tree icon. Top event: Equipment Failure. Subsystems: SS1, SS2, SS3. Components: C1 to C5. Parts: P1 to P4.]
FTA is used to determine the various combinations of events; that is, component-level failures,
that could result in equipment failure. Component-level failures include hardware failures,
human errors, and software errors. A failure can range from noncompliance with specifications
to the inability of a component to perform its intended function. Component-level failures, in
fault tree (FT) terminology, are called primary events. Equipment failure refers to an undesired
state of the equipment; such as, the equipment stops functioning or makes bad products.
Equipment failure, in fault tree terminology, is called the top event. A fault tree is not a model of
all possible equipment failures or all possible causes of equipment failure. A fault tree is tailored
to its top event; that is, the fault tree only includes those failures that cause that top event to
occur.
Construction of a FT begins by defining what the top event is, for example, failure of the
equipment at less than 1000 hours. The next step involves determining the various ways that this
failure can occur. This is initially done at a fairly gross level. (For example, equipment failure
due to failure of the wafer handler subsystem). Once the equipment is modeled at a gross level;
that is, the model consists of 10 to 20 major subsystems, the next step is to determine which of
the subsystems should be modeled in more detail. If a particular subsystem rarely fails and it is
anticipated that this situation will not change, it would be a waste of time and effort to model it.
Concentrate instead on those subsystems that cause the equipment to frequently or
catastrophically fail. Those subsystems that are targeted as a reliability problem for the
equipment are broken into more detail. For example, the wafer handler subsystem could be
broken into the arm, associated software, and electrical components. Only those portions of the
wafer handler subsystem that significantly contribute to failure of that subsystem are broken into
more detail. This process is continued for all identified subsystems until all potential ways of
failing the equipment are identified.
The remainder of the description of this tool will focus on a general description of fault tree
analysis and the Boolean algebra necessary to quantify the fault tree into an equipment failure
rate. The references at the end of the description provide more detailed information.
At the top of the FT the top event is listed within a rectangle. The icon at the beginning of this
tool description has labeled its top event Equipment Failure. Next, the question, "How can the
equipment fail?" is asked. All those events; that is, subsystems, that can cause equipment failure
are placed in the FT under the top event, see Subsystem 1 (SS1), Subsystem 2 (SS2), and
Subsystem 3 (SS3) in the icon. Gates are used to connect the events. The OR gate between the top
event, equipment failure, and the subsystem events SS1, SS2, and SS3 indicates that failure of
SS1, SS2 or SS3 will cause the equipment to fail. Some of the symbols used in a fault tree
include:
Primary Events
    Basic Event          A basic failure requiring no further development.
    Undeveloped Event    An event that is not further developed, either because it is
                         insignificant or because information is unavailable.
Gates
    AND Gate             Output fault occurs if all the input faults occur.
    OR Gate              Output fault occurs if at least one input fault occurs.
Transfer Symbols
    Transfer In          Indicates that the tree is developed further on another page.
    Transfer Out         Indicates that this portion of the tree connects at the
                         corresponding transfer in.
There are other less-used events and gates that are described in texts on FTA. As can be seen in
the icon, SS1 fails if component 1 or 2 (C1 or C2) fails. C2 fails only if both parts 1 and 2 (P1
and P2) fail. SS3 fails if components 3, 4, and 5 (C3, C4, and C5) all fail. Failure of C4 requires
either part 3 (P3) or part 4 (P4) to fail.
Once construction of the fault tree is completed, it is translated into an equation that is used to
quantify the equipment failure rate. Fault trees are based on Boolean algebra. Boolean algebra is
the mathematical manipulation of events derived from logical reasoning. The references discuss
Boolean algebra in detail; it will not be discussed here. The Boolean equations for the icon fault
tree are:

Equipment Failure = SS1 + SS2 + SS3

SS1 = C1 + C2
C2 = P1 * P2
SS3 = C3 * C4 * C5
C4 = P3 + P4
where + means OR, and * means AND.
Substituting into the equipment failure equation,
Equipment Failure = C1 + P1 * P2 + SS2 + C3 * (P3 + P4) * C5
expanding and using the associative and distributive laws
Equipment Failure = C1 + P1 * P2 + SS2 + C3 * P3 * C5 + C3 * P4 * C5.
Each of the terms in this equation is a scenario that leads to the top event; for example, C1 is a
failure of component 1 which leads to equipment failure.
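The scenarios in the expanded equation can be cross-checked by brute force. The Python sketch below encodes the gate logic of the icon fault tree and searches for the smallest sets of basic events that cause the top event. The function names are hypothetical, and exhaustive search is used here only because the example tree is tiny; it is not how a production FTA tool would work.

```python
from itertools import combinations

# Basic events of the icon fault tree (SS2 is treated as undeveloped).
BASIC_EVENTS = ["C1", "P1", "P2", "SS2", "C3", "P3", "P4", "C5"]


def equipment_fails(failed):
    """Evaluate the icon fault tree for a set of failed basic events."""
    c2 = "P1" in failed and "P2" in failed          # C2 = P1 * P2
    ss1 = "C1" in failed or c2                      # SS1 = C1 + C2
    c4 = "P3" in failed or "P4" in failed           # C4 = P3 + P4
    ss3 = "C3" in failed and c4 and "C5" in failed  # SS3 = C3 * C4 * C5
    return ss1 or "SS2" in failed or ss3            # top event


def minimal_scenarios():
    """Find the smallest sets of basic events that cause the top event."""
    found = []
    for size in range(1, len(BASIC_EVENTS) + 1):
        for combo in combinations(BASIC_EVENTS, size):
            candidate = frozenset(combo)
            # Skip any set that contains an already-found smaller scenario.
            if any(smaller <= candidate for smaller in found):
                continue
            if equipment_fails(candidate):
                found.append(candidate)
    return found


scenarios = [sorted(s) for s in minimal_scenarios()]
```

The search returns five sets, one for each term of the expanded equation: {C1}, {SS2}, {P1, P2}, {C3, P3, C5}, and {C3, P4, C5}.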
In the IC equipment industry, the fault tree will consist almost entirely of OR gates. This means
that every primary event is a scenario leading to the top event. AND gates are used when there is
redundant equipment. Redundancy is a principle often used for critical safety functions.
Now that the fault tree has been translated into an equation, it is time to quantify the probability of
the top event as a function of the primary events. Often, the term probability is used when what
is really meant is frequency. Probabilities must lie between 0 and 1, whereas a frequency can be any
number greater than or equal to 0, depending on the number of events that occur and the time
scale. For example, if a component fails twice per year, its frequency is 2/yr, or about 0.17/mo.
Using the previous example, the probability of the Equipment Failure can be written,
P(Equipment Failure) = P(C1 + P1 * P2 + SS2 + C3 * P3 * C5 + C3 * P4 * C5).
But, how does one deal with the right-hand side of the equation? Considering the basic laws of
probability and the small probability approximation, and assuming that the events are
independent, the example equation becomes:
P(Equipment Failure) = P(C1) + P(P1)*P(P2) + P(SS2) + P(C3)*P(P3)*P(C5) +
P(C3)*P(P4)*P(C5).
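A short Python sketch can show how close the small probability approximation is to the exact answer for independent events. All probability values below are assumed purely for illustration, and the function names are hypothetical; the exact figure is obtained by enumerating every combination of failed and working events, which is practical only for a small tree like this one.

```python
from itertools import product
from math import prod

# Hypothetical failure probabilities for the basic events (assumed values).
p = {"C1": 0.01, "P1": 0.05, "P2": 0.05, "SS2": 0.02,
     "C3": 0.10, "P3": 0.03, "P4": 0.04, "C5": 0.10}

# One tuple per term of the expanded equipment failure equation.
SCENARIOS = [("C1",), ("P1", "P2"), ("SS2",),
             ("C3", "P3", "C5"), ("C3", "P4", "C5")]


def small_probability_approximation(p):
    """Sum of scenario probabilities, as in the approximate equation."""
    return sum(prod(p[e] for e in scenario) for scenario in SCENARIOS)


def exact_probability(p):
    """Enumerate every failed/working state of the independent events."""
    names = list(p)
    total = 0.0
    for state in product([False, True], repeat=len(names)):
        failed = {n for n, is_failed in zip(names, state) if is_failed}
        if any(all(e in failed for e in scenario) for scenario in SCENARIOS):
            # Probability of this particular state under independence.
            total += prod(p[n] if f else 1.0 - p[n]
                          for n, f in zip(names, state))
    return total


approx = small_probability_approximation(p)
exact = exact_probability(p)
```

With these assumed values the approximation slightly overstates the exact probability, which is the expected behavior: the sum counts overlapping scenarios more than once, and the overlap is negligible only when the individual probabilities are small.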
References
Dhillon, B.S., Quality Control, Reliability, and Engineering Design, New York:Marcel Dekker,
Inc., 1985, pp. 154-163.
Roberts, N., W. Vesely, D. Haasl, F. Goldberg, Fault Tree Handbook, NUREG-0492,
Washington, DC:U.S. Nuclear Regulatory Commission, January, 1981.
Sundararajan, C., Guide To Reliability Engineering Data, Analysis, Applications,
Implementation, and Management, New York:Van Nostrand Reinhold, 1991. pp. 153-285.

Applicable Tool AT8: Life Testing

Life testing is used to estimate and demonstrate the numerical reliability of a part, component,
subsystem, or piece of equipment; that is, evaluate its useful life or reliability. Part and
component tests are typically performed to examine the relative merits of alternative designs and
to determine design margins. Subsystem and equipment tests are intended to explore the effects
of component and part interactions under the loading and environmental conditions of day-to-day
use. Life tests can be carried out either at normal operating conditions or at accelerated stress
levels.
When performing life testing, one is not only interested in when an item fails, but also in which
part in a component or which component in a piece of equipment fails. Other considerations
include: 1) the mode or modes of failure; that is, the types of failure, as exemplified by
performance drift, erratic performance, and catastrophic failure, and 2) the mechanism of failure;
that is, the reasons for failure, such as poor design or part misapplication; in other words,
the how and why of failure. Time-to-failure testing by actually generating a failure, together with
the subsequent failure analysis, helps to find answers to these questions when time is the critical
parameter. Many types of electronic, electromechanical, and hydraulic equipment fall into this
category when they are continuously operating or experiencing a large number of cycles wherein
the transient conditions of starting and stopping are not more severe than the accumulation of
time. Life testing indicates how much more (or less) life the equipment has than is required for
operational use. This in turn allows priorities for reliability improvement to be established.
A subset of life testing is truncated (life) testing. Often a life test may be truncated before all test
units have failed due to time limitations. Truncated data arises when, either by accident or
design, values for all test items are not available. Truncated data is distinct from missing data.
The type of analysis done on truncated data depends on the type of test plan and on the objectives
of the test. There are many test plans that yield truncated data and methods for designing these
test plans are well developed. The analysis methods are also well known.
Life tests can be truncated in various ways; for example, the test might be stopped when a
predetermined number of units have failed or when a specified amount of test time has elapsed.
The truncation of the test depends on the resources available and the goals of the test.
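As one example of analyzing truncated data, the Python sketch below computes a point estimate of mean time between failures as total time on test divided by the number of failures. It assumes a constant failure rate (exponential lifetimes) and no replacement of failed units; the function name `mtbf_estimate` and the test data are hypothetical, and other test plans call for other analysis methods.

```python
def mtbf_estimate(failure_times, n_units, stop_time=None):
    """Point estimate of MTBF from a truncated life test.

    Assumes a constant failure rate and no replacement of failed units;
    both are assumptions of this sketch.

    failure_times: observed failure times of the units that failed.
    n_units: total number of units placed on test.
    stop_time: time at which the test was stopped (time truncation).
               If None, the test is taken to end at the last observed
               failure (truncation at a predetermined number of failures).
    """
    r = len(failure_times)
    if r == 0:
        raise ValueError("at least one failure is needed for a point estimate")
    if r > n_units:
        raise ValueError("more failures than units on test")
    end = max(failure_times) if stop_time is None else stop_time
    if end < max(failure_times):
        raise ValueError("stop_time is earlier than an observed failure")
    # Failed units contribute their failure time; survivors contribute `end`.
    total_time_on_test = sum(failure_times) + (n_units - r) * end
    return total_time_on_test / r


# Hypothetical test: 10 units, failures at 120, 340 and 560 hours.
time_truncated = mtbf_estimate([120, 340, 560], 10, stop_time=1000)
failure_truncated = mtbf_estimate([120, 340, 560], 10)
```

The same three failures give different estimates under the two stopping rules because the seven surviving units accumulate different amounts of test time.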
References
Lloyd, D., M. Lipow, RELIABILITY: Management, Methods, and Mathematics, Second Edition,
Milwaukee, WI:The American Society for Quality Control, 1991, pp. 307-319, 352.
Nelson, W., Applied Life Data Analysis, NY:John Wiley & Sons, 1982.

Applicable Tool AT9: Pareto Diagram

The Pareto diagram is based on the Pareto principle of the ‘significant few and the insignificant
many’. It is often found that a large proportion of failures in equipment are due to a small
number of causes.
[Figure: Pareto diagram. Vertical axis: No. of Failures. Horizontal axis: Part 1 through Part 7.]
The Pareto diagram is a vertical or horizontal bar chart used to quantify and identify problems
and determine which problems should be worked on first. The bars are used to present a graphic
picture of the problems related to equipment. The bars are arranged in descending order of
importance from left to right. Analyzing failure data and using that data to create a Pareto
diagram allows for determining how to solve the largest proportion of the overall reliability
problem with the most economical use of resources.
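The ordering behind a Pareto diagram can be sketched as follows. The failure counts are invented for illustration and the helper name `pareto_table` is hypothetical; the cumulative percentage column is a common addition rather than something this guideline requires.

```python
def pareto_table(failure_counts):
    """Rank causes by failure count and add cumulative percentages.

    failure_counts: mapping of cause name to number of failures.
    Returns a list of (cause, count, cumulative_percent) tuples,
    largest contributor first.
    """
    total = sum(failure_counts.values())
    if total == 0:
        raise ValueError("no failures recorded")
    ranked = sorted(failure_counts.items(), key=lambda kv: kv[1], reverse=True)
    rows, running = [], 0
    for cause, count in ranked:
        running += count
        rows.append((cause, count, round(100.0 * running / total, 1)))
    return rows


# Hypothetical failure counts for seven parts.
counts = {"Part 1": 4, "Part 2": 42, "Part 3": 9, "Part 4": 25,
          "Part 5": 2, "Part 6": 15, "Part 7": 3}
table = pareto_table(counts)
```

In this made-up data set, the top three parts account for 82% of all failures, which is the kind of concentration the diagram is meant to expose.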
References
Harrington, H., The Improvement Process, New York:McGraw-Hill, 1987, pp. 108-110, 207.
Ishikawa, K., Guide to Quality Control, White Plains, NY:Quality Resources, 1982, pp. 42-49.
O'Connor, P., Practical Reliability Engineering, Third Edition, New York:John Wiley & Sons,
1991, pp. 270-271.
The Memory Jogger, Second Edition, Methuen, MA:GOAL/QPC, 1988, pg. 17.

Applicable Tool AT10: Process Capability
If equipment, subsystems, components, or parts have a tolerance (or specification) width, and are
produced by a process that generates variation in the parameter(s) of interest, it is important that
the process variation be less than the tolerance width. The ratio of the tolerance to the process
variation is called the process capability index, and is expressed as
$$ C_p = \frac{T}{6\sigma} $$
where T is the tolerance width and 6σ represents an interval of six standard deviations or, plus or
minus three standard deviations from the process mean. A Cp of 1 indicates that a process will
generate approximately 3 out-of-specification units in 1000, given the following assumptions.
The first assumption is that the process is normally distributed and stable. Any systematic
divergence, due for example to set-up errors, movement of the process mean during the
manufacturing cycle, or other causes, could significantly affect the output. Therefore, the use of
Cp to characterize a production process is appropriate only for processes that are under statistical
control; that is, there are no special causes of variation such as those just mentioned, only
common causes. Common cause variation is the random variation inherent in the process, when
it is under statistical control. The Cp index also assumes that the tolerance center and the process
mean coincide; that is, the process average is centered on the nominal value.

The Cpk index uses the Cp index as a starting point for stating a process’s capability; however, it
accounts for the process center not being the nominal value. Cpk is expressed as
$$ C_{pk} = (1 - K)\, C_p $$
where
$$ K = \frac{|D - \bar{x}|}{T/2} $$
D is the design center, $\bar{x}$ is the process mean, and T is the tolerance width.
Ideally Cp = Cpk.
There are several things to keep in mind when using Cp and Cpk indices:
• If the process is not stable, Cp and Cpk are meaningless statistics.
• Not all processes can be assumed to be normally distributed. A naive user may
incorrectly assess the fraction of process output that will be out of specification.
• Cp and Cpk do not yield the same information about a process.
• Both Cp and Cpk are closely tied to traditional 0-1 loss and do not account for
losses incurred for being off-target; each measures distance from specifications
not distance from target.
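The two indices can be computed directly from their definitions. The Python sketch below is a minimal illustration: the function name `process_capability` and the numerical values are assumptions, and the result is meaningful only if the process is stable and approximately normal, as cautioned above.

```python
def process_capability(tolerance_width, sigma, design_center, process_mean):
    """Return (Cp, Cpk) from the definitions Cp = T / (6 sigma), Cpk = (1 - K) Cp.

    tolerance_width: T, the full width of the specification interval.
    sigma: process standard deviation.
    design_center: D, the nominal (target) value.
    process_mean: the observed process mean.
    """
    if tolerance_width <= 0 or sigma <= 0:
        raise ValueError("tolerance width and sigma must be positive")
    cp = tolerance_width / (6.0 * sigma)
    # K is the offset of the mean from the design center, as a fraction of T/2.
    k = abs(design_center - process_mean) / (tolerance_width / 2.0)
    cpk = (1.0 - k) * cp
    return cp, cpk


# Hypothetical process: T = 12, sigma = 1.5, nominal 50, mean running at 51.
cp, cpk = process_capability(12.0, 1.5, 50.0, 51.0)

# The same process with the mean on the nominal value gives Cpk equal to Cp.
cp_centered, cpk_centered = process_capability(12.0, 1.5, 50.0, 50.0)
```

Because the mean is off center in the first case, Cpk falls below Cp; the two agree only when the process is centered on the nominal value.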
References
Gitlow, H., S. Gitlow, A. Oppenheim, R. Oppenheim, Tools and Methods for the Improvement of
Quality, Boston, MA:IRWIN, 1989, pp. 451-457.
Kane, V. E., "Process Capability Indices," Journal of Quality Technology, Vol. 18, No. 1,
January 1986, pp. 41-52.
O’Connor, P., Practical Reliability Engineering, Third Edition, New York:John Wiley & Sons,
1991, pp. 302-303.
Sullivan, L., "Reducing Variability: A New Approach to Quality," Quality Progress, July 1985,
pp. 15-21.
The Memory Jogger, Second Edition, Methuen, MA, GOAL/QPC, 1988, pp. 64-68.

Applicable Tool AT11: Quality Function Deployment (QFD)

Quality Function Deployment (QFD) is a disciplined approach to engineering and process
planning that projects customer requirements through all phases of the equipment life cycle.
QFD is also known as the "house of quality" because the matrix used in its implementation
resembles a house.
[Figure: house of quality. Labeled parts: Correlation Matrix (roof), How?, Priority, What?, Relationship Matrix, Importance Ratings, How much?]
The focus of QFD is almost entirely on the customer; that is, the voice of the customer. The
attitude promoted by QFD is one of problem avoidance rather than problem solving.
QFD is best used in a team or group context. The information required to complete a QFD
matrix is usually found in many different disciplines or skill sets. The information needed
stretches from a few simple (but presumably accurate) statements of customer needs, all the way
to the most detailed manufacturing process description. Therefore, it is not a methodology that
can be effectively used by a single person.
Advantages of QFD include:
• Promoting careful planning of the equipment through all life cycle phases in such
a way that attention is paid to customer needs
• Eliminating spurious engineering and process requirements; that is, those that
have no role in meeting customer needs
• Shortening the time it takes to move through the concept and feasibility to
production and operation phases by avoiding later life cycle changes that stretch
out the cycle time
• Identifying problem areas early, exposing areas for improvement, and providing
documentation for these activities
Difficulties with QFD include:
• Being semi-quantitative, QFD doesn’t replace good engineering judgement and
good sense
• An inability to compensate for an inaccurate or incomplete list of customer needs
• Not being designed to promote innovation in the sense of new or radical product
ideas
• Requiring the use of a wide variety of expertise and a team environment
The basics of the QFD matrix are simple, although in practice it is a great deal of work to collect
the information necessary to create the matrix. Generally the QFD matrix consists of seven parts:
• What?
• How?
• Relationship matrix
• Priority
• Correlation matrix
• Importance ratings
• How much?
What? is a collection of simple statements of customer wants, needs, or requirements; that is, the
voice of the customer. These statements are easy for the customer to identify with and to
understand. They accurately and simply list the group of characteristics or properties that make
the customer happy.
How? is a list of engineering, design and technical properties that are necessary to develop the
equipment. The What? list becomes the titles for the QFD matrix rows, and the How? list
becomes the titles for the columns, see the icon at the beginning of the QFD discussion.
The relationship matrix is used to relate the What? rows to the How? columns. A relevance
number or symbol is assigned to the intersections of the rows and columns. This results in
establishing the relationship between what the customer wants and how the equipment is going to
meet those wants.
Usually an extra column, called priority, is placed just to the left of the relationship matrix. It is
used to assign importance weights to the customer wants; that is, to determine which of the
customer wants are the most important to the customer. This determines which characteristics
will get the most focus. The determination is made with the customer, or at least with some very
good knowledge of what the customer wants.

Engineering, design, and technical properties are not independent of one another. Therefore, it is
necessary to examine how they relate to one another. This results in the roof of the house of
quality which is the correlation matrix. It is also necessary to determine if the properties are
correlated positively or negatively. An example of negatively correlated properties would be
strength and flexibility.
The matrix is usually expanded further to include the importance ratings and the How much?
column. The importance ratings contain numbers derived from the matrix values and the priority
column. They indicate the importance of each of the properties with respect to the
customer wants. The How much? column contains the target values for every property listed in
the How? column. It answers the question, "How much is enough?"
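One common way to derive importance ratings is a priority-weighted sum down each How? column of the relationship matrix. The Python sketch below assumes that convention, along with a 9/3/1 relevance scale and made-up priorities; the guideline does not prescribe a specific formula, so both the scoring scheme and the function name `importance_ratings` are assumptions.

```python
def importance_ratings(priorities, relationship):
    """Priority-weighted column sums: one rating per How? column.

    priorities: one weight per What? row (customer importance).
    relationship: matrix of relevance numbers, rows = What?, columns = How?.
    """
    if len(priorities) != len(relationship):
        raise ValueError("one priority is needed for each What? row")
    n_hows = len(relationship[0])
    return [sum(weight * row[j] for weight, row in zip(priorities, relationship))
            for j in range(n_hows)]


# Hypothetical example: three customer wants, three engineering properties.
priorities = [5, 3, 1]
relationship = [
    [9, 3, 0],  # want 1: strong link to property 1, moderate to property 2
    [1, 9, 3],  # want 2: strong link to property 2
    [0, 1, 9],  # want 3: strong link to property 3
]
ratings = importance_ratings(priorities, relationship)
```

Here the first property receives the highest rating because it is strongly related to the customer want with the highest priority.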
References
Akao, Y., editor, Quality Function Deployment: Integrated Customer Requirements into Product
Design, Norwalk CN:Productive Press, 1990.
Hauser, J., D. Clausing, "The House of Quality," Harvard Business Review, May-June 1988,
pp. 63-73.
Ryan, N., editor, Taguchi Methods and QFD: Hows and Whys for Management, Dearborn,
MI:ASI Press, 1988, pp. 63-110.

Applicable Tool AT12: Reliability Analysis and Modeling

The Reliability Analysis and Modeling Program (RAMP) has been developed by SETEC in
support of SEMATECH. The RAMP software can be used to assist the system analyst or
designer in the construction of a system reliability model for equipment used in semiconductor
manufacturing. A system model provides the analyst with useful information in many different
forms, including the following:
• An evaluation of design alternatives prior to cutting metal
• An estimate of the system reliability (or MTBF)
• A quantification of the uncertainty in the estimate
• An identification of major contributors to system failure
• A rationale for allocating available resources to improve the performance of the
system
Modeling produces its maximum economic benefit when performed during the design phase of
the equipment life cycle. However, modeling can also provide economic benefits when applied
to existing equipment.
The development of a system model depends heavily on the user’s understanding of the
equipment that is being modeled. However, proper utilization of the model also requires the
analyst to have a working knowledge of several concepts in the areas of statistics, probability,
and reliability.
Version 1.0 of RAMP provides the capability for developing, editing, and evaluating reliability
models for equipment used in semiconductor manufacturing. This capability is supported by an
integrated data management system and an integrated graphics output capability.
The following features were included to make the software as user friendly as possible:
• Menu driven. All options available to the user can be accessed from on-screen
menus.
• Help screens. Context-sensitive help is available to the user at all times.
• Mouse support. Mouse support is provided on all screens where use of the mouse
significantly improves the user interface.

• Graphics output. Graphics output is fully integrated into the software.
• Modular design. The design of the software package is modular to allow easy
modification or addition of capabilities.
• Integrated data management. Management of component data is fully integrated
into the software.
• File management. Management of file names and file identification is transparent
to the user.
[Block diagram: blocks in series labeled WHS-ROB-TARM, WHS-ROB-SERV, WHS-ROB-WSEN, WHS-ROB-ELEC, WHS-ELEC-PS, WHS-ELEC-CIB, and WHS-TC-VS]
Figure 4-1. A Block Model Developed in RAMP for the SETEC Generic Wafer
Handler System
A system model for the equipment is easily developed in RAMP in the form of a block diagram.
Figure 4-1 gives an example of a block model representation of a SETEC generic wafer handler
as developed in RAMP by the analyst. The system is represented with 14 components in series
(7 of which are shown in Figure 4-1). Component failure rate information, including a
characterization of the uncertainty, is entered into the component data library in RAMP. RAMP
converts the block diagram model in Figure 4-1 to a mathematical equation and uses random
selection techniques to sample the component failure rates from the component data library. The
output from RAMP provides complete sensitivity and uncertainty analysis results for various
performance measures that are associated with a reliability analysis of the system being modeled,
including
• System MTBF. The system MTBF is for the modeled system. A range of values
for the MTBF and the distribution associated with that range is provided.
• Component contribution to system failure. The fractional contribution that a
component makes to the failure of the system.
• Component contribution to subsystem failure. The fractional contribution that a
component makes to the failure of the subsystem.
• Subsystem contribution to system failure. The fractional contribution that a
subsystem makes to the failure of the system.
• Reliability improvement. The value of reliability improvement for a component is
the system MTBF (in hours) that would result if the failure rate for that
component were zero (that is, the component were perfectly reliable or nearly so).
• Uncertainty importance. Uncertainty importance provides a measure of the
contribution of a component to the uncertainty in the probability of system failure.
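The sampling approach described above can be illustrated with a short Python sketch. It is not RAMP and does not reproduce RAMP's algorithms. It assumes constant failure rates, components in series (so failure rates add), and lognormal uncertainty on each failure rate. The component names are borrowed from the example, but every numerical value in the library is hypothetical.

```python
import math
import random
import statistics

# Hypothetical component data library: name -> (median failure rate per hour,
# sigma of the log failure rate). These values are illustrative only.
LIBRARY = {
    "robot servo": (0.0040, 0.5),
    "robot wafer sensor": (0.0030, 0.6),
    "elevator door": (0.0020, 0.4),
    "sensor amplifiers": (0.0015, 0.4),
}

def analyze(library, n_samples=5000, seed=1):
    """Sample failure rates and summarize MTBF for components in series."""
    rng = random.Random(seed)
    mtbf = []
    contribution = {name: [] for name in library}
    improvement = {name: [] for name in library}
    for _ in range(n_samples):
        rates = {
            name: rng.lognormvariate(math.log(median), sigma)
            for name, (median, sigma) in library.items()
        }
        total = sum(rates.values())  # series system: failure rates add
        mtbf.append(1.0 / total)
        for name, rate in rates.items():
            # Fractional contribution of this component to system failure
            contribution[name].append(rate / total)
            # System MTBF if this component never failed
            improvement[name].append(1.0 / (total - rate))
    mtbf.sort()
    return {
        "mean_mtbf": statistics.fmean(mtbf),
        "p05_mtbf": mtbf[int(0.05 * n_samples)],
        "p95_mtbf": mtbf[int(0.95 * n_samples)],
        "contribution": {k: statistics.fmean(v) for k, v in contribution.items()},
        "improvement": {k: statistics.fmean(v) for k, v in improvement.items()},
    }

result = analyze(LIBRARY)
print(f"mean MTBF: {result['mean_mtbf']:.1f} hr")
print(f"5th to 95th percentile: {result['p05_mtbf']:.1f} to {result['p95_mtbf']:.1f} hr")
```

The sorted MTBF samples also supply the data for a histogram or cumulative distribution plot, and the per-component contribution samples supply the mean and percentile bars of a Pareto diagram.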
Results produced by RAMP are available in various types of displays that include the following:
Histograms. A histogram is a graphical presentation of sample data using classes (that is,
intervals) on the x axis and relative frequency on the y axis.
Cumulative distribution functions (CDFs). A CDF is a graph of the cumulative relative
frequency (cumulative fraction) of observations less than or equal to a given value.
Pareto diagrams. A Pareto diagram is a bar chart with the displayed values ordered from the
largest to the smallest. RAMP orders displayed values based on the mean. The 5th and 95th
percentiles are also displayed when they are available.
Summary statistics. A written list of all the statistics calculated by RAMP is displayed, such as
the average MTBF, standard deviation for MTBF, and selected quantiles of the uncertainty
distribution for MTBF.
Input samples. This option allows an analyst to view or print input failure rates as sampled from
component failure rate distributions.
Output results from samples. This option allows the analyst to view or print the numerical
results that are calculated for each of the sampled failure rates.
Statistical results. This option allows an analyst to view or print selected statistical results, such
as the mean value for all components.
Based on the characterization of the failure rates in the component data library for the SETEC
generic wafer handler system shown in Figure 4-1, the summary statistics produced by RAMP
give a mean value for MTBF of 93 hrs with about a 5 percent chance of being less than 50 hrs
and a 5 percent chance of exceeding 178 hrs. A graph of the estimated cumulative distribution
function for MTBF that is produced by RAMP is given in Figure 4-2.

Figure 4-2. An Estimate of the Cumulative Distribution Function for MTBF
The Pareto diagram in Figure 4-3 identifies the components that are the dominant contributors to
the failure of the system such as robot servo, robot wafer sensor, elevator door, and sensor
amplifiers. The Pareto diagram uses three horizontal bars with each component name rather than
the usual one bar. This is done to display the uncertainty associated with the contribution of each
component to system failure. The three bars represent the 95th percentile, the mean, and the 5th
percentile of the distribution of the component’s contribution to system failure.
Now assume that the engineers involved with the SETEC generic wafer handler have developed
a new and improved elevator that improves its MTBF by a factor of 2. The component data
library is modified to reflect the new MTBF for the elevator.
In addition, the engineers would like to evaluate the impact on system reliability of a design
change that would incorporate redundancy by adding another robot wafer sensor in parallel.
Because the sensors are in parallel, they must both fail before they cause the system to fail, thus
improving the system MTBF. The block diagram model is modified to include this desired
design change. The modified block diagram is shown in Figure 4-4.
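The effect of adding a parallel sensor can be checked by hand for a simplified case. The sketch below assumes constant failure rates, independent failures, no repair of the redundant pair before system failure, and hypothetical rates, so it illustrates the principle rather than RAMP's calculation. With the remaining series components having total rate Λ and each sensor having rate λ, the system reliability is R(t) = exp(-Λt)[2exp(-λt) - exp(-2λt)], and integrating R(t) gives a mean time to failure of 2/(Λ + λ) - 1/(Λ + 2λ).

```python
import math

def mttf_series(rates):
    """Mean time to failure for components in series with constant rates."""
    return 1.0 / sum(rates)

def mttf_with_redundant_pair(other_rates, sensor_rate):
    """Closed form for a series system in which one component is duplicated in parallel."""
    big = sum(other_rates)
    return 2.0 / (big + sensor_rate) - 1.0 / (big + 2.0 * sensor_rate)

def mttf_numeric(other_rates, sensor_rate, dt=0.05, t_max=5000.0):
    """Cross-check: trapezoidal integration of the reliability function R(t)."""
    big = sum(other_rates)

    def reliability(t):
        pair_survives = 1.0 - (1.0 - math.exp(-sensor_rate * t)) ** 2
        return math.exp(-big * t) * pair_survives

    steps = int(t_max / dt)
    area = 0.5 * (reliability(0.0) + reliability(steps * dt))
    area += sum(reliability(i * dt) for i in range(1, steps))
    return area * dt

# Hypothetical failure rates in failures per hour
other_rates = [0.002, 0.003, 0.001]
sensor_rate = 0.004

baseline = mttf_series(other_rates + [sensor_rate])
redundant = mttf_with_redundant_pair(other_rates, sensor_rate)
print(f"single sensor: {baseline:.1f} hr, redundant sensors: {redundant:.1f} hr")
```

The gain from redundancy depends on how large the duplicated component's failure rate is relative to the rest of the system, which is why the Pareto ranking is a useful guide to where redundancy pays off.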

Figure 4-3. A Pareto Diagram for Component Contribution to System Failure
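[Block diagram: blocks in series labeled WHS-ROB-TARM, WHS-ROB-SERV, WHS-ROB-WSEN, WHS-ROB-ELEC, WHS-ELEC-PS, WHS-ELEC-CIB, and WHS-TC-VS, with a redundant block WHS-ROB-WSENP in parallel with WHS-ROB-WSEN]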
Figure 4-4. A Revised Block Diagram for the SETEC Generic Wafer Handler
System, showing the Addition of the Redundant Wafer Sensor
After these modifications, the summary statistics produced by RAMP give a mean value for
MTBF of 137 hrs, an increase of 47 percent. There is approximately a 5 percent chance of the
MTBF being less than 64 hrs and a 5 percent chance of it exceeding 249 hrs. A graph of the
estimated cumulative distribution function for MTBF that is produced by RAMP is given in
Figure 4-5.

Figure 4-5. An Estimate of the Cumulative Distribution Function for MTBF after
Modifying the Generic Wafer Handler System
The new Pareto diagram is given in Figure 4-6 and shows that the wafer sensor is no longer a
problem and has dropped out of the top ten list of components contributing to system failure. In
addition, the elevator door has now dropped behind the sensor amplifiers in the rankings.

Figure 4-6. A Pareto Diagram for Component Contribution to System Failure after
Modifying the Generic Wafer System
This example has illustrated how RAMP provides a prediction of the system MTBF (including
the uncertainty in the prediction) after making two improvements in the system. Thus, modeling
has provided a tool for adopting a proactive position rather than a reactive position with respect
to making changes in the system to improve its reliability. That is, the analyst now has a good
idea of how the proposed changes will affect the performance of the system and knows where to
expend the company’s resources to provide an even greater improvement prior to committing
those resources.
This simple example provided a flavor of how RAMP works and demonstrated the usefulness of
modeling. Modeling alone does not make a system reliable, but it does provide an organized
means of understanding the system as well as being a tool to guide the wise expenditure of
resources for improved reliability.
References
RAMP, SETEC91-030, Albuquerque, NM:Sandia National Laboratories.
Campbell, J., B. Thompson, D. Longsine, P. O’Connell, R. Iman, RAMP User’s Reference
Manual, SETEC91-030, Albuquerque, NM:Sandia National Laboratories.

Applicable Tool AT13: Reliability Development/Growth Testing

The purpose of the Reliability Development/Growth Test (RD/GT) is to improve the reliability of
equipment through a disciplined process of systematic and permanent removal of failure
mechanisms and design weaknesses. RD/GT is conducted under simulated or actual usage
environments based on operational requirements and mission profiles of the operational
environment.
Reliability Development/Growth Test (RD/GT) is a closed loop reliability improvement process
that involves testing under simulated or actual usage environments. The purpose of RD/GT is to:
• Induce failures
• Detect the failures
• Determine the cause of the failures
• Identify corrective actions to correct the failures
• Implement effective corrective actions
• Test to verify that the failure causes have been removed
Corrective action encompasses redesign, part and material changes, and changes in the design
and manufacturing processes. The reliability of the equipment is improved by systematically
implementing corrective action, which results in significantly higher reliability in the field. The
rate at which reliability grows depends on how effectively and rapidly failure modes can be
identified, corrected and then verified by retest.
Candidates for RD/GT include high risk and mission critical items. High risk items usually
represent designs utilizing new or state-of-the-art technology. Other candidates include those
items that are major contributors to the overall equipment reliability, are high in cost, and those
that experience suggests need reliability improvement.
Two points about RD/GT deserve emphasis. The first is that RD/GT is only
as effective as the ability of the implemented process to detect and correct problems as they
occur. Unless problems are identified and fixes implemented and
verified, no reliability growth will occur. Reliability improvement results from fixes that
eliminate failure sources discovered through the analysis of test data. Reliability improvement is
a function of design and manufacturing process improvement, not just test time. Implementing
an RD/GT that merely tests equipment and repairs failures will not result in reliability
improvement.
The second point is that RD/GT will not effectively increase the reliability of an item that has a
low initial design reliability. If initial reliability is too low (due mainly to inadequate design), the
item will require an unrealistic reliability improvement or growth rate in order to reach an
acceptable level of reliability. This will be reflected in the required amount of test time and
program cost. RD/GT is an engineering task designed to improve design reliability. Monitoring,
tracking and assessing the results of an RD/GT gives management insight into the efficiency of
the process, and provides a tool for evaluating development status and reallocating resources
when necessary to achieve the proper growth rate.
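The link between initial reliability, growth rate, and test time can be made concrete with a growth model. The sketch below assumes the Duane model, which this section does not name, so treat it as one possible illustration. Under that model, cumulative MTBF grows as a power of cumulative test time with exponent alpha (the growth rate), and instantaneous MTBF equals cumulative MTBF divided by (1 - alpha). All input values are hypothetical.

```python
def instantaneous_mtbf(cum_mtbf_0, hours_0, hours, alpha):
    """Duane model: instantaneous MTBF after `hours` of cumulative testing."""
    return cum_mtbf_0 * (hours / hours_0) ** alpha / (1.0 - alpha)

def required_test_hours(cum_mtbf_0, hours_0, target_mtbf, alpha):
    """Cumulative test time needed to reach a target instantaneous MTBF."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("growth rate alpha must lie between 0 and 1")
    return hours_0 * (target_mtbf * (1.0 - alpha) / cum_mtbf_0) ** (1.0 / alpha)

# Hypothetical program: cumulative MTBF observed after the first 100 test hours,
# a target of 200 hr, and a growth rate of 0.4.
good_start = required_test_hours(cum_mtbf_0=50.0, hours_0=100.0, target_mtbf=200.0, alpha=0.4)
poor_start = required_test_hours(cum_mtbf_0=20.0, hours_0=100.0, target_mtbf=200.0, alpha=0.4)
print(f"starting at 50 hr: {good_start:.0f} test hours")
print(f"starting at 20 hr: {poor_start:.0f} test hours")
```

With these assumed numbers, starting at 20 hr instead of 50 hr raises the required test time from roughly 890 hours to roughly 8,800 hours, which shows how a low initial design reliability turns into an impractical test program.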
References

Arsenault, J., J. Roberts, Reliability & Maintainability of Electronic Systems, Potomac,
MD:Computer Science Press, Inc., 1980, pp. 344-353.
MIL-HDBK-338-1A, Electronic Reliability Design Handbook, Volume I of II, Irvine, CA 92714:Global
Engineering Documents, 12 October 1988, pp. 8-68 to 8-90.
Warrendale, PA:Society of Automotive Engineers, Inc., 1990, pp. 149 - 158.

Applicable Tool AT14: Reliability Qualification Testing (RQT)

Reliability Qualification Testing (RQT) is used to verify that the equipment will meet design
goals and comply with contractual/program objectives. The test is performed under specified
environmental conditions and pass/fail criteria are established prior to a production decision.
Usually, the qualification test is performed on the overall equipment. However, some sublevel
testing may be performed on a few critical items.
This test includes a customer review of the results of the supplier’s qualification test to
ensure that a valid statistical representation of the projected equipment performance has been
achieved. It also includes periodically pulling equipment out of the normal
manufacturing cycle (that is, out of production) and performing extended qualification tests on it.
This provides performance benchmarks and a means of assuring that engineering changes are
meeting design goals.
A major aspect of this test is the final and acceptance testing that is carried out on each piece of
equipment. It is important to note that when testing plans are developed, they should be
upgraded as the equipment passes through the various life cycle phases.
Final Test. The process of equipment qualification and acceptance is generally accomplished in
two parts. The first occurs after final test on the production floor. The equipment qualification is
often referred to as final test. If properly planned, the customer’s source inspection can be
included at the same time. The detail involved in source inspections varies widely depending on
customer requirements, but all aspects of source inspections are normally covered during final
test. Once the equipment is operating properly, the reliability test starts. This involves multiple,
repeated cycling of equipment functions and subsystems. This test can also help bring infant
mortality failures to the surface before the equipment is delivered to a customer.
Acceptance Test. The next phase is acceptance testing. This occurs after equipment installation
at the customer’s site. The engineer starts by verifying that the equipment still passes tests
similar to those performed during equipment qualification to check for shipping and installation
damage. The focus then shifts to process capability. This is where the second half of the
equipment characterization takes place. Once the equipment is fully characterized with an
optimal or pre-determined process, the reliability testing begins again. This focuses on process
stability, repeatability and performance. The tests can collect concurrent equipment reliability
data because this testing generally involves many operational cycles of the equipment.
References
Ireson, W., C. Coombs, Jr., Handbook of Reliability Engineering and Management,
NY:McGraw-Hill, 1988, pp. 8.1 - 8.39.
RADC Reliability Engineer’s Toolkit: An Application Oriented Guide for the Practicing
Reliability Engineer, Griffiss Air Force Base, NY:Systems Reliability and Engineering Division,
Rome Air Development Center, July 1988, p. 101.
Warrendale, PA:Society of Automotive Engineers, Inc., 1990, pp. 211-217.

Applicable Tool AT15: Reliability Block Diagram Modeling (RBD)

Reliability Block Diagram (RBD) models are one of the tools that can be used to create a
reliability model of equipment. One of the easiest ways to describe the basic ideas used in the
creation of RBD models is to create a simple RBD; for a more detailed description of the
diagrams, see the sources listed in the references. Construction of a reliability block diagram
begins by defining what is meant by equipment failure; for example, equipment failure may be
defined as any failure that causes the equipment to be down for 8 minutes or longer. Once this is
done, the next step is to determine the various ways that this failure can occur. This is initially
done at a gross level; that is, 10 to 20 subsystems are defined that can lead to equipment failure.
A block diagram model that consists of 3 subsystems (SS1, SS2, and SS3) in series follows:

SS1 --- SS2 --- SS3

In this example, SS2 is not a significant contributor to the unreliability of the equipment, so it will
not be broken into any more detail. SS1 and SS3, however, are contributors to equipment
unreliability. SS1 fails if component 1 or 2 (C1 or C2) fails. SS3 fails only if components 3, 4, and 5
(C3, C4, and C5) all fail. The block diagram model now looks like:

                     +--- C3 ---+
C1 --- C2 --- SS2 ---+--- C4 ---+
                     +--- C5 ---+

Further analysis reveals that C2 fails only if parts 1 and 2 (P1 and P2) both fail. C4 fails if part 3 or
4 (P3 or P4) fails. The block diagram model now looks like:

      +--- P1 ---+           +------- C3 ------+
C1 ---+          +--- SS2 ---+--- P3 --- P4 ---+
      +--- P2 ---+           +------- C5 ------+

Once construction of the model is complete, it is translated into a Boolean equation which is then
used to quantify the equipment reliability. The references discuss Boolean algebra in detail, so it
will not be discussed here. The Boolean equation for the RBD is:
Equipment Failure = C1 + P1 * P2 + SS2 + [C3 * (P3 + P4) * C5]
expanding and using the associative and distributive laws,
Equipment Failure = C1 + P1 * P2 + SS2 + C3 * P3 * C5 + C3 * P4 * C5.
Each of the terms in this equation represents a way that the equipment can fail. For example, if
part 1 and part 2 (P1 and P2) both fail, the equipment fails.
Now that the reliability block diagram has been translated into an equation, the next step is to quantify the
probability that the equipment fails as a function of its subsystems, components, and parts. Often
the term probability is used when what is really meant is frequency. Probabilities must lie
between 0 and 1, whereas a frequency can be any number greater than or equal to 0, depending on the
number of failures and the time scale used. For example, if a component fails twice per year, its
frequency is 2/yr, or about 0.17/mo.
Using the previous example, the probability of equipment failure can be written,
P(Equipment Failure) = P(C1 + P1 * P2 + SS2 + C3 * P3 * C5 + C3 * P4 * C5).
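But how does one deal with the right-hand side of the equation? In the Boolean equation, + denotes
OR and * denotes AND. Assuming that the events are independent, the probability of an AND term is
the product of the individual probabilities. When the individual probabilities are small, the
probability of an OR of several terms is approximately the sum of the term probabilities (the small
probability approximation). The example equation becomes: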
P(Equipment Failure) = P(C1) + P(P1)*P(P2) + P(SS2) + P(C3)*P(P3)*P(C5)
+ P(C3)*P(P4)*P(C5).
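As a numerical check on the small probability approximation, the sketch below evaluates the example equation two ways: as the sum of products given above, and exactly, by enumerating every combination of failed and working items. Independence is assumed, and all of the probabilities are hypothetical values chosen for illustration.

```python
from itertools import product
from math import prod

# Each tuple is one term of the expanded Boolean equation for the example RBD.
CUT_SETS = [("C1",), ("P1", "P2"), ("SS2",), ("C3", "P3", "C5"), ("C3", "P4", "C5")]

# Hypothetical failure probabilities for each item
p = {"C1": 0.01, "P1": 0.05, "P2": 0.04, "SS2": 0.002,
     "C3": 0.1, "P3": 0.02, "P4": 0.03, "C5": 0.1}

def approx_failure_probability(cut_sets, p):
    """Small probability approximation: sum of the products of each term."""
    return sum(prod(p[item] for item in cut) for cut in cut_sets)

def exact_failure_probability(cut_sets, p):
    """Exact value: add up the probability of every state in which some term is fully failed."""
    names = sorted(p)
    total = 0.0
    for state in product([False, True], repeat=len(names)):
        failed = {name for name, is_failed in zip(names, state) if is_failed}
        weight = prod(p[n] if n in failed else 1.0 - p[n] for n in names)
        if any(set(cut) <= failed for cut in cut_sets):
            total += weight
    return total

approx = approx_failure_probability(CUT_SETS, p)
exact = exact_failure_probability(CUT_SETS, p)
print(f"approximate: {approx:.6f}  exact: {exact:.6f}")
```

The approximation always errs on the high side because it counts overlapping failure combinations more than once. The error grows as the individual probabilities grow, so the sum of products should be used only when the probabilities are small.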

References
RAMP, Albuquerque, NM:Sandia National Laboratories, SETEC91-030, pp. 9 - 31.
Klinger, D., Y. Nakada, M. Menendez, AT&T Reliability Manual, New York:Van Nostrand
Reinhold, 1990, pp. 78-91.
MIL-STD-756B, Reliability Modeling and Prediction, Washington, DC:Department of Defense,
18 November 1981, pp. 1001-1 to 1001-11.

Applicable Tool AT16: Repairable Systems Analysis

Repairable systems analysis is a method that can be used to estimate the number of repairs
per piece of equipment as a function of the equipment’s age. Age is used to mean any appropriate
measure of equipment usage such as days, hours, or cycles. Repairable systems analysis is also
used to:
• Evaluate whether a repair rate increases or decreases with equipment age, which is
useful for equipment retirement and burn-in decisions
• Compare different equipment designs, production periods, maintenance policies,
environments, and operating conditions
• Predict the future number of equipment repairs
• Reveal unexpected information and insight into component repairs
It is important to mention that time-between-repair data on a piece of equipment is analyzed
differently than time-between-failure data. The use of failure rate parameters is generally
not meaningful for single equipment repair data, especially if the reliability of the equipment is
increasing or decreasing. Statistical tests, graphical procedures, or both are available for
determining if the failure rate is increasing, decreasing, or staying constant. If the times between
repairs on a piece of equipment are gradually getting longer, one could reasonably assume that
the reliability is improving. On the other hand, if these times are getting shorter, one assumes that
the reliability is decreasing. Standard methods for estimating the failure rate often overlook
this lengthening or shortening of the time between repairs. The methodology
associated with this tool detects trends of this type.
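One statistical test for such trends can be sketched in a few lines. The example assumes the Laplace trend test, which this section does not name, applied to a single piece of equipment observed from age 0 to a fixed end age. If repairs occur at a constant rate, the repair ages are spread evenly over the observation period and the statistic is approximately standard normal. A strongly negative value suggests lengthening times between repairs (improving reliability), and a strongly positive value suggests shortening times (deteriorating reliability). The repair ages are hypothetical.

```python
import math

def laplace_trend_statistic(repair_ages, observation_end):
    """Laplace statistic for repair ages observed over (0, observation_end]."""
    n = len(repair_ages)
    if n == 0:
        raise ValueError("at least one repair is required")
    mean_age = sum(repair_ages) / n
    # Under a constant repair rate the mean repair age is centered on the
    # midpoint of the observation period with variance T^2 / (12 n).
    return (mean_age - observation_end / 2.0) / (
        observation_end * math.sqrt(1.0 / (12.0 * n))
    )

END_AGE = 2000.0  # hours of observation (hypothetical)

# Repairs bunched early: the times between repairs are getting longer.
improving = [50, 120, 210, 350, 560, 900, 1500]
# Mirror image: repairs bunched late in the observation period.
deteriorating = [END_AGE - age for age in reversed(improving)]

u_improving = laplace_trend_statistic(improving, END_AGE)
u_deteriorating = laplace_trend_statistic(deteriorating, END_AGE)
print(f"improving: U = {u_improving:.2f}, deteriorating: U = {u_deteriorating:.2f}")
```

A value beyond about ±1.96 is commonly taken as evidence of a trend at roughly the 5 percent significance level (two-sided). A plot of cumulative repairs versus age is a useful graphical companion to the test.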
References
Ascher, H., H. Feingold, Repairable Systems Reliability: Modeling, Inference, Misconceptions and
Their Causes, New York:Marcel Dekker, Inc., 1984.
Nelson, W., "Graphical Analysis of System Repair Data," Journal of Quality Technology, Vol.
20, No. 1, Jan. 1988, pp. 24-35.

Applicable Tool AT17: Taguchi Methods

The following paragraphs will introduce some of the basic elements of Dr. Genichi Taguchi’s
quality methodology. Even though these elements are directed at quality, they apply equally well
to reliability. Taguchi’s methods are in part philosophical and in part methodological. The
methodological component, which consists of Taguchi’s use of statistical concepts and tools, is
the subject of heated controversy. However, the heart of his message has more to do with his
conceptual framework for the process of quality improvement and nearly all practitioners accept
Taguchi’s central philosophical ideas.
There are seven points that explain some of the basic elements of Taguchi’s philosophy:
1. The quality of a manufactured product is measured by the total loss created by
that product to society.
2. In a competitive economy, continuous quality improvement and cost reduction
are necessary for staying in business.
3. Quality improvement requires the never-ending reduction of variation in product
and process performance around desired values.
4. Society’s loss due to performance variation is frequently proportional to the
square of the deviation of the performance characteristic from its target value.
5. The final quality and cost of a manufactured product are determined to a large
extent by the engineering designs of the product and its manufacturing process.
6. Performance variation can be reduced by exploiting the nonlinear effects
between a product’s and/or process’s parameters and the product’s desired
performance characteristics.
7. Statistically planned experiments can be used to identify the settings of product
(and process) parameters that reduce performance variation.
These points will be discussed in more detail.
1. The quality of a manufactured product is measured by the total loss created by that product to
society. According to Taguchi, "Quality is the loss imparted to society from the time a
product is shipped." This view of quality includes customers, manufacturers, and the
community in the definition of quality. According to this perspective, quality improvement
saves society more resources than it costs, and it benefits everyone: customers,
manufacturers, and the community. This is a new way to think of investments in quality
improvement. A quality improvement project is justified as long as the resulting savings to
customers are more than the cost of improvements.
2. In a competitive economy, continuous quality improvement and cost reduction are necessary
for staying in business. In a competitive economy, a company that does not earn a
reasonable profit cannot survive for long. A sure way of increasing market share is to
provide high quality products at a low price, which is what customers want. Thus,
companies that are determined to stay in business use high quality and low cost as their
competitive strategy. Such companies also realize that the quality of their product is never
good enough and manufacturing costs are never low enough.

3. Quality improvement requires never-ending reduction of variation in product and process
performance around desired values. The quality of a product cannot be improved unless the
quality characteristics of that product can be identified and measured and the ideal values are
known. Each quality characteristic varies from unit to unit and over time. The objective of
a continuous quality improvement process is to reduce this variation; that is, make the
quality characteristics as close to their ideal values as possible. However, it is generally not
economical or necessary to improve all quality characteristics since not all characteristics are
of equal importance.
Performance characteristics are defined as those characteristics that determine the product’s
performance in satisfying the customer’s requirements. The ideal value is called the target
value. If a product is of high quality, the performance characteristics remain close to their
targeted values under all operating conditions. The variation of a performance characteristic
about its target value is referred to as performance variation. The smaller the performance
variation about the target value, the better the quality. Target specifications are typically
stated in terms of nominal values and tolerances about these values. It is not acceptable to
state target values in terms of interval specifications only. This leads to the idea that it is
okay to be anywhere within the interval and that magically the performance characteristics
deteriorate when they move out of the interval. The goal is for the performance
characteristics to always be at their targeted values.
4. Society’s loss due to performance variation is frequently proportional to the square of the
deviation of the performance characteristic from its target value. Any variation in a
product’s performance characteristic about its targeted value causes a loss to society. This
loss can range from inconvenience to monetary loss and physical harm.
Variation is represented mathematically in the following manner. Let Υ be a performance
characteristic measured on a continuous scale and let the target value of Υ be τ. Let ζ(Υ)
represent the dollar losses suffered by society at some time during the product’s life span due to
the deviation of Υ from τ. Generally, the larger the deviation of the performance
characteristic Υ from its target value τ, the larger the loss to society, ζ(Υ). However, it is
usually difficult to determine the actual mathematical form of ζ(Υ). Often, a quadratic
approximation to ζ(Υ) adequately represents economic losses due to the deviation of Υ from
τ. The simplest quadratic loss function is ζ(Υ) = k(Υ - τ)², where k is a
constant that can be determined when ζ(Υ) is known for a particular value of Υ. There are
three cases of the loss function that are typically used:
• when a specific target value is the best and the loss increases symmetrically as the
performance characteristic deviates from the target
• when smaller is better, for example, if the performance characteristic is the
amount of impurity and the target value is zero; here the smaller the impurity, the
better it is
• when larger is better, for example, if the performance characteristic is
strength; here the larger the strength, the better it is
The average loss to society due to performance variation is obtained by "statistically
averaging" the quadratic loss ζ(Υ) = k(Υ - τ)² associated with the possible values of Υ. In the
case of quadratic loss functions, the average loss due to performance variation is
proportional to the mean squared error of Υ about its targeted value τ. The mean squared error
equals the variance of Υ plus the square of the difference between the mean of Υ and τ, so it
penalizes an off-target mean as well as spread. Therefore the
fundamental measure of variability is the mean squared error and not the variance. The
concept of quadratic loss emphasizes the importance of continuously reducing performance
variation.
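The quadratic loss calculation can be sketched as follows for the target-is-best case. The constant k is found from one known loss at one known deviation, and the average loss is k times the mean squared error about the target, that is, k times the variance plus the squared offset of the mean from the target. The dollar amount, target, and measurements are hypothetical.

```python
import statistics

def loss_constant(known_loss, deviation):
    """Solve loss = k * deviation^2 for k from one known point."""
    return known_loss / deviation ** 2

def quadratic_loss(y, target, k):
    """Loss for a single unit whose performance characteristic equals y."""
    return k * (y - target) ** 2

def average_loss(values, target, k):
    """Average loss = k * (variance + (mean - target)^2)."""
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)
    return k * (variance + (mean - target) ** 2)

TARGET = 10.0
# Hypothetical: a unit that is 0.5 off target costs society $20.
k = loss_constant(known_loss=20.0, deviation=0.5)

measurements = [9.8, 10.1, 10.3, 10.4, 9.9]  # hypothetical data
avg = average_loss(measurements, TARGET, k)
direct = statistics.fmean(quadratic_loss(y, TARGET, k) for y in measurements)
print(f"k = {k:.0f} $/unit^2, average loss = ${avg:.2f} per unit")
```

The smaller-is-better case uses the same functions with a target of zero. Because the average loss includes the squared offset of the mean, a process with a small variance but an off-target mean can still carry a large loss.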
5. The final quality and cost of a manufactured product are determined to a large extent by the
engineering design of the product and its manufacturing process. The number of
manufacturing imperfections in a product, hence the manufacturing cost of a product, is
significantly affected by the product’s design and the design of the process used to produce
the product. Generally, a product’s field performance is affected by environmental variables
as well as human variations in operating the product, product deterioration, and
manufacturing imperfections. Note that these sources of variation are chronic problems.
Manufacturing imperfections are the deviations of the actual parameters of a manufactured
product from their nominal values. These imperfections are caused by inevitable
uncertainties in a manufacturing process and are responsible for performance variation
across different units of a product. Dealing with variations due to environmental factors and
product deterioration can be done only in the product’s concept and design phases.
The manufacturing costs and imperfections in a product are largely determined by the design
of the manufacturing process. Increasing process controls can reduce manufacturing
imperfections; however, process controls cost money. It is, therefore, necessary to reduce
both manufacturing imperfections and process controls. Once the process is under statistical
control, it can be improved. Without a stable process it is almost impossible to discover a
means of reducing variation due to chronic problems.
6. Performance variation can be reduced by exploiting the nonlinear effects between a
product’s and/or process’s parameters and the product’s desired performance
characteristics. Due to the importance of the product and process design, quality control
must begin in the concept phase of the life cycle and continue through all phases. There are
two types of quality control methods:
• Off-line, which are technical aids for quality and cost control in product and
process design. These are used to improve product quality and manufacturability,
and to reduce product development, manufacturing, and lifetime costs.
• On-line, which are technical aids for quality and cost control in manufacturing.
As with performance characteristics, all specifications of product and process parameters
should be stated in terms of ideal values and tolerances around these ideal values. The idea
is not to produce products whose parameters are barely inside the tolerance intervals. Such
products are likely to be of poor quality due to the interdependencies of the parameters. A
product performs best when all parameters of the product are at their ideal values. Further,
the knowledge of ideal values of product and process parameters encourages continuous
quality improvements.
Taguchi has introduced a three-step approach to assign nominal values and tolerances to
product and process parameters:
• System design
• Parameter design

• Tolerance design
System design involves applying scientific and engineering knowledge to produce a basic
functional prototype design. The prototype model defines the initial setting of the product or
process parameters. System design requires an understanding of both the customer’s needs
and the manufacturing environment. A product cannot satisfy the customer’s needs unless it
is designed to do so. Designing for manufacturability requires an understanding of the
manufacturing environment.
Parameter design involves identifying the settings of product or process parameters that
reduce the sensitivity of engineering designs to the sources of variation. Adjustment of the
mean value of a performance characteristic to its targeted value is usually a much easier
engineering problem than the reduction of performance variation. The utilization of
nonlinear effects of product or process parameters on the performance characteristics to
reduce the sensitivity of engineering designs to the sources of variation is the essence of
parameter design. Because parameter design reduces performance variation by reducing the
influence of the sources of variation rather than by controlling them, it is a very cost-
effective technique for improving engineering designs. It is economically advantageous for
a designer to provide designs that are tolerant to statistical variations.
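The effect of a nonlinear response can be sketched numerically. The example below is only an illustration under stated assumptions: the saturating response curve, the 10% relative parameter variation, and the two nominal settings are all hypothetical and do not come from any specific equipment. It compares the output spread at a nominal setting on the steep part of the curve with the spread at a setting on the flat part, where a simple scale adjustment (assumed to be available to the designer) brings the mean back to the same target.

```python
import random
import statistics


def response(x):
    """Hypothetical saturating response of a performance characteristic
    to a single design parameter x (an assumption for illustration)."""
    return 100.0 * x / (1.0 + x)


def output_spread(nominal, scale=1.0, rel_sd=0.10, n=20000, seed=1):
    """Monte Carlo estimate of the mean and standard deviation of the output
    when the parameter varies around its nominal value with a fixed
    relative standard deviation (rel_sd)."""
    rng = random.Random(seed)
    outputs = [scale * response(rng.gauss(nominal, rel_sd * nominal))
               for _ in range(n)]
    return statistics.mean(outputs), statistics.pstdev(outputs)


# Target output value, taken here as the response at the original setting.
target = response(0.5)

# Setting 1: nominal on the steep part of the curve, no adjustment needed.
mean_steep, sd_steep = output_spread(nominal=0.5)

# Setting 2: nominal on the flat part of the curve. A linear scale factor
# (the "easy" mean adjustment) moves the output back to the target.
scale = target / response(9.0)
mean_flat, sd_flat = output_spread(nominal=9.0, scale=scale)

print(f"steep region: mean={mean_steep:.2f}, sd={sd_steep:.2f}")
print(f"flat region:  mean={mean_flat:.2f}, sd={sd_flat:.2f}")
```

In this sketch the parameter itself is no better controlled at the second setting; the spread of the output falls only because the design has been moved to a region where the response is less sensitive to that variation.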
Tolerance design involves determining tolerances around the nominal settings identified by
parameter design. Industry commonly assigns tolerances using convention rather than
science. Narrow tolerances increase manufacturing costs while wide tolerances increase
performance variation. Thus, tolerance design is a trade-off between society’s loss due to
performance variation and the increase in manufacturing costs.
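The trade-off can be illustrated with a small calculation. This is a sketch under assumptions that are not part of the guideline: the quality loss is taken to be quadratic in the deviation from target (so the expected loss is proportional to the variance), the tolerance half-width is assumed to equal three standard deviations, and the manufacturing cost is assumed to be inversely proportional to the tolerance. The constants `k` and `c` are hypothetical.

```python
def expected_quality_loss(tolerance, k):
    """Expected loss due to performance variation, assuming a quadratic
    loss k * (y - target)**2 and a process centered on target whose
    standard deviation is one third of the tolerance half-width."""
    sigma = tolerance / 3.0
    return k * sigma ** 2


def manufacturing_cost(tolerance, c):
    """Hypothetical cost model: tighter tolerances cost more to hold."""
    return c / tolerance


def total_cost(tolerance, k=50.0, c=2.0):
    """Sum of the two opposing cost terms for one unit of product."""
    return expected_quality_loss(tolerance, k) + manufacturing_cost(tolerance, c)


# Candidate tolerance half-widths from 0.1 to 1.5 in steps of 0.1.
candidates = [i / 10 for i in range(1, 16)]
best_tolerance = min(candidates, key=total_cost)

for t in candidates:
    print(f"tolerance={t:.1f}  total cost={total_cost(t):.3f}")
print(f"lowest total cost at tolerance={best_tolerance:.1f}")
```

With these assumed constants the total cost is lowest at a half-width of 0.6; tighter tolerances are dominated by manufacturing cost and wider ones by the loss due to variation.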
7. Statistically planned experiments can be used to identify the settings of product (and
process) parameters that reduce performance variation. This is the portion of Taguchi’s
methodology that is subject to criticism. Engineers tend to like Taguchi’s statistical methods
because he has made a serious effort to develop methods that are easy for a non-statistician
to use. However, Taguchi’s experiments can be enormous and extremely inefficient.
Taguchi’s approach to the use of statistically planned experiments for parameter design
involves classification of the variables that affect the performance characteristics of a
product or process into two categories: design parameters and sources of noise. Design
parameters are those product or
process parameters whose nominal settings can be chosen by the responsible engineer.
These nominal settings define the product or process design specifications and vice versa.
The sources of noise are all those variables that cause the performance characteristics to
deviate from their targeted values. The noise factors are those sources of noise that can be
systematically varied in a parameter design experiment. The key noise factors, those that
represent the major sources of noise affecting a product’s performance in the field and a
process’ performance in the manufacturing environment, should be identified and included in
the experiment.
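A parameter design experiment of this kind can be sketched as a crossed layout in which every combination of design parameter settings is run against every combination of noise factor levels. The example below is a minimal illustration, not Taguchi's own procedure: it uses full two-level factorials for both sets of factors, and the response function, the factor names (`a`, `b`, `temp`, `humidity`), and all coefficients are hypothetical stand-ins for measured data.

```python
from itertools import product
from statistics import mean, pstdev

LEVELS = (-1, 1)  # coded low and high levels for every factor


def response(a, b, temp, humidity):
    """Hypothetical performance characteristic. The design parameters a and b
    change both the mean and the sensitivity to the noise factors."""
    return (10.0 + 2.0 * a + 1.0 * b
            + (1.5 - 1.0 * a) * temp
            + (0.5 + 0.3 * b) * humidity)


# For each design setting, run every noise combination and summarize
# the results by their mean and standard deviation.
results = {}
for a, b in product(LEVELS, LEVELS):
    ys = [response(a, b, temp, humidity)
          for temp, humidity in product(LEVELS, LEVELS)]
    results[(a, b)] = (mean(ys), pstdev(ys))

# The preferred setting is the one least affected by the noise factors.
best_setting = min(results, key=lambda setting: results[setting][1])

for setting, (m, s) in results.items():
    print(f"design {setting}: mean={m:.2f}, sd={s:.3f}")
print(f"least sensitive design setting: {best_setting}")
```

Even this small case needs 4 x 4 = 16 runs, and the count multiplies with every added factor or level, which is one reason such experiments can become large.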

References
Barker, T.B., "Quality Engineering By Design: Taguchi’s Philosophy," Quality Progress,
December 1986, pp. 32-42.
Gitlow, H., S. Gitlow, A. Oppenheim, R. Oppenheim, Tools and Methods for the Improvement of
Quality, Boston, MA: Irwin, 1989, pp. 491-507.
Gunter, B., "A Perspective on the Taguchi Methods," Quality Progress, June 1987, pp. 44-52.
Kackar, R.N., "Taguchi’s Quality Philosophy: Analysis and Commentary," Quality Progress,
December 1986, pp. 21-29.
Miller, K.L., D. Woodruff, "A Design Master’s End Run Around Trial and Error," Business
Week/Quality, October 15, 1991, p. 24.
Phadke, M.S., Quality Engineering Using Robust Design, Englewood Cliffs, NJ: Prentice Hall,
1989.
Port, O., J. Carey, "Quality: A Field With Roots That Go Back To The Farm," Business
Week/Quality, October 15, 1991, p. 15.
Ross, P.J., Taguchi Techniques for Quality Engineering: Loss Function, Orthogonal Experiments,
Parameter and Tolerance Design, New York, NY: McGraw-Hill Book Company, 1988.
Taguchi, G., Introduction To Quality Engineering: Designing Quality into Products and Processes,
White Plains, NY: Asian Productivity Organization, 1987.

Applicable Tool AT18: User Groups

The reason for creating User Groups is to establish clear and direct communication between the
equipment supplier and the equipment user.
One of the most effective means of establishing and maintaining user groups is through user
group meetings. These are structured, working-level meetings where needed
equipment improvements are identified and prioritized, problems are solved, and strategic
information is shared.
Key factors for successful user group meetings include having:
• A sufficient number of both user attendees with "hands-on" knowledge of
equipment performance and supplier attendees with design, manufacturing and
field service responsibilities
• "Ownership" of the meeting by one individual or joint ownership by one user
person and one supplier person
• Plenty of lead time for identification of attendees, surveys, and for meeting
preparation
• A well-paced agenda with enticements that lead to discussion and user
participation
• "Effective Meetings" skills used by leaders
• A comfortable meeting setting (facilities and accommodations)
References
EIP Data Gathering Group, SEMATECH, Austin, TX
Partnering For Total Quality: A Total Quality Tool Kit, Volume Six, SEMATECH, 1990, pp. 76,
61.

Applicable Tool AT19: Life-Cycle Cost Calculations

Life cycle costs include initial purchase price and the costs associated with equipment installation
and operations over its entire life. Life cycle costs include both equipment supplier costs, which
are passed on to the customer in the purchase price of the equipment, and all costs incurred by
the customer over the life of the equipment. Supplier costs plus the supplier’s profit margin are
referred to as acquisition costs, and include:
• Research and development
• Marketing and sales
• Testing and manufacturing
• Supplier shipping and installation
• Supplier training and support
• Supplier service and spare parts
• Warranty costs
• Continuous improvement
Costs incurred by the customer are referred to as operational costs, and include:
• Customer installation and training
• Operating costs
• Customer service costs and spares inventory
• Customer performed maintenance
• Customer space costs
• Scheduled maintenance
• Equipment improvements and upgrades
• Down time and scrap costs
• Disposal costs
Life cycle costs can be calculated manually by summing all the expected costs and then
normalizing the total by production units, such as the number of wafers expected to be produced
over the life of the equipment. If the equipment life is long (more than 3 years), the present
value (discounted value) of the costs occurring in the later years should be used rather than the
face values in those years.
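The manual calculation can be sketched as follows. This is a minimal illustration, assuming that operational costs are lumped into one amount per year and discounted as if paid at the end of each year; the function names, the discount rate, and all dollar and wafer figures are hypothetical.

```python
def present_value(amount, year, discount_rate):
    """Discount an amount paid at the end of the given year (1, 2, ...)
    back to the time of purchase."""
    return amount / (1.0 + discount_rate) ** year


def life_cycle_cost_per_wafer(acquisition_cost, yearly_operational_costs,
                              total_wafers, discount_rate=0.0):
    """Sum the acquisition cost and the discounted operational costs,
    then normalize by the number of wafers produced over the equipment life."""
    discounted_operations = sum(
        present_value(cost, year, discount_rate)
        for year, cost in enumerate(yearly_operational_costs, start=1)
    )
    return (acquisition_cost + discounted_operations) / total_wafers


# Hypothetical equipment: $1,000,000 purchase price, $200,000 per year of
# operational costs for 5 years, 500,000 wafers over the equipment life.
acquisition = 1_000_000.0
operations = [200_000.0] * 5
wafers = 500_000

undiscounted = life_cycle_cost_per_wafer(acquisition, operations, wafers)
discounted = life_cycle_cost_per_wafer(acquisition, operations, wafers,
                                       discount_rate=0.10)

print(f"cost per wafer, face values:    ${undiscounted:.2f}")
print(f"cost per wafer, present values: ${discounted:.2f}")
```

With a zero discount rate the result is the simple sum of face values divided by the wafer count ($4.00 per wafer in this example); any positive discount rate lowers the contribution of costs that occur in later years.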
If RAMP software (AT12) or SEMATECH Cost Of Ownership (COO) Model software is
available, the life cycle cost calculations can be done using either of those models.

References
RAMP, SETEC91-030, Albuquerque, NM: Sandia National Laboratories.
Campbell, J., B. Thompson, D. Longsine, P. O’Connell, R. Iman, RAMP User’s Reference
Manual, SETEC91-030, Albuquerque, NM: Sandia National Laboratories.
Cost of Ownership Model, SEMATECH Technology Transfer # 91020473B-GEN,
Austin, TX: SEMATECH, January 24, 1991.

SEMATECH Technology Transfer
2706 Montopolis Drive
Austin, TX 78741
http://www.sematech.org
