VEM

DFR – Design for Reliability

RAL

DFR – Fundamentals for Engineers

Reliability Audit Lab


Topics that will be covered:
1. Need for DFR
2. DFR Process
3. Terminology
4. Weibull Plotting
5. System Reliability
6. DFR Testing
7. Accelerated Testing

1. Need for DFR


What Customers Care About:

1. Product Life, i.e., useful life before wear-out.
2. Minimum Downtime, i.e., maximum MTBF.
3. Endurance, i.e., number of operations, robust to environmental changes.
4. Stable Performance, i.e., no degradation in CTQs.
5. On-Time Startup, i.e., ease of system startup.
Reliable Product Vision

[Figure: three panels comparing "DFR" and "No DFR" curves over time, with Release marked. Panel 1: Failure Rate vs. Time, with a Goal level. Panel 2: # Failure Modes vs. Time (Failure Mode Identification, Pre-Launch). Panel 3: Resources/Costs vs. Time. The figure also carries the annotations 50% and 5%.]

• Start with a lower “running rate”, then aggressively “grow” reliability. (Reduce Warranty Costs)
• Identify & “eliminate” inherent failure modes before launch. (Minimize Excursions!)
• Reduce overall costs by employing DFR from the beginning.

Take control of our product quality and aggressively drive to our goals

2. DFR - Process

NPI Process
Phases: DP0, Specify, DP1, Design, DP2, Implement, DP3

• CTQ Identification
• Customer Metrics
• Field data analysis

Rel. Goal Setting
• Assess Customer needs
• Develop Reliability metrics
• Establish Reliability goals

System Model
• Construct functional block diagrams
• Define Reliability model
• ID critical comps. & failure potential
• Allocate reliability targets

Design / Verification
• Apply robust design tools
• DFSS tools
• Generate life predictions
• Begin Growth Testing
• Execute Reliability Test strategy
• Continue Growth Testing
• Accelerated Tests
• Demonstration Testing
• Agency / Compliance Testing

Production / Field
• Establish audit program
• FRACAS system using ‘Clarify’
• Correlate field data & test results

Legacy Product DFR Process . . .

1. Review Historical Data
• Review historical reliability & field failure data
• Review field RMAs
• Review customer environments & applications

2. Analyze Field & In-house Endurance Test Data
• Develop product Fault Tree Analysis
• Identify and pareto observed failure modes

3. Develop Reliability Profile & Goals
• Develop P-Diagrams & System Block Diagram
• Generate Reliability Weibull plots for operational endurance
• Allocate reliability goals to key subsystems
• Identify reliability gaps between existing product & goals for each subsystem

4. Develop & Execute Reliability Growth Plan
• Determine root cause for all identified failures
• Redesign process or parts to address failure mode pareto
• Validate reliability improvement through accelerated life testing & field betas

5. Institute Reliability Validation Program
• Implement process firewalls & sensors to hold design robustness
• Develop and implement long-term reliability validation audit

Design For Reliability Program Summary

Keys to DFR:
• Customer reliability expectations & needs must be fully understood
• Reliability must be viewed from a “systems engineering” perspective
• Product must be designed for the intended use environment
• Reliability must be statistically verified (or risk must be accepted)
• Field data collection is imperative (environment, usage, failures)
• Manufacturing & supplier reliability “X’s” must be actively managed

DFR needs to be part of the entire product development cycle

3. DFR - Terminology

What do we mean by
1. Reliability
2. Failure
3. Failure Rate
4. Hazard Rate
5. MTTF / MTBF

1. Reliability [R(t)]: The probability that an item will perform its intended function without failure under stated conditions for a specified period of time.
2. Failure: The termination of the ability of the product to perform its intended function.
3. Failure Rate [λ, also written FR]: The ratio of the number of failures within a sample to the cumulative operating time.
4. Hazard Rate [h(t)]: The instantaneous rate of failure of an item at time t, given that it has survived until that time; sometimes called the instantaneous failure rate.

Failure Rate Calculation Example

EXAMPLE: A sample of 1000 meters is tested for a week, and two of them fail (assume they fail at the end of the week). What is the Failure Rate?

$$\text{Failure Rate} = \frac{2 \text{ failures}}{1000 \times 24 \times 7 \text{ hours}} = \frac{2}{168{,}000} \text{ failures/hour} = 1.19 \times 10^{-5} \text{ failures/hr}$$
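The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the function name `failure_rate` is made up for illustration, and it assumes every unit accumulates the full test time, as the example does.

```python
def failure_rate(failures: int, units: int, hours_per_unit: float) -> float:
    """Failures divided by cumulative operating time (unit-hours).

    Assumes every unit runs for the full test time, as in the example.
    """
    return failures / (units * hours_per_unit)


# 1000 meters tested for one week (24 * 7 hours each), 2 failures
rate = failure_rate(failures=2, units=1000, hours_per_unit=24 * 7)
print(f"{rate:.3g} failures/hr")  # 1.19e-05 failures/hr
```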

Probability Density Function (PDF):

The Probability Density Function (PDF) is the distribution f(t) of times to failure. For a small interval Δt, f(t)·Δt approximates the probability that the product fails between t and t + Δt.

[Figure: probability density function f(t) plotted against time t.]

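Common Distributions

Probability Distribution | Probability Density Function, f(t) | Variate Range, t
Exponential | $f(t) = \lambda e^{-\lambda t}$ | $0 \le t < \infty$
Weibull | $f(t) = \frac{\beta}{\eta} \left(\frac{t}{\eta}\right)^{\beta-1} e^{-\left(\frac{t}{\eta}\right)^{\beta}}$ | $0 \le t < \infty$
Normal | $f(t) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(t-\mu)^2}{2\sigma^2}}$ | $-\infty < t < \infty$
Log Normal | $f(t) = \frac{1}{\sigma t \sqrt{2\pi}} e^{-\frac{(\ln t - \mu)^2}{2\sigma^2}}$ | $0 \le t < \infty$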

Cumulative Distribution Function (CDF):

The Cumulative Distribution Function (CDF) represents the probability that the product fails at some time prior to t. It is the integral of the PDF evaluated from 0 to t.

$$\text{CDF} = F(t) = \int_0^t f(\tau)\, d\tau$$

[Figure: probability density function f(t) versus time, with the area up to time t1 labeled as the Cumulative Distribution Function.]

Reliability Function R(t)

The reliability of a product is the probability that it does not fail before time t. It is therefore the complement of the CDF:

$$R(t) = 1 - F(t) = 1 - \int_0^t f(\tau)\, d\tau$$

or

$$R(t) = \int_t^{\infty} f(\tau)\, d\tau$$

Typical characteristics:
• when t = 0, R(t) = 1
• when t → ∞, R(t) → 0

[Figure: probability density function f(t) versus time, with the area beyond t labeled R(t) = 1 - F(t).]

Hazard Function h(t)
The hazard function is defined as the limit of the failure rate as Δt approaches zero. In other words, the hazard function, or instantaneous failure rate, is obtained as

$$h(t) = \lim_{\Delta t \to 0} \frac{R(t) - R(t + \Delta t)}{\Delta t \, R(t)}$$

Equivalently, h(t)·Δt is the conditional probability of failure in the interval t to (t + Δt), given that there was no failure up to t. Because [R(t) - R(t + Δt)] / Δt tends to f(t) as Δt approaches zero, the hazard rate is expressed as h(t) = f(t) / R(t).
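As a rough numerical check of the limit definition, the sketch below evaluates the difference quotient for an exponential reliability function and compares it with f(t) / R(t). The rate λ = 0.0002 per hour, the evaluation time and the step size are illustrative assumptions, and `hazard_from_limit` is a made-up helper name.

```python
import math


def hazard_from_limit(reliability, t, dt=1e-3):
    """Approximate h(t) by [R(t) - R(t + dt)] / (dt * R(t)) for a small dt."""
    r_t = reliability(t)
    return (r_t - reliability(t + dt)) / (dt * r_t)


lam = 0.0002  # assumed constant failure rate, failures per hour


def exp_reliability(t):
    return math.exp(-lam * t)      # R(t) for the exponential distribution


def exp_pdf(t):
    return lam * math.exp(-lam * t)  # f(t) for the exponential distribution


t = 1000.0
h_limit = hazard_from_limit(exp_reliability, t)
h_ratio = exp_pdf(t) / exp_reliability(t)
print(h_limit, h_ratio)  # both are very close to lam
```

For the exponential case both values collapse to λ, which is the constant hazard rate of the useful-life region.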

Hazard Functions
As shown, the hazard rate is a function of time. What shape does the hazard rate typically take over time? The general answer is a bathtub-shaped function.

The sample will experience a high failure rate at the beginning of the operation time due to weak or substandard components, manufacturing imperfections, design errors and installation defects. This period of decreasing failure rate is referred to as the “infant mortality region”. It is an undesirable region from both the manufacturer's and the consumer's viewpoint, as it causes unnecessary repair cost for the manufacturer and an interruption of product usage for the consumer. Early failures can be minimized by burning in systems or components before shipment, by improving the manufacturing process and by improving the quality control of the products.

At the end of the early failure-rate region, the failure rate eventually reaches a constant value. During this constant failure-rate region the failures do not follow a predictable pattern but occur at random due to changes in the applied load. The randomness of material flaws or manufacturing flaws also leads to failures during the constant failure-rate region.

The third and final region of the failure-rate curve is the wear-out region. The beginning of the wear-out region is noticed when the failure rate starts to increase significantly above the constant failure-rate value and the failures are no longer attributed to randomness but are due to the age and wear of the components. To minimize the effect of the wear-out region, one must use periodic preventive maintenance or consider replacement of the product.

Product's Hazard Rate Vs. Time : “The Bathtub Curve”

[Figure: Hazard Rate, h(t), versus Time, in three regions. Infant Mortality: h(t) decreasing, manufacturing defects. Random Failure (Useful Life): h(t) constant, random failures. Wear out: h(t) increasing, wear-out failures.]

Mean Time To Failure [MTTF]

One of the measures of a system's reliability is the mean time to failure (MTTF). It should not be confused with the mean time between failures (MTBF). When the system is non-repairable, we refer to the expected time to failure as the MTTF. When the system is repairable, we refer to the expected time between two successive failures as the MTBF.

Now consider n identical non-repairable systems and observe the time to failure of each. Assume that the observed times to failure are t_1, t_2, ..., t_n. The estimated mean time to failure is

$$\text{MTTF} = \frac{1}{n} \sum_{i=1}^{n} t_i$$
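A minimal sketch of the MTTF estimate, assuming all n units were run to failure (no censored units). The failure times below are invented for illustration only.

```python
def mttf(times_to_failure):
    """Estimate MTTF as the arithmetic mean of observed times to failure."""
    return sum(times_to_failure) / len(times_to_failure)


# Hypothetical times to failure, in hours, for five non-repairable units
observed_hours = [1200.0, 950.0, 1430.0, 1010.0, 1110.0]
estimate = mttf(observed_hours)
print(estimate)  # 1140.0
```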


Useful Life Metrics: Mean Time Between Failures (MTBF)

Mean Time Between Failures [MTBF]: For a repairable item, the ratio of the cumulative operating time to the number of failures for that item.
(also Mean Cycles Between Failures, MCBF, etc.)

EXAMPLE: A motor is repaired and returned to service six times during its life and provides 45,000 hours of service. Calculate MTBF.

$$\text{MTBF} = \frac{\text{Total operating time}}{\text{\# of failures}} = \frac{45{,}000}{6} = 7{,}500 \text{ hours}$$

MTBF or MTTF is a widely used metric during the Useful Life period, when the hazard rate is constant

The Exponential Distribution

If the hazard rate is constant over time, then the product follows the exponential distribution. This is often used for electronic components.

$$h(t) = \lambda = \text{constant}$$
$$\text{MTBF (mean time between failures)} = \frac{1}{\lambda}$$
$$f(t) = \lambda e^{-\lambda t}$$
$$F(t) = 1 - e^{-\lambda t}$$
$$R(t) = e^{-\lambda t}$$

At t = MTBF:

$$R(t) = e^{-\lambda t} = e^{-\lambda \left(\frac{1}{\lambda}\right)} = e^{-1} = 36.8\%$$

Appropriate tool if failure rate is known to be constant
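The exponential relationships can be verified numerically. The sketch below assumes an illustrative rate of λ = 0.0002 failures per hour and checks that F(t) + R(t) = 1, that f(t) / R(t) returns λ, and that reliability at t = MTBF is about 36.8%.

```python
import math


def exp_pdf(t, lam):
    return lam * math.exp(-lam * t)      # f(t)


def exp_cdf(t, lam):
    return 1.0 - math.exp(-lam * t)      # F(t)


def exp_reliability(t, lam):
    return math.exp(-lam * t)            # R(t)


lam = 0.0002        # assumed failure rate, failures per hour
mtbf = 1.0 / lam    # MTBF = 1 / lambda

r_at_mtbf = exp_reliability(mtbf, lam)
hazard_at_mtbf = exp_pdf(mtbf, lam) / exp_reliability(mtbf, lam)
print(f"R(MTBF) = {r_at_mtbf:.3f}")  # R(MTBF) = 0.368
```

Only about 37% of units are expected to survive to the MTBF, so MTBF should not be read as a guaranteed life.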
The Exponential Distribution

[Figure: exponential PDF f(t) and CDF F(t) versus Time to Failure (0 to 5×10^4) for λ = 0.0001, 0.0002 and 0.0003.]

Useful Life Metrics: Reliability

Reliability can be described by the single-parameter exponential distribution when the Hazard Rate, λ, is constant (i.e., the “Useful Life” portion of the bathtub curve):

$$R = e^{-\frac{t}{\text{MTBF}}} = e^{-FR \cdot t}$$

Where:

t = Mission length (uptime or cycles in question)

EXAMPLE: If MTBF for a motor is 7,500 hours, the probability of operating for 30 days without failure is ...

$$R = e^{-\frac{30 \times 24 \text{ hours}}{7500 \text{ hours}}} = 0.908 = 90.8\%$$

A mathematical model for reliability during Useful Life
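The motor example can be reproduced in Python, and the same formula can be inverted to ask how long a mission may be for a target reliability. This is a sketch; the function names and the 95% target are assumptions chosen for illustration.

```python
import math


def mission_reliability(mission_hours, mtbf_hours):
    """R = exp(-t / MTBF), valid only while the hazard rate is constant."""
    return math.exp(-mission_hours / mtbf_hours)


def mission_length_for(target_reliability, mtbf_hours):
    """Invert R = exp(-t / MTBF) to get t = -MTBF * ln(R)."""
    return -mtbf_hours * math.log(target_reliability)


r_30_days = mission_reliability(30 * 24, 7500)
print(f"{r_30_days:.3f}")  # 0.908

# Hypothetical question: how many hours keep reliability at or above 95%?
hours_at_95 = mission_length_for(0.95, 7500)
```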

4. DFR – Weibull Plotting


Weibull Probability Distribution

• Originally proposed by the Swedish engineer Waloddi Weibull (1887-1979) in the early 1950s
• Statistically represented fatigue failures
• Weibull probability density function (PDF, distribution of values):

$$f(t) = \frac{\beta}{\eta} \left(\frac{t}{\eta}\right)^{\beta-1} e^{-\left(\frac{t}{\eta}\right)^{\beta}}$$

Equation valid for minimum life = 0

t = Mission length (time, cycles, etc.)
β = Weibull Shape Parameter, “Slope”
η = Weibull Scale Parameter, “Characteristic Life”

The Weibull Distribution

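This powerful and versatile reliability function is capable of modeling most real-life systems because the time dependency of the failure rate can be adjusted.

$$h(t) = \frac{\beta}{\eta} \left(\frac{t}{\eta}\right)^{\beta-1}$$

$$f(t) = \frac{\beta}{\eta} \left(\frac{t}{\eta}\right)^{\beta-1} e^{-\left(\frac{t}{\eta}\right)^{\beta}}$$

$$R(t) = 1 - F(t) = e^{-\left(\frac{t}{\eta}\right)^{\beta}}$$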
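The three Weibull functions translate directly into code. The sketch below uses hypothetical parameter values and shows two properties that follow from the formulas: with β = 1 the hazard rate is the constant 1/η, and at t = η the reliability is e^-1 for any β.

```python
import math


def weibull_hazard(t, beta, eta):
    return (beta / eta) * (t / eta) ** (beta - 1)


def weibull_reliability(t, beta, eta):
    return math.exp(-((t / eta) ** beta))


def weibull_pdf(t, beta, eta):
    # f(t) = h(t) * R(t)
    return weibull_hazard(t, beta, eta) * weibull_reliability(t, beta, eta)


eta = 1000.0  # hypothetical characteristic life, hours

# beta = 1 reduces to the exponential case: constant hazard 1 / eta
constant_hazard = [weibull_hazard(t, 1.0, eta) for t in (100.0, 500.0, 2000.0)]

# At t = eta, R = exp(-1) whatever the shape parameter is
r_at_eta = [weibull_reliability(eta, b, eta) for b in (0.5, 1.0, 3.44)]
```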

Weibull PDF
• Exponential when β = 1.0
• Approximately normal when β = 3.44
• Time dependent hazard rate

$$f(t) = \frac{\beta}{\eta} \left(\frac{t}{\eta}\right)^{\beta-1} e^{-\left(\frac{t}{\eta}\right)^{\beta}}$$

[Figure: Weibull PDF f(t) versus time (0 to 2000) for β = 0.5, 1.0 and 3.44, all with η = 1000.]

Weibull Hazard Function

$$h(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{R(t)}$$

$$h(t) = \frac{\frac{\beta}{\eta} \left(\frac{t}{\eta}\right)^{\beta-1} \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]}{1 - \left\{1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]\right\}} = \frac{\beta}{\eta} \left(\frac{t}{\eta}\right)^{\beta-1}$$

• β < 1: Highest failure rate early (“Infant Mortality”)
• β = 1: Constant failure rate
• β > 1: Highest failure rate later (“Wear-Out”)

[Figure: Weibull hazard rate h(t) versus Time (0 to 2500) for β = 0.5, 1.0 and 3.44, all with η = 1000.]

Weibull Reliability Function

Reliability is the probability that the part survives to time t.

$$R(t) = 1 - F(t) = e^{-\left(\frac{t}{\eta}\right)^{\beta}}$$

[Figure: Weibull reliability R(t) versus Time (0 to 2500) for β = 0.5, 1.0 and 3.44, all with η = 1000.]

Summary of Useful Definitions - Weibull Analysis

Beta (β): The slope of the Weibull CDF when plotted on Weibull paper.
B-life: A common way to express values of the cumulative distribution function. B10 refers to the time at which 10% of the parts are expected to have failed.
CDF: The Cumulative Distribution Function expresses the time-dependent probability that a failure occurs at some time before time t.
Eta (η): The characteristic life, or time at which 63.2% of the parts are expected to have failed. Also expressed as the B63.2 life. On Weibull paper, it is the time at which the CDF line crosses y = 0.
PDF: The Probability Density Function expresses the expected distribution of failures over time.
Weibull plot: A plot where the x-axis is scaled as ln(time) and the y-axis is scaled as ln(ln(1 / (1 - CDF(t)))). The Weibull CDF plotted on Weibull paper will be a straight line of slope β and y-intercept -β·ln(η).
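A B-life follows from solving F(t) = 1 - exp(-(t/η)^β) for t, which gives t = η·(-ln(1 - p))^(1/β) for a failed fraction p. The sketch below applies this with hypothetical parameters (β = 2, η = 1000 hours); `b_life` is an illustrative name.

```python
import math


def b_life(fraction_failed, beta, eta):
    """Time by which the given fraction of parts is expected to have failed."""
    return eta * (-math.log(1.0 - fraction_failed)) ** (1.0 / beta)


beta, eta = 2.0, 1000.0  # hypothetical Weibull parameters

b10 = b_life(0.10, beta, eta)                      # B10 life
b63_2 = b_life(1.0 - math.exp(-1.0), beta, eta)    # B63.2 life equals eta
print(f"B10 = {b10:.1f} h, B63.2 = {b63_2:.1f} h")
```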


Weibull Analysis

What is a Weibull Plot?
• Log-log plot of probability of failure versus age for a product or component
• Nominal “best-fit” line, plus confidence intervals
• Easily generated, easily interpreted graphical read-out
• Comparison: test results for a redesigned product can be plotted against original product or against goals

[Figure: example Weibull plot showing Observed Failures, the Weibull Best Fit line and the Confidence on Fit bounds.]
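One common way to obtain the best-fit line is median rank regression. The slides do not say which fitting method was used, so the sketch below is an assumption: it assigns plotting positions with Bernard's approximation, (i - 0.3) / (n + 0.4), and fits a least-squares line of y = ln(ln(1 / (1 - F))) on x = ln(t). The failure times are invented, all units are assumed to have failed (no suspensions), and confidence bounds are not computed.

```python
import math


def fit_weibull_mrr(failure_times):
    """Estimate (beta, eta) by median rank regression on complete data."""
    times = sorted(failure_times)
    n = len(times)
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        median_rank = (i - 0.3) / (n + 0.4)  # Bernard's approximation
        xs.append(math.log(t))
        ys.append(math.log(math.log(1.0 / (1.0 - median_rank))))
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx                      # slope of the fitted line
    intercept = y_mean - beta * x_mean    # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta


# Hypothetical failure times in hours
hours = [410.0, 620.0, 790.0, 980.0, 1150.0, 1420.0]
beta_hat, eta_hat = fit_weibull_mrr(hours)
print(f"beta = {beta_hat:.2f}, eta = {eta_hat:.0f} hours")
```

The slope of the fitted line is β and the intercept is -β·ln(η), so η is recovered as exp(-intercept / β).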

Weibull Shape Parameter (β) and Scale Parameter (η) Defined

β is called the SLOPE
For the Weibull distribution, the slope describes the steepness of the Weibull best-fit line (see following slides for more details). β also has a relationship with the trend of the hazard rate, as shown on the “bathtub curves” on a subsequent slide.

η is called the CHARACTERISTIC LIFE
For the Weibull distribution, the characteristic life is equal to the scale parameter, η. This is the time at which 63.2% of the product will have failed.

Scale and Shape are the Key Weibull Parameters

β and the Bathtub Curve

β < 1
• Implies “infant mortality”
• If this occurs:
  - Failed products “not to print”
  - Manufacturing or assembly defects
  - Burn-in can be helpful
• If a component survives infant mortality phase, likelihood of failure decreases with age.

β = 1
• Implies failures are “random”, individually unpredictable
• An old part is as good as a new part (burn-in not appropriate)
• If this occurs:
  - Failures due to external stress, maintenance or human errors
  - Possible mixture of failure modes

1 < β < 4
• Implies mild wearout
• If this occurs:
  - Low cycle fatigue
  - Corrosion or Erosion
  - Scheduled replacement may be cost effective

β > 4
• Implies rapid wearout
• If this occurs, suspect:
  - Material properties
  - Brittle materials like ceramics
• Not a bad thing if it happens after mission life has been exceeded.

5. DFR – System Reliability

System Reliability Evaluation

A system (or a product) is a collection of components arranged according to a specific design in order to achieve desired functions with acceptable performance and reliability measures. Clearly, the type of components used, their qualities, and the design configuration in which they are arranged have a direct effect on the system performance and its reliability. For example, a designer may use a smaller number of high-quality components and configure them in such a way as to result in a highly reliable system, or a designer may use a larger number of lower-quality components and configure them differently in order to achieve the same level of reliability. Once the system is configured, its reliability must be evaluated and compared with an acceptable reliability level. If it does not meet the required level, the system should be redesigned and its reliability should be re-evaluated.

Reliability Block Diagram (RBD) Technique

The first step in evaluating a system's reliability is to construct a reliability block diagram, which is a graphical representation of the components of the system and how they are connected. The purpose of the RBD technique is to represent failure and success criteria pictorially and to use the resulting diagram to evaluate System Reliability.

Benefits
• The pictorial representation means that models are easily understood and therefore readily checked.
• Block diagrams are used to identify the relationship between elements in the system. The overall system reliability can then be calculated from the reliabilities of the blocks using the laws of probability.
• Block diagrams can be used for the evaluation of system availability provided that both the repair of blocks and failures are independent events, i.e., provided the time taken to repair a block is dependent only on the block concerned and is independent of repair to any other block.

Elementary models
Before beginning the model construction, consideration should be given to the best way of dividing the system into blocks. It is particularly important that each block should be statistically independent of all other blocks (i.e., no unit or component should be common to a number of blocks). The most elementary models are the following:
• Series
• Active parallel
• m-out-of-n
• Standby models

Typical RBD configurations and related formulae

Simple Series and Parallel System
Figure a shows the units A, B, C, …, Z constituting a system. The interpretation can be stated as ‘any unit failing causes the system as a whole to fail’, and the system is referred to as an active series system. Under these conditions, the reliability Rs of the system is given by

$$R_s = R_a \cdot R_b \cdot R_c \cdots R_z$$

I → A → B → C → … → Z → O
a) Series System

Figure b shows the units X and Y operating in such a way that the system will survive as long as at least one of the units survives. This type of system is referred to as an active parallel system.

$$R_s = 1 - (1 - R_x)(1 - R_y)$$

I → (X in parallel with Y) → O
b) Parallel System

A Series / Parallel System

When blocks such as X and Y themselves comprise sub-blocks in series, the block diagram takes the form illustrated in figure c.

$$R_x = R_{a1} \cdot R_{b1} \cdot R_{c1} \cdots R_{z1}$$
$$R_y = R_{a2} \cdot R_{b2} \cdot R_{c2} \cdots R_{z2}$$
$$R_s = 1 - (1 - R_x)(1 - R_y)$$

I → (A1 → B1 → C1 → … → Z1) in parallel with (A2 → B2 → C2 → … → Z2) → O
c) Series / Parallel System
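The series and parallel formulas compose naturally in code. The sketch below assumes statistically independent blocks, as the RBD method requires; the block reliabilities are hypothetical.

```python
from math import prod


def series(reliabilities):
    """Series system: every block must survive."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Active parallel system: at least one block must survive."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Two identical branches, each with three blocks in series (hypothetical values)
branch_x = series([0.99, 0.98, 0.97])
branch_y = series([0.99, 0.98, 0.97])
system = parallel([branch_x, branch_y])
print(f"branch = {branch_x:.4f}, system = {system:.4f}")
```

Each branch is less reliable than its weakest block, while the redundant pair is more reliable than either branch alone.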

m-out-of-n units
The figure represents instances where system success is assured whenever at least m of n identical units are in an operational state. Here m = 2, n = 3.

$$R_s = R_x^3 + 3 R_x^2 F_x, \quad \text{where } F_x = 1 - R_x$$

I → (three X units in parallel, 2/3 required) → O
d) m-out-of-n System
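The 2-out-of-3 formula is a special case of a binomial sum over the number of surviving units. The sketch below generalizes it to any m and n, assuming identical and independent units; the unit reliability of 0.9 is hypothetical.

```python
from math import comb


def m_out_of_n(m, n, unit_reliability):
    """Probability that at least m of n identical, independent units survive."""
    r = unit_reliability
    return sum(comb(n, k) * r**k * (1.0 - r) ** (n - k) for k in range(m, n + 1))


r_x = 0.9                      # hypothetical unit reliability
r_s = m_out_of_n(2, 3, r_x)    # matches R_x^3 + 3 * R_x^2 * (1 - R_x)
print(f"{r_s:.3f}")  # 0.972
```

Setting m = n gives the series result for identical units, and m = 1 gives the active parallel result.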


6. DFR – Reliability Testing


Reliability Testing - Why?

Reliability Testing allows us to:
• Determine if a product’s design is capable of performing its intended function for the desired period of time.
• Have confidence that our sample-based prediction will accurately reflect the performance of the entire population.
• Provide a path to “grow” a product’s reliability by identifying weak points in the design.
• Confirm the product’s performance in the field.
• Identify failures caused by severe applications that exceed the ratings, and recognize opportunities for the product to safely perform under more diverse applications.

Reliability Testing - Measures

Reliability Testing answers questions like …
• What is my product’s Failure Rate?
• What is the expected life?
• Which distribution does my data follow?
• What does my hazard function look like?
• What failure modes are present?
• How “mature” is my product’s reliability?

These metrics and more can be obtained with the right reliability test

Four Major Categories of Reliability Testing
• Reliability Growth Tests (RGT)
  - Normal Testing
  - Accelerated Testing
• Reliability Demonstration Tests (RDT)
• Production Reliability Acceptance Tests (PRAT)
• Reliability Validation (RV)

Reliability Testing - Growth Testing

Scope: To determine a product’s physical limitations, functional capabilities and inherent failure mechanisms.
• Emphasis is on discovering & “eliminating” failure modes
• Failures are welcome: they represent data sources
• Failures in development = fewer failures in the field
• Used with a changing design to drive reliability growth
• Sample size is typically small
• Test Types: Normal or Accelerated Testing
• Can be very helpful early in the process when done on competitor products which are sufficiently similar to the new design.

Used early & throughout the design process

Reliability Testing … Demonstration Testing

Scope: To demonstrate the product’s ability to fulfill reliability, availability & design requirements under realistic conditions.
• Failures are no longer hoped for, because they jeopardize compliance (though it’s still better to catch a problem before rather than after launch!)
• Management tool: provides means for verifying compliance
• Provides reliability measurement, typically performed on a static design (subsequent design changes may invalidate the demonstrated reliability results)
• Sample size is typically larger, due to need for degree of confidence in results and increased availability of samples.

Used at end of design stages to demonstrate compliance to specification

Reliability Testing … Production Reliability Acceptance Testing (PRAT)

Scope: To ensure that variation in materials, parts & processes related to the move from prototypes to full production does not affect product reliability.
• Performed during full production; verifies that predictions based on prototype results are valid in full production
• Provides feedback for continuous improvement in sourcing/manufacturing
• Sample size ranges from full (screen) to partial (audit)
• Test Types: Highly Accelerated Stress Screens/Audits (HASS/A), Environmental Stress Screening (ESS), Burn-in

Screens and Audits precipitate and detect hidden defects

Reliability Testing … Validation

Scope: To ensure that the product is performing reliably in the actual customer environment/application.
• “Testing results” based on actual field data sources
• Provides field feedback on the success of the design
• Helps to improve future design / redesign & prediction methods
• Requires effective data collection & corrective action process
• Sample size depends on the customer & product type

Reliability Validation tracks field data on Customer Dashboards

Reliability Testing … The Path

NPI (New Products):
1. Initial Design (Growth Testing)
2. Pilot Testing (Demonstration Testing)
3. Implementation (Acceptance Testing)
4. Post-Sales Service (Validation Testing)
Activities along the path: Set Reliability Goals; Develop Models; Initial Design; Accelerated Testing; NPI Pilot Readiness; Mature Design; Implement Production; Reliability Demonstration; Audit Programs; Establish service schedule; Keep updated dashboards; Ensure Data Collection; Improve future design

Legacy Products:
1. Field Data Acquisition (Validation Testing)
2. Verification (Growth Testing)
3. Product Redesign (Demonstration Testing)
4. Implementation (Acceptance Testing)
Activities along the path: Complaint generated; Create case (Clarify); Reproduce Failure; Reliability Verification; Revise goals; Redefine models; Product redesign; Implement changes; Reliability Demonstration; Audit Programs

Reliability Tests are critical at all stages!

7. DFR – Accelerated Testing


Accelerated Testing

Scope: Accelerated testing allows designers to make predictions about the life of a product by developing a model that correlates reliability under accelerated conditions to reliability under normal conditions.

BASIC CONCEPT
[Figure: Time to Failure versus Stress. To predict at the normal stress level, we test at an elevated stress level and extrapolate back along the model line.]

Model: The model is how we extrapolate back to normal stress levels.

Common Models:
• Arrhenius: Thermal
• Inverse Power Law: Non-Thermal
• Eyring: Combined

Results @ high stress + stress-life relationship = Results @ normal stress
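As an illustration of a stress-life relationship, the sketch below computes an acceleration factor using the commonly quoted form of the Arrhenius model, AF = exp[(Ea / k)·(1/T_use - 1/T_stress)], with temperatures in kelvin. The slides name the model but do not give its equation, so the formula, the activation energy of 0.7 eV and the two temperatures are assumptions for this example; in practice Ea depends on the failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(ea_ev, use_temp_c, stress_temp_c):
    """Ratio of life at use temperature to life at stress temperature."""
    t_use_k = use_temp_c + 273.15
    t_stress_k = stress_temp_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)


# Hypothetical case: Ea = 0.7 eV, use at 40 C, test at 85 C
af = arrhenius_acceleration_factor(0.7, use_temp_c=40.0, stress_temp_c=85.0)

# A failure time observed at the elevated stress maps back to normal stress
test_hours_to_failure = 500.0  # hypothetical test result
equivalent_use_hours = af * test_hours_to_failure
```

The extrapolation holds only if the elevated temperature triggers the same failure mechanism that occurs at normal stress.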


Accelerated Testing

Key steps in planning an accelerated test:
• Choose a stress to elevate: requires an understanding of the anticipated failure mechanism(s); the stress must be relevant (temperature & vibration usually apply).
• Determine the accelerating model: requires knowledge of the nature of the acceleration of this failure mechanism, as a function of the accelerating stress.
• Select elevated stress levels: requires a previous study of the product’s operating & destructive limits to ensure that the elevated stress level does not introduce new failure modes which would not occur at normal operating stress levels.

Applicability of technique depends on careful planning and execution
Parametric Reliability Models

One of the most important factors that influence the design process of a product or a system is the reliability values of its components.

In order to estimate the reliability of the individual components or the entire system, we may follow one or more of the following approaches:
➢ Historical Data
➢ Operational Life Testing
➢ Burn-In Testing
➢ Accelerated Life Testing
Approach 1 : Historical Data

The failure data for the components can be found in data banks such as GIDEP (Government-Industry Data Exchange Program), MIL-HDBK-217 (which includes failure data for components as well as procedures for reliability prediction), the AT&T Reliability Manual and the Bell Communications Research Reliability Manual.

In such data banks and manuals, the failure data are collected from different manufacturers and presented with a set of multiplying factors that relate to different manufacturers' quality levels and environmental conditions.