
The Journal of the Reliability Analysis Center
Second Quarter - 2003

Celebrating 35 Years of Excellence in R&M

RAC is a DoD Information Analysis Center sponsored by the Defense Technical Information Center.

Inside
• Methods for Reducing the Cost to Maintain a Fleet of Repairable System
• PRISM Column
• Training Flyer (insert)
• Reliability Testing of Printed Wiring Boards with Interconnect Stress Testing Technology (IST)
• University of Warwick Conducting Reliability Benchmark Survey
• DoD to Replace Handbook on Test and Evaluation of Systems Reliability, Availability, and Maintainability
• Future Events
• From the Editor
• RMSQ Headlines

Five Key Ways to Improve Reliability
By: David E. Mortin, Ph.D., U.S. Army Materiel Systems Analysis Activity, Aberdeen Proving Ground, MD; Stephen P. Yuhas, U.S. Army Evaluation Center, Aberdeen Proving Ground, MD; and Michael J. Cushing, Ph.D., U.S. Army Materiel Systems Analysis Activity, Aberdeen Proving Ground, MD

Introduction
The importance of achieving reliability requirements on today’s weapon systems cannot be overstated. Noncompliance with reliability requirements may result in reduced mission effectiveness and billions of dollars in added operating and support costs over the life cycles of systems. Today, with shortened acquisition cycles and systems employing complex state-of-the-art technologies, it is not unusual to find weapon systems falling short of fulfilling their reliability requirements. There are many reasons why systems fail to achieve their requirements. However, several reasons for failure surface repeatedly. Five of these reasons are: (1) failure to design in reliability early in the development process; (2) inadequate lower-level testing; (3) relying on predictions instead of conducting engineering design analysis; (4) failure to perform engineering analyses of commercial-off-the-shelf (COTS) equipment; and (5) lack of reliability improvement incentives. This article discusses how reliability can be improved by focusing on design, lower-level testing, conducting appropriate analyses, evaluating COTS, and providing incentives for reliability improvement.

Design in Reliability Early in the Development Process
It is critical that developers have a thorough understanding of the tactical operational environment in which the system, subsystem, or component will be employed and ensure that the requirements flow-down process to suppliers is adequate. Given this understanding, many potential failures can be identified and eliminated very early in the development process. In addition, every effort should be made to leverage existing field test results to support failure analysis efforts.

Efforts to eliminate failures require a commitment of technical resources. Engineers are needed to conduct thermal and vibration analyses to address potential failure mechanisms and failure sites. These analyses can include the use of fatigue analysis tools, finite element modeling, dynamic simulation, heat transfer analyses, and other engineering analysis models. Industry, universities, and government organizations have produced a large number of engineering tools that are widely used, especially in the commercial sector. The capability exists to model and address a number of failure mechanisms for everything from suspension components to circuit boards. Yet, we continue to see poorly designed components making their way into system-level developmental testing, operational testing, and beyond. As one example, we still see circuit boards that, because of their design and mounting, have very low natural frequencies, large board displacements, and very large stresses imposed upon component leads. This type of problem can be identified very early in the design process with a $15,000 engineering analysis and fixed for a few dollars more. Based on one example, the reduction in operating and support costs associated with identifying a single problem in a given circuit card can be $27,000,000.

Conduct Lower-Level Testing
Lower-level testing, such as highly accelerated life testing (HALT) and highly accelerated stress screening (HASS), is critical for precipitating failures early and identifying weaknesses in the design. Contractors are encouraged to plan and implement an accelerated qualification process that is based on engineering and failure mechanism analysis. Integration testing is also critical for identifying unforeseen interface issues. Although some programs are conducting these lower-level tests, many do not or only perform them for a small subset of the components. These tests can be very affordable but, when budgets are reduced, can be some of the first activities to be cut. If a program does not conduct early testing that is specifically designed to precipitate failures so that the design can be improved early in the program, there will likely be unforeseen problems down the road that will cause significant schedule delays and operating and support cost increases. Additionally, it is a good idea to integrate the customer in both the test and development process early on to provide valuable insights on user operations and requirements that can impact the test and design.

Developmental testing serves as one of the last opportunities to fix remaining problems and increase the probability of system success. Some programs choose to have very limited or no formal developmental testing. When an Army system meets the reliability requirement in developmental testing, there is a 68 percent chance that it will meet the operational testing reliability requirement. If the system fails in developmental testing, there is only an 18 percent chance that it will meet the operational testing reliability requirement. Significant program setbacks often happen when testing is reduced or eliminated to meet schedule or cost constraints. Developmental testing is required not only to ensure system reliability maturation but also to mitigate risk in meeting requirements during operational testing. Insufficient or poorly planned developmental test strategies often result in system failure and a repeat of operational testing.

Early low-level testing, along with focused higher-level testing, is key to producing products with high reliability. Without comprehensive lower-level testing on most or all critical subassemblies, and without significant integration and developmental testing, there is little likelihood that high levels of reliability will be achieved.

Rely More on Engineering Design Analysis and Less on Predictions
We still see programs where, when you ask to see the status of the reliability program, you are shown a set of system and subsystem reliability predictions. A reliability prediction may have little or nothing to do with the actual reliability of the product and can actually encourage poor design practices. Contractors and subcontractors that frequently quote predictions may not understand the engineering and design considerations necessary to minimize risk and to produce a reliable design. In many cases, the person producing the prediction may not be a direct contributor to the design team. The historic focus on the accounting of predictions versus the engineering activities needed to eliminate failures during the design process has significantly limited our ability to produce highly reliable products. Additionally, in some cases, predictions are used as a means to bypass demonstration of contractual specifications in developmental testing, resulting in failure to meet operational testing requirements. High reliability is not obtained through reliability predictions.

When most people think of reliability models, they think of reliability predictions; reliability block diagrams; failure mode, effects, and criticality analysis (FMECA); fault trees; and reliability growth. When directly used to influence the design team, or when used to manage reliability progress, these tools can be extremely useful to focus engineering and testing efforts. However, the most important reliability tools are the structural, thermal, fatigue, failure mechanism, and vibration models used by the design team to ensure that they are producing a product that will have a sufficiently long failure-free operating period. A good supplier routinely conducts thermal and vibration analyses to address potential failure mechanisms and failure sites. When the major focus of the system reliability program is reliability prediction, there is a high potential for failure.

Perform Engineering Analyses of Commercial-Off-The-Shelf Equipment
COTS equipment represents a great opportunity to improve reliability, reduce costs, and leverage the latest technologies. However, COTS does not imply that we abandon engineering analyses and early testing. On many occasions, we have heard the expression, “that piece of equipment is COTS so its reliability is what it is.” Thermal, vibration, fatigue, and failure mechanism modeling, combined with early accelerated testing, can quantify and minimize the risk of COTS equipment failing in the military operating environment. We still have cases where a major COTS failure mode is discovered relatively late in the program. Often COTS equipment data is proprietary; however, there are typically workarounds that can be used to develop data that can support sufficiently detailed engineering analyses. Relatively simple vibration and thermal analyses can detect some potential showstoppers. The showstoppers that have emerged because of inadequate early analysis have cost millions of dollars. Again, ensuring adequate requirements flow-down to suppliers is a must. It is a good idea to adhere to a strict configuration of suppliers as well.

Provide Reliability Incentives in Contracts
Often, the supplier does not have a strong incentive to make a product reliable. Even when reliability is mentioned in a statement of work or in platform specifications, the weight of reliability in the selection criteria is usually small. Contractors have to bid low in order to be competitive. When they have to trim their programs, reliability is often one of the first areas to go. To further complicate things, contractors typically make significant amounts of profit from follow-on replenishment spares. Unless the contractor sees value in directing and resourcing the design team to achieve high reliability, we will continue to field equipment with reliability values that fall far short of what the commercial consumer typically experiences. Most suppliers have the engineering staff and technical know-how to produce highly reliable systems.

Summary
By one estimate, operating and support costs represent 60 percent of total life cycle costs. Reliability improvements directly influence the majority of the operating and support cost contributors. Over the life cycle of a major system, moderate improvements in reliability can result in savings in the hundreds of millions to billions of dollars.

The five areas addressed in this article can provide a significant step toward achieving high reliability. Many people perceive that high levels of system reliability have to be very costly to achieve. This perception can be based on the notion that only expensive or militarized components provide high levels of reliability and that higher reliability equates to significant increases in testing and delays in schedule. We must change this perception. When engineering-based reliability improvement techniques are performed as part of the design and development process, high reliability can be cost-effectively achieved.

About the Authors
Dr. David E. Mortin is Chief of the Reliability Branch at the U.S. Army Materiel Systems Analysis Activity, Aberdeen Proving Ground, MD. He has a B.S. in aerospace engineering from the State University of New York at Buffalo, an M.S. in statistics from the University of Delaware, and a Ph.D. in reliability engineering from the University of Maryland.

Stephen P. Yuhas is the Reliability and Maintainability Director at the U.S. Army Evaluation Center. He holds a B.S. in mathematics from Pennsylvania State University. He has also completed extensive graduate studies in operations research/industrial engineering at Penn State and in statistics at the University of Delaware.

Dr. Michael J. Cushing is a technical advisor for the U.S. Army Materiel Systems Analysis Activity. He has a B.S. in Electrical Engineering from Johns Hopkins University and an M.S. and Ph.D. in reliability engineering from the University of Maryland.

Methods for Reducing the Cost to Maintain a Fleet of Repairable System
By: Larry H. Crow, Alion Science and Technology

Introduction
When a fleet is first deployed, the economic life and useful life parameters are often not known. However, as the fleet ages, spares usage, repair frequency, reliability, and cost information become available that may be used to estimate these parameters.

Specific problems receiving increased attention as systems age are:

1. Cost to maintain a fleet due to repair and overhaul.
2. Maintaining the mission reliability requirements.
3. Determining the optimum repair and overhaul strategy to minimize life cycle cost.
4. Determining the wearout profile for a fielded system.
5. Determining corrective actions for fielded systems to upgrade reliability and reduce cost.

In this article we present two methodologies designed to provide information based on data that will help make decisions on these issues. One methodology is concerned with minimizing total life cycle costs due to repair and overhaul. The other methodology is concerned with corrective actions and in-service reliability growth to increase reliability and therefore reduce the cost of failures and overhauls. Specifically, the minimum life cycle cost methodology addresses issues 1, 2, 3, and 4, and the in-service reliability growth methodology addresses issues 1 and 5.

In many cases, the approach to sustaining a given system fleet may differ from the approach for another fleet of the same system. For example, the sustainment policy for one fleet of helicopters may require periodic general overhaul for the entire helicopter, whereas the sustainment policy for another helicopter fleet may only have overhaul at the subsystem and line replaceable unit (LRU) levels. Consequently, to address repair and overhaul criteria appropriate for a system in a fleet, a methodology must be applicable to all levels of potential repair and overhaul options. Therefore, the methods discussed in this article apply at the complex repairable system, subsystem, and LRU levels. The terminology “system” is used to reflect any of these applications, and the only assumptions are that a system is complex, repairable, and satisfies the Power Law reliability model assumption discussed in the next section.

Notation
λ          Scale parameter, Power Law model
β          Shape parameter, Power Law model
$\lambda_S$      System failure intensity
$\lambda_A$      Type A mode failure intensity
$\lambda_B$      Type B mode failure intensity
$\lambda_{GP}$   Growth Potential failure intensity
$\lambda_P$      Projected failure intensity
$u(t)$     Intensity function
$t$        System age
$N(t)$     Number of failures at system age t
$T_j$      Total operating time for jth system
$T_{UL}$   System useful life
$X_{i,j}$  Age at ith failure for jth system
$K$        Number of systems in sample
$D$        Number of intervals
$MI_q$     Distinct Type B modes in qth interval
$M$        Total number of distinct Type B modes
$C_1$      Average cost of repair
$C_2$      Cost of overhaul
$T_o$      Optimum overhaul time to minimize cost

System Reliability Model
Both the minimum life cycle cost methodology and the in-service reliability growth methodology assume that systems under consideration are complex, with many failure modes, and are repaired upon failure. If a repair just restores the system to operation it is called “minimum repair”. Under minimum repair the system reliability after the repair is the same as the system reliability before the failure. Based on these assumptions, the underlying system failures follow the non-homogeneous Poisson process with intensity $u(t)$. Also, the reliability analysis of repairable systems under customer use will involve data generated by multiple systems. Crow, References 2 and 4, developed the Power Law Non-homogeneous Poisson Process (NHPP) as a model for complex repairable systems and presented procedures for analyzing data from multiple systems. This model is widely used and is the standard model for repairable systems in International Electrotechnical Commission standards.

Under the Power Law NHPP the intensity $u(t)$ is

$$u(t) = \lambda \beta t^{\beta - 1} \tag{1}$$

where $t > 0$ is the system’s age and $\lambda, \beta > 0$ are parameters. Also, for the Power Law NHPP model the mean value function

$$E[N(t)] = \lambda t^{\beta}, \quad t > 0 \tag{2}$$

is the expected number of failures for a system during its operating time (0, t).

To perform the analyses discussed herein, we need failure data for K systems chosen at random from the fleet population. Each of the K data sets starts at system age 0 and represents a sequence of failures and repairs. If the systems are overhauled, then each cycle starts at time 0, initialized after an overhaul, and each failure time is the total accumulated operating time at failure during the overhaul cycle. System age t is the accumulated operating time since overhaul. If the systems are not overhauled, then age 0 begins when the system is deployed into the fleet and age is the accumulated operating time since deployment. In both cases, the data for the jth system consist of the failure times $X_{i,j}$, $i = 1, \ldots, N(T_j)$, where $N(T_j)$ is the total number of failures for the jth system, and $T_j$ is the total accumulated time, $j = 1, \ldots, K$. The failure times $X_{i,j}$ are the accumulated ages at failure, so $X_{1,j} < X_{2,j} < \cdots < X_{N(T_j),j}$. Note also that the total accumulated time $T_j$ may or may not correspond to a failure time. If $X_{N(T_j),j} = T_j$ the data are failure truncated, and if $X_{N(T_j),j} < T_j$ the data are time truncated.

Note that for β = 1, we have the homogeneous Poisson process, and a constant intensity of failure. For β > 1, $u(t)$ is increasing and the successive intervals between failures, $X_{i,j} - X_{i-1,j}$, tend to decrease, which is characteristic of wearout. For β < 1, $u(t)$ is decreasing and the successive intervals between failures, $X_{i,j} - X_{i-1,j}$, tend to increase, which is characteristic of quality and manufacturing issues.

Also, for a system of age t we are often interested in the probability that the system will go to age t + b without failure. This is mission reliability for a system of age t and mission length b. For many systems maintaining a minimum level of mission reliability is an important consideration in costs, maintenance, and overhaul strategies. For the Power Law NHPP the mission reliability is given by

$$R(t) = e^{-[\lambda (t + b)^{\beta} - \lambda t^{\beta}]} \tag{3}$$

A special application of the Power Law NHPP is for reliability growth. The Power Law NHPP is the basis for the Crow (AMSAA) model developed in Reference 2. The Crow (AMSAA) model will be applied as part of the in-service reliability growth analysis. We will also apply estimation, goodness of fit tests, confidence intervals, and other procedures given in Reference 4 for the application of this model in this article.

For a repairable system with β > 1, we discuss two options for reducing life cycle costs:

• By an optimum choice of overhaul schedule
• By reliability growth

For β = 1, overhaul may not be necessary, and the option to reduce costs may be reliability growth. An example will be given illustrating this case.

Minimum Life Cycle Cost Model
One consideration in reducing the cost to maintain a fleet of systems (β > 1) is to establish an overhaul policy that will minimize the total life cycle cost of the system. This says there is a point at which it is cheaper to overhaul a system and return it to the fleet than to continue repairs. What is the overhaul time that will minimize the total life cycle cost, considering repair cost and the cost of overhaul? This solution for a general NHPP is given in Reference 1. Applying this solution to the Power Law NHPP model with parameters λ, β, average repair cost $C_1$, and overhaul cost $C_2$, the optimum overhaul time $T_o$ that will minimize the life cycle cost of the system is given by:

$$T_o = \left[ \frac{C_2}{\lambda (\beta - 1) C_1} \right]^{1/\beta} \tag{4}$$

The value $T_o$ is called the economical life of the unit and is the operating time at which the average cost of operation per unit time is minimum. The mission reliability and economical life $T_o$ are of particular interest when β > 1, which is the main focus of this article. In particular, for β > 1, $R(t)$ is decreasing as t increases. If the mission reliability $R(t)$ must be greater than a certain level, say $R_0$, then the time $T_M$ when $R(T_M) = R_0$ is the mission life of the unit. Useful life is the minimum of $T_o$ and $T_M$.


To apply this useful life model, the parameters λ and β must be estimated or assumed known. The estimation of these parameters is addressed next.

Estimation for Minimum Life Cycle Cost Model
We estimate the model parameters λ and β based on the failure data from K systems. The maximum likelihood (ML) estimates of λ and β (References 2 and 4) are the values $\hat{\lambda}$ and $\hat{\beta}$ given by

$$\hat{\lambda} = \frac{\sum_{j=1}^{K} N(T_j)}{\sum_{j=1}^{K} T_j^{\hat{\beta}}} \tag{5}$$

$$\hat{\beta} = \frac{\sum_{j=1}^{K} N(T_j)}{\hat{\lambda} \sum_{j=1}^{K} T_j^{\hat{\beta}} \ln T_j - \sum_{j=1}^{K} \sum_{i=1}^{N(T_j)} \ln X_{i,j}} \tag{6}$$

In general, these equations cannot be solved explicitly for $\hat{\lambda}$ and $\hat{\beta}$, but must be solved by iterative procedures. Once we have the estimates $\hat{\lambda}$ and $\hat{\beta}$, the ML estimate of the intensity function is

$$\hat{u}(t) = \hat{\lambda} \hat{\beta} t^{\hat{\beta} - 1}, \tag{7}$$

the ML estimate of the mean value function is

$$\hat{E}[N(t)] = \hat{\lambda} t^{\hat{\beta}}, \tag{8}$$

and the ML estimate of the mission reliability function is

$$\hat{R}(t) = e^{-[\hat{\lambda} (t + b)^{\hat{\beta}} - \hat{\lambda} t^{\hat{\beta}}]}. \tag{9}$$

The estimate for the optimum overhaul time to minimize life cycle cost is given by

$$\hat{T}_o = \left[ \frac{C_2}{\hat{\lambda} (\hat{\beta} - 1) C_1} \right]^{1/\hat{\beta}}. \tag{10}$$

This is the estimated economic life of the system, and is the point where the average cost per unit of operating time is minimized. The desired estimated useful life $\hat{T}_{UL}$ is the minimum of $\hat{T}_o$ and $\hat{T}_M$, where, for mission length b,

$$R(\hat{T}_M) = R_0, \tag{11}$$

and $R_0$ is the required minimum mission reliability.

Example for Minimum Life Cycle Cost Model
Suppose we consider 11 systems selected at random from a fleet. (A small number of systems is used for the example in order to illustrate the methods. In practical applications a much larger data set would be analyzed. This is hypothetical data and does not represent any actual system.)

The nominal overhaul cycle is 1,500 hours, but the actual overhaul time will often vary. The history of these systems for the last complete overhaul cycle is given in Table 1.

Table 1. System 1 Data
System   Failure Times (hours)   Overhaul (OH) Time (hours)
1        68, 1137, 1167          1,268
2        682, 744, 831           1,300
3        845                     1,593
4        263, 399                1,421
5        -                       1,574
6        -                       1,415
7        598                     1,290
8        -                       1,556
9        -                       1,426
10       730                     1,124
11       -                       1,568

Applying Equations (5) and (6), the ML estimates of λ and β are $\hat{\lambda} = 0.000444$ and $\hat{\beta} = 1.064$. Because this is a small sample size we use an upper confidence limit (CL) on β given in References 2 and 4. A 95% upper CL on β is β* = 1.774, and using this in (5) we estimate λ by λ* = 0.000002558. For an average repair cost of $C_1$ = $29,860 and an overhaul cost of $C_2$ = $100,000, the optimum overhaul time to minimize life cycle cost is estimated by Equation (10) as $\hat{T}_o$ = 3,237 hours. This is the economic life overhaul time that will minimize the system total life cycle costs due to repairs and overhaul.

The mission time is b = 3 hours, and the minimum mission reliability requirement is 0.995. At 3,237 hours the mission reliability is estimated to be 0.993. This is less than the requirement. At 1,500 hours the mission reliability is estimated to be 0.996. This means the useful life for overhauls is between 1,500 and 3,237 hours. At 2,060 hours the mission reliability is 0.995. Because we are using an upper bound on β, this preliminary analysis indicates that the useful life overhaul time is greater than the current 1,500 hours, but probably less than 3,237 hours.

For a preliminary annual cost savings analysis the average number of hours per year on the fleet population of systems is approximately 79,000 hours. For the current nominal overhaul time of 1,500 hours the expected number of failures between overhauls, given by (8), is 1.10. The average cost of these failures is (1.10)($29,860) = $32,846. There is one overhaul per cycle at a cost of $100,000. Therefore, the average total cost per 1,500 hour cycle is $132,846, and the average cost per hour is ($132,846/1,500) = $88.56. For an overhaul time of 2,060 hours the corresponding average cost per hour is $76.66, or a savings per hour of $11.90. Assuming 79,000 hours annual usage, this translates to a yearly savings of $940,100. The overhaul time of 3,237 hours gives an average cost per hour of $70.65. The annual cost savings for this overhaul time is approximately $1.4 million. This is the maximum savings for any choice of overhaul times and gives the minimum overall system life cycle cost.

The estimated overhaul time of 2,060 hours gives the maximum cost savings considering the constraint of maintaining a minimum mission reliability of 0.995. Of course, a thorough analysis would require applying these methods to a larger sample size.

Approach for In-Service Reliability Growth
In this article we adapt a reliability growth projection model developed by Crow in Reference 3 for application to in-service reliability growth for a fleet of systems. During reliability growth testing there is a test phase, of length T, say, and a corresponding chronological order of failures and identification of candidate problem modes for corrective action. We assume the corrective actions are delayed until time T.

The projection model assumes that the system failure intensity is constant, say, $\lambda_S$, during this testing over time T, and then jumps to a lower value due to the incorporation of corrective actions. The projection model estimates this lower failure intensity value based on information from the test. A projection allows the various corrective action options to be assessed. The scope of this article is on illustrating the adaptation and application of the projection model to in-service fleet reliability growth. For specific information on the derivation and background of the projection model the reader is referred to Reference 3.

As described in Reference 3, the reliability growth projection model utilizes the concepts of Type A modes, Type B modes, and effectiveness factors. Assume that any corrective actions are the result of reviewing the T hours of data and observing failure modes. Management can make one of two possible decisions regarding each observed failure mode, either not fix or fix the failure mode. Therefore, the management strategy places failure modes into two categories called Type A and Type B modes. Type A modes are all failure modes such that if seen in the data no corrective action will be taken. This accounts for all modes for which management determines that it is not economically or otherwise justified to take corrective action. Type B modes are all failure modes such that when seen in the data a corrective action will be taken. Thus, the management strategy is to partition the system into A parts and B parts.

During reliability growth testing the model assumes that the total system and the A and B parts have corresponding failure intensities $\lambda_S$, $\lambda_A$, and $\lambda_B$, respectively. That is, $\lambda_S = \lambda_A + \lambda_B$. The average intensity $\lambda_A$ for Type A modes will not change. With the management strategy, reliability growth can only be achieved by decreasing the Type B failure intensity $\lambda_B$. It is also clear that we can only decrease that part of the Type B mode average failure intensity $\lambda_B$ that we have seen. In addition, once a Type B mode is in the system it is rarely totally eliminated by a corrective action. In particular, after a Type B mode is found and fixed, a certain fraction of the failure mode average failure intensity will be removed, but a certain fraction will generally remain.

A fix effectiveness factor (EF) is the fractional decrease in a problem mode's average failure intensity after a corrective action. Studies indicate that an average EF of about 0.70 is typical for a reliability growth program during development. However, individual EFs for the failure modes may be larger or smaller than the average.

The baseline failure intensity $\lambda_S$ is the current intensity, and we wish to project the decrease in $\lambda_S$ due to correcting the Type B modes. The management strategy (the assignment of A modes and B modes and the EFs) may be changed in order to reach a desired reliability objective.

For discussions in this article, we assume an average effectiveness factor d for the corrective actions so that the projection model takes the form

$$\lambda_P = \lambda_A + (1 - d) \lambda_B + d \, h(T) \tag{12}$$

where $\lambda_P$ is the lower failure intensity due to the corrective actions, T is the total test time, and $h(T)$ is the rate at which new Type B modes are being uncovered.

Under the reliability growth projection model, $h(T) = \lambda \beta T^{\beta - 1}$ is the intensity of the Crow (AMSAA) model applied to only the first occurrences of Type B modes. (This application to the first occurrences is a very important point and will be addressed later in this article.)

Following Reference 3, the following term is the Growth Potential failure intensity.

$$\lambda_{GP} = \lambda_A + (1 - d) \lambda_B \tag{13}$$

The Growth Potential Average Time Between Failure, $1/\lambda_{GP}$, is the maximum Average Time Between Failure that can be attained with the current management strategy. This maximum is attained when all Type B failure modes in the fleet have been seen in the data set and corrected. The function $h(T)$ is also the failure intensity for all Type B failure modes in the fleet that did not appear in the current data set. The growth potential is attained when $h(T)$ is zero.

Application of the In-Service Reliability Growth Model: Wearout Case
In the interest of space, the application of the in-service reliability growth projection model will be illustrated by example. Steps are given which can be followed as a template for general application. The example illustrates the application to a system that wears out and is overhauled. A statistical goodness of fit test strongly indicates that these systems follow the Power Law process as discussed earlier.
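The projection and growth potential formulas, Equations (12) and (13), reduce to simple arithmetic once the intensities are known. The sketch below is only illustrative: the function names are invented for this example, and every numeric input (the Type A and Type B intensities, the Power Law parameters for first occurrences, and the test time) is a hypothetical placeholder rather than a value from the article. The effectiveness factor of 0.70 is the typical average quoted in the text.

```python
def new_mode_rate(total_time, lam, beta):
    """h(T) = lam * beta * T**(beta - 1): rate at which new Type B modes appear at time T."""
    return lam * beta * total_time ** (beta - 1)


def growth_potential_intensity(lam_a, lam_b, d):
    """Equation (13): intensity remaining once every Type B mode has been seen and fixed."""
    return lam_a + (1 - d) * lam_b


def projected_intensity(lam_a, lam_b, d, h_t):
    """Equation (12): growth potential plus the contribution of Type B modes not yet seen."""
    return growth_potential_intensity(lam_a, lam_b, d) + d * h_t


# Hypothetical inputs, for illustration only.
lam_a = 1.0e-4   # Type A failure intensity (failures per hour)
lam_b = 6.0e-4   # Type B failure intensity (failures per hour)
d = 0.70         # average fix effectiveness factor
h_t = new_mode_rate(50_000.0, lam=0.05, beta=0.6)  # first-occurrence model parameters are assumed

lam_p = projected_intensity(lam_a, lam_b, d, h_t)
lam_gp = growth_potential_intensity(lam_a, lam_b, d)
print("baseline intensity:", lam_a + lam_b)
print("projected intensity:", lam_p)
print("growth potential intensity:", lam_gp)
print("growth potential average time between failures:", 1.0 / lam_gp)
```

Because $h(T)$ is nonnegative, the projected intensity can never fall below the growth potential intensity, and the two coincide when no unseen Type B modes remain.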


The system in this example (System 2) is overhauled but does not have a fixed overhaul interval. The prime cost savings strategy under consideration is to increase reliability and reduce failures. (This is hypothetical data and does not represent any actual system.)

STEP 1. We first obtain a sample of size K from the fleet. A cycle is a complete history from overhaul to overhaul. For our data set, K = 27 systems are chosen at random. For these systems the failure history for the last completed cycle is recorded. This is the random sample of data from the fleet. See Table 2. These systems are in the order they were selected.

Table 2. System 2 Data
System   Cycle Time $T_j$   $N_j$   Failure Times $X_{i,j}$
1        1396               1       1396
2        4497               1       4497
3        525                1       525
4        1232               1       1232
5        227                1       227
6        135                1       135
7        19                 1       19
8        812                1       812
9        2024               1       2024
10       943                2       316, 943
11       60                 1       60
12       4234               2       4233, 4234
13       2527               2       1877, 2527
14       2105               2       2074, 2105
15       5079               1       5079
16       577                2       546, 577
17       4085               2       453, 4085
18       1023               1       1023
19       161                1       161
20       4767               2       36, 4767
21       6228               3       3795, 4375, 6228
22       68                 1       68
23       1830               1       1830
24       1241               1       1241
25       2573               2       871, 2573
26       3556               1       3556
27       186                1       186
Totals   52110              37      -

and q indexes the successive order of the failures. In the example, N = 37, $Y_1$ = 1,396, $Y_2$ = 5,893, $Y_3$ = 6,418, ..., $Y_{37}$ = 52,110. See Table 3.

Table 3. Ordered System 2 Data and Failure Mode Classification
q    $Y_q$    Mode
1    1396     B1
2    5893     B2
3    6418     A
4    7650     B3
5    7877     B4
6    8012     B2
7    8031     B2
8    8843     B1
9    10867    B1
10   11183    B5
11   11810    A
12   11870    B1
13   16139    B2
14   16104    B6
15   18178    B7
16   18677    B2
17   20751    B4
18   20772    B2
19   25815    B1
20   26361    B1
21   26392    A
22   26845    B8
23   30477    B1
24   31500    A
25   31661    B3
26   31697    B2
27   36428    B1
28   40223    B1
29   40803    B9
30   42656    B1
31   42724    B10
32   44554    B1
33   45795    B11
34   46666    B12
35   48368    B1
36   51924    B13
37   52110    B2

Suppose our objective is to lower $\lambda_S$ by selective corrective actions to the systems based on information in the sample. The projection model estimates this lower value, $\lambda_P$.

Each system failure time in Table 2 corresponds to a problem and a cause, that is, a failure mode. The management strategy can either not fix the failure mode (Type A) or fix the failure mode (Type B). To apply the projection methodology, each accumulated operating time in Table 3 is designated as being caused by either a Type A mode or a distinct Type B mode. In this example, there are 13 distinct corrective actions corresponding to 13 distinct Type B failure modes. There are $N_A$ = 4 failures due to failure modes that will not receive a corrective action. There are $N_B$ = 33 total failures due to M = 13 distinct Type B failure modes that will be corrected. Some of the distinct Type B modes had repeats of the same problem. For example, mode B1 had 12 occurrences of the same problem. Type B mode B13 had only one occurrence of that particular problem.

The failure intensity parameter $\lambda_S$ of interest is the average number of failures per cycle operating hour. The total accumulated
operating time is 52,110 hours, with 37 failures. The baseline
parameter λS is estimated by λ̂ S = 37/52,110 or 0.00071. This esti-
mates that for each system hour of operating time in the fleet, there The objective of the projection model is to estimate the impact of
is 0.00071 failures, or a failure every 1408 hours, on average. the M = 13 distinct corrective actions.

STEP 2. In order to apply the projection model we put the N = STEP 3. Choose an average effectiveness factor based on the
37 failure times on an accumulative time scale over (0, T), where proposed corrected actions and historical experience. Historical
T = 52,110. In the example each Ti corresponds to a failure time industry and government data supports a typical average effec-
Xi,j. This is often not the situation. However, in all cases the tiveness factor EF = 0.70 for many systems.
accumulated operating Yq at a failure time Xi,r is
In the System 2 application an average EF = 0.4 was assumed in
r -1 K order to be very conservative regarding the impact of the pro-
Yq = X i,r + ∑ T j , where q = 1,2, ... N and N = ∑ N j
j=1 j=1
posed corrected actions.
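Steps 1 through 3 can be sketched in a few lines of Python. This is a minimal illustration, not the author's software: the function name `accumulate` and the variable names are assumptions introduced here, the mode list is transcribed from Table 3, and only the first three systems of Table 2 are used to show how cycle failure times map onto the accumulated time scale.

```python
from collections import Counter


def accumulate(cycles):
    """Place per-cycle failure times on the accumulated time scale.

    cycles: list of (T_j, [X_1j, X_2j, ...]) in the order the systems were sampled.
    Returns (list of Y_q, total accumulated operating time).
    """
    ys, offset = [], 0
    for cycle_time, failure_times in cycles:
        # Y_q = X_{i,r} + sum of the cycle times of all earlier systems
        ys.extend(offset + x for x in failure_times)
        offset += cycle_time
    return ys, offset


# First three systems of Table 2: (cycle time, failure times within the cycle)
first_three = [(1396, [1396]), (4497, [4497]), (525, [525])]
ys, elapsed = accumulate(first_three)

# Failure mode classification for q = 1..37, transcribed from Table 3
modes = (
    "B1 B2 A B3 B4 B2 B2 B1 B1 B5 A B1 B2 B6 B7 B2 B4 B2 B1 "
    "B1 A B8 B1 A B3 B2 B1 B1 B9 B1 B10 B1 B11 B12 B1 B13 B2"
).split()

T = 52110                      # total accumulated operating hours (Table 2)
counts = Counter(modes)
n_total = len(modes)           # N
n_a = counts["A"]              # failures from modes that will not be fixed
n_b = n_total - n_a            # failures from modes that will be fixed
m = len(counts) - 1            # distinct Type B modes (all labels except "A")

lambda_s = n_total / T         # baseline failure intensity
lambda_a = n_a / T
lambda_b = n_b / T
d = 0.4                        # conservative average effectiveness factor (STEP 3)
second_term = (1 - d) * lambda_b

print(f"Y = {ys}, N_A = {n_a}, N_B = {n_b}, M = {m}")
print(f"lambda_S = {lambda_s:.5f}, average hours between failures = {1 / lambda_s:.0f}")
print(f"lambda_A = {lambda_a:.6f}, lambda_B = {lambda_b:.6f}, (1 - d) lambda_B = {second_term:.6f}")
```

Counting modes with a `Counter` keeps the Type A / Type B bookkeeping tied directly to the classified failure list, so a change in classification flows through to NA, NB, and M without manual recounting.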
The first term of the projection model (12) is estimated by

$$\hat{\lambda}_A = N_A / T \quad \text{or} \quad \hat{\lambda}_A = 0.000077.$$

Also,

$$\hat{\lambda}_B = N_B / T \quad \text{or} \quad \hat{\lambda}_B = 0.000633.$$

For an average EF, the second term of the projection model is estimated by

$$(1 - d)\hat{\lambda}_B.$$

For d = 0.4, the estimated second term is

$$(1 - d)\hat{\lambda}_B = 0.000380.$$

This estimates the Growth Potential failure intensity.

STEP 4. To estimate the last term dh(T) of the projection model (12) we partition the data in Table 3 into intervals. This partition consists of D successive intervals. The length of the qth interval is Lq, q = 1, …, D. It is not required that the intervals be of the same length, but there should be several, say at least 5, cycles per interval on average. Also, let S1 = L1, S2 = L1 + L2, …, etc., be the accumulated time through the qth interval. For the qth interval we note the number of distinct Type B modes, MIq, appearing for the first time, q = 1, …, D. See Table 4.

Table 4. Grouped Data for Distinct Type B Modes
Interval   No. of Distinct Type B Mode Failures   Length   Accum. Time
1          MI1                                    L1       S1
2          MI2                                    L2       S2
q          MIq                                    Lq       Sq
D          MID                                    LD       SD

The third term is estimated by dĥ(T), where

$$\hat{h}(T) = \hat{\lambda}\hat{\beta}T^{\hat{\beta}-1} \qquad (14)$$

and the values λ̂ and β̂ satisfy

$$\sum_{q=1}^{D} MI_q \left[ \frac{S_q^{\hat{\beta}} \ln S_q - S_{q-1}^{\hat{\beta}} \ln S_{q-1}}{S_q^{\hat{\beta}} - S_{q-1}^{\hat{\beta}}} - \ln S_D \right] = 0 \qquad (15)$$

$$\hat{\lambda} = \frac{M}{T^{\hat{\beta}}}, \qquad (16)$$

where S0 = 0 and the term involving S0 is taken as zero. This is the grouped data version of the Crow (AMSAA) model applied only to the first occurrence of distinct Type B modes.

For the data in Table 3 we chose the first 4 intervals of length 10,000, and the last interval of length 12,110. That is, D = 5. This choice gives an average of about 5 overhaul cycles per interval. See Table 5.

Table 5. Example of Grouped Data for Distinct Type B Modes
Interval   No. of Distinct Type B Mode Failures MIq   Length Lq   Accum. Time Sq
1          4                                          10000       10000
2          3                                          10000       20000
3          1                                          10000       30000
4          0                                          10000       40000
5          5                                          12110       52110
Total      13                                         -           -

Now, λ̂ = 0.0033 and β̂ = 0.762, which gives

$$\hat{h}(T) = \hat{\lambda}\hat{\beta}T^{\hat{\beta}-1} = 0.00019.$$

Consequently, for d = 0.4 the last term of the projection model is:

$$d\hat{h}(T) = 0.000076$$

STEP 5. The projected failure intensity is

$$\hat{\lambda}_P = \hat{\lambda}_A + (1 - d)\hat{\lambda}_B + d\hat{h}(T)$$

or

$$\hat{\lambda}_P = 0.000077 + 0.6(0.00063) + 0.4(0.00019).$$

The projected failure intensity is

$$\hat{\lambda}_P = 0.000533.$$

This estimates that the 13 proposed corrective actions will reduce the number of failures per cycle operation hour from the current λ̂S = 0.00071 to λ̂P = 0.000533.

The average time between failures is estimated to increase from the current 1,408 hours to 1,877 hours.

The growth potential failure intensity is

$$(1 - d)\hat{\lambda}_B = 0.00038$$
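The calculations in Steps 4 and 5 can be reproduced with a short Python sketch. It solves the grouped-data likelihood equation (15) for β̂ using the Table 5 counts, then evaluates (16), (14), and the projected failure intensity. The bisection solver, its bracket [0.01, 5.0], and all function and variable names are assumptions made for illustration; the article does not say which numerical method was used.

```python
import math
from itertools import accumulate


def grouped_beta(counts, ends, lo=0.01, hi=5.0, tol=1e-10):
    """Solve equation (15) for beta by bisection.

    counts: MI_q, number of distinct Type B modes first seen in interval q.
    ends:   S_q, accumulated time at the end of interval q (S_0 = 0 is implied).
    Assumes the left side of (15) is positive at `lo` and negative at `hi`.
    """
    s_d = ends[-1]

    def score(beta):
        total, prev = 0.0, 0.0
        for m_q, s_q in zip(counts, ends):
            upper = s_q ** beta
            lower = prev ** beta if prev > 0 else 0.0          # S_0 term is zero
            lower_log = lower * math.log(prev) if prev > 0 else 0.0
            ratio = (upper * math.log(s_q) - lower_log) / (upper - lower)
            total += m_q * (ratio - math.log(s_d))
            prev = s_q
        return total

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Table 5: grouped first occurrences of distinct Type B modes
mi = [4, 3, 1, 0, 5]
lengths = [10000, 10000, 10000, 10000, 12110]
ends = list(accumulate(lengths))            # S_1 .. S_D
T = ends[-1]                                # 52,110 hours
M = sum(mi)                                 # 13 distinct Type B modes

beta_hat = grouped_beta(mi, ends)
lam_hat = M / T ** beta_hat                          # equation (16)
h_T = lam_hat * beta_hat * T ** (beta_hat - 1)       # equation (14)

n_a, n_b, d = 4, 33, 0.4                    # from Table 3 and STEP 3
lambda_p = n_a / T + (1 - d) * n_b / T + d * h_T     # STEP 5

print(f"beta = {beta_hat:.3f}, lambda = {lam_hat:.4f}, h(T) = {h_T:.5f}")
print(f"projected intensity = {lambda_p:.6f}, hours between failures = {1 / lambda_p:.0f}")
```

Bisection is used here only because the left side of (15) changes sign once over the bracket for this data set; any one-dimensional root finder would serve equally well.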


The estimated maximum average time between failures that can be attained with this management strategy, i.e., when h(T) = 0, is 2,631 hours.

STEP 6. The cost reduction associated with incorporating these 13 Type B-mode corrective actions can be calculated by considering the reduction in fleet failures. The System 2 population has an average of 440,000 fleet hours per year. Based on data it is estimated that 74% of the failures result in an overhaul. Currently, there is an estimated average of 231 overhauls per year. In this example, we assume each overhaul costs $60,000, for an average annual overhaul cost of $13,885,157. By increasing the average time between failures from 1,408 to 1,877 hours, under the same assumptions, we estimate 173 overhauls per year at a cost of $10,412,969. Thus, the estimated projected annual cost savings is $3,470,000 if the 13 corrective actions are implemented.

Application of the In-Service Reliability Growth Model – No Wearout Case

We next give an example of applying the in-service reliability growth model to medical equipment. This system is not overhauled, operates continuously, and has a constant intensity of failure. The current interest is to reduce maintenance costs by increasing reliability.

Note: For this example, the author modified all the actual real-life data by a randomly selected number. The important points are reflected in the relative comparisons of numbers in this example, and these relative comparisons are in proportion to the real-life numbers and results. However, because of the data modification, the example failure times and all the example reliability numbers have no relationship to the actual real-life numbers.

The project plan for Project X identified design changes (based on field experience data) that are corrective actions for 11 distinct failure modes. The project plan scope was limited to one specific assembly. These modes were labeled as Bi modes, where i = 1 to 11. Failure modes in that assembly that were not addressed and all failure modes in other assemblies were labeled A modes. These 11 corrective actions were implemented.

For this example, service records of unscheduled maintenance events were used to estimate the growth statistics based on system data before any of these 11 corrective actions were implemented. Field data was pulled for a sample of 60 units. See Table 6 for an example of the data. The data was “cleaned” before the analysis to remove scheduled maintenance events and non-chargeable failures. For this analysis, there was a total of 241,000 hours of data, 450 total failures, 42 B-mode failures, and 408 A-mode failures. Statistical methods were applied to the data to verify constant failure intensity. Based on this data, the system, before any corrective actions, had an MTBF of 536 hours.

Table 6. Example of Medical Device Data
Cum Hours   Mode    Cum Hours   Mode
7661        A       10556       A
7979        A       12498       A
8614        B1      12886       A
9568        A       12886       A
9568        A       12886       A
9885        A       13663       A
9885        A       14934       B2
10556       A

The reliability growth projection model was then applied to this set of data for the 11 corrective actions (450 total failures, 42 B-mode failures, and 408 A-mode failures), with an assumed effectiveness factor of 0.70. Based on this growth model, the estimate for the improved MTBF is 567 hours.

To test the validity of the model, another set of data was analyzed after the 11 corrective actions were implemented. The original data analysis the project team used to determine the design changes was not available.

Data was pulled for a second sample of 60 units that had the corrective actions implemented. This data set had 134,000 operating hours and 237 failures. The calculated MTBF is 565 hours compared to the projection of 567 hours. Given the actual MTBF, the true effectiveness factor is calculated to be 0.73. When looking at the B-modes specifically, the MTBF of the B-modes improved from 5,700 hours to 11,600 hours, which was considered a success even though the system reliability did not change significantly.

After realizing the impacts that can be made, management wanted to see improvements across all assemblies. A new management strategy was devised that put all modes on the list for improvement. The A-modes have been converted into an additional 11 B-modes, giving the system a total of 22 B-modes. With the demonstrated effectiveness factor of 0.73, we should expect to see the MTBF grow from 565 hours to 2,571 hours. A Life Cycle Cost (LCC) analysis shows that this improvement in reliability has the potential to yield huge savings in service costs and easily justifies the effort.

The example on the application of these methods to a medical system was developed in collaboration with Siemens Medical Systems. Siemens noted that: “The increased effectiveness factor was achieved with the help of RAC. This included audits of our processes and the establishment of a Reliability Program that included derating, reliability prediction with PRISM, and HALT testing. We are continuing to work on our processes to sustain our effectiveness factor and hopefully improve upon it. The Process Grade Factors in PRISM is a tool we are using to point out improvement areas. We are also spending more time setting targets since the targets are very aggressive. Reliability growth will be used more extensively throughout the TAAF phase.”

Acknowledgements
I would like to thank Jim Harriett and Lonnie Scott, with the reliability team at the Cherry Point Naval Depot for implementation


of the methodologies to Navy systems. I thank Paul Oldham, IMMC, Redstone Arsenal, for implementation of the methodologies to Army systems. I would also like to acknowledge Kevin O’Shaughnessy at the Reliability Analysis Center for his excellent support, and for being instrumental in the application of these methods at Naval Depots. I acknowledge Pete Hurley of Siemens Medical Systems for the example on the application of these methods to a medical system. I would also like to thank Adamantios Mettas of the ReliaSoft Corporation for the excellent application of ReliaSoft software to perform the growth analyses on several large data sets used in the examples.

References
1. R.E. Barlow and F. Proschan, Mathematical Theory of Reliability, John Wiley & Sons, Inc., 1967.
2. L.H. Crow, Reliability Analysis for Complex, Repairable Systems, in Reliability and Biometry, ed. by F. Proschan and R.J. Serfling, pp. 379-410, 1974, Philadelphia, SIAM.
3. L.H. Crow, Reliability Growth Projection from Delayed Fixes, Proceedings 1983 Annual Reliability and Maintainability Symposium, pp. 84-89.
4. L.H. Crow, Evaluating the Reliability of Repairable Systems, Proceedings 1990 Annual Reliability and Maintainability Symposium, pp. 275-279.

About the Author
Larry H. Crow is VP, Reliability & Sustainment Programs, at Alion Science and Technology, Huntsville, AL. Previously, Dr. Crow was Director, Reliability, at General Dynamics ATS (formerly Bell Labs ATS). From 1971-1985, Dr. Crow was chief of the Reliability Methodology Office at the US Army Materiel Systems Analysis Activity (AMSAA). He developed the Crow (AMSAA) reliability growth model, which has been incorporated into US DoD handbooks, and national & international standards. He chaired the committee to develop MIL-HDBK-189, Reliability Growth Management and is the principal author of that document. He is the principal author of the IEC International Standard 1164, Reliability Growth-Statistical Tests and Estimation Methods. Dr. Crow is a Fellow of the American Statistical Association, the Institute of Environmental Sciences and Technology, and the recipient of The Florida State University “Grad Made Good” Award for the Year 2000, the highest honor given to a graduate by Florida State University.

Larry H. Crow, Ph.D.
Alion Science and Technology
215 Wynn Drive, Suite 101
Huntsville, AL 35805
Internet (E-mail): <lcrow@alionscience.com>

PRISM Column
At no charge to RAC PRISM users, the RAC provides an open training course to teach users the most efficient and effective uses of the software. This July, RAC will present its eighth open training course on PRISM. The course has evolved over the past three years into a comprehensive program providing a brief introduction to reliability, in-depth instruction on PRISM analysis and techniques, and the theory behind the PRISM models. This evolution paralleled the evolution of the software.

Users will leave this hands-on training course with the knowledge and expertise to effectively utilize PRISM. This training course is specifically designed to assist users in making the transition from novice to journeyman PRISM User. This two-day training course begins with a basic overview of reliability and its evolution over the past fifty years. Building on this foundation, users learn to develop a basic parts count analysis, expand into the development of a detailed stress analysis, and move on to include experience data for performing a Bayesian analysis.

Other techniques for building comprehensive analyses include the application of the Process Grading methodology introduced with PRISM and the importing techniques used to make prediction development easier.

Using this training technique, the PRISM staff has trained users from over 100 companies and organizations around the globe. Users who have participated in the course have had the following to say about it.

• “This was an excellent overview.”
• “I came in knowing little or nothing about PRISM or how to use it. Now I can generate reliability predictions and have a basis for understanding the theory behind the PRISM models.”
• “I feel much more able to do a credible analysis now.”
• “I have a much better understanding of PRISM, both how to use (it) and what it ‘means’. I also now know where to find detailed background info.”
• “Thanks – This is my beginning education on prediction. (I) have always resisted before due to lack of confidence in the end result.”

Registration for any open RAC PRISM training course is free of charge to licensed PRISM users. Individuals who are not currently licensed users can purchase the PRISM software for $1,995 ($2,195 International) and attend the course at no charge. To register for a PRISM Training Course go to <http://rac.alionscience.com/prism/Prism_TrainingReg.html>. Course seating is limited to 20 individuals. The RAC can also provide on-site training. On-site training is provided at no charge when 6 or more copies of the PRISM software are purchased. If you have any questions, feel free to contact the PRISM training coordinator, Gina Nash, at (315) 339-7047.

For more information on PRISM feel free to contact the PRISM team by phone (315-337-0900), by E-mail (<rac_software@alionscience.com>) or through the PRISM Forum (<http://rac.alionscience.com/prism>).



Reliability Testing of Printed Wiring Boards with Interconnect Stress Testing Technology (IST)
By: Ron Carter, Alion Science and Technology

Introduction
The Interconnect Stress Test System (IST), see Figure 1, is used to evaluate the reliability of the copper plated barrel and inner-layer interconnects to the plated barrel of printed wiring boards (PWB). The IST machine thermally cycles specially designed coupons by applying DC current to conductive patterns on the coupons to elevate the temperature to 150°C (typically) within 3 minutes and then cooling by forced air back to ambient temperature in approximately 2 minutes. These steps equate to a 5-minute thermal cycle. IST is becoming the standard reliability test method of choice in the electronics industry for both major fabricators and OEMs. These include major defense contractors like TRW, Raytheon, Honeywell, and Rockwell Collins, as well as commercial giants like Intel, Cisco, and Sun Microsystems. IST is also being accepted by companies in Europe and Asia.

Background
The Interconnect Stress Testing (IST) technology was initially developed from “power cycling” used by Northern Telecom (Nortel) in the 1980s. The developers at Nortel worked with various test vehicles to determine the viability of the power cycling concept. The developers left Nortel and joined the Digital Equipment Corporation (DEC) in 1989 and soon established a joint “power cycling” program between the two companies. Nortel discontinued support after a year of funding and DEC continued with the program, searching for methods to achieve repeatability and correlation with industry test methods for thermal cycling. During the 1990s DEC developed the IST name and operating principles for their system. Coupon design was addressed and an automated test system was developed to provide repeatable and lower cost test methods to qualify the integrity of the PWB backplanes they were producing.

Digital Equipment Corporation started working with IPC in 1992 to resolve and promote correlation between IST and other industry thermal cycling technologies. This correlation was achieved in defining various mechanisms and failure modes among the different cycling methods. At this time DEC provided an IST testing service to the PWB industry for thermal cycling their board products.

Over the next few years, new IST principles were developed to
establish a capability to detect innerlayer to PTH barrel (post)
separations (see Figures 2 and 3). Working with the PWB indus-
try and various chemical suppliers, DEC was able to re-create
numerous types and levels of post separations within the inter-
connect. IST proved it was able to consistently detect the sepa-
rations and achieved a 98%+ correlation with micro-section
analysis.
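
As a rough illustration of the cycle timing quoted in the Introduction (about 3 minutes of heating followed by about 2 minutes of forced-air cooling), the short Python sketch below estimates how many thermal cycles fit into a given test duration. The function names are hypothetical and are not part of any IST software; the only inputs taken from the article are the 3-minute and 2-minute figures.

```python
# Cycle timing taken from the article: ~3 min heating, ~2 min forced-air cooling.
HEAT_MIN = 3.0
COOL_MIN = 2.0


def cycles_completed(test_hours: float,
                     heat_min: float = HEAT_MIN,
                     cool_min: float = COOL_MIN) -> int:
    """Number of complete thermal cycles that fit in test_hours."""
    cycle_min = heat_min + cool_min
    return int((test_hours * 60.0) // cycle_min)


def hours_for_cycles(n_cycles: int,
                     heat_min: float = HEAT_MIN,
                     cool_min: float = COOL_MIN) -> float:
    """Elapsed test time, in hours, needed to run n_cycles thermal cycles."""
    return n_cycles * (heat_min + cool_min) / 60.0


if __name__ == "__main__":
    print(cycles_completed(24))   # cycles in one day of continuous testing
    print(hours_for_cycles(500))  # hours needed for a 500-cycle test
```

With a 5-minute cycle, one day of continuous testing corresponds to 288 cycles, which is consistent in order of magnitude with the article's statement that IST results are available within about two days.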

Figure 1. IST Tester

Figure 2. Interconnect Post Separation



Figure 3. Plated through Hole Barrel Crack

In 1996 IPC approved and issued an official IST test specification defining the methodology as designed to replace/support traditional accelerated stress testing and micro-section analysis. This methodology is included in the latest version of the TM-650 Test Methods manual (2.6.26).

The developers of the IST system formed their own company in late 1996 and received exclusive licensing rights from DEC to provide a testing service and fabricate the IST computer test system. This company is PWB Interconnect Solutions, operated by Bill Birch in Ontario, Canada. The web site is <http://www.pwbcorp.com> and is full of helpful information.

Advantages of IST Over Traditional Thermal Cycling Test Methods
IST is lower in cost with results 6 to 10 times faster than traditional thermal cycling test methods. IST will provide test results within 2 days, compared to 40 to 50 days for traditional thermal cycling in chambers. Additional benefits of IST are that it:

• Is totally automated for testing and data collection.
• Has excellent repeatability and reproducibility when compared to micro-section analysis, which is very operator dependent.
• Tests a large number of holes (200 to 500); micro-section analysis typically looks at only 4 to 6 holes.
• Has constant resistance monitoring that provides the exact point when failure occurs and follows failure progression. Traditional thermal cycling does not.
• Quantifies severity of failure by constantly monitoring minute resistance changes. No other system on the market today provides this capability.
• Evaluates 360 degrees of all connections as compared to micro-section analysis that sees only 1 degree of the hole (plated barrel) circumference.
• Correlates with Air to Air & Liquid to Liquid Reliability Thermal Cycling Methods.
• Not only provides reliability data for the customers, OEMs, and contract assemblers, but also provides process data for the fabricators and chemical suppliers.

The need for IST technology came out of PWB manufacturers’ frustration with the traditional micro-section analysis method because of the randomness and inconsistency of its test results. However, once the advantages of the IST technology were known, the OEMs started requiring IST data on each lot of PWBs furnished by their suppliers. IST supplies timely and repeatable information that determines not only the presence of interconnect quality issues, but also the severity of the problem.

The use of thermal imaging (after IST testing has been completed) will show the exact area of the defect so it can be identified for micro-section analysis (see Figure 4).

Figure 4. Thermal Image of a Barrel Crack (White Hot Spot)

The IST testers were initially purchased by fabricators to improve their plating processes and to understand the effects of different materials on plated through holes and interlayer interconnects to the barrel. However, when customers became aware of the value of the data, they began requesting that the data be furnished to them as well. As a result, fabricators that had IST units quickly started using IST as a sales advantage. The fabricators were able to point out plating weaknesses to their chemical suppliers, thus forcing them to buy units. As a result, the chemistry for plating improved, the fabricators made better boards, and the OEMs received a better product. This unit has had a tremendous quality improvement impact on the PWB industry over the last eight years.

Using IST as a reliability test provides a capability to remove the human factor from the decision making process for product acceptance or rejection. This test rapidly quantifies whether any “flaw” within the interconnect has a detrimental impact on the total interconnect integrity. This test method allows us to understand the “hierarchy of failure” and demonstrates which are dominant and which are latent failure mechanisms.

IST has been used in the aerospace, avionics, and automotive industries ever since IST results were proved to correlate with the traditional 1000-hour (-65°C to +125°C) thermal cycling

test. IST technology is becoming established as the preferred method for quantifying the integrity of high technology PWBs containing microvias (our test vehicle measures 3” by 0.5” and contains 2,000 microvias).

IST Methodology
The principles behind IST technology are surprisingly simple. The IST system automatically passes a predetermined constant DC current through a specifically designed PWB interconnect (coupon); the current elevates the temperature of the metalization and adjacent materials. The temperature to which the coupon is heated is directly proportional to the measured resistance and the amount of current that is passed through the conductor pads and holes. A physical principle that can be described mathematically defines the relationship of the temperature of the interconnect to the amount of metalization and its resistivity (a value that describes how hard it is for electrons to flow through the entire interconnect).

The IST system uses this principle to raise the resistance/temperature of the interconnect to a predetermined value. Once that is achieved (3 minute heat cycle) the system turns off the current and the coupons are cooled back to ambient temperature (usually within 2 to 3 minutes) using forced air. These steps constitute a single thermal cycle. The specified resistance/temperature level is designed to be just above the glass transition temperature of the substrate (typically 150°C).

During each thermal excursion, the system continuously monitors the minute resistance changes in the plated through hole (PTH) and inner layer to barrel (post) interconnects. As the temperature of the interconnect changes, the resistance value of the interconnect (i.e., traces, pads and hole barrels) changes proportionally. The IST system is designed to quantify the ability of the total interconnect to withstand these thermal/mechanical strains, from the as-manufactured state until the product reaches the point of interconnect failure.

The IST system measurements are very sensitive to minute changes in resistance. In other words, if a failure mode initiates, the measured difference in resistance is usually very small (sub milliohm). Subsequently, the ability of the interconnect to withstand further stressing is reduced, which leads to larger measured increases in resistance. When larger resistance changes are detected, a defect that ultimately leads to failure has started to develop. This phenomenon is referred to as failure propagation or damage accumulation. If the changes are very large, a structural failure has occurred within the interconnect. IST is designed to monitor these changes and stop the stressing at a pre-determined (low) level of failure. This ability permits early intervention in the failure so that micro-section analysis can be used to determine the failure mode.

Using this methodology allows the IST system to determine when a defect begins to develop as well as how rapidly failure occurs. The changes are monitored on independent circuits. One of these is the circuit that carries the current through the inner layer circuitry and monitors the resistance changes associated with inner layer (post) separations. The remaining circuits (from 1 to 3) receive no current and are responsible for the plated through hole interconnects. The system compares both circuits to determine whether barrel cracking or post separation is the more dominant failure mechanism.

Figure 5. Graph of Resistance Change During IST Testing

IST Applications
IST technology is being used in the following applications:

• Providing Reliability Data
• Process/Product Characterization
• Material/Chemical Evaluations
• Process Troubleshooting
• PWB Vendor Base Assessment
• Correlation Studies
• Customer Quality Assurance
• Product Prescreening for Long Term Testing
• Impact of Assembly/Rework Stresses

The active IST customer base consists of over one hundred companies from around the world. They include fabricators, OEMs/CEMs, and Chemical/Material suppliers. Following are the names of just a few of these companies.

• Fabricators: Viasystems, Sanmina SCI, Multek, Tyco, Honeywell, Merix, DDI, Teradyne, Hitachi, Toppan, Yamamoto, Ibiden, Compac, Gold, WUS.
• OEMs/CEMs: IBM, Cisco Systems, Sun Microsystems, Hewlett Packard, Lucent, SGI, EMC, Unisys, Siemens, Cray, Celestica, Plexus, SCI, Jabil.
• Chemical/Material Suppliers: Shipley Ronal, Atotech, Nelco Isola, Gil Laminates, Polyclad.

Alion Science and Technology
Alion has a complete micro-section lab for preparing and analyzing coupons for failure analysis. Alion offers consulting services within the PWB industry and is the only licensed IST testing service in the United States. The engineering staff at Alion has over 50 years of combined experience in fabrication and over 25 years of experience in assembly and packaging.

Conclusions
IST technology is an industry standard for testing and quantifying the reliability of plated through holes and innerlayer interconnects. The cost effectiveness, speed of testing, accuracy, repeatability, flexibility, and ability to simultaneously test the inner layer interconnect and plated barrel make IST the system of choice. No other reliability technology in the PWB industry can offer all these advantages. Major companies such as Honeywell and Cisco are using IST data as the final acceptance/rejection criterion for their PWBs. Alion is the only source for licensed IST testing service in the United States. For more information on Alion’s licensed IST testing services, contact Ron Carter, (256) 382-4804, <rcarter@alionscience.com>.

Reference
“PWB Interconnect Solutions,” history and methodology on IST (<http://www.pwbcorp.com>).

About the Author
Ron Carter is a Research Engineer and program manager for the IST and failure analysis lab for Alion Science and Technology in Huntsville, Alabama. He has over 35 years of electronics industry experience in printed wiring board fabrication and failure analysis and more than two years of experience with IST testing. Prior to coming to Alion he worked for Unisys Corporation and Intergraph Corporation as a manufacturing process engineer.
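
As a supplement to the IST Methodology section of the article above, the following Python sketch illustrates the three ideas described there: heating a coupon to a target resistance, stopping when resistance has risen to a pre-determined level, and comparing two monitored circuits. It is a minimal sketch under stated assumptions, not the IST system's actual algorithm. The linear resistance-temperature model, the copper temperature coefficient (about 0.0039 per °C near room temperature), the 10% rejection level, and all function names are assumptions introduced for illustration; the article gives none of these values.

```python
from typing import Optional, Sequence

# ASSUMPTION: approximate temperature coefficient of resistance of copper
# near room temperature. Plated copper on a real coupon may differ.
ALPHA_CU_PER_C = 0.0039


def target_resistance(r_ambient: float,
                      t_ambient_c: float = 25.0,
                      t_target_c: float = 150.0,
                      alpha: float = ALPHA_CU_PER_C) -> float:
    """Resistance expected at t_target_c using R = R0 * (1 + alpha * dT)."""
    return r_ambient * (1.0 + alpha * (t_target_c - t_ambient_c))


def first_failure_cycle(peak_resistances: Sequence[float],
                        reject_pct: float = 10.0) -> Optional[int]:
    """1-based cycle at which the peak resistance first reaches the
    rejection level (baseline + reject_pct %), or None if it never does.

    ASSUMPTION: the first cycle's peak is used as the baseline and the
    10% default is a hypothetical rejection level.
    """
    if not peak_resistances:
        return None
    limit = peak_resistances[0] * (1.0 + reject_pct / 100.0)
    for cycle, resistance in enumerate(peak_resistances, start=1):
        if resistance >= limit:
            return cycle
    return None


def dominant_mechanism(power_circuit: Sequence[float],
                       sense_circuit: Sequence[float],
                       reject_pct: float = 10.0) -> str:
    """Compare the current-carrying (inner layer) circuit with a sense
    (plated through hole) circuit and report which reaches the rejection
    level first."""
    power = first_failure_cycle(power_circuit, reject_pct)
    sense = first_failure_cycle(sense_circuit, reject_pct)
    if power is None and sense is None:
        return "none detected"
    if sense is None or (power is not None and power < sense):
        return "post separation"
    if power is None or sense < power:
        return "barrel cracking"
    return "both at same cycle"


if __name__ == "__main__":
    print(target_resistance(1.0))
    print(first_failure_cycle([1.00, 1.01, 1.05, 1.12]))
    print(dominant_mechanism([1.0, 1.2], [1.0, 1.0, 1.0]))
```

The mapping of circuits to failure mechanisms follows the article's description: the circuit carrying the current is associated with inner layer (post) separations, and the circuits that receive no current are associated with the plated through hole barrels.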

University of Warwick Conducting Reliability Benchmark Survey


The University of Warwick (United Kingdom), sponsored by the UK Department of Trade & Industry, is conducting a detailed and far-
reaching survey of reliability practices. Companies worldwide, across all industries, are called upon to complete this powerful survey.
The size of the response will enhance the survey's power, to draw together several hundred companies worldwide, in the analysis of
reliability best practices. All participants will receive reports giving analysis results, general guidelines and individual comments. All
surveys will be treated in confidence and will not be divulged to any other agency, either completely or in part, in any form that would
permit identification of the respondent.

Survey packs may be obtained from:

Les Warrington, Senior Fellow (Quality & Reliability)
University of Warwick
Gibbet Hill Road, Coventry, CV4 7AL, United Kingdom
E-mail: <L.Warrington@Warwick.ac.uk>

or by downloading from:

Forms: <www.wmg.org.uk/Benchmarksurveyforms-finalversion.pdf>
Folder: <www.wmg.org.uk/BenchmarkFolde-finalversion.pdf>




DoD to Replace Handbook on Test and Evaluation of Systems Reliability, Availability, and Maintainability
By: Dr. Jay Mandelbaum, Strategic & Tactical Systems, OSD(AT&L) and Ned H. Criscimagna, Reliability Analysis Center
The Reliability Analysis Center will be the facilitator and lead developer for a team-based effort to replace the cancelled DoD Handbook, DoD 3235.1-H Test and Evaluation of Systems Reliability, Availability, and Maintainability: A Primer, last updated in 1982. This effort was one of several recommendations of a National Research Council workshop on reliability issues for Department of Defense (DoD) systems held in June 2002. The workshop participants believe that the update will improve communication between those in DoD having the responsibility for the reliability of defense systems and academic researchers with significant expertise on the most readily applied and broadly applicable statistical techniques.

Poor reliability in defense systems not only inhibits achieving the operational readiness rates required by commanders in the field, it also leads to larger logistics footprints that detract from combat capability and increase operating and support costs that divert resources from necessary modernization. Unfortunately, reliability issues continue to cause problems for the Department. In 1998, the National Research Council wrote:

“The Department of Defense and the military services should give increased attention to their reliability, availability, and maintainability data collection and analysis procedures because deficiencies continue to be responsible for many of the current field problems and concerns about military readiness.”¹

GAO observed that in FY 2003, DoD asked for approximately $185 billion to develop, procure, operate, and maintain its weapon systems, an increase of 18 percent over FY 2001. Often, DoD systems need expensive spare parts and support systems to meet required readiness levels. DoD has been increasingly concerned that the high cost of maintaining systems has limited its ability to modernize and invest in new weapons. This led GAO to issue a report to the Senate Armed Services Subcommittee on Readiness and Management Support where one recommendation to DoD was to revise acquisition regulations to require a firm estimate of component and subsystem reliability at the production decision.² Together with an increased emphasis on systems engineering, the guidance included in the new document will help acquisition agencies improve reliability and maintainability (R&M) and, with the test and evaluation (T&E) community, make better R&M assessments.

A kickoff meeting was held in April at the headquarters of the National Defense Industrial Association and was attended by representatives of the T&E communities, academia, industry, and industry associations. The attendees constitute an Integrated Project Team for this effort. A general consensus was reached on the purpose and scope of the new document. Whereas 3235.1-H focused on T&E, the new document will take a broader view of R&M assessment throughout the life cycle of a system. It will also address the early actions that must take place in design and manufacturing to achieve the desired levels of R&M.

Subsequent to the April meeting, RAC prepared a draft table of contents that was distributed in early May for comment. After a consensus is reached on the content, drafting of the new document will begin. The new document will be a guidebook and not a directive or standard. Although RAC will lead the writing effort, members of the defense and industry communities will participate.

For more information on this effort, contact the government project officer, Dr. Jay Mandelbaum, at (703) 695-0472 or the RAC project manager, Ned Criscimagna, at (301) 918-1526.

About the Authors
Ned Criscimagna is the Deputy Director of the RAC and a frequent contributor to the Journal. Dr. Jay Mandelbaum recently joined the Systems Engineering staff under the Director of Defense Systems in the Office of the Under Secretary of Defense for Acquisition, Technology and Logistics. In addition to his responsibilities in reliability, maintainability, and sustainability policy, he leads DoD’s value engineering program and is leading an effort to reduce total ownership costs of defense systems.
¹ National Research Council; Statistics, Testing, and Defense Acquisition: New Approaches and Methodological Improvements; Panel on Statistical Methods for Testing and Evaluating Defense Systems; Michael L. Cohen, John B. Rolph, and Duane L. Steffey, Editors; Committee on National Statistics; Washington, D.C., National Academy Press; 1998.

² GAO final report; BEST PRACTICES: Setting Requirements Differently Could Reduce Weapon Systems’ Total Ownership Costs; February 11, 2003; (GAO Code 120092/GAO-03-057).

NOTICE

The Second Edition of The Reliability and Maintainability Guideline for Manufacturing Machinery and Equipment co-published
by SAE and the National Center for Manufacturing Sciences is available from SAE (<http://www.sae.org/>) for $39.00.


Future Events in Reliability, Maintainability, Supportability & Quality


21st International System Safety Conference
August 4-8, 2003
Ottawa, Ontario, Canada
Contact: Gerry Einarsson, Conference Chair
24 Wedgewood Cres.
Ottawa, Ontario
Canada K1B 4B4
Tel: (613) 824-2468
E-mail: <einargk@rogers.com>
On the Web: <http://www.russona.com/issc21/>

DoD DMSMS 2003 Conference
August 18-21, 2003
San Diego, CA
Contact: James Neely
AFRL/MLMT
2977 P Street, Bldg. 653, Room 207
WPAFB, OH 45433-7739
Tel: (937) 904-4374
Fax: (937) 656-4420
E-mail: <james.neely@wpafb.af.mil>
On the Web: <http://www.dmsms2003.utcdayton.com>

15th Annual International Military & Aerospace/Avionics COTS Conference & Seminar
August 27-29, 2003
Newton, MA
Contact: Edward B. Hakim
The Center for Commercial Component Insertion
Spring Lake, NJ 07762
Tel: (732) 449-4729
Fax: (775) 655-0897

7th Joint DoD/FAA/NASA Conference on Aging Aircraft
September 8-11, 2003
New Orleans, LA
Contact: Universal Technology Corporation
1270 N. Fairfield Rd.
Dayton, OH 45432
Tel: (937) 426-2808
Fax: (937) 426-8755
E-mail: <utc-mmg@utcdayton.com>
On the Web: <www.agingaircraft.utcdayton.com/>

Aerospace Congress & Exhibition
September 8-12, 2003
Contact: Barb Roth
SAE International
Warrendale, PA 15096-0001
Tel: (724) 772-4081
Fax: (724) 776-0210
E-mail: <roth@sae.org>
On the Web: <http://www.sae.org/ace/>

BINDT 2003 Conference and Exhibition
September 16-18, 2003
Contact: Phil Kolbe
The British Institute of Non-Destructive Testing
1 Spencer Parade
Northampton NN1 5AA UK
Tel: 01604 630124
Fax: 01604 231489
E-mail: <conf@bindt.org>
On the Web: <http://www.bindt.org/Mk1Site/NDT2003.html>

46th Annual NDT Forum, 2003
September 22-25, 2003
Contact: Sherri D. Brooks
Creative Conference Solutions
Tel: (724) 601-4646
Fax: (724) 869-1802
E-mail: <ccsolutions@aol.com>
On the Web: <http://www.airtransport.org/public/events/display2.asp?nid=6265>

2003 ASTR Workshop
October 1-3, 2003
Seattle, WA
Contact: Chris Hanse, Registration Chair
C. Hanse Industries, Inc.
Allegan, MI 49010
Tel: (269) 673-8638
Fax: (616) 673-8632
E-mail: <Chris@chanseind.com>
On the Web: <http://www.ewh.ieee.org/soc/cpmt/tc7/ast2003/>

13th Int’l Conference of Software Quality
October 6-9, 2003
Dallas, TX
Contact: Linda Westfall
The Westfall Team
3000 Custer Road
Suite 270, PMB 383
Plano, TX 75075-4499
Tel: (972) 867-1172
Fax: (972) 943-1484
E-mail: <lwestfall@westfallteam.com>
On the Web: <http://www.icsq.org/>

Interoperability
October 14-17, 2003
Arlington, VA
Contact: IDGA
Little Falls, NJ 07424
Tel: (800) 882-8684
Fax: (973) 256-0205
E-mail: <info@idga.org>
On the Web: <http://www.idga.org>

Canadian Reliability and Maintainability Symposium
October 16-17, 2003
Ottawa, ON
Contact: Dean Scrofano, Program Chair
CRMS
4312 Carp Road
Carp K0A 1L0, Canada
Tel: (613) 839-7676
Fax: (613) 836-3662
E-mail: <dean@crms2003.ca>
On the Web: <http://www.crms2003.ca/>

6th Annual Systems Engineering Conference
October 20-23, 2003
San Diego, CA
Contact: Dania Khan
NDIA
Arlington, VA 22201
Tel: (703) 247-2587
Fax: (703) 522-1885
E-mail: <dkhan@ndia.org>
On the Web: <http://register.ndia.org/interview/register.ndia?PID=Brochure&SID=_0YC0OAWNB&MID=4870>

ASIP 2003 USAF Structural Integrity Program
December 2-4, 2003
Savannah, GA
Contact: Jill Jennewine
Universal Technology Corporation
1270 N. Fairfield Road
Dayton, OH 45432-2600
Tel: (937) 426-2808
Fax: (937) 426-8755
E-mail: <jjennewine@utcdayton.com>
On the Web: <http://www.asipcon.com/>

For a complete listing of upcoming RMSQ events, visit the RAC web site at: <http://rac.alionscience.com/NewsAndEvents/rac_calendar.html>

The appearance of advertising in this publication does not constitute endorsement by the Department of Defense or RAC of the products or services advertised.

From the Editor


To many, the subjects of reliability and maintainability (R&M) are vague terms that have something to do with probabilities and predictions. Many managers in particular either do not understand the two disciplines, or consider them to be nitty-gritty details of design to which they need not give much attention. Often they willingly sacrifice R&M for the sake of cost or schedule. Although such trades may be necessary, they should be made only after fully understanding the effect on safety, mission success, readiness, and operating and support costs. Unfortunately, they are often made without a full understanding and appreciation of the consequences.

For several weeks, we all watched as the war against Saddam Hussein's regime was waged using the most sophisticated and modern weapon systems ever devised by human beings. Despite the overall success of these systems, and the victory achieved with them, the consequences of failure and the long lines of logistics that proved difficult to safeguard vividly point out that R&M are not vague abstractions. They are concrete aspects of system performance that affect readiness, logistics footprint, and mission effectiveness. We ignore them at our peril.

Most of us have little trouble distinguishing between products that are sufficiently reliable for our needs at a given cost and those that are not. If we have a lot of trouble with a product, we seldom will buy the same item from the same manufacturer. Whether it is our automobile, television, or appliances, we have come to expect failure-free performance for some reasonable length of time.

Our expectations stem from our experience. Consumer products, especially electronics but also including our automobiles, have become dramatically more reliable over the past 50 years. Companies like GE and Pratt & Whitney have not only increased the reliability of the engines that power the commercial aircraft on which we fly by orders of magnitude, but they have also significantly decreased the specific fuel consumption, made the engines much quieter, and made maintenance easier.

None of the improvements in aircraft engines, automobiles, and consumer products came without investment. This investment was in the form of research and development, new materials, new technologies, and sound design. All of the improvements came as a result of understanding that R&M are design characteristics that must be given significant attention. Without strong requirements, good design practices, and adequate analyses and testing, the products we use would not give us the service we enjoy and would cost much more in the long run.

As we develop the weapon systems of tomorrow needed to defend our Nation and way of life, the military acquisition community must ensure that adequate levels of R&M are not sacrificed under the pressures of budget and schedule.

If anything, we must redouble our efforts to design for R&M, adopting commercial practice and technology where it makes sense to do so, and giving our engineers the resources they need to do the job right. For once a military system is in the hands of the soldier, airman, or sailor, the consequences of failure become all too real.

Ned H. Criscimagna

RMSQ Headlines
Managing Software Quality With Defects, CROSSTALK, published by The Software Technology Center, March 2003, page 4. The author, David Card, discusses two approaches for modeling and measuring software quality throughout the life cycle of the software. Each approach involves the development of a life-cycle profile of defects.

Implementing Front-End Logistics Support for NASA Program, PM, published for the Defense Acquisition University, January-February 2003, page 14. Gary McPherson from the Army Acquisition Corps discusses the front-end logistics support within the Second Generation Reusable Launch Vehicle (2GRLV). The 2GRLV is a replacement for the Space Shuttle and there is increased emphasis on reliability, maintainability, and supportability engineering and analysis.

Roadblocks to Quality, Quality Progress, published by ASQ, February 2003, page 49. Roderick Munro discusses how managers often misunderstand the use of quality tools in developing and manufacturing quality automobiles.

Aircraft-engine-mounting analysis, Aerospace Engineering, published by SAE International, April 2003, page 23. FEA, modeling, and test are used to ensure the adequacy of engine mounts under different flight maneuver conditions and failure modes. Authored by Ly Nguyen, Taison Ku, and Remo Meri.

Reshaping F-16 production, Aerospace Engineering, published by SAE International, April 2003, page 29. Modeling and Kaizen events are used to make the F-16 production line "lean." Authored by Frank Bokulich.


Reliability Analysis Center
201 Mill Street
Rome, NY 13440-6916

Call: (888) RAC-USER or (315) 337-0900
Fax: (315) 337-9932
Web: http://rac.alionscience.com
E-mail: rac@alionscience.com


Order Form*

Quantity Title US Price Each Non-US Price Total

Quality Toolkit $50.00 $60.00


Maintainability Toolkit $50.00 $60.00
PRISM $1995.00 $2195.00
A Practical Guide to Developing Rel. Human-Machine Sys. & Proc. $75.00 $85.00
Practical Application of Reliability-Centered Maintenance $75.00 $85.00
Shipping and Handling:
US Orders add $6.00 per book or CD-ROM for First Class ($10.00 for NPRD or VZAP, $14.00 for
EPRD, $2.00 for RAC Blueprints). Non-US add $15.00 per book or CD-ROM for air mail
($30.00 for NPRD or VZAP, $50.00 for EPRD, $4.00 for RAC Blueprints).

Total

Name

Company

Division

Address

City State Zip

Country Phone Ext

Fax E-mail

Method of Payment:

• Company/Personal Check Enclosed (Make checks payable to Reliability Analysis Center)

• Credit Card #: Exp Date:

Type (circle): American Express VISA Mastercard


A minimum of $25.00 is required for credit card orders

Name on Card

Signature

Billing Address

• Company/Government Purchase Order • Send Product Catalog

*Visit <http://rac.alionscience.com/iPC/servlet/iPCServlet?> for a complete list of RAC products.



Reliability Analysis Center The Journal of the PRSRT STD
201 Mill Street
US Postage Paid
Rome, NY 13440-6916 Utica, NY
Permit #566

Reliability Analysis Center

(315) 337-0900 General Information


(888) RAC-USER General Information
(315) 337-9932 Facsimile
(315) 337-9933 Technical Inquiries

rac@alionscience.com via e-mail

http://rac.alionscience.com Visit RAC on the Web
