Reliability

• "Reliability is the probability of a device performing its purpose
adequately for the period of time intended under the operating
conditions encountered." Billington and Allen (1983).
• Reliability theory was originally developed for estimating the
reliabilities of physical devices.
• The source of the reliability failures of physical devices is typically
the physical deterioration of the materials used in their construction.
• This physical deterioration provides the basis of stochastic reliability
modeling, since the deterioration is assumed to vary randomly with
time.

The probability that an item will fail in the interval from 0 to time t is
$F(t)$
Reliability is
$R(t) = 1 - F(t)$

IEEE. Software Engineering Standards. Third ed., New York: IEEE, 1989.
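The relation R(t) = 1 - F(t) can be tried out numerically. The sketch below is only an illustration: it assumes F(t) is estimated as the fraction of observed items that failed by time t, and the function name `empirical_reliability` and the failure times are hypothetical, not taken from the slides.

```python
def empirical_reliability(failure_times, t):
    """Estimate R(t) = 1 - F(t) from observed failure times.

    F(t) is estimated as the fraction of items that failed in the
    interval from 0 to t (an assumption made for this sketch).
    """
    if not failure_times:
        raise ValueError("need at least one observed failure time")
    failed_by_t = sum(1 for x in failure_times if x <= t)
    f_t = failed_by_t / len(failure_times)
    return 1.0 - f_t


# Hypothetical failure times (hours) for ten devices; illustrative only.
times = [120, 340, 410, 560, 700, 820, 910, 1050, 1300, 1500]

for t in (0, 500, 1000, 2000):
    print(t, round(empirical_reliability(times, t), 3))
```

The estimate starts at 1 when nothing has failed yet and falls to 0 once every observed item has failed.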

Reliability Theory
• Used to estimate the reliabilities of individual devices, such as
electronic components,
• and the reliabilities of systems constructed of components.
• Mathematical – based on probability theory.

Handbooks
• In many cases users of electronic components are
able to access reliability estimates for individual
components, such as resistors, or families of
components such as germanium small signal
transistors (AT&T Reliability Handbook).
• Engineers use such individual reliability estimates,
along with the mathematical theory of reliability,
when deciding if an assembly including them
meets reliability requirements.

Component Level Estimation
• Reliability estimation at the component level can be
based either on physical principles or on data
collection.
• Reliability estimates based on physical principles
use knowledge of the materials in the device to
make estimates.
– For example, automobile tires eventually wear out. We
know that they will not last for a billion miles of use, or
for a million.

Data Collection Approach
• Using the data collection approach, the physical
device is observed and failure data recorded.
• This data can be collected under laboratory
conditions, or by observing the devices in the field.
• This information is then used in the context of a
mathematical model to calculate a reliability figure.
• For example, aircraft manufacturers collect
extensive failure data on aircraft components such as
motors, hydraulic pumps, etc.
• Since it will be impossible to run these tests under
all environmental conditions, some extrapolation is
necessary to develop more widely applicable
reliability estimates.
• Extrapolations are necessary to get from typical
testing conditions such as higher temperatures to
realistic use conditions which may involve much lower

Definitions
• ANSI IEEE Standard Glossary of Software Engineering Terminology.
• An error is "A discrepancy between a computed, observed, or measured value or condition and the true, specified, or theoretically correct value or condition."
• A failure is "The termination of the ability of a functional unit to perform its required function."
• A fault is "An accidental condition that causes a functional unit to fail to perform its required function."

Reliability Indicators
• expected number of failures in a given time period
• average time between failures
• average down time
• expected revenue loss due to failure
• expected loss of output due to failure

Reliability of a Series System
[Figure: components R1, R2, R3 connected in series]
$R_s = \prod_{i=1}^{n} R_i$
Assumes that items in the series are independent
All items must work for the system to work

Series Example
• System R has three components
– R1 = 0.90
– R2 = 0.92
– R3 = 0.98
• The system Reliability Rs = (0.90)(0.92)(0.98) = 0.81

Software Pipeline Architecture
• If the three components in the last example are software filters in a pipeline architecture, can we do the reliability estimate?
– How do we get the reliabilities of the individual components?
– Are they independent?
– What might cause them to fail?

Parallel System Reliability
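A minimal Python sketch of the series product rule, shown next to the standard parallel counterpart Rs = 1 - (1 - R1)(1 - R2)...(1 - Rn). Both assume independent components. The function names are chosen for this illustration only.

```python
from math import prod


def _check(reliabilities):
    """Reject values that are not probabilities."""
    for r in reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"reliability out of range: {r}")


def series_reliability(reliabilities):
    """All items must work: the product of the component reliabilities."""
    _check(reliabilities)
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """At least one item must work: 1 minus the product of the unreliabilities."""
    _check(reliabilities)
    return 1.0 - prod(1.0 - r for r in reliabilities)


components = [0.90, 0.92, 0.98]
print("series:  ", round(series_reliability(components), 4))
print("parallel:", round(parallel_reliability(components), 5))
```

With the same three components, the series system is less reliable than its weakest component, while the parallel system is more reliable than its strongest one.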

Parallel Example
• System R has three components
– R1 = 0.90
– R2 = 0.92
– R3 = 0.98
• The system Reliability Rs = 1 - (1 - 0.90)(1 - 0.92)(1 - 0.98) = 0.9998

Complex Systems
• A complex system may contain series and parallel components and may include cross links.
• There are various equations for computing the reliabilities of these systems.

Complex System Example
$R_{2,3,4} = 0.936$
$R_{6,7} = 0.96$
$R_s = R_1 \cdot R_{2,3,4} \cdot R_5 \cdot R_{6,7} \cdot R_8 = 0.761$

k out of n systems
• System with n components in parallel will function iff at least k of those parallel components are functioning (1 <= k <= n)

SUPER
• SUPER is a software package that provides computational support for the separate maintenance model and for other useful system reliability descriptions.
• SUPER has been used in the telecommunications industry for over 15 years.

SUPER RBD Language
• A simple formal language, the RBD language, was developed to allow the description of block diagrams in SUPER. For example, a series system called "unamit" comprising units U1, U2, and U3 is described in SUPER as
• unamit = s(U1/U2/U3).
• Nested RBDs are handled in the RBD language by naming subtending blocks and using that name in the RBD language statement defining the structure that includes those blocks. For instance, the RBD in Figure 2 can be represented by the statements
• U3 = p(W1/W2)
• unamit = s(U1/U2/U3).
[Figure 2: U1 and U2 in series with U3, where U3 consists of W1 and W2 in parallel]
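The nested statements U3 = p(W1/W2) and unamit = s(U1/U2/U3) can be evaluated with a short recursive sketch. This is not SUPER's implementation or its parser: the tuple encoding, the function names, and the unit reliabilities are all hypothetical, and components are assumed independent. A brute-force k-out-of-n helper is included for the "at least k of n" case.

```python
from itertools import product
from math import prod


def series(rs):
    """Every block must work."""
    return prod(rs)


def parallel(rs):
    """At least one block must work."""
    return 1.0 - prod(1.0 - r for r in rs)


def k_out_of_n(k, rs):
    """Probability that at least k of the n independent blocks work.

    Enumerates all 2**n up/down states, so it only suits small n.
    """
    n = len(rs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    total = 0.0
    for states in product((True, False), repeat=n):
        if sum(states) >= k:
            total += prod(r if up else 1.0 - r for r, up in zip(rs, states))
    return total


def evaluate(block, values):
    """Evaluate a block: a unit name, or a tuple (kind, [sub-blocks]).

    kind is "s" for series or "p" for parallel (hypothetical encoding).
    """
    if isinstance(block, str):
        return values[block]
    kind, children = block
    rs = [evaluate(child, values) for child in children]
    if kind == "s":
        return series(rs)
    if kind == "p":
        return parallel(rs)
    raise ValueError(f"unknown block kind: {kind}")


# Hypothetical unit reliabilities; illustrative only.
values = {"U1": 0.95, "U2": 0.99, "W1": 0.90, "W2": 0.90}

U3 = ("p", ["W1", "W2"])          # U3 = p(W1/W2)
unamit = ("s", ["U1", "U2", U3])  # unamit = s(U1/U2/U3)

print("unamit:", round(evaluate(unamit, values), 6))
print("2 out of 3:", round(k_out_of_n(2, [0.9, 0.9, 0.9]), 6))
```

Naming the inner block first and reusing that name mirrors how the RBD language handles nesting. Setting k = 1 reduces `k_out_of_n` to the parallel case, and k = n reduces it to the series case.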

Key Difference
• A key difference between software and hardware reliability analysis is that software does not physically wear out.
• Thus the stochastic modeling must be based on some other source.
– What are the root causes of software failure?
• The applicability of hardware reliability theory to software is the subject of debate.

Failure Modes Analysis
• Failure modes analysis is a standard engineering technique for process and product improvement, providing a systematic procedure for determining and classifying the ways that a product or process can fail.

Software Reliability
• "Software reliability is defined as the probability that a software fault that causes deviations from the required output by more than a specified tolerance, in a specified environment, does not occur during a specified exposure period." Ralston and Reilly (1983)
• Software reliability is "the probability of failure free operation of a computer program in a specified environment for a specified time". Musa and Iannino (1987).
– e.g. program myprog has a reliability estimate of .92 over ten hours of processing time. That is, if myprog were executed 100 times, each run using 10 hours of processing time, it is likely to fail 8 times out of 100.

Mean Time to Failure
$MTTF = \frac{1}{i} \sum_{k=1}^{i} t_k$
$t_k$ = time between failures
$i$ = number of failures
Times to failures: 280, 212, 278, 503, 675, 315, 431
MTTF = 2694 / 7 = 384.86

Mean Time Between Failures
• MTBF = MTTF + MTTR
• Where MTTF is mean time to failure
• MTTR is mean time to repair

System Availability
Availability is
1. The ability of an item to perform its designated function when required for use.
2. The ratio of system up-time to total operating time.
3. The probability that software will be able to perform its designated system function when required for use.
IEEE Software Engineering Standards, 3rd Edition
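The MTTF, MTBF, and availability definitions can be checked with a few lines of Python. This is a sketch under stated assumptions: the seven listed times to failure sum to 2694, the MTTR value of 10 is hypothetical and in the same time units, and the availability figure treats MTTF as the average up-time and MTTR as the average down-time of one failure/repair cycle.

```python
def mttf(times_to_failure):
    """Mean time to failure: the sum of the times divided by the number of failures."""
    if not times_to_failure:
        raise ValueError("need at least one time to failure")
    return sum(times_to_failure) / len(times_to_failure)


def mtbf(mttf_value, mttr_value):
    """MTBF = MTTF + MTTR."""
    return mttf_value + mttr_value


def availability(up_time, total_time):
    """Ratio of system up-time to total operating time."""
    if total_time <= 0:
        raise ValueError("total operating time must be positive")
    return up_time / total_time


def expected_failures(runs, reliability):
    """Expected number of failed runs, given the per-run reliability."""
    return runs * (1.0 - reliability)


times = [280, 212, 278, 503, 675, 315, 431]  # times to failure from the slide
mttr_value = 10.0                            # hypothetical mean time to repair

m = mttf(times)
b = mtbf(m, mttr_value)
print("MTTF:", round(m, 2))
print("MTBF:", round(b, 2))
# Assumption: one average cycle is MTTF of up-time out of MTBF total time.
print("availability:", round(availability(m, b), 4))
print("myprog expected failures in 100 runs:", round(expected_failures(100, 0.92)))
```

A shorter repair time raises availability even when the failure behavior, and therefore MTTF, stays the same.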

Reuse Failure Modes Model
[Figure: reuse steps in sequence: Try to Reuse, Part Exists, Part Available, Part Found, Part Understood, Part Valid, Part Integrated]

Reuse Survey - Problems
What problems have you had in trying to reuse software?
- compiler problems and missing include files
- no time allowed for search
- different language, so part not 100% applicable, so have to modify it, so have to understand it in detail.
- the code I was trying to reuse usually had to be modified to some degree as it was not created with reuse in mind.
- too much dependence on ancillary (global) software to be portable.
- classified information cannot be reused
- typically, new engineers are not told of existing reuse libraries.
- wasn't fast enough
- lack of flexibility to fit in my design.
- cultural problems
- knowing where to look.
- knowing what I wanted it to do and finding out what it did.
- specify exactly what is needed and identifying the appropriate parts.
- wasn't portable to new hardware
- lack of usable library available
- didn't compile with new compiler

Reuse Failure Modes (1 of 3)
Failure Mode           Failure Cause                      Responses
No Attempt to Reuse    Time Constraints                   x
                       Insufficient Funding               x
                       Reuse Technology Immature          x
                       No Economic Incentive
                       No Success Model
                       NIH Syndrome
                       Lack of Education                  xxxxx
                       Non-Egoless Programming
                       More Fun to Write
                       Legal Problems
                       Utility of Reuse Unclear           xxx

Reuse Failure Modes (2 of 3)
Failure Mode           Failure Cause                      Responses
Part Doesn't Exist     No Economic Incentive
                       Novel Technology
Part Isn't Available   No Import Organization             x
                       Part Can't Be Scavenged
                       Part Not Designed for Reuse        x
                       Part Can't Be Found Out There      x
                       Part is Proprietary/Classified     x
                       Source Code Missing
Part Isn't Found       Insufficient Representation
                       Poor or No Search Tools            xx
                       Inability to Specify Search        x

Reuse Failure Modes (3 of 3)
Failure Mode               Failure Cause                      Responses
Part Isn't Understood      Insufficient Representation        x
                           Poor Education
                           Part Too Complex                   x
Part Isn't Valid           Poor Testing
                           Insufficient Information           x
                           Lack of Standards                  xx
Part Can't Be Integrated   Language Incompatibilities         xxx
                           Improper Form                      x
                           Non-Functional Specs               x
                           Hardware Incompatibilities         x
                           Linkage to extraneous software     x
                           Too Much Modification Required     x
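One way to read the reuse failure modes model is as an ordered checklist in which the first step that does not succeed determines the failure mode. The sketch below assumes that reading and assumes the steps are checked in the order Try to Reuse, Part Exists, Part Available, Part Found, Part Understood, Part Valid, Part Integrated. The names `REUSE_STEPS` and `reuse_failure_mode` are hypothetical.

```python
# Each reuse step paired with the failure mode recorded when it does not succeed.
REUSE_STEPS = [
    ("Try to Reuse", "No Attempt to Reuse"),
    ("Part Exists", "Part Doesn't Exist"),
    ("Part Available", "Part Isn't Available"),
    ("Part Found", "Part Isn't Found"),
    ("Part Understood", "Part Isn't Understood"),
    ("Part Valid", "Part Isn't Valid"),
    ("Part Integrated", "Part Can't Be Integrated"),
]


def reuse_failure_mode(completed_steps):
    """Return the failure mode of the first step not completed.

    Returns None when every step succeeded, i.e. the part was reused.
    """
    for step, mode in REUSE_STEPS:
        if step not in completed_steps:
            return mode
    return None


# A reuse attempt where the part exists and is available but is never located.
attempt = {"Try to Reuse", "Part Exists", "Part Available"}
print(reuse_failure_mode(attempt))
```

Classifying each survey response this way gives the per-mode tallies that the failure modes tables record as x marks.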

References
• Billington, R. and Allen, R. Reliability Evaluation of Engineering Systems. Pitman Books, Marshfield, MA, 1983.
• Frakes, William B. and Christopher J. Fox, "Quality Improvement Using A Software Reuse Failure Modes Model", IEEE Transactions on Software Engineering, 22(4), April, 1996, pp. 274-279.
• J. D. Musa, A. Iannino, and K. Okumoto. Software Reliability - Measurement, Prediction, Application. McGraw-Hill, New York, 1987.
• Schiff, D. and Dagastino, R. B. Practical Engineering Statistics. New York: John Wiley, 1996.
• Tortorella, M. and Frakes, W., "A Computer Implementation of the Separate Maintenance Model for Complex System Reliability".