37. A producer of pocket calculators estimates that the calculators fail at a rate of one every five years. The calculators are sold for $25 each with a one-year free replacement warranty but can be purchased from an unregistered mail-order source for $18.50 without the warranty. Is it worth purchasing the calculator with the warranty?

38. For Problem 37, what length of period of the warranty equates the replacement costs of the calculator with and without the warranty?

39. Zemansky's sells tires with a pro rata warranty. The tires are warranted to deliver 50,000 miles, with the rebate based on the remaining tread on the tire. The tires fail on the average after 35,000 miles of wear. Suppose the tires sell for $50 each with the warranty. If failures occur completely at random, what would be a consistent price for the tires if no warranty were offered?

40. Habard's, a chain of hardware stores, sells a variety of tools and home repair items. One of their best wrenches sells for $5.50. Habard's will include a three-year free replacement warranty for an additional $1.50. The wrench is expected to be subject to heavy use and, based on past experience, will fail randomly at a rate of one every eight years. Is it worth purchasing the warranty?

41. Consider the case in which the failure mechanism for the product does not obey the exponential law. In that case, the cost under the free replacement warranty that is indifferent to the cost of buying the item without a warranty is given by

\[ C_1 = K[M(W_1) + 1], \]

where \(M(t)\) is known as the renewal function. If the time between failures, \(T\), follows an Erlang law with parameters \(\lambda\) and 2, then

\[ M(t) = \frac{\lambda t}{2} - 0.25 + 0.25e^{-2\lambda t} \quad \text{for all } t \ge 0. \]

(See, for example, Barlow and Proschan, 1965, p. 57.)

a. For Example 13.13, presented in this section, determine the indifference value of the item with a free replacement warranty when the failure law follows an Erlang distribution.
Assume that \(\lambda = 3\) to give the same value of \(E(T)\) as in the example.

b. Is the value of the warranty larger or smaller than in the corresponding exponential case? Explain the result intuitively.

SNAPSHOT APPLICATION
RELIABILITY-CENTERED MAINTENANCE IMPROVES OPERATIONS AT THREE MILE ISLAND NUCLEAR PLANT

The Three Mile Island nuclear facility, located on the Susquehanna River about 10 miles from Harrisburg, Pennsylvania, is notorious in one respect. It was the site of the worst nuclear power generating plant accident in the United States. In March of 1979, Unit 2 underwent a core meltdown as safety systems failed to lift nuclear fuel rods from the core. The facility was shut down as a result of the accident for the next six and one-half years, finally reopening in October 1985.

The plant, operated by GPU Nuclear Corporation, has compiled one of the most impressive records in the industry since it has reopened. According to Fox et al. (1994), the plant was ranked top in the world in 1989 on the basis of its capacity factor (proportion of up-time). In 1987, GPU began to consider the benefits of a reliability-centered maintenance (RCM) approach to preventive maintenance. They identified 28 out of a total of 134 systems as viable candidates for RCM. These 28 systems included the main turbine, the cooling water system, the main generator, and circulating water. The RCM process relied on the following four basic principles:

• Preserve system functions.
• Identify equipment failures that defeat those functions.
• Prioritize failure modes.
• Define preventive maintenance tasks for high-priority failure modes.

The RCM project spanned the period of September 1988 to June 1994. A total of 3,778 components in the 28 subsystems came under consideration. By the end of the program, preventive maintenance policies included more than 5,400 tasks for these components.

The cost of implementing RCM was substantial: about $30,000 per system. However, these costs were more than offset by the benefits. Over the period 1990 to 1994, records show a significant decline in plant equipment failures. In addition, a reliability-based maintenance program can have other benefits, including

• Increased plant availability.
• Optimized spare parts inventories.
• Identification of component failure modes.
• Discovery of new plant failure scenarios.
• Training for engineering personnel.
• Identification of components that benefit from revised preventive maintenance strategies.
• Identification of potential design improvements.
• Improved documentation.

Fox et al. (1994) report several lessons learned from this experience. One is that it is better for the internal maintenance organization, rather than an outside agency, to direct the process. This avoids the "we versus they" syndrome. Successful implementation is also more likely in this case. A cost analysis checklist was developed to screen failure modes. Finally, the team evolved an efficient multiuser relational database software system to facilitate RCM evaluations. This system reduced the time required to perform the necessary analyses by 50 percent.

The lesson learned from this case is that a carefully designed and implemented reliability-based preventive maintenance program can have big payoffs for high-stakes systems.

13.9 SOFTWARE RELIABILITY

Software reliability is a problem with characteristics different from hardware reliability problems. Typically, new software possesses a few "bugs," or errors. Ideally, one would like to remove all the bugs from the software before its release, but that may be impossible. It is more reasonable to release the software when the number of bugs has been reduced to an acceptable level. Predicting the number of remaining bugs is, however, a difficult problem.

The importance of software reliability cannot be overemphasized. Quoting from The Wall Street Journal (Davis, 1987):

The tiniest software bug can fell the mightiest machine, often with disastrous consequences. During the past five years, software defects have killed sailors, maimed patients, wounded corporations and threatened to cause the government-securities market to collapse. Such problems are likely to grow as industry and the military increasingly rely on software to run systems of phenomenal complexity, including President Reagan's proposed "Star Wars" antimissile defense system.

Several models have been proposed for estimating software reliability. However, we will not present these models in detail because their utility has yet to be determined.

Jelinski and Moranda (1972) have suggested the following approach. Let \(N\) be the total initial error content (i.e., the number of bugs) in the software. As the software undergoes testing, the number of bugs is reduced. They assume that the failure rate (that is, the likelihood of detecting a bug) is proportional to the number of bugs remaining in the program, where \(\phi\) is the proportionality constant. That is, the time until detection of the first bug has the exponential distribution with parameter \(N\phi\); the time between detection of the first and the second bugs has the exponential distribution with parameter \((N - 1)\phi\); and so on.

Hence, as bugs are removed from the program, the amount of time required to detect the next bug increases. After \(n\) bugs have been removed, one will have observed the values of \(T_1, T_2, \ldots, T_n\), representing the times between successive detections. These observations are used to estimate \(\phi\) and \(N\) using the maximum likelihood principle. Based on these estimates, one could predict exactly how much testing would be required in order to achieve a certain level of reliability in the software.
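The estimation step in the Jelinski and Moranda approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure given in their paper: the function names `jm_profile_loglik` and `jm_fit` and the search cap `max_bugs` are hypothetical, and \(N\) is restricted to integers and found by a simple grid search. The sketch relies on the fact that, for a fixed \(N\), the likelihood of the exponential interdetection times is maximized at \(\hat{\phi} = n / \sum_{i=1}^{n} (N - i + 1) T_i\).

```python
import math


def jm_profile_loglik(times, n_bugs):
    """Profile log-likelihood for a trial value of N (n_bugs).

    times[i] is the observed time between detections i and i + 1
    (0-based), during which n_bugs - i bugs remain, so the detection
    rate in that interval is phi * (n_bugs - i).
    """
    n = len(times)
    # Total test time weighted by the number of bugs still present.
    weighted_time = sum((n_bugs - i) * t for i, t in enumerate(times))
    phi = n / weighted_time  # maximizing value of phi for this N
    loglik = (
        n * math.log(phi)
        + sum(math.log(n_bugs - i) for i in range(n))
        - n  # equals phi * weighted_time at the maximizing phi
    )
    return loglik, phi


def jm_fit(times, max_bugs=1000):
    """Grid search over integer N >= n; returns (N_hat, phi_hat).

    max_bugs is an arbitrary cap assumed for this sketch.
    """
    if not times or any(t <= 0 for t in times):
        raise ValueError("times must be a non-empty list of positive values")
    n = len(times)
    best = None
    for n_bugs in range(n, max_bugs + 1):
        loglik, phi = jm_profile_loglik(times, n_bugs)
        if best is None or loglik > best[0]:
            best = (loglik, n_bugs, phi)
    return best[1], best[2]
```

Calling `jm_fit(times)` on a list of interdetection times returns the estimated initial bug count and proportionality constant. If the observed times show no tendency to lengthen, the likelihood keeps increasing in \(N\) and the search simply returns the cap, which should be read as "no finite estimate" rather than as a prediction.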
Shooman (1972) suggests using a normalized error rate to measure the error content in the program. He defines

\(\rho(t)\) = Errors per total number of instructions per month of debugging time

and develops a reliability model based on first principles. He demonstrates how this model can be used to build a functional relationship between the amount of time devoted to debugging and the reliability of the program.

The works of Jelinski and Moranda and of Shooman represent the foundation of the theory of software reliability. Extensions of their methods have been considered. It remains to be seen, however, if these methods provide accurate descriptions of the problem and whether they ultimately will assist in predicting the time required to achieve an acceptable level of reliability.

13.10 HISTORICAL NOTES

Much of the theory of reliability, life testing, and maintenance strategies has its roots in actuarial theory developed by the insurance industry. Sophisticated mathematical models for predicting survival probabilities date back to the turn of the century. Lotka [...] some of the connections between equipment replacement models and [...]. The work of Weibull (1939 and 1951) laid the foundations for the subject of fatigue life in materials.

Interest in reliability problems became considerably more widespread during World War II, when attempts were made to understand the failure laws governing complex military systems. During the 1950s, problems concerning life testing and missile reliability began to receive serious attention. In 1952 the Department of Defense established the Advisory Group on Reliability of Electronic Equipment, which published its first report on reliability in June of 1957.

The origins of the specific age replacement models presented in this chapter are unclear. However, sophisticated age replacement models date back as far as the early 1920s (see Taylor, 1923, and Hotelling, 1925).
The stochastic planned replacement models presented in Section 13.7 form the basis for much of the research in replacement theory, but the origins of these models are unclear as well.

Section 13.8, on warranties, is based on the paper by Blischke and Scheuer (1975). Extensions and corrections of their work can be found in Mamer (1982). Readers interested in pursuing further reading should refer to the excellent texts by Barlow and Proschan (1965 and 1975) on reliability models, and by Gertsbakh (1977) on maintenance strategies. Issues concerning the application of maintenance models are discussed by Turban (1967) and Mann (1976).

13.11 Summary

The purpose of this chapter was to review the terminology and the methodology of the theory and application of reliability and maintenance models. Reliability theory is an area of study that has received considerable attention from mathematicians. However, the mathematics is of interest not only for its own sake. These models are extremely useful in an operational setting in considering such issues as failure characteristics of operating equipment, economically sound maintenance strategies, and the value of product warranties and service contracts.

The complexity of the analysis depends upon the assumptions made about the random variable \(T\), which represents the lifetime of a single item or piece of operating equipment. The distribution function of \(T\), \(F(t)\), is the probability that the item fails at or before time \(t\) (\(P\{T \le t\}\)), whereas the reliability function of \(T\), \(R(t)\), is the probability that the item fails after time \(t\) (\(P\{T > t\}\)).
An important quantity related to these functions is the failure rate function \(r(t)\), which is the ratio \(f(t)/R(t)\) of the probability density function and the reliability function. If \(\Delta t\) is sufficiently small, the term \(r(t)\Delta t\) can be interpreted as the conditional probability that the item will fail in the next \(\Delta t\) units of time given that it has survived up until time \(t\).

The failure rate function provides considerable information about the aging characteristics of operating equipment. In a manufacturing environment, we would expect that most operating equipment would have an increasing failure rate function. That means it would be more likely to fail as it ages. A decreasing failure rate function can arise when the likelihood of early failure is high due to defectives in the population. The Weibull probability law can be used to describe the failure characteristics of equipment having either an increasing or a decreasing failure rate function.

Of interest is the case in which the failure rate function is constant. This case gives rise to the exponential distribution for the lifetime of a single component. The exponential distribution is the only continuous distribution possessing the memoryless property. This means that the conditional probability that an item that has been operating up until time \(t\) fails in the next \(s\) units of time is independent of \(t\).

The Poisson process describes the situation in which a single piece of operating equipment fails according to the exponential distribution and is replaced immediately upon failure. When this occurs, the number of failures in a given time has the Poisson distribution, the time between successive failures has the exponential distribution, and the time for \(n\) failures to occur has the Erlang distribution.

The chapter considered the reliability functions of complex systems of components.
It showed how to obtain the reliability functions for components in series and parallel from the reliability functions of the individual components. The chapter also considered K out of N systems, which function only if at least K components function.

Reliability issues form the basis of the maintenance models discussed in the latter half of the chapter. An important measure of a system's performance is the availability, which is the proportion of the time that the equipment operates. We treated both deterministic age replacement models, which do not explicitly include the likelihood of equipment failure, and stochastic age replacement models, which do. The stochastic models allow for replacing the equipment before failure. This is of interest when items have an increasing failure rate function and unplanned failures are more costly than planned failures.

Finally, we concluded the chapter with a discussion of the economic value of warranties. A warranty is a promise supplied by the seller to the buyer to either replace the item with a new one if it fails during the warranty period (free replacement warranty) or provide a discount on the purchase of a new item proportional to the remaining amount of time (or wear) in the warranty period (pro rata warranty). The issues surrounding warranties and service contracts are similar, but service contract models are considerably more complex, owing to the need to include multiple levels of repair.

Additional Problems on Reliability and Maintainability

42. A large national producer of appliances has traced customer experience with a popular toaster oven. A survey of 5,000 customers who purchased the oven early in 2000 has revealed the following:

Year    Number of Breakdowns
2000    188
2001    58
2002    63
2003    [illegible]
2004    54
2005    [illegible]

a. Using these data, estimate \(p_k\) = the probability that a toaster oven fails in its \(k\)th year of operation, for \(k = 1, \ldots, 6\).

b. What is the likelihood that a toaster oven will last at least six years without failure based on these data?

c. The discrete failure rate function has the form \(r_k = p_k / R_{k-1}\), where \(R_k\) is the probability that a unit survives through period \(k\). Determine the failure rate function for the first five years of operation from the given data.

d. Suppose that you purchased a toaster oven at the beginning of 2004 and it is still operating at the end of 2007. If the reliability has not changed appreciably from 2000 to 2007, use the results of part (c) to obtain the probability that it will fail during the first two months of calendar year 2008.

43. Six thousand light bulbs light a large hotel and casino marquee. Each bulb fails completely at random, and each has an average lifetime of 3,280 hours. Assuming that the marquee stays lit continuously and bulbs that burn out are replaced immediately, how many replacements must be made each year on the average?

44. The owner of the hotel mentioned in Problem 43 has decided that in order to decrease the number of burned-out bulbs, she will replace all 6,000 bulbs at the start of each year in addition to replacing the bulbs as they burn out. Comment on the effectiveness of this strategy.

45. The owner of the hotel mentioned in Problem 43 falls on hard times and dispenses with replacement of the bulbs. She notices that more than half of the bulbs have burned out before the advertised average lifetime of 3,280 hours and decides to sue the light bulb manufacturer for false advertising. Do you think she has a case? (Hint: What fraction of the bulbs would be expected to fail prior to the mean lifetime?)

46. Continuing with the example of Problem 43, determine the following:

a. The proportion of bulbs lasting more than two years.

b. The probability that a bulb chosen at random fails in the first three months of operation.

c. The probability that a bulb that has lasted for 10 years fails in the next three months of operation.

47. Assume that the bulbs in Problem 43 are not replaced as they fail.

a. What fraction of the 6,000 bulbs are expected to fail in the first year?

b. What fraction of the bulbs surviving the first year are expected to fail in the second year?

c. What fraction of the bulbs surviving the \(n\)th year are expected to fail in year \(n + 1\) for any value of \(n = 1, 2, \ldots\)?

d. Using the results of part (c), of the original 6,000 bulbs, how many would be expected to fail in the fourth year of operation?

48. The mean value of a Weibull random variable is given by the formula

\[ \mu = \alpha^{-1/\beta}\,\Gamma(1 + 1/\beta), \]

where \(\Gamma\) represents the gamma function. The gamma function has the property that

\[ \Gamma(k) = (k - 1)\Gamma(k - 1) \]

for any value of \(k > 1\), and \(\Gamma(1) = 1\). Notice that if \(k\) is an integer, this results in \(\Gamma(k) = (k - 1)!\). If \(k\) is not an integer, one must use the recursive definition for \(\Gamma(k)\) coupled with the following table. For values of \(1 \le k \le 2\), \(\Gamma(k)\) is given by

k       Γ(k)        k       Γ(k)
1.00    1.0000      1.55    0.8889
1.05    0.9735      1.60    0.8935
1.10    0.9514      1.65    0.9001
1.15    0.9330      1.70    0.9086
1.20    0.9182      1.75    0.9191
1.25    0.9064      1.80    0.9314
1.30    0.8975      1.85    0.9456
1.35    0.8912      1.90    0.9618
1.40    0.8873      1.95    0.9799
1.45    0.8857      2.00    1.0000
1.50    0.8862

For example, this table would be used as follows:

\[ \Gamma(3.6) = (2.6)(1.6)\Gamma(1.6) = (2.6)(1.6)(0.8935) = 3.717. \]

a. Compute the expected failure time for Example 13.4 regarding copier equipment.

b. Compute the expected failure time for a piece of operating equipment whose failure law is given in Example 13.2.

c. Determine the mean failure time for \(\alpha = 35\) and \(\beta = 0.20\).

d. Determine the mean failure time for \(\alpha = 1.90\) and \(\beta = 0.45\).

49. Suppose that a particular light bulb is advertised as having an average lifetime of 2,000 hours and is known to satisfy an exponential failure law. Suppose for simplicity that the bulb is used continuously. Find the probability that the bulb lasts

a. More than 3,000 hours.

b. Less than 1,500 hours.

c. Between 2,000 and 2,500 hours.

50. Applicational Materials sells several pieces of equipment used in the manufacture of silicon-based microprocessors. In 2003 the company filled 130 orders for model 55212. Suppose that the machines fail according to a Weibull law. In particular, the cumulative distribution function \(F(t)\) of the time until failure of any machine
