3 Failure-Data Analysis

3-1 INTRODUCTION

The definition of reliability given in Chapter 2 states that it is the probability of a device giving satisfactory performance for a specified period under specified operating conditions. When a unit or system does not perform satisfactorily, it is said to have failed. The pattern of failure can be obtained from life-test results, i.e., by testing a fairly large number of models until failure occurs, and observing the failure-rate characteristics as a function of time. The first step, therefore, is to link reliability with experimental or field-failure data. These data will also provide a basis for formulating or constructing mathematically a failure model for general analysis. In this chapter, we shall discuss in detail the information that is obtained from the analysis of failure data.

3-2 FAILURE DATA

Consider a series of tests conducted under certain stipulated conditions on 1000 electronic components. The total duration of the tests is 19 hours. The number of components that fail during each hourly interval is noted. The results obtained are tabulated as shown in Table 3-1. This table lists the total number of failed components at the end of 1 hr, 2 hr, 3 hr, and so on. The time interval is generally denoted by $\Delta t$, the number of failures during the interval by $f$, and the cumulative failures to the end of the interval by $F$. Since the number of components failed during a particular interval is noted only at the end of the interval (or at the beginning of the next interval), the values of $f$ are entered between two values of $t$, as shown in column (2). During the first hour, 130 components fail, leaving 870 survivors. Between the first and second hours, an additional 83 components fail, leaving 787 survivors. The total number of components failed from the beginning of the test till a particular time is given in column (3).
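The bookkeeping described above, in which the interval counts $f$ are accumulated into the cumulative failures $F$ and the survivors, can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the original text; the variable names are assumptions, while the counts are the hourly failures of Table 3-1.

```python
from itertools import accumulate

# Hourly failure counts f, column (2) of Table 3-1 (19 one-hour intervals).
failures = [130, 83, 75, 68, 62, 56, 51, 46, 41, 37,
            34, 31, 28, 64, 76, 62, 40, 12, 4]
population = 1000  # components on test at t = 0

# Column (3): cumulative failures F at the end of each hour.
cumulative = list(accumulate(failures))
# Column (4): survivors at the end of each hour.
survivors = [population - F for F in cumulative]

for hour, (f, F, s) in enumerate(zip(failures, cumulative, survivors), start=1):
    print(f"t = {hour:2d} h: f = {f:3d}, F = {F:4d}, survivors = {s:4d}")
```

Each printed line corresponds to one pair of rows in the table: the count for the interval, followed by the totals at its end.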
From the table, we see that the total number of units failed is 130 till the end of the first hour, 213 till the end of the second hour, 288 till the end of the third hour, and so on. Based on the failure data or survival-test results shown in Table 3-1, we can now define failure density, failure rate, reliability, and probability of failure.

TABLE 3-1

 (1)       (2)          (3)         (4)       (5)      (6)          (7)
Time    No. of   Cumulative      No. of   Failure  Failure  Reliability
      failures     failures   survivors   density     rate
   t         f            F                   f_d        Z            R
   0                      0        1000                           1.000
           130                              0.130    0.139
   1                    130         870                           0.870
            83                              0.083    0.101
   2                    213         787                           0.787
            75                              0.075    0.100
   3                    288         712                           0.712
            68                              0.068    0.100
   4                    356         644                           0.644
            62                              0.062    0.101
   5                    418         582                           0.582
            56                              0.056    0.101
   6                    474         526                           0.526
            51                              0.051    0.101
   7                    525         475                           0.475
            46                              0.046    0.101
   8                    571         429                           0.429
            41                              0.041    0.100
   9                    612         388                           0.388
            37                              0.037    0.100
  10                    649         351                           0.351
            34                              0.034    0.101
  11                    683         317                           0.317
            31                              0.031    0.103
  12                    714         286                           0.286
            28                              0.028    0.103
  13                    742         258                           0.258
            64                              0.064    0.283
  14                    806         194                           0.194
            76                              0.076    0.486
  15                    882         118                           0.118
            62                              0.062    0.714
  16                    944          56                           0.056
            40                              0.040    1.110
  17                    984          16                           0.016
            12                              0.012    1.200
  18                    996           4                           0.004
             4                              0.004    2.000
  19                   1000           0                           0.000

Column (5): sum = 1.00. Column (6): mean = 0.376.

22 RELIABILITY ENGINEERING

Failure Density f_d

This is the ratio of the number of failures during a given unit interval of time to the total number of items at the very beginning of the test. For the example being considered, the total initial population at the beginning of the test was 1000. During the first unit interval, the number of components that fail is 130. Hence, the failure density $f_d$ is 130/1000 = 0.130. During the second unit interval, 83 more components fail. Thus, the value of the failure density during the second unit interval is 83/1000 = 0.083. Similarly, during the tenth unit interval, the failure density has a value 37/1000 = 0.037. The values of $f_d$ are given in column (5) of Table 3-1. Failure density is also called the ratio part-failure rate.

Let $n_1$ be the number of components that fail during the first unit interval, $n_2$ the number that fail during the second unit interval, and so on. Let $N$ be the total population. Then,

the failure density during the 1st unit interval = $f_{d1} = n_1/N$,
the failure density during the 2nd unit interval = $f_{d2} = n_2/N$.

Similarly, the failure density during the $i$-th unit interval = $f_{di} = n_i/N$. Let $l$ be the last interval after which there are no survivors. Then, $f_{dl} = n_l/N$. If we add $f_{d1}, f_{d2}, f_{d3}, \ldots, f_{dl}$, we get

$$f_{d1} + f_{d2} + f_{d3} + \cdots + f_{dl} = \frac{n_1}{N} + \frac{n_2}{N} + \frac{n_3}{N} + \cdots + \frac{n_l}{N} = \frac{n_1 + n_2 + n_3 + \cdots + n_l}{N} = \frac{N}{N} = 1. \tag{3-2}$$

Hence, the sum of the values entered in column (5) will be one.

Failure Rate Z

This is the ratio of the number of failures during a particular unit interval to the average population during that interval. Referring to the data given in Table 3-1, the failure rate $Z$ during the first unit interval is

$$Z(1) = \frac{130}{(1000 + 870)/2} = \frac{130}{935} = 0.139.$$

The average population during any interval is the average of the populations at the beginning and at the end of the interval. During the twelfth unit interval, the failure rate $Z(12)$ is

$$Z(12) = \frac{31}{(317 + 286)/2} = \frac{31}{301.5} = 0.103.$$

Sometimes, the population at the beginning of the interval is taken instead of the average population. If this procedure is followed, the failure rate during the first unit interval will be Z(1) = 130/1000 = 0.130 and that during the twelfth unit interval will be Z(12) = 31/317 = 0.098. In our discussion, we shall follow the former procedure, using the average population during the unit interval. The failure rates are entered in column (6) of Table 3-1. The failure rate is also known as the hazard rate. Sometimes, it is called the instantaneous failure rate.

Reliability R

This is the ratio of the survivors at any given time to the total initial population.
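The three quantities defined so far (failure density, failure rate, and reliability) can all be recomputed from the hourly failure counts of Table 3-1. The following Python sketch is an illustration only and not part of the original text; the function name `life_table` and its `use_average_population` flag are assumptions made for this example. Values recomputed this way may differ from the printed table in the third decimal place because of rounding.

```python
# Hourly failure counts f, column (2) of Table 3-1.
failures = [130, 83, 75, 68, 62, 56, 51, 46, 41, 37,
            34, 31, 28, 64, 76, 62, 40, 12, 4]


def life_table(failures, use_average_population=True):
    """Return (f_d, Z, R) lists with one entry per unit interval."""
    total = sum(failures)  # initial population N
    densities, rates, reliabilities = [], [], []
    alive = total  # survivors at the start of the current interval
    for n in failures:
        alive_end = alive - n
        # Failure density: failures in the interval / initial population.
        densities.append(n / total)
        # Failure rate: failures / average (or starting) interval population.
        base = (alive + alive_end) / 2 if use_average_population else alive
        rates.append(n / base)
        # Reliability at the end of the interval: survivors / initial population.
        reliabilities.append(alive_end / total)
        alive = alive_end
    return densities, rates, reliabilities


f_d, Z, R = life_table(failures)
_, Z_start, _ = life_table(failures, use_average_population=False)

print(f"f_d(1) = {f_d[0]:.3f}, Z(1) = {Z[0]:.3f}, R(1) = {R[0]:.3f}")
print(f"Z(12) = {Z[11]:.3f} (average), {Z_start[11]:.3f} (start of interval)")
print(f"sum of f_d = {sum(f_d):.2f}")
```

The flag switches between the two conventions mentioned in the text: the average population during the interval (used in Table 3-1) and the population at the beginning of the interval.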
The reliability at the end of the first hour will be R(1) = 870/1000 = 0.870. At the end of the second hour, R(2) = 787/1000 = 0.787. Similarly, at the end of the twelfth hour, R(12) = 0.286. This calculation of reliability conforms to our original definition. It will be recalled that reliability is the probability of a device functioning satisfactorily for a given period under stipulated operating conditions. For the series of performance tests under consideration, we can appropriately modify this definition to the extent that the device is required to function satisfactorily for at least the given period.

In Chapter 1, we defined probability as the ratio of the number of successes to the number of trials. In the present case, we started with 1000 items or components. At the end of the first hour, the number of survivors was 870. This means that successful operation was observed in 870 cases out of 1000. Hence, the reliability (i.e., the probability of success) for the first hour is 0.870. At the end of the second hour, the total number of components passing the test is 787. Hence, the reliability for the second hour is 0.787. This is equivalent to saying that the probability of the component functioning satisfactorily for at least two hours is 0.787. Similarly, the reliability factor for the twelfth hour is 0.286. These reliability factors are entered in column (7) of Table 3-1.

As the test proceeds, more components fail, with the result that the reliability factor decreases progressively. Since all the components fail by the end of the nineteenth hour, the corresponding reliability will be zero. The reliability factors so obtained can also be called the probabilities of survival for the first hour, second hour, third hour, and so on.

Probability of Failure

The concept of the probability of failure is similar to that of the probability of survival.
This is the ratio of the number of units failed (within a certain time) to the total population. For example, the probability of failure during the first hour would be 130/1000 = 0.130 since 130 units fail during the first hour out of a total population of 1000. Similarly, the probability of failure between t = 0 and t = 2 (i.e., the probability of a component failing within two hours) is 213/1000 = 0.213 since 213 components fail during the first two hours. The probability of failure between t = 0 and t = 5 is (130 + 83 + 75 + 68 + 62)/1000 = 418/1000 = 0.418.

We have seen that "probability of survival" is another term for reliability factor. We can similarly use the term unreliability factor for the probability of failure. The sum of the reliability and unreliability factors will obviously be equal to one. Survival and failure are, therefore, complementary events. If the reliability factor between $t = 0$ and $t = t_1$ is $R(t_1)$, the unreliability factor for the same period will be $1 - R(t_1)$.

3-3 MEAN FAILURE RATE h

The data in Table 3-1 show that the failure rate Z varies with time. In the first hour, the failure rate is 0.139, in the second hour, it is 0.101, and so on. It is also possible to calculate the mean failure rate for the entire test cycle. This is the overall failure rate. We started with 1000 components and it took 19 hours for all of them to fail. Hence, the overall rate at which failure has taken place per component = (1/19) × (1000/1000) = 1/19. This, of course, is a very rough parameter. Since Table 3-1 gives the failure rate for every hour, we can get a much better estimate by taking the mean of these values. If $Z_1$ is the failure rate for the first hour, $Z_2$ the failure rate for the second hour, and $Z_T$ the failure rate for the $T$-th hour, the mean failure rate for $T$ hours will be

$$h = \frac{Z_1 + Z_2 + \cdots + Z_T}{T}. \tag{3-3}$$

If the interval is made much smaller than one hour, we get a more accurate value of the mean failure rate. This aspect will be discussed in Section 3-9.

3-4 MEAN TIME TO FAILURE (MTTF)

Consider the following example involving the life-testing of a new device.

Example 3-1 In the life-testing of ten specimens of a mini-mixer, the time to failure for each specimen is recorded as given in Table 3-2. Calculate the mean failure rate h for T = 900 hours, and the mean time to failure for all ten specimens.

TABLE 3-2

Specimen   Time to failure   Specimen   Time to failure
number     (hours)           number     (hours)
   1           805              6           832
   2           810              7           842
   3           815              8           856
   4           820              9           875
   5           825             10           900

In this example, since the number of samples tested is small, it is possible to note the time to failure of each sample. When the number of samples is large, we record the number of specimens failed in each interval of time as shown in Table 3-1. The mean failure rate is obtained from the formula

$$h(T) = \frac{1}{T}\,\frac{N(0) - N(T)}{N(0)}, \tag{3-4}$$

where $h(T)$ is the mean failure rate for $T$ hours, $N(0)$ is the total population at $T = 0$, and $N(T)$ is the population remaining at time $T$. In other words, $N(0) - N(T)$ is the number of specimens failed in $T$ hours. In the present case, we have

h(900) = (1/900)[(10 - 0)/10] = 1/900.

As indicated by the data, all ten specimens do not fail at the same time. They have different times to failure. Hence, we can calculate the mean time to failure for all ten specimens as

MTTF = (1/10)(805 + 810 + 815 + 820 + ... + 900) = 8380/10 = 838 hours.

In general, if $t_1$ is the time to failure for the first specimen, $t_2$ the time to failure for the second specimen, and $t_N$ the time to failure for the $N$-th specimen, the mean time to failure for $N$ specimens will be

$$\text{MTTF} = \frac{t_1 + t_2 + \cdots + t_N}{N} = \frac{1}{N}\sum_{i=1}^{N} t_i. \tag{3-5}$$

As noted earlier, it is difficult to record the time to failure for each component when the number of specimens tested is very large. Instead, we can record the number which fail during specific intervals of time.
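The two calculations of Example 3-1 can be checked with a short Python sketch. It is an illustration under stated assumptions rather than part of the original text: the function names are hypothetical, and N(T) is taken as the number of specimens whose time to failure exceeds T.

```python
# Times to failure (hours) of the ten specimens in Table 3-2.
times_to_failure = [805, 810, 815, 820, 825, 832, 842, 856, 875, 900]


def mean_failure_rate(times, horizon):
    """Eq. (3-4): h(T) = (1/T) * [N(0) - N(T)] / N(0)."""
    initial = len(times)
    # N(T): specimens still working after `horizon` hours.
    remaining = sum(1 for t in times if t > horizon)
    return (initial - remaining) / (initial * horizon)


def mean_time_to_failure(times):
    """Eq. (3-5): arithmetic mean of the individual times to failure."""
    return sum(times) / len(times)


h_900 = mean_failure_rate(times_to_failure, 900)
mttf = mean_time_to_failure(times_to_failure)
print(f"h(900) = {h_900:.6f} per hour (1/900 = {1 / 900:.6f})")
print(f"MTTF = {mttf:.0f} hours")
```

With all ten specimens failed by 900 hours, the mean failure rate reduces to 1/900 per hour, while the MTTF of 838 hours reflects the individual failure times.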
For the data given in Table 3-1, for example, the interval of time was chosen as one hour, and the number of specimens that failed during each hour was recorded. We assumed that all the specimens which failed during a particular time interval took the same total time to failure. For example, from Table 3-1, we see that 37 specimens failed during the tenth hour. Although these 37 specimens might have failed at different instants during that time interval, we assume that on the average all of them took ten hours to fail. If $n_1$ is the number of specimens that failed during the first hour, $n_2$ the number that failed in the second hour, and $n_k$ the number that failed during the $k$-th hour, then the mean time to failure for $N$ specimens will be

$$\text{MTTF} = \frac{n_1 + 2n_2 + 3n_3 + \cdots + k n_k}{N}. \tag{3-6}$$

If the time interval is $\Delta t$ instead of one hour, the mean time to failure becomes

$$\text{MTTF} = \frac{n_1 \Delta t + 2 n_2 \Delta t + \cdots + k n_k \Delta t + \cdots + l n_l \Delta t}{N} = \frac{1}{N}\sum_{k=1}^{l} k\, n_k\, \Delta t, \tag{3-7}$$

where $n_1$ is the number of specimens that failed during the first interval, $n_2$ the number of specimens that failed during the second interval, and so on. This is illustrated in the next example.

Example 3-2 In the life-testing of 100 specimens of a particular device, the number of failures during each time interval of twenty hours is shown in Table 3-3. Estimate the MTTF for these specimens.

TABLE 3-3

Time interval        Number of failures
(hours)              during the interval
T ≤ 1000                      0
1000 < T ≤ 1020              25
1020 < T ≤ 1040              40
1040 < T ≤ 1060              20
1060 < T ≤ 1080              10
1080 < T ≤ 1100               5

As the number of specimens tested is large, it is tedious to record the time to failure for each specimen. Instead, we note the number of specimens that fail during each 20-hour interval. Therefore, the mean time to failure from Eq. (3-6) is

MTTF = (1/100)[25(1020) + 40(1040) + 20(1060) + 10(1080) + 5(1100)] = 104,600/100 = 1046 hours.
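The grouped calculation of Example 3-2 can be sketched as follows. This is an illustrative Python sketch, not part of the original text; the names are assumptions. Every failure is assigned the end time of its 20-hour interval, as in the worked solution, and the same result is obtained from Eq. (3-7) when the interval index k is counted from t = 0.

```python
# Table 3-3 as (end of interval in hours, failures in that interval).
# No failures occur up to 1000 hours.
grouped_failures = [(1020, 25), (1040, 40), (1060, 20), (1080, 10), (1100, 5)]
interval = 20  # hours, the class width (delta t)


def grouped_mttf(groups):
    """MTTF with every failure assigned the end time of its interval."""
    total = sum(count for _, count in groups)
    weighted = sum(end * count for end, count in groups)
    return weighted / total


mttf = grouped_mttf(grouped_failures)

# Eq. (3-7): MTTF = (1/N) * sum(k * n_k * delta_t), where k = end / delta_t.
N = sum(n for _, n in grouped_failures)
mttf_eq37 = sum((end // interval) * n * interval
                for end, n in grouped_failures) / N

print(f"MTTF (end-of-interval times) = {mttf:.0f} hours")
print(f"MTTF (Eq. 3-7 form)          = {mttf_eq37:.0f} hours")
```

Both forms give the same value because k times the interval width is simply the end time of the k-th interval.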
3-5 MEAN TIME BETWEEN FAILURES (MTBF)

In many situations, a unit or system can be repaired immediately after breakdown. In such cases, the mean time between failures refers to the average time of breakdown until the device is beyond repair. This topic will be taken up again in Chapter 9 while discussing the concepts of availability and maintainability.

This aspect is illustrated by the next example.

Example 3-3 Table 3-4 gives the results of tests conducted under severe adverse conditions on 1000 safety valves. Columns (3) and (4) illustrate how failure density $f_d(t)$ and hazard rate $Z(t)$ are calculated when the time interval is four hours instead of one hour. The cumulative failures and the number of survivors are not given, but can be calculated easily.

TABLE 3-4

(1)             (2)         (3)                        (4)
Time interval   Number of   Failure density            Hazard rate
t (hours)       failures    f_d(t)                     Z(t)
 0
                  267       267/(1000 × 4) = 0.0668    267/(867 × 4) = 0.0770
 4
                   59       59/(1000 × 4) = 0.0148     59/(704 × 4) = 0.0210
 8
                   36       36/(1000 × 4) = 0.0090     36/(656 × 4) = 0.0137
12
                   24       24/(1000 × 4) = 0.0060     24/(626 × 4) = 0.0096
16
                   23       23/(1000 × 4) = 0.0058     23/(603 × 4) = 0.0095
20
                   11       11/(1000 × 4) = 0.0028     11/(586 × 4) = 0.0047
24

Note that the hazard rate is entered in between the time intervals. The average population during the first interval is [1000 + (1000 - 267)]/2 ≈ 867. Hence, the hazard rate during the first 4-hour interval is

Z = 267/(867 × 4) = 0.07699 ≈ 0.0770.

The average population during the second 4-hour interval is (733 + 674)/2 ≈ 704. Therefore, the hazard rate is

Z = 59/(704 × 4) = 0.02095 ≈ 0.0210.

The remaining values are obtained similarly.

Example 3-5 A hard plastic box designed to house a multimeter is tested for its impact strength by dropping it from a fixed height and observing for any damage. A total of 500 boxes were tested and the results are as tabulated here:

Number of drops            10   12   13   15   17   20   21   23   25
Number of boxes damaged    30   50   30  110   90  130   17   35    8

In this example, the criterion is the number of drops that a given box can withstand without damage and not time. The probability that a box will withstand at least 9 drops without damage is unity since the first damage occurred at the end of 10 drops. Also, the probability that a box can withstand 25 drops or more is zero. Further, the probability that a box can withstand at least 15 drops is obtained by counting the number of boxes surviving more than 15 drops and dividing this number by 500. Thus, the probability is

$$P(d \ge 15) = \frac{500 - (30 + 50 + 30 + 110)}{500} = \frac{280}{500} = 0.560.$$

The failure density for this case will be defined as the ratio of the number of boxes failing per incremental drop after a given number of drops, to the initial population. For instance, at the end of the 12th drop, the number of boxes failing per incremental drop (i.e., for the 13th drop) is 30, and the ratio of this number to the initial population of 500 gives the failure density for the 13th drop. Thus,

f_d(13) = 30/500 = 0.060.

However, the tabulation shows that failures do not occur after every incremental drop. The number of failures for the 13th drop is 30, but no failures are reported for the 14th drop, i.e., for the next incremental drop after the 13th drop. So, according to the definition, the failure density for the 14th drop is zero, and the failure density for the 15th drop would be f_d(15) = 110/500 = 0.220. Similarly, for the 16th, 18th, 19th, 22nd, and 24th drops, the failure densities are zero as no failures are reported at the end of each of these drops. The failure density for the 17th drop is

f_d(17) = 90/500 = 0.180.

The failure density values are given in column (3) of Table 3-8.
TABLE 3-8

(1)         (2)         (3)       (4)      (5)
Number of   Number of   Failure   Hazard   Reliability
drops       failures    density   rate
d           n           f_d       Z        R(d)
 0            0         0.000     0.000    1.000
10           30         0.060     0.060    0.940
12           50         0.100     0.106    0.840
13           30         0.060     0.071    0.780
15          110         0.220     0.282    0.560
17           90         0.180     0.321    0.380
20          130         0.260     0.684    0.120
21           17         0.034     0.283    0.086
23           35         0.070     0.814    0.016
25            8         0.016     1.000    0.000
          sum = 500

For the present case, the hazard rate Z can be defined as the rate at which failure occurs per incremental drop after a given number of impacts, assuming that no failure has occurred prior to that incremental drop. Thus, the hazard rate for the 13th drop (after the 12th drop) is the ratio of the number of failures occurring for the 13th drop (which is given as 30 in Table 3-8) to the population at the beginning of the 13th drop (which is 500 - 80 = 420). Hence,

$Z_{13}$ = 30/420 = 0.071.

As already noted, the failures do not occur after every incremental drop. Accordingly,

$Z_{14} = Z_{16} = Z_{18} = Z_{19} = Z_{22} = Z_{24} = 0$,

and since the number of survivors at the end of the 14th drop is 500 - 110 = 390,

$Z_{15}$ = 110/390 = 0.282.

Similarly, since at the end of the 18th and of the 19th drop the number of failures is zero, the number of survivors at the end of the 19th drop is 500 - 310 = 190, and hence

$Z_{20}$ = 130/190 = 0.684.

The values of Z have been entered in column (4) of Table 3-8. The values of reliability R(d) are given in column (5). These are the values of the probability that the box will survive the given number of drops. Thus, the probability that a box will survive a minimum of 15 drops is

R(15) = 1 - (total number of failures till end of 15 drops)/(initial population) = 1 - (30 + 50 + 30 + 110)/500 = 0.560.

As no failure is observed at the end of the 14th drop,

R(14) = R(13) = 1 - (30 + 50 + 30)/500 = 0.780.

As Eqs. (3-15) and (3-20) show, the hazard rate or the failure rate is well suited for continuous functions.
In the case of discrete functions (like the number of impacts, drops, etc.), if the test results show failures for every incremental step (like uniform incremental loads, impacts, hours, etc.), the calculation of failure rate is straightforward. This is also true for the calculation of failure density. If the test results do not give non-zero values for every incremental step, as in our example, then the calculations of failure density and failure rate as per definitions do not yield satisfactory curves. However, in such cases, a different approach can be adopted. Thus, instead of tabulating the test results for every incremental drop, we can choose a suitable class interval (like 2 consecutive drops or 5 consecutive drops) and tabulate the number of failures for each of these class intervals. In our example, choosing a class interval of three drops, we can calculate the number of failures, failure density, failure rate, and reliability for each of these class intervals as shown in Table 3-9.

In this case, the failure density will be defined as the ratio of the number of boxes failing per incremental class of drops after a given class of drops, to the initial population. For instance, after the class (12, 13, 14), the incremental class of drops is (15, 16, 17) and for this class, the number of failures is 200. The ratio of this to the initial population, i.e., 200/500 = 0.400, is the failure density for the class (15, 16, 17). Similarly, the hazard rate Z can be defined as the rate at which failure occurs per incremental class of drops after a given class of drops (or impacts), assuming no failure has occurred prior to that incremental class of drops. Thus, after the class (12, 13, 14), the number of survivors is 500 - 110 = 390, and the number of failures in the next incremental class, i.e., (15, 16, 17), is 200. Hence,

Z(15, 16, 17) = 200/390 = 0.513.

On the same lines, the probability that the box will survive at least a given number of class of drops is the reliability R(d).
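The class-interval approach described above can be sketched in Python as follows. The sketch is an illustration and not part of the original text; the function name and its parameters are assumptions. As in the worked values above, the hazard rate divides by the survivors entering each class, not by an average population.

```python
# Example 3-5 data: drop number -> boxes first damaged on that drop.
damaged_on_drop = {10: 30, 12: 50, 13: 30, 15: 110, 17: 90,
                   20: 130, 21: 17, 23: 35, 25: 8}
population = sum(damaged_on_drop.values())  # 500 boxes


def class_interval_table(first_drop, last_drop, width):
    """Group failures into classes of `width` consecutive drops."""
    rows = []
    survivors = population
    for start in range(first_drop, last_drop + 1, width):
        drops = tuple(range(start, start + width))
        failures = sum(damaged_on_drop.get(d, 0) for d in drops)
        density = failures / population  # relative to the initial population
        hazard = failures / survivors    # relative to survivors entering the class
        survivors -= failures
        reliability = survivors / population
        rows.append((drops, failures, density, hazard, reliability))
    return rows


table = class_interval_table(first_drop=9, last_drop=26, width=3)
for drops, n, f_d, z, r in table:
    print(f"{drops}: n = {n:3d}, f_d = {f_d:.3f}, Z = {z:.3f}, R = {r:.3f}")
```

Choosing a width of three drops starting at drop 9 yields six classes, none of which is empty, so every row has a non-zero failure density and hazard rate.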
For instance, the probability that a box will survive till the end of class (15, 16, 17) is

R(15, 16, 17) = 1 - (30 + 80 + 200)/500 = 1 - 310/500 = 0.380.

The values of $f_d$, Z, and R(d) are given in columns (3), (4), and (5), respectively, of Table 3-9.

TABLE 3-9

(1)          (2)         (3)       (4)      (5)
Number of    Number of   Failure   Hazard   Reliability
drops        failures    density   rate
d            n           f_d       Z        R(d)
9, 10, 11     30         0.060     0.060    0.940
12, 13, 14    80         0.160     0.170    0.780
15, 16, 17   200         0.400     0.513    0.380
18, 19, 20   130         0.260     0.684    0.120
21, 22, 23    52         0.104     0.867    0.016
24, 25, 26     8         0.016     1.000    0.000
           sum = 500

PROBLEMS

3-1 In a printed page, it is observed that the frequency with which different alphabets and spaces occur varies considerably. Consequently, it becomes irrational for a small letterpress printer to stock up the same number of all alphabets since some alphabets appear much more frequently than the others. Pick a printed page at random and tabulate the frequency with which each alphabet appears in it. Represent this graphically through a histogram.

3-2 In a survival test conducted on 100 cardboard boxes for their strength under impact loading, the following results were obtained:

Number of impacts         20   22   24   26   29   32   35   37   40
Number of boxes failed     7   10   15    4   15   13   13    8    5

For this case, how will you define failure density, failure rate, and reliability? Tabulate these quantities and represent them graphically.

3-3 A series of tests were conducted to determine the ...
