


The next 2 slides cover some of the terminology you will encounter in this topic.
Stochastic means involving or subject to probabilistic behaviour. That is, the
behaviour cannot be predicted exactly. A stochastic model is contrasted to a
deterministic model where the relationships between the variables affecting the
system behaviour are known and defined.
Component failures represented as a point process tell us when on a time line the
failure events occurred and for a single component the time between the failures
gives us the times to failure for the component. To model these times with a
probability distribution the failures must be independent and identically distributed.


System failures represented as a point process give us when on the time line the
failure events occurred but tell us nothing about the time to failure of the individual
components causing the system failure.
If a system with a large number of components is in a steady state it can be modelled
using a homogeneous Poisson process. Homogeneous in this context means that the
expected number of failures in any interval of a given length is constant and the
distribution of the time between failures is exponential.


The close conjunction of failures might be a chance phenomenon but the
independence of failures occurring in quick succession needs to be ascertained to gain a
better appreciation of the system or component reliability.
Independent means the failures are not related.
One failure does not contribute to the other and they do not have a common cause.
I have given some examples here of situations where failures may not be independent.

ENGG960 Peter Gordon UOW


Identically distributed means that all components drawn from the population have the same probability of surviving any given period. This must hold for components concurrently in operation or components placed in operation sequentially over a period of time. I have listed some situations in which components' lives may not be identically distributed. Note that although it is not helpful for the analysis we are doing, in most situations we would like components to become increasingly reliable. In fact, as a maintenance engineer you probably spend a lot of your time trying to establish positive trends in component and system reliability.


This slide shows a diagrammatic representation of a stochastic point process for a system that comprises 3 components. The failure of any of the components causes a system failure. If the component failures are independent renewal processes the point process for the system is termed a superimposed renewal process. The average rate of occurrence of failure (ROCOF) for each component is the inverse of that component’s mean life (MTTF). The average rate of occurrence of failure for the system is the sum of the average ROCOF of the components.
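The relationship between component mean lives and the average system ROCOF can be sketched in a few lines of Python. The three mean lives below are hypothetical values chosen for illustration; they are not taken from the slide.

```python
def system_rocof(component_mttfs):
    """Average ROCOF of a series system of independent renewal processes.

    Each component contributes 1 / MTTF failures per unit time, and the
    system rate is the sum of the component rates.
    """
    return sum(1.0 / mttf for mttf in component_mttfs)


# Hypothetical mean lives (days) for a 3-component system
mttfs = [100.0, 200.0, 400.0]
rate = system_rocof(mttfs)                      # average system failures per day
mean_time_between_system_failures = 1.0 / rate  # days
```

This is a long-run average only. It says nothing about how the ROCOF behaves while the system is still new.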


The data for the plot was derived from simulation of a system with 1000 components in series. The components fail by wear out mechanisms with the individual component lives normally distributed. The system commences operation with all components in new condition. Components are replaced on failure. The system is not preventively maintained. The plot shows that the ROCOF increases from zero to reach a constant average rate after about 150 days. The apparent initial deteriorating trend is characteristic of a superimposed renewal system operated from new and is not indicative of reliability problems with the system components.
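A minimal simulation sketch of this kind of system is given below. The mean life, standard deviation, observation horizon and bin width are assumptions made for illustration. The parameters behind the plot are not stated in these notes, so the sketch will not reproduce the 150 day figure.

```python
import random


def simulate_system_failures(n_components, mean_life, sd_life, horizon, seed=1):
    """Superimposed renewal process: every component starts new and is
    replaced on failure. Lives are normally distributed (assumed values)."""
    rng = random.Random(seed)
    failures = []
    for _ in range(n_components):
        t = 0.0
        while True:
            life = rng.gauss(mean_life, sd_life)
            if life <= 0.0:
                continue  # discard non-physical negative lives and redraw
            t += life
            if t > horizon:
                break
            failures.append(t)
    return sorted(failures)


def rocof_by_bin(failures, horizon, bin_width):
    """Average ROCOF (failures per unit time) in consecutive time bins."""
    n_bins = int(horizon / bin_width)
    counts = [0] * n_bins
    for t in failures:
        counts[min(int(t / bin_width), n_bins - 1)] += 1
    return [c / bin_width for c in counts]


# Hypothetical inputs: 1000 components, mean life 100 days, SD 30 days
failures = simulate_system_failures(1000, 100.0, 30.0, 1000.0)
rocof = rocof_by_bin(failures, 1000.0, 25.0)
```

With these inputs the first bins show a ROCOF close to zero, and the later bins settle around the steady-state value of 1000 / 100 = 10 failures per day.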

This plot magnifies the apparent initial deteriorating trend from the previous plot by showing the first 300 failures only. It can be seen more clearly here that the average ROCOF in this time period is not constant. The regression line of best fit shown is a power function of the operating time. The function for the ROCOF is obtained by differentiating the power function for the cumulative number of failures. The failure arrivals form a non-homogeneous Poisson process and with this form of ROCOF (intensity) function the process is often referred to as a power law process. If there was no trend we would expect the average ROCOF to be constant, i.e. the power index would be 1. (The line of best fit would have to be a straight line.) Clearly in fitting a power function to the plotted data we will never get an index of exactly 1. To test whether the non-homogeneous Poisson process is in fact the most appropriate model we can use the AMSAA model. The application of this trend test is described in Ebeling (Ch 16.5).
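The differentiation step can be written out as follows. The index b is the power index referred to in these notes. The scale constant a and the symbols N(t) and ρ(t) are notation assumed here for illustration.

```latex
% Fitted power function for the cumulative number of failures
N(t) = a\,t^{b}

% ROCOF (intensity) obtained by differentiating with respect to operating time
\rho(t) = \frac{dN(t)}{dt} = a\,b\,t^{\,b-1}

% b = 1 gives a constant ROCOF (no trend); b > 1 gives an increasing ROCOF
% and b < 1 a decreasing ROCOF
\rho(t)\big|_{b=1} = a
```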


Note using the MLE for b gives a value of b = 3.29. This does not change the conclusion.


This is similar to the diagram given in O’Connor. It shows a point process for a single component. The variable capital Xi gives the individual times to failure. The variable lower case xi gives the cumulative time to each failure from the start of observation. The observation interval is designated x0. In this example observation continues beyond the last observed failure. The test compares the mean of the cumulative times to failure for the sample data with the mean that would be obtained if the times to failure were drawn from the same population (identically distributed).

If there is no trend the average of the cumulative times to failure will equal half the period of observation and the Std Dev will be given by the period of observation multiplied by the square root of 1 upon 12n, where n is the number of failures. *We know from the Central Limit Theorem that if Sn is the sum of n mutually independent random variables, then the distribution function of Sn is well approximated by the normal distribution.* The statistic U is a standard normal variable.

If U is positive it means that more failures were observed in the second half of the period of observation and the times to failure are decreasing. This would be a bad trend, i.e. it could indicate the component is becoming less reliable. If U is negative it means that more failures were observed in the first half of the period of observation and the times to failure are increasing = good trend.

However the indicated trend may not be statistically significant. Even in the situation where there is no trend the average of the cumulative times to failure for any sample will show variation from the population mean. We know from the normal distribution that there is a 15.87% chance that the variable (in our case the average of the cumulative times to failure) will have a value that is more than 1 standard deviation above the mean (U>1). This means that if we concluded from the fact that U = 1 that the data was trending we would be making an error 15.87% of the time.
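The statistic described above can be sketched in Python for the case where observation continues beyond the last failure. The function name and the sample times to failure are hypothetical. The standard deviation is taken as the period of observation multiplied by the square root of 1/(12n).

```python
import math
from itertools import accumulate


def laplace_statistic(cumulative_times, x0):
    """Laplace statistic U for a single process observed over (0, x0],
    where observation continues beyond the last failure."""
    n = len(cumulative_times)
    sample_mean = sum(cumulative_times) / n
    expected_mean = x0 / 2.0                       # mean under no trend
    std_dev = x0 * math.sqrt(1.0 / (12.0 * n))     # std dev under no trend
    return (sample_mean - expected_mean) / std_dev


# Hypothetical times to failure (capital Xi) that shorten over time
times_to_failure = [60, 10, 10, 10, 5]
cumulative = list(accumulate(times_to_failure))    # lower case xi
u = laplace_statistic(cumulative, x0=100)
```

Here the failures bunch up late in the observation period, so U comes out positive, which is the deteriorating case described above.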

The probability we are making an error in rejecting the null hypothesis is termed the level of significance. In our case the null hypothesis is that the data is not trending. 15.87% would normally be considered too large for a level of significance. A value of U of 1.645 or -1.645 gives a 5% level of significance. In most cases this would be an acceptable level at which to reject the null hypothesis.
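The two significance levels quoted can be checked with the standard normal distribution from the Python standard library. This is a quick illustrative check; the function name is hypothetical.

```python
from statistics import NormalDist


def one_sided_significance(u):
    """Probability that a standard normal variable lies beyond |u| in one tail."""
    return 1.0 - NormalDist().cdf(abs(u))


p_at_1 = one_sided_significance(1.0)        # about 15.87%
p_at_1645 = one_sided_significance(1.645)   # about 5%
```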

If the period of observation is stopped at a failure the statistic is calculated in the manner shown.
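The slide formula is not reproduced in these notes. A common form of the statistic when observation stops at the n-th failure, consistent with the worked example later in this topic, is sketched below. The last failure defines the period of observation x0 and is left out of the average.

```latex
% Observation stops at the n-th failure, so x_0 = x_n
U = \frac{\dfrac{1}{n-1}\displaystyle\sum_{i=1}^{n-1} x_i \;-\; \dfrac{x_0}{2}}
         {x_0 \sqrt{\dfrac{1}{12\,(n-1)}}}
```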

The test was shown at the 5% significance level (U>1.645). The probability of getting a value of the test statistic of 15.89 is however effectively zero.

In a fleet situation we may record the times between failures for each individual system but if our approach to managing and maintaining each system is the same we will be interested in the overall picture of trends for the fleet. In this situation we can use a Laplace test for the pooled data.
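One commonly quoted form of the pooled statistic sums the cumulative failure times over all systems and compares the total with its expected value under no trend. The sketch below assumes each system is observed for its own fixed period. The function name and fleet data are hypothetical, and the formula should be checked against the course reference before use.

```python
import math


def pooled_laplace(systems):
    """Pooled Laplace statistic for a fleet.

    systems: list of (cumulative_failure_times, observation_period) pairs,
    one pair per system. Assumes time-truncated observation of each system.
    """
    total = sum(sum(times) for times, _ in systems)
    expected = sum(len(times) * x0 / 2.0 for times, x0 in systems)
    variance = sum(len(times) * x0 ** 2 / 12.0 for times, x0 in systems)
    return (total - expected) / math.sqrt(variance)


# Hypothetical fleet of two systems with failures clustered late in each period
fleet = [([80, 90], 100), ([350, 390], 400)]
u_fleet = pooled_laplace(fleet)
```

As with the single-system test, a positive value points towards a deteriorating trend for the fleet as a whole.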


This method uses the data from all the samples to calculate the average number of repairs that could be expected to have occurred by a given mileage. This plot is described as the Nelson-Aalen plot in most references and the mean cumulative function is also referred to as the cumulative intensity function. If the slope of the plot is increasing with age the population rate of occurrence of failure is increasing. (Sad trend)
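A minimal sketch of how a mean cumulative function can be estimated is given below. At each repair the estimate increases by one divided by the number of units still under observation at that age. The function name and the mileage data are hypothetical and are not taken from the slide.

```python
def mean_cumulative_function(repair_ages, end_ages):
    """Nonparametric estimate of the mean cumulative number of repairs.

    repair_ages: one list of repair ages (e.g. mileages) per unit.
    end_ages: the age at which observation of each unit ended.
    Returns a list of (age, mean cumulative repairs) points for plotting.
    """
    events = sorted(age for unit in repair_ages for age in unit)
    points = []
    total = 0.0
    for age in events:
        # Units still being observed at this age share the repair between them
        at_risk = sum(1 for end in end_ages if end >= age)
        total += 1.0 / at_risk
        points.append((age, total))
    return points


# Hypothetical data: two vehicles, ages in thousands of miles
mcf = mean_cumulative_function([[10, 30], [20]], end_ages=[40, 25])
```

Plotting the returned points against age gives the curve whose slope is read as the population ROCOF.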

O’Connor explains that “exploratory data analysis is a simple graphical technique for searching for connections between time series data and explanatory factors. In the reliability context, the failure data are plotted on a time line, along with other information. For example, overhaul intervals, seasonal changes, or different operating patterns can be shown on the chart.” This diagram from O’Connor depicts point processes for 6 sub-systems and a superimposed point process for the overall system. The system is overhauled every 1000 hours and from visual examination you can see failures clustered after each overhaul, indicating the overhaul is actually adversely affecting reliability, and also failures paired in a number of places for the sub-systems. This could indicate that the failures are not independent of each other. A pattern like this would warrant further investigation of the nature and cause of the failures.

This diagram is reproduced from O’Connor’s text. You are invited to determine whether the data-sets are trending from visual examination of the point process for each component and for the superimposed process for the system. O’Connor shows the value of the Laplace statistic for each of the parts comprising the system and also for the superimposed system at the bottom. Compare this with O’Connor’s assessment based on the U value.


This is a diagrammatic representation of a stochastic point process for a pump that is replaced on failure. The pump has been replaced 5 times since first installation. The crosses indicate the times the failures occur measured from the time of the original installation. The times to failure are shown in the table in chronological sequence. In this type of situation it could take many years to gather failure data for a component.


The Laplace test is applicable where the times between failure are exponentially distributed. The coefficient of variation is the standard deviation divided by the mean. For the exponential distribution the coefficient of variation is equal to one. For the normal distribution or other distributions associated with wear out failure mechanisms the coefficient of variation will be less than 1. These distributions are said to be underdispersed with respect to the exponential. The Laplace test applied to data from a wear-out failure mechanism will give a result biased towards accepting the null hypothesis. The Lewis Robinson test is more generally applicable and is calculated by dividing the Laplace statistic by the coefficient of variation for the sample.

The coefficient of variation of a sample gives an indication of the type of failure mechanism causing the component to fail. If it is clearly less than 1 this indicates that the data is under-dispersed with respect to the exponential distribution and we are dealing with a wear-out failure mechanism.
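A sketch of the Lewis-Robinson calculation follows, assuming observation stops at the last failure and that the sample standard deviation uses an n-1 divisor. The function name and the times to failure are hypothetical.

```python
import math
from itertools import accumulate
from statistics import mean, stdev


def lewis_robinson(times_to_failure):
    """Return (Laplace U, coefficient of variation, Lewis-Robinson statistic).

    Assumes observation stops at the last failure, so that failure defines
    the period of observation x0 and is excluded from the average.
    """
    cumulative = list(accumulate(times_to_failure))
    x0 = cumulative[-1]
    m = len(cumulative) - 1
    laplace = (sum(cumulative[:-1]) / m - x0 / 2.0) / (
        x0 * math.sqrt(1.0 / (12.0 * m))
    )
    cv = stdev(times_to_failure) / mean(times_to_failure)
    return laplace, cv, laplace / cv


# Hypothetical wear-out data with steadily lengthening lives
u, cv, lr = lewis_robinson([40, 45, 50, 55, 60, 65, 70, 75])
```

With these tightly banded lives the coefficient of variation is well below 1. The Laplace statistic alone stays inside ±1.645 while the Lewis-Robinson statistic falls outside it, which illustrates the bias described above.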

Although you have the answer I suggest you attempt this calculation yourself.

Sample avg of cumulative times to failure: $\sum_{i=1}^{n-1} x_i / (n-1) = 358$
Mean: $x_0/2 = 798/2 = 399$
SD: $x_0 \sqrt{1/(12(n-1))} = 798 \times \sqrt{1/(12 \times 11)} = 69.46$
Cv: $\sqrt{\sum_i (X_i - \bar{X})^2 / (n-1)} \,/\, \bar{X} = 0.216$

We would reject the null hypothesis and conclude the data was trending. The level of significance is 0.29%. This means that if we conclude the data is trending we will be wrong 0.29% of the time. This is well within the 5% level of significance normally used to test such an hypothesis.
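The arithmetic can be checked from the summary figures quoted in these notes (average 358, x0 = 798, n-1 = 11, Cv = 0.216). The variable names are illustrative. Because the inputs are rounded, the level of significance comes out at roughly 0.3% rather than exactly the 0.29% quoted.

```python
import math
from statistics import NormalDist

x0 = 798.0          # period of observation, ending at the last failure
m = 11              # n - 1 cumulative times used in the average
sample_avg = 358.0  # average of the cumulative times to failure
cv = 0.216          # coefficient of variation of the times to failure

sd = x0 * math.sqrt(1.0 / (12.0 * m))
laplace_u = (sample_avg - x0 / 2.0) / sd
lewis_robinson_u = laplace_u / cv

# One-sided level of significance for the Lewis-Robinson statistic
p_one_sided = NormalDist().cdf(-abs(lewis_robinson_u))
```

The statistic is negative, which under the sign convention used earlier corresponds to times to failure that are increasing. The Laplace statistic on its own is well inside ±1.645.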


This is a time series depiction of the data for our example. A time series depiction of the times to failure provides another way to check for trend. With no trend the linear regression line would be horizontal. A statistical test can be applied to test the significance of the slope of the regression line. This test is described in the E-reading reference (Walpole and Myers). It gives a t-test value of 4.18, with (n-2) degrees of freedom. There is 0.09% probability of getting this value if the population regression slope was 0, suggesting strong evidence that the data is trending. This result is in line with that from the Lewis-Robinson test (0.29%).

Note that the MS Excel data analysis function “Regression” calculates the t-test value. A section of the Excel output table showing the t Stat for the slope is pasted over the chart on this slide. The P-value given by Excel is for a 2 tail t-test. The P-value for this test is half the P-value shown in the excel table (i.e. 0.0018 / 2).
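The t statistic for the slope can also be computed directly, without Excel. The sketch below regresses the times to failure on their sequence number. The function name and data are hypothetical, and the P-value still has to be read from a t table (or a statistics package) using n-2 degrees of freedom.

```python
import math


def slope_t_statistic(times_to_failure):
    """t statistic for the slope of times to failure regressed on failure
    sequence number (1, 2, ..., n). Returns (t, degrees_of_freedom)."""
    n = len(times_to_failure)
    xs = list(range(1, n + 1))
    x_bar = sum(xs) / n
    y_bar = sum(times_to_failure) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, times_to_failure))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(xs, times_to_failure))
    std_err = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    return slope / std_err, n - 2


# Hypothetical times to failure in chronological sequence
t_stat, dof = slope_t_statistic([2, 4, 5, 4, 5])
```

A positive t value corresponds to times to failure that lengthen with sequence number, and a negative value to times that shorten.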


Another situation that you may encounter quite frequently is where a number of identical components are in operation concurrently. In this situation more renewal data will be available but we must be confident that the loads on the components are similar.

A fleet situation is similar to the previous case of many components in one system. In a fleet we are looking at one or more identical components in concurrent operation in many systems. This type of situation could generate lots of renewal records.


When we test for trend we are testing to see whether the times to failure are dependent on the order in which they occur. We test for trend when the data is in chronological sequence. If we have a point process for a single component the sequence defined by installation date is the same as that defined by replacement date. If however we have “identical” components sourced from the same supplier that are operating concurrently we get a different chronological sequence defined by the installation dates from that defined by the replacement dates. These are shown on the next slide.
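The difference between the two sequences can be illustrated with a small Python sketch. The renewal records below are invented for two positions, A and B, operating concurrently; they are not the data from the slides.

```python
from datetime import date

# Hypothetical renewal records for components in two concurrent positions
records = [
    {"position": "A", "installed": date(2020, 1, 1), "replaced": date(2020, 9, 1)},
    {"position": "B", "installed": date(2020, 2, 1), "replaced": date(2020, 7, 1)},
    {"position": "A", "installed": date(2020, 9, 1), "replaced": date(2021, 6, 1)},
    {"position": "B", "installed": date(2020, 7, 1), "replaced": date(2021, 3, 1)},
]


def ages_in_sequence(records, key):
    """Ages at renewal (days), ordered chronologically by the chosen date."""
    ordered = sorted(records, key=lambda r: r[key])
    return [(r["replaced"] - r["installed"]).days for r in ordered]


by_installation = ages_in_sequence(records, "installed")
by_replacement = ages_in_sequence(records, "replaced")
```

The same ages at renewal appear in both lists but in a different order, so a trend test applied to one sequence need not give the same result as a test applied to the other.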

In the example above the times to failure are reasonably tightly banded and consequently the sequences are very nearly the same. If we sequence the ages at renewal based on the installation date we are in effect looking to see whether the times to failure are dependent on the date the components were supplied or installed. It is not so obvious how the times to failure could be dependent on the date the components were replaced as this is after the event.

There are a number of things that occur in supply, repair or installation of a component that could affect the life that it achieves. Variability in repair and installation processes is reflected in the variance of the times to failure. If these factors are random the mean life will not be affected. If over a period of time however, there are systemic changes to methods or standards related to supply, repair or installation the mean life is likely to show an increasing or decreasing trend.

In grouping data in this manner for multiple single component renewal processes we are essentially analysing the data as a single sequence originating from a supplier or a repairer. The linear regression trend test can be used to test for trend in the sequence and also the Lewis-Robinson test can be applied to the combined sequence as if it were a single process.

The P-Value is half the value (0.37) shown in the excel table for a 2 tailed t-test.


The LR test confirms that the apparent deteriorating trend is not statistically significant. This result is very close to that from the linear regression trend test (P-value = 18.5%). Note the P-value is the lowest level of significance at which the observed value of the test statistic is significant.


From the statistical test the evidence of trend is borderline with a significance level in rejecting the null hypothesis of 4.99%.

The Lewis-Robinson test provides marginal evidence of trend. The reliability is showing a deteriorating trend but the statistical significance of the trend is borderline and would warrant continued observation of the system. Note that had we applied the Laplace test we would have accepted the null hypothesis of no trend.
