CprE 545: Fault Tolerant Systems (G. Manimaran)


Definitions
- Reliability of a system is the probability that the system will perform its required function under specified conditions for a specified period of time.
- MTBF (Mean Time Between Failures): the average time a system will run between failures. The MTBF is usually expressed in hours. This metric is more useful to the user than the reliability measure.

Approaches to increase the reliability of a system
1. Increasing the reliability of the system itself:
   1. Worst-case design
   2. Using high-quality components
   3. Strict quality-control procedures
2. Redundancy:
   - Typically employed
   - Less expensive

Reliability expressions
Exponential failure law: the reliability of a system is often modeled as
- $R(t) = e^{-\lambda t}$, where $\lambda$ is the failure rate, expressed as percentage failures per 1000 hours or as failures per hour.
- When the product $\lambda t$ is small, $R(t) \approx 1 - \lambda t$.
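A minimal Python sketch comparing the exact exponential law with its small-$\lambda t$ approximation. The failure rate and mission times are illustrative values, not taken from the slides; $\lambda$ and $t$ must use the same time unit.

```python
import math

def reliability_exact(lam, t):
    """Exponential failure law R(t) = exp(-lam * t); lam and t in consistent units."""
    return math.exp(-lam * t)

def reliability_approx(lam, t):
    """First-order approximation R(t) ~ 1 - lam * t, valid only when lam * t is small."""
    return 1.0 - lam * t

lam = 1e-4  # illustrative failure rate, failures per hour
for t in (10, 100, 1000, 10000):  # hours
    exact = reliability_exact(lam, t)
    approx = reliability_approx(lam, t)
    print(f"t={t:>6} h  lam*t={lam * t:<6g} exact={exact:.6f} approx={approx:.6f}")
```

The two columns agree closely while $\lambda t$ is around 0.01 or less and drift apart as $\lambda t$ approaches 1.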

Relation between MTBF and the failure rate
MTBF is the average time a system will run between failures and is given by:
- $\mathrm{MTBF} = \int_0^\infty R(t)\,dt = \int_0^\infty e^{-\lambda t}\,dt = 1/\lambda$
- In other words, the MTBF of a system is the reciprocal of its failure rate.
- If $\lambda$ is the number of failures per hour, the MTBF is expressed in hours.
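As a numerical check of $\mathrm{MTBF} = \int_0^\infty R(t)\,dt = 1/\lambda$, the sketch below integrates the exponential reliability curve with the trapezoidal rule. Truncating the integral at $50/\lambda$ and using 200,000 steps are assumptions chosen so the result matches $1/\lambda$ to about six significant digits.

```python
import math

def mtbf_numeric(lam, t_max_factor=50.0, steps=200_000):
    """Approximate MTBF = integral of R(t) from 0 to infinity with the trapezoidal rule.

    The upper limit is truncated at t_max_factor / lam; beyond that exp(-lam*t) is negligible.
    """
    t_max = t_max_factor / lam
    h = t_max / steps
    total = 0.5 * (1.0 + math.exp(-lam * t_max))  # half weights for R(0) and R(t_max)
    for k in range(1, steps):
        total += math.exp(-lam * k * h)
    return total * h

lam = 8e-4  # failures per hour
print(mtbf_numeric(lam), 1 / lam)
```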

A simple example
A system has 4000 components, each with a failure rate of 0.02% per 1000 hours. Calculate $\lambda$ and the MTBF.
- $\lambda = (0.02/100) \times (1/1000) \times 4000 = 8 \times 10^{-4}$ failures/hour
- $\mathrm{MTBF} = 1/(8 \times 10^{-4}) = 1250$ hours
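The arithmetic above can be reproduced in a few lines. The sketch assumes, as the example's calculation implies, that the system fails when any one component fails, so the component failure rates add; the helper name `system_failure_rate` is illustrative.

```python
def system_failure_rate(n_components, pct_per_1000h):
    """Convert a per-component rate in % per 1000 hours to a system rate in failures/hour."""
    per_component_per_hour = (pct_per_1000h / 100) / 1000
    return n_components * per_component_per_hour  # rates add when any failure stops the system

lam = system_failure_rate(4000, 0.02)
mtbf = 1 / lam
print(f"lambda = {lam:.1e} failures/hour, MTBF = {mtbf:.0f} hours")
```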

Relation between reliability and MTBF
- For small $\lambda t$: $R(t) \approx 1 - \lambda t = 1 - t/\mathrm{MTBF}$
- Therefore, $\mathrm{MTBF} \approx t/(1 - R(t))$
[Figure: reliability R(t), from 0 to 1.0, plotted against time t, with 1 MTBF and 2 MTBF marked on the time axis and the level 0.36 marked on the R(t) axis.]
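To see why the linear form holds only for small $\lambda t$, a short sketch evaluates the exact law at multiples of the MTBF (taking $\lambda = 1/\mathrm{MTBF}$). At $t = \mathrm{MTBF}$ the exact reliability is $e^{-1} \approx 0.368$, while the linear formula already predicts zero. Clipping the linear value at zero is an assumption made so it stays a valid probability.

```python
import math

def r_exact(t, mtbf):
    """Exact exponential reliability, using lambda = 1 / MTBF."""
    return math.exp(-t / mtbf)

def r_linear(t, mtbf):
    """Linear approximation 1 - t/MTBF, clipped at zero since reliability cannot be negative."""
    return max(0.0, 1.0 - t / mtbf)

mtbf = 1250.0  # hours, reused from the simple example
for k in (0.01, 0.1, 0.5, 1.0, 2.0):
    t = k * mtbf
    print(f"t = {k:>4} MTBF: exact = {r_exact(t, mtbf):.3f}, linear = {r_linear(t, mtbf):.3f}")
```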

An example
A first-generation computer contains 10000 components, each with $\lambda = 0.5\%$ per 1000 hours. What is the period of 99% reliability?
- $\mathrm{MTBF} \approx t/(1 - R(t)) = t/(1 - 0.99)$, so $t = \mathrm{MTBF} \times 0.01 = 0.01/\lambda_{av}$
- where $\lambda_{av}$ is the average failure rate
- $N$ = number of components = 10000
- $\lambda$ = failure rate of a component = $0.5\%$ per 1000 hours = $0.005/1000 = 5 \times 10^{-6}$ per hour
- Therefore, $\lambda_{av} = N\lambda = 10000 \times 5 \times 10^{-6} = 5 \times 10^{-2}$ per hour
- Therefore, $t = 0.01/(5 \times 10^{-2}) = 0.2$ hours $= 12$ minutes
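The same calculation as a sketch, alongside an exact answer from $R(t) = e^{-\lambda_{av} t}$, i.e. $t = -\ln R / \lambda_{av}$. The exact variant is an addition for comparison, not part of the slide; both function names are hypothetical.

```python
import math

def period_of_reliability(target_r, n_components, lam_component):
    """Mission time with R(t) = target_r, using the linear approximation t = (1 - R) / lambda_av."""
    lam_av = n_components * lam_component  # system failure rate, failures/hour
    return (1 - target_r) / lam_av

def period_exact(target_r, n_components, lam_component):
    """Same question answered with the exact law: t = -ln(R) / lambda_av."""
    return -math.log(target_r) / (n_components * lam_component)

lam_c = 0.005 / 1000  # 0.5% per 1000 hours, converted to failures per hour
t_lin = period_of_reliability(0.99, 10_000, lam_c)
t_exact = period_exact(0.99, 10_000, lam_c)
print(f"linear: {t_lin * 60:.2f} min, exact: {t_exact * 60:.2f} min")
```

For a 99% target the two answers differ by well under a minute, which is why the linear form is adequate here.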

Reliability for different configurations
1. Series configuration: components 1 to N, each with reliability R, connected one after another.
   - Overall reliability: $R_o = R \times R \times \cdots \times R = R^N$
2. Parallel configuration: components 1 to N, each with reliability R, connected side by side.
   - $R_o = 1 - (\text{probability that all of the components fail})$
   - $R_o = 1 - (1 - R)^N$
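A minimal sketch of both formulas, assuming identical components that fail independently, as the formulas themselves do.

```python
def series(r, n):
    """Series system of n identical components: all must work, so Ro = r**n."""
    return r ** n

def parallel(r, n):
    """Parallel system of n identical components: fails only if all fail, so Ro = 1 - (1 - r)**n."""
    return 1 - (1 - r) ** n

for n in (1, 2, 4, 10):
    print(f"N={n:>2}: series={series(0.9, n):.4f}, parallel={parallel(0.9, n):.6f}")
```

Adding components lowers series reliability and raises parallel reliability, which is the basis for using redundancy.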

Reliability for different configurations
3. Hybrid configuration
   [Figure: hybrid series/parallel arrangement of components, each with reliability R, labeled 1 to N and 1 to M.]
   - Overall reliability: $R_o$ = ?
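The slide leaves $R_o$ as an exercise, and the exact wiring cannot be recovered from the diagram labels. As an illustration only, the sketch assumes the two most common hybrid arrangements of identical, independent components: M parallel branches of N components in series (series-parallel), and N series stages of M components in parallel (parallel-series). Neither is claimed to be the slide's intended system.

```python
def series_parallel(r, n, m):
    """Assumed arrangement: m parallel branches, each branch n components in series."""
    branch = r ** n  # a branch works only if all n of its components work
    return 1 - (1 - branch) ** m  # the system fails only if every branch fails

def parallel_series(r, n, m):
    """Assumed arrangement: n stages in series, each stage m components in parallel."""
    stage = 1 - (1 - r) ** m  # a stage works if at least one of its m components works
    return stage ** n  # every stage must work

r, n, m = 0.9, 3, 2
print(f"series-parallel: {series_parallel(r, n, m):.4f}")
print(f"parallel-series: {parallel_series(r, n, m):.4f}")
```

With the same N x M components, the parallel-series (component-level redundancy) arrangement is the more reliable of the two.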

Reliability for different configurations
4. Triple Modular Redundancy (TMR): three modules, each with reliability R, feed a voter; the system works if at least two of the three modules work.
   - Overall reliability: $R_o = \binom{3}{2} R^2 (1 - R) + R^3 = 3R^2 - 2R^3$
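A sketch of the TMR formula, assuming a perfect voter and independent module failures (the formula itself makes both assumptions). Comparing against a single module (simplex) shows that TMR helps only when $R > 0.5$.

```python
from math import comb

def tmr(r):
    """TMR with a perfect (assumed fault-free) voter: at least 2 of 3 modules must work."""
    return comb(3, 2) * r**2 * (1 - r) + r**3

for r in (0.3, 0.5, 0.7, 0.9, 0.99):
    print(f"R = {r:.2f}: simplex = {r:.4f}, TMR = {tmr(r):.4f}")
```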

Reliability calculation: a more complicated example
[Figure: system built from components A, B, C, D, E and F.]
Decompose the system on component C:
- Assuming C is faulty: subsystem S1. $R_{S1}$ can be calculated using the parallel-series formulae.
- Assuming C is fault free: subsystem S2, which needs further reduction.
- $R = R_C R_{S2} + (1 - R_C) R_{S1}$

[Figure: subsystem S2 reduced by conditioning on component E.]
- Assuming E is fault free: subsystem S3
- Assuming E is faulty: subsystem S4
- $R_{S2} = R_E R_{S3} + (1 - R_E) R_{S4}$
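The system diagrams cannot be recovered, so the sketch below substitutes a hypothetical five-component bridge network (A, B, C, D, E) for the slide's system. It implements the same conditioning step, $R = R_C R(\text{system} \mid C \text{ up}) + (1 - R_C) R(\text{system} \mid C \text{ down})$, and checks it against brute-force enumeration of all component states, assuming independent failures. The names `reliability`, `factor` and `bridge_works` are illustrative.

```python
from itertools import product

def reliability(works, rel):
    """Exact reliability by enumerating every up/down state of the components.

    works: structure function; takes {name: bool} and returns True if the system works.
    rel:   {name: reliability}; component failures are assumed independent.
    """
    names = list(rel)
    total = 0.0
    for states in product((True, False), repeat=len(names)):
        state = dict(zip(names, states))
        if works(state):
            p = 1.0
            for name in names:
                p *= rel[name] if state[name] else 1.0 - rel[name]
            total += p
    return total

def factor(works, rel, pivot):
    """Condition on one component: R = R_p * R(sys | p up) + (1 - R_p) * R(sys | p down)."""
    others = {n: r for n, r in rel.items() if n != pivot}
    r_up = reliability(lambda s: works({**s, pivot: True}), others)
    r_down = reliability(lambda s: works({**s, pivot: False}), others)
    return rel[pivot] * r_up + (1.0 - rel[pivot]) * r_down

# Hypothetical bridge network (not the slide's system): A and B leave the input,
# D and E reach the output, and C bridges the two paths.
def bridge_works(s):
    return ((s["A"] and s["D"]) or (s["B"] and s["E"])
            or (s["A"] and s["C"] and s["E"]) or (s["B"] and s["C"] and s["D"]))

rel = {name: 0.9 for name in "ABCDE"}
print(f"by conditioning on C: {factor(bridge_works, rel, 'C'):.5f}")
print(f"by enumeration:       {reliability(bridge_works, rel):.5f}")
```

Conditioning on C turns the bridge into two pure series/parallel subsystems, (A or B) and (D or E) when C works, and (A and D) or (B and E) when it fails, which mirrors how S1 and S2 are handled above.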

Maintainability
Maintainability of a system is the probability of isolating and repairing a "fault" in the system within a given time. Maintainability is given by:
- $M(t) = 1 - e^{-\mu t}$
- where $\mu$ is the repair rate
- and $t$ is the permissible time constraint for the maintenance action
- $\mu = 1/(\text{Mean Time To Repair}) = 1/\mathrm{MTTR}$
- $M(t) = 1 - e^{-t/\mathrm{MTTR}}$
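A short sketch of $M(t)$, plus its inverse, which answers "how much repair time must be allowed to reach a target maintainability?". The MTTR value is illustrative, and the inverse helper is an addition derived from the same formula, $t = -\mathrm{MTTR}\,\ln(1 - M)$.

```python
import math

def maintainability(t, mttr):
    """M(t) = 1 - exp(-mu * t) with repair rate mu = 1 / MTTR."""
    mu = 1.0 / mttr
    return 1.0 - math.exp(-mu * t)

def time_for_maintainability(target_m, mttr):
    """Permissible repair time needed to reach target_m: t = -MTTR * ln(1 - M)."""
    return -mttr * math.log(1.0 - target_m)

mttr = 2.0  # hours, an illustrative value
for t in (0.5, 1.0, 2.0, 4.0, 6.0):
    print(f"t = {t} h: M(t) = {maintainability(t, mttr):.3f}")
print(f"time for M = 0.9: {time_for_maintainability(0.9, mttr):.2f} h")
```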

Availability
Availability of a system is the probability that the system will be functioning according to expectations at any time during its scheduled working period.
- $\text{Availability} = \text{Up-time} / (\text{Up-time} + \text{Down-time})$
- $\text{Down-time} = \text{No. of failures} \times \mathrm{MTTR}$
- Since $\text{No. of failures} = \text{Up-time} \times \lambda$: $\text{Down-time} = \text{Up-time} \times \lambda \times \mathrm{MTTR}$
- Therefore, $\text{Availability} = \text{Up-time} / (\text{Up-time} + \text{Up-time} \times \lambda \times \mathrm{MTTR}) = 1/(1 + \lambda \cdot \mathrm{MTTR})$
- With $\lambda = 1/\mathrm{MTBF}$: $\text{Availability} = \mathrm{MTBF}/(\mathrm{MTBF} + \mathrm{MTTR})$
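A sketch confirming that the two availability expressions agree when $\lambda = 1/\mathrm{MTBF}$. The MTBF reuses the 1250 hours from the simple example; the 10-hour MTTR is a hypothetical value, not from the slides.

```python
def availability_from_rates(lam, mttr):
    """A = 1 / (1 + lambda * MTTR), with lambda in failures per unit time."""
    return 1.0 / (1.0 + lam * mttr)

def availability_from_mtbf(mtbf, mttr):
    """A = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

mtbf = 1250.0  # hours, from the simple example
mttr = 10.0    # hours, a hypothetical repair time
a1 = availability_from_rates(1 / mtbf, mttr)
a2 = availability_from_mtbf(mtbf, mttr)
print(f"A = {a1:.5f} (rate form), {a2:.5f} (MTBF form)")
print(f"expected down-time per 1000 h of scheduled operation: {(1 - a2) * 1000:.1f} h")
```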

