
ECE 412 Fault-Tolerant Computing

(Lecture Notes)

Dr. Jie Han


Department of Electrical and Computer Engineering
Faculty of Engineering
University of Alberta

(Reference: Fault-Tolerant Systems, by I. Koren and C.M. Krishna, Morgan Kaufmann, 2007.)

Chapter Three

Chapter 3. Information Redundancy

 Example

In the single-precision checksum, the transmitted checksum and the computed checksum match, so the error goes undetected.

In the Honeywell checksum, the computed checksum differs from the received checksum, so the error is detected.
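The contrast in this example can be reproduced with a short sketch. It assumes the usual textbook definitions, which are not spelled out on this slide: the single-precision checksum adds n-bit words modulo 2^n, and the Honeywell checksum pairs consecutive n-bit words into 2n-bit words and adds those modulo 2^(2n). The data words, the stuck-at-1 fault on one bus line, and all function names are hypothetical choices made for illustration.

```python
def single_precision_checksum(words, n):
    """Sum of n-bit words, keeping only the low n bits (overflow discarded)."""
    return sum(words) % (1 << n)


def honeywell_checksum(words, n):
    """Pair consecutive n-bit words into 2n-bit words, then sum modulo 2**(2n)."""
    assert len(words) % 2 == 0, "this sketch assumes an even number of words"
    pairs = [(words[i] << n) | words[i + 1] for i in range(0, len(words), 2)]
    return sum(pairs) % (1 << (2 * n))


def stuck_at_one(words, bit):
    """Hypothetical fault: one line of the n-bit bus is stuck at 1 for every word."""
    return [w | (1 << bit) for w in words]


n = 4
data = [0b0000, 0b0101, 0b1111, 0b0010]  # illustrative data words
rx_data = stuck_at_one(data, bit=3)       # data as seen by the receiver

# Single-precision: the checksum crosses the same faulty bus as one n-bit word.
sp_sent = single_precision_checksum(data, n)
sp_received = stuck_at_one([sp_sent], bit=3)[0]
sp_detected = single_precision_checksum(rx_data, n) != sp_received

# Honeywell: the 2n-bit checksum crosses the bus as two n-bit words.
hw_sent = honeywell_checksum(data, n)
hw_halves = stuck_at_one([hw_sent >> n, hw_sent & ((1 << n) - 1)], bit=3)
hw_received = (hw_halves[0] << n) | hw_halves[1]
hw_detected = honeywell_checksum(rx_data, n) != hw_received

print("single-precision detects the fault:", sp_detected)
print("Honeywell detects the fault:", hw_detected)
```

With a stuck line, the same bit position is corrupted in every n-bit word. In the Honeywell scheme that bit lands in two different positions of each 2n-bit word, which is why the fault is less likely to cancel out in the sum.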

Adder with residue check

 When 𝐴 = 2^𝑎 − 1, computing the residue modulo 𝐴 is easy. This is called a low-cost arithmetic code.
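A minimal sketch of why the residue is cheap for this modulus, and how an adder with a residue check could use it. Because 2^𝑎 ≡ 1 (mod 2^𝑎 − 1), the residue can be obtained by adding the 𝑎-bit groups of the operand instead of dividing. The function names and the `fault_mask` parameter (used here to inject an error into the adder output) are hypothetical.

```python
def low_cost_residue(x, a):
    """Residue of a non-negative integer x modulo A = 2**a - 1.

    Uses 2**a == 1 (mod A): fold the a-bit groups of x together by addition.
    """
    modulus = (1 << a) - 1
    while x > modulus:
        x = (x & modulus) + (x >> a)  # low group + remaining high groups
    return 0 if x == modulus else x   # the all-ones pattern also represents 0


def checked_add(x, y, a, fault_mask=0):
    """Add x and y, then compare the residue of the sum with the sum of residues.

    fault_mask is a hypothetical error injector: its set bits flip the
    corresponding bits of the adder output.
    """
    s = (x + y) ^ fault_mask
    expected = low_cost_residue(low_cost_residue(x, a) + low_cost_residue(y, a), a)
    check_ok = low_cost_residue(s, a) == expected
    return s, check_ok


good_sum, good_ok = checked_add(100, 57, a=4)
bad_sum, bad_ok = checked_add(100, 57, a=4, fault_mask=0b100)
print(good_sum, good_ok)
print(bad_sum, bad_ok)
```

A single-bit error changes the sum by ±2^i, which is never a multiple of an odd modulus greater than 1, so this check catches every single-bit error in the adder output.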

 Resilient disk systems:
Redundant Arrays of Independent Disks (RAID)

 RAID Level 1 keeps a mirror (backup) copy of each data disk.

 RAID Level 2 uses Hamming-coded disks with 𝑑 data disks and 𝑐 code disks (equivalent to a (𝑐 + 𝑑)-bit codeword).

 RAID Level 3 consists of a bank of 𝑑 data disks with one parity disk.
Each bit position across the bank is equivalent to a (𝑑 + 1)-bit word. For example, under even parity the word 01101 contains an odd number of 1s, which indicates a bit error.

(A failed disk is designed to identify itself, so the position of the erroneous bit is known and the bit can be corrected from the parity.)
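The parity argument for RAID Level 3 can be sketched as follows, assuming even parity computed as the bitwise XOR of the data blocks. The block values and function names are hypothetical; real arrays work on sectors or stripes, not small integers.

```python
from functools import reduce
from operator import xor


def even_parity_ok(word):
    """True if the word has an even number of 1 bits (no error detected)."""
    return bin(word).count("1") % 2 == 0


def parity_block(data_blocks):
    """Contents of the parity disk: bitwise XOR of all data blocks."""
    return reduce(xor, data_blocks, 0)


def reconstruct(stripe, failed_index):
    """Rebuild the block of the disk known to have failed.

    stripe holds the d data blocks followed by the parity block; the entry at
    failed_index is ignored because that disk has reported its own failure.
    """
    survivors = (b for i, b in enumerate(stripe) if i != failed_index)
    return reduce(xor, survivors, 0)


# The (d + 1)-bit word from the slide: three 1s, so even parity is violated.
print(even_parity_ok(0b01101))

data = [0b0110, 0b1011, 0b0001]        # d = 3 illustrative data blocks
stripe = data + [parity_block(data)]   # append the parity disk's block
rebuilt = reconstruct(stripe, failed_index=1)
print(rebuilt == data[1])
```

Parity alone only detects an error; correction works here because the failed disk is already identified, so the missing block is the XOR of all the others.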

Markov model: 𝜆 is the failure rate of a disk; 𝜇 is the repair rate.

 RAID Levels 4 and 5 modify RAID 3 by organizing the parity differently across the disks; the reliability model is the same, although the performance differs.
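As a rough illustration of how 𝜆 and 𝜇 enter the reliability model, the sketch below computes a mean time to data loss. It assumes the common three-state Markov chain, which is not reproduced in these notes: all 𝑑 + 1 disks working, one disk failed and under repair, and data lost when a second disk fails before the repair completes. The function name and the numeric values are hypothetical.

```python
def raid3_mean_time_to_data_loss(d, lam, mu):
    """Mean time to data loss for d data disks plus one parity disk.

    Assumed chain: state 0 (all good) -> state 1 at rate (d + 1) * lam,
    state 1 -> state 0 at rate mu (repair),
    state 1 -> data loss at rate d * lam (second failure).
    """
    a = (d + 1) * lam  # rate of the first disk failure
    b = d * lam        # rate of a second failure while one disk is down
    # Solving T0 = 1/a + T1 and T1 = 1/(mu + b) + (mu / (mu + b)) * T0 gives:
    return (a + b + mu) / (a * b)


# Illustrative numbers only: rates per hour.
mttdl = raid3_mean_time_to_data_loss(d=4, lam=1e-5, mu=1.0)
print(f"{mttdl:.3e} hours")
```

When 𝜇 is much larger than 𝜆, the result is approximately 𝜇 / (𝑑(𝑑 + 1)𝜆²), which shows that faster repair lengthens the time to data loss in direct proportion.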
