
UNIVERSITY OF MASSACHUSETTS

Dept. of Electrical & Computer Engineering

Fault Tolerant Computing


ECE 655

Part 7
Networks - 2

C. M. Krishna
Fall 2006

ECE655/Krishna Part.7 .1 Copyright 2004 Koren & Krishna

Extra-Stage Vs. Butterfly Network


♦ Non-redundant butterfly network - the two inputs to
any switch are statistically independent
♦ Extra-Stage network - two paths connecting any given
processor-memory pair - links are dependent - previous
equations not valid
♦ First and last stages have multiplexers and
demultiplexers - different analysis than internal stages
♦ Four links leading to two switches - two pairs are
independent, though the links within each pair are dependent
∗ Example - output links 0 and 1 of stage 2 are dependent
(processors 0 and 1 send requests to memory 0 through both);
links 2 and 3 are dependent; the pairs 0,1 and 2,3 are independent

Extra-Stage Network - Bandwidth
♦ Bandwidth BW - expected number of processors
actively communicating with some memory =
expected number of memories actively communicating
with some processor = product of the number of
memories (N) and $\Psi_m$ - the probability that a given
memory (say memory 0) is non-faulty and has a
request at its input
♦ $\Psi_m$ is calculated iteratively - following a path from
the processors leading to this specific memory
♦ Link is in state 1 (0) if it has (does not have) a
request for the memory - a faulty link is in state 0


Extra Stage - Bandwidth Calculation


[Figure: extra-stage network with stages labeled 4, 3, 2, 1, 0]

♦ k+2 stages - numbered k+1, ..., 0 ($k = \log_2 N$)
∗ $X_i, Y_i$ - states of the two output links in stage i
∗ $X_{i+1}, Y_{i+1}, Z_{i+1}, W_{i+1}$ - inputs to the links in stage i -
output links in stage i+1
♦ Probability of links in stage i having requests -
calculated based on the probability that a request
has been accepted at the input links


Extra Stage - Bandwidth Calculation - Cont.

♦ For the first stage (k+1 - processor stage)


♦ For stage k (processors are statistically independent)


Internal Stages in Extra-stage Network


♦ Previous expressions - assumed that a request is first
sent through the straight connection and only if this
fails - the cross connection is used
∗ Different protocol - straight or cross connection with equal
probability - expressions for the probabilities will be
different
♦ For the internal stages (i=k-1,…,1):

Bandwidth Calculation - Cont.
♦ Only joint probabilities of two links are required
- can be calculated recursively from stage k+1
(processor stage) to stage 0 (memory stage)
♦ Stage 0 includes demultiplexers

♦ $\Psi_m = P(X_0 = 1)\, p_l\, p_m$
♦ $BW = N \Psi_m$
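
The final step of the bandwidth calculation can be sketched in Python. This is only an illustration, not part of the original slides: the function name `bandwidth` and the sample values are hypothetical, `p_x0` stands for P(X_0 = 1) as produced by the stage-by-stage recursion (not reproduced here), and p_l and p_m are assumed to be the probabilities that a link and a memory are fault-free.

```python
def bandwidth(n, p_x0, p_l, p_m):
    """Return BW = N * Psi_m, where Psi_m = P(X_0 = 1) * p_l * p_m.

    n    -- number of memories (N)
    p_x0 -- P(X_0 = 1), taken from the recursion over the stages
    p_l  -- assumed: probability that a link is fault-free
    p_m  -- assumed: probability that a memory is fault-free
    """
    for p in (p_x0, p_l, p_m):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    psi_m = p_x0 * p_l * p_m
    return n * psi_m


# Hypothetical numbers, chosen only to exercise the formula.
bw = bandwidth(n=8, p_x0=0.5, p_l=1.0, p_m=1.0)
```

With fault-free links and memories the result reduces to N times P(X_0 = 1), so any loss in bandwidth then comes only from request conflicts inside the network.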

Extra-Stage Network - Connectivity

♦ Q - product of the number of processor-memory
pairs ($N^2$) and the probability of at least one fault-
free path between a given processor and memory
♦ Each processor-memory pair is connected by two
disjoint paths (except for both ends)
♦ P(At least one path is fault-free)=P(First path is
fault-free)+P(Second path is fault-free)- P(Both
paths are fault-free)
♦ Probability can assume one of two expressions
(compare paths between processor 0 and memory 0
to paths between processor 0 and memory 1)

Calculating Connectivity
♦ For the paths between processor 0 and memory 0
P(At least one path is fault-free)=P(0,0)=

♦ For the paths between processor 0 and memory 1
P(At least one path is fault-free)=P(0,1)=

♦ Half of the proc-mem pairs follow P(0,0) and
the other half P(0,1)

♦ $Q = [P(0,0) + P(0,1)]\, N^2 / 2$
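
The connectivity arithmetic can be sketched as follows. This is a hedged illustration: the function names and the numeric inputs are hypothetical, and the path probabilities are taken as given rather than derived from the link and switch failure probabilities.

```python
def at_least_one_path(p_first, p_second, p_both):
    """Inclusion-exclusion for two paths:
    P(at least one fault-free) = P(first) + P(second) - P(both)."""
    return p_first + p_second - p_both


def connectivity(n, p00, p01):
    """Q = [P(0,0) + P(0,1)] * N^2 / 2.

    Half of the N^2 processor-memory pairs behave like the pair
    (processor 0, memory 0) and the other half like (processor 0, memory 1).
    """
    return (p00 + p01) * n * n / 2


# Hypothetical probabilities, used only to show how the pieces combine.
p00 = at_least_one_path(0.5, 0.5, 0.25)
q = connectivity(n=4, p00=p00, p01=0.5)
```

If both pair types have probability 1, Q reaches its maximum of N^2, which is a quick sanity check on the formula.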


Extra Stage Network - Additional Measures


♦ $A_r$ and $A_m$ - expected numbers of accessible
processors and memories, respectively
♦ $\phi_r$ ($\phi_m$) - probability that a given processor (memory)
is connected to at least one memory (processor)
♦ For the $A_r$ calculation - link is in state X=0 (X=1) if all
(not all) paths from it to the memories are faulty
∗ A faulty path is a path that has at least one faulty link
♦ A faulty link is in state X=0
♦ Stage numbers - k+1 (processors) to 0 (memories)
♦ $X_i$ - state of link in stage i
♦ $\phi_r = p_r\, p_l\, P(X_{k+1} = 1)$
♦ $A_r = N \phi_r$

Calculating $A_r$
♦ $P(X_i = 1)$ is calculated recursively from stage 0 to stage k+1
♦ $X_i, Y_i$ - states of two links in stage i
♦ For stage 0 -
♦ $P(X_0 = 1) = p_m$ ; $P(X_0 = 0) = 1 - p_m$
♦ For stage 1 -


Calculating $A_r$ - Cont.
♦ For stages 2,…,k -
♦ $X_{i-1}, Y_{i-1}, Z_{i-1}, W_{i-1}$ - states of 4 links in stage i-1

♦ The conditional probabilities are -

Calculating $A_r$ - Cont.
♦ For the extra stage k+1 -

♦ $P(X_{k+1} = 1) = 1 - P(X_{k+1} = 0)$
♦ $\phi_r = p_r\, p_l\, P(X_{k+1} = 1)$
♦ $A_r = N \phi_r$
♦ $A_m$ is calculated similarly by interchanging $p_r$
and $p_m$
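
The closing step of the A_r calculation can be sketched as below. The function name and sample values are hypothetical; `p_top_zero` stands for P(X_{k+1} = 0) from the recursion (not reproduced here), and p_r and p_l are assumed to be the probabilities that a processor and a link are fault-free.

```python
def accessible_processors(n, p_r, p_l, p_top_zero):
    """A_r = N * phi_r, with phi_r = p_r * p_l * P(X_{k+1} = 1).

    p_top_zero -- P(X_{k+1} = 0), taken from the recursion over the stages
    p_r, p_l   -- assumed: fault-free probabilities of a processor and a link
    """
    p_top_one = 1.0 - p_top_zero  # P(X_{k+1} = 1) = 1 - P(X_{k+1} = 0)
    phi_r = p_r * p_l * p_top_one
    return n * phi_r


# Hypothetical values, for illustration only.
a_r = accessible_processors(n=8, p_r=1.0, p_l=1.0, p_top_zero=0.25)
```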


Interstitial Mesh
♦ Conventional 2-dimensional rectangular mesh network
- unable to tolerate any faults in a node

♦ (1,4) Interstitial Redundancy

♦ A spare node can be switched in to take the place of
any of its neighbors that has failed
♦ Each primary node has a single spare node while each
spare node is a spare for four primary nodes
♦ Redundancy overhead is 25%
♦ Main advantage - physical proximity of the spare
node to the primary node which it replaces -
reducing the delay penalty due to the use of a spare
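
The 25% figure follows from one spare being shared by four primary nodes. A small sketch of that arithmetic, assuming an m x n mesh with m and n even so that the primaries split evenly into groups of four (the function name is hypothetical):

```python
def interstitial_1_4_counts(m, n):
    """Return (primaries, spares, overhead) for (1,4) interstitial redundancy.

    Assumes m and n are positive and even, so that every group of four
    primary nodes shares exactly one spare.
    """
    if m <= 0 or n <= 0 or m % 2 or n % 2:
        raise ValueError("m and n must be positive even numbers")
    primaries = m * n
    spares = primaries // 4          # one spare per four primaries
    overhead = spares / primaries    # spare nodes per primary node
    return primaries, spares, overhead


primaries, spares, overhead = interstitial_1_4_counts(4, 4)
```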

Different Interstitial Redundancy

♦ (4,4) Interstitial Redundancy

♦ Each primary node has four spare nodes
♦ Each spare node is a spare for four primary
nodes
♦ Higher level of fault tolerance - higher
redundancy overhead of almost 100%.


Reliability of Mesh with (1,4) Interstitial Redundancy
♦ Mesh is of size m x n with m, n even numbers
♦ Cluster - four primary nodes with one spare node
♦ Mesh has mn/4 clusters
♦ R(t) - reliability of a primary or spare node
♦ Reliability of cluster
♦ $R_{cluster}(t) = R^5(t) + 5R^4(t)\,[1 - R(t)]$
♦ Reliability of mesh
♦ $R_{mesh}(t) = [R_{cluster}(t)]^{mn/4}$

♦ No simple algorithm to calculate the reliability
of the (4,4) interstitial redundancy scheme
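
The (1,4) reliability expressions translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and `node_reliability` assumes exponentially distributed node lifetimes with a constant failure rate, which the slides do not state.

```python
import math


def cluster_reliability(r):
    """R_cluster = R^5 + 5 R^4 (1 - R): a cluster of five nodes
    (four primaries plus one spare) survives at most one node failure."""
    return r**5 + 5 * r**4 * (1 - r)


def mesh_reliability(m, n, r):
    """R_mesh = R_cluster^(mn/4) for an m x n mesh with m, n even."""
    if m <= 0 or n <= 0 or m % 2 or n % 2:
        raise ValueError("m and n must be positive even numbers")
    return cluster_reliability(r) ** (m * n // 4)


def node_reliability(t, failure_rate):
    """Assumption (not from the slides): R(t) = exp(-failure_rate * t)."""
    return math.exp(-failure_rate * t)


# Compare a 4 x 4 mesh with and without (1,4) interstitial redundancy
# at a hypothetical node reliability of 0.9.
r = 0.9
with_spares = mesh_reliability(4, 4, r)
without_spares = r ** 16  # all 16 primaries must survive
```

For node reliabilities strictly between 0 and 1, `with_spares` comes out larger than `without_spares`, because each cluster tolerates a single node failure.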

Non-Redundant Crossbar

♦ 3x4 crossbar

♦ nxm crossbar - n inputs, m outputs, nm switches
♦ Switches connect every input-output pair of nodes
♦ Crossbar is not fault-tolerant - failure of any
switchbox will disconnect certain pairs


Redundant Crossbar
♦ Adding redundancy to make the crossbar fault-tolerant:
♦ A row and a column of switches are added
♦ Input and output connections are augmented - each
input can be sent to either of two rows and each
output can be received on either of two columns
♦ If a switch becomes faulty - the row and column to
which it belongs are replaced by the spare row and column
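
One way to picture the replacement is as a remapping of logical inputs and outputs onto physical rows and columns. The sketch below is an assumption, not the slides' stated mechanism: it supposes that input i can be steered to physical row i or i+1 (and output j to column j or j+1), so a faulty row or column is bypassed by shifting everything at or beyond it by one position. The function name `remap` is hypothetical.

```python
def remap(n_inputs, n_outputs, faulty_switch=None):
    """Map logical inputs/outputs to physical rows/columns of an
    (n_inputs + 1) x (n_outputs + 1) switch array.

    faulty_switch -- (row, col) of the single faulty switch, or None.
    Assumed scheme: inputs at or beyond the faulty row shift down by one
    row, outputs at or beyond the faulty column shift right by one column.
    """
    if faulty_switch is None:
        return list(range(n_inputs)), list(range(n_outputs))
    bad_row, bad_col = faulty_switch
    if not (0 <= bad_row <= n_inputs and 0 <= bad_col <= n_outputs):
        raise ValueError("faulty switch lies outside the physical array")
    rows = [i if i < bad_row else i + 1 for i in range(n_inputs)]
    cols = [j if j < bad_col else j + 1 for j in range(n_outputs)]
    return rows, cols


# 3x4 crossbar with one spare row and column; switch (1, 2) has failed.
rows, cols = remap(3, 4, faulty_switch=(1, 2))
```

Under this assumed scheme every logical input moves by at most one row, which matches the statement that each input can be sent to either of two rows.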