1993 International Conference on Parallel Processing

ON THE PRACTICAL APPLICATION OF A QUANTITATIVE MODEL OF SYSTEM RECONFIGURATION DUE TO A FAULT
Gene Saghi, Howard Jay Siegel, and Jose A. B. Fortes
Parallel Processing Laboratory, School of Electrical Engineering
Purdue University, West Lafayette, IN 47907-1285 USA

This research was supported by the Office of Naval Research under grant number N00014-90-J-1483 and supported by the Innovative Science and Technology Office of the Strategic Defense Organization and administered through the Office of Naval Research under contract number N00014-88-K-0723.

Abstract -- If a processor develops a permanent fault during the execution of a task on a dynamically reconfigurable parallel machine, three recovery options are task migration to a fault-free subdivision, task migration to another submachine, and task redistribution. Quantitative models of these three reconfiguration schemes are analyzed, together with the cost of making the wrong choice, to develop guidelines for selecting among these methods in a practical dynamic reconfiguration implementation. A multistage cube or hypercube inter-processor network is assumed. The PASM and nCUBE 2 parallel machines are used as vehicles for studying the model parameters.

1. INTRODUCTION
To provide reliable operation over extended periods of time, massively parallel processing systems must be capable of tolerating faults. A fault-tolerant system must be able to detect and locate faults, to reconfigure itself to "disconnect" and perhaps replace faulty components, to recover from possibly erroneous computations, and to restart operation from a correct state. When more than one reconfiguration option is available, the option that results in the earliest completion of the task is desirable.

In [8], a quantitative model of dynamic reconfiguration was presented, and it was observed that collecting precise values for all of its parameters is very difficult (if not impossible). The research here analyzes the ranges of values that these parameters can assume in order to develop guidelines for making the best reconfiguration choice. Because there is no guarantee that these guidelines will produce the optimal reconfiguration strategy in all cases, the cost penalty of making the wrong choice is also considered. The analysis incorporates experimentally derived parameters obtained on PASM [1, 13], an experimental SIMD/MIMD mixed-mode machine with a partitionable multistage cube communication network, and on nCUBE 2 [6], a commercially available MIMD machine with a partitionable hypercube communication network.

This research focuses on partitionable parallel processing systems where the set of processors can be partitioned to form multiple independent submachines. The execution of a parallel program on a submachine is defined as a task. It is possible to achieve fault tolerance in such a system by utilizing the reconfigurability of the system to effectively "disconnect" the faulty component. For example, parallel processing systems such as the Intel Cube [3], nCUBE [2], IBM RP3 [7], and PASM [1, 13] incorporate partitionable interconnection networks and therefore have the ability to migrate a task from a faulty submachine to a fault-free submachine. The architecture assumed implements a physically distributed memory such that each processor is paired with local memory to form a processing element (PE). Accesses to locations in a remote processor's memory require use of the interconnection network.

The system and fault models used to analyze the fault-recovery options, and the recovery options and their associated costs, are presented as background information in Section 2. The range of costs for each option is examined in Section 3 to determine the relative weight of these costs in the reconfiguration decision. The penalty for making the wrong choice is considered in Section 4, which also presents guidelines for choosing a reconfiguration strategy in the event of a fault.

2. BACKGROUND INFORMATION
A mixed-mode machine can operate in either the SIMD or MIMD mode of parallelism and can switch modes at instruction-level granularity [1]. A partitionable SIMD/MIMD machine can operate as one or more independent or cooperating submachines, where each submachine may operate as a mixed-mode machine [11]. The analyses here can be applied to MIMD, multiple-SIMD, or partitionable SIMD/MIMD parallel processing systems that use a multistage cube or hypercube interconnection network.

At regular intervals during the execution of a task, the state of each PE, including register and allocated memory contents, is stored in a different PE within the same submachine. This state information is called checkpoint data and is used to restore a valid system state in the event of a fault. Further details about the system and fault models assumed in this research can be found in [8, 10] and are not included here due to space constraints.

When a permanent fault occurs in a submachine A, the three possible reconfiguration/recovery options are as follows:

1) subdivide A into two equal-size submachines, and use the one that is fault-free to complete the execution of the task,
2) migrate the task to another submachine that is fault-free, or
3) redistribute the task programs and data among the fault-free PEs in A and complete the task using a modified algorithm that does not use the faulty PE.

These recovery options are discussed in detail in [8, 10].

3. CHOOSING AN OPTION

3.1. Overview
The time to reconfigure and complete a task for each reconfiguration option listed in Section 2 can be separated into three primary components: the time to plan for the reconfiguration option ($T_{Plan}$), the time to move the task data and code ($T_{Trnsfr}$), and the time to complete the task execution ($T_{CmpExec}$). This section discusses the relative impact of these three components on the overall reconfiguration cost in order to derive guidelines for choosing the best option. Experimentally determined ranges for these parameters on the PASM prototype and the nCUBE 2 are used in the analysis where applicable. Table I summarizes the most important notation used in the sections that follow.

Table I: Summary of notation used.
  $T^x$: superscript x = FFS (fault-free subdivision), TM (task migration), or TR (task redistribution) reconfiguration option.
  [Remaining table entries not recoverable.]

In practical situations, the fault-free subdivision and task migration options may not be available. In some systems there may be a minimum size for a submachine; if the current submachine is of minimum size, it cannot be subdivided. It is also possible that no idle destination submachine exists to which a task can be migrated. In that case, the task can be migrated to a submachine that is already executing another task, and the two tasks can time-share the submachine. This is discussed in [10].

3.2. Range of $T_{Plan}$ and $T_{Trnsfr}$
Table II gives the results of an analysis of the range of values that $T_{Plan}$ and $T_{Trnsfr}$ can assume on PASM and nCUBE 2 for all three reconfiguration options considered. The analysis details can be found in [10].

Table II: Approximate ranges, in microseconds, for $T_{Plan}$ and $T_{Trnsfr}$ for the FFS, TM, and TR reconfiguration options.
  [Table values not recoverable.]
  Note: upper bound determined by size of PE memories.

3.3. Range of $T_{CmpExec}$
Tasks can be divided into two categories based on execution time: tasks with data-independent execution times and tasks with data-dependent execution times. A task with a data-independent execution time does not depend on input data to make branching decisions. Thus, the number of times any branch in the task program code is taken can be determined by a compiler during program compilation, and a compiler can determine an expected execution time for the task. In contrast, a task with a data-dependent execution time has branch decisions that are based on data that is known only at run time. In this case, it is assumed that an expected execution time for the task can be determined through empirical studies (i.e., information about task execution time on various sets of data), through an automatic complexity evaluator such as that presented in [5], or through analysis of the algorithm and data sets.

For all the reconfiguration options discussed, the number of PEs assigned to a task after the reconfiguration is equal to or less than the number of PEs originally assigned to the task. It is assumed that the average time for an inter-PE transfer does not increase when the task is executed on fewer PEs.

Once a task has been migrated from a submachine containing a faulty PE to a fault-free submachine of equal size, the time to complete the task execution will be the same as it would have been on the original fault-free submachine. Let $\eta(2^k)$ be the estimated execution time for a task on a submachine with $2^k$ PEs. It is assumed that the total amount of execution time spent on a task prior to a checkpoint is stored with that checkpoint. If a recovering task is to proceed from a checkpoint and the execution time stored with that checkpoint is $\tau$, the expected amount of time required to complete task execution after migrating the task to another submachine is

$$T^{TM}_{CmpExec} = \eta(2^k) - \tau,$$

assuming that the submachine size remains the
same and that all the PEs in the submachine are fault-free. The expected range for $T^{TM}_{CmpExec}$ is:

$$0 < T^{TM}_{CmpExec} \le \eta(2^k).$$

Now, consider the completion time of a task that finishes execution on a subdivision that is half the size of the original submachine. $T^{FFS}_{CmpExec}$ is bounded as follows:

$$T^{FFS}_{CmpExec} \le 2\,\bigl(\eta(2^k) - \tau\bigr) = 2\,T^{TM}_{CmpExec}.$$

In addition, it is expected that $T^{FFS}_{CmpExec} > T^{TM}_{CmpExec}$, because a fault-free subdivision is assumed to have half as many processors as would be available if the task were migrated to another submachine. Although some tasks can execute faster on fewer PEs [4, 9, 12], it is assumed here that the original submachine size was selected for minimum execution time. That is, if a smaller submachine could execute the task in the same or less time, the task would have been mapped to that smaller submachine initially.

A more accurate estimate of the remaining execution time can be obtained if $\eta(2^{k-1})$ is known. Then the estimated task execution time becomes a function of the submachine size, and the estimate of the remaining execution time becomes

[Equation not recoverable from the source.]

Consistent with the assumptions given above, $\eta(2^k) < \eta(2^{k-1}) \le 2\,\eta(2^k)$. Thus, using either the $\eta(2^{k-1})$ information, if it is known, or the inequalities stated in the previous paragraph, the expected range for $T^{FFS}_{CmpExec}$ is:

$$T^{TM}_{CmpExec} < T^{FFS}_{CmpExec} \le 2\,T^{TM}_{CmpExec}.$$

An execution-time estimate for the task redistribution recovery option is more difficult than for the previous options. Consider a task executing in MIMD mode on a submachine of size $2^k$. If a PE becomes faulty and its subtasks are distributed equally to the $2^k - 1$ fault-free PEs in the submachine, the remaining execution time is bounded as follows:

[Equation not recoverable from the source.]

Now consider the situation where the faulty PE's subtasks cannot be distributed equally among the fault-free PEs. In the worst case, all of the faulty PE's subtasks would be assigned to a single PE, and the remaining execution time could be twice the remaining execution time on a fault-free submachine.

In general, it is expected that $T^{TR}_{CmpExec} \le T^{FFS}_{CmpExec}$. The reason is that the fault-free subdivision option can be thought of as a special case of the task redistribution option, in which the task is redistributed to half the PEs in the original submachine. Furthermore, it is expected that $T^{TR}_{CmpExec} > T^{TM}_{CmpExec}$, based on the earlier assumption that the original submachine size was selected for minimum execution time. Therefore, the range on $T^{TR}_{CmpExec}$ is given by:

$$T^{TM}_{CmpExec} < T^{TR}_{CmpExec} \le T^{FFS}_{CmpExec}.$$

In cases where the task load can be distributed equally among the fault-free PEs, the upper bound of $T^{TR}_{CmpExec}$ in the above inequality can be replaced by $\min\left(\frac{2^k}{2^k - 1}\,T^{TM}_{CmpExec},\ T^{FFS}_{CmpExec}\right)$. Combining the inequalities for remaining execution time derived in this subsection establishes the following ordering:

$$T^{TM}_{CmpExec} < T^{TR}_{CmpExec} \le T^{FFS}_{CmpExec} \le 2\,T^{TM}_{CmpExec}.$$

This ordering indicates that task migration is the best reconfiguration option when $T_{CmpExec}$ is the dominant factor. However, task migration has already been shown not to be the best option when $T_{Plan}$ and/or $T_{Trnsfr}$ are considered. Therefore, no clear choice is apparent from the analysis so far.

4. PENALTY FOR WRONG CHOICE
Thus far, a quantitative framework has been developed that attempts to relate various reconfiguration parameters. Some of the parameters can be predicted with good precision on real machines, while others can only be coarsely bounded. The next step is to determine whether a heuristic can be found that is based on the information available. In this section, a combination of probabilistic analysis and worst-case analysis is used to develop useful guidelines for choosing among the reconfiguration options on real machines in practical situations.

Consider the relative magnitudes of $\eta(2^k)$, $T_{Plan}$, and $T_{Trnsfr}$. In general, for tasks with short execution times, it is better to restart the task when a PE becomes unusable than to permanently and significantly increase the execution time with periodic checkpointing. Therefore, dynamic reconfiguration is generally not considered for a task unless its estimated execution time, $\eta(2^k)$, is orders of magnitude larger than $T_{Trnsfr}$ and $T_{Plan}$.

One of the most common cumulative distribution functions assumed in reliability models is the exponential distribution, $F(t) = 1 - e^{-\lambda t}$ [14]. $F(t)$ represents the probability that a PE fault will occur between time 0 and time $t$, inclusive. The parameter $\lambda$ describes the rate at which failures occur in time.

The reliability function, $R(t)$, is defined as $R(t) = 1 - F(t) = e^{-\lambda t}$. For a parallel system submachine of size $2^k$ PEs, where all the PEs must be operational for the submachine to be operational, the submachine reliability function, $R_{SM}(t)$, is the product of the individual PE reliability functions:

$$R_{SM}(t) = \prod_{i=1}^{2^k} R(t) = \prod_{i=1}^{2^k} e^{-\lambda t} = e^{-2^k \lambda t}.$$
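The product form of $R_{SM}(t)$ can be checked numerically. The sketch below is illustrative only. Like the model, it assumes that PE failures are independent and that every PE has the same exponential lifetime. The function names and the example values $k = 4$ and $\lambda = 10^{-3}$ per time unit are hypothetical choices, not taken from the paper. The sketch also computes $F_{SM}(t) = 1 - R_{SM}(t)$ and the ratio $F_{SM}(0.9\,\eta(2^k)) / F_{SM}(\eta(2^k))$. That ratio is the conditional probability that a failure occurring by time $\eta(2^k)$ occurred no later than $0.9\,\eta(2^k)$, and it is the quantity the penalty analysis relies on.

```python
import math


def pe_reliability(t: float, lam: float) -> float:
    """R(t) = 1 - F(t) = exp(-lam * t) for one PE with an exponential lifetime."""
    return math.exp(-lam * t)


def submachine_reliability(t: float, k: int, lam: float) -> float:
    """R_SM(t): product of the 2**k individual PE reliabilities.

    Assumes every PE must be operational and that PE failures are
    independent and identically (exponentially) distributed.
    """
    return math.prod(pe_reliability(t, lam) for _ in range(2 ** k))


def submachine_failure_cdf(t: float, k: int, lam: float) -> float:
    """F_SM(t) = 1 - R_SM(t) = 1 - exp(-2**k * lam * t).

    math.expm1 avoids cancellation when 2**k * lam * t is very small.
    """
    return -math.expm1(-(2 ** k) * lam * t)


def early_failure_probability(eta: float, k: int, lam: float,
                              frac: float = 0.9) -> float:
    """P(failure by frac*eta | failure by eta) = F_SM(frac*eta) / F_SM(eta)."""
    return (submachine_failure_cdf(frac * eta, k, lam)
            / submachine_failure_cdf(eta, k, lam))


if __name__ == "__main__":
    k, lam = 4, 1e-3  # hypothetical: 16 PEs, failure rate per time unit
    for eta in (0.1, 1.0, 10.0, 100.0, 1000.0):
        p = early_failure_probability(eta, k, lam)
        print(f"eta = {eta:>7}: P(fail by 0.9*eta | fail by eta) = {p:.6f}")
```

`math.expm1` is used instead of `1 - math.exp(...)` because the ratio approaches 0.9 when $2^k \lambda \eta(2^k)$ is small, and in that regime the naive subtraction loses most of its significant digits.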
Thus, the submachine-failure probabili~y disuibution
negligible, and T™Kfr - Tj^fr is generally ontheorder of
hundreds of milliseconds (see Table II). Thus, in this
case, f^enaity is ontheorder of T™^. (recall TT‰ = 0).
Consider the conditional probability that a failure If insteadthetask redistribution option istheoptimal
occurs at or before time .911(2*) given that a failure occurs choice for this example,theworst-case penalty would be:
at or before time T|(2*). 7 ‰ < max(7 ‰, + T‰) - min{T‰ + T‰r).
Again, the best expected time to complete execution
after task redistribution is greater than the expected time
to complete execution after task migration. For PASM
and nCUBE 2, r}‰/ry is on the order of 7"‰ (see Table
II).
This probability approaches 0.9 as T|(2*) approaches zero
from the positive direction, and it monotonically Now, consider the case where it was incorrectly
approaches 1 as T|(2*) increases. therefore, when 11(2*) assumed that (T|(2*) - x) < < TjmΨ I n this situation,
is 100 times greater than TTrnΨ,thereis a 0.9 or greater either the fault-free subdivision or task redistribution
probability that a failure will occur by time 90TTm^, option would have been chosen. If the fault-free subdivi­
given that a failure occurs by the time the program has sion option was chosen when the task migration option
completed. Thus, for this case,thereis a high probability would have been better (because in actuality
that x, the time the failure occurs, will be less than or (ri(2*) - x) > > TTmifr), the penalty for making the wrong
k
equal to 90TTrnsfr, and TCmpExec = T\(2 )-x> > TTm≠. choice is given by:
For the case where T|(2*) is more than 100 times greater
than TTrnΨ, there is an even greater probability that
T\(2k)-x>>TT .
Here, the penalty of making the wrong choice of a
reconfiguration option is examined. the worst-case
penalty, 7p‰;o,, is defined to betheworst-case difference
between the expected completion time of a task after
choosing a suboptimal reconfiguration option and the
expected completion time of a task after choosing the
optimal reconfiguration option. For example, if the task
redistribution option was chosen, but the task migration Recall the value of T|(2*) is assumed to be much
option would have resulted intheearliest completion time larger than TTm^r when reconfiguration options are to be
for the task,theworst-case penalty would be: considered. Thus, in the worst case, the penalty for
incorrectly assuming Cn(2*) - T) < < TTrnsfr is much
greater than incorrectly assuming Cn(2*)-x) > > TTrnsfr.
A similar analysis for the case where task redistribution
was erroneously chosen over task migration results in the
where the maximum and minimum refer totheranges for
same potential for a large penalty.
the parameters. Here, two cases are considered: 1) the
reconfiguration choice was made assuming that the To summarize this section, two conclusions are
remaining execution time was much greater than the made: first, it is expected that there is a high probability
time to transfer the task code and data that TcmpE∞c will be much greater than TTrnsfr when a fault
((11(2*) - x) > > TTnsfr), and 2)thereconfiguration choice occurs, and second, that the worst-case penalty for
was made assuming that the remaining execution time incorrectly assuming this is true is far less than the
was much less than the time to transfer the task code and worst-case penalty for incorrectly assuming the opposite.
data((TK2*)-T)<<r rrw> ). therefore, a mathematical justification for choosing a
First, consider the case where it was incorrectly reconfiguration option by considering only the time
k
assumed that (r\(2 ) - x) > > r rm ^.. In this case, from the required to complete the task has been established. Com­
results of Subsection 3.3,thetask migration option would bining this result with the results of Subsection 3.3, the
have been chosen. If the fault-free subdivision option is choice of reconfiguration strategy becomes one of choos­
the optimal one,theworst-case penalty would be: ing to migratethetask if an idle submachine exists. If this
option is not available, the next best option is task redis­
7 ‰ < max(7$L + T‰) - min(7 5„ + TFT™fr), tribution. Finally, if the task does not lend itself to redis­
tribution, a fault-free subdivision can be used to complete
because the best expected time to complete execution on the task.
a fault-free subdivision is greater than the expected time the model parameter value ranges established in
to complete execution after task migration. Furthermore, Section 3 are in some cases very coarse, e.g., TCmpExec for
for machines like PASM and nCUBE 2, T‰ - Tffil is tasks with data-dependent (nondeterministic) execution

III-251
1993 International Conference on Parallel Processing

times, and therefore do not provide the information needed to determine the best reconfiguration option. However, the analysis in this section has made it possible to establish a good set of reconfiguration guidelines.

5. CONCLUSIONS
The application of a quantitative model of system reconfiguration due to a PE fault was examined. The model parameters were grouped into three categories: time to plan for the reconfiguration option ($T_{Plan}$), time to move the task data and code ($T_{Trnsfr}$), and time to complete task execution after reconfiguration ($T_{CmpExec}$). For each category, the relative times of the reconfiguration options were examined, and the options were ranked when possible. Actual parameters collected on the PASM and nCUBE 2 parallel machines were used to support the analysis.

For the system architectures considered, $T_{Plan}$ is generally much smaller than $T_{Trnsfr}$. Furthermore, when the reconfiguration decision is based only on $T_{Trnsfr}$ (ignoring $T_{CmpExec}$), the fault-free subdivision or task redistribution options will result in the smallest total execution time for the task. The choice between the fault-free subdivision and task redistribution options will depend on the task being executed.

It was shown that for those tasks where dynamic reconfiguration should be considered, there is a high probability that the expected value of $T_{CmpExec}$ will be greater than $T_{Trnsfr}$. Thus, $T_{CmpExec}$ becomes the primary parameter to consider when choosing among reconfiguration options. When $T_{CmpExec}$ is the dominant factor, the task migration option results in the earliest task completion. Task redistribution is the next best option. However, in the worst case, task redistribution can require as much time as completing the task on a fault-free subdivision.

When execution times are data dependent, and therefore nondeterministic, the task execution times used in the model may be only expected values. Therefore, a worst-case analysis was performed. An examination of the penalty for choosing the wrong reconfiguration option provides further justification for basing the reconfiguration decision on $T_{CmpExec}$. An analysis of the worst-case penalties reveals that the penalty for assuming $T_{CmpExec}$ to be much greater than $T_{Plan}$ and $T_{Trnsfr}$ is much less than the penalty for assuming otherwise. Thus, using a quantitative framework, it has been shown that, for tasks with nondeterministic execution times, task migration is the best dynamic recovery option when $T_{CmpExec}$ is expected to be much greater than $T_{Plan}$ and $T_{Trnsfr}$.

Acknowledgments: The authors gratefully acknowledge discussions with James Armstrong and Dan Watson.

REFERENCES
[1] S. A. Fineberg, T. L. Casavant, and H. J. Siegel, "Experimental analysis of a mixed-mode parallel architecture using bitonic sequence sorting," J. Parallel and Distributed Computing, Vol. 11, Mar. 1991, pp. 239-251.
[2] J. P. Hayes, T. N. Mudge, Q. F. Stout, and S. Colley, "Architecture of a hypercube supercomputer," 1986 Int'l Conf. on Parallel Processing, Aug. 1986, pp. 653-660.
[3] Intel Corporation, A New Direction in Scientific Computing, Order # 28009-001, Intel Corporation, 1985.
[4] R. Krishnamurti and E. Ma, "The processor partitioning problem in special-purpose partitionable systems," 1988 Int'l Conf. on Parallel Processing, Vol. I, Aug. 1988, pp. 434-443.
[5] D. Le Métayer, "ACE: An Automatic Complexity Evaluator," ACM Trans. on Programming Languages and Systems, Vol. 10, Apr. 1988, pp. 248-266.
[6] nCUBE Corporation, nCUBE 2 Processor Manual, Order # 101636, nCUBE Corporation, Dec. 1990.
[7] G. F. Pfister et al., "The IBM Research Parallel Processor Prototype (RP3): introduction and architecture," 1985 Int'l Conf. on Parallel Processing, Aug. 1985, pp. 764-771.
[8] G. Saghi, H. J. Siegel, and J. A. B. Fortes, "On the viability of a quantitative model of system reconfiguration due to a fault," 1992 Int'l Conf. on Parallel Processing, Vol. I, Aug. 1992, pp. 233-242.
[9] G. Saghi, H. J. Siegel, and J. L. Gray, "Predicting performance and selecting modes of parallelism: a case study using cyclic reduction on three parallel machines," J. Parallel and Distributed Computing, to appear, Oct. 1993.
[10] G. Saghi, H. J. Siegel, and J. A. B. Fortes, On a Quantitative Model of System Reconfiguration Due to a Fault, Tech. Rep. in preparation, EE School, Purdue.
[11] H. J. Siegel, Interconnection Networks for Large-Scale Parallel Processing: Theory and Case Studies, Second Edition, McGraw-Hill, New York, NY, 1990.
[12] H. J. Siegel, J. B. Armstrong, and D. W. Watson, "Mapping computer-vision-related tasks onto reconfigurable parallel-processing systems," Computer, Vol. 25, Feb. 1992, pp. 54-63.
[13] H. J. Siegel, T. Schwederski, J. T. Kuehn, and N. J. Davis IV, "An overview of the PASM parallel processing system," in Computer Architecture, D. D. Gajski, V. M. Milutinovic, H. J. Siegel, and B. P. Furht, eds., IEEE Computer Society Press, Washington, DC, 1987, pp. 387-407.
[14] D. P. Siewiorek and R. S. Swarz, The Theory and Practice of Reliable System Design, Digital Equipment Corp., Bedford, MA, 1982.
