
Evolving Fault Tolerant Systems

CSRP 385
Adrian Thompson
School of Cognitive and Computing Sciences, University of Sussex,
Brighton BN1 9QH, UK. E-mail: adrianth@cogs.susx.ac.uk

To appear in the First IEE/IEEE International Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications (GALESIA'95), Sheffield, September 1995.

Abstract

The conventional mechanism used to gain fault tolerance is redundancy. In contrast, this paper suggests that artificial evolution can be used to produce systems that are inherently insensitive to faults, with fault tolerance becoming part of the task specification. The possible techniques are investigated, and the study is grounded in a real-world evolved electronic control system for a robot.

1 Introduction

If a defect occurs in the underlying implementation of a fault-tolerant system, it either continues unaffected or undergoes graceful degradation. In a harsh environment or a safety-critical application, a system might be required to retain a certain level of ability even if a computer's memory becomes slightly corrupted, or a few transistors fail. Fault tolerance is especially important in designs for integrated circuits, because it increases the yield of usable chips in the presence of unavoidable silicon defects, permitting larger and cheaper chips. Indeed, wafer-scale integration is not feasible without fault tolerance [1].

This paper investigates the production of fault-tolerant designs by artificial evolution. Firstly, I describe an evolved electronic control system to serve as an example in what follows. Then I show that in some circumstances, evolution will automatically tend to produce designs that are insensitive to some faults. Next, the discussion is broadened to the practicalities of explicitly including fault tolerance as one of the properties required of the evolving system. I then note that defective components can sometimes be exploited by evolution as if they were working parts. Finally, we see that the evolution of these designs that are by their nature insensitive to faults can go hand-in-hand with more traditional redundancy approaches.

2 A Real-World Example

The specimen evolved system used as an example in this paper is a real electronic circuit evolved to control a real robot. The circuit is a simple example of "evolvable hardware": a reconfigurable electronic architecture that can physically instantiate many possible circuits. By placing the configuration under evolutionary control, it is possible to evolve electronic circuits that are evaluated by their performance as real physical circuits implemented in hardware. This technique has profound implications, but here I will only sketch out the details necessary for the rest of this paper, and the interested reader is referred to [2].

The objective was to cause a two-wheeled robot always to move, but yet to keep away from all four walls of its rectangular enclosure as much as possible. The evolvable hardware architecture for the control system is shown in Figure 1: I call it a "Dynamic State Machine" (DSM). It is based on the well-known "direct-addressed ROM" implementation of a finite-state machine [3], but has been endowed with the potential for much richer dynamical behaviour, by putting many of the temporal constraints under genetic control. As in the finite-state machine, a RAM holds a look-up table of the next state to follow each possible (present-state, input) combination. The clocked state-register that would normally hold the present state has been replaced by a "Genetic Latch" (G.L.), which behaves like the state register except that which of the variables are latched according to the clock, and which are passed straight through asynchronously, is under genetic control. Genetic latches also control whether any of the inputs or outputs are clocked. All of the latches run from a common clock, but its frequency is under genetic control, as are the contents of the RAM.

Figure 1: The evolvable DSM. (Diagram labels: sonars; genetic latches (G.L.); 1k by 8 bits RAM with 10 address inputs, 8 data outputs and evolved contents; evolved clock; motors (M).)

The temporal freedom available in this arrangement means that the evolved DSM robot controller is able to accept directly the echo pulses from a pair of time-of-flight sonars mounted on the robot facing left and right, and to generate the pulse trains to drive the two d.c. motors as its outputs. For the simple wall-avoidance behaviour, only two of the RAM's data outputs and four of its address inputs were enabled, so only 32 bits of RAM and six latches were placed at the evolutionary algorithm's disposal.

A 16-bit binary code for the clock period, a bit for each signal passing through a genetic latch, and the 32 bits of RAM were all encoded directly onto a linear bit-string genotype. A conventional generational genetic algorithm (GA) was used [4], but with elitism (the fittest individual of each generation was always copied once without mutation into the next) and linear rank-based selection. The bitwise mutation probability was set to give an expected rate of one per genotype, the crossover probability was 0.7, and the population size was 30. Fitness was measured according to the performance of the real hardware DSM in controlling the real wheels (which were just spinning in the air), but the sonar echo signals were synthesised in real-time by a simulator. The products of evolution in this "virtual environment" have been verified to perform well in the real world: after about 40 generations, the behaviour is to move reliably to the centre of the arena and wander around there, even if started off facing into a corner.

Armed with this real-world example of a piece of electronics evolved to control a robot, the following sections discuss the evolution of fault tolerance. Note, however, that the concepts are general and are not confined to this particular arrangement.

3 The Evolution of Fault Tolerance

3.1 Via the Genetic Mutation Operator

A single-stuck-at (SSA) fault means that one signal in the system is clamped at an invariant value due to a defect. For a RAM chip, an SSA fault in the memory array causes a particular bit of the RAM always to read the same (either always 0 or always 1) no matter what is written to it, while all of the other bits function correctly. Now recall that for the evolving DSM example, the contents of the RAM chip were directly encoded, bit-for-bit, onto the bit-string genotype. As a consequence of this encoding, an application of the genetic mutation (bit-flip) operator to the section of the genotype coding for the RAM causes one of the RAM's bits to be inverted: the same effect as an adverse SSA fault. This section will show that evolved systems automatically tend to have some insensitivity to faults that have the same effect as genetic mutations, as in this example. The phenomenon arises from the way in which the fitness landscape [5] (the "topography" of fitness values assigned over the space of all possible genotypes) influences the distribution of individuals in an evolving population.

In considering the evolution of nucleic
acids, Eigen [6] defines a "quasi-species" as "a mutant 'clan' that is ordered around one or a degenerate set of selected master sequences, containing weighted contributions from all mutants present in the distribution. ... This distribution, and not a single type, is the target of selection." For a converged population, which has arrived at a local optimum, it can be imagined that the population forms a "cloud" on the fitness landscape, unable to converge completely upon the optimal genotype because of the forces of mutation, but held around it by crossover and selection. Consequently, optima that have surrounding regions of high fitness will be favoured over isolated optima standing out alone amongst low fitness genotypes, because it is around the optimum that most of the individuals will be found, not exactly at the optimal genotype.

Figure 2: Mean population distribution. (Bar chart: mean number of individuals against Hamming distance h(i) from 00000, annotated with the number of possible different genotypes at each distance.)
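
The "cloud" argument can be made concrete with a small calculation. The following Python sketch is an illustration only, not an experiment from this paper: it assumes a hypothetical 5-bit landscape (an isolated peak of 10 at 00000, a slightly lower peak of 9 at 11111 whose 1-bit mutants score 5, and zero elsewhere) and an assumed per-bit mutation probability of 1/5, and computes the exact expected fitness of a single mutated copy of each optimum. Selection and crossover are ignored, and the names `fitness` and `expected_offspring_fitness` are invented for this sketch.

```python
from itertools import product

N_BITS = 5
P_MUT = 1.0 / N_BITS  # assumption: one expected bit-flip per genotype


def fitness(genotype):
    """Hypothetical two-peak landscape keyed on the number of 1s."""
    ones = sum(genotype)
    if ones == 0:
        return 10.0  # isolated peak: all near neighbours score zero
    if ones == N_BITS:
        return 9.0   # slightly lower peak
    if ones == N_BITS - 1:
        return 5.0   # 1-bit mutants of the lower peak
    return 0.0


def expected_offspring_fitness(parent, p_mut=P_MUT):
    """Exact mean fitness of one independently mutated copy of `parent`."""
    n = len(parent)
    total = 0.0
    # Enumerate every possible pattern of bit-flips and weight it
    # by its probability under independent per-bit mutation.
    for flips in product((0, 1), repeat=n):
        k = sum(flips)
        prob = (p_mut ** k) * ((1.0 - p_mut) ** (n - k))
        child = tuple(b ^ f for b, f in zip(parent, flips))
        total += prob * fitness(child)
    return total


isolated = expected_offspring_fitness((0,) * N_BITS)
buffered = expected_offspring_fitness((1,) * N_BITS)
print(f"offspring of isolated peak 00000: {isolated:.3f}")
print(f"offspring of buffered peak 11111: {buffered:.3f}")
```

Under these assumptions the mutated offspring of the lower peak have a higher expected fitness (about 5.0) than those of the isolated peak (about 3.3), which is the sense in which a fit neighbourhood can outweigh a slightly higher but isolated optimum.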
Eigen [6] summarised several experiments which demonstrate this effect in a model used for the study of molecular evolution. Here, I adapt one of those experiments to the GA described in the previous section. Consider a population of 5-bit genotypes. Let the Hamming distance of an individual i from the sequence 00000 be h(i), so that h(i) is simply the number of '1's in i's genotype. Then define i's fitness as:

\[
\mathrm{fitness}(i) =
\begin{cases}
10 & \text{if } h(i) = 0 \\
9 & \text{if } h(i) = 5 \\
5 & \text{if } h(i) = 4 \\
0 & \text{otherwise}
\end{cases}
\]

This fitness landscape consists of two optima. The first is a global optimum of 10 for the genotype 00000, which is an isolated optimum: all genotypes near it in Hamming space (within three bit-flips) give zero fitness. The second optimum is for the sequence 11111, and has the slightly inferior fitness of 9, but is surrounded by a region of medium fitness, such that all five possible 1-bit mutants of the optimum have fitness 5. All other genotypes confer zero fitness. The GA was as described earlier (expected mutation rate of 1 per genotype, population size 30) except that the elitism mechanism was removed. To initialise the population, all of the genotypes were set to the 00000 global optimum, and the GA was then left to run. After 200 generations, the distribution of the population was measured by counting the number of individuals at each of h(i) = 0, 1, 2, 3, 4, 5. The measurements were averaged over 100 runs of the GA.

The results (Figure 2) show that the population nearly always moved away from the isolated global optimum, in favour of the slightly suboptimal fitness peak, with its surrounding 1-bit mutant region of medium fitness. In the figure, the bar for h(i) = 5 is not the highest, even though the population is converged around this point, because there is only one possible genotype (11111) for h(i) = 5, but there are more possibilities for h(i) = 4, 3, 2, 1, as indicated. The outcome was similar even when the elitism mechanism was re-introduced, as long as there was more than 10% noise added to the fitness evaluations, or if the two optima were set to be of equal fitness.

So far, we have considered a highly contrived fitness landscape having only two optima. To see whether the result applies to more typical situations, the NK model of fitness landscapes was used [5]. N is the number of binary "genes" in the genotype. Each gene's additive contribution to the overall fitness is a real-valued function of the values of K other epistatically linked genes and itself. Each gene's function and epistatic linkages are chosen randomly and then held constant to define a static random landscape. N is the dimensionality of the fitness landscape, and K its ruggedness. K = 0 gives a single-optimum landscape, and K = N-1 a maximally rugged (uncorrelated) one.

The GA described earlier (but with a population of 1000, a bitwise mutation probability of 0.005, and no elitism) was applied to a random N=20, K=10 landscape (footnote 1). After 100 generations, the fittest individual was taken, and the mean fitness decrease caused by single mutations was calculated (averaged over all possible single mutations). Then, a new random N=20, K=10 landscape was generated, and exhaustive search was used to find the genotype with fitness closest to the one found by the GA on the other landscape. The mean fitness decrease of this non-evolved individual in the presence of single mutations was then compared to that of the evolved one. Repeating the entire process 250 times (until the results were statistically significant) showed that the fitness decrease of the evolved individuals in the presence of a single mutation was 5% less than for non-evolved ones of equal fitness (footnote 2), on average.

Footnote 1: Future work will determine the effect of varying these parameters.

Footnote 2: If the exhaustive search could not find an individual with fitness very close to the evolved one, or if the GA produced a freak very-poor result, then that trial was discarded.

It can be tentatively concluded that when using a GA, the population will tend to converge upon a high-fitness region of the fitness landscape in which single mutations are less deleterious, on average, than if a similar result had been arrived at through non-evolutionary means (footnote 3). Therefore, if the introduction of some type of fault affects the phenotype in the same way as would a genetic mutation, then the evolved system will automatically tend to tolerate a fault of that type better than a non-evolved (designed) system would. The effect will certainly depend upon the shape of the particular fitness landscape, the mutation rate and the population size, and it is not yet known if it is of any practical importance: the 5% improvement seen on N=20, K=10 landscapes is small.

Footnote 3: This may also apply, to a decreasing extent, to greater numbers of simultaneous mutations.

As mentioned earlier, this phenomenon should cause the evolved DSM wall-avoider to be less sensitive to SSA faults in the RAM memory array than equivalent non-evolved DSMs. Unfortunately, it has not been feasible to produce DSMs as good as the evolved ones by other means (such as design), and it would be too time-consuming to repeat the comparison enough times for it to be statistically significant. However, Figure 3 shows that the evolved wall-avoider DSM is quite robust to adverse SSA faults (observation of the robot's qualitative behaviour bears this out), but it is not known how much is due to the effect described above, and how much is simply a property of the DSM architecture. The 32 possible adverse SSA faults were each emulated in turn by writing the opposite value to that specified by the genotype to the RAM bit in question. For each fault, the DSM was then used to control the robot (in the virtual environment) for sixteen 90-second runs from the same starting position, and the average fitness was measured to give the data in the figure.

Figure 3: Sensitivity to adverse SSA faults. (Plot: mean fitness, on an axis running from 1.60 to 2.60, for each of the 32 different SSA faults, with reference levels marked "No Faults", "Mean Faulty" and "Mean Random".)

This section has shown that when the introduction of some type of fault has the same effect on the phenotype as would a genetic mutation, then evolved systems will tend to be less sensitive to those faults than equivalent systems produced by non-evolutionary means. The observation is not confined to SSA faults: if, for example, the connectivity matrix of a neural network is directly encoded onto the genotype, then evolved networks should tend to be less sensitive to spurious breaking and creation of connections than non-evolved ones. The magnitude of the effect in a practical application will depend upon the particular fitness landscape; it is not yet known if it is great enough to be useful.

3.2 With an Environment of Faults

The argument of the previous section applies only when the genetic encoding is such that the faults of interest have the same phenotypic effect as a genetic mutation. What about faults and encoding schemes where that is not so? What if greater tolerance to faults is required than can be obtained in that way? Then the evolving system needs to be deliberately subjected to the faults of interest during its fitness evaluations, so that tolerance to them is an explicit part of the task to be performed (footnote 4): the phenotype must operate in an environment of faults. The exposure to faults can most easily be done in a software simulation, but some fault emulation is also possible in an evolvable hardware architecture: the ability to introduce SSA faults into the DSM's RAM is an example (see previous section).

Footnote 4: There is also the possibility that the Baldwin effect could occur, aiding the evolutionary process [7].

A problem with the "environment of faults" method arises when only a small proportion of the possible faults of interest have a serious effect on the system, but it is not known beforehand which those will be: it depends on how the system happens to evolve. If, when assessing the fitness of an individual, it is not subjected to all of the faults during its evaluation, but rather to a random selection of them, then it will often be those individuals which are lucky enough not to encounter any crucial faults that score best, instead of those which are actually better. Such very noisy fitness evaluations reduce the efficacy of the evolutionary process. For all but the smallest systems, it is prohibitively time-consuming to test each individual in the presence of every possible fault, so some way of adaptively choosing those faults likely to disrupt the evolving system is required.

Hillis [8] faced a directly analogous problem in generating test cases for the evaluation of evolved sorting networks. The networks quickly evolved to sort all but a few test cases correctly, but it could not be determined a priori which would be the "problem" tests. Hillis' solution was to co-evolve test cases along with the sorting networks: the networks were scored according to how well they sorted the test cases, and the test cases by how well they found flaws in the sorters. The continuous and automatic adaptation of test cases by co-evolution was found to be superior to simply varying the test cases over time or over the two-dimensional grid upon which the population was spatially distributed.

Hillis' result strongly suggests that the use of a co-evolving population of faults may be a way to subject individuals to faults during their evaluations, but without wasting time on faults to which they are already robust. It may then be possible to evolve tolerance to all of a large set of faults of interest, because the co-evolving faults would soon adapt to thwart a group of individuals that could be seriously affected by any subset of them. There is a danger that the co-evolving populations will become trapped in a cycle, without making useful progress: more empirical investigation into the applicability of this approach is needed.

3.3 By Exploiting Resources

If a particular defect persists for an extended period of time while the system is evolving, then the behaviour of the faulty part becomes just another component to be used: the evolutionary algorithm does not "know" that the part is supposed to do something else. For example, one of the SSA faults (the one marked with an arrow in Figure 3) was introduced as a permanent feature in the DSM, and the evolved controller was allowed to evolve some more. At first, the fitness of the population was dramatically lowered, with none of the individuals performing as well as the best of the population used to, but after 10 generations the mean and best fitnesses of the population had recovered to their previous values. In this case, the faulty part was tolerated rather than used, but in general this need not be so. This mode of fault tolerance may prove useful when transferring an evolved system between pieces of hardware having different defects, or to cope with slowly changing faults in the same hardware.

3.4 By Redundancy

This paper has concentrated on how the nature of the evolutionary process may be used to produce designs that are inherently fault-tolerant. However, the work reported in [9, 10, 11] shows that the more traditional fault tolerance technique of redundancy (the use of spares when faults are identified) may be integrated into an evolutionary framework. A special architecture for a field-programmable gate array integrated-circuit is presented, which supports the "embryological" development of a circuit specified by a genome (which could be evolved). During this development, and even during run-time, if some of the self-testing cells of the array are found to be faulty, the chip can automatically redistribute the expression of the genome so as to avoid those cells. This promising approach implies that the use of artificial evolution may be able to augment the highly effective fault-tolerance techniques already developed for hand-designed systems.

4 Conclusion

Traditionally, humans design fault-tolerant systems by providing spare parts. In contrast, artificial evolution can produce systems that are inherently tolerant to faults by the nature of their construction, without explicit redundancy. Viewing artificial evolution as an automatic design process, fault-tolerance can be integrated with the behavioural requirements and respected in all aspects of the design. Some insensitivity to faults that have the same influence on the system as a genetic mutation will tend to arise "for free." Tolerance to an arbitrary and large set of faults can possibly be achieved efficiently through the use of a co-evolving population of faults, which adaptively targets weak-spots. Implementation defects that are permanent or slowly changing may even have whatever properties they happen to exhibit put to use. Finally, the evolutionary approach can be used as well as more traditional redundancy methods. This early study suggests that artificial evolution may be well suited to the difficult, yet rewarding, challenge of fault-tolerant design, but much more empirical investigation is needed.

5 Acknowledgements

This research is funded by a D.Phil. scholarship from the School of Cognitive and Computing Sciences, for which I am very grateful. Special thanks to Phil Husbands, Dave Cliff and Inman Harvey.

References

[1] Moritoshi Yasunaga et al. Design, fabrication and evaluation of a 5-inch wafer scale neural network LSI composed of 576 digital neurons. In Int Joint Conf on Neural Networks (IJCNN'91), volume II, pages 527-535. IEEE, New York, 1991.

[2] Adrian Thompson. Evolving electronic robot controllers that exploit hardware resources. In F. Moran et al., editors, Advances in Artificial Life: Proceedings of the 3rd European Conference on Artificial Life (ECAL95), pages 640-656. Springer-Verlag, 1995.

[3] David J. Comer. Digital Logic & State Machine Design. Holt, Rinehart and Winston, 1984.

[4] David E. Goldberg. Genetic Algorithms in Search, Optimisation & Machine Learning. Addison-Wesley, 1989.

[5] Stuart A. Kauffman. The Origins of Order. Oxford University Press, 1993.

[6] M. Eigen. New concepts for dealing with the evolution of nucleic acids. In Cold Spring Harbor Symposia on Quantitative Biology, volume LII, 1987.

[7] G.E. Hinton and S.J. Nowlan. How learning guides evolution. Complex Systems, 1:495-502, 1987.

[8] W. Daniel Hillis. Co-evolving parasites improve simulated evolution as an optimization procedure. In C. Langton et al., editors, Artificial Life II, pages 313-324. Addison-Wesley, 1992.

[9] P. Marchal et al. Embryological development on silicon. In R. Brooks and P. Maes, editors, Artificial Life IV, pages 365-366. MIT Press, 1994.

[10] P. Marchal and A. Stauffer. Binary decision diagram oriented FPGAs. In ACM International Workshop on Field-Programmable Gate Arrays (FPGA'94), Berkeley, February 1994.

[11] S. Durand and C. Piguet. FPGA with self-repair capabilities. In ACM Int Workshop on Field-programmable gate arrays (FPGA'94), Berkeley, February 1994.
