Fail-Safe Mechanism

Suppose I am developing a fail-safe mechanism for an Arduino (or any other microcontroller).
In other words, a secondary microcontroller or a separate board should take over
when the primary controller fails.
Two possible mechanisms are as follows.
Method 1 - Client Server Mechanism
There are 2 identical systems which are powered separately.

The secondary system sends a request periodically and the primary system replies.

If the primary system fails to reply several times in a row, the secondary system takes
charge.
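
The request/reply scheme above could be sketched on the secondary board roughly as follows. This is only a minimal sketch under assumptions: the class name `FailoverMonitor`, the method `onPollResult` and the miss threshold are hypothetical, and the actual transport (serial, I2C, a GPIO line) is left out entirely.

```cpp
#include <cassert>

// Hypothetical helper for the secondary board in Method 1.
// It counts consecutive missed replies and takes over at a threshold.
class FailoverMonitor {
public:
    explicit FailoverMonitor(int maxMisses) : maxMisses_(maxMisses) {
        assert(maxMisses > 0);  // a threshold of zero would make no sense
    }

    // Call once per polling period with whether the primary replied in time.
    // Returns true once the secondary should be in charge.
    bool onPollResult(bool replied) {
        if (inCharge_) return true;       // takeover is latched
        if (replied) {
            misses_ = 0;                  // any reply clears the streak
        } else if (++misses_ >= maxMisses_) {
            inCharge_ = true;             // too many misses in a row
        }
        return inCharge_;
    }

    bool inCharge() const { return inCharge_; }

private:
    int  maxMisses_;
    int  misses_   = 0;
    bool inCharge_ = false;
};
```

Latching the takeover is a design choice in this sketch: once the secondary is in charge, a late reply from the primary does not hand control back, which avoids both boards driving the outputs at once.
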
Method 2 - Heartbeat Mechanism
There are 2 identical systems which are powered separately.

The primary system sends a periodic heartbeat message.

If the heartbeat is present, the secondary node knows that the primary node is up.

When there is no heartbeat, the primary node is assumed to be dead and the secondary node
takes control.
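
A heartbeat check like this usually comes down to a timeout on the secondary node. Here is a minimal sketch, assuming a millisecond timestamp is available (on an Arduino that would typically come from `millis()`); the class name `HeartbeatMonitor` and its methods are hypothetical.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical helper for the secondary node in Method 2.
class HeartbeatMonitor {
public:
    explicit HeartbeatMonitor(uint32_t timeoutMs, uint32_t nowMs = 0)
        : timeoutMs_(timeoutMs), lastBeatMs_(nowMs) {
        assert(timeoutMs > 0);
    }

    // Call whenever a heartbeat message arrives from the primary.
    void onHeartbeat(uint32_t nowMs) { lastBeatMs_ = nowMs; }

    // True while the last heartbeat is more recent than the timeout.
    bool primaryAlive(uint32_t nowMs) const {
        // Unsigned subtraction stays correct when the millisecond counter wraps.
        return static_cast<uint32_t>(nowMs - lastBeatMs_) < timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t lastBeatMs_;
};
```

The timeout would normally be set to a few heartbeat periods, so that a single delayed message does not trigger a takeover.
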
Do you guys know any better mechanism to implement this?
Typically in commercial embedded systems, a watchdog timer is used to reset the
processor if it stops responding. The firmware has to keep "kicking the dog" periodically;
if it fails to do so, the timer expires and forces a reset. All AVR
microcontrollers (and many if not most other brands as well) have an internal watchdog
timer. However, a design with an independent, external watchdog timer is typically more
robust and reliable.

[Figure: a design with an external watchdog timer]
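
For the internal AVR watchdog, a minimal Arduino-style sketch might look like the following. `wdt_enable()`, `wdt_reset()` and `WDTO_2S` come from avr-libc's `<avr/wdt.h>`; `doWork()` is a hypothetical placeholder for the application, and the `#else` branch contains stand-ins that exist only so the logic can be compiled off-target.

```cpp
#ifdef __AVR__
#include <avr/wdt.h>            // avr-libc watchdog API
#else
#include <cassert>
// Host-side stand-ins (assumptions) so the logic can be compiled off-target.
#define WDTO_2S 7
static bool g_wdtEnabled = false;
static int  g_wdtKicks   = 0;
inline void wdt_enable(int /*timeout*/) { g_wdtEnabled = true; }
inline void wdt_reset() { assert(g_wdtEnabled); ++g_wdtKicks; }
#endif

// Hypothetical application step; returns false if something went wrong.
bool doWork() { return true; }

void setup() {
    wdt_enable(WDTO_2S);        // reset the MCU if not kicked within about 2 s
}

void loop() {
    if (doWork()) {
        wdt_reset();            // "kick the dog" only after a healthy pass
    }
    // If doWork() hangs or keeps failing, the watchdog expires and resets the MCU.
}
```

Kicking only after a successful pass is deliberate here: calling `wdt_reset()` unconditionally (for example from a timer interrupt) would keep the watchdog quiet even when the main loop is stuck. With an external watchdog chip the same pattern applies, except that the kick is typically a pulse on a GPIO pin.
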

For systems that require an even higher degree of fault tolerance, for instance aerospace
applications, triple redundant or triple modular redundant architectures are used.
In a triple redundant system, three identical processing components perform the same task
at the same time. Their results are then sent to a voting circuit, or what John von Neumann
called a "majority organ" (Section 4.2.2). The output of the voting circuit is the majority
opinion of the three processing components.
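
In software, a two-out-of-three vote can be sketched with the classic bitwise majority function. This is only an illustration of the voting idea, not a substitute for a hardware voter; the function names `majority3` and `majorityExample` are hypothetical.

```cpp
#include <cassert>
#include <stdint.h>

// Bitwise 2-of-3 majority: each output bit takes the value that at least
// two of the three inputs agree on.
inline uint8_t majority3(uint8_t a, uint8_t b, uint8_t c) {
    return static_cast<uint8_t>((a & b) | (a & c) | (b & c));
}

// Usage example: one corrupted channel is outvoted by the other two.
inline void majorityExample() {
    assert(majority3(0x5A, 0x5A, 0xFF) == 0x5A);
}
```

Note that the vote is taken per bit, so if all three inputs differ the result can be a value that none of the channels produced. A voter that compares whole values and flags a three-way disagreement may be preferable in that case.
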

This allows for one of the processing components to fail without affecting the operation of
the system. However, if the voting circuit fails, then the whole system fails as well. A triple
modular redundant system does away with this single point of failure by implementing three
voting circuits as well.

Eventually, though, the three outputs need to be combined into one result again, which
reintroduces a single point of failure, even if that point of failure is a human looking
at three gauges that each monitor the same temperature.
What you need to determine is just how fault-tolerant your system needs to be and what
mean time between failures (MTBF) it can tolerate. Then design your
redundancy system around that.
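
As a rough way to put numbers on that trade-off, the usual textbook simplification assumes independent failures, a constant failure rate and a perfect voter (all assumptions, not properties of any particular board). The reliability of a single unit over a mission time t, and of a two-out-of-three arrangement of such units, is then:

```latex
R(t) = e^{-t/\mathrm{MTBF}}, \qquad
R_{\mathrm{TMR}}(t) = R(t)^3 + 3R(t)^2\bigl(1 - R(t)\bigr) = 3R(t)^2 - 2R(t)^3
```

For example, if a single unit has R = 0.9 over the mission time, the voted system has R_TMR = 3(0.81) - 2(0.729) = 0.972. Under these assumptions the triple arrangement only helps while R is above 0.5; below that, three unreliable units vote worse than one.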
