
Fault Tolerance

A fault-tolerant system
masks failures
exhibits well-defined failure behavior
Failure types
process deaths
machine crashes
network failures: link failures, network partitions, message losses
Remedies
No single point of failure
Fault isolation to the failing component
Fault containment to prevent propagation of the failure
Availability of reversion modes
Global atomicity
Fault Tolerance in Project ICTP
Potential failure points
Time server computation
Time server physical crash
Network delay on the client machine's time request
Wrong or late time response
Time server unresponsive
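The late-response and unresponsive-server cases can be detected on the client side. The sketch below is a minimal Python illustration, assuming a Cristian-style request/response exchange (a technique the slides do not name) and a hypothetical `request_fn` that stands in for the real network call to the time server. The 0.5 s `max_rtt` bound is an arbitrary placeholder, not a project value.

```python
import time
from typing import Callable, Optional


def estimate_time(request_fn: Callable[[], float],
                  max_rtt: float = 0.5,
                  clock: Callable[[], float] = time.monotonic) -> Optional[float]:
    """Estimate the time server's current time, Cristian-style.

    `request_fn` is a hypothetical stand-in for the network call; it returns
    the server's timestamp or raises OSError (which includes TimeoutError and
    ConnectionError) when the server cannot be reached.
    Returns None when the server is unresponsive or its answer is too late.
    """
    t0 = clock()
    try:
        server_time = request_fn()
    except OSError:
        return None          # unresponsive server: crash or network failure
    rtt = clock() - t0
    if rtt > max_rtt:
        return None          # timing failure: the response arrived too late
    # Compensate for network delay, assuming the reply spent about half
    # of the round trip on the wire.
    return server_time + rtt / 2
```

Passing `clock` explicitly lets the round-trip logic be exercised without a real network; in production the default monotonic clock is used so that local clock adjustments do not distort the measured delay.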
Failure Type: Service Failure
Failure Type: Time Failure
Failure Type: Response Failure
occurs when the time server responds incorrectly
value failure: the value of its output is incorrect
state-transition failure: the state of the server is incorrect
The time server may generate incorrect output or make an incorrect state transition

Remedy: deploy a duplicate service point
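A duplicate service point lets a client cross-check answers and detect a value failure. The sketch below is a hypothetical illustration, not the project's design: it takes readings from the primary and its duplicates, accepts the median only when a strict majority agree within an assumed `tolerance` of 0.05 s, and otherwise reports the reading as untrustworthy.

```python
import statistics
from typing import Optional, Sequence


def agreed_time(readings: Sequence[float], tolerance: float = 0.05) -> Optional[float]:
    """Cross-check time readings from the primary and duplicate service points.

    Returns the median reading when a strict majority of readings lie within
    `tolerance` seconds of it; otherwise returns None, signalling a possible
    response (value) failure that the client should not trust.
    """
    if not readings:
        return None
    median = statistics.median(readings)
    agreeing = sum(1 for r in readings if abs(r - median) <= tolerance)
    if agreeing * 2 > len(readings):
        return median
    return None
```

With only two service points a disagreement can be detected but not resolved, since there is no majority; with three or more, a single faulty server is outvoted and the failure is masked.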
Failure Type: Crash Failure
the time server crashes, i.e., it physically stops providing service
the time server fails to deliver service
the time server is out of service
if the time server crashes, the whole time synchronization process fails
Remedies: Crash Failure
two extra Data Recovery (DR) time servers are needed
heartbeat-based duplication of our central time server
if the central server fails or crashes, a DR server takes over the service
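The heartbeat-based takeover can be sketched as a monitor that records the last heartbeat from each server and hands the service to the first healthy one in priority order. This is a minimal sketch under stated assumptions: the server names (`"central"`, `"dr1"`, `"dr2"`), the 3 s timeout, and the `HeartbeatMonitor` class itself are hypothetical, and the transport that delivers heartbeats is left out.

```python
import time
from typing import Callable, Dict, List, Optional


class HeartbeatMonitor:
    """Track heartbeats from the central time server and its DR standbys.

    The active server is the first one, in priority order, whose most recent
    heartbeat is no older than `timeout` seconds.
    """

    def __init__(self, servers: List[str], timeout: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.servers = servers          # priority order: central first, then DR servers
        self.timeout = timeout
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, server: str) -> None:
        """Record that `server` has just reported that it is alive."""
        self.last_seen[server] = self.clock()

    def active(self) -> Optional[str]:
        """Return the server that should currently provide the time service."""
        now = self.clock()
        for server in self.servers:
            seen = self.last_seen.get(server)
            if seen is not None and now - seen <= self.timeout:
                return server
        return None                     # every server is considered crashed
```

For example, `HeartbeatMonitor(["central", "dr1", "dr2"])` keeps answering `"central"` while its heartbeats arrive; once they stop for longer than the timeout, `active()` switches to `"dr1"`. A timeout that is too short risks false failovers during brief network delays, so it should exceed the normal heartbeat interval by a comfortable margin.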
Conclusion
Failures can decrease the availability and reliability of our ICTS project
Sensitive service points and their solutions:
Dedicated data recovery server
Internet communication channel
Regular system monitoring
Multilevel and distributed check points
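A checkpoint lets a DR server resume from the last saved state of the central time server. The sketch below shows only a single local checkpoint level, not the multilevel, distributed scheme named above; the JSON state fields (`offset`, `seq`) and the file name are hypothetical. It writes to a temporary file and renames it into place so that a crash during the write never leaves a half-written checkpoint.

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_checkpoint(path: str, state: Dict[str, Any]) -> None:
    """Write `state` to `path` as JSON, replacing the old checkpoint atomically.

    The data goes to a temporary file in the same directory first and is then
    renamed over the old checkpoint with os.replace.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())        # make sure the bytes reach the disk
        os.replace(tmp_path, path)      # atomic rename on POSIX file systems
    except BaseException:
        os.unlink(tmp_path)             # discard the partial temporary file
        raise


def load_checkpoint(path: str) -> Dict[str, Any]:
    """Read the last checkpoint, e.g. when a DR server takes over."""
    with open(path) as f:
        return json.load(f)
```

A distributed variant would additionally copy each checkpoint to the DR servers, so the state survives the loss of the central machine's disk.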
