DONE BY

M.SIVACHANDRAN
(B1713564)

ABSTRACT
Grid and cluster architectures are widely used for computationally
intensive parallel applications.

The infrastructure, consisting of computational nodes, mass storage, and
interconnection networks, is very complex.

The Mean Time to Failure (MTTF) decreases as the complexity of the
infrastructure grows.

Fault tolerance is, thus, a necessity to avoid failure in
large applications.
PROBLEM IDENTIFICATION
Any process can be checkpointed at any time.

An alternative approach relaxes the constraint that processes must
always be checkpointable.

This protocol has been implemented within ProActive.
PROBLEM SOLUTION
The fault-tolerance mechanisms are called Theft-Induced
Checkpointing (TIC) and Systematic Event Logging (SEL).

Specifically, the protocols base the state of the
execution on a dataflow graph.

The result is efficient recovery in dynamic heterogeneous systems
as well as multithreaded applications.

SYSTEM REQUIREMENTS
Hardware Requirements:

System : Dual Core 2.6GHz.

Hard Disk : 160 GB.

Monitor : 15-inch VGA Colour.

Mouse : Logitech.

RAM : 1 GB or more.




SYSTEM REQUIREMENTS
Software Requirements:

Operating system : - Windows XP Professional.

Coding Language : - Java, Swing

Tool Used : - NetBeans 6.9.1

PROJECT MODULES
Network Module

Logging Module

Check-pointing Module

Work Stealing Module

Fault and Fault Free Module

NETWORK MODULE
Clients and servers operate over a computer network
on separate hardware.

Clients therefore initiate communication sessions with
servers, which await (listen for) incoming requests.
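
The client/server exchange described here can be sketched in Java with plain sockets. This is a minimal illustration, not the project's actual network code: the class name ClientServerSketch, the method names, and the "ack:" reply format are all assumptions made for the example. Both roles run in one JVM over the loopback interface so the sketch is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Illustrative only: all names here are hypothetical, not taken from the project.
class ClientServerSketch {

    // Server side: wait for one client, read one line, reply with an acknowledgement.
    static void serveOnce(ServerSocket server) {
        try (Socket socket = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println("ack:" + in.readLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Client side: initiate the session, send one request, return the server's reply.
    static String request(InetAddress host, int port, String message) {
        try (Socket socket = new Socket(host, port);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(message);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs both roles in one process; port 0 lets the OS pick a free port.
    static String roundTrip(String message) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> serveOnce(server));
            serverThread.start();
            String reply = request(server.getInetAddress(), server.getLocalPort(), message);
            serverThread.join();
            return reply;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

In a real deployment the server and each client would run on separate machines, and the server would loop on accept() instead of handling a single session.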
LOGGING MODULE
Logging can be classified as pessimistic, optimistic, or
causal.




A log-based mechanism in which the only
nondeterministic events in a system are message
receptions is usually referred to as message logging.
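
Message logging can be illustrated with a small Java sketch. It assumes the pessimistic variant (each message is written to the log before it is applied) and a process whose state depends only on the messages it receives. The class MessageLogSketch, its methods, and the use of an in-memory list as a stand-in for stable storage are assumptions for the example, not the project's code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a process whose state is fully determined by received messages.
class MessageLogSketch {
    // Stand-in for stable storage; a real system would use disk or a remote log.
    private final List<String> stableLog = new ArrayList<>();
    // Process state: here simply the running sum of the received values.
    private long state = 0;

    // Pessimistic logging: record the message first, then apply it.
    void receive(String message) {
        stableLog.add(message);
        apply(message);
    }

    // Deterministic state transition driven by one message.
    private void apply(String message) {
        state += Long.parseLong(message);
    }

    long state() {
        return state;
    }

    // Returns a copy of the log so callers cannot modify the stored entries.
    List<String> log() {
        return new ArrayList<>(stableLog);
    }

    // Recovery: start from the initial state and replay the logged messages in order.
    static MessageLogSketch recover(List<String> log) {
        MessageLogSketch recovered = new MessageLogSketch();
        for (String message : log) {
            recovered.receive(message);
        }
        return recovered;
    }

    public static void main(String[] args) {
        MessageLogSketch process = new MessageLogSketch();
        process.receive("3");
        process.receive("4");
        MessageLogSketch recovered = recover(process.log());
        System.out.println(recovered.state() == process.state());
    }
}
```

Because message reception is the only nondeterministic event, replaying the log in order reproduces the state the process had before the failure.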

CHECK-POINTING METHOD

Checkpointing relies on periodically saving the state of
the computation to stable storage.



A consistent global state can be achieved either at
the time of checkpointing or at the time of rollback
recovery.
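
Periodic checkpointing with rollback can be sketched for a single process as follows. This is a simplified illustration under stated assumptions: the state is a small serializable object, "stable storage" is a byte array in memory, and one failure is simulated at a chosen step. The names CheckpointSketch, State, and run are hypothetical, and the sketch does not address consistency across several processes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Illustrative only: single-process checkpoint and rollback.
class CheckpointSketch {

    // The computation state that must survive a failure.
    static class State implements Serializable {
        private static final long serialVersionUID = 1L;
        int step;
        long sum;
    }

    // Save the state; the byte array stands in for stable storage.
    static byte[] checkpoint(State state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild the state from the last saved checkpoint.
    static State restore(byte[] saved) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(saved))) {
            return (State) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sums 1..totalSteps, checkpointing every `interval` steps.
    // One failure is simulated once the computation reaches step `failAtStep`.
    static long run(int totalSteps, int interval, int failAtStep) {
        State state = new State();
        byte[] saved = checkpoint(state);
        boolean failed = false;
        while (state.step < totalSteps) {
            if (!failed && state.step == failAtStep) {
                failed = true;
                state = restore(saved); // rollback: work since the checkpoint is redone
                continue;
            }
            state.step++;
            state.sum += state.step;
            if (state.step % interval == 0) {
                saved = checkpoint(state);
            }
        }
        return state.sum;
    }

    public static void main(String[] args) {
        System.out.println(run(10, 3, 7));
    }
}
```

The result is the same with or without the simulated failure; the failure only costs the work done since the last checkpoint, which is why the checkpoint interval matters.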
WORK-STEALING METHOD
In the runtime environment, the primary mechanism for
load distribution is a scheduling algorithm called
work-stealing.



The principal mechanism for dispatching tasks in the
distributed environment is task stealing.
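
The idea of task stealing can be sketched with two workers and per-worker task queues. This is a single-threaded simulation under stated assumptions: the owner takes tasks from one end of its own queue, an idle worker steals from the opposite end of the other worker's queue, and the workers take turns. The names WorkStealingSketch, Worker, and simulate are hypothetical and do not come from the project.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: a turn-based simulation of work-stealing between two workers.
class WorkStealingSketch {

    static class Worker {
        final String name;
        final Deque<Integer> tasks = new ArrayDeque<>();
        int executed = 0;

        Worker(String name) {
            this.name = name;
        }

        // The owner adds and removes work at the tail of its own queue.
        void push(int task) {
            tasks.addLast(task);
        }

        Integer popLocal() {
            return tasks.pollLast();
        }

        // A thief takes the oldest task from the head of the victim's queue.
        Integer stealFrom(Worker victim) {
            return victim.tasks.pollFirst();
        }
    }

    // One turn: run a local task if there is one, otherwise try to steal one.
    static void step(Worker self, Worker other) {
        Integer task = self.popLocal();
        if (task == null) {
            task = self.stealFrom(other);
        }
        if (task != null) {
            self.executed++;
        }
    }

    // All tasks start on worker A; returns how many tasks A and B executed.
    static int[] simulate(int taskCount) {
        Worker a = new Worker("A");
        Worker b = new Worker("B");
        for (int task = 0; task < taskCount; task++) {
            a.push(task);
        }
        while (!a.tasks.isEmpty() || !b.tasks.isEmpty()) {
            step(a, b);
            step(b, a);
        }
        return new int[] {a.executed, b.executed};
    }

    public static void main(String[] args) {
        int[] counts = simulate(10);
        System.out.println("A=" + counts[0] + " B=" + counts[1]);
    }
}
```

Even though every task starts on worker A, worker B ends up doing part of the work purely by stealing, with no central dispatcher involved. A real implementation would run the workers concurrently and use a thread-safe queue.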


FAULT AND FAULT FREE MODULE
It is of special interest to analyze the overhead
associated with fault-free execution.




Faults are considered to be the rare exception rather
than the norm.
Server form

Client A

Client B

Client C

Client A process

Client B process

Client C process

Conclusion
To overcome the problem of applications executing in
large systems where the MTTF approaches the execution
time of the application, two fault-tolerant protocols,
TIC and SEL, were introduced.


The experimental results confirmed the theoretical
analysis and demonstrated the low overhead of both
approaches.

Future Enhancements
The implementation stage involves careful planning,
investigation of the existing system and its constraints
on implementation, design of methods to achieve
changeover, and evaluation of changeover methods.