the long run. The former places the burden of incorporating fault-tolerance techniques on application programmers, while the latter only works for specialized applications. Even in cases where fault-tolerance techniques have been integrated into programming tools, these solutions have generally been point solutions, i.e., tool developers have started from scratch in implementing their solution and have neither shared nor reused any fault-tolerance code. A better way is to use the compositional approach, in which fault-tolerance experts write algorithms and encapsulate them into reusable code artifacts, or modules.
: In this approach, a fault-monitoring unit is attached to the grid. The base technique that most monitoring units follow is heartbeating, which is further classified into three types:
- Centralized Heartbeating
- Sending heartbeats to a central member creates a hotspot, an instance of high asymptotic complexity.
- Ring Based Heartbeating
- Heartbeating along a virtual ring suffers from unpredictable failure-detection times when there are multiple failures, an instance of the perturbation effect.
- All-to-All Heartbeating
- Sending heartbeats to all members causes the message load in the network to grow quadratically with group size, again an instance of high asymptotic complexity.
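The trade-offs among the three heartbeating schemes can be illustrated with a small sketch. It is not taken from any particular grid middleware: the names `heartbeat_load` and `HeartbeatMonitor` are hypothetical, the central member is assumed to be one of the `n` group members, and timestamps are passed in explicitly so that the behaviour is deterministic.

```python
def heartbeat_load(scheme: str, n: int) -> tuple[int, int]:
    """Return (total messages per round, max inbound messages at one member).

    Assumes a group of n members, each sending one heartbeat per target
    per round. These counts are an illustrative model, not measurements.
    """
    if scheme == "centralized":
        # Every other member reports to one central member: a hotspot.
        return n - 1, n - 1
    if scheme == "ring":
        # Each member sends to its successor on the virtual ring.
        return n, 1
    if scheme == "all-to-all":
        # Each member sends to every other member: quadratic total load.
        return n * (n - 1), n - 1
    raise ValueError(f"unknown scheme: {scheme}")


class HeartbeatMonitor:
    """Timeout-based detector: a member is suspected once its last
    heartbeat is older than `timeout` (same time unit as `now`)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, member: str, now: float) -> None:
        # Record the arrival time of the latest heartbeat from `member`.
        self.last_seen[member] = now

    def suspected(self, now: float) -> list[str]:
        # Members whose most recent heartbeat is older than the timeout.
        return sorted(
            m for m, t in self.last_seen.items() if now - t > self.timeout
        )


monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat("a", now=0.0)
monitor.heartbeat("b", now=0.0)
monitor.heartbeat("a", now=2.0)   # "b" stays silent after time 0
print(monitor.suspected(now=4.0))
print(heartbeat_load("all-to-all", 10))
```

In this model the centralized scheme keeps the total load linear but concentrates all of it on one member, whereas the all-to-all scheme spreads the inbound load evenly at the cost of a quadratic total.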
: Checkpointing and rollback recovery provide an effective technique for tolerating transient resource failures and for avoiding total loss of results. Checkpointing involves saving enough state information of an executing program on stable storage so that, if required, the program can be re-executed starting from the state recorded in the checkpoints. Checkpointing distributed applications is more complicated than checkpointing non-distributed ones. When an application is distributed, the checkpointing algorithm has to capture not only the state of all individual processes but also the state of all the communication channels. Checkpointing is basically divided into two types:
- Uncoordinated Checkpoint
: In this approach, each of the processes that are part of the system determines its local checkpoints individually. During restart, these checkpoints have to be searched in order to construct a consistent global checkpoint.
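The search for a consistent global checkpoint during restart can be sketched as a rollback-propagation loop. This is a minimal illustration under stated assumptions, not the algorithm of any specific system: each local checkpoint is assumed to record cumulative counts of messages sent to and received from every peer, each process's first checkpoint is assumed to be its initial state with no messages received, and the names `LocalCheckpoint` and `find_recovery_line` are hypothetical. A set of checkpoints is treated as consistent when no process has recorded receiving more messages from a peer than that peer has recorded sending.

```python
from dataclasses import dataclass, field


@dataclass
class LocalCheckpoint:
    # Cumulative message counts at the time the checkpoint was taken.
    sent: dict = field(default_factory=dict)      # peer id -> messages sent
    received: dict = field(default_factory=dict)  # peer id -> messages received


def find_recovery_line(history):
    """history[p] lists the checkpoints of process p, oldest first.

    Returns one checkpoint index per process forming the most recent
    consistent global checkpoint under the counting model above.
    """
    if any(any(h[0].received.values()) for h in history):
        raise ValueError("first checkpoint must be the initial state")

    line = [len(h) - 1 for h in history]  # start from the latest checkpoints
    changed = True
    while changed:
        changed = False
        for i, hist_i in enumerate(history):
            for j, hist_j in enumerate(history):
                if i == j:
                    continue
                sent = hist_i[line[i]].sent.get(j, 0)
                # Orphan message: j recorded a receive that i never recorded
                # sending. Roll j back until the orphan disappears.
                while hist_j[line[j]].received.get(i, 0) > sent:
                    line[j] -= 1
                    changed = True
    return line


# Process 1 checkpoints and then sends a message; process 0 receives it
# and then checkpoints. Process 0's latest checkpoint holds an orphan.
history = [
    [LocalCheckpoint(), LocalCheckpoint(received={1: 1})],  # process 0
    [LocalCheckpoint(), LocalCheckpoint()],                 # process 1
]
print(find_recovery_line(history))
```

Because rolling one process back also lowers its recorded send counts, a single rollback can force further rollbacks on other processes, which is why the outer loop repeats until no index changes.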
- Coordinated Checkpoint
: In this approach, the checkpointing is orchestrated such that the set of individual checkpoints always results in a consistent global checkpoint. This minimizes the storage overhead, since only a single global checkpoint needs to be maintained on
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 7, October 2010. http://sites.google.com/site/ijcsis/ ISSN 1947-5500