978-1-4673-6747-9/15/$31.00 © 2015 IEEE
or the server failure in the distributed cloud services. Malladi and Rao [6] use checkpointing and message logging: an efficient coordinated checkpointing protocol combined with limited sender-based pessimistic message logging.
Zheng and Lyu [7] proposed an approach in which nodes test one another and observe the network to detect the presence of a fault. The model yields a distributed algorithm that tries to let every network node independently reach a correct diagnosis of the condition (faulty or non-faulty) of all the network nodes and of the intermediate nodes with communication facilities. Girault et al. [10] proposed a solution that automatically produces a fault-tolerant distributed schedule of a given algorithm onto a given distributed architecture, where each operation of the algorithm is replicated on different processors. Calheiros and Buyya [11] proposed an algorithm that uses the idle time of provisioned resources and budget surplus to replicate tasks. Zhou and Jiang [12] proposed a composite self-adaptable hierarchical fault-tolerant scheme that integrates and extends the ideas of centralized and distributed fault-tolerant methods; it applies system fault tolerance in the sequence 'first centralized, then distributed'. Altaf et al. [13] developed a supervised Artificial Neural Network (ANN) approach to identify various fault types, such as Broken Rotor Bar (BRB), as well as the location of the fault event within an industrial motor network.

III. PROPOSED ARCHITECTURE
Communication in the distributed cloud environment is accomplished with the help of service-oriented architecture protocols, the Common Object Request Broker Architecture (CORBA), etc. These techniques are deployed at the middleware layer of the distributed cloud environment. In Fig. 1 the number of cloud clients grows dynamically; a client may be a single cloud client, a business organization, several users, etc. Their resource requirements may be the same or different, depending on how they use the cloud as a service.
In Fig. 1, when thousands of users are being handled, the user requests are taken by the active server. In this case the replication mechanism is used: the cloud client's application processes are copied and allocated to other servers, which makes load balancing easy to handle. Various technologies have been developed, such as the Hadoop MapReduce function [4], whose purpose is to divide the tasks and allocate them to other servers located in different locations. The Hadoop MapReduce function is implemented at the middleware layer of the distributed environment. But if the demand and the number of users increase dynamically, there is a chance of failure, most often server failure, including hard disk failure and failures of the Redundant Array of Independent Disks (RAID), memory modules, replacement components, etc. One of the major failures found in the distributed cloud environment is the hard disk failure.
To provide an effective level of user transparency even in the presence of failure, the FDRM shown in Fig. 2 helps to detect the failed server in the distributed cloud environment. The job migration policy adopted by the FDRM helps to migrate the tasks interrupted by the failure to the replicated server and resume them there. In Fig. 2 an FDRM is deployed at the particular distributed location where the servers and replicated servers are working. The FDRM communicates with all the participating servers and the replicated servers. The FDRM first sends an acknowledgment to all the servers; the message contains information such as the aliveness or status of the servers, and it is sent at regular intervals of time.
In response, a message is delivered by every participating server or replicated server. If a server does not send the (aliveness) message to the FDRM, the FDRM again sends an acknowledgment to all the servers in that distributed location in order to determine their state. If a server or replicated server still does not send the message, the FDRM assumes a fault in that server. To deal with it, the FDRM drops the failed server and allocates the replicated server to the cloud users. If the replicated server fails, another replicated server is allocated to accomplish the tasks requested by the cloud clients.
The FDRM uses the check-pointing technique and the job migration policy to make the distributed environment fault free and effective. In order to determine the failed server, the FDRM saves the address of the server or the replicated server before and after the failure occurs. Job migration helps to allocate the replicated server in place of the failed server.
2015 IEEE 3rd International Conference on MOOCs, Innovation and Technology in Education (MITE) 79
Save the address (state) of the failed server using
check-pointing mechanism. Drop that failed server.
}
Else
{
Continue to process the cloud client's needs.
}
}
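The detection and failover behaviour described in Section III, together with the pseudocode fragment above, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and method names (`Server`, `FDRM`, `probe`, `run_round`), the `max_probes` retry count, and the choice to checkpoint a server's state on every successful probe are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Server:
    name: str
    alive: bool = True                         # simulated health flag (assumption)
    state: dict = field(default_factory=dict)  # task progress to be checkpointed

    def respond(self) -> bool:
        """Reply to a status request; a failed server stays silent."""
        return self.alive


class FDRM:
    """Toy fault detector and replication manager (illustrative only)."""

    def __init__(self, servers: List[Server], replicas: List[Server],
                 max_probes: int = 2) -> None:
        self.active = list(servers)      # servers currently serving clients
        self.spares = list(replicas)     # replicated servers held in reserve
        self.max_probes = max_probes     # hypothetical retry count
        self.checkpoints: Dict[str, dict] = {}

    def probe(self, server: Server) -> bool:
        # Ask for aliveness, asking again before declaring a fault.
        return any(server.respond() for _ in range(self.max_probes))

    def _next_live_spare(self) -> Optional[Server]:
        # If a replicated server has failed too, try the next one.
        while self.spares:
            spare = self.spares.pop(0)
            if self.probe(spare):
                return spare
        return None

    def run_round(self) -> List[str]:
        """One detection interval; returns the names of dropped servers."""
        dropped = []
        for i, server in enumerate(self.active):
            if self.probe(server):
                # Checkpoint the last known good state, then keep serving.
                self.checkpoints[server.name] = dict(server.state)
                continue
            dropped.append(server.name)          # drop the failed server
            spare = self._next_live_spare()
            if spare is None:
                raise RuntimeError("no live replicated server available")
            # Job migration: resume from the failed server's checkpoint.
            spare.state = dict(self.checkpoints.get(server.name, {}))
            self.active[i] = spare
        return dropped
```

Calling `run_round()` once per interval corresponds to the periodic status message; a server that stays silent across all probes is replaced by the first replicated server that still answers.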
In Fig. 5 all the nodes/servers respond to the FDRM by sending it a message declaring their aliveness. In Fig. 6, suppose node 2 fails before it sends the notification of its aliveness to the FDRM.
Fig. 7. Replication of a node's data involving its adjacent nodes, i.e. nodes 1, 6, and 3.
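The recovery illustrated in Figs. 6 and 7, where the data of a failed node is supplied by an adjacent node that holds a replica, can be sketched as follows. This is a simplified model under stated assumptions rather than the paper's simulator code: the `Cluster` class, its methods, and the convention that each neighbor list is ordered nearest first are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


class Cluster:
    """Toy model of neighbor-based replication (illustrative only)."""

    def __init__(self, neighbors: Dict[int, List[int]]) -> None:
        # node id -> adjacent node ids, ordered nearest first (assumption)
        self.neighbors = neighbors
        self.alive = {n: True for n in neighbors}
        self.data: Dict[int, Any] = {n: None for n in neighbors}
        # holder id -> {owner id: replicated copy}
        self.replicas: Dict[int, Dict[int, Any]] = {n: {} for n in neighbors}

    def write(self, node: int, value: Any) -> None:
        """Store a value on a node and replicate it to its live neighbors."""
        self.data[node] = value
        for nb in self.neighbors[node]:
            if self.alive[nb]:
                self.replicas[nb][node] = value

    def fail(self, node: int) -> None:
        """Mark a node as failed; its local data is lost."""
        self.alive[node] = False
        self.data[node] = None

    def recover(self, failed: int) -> Optional[Tuple[int, Any]]:
        """Return (holder, copy) from the nearest live neighbor with a replica."""
        for nb in self.neighbors[failed]:
            if self.alive[nb] and failed in self.replicas[nb]:
                return nb, self.replicas[nb][failed]
        return None


if __name__ == "__main__":
    # Node 2 is adjacent to nodes 3, 1 and 6, with node 3 the nearest.
    cluster = Cluster({1: [2], 2: [3, 1, 6], 3: [2], 6: [2]})
    cluster.write(2, "payload")
    cluster.fail(2)
    print(cluster.recover(2))  # prints (3, 'payload')
```

Ordering each neighbor list by distance keeps the lookup simple: the first live neighbor holding a copy is, by construction, the nearest one.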
In Fig. 8 node 3 is sending the replicated data to node 2. The nearest node, i.e. node 3, sends the replicated data of node 2 to the FDRM.
In Fig. 9 node 4, i.e. the FDRM, is assumed to have failed. In this case node 4 sends the replica of its data to all nodes except node 0. The FDRM contains the replicated copies of the participating nodes except node 0. So before it fails, the FDRM first sends its own replicated data and the replicated data of all the other nodes that it holds. When the FDRM has failed, the other nodes perform the role of the FDRM themselves.
Fig. 9. Failure of FDRM.
V. IMPLEMENTATION
In this section we describe the simulation setup used to verify the fault tolerant algorithm for the distributed cloud environment. We conducted the simulation of the proposed scheme by using the "Network Simulator Version 2.0":
A. Simulation Model
Our simulation model consists of 6 nodes deployed in the distributed environment and connected bidirectionally.
B. Simulation Procedure
Simulation inputs for the proposed scheme are as follows: Number of nodes (N) = 6, Number of sink node(s) = 1, Packets (P) = 64 bits, 128 bits, 512 bits, 1024 bits and so on, transmission of data = bits/sec.
Begin
1) Deploy the nodes randomly, as in the distributed cloud environment.
2) Apply the proposed scheme to control the topology of the distributed cloud.
3) Compute the performance parameters.
4) Generate the graphs.
End
C. Performance Parameters
We have used the following parameters to evaluate the performance of our proposed fault tolerant scheme.
• Probability of Fault Tolerance: measures the ability of the system to continue operating properly even in the presence of failure in the distributed cloud environment.
• Time Computation: the percentage of computation time, which increases as the number of nodes in the topology of the distributed cloud environment increases.
E. Conclusion
To make the distributed cloud environment more reliable and user transparent, an efficient algorithm, the fault detector and replication manager (FDRM), is proposed. The FDRM in the proposed architecture helps to manage the replication of data during failure, as shown in Fig. 10. The probability of fault tolerance is about 10% by employing the FDRM in the proposed architecture.
REFERENCES
[1] S. Chandrasekar and P. K. Srimani, “A new fault tolerant
distributed algorithm for longest paths in a DAG,” in Software
Reliability Engineering, 1993. Proceedings. Fourth International
Symposium on, 1993, pp. 202–206.
[2] S. H. Hosseini, J. G. Kuhl, and S. M. Reddy, “A diagnosis
algorithm for distributed computing systems with dynamic failure
and repair,” Computers, IEEE Transactions on, vol. 100, no. 3, pp.
223–233, 1984.
[3] U. Malladi, “Notice of Violation of IEEE Publication Principles
Design, analysis and performance evaluation of a new algorithm for
developing a fault tolerant distributed system,” in Parallel and
Distributed Systems, 2006. ICPADS 2006. 12th International
Conference on, 2006, vol. 1, p. 10–pp.
[4] J. Al-Jaroodi, N. Mohamed, and K. A. Nuaimi, “An efficient
fault-tolerant algorithm for distributed cloud services,” in Network
Cloud Computing and Applications (NCCA), 2012 Second Symposium
on, 2012, pp. 1–8.
[5] S. Hariri, A. Choudhary, and B. Sarikaya, “Architectural support
for designing fault-tolerant open distributed systems,” Computer, vol.
25, no. 6, pp. 50–62, 1992.
[6] P.-Y. Li and B. McMillin, “Fault-tolerant distributed deadlock
detection/resolution,” in Computer Software and Applications
Conference, 1993. COMPSAC 93. Proceedings., Seventeenth Annual
International, 1993, pp. 224–230.
[7] M. B. Bheevgade and R. M. Patrikar, Implementation of Fault
Tolerance Techniques for Grid Systems. INTECH Open Access
Publisher, 2009.
[8] B. Sapre, A. Garje, and B. B. Mesharm, “Fault Tolerant
Environment Using Hardware Failure Detection, Roll Forward
Recovery Approach and Micro rebooting For Distributed Systems”.
[9] Z. Zheng and M. R. Lyu, “A distributed replication strategy
evaluation and selection framework for fault tolerant web services,”
in Web Services, 2008. ICWS’08. IEEE International Conference on,
2008, pp. 145–152.
[10] A. Girault, H. Kalla, and Y. Sorel, “A scheduling heuristics for
distributed real-time embedded systems tolerant to processor and
communication media failures,” International Journal of Production
Research, vol. 42, no. 14, pp. 2877–2898, 2004.
[11] R. N. Calheiros and R. Buyya, “Meeting deadlines of scientific
workflows in public clouds with tasks replication,” Parallel and
Distributed Systems, IEEE Transactions on, vol. 25, no. 7, pp. 1787–
1796, 2014.
[12] H. Zhou and J. Jiang, “CSHFt: A Composite Fault-Tolerant
Architecture and Self-Adaptable Hierarchical Fault-Tolerant Strategy
for Satellite System,” in Distributed Computing and Applications to
Business, Engineering and Science (DCABES), 2011 Tenth
International Symposium on, 2011, pp. 333–337.
[13] S. Altaf, A. Al-Anbuky, and H. GholamHosseini, “Fault
diagnosis in a distributed motor network using Artificial Neural
Network,” in Power Electronics, Electrical Drives, Automation and
Motion (SPEEDAM), 2014 International Symposium on, 2014, pp.
190–197.