
International Journal of Computer Engineering and Technology (IJCET)
ISSN 0976-6367 (Print), ISSN 0976-6375 (Online)
Volume 4, Issue 6, November - December (2013), pp. 378-385
IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com

PRIORITY BASED DYNAMIC ADAPTIVE CHECKPOINTING STRATEGY IN DISTRIBUTED ENVIRONMENT


Priya Deshpande, Assistant Professor, MITCOE Pune
Sunayna Giroti, ME IT Student

ABSTRACT

Handling faults and recovering from them in distributed data systems has been a concern for many years. Recovering lost data from checkpoints is one solution. However, how snapshots are stored in checkpoints matters, because working with a large number of checkpoints can degrade system performance. This paper presents a dynamic adaptive checkpointing strategy for distributed systems that stores checkpoints on the namenode for failure recovery according to the priority given to the datanodes, so that the bandwidth consumption of the network can be decreased. For this purpose we suggest a new architecture that helps define this strategy. The strategy optimizes checkpointing by consuming less bandwidth than the usual approach.

Keywords: Dynamic checkpointing, Access calculator, Priority scheduler.

I. INTRODUCTION

With the increasing popularity of distributed environments, applications in fields such as life science, telecommunication, nuclear research and many more use such systems to perform important tasks. The data of these applications must therefore be stored securely or be easily recoverable at the time of failure. One well-known way to recover faulty data in a distributed system is to use checkpoints. Checkpoints give a system the ability to save its present state in the form of snapshots, and to tolerate failure by enabling a failed datanode to recover to a previous safe state [5]. Whenever a fault occurs in the system, the checkpoint is used to recover from it.

At present most checkpointing strategies are periodic, i.e. checkpoints are stored either at a constant time interval or at a variable interval that depends on the requirements of the system. In both cases the bandwidth consumption of the system is high, because every piece of data is stored whether it is important or not. Checkpoints should instead be designed so that they dynamically take snapshots of the memory structures that form the important part of a datanode and save those first.

The difficulty is to recognize, within a large amount of data, which data is more important and which is less. To answer this question we have designed the system around a priority that is given to each datanode dynamically, so that the bandwidth consumption of the system decreases. We calculate how many times the datanode is accessed by users and, on the basis of this calculation, assign a priority to the data. These priorities decide which datanode gets the checkpoint first and which gets it second. This reduces the number of checkpoints, and hence the bandwidth consumption of the whole system is lower.

The rest of the paper is organized as follows. Section 2 describes the related work. Section 3 gives an overview of the proposed architecture. Section 4 describes the dynamic adaptive checkpointing. Finally, Section 5 concludes.

II. RELATED WORK

In distributed environments, checkpointing strategies are commonly used for handling faults. Data is always a concern whenever fault tolerance is discussed. Whatever strategy is used to perform checkpointing, the goal is always to recover the faulty data. However, checkpointing always reduces system performance, because a copy of the data must be kept in the form of snapshots, which consumes a lot of memory. Much research is under way to address this concern.

John DeVale [5] provided the basic ideas of checkpoints, faults, checkpointing and system recovery using these methods. One of the most researched approaches in the field is diskless checkpointing. Chiu and Chiu [1] proposed a neighbor-based scheme for diskless checkpointing to achieve good load balancing, whereas Menderico and Garcia [2] used diskless checkpointing to increase system performance by decreasing the number of checkpoints using a quasi-synchronous protocol. Similarly, Bautista Gomez et al. [9] overcame the drawbacks of the disk-based model by using a diskless model to increase the scalability of the system.

In [6], two time-based checkpointing schemes, full checkpointing and incremental checkpointing, were compared to find the better model. In [4], full checkpointing is used rather than incremental checkpointing to introduce a dynamic adaptive fault tolerance model so that serviceability can be maximized. In [8], two different checkpointing models are used, local checkpointing and global checkpointing; coordination between the two is used to increase system performance and decrease the time interval between two checkpoints. Maria Chtepen et al. [7] proposed periodic checkpointing with a heuristic approach to reduce system load and increase the availability of the system.

III. PROPOSED ARCHITECTURE

The proposed architecture of our checkpointing strategy is shown in Figure 1.


[Figure: block diagram. Labels: NameNode; Secondary NameNode; Priority Scheduler; three DataNodes, each with an AC, a CE and an SE.]

Figure 1: Proposed Architecture [10]

The components of our architecture are as follows:

NameNode: A master server that allows access to the data stored in it. It is responsible for operations such as opening, closing and renaming files and directories [10]. In short, it holds the metadata of the system. It is configured to support and maintain checkpoints, so that data can be recovered at the time of failure. Any update to the files takes place synchronously.

Secondary NameNode: A copy of the namenode. Its only purpose is to provide a backup for the system. That is, when the namenode fails, data can be recovered from the checkpoints stored on the secondary namenode.

DataNode: Manages the data stored at each node in the system. Datanodes are responsible for serving read and write requests from users [10]. A datanode consists of an access calculator (AC), which calculates the accesses to a particular data record in the datanode, a computing element (CE) and a storage element (SE).

AC (Access Calculator): Calculates the total number of accesses to a data record in a datanode between times t1 and t2. It also calculates the time interval for sending snapshots to the namenode, using a value sent by the priority scheduler.

CE (Computing Element): Each datanode contains one or more computing elements, which provide its computing capability.

SE (Storage Element): Each datanode contains one or more storage elements, which represent its storage capacity.

Priority Scheduler: Calculates the priority of a data record in a datanode on the basis of the result of the access calculator, i.e. the access count of a file in the datanode.

The whole process works as follows. All data, in the form of files, is stored on the datanodes. Each datanode has an access calculator embedded in it, which calculates how many times a particular data record is accessed by users. The access count of a data record is sent at a constant time interval to the priority table. The priority table consists of the list of records most recently accessed by users. Depending on the result of the access calculator, the priority scheduler calculates the priority of each data record. Details of the access calculator and the priority scheduler are given in Section 4. The snapshots of the data record with the higher priority are checkpointed first. This checkpoint is then stored on the namenode, which updates the record of each checkpoint synchronously. A copy of each checkpoint is also stored on the secondary namenode so that, if the namenode fails, data can be recovered from there. Checkpointing in this manner should minimize the overall bandwidth consumption of the proposed system.

IV. DYNAMIC CHECKPOINTING

We make some assumptions here. The failures that occur in the system are not transient, and failures can be detected at run time. The reason for these assumptions is that the dynamic checkpoints used in our system are always available, so a failure can be recovered from them. A fault in the system is always recoverable as long as it is not permanent [2].

The checkpoints take snapshots of the data records with higher priority first. These priorities are taken from the priority table. The priority of a datanode is assigned on the basis of the result calculated by the access calculator, which calculates how many times the datanode is accessed by the users of the system. Checkpoints get their respective snapshots in accordance with these results. The checkpoints are stored on the namenode as well as on the secondary namenode.

Whenever checkpoints are applied dynamically so that less bandwidth is consumed, two questions have to be kept in mind:
How to decide which data record is important?
How to assign a priority to that data record?

The answer to both questions lies in the two algorithms discussed below.

A. ACCESS CALCULATOR

The access calculator computes the access count on a particular datanode, that is, how many times a data record is accessed by users. The count is calculated for the most recent time period, i.e. between the time the access count was last generated and the current time. Every time a data record is accessed within this period, its access count is increased by 1. After every predefined constant time S, an updated access count is sent to the priority table. The time interval for sending snapshots to the namenode is also calculated here.

The following terms are used in the algorithm:
t1: Last count set time.
t2: Current time.
tn: Predefined constant time for sending snapshots.
p: Priority, sent by the priority scheduler.
tm: Calculated time interval for sending snapshots to the namenode.

Count = 0
For each access entry in datanode between t1 and t2
{
    Count++;
}
Send (datanode id, Count)
tm = tn * ((p + 1) / 2)
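The counting step and the interval formula can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `AccessEntry` log format, the function names and the choice to count per record within the half-open window (t1, t2] are all hypothetical. Only the formula tm = tn * ((p + 1) / 2) is taken from the algorithm above.

```python
from dataclasses import dataclass


@dataclass
class AccessEntry:
    """One logged access to a data record (hypothetical log format)."""
    record_id: str
    timestamp: float


def access_counts(entries, t1, t2):
    """Count accesses per record with t1 < timestamp <= t2 (assumed window)."""
    counts = {}
    for entry in entries:
        if t1 < entry.timestamp <= t2:
            counts[entry.record_id] = counts.get(entry.record_id, 0) + 1
    return counts


def snapshot_interval(tn, p):
    """Interval for sending snapshots: tm = tn * ((p + 1) / 2)."""
    return tn * ((p + 1) / 2)


log = [
    AccessEntry("r1", 1.0),
    AccessEntry("r2", 2.0),
    AccessEntry("r1", 3.0),
    AccessEntry("r1", 9.0),  # falls outside the window used below
]
counts = access_counts(log, t1=0.0, t2=5.0)
print(counts)                             # {'r1': 2, 'r2': 1}
print(snapshot_interval(tn=10.0, p=1))    # 10.0
print(snapshot_interval(tn=10.0, p=3))    # 20.0
```

With this formula a record with p = 1 is snapshotted every tn time units, and the interval grows by tn/2 for each further priority step, so records with a numerically larger p send snapshots less often.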

B. PRIORITY SCHEDULER

The priority scheduler calculates the priority of the data records in the datanode, on the basis of which checkpoints are assigned to the respective snapshots. The priority of a data record is decided by the result of the access calculator, i.e. how many times the data record is accessed by users. The priority scheduler works in two loops. The first loop checks whether a data record is present in the priority table. If it is not, the record is added to the table. If it is already present, its value is updated with the new access count calculated by the access calculator. The second loop checks the data records that already exist in the priority table, i.e. whether the access count of a data record is unchanged or changing. If the access count of a data record stays the same more than three times, that record is deleted from the table. Otherwise the record is given the average priority of all the data records in the table.
Priority scheduler (access info list)
New priority = 0, last access count = 0
For each record in access info list
{
    If record.access count != last access count
    {
        New priority++;
        last access count = record.access count
    }
    If record in priorityTable
    {
        Update record
        Set (absent count = 0, priority = new priority, access count = new access count value)
    }
    Else
    {
        Add new record in priority table
    }
}
For each record in priority table
{
    If record is in access info list
    {
        If record.absent count <= 3
        {
            Average priority = average (priorities in priority table)
            Update record in priority table
            Set (absent count = absent count + 1, priority = avg priority)
        }
        else
        {
            Remove Record from priorityTable
        }
    }
}
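A runnable Python sketch of the two loops is given here, with several assumptions that go beyond the pseudocode and should be read as such. First, the second loop is applied to records that were not reported in the current access info list, which is how the definition of the absent count reads; the pseudocode itself tests for records that are in the list. Second, the average priority is computed once before the second loop rather than per record. Third, the last access count starts as `None` so that the first record always receives priority 1. The dictionary layout and the function name are hypothetical.

```python
def priority_scheduler(access_info, priority_table, max_absent=3):
    """Update priority_table in place and return it.

    access_info: list of (record_id, access_count), sorted by count descending.
    priority_table: dict record_id -> dict with keys
        "priority", "access_count", "absent_count".
    """
    new_priority = 0
    last_count = None  # assumption: None so the first record gets priority 1
    seen = set()

    # First loop: records with equal access counts share a priority;
    # each new distinct count moves to the next priority value.
    for record_id, count in access_info:
        if count != last_count:
            new_priority += 1
            last_count = count
        priority_table[record_id] = {
            "priority": new_priority,
            "access_count": count,
            "absent_count": 0,
        }
        seen.add(record_id)

    # Second loop (assumption: only records absent from access_info).
    if priority_table:
        avg = sum(r["priority"] for r in priority_table.values()) / len(priority_table)
    for record_id in list(priority_table):
        if record_id in seen:
            continue
        record = priority_table[record_id]
        if record["absent_count"] <= max_absent:
            record["absent_count"] += 1
            record["priority"] = avg
        else:
            del priority_table[record_id]
    return priority_table


table = {}
priority_scheduler([("a", 9), ("b", 9), ("c", 4)], table)
print({rid: rec["priority"] for rid, rec in table.items()})  # {'a': 1, 'b': 1, 'c': 2}

priority_scheduler([("a", 5)], table)  # "b" and "c" were not accessed this round
print(table["b"]["absent_count"])      # 1
```

Because the access info list is sorted by descending access count, the most accessed records receive priority 1 in this sketch, and a record that keeps being absent is eventually removed once its absent count exceeds the threshold.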


Terms used in the algorithm:
Access Info List: The list of all the data records in the datanodes, with the access count of each data record. These records are sorted in reverse order.
Priority Table: Contains the datanode id and the absent count of each data record, together with the priority of each data record.
Absent Count: Counts how many times a particular data record has not been accessed.
Record Id: A unique identification number given to each data record in the system.
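To show how the priority table could drive the checkpointing itself, the following Python sketch stores snapshots on the namenode and the secondary namenode in priority order and recovers from the secondary when the namenode is unavailable. It is an illustration only: both namenodes are modelled as plain dictionaries, the function names are hypothetical, and priority 1 is assumed to be the highest priority.

```python
def checkpoint_in_priority_order(priority_table, snapshots, namenode, secondary):
    """Store each snapshot on both namenodes, highest priority first.

    priority_table: dict record_id -> priority (1 assumed to be highest).
    snapshots: dict record_id -> snapshot bytes.
    Returns the order in which the records were checkpointed.
    """
    # Ties are broken by record id so the order is deterministic.
    order = sorted(priority_table, key=lambda rid: (priority_table[rid], rid))
    for rid in order:
        namenode[rid] = snapshots[rid]
        secondary[rid] = snapshots[rid]  # backup copy for namenode failure
    return order


def recover(record_id, namenode, secondary):
    """Return the stored checkpoint, falling back to the secondary namenode."""
    if namenode is not None and record_id in namenode:
        return namenode[record_id]
    return secondary[record_id]


priority_table = {"r1": 2, "r2": 1, "r3": 3}
snapshots = {"r1": b"state-1", "r2": b"state-2", "r3": b"state-3"}
namenode, secondary = {}, {}

order = checkpoint_in_priority_order(priority_table, snapshots, namenode, secondary)
print(order)                             # ['r2', 'r1', 'r3']
print(recover("r1", None, secondary))    # namenode lost: b'state-1'
```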

V. CONCLUSION

In this paper we proposed a dynamic adaptive checkpointing strategy that first calculates how many times a data record is accessed. On the basis of its access count, a priority is given to each data record. The data record with the higher priority sends its snapshots first, and its checkpoint is saved on the namenode. If a data record is not accessed for a predefined time interval, it is removed from the priority table. Previously, snapshots were sent at a constant time interval, which consumes a large amount of network bandwidth. Our strategy may reduce network bandwidth consumption, because checkpoints are stored dynamically on the basis of priority. This will also increase the overall performance of the system. There are still many areas that need to be considered to improve performance in distributed environments.

VI. REFERENCES

[1] Ge-Ming Chiu and Jane-Ferng Chiu, A New Diskless Checkpointing Approach for Multiple Processor Failures. IEEE Transactions on Dependable and Secure Computing, Vol. 8, No. 4, July/August 2011.
[2] Raphael Marcos Menderico and Islene Calciolari Garcia, Diskless Checkpointing with Rollback-Dependency Trackability. 2010 29th IEEE International Symposium on Reliable Distributed Systems.
[3] Yibei Ling, Jie Mi and Xiaola Lin, A Variational Calculus Approach to Optimal Checkpoint Placement. IEEE Transactions on Computers, Vol. 50, No. 7, July 2001.
[4] Dawei Sun et al., Analyzing, Modeling and Evaluating Dynamic Adaptive Fault Tolerance Strategies in Cloud Computing Environments. Springer Science+Business Media New York 2013.
[5] John DeVale, Checkpoint/Recovery. 18-849b Dependable Embedded Systems, February 4, 1999.
[6] N. Naksinehaboon, High Performance Computing Systems with Various Checkpointing Schemes. Int. J. of Computers, Communications & Control, ISSN 1841-9836, E-ISSN 1841-9844, Vol. IV (2009), No. 4, pp. 386-400.
[7] Maria Chtepen et al., Checkpointing and Replication: Toward Efficient Fault-Tolerant Grids. IEEE Transactions on Parallel and Distributed Systems, Vol. 20, No. 2, February 2009.
[8] Mehdi Lofti and Seyed Ahmad Motamedi, Adaptive Two-Level Blocking Coordinated Checkpointing for High Performance Cluster Computing Systems. Journal of Information Science and Engineering 26, 951-966 (2010).
[9] Leonardo Bautista Gomez et al., Distributed Diskless Checkpoint for Large Scale Systems. 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[10] Konstantin Shvachko, Hairong Kuang, Sanjay Radia and Robert Chansler, The Hadoop Distributed File System. Yahoo!, Sunnyvale, California, USA.
[11] Preeti Gupta, Parveen Kumar and Anil Kumar Solanki, A Comparative Analysis of Minimum-Process Coordinated Checkpointing Algorithms for Mobile Distributed Systems. International Journal of Computer Engineering & Technology (IJCET), Volume 1, Issue 1, 2010, pp. 46-56, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[12] Parveen Kumar and Poonam Gahlan, A Minimum Process Synchronous Checkpointing Algorithm for Mobile Distributed System. International Journal of Computer Engineering & Technology (IJCET), Volume 1, Issue 1, 2010, pp. 72-81, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[13] Priya Deshpande, Brijesh Khundhawala and Prasanna Joeg, Dynamic Data Replication and Job Scheduling Based on Popularity and Category. International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 5, 2013, pp. 109-114, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
