
2011 Workshops of International Conference on Advanced Information Networking and Applications

Markov Chain based Monitoring Service for Fault Tolerance in Mobile Cloud Computing*
JiSu Park, HeonChang Yu
Dept. of Computer Science Education Korea University Seoul, Korea {bluejisu, yuhc}

KwangSik Chung
Dept. of Computer Science Korea National Open University Seoul, Korea

EunYoung Lee**
Dept. of Computer Science Dongduk Women's University Seoul, Korea

Abstract: Mobile cloud computing is a combination of mobile computing and cloud computing, and provides a cloud computing environment through various mobile devices. Recently, due to the rapid expansion of the smart phone market and the wireless communication environment, mobile devices have come to be considered as resources for large-scale distributed processing. But mobile devices have several problems, such as unstable wireless connections, limited power capacity, low communication bandwidth, and frequent location changes. As resource providers, mobile devices can join and leave the distributed computing environment unpredictably. This interrupts the ongoing operation, and the delay or failure of completing the operation may cause a system failure. Because of low reliability and no guarantee of completing an operation, it is difficult to use a mobile device as a resource. That means that mobile devices are volatile. Therefore, we should consider volatility, one of the dynamic characteristics of mobile devices, for stable resource provision. In this paper, we propose a monitoring technique based on the Markov Chain model, which analyzes and predicts resource states. With the proposed monitoring technique and state prediction, a cloud system becomes more resistant to the faults caused by the volatility of mobile devices. The proposed technique diminishes the volatility of a mobile device by modeling the patterns of past states and predicting the future state of a mobile device.

Keywords: Monitoring, Monitoring Time Interval, Mobile Cloud Computing, Markov Chain, Pattern



Mobile cloud computing offers a pay-as-you-go cloud computing environment through various mobile devices that support mobility. Mobile devices refer to all kinds of devices that have mobility, such as laptops, PDAs, tablet PCs, and smart phones. Earlier mobile devices were notorious for restricted battery power and low CPU performance. However, the computing power of the latest mobile devices is approaching that of desktop computers. Battery capacity is also growing, and the number of users of mobile devices is rapidly increasing. In particular, more people than ever use mobile devices regularly on campus or in the office. This trend leads researchers to try to utilize mobile devices in cloud computing. Research on utilizing mobile devices in mobile cloud computing can be categorized into mobile devices-as-interface and mobile devices-as-resource. Most previous research in mobile cloud computing has focused on
* This research was supported by Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education, Science and Technology(2010-0005460) ** Corresponding author
978-0-7695-4338-3/11 $26.00 2011 IEEE DOI 10.1109/WAINA.2011.10 520

utilizing mobile devices as an interface. Research on utilizing mobile devices as resources in the mobile cloud environment has gained attention recently, because the population of smart phone and other mobile device users is growing fast. In order to use mobile devices as resources, several problems must be solved, such as unstable wireless connections, limited power supply, low communication bandwidth, and frequent location changes. Because a join or a leave of a mobile device is unpredictable, the ongoing process can also be interrupted unpredictably. This interruption delays the completion of operations and could lead a system to a fault. Therefore, operations on mobile devices are not guaranteed to complete. This reduces the reliability of mobile devices and prevents them from being used as resources. Therefore, the dynamic characteristics of mobile devices must be considered in order to guarantee the stable usage of mobile devices as resources.
In order to solve the above problems, previous research focused on fault tolerance techniques. Resource scheduling and fault tolerance techniques calculate state information by monitoring resource information. But if correct resource information is not provided in a timely manner, the stale information causes an accuracy problem. Therefore, a monitoring scheme that can collect and analyze dynamic state information is required in order to ensure the stable participation of resources. Monitoring schemes need to adapt dynamically in real time in order to monitor correct state information and reflect the characteristics of mobile resources. In this paper, we propose a monitoring technique based on a Markov chain, which can analyze resource states more precisely in order to solve the fault problem caused by the volatility of mobile devices.
The proposed technique can deal with the volatility of mobile devices by modeling the patterns of operations performed in the past and predicting the type of future operation states. The predicted information is used for fault tolerance, and it improves the reliability and performance of the system. The rest of the paper is organized as follows: Section 2 presents the related work on monitoring services. Section 3 describes our mobile cloud system architecture and the components used by the suggested monitoring scheme. Section 4 proposes a monitoring technique based on the Markov Chain model with the Viterbi algorithm. In Section 5, we present the monitoring time interval rates and the accuracy of the predicted values. We draw a conclusion and discuss some future work in Section 6.



Monitoring services use the pull model and the push model [1][2]. In the pull model, a server sends a message to a client in order to request resource information, but in the push model, resource information is sent from a client to a server according to the monitoring policy of the server. In the pull model, the monitoring overhead is relatively small, because the resource information of a client is requested only when it is needed. But this model has a long response time and is not widely used in dynamic environments, because requests are sent to clients regardless of their states. In the push model, the monitoring information is collected statically, because a system administrator determines the monitoring time intervals. If the monitoring time interval is very short, the overhead of collecting monitoring information increases. If the interval is very long, however, the scheme cannot keep correct state information in dynamic environments. Therefore, we propose a monitoring scheme that can change the monitoring time intervals in dynamic environments.

Huh et al. [3] tried to determine the monitoring time interval by observing the dynamic state of resource information; this scheme is based on the push model. But the scheme utilizes only CPU information among the various resources, so it is difficult to use in the mobile cloud environment, where resources change more rapidly. MDS4 [4] was developed as a part of the Globus project and is used for monitoring and selecting grid resources. Since MDS4 is based on the pull model, it is difficult to deploy MDS4 in the dynamic mobile environment. OVIS [12] is a resource monitoring tool in the cloud computing environment. OVIS can dynamically characterize the resource and application state, and optimally manage resources based on the monitored information. OVIS uses statistical analysis to scale data collection and resource allocation. OVIS, however, is designed for wired environments, and its statistical analysis is only used to find resources of the same attribute.

In this paper, we propose a monitoring technique based on a Markov chain, which can analyze resource states more precisely in order to solve the fault problem that is caused by the volatility of mobile devices.

III. SYSTEM MODEL

A. Mobile Cloud System Architecture
Mobile cloud computing is a combination of mobile and cloud computing, and offers a cloud computing environment through various mobile devices. However, due to problems such as heterogeneity among mobile devices, low network bandwidth, and highly intermittent connections, it is difficult to integrate mobile devices directly with mobile cloud environments. To mediate between mobile devices and wired grids, a proxy is used. The roles of the mobile cloud middleware are as follows: supplementing the insufficient performance of mobile devices, connecting mobile devices to a cloud platform, and managing mobile devices. The proposed mobile cloud architecture is shown in Figure 1. The function of each component is as follows.

The mobile cloud middleware consists of Job Scheduler, Fault Tolerance Manager, Job Manager, and Monitoring Manager. Monitoring Manager decides monitoring time intervals from the collected mobile resource information. Job Manager manages job operations in a mobile device, and Job Scheduler allocates a job to mobile resources according to monitoring information. Fault Tolerance Manager predicts fault occurrence and supplies techniques such as checkpoints and replication.

Figure 1. Mobile Cloud System Architecture

A mobile device consists of Connection Module, Monitoring Provider, and Job Execution Module. Connection Module manages the variety of networks used to connect to the mobile cloud middleware. Monitoring Provider collects the mobile device's state information and sends the resource state information to Monitoring Manager. Job Execution Module runs jobs received from Job Manager and sends the processed results to Job Manager.

B. Using Pattern of the Mobile Devices
In a mobile environment (LAN, 3G network, etc.), a network connection is mostly available, but there exist some regions where a network connection is impossible, such as underground and mountainous areas. Previous studies [13][14] of mobile environments analyzed mobile usage in WLAN environments, which are generally found in schools or companies, and showed that there exists a usage pattern of mobile devices over time. Song and Yu [15] showed that there exists a usage pattern of mobile devices. The data was collected from the wireless network of Dartmouth University for 6 months between December 2005 and May 2006.
In this paper, we used the usage patterns of [15], whose monitoring information was acquired from the mobile devices of a university campus. The following figure shows the utilization pattern that our research team analyzed.

Figure 2. Connection count for a week

Figure 2 shows that the connection counts are almost the same during the weekdays, but decrease on the weekends.


Network Bandwidth means the remaining bandwidth, calculated by subtracting the current network traffic usage from the maximum available bandwidth. It changes according to the network traffic usage over a fixed interval. Therefore, Network Bandwidth information is used to calculate the utilization rate of the network bandwidth.

Figure 3. Average connection counts of mobile devices

Figure 3 shows the number of connected mobile devices over a week. The graph also shows the difference in connections between weekdays and weekends.

Here, the available bandwidth at time t is computed from C, the capacity of the maximum available bandwidth, and the current network traffic. Location information uses GPS information, and the distance is calculated by subtracting the value of the current location from the value of the AP (Access Point) location. The value stands for the distance between the center of the AP area and the current location.
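As a rough illustration of the bandwidth bookkeeping described above, the following Python sketch subtracts the current traffic from the capacity C and derives a utilization rate. The function names, the Mbps unit, and the definition of the utilization rate as the share of capacity in use are assumptions made for illustration; they are not taken from the paper's formula.

```python
def available_bandwidth(capacity: float, traffic: float) -> float:
    """Remaining bandwidth: capacity C minus the current network traffic."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # Clamp at zero so a noisy traffic reading cannot yield a negative remainder.
    return max(capacity - traffic, 0.0)


def bandwidth_utilization(capacity: float, traffic: float) -> float:
    """Assumed utilization rate: share of the capacity currently in use (0..1)."""
    return 1.0 - available_bandwidth(capacity, traffic) / capacity


# Hypothetical sample: a 54 Mbps link carrying 13.5 Mbps of traffic.
remaining = available_bandwidth(54.0, 13.5)
rate = bandwidth_utilization(54.0, 13.5)
print(remaining, rate)
```

Clamping the remainder keeps the utilization rate inside the range 0 to 1 even when the measured traffic briefly exceeds the nominal capacity.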

Figure 4. Connection counts of mobile devices for 4 terms

Figure 4 shows that the number of users decreases in the vacation periods and increases in the semesters.

Here, the distance rate is the rate of the distance between the center of the AP area and the current location; d is the communication coverage distance of the AP, and the remaining two terms are the center of the AP of the current area and the current location of the resource.

B. Markov Chain Modeling for Predicting Faults
Monitored state information can be classified into 3 categories by the possibility of fault occurrence. Each state is defined as follows: an operation-available and no-fault state (Stable State), an operation-available and possible-fault state (Unstable State), and an operation-unavailable state due to faults or network disconnection (Disabled State). The three state values are set based on the CPU utilization rate; this classification comes from the CPU Utilization Guide of IBM [18].
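The three-state classification can be sketched in Python as follows. The threshold values (70% and 90%) and the `connected` flag are hypothetical placeholders, since the concrete values from the IBM guide are not given in the text; only the three state names come from the paper.

```python
from enum import Enum


class State(Enum):
    STABLE = "stable"      # operation available, no fault expected
    UNSTABLE = "unstable"  # operation available, fault possible
    DISABLED = "disabled"  # operation unavailable (fault or disconnection)


# Hypothetical thresholds; the paper's actual boundaries are not stated here.
UNSTABLE_FROM = 0.70
DISABLED_FROM = 0.90


def classify(cpu_util: float, connected: bool = True) -> State:
    """Map a CPU utilization rate in [0, 1] to one of the three states."""
    if not 0.0 <= cpu_util <= 1.0:
        raise ValueError("cpu_util must be within [0, 1]")
    if not connected or cpu_util >= DISABLED_FROM:
        return State.DISABLED
    if cpu_util >= UNSTABLE_FROM:
        return State.UNSTABLE
    return State.STABLE


labels = [classify(u).name for u in (0.35, 0.80, 0.95)]
print(labels)
```

A disconnected device is mapped to the Disabled State regardless of its CPU reading, which mirrors the definition of that state as covering network disconnection.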


A. Resource Information Definition
CPU power, memory, network bandwidth, and location are examples of the dynamically changing resource information of mobile devices. The information collected from mobile devices is used to calculate the utilization rate of each mobile cloud resource. The utilization of CPU and the utilization of memory are calculated by the following formula.

The subscript util denotes the utilization rate of a resource, user denotes the utilization rate of a user, sys denotes the utilization rate of a system, cache denotes the utilization rate of cache memory, and total denotes the maximum available utilization rate.
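One plausible reading of these subscripts is that a utilization rate is the sum of the consumed components divided by the total. The Python sketch below follows that reading; which components enter the CPU rate (user and sys) and the memory rate (user and cache) is an assumption, as are the sample figures.

```python
from typing import Mapping


def utilization(parts: Mapping[str, float], total: float) -> float:
    """Fraction of `total` consumed by the summed components, capped at 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    return min(sum(parts.values()) / total, 1.0)


# Hypothetical readings: CPU in percent, memory in megabytes.
cpu_util = utilization({"user": 30.0, "sys": 10.0}, total=100.0)
mem_util = utilization({"user": 1536.0, "cache": 512.0}, total=4096.0)
print(cpu_util, mem_util)
```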

The Markov Chain Model (MCM) is usually defined as a matrix, as shown in Figure 5. In matrix P in Figure 5, the state symbol for the Stable State is indexed by the time I. The probability of each state at time I is written in the same way for the Stable, Unstable, and Disabled states, respectively.


Figure 5. Matrix Table P

Each entry of P means the probability of a transition from one state at time I to another state at time J, and the matrix in Figure 5 must satisfy the following condition.
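For a transition matrix, the usual requirement is that every entry is a probability and every row sums to 1. The Python sketch below checks that requirement and performs a one-step prediction of the state distribution. The matrix values and the helper names `is_stochastic` and `step` are hypothetical and chosen only for illustration.

```python
STATES = ("stable", "unstable", "disabled")

# Hypothetical transition matrix: P[i][j] is the probability of moving
# from STATES[i] at one monitoring step to STATES[j] at the next step.
P = [
    [0.80, 0.15, 0.05],
    [0.30, 0.50, 0.20],
    [0.10, 0.30, 0.60],
]


def is_stochastic(matrix, tol=1e-9):
    """True if every entry is non-negative and every row sums to 1."""
    return all(
        all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def step(dist, matrix):
    """One-step prediction: next[j] = sum over i of dist[i] * matrix[i][j]."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


current = [1.0, 0.0, 0.0]   # device currently observed in the stable state
nxt = step(current, P)      # distribution after one monitoring step
print(is_stochastic(P), nxt)
```

Starting from a device that is certainly stable, the predicted distribution is simply the first row of P; applying `step` again gives the two-step prediction.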

The transition probability is the probability that any state at time I transits to any state at time J, and it is estimated from the number of times that each transition occurs. For example, a separate count is kept for the transitions between each ordered pair of states. K indicates the number of past states on which the value of the current state depends. An estimated value that effectively reflects N past patterns can be obtained from the transition with the largest probability value in P.

D. Accuracy Improvement
Failure occurrence can be predicted from the state probability of each monitoring interval. In order to improve the accuracy of prediction, the most probable value and the optimal state path are extracted, and the prediction of the state transition is calculated using the Viterbi algorithm [17]. The probability of all the paths on a predicted state (ps) can be calculated by multiplying the forward calculation (FT) and the backward calculation (BT). The forward calculation is as follows.

Therefore, the state transition probability from time J to time I can be calculated as follows.

Here, m is the number of states over which the two state indices range. The prediction of the probability of the next state needs the probability of the previous state: the data of the previous state is used to calculate the probability of the next state. The information from the previous state is also used for the initial state probability. An initial state probability is calculated as follows.

In FT, one symbol denotes an arbitrary state, and a second quantity records the optimal path at every step. BT is calculated by the same method as FT. The optimal state path is extracted as follows.

The result is the initial state probability of a state i, and m is the number of states over which the indices range.

C. Transition Probability Estimation
If previously recorded data is insufficient, there are two methods to estimate the transition probability. One is to allocate subjective values, and the other is to derive the transition probability from statistical data. In this paper, the information of the previous state is collected from the data of the same day a week ago. We use the maximum likelihood function to derive the transition probability from the collected data.
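Maximum likelihood estimation of a Markov chain's transition probabilities reduces to counting: the estimate for a transition from state i to state j is the number of observed i-to-j transitions divided by the number of transitions that leave i. The Python sketch below applies this to a hypothetical state history; the single-letter labels, the sample sequence, and the uniform fallback for states that were never left are assumptions.

```python
from collections import Counter

STATES = ("S", "U", "D")  # stable, unstable, disabled (illustrative labels)


def estimate_transitions(history):
    """Maximum likelihood estimate p_ij = n_ij / sum_k n_ik from a state sequence."""
    counts = Counter(zip(history, history[1:]))
    matrix = {}
    for i in STATES:
        row_total = sum(counts[(i, j)] for j in STATES)
        if row_total == 0:
            # No transition out of i was observed: use a uniform row (an assumption).
            matrix[i] = {j: 1.0 / len(STATES) for j in STATES}
        else:
            matrix[i] = {j: counts[(i, j)] / row_total for j in STATES}
    return matrix


# Hypothetical states recorded on the same weekday one week earlier.
history = list("SSUSSSUDSS")
P_hat = estimate_transitions(history)
print(P_hat)
```

In this sample the stable state is left six times, four of them back into itself, so the estimated probability of staying stable is 4/6.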
Figure 6. MCM using Viterbi algorithm


The result of the extraction expresses an optimal path, and Q is an observed state path. Figure 6 shows an MCM using the Viterbi algorithm.

E. Monitoring Interval
The more jobs are processed, the more monitoring information is produced. Likewise, the fewer jobs are processed, the less monitoring information is produced. Because the monitored information changes according to resource utilization, the monitoring time interval has to be related to the job processing time. At the same time, the monitoring time interval is also related to each state, because the range of resource utilization differs between states.

If the monitoring time interval is short in the Stable State, unnecessary information is accumulated and an overhead occurs due to information collection. But if the monitoring time interval is long, it is difficult to analyze the necessary state for resource usage or sudden faults. Therefore, if the value predicted by the MCM is the Stable State, the monitoring time interval is calculated from the job processing time and the Stable State's usage limitation rate. In the Stable State, the monitoring time interval is called the Stable Job Processing Time Interval and is calculated by the following formulae, whose two terms are the job processing time and the Stable State's usage limitation rate.

In an Unstable State, no one knows how the resource information will change, nor which state will be the next state. The next state could be a Disabled State. Therefore, the monitoring time interval of the Unstable State must be shorter than that of the Stable State. In the Unstable State, the monitoring time interval is called the Unstable Job Processing Time Interval, and it is calculated from the job processing time and the usage limitation rates of the respective states.

In order to calculate a monitoring time interval, other information such as battery power, which changes regardless of processed jobs, should also be considered. This interval is called the Static Time Interval, and it can be set by a system administrator.

V. SIMULATION

To simulate the calculation of monitoring time intervals, we implemented a monitoring module using Java, JNI, SIGAR[19], and RXTX[20]. The monitoring module collects the state information of CPU, Memory, Storage, Battery, Network Bandwidth, and GPS. The configuration of a mobile device for collecting information is as follows: CPU: Intel Duo P8600 2.4GHz, Memory: 4GB, Storage: 120GB, LAN: 54Mbps, GPS: BluetoothGPS. We measured the accuracy of state prediction by comparing the state information from mobile devices with the prediction made using our Markov chain model and dynamic monitoring intervals.

Figure 7. Monitoring information of mobile devices

Figure 7 presents the monitoring information collected from the mobile device using the monitoring module. The graphs show that the utilization rates of memory and bandwidth are quite stable, and the distance of GPS movement is relatively short. We think that this is because the resource performance of mobile devices is higher than the performance needed for job processing. In the case of GPS, the distance is short because the users of the mobile devices moved only within the campus.

Figure 8. Monitoring time intervals

Figure 8 presents monitoring time intervals by rate. The higher the rate is, the shorter the monitoring time interval is. The job processing time is set to 300 seconds to calculate the monitoring time interval.
Figure 9 shows the monitoring frequency according to the monitoring time interval. When a static monitoring time interval is set to 60 seconds, the static monitoring frequency is 89 for 2,500 seconds.
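The relationship between the rate, the interval, and the resulting monitoring frequency can be illustrated with a small Python sketch. The linear rule used here (interval equals job processing time multiplied by one minus the rate) is purely a hypothetical stand-in that reproduces the stated trend, namely that a higher rate gives a shorter interval; it is not the paper's formula. The 3,600-second window and the sample rates are likewise assumptions.

```python
import math

JOB_TIME = 300.0  # seconds; the job processing time quoted for Figure 8


def dynamic_interval(job_time: float, limitation_rate: float) -> float:
    """Hypothetical rule: shrink the interval linearly as the rate grows."""
    if not 0.0 <= limitation_rate < 1.0:
        raise ValueError("limitation_rate must be in [0, 1)")
    return job_time * (1.0 - limitation_rate)


def monitoring_count(window: float, interval: float) -> int:
    """Number of whole monitoring intervals that fit in `window` seconds."""
    return math.floor(window / interval)


# Sample rates chosen as exact binary fractions to avoid rounding surprises.
results = []
for rate in (0.25, 0.5, 0.75):
    interval = dynamic_interval(JOB_TIME, rate)
    results.append((rate, interval, monitoring_count(3600.0, interval)))
print(results)
```

Under this assumed rule, moving the rate from 0.25 to 0.75 shortens the interval from 225 to 75 seconds and triples the number of monitoring events in the window.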

Figure 9. Number of monitoring

Figure 10. Accuracy rate

The graph in Figure 10 shows the accuracy of the predicted states, and represents 88.4% better accuracy of predictions in comparison with the basic Markov chain model.

Mobile cloud computing using mobile devices as resources is considered unstable because of dynamically changing state information. Therefore, fault tolerance must be supported in order to perform stable and reliable operations. Monitoring is very important, since the information used to calculate the reliability for fault tolerance is provided by state information monitoring modules. To cope with the faults caused by dynamic changes of the mobile resource state, it is a good strategy to change the monitoring time interval dynamically.
In this paper, we proposed a technique to regulate monitoring time intervals based on our Markov chain model of the mobile resource state. We applied the Viterbi algorithm to basic Markov modeling, and avoided collecting unnecessary state information. Thus the proposed technique reduces the overhead of information collection.
For future work, we will develop a fault tolerance algorithm using the proposed monitoring technique.

REFERENCES
[1] S. Acharya, R. Alonso, M. Franklin, S. Zdonik, "Broadcast disks: data management for asymmetric communication environments," Mobile Computing, Springer, 1996.
[2] S. Acharya, M. Franklin, S. Zdonik, "Balancing push and pull for data broadcast," ACM SIGMOD, 1997.
[3] E. N. Huh, "An Optimal and Dynamic Monitoring time interval for Grid Resource Environments," ICCS'04, 2004.
[4] MDS, http://ww olkit/mds/ (visit 2010.5.25).
[5] GMA White Paper, http://www GFPERF/GMA-WG/ (visit 2010.5.26).
[6] R-GMA: Relational Grid Monitoring Architecture, December 2003 (visit 2010.5.20).
[7] R. Byrom et al., "Fault Tolerance in the R-GMA Information and Monitoring System," EGC2005, LNCS 2470, 2005.
[8] S. S. Manvi, M. N. Birje, "Device Resource Status Monitoring System in Wireless Grids," ACEEE Intl. ACT, 2009.
[9] S. S. Manvi, M. N. Birje, "Monitoring and Status Representation of Devices in Wireless Grids," GPC 2010, LNCS 6104, 2010.
[10] R. Sundaresan, T. Kurc, M. Lauria, S. Parthasarathy, J. Saltz, "A slacker coherence protocol for pull-based monitoring of on-line data sources," IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003.
[11] I. Foster, Y. Zhao, I. Raicu, S. Lu, "Cloud computing and Grid computing 360-degree compared," Grid Computing Environments Workshop, 2008.
[12] J. Brandt, A. Gentile, J. Mayo, P. Pebay, D. Roe, D. Thompson, and M. Wong, "Resource monitoring and management with OVIS to enable HPC in Cloud computing," IEEE International Parallel & Distributed Processing Symposium, 2009.
[13] B. Megdalena, C. Paul, "Characterizing Mobility and Network Usage in a Corporate Wireless Local-Area Network," MOBISYS, 2003.
[14] H. Tristan, K. David, A. Ilya, "The Changing Usage of a Mature Campus-wide Wireless Network," MOBICOM 04, 2004.
[15] Sungjin Song, HeonChang Yu, "The Scheduling method considering the Using Pattern of the Mobile device in Mobile Grid," Journal of Korea Association of Computer Education, 2008.
[16] J. A. Whittaker, M. G. Thomason, "A Markov chain model for statistical software testing," IEEE Transactions, 1994.
[17] G. D. Forney Jr., "The Viterbi algorithm," Proceedings of the IEEE, 1973.
[18] IBM, Server Utilization Guide, http://ww r/event/download/20080612_360 _notes/s360. .pdf (visit 2010.7.5).
[19] SIGAR, http://w m/ (visit 2010.7.15).
[20] RXTX, http://w (visit 2010.7.20).