
The Journal of China Universities of Posts and Telecommunications

August 2011, 18(4): 98–103 www.sciencedirect.com/science/journal/10058885 http://jcupt.xsw.bupt.cn

Network selection policy in multi-radio access environment using stochastic control theory
WEI Yi-fei, SONG Mei, ZHANG Yong, LIU Ning-ning
School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China

Abstract

Network selection is crucial to the performance of heterogeneous wireless access systems. Most previous work on network selection or radio resource allocation concentrates on the capability of each available network and ignores the time-varying nature of the wireless medium caused by channel fading. However, the channel condition determines the state of each wireless network and plays a vital role in ensuring quality of service in a multi-radio access environment. In this article, we propose a network selection policy based on stochastic control theory that accounts for the time-varying and stochastic character of wireless channels. In each decision epoch, the proposed scheme selects one network among the available alternatives according to the channel state of each network, which is modeled as a finite-state Markov channel, with the objectives of increasing the data-rate, decreasing the bit error rate, and reducing the delay. The network selection procedure is formulated as a stochastic control problem, which can be solved using linear programming and a primal-dual index heuristic algorithm. Simulation results show that network selection has a great impact on system performance and that the proposed scheme improves performance significantly.
Keywords network selection, multi-radio, finite-state Markov channel, stochastic control

1 Introduction

The complementary characteristics of different wireless networks make it attractive to integrate a wide range of radio access technologies, and the future wireless communication system is essentially an integration of many kinds of heterogeneous wireless networks. The multi-radio access concept, with different radio access networks (RANs) ranging from wide-area wireless cellular networks to hot spots covered by wireless local area networks (WLANs), is currently considered a strong candidate for future wireless access networks [1–2]. Several interworking architectures between 3G cellular networks and WLANs have been proposed in the technical literature. The European Telecommunications Standards Institute (ETSI) specifies two generic approaches to WLAN-cellular integration, known as loose and tight coupling [3]. In loose coupling, the WLAN acts as an access network complementary to the cellular network, which means that only signaling is transported between the two systems while the WLAN data flows directly to the external IP network. In tight coupling, the WLAN emulates a RAN and communicates with the external network through the cellular network. The third generation partnership project (3GPP) has developed a 3GPP-WLAN interworking architecture that enables 3GPP cellular network subscribers to access WLAN service. This work includes enabling reuse of a 3GPP subscription, developing a network selection mechanism, and defining authentication, authorization, and accounting [4].

In the multi-radio access environment, a dynamic network selection mechanism has to be developed to keep users always best connected (ABC) [5]. The most conventional algorithm is a fuzzy-logic-based algorithm that adopts the radio signal strength (RSS) threshold and hysteresis values as input parameters [6]. Analytical hierarchy process (AHP) and grey relational analysis (GRA) are used in Ref. [7] to combine multiple network selection criteria and to decide the weights of the criteria according to user preferences and service applications. In Ref. [8], several resource management and admission control schemes are proposed for cellular/WLAN integrated networks. The theory of evolutionary games is introduced by the authors of Ref. [9] for radio resource management, including bandwidth allocation and admission control. Auction-based pricing is used in Ref. [10] as a mechanism for resource allocation, admission control, and network selection in order to maximize revenue.

Although some work has been done to integrate heterogeneous wireless networks, most previous work on network selection or radio resource allocation concentrates on the capability of each available network and ignores the time-varying nature of the wireless medium caused by channel fading. These works assume that the channel fading is slow enough for the channel conditions to remain in the same state, and they use the currently observed channel conditions to make the network selection decision for the incoming session. However, this static-channel assumption is often not realistic [11], and the state of each wireless channel plays a vital role in ensuring quality of service in a multi-radio access environment.

In this paper, we consider slow-fading wireless links that can be modeled as a Markov chain by dividing the continuous link state into discrete levels for simplification. We propose a network selection policy based on stochastic control theory that accounts for the time-varying and stochastic character of wireless channels. In each decision epoch, the proposed scheme selects one network among the available alternatives according to the channel state of each network, which is modeled as a finite-state Markov channel (FSMC) [12–13], with the objectives of increasing the data-rate, decreasing the bit error rate, and reducing the delay. The network selection procedure is formulated as a stochastic control problem, which can be solved using linear programming (LP) and a primal-dual index heuristic algorithm [14].

Received date: 02-12-2010. Corresponding author: WEI Yi-fei, E-mail: weiyifei@bupt.edu.cn. DOI: 10.1016/S1005-8885(10)60090-8

2 System model of network selection

2.1 System model

Since no single radio access technology can provide all types of services, an integrated radio access network is formed by combining different types of wireless and mobile networks, and it can provide more comprehensive services. In an integrated wireless and mobile network, a mobile user equipped with multi-radio interfaces is capable of accessing all the available networks. Therefore, how to select a desired network is an important issue for the integrated wireless and mobile network.

We consider an area within the coverage of three types of networks: a cellular network such as general packet radio service (GPRS), a Worldwide Interoperability for Microwave Access (WiMAX) network, and WLANs, as shown in Fig. 1. In the multi-radio access environment, different radio access networks with different air interface technologies provide different data rates and different link qualities to the mobile users. Each new session arriving at the mobile user is to be associated with one network. Fig. 1 shows four different networks: WLAN1, WLAN2, WiMAX, and a cellular network. In reality, the number of radio access networks is not limited to four.

Fig. 1 The heterogeneous multi-radio access environment

To deal with the time-varying wireless connection states of the networks, we assume that there are M different types of networks available to the mobile user. We set the decision epochs to be the set of session arrival and departure time points, because the states change when a session arrives or departs, and we assume that time $t \in \{0, 1, \ldots, T-1\}$ stands for the time instants at which decisions need to be made. We assume that there are K different types of services running on the mobile terminal. The mobile user must automatically decide which network is to be active at each decision epoch, and this decision depends on the state of each network.

2.2 Wireless channel model

Recently, the FSMC model has been widely used in the literature to characterize the wireless channel [15]. Under the first-order Markovian assumption, given the symbol immediately preceding the current one, any earlier symbol is independent of the current one. In an FSMC, the channel state is characterized via the received signal-to-noise ratio (SNR), a parameter that is commonly used to represent the quality of a channel. The range of the average SNR of a received packet is quantized into L levels, and each level is associated with a state of a Markov chain. The channel varies over these states at each time slot according to a set of Markov transition probabilities. That is, the average received SNR of a radio access network can be modeled as a random variable evolving according to a
finite-state Markov chain, which is characterized by a set of states $\Gamma = \{\gamma_0, \gamma_1, \ldots, \gamma_{L-1}\}$. Let $\psi_{g_m h_m}(t)$ denote the probability that the channel state $\gamma_m(t)$ of network m moves from state $g_m$ to state $h_m$ at time t. The channel state transition probability matrix of network m is defined as

$$\Psi_m(t) = \left[\psi_{g_m h_m}(t)\right]_{L \times L} \qquad (1)$$

where $\psi_{g_m h_m}(t) = P(\gamma_m(t+1) = h_m \mid \gamma_m(t) = g_m)$ and $g_m, h_m \in \Gamma$. In Ref. [16], the authors developed and analyzed a methodology to partition the average received SNR into a finite number of states according to the time duration of each state for a packet transmission system, and showed that the number of states and the SNR partitions are determined by the fading speed of the channel. In real systems, the values in the above transition probability matrices can be obtained from historical observations of the wireless network.

2.3 Objectives

We need to find the optimal network selection policy, which selects one network for epoch t according to the channel states with the following optimization objectives:
1) Increase the data-rate. Some networks provide relatively high bandwidth when the wireless channel state $\gamma(t)$ is good, which should be reflected as a higher reward.
2) Decrease the bit error rate. Some networks support a low bit error rate when the wireless channel state $\gamma(t)$ is good, which should be reflected as a higher reward in our formulation.
3) Reduce the delay. Some networks support a low transmission delay when the wireless channel state $\gamma(t)$ is good, which should be reflected as a higher reward in our formulation.

3 Restless bandit formulation and network selection scheme

The classical multiarmed bandit problem considers M parallel projects with finite state spaces and asks which one project should be active at each discrete time instant, decided in a distributed way. An active project earns a reward and changes its state. A passive project does not change state and earns a passive reward. The aim is to maximize the total discounted reward earned over the time horizon by determining the optimal policy, which identifies the project that should be active at each time point. The restless bandit formulation is an extension of the classical multiarmed bandit in which passive projects may also change state, and it provides a powerful modeling framework. A restless bandit problem can be described as a Markov decision process (MDP), and decisions are made over the horizon under a policy $u \in U$, where U is the set of all Markovian policies, which select the current action as a function of the current state. The aim in solving the MDP is to find a control policy that maximizes the expected future discounted reward; this policy is called the optimal policy $u^*$.

3.1 State space and action space

The state of an available network $m \in \{1, 2, \ldots, M\}$ in time slot $t \in \{0, 1, \ldots, T-1\}$ is determined by the channel state $\gamma_m(t)$ and the service type $k \in \{1, 2, \ldots, K\}$. Consequently, the state of an available network is the combination of the two:

$$s_m(k, t) = [\gamma_m(t), k(t)] \qquad (2)$$

In practice, the changes of channel state and of service type are independent of each other. Therefore, the system state changes in a Markovian fashion, and the finite state space is denoted $S_m$, with $s_m(k, t) \in S_m$ and the transition probability matrix

$$P_m(t) = \left[\psi_{g_m h_m}(t),\ k(t)\right]_{H_m \times H_m} \qquad (3)$$

where $\psi_{g_m h_m}(t)$ is defined in Eq. (1), $H_m = L \times K$, and K is the number of service types. The element of $P_m(t)$ is $p_{i_m(k,t)\, j_m(k,t)}(t)$, denoting the probability that the state of network m changes from $i_m(k, t)$ to $j_m(k, t)$ under action a, where $i_m(k, t), j_m(k, t) \in S_m$.

As stated in Section 2.1, the decision epochs are the session arrival and departure time points, because the states change when a session arrives or departs. The action is the network selection decision at the current epoch. At each epoch t, one of the networks is selected to be active, meaning that it is ready to admit a new session if one arrives at the next epoch t + 1. For each network m at epoch t,

$$a_m(t) = \begin{cases} 1, & \text{if network } m \text{ is selected at epoch } t \\ 0, & \text{if network } m \text{ is not selected at epoch } t \end{cases}$$

3.2 System reward

In the restless bandit problem, the system reward represents the optimization objectives. Since the objectives of the proposed scheme are to increase the data-rate, decrease the bit error rate (BER), and reduce the delay, we formulate the system reward as a function of these three quantities. Since different types of services have different requirements, each objective is
weighted by a factor determined by the service type. The action of a network determines whether the reward is gained. Therefore, we define the system reward as

$$Z = a_m(t)\, f\big(r(k) R(\gamma(t)),\ p(k) P(\gamma(t)),\ d(k) D(\gamma(t))\big) \qquad (4)$$

where $|r(k)| + |p(k)| + |d(k)| = 1$, $r(k)$ is a positive weight that depends on the service type k, $p(k)$ and $d(k)$ are negative weights that depend on the service type k, $R(\gamma(t))$ is the data-rate determined by the channel state $\gamma(t)$, $P(\gamma(t))$ is the BER determined by the channel state $\gamma(t)$, and $D(\gamma(t))$ is the delay for transmitting a data packet. Since sessions of the same service type in the same network have consistent properties, the instantaneous reward $Z^{a_m(t)}_{s_m(k,t)}$ is earned by network m in state $s_m(k, t)$ when it takes action $a_m(t)$ in time slot t.

For a stochastic process, we need to consider more than just the instantaneous reward that the system can receive. The goal of network selection is to find a selection policy that maximizes the total expected reward, and the optimum value is

$$Z^* = \max_{u \in U} E_u\left[\sum_{t=0}^{T-1} \beta^t \left(R^{a_1(t)}_{s_1(k,t)} + R^{a_2(t)}_{s_2(k,t)} + \cdots + R^{a_M(t)}_{s_M(k,t)}\right)\right] \qquad (5)$$

where $\beta$ is the discount factor and $R^{a_m(t)}_{s_m(k,t)}$ is the instantaneous reward of network m.

3.3 Solving the restless bandit problem

To solve the restless bandit problem, a hierarchy of increasingly stronger LP relaxations is developed based on the classical result on LP formulations of Markov decision chains (MDCs) [14]:

$$Z^* = \max_{x \in X} \sum_{m \in \mathcal{M}} \sum_{s_m(k,t) \in S_m} \sum_{a_m \in \{0,1\}} R^{a_m}_{s_m(k,t)}\, x^{a_m}_{s_m(k,t)} \qquad (6)$$

where $X = \{x = (x^{a_m}_{s_m(k,t)}(u)) \mid u \in U\}$ is the corresponding performance region spanned by the performance vector x under all admissible policies $u \in U$, and the performance measure $x^{a_m}_{s_m(k,t)}(u)$ represents the total expected discounted time that network m takes action $a_m$ in state $s_m(k, t)$ under admissible policy u. Let $\alpha_{s_m(k,t)}$ denote the probability that the initial state is $s_m(k, t)$; the initial state probability vector $\alpha = (\alpha_{s_m(k,t)})$ is given. The first-order relaxation is formulated as the linear program in Ref. [14]:

$$\begin{aligned} Z^1 = \max_{x \in X}\ & \sum_{m \in \mathcal{M}} \sum_{s_m(k,t) \in S_m} \sum_{a_m \in \{0,1\}} R^{a_m}_{s_m(k,t)}\, x^{a_m}_{s_m(k,t)} \\ \text{s.t.}\ & x_m \in Q_m^1, \quad m \in \mathcal{M}; \\ & \sum_{m \in \mathcal{M}} \sum_{s_m(k,t) \in S_m} x^{1}_{s_m(k,t)} = \frac{1}{1 - \beta} \end{aligned} \qquad (7)$$

The polytope $Q_m^1$ is precisely the projection of the restless bandit polytope onto the space of the variables $x^{a_m}_{s_m(k,t)}$ for project m, and the complete formulation of $Q_m^1$ is given in Ref. [14].

The authors of Ref. [14] interpreted the primal-dual heuristic as a priority-index heuristic under some mixing assumptions on the active and passive transition probabilities. The resulting network selection policy follows an index rule, which reduces the computational complexity dramatically; please refer to Ref. [14] for details. We use this priority-index rule, which sets active the network that has the smallest index, to select the network. For network m in state $s_m(k, t)$, we denote the index by $\delta_m(s_m)$. At each epoch, the network with the smallest index $\delta_m(s_m)$ is set to be active, while the other networks are passive.

The network selection scheme proceeds as follows. At each epoch, a request from a service on the mobile terminal is put forward. The policy agent then obtains every network's index for the current system state. By comparing the indices, the network with the lowest index is selected to be active. The scheme can be divided into an off-line stage and an on-line stage. In the off-line stage, indices are calculated for all states and actions and are stored in a table. In the on-line stage, the policy agent looks up its table to find the index corresponding to the current system state and action.

4 Simulation results and discussions
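The two-stage procedure described in Section 3.3 (indices computed off-line and stored in a table, then looked up on-line and compared) can be illustrated with a short sketch. The index values below are made-up placeholders rather than outputs of the primal-dual heuristic of Ref. [14], and the names `INDEX_TABLE` and `select_network` are hypothetical; only the lookup-and-compare step is shown.

```python
from typing import Dict, Tuple

# A network state is (channel level, service type), mirroring s_m(k, t).
State = Tuple[int, str]

# Off-line stage: one index per network and state, stored in a table.
# These numbers are illustrative placeholders, NOT results of the
# primal-dual index heuristic.
INDEX_TABLE: Dict[str, Dict[State, float]] = {
    "WLAN1": {(0, "video"): 0.90, (1, "video"): 0.60, (2, "video"): 0.20},
    "WLAN2": {(0, "video"): 0.80, (1, "video"): 0.40, (2, "video"): 0.10},
    "WiMAX": {(0, "video"): 0.95, (1, "video"): 0.70, (2, "video"): 0.50},
    "Cellular": {(0, "video"): 1.00, (1, "video"): 0.85, (2, "video"): 0.75},
}


def select_network(table: Dict[str, Dict[State, float]],
                   current_states: Dict[str, State]) -> str:
    """On-line stage: look up each network's index for its current state
    and return the network with the smallest index (the active one)."""
    indices = {net: table[net][state] for net, state in current_states.items()}
    return min(indices, key=indices.get)


if __name__ == "__main__":
    # Hypothetical channel levels observed at one decision epoch.
    observed = {
        "WLAN1": (2, "video"),
        "WLAN2": (0, "video"),
        "WiMAX": (1, "video"),
        "Cellular": (2, "video"),
    }
    print(select_network(INDEX_TABLE, observed))
```

Because the table is filled once, the on-line cost per decision is one lookup per network plus a comparison, which is the source of the complexity reduction mentioned in Section 3.3.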

In this section, we compare the proposed scheme with a random selection scheme and with the existing network selection scheme, in which the network selection decision is made according to the currently observed channel conditions. Three typical services are generated on the terminal: the real-time voice over IP (VoIP) and video services, and the non-real-time file transfer protocol (FTP) service. The data rates of the VoIP, video, and FTP services are 64 kbit/s, 1.5 Mbit/s, and 128 kbit/s, respectively. The area considered is covered by four networks: WLAN1, WLAN2, WiMAX, and a cellular network. We adopt the parameters shown in Table 1.

Table 1 Network parameters
Parameter                       WLAN1    WLAN2    WiMAX    Cellular
Target SNR/dB                   20       15       15       10
Available bandwidth/(Mbit/s)    2        3        1        0.4
Average delay/ms                30       20       20       10
Jitter/ms                       10       10       5        1
Average BER                     10^-3    10^-3    10^-4    10^-5

The received SNR is partitioned into 3 states: s0 (10 dB), s1 (15 dB), and s2 (20 dB). In each channel, we set the probability of staying in the same state to 0.7, and set the probability of a transition to an adjacent state to twice that of a transition to a non-adjacent state. The channel state transition probability matrix is

$$P = \begin{bmatrix} 0.70 & 0.20 & 0.10 \\ 0.15 & 0.70 & 0.15 \\ 0.10 & 0.20 & 0.70 \end{bmatrix}$$

The reward considered in this paper is given by Eq. (4), which is a function of data-rate, BER, and delay with different weights. We take $R(\gamma(t)) / (1\ \text{Mbit/s})$ as the first component of the reward function, weighted by the positive $r(k)$; $\lg\big(P(\gamma(t)) / 10^{-3}\big)$ as the second component, weighted by the negative $p(k)$; and $D(\gamma(t)) / (10\ \text{ms})$ as the third component, weighted by the negative $d(k)$. The reward $Z^{a_m(t)}_{s_m(k,t)}$ is zero if the network is passive ($a_m(t) = 0$). The initial state of each network is random; 10 runs with different seeds are conducted for each simulation, and the output data are averaged over these runs.

The system reward is obtained by assigning different weights to each component of the reward function. For the video service, we specify $r(k) = 0.8$, $d(k) = 0.2$, and $p(k) = 0$. We run the simulations for 1 000 s, and one network is selected at each decision time. Fig. 2 shows the data-rate of the video service using different network selection schemes. It can be seen that the proposed scheme selects a network with good channel quality for the subsequent packet at almost every decision time, and the data-rate is near 1.5 Mbit/s, which satisfies the requirement of the service. The data-rate using the existing network selection scheme is about 1.35 Mbit/s, since it selects a network for the subsequent frame according to the current state, which may change in the subsequent epoch. The random selection scheme obtains the lowest data-rate, about 1.15 Mbit/s.

Fig. 2 Data-rate of video service using different network selection schemes

We evaluate how the parameters in the transition matrix affect the system performance. Fig. 3 shows the average data-rate under different state transition probabilities, where the transition probability is the probability that the wireless channel stays in the same state. It can be seen that the performance of the existing network selection scheme approaches that of the proposed scheme as the transition probability increases, and it performs as well as the proposed scheme when the channel is absolutely static, that is, when the probability of staying in the same state is 1. The proposed scheme achieves the highest data-rate at every transition probability.

Fig. 3 Data-rate of video service under different transition probabilities

For the VoIP service, we specify $r(k) = 0.2$, $d(k) = 0.8$, and $p(k) = 0$. Fig. 4 shows the delay of the VoIP service under different state transition probabilities using different network selection schemes. It can be seen that the proposed scheme selects a network with good channel quality at every transition probability, and the delay is near 10 ms, which satisfies the requirement of the service. Because the existing network selection scheme selects a network for the subsequent frame according to the current state, which may change in the subsequent epoch, its delay is higher when the channel varies; it approaches that of the proposed scheme as the transition probability increases and matches it when the channel is absolutely static. The proposed scheme always performs better than the existing network selection scheme and the random selection scheme.

Fig. 4 Delay of VoIP service under different transition probabilities
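The sweeps over the stay probability used for Figs. 3 and 4 need one transition matrix per probability value. The following is a minimal sketch, assuming that the 2:1 split between adjacent and non-adjacent transitions stated for the matrix P (stay probability 0.7) carries over to other stay probabilities. The function names are illustrative and are not taken from the original simulation, and averaging SNR levels in dB in `expected_next_snr` is a simplification used only to show that the current state alone does not determine the next one.

```python
import random


def build_transition_matrix(p_stay):
    """Return a 3-state channel transition matrix for a given stay probability.

    Assumption: the remaining probability mass is split so that a move to an
    adjacent state is twice as likely as a move to a non-adjacent state.
    """
    if not 0.0 <= p_stay <= 1.0:
        raise ValueError("p_stay must lie in [0, 1]")
    rest = 1.0 - p_stay
    return [
        [p_stay, 2.0 * rest / 3.0, rest / 3.0],  # s0: s1 adjacent, s2 not
        [rest / 2.0, p_stay, rest / 2.0],        # s1: both neighbours adjacent
        [rest / 3.0, 2.0 * rest / 3.0, p_stay],  # s2: s1 adjacent, s0 not
    ]


def simulate_channel(matrix, start, steps, rng):
    """Sample a state path of the Markov channel, one transition per epoch."""
    path = [start]
    for _ in range(steps):
        row = matrix[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def expected_next_snr(matrix, state, snr_levels):
    """Expected SNR level (dB) one epoch ahead, given the current state."""
    return sum(p * snr for p, snr in zip(matrix[state], snr_levels))


if __name__ == "__main__":
    matrix = build_transition_matrix(0.7)
    for row in matrix:
        print([round(value, 2) for value in row])
    print(simulate_channel(matrix, start=2, steps=10, rng=random.Random(0)))
    print(expected_next_snr(matrix, 0, [10.0, 15.0, 20.0]))
```

With a stay probability of 0.7 the builder reproduces the matrix P up to floating-point rounding, and with a stay probability of 1 it returns the identity matrix, the absolutely static channel for which the existing scheme matches the proposed one.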

For the FTP service, we assign $p(k) = 1$ and 0 to the other weights. Fig. 5 shows the average BER of the FTP service under different state transition probabilities using different network selection schemes. It can be seen that the proposed scheme selects a network with good channel quality at every transition probability, and the average BER is about $10^{-5}$, which satisfies the requirement of the service. The average BER of the existing network selection scheme approaches that of the proposed scheme as the transition probability increases. The proposed scheme achieves the lowest average BER at every transition probability, and the random selection scheme performs worst in every case.
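To make the service-dependent weighting concrete, the sketch below evaluates the reward of Eq. (4) for the three services with the Table 1 values. Several points are assumptions rather than statements of the paper: the function f is taken to be a plain sum, the quoted weights are treated as magnitudes with the negative sign applied to the BER and delay terms, the available bandwidth stands in for the data-rate, and channel dynamics are ignored. The names `NETWORKS`, `WEIGHTS`, `reward`, and `best_network` are hypothetical.

```python
import math

# Table 1 values: bandwidth in Mbit/s, average delay in ms, average BER.
NETWORKS = {
    "WLAN1": {"rate": 2.0, "delay": 30.0, "ber": 1e-3},
    "WLAN2": {"rate": 3.0, "delay": 20.0, "ber": 1e-3},
    "WiMAX": {"rate": 1.0, "delay": 20.0, "ber": 1e-4},
    "Cellular": {"rate": 0.4, "delay": 10.0, "ber": 1e-5},
}

# Weight magnitudes (|r|, |p|, |d|) quoted in the text for each service.
WEIGHTS = {
    "video": (0.8, 0.0, 0.2),
    "voip": (0.2, 0.0, 0.8),
    "ftp": (0.0, 1.0, 0.0),
}


def reward(net, weights, active=True):
    """Reward of Eq. (4) under an ASSUMED additive form of f."""
    if not active:
        return 0.0  # a passive network (a_m(t) = 0) earns no reward
    r, p, d = weights
    rate_term = net["rate"] / 1.0              # normalized by 1 Mbit/s
    ber_term = math.log10(net["ber"] / 1e-3)   # lg(BER / 10^-3)
    delay_term = net["delay"] / 10.0           # normalized by 10 ms
    # BER and delay carry negative weights, so they are subtracted.
    return r * rate_term - p * ber_term - d * delay_term


def best_network(service):
    """Network with the largest reward for the given service."""
    return max(NETWORKS, key=lambda name: reward(NETWORKS[name], WEIGHTS[service]))


if __name__ == "__main__":
    for service in WEIGHTS:
        print(service, best_network(service))
```

Under these assumptions the sketch ranks the cellular network first for VoIP and FTP, which agrees with the 10 ms delay and $10^{-5}$ BER reported for the proposed scheme, and ranks WLAN2 first for video because of its larger bandwidth.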

Fig. 5 Average BER of FTP service under different transition probabilities

5 Conclusions

In this paper, we have presented a network selection policy based on stochastic control theory that accounts for the time-varying and stochastic character of wireless channels. In each decision epoch, the proposed scheme selects one network among the available alternatives according to the channel state of each network, which is modeled as a finite-state Markov channel, with the objectives of increasing the data-rate, decreasing the bit error rate, and reducing the delay. The network selection procedure is formulated as a stochastic control problem, which can be solved using linear programming and a primal-dual index heuristic algorithm. Simulation results show that network selection has a great impact on system performance and that the proposed scheme improves performance significantly.

Acknowledgements This work was supported by the National Natural Science Foundation of China (60971083), and the Scientific Research and Innovation Plan for the Youth of BUPT (2011RC0305).

References
1. Sachs J, Wiemann H. Integration of multi-radio access in a beyond 3G network. Proceedings of the 15th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC'04), Sep 5–8, 2004, Barcelona, Spain. Piscataway, NJ, USA: IEEE, 2004: 757–762
2. Wei Y, Hu Y, Song J. Network selection strategy in heterogeneous multi-access environment. The Journal of China Universities of Posts and Telecommunications, 2007, 14(Sup): 16–20
3. Zhang S, Yu F, Leung V. Joint connection admission control and routing in IEEE 802.16-based mesh networks. Proceedings of the IEEE International Conference on Communications (ICC'08), May 19–23, 2008, Beijing, China. Piscataway, NJ, USA: IEEE, 2008: 4938–4942
4. 3GPP TS 23.234. Group services and system aspects; 3GPP systems to wireless local area network (WLAN) interworking; system description, v.6.2.0. 2004
5. Gustafsson E, Jonsson A. Always best connected. IEEE Wireless Communications, 2003, 10(1): 49–55
6. Salkintzis A K. Interworking techniques and architectures for WLAN/3G integration toward 4G mobile data networks. IEEE Wireless Communications, 2004, 11(3): 50–61
7. Song Q, Jamalipour A. Network selection in an integrated wireless LAN and UMTS environment using mathematical modeling and computing techniques. IEEE Wireless Communications, 2005, 12(3): 42–48
8. Song W, Jiang H, Zhuang W, et al. Resource management for QoS support in cellular/WLAN interworking. IEEE Network, 2005, 19(5): 12–18
9. Niyato D, Hossain E. A noncooperative game-theoretic framework for radio resource management in 4G heterogeneous wireless access networks. IEEE Transactions on Mobile Computing, 2008, 7(3): 332–345
10. Sallent O, Perez-Romero J, Agusti R, et al. Resource auctioning mechanisms in heterogeneous wireless access networks. Proceedings of the 63rd Vehicular Technology Conference (VTC-Spring'06): Vol 1, Mar 7–10, 2006, Melbourne, Australia. Piscataway, NJ, USA: IEEE, 2006: 52–56
11. Yang J, Khandani A K, Tin N. Statistical decision making in adaptive modulation and coding for 3G wireless systems. IEEE Transactions on Vehicular Technology, 2005, 54(6): 2066–2073
12. Wang H S, Chang P C. On verifying the first-order Markovian assumption for a Rayleigh fading channel model. IEEE Transactions on Vehicular Technology, 1996, 45(2): 353–357
13. Pimentel C, Falk T H, Lisbôa L. Finite-state Markov modeling of correlated Rician-fading channels. IEEE Transactions on Vehicular Technology, 2004, 53(5): 1491–1501
14. Bertsimas D, Niño-Mora J. Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Operations Research, 2000, 48(1): 80–90
15. Li L, Goldsmith A J. Low-complexity maximum-likelihood detection of coded signals sent over finite-state Markov channels. IEEE Transactions on Communications, 2002, 50(4): 524–531
16. Zhang Q, Kassam S. Finite-state Markov model for Rayleigh fading channels. IEEE Transactions on Communications, 1999, 47(11): 1688–1692

(Editor: ZHANG Ying)