
5632 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 8, NO. 11, NOVEMBER 2009

Delay-Sensitive Distributed Power and Transmission Threshold Control for S-ALOHA Network with Finite State Markov Fading Channels
Huang Huang, Student Member, IEEE, and Vincent K. N. Lau, Senior Member, IEEE

Abstract: In this paper, we consider the delay-sensitive power and transmission threshold control design in S-ALOHA network with FSMC fading channels. The random access system consists of an access point with $K$ competing users, each of which has access to the local channel state information (CSI) and queue state information (QSI) as well as the common feedback (ACK/NAK/Collision) from the access point. We seek to derive the delay-optimal control policy (composed of threshold and power control). The optimization problem belongs to the memoryless policy $K$-agent infinite horizon decentralized Markov decision process (DEC-MDP), and finding the optimal policy is shown to be computationally intractable. To obtain a feasible and low complexity solution, we recast the optimization problem into two subproblems, namely the power control and the threshold control problem. For a given threshold control policy, the power control problem is decomposed into a reduced state MDP for single user so that the overall complexity is $\mathcal{O}(NJ)$, where $N$ and $J$ are the buffer size and the cardinality of the CSI states. For the threshold control problem, we exploit some special structure of the collision channel and common feedback information to derive a low complexity solution. The delay performance of the proposed design is shown to have substantial gain relative to conventional throughput optimal approaches for S-ALOHA.

Index Terms: S-ALOHA, delay, Markov decision process (MDP), local channel state information (CSI), local queue state information (QSI), threshold control, power control.

Manuscript received December 19, 2008; revised June 12, 2009 and August 15, 2009; accepted August 15, 2009. The associate editor coordinating the review of this paper and approving it for publication was S. Aissa.
The authors are with the Department of Electronic and Computer Engineering (ECE), The Hong Kong University of Science and Technology (HKUST), Hong Kong (e-mail: {huang, eeknlau}@ust.hk).
This work is supported by RGC 615407. The material in this paper was presented in part at the IEEE International Symposium on Information Theory, Seoul, Korea, June/July 2009.
Digital Object Identifier 10.1109/TCOMM.2009.081661
1536-1276/09$25.00 © 2009 IEEE

I. INTRODUCTION

RANDOM access network is a hot research topic due to its robustness in system performance. In particular, ALOHA is a popular example of random access protocol which has attracted a lot of research attention over the past two decades. One important application is the access network (such as the infrastructure mode in WiFi) where multiple nodes compete for transmission opportunity to transmit data to an access point (AP). In [1], the authors considered the design and analysis of the traditional buffered slotted ALOHA (S-ALOHA) in which finite users with infinite buffer attempt to transmit a backlogged packet according to a transmission probability in one slot, and the packet is successfully received if and only if exactly one packet is transmitted. In asymmetric network (heterogeneous users), the stability region has only been obtained in two and three user cases [2]. The study of the stability region for general number of users is difficult because the transition probability of the state space of the interacting queues alters from the non-empty to empty buffer case. In [3], the authors proposed a dominant system technique to obtain a lower bound for the stability region for the general case. In symmetric ALOHA network (homogeneous users), all users are statistically identical and hence, the stability region is degenerated to one dimension. It is shown in [1], [4] that the system is stable as long as the arrival rate is less than the average throughput. As a result, stability analysis is equivalent to the throughput analysis. The authors in [4] extended the protocol to an adaptive ALOHA over the multi-packet reception (MPR) channel to maximize the system throughput. For instance, the transmission probability is a function of the local channel state information (CSI). In [5], the authors extended to the adaptive transmission rate and power control with respect to CSI to maximize the throughput. In [6], it is shown that a simple adaptive permission probability scheme, namely binary scheduling, is throughput optimal for homogeneous users with adaptive transmission rate in collision channel. In the binary scheduling scheme, there is a transmission threshold, and a user could attempt to transmit its backlogged packet only when its local CSI exceeds the threshold.

In all the above works on stability and throughput analysis and optimization, the delay performance has been ignored completely. In practice, applications are delay-sensitive and it is critical to optimize the delay performance in S-ALOHA network to support realtime applications. In [7], the authors surveyed the recent works on delay analysis of traditional S-ALOHA network in which exact delay can be obtained only in two user case. In [8], the delay performance for finite user finite buffer is analyzed using the tagged user analysis (TUA) method. Although the channel fading is considered, adaptive transmission probability and rate with power control is not allowed. In [9], the trade-off between delay and energy in additive white Gaussian noise (AWGN) channel with no queue state information (QSI) is investigated. However, they assumed multi-access coding to ensure successful reception for each user even if all competing users transmit simultaneously. In [10], the authors proved that the longest queue highest possible rate (LQHPR) policy, which is a centralized control policy requiring perfect knowledge of global QSI and global CSI, is delay-optimal in symmetric network. While the above works deal with the delay performance of S-ALOHA network, there
HUANG and LAU: DELAY-SENSITIVE DISTRIBUTED POWER AND TRANSMISSION THRESHOLD CONTROL FOR S-ALOHA . . . 5633

are still a lot of technical challenges to be solved. They are listed below.

∙ Queue-aware power and threshold control for S-ALOHA: Previous literature focused either on the power control (under a fixed and common threshold for all users) for throughput optimization, or on the delay analysis of uncontrolled S-ALOHA network. Both the transmission threshold control and power control policies are important means to optimize the delay performance of S-ALOHA. However, due to the lack of global knowledge on CSI and QSI, it is quite challenging to design delay-sensitive control schemes for S-ALOHA networks.
∙ Exploiting memory in the fading channels: Existing works have assumed memoryless adaptation in which the control actions are done independently slot by slot (assuming fading is i.i.d.). While i.i.d. fading could lead to simple solution, it fails to exploit the memory of the time varying fading channels, which is critical to boost the delay performance of S-ALOHA network.
∙ Utilization of local QSI and common feedback information from the AP: Existing control policy on throughput optimization only adapts to the local CSI and did not exploit the local QSI as well as common feedback information from the AP. This side information is also critical to improve the delay performance of the S-ALOHA network.

In this paper, we shall propose a delay-sensitive power and transmission threshold control algorithm for S-ALOHA network which addresses the above three important issues. We consider an S-ALOHA network with $K$ users. The transmit power and threshold control policies adapt to the local CSI, local QSI as well as common feedback information (ACK/NAK/Collision) from the AP. The delay-optimization problem belongs to the memoryless policy $K$-agent infinite horizon decentralized Markov decision process (DEC-MDP) [11]. The problem of finding the optimal policy is proved to be NP-hard [12], [13], which means that the optimal solution is computationally intractable. To obtain a feasible and low complexity solution, we recast the optimization problem into two subproblems, namely the power control and the threshold control problem. For a given threshold control policy, the power control problem is decomposed into a reduced state MDP for single user so that the overall complexity is $\mathcal{O}(NJ^2)$, where $N$ and $J$ are the buffer size and the cardinality of the CSI states. On the other hand, we solve the threshold control problem by exploiting the special structure of the S-ALOHA network and common feedback information to derive a low complexity solution. The delay performance of the proposed design is shown to have substantial gain relative to conventional solutions.

This paper is organized as follows. In section II, we outline the system model of S-ALOHA network and define the delay-optimal control policy. In section III, we shall formulate the delay-optimal problem and introduce the DEC-MDP model. In section IV, we exploit the special structure in symmetric network, and illustrate the performance via simulations in section V. Finally, a brief summary is given in section VI.

Fig. 1. The system model in symmetric S-ALOHA network.

II. SYSTEM MODEL

In this section, we shall elaborate the system model, including source and physical layer model, as well as the control policy in symmetric network. We consider a $K$-user S-ALOHA network in this paper. The time dimension is partitioned into slots (each slot lasts $\tau$ seconds). The $m$-th slot means the time interval $(m\tau, (m+1)\tau)$, $m = 0, 1, 2, \cdots$. Fig. 1 illustrates the top level system model in symmetric network. The $K$ competing users are coupled together via the transmission threshold and power control policy.

A. Source Model

For simplicity, the packet arrivals of all the users are assumed to follow independent Poisson distributions with arrival rate $\lambda$ (number of packets per second). The packet length of the data source, $N_b$, follows exponential distribution with mean packet size $\overline{N}_b$ (bits per packet), and the buffer size is $N$ (packets). The QSI of the whole system at the $m$-th slot is denoted by $\mathbf{Q}_m = \{Q_{k,m}\}_{k=1}^{K} \in \mathcal{N}^K$, where $Q_{k,m}$ is the number of packets in the $k$-th user's buffer, and $\mathcal{N} = \{0, 1, 2, ..., N\}$ denotes a finite state space of local QSI for single user. When the buffer is full, i.e., $Q_{k,m} = N$, it will not accept any potential new packets.

B. Physical Layer Model and Feedback Mechanism

We consider a block fading channel between each user and the AP. The CSI at $m$-th slot is denoted by $\mathbf{H}_m = \{H_{k,m}\}_{k=1}^{K} \in \mathcal{S}^K$, where $H_{k,m}$ is the channel gain for user $k$, and $\mathcal{S} = \{S_i\}_{i=1}^{J}$ denotes a set of $J$ CSI states for single user. $\{H_{k,m}\}_{m=1}^{\infty}$ is modeled as a stationary ergodic process [14], which is independent among users. Specifically, let $p_{i,j} = \Pr\{H_{k,m} = S_j \mid H_{k,m-1} = S_i\}$ be the state transition probability and $\pi_j = \Pr\{H_{k,\infty} = S_j\}$ be the stationary probability. All the users share a common spectrum with a bandwidth of $W$ Hz using S-ALOHA protocol. The signal received by the AP at $m$-th slot is given by:

$$y[m] = \sum_{k=1}^{K} \sqrt{H_{k,m}}\, x_k[m] + z[m] \tag{1}$$

where $x_k[m]$ is the transmit signal for the $k$-th user at $m$-th slot, and $\{z[m]\}_{m=1}^{\infty}$ is the i.i.d. $\mathcal{N}(0, N_0)$ noise. Suppose that only the $k$-th user attempts to transmit its packet to the AP at the $m$-th slot. The maximum achievable data rate (b/s) of the $k$-th user is given by:

$$R(P_{k,m}, H_{k,m}) = W \log_2\left(1 + \frac{P_{k,m} H_{k,m}}{N_0 W}\right) \tag{2}$$
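As a hedged illustration of the channel model, the following Python sketch computes the stationary probabilities $\pi_j$ of a small finite-state Markov channel by power iteration and evaluates the rate in (2). The two-state transition matrix, the noise density value, and the function names are assumptions made for this example only; none of them come from the paper.

```python
import math


def stationary_distribution(p, iters=1000):
    """Approximate pi satisfying pi = pi P for a row-stochastic matrix p.

    Uses plain power iteration, which converges for an ergodic chain.
    """
    j = len(p)
    pi = [1.0 / j] * j
    for _ in range(iters):
        pi = [sum(pi[i] * p[i][k] for i in range(j)) for k in range(j)]
    return pi


def achievable_rate(power, gain, bandwidth_hz, noise_psd):
    """Rate of eq. (2): W * log2(1 + P * H / (N0 * W)), in bits per second."""
    snr = power * gain / (noise_psd * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


# Hypothetical two-state channel: entry [i][j] is p_{i,j}.
P_EXAMPLE = [[0.9, 0.1],
             [0.2, 0.8]]
PI_EXAMPLE = stationary_distribution(P_EXAMPLE)

# Hypothetical numbers: W = 1 kHz, N0 = 1e-3, unit power and unit gain.
RATE_EXAMPLE = achievable_rate(1.0, 1.0, 1000.0, 1e-3)
```

With these assumed numbers the SNR term equals one, so the rate equals the bandwidth in bits per second; a transition matrix with stronger diagonal entries would model a channel with longer memory.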

Here, $P_{k,m}$ and $H_{k,m}$ are the transmit power and channel gain of the $k$-th user at the $m$-th slot.

To decouple the delay-optimal design from the detailed implementation of the modulation and coding in the physical layer, we assume that the data rate (2) is achievable. In fact, it has been shown [15] that the Shannon limit in (2) can be achieved to within 0.05dB SNR using LDPC with 2K byte block size at 1% PER. We consider a collision channel for the S-ALOHA random access and hence, the AP could only decode the data successfully when there is only one user transmitting in any time slot. At the end of each slot, the AP broadcasts the ACK/NAK/Collision feedback, denoted as $\mathcal{Z} = (1, 0, e)$, to all the $K$ users in the network. Specifically, ACK ($Z = 1$) means that exactly one user has transmitted the packet, and data was successfully decoded; NAK ($Z = 0$) means that none of the users has transmitted and hence, no data was received; Collision ($Z = e$) means that at least two users have transmitted, and the data was corrupted.¹

C. Control Policy

Each user decides whether to transmit a packet at the beginning of a slot using a threshold mechanism. Due to symmetry, a user will transmit if the buffer is not empty and its local CSI exceeds a common system threshold $\gamma_m$.² If more than one backlogged user has local CSI exceeding the threshold, then collision will occur and none of the packets could get through. As a result, $\gamma_m$ determines the priority on the access opportunity of each user. In this paper, we shall consider an adaptive threshold control to exploit the fading memory to minimize the system delay. A stationary threshold control policy $\pi_\gamma$ is defined below:

Definition 1 (Stationary Threshold Control Policy):³ A stationary threshold control policy $\pi_\gamma : \mathcal{S} \times \mathcal{Z} \to \mathcal{S}$ is defined as the mapping from the previous slot's system threshold $\gamma_{m-1}$ and common feedback $Z_{m-1}$ from the AP to the system threshold $\pi_\gamma(\gamma_{m-1}, Z_{m-1}) = \gamma_m$ in current slot. The set of all feasible stationary policies $\pi_\gamma$ is denoted as $\mathcal{P}_\gamma = \{\pi_\gamma : \pi_\gamma(\gamma_{m-1}, Z_{m-1}) \in \mathcal{S}\}$.

The threshold control is adaptive to the common information for all the $K$ users and hence, each user could determine the system threshold just from the feedback from the AP. Denote $\boldsymbol{\chi}_m = \{\mathbf{Q}_m, \mathbf{H}_{m-1}, \gamma_{m-1}, Z_{m-1}, \mathbf{H}_m\}$ to be the global system state at the $m$-th slot and $\chi_{k,m} = \{Q_{k,m}, H_{k,m-1}, \gamma_{m-1}, Z_{m-1}, H_{k,m}\}$ to be the local system state which is observable locally at the $k$-th user. Note that $\{\gamma_{m-1}, Z_{m-1}\}$ is the common information for all users, and $\{Q_{k,m}, H_{k,m-1}, H_{k,m}\}$ is the local information for the $k$-th user. Given the observed local system state realization $\chi_{k,m}$, the $k$-th user should adjust the transmission power according to a stationary power control policy $\pi_P$, which is formally defined below.

Definition 2 (Stationary Power Control Policy): The stationary power control policy for single user $\pi_P : \mathcal{N} \times \mathcal{S} \times \mathcal{S} \times \mathcal{Z} \times \mathcal{S} \to \mathbb{R}$ is defined as the mapping from current local system state for $k$-th user to current slot's transmit power $\pi_P(\chi_{k,m}) = P_{k,m}$. The set of all feasible stationary policies $\pi_P$ is defined as $\mathcal{P}_P = \{\pi_P : \pi_P(\chi_{k,m}) \geq 0\}$. Note that $P_{k,m} = 0$ for all $H_{k,m} < \gamma_m$, because current slot's CSI is lower than the threshold.

For simplicity, let $\pi = \{\pi_\gamma, \pi_P\}$ denote the joint control policy of all the $K$ users. The corresponding set of stationary joint control policies is given by $\mathcal{P} = \{\mathcal{P}_\gamma, \mathcal{P}_P\}$. As a result, $\pi(\boldsymbol{\chi}_m) = \{\pi_\gamma(\gamma_{m-1}, Z_{m-1}), \{\pi_P(\chi_{k,m})\}_{k=1}^{K}\} = \{\gamma_m, \{P_{k,m}\}_{k=1}^{K}\}$.

In practice, the user with empty buffer will not transmit even if its local CSI exceeds the system threshold, and this is one important technical challenge in the delay analysis of S-ALOHA network. Instead of dealing with the delay for the original S-ALOHA network, we shall utilize the technique of dominant system [3] to obtain an upper bound of the delay performance. In the dominant system, we assume users always have virtual packets to send (even if the buffer is empty) and therefore, the delay performance associated with the dominant system is always an upper bound of the actual system. Yet, the bound is asymptotically tight in the large delay regime.

¹ Since we assume strong coding is used by each user, we ignore the case with transmission error.
² Please refer to [16] for more discussion on the common threshold setting.
³ We have assumed the deterministic threshold control policy here. In [16], we have shown that the same formulation and approach can be used to deal with a transmission probability approach rather than threshold approach.

III. PROBLEM FORMULATION

In this section, we shall first formulate the delay-optimal control policy problem, and then formally introduce DEC-MDP model. We show that our problem belongs to the memoryless policy case of DEC-MDP in which finding the optimal policy is computationally intractable.

A. System Delay

Due to the nature of random access, the queues of the $K$ users are coupled together via the control policy. When the system threshold is small, there will be a high probability of having more than one user sending packets, leading to collision and wastage of power resource. On the other hand, when the system threshold is high, there is non-negligible probability of having no user sending packets, leading to wastage of idle time. Similarly, an individual user may want to increase the transmit power when the local CSI is good, but if there is collision, the transmitted power is wasted. In this paper, we seek to find an optimal stationary control policy to minimize the average delays of the $K$ competing users subject to average transmit power constraint for single user. Specifically, the average delay for the $k$-th user is

$$T_k(\pi) = \limsup_{M} \frac{1}{M}\, \mathbb{E}\left[\sum_{m=1}^{M} Q_{k,m}\right], \quad \forall k \in \{1, ..., K\} \tag{3}$$

and average transmit power constraint is given by:

$$P_k(\pi) = \limsup_{M} \frac{1}{M}\, \mathbb{E}\left[\sum_{m=1}^{M} P_{k,m}\right] \leq P_0 \tag{4}$$

where $P_{k,m}$ is the transmitted power determined by $\pi(\chi_{k,m})$, and $P_0$ is the average power constraint for single user. The delay-optimal control problem can be formally written as:

Problem 1 (Delay Optimal S-ALOHA Control Policy): Find a stationary control policy $\pi$ that minimizes

$$J^{\pi}(\chi_1) = \sum_{k}\left[T_k(\pi) + \xi P_k(\pi)\right] = \limsup_{M} \frac{1}{M} \sum_{m,k} \mathbb{E}\left[g_k(\chi_{k,m}, \pi(\chi_{k,m}))\right] \tag{5}$$

where $g_k(\chi_{k,m}, \pi(\chi_{k,m})) = Q_{k,m} + \xi P_{k,m}$ is the per-stage system price function and $\xi > 0$ is the Lagrange multiplier corresponding to the average power constraints in (4).

B. DEC-MDP Model

Problem 1 in (5) in fact belongs to the class of infinite horizon DEC-MDP, which is formally defined below [11]:

Definition 3 (DEC-MDP): A $K$-agent DEC-MDP is given as a tuple

$$\{I, S, A, P(s' \mid s, a), R(s, a), p_0\}$$

where $I = \{1, .., K\}$ is a set of agents, $S = \{S_k\}$ is a finite set of states, $A = \{A_k\}$ is a set of joint actions, $S_k$ and $A_k$ are available to agent $k$, $P(s' \mid s, a)$ is the probability of a transition from state $s$ to $s'$ given that joint action $a$ is taken, $R(s, a)$ is the price function given state $s$ and joint action $a$, and $p_0$ is the initial state distribution of the system.⁴

The association between Problem 1 and DEC-MDP is as follows: We have $s_k = \chi_{k,m}$, $a_k = \pi$, $P(s' \mid s, a)$ can be easily obtained from local system state transition $P(s'_k \mid s_k, a_k)$ given in lemma 1, and $R(s, a) = \sum_{k=1}^{K} \left[g_k(\chi_{k,m}, \pi_k(\chi_{k,m}))\right]$. When the policy is given by a mapping from histories of local system state $\{s_{k,1}, ... s_{k,m}, ...\}$ to actions $a_k \in A_k$, the problem is undecidable⁵ [18]. When the policy is given by a mapping from current local system state $s_k$ to actions $a_k \in A_k$, it is called memoryless or reactive policy. In that case, the problem is NP-hard [12], [13]. As a result, it is very difficult to obtain the optimal solution for Problem 1. Instead of a brute-force solution, we shall try to exploit the special structure of our problem to obtain low complexity solutions.

⁴ More details about the infinite horizon DEC-MDP are provided in [17] and the references therein.
⁵ A decision problem is called undecidable if no algorithm can decide it, such as Turing's halting problem. Please refer to [16] for more details.

IV. DELAY-OPTIMAL CONTROL PROBLEM

In this section, we will focus on exploiting the special structure of the symmetric network.⁶ We shall first solve an optimal power control policy by a reduced state MDP for any given threshold control policy. To solve the threshold control problem, we utilize the collision channel mechanism and derive a low complexity solution.

⁶ A symmetric network refers to the case where all the $K$ users are statistically identical.

A. Embedded Markov Chain under a Given Threshold Control Policy

For a given threshold control policy, the observed local system state for single user actually evolves as a Markov chain. Specifically, the transition probability conditioned on the power control policy $\pi_P$ is given in the following lemma.

Lemma 1 (Transition Probability of Local System State): At $m$-th slot, the current state of the $k$-th user is $\chi_{k,m} = \{Q_{k,m}, H_{k,m-1}, \gamma_{m-1}, Z_{m-1}, H_{k,m}\}$. Conditioned on $\pi_P$, the transition probability to the next slot is given by:

$$\begin{aligned} \Pr\{\chi_{k,m+1} \mid \chi_{k,m}, \pi_P(\chi_{k,m})\} ={}& \mathbb{I}\big(\gamma_m = \pi_\gamma(\gamma_{m-1}, Z_{m-1})\big) \\ &\times \Pr\{H_{k,m+1} \mid H_{k,m}\}\, \Pr\{Z_m \mid Z_{m-1}, \{H_{k,i}, \gamma_i\}_{i=m-1}^{m}\} \\ &\times \Pr\{Q_{k,m+1} \mid \chi_{k,m}, Z_m, \pi_P(\chi_{k,m})\} \end{aligned} \tag{6}$$

where $\mathbb{I}(X)$ is an indicator function, which is equal to 1 when event $X$ is true and 0 otherwise.
Proof: Please refer to [16].

B. Reduced State MDP Formulation

For a given threshold control policy in (5), we seek to find an optimal power control policy to minimize

$$J^{\pi_P}(\chi_1) = \lim_{M} \frac{1}{M} \sum_{k,m} \mathbb{E}\left[g(\chi_{k,m}, \pi_P(\chi_{k,m}))\right] \tag{7}$$

Note that the power control policy is a function of local system state, and for the $k$-th user, its local system state transition probability is given in (6). The optimal power control policy in (7) could be decoupled into $K$ single-user optimization problems, which can be modeled as an MDP and summarized in the following lemma.

Lemma 2 (Power Control Optimization for Single User): The optimal power control policy⁷ minimizing the whole system delay can be modeled as a single user MDP problem, with state space given by local system state $\chi_m$ (ignoring user index $k$). The transition probability is given by $\Pr\{\chi_{m+1} \mid \chi_m, \pi_P(\chi_m)\}$ from lemma 1, and average price is given by:

$$J^{\pi_P}(\chi_1) = \lim_{M} \frac{1}{M} \sum_{m} \mathbb{E}\left[g(\chi_m, \pi_P(\chi_m))\right] \tag{8}$$

For the infinite horizon MDP, the optimal policy can be obtained by solving the Bellman equation recursively with respect to $(\theta, \{V(\chi)\})$ as below:

$$V(\chi_m) + \theta = \inf_{a(\chi_m)} \left\{ g(\chi_m, a(\chi_m)) + \sum_{\chi_{m+1}} \Pr\{\chi_{m+1} \mid \chi_m, a(\chi_m)\}\, V(\chi_{m+1}) \right\} \tag{9}$$

where $a(\chi_m) = \pi_P(\chi_m)$ is the power allocation when state is $\chi_m$. If there is a $(\theta, \{V(\chi)\})$ satisfying (9), then $\theta$ is the optimal average price per stage $J^{\pi_P}(\chi_1)$ and the corresponding optimizing policy is given by $a^*(\chi_m)$, the optimizing action of (9) at state $\chi_m$.

⁷ In [16], we have shown that there is no loss of optimality for this power control policy.
⁸ A similar technique was also used in [20].

Value or policy iteration can be used to solve the Bellman equation (9) [19]. The challenge of the two iteration algorithms lies in the size of the local state space. To reduce the complexity, we shall recast the original MDP in lemma 2 into a reduced state MDP. By partitioning the policy $\pi_P$ into a collection of actions, the above MDP could be further reduced to a simpler MDP over a reduced state $\hat{\chi}_m = \{Q_m, H_{m-1}, \gamma_{m-1}, Z_{m-1}\}$ only.⁸ Specifically, we have the following definition:

Definition 4 (Conditional Action): Given a policy $\pi_P$, we define $\boldsymbol{\pi}_P(\hat{\chi}_m) = \{\pi_P(\chi_m) : \chi_m = (\hat{\chi}_m, H_m)\ \forall H_m\}$ as the collection of actions under a given reduced state $\hat{\chi}_m$ for all possible current slot's CSI $H_m$. The policy $\pi_P$ is therefore equal to the union of all conditional actions, i.e., $\pi_P = \bigcup_{\hat{\chi}} \boldsymbol{\pi}_P(\hat{\chi})$.

Taking conditional expectation (conditioned on $\hat{\chi}$) on both sides of (9), and letting $\tilde{V}(\hat{\chi}_m) = \mathbb{E}[V(\chi_m) \mid \hat{\chi}_m] = \sum_{H_m} \Pr\{H_m \mid H_{m-1}\} V(\chi_m)$, the Bellman equation becomes:

$$\tilde{V}(\hat{\chi}_m) + \theta = \inf_{\mathbf{a}(\hat{\chi}_m)} \left\{ \tilde{g}(\hat{\chi}_m, \mathbf{a}(\hat{\chi}_m)) + \sum_{\hat{\chi}_{m+1}} \Pr\{\hat{\chi}_{m+1} \mid \hat{\chi}_m, \mathbf{a}(\hat{\chi}_m)\}\, \tilde{V}(\hat{\chi}_{m+1}) \right\} \tag{10}$$

where $a(\chi_m) = \pi_P(\chi_m)$ is a single power allocation action at state $\chi_m$ and $\mathbf{a}(\hat{\chi}_m) = \boldsymbol{\pi}_P(\hat{\chi}_m)$ is the collection of power allocation actions under a given reduced state $\hat{\chi}_m$. Furthermore, $\tilde{g}(\hat{\chi}_m, \mathbf{a}(\hat{\chi}_m))$ is the conditional per-stage price function given by:

$$\tilde{g}(\hat{\chi}_m, \mathbf{a}(\hat{\chi}_m)) = \mathbb{E}\left[g(\hat{\chi}_m, H_m, a(\chi_m)) \mid \hat{\chi}_m\right] = Q_m + \xi \left( \sum_{H_m} \Pr\{H_m \mid H_{m-1}\} P_m \right) \tag{11}$$

As a result, the original MDP is equivalent to a reduced state MDP, which is summarized in the following lemma.

Lemma 3 (Equivalent MDP on a Reduced State Space): The original MDP in lemma 2 is equivalent to the following reduced state MDP with state space given by $\hat{\chi}_m$ and average price given by $J^{\pi_P}(\chi_1) = \limsup_{M} \frac{1}{M} \sum_{m=1}^{M} \mathbb{E}\left[\tilde{g}(\hat{\chi}_m, \mathbf{a}(\hat{\chi}_m))\right]$. The state transition kernel $\Pr\{\hat{\chi}_{m+1} \mid \hat{\chi}_m, \mathbf{a}(\hat{\chi}_m)\}$ is equal to $\sum_{H_m} \Pr\{\hat{\chi}_{m+1} \mid \chi_m, a(\chi_m)\} \Pr\{H_m \mid H_{m-1}\}$.

The Bellman equation for the reduced state MDP is given in (10). Note that while the reduced state MDP is defined over the partial state $\hat{\chi}$, the power allocation is still a function of the original complete local system state. In fact, for each realization of the reduced state $\hat{\chi}_m$, the solution of the reduced MDP gives the conditional actions for different realizations of $H_m$.

C. Delay-Optimal Power Control Solution

Value or policy iteration can be used to solve the Bellman equation (10), and the convergence of the iteration algorithms is ensured by the following lemma.

Lemma 4 (Decidability of the Unichain of Reduced State): The unichain of the reduced state MDP in lemma 3 is decidable under all power control policies.
Proof: Please refer to [16].

The number of unichains of the reduced state MDP in lemma 3 depends on the number of recurrent classes of local system state (excluding the queue state $Q$) in $\hat{\chi}_m$, i.e., $\Phi_m = \{H_i, \gamma_i, Z_i\}_{i=m-1}$. The value or policy iteration could be applied to different unichains respectively, while the convergence and unique solution are ensured [19]. Specifically, the optimal power control policy for a system state $\chi_m$ is thus given by [16]:

$$P(\chi_m) = \left( \frac{W \tau \Pr\{Z_m = 1 \mid Z_{m-1}\}\, \delta(q_m, H_m, \gamma_m)}{-\overline{N}_b\, \xi \ln 2} - \frac{N_0 W}{H_m} \right)^{+} \tag{12}$$

where $\delta(q_m, H_m, \gamma_m) = \tilde{V}((q_m - 1)^+, H_m, \gamma_m, Z_m = 1) - \tilde{V}(q_m, H_m, \gamma_m, Z_m = 1)$. Note that the optimal power control action depends on the local CSI via the standard water-filling form. On the other hand, it also depends on the local QSI and common feedback $Z$ through the water-level.⁹ Using the optimal power allocation policy, the transition probability of reduced state is $\Pr\{\hat{\chi}_{m+1} \mid \hat{\chi}_m\} = \Pr\{Q_{m+1} \mid \chi_m, Z_m, \pi_P(\chi_m)\} \Pr\{\Phi_{m+1} \mid \Phi_m\}$. The stationary distribution of $\hat{\chi}$, denoted $\omega(\hat{\chi})$, could be found by the linear equations $\omega(\hat{\chi}_j) = \sum_i \omega(\hat{\chi}_i) \Pr\{\hat{\chi}_j \mid \hat{\chi}_i\}$. Finally, the Lagrange multiplier $\xi$ is chosen to satisfy the average power constraint per user $P_0$:

$$P_0 = \sum_{\hat{\chi}_m} \omega(\hat{\chi}_m) \sum_{H_m} \Pr\{H_m \mid H_{m-1}\}\, P(\chi_m) \tag{13}$$

⁹ As a sanity check in [16], we have shown that when the CSI are i.i.d. and the control policies are not functions of QSI, our proposed algorithm can be reduced to the Variable-Rate Algorithms studied in [5].

D. Threshold Control Policy

Threshold control policy is determined based on the common information $\{\gamma_{m-1}, Z_{m-1}\}$. The full exploitation of the known information is critical to improve the delay performance of the system. In fact, the common information $\{\gamma_{m-1}, Z_{m-1}\}$ could be used to exploit the memory of all the $K$ competing users' fading channels, and predict their transmission events at the current slot. Specifically, in the collision channel, data will be successfully received by the AP in the S-ALOHA network if and only if exactly one user transmits in a slot. Consequently, the known information shall be chosen to ensure the user with the largest CSI will transmit alone with the highest probability. Based on this observation, we propose a larger CSI higher priority (LCSIHP) threshold control policy as follows:

$$\gamma_m = \pi_\gamma(\gamma_{m-1}, Z_{m-1}) = \arg\max_{\gamma_m} \Pr\{\text{only 1 user transmits} \mid \gamma_{m-1}, Z_{m-1}\} \tag{14}$$

Given $\{\gamma_{m-1}, Z_{m-1}\}$, finding $\gamma_m^*$ as given in [16] is a one dimensional optimization problem on $\gamma_m$ and can be solved efficiently using standard numerical methods.

E. Summary of the Solution in Symmetric Network

The overall power and threshold control solution in symmetric network consists of an offline procedure and an online procedure, which are summarized below.

Offline Procedure: The output of the offline procedure is optimal power allocation $\pi_P(\chi)$, which will be stored in a table and used in the online procedure.
∙ Step 1) Determination of the threshold control policy: Determine the threshold control policy from (14) for different realizations of $\{\gamma_{m-1}, Z_{m-1}\}$.
∙ Step 2) Acquire unichains of reduced state: From the given threshold control policy, obtain the recurrent classes of the reduced state $\hat{\chi}$ from lemma 4.
∙ Step 3) Determination of the optimal power control policy: For a given $\xi$, determine $\theta(\xi)$, $\{\tilde{V}(Q_m, H_{m-1}, \gamma_{m-1}, Z_{m-1}; \xi)\}$ of the Bellman equation (10) in every unichain of reduced state by policy or value iteration algorithm. The optimal power control policy $\pi_P(\chi_m; \xi)$ is then determined in (12).
∙ Step 4) Transmit power constraint: For a given $\xi$, the average transmit power $P_0$ can be obtained in (13). On the other hand, we could use a root-finding numerical algorithm to determine $\xi$ that satisfies a given $P_0$.
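The interplay between the water-filling form in (12) and the root-finding in Step 4 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: the value difference $\delta$ and the ACK probability are held fixed (in the actual algorithm they come from solving (10) for each $\xi$), the average in (13) is replaced by a weighted sum over a hypothetical three-state CSI distribution, and all names and numbers are made up for the example.

```python
import math


def waterfilling_power(gain, xi, delta, p_ack, bandwidth_hz, slot_s,
                       mean_bits, noise_psd):
    """Power of eq. (12) for one CSI value; delta is assumed non-positive."""
    level = (bandwidth_hz * slot_s * p_ack * delta
             / (-mean_bits * xi * math.log(2)))
    return max(0.0, level - noise_psd * bandwidth_hz / gain)


def average_power(xi, gains, probs, **params):
    """Weighted average of the per-state powers (stand-in for eq. (13))."""
    return sum(w * waterfilling_power(h, xi, **params)
               for h, w in zip(gains, probs))


def solve_multiplier(target_power, gains, probs, lo=1e-12, hi=1e6,
                     iters=200, **params):
    """Bisection on a log scale: average power is non-increasing in xi."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if average_power(mid, gains, probs, **params) > target_power:
            lo = mid  # too much power is spent, so raise the price xi
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical configuration (assumptions for illustration only).
PARAMS = dict(delta=-2.0, p_ack=0.4, bandwidth_hz=1e3, slot_s=1e-3,
              mean_bits=1e3, noise_psd=1e-3)
GAINS = [0.5, 1.0, 2.0]
PROBS = [0.3, 0.4, 0.3]
XI = solve_multiplier(1.0, GAINS, PROBS, **PARAMS)
```

Bisection is used here because the average power decreases monotonically as $\xi$ grows; any bracketing root-finder would serve equally well in this sketch.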
Online procedure: Each of the homogeneous users observes $\chi_m = \{Q_m, H_{m-1}, \gamma_{m-1}, Z_{m-1}, H_m\}$, the local system state realization at the beginning of the $m$-th slot, and transmits at a power given by $\pi_P(\chi_m)$. If $H_m < \pi_\gamma(\gamma_{m-1}, Z_{m-1})$, then $P_m = \pi_P(\chi_m) = 0$, i.e., the user will not transmit.
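A minimal sketch of the online procedure as a table lookup is given below in Python. The table layout (dictionaries keyed by tuples), the feedback symbols, and the numeric entries are assumptions chosen for illustration; the paper only states that the offline results are stored in a table.

```python
def online_decision(state, threshold_table, power_table):
    """Return (threshold, transmit power) for one user in one slot.

    state is (q, h_prev, gamma_prev, z_prev, h), mirroring the local
    system state; both tables are assumed to be filled offline.
    """
    q, h_prev, gamma_prev, z_prev, h = state
    gamma = threshold_table[(gamma_prev, z_prev)]
    # An empty buffer or a CSI below the threshold means no transmission.
    if q == 0 or h < gamma:
        return gamma, 0.0
    return gamma, power_table[state]


# Hypothetical tables: feedback is 1 (ACK), 0 (NAK) or "e" (collision).
THRESHOLDS = {(1.0, 1): 1.0, (1.0, 0): 0.5, (1.0, "e"): 2.0}
POWERS = {(3, 1.0, 1.0, 1, 2.0): 0.8}

DECISION_TX = online_decision((3, 1.0, 1.0, 1, 2.0), THRESHOLDS, POWERS)
DECISION_IDLE = online_decision((3, 1.0, 1.0, "e", 1.0), THRESHOLDS, POWERS)
```

In the second call the user stays silent because its current CSI (1.0) is below the threshold selected after a collision (2.0), so the power table is never consulted.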
[Fig. 2 plot: average delay (packets) versus average power $P_0$ (dB) for Binary Scheduling (Baseline 1), Fixed Power (Baseline 2), Variable Rate (Baseline 3), and LCSIHP (Proposed Scheme).]
Fig. 2. Comparison of the delay performance between proposed control policy and three baselines in symmetric network. We assume that the buffer length $N = 5$, packet arrival rate $\lambda = 1$ for all $K = 5$ users, with mean packet size $\overline{N}_b = 1$K bits.

The complexity of the online procedure is negligible because it is simply a table lookup. The complexity of the offline procedure depends mostly on the solution of power control policy, which contains an iteration algorithm to solve the Bellman equation in (10). Specifically, the complexity of the reduced state MDP is given in the following theorem.

Theorem 1 (Complexity of the Reduced State MDP): The worst case complexity of the reduced state MDP is $O(f(K))$, where $f(K)$ is a monotonic decreasing function of number of users $K$. Furthermore, there exists a constant $K_0 > 0$ such that for all $K > K_0$, the complexity is reduced to $O(NJ)$.
Proof: Please refer to [16].

Theorem 1 implies that when $K$ is large enough, there is no need to exploit the memory of the fading channels. The threshold is fixed to $S_J$ regardless of the common feedback. This is reasonable because the more competing users we have, the smaller the chance for a single user to transmit alone. Hence, for sufficiently large $K$, the users are only allowed to transmit when local CSI reaches the largest state $S_J$, so as to reduce the intensive collision. Note that the complexity of the offline procedure is substantially reduced, compared to the complexity $O(NJ^3)$ of the brute-force solution in the original MDP in lemma 2.
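The intuition that a large $K$ pushes the threshold to the largest CSI state can be checked with a small Python sketch. It uses a deliberately simplified model that is an assumption of this example, not the paper's LCSIHP rule: all users are backlogged (as in the dominant system), each user's CSI is drawn independently from a hypothetical stationary distribution, and the feedback and channel memory are ignored.

```python
def single_transmit_probability(p_tx, num_users):
    """Probability that exactly one of num_users independent users transmits."""
    return num_users * p_tx * (1.0 - p_tx) ** (num_users - 1)


def best_fixed_threshold(states, stationary, num_users):
    """Pick the CSI state that maximizes the single-transmission probability.

    states must be sorted in increasing order; a user is assumed to
    transmit when its CSI is at least the threshold.
    """
    best_state, best_prob = None, -1.0
    for j, state in enumerate(states):
        p_tx = sum(stationary[j:])
        prob = single_transmit_probability(p_tx, num_users)
        if prob > best_prob:
            best_state, best_prob = state, prob
    return best_state, best_prob


# Hypothetical four-state channel (values are illustrative assumptions).
STATES = [0.5, 1.0, 2.0, 4.0]
STATIONARY = [0.4, 0.3, 0.2, 0.1]

FEW_USERS = best_fixed_threshold(STATES, STATIONARY, 2)
MANY_USERS = best_fixed_threshold(STATES, STATIONARY, 20)
```

Under these assumed numbers, two users favor a moderate threshold while twenty users favor the largest state, which mirrors the qualitative statement following Theorem 1.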
V. NUMERICAL RESULTS AND DISCUSSIONS

In this section, we shall illustrate the delay performance of the proposed control policy via numerical simulations. We set the slot duration $\tau = 1$ms and the bandwidth $W = 1$KHz. The packet arrival and CSI processes follow the assumptions in the system model (Section II). For the different simulation scenarios, we calculate the optimal policies offline. In the online application, the users simply implement the policy at each slot corresponding to the system state observed in that slot. The packet will stay in the buffer until it is successfully serviced, and the performance is evaluated with sufficient realizations.

Figs. 2-4 compare the LCSIHP threshold control policy (with the corresponding optimal power control policy) in symmetric network with three reference baselines. Baseline 1 corresponds to the binary scheduling algorithm in [6]. Baseline 2 corresponds to the LCSIHP threshold control policy without power control. Baseline 3 corresponds to the variable-rate algorithm with power control proposed in [5]. We observe that there is a significant gain in both delay and throughput of the proposed policy over these three baselines. Fig. 4 compares packet dropping probability (a packet arrives when the buffer is full, $Q = N$). It shows that packet dropping performance is also improved by the proposed policy. This scenario can also be inferred from the optimal power control policy, which will potentially put more power on the node with larger QSI to reduce the delay.

[Fig. 3 plot: average throughput (bits/slot) versus average power $P_0$ (dB) for Binary Scheduling (Baseline 1), Fixed Power (Baseline 2), Variable Rate (Baseline 3), and LCSIHP (Proposed Scheme).]
Fig. 3. Comparison of the throughput performance between proposed control policy and three baselines in symmetric network. The configuration is the same as Fig. 2.

We have provided more simulation results in [16], including the comparison of delay performance of the random access channel with capture effect. Our proposed scheme for the asymmetric network also has significant gain compared with the baselines.

VI. SUMMARY

We considered delay-sensitive transmit power and threshold control design in S-ALOHA network. The users adaptively adjust their transmission threshold and power to achieve the minimal delay of the network. The jointly optimal policy is revealed to be computationally intractable and hence a brute-force solution is simply infeasible. However, for a given threshold control policy, we decompose the optimal power
control policy into a reduced state MDP for a single user, in which the overall complexity is 𝒪(𝑁𝐽). A threshold control policy is proposed by exploiting the special structure of the collision channel and the common feedback to derive a low complexity solution, which is a one dimensional optimization problem. The delay performance of the proposed design is illustrated to have substantial gain relative to conventional S-ALOHA protocols.

[Fig. 4 plot: packet dropping probability (conditioned on packet arrival) versus average power 𝑃_0 (dB), with curves for Binary Scheduling (Baseline 1), Fixed Power (Baseline 2), Variable Rate (Baseline 3) and LCSIHP (Proposed Scheme).]

Fig. 4. Comparison of Packet Dropping Probability (Conditioned on Packet Arrival) between proposed control policy and three baselines in symmetric network. The configuration is the same as Fig. 2.

REFERENCES

[1] B. Tsybakov and W. Mikhailov, "Ergodicity of slotted ALOHA systems," Probl. Inf. Transm., vol. 15, pp. 30-312, Oct./Dec. 1979.
[2] W. Luo and A. Ephremides, "Stability of N interacting queues in random access system," IEEE Trans. Inf. Theory, vol. 45, pp. 1579-1587, July 1999.
[3] R. Rao and A. Ephremides, "On the stability of interacting queues in a multi-access system," IEEE Trans. Inf. Theory, vol. 34, pp. 918-930, Sep. 1988.
[4] S. Adireddy and L. Tong, "Exploiting decentralized channel state information for random access," IEEE Trans. Inf. Theory, vol. 51, pp. 537-561, Feb. 2005.
[5] X. Qin and R. A. Berry, "Distributed approaches for exploiting multiuser diversity in wireless networks," IEEE Trans. Inf. Theory, vol. 52, pp. 392-413, Feb. 2006.
[6] Y. Yu and G. B. Giannakis, "Opportunistic medium access for wireless networking adapted to decentralized CSI," IEEE Trans. Wireless Commun., vol. 5, pp. 1445-1455, June 2006.
[7] V. Naware, G. Mergen, and L. Tong, "Stability and delay of finite-user slotted ALOHA with multipacket reception," IEEE Trans. Inf. Theory, vol. 51, pp. 2636-200, July 2005.
[8] S. B. Rasool and A. U. H. Sheikh, "An approximate analysis of buffered S-ALOHA in fading channels using tagged user analysis," IEEE Trans. Wireless Commun., vol. 6, pp. 1320-1326, Apr. 2007.
[9] T. P. Coleman and M. Médard, "Trade-off between energy and delay in wireless packetized systems," in Proc. 39th Annual Allerton Conf. Commun., Control Computing, 2001.
[10] E. M. Yeh, "Multiaccess and fading in communication networks," Ph.D. dissertation, MIT, Sep. 2001.
[11] D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein, "The complexity of decentralized control of Markov decision processes," Mathematics Operations Research, vol. 4, pp. 819-840, 2002.
[12] M. L. Littman, "Memoryless policies: theoretical limitations and practical results," in Proc. Third International Conf. Simulation Adaptive Behavior, 1994.
[13] N. Meuleau, K. E. Kim, L. P. Kaelbling, and A. R. Cassandra, "Solving POMDPs by searching the space of finite policies," in Proc. Fifteenth Conf. Uncertainty AI, 1999, pp. 417-426.
[14] H. S. Wang and N. Moayeri, "Finite-state Markov channel-a useful model for radio communication channels," IEEE Trans. Veh. Technol., vol. 44, pp. 163-17, Feb. 1995.
[15] S. Y. Chung, J. G. D. Forney, T. J. Richardson, and R. Urbanke, "On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit," IEEE Commun. Lett., vol. 5, pp. 58-60, Feb. 2001.
[16] H. Huang and V. K. N. Lau, "Delay-sensitive distributed power and transmission threshold control for S-ALOHA network with finite state Markov fading channels." [Online]. Available: http://arxiv.org/abs/0908.2941.
[17] D. S. Bernstein, C. Amato, E. A. Hansen, and S. Zilberstein, "Policy iteration for decentralized control of Markov decision processes," J. Artificial Intelligence Research, vol. 34, pp. 89-132, Feb. 2009.
[18] O. Madani, S. Hanks, and A. Condon, "On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision process problems," in Proc. Sixteenth National Conf. Artificial Intelligence, 1999, pp. 541-548.
[19] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, 1994.
[20] V. K. N. Lau and Y. Chen, "Delay-optimal power and precoder adaptation for multi-stream MIMO systems in wireless fading channels with outdated CSIT," to appear in IEEE Trans. Wireless Commun.

Huang Huang received the B.Eng. and M.Eng. (Gold medal) from the Harbin Institute of Technology (HIT) in 2005 and 2007 respectively, both in Electrical Engineering. He is currently a PhD student at the Department of Electrical and Computer Engineering, The Hong Kong University of Science and Technology. His recent research interests include cross layer design and performance analysis via game theory in random access network, and embedded system design.

Vincent Lau obtained B.Eng (Distinction 1st Hons) from the University of Hong Kong (1989-1992) and Ph.D. from Cambridge University (1995-1997). He was with HK Telecom (PCCW) as system engineer from 1992-1995 and Bell Labs - Lucent Technologies as member of technical staff from 1997-2003. He then joined the Department of ECE, Hong Kong University of Science and Technology (HKUST) as Associate Professor. His current research interests include the robust and delay-sensitive cross-layer scheduling of MIMO/OFDM wireless systems with imperfect channel state information, cooperative and cognitive communications, dynamic spectrum access as well as stochastic approximation and Markov Decision Process.
