Stochastic Network Optimization
with Application to
Communication and
Queueing Systems
Copyright © 2010 by Morgan & Claypool
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations in
printed reviews, without the prior permission of the publisher.
Stochastic Network Optimization with Application to Communication and Queueing Systems
Michael J. Neely
www.morganclaypool.com
ISBN: 9781608454556 paperback
ISBN: 9781608454563 ebook
DOI 10.2200/S00271ED1V01Y201006CNT007
A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON COMMUNICATION NETWORKS
Lecture #7
Series Editor: Jean Walrand, University of California, Berkeley
Series ISSN: Print 1935-4185   Electronic 1935-4193
This material is supported in part by one or more of the following: the DARPA ITMANET program grant W911NF-07-0028, the NSF CAREER grant CCF-0747525, and continuing through participation in the Network Science Collaborative Technology Alliance sponsored by the U.S. Army Research Laboratory.
Synthesis Lectures on
Communication Networks
Editor
Jean Walrand, University of California, Berkeley
Synthesis Lectures on Communication Networks is an ongoing series of 50- to 100-page publications
on topics on the design, implementation, and management of communication networks. Each lecture is
a self-contained presentation of one topic by a leading expert. The topics range from algorithms to
hardware implementations and cover a broad spectrum of issues from security to multiple-access
protocols. The series addresses technologies from sensor networks to reconfigurable optical networks.
The series is designed to:
• Provide the best available presentations of important aspects of communication networks.
• Help engineers and advanced students keep up with recent developments in a rapidly evolving
technology.
• Facilitate the development of courses in this field.
Stochastic Network Optimization with Application to Communication and Queueing
Systems
Michael J. Neely
2010
Scheduling and Congestion Control for Wireless and Processing Networks
Libin Jiang and Jean Walrand
2010
Performance Modeling of Communication Networks with Markov Chains
Jeonghoon Mo
2010
Communication Networks: A Concise Introduction
Jean Walrand and Shyam Parekh
2010
Path Problems in Networks
John S. Baras and George Theodorakopoulos
2010
Performance Modeling, Loss Networks, and Statistical Multiplexing
Ravi R. Mazumdar
2009
Network Simulation
Richard M. Fujimoto, Kalyan S. Perumalla, and George F. Riley
2006
Stochastic Network Optimization
with Application to
Communication and
Queueing Systems
Michael J. Neely
University of Southern California
SYNTHESIS LECTURES ON COMMUNICATION NETWORKS #7
ABSTRACT
This text presents a modern theory of analysis, control, and optimization for dynamic networks.
Mathematical techniques of Lyapunov drift and Lyapunov optimization are developed and shown
to enable constrained optimization of time averages in general stochastic systems. The focus is on
communication and queueing systems, including wireless networks with time-varying channels,
mobility, and randomly arriving traffic. A simple drift-plus-penalty framework is used to optimize
time averages such as throughput, throughput-utility, power, and distortion. Explicit performance-delay
tradeoffs are provided to illustrate the cost of approaching optimality. This theory is also
applicable to problems in operations research and economics, where energy-efficient and
profit-maximizing decisions must be made without knowing the future.
Topics in the text include the following:
• Queue stability theory
• Backpressure, max-weight, and virtual queue methods
• Primal-dual methods for non-convex stochastic utility maximization
• Universal scheduling theory for arbitrary sample paths
• Approximate and randomized scheduling theory
• Optimization of renewal systems and Markov decision systems
Detailed examples and numerous problem set questions are provided to reinforce the main
concepts.
KEYWORDS
dynamic scheduling, decision theory, wireless networks, Lyapunov optimization, congestion
control, fairness, network utility maximization, multi-hop, mobile networks, routing,
backpressure, max-weight, virtual queues
Contents
Preface . . . . . . xi
1 Introduction . . . . . . 1
1.1 Example Opportunistic Scheduling Problem . . . . . . 1
1.1.1 Example Problem 1: Minimizing Time Average Power Subject to Stability . . . . . . 2
1.1.2 Example Problem 2: Maximizing Throughput Subject to Time Average Power Constraints . . . . . . 2
1.1.3 Example Problem 3: Maximizing Throughput-Utility Subject to Time Average Power Constraints . . . . . . 3
1.2 General Stochastic Optimization Problems . . . . . . 4
1.3 Lyapunov Drift and Lyapunov Optimization . . . . . . 5
1.4 Differences from our Earlier Text . . . . . . 7
1.5 Alternative Approaches . . . . . . 7
1.6 On General Markov Decision Problems . . . . . . 8
1.7 On Network Delay . . . . . . 9
1.7.1 Delay and Dynamic Programming . . . . . . 9
1.7.2 Optimal O(√V) and O(log(V)) delay tradeoffs . . . . . . 9
1.7.3 Delay-optimal Algorithms for Symmetric Networks . . . . . . 10
1.7.4 Order-optimal Delay Scheduling and Queue Grouping . . . . . . 10
1.7.5 Heavy Traffic and Decay Exponents . . . . . . 11
1.7.6 Capacity and Delay Tradeoffs for Mobile Networks . . . . . . 11
1.8 Preliminaries . . . . . . 12
2 Introduction to Queues . . . . . . 15
2.1 Rate Stability . . . . . . 17
2.2 Stronger Forms of Stability . . . . . . 18
2.3 Randomized Scheduling for Rate Stability . . . . . . 19
2.3.1 A 3-Queue, 2-Server Example . . . . . . 20
2.3.2 A 2-Queue Opportunistic Scheduling Example . . . . . . 22
2.4 Exercises . . . . . . 25
3 Dynamic Scheduling Example . . . . . . 29
3.1 Scheduling for Stability . . . . . . 29
3.1.1 The S-only Algorithm and max . . . . . . 30
3.1.2 Lyapunov Drift for Stable Scheduling . . . . . . 31
3.1.3 The “Min-Drift” or “Max-Weight” Algorithm . . . . . . 34
3.1.4 Iterated Expectations and Telescoping Sums . . . . . . 36
3.1.5 Simulation of the Max-Weight Algorithm . . . . . . 37
3.2 Stability and Average Power Minimization . . . . . . 37
3.2.1 Drift-Plus-Penalty . . . . . . 39
3.2.2 Analysis of the Drift-Plus-Penalty Algorithm . . . . . . 40
3.2.3 Optimizing the Bounds . . . . . . 41
3.2.4 Simulations of the Drift-Plus-Penalty Algorithm . . . . . . 42
3.3 Generalizations . . . . . . 43
4 Optimizing Time Averages . . . . . . 45
4.1 Lyapunov Drift and Lyapunov Optimization . . . . . . 45
4.1.1 Lyapunov Drift Theorem . . . . . . 45
4.1.2 Lyapunov Optimization Theorem . . . . . . 47
4.1.3 Probability 1 Convergence . . . . . . 49
4.2 General System Model . . . . . . 52
4.2.1 Boundedness Assumptions . . . . . . 53
4.3 Optimality via ω-only Policies . . . . . . 53
4.4 Virtual Queues . . . . . . 56
4.5 The Min Drift-Plus-Penalty Algorithm . . . . . . 58
4.5.1 Where are we Using the i.i.d. Assumptions? . . . . . . 62
4.6 Examples . . . . . . 62
4.6.1 Dynamic Server Scheduling . . . . . . 62
4.6.2 Opportunistic Scheduling . . . . . . 64
4.7 Variable V Algorithms . . . . . . 67
4.8 Place-Holder Backlog . . . . . . 69
4.9 Non-i.i.d. Models and Universal Scheduling . . . . . . 72
4.9.1 Markov Modulated Processes . . . . . . 74
4.9.2 Non-Ergodic Models and Arbitrary Sample Paths . . . . . . 77
4.10 Exercises . . . . . . 81
4.11 Appendix 4.A — Proving Theorem 4.5 . . . . . . 92
4.11.1 The Region . . . . . . 92
4.11.2 Characterizing Optimality . . . . . . 93
5 Optimizing Functions of Time Averages . . . . . . 97
5.0.3 The Rectangle Constraint R . . . . . . 98
5.0.4 Jensen’s Inequality . . . . . . 98
5.0.5 Auxiliary Variables . . . . . . 99
5.1 Solving the Transformed Problem . . . . . . 100
5.2 A Flow-Based Network Model . . . . . . 104
5.2.1 Performance of the Flow-Based Algorithm . . . . . . 107
5.2.2 Delayed Feedback . . . . . . 108
5.2.3 Limitations of this Model . . . . . . 108
5.3 Multi-Hop Queueing Networks . . . . . . 109
5.3.1 Transmission Variables . . . . . . 110
5.3.2 The Utility Optimization Problem . . . . . . 111
5.3.3 Multi-Hop Network Utility Maximization . . . . . . 111
5.3.4 Backpressure-Based Routing and Resource Allocation . . . . . . 113
5.4 General Optimization of Convex Functions of Time Averages . . . . . . 114
5.5 Non-Convex Stochastic Optimization . . . . . . 116
5.6 Worst Case Delay . . . . . . 120
5.6.1 The persistent service queue . . . . . . 122
5.6.2 The Drift-Plus-Penalty for Worst-Case Delay . . . . . . 123
5.6.3 Algorithm Performance . . . . . . 125
5.7 Alternative Fairness Metrics . . . . . . 128
5.8 Exercises . . . . . . 129
6 Approximate Scheduling . . . . . . 137
6.1 Time-Invariant Interference Networks . . . . . . 138
6.1.1 Computing over Multiple Slots . . . . . . 138
6.1.2 Randomized Searching for the Max-Weight Solution . . . . . . 140
6.1.3 The Jiang-Walrand Theorem . . . . . . 141
6.2 Multiplicative Factor Approximations . . . . . . 144
7 Optimization of Renewal Systems . . . . . . 149
7.1 The Renewal System Model . . . . . . 149
7.1.1 The Optimization Goal . . . . . . 150
7.1.2 Optimality over i.i.d. algorithms . . . . . . 151
7.2 Drift-Plus-Penalty for Renewal Systems . . . . . . 152
7.2.1 Alternate Formulations . . . . . . 157
7.3 Minimizing the Drift-Plus-Penalty Ratio . . . . . . 157
7.3.1 The Bisection Algorithm . . . . . . 159
7.3.2 Optimization over Pure Policies . . . . . . 160
7.3.3 Caveat — Frames with Initial Information . . . . . . 161
7.4 Task Processing Example . . . . . . 162
7.5 Utility Optimization for Renewal Systems . . . . . . 164
7.5.1 The Utility Optimal Algorithm for Renewal Systems . . . . . . 167
7.6 Dynamic Programming Examples . . . . . . 168
7.6.1 Delay-Limited Transmission Example . . . . . . 168
7.6.2 Markov Decision Problem for Minimum Delay Scheduling . . . . . . 171
7.7 Exercises . . . . . . 174
8 Conclusions . . . . . . 179
Bibliography . . . . . . 181
Author’s Biography . . . . . . 199
Preface
This text is written to teach the theory of Lyapunov drift and Lyapunov optimization for stochastic
network optimization. It assumes only that the reader is familiar with basic probability concepts
(such as expectations and the law of large numbers). Familiarity with Markov chains and with
standard (non-stochastic) optimization is useful but not required. A variety of examples and simulation
results are given to illustrate the main concepts. Diverse problem set questions (several with
example solutions) are also given. These questions and examples were developed over several years
for use in the stochastic network optimization course taught by the author. They include topics
of wireless opportunistic scheduling, multi-hop routing, network coding for maximum throughput,
distortion-aware data compression, energy-constrained and delay-constrained queueing, dynamic
decision making for maximum profit, and more.
The Lyapunov theory for optimizing network time averages was described collectively in our
previous text (22). The current text is significantly different from (22). It has been reorganized with
many more examples to help the reader. This is done while still keeping all of the details for a
complete and self-contained exposition of the material. This text also provides many recent topics
not covered in (22), including:
• A more detailed development of queue stability theory (Chapter 2).
• Variable-V algorithms that provide exact optimality of time averages subject to a weaker form
of stability called “mean rate stability” (Section 4.7).
• Place-holder bits for delay improvement (Sections 3.2.4 and 4.8).
• Universal scheduling for non-ergodic sample paths (Section 4.9).
• Worst case delay bounds (Sections 5.6 and 7.6.1).
• Non-convex stochastic optimization (Section 5.5).
• Approximate scheduling and full throughput scheduling in interference networks via the
Jiang-Walrand theorem (Chapter 6).
• Optimization of renewal systems and Markov decision examples (Chapter 7).
• Treatment of problems with equality constraints and abstract set constraints (Section 5.4).
Finally, this text emphasizes the simplicity of the Lyapunov method, showing how all of the
results follow directly from four simple concepts: (i) telescoping sums, (ii) iterated expectations,
(iii) opportunistically minimizing an expectation, and (iv) Jensen’s inequality.
Michael J. Neely
September 2010
CHAPTER 1
Introduction
This text considers the analysis and control of stochastic networks, that is, networks with random
events, time variation, and uncertainty. Our focus is on communication and queueing systems.
Example applications include wireless mesh networks with opportunistic scheduling, cognitive radio
networks, ad hoc mobile networks, internets with peer-to-peer communication, and sensor networks
with joint compression and transmission. The techniques are also applicable to stochastic systems
that arise in operations research, economics, transportation, and smart-grid energy distribution.
These problems can be formulated as problems that optimize the time averages of certain quantities
subject to time average constraints on other quantities, and they can be solved with a common
mathematical framework that is intimately connected to queueing theory.
1.1 EXAMPLE OPPORTUNISTIC SCHEDULING PROBLEM
[Figure: queues Q_1(t) and Q_2(t) with arrivals a_1(t), a_2(t), channel states S_1(t), S_2(t), and transmission rates b_1(t) = b̂_1(S(t), p(t)), b_2(t) = b̂_2(S(t), p(t)) to a common receiver.]
Figure 1.1: The 2-user wireless system for the example of Section 1.1.
Here we provide a simple wireless example to illustrate how the theory for optimizing time
averages can be used. Consider a 2-user wireless uplink that operates in slotted time t ∈ {0, 1, 2, . . .}.
Every slot new data randomly arrives to each user for transmission to a common receiver. Let
(a_1(t), a_2(t)) be the vector of new arrivals on slot t, in units of bits. The data is stored in queues
Q_1(t) and Q_2(t) to await transmission (see Fig. 1.1). We assume the receiver coordinates network
decisions every slot.
Channel conditions are assumed to be constant for the duration of a slot, but they can change
from slot to slot. Let S(t) = (S_1(t), S_2(t)) denote the channel conditions between users and the
receiver on slot t. The channel conditions represent any information that affects the channel on slot t,
such as fading coefficients and/or noise ratios. We assume the network controller can observe S(t) at
the beginning of each slot t before making a transmission decision. This channel-aware scheduling
is called opportunistic scheduling. Every slot t , the network controller observes the current S(t )
and chooses a power allocation vector p(t) = (p_1(t), p_2(t)) within some set P of possible power
allocations. This decision, together with the current S(t), determines the transmission rate vector
(b_1(t), b_2(t)) for slot t, where b_k(t) represents the transmission rate (in bits/slot) from user k ∈ {1, 2}
to the receiver on slot t. Specifically, we have general transmission rate functions b̂_k(p(t), S(t)):

b_1(t) = b̂_1(p(t), S(t)) ,    b_2(t) = b̂_2(p(t), S(t))
The precise form of these functions depends on the modulation and coding strategies used for
transmission. The queueing dynamics are then:

Q_k(t + 1) = max[Q_k(t) − b̂_k(p(t), S(t)), 0] + a_k(t)    ∀k ∈ {1, 2}, ∀t ∈ {0, 1, 2, . . .}
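The queue recursion above is straightforward to simulate. The sketch below is a toy illustration (not taken from the text): it assumes a hypothetical rate function b̂_k(p(t), S(t)) = S_k(t) p_k(t), Bernoulli bit arrivals, and a naive policy that transmits every slot.

```python
import random

def queue_update(Q, b, a):
    """One slot of the dynamics: Q(t+1) = max[Q(t) - b(t), 0] + a(t)."""
    return max(Q - b, 0) + a

def simulate(T=10000, seed=0):
    """Simulate the 2-user uplink with an always-transmit policy.

    Hypothetical model (illustrative assumptions only):
    b_hat_k(p, S) = S_k * p_k bits/slot, and a_k(t) ~ Bernoulli(0.4) bits.
    """
    random.seed(seed)
    Q = [0.0, 0.0]
    for _ in range(T):
        S = [random.choice([1, 2]) for _ in range(2)]  # observed channel states
        p = [1, 1]                                     # naive: always transmit
        for k in range(2):
            b = S[k] * p[k]                            # assumed rate function
            a = 1 if random.random() < 0.4 else 0      # new arrivals
            Q[k] = queue_update(Q[k], b, a)
    return Q

print(queue_update(5, 3, 2))  # -> 4
print(queue_update(1, 3, 2))  # -> 2
```

The deterministic checks confirm the role of the max[·, 0] operation: service can empty a queue but never drives the backlog negative.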
Several types of optimization problems can be considered for this simple system.
1.1.1 EXAMPLE PROBLEM 1: MINIMIZING TIME AVERAGE POWER SUBJECT TO STABILITY
Let p̄_k be the time average power expenditure of user k under a particular power allocation algorithm
(for k ∈ {1, 2}):

p̄_k = lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} p_k(τ)
The problem of designing an algorithm to minimize time average power expenditure subject to
queue stability can be written mathematically as:

Minimize:    p̄_1 + p̄_2
Subject to:  1) Queues Q_k(t) are stable ∀k ∈ {1, 2}
             2) p(t) ∈ P ∀t ∈ {0, 1, 2, . . .}

where queue stability is defined in the next chapter. It is shown in the next chapter that queue
stability ensures the time average output rate of the queue is equal to the time average input rate.
Our theory will allow the design of a simple algorithm that makes decisions p(t) ∈ P every slot
t, without requiring a priori knowledge of the probabilities associated with the arrival and channel
processes a(t) and S(t). The algorithm meets all desired constraints in the above problem whenever
it is possible to do so. Further, the algorithm is parameterized by a constant V ≥ 0 that can be
chosen as desired to yield time average power within O(1/V) from the minimum possible time
average power required for queue stability. Choosing a large value of V can thus push average power
arbitrarily close to optimal. However, this comes with a tradeoff in average queue backlog and delay
that is O(V).
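To make the O(1/V), O(V) tradeoff concrete, here is a single-queue toy in the spirit of the algorithms developed in later chapters; the model (Bernoulli arrivals, two-state channel, on/off power) and the decision rule are our own illustrative assumptions, not the general algorithm of the text. Each slot, after observing S(t), it chooses p(t) ∈ {0, 1} to minimize V p(t) − Q(t) S(t) p(t), i.e., it transmits only when Q(t) S(t) > V.

```python
import random

def run(V, T=200000, seed=1):
    """Toy single-queue power minimization (illustrative assumptions):
    arrivals a(t) ~ Bernoulli(0.5) bits, channel S(t) uniform on {1, 2},
    rate b(t) = S(t) * p(t) with power p(t) in {0, 1}.
    Each slot choose p to minimize V*p - Q*S*p: transmit iff Q*S > V.
    Returns (time average power, time average backlog)."""
    random.seed(seed)
    Q = 0.0
    power_sum = 0.0
    backlog_sum = 0.0
    for _ in range(T):
        S = random.choice([1, 2])
        p = 1 if Q * S > V else 0            # drift-plus-penalty style decision
        power_sum += p
        Q = max(Q - S * p, 0)                # serve
        Q += 1 if random.random() < 0.5 else 0   # arrive
        backlog_sum += Q
    return power_sum / T, backlog_sum / T

p_small, q_small = run(V=1)
p_large, q_large = run(V=100)
# Larger V trades lower average power for larger average backlog.
```

Running this with V = 1 and V = 100 shows average power decreasing toward its minimum (the large-V policy serves almost exclusively in the good channel state) while average backlog grows with V.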
1.1.2 EXAMPLE PROBLEM 2: MAXIMIZING THROUGHPUT SUBJECT TO TIME AVERAGE POWER CONSTRAINTS
Consider the same system, but now assume the arrival process a(t) = (a_1(t), a_2(t)) can be controlled
by a flow control mechanism. We thus have two decision vectors: p(t) (the power allocation vector)
and a(t) (the data admission vector). The admission vector a(t) is chosen within some set A every
slot t. Let ā_k be the time average admission rate (in bits/slot) for user k, which is the same as
the time average throughput of user k if its queue is stable (as shown in the next chapter). We
have the following problem of maximizing a weighted sum of throughput subject to average power
constraints:
Maximize:    w_1 ā_1 + w_2 ā_2
Subject to:  1) p̄_k ≤ p_{k,av} ∀k ∈ {1, 2}
             2) Queues Q_k(t) are stable ∀k ∈ {1, 2}
             3) p(t) ∈ P ∀t ∈ {0, 1, 2, . . .}
             4) a(t) ∈ A ∀t ∈ {0, 1, 2, . . .}
where w_1, w_2 are given positive weights that define the relative importance of user 1 traffic and user
2 traffic, and p_{1,av}, p_{2,av} are given constants that represent desired average power constraints for
each user. Again, our theory leads to an algorithm that meets all desired constraints and comes within
O(1/V) of the maximum throughput possible under these constraints, with an O(V) tradeoff in
average backlog and delay.
1.1.3 EXAMPLE PROBLEM 3: MAXIMIZING THROUGHPUT-UTILITY SUBJECT TO TIME AVERAGE POWER CONSTRAINTS
Consider the same system as Example Problem 2, but now assume the objective is to maximize
a concave function of throughput, rather than a linear function of throughput (the definition of
“concave” is given in footnote 1 in the next subsection). Specifically, let g_1(a) and g_2(a) be continuous,
concave, and nondecreasing functions of a over the range a ≥ 0. Such functions are called utility
functions. The value g_1(ā_1) represents the utility (or satisfaction) that user 1 gets by achieving a
throughput of ā_1. Maximizing g_1(ā_1) + g_2(ā_2) can provide a more “fair” throughput vector (ā_1, ā_2).
Indeed, maximizing a linear function often yields a vector with one component that is very high and
the other component very low (possibly 0). We then have the problem:
Maximize:    g_1(ā_1) + g_2(ā_2)
Subject to:  1) p̄_k ≤ p_{k,av} ∀k ∈ {1, 2}
             2) Queues Q_k(t) are stable ∀k ∈ {1, 2}
             3) p(t) ∈ P ∀t ∈ {0, 1, 2, . . .}
             4) a(t) ∈ A ∀t ∈ {0, 1, 2, . . .}
Typical utility functions are g_1(a) = g_2(a) = log(a), or g_1(a) = g_2(a) = log(1 + a). These
functions are nondecreasing and strictly concave, so that g_1(ā_1) has a diminishing returns property
with each incremental increase in throughput ā_1. This means that if ā_1 < ā_2, the sum utility
g_1(ā_1) + g_2(ā_2) would be improved more by increasing ā_1 than by increasing ā_2. This creates a
more evenly distributed throughput vector. The log(a) utility functions provide a type of fairness
called proportional fairness (see (1)(2)). Fairness properties of different types of utility functions are
considered in (3)(4)(5)(6).
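The fairness effect can be seen numerically. The sketch below is our own illustration over a hypothetical rate region ā_1 + ā_2 ≤ 1; since both objectives are nondecreasing in each rate, it suffices to search the boundary ā_1 + ā_2 = 1.

```python
import math

def best_allocation(utility, n=100):
    """Grid search on the boundary a1 + a2 = 1 of a hypothetical rate region."""
    best, best_val = None, -math.inf
    for i in range(n + 1):
        a1, a2 = i / n, 1 - i / n
        try:
            val = utility(a1, a2)
        except ValueError:  # math.log(0) is undefined; skip those endpoints
            continue
        if val > best_val:
            best, best_val = (a1, a2), val
    return best

linear = best_allocation(lambda a1, a2: 2 * a1 + 1 * a2)       # w1=2, w2=1
fair = best_allocation(lambda a1, a2: math.log(a1) + math.log(a2))
print(linear)  # -> (1.0, 0.0)
print(fair)    # -> (0.5, 0.5)
```

The linear objective (with w_1 > w_2) pushes all throughput to one user, while the log utility picks the balanced point, consistent with the proportional fairness discussion above.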
For any given continuous and concave utility functions, our theory enables the design of
an algorithm that meets all desired constraints and provides throughput-utility within O(1/V) of
optimality, with a tradeoff in average backlog and delay that is O(V).
We emphasize that these three problems are just examples. The general theory can treat
many more types of networks. Indeed, the examples and problem set questions provided in this text
include networks with probabilistic channel errors, network coding, data compression, multi-hop
communication, and mobility. The theory is also useful for problems within operations research and
economics.
1.2 GENERAL STOCHASTIC OPTIMIZATION PROBLEMS
The three example problems considered in the previous section all involved optimizing a time
average (or a function of time averages) subject to time average constraints. Here we state the
general problems of this type. Consider a stochastic network that operates in discrete time with
unit time slots t ∈ {0, 1, 2, . . .}. The network is described by a collection of queue backlogs, written
in vector form Q(t) = (Q_1(t), . . . , Q_K(t)), where K is a nonnegative integer. The case K = 0
corresponds to a system without queues. Every slot t, a control action is taken, and this action affects
arrivals and departures of the queues and also creates a collection of real valued attribute vectors x(t),
y(t), e(t):

x(t) = (x_1(t), . . . , x_M(t))
y(t) = (y_0(t), y_1(t), . . . , y_L(t))
e(t) = (e_1(t), . . . , e_J(t))

for some nonnegative integers M, L, J (used to distinguish between equality constraints and two
types of inequality constraints). The attributes can be positive or negative, and they represent penalties
or rewards associated with the network on slot t, such as power expenditures, distortions, or packet
drops/admissions. These attributes are given by general functions:

x_m(t) = x̂_m(α(t), ω(t)) ∀m ∈ {1, . . . , M}
y_l(t) = ŷ_l(α(t), ω(t)) ∀l ∈ {0, 1, . . . , L}
e_j(t) = ê_j(α(t), ω(t)) ∀j ∈ {1, . . . , J}
where ω(t) is a random event observed on slot t (such as new packet arrivals or channel conditions)
and α(t) is the control action taken on slot t (such as packet admissions or transmissions). The action
α(t) is chosen within an abstract set A_ω(t) that possibly depends on ω(t). Let x̄_m, ȳ_l, ē_j represent
the time average of x_m(t), y_l(t), e_j(t) under a particular control algorithm. Our first objective is to
design an algorithm that solves the following problem:
Minimize:    ȳ_0                                              (1.1)
Subject to:  1) ȳ_l ≤ 0 for all l ∈ {1, . . . , L}            (1.2)
             2) ē_j = 0 for all j ∈ {1, . . . , J}            (1.3)
             3) α(t) ∈ A_ω(t) ∀t                              (1.4)
             4) Stability of all Network Queues               (1.5)
Our second objective, more general than the first, is to optimize convex functions of time
averages.¹ Specifically, let f(x), g_1(x), . . . , g_L(x) be convex functions from R^M to R, and let X
be a closed and convex subset of R^M. Let x̄ = (x̄_1, . . . , x̄_M) be the vector of time averages of the
x_m(t) attributes under a given control algorithm. We desire a solution to the following problem:

Minimize:    ȳ_0 + f(x̄)                                      (1.6)
Subject to:  1) ȳ_l + g_l(x̄) ≤ 0 for all l ∈ {1, . . . , L}  (1.7)
             2) ē_j = 0 for all j ∈ {1, . . . , J}            (1.8)
             3) x̄ ∈ X                                        (1.9)
             4) α(t) ∈ A_ω(t) ∀t                              (1.10)
             5) Stability of all Network Queues               (1.11)
These problems (1.1)–(1.5) and (1.6)–(1.11) can be viewed as stochastic programs, and are
analogues of the classic linear programs and convex programs of static optimization theory. A solution
is an algorithm for choosing control actions over time in reaction to the existing network state, such
that all of the constraints are satisfied and the quantity to be minimized is as small as possible. These
problems have wide applications, and they are of interest even when there is no underlying queueing
network to be stabilized (so that the “Stability” constraints in (1.5) and (1.11) are removed). However,
it turns out that queueing theory plays a central role in this type of stochastic optimization. Indeed,
even if there are no underlying queues in the original problem, we can introduce virtual queues as
a strong method for ensuring that the required time average constraints are satisfied. Inefficient
control actions incur larger backlog in certain queues. These backlogs act as “sufficient statistics” on
which to base the next control decision. This enables algorithms that do not require knowledge of
the probabilities associated with the random network events ω(t).
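As a small preview of the virtual queue idea (the construction is made precise in Chapter 4; the update rule below is the standard one, but the policy and numbers are our own illustrative assumptions): to enforce a time average constraint such as p̄ ≤ p_av, keep a virtual backlog Z(t) with Z(t+1) = max[Z(t) + p(t) − p_av, 0]. Since Z(t+1) ≥ Z(t) + p(t) − p_av, summing over slots gives (1/T) Σ (p(t) − p_av) ≤ Z(T)/T, so any policy that keeps Z(t) bounded satisfies the constraint in the limit.

```python
def constrained_power_demo(T=1000, p_av=0.5):
    """Enforce time-average power <= p_av via a virtual queue (illustrative).

    Each slot, Z(t+1) = max[Z(t) + p(t) - p_av, 0].
    Toy policy: transmit (p = 1) only when the virtual backlog is small."""
    Z = 0.0
    powers = []
    for _ in range(T):
        p = 1 if Z < 1.0 else 0          # back off when constraint debt is high
        powers.append(p)
        Z = max(Z + (p - p_av), 0.0)     # virtual queue update
    avg_p = sum(powers) / T
    # Key inequality: avg_p - p_av <= Z(T)/T, so bounded Z meets the constraint.
    return avg_p, Z

avg_p, Z_final = constrained_power_demo()
print(round(avg_p, 3), Z_final)  # -> 0.501 1.0
```

Here Z(t) stays bounded by 1, so the average power exceeds the constraint by at most Z(T)/T = O(1/T), vanishing as T grows.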
1.3 LYAPUNOV DRIFT AND LYAPUNOV OPTIMIZATION
We solve the problems described above with a simple and elegant theory of Lyapunov drift and
Lyapunov optimization. While this theory is presented in detail in future chapters, we briefly describe
it here. The first step is to look at the constraints of the problem to be solved. For example, for the
¹A set $\mathcal{X} \subseteq \mathbb{R}^M$ is convex if the line segment formed by any two points in $\mathcal{X}$ is also in $\mathcal{X}$. A function $f(x)$ defined over a convex set $\mathcal{X}$ is a convex function if for any two points $x_1, x_2 \in \mathcal{X}$ and any two probabilities $p_1, p_2 \geq 0$ such that $p_1 + p_2 = 1$, we have $f(p_1 x_1 + p_2 x_2) \leq p_1 f(x_1) + p_2 f(x_2)$. A function $f(x)$ is concave if $-f(x)$ is convex. A function $f(x)$ is affine if it is linear plus a constant, having the form $f(x) = c_0 + \sum_{m=1}^{M} c_m x_m$.
problem (1.1)–(1.5), the constraints are (1.2)–(1.5). Then construct virtual queues (in a way to be
speciﬁed) that help to meet the desired constraints. Next, deﬁne a function L(t ) as the sum of
squares of backlog in all virtual and actual queues on slot t . This is called a Lyapunov function, and
it is a scalar measure of network congestion. Intuitively, if L(t ) is “small,” then all queues are small,
and if $L(t)$ is "large," then at least one queue is large. Define $\Delta(t) = L(t+1) - L(t)$, being the difference in the Lyapunov function from one slot to the next.² If control decisions are made every slot $t$ to greedily minimize $\Delta(t)$, then backlogs are consistently pushed towards a lower congestion state, which intuitively maintains network stability (where "stability" is precisely defined in the next chapter).
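As a small numerical illustration of greedy drift minimization, consider a hypothetical two-queue, one-server example (illustrative rates, not a system from this text): allocating the server to the longest queue each slot minimizes a standard bound on $\Delta(t)$ for $L(t) = Q_1(t)^2 + Q_2(t)^2$, and the time average backlog stays bounded when the total arrival rate is below the service rate.

```python
import random

# Sketch of greedy drift minimization (hypothetical two-queue, one-server
# example): each slot the server serves the longest queue, which minimizes
# a bound on the drift Delta(t) of L(t) = Q1(t)**2 + Q2(t)**2.
random.seed(0)
Q = [0.0, 0.0]
T = 200000
backlog_sum = 0.0
for t in range(T):
    a = [1.0 if random.random() < 0.4 else 0.0,
         1.0 if random.random() < 0.4 else 0.0]   # arrivals, rate 0.4 each
    serve = 0 if Q[0] >= Q[1] else 1              # serve the longest queue
    Q[serve] = max(Q[serve] - 1.0, 0.0)           # offered service: 1/slot
    Q[0] += a[0]; Q[1] += a[1]
    backlog_sum += Q[0] + Q[1]
print(backlog_sum / T)   # time average total backlog remains bounded
```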
Minimizing $\Delta(t)$ every slot is called minimizing the Lyapunov drift. Chapter 3 shows this method provides queue stability for a particular example network, and Chapter 4 shows it also stabilizes general networks. However, at this point, the problem is only half solved: The virtual queues and Lyapunov drift help only to ensure the desired time average constraints are met. The objective function to be minimized has not yet been incorporated. For example, $y_0(t)$ is the objective function for the problem (1.1)–(1.5). The objective function is mapped to an appropriate function $penalty(t)$. Instead of taking actions to greedily minimize $\Delta(t)$, actions are taken every slot $t$ to greedily minimize the following drift-plus-penalty expression:

$$\Delta(t) + V \times penalty(t)$$
where V is a nonnegative control parameter that is chosen as desired. Choosing V = 0 corresponds
to the original algorithm of minimizing the drift alone. Choosing V > 0 includes the weighted
penalty term in the control decision and allows a smooth tradeoff between backlog reduction and
penalty minimization. We showthat the time average objective function deviates by at most O(1/V)
from optimality, with a time average queue backlog bound of O(V).
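The drift-plus-penalty rule can be sketched for a single queue with an energy penalty (a hypothetical example with illustrative parameters, not from this text): with service $b(t) \in \{0, 1\}$ and $penalty(t) = b(t)$, greedily minimizing the drift-plus-penalty bound reduces to transmitting only when $Q(t) \geq V$, exposing the $[O(1/V), O(V)]$ tradeoff numerically.

```python
import random

# Hypothetical drift-plus-penalty sketch for one queue: choosing service
# b in {0,1} to minimize Q(t)*(a - b) + V*b each slot reduces to the
# threshold rule "transmit iff Q(t) >= V".  Larger V saves energy (down
# toward the arrival rate) at the cost of O(V) average backlog.
def simulate(V, T=100000, rate=0.3, seed=0):
    rng = random.Random(seed)
    Q, energy, backlog = 0.0, 0.0, 0.0
    for _ in range(T):
        a = 1.0 if rng.random() < rate else 0.0
        b = 1.0 if Q >= V else 0.0      # minimizes (V - Q)*b over b in {0,1}
        Q = max(Q - b, 0.0) + a
        energy += b
        backlog += Q
    return energy / T, backlog / T

for V in [1, 10, 50]:
    print(V, simulate(V))   # avg energy stays near 0.3; avg backlog grows with V
```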
While Lyapunov techniques have a long history in the field of control theory, this form of Lyapunov drift was perhaps first used to construct stable routing and scheduling policies for queueing networks in the pioneering works (7)(8) by Tassiulas and Ephremides. These works used the technique of minimizing $\Delta(t)$ every slot, resulting in backpressure routing and max-weight scheduling algorithms that stabilize the network whenever possible. The algorithms are particularly interesting because they only require knowledge of the current network state, and they do not require knowledge of the probabilities associated with future random events. Minimizing $\Delta(t)$ has had wide success for stabilizing many other types of networks, including packet switch networks (9)(10)(11), wireless systems (7)(8)(12)(13)(14), and ad hoc mobile networks (15). A related technique was used for computing multicommodity network flows in (16).
We introduced the $V \times penalty(t)$ term to the drift minimization in (17)(18)(19) to solve
problems of joint network stability and stochastic utility maximization, and we introduced the virtual
queue technique in (20)(21) to solve problems of maximizing throughput in a wireless network
²The notation used in later chapters is slightly different. Simplified notation is used here to give the main ideas.
subject to individual average power constraints at each node. Our previous text (22) uniﬁed these
ideas for application to general problems of the type described in Section 1.2.
1.4 DIFFERENCES FROM OUR EARLIER TEXT
The theory of Lyapunov drift and Lyapunov optimization is described collectively in our previous
text (22). The current text is different from (22) in that we emphasize the general optimization
problems first, showing how the problem (1.6)–(1.11) can be solved directly by using the solution to the simpler problem (1.1)–(1.5). We also provide a variety of examples and problem set questions
to help the reader. These have been developed over several years for use in the stochastic network
optimization course taught by the author. This text also provides many new topics not covered in
(22), including:
• A more detailed development of queue stability theory (Chapter 2).
• Variable-V algorithms that provide exact optimality of time averages subject to a weaker form of stability called "mean rate stability" (Section 4.7).
• Placeholder bits for delay improvement (Sections 3.2.4 and 4.8).
• Universal scheduling for nonergodic sample paths (Section 4.9).
• Worst case delay bounds (Sections 5.6 and 7.6.1).
• Nonconvex stochastic optimization (Section 5.5).
• Approximate scheduling and full throughput scheduling in interference networks via the Jiang-Walrand theorem (Chapter 6).
• Optimization of renewal systems and Markov decision examples (Chapter 7).
• Treatment of problems with equality constraints (1.3) and abstract set constraints (1.9) (Section 5.4).
1.5 ALTERNATIVE APPROACHES
The relationship between network utility maximization, Lagrange multipliers, convex programming,
and duality theory is developed for static wireline networks in (2)(23)(24) and for wireless networks
in (25)(26)(27)(28)(29) where the goal is to converge to a static ﬂow allocation and/or resource
allocation over the network. Scheduling in wireless networks with static channels is considered
from a duality perspective in (30)(31). Primal-dual techniques for maximizing utility in a stochastic wireless downlink are developed in (32)(33) for systems without queues. The primal-dual technique is extended in (34)(35) to treat networks with queues and to solve problems similar to (1.6)–(1.11) in a fluid limit sense. Specifically, the work (34) shows the primal-dual technique leads to a fluid limit with an optimal utility, and it conjectures that the utility of the actual network is close to this fluid limit when an exponential averaging parameter is scaled. It makes a statement concerning weak limits of scaled systems. A related primal-dual algorithm is used in (36) and shown to converge to utility-optimality as a parameter is scaled.
Our drift-plus-penalty approach can be viewed as a dual-based approach to the stochastic problem (rather than a primal-dual approach), and it reduces to the well known dual subgradient algorithm for linear and convex programs when applied to non-stochastic problems (see (37)(22)(17) for discussions on this). One advantage of the drift-plus-penalty approach is the explicit convergence analysis and performance bounds, resulting in the $[O(1/V), O(V)]$ performance-delay tradeoff. This tradeoff is not shown in the alternative approaches described above. The dual approach is also robust to nonergodic variations and has "universal scheduling" properties, i.e., properties that hold for systems with arbitrary sample paths, as shown in Section 4.9 (see also (38)(39)(40)(41)(42)). However, one advantage of the primal-dual approach is that it provides local optimum guarantees for problems of minimizing $f(\overline{x})$ for non-convex functions $f(\cdot)$ (see Section 5.5 and (43)). Related dual-based approaches are used for "infinitely backlogged" systems in (31)(44)(45)(46) using static optimization, fluid limits, and stochastic gradients, respectively. Related algorithms for channel-aware scheduling in wireless downlinks with different analytical techniques are developed in (47)(48)(49).
We note that the $[O(1/V), O(V)]$ performance-delay tradeoff achieved by the drift-plus-penalty algorithm on general systems is not necessarily the optimal tradeoff for particular networks. An optimal $[O(1/V), O(\sqrt{V})]$ energy-delay tradeoff is shown by Berry and Gallager in (50) for a single link with known channel statistics, and optimal performance-delay tradeoffs for multi-queue systems are developed in (51)(52)(53) and shown to be achievable even when channel statistics are unknown. This latter work builds on the Lyapunov optimization method, but it uses a more aggressive drift steering technique. A placeholder technique for achieving near-optimal delay tradeoffs is developed in (37) and related implementations are in (54)(55).
1.6 ON GENERAL MARKOV DECISION PROBLEMS
The penalties $\hat{x}_m(\alpha(t), \omega(t))$, described in Section 1.2, depend only on the network control action $\alpha(t)$ and the random event $\omega(t)$ (where $\omega(t)$ is generated by "nature" and is not influenced by past control actions). In particular, the queue backlogs $Q(t)$ are not included in the penalties. A more advanced penalty structure would be $\hat{x}_m(\alpha(t), \omega(t), z(t))$, where $z(t)$ is a controlled Markov chain (possibly related to the queue backlog) with transition probabilities that depend on control actions. Extensions of Lyapunov optimization for this case are developed in Chapter 7 using a drift-plus-penalty metric defined over renewal frames (56)(57)(58).
A related 2-timescale approach to learning optimal decisions in Markov decision problems is developed in (59), and learning approaches to power-aware scheduling in single queues are developed in (60)(61)(62)(63). Background on dynamic programming and Markov decision problems can be found in (64)(65)(66), and approximate dynamic programming, neuro-dynamic programming, and Q-learning theory can be found in (67)(68)(69). All of these approaches may suffer from large
convergence times, high complexity, or inaccurate approximation when applied to large networks.
This is due to the curse of dimensionality for Markov decision problems. This problem does not arise
when using the Lyapunov optimization technique and when penalties have the structure given in
Section 1.2.
1.7 ON NETWORK DELAY
This text develops general $[O(1/V), O(V)]$ tradeoffs, giving explicit bounds on average queue backlog and delay that grow linearly with $V$. We also provide examples of exact delay analysis for randomized algorithms (Exercises 2.6–2.10), delay-limited transmission (Exercises 5.13–5.14 and Section 7.6.1), worst case delay (Section 5.6), and average delay constraints (Section 7.6.2). Further work on delay-limited transmission is found in (70)(71), and Lyapunov drift algorithms that use delays as weights, rather than queue backlogs, are considered in (72)(73)(74)(75)(76). There are many additional interesting topics on network delay that we do not cover in this text. We briefly discuss some of those topics in the following subsections, with references given for further reading.
1.7.1 DELAY AND DYNAMIC PROGRAMMING
Dynamic programming and Markov decision frameworks are considered for one-queue energy and delay optimality problems in (77)(78)(79)(80)(81). One-queue problems with strict deadlines and a priori knowledge of future events are treated in (82)(83)(84)(85)(86), and filter theory is used to establish delay bounds in (87). Control rules for two interacting service stations are given in (88). Optimal scheduling in a finite buffer 2 × 2 packet switch is treated in (89).

Minimum energy problems with delay deadlines are considered for multi-queue wireless systems in (90). In the case when channels are static, the work (90) maps the problem to a shortest path problem. In the case when channels are varying but rate-power functions are linear, (90) shows the optimal multi-dimensional dynamic program has a very simple threshold structure. Heuristic approximations are given for more general rate-power curves. Related work in (91) considers delay-optimal scheduling in multi-queue systems and derives structural results of the dynamic programs, resulting in efficient approximation algorithms. These approximations are shown to have optimal decay exponents for sum queue backlog in (92), which relies on techniques developed in (93) for optimal max-queue exponents. A mixed Lyapunov optimization and dynamic programming approach is given in (56) for networks with a small number of delay-constrained queues and an arbitrarily large number of other queues that only require stability. Approximate dynamic programs and Q-learning type algorithms, which attempt to learn optimal decision strategies, are considered in (61)(60)(56)(57)(62)(63).
1.7.2 OPTIMAL $O(\sqrt{V})$ AND $O(\log(V))$ DELAY TRADEOFFS
The $[O(1/V), O(V)]$ performance-delay tradeoffs we derive for general networks in this text are not necessarily the optimal tradeoffs for particular networks. The work (50) considers the optimal energy-delay tradeoff for a one-queue wireless system with a fading channel. It shows that no algorithm can do better than an $[O(1/V), O(\sqrt{V})]$ tradeoff, and it proposes a buffer-partitioning algorithm that can be shown to come within a logarithmic factor of this tradeoff. This optimal $[O(1/V), O(\sqrt{V})]$ tradeoff is extended to multi-queue systems in (51), and an algorithm with an exponential Lyapunov function and aggressive drift steering is shown to meet this tradeoff to within a logarithmic factor. The work (51) also shows an improved $[O(1/V), O(\log(V))]$ tradeoff is achievable in certain exceptional cases with piecewise linear structure.

Optimal $[O(1/V), O(\log(V))]$ energy-delay tradeoffs are shown in (53) in cases when packet dropping is allowed, and optimal $[O(1/V), O(\log(V))]$ utility-delay tradeoffs are shown for flow control problems in (52). Near-optimal $[O(1/V), O(\log^2(V))]$ tradeoffs are shown for the basic quadratic Lyapunov drift-plus-penalty method in (37)(55) using placeholders and Last-In-First-Out (LIFO) scheduling, described in more detail in Section 4.8, and related implementations are in (54).
1.7.3 DELAY OPTIMAL ALGORITHMS FOR SYMMETRIC NETWORKS
The works (8)(94)(95)(96)(97) treat multi-queue wireless systems with "symmetry," where arrival rates and channel probabilities are the same for all queues. They use stochastic coupling theory to prove delay optimality for particular algorithms. The work (8) proves delay optimality of the longest connected queue first algorithm for ON/OFF channels with a single server, the works (94)(97) consider multi-server systems, and the works (95)(96) consider wireless problems under the information theoretic multi-access capacity region. Related work in (98) proves delay optimality of the join-the-shortest-queue strategy for routing packets to two queues with identical exponential service.
1.7.4 ORDER OPTIMAL DELAY SCHEDULING AND QUEUE GROUPING
The work (99) shows that delay is at least linear in $N$ for $N \times N$ packet switches that use queue-unaware scheduling, and it develops a simple queue-aware scheduling algorithm that gives $O(\log(N))$ delay whenever rates are within the capacity region. Related work in (100) considers scheduling in $N$-user wireless systems with ON/OFF channels and shows that delay is at least linear in $N$ if queue-unaware algorithms are used, but it can be made $O(1)$ with a simple queue-aware queue grouping algorithm. This $O(1)$ delay, independent of the number of users, is called order optimal because it differs from optimal only in a constant coefficient that does not depend on $N$. Order optimality of the simple longest connected queue first rule (simpler than the algorithm of (100)) is proven in (101) via a queue grouping analysis.

Order-optimal delay for 1-hop switch scheduling under maximal scheduling (which provides stability only when rates are within a constant factor of the capacity boundary) is developed in (102)(103), again using queue grouping theory. In particular, it is shown that $N \times N$ packet switches can provide $O(1)$ delay (order-optimal) if they are at most half-loaded. The best known delay bound beyond the half-loaded region is the $O(\log(N))$ delay result of (99), and it is not known if it is possible to achieve $O(1)$ delay in this region. Time-correlated "bursty" traffic is considered in (103). The queue grouping results in (101)(103) are inspired by queue-grouped Lyapunov functions developed in (104)(105) for stability analysis.
1.7.5 HEAVY TRAFFIC AND DECAY EXPONENTS
A line of work addresses asymptotic delay optimality in a "heavy traffic" regime where input rates are
pushed very close to the capacity region boundary. Delay is often easier to understand in this heavy
trafﬁc regime due to a phenomenon of state space collapse (106). Of course, delay grows to inﬁnity
if input rates are pushed toward the capacity boundary, but the goal is to design an algorithm that
minimizes an asymptotic growth coefﬁcient. Heavy trafﬁc analysis is considered in (107) for wireless
scheduling and (108)(109) for packet switches.
The work (108)(109) suggests that delay in packet switches can be improved by changing the well-known max-weight rule, which seeks to maximize a weighted sum of queue backlogs and service rates every slot $t$ (that is, $\sum_i Q_i(t)\mu_i(t)$), to an $\alpha$-max weight rule that seeks to maximize $\sum_i Q_i(t)^{\alpha}\mu_i(t)$, where $0 < \alpha \leq 1$. Simulations on $N \times N$ packet switches in (110) show that delay is improved when $\alpha$ is positive but small. A discussion of this in the context of heavy traffic theory is given in (111), along with some counterexamples. It is interesting to note that $\alpha$-max weight policies with small but positive $\alpha$ make matching decisions that are similar to the max-size matches used in the frame-based algorithm of (99), which achieves $O(\log(N))$ delay. This may be a reason why the delay of $\alpha$-max weight policies is also small. Large deviation theory is often used to analyze queue backlog and delay, and this is considered for $\alpha$-max weight policies in (112), for delay-based scheduling in (73), and for processor sharing queues in (113)(114). Algorithms that optimize the exponent of queue backlog are considered in (93) for optimizing the max-queue exponent and in (92) for the sum-queue exponent. These consider analysis of queue backlog when the queue is very large. An analysis of backlog distributions that are valid also in the small buffer regime is given in (115) for the case when the number of network channels is scaled to infinity.
1.7.6 CAPACITY AND DELAY TRADEOFFS FOR MOBILE NETWORKS
Work by Gupta and Kumar in (116) shows that per-node capacity of ad hoc wireless networks with $N$ nodes and with random source-destination pairings is roughly $\Theta(1/\sqrt{N})$ (neglecting logarithmic factors in $N$ for simplicity). Grossglauser and Tse show in (117) that mobility increases per-node capacity to $\Theta(1)$, which does not vanish with $N$. However, the algorithm in (117) uses a 2-hop relay algorithm that creates a large delay. The exact capacity and average end-to-end delay are computed in (118)(17) for a cell-partitioned network with a simplified i.i.d. mobility model. The work (118)(17) also shows for this simple model that the average delay $\overline{W}$ of any scheduling and routing protocol, possibly one that uses redundant packet transfers, must satisfy:

$$\frac{\overline{W}}{\lambda} \geq \frac{N-d}{4d}(1 - \log(2))$$

where $\lambda$ is the per-user throughput, $C$ is the number of cells, $d = N/C$ is the node/cell density, and $\log(\cdot)$ denotes the natural logarithm. Thus, if the node/cell density $d = \Theta(1)$, then $\overline{W}/\lambda \geq \Theta(N)$. The 2-hop relay algorithm meets this bound with $\lambda = \Theta(1)$ and $\overline{W} = \Theta(N)$, and a relay algorithm that redundantly transmits packets over multiple paths meets this bound with $\lambda = \Theta(1/\sqrt{N})$ and $\overline{W} = \Theta(\sqrt{N})$. Similar i.i.d. mobility models are considered in (119)(120)(121). The work (119) shows that improved tradeoffs are possible if the transmission radius of each node can be scaled to include a large number of users in each transmission (so that the $d = \Theta(1)$ assumption is relaxed). The work (120)(121) quantifies the optimal tradeoff achievable under this type of radius scaling, and it also shows improved tradeoffs are possible if the model is changed to allow time slot scaling and network bit-pipelining. Related delay tradeoffs via transmission radius scaling for non-mobile networks are in (122). Analysis of non-i.i.d. mobility models is more complex and considered in (123)(124)(122)(125). Recent network coding approaches are in (126)(127)(128).
1.8 PRELIMINARIES
We assume the reader is comfortable with basic concepts of probability and random processes (such
as expectations, the law of large numbers, etc.) and with basic mathematical analysis. Familiarity
with queueing theory, Markov chains, and convex functions is useful but not required as we present
or derive results in these areas as needed in the text. For additional references on queueing theory
and Markov chains, including discussions of Little's Theorem and the renewal-reward theorem,
see (129)(66)(130)(131)(132). For additional references on convex analysis, including discussions of
convex hulls, Caratheodory’s theorem, and Jensen’s inequality, see (133)(134)(135).
All of the major results of this text are derived directly from one or more of the following four
key concepts:
• Law of Telescoping Sums: For any function $f(t)$ defined over integer times $t \in \{0, 1, 2, \ldots\}$, we have for any integer time $t > 0$:

$$\sum_{\tau=0}^{t-1}[f(\tau+1) - f(\tau)] = f(t) - f(0)$$

The proof follows by a simple cancellation of terms. This is the main idea behind Lyapunov drift arguments: Controlling the change in a function at every step allows one to control the ending value of the function.
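The cancellation can be checked mechanically for any concrete function; the quadratic below is an arbitrary, illustrative choice.

```python
# Numeric check of the telescoping sum identity for a sample f(t):
f = [t * t - 3 * t + 7 for t in range(11)]   # f(0), ..., f(10)
lhs = sum(f[tau + 1] - f[tau] for tau in range(10))
print(lhs, f[10] - f[0])   # both equal f(10) - f(0) = 70
```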
• Law of Iterated Expectations: For any random variables $X$ and $Y$, we have:³

$$\mathbb{E}\{X\} = \mathbb{E}\{\mathbb{E}\{X|Y\}\}$$

³Strictly speaking, the law of iterated expectations holds whenever the result of Fubini's Theorem holds (which allows one to switch the integration order of a double integral). This holds whenever any one of the following hold: (i) $\mathbb{E}\{|X|\} < \infty$, (ii) $\mathbb{E}\{\max[X, 0]\} < \infty$, (iii) $\mathbb{E}\{\min[X, 0]\} > -\infty$.
where the outer expectation is with respect to the distribution of Y, and the inner expectation
is with respect to the conditional distribution of X given Y.
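A quick Monte Carlo check with an arbitrary illustrative pair (our choice of distributions, not one from this text): let $Y$ be uniform on $\{1, 2\}$ and $X | Y$ uniform on $[0, Y]$, so $\mathbb{E}\{X|Y\} = Y/2$ and both sides equal $0.75$.

```python
import random

# Monte Carlo check of E{X} = E{E{X|Y}} for a hypothetical pair:
# Y ~ Uniform{1, 2}, X | Y ~ Uniform[0, Y], so E{X|Y} = Y/2 and
# E{X} = E{Y/2} = (1/2 + 1)/2 = 0.75.
rng = random.Random(42)
N = 200000
total_x, total_cond = 0.0, 0.0
for _ in range(N):
    Y = rng.choice([1.0, 2.0])
    X = rng.uniform(0.0, Y)
    total_x += X
    total_cond += Y / 2.0       # E{X|Y} evaluated at the sampled Y
print(total_x / N, total_cond / N)   # both approach 0.75
```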
• Opportunistically Minimizing an Expectation: Consider a game we play against nature, where nature generates a random variable $\omega$ with some (possibly unknown) probability distribution. We look at nature's choice of $\omega$ and then choose a control action $\alpha$ within some action set $\mathcal{A}_\omega$ that possibly depends on $\omega$. Let $c(\alpha, \omega)$ represent a general cost function. Our goal is to design a (possibly randomized) policy for choosing $\alpha \in \mathcal{A}_\omega$ to minimize the expectation $\mathbb{E}\{c(\alpha, \omega)\}$, where the expectation is taken with respect to the distribution of $\omega$ and the distribution of our action $\alpha$ that possibly depends on $\omega$. Assume for simplicity that for any given outcome $\omega$, there is at least one action $\alpha^{min}_\omega$ that minimizes the function $c(\alpha, \omega)$ over all $\alpha \in \mathcal{A}_\omega$. Then, not surprisingly, the policy that minimizes $\mathbb{E}\{c(\alpha, \omega)\}$ is the one that observes $\omega$ and selects a minimizing action $\alpha^{min}_\omega$.
This is easy to prove: If $\alpha^*_\omega$ represents any random control action chosen in the set $\mathcal{A}_\omega$ in response to the observed $\omega$, we have $c(\alpha^{min}_\omega, \omega) \leq c(\alpha^*_\omega, \omega)$. This is an inequality relationship concerning the random variables $\omega$, $\alpha^{min}_\omega$, $\alpha^*_\omega$. Taking expectations yields $\mathbb{E}\{c(\alpha^{min}_\omega, \omega)\} \leq \mathbb{E}\{c(\alpha^*_\omega, \omega)\}$, showing that the expectation under the policy $\alpha^{min}_\omega$ is less than or equal to the expectation under any other policy. This is useful for designing drift minimizing algorithms.
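A toy simulation with an illustrative cost function of our own choosing makes the comparison concrete: observing $\omega$ before acting beats any $\omega$-blind policy.

```python
import random

# Sketch of opportunistically minimizing an expectation (hypothetical
# cost): nature draws omega, we observe it and pick alpha in {0, 1}
# minimizing c(alpha, omega) = (alpha - omega)**2.  An omega-blind
# fixed action does worse in expectation.
rng = random.Random(7)
N = 100000
cost_opp, cost_blind = 0.0, 0.0
for _ in range(N):
    omega = rng.choice([0.0, 1.0])
    alpha_opp = min([0.0, 1.0], key=lambda a: (a - omega) ** 2)
    cost_opp += (alpha_opp - omega) ** 2        # always 0 for this cost
    cost_blind += (0.0 - omega) ** 2            # fixed action alpha = 0
print(cost_opp / N, cost_blind / N)   # 0.0 versus about 0.5
```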
• Jensen's Inequality (not needed until Chapter 5): Let $\mathcal{X}$ be a convex subset of $\mathbb{R}^M$ (possibly being the full space $\mathbb{R}^M$ itself), and let $f(x)$ be a convex function over $\mathcal{X}$. Let $X$ be any random vector that takes values in $\mathcal{X}$, and assume that $\mathbb{E}\{X\}$ is well defined and finite (where the expectation is taken entrywise). Then:

$$\mathbb{E}\{X\} \in \mathcal{X} \quad \text{and} \quad f(\mathbb{E}\{X\}) \leq \mathbb{E}\{f(X)\} \qquad (1.12)$$
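The inequality is easy to verify numerically; the convex function $f(x) = x^2$ and the three-point distribution below are arbitrary illustrative choices.

```python
import random

# Numeric check of Jensen's inequality f(E{X}) <= E{f(X)} for the convex
# function f(x) = x**2 and X uniform on {-1, 0, 2} (a hypothetical choice):
# E{X} = 1/3, so f(E{X}) = 1/9, while E{f(X)} = (1 + 0 + 4)/3 = 5/3.
rng = random.Random(3)
vals = [-1.0, 0.0, 2.0]
N = 100000
sum_x, sum_fx = 0.0, 0.0
for _ in range(N):
    x = rng.choice(vals)
    sum_x += x
    sum_fx += x * x
mean_x = sum_x / N
print(mean_x ** 2, sum_fx / N)   # roughly 1/9 versus roughly 5/3
```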
This text also uses, in addition to regular limits of functions, the limsup and liminf. Using (or not using) these limits does not impact any of the main ideas in this text, and readers who are not familiar with these limits can replace all instances of "limsup" and "liminf" with regular limits "lim," without loss of rigor, under the additional assumption that the regular limit exists. For readers interested in more details on this, note that a function $f(t)$ may or may not have a well defined limit as $t \to \infty$ (consider, for example, a cosine function). We define $\limsup_{t\to\infty} f(t)$ as the largest possible limiting value of $f(t)$ over any subsequence of times $t_k$ that increase to infinity, and for which the limit of $f(t_k)$ exists. Likewise, $\liminf_{t\to\infty} f(t)$ is the smallest possible limiting value. It can be shown that these limits always exist (possibly being $\infty$ or $-\infty$). For example, the limsup and liminf of the cosine function are 1 and $-1$, respectively. The main properties of limsup and liminf that we use in this text are:
• If $f(t)$, $g(t)$ are functions that satisfy $f(t) \leq g(t)$ for all $t$, then $\limsup_{t\to\infty} f(t) \leq \limsup_{t\to\infty} g(t)$. Likewise, $\liminf_{t\to\infty} f(t) \leq \liminf_{t\to\infty} g(t)$.
• For any function $f(t)$, we have $\liminf_{t\to\infty} f(t) \leq \limsup_{t\to\infty} f(t)$, with equality if and only if the regular limit exists. Further, whenever the regular limit exists, we have $\liminf_{t\to\infty} f(t) = \limsup_{t\to\infty} f(t) = \lim_{t\to\infty} f(t)$.
• For any function $f(t)$, we have $\limsup_{t\to\infty} f(t) = -\liminf_{t\to\infty}[-f(t)]$ and $\liminf_{t\to\infty} f(t) = -\limsup_{t\to\infty}[-f(t)]$.
• If $f(t)$ and $g(t)$ are functions such that $\lim_{t\to\infty} g(t) = g^*$, where $g^*$ is a finite constant, then $\limsup_{t\to\infty}[g(t) + f(t)] = g^* + \limsup_{t\to\infty} f(t)$.
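The cosine example mentioned above can be probed numerically: restricted to a late portion of the sequence, the suprema and infima remain near 1 and $-1$, consistent with the limsup and liminf (this is a numeric illustration, not a proof).

```python
import math

# Numeric illustration: cos(t) has no limit as t -> infinity, but its
# limsup and liminf are 1 and -1.  Tail suprema sup_{t >= T} cos(t)
# stay near 1 (and tail infima near -1) no matter how large T is.
samples = [math.cos(t) for t in range(100000)]
tail = samples[50000:]           # a "late" portion of the sequence
print(max(tail), min(tail))     # close to 1 and -1, respectively
```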
CHAPTER 2
Introduction to Queues
Let $Q(t)$ represent the contents of a single-server discrete time queueing system defined over integer time slots $t \in \{0, 1, 2, \ldots\}$. Specifically, the initial state $Q(0)$ is assumed to be a nonnegative real valued random variable. Future states are driven by stochastic arrival and server processes $a(t)$ and $b(t)$ according to the following dynamic equation:

$$Q(t+1) = \max[Q(t) - b(t), 0] + a(t) \quad \text{for } t \in \{0, 1, 2, \ldots\} \qquad (2.1)$$

We call $Q(t)$ the backlog on slot $t$, as it can represent an amount of work that needs to be done. The stochastic processes $\{a(t)\}_{t=0}^{\infty}$ and $\{b(t)\}_{t=0}^{\infty}$ are sequences of real valued random variables defined over slots $t \in \{0, 1, 2, \ldots\}$.

The value of $a(t)$ represents the amount of new work that arrives on slot $t$, and it is assumed to be nonnegative. The value of $b(t)$ represents the amount of work the server of the queue can process on slot $t$. For most physical queueing systems, $b(t)$ is assumed to be nonnegative, although it is sometimes convenient to allow $b(t)$ to take negative values. This is useful for the virtual queues defined in future sections, where $b(t)$ can be interpreted as a (possibly negative) attribute.¹ Because we assume $Q(0) \geq 0$ and $a(t) \geq 0$ for all slots $t$, it is clear from (2.1) that $Q(t) \geq 0$ for all slots $t$.
The units of $Q(t)$, $a(t)$, and $b(t)$ depend on the context of the system. For example, in a communication system with fixed size data units, these quantities might be integers with units of packets. Alternatively, they might be real numbers with units of bits, kilobits, or some other unit of unfinished work relevant to the system.
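The recursion (2.1) is straightforward to simulate. Below is a minimal sketch with hypothetical parameters (Bernoulli packet arrivals at rate 0.5 and a constant server rate of 1 packet/slot); with at most one arrival per slot and unit service, the backlog can never exceed one packet.

```python
import random

# Minimal simulation of the queue dynamics (2.1) with hypothetical
# Bernoulli arrivals a(t) (rate 0.5 packets/slot) and a server offering
# b(t) = 1 packet/slot: Q(t+1) = max[Q(t) - b(t), 0] + a(t).
rng = random.Random(0)
Q = 0.0
history = []
for t in range(20):
    a = 1.0 if rng.random() < 0.5 else 0.0
    b = 1.0
    Q = max(Q - b, 0.0) + a
    history.append(Q)
print(history)   # backlog stays in {0, 1}: at most one arrival per slot
```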
We can equivalently rewrite the dynamics (2.1) without the nonlinear $\max[\cdot, 0]$ operator as follows:

$$Q(t+1) = Q(t) - \tilde{b}(t) + a(t) \quad \text{for } t \in \{0, 1, 2, \ldots\} \qquad (2.2)$$

where $\tilde{b}(t)$ is the actual work processed on slot $t$ (which may be less than the offered amount $b(t)$ if there is little or no backlog in the system on slot $t$). Specifically, $\tilde{b}(t)$ is mathematically defined:

$$\tilde{b}(t) \triangleq \min[b(t), Q(t)]$$
¹Assuming that the $b(t)$ value in (2.1) is possibly negative also allows treatment of modified queueing models that place new arrivals inside the $\max[\cdot, 0]$ operator. For example, a queue with dynamics $\hat{Q}(t+1) = \max[\hat{Q}(t) - \beta(t) + \alpha(t), 0]$ is the same as (2.1) with $a(t) = 0$ and $b(t) = \beta(t) - \alpha(t)$ for all $t$. Leaving $a(t)$ outside the $\max[\cdot, 0]$ is crucial for treatment of multi-hop networks, where $a(t)$ can be a sum of exogenous and endogenous arrivals.
Note by definition that $\tilde{b}(t) \leq b(t)$ for all $t$. The dynamic equation (2.2) yields a simple but important property for all sample paths, described in the following lemma.
Lemma 2.1 (Sample Path Property) For any discrete time queueing system described by (2.1), and for any two slots $t_1$ and $t_2$ such that $0 \leq t_1 < t_2$, we have:

$$Q(t_2) - Q(t_1) = \sum_{\tau=t_1}^{t_2-1} a(\tau) - \sum_{\tau=t_1}^{t_2-1} \tilde{b}(\tau) \qquad (2.3)$$

Therefore, for any $t > 0$, we have:

$$\frac{Q(t)}{t} - \frac{Q(0)}{t} = \frac{1}{t}\sum_{\tau=0}^{t-1} a(\tau) - \frac{1}{t}\sum_{\tau=0}^{t-1} \tilde{b}(\tau) \qquad (2.4)$$

$$\frac{Q(t)}{t} - \frac{Q(0)}{t} \geq \frac{1}{t}\sum_{\tau=0}^{t-1} a(\tau) - \frac{1}{t}\sum_{\tau=0}^{t-1} b(\tau) \qquad (2.5)$$
Proof. By (2.2), we have for any slot $\tau \geq 0$:

$$Q(\tau+1) - Q(\tau) = a(\tau) - \tilde{b}(\tau)$$

Summing the above over $\tau \in \{t_1, \ldots, t_2-1\}$ and using the law of telescoping sums yields:

$$Q(t_2) - Q(t_1) = \sum_{\tau=t_1}^{t_2-1} a(\tau) - \sum_{\tau=t_1}^{t_2-1} \tilde{b}(\tau)$$

This proves (2.3). Equality (2.4) follows by substituting $t_1 = 0$, $t_2 = t$, and dividing by $t$. Inequality (2.5) follows because $\tilde{b}(\tau) \leq b(\tau)$ for all $\tau$. □
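The identity (2.3) can be checked on a random sample path. The simulation below uses arbitrary illustrative distributions (uniform arrivals and offered service), not a model from this text.

```python
import random

# Numeric check of the sample path identity (2.3): simulate (2.1), record
# the actual departures b_tilde(t) = min[b(t), Q(t)], and confirm that
# Q(t2) - Q(t1) equals arrivals minus actual departures over [t1, t2).
rng = random.Random(5)
Q = [2.0]                      # Q(0) = 2 (arbitrary nonnegative start)
a_hist, btil_hist = [], []
for t in range(100):
    a = rng.uniform(0.0, 2.0)
    b = rng.uniform(0.0, 2.0)
    btil = min(b, Q[-1])       # work actually processed on slot t
    Q.append(max(Q[-1] - b, 0.0) + a)
    a_hist.append(a); btil_hist.append(btil)
t1, t2 = 10, 90
lhs = Q[t2] - Q[t1]
rhs = sum(a_hist[t1:t2]) - sum(btil_hist[t1:t2])
print(abs(lhs - rhs))   # zero up to floating point error
```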
An important application of Lemma 2.1 to power-aware systems is treated in Exercise 2.11. The equality (2.4) is illuminating. It shows that $\lim_{t\to\infty} Q(t)/t = 0$ if and only if the time average of the process $a(t) - \tilde{b}(t)$ is zero (where the time average of $a(t) - \tilde{b}(t)$ is the limit of the right-hand side of (2.4)). This happens when the time average rate of arrivals $a(t)$ is equal to the time average rate of actual departures $\tilde{b}(t)$. This motivates the definitions of rate stability and mean rate stability, defined in the next section.
2.1 RATE STABILITY
Let Q(t ) be a real valued stochastic process that evolves in discrete time over slots t ∈ {0, 1, 2, . . .}
according to some probability law.
Definition 2.2 A discrete time process $Q(t)$ is rate stable if:

$$\lim_{t\to\infty} \frac{Q(t)}{t} = 0 \quad \text{with probability 1}$$

Definition 2.3 A discrete time process $Q(t)$ is mean rate stable if:

$$\lim_{t\to\infty} \frac{\mathbb{E}\{|Q(t)|\}}{t} = 0$$

We use an absolute value of $Q(t)$ in the mean rate stability definition, even though our queue in (2.1) is nonnegative, because later it will be useful to define mean rate stability for virtual queues that can be possibly negative.
Theorem 2.4 (Rate Stability Theorem) Suppose $Q(t)$ evolves according to (2.1), with $a(t) \geq 0$ for all $t$, and with $b(t)$ real valued (and possibly negative) for all $t$. Suppose that the time averages of the processes $a(t)$ and $b(t)$ converge with probability 1 to finite constants $a^{av}$ and $b^{av}$, so that:

$$\lim_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} a(\tau) = a^{av} \quad \text{with probability 1} \qquad (2.6)$$

$$\lim_{t\to\infty} \frac{1}{t}\sum_{\tau=0}^{t-1} b(\tau) = b^{av} \quad \text{with probability 1} \qquad (2.7)$$

Then:
(a) $Q(t)$ is rate stable if and only if $a^{av} \leq b^{av}$.
(b) If $a^{av} > b^{av}$, then:

$$\lim_{t\to\infty} \frac{Q(t)}{t} = a^{av} - b^{av} \quad \text{with probability 1}$$

(c) Suppose there are finite constants $\epsilon > 0$ and $C > 0$ such that $\mathbb{E}\{[a(t) + b^-(t)]^{1+\epsilon}\} \leq C$ for all $t$, where $b^-(t) \triangleq -\min[b(t), 0]$. Then $Q(t)$ is mean rate stable if and only if $a^{av} \leq b^{av}$.
Proof. Here we prove only the necessary condition of part (a). Suppose that $Q(t)$ is rate stable, so that $Q(t)/t \to 0$ with probability 1. Because (2.5) holds for all slots $t > 0$, we can take limits in (2.5) as $t \to \infty$ and use (2.6)–(2.7) to conclude that $0 \geq a^{av} - b^{av}$. Thus, $a^{av} \leq b^{av}$ is necessary for rate stability. The proof for sufficiency in part (a) and the proof of part (b) are developed in Exercises 2.3 and 2.4. The proof of part (c) is more complex and is omitted (see (136)). □
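Part (b) is easy to observe in simulation. The sketch below uses assumed, illustrative rates (Bernoulli arrivals with $a^{av} = 0.7$ and constant service $b(t) = 0.5$), so $Q(t)/t$ should approach $a^{av} - b^{av} = 0.2$.

```python
import random

# Simulation of Theorem 2.4(b) under hypothetical rates: a(t) Bernoulli
# with a_av = 0.7 and b(t) = 0.5 every slot, so a_av > b_av and Q(t)/t
# approaches a_av - b_av = 0.2.
rng = random.Random(11)
Q = 0.0
T = 200000
for t in range(T):
    a = 1.0 if rng.random() < 0.7 else 0.0
    Q = max(Q - 0.5, 0.0) + a
print(Q / T)   # close to 0.2
```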
The following theorem presents a more general necessary condition for rate stability that does
not require the arrival and server processes to have well deﬁned limits.
Theorem 2.5 (Necessary Condition for Rate Stability) Suppose Q(t ) evolves according to (2.1), with
any general processes a(t ) and b(t ) such that a(t ) ≥ 0 for all t . Then:
(a) If Q(t ) is rate stable, then:
limsup
t →∞
1
t
t −1
¸
τ=0
[a(τ) −b(τ)] ≤ 0 with probability 1 (2.8)
(b) If Q(t ) is mean rate stable and if E{Q(0)} < ∞, then:
limsup
t →∞
1
t
t −1
¸
τ=0
E{a(τ) −b(τ)} ≤ 0 (2.9)
Proof. The proof of (a) follows immediately by taking a limsup of both sides of (2.5) and noting that Q(t)/t → 0 because Q(t) is rate stable. The proof of (b) follows by first taking an expectation of (2.5) and then taking limits. □
2.2 STRONGER FORMS OF STABILITY
Rate stability and mean rate stability only describe the long-term average rate of arrivals and departures from the queue; they say nothing about the fraction of time the queue backlog exceeds a certain value, or about the time average expected backlog. The stronger stability definitions given below are thus useful.
Definition 2.6 A discrete time process Q(t) is steady state stable if:

    lim_{M→∞} g(M) = 0

where for each M ≥ 0, g(M) is defined:

    g(M) ≜ limsup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Pr[Q(τ) > M]        (2.10)
Definition 2.7 A discrete time process Q(t) is strongly stable if:

    limsup_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{Q(τ)} < ∞        (2.11)
Under mild boundedness assumptions, strong stability implies all of the other forms of stability, as specified in Theorem 2.8 below.
Theorem 2.8 (Strong Stability Theorem) Suppose Q(t) evolves according to (2.1) for some general stochastic processes {a(t)}_{t=0}^∞ and {b(t)}_{t=0}^∞, where a(t) ≥ 0 for all t, and b(t) is real valued for all t. Suppose Q(t) is strongly stable. Then:
(a) Q(t) is steady state stable.
(b) If there is a finite constant C such that either a(t) + b^−(t) ≤ C with probability 1 for all t (where b^−(t) ≜ −min[b(t), 0]), or b(t) − a(t) ≤ C with probability 1 for all t, then Q(t) is rate stable, so that Q(t)/t → 0 with probability 1.
(c) If there is a finite constant C such that either E{a(t) + b^−(t)} ≤ C for all t, or E{b(t) − a(t)} ≤ C for all t, then Q(t) is mean rate stable.
Proof. Part (a) is given in Exercise 2.5. Parts (b) and (c) are omitted (see (136)). □
Readers familiar with discrete time Markov chains (DTMCs) may be interested in the following connection: For processes Q(t) defined over an ergodic DTMC with a finite or countably infinite state space and with the property that, for each real value M, the event {Q(t) ≤ M} corresponds to only a finite number of states, steady state stability implies the existence of a steady state distribution, and strong stability implies finite average backlog and (by Little's theorem (129)) finite average delay.
2.3 RANDOMIZED SCHEDULING FOR RATE STABILITY
The Rate Stability Theorem (Theorem 2.4) suggests the following simple method for stabilizing a multi-queue network: Make scheduling decisions so that the time average service and arrival rates are well defined and satisfy a_i^av ≤ b_i^av for each queue i. This method typically requires perfect knowledge of the arrival and channel probabilities so that the desired time averages can be achieved. Some representative examples are provided below. A better method that does not require a priori statistical knowledge is developed in Chapters 3 and 4.
[Figure 2.1 shows queues Q_1(t), Q_2(t), Q_3(t) with arrival processes a_1(t), a_2(t), a_3(t).]
Figure 2.1: A 3-queue, 2-server system. Every slot the network controller decides which 2 queues receive servers. A single queue cannot receive 2 servers on the same slot.
2.3.1 A 3-QUEUE, 2-SERVER EXAMPLE
Example Problem: Consider the 3-queue, 2-server system of Fig. 2.1. All packets have fixed length, and a queue that is allocated a server on a given slot can serve exactly one packet on that slot. Every slot we choose which 2 queues to serve. The service is given for i ∈ {1, 2, 3} by:

    b_i(t) = { 1  if a server is connected to queue i on slot t
             { 0  otherwise
Assume the arrival processes have well defined time average rates (a_1^av, a_2^av, a_3^av), in units of packets/slot. Design a server allocation algorithm to make all queues rate stable when arrival rates are given as follows:

a) (a_1^av, a_2^av, a_3^av) = (0.5, 0.5, 0.9)
b) (a_1^av, a_2^av, a_3^av) = (2/3, 2/3, 2/3)
c) (a_1^av, a_2^av, a_3^av) = (0.7, 0.9, 0.4)
d) (a_1^av, a_2^av, a_3^av) = (0.65, 0.5, 0.75)
e) Use (2.5) to prove that the constraints 0 ≤ a_i^av ≤ 1 for all i ∈ {1, 2, 3}, and a_1^av + a_2^av + a_3^av ≤ 2, are necessary for the existence of a rate stabilizing algorithm.
Solution:
a) Choose the service vector (b_1(t), b_2(t), b_3(t)) to be independent and identically distributed (i.i.d.) every slot, choosing (0, 1, 1) with probability 1/2 and (1, 0, 1) with probability 1/2. Then {b_1(t)}_{t=0}^∞ is i.i.d. over slots with b_1^av = 0.5 by the law of large numbers. Likewise, b_2^av = 0.5 and b_3^av = 1. Then clearly a_i^av ≤ b_i^av for all i ∈ {1, 2, 3}, and so the Rate Stability Theorem ensures all queues are rate stable. While this is a randomized scheduling algorithm, one could also design a deterministic algorithm, such as one that alternates between (0, 1, 1) (on odd slots) and (1, 0, 1) (on even slots).
b) Choose (b_1(t), b_2(t), b_3(t)) i.i.d. over slots, equally likely over the three options (1, 1, 0), (1, 0, 1), and (0, 1, 1). Then b_i^av = 2/3 = a_i^av for all i ∈ {1, 2, 3}, and so by the Rate Stability Theorem all queues are rate stable.
c) Every slot, independently choose the service vector (0, 1, 1) with probability p_1, (1, 0, 1) with probability p_2, and (1, 1, 0) with probability p_3, so that p_1, p_2, p_3 satisfy:

    p_1(0, 1, 1) + p_2(1, 0, 1) + p_3(1, 1, 0) ≥ (0.7, 0.9, 0.4)        (2.12)
    p_1 + p_2 + p_3 = 1        (2.13)
    p_i ≥ 0   ∀i ∈ {1, 2, 3}        (2.14)

where the inequality (2.12) is taken entrywise. This is an example of a linear program. Linear programs are typically difficult to solve by hand, but this one can be solved easily by guessing that the constraint in (2.12) can be satisfied with equality. One can verify the following (unique) solution: p_1 = 0.3, p_2 = 0.1, p_3 = 0.6. Thus, b_1^av = p_2 + p_3 = 0.7, b_2^av = p_1 + p_3 = 0.9, b_3^av = p_1 + p_2 = 0.4, and so all queues are rate stable by the Rate Stability Theorem. It is an interesting exercise to design an alternative deterministic algorithm that uses a periodic schedule to produce the same time averages.
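As a quick check of this solution, note that the expected service vector under any probabilities (p_1, p_2, p_3) is the corresponding convex combination of the three service vectors. The sketch below (illustrative only; the function name is ours) confirms that (0.3, 0.1, 0.6) yields exactly (0.7, 0.9, 0.4):

```python
# Expected service rates b_av_i = sum_k p_k * v_k[i] for a randomized
# schedule that picks service vector v_k with probability p_k.
def service_rates(p, vectors):
    return tuple(round(sum(pk * v[i] for pk, v in zip(p, vectors)), 10)
                 for i in range(3))

vectors = [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(service_rates((0.3, 0.1, 0.6), vectors))  # (0.7, 0.9, 0.4)
```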
d) Use the same linear program (2.12)-(2.14), but replace the constraint (2.12) with the following:

    p_1(0, 1, 1) + p_2(1, 0, 1) + p_3(1, 1, 0) ≥ (0.65, 0.5, 0.75)

This can be solved by hand by trial and error. One simplifying trick is to replace the above inequality constraint with the following equality constraint:

    p_1(0, 1, 1) + p_2(1, 0, 1) + p_3(1, 1, 0) = (0.7, 0.5, 0.8)

Then we can use p_1 = 0.3, p_2 = 0.5, p_3 = 0.2.
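The trial-and-error step can also be automated: a coarse grid search over (p_1, p_2), with p_3 = 1 − p_1 − p_2, finds probability vectors whose service rates dominate the target arrival rates entrywise. This sketch (ours, not from the text) recovers one feasible point for the rates of part (d); the text's (0.3, 0.5, 0.2) is another:

```python
# Grid search over (p1, p2), with p3 = 1 - p1 - p2, for a randomized
# schedule whose service rates dominate the target rates entrywise.
def find_feasible(target):
    for i in range(101):
        for j in range(101 - i):
            p1, p2 = i / 100, j / 100
            p3 = 1 - p1 - p2
            b = (p2 + p3, p1 + p3, p1 + p2)  # service rates of the 3 queues
            if all(b[k] >= target[k] - 1e-9 for k in range(3)):
                return (p1, p2, p3)
    return None

print(find_feasible((0.65, 0.5, 0.75)))  # one feasible (p1, p2, p3)
```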
e) Consider any algorithm that makes all queues rate stable, and let b_i(t) be the queue-i decision made by the algorithm on slot t. For each queue i, we have for all t > 0:

    Q_i(t)/t − Q_i(0)/t ≥ (1/t) Σ_{τ=0}^{t−1} a_i(τ) − (1/t) Σ_{τ=0}^{t−1} b_i(τ)
                        ≥ (1/t) Σ_{τ=0}^{t−1} a_i(τ) − 1

where the first inequality follows by (2.5) and the final inequality holds because b_i(τ) ≤ 1 for all τ. The above holds for all t > 0. Taking a limit as t → ∞ and using the fact that queue i is rate stable yields, with probability 1:

    0 ≥ a_i^av − 1
and so we find that, for each i ∈ {1, 2, 3}, the condition a_i^av ≤ 1 is necessary for the existence of an algorithm that makes all queues rate stable. Similarly, we have:

    [Q_1(t) + Q_2(t) + Q_3(t)]/t − [Q_1(0) + Q_2(0) + Q_3(0)]/t
        ≥ (1/t) Σ_{τ=0}^{t−1} [a_1(τ) + a_2(τ) + a_3(τ)] − (1/t) Σ_{τ=0}^{t−1} [b_1(τ) + b_2(τ) + b_3(τ)]
        ≥ (1/t) Σ_{τ=0}^{t−1} [a_1(τ) + a_2(τ) + a_3(τ)] − 2

where the final inequality holds because b_1(τ) + b_2(τ) + b_3(τ) ≤ 2 for all τ. Taking limits shows that 0 ≥ a_1^av + a_2^av + a_3^av − 2 is also a necessary condition.
Discussion: Define Λ as the set of all rate vectors (a_1^av, a_2^av, a_3^av) that satisfy the constraints in part (e) of the above example problem. We know from part (e) that (a_1^av, a_2^av, a_3^av) ∈ Λ is a necessary condition for the existence of an algorithm that makes all queues rate stable. Further, it can be shown that for any vector (a_1^av, a_2^av, a_3^av) ∈ Λ, there exist probabilities p_1, p_2, p_3 that solve the following linear program:

    p_1(0, 1, 1) + p_2(1, 0, 1) + p_3(1, 1, 0) ≥ (a_1^av, a_2^av, a_3^av)
    p_1 + p_2 + p_3 = 1
    p_i ≥ 0   ∀i ∈ {1, 2, 3}

Showing this is not trivial and is left as an advanced exercise. However, this fact, together with the Rate Stability Theorem, shows that it is possible to design an algorithm to make all queues rate stable whenever (a_1^av, a_2^av, a_3^av) ∈ Λ. That is, (a_1^av, a_2^av, a_3^av) ∈ Λ is necessary and sufficient for the existence of an algorithm that makes all queues rate stable. The set Λ is called the capacity region for the network. Exercises 2.7 and 2.8 provide additional practice questions about scheduling and delay in this system.
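The membership conditions of part (e) are simple enough to check mechanically. Below is a small helper (our illustration, not from the text) that tests whether a rate vector satisfies those constraints and applies it to the four rate vectors of parts (a)-(d), all of which lie in the region:

```python
# Membership test for the capacity region of the 3-queue, 2-server system:
# 0 <= a_i <= 1 for each i, and a_1 + a_2 + a_3 <= 2.
def in_capacity_region(rates, tol=1e-9):
    return (all(-tol <= a <= 1 + tol for a in rates)
            and sum(rates) <= 2 + tol)

for rates in [(0.5, 0.5, 0.9), (2/3, 2/3, 2/3),
              (0.7, 0.9, 0.4), (0.65, 0.5, 0.75)]:
    print(rates, in_capacity_region(rates))  # all True

print(in_capacity_region((0.9, 0.9, 0.9)))  # False: the sum exceeds 2
```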
2.3.2 A 2-QUEUE OPPORTUNISTIC SCHEDULING EXAMPLE
Example Problem: Consider a 2-queue wireless downlink that operates in discrete time (Fig. 2.2a). All data consists of fixed length packets. The arrival process (a_1(t), a_2(t)) represents the (integer) number of packets that arrive to each queue on slot t. There are two wireless channels, and packets in queue i must be transmitted over channel i, for i ∈ {1, 2}. At the beginning of each slot, the network controller observes the channel state vector S(t) = (S_1(t), S_2(t)), where S_i(t) ∈ {ON, OFF}, so that there are four possible channel state vectors. The controller can transmit at most one packet per slot, and it can only transmit a packet over a channel that is ON. Thus, for each channel i ∈ {1, 2}, we have:

    b_i(t) = { 1  if S_i(t) = ON and channel i is chosen for transmission on slot t
             { 0  otherwise
[Figure 2.2 shows (a) queues Q_1(t), Q_2(t) with arrivals a_1(t), a_2(t) and channel states S_1(t), S_2(t), and (b) the capacity region with corner points (0, 0), (0, 0.4), (0.36, 0.4), (0.6, 0.16), and (0.6, 0).]
Figure 2.2: (a) The 2-queue, 1-server opportunistic scheduling system with ON/OFF channels. (b) The capacity region Λ for the specific channel probabilities given below.
If S(t) = (OFF, OFF), then b_1(t) = b_2(t) = 0. If exactly one channel is ON, then clearly the controller should choose to transmit over that channel. The only decision is which channel to use when S(t) = (ON, ON). Suppose that (a_1(t), a_2(t)) is i.i.d. over slots with E{a_1(t)} = λ_1 and E{a_2(t)} = λ_2. Suppose that S(t) is i.i.d. over slots with Pr[(OFF, OFF)] = p_00, Pr[(OFF, ON)] = p_01, Pr[(ON, OFF)] = p_10, Pr[(ON, ON)] = p_11.
a) Define Λ as the set of all vectors (λ_1, λ_2) that satisfy the constraints 0 ≤ λ_1 ≤ p_10 + p_11, 0 ≤ λ_2 ≤ p_01 + p_11, and λ_1 + λ_2 ≤ p_01 + p_10 + p_11. Show that (λ_1, λ_2) ∈ Λ is necessary for the existence of a rate stabilizing algorithm.
b) Plot the 2-dimensional region Λ for the special case when p_00 = 0.24, p_10 = 0.36, p_01 = 0.16, p_11 = 0.24.
c) For the system of part (b): Use a randomized algorithm that independently transmits over channel 1 with probability β whenever S(t) = (ON, ON). Choose β to make both queues rate stable when (λ_1, λ_2) = (0.6, 0.16).
d) For the system of part (b): Choose β to make both queues rate stable when (λ_1, λ_2) = (0.5, 0.26).
Solution:
a) Let b_1(t), b_2(t) be the decisions made by a particular algorithm that makes both queues rate stable. From (2.5), we have for queue 1 and for all slots t > 0:

    Q_1(t)/t − Q_1(0)/t ≥ (1/t) Σ_{τ=0}^{t−1} a_1(τ) − (1/t) Σ_{τ=0}^{t−1} b_1(τ)
Because b_1(τ) ≤ 1_{S_1(τ)=ON}, where the latter is an indicator function that is 1 if S_1(τ) = ON, and 0 else, we have:

    Q_1(t)/t − Q_1(0)/t ≥ (1/t) Σ_{τ=0}^{t−1} a_1(τ) − (1/t) Σ_{τ=0}^{t−1} 1_{S_1(τ)=ON}        (2.15)
However, we know that Q_1(t)/t → 0 with probability 1. Further, by the law of large numbers, we have (with probability 1):

    lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} a_1(τ) = λ_1 ,   lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} 1_{S_1(τ)=ON} = p_10 + p_11

Thus, taking a limit as t → ∞ in (2.15) yields:

    0 ≥ λ_1 − (p_10 + p_11)

and hence λ_1 ≤ p_10 + p_11 is a necessary condition for any rate stabilizing algorithm. A similar argument shows that λ_2 ≤ p_01 + p_11 is a necessary condition. Finally, note that for all t > 0:
    [Q_1(t) + Q_2(t)]/t − [Q_1(0) + Q_2(0)]/t ≥ (1/t) Σ_{τ=0}^{t−1} [a_1(τ) + a_2(τ)] − (1/t) Σ_{τ=0}^{t−1} 1_{{S_1(τ)=ON} ∪ {S_2(τ)=ON}}

Taking a limit of the above proves that λ_1 + λ_2 ≤ p_01 + p_10 + p_11 is necessary.
b) See Fig. 2.2b.
c) If S(t) = (OFF, OFF), then don't transmit. If S(t) = (ON, OFF) or (ON, ON), then transmit over channel 1. If S(t) = (OFF, ON), then transmit over channel 2. Then by the law of large numbers, we have b_1^av = p_10 + p_11 = 0.6, b_2^av = p_01 = 0.16, and so both queues are rate stable (by the Rate Stability Theorem).
d) Choose β = 0.14/0.24. Then b_1^av = 0.36 + 0.24β = 0.5, and b_2^av = 0.16 + 0.24(1 − β) = 0.26.
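The choice of β in part (d) comes from solving b_1^av = p_10 + p_11 β for the desired rate on channel 1, with channel 2 then receiving b_2^av = p_01 + p_11(1 − β). A small sketch of this calculation (illustrative only):

```python
# Service rates under the ON/ON-randomized policy: channel 1 is served
# whenever S = (ON, OFF), and with probability beta when S = (ON, ON).
p01, p10, p11 = 0.16, 0.36, 0.24

def service_rates(beta):
    b1 = p10 + p11 * beta
    b2 = p01 + p11 * (1 - beta)
    return b1, b2

beta = (0.5 - p10) / p11   # solve p10 + p11*beta = 0.5 for part (d)
print(beta, service_rates(beta))  # beta = 0.14/0.24; rates near (0.5, 0.26)
```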
Discussion: Exercise 2.9 treats scheduling and delay issues in this system. It can be shown that the set Λ given in part (a) above is the capacity region, so that (λ_1, λ_2) ∈ Λ is necessary and sufficient for the existence of a rate stabilizing policy. See (8) for the derivation of the capacity region for ON/OFF opportunistic scheduling systems with K queues (with K ≥ 2). See also (8) for optimal delay scheduling in symmetric systems of this type (where all arrival rates are the same, as are all ON/OFF probabilities), and (101)(100) for "order-optimal" delay in general (possibly asymmetric) situations.
It is possible to support any point in Λ using a stationary randomized policy that makes a scheduling decision as a random function of the observed channel state S(t). Such policies are called S-only policies. The solutions given in parts (c) and (d) above use S-only policies. Further, the randomized server allocation policies considered in the 3-queue, 2-server example of Section 2.3.1 can be viewed as "degenerate" S-only policies because, in that case, there is only one "channel state" (i.e., (ON, ON, ON)). It is known that the capacity region of general single-hop and multi-hop networks with time varying channels S(t) can be described in terms of S-only policies (15)(22) (see also Theorem 4.5 of Chapter 4 for a related result for more general systems).
Note that S-only policies do not consider queue backlog information, and thus they may serve a queue that is empty, which is clearly inefficient. Thus, one might wonder how S-only policies can stabilize queueing networks whenever traffic rates are inside the capacity region. Intuitively, the reason is that inefficiency only arises when a queue becomes empty, a rare event when traffic rates are near the boundary of the capacity region.² Thus, using queue backlog information cannot "enlarge" the region of supportable rates. However, Chapter 3 shows that queue backlogs are extremely useful for designing dynamic algorithms that do not require a priori knowledge of channel statistics or a priori computation of a randomized policy with specific time averages.
2.4 EXERCISES
Exercise 2.1. (Queue Sample Path) Fill in the missing entries of the table in Fig. 2.3 for a queue Q(t) that satisfies (2.1).

    t                 | 0  1  2  3  4  5  6  7  8  9  10
    Arrivals a(t)     | 3  3  0  2  1  0  0  2  0  0
    Current Rate b(t) | 4  2  1  3  3  2  2  4  0  2  1
    Backlog Q(t)      | 0  3  4  3  2
    Transmitted b̃(t)  | 0  2  1  2  1

Figure 2.3: An example sample path for the queueing system of Exercise 2.1.
Exercise 2.2. (Inequality comparison) Let Q(t) satisfy (2.1) with server process b(t) and arrival process a(t). Let Q̃(t) be another queueing system with the same server process b(t) but with an arrival process ã(t) = a(t) + z(t), where z(t) ≥ 0 for all t ∈ {0, 1, 2, . . .}. Assuming that Q(0) = Q̃(0), prove that Q(t) ≤ Q̃(t) for all t ∈ {0, 1, 2, . . .}.
Exercise 2.3. (Proving sufficiency for Theorem 2.4a) Let Q(t) satisfy (2.1) with arrival and server processes with well defined time averages a^av and b^av. Suppose that a^av ≤ b^av. Fix ε > 0, and define Q_ε(t) as a queue with Q_ε(0) = Q(0), and with the same server process b(t) but with an arrival process ã(t) = a(t) + (b^av − a^av) + ε for all t.
a) Compute the time average of ã(t).
b) Assuming the result of Theorem 2.4b, compute lim_{t→∞} Q_ε(t)/t.
c) Use the result of part (b) and Exercise 2.2 to prove that Q(t) is rate stable. Hint: I am thinking of a non-negative number x. My number has the property that x ≤ ε for all ε > 0. What is my number?
²For example, in the GI/B/1 queue of Exercise 2.6, it can be shown by Little's Theorem (129) that the fraction of time the queue is empty is 1 − λ/μ (assuming λ ≤ μ), which goes to zero as λ → μ.
Exercise 2.4. (Proof of Theorem 2.4b) Let Q(t) be a queue that satisfies (2.1). Assume the time averages of a(t) and b(t) are given by finite constants a^av and b^av, respectively.
a) Use the following equation to prove that lim_{t→∞} a(t)/t = 0 with probability 1:

    (1/(t+1)) Σ_{τ=0}^{t} a(τ) = [t/(t+1)] (1/t) Σ_{τ=0}^{t−1} a(τ) + [t/(t+1)] a(t)/t

b) Suppose that b̃(t_i) < b(t_i) for some slot t_i (where we recall that b̃(t_i) ≜ min[b(t_i), Q(t_i)]). Use (2.1) to compute Q(t_i + 1).
c) Use part (b) and (2.5) to show that if b̃(t_i) < b(t_i), then:

    a(t_i) ≥ Q(0) + Σ_{τ=0}^{t_i} [a(τ) − b(τ)]

Conclude that if b̃(t_i) < b(t_i) for an infinite number of slots t_i, then a^av ≤ b^av.
d) Use part (c) to conclude that if a^av > b^av, there is some slot t* ≥ 0 such that for all t > t*, we have:

    Q(t) = Q(t*) + Σ_{τ=t*}^{t−1} [a(τ) − b(τ)]

Use this to prove the result of Theorem 2.4b.
Exercise 2.5. (Strong stability implies steady state stability) Prove that strong stability implies steady state stability, using the fact that E{Q(τ)} ≥ M Pr[Q(τ) > M].
Exercise 2.6. (Discrete time GI/B/1 queue) Consider a queue Q(t) with dynamics (2.1). Assume that a(t) is i.i.d. over slots with non-negative integer values, with E{a(t)} = λ and E{a(t)²} = E{a²}. Assume that b(t) is independent of the arrivals and is i.i.d. over slots with Pr[b(t) = 1] = μ, Pr[b(t) = 0] = 1 − μ. Thus, Q(t) is always integer valued. Suppose that λ < μ, and that there are finite values E{Q}, Q̄, Q^av, E{Q²} such that:

    lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{Q(τ)} = Q̄ ,   lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} Q(τ) = Q^av   with prob. 1

    lim_{t→∞} E{Q(t)} = E{Q} ,   lim_{t→∞} E{Q(t)²} = E{Q²}

Using ergodic Markov chain theory, it can be shown that Q̄ = Q^av = E{Q} (see also Exercise 7.9). Here we want to compute E{Q}, using the magic of a quadratic.
a) Take expectations of equation (2.2) to find lim_{t→∞} E{b̃(t)}.
b) Explain why b̃(t)² = b̃(t) and Q(t)b̃(t) = Q(t)b(t).
c) Square equation (2.2) and use part (b) to prove:

    Q(t+1)² = Q(t)² + b̃(t) + a(t)² − 2Q(t)(b(t) − a(t)) − 2b̃(t)a(t)

d) Take expectations in (c) and let t → ∞ to conclude that:

    E{Q} = [E{a²} + λ − 2λ²] / [2(μ − λ)]

We have used the fact that Q(t) is independent of b(t), even though it is not independent of b̃(t). This establishes the average backlog for an integer-based GI/B/1 queue (where "GI" means the arrivals are general and i.i.d. over slots, "B" means the service is i.i.d. Bernoulli, and "1" means there is a single server). By Little's Theorem (129), it follows that the average delay (in units of slots) is W = Q̄/λ. When the arrival process is Bernoulli, these formulas simplify to Q̄ = λ(1 − λ)/(μ − λ) and W = (1 − λ)/(μ − λ). Using reversible Markov chain theory (130)(66)(131), it can be shown that the steady state output process of a B/B/1 queue is also i.i.d. Bernoulli with rate λ (regardless of μ, provided that λ < μ), which makes analysis of tandems of B/B/1 queues very easy.
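The formula of part (d) is handy to package for the later exercises. The helper below (an illustration; the function name is ours) computes E{Q} = (E{a²} + λ − 2λ²)/(2(μ − λ)) and W = E{Q}/λ, and checks that it reduces to λ(1 − λ)/(μ − λ) in the Bernoulli case, where E{a²} = λ:

```python
# Average backlog and delay of the discrete time GI/B/1 queue, from part (d)
# and Little's Theorem: E{Q} = (E{a^2} + lam - 2*lam^2) / (2*(mu - lam)).
def gi_b1_backlog_delay(lam, Ea2, mu):
    assert lam < mu, "stability requires lam < mu"
    Q = (Ea2 + lam - 2 * lam ** 2) / (2 * (mu - lam))
    return Q, Q / lam

# Bernoulli arrivals: E{a^2} = lam, so E{Q} = lam*(1-lam)/(mu-lam).
lam, mu = 0.3, 0.7
Q, W = gi_b1_backlog_delay(lam, lam, mu)
print(Q, W)  # approximately 0.525 and 1.75
```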
Exercise 2.7. (Server Scheduling) Consider the 3-queue, 2-server system example of Section 2.3.1 (Fig. 2.1). Assume the arrival vector (a_1(t), a_2(t), a_3(t)) is i.i.d. over slots with E{a_i(t)} = λ_i for i ∈ {1, 2, 3}. Design a randomized server allocation algorithm to make all queues rate stable when:
a) (λ_1, λ_2, λ_3) = (0.2, 0.9, 0.6)
b) (λ_1, λ_2, λ_3) = (3/4, 3/4, 1/2)
c) (λ_1, λ_2, λ_3) = (0.6, 0.5, 0.9)
d) (λ_1, λ_2, λ_3) = (0.7, 0.6, 0.5)
e) Give a deterministic algorithm that uses a periodic schedule to support the rates in part (b).
f) Give a deterministic algorithm that uses a periodic schedule to support the rates in part (c).
Exercise 2.8. (Delay for Server Scheduling) Consider the 3-queue, 2-server system of Fig. 2.1 that operates according to the randomized schedule of the solution given in part (d) of Section 2.3.1, so that p_1 = 0.3, p_2 = 0.5, p_3 = 0.2. Suppose a_1(t) is i.i.d. over slots and Bernoulli, with Pr[a_1(t) = 0] = 0.35, Pr[a_1(t) = 1] = 0.65. Use the formula of Exercise 2.6 to compute the average backlog Q̄_1 and average delay W_1 in queue 1. (First, you must convince yourself that queue 1 is indeed a discrete time GI/B/1 queue.)
Exercise 2.9. (Delay for Opportunistic Scheduling) Consider the 2-queue wireless downlink with ON/OFF channels as described in the example of Section 2.3.2 (Fig. 2.2). The channel probabilities are given as in that example: p_00 = 0.24, p_10 = 0.36, p_01 = 0.16, p_11 = 0.24. Suppose the arrival process a_1(t) is i.i.d. Bernoulli with rate λ_1 = 0.4, so that Pr[a_1(t) = 1] = 0.4, Pr[a_1(t) = 0] = 0.6. Suppose a_2(t) is i.i.d. Bernoulli with rate λ_2 = 0.3. Design a randomized algorithm, using parameter β as the probability that we transmit over channel 1 when S(t) = (ON, ON), that ensures the average delays satisfy W_1 ≤ 25 slots and W_2 ≤ 25 slots. You should use the delay formula in Exercise 2.6 (first convincing yourself that each queue is indeed a GI/B/1 queue) along with an educated guess and/or trial and error for β.
Exercise 2.10. (Simulation of a B/B/1 queue) Write a computer program to simulate a Bernoulli/Bernoulli/1 (B/B/1) queue. Specifically, we have Q(0) = 0, {a(t)}_{t=0}^∞ is i.i.d. over slots with Pr[a(t) = 1] = λ, Pr[a(t) = 0] = 1 − λ, and {b(t)}_{t=0}^∞ is independent of the arrival process and is i.i.d. over slots with Pr[b(t) = 1] = μ, Pr[b(t) = 0] = 1 − μ. Assume that μ = 0.7, run the experiment over 10^6 slots, and give the empirical time average Q^av and the value of Q(t)/t for t = 10^6, for λ values of 0.4, 0.5, 0.6, 0.7, 0.8. Compare these to the exact values (given in Exercise 2.6) for t → ∞.
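One possible sketch of such a program (with a fixed seed for reproducibility; the seed and structure are our choices, not prescribed by the exercise) is:

```python
import random

# Simulate the B/B/1 queue Q(t+1) = max[Q(t) - b(t), 0] + a(t), reporting the
# empirical time average backlog and the final value of Q(T)/T.
def simulate_bb1(lam, mu, T, seed=0):
    rng = random.Random(seed)
    Q, total = 0, 0
    for _ in range(T):
        total += Q
        a = 1 if rng.random() < lam else 0
        b = 1 if rng.random() < mu else 0
        Q = max(Q - b, 0) + a
    return total / T, Q / T   # (empirical Q_av, Q(T)/T)

mu, T = 0.7, 10 ** 6
for lam in [0.4, 0.5, 0.6, 0.7, 0.8]:
    Q_av, ratio = simulate_bb1(lam, mu, T)
    # Exact steady state value lam*(1-lam)/(mu-lam) exists only for lam < mu.
    exact = lam * (1 - lam) / (mu - lam) if lam < mu else float("inf")
    print(lam, Q_av, ratio, exact)
```

For λ < μ the empirical Q^av should be near the exact value and Q(T)/T should be near zero, while for λ > μ the ratio Q(T)/T approaches λ − μ, consistent with Theorem 2.4b.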
Exercise 2.11. (Virtual Queues) Suppose we have a system that operates in discrete time with slots t ∈ {0, 1, 2, . . .}. A controller makes decisions every slot t about how to operate the system, and these decisions incur power p(t). The controller wants to ensure the time average power expenditure is no more than 12.3 power units per slot. Define a virtual queue Z(t) with Z(0) = 0, and with update equation:

    Z(t+1) = max[Z(t) − 12.3, 0] + p(t)        (2.16)

The controller keeps the value of Z(t) as a state variable, and updates Z(t) at the end of each slot via (2.16) using the power p(t) that was spent on that slot.
a) Use Lemma 2.1 to prove that if Z(t) is rate stable, then:³

    lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} p(τ) ≤ 12.3   with probability 1

b) Suppose there is a positive constant Z_max such that Z(t) ≤ Z_max for all t ∈ {0, 1, 2, . . .}. Use (2.3) to show that for any integer T > 0 and any interval of T slots, defined by {t_1, . . . , t_1 + T − 1} (where t_1 ≥ 0), we have:

    Σ_{τ=t_1}^{t_1+T−1} p(τ) ≤ 12.3T + Z_max

This idea is used in (21) to ensure the total power used in a communication system over any interval is less than or equal to the desired per-slot average power constraint multiplied by the interval size, plus a constant allowable "power burst" Z_max. A variation of this technique is used in (137) to bound the worst-case number of collisions with a primary user in a cognitive radio network.

³For simplicity, we have implicitly assumed that the limit lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} p(τ) in Exercise 2.11(a) exists. More generally, the result holds when "lim" is replaced with "limsup."
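To see the update (2.16) in action, the sketch below (illustrative; the power process is our choice) drives the virtual queue with i.i.d. powers of mean 10 < 12.3, and confirms that Z(t)/t is small (so Z(t) appears rate stable) and that the time average power is below the constraint:

```python
import random

# Drive the virtual queue Z(t+1) = max[Z(t) - 12.3, 0] + p(t) with i.i.d.
# power expenditures p(t) uniform on [0, 20] (mean 10 < 12.3).
rng = random.Random(1)
T = 100000
Z, total_power = 0.0, 0.0
for t in range(T):
    p = rng.uniform(0, 20)
    total_power += p
    Z = max(Z - 12.3, 0.0) + p

print(Z / T)            # near 0: the virtual queue is rate stable
print(total_power / T)  # near 10, below the constraint 12.3
```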
CHAPTER 3
Dynamic Scheduling Example
The dynamic scheduling algorithms developed in this text use powerful techniques of Lyapunov drift and Lyapunov optimization. To build intuition, this chapter introduces the main concepts for a simple 2-user wireless downlink example, similar to the example given in Section 2.3.2 of the previous chapter. First, the problem is formulated in terms of known arrival rates and channel state probabilities. However, rather than using a randomized scheduling algorithm that bases decisions only on the current channel states (as considered in the previous chapter), we use an alternative approach based on minimizing the drift of a Lyapunov function. The advantage is that the drift-minimizing approach uses both current channel states and current queue backlogs to stabilize the system, and it does not require a priori knowledge of traffic rates or channel probabilities. This Lyapunov drift technique is extended at the end of the chapter to allow for joint stability and average power minimization.
[Figure 3.1 shows (a) queues Q_1(t), Q_2(t) with arrivals A_1(t), A_2(t) (E{A_1(t)} = λ_1, E{A_2(t)} = λ_2) and channel states S_1(t) ∈ {0, 1}, S_2(t) ∈ {0, 1, 2}, and (b) the capacity region Λ with corner points (0.70, 0.33), (0.49, 0.75), (0.14, 1.10) and example points X, Y, Z.]
Figure 3.1: (a) The 2-queue wireless downlink example with time-varying channels. (b) The capacity region Λ. For λ = (0.3, 0.7) (i.e., point Y illustrated), we have ε_max(λ) = 0.12.
3.1 SCHEDULING FOR STABILITY
Consider a slotted system with two queues, as shown in Fig. 3.1(a). The arrival vector (A_1(t), A_2(t)) is i.i.d. over slots, where A_1(t) and A_2(t) take integer units of packets. The arrival rates are given by λ_1 ≜ E{A_1(t)} and λ_2 ≜ E{A_2(t)}. The second moments E{A_1²} ≜ E{A_1(t)²} and E{A_2²} ≜ E{A_2(t)²} are assumed to be finite. The wireless channels are time varying, and every slot t we have a channel vector S(t) = (S_1(t), S_2(t)), where S_i(t) is a non-negative integer that represents the number of packets that can be transmitted over channel i on slot t (for i ∈ {1, 2}), provided that the scheduler decides to transmit over that channel. The channel state processes S_1(t) and S_2(t) are independent of each other and are i.i.d. over slots, with:

• Pr[S_1(t) = 0] = 0.3 ,  Pr[S_1(t) = 1] = 0.7
• Pr[S_2(t) = 0] = 0.2 ,  Pr[S_2(t) = 1] = 0.5 ,  Pr[S_2(t) = 2] = 0.3

Every slot t the network controller observes the current channel state vector S(t) and chooses a single channel over which to transmit. Let α(t) be the transmission decision on slot t, taking three possible values:

    α(t) ∈ {"Transmit over channel 1", "Transmit over channel 2", "Idle"}

where α(t) = "Idle" means that no transmission takes place on slot t. The queueing dynamics are given by:
    Q_i(t+1) = max[Q_i(t) − b_i(t), 0] + A_i(t)   ∀i ∈ {1, 2}, ∀t ∈ {0, 1, 2, . . .}        (3.1)

where b_i(t) represents the amount of service offered to channel i on slot t (for i ∈ {1, 2}), defined by a function b̂_i(α(t), S(t)):

    b_i(t) = b̂_i(α(t), S(t)) ≜ { S_i(t)  if α(t) = "Transmit over channel i"
                                { 0       otherwise        (3.2)
3.1.1 THE S-ONLY ALGORITHM AND ε_max
Let S represent the set of the 6 possible outcomes for the channel state vector S(t) in the above system:

    S ≜ {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)}
Consider first the class of S-only scheduling algorithms that make independent, stationary, and randomized transmission decisions every slot t based only on the observed S(t) (and hence independent of queue backlog). A particular S-only algorithm for this system is characterized by probabilities q_1(S_1, S_2) and q_2(S_1, S_2) for all (S_1, S_2) ∈ S, where q_i(S_1, S_2) is the probability of transmitting over channel i if S(t) = (S_1, S_2). These probabilities must satisfy q_1(S_1, S_2) + q_2(S_1, S_2) ≤ 1 for all (S_1, S_2) ∈ S, where we use inequality to allow the possibility of transmitting over neither channel (useful for the power minimization problem considered later). Let α*(t) represent the transmission decisions under a particular S-only policy, and define b*_1(t) ≜ b̂_1(α*(t), S(t)) and b*_2(t) ≜ b̂_2(α*(t), S(t)) as the resulting transmission rates offered by this policy on slot t. We thus have for every slot t:
    E{b*_1(t)} = Σ_{(S_1,S_2)∈S} Pr[S_1, S_2] S_1 q_1(S_1, S_2)

    E{b*_2(t)} = Σ_{(S_1,S_2)∈S} Pr[S_1, S_2] S_2 q_2(S_1, S_2)

where we have used Pr[S_1, S_2] as shorthand notation for Pr[(S_1(t), S_2(t)) = (S_1, S_2)].
Note that the above expectations are over the random channel state vector S(t) and the random transmission decision made in reaction to this vector. Under this S-only algorithm, b*_1(t) is i.i.d. over slots with mean E{b*_1(t)}, and thus the time average of b*_1(t) is equal to E{b*_1(t)} with probability 1 (by the law of large numbers). It follows by the Rate Stability Theorem (Theorem 2.4) that queue 1 is rate stable if and only if λ_1 ≤ E{b*_1(t)}. Likewise, queue 2 is rate stable if and only if λ_2 ≤ E{b*_2(t)}. However, for finite delay, it is useful to design the transmission rates to be strictly larger than the arrival rates (see Exercises 2.6, 2.8, 2.9, 2.10). The following linear program seeks to design an S-only policy that maximizes the value of ε for which λ_1 + ε ≤ E{b*_1(t)} and λ_2 + ε ≤ E{b*_2(t)}:
    Maximize:    ε        (3.3)
    Subject to:  λ_1 + ε ≤ Σ_{(S_1,S_2)∈S} Pr[S_1, S_2] S_1 q_1(S_1, S_2)        (3.4)
                 λ_2 + ε ≤ Σ_{(S_1,S_2)∈S} Pr[S_1, S_2] S_2 q_2(S_1, S_2)        (3.5)
                 q_1(S_1, S_2) + q_2(S_1, S_2) ≤ 1   ∀(S_1, S_2) ∈ S        (3.6)
                 q_1(S_1, S_2) ≥ 0 , q_2(S_1, S_2) ≥ 0   ∀(S_1, S_2) ∈ S        (3.7)
There are 8 known parameters that appear as constants in the above linear program:

    λ_1, λ_2, Pr[S_1, S_2]   ∀(S_1, S_2) ∈ S        (3.8)

There are 13 unknowns that act as variables to be optimized in the above linear program:

    ε, q_1(S_1, S_2), q_2(S_1, S_2)   ∀(S_1, S_2) ∈ S        (3.9)
Define λ ≜ (λ_1, λ_2), and define ε_max(λ) as the maximum value of ε in the above problem. It can be shown that the network capacity region Λ is the set of all non-negative rate vectors λ for which ε_max(λ) ≥ 0. The value of ε_max(λ) represents a measure of the distance between the rate vector λ and the capacity region boundary. If the rate vector λ is interior to the capacity region Λ, then ε_max(λ) > 0. In this simple example, it is possible to compute the capacity region explicitly, and it is shown in Fig. 3.1(b). The figure also illustrates an example arrival rate vector (λ_1, λ_2) = (0.3, 0.7) (shown as point Y in the figure), for which we have ε_max(0.3, 0.7) = 0.12.
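The value ε_max(0.3, 0.7) = 0.12 can be checked numerically without an LP solver. In this example, any policy aiming to maximize ε should serve the only usable channel in states (0, 1), (0, 2), and (1, 0), so the only free choices are the channel 1 probabilities in states (1, 1) and (1, 2); the sketch below (our illustration, under that assumption) grid-searches those two probabilities and maximizes the smaller of the two slack values:

```python
# Channel probabilities: S1 in {0,1} w.p. {0.3, 0.7}; S2 in {0,1,2} w.p.
# {0.2, 0.5, 0.3}, independent. Joint probabilities of the usable states:
P = {(0, 1): 0.3 * 0.5, (0, 2): 0.3 * 0.3, (1, 0): 0.7 * 0.2,
     (1, 1): 0.7 * 0.5, (1, 2): 0.7 * 0.3}
lam1, lam2 = 0.3, 0.7

def eps(q11, q12):
    # q11, q12: prob. of serving channel 1 in states (1,1) and (1,2);
    # the other usable channel is served otherwise.
    b1 = P[(1, 0)] + P[(1, 1)] * q11 + P[(1, 2)] * q12
    b2 = (P[(0, 1)] * 1 + P[(0, 2)] * 2
          + P[(1, 1)] * (1 - q11) * 1 + P[(1, 2)] * (1 - q12) * 2)
    return min(b1 - lam1, b2 - lam2)

best = max(eps(i / 100, j / 100) for i in range(101) for j in range(101))
print(round(best, 6))  # 0.12, matching eps_max(0.3, 0.7) in Fig. 3.1(b)
```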
It follows that for any rate vector λ = (λ_1, λ_2) that is interior to the capacity region Λ, we have ε_max(λ) > 0, and there exists an S-only algorithm that yields transmission variables (b*_1(t), b*_2(t)) that satisfy:

    E{b*_1(t)} ≥ λ_1 + ε_max(λ) ,   E{b*_2(t)} ≥ λ_2 + ε_max(λ)        (3.10)
3.1.2 LYAPUNOV DRIFT FOR STABLE SCHEDULING
Rather than trying to solve the linear program of the preceding subsection (which would require a priori knowledge of the arrival rates and channel probabilities specified in (3.8)), here we pursue queue stability via an algorithm that makes decisions based on both the current channel states and the current queue backlogs. Thus, the algorithm we present is not an S-only algorithm. Remarkably, the proof that it provides strong stability whenever the arrival rate vector is interior to the capacity region will use the existence of the S-only algorithm that satisfies (3.10), without ever needing to solve for the 13 variables in (3.9) that define this S-only algorithm.
Let Q(t) = (Q_1(t), Q_2(t)) be the vector of current queue backlogs, and define a Lyapunov function L(Q(t)) as follows:

    L(Q(t)) ≜ (1/2)[Q_1(t)² + Q_2(t)²]        (3.11)
This represents a scalar measure of queue congestion in the network, and it has the following properties:

• L(Q(t)) ≥ 0 for all backlog vectors Q(t) = (Q_1(t), Q_2(t)), with equality if and only if the network is empty on slot t.
• L(Q(t)) being "small" implies that both queue backlogs are "small."
• L(Q(t)) being "large" implies that at least one queue backlog is "large."

For example, if L(Q(t)) ≤ 32, then Q_1(t)² + Q_2(t)² ≤ 64, and thus we know that both Q_1(t) ≤ 8 and Q_2(t) ≤ 8.
If there is a finite constant M such that L(Q(t)) ≤ M for all t, then clearly all queue backlogs are always bounded by √(2M), and so all queues are trivially strongly stable. While we usually cannot guarantee that the Lyapunov function is deterministically bounded, it is intuitively clear that designing an algorithm to consistently push the queue backlog towards a region such that L(Q(t)) ≤ M (for some finite constant M) will help to control congestion and stabilize the queues.
One may wonder why we use a quadratic Lyapunov function, when another function, such as a
linear function, would satisfy properties similar to those stated above. When computing the change
in the Lyapunov function from one slot to the next, we will ﬁnd that the quadratic has important
dominant cross terms that include an inner product of queue backlogs and transmission rates. This
is important for the same reason that it was important to use a quadratic function in the delay
computation of Exercise 2.6, and readers seeking more intuition on the “magic” of the quadratic
function are encouraged to review that exercise.
To understand how we can consistently push the Lyapunov function towards a low-congestion region, we first use (3.1) to compute a bound on the change in the Lyapunov function from one slot to the next:

L(Q(t+1)) − L(Q(t)) = (1/2) Σ_{i=1}^{2} [Q_i(t+1)² − Q_i(t)²]
  = (1/2) Σ_{i=1}^{2} [(max[Q_i(t) − b_i(t), 0] + A_i(t))² − Q_i(t)²]
  ≤ Σ_{i=1}^{2} [A_i(t)² + b_i(t)²]/2 + Σ_{i=1}^{2} Q_i(t)[A_i(t) − b_i(t)]   (3.12)
where in the final inequality we have used the fact that for any Q ≥ 0, b ≥ 0, A ≥ 0, we have:

(max[Q − b, 0] + A)² ≤ Q² + A² + b² + 2Q(A − b)
Now define Δ(Q(t)) as the conditional Lyapunov drift for slot t:

Δ(Q(t)) ≜ E{L(Q(t+1)) − L(Q(t)) | Q(t)}   (3.13)

where the expectation depends on the control policy, and is with respect to the random channel states and the (possibly random) control actions made in reaction to these channel states. From (3.12), we have that Δ(Q(t)) for a general control policy satisfies:

Δ(Q(t)) ≤ E{ Σ_{i=1}^{2} [A_i(t)² + b_i(t)²]/2 | Q(t) } + Σ_{i=1}^{2} Q_i(t)λ_i − E{ Σ_{i=1}^{2} Q_i(t)b_i(t) | Q(t) }   (3.14)

where we have used the fact that arrivals are i.i.d. over slots and hence independent of current queue backlogs, so that E{A_i(t) | Q(t)} = E{A_i(t)} = λ_i. Now define B as a finite constant that bounds the first term on the right-hand-side of the above drift inequality, so that for all t, all possible Q(t), and all possible control actions that can be taken, we have:

E{ Σ_{i=1}^{2} [A_i(t)² + b_i(t)²]/2 | Q(t) } ≤ B
For our system, we have that at most one b_i(t) value can be nonzero on a given slot t. The probability that the nonzero b_i(t) (if any) is equal to 2 is at most 0.3 (because Pr[S_2(t) = 2] = 0.3), and if it is not equal to 2, then it is at most 1. Hence:

(1/2) E{ Σ_{i=1}^{2} b_i(t)² | Q(t) } ≤ [2²(0.3) + 1²(0.7)]/2 = 0.95

and thus we can define B as:

B ≜ 0.95 + (1/2) Σ_{i=1}^{2} E{A_i²}   (3.15)
Using this in (3.14) yields:

Δ(Q(t)) ≤ B + Σ_{i=1}^{2} Q_i(t)λ_i − E{ Σ_{i=1}^{2} Q_i(t)b_i(t) | Q(t) }

To emphasize how the right-hand-side of the above inequality depends on the transmission decision α(t), we use the identity b_i(t) = b̂_i(α(t), S(t)) to yield:

Δ(Q(t)) ≤ B + Σ_{i=1}^{2} Q_i(t)λ_i − E{ Σ_{i=1}^{2} Q_i(t) b̂_i(α(t), S(t)) | Q(t) }   (3.16)
3.1.3 THE "MIN-DRIFT" OR "MAX-WEIGHT" ALGORITHM
Our dynamic algorithm is designed to observe the current queue backlogs (Q_1(t), Q_2(t)) and the current channel states (S_1(t), S_2(t)) and to make a transmission decision α(t) to minimize the right-hand-side of the drift bound (3.16). Note that the transmission decision on slot t only affects the final term on the right-hand-side. Thus, we seek to design an algorithm that maximizes the following expression:

E{ Σ_{i=1}^{2} Q_i(t) b̂_i(α(t), S(t)) | Q(t) }

The above conditional expectation is with respect to the randomly observed channel states S(t) = (S_1(t), S_2(t)) and the (possibly random) control decision α(t). We now use the concept of opportunistically maximizing an expectation: The above expression is maximized by the algorithm that observes the current queues (Q_1(t), Q_2(t)) and channel states (S_1(t), S_2(t)) and chooses α(t) to maximize:

Σ_{i=1}^{2} Q_i(t) b̂_i(α(t), S(t))   (3.17)
This is often called the "max-weight" algorithm, as it seeks to maximize a weighted sum of the transmission rates, where the weights are queue backlogs. As there are only three decisions (transmit over channel 1, transmit over channel 2, or don't transmit), it is easy to evaluate the weighted sum (3.17) for each option:
• Σ_{i=1}^{2} Q_i(t) b̂_i(α(t), S(t)) = Q_1(t)S_1(t) if we choose to transmit over channel 1.
• Σ_{i=1}^{2} Q_i(t) b̂_i(α(t), S(t)) = Q_2(t)S_2(t) if we choose to transmit over channel 2.
• Σ_{i=1}^{2} Q_i(t) b̂_i(α(t), S(t)) = 0 if we choose to remain idle.
It follows that the max-weight algorithm chooses to transmit over the channel i with the largest (positive) value of Q_i(t)S_i(t), and remains idle if this value is 0 for both channels. This simple algorithm just makes decisions based on the current queue states and channel states, and it does not need knowledge of the arrival rates or channel probabilities.
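The max-weight rule and the queue update (3.1) can be sketched in a few lines of Python. The channel probabilities below are illustrative placeholders (only Pr[S_2 = 2] = 0.3 is specified in the text), so the simulated backlog is not meant to reproduce the book's figures exactly.

```python
import random

def sample_channels():
    # Hypothetical channel probabilities for illustration; the text only
    # specifies Pr[S2 = 2] = 0.3, the rest are assumed placeholders.
    S1 = 1 if random.random() < 0.8 else 0
    r = random.random()
    S2 = 2 if r < 0.3 else (1 if r < 0.8 else 0)   # Pr[S2 = 2] = 0.3 (given)
    return S1, S2

def max_weight_decision(Q, S):
    """Return the index (0 or 1) of the queue to serve, or None to idle."""
    w = [Q[0] * S[0], Q[1] * S[1]]   # weighted sum (3.17) for each option
    best = max(w)
    if best <= 0:
        return None                  # idle: both weights are zero
    return w.index(best)             # ties broken by lowest index

def simulate(lam=(0.3, 0.7), T=100_000, seed=0):
    """Run max-weight with Bernoulli arrivals; return average sum backlog."""
    random.seed(seed)
    Q = [0, 0]
    total_backlog = 0
    for _ in range(T):
        S = sample_channels()
        i = max_weight_decision(Q, S)
        if i is not None:
            Q[i] = max(Q[i] - S[i], 0)               # serve up to S_i packets
        for k in range(2):                            # Bernoulli arrivals
            Q[k] += 1 if random.random() < lam[k] else 0
        total_backlog += Q[0] + Q[1]
    return total_backlog / T
```

Note the decision uses only the observed (Q(t), S(t)), with no knowledge of arrival rates or channel probabilities, exactly as claimed above.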
Because this algorithm maximizes the weighted sum (3.17) over all alternative decisions, we have:

Σ_{i=1}^{2} Q_i(t) b̂_i(α(t), S(t)) ≥ Σ_{i=1}^{2} Q_i(t) b̂_i(α*(t), S(t))

where α*(t) represents any alternative (possibly randomized) transmission decision that can be made on slot t. This includes the case when α*(t) is an S-only decision that randomly chooses one of the three transmit options (transmit 1, transmit 2, or idle) with a distribution that depends on the observed S(t). Fixing a particular alternative (possibly randomized) decision α*(t) for comparison
and taking a conditional expectation of the above inequality (given Q(t )) yields:
E{ Σ_{i=1}^{2} Q_i(t) b̂_i(α(t), S(t)) | Q(t) } ≥ E{ Σ_{i=1}^{2} Q_i(t) b̂_i(α*(t), S(t)) | Q(t) }
where the decision α(t) on the left-hand-side of the above inequality represents the max-weight decision made on slot t, and the decision α*(t) represents any other particular decision that could have been made. Plugging the above directly into (3.16) yields:

Δ(Q(t)) ≤ B + Σ_{i=1}^{2} Q_i(t)λ_i − E{ Σ_{i=1}^{2} Q_i(t) b̂_i(α*(t), S(t)) | Q(t) }   (3.18)
where the left-hand-side represents the drift under the max-weight decision α(t), and the final term on the right-hand-side involves any other decision α*(t). It is remarkable that the inequality (3.18) holds true for all of the (infinite) number of possible randomized alternative decisions that can be plugged into the final term on the right-hand-side. However, this should not be too surprising, as we designed the max-weight policy to have exactly this property! Rearranging the terms in (3.18) yields:

Δ(Q(t)) ≤ B − Σ_{i=1}^{2} Q_i(t)[E{b*_i(t) | Q(t)} − λ_i]   (3.19)

where we have used the identity b*_i(t) ≜ b̂_i(α*(t), S(t)) to represent the transmission rate that would be offered over channel i if decision α*(t) were made.
Now suppose the arrival rates (λ_1, λ_2) are interior to the capacity region Λ, and consider the particular S-only decision α*(t) that chooses a transmit option independent of queue backlog to yield (3.10). Because channel states are i.i.d. over slots, the resulting rates (b*_1(t), b*_2(t)) are independent of current queue backlog, and so by (3.10), we have for i ∈ {1, 2}:

E{b*_i(t) | Q(t)} = E{b*_i(t)} ≥ λ_i + ε_max(λ)

Plugging this directly into (3.19) yields:

Δ(Q(t)) ≤ B − Σ_{i=1}^{2} Q_i(t) ε_max(λ)   (3.20)
where we recall that ε_max(λ) > 0. The above is a drift inequality concerning the max-weight algorithm on slot t, and it is now in terms of a value ε_max(λ) associated with the linear program (3.3)–(3.7). However, we did not need to solve the linear program to obtain this inequality or to implement the algorithm! It was enough to know that the solution to the linear program exists!
3.1.4 ITERATED EXPECTATIONS AND TELESCOPING SUMS
Taking an expectation of (3.20) over the randomness of the Q_1(t) and Q_2(t) values yields:

E{Δ(Q(t))} ≤ B − ε_max(λ) Σ_{i=1}^{2} E{Q_i(t)}   (3.21)

Using the definition of Δ(Q(t)) in (3.13) with the law of iterated expectations yields:

E{Δ(Q(t))} = E{E{L(Q(t+1)) − L(Q(t)) | Q(t)}} = E{L(Q(t+1))} − E{L(Q(t))}

Substituting this identity into (3.21) yields:

E{L(Q(t+1))} − E{L(Q(t))} ≤ B − ε_max(λ) Σ_{i=1}^{2} E{Q_i(t)}

The above holds for all t ∈ {0, 1, 2, ...}. Summing over t ∈ {0, 1, ..., T−1} for some integer T > 0 yields (by telescoping sums):

E{L(Q(T))} − E{L(Q(0))} ≤ BT − ε_max(λ) Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Q_i(t)}
Rearranging terms, dividing by ε_max(λ)T, and using the fact that L(Q(T)) ≥ 0 yields:

(1/T) Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Q_i(t)} ≤ B/ε_max(λ) + E{L(Q(0))}/(ε_max(λ)T)

Assuming that E{L(Q(0))} < ∞ and taking a limsup yields:

limsup_{T→∞} (1/T) Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Q_i(t)} ≤ B/ε_max(λ)
Thus, all queues are strongly stable, and the total average backlog (summed over both queues) is less than or equal to B/ε_max(λ). Thus, the max-weight algorithm (developed by minimizing a bound on the Lyapunov drift) ensures the queueing network is strongly stable whenever the rate vector λ is interior to the capacity region Λ, with an average queue congestion bound that is inversely proportional to the distance of the rate vector from the capacity region boundary.

As an example, assume λ_1 = 0.3 and λ_2 = 0.7, illustrated by the point Y of Fig. 3.1(b). Then ε_max = 0.12. Assuming arrivals are Bernoulli, so that E{A_i²} = E{A_i} = λ_i, and using the value B = 1.45 obtained from (3.15), we have:

Q̄_1 + Q̄_2 ≤ 1.45/0.12 = 12.083 packets
where Q̄_1 + Q̄_2 represents the limsup time average expected queue backlog in the network. By Little's Theorem (129), average delay satisfies:

W̄ = (Q̄_1 + Q̄_2)/(λ_1 + λ_2) ≤ 12.083 slots
A simulation of the algorithm over 10^6 slots yields an empirical average queue backlog of Q̄_1^{empirical} + Q̄_2^{empirical} = 3.058 packets, and hence in this example, our upper bound overestimates backlog by roughly a factor of 4.
Thus, the actual max-weight algorithm performs much better than the bound would suggest. There are three reasons for this gap: (i) a simple upper bound was used when computing the Lyapunov drift in (3.12); (ii) the value B used an upper bound on the second moments of service; (iii) the drift inequality compares to a queue-unaware S-only algorithm, whereas the actual drift is much better because our algorithm considers queue backlog. The third reason often dominates in networks with many queues. For example, in (100) it is shown that average congestion and delay in an N-queue wireless system with one server and ON/OFF channels is at least proportional to N if a queue-unaware algorithm is used (a related result is derived for N × N packet switches in (99)). However, a more sophisticated queue grouping analysis in (101) shows that the max-weight algorithm on the ON/OFF downlink system gives average backlog and delay that is O(1), independent of the number of queues. For brevity, we do not include queue grouping concepts in this text. The interested reader is referred to the above references; see also queue grouping results in (102)(103)(104)(105).
3.1.5 SIMULATION OF THE MAX-WEIGHT ALGORITHM
Fig. 3.2 shows simulation results over 10^6 slots when the rate vector (λ_1, λ_2) is pushed up the line segment from X to Z in the figure, again assuming independent Bernoulli arrivals. The point Z is (λ_1, λ_2) = (0.372, 0.868). In the figure, the x-axis is a normalization factor ρ that specifies the distance along the segment (so that ρ = 0 is the point X, ρ = 1 is the point Z, and ρ = 0.806 is the point Y). It can be seen that the network is strongly stable for all rates with ρ < 1, and it has average backlog that increases to infinity at the vertical asymptote defined by the capacity region boundary (i.e., at ρ = 1).

Also plotted in Fig. 3.2 is the upper bound B/ε_max(λ) (where we have computed ε_max(λ) for each input rate vector λ simulated). This bound shows the same qualitative behavior, but it is roughly a factor of 4 larger than the empirically observed backlog.
Figure 3.2: Average sum queue backlog E[Q_1 + Q_2] (in units of packets) under the max-weight algorithm, as loading is pushed from point X (i.e., ρ = 0) to point Z (i.e., ρ = 1). Each simulated data point is an average over 10^6 slots.

3.2 STABILITY AND AVERAGE POWER MINIMIZATION

Now consider the same system, but define p(t) as the power expenditure incurred by the transmission decision α(t) on slot t. To emphasize that power is a function of α(t), we write p(t) = p̂(α(t)) and assume the following simple power function:

p̂(α(t)) = 1 if α(t) ∈ {"Transmit over channel 1," "Transmit over channel 2"}
p̂(α(t)) = 0 if α(t) = "Idle"
That is, we spend 1 unit of power if we transmit over either channel, and no power is spent if we
remain idle. Our goal is now to make transmission decisions to jointly stabilize the system while
also striving to minimize average power expenditure.
For a given rate vector (λ_1, λ_2) in the capacity region Λ, define ψ(λ_1, λ_2) as the minimum average power that can be achieved by any S-only algorithm that makes all queues rate stable. The value ψ(λ_1, λ_2) can be computed by solving the following linear program (compare with (3.3)–(3.7)):

Minimize: ψ = Σ_{(S_1,S_2)∈S} Pr[S_1, S_2](q_1(S_1, S_2) + q_2(S_1, S_2))
Subject to: λ_1 ≤ Σ_{(S_1,S_2)∈S} Pr[S_1, S_2] S_1 q_1(S_1, S_2)
            λ_2 ≤ Σ_{(S_1,S_2)∈S} Pr[S_1, S_2] S_2 q_2(S_1, S_2)
            q_1(S_1, S_2) + q_2(S_1, S_2) ≤ 1 ∀(S_1, S_2) ∈ S
            q_1(S_1, S_2) ≥ 0 , q_2(S_1, S_2) ≥ 0 ∀(S_1, S_2) ∈ S
Thus, for each λ ∈ Λ, there is an S-only algorithm α*(t) such that:

E{b̂_1(α*(t), S(t))} ≥ λ_1 , E{b̂_2(α*(t), S(t))} ≥ λ_2 , E{p̂(α*(t))} = ψ(λ_1, λ_2)
It can be shown that ψ(λ_1, λ_2) is the minimum time average expected power expenditure that can be achieved by any control policy that stabilizes the system (including policies that are not S-only) (21). Further, ψ(λ_1, λ_2) is continuous, convex, and entrywise non-decreasing.
Now assume that λ = (λ_1, λ_2) is interior to Λ, so that (λ_1 + ε, λ_2 + ε) ∈ Λ for all ε such that 0 ≤ ε ≤ ε_max(λ). It follows that whenever 0 ≤ ε ≤ ε_max(λ), there exists an S-only algorithm α*(t) such that:

E{b̂_1(α*(t), S(t))} ≥ λ_1 + ε   (3.22)
E{b̂_2(α*(t), S(t))} ≥ λ_2 + ε   (3.23)
E{p̂(α*(t))} = ψ(λ_1 + ε, λ_2 + ε)   (3.24)
3.2.1 DRIFT-PLUS-PENALTY
Define the same Lyapunov function L(Q(t)) as in (3.11), and let Δ(Q(t)) represent the conditional Lyapunov drift for slot t. While taking actions to minimize a bound on Δ(Q(t)) every slot t would stabilize the system, the resulting average power expenditure might be unnecessarily large. For example, suppose the rate vector is (λ_1, λ_2) = (0, 0.4), and recall that Pr[S_2(t) = 2] = 0.3. Then the drift-minimizing algorithm of the previous section would transmit over channel 2 whenever the queue is not empty and S_2(t) ∈ {1, 2}. In particular, it would sometimes use "inefficient" transmissions when S_2(t) = 1, which spend one unit of power but only deliver 1 packet. However, if we only transmit when S_2(t) = 2 and when the number of packets in the queue is at least 2, it can be shown that the system is still stable, but power expenditure is reduced to its minimum of λ_2/2 = 0.2 units/slot.

Instead of taking a control action to minimize a bound on Δ(Q(t)), we minimize a bound on the following drift-plus-penalty expression:

Δ(Q(t)) + V E{p(t) | Q(t)}

where V ≥ 0 is a parameter that represents an "importance weight" on how much we emphasize power minimization. Such a control decision can be motivated as follows: We want to make Δ(Q(t)) small to push queue backlog towards a lower congestion state, but we also want to make E{p(t) | Q(t)} small so that we do not incur a large power expenditure. We thus decide according to the above weighted sum. We now show that this intuitive algorithm leads to a provable power-backlog tradeoff: Average power can be pushed arbitrarily close to ψ(λ_1, λ_2) by using a large value of V, at the expense of incurring an average queue backlog that is O(V).
We have already computed a bound on Δ(Q(t)) in (3.16), and so adding V E{p(t) | Q(t)} to both sides of (3.16) yields a bound on the drift-plus-penalty:

Δ(Q(t)) + V E{p(t) | Q(t)} ≤ B + V E{p̂(α(t)) | Q(t)} + Σ_{i=1}^{2} Q_i(t)λ_i − E{ Σ_{i=1}^{2} Q_i(t) b̂_i(α(t), S(t)) | Q(t) }   (3.25)
where we have used the fact that p(t) = p̂(α(t)). The drift-plus-penalty algorithm then observes (Q_1(t), Q_2(t)) and (S_1(t), S_2(t)) every slot t and chooses an action α(t) to minimize the right-hand-side of the above inequality. Again, using the concept of opportunistically minimizing an expectation, this is accomplished by greedily minimizing:

value = V p̂(α(t)) − Σ_{i=1}^{2} Q_i(t) b̂_i(α(t), S(t))

We thus compare the following values and choose the action corresponding to the smallest (breaking ties arbitrarily):
• value[1] = V − Q_1(t)S_1(t) if α(t) = "Transmit over channel 1."
• value[2] = V − Q_2(t)S_2(t) if α(t) = "Transmit over channel 2."
• value[Idle] = 0 if α(t) = "Idle."
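The three-way comparison above is a one-liner in code. This sketch returns the queue index to serve, or `None` to idle, with ties broken arbitrarily (here, by dictionary iteration order):

```python
def dpp_decision(Q, S, V):
    """Drift-plus-penalty rule for the two-queue example: pick the option
    with the smallest value, where value = V*power - sum_i Q_i * b_i and
    power is 1 per transmission, 0 when idle."""
    values = {
        0: V - Q[0] * S[0],   # transmit over channel 1
        1: V - Q[1] * S[1],   # transmit over channel 2
        None: 0,              # idle
    }
    # min() breaks ties by iteration order, i.e. "arbitrarily" as in the text.
    return min(values, key=values.get)
```

Note that for V = 0 this reduces to the max-weight rule of Section 3.1.3, while larger V makes the algorithm increasingly reluctant to spend power on low-weight transmissions.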
3.2.2 ANALYSIS OF THE DRIFT-PLUS-PENALTY ALGORITHM
Because our decisions α(t) minimize the right-hand-side of the drift-plus-penalty inequality (3.25) on every slot t (given the observed Q(t)), we have:

Δ(Q(t)) + V E{p(t) | Q(t)} ≤ B + V E{p̂(α*(t)) | Q(t)} + Σ_{i=1}^{2} Q_i(t)λ_i − E{ Σ_{i=1}^{2} Q_i(t) b̂_i(α*(t), S(t)) | Q(t) }   (3.26)
(3.26)
where α
∗
(t ) is any other (possibly randomized) transmission decision that can be made on slot t .
Now assume that λ is interior to , and ﬁx any value such that 0 ≤ ≤
max
(λ). Plugging the
Sonly algorithm (3.22)(3.24) into the righthandside of the above inequality and noting that
this policy makes decisions independent of queue backlog yields:
(Q(t )) +VE{p(t )Q(t )} ≤ B +V(λ
1
+, λ
2
+) +
2
¸
i=1
Q
i
(t )λ
i
−
2
¸
i=1
Q
i
(t )(λ
i
+)
= B +V(λ
1
+, λ
2
+) −
2
¸
i=1
Q
i
(t ) (3.27)
Taking expectations of the above inequality and using the law of iterated expectations as before yields:

E{L(Q(t+1))} − E{L(Q(t))} + V E{p(t)} ≤ B + V ψ(λ_1 + ε, λ_2 + ε) − ε Σ_{i=1}^{2} E{Q_i(t)}

Summing the above over t ∈ {0, 1, ..., T−1} for some positive integer T yields:

E{L(Q(T))} − E{L(Q(0))} + V Σ_{t=0}^{T−1} E{p(t)} ≤ BT + VT ψ(λ_1 + ε, λ_2 + ε) − ε Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Q_i(t)}   (3.28)
Rearranging terms in the above and neglecting non-negative quantities where appropriate yields the following two inequalities:

(1/T) Σ_{t=0}^{T−1} E{p(t)} ≤ ψ(λ_1 + ε, λ_2 + ε) + B/V + E{L(Q(0))}/(VT)

(1/T) Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Q_i(t)} ≤ [B + V(ψ(λ_1 + ε, λ_2 + ε) − (1/T) Σ_{t=0}^{T−1} E{p(t)})]/ε + E{L(Q(0))}/(εT)
where the first inequality follows by dividing (3.28) by VT and the second follows by dividing (3.28) by εT. Taking limits as T → ∞ shows that:¹

p̄ ≜ lim_{T→∞} (1/T) Σ_{t=0}^{T−1} E{p(t)} ≤ ψ(λ_1 + ε, λ_2 + ε) + B/V   (3.29)

Q̄_1 + Q̄_2 ≜ lim_{T→∞} (1/T) Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Q_i(t)} ≤ [B + V(ψ(λ_1 + ε, λ_2 + ε) − p̄)]/ε   (3.30)
3.2.3 OPTIMIZING THE BOUNDS
The bounds (3.29) and (3.30) hold for any ε that satisfies 0 ≤ ε ≤ ε_max(λ), and hence they can be optimized separately. Plugging ε = ε_max(λ) into (3.30) shows that both queues are strongly stable. Using ε = 0 in (3.29) thus yields:

ψ(λ_1, λ_2) ≤ p̄ ≤ ψ(λ_1, λ_2) + B/V   (3.31)
¹ In this simple example, the system evolves according to a countably infinite state space Discrete Time Markov Chain (DTMC), and it can be shown that the limits in (3.29) and (3.30) are well defined.
where the first inequality follows because our algorithm stabilizes the network and thus cannot yield time average expected power lower than ψ(λ_1, λ_2), the infimum time average expected power required for stability of any algorithm.
Because p̄ ≥ ψ(λ_1, λ_2), it can be shown that:

ψ(λ_1 + ε, λ_2 + ε) − p̄ ≤ ψ(λ_1 + ε, λ_2 + ε) − ψ(λ_1, λ_2) ≤ 2ε

where the final inequality holds because it requires at most one unit of energy to support each new packet, and so increasing the total input rate from λ_1 + λ_2 to λ_1 + λ_2 + 2ε increases the minimum required average power by at most 2ε. Plugging the above into (3.30) yields:

Q̄_1 + Q̄_2 ≤ B/ε + 2V

The above holds for all ε that satisfy 0 ≤ ε ≤ ε_max(λ), and so plugging in ε = ε_max(λ) yields:

Q̄_1 + Q̄_2 ≤ B/ε_max(λ) + 2V   (3.32)
The performance bounds (3.31) and (3.32) demonstrate an [O(1/V), O(V)] power-backlog tradeoff: We can use an arbitrarily large V to make B/V arbitrarily small, so that (3.31) implies the time average power p̄ is arbitrarily close to the optimum ψ(λ_1, λ_2). This comes with a tradeoff: The average queue backlog bound in (3.32) is O(V).
3.2.4 SIMULATIONS OF THE DRIFT-PLUS-PENALTY ALGORITHM
Consider the previous example of Bernoulli arrivals with λ_1 = 0.3, λ_2 = 0.7, ε_max(λ) = 0.12, B = 1.45, which corresponds to point Y in Fig. 3.1(b). Then the bounds (3.31)–(3.32) become:

p̄ ≤ ψ(λ_1, λ_2) + 1.45/V   (3.33)

Q̄_1 + Q̄_2 ≤ 1.45/0.12 + 2V   (3.34)
Figs. 3.3 and 3.4 plot simulations for this system together with the above power and backlog bounds. Each simulated data point represents a simulation over 2 × 10^6 slots using a particular value of V. Values of V in the range 0 to 100 are shown. It is clear from the figures that average power converges to the optimal p* = 0.7 as V increases, while average backlog increases linearly in V.
Performance can be significantly improved by noting that the drift-plus-penalty algorithm given in Section 3.2.1 never transmits from queue 1 unless Q_1(t) ≥ V (else, value[1] would be positive). Hence, Q_1(t) ≥ Q_1^{place} ≜ max[V − 1, 0] for all slots t ≥ 0, provided that this holds at t = 0. Similarly, the algorithm never transmits from queue 2 unless Q_2(t) ≥ V/2, and so Q_2(t) ≥ Q_2^{place} ≜ max[V/2 − 2, 0] for all slots t ≥ 0, provided this holds at t = 0. It follows that we can stack the queues with fake packets (called placeholder packets) that never get transmitted, as described in more detail in Section 4.8 of the next chapter. This placeholder technique yields the same power guarantee (3.33), but it has a significantly improved queue backlog bound given by:

(with placeholders)  Q̄_1 + Q̄_2 ≤ 1.45/0.12 + 2V − max[V − 1, 0] − max[V/2 − 2, 0]

Thus, the average queue bound under the placeholder technique grows like 0.5V, rather than 2V as suggested in (3.34), a dramatic savings when V is large. Simulations of the placeholder technique are also shown in Figs. 3.3 and 3.4. The queue backlog improvements due to placeholders are quite significant (Fig. 3.4), with no noticeable difference in power expenditure (Fig. 3.3). Indeed, the simulated power expenditure curves for the cases with and without placeholders are indistinguishable in Fig. 3.3. A plot of queue values over the first 3000 slots is given in Chapter 4, Fig. 4.2.

Figure 3.3: Average power versus V with (λ_1, λ_2) = (0.3, 0.7). The plot shows the upper bound, the optimal value p*, and simulations with and without placeholders (indistinguishable).

Figure 3.4: Average backlog versus V with (λ_1, λ_2) = (0.3, 0.7). The plot shows the bounds and simulations with and without placeholders.
3.3 GENERALIZATIONS
The reader can easily see that the analysis in this chapter, which considers an example system of 2 queues, can be repeated for a larger system of K queues. Indeed, in that case the "min drift-plus-penalty" algorithm generalizes to choosing α(t) to maximize Σ_{k=1}^{K} Q_k(t) b̂_k(α(t), S(t)) − V p̂(α(t)). This holds for systems with more general channel states S(t), more general resource allocation decisions α(t), and for arbitrary rate functions b̂_k(α(t), S(t)) and "penalty functions" p̂(α(t)). In particular:
• The vector S(t) might have an infinite number of possible outcomes (rather than just 6 outcomes).
• The decision α(t) might represent one of an infinite number of possible power allocation options (rather than just one of three options). Alternatively, α(t) might represent one of an infinite number of more sophisticated physical layer actions that can take place on slot t (such as modulation, coding, beamforming, etc.).
• The rate function b̂_k(α(t), S(t)) can be any function that maps a resource allocation decision α(t) and a channel state vector S(t) into a transmission rate (and does not need to have the structure (3.2)).
• The "penalty" function p̂(α(t)) does not have to represent power, and it can be any general function of α(t).
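For a finite action set, the generalized rule above can be sketched as a single generic function: the caller supplies the rate function b̂_k and the penalty function p̂ (both arbitrary, as just noted), and the rule simply scores every action. The helper names here are illustrative, not from the text.

```python
def dpp_generalized(Q, S, actions, b_hat, p_hat, V):
    """Generalized min drift-plus-penalty over a finite action set:
    choose the action a maximizing sum_k Q_k * b_hat(k, a, S) - V * p_hat(a)."""
    def score(a):
        return sum(Q[k] * b_hat(k, a, S) for k in range(len(Q))) - V * p_hat(a)
    return max(actions, key=score)
```

With actions {idle, serve queue 1, serve queue 2}, b̂_k(a, S) = S_k when a serves queue k (else 0), and p̂ = 1 per transmission, this recovers exactly the two-queue algorithm of Section 3.2.1.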
The next chapter presents the general theory. It develops an important concept of virtual queues to ensure general time average equality and inequality constraints are satisfied. It also considers variable-V algorithms that achieve the exact minimum average penalty subject to mean rate stability (which typically incurs infinite average backlog). Finally, it shows how to analyze systems with non-i.i.d. and non-ergodic arrival and channel processes.
CHAPTER 4

Optimizing Time Averages
This chapter considers the problem (1.1)–(1.5), which seeks to minimize the time average of a network attribute subject to additional time average constraints. We first develop the main results of Lyapunov drift and Lyapunov optimization theory.
4.1 LYAPUNOV DRIFT AND LYAPUNOV OPTIMIZATION
Consider a system of N queues, and let Θ(t) = (Θ_1(t), ..., Θ_N(t)) be the queue backlog vector. The reason we use notation Θ(t) to represent a queue vector, instead of Q(t), is that in later sections we define Θ(t) ≜ [Q(t), Z(t), H(t)], where Q(t) is a vector of actual queues in the network and Z(t), H(t) are suitably chosen virtual queues. Assume the Θ(t) vector evolves over slots t ∈ {0, 1, 2, ...} according to some probability law. The components Θ_n(t) are real numbers and can possibly be negative. Allowing Θ_n(t) to take negative values is often useful for the virtual queues that are defined later.

As a scalar measure of the "size" of the vector Θ(t), define a quadratic Lyapunov function L(Θ(t)) as follows:

L(Θ(t)) ≜ (1/2) Σ_{n=1}^{N} w_n Θ_n(t)²   (4.1)

where {w_n}_{n=1}^{N} are a collection of positive weights. We typically use w_n = 1 for all n, as in (3.11) of Chapter 3, although different weights are often useful to allow queues to be treated differently. This function L(Θ(t)) is always non-negative, and it is equal to zero if and only if all components of Θ(t) are zero. Define the one-slot conditional Lyapunov drift Δ(Θ(t)) as follows:¹

Δ(Θ(t)) ≜ E{L(Θ(t+1)) − L(Θ(t)) | Θ(t)}   (4.2)

This drift is the expected change in the Lyapunov function over one slot, given that the current state in slot t is Θ(t).
4.1.1 LYAPUNOV DRIFT THEOREM
Theorem 4.1 (Lyapunov Drift) Consider the quadratic Lyapunov function (4.1), and assume E{L(Θ(0))} < ∞. Suppose there are constants B > 0, ε ≥ 0 such that the following drift condition holds for all slots τ ∈ {0, 1, 2, ...} and all possible Θ(τ):

Δ(Θ(τ)) ≤ B − ε Σ_{n=1}^{N} |Θ_n(τ)|   (4.3)

Then:
a) If ε ≥ 0, then all queues Θ_n(t) are mean rate stable.
b) If ε > 0, then all queues are strongly stable and:

limsup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θ_n(τ)|} ≤ B/ε   (4.4)

¹ Strictly speaking, better notation would be Δ(Θ(t), t), as the drift may be due to a non-stationary policy. However, we use the simpler notation Δ(Θ(t)) as a formal representation of the right-hand-side of (4.2).
Proof. We first prove part (b). Taking expectations of (4.3) and using the law of iterated expectations yields:

E{L(Θ(τ+1))} − E{L(Θ(τ))} ≤ B − ε Σ_{n=1}^{N} E{|Θ_n(τ)|}

Summing the above over τ ∈ {0, 1, ..., t−1} for some slot t > 0 and using the law of telescoping sums yields:

E{L(Θ(t))} − E{L(Θ(0))} ≤ Bt − ε Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θ_n(τ)|}   (4.5)

Now assume that ε > 0. Dividing by εt, rearranging terms, and using the fact that E{L(Θ(t))} ≥ 0 yields:

(1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θ_n(τ)|} ≤ B/ε + E{L(Θ(0))}/(εt)   (4.6)
The above holds for all slots t > 0. Taking a limit as t → ∞ proves part (b).

To prove part (a), we have from (4.5) that for all slots t > 0:

E{L(Θ(t))} − E{L(Θ(0))} ≤ Bt

Using the definition of L(Θ(t)) yields:

(1/2) Σ_{n=1}^{N} w_n E{Θ_n(t)²} ≤ E{L(Θ(0))} + Bt

Therefore, for all n ∈ {1, ..., N}, we have:

E{Θ_n(t)²} ≤ 2E{L(Θ(0))}/w_n + 2Bt/w_n

However, because the variance of |Θ_n(t)| cannot be negative, we have E{Θ_n(t)²} ≥ E{|Θ_n(t)|}². Thus, for all slots t > 0, we have:

E{|Θ_n(t)|} ≤ √( 2E{L(Θ(0))}/w_n + 2Bt/w_n )   (4.7)

Dividing by t and taking a limit as t → ∞ proves that:

lim_{t→∞} E{|Θ_n(t)|}/t ≤ lim_{t→∞} √( 2E{L(Θ(0))}/(t² w_n) + 2B/(t w_n) ) = 0

Thus, all queues Θ_n(t) are mean rate stable, proving part (a). □
The above theorem shows that if the drift condition (4.3) holds with ε ≥ 0, so that Δ(Θ(t)) ≤ B, then all queues are mean rate stable. Further, if ε > 0, then all queues are strongly stable with time average expected queue backlog bounded by B/ε. We note that the proof reveals further detailed information concerning expected queue backlog for all slots t > 0, showing how the effect of the initial condition Θ(0) decays over time (see (4.6) and (4.7)).
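Theorem 4.1(b) is straightforward to sanity-check numerically. For a single queue with update Q(t+1) = max[Q(t) − 1, 0] + A(t) and A(t) ~ Bernoulli(λ), the drift argument of Chapter 3 gives a condition of the form (4.3) with B = (λ + 1)/2 and ε = 1 − λ (these constants are derived here, not taken from the text), so the time average backlog should not exceed B/ε:

```python
import random

def simulate_single_queue(lam=0.5, T=100_000, seed=1):
    """Single queue Q(t+1) = max[Q(t) - 1, 0] + A(t), A(t) ~ Bernoulli(lam).
    The drift condition (4.3) holds with B = (lam + 1)/2 and eps = 1 - lam,
    so Theorem 4.1(b) predicts a time average backlog of at most B/eps."""
    random.seed(seed)
    Q, total = 0, 0
    for _ in range(T):
        Q = max(Q - 1, 0) + (1 if random.random() < lam else 0)
        total += Q
    return total / T

avg = simulate_single_queue()
bound = (0.5 + 1) / 2 / (1 - 0.5)   # B/eps = 0.75/0.5 = 1.5
```

As with the two-queue example of Chapter 3, the empirical average (roughly 0.5 here) sits comfortably below the bound, since the drift inequality compares against a worst case.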
4.1.2 LYAPUNOV OPTIMIZATION THEOREM
Suppose that, in addition to the queues Θ(t) that we want to stabilize, we have an associated stochastic "penalty" process y(t) whose time average we want to make less than (or close to) some target value y*. The process y(t) can represent penalties incurred by control actions on slot t, such as power expenditures, packet drops, etc. Assume the expected penalty is lower bounded by a finite (possibly negative) value y_min, so that for all t and all possible control actions, we have:

E{y(t)} ≥ y_min   (4.8)
Theorem 4.2 (Lyapunov Optimization) Suppose L(Θ(t)) and y_min are defined by (4.1) and (4.8), and that E{L(Θ(0))} < ∞. Suppose there are constants B ≥ 0, V ≥ 0, ε ≥ 0, and y* such that for all slots τ ∈ {0, 1, 2, ...} and all possible values of Θ(τ), we have:

Δ(Θ(τ)) + V E{y(τ) | Θ(τ)} ≤ B + Vy* − ε Σ_{n=1}^{N} |Θ_n(τ)|   (4.9)

Then all queues Θ_n(t) are mean rate stable. Further, if V > 0 and ε > 0, then time average expected penalty and queue backlog satisfy:

limsup_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{y(τ)} ≤ y* + B/V   (4.10)

limsup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θ_n(τ)|} ≤ [B + V(y* − y_min)]/ε   (4.11)
Finally, if V = 0, then (4.11) still holds, and if ε = 0, then (4.10) still holds.
Proof. Fix any slot τ. Because (4.9) holds for this slot, we can take expectations of both sides and use the law of iterated expectations to yield:

E{L(Θ(τ+1))} − E{L(Θ(τ))} + V E{y(τ)} ≤ B + Vy* − ε Σ_{n=1}^{N} E{|Θ_n(τ)|}

Summing over τ ∈ {0, 1, ..., t−1} for some t > 0 and using the law of telescoping sums yields:

E{L(Θ(t))} − E{L(Θ(0))} + V Σ_{τ=0}^{t−1} E{y(τ)} ≤ (B + Vy*)t − ε Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θ_n(τ)|}   (4.12)
Rearranging terms and neglecting non-negative terms when appropriate, it is easy to show that the above inequality directly implies the following two inequalities for all t > 0:

(1/t) Σ_{τ=0}^{t−1} E{y(τ)} ≤ y* + B/V + E{L(Θ(0))}/(Vt)   (4.13)

(1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θ_n(τ)|} ≤ [B + V(y* − y_min)]/ε + E{L(Θ(0))}/(εt)   (4.14)

where (4.13) follows by dividing (4.12) by Vt, and (4.14) follows by dividing (4.12) by εt. Taking limits of the above as t → ∞ proves (4.10) and (4.11).

Rearranging (4.12) also yields:

E{L(Θ(t))} ≤ E{L(Θ(0))} + (B + V(y* − y_min))t

from which mean rate stability follows by an argument similar to that given in the proof of Theorem 4.1. □
Theorem 4.2 can be understood as follows: If for any parameter $V > 0$, we can design a control algorithm to ensure the drift condition (4.9) is satisfied on every slot $\tau$, then the time average expected penalty satisfies (4.10) and hence is either less than the target value $y^*$, or differs from $y^*$ by no more than a "fudge factor" $B/V$, which can be made arbitrarily small as $V$ is increased. However, the time average queue backlog bound increases linearly in the $V$ parameter, as shown by (4.11). This presents a performance-backlog tradeoff of $[O(1/V), O(V)]$. Because Little's Theorem tells us that average queue backlog is proportional to average delay (129), we often call this a performance-delay tradeoff. The proof reveals further details concerning the effect of the initial condition $\Theta(0)$ on time average expectations at any slot $t$ (see (4.13) and (4.14)).
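To make the $[O(1/V), O(V)]$ tradeoff concrete, the right-hand sides of (4.10) and (4.11) can be computed directly. The Python sketch below, using hypothetical values for $B$, $\epsilon$, $y^*$, and $y_{\min}$, shows the penalty gap $B/V$ shrinking while the backlog bound grows as $V$ increases:

```python
def tradeoff_bounds(B, V, eps, y_star, y_min):
    """Evaluate the right-hand sides of (4.10) and (4.11)."""
    penalty_gap = B / V                                # O(1/V) penalty gap
    backlog_bound = (B + V * (y_star - y_min)) / eps   # O(V) backlog bound
    return penalty_gap, backlog_bound

# Hypothetical constants: B = 10, eps = 0.5, y* = 1.0, y_min = 0.0
for V in [1.0, 10.0, 100.0]:
    print(V, tradeoff_bounds(10.0, V, 0.5, 1.0, 0.0))
```

Increasing $V$ by a factor of 10 here cuts the penalty gap by the same factor while inflating the backlog bound, exactly the tradeoff described above.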
This result suggests the following control strategy: Every slot $\tau$, observe the current $\Theta(\tau)$ values and take a control action that, subject to the known $\Theta(\tau)$, greedily minimizes the drift-plus-penalty expression on the left-hand side of the desired drift inequality (4.9):
$$\Delta(\Theta(\tau)) + V\mathbb{E}\{y(\tau)\mid\Theta(\tau)\} \qquad (4.15)$$
It follows that if on every slot $\tau$ there exists a particular control action that satisfies the drift requirement (4.9), then the drift-plus-penalty minimizing policy must also satisfy this drift requirement.
For intuition, note that taking an action on slot $\tau$ to minimize the drift $\Delta(\Theta(\tau))$ alone would tend to push queues towards a lower congestion state, but it may incur a large penalty $y(\tau)$. Thus, we minimize a weighted sum of drift and penalty, where the penalty is scaled by an "importance" weight $V$ representing how much we emphasize penalty minimization. Using $V = 0$ corresponds to minimizing the drift $\Delta(\Theta(\tau))$ alone, which reduces to the Tassiulas-Ephremides technique for network stability in (7)(8). While this does not provide any guarantees on the resulting time average penalty $\overline{y}(t)$ (as the bound (4.10) becomes infinite for $V = 0$), it still ensures strong stability by (4.11). The case $V > 0$ includes a weighted penalty term in the greedy minimization, and corresponds to our technique for joint stability and performance optimization, developed for utility-optimal flow control in (17)(18) and used for average power optimization in (20)(21) and for problems similar to the types (1.1)-(1.5) and (1.6)-(1.11) in (22).
4.1.3 PROBABILITY 1 CONVERGENCE
Here we present a version of the Lyapunov optimization theorem that treats probability 1 convergence of sample path time averages, rather than time average expectations. We have the following preliminary lemma, related to the Kolmogorov law of large numbers:
Lemma 4.3 Let $X(t)$ be a random process defined over $t \in \{0, 1, 2, \ldots\}$, and suppose that the following hold:

• $\mathbb{E}\{X(t)^2\}$ is finite for all $t \in \{0, 1, 2, \ldots\}$ and satisfies:
$$\sum_{t=1}^{\infty}\frac{\mathbb{E}\{X(t)^2\}}{t^2} < \infty$$

• There is a real-valued constant $\beta$ such that for all $t \in \{1, 2, 3, \ldots\}$ and all possible $X(0), \ldots, X(t-1)$, the conditional expectation satisfies:
$$\mathbb{E}\{X(t)\mid X(t-1), X(t-2), \ldots, X(0)\} \le \beta$$

Then:
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}X(\tau) \le \beta \quad \text{(w.p.1)}$$
where “(w.p.1)” stands for “with probability 1.”
A proof of this lemma is given in (138) as a simple application of the Kolmogorov law of large
numbers for martingale differences. See (139)(140)(130)(141) for background on martingales and a
statement and proof of the Kolmogorov law of large numbers. The lemma is used in (138) to prove
the probability 1 version of the Lyapunov optimization theorem given below.
Let $\Theta(t)$ be a vector of queues and $y(t)$ a penalty process, as before. Rather than defining a drift that conditions on $\Theta(t)$, we must condition on the full history $H(t)$, which includes values of $\Theta(\tau)$ for $\tau \in \{0, \ldots, t\}$ and values of $y(\tau)$ for $\tau \in \{0, \ldots, t-1\}$. Specifically, for integers $t \ge 0$ define:
$$H(t) \triangleq \{\Theta(0), \Theta(1), \ldots, \Theta(t), y(0), y(1), \ldots, y(t-1)\}$$
Define $\Delta(t, H(t))$ by:
$$\Delta(t, H(t)) \triangleq \mathbb{E}\{L(\Theta(t+1)) - L(\Theta(t)) \mid H(t)\}$$
Assume that:

• The penalty process $y(t)$ is deterministically lower bounded by a (possibly negative) constant $y_{\min}$, so that:
$$y(t) \ge y_{\min} \quad \forall t \ \text{(w.p.1)} \qquad (4.16)$$

• The second moments $\mathbb{E}\{y(t)^2\}$ are finite for all $t \in \{0, 1, 2, \ldots\}$, and:
$$\sum_{t=1}^{\infty}\frac{\mathbb{E}\{y(t)^2\}}{t^2} < \infty \qquad (4.17)$$

• There is a finite constant $D > 0$ such that for all $n \in \{1, \ldots, N\}$, all $t$, and all possible $H(t)$, we have:
$$\mathbb{E}\{(\Theta_n(t+1) - \Theta_n(t))^4 \mid H(t)\} \le D \qquad (4.18)$$
so that conditional fourth moments of queue changes are uniformly bounded.
Theorem 4.4 (Lyapunov Optimization with Probability 1 Convergence) Define $L(\Theta(t))$ by (4.1), assume that $\Theta(0)$ is finite with probability 1, and suppose that assumptions (4.16)-(4.18) hold. Suppose there are constants $B \ge 0$, $V > 0$, $\epsilon > 0$, and $y^*$ such that for all slots $\tau \in \{0, 1, 2, \ldots\}$ and all possible $H(\tau)$, we have:
$$\Delta(\tau, H(\tau)) + V\mathbb{E}\{y(\tau)\mid H(\tau)\} \le B + Vy^* - \epsilon\sum_{n=1}^{N}\Theta_n(\tau)$$
Then all queues $\Theta_n(t)$ are rate stable, and:
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}y(\tau) \le y^* + \frac{B}{V} \quad \text{(w.p.1)} \qquad (4.19)$$
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{n=1}^{N}\Theta_n(\tau) \le \frac{B + V(y^* - y_{\min})}{\epsilon} \quad \text{(w.p.1)} \qquad (4.20)$$
Further, if these same assumptions hold, and if there is a value $y'$ such that the following additional inequality also holds for all $\tau$ and all possible $H(\tau)$:
$$\Delta(\tau, H(\tau)) + V\mathbb{E}\{y(\tau)\mid H(\tau)\} \le B + Vy'$$
Then:
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}y(\tau) \le y' + B/V \quad \text{(w.p.1)} \qquad (4.21)$$
Proof. Fix $\Theta(0)$ as a given finite initial condition. Define the process $X(t)$ for $t \in \{0, 1, 2, \ldots\}$ as follows:
$$X(t) \triangleq L(\Theta(t+1)) - L(\Theta(t)) + Vy(t) - B - Vy^* + \epsilon\sum_{n=1}^{N}\Theta_n(t)$$
The conditions on $y(t)$ and $\Theta(t)$ are shown in (138) to ensure that the queues $\Theta_n(t)$ are rate stable, that $\mathbb{E}\{X(t)^2\}$ is finite for all $t$, and that for all $t > 0$ and all possible values of $X(t-1), \ldots, X(0)$:
$$\sum_{t=1}^{\infty}\frac{\mathbb{E}\{X(t)^2\}}{t^2} < \infty \ , \quad \mathbb{E}\{X(t)\mid X(t-1), X(t-2), \ldots, X(0)\} \le 0$$
Thus, we can apply Lemma 4.3 to $X(t)$ to yield:
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}X(\tau) \le 0 \quad \text{(w.p.1)} \qquad (4.22)$$
However, by definition of $X(t)$, we have for all $t > 0$:
$$\frac{1}{t}\sum_{\tau=0}^{t-1}X(\tau) = \frac{L(\Theta(t)) - L(\Theta(0))}{t} + \frac{1}{t}\sum_{\tau=0}^{t-1}\left[Vy(\tau) + \epsilon\sum_{n=1}^{N}\Theta_n(\tau)\right] - B - Vy^*$$
Rearranging terms in the above identity and neglecting non-negative terms where appropriate directly leads to the following two inequalities that hold for all $t > 0$:
$$\frac{1}{Vt}\sum_{\tau=0}^{t-1}X(\tau) \ge \frac{-L(\Theta(0))}{Vt} + \frac{1}{t}\sum_{\tau=0}^{t-1}y(\tau) - [B/V + y^*]$$
$$\frac{1}{\epsilon t}\sum_{\tau=0}^{t-1}X(\tau) \ge \frac{-L(\Theta(0))}{\epsilon t} + \frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{n=1}^{N}\Theta_n(\tau) - \frac{B + V(y^* - y_{\min})}{\epsilon}$$
Taking limits of the above two inequalities and using (4.22) proves the results (4.19)-(4.20). A similar argument proves (4.21). $\Box$
Conditioning on the history $H(t)$ is needed to prove Theorem 4.4 via Lemma 4.3. A policy that greedily minimizes $\Delta(t, H(t)) + V\mathbb{E}\{y(t)\mid H(t)\}$ every slot will also greedily minimize $\Delta(\Theta(t)) + V\mathbb{E}\{y(t)\mid\Theta(t)\}$. In this text, we focus primarily on time average expectations of the type (4.10) and (4.11), with the understanding that the same bounds can be shown to hold for time averages (with probability 1) if the additional assumptions (4.16)-(4.18) hold.
4.2 GENERAL SYSTEM MODEL

Figure 4.1: An illustration of a general $K$-queue network with attributes $y_l(t)$, $e_j(t)$. (The figure shows queues $Q_1(t), \ldots, Q_K(t)$ with arrivals $a_1(t), \ldots, a_K(t)$ and services $b_1(t), \ldots, b_K(t)$, attributes $y_l(t)$ for $l \in \{1, \ldots, L\}$ and $e_j(t)$ for $j \in \{1, \ldots, J\}$, random state $\omega(t)$, and control action $\alpha(t)$.)
Consider now a system with queue backlog vector $Q(t) = (Q_1(t), \ldots, Q_K(t))$, as shown in Fig. 4.1. Queue dynamics are given by:
$$Q_k(t+1) = \max[Q_k(t) - b_k(t), 0] + a_k(t) \qquad (4.23)$$
where $a(t) = (a_1(t), \ldots, a_K(t))$ and $b(t) = (b_1(t), \ldots, b_K(t))$ are general functions of a random event $\omega(t)$ and a control action $\alpha(t)$:
$$a_k(t) = \hat{a}_k(\alpha(t), \omega(t)) \ , \quad b_k(t) = \hat{b}_k(\alpha(t), \omega(t))$$
Every slot $t$, the network controller observes $\omega(t)$ and chooses an action $\alpha(t) \in \mathcal{A}_{\omega(t)}$. The set $\mathcal{A}_{\omega(t)}$ is the action space associated with event $\omega(t)$. In addition to affecting these arrival and service variables, $\alpha(t)$ and $\omega(t)$ also determine the attribute vectors $x(t)$, $y(t)$, $e(t)$ according to general functions $\hat{x}_m(\alpha, \omega)$, $\hat{y}_l(\alpha, \omega)$, $\hat{e}_j(\alpha, \omega)$, as described in Section 1.2.
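The dynamics (4.23) are straightforward to simulate. Below is a minimal Python sketch of one slot of the update for all $K$ queues; in an actual model the arrival and service values would come from the functions $\hat{a}_k$ and $\hat{b}_k$:

```python
def queue_update(Q, a, b):
    """One slot of (4.23): Q_k(t+1) = max[Q_k(t) - b_k(t), 0] + a_k(t)."""
    return [max(q - b_k, 0.0) + a_k for q, a_k, b_k in zip(Q, a, b)]
```

For example, `queue_update([3.0, 0.0], a=[1.0, 2.0], b=[5.0, 1.0])` returns `[1.0, 2.0]`: offered service beyond the current backlog is wasted, and new arrivals are added after service.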
We assume that $\omega(t)$ is a stationary process with a stationary probability distribution $\pi(\omega)$. Assume that $\omega(t)$ takes values in some sample space $\Omega$. If $\Omega$ is a finite or countably infinite set, then for each $\omega \in \Omega$, $\pi(\omega)$ represents a probability mass function associated with the stationary distribution, and:
$$\Pr[\omega(t) = \omega] = \pi(\omega) \quad \forall t \in \{0, 1, 2, \ldots\} \qquad (4.24)$$
If $\Omega$ is uncountably infinite, then we assume $\omega(t)$ is a random vector, and that $\pi(\omega)$ represents a probability density associated with the stationary distribution. The simplest model, which we mainly consider in this text, is the case when $\omega(t)$ is i.i.d. over slots $t$ with stationary probabilities $\pi(\omega)$.
4.2.1 BOUNDEDNESS ASSUMPTIONS
The arrival function $\hat{a}_k(\alpha, \omega)$ is assumed to be non-negative for all $\omega \in \Omega$ and all $\alpha \in \mathcal{A}_\omega$. The service function $\hat{b}_k(\cdot)$ and the attribute functions $\hat{x}_m(\cdot)$, $\hat{y}_l(\cdot)$, $\hat{e}_j(\cdot)$ can possibly take negative values. All of these functions are general (possibly non-convex and discontinuous). However, we assume that these functions, together with the stationary probabilities $\pi(\omega)$, satisfy the following boundedness properties: For all $t$ and all (possibly randomized) control decisions $\alpha(t) \in \mathcal{A}_{\omega(t)}$, we have:
$$\mathbb{E}\{\hat{a}_k(\alpha(t), \omega(t))^2\} \le \sigma^2 \quad \forall k \in \{1, \ldots, K\} \qquad (4.25)$$
$$\mathbb{E}\{\hat{b}_k(\alpha(t), \omega(t))^2\} \le \sigma^2 \quad \forall k \in \{1, \ldots, K\} \qquad (4.26)$$
$$\mathbb{E}\{\hat{x}_m(\alpha(t), \omega(t))^2\} \le \sigma^2 \quad \forall m \in \{1, \ldots, M\} \qquad (4.27)$$
$$\mathbb{E}\{\hat{y}_l(\alpha(t), \omega(t))^2\} \le \sigma^2 \quad \forall l \in \{1, \ldots, L\} \qquad (4.28)$$
$$\mathbb{E}\{\hat{e}_j(\alpha(t), \omega(t))^2\} \le \sigma^2 \quad \forall j \in \{1, \ldots, J\} \qquad (4.29)$$
for some finite constant $\sigma^2 > 0$. Further, for all $t$ and all actions $\alpha(t) \in \mathcal{A}_{\omega(t)}$, we require the expectation of $y_0(t)$ to be bounded by finite constants $y_{0,\min}$, $y_{0,\max}$:
$$y_{0,\min} \le \mathbb{E}\{\hat{y}_0(\alpha(t), \omega(t))\} \le y_{0,\max} \qquad (4.30)$$
4.3 OPTIMALITY VIA ω-ONLY POLICIES
For each $l \in \{0, 1, \ldots, L\}$, define $\overline{y}_l(t)$ as the time average expectation of $y_l(t)$ over the first $t$ slots under a particular control strategy:
$$\overline{y}_l(t) \triangleq \frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{y_l(\tau)\}$$
where the expectation is over the randomness of the $\omega(\tau)$ values and the random control actions. Define time average expectations $\overline{a}_k(t)$, $\overline{b}_k(t)$, $\overline{e}_j(t)$ similarly. Define $\overline{y}_l$ and $\overline{e}_j$ as the limiting values of $\overline{y}_l(t)$ and $\overline{e}_j(t)$, assuming temporarily that these limits are well defined. We desire a control policy that solves the following problem:
Minimize: $\overline{y}_0$
Subject to: 1) $\overline{y}_l \le 0 \quad \forall l \in \{1, \ldots, L\}$
2) $\overline{e}_j = 0 \quad \forall j \in \{1, \ldots, J\}$
3) Queues $Q_k(t)$ are mean rate stable $\forall k \in \{1, \ldots, K\}$
4) $\alpha(t) \in \mathcal{A}_{\omega(t)} \quad \forall t$
The above description of the problem is convenient, although we can state the problem more precisely, without assuming limits are well defined, as follows:
Minimize: $\limsup_{t\to\infty}\overline{y}_0(t)$ (4.31)
Subject to: 1) $\limsup_{t\to\infty}\overline{y}_l(t) \le 0 \quad \forall l \in \{1, \ldots, L\}$ (4.32)
2) $\lim_{t\to\infty}\overline{e}_j(t) = 0 \quad \forall j \in \{1, \ldots, J\}$ (4.33)
3) Queues $Q_k(t)$ are mean rate stable $\forall k \in \{1, \ldots, K\}$ (4.34)
4) $\alpha(t) \in \mathcal{A}_{\omega(t)} \quad \forall t$ (4.35)
An example of such a problem is when we have a $K$-queue wireless network that must be stabilized subject to average power constraints $\overline{P}_l \le P_l^{av}$ for each node $l \in \{1, \ldots, L\}$, where $\overline{P}_l$ represents the time average power of node $l$, and $P_l^{av}$ represents a pre-specified average power constraint. Suppose the goal is to maximize the time average of the total admitted traffic. Then $y_0(t)$ is $-1$ times the admitted traffic on slot $t$. We also define $y_l(t) = P_l(t) - P_l^{av}$, the difference between the power expenditure of node $l$ on slot $t$ and its time average constraint, so that $\overline{y}_l \le 0$ corresponds to $\overline{P}_l \le P_l^{av}$. In this example, there are no time average equality constraints, and so $J = 0$. See also Section 4.6 and Exercises 2.11, 4.7-4.14 for more examples.
Consider now the special class of stationary and randomized policies that we call ω-only policies, which observe $\omega(t)$ for each slot $t$ and independently choose a control action $\alpha(t) \in \mathcal{A}_{\omega(t)}$ as a pure (possibly randomized) function of the observed $\omega(t)$. Let $\alpha^*(t)$ represent the decisions under such an ω-only policy over time $t \in \{0, 1, 2, \ldots\}$. Because $\omega(t)$ has the stationary distribution $\pi(\omega)$ for all $t$, the expectations of the arrival, service, and attribute values are the same for all $t$:
$$\mathbb{E}\{\hat{y}_l(\alpha^*(t), \omega(t))\} = y_l \quad \forall l \in \{0, 1, \ldots, L\}$$
$$\mathbb{E}\{\hat{e}_j(\alpha^*(t), \omega(t))\} = e_j \quad \forall j \in \{1, \ldots, J\}$$
$$\mathbb{E}\{\hat{a}_k(\alpha^*(t), \omega(t))\} = a_k \quad \forall k \in \{1, \ldots, K\}$$
$$\mathbb{E}\{\hat{b}_k(\alpha^*(t), \omega(t))\} = b_k \quad \forall k \in \{1, \ldots, K\}$$
for some quantities $y_l$, $e_j$, $a_k$, $b_k$. In the case when $\Omega$ is finite or countably infinite, the expectations above can be understood as weighted sums over all $\omega$ values, weighted by the stationary distribution $\pi(\omega)$. Specifically:
$$\mathbb{E}\{\hat{y}_l(\alpha^*(t), \omega(t))\} = \sum_{\omega\in\Omega}\pi(\omega)\,\mathbb{E}\{\hat{y}_l(\alpha^*(t), \omega)\mid\omega(t) = \omega\}$$
The above expectations $y_l$, $e_j$, $a_k$, $b_k$ are finite under any ω-only policy because of the boundedness assumptions (4.25)-(4.30). In addition to assuming $\omega(t)$ is a stationary process, we make the following mild "law of large numbers" assumption concerning time averages (not time average expectations): Under any ω-only policy $\alpha^*(t)$ that yields expectations $y_l$, $e_j$, $a_k$, $b_k$ on every slot $t$, the infinite horizon time averages of $\hat{y}_l(\alpha^*(t), \omega(t))$, $\hat{e}_j(\alpha^*(t), \omega(t))$, $\hat{a}_k(\alpha^*(t), \omega(t))$, $\hat{b}_k(\alpha^*(t), \omega(t))$ are equal to $y_l$, $e_j$, $a_k$, $b_k$ with probability 1. For example:
$$\lim_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\hat{y}_l(\alpha^*(\tau), \omega(\tau)) = y_l \quad \text{(w.p.1)}$$
where "(w.p.1)" means "with probability 1." This is a mild assumption that holds whenever $\omega(t)$ is i.i.d. over slots. This is because, by the law of large numbers, the resulting $\hat{y}_l(\alpha^*(t), \omega(t))$ process is i.i.d. over slots with finite mean $y_l$. However, this also holds for a large class of other stationary processes, including stationary processes defined over finite state irreducible Discrete Time Markov Chains (as considered in Section 4.9). It does not hold, for example, for degenerate stationary processes where $\omega(0)$ can take different values according to some probability distribution, but is then held fixed for all slots thereafter, so that $\omega(t) = \omega(0)$ for all $t$.
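As a quick illustration of this law of large numbers assumption, the Python sketch below draws $\omega(t)$ i.i.d. over $\{0, 1\}$ and checks that the sample time average of a hypothetical attribute function (standing in for $\hat{y}_l(\alpha^*(t), \omega(t))$ under a fixed ω-only policy) approaches its one-slot expectation of $0.5$:

```python
import random

def y_hat(alpha, omega):
    """Hypothetical attribute function, for illustration only."""
    return 2.0 if omega == 1 else -1.0

random.seed(0)  # fixed seed so the run is reproducible
T = 200_000
# omega(t) i.i.d. uniform over {0, 1}, so E{y_hat} = 0.5*2.0 + 0.5*(-1.0) = 0.5
time_avg = sum(y_hat(None, random.randint(0, 1)) for _ in range(T)) / T
```

For the degenerate process described above (draw $\omega(0)$ once and freeze it), the same time average would equal either $2.0$ or $-1.0$, not the expectation $0.5$.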
Under these assumptions, we say that the problem (4.31)-(4.35) is feasible if there exists a control policy that satisfies the constraints (4.32)-(4.35). Assuming feasibility, define $y_0^{opt}$ as the infimum value of the cost metric (4.31) over all control policies that satisfy the constraints (4.32)-(4.35). This infimum is finite by (4.30). We emphasize that $y_0^{opt}$ considers all possible control policies that choose $\alpha(t) \in \mathcal{A}_{\omega(t)}$ over slots $t$, not just ω-only policies. However, in Appendix 4.A, it is shown that $y_0^{opt}$ can be computed in terms of ω-only policies. Specifically, it is shown that the set of all possible limiting time average expectations of the variables $[(\overline{y}_l(t)), (\overline{e}_j(t)), (\overline{a}_k(t)), (\overline{b}_k(t))]$, considering all possible algorithms, is equal to the closure of the set of all one-slot averages $[(y_l), (e_j), (a_k), (b_k)]$ achievable under ω-only policies. Further, the next theorem shows that if the problem (4.31)-(4.35) is feasible, then the utility $y_0^{opt}$ and the constraints $y_l \le 0$, $e_j = 0$, $a_k \le b_k$ can be achieved arbitrarily closely by ω-only policies.
Theorem 4.5 (Optimality over ω-only Policies) Suppose the $\omega(t)$ process is stationary with distribution $\pi(\omega)$, and that the system satisfies the boundedness assumptions (4.25)-(4.30) and the law of large numbers assumption specified above. If the problem (4.31)-(4.35) is feasible, then for any $\delta > 0$ there is an ω-only policy $\alpha^*(t)$ that satisfies $\alpha^*(t) \in \mathcal{A}_{\omega(t)}$ for all $t$, and:
$$\mathbb{E}\{\hat{y}_0(\alpha^*(t), \omega(t))\} \le y_0^{opt} + \delta \qquad (4.36)$$
$$\mathbb{E}\{\hat{y}_l(\alpha^*(t), \omega(t))\} \le \delta \quad \forall l \in \{1, \ldots, L\} \qquad (4.37)$$
$$|\mathbb{E}\{\hat{e}_j(\alpha^*(t), \omega(t))\}| \le \delta \quad \forall j \in \{1, \ldots, J\} \qquad (4.38)$$
$$\mathbb{E}\{\hat{a}_k(\alpha^*(t), \omega(t))\} \le \mathbb{E}\{\hat{b}_k(\alpha^*(t), \omega(t))\} + \delta \quad \forall k \in \{1, \ldots, K\} \qquad (4.39)$$
Proof. See Appendix 4.A. $\Box$
The inequalities (4.36)-(4.39) are similar to those seen in Chapter 3, which related the existence of such randomized policies to the existence of linear programs that yield the desired time averages. The stationarity of $\omega(t)$ simplifies the proof of Theorem 4.5 but is not crucial to its result. Similar results are derived in (15)(21)(136) without the stationarity assumption, but under the additional assumption that $\omega(t)$ can take at most a finite (but arbitrarily large) number of values and has well defined time averages.
We have stated Theorem 4.5 in terms of arbitrarily small values $\delta > 0$. It may be of interest to note that for most practical systems, there exists an ω-only policy that satisfies all inequalities (4.36)-(4.39) with $\delta = 0$. Appendix 4.A shows that this holds whenever the set of all one-slot expectations achievable under ω-only policies is closed. Thus, one may prefer a more "aesthetically pleasing" version of Theorem 4.5 that assumes this additional mild closure property in order to remove the appearance of "$\delta$" in the theorem statement. We have presented the theorem in the above form because it is sufficient for our purposes. In particular, we do not require the closure property in order to apply the Lyapunov optimization techniques developed next.
4.4 VIRTUAL QUEUES
To solve the problem (4.31)-(4.35), we first transform all inequality and equality constraints into queue stability problems. Specifically, define virtual queues $Z_l(t)$ and $H_j(t)$ for each $l \in \{1, \ldots, L\}$ and $j \in \{1, \ldots, J\}$, with update equations:
$$Z_l(t+1) = \max[Z_l(t) + y_l(t), 0] \qquad (4.40)$$
$$H_j(t+1) = H_j(t) + e_j(t) \qquad (4.41)$$
The virtual queue $Z_l(t)$ is used to enforce the $\overline{y}_l \le 0$ constraint. Indeed, recall that if $Z_l(t)$ satisfies (4.40), then by our basic sample path properties in Chapter 2, we have for all $t > 0$:
$$\frac{Z_l(t)}{t} - \frac{Z_l(0)}{t} \ge \frac{1}{t}\sum_{\tau=0}^{t-1}y_l(\tau)$$
Taking expectations of the above and taking $t \to \infty$ shows:
$$\limsup_{t\to\infty}\frac{\mathbb{E}\{Z_l(t)\}}{t} \ge \limsup_{t\to\infty}\overline{y}_l(t)$$
where we recall that $\overline{y}_l(t)$ is the time average expectation of $y_l(\tau)$ over $\tau \in \{0, \ldots, t-1\}$. Thus, if $Z_l(t)$ is mean rate stable, the left-hand side of the above inequality is 0, and so:
$$\limsup_{t\to\infty}\overline{y}_l(t) \le 0$$
This means our desired time average constraint for $y_l(t)$ is satisfied. This turns the problem of satisfying a time average inequality constraint into a pure queue stability problem! This discussion is, of course, just a repeated derivation of Theorem 2.5 (as well as Exercise 2.11).
The virtual queue $H_j(t)$ is designed to turn the time average equality constraint $\overline{e}_j = 0$ into a pure queue stability problem. The $H_j(t)$ queue has a different structure, and can possibly be negative, because it enforces an equality constraint rather than an inequality constraint. It is easy to see by summing (4.41) that for any $t > 0$:
$$H_j(t) - H_j(0) = \sum_{\tau=0}^{t-1}e_j(\tau)$$
Taking expectations and dividing by $t$ yields:
$$\frac{\mathbb{E}\{H_j(t)\} - \mathbb{E}\{H_j(0)\}}{t} = \overline{e}_j(t) \qquad (4.42)$$
Therefore, if $H_j(t)$ is mean rate stable, then:²
$$\lim_{t\to\infty}\overline{e}_j(t) = 0$$
so that the desired equality constraint for $e_j(t)$ is satisfied.
It follows that if we can design a control algorithm that chooses $\alpha(t) \in \mathcal{A}_{\omega(t)}$ for all $t$, makes all actual queues $Q_k(t)$ and virtual queues $Z_l(t)$, $H_j(t)$ mean rate stable, and yields a time average expectation of $y_0(t)$ that is equal to our target $y_0^{opt}$, then we have solved the problem (4.31)-(4.35). This transforms the original problem into a problem of minimizing the time average of a cost function subject to queue stability. We assume throughout that initial conditions satisfy $Z_l(0) \ge 0$ for all $l \in \{1, \ldots, L\}$, $H_j(0) \in \mathbb{R}$ for all $j \in \{1, \ldots, J\}$, and that $\mathbb{E}\{Z_l(0)^2\} < \infty$ and $\mathbb{E}\{H_j(0)^2\} < \infty$ for all $l$ and $j$.
²Note by Jensen's inequality that $0 \le |\mathbb{E}\{H(t)\}| \le \mathbb{E}\{|H(t)|\}$, and so if $\mathbb{E}\{|H(t)|\}/t \to 0$, then $|\mathbb{E}\{H(t)\}|/t \to 0$.
4.5 THE MIN-DRIFT-PLUS-PENALTY ALGORITHM
Let $\Theta(t) \triangleq [Q(t), Z(t), H(t)]$ be a concatenated vector of all actual and virtual queues, with update equations (4.23), (4.40), (4.41). Define the Lyapunov function:
$$L(\Theta(t)) \triangleq \frac{1}{2}\sum_{k=1}^{K}Q_k(t)^2 + \frac{1}{2}\sum_{l=1}^{L}Z_l(t)^2 + \frac{1}{2}\sum_{j=1}^{J}H_j(t)^2 \qquad (4.43)$$
If there are no equality constraints, we have $J = 0$ and we remove the $H_j(t)$ queues. If there are no inequality constraints, then $L = 0$ and we remove the $Z_l(t)$ queues.
Lemma 4.6 Suppose $\omega(t)$ is i.i.d. over slots. Under any control algorithm, the drift-plus-penalty expression has the following upper bound for all $t$, all possible values of $\Theta(t)$, and all parameters $V \ge 0$:
$$\Delta(\Theta(t)) + V\mathbb{E}\{y_0(t)\mid\Theta(t)\} \le B + V\mathbb{E}\{y_0(t)\mid\Theta(t)\} + \sum_{k=1}^{K}Q_k(t)\mathbb{E}\{a_k(t) - b_k(t)\mid\Theta(t)\}$$
$$+ \sum_{l=1}^{L}Z_l(t)\mathbb{E}\{y_l(t)\mid\Theta(t)\} + \sum_{j=1}^{J}H_j(t)\mathbb{E}\{e_j(t)\mid\Theta(t)\} \qquad (4.44)$$
where $B$ is a positive constant that satisfies the following for all $t$:
$$B \ge \frac{1}{2}\sum_{k=1}^{K}\mathbb{E}\{a_k(t)^2 + b_k(t)^2 \mid \Theta(t)\} + \frac{1}{2}\sum_{l=1}^{L}\mathbb{E}\{y_l(t)^2\mid\Theta(t)\} + \frac{1}{2}\sum_{j=1}^{J}\mathbb{E}\{e_j(t)^2\mid\Theta(t)\} - \sum_{k=1}^{K}\mathbb{E}\{\tilde{b}_k(t)a_k(t)\mid\Theta(t)\} \qquad (4.45)$$
where we recall that $\tilde{b}_k(t) = \min[Q_k(t), b_k(t)]$. Such a constant $B$ exists because $\omega(t)$ is i.i.d. and the boundedness assumptions in Section 4.2.1 hold.
Proof. Squaring the queue update equation (4.23) and using the fact that $\max[q - b, 0]^2 \le (q - b)^2$ yields:
$$Q_k(t+1)^2 \le (Q_k(t) - b_k(t))^2 + a_k(t)^2 + 2\max[Q_k(t) - b_k(t), 0]\,a_k(t)$$
$$= (Q_k(t) - b_k(t))^2 + a_k(t)^2 + 2(Q_k(t) - \tilde{b}_k(t))a_k(t) \qquad (4.46)$$
Therefore:
$$\frac{Q_k(t+1)^2 - Q_k(t)^2}{2} \le \frac{a_k(t)^2 + b_k(t)^2}{2} - \tilde{b}_k(t)a_k(t) + Q_k(t)[a_k(t) - b_k(t)]$$
Similarly,
$$\frac{Z_l(t+1)^2 - Z_l(t)^2}{2} \le \frac{y_l(t)^2}{2} + Z_l(t)y_l(t) \qquad (4.47)$$
$$\frac{H_j(t+1)^2 - H_j(t)^2}{2} = \frac{e_j(t)^2}{2} + H_j(t)e_j(t)$$
Taking conditional expectations of the above three equations and summing over $k \in \{1, \ldots, K\}$, $l \in \{1, \ldots, L\}$, $j \in \{1, \ldots, J\}$ gives a bound on $\Delta(\Theta(t))$. Adding $V\mathbb{E}\{y_0(t)\mid\Theta(t)\}$ to both sides proves the result. $\Box$
Rather than directly minimize the expression $\Delta(\Theta(t)) + V\mathbb{E}\{y_0(t)\mid\Theta(t)\}$ every slot $t$, our strategy actually seeks to minimize the bound given in the right-hand side of (4.44). This is done via the framework of opportunistically minimizing a (conditional) expectation, as described in Section 1.8 (see also Exercise 4.5), and the resulting algorithm is given below.
Min Drift-Plus-Penalty Algorithm for solving (4.31)-(4.35): Every slot $t$, observe the current queue states $\Theta(t)$ and the random event $\omega(t)$, and make a control decision $\alpha(t) \in \mathcal{A}_{\omega(t)}$ as follows:
Minimize: $V\hat{y}_0(\alpha(t), \omega(t)) + \sum_{k=1}^{K}Q_k(t)[\hat{a}_k(\alpha(t), \omega(t)) - \hat{b}_k(\alpha(t), \omega(t))]$
$\qquad\qquad + \sum_{l=1}^{L}Z_l(t)\hat{y}_l(\alpha(t), \omega(t)) + \sum_{j=1}^{J}H_j(t)\hat{e}_j(\alpha(t), \omega(t))$ (4.48)
Subject to: $\alpha(t) \in \mathcal{A}_{\omega(t)}$ (4.49)
Then update the virtual queues $Z_l(t)$ and $H_j(t)$ according to (4.40) and (4.41), and the actual queues $Q_k(t)$ according to (4.23).
A remarkable property of this algorithm is that it does not need to know the probabilities $\pi(\omega)$. After observing $\omega(t)$, it seeks to minimize a (possibly non-linear, non-convex, and discontinuous) function of $\alpha$ over all $\alpha \in \mathcal{A}_{\omega(t)}$. Its complexity depends on the structure of the functions $\hat{a}_k(\cdot)$, $\hat{b}_k(\cdot)$, $\hat{y}_l(\cdot)$, $\hat{e}_j(\cdot)$. However, in the case when the set $\mathcal{A}_{\omega(t)}$ contains a finite (and small) number of possible control actions, the policy simply evaluates the function over each option and chooses the best one.
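In that finite-action case, the per-slot minimization (4.48)-(4.49) is an exhaustive search over $\mathcal{A}_{\omega(t)}$. The Python sketch below assumes the attribute functions are supplied as callables (hypothetical stand-ins for $\hat{y}_0$, $\hat{a}_k$, $\hat{b}_k$, $\hat{y}_l$, $\hat{e}_j$):

```python
def dpp_action(actions, omega, Q, Z, H, V, y0_hat, a_hat, b_hat, y_hat, e_hat):
    """Return the action in the finite set A_omega that minimizes (4.48)."""
    def weight(alpha):
        w = V * y0_hat(alpha, omega)                      # V * y0 penalty term
        w += sum(Qk * (ak - bk) for Qk, ak, bk
                 in zip(Q, a_hat(alpha, omega), b_hat(alpha, omega)))
        w += sum(Zl * yl for Zl, yl in zip(Z, y_hat(alpha, omega)))
        w += sum(Hj * ej for Hj, ej in zip(H, e_hat(alpha, omega)))
        return w
    return min(actions, key=weight)
```

For example, with a single queue, no virtual queues, and a binary action ("serve at unit power" vs. "idle"), a large backlog $Q_1(t)$ makes the service term dominate the $V$-weighted power cost, so the serve action is chosen; an empty queue makes idling cheaper.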
Before presenting the analysis, we note that the problem (4.48)-(4.49) may not have a well defined minimum when the set $\mathcal{A}_{\omega(t)}$ is infinite. However, rather than assuming our decisions obtain the exact minimum every slot (or come close to the infimum), we analyze the performance when our implementation comes within an additive constant of the infimum in the right-hand side of (4.44).
Definition 4.7 For a given constant $C \ge 0$, a C-additive approximation of the drift-plus-penalty algorithm is one that, every slot $t$ and given the current $\Theta(t)$, chooses a (possibly randomized) action $\alpha(t) \in \mathcal{A}_{\omega(t)}$ that yields a conditional expected value on the right-hand side of the drift expression (4.44) (given $\Theta(t)$) that is within a constant $C$ of the infimum over all possible control actions.
Definition 4.7 allows the deviation from the infimum to be in an expected sense, rather than a deterministic sense, which is useful in some applications. These C-additive approximations are also useful for implementations with out-of-date queue backlog information, as shown in Exercise 4.10, and for achieving maximum throughput in interference networks via approximation algorithms, as shown in Chapter 6.
Theorem 4.8 (Performance of Min Drift-Plus-Penalty Algorithm) Suppose that $\omega(t)$ is i.i.d. over slots with probabilities $\pi(\omega)$, the problem (4.31)-(4.35) is feasible, and that $\mathbb{E}\{L(\Theta(0))\} < \infty$. Fix a value $C \ge 0$. If we use a C-additive approximation of the algorithm every slot $t$, then:
a) The time average expected cost satisfies:
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{y_0(\tau)\} \le y_0^{opt} + \frac{B + C}{V} \qquad (4.50)$$
where $y_0^{opt}$ is the infimum time average cost achievable by any policy that meets the required constraints, and $B$ is defined in (4.45).
b) All queues $Q_k(t)$, $Z_l(t)$, $H_j(t)$ are mean rate stable, and all required constraints (4.32)-(4.35) are satisfied.
c) Suppose there are constants $\epsilon > 0$ and $\Phi(\epsilon)$ for which the Slater condition of Assumption A1 holds, stated below in (4.61)-(4.64). Then:
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K}\mathbb{E}\{Q_k(\tau)\} \le \frac{B + C + V[\Phi(\epsilon) - y_0^{opt}]}{\epsilon} \qquad (4.51)$$
where $[\Phi(\epsilon) - y_0^{opt}] \le y_{0,\max} - y_{0,\min}$, and $y_{0,\min}$, $y_{0,\max}$ are defined in (4.30).
We note that the bounds given in (4.50) and (4.51) are not just infinite horizon bounds: Inequalities (4.58) and (4.59) in the proof below show that these bounds hold for all time $t > 0$ in the case when all initial queue backlogs are zero, and that a "fudge factor" that decays like $O(1/t)$ must be included if initial queue backlogs are non-zero. The above theorem is for the case when $\omega(t)$ is i.i.d. over slots. The same algorithm can be shown to offer similar performance under more general ergodic $\omega(t)$ processes, as well as for non-ergodic processes, as discussed in Section 4.9.
Proof. (Theorem 4.8) Because, every slot $t$, our implementation comes within an additive constant $C$ of minimizing the right-hand side of the drift expression (4.44) over all $\alpha(t) \in \mathcal{A}_{\omega(t)}$, we have for each slot $t$:
$$\Delta(\Theta(t)) + V\mathbb{E}\{y_0(t)\mid\Theta(t)\} \le B + C + V\mathbb{E}\{y_0^*(t)\mid\Theta(t)\} + \sum_{l=1}^{L}Z_l(t)\mathbb{E}\{y_l^*(t)\mid\Theta(t)\}$$
$$+ \sum_{j=1}^{J}H_j(t)\mathbb{E}\{e_j^*(t)\mid\Theta(t)\} + \sum_{k=1}^{K}Q_k(t)\mathbb{E}\{a_k^*(t) - b_k^*(t)\mid\Theta(t)\} \qquad (4.52)$$
where $a_k^*(t)$, $b_k^*(t)$, $y_l^*(t)$, $e_j^*(t)$ are the resulting arrival, service, and attribute values under any alternative (possibly randomized) decision $\alpha^*(t) \in \mathcal{A}_{\omega(t)}$. Specifically, $a_k^*(t) \triangleq \hat{a}_k(\alpha^*(t), \omega(t))$, $b_k^*(t) \triangleq \hat{b}_k(\alpha^*(t), \omega(t))$, $y_l^*(t) \triangleq \hat{y}_l(\alpha^*(t), \omega(t))$, $e_j^*(t) \triangleq \hat{e}_j(\alpha^*(t), \omega(t))$.
Now fix $\delta > 0$, and consider the ω-only policy $\alpha^*(t)$ that yields (4.36)-(4.39). Because this is an ω-only policy, and $\omega(t)$ is i.i.d. over slots, the resulting values of $y_0^*(t)$, $a_k^*(t)$, $b_k^*(t)$, $e_j^*(t)$ are independent of the current queue backlogs $\Theta(t)$, and we have from (4.36)-(4.39):
$$\mathbb{E}\{y_0^*(t)\mid\Theta(t)\} = \mathbb{E}\{y_0^*(t)\} \le y_0^{opt} + \delta \qquad (4.53)$$
$$\mathbb{E}\{y_l^*(t)\mid\Theta(t)\} = \mathbb{E}\{y_l^*(t)\} \le \delta \quad \forall l \in \{1, \ldots, L\} \qquad (4.54)$$
$$|\mathbb{E}\{e_j^*(t)\mid\Theta(t)\}| = |\mathbb{E}\{e_j^*(t)\}| \le \delta \quad \forall j \in \{1, \ldots, J\} \qquad (4.55)$$
$$\mathbb{E}\{a_k^*(t) - b_k^*(t)\mid\Theta(t)\} = \mathbb{E}\{a_k^*(t) - b_k^*(t)\} \le \delta \quad \forall k \in \{1, \ldots, K\} \qquad (4.56)$$
Plugging these into the right-hand side of (4.52) and taking $\delta \to 0$ yields:
$$\Delta(\Theta(t)) + V\mathbb{E}\{y_0(t)\mid\Theta(t)\} \le B + C + Vy_0^{opt} \qquad (4.57)$$
This is in the exact form for application of the Lyapunov Optimization Theorem (Theorem 4.2). Hence, all queues are mean rate stable, and so all required time average constraints are satisfied, which proves part (b). Further, from the above drift expression, we have for any $t > 0$ (from (4.13) of Theorem 4.2, or simply from taking iterated expectations and telescoping sums):
$$\frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{y_0(\tau)\} \le y_0^{opt} + \frac{B + C}{V} + \frac{\mathbb{E}\{L(\Theta(0))\}}{Vt} \qquad (4.58)$$
which proves part (a) by taking a limsup as $t \to \infty$.
To prove part (c), assume Assumption A1 holds (stated below). Plugging the ω-only policy that yields (4.61)-(4.64) into the right-hand side of the drift bound (4.52) yields:
$$\Delta(\Theta(t)) + V\mathbb{E}\{y_0(t)\mid\Theta(t)\} \le B + C + V\Phi(\epsilon) - \epsilon\sum_{k=1}^{K}Q_k(t)$$
Taking iterated expectations, summing the telescoping series, and rearranging terms as usual yields:
$$\frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K}\mathbb{E}\{Q_k(\tau)\} \le \frac{B + C + V\left[\Phi(\epsilon) - \frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{y_0(\tau)\}\right]}{\epsilon} + \frac{\mathbb{E}\{L(\Theta(0))\}}{\epsilon t} \qquad (4.59)$$
However, because our algorithm satisfies all of the desired constraints of the optimization problem (4.31)-(4.35), its limiting time average expectation for $y_0(t)$ cannot be better than $y_0^{opt}$:
$$\liminf_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{y_0(\tau)\} \ge y_0^{opt} \qquad (4.60)$$
Indeed, this fact is shown in Appendix 4.A (equation (4.96)). Taking a limsup of (4.59) as $t \to \infty$ and using (4.60) yields:
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K}\mathbb{E}\{Q_k(\tau)\} \le \frac{B + C + V[\Phi(\epsilon) - y_0^{opt}]}{\epsilon}$$
$\Box$
The following is Assumption A1, needed in part (c) of Theorem 4.8.
Assumption A1 (Slater Condition): There are values $\epsilon > 0$ and $\Phi(\epsilon)$ (where $y_{0,\min} \le \Phi(\epsilon) \le y_{0,\max}$) and an ω-only policy $\alpha^*(t)$ that satisfies:
$$\mathbb{E}\{\hat{y}_0(\alpha^*(t), \omega(t))\} = \Phi(\epsilon) \qquad (4.61)$$
$$\mathbb{E}\{\hat{y}_l(\alpha^*(t), \omega(t))\} \le 0 \quad \forall l \in \{1, \ldots, L\} \qquad (4.62)$$
$$\mathbb{E}\{\hat{e}_j(\alpha^*(t), \omega(t))\} = 0 \quad \forall j \in \{1, \ldots, J\} \qquad (4.63)$$
$$\mathbb{E}\{\hat{a}_k(\alpha^*(t), \omega(t))\} \le \mathbb{E}\{\hat{b}_k(\alpha^*(t), \omega(t))\} - \epsilon \quad \forall k \in \{1, \ldots, K\} \qquad (4.64)$$
Assumption A1 ensures strong stability of the $Q_k(t)$ queues. However, often the structure of a particular problem allows stronger deterministic queue bounds, even without Assumption A1 (see Exercise 4.9). A variation of the above proof that considers probability 1 convergence is treated in Exercise 4.6.
4.5.1 WHERE ARE WE USING THE I.I.D. ASSUMPTIONS?
In (4.53)-(4.56) of the above proof, we used equalities of the form $\mathbb{E}\{y_l^*(t)\mid\Theta(t)\} = \mathbb{E}\{y_l^*(t)\}$, which hold for any ω-only policy $\alpha^*(t)$ when $\omega(t)$ is i.i.d. over slots. Because past values of $\omega(\tau)$ for $\tau < t$ have influenced the current queue states $\Theta(t)$, this influence might skew the conditional distribution of $\omega(t)$ (given $\Theta(t)$) unless $\omega(t)$ is independent of the past. However, while the i.i.d. assumption is crucial for the above proof, it is not crucial for efficient performance of the algorithm, as shown in Section 4.9.
4.6 EXAMPLES
Here we provide examples of using the drift-plus-penalty algorithm for the same systems considered in Sections 2.3.1 and 2.3.2. More examples are given in Exercises 4.7-4.15.
4.6.1 DYNAMIC SERVER SCHEDULING
Example Problem: Consider the 3-queue, 2-server system described in Section 2.3.1 (see Fig. 2.1). Define $\omega(t) \triangleq (a_1(t), a_2(t), a_3(t))$ as the random arrivals on slot $t$, and assume $\omega(t)$ is i.i.d. over slots with $\mathbb{E}\{a_i(t)\} = \lambda_i$, $\mathbb{E}\{a_i(t)^2\} = \mathbb{E}\{a_i^2\}$ for $i \in \{1, 2, 3\}$.
a) Suppose $(\lambda_1, \lambda_2, \lambda_3) \in \Lambda$, where we recall that $\Lambda$ is defined by the constraints $0 \le \lambda_i \le 1$ for all $i \in \{1, 2, 3\}$, and $\lambda_1 + \lambda_2 + \lambda_3 \le 2$. State the drift-plus-penalty algorithm (with $V = 0$ and $C = 0$) for stabilizing all three queues.
b) Suppose the Slater condition (Assumption A1) holds for a value $\epsilon > 0$. Using the drift-plus-penalty algorithm with $V = 0$, $C = 0$, derive a value $B$ such that the time average queue backlog satisfies $\overline{Q}_1 + \overline{Q}_2 + \overline{Q}_3 \le B/\epsilon$, where $\overline{Q}_1 + \overline{Q}_2 + \overline{Q}_3$ is the limsup time average expected backlog in the system.
c) Suppose we must choose $b(t) \in \{(1, 1, 0), (1, 0, 1), (0, 1, 1)\}$ every slot $t$. Suppose that choosing $b(t) = (1, 1, 0)$ or $b(t) = (1, 0, 1)$ consumes one unit of power per slot, but using the vector $b(t) = (0, 1, 1)$ uses two units of power per slot. State the drift-plus-penalty algorithm (with $V > 0$ and $C = 0$) that seeks to minimize time average power subject to queue stability. Conclude that $\overline{p} \le p^{opt} + B/V$, where $\overline{p}$ is the limsup time average expected power expenditure of the algorithm, and $p^{opt}$ is the minimum possible time average power expenditure required for queue stability. Assuming the Slater condition of part (b), conclude that $\overline{Q}_1 + \overline{Q}_2 + \overline{Q}_3 \le (B + V)/\epsilon$.
Solution:
a) We have K = 3 with queues Q_1(t), Q_2(t), Q_3(t). There is no penalty to minimize, so y_0(t) = 0 (and so we also choose V = 0). There are no additional y_l(t) or e_j(t) attributes, and so L = J = 0. The control action α(t) determines the server allocations, so that α(t) = (b_1(t), b_2(t), b_3(t)), and the set of possible action vectors is A = {(1, 1, 0), (1, 0, 1), (0, 1, 1)} (so that we choose which two queues to serve on each slot). The control action does not affect the arrivals, and so â_k(α(t), ω(t)) = a_k(t). The algorithm (4.48)-(4.49) with V = 0 reduces to observing the queue backlogs every slot t and choosing (b_1(t), b_2(t), b_3(t)) as follows:
Minimize: −∑_{k=1}^{3} Q_k(t) b_k(t)    (4.65)
Subject to: (b_1(t), b_2(t), b_3(t)) ∈ {(1, 1, 0), (1, 0, 1), (0, 1, 1)}    (4.66)
Then update the queues Q_k(t) according to (4.23). Note that the problem (4.65)-(4.66) is equivalent to minimizing ∑_{k=1}^{3} Q_k(t)[a_k(t) − b_k(t)] subject to the same constraints, but to minimize this, it suffices to minimize only the terms we can control (so we can remove the ∑_{k=1}^{3} Q_k(t) a_k(t) term that is the same regardless of our control decision). It is easy to see that the problem (4.65)-(4.66) reduces to choosing the two largest queues to serve every slot, breaking ties arbitrarily. This simple policy does not require any knowledge of (λ_1, λ_2, λ_3), yet ensures all queues are mean rate stable whenever possible!
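To make the serve-the-two-largest-queues rule concrete, here is a minimal Python sketch (not from the text); the Bernoulli arrival rates below are hypothetical values chosen inside the capacity region Λ:

```python
import random

def serve_two_largest(Q):
    """Return the indices of the two largest queues (the drift-minimizing choice)."""
    return sorted(range(len(Q)), key=lambda k: Q[k], reverse=True)[:2]

def simulate(lams=(0.7, 0.6, 0.4), slots=50_000, seed=0):
    """Simulate the 3-queue, 2-server system under the serve-two-largest rule."""
    random.seed(seed)
    Q = [0, 0, 0]
    total = 0
    for _ in range(slots):
        serve = serve_two_largest(Q)
        for k in range(3):
            b = 1 if k in serve else 0
            a = 1 if random.random() < lams[k] else 0
            Q[k] = max(Q[k] - b, 0) + a  # queue update as in (4.23)
        total += sum(Q)
    return total / slots  # time average total backlog

avg_backlog = simulate()
```

For any rates in Λ the time average backlog stays bounded, consistent with the B/ε bound of part (b).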
b) From (4.45) and using the fact that L = J = 0 and b̃_k(t) a_k(t) ≥ 0, we want to find a value B that satisfies:

B ≥ (1/2) ∑_{k=1}^{3} E{a_k(t)² | Θ(t)} + (1/2) E{ ∑_{k=1}^{3} b_k(t)² | Θ(t) }

Because a_k(t) is i.i.d. over slots, it is independent of Θ(t), and so E{a_k(t)² | Θ(t)} = E{a_k²}. Further, b_k(t)² = b_k(t) (because b_k(t) ∈ {0, 1}). Thus, it suffices to find a value B that satisfies:

B ≥ (1/2) ∑_{k=1}^{3} E{a_k²} + (1/2) E{ ∑_{k=1}^{3} b_k(t) | Θ(t) }
However, since b_1(t) + b_2(t) + b_3(t) ≤ 2 for all t (regardless of Θ(t)), we can choose:

B = (1/2) ∑_{k=1}^{3} E{a_k²} + 1
Because Assumption A1 is satisfied and V = C = 0, we have from (4.51) that:

Q̄_1 + Q̄_2 + Q̄_3 ≤ B/ε
c) We now define the penalty y_0(t) = ŷ_0(b_1(t), b_2(t), b_3(t)), where:

ŷ_0(b_1(t), b_2(t), b_3(t)) = 1 if (b_1(t), b_2(t), b_3(t)) ∈ {(1, 1, 0), (1, 0, 1)}
ŷ_0(b_1(t), b_2(t), b_3(t)) = 2 if (b_1(t), b_2(t), b_3(t)) = (0, 1, 1)
Then the drift-plus-penalty algorithm (with V > 0) observes (Q_1(t), Q_2(t), Q_3(t)) every slot t and chooses a server allocation to solve:

Minimize: V ŷ_0(b_1(t), b_2(t), b_3(t)) − ∑_{k=1}^{3} Q_k(t) b_k(t)    (4.67)
Subject to: (b_1(t), b_2(t), b_3(t)) ∈ {(1, 1, 0), (1, 0, 1), (0, 1, 1)}    (4.68)
This can be solved easily by comparing the value of (4.67) associated with each option:
• Option (1, 1, 0): value = V − Q_1(t) − Q_2(t).
• Option (1, 0, 1): value = V − Q_1(t) − Q_3(t).
• Option (0, 1, 1): value = 2V − Q_2(t) − Q_3(t).
Thus, every slot t we pick the option with the smallest of the above three values, breaking ties arbitrarily. This is again a simple dynamic algorithm that does not require knowledge of the rates (λ_1, λ_2, λ_3). By (4.50), we know that the achieved time average power p̄ (where p̄ = ȳ_0) satisfies p̄ ≤ p^opt + B/V, where B is defined in part (b). Because y_{0,max} = 2 and y_{0,min} = 1, by (4.51), we know the resulting average backlog satisfies Q̄_1 + Q̄_2 + Q̄_3 ≤ (B + (2 − 1)V)/ε, where ε is defined in (b). This illustrates the [O(1/V), O(V)] tradeoff between average power and average backlog.

The above problem assumes we must allocate exactly two servers on every slot. The problem can of course be modified if we allow the option of serving only one queue, or no queues, at some reduced power expenditure.
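As an illustrative sketch (the Bernoulli rates and slot counts below are hypothetical, not values from the text), the part (c) rule can be simulated by evaluating the three option values of (4.67) each slot:

```python
import random

def choose_option(Q1, Q2, Q3, V):
    """Pick the server allocation minimizing the drift-plus-penalty value (4.67)."""
    options = {
        (1, 1, 0): V - Q1 - Q2,      # power cost 1
        (1, 0, 1): V - Q1 - Q3,      # power cost 1
        (0, 1, 1): 2 * V - Q2 - Q3,  # power cost 2
    }
    return min(options, key=options.get)

def simulate(V, lams=(0.3, 0.4, 0.4), slots=50_000, seed=1):
    """Return (average power, average total backlog) of the policy."""
    random.seed(seed)
    Q = [0, 0, 0]
    power = backlog = 0
    for _ in range(slots):
        b = choose_option(Q[0], Q[1], Q[2], V)
        power += 2 if b == (0, 1, 1) else 1
        for k in range(3):
            a = 1 if random.random() < lams[k] else 0
            Q[k] = max(Q[k] - b[k], 0) + a
        backlog += sum(Q)
    return power / slots, backlog / slots
```

Increasing V pushes average power toward p^opt (here 1, since for these rates the two power-1 options alone can support the traffic) at the cost of larger backlog, the [O(1/V), O(V)] tradeoff.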
4.6.2 OPPORTUNISTIC SCHEDULING
Example Problem: Consider the 2-queue wireless system with ON/OFF channels described in Section 2.3.2 (see Fig. 2.2). Suppose channel vectors (S_1(t), S_2(t)) are i.i.d. over slots with S_i(t) ∈ {ON, OFF}, as before. However, suppose that new arrivals are not immediately sent into the queue, but are only admitted via a flow control decision. Specifically, suppose that (A_1(t), A_2(t)) represents the random vector of new packet arrivals on slot t, where A_1(t) is i.i.d. over slots and Bernoulli with Pr[A_1(t) = 1] = λ_1, and A_2(t) is i.i.d. over slots and Bernoulli with Pr[A_2(t) = 1] = λ_2. Every slot a flow controller observes (A_1(t), A_2(t)) and makes admission decisions a_1(t), a_2(t), subject to the constraints:

a_1(t) ∈ {0, A_1(t)},  a_2(t) ∈ {0, A_2(t)}

Packets that are not admitted are dropped. We thus have ω(t) = [(S_1(t), S_2(t)), (A_1(t), A_2(t))]. The control action is given by α(t) = [(α_1(t), α_2(t)); (β_1(t), β_2(t))], where α_k(t) is a binary value that is 1 if we choose to admit the packet (if any) arriving to queue k on slot t, and β_k(t) is a binary value that is 1 if we choose to serve queue k on slot t, with the constraint β_1(t) + β_2(t) ≤ 1.
a) Use the drift-plus-penalty method (with V > 0 and C = 0) to stabilize the queues while seeking to maximize the linear utility function of throughput w_1 ā_1 + w_2 ā_2, where w_1 and w_2 are given positive weights and ā_k represents the time average rate of data admitted to queue k.
b) Assuming the Slater condition of Assumption A1 holds for some value ε > 0, state the resulting utility and average backlog performance.
c) Redo parts (a) and (b) with the additional constraint that ā_1 ≥ 0.1 (assuming this constraint is feasible).
Solution:
a) We have K = 2 queues to stabilize. We have penalty function y_0(t) = −w_1 a_1(t) − w_2 a_2(t) (so that minimizing the time average of this penalty maximizes w_1 ā_1 + w_2 ā_2). There are no other attributes y_l(t) or e_j(t), so L = J = 0. The arrival and service variables are given by a_k(t) = â_k(α_k(t), A_k(t)) and b_k(t) = b̂_k(β_k(t), S_k(t)) for k ∈ {1, 2}, where:

â_k(α_k(t), A_k(t)) = α_k(t) A_k(t),   b̂_k(β_k(t), S_k(t)) = β_k(t) 1{S_k(t) = ON}

where 1{S_k(t) = ON} is an indicator function that is 1 if S_k(t) = ON, and 0 else. The drift-plus-penalty algorithm of (4.48) thus reduces to observing the queue backlogs (Q_1(t), Q_2(t)) and the current network state ω(t) = [(S_1(t), S_2(t)), (A_1(t), A_2(t))] and making flow control and transmission actions α_k(t) and β_k(t) to solve:

Min: −V[w_1 α_1(t) A_1(t) + w_2 α_2(t) A_2(t)] + ∑_{k=1}^{2} Q_k(t)[α_k(t) A_k(t) − β_k(t) 1{S_k(t) = ON}]
Subj. to: α_k(t) ∈ {0, 1} ∀k ∈ {1, 2},  β_k(t) ∈ {0, 1} ∀k ∈ {1, 2},  β_1(t) + β_2(t) ≤ 1
The flow control and transmission decisions appear in separate terms of the above problem, and so they can be chosen to minimize their respective terms separately. This reduces to the following simple algorithm:

• (Flow Control) For each k ∈ {1, 2}, choose α_k(t) = 1 (so that we admit A_k(t) to queue k) whenever V w_k ≥ Q_k(t), and choose α_k(t) = 0 else.
• (Transmission) Choose (β_1(t), β_2(t)) subject to the constraints to maximize Q_1(t) β_1(t) 1{S_1(t) = ON} + Q_2(t) β_2(t) 1{S_2(t) = ON}. This reduces to the "Longest Connected Queue" algorithm of (8). Specifically, we place the server at the queue that is ON and that has the largest queue backlog, breaking ties arbitrarily.
b) We compute B from (4.45). Because L = J = 0, we choose B to satisfy:

B ≥ (1/2) ∑_{k=1}^{2} E{a_k(t)² | Θ(t)} + (1/2) ∑_{k=1}^{2} E{b_k(t)² | Θ(t)}

Because arrivals are i.i.d. Bernoulli, they are independent of queue backlog, and so E{a_k(t)² | Θ(t)} = E{a_k(t)²} = E{a_k(t)} = λ_k. Further, b_k(t)² = b_k(t), and b_1(t) + b_2(t) ≤ 1. Thus we can choose B = (λ_1 + λ_2 + 1)/2. It follows from (4.50) that:
w_1 ā_1 + w_2 ā_2 ≥ utility^opt − B/V

where utility^opt is the maximum possible utility value subject to stability. Further, because y_{0,min} = −(w_1 + w_2) and y_{0,max} = 0, we have from (4.51):

Q̄_1 + Q̄_2 ≤ (B + V(w_1 + w_2))/ε
c) The constraint ā_1 ≥ 0.1 is equivalent to 0.1 − ā_1 ≤ 0. To enforce this constraint, we simply introduce a virtual queue Z_1(t) as follows:

Z_1(t+1) = max[Z_1(t) + 0.1 − a_1(t), 0]    (4.69)

This can be viewed as introducing an additional penalty y_1(t) = 0.1 − a_1(t). The drift-plus-penalty algorithm (4.48) reduces to observing the queue backlogs and network state ω(t) every slot t and making actions to solve:
Min: −V[w_1 α_1(t) A_1(t) + w_2 α_2(t) A_2(t)] + ∑_{k=1}^{2} Q_k(t)[α_k(t) A_k(t) − β_k(t) 1{S_k(t) = ON}] + Z_1(t)[0.1 − α_1(t) A_1(t)]
Subj. to: α_k(t) ∈ {0, 1} ∀k ∈ {1, 2},  β_k(t) ∈ {0, 1} ∀k ∈ {1, 2},  β_1(t) + β_2(t) ≤ 1
Then update the virtual queue Z_1(t) according to (4.69) at the end of the slot, and update the queues Q_k(t) according to (4.23). This reduces to:

• (Flow Control) Choose α_1(t) = 1 whenever V w_1 + Z_1(t) ≥ Q_1(t), and choose α_1(t) = 0 else. Choose α_2(t) = 1 whenever V w_2 ≥ Q_2(t), and choose α_2(t) = 0 else.
• (Transmission) Choose (β_1(t), β_2(t)) the same as in part (a).
4.7 VARIABLE-V ALGORITHMS
The [O(1/V), O(V)] performance-delay tradeoff suggests that if we use a variable parameter V(t) that gradually increases with time, then we can maintain mean rate stability while driving the time average penalty to its exact optimum value y_0^opt. This is shown below, and is analogous to diminishing stepsize methods for static convex optimization problems (133)(134).
Theorem 4.9 Suppose that ω(t) is i.i.d. over slots with probabilities π(ω), the problem (4.31)-(4.35) is feasible, and E{L(Θ(0))} < ∞. Suppose that every slot t, we implement a C-additive approximation that comes within C ≥ 0 of the infimum of a modified right-hand-side of (4.44), where the V parameter is replaced with V(t), defined:

V(t) = V_0 (t + 1)^β  ∀t ∈ {0, 1, 2, ...}    (4.70)

for some constants V_0 > 0 and β such that 0 < β < 1. Then all queues are mean rate stable, all required constraints (4.32)-(4.35) are satisfied, and:

lim_{t→∞} (1/t) ∑_{τ=0}^{t−1} E{y_0(τ)} = y_0^opt

The manner in which the V_0 and β parameters affect convergence is described in the proof, specifically in (4.72) and (4.73).
While this variable-V approach yields the exact optimum y_0^opt, its disadvantage is that we achieve only mean rate stability and not strong stability, so that there is no finite bound on average queue size and average delay. In fact, it is known that for typical problems (except for those with a trivial structure), average backlog and delay necessarily grow to infinity as we push performance closer and closer to optimal, becoming infinite at the optimal point (50)(51)(52)(53). The very large queue sizes incurred by this variable-V algorithm also make it more difficult to adapt to changes in system parameters, whereas fixed-V algorithms can easily adapt.
Proof. (Theorem 4.9) Repeating the proof of Theorem 4.8 with V replaced by V(t) for a given slot t, the equation (4.57) becomes:

Δ(Θ(t)) + V(t) E{y_0(t) | Θ(t)} ≤ B + C + V(t) y_0^opt

Taking expectations of both sides of the above and using iterated expectations yields:

E{L(Θ(t+1))} − E{L(Θ(t))} + V(t) E{y_0(t)} ≤ B + C + V(t) y_0^opt    (4.71)

Noting that E{y_0(t)} ≥ y_{0,min} yields:

E{L(Θ(t+1))} − E{L(Θ(t))} ≤ B + C + V(t)(y_0^opt − y_{0,min})
The above holds for all t ≥ 0. Summing over τ ∈ {0, ..., t−1} yields:

E{L(Θ(t))} − E{L(Θ(0))} ≤ (B + C)t + (y_0^opt − y_{0,min}) ∑_{τ=0}^{t−1} V(τ)

Using the definition of the Lyapunov function in (4.43) yields the following for all t > 0:

∑_{k=1}^{K} E{Q_k(t)²} + ∑_{l=1}^{L} E{Z_l(t)²} + ∑_{j=1}^{J} E{H_j(t)²} ≤ 2(B + C)t + 2E{L(Θ(0))} + 2(y_0^opt − y_{0,min}) ∑_{τ=0}^{t−1} V(τ)
Take any queue Q_k(t). Because E{Q_k(t)}² ≤ E{Q_k(t)²}, we have for all queues Q_k(t):

E{Q_k(t)} ≤ √( 2(B + C)t + 2E{L(Θ(0))} + 2(y_0^opt − y_{0,min}) ∑_{τ=0}^{t−1} V(τ) )

and the same bound holds for E{Z_l(t)} and E{H_j(t)} for all l ∈ {1, ..., L}, j ∈ {1, ..., J}.
Dividing both sides of the above inequality by t yields the following for all t > 0:

E{Q_k(t)}/t ≤ √( 2(B + C)/t + 2E{L(Θ(0))}/t² + 2(y_0^opt − y_{0,min}) (1/t²) ∑_{τ=0}^{t−1} V(τ) )    (4.72)

and the same bound holds for all E{Z_l(t)}/t and E{H_j(t)}/t. However, we have:
0 ≤ (1/t²) ∑_{τ=0}^{t−1} V(τ) = (V_0/t²) ∑_{τ=0}^{t−1} (1 + τ)^β ≤ (V_0/t²) ∫_0^t (1 + v)^β dv = (V_0/t²) [ ((1 + t)^{1+β} − 1) / (1 + β) ]

Because 0 < β < 1, taking a limit of the above as t → ∞ shows that (1/t²) ∑_{τ=0}^{t−1} V(τ) → 0. Using this and taking a limit of (4.72) shows that all queues are mean rate stable, and hence (by Section 4.4) all required constraints (4.32)-(4.35) are satisfied.
To prove that the time average expectation of y_0(t) converges to y_0^opt, consider again the inequality (4.71), which holds for all t. Dividing both sides of (4.71) by V(t) yields:

[E{L(Θ(t+1))} − E{L(Θ(t))}]/V(t) + E{y_0(t)} ≤ (B + C)/V(t) + y_0^opt
Summing the above over τ ∈ {0, 1, ..., t−1} and collecting terms yields:

E{L(Θ(t))}/V(t−1) − E{L(Θ(0))}/V(0) + ∑_{τ=1}^{t−1} E{L(Θ(τ))} [1/V(τ−1) − 1/V(τ)] + ∑_{τ=0}^{t−1} E{y_0(τ)} ≤ t y_0^opt + (B + C) ∑_{τ=0}^{t−1} 1/V(τ)
Because V(t) is nondecreasing, we have for all τ ≥ 1:

1/V(τ−1) − 1/V(τ) ≥ 0
Using this in the above inequality and dividing by t yields:

(1/t) ∑_{τ=0}^{t−1} E{y_0(τ)} ≤ y_0^opt + (B + C)(1/t) ∑_{τ=0}^{t−1} 1/V(τ) + E{L(Θ(0))}/(V(0) t)    (4.73)
However:

0 ≤ (1/t) ∑_{τ=0}^{t−1} 1/V(τ) ≤ 1/(t V(0)) + (1/(V_0 t)) ∫_0^{t−1} 1/(1 + v)^β dv = 1/(t V(0)) + (1/(V_0 t)) [ (t^{1−β} − 1)/(1 − β) ]
Taking a limit as t → ∞ shows that this term vanishes, and so the limsup of the left-hand-side in (4.73) is less than or equal to y_0^opt. However, the policy satisfies all constraints (4.32)-(4.35), and so the liminf must be greater than or equal to y_0^opt (by the Appendix 4.A result (4.96)). Hence, the limit exists and is equal to y_0^opt. □
4.8 PLACE-HOLDER BACKLOG
Here we present a simple delay improvement for the fixed-V drift-plus-penalty algorithm. The queue backlogs under this algorithm can be viewed as a stochastic version of Lagrange multipliers for classical static convex optimization problems (see (45)(37) for more intuition on this), and they need to be large to appropriately inform the stochastic optimizer about good decisions to take. However, for many such problems, we can trick the stochastic optimizer by making it think the actual queue backlog is larger than it really is. This allows the same performance with reduced queue backlog. To develop the technique, we make the following three preliminary observations:

• The infinite horizon time average expected penalty and backlog bounds of Theorem 4.8 are insensitive to the initial condition Θ(0).
• All sample paths of backlog and penalty are the same under any service order for the Q_k(t) queues, provided that the queueing dynamics satisfy (4.23). In particular, the results are the same if service is First-In-First-Out (FIFO) or Last-In-First-Out (LIFO).
• It is often the case that, under the drift-plus-penalty algorithm (or a particular C-additive approximation of it), some queues are never served until they have at least a certain minimum amount of backlog.

The third observation motivates the following definition.
Definition 4.10 (Place-Holder Values) A nonnegative value Q_k^place is a place-holder value for network queue Q_k(t) with respect to a given algorithm if for all possible sample paths, we have Q_k(t) ≥ Q_k^place for all slots t ≥ 0 whenever Q_k(0) ≥ Q_k^place. Likewise, a nonnegative value Z_l^place is a place-holder value for queue Z_l(t) if for all possible sample paths, we have Z_l(t) ≥ Z_l^place for all t ≥ 0 whenever Z_l(0) ≥ Z_l^place.
Clearly 0 is a place-holder value for all queues Q_k(t) and Z_l(t), but the idea is to compute the largest possible place-holder values. It is often easy to precompute positive place-holder values without knowing anything about the system probabilities. This is done in the Chapter 3 example for minimizing average power expenditure subject to stability (see Section 3.2.4), and Exercises 4.8 and 4.11 provide further examples. Suppose now we run the algorithm with initial queue backlog Q_k(0) = Q_k^place for all k ∈ {1, ..., K}. Then we achieve exactly the same backlog and penalty sample paths under either FIFO or LIFO. However, none of the initial backlog Q_k^place would ever exit the system under LIFO! Thus, we can achieve the same performance by replacing this initial backlog Q_k^place with fake backlog, called place-holder backlog (142)(143). Whenever a transmission opportunity arises, we transmit only actual data whenever possible, serving the actual data in any order we like (such as FIFO or LIFO). Because queue backlog never dips below Q_k^place, we never have to serve any fake data. Thus, the actual queue backlog under this implementation is equal to Q_k^actual(t) = Q_k(t) − Q_k^place for all t, which reduces the actual backlog by an amount exactly equal to Q_k^place. This does not affect the sample path and hence does not affect the time average penalty.
Specifically, for all k ∈ {1, ..., K} and l ∈ {1, ..., L}, we initialize the actual backlog Q_k^actual(0) = Z_l^actual(0) = 0, but we use place-holder backlogs Q_k^place, Z_l^place so that:

Q_k(0) = Q_k^place,  Z_l(0) = Z_l^place  ∀k ∈ {1, ..., K}, l ∈ {1, ..., L}

We then operate the algorithm using the Q_k(t) and Z_l(t) values (not the actual values Q_k^actual(t) and Z_l^actual(t)). The above discussion ensures that for all time t, we have:

Q_k^actual(t) = Q_k(t) − Q_k^place,  Z_l^actual(t) = Z_l(t) − Z_l^place  ∀t ≥ 0
Because the bounds in Theorem 4.8 are independent of the initial condition, the same penalty and backlog bounds are achieved. However, the actual backlog is reduced by exactly Q_k^place and Z_l^place at every instant of time. This is a "free" reduction in the queue backlog, with no impact on the limiting time average penalty. This has already been illustrated in the example minimum average power problem of the previous chapter (Section 3.2.4, Figs. 3.3-3.4). Fig. 4.2 below provides further insight: it shows a sample path of Q_2(t) for the same example system of Section 3.2.4 (using V = 100 and (λ_1, λ_2) = (0.3, 0.7)). We use Q_2^place = max[V/2 − 2, 0] = 48 as the initial backlog, and the figure illustrates that Q_2(t) indeed never drops below 48. The place-holder savings is illustrated in the figure.

We developed this method of place-holder bits in (143) for use in dynamic data compression problems and in (142) for general constrained cost minimization problems (including multi-hop wireless networks with unreliable channels). The reader is referred to the examples and simulations
Figure 4.2: A sample path of backlog Q_2(t) (packets) versus time t over 3000 slots for the example system of Section 3.2.4, with the place-holder value Q_2^place and the resulting savings indicated.
given in (143)(142). A more aggressive place-holder technique is developed in (37). The idea of (37) can be illustrated easily from Fig. 4.2: While the figure shows that Q_2(t) never drops below Q_2^place, the backlog actually increases until it reaches a "plateau" around 100 packets, and then oscillates with some noise about this value. Intuitively, we can almost double the place-holder value in the figure, raising the horizontal line up to a level that is close to the minimum backlog value seen in the plateau. While we cannot guarantee that backlog will never drop below this new line, the idea is to show that such events occur rarely. Work in (45) shows that scaled queue backlog converges to a Lagrange multiplier of a related static optimization problem, and work in (37) shows that actual queue backlog oscillates very closely about this Lagrange multiplier. Specifically, it is shown in (37) that, under mild assumptions, the steady state backlog distribution decays exponentially in the distance from the Lagrange multiplier value. It then develops an algorithm that uses a place-holder that is a distance of O(log²(V)) from the Lagrange multiplier, showing that deviations by more than this amount are rare and can be handled separately by dropping a small amount of packets. This fundamentally changes the performance-backlog tradeoff from [O(1/V), O(V)] to [O(1/V), O(log²(V))] (within a logarithmic factor of the optimal tradeoff shown in (52)(51)(53)).

A disadvantage of this aggressive approach is that the Lagrange multipliers must be known in advance, which is difficult as they may depend on system statistics and may be different for each queue in the system. This is handled elegantly in a Last-In-First-Out (LIFO) implementation of the drift-plus-penalty method, developed in (54). That LIFO can improve delay can be understood from Fig. 4.2: First, a LIFO implementation would achieve all of the savings of the original place-holder value of Q_2^place = 48 (at the cost of never serving the first 48 packets). Next, a LIFO implementation would intuitively lead to delays of "most" packets that are on the order of the magnitude of the noise variations in the plateau area. That is, LIFO can achieve the more aggressive place-holder gains without computing the Lagrange multipliers! This is formally proven in (55). Experiments with the LIFO drift-plus-penalty method on an actual multi-hop wireless network deployment in (54) show a dramatic improvement in delay (by more than an order of magnitude) for all but 2% of the packets.
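The place-holder invariant is easy to see in a toy sketch (not the actual Section 3.2.4 system; the threshold, batch size, and arrival rate below are hypothetical): a queue served in batches of μ = 2 only when it reaches θ = 50 never drops below θ − μ = 48, so 48 units of its backlog can be fake.

```python
import random

def min_backlog(theta=50, mu=2, lam=0.7, slots=50_000, q0=48, seed=3):
    """Threshold-service queue: serve mu packets only when Q >= theta.
    Starting at Q(0) = theta - mu, the backlog never dips below theta - mu."""
    random.seed(seed)
    Q = q0
    q_min = Q
    for _ in range(slots):
        b = mu if Q >= theta else 0           # serve only above the threshold
        a = 1 if random.random() < lam else 0  # Bernoulli arrival
        Q = max(Q - b, 0) + a
        q_min = min(q_min, Q)
    return q_min

q_min = min_backlog()
```

Since the minimum over the whole sample path equals the place-holder value, those 48 units never need to be real packets.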
4.9 NON-I.I.D. MODELS AND UNIVERSAL SCHEDULING
Here we show that the same drift-plus-penalty algorithm provides similar [O(1/V), O(V)] performance guarantees when ω(t) varies according to a more general ergodic (possibly non-i.i.d.) process. We then show it also provides efficient performance for arbitrary (possibly non-ergodic) sample paths. The main proof techniques are the same as those we have already developed, with the exception that we use a multi-slot drift analysis rather than a 1-slot drift analysis.

We consider the same system as in Section 4.2.1, with K queues with dynamics (4.23), and attributes y_l(t) = ŷ_l(α(t), ω(t)) for l ∈ {1, ..., L}. For simplicity, we eliminate the attributes e_j(t) associated with equality constraints (so that J = 0). We seek an algorithm for choosing α(t) ∈ A_ω(t) every slot to minimize ȳ_0 subject to mean rate stability of all queues Q_k(t) and subject to ȳ_l ≤ 0 for all l ∈ {1, ..., L}. The virtual queues Z_l(t) for l ∈ {1, ..., L} are the same as before, defined in (4.40). For simplicity of exposition, we assume:
• The exact drift-plus-penalty algorithm of (4.48)-(4.49) is used, rather than a C-additive approximation (so that C = 0).
• The functions â_k(·), b̂_k(·), ŷ_l(·) are deterministically bounded, so that:

0 ≤ â_k(α(t), ω(t)) ≤ a_k^max  ∀k ∈ {1, ..., K}, ∀ω(t), α(t) ∈ A_ω(t)    (4.74)
0 ≤ b̂_k(α(t), ω(t)) ≤ b_k^max  ∀k ∈ {1, ..., K}, ∀ω(t), α(t) ∈ A_ω(t)    (4.75)
y_l^min ≤ ŷ_l(α(t), ω(t)) ≤ y_l^max  ∀l ∈ {0, 1, ..., L}, ∀ω(t), α(t) ∈ A_ω(t)    (4.76)
Define Θ(t) = [Q(t), Z(t)], and define the Lyapunov function L(Θ(t)) as follows:

L(Θ(t)) = (1/2) ∑_{k=1}^{K} Q_k(t)² + (1/2) ∑_{l=1}^{L} Z_l(t)²    (4.77)
We have the following preliminary lemma.

Lemma 4.11 (T-slot Drift) Assume (4.74)-(4.76) hold. For any slot t, any queue backlogs Θ(t), and any integer T > 0, the drift-plus-penalty algorithm ensures that:

L(Θ(t+T)) − L(Θ(t)) + V ∑_{τ=t}^{t+T−1} ŷ_0(α(τ), ω(τ)) ≤ DT² + V ∑_{τ=t}^{t+T−1} ŷ_0(α*(τ), ω(τ))
  + ∑_{l=1}^{L} Z_l(t) ∑_{τ=t}^{t+T−1} ŷ_l(α*(τ), ω(τ))
  + ∑_{k=1}^{K} Q_k(t) ∑_{τ=t}^{t+T−1} [â_k(α*(τ), ω(τ)) − b̂_k(α*(τ), ω(τ))]

where L(Θ(t)) is defined in (4.77), α*(τ) for τ ∈ {t, ..., t+T−1} is any sequence of alternative decisions that satisfy α*(τ) ∈ A_ω(τ), and the constant D is defined:

D = (1/2) ∑_{k=1}^{K} [(a_k^max)² + (b_k^max)²] + (1/2) ∑_{l=1}^{L} max[(y_l^min)², (y_l^max)²]    (4.78)
Proof. From (4.46)-(4.47), we have for any slot τ:

L(Θ(τ+1)) − L(Θ(τ)) ≤ D + ∑_{k=1}^{K} Q_k(τ)[â_k(α(τ), ω(τ)) − b̂_k(α(τ), ω(τ))] + ∑_{l=1}^{L} Z_l(τ) ŷ_l(α(τ), ω(τ))

where D is defined in (4.78). We then add V ŷ_0(α(τ), ω(τ)) to both sides. Because the drift-plus-penalty algorithm is designed to choose α(τ) to deterministically minimize the right-hand-side of the resulting inequality when this term is added, it follows that:

L(Θ(τ+1)) − L(Θ(τ)) + V ŷ_0(α(τ), ω(τ)) ≤ D + V ŷ_0(α*(τ), ω(τ))
  + ∑_{k=1}^{K} Q_k(τ)[â_k(α*(τ), ω(τ)) − b̂_k(α*(τ), ω(τ))]
  + ∑_{l=1}^{L} Z_l(τ) ŷ_l(α*(τ), ω(τ))
where α*(τ) is any other decision that satisfies α*(τ) ∈ A_ω(τ). However, we now note that for all τ ∈ {t, ..., t+T−1}:

Q_k(τ) − Q_k(t) ≤ (τ − t) max[a_k^max, b_k^max]
Z_l(τ) − Z_l(t) ≤ (τ − t) max[|y_l^min|, y_l^max]

Plugging these in, it can be shown that:

L(Θ(τ+1)) − L(Θ(τ)) + V ŷ_0(α(τ), ω(τ)) ≤ D + 2D(τ − t) + V ŷ_0(α*(τ), ω(τ))
  + ∑_{l=1}^{L} Z_l(t) ŷ_l(α*(τ), ω(τ))
  + ∑_{k=1}^{K} Q_k(t)[â_k(α*(τ), ω(τ)) − b̂_k(α*(τ), ω(τ))]

Summing the above over τ ∈ {t, ..., t+T−1} and using the fact that ∑_{τ=t}^{t+T−1} (τ − t) = (T−1)T/2 yields the result. □
4.9.1 MARKOV MODULATED PROCESSES
Here we present a method developed in (144) for proving that the [O(1/V), O(V)] behavior of the drift-plus-penalty algorithm is preserved in ergodic (but non-i.i.d.) contexts. Let η(t) be an irreducible (possibly not aperiodic) Discrete Time Markov Chain (DTMC) with a finite state space S.³ Let π_i represent the stationary distribution over states i ∈ S. Such a distribution always exists (and is unique) for irreducible finite state Markov chains. It is well known that all π_i probabilities are positive, and the time average fraction of time in state i is π_i with probability 1. Further, 1/π_i represents the (finite) mean recurrence time to state i, which is the average number of slots required to return to state i, given that we start in state i. Finally, it is known that second moments of recurrence times are also finite (see (132)(130) for more details on DTMCs).
The random network event process ω(t) is modulated by the DTMC η(t) as follows: Whenever η(t) = i, the value of ω(t) is chosen independently with some distribution p_i(ω). Then the stationary distribution of ω(t) is given by:

Pr[ω(t) = ω] = ∑_{i∈S} π_i p_i(ω)
Assume the state space S has a state "0" that we designate as a "renewal" state. Assume for simplicity that η(0) = 0, and let the sequence {T_0, T_1, T_2, ...} represent the recurrence times to state 0. Clearly {T_r}_{r=0}^{∞} is an i.i.d. sequence with E{T_r} = 1/π_0 for all r. Define E{T} and E{T²} as the first and second moments of these recurrence times (so that E{T} = 1/π_0). Define t_0 = 0, and for integers r > 0 define t_r as the time of the rth revisitation to state 0, so that t_r = ∑_{j=1}^{r} T_j. We now define the variable slot drift Δ(Θ(t_r)) as follows:

Δ(Θ(t_r)) = E{L(Θ(t_{r+1})) − L(Θ(t_r)) | Θ(t_r)}

³This subsection (Subsection 4.9.1) assumes familiarity with DTMC theory and can be skipped without loss of continuity.
This drift represents the expected change in the Lyapunov function from renewal time t_r to renewal time t_{r+1}, where the expectation is over the random duration of the renewal period and the random events on each slot of this period. By plugging t = t_r and T = T_r into Lemma 4.11 and taking conditional expectations given Θ(t_r), we have the following variable-slot drift-plus-penalty expression:
Δ(Θ(t_r)) + V E{ ∑_{τ=t_r}^{t_r+T_r−1} ŷ_0(α(τ), ω(τ)) | Θ(t_r) } ≤ D E{T_r² | Θ(t_r)}
  + V E{ ∑_{τ=t_r}^{t_r+T_r−1} ŷ_0(α*(τ), ω(τ)) | Θ(t_r) }
  + ∑_{l=1}^{L} Z_l(t_r) E{ ∑_{τ=t_r}^{t_r+T_r−1} ŷ_l(α*(τ), ω(τ)) | Θ(t_r) }
  + ∑_{k=1}^{K} Q_k(t_r) E{ ∑_{τ=t_r}^{t_r+T_r−1} [â_k(α*(τ), ω(τ)) − b̂_k(α*(τ), ω(τ))] | Θ(t_r) }

where α*(τ) are decisions from any other policy. First note that E{T_r² | Θ(t_r)} = E{T²} because the renewal duration is independent of the queue state Θ(t_r). Next, note that the conditional expectations in the next three terms on the right-hand-side of the above inequality can be changed into pure expectations (given that t_r is a renewal time) under the assumption that the policy α*(τ) is ω-only. Thus:
Δ(Θ(t_r)) + V E{ ∑_{τ=t_r}^{t_r+T_r−1} ŷ_0(α(τ), ω(τ)) | Θ(t_r) } ≤ D E{T²}    (4.79)
  + V E{ ∑_{τ=t_r}^{t_r+T_r−1} ŷ_0(α*(τ), ω(τ)) }
  + ∑_{l=1}^{L} Z_l(t_r) E{ ∑_{τ=t_r}^{t_r+T_r−1} ŷ_l(α*(τ), ω(τ)) }
  + ∑_{k=1}^{K} Q_k(t_r) E{ ∑_{τ=t_r}^{t_r+T_r−1} [â_k(α*(τ), ω(τ)) − b̂_k(α*(τ), ω(τ))] }
The expectations in the final terms are expected rewards over a renewal period, and so by basic renewal theory (130)(66), we have for all l ∈ {0, 1, ..., L} and all k ∈ {1, ..., K}:

E{ ∑_{τ=t_r}^{t_r+T_r−1} ŷ_l(α*(τ), ω(τ)) } = E{T} y_l*    (4.80)
E{ ∑_{τ=t_r}^{t_r+T_r−1} [â_k(α*(τ), ω(τ)) − b̂_k(α*(τ), ω(τ))] } = E{T} (a_k* − b_k*)    (4.81)
where y_l*, a_k*, b_k* are the infinite horizon time average values achieved for the ŷ_l(α*(t), ω(t)), â_k(α*(t), ω(t)), and b̂_k(α*(t), ω(t)) processes under the ω-only policy α*(t). This basic renewal theory fact can easily be understood as follows (with the below equalities holding with probability 1):⁴
y_l* = lim_{R→∞} (1/t_R) ∑_{τ=0}^{t_R−1} ŷ_l(α*(τ), ω(τ))
     = lim_{R→∞} [ ∑_{r=0}^{R−1} ∑_{τ=t_r}^{t_r+T_r−1} ŷ_l(α*(τ), ω(τ)) ] / [ ∑_{r=0}^{R−1} T_r ]
     = [ lim_{R→∞} (1/R) ∑_{r=0}^{R−1} ∑_{τ=t_r}^{t_r+T_r−1} ŷ_l(α*(τ), ω(τ)) ] / [ lim_{R→∞} (1/R) ∑_{r=0}^{R−1} T_r ]
     = E{ ∑_{τ=0}^{T_0−1} ŷ_l(α*(τ), ω(τ)) } / E{T}

where the final equality holds by the strong law of large numbers (noting that both the numerator and denominator are just time averages of i.i.d. quantities). In particular, the numerator is a sum of i.i.d. quantities because the policy α*(t) is ω-only, and so the sum penalty over each renewal period is independent and identically distributed. Plugging (4.80)-(4.81) into (4.79) yields:
Δ(Θ(t_r)) + V E{ ∑_{τ=t_r}^{t_r+T_r−1} ŷ_0(α(τ), ω(τ)) | Θ(t_r) } ≤ D E{T²} + V E{T} y_0*
  + ∑_{l=1}^{L} Z_l(t_r) E{T} y_l* + ∑_{k=1}^{K} Q_k(t_r) E{T} (a_k* − b_k*)
The above holds for any time averages {y_l*, a_k*, b_k*} that can be achieved by ω-only policies. However, by Theorem 4.5, we know that if the problem is feasible, then either there is a single ω-only policy that achieves time averages y_0* = y_0^opt, y_l* ≤ 0 for all l ∈ {1, ..., L}, and (a_k* − b_k*) ≤ 0 for all k ∈ {1, ..., K}, or there is an infinite sequence of ω-only policies that approach these averages. Plugging this into the above yields:

⁴Because the processes are deterministically bounded and have time averages that converge with probability 1, the Lebesgue Dominated Convergence Theorem (145) ensures the time average expectations are the same as the pure time averages (see Exercise 7.9).
Δ(Θ(t_r)) + V E{ ∑_{τ=t_r}^{t_r+T_r−1} ŷ_0(α(τ), ω(τ)) | Θ(t_r) } ≤ D E{T²} + V E{T} y_0^opt
Taking expectations of the above, summing the resulting telescoping series over r ∈ {0, ..., R−1}, and dividing by V R E{T} yields:

[E{L(Θ(t_R))} − E{L(Θ(0))}] / (V E{T} R) + (1/(E{T} R)) E{ ∑_{τ=0}^{t_R−1} ŷ_0(α(τ), ω(τ)) } ≤ y_0^opt + D E{T²} / (V E{T})
Because t_R/R → E{T} with probability 1 (by the law of large numbers), it can be shown that the middle term has a limsup that is equal to the limsup time average expected penalty. Thus, assuming E{L(Θ(0))} < ∞, we have:

ȳ_0 = limsup_{t→∞} (1/t) ∑_{τ=0}^{t−1} E{ŷ_0(α(τ), ω(τ))} ≤ y_0^opt + D E{T²}/(V E{T}) = y_0^opt + O(1/V)    (4.82)
where we note that the constants $D$, $\mathbb{E}\{T\}$, and $\mathbb{E}\{T^2\}$ do not depend on $V$. Similarly, it can be shown that if the problem is feasible then all queues are mean rate stable, and if the slackness condition of Assumption A1 holds, then the sum average queue backlog is $O(V)$ (144). This leads to the following theorem.
Theorem 4.12 (Markov Modulated Processes (144)) Assume the ω(t) process is modulated by the DTMC described above, the boundedness assumptions (4.74)–(4.76) hold, $\mathbb{E}\{L(\Theta(0))\} < \infty$, and that the drift-plus-penalty algorithm is used every slot t. If the problem is feasible, then:
(a) The penalty satisfies (4.82), so that $\overline{y}_0 \le y_0^{opt} + O(1/V)$.
(b) All queues are mean rate stable, and so $\overline{y}_l \le 0$ for all $l \in \{1, \ldots, L\}$.
(c) If the Slackness Assumption A1 holds, then all queues $Q_k(t)$ are strongly stable with average backlog $O(V)$.
4.9.2 NON-ERGODIC MODELS AND ARBITRARY SAMPLE PATHS
Now assume that the ω(t) process follows an arbitrary sample path, possibly one with non-ergodic behavior. However, continue to assume that the deterministic bounds (4.74)–(4.76) hold, so that Lemma 4.11 applies. We present a technique developed in (41)(40) for stock market trading and modified in (39)(38) for use in wireless networks with arbitrary traffic, channels, and mobility. Because ω(t) follows an arbitrary sample path, usual "equilibrium" notions of optimality are not relevant, and so we use a different metric for evaluation of the drift-plus-penalty algorithm, called the T-slot lookahead metric. Specifically, let T and R be positive integers, and consider the first RT slots $\{0, 1, \ldots, RT-1\}$ being divided into R frames of size T. For the rth frame (for $r \in \{0, \ldots, R-1\}$), we define $c_r^*$ as the optimal cost associated with the following static optimization problem, called the T-slot lookahead problem. This problem has variables $\alpha(\tau)$ for $\tau \in \{rT, \ldots, (r+1)T-1\}$, and treats the ω(τ) values in this interval as known quantities:
Minimize: $\quad c_r = \frac{1}{T}\sum_{\tau=rT}^{(r+1)T-1}\hat{y}_0(\alpha(\tau),\omega(\tau))$ (4.83)

Subject to: 1) $\sum_{\tau=rT}^{(r+1)T-1}\hat{y}_l(\alpha(\tau),\omega(\tau)) \le 0 \quad \forall l \in \{1, \ldots, L\}$

2) $\sum_{\tau=rT}^{(r+1)T-1}[\hat{a}_k(\alpha(\tau),\omega(\tau)) - \hat{b}_k(\alpha(\tau),\omega(\tau))] \le 0 \quad \forall k \in \{1, \ldots, K\}$

3) $\alpha(\tau) \in \mathcal{A}_{\omega(\tau)} \quad \forall \tau \in \{rT, \ldots, (r+1)T-1\}$
The value $c_r^*$ thus represents the optimal empirical average penalty for frame r over all policies that have full knowledge of the future ω(τ) values over the frame and that satisfy the constraints.^5 We assume throughout that the constraints are feasible for the above problem. Feasibility is often guaranteed when there is an "idle" action, such as the action of admitting and transmitting no data, which can be used on all slots to trivially satisfy the constraints in the form 0 ≤ 0.
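Operationally, the T-slot lookahead problem is a finite search once the ω(τ) values inside the frame are known. The sketch below brute-forces $c_r^*$ for a toy instance (K = 1, L = 0) with a small action set; the penalty, arrival, and service functions are hypothetical stand-ins chosen only to make the search concrete, not anything from the text.

```python
from itertools import product

def lookahead_cost(omega_frame, actions, y0, a_hat, b_hat):
    """Brute-force c_r^* for one frame: minimize the average penalty over all
    action sequences whose total arrivals do not exceed total service
    (constraint 2 of (4.83) with K = 1; no y_l constraints, i.e. L = 0)."""
    T = len(omega_frame)
    best = None
    for seq in product(actions, repeat=T):
        if sum(a_hat(al, w) - b_hat(al, w)
               for al, w in zip(seq, omega_frame)) > 0:
            continue  # violates the frame's arrival/service constraint
        cost = sum(y0(al, w) for al, w in zip(seq, omega_frame)) / T
        if best is None or cost < best:
            best = cost
    return best  # None if the frame is infeasible

# Hypothetical toy instance: omega is a channel gain, the action is a power level.
omega_frame = [0.2, 1.0, 0.5, 1.0]       # omega(tau), known within the frame
actions = [0.0, 0.5, 1.0]                # allowed power levels
y0 = lambda p, w: p                      # per-slot penalty: power spent
a_hat = lambda p, w: 1.0                 # one unit of data arrives per slot
b_hat = lambda p, w: 3.0 * p * w         # service: linear in power and gain

c_star = lookahead_cost(omega_frame, actions, y0, a_hat, b_hat)
```

With T = 4 and three actions this is only an 81-sequence search; the point is that $c_r^*$ is defined using full knowledge of ω(τ) inside the frame, which the drift-plus-penalty algorithm never has.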
Frame r consists of slots $\tau \in \{rT, \ldots, (r+1)T-1\}$. Let $\alpha^*(\tau)$ represent the decisions that solve the T-slot lookahead problem (4.83) over this frame to achieve cost $c_r^*$.^6 It is generally impossible to solve for the $\alpha^*(\tau)$ decisions, as these would require knowledge of the ω(τ) values up to T slots into the future. However, the $\alpha^*(\tau)$ values exist, and can still be plugged into Lemma 4.11 to yield the following (using $t = rT$ and $T$ as the frame size):
$$L(\Theta(rT+T)) - L(\Theta(rT)) + V\sum_{\tau=rT}^{rT+T-1}\hat{y}_0(\alpha(\tau),\omega(\tau)) \le DT^2 + V\sum_{\tau=rT}^{rT+T-1}\hat{y}_0(\alpha^*(\tau),\omega(\tau)) + \sum_{l=1}^{L}Z_l(rT)\sum_{\tau=rT}^{rT+T-1}\hat{y}_l(\alpha^*(\tau),\omega(\tau)) + \sum_{k=1}^{K}Q_k(rT)\sum_{\tau=rT}^{rT+T-1}[\hat{a}_k(\alpha^*(\tau),\omega(\tau)) - \hat{b}_k(\alpha^*(\tau),\omega(\tau))] \le DT^2 + VT c_r^*$$
where the final inequality follows by noting that the $\alpha^*(\tau)$ policy satisfies the constraints of the T-slot lookahead problem (4.83) and yields cost $c_r^*$.
Footnote 5: Theorem 4.13 holds exactly as stated in the extended case when $c_r^*$ is redefined by a T-slot lookahead problem that allows actions $[(\tilde{y}_l^*(\tau)), (\tilde{a}_k^*(\tau)), (\tilde{b}_k^*(\tau))]$ every slot τ to be taken within the convex hull of the set of all possible values of $[(\hat{y}_l(\alpha,\omega(\tau))), (\hat{a}_k(\alpha,\omega(\tau))), (\hat{b}_k(\alpha,\omega(\tau)))]$ under $\alpha \in \mathcal{A}_{\omega(\tau)}$, but we skip this extension for simplicity of exposition.
Footnote 6: For simplicity, we assume the infimum cost is achievable. Else, we can derive the same result by taking a limit over policies that approach the infimum.
Summing the above over $r \in \{0, \ldots, R-1\}$ (for any integer R > 0) yields:
$$L(\Theta(RT)) - L(\Theta(0)) + V\sum_{\tau=0}^{RT-1}\hat{y}_0(\alpha(\tau),\omega(\tau)) \le DT^2 R + VT\sum_{r=0}^{R-1}c_r^* \qquad (4.84)$$
Dividing by $VTR$, using the fact that $L(\Theta(RT)) \ge 0$, and rearranging terms yields:
$$\frac{1}{RT}\sum_{\tau=0}^{RT-1}\hat{y}_0(\alpha(\tau),\omega(\tau)) \le \frac{1}{R}\sum_{r=0}^{R-1}c_r^* + \frac{DT}{V} + \frac{L(\Theta(0))}{VTR} \qquad (4.85)$$
where we recall that α(τ) represents the decisions under the drift-plus-penalty algorithm. The inequality (4.85) holds for all integers R > 0. When R is large, the final term on the right-hand side above goes to zero (this term is exactly zero if $L(\Theta(0)) = 0$). Thus, we have that the time average cost is within O(1/V) of the time average of the $c_r^*$ values. The above discussion proves part (a) of the following theorem:
Theorem 4.13 (Universal Scheduling) Assume the ω(t) sample path satisfies the boundedness assumptions (4.74)–(4.76), and that the initial queue backlog is finite. Fix any integers R > 0 and T > 0, and assume the T-slot lookahead problem (4.83) is feasible for every frame $r \in \{0, 1, \ldots, R-1\}$. If the drift-plus-penalty algorithm is implemented every slot t, then:
(a) The time average cost over the first RT slots satisfies (4.85). In particular,^7
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\hat{y}_0(\alpha(\tau),\omega(\tau)) \le \limsup_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1}c_r^* + DT/V$$
where $c_r^*$ is the optimal cost in the T-slot lookahead problem (4.83) for frame r, and D is defined in (4.78).
(b) All actual and virtual queues are rate stable, and so we have:
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\hat{y}_l(\alpha(\tau),\omega(\tau)) \le 0 \quad \forall l \in \{1, \ldots, L\}$$
(c) Suppose there exists an $\epsilon > 0$ and a sequence of decisions $\tilde{\alpha}(\tau) \in \mathcal{A}_{\omega(\tau)}$ that satisfies the following slackness assumptions for all frames r:
$$\sum_{\tau=rT}^{rT+T-1}\hat{y}_l(\tilde{\alpha}(\tau),\omega(\tau)) \le 0 \quad \forall l \in \{1, \ldots, L\} \qquad (4.86)$$
$$\frac{1}{T}\sum_{\tau=rT}^{rT+T-1}[\hat{a}_k(\tilde{\alpha}(\tau),\omega(\tau)) - \hat{b}_k(\tilde{\alpha}(\tau),\omega(\tau))] \le -\epsilon \quad \forall k \in \{1, \ldots, K\} \qquad (4.87)$$
Footnote 7: It is clear that the limsup over times sampled every T slots is the same as the regular limsup because the $\hat{y}_0(\cdot)$ values are bounded. Indeed, we have
$$\sum_{\tau=0}^{\lfloor t/T\rfloor T}\hat{y}_0(\alpha(\tau),\omega(\tau)) + T y_0^{min} \le \sum_{\tau=0}^{t}\hat{y}_0(\alpha(\tau),\omega(\tau)) \le \sum_{\tau=0}^{\lfloor t/T\rfloor T}\hat{y}_0(\alpha(\tau),\omega(\tau)) + T y_0^{max}.$$
Dividing both sides by t and taking limits shows these limits are equal.
Then:
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K}Q_k(\tau) \le \frac{DT}{\epsilon} + \frac{V(y_0^{max} - y_0^{min})}{\epsilon} + \frac{T-1}{2}\sum_{k=1}^{K}\max[a_k^{max}, b_k^{max}]$$
Proof. Part (a) has already been shown in the above discussion. We provide a summary of parts (b) and (c): The inequality (4.84) plus the boundedness assumptions (4.74)–(4.76) imply that there is a finite constant F > 0 such that $L(\Theta(RT)) \le FR$ for all R. By an argument similar to part (a) of Theorem 4.1, it can then be shown that $\lim_{R\to\infty} Q_k(RT)/(RT) = 0$ for all $k \in \{1, \ldots, K\}$ and $\lim_{R\to\infty} Z_l(RT)/(RT) = 0$ for all $l \in \{1, \ldots, L\}$. Further, these limits that sample only on slots RT (as $R \to \infty$) are clearly the same when taken over all $t \to \infty$ because the queues can change by at most a constant proportional to T in between the sample times. This proves part (b).
Part (c) follows by plugging the policy $\tilde{\alpha}(\tau)$ for $\tau \in \{rT, \ldots, (r+1)T-1\}$ into Lemma 4.11 and using (4.86)–(4.87) to yield:
$$L(\Theta(rT+T)) - L(\Theta(rT)) + V\sum_{\tau=rT}^{rT+T-1}\hat{y}_0(\alpha(\tau),\omega(\tau)) \le DT^2 + VT y_0^{max} - \epsilon T\sum_{k=1}^{K}Q_k(rT)$$
and hence:
$$L(\Theta(rT+T)) - L(\Theta(rT)) \le DT^2 + VT(y_0^{max} - y_0^{min}) - \epsilon T\sum_{k=1}^{K}Q_k(rT)$$
$$\le DT^2 + VT(y_0^{max} - y_0^{min}) - \epsilon\sum_{k=1}^{K}\sum_{j=0}^{T-1}Q_k(rT+j) + \epsilon\sum_{k=1}^{K}\sum_{j=0}^{T-1}j\max[a_k^{max}, b_k^{max}]$$
$$= DT^2 + VT(y_0^{max} - y_0^{min}) - \epsilon\sum_{k=1}^{K}\sum_{j=0}^{T-1}Q_k(rT+j) + \epsilon\frac{(T-1)T}{2}\sum_{k=1}^{K}\max[a_k^{max}, b_k^{max}]$$
Summing the above over $r \in \{0, \ldots, R-1\}$ yields:
$$L(\Theta(RT)) - L(\Theta(0)) + \epsilon\sum_{\tau=0}^{RT-1}\sum_{k=1}^{K}Q_k(\tau) \le RDT^2 + RVT(y_0^{max} - y_0^{min}) + \epsilon\frac{R(T-1)T}{2}\sum_{k=1}^{K}\max[a_k^{max}, b_k^{max}]$$
Using $L(\Theta(RT)) \ge 0$, dividing by $\epsilon RT$, and taking a limsup as $R \to \infty$ yields:
$$\limsup_{R\to\infty}\frac{1}{RT}\sum_{\tau=0}^{RT-1}\sum_{k=1}^{K}Q_k(\tau) \le \frac{DT}{\epsilon} + \frac{V(y_0^{max} - y_0^{min})}{\epsilon} + \frac{T-1}{2}\sum_{k=1}^{K}\max[a_k^{max}, b_k^{max}] \qquad \Box$$
Inequality (4.85) holds for all R and T, and hence it can be viewed as a family of bounds that apply to the same sample path under the drift-plus-penalty algorithm. Note also that increasing the value of T changes the frame size and typically improves the $c_r^*$ values (as it allows these values to be achieved with a larger future lookahead). However, this affects the error term DT/V, requiring V to also be increased as T increases. Increasing V creates a larger queue backlog. We thus see a similar [O(1/V), O(V)] cost-backlog tradeoff for this sample path context. If the slackness assumptions (4.86)–(4.87) are modified to also include slackness in the $\hat{y}_l(\cdot)$ constraints, a modified argument can be used to show the worst case queue backlog is bounded for all time by a constant that is O(V) (see also (146)(39)(38)).
The target value $\frac{1}{R}\sum_{r=0}^{R-1}c_r^*$ that we use for comparison does not represent the optimal cost that can be achieved over the full horizon RT if the entire future were known. However, when T is large it still represents a meaningful target that is not trivial to achieve, as it is one that is defined in terms of an ideal policy with T-slot lookahead. It is remarkable that the drift-plus-penalty algorithm can closely track such an "ideal" T-slot lookahead algorithm.
4.10 EXERCISES
Exercise 4.1. Let $Q = (Q_1, \ldots, Q_K)$ and $L(Q) = \frac{1}{2}\sum_{k=1}^{K}Q_k^2$.
a) If $L(Q) \le 25$, show that $Q_k \le \sqrt{50}$ for all $k \in \{1, \ldots, K\}$.
b) If $L(Q) > 25$, show that $Q_k > \sqrt{50/K}$ for at least one queue $k \in \{1, \ldots, K\}$.
c) Let K = 2. Plot the region of all non-negative vectors $(Q_1, Q_2)$ such that $L(Q) = 2$. Also plot for $L(Q) = 2.5$. Give an example where $L(Q_1(t), Q_2(t)) = 2.5$, $L(Q_1(t+1), Q_2(t+1)) = 2$, but where $Q_1(t) < Q_1(t+1)$.
Exercise 4.2. For any constants $Q \ge 0$, $b \ge 0$, $a \ge 0$, show that:
$$(\max[Q-b, 0] + a)^2 \le Q^2 + b^2 + a^2 + 2Q(a-b)$$
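This is the standard sample-path bound behind the drift computations, and it can be spot-checked numerically before attempting the proof (random triples; illustrative only):

```python
import random

def lhs(Q, b, a):
    # Left side: queue after service b and arrival a, squared
    return (max(Q - b, 0.0) + a) ** 2

def rhs(Q, b, a):
    # Right side: the bound used to compute the Lyapunov drift
    return Q * Q + b * b + a * a + 2.0 * Q * (a - b)

random.seed(1)
for _ in range(20000):
    Q, b, a = (random.uniform(0.0, 10.0) for _ in range(3))
    assert lhs(Q, b, a) <= rhs(Q, b, a) + 1e-9
```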
Exercise 4.3. Let Q(t) be a discrete time vector process with Q(0) = 0, and let f(t) and g(t) be discrete time real valued processes. Suppose there is a non-negative function L(Q(t)) such that L(0) = 0, and such that its conditional drift $\Delta(Q(t))$ satisfies the following every slot τ and for all possible Q(τ):
$$\Delta(Q(\tau)) + \mathbb{E}\{f(\tau) \mid Q(\tau)\} \le \mathbb{E}\{g(\tau) \mid Q(\tau)\}$$
a) Use the law of iterated expectations to prove that:
$$\mathbb{E}\{L(Q(\tau+1))\} - \mathbb{E}\{L(Q(\tau))\} + \mathbb{E}\{f(\tau)\} \le \mathbb{E}\{g(\tau)\}$$
b) Use telescoping sums together with part (a) to prove that for any t > 0:
$$\frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{f(\tau)\} \le \frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{g(\tau)\}$$
Exercise 4.4. (Opportunistically Minimizing an Expectation) Consider the game described in Section 1.8. Suppose that ω is a Gaussian random variable with mean m and variance $\sigma^2$. Define $c(\alpha, \omega) = \omega^2 + \omega(3 - 2\alpha) + \alpha^2$.
a) Compute the optimal choice of α (as a function of the observed ω) to minimize $\mathbb{E}\{c(\alpha, \omega)\}$. Compute $\mathbb{E}\{c(\alpha, \omega)\}$ under your optimal policy.
b) Suppose that ω is exponentially distributed with mean 1/λ. Does the optimal policy change? Does $\mathbb{E}\{c(\alpha, \omega)\}$ change?
c) Let $\omega = (\omega_1, \ldots, \omega_K)$, $\alpha = (\alpha_1, \ldots, \alpha_K)$, $\theta = (\theta_1, \ldots, \theta_K)$ be non-negative vectors. Define $c(\alpha, \omega, \theta) = \sum_{k=1}^{K}[V\alpha_k - \theta_k\log(1 + \alpha_k\omega_k)]$, where log(·) denotes the natural logarithm and V ≥ 0. We choose α subject to $0 \le \alpha_k \le 1$ for all k, and $\alpha_k\alpha_j = 0$ for $k \ne j$. Design a policy that observes $(\omega, \theta)$ and chooses α to minimize $\mathbb{E}\{c(\alpha, \omega, \theta)\}$. Hint: First compute the solution assuming that $\alpha_k > 0$.
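For part (a), one way to sanity-check a candidate decision rule without deriving it analytically is Monte Carlo: draw many ω, apply each rule, and compare the empirical average costs. The candidate rules below are arbitrary illustrations, not the derived answer.

```python
import random

def c(alpha, omega):
    # The cost function of the exercise
    return omega ** 2 + omega * (3.0 - 2.0 * alpha) + alpha ** 2

def avg_cost(policy, m=1.0, sigma=2.0, n=100000, seed=0):
    """Empirical E{c(alpha(omega), omega)} with omega ~ Normal(m, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        w = rng.gauss(m, sigma)
        total += c(policy(w), w)
    return total / n

# Arbitrary candidate rules alpha(omega) to compare:
candidates = {
    "alpha = 0": lambda w: 0.0,
    "alpha = omega/2": lambda w: w / 2.0,
    "alpha = omega": lambda w: w,
}
scores = {name: avg_cost(policy) for name, policy in candidates.items()}
```

Whichever rule you derive in part (a) should beat (or tie) every candidate in such a comparison, for any choice of m and σ.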
Exercise 4.5. (The Drift-Plus-Penalty Method) Explain, using the game of opportunistically minimizing an expectation described in Section 1.8, how choosing $\alpha(t) \in \mathcal{A}_{\omega(t)}$ according to (4.48)–(4.49) minimizes the right-hand side of (4.44).
Exercise 4.6. (Probability 1 Convergence) Consider the fixed-V drift-plus-penalty algorithm (4.48)–(4.49), but assume the following modified Slater condition holds:
Assumption A2: There is an $\epsilon > 0$ such that for any J-dimensional vector $h = (h_1, \ldots, h_J)$ that consists only of values 1 and −1, there is an ω-only policy $\alpha^*(t)$ (which depends on h) that satisfies:
$$\mathbb{E}\{\hat{y}_0(\alpha^*(t),\omega(t))\} \le y_{0,max} \qquad (4.88)$$
$$\mathbb{E}\{\hat{y}_l(\alpha^*(t),\omega(t))\} \le -\epsilon \quad \forall l \in \{1, \ldots, L\} \qquad (4.89)$$
$$\mathbb{E}\{\hat{e}_j(\alpha^*(t),\omega(t))\} = \epsilon h_j \quad \forall j \in \{1, \ldots, J\} \qquad (4.90)$$
$$\mathbb{E}\{\hat{a}_k(\alpha^*(t),\omega(t))\} \le \mathbb{E}\{\hat{b}_k(\alpha^*(t),\omega(t))\} - \epsilon \quad \forall k \in \{1, \ldots, K\} \qquad (4.91)$$
Using H(t) and $\Delta(t, H(t))$ as defined in Section 4.1.3, it can be shown that for all t and all possible H(t), we have (compare with (4.52)):
$$\Delta(t, H(t)) + V\mathbb{E}\{y_0(t) \mid H(t)\} \le B + C + V\mathbb{E}\{y_0^*(t) \mid H(t)\} + \sum_{l=1}^{L}Z_l(t)\mathbb{E}\{y_l^*(t) \mid H(t)\} + \sum_{j=1}^{J}H_j(t)\mathbb{E}\{e_j^*(t) \mid H(t)\} + \sum_{k=1}^{K}Q_k(t)\mathbb{E}\{a_k^*(t) - b_k^*(t) \mid H(t)\} \qquad (4.92)$$
where $y_l^*(t)$, $e_j^*(t)$, $a_k^*(t)$, $b_k^*(t)$ represent decisions under any other (possibly randomized) action $\alpha^*(t)$ that can be made on slot t (so that $y_l^*(t) = \hat{y}_l(\alpha^*(t), \omega(t))$, etc.).
a) Define $h = (h_1, \ldots, h_J)$ by:
$$h_j = \begin{cases} -1 & \text{if } H_j(t) \ge 0 \\ 1 & \text{if } H_j(t) < 0 \end{cases}$$
Using this h, plug the ω-only policy $\alpha^*(t)$ from (4.88)–(4.91) into the right-hand side of (4.92) to obtain:
$$\Delta(t, H(t)) + V\mathbb{E}\{y_0(t) \mid H(t)\} \le B + C + V y_{0,max} - \epsilon\left[\sum_{l=1}^{L}Z_l(t) + \sum_{k=1}^{K}Q_k(t) + \sum_{j=1}^{J}|H_j(t)|\right]$$
b) Assume that (4.16)–(4.17) hold for $y_0(t)$, and that the fourth moment assumption (4.18) holds. Use this with part (a) to obtain probability 1 bounds on the limsup time average queue backlog via Theorem 4.4.
c) Now consider the ω-only policy that yields (4.53)–(4.56), and plug this into the right-hand side of (4.92) to yield a probability 1 bound on the limsup time average of $y_0(t)$, again by Theorem 4.4.
Exercise 4.7. (Min Average Power (21)) Consider a wireless downlink with arriving data $a(t) = (a_1(t), \ldots, a_K(t))$ every slot t. The data is stored in separate queues $Q(t) = (Q_1(t), \ldots, Q_K(t))$ for transmission over K different channels. The update equation is (4.23). Service variables $b_k(t)$ are determined by a power allocation vector $P(t) = (P_1(t), \ldots, P_K(t))$ according to $b_k(t) = \log(1 + S_k(t)P_k(t))$, where log(·) denotes the natural logarithm, and $S(t) = (S_1(t), \ldots, S_K(t))$ is a vector of channel attenuations. Assume that S(t) is known at the beginning of each slot t, and satisfies $0 \le S_k(t) \le 1$ for all k. Power is allocated subject to $P(t) \in \mathcal{A}$, where $\mathcal{A}$ is the set of all power vectors with at most one non-zero element and such that $0 \le P_k \le P_{max}$ for all $k \in \{1, \ldots, K\}$, where $P_{max}$ is a peak power constraint. Assume that the vectors a(t) and S(t) are i.i.d. over slots, and that $0 \le a_k(t) \le a_k^{max}$ for all t, for some finite constants $a_k^{max}$.
a) Using $\omega(t) \triangleq (a(t), S(t))$, $\alpha(t) = P(t)$, J = 0, L = 0, $y_0(t) = \sum_{k=1}^{K}P_k(t)$, state the drift-plus-penalty algorithm for a fixed V in this context.
b) Assume we use an exact implementation of the algorithm in part (a) (so that C = 0), and that the problem is feasible. Use Theorem 4.8 to conclude that all queues are mean rate stable, and compute a value B such that:
$$\limsup_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K}\mathbb{E}\{P_k(\tau)\} \le P_{av}^{opt} + B/V$$
where $P_{av}^{opt}$ is the minimum average power over any stabilizing algorithm.
c) Assume Assumption A1 holds for a given $\epsilon > 0$. Use Theorem 4.8c to give a bound on the time average sum of queue backlog in all queues.
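A minimal simulation sketch of one plausible reading of part (a): each slot, pick the single channel k and power P minimizing $VP - Q_k(t)\log(1 + S_k(t)P)$; for a fixed k this one-dimensional concave problem is maximized at $P = Q_k/V - 1/S_k$, clamped to $[0, P_{max}]$. The arrival and channel distributions here are hypothetical, and the code is a starting point rather than a reference implementation.

```python
import math
import random

def dpp_downlink(V, Pmax, K, T, seed=0):
    """Drift-plus-penalty for the min-average-power downlink: serve at most
    one channel per slot, choosing (k, P) to maximize Q_k*log(1+S_k*P) - V*P.
    Returns (average power, average total backlog)."""
    rng = random.Random(seed)
    Q = [0.0] * K
    power_sum = 0.0
    backlog_sum = 0.0
    for _ in range(T):
        a = [rng.choice([0.0, 1.0]) for _ in range(K)]        # arrivals (hypothetical)
        S = [rng.choice([0.25, 0.5, 1.0]) for _ in range(K)]  # attenuations (hypothetical)
        best_k, best_P, best_val = None, 0.0, 0.0  # idling (P = 0) has value 0
        for k in range(K):
            # One-dimensional maximizer of Q_k*log(1+S*P) - V*P on [0, Pmax]
            P = min(max(Q[k] / V - 1.0 / S[k], 0.0), Pmax)
            val = Q[k] * math.log(1.0 + S[k] * P) - V * P
            if val > best_val:
                best_k, best_P, best_val = k, P, val
        b = [0.0] * K
        if best_k is not None:
            b[best_k] = math.log(1.0 + S[best_k] * best_P)
            power_sum += best_P
        Q = [max(Q[k] - b[k], 0.0) + a[k] for k in range(K)]
        backlog_sum += sum(Q)
    return power_sum / T, backlog_sum / T

avg_power, avg_backlog = dpp_downlink(V=10.0, Pmax=5.0, K=2, T=20000)
```

Sweeping V should trade average power against average backlog in the [O(1/V), O(V)] pattern of Theorem 4.8.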
Exercise 4.8. (Place-Holder Backlog)
a) Show that for any values V, p, s, q such that V > 0, p ≥ 0, q ≥ 0, 0 ≤ s ≤ 1, if q < V, we have $Vp - q\log(1+sp) > 0$ whenever p > 0 (where log(·) denotes the natural logarithm). Conclude that the algorithm from Exercise 4.7 chooses $P_k(t) = 0$ whenever $Q_k(t) < V$.
b) Use part (a) to conclude that $Q_k(t) \ge \max[V - \log(1 + P_{max}), 0]$ for all t greater than or equal to the time $t^*$ for which this inequality first holds. By how much can place-holder bits reduce average backlog from the bound given in part (c) of Exercise 4.7? This exercise computes a simple place-holder value $Q_k^{place}$ that is not the largest possible. A more detailed analysis in (143) computes a larger place-holder value.
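Part (a)'s inequality can be probed numerically before proving it (a spot check under random draws, not a proof):

```python
import math
import random

def gap(V, p, s, q):
    # Objective difference: positive means allocating power p is strictly
    # worse than idling in the drift-plus-penalty expression of Exercise 4.7
    return V * p - q * math.log(1.0 + s * p)

random.seed(2)
for _ in range(20000):
    V = random.uniform(0.1, 20.0)
    q = random.uniform(0.0, 0.999 * V)   # q < V
    p = random.uniform(1e-6, 50.0)       # p > 0
    s = random.uniform(0.0, 1.0)         # 0 <= s <= 1
    assert gap(V, p, s, q) > 0.0
```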
Exercise 4.9. (Maximum Throughput Subject to Peak and Average Power Constraints (21)) Consider the same system of Exercise 4.7, with the exception that it is now a wireless uplink, and queue backlogs now satisfy:
$$Q_k(t+1) = \max[Q_k(t) - b_k(t), 0] + x_k(t)$$
where $x_k(t)$ is a flow control decision for slot t, made subject to the constraint $0 \le x_k(t) \le a_k(t)$ for all t. The control action is now a joint flow control and power allocation decision $\alpha(t) = [x(t), P(t)]$. We want the average power expenditure over each link k to be less than or equal to $P_k^{av}$, where $P_k^{av}$ is a fixed constant for each $k \in \{1, \ldots, K\}$ (satisfying $P_k^{av} \le P_{max}$). The new goal is to maximize a weighted sum of admission rates $\sum_{k=1}^{K}\theta_k\overline{x}_k$ subject to queue stability and to all average power constraints, where $\{\theta_1, \ldots, \theta_K\}$ are a given set of positive weights.
a) Using J = 0, L = K, $y_0(t) = -\sum_{k=1}^{K}\theta_k x_k(t)$, and a fixed V, state the drift-plus-penalty algorithm for this problem. Note that the constraints $\overline{P}_k \le P_k^{av}$ should be enforced by virtual queues $Z_k(t)$ of the form (4.40) with a suitable definition of $y_k(t)$.
b) Use Theorem 4.8 to conclude that all queues are mean rate stable (and hence all average power constraints are met), and compute a value B such that:
$$\liminf_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{k=1}^{K}\theta_k\mathbb{E}\{x_k(\tau)\} \ge util^{opt} - B/V$$
where $util^{opt}$ is the optimal weighted sum of admitted rates into the network under any algorithm that stabilizes the queues and satisfies all average power constraints.
c) Show that the algorithm is such that $x_k(t) = 0$ whenever $Q_k(t) > V\theta_k$. Assume that all queues are initially empty, and compute values $Q_k^{max}$ such that $Q_k(t) \le Q_k^{max}$ for all $t \ge 0$ and all $k \in \{1, \ldots, K\}$. This shows that queues are deterministically bounded, even without the Slater condition of Assumption A1.
d) Show that the algorithm is such that $P_k(t) = 0$ whenever $Z_k(t) > Q_k(t)$. Conclude that $Z_k(t) \le Z_k^{max}$, where $Z_k^{max}$ is defined $Z_k^{max} \triangleq Q_k^{max} + (P_{max} - P_k^{av})$.
e) Use part (d) and the sample path input-output inequality (2.3) to conclude that for any positive integer T, the total power expended by each link k over any T-slot interval is deterministically less than or equal to $TP_k^{av} + Z_k^{max}$. That is:
$$\sum_{\tau=t_0}^{t_0+T-1}P_k(\tau) \le TP_k^{av} + Z_k^{max} \quad \forall t_0 \in \{0, 1, 2, \ldots\}, \forall T \in \{1, 2, 3, \ldots\}$$
f) Suppose link k is a wireless transmitter with a battery that has initial energy $E_k$. Use part (e) to provide a guarantee on the lifetime of the link.
Exercise 4.10. (Out-of-Date Queue Backlog Information) Consider the K-queue problem with L = J = 0, and $0 \le a_k(t) \le a_{max}$ and $0 \le b_k(t) \le b_{max}$ for all k and all t, for some finite constants $a_{max}$ and $b_{max}$. The network controller attempts to perform the drift-plus-penalty algorithm (4.48)–(4.49) every slot. However, it does not have access to the current queue backlogs $Q_k(t)$, and only receives delayed information $Q_k(t-T)$ for some integer $T \ge 0$. It thus uses $Q_k(t-T)$ in place of $Q_k(t)$ in (4.48). Let $\alpha^{ideal}(t)$ be the optimal decision of (4.48)–(4.49) in the ideal case when current queue backlogs $Q_k(t)$ are used, and let $\alpha^{approx}(t)$ be the implemented decision that uses the out-of-date queue backlogs $Q_k(t-T)$. Show that $\alpha^{approx}(t)$ yields a C-additive approximation for some finite constant C. Specifically, compute a value C such that:
$$V\hat{y}_0(\alpha^{approx}(t),\omega(t)) + \sum_{k=1}^{K}Q_k(t)[\hat{a}_k(\alpha^{approx}(t),\omega(t)) - \hat{b}_k(\alpha^{approx}(t),\omega(t))] \le V\hat{y}_0(\alpha^{ideal}(t),\omega(t)) + \sum_{k=1}^{K}Q_k(t)[\hat{a}_k(\alpha^{ideal}(t),\omega(t)) - \hat{b}_k(\alpha^{ideal}(t),\omega(t))] + C$$
This shows that we can still optimize the system and provide stability with out-of-date queue backlog information. Treatment of delayed queue information for Lyapunov drift arguments was perhaps first used in (147), where random delays without a deterministic bound are also considered.
t                      0  1  2  3  4  5  6  7  8
Arrivals    a_1(t)     3  0  3  0  0  1  0  1  0
            a_2(t)     2  0  1  0  1  1  0  0  0
Channels    S_1(t)     G  G  M  M  G  G  M  M  G
            S_2(t)     M  M  B  M  B  M  B  G  B
Max Q_i b_i Q_1(t)     0  3  0  3  1  0  1  1  2
Policy      Q_2(t)     0  2  2  2  2  3  2  1  0

Figure 4.3: Arrivals, channel conditions, and queue backlogs for a two queue wireless downlink.
Exercise 4.11. (Simulation) Consider a 2-queue system with time varying channels $(S_1(t), S_2(t))$, where $S_i(t) \in \{G, M, B\}$, representing "Good," "Medium," "Bad" channel conditions for $i \in \{1, 2\}$. Only one channel can be served per slot. All packets have fixed length, and 3 packets can be served when a channel is "Good," 2 when "Medium," and 1 when "Bad." Exactly one unit of power is expended when we serve any channel (regardless of its condition). A sample path example is given in Fig. 4.3, which expends 8 units of power over the first 9 slots under the policy that serves the queue that yields the largest $Q_i(t)b_i(t)$ value, which is a special case of the drift-plus-penalty algorithm for K = 2, J = L = 0, V = 0.
a) Given the full future arrival and channel events as shown in the table, and given $Q_1(0) = Q_2(0) = 0$, select a different set of channels to serve over slots $\{0, 1, \ldots, 8\}$ that also leaves the system empty on slot 9, but that minimizes the amount of power required to do so (so that more than 1 slot will be idle). How much power is used?
b) Assume these arrivals and channels are repeated periodically every 9 slots. Simulate the system using the drift-plus-penalty policy of choosing the queue i that maximizes $Q_i(t)b_i(t) - V$ whenever this quantity is non-negative, and remaining idle if this is negative for both i = 1 and i = 2. Find the empirical average power expenditure and the empirical average queue backlog over $10^6$ slots when V = 0. Repeat for V = 1, V = 5, V = 10, V = 20, V = 50, V = 100, V = 200.
c) Repeat part (b) in the case when arrival vectors $(a_1(t), a_2(t))$ and channel vectors $(S_1(t), S_2(t))$ are independent and i.i.d. over slots with the same empirical distribution as that achieved over 9 slots in the table, so that $Pr[(a_1, a_2) = (3, 2)] = 1/9$, $Pr[(S_1, S_2) = (G, M)] = 3/9$, $Pr[(S_1, S_2) = (M, B)] = 2/9$, etc. Note: You should find that the resulting minimum power that is approached as V is increased is the same as part (b), and is strictly less than the empirical power expenditure of part (a).
d) Show that queue i is only served if $Q_i(t) \ge V/3$. Conclude that $Q_i(t) \ge \max[V/3 - 3, 0] \triangleq Q^{place}$ for all t, provided that this inequality holds for $Q_i(0)$. Hence, using $Q^{place}$ place-holder packets would reduce average backlog by exactly this amount, with no loss of power performance.
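A sketch of a simulation harness for part (b), using the periodic arrivals and channels of Fig. 4.3. One ambiguity is resolved by assumption here: the controller also idles when the selected queue is empty, so no power is spent serving nothing.

```python
RATE = {"G": 3, "M": 2, "B": 1}          # packets served per channel state
A1 = [3, 0, 3, 0, 0, 1, 0, 1, 0]         # arrivals to queue 1 (Fig. 4.3)
A2 = [2, 0, 1, 0, 1, 1, 0, 0, 0]         # arrivals to queue 2
S1 = list("GGMMGGMMG")                   # channel 1 states
S2 = list("MMBMBMBGB")                   # channel 2 states

def simulate(V, slots):
    """Serve the queue i maximizing Q_i*b_i - V when that quantity is
    nonnegative (and the queue is nonempty, an assumption made here);
    returns (average power, average total backlog)."""
    Q = [0, 0]
    power = 0
    backlog = 0
    for t in range(slots):
        j = t % 9  # the 9-slot pattern repeats periodically
        b = [RATE[S1[j]], RATE[S2[j]]]
        w = [Q[0] * b[0] - V, Q[1] * b[1] - V]
        i = 0 if w[0] >= w[1] else 1
        if w[i] >= 0 and Q[i] > 0:
            Q[i] = max(Q[i] - b[i], 0)
            power += 1  # one unit of power per served slot
        Q[0] += A1[j]
        Q[1] += A2[j]
        backlog += Q[0] + Q[1]
    return power / slots, backlog / slots

avg_power, avg_backlog = simulate(V=5, slots=90000)
```

Sweeping V as the exercise asks should show average power decreasing toward a limit while average backlog grows roughly linearly in V.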
Exercise 4.12. (Wireless Network Coding) Consider a system of 4 wireless users that communicate to each other through a base station (Fig. 4.4). User 1 desires to send data to user 2 and user 2 desires to send data to user 1. Likewise, user 3 desires to send data to user 4 and user 4 desires to send data to user 3.
Figure 4.4: An illustration of the 2 phases forming a cycle. [Diagram: Phase 1, uplink transmission of different packets $p_1, p_2, p_3, p_4$ from users 1–4 to the base station; Phase 2, downlink broadcast of an XORed packet (e.g., $p_3 + p_4$) to all users.]
Let $t \in \{0, 1, 2, \ldots\}$ index a cycle. Each cycle t is divided into 2 phases: In the first phase, users 1, 2, 3, and 4 all send a new packet (if any) to the base station (this can be accomplished, for example, using TDMA or FDMA in the first phase). In the second phase, the base station makes a transmission decision $\alpha(t) \in \{\{1, 2\}, \{3, 4\}\}$. If $\alpha(t) = \{1, 2\}$, the head-of-line packets for users 1 and 2 are XORed together, XORing with 0 if only one packet is available, and creating a null packet if no packets from users 1 or 2 are available. The XORed packet (or null packet) is then broadcast to all users. We assume all packets are labeled with sequence numbers, and the sequence numbers of both XORed packets are placed in a packet header. As in (148), users 1 and 2 can decode the new data if they keep copies of the previous packets they sent. If $\alpha(t) = \{3, 4\}$, a similar XOR operation is done for user 3 and 4 packets.
Assume that downlink channel conditions are time-varying and known at the beginning of each cycle, with channel state vector $S(t) = (S_1(t), S_2(t), S_3(t), S_4(t))$, where $S_i(t) \in \{ON, OFF\}$. Only users with ON channel states can receive the transmission. The queueing dynamics from one cycle to the next thus satisfy:
$$Q_1(t+1) = \max[Q_1(t) - b_1(t), 0] + a_2(t), \quad Q_2(t+1) = \max[Q_2(t) - b_2(t), 0] + a_1(t)$$
$$Q_3(t+1) = \max[Q_3(t) - b_3(t), 0] + a_4(t), \quad Q_4(t+1) = \max[Q_4(t) - b_4(t), 0] + a_3(t)$$
where $Q_k(t)$ is the integer number of packets waiting in the base station for transmission to destination k, and $b_k(t) \in \{0, 1\}$ is the number of packets transmitted over the downlink to node k during cycle t, satisfying:
$$b_k(t) = \hat{b}_k(\alpha(t), S(t)) = \begin{cases} 1 & \text{if } S_k(t) = ON \text{ and } k \in \alpha(t) \\ 0 & \text{otherwise} \end{cases}$$
and $a_k(t)$ is the number of packets arriving over the uplink from node k during cycle t (notice that data destined for node 1 arrives as the process $a_2(t)$, etc.). Suppose that S(t) is i.i.d. over cycles, with probabilities $\pi_s = Pr[S(t) = s]$, where $s = (S_1, S_2, S_3, S_4)$. Arrivals $a_k(t)$ are i.i.d. over cycles with rate $\lambda_k = \mathbb{E}\{a_k(t)\}$, for $k \in \{1, \ldots, 4\}$, and with bounded second moments.
a) Suppose that S(t) = (ON, ON, OFF, ON) and that $Q_k(t) > 0$ for all queues $k \in \{1, 2, 3, 4\}$. It is tempting to assume that mode $\alpha(t) = \{1, 2\}$ is the best choice in this case, although this is not always true. Give an example where it is impossible to stabilize the system if the controller always chooses $\alpha(t) = \{1, 2\}$ whenever S(t) = (ON, ON, OFF, ON) or S(t) = (ON, ON, ON, OFF), but where a more intelligent control choice would stabilize the system.^8
b) Define $L(Q(t)) = \frac{1}{2}\sum_{k=1}^{4}Q_k(t)^2$. Compute $\Delta(Q(t))$ and show it has the form:
$$\Delta(Q(t)) \le B - \mathbb{E}\Bigg\{\sum_{k=1}^{4}Q_k(t)[b_k(t) - \lambda_{m(k)}]\,\Bigg|\,Q(t)\Bigg\} \qquad (4.93)$$
where m(1) = 2, m(2) = 1, m(3) = 4, m(4) = 3, and where $B < \infty$. Design a control policy that observes S(t) and chooses actions α(t) to minimize the right-hand side of (4.93) over all feasible control policies.
c) Consider all possible S-only algorithms that choose a transmission mode as a stationary and random function of the observed S(t) (and independent of queue backlog). Define the S-only throughput region $\Lambda$ as the set of all $(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ vectors for which there exists an S-only policy $\alpha^*(t)$ such that:
$$\mathbb{E}\{\hat{b}_1(\alpha^*(t),S(t)),\ \hat{b}_2(\alpha^*(t),S(t)),\ \hat{b}_3(\alpha^*(t),S(t)),\ \hat{b}_4(\alpha^*(t),S(t))\} \ge (\lambda_2, \lambda_1, \lambda_4, \lambda_3)$$
Suppose that $(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ is interior to $\Lambda$, so that $(\lambda_1+\epsilon, \lambda_2+\epsilon, \lambda_3+\epsilon, \lambda_4+\epsilon) \in \Lambda$ for some value $\epsilon > 0$. Conclude that the drift-minimizing policy of part (b) makes all queues strongly stable, and provide an upper bound on time average expected backlog.
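For part (b), minimizing the right-hand side of (4.93) reduces to a max-weight comparison between the two XOR modes each cycle: choose α(t) = {1, 2} exactly when $\sum_k Q_k(t)\hat{b}_k(\{1,2\}, S(t))$ is at least the corresponding sum for {3, 4}. A sketch (the example numbers are hypothetical):

```python
def b_hat(mode, S):
    """Service vector: user k receives a packet iff k is in the chosen XOR
    pair and its downlink channel is ON (users are 1..4, index k is 0-based)."""
    return [1 if (k + 1) in mode and S[k] == "ON" else 0 for k in range(4)]

def max_weight_mode(Q, S):
    """Part (b): pick the XOR pair maximizing sum_k Q_k * b_hat_k."""
    w12 = sum(q * b for q, b in zip(Q, b_hat({1, 2}, S)))
    w34 = sum(q * b for q, b in zip(Q, b_hat({3, 4}, S)))
    return {1, 2} if w12 >= w34 else {3, 4}

# With S = (ON, ON, OFF, ON), mode {1,2} delivers two packets, yet mode {3,4}
# can still win when Q_4 dominates: the queue-aware comparison decides.
mode = max_weight_mode(Q=[1, 1, 5, 40], S=["ON", "ON", "OFF", "ON"])
```

This illustrates part (a)'s point: always preferring the two-packet mode ignores queue state, while the drift-minimizing rule weighs service by backlog.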
Footnote 8: It can also be shown that an algorithm that always chooses α(t) = {1, 2} under states (ON, ON, OFF, ON) or (ON, ON, ON, OFF) and when there are indeed two packets to serve will not necessarily work—we need to take queue length into account. See (10) for related examples in the context of a 3 × 3 packet switch.

Exercise 4.13. (A modified algorithm) Suppose the conditions of Theorem 4.8 hold. However, suppose that every slot t we observe Θ(t), ω(t) and choose an action $\alpha(t) \in \mathcal{A}_{\omega(t)}$ that minimizes the exact drift-plus-penalty expression $\Delta(\Theta(t)) + V\mathbb{E}\{\hat{y}_0(\alpha(t),\omega(t)) \mid \Theta(t)\}$, rather than minimizing the upper bound on the right-hand side of (4.44).
a) Show that the same performance guarantees of Theorem 4.8 hold.
b) Using (2.2), state this algorithm (for C = 0) in the special case when L = J = 0, $y_l(t) = e_j(t) = 0$, $\omega(t) = [(a_1(t), \ldots, a_K(t)), (S_1(t), \ldots, S_K(t))]$, $\hat{a}_k(\alpha(t),\omega(t)) = a_k(t)$, $\alpha(t) \in \{1, \ldots, K\}$ (representing a single queue that we serve every slot), and:
$$\hat{b}_k(\alpha(t),\omega(t)) = \begin{cases} S_k(t) & \text{if } \alpha(t) = k \\ 0 & \text{if } \alpha(t) \ne k \end{cases}$$
Figure 4.5: A dynamic data compression system for Exercise 4.14. [Diagram: arriving packets (A(t), β(t)) enter a compressor whose output a(t), with distortion d(t), feeds the queue Q(t), which is served at rate b(t).]
Exercise 4.14. (Distortion-Aware Data Compression (143)) Consider a single queue Q(t) with dynamics (2.1), where b(t) is an i.i.d. transmission rate process with bounded second moments. As shown in Fig. 4.5, the arrival process a(t) is generated as the output of a data compression operation. Specifically, every slot t a new packet of size A(t) bits arrives to the system (where A(t) = 0 if no packet arrives). This packet has metadata β(t), where $\beta(t) \in \mathcal{B}$, where $\mathcal{B}$ represents a set of different data types. Assume the pair (A(t), β(t)) is i.i.d. over slots. Every slot t, a network controller observes (A(t), β(t)) and chooses a data compression option $c(t) \in \{0, 1, \ldots, C\}$, where c(t) indexes a collection of possible data compression algorithms. The output of the compressor is a compressed packet of random size $a(t) = \hat{a}(A(t), \beta(t), c(t))$, causing a random distortion $d(t) = \hat{d}(A(t), \beta(t), c(t))$. Note that $\hat{a}(\cdot)$ and $\hat{d}(\cdot)$ are random functions. Assume the pair (a(t), d(t)) is i.i.d. over all slots with the same A(t), β(t), c(t). Define functions m(A, β, c) and δ(A, β, c) as follows:
$$m(A, \beta, c) \triangleq \mathbb{E}\{\hat{a}(A(t),\beta(t),c(t)) \mid A(t) = A, \beta(t) = \beta, c(t) = c\}$$
$$\delta(A, \beta, c) \triangleq \mathbb{E}\{\hat{d}(A(t),\beta(t),c(t)) \mid A(t) = A, \beta(t) = \beta, c(t) = c\}$$
Assume that c(t) = 0 corresponds to no compression, so that m(A, β, 0) = A, δ(A, β, 0) = 0 for all (A, β). Further, assume that c(t) = C corresponds to throwing the packet away, so that m(A, β, C) = 0 for all (A, β). Further assume there is a finite constant $\sigma^2$ such that for all (A, β, c), we have:
$$\mathbb{E}\{\hat{a}(A(t),\beta(t),c(t))^2 \mid A(t) = A, \beta(t) = \beta, c(t) = c\} \le \sigma^2$$
$$\mathbb{E}\{\hat{d}(A(t),\beta(t),c(t))^2 \mid A(t) = A, \beta(t) = \beta, c(t) = c\} \le \sigma^2$$
Assume the functions m(A, β, c) and δ(A, β, c) are known. We want to design an algorithm that minimizes the time average expected distortion $\overline{d}$ subject to queue stability. It is clear that this problem is feasible, as we can always choose c(t) = C (although this would maximize distortion). Use the drift-plus-penalty framework (with fixed V) to design such an algorithm. Hint: Use iterated expectations to claim that:
$$\mathbb{E}\{\hat{a}(A(t),\beta(t),c(t)) \mid Q(t)\} = \mathbb{E}\{\mathbb{E}\{\hat{a}(A(t),\beta(t),c(t)) \mid Q(t), A(t), \beta(t), c(t)\} \mid Q(t)\} = \mathbb{E}\{m(A(t),\beta(t),c(t)) \mid Q(t)\}$$
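Following the hint, the per-slot decision reduces to observing (A(t), β(t)) and choosing c(t) to minimize $V\delta(A(t),\beta(t),c) + Q(t)\,m(A(t),\beta(t),c)$. A sketch with hypothetical m and δ tables (c = 0 is no compression, c = C = 2 discards the packet); these tables are made up, not from the text:

```python
def choose_compression(Q, A, beta, V, m, delta, C):
    """Drift-plus-penalty choice: minimize V*delta(A,beta,c) + Q*m(A,beta,c),
    i.e. expected distortion (penalty) plus backlog-weighted expected
    compressed size (drift)."""
    return min(range(C + 1),
               key=lambda c: V * delta(A, beta, c) + Q * m(A, beta, c))

# Hypothetical model: option 1 halves the packet at expected distortion 1;
# option C = 2 discards it at expected distortion 10.
m = lambda A, beta, c: [A, 0.5 * A, 0.0][c]
delta = lambda A, beta, c: [0.0, 1.0, 10.0][c]

no_load = choose_compression(Q=0.0, A=8.0, beta="img", V=10.0, m=m, delta=delta, C=2)
loaded = choose_compression(Q=20.0, A=8.0, beta="img", V=10.0, m=m, delta=delta, C=2)
```

As backlog grows, the rule shifts from no compression toward heavier compression (and ultimately dropping), which is exactly the stability/distortion tradeoff the exercise asks to formalize.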
Exercise 4.15. (Weighted Lyapunov Functions) Recompute the drift-plus-penalty bound in Lemma 4.6 under the following modified Lyapunov function:
$$L(\Theta(t)) = \frac{1}{2}\sum_{k=1}^{K}w_k Q_k(t)^2 + \frac{1}{2}\sum_{l=1}^{L}Z_l(t)^2 + \frac{1}{2}\sum_{j=1}^{J}H_j(t)^2$$
where $\{w_k\}_{k=1}^{K}$ are positive weights. How does the drift-plus-penalty algorithm change?
Figure 4.6: The 3-node multi-hop network for Exercise 4.16. [Diagram: X(t) is split into $a_1(t)$ entering queue $Q_1(t)$ and $a_2(t)$ entering queue $Q_2(t)$; queue 1's output (at rate $\mu_1(t)$) and the Y(t) arrivals feed queue $Q_3(t)$; the three queues are served at rates $\mu_1(t)$, $\mu_2(t)$, $\mu_3(t)$.]
Exercise 4.16. (Multi-Hop with Orthogonal Channels) Consider the 3-node wireless network of Fig. 4.6. The network operates in discrete time with unit time slots $t \in \{0, 1, 2, \ldots\}$. It has orthogonal channels, so that node 3 can send and receive at the same time. The network controller makes power allocation decisions and routing decisions.
• (Power Allocation) Let $\mu_i(t)$ be the transmission rate at node i on slot t, for $i \in \{1, 2, 3\}$. This transmission rate depends on the channel state $S_i(t)$ and the power allocation decision $P_i(t)$ by the following function:
$$\mu_i(t) = \log(1 + P_i(t)S_i(t)) \quad \forall i \in \{1, 2, 3\}, \forall t$$
where log(·) denotes the natural logarithm. Every time slot t, the network controller observes the channels $(S_1(t), S_2(t), S_3(t))$ and determines the power allocation decisions $(P_1(t), P_2(t), P_3(t))$, made subject to the following constraints:
$$0 \le P_i(t) \le 1 \quad \forall i \in \{1, 2, 3\}, \forall t$$
• (Routing) There are two arrival processes X(t) and Y(t), taking units of bits. The X(t) process can be routed to either queue 1 or 2. The Y(t) process goes directly into queue 3. Let $a_1(t)$ and $a_2(t)$ represent the routing decision variables, where $a_1(t)$ is the amount of bits routed to queue 1, and $a_2(t)$ is the amount of bits routed to queue 2. The network controller observes X(t) every slot and makes decisions for $(a_1(t), a_2(t))$ subject to the following constraints:
$$a_1(t) \ge 0, \quad a_2(t) \ge 0, \quad a_1(t) + a_2(t) = X(t) \quad \forall t$$
It can be shown that the Lyapunov drift Δ(Q(t)) satisfies the following every slot t:

Δ(Q(t)) ≤ B + Q_1(t)E{a_1(t) − μ_1(t) | Q(t)} + Q_2(t)E{a_2(t) − μ_2(t) | Q(t)} + Q_3(t)E{μ_1(t) + Y(t) − μ_3(t) | Q(t)}
where B is a positive constant. We want to design a dynamic algorithm that solves the following
problem:
Minimize:  P̄_1 + P̄_2 + P̄_3
Subject to: 1) Q_i(t) is mean rate stable ∀i ∈ {1, 2, 3}
            2) a_1(t) ≥ 0 , a_2(t) ≥ 0 , a_1(t) + a_2(t) = X(t) ∀t
            3) 0 ≤ P_i(t) ≤ 1 ∀i ∈ {1, 2, 3}, ∀t
a) Using a fixed parameter V > 0, state the drift-plus-penalty algorithm for this problem. The algorithm should have separable power allocation and routing decisions.
b) Suppose that V = 20, Q_1(t) = 50, Q_2(t) = Q_3(t) = 20, and S_1(t) = S_2(t) = S_3(t) = 1. What should the value of P_1(t) be under the drift-plus-penalty algorithm? (Give a numeric value.)
c) Suppose (X(t), Y(t)) is i.i.d. over slots with E{X(t)} = λ_X and E{Y(t)} = λ_Y. Suppose (S_1(t), S_2(t), S_3(t)) is i.i.d. over slots. Suppose there is a stationary and randomized policy that observes (X(t), Y(t), S_1(t), S_2(t), S_3(t)) every slot t, and makes randomized decisions (a*_1(t), a*_2(t), P*_1(t), P*_2(t), P*_3(t)) based only on the observed vector (X(t), Y(t), S_1(t), S_2(t), S_3(t)). State desirable properties for the expectations E{a*_1(t)}, E{a*_2(t)}, and E{log(1 + P*_i(t)S_i(t))} for i ∈ {1, 2, 3} that would ensure your algorithm of part (a) would make all queues mean rate stable with time average expected power expenditure given by:

P̄_1 + P̄_2 + P̄_3 ≤ φ + B/V

where φ is a desired value for the sum time average power. Your properties should be in the form of desirable inequalities.
4.11 APPENDIX 4.A — PROVING THEOREM 4.5
This appendix characterizes the set of all possible time average expectations for the variables [(y_l(t)), (e_j(t)), (a_k(t)), (b_k(t))] defined in Section 4.2. It concludes with a proof of Theorem 4.5, which shows that optimality for the problem (4.31)-(4.35) can be defined over the class of ω-only policies. The proof involves set-theoretic concepts of convex sets, closed sets, limit points, and convergent subsequences. In particular, we use the well known fact that if {x(t)}_{t=0}^{∞} is an infinite sequence of vectors contained in some bounded set X ⊆ R^k (for some finite integer k > 0), then there must exist a convergent subsequence {x(t_i)}_{i=1}^{∞} that converges to a point x in the closure of X (see, for example, A14 of (145)). Specifically, there is a vector x in the closure of X and an infinite sequence of increasing positive integers {t_1, t_2, t_3, . . .} such that:

lim_{i→∞} x(t_i) = x
4.11.1 THE REGION Γ

Let Γ represent the region of all [(y_l)_{l=0}^{L}, (e_j)_{j=1}^{J}, (a_k)_{k=1}^{K}, (b_k)_{k=1}^{K}] values that can be achieved by ω-only policies. Equivalently, this can be viewed as the region of all one-slot expectations that can be achieved via randomized decisions when the ω(t) variable takes values according to its stationary distribution. The boundedness assumptions (4.25)-(4.30) ensure that the set Γ is bounded. It is easy to show that Γ is also convex by using an ω-only policy that is a mixture of two other ω-only policies.
Now note that for any slot τ, and assuming that ω(τ) has its stationary distribution, the one-slot expectation under any decision α(τ) ∈ A_{ω(τ)} is in the set Γ, even if that decision is from an arbitrary policy that is not an ω-only policy. That is:

E{[(ŷ_l(α(τ), ω(τ))), (ê_j(α(τ), ω(τ))), (â_k(α(τ), ω(τ))), (b̂_k(α(τ), ω(τ)))]} ∈ Γ

where the expectation is with respect to the random ω(τ) (which has the stationary distribution) and the possibly random α(τ) that is made by the policy in reaction to the observed ω(τ). This expectation is in Γ because any sample path of events that leads to the policy choosing α(τ) on slot τ simply affects the conditional distribution of α(τ) given the observed ω(τ), and hence the expectation can be equally achieved by the ω-only policy that uses the same conditional distribution.⁹ This observation directly leads to the following simple lemma.
Lemma 4.17 If ω(τ) is in its stationary distribution for all slots τ, then for any policy that chooses α(τ) ∈ A_{ω(τ)} over time (including policies that are not ω-only), we have for any slot t > 0:

(1/t) Σ_{τ=0}^{t−1} E{[(ŷ_l(α(τ), ω(τ))), (ê_j(α(τ), ω(τ))), (â_k(α(τ), ω(τ))), (b̂_k(α(τ), ω(τ)))]} ∈ Γ   (4.94)
⁹We implicitly assume that the decision α(τ) on slot τ has a well-defined conditional distribution.
Thus, if r* is a limit point of the time average on the left-hand side of (4.94) over a subsequence of times t_i that increase to infinity, then r* is in the closure of Γ.

Proof. Each term in the time average is itself in Γ, and so the time average is also in Γ because Γ is convex. □
Thus, the finite horizon time average expectation under any policy cannot escape the set Γ, and any infinite horizon time average that converges to a limit point cannot escape the closure of Γ. If the set Γ is closed, then any limit point r* is inside Γ and hence (by definition of Γ) can be exactly achieved as the one-slot average under some ω-only policy. If Γ is not closed, then r* can be achieved arbitrarily closely (i.e., within a distance δ, for any arbitrarily small δ > 0) by an ω-only policy. This naturally leads to the following characterization of optimality in terms of ω-only policies.
4.11.2 CHARACTERIZING OPTIMALITY

Define Γ̃ as the set of all points [(y_l), (e_j), (a_k), (b_k)] in the closure of Γ that satisfy:

y_l ≤ 0 ∀l ∈ {1, . . . , L} , e_j = 0 ∀j ∈ {1, . . . , J} , a_k ≤ b_k ∀k ∈ {1, . . . , K}   (4.95)
It can be shown that, if nonempty, Γ̃ is closed and bounded. If Γ̃ is nonempty, define y*_0 as the minimum value of y_0 for which there is a point [(y_l), (e_j), (a_k), (b_k)] ∈ Γ̃. Intuitively, the set Γ̃ is the set of all time averages achievable by ω-only policies that meet the required time average constraints and that have time average expected arrivals less than or equal to time average expected service, and y*_0 is the minimum time average penalty achievable by such ω-only policies. We now show that y*_0 = y_0^opt.
Theorem 4.18 Suppose the ω(t) process is stationary with distribution π(ω), and that the system satisfies the boundedness assumptions (4.25)-(4.30) and the law of large numbers assumption specified in Section 4.2. Suppose the problem (4.31)-(4.35) is feasible. Let α(t) be any control policy that satisfies the constraints (4.32)-(4.35), and let r(t) represent the t-slot expected time average on the left-hand side of (4.94) under this policy.

a) Any limit point [(y_l), (e_j), (a_k), (b_k)] of {r(t)}_{t=1}^{∞} is in the set Γ̃. In particular, the set Γ̃ is nonempty.

b) The time average expected penalty under the algorithm α(t) satisfies:

liminf_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{ŷ_0(α(τ), ω(τ))} ≥ y*_0   (4.96)

Thus, no algorithm that satisfies the constraints (4.32)-(4.35) can yield a time average expected penalty smaller than y*_0. Further, y*_0 = y_0^opt.
Proof. To prove part (a), note from Lemma 4.17 that r(t) is always inside the (bounded) set Γ. Hence, it has a limit point, and any such limit point is in the closure of Γ. Now consider a particular limit point [(y_l), (e_j), (a_k), (b_k)], and let {t_i}_{i=1}^{∞} be a subsequence of nonnegative integer time slots that increase to infinity and satisfy:

lim_{i→∞} r(t_i) = [(y_l), (e_j), (a_k), (b_k)]
Because the constraints (4.32) and (4.33) are satisfied, it must be the case that:

y_l ≤ 0 ∀l ∈ {1, . . . , L} , e_j = 0 ∀j ∈ {1, . . . , J}   (4.97)
Further, by the sample-path inequality (2.5), we have for all t_i > 0 and all k:

E{Q_k(t_i)}/t_i − E{Q_k(0)}/t_i ≥ (1/t_i) Σ_{τ=0}^{t_i−1} E{â_k(α(τ), ω(τ)) − b̂_k(α(τ), ω(τ))}

Because the control policy makes all queues mean rate stable, taking a limit of the above over the times t_i → ∞ yields 0 ≥ a_k − b_k, and hence we find that:

a_k ≤ b_k ∀k ∈ {1, . . . , K}   (4.98)
The results (4.97) and (4.98) imply that the limit point [(y_l), (e_j), (a_k), (b_k)] is in the set Γ̃.
To prove part (b), let {t_i}_{i=1}^{∞} be a subsequence of nonnegative integer time slots that increase to infinity and that yield the liminf:

lim_{i→∞} (1/t_i) Σ_{τ=0}^{t_i−1} E{ŷ_0(α(τ), ω(τ))} = liminf_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{ŷ_0(α(τ), ω(τ))}   (4.99)
and that yield well defined time averages [(y_l), (e_j), (a_k), (b_k)] for r(t_i) (such a subsequence can be constructed by first taking a subsequence that achieves the liminf, and then taking a convergent subsequence of it that ensures the r(t_i) values converge to a limit point). Then by part (a), we know that [(y_l), (e_j), (a_k), (b_k)] ∈ Γ̃, and so its y_0 component (being the liminf value in (4.99)) is greater than or equal to y*_0 because y*_0 is the smallest possible y_0 value of all points in Γ̃.
It follows that no control algorithm that satisfies the required constraints has a time average expected penalty less than y*_0. We now show that it is possible to achieve y*_0, and so y*_0 = y_0^opt. For simplicity, we consider only the case when Γ is closed. Let [(y*_l), (e*_j), (a*_k), (b*_k)] be the point in Γ̃ that has component y*_0. Because Γ is closed, Γ̃ is a subset of Γ, and so [(y*_l), (e*_j), (a*_k), (b*_k)] ∈ Γ. It follows that there is an ω-only algorithm α*(t) with expectations exactly equal to [(y*_l), (e*_j), (a*_k), (b*_k)] on every slot t. Thus, the time average penalty is y*_0, and the constraints (4.32), (4.33) are satisfied because y*_l ≤ 0 for all l ∈ {1, . . . , L} and e*_j = 0 for all j ∈ {1, . . . , J}. Further, our "law of large numbers" assumption on ω(t) ensures the time averages of â_k(α*(t), ω(t)) and b̂_k(α*(t), ω(t)), achieved under the ω-only algorithm α*(t), are equal to a*_k and b*_k with probability 1. Because a*_k ≤ b*_k and the second moments of a_k(t) and b_k(t) are bounded by a finite constant σ² for all t, the Rate Stability Theorem (Theorem 2.4) ensures that all queues Q_k(t) are mean rate stable. □
We use this result to prove Theorem 4.5.
Proof. (Theorem 4.5) Let [(y*_l), (e*_j), (a*_k), (b*_k)] be the point in Γ̃ that has component y*_0 (where y*_0 = y_0^opt by Theorem 4.18). Note by definition that Γ̃ is in the closure of Γ. If Γ is closed, then [(y*_l), (e*_j), (a*_k), (b*_k)] ∈ Γ, and so there exists an ω-only policy α*(t) that achieves the averages [(y*_l), (e*_j), (a*_k), (b*_k)] and thus satisfies (4.36)-(4.39) with δ = 0. If Γ is not closed, then [(y*_l), (e*_j), (a*_k), (b*_k)] is a limit point of Γ, and so there is an ω-only policy that gets arbitrarily close to [(y*_l), (e*_j), (a*_k), (b*_k)], yielding (4.36)-(4.39) for any δ > 0. □
The above proof shows that if the assumptions of Theorem 4.5 hold and if the set Γ is closed, then an ω-only policy exists that satisfies the inequalities (4.36)-(4.39) with δ = 0.
C H A P T E R 5
Optimizing Functions of Time
Averages
Here we use the drift-plus-penalty technique to develop methods for optimizing convex functions of time averages, and for finding local optima of non-convex functions of time averages. To begin, consider a discrete time queueing system Q(t) = (Q_1(t), . . . , Q_K(t)) with the standard update equation:

Q_k(t+1) = max[Q_k(t) − b_k(t), 0] + a_k(t)   (5.1)
Let x(t) = (x_1(t), . . . , x_M(t)) and y(t) = (y_1(t), . . . , y_L(t)) be attribute vectors. As before, the arrival, service, and attribute variables are determined by general functions a_k(t) = â_k(α(t), ω(t)), b_k(t) = b̂_k(α(t), ω(t)), x_m(t) = x̂_m(α(t), ω(t)), and y_l(t) = ŷ_l(α(t), ω(t)). Consider now the following problem:
Maximize:  φ(x̄)   (5.2)
Subject to: 1) ȳ_l ≤ 0 ∀l ∈ {1, . . . , L}   (5.3)
            2) All queues Q_k(t) are mean rate stable   (5.4)
            3) α(t) ∈ A_{ω(t)} ∀t   (5.5)
where φ(x) is a concave, continuous, and entrywise non-decreasing utility function defined over an appropriate region of R^M (such as the non-negative orthant when the x_m(t) attributes are non-negative, or all of R^M otherwise). A more general problem, without the entrywise non-decreasing assumption, is considered in Section 5.4.
Problems with the structure (5.2)-(5.5) arise, for example, when maximizing network throughput-utility, where x̄ represents a vector of achieved throughput and φ(x̄) is a concave function that measures network fairness. An example utility function that is useful when attributes x_m(t) are non-negative is:

φ(x) = Σ_{m=1}^{M} log(1 + ν_m x_m)   (5.6)
where ν_m are positive constants. This is useful because each component function log(1 + ν_m x_m) has a diminishing returns property as x_m is increased, has maximum derivative ν_m, and is 0 when x_m = 0. Another common example is:

φ(x) = Σ_{m=1}^{M} log(x_m)   (5.7)
This corresponds to the proportional fairness objective (1)(2)(5). The function φ(x) does not need to be differentiable. An example non-differentiable function that is concave, continuous, and entrywise non-decreasing is φ(x) = min[x_1, x_2, . . . , x_M].
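The diminishing-returns property of each component of (5.6) can be illustrated numerically. The snippet below is a sketch with the illustrative choice ν_m = 1 and integer throughput values (neither is from the text):

```python
import math

# Component utility from (5.6); nu = 1 is an illustrative choice.
def phi_m(x, nu=1.0):
    return math.log(1.0 + nu * x)

# Marginal utility of one extra unit of throughput at x = 0, 1, 2, 3.
# The gains shrink as x grows, which is what rewards fair allocations.
gains = [phi_m(x + 1.0) - phi_m(x) for x in range(4)]
print(gains)
```

The first gain equals log 2, and each subsequent unit of throughput adds strictly less utility, so an optimizer gains more by serving under-served sessions.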
The problem (5.2)-(5.5) is different from all of the problems seen in Chapter 4 because it involves a function of a time average. It does not conform to the structure required for the drift-plus-penalty framework of Chapter 4 unless the function φ(x) is linear, because a linear function of a time average is equal to the time average of the linear function. In the case when φ(x) is concave but nonlinear, maximizing the time average of φ(x(t)) is typically not the same as maximizing φ(x̄) (see Exercise 5.12 for a special case when it is the same). Below we transform the problem by adding a rectangle constraint and auxiliary variables in such a way that the transformed problem involves only time averages (not functions of time averages), so that the drift-plus-penalty framework of Chapter 4 can be applied. The key step in analyzing the transformed problem is Jensen's inequality.
5.0.3 THE RECTANGLE CONSTRAINT R

Define φ^opt as the maximum utility associated with the above problem, augmented with the following rectangle constraint:

x̄ ∈ R   (5.8)

where R is defined:

R ≜ {(x_1, . . . , x_M) ∈ R^M : γ_{m,min} ≤ x_m ≤ γ_{m,max} ∀m ∈ {1, . . . , M}}
where γ_{m,min} and γ_{m,max} are finite constants (we typically choose γ_{m,min} = 0 in cases when attributes x_m(t) are non-negative). This rectangle constraint is useful because it limits the x̄-vector to a bounded region, and it will ensure that the auxiliary variables we soon define are also bounded. While this x̄ ∈ R constraint may limit optimality, it is clear that φ^opt increases to the maximum utility of the problem without this constraint as the rectangle R is expanded. Further, φ^opt is exactly equal to the maximum utility of the original problem (5.2)-(5.5) whenever the rectangle R is chosen large enough to contain a time average attribute vector x̄ that is optimal for the original problem.
5.0.4 JENSEN'S INEQUALITY

Assume the concave utility function φ(x) is defined over the rectangle region x ∈ R. Let X = (X_1, . . . , X_M) be a random vector that takes values in R. Jensen's inequality for concave functions states that:

E{X} ∈ R , and E{φ(X)} ≤ φ(E{X})   (5.9)

Indeed, even though we stated Jensen's inequality in Section 1.8 in terms of convex functions f(x) with a reversed inequality E{f(X)} ≥ f(E{X}), this immediately implies (5.9) by defining f(X) = −φ(X).
Now let γ(τ) = (γ_1(τ), . . . , γ_M(τ)) be an infinite sequence of random vectors that take values in the set R for τ ∈ {0, 1, 2, . . .}. It is easy to show that Jensen's inequality for concave functions directly implies the following for all t > 0 (see Exercise 5.3):

(1/t) Σ_{τ=0}^{t−1} γ(τ) ∈ R  and  (1/t) Σ_{τ=0}^{t−1} φ(γ(τ)) ≤ φ( (1/t) Σ_{τ=0}^{t−1} γ(τ) )   (5.10)

(1/t) Σ_{τ=0}^{t−1} E{γ(τ)} ∈ R  and  (1/t) Σ_{τ=0}^{t−1} E{φ(γ(τ))} ≤ φ( (1/t) Σ_{τ=0}^{t−1} E{γ(τ)} )   (5.11)
Taking limits of (5.11) as t → ∞ yields:

γ̄ ∈ R and φ̄(γ) ≤ φ(γ̄)

where γ̄ and φ̄(γ) are defined as the following limits:

γ̄ ≜ lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{γ(τ)} ,  φ̄(γ) ≜ lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{φ(γ(τ))}   (5.12)

where we temporarily assume the above limits exist. We have used the fact that the rectangle R is a closed set to conclude that a limit of vectors in R is also in R.

In summary, whenever the limits γ̄ and φ̄(γ) exist, we can conclude by Jensen's inequality that φ(γ̄) ≥ φ̄(γ). That is, the utility function evaluated at the time average expectation γ̄ is greater than or equal to the time average expectation of φ(γ(t)).
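This conclusion can be checked numerically. The sketch below uses an assumed concave utility φ(g) = log(1 + g) and an illustrative γ(t) sequence alternating between 0 and 1 in the rectangle R = [0, 1] (both are choices for illustration, not from the text):

```python
import math

# Illustrative concave utility and an assumed gamma(t) sequence in R = [0, 1].
phi = lambda g: math.log(1.0 + g)
gamma = [float(t % 2) for t in range(1000)]   # alternates 0, 1, 0, 1, ...

avg_of_phi = sum(phi(g) for g in gamma) / len(gamma)   # time average of phi(gamma(t))
phi_of_avg = phi(sum(gamma) / len(gamma))              # phi evaluated at the time average

# Jensen for concave phi: phi of the time average dominates the average of phi.
print(avg_of_phi, "<=", phi_of_avg)
```

Here the time average of φ(γ(t)) is (log 2)/2 ≈ 0.347, while φ evaluated at the time average 1/2 is log 1.5 ≈ 0.405, consistent with φ(γ̄) ≥ φ̄(γ).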
5.0.5 AUXILIARY VARIABLES

Let γ(t) = (γ_1(t), . . . , γ_M(t)) be a vector of auxiliary variables chosen within the set R every slot. We consider the following modified problem:

Maximize:  φ̄(γ)   (5.13)
Subject to: 1) ȳ_l ≤ 0 ∀l ∈ {1, . . . , L}   (5.14)
            2) γ̄_m ≤ x̄_m ∀m ∈ {1, . . . , M}   (5.15)
            3) All queues Q_k(t) are mean rate stable   (5.16)
            4) γ(t) ∈ R ∀t   (5.17)
            5) α(t) ∈ A_{ω(t)} ∀t   (5.18)

where φ̄(γ) and γ̄ = (γ̄_1, . . . , γ̄_M) are defined in (5.12). This transformed problem involves only time averages, rather than functions of time averages, and hence can be solved with the drift-plus-penalty framework of Chapter 4. Indeed, we can define y_0(t) ≜ −φ(γ(t)), and define a new control action α′(t) = (α(t), γ(t)) subject to α′(t) ∈ [A_{ω(t)}, R].
This transformed problem (5.13)-(5.18) relates to the original problem as follows: Suppose we have an algorithm that makes decisions α*(t) and γ*(t) over time t ∈ {0, 1, 2, . . .} to solve the transformed problem. That is, assume the solution meets all constraints (5.14)-(5.18) and yields a maximum value for the objective (5.13). For simplicity, assume all limiting time average expectations x̄*, ȳ*_l, γ̄*, φ̄(γ*) exist, where φ̄(γ*) is the maximum objective value. Then:
• The decisions α*(t) produce time averages that satisfy all desired constraints of the original problem (5.2)-(5.5) (so that ȳ*_l ≤ 0 for all l and all queues Q_k(t) are mean rate stable), and the resulting time average attribute vector x̄* satisfies φ(x̄*) ≥ φ̄(γ*). This is because:

φ(x̄*) ≥ φ(γ̄*) ≥ φ̄(γ*)

where the first inequality is due to (5.15) and the entrywise non-decreasing property of φ(x), and the second inequality is Jensen's inequality.

• φ̄(γ*) ≥ φ^opt. That is, the maximum utility of the transformed problem (5.13)-(5.18) is greater than or equal to φ^opt. This is shown in Exercise 5.2.
The above two observations imply that φ(x̄*) ≥ φ^opt. Thus, designing a policy to solve the transformed problem ensures all desired constraints of the original problem (5.2)-(5.5) are satisfied while producing a utility that is at least as good as φ^opt.
5.1 SOLVING THE TRANSFORMED PROBLEM

Following the drift-plus-penalty method (using a fixed V), we enforce the constraints ȳ_l ≤ 0 and γ̄_m ≤ x̄_m of the transformed problem (5.13)-(5.18) with virtual queues Z_l(t) and G_m(t):

Z_l(t+1) = max[Z_l(t) + y_l(t), 0] , ∀l ∈ {1, . . . , L}   (5.19)
G_m(t+1) = max[G_m(t) + γ_m(t) − x_m(t), 0] , ∀m ∈ {1, . . . , M}   (5.20)
Define Θ(t) ≜ [Q(t), Z(t), G(t)], and define the Lyapunov function:

L(Θ(t)) ≜ (1/2)[ Σ_{k=1}^{K} Q_k(t)² + Σ_{l=1}^{L} Z_l(t)² + Σ_{m=1}^{M} G_m(t)² ]
Assume that ω(t) is i.i.d., and that y_l(t), x_m(t), a_k(t), b_k(t) satisfy the boundedness assumptions (4.25)-(4.28). It is easy to show the drift-plus-penalty expression satisfies:

Δ(Θ(t)) − VE{φ(γ(t)) | Θ(t)} ≤ D − VE{φ(γ(t)) | Θ(t)} + Σ_{l=1}^{L} Z_l(t)E{y_l(t) | Θ(t)} + Σ_{k=1}^{K} Q_k(t)E{a_k(t) − b_k(t) | Θ(t)} + Σ_{m=1}^{M} G_m(t)E{γ_m(t) − x_m(t) | Θ(t)}   (5.21)

where D is a finite constant related to the worst-case second moments of y_l(t), x_m(t), a_k(t), b_k(t).
A C-additive approximation chooses γ(t) ∈ R and α(t) ∈ A_{ω(t)} such that, given Θ(t), the right-hand side of (5.21) is within C of its infimum value. A 0-additive approximation thus performs the following:
• (Auxiliary Variables) For each slot t, observe G(t) and choose γ(t) to solve:

Maximize:  Vφ(γ(t)) − Σ_{m=1}^{M} G_m(t)γ_m(t)   (5.22)
Subject to:  γ_{m,min} ≤ γ_m(t) ≤ γ_{m,max} ∀m ∈ {1, . . . , M}   (5.23)
• (α(t) Decision) For each slot t, observe Θ(t) and ω(t), and choose α(t) ∈ A_{ω(t)} to minimize:

Σ_{l=1}^{L} Z_l(t)ŷ_l(α(t), ω(t)) + Σ_{k=1}^{K} Q_k(t)[â_k(α(t), ω(t)) − b̂_k(α(t), ω(t))] − Σ_{m=1}^{M} G_m(t)x̂_m(α(t), ω(t))

• (Queue Update) Update the virtual queues Z_l(t) and G_m(t) according to (5.19) and (5.20), and the actual queues Q_k(t) by (5.1).
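As a concrete sketch of the auxiliary variable step (5.22)-(5.23), consider a separable utility φ(γ) = Σ_m log(1 + ν_m γ_m), for which the maximization decouples across m and each coordinate has a closed-form solution. The function names and the numeric values of V, G_m(t), and x_m(t) below are illustrative, not from the text:

```python
def aux_var(V, G, nu, g_max):
    """Solve (5.22)-(5.23) per coordinate for phi_m(g) = log(1 + nu*g):
    maximize V*log(1 + nu*g) - G*g over 0 <= g <= g_max."""
    if G <= 0:
        return g_max                  # no backlog pressure: take the largest gamma
    g = V / G - 1.0 / nu              # stationary point: V*nu/(1 + nu*g) = G
    return min(max(g, 0.0), g_max)    # clip to the rectangle constraint

def update_G(G, gamma, x):
    """Virtual queue update (5.20)."""
    return max(G + gamma - x, 0.0)

# One illustrative slot: V = 10, backlog G_m(t) = 5, observed x_m(t) = 0.4.
gamma = aux_var(V=10.0, G=5.0, nu=1.0, g_max=5.0)   # stationary point 1.0, inside [0, 5]
G_next = update_G(5.0, gamma, x=0.4)                # backlog grows since gamma > x
print(gamma, G_next)
```

Note the tension the virtual queue encodes: a large G_m(t) pushes γ_m(t) down in (5.22), while γ_m(t) > x_m(t) pushes G_m(t) up in (5.20), driving the time average of γ_m toward that of x_m.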
Define time average expectations x̄(t), γ̄(t), ȳ_l(t) by:

x̄(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{x(τ)} ,  γ̄(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{γ(τ)} ,  ȳ_l(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{y_l(τ)}   (5.24)
Define φ_max as an upper bound on φ(γ(t)) for all t, and assume it is finite:

φ_max ≜ φ(γ_{1,max}, γ_{2,max}, . . . , γ_{M,max}) < ∞   (5.25)
Theorem 5.1 Suppose the boundedness assumptions (4.25)-(4.28) and (5.25) hold, the function φ(x) is continuous, concave, and entrywise non-decreasing, the problem (5.2)-(5.5), (5.8) (including the constraint x̄ ∈ R) is feasible, and E{L(Θ(0))} < ∞. If ω(t) is i.i.d. over slots and any C-additive approximation is used every slot, then all actual and virtual queues are mean rate stable and:

liminf_{t→∞} φ(x̄(t)) ≥ φ^opt − (D + C)/V   (5.26)
limsup_{t→∞} ȳ_l(t) ≤ 0 , ∀l ∈ {1, . . . , L}   (5.27)

where φ^opt is the maximum utility of the problem (5.2)-(5.5), (5.8) (including the constraint x̄ ∈ R), and x̄(t), ȳ_l(t) are defined in (5.24).
The following extended result provides average queue bounds and utility bounds for all slots t.
Theorem 5.2 Suppose the assumptions of Theorem 5.1 hold.

(a) If there is an ε > 0, an ω-only policy α*(t), and a finite constant φ′ such that the following Slater-type conditions hold:

E{ŷ_l(α*(t), ω(t))} ≤ 0  ∀l ∈ {1, . . . , L}   (5.28)
E{â_k(α*(t), ω(t)) − b̂_k(α*(t), ω(t))} ≤ −ε  ∀k ∈ {1, . . . , K}   (5.29)
γ_{m,min} ≤ E{x̂_m(α*(t), ω(t))} ≤ γ_{m,max}  ∀m ∈ {1, . . . , M}   (5.30)
φ(E{x̂(α*(t), ω(t))}) = φ′   (5.31)
then all queues Q_k(t) are strongly stable and for all t > 0, we have:

(1/t) Σ_{τ=0}^{t−1} Σ_{k=1}^{K} E{Q_k(τ)} ≤ (D + C + V(φ̄(γ*) − φ′))/ε + E{L(Θ(0))}/(εt)

where φ̄(γ*) is the maximum objective function value for the transformed problem (5.13)-(5.18).
(b) If all virtual and actual queues are initially empty (so that Θ(0) = 0) and if there are finite constants ν_m ≥ 0 such that for all γ(t) and all x(t), we have:

φ(γ(t)) − φ(x(t)) ≤ Σ_{m=1}^{M} ν_m |γ_m(t) − x_m(t)|   (5.32)

then for all t > 0, we have:

φ(x̄(t)) ≥ φ^opt − (D + C)/V − Σ_{m=1}^{M} ν_m E{G_m(t)}/t   (5.33)

where E{G_m(t)}/t is O(1/√t) for all m ∈ {1, . . . , M}.
The assumption that all queues are initially empty, made in part (b) of the above theorem, is made only for convenience. Otherwise, the right-hand side of (5.33) would be modified by subtracting the additional term E{L(Θ(0))}/(Vt). We note that the ν_m constraint (5.32) needed in part (b) of the above theorem is satisfied for the example utility function in (5.6), but not for the proportionally fair utility function in (5.7). Further, the algorithm developed in this section (or C-additive approximations of the algorithm) often results in deterministically bounded queues, regardless of whether or not the Slater assumptions (5.28)-(5.31) hold (see flow control examples in Sections 5.2-5.3 and Exercises 5.5-5.7). For example, it can be shown that if (5.32) holds, if γ(t) is chosen by (5.22)-(5.23), and if x_m(t) ≥ γ_{m,min} for all t, then G_m(t) ≤ Vν_m + γ_{m,max} for all t (provided this holds at t = 0). In this case, E{G_m(t)}/t is O(1/t), better than the O(1/√t) bound given in the above theorem. As before, the same algorithm can be shown to perform efficiently when the ω(t) process is non-i.i.d. (38)(39)(136)(42). This is because the auxiliary variables transform the problem to a structure that is the same as that covered by the ergodic theory and universal scheduling theory of Section 4.9.
Proof. (Theorem 5.1) Because the C-additive approximation comes within C of minimizing the right-hand side of (5.21), we have:

Δ(Θ(t)) − VE{φ(γ(t)) | Θ(t)} ≤ D + C − Vφ(γ*) + Σ_{l=1}^{L} Z_l(t)E{y*_l(t) | Θ(t)} + Σ_{k=1}^{K} Q_k(t)E{a*_k(t) − b*_k(t) | Θ(t)} + Σ_{m=1}^{M} G_m(t)E{γ*_m − x*_m(t) | Θ(t)}   (5.34)
where γ* = (γ*_1, . . . , γ*_M) is any vector in R, and y*_l(t), a*_k(t), b*_k(t), x*_m(t) are from any alternative (possibly randomized) policy α*(t) ∈ A_{ω(t)}. Now note that feasibility of the problem (5.2)-(5.5), (5.8) implies feasibility of the transformed problem (5.13)-(5.18).¹ This together with Theorem 4.5 implies that for any δ > 0, there is an ω-only policy α*(t) ∈ A_{ω(t)} and a vector γ* ∈ R such that:

−φ(γ*) ≤ −φ^opt + δ
E{ŷ_l(α*(t), ω(t))} ≤ δ  ∀l ∈ {1, . . . , L}
E{â_k(α*(t), ω(t)) − b̂_k(α*(t), ω(t))} ≤ δ  ∀k ∈ {1, . . . , K}
E{γ*_m − x̂_m(α*(t), ω(t))} ≤ δ  ∀m ∈ {1, . . . , M}
Assuming that δ = 0 for convenience and plugging the above into (5.34) gives:²

Δ(Θ(t)) − VE{φ(γ(t)) | Θ(t)} ≤ D + C − Vφ^opt   (5.35)

This is in the exact form for application of the Lyapunov Optimization Theorem (Theorem 4.2), and hence by that theorem (or, equivalently, by using iterated expectations and telescoping sums in the above inequality), for all t > 0, we have:

(1/t) Σ_{τ=0}^{t−1} E{φ(γ(τ))} ≥ φ^opt − (D + C)/V − E{L(Θ(0))}/(Vt)

By Jensen's inequality for the concave function φ(γ), we have for all t > 0:

φ(γ̄(t)) ≥ φ^opt − (D + C)/V − E{L(Θ(0))}/(Vt)   (5.36)

Taking a liminf of both sides yields:

liminf_{t→∞} φ(γ̄(t)) ≥ φ^opt − (D + C)/V   (5.37)

On the other hand, rearranging (5.35) yields:

Δ(Θ(t)) ≤ D + C + V(φ_max − φ^opt)

Thus, by the Lyapunov Drift Theorem (Theorem 4.1), we know that all queues Q_k(t), Z_l(t), G_m(t) are mean rate stable (in fact, we know that E{Q_k(t)}/t, E{G_m(t)}/t, and E{Z_l(t)}/t are O(1/√t)). Mean rate stability of Z_l(t) and G_m(t) together with Theorem 2.5 implies that (5.27) holds, and that for all m ∈ {1, . . . , M}:

limsup_{t→∞} [γ̄_m(t) − x̄_m(t)] ≤ 0

Using this with the continuity and entrywise non-decreasing properties of φ(x), it can be shown that:

liminf_{t→∞} φ(γ̄(t)) ≤ liminf_{t→∞} φ(x̄(t))

Using this in (5.37) proves (5.26). □
¹To see this, the transformed problem can just use the same α(t) decisions, and it can choose γ(t) = x̄ for all t.
²The same can be derived using δ > 0 and then taking a limit as δ → 0.
Proof. (Theorem 5.2) We first prove part (b). We have:

φ(γ̄(t)) = φ(x̄(t) + [γ̄(t) − x̄(t)])
         ≤ φ(x̄(t) + max[γ̄(t) − x̄(t), 0])   (5.38)
         ≤ φ(x̄(t)) + Σ_{m=1}^{M} ν_m max[γ̄_m(t) − x̄_m(t), 0]   (5.39)

where (5.38) follows by the entrywise non-decreasing property of φ(x) (where the max[·] represents an entrywise max), and (5.39) follows from (5.32). Substituting this into (5.36) and using E{L(Θ(0))} = 0 yields:

φ(x̄(t)) ≥ φ^opt − (D + C)/V − Σ_{m=1}^{M} ν_m max[γ̄_m(t) − x̄_m(t), 0]   (5.40)
By definition of G_m(t) in (5.20) and the sample path queue property (2.5), together with the fact that G_m(0) = 0, we have for all m ∈ {1, . . . , M} and any t > 0:

G_m(t)/t ≥ (1/t) Σ_{τ=0}^{t−1} γ_m(τ) − (1/t) Σ_{τ=0}^{t−1} x_m(τ)

Taking expectations above yields for all t > 0:

E{G_m(t)}/t ≥ γ̄_m(t) − x̄_m(t)  ⇒  E{G_m(t)}/t ≥ max[γ̄_m(t) − x̄_m(t), 0]
Using this in (5.40) proves part (b) of the theorem.

To prove part (a), we plug the ω-only policy α*(t) from (5.28)-(5.31) (using γ*(t) = E{x̂(α*(t), ω(t))}) into (5.34). This directly leads to a version of part (a) of the theorem with φ̄(γ*) replaced by φ_max. A more detailed analysis shows this can be replaced with φ̄(γ*) because all constraints of the transformed problem are satisfied, and so the limsup time average objective can be no bigger than φ̄(γ*) (recall (4.96) of Theorem 4.18). □
5.2 A FLOW-BASED NETWORK MODEL

Here we apply the stochastic utility maximization framework to a simple flow-based network model, where we neglect the actual network queueing and develop a flow control policy that simply ensures the flow rate over each link is no more than the link capacity (similar to the flow-based models for internet and wireless systems in (2)(23)(29)(149)(150)). Section 5.3 treats a more extensive network model that explicitly accounts for all queues.
Suppose there are N nodes and L links, where each link l ∈ {1, . . . , L} has a possibly time-varying link capacity b_l(t), for slotted time t ∈ {0, 1, 2, . . .}. Suppose there are M sessions, and let A_m(t) represent the new arrivals to session m on slot t. Each session m ∈ {1, . . . , M} has a particular source node and a particular destination node. The random network event ω(t) is thus:

ω(t) ≜ [(b_1(t), . . . , b_L(t)); (A_1(t), . . . , A_M(t))]   (5.41)
The control action taken every slot is to first choose x_m(t), the amount of type-m traffic admitted into the network on slot t, according to:

0 ≤ x_m(t) ≤ A_m(t)  ∀m ∈ {1, . . . , M}, ∀t   (5.42)
The constraint (5.42) is just one example of a flow control constraint. We can easily modify it to the constraint x_m(t) ∈ {0, A_m(t)}, which either admits all newly arriving data or drops all of it. Alternatively, the flow controller could place all non-admitted data into a transport layer storage reservoir (rather than dropping it), as in (18)(22)(19)(17) (see also Section 5.6). One can model a network where all sources always have data to send by A_m(t) = γ_{m,max} for all t, for some finite value γ_{m,max} used to limit the amount of data admitted to the network on any slot.
Next, we must specify a path for the newly arriving data from a collection of paths P_m associated with the path options of session m on slot t (possibly the set of all possible paths in the network from the source of session m to its destination). Here, a path is defined in the usual sense, being a sequence of links starting at the source, ending at the destination, and such that the end node of each link is the start node of the next link. Let 1_{l,m}(t) be an indicator variable that is 1 if the data x_m(t) is selected to use a path that contains link l, and 0 otherwise. The (1_{l,m}(t)) values completely specify the chosen paths for slot t, and hence the decision variable for slot t is given by:

α(t) ≜ [(x_1(t), . . . , x_M(t)); (1_{l,m}(t))_{l∈{1,...,L}, m∈{1,...,M}}]
Let x̄ = (x̄_1, . . . , x̄_M) be a vector of the infinite horizon time average admitted flow rates. Let φ(x̄) = Σ_{m=1}^{M} φ_m(x̄_m) be a separable utility function, where each φ_m(x) is a continuous, concave, non-decreasing function of x. Our goal is to maximize the throughput-utility φ(x̄) subject to the constraint that the time average flow over each link l is less than or equal to the time average capacity of that link. The infinite horizon utility optimization problem of interest is thus:

Maximize:  Σ_{m=1}^{M} φ_m(x̄_m)   (5.43)
Subject to:  Σ_{m=1}^{M} \overline{1_{l,m} x_m} ≤ b̄_l  ∀l ∈ {1, . . . , L}   (5.44)
             0 ≤ x_m(t) ≤ A_m(t) , (1_{l,m}(t)) ∈ P_m  ∀m ∈ {1, . . . , M}, ∀t   (5.45)
where the time averages are defined:

x̄_m ≜ lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{x_m(τ)}
\overline{1_{l,m} x_m} ≜ lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{1_{l,m}(τ)x_m(τ)}
b̄_l ≜ lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{b_l(τ)}
We emphasize that while the actual network can queue data at each link l, we are not explicitly accounting for such queueing dynamics. Rather, we are only ensuring that the time average flow rate on each link l satisfies (5.44).
Define $\phi^{opt}$ as the maximum utility associated with the above problem and subject to the additional constraint that:

$$0 \le \overline{x}_m \le \gamma_{m,max} \quad \forall m \in \{1, \ldots, M\} \quad (5.46)$$

for some finite values $\gamma_{m,max}$. This fits the framework of the utility maximization problem (5.2)-(5.5) with $y_l(t) \triangleq \sum_{m=1}^{M} 1_{l,m}(t) x_m(t) - b_l(t)$, $K = 0$, and with $\mathcal{R}$ being all $\gamma$ vectors that satisfy $0 \le \gamma_m \le \gamma_{m,max}$ for all $m \in \{1, \ldots, M\}$ (we choose $\gamma_{m,min} = 0$ because attributes $x_m(t)$ are nonnegative). As there are no actual queues $Q_k(t)$ in this model, we use only virtual queues $Z_l(t)$ and $G_m(t)$, defined by update equations:
$$Z_l(t+1) = \max\Big[Z_l(t) + \sum_{m=1}^{M} 1_{l,m}(t) x_m(t) - b_l(t),\ 0\Big] \quad (5.47)$$
$$G_m(t+1) = \max[G_m(t) + \gamma_m(t) - x_m(t),\ 0] \quad (5.48)$$

where $\gamma_m(t)$ are auxiliary variables for $m \in \{1, \ldots, M\}$. The algorithm given in Section 5.0.5 thus reduces to:
reduces to:
• (Auxiliary Variables) Every slot $t$, each session $m \in \{1, \ldots, M\}$ observes $G_m(t)$ and chooses $\gamma_m(t)$ as the solution to:

Maximize: $V\phi_m(\gamma_m(t)) - G_m(t)\gamma_m(t)$  (5.49)

Subject to: $0 \le \gamma_m(t) \le \gamma_{m,max}$  (5.50)
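The one-dimensional subproblem (5.49)-(5.50) typically has a closed-form solution. As a hypothetical illustration (the utility choice $\phi_m(\gamma) = \log(1+\gamma)$ is our assumption, not specified in the text), the maximizer is found by setting the derivative to zero and clipping to the feasible interval:

```python
import numpy as np

def aux_variable(G_m, V, gamma_max):
    """Maximize V*log(1+gamma) - G_m*gamma over 0 <= gamma <= gamma_max.

    Setting the derivative V/(1+gamma) - G_m to zero gives
    gamma = V/G_m - 1, which is then clipped to [0, gamma_max].
    (Assumes phi_m(gamma) = log(1+gamma); other concave utilities
    give different closed forms.)
    """
    if G_m <= 0:
        return gamma_max  # objective is nondecreasing when G_m = 0
    return float(np.clip(V / G_m - 1.0, 0.0, gamma_max))
```

Note that this closed form is consistent with the bound discussed later: here $\nu_m = \phi_m'(0) = 1$, and whenever $G_m(t) > V\nu_m = V$ the clipped solution is $\gamma_m(t) = 0$.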
• (Routing and Flow Control) For each slot $t$ and each session $m \in \{1, \ldots, M\}$, observe the new arrivals $A_m(t)$, the virtual queue backlogs $G_m(t)$, and the link queues $Z_l(t)$, and choose $x_m(t)$ and a path to solve:

Maximize: $x_m(t)G_m(t) - x_m(t)\sum_{l=1}^{L} 1_{l,m}(t) Z_l(t)$

Subject to: $0 \le x_m(t) \le A_m(t)$; the path specified by $(1_{l,m}(t))$ is in $\mathcal{P}_m$
This reduces to the following: First find a shortest path from the source of session $m$ to the destination of session $m$, using the link weights $Z_l(t)$ as link costs. If the total weight of the shortest path is less than or equal to $G_m(t)$, choose $x_m(t) = A_m(t)$ and route this data over this single shortest path. Else, there is too much congestion in the network, and so we choose $x_m(t) = 0$ (thereby dropping all data $A_m(t)$).
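The routing and flow control step just described can be sketched as follows. This is an illustrative fragment, not from the text: the graph representation, the function name, and the use of Dijkstra's algorithm for the shortest path are our assumptions, with link costs given by the virtual queues $Z_l(t)$:

```python
import heapq

def route_and_admit(graph, Z, src, dst, G_m, A_m):
    """Shortest path under link costs Z[(i, j)]; admit A_m if path cost <= G_m.

    graph: dict mapping node -> list of neighbor nodes.
    Z: dict mapping each directed link (i, j) to its virtual queue value.
    Returns (x_m, path): admitted amount and chosen path (list of nodes),
    or (0, None) when the cheapest path costs more than G_m (drop all data).
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:                          # standard Dijkstra search
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float('inf')):
            continue
        for v in graph.get(u, []):
            nd = d + Z[(u, v)]
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist or dist[dst] > G_m:
        return 0, None                   # too much congestion: drop arrivals
    path, node = [dst], dst              # reconstruct the min-cost path
    while node != src:
        node = prev[node]
        path.append(node)
    return A_m, path[::-1]
```

For example, with two candidate routes of cost 2 and 6, a threshold `G_m = 3` admits all arrivals on the cheaper route, while `G_m = 1` drops them.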
• (Virtual Queue Updates) Update the virtual queues according to (5.47) and (5.48).
The shortest path routing in this algorithm is similar to that given in (149), which treats a flow-based network stability problem under the assumption that arriving traffic is admissible (so that flow control is not used). This problem with flow control was introduced in (39) using the universal scheduling framework of Section 4.9.2, where there are no probabilistic assumptions on the arrivals or time-varying link capacities.
5.2.1 PERFORMANCE OF THE FLOW-BASED ALGORITHM
To apply Theorems 5.1 and 5.2, assume $\omega(t) \triangleq [(b_1(t), \ldots, b_L(t));\ (A_1(t), \ldots, A_M(t))]$ is i.i.d. over slots, and that the $b_l(t)$ and $A_m(t)$ processes have bounded second moments. Note that the problem (5.43)-(5.46) is trivially feasible because it is always possible to satisfy the constraints by admitting no new arrivals on any slot. Suppose we use any C-additive approximation (where a 0-additive approximation is an exact implementation of the above algorithm). It follows from Theorem 5.1 that all virtual queues are mean rate stable, and so the time average constraints (5.44) are satisfied, and the achieved utility satisfies:

$$\liminf_{t\to\infty} \phi(\overline{x}(t)) \ge \phi^{opt} - (D + C)/V \quad (5.51)$$

where $D$ is a finite constant related to the maximum second moments of $A_m(t)$ and $b_l(t)$. Thus, utility can be pushed arbitrarily close to optimal by increasing $V$.
We now show that, under some mild additional assumptions, the flow control structure of this algorithm yields tight deterministic bounds of size $O(V)$ on the virtual queues. Suppose that $A_m(t) \le A_{m,max}$ for all $t$, for some finite constant $A_{m,max}$. Further, to satisfy the constraints (5.32) needed for Theorem 5.2, assume the utility functions $\phi_m(x)$ have finite right derivatives at $x = 0$, given by constants $\nu_m \ge 0$, so that for any $x \ge y \ge 0$ we have:

$$\phi_m(x) - \phi_m(y) \le \nu_m (x - y) \quad (5.52)$$

It can be shown that if $G_m(t) > V\nu_m$, then the solution to (5.49)-(5.50) is $\gamma_m(t) = 0$ (see Exercise 5.5). Because $\gamma_m(t)$ acts as the arrival to virtual queue $G_m(t)$ defined in (5.48), it follows that $G_m(t)$ cannot increase on the next slot. Therefore, for all $m \in \{1, \ldots, M\}$:

$$0 \le G_m(t) \le V\nu_m + \gamma_{m,max} \quad \forall t \in \{0, 1, 2, \ldots\} \quad (5.53)$$

provided that this is true for $G_m(0)$ (which is indeed the case if $G_m(0) = 0$). This allows one to deterministically bound the queue sizes $Z_l(t)$ for all $l \in \{1, \ldots, L\}$:

$$0 \le Z_l(t) \le V\nu_{max} + \gamma_{max} + M A_{max} \quad \forall t \quad (5.54)$$
provided this holds at time 0, and where $\nu_{max}$, $\gamma_{max}$, $A_{max}$ are defined as the maximum of all $\nu_m$, $\gamma_{m,max}$, $A_{m,max}$ values:

$$\nu_{max} \triangleq \max_{m \in \{1,\ldots,M\}} \nu_m, \quad \gamma_{max} \triangleq \max_{m \in \{1,\ldots,M\}} \gamma_{m,max}, \quad A_{max} \triangleq \max_{m \in \{1,\ldots,M\}} A_{m,max}$$
To prove this fact, note that if a link $l$ satisfies $Z_l(t) \le V\nu_{max} + \gamma_{max}$, then on the next slot we have $Z_l(t+1) \le V\nu_{max} + \gamma_{max} + M A_{max}$, because the queue can increase by at most $M A_{max}$ on any slot (see update equation (5.47)). Else, if $Z_l(t) > V\nu_{max} + \gamma_{max}$, then any path that uses this link incurs a cost larger than $V\nu_{max} + \gamma_{max}$, and thus would incur a cost larger than $G_m(t)$ for any session $m$. Thus, by the routing and flow control algorithm, no session will choose a path that uses this link on the current slot, and so $Z_l(t)$ cannot increase on the next slot.
Using the sample path inequality (2.3) with the deterministic bound on $Z_l(t)$ in (5.54), it follows that over any interval of $T$ slots (for any positive integer $T$ and any initial slot $t_0$), the data injected for use over link $l$ is no more than $V\nu_{max} + \gamma_{max} + M A_{max}$ beyond the total capacity offered by the link over that interval:

$$\sum_{\tau=t_0}^{t_0+T-1} \sum_{m=1}^{M} 1_{l,m}(\tau) x_m(\tau) \le \sum_{\tau=t_0}^{t_0+T-1} b_l(\tau) + V\nu_{max} + \gamma_{max} + M A_{max} \quad (5.55)$$
5.2.2 DELAYED FEEDBACK
We note that it may be difficult to use the exact queue values $Z_l(t)$ when solving for the shortest path, as these values change every slot. Hence, a practical implementation may use out-of-date values $Z_l(t - \tau_{l,t})$ for some time delay $\tau_{l,t}$ that may depend on $l$ and $t$. Further, the virtual queue updates for $Z_l(t)$ in (5.47) are most easily done at each link $l$, in which case the actual admitted data $x_m(t)$ for that link may not be known until some time delay, arriving as a process $x_m(t - \tau_{l,m,t})$. However, as the virtual queue size cannot change by more than a fixed amount every slot, the queue value used differs from the ideal queue value by no more than an additive constant that is proportional to the maximum time delay. In this case, provided that the maximum time delay is bounded, we are simply using a C-additive approximation, and the utility and queue bounds are adjusted accordingly (see Exercise 4.10 and also Section 6.1.1). A more extensive treatment of delayed feedback for the case of networks without dynamic arrivals or channels is found in (150), which uses a differential equation method.
5.2.3 LIMITATIONS OF THIS MODEL
While (5.55) is a very strong deterministic bound that says no link is given more data than it can
handle, it does not directly imply anything about the actual network queues (other than the links
are not overloaded). The (unproven) understanding is that, because the links are not overloaded, the
actual network queues will be stable and all data can arrive to its destination with (hopefully small)
delay.
One might approximate average congestion or delay on a link as a convex function of the time average flow rate over the link, as in (151)(129)(150).³ However, we emphasize that this is only an approximation and does not represent the actual network delay, or even a bound on delay. Indeed, while it is known that average queue congestion and delay is convex if a general stream of traffic is probabilistically split (152), this is not necessarily true (or relevant) for dynamically controlled networks, particularly when the control depends on the queue backlogs and delays themselves. Most problems involving optimization of actual network delay are difficult and unsolved. Such problems involve not only optimization of rate-based utility functions, but engineering of the Lagrange multipliers (which are related to queue backlogs) associated with those utility functions.
Finally, observe that the update equation for $Z_l(t)$ in (5.47) can be interpreted as a queueing model where all admitted data on slot $t$ is placed immediately on all links $l$ of its path. Similar models are used in (23)(29)(150)(31). However, this is clearly an approximation, because data in an actual network will traverse its path one link at a time. It is assumed that the actual network stamps all data with its intended path, so that there is no dynamic re-routing mid-path. Section 5.3 treats an actual multi-hop queueing network and allows such dynamic routing.
5.3 MULTI-HOP QUEUEING NETWORKS
Here we consider a general multi-hop network, treating the actual queueing rather than using the flow-based model of the previous section. Suppose the network has $N$ nodes and operates in slotted time. There are $M$ sessions, and we let $A(t) = (A_1(t), \ldots, A_M(t))$ represent the vector of data that exogenously arrives to the transport layer for each session on slot $t$ (measured either in integer units of packets or real units of bits).

Each session $m \in \{1, \ldots, M\}$ has a particular source node and destination node. Data delivery takes place by transmissions over possibly multi-hop paths. We assume that a transport layer flow controller observes $A_m(t)$ every slot and decides how much of this data to add to the network layer at its source node and how much to drop (flow control decisions are made to limit queue buffers and ensure the network is stable). Let $(x_m(t))_{m=1}^{M}$ be the collection of flow control decision variables on slot $t$. These decisions are made subject to the constraints $0 \le x_m(t) \le A_m(t)$ (see also the discussion after (5.42) on modifications of this constraint).
All data that is intended for destination node $c \in \{1, \ldots, N\}$ is called commodity $c$ data, regardless of its particular session. For each $n \in \{1, \ldots, N\}$ and $c \in \{1, \ldots, N\}$, let $\mathcal{M}_n^{(c)}$ denote the set of all sessions $m \in \{1, \ldots, M\}$ that have source node $n$ and commodity $c$. All data is queued according to its commodity, and we define $Q_n^{(c)}(t)$ as the amount of commodity $c$ data in node $n$ on slot $t$. We assume that $Q_n^{(n)}(t) = 0$ for all $t$, as data that reaches its destination is removed from the network. Let $Q(t)$ denote the matrix of current queue backlogs for all nodes and commodities.

³ Convex constraints can be incorporated using the generalized structure of Section 5.4.
The queue backlogs change from slot to slot as follows:

$$Q_n^{(c)}(t+1) = Q_n^{(c)}(t) - \sum_{j=1}^{N} \tilde{\mu}_{nj}^{(c)}(t) + \sum_{i=1}^{N} \tilde{\mu}_{in}^{(c)}(t) + \sum_{m \in \mathcal{M}_n^{(c)}} x_m(t)$$
where $\tilde{\mu}_{ij}^{(c)}(t)$ denotes the actual amount of commodity $c$ data transmitted from node $i$ to node $j$ (i.e., over link $(i,j)$) on slot $t$. It is useful to define transmission decision variables $\mu_{ij}^{(c)}(t)$ as the bit rate offered by link $(i,j)$ to commodity $c$ data, where this full amount is used if there is that much commodity $c$ data available at node $i$, so that:

$$\tilde{\mu}_{ij}^{(c)}(t) \le \mu_{ij}^{(c)}(t) \quad \forall i, j, c \in \{1, \ldots, N\}, \ \forall t$$

For simplicity, we assume that if there is not enough data to send at the offered rate, then null data is sent, so that:⁴

$$Q_n^{(c)}(t+1) = \max\Big[Q_n^{(c)}(t) - \sum_{j=1}^{N} \mu_{nj}^{(c)}(t),\ 0\Big] + \sum_{i=1}^{N} \mu_{in}^{(c)}(t) + \sum_{m \in \mathcal{M}_n^{(c)}} x_m(t) \quad (5.56)$$
This satisfies (5.1) if we relate index $k$ (for $Q_k(t)$ in (5.1)) to index $(n, c)$ (for $Q_n^{(c)}(t)$ in (5.56)), and if we define:

$$b_n^{(c)}(t) \triangleq \sum_{j=1}^{N} \mu_{nj}^{(c)}(t), \quad a_n^{(c)}(t) \triangleq \sum_{i=1}^{N} \mu_{in}^{(c)}(t) + \sum_{m \in \mathcal{M}_n^{(c)}} x_m(t)$$
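As a concrete sketch of update (5.56), the fragment below steps all $(n, c)$ queues at once. The array layout (an $N \times N$ node-by-commodity matrix for $Q$, an $N \times N \times N$ tensor for $\mu$) and the function name are assumptions of this illustration, not notation from the text:

```python
import numpy as np

def update_queues(Q, mu, x_sessions):
    """One step of update (5.56) for all (n, c) pairs.

    Q:  (N, N) array with Q[n, c] = commodity-c backlog at node n.
    mu: (N, N, N) array with mu[i, j, c] = rate offered to commodity c
        on link (i, j) this slot.
    x_sessions: iterable of (source_node, commodity, admitted_amount)
        triples; their sum per (n, c) gives the exogenous arrival term.
    """
    out_rate = mu.sum(axis=1)   # out_rate[n, c]: total offered out of node n
    in_rate = mu.sum(axis=0)    # in_rate[n, c]: total offered into node n
    Q_next = np.maximum(Q - out_rate, 0.0) + in_rate
    for n, c, x in x_sessions:
        Q_next[n, c] += x       # admitted data enters at its source node
    np.fill_diagonal(Q_next, 0.0)  # Q_n^(n) = 0: delivered data leaves
    return Q_next
```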
5.3.1 TRANSMISSION VARIABLES
Let $S(t)$ represent the topology state of the network on slot $t$, observed on each slot $t$ as in (22). The value of $S(t)$ is an abstract and possibly multi-dimensional quantity that describes the current link conditions between all nodes on the current slot. The collection of all transmission rates that can be offered over each link $(i,j)$ of the network is given by a general transmission rate function $b(I(t), S(t))$:⁵

$$b(I(t), S(t)) = \big(b_{ij}(I(t), S(t))\big)_{i,j \in \{1,\ldots,N\},\, i \ne j}$$

where $I(t)$ is a general network-wide resource allocation decision (such as link scheduling, bandwidth selection, modulation, etc.) and takes values in some abstract set $\mathcal{I}_{S(t)}$ that possibly depends on the current $S(t)$.
⁴ All results hold exactly as stated if this null data is not sent, so that "=" in (5.56) is modified to "≤" (22).
⁵ It is worth noting now that for networks with orthogonal channels, our "max-weight" transmission algorithm (to be defined in the next subsection) decouples to allow nodes to make transmission decisions based only on those components of the current topology state $S(t)$ that relate to their own local channels. Of course, for wireless interference networks, all channels are coupled, although distributed approximations of max-weight transmission exist in this case (see Chapter 6).
Every slot, the network controller observes the current $S(t)$ and makes a resource allocation decision $I(t) \in \mathcal{I}_{S(t)}$. The controller then chooses the $\mu_{ij}^{(c)}(t)$ variables subject to the following constraints:

$$\mu_{ij}^{(c)}(t) \ge 0 \quad \forall i, j, c \in \{1, \ldots, N\} \quad (5.57)$$
$$\mu_{ii}^{(c)}(t) = \mu_{ij}^{(i)}(t) = 0 \quad \forall i, j, c \in \{1, \ldots, N\} \quad (5.58)$$
$$\sum_{c=1}^{N} \mu_{ij}^{(c)}(t) \le b_{ij}(I(t), S(t)) \quad \forall i, j \in \{1, \ldots, N\} \quad (5.59)$$
Constraints (5.58) are due to the common-sense observation that it makes no sense to transmit data from a node to itself, or to keep transmitting data that has already arrived at its destination. One can easily incorporate additional constraints that restrict the set of allowable links that certain commodities are allowed to use, as in (22).
5.3.2 THE UTILITY OPTIMIZATION PROBLEM
This problem fits our general framework by defining the random event $\omega(t) \triangleq [A(t); S(t)]$. The control action $\alpha(t)$ is defined by:

$$\alpha(t) \triangleq [I(t);\ (\mu_{ij}^{(c)}(t))_{i,j,c \in \{1,\ldots,N\}};\ (x_m(t))_{m=1}^{M}]$$

representing the resource allocation, transmission, and flow control decisions. The action space $\mathcal{A}_{\omega(t)}$ is defined by the set of all $I(t) \in \mathcal{I}_{S(t)}$, all $(\mu_{ij}^{(c)}(t))$ that satisfy (5.57)-(5.59), and all $(x_m(t))$ that satisfy $0 \le x_m(t) \le A_m(t)$ for all $m \in \{1, \ldots, M\}$.
Define $\overline{x}$ as the time average expectation of the vector $x(t)$. Our objective is to solve the following problem:

Maximize: $\phi(\overline{x})$  (5.60)

Subject to: $\alpha(t) \in \mathcal{A}_{\omega(t)} \quad \forall t$  (5.61)

All queues $Q_n^{(c)}(t)$ are mean rate stable  (5.62)

where $\phi(\overline{x}) = \sum_{m=1}^{M} \phi_m(\overline{x}_m)$ is a continuous, concave, and entrywise nondecreasing utility function.
5.3.3 MULTI-HOP NETWORK UTILITY MAXIMIZATION
The rectangle $\mathcal{R}$ is defined by all $(\gamma_1, \ldots, \gamma_M)$ vectors such that $0 \le \gamma_m \le \gamma_{m,max}$. Define $\phi^{opt}$ as the maximum utility for the problem (5.60)-(5.62) augmented with the additional constraint $\overline{x} \in \mathcal{R}$. Because we have not specified any additional constraints, there are no $Z_l(t)$ queues. However, we have auxiliary variables $\gamma_m(t)$ and virtual queues $G_m(t)$ for $m \in \{1, \ldots, M\}$, with update:

$$G_m(t+1) = \max[G_m(t) + \gamma_m(t) - x_m(t),\ 0] \quad (5.63)$$
The algorithm of Section 5.0.5 is thus:
• (Auxiliary Variables) For each slot $t$, each session $m \in \{1, \ldots, M\}$ observes the current virtual queue $G_m(t)$ and chooses auxiliary variable $\gamma_m(t)$ to solve:

Maximize: $V\phi_m(\gamma_m(t)) - G_m(t)\gamma_m(t)$  (5.64)

Subject to: $0 \le \gamma_m(t) \le \gamma_{m,max}$
• (Flow Control) For each slot $t$, each session $m$ observes $A_m(t)$ and the queue values $G_m(t)$, $Q_{n_m}^{(c_m)}(t)$ (where $n_m$ denotes the source node of session $m$, and $c_m$ represents its destination). Note that these queues are all local to the source node of the session, and hence they can be observed easily. It then chooses $x_m(t)$ to solve:

Maximize: $G_m(t)x_m(t) - Q_{n_m}^{(c_m)}(t)x_m(t)$  (5.65)

Subject to: $0 \le x_m(t) \le A_m(t)$

This reduces to the "bang-bang" flow control decision of choosing $x_m(t) = A_m(t)$ if $Q_{n_m}^{(c_m)}(t) \le G_m(t)$, and $x_m(t) = 0$ otherwise.
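The bang-bang decision, together with the virtual queue update (5.63), fits in a few lines. This is an illustrative sketch with assumed variable names (not notation prescribed by the text):

```python
def flow_control_step(G_m, Q_source, A_m, gamma_m):
    """One slot of flow control for session m.

    Admits x_m = A_m if the source backlog Q_{n_m}^{(c_m)} is at most
    G_m, else admits nothing; then applies virtual queue update (5.63).
    Returns (x_m, G_m_next).
    """
    x_m = A_m if Q_source <= G_m else 0.0
    G_m_next = max(G_m + gamma_m - x_m, 0.0)
    return x_m, G_m_next
```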
• (Resource Allocation and Transmission) For each slot $t$, the network controller observes the queue backlogs $\{Q_n^{(c)}(t)\}$ and the topology state $S(t)$ and chooses $I(t) \in \mathcal{I}_{S(t)}$ and $\{\mu_{ij}^{(c)}(t)\}$ to solve:

Maximize: $\sum_{n,c} Q_n^{(c)}(t)\Big[\sum_{j=1}^{N} \mu_{nj}^{(c)}(t) - \sum_{i=1}^{N} \mu_{in}^{(c)}(t)\Big]$  (5.66)

Subject to: $I(t) \in \mathcal{I}_{S(t)}$ and (5.57)-(5.59)
• (Queue Updates) Update the virtual queues $G_m(t)$ according to (5.63) and the actual queues $Q_n^{(c)}(t)$ according to (5.56).
The resource allocation and transmission decisions that solve (5.66) are described in Subsection 5.3.4 below. Before covering this, we state the performance of the algorithm under a general C-additive approximation. Assuming that second moments of arrival and service variables are finite, and that $\omega(t)$ is i.i.d. over slots, by Theorem 5.1 we have that all virtual and actual queues are mean rate stable, and:

$$\liminf_{t\to\infty} \phi(\overline{x}(t)) \ge \phi^{opt} - (D + C)/V \quad (5.67)$$

where $D$ is a constant related to the maximum second moments of arrivals and transmission rates. The queues $Q_n^{(c)}(t)$ can be shown to be strongly stable with average size $O(V)$ under an additional Slater-type condition. If the $\phi_m(x)$ functions are bounded with bounded right derivatives, it can be shown that the queues $G_m(t)$ are deterministically bounded. A slight modification of the algorithm that results in a C-additive approximation can deterministically bound all actual queues by a constant of size $O(V)$ (38)(42)(153), even without the Slater condition. The theory of Section 4.9 can be used to show that the same algorithm operates efficiently for non-i.i.d. traffic and channel processes, including processes that arise from arbitrary node mobility (38).
5.3.4 BACKPRESSURE-BASED ROUTING AND RESOURCE ALLOCATION
By switching the sums in (5.66), it is easy to show that the resource allocation and transmission maximization reduces to the following generalized "max-weight" and "backpressure" algorithms (see (7)(22)): Every slot $t$, choose $I(t) \in \mathcal{I}_{S(t)}$ to maximize:

$$\sum_{i=1}^{N} \sum_{j=1}^{N} b_{ij}(I(t), S(t))\, W_{ij}(t) \quad (5.68)$$

where the weights $W_{ij}(t)$ are defined by:

$$W_{ij}(t) \triangleq \max_{c \in \{1,\ldots,N\}} \max[W_{ij}^{(c)}(t),\ 0] \quad (5.69)$$

where the $W_{ij}^{(c)}(t)$ are differential backlogs:

$$W_{ij}^{(c)}(t) \triangleq Q_i^{(c)}(t) - Q_j^{(c)}(t)$$

The transmission decision variables are then given by:

$$\mu_{ij}^{(c)}(t) = \begin{cases} b_{ij}(I(t), S(t)) & \text{if } c = c_{ij}^{*}(t) \text{ and } W_{ij}^{(c)}(t) \ge 0 \\ 0 & \text{otherwise} \end{cases} \quad (5.70)$$

where $c_{ij}^{*}(t)$ is defined as the commodity $c \in \{1, \ldots, N\}$ that maximizes the differential backlog $W_{ij}^{(c)}(t)$ (breaking ties arbitrarily).
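The rule (5.68)-(5.70) can be sketched as follows. This is an illustrative fragment under the assumption that the resource allocation set is given as a finite list of candidate rate matrices $b_{ij}$ (that representation is ours, not the text's); it also assumes the candidate matrices have zero diagonals, consistent with (5.58):

```python
import numpy as np

def backpressure_step(Q, rate_options):
    """Choose I(t) and mu via the max-weight/backpressure rule.

    Q: (N, N) array with Q[n, c] = commodity-c backlog at node n.
    rate_options: list of (N, N) arrays, the candidate rate matrices
        b_ij(I, S(t)), one per resource allocation option I (assumed
        to have zero diagonal, so self-loops carry nothing).
    Returns (best_option_index, mu) with mu[i, j, c] as in (5.70).
    """
    N = Q.shape[0]
    # Differential backlogs W_ij^(c) = Q_i^(c) - Q_j^(c):
    W_c = Q[:, None, :] - Q[None, :, :]     # shape (N, N, N): (i, j, c)
    c_star = W_c.argmax(axis=2)             # best commodity per link
    W = np.maximum(W_c.max(axis=2), 0.0)    # link weights W_ij(t), (5.69)
    # (5.68): pick the allocation maximizing sum_ij b_ij * W_ij
    scores = [float((b * W).sum()) for b in rate_options]
    best = int(np.argmax(scores))
    b = rate_options[best]
    # (5.70): give each link's full rate to its maximizing commodity
    mu = np.zeros((N, N, N))
    for i in range(N):
        for j in range(N):
            c = c_star[i, j]
            if W_c[i, j, c] >= 0:
                mu[i, j, c] = b[i, j]
    return best, mu
```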
This backpressure approach achieves throughput optimality but, because it explores all possible routes, may incur large delay. A useful C-additive approximation that experimentally improves delay is to combine the queue differential with a shortest path estimate for each link. This is proposed in (15) as an enhancement to backpressure routing, and it is shown to perform quite well in simulations given in (154)(22) ((154) extends to networks with unreliable channels). Related work that combines shortest paths and backpressure using the drift-plus-penalty method is developed in (155) to treat maximum hop count constraints. A theory of more aggressive place-holder packets for delay improvement in backpressure is developed in (37), although the algorithm ideally requires knowledge of Lagrange multiplier information in advance. A related and very simple Last-In-First-Out (LIFO) implementation of backpressure that does not need Lagrange multiplier information is developed in (54), where experiments on wireless sensor networks show delay improvements by more than an order of magnitude over FIFO implementations (for all but 2% of the packets) while preserving efficient throughput (note that LIFO does not change the dynamics of (5.1) or (5.56)). Analysis of the LIFO rule and its connection to place-holders and Lagrange multipliers is in (55).
5.4 GENERAL OPTIMIZATION OF CONVEX FUNCTIONS OF TIME AVERAGES
Here we provide a recipe for the following more general problem of optimizing convex functions of time averages:

Minimize: $\overline{y}_0 + f(\overline{x})$  (5.71)

Subject to: 1) $\overline{y}_l + g_l(\overline{x}) \le 0 \quad \forall l \in \{1, \ldots, L\}$  (5.72)

2) $\overline{x} \in \mathcal{X} \cap \mathcal{R}$  (5.73)

3) All queues $Q_k(t)$ are mean rate stable  (5.74)

4) $\alpha(t) \in \mathcal{A}_{\omega(t)} \quad \forall t$  (5.75)

where $f(x)$ and $g_l(x)$ are continuous and convex functions of $x \in \mathbb{R}^M$, $\mathcal{X}$ is a closed and convex subset of $\mathbb{R}^M$, and $\mathcal{R}$ is an $M$-dimensional hyper-rectangle defined as:

$$\mathcal{R} \triangleq \{(x_1, \ldots, x_M) \in \mathbb{R}^M \mid \gamma_{m,min} \le x_m \le \gamma_{m,max} \ \forall m \in \{1, \ldots, M\}\}$$

where $\gamma_{m,min}$ and $\gamma_{m,max}$ are finite constants (this rectangle set $\mathcal{R}$ is only added to bound the auxiliary variables that we use, as in the previous sections).
Let $\gamma(t) = (\gamma_1(t), \ldots, \gamma_M(t))$ be a vector of auxiliary variables that can be chosen within the set $\mathcal{X} \cap \mathcal{R}$ every slot $t$. We transform the problem (5.71)-(5.75) to:

Minimize: $\overline{y}_0 + \overline{f(\gamma)}$  (5.76)

Subject to: 1) $\overline{y}_l + \overline{g_l(\gamma)} \le 0 \quad \forall l \in \{1, \ldots, L\}$  (5.77)

2) $\overline{\gamma}_m = \overline{x}_m \quad \forall m \in \{1, \ldots, M\}$  (5.78)

3) All queues $Q_k(t)$ are mean rate stable  (5.79)

4) $\gamma(t) \in \mathcal{X} \cap \mathcal{R} \quad \forall t$  (5.80)

5) $\alpha(t) \in \mathcal{A}_{\omega(t)} \quad \forall t$  (5.81)

where we define:

$$\overline{f(\gamma)} \triangleq \lim_{t\to\infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{f(\gamma(\tau))\}, \quad \overline{g_l(\gamma)} \triangleq \lim_{t\to\infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \mathbb{E}\{g_l(\gamma(\tau))\}$$

It is not difficult to show that this transformed problem is equivalent to the problem (5.71)-(5.75), in that the maximum utility values are the same, and any solution to one can be used to construct a solution to the other (see Exercise 5.9).
We solve the transformed problem (5.76)-(5.81) simply by restating the drift-plus-penalty algorithm for this context. While a variable-V implementation can be developed, we focus here on the fixed-V algorithm as specified in (4.48)-(4.49). For each inequality constraint (5.77), define a virtual queue $Z_l(t)$ with update equation:

$$Z_l(t+1) = \max[Z_l(t) + \hat{y}_l(\alpha(t), \omega(t)) + g_l(\gamma(t)),\ 0] \quad \forall l \in \{1, \ldots, L\} \quad (5.82)$$
For each equality constraint (5.78), define a virtual queue $H_m(t)$ with update equation:

$$H_m(t+1) = H_m(t) + \gamma_m(t) - \hat{x}_m(\alpha(t), \omega(t)) \quad \forall m \in \{1, \ldots, M\} \quad (5.83)$$
Define $\Theta(t) \triangleq [Q(t), Z(t), H(t)]$. Assume the boundedness assumptions (4.25)-(4.30) hold, and that $\omega(t)$ is i.i.d. over slots. For the Lyapunov function (4.43), we have the following drift bound:

$$\Delta(\Theta(t)) + V\mathbb{E}\{y_0(t) + f(\gamma(t)) \mid \Theta(t)\} \le D + V\mathbb{E}\{y_0(t) + f(\gamma(t)) \mid \Theta(t)\}$$
$$\quad + \sum_{l=1}^{L} Z_l(t)\,\mathbb{E}\{y_l(t) + g_l(\gamma(t)) \mid \Theta(t)\} + \sum_{k=1}^{K} Q_k(t)\,\mathbb{E}\{a_k(t) - b_k(t) \mid \Theta(t)\}$$
$$\quad + \sum_{m=1}^{M} H_m(t)\,\mathbb{E}\{\gamma_m(t) - x_m(t) \mid \Theta(t)\} \quad (5.84)$$

where $D$ is a finite constant related to the worst-case second moments of the arrival, service, and attribute vectors. Now define a C-additive approximation as any algorithm for choosing $\gamma(t) \in \mathcal{X} \cap \mathcal{R}$ and $\alpha(t) \in \mathcal{A}_{\omega(t)}$ every slot $t$ that, given the current $\Theta(t)$, yields a right-hand side in (5.84) that is within a distance $C$ of its infimum value.
Theorem 5.3 (Algorithm Performance) Suppose the boundedness assumptions (4.25)-(4.30) hold, the problem (5.71)-(5.75) is feasible, and $\mathbb{E}\{L(\Theta(0))\} < \infty$. Suppose the functions $f(\gamma)$ and $g_l(\gamma)$ are upper and lower bounded by finite constants over $\gamma \in \mathcal{X} \cap \mathcal{R}$. If $\omega(t)$ is i.i.d. over slots and any C-additive approximation is used every slot, then:

$$\limsup_{t\to\infty} \big[\overline{y}_0(t) + f(\overline{x}(t))\big] \le y_0^{opt} + f^{opt} + \frac{D + C}{V} \quad (5.85)$$

where $y_0^{opt} + f^{opt}$ represents the infimum cost metric of the problem (5.71)-(5.75) over all feasible policies. Further, all actual and virtual queues are mean rate stable, and:

$$\limsup_{t\to\infty} \big[\overline{y}_l(t) + g_l(\overline{x}(t))\big] \le 0 \quad \forall l \in \{1, \ldots, L\} \quad (5.86)$$
$$\lim_{t\to\infty} \mathrm{dist}(\overline{x}(t), \mathcal{X} \cap \mathcal{R}) = 0 \quad (5.87)$$

where $\mathrm{dist}(\overline{x}(t), \mathcal{X} \cap \mathcal{R})$ represents the distance between the vector $\overline{x}(t)$ and the set $\mathcal{X} \cap \mathcal{R}$, being zero if and only if $\overline{x}(t)$ is in the (closed) set $\mathcal{X} \cap \mathcal{R}$.

Proof. See Exercise 5.10. □
As before, an O(V) backlog bound can also be derived under a Slater assumption.
5.5 NON-CONVEX STOCHASTIC OPTIMIZATION
Consider now the problem:

Minimize: $f(\overline{x})$  (5.88)

Subject to: $\overline{y}_l \le 0 \quad \forall l \in \{1, \ldots, L\}$  (5.89)

$\alpha(t) \in \mathcal{A}_{\omega(t)}$  (5.90)

All queues $Q_k(t)$ are mean rate stable  (5.91)

where $f(x)$ is a possibly non-convex function that is assumed to be continuously differentiable with lower and upper bounds $f_{min}$ and $f_{max}$, and with partial derivatives $\partial f(x)/\partial x_m$ having bounded magnitudes $\nu_m \ge 0$. Applications of such problems include throughput-utility maximization with $f(x)$ given by $-1$ times a sum of non-concave "sigmoidal" functions that give low utility until throughput exceeds a certain threshold (see Fig. 5.1). Such problems are treated in a non-stochastic (static) network optimization setting in (156)(157). A related utility-proportional fairness objective is studied for static networks in (158), which treats a convex optimization problem that has a fairness interpretation with respect to a non-concave utility function. The stochastic problem we present here is developed in (43). An application to risk management in network economics is given in Exercise 5.11.
[Figure 5.1: An example non-concave utility function of a time average attribute. Axes: Utility(x) versus attribute x (such as throughput).]
Performing such a general non-convex optimization is, in some cases, as hard as combinatorial bin-packing, and so we do not expect to find a global optimum. Rather, we seek an algorithm that satisfies the constraints (5.89)-(5.91) and that yields a local optimum of $f(\overline{x})$.
We use the drift-plus-penalty framework with the same virtual queues as before:

$$Z_l(t+1) = \max[Z_l(t) + \hat{y}_l(\alpha(t), \omega(t)),\ 0] \quad (5.92)$$

The actual queues $Q_k(t)$ are assumed to satisfy (5.1). Define $\Theta(t) \triangleq [Q(t), Z(t), x_{av}(t)]$, where $x_{av}(t)$ is defined as an empirical running time average of the attribute vector:

$$x_{av,m}(t) \triangleq \begin{cases} \frac{1}{t}\sum_{\tau=0}^{t-1} x_m(\tau) & \text{if } t > 0 \\ \hat{x}_m(\alpha(-1), \omega(-1)) & \text{if } t = 0 \end{cases}$$
where $\hat{x}_m(\alpha(-1), \omega(-1))$ can be viewed as an initial sample taken at time "$t = -1$" before the network implementation begins. Define $L(\Theta(t)) \triangleq \frac{1}{2}\big[\sum_{k=1}^{K} Q_k(t)^2 + \sum_{l=1}^{L} Z_l(t)^2\big]$. Assume $\omega(t)$ is i.i.d. over slots. We thus have:

$$\Delta(\Theta(t)) + V\mathbb{E}\{\mathrm{Penalty}(t) \mid \Theta(t)\} \le D + V\mathbb{E}\{\mathrm{Penalty}(t) \mid \Theta(t)\}$$
$$\quad + \sum_{k=1}^{K} Q_k(t)\,\mathbb{E}\{\hat{a}_k(\alpha(t), \omega(t)) - \hat{b}_k(\alpha(t), \omega(t)) \mid \Theta(t)\}$$
$$\quad + \sum_{l=1}^{L} Z_l(t)\,\mathbb{E}\{\hat{y}_l(\alpha(t), \omega(t)) \mid \Theta(t)\} \quad (5.93)$$

The penalty we use is:

$$\mathrm{Penalty}(t) \triangleq \sum_{m=1}^{M} \hat{x}_m(\alpha(t), \omega(t))\, \frac{\partial f(x_{av}(t))}{\partial x_m}$$
Below we state the performance of the algorithm that observes the queue backlogs every slot $t$ and takes an action $\alpha(t) \in \mathcal{A}_{\omega(t)}$ that comes within $C$ of minimizing the right-hand side of the drift expression (5.93).
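The resulting primal-dual step can be sketched as follows: maintain the running average $x_{av}(t)$ and score each candidate action by the right-hand side of (5.93). The finite action list, the gradient callback, and all names below are assumptions of this illustration:

```python
import numpy as np

def primal_dual_step(t, x_av, Q, Z, actions, grad_f, V):
    """Pick the action minimizing the right-hand side of (5.93).

    t: current slot (t >= 1, so the running average below is defined).
    x_av: running time average attribute vector x_av(t).
    actions: list of dicts with keys 'x', 'a', 'b', 'y' giving the
        attribute, arrival, service, and constraint values each action
        would produce under the observed omega(t).
    grad_f: callable returning the gradient of f evaluated at x_av.
    """
    g = grad_f(x_av)  # penalty weights: partial derivatives at x_av(t)

    def rhs_score(act):
        return (V * float(np.dot(act['x'], g))
                + float(np.dot(Q, act['a'] - act['b']))
                + float(np.dot(Z, act['y'])))

    act = min(actions, key=rhs_score)
    Q_next = np.maximum(Q - act['b'], 0.0) + act['a']   # update (5.1)
    Z_next = np.maximum(Z + act['y'], 0.0)              # update (5.92)
    x_av_next = (t * x_av + act['x']) / (t + 1)         # running average
    return act, Q_next, Z_next, x_av_next
```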
Theorem 5.4 (Non-Convex Stochastic Network Optimization (43)) Suppose $\omega(t)$ is i.i.d. over slots, the boundedness assumptions (4.25)-(4.28) hold, the function $f(x)$ is bounded and continuously differentiable with partial derivatives bounded in magnitude by finite constants $\nu_m \ge 0$, and the problem (5.88)-(5.91) is feasible. For simplicity, assume that $\Theta(0) = 0$. For any $V \ge 0$, and for any C-additive approximation of the above algorithm that is implemented every slot, we have:

(a) All queues $Q_k(t)$ and $Z_l(t)$ are mean rate stable and:

$$\limsup_{t\to\infty} \overline{y}_l(t) \le 0 \quad \forall l \in \{1, \ldots, L\}$$

(b) For all $t > 0$ and for any alternative vector $x^*$ that can be achieved as the time average of a policy that makes all queues mean rate stable and satisfies all required constraints, we have:

$$\frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{m=1}^{M} \mathbb{E}\Big\{x_m(\tau)\frac{\partial f(x_{av}(\tau))}{\partial x_m}\Big\} \le \frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{m=1}^{M} x_m^*\, \mathbb{E}\Big\{\frac{\partial f(x_{av}(\tau))}{\partial x_m}\Big\} + \frac{D + C}{V}$$

where $D$ is a finite constant related to second moments of the $a_k(t)$, $b_k(t)$, $y_l(t)$ processes.
(c) If all time averages converge, so that there is a constant vector $\overline{x}$ such that $x_{av}(t) \to \overline{x}$ with probability 1 and $\overline{x}(t) \to \overline{x}$, then the achieved limit is a near local optimum, in the sense that for any alternative vector $x^*$ that can be achieved as the time average of a policy that makes all queues mean rate stable and satisfies all required constraints, we have:

$$\sum_{m=1}^{M} (x_m^* - \overline{x}_m)\frac{\partial f(\overline{x})}{\partial x_m} \ge -\frac{D + C}{V}$$
(d) Suppose there is an $\epsilon > 0$ and an $\omega$-only policy $\alpha^*(t)$ such that:

$$\mathbb{E}\{\hat{y}_l(\alpha^*(t), \omega(t))\} \le 0 \quad \forall l \in \{1, \ldots, L\} \quad (5.94)$$
$$\mathbb{E}\{\hat{a}_k(\alpha^*(t), \omega(t)) - \hat{b}_k(\alpha^*(t), \omega(t))\} \le -\epsilon \quad \forall k \in \{1, \ldots, K\} \quad (5.95)$$

Then all queues $Q_k(t)$ are strongly stable with average size $O(V)$.
(e) Suppose we use a variable-$V(t)$ algorithm with $V(t) \triangleq V_0 \cdot (1+t)^d$ for $V_0 > 0$ and $0 < d < 1$, and use any C-additive approximation (where $C$ is constant for all $t$). Then all virtual and actual queues are mean rate stable (and so all constraints $\overline{y}_l \le 0$ are satisfied), and under the convergence assumptions of part (c), the limiting $\overline{x}$ is a local optimum, in that:

$$\sum_{m=1}^{M} (x_m^* - \overline{x}_m)\frac{\partial f(\overline{x})}{\partial x_m} \ge 0$$

where $x^*$ is any alternative vector as specified in part (c).
That the inequality guarantee in part (e) demonstrates local optimality can be understood as follows: Suppose we start at our achieved time average attribute vector $\overline{x}$, and we want to shift this in any feasible direction by moving towards another feasible vector $x^*$ by an amount $\epsilon$ (for some $\epsilon > 0$). Then:

$$f\big(\overline{x} + \epsilon(x^* - \overline{x})\big) \approx f(\overline{x}) + \epsilon\sum_{m=1}^{M} (x_m^* - \overline{x}_m)\frac{\partial f(\overline{x})}{\partial x_m} \ge f(\overline{x})$$

Hence, the new cost achieved by taking a small step in any feasible direction is no less than the cost $f(\overline{x})$ that we are already achieving. More precisely, the change in cost $\Delta_{cost}(\epsilon)$ satisfies:

$$\lim_{\epsilon \to 0} \frac{\Delta_{cost}(\epsilon)}{\epsilon} \ge 0$$
Proof. (Theorem 5.4) Our proof uses the same drift-plus-penalty technique as described in previous sections. Analogous to Theorem 4.5, it can be shown that for any $x^* = (x_1^*, \ldots, x_M^*)$ that is a limit point of $\overline{x}(t)$ under any policy that makes all queues mean rate stable and satisfies all constraints, and for any $\delta > 0$, there exists an $\omega$-only policy $\alpha^*(t)$ such that (43):

$$\mathbb{E}\{\hat{y}_l(\alpha^*(t), \omega(t))\} \le \delta \quad \forall l \in \{1, \ldots, L\}$$
$$\mathbb{E}\{\hat{a}_k(\alpha^*(t), \omega(t)) - \hat{b}_k(\alpha^*(t), \omega(t))\} \le \delta \quad \forall k \in \{1, \ldots, K\}$$
$$\mathrm{dist}\big(\mathbb{E}\{\hat{x}(\alpha^*(t), \omega(t))\},\ x^*\big) \le \delta$$

For simplicity of the proof, assume the above holds with $\delta = 0$. Plugging the above into the right-hand side of (5.93) with $\delta = 0$ yields:⁶
$$\Delta(\Theta(t)) + V\mathbb{E}\Big\{\sum_{m=1}^{M} \hat{x}_m(\alpha(t), \omega(t))\frac{\partial f(x_{av}(t))}{\partial x_m} \,\Big|\, \Theta(t)\Big\} \le D + C + V\sum_{m=1}^{M} x_m^*\,\frac{\partial f(x_{av}(t))}{\partial x_m}$$

⁶ The same result can be derived by plugging in with $\delta > 0$ and then taking a limit as $\delta \to 0$.
Taking expectations of the above drift bound (using the law of iterated expectations), summing the telescoping series over $\tau \in \{0, 1, \ldots, t-1\}$, and dividing by $Vt$ immediately yields the result of part (b).

On the other hand, this drift expression can also be rearranged as:

$$\Delta(\Theta(t)) \le D + C + V\sum_{m=1}^{M} \nu_m(x_m^* - x_{m,min})$$

where $x_{m,min}$ is a bound on the expectation of $x_m(t)$ under any policy, known to exist by the boundedness assumptions. Hence, the drift is less than or equal to a finite constant, and so by Theorem 4.2, we know all queues are mean rate stable, proving part (a). The proof of part (d) follows similarly by plugging in the policy $\alpha^*(t)$ of (5.94)-(5.95).

The proof of part (c) follows by taking a limit of the result in part (b), where the limits can be pushed through by the boundedness assumptions and the continuity assumption on the derivatives of $f(x)$. The proof of part (e) is similar to that of Theorem 4.9 and is omitted for brevity. □
Using a penalty given by partial derivatives of the function evaluated at the empirical average attribute vector can be viewed as a “primal-dual” operation that differs from our “pure-dual” approach for convex problems. Such a primal-dual approach was first used in the context of convex network utility maximization problems in (32)(33)(34). Specifically, the work (32)(33) used a partial derivative evaluated at the time average x_av(t) to maximize a concave function of throughput in a multi-user wireless downlink with time varying channels. However, the system in (32)(33) assumed infinite backlog in all queues (similar to Exercise 5.6), so that there were no queue stability constraints. This was extended in (34) to consider the primal-dual technique for joint stability and performance optimization, again for convex problems, but using an exponentially weighted average rather than a running time average x_av(t). There, it was shown that a related “fluid limit” of the system has an optimal utility, and that this limit is “weakly” approached under appropriately scaled systems. It was also conjectured in (34) that the actual network will have utility close to this fluid limit as a parameter β related to the exponential weighting is scaled (see Section 4.9 in (34)). However, the analysis does not specify the size of β needed to achieve near-optimal utility. Recent work in (36) considers related primal-dual updates for convex problems, and it shows the long term utility of the actual network is close to optimal as a parameter is scaled.
For the special case of convex problems, Theorem 5.4 above shows that, if the algorithm is assumed to converge to well defined time averages, and if we use a running time average x_av(t) rather than an exponential average, the primal-dual algorithm achieves a similar [O(1/V), O(V)] performance-congestion tradeoff as the dual algorithm. Unfortunately, it is not clear how long the system must run to approach convergence. The pure dual algorithm seems to provide stronger analytical guarantees for convex problems because: (i) It does not need a running time average x_av(t) and hence can be shown to be robust to changes in system parameters (as in Section 4.9 and (42)(38)(17)), (ii) It does not require additional assumptions about convergence, (iii) It provides results for all t > 0 that show how long we must run the system to be close to the infinite horizon limit guarantees. However, if one applies the pure dual technique with a nonconvex cost function f(x), one would obtain a global optimum of the time average of f, which may not even be a local optimum of f(x̄). This is where the primal-dual technique shows its real potential, as it can achieve a local optimum for nonconvex problems.
5.6 WORST-CASE DELAY
Here we extend the utility optimization framework to enable O(V) tradeoffs in worst-case delay. Related problems are treated in (76)(159). Consider a 1-hop network with K queues Q(t) = (Q_1(t), ..., Q_K(t)). In addition to these queues, we keep transport layer queues L(t) = (L_1(t), ..., L_K(t)), where L_k(t) stores incoming data before it is admitted to the network layer queue Q_k(t) (as in (17)). Let ω(t) = [A(t), S(t)], where A(t) = (A_1(t), ..., A_K(t)) is a vector of new arrivals to the transport layer, and S(t) = (S_1(t), ..., S_K(t)) is a vector of channel conditions that affect transmission. Assume that ω(t) is i.i.d. over slots.
Every slot t, choose admission variables a(t) = (a_1(t), ..., a_K(t)) subject to the constraints:

0 ≤ a_k(t) ≤ min[L_k(t) + A_k(t), A_max]   (5.96)

where A_max is a finite constant. This means that a_k(t) is chosen from the L_k(t) + A_k(t) amount of data available on slot t, and is no more than A_max per slot (which limits the amount we can send into the network layer). It is assumed that A_k(t) ≤ A_max for all k and all t. Newly arriving data A_k(t) that is not immediately admitted into the network layer is stored in the transport layer queue L_k(t). The controller also chooses a channel-aware transmission decision I(t) ∈ I_{S(t)}, where I_{S(t)} is an abstract set that defines transmission options under channel state S(t). The transmission rates are given by deterministic functions of I(t) and S(t):

b_k(t) = b̂_k(I(t), S(t))

Second moments of b_k(t) are assumed to be uniformly bounded.

In addition, define packet drop decisions d(t) = (d_1(t), ..., d_K(t)). These allow packets already admitted to the network layer queues Q_k(t) to be dropped if their delay is too large. Drop decisions d_k(t) are chosen subject to the constraints:

0 ≤ d_k(t) ≤ A_max

The resulting queue update equation is thus:

Q_k(t+1) = max[Q_k(t) − b_k(t) − d_k(t), 0] + a_k(t)  ∀k ∈ {1, ..., K}   (5.97)
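The update (5.97) is simple to implement; the sketch below is a minimal single-queue illustration (the function name and the fluid scalar model are our own):

```python
def queue_update(Q_k, b_k, d_k, a_k):
    """One slot of the network-layer queue update (5.97):
    serve up to b_k, drop up to d_k, then add the newly admitted a_k."""
    return max(Q_k - b_k - d_k, 0.0) + a_k

# Example: backlog 5.0, service 2.0, drop 1.0, admission 3.0
Q_next = queue_update(5.0, 2.0, 1.0, 3.0)  # max(5-2-1, 0) + 3 = 5.0
```

Note that arrivals a_k(t) are added after service, so data admitted on slot t is first available for transmission on slot t+1, a fact used in the delay analysis below.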
For each k ∈ {1, ..., K}, let φ_k(a) be a continuous, concave, and nondecreasing utility function defined over the interval 0 ≤ a ≤ A_max. Let ν_k be the maximum right-derivative of φ_k(a) (which occurs at a = 0), and assume ν_k < ∞. Example utility functions that have this form are:

φ_k(a) = log(1 + ν_k a)
where log(·) denotes the natural logarithm. We desire a solution to the following problem, defined in terms of a parameter ε > 0:

Maximize: Σ_{k=1}^{K} φ_k(ā_k) − Σ_{k=1}^{K} βν_k d̄_k   (5.98)
Subject to: All queues Q_k(t) are mean rate stable   (5.99)
b̄_k ≥ ε ∀k ∈ {1, ..., K}   (5.100)
0 ≤ a_k(t) ≤ A_k(t) ∀k ∈ {1, ..., K}, ∀t   (5.101)
I(t) ∈ I_{S(t)} ∀t   (5.102)

where β is a constant that satisfies 1 ≤ β < ∞. This problem does not specify anything about worst-case delay, but we soon develop an algorithm with worst-case delay of O(V) that comes within O(1/V) of optimizing the utility associated with the above problem (5.98)–(5.102). Note the following:
• The constraint (5.101) is different from the constraint (5.96). Thus, the less stringent constraint (5.96) is used for the actual algorithm, but performance is measured against the optimum utility achievable in the problem (5.98)–(5.102). It turns out that the optimal utility is the same with either constraint (5.101) or (5.96), and in particular, it is the same if there are no transport layer queues, so that L_k(t) = 0 for all t and all data is either admitted or dropped upon arrival. We include the L_k(t) queues as they are useful in situations where it is preferable to store data for later transmission than to drop it.

• An optimal solution to (5.98)–(5.102) has d̄_k = 0 for all k. That is, the objective (5.98) can equivalently be replaced by the objective of maximizing Σ_{k=1}^{K} φ_k(ā_k) together with the added constraint d̄_k = 0 for all k. This is because the penalty for dropping is βν_k, which is greater than or equal to the largest derivative of the utility function φ_k(a). Thus, it can be shown that it is always better to restrict data at the transport layer than to admit it and later drop it. We recommend choosing β such that 1 ≤ β ≤ 2. A larger value of β trades packet drops at the network layer for packet non-admissions at the flow controller.

• The constraint (5.100) requires each queue to transmit with a time-average rate of at least ε. This constraint ensures all queues receive at least a minimum rate of service. If the input rate E{A_k(t)} is less than ε, then this constraint is wasteful. However, we shall not enforce this constraint. Rather, we simply measure the utility of our system against the optimal utility of the problem (5.98)–(5.102), which includes this constraint. It is assumed throughout that this constraint is feasible, and so the problem (5.98)–(5.102) is feasible. If one prefers to enforce constraint (5.100), this is easily done with an appropriate virtual queue.
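The virtual queue alluded to here follows the usual pattern for time-average rate constraints: it accumulates a deficit of ε each slot and is served by the transmissions actually delivered. A minimal sketch (function and variable names are ours, with mu_k denoting the delivered service):

```python
def virtual_rate_queue_update(H_k, mu_k, eps):
    """Virtual queue H_k(t+1) = max[H_k(t) - mu_k(t) + eps, 0].
    If H_k is mean rate stable, the time average of mu_k is at least eps,
    which enforces a constraint of the form (5.100)."""
    return max(H_k - mu_k + eps, 0.0)

H = 0.0
for mu in [0.0, 2.0, 0.0, 2.0]:   # delivered service over 4 example slots
    H = virtual_rate_queue_update(H, mu, eps=0.5)
```

The controller would then add H_k(t) to the transmission weights so that a growing deficit biases service toward queue k.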
5.6.1 THE PERSISTENT SERVICE QUEUE

To ensure worst-case delay is bounded, we define a persistent service queue, being a virtual queue Z_k(t) for each k ∈ {1, ..., K} with Z_k(0) = 0 and with dynamics:

Z_k(t+1) = max[Z_k(t) − b_k(t) − d_k(t) + ε, 0]  if Q_k(t) > b_k(t) + d_k(t)
Z_k(t+1) = 0                                     if Q_k(t) ≤ b_k(t) + d_k(t)   (5.103)

where ε > 0. We assume throughout that ε ≤ A_max. The condition Q_k(t) ≤ b_k(t) + d_k(t) is satisfied whenever the backlog Q_k(t) is cleared (by service and/or drops) on slot t. If this condition is not active, then Z_k(t) has a departure process that is the same as that of Q_k(t), but it has an arrival of size ε every slot. The size of the queue Z_k(t) can provide a bound on the delay of the head-of-line data in queue Q_k(t) in a first-in-first-out (FIFO) system. This is similar to (76) (where explicit delays are kept for each packet) and (159) (which uses a slightly different update). If a scheduling algorithm is used that ensures Z_k(t) ≤ Z_k,max and Q_k(t) ≤ Q_k,max for all t, for some finite constants Z_k,max and Q_k,max, then worst-case delay is also bounded, as shown in the following lemma:
Lemma 5.5 Suppose Q_k(t) and Z_k(t) evolve according to (5.97) and (5.103), and that an algorithm is used that ensures Q_k(t) ≤ Q_k,max and Z_k(t) ≤ Z_k,max for all slots t ∈ {0, 1, 2, ...}. Assume service and drops are done in FIFO order. Then the worst-case delay of all non-dropped data in queue k is W_k,max, defined:

W_k,max ≜ (Q_k,max + Z_k,max)/ε   (5.104)
Proof. Fix a slot t. We show that all arrivals a(t) are either served or dropped on or before slot t + W_k,max. Suppose this is not true; we reach a contradiction. Note by (5.97) that arrivals a(t) are added to the queue backlog Q_k(t+1) and are first available for service on slot t+1. It must be that Q_k(τ) > b_k(τ) + d_k(τ) for all τ ∈ {t+1, ..., t+W_k,max} (else, the backlog on slot τ would be cleared). Therefore, by (5.103), we have for all slots τ ∈ {t+1, ..., t+W_k,max}:

Z_k(τ+1) = max[Z_k(τ) − b_k(τ) − d_k(τ) + ε, 0]

In particular, for all slots τ ∈ {t+1, ..., t+W_k,max}:

Z_k(τ+1) ≥ Z_k(τ) − b_k(τ) − d_k(τ) + ε

Summing the above over τ ∈ {t+1, ..., t+W_k,max} yields:

Z_k(t+W_k,max+1) − Z_k(t+1) ≥ −Σ_{τ=t+1}^{t+W_k,max} [b_k(τ) + d_k(τ)] + εW_k,max
Rearranging terms in the above inequality and using the fact that Z_k(t+1) ≥ 0 and Z_k(t+W_k,max+1) ≤ Z_k,max yields:

εW_k,max ≤ Σ_{τ=t+1}^{t+W_k,max} [b_k(τ) + d_k(τ)] + Z_k,max   (5.105)

On the other hand, the sum of b_k(τ) + d_k(τ) over the interval τ ∈ {t+1, ..., t+W_k,max} must be strictly less than Q_k(t+1) (else, by the FIFO service, all data a(t), which is included at the end of the backlog Q_k(t+1), would have been cleared during this interval). Thus:

Σ_{τ=t+1}^{t+W_k,max} [b_k(τ) + d_k(τ)] < Q_k(t+1) ≤ Q_k,max   (5.106)

Combining (5.106) and (5.105) yields:

εW_k,max < Q_k,max + Z_k,max

which implies:

W_k,max < (Q_k,max + Z_k,max)/ε

This contradicts (5.104), proving the result. □
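Lemma 5.5 can be sanity-checked numerically. The sketch below (our own illustration, not part of the formal development) simulates one FIFO queue of unit packets with its persistent service queue (5.103) under random service and no drops, then compares the worst observed head-of-line delay against (Q_max + Z_max)/ε, where Q_max and Z_max are the largest backlogs seen on the sample path; the +1 in the check guards the integer rounding of the slot count:

```python
import random

def simulate(T=2000, eps=0.5, seed=1):
    """Single FIFO queue with the persistent service queue (5.103).
    Unit-size packets; d_k(t) = 0 (no drops). Returns the largest
    observed backlogs and the worst delay (in slots) of served packets."""
    random.seed(seed)
    fifo = []          # arrival slot of each queued packet (FIFO order)
    Z = 0.0
    Q_max, Z_max, max_delay = 0, 0.0, 0
    for t in range(T):
        Q = len(fifo)                  # backlog at start of slot t
        b = random.randint(0, 2)       # offered service this slot
        for _ in range(min(b, Q)):     # FIFO service
            max_delay = max(max_delay, t - fifo.pop(0))
        # Persistent service queue update (5.103), with d(t) = 0
        Z = max(Z - b + eps, 0.0) if Q > b else 0.0
        if random.random() < 0.6:      # Bernoulli arrival of one packet
            fifo.append(t)             # available for service on slot t+1
        Q_max = max(Q_max, len(fifo))
        Z_max = max(Z_max, Z)
    return Q_max, Z_max, max_delay

Q_max, Z_max, max_delay = simulate()
# Lemma 5.5: worst-case delay stays within (Q_max + Z_max)/eps slots
```

Here the arrival rate (0.6 packets/slot) is below the mean service rate, so the sample-path maxima stay modest and the bound is easy to inspect.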
5.6.2 THE DRIFT-PLUS-PENALTY FOR WORST-CASE DELAY

As usual, we transform the problem (5.98)–(5.102) using auxiliary variables γ(t) = (γ_1(t), ..., γ_K(t)) by:

Maximize: Σ_{k=1}^{K} φ_k(γ̄_k) − Σ_{k=1}^{K} βν_k d̄_k   (5.107)
Subject to: ā_k ≥ γ̄_k ∀k ∈ {1, ..., K}   (5.108)
All queues Q_k(t) are mean rate stable   (5.109)
b̄_k ≥ ε ∀k ∈ {1, ..., K}   (5.110)
0 ≤ γ_k(t) ≤ A_max ∀k ∈ {1, ..., K}   (5.111)
0 ≤ a_k(t) ≤ A_k(t) ∀k ∈ {1, ..., K}   (5.112)
I(t) ∈ I_{S(t)} ∀t   (5.113)

To enforce the constraints (5.108), define virtual queues G_k(t) by:

G_k(t+1) = max[G_k(t) − a_k(t) + γ_k(t), 0]   (5.114)

Now define Θ(t) ≜ [Q(t), Z(t), G(t)] as the combined queue vector, and define the Lyapunov function L(Θ(t)) by:

L(Θ(t)) ≜ (1/2) Σ_{k=1}^{K} [Q_k(t)² + Z_k(t)² + G_k(t)²]
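The Lyapunov function is just a sum of squares over the three queue vectors; a minimal sketch (list-based containers are our choice):

```python
def lyapunov(Q, Z, G):
    """L(Theta(t)) = (1/2) * sum_k [Q_k^2 + Z_k^2 + G_k^2]."""
    return 0.5 * sum(q * q + z * z + g * g for q, z, g in zip(Q, Z, G))

# Example with K = 2 queues
L_val = lyapunov([1.0, 2.0], [0.5, 0.0], [3.0, 1.0])
```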
Using the fact that Z_k(t+1) ≤ max[Z_k(t) − b_k(t) − d_k(t) + ε, 0], it can be shown (as usual) that the Lyapunov drift satisfies:

Δ(Θ(t)) − V E{ Σ_{k=1}^{K} [φ_k(γ_k(t)) − βν_k d_k(t)] | Θ(t) }
  ≤ B − V E{ Σ_{k=1}^{K} [φ_k(γ_k(t)) − βν_k d_k(t)] | Θ(t) }
  + Σ_{k=1}^{K} Z_k(t) E{ ε − b̂_k(I(t), S(t)) − d_k(t) | Θ(t) }
  + Σ_{k=1}^{K} Q_k(t) E{ a_k(t) − b̂_k(I(t), S(t)) − d_k(t) | Θ(t) }
  + Σ_{k=1}^{K} G_k(t) E{ γ_k(t) − a_k(t) | Θ(t) }   (5.115)

where B is a constant that satisfies:

B ≥ (1/2) Σ_{k=1}^{K} E{ (ε − b_k(t) − d_k(t))² | Θ(t) }
  + (1/2) Σ_{k=1}^{K} E{ a_k(t)² + (b_k(t) + d_k(t))² + (γ_k(t) − a_k(t))² | Θ(t) }   (5.116)

Such a constant B exists by the boundedness assumptions on the processes.
The algorithm that minimizes the right-hand-side of (5.115) thus observes Z(t), Q(t), G(t), S(t) every slot t, and does the following:

• (Auxiliary Variables) For each k ∈ {1, ..., K}, choose γ_k(t) to solve:

Maximize: Vφ_k(γ_k(t)) − G_k(t)γ_k(t)   (5.117)
Subject to: 0 ≤ γ_k(t) ≤ A_max   (5.118)

• (Flow Control) For each k ∈ {1, ..., K}, choose a_k(t) by:

a_k(t) = min[L_k(t) + A_k(t), A_max]  if Q_k(t) ≤ G_k(t)
a_k(t) = 0                            if Q_k(t) > G_k(t)   (5.119)

• (Transmission) Choose I(t) ∈ I_{S(t)} to maximize:

Σ_{k=1}^{K} [Q_k(t) + Z_k(t)] b̂_k(I(t), S(t))   (5.120)
• (Packet Drops) For each k ∈ {1, ..., K}, choose d_k(t) by:

d_k(t) = A_max  if Q_k(t) + Z_k(t) > βVν_k
d_k(t) = 0      if Q_k(t) + Z_k(t) ≤ βVν_k   (5.121)

• (Queue Update) Update Q_k(t), Z_k(t), G_k(t) by (5.97), (5.103), (5.114).

In some cases, the above algorithm may choose a drop variable d_k(t) such that Q_k(t) < b_k(t) + d_k(t). In this case, all queue updates are kept the same (so the algorithm is unchanged), but it is useful to first transmit data with offered rate b_k(t) on slot t, and then drop only what remains.
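One slot of the above algorithm can be sketched as follows. This is a minimal single-queue illustration with our own parameter choices; the auxiliary-variable problem (5.117)-(5.118) is solved by a crude grid search (itself a C-additive approximation in the sense used below, but one that still returns γ = 0 whenever G(t) ≥ Vν, since then Vφ(g) − Gg < 0 for all g > 0), and the transmission set I_{S(t)} is reduced to picking one of a list of offered rates:

```python
import math

V, A_max, eps, beta = 20.0, 2.0, 0.5, 1.5
nu = 1.0                                    # max right-derivative of phi
phi = lambda g: math.log(1.0 + nu * g)      # example utility from (5.98)

def slot(Q, Z, G, L_buf, A, rates):
    """One slot of the worst-case-delay algorithm for a single queue k.
    rates: offered transmission rates available under the current S(t)."""
    # Auxiliary variable (5.117)-(5.118): grid search over [0, A_max]
    grid = [i * A_max / 100.0 for i in range(101)]
    gamma = max(grid, key=lambda g: V * phi(g) - G * g)
    # Flow control (5.119)
    a = min(L_buf + A, A_max) if Q <= G else 0.0
    # Transmission (5.120): maximize (Q + Z) * b over the offered rates
    b = max(rates, key=lambda r: (Q + Z) * r)
    # Packet drops (5.121)
    d = A_max if Q + Z > beta * V * nu else 0.0
    # Queue updates (5.97), (5.103), (5.114)
    Q_new = max(Q - b - d, 0.0) + a
    Z_new = max(Z - b - d + eps, 0.0) if Q > b + d else 0.0
    G_new = max(G - a + gamma, 0.0)
    L_new = max(L_buf + A - a, 0.0)
    return Q_new, Z_new, G_new, L_new

# Run a few slots with alternating arrivals and two possible service rates
Q = Z = G = L_q = 0.0
for t in range(300):
    A = 2.0 if t % 2 == 0 else 0.0
    Q, Z, G, L_q = slot(Q, Z, G, L_q, A, rates=[0.0, 1.0])
```

Because the flow control and drop rules are implemented exactly, Theorem 5.6 below guarantees the deterministic bounds Q ≤ Q_max, Z ≤ Z_max, G ≤ G_max on every sample path of this sketch.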
5.6.3 ALGORITHM PERFORMANCE

Define Z_k,max, Q_k,max, and G_k,max as follows:

Z_k,max ≜ βVν_k + ε   (5.122)
Q_k,max ≜ min[βVν_k + A_max, Vν_k + 2A_max]   (5.123)
G_k,max ≜ Vν_k + A_max   (5.124)
Theorem 5.6 If ε ≤ A_max, then for arbitrary sample paths the above algorithm ensures:

Z_k(t) ≤ Z_k,max ,  Q_k(t) ≤ Q_k,max ,  G_k(t) ≤ G_k,max  ∀t

where Z_k,max, Q_k,max, G_k,max are defined in (5.122)–(5.124), provided that these inequalities hold for t = 0. Thus, worst-case delay W_k,max is given by:

W_k,max = (Z_k,max + Q_k,max)/ε = O(V)
Proof. That G_k(t) ≤ G_k,max for all t follows by an argument similar to that given in Section 5.2.1, showing that the auxiliary variable update (5.117)–(5.118) chooses γ_k(t) = 0 whenever G_k(t) > Vν_k.

To show the Q_k,max bound, it is clear that the packet drop decision (5.121) yields d_k(t) = A_max whenever Q_k(t) > βVν_k. Because a_k(t) ≤ A_max, the arrivals are less than or equal to the offered drops whenever Q_k(t) > βVν_k, and so Q_k(t) ≤ βVν_k + A_max for all t. However, we also see that if Q_k(t) > G_k,max, then the flow control decision will choose a_k(t) = 0, and so Q_k(t) also cannot increase. It follows that Q_k(t) ≤ G_k,max + A_max for all t. This proves the Q_k,max bound. The Z_k,max bound is proven similarly. The worst-case delay result then follows immediately from Lemma 5.5. □
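To see the O(V) scaling concretely, one can tabulate (5.122)-(5.124) and the resulting W_k,max for a few values of V. The parameter values below (ν_k = 1, β = 1.5, A_max = 2, ε = 0.5) are illustrative choices of our own:

```python
def bounds(V, nu=1.0, beta=1.5, A_max=2.0, eps=0.5):
    """Worst-case bounds (5.122)-(5.124) and the delay W_k,max of Theorem 5.6."""
    Z_max = beta * V * nu + eps                             # (5.122)
    Q_max = min(beta * V * nu + A_max, V * nu + 2 * A_max)  # (5.123)
    G_max = V * nu + A_max                                  # (5.124)
    W_max = (Z_max + Q_max) / eps                           # O(V) delay
    return Z_max, Q_max, G_max, W_max

for V in (10, 20, 40):
    print(V, bounds(V))   # doubling V roughly doubles the delay bound
```

This makes the [O(1/V), O(V)] utility-delay tradeoff explicit: pushing utility within O(1/V) of optimal (Theorem 5.7) costs a delay guarantee that grows linearly in V.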
The above theorem only uses the fact that packet drops d_k(t) take place according to the rule (5.121), flow control decisions a_k(t) take place according to the rule (5.119), and auxiliary variable decisions satisfy γ_k(t) = 0 whenever G_k(t) > Vν_k (a property of the solution to (5.117)–(5.118)). The fact that γ_k(t) = 0 whenever G_k(t) > Vν_k can be hardwired into the auxiliary variable decisions, even when they are chosen to approximately solve (5.117)–(5.118) otherwise. Further, the I(t) decisions can be arbitrary and are not necessarily those that maximize (5.120). The next theorem holds for any C-additive approximation for minimizing the right-hand-side of (5.115) that preserves the above basic properties. A 0-additive approximation performs the exact algorithm given above.
Theorem 5.7 Suppose ω(t) is i.i.d. over slots and any C-additive approximation for minimizing the right-hand-side of (5.115) is used, such that (5.121) and (5.119) hold exactly and γ_k(t) = 0 whenever G_k(t) > Vν_k. Suppose Q_k(0) ≤ Q_k,max, Z_k(0) ≤ Z_k,max, G_k(0) ≤ G_k,max for all k, and ε ≤ A_max. Then the worst-case queue backlog and delay bounds given in Theorem 5.6 hold, and the achieved utility satisfies:

liminf_{t→∞} [ Σ_{k=1}^{K} φ_k(ā_k(t)) − Σ_{k=1}^{K} βν_k d̄_k(t) ] ≥ φ∗ − (B + C)/V

where B is defined in (5.116), and ā_k(t) and d̄_k(t) are defined:

ā_k(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{a_k(τ)} ,  d̄_k(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{d_k(τ)}

and where φ∗ is the optimal utility associated with the problem (5.98)–(5.102).
The theorem relies on the following fact, which can be proven using Theorem 4.5: For all δ > 0, there exists a vector γ∗ = (γ∗_1, ..., γ∗_K) and an ω-only policy [a∗(t), I∗(t), d∗(t)] that chooses a∗(t) as a random function of A(t), I∗(t) as a random function of S(t), and d∗(t) = 0 (so that it does not drop any data) such that:

Σ_{k=1}^{K} φ_k(γ∗_k) = φ∗   (5.125)
E{a∗_k(t)} = γ∗_k ∀k ∈ {1, ..., K}   (5.126)
E{b̂_k(I∗(t), S(t))} ≥ ε − δ ∀k ∈ {1, ..., K}   (5.127)
E{b̂_k(I∗(t), S(t))} ≥ E{a∗_k(t)} − δ ∀k ∈ {1, ..., K}   (5.128)
I∗(t) ∈ I_{S(t)} ,  0 ≤ γ∗_k ≤ A_max ,  0 ≤ a∗_k(t) ≤ A_k(t) ∀k ∈ {1, ..., K}, ∀t   (5.129)

where φ∗ is the optimal utility associated with the problem (5.98)–(5.102).
Proof. (Theorem 5.7) The C-additive approximation ensures by (5.115):

Δ(Θ(t)) − V E{ Σ_{k=1}^{K} [φ_k(γ_k(t)) − βν_k d_k(t)] | Θ(t) }
  ≤ B + C − V E{ Σ_{k=1}^{K} [φ_k(γ∗_k) − βν_k d∗_k(t)] | Θ(t) }
  + Σ_{k=1}^{K} Z_k(t) E{ ε − b̂_k(I∗(t), S(t)) − d∗_k(t) | Θ(t) }
  + Σ_{k=1}^{K} Q_k(t) E{ a∗_k(t) − b̂_k(I∗(t), S(t)) − d∗_k(t) | Θ(t) }
  + Σ_{k=1}^{K} G_k(t) E{ γ∗_k − a∗_k(t) | Θ(t) }

where d∗(t), I∗(t), a∗(t) are any alternative decisions that satisfy I∗(t) ∈ I_{S(t)}, 0 ≤ d∗_k(t) ≤ A_max, and 0 ≤ a∗_k(t) ≤ min[L_k(t) + A_k(t), A_max] for all k ∈ {1, ..., K} and all t. Substituting the ω-only policy from (5.125)–(5.129) into the right-hand-side of the above inequality and taking δ → 0 yields:

Δ(Θ(t)) − V E{ Σ_{k=1}^{K} [φ_k(γ_k(t)) − βν_k d_k(t)] | Θ(t) } ≤ B + C − Vφ∗
Using iterated expectations and telescoping sums as usual yields for all t > 0:

(1/t) Σ_{τ=0}^{t−1} E{ Σ_{k=1}^{K} [φ_k(γ_k(τ)) − βν_k d_k(τ)] } ≥ φ∗ − (B + C)/V − E{L(Θ(0))}/(Vt)

Using Jensen's inequality for the concave functions φ_k(γ) yields for all t > 0:

Σ_{k=1}^{K} [φ_k(γ̄_k(t)) − βν_k d̄_k(t)] ≥ φ∗ − (B + C)/V − E{L(Θ(0))}/(Vt)   (5.130)

However, because G_k(t) ≤ G_k,max for all t, it is easy to show (via (5.114) and (2.5)) that for all k and all slots t > 0:

ā_k(t) ≥ max[γ̄_k(t) − G_k,max/t, 0]

Therefore, since φ_k(γ) is continuous and nondecreasing, it can be shown:

liminf_{t→∞} Σ_{k=1}^{K} [φ_k(ā_k(t)) − βν_k d̄_k(t)] ≥ liminf_{t→∞} Σ_{k=1}^{K} [φ_k(γ̄_k(t)) − βν_k d̄_k(t)]

Using this in (5.130) proves the result. □
Because the network layer packet drops d_k(t) are inefficient, it can be shown that:

limsup_{t→∞} Σ_{k=1}^{K} ν_k d̄_k(t) ≤ (B + C)/(V(β − 1)) + (φ∗_{ε=0} − φ∗_ε)/(β − 1)

where φ∗_ε is the optimal solution to (5.98)–(5.102) for the given ε > 0, and φ∗_{ε=0} is the solution to (5.98)–(5.102) with ε = 0 (which removes constraint (5.100)). Thus, if φ∗_ε = φ∗_{ε=0}, network layer drops can be made arbitrarily small by either increasing β or V.⁷

The above analysis allows for an arbitrary operation of the transport layer queues L_k(t). Indeed, the above theorems only assume that L_k(t) ≥ 0 for all t. Thus, as in (17), these can have either infinite buffer space, finite buffer space, or 0 buffer space. With 0 buffer space, all data that is not immediately admitted to the network layer is dropped.
5.7 ALTERNATIVE FAIRNESS METRICS

One type of fairness used in the literature is the so-called max-min fairness (see, for example, (129)(3)(5)(6)). Let (x_1, ..., x_M) represent average throughputs achieved by users {1, ..., M} under some stabilizing control algorithm, and consider the set of all possible (x_1, ..., x_M) vectors. A vector (x_1, ..., x_M) in this set is max-min fair if:

• It maximizes the lowest entry of (x_1, ..., x_M) over all possible vectors in the set.
• It maximizes the second lowest entry over all vectors in the set that satisfy the above condition.
• It maximizes the third lowest entry over all vectors in the set that satisfy the above two conditions, and so on.

This can be viewed as a sequence of nested optimizations, much different from the utility optimization framework treated in this chapter. For flow-based networks with capacitated links, one can reach a max-min fair allocation by starting from 0 and gradually increasing all flows equally until a bottleneck link is found, then increasing all non-bottlenecked flows equally, and so on (see Chapter 6.5.2 in (129)). A token-based scheduling scheme is developed in (160) for achieving max-min fairness in one-hop wireless networks on graphs with link selections defined by matchings.
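The progressive filling procedure just described can be sketched directly for a flow-based network with capacitated links (a standard construction; the data structures and names here are our own):

```python
def max_min_fair(flows, links, capacity):
    """Progressive filling: raise all unfrozen flow rates equally until some
    link saturates, freeze the flows crossing that bottleneck, and repeat.
    flows: dict flow -> set of links it traverses
    links: iterable of link names; capacity: dict link -> link capacity."""
    rate = {f: 0.0 for f in flows}
    active = set(flows)
    while active:
        # Largest equal increment before some link carrying an active flow fills
        inc = min(
            (capacity[l] - sum(rate[f] for f in flows if l in flows[f]))
            / sum(1 for f in active if l in flows[f])
            for l in links if any(l in flows[f] for f in active)
        )
        for f in active:
            rate[f] += inc
        # Freeze all active flows crossing any saturated (bottleneck) link
        for l in links:
            load = sum(rate[f] for f in flows if l in flows[f])
            if abs(load - capacity[l]) < 1e-9:
                active -= {f for f in active if l in flows[f]}
    return rate

# Classic example: link L1 (cap 1) shared by f1, f2; link L2 (cap 10) also by f2, f3
r = max_min_fair(
    {"f1": {"L1"}, "f2": {"L1", "L2"}, "f3": {"L2"}},
    ["L1", "L2"], {"L1": 1.0, "L2": 10.0})
```

In the example, f1 and f2 share the bottleneck L1 and each receive 0.5, after which f3 absorbs the remaining capacity of L2.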
One can approximate max-min fairness using a concave utility function in a network with capacitated links. Indeed, it is shown in (3) that optimizing a sum of concave functions of the form g_α(x) = −1/x^α approaches a max-min fair point as α → ∞. It is likely that such an approach also holds for more general wireless networks with transmission rate allocation and scheduling. However, such functions are singular at x = 0 (preventing worst-case backlog bounds as in Exercises 5.6–5.7), and for large α they have very large values of g′_α(x)/|g_α(x)| for x > 0, which typically results in large queue backlog if used in conjunction with the drift-plus-penalty method.

⁷If b̄_k ≥ ε for all k, then the final term (φ∗_{ε=0} − φ∗_ε)/(β − 1) can be removed. Alternatively, if virtual queues H_k(t+1) = max[H_k(t) − μ_k(t) + ε, 0] are added to enforce these constraints, then limsup_{t→∞} [ν_1 d̄_1(t) + ... + ν_K d̄_K(t)] ≤ (B̃ + C)/(V(β − 1)), where B̃ adds second moment terms (μ_k(t) − ε)² to (5.116).
A simpler hard fairness approach seeks only to maximize the minimum throughput (161). This easily fits into the concave utility based drift-plus-penalty framework using the concave function g(x) = min[x_1, ..., x_M]:

Maximize: min[x̄_1, x̄_2, ..., x̄_M]   (5.131)
Subject to: 1) All queues are mean rate stable   (5.132)
2) α(t) ∈ A_{ω(t)} ∀t ∈ {0, 1, 2, ...}   (5.133)

See also Exercise 5.4. A “mixed” approach can also be considered, which seeks to maximize β min[x̄_1, ..., x̄_M] + Σ_{m=1}^{M} log(1 + x̄_m). The constant β is a large weight that ensures maximizing the minimum throughput has a higher priority than maximizing the logarithmic terms.
5.8 EXERCISES

Exercise 5.1. (Using Logarithmic Utilities) Give a closed form solution to the auxiliary variable update of (5.49)–(5.50) when:

a) φ(γ) = Σ_{m=1}^{M} log(γ_m), where log(·) denotes the natural logarithm.

b) φ(γ) = Σ_{m=1}^{M} log(1 + ν_m γ_m), where log(·) denotes the natural logarithm.
Exercise 5.2. (Transformed Problem with Auxiliary Variables) Let α∗(t) be a policy that yields well defined averages x̄∗, ȳ∗_l, and that satisfies all constraints of problem (5.2)–(5.5),(5.8) (including the constraint x̄∗ ∈ R), with utility φ(x̄∗) = φ^opt. Construct a policy that satisfies all constraints of problem (5.13)–(5.18) and that yields the same utility value φ(x̄∗). Hint: Use γ(t) = x̄∗ for all t.
Exercise 5.3. (Jensen's Inequality) Let φ(γ) be a concave function defined over a convex set R ⊆ ℝ^M. Let γ(τ) be a sequence of random vectors in R for τ ∈ {0, 1, 2, ...}. Fix an integer t > 0, and define T as an independent and random time that is uniformly distributed over the integers {0, 1, ..., t−1}. Define the random vector X = γ(T). Use (5.9) to prove (5.10)–(5.11).
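The inequality at work in this exercise, φ(E{X}) ≥ E{φ(X)} for concave φ, can be checked numerically for the random-time construction (an illustration of the claim for a scalar case, not a proof):

```python
import math
import random

random.seed(0)
t = 50
gamma = [random.uniform(0.1, 2.0) for _ in range(t)]  # gamma(0), ..., gamma(t-1)
phi = math.log                                        # a concave test function

# With T uniform over {0, ..., t-1} and X = gamma(T):
# E{X} is the time average of gamma; E{phi(X)} is the time average of phi(gamma)
lhs = phi(sum(gamma) / t)             # phi of the time average
rhs = sum(phi(g) for g in gamma) / t  # time average of phi
# Jensen: lhs >= rhs
```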
Exercise 5.4. (Hard Fairness (161)) Consider a system with M attributes x(t) = (x_1(t), ..., x_M(t)), where x_m(t) = x̂_m(α(t), ω(t)) for m ∈ {1, ..., M}. Assume there is a positive constant θ_max such that:

0 ≤ x̂_m(α, ω) ≤ θ_max ∀m ∈ {1, ..., M}, ∀ω, ∀α ∈ A_ω
a) State the drift-plus-penalty algorithm for solving the following problem, with θ(t) as a new variable:

Maximize: θ̄
Subject to: 1) x̄_m ≥ θ̄ ∀m ∈ {1, ..., M}
2) 0 ≤ θ(t) ≤ θ_max ∀t ∈ {0, 1, 2, ...}
3) α(t) ∈ A_{ω(t)} ∀t ∈ {0, 1, 2, ...}

b) State the utility-based drift-plus-penalty algorithm for solving the problem:

Maximize: min[x̄_1, x̄_2, ..., x̄_M]
Subject to: α(t) ∈ A_{ω(t)} ∀t ∈ {0, 1, 2, ...}

which is solved with auxiliary variables γ_m(t) with 0 ≤ γ_m(t) ≤ θ_max.

c) The problems in (a) and (b) both seek to maximize the minimum throughput. Show that if both algorithms “break ties” when choosing auxiliary variables by choosing the lowest possible values, then they are exactly the same algorithm. Show they are slightly different if ties are broken to choose the largest possible auxiliary variables, particularly in cases when some virtual queues are zero.
Exercise 5.5. (Bounded Virtual Queues) Consider the auxiliary variable optimization for γ_m(t) in (5.49)–(5.50), where φ_m(x) has the property that:

φ_m(x) ≤ φ_m(0) + ν_m x  whenever 0 ≤ x ≤ γ_m,max

for a constant ν_m > 0. Show that if 0 ≤ γ_m(t) ≤ γ_m,max, we have:

Vφ_m(γ_m(t)) − G_m(t)γ_m(t) ≤ Vφ_m(0) + (Vν_m − G_m(t))γ_m(t)

Use this to prove that γ_m(t) = 0 is the unique optimal solution to (5.49)–(5.50) whenever G_m(t) > Vν_m. Conclude from (5.48) that G_m(t) ≤ Vν_m + γ_m,max for all t, provided this is true at t = 0.
Exercise 5.6. (1-Hop Wireless System with Infinite Backlog) Consider a wireless system with M channels. Transmission rates on slot t are given by b(t) = (b_1(t), ..., b_M(t)) with b_m(t) = b̂_m(α(t), ω(t)), where ω(t) = (S_1(t), ..., S_M(t)) is an observed channel state vector for slot t (assumed to be i.i.d. over slots), and α(t) is a control action chosen within a set A_{ω(t)}. Assume that each channel has an infinite backlog of data, so that there is always data to send. The goal is to choose α(t) every slot to maximize φ(b̄), where φ(b) is a concave and entrywise nondecreasing utility function.

a) Verify that the algorithm of Section 5.0.5 in this case is:
• (Auxiliary Variables) Choose γ(t) = (γ_1(t), ..., γ_M(t)) to solve:

Maximize: Vφ(γ(t)) − Σ_{m=1}^{M} G_m(t)γ_m(t)
Subject to: 0 ≤ γ_m(t) ≤ γ_m,max ∀m ∈ {1, ..., M}

• (Transmission) Observe ω(t) and choose α(t) ∈ A_{ω(t)} to maximize Σ_{m=1}^{M} G_m(t) b̂_m(α(t), ω(t)).

• (Virtual Queue Update) Update G_m(t) for all m ∈ {1, ..., M} according to:

G_m(t+1) = max[G_m(t) + γ_m(t) − b̂_m(α(t), ω(t)), 0]

b) Suppose that φ(b) = Σ_{m=1}^{M} φ_m(b_m), where the functions φ_m(b_m) are continuous, concave, nondecreasing, with maximum right-derivative ν_m < ∞, so that φ_m(γ) ≤ φ_m(0) + ν_m γ for all γ ≥ 0. Prove that the auxiliary variable decisions above yield γ_m(t) = 0 if G_m(t) > Vν_m (see also Exercise 5.5). Conclude that 0 ≤ G_m(t) ≤ Vν_m + γ_m,max for all t, provided that this holds at t = 0.

c) Use (5.33) to conclude that if the conditions of part (b) hold, if all virtual queues are initially empty, and if any C-additive approximation is used, then:

φ(b̄(t)) ≥ φ^opt − (D + C)/V − Σ_{m=1}^{M} ν_m(Vν_m + γ_m,max)/t ,  ∀t > 0
Exercise 5.7. (1-Hop Wireless System with Random Arrivals) Consider the same system as Exercise 5.6, with the exception that we have random arrivals A_m(t) and:

Q_m(t+1) = max[Q_m(t) − b̂_m(α(t), ω(t)), 0] + x_m(t)

where x_m(t) is a flow control decision, made subject to 0 ≤ x_m(t) ≤ A_m(t). We want to maximize φ(x̄).

a) State the new algorithm for this case.

b) Suppose 0 ≤ A_m(t) ≤ A_m,max for some finite constant A_m,max. Suppose φ(b) has the structure of Exercise 5.6(b). Using a similar argument, show that all queues G_m(t) and Q_m(t) are deterministically bounded.
Exercise 5.8. (Imperfect Channel Knowledge) Consider the general problem of Theorem 5.3, but under the assumption that ω(t) provides only a partial understanding of the channel for each queue Q_k(t), so that b̂_k(α(t), ω(t)) is a random function of α(t) and ω(t), assumed to be i.i.d. over all slots with the same α(t) and ω(t), and assumed to have finite second moments regardless of the choice of α(t). Define:

β_k(α, ω) ≜ E{ b̂_k(α(t), ω(t)) | α(t) = α, ω(t) = ω }

Assume that the function β_k(α, ω) is known. Assume the other functions x̂_m(·), ŷ_l(·), â_k(·) are deterministic as before. State the modified algorithm that minimizes the right-hand-side of (5.84) in this case. Hint:

E{b_k(t) | Θ(t)} = E{ E{b_k(t) | Θ(t), α(t), ω(t)} | Θ(t) } = E{ β_k(α(t), ω(t)) | Θ(t) }

Note: Related problems with randomized service outcomes and Lyapunov drift are considered in (162)(163)(164)(154)(165)(161), where knowledge of the channel statistics is needed for computing the β_k(α, ω) functions and their generalizations, and a max-weight learning framework is developed in (166) for the case of unknown statistics.
Exercise 5.9. (Equivalence of the Transformed Problem Using Auxiliary Variables)

a) Suppose that α∗(t) is a policy that satisfies all constraints of the problem (5.71)–(5.75), yielding time averages x̄∗ and ȳ∗_l and a cost value of ȳ∗_0 + f(x̄∗). Show that this policy also satisfies all constraints of the problem (5.76)–(5.81), and yields the same cost value, if we define the auxiliary variable decisions to be γ(t) = x̄∗ for all t.

b) Suppose that α′(t), γ′(t) is a policy that satisfies all constraints of problem (5.76)–(5.81), yielding time averages x̄′, ȳ′_l and a cost value in (5.76) given by some value v. Show that this same policy also satisfies all constraints of problem (5.71)–(5.75), with a cost ȳ′_0 + f(x̄′) ≤ v.
Exercise 5.10. (Proof of Theorem 5.3) We make use of the following fact, analogous to Theorem 4.5: If problem (5.71)–(5.75) is feasible, then for all δ > 0 there exists an ω-only policy α∗(t) ∈ A_{ω(t)} such that E{x̂(α∗(t), ω(t))} = γ∗ for some vector γ∗, and:

E{ŷ_0(α∗(t), ω(t))} + f(γ∗) ≤ y_0^opt + f^opt + δ
E{ŷ_l(α∗(t), ω(t))} + g_l(γ∗) ≤ δ ,  ∀l ∈ {1, ..., L}
E{â_k(α∗(t), ω(t)) − b̂_k(α∗(t), ω(t))} ≤ δ ,  ∀k ∈ {1, ..., K}
dist(γ∗, X ∩ R) ≤ δ

For simplicity, in this proof, we assume the above holds for δ = 0, and that all actual and virtual queues are initially empty. Further assume that the functions f(γ) and g_l(γ) are Lipschitz continuous, so that there are positive constants ν_m, β_{l,m} such that for all x(t) and γ(t), we have:

|f(γ(t)) − f(x(t))| ≤ Σ_{m=1}^{M} ν_m |γ_m(t) − x_m(t)|
|g_l(γ(t)) − g_l(x(t))| ≤ Σ_{m=1}^{M} β_{l,m} |γ_m(t) − x_m(t)| ,  ∀l ∈ {1, ..., L}
a) Plug the above policy α∗(t), together with the constant auxiliary vector γ(t) = γ∗, into the right-hand-side of the drift bound (5.84) and add C (because of the C-additive approximation) to derive a simpler bound on the drift expression. The resulting right-hand-side should be: D + C + V(y_0^opt + f^opt).

b) Use the Lyapunov optimization theorem to prove that for all t > 0:

(1/t) Σ_{τ=0}^{t−1} E{y_0(τ) + f(γ(τ))} ≤ y_0^opt + f^opt + (D + C)/V

and hence, by Jensen's inequality (with ȳ_0(t) and γ̄(t) defined by (5.24)):

ȳ_0(t) + f(γ̄(t)) ≤ y_0^opt + f^opt + (D + C)/V

c) Manipulate the drift bound of part (a) to prove that Δ(Θ(t)) ≤ W for some finite constant W. Conclude that all virtual and actual queues are mean rate stable, and that (4.7) holds for all t > 0, and so E{|H_m(t)|}/t ≤ √(2W/t).

d) Use (5.83) and (4.42) to prove that for all m ∈ {1, ..., M}:

0 ≤ lim_{t→∞} |x̄_m(t) − γ̄_m(t)| = lim_{t→∞} |E{H_m(t)}|/t ≤ lim_{t→∞} E{|H_m(t)|}/t = 0

Argue that γ̄(t) ∈ X ∩ R for all t, and hence (5.87) holds.

e) Use part (b) and the Lipschitz assumptions to prove (5.85).

f) Use (5.82), Theorem 2.5, and the Lipschitz conditions to prove (5.86).
Exercise 5.11. (Profit Risk and Non-Convexity) Consider a K-queue system described by (5.1), with arrival and service functions â_k(α(t), ω(t)) and b̂_k(α(t), ω(t)). Let p(t) = p̂(α(t), ω(t)) be a random profit variable that is i.i.d. over all slots for which we have the same α(t) and ω(t), and that has finite second moment regardless of the policy. Define:

φ(α, ω) ≜ E{ p̂(α(t), ω(t)) | α(t) = α, ω(t) = ω }
ψ(α, ω) ≜ E{ p̂(α(t), ω(t))² | α(t) = α, ω(t) = ω }

and assume the functions φ(α, ω), ψ(α, ω) are known. The goal is to stabilize all queues while maximizing a linear combination of the profit minus the variance of the profit (where variance is a proxy for “risk”). Specifically, define the variance as Var(p) ≜ \overline{p²} − (p̄)², where the notation h̄ represents the time average expectation of a given process h(t), as usual. We want to maximize θ_1 p̄ − θ_2 Var(p), where θ_1 and θ_2 are positive constants.

a) Define attributes p_1(t) = p(t), p_2(t) = p(t)². Write the problem using p̄_1 and p̄_2 in the form of (5.88)–(5.91), and show this is a non-convex stochastic network optimization problem.

b) State the “primal-dual” algorithm that minimizes the right-hand-side of (5.93) in this context. Hint: Note that:

E{p_1(t) | Θ(t)} = E{ E{p_1(t) | Θ(t), α(t), ω(t)} | Θ(t) } = E{ φ(α(t), ω(t)) | Θ(t) }
Exercise 5.12. (Optimization without Auxiliary Variables (17)(18)) Consider the problem (5.2)-(5.5). Assume there is a vector γ* = (γ*_1, . . . , γ*_M), called the optimal operating point, such that φ(γ*) = φ*, where φ* is the maximum utility for the problem. Assume that there is an ω-only policy α*(t) such that for all possible values of ω(t), we have:

    x̂_m(α*(t), ω(t)) = γ*_m   ∀m ∈ {1, . . . , M}      (5.134)
    E{â_k(α*(t), ω(t))} ≤ E{b̂_k(α*(t), ω(t))}   ∀k ∈ {1, . . . , K}      (5.135)
    E{ŷ_l(α*(t), ω(t))} ≤ 0   ∀l ∈ {1, . . . , L}      (5.136)
The assumptions (5.134)-(5.136) are restrictive, particularly because (5.134) must hold deterministically for all ω(t) realizations. However, these assumptions can be shown to hold for the special case when x_m(t) represents the amount of data admitted to a network from a source m when: (i) all sources are "infinitely backlogged" and hence always have data to send, and (ii) data can be admitted as a real number.
The Lyapunov drift can be shown to satisfy the following for some constant B > 0:

    Δ(Θ(t)) − V·E{φ(x̂(α(t), ω(t))) | Θ(t)} ≤ B + Σ_{l=1}^L Z_l(t)·E{ŷ_l(α(t), ω(t)) | Θ(t)}
        + Σ_{k=1}^K Q_k(t)·E{â_k(α(t), ω(t)) − b̂_k(α(t), ω(t)) | Θ(t)}
        − V·E{φ(x̂(α(t), ω(t))) | Θ(t)}

Suppose every slot we observe Θ(t) and ω(t) and choose an action α(t) that minimizes the right-hand-side of the above drift inequality.
a) Assume ω(t) is i.i.d. over slots. Plug the alternative policy α*(t) into the right-hand-side above to get a greatly simplified drift expression.

b) Conclude from part (a) that Δ(Θ(t)) ≤ D + V(φ_max − φ*) for all t, for some finite constant D, where φ_max is an upper bound on the instantaneous value of φ(x̂(·)) (assumed to be finite). Conclude that all actual and virtual queues are mean rate stable, and hence all desired inequality constraints are satisfied.
c) Use Jensen's inequality and part (a) (with iterated expectations and telescoping sums) to conclude that for all t > 0, we have:

    φ(x̄(t)) ≥ (1/t) Σ_{τ=0}^{t−1} E{φ(x(τ))} ≥ φ* − B/V − E{L(Θ(0))}/(Vt)

where x̄(t) ≜ (1/t) Σ_{τ=0}^{t−1} E{x(τ)} and x(τ) ≜ x̂(α(τ), ω(τ)).

5.8. EXERCISES 135
Exercise 5.13. (Delay-Limited Transmission (71)) Consider a K-user wireless system with arrival vector A(t) = (A_1(t), . . . , A_K(t)) and channel state vector S(t) = (S_1(t), . . . , S_K(t)) for each slot t ∈ {0, 1, 2, . . .}. There is no queueing, and all data must either be transmitted in one slot or dropped (similar to the delay-limited capacity formulation of (70)). Thus, there are no actual queues in the system. Define ω(t) ≜ [A(t), S(t)] as the random network event observed every slot. Define α(t) ∈ A_{ω(t)} as a general control action, which affects how much of the data to transmit and the amount of power used according to general functions μ̂_k(α, ω) and p̂(α, ω):

    μ(t) = (μ̂_1(α(t), ω(t)), . . . , μ̂_K(α(t), ω(t))) ,   p(t) = p̂(α(t), ω(t))

where μ(t) = (μ_1(t), . . . , μ_K(t)) is the transmission vector and p(t) is the power used on slot t.
Assume these are constrained as follows for all slots t:

    0 ≤ μ_k(t) ≤ A_k(t)   ∀k ∈ {1, . . . , K} ,   0 ≤ p(t) ≤ p_max

for some finite constant p_max. Assume that A_k(t) ≤ A_k^max for all t, for some finite constants A_k^max for k ∈ {1, . . . , K}. Let μ̄ be the time average expectation of the transmission vector μ(t), and let φ(μ̄) be a continuous, concave, and entrywise non-decreasing utility function of μ̄. The goal is to solve the following problem:

    Maximize:   φ(μ̄)
    Subject to: p̄ ≤ P_av

where p̄ is the time average expected power expenditure, and P_av is a pre-specified average power constraint. This is a special case of the general problem (5.2)-(5.5).
a) Use auxiliary variables γ(t) = (γ_1(t), . . . , γ_K(t)), subject to 0 ≤ γ_k(t) ≤ A_k^max for all t and k, to write the corresponding transformed problem (5.13)-(5.18) for this case.

b) State the drift-plus-penalty algorithm that solves this transformed problem. Hint: Use a virtual queue Z(t) to enforce the constraint p̄ ≤ P_av, and use virtual queues G_k(t) to enforce the constraints μ̄_k ≥ γ̄_k for all k ∈ {1, . . . , K}.
Exercise 5.14. (Delay-Limited Transmission with Errors (71)) Consider the same system as Exercise 5.13, but now assume that transmissions can have errors, so that μ_k(t) = μ̂_k(α(t), ω(t)) is a random transmission outcome (as in Exercise 5.8), assumed i.i.d. over all slots with the same α(t) and ω(t), with known expectations β_k(α(t), ω(t)) ≜ E{μ_k(t) | α(t), ω(t)} for all k ∈ {1, . . . , K}. Use iterated expectations (as in Exercise 5.8) to redesign the drift-plus-penalty algorithm for this case. Multi-slot versions of this problem are treated in Section 7.6.1.
CHAPTER 6

Approximate Scheduling
This chapter focuses on the max-weight problem that arises when scheduling for stability or maximum throughput-utility in a wireless network with interference. Previous chapters showed the key step is maximizing the expectation of a weighted sum of link transmission rates, or coming within an additive constant C of the maximum. Specifically, consider a (possibly multi-hop) network with L links, and let b(t) = (b_1(t), . . . , b_L(t)) be the vector of transmission rates, where b_l(t) is the rate offered over link l ∈ {1, . . . , L} on slot t. The goal is to make (possibly randomized) decisions for b(t) to come within an additive constant C of maximizing the following expectation:

    Σ_{l=1}^L W_l(t)·E{b_l(t) | W(t)}      (6.1)
where the expectation is with respect to the possibly random decision, and where W(t) = (W_1(t), . . . , W_L(t)) is a vector of weights for slot t. The weights are related to queue backlogs for single-hop problems and differential backlogs for multi-hop problems. Algorithms that accomplish this for a given constant C ≥ 0 every slot are called C-additive approximations. For problems of network stability, previous chapters showed that C-additive approximations can be used to stabilize the network whenever arrival rates are inside the network capacity region, with average backlog and delay bounds that grow linearly with C. For problems of maximum throughput-utility, Chapter 5 showed that C-additive approximations can be used with a simple flow-control rule to give utility within (B + C)/V of optimality (where B is a fixed constant and V is any non-negative parameter chosen as desired), with average backlog that grows linearly in both V and C. Thus, C-additive approximations can be used to push network utility arbitrarily close to optimal, as determined by the parameter V.
Such max-weight problems can be very complex for wireless networks with interference, because a transmission on one link can affect transmissions on many other links, so transmission decisions are coupled throughout the network. In this chapter, we first consider a class of interference networks without time-varying channels and develop two C-additive approximation algorithms for this context. The first is a simple algorithm based on trading off computation complexity and delay. The second is a more elegant randomized transmission technique that admits a simple distributed implementation. We then present a multiplicative approximation theorem that holds for general networks with possibly time-varying channels. It guarantees constant-factor throughput results for algorithms that schedule transmissions within a multiplicative constant of the max-weight solution every slot.
138 6. APPROXIMATE SCHEDULING

6.1 TIME-INVARIANT INTERFERENCE NETWORKS
Suppose the network is time invariant, in that the channel conditions do not change and the transmission rate options are the same for all slots t ∈ {0, 1, 2, . . .}. Assume that all transmissions are in units of packets, and each link can transmit at most one packet per slot. The transmission rate vector b(t) = (b_1(t), . . . , b_L(t)) is a binary vector with b_l(t) = 1 if link l transmits a packet on slot t, and b_l(t) = 0 otherwise. We say that a binary vector b(t) is feasible if the set of links that correspond to "1" entries can be simultaneously activated for successful transmission. Define B as the collection of all feasible binary vectors, called the link activation set (7). The set B depends on the interference properties of the network. Every slot t, the network controller observes the current link weights W(t) = (W_1(t), . . . , W_L(t)) and chooses a (possibly random) b(t) ∈ B, with the goal of maximizing the max-weight value (6.1). It is easy to show that the maximum is achieved by a deterministic choice b^opt(t), where:
    b^opt(t) ≜ arg max_{b∈B} Σ_{l=1}^L W_l(t)·b_l
The amount of computation required to find an optimal vector b^opt(t) depends on the structure of the set B. If this set consists of all binary vectors that satisfy matching constraints, so that no two active links share a node, then b^opt(t) can be found in polynomial time (via a centralized algorithm). However, the problem may be NP-hard for general sets B, so that no polynomial-time solution is available.
Let C be a given non-negative constant. A C-additive approximation to the max-weight problem finds a vector b(t) every slot t that satisfies:

    Σ_{l=1}^L W_l(t)·E{b_l(t) | W(t)} ≥ max_{b∈B} [ Σ_{l=1}^L W_l(t)·b_l ] − C
6.1.1 COMPUTING OVER MULTIPLE SLOTS

We first consider the following simple technique for obtaining a C-additive approximation with arbitrarily low per-slot computation complexity. Fix a positive integer T > 0, and divide the timeline into successive frames of T slots. Define t_r ≜ rT as the start of frame r, for r ∈ {0, 1, 2, . . .}. At the beginning of each frame r ∈ {0, 1, 2, . . .}, the network controller observes the weights W(t_r) and begins a computation to find b^opt(t_r). We assume the computation is completed within the T-slot frame, possibly by exhaustively searching through all options in the set B. The network controller then allocates the constant rate vector b^opt(t_r) for all slots of frame r + 1, while also computing b^opt(t_{r+1}) during that frame. Thus, every frame r ∈ {1, 2, 3, . . .} the algorithm allocates the constant rate vector that was computed on the previous frame. Meanwhile, it also computes the optimal solution to the max-weight problem for the current frame (see Fig. 6.1). Thus, for any frame r ∈ {1, 2, 3, . . .}, we have:

    b(t) = b^opt(t_{r−1})   ∀t ∈ {t_r, . . . , t_r + T − 1}
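As a concrete sketch of this frame-based scheme, the following Python fragment computes each frame's max-weight vector by brute force while implementing the vector computed on the previous frame. The activation set B and the weight process here are illustrative toy assumptions, not part of the book's model:

```python
def max_weight(W, B):
    # Exhaustive search: argmax over b in B of sum_l W[l]*b[l].
    return max(B, key=lambda b: sum(w * x for w, x in zip(W, b)))

def frame_based_schedule(W_of, B, T, num_frames):
    # Frame r >= 1 implements the vector computed from the weights observed
    # at the start of frame r-1; one search is amortized over T slots.
    schedule = {}
    b_prev = max_weight(W_of(0), B)          # computed during frame 0
    for r in range(1, num_frames):
        t_r = r * T
        for t in range(t_r, t_r + T):
            schedule[t] = b_prev             # allocate last frame's solution
        b_prev = max_weight(W_of(t_r), B)    # compute for frame r + 1
    return schedule

# Toy example: L = 3 mutually interfering links (at most one may be active).
B = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
W_of = lambda t: [1 + (t % 3), 2, 1]         # hypothetical time-varying weights
sched = frame_based_schedule(W_of, B, T=2, num_frames=3)
print(sched)
```

With T = 2, slots 2 and 3 use the vector computed from W(0), while slots 4 and 5 use the vector from W(2): the per-slot search cost is divided by T at the price of an additive error that grows with T, as quantified next in the text.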
6.1. TIME-INVARIANT INTERFERENCE NETWORKS 139
Figure 6.1: An illustration of the frame structure for the algorithm of Section 6.1.1. (The figure shows frame start times t_0, t_1, t_2, t_3; during each frame, b^opt for the current frame start is computed while the b^opt computed on the previous frame is implemented.)
Now assume the maximum change in queue backlog over one slot is deterministically bounded, as is the maximum change in each link weight. Specifically, assume that no link weight can change by an amount more than θ per slot, where θ is some positive constant. It follows that for any two slots t_1 < t_2:

    |W_l(t_1) − W_l(t_2)| ≤ θ(t_2 − t_1)

Under this assumption, we now compute a value C such that the above algorithm is a C-additive approximation for all slots t ≥ T. Fix any slot t ≥ T, and let r represent the frame containing this slot. Note that |t − t_{r−1}| ≤ 2T − 1. We have:
    Σ_{l=1}^L W_l(t)·b_l(t) = Σ_{l=1}^L W_l(t)·b_l^opt(t_{r−1})
        = Σ_{l=1}^L W_l(t_{r−1})·b_l^opt(t_{r−1}) + Σ_{l=1}^L (W_l(t) − W_l(t_{r−1}))·b_l^opt(t_{r−1})
        ≥ Σ_{l=1}^L W_l(t_{r−1})·b_l^opt(t_{r−1}) − Σ_{l=1}^L θ|t − t_{r−1}|·b_l^opt(t_{r−1})
        ≥ Σ_{l=1}^L W_l(t_{r−1})·b_l^opt(t_{r−1}) − Lθ(2T − 1)      (6.2)
Further, because b^opt(t_{r−1}) solves the max-weight problem for the weights W(t_{r−1}), we have:
    Σ_{l=1}^L W_l(t_{r−1})·b_l^opt(t_{r−1}) = max_{b∈B} [ Σ_{l=1}^L W_l(t_{r−1})·b_l ]
        ≥ Σ_{l=1}^L W_l(t_{r−1})·b_l^opt(t)
        = Σ_{l=1}^L W_l(t)·b_l^opt(t) − Σ_{l=1}^L [W_l(t) − W_l(t_{r−1})]·b_l^opt(t)
        ≥ Σ_{l=1}^L W_l(t)·b_l^opt(t) − Lθ(2T − 1)
        = max_{b∈B} [ Σ_{l=1}^L W_l(t)·b_l ] − Lθ(2T − 1)      (6.3)
Combining (6.2) and (6.3) yields:

    Σ_{l=1}^L W_l(t)·b_l(t) ≥ max_{b∈B} [ Σ_{l=1}^L W_l(t)·b_l ] − 2Lθ(2T − 1)
Taking conditional expectations gives:

    Σ_{l=1}^L W_l(t)·E{b_l(t) | W(t)} ≥ max_{b∈B} [ Σ_{l=1}^L W_l(t)·b_l ] − 2Lθ(2T − 1)
It follows that this algorithm yields a C-additive approximation with C ≜ 2Lθ(2T − 1). The constant C is linear in both the number of links L and the frame size T.
Now let complexity represent the number of operations required to compute the max-weight solution (assuming for simplicity that this number is independent of the size of the weights W_l(t)). Because this computation is amortized over T slots, the algorithm yields a per-slot computation complexity of complexity/T. This can be made as small as desired by increasing T, with a tradeoff of increasing the value of C linearly in T. This shows that maximum throughput can be achieved with arbitrarily low per-slot complexity, with a tradeoff in average queue backlog and average delay.
This technique was used in (167)(168) to reduce the per-slot complexity of scheduling in N × N packet switches. The max-weight problem for N × N packet switches is a max-weight matching problem that can be computed in time polynomial in N. The work (168) uses this to provide a smooth complexity-delay tradeoff for switches, showing that average delay of O(N^{4−α}) is possible with per-slot complexity O(N^α), for any α such that 0 ≤ α ≤ 3.
Unfortunately, the max-weight problem for networks with general activation sets B may be NP-hard, so that the only available computation algorithms have complexity that is exponential in the network size L. This means the frame size T must be chosen at least exponential in L to achieve polynomial per-slot complexity, which in turn incurs delay that is exponential in L.
6.1.2 RANDOMIZED SEARCHING FOR THE MAX-WEIGHT SOLUTION

The first low-complexity algorithm for full-throughput scheduling in time-invariant interference networks was perhaps that of (169), where new link activations are tried randomly and compared in the max-weight metric against the previously tried activation. This is analyzed with a different Markov chain argument in (169). However, intuitively it works for the same reason as the frame-based scheme presented in the previous subsection: the randomized selection can be viewed as a (randomized) computation algorithm that solves the max-weight problem over a variable-length frame. The optimal solution is computed after some random number of slots T, where T is geometric with success probability equal to the number of optimal vectors in B divided by the size of the set B. While the implementation of this algorithm is more elegant than the deterministic computation method described in the previous subsection, its resulting delay bounds can be worse. For example, in an N × N packet switch, the randomized method yields complexity of O(N) with an average delay bound of O(N!). In contrast, the deterministic method of (168) can achieve complexity O(N) with an average delay bound of O(N³), obtained by using α = 1 in the smooth complexity-delay tradeoff curve described in the previous subsection. A variation on the randomized algorithm of (169) for more complex networks is developed in (170).
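The pick-and-compare idea behind such randomized searching can be sketched as follows. This is a hypothetical minimal rendering (the actual algorithm of (169) generates candidate activations with a particular randomized structure), meant only to show why the comparison step solves the max-weight problem over a random-length frame:

```python
import random

def pick_and_compare(W, B, b_prev):
    # Try one random feasible activation; keep it only if it beats the
    # previously used activation in the max-weight metric.
    weight = lambda b: sum(w * x for w, x in zip(W, b))
    candidate = random.choice(B)
    return candidate if weight(candidate) > weight(b_prev) else b_prev

random.seed(0)
B = [(0, 0), (1, 0), (0, 1)]   # two conflicting links: at most one active
W = [5, 1]                     # fixed weights, for illustration only
b = (0, 0)
for _ in range(50):            # each slot hits the optimum with prob. 1/|B|,
    b = pick_and_compare(W, B, b)   # so the hitting time is geometric
print(b)
```

Once the optimal activation is found it is never replaced (a new candidate must be strictly better), which is the sense in which the search behaves like a variable-length frame whose computation finishes at a geometric random time.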
All known methods for achieving throughput-utility within ε of optimality for networks with general interference constraints (and for arbitrary ε > 0) have either non-polynomial per-slot complexity, or non-polynomial delays and/or convergence times. This is not surprising: Suppose the problem of maximizing the number of activated links is NP-hard. If we could design an algorithm that, after a polynomial time T, has produced a throughput within 1/2 of the maximum sum throughput with high probability, then this algorithm (with high probability) must have selected a vector b(t) that is a max-size vector during some slot t ∈ {0, . . . , T} (else, the throughput would be at least 1 away from optimal). Thus, it could be used as a randomized algorithm for finding a max-size vector in polynomial time. Related NP-hardness results are developed in (171) for pure stability problems with low delay, even when arrival rates are very low.
6.1.3 THE JIANG-WALRAND THEOREM

Here we present a randomized algorithm that produces a C-additive approximation by allocating a link vector b(t) according to the steady state distribution of a particular reversible Markov chain. The Markov chain can easily be simulated, and it has a simple relation to distributed scheduling in a carrier sense multiple access (CSMA) system. Further, if the vector is chosen according to the desired distribution every slot t, the value of C that this algorithm produces is linear in the network size, and hence it yields maximum throughput with polynomial delay. We first present the result, and then discuss the complexity associated with generating a vector with the desired distribution, which is related to the convergence time required for the Markov chain to approach its steady state.
The following randomized algorithm for choosing b(t) ∈ B was developed in (172) for wireless systems with general interference constraints, and in (173) for scheduling in optical networks:

Max Link-Weight-Plus-Entropy Algorithm: Every slot t, observe the current link weights W(t) = (W_1(t), . . . , W_L(t)) and choose b(t) by randomly selecting a binary vector b = (b_1, . . . , b_L) ∈ B with probability distribution:

    p*(b) ≜ Pr[b(t) = b] = (1/A) Π_{l=1}^L exp(W_l(t)·b_l)      (6.4)
where A is a normalizing constant that makes the distribution sum to 1.
The work (172) motivates this algorithm by the modified problem that computes a probability distribution p(b) over the set B to solve the following:

    Maximize:   −Σ_{b∈B} p(b)·log(p(b)) + Σ_{b∈B} p(b)·Σ_{l=1}^L W_l(t)·b_l      (6.5)
    Subject to: 0 ≤ p(b) ∀b ∈ B ,   Σ_{b∈B} p(b) = 1      (6.6)
where log(·) denotes the natural logarithm. This problem is equivalent to maximizing H(p(·)) + Σ_{l=1}^L W_l(t)·E{b_l(t) | W(t)}, where H(p(·)) is the entropy (in nats) associated with the probability distribution p(b), and E{b_l(t) | W(t)} is the expected transmission rate over link l given that b(t) is selected according to the probability distribution p(b). However, note that because the set B contains at most 2^L link activation vectors, and the entropy of any probability distribution over at most k outcomes is at most log(k), we have for any probability distribution p(b):

    0 ≤ −Σ_{b∈B} p(b)·log(p(b)) ≤ L·log(2)

It follows that if we can find a probability distribution p(b) that solves the problem (6.5)-(6.6), then this produces a C-additive approximation to the max-weight problem (6.1), with C = L·log(2). It follows that such an algorithm can yield full throughput optimality, and can come arbitrarily close to utility optimality, with average backlog and delay expressions that are polynomial in the network size. Remarkably, the next theorem, developed in (172), shows that the probability distribution (6.4) is the desired distribution, in that it exactly solves the problem (6.5)-(6.6). Thus, the max link-weight-plus-entropy algorithm is a C-additive approximation for the max-weight problem.
Theorem 6.1 (Jiang-Walrand Theorem (172)) The probability distribution p*(b) that solves (6.5)-(6.6) is given by (6.4).
Proof. The proof follows directly from the analysis techniques used in (172), although we organize the proof differently below. We first compute the value of the maximization objective under the particular distribution p*(b) given in (6.4). We have:

    −Σ_{b∈B} p*(b)·log(p*(b)) + Σ_{b∈B} p*(b)·Σ_{l=1}^L W_l(t)·b_l
        = Σ_{b∈B} p*(b)·log(A) − Σ_{b∈B} p*(b)·Σ_{l=1}^L W_l(t)·b_l + Σ_{b∈B} p*(b)·Σ_{l=1}^L W_l(t)·b_l
        = log(A)
where we have used the fact that p*(b) is a probability distribution and hence sums to 1. We now show that the value of the objective (6.5) under any other distribution p(b) is no larger than log(A), so that p*(b) is optimal for this objective. To this end, consider any other distribution p(b). We have:
    −Σ_{b∈B} p(b)·log(p(b)) + Σ_{b∈B} p(b)·Σ_{l=1}^L W_l(t)·b_l
        = −Σ_{b∈B} p(b)·log( (p(b)/p*(b))·p*(b) ) + Σ_{b∈B} p(b)·Σ_{l=1}^L W_l(t)·b_l
        = −Σ_{b∈B} p(b)·log(p(b)/p*(b)) − Σ_{b∈B} p(b)·log(p*(b)) + Σ_{b∈B} p(b)·Σ_{l=1}^L W_l(t)·b_l
        ≤ −Σ_{b∈B} p(b)·log(p*(b)) + Σ_{b∈B} p(b)·Σ_{l=1}^L W_l(t)·b_l      (6.7)
        = −Σ_{b∈B} p(b)·log(1/A) − Σ_{b∈B} p(b)·Σ_{l=1}^L W_l(t)·b_l + Σ_{b∈B} p(b)·Σ_{l=1}^L W_l(t)·b_l
        = log(A)
where in (6.7), we have used the well-known Kullback-Leibler divergence result, which states that the divergence between any two distributions p(b) and p*(b) is non-negative (174):

    d_KL(p‖p*) ≜ Σ_{b∈B} p(b)·log(p(b)/p*(b)) ≥ 0

Thus, the maximum value of the objective function (6.5) is log(A), which is achieved by the distribution p*(b), proving the result. □
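Theorem 6.1 is easy to check numerically on a small instance. The sketch below (the activation set and the weights are arbitrary toy choices) builds the distribution (6.4) and confirms that no randomly drawn alternative distribution achieves an objective value (6.5) above log(A):

```python
import math, random

def objective(p, W, B):
    # Entropy plus expected weighted sum of rates, i.e., the objective (6.5).
    ent = -sum(q * math.log(q) for q in p if q > 0)
    rate = sum(q * sum(w * x for w, x in zip(W, b)) for q, b in zip(p, B))
    return ent + rate

B = [(0, 0), (1, 0), (0, 1), (1, 1)]   # toy activation set with no conflicts
W = [0.7, 1.3]

# The distribution (6.4): p*(b) proportional to exp(sum_l W_l b_l).
raw = [math.exp(sum(w * x for w, x in zip(W, b))) for b in B]
A = sum(raw)
p_star = [v / A for v in raw]
assert abs(objective(p_star, W, B) - math.log(A)) < 1e-9  # optimum is log(A)

random.seed(1)
for _ in range(1000):                  # random distributions never do better
    raw2 = [random.random() for _ in B]
    p = [v / sum(raw2) for v in raw2]
    assert objective(p, W, B) <= math.log(A) + 1e-9
print("optimal objective value:", round(math.log(A), 4))
```

The first assertion reproduces the computation in the proof (the objective at p* collapses to log(A)); the loop is a crude numerical stand-in for the KL-divergence argument.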
Assume now that the set B of all valid link activation vectors has a connectedness property, so that it is possible to get from any b_1 ∈ B to any other b_2 ∈ B by a sequence of adding or removing single links, where each step of the sequence produces another valid activation vector in B (this holds in the reasonable case when removing any activated link from an activation vector in B yields another activation vector in B). In this case, the distribution (6.4) is particularly interesting because it is the exact stationary distribution associated with a continuous time ergodic Markov chain with state b(v), where v is a continuous time variable that is not related to the discrete time index t for the current slot. Transitions for this Markov chain take place by having each link l with b_l(v) = 1 de-activate at times according to an independent exponential distribution with rate μ = 1, and having each link l with b_l(v) = 0 independently activate according to an exponential distribution with rate λ_l = exp(W_l(t)), provided that turning this link ON does not violate the link constraints of B. That the resulting steady state is given by (6.4) can be shown by state space truncation arguments as in (129)(131). This has the form of a simple distributed algorithm where links independently turn ON or OFF, with Carrier Sense Multiple Access (CSMA) telling us whether it is possible to turn a new link ON (see also (175)(172)(173)(176)(177) for details on this).
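A minimal simulation of this continuous time chain illustrates the claim; here the triangle conflict graph and the weights are illustrative assumptions. Blocked activation attempts are kept as self-transitions, which does not change the stationary distribution, and the fraction of time spent in each activation vector approaches (6.4):

```python
import math, random
from collections import defaultdict

def simulate_csma(W, conflicts, horizon, seed=2):
    # ON links turn OFF at rate 1; OFF links attempt to turn ON at rate
    # exp(W[l]), succeeding only if no conflicting link is currently ON.
    rng = random.Random(seed)
    L = len(W)
    state = tuple([0] * L)
    time_in = defaultdict(float)
    v = 0.0
    while v < horizon:
        rates = [1.0 if state[l] else math.exp(W[l]) for l in range(L)]
        dwell = rng.expovariate(sum(rates))      # exponential holding time
        time_in[state] += dwell
        v += dwell
        l = rng.choices(range(L), weights=rates)[0]
        if state[l]:                             # ON -> OFF
            state = state[:l] + (0,) + state[l + 1:]
        else:                                    # OFF -> ON, if feasible
            new = state[:l] + (1,) + state[l + 1:]
            if all(not (new[i] and new[j]) for i, j in conflicts):
                state = new
    return {s: d / v for s, d in time_in.items()}

# Triangle conflict graph: all links conflict, so at most one link is ON.
conflicts = [(0, 1), (1, 2), (0, 2)]
W = [1.0, 0.5, 0.2]
B = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
A = sum(math.exp(sum(w * x for w, x in zip(W, b))) for b in B)
emp = simulate_csma(W, conflicts, horizon=5000.0)
for b in B:
    target = math.exp(sum(w * x for w, x in zip(W, b))) / A
    print(b, round(emp.get(b, 0.0), 3), "vs", round(target, 3))
```

This tiny 4-state chain mixes quickly; the convergence-time caveats discussed next apply to large networks, where reaching near steady state within one slot is the real obstacle.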
Unfortunately, we need to run such an algorithm in continuous time long enough to reach a near steady state, and this all needs to be done within one slot to implement the result. Of course, we can use a T-slot argument as in Section 6.1.1 to allow more time to reach the steady state, with the understanding that the queue backlog changes by an amount O(T) that yields an additional additive term in our C-additive approximation (see (176) for an argument in this direction using stochastic approximation theory). However, for general networks, the convergence of the Markov chain to near-steady-state takes a non-polynomial amount of time (else, we could solve NP-hard problems with efficient randomized algorithms). This is because the Markov chain can get "trapped" for long durations of time in certain suboptimal link activations (this is compensated for in the steady state distribution by getting "trapped" in a max-weight link activation for an even longer duration of time). Even computing the normalizing constant A for the distribution in (6.4) is known to be a "#P-complete" problem (178) (see also factor graph approximations in (179)). However, it is known that for link activation sets with certain degree-2 properties, such as those formed by networks on rings, similar Markov chains require only a small (polynomial) time to reach near steady state (180)(181). This may explain why the simulations in (172) for networks with small degree provide good performance.
6.2 MULTIPLICATIVE FACTOR APPROXIMATIONS

While C-additive approximations can push throughput and throughput-utility arbitrarily close to optimal, they may have large convergence times and delays, as discussed in the previous section. It is often possible to provide low-complexity decisions for b(t) that come within a multiplicative factor of the max-weight solution. This section shows that such algorithms immediately lead to constant-factor stability and throughput-utility guarantees. The result holds for general networks, possibly with time-varying channels, and possibly with non-binary rate vectors.
Let S(t) describe the channel randomness on slot t (i.e., the topology state), and let I(t) be the transmission action on slot t, chosen within an abstract set I_{S(t)}. The rate vector b(t) = (b_1(t), . . . , b_L(t)) is determined by a general function of I(t) and S(t):

    b_l(t) = b̂_l(I(t), S(t))   ∀l ∈ {1, . . . , L}      (6.8)
6.2. MULTIPLICATIVE FACTOR APPROXIMATIONS 145
Definition 6.2 Let β, C be constants such that 0 < β ≤ 1 and C ≥ 0. A (β, C)-approximation is an algorithm that makes (possibly randomized) decisions I(t) ∈ I_{S(t)} every slot t to satisfy:

    Σ_{l=1}^L W_l(t)·E{b̂_l(I(t), S(t)) | W(t)} ≥ β·sup_{I∈I_{S(t)}} [ Σ_{l=1}^L W_l(t)·b̂_l(I, S(t)) ] − C
Under this definition, a (1, C)-approximation is the same as a C-additive approximation. It is known that (β, C)-approximations can provide stability in single-hop or multi-hop networks whenever the arrival rates are interior to βΛ, a β-scaled version of the capacity region Λ (17)(22)(19)(182). For example, if β = 1/2, then stability is guaranteed only when arrival rates are at most half the distance to the capacity region boundary (so that the region where we can provide stability guarantees shrinks by 50%). Related constant-factor guarantees are available for joint scheduling and flow control to maximize throughput-utility, where the β-scaling goes inside the utility function (see (22)(19) for a precise scaled-utility statement, (137) for applications to cognitive radio, and (154) for applications to channels with errors). Here, we prove this result only for the special case of achieving stability in a 1-hop network. This provides all of the necessary insight with the least amount of notation, and the reader is referred to the above references for proofs of the more general versions.
Consider a 1-hop network with L queues with dynamics:

    Q_l(t + 1) = max[Q_l(t) − b_l(t), 0] + a_l(t)   ∀l ∈ {1, . . . , L}

where the service variables b_l(t) are determined by I(t) and S(t) via (6.8), and a(t) = (a_1(t), . . . , a_L(t)) is the random vector of new data arrivals on slot t. Define ω(t) ≜ [S(t), a(t)], and assume that ω(t) is i.i.d. over slots with some probability distribution. Define λ_l = E{a_l(t)} as the arrival rate to queue l.
Define an S-only policy as a policy that independently chooses I(t) ∈ I_{S(t)} based only on a (possibly randomized) function of the observed S(t). Define Γ as the set of all vectors (b_1, . . . , b_L) that can be achieved as 1-slot expectations under S-only policies. That is, (b_1, . . . , b_L) ∈ Γ if and only if there is an S-only policy I*(t) that satisfies I*(t) ∈ I_{S(t)} and:

    E{b̂_l(I*(t), S(t))} = b_l   ∀l ∈ {1, . . . , L}
where the expectation on the left-hand-side is with respect to the distribution of S(t) and the possibly randomized decision for I*(t) that is made in reaction to the observed S(t). For simplicity, assume the set Γ is closed. Recall that for any rate vector (λ_1, . . . , λ_L) in the capacity region Λ, there exists an S-only policy I*(t) that satisfies:

    E{b̂_l(I*(t), S(t))} ≥ λ_l   ∀l ∈ {1, . . . , L}
We say that a vector (λ_1, . . . , λ_L) is interior to the scaled capacity region βΛ if there is an ε > 0 such that:

    (λ_1 + ε, . . . , λ_L + ε) ∈ βΛ
Assume the second moments of the arrival and service rate processes are bounded. Define L(Q(t)) = (1/2)·Σ_{l=1}^L Q_l(t)², and recall that the Lyapunov drift satisfies (see (3.16)):

    Δ(Q(t)) ≤ B + Σ_{l=1}^L Q_l(t)·λ_l − Σ_{l=1}^L Q_l(t)·E{b̂_l(I(t), S(t)) | Q(t)}      (6.9)
where B is a positive constant that depends on the maximum second moments.
Theorem 6.3 Consider the above 1-hop network with ω(t) i.i.d. over slots and with arrival rates (λ_1, . . . , λ_L). Fix β such that 0 < β ≤ 1, and suppose there is an ε > 0 such that:

    (λ_1 + ε, . . . , λ_L + ε) ∈ βΛ      (6.10)

If a (β, C)-approximation is used for all slots t (where C ≥ 0 is a given constant), and if E{L(Q(0))} < ∞, then the network is mean rate stable and strongly stable, with average queue backlog bound:

    limsup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{l=1}^L E{Q_l(τ)} ≤ (B + C)/ε

where B is the constant from (6.9).
Proof. Fix slot t. Because our decision I(t) yields a (β, C)-approximation for minimizing the final term on the right-hand-side of (6.9), we have:

    Δ(Q(t)) ≤ B + C + Σ_{l=1}^L Q_l(t)·λ_l − β·Σ_{l=1}^L Q_l(t)·E{b̂_l(I*(t), S(t)) | Q(t)}      (6.11)
where I*(t) is any other (possibly randomized) decision in the set I_{S(t)}. Because (6.10) holds, we know that:

    (λ_1/β + ε/β, . . . , λ_L/β + ε/β) ∈ Λ
Thus, there exists an S-only policy I*(t) that satisfies:

    E{b̂_l(I*(t), S(t)) | Q(t)} = E{b̂_l(I*(t), S(t))} ≥ λ_l/β + ε/β   ∀l ∈ {1, . . . , L}
where the first equality above holds because I*(t) is S-only and hence independent of the queue backlogs Q(t). Plugging this policy into the right-hand-side of (6.11) yields:

    Δ(Q(t)) ≤ B + C + Σ_{l=1}^L Q_l(t)·λ_l − β·Σ_{l=1}^L Q_l(t)·(λ_l/β + ε/β)      (6.12)
            = B + C − ε·Σ_{l=1}^L Q_l(t)      (6.13)
The result then follows by the Lyapunov drift theorem (Theorem 4.1). □
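For completeness, the step cited from Theorem 4.1 is the usual telescoping argument; under the stated assumptions it can be sketched as follows (this is only a sketch of the cited step, not an addition to the theorem):

```latex
\mathbb{E}\{L(Q(\tau+1))\} - \mathbb{E}\{L(Q(\tau))\}
  \le B + C - \epsilon \sum_{l=1}^{L}\mathbb{E}\{Q_l(\tau)\}
\quad\Longrightarrow\quad
\frac{1}{t}\sum_{\tau=0}^{t-1}\sum_{l=1}^{L}\mathbb{E}\{Q_l(\tau)\}
  \le \frac{B+C}{\epsilon} + \frac{\mathbb{E}\{L(Q(0))\}}{\epsilon t}
```

Here the left inequality is (6.13) after taking expectations; summing it over τ ∈ {0, . . . , t − 1}, dividing by εt, and using L(Q(t)) ≥ 0 gives the right inequality, and taking a limsup as t → ∞ yields the stated backlog bound.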
The above theorem can be intuitively interpreted as follows: any (perhaps approximate) effort to schedule transmissions to maximize the weighted sum of transmission rates translates into good network performance. More concretely, simple greedy algorithms with β = 1/2 and C = 0 (i.e., (1/2, 0)-approximation algorithms) exist for networks with matching constraints (where links can be simultaneously scheduled only if they do not share a common node). Indeed, it can be shown that the greedy maximal match algorithm, which first selects the largest-weight link (breaking ties arbitrarily), then selects the next largest-weight link that does not conflict with those already chosen, and so on, yields a (1/2, 0)-approximation, so that it comes within a factor β = 1/2 of the max-weight decision (see, for example, (137)). Distributed random access versions that produce (β, C)-approximations are considered in (154).
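The greedy maximal match rule just described is easy to state in code. In this sketch (the 4-node path graph and the weights are illustrative assumptions), greedy collects weight 4 while the max-weight match collects 6, consistent with the β = 1/2 guarantee:

```python
def greedy_maximal_match(links, weights):
    # Scan links in decreasing weight order, keeping each link that does not
    # share a node with a link already selected (ties broken arbitrarily).
    order = sorted(range(len(links)), key=lambda l: -weights[l])
    used_nodes, match = set(), []
    for l in order:
        a, b = links[l]
        if a not in used_nodes and b not in used_nodes:
            match.append(l)
            used_nodes.update((a, b))
    return match

# Path network 1-2-3-4: the middle link conflicts with both outer links.
links = [(1, 2), (2, 3), (3, 4)]
weights = [3.0, 4.0, 3.0]
match = greedy_maximal_match(links, weights)
greedy_weight = sum(weights[l] for l in match)   # greedy picks only link (2,3)
opt_weight = weights[0] + weights[2]             # best match: links (1,2), (3,4)
assert greedy_weight >= opt_weight / 2           # the (1/2, 0) guarantee
print(greedy_weight, opt_weight)
```

The example also shows why the factor cannot generally be improved for greedy matching: each greedy pick can block at most two optimal links, each of weight no larger than the picked one.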
Different forms of approximate scheduling, not based on approximating the queue-based max-weight rule, are treated using maximal matchings for stable switch scheduling in (183)(102), for stable wireless networks in (184)(104)(103), for utility optimization in (185), and for energy optimization in (186).
CHAPTER 7

Optimization of Renewal Systems
Here we extend the driftpluspenalty framework to allow optimization over renewal systems. In
previous chapters, we considered a slotted structure and assumed that every slot t a single random
event ω(t ) is observed, a single action α(t ) is taken, and the combination of α(t ) and ω(t ) generates
a vector of attributes (i.e., either penalties or rewards) for that slot. Here, we change the slot structure
to a renewal frame structure. The frame durations are variable and can depend on the decisions made
over the course of the frame. Rather than specifying a single action to take on each frame r, we must
specify a dynamic policy π[r] for the frame. A policy is a contingency plan for making a sequence
of decisions, where new random events might take place after each decision in the sequence. This
model allows a larger class of problems to be treated, including Markov Decision Problems, described
in more detail in Section 7.6.2.
An example renewal system is a wireless sensor network that is repeatedly used to perform
sensing tasks. Assume that each new task starts immediately when the previous task is completed.
The duration of each task and the network resources used depend on the policy implemented for
that task. Examples of this type are given in Section 7.4 and Exercise 7.1.
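As a toy rendering of this model (the policy space and the outcome distributions below are purely illustrative assumptions), the following sketch simulates a sequence of renewal frames in which the policy chosen at each frame start determines the distribution of both the frame duration T[r] and the frame penalty y_0[r]:

```python
import random

def run_renewal_system(choose_policy, num_frames, seed=0):
    # Each frame r: a policy pi[r] is chosen at the frame start, then the pair
    # (T[r], y0[r]) is drawn from a distribution that depends only on pi[r],
    # independently of past frames (the renewal property).
    rng = random.Random(seed)
    outcomes = {
        "fast": lambda: (1.0 + rng.random(), 3.0 + rng.random()),  # short, costly
        "slow": lambda: (3.0 + rng.random(), 1.0 + rng.random()),  # long, cheap
    }
    total_time, total_penalty = 0.0, 0.0
    for r in range(num_frames):
        T, y0 = outcomes[choose_policy(r)]()
        total_time += T
        total_penalty += y0
    return total_penalty / total_time    # time average penalty over all frames

avg_fast = run_renewal_system(lambda r: "fast", num_frames=1000)
avg_slow = run_renewal_system(lambda r: "slow", num_frames=1000)
print(round(avg_fast, 2), round(avg_slow, 2))
```

The framework developed in this chapter chooses π[r] frame by frame to optimize exactly such ratios of accumulated penalty to accumulated frame time, subject to additional constraints.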
7.1 THE RENEWAL SYSTEM MODEL
Figure 7.1: An illustration of a sequence of renewal frames, with frame start times t[0] = 0, t[1], t[2], t[3], t[4] and frame durations T[0], T[1], T[2], T[3].
Consider a dynamic system over the continuous timeline t ≥ 0 (where t can be a real number). We decompose the timeline into successive renewal frames. Renewal frames occur one after the other, and the start of each renewal frame is a time when the system state is "refreshed," which will be made precise below. Define t[0] = 0, and let {t[0], t[1], t[2], . . .} be a strictly increasing sequence that represents renewal events. For each r ∈ {0, 1, 2, . . .}, the interval of time [t[r], t[r+1]) is the rth renewal frame. Denote $T[r] \triangleq t[r+1] - t[r]$ as the duration of the rth renewal frame (see Fig. 7.1).
At the start of each renewal frame r ∈ {0, 1, 2, . . .}, the controller chooses a policy π[r] from some abstract policy space $\mathcal{P}$. This policy is implemented over the course of the frame. There may be a sequence of random events during each frame r, and the policy π[r] specifies decisions that are made in reaction to these events. The size of the frame T[r] is random and may depend on the policy. Further, the policy on frame r generates a random vector of penalties $y[r] = (y_0[r], y_1[r], \ldots, y_L[r])$. We formally write the renewal size T[r] and the penalties $y_l[r]$ as random functions of π[r]:

$$ T[r] = \hat{T}(\pi[r]) \;,\quad y_l[r] = \hat{y}_l(\pi[r]) \quad \forall l \in \{0, 1, \ldots, L\} $$
Thus, given π[r], $\hat{T}(\pi[r])$ and $\hat{y}_l(\pi[r])$ are random variables. We make the following renewal assumptions:

• For any policy π ∈ $\mathcal{P}$, the conditional distribution of (T[r], y[r]), given π[r] = π, is independent of the events and outcomes from past frames, and is identically distributed for each frame that uses the same policy π.

• The frame sizes T[r] are always strictly positive, and there are finite constants $T_{min}$, $T_{max}$, $y_{0,min}$, $y_{0,max}$ such that for all policies π ∈ $\mathcal{P}$, we have:

$$ 0 < T_{min} \le \mathbb{E}\{\hat{T}(\pi[r]) \mid \pi[r] = \pi\} \le T_{max} \;,\quad y_{0,min} \le \mathbb{E}\{\hat{y}_0(\pi[r]) \mid \pi[r] = \pi\} \le y_{0,max} $$
• There are finite constants $D^2$ and $y_{l,max}^2$ for l ∈ {1, . . . , L} such that for all π ∈ $\mathcal{P}$:

$$ \mathbb{E}\{\hat{T}(\pi[r])^2 \mid \pi[r] = \pi\} \le D^2 \qquad (7.1) $$
$$ \mathbb{E}\{\hat{y}_l(\pi[r])^2 \mid \pi[r] = \pi\} \le y_{l,max}^2 \quad \forall l \in \{1, \ldots, L\} \qquad (7.2) $$

That is, second moments are uniformly bounded, regardless of the policy.
In the special case when the system evolves in discrete time with unit time slots, all frame sizes T[r] are positive integers, and $T_{min} = 1$.
7.1.1 THE OPTIMIZATION GOAL
Suppose we have an algorithm that chooses π[r] ∈ $\mathcal{P}$ at the beginning of each frame r ∈ {0, 1, 2, . . .}. Assume temporarily that this algorithm yields well defined frame averages $\overline{T}$ and $\overline{y}_l$ with probability 1, so that:

$$ \lim_{R\to\infty} \frac{1}{R}\sum_{r=0}^{R-1} T[r] = \overline{T} \ \text{(w.p.1)} \;,\quad \lim_{R\to\infty} \frac{1}{R}\sum_{r=0}^{R-1} y_l[r] = \overline{y}_l \ \text{(w.p.1)} \qquad (7.3) $$
We want to design an algorithm that chooses policies π[r] over each frame r ∈ {0, 1, 2, . . .} to solve the following problem:

Minimize: $\overline{y}_0/\overline{T}$  (7.4)
Subject to: $\overline{y}_l/\overline{T} \le c_l \quad \forall l \in \{1,\ldots,L\}$  (7.5)
  $\pi[r] \in \mathcal{P} \quad \forall r \in \{0,1,2,\ldots\}$  (7.6)

where $(c_1, \ldots, c_L)$ are a given collection of real numbers that define time average cost constraints for each penalty.
The value $\overline{y}_l/\overline{T}$ represents the time average penalty associated with the $y_l[r]$ process. To understand this, note that the time average penalty, sampled at renewal times, is given by:

$$ \lim_{R\to\infty} \frac{\sum_{r=0}^{R-1} y_l[r]}{\sum_{r=0}^{R-1} T[r]} = \frac{\lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1} y_l[r]}{\lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1} T[r]} = \frac{\overline{y}_l}{\overline{T}} $$

Hence, our goal is to minimize the time average associated with the $y_0[r]$ penalty, subject to the constraint that the time average associated with the $y_l[r]$ process is less than or equal to $c_l$, for all l ∈ {1, . . . , L}.
As before, we shall find it easier to work with time average expectations of the form:

$$ \overline{T}[R] \triangleq \frac{1}{R}\sum_{r=0}^{R-1}\mathbb{E}\{T[r]\} \;,\quad \overline{y}_l[R] \triangleq \frac{1}{R}\sum_{r=0}^{R-1}\mathbb{E}\{y_l[r]\} \quad \forall l \in \{0,1,\ldots,L\} \qquad (7.7) $$

Under mild boundedness assumptions on T[r] and $y_l[r]$ (for example, when these are deterministically bounded), the Lebesgue dominated convergence theorem ensures that the limiting values of $\overline{T}[R]$ and $\overline{y}_l[R]$ also converge to $\overline{T}$ and $\overline{y}_l$ whenever (7.3) holds (see Exercise 7.9).
7.1.2 OPTIMALITY OVER I.I.D. ALGORITHMS
Define an i.i.d. algorithm as one that, at the beginning of each new frame r ∈ {0, 1, 2, . . .}, chooses a policy π[r] by independently and probabilistically selecting π ∈ $\mathcal{P}$ according to some distribution that is the same for all frames r. Let π*[r] represent such an i.i.d. algorithm. Then the values $\{\hat{T}(\pi^*[r])\}_{r=0}^{\infty}$ are independent and identically distributed (i.i.d.) over frames, as are $\{\hat{y}_l(\pi^*[r])\}_{r=0}^{\infty}$. Thus, by the law of large numbers, these have well defined averages $\overline{T}^*$ and $\overline{y}_l^*$ with probability 1, where the averages are equal to the expectations over one frame. We say that the problem (7.4)-(7.6) is feasible if there is an i.i.d. algorithm π*[r] that satisfies:

$$ \frac{\mathbb{E}\{\hat{y}_l(\pi^*[r])\}}{\mathbb{E}\{\hat{T}(\pi^*[r])\}} \le c_l \quad \forall l \in \{1,\ldots,L\} \qquad (7.8) $$
Assuming feasibility, we define $ratio^{opt}$ as the infimum value of the following quantity over all i.i.d. algorithms that meet the constraints (7.8):

$$ \frac{\mathbb{E}\{\hat{y}_0(\pi^*[r])\}}{\mathbb{E}\{\hat{T}(\pi^*[r])\}} $$
The following lemma is an immediate consequence of these definitions:

Lemma 7.1 If there is an i.i.d. algorithm that satisfies the feasibility constraints (7.8), then for any δ > 0 there is an i.i.d. algorithm π*[r] that satisfies:

$$ \mathbb{E}\{\hat{y}_0(\pi^*[r])\} \le \mathbb{E}\{\hat{T}(\pi^*[r])\}\,(ratio^{opt} + \delta) \qquad (7.9) $$
$$ \mathbb{E}\{\hat{y}_l(\pi^*[r])\} \le \mathbb{E}\{\hat{T}(\pi^*[r])\}\,c_l \quad \forall l \in \{1,\ldots,L\} \qquad (7.10) $$
The value $ratio^{opt}$ is defined in terms of i.i.d. algorithms. It can be shown that, under mild assumptions, the value $ratio^{opt}$ is also the infimum of the objective function in the problem (7.4)-(7.6), which does not restrict to i.i.d. algorithms. This is similar in spirit to Theorems 4.18 and 4.5. However, rather than stating these assumptions and proving this result, we simply use $ratio^{opt}$ as our target, so that we desire to push the time average penalty objective as close as possible to the smallest value that can be achieved over i.i.d. algorithms.
It is often useful to additionally assume that the following "Slater" assumption holds:

Slater Assumption for Renewal Systems: There is a value ε > 0 and an i.i.d. algorithm π*[r] such that:

$$ \mathbb{E}\{\hat{y}_l(\pi^*[r])\} \le \mathbb{E}\{\hat{T}(\pi^*[r])\}\,(c_l - \epsilon) \quad \forall l \in \{1,\ldots,L\} \qquad (7.11) $$
7.2 DRIFT-PLUS-PENALTY FOR RENEWAL SYSTEMS
For each l ∈ {1, . . . , L}, define virtual queues $Z_l[r]$ with $Z_l[0] = 0$, and with dynamics as follows:

$$ Z_l[r+1] = \max[Z_l[r] + y_l[r] - c_l T[r],\ 0] \quad \forall l \in \{1,\ldots,L\} \qquad (7.12) $$

Let Z[r] be the vector of queue values, and define the Lyapunov function L(Z[r]) by:

$$ L(Z[r]) \triangleq \frac{1}{2}\sum_{l=1}^{L} Z_l[r]^2 \qquad (7.13) $$
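As a quick illustration (a minimal sketch, not from the text; the names `Z`, `y`, `c`, and `T_frame` are illustrative), the update (7.12) can be simulated per frame:

```python
def update_virtual_queues(Z, y, c, T_frame):
    """One step of the virtual queue dynamics (7.12):
    Z_l[r+1] = max(Z_l[r] + y_l[r] - c_l*T[r], 0) for each l."""
    return [max(Z_l + y_l - c_l * T_frame, 0.0) for Z_l, y_l, c_l in zip(Z, y, c)]

# Example: two constraints over one frame of size T[r] = 2.0.
Z_next = update_virtual_queues(Z=[1.0, 0.0], y=[3.0, 0.5], c=[1.0, 1.0], T_frame=2.0)
# Z_next == [2.0, 0.0]
```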
Define the conditional Lyapunov drift Δ(Z[r]) as:

$$ \Delta(Z[r]) \triangleq \mathbb{E}\{L(Z[r+1]) - L(Z[r]) \mid Z[r]\} $$
Using the same techniques as in previous chapters, it is easy to show that:

$$ \Delta(Z[r]) \le B + \sum_{l=1}^{L} Z_l[r]\,\mathbb{E}\{\hat{y}_l(\pi[r]) - c_l \hat{T}(\pi[r]) \mid Z[r]\} \qquad (7.14) $$
where B is a finite constant that satisfies the following for all r and all possible Z[r]:

$$ B \ge \frac{1}{2}\sum_{l=1}^{L}\mathbb{E}\{(y_l[r] - c_l T[r])^2 \mid Z[r]\} \qquad (7.15) $$

Such a finite constant B exists by the boundedness assumptions (7.1)-(7.2). The drift-plus-penalty for frame r thus satisfies:
$$ \Delta(Z[r]) + V\,\mathbb{E}\{y_0[r] \mid Z[r]\} \le B + V\,\mathbb{E}\{\hat{y}_0(\pi[r]) \mid Z[r]\} + \sum_{l=1}^{L} Z_l[r]\,\mathbb{E}\{\hat{y}_l(\pi[r]) \mid Z[r]\} - \sum_{l=1}^{L} Z_l[r]\,c_l\,\mathbb{E}\{\hat{T}(\pi[r]) \mid Z[r]\} \qquad (7.16) $$
This variable-frame drift methodology was developed in (56)(57) for optimizing delay in networks defined on Markov chains. However, the analysis in (56)(57) used a policy based on minimizing the right-hand-side of the above inequality, which was only shown to be effective for pure feasibility problems (where $\hat{y}_0(\pi[r]) = 0$ for all r) or for problems where the frame durations are independent of the policy (see also Exercise 7.3). Our algorithm below, which can be applied to the general problem, is inspired by the decision rule in (58), which minimizes the ratio of expected drift-plus-penalty over expected frame size.
Renewal-Based Drift-Plus-Penalty Algorithm: At the beginning of each frame r ∈ {0, 1, 2, . . .}, observe Z[r] and do the following:

• Choose a policy π[r] ∈ $\mathcal{P}$ that minimizes the following ratio:

$$ \frac{\mathbb{E}\left\{V\hat{y}_0(\pi[r]) + \sum_{l=1}^{L} Z_l[r]\hat{y}_l(\pi[r]) \,\middle|\, Z[r]\right\}}{\mathbb{E}\{\hat{T}(\pi[r]) \mid Z[r]\}} \qquad (7.17) $$

• Update the virtual queues $Z_l[r]$ by (7.12).
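When the policy space is finite and the per-policy expectations are known (or estimated), the ratio-minimizing step reduces to an argmin over policies. A minimal sketch (not from the text), where `E_y[pi]` holding the list (E{ŷ_0(π)}, . . . , E{ŷ_L(π)}) and `E_T[pi]` holding E{T̂(π)} are assumed inputs:

```python
def choose_policy(policies, E_y, E_T, Z, V):
    """Pick the policy minimizing the drift-plus-penalty ratio (7.17):
    [V*E{y_0(pi)} + sum_l Z_l*E{y_l(pi)}] / E{T(pi)}, with E{T(pi)} > 0."""
    def ratio(pi):
        num = V * E_y[pi][0] + sum(Z_l * y_l for Z_l, y_l in zip(Z, E_y[pi][1:]))
        return num / E_T[pi]
    return min(policies, key=ratio)
```

On each frame one would call `choose_policy`, observe the realized penalties and frame length, and then apply the queue update (7.12).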
As before, we define a C-additive approximation to the ratio-minimizing decision as follows.

Definition 7.2 A policy π[r] is a C-additive approximation of the policy that minimizes (7.17) if:

$$ \frac{\mathbb{E}\left\{V\hat{y}_0(\pi[r]) + \sum_{l=1}^{L} Z_l[r]\hat{y}_l(\pi[r]) \,\middle|\, Z[r]\right\}}{\mathbb{E}\{\hat{T}(\pi[r]) \mid Z[r]\}} \le C + \inf_{\pi\in\mathcal{P}}\left[\frac{\mathbb{E}\left\{V\hat{y}_0(\pi) + \sum_{l=1}^{L} Z_l[r]\hat{y}_l(\pi) \,\middle|\, Z[r]\right\}}{\mathbb{E}\{\hat{T}(\pi) \mid Z[r]\}}\right] $$
In particular, if policy π[r] is a C-additive approximation, then:

$$ \mathbb{E}\left\{V\hat{y}_0(\pi[r]) + \sum_{l=1}^{L} Z_l[r]\hat{y}_l(\pi[r]) \,\middle|\, Z[r]\right\} \le CT_{max} + \mathbb{E}\{\hat{T}(\pi[r]) \mid Z[r]\}\, \frac{\mathbb{E}\left\{V\hat{y}_0(\pi^*[r]) + \sum_{l=1}^{L} Z_l[r]\hat{y}_l(\pi^*[r])\right\}}{\mathbb{E}\{\hat{T}(\pi^*[r])\}} \qquad (7.18) $$

where π*[r] is any i.i.d. algorithm that is chosen in $\mathcal{P}$ and is independent of queues Z[r]. In the above inequality, we have used the fact that:

$$ \mathbb{E}\{\hat{T}(\pi[r]) \mid Z[r]\} \le T_{max} $$
Theorem 7.3 (Renewal-Based Drift-Plus-Penalty Performance) Assume there is an i.i.d. algorithm π*[r] that satisfies the feasibility constraints (7.8). Suppose we implement the above renewal-based drift-plus-penalty algorithm using a C-additive approximation for all frames r, with initial condition $Z_l[0] = 0$ for all l ∈ {1, . . . , L}. Then:

a) All queues $Z_l[r]$ are mean rate stable, in that:

$$ \lim_{R\to\infty} \frac{\mathbb{E}\{Z_l[R]\}}{R} = 0 \quad \forall l \in \{1,\ldots,L\} $$

b) For all l ∈ {1, . . . , L} we have:

$$ \limsup_{R\to\infty}\,(\overline{y}_l[R] - c_l \overline{T}[R]) \le 0 \quad \text{and so} \quad \limsup_{R\to\infty}\, \overline{y}_l[R]/\overline{T}[R] \le c_l $$

where $\overline{y}_l[R]$ and $\overline{T}[R]$ are defined in (7.7).

c) The penalty process $y_0[r]$ satisfies the following for all R > 0:

$$ \overline{y}_0[R] - ratio^{opt}\,\overline{T}[R] \le \frac{B + CT_{max}}{V} $$

where B is defined in (7.15).

d) If the Slater assumption (7.11) holds for a constant ε > 0, then all queues $Z_l[r]$ are strongly stable and satisfy the following for all R > 0:

$$ \frac{1}{R}\sum_{r=0}^{R-1}\sum_{l=1}^{L}\mathbb{E}\{Z_l[r]\} \le \frac{VF}{\epsilon T_{min}} \qquad (7.19) $$
where the constant F is defined below in (7.22). Further, if for all l ∈ {1, . . . , L}, $y_l[r]$ is either deterministically lower bounded or deterministically upper bounded, then queues $Z_l[r]$ are rate stable and:

$$ \limsup_{R\to\infty}\frac{\frac{1}{R}\sum_{r=0}^{R-1} y_l[r]}{\frac{1}{R}\sum_{r=0}^{R-1} T[r]} \le c_l \quad \forall l \in \{1,\ldots,L\} \ \text{(w.p.1)} $$
Proof. (Theorem 7.3) Because we use a C-additive approximation every frame r, we know that (7.18) holds. Plugging the i.i.d. algorithm π*[r] from (7.18) into the right-hand-side of the drift-plus-penalty inequality (7.16) yields:

$$ \Delta(Z[r]) + V\,\mathbb{E}\{y_0[r] \mid Z[r]\} \le B + CT_{max} + \frac{\mathbb{E}\{\hat{T}(\pi[r]) \mid Z[r]\}}{\mathbb{E}\{\hat{T}(\pi^*[r])\}}\, V\,\mathbb{E}\{\hat{y}_0(\pi^*[r])\} + \frac{\mathbb{E}\{\hat{T}(\pi[r]) \mid Z[r]\}}{\mathbb{E}\{\hat{T}(\pi^*[r])\}}\, \sum_{l=1}^{L} Z_l[r]\,\mathbb{E}\{\hat{y}_l(\pi^*[r])\} - \sum_{l=1}^{L} Z_l[r]\,c_l\,\mathbb{E}\{\hat{T}(\pi[r]) \mid Z[r]\} \qquad (7.20) $$
where π*[r] is any policy in $\mathcal{P}$. Now fix δ > 0, and plug into the right-hand-side of (7.20) the policy π*[r] that satisfies (7.9)-(7.10), which makes decisions independent of Z[r], to yield:

$$ \Delta(Z[r]) + V\,\mathbb{E}\{y_0[r] \mid Z[r]\} \le B + CT_{max} + \mathbb{E}\{\hat{T}(\pi[r]) \mid Z[r]\}\, V\,(ratio^{opt} + \delta) $$
The above holds for all δ > 0. Taking a limit as δ → 0 yields:

$$ \Delta(Z[r]) + V\,\mathbb{E}\{y_0[r] \mid Z[r]\} \le B + CT_{max} + \mathbb{E}\{\hat{T}(\pi[r]) \mid Z[r]\}\, V\,ratio^{opt} \qquad (7.21) $$
To prove part (a), we can rearrange (7.21) to yield:

$$ \Delta(Z[r]) \le B + CT_{max} + V\max[ratio^{opt}\,T_{max},\ ratio^{opt}\,T_{min}] - V y_{0,min} $$

where we use $\max[ratio^{opt}\,T_{max},\ ratio^{opt}\,T_{min}]$ because $ratio^{opt}$ may be negative. This proves that all components $Z_l[r]$ are mean rate stable by Theorem 4.1, proving part (a). The first limsup statement in part (b) follows immediately from mean rate stability of $Z_l[r]$ (via Theorem 2.5(b)). The second limsup statement in part (b) follows from the first (see Exercise 7.4).
To prove part (c), we take expectations of (7.21) to find:

$$ \mathbb{E}\{L(Z[r+1])\} - \mathbb{E}\{L(Z[r])\} + V\,\mathbb{E}\{y_0[r]\} \le B + CT_{max} + \mathbb{E}\{\hat{T}(\pi[r])\}\, V\,ratio^{opt} $$

Summing over r ∈ {0, . . . , R − 1} and dividing by RV yields:

$$ \frac{\mathbb{E}\{L(Z[R])\} - \mathbb{E}\{L(Z[0])\}}{RV} + \frac{1}{R}\sum_{r=0}^{R-1}\mathbb{E}\{y_0[r]\} \le \frac{B + CT_{max}}{V} + ratio^{opt}\,\frac{1}{R}\sum_{r=0}^{R-1}\mathbb{E}\{T[r]\} $$
Using the definitions of $\overline{y}_0[R]$ and $\overline{T}[R]$ in (7.7) and noting that $\mathbb{E}\{L(Z[R])\} \ge 0$ and $\mathbb{E}\{L(Z[0])\} = 0$ yields:

$$ \overline{y}_0[R] \le \frac{B + CT_{max}}{V} + ratio^{opt}\,\overline{T}[R] $$

This proves part (c).
Part (d) follows from plugging the policy π*[r] from (7.11) into (7.20) to obtain:

$$ \Delta(Z[r]) + V\,\mathbb{E}\{y_0[r] \mid Z[r]\} \le B + CT_{max} + V\,\frac{\mathbb{E}\{\hat{T}(\pi[r]) \mid Z[r]\}}{\mathbb{E}\{\hat{T}(\pi^*[r])\}}\, y_{0,max} - \epsilon T_{min}\sum_{l=1}^{L} Z_l[r] $$

This can be written in the form:

$$ \Delta(Z[r]) \le VF - \epsilon T_{min}\sum_{l=1}^{L} Z_l[r] $$
where the constant F is defined:

$$ F \triangleq \frac{B + CT_{max}}{V} + \max\left[\frac{T_{max}}{T_{min}}\,y_{0,max},\ \frac{T_{min}}{T_{max}}\,y_{0,max}\right] - y_{0,min} \qquad (7.22) $$
Thus, from Theorem 4.1, we have that (7.19) holds, so that all queues $Z_l[r]$ are strongly stable. In the special case when the $y_l[r]$ are deterministically bounded, we have by the Strong Stability Theorem (Theorem 2.8) that all queues are rate stable. Thus, by Theorem 2.5(a):

$$ \limsup_{R\to\infty}\left[\frac{1}{R}\sum_{r=0}^{R-1} y_l[r] - c_l\,\frac{1}{R}\sum_{r=0}^{R-1} T[r]\right] \le 0 \ \text{(w.p.1)} $$
However:

$$ \frac{\frac{1}{R}\sum_{r=0}^{R-1} y_l[r]}{\frac{1}{R}\sum_{r=0}^{R-1} T[r]} - c_l \le \max\left[\frac{\frac{1}{R}\sum_{r=0}^{R-1} y_l[r]}{\frac{1}{R}\sum_{r=0}^{R-1} T[r]} - c_l,\ 0\right]\frac{\frac{1}{R}\sum_{r=0}^{R-1} T[r]}{\frac{1}{R}\sum_{r=0}^{R-1} T[r]} = \max\left[\frac{1}{R}\sum_{r=0}^{R-1} y_l[r] - c_l\,\frac{1}{R}\sum_{r=0}^{R-1} T[r],\ 0\right]\frac{1}{\frac{1}{R}\sum_{r=0}^{R-1} T[r]} \qquad (7.23) $$
Further, because for all r ∈ {1, 2, . . .} we have $\mathbb{E}\{T[r] \mid T[0], T[1], \ldots, T[r-1]\} \ge T_{min}$ and $\mathbb{E}\{T[r]^2 \mid T[0], T[1], \ldots, T[r-1]\} \le D^2$, from Lemma 4.3 it follows that:

$$ \liminf_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1} T[r] \ge T_{min} > 0 \ \text{(w.p.1)} $$
and so taking a limsup of (7.23) yields:

$$ \limsup_{R\to\infty}\left[\frac{\frac{1}{R}\sum_{r=0}^{R-1} y_l[r]}{\frac{1}{R}\sum_{r=0}^{R-1} T[r]} - c_l\right] \le 0 \times \frac{1}{T_{min}} = 0 \ \text{(w.p.1)} $$

This proves part (d). □
The above theorem shows that the time average penalty can be pushed to within O(1/V) of optimal (for arbitrarily large V). The tradeoff is that the virtual queues are O(V) in size, which affects the time required for the penalties to be close to their required time averages $c_l$.
7.2.1 ALTERNATE FORMULATIONS
In some cases, we care more about $\overline{y}_l$ itself, rather than $\overline{y}_l/\overline{T}$. Consider the following variation of problem (7.4)-(7.6):

Minimize: $\overline{y}_0/\overline{T}$
Subject to: $\overline{y}_l \le 0 \quad \forall l \in \{1,\ldots,L\}$
  $\pi[r] \in \mathcal{P} \quad \forall r \in \{0,1,2,\ldots\}$

This changes the constraints from $\overline{y}_l/\overline{T} \le c_l$ to $\overline{y}_l \le 0$. However, this is just a special case of the original problem (7.4)-(7.6) with $c_l = 0$.
Now suppose we seek to minimize $\overline{y}_0$, rather than $\overline{y}_0/\overline{T}$. The problem is:

Minimize: $\overline{y}_0$
Subject to: $\overline{y}_l/\overline{T} \le c_l \quad \forall l \in \{1,\ldots,L\}$
  $\pi[r] \in \mathcal{P} \quad \forall r \in \{0,1,2,\ldots\}$

This problem has a significantly different structure than (7.4)-(7.6), and it is considerably easier to solve. Indeed, Exercise 7.3 shows that it can be solved by minimizing an expectation every frame, rather than a ratio of expectations.

Finally, we note that Exercise 7.5 explores an alternative algorithm for the original problem (7.4)-(7.6). The alternative uses only a minimum of an expectation every frame, rather than a ratio of expectations.
7.3 MINIMIZING THE DRIFT-PLUS-PENALTY RATIO
We rewrite the drift-plus-penalty ratio (7.17) in the following simplified form:

$$ \frac{\mathbb{E}\{a(\pi)\}}{\mathbb{E}\{b(\pi)\}} $$

where a(π) represents the numerator and b(π) the denominator, both expressed as a function of the policy π ∈ $\mathcal{P}$. We note that $T_{max} \ge \mathbb{E}\{b(\pi)\} \ge T_{min} > 0$ for all π ∈ $\mathcal{P}$. Define θ* as the infimum
of the above ratio:

$$ \theta^* \triangleq \inf_{\pi\in\mathcal{P}} \left\{\frac{\mathbb{E}\{a(\pi)\}}{\mathbb{E}\{b(\pi)\}}\right\} \qquad (7.24) $$

We want to understand how to find θ*.
In the special case when $\mathbb{E}\{b(\pi)\}$ does not depend on the policy π (which holds when the expected renewal interval size is the same for all policies), the minimization is achieved by choosing π ∈ $\mathcal{P}$ to minimize $\mathbb{E}\{a(\pi)\}$. This is important because the minimization of an expectation is typically much simpler than a minimization of the ratio of expectations, and it can often be accomplished through dynamic programming algorithms (64)(67)(57) and their special cases of stochastic shortest path algorithms.

To treat the case when $\mathbb{E}\{b(\pi)\}$ may depend on the policy, we use the following simple but useful lemmas.
Lemma 7.4 For any policy π ∈ $\mathcal{P}$, we have:

$$ \mathbb{E}\{a(\pi) - \theta^* b(\pi)\} \ge 0 \qquad (7.25) $$

with equality if and only if policy π achieves the infimum ratio $\mathbb{E}\{a(\pi)\}/\mathbb{E}\{b(\pi)\} = \theta^*$.

Proof. By definition of θ*, we have for any policy π ∈ $\mathcal{P}$:

$$ \frac{\mathbb{E}\{a(\pi)\}}{\mathbb{E}\{b(\pi)\}} \ge \inf_{\pi\in\mathcal{P}}\left\{\frac{\mathbb{E}\{a(\pi)\}}{\mathbb{E}\{b(\pi)\}}\right\} = \theta^* $$

Multiplying both sides by $\mathbb{E}\{b(\pi)\}$ and noting that $\mathbb{E}\{b(\pi)\} > 0$ yields (7.25). That equality holds if and only if $\mathbb{E}\{a(\pi)\}/\mathbb{E}\{b(\pi)\} = \theta^*$ follows immediately. □
Lemma 7.5 We have:

$$ \inf_{\pi\in\mathcal{P}} \mathbb{E}\{a(\pi) - \theta^* b(\pi)\} = 0 \qquad (7.26) $$

Further, for any value θ ∈ ℝ, we have:

$$ \inf_{\pi\in\mathcal{P}} \mathbb{E}\{a(\pi) - \theta b(\pi)\} < 0 \quad \text{if } \theta > \theta^* \qquad (7.27) $$
$$ \inf_{\pi\in\mathcal{P}} \mathbb{E}\{a(\pi) - \theta b(\pi)\} > 0 \quad \text{if } \theta < \theta^* \qquad (7.28) $$
Proof. To prove (7.26), note from Lemma 7.4 that we have for any policy π:

$$ 0 \le \mathbb{E}\{a(\pi) - \theta^* b(\pi)\} = \mathbb{E}\{b(\pi)\}\left[\frac{\mathbb{E}\{a(\pi)\}}{\mathbb{E}\{b(\pi)\}} - \theta^*\right] \le T_{max}\left[\frac{\mathbb{E}\{a(\pi)\}}{\mathbb{E}\{b(\pi)\}} - \theta^*\right] $$

Taking infimums over π ∈ $\mathcal{P}$ of the above yields:

$$ 0 \le \inf_{\pi\in\mathcal{P}} \mathbb{E}\{a(\pi) - \theta^* b(\pi)\} \le T_{max}\left[\inf_{\pi\in\mathcal{P}}\frac{\mathbb{E}\{a(\pi)\}}{\mathbb{E}\{b(\pi)\}} - \theta^*\right] = 0 $$

where the final equality uses the definition of θ* in (7.24). This proves (7.26).
To prove (7.27), suppose that θ > θ*. Then:

$$ \inf_{\pi\in\mathcal{P}} \mathbb{E}\{a(\pi) - \theta b(\pi)\} = \inf_{\pi\in\mathcal{P}}\left\{\mathbb{E}\{a(\pi) - \theta^* b(\pi)\} - (\theta - \theta^*)\mathbb{E}\{b(\pi)\}\right\} \le \inf_{\pi\in\mathcal{P}} \mathbb{E}\{a(\pi) - \theta^* b(\pi)\} - (\theta - \theta^*)T_{min} = -(\theta - \theta^*)T_{min} < 0 $$

where we have used (7.26). This proves (7.27). To prove (7.28), suppose θ < θ*. Then:

$$ \inf_{\pi\in\mathcal{P}} \mathbb{E}\{a(\pi) - \theta b(\pi)\} = \inf_{\pi\in\mathcal{P}}\left[\mathbb{E}\{a(\pi) - \theta^* b(\pi)\} + \mathbb{E}\{(\theta^* - \theta)b(\pi)\}\right] \ge \inf_{\pi\in\mathcal{P}} \mathbb{E}\{a(\pi) - \theta^* b(\pi)\} + (\theta^* - \theta)T_{min} = (\theta^* - \theta)T_{min} > 0 $$

□
7.3.1 THE BISECTION ALGORITHM
Lemmas 7.4 and 7.5 show that we can approach the optimal ratio θ* with a simple iterative bisection algorithm that computes infimums of expectations at each step. Specifically, suppose that on stage k of the iteration, we have finite bounds $\theta_{min}^{(k)}$ and $\theta_{max}^{(k)}$ such that we know:

$$ \theta_{min}^{(k)} < \theta^* < \theta_{max}^{(k)} $$

Define $\theta_{bisect}^{(k)}$ as:

$$ \theta_{bisect}^{(k)} \triangleq (\theta_{max}^{(k)} + \theta_{min}^{(k)})/2 $$

We then compute $\inf_{\pi\in\mathcal{P}} \mathbb{E}\{a(\pi) - \theta_{bisect}^{(k)} b(\pi)\}$. If the result is 0, then $\theta_{bisect}^{(k)} = \theta^*$. If the result is positive, then we know $\theta_{bisect}^{(k)} < \theta^*$, and if the result is negative, we know $\theta_{bisect}^{(k)} > \theta^*$. We then appropriately adjust our upper and lower bounds for stage k + 1. The uncertainty interval decreases by a factor of 2 on each stage, and so this algorithm converges exponentially fast to the value θ*. This is useful because each stage involves minimizing an expectation, rather than a ratio of expectations.
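A minimal sketch of this bisection (not from the text) for a finite policy set, where each policy is summarized by the pair (E{a(π)}, E{b(π)}); the variable names and tolerance are illustrative:

```python
def bisect_ratio(ab_pairs, theta_lo, theta_hi, tol=1e-9):
    """Bisection on theta using inf_pi E{a(pi) - theta*b(pi)} (Lemma 7.5):
    a positive infimum means theta < theta*, a non-positive one means theta >= theta*."""
    while theta_hi - theta_lo > tol:
        theta = (theta_lo + theta_hi) / 2.0
        val = min(a - theta * b for a, b in ab_pairs)
        if val > 0:
            theta_lo = theta  # theta is below the optimal ratio
        else:
            theta_hi = theta  # theta is at or above the optimal ratio
    return (theta_lo + theta_hi) / 2.0

# The result matches the direct minimum ratio min_pi E{a}/E{b}:
pairs = [(3.0, 2.0), (1.0, 1.0), (5.0, 2.5)]
theta_star = bisect_ratio(pairs, theta_lo=-10.0, theta_hi=10.0)
# theta_star ≈ 1.0, since the three ratios are 1.5, 1.0, and 2.0
```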
7.3.2 OPTIMIZATION OVER PURE POLICIES
Let $\mathcal{P}_{pure}$ be any finite or countably infinite set of policies that we call pure policies:

$$ \mathcal{P}_{pure} = \{\pi_1, \pi_2, \pi_3, \ldots\} $$

Let $\mathcal{P}$ be the larger policy space that considers all probabilistic mixtures of pure policies. Specifically, the space $\mathcal{P}$ considers policies that make a randomized decision about which policy $\pi_i \in \mathcal{P}_{pure}$ to use, according to some probabilities $q_i = Pr[\text{Implement policy } \pi_i]$ with $\sum_{i=1}^{\infty} q_i = 1$. It turns out that minimizing the ratio $\mathbb{E}\{a(\pi)\}/\mathbb{E}\{b(\pi)\}$ over π ∈ $\mathcal{P}$ can be achieved by considering only pure policies π ∈ $\mathcal{P}_{pure}$. To see this, define θ* as the infimum ratio over π ∈ $\mathcal{P}$, and for simplicity, assume that θ* is achieved by some particular policy π* ∈ $\mathcal{P}$, which corresponds to a probability distribution $(q_1^*, q_2^*, \ldots)$ for selecting pure policies $(\pi_1, \pi_2, \ldots)$. Then:
$$ 0 = \mathbb{E}\{a(\pi^*) - \theta^* b(\pi^*)\} = \sum_{i=1}^{\infty} q_i^*\, \mathbb{E}\{a(\pi_i) - \theta^* b(\pi_i)\} \ge \sum_{i=1}^{\infty} q_i^* \left[\inf_{\pi\in\mathcal{P}_{pure}} \mathbb{E}\{a(\pi) - \theta^* b(\pi)\}\right] = \inf_{\pi\in\mathcal{P}_{pure}} \mathbb{E}\{a(\pi) - \theta^* b(\pi)\} $$
On the other hand, because $\mathcal{P}$ is a larger policy space than $\mathcal{P}_{pure}$, we have:

$$ 0 = \inf_{\pi\in\mathcal{P}} \mathbb{E}\{a(\pi) - \theta^* b(\pi)\} \le \inf_{\pi\in\mathcal{P}_{pure}} \mathbb{E}\{a(\pi) - \theta^* b(\pi)\} $$

Thus:

$$ \inf_{\pi\in\mathcal{P}_{pure}} \mathbb{E}\{a(\pi) - \theta^* b(\pi)\} = 0 $$

which shows that the infimum ratio θ* can be found over the set of pure policies.
The same result holds more generally: Let $\mathcal{P}_{pure}$ be any (possibly uncountably infinite) set of policies that we call pure policies. Define Ω as the set of all vectors $(\mathbb{E}\{a(\pi)\}, \mathbb{E}\{b(\pi)\})$ that can be achieved by policies π ∈ $\mathcal{P}_{pure}$. Suppose $\mathcal{P}$ is a larger policy space that contains all pure policies and is such that the set of all vectors $(\mathbb{E}\{a(\pi)\}, \mathbb{E}\{b(\pi)\})$ that can be achieved by policies π ∈ $\mathcal{P}$ is equal to the convex hull of Ω, denoted Conv(Ω).¹ If θ* is the infimum ratio of $\mathbb{E}\{a(\pi)\}/\mathbb{E}\{b(\pi)\}$ over π ∈ $\mathcal{P}$, then:

$$ 0 = \inf_{\pi\in\mathcal{P}} \mathbb{E}\{a(\pi) - \theta^* b(\pi)\} = \inf_{(a,b)\in \mathrm{Conv}(\Omega)} [a - \theta^* b] = \inf_{(a,b)\in\Omega} [a - \theta^* b] = \inf_{\pi\in\mathcal{P}_{pure}} \mathbb{E}\{a(\pi) - \theta^* b(\pi)\} $$
¹The convex hull of a set $\Omega \subseteq \mathbb{R}^k$ (for some integer k > 0) is the set of all finite probabilistic mixtures of vectors in Ω. It can be shown that Conv(Ω) is the set of all expectations $\mathbb{E}\{X\}$ that can be achieved by random vectors X that take values in the set Ω according to any probability distribution that leads to a finite expectation.
where we have used the well known fact that the infimum of a linear function over the convex hull of a set is equal to the infimum over the set itself. Therefore, by Lemma 7.4, it follows that θ* is also the infimum ratio of $\mathbb{E}\{a(\pi)\}/\mathbb{E}\{b(\pi)\}$ over the smaller set of pure policies $\mathcal{P}_{pure}$.
7.3.3 CAVEAT: FRAMES WITH INITIAL INFORMATION
Suppose at the beginning of each frame r, we observe a vector η[r] of initial information that influences the penalties and frame size. Assume $\{\eta[r]\}_{r=0}^{\infty}$ is i.i.d. over frames. Each policy π ∈ $\mathcal{P}$ first observes η[r] and then chooses a sub-policy π′ ∈ $\mathcal{P}_{\eta[r]}$ that possibly depends on the observed η[r]. One might (incorrectly) implement the policy that first observes η[r] and then chooses π′ ∈ $\mathcal{P}_{\eta[r]}$ that minimizes the ratio of conditional expectations $\mathbb{E}\{a(\pi') \mid \eta[r]\}/\mathbb{E}\{b(\pi') \mid \eta[r]\}$. This would work if the denominator does not depend on the policy, but it may be incorrect in general. Minimizing the ratio of expectations is not always achieved by the policy that minimizes the ratio of conditional expectations given the observed initial information. For example, suppose there are two possible initial vectors η₁ and η₂, both equally likely. Suppose there are two possible policies for each vector:

• Under η₁: π₁₁ gives [a = 1, b = 1], π₁₂ gives [a = 2, b = 1].

• Under η₂: π₂₁ gives [a = 20, b = 10], π₂₂ gives [a = .4, b = .1].
It can be shown that any achievable $(\mathbb{E}\{a(\pi)\}, \mathbb{E}\{b(\pi)\})$ vector can be achieved by a probabilistic mixture of the following four pure policies:

• Pure policy π₁: Choose π₁₁ if η[r] = η₁, π₂₁ if η[r] = η₂.

• Pure policy π₂: Choose π₁₁ if η[r] = η₁, π₂₂ if η[r] = η₂.

• Pure policy π₃: Choose π₁₂ if η[r] = η₁, π₂₁ if η[r] = η₂.

• Pure policy π₄: Choose π₁₂ if η[r] = η₁, π₂₂ if η[r] = η₂.
Clearly π₁₁ minimizes the conditional ratio a/b given η₁, and π₂₁ minimizes the conditional ratio a/b given η₂. The policy π₁ that chooses π₁₁ whenever η₁ is observed, and π₂₁ whenever η₂ is observed, yields:

$$ \frac{\mathbb{E}\{a(\pi_1)\}}{\mathbb{E}\{b(\pi_1)\}} = \frac{(1/2)1 + (1/2)20}{(1/2)1 + (1/2)10} = \frac{10.5}{5.5} \approx 1.909 $$
On the other hand, the policy that minimizes the ratio $\mathbb{E}\{a(\pi)\}/\mathbb{E}\{b(\pi)\}$ is the policy π₂, which chooses π₁₁ whenever η₁ is observed, and chooses π₂₂ whenever η₂ is observed:

$$ \frac{\mathbb{E}\{a(\pi_2)\}}{\mathbb{E}\{b(\pi_2)\}} = \frac{(1/2)1 + (1/2)0.4}{(1/2)1 + (1/2)0.1} = \frac{0.7}{0.55} \approx 1.273 $$
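The gap between the two policies can be checked by enumerating all four pure policies (a small verification script, not from the text; the tuples encode the [a, b] outcomes stated above):

```python
# Outcomes (a, b) for each sub-policy, with eta_1 and eta_2 equally likely.
under_eta1 = {"pi11": (1.0, 1.0), "pi12": (2.0, 1.0)}
under_eta2 = {"pi21": (20.0, 10.0), "pi22": (0.4, 0.1)}

ratios = {}
for name1, (a1, b1) in under_eta1.items():
    for name2, (a2, b2) in under_eta2.items():
        # E{a} and E{b} average the two equally likely observations.
        ratios[(name1, name2)] = (0.5 * a1 + 0.5 * a2) / (0.5 * b1 + 0.5 * b2)

best = min(ratios, key=ratios.get)
# best == ("pi11", "pi22"), i.e. policy pi_2 with ratio 0.7/0.55 ≈ 1.273,
# while the conditional-ratio policy ("pi11", "pi21") gives 10.5/5.5 ≈ 1.909.
```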
A correct minimization of the ratio can be obtained as follows: If we happen to know the optimal ratio θ*, we can use the fact that:

$$ 0 = \inf_{\pi\in\mathcal{P}} \mathbb{E}\{a(\pi) - \theta^* b(\pi)\} = \mathbb{E}\left\{\inf_{\pi'\in\mathcal{P}_{\eta[r]}} \mathbb{E}\{a(\pi') - \theta^* b(\pi') \mid \eta[r]\}\right\} $$
and so using the policy π* that first observes η[r] and then chooses π′ ∈ $\mathcal{P}_{\eta[r]}$ to minimize the conditional expectation $\mathbb{E}\{a(\pi') - \theta^* b(\pi') \mid \eta[r]\}$ yields $\mathbb{E}\{a(\pi^*) - \theta^* b(\pi^*)\} = 0$, which by Lemma 7.4 shows it must also minimize the ratio $\mathbb{E}\{a(\pi)\}/\mathbb{E}\{b(\pi)\}$.
If θ* is unknown, we can compute an approximation of θ* via the bisection algorithm as follows. At step k, we have $\theta_{bisect}[k]$, and we want to compute:

$$ \inf_{\pi\in\mathcal{P}} \mathbb{E}\{a(\pi) - \theta_{bisect}[k]\,b(\pi)\} = \mathbb{E}\left\{\inf_{\pi'\in\mathcal{P}_{\eta[r]}}\left[\mathbb{E}\{a(\pi') - \theta_{bisect}[k]\,b(\pi') \mid \eta[r]\}\right]\right\} $$
This can be done by generating a collection of W i.i.d. samples $\{\eta_1, \eta_2, \ldots, \eta_W\}$ (all with the same distribution as η[r]), computing the infimum conditional expectation for each sample, and then using the law of large numbers to approximate the expectation as follows:

$$ \mathbb{E}\left\{\inf_{\pi'\in\mathcal{P}_{\eta[r]}}\left[\mathbb{E}\{a(\pi') - \theta_{bisect}[k]\,b(\pi') \mid \eta[r]\}\right]\right\} \approx \frac{1}{W}\sum_{w=1}^{W}\inf_{\pi'\in\mathcal{P}_{\eta_w}} \mathbb{E}\{a(\pi') - \theta_{bisect}[k]\,b(\pi') \mid \eta[r] = \eta_w\} \triangleq val(\theta_{bisect}[k]) \qquad (7.29) $$
For a given frame r, the same samples $\{\eta_1, \ldots, \eta_W\}$ should be used for each step of the bisection routine. This ensures the stage-r approximation function val(θ) uses the same samples and is thus non-increasing in θ, important for the bisection to work properly (see Exercise 7.2). However, new samples should be used on each frame. If it is difficult to generate new i.i.d. samples $\{\eta_1, \ldots, \eta_W\}$ on each frame (possibly because the distribution of η[r] is unknown), we can use W past values of η[r]. There is a subtle issue here because these past values are not independent of the queue backlogs $Z_l[r]$ that are part of the a(π) function. However, using these past values can still be shown to work via a delayed-queue argument given in the max-weight learning theory of (166).
7.4 TASK PROCESSING EXAMPLE
Consider a network of L wireless nodes that collaboratively process tasks and report the results to a receiver. There is an infinite sequence of tasks {Task[0], Task[1], Task[2], . . .} that are performed back-to-back, and the starting time of task r ∈ {0, 1, 2, . . .} is considered to be the start of renewal frame r. At the beginning of each task r ∈ {0, 1, 2, . . .}, the network observes a vector η[r] of task information. We assume $\{\eta[r]\}_{r=0}^{\infty}$ is i.i.d. over tasks with an unknown distribution. Every task must be processed using one of K pure policies $\mathcal{P}_{pure} = \{\pi_1, \pi_2, \ldots, \pi_K\}$. The frame size T[r], task processing utility g[r], and energy expenditures $y_l[r]$ for each node l ∈ {1, . . . , L} are deterministic functions of η[r] and π[r]:

$$ T[r] = \hat{T}(\eta[r], \pi[r]) \;,\quad g[r] = \hat{g}(\eta[r], \pi[r]) \;,\quad y_l[r] = \hat{y}_l(\eta[r], \pi[r]) $$
Let $p_{av}$ be a positive constant. The goal is to design an algorithm to solve:

Maximize: $\overline{g}/\overline{T}$
Subject to: $\overline{y}_l/\overline{T} \le p_{av} \quad \forall l \in \{1,\ldots,L\}$
  $\pi[r] \in \mathcal{P}_{pure} \quad \forall r \in \{0,1,2,\ldots\}$
Example Problem:

a) State the renewal-based drift-plus-penalty algorithm for this problem.

b) Assume that the frame size is independent of the policy, so that $\hat{T}(\eta[r], \pi[r]) = \hat{T}(\eta[r])$. Show that minimization of the ratio of expectations can be done without bisection, by solving a single deterministic problem every slot.

c) Assume the general case when the frame size depends on the policy. Suppose the optimal ratio value θ*[r] is known for frame r. State the deterministic problem to solve every slot, with the structure of minimizing a(π) − θ*[r]b(π) as in Section 7.3.3.

d) Describe the bisection algorithm that obtains an estimate of θ*[r] for part (c). Assume we have W past values of initial information {η[r], η[r−1], . . . , η[r−W+1]}, and that we know $\theta_{min} \le \theta^*[r] \le \theta_{max}$ for some constants $\theta_{min}$ and $\theta_{max}$.
Solution:

a) Create virtual queues $Z_l[r]$ for each l ∈ {1, . . . , L} as follows:

$$ Z_l[r+1] = \max[Z_l[r] + \hat{y}_l(\eta[r], \pi[r]) - \hat{T}(\eta[r], \pi[r])\,p_{av},\ 0] \qquad (7.30) $$

Every frame r ∈ {0, 1, 2, . . .}, observe η[r] and Z[r] and do the following:

• Choose π[r] ∈ $\mathcal{P}_{pure}$ to minimize:

$$ \frac{\mathbb{E}\left\{-V\hat{g}(\eta[r],\pi[r]) + \sum_{l=1}^{L} Z_l[r]\hat{y}_l(\eta[r],\pi[r]) \,\middle|\, Z[r]\right\}}{\mathbb{E}\{\hat{T}(\eta[r],\pi[r]) \mid Z[r]\}} \qquad (7.31) $$

• Update queues $Z_l[r]$ according to (7.30).
b) If $\mathbb{E}\{\hat{T}(\eta[r],\pi[r]) \mid Z[r]\}$ does not depend on the policy, it suffices to minimize the numerator in (7.31). This is done by observing η[r] and Z[r] and choosing the policy π[r] ∈ $\mathcal{P}_{pure}$ as the one that minimizes:

$$ -V\hat{g}(\eta[r],\pi[r]) + \sum_{l=1}^{L} Z_l[r]\,\hat{y}_l(\eta[r],\pi[r]) $$
c) If θ*[r] is known, then we observe η[r] and Z[r] and choose the policy π[r] ∈ $\mathcal{P}_{pure}$ as the one that minimizes:

$$ -V\hat{g}(\eta[r],\pi[r]) + \sum_{l=1}^{L} Z_l[r]\,\hat{y}_l(\eta[r],\pi[r]) - \theta^*[r]\,\hat{T}(\eta[r],\pi[r]) $$
d) Fix a particular frame r. Let $\theta_{min}^{(k)}$ and $\theta_{max}^{(k)}$ be the bounds on θ*[r] for step k of the bisection, where $\theta_{min}^{(0)} = \theta_{min}$ and $\theta_{max}^{(0)} = \theta_{max}$. Define $\theta_{bisect}^{(k)} = (\theta_{min}^{(k)} + \theta_{max}^{(k)})/2$. Define $\{\eta_1, \ldots, \eta_W\}$ as the W samples to be used. Define the function val(θ) as follows:

$$ val(\theta) = \frac{1}{W}\sum_{i=1}^{W}\left[\min_{\pi\in\mathcal{P}_{pure}}\left[-V\hat{g}(\eta_i,\pi) + \sum_{l=1}^{L} Z_l[r]\,\hat{y}_l(\eta_i,\pi) - \theta\,\hat{T}(\eta_i,\pi)\right]\right] \qquad (7.32) $$
Note that computing val(θ) involves W separate minimizations. Note also that val(θ) is non-increasing in θ (see Exercise 7.2). Now compute $val(\theta_{bisect}^{(k)})$:

• If $val(\theta_{bisect}^{(k)}) = 0$, we are done, and we declare $\theta_{bisect}^{(k)}$ as our estimate of θ*[r].

• If $val(\theta_{bisect}^{(k)}) > 0$, then define $\theta_{min}^{(k+1)} = \theta_{bisect}^{(k)}$, $\theta_{max}^{(k+1)} = \theta_{max}^{(k)}$.

• If $val(\theta_{bisect}^{(k)}) < 0$, then define $\theta_{min}^{(k+1)} = \theta_{min}^{(k)}$, $\theta_{max}^{(k+1)} = \theta_{bisect}^{(k)}$.

Then proceed with the iterations until our error bounds are sufficiently low. Note that this algorithm requires $val(\theta_{min}^{(0)}) \ge 0 \ge val(\theta_{max}^{(0)})$, which should be checked before the iterations begin. If this is violated, we simply increase $\theta_{max}^{(0)}$ and/or decrease $\theta_{min}^{(0)}$.
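Parts (c) and (d) can be sketched together in code (a hypothetical sketch, not from the text: `g_hat`, `y_hat`, and `T_hat` stand for the deterministic functions ĝ(η, π), ŷ_l(η, π), T̂(η, π) of the example, and the fixed step count is illustrative):

```python
def val(theta, samples, policies, Z, V, g_hat, y_hat, T_hat):
    """Sample-average approximation (7.32): average over W samples of the
    per-sample minimum of -V*g + sum_l Z_l*y_l - theta*T over pure policies."""
    total = 0.0
    for eta in samples:
        total += min(
            -V * g_hat(eta, pi)
            + sum(Z[l] * y_hat(eta, pi)[l] for l in range(len(Z)))
            - theta * T_hat(eta, pi)
            for pi in policies
        )
    return total / len(samples)

def bisect_theta(theta_min, theta_max, steps, **kw):
    """Bisection on theta: val(theta) is non-increasing, with its root at theta*[r]."""
    for _ in range(steps):
        theta = (theta_min + theta_max) / 2.0
        if val(theta, **kw) > 0:
            theta_min = theta
        else:
            theta_max = theta
    return (theta_min + theta_max) / 2.0
```

With the returned estimate of θ*[r], the per-frame decision of part (c) is the same minimization evaluated at the observed η[r].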
7.5 UTILITY OPTIMIZATION FOR RENEWAL SYSTEMS
Now consider a renewal system that generates both a penalty vector $y[r] = (y_1[r], \ldots, y_L[r])$ and an attribute vector $x[r] = (x_1[r], \ldots, x_M[r])$. These are random functions of the policy π[r] implemented on frame r:

$$ x_m[r] = \hat{x}_m(\pi[r]) \;,\quad y_l[r] = \hat{y}_l(\pi[r]) \quad \forall m \in \{1,\ldots,M\},\ l \in \{1,\ldots,L\} $$

The frame size T[r] is also a random function of the policy as before: $T[r] = \hat{T}(\pi[r])$. We make the same assumptions as before, including that second moments of $\hat{x}_m(\pi[r])$ are uniformly bounded regardless of the policy, and that the conditional distribution of (T[r], y[r], x[r]), given π[r] = π, is independent of events on previous frames, and is identically distributed on each frame that uses the same policy π. Let $T_{min}$, $T_{max}$, $x_{m,min}$, $x_{m,max}$ be finite constants such that for all policies π ∈ $\mathcal{P}$ and all m ∈ {1, . . . , M}, we have:

$$ 0 < T_{min} \le \mathbb{E}\{\hat{T}(\pi[r]) \mid \pi[r]=\pi\} \le T_{max} \;,\quad x_{m,min} \le \mathbb{E}\{\hat{x}_m(\pi[r]) \mid \pi[r]=\pi\} \le x_{m,max} $$
Under a particular algorithm for choosing policies π[r] over frames r ∈ {0, 1, 2, . . .}, define $\overline{T}[R]$, $\overline{y}_l[R]$, $\overline{x}_m[R]$ for R > 0 by:

$$ \overline{T}[R] \triangleq \frac{1}{R}\sum_{r=0}^{R-1}\mathbb{E}\{T[r]\} \;,\quad \overline{y}_l[R] \triangleq \frac{1}{R}\sum_{r=0}^{R-1}\mathbb{E}\{y_l[r]\} \;,\quad \overline{x}_m[R] \triangleq \frac{1}{R}\sum_{r=0}^{R-1}\mathbb{E}\{x_m[r]\} $$
Define $\overline{T}$, $\overline{y}_l$, $\overline{x}_m$ as the limiting values of $\overline{T}[R]$, $\overline{y}_l[R]$, $\overline{x}_m[R]$, assuming temporarily that the limits exist. For each m ∈ {1, . . . , M}, define $\gamma_{m,min}$ and $\gamma_{m,max}$ by:

$$ \gamma_{m,min} \triangleq \min\left[\frac{x_{m,min}}{T_{min}},\ \frac{x_{m,min}}{T_{max}}\right] \;,\quad \gamma_{m,max} \triangleq \max\left[\frac{x_{m,max}}{T_{min}},\ \frac{x_{m,max}}{T_{max}}\right] $$

It is clear that for all m ∈ {1, . . . , M} and all R > 0, we have:

$$ \gamma_{m,min} \le \frac{\overline{x}_m[R]}{\overline{T}[R]} \le \gamma_{m,max} \;,\quad \gamma_{m,min} \le \frac{\overline{x}_m}{\overline{T}} \le \gamma_{m,max} \qquad (7.33) $$
Let φ(γ) be a continuous, concave, and entrywise non-decreasing function of the vector $\gamma = (\gamma_1, \ldots, \gamma_M)$ over the rectangle γ ∈ $\mathcal{R}$, where:

$$ \mathcal{R} = \{(\gamma_1, \ldots, \gamma_M) \mid \gamma_{m,min} \le \gamma_m \le \gamma_{m,max}\ \forall m \in \{1,\ldots,M\}\} \qquad (7.34) $$
Consider the following problem:

Maximize: $\phi(\overline{x}/\overline{T})$  (7.35)
Subject to: $\overline{y}_l/\overline{T} \le c_l \quad \forall l \in \{1,\ldots,L\}$  (7.36)
  $\pi[r] \in \mathcal{P} \quad \forall r \in \{0,1,2,\ldots\}$  (7.37)

To transform this problem to one that has the structure given in Section 7.1.1, we define auxiliary variables $\gamma[r] = (\gamma_1[r], \ldots, \gamma_M[r])$ that are chosen in the rectangle $\mathcal{R}$ every frame r. We then define a new penalty $y_0[r]$ as follows:

$$ y_0[r] \triangleq -T[r]\,\phi(\gamma[r]) $$
Now consider the following transformed (and equivalent) problem:

Maximize: $\overline{T\phi(\gamma)}/\overline{T}$  (7.38)
Subject to: $\overline{x}_m \ge \overline{T\gamma_m} \quad \forall m \in \{1,\ldots,M\}$  (7.39)
  $\overline{y}_l/\overline{T} \le c_l \quad \forall l \in \{1,\ldots,L\}$  (7.40)
  $\gamma[r] \in \mathcal{R} \quad \forall r \in \{0,1,2,\ldots\}$  (7.41)
  $\pi[r] \in \mathcal{P} \quad \forall r \in \{0,1,2,\ldots\}$  (7.42)

where:

$$ \overline{T\phi(\gamma)} \triangleq \lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1}\mathbb{E}\{T[r]\phi(\gamma[r])\} = -\overline{y}_0 $$
$$ \overline{T\gamma_m} \triangleq \lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1}\mathbb{E}\{T[r]\gamma_m[r]\} \quad \forall m \in \{1,\ldots,M\} $$
That the problems (7.35)-(7.37) and (7.38)-(7.42) are equivalent is proven in Exercise 7.7 using the fact:

$$ \overline{T\phi(\gamma)}/\overline{T} \le \phi\left(\overline{T\gamma}/\overline{T}\right) $$

This fact is a variation on Jensen's inequality and is proven in the following lemma.
Lemma 7.6 Let φ(γ ) be any continuous and concave (not necessarily nondecreasing) function deﬁned
over γ ∈ R, where Ris deﬁned in (7.34).
(a) Let (T, γ ) be a random vector that takes values in the set {(T, γ )T > 0, γ ∈ R} according to
any joint distribution that satisﬁes 0 < E{T } < ∞. Then:
E{T φ(γ )}
E{T }
≤ φ
E{T γ }
E{T }
(b) Let (T[r], γ[r]) be a sequence of random vectors of the type specified in part (a), for r ∈ {0, 1, 2, . . .}. Then for any integer R > 0:
\[
\frac{\frac{1}{R}\sum_{r=0}^{R-1} T[r]\phi(\gamma[r])}{\frac{1}{R}\sum_{r=0}^{R-1} T[r]} \le \phi\left(\frac{\frac{1}{R}\sum_{r=0}^{R-1} T[r]\gamma[r]}{\frac{1}{R}\sum_{r=0}^{R-1} T[r]}\right) \tag{7.43}
\]
\[
\frac{\frac{1}{R}\sum_{r=0}^{R-1} E\{T[r]\phi(\gamma[r])\}}{\frac{1}{R}\sum_{r=0}^{R-1} E\{T[r]\}} \le \phi\left(\frac{\frac{1}{R}\sum_{r=0}^{R-1} E\{T[r]\gamma[r]\}}{\frac{1}{R}\sum_{r=0}^{R-1} E\{T[r]\}}\right) \tag{7.44}
\]
and thus \overline{T\phi(\gamma)}/\overline{T} \le \phi(\overline{T\gamma}/\overline{T}).
Proof. Part (b) follows easily from part (a) (see Exercise 7.6). Here we prove part (a). Let {(T[r], γ[r])}_{r=0}^{∞} be an i.i.d. sequence of random vectors, each with the same distribution as (T, γ). Define t_0 = 0, and for integers R > 0 define t_R = \sum_{r=0}^{R-1} T[r]. Let the interval [t_r, t_{r+1}) represent the rth frame. Define \hat{γ}(t) to take the value γ[r] if t is in the rth frame, so that:
\[
\hat{\gamma}(t) = \gamma[r] \ \text{ if } t \in [t_r, t_{r+1})
\]
We thus have for any integer R > 0:
\[
\frac{1}{t_R}\int_0^{t_R} \phi(\hat{\gamma}(t))\,dt
= \frac{\sum_{r=0}^{R-1} T[r]\phi(\gamma[r])}{\sum_{r=0}^{R-1} T[r]}
= \frac{\frac{1}{R}\sum_{r=0}^{R-1} T[r]\phi(\gamma[r])}{\frac{1}{R}\sum_{r=0}^{R-1} T[r]} \tag{7.45}
\]
On the other hand, by Jensen's inequality for the concave function φ(γ):
\[
\frac{1}{t_R}\int_0^{t_R} \phi(\hat{\gamma}(t))\,dt
\le \phi\left(\frac{1}{t_R}\int_0^{t_R} \hat{\gamma}(t)\,dt\right)
= \phi\left(\frac{\frac{1}{R}\sum_{r=0}^{R-1} T[r]\gamma[r]}{\frac{1}{R}\sum_{r=0}^{R-1} T[r]}\right) \tag{7.46}
\]
Taking limits of (7.45) as R → ∞ and using the law of large numbers yields:
\[
\lim_{R\to\infty} \frac{1}{t_R}\int_0^{t_R} \phi(\hat{\gamma}(t))\,dt = \frac{E\{T\phi(\gamma)\}}{E\{T\}} \quad (w.p.1)
\]
Taking limits of (7.46) as R → ∞ and using the law of large numbers and continuity of φ(γ) yields:
\[
\lim_{R\to\infty} \frac{1}{t_R}\int_0^{t_R} \phi(\hat{\gamma}(t))\,dt \le \phi\left(\frac{E\{T\gamma\}}{E\{T\}}\right) \quad (w.p.1)
\]
□
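Lemma 7.6(a) can be sanity-checked numerically. The sketch below (all names hypothetical, with φ = sqrt as a concave test function) draws random (T, γ) samples and verifies that the weighted Jensen inequality holds for the empirical distribution:

```python
import random
import math

random.seed(0)

# Hypothetical samples of (T, gamma): frame sizes T > 0 and values gamma in [0, 1].
samples = [(random.uniform(0.5, 2.0), random.uniform(0.0, 1.0)) for _ in range(10000)]

phi = math.sqrt  # a concave test function

# Empirical versions of E{T phi(gamma)}/E{T} and phi(E{T gamma}/E{T}).
ET = sum(T for T, g in samples) / len(samples)
lhs = sum(T * phi(g) for T, g in samples) / len(samples) / ET
rhs = phi(sum(T * g for T, g in samples) / len(samples) / ET)

print(lhs <= rhs + 1e-12)  # Lemma 7.6(a) predicts lhs <= rhs
```

Because the empirical average is itself a weighted average with weights T[r]/Σ T[r], the inequality holds deterministically for any sample set, not just in the limit.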
7.5.1 THE UTILITY OPTIMAL ALGORITHM FOR RENEWAL SYSTEMS
To solve (7.38)-(7.42), we enforce the constraints \overline{x}_m ≥ \overline{Tγ_m} and \overline{y}_l/\overline{T} ≤ c_l with virtual queues Z_l[r] and G_m[r] for l ∈ {1, . . . , L} and m ∈ {1, . . . , M}:
\[
Z_l[r+1] = \max[Z_l[r] + y_l[r] - T[r]c_l,\ 0] \tag{7.47}
\]
\[
G_m[r+1] = \max[G_m[r] + T[r]\gamma_m[r] - x_m[r],\ 0] \tag{7.48}
\]
Note that the constraint \overline{x}_m ≥ \overline{Tγ_m} is equivalent to \overline{Tγ_m} − \overline{x}_m ≤ 0, which is the same as \overline{p}_m/\overline{T} ≤ 0 for p_m[r] ≜ T[r]γ_m[r] − x_m[r]. Hence, the transformed problem fits the general renewal framework (7.4)-(7.6). Using y_0[r] = −T[r]φ(γ[r]), the algorithm then observes Z[r], G[r] at the beginning of each frame r ∈ {0, 1, 2, . . .} and chooses a policy π[r] ∈ \mathcal{P} and auxiliary variables γ[r] ∈ \mathcal{R} to minimize:
\[
\frac{-V E\{\hat{T}(\pi[r])\phi(\gamma[r]) \mid Z[r], G[r]\}}{E\{\hat{T}(\pi[r]) \mid Z[r], G[r]\}}
+ \frac{E\left\{\sum_{l=1}^{L} Z_l[r]\hat{y}_l(\pi[r]) + \sum_{m=1}^{M} G_m[r][\hat{T}(\pi[r])\gamma_m[r] - \hat{x}_m(\pi[r])] \,\middle|\, Z[r], G[r]\right\}}{E\{\hat{T}(\pi[r]) \mid Z[r], G[r]\}}
\]
This minimization can be simplified by separating out the terms that use auxiliary variables. The expression to minimize is thus:
\[
\frac{E\left\{\hat{T}(\pi[r])[-V\phi(\gamma[r]) + \sum_{m=1}^{M} G_m[r]\gamma_m[r]] \,\middle|\, Z[r], G[r]\right\}}{E\{\hat{T}(\pi[r]) \mid Z[r], G[r]\}}
+ \frac{E\left\{\sum_{l=1}^{L} Z_l[r]\hat{y}_l(\pi[r]) - \sum_{m=1}^{M} G_m[r]\hat{x}_m(\pi[r]) \,\middle|\, Z[r], G[r]\right\}}{E\{\hat{T}(\pi[r]) \mid Z[r], G[r]\}}
\]
Clearly, the γ [r] variables can be optimized separately to minimize the ﬁrst term, making the
frame size in the numerator and denominator of the ﬁrst term cancel. The resulting algorithm is
thus: Observe Z[r] and G[r] at the beginning of each frame r ∈ {0, 1, 2, . . .}, and perform the
following:
• (Auxiliary Variables) Choose γ[r] to solve:
\[
\begin{aligned}
\text{Maximize:} \quad & V\phi(\gamma[r]) - \sum_{m=1}^{M} G_m[r]\gamma_m[r]\\
\text{Subject to:} \quad & \gamma_{m,\min} \le \gamma_m[r] \le \gamma_{m,\max} \quad \forall m \in \{1, \ldots, M\}
\end{aligned}
\]
• (Policy Selection) Choose π[r] ∈ \mathcal{P} to minimize the following:
\[
\frac{E\left\{\sum_{l=1}^{L} Z_l[r]\hat{y}_l(\pi[r]) - \sum_{m=1}^{M} G_m[r]\hat{x}_m(\pi[r]) \,\middle|\, Z[r], G[r]\right\}}{E\{\hat{T}(\pi[r]) \mid Z[r], G[r]\}}
\]
• (Virtual Queue Updates) At the end of frame r, update Z[r] and G[r] by (7.47) and (7.48).
The auxiliary variable update has the same structure as that given in Chapter 5, and it is a deterministic optimization that reduces to M optimizations of single-variable functions if φ(γ) has the form φ(γ) = \sum_{m=1}^{M} φ_m(γ_m). The policy selection stage is a minimization of a ratio of expectations, and it can be solved with the techniques given in Section 7.3.
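For a separable utility such as φ(γ) = Σ_m log(1 + γ_m), each coordinate of the auxiliary variable update has a closed form: maximizing V log(1 + γ) − Gγ over an interval gives the unconstrained root V/G − 1 of the derivative, clipped to [γ_min, γ_max]. A sketch under that assumed utility (function and variable names hypothetical):

```python
def aux_variable_update(G, V, gmin, gmax):
    """Per-coordinate auxiliary variable update for phi(gamma) = sum_m log(1 + gamma_m).

    Maximizes V*log(1 + gamma_m) - G[m]*gamma_m over [gmin[m], gmax[m]].
    The unconstrained maximizer solves V/(1 + gamma) - G[m] = 0, i.e. V/G[m] - 1,
    clipped to the interval; G[m] = 0 pushes gamma_m to its upper limit.
    """
    gamma = []
    for m in range(len(G)):
        g_star = gmax[m] if G[m] <= 0 else V / G[m] - 1.0
        gamma.append(min(max(g_star, gmin[m]), gmax[m]))
    return gamma

gamma = aux_variable_update(G=[2.0, 50.0, 0.0], V=10.0, gmin=[0.0] * 3, gmax=[3.0] * 3)
print(gamma)  # [3.0, 0.0, 3.0]
```

Large G_m[r] (a badly violated constraint) drives γ_m[r] down, while small G_m[r] lets the utility term push γ_m[r] up, which is exactly the intended feedback.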
7.6 DYNAMIC PROGRAMMING EXAMPLES
This section presents more complex renewal system examples that involve the theory of dynamic programming. Readers unfamiliar with dynamic programming can skip this section, and are referred to (64) for a coverage of that theory. Readers familiar with dynamic programming can peruse these examples.
7.6.1 DELAY-LIMITED TRANSMISSION EXAMPLE
Here we present an example similar to the delay-limited transmission system developed for cooperative communication in (71), although we remove the cooperative component for simplicity. Consider a system with L wireless transmitters that deliver data to a common receiver. Time is slotted with unit size, and all frames are fixed to T slots, where T is a positive integer. At the beginning of each frame r ∈ {0, 1, 2, . . .}, new packets arrive for transmission. These packets must be delivered within the T-slot frame τ ∈ {rT, . . . , (r+1)T − 1}, or they are dropped at the end of the frame. Let A[r] = (A_1[r], . . . , A_L[r]) be the vector of new packet arrivals, treated as initial information about frame r. Assume that A[r] is i.i.d. over frames. On each slot τ of the T-slot frame, at most one transmitter l ∈ {1, . . . , L} is allowed to transmit, and it can transmit at most a single packet. Let Q_l(τ) represent the (integer) queue size for transmitter l on slot τ. Then for frame r ∈ {0, 1, . . .},
we have:
\[
Q_l(rT) = A_l[r]
\]
\[
Q_l(rT + v) = A_l[r] - \sum_{\tau=rT}^{rT+v-1} 1_l(\tau) \,,\quad \forall v \in \{1, \ldots, T-1\}
\]
where 1_l(τ) is an indicator function that is 1 if transmitter l successfully delivers a packet on slot τ, and is 0 otherwise.
The success of each packet transmission depends on the power that was used. Let p(τ) = (p_1(τ), . . . , p_L(τ)) represent the power allocation vector on each slot τ in the T-slot frame. This vector is chosen every slot τ subject to the constraints:
\[
\begin{aligned}
& 0 \le p_l(\tau) \le p_{\max} \quad \forall l \in \{1, \ldots, L\}, \forall \tau\\
& p_l(\tau) = 0 \ \text{ if } Q_l(\tau) = 0 \quad \forall l \in \{1, \ldots, L\}, \forall \tau\\
& p_l(\tau)p_m(\tau) = 0 \quad \forall l, m \in \{1, \ldots, L\} \text{ with } l \neq m, \forall \tau
\end{aligned}
\]
The third constraint above ensures at most one transmitter can send on any given slot. Transmission successes are conditionally independent of past history given the transmission power used, with success probability for each l ∈ {1, . . . , L} given by:
\[
q_l(p) \triangleq Pr[\text{transmitter } l \text{ is successful on slot } \tau \mid p_l(\tau) = p,\ Q_l(\tau) > 0]
\]
We assume that q_l(0) = 0 for all l ∈ {1, . . . , L}. Define D_l[r] and y_l[r] as the total packets delivered and total energy expended by transmitter l on frame r:
\[
D_l[r] \triangleq \sum_{\tau=rT}^{rT+T-1} 1_l(\tau) \quad \forall l \in \{1, \ldots, L\}
\]
\[
y_l[r] \triangleq \sum_{\tau=rT}^{rT+T-1} p_l(\tau) \quad \forall l \in \{1, \ldots, L\}
\]
The goal is to maximize a weighted sum of throughput subject to average power constraints:
\[
\begin{aligned}
\text{Maximize:} \quad & \sum_{l=1}^{L} w_l \overline{D}_l / T\\
\text{Subject to:} \quad & \overline{y}_l / T \le p_{av} \quad \forall l \in \{1, \ldots, L\}\\
& \pi[r] \in \mathcal{P} \quad \forall r \in \{0, 1, 2, \ldots\}
\end{aligned}
\]
where {w_l}_{l=1}^{L} are a given collection of positive weights, p_{av} is a given constant power constraint, \overline{D}_l and \overline{y}_l are the average delivered data and energy expenditure by transmitter l on one frame, and \mathcal{P} is the policy space that conforms to the above transmission constraints over the frame. This problem fits the standard renewal form given in Section 7.1 with c_l = p_{av} for all l ∈ {1, . . . , L}, and:
\[
y_0[r] \triangleq -\sum_{l=1}^{L} w_l \sum_{\tau=rT}^{rT+T-1} 1_l(\tau)
\]
We thus form virtual queues Z_l[r] for each l ∈ {1, . . . , L}, with updates:
\[
Z_l[r+1] = \max\left[Z_l[r] + \sum_{\tau=rT}^{rT+T-1} p_l(\tau) - p_{av}T,\ 0\right] \tag{7.49}
\]
Then perform the following:
• For every frame r, observe A[r] and make actions over the course of the frame to solve:
\[
\begin{aligned}
\text{Maximize:} \quad & \sum_{l=1}^{L} \sum_{\tau=rT}^{rT+T-1} E\{V w_l 1_l(\tau) - Z_l[r]p_l(\tau) \mid Z[r], A[r]\}\\
\text{Subject to:} \quad & (1)\ 0 \le p_l(\tau) \le p_{\max} \quad \forall l, \forall \tau \in \{rT, \ldots, rT+T-1\}\\
& (2)\ p_l(\tau) = 0 \ \text{ if } Q_l(\tau) = 0 \quad \forall l, \forall \tau \in \{rT, \ldots, rT+T-1\}\\
& (3)\ p_l(\tau)p_m(\tau) = 0 \quad \forall l \neq m, \forall \tau \in \{rT, \ldots, rT+T-1\}
\end{aligned}
\]
• Update Z_l[r] according to (7.49).
The above uses the fact that the desired ratio (7.17) in this case has a constant denominator T, and hence it suffices to minimize the numerator, which can be achieved by minimizing the conditional expectation given both the Z[r] and A[r] values. Using iterated expectations, the expression to be maximized can be rewritten:
\[
\sum_{l=1}^{L} \sum_{\tau=rT}^{rT+T-1} E\{V w_l q_l(p_l(\tau)) - Z_l[r]p_l(\tau) \mid Z[r], A[r]\}
\]
The problem can be solved as a dynamic program (64). Specifically, we can start backwards and define J_T(Q) as the optimal reward in the final stage T (corresponding to slot τ = rT + T − 1) given that Q(rT + T − 1) = Q:
\[
J_T(Q) \triangleq \max_{l : Q_l > 0}\left[\max_{p : 0 \le p \le p_{\max}} [V w_l q_l(p) - Z_l[r]p]\right]
\]
This function J_T(Q) is computed for all integer vectors Q that satisfy 0 ≤ Q ≤ A[r]. Then define J_{T−1}(Q) as the optimal expected sum reward in the last two stages {T−1, T}, given that Q(rT + T − 2) = Q:
\[
J_{T-1}(Q) \triangleq \max_{l : Q_l > 0}\left[\max_{p : 0 \le p \le p_{\max}} [V w_l q_l(p) - Z_l[r]p + q_l(p)J_T(Q - e_l) + (1 - q_l(p))J_T(Q)]\right]
\]
where e_l is a vector that is zero in all entries j ≠ l, and is 1 in entry l. The function J_{T−1}(Q) is also computed for all Q that satisfy 0 ≤ Q ≤ A[r]. In general, we have for stages k ∈ {1, . . . , T−1} the following recursive equation:
\[
J_k(Q) \triangleq \max_{l : Q_l > 0}\left[\max_{p : 0 \le p \le p_{\max}} [V w_l q_l(p) - Z_l[r]p + q_l(p)J_{k+1}(Q - e_l) + (1 - q_l(p))J_{k+1}(Q)]\right]
\]
The value J_1(Q) represents the expected total reward over frame r under the optimal policy, given that Q(rT) = Q. The optimal action to take at each stage k corresponds to the transmitter l and the power level p that achieve the maximum in the computation of J_k(Q).
For a modified problem where power allocations are restricted to p_l(τ) ∈ {0, p_max}, it can be shown that the problem has a simple greedy solution: On each slot τ of frame r, consider the set of links l such that Q_l(τ) > 0, and transmit over the link l in this set that has the largest positive V w_l q_l(p_max) − Z_l[r]p_max value, breaking ties arbitrarily and choosing not to transmit over any link if none of these values are positive.
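The greedy rule is a one-line index comparison; a minimal sketch (names hypothetical, with q_max[l] standing for q_l(p_max)):

```python
def greedy_choice(Q, V, w, Z, q_max, p_max):
    """Greedy rule for the on/off power restriction p_l in {0, p_max}.

    Among links with Q_l > 0, pick the one with the largest positive index
    V*w_l*q_l(p_max) - Z_l*p_max; transmit nothing (return None) if no index
    is positive.
    """
    best_l, best_val = None, 0.0
    for l in range(len(Q)):
        if Q[l] <= 0:
            continue
        val = V * w[l] * q_max[l] - Z[l] * p_max
        if val > best_val:
            best_l, best_val = l, val
    return best_l

choice = greedy_choice(Q=[3, 0, 1], V=10.0, w=[1.0, 5.0, 2.0],
                       Z=[1.0, 0.5, 0.2], q_max=[0.5, 0.9, 0.8], p_max=1.0)
print(choice)  # link 2: index 10*2*0.8 - 0.2 = 15.8 beats link 0's 10*0.5 - 1 = 4.0
```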
7.6.2 MARKOV DECISION PROBLEM FOR MINIMUM DELAY SCHEDULING
Here we consider a Markov decision problem involving queueing delay, from (56)(57). Consider a 2-queue wireless downlink in slotted time t ∈ {0, 1, 2, . . .}. Packets arrive randomly every slot, and the controller can transmit a packet from at most one queue per slot. Let Q_i(t) be the (integer) number of packets in queue i on slot t, for i ∈ {1, 2}. We assume the queues have a finite buffer of 10 packets, so that packets arriving when Q_i(t) = 10 are dropped. To enforce a renewal structure, let χ(t) be an independent process of i.i.d. Bernoulli variables with Pr[χ(t) = 1] = δ, for some renewal probability δ > 0. The contents of both queues are emptied whenever χ(t) = 1, so that the queueing dynamics are given by:
\[
Q_i(t+1) = \begin{cases} \min[Q_i(t) + A_i(t),\ 10] - 1_i(t) & \text{if } \chi(t) = 0\\ 0 & \text{if } \chi(t) = 1 \end{cases}
\]
where 1_i(t) is an indicator function that is 1 if a packet is successfully transmitted from queue i on slot t (and is 0 otherwise), and A_i(t) is the (integer) number of new packet arrivals to queue i. The maximum packet loss rate due to forced renewals is thus 20δ, which can be made arbitrarily small with a small choice of δ > 0. We assume the controller knows the value of χ(t) at the beginning of each slot. We have two choices of a renewal definition: (i) define a renewal event on slot t whenever (Q_1(t), Q_2(t)) = (0, 0), or (ii) define a renewal event on slot t whenever χ(t−1) = 1. The first definition has shorter renewal frames, but the frame sizes depend on the control actions. This would require minimizing a ratio of expectations every slot. The second definition has frame sizes that are independent of the control actions and have mean 1/δ. For simplicity, we use the second definition.
Let g_i(t) be the number of packets dropped from queue i on slot t:
\[
g_i(t) = \begin{cases} A_i(t)1\{Q_i(t) = 10\} & \text{if } \chi(t) = 0\\ Q_i(t) + A_i(t) - 1_i(t) & \text{if } \chi(t) = 1 \end{cases}
\]
where 1{Q_i(t) = 10} is an indicator function that is 1 if Q_i(t) = 10, and 0 otherwise.
Assume the processes A_1(t) and A_2(t) are independent of each other, A_1(t) is i.i.d. Bernoulli with Pr[A_1(t) = 1] = λ_1, and A_2(t) is i.i.d. Bernoulli with Pr[A_2(t) = 1] = λ_2. Every slot, the controller chooses a queue for transmission by selecting a power allocation vector (p_1(t), p_2(t)) subject to the constraints:
\[
\begin{aligned}
& 0 \le p_i(t) \le p_{\max} \,,\quad p_1(t)p_2(t) = 0 \quad \forall i \in \{1, 2\}, \forall t\\
& p_i(t) = 0 \ \text{ if } Q_i(t) = 0 \quad \forall i \in \{1, 2\}, \forall t
\end{aligned}
\]
where p_max is a given maximum power level. Let \mathcal{P}(Q) denote the set of all power vectors that satisfy these constraints. Transmission successes are independent of past history given the power level used, with probabilities:
\[
q_i(p) \triangleq Pr[1_i(t) = 1 \mid Q_i(t) > 0,\ p_i(t) = p]
\]
Assume that q_1(0) = q_2(0) = 0.
The goal is to minimize the time average rate of packet drops \overline{g}_1 + \overline{g}_2 subject to an average power constraint p_{av} and an average delay constraint of 3 slots for all non-dropped packets in each queue: \overline{W}_1 ≤ 3, \overline{W}_2 ≤ 3. Specifically, define \tilde{λ}_i = λ_i − \overline{g}_i as the throughput of queue i. By Little's Theorem (129), we have \overline{Q}_i = \tilde{λ}_i\overline{W}_i, and so the delay constraints can be transformed to \overline{Q}_i ≤ 3\tilde{λ}_i, which is equivalent to \overline{Q}_i − 3(λ_i − \overline{g}_i) ≤ 0.
Let t[r] be the slot that starts renewal frame r ∈ {0, 1, 2, . . .} (where t[0] = 0), and let T[r] represent the number of slots in the rth renewal frame. Thus, we have constraints:
\[
\lim_{R\to\infty} \frac{1}{R}\sum_{r=0}^{R-1} \sum_{\tau=t[r]}^{t[r]+T[r]-1} [Q_1(\tau) - 3(A_1(\tau) - g_1(\tau))] \le 0
\]
\[
\lim_{R\to\infty} \frac{\frac{1}{R}\sum_{r=0}^{R-1} \sum_{\tau=t[r]}^{t[r]+T[r]-1} [p_1(\tau) + p_2(\tau)]}{\frac{1}{R}\sum_{r=0}^{R-1} T[r]} \le p_{av}
\]
with a constraint of the same form as the first holding for queue 2.
Following the renewal system framework, we define virtual queues Z_1[r], Z_2[r], Z_p[r]:
\[
Z_1[r+1] = \max\left[Z_1[r] + \sum_{\tau=t[r]}^{t[r]+T[r]-1} [Q_1(\tau) - 3(A_1(\tau) - g_1(\tau))],\ 0\right] \tag{7.50}
\]
\[
Z_2[r+1] = \max\left[Z_2[r] + \sum_{\tau=t[r]}^{t[r]+T[r]-1} [Q_2(\tau) - 3(A_2(\tau) - g_2(\tau))],\ 0\right] \tag{7.51}
\]
\[
Z_p[r+1] = \max\left[Z_p[r] + \sum_{\tau=t[r]}^{t[r]+T[r]-1} [p_1(\tau) + p_2(\tau) - p_{av}],\ 0\right] \tag{7.52}
\]
Making the queues Z_1[r] and Z_2[r] rate stable ensures the desired delay constraints are satisfied, and making queue Z_p[r] rate stable ensures the power constraint is satisfied. We thus have the following algorithm, which only minimizes the numerator in the ratio of expectations because the denominator is independent of the policy:
• At the beginning of each frame r, observe Z[r] = [Z_1[r], Z_2[r], Z_p[r]] and make power allocation decisions to minimize the following expression over the frame:
\[
E\left\{\sum_{\tau=t[r]}^{t[r]+T[r]-1} f(p(\tau), A(\tau), Q(\tau), Z[r]) \,\middle|\, Z[r]\right\}
\]
where f(p(τ), A(τ), Q(τ), Z[r]) is defined:
\[
f(p(\tau), A(\tau), Q(\tau), Z[r]) \triangleq V(g_1(\tau) + g_2(\tau)) + \sum_{i=1}^{2} Z_i[r][Q_i(\tau) + 3g_i(\tau)] + Z_p[r][p_1(\tau) + p_2(\tau)]
\]
• Update the virtual queues Z[r] by (7.50)-(7.52).
The minimization in the above algorithm can be solved by dynamic programming. Specifically, given queues Z[r] = Z that start the frame, define J_Z(Q) as the optimal cost until the end of a renewal frame, given that the initial queue backlog is Q. Then J_Z(0) is the value of the expression to be minimized. We have (56)(57):
\[
J_Z(Q) = \delta E_A\left\{\inf_{p \in \mathcal{P}(Q)} f(p, A, Q, Z) \,\middle|\, \chi = 1, Q, Z\right\}
+ (1-\delta)E_A\left\{\inf_{p \in \mathcal{P}(Q)}\left[f(p, A, Q, Z) + h(p, A, Q, Z)\right] \,\middle|\, \chi = 0, Q, Z\right\} \tag{7.53}
\]
where h(p, A, Q, Z) is defined:
\[
h(p, A, Q, Z) \triangleq J_Z(\min[Q + A,\ 10])(1 - q(p)) + J_Z(\min[Q + A,\ 10] - e(p))q(p)
\]
where:
\[
q(p) = \begin{cases} q_1(p_1) & \text{if } p_1 > 0\\ q_2(p_2) & \text{if } p_1 = 0 \end{cases} \,,\quad
e(p) = \begin{cases} (1, 0) & \text{if } p_1 > 0\\ (0, 1) & \text{if } p_1 = 0 \end{cases}
\]
The equation (7.53) must be solved to find J_Z(Q) for all Q ∈ {0, 1, . . . , 10} × {0, 1, . . . , 10}. Define Φ(J) as an operator that takes a function J(Q) (for Q ∈ {0, 1, . . . , 10} × {0, 1, . . . , 10}) and maps it to another such function via the right-hand side of (7.53). Then (7.53) reduces to:
\[
J_Z(Q) = \Phi(J_Z)(Q)
\]
and hence the desired J_Z(Q) is a fixed point of the Φ(·) operator. It can be shown that Φ(·) is a contraction with an appropriate definition of distance (67)(57), and so the fixed point is unique and can be obtained by iteration of the Φ(·) operator starting with any initial function J^{(0)}(Q) (such as J^{(0)}(Q) = 0):
\[
J^{(0)}(Q) = 0 \,,\quad J^{(i+1)}(Q) = \Phi(J^{(i)})(Q) \quad \forall i \in \{0, 1, 2, \ldots\}
\]
Then lim_{i→∞} J^{(i)}(Q) solves the fixed point equation and hence is equal to the desired J_Z(Q) function. While this then needs to be recomputed for the next frame (because the queues Z[r] change), the change in these queues over one frame is bounded, and the resulting J_Z(Q) function for frame r is already a good approximation for this function on frame r+1. Thus, the initial value of the iteration can be the final value found in the previous frame.
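The fixed-point iteration described above can be written generically. The sketch below is purely illustrative: a toy contraction (names hypothetical) stands in for the true operator defined by the right-hand side of (7.53):

```python
def value_iteration(apply_operator, J0, tol=1e-9, max_iters=10000):
    """Iterate J <- Phi(J) to a fixed point.

    apply_operator maps a dict J[Q] -> a new dict; since the operator is a
    contraction, the iterates converge to the unique fixed point.
    """
    J = dict(J0)
    for _ in range(max_iters):
        J_next = apply_operator(J)
        if max(abs(J_next[Q] - J[Q]) for Q in J) < tol:
            return J_next
        J = J_next
    return J

# Toy contraction standing in for the true operator: J(Q) = cost(Q) + 0.5*J(Q).
cost = {0: 1.0, 1: 2.0}
phi = lambda J: {Q: cost[Q] + 0.5 * J[Q] for Q in J}
J = value_iteration(phi, {0: 0.0, 1: 0.0})
print(J[0], J[1])  # converges toward the fixed point cost(Q)/(1 - 0.5): 2.0, 4.0
```

Warm-starting with the previous frame's final table, as the text suggests, just means passing that table as J0.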
Iteration of the Φ(J) operator requires knowledge of the A(t) distribution to compute the desired expectations. In this case of independent Bernoulli inputs, this involves knowing only the two scalars λ_1 and λ_2. However, for larger problems where the random events every slot can be a large vector, the expectations can be accurately approximated by averaging over past samples, as in (7.29). See (57) for an analysis of the error bounds in this technique.
See also (61)(60)(59) for alternative approximations to the Markov Decision Problem for wireless queueing delay. A detailed treatment of stochastic shortest path problems and approximations is found in (67). Approximate dynamic programming methods that approximate value functions with simpler functions can be found in (68)(187)(67)(69). Recent work in (62)(63) combines Markov Decision theory and approximate value functions for treatment of energy and delay optimization in wireless systems.
7.7 EXERCISES
Exercise 7.1. (Deterministic Task Processing) Suppose N network nodes cooperate to process a sequence of tasks. A new task is started when the previous task ends, and we label the tasks r ∈ {0, 1, 2, . . .}. For each new task r, the network controller makes a decision about which single node n[r] will process the task, and what modality m[r] will be used in the processing. Assume there are M possible modalities, each with different durations and energy expenditures. The task-r decision is π[r] = (n[r], m[r]), where n[r] ∈ {1, . . . , N} and m[r] ∈ {1, . . . , M}. Define T(n, m) and β(n, m) as the duration of time and the energy expenditure, respectively, required for node n to process a task using modality m. Assume that T(n, m) ≥ 0 and β(n, m) ≥ 0 for all n, m. Let e_n[r] represent the energy expended by node n ∈ {1, . . . , N} during task r:
\[
e_n[r] = \begin{cases} \beta(n[r], m[r]) & \text{if } n[r] = n\\ 0 & \text{if } n[r] \neq n \end{cases}
\]
We want to maximize the task processing rate subject to average power constraints at each node:
\[
\begin{aligned}
\text{Maximize:} \quad & 1/\overline{T}\\
\text{Subject to:} \quad & 1)\ \overline{e}_n/\overline{T} \le p_{n,av} \,,\quad \forall n \in \{1, \ldots, N\}\\
& 2)\ n[r] \in \{1, \ldots, N\},\ m[r] \in \{1, \ldots, M\} \,,\quad \forall r \in \{0, 1, 2, \ldots\}
\end{aligned}
\]
where p_{n,av} is the average power constraint for node n ∈ {1, . . . , N}. State the renewal-based drift-plus-penalty algorithm of Section 7.2 for this problem. Note that there is no randomness here, and so the ratio of expectations to be minimized on each frame becomes a ratio of deterministic functions.
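One way the per-frame decision of such an algorithm might look in code (a sketch only, not a stated solution: it assumes the per-frame penalty is taken as y_0[r] = −1, so that minimizing ȳ_0/T̄ maximizes the task rate, and all names are hypothetical):

```python
def choose_task_assignment(Z, V, T, beta, N, M):
    """One frame of a drift-plus-penalty ratio rule for Exercise 7.1 (a sketch).

    With y_0[r] = -1 per completed task, the frame-r decision minimizes the
    deterministic ratio (-V + Z[n]*beta(n, m)) / T(n, m) over nodes n and
    modalities m. T(n, m), beta(n, m): duration and energy; Z: virtual power queues.
    """
    best, best_val = None, float('inf')
    for n in range(N):
        for m in range(M):
            val = (-V + Z[n] * beta(n, m)) / T(n, m)
            if val < best_val:
                best, best_val = (n, m), val
    return best

T = lambda n, m: 1.0 + n + m          # hypothetical durations
beta = lambda n, m: 2.0 * (m + 1)     # hypothetical energy costs
n, m = choose_task_assignment(Z=[0.0, 5.0], V=10.0, T=T, beta=beta, N=2, M=2)
print(n, m)  # node 0, modality 0: ratio -10 beats every alternative here
```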
Exercise 7.2. (Non-Increasing Property of val(θ)) Consider the val(θ) function in (7.32). Suppose that θ_1 ≤ θ_2.
a) Argue that for all η_i, π, Z_l[r], we have:
\[
-V\hat{g}(\eta_i, \pi) + \sum_{l=1}^{L} Z_l[r]\hat{y}_l(\eta_i, \pi) - \theta_1\hat{T}(\eta_i, \pi)
\ge -V\hat{g}(\eta_i, \pi) + \sum_{l=1}^{L} Z_l[r]\hat{y}_l(\eta_i, \pi) - \theta_2\hat{T}(\eta_i, \pi)
\]
b) Prove that val(θ_1) ≥ val(θ_2).
Exercise 7.3. (An Alternative Algorithm with Modified Objective) Consider the system of Section 7.1. However, suppose we desire a solution to the following modified problem:
\[
\begin{aligned}
\text{Minimize:} \quad & \overline{y}_0\\
\text{Subject to:} \quad & \overline{y}_l/\overline{T} \le c_l \quad \forall l \in \{1, \ldots, L\}\\
& \pi[r] \in \mathcal{P} \quad \forall r \in \{0, 1, 2, \ldots\}
\end{aligned}
\]
This differs from (7.4)-(7.6) because we seek to minimize \overline{y}_0 rather than \overline{y}_0/\overline{T}. Define the same virtual queues Z[r] as in (7.12). Note that (7.16) still applies. Consider the algorithm that, every frame r, observes Z[r] and chooses a policy π[r] ∈ \mathcal{P} to minimize the right-hand side of (7.16). It then updates Z[r] by (7.12) at the end of the frame. Assume there is an i.i.d. algorithm π*[r] that yields:
\[
E\{\hat{y}_0(\pi^*[r])\} = y_0^{opt} \tag{7.54}
\]
\[
E\{\hat{y}_l(\pi^*[r])\} \le E\{\hat{T}(\pi^*[r])\}c_l \quad \forall l \in \{1, \ldots, L\} \tag{7.55}
\]
a) Plug the i.i.d. algorithm π*[r] into the right-hand side of (7.16) to show that Δ(Z[r]) ≤ F for some finite constant F, and hence all queues are mean rate stable, so that:
\[
\limsup_{R\to\infty} [\overline{y}_l[R] - c_l\overline{T}[R]] \le 0
\]
b) Again plug the i.i.d. algorithm π*[r] into the right-hand side of (7.16), and use iterated expectations and telescoping sums to prove:
\[
\limsup_{R\to\infty} \overline{y}_0[R] \le y_0^{opt} + B/V
\]
Exercise 7.4. (Manipulating Limits) Suppose that limsup_{R→∞}[\overline{y}_l[R] − c_l\overline{T}[R]] ≤ 0, where 0 < T_min ≤ \overline{T}[R] ≤ T_max for all R > 0.
a) Argue that for all integers R > 0:
\[
\frac{\overline{y}_l[R]}{\overline{T}[R]} - c_l
\le \max\left[0,\ \frac{\overline{y}_l[R]}{\overline{T}[R]} - c_l\right]\frac{\overline{T}[R]}{T_{\min}}
= \max\left[0,\ \overline{y}_l[R] - c_l\overline{T}[R]\right]\frac{1}{T_{\min}}
\]
b) Take limits of the inequality in (a) to conclude that:
\[
\limsup_{R\to\infty} \frac{\overline{y}_l[R]}{\overline{T}[R]} \le c_l
\]
Exercise 7.5. (An Alternative Algorithm with Time Averaging) Consider the optimization problem (7.4)-(7.6) for a renewal system with frame sizes T[r] that depend on the policy π[r]. Define θ[0] = 0. For each stage r ∈ {1, 2, . . .}, define θ[r] by:
\[
\theta[r] \triangleq \frac{\frac{1}{r}\sum_{k=0}^{r-1} y_0[k]}{\frac{1}{r}\sum_{k=0}^{r-1} T[k]} \tag{7.56}
\]
so that θ[r] is the empirical time average of the penalty to be minimized over the first r frames. Consider the following modified algorithm, which does not require the multi-step bisection phase, but makes assumptions about convergence:
• Every frame r, observe θ[r], Z[r], and choose a policy π[r] ∈ \mathcal{P} to minimize:
\[
E\left\{V[\hat{y}_0(\pi[r]) - \theta[r]\hat{T}(\pi[r])] + \sum_{l=1}^{L} Z_l[r][\hat{y}_l(\pi[r]) - c_l\hat{T}(\pi[r])] \,\middle|\, Z[r], \theta[r]\right\}
\]
• Update θ[r] by (7.56) and update Z[r] by (7.12).
To analyze this algorithm, we assume that there are constants θ, \overline{T}, \overline{y}_0 such that, with probability 1:
\[
\lim_{R\to\infty} \theta[R] = \theta \,,\quad
\lim_{R\to\infty} \frac{1}{R}\sum_{r=0}^{R-1} T[r] = \overline{T} \,,\quad
\lim_{R\to\infty} \frac{1}{R}\sum_{r=0}^{R-1} y_0[r] = \overline{y}_0 \tag{7.57}
\]
We further assume there is an i.i.d. algorithm π*[r] that satisfies (7.9)-(7.10) with δ = 0.
a) Use (7.14) to complete the right-hand side of the following inequality:
\[
\Delta(Z[r]) + VE\{y_0[r] - \theta[r]T[r] \mid Z[r]\} \le B + \cdots
\]
b) Assume E{L(Z[0])} = 0. Plug the i.i.d. algorithm π*[r] from (7.54)-(7.55) into the right-hand side of part (a) to prove that Δ(Z[r]) ≤ F for some constant F, and so all queues are mean rate stable. Use iterated expectations and the law of telescoping sums to conclude that for any R > 0:
\[
E\left\{\frac{1}{R}\sum_{r=0}^{R-1} [y_0[r] - \theta[r]T[r]]\right\}
\le E\{\hat{T}(\pi^*[r])\}\left[ratio^{opt} - \frac{1}{R}\sum_{r=0}^{R-1} E\{\theta[r]\}\right] + B/V
\]
c) Argue from (7.56) and (7.57) that, with probability 1:
\[
\lim_{R\to\infty} \frac{1}{R}\sum_{r=0}^{R-1} [y_0[r] - \theta[r]T[r]] = 0 \,,\quad
\lim_{R\to\infty} \frac{1}{R}\sum_{r=0}^{R-1} \theta[r] = \theta
\]
d) Assume that:
\[
\lim_{R\to\infty} E\left\{\frac{1}{R}\sum_{r=0}^{R-1} [y_0[r] - \theta[r]T[r]]\right\} = 0 \,,\quad
\lim_{R\to\infty} \frac{1}{R}\sum_{r=0}^{R-1} E\{\theta[r]\} = \theta
\]
This can be justified via part (c) together with the Lebesgue dominated convergence theorem, provided that mild additional boundedness assumptions on the processes are introduced. Use this with part (b) to prove:
\[
\theta = \lim_{R\to\infty} \frac{\frac{1}{R}\sum_{r=0}^{R-1} y_0[r]}{\frac{1}{R}\sum_{r=0}^{R-1} T[r]}
\le ratio^{opt} + \frac{B}{E\{\hat{T}(\pi^*[r])\}V} \quad (w.p.1)
\]
Exercise 7.6. (Variation on Jensen's Inequality) Assume the result of Lemma 7.6(a).
a) Let {T[0], T[1], . . . , T[R−1]}, {γ[0], γ[1], . . . , γ[R−1]} be deterministic sequences. Prove (7.43) by defining X as a random integer that is uniform over {0, . . . , R−1} and defining the random vector (T[X], γ[X]).
b) Prove (7.44) by considering {T[0], T[1], . . . , T[R−1]}, {γ[0], γ[1], . . . , γ[R−1]} as random sequences that are independent of X.
Exercise 7.7. (Equivalence of the Transformed Problem)
a) Suppose that π°[r], γ°[r] solve (7.38)-(7.42), yielding \overline{γ}°_m, \overline{T}°, \overline{y}°_l, \overline{Tφ(γ)}°, \overline{Tγ}°. Use the fact that φ(\overline{Tγ}°/\overline{T}°) ≥ \overline{Tφ(γ)}°/\overline{T}° to show that the same policy π°[r] satisfies the feasibility constraints (7.36)-(7.37) and yields φ(\overline{x}°/\overline{T}°) ≥ \overline{Tφ(γ)}°/\overline{T}°.
b) Suppose that π*[r] is an algorithm that solves (7.35)-(7.37), yielding \overline{x}*, \overline{T}*, and \overline{y}*_l. Show that the optimal value of (7.38) is greater than or equal to φ(\overline{x}*/\overline{T}*). Hint: Use the same policy π*[r], and use the constant γ[r] = \overline{x}*/\overline{T}* for all r ∈ {0, 1, 2, . . .}, noting from (7.33) that this is in \mathcal{R}.
Exercise 7.8. (Utility Optimization with Delay-Limited Scheduling) Modify the example in Section 7.6.1 to treat the problem of maximizing the utility function \sum_{l=1}^{L} \log(1 + \overline{D}_l/T), rather than maximizing \sum_{l=1}^{L} w_l\overline{D}_l/T.
Exercise 7.9. (A Simple Form of Lebesgue Dominated Convergence) Let {f[r]}_{r=0}^{∞} be an infinite sequence of random variables. Suppose there are finite constants f_min and f_max such that the random variables deterministically satisfy f_min ≤ f[r] ≤ f_max for all r ∈ {0, 1, 2, . . .}. Suppose there is a finite constant f such that:
\[
\lim_{R\to\infty} \frac{1}{R}\sum_{r=0}^{R-1} f[r] = f \quad (w.p.1)
\]
We will show that lim_{R→∞} \frac{1}{R}\sum_{r=0}^{R-1} E\{f[r]\} = f.
a) Fix ε > 0. Argue that for any integer R > 0:
\[
E\left\{\frac{1}{R}\sum_{r=0}^{R-1} f[r]\right\}
\le (f + \epsilon)Pr\left[\frac{1}{R}\sum_{r=0}^{R-1} f[r] \le f + \epsilon\right]
+ f_{\max}Pr\left[\frac{1}{R}\sum_{r=0}^{R-1} f[r] > f + \epsilon\right]
\]
b) Argue that for any ε > 0:
\[
\lim_{R\to\infty} Pr\left[\frac{1}{R}\sum_{r=0}^{R-1} f[r] > f + \epsilon\right] = 0
\]
Use this with part (a) to conclude that for all ε > 0:
\[
\limsup_{R\to\infty} \frac{1}{R}\sum_{r=0}^{R-1} E\{f[r]\} \le f + \epsilon
\]
Conclude that the left-hand side in the above inequality is less than or equal to f.
c) Make a similar argument to show liminf_{R→∞} \frac{1}{R}\sum_{r=0}^{R-1} E\{f[r]\} ≥ f.
CHAPTER 8
Conclusions
This text has presented a theory for optimizing time averages in stochastic networks. The tools
of Lyapunov drift and Lyapunov optimization were developed to solve these problems. Our focus
was on communication and queueing networks, including networks with wireless links and mobile
devices. The theory can be used for networks with a variety of goals and functionalities, such as
networks with:
• Network coding capabilities (see Exercise 4.12 and (188)(189)(190)).
• Dynamic data compression (see Exercise 4.14 and (191)(165)(143)).
• Multi-input, multi-output (MIMO) antenna capabilities (162)(192)(193).
• Multi-receiver diversity (154).
• Cooperative combining (194)(71).
• Hop count minimization (155).
• Economic considerations (195)(153).
Lyapunov optimization theory also has applications to a wide array of other problems, including
(but not limited to):
• Stock market trading (40)(41).
• Product assembly plants (196)(175)(197)(198).
• Energy allocation for smart grids (159).
This text has included several representative simulation results for one-hop networks (see Chapter 3). Further simulation and experimentation results for Lyapunov-based algorithms in single-hop and multi-hop networks can be found in (54)(55)(199)(200)(201)(202)(203)(154)(142)(42).
We have highlighted the simplicity of Lyapunov drift and Lyapunov optimization, emphasizing that it uses only the following techniques (see Chapters 1 and 3): (i) telescoping sums, (ii) iterated expectations, (iii) opportunistically minimizing an expectation, and (iv) Jensen's inequality. Further, the drift-plus-penalty algorithm of Lyapunov optimization theory is analyzed with the following simple framework:
1. Deﬁne a Lyapunov function as the sum of squares of queue backlog.
2. Compute a bound on the drift-plus-penalty by squaring the queueing equation. The bound typically has the form:
\[
\Delta(\Theta(t)) + VE\{penalty(t) \mid \Theta(t)\}
\le B + VE\{penalty(t) \mid \Theta(t)\} + \sum_{n=1}^{N} \Theta_n(t)E\{h_n(t) \mid \Theta(t)\}
\]
where B is a constant that bounds second moments of the processes, Θ(t) = (Θ_1(t), . . . , Θ_N(t)) is a general vector of (possibly virtual) queues, and h_n(t) is the arrival-minus-departure value for queue Θ_n(t) on slot t.
3. Design the policy to minimize the right-hand side of the above drift-plus-penalty bound.
4. Conclude that, under this algorithm, the drift-plus-penalty is bounded by plugging any other policy into the right-hand side:
\[
\Delta(\Theta(t)) + VE\{penalty(t) \mid \Theta(t)\}
\le B + VE\{penalty^*(t) \mid \Theta(t)\} + \sum_{n=1}^{N} \Theta_n(t)E\{h_n^*(t) \mid \Theta(t)\}
\]
5. Plug an ω-only policy α*(t) into the right-hand side, one that is known to exist (although it would be hard to compute) that satisfies all constraints and yields a greatly simplified drift-plus-penalty expression on the right-hand side.
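Steps 1-3 of this framework can be sketched for a single slot as one generic decision rule (the sketch below is illustrative only; the action set, penalty, and h_n callables are hypothetical placeholders for a concrete network model):

```python
def drift_plus_penalty_slot(Theta, V, actions, penalty, h):
    """One slot of the generic drift-plus-penalty framework (a sketch).

    actions: finite candidate action set; penalty(a): slot penalty under action a;
    h(a): list of arrival-minus-departure values h_n under action a.
    Step 3 picks the action minimizing V*penalty(a) + sum_n Theta_n * h_n(a);
    then the (possibly virtual) queues are updated.
    """
    def rhs(a):
        return V * penalty(a) + sum(t * hn for t, hn in zip(Theta, h(a)))

    a_star = min(actions, key=rhs)
    Theta_next = [max(t + hn, 0.0) for t, hn in zip(Theta, h(a_star))]
    return a_star, Theta_next

# Toy instance: two actions, one queue; action 1 costs more but drains the queue.
penalty = lambda a: float(a)
h = lambda a: [1.0 - 2.0 * a]
a, Theta = drift_plus_penalty_slot(Theta=[10.0], V=5.0, actions=[0, 1],
                                   penalty=penalty, h=h)
print(a, Theta)  # action 1 wins: 5*1 + 10*(-1) = -5 < 5*0 + 10*1 = 10
```

The queue term dominates when backlog is large, and the V-weighted penalty dominates when backlog is small, which is the familiar performance-versus-backlog tradeoff controlled by V.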
Also important in this theory is the use of virtual queues to transform time average inequality constraints into queue stability problems, and the use of auxiliary variables for the case of optimizing convex functions of time averages. The drift-plus-penalty framework was also shown to hold for optimization of non-convex functions of time averages, and for optimization over renewal systems.
The resulting min-drift (or "max-weight") algorithms can be very complex for general problems, particularly for wireless networks with interference. However, we have seen that low-complexity approximations can be used to provide good performance. Further, for interference networks without time variation, methods that take a longer time to find the max-weight solution (either by a deterministic or randomized search) were seen to provide full throughput and throughput-utility optimality with arbitrarily low per-timeslot computation complexity, provided that we let convergence time and/or delay increase (possibly non-polynomially) to infinity. Simple distributed Carrier Sense Multiple Access (CSMA) implementations are often possible (and provably throughput optimal) for these networks via the Jiang-Walrand theorem, which hints at deeper connections with Lyapunov optimization, max-weight theory, C-additive approximations, maximum entropy solutions, randomized algorithms, and Markov chain steady state theory.
Bibliography
[1] F. Kelly. Charging and rate control for elastic trafﬁc. European Transactions on Telecommuni
cations, vol. 8, no. 1 pp. 3337, Jan.Feb. 1997. DOI: 10.1002/ett.4460080106 3, 98
[2] F.P. Kelly, A.Maulloo, and D. Tan. Rate control for communication networks: Shadowprices,
proportional fairness, and stability. Journ. of the Operational Res. Society, vol. 49, no. 3, pp.
237252, March 1998. DOI: 10.2307/3010473 3, 7, 98, 104
[3] J. Mo and J. Walrand. Fair endtoend windowbased congestion control. IEEE/ACM
Transactions on Networking, vol. 8, no. 5, Oct. 2000. DOI: 10.1109/90.879343 3, 128
[4] L. Massoulié and J. Roberts. Bandwidth sharing: Objectives and algorithms. IEEE/ACM
Transactions on Networking, vol. 10, no. 3, pp. 320328, June 2002.
DOI: 10.1109/TNET.2002.1012364 3
[5] A. Tang, J. Wang, and S. Low. Is fair allocation always inefﬁcient. Proc. IEEE INFOCOM,
March 2004. DOI: 10.1109/INFCOM.2004.1354479 3, 98, 128
[6] B. Radunovic and J.Y. Le Boudec. Rate performance objectives of multihop wireless net
works. IEEE Transactions on Mobile Computing, vol. 3, no. 4, pp. 334349, Oct.Dec. 2004.
DOI: 10.1109/TMC.2004.45 3, 128
[7] L. Tassiulas and A. Ephremides. Stability properties of constrained queueing systems and
scheduling policies for maximum throughput in multihop radio networks. IEEETransactions
on Automatic Control, vol. 37, no. 12, pp. 19361948, Dec. 1992. DOI: 10.1109/9.182479 6,
49, 113, 138
[8] L. Tassiulas and A. Ephremides. Dynamic server allocation to parallel queues with randomly
varying connectivity. IEEE Transactions on Information Theory, vol. 39, no. 2, pp. 466478,
March 1993. DOI: 10.1109/18.212277 6, 10, 24, 49, 66
[9] P. R. Kumar and S. P. Meyn. Stability of queueing networks and scheduling policies. IEEE
Trans. on Automatic Control, vol.40,.n.2, pp.251260, Feb. 1995. DOI: 10.1109/9.341782 6
[10] N. McKeown, A. Mekkittikul, V. Anantharam, and J. Walrand. Achieving 100% throughput
in an inputqueued switch. IEEETransactions on Communications, vol. 47, no. 8, August 1999.
6, 88
182 BIBLIOGRAPHY
[11] E. Leonardi, M. Mellia, F. Neri, and M. Ajmone Marsan. Bounds on average delays and queue size averages and variances in input-queued cell-based switches. Proc. IEEE INFOCOM, 2001. DOI: 10.1109/INFCOM.2001.916303 6
[12] L. Tassiulas. Scheduling and performance limits of networks with constantly changing topology. IEEE Transactions on Information Theory, vol. 43, no. 3, pp. 1067-1073, May 1997. DOI: 10.1109/18.568722 6
[13] N. Kahale and P. E. Wright. Dynamic global packet routing in wireless networks. Proc. IEEE INFOCOM, 1997. DOI: 10.1109/INFCOM.1997.631182 6
[14] M. Andrews, K. Kumaran, K. Ramanan, A. Stolyar, P. Whiting, and R. Vijaykumar. Providing quality of service over a shared wireless link. IEEE Communications Magazine, vol. 39, no. 2, pp. 150-154, Feb. 2001. DOI: 10.1109/35.900644 6
[15] M. J. Neely, E. Modiano, and C. E. Rohrs. Dynamic power allocation and routing for time varying wireless networks. IEEE Journal on Selected Areas in Communications, vol. 23, no. 1, pp. 89-103, January 2005. DOI: 10.1109/JSAC.2004.837349 6, 24, 56, 113
[16] B. Awerbuch and T. Leighton. A simple local-control approximation algorithm for multicommodity flow. Proc. 34th IEEE Conf. on Foundations of Computer Science, Oct. 1993. DOI: 10.1109/SFCS.1993.366841 6
[17] M. J. Neely. Dynamic Power Allocation and Routing for Satellite and Wireless Networks with Time Varying Channels. PhD thesis, Massachusetts Institute of Technology, LIDS, 2003. 6, 8, 11, 49, 105, 119, 120, 128, 134, 145
[18] M. J. Neely, E. Modiano, and C. Li. Fairness and optimal stochastic control for heterogeneous networks. Proc. IEEE INFOCOM, March 2005. DOI: 10.1109/INFCOM.2005.1498453 6, 49, 105, 134
[19] M. J. Neely, E. Modiano, and C. Li. Fairness and optimal stochastic control for heterogeneous networks. IEEE/ACM Transactions on Networking, vol. 16, no. 2, pp. 396-409, April 2008. DOI: 10.1109/TNET.2007.900405 6, 105, 145
[20] M. J. Neely. Energy optimal control for time varying wireless networks. Proc. IEEE INFOCOM, March 2005. DOI: 10.1109/INFCOM.2005.1497924 6, 49
[21] M. J. Neely. Energy optimal control for time varying wireless networks. IEEE Transactions on Information Theory, vol. 52, no. 7, pp. 2915-2934, July 2006. DOI: 10.1109/TIT.2006.876219 6, 28, 38, 49, 56, 83, 84
[22] L. Georgiadis, M. J. Neely, and L. Tassiulas. Resource allocation and cross-layer control in wireless networks. Foundations and Trends in Networking, vol. 1, no. 1, pp. 1-149, 2006. DOI: 10.1561/1300000001 xi, 7, 8, 24, 49, 105, 110, 111, 113, 145
[23] S. H. Low and D. E. Lapsley. Optimization flow control, I: Basic algorithm and convergence. IEEE/ACM Transactions on Networking, vol. 7, no. 6, pp. 861-875, Dec. 1999. DOI: 10.1109/90.811451 7, 104, 109
[24] S. H. Low. A duality model of TCP and queue management algorithms. IEEE/ACM Transactions on Networking, vol. 11, no. 4, pp. 525-536, August 2003. DOI: 10.1109/TNET.2003.815297 7
[25] L. Xiao, M. Johansson, and S. Boyd. Simultaneous routing and resource allocation for wireless networks. Proc. of the 39th Annual Allerton Conf. on Comm., Control, Comput., Oct. 2001. 7
[26] L. Xiao, M. Johansson, and S. P. Boyd. Simultaneous routing and resource allocation via dual decomposition. IEEE Transactions on Communications, vol. 52, no. 7, pp. 1136-1144, July 2004. DOI: 10.1109/TCOMM.2004.831346 7
[27] J. W. Lee, R. R. Mazumdar, and N. B. Shroff. Downlink power allocation for multi-class CDMA wireless networks. Proc. IEEE INFOCOM, 2002. DOI: 10.1109/INFCOM.2002.1019399 7
[28] M. Chiang. Balancing transport and physical layer in wireless multihop networks: Jointly optimal congestion control and power control. IEEE Journal on Selected Areas in Communications, vol. 23, no. 1, pp. 104-116, Jan. 2005. DOI: 10.1109/JSAC.2004.837347 7
[29] M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle. Layering as optimization decomposition: A mathematical theory of network architectures. Proceedings of the IEEE, vol. 95, no. 1, Jan. 2007. DOI: 10.1109/JPROC.2006.887322 7, 104, 109
[30] R. Cruz and A. Santhanam. Optimal routing, link scheduling, and power control in multihop wireless networks. Proc. IEEE INFOCOM, April 2003. DOI: 10.1109/INFCOM.2003.1208720 7
[31] X. Lin and N. B. Shroff. Joint rate control and scheduling in multihop wireless networks. Proc. of 43rd IEEE Conf. on Decision and Control, Paradise Island, Bahamas, Dec. 2004. 7, 8, 109
[32] R. Agrawal and V. Subramanian. Optimality of certain channel aware scheduling policies. Proc. 40th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, Oct. 2002. 7, 119
[33] H. Kushner and P. Whiting. Asymptotic properties of proportional-fair sharing algorithms. Proc. of 40th Annual Allerton Conf. on Communication, Control, and Computing, 2002. 7, 119
[34] A. Stolyar. Maximizing queueing network utility subject to stability: Greedy primal-dual algorithm. Queueing Systems, vol. 50, no. 4, pp. 401-457, 2005. DOI: 10.1007/s11134-005-1450-0 7, 119
[35] A. Stolyar. Greedy primal-dual algorithm for dynamic resource allocation in complex networks. Queueing Systems, vol. 54, no. 3, pp. 203-220, 2006. DOI: 10.1007/s11134-006-0067-2 7
[36] Q. Li and R. Negi. Scheduling in wireless networks under uncertainties: A greedy primal-dual approach. ArXiv Technical Report: arXiv:1001.2050v2, June 2010. 8, 119
[37] L. Huang and M. J. Neely. Delay reduction via Lagrange multipliers in stochastic network optimization. Proc. of 7th Intl. Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), June 2009. DOI: 10.1109/WIOPT.2009.5291609 8, 10, 69, 71, 113
[38] M. J. Neely. Universal scheduling for networks with arbitrary traffic, channels, and mobility. Proc. IEEE Conf. on Decision and Control (CDC), Atlanta, GA, Dec. 2010. 8, 77, 81, 102, 112, 119
[39] M. J. Neely. Universal scheduling for networks with arbitrary traffic, channels, and mobility. ArXiv Technical Report, arXiv:1001.0960v1, Jan. 2010. 8, 77, 81, 102, 107
[40] M. J. Neely. Stock market trading via stochastic network optimization. Proc. IEEE Conference on Decision and Control (CDC), Atlanta, GA, Dec. 2010. 8, 77, 179
[41] M. J. Neely. Stock market trading via stochastic network optimization. ArXiv Technical Report, arXiv:0909.3891v1, Sept. 2009. 8, 77, 179
[42] M. J. Neely and R. Urgaonkar. Cross-layer adaptive control for wireless mesh networks. Ad Hoc Networks (Elsevier), vol. 5, no. 6, pp. 719-743, August 2007. DOI: 10.1016/j.adhoc.2007.01.004 8, 102, 112, 119, 179
[43] M. J. Neely. Stochastic network optimization with non-convex utilities and costs. Proc. Information Theory and Applications Workshop (ITA), Feb. 2010. DOI: 10.1109/ITA.2010.5454100 8, 116, 117, 118
[44] A. Eryilmaz and R. Srikant. Fair resource allocation in wireless networks using queue-length-based scheduling and congestion control. Proc. IEEE INFOCOM, March 2005. DOI: 10.1109/INFCOM.2005.1498459 8
[45] A. Eryilmaz and R. Srikant. Fair resource allocation in wireless networks using queue-length-based scheduling and congestion control. IEEE/ACM Transactions on Networking, vol. 15, no. 6, pp. 1333-1344, Dec. 2007. DOI: 10.1109/TNET.2007.897944 8, 69, 71
[46] J. W. Lee, R. R. Mazumdar, and N. B. Shroff. Opportunistic power scheduling for dynamic multi-server wireless systems. IEEE Transactions on Wireless Communications, vol. 5, no. 6, pp. 1506-1515, June 2006. DOI: 10.1109/TWC.2006.1638671 8
[47] V. Tsibonis, L. Georgiadis, and L. Tassiulas. Exploiting wireless channel state information for throughput maximization. IEEE Transactions on Information Theory, vol. 50, no. 11, pp. 2566-2582, Nov. 2004. DOI: 10.1109/TIT.2004.836687 8
[48] V. Tsibonis, L. Georgiadis, and L. Tassiulas. Exploiting wireless channel state information for throughput maximization. Proc. IEEE INFOCOM, April 2003. DOI: 10.1109/TIT.2004.836687 8
[49] X. Liu, E. K. P. Chong, and N. B. Shroff. A framework for opportunistic scheduling in wireless networks. Computer Networks, vol. 41, no. 4, pp. 451-474, March 2003. DOI: 10.1016/S1389-1286(02)00401-2 8
[50] R. Berry and R. Gallager. Communication over fading channels with delay constraints. IEEE Transactions on Information Theory, vol. 48, no. 5, pp. 1135-1149, May 2002. DOI: 10.1109/18.995554 8, 9, 67
[51] M. J. Neely. Optimal energy and delay tradeoffs for multi-user wireless downlinks. IEEE Transactions on Information Theory, vol. 53, no. 9, pp. 3095-3113, Sept. 2007. DOI: 10.1109/TIT.2007.903141 8, 10, 67, 71
[52] M. J. Neely. Super-fast delay tradeoffs for utility optimal fair scheduling in wireless networks. IEEE Journal on Selected Areas in Communications, Special Issue on Nonlinear Optimization of Communication Systems, vol. 24, no. 8, pp. 1489-1501, Aug. 2006. DOI: 10.1109/JSAC.2006.879357 8, 10, 67, 71
[53] M. J. Neely. Intelligent packet dropping for optimal energy-delay tradeoffs in wireless downlinks. IEEE Transactions on Automatic Control, vol. 54, no. 3, pp. 565-579, March 2009. DOI: 10.1109/TAC.2009.2013652 8, 10, 67, 71
[54] S. Moeller, A. Sridharan, B. Krishnamachari, and O. Gnawali. Routing without routes: The backpressure collection protocol. Proc. 9th ACM/IEEE Intl. Conf. on Information Processing in Sensor Networks (IPSN), April 2010. DOI: 10.1145/1791212.1791246 8, 10, 71, 72, 113, 179
[55] L. Huang, S. Moeller, M. J. Neely, and B. Krishnamachari. LIFO-backpressure achieves near optimal utility-delay tradeoff. ArXiv Technical Report, arXiv:1008.4895v1, August 2010. 8, 10, 72, 113, 179
[56] M. J. Neely. Stochastic optimization for Markov modulated networks with application to delay constrained wireless scheduling. Proc. IEEE Conf. on Decision and Control (CDC), Shanghai, China, Dec. 2009. DOI: 10.1109/CDC.2009.5400270 8, 9, 153, 171, 173
[57] M. J. Neely. Stochastic optimization for Markov modulated networks with application to delay constrained wireless scheduling. ArXiv Technical Report, arXiv:0905.4757v1, May 2009. 8, 9, 153, 158, 171, 173, 174
[58] C.-P. Li and M. J. Neely. Network utility maximization over partially observable Markovian channels. ArXiv Technical Report: arXiv:1008.3421v1, Aug. 2010. 8, 153
[59] F. J. Vázquez Abad and V. Krishnamurthy. Policy gradient stochastic approximation algorithms for adaptive control of constrained time varying Markov decision processes. Proc. IEEE Conf. on Decision and Control, Dec. 2003. DOI: 10.1109/CDC.2003.1273053 8, 174
[60] D. V. Djonin and V. Krishnamurthy. Q-learning algorithms for constrained Markov decision processes with randomized monotone policies: Application to MIMO transmission control. IEEE Transactions on Signal Processing, vol. 55, no. 5, pp. 2170-2181, May 2007. DOI: 10.1109/TSP.2007.893228 8, 9, 174
[61] N. Salodkar, A. Bhorkar, A. Karandikar, and V. S. Borkar. An online learning algorithm for energy efficient delay constrained scheduling over a fading channel. IEEE Journal on Selected Areas in Communications, vol. 26, no. 4, pp. 732-742, May 2008. DOI: 10.1109/JSAC.2008.080514 8, 9, 174
[62] F. Fu and M. van der Schaar. A systematic framework for dynamically optimizing multi-user video transmission. IEEE Journal on Selected Areas in Communications, vol. 28, no. 3, pp. 308-320, April 2010. DOI: 10.1109/JSAC.2010.100403 8, 9, 174
[63] F. Fu and M. van der Schaar. Decomposition principles and online learning in cross-layer optimization for delay-sensitive applications. IEEE Trans. Signal Processing, vol. 58, no. 3, pp. 1401-1415, March 2010. DOI: 10.1109/TSP.2009.2034938 8, 9, 174
[64] D. P. Bertsekas. Dynamic Programming and Optimal Control, vols. 1 and 2. Athena Scientific, Belmont, MA, 1995. 8, 158, 168, 170
[65] E. Altman. Constrained Markov Decision Processes. Chapman and Hall/CRC Press, Boca Raton, FL, 1999. 8
[66] S. Ross. Introduction to Probability Models. Academic Press, 8th edition, Dec. 2002. 8, 12, 27, 76
[67] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, MA, 1996. 8, 158, 173, 174
[68] W. B. Powell. Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley & Sons, 2007. DOI: 10.1002/9780470182963 8, 174
[69] S. Meyn. Control Techniques for Complex Networks. Cambridge University Press, 2008. 8, 174
[70] D. Tse and S. Hanly. Multiaccess fading channels, Part II: Delay-limited capacities. IEEE Transactions on Information Theory, vol. 44, no. 7, pp. 2816-2831, Nov. 1998. DOI: 10.1109/18.737514 9, 135
[71] R. Urgaonkar and M. J. Neely. Delay-limited cooperative communication with reliability constraints in wireless networks. Proc. IEEE INFOCOM, April 2009. DOI: 10.1109/INFCOM.2009.5062187 9, 135, 168, 179
[72] A. Mekkittikul and N. McKeown. A starvation-free algorithm for achieving 100% throughput in an input-queued switch. Proc. ICCN, pp. 226-231, 1996. 9
[73] A. L. Stolyar and K. Ramanan. Largest weighted delay first scheduling: Large deviations and optimality. Annals of Applied Probability, vol. 11, no. 1, pp. 1-48, 2001. DOI: 10.1214/aoap/998926986 9, 11
[74] M. Andrews, K. Kumaran, K. Ramanan, A. Stolyar, R. Vijaykumar, and P. Whiting. Scheduling in a queueing system with asynchronously varying service rates. Probability in the Engineering and Informational Sciences, vol. 18, no. 2, pp. 191-217, April 2004. DOI: 10.1017/S0269964804182041 9
[75] S. Shakkottai and A. Stolyar. Scheduling for multiple flows sharing a time-varying channel: The exponential rule. American Mathematical Society Translations, series 2, vol. 207, 2002. 9
[76] M. J. Neely. Delay-based network utility maximization. Proc. IEEE INFOCOM, March 2010. DOI: 10.1109/INFCOM.2010.5462097 9, 120, 122
[77] A. Fu, E. Modiano, and J. Tsitsiklis. Optimal energy allocation for delay-constrained data transmission over a time-varying channel. Proc. IEEE INFOCOM, 2003. 9
[78] M. Zafer and E. Modiano. Optimal rate control for delay-constrained data transmission over a wireless channel. IEEE Transactions on Information Theory, vol. 54, no. 9, pp. 4020-4039, Sept. 2008. DOI: 10.1109/TIT.2008.928249 9
[79] M. Zafer and E. Modiano. Minimum energy transmission over a wireless channel with deadline and power constraints. IEEE Transactions on Automatic Control, vol. 54, no. 12, pp. 2841-2852, December 2009. DOI: 10.1109/TAC.2009.2034202 9
[80] M. Goyal, A. Kumar, and V. Sharma. Power constrained and delay optimal policies for scheduling transmission over a fading channel. Proc. IEEE INFOCOM, April 2003. DOI: 10.1109/INFCOM.2003.1208683 9
[81] A. Wierman, L. L. H. Andrew, and A. Tang. Power-aware speed scaling in processor sharing systems. Proc. IEEE INFOCOM, Rio de Janeiro, Brazil, April 2009. DOI: 10.1109/INFCOM.2009.5062123 9
[82] E. Uysal-Biyikoglu, B. Prabhakar, and A. El Gamal. Energy-efficient packet transmission over a wireless link. IEEE/ACM Trans. Networking, vol. 10, no. 4, pp. 487-499, Aug. 2002. DOI: 10.1109/TNET.2002.801419 9
[83] M. Zafer and E. Modiano. A calculus approach to minimum energy transmission policies with quality of service guarantees. Proc. IEEE INFOCOM, March 2005. DOI: 10.1109/INFCOM.2005.1497922 9
[84] M. Zafer and E. Modiano. A calculus approach to energy-efficient data transmission with quality-of-service constraints. IEEE/ACM Transactions on Networking, vol. 17, no. 3, pp. 898-911, June 2009. DOI: 10.1109/TNET.2009.2020831 9
[85] W. Chen, M. J. Neely, and U. Mitra. Energy-efficient transmissions with individual packet delay constraints. IEEE Transactions on Information Theory, vol. 54, no. 5, pp. 2090-2109, May 2008. DOI: 10.1109/TIT.2008.920344 9
[86] W. Chen, U. Mitra, and M. J. Neely. Energy-efficient scheduling with individual packet delay constraints over a fading channel. Wireless Networks, vol. 15, no. 5, pp. 601-618, July 2009. DOI: 10.1007/s11276-007-0093-y 9
[87] M. A. Khojastepour and A. Sabharwal. Delay-constrained scheduling: Power efficiency, filter design, and bounds. Proc. IEEE INFOCOM, March 2004. DOI: 10.1109/INFCOM.2004.1354603 9
[88] B. Hajek. Optimal control of two interacting service stations. IEEE Transactions on Automatic Control, vol. 29, no. 6, pp. 491-499, June 1984. DOI: 10.1109/TAC.1984.1103577 9
[89] S. Sarkar. Optimum scheduling and memory management in input queued switches with finite buffer space. Proc. IEEE INFOCOM, April 2003. DOI: 10.1109/INFCOM.2003.1208973 9
[90] A. Tarello, J. Sun, M. Zafer, and E. Modiano. Minimum energy transmission scheduling subject to deadline constraints. ACM Wireless Networks, vol. 14, no. 5, pp. 633-645, 2008. DOI: 10.1007/s11276-006-0005-6 9
[91] B. Sadiq, S. Baek, and G. de Veciana. Delay-optimal opportunistic scheduling and approximations: The log rule. Proc. IEEE INFOCOM, April 2009. DOI: 10.1109/INFCOM.2009.5062088 9
[92] B. Sadiq and G. de Veciana. Optimality and large deviations of queues under the pseudo-log rule opportunistic scheduling. 46th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, Sept. 2008. DOI: 10.1109/ALLERTON.2008.4797636 9, 11
[93] A. L. Stolyar. Large deviations of queues sharing a randomly time-varying server. Queueing Systems Theory and Applications, vol. 59, no. 1, pp. 1-35, 2008. DOI: 10.1007/s11134-008-9072-y 9, 11
[94] A. Ganti, E. Modiano, and J. N. Tsitsiklis. Optimal transmission scheduling in symmetric communication models with intermittent connectivity. IEEE Transactions on Information Theory, vol. 53, no. 3, pp. 998-1008, March 2007. DOI: 10.1109/TIT.2006.890695 10
[95] E. M. Yeh and A. S. Cohen. Delay optimal rate allocation in multiaccess fading communications. Proc. Allerton Conference on Communication, Control, and Computing, Monticello, IL, 2004. 10
[96] E. M. Yeh. Multiaccess and Fading in Communication Networks. PhD thesis, Massachusetts Institute of Technology, Laboratory for Information and Decision Systems (LIDS), 2001. 10
[97] S. Kittipiyakul and T. Javidi. Delay-optimal server allocation in multi-queue multi-server systems with time-varying connectivities. IEEE Transactions on Information Theory, vol. 55, no. 5, pp. 2319-2333, May 2009. DOI: 10.1109/TIT.2009.2016051 10
[98] A. Ephremides, P. Varaiya, and J. Walrand. A simple dynamic routing problem. IEEE Transactions on Automatic Control, vol. AC-25, no. 4, pp. 690-693, Aug. 1980. 10
[99] M. J. Neely, E. Modiano, and Y.-S. Cheng. Logarithmic delay for n × n packet switches under the crossbar constraint. IEEE/ACM Transactions on Networking, vol. 15, no. 3, pp. 657-668, June 2007. DOI: 10.1109/TNET.2007.893876 10, 11, 37
[100] M. J. Neely. Order optimal delay for opportunistic scheduling in multi-user wireless uplinks and downlinks. IEEE/ACM Transactions on Networking, vol. 16, no. 5, pp. 1188-1199, October 2008. DOI: 10.1109/TNET.2007.909682 10, 24, 37
[101] M. J. Neely. Delay analysis for max weight opportunistic scheduling in wireless systems. IEEE Transactions on Automatic Control, vol. 54, no. 9, pp. 2137-2150, Sept. 2009. DOI: 10.1109/TAC.2009.2026943 10, 11, 24, 37
[102] S. Deb, D. Shah, and S. Shakkottai. Fast matching algorithms for repetitive optimization: An application to switch scheduling. Proc. of 40th Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, March 2006. DOI: 10.1109/CISS.2006.286659 10, 37, 147
[103] M. J. Neely. Delay analysis for maximal scheduling with flow control in wireless networks with bursty traffic. IEEE/ACM Transactions on Networking, vol. 17, no. 4, pp. 1146-1159, August 2009. DOI: 10.1109/TNET.2008.2008232 10, 11, 37, 147
[104] X. Wu, R. Srikant, and J. R. Perkins. Scheduling efficiency of distributed greedy scheduling algorithms in wireless networks. IEEE Transactions on Mobile Computing, vol. 6, no. 6, pp. 595-605, June 2007. DOI: 10.1109/TMC.2007.1061 11, 37, 147
[105] J. G. Dai and B. Prabhakar. The throughput of data switches with and without speedup. Proc. IEEE INFOCOM, 2000. DOI: 10.1109/INFCOM.2000.832229 11, 37
[106] J. M. Harrison and J. A. Van Mieghem. Dynamic control of Brownian networks: State space collapse and equivalent workload formulations. The Annals of Applied Probability, vol. 7, no. 3, pp. 747-771, Aug. 1997. DOI: 10.1214/aoap/1034801252 11
[107] S. Shakkottai, R. Srikant, and A. Stolyar. Pathwise optimality of the exponential scheduling rule for wireless channels. Advances in Applied Probability, vol. 36, no. 4, pp. 1021-1045, Dec. 2004. DOI: 10.1239/aap/1103662957 11
[108] A. L. Stolyar. Max-weight scheduling in a generalized switch: State space collapse and workload minimization in heavy traffic. Annals of Applied Probability, pp. 1-53, 2004. DOI: 10.1214/aoap/1075828046 11
[109] D. Shah and D. Wischik. Optimal scheduling algorithms for input-queued switches. Proc. IEEE INFOCOM, 2006. DOI: 10.1109/INFOCOM.2006.238 11
[110] I. Keslassy and N. McKeown. Analysis of scheduling algorithms that provide 100% throughput in input-queued switches. Proc. 39th Annual Allerton Conf. on Communication, Control, and Computing, Oct. 2001. 11
[111] T. Ji, E. Athanasopoulou, and R. Srikant. Optimal scheduling policies in small generalized switches. Proc. IEEE INFOCOM, Rio de Janeiro, Brazil, 2009. DOI: 10.1109/INFCOM.2009.5062259 11
[112] V. J. Venkataramanan and X. Lin. Structural properties of LDP for queue-length based wireless scheduling algorithms. Proc. of 45th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, September 2007. 11
[113] D. Bertsimas, I. C. Paschalidis, and J. N. Tsitsiklis. Large deviations analysis of the generalized processor sharing policy. Queueing Systems, vol. 32, pp. 319-349, 1999. DOI: 10.1023/A:1019151423773 11
[114] D. Bertsimas, I. C. Paschalidis, and J. N. Tsitsiklis. Asymptotic buffer overflow probabilities in multiclass multiplexers: An optimal control approach. IEEE Transactions on Automatic Control, vol. 43, no. 3, pp. 315-335, March 1998. DOI: 10.1109/9.661587 11
[115] S. Bodas, S. Shakkottai, L. Ying, and R. Srikant. Scheduling in multi-channel wireless networks: Rate function optimality in the small-buffer regime. Proc. ACM SIGMETRICS/Performance Conference, June 2009. DOI: 10.1145/1555349.1555364 11
[116] P. Gupta and P. R. Kumar. The capacity of wireless networks. IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 388-404, March 2000. DOI: 10.1109/18.825799 11
[117] M. Grossglauser and D. Tse. Mobility increases the capacity of ad-hoc wireless networks. IEEE/ACM Trans. on Networking, vol. 10, no. 4, pp. 477-486, August 2002. DOI: 10.1109/TNET.2002.801403 11
[118] M. J. Neely and E. Modiano. Capacity and delay tradeoffs for ad-hoc mobile networks. IEEE Transactions on Information Theory, vol. 51, no. 6, pp. 1917-1937, June 2005. DOI: 10.1109/TIT.2005.847717 11
[119] S. Toumpis and A. J. Goldsmith. Large wireless networks under fading, mobility, and delay constraints. Proc. IEEE INFOCOM, 2004. DOI: 10.1109/INFCOM.2004.1354532 12
[120] X. Lin and N. B. Shroff. Towards achieving the maximum capacity in large mobile wireless networks. Journal of Communications and Networks, Special Issue on Mobile Ad Hoc Wireless Networks, vol. 6, no. 4, December 2004. 12
[121] X. Lin and N. B. Shroff. The fundamental capacity-delay tradeoff in large mobile ad hoc networks. Purdue University Tech. Report, 2004. 12
[122] A. El Gamal, J. Mammen, B. Prabhakar, and D. Shah. Optimal throughput-delay scaling in wireless networks, Part 1: The fluid model. IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2568-2592, June 2006. DOI: 10.1109/TIT.2006.874379 12
[123] G. Sharma, R. Mazumdar, and N. Shroff. Delay and capacity trade-offs in mobile ad-hoc networks: A global perspective. Proc. IEEE INFOCOM, April 2006. DOI: 10.1109/INFOCOM.2006.144 12
[124] X. Lin, G. Sharma, R. R. Mazumdar, and N. B. Shroff. Degenerate delay-capacity tradeoffs in ad hoc networks with Brownian mobility. IEEE Transactions on Information Theory, vol. 52, no. 6, pp. 2777-2784, June 2006. DOI: 10.1109/TIT.2006.874544 12
[125] N. Bansal and Z. Liu. Capacity, delay and mobility in wireless ad-hoc networks. Proc. IEEE INFOCOM, April 2003. DOI: 10.1109/INFCOM.2003.1208990 12
[126] L. Ying, S. Yang, and R. Srikant. Optimal delay-throughput tradeoffs in mobile ad hoc networks. IEEE Transactions on Information Theory, vol. 54, no. 9, pp. 4119-4143, Sept. 2008. DOI: 10.1109/TIT.2008.928247 12
[127] Z. Kong, E. M. Yeh, and E. Soljanin. Coding improves the throughput-delay tradeoff in mobile wireless networks. Proceedings of the International Symposium on Information Theory, Seoul, Korea, June 2009. 12
[128] Z. Kong, E. M. Yeh, and E. Soljanin. Coding improves the throughput-delay tradeoff in mobile wireless networks. IEEE Transactions on Information Theory, to appear. DOI: 10.1109/ISIT.2009.5205277 12
[129] D. P. Bertsekas and R. Gallager. Data Networks. Prentice-Hall, Inc., New Jersey, 1992. 12, 19, 25, 27, 37, 48, 109, 128, 144, 172
[130] R. Gallager. Discrete Stochastic Processes. Kluwer Academic Publishers, Boston, 1996. 12, 27, 50, 74, 76
[131] F. P. Kelly. Reversibility and Stochastic Networks. Wiley, Chichester, 1979. 12, 27, 144
[132] S. Ross. Stochastic Processes. John Wiley & Sons, Inc., New York, 1996. 12, 74
[133] D. P. Bertsekas, A. Nedic, and A. E. Ozdaglar. Convex Analysis and Optimization. Athena Scientific, Boston, 2003. 12, 67
[134] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. 12, 67
[135] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1996. 12
[136] M. J. Neely. Stability and capacity regions for discrete time queueing networks. ArXiv Technical Report: arXiv:1003.3396v1, March 2010. 18, 19, 56, 102
[137] R. Urgaonkar and M. J. Neely. Opportunistic scheduling with reliability guarantees in cognitive radio networks. IEEE Transactions on Mobile Computing, vol. 8, no. 6, pp. 766-777, June 2009. DOI: 10.1109/TMC.2009.38 28, 145, 147
[138] M. J. Neely. Queue stability and probability 1 convergence via Lyapunov optimization. ArXiv Technical Report, arXiv:1008.3519, August 2010. 50, 51
[139] O. Kallenberg. Foundations of Modern Probability, 2nd ed., Probability and its Applications. Springer-Verlag, 2002. 50
[140] D. Williams. Probability with Martingales. Cambridge Mathematical Textbooks, Cambridge University Press, 1991. 50
[141] Y. V. Borovskikh and V. S. Korolyuk. Martingale Approximation. VSP BV, The Netherlands, 1997. 50
[142] M. J. Neely and R. Urgaonkar. Opportunism, backpressure, and stochastic optimization with the wireless broadcast advantage. Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, Oct. 2008. DOI: 10.1109/ACSSC.2008.5074815 70, 71, 179
[143] M. J. Neely and A. Sharma. Dynamic data compression with distortion constraints for wireless transmission over a fading channel. arXiv:0807.3768v1, July 24, 2008. 70, 71, 84, 89, 179
[144] L. Huang and M. J. Neely. Max-weight achieves the exact [O(1/V), O(V)] utility-delay tradeoff under Markov dynamics. ArXiv Technical Report, arXiv:1008.0200, August 2010. 74, 77
[145] P. Billingsley. Probability and Measure, 2nd edition. John Wiley & Sons, New York, 1986. 76, 92
[146] M. J. Neely. Distributed and secure computation of convex programs over a network of connected processors. DCDIS Conf., Guelph, Ontario, July 2005. 81
[147] L. Tassiulas and A. Ephremides. Throughput properties of a queueing network with distributed dynamic routing and flow control. Advances in Applied Probability, vol. 28, pp. 285-307, 1996. DOI: 10.2307/1427922 86
[148] Y. Wu, P. A. Chou, and S.-Y. Kung. Information exchange in wireless networks with network coding and physical-layer broadcast. Conference on Information Sciences and Systems, Johns Hopkins University, March 2005. 87
[149] E. Leonardi, M. Mellia, M. A. Marsan, and F. Neri. Optimal scheduling and routing for maximizing network throughput. IEEE/ACM Transactions on Networking, vol. 15, no. 6, Dec. 2007. DOI: 10.1109/TNET.2007.896486 104, 107
[150] Y. Li, A. Papachristodoulou, and M. Chiang. Stability of congestion control schemes with delay sensitive traffic. Proc. IEEE ACC, Seattle, WA, June 2008. DOI: 10.1109/ACC.2008.4586779 104, 108, 109
[151] J. K. MacKie-Mason and H. R. Varian. Pricing congestible network resources. IEEE Journal on Selected Areas in Communications, vol. 13, no. 7, September 1995. DOI: 10.1109/49.414634 109
[152] M. J. Neely and E. Modiano. Convexity in queues with general inputs. IEEE Transactions on Information Theory, vol. 51, no. 2, pp. 706-714, Feb. 2005. DOI: 10.1109/TIT.2004.840859 109
[153] M. J. Neely. Optimal pricing in a free market wireless network. Wireless Networks, vol. 15, no. 7, pp. 901-915, October 2009. DOI: 10.1007/s11276-007-0083-0 112, 179
[154] M. J. Neely and R. Urgaonkar. Optimal backpressure routing in wireless networks with multi-receiver diversity. Ad Hoc Networks (Elsevier), vol. 7, no. 5, pp. 862-881, July 2009. DOI: 10.1016/j.adhoc.2008.07.009 113, 132, 145, 147, 179
[155] L. Ying, S. Shakkottai, and A. Reddy. On combining shortest-path and backpressure routing over multihop wireless networks. Proc. IEEE INFOCOM, 2009. DOI: 10.1109/INFCOM.2009.5062086 113, 179
[156] J.-W. Lee, R. R. Mazumdar, and N. B. Shroff. Non-convex optimization and rate control for multi-class services in the internet. IEEE/ACM Trans. on Networking, vol. 13, no. 4, pp. 827-840, Aug. 2005. DOI: 10.1109/TNET.2005.852876 116
[157] M. Chiang. Nonconvex optimization of communication systems. Advances in Mechanics and Mathematics, Special volume on Strang's 70th Birthday, Springer, vol. 3, 2008. 116
[158] W.-H. Wang, M. Palaniswami, and S. H. Low. Application-oriented flow control: Fundamentals, algorithms, and fairness. IEEE/ACM Transactions on Networking, vol. 14, no. 6, Dec. 2006. DOI: 10.1109/TNET.2006.886318 116
[159] M. J. Neely, A. S. Tehrani, and A. G. Dimakis. Efficient algorithms for renewable energy allocation to delay tolerant consumers. 1st IEEE International Conference on Smart Grid Communications, 2010. 120, 122, 179
[160] L. Tassiulas and S. Sarkar. Max-min fair scheduling in wireless ad hoc networks. IEEE Journal on Selected Areas in Communications, Special Issue on Ad Hoc Networks, vol. 23, no. 1, pp. 163-173, Jan. 2005. 128
[161] H. Shirani-Mehr, G. Caire, and M. J. Neely. MIMO downlink scheduling with non-perfect channel state knowledge. IEEE Transactions on Communications, vol. 58, no. 7, pp. 2055-2066, July 2010. DOI: 10.1109/TCOMM.2010.07.090377 129, 132
[162] M. Kobayashi, G. Caire, and D. Gesbert. Impact of multiple transmit antennas in a queued SDMA/TDMA downlink. In Proc. of 6th IEEE Workshop on Signal Processing Advances in Wireless Communications (SPAWC), June 2005. DOI: 10.1109/SPAWC.2005.1506198 132, 179
[163] C. Li and M. J. Neely. Energy-optimal scheduling with dynamic channel acquisition in wireless downlinks. IEEE Transactions on Mobile Computing, vol. 9, no. 4, pp. 527-539, April 2010. DOI: 10.1109/TMC.2009.140 132
[164] A. Gopalan, C. Caramanis, and S. Shakkottai. On wireless scheduling with partial channel state information. Allerton Conf. on Comm., Control, and Computing, Sept. 2007. 132
[165] M. J. Neely. Dynamic data compression for wireless transmission over a fading channel. Proc. Conference on Information Sciences and Systems (CISS), invited paper, Princeton, March 2008. DOI: 10.1109/CISS.2008.4558703 132, 179
[166] M. J. Neely. Max weight learning algorithms with application to scheduling in unknown environments. arXiv:0902.0630v1, Feb. 2009. 132, 162
[167] D. Shah and M. Kopikare. Delay bounds for approximate maximum weight matching algorithms for input queued switches. Proc. IEEE INFOCOM, June 2002. DOI: 10.1109/INFCOM.2002.1019350 140
[168] M. J. Neely, E. Modiano, and C. E. Rohrs. Tradeoffs in delay guarantees and computation complexity for n × n packet switches. Proc. of Conf. on Information Sciences and Systems (CISS), Princeton, March 2002. 140, 141
[169] L.Tassiulas. Linear complexity algorithms for maximumthroughput inradio networks andin
put queued switches. Proc. IEEE INFOCOM, 1998. DOI: 10.1109/INFCOM.1998.665071
140, 141
[170] E. Modiano, D. Shah, and G. Zussman. Maximizing throughput in wireless net
works via gossiping. Proc. ACM SIGMETRICS / IFIP Performance’06, June 2006.
DOI: 10.1145/1140103.1140283 141
[171] D. Shah, D. N. C. Tse, and J. N. Tsitsiklis. Hardness of low delay network scheduling. Under
submission. 141
[172] L. Jiang and J. Walrand. A distributed CSMA algorithm for throughput and utility maximization
in wireless networks. Proc. Allerton Conf. on Communication, Control, and Computing, Sept.
2008. DOI: 10.1109/ALLERTON.2008.4797741 141, 142, 144
[173] S. Rajagopalan and D. Shah. Reversible networks, distributed optimization, and network
scheduling: What do they have in common? Proc. Conf. on Information Sciences and Systems
(CISS), 2008. 141, 144
[174] T. M. Cover and J. A. Thomas. Elements of Information Theory. New York: John Wiley &
Sons, Inc., 1991. DOI: 10.1002/0471200611 143
[175] L. Jiang and J. Walrand. Scheduling and congestion control for wireless and processing
networks. Synthesis Lectures on Communication Networks, vol. 3, no. 1, pp. 1–156, 2010.
DOI: 10.2200/S00270ED1V01Y201008CNT006 144, 179
[176] L. Jiang and J. Walrand. Convergence and stability of a distributed CSMA algorithm for maximal
network throughput. Proc. IEEE Conference on Decision and Control (CDC), Shanghai, China,
December 2009. DOI: 10.1109/CDC.2009.5400349 144
[177] J. Ni, B. Tan, and R. Srikant. Q-CSMA: Queue-length based CSMA/CA algorithms for achieving
maximum throughput and low delay in wireless networks. arXiv Technical Report:
arXiv:0901.2333v4, Dec. 2009. 144
[178] G. Louth, M. Mitzenmacher, and F. Kelly. Computational complexity of loss networks. Theoretical
Computer Science, vol. 125, pp. 45–59, 1994. DOI: 10.1016/0304-3975(94)90216-X
144
[179] J. Ni and S. Tatikonda. A factor graph modelling of product-form loss and queueing networks.
43rd Allerton Conference on Communication, Control, and Computing (Monticello, IL),
September 2005. 144
[180] M. Luby and E. Vigoda. Fast convergence of the Glauber dynamics for sampling independent
sets: Part I. International Computer Science Institute, Berkeley, CA, Technical Report TR-99-002,
Jan. 1999.
DOI: 10.1002/(SICI)1098-2418(199910/12)15:3/4%3C229::AID-RSA3%3E3.0.CO;2-X
144
[181] D. Randall and P. Tetali. Analyzing Glauber dynamics by comparison of Markov chains.
Lecture Notes in Computer Science, Proc. of the 3rd Latin American Symposium on Theoretical
Informatics, vol. 1380, pp. 292–304, 1998. DOI: 10.1063/1.533199 144
[182] L. Bui, A. Eryilmaz, R. Srikant, and X. Wu. Joint asynchronous congestion control and
distributed scheduling for multihop wireless networks. Proc. IEEE INFOCOM, 2006.
DOI: 10.1109/INFOCOM.2006.210 145
[183] D. Shah. Maximal matching scheduling is good enough. Proc. IEEE Globecom, Dec. 2003.
DOI: 10.1109/GLOCOM.2003.1258788 147
[184] P. Chaporkar, K. Kar, X. Luo, and S. Sarkar. Throughput and fairness guarantees through
maximal scheduling in wireless networks. IEEE Trans. on Information Theory, vol. 54, no. 2,
pp. 572–594, Feb. 2008. DOI: 10.1109/TIT.2007.913537 147
[185] X. Lin and N. B. Shroff. The impact of imperfect scheduling on cross-layer rate control in
wireless networks. Proc. IEEE INFOCOM, 2005. DOI: 10.1109/INFCOM.2005.1498460
147
[186] L. Lin, X. Lin, and N. B. Shroff. Low-complexity and distributed energy minimization in
multihop wireless networks. Proc. IEEE INFOCOM, 2007.
DOI: 10.1109/TNET.2009.2032419 147
[187] C. C. Moallemi, S. Kumar, and B. Van Roy. Approximate and data-driven dynamic programming
for queuing networks. Submitted for publication, 2008. 174
[188] T. Ho, M. Médard, J. Shi, M. Effros, and D. R. Karger. On randomized network coding.
Proc. of 41st Annual Allerton Conf. on Communication, Control, and Computing, Oct. 2003. 179
[189] A. Eryilmaz and D. S. Lun. Control for inter-session network coding. Proc. Information
Theory and Applications Workshop (ITA), Jan./Feb. 2007. 179
[190] X. Yan, M. J. Neely, and Z. Zhang. Multicasting in time-varying wireless networks: Cross-layer
dynamic resource allocation. Proc. IEEE International Symposium on Information Theory
(ISIT), June 2007. DOI: 10.1109/ISIT.2007.4557630 179
[191] A. Sharma, L. Golubchik, R. Govindan, and M. J. Neely. Dynamic data compression in
multihop wireless networks. Proc. SIGMETRICS, 2009. DOI: 10.1145/1555349.1555367
179
[192] C. Swannack, E. Uysal-Biyikoglu, and G. Wornell. Low complexity multiuser scheduling
for maximizing throughput in the MIMO broadcast channel. Proc. of 42nd Allerton Conf. on
Communication, Control, and Computing, September 2004. 179
[193] H. Shirani-Mehr, G. Caire, and M. J. Neely. MIMO downlink scheduling with non-perfect
channel state knowledge. IEEE Transactions on Communications, to appear.
DOI: 10.1109/TCOMM.2010.07.090377 179
[194] E. M. Yeh and R. A. Berry. Throughput optimal control of cooperative relay networks. IEEE
Transactions on Information Theory: Special Issue on Models, Theory, and Codes for Relaying
and Cooperation in Communication Networks, vol. 53, no. 10, pp. 3827–3833, October 2007.
DOI: 10.1109/TIT.2007.904978 179
[195] L. Huang and M. J. Neely. The optimality of two prices: Maximizing revenue in a stochastic
communication system. IEEE/ACM Transactions on Networking, vol. 18, no. 2, pp. 406–419,
April 2010. DOI: 10.1109/TNET.2009.2028423 179
[196] L. Jiang and J. Walrand. Stable and utility-maximizing scheduling for stochastic processing
networks. Allerton Conference on Communication, Control, and Computing, 2009.
DOI: 10.1109/ALLERTON.2009.5394870 179
[197] M. J. Neely and L. Huang. Dynamic product assembly and inventory control for maximum
proﬁt. Proc. IEEE Conf. on Decision and Control (CDC), Atlanta, GA, Dec. 2010. 179
[198] M. J. Neely and L. Huang. Dynamic product assembly and inventory control for maximum
proﬁt. ArXiv Technical Report, arXiv:1004.0479v1, April 2010. 179
[199] A. Warrier, S. Ha, P. Wason, I. Rhee, and J. H. Kim. DiffQ: Differential backlog congestion
control for wireless multihop networks. Conference on Sensor, Mesh and Ad Hoc Communications
and Networks (SECON), San Francisco, US, 2008. DOI: 10.1109/SAHCN.2008.78
179
[200] A. Warrier, S. Janakiraman, S. Ha, and I. Rhee. DiffQ: Practical differential backlog congestion
control for wireless networks. Proc. IEEE INFOCOM, Rio de Janeiro, Brazil, 2009.
DOI: 10.1109/INFCOM.2009.5061929 179
[201] A. Sridharan, S. Moeller, and B. Krishnamachari. Making distributed rate control using
Lyapunov drifts a reality in wireless sensor networks. 6th Intl. Symposium on Modeling
and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), April 2008.
DOI: 10.4108/ICST.WIOPT2008.3205 179
[202] U. Akyol, M. Andrews, P. Gupta, J. Hobby, I. Saniee, and A. Stolyar. Joint scheduling
and congestion control in mobile ad-hoc networks. Proc. IEEE INFOCOM, 2008.
DOI: 10.1109/INFOCOM.2008.111 179
[203] B. Radunović, C. Gkantsidis, D. Gunawardena, and P. Key. Horizon: Balancing
TCP over multiple paths in wireless mesh network. Proc. ACM Mobicom, 2008.
DOI: 10.1145/1409944.1409973 179
Author’s Biography
MICHAEL J. NEELY
Michael J. Neely received B.S. degrees in both Electrical Engineering and Mathematics from the
University of Maryland, College Park, in 1997. He then received a 3-year Department of Defense
NDSEG Fellowship for graduate study at the Massachusetts Institute of Technology, where he
completed the M.S. degree in 1999 and the Ph.D. in 2003, both in Electrical Engineering. He
joined the faculty of Electrical Engineering at the University of Southern California in 2004, where
he is currently an Associate Professor. His research interests are in the areas of stochastic network
optimization and queueing theory, with applications to wireless networks, mobile ad-hoc networks,
and switching systems. Michael received the NSF Career award in 2008 and the Viterbi School of
Engineering Junior Research Award in 2009. He is a member of Tau Beta Pi and Phi Beta Kappa.
Stochastic Network Optimization with Application to Communication and Queueing Systems
Michael J. Neely 2010
Scheduling and Congestion Control for Wireless and Processing Networks
Libin Jiang and Jean Walrand 2010
Performance Modeling of Communication Networks with Markov Chains
Jeonghoon Mo 2010
Communication Networks: A Concise Introduction
Jean Walrand and Shyam Parekh 2010
Path Problems in Networks
John S. Baras and George Theodorakopoulos 2010
Performance Modeling, Loss Networks, and Statistical Multiplexing
Ravi R. Mazumdar 2009

Network Simulation
Richard M. Fujimoto, Kalyan S. Perumalla, and George F. Riley 2006
Stochastic Network Optimization with Application to Communication and Queueing Systems
Michael J. Neely, University of Southern California
SYNTHESIS LECTURES ON COMMUNICATION NETWORKS #7
Morgan & Claypool Publishers
ABSTRACT
This text presents a modern theory of analysis, control, and optimization for dynamic networks, including wireless networks with time-varying channels, mobility, and randomly arriving traffic. The focus is on communication and queueing systems. Mathematical techniques of Lyapunov drift and Lyapunov optimization are developed and shown to enable constrained optimization of time averages in general stochastic systems. A simple drift-plus-penalty framework is used to optimize time averages such as throughput, throughput-utility, power, and distortion. Explicit performance-delay tradeoffs are provided to illustrate the cost of approaching optimality. This theory is also applicable to problems in operations research and economics, where energy-efficient and profit-maximizing decisions must be made without knowing the future. Topics in the text include the following:
• Queue stability theory
• Backpressure, max-weight, and virtual queue methods
• Primal-dual methods for non-convex stochastic utility maximization
• Universal scheduling theory for arbitrary sample paths
• Approximate and randomized scheduling theory
• Optimization of renewal systems and Markov decision systems
Detailed examples and numerous problem set questions are provided to reinforce the main concepts.

KEYWORDS
dynamic scheduling, decision theory, wireless networks, Lyapunov optimization, congestion control, fairness, network utility maximization, multi-hop, mobile networks, routing, backpressure, max-weight, virtual queues
Contents

Preface

1 Introduction
  1.1 Example Opportunistic Scheduling Problem
    1.1.1 Example Problem 1: Minimizing Time Average Power Subject to Stability
    1.1.2 Example Problem 2: Maximizing Throughput Subject to Time Average Power Constraints
    1.1.3 Example Problem 3: Maximizing Throughput-Utility Subject to Time Average Power Constraints
  1.2 General Stochastic Optimization Problems
  1.3 Lyapunov Drift and Lyapunov Optimization
  1.4 Differences from our Earlier Text
  1.5 Alternative Approaches
  1.6 On General Markov Decision Problems
  1.7 On Network Delay
    1.7.1 Delay and Dynamic Programming
    1.7.2 Optimal O(√V) and O(log(V)) delay tradeoffs
    1.7.3 Delay-optimal Algorithms for Symmetric Networks
    1.7.4 Order-optimal Delay Scheduling and Queue Grouping
    1.7.5 Heavy Traffic and Decay Exponents
    1.7.6 Capacity and Delay Tradeoffs for Mobile Networks
  1.8 Preliminaries

2 Introduction to Queues
  2.1 Rate Stability
  2.2 Stronger Forms of Stability
  2.3 Randomized Scheduling for Rate Stability
    2.3.1 A 3-Queue, 2-Server Example
    2.3.2 A 2-Queue Opportunistic Scheduling Example
  2.4 Exercises
3 Dynamic Scheduling Example
  3.1 Scheduling for Stability
    3.1.1 The S-only Algorithm and Λmax
    3.1.2 Lyapunov Drift for Stable Scheduling
    3.1.3 The “Min-Drift” or “Max-Weight” Algorithm
    3.1.4 Iterated Expectations and Telescoping Sums
    3.1.5 Simulation of the Max-Weight Algorithm
  3.2 Stability and Average Power Minimization
    3.2.1 Drift-Plus-Penalty
    3.2.2 Analysis of the Drift-Plus-Penalty Algorithm
    3.2.3 Optimizing the Bounds
    3.2.4 Simulations of the Drift-Plus-Penalty Algorithm
  3.3 Generalizations
  3.4 Exercises

4 Optimizing Time Averages
  4.1 Lyapunov Drift and Lyapunov Optimization
    4.1.1 Lyapunov Drift Theorem
    4.1.2 Lyapunov Optimization Theorem
    4.1.3 Probability 1 Convergence
  4.2 General System Model
    4.2.1 Boundedness Assumptions
  4.3 Optimality via ω-only Policies
  4.4 Virtual Queues
  4.5 The Min Drift-Plus-Penalty Algorithm
    4.5.1 Where are we Using the i.i.d. Assumptions?
  4.6 Examples
    4.6.1 Dynamic Server Scheduling
    4.6.2 Opportunistic Scheduling
  4.7 Variable V Algorithms
  4.8 Place-Holder Backlog
  4.9 Non-i.i.d. Models and Universal Scheduling
    4.9.1 Markov Modulated Processes
    4.9.2 Non-Ergodic Models and Arbitrary Sample Paths
  4.10 Exercises
  4.11 Appendix 4.A — Proving Theorem 4.5
    4.11.1 The Region Γ
    4.11.2 Characterizing Optimality
5 Optimizing Functions of Time Averages
  5.0.1 The Rectangle Constraint R
  5.0.2 Jensen’s Inequality
  5.0.3 Auxiliary Variables
  5.1 Solving the Transformed Problem
  5.2 A Flow-Based Network Model
    5.2.1 Performance of the Flow-Based Algorithm
    5.2.2 Delayed Feedback
    5.2.3 Limitations of this Model
  5.3 Multi-Hop Queueing Networks
    5.3.1 Transmission Variables
    5.3.2 The Utility Optimization Problem
    5.3.3 Multi-Hop Network Utility Maximization
    5.3.4 Backpressure-Based Routing and Resource Allocation
    5.3.5 Algorithm Performance
  5.4 General Optimization of Convex Functions of Time Averages
  5.5 Non-Convex Stochastic Optimization
  5.6 Worst Case Delay
    5.6.1 The persistent service queue
    5.6.2 The Drift-Plus-Penalty for Worst-Case Delay
  5.7 Alternative Fairness Metrics
  5.8 Exercises

6 Approximate Scheduling
  6.1 Time-Invariant Interference Networks
    6.1.1 Computing over Multiple Slots
    6.1.2 Randomized Searching for the Max-Weight Solution
    6.1.3 The Jiang-Walrand Theorem
  6.2 Multiplicative Factor Approximations
    6.2.1 Optimality over i.i.d. algorithms

7 Optimization of Renewal Systems
  7.1 The Renewal System Model
    7.1.1 The Optimization Goal
  7.2 Drift-Plus-Penalty for Renewal Systems
    7.2.1 Alternate Formulations
  7.3 Minimizing the Drift-Plus-Penalty Ratio
    7.3.1 The Bisection Algorithm
    7.3.2 Optimization over Pure Policies
    7.3.3 Caveat — Frames with Initial Information
  7.4 Task Processing Example
  7.5 Utility Optimization for Renewal Systems
    7.5.1 The Utility Optimal Algorithm for Renewal Systems
  7.6 Dynamic Programming Examples
    7.6.1 Delay-Limited Transmission Example
    7.6.2 Markov Decision Problem for Minimum Delay Scheduling
  7.7 Exercises

8 Conclusions

Bibliography

Author’s Biography
Preface
This text is written to teach the theory of Lyapunov drift and Lyapunov optimization for stochastic network optimization. It assumes only that the reader is familiar with basic probability concepts (such as expectations and the law of large numbers). Familiarity with Markov chains and with standard (non-stochastic) optimization is useful but not required.

The Lyapunov theory for optimizing network time averages was described collectively in our previous text (22). The current text is significantly different from (22). It has been reorganized with many more examples to help the reader. This is done while still keeping all of the details for a complete and self-contained exposition of the material. This text also provides many recent topics not covered in (22), including:
• A more detailed development of queue stability theory (Chapter 2).
• Universal scheduling for non-ergodic sample paths (Section 4.9).
• Variable-V algorithms that provide exact optimality of time averages subject to a weaker form of stability called “mean rate stability” (Section 4.7).
• Place-holder bits for delay improvement (Sections 3.4 and 4.8).
• Treatment of problems with equality constraints and abstract set constraints (Section 5.4).
• Non-convex stochastic optimization (Section 5.5).
• Worst case delay bounds (Sections 5.6 and 7.6).
• Approximate scheduling and full throughput scheduling in interference networks via the Jiang-Walrand theorem (Chapter 6).
• Optimization of renewal systems and Markov decision examples (Chapter 7).

A variety of examples and simulation results are given to illustrate the main concepts. Diverse problem set questions (several with example solutions) are also given. These questions and examples were developed over several years for use in the stochastic network optimization course taught by the author. They include topics of wireless opportunistic scheduling, multi-hop routing, network coding for maximum throughput, energy-constrained and delay-constrained queueing, distortion-aware data compression, dynamic decision making for maximum profit, and more.
Finally, this text emphasizes the simplicity of the Lyapunov method, showing how all of the results follow directly from four simple concepts: (i) telescoping sums, (ii) iterated expectations, (iii) opportunistically minimizing an expectation, and (iv) Jensen’s inequality.

Michael J. Neely
September 2010
that is. S2 (t)) denote the channel conditions between users and the receiver on slot t.The channel conditions represent any information that affects the channel on slot t. Channel conditions are assumed to be constant for the duration of a slot. and they can be solved with a common mathematical framework that is intimately connected to queueing theory. Here we provide a simple wireless example to illustrate how the theory for optimizing time averages can be used. This channelaware scheduling is called opportunistic scheduling. Consider a 2user wireless uplink that operates in slotted time t ∈ {0. These problems can be formulated as problems that optimize the time averages of certain quantities subject to time average constraints on other quantities.p(t)) Figure 1. time variation. Our focus is on communication and queueing systems. 1. . Example applications include wireless mesh networks with opportunistic scheduling.1).p(t)) Receiver S2(t) b2(t)=b2(S(t). . Let (a1 (t). Every slot new data randomly arrives to each user for transmission to a common receiver. and sensor networks with joint compression and transmission.1: The 2user wireless system for the example of Section 1. networks with random events. in units of bits.1 EXAMPLE OPPORTUNISTIC SCHEDULING PROBLEM a1(t) a2(t) Q1(t) Q2(t) S1(t) b1(t)=b1(S(t). Every slot t. Let S (t) = (S1 (t). We assume the network controller can observe S (t) at the beginning of each slot t before making a transmission decision. . but they can change from slot to slot. 1. transportation.}. and smartgrid energy distribution. 2. cognitive radio networks. and uncertainty. We assume the receiver coordinates network decisions every slot.1 CHAPTER 1 Introduction This text considers the analysis and control of stochastic networks. 1. The data is stored in queues Q1 (t) and Q2 (t) to await transmission (see Fig. The techniques are also applicable to stochastic systems that arise in operations research. internets with peertopeer communication. 
such as fading coefﬁcients and/or noise ratios.1. the network controller observes the current S (t) . economics. a2 (t)) be the vector of new arrivals on slot t. adhoc mobile networks.
and chooses a power allocation vector p(t) = (p1(t), p2(t)) within some set P of possible power allocations. This decision, together with the current S(t), determines the transmission rate vector (b1(t), b2(t)) for slot t, where bk(t) represents the transmission rate (in bits/slot) from user k ∈ {1, 2} to the receiver on slot t. Specifically, we have general transmission rate functions b̂k(p(t), S(t)):

b1(t) = b̂1(p(t), S(t)) ,  b2(t) = b̂2(p(t), S(t))

The precise form of these functions depends on the modulation and coding strategies used for transmission. The queueing dynamics are then:

Qk(t + 1) = max[Qk(t) − b̂k(p(t), S(t)), 0] + ak(t)  ∀k ∈ {1, 2}, ∀t ∈ {0, 1, 2, . . .}

Several types of optimization problems can be considered for this simple system.

1.1.1 EXAMPLE PROBLEM 1: MINIMIZING TIME AVERAGE POWER SUBJECT TO STABILITY

Let p̄k be the time average power expenditure of user k under a particular power allocation algorithm (for k ∈ {1, 2}):

p̄k = lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} pk(τ)

The problem of designing an algorithm to minimize time average power expenditure subject to queue stability can be written mathematically as:

Minimize: p̄1 + p̄2
Subject to: 1) Queues Qk(t) are stable ∀k ∈ {1, 2}
2) p(t) ∈ P ∀t ∈ {0, 1, 2, . . .}

where queue stability is defined in the next chapter. It is shown in the next chapter that queue stability ensures the time average output rate of the queue is equal to the time average input rate. Our theory will allow the design of a simple algorithm that makes decisions p(t) ∈ P every slot t, without requiring a priori knowledge of the probabilities associated with the arrival and channel processes a(t) and S(t). The algorithm meets all desired constraints in the above problem whenever it is possible to do so. Further, the algorithm is parameterized by a constant V ≥ 0 that can be chosen as desired to yield time average power within O(1/V) of the minimum possible time average power required for queue stability. Choosing a large value of V can thus push average power arbitrarily close to optimal. However, this comes with a tradeoff in average queue backlog and delay that is O(V).

1.1.2 EXAMPLE PROBLEM 2: MAXIMIZING THROUGHPUT SUBJECT TO TIME AVERAGE POWER CONSTRAINTS

Consider the same system, but now assume the arrival process a(t) = (a1(t), a2(t)) can be controlled by a flow control mechanism. We thus have two decision vectors: p(t) (the power allocation vector) and a(t) (the data admission vector). The admission vector a(t) is chosen within some set A every slot t. Let āk be the time average admission rate (in bits/slot) for user k, which is the same as the time average throughput of user k if its queue is stable (as shown in the next chapter). Suppose p1,av and p2,av are given constants that represent desired average power constraints for each user. We have the following problem of maximizing a weighted sum of throughput subject to average power constraints:

Maximize: w1 ā1 + w2 ā2
Subject to: 1) p̄k ≤ pk,av ∀k ∈ {1, 2}
2) Queues Qk(t) are stable ∀k ∈ {1, 2}
3) p(t) ∈ P ∀t ∈ {0, 1, 2, . . .}
4) a(t) ∈ A ∀t ∈ {0, 1, 2, . . .}

where w1, w2 are given positive weights that define the relative importance of user 1 traffic and user 2 traffic. Again, our theory leads to an algorithm that meets all desired constraints and comes within O(1/V) of the maximum throughput possible under these constraints, with an O(V) tradeoff in average backlog and delay.

1.1.3 EXAMPLE PROBLEM 3: MAXIMIZING THROUGHPUT-UTILITY SUBJECT TO TIME AVERAGE POWER CONSTRAINTS

Consider the same system as Example Problem 2, but now assume the objective is to maximize a concave function of throughput, rather than a linear function of throughput (the definition of "concave" is given in footnote 1 in the next subsection). Specifically, let g1(a) and g2(a) be continuous, concave, and nondecreasing functions of a over the range a ≥ 0. Such functions are called utility functions. The value g1(ā1) represents the utility (or satisfaction) that user 1 gets by achieving a throughput of ā1. We then have the problem:

Maximize: g1(ā1) + g2(ā2)
Subject to: 1) p̄k ≤ pk,av ∀k ∈ {1, 2}
2) Queues Qk(t) are stable ∀k ∈ {1, 2}
3) p(t) ∈ P ∀t ∈ {0, 1, 2, . . .}
4) a(t) ∈ A ∀t ∈ {0, 1, 2, . . .}

Typical utility functions are g1(a) = g2(a) = log(a), or g1(a) = g2(a) = log(1 + a). These functions are nondecreasing and strictly concave, so that g1(ā1) has a diminishing returns property with each incremental increase in throughput ā1. This means that if ā1 < ā2, the sum utility g1(ā1) + g2(ā2) would be improved more by increasing ā1 than by increasing ā2. Maximizing g1(ā1) + g2(ā2) can thus provide a more "fair" throughput vector (ā1, ā2). Indeed, maximizing a linear function often yields a vector with one component that is very high and the other component very low (possibly 0), whereas maximizing a concave utility creates a more evenly distributed throughput vector. The log(a) utility functions provide a type of fairness called proportional fairness (see (1)(2)). Fairness properties of different types of utility functions are considered in (3)(4)(5)(6).
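The queueing dynamics above are easy to simulate. The following is a minimal sketch in which the rate function b̂k, the channel state distribution, the arrival distribution, and the fixed power choice are all hypothetical placeholders (the text deliberately leaves the rate function to the modulation and coding scheme); it simply iterates the recursion Qk(t + 1) = max[Qk(t) − b̂k(p(t), S(t)), 0] + ak(t) and records the time average powers p̄k:

```python
import math
import random

def rate(p, s):
    # Hypothetical transmission rate function b̂_k(p_k, S_k) in bits/slot;
    # a stand-in for whatever the modulation/coding scheme provides.
    return s * math.log2(1.0 + p)

rng = random.Random(0)
T = 10000
Q = [0.0, 0.0]            # queue backlogs Q_1(t), Q_2(t)
power_sum = [0.0, 0.0]
for t in range(T):
    S = [rng.choice([0.5, 1.0]) for _ in range(2)]  # channel states S(t)
    a = [rng.uniform(0.0, 0.4) for _ in range(2)]   # arrivals a(t)
    p = [1.0, 1.0]  # a fixed (suboptimal) choice p(t) in P, for illustration only
    for k in range(2):
        power_sum[k] += p[k]
        # Q_k(t+1) = max[Q_k(t) - b̂_k(p(t), S(t)), 0] + a_k(t)
        Q[k] = max(Q[k] - rate(p[k], S[k]), 0.0) + a[k]
p_bar = [power_sum[k] / T for k in range(2)]  # time average powers p̄_k
```

Because the offered service always exceeds the arrivals in this toy setup, the backlogs remain bounded; a smarter V-parameterized policy (developed in later chapters) would spend far less average power while keeping the queues stable.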
data compression. y1 (t). 2. network coding. the examples and problem set questions provided in this text include networks with probabilistic channel errors.4 1. .The attributes can be positive or negative. . Every slot t. The network is described by a collection of queue backlogs. written in vector form Q(t) = (Q1 (t). 1. M} ˆ yl (t) = yl (α(t). yl (t). The case K = 0 corresponds to a system without queues. . y (t). . y l . . . e(t): x(t) = (x1 (t). . L. such as power expenditures. . . . a control action is taken. or packet drops/admissions. . . The theory is also useful for problems within operations research and economics. . 1. .2 GENERAL STOCHASTIC OPTIMIZATION PROBLEMS The three example problems considered in the previous section all involved optimizing a time average (or a function of time averages) subject to time average constraints. yL (t)) e(t) = (e1 (t). . . . . . . Indeed. These attributes are given by general functions: xm (t) = xm (α(t). . INTRODUCTION For any given continuous and concave utility functions. and they represent penalties or rewards associated with the network on slot t. J } ˆ where ω(t) is a random event observed on slot t (such as new packet arrivals or channel conditions) and α(t) is the control action taken on slot t (such as packet admissions or transmissions). ω(t)) ∀m ∈ {1. 1. our theory enables the design of an algorithm that meets all desired constraints and provides throughpututility within O(1/V ) of optimality. . . QK (t)). We emphasize that these three problems are just examples. ej represent the time average of xm (t). . ej (t) under a particular control algorithm. distortions. J (used to distinguish between equality constraints and two types of inequality constraints). . ω(t)) ∀l ∈ {0. L} ˆ ej (t) = ej (α(t). and this action affects arrivals and departures of the queues and also creates a collection of real valued attribute vectors x(t). . with a tradeoff in average backlog and delay that is O(V ). multihop communication. 
. ω(t)) ∀j ∈ {1. Here we state the general problems of this type. Consider a stochastic network that operates in discrete time with unit time slots t ∈ {0. Let x m . and mobility. Our ﬁrst objective is to . . . . where K is a nonnegative integer.}. xM (t)) y (t) = (y0 (t). The action α(t) is chosen within an abstract set Aω(t) that possibly depends on ω(t). The general theory can treat many more types of networks. . eJ (t)) for some nonnegative integers M.
design an algorithm that solves the following problem:

Minimize: ȳ0   (1.1)
Subject to: 1) ȳl ≤ 0 for all l ∈ {1, . . . , L}   (1.2)
2) ēj = 0 for all j ∈ {1, . . . , J}   (1.3)
3) α(t) ∈ A_ω(t) ∀t   (1.4)
4) Stability of all Network Queues   (1.5)

A solution is an algorithm for choosing control actions over time in reaction to the existing network state, such that all of the constraints are satisfied and the quantity to be minimized is as small as possible. Our second objective, more general than the first, is to optimize convex functions of time averages. Specifically, let f(x), g1(x), . . . , gL(x) be convex functions from R^M to R, and let X be a closed and convex subset of R^M.(1) Let x̄ = (x̄1, . . . , x̄M) be the vector of time averages of the xm(t) attributes under a given control algorithm. We desire a solution to the following problem:

Minimize: ȳ0 + f(x̄)   (1.6)
Subject to: 1) ȳl + gl(x̄) ≤ 0 for all l ∈ {1, . . . , L}   (1.7)
2) ēj = 0 for all j ∈ {1, . . . , J}   (1.8)
3) x̄ ∈ X   (1.9)
4) α(t) ∈ A_ω(t) ∀t   (1.10)
5) Stability of all Network Queues   (1.11)

The problems (1.1)-(1.5) and (1.6)-(1.11) can be viewed as stochastic programs, and are analogues of the classic linear programs and convex programs of static optimization theory. These problems have wide applications, and they are of interest even when there is no underlying queueing network to be stabilized (so that the "Stability" constraints in (1.5) and (1.11) are removed).

Footnote 1: A set X ⊆ R^M is convex if the line segment formed by any two points in X is also in X. A function f(x) defined over a convex set X is a convex function if for any two points x1, x2 ∈ X and any two probabilities p1, p2 ≥ 0 such that p1 + p2 = 1, we have f(p1x1 + p2x2) ≤ p1f(x1) + p2f(x2). A function f(x) is concave if −f(x) is convex. A function f(x) is affine if it is linear plus a constant, having the form f(x) = c0 + Σ_{m=1}^{M} cm xm.

1.3 LYAPUNOV DRIFT AND LYAPUNOV OPTIMIZATION

We solve the problems described above with a simple and elegant theory of Lyapunov drift and Lyapunov optimization. While this theory is presented in detail in future chapters, we briefly describe it here. Indeed, it turns out that queueing theory plays a central role in this type of stochastic optimization: Inefficient control actions incur larger backlog in certain queues. These backlogs act as "sufficient statistics" on which to base the next control decision. This enables algorithms that do not require knowledge of the probabilities associated with the random network events ω(t). Further, we can introduce virtual queues as a strong method for ensuring that the required time average constraints are satisfied, even if there are no underlying queues in the original problem. The first step is to look at the constraints of the problem to be solved. For example, for the
problem (1.1)-(1.5), the constraints are (1.2)-(1.5). Then construct virtual queues (in a way to be specified) that help to meet the desired constraints. Next, define a function L(t) as the sum of squares of backlog in all virtual and actual queues on slot t. This is called a Lyapunov function, and it is a scalar measure of network congestion. Intuitively, if L(t) is "small," then all queues are small, and if L(t) is "large," then at least one queue is large. Define Δ(t) = L(t + 1) − L(t), being the difference in the Lyapunov function from one slot to the next.(2) If control decisions are made every slot t to greedily minimize Δ(t), then backlogs are consistently pushed towards a lower congestion state, which intuitively maintains network stability (where "stability" is precisely defined in the next chapter). Minimizing Δ(t) every slot is called minimizing the Lyapunov drift. Chapter 3 shows this method provides queue stability for a particular example network, and Chapter 4 shows it also stabilizes general networks.

Footnote 2: The notation used in later chapters is slightly different. Simplified notation is used here to give the main ideas.

However, at this point, the problem is only half solved: The virtual queues and Lyapunov drift help only to ensure the desired time average constraints are met. The objective function to be minimized has not yet been incorporated. The objective function is mapped to an appropriate function penalty(t). For example, y0(t) is the objective function for the problem (1.1)-(1.5). Instead of taking actions to greedily minimize Δ(t), actions are taken every slot t to greedily minimize the following drift-plus-penalty expression:

Δ(t) + V × penalty(t)

where V is a nonnegative control parameter that is chosen as desired. Choosing V = 0 corresponds to the original algorithm of minimizing the drift alone. Choosing V > 0 includes the weighted penalty term in the control decision and allows a smooth tradeoff between backlog reduction and penalty minimization. We show that the time average objective function deviates by at most O(1/V) from optimality, with a time average queue backlog bound of O(V). The algorithms are particularly interesting because they only require knowledge of the current network state, and they do not require knowledge of the probabilities associated with future random events.

While Lyapunov techniques have a long history in the field of control theory, this form of Lyapunov drift was perhaps first used to construct stable routing and scheduling policies for queueing networks in the pioneering works (7)(8) by Tassiulas and Ephremides. These works used the technique of minimizing Δ(t) every slot, resulting in backpressure routing and max-weight scheduling algorithms that stabilize the network whenever possible. A related technique was used for computing multicommodity network flows in (16). Minimizing Δ(t) has had wide success for stabilizing many other types of networks, including packet switch networks (9)(10)(11), wireless systems (7)(8)(12)(13)(14), and ad-hoc mobile networks (15). We introduced the V × penalty(t) term to the drift minimization in (17)(18)(19) to solve problems of joint network stability and stochastic utility maximization, and we introduced the virtual queue technique in (20)(21) to solve problems of maximizing throughput in a wireless network subject to individual average power constraints at each node.
1.4 DIFFERENCES FROM OUR EARLIER TEXT

The theory of Lyapunov drift and Lyapunov optimization is described collectively in our previous text (22), which unified these ideas for application to general problems of the type described in Section 1.2. The current text is different from (22) in that we emphasize the general optimization problems first, showing how the problem (1.6)-(1.11) can be solved directly by using the solution to the simpler problem (1.1)-(1.5). This text also provides many new topics not covered in (22), including:

• A more detailed development of queue stability theory (Chapter 2).
• Treatment of problems with equality constraints (1.3) and abstract set constraints (1.9).
• Variable-V algorithms that provide exact optimality of time averages subject to a weaker form of stability called "mean rate stability" (Chapter 4).
• Universal scheduling for non-ergodic sample paths (Chapter 4).
• Placeholder bits for delay improvement (Chapters 3 and 4).
• Worst case delay bounds (Chapter 5).
• Nonconvex stochastic optimization (Chapter 5).
• Approximate scheduling and full throughput scheduling in interference networks via the Jiang-Walrand theorem (Chapter 6).
• Optimization of renewal systems and Markov decision examples (Chapter 7).

We also provide a variety of examples and problem set questions to help the reader. These have been developed over several years for use in the stochastic network optimization course taught by the author.

1.5 ALTERNATIVE APPROACHES

The relationship between network utility maximization, Lagrange multipliers, convex programming, and duality theory is developed for static wireline networks in (2)(23)(24) and for wireless networks in (25)(26)(27)(28)(29), where the goal is to converge to a static flow allocation and/or resource allocation over the network. Scheduling in wireless networks with static channels is considered from a duality perspective in (30)(31). Primal-dual techniques for maximizing utility in a stochastic wireless downlink are developed in (32)(33) for systems without queues. The primal-dual technique is extended in (34)(35) to treat networks with queues and to solve problems similar to (1.1)-(1.11) in a fluid limit sense. Specifically, the work (34) shows the primal-dual technique leads to a fluid
limit with an optimal utility. It makes a statement concerning weak limits of scaled systems, and it conjectures that the utility of the actual network is close to this fluid limit when an exponential averaging parameter is scaled. A related primal-dual algorithm is used in (36) and shown to converge to utility-optimality as a parameter is scaled. Our drift-plus-penalty approach can be viewed as a dual-based approach to the stochastic problem (rather than a primal-dual approach), and it reduces to the well known dual subgradient algorithm for linear and convex programs when applied to non-stochastic problems (see (37)(22)(17) for discussions on this). One advantage of the drift-plus-penalty approach is the explicit convergence analysis and performance bounds, resulting in the [O(1/V), O(V)] performance-delay tradeoff. This tradeoff is not shown in the alternative approaches described above. The dual approach is also robust to non-ergodic variations and has "universal scheduling" properties, i.e., properties that hold for systems with arbitrary sample paths, as shown in Section 4.9 (see also (38)(39)(40)(41)(42)). However, one advantage of the primal-dual approach is that it provides local optimum guarantees for problems of minimizing f(x̄) for nonconvex functions f(·) (see Section 5.5 and (43)). Related dual-based approaches are used for "infinitely backlogged" systems in (31)(44)(45)(46) using static optimization, fluid limits, and stochastic gradients. Related algorithms for channel-aware scheduling in wireless downlinks with different analytical techniques are developed in (47)(48)(49).

We note that the [O(1/V), O(V)] performance-delay tradeoff achieved by the drift-plus-penalty algorithm on general systems is not necessarily the optimal tradeoff for particular networks. An optimal [O(1/V), O(√V)] energy-delay tradeoff is shown by Berry and Gallager in (50) for a single link with known channel statistics, and optimal performance-delay tradeoffs for multi-queue systems are developed in (51)(52)(53) and shown to be achievable even when channel statistics are unknown. This latter work builds on the Lyapunov optimization method, but it uses a more aggressive drift steering technique. A placeholder technique for achieving near-optimal delay tradeoffs is developed in (37), and related implementations are in (54)(55).

1.6 ON GENERAL MARKOV DECISION PROBLEMS

The penalties x̂m(α(t), ω(t)), described in Section 1.2, depend only on the network control action α(t) and the random event ω(t) (where ω(t) is generated by "nature" and is not influenced by past control actions). In particular, the queue backlogs Q(t) are not included in the penalties. A more advanced penalty structure would be x̂m(α(t), ω(t), z(t)), where z(t) is a controlled Markov chain (possibly related to the queue backlog) with transition probabilities that depend on control actions. Extensions of Lyapunov optimization for this case are developed in Chapter 7 using a drift-plus-penalty metric defined over renewal frames (56)(57)(58). A related 2-timescale approach to learning optimal decisions in Markov decision problems is developed in (59), and learning approaches to power-aware scheduling in single queues are developed in (60)(61)(62)(63). Background on dynamic programming and Markov decision problems can be found in (64)(65)(66), and background on approximate dynamic programming, neuro-dynamic programming, and Q-learning theory can be found in (67)(68)(69). All of these approaches may suffer from large
convergence times, high complexity, or inaccurate approximation when applied to large networks. This is due to the curse of dimensionality for Markov decision problems. This problem does not arise when using the Lyapunov optimization technique and when penalties have the structure given in Section 1.2.

1.7 ON NETWORK DELAY

This text develops general [O(1/V), O(V)] performance-delay tradeoffs, giving explicit bounds on average queue backlog and delay that grow linearly with V. We also provide examples of exact delay analysis for randomized algorithms, delay-limited transmission, worst case delay, and average delay constraints (see the exercises and sections of Chapters 2, 5, and 7). There are many additional interesting topics on network delay that we do not cover in this text. We briefly discuss some of those topics in the following subsections, with references given for further reading.

1.7.1 DELAY AND DYNAMIC PROGRAMMING

Dynamic programming and Markov decision frameworks are considered for one-queue energy and delay optimality problems in (77)(78)(79)(80)(81). One-queue problems with strict deadlines and a priori knowledge of future events are treated in (82)(83)(84)(85)(86). Further work on delay-limited transmission is found in (70)(71). Approximate dynamic programs and q-learning type algorithms, which attempt to learn optimal decision strategies, are considered in (72)(73)(74)(75)(76), and Lyapunov drift algorithms that use delays as weights, rather than queue backlogs, are considered in (61)(60)(56)(57)(62)(63). A mixed Lyapunov optimization and dynamic programming approach is given in (56) for networks with a small number of delay-constrained queues and an arbitrarily large number of other queues that only require stability.

Minimum energy problems with delay deadlines are considered for multi-queue wireless systems in (90). In the case when channels are static, the work (90) maps the problem to a shortest path problem. In the case when channels are varying but rate-power functions are linear, (90) shows the optimal multi-dimensional dynamic program has a very simple threshold structure. Heuristic approximations are given for more general rate-power curves, resulting in efficient approximation algorithms. These approximations are shown to have optimal decay exponents for sum queue backlog in (92), which relies on techniques developed in (93) for optimal max-queue exponents. Related work in (91) considers delay optimal scheduling in multi-queue systems and derives structural results of the dynamic programs. Control rules for two interacting service stations are given in (88). Optimal scheduling in a finite buffer 2 × 2 packet switch is treated in (89), and filter theory is used to establish delay bounds in (87).

1.7.2 OPTIMAL O(√V) AND O(log(V)) DELAY TRADEOFFS

The [O(1/V), O(V)] performance-delay tradeoffs we derive for general networks in this text are not necessarily the optimal tradeoffs for particular networks. The work (50) considers the optimal
energy-delay tradeoff for a one-queue wireless system with a fading channel. It shows that no algorithm can do better than an [O(1/V), O(√V)] tradeoff, and it proposes a buffer-partitioning algorithm that can be shown to come within a logarithmic factor of this tradeoff. This optimal [O(1/V), O(√V)] tradeoff is extended to multi-queue systems in (51), and an algorithm with an exponential Lyapunov function and aggressive drift steering is shown to meet this tradeoff to within a logarithmic factor. The work (51) also shows an improved [O(1/V), O(log(V))] tradeoff is achievable in certain exceptional cases with piecewise linear structure. Optimal [O(1/V), O(log(V))] utility-delay tradeoffs are shown for flow control problems in (52), and optimal [O(1/V), O(log(V))] energy-delay tradeoffs are shown in (53) in cases when packet dropping is allowed. Near-optimal [O(1/V), O(log^2(V))] tradeoffs are shown for the basic quadratic Lyapunov drift-plus-penalty method in (37)(55) using placeholders and Last-In-First-Out (LIFO) scheduling, and related implementations are in (54).

1.7.3 DELAY-OPTIMAL ALGORITHMS FOR SYMMETRIC NETWORKS

The works (8)(94)(95)(96)(97) treat multi-queue wireless systems with "symmetry," where arrival rates and channel probabilities are the same for all queues. They use stochastic coupling theory to prove delay optimality for particular algorithms. The work (8) proves delay optimality of the longest connected queue first algorithm for ON/OFF channels with a single server, the work (94)(97) considers multi-server systems, and the work (95)(96) considers wireless problems under the information theoretic multi-access capacity region. Related work in (98) proves delay optimality of the join-the-shortest-queue strategy for routing packets to two queues with identical exponential service.

1.7.4 ORDER-OPTIMAL DELAY SCHEDULING AND QUEUE GROUPING

The work (99) shows that delay is at least linear in N for N × N packet switches that use queue-unaware scheduling. In particular, it is shown that N × N packet switches can provide O(1) delay (order-optimal) if they are at most half-loaded. The best known delay bound beyond the half-loaded region is the O(log(N)) delay result of (99), and it is not known if it is possible to achieve O(1) delay in this region. Related work in (100) considers scheduling in N-user wireless systems with ON/OFF channels and shows that delay is at least linear in N if queue-unaware algorithms are used, and it develops a simple queue-aware scheduling algorithm that gives O(log(N)) delay whenever rates are within the capacity region. Delay can be made O(1) with a simple queue-aware queue grouping algorithm: order optimality of the simple longest connected queue first rule (simpler than the algorithm of (100)) is proven in (101) via a queue grouping analysis. This O(1) delay, independent of the number of users, is called order optimal because it differs from optimal only in a constant coefficient that does not depend on N. Order-optimal delay for 1-hop switch scheduling under maximal scheduling (which provides stability only when rates are within a constant factor of the capacity boundary) is developed in (102)(103), again using queue grouping theory. Time-correlated "bursty" traffic is considered in (103). The
queue grouping results in (101)(103) are inspired by queue-grouped Lyapunov functions developed in (104)(105) for stability analysis.

1.7.5 HEAVY TRAFFIC AND DECAY EXPONENTS

A line of work addresses asymptotic delay optimality in a "heavy traffic" regime where input rates are pushed very close to the capacity region boundary. Of course, delay grows to infinity if input rates are pushed toward the capacity boundary, but the goal is to design an algorithm that minimizes an asymptotic growth coefficient. Delay is often easier to understand in this heavy traffic regime due to a phenomenon of state space collapse (106). Heavy traffic analysis is considered in (107) for wireless scheduling and in (108)(109) for packet switches. The work (108)(109) suggests that delay in packet switches can be improved by changing the well-known max-weight rule, which seeks to maximize a weighted sum of queue backlog and service rates every slot t (Σ_i Qi(t)μi(t)), to an α-max weight rule that seeks to maximize Σ_i Qi(t)^α μi(t), where 0 < α ≤ 1. Simulations on N × N packet switches in (110) show that delay is improved when α is positive but small. It is interesting to note that α-max weight policies with small but positive α make matching decisions that are similar to the max-size matches used in the frame-based algorithm of (99), which achieves O(log(N)) delay. This may be a reason why the delay of α-max weight policies is also small. A discussion of this in the context of heavy traffic theory is given in (111), along with some counterexamples.

Large deviation theory is often used to analyze queue backlog and delay. These analyses consider queue backlog when the queue is very large. Algorithms that optimize the exponent of queue backlog are considered in (93) for optimizing the max-queue exponent and in (92) for the sum-queue exponent. This is considered for α-max weight policies in (112), for delay-based scheduling in (73), and for processor sharing queues in (113)(114). An analysis of backlog distributions that is valid also in the small buffer regime is given in (115) for the case when the number of network channels is scaled to infinity.

1.7.6 CAPACITY AND DELAY TRADEOFFS FOR MOBILE NETWORKS

Work by Gupta and Kumar in (116) shows that per-node capacity of ad-hoc wireless networks with N nodes and with random source-destination pairings is roughly Θ(1/√N) (neglecting logarithmic factors in N for simplicity). Grossglauser and Tse show in (117) that mobility increases per-node capacity to Θ(1), which does not vanish with N. However, the algorithm in (117) uses a 2-hop relay algorithm that creates a large delay. The exact capacity and average end-to-end delay are computed in (118)(17) for a cell-partitioned network with a simplified i.i.d. mobility model. The work (118)(17) also shows for this simple model that the average delay W of any scheduling and routing protocol, possibly one that uses redundant packet transfers, must satisfy:

W/λ ≥ (N − d)(1 − log(2))/(4d)
where λ is the per-user throughput, C is the number of cells, d = N/C is the node/cell density, and log(·) denotes the natural logarithm. Thus, if the node/cell density is d = Θ(1), then W/λ ≥ Ω(N). The 2-hop relay algorithm meets this bound with λ = Θ(1) and W = Θ(N), and a relay algorithm that redundantly transmits packets over multiple paths meets this bound with λ = Θ(1/√N) and W = Θ(√N). The work (119) shows that improved tradeoffs are possible if the transmission radius of each node can be scaled to include a large amount of users in each transmission (so that the d = Θ(1) assumption is relaxed). The work (120)(121) quantifies the optimal tradeoff achievable under this type of radius scaling, and it also shows improved tradeoffs are possible if the model is changed to allow time slot scaling and network bit-pipelining. Similar i.i.d. mobility models are considered in (119)(120)(121). Related delay tradeoffs via transmission radius scaling for non-mobile networks are in (122). Analysis of non-i.i.d. mobility models is more complex and is considered in (123)(124)(122)(125). Recent network coding approaches are in (126)(127)(128).

1.8 PRELIMINARIES

We assume the reader is comfortable with basic concepts of probability and random processes (such as expectations, the law of large numbers, etc.) and with basic mathematical analysis. Familiarity with queueing theory, Markov chains, and convex functions is useful but not required, as we present or derive results in these areas as needed in the text. For additional references on queueing theory and Markov chains, including discussions of Little's Theorem and the renewal-reward theorem, see (129)(66)(130)(131)(132). For additional references on convex analysis, including discussions of convex hulls, Caratheodory's theorem, and Jensen's inequality, see (133)(134)(135).

All of the major results of this text are derived directly from one or more of the following four key concepts:

• Law of Telescoping Sums: For any function f(t) defined over integer times t ∈ {0, 1, 2, . . .}, we have for any integer time t > 0:

Σ_{τ=0}^{t−1} [f(τ + 1) − f(τ)] = f(t) − f(0)

The proof follows by a simple cancellation of terms. This is the main idea behind Lyapunov drift arguments: Controlling the change in a function at every step allows one to control the ending value of the function.

• Law of Iterated Expectations: For any random variables X and Y, we have:(3)

E{X} = E{E{X|Y}}

Footnote 3: Strictly speaking, the law of iterated expectations holds whenever the result of Fubini's Theorem holds (which allows one to switch the integration order of a double integral). This holds whenever any one of the following hold: (i) E{|X|} < ∞; (ii) E{max[X, 0]} < ∞; (iii) E{min[X, 0]} > −∞.
where the outer expectation is with respect to the distribution of Y, and the inner expectation is with respect to the conditional distribution of X given Y.

This text also uses, in addition to regular limits of functions, the lim sup and lim inf. To understand these, note that a function f(t) may or may not have a well defined limit as t → ∞ (consider, for example, a cosine function). We define lim sup_{t→∞} f(t) as the largest possible limiting value of f(t) over any subsequence of times t_k that increase to infinity, and for which the limit of f(t_k) exists. Likewise, lim inf_{t→∞} f(t) is the smallest possible limiting value. It can be shown that these limits always exist (possibly being ∞ or −∞). For example, the lim sup and lim inf of the cosine function are 1 and −1, respectively. The main properties of lim sup and lim inf that we use in this text are:

• If f(t) and g(t) are functions that satisfy f(t) ≤ g(t) for all t, then lim sup_{t→∞} f(t) ≤ lim sup_{t→∞} g(t). Likewise, lim inf_{t→∞} f(t) ≤ lim inf_{t→∞} g(t).
• For any function f(t), we have lim inf_{t→∞} f(t) ≤ lim sup_{t→∞} f(t), with equality if and only if the regular limit exists. Further, whenever the regular limit exists, we have lim inf_{t→∞} f(t) = lim sup_{t→∞} f(t) = lim_{t→∞} f(t).
• For any function f(t), we have lim sup_{t→∞} f(t) = −lim inf_{t→∞}[−f(t)] and lim inf_{t→∞} f(t) = −lim sup_{t→∞}[−f(t)].
• If f(t) and g(t) are functions such that lim_{t→∞} g(t) = g*, where g* is a finite constant, then lim sup_{t→∞}[g(t) + f(t)] = g* + lim sup_{t→∞} f(t).

Using (or not using) these limits does not impact any of the main ideas in this text, and readers who are not familiar with them can replace all instances of "lim sup" and "lim inf" with regular limits "lim," without loss of rigor, under the additional assumption that the regular limit exists.

• Jensen's Inequality (not needed until Chapter 5): Let X be a convex subset of R^M (possibly being the full space R^M itself), and let f(x) be a convex function over X. Let X be any random vector that takes values in X, and assume that E{X} is well defined and finite (where the expectation is taken entrywise). Then:

E{X} ∈ X  and  f(E{X}) ≤ E{f(X)}    (1.12)

• Opportunistically Minimizing an Expectation: Consider a game we play against nature, where nature generates a random variable ω with some (possibly unknown) probability distribution. We look at nature's choice of ω and then choose a control action α within some action set A_ω that possibly depends on ω. Let c(α, ω) represent a general cost function. Our goal is to design a (possibly randomized) policy for choosing α ∈ A_ω to minimize the expectation E{c(α, ω)}, where the expectation is taken with respect to the distribution of ω and the distribution of our action α that possibly depends on ω. Assume for simplicity that for any given outcome ω, there is at least one action α_ω^min that minimizes the function c(α, ω) over all α ∈ A_ω. Then, not surprisingly, the policy that minimizes E{c(α, ω)} is the one that observes ω and selects a minimizing action α_ω^min. This is easy to prove: If α_ω^* represents any random control action chosen in the set A_ω in response to the observed ω, we have:

c(α_ω^min, ω) ≤ c(α_ω^*, ω)

This is an inequality relationship concerning the random variables ω, α_ω^min, α_ω^*. Taking expectations yields E{c(α_ω^min, ω)} ≤ E{c(α_ω^*, ω)}, showing that the expectation under the policy α_ω^min is less than or equal to the expectation under any other policy. This is useful for designing drift minimizing algorithms.
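The observe-then-minimize principle above is simple enough to check numerically. The sketch below uses a made-up quadratic cost c(α, ω) and a small action set (both are our own illustrative assumptions, not from the text), and compares the opportunistic policy against a policy that ignores ω:

```python
def c(alpha, omega):
    # hypothetical cost function, for illustration only
    return (alpha - omega) ** 2

def best_action(omega, actions):
    # observe nature's omega first, then pick the minimizing action in A_omega
    return min(actions, key=lambda a: c(a, omega))

def expected_cost(policy, omegas, actions):
    # expectation over a uniform distribution for omega (an assumption here)
    return sum(c(policy(w, actions), w) for w in omegas) / len(omegas)

omegas = [0, 1, 2, 3]
actions = [0, 2]
opt_cost = expected_cost(best_action, omegas, actions)
fixed_cost = expected_cost(lambda w, acts: acts[0], omegas, actions)
# opt_cost <= fixed_cost, exactly as the expectation argument above guarantees
```

Any other policy, randomized or not, can only match or exceed `opt_cost`, since the comparison holds pointwise in ω before the expectation is taken.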
CHAPTER 2

Introduction to Queues

Let Q(t) represent the contents of a single-server discrete time queueing system defined over integer time slots t ∈ {0, 1, 2, . . .}. Specifically, the initial state Q(0) is assumed to be a nonnegative real valued random variable. Future states are driven by stochastic arrival and server processes a(t) and b(t) according to the following dynamic equation:

Q(t + 1) = max[Q(t) − b(t), 0] + a(t)  for t ∈ {0, 1, 2, . . .}    (2.1)

We call Q(t) the backlog on slot t, as it can represent an amount of work that needs to be done. The stochastic processes {a(t)}_{t=0}^∞ and {b(t)}_{t=0}^∞ are sequences of real valued random variables defined over slots t ∈ {0, 1, 2, . . .}. The value of a(t) represents the amount of new work that arrives on slot t, where a(t) can be a sum of exogenous and endogenous arrivals, and it is assumed to be nonnegative. Leaving a(t) outside the max[·, 0] operator is crucial for treatment of multihop networks. The value of b(t) represents the amount of work the server of the queue can process on slot t. For most physical queueing systems, b(t) is assumed to be nonnegative, although it is sometimes convenient to allow b(t) to take negative values. This is useful for the virtual queues defined in future sections, where b(t) can be interpreted as a (possibly negative) attribute.¹ Because we assume Q(0) ≥ 0 and a(t) ≥ 0 for all slots t, it is clear from (2.1) that Q(t) ≥ 0 for all slots t.

The units of Q(t), a(t), and b(t) depend on the context of the system. For example, in a communication system with fixed size data units, these quantities might be integers with units of packets. Alternatively, they might be real numbers with units of bits, kilobits, or some other unit of unfinished work relevant to the system.

We can equivalently rewrite the dynamics (2.1) without the nonlinear max[·, 0] operator as follows:

Q(t + 1) = Q(t) − b̃(t) + a(t)  for t ∈ {0, 1, 2, . . .}    (2.2)

where b̃(t) is the actual work processed on slot t (which may be less than the offered amount b(t) if there is little or no backlog in the system on slot t). Specifically, b̃(t) is mathematically defined:

b̃(t) = min[b(t), Q(t)]

¹ Assuming that the b(t) value in (2.1) is possibly negative also allows treatment of modified queueing models that place new arrivals inside the max[·, 0] operator. For example, a queue with dynamics Q̂(t + 1) = max[Q̂(t) − β(t) + α(t), 0] is the same as (2.1) with a(t) = 0 and b(t) = β(t) − α(t) for all t.
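As a quick sanity check, the two forms (2.1) and (2.2) of the dynamics can be coded side by side. The helper below is a minimal sketch (the function names are ours, not the text's):

```python
def queue_update(Q, a, b):
    # form (2.1): Q(t+1) = max[Q(t) - b(t), 0] + a(t)
    return max(Q - b, 0) + a

def queue_update_via_served(Q, a, b):
    # equivalent form (2.2): Q(t+1) = Q(t) - b~(t) + a(t),
    # where b~(t) = min[b(t), Q(t)] is the work actually processed
    b_served = min(b, Q)
    return Q - b_served + a

# the two forms agree slot by slot, including when b is negative
for Q, a, b in [(5, 2, 3), (1, 0, 4), (0, 2, 1), (1, 0, -2)]:
    assert queue_update(Q, a, b) == queue_update_via_served(Q, a, b)
```

The agreement for negative b in the last test case is why (2.2) remains valid for the virtual queues introduced later.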
Note by definition that b̃(t) ≤ b(t) for all t. The dynamic equation (2.2) yields a simple but important property for all sample paths, described in the following lemma.
Lemma 2.1 (Sample Path Property) For any discrete time queueing system described by (2.1), and for any two slots t1 and t2 such that 0 ≤ t1 < t2, we have:

Q(t2) − Q(t1) = Σ_{τ=t1}^{t2−1} a(τ) − Σ_{τ=t1}^{t2−1} b̃(τ)    (2.3)

Therefore, for any t > 0, we have:

Q(t)/t − Q(0)/t = (1/t) Σ_{τ=0}^{t−1} a(τ) − (1/t) Σ_{τ=0}^{t−1} b̃(τ)    (2.4)
Q(t)/t − Q(0)/t ≥ (1/t) Σ_{τ=0}^{t−1} a(τ) − (1/t) Σ_{τ=0}^{t−1} b(τ)    (2.5)

Proof. By (2.2), we have for any slot τ ≥ 0:

Q(τ + 1) − Q(τ) = a(τ) − b̃(τ)

Summing the above over τ ∈ {t1, . . . , t2 − 1} and using the law of telescoping sums yields:

Q(t2) − Q(t1) = Σ_{τ=t1}^{t2−1} a(τ) − Σ_{τ=t1}^{t2−1} b̃(τ)

This proves (2.3). Equality (2.4) follows by substituting t1 = 0, t2 = t, and dividing by t. Inequality (2.5) follows because b̃(τ) ≤ b(τ) for all τ. □

An important application of Lemma 2.1 to power-aware systems is treated in Exercise 2.11. The equality (2.4) is illuminating. It shows that lim_{t→∞} Q(t)/t = 0 if and only if the time average of the process a(t) − b̃(t) is zero (where the time average of a(t) − b̃(t) is the limit of the right-hand-side of (2.4)). This happens when the time average rate of arrivals a(t) is equal to the time average rate of actual departures b̃(t). This motivates the definitions of rate stability and mean rate stability, defined in the next section.
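Identity (2.3) is easy to confirm on a random sample path. The short simulation below (our own illustration, with arbitrary arrival and service distributions) checks the telescoped sum against the backlog difference:

```python
import random

random.seed(1)

Q = [0]                       # Q[t] = backlog at slot t, with Q(0) = 0
arrivals, served = [], []
for t in range(100):
    a = random.randint(0, 3)  # a(t): arrivals this slot
    b = random.randint(0, 2)  # b(t): offered service this slot
    served.append(min(b, Q[-1]))      # b~(t) = min[b(t), Q(t)]
    Q.append(max(Q[-1] - b, 0) + a)   # dynamics (2.1)
    arrivals.append(a)

t1, t2 = 10, 90
lhs = Q[t2] - Q[t1]
rhs = sum(arrivals[t1:t2]) - sum(served[t1:t2])
assert lhs == rhs             # equation (2.3) holds on every sample path
```

Note that the check uses the actual departures b̃(τ), not the offered service b(τ); substituting b(τ) would only give the inequality (2.5).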
2.1 RATE STABILITY
Let Q(t) be a real valued stochastic process that evolves in discrete time over slots t ∈ {0, 1, 2, . . .} according to some probability law.
Definition 2.2 A discrete time process Q(t) is rate stable if:

lim_{t→∞} Q(t)/t = 0  with probability 1
Definition 2.3 A discrete time process Q(t) is mean rate stable if:

lim_{t→∞} E{|Q(t)|}/t = 0

We use an absolute value of Q(t) in the mean rate stability definition, even though our queue in (2.1) is nonnegative, because later it will be useful to define mean rate stability for virtual queues that can be possibly negative.
Theorem 2.4 (Rate Stability Theorem) Suppose Q(t) evolves according to (2.1), with a(t) ≥ 0 for all t, and with b(t) real valued (and possibly negative) for all t. Suppose that the time averages of the processes a(t) and b(t) converge with probability 1 to finite constants a^av and b^av, so that:

lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} a(τ) = a^av  with probability 1    (2.6)
lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} b(τ) = b^av  with probability 1    (2.7)

Then:
(a) Q(t) is rate stable if and only if a^av ≤ b^av.
(b) If a^av > b^av, then:

lim_{t→∞} Q(t)/t = a^av − b^av  with probability 1

(c) Suppose there are finite constants ε > 0 and C > 0 such that E{[a(t) + b⁻(t)]^{1+ε}} ≤ C for all t, where b⁻(t) = −min[b(t), 0]. Then Q(t) is mean rate stable if and only if a^av ≤ b^av.
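A small simulation illustrates parts (a) and (b) of the theorem. With Bernoulli arrival and service processes (the rates below are chosen purely for illustration), Q(t)/t approaches 0 when a^av ≤ b^av and approaches a^av − b^av otherwise:

```python
import random

random.seed(0)

def final_slope(lam, mu, T=200_000):
    # Bernoulli(lam) arrivals, Bernoulli(mu) offered service, Q(0) = 0
    Q = 0
    for _ in range(T):
        a = 1 if random.random() < lam else 0
        b = 1 if random.random() < mu else 0
        Q = max(Q - b, 0) + a     # dynamics (2.1)
    return Q / T

slope_stable = final_slope(0.3, 0.6)    # a_av < b_av: slope should be near 0
slope_unstable = final_slope(0.9, 0.4)  # a_av > b_av: slope near 0.9 - 0.4 = 0.5
```

Here the time averages a^av and b^av exist by the law of large numbers, so the theorem applies directly; the overloaded queue grows linearly at rate a^av − b^av.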
Proof. Here we prove only the necessary condition of part (a). Suppose that Q(t) is rate stable, so that Q(t)/t → 0 with probability 1. Because (2.5) holds for all slots t > 0, we can take limits in (2.5) as t → ∞ and use (2.6)-(2.7) to conclude that 0 ≥ a^av − b^av. Thus, a^av ≤ b^av is necessary for rate stability. The proof of sufficiency in part (a) and the proof of part (b) are developed in Exercises 2.3 and 2.4. The proof of part (c) is more complex and is omitted (see (136)). □

The following theorem presents a more general necessary condition for rate stability that does not require the arrival and server processes to have well defined limits.
Theorem 2.5 (Necessary Condition for Rate Stability) Suppose Q(t) evolves according to (2.1), with any general processes a(t) and b(t) such that a(t) ≥ 0 for all t. Then:
(a) If Q(t) is rate stable, then:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} [a(τ) − b(τ)] ≤ 0  with probability 1    (2.8)

(b) If Q(t) is mean rate stable and if E{Q(0)} < ∞, then:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{a(τ) − b(τ)} ≤ 0    (2.9)

Proof. The proof of (a) follows immediately by taking a lim sup of both sides of (2.5) and noting that Q(t)/t → 0 because Q(t) is rate stable. The proof of (b) follows by first taking an expectation of (2.5) and then taking limits. □
2.2 STRONGER FORMS OF STABILITY
Rate stability and mean rate stability only describe the long term average rate of arrivals and departures from the queue, and do not say anything about the fraction of time the queue backlog exceeds a certain value, or about the time average expected backlog. The stronger stability deﬁnitions given below are thus useful.
Definition 2.6 A discrete time process Q(t) is steady state stable if:

lim_{M→∞} g(M) = 0

where for each M ≥ 0, g(M) is defined:

g(M) = lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Pr[Q(τ) > M]    (2.10)
Definition 2.7 A discrete time process Q(t) is strongly stable if:

lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{Q(τ)} < ∞    (2.11)
Under mild boundedness assumptions, strong stability implies all of the other forms of stability, as speciﬁed in Theorem 2.8 below.
Theorem 2.8 (Strong Stability Theorem) Suppose Q(t) evolves according to (2.1) for some general stochastic processes {a(t)}_{t=0}^∞ and {b(t)}_{t=0}^∞, where a(t) ≥ 0 for all t, and b(t) is real valued for all t. Suppose Q(t) is strongly stable. Then:
(a) Q(t) is steady state stable.
(b) If there is a finite constant C such that either a(t) + b⁻(t) ≤ C with probability 1 for all t (where b⁻(t) = −min[b(t), 0]), or b(t) − a(t) ≤ C with probability 1 for all t, then Q(t) is rate stable, so that Q(t)/t → 0 with probability 1.
(c) If there is a finite constant C such that either E{a(t) + b⁻(t)} ≤ C for all t, or E{b(t) − a(t)} ≤ C for all t, then Q(t) is mean rate stable.

Proof. Part (a) is given in Exercise 2.5. Parts (b) and (c) are omitted (see (136)). □
Readers familiar with discrete time Markov chains (DTMCs) may be interested in the following connection: For processes Q(t) deﬁned over an ergodic DTMC with a ﬁnite or countably inﬁnite state space and with the property that, for each real value M, the event {Q(t) ≤ M} corresponds to only a ﬁnite number of states, steady state stability implies the existence of a steady state distribution, and strong stability implies ﬁnite average backlog and (by Little’s theorem (129)) ﬁnite average delay.
2.3 RANDOMIZED SCHEDULING FOR RATE STABILITY
The Rate Stability Theorem (Theorem 2.4) suggests the following simple method for stabilizing a multi-queue network: Make scheduling decisions so that the time average service and arrival rates are well defined and satisfy a_i^av ≤ b_i^av for each queue i. This method typically requires perfect knowledge of the arrival and channel probabilities so that the desired time averages can be achieved. Some representative examples are provided below. A better method that does not require a-priori statistical knowledge is developed in Chapters 3 and 4.
2.3.1 A 3-QUEUE, 2-SERVER EXAMPLE

[Figure 2.1: A 3-queue, 2-server system, with arrival processes a1(t), a2(t), a3(t) feeding queues Q1(t), Q2(t), Q3(t).]

Example Problem: Consider the 3-queue, 2-server system of Fig. 2.1. Every slot the network controller decides which 2 queues receive servers. A single queue cannot receive 2 servers on the same slot. All packets have fixed length, and a queue that is allocated a server on a given slot can serve exactly one packet on that slot. The service is given for i ∈ {1, 2, 3} by:

b_i(t) = 1 if a server is connected to queue i on slot t, and 0 otherwise

Assume the arrival processes have well defined time average rates (a1^av, a2^av, a3^av), in units of packets/slot. Design a server allocation algorithm to make all queues rate stable when arrival rates are given as follows:
a) (a1^av, a2^av, a3^av) = (0.5, 0.5, 0.9)
b) (a1^av, a2^av, a3^av) = (2/3, 2/3, 2/3)
c) (a1^av, a2^av, a3^av) = (0.7, 0.65, 0.65)
d) (a1^av, a2^av, a3^av) = (0.5, 0.9, 0.4)
e) Use (2.5) to prove that the constraints 0 ≤ a_i^av ≤ 1 for all i ∈ {1, 2, 3}, and a1^av + a2^av + a3^av ≤ 2, are necessary for the existence of a rate stabilizing algorithm.

Solution:
a) Choose the service vector (b1(t), b2(t), b3(t)) i.i.d. over slots, choosing (0, 1, 1) with probability 1/2 and (1, 0, 1) with probability 1/2. Then {b1(t)}_{t=0}^∞ is i.i.d. over slots with b1^av = 0.5, b2^av = 0.5, and b3^av = 1 by the law of large numbers. Then clearly a_i^av ≤ b_i^av for all i ∈ {1, 2, 3}, and so by the Rate Stability Theorem all queues are rate stable. While this is a randomized scheduling algorithm, one could also design a deterministic algorithm, such as one that alternates between (0, 1, 1) (on odd slots) and (1, 0, 1) (on even slots).
b) Choose (b1(t), b2(t), b3(t)) to be independent and identically distributed (i.i.d.) every slot, equally likely over the three options (0, 1, 1), (1, 0, 1), (1, 1, 0). Then b_i^av = 2/3 = a_i^av for all i ∈ {1, 2, 3}, and so the Rate Stability Theorem ensures all queues are rate stable.
c) Every slot, independently choose the service vector (0, 1, 1) with probability p1, (1, 0, 1) with probability p2, and (1, 1, 0) with probability p3, so that p1, p2, p3 satisfy:

p1(0, 1, 1) + p2(1, 0, 1) + p3(1, 1, 0) ≥ (0.7, 0.65, 0.65)    (2.12)
p1 + p2 + p3 = 1    (2.13)
p_i ≥ 0  ∀i ∈ {1, 2, 3}    (2.14)

where the inequality (2.12) is taken entrywise. This is an example of a linear program. Linear programs are typically difficult to solve by hand, but this one can be solved easily by guessing that the constraint in (2.12) can be solved with equality. One simplifying trick is to replace the above inequality constraint with the following equality constraint:

p1(0, 1, 1) + p2(1, 0, 1) + p3(1, 1, 0) = (0.7, 0.65, 0.65)

This can be solved by hand by trial-and-error. One can verify the following (unique) solution: p1 = 0.3, p2 = 0.35, p3 = 0.35. Then b1^av = p2 + p3 = 0.7, b2^av = p1 + p3 = 0.65, b3^av = p1 + p2 = 0.65, and so all queues are rate stable by the Rate Stability Theorem. It is an interesting exercise to design an alternative deterministic algorithm that uses a periodic schedule to produce the same time averages.
d) Use the same linear program (2.12)-(2.14), but replace the constraint (2.12) with the following:

p1(0, 1, 1) + p2(1, 0, 1) + p3(1, 1, 0) = (0.7, 0.9, 0.4) ≥ (0.5, 0.9, 0.4)

Then we can use p1 = 0.3, p2 = 0.1, p3 = 0.6.
e) Consider any algorithm that makes all queues rate stable, and let b_i(t) be the queue-i decision made by the algorithm on slot t. For each queue i, we have for all t > 0:

Q_i(t)/t − Q_i(0)/t ≥ (1/t) Σ_{τ=0}^{t−1} a_i(τ) − (1/t) Σ_{τ=0}^{t−1} b_i(τ)
                    ≥ (1/t) Σ_{τ=0}^{t−1} a_i(τ) − 1

where the first inequality follows by (2.5) and the final inequality holds because b_i(τ) ≤ 1 for all τ. The above holds for all t > 0. Taking a limit as t → ∞ and using the fact that queue i is rate stable yields, with probability 1:

0 ≥ a_i^av − 1
and so we find that, for each i ∈ {1, 2, 3}, the condition a_i^av ≤ 1 is necessary for the existence of an algorithm that makes all queues rate stable. Similarly, we have:

[Q1(t) + Q2(t) + Q3(t)]/t − [Q1(0) + Q2(0) + Q3(0)]/t
  ≥ (1/t) Σ_{τ=0}^{t−1} [a1(τ) + a2(τ) + a3(τ)] − (1/t) Σ_{τ=0}^{t−1} [b1(τ) + b2(τ) + b3(τ)]
  ≥ (1/t) Σ_{τ=0}^{t−1} [a1(τ) + a2(τ) + a3(τ)] − 2

where the final inequality holds because b1(τ) + b2(τ) + b3(τ) ≤ 2 for all τ. Taking limits shows that 0 ≥ a1^av + a2^av + a3^av − 2 is also a necessary condition.

Discussion: Define Λ as the set of all rate vectors (a1^av, a2^av, a3^av) that satisfy the constraints in part (e) of the above example problem. We know from part (e) that (a1^av, a2^av, a3^av) ∈ Λ is a necessary condition for existence of an algorithm that makes all queues rate stable. Further, it can be shown that for any vector (a1^av, a2^av, a3^av) ∈ Λ, there exist probabilities p1, p2, p3 that solve the following linear program:

p1(0, 1, 1) + p2(1, 0, 1) + p3(1, 1, 0) ≥ (a1^av, a2^av, a3^av)
p1 + p2 + p3 = 1
p_i ≥ 0  ∀i ∈ {1, 2, 3}

Showing this is not trivial and is left as an advanced exercise. However, this fact, together with the Rate Stability Theorem, shows that it is possible to design an algorithm to make all queues rate stable whenever (a1^av, a2^av, a3^av) ∈ Λ. That is, (a1^av, a2^av, a3^av) ∈ Λ is necessary and sufficient for the existence of an algorithm that makes all queues rate stable. The set Λ is called the capacity region for the network. Exercises 2.7 and 2.8 provide additional practice questions about scheduling and delay in this system.
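The "equality trick" used above can be mechanized. The sketch below is our own helper (not from the text): it pads a feasible rate vector up to total rate 2 while keeping each entry at most 1, then reads off p_i = 1 − a_i' as the probability of using the service vector that leaves queue i unserved:

```python
def server_probabilities(rates, tol=1e-9):
    # rates = (a1, a2, a3) with each a_i in [0, 1] and a1 + a2 + a3 <= 2
    assert all(-tol <= r <= 1 + tol for r in rates), "each rate must be in [0, 1]"
    assert sum(rates) <= 2 + tol, "sum of rates must be at most 2"
    padded = list(rates)
    deficit = max(0.0, 2 - sum(padded))
    for i in range(3):                  # absorb the slack, keeping entries <= 1
        add = min(deficit, 1 - padded[i])
        padded[i] += add
        deficit -= add
    # p1, p2, p3 select service vectors (0,1,1), (1,0,1), (1,1,0) respectively
    return [1 - x for x in padded]

p = server_probabilities((0.7, 0.65, 0.65))   # approximately (0.3, 0.35, 0.35)
```

The padding step works because any vector in the capacity region can be raised to one with total rate exactly 2, at which point the equality system is solved by p_i = 1 − a_i'.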
2.3.2 A 2-QUEUE OPPORTUNISTIC SCHEDULING EXAMPLE
Example Problem: Consider a 2-queue wireless downlink that operates in discrete time (Fig. 2.2a). All data consists of fixed length packets. The arrival process (a1(t), a2(t)) represents the (integer) number of packets that arrive to each queue on slot t. There are two wireless channels, and packets in queue i must be transmitted over channel i, for i ∈ {1, 2}. At the beginning of each slot, the network controller observes the channel state vector S(t) = (S1(t), S2(t)), where Si(t) ∈ {ON, OFF}, so that there are four possible channel state vectors. The controller can transmit at most one packet per slot, and it can only transmit a packet over a channel that is ON. Thus, for each channel i ∈ {1, 2}, we have:

b_i(t) = 1 if Si(t) = ON and channel i is chosen for transmission on slot t, and 0 otherwise
[Fig. 2.2(a) shows arrivals a1(t), a2(t) entering queues Q1(t), Q2(t) with channel states S1(t), S2(t). Fig. 2.2(b) shows the capacity region with corner points (0, 0), (0, 0.4), (0.36, 0.4), (0.6, 0.16), and (0.6, 0).]
Figure 2.2: (a) The 2-queue, 1-server opportunistic scheduling system with ON/OFF channels. (b) The capacity region Λ for the specific channel probabilities given below.
If S(t) = (OFF, OFF), then b1(t) = b2(t) = 0. If exactly one channel is ON, then clearly the controller should choose to transmit over that channel. The only decision is which channel to use when S(t) = (ON, ON). Suppose that (a1(t), a2(t)) is i.i.d. over slots with E{a1(t)} = λ1 and E{a2(t)} = λ2. Suppose that S(t) is i.i.d. over slots with Pr[(OFF, OFF)] = p00, Pr[(OFF, ON)] = p01, Pr[(ON, OFF)] = p10, Pr[(ON, ON)] = p11.
a) Define Λ as the set of all vectors (λ1, λ2) that satisfy the constraints 0 ≤ λ1 ≤ p10 + p11, 0 ≤ λ2 ≤ p01 + p11, λ1 + λ2 ≤ p01 + p10 + p11. Show that (λ1, λ2) ∈ Λ is necessary for the existence of a rate stabilizing algorithm.
b) Plot the 2-dimensional region Λ for the special case when p00 = 0.24, p10 = 0.36, p01 = 0.16, p11 = 0.24.
c) For the system of part (b): Use a randomized algorithm that independently transmits over channel 1 with probability β whenever S(t) = (ON, ON). Choose β to make both queues rate stable when (λ1, λ2) = (0.6, 0.16).
d) For the system of part (b): Choose β to make both queues rate stable when (λ1, λ2) = (0.5, 0.26).
Solution:
a) Let b1(t), b2(t) be the decisions made by a particular algorithm that makes both queues rate stable. From (2.5), we have for queue 1 and for all slots t > 0:

Q1(t)/t − Q1(0)/t ≥ (1/t) Σ_{τ=0}^{t−1} a1(τ) − (1/t) Σ_{τ=0}^{t−1} b1(τ)

Because b1(τ) ≤ 1_{S1(τ)=ON}, where the latter is an indicator function that is 1 if S1(τ) = ON, and 0 else, we have:

Q1(t)/t − Q1(0)/t ≥ (1/t) Σ_{τ=0}^{t−1} a1(τ) − (1/t) Σ_{τ=0}^{t−1} 1_{S1(τ)=ON}    (2.15)
However, we know that Q1(t)/t → 0 with probability 1. Further, by the law of large numbers, we have (with probability 1):

lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} a1(τ) = λ1 ,  lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} 1_{S1(τ)=ON} = p10 + p11

Thus, taking a limit as t → ∞ in (2.15) yields: 0 ≥ λ1 − (p10 + p11), and hence λ1 ≤ p10 + p11 is a necessary condition for any rate stabilizing algorithm. A similar argument shows that λ2 ≤ p01 + p11 is a necessary condition. Finally, note that for all t > 0:

[Q1(t) + Q2(t)]/t − [Q1(0) + Q2(0)]/t ≥ (1/t) Σ_{τ=0}^{t−1} [a1(τ) + a2(τ)] − (1/t) Σ_{τ=0}^{t−1} 1_{{S1(τ)=ON} ∪ {S2(τ)=ON}}

Taking a limit of the above proves that λ1 + λ2 ≤ p01 + p10 + p11 is necessary.
b) See Fig. 2.2b.
c) If S(t) = (OFF, OFF), then don't transmit. If S(t) = (ON, OFF) or (ON, ON), then transmit over channel 1. If S(t) = (OFF, ON), then transmit over channel 2. Then by the law of large numbers, we have b1^av = p10 + p11 = 0.6 and b2^av = p01 = 0.16, and so both queues are rate stable (by the Rate Stability Theorem).
d) Choose β = 0.14/0.24. Then b1^av = 0.36 + 0.24β = 0.5, and b2^av = 0.16 + 0.24(1 − β) = 0.26.
Discussion: Exercise 2.9 treats scheduling and delay issues in this system. It can be shown that the set Λ given in part (a) above is the capacity region, so that (λ1, λ2) ∈ Λ is necessary and sufficient for the existence of a rate stabilizing policy. See (8) for the derivation of the capacity region for ON/OFF opportunistic scheduling systems with K queues (with K ≥ 2). See also (8) for optimal delay scheduling in symmetric systems of this type (where all arrival rates are the same, as are all ON/OFF probabilities), and (101)(100) for "order-optimal" delay in general (possibly asymmetric) situations.
It is possible to support any point in Λ using a stationary randomized policy that makes a scheduling decision as a random function of the observed channel state S(t). Such policies are called S-only policies. The solutions given in parts (c) and (d) above use S-only policies. Further, the randomized server allocation policies considered in the 3-queue, 2-server example of Section 2.3.1 can be viewed as "degenerate" S-only policies because, in that case, there is only one "channel state" (i.e., (ON, ON, ON)). It is known that the capacity region of general single-hop and multihop networks with time varying channels S(t) can be described in terms of S-only policies (15)(22) (see also Theorem 4.5 of Chapter 4 for a related result for more general systems).
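The randomized β policy of parts (c) and (d) is easy to simulate. The sketch below draws the joint channel state from the probabilities of part (b) and estimates the resulting service rates (the simulation structure and parameters are our own illustration):

```python
import random

random.seed(2)

def draw_state():
    # joint channel state with p00=0.24, p10=0.36, p01=0.16, p11=0.24
    u = random.random()
    if u < 0.24:
        return (0, 0)
    if u < 0.60:
        return (1, 0)
    if u < 0.76:
        return (0, 1)
    return (1, 1)

def service_rates(beta, T=200_000):
    # transmit over channel 1 with probability beta when S(t) = (ON, ON)
    n1 = n2 = 0
    for _ in range(T):
        s1, s2 = draw_state()
        if s1 and s2:
            if random.random() < beta:
                n1 += 1
            else:
                n2 += 1
        elif s1:
            n1 += 1
        elif s2:
            n2 += 1
    return n1 / T, n2 / T

b1_av, b2_av = service_rates(0.14 / 0.24)
# expect b1_av near 0.36 + 0.24*beta = 0.5 and b2_av near 0.16 + 0.24*(1-beta) = 0.26
```

The empirical rates match the law-of-large-numbers calculation in part (d), confirming that the chosen β supports (λ1, λ2) = (0.5, 0.26).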
Note that S-only policies do not consider queue backlog information, and thus they may serve a queue that is empty, which is clearly inefficient. Thus, one might wonder how S-only policies can
stabilize queueing networks whenever trafﬁc rates are inside the capacity region. Intuitively, the reason is that inefﬁciency only arises when a queue becomes empty, a rare event when trafﬁc rates are near the boundary of the capacity region.2 Thus, using queue backlog information cannot “enlarge” the region of supportable rates. However, Chapter 3 shows that queue backlogs are extremely useful for designing dynamic algorithms that do not require apriori knowledge of channel statistics or apriori computation of a randomized policy with speciﬁc time averages.
2.4 EXERCISES

Exercise 2.1. (Queue Sample Path) Fill in the missing entries of the table in Fig. 2.3 for a queue Q(t) that satisfies (2.1).
t                   0   1   2   3   4   5   6   7   8   9   10
Arrivals a(t)       3   3   0   2   1   0   0   2   0   0   0
Current Rate b(t)   4   2   1   3   3   2   2   4   ·   2   1
Backlog Q(t)        0   3   4   3   2   1   ·   ·   ·   2   ·
Transmitted b̃(t)   0   2   1   ·   ·   ·   ·   ·   ·   ·   ·

Figure 2.3: An example sample path for the queueing system of Exercise 2.1 (entries marked "·" are missing and are to be filled in).
Exercise 2.2. (Inequality comparison) Let Q(t) satisfy (2.1) with server process b(t) and arrival process a(t). Let Q̃(t) be another queueing system with the same server process b(t) but with an arrival process ã(t) = a(t) + z(t), where z(t) ≥ 0 for all t ∈ {0, 1, 2, . . .}. Assuming that Q̃(0) = Q(0), prove that Q(t) ≤ Q̃(t) for all t ∈ {0, 1, 2, . . .}.
Exercise 2.3. (Proving sufficiency for Theorem 2.4a) Let Q(t) satisfy (2.1) with arrival and server processes with well defined time averages a^av and b^av. Suppose that a^av ≤ b^av. Fix ε > 0, and define Q_ε(t) as a queue with Q_ε(0) = Q(0), and with the same server process b(t) but with an arrival process ã(t) = a(t) + (b^av − a^av) + ε for all t.
a) Compute the time average of ã(t).
b) Assuming the result of Theorem 2.4b, compute lim_{t→∞} Q_ε(t)/t.
c) Use the result of part (b) and Exercise 2.2 to prove that Q(t) is rate stable. Hint: I am thinking of a nonnegative number x. My number has the property that x ≤ ε for all ε > 0. What is my number?
² For example, in the GI/B/1 queue of Exercise 2.6, it can be shown by Little's Theorem (129) that the fraction of time the queue is empty is 1 − λ/μ (assuming λ ≤ μ), which goes to zero when λ → μ.
Exercise 2.4. (Proof of Theorem 2.4b) Let Q(t) be a queue that satisfies (2.1). Assume that the time averages of a(t) and b(t) are given by finite constants a^av and b^av.
a) Use the following equation to prove that lim_{t→∞} a(t)/t = 0 with probability 1:

(1/(t+1)) Σ_{τ=0}^{t} a(τ) = (t/(t+1)) (1/t) Σ_{τ=0}^{t−1} a(τ) + (1/(t+1)) a(t)

b) Suppose that b̃(t_i) < b(t_i) for some slot t_i (where we recall that b̃(t_i) = min[b(t_i), Q(t_i)]). Use (2.1) to compute Q(t_i + 1).
c) Use part (b) and (2.5) to show that if b̃(t_i) < b(t_i), then:

a(t_i) ≥ Q(0) + Σ_{τ=0}^{t_i} [a(τ) − b(τ)]

Conclude that if b̃(t_i) < b(t_i) for an infinite number of slots t_i, then a^av ≤ b^av.
d) Use part (c) to conclude that if a^av > b^av, then there is some slot t* ≥ 0 such that for all t > t*:

Q(t) = Q(t*) + Σ_{τ=t*}^{t−1} [a(τ) − b(τ)]

Use this to prove the result of Theorem 2.4b.

Exercise 2.5. (Strong stability implies steady state stability) Prove that strong stability implies steady state stability using the fact that E{Q(τ)} ≥ M Pr[Q(τ) > M].

Exercise 2.6. (Discrete time GI/B/1 queue) Consider a queue Q(t) with dynamics (2.1). Assume that a(t) is i.i.d. over slots with nonnegative integer values, with E{a(t)} = λ and E{a(t)²} = E{a²}. Assume that b(t) is independent of the arrivals and is i.i.d. over slots with Pr[b(t) = 1] = μ, Pr[b(t) = 0] = 1 − μ. Suppose that λ < μ. Thus, Q(t) is always integer valued. Assume that there are finite values E{Q}, Q̄, Q^av, E{Q²} such that:

lim_{t→∞} E{Q(t)} = E{Q} ,  lim_{t→∞} E{Q(t)²} = E{Q²}
lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} Q(τ) = Q^av with probability 1 ,  lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{Q(τ)} = Q̄

Using ergodic Markov chain theory, it can be shown that Q̄ = Q^av = E{Q}. Here we want to compute E{Q}, using the magic of a quadratic.
a) Take expectations of equation (2.2) to find lim_{t→∞} E{b̃(t)}.
b) Explain why b̃(t)² = b̃(t) and Q(t)b̃(t) = Q(t)b(t).
c) Square equation (2.2) and use part (b) to prove:

Q(t + 1)² = Q(t)² + b̃(t) + a(t)² − 2Q(t)(b(t) − a(t)) − 2b̃(t)a(t)

d) Take expectations in (c) and let t → ∞ to conclude that:

E{Q} = [E{a²} + λ − 2λ²] / [2(μ − λ)]

We have used the fact that Q(t) is independent of b(t), even though it is not independent of b̃(t). This establishes the average backlog for an integer-based GI/B/1 queue (where "GI" means the arrivals are general and i.i.d., "B" means the service is i.i.d. Bernoulli, and "1" means there is a single server). By Little's Theorem (129), it follows that average delay (in units of slots) is W̄ = Q̄/λ. When the arrival process is Bernoulli, these formulas simplify to Q̄ = λ(1 − λ)/(μ − λ) and W̄ = (1 − λ)/(μ − λ), provided that λ < μ. Using reversible Markov chain theory (130)(66)(131), it can be shown that the steady state output process of a B/B/1 queue is also i.i.d. Bernoulli with rate λ (regardless of μ), which makes analysis of tandems of B/B/1 queues very easy.

Exercise 2.7. (Server Scheduling) Consider the 3-queue, 2-server system of Fig. 2.1. Assume the arrival vector (a1(t), a2(t), a3(t)) is i.i.d. over slots with E{ai(t)} = λi for i ∈ {1, 2, 3}. Design a randomized server allocation algorithm to make all queues rate stable when:
a) (λ1, λ2, λ3) = (0.5, 0.5, 0.6)
b) (λ1, λ2, λ3) = (3/4, 3/4, 1/2)
c) (λ1, λ2, λ3) = (0.5, 0.6, 0.9)
d) (λ1, λ2, λ3) = (0.7, 0.9, 0.4)
e) Give a deterministic algorithm that uses a periodic schedule to support the rates in part (b).
f) Give a deterministic algorithm that uses a periodic schedule to support the rates in part (c).

Exercise 2.8. (Delay for Server Scheduling) Consider the 3-queue, 2-server system of Fig. 2.1 that operates according to the randomized schedule of the solution given in part (d) of Section 2.3.1, so that p1 = 0.3, p2 = 0.1, p3 = 0.6. Suppose a1(t) is i.i.d. over slots, Bernoulli, with Pr[a1(t) = 0] = 0.65 and Pr[a1(t) = 1] = 0.35. Use the formula of Exercise 2.6 to compute the average backlog Q̄1 and average delay W̄1 in queue 1. (First, you must convince yourself that queue 1 is indeed a discrete time GI/B/1 queue.)

Exercise 2.9. (Delay for Opportunistic Scheduling) Consider the 2-queue wireless downlink with ON/OFF channels as described in the example of Section 2.3.2 (Fig. 2.2). The channel probabilities
are given as in that example: p00 = 0.24, p10 = 0.36, p01 = 0.16, p11 = 0.24. Suppose the arrival process a1(t) is i.i.d., Bernoulli with rate λ1 = 0.4. Suppose a2(t) is i.i.d., Bernoulli with rate λ2 = 0.3. Design a randomized algorithm, using parameter β as the probability that we transmit over channel 1 when S(t) = (ON, ON), that ensures the average delay satisfies W̄1 ≤ 25 slots and W̄2 ≤ 25 slots. You should use the delay formula in Exercise 2.6 (first convincing yourself that each queue is indeed a GI/B/1 queue) along with an educated guess for β and/or trial and error for β.

Exercise 2.10. (Simulation of a B/B/1 queue) Write a computer program to simulate a Bernoulli/Bernoulli/1 (B/B/1) queue. Specifically, {a(t)}_{t=0}^∞ is i.i.d. over slots with Pr[a(t) = 1] = λ, Pr[a(t) = 0] = 1 − λ, and {b(t)}_{t=0}^∞ is independent of the arrival process and is i.i.d. over slots with Pr[b(t) = 1] = μ, Pr[b(t) = 0] = 1 − μ. Assume that μ = 0.7, and that Q(0) = 0. For λ values of 0.5, 0.6, and 0.65, run the experiment over 10^6 slots, and give the empirical time average Q^av and the value of Q(t)/t for t = 10^6. Compare these to the exact value (given in Exercise 2.6) for t → ∞.

Exercise 2.11. (Virtual Queues) Suppose we have a system that operates in discrete time with slots t ∈ {0, 1, 2, . . .}. A controller makes decisions every slot t about how to operate the system, and these decisions incur power p(t). The controller wants to ensure the time average power expenditure is no more than 12.3 power units per slot. Define a virtual queue Z(t) with Z(0) = 0, and with update equation:

Z(t + 1) = max[Z(t) − 12.3, 0] + p(t)    (2.16)

The controller keeps the value of Z(t) as a state variable, and updates Z(t) at the end of each slot via (2.16) using the power p(t) that was spent on that slot.
a) Use Lemma 2.1 to prove that if Z(t) is rate stable, then:³

lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} p(τ) ≤ 12.3  with probability 1

b) Suppose there is a positive constant Zmax such that Z(t) ≤ Zmax for all t ∈ {0, 1, 2, . . .}. Use (2.3) to show that for any integer T > 0 and any interval of T slots defined by {t1, . . . , t1 + T − 1} (where t1 ≥ 0), we have:

Σ_{τ=t1}^{t1+T−1} p(τ) ≤ 12.3T + Zmax

This idea is used in (21) to ensure the total power used in a communication system over any interval is less than or equal to the desired per-slot average power constraint multiplied by the interval size, plus a constant allowable "power burst" Zmax. A variation of this technique is used in (137) to bound the worst-case number of collisions with a primary user in a cognitive radio network.

³ For simplicity, we have implicitly assumed the limit lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} p(τ) in Exercise 2.11(a) exists. More generally, the result holds when "lim" is replaced with "lim sup."
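Exercise 2.10 asks for exactly this kind of program. A minimal B/B/1 simulator might look as follows (the particular λ and μ below are illustrative choices, and the "exact" value comes from the formula derived in Exercise 2.6):

```python
import random

random.seed(3)

def bb1_time_average_backlog(lam, mu, T=300_000):
    # B/B/1 queue: Bernoulli(lam) arrivals, Bernoulli(mu) server, Q(0) = 0
    Q = 0
    running_sum = 0
    for _ in range(T):
        running_sum += Q                  # sample Q(t) before updating
        a = 1 if random.random() < lam else 0
        b = 1 if random.random() < mu else 0
        Q = max(Q - b, 0) + a             # dynamics (2.1)
    return running_sum / T

lam, mu = 0.5, 0.8
q_sim = bb1_time_average_backlog(lam, mu)
q_exact = lam * (1 - lam) / (mu - lam)    # Bernoulli case of Exercise 2.6
```

For these parameters q_exact = 0.8333..., and the empirical time average converges to it as the horizon grows, illustrating the agreement the exercise asks you to check.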
CHAPTER 3

Dynamic Scheduling Example

The dynamic scheduling algorithms developed in this text use powerful techniques of Lyapunov drift and Lyapunov optimization. To build intuition, this chapter introduces the main concepts for a simple 2-user wireless downlink example. First, the problem is formulated in terms of known arrival rates and channel state probabilities, similar to the example given in Section 2.3.2 of the previous chapter. However, rather than using a randomized scheduling algorithm that bases decisions only on the current channel states (as considered in the previous chapter), we use an alternative approach based on minimizing the drift of a Lyapunov function. The advantage is that the drift-minimizing approach uses both current channel states and current queue backlogs to stabilize the system, and it does not require a-priori knowledge of traffic rates or channel probabilities. This Lyapunov drift technique is extended at the end of the chapter to allow for joint stability and average power minimization.

3.1 SCHEDULING FOR STABILITY

Consider a slotted system with two queues, as shown in Fig. 3.1(a). The arrival vector (A1(t), A2(t)) is i.i.d. over slots, where A1(t) and A2(t) take integer units of packets. The arrival rates are given by λ1 = E{A1(t)} and λ2 = E{A2(t)}. The second moments E{A1²} = E{A1(t)²} and E{A2²} = E{A2(t)²} are assumed to be finite. The wireless channels are time varying, as shown in Fig. 3.1(a).

[Figure 3.1: (a) The 2-queue wireless downlink example with time-varying channels: arrivals A1(t), A2(t) feed queues Q1(t), Q2(t), with channel states S1(t) ∈ {0, 1} and S2(t) ∈ {0, 1, 2}. (b) The capacity region Λ, with an example rate vector λ (point Y) in its interior.]
“Transmit over channel 2”. provided that the scheduler decides to transmit over that channel. ∀t ∈ {0. A particular S only algorithm for this system is characterized by probabilities q1 (S1 .2 . P r[S2 (t) = 2] = 0. b2 (t)=b2 (α ∗ (t). S (t)). 2}). S2 ) (S1 . (0. DYNAMIC SCHEDULING EXAMPLE slot t we have a channel vector S (t) = (S1 (t).S2 )∈S = . S2 ) and q2 (S1 . The channel state processes S1 (t) and S2 (t) are independent of each other and are i. S2 ). .5 . Let α ∗ (t) represent the transmission ∗ ∗ ˆ ˆ decisions under a particular S only policy. “Idle”} where α(t) = “Idle” means that no transmission takes place on slot t.3 Every slot t the network controller observes the current channel state vector S (t) and chooses a single channel over which to transmit. (0. S (t)) as the resulting transmission rates offered by this policy on slot t.30 3.} (3. 2}. and deﬁne b1 (t)=b1 (α ∗ (t). S2 ]S2 q2 (S1 .d. deﬁned ˆ by a function bi (α(t). 2}). 2. (1. 2). S2 ) + q2 (S1 . The queueing dynamics are given by: Qi (t + 1) = max[Qi (t) − bi (t).1) where bi (t) represents the amount of service offered to channel i on slot t (for i ∈ {1. S2 ) is the probability of transmitting over channel i if S (t) = (S1 . over slots. 0] + Ai (t) ∀i ∈ {1.1 THE SONLY ALGORITHM AND Let S represent the set of the 6 possible outcomes for channel state vector S (t) in the above system: S ={(0. (1. (1. S (t))= Si (t) 0 if α(t) = “Transmit over channel i” otherwise max (3. 1). S2 ) ∈ S . P r[S1 (t) = 1] = 0. taking three possible values: α(t) ∈ {“Transmit over channel 1”. Let α(t) be the transmission decision on slot t.3 .1. S2 ) for all (S1 . S2 ]S1 q1 (S1 . where we use inequality to allow the possibility of transmitting over neither channel (useful for the power minimization problem considered later). We thus have for every slot t: ∗ E b1 (t) ∗ E b2 (t) = (S1 . These probabilities must satisfy q1 (S1 . 0). S2 (t)).i. stationary. . 
2)} Consider ﬁrst the class of S only scheduling algorithms that make independent.S2 )∈S P r[S1 . S2 ) P r[S1 . with: • P r[S1 (t) = 0] = 0. 0).7 • P r[S2 (t) = 0] = 0. S2 ) ∈ S . P r[S2 (t) = 1] = 0. 1). where qi (S1 . S2 ) ≤ 1 for all (S1 . and randomized transmission decisions every slot t based only on the observed S (t) (and hence independent of queue backlog). S (t)): ˆ bi (t) = bi (α(t). where Si (t) is a nonnegative integer that represents the number of packets that can be transmitted over channel i on slot t (for i ∈ {1. . 1.2) 3.
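The dynamics (3.1) and an S-only policy are straightforward to simulate. Below is a minimal Python sketch (the function names are our own, not from the text) that samples the channel states with the probabilities above and applies the queue update; the example policy shown is an arbitrary illustration and is not necessarily stabilizing for both queues:

```python
import random

def sample_channels(rng):
    """Draw (S1, S2) using the i.i.d. distributions assumed in this example."""
    S1 = 1 if rng.random() < 0.7 else 0
    r = rng.random()
    S2 = 2 if r < 0.3 else (1 if r < 0.8 else 0)  # Pr[2]=0.3, Pr[1]=0.5, Pr[0]=0.2
    return S1, S2

def queue_update(Q, b, A):
    """One step of (3.1): Q_i(t+1) = max[Q_i(t) - b_i(t), 0] + A_i(t)."""
    return [max(q - bi, 0) + ai for q, bi, ai in zip(Q, b, A)]

# A hypothetical S-only policy: serve channel 2 whenever S2 > 0, else channel 1.
# (Illustrative only: this starves queue 1 and need not keep it rate stable.)
rng = random.Random(0)
Q = [0, 0]
for t in range(1000):
    S1, S2 = sample_channels(rng)
    alpha = 2 if S2 > 0 else 1                      # chosen channel
    b = [S1 if alpha == 1 else 0, S2 if alpha == 2 else 0]
    A = [1 if rng.random() < 0.3 else 0,            # Bernoulli arrivals with
         1 if rng.random() < 0.7 else 0]            # assumed rates (0.3, 0.7)
    Q = queue_update(Q, b, A)
```

The arrival rates (0.3, 0.7) are the example values used later in this chapter.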
where we have used Pr[S1, S2] as shorthand notation for Pr[(S1(t), S2(t)) = (S1, S2)]. Note that the above expectations are over the random channel state vector S(t) and the random transmission decision made in reaction to this vector. Under this S-only algorithm, b1*(t) is i.i.d. over slots with mean E{b1*(t)}, and thus the time average of b1*(t) is equal to E{b1*(t)} with probability 1 (by the law of large numbers). It follows by the Rate Stability Theorem (Theorem 2.4) that queue 1 is rate stable if and only if λ1 ≤ E{b1*(t)}. Likewise, queue 2 is rate stable if and only if λ2 ≤ E{b2*(t)}. However, for finite delay, it is useful to design the transmission rates to be strictly larger than the arrival rates (see Exercises 2.6, 2.8, 2.9, 2.10). The following linear program seeks to design an S-only policy that maximizes the value of ε for which λ1 + ε ≤ E{b1*(t)} and λ2 + ε ≤ E{b2*(t)}:

  Maximize:    ε                                                                  (3.3)
  Subject to:  λ1 + ε ≤ Σ_{(S1,S2) in S} Pr[S1, S2] S1 q1(S1, S2)                 (3.4)
               λ2 + ε ≤ Σ_{(S1,S2) in S} Pr[S1, S2] S2 q2(S1, S2)                 (3.5)
               q1(S1, S2) + q2(S1, S2) ≤ 1   for all (S1, S2) in S                (3.6)
               q1(S1, S2) ≥ 0, q2(S1, S2) ≥ 0   for all (S1, S2) in S             (3.7)

There are 8 known parameters that appear as constants in the above linear program: λ1, λ2, and the six probabilities Pr[S1, S2]. There are 13 unknowns that act as variables to be optimized: ε, q1(S1, S2), q2(S1, S2).

Define λ = (λ1, λ2), and define εmax(λ) as the maximum value of ε in the above problem. It can be shown that the network capacity region Λ is the set of all non-negative rate vectors λ for which εmax(λ) ≥ 0. If the rate vector λ is interior to the capacity region Λ, then εmax(λ) > 0. In this simple example, it is possible to compute the capacity region Λ explicitly, and it is shown in Fig. 3.1(b). The figure also illustrates an example arrival rate vector (λ1, λ2) = (0.3, 0.7) (shown as point Y in the figure), for which we have εmax(0.3, 0.7) = 0.12. The value εmax(λ) represents a measure of the distance between the rate vector λ and the capacity region boundary. It follows that for any rate vector λ = (λ1, λ2) that is interior to the capacity region Λ, we have εmax(λ) > 0, and there exists an S-only algorithm that yields transmission variables (b1*(t), b2*(t)) that satisfy:

  E{b1*(t)} ≥ λ1 + εmax(λ)                                                        (3.9)
  E{b2*(t)} ≥ λ2 + εmax(λ)                                                        (3.10)

3.1.2 LYAPUNOV DRIFT FOR STABLE SCHEDULING

Rather than trying to solve the linear program of the preceding subsection (which would require a-priori knowledge of the arrival rates and channel probabilities specified in (3.3)-(3.7)), here we pursue queue stability via an algorithm that makes decisions based on both the current channel states and the current queue backlogs. Remarkably, the algorithm we present is not an S-only algorithm.
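As a sanity check, εmax(0.3, 0.7) can be computed numerically. The sketch below is our own reduction, not from the text: in every channel state where at most one channel is useful, it serves that channel with probability 1 (which can only help both constraints), and it grid-searches the transmit-splitting probabilities in the two genuine conflict states (1,1) and (1,2):

```python
# Joint channel state probabilities (S1, S2), as assumed in this example.
P = {(0, 0): 0.3 * 0.2, (0, 1): 0.3 * 0.5, (0, 2): 0.3 * 0.3,
     (1, 0): 0.7 * 0.2, (1, 1): 0.7 * 0.5, (1, 2): 0.7 * 0.3}
lam1, lam2 = 0.3, 0.7

def eps_for(a, b):
    # a = q1(1,1), b = q1(1,2); in non-conflict states, serve the only
    # channel with a nonzero rate with probability 1.
    Eb1 = P[(1, 0)] * 1 + P[(1, 1)] * 1 * a + P[(1, 2)] * 1 * b
    Eb2 = (P[(0, 1)] * 1 + P[(0, 2)] * 2
           + P[(1, 1)] * 1 * (1 - a) + P[(1, 2)] * 2 * (1 - b))
    return min(Eb1 - lam1, Eb2 - lam2)

N = 400
eps_max = max(eps_for(i / N, j / N) for i in range(N + 1) for j in range(N + 1))
print(round(eps_max, 3))  # -> 0.12
```

The maximum is achieved at q1(1,1) = 0.8, q1(1,2) = 0, matching the value εmax(0.3, 0.7) = 0.12 used in the text.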
However, the proof that it provides strong stability whenever the arrival rate vector is interior to the capacity region Λ will use the existence of the S-only algorithm that satisfies (3.9)-(3.10), without ever needing to solve for the 13 variables in (3.3)-(3.7).

Let Q(t) = (Q1(t), Q2(t)) be the vector of current queue backlogs, and define a Lyapunov function L(Q(t)) as follows:

  L(Q(t)) = (1/2)[Q1(t)^2 + Q2(t)^2]                                              (3.11)

This represents a scalar measure of queue congestion in the network, and it has the following properties:

• L(Q(t)) ≥ 0 for all backlog vectors Q(t) = (Q1(t), Q2(t)), with equality if and only if the network is empty on slot t.
• L(Q(t)) being "small" implies that both queue backlogs are "small." For example, if L(Q(t)) ≤ 32, then Q1(t)^2 + Q2(t)^2 ≤ 64, and thus we know that both Q1(t) ≤ 8 and Q2(t) ≤ 8.
• L(Q(t)) being "large" implies that at least one queue backlog is "large."

If there is a finite constant M such that L(Q(t)) ≤ M for all t, then clearly all queue backlogs are always bounded by sqrt(2M), and so all queues are trivially strongly stable. While we usually cannot guarantee that the Lyapunov function is deterministically bounded, it is intuitively clear that designing an algorithm to consistently push the queue backlog towards a region such that L(Q(t)) ≤ M (for some finite constant M) will help to control congestion and stabilize the queues.

One may wonder why we use a quadratic Lyapunov function, when another function, such as a linear function, would satisfy properties similar to those stated above. When computing the change in the Lyapunov function from one slot to the next, we will find that the quadratic has important dominant cross terms that include an inner product of queue backlogs and transmission rates. This is important for the same reason that it was important to use a quadratic function in the delay computation of Exercise 2.6, and readers seeking more intuition on the "magic" of the quadratic function are encouraged to review that exercise.

To understand how we can consistently push the Lyapunov function towards a low congestion region, we first use (3.1) to compute a bound on the change in the Lyapunov function from one slot to the next:

  L(Q(t+1)) − L(Q(t)) = (1/2) Σ_{i=1}^{2} [Q_i(t+1)^2 − Q_i(t)^2]
                      = (1/2) Σ_{i=1}^{2} [(max[Q_i(t) − b_i(t), 0] + A_i(t))^2 − Q_i(t)^2]
                      ≤ (1/2) Σ_{i=1}^{2} [A_i(t)^2 + b_i(t)^2] + Σ_{i=1}^{2} Q_i(t)[A_i(t) − b_i(t)]   (3.12)
where in the final inequality we have used the fact that for any Q ≥ 0, b ≥ 0, A ≥ 0, we have:

  (max[Q − b, 0] + A)^2 ≤ Q^2 + A^2 + b^2 + 2Q(A − b)

Now define Δ(Q(t)) as the conditional Lyapunov drift for slot t:

  Δ(Q(t)) = E{L(Q(t+1)) − L(Q(t)) | Q(t)}                                         (3.13)

where the expectation depends on the control policy, and is with respect to the random channel states and the (possibly random) control actions made in reaction to these channel states. From (3.12), we have that Δ(Q(t)) for a general control policy satisfies:

  Δ(Q(t)) ≤ (1/2) E{Σ_{i=1}^{2} [A_i(t)^2 + b_i(t)^2] | Q(t)} + Σ_{i=1}^{2} Q_i(t)λ_i − E{Σ_{i=1}^{2} Q_i(t)b_i(t) | Q(t)}   (3.14)

where we have used the fact that arrivals are i.i.d. over slots and hence independent of current queue backlogs, so that E{A_i(t) | Q(t)} = E{A_i(t)} = λ_i. Now define B as a finite constant that bounds the first term on the right-hand-side of the above drift inequality, so that for all t, all possible Q(t), and all possible control actions that can be taken, we have:

  (1/2) E{Σ_{i=1}^{2} [A_i(t)^2 + b_i(t)^2] | Q(t)} ≤ B

For our system, we have that at most one b_i(t) value can be nonzero on a given slot t. The probability that the nonzero b_i(t) (if any) is equal to 2 is at most 0.3 (because Pr[S2(t) = 2] = 0.3), and if it is not equal to 2, then it is at most 1. Hence:

  (1/2) E{Σ_{i=1}^{2} b_i(t)^2 | Q(t)} ≤ (1/2)[2^2 (0.3) + 1^2 (0.7)] = 0.95

and thus we can define B as:

  B = 0.95 + (1/2) Σ_{i=1}^{2} E{A_i^2}                                           (3.15)

Using this in (3.14) yields:

  Δ(Q(t)) ≤ B + Σ_{i=1}^{2} Q_i(t)λ_i − E{Σ_{i=1}^{2} Q_i(t)b_i(t) | Q(t)}

To emphasize how the right-hand-side of the above inequality depends on the transmission decision α(t), we use the identity b_i(t) = b̂_i(α(t), S(t)) to yield:

  Δ(Q(t)) ≤ B + Σ_{i=1}^{2} Q_i(t)λ_i − E{Σ_{i=1}^{2} Q_i(t)b̂_i(α(t), S(t)) | Q(t)}   (3.16)
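The scalar inequality above and the value of the constant B in (3.15) can both be checked numerically. A small sketch (our own code, assuming Bernoulli arrivals with the example rates 0.3 and 0.7, so that E{A_i^2} = λ_i):

```python
# Numerically check (max[Q-b,0] + A)^2 <= Q^2 + A^2 + b^2 + 2Q(A-b)
# over a grid of non-negative integers.
for Q in range(20):
    for b in range(5):
        for A in range(5):
            lhs = (max(Q - b, 0) + A) ** 2
            rhs = Q * Q + A * A + b * b + 2 * Q * (A - b)
            assert lhs <= rhs

# B = 0.5*E{sum b_i^2} + 0.5*E{sum A_i^2}. At most one b_i is nonzero;
# it equals 2 with probability at most 0.3 (since Pr[S2=2] = 0.3) and
# otherwise is at most 1.
half_E_b_sq = 0.5 * (2**2 * 0.3 + 1**2 * 0.7)    # = 0.95
half_E_A_sq = 0.5 * (0.3 + 0.7)                  # Bernoulli: E{A^2} = lambda
B = half_E_b_sq + half_E_A_sq
print(round(B, 2))  # -> 1.45
```

This is the value B = 1.45 used in the numerical example later in this section.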
3.1.3 THE "MIN-DRIFT" OR "MAX-WEIGHT" ALGORITHM

Our dynamic algorithm is designed to observe the current queue backlogs (Q1(t), Q2(t)) and channel states (S1(t), S2(t)) and to make a transmission decision α(t) to minimize the right-hand-side of the drift bound (3.16). Note that the transmission decision on slot t only affects the final term on the right-hand-side. Thus, we seek to design an algorithm that maximizes the following expression:

  E{Σ_{i=1}^{2} Q_i(t)b̂_i(α(t), S(t)) | Q(t)}

The above conditional expectation is with respect to the randomly observed channel states S(t) = (S1(t), S2(t)) and the (possibly random) control decision α(t). We now use the concept of opportunistically maximizing an expectation: The above expression is maximized by the algorithm that observes the current queues (Q1(t), Q2(t)) and the current channel states (S1(t), S2(t)) and chooses α(t) to maximize:

  Σ_{i=1}^{2} Q_i(t)b̂_i(α(t), S(t))                                              (3.17)

This is often called the "max-weight" algorithm, as it seeks to maximize a weighted sum of the transmission rates, where the weights are queue backlogs. As there are only three decisions (transmit over channel 1, transmit over channel 2, or don't transmit), it is easy to evaluate the weighted sum (3.17) for each option:

• Σ_{i=1}^{2} Q_i(t)b̂_i(α(t), S(t)) = Q1(t)S1(t) if we choose to transmit over channel 1.
• Σ_{i=1}^{2} Q_i(t)b̂_i(α(t), S(t)) = Q2(t)S2(t) if we choose to transmit over channel 2.
• Σ_{i=1}^{2} Q_i(t)b̂_i(α(t), S(t)) = 0 if we choose to remain idle.

It follows that the max-weight algorithm chooses to transmit over the channel i with the largest (positive) value of Q_i(t)S_i(t), and remains idle if this value is 0 for both channels. This simple algorithm just makes decisions based on the current queue states and channel states, and it does not need knowledge of the arrival rates or channel probabilities.

Because this algorithm maximizes the weighted sum (3.17) over all alternative decisions, we have:

  Σ_{i=1}^{2} Q_i(t)b̂_i(α(t), S(t)) ≥ Σ_{i=1}^{2} Q_i(t)b̂_i(α*(t), S(t))

where α*(t) represents any alternative (possibly randomized) transmission decision that can be made on slot t. This includes the case when α*(t) is an S-only decision that randomly chooses one of the three transmit options (transmit 1, transmit 2, or idle) with a distribution that depends on the observed S(t).
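A direct simulation of the max-weight rule illustrates the stability claim. The following sketch is our own code, using the example's channel statistics and Bernoulli arrivals at rates (0.3, 0.7); the empirical average sum backlog stays finite and small:

```python
import random

def sample_channels(rng):
    S1 = 1 if rng.random() < 0.7 else 0
    r = rng.random()
    return S1, (2 if r < 0.3 else (1 if r < 0.8 else 0))

def max_weight(Q, S):
    """Transmit over the channel with the largest positive Q_i(t)S_i(t);
    return 0 (idle) if both weights are zero."""
    w1, w2 = Q[0] * S[0], Q[1] * S[1]
    if max(w1, w2) == 0:
        return 0
    return 1 if w1 >= w2 else 2

rng = random.Random(1)
Q, total, T = [0, 0], 0, 200_000
for t in range(T):
    S = sample_channels(rng)
    a = max_weight(Q, S)
    b = [S[0] if a == 1 else 0, S[1] if a == 2 else 0]
    A = [1 if rng.random() < 0.3 else 0, 1 if rng.random() < 0.7 else 0]
    Q = [max(Q[i] - b[i], 0) + A[i] for i in range(2)]
    total += sum(Q)
print(total / T)   # empirical average sum backlog (packets)
```

The analysis in the next subsections shows the time average expected backlog is at most B/εmax(λ) = 12.083 packets for this rate vector; the empirical average is typically well below that bound.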
Fixing a particular alternative (possibly randomized) decision α*(t) for comparison, and taking a conditional expectation of the above inequality (given Q(t)) yields:

  E{Σ_{i=1}^{2} Q_i(t)b̂_i(α(t), S(t)) | Q(t)} ≥ E{Σ_{i=1}^{2} Q_i(t)b̂_i(α*(t), S(t)) | Q(t)}

where the decision α(t) on the left-hand-side of the above inequality represents the max-weight decision made on slot t, and the decision α*(t) represents any other particular decision that could have been made. Plugging the above directly into (3.16) yields:

  Δ(Q(t)) ≤ B + Σ_{i=1}^{2} Q_i(t)λ_i − E{Σ_{i=1}^{2} Q_i(t)b̂_i(α*(t), S(t)) | Q(t)}   (3.18)

The above is a drift inequality concerning the max-weight algorithm on slot t, where the final term on the right-hand-side involves any other decision α*(t). It is remarkable that the inequality (3.18) holds true for all of the (infinite) number of possible randomized alternative decisions that can be plugged into the final term on the right-hand-side. However, this should not be too surprising, as we designed the max-weight policy to have exactly this property! Rearranging the terms in (3.18) yields:

  Δ(Q(t)) ≤ B − Σ_{i=1}^{2} Q_i(t)[E{b_i*(t) | Q(t)} − λ_i]                       (3.19)

where we have used the identity b_i*(t) = b̂_i(α*(t), S(t)) to represent the transmission rate that would be offered over channel i if decision α*(t) were made. Now suppose the arrival rates (λ1, λ2) are interior to the capacity region Λ, and consider the particular S-only decision α*(t) that chooses a transmit option independent of queue backlog to yield (3.9)-(3.10). Because channel states are i.i.d. over slots, the resulting rates (b1*(t), b2*(t)) are independent of current queue backlog, and so we have for i in {1,2}:

  E{b_i*(t) | Q(t)} = E{b_i*(t)} ≥ λ_i + εmax(λ)

Plugging this directly into (3.19) yields:

  Δ(Q(t)) ≤ B − εmax(λ) Σ_{i=1}^{2} Q_i(t)                                        (3.20)

where we recall that εmax(λ) > 0. The drift bound is now in terms of the value εmax(λ) associated with the linear program (3.3)-(3.7). However, we did not need to solve the linear program to obtain this inequality or to implement the algorithm! It was enough to know that the solution to the linear program exists!
3.1.4 ITERATED EXPECTATIONS AND TELESCOPING SUMS

Taking an expectation of (3.20) over the randomness of the Q1(t) and Q2(t) values yields:

  E{Δ(Q(t))} ≤ B − εmax(λ) Σ_{i=1}^{2} E{Q_i(t)}                                  (3.21)

Using the definition of Δ(Q(t)) in (3.13) with the law of iterated expectations yields:

  E{Δ(Q(t))} = E{E{L(Q(t+1)) − L(Q(t)) | Q(t)}} = E{L(Q(t+1))} − E{L(Q(t))}

Substituting this identity into (3.21) yields:

  E{L(Q(t+1))} − E{L(Q(t))} ≤ B − εmax(λ) Σ_{i=1}^{2} E{Q_i(t)}

The above holds for all t in {0, 1, 2, ...}. Summing over t in {0, 1, ..., T−1} for some integer T > 0 yields (by telescoping sums):

  E{L(Q(T))} − E{L(Q(0))} ≤ BT − εmax(λ) Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Q_i(t)}

Rearranging terms, dividing by εmax(λ)T, and using the fact that L(Q(T)) ≥ 0 yields:

  (1/T) Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Q_i(t)} ≤ B/εmax(λ) + E{L(Q(0))}/(εmax(λ)T)

Assuming that E{L(Q(0))} < ∞ and taking a lim sup yields:

  limsup_{T→∞} (1/T) Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Q_i(t)} ≤ B/εmax(λ)

Thus, all queues are strongly stable, and the total average backlog (summed over both queues) is less than or equal to B/εmax(λ). That is, the max-weight algorithm (developed by minimizing a bound on the Lyapunov drift) ensures the queueing network is strongly stable whenever the rate vector λ is interior to the capacity region Λ, with an average queue congestion bound that is inversely proportional to the distance the rate vector is away from the capacity region boundary.

As an example, assume λ1 = 0.3 and λ2 = 0.7, illustrated by the point Y of Fig. 3.1(b). Then εmax(λ) = 0.12. Assuming arrivals are Bernoulli so that E{A_i^2} = E{A_i} = λ_i, and using the value of B = 1.45 obtained from (3.15), we have:

  Q̄1 + Q̄2 ≤ 1.45/0.12 = 12.083 packets
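Plugging in the numbers is a one-line check; the same arithmetic gives the delay bound via Little's Theorem used next (a minimal sketch, not from the text):

```python
B, eps_max = 1.45, 0.12
lam = (0.3, 0.7)
backlog_bound = B / eps_max               # bound on Q1bar + Q2bar (packets)
delay_bound = backlog_bound / sum(lam)    # Little's theorem: W = Q / (lam1 + lam2)
print(round(backlog_bound, 3))  # -> 12.083
print(round(delay_bound, 3))    # -> 12.083
```

Here the total arrival rate is λ1 + λ2 = 1.0 packet/slot, so the backlog and delay bounds happen to coincide numerically.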
where Q̄1 + Q̄2 represents the lim sup time average expected queue backlog in the network. By Little's Theorem (129), average delay satisfies:

  W̄ = (Q̄1 + Q̄2)/(λ1 + λ2) ≤ 12.083 slots

A simulation of the algorithm over 10^6 slots yields an empirical average queue backlog of Q̄1 + Q̄2 = 3.058 packets, and hence in this example, our upper bound overestimates backlog by roughly a factor of 4. There are three reasons for this gap: (i) A simple upper bound was used when computing the Lyapunov drift in (3.12). (ii) The value B used an upper bound on the second moments of service. (iii) The drift inequality compares to a queue-unaware S-only algorithm, whereas the actual drift is much better because our algorithm considers queue backlog.

The third reason often dominates in networks with many queues. For example, in (100) it is shown that average congestion and delay in an N-queue wireless system with one server and ON/OFF channels is at least proportional to N if a queue-unaware algorithm is used (a related result is derived for N × N packet switches in (99)). However, a more sophisticated queue grouping analysis in (101) shows that the max-weight algorithm on the ON/OFF downlink system gives average backlog and delay that is O(1), independent of the number of queues; see also queue grouping results in (102)(103)(104)(105). For brevity, we do not include queue grouping concepts in this text. The interested reader is referred to the above references.

3.1.5 SIMULATION OF THE MAX-WEIGHT ALGORITHM

Fig. 3.2 shows simulation results over 10^6 slots when the rate vector (λ1, λ2) is pushed up the line segment from X to Z in the figure, again assuming independent Bernoulli arrivals. The point Z is (λ1, λ2) = (0.372, 0.868). In the figure, the x-axis is a normalization factor ρ that specifies the distance along the segment (so that ρ = 0 is the point X, ρ = 1 is the point Z, and ρ = 0.806 is the point Y). It can be seen that the network is strongly stable for all rates with ρ < 1, and it has average backlog that increases to infinity at the vertical asymptote defined by the capacity region boundary (i.e., at ρ = 1). Also plotted in Fig. 3.2 is the upper bound B/εmax(λ) (where we have computed εmax(λ) for each input rate vector λ simulated). This bound shows the same qualitative behavior, but it is roughly a factor of 4 larger than the empirically observed backlog.

[Figure 3.2: Average sum queue backlog E[Q1 + Q2] (in units of packets) versus ρ under the max-weight algorithm, showing the simulation curve and the analytical bound, as loading is pushed from point X (i.e., ρ = 0) to point Z (i.e., ρ = 1). Each simulated data point is an average over 10^6 slots.]

3.2 STABILITY AND AVERAGE POWER MINIMIZATION

Now consider the same system, but define p(t) as the power expenditure incurred by the transmission decision α(t) on slot t. To emphasize that power is a function of α(t), we write p(t) = p̂(α(t)), and assume the following simple power function:

  p̂(α(t)) = { 1   if α(t) in {"Transmit over channel 1", "Transmit over channel 2"}
            { 0   if α(t) = "Idle"

That is, we spend 1 unit of power if we transmit over either channel, and no power is spent if we remain idle. Our goal is now to make transmission decisions to jointly stabilize the system while also striving to minimize average power expenditure. For a given rate vector (λ1, λ2) in the capacity region Λ, define φ(λ1, λ2) as the minimum average power that can be achieved by any S-only algorithm that makes all queues rate stable. The value φ(λ1, λ2) can be computed by solving the following linear program (compare with (3.3)-(3.7)):

  Minimize:    Σ_{(S1,S2) in S} Pr[S1, S2] (q1(S1, S2) + q2(S1, S2))
  Subject to:  λ1 ≤ Σ_{(S1,S2) in S} Pr[S1, S2] S1 q1(S1, S2)
               λ2 ≤ Σ_{(S1,S2) in S} Pr[S1, S2] S2 q2(S1, S2)
               q1(S1, S2) + q2(S1, S2) ≤ 1   for all (S1, S2) in S
               q1(S1, S2) ≥ 0, q2(S1, S2) ≥ 0   for all (S1, S2) in S

Thus, for each λ in Λ, there is an S-only algorithm α*(t) such that:

  E{p̂(α*(t))} = φ(λ1, λ2),  E{b̂1(α*(t), S(t))} ≥ λ1,  E{b̂2(α*(t), S(t))} ≥ λ2

It can be shown that φ(λ1, λ2) is continuous, convex, and entrywise non-decreasing. Further, φ(λ1, λ2) is the minimum time average expected power expenditure that can be achieved by any control policy that stabilizes the system (including policies that are not S-only) (21).
Now assume that λ = (λ1, λ2) is interior to Λ, so that (λ1 + ε, λ2 + ε) is in Λ for all ε such that 0 ≤ ε ≤ εmax(λ). It follows that whenever 0 ≤ ε ≤ εmax(λ), there exists an S-only algorithm α*(t) such that:

  E{b̂1(α*(t), S(t))} ≥ λ1 + ε                                                    (3.22)
  E{b̂2(α*(t), S(t))} ≥ λ2 + ε                                                    (3.23)
  E{p̂(α*(t))} = φ(λ1 + ε, λ2 + ε)                                                (3.24)

While taking actions to minimize a bound on Δ(Q(t)) every slot t would stabilize the system, the resulting average power expenditure might be unnecessarily large. For example, suppose the rate vector is (λ1, λ2) = (0, 0.4), and recall that Pr[S2(t) = 2] = 0.3. Then the drift-minimizing algorithm of the previous section would transmit over channel 2 whenever the queue is not empty and S2(t) is in {1, 2}. In particular, it would sometimes use "inefficient" transmissions when S2(t) = 1, which spend one unit of power but only deliver 1 packet. However, if we only transmit when S2(t) = 2 and when the number of packets in the queue is at least 2, it can be shown that the system is still stable, but power expenditure is reduced to its minimum of λ2/2 = 0.2 units/slot.

3.2.1 DRIFT-PLUS-PENALTY

Define the same Lyapunov function L(Q(t)) as in (3.11), and let Δ(Q(t)) represent the conditional Lyapunov drift for slot t. Instead of taking a control action to minimize a bound on Δ(Q(t)), we minimize a bound on the following drift-plus-penalty expression:

  Δ(Q(t)) + V E{p(t) | Q(t)}

where V ≥ 0 is a parameter that represents an "importance weight" on how much we emphasize power minimization. Such a control decision can be motivated as follows: We want to make Δ(Q(t)) small to push queue backlog towards a lower congestion state, but we also want to make E{p(t) | Q(t)} small so that we do not incur a large power expenditure. We have already computed a bound on Δ(Q(t)) in (3.16), and so adding V E{p(t) | Q(t)} to both sides of (3.16) yields a bound on the drift-plus-penalty:

  Δ(Q(t)) + V E{p(t) | Q(t)} ≤ B + V E{p̂(α(t)) | Q(t)} + Σ_{i=1}^{2} Q_i(t)λ_i − E{Σ_{i=1}^{2} Q_i(t)b̂_i(α(t), S(t)) | Q(t)}   (3.25)
where we have used the fact that p(t) = p̂(α(t)). The drift-plus-penalty algorithm then observes (Q1(t), Q2(t)) and (S1(t), S2(t)) every slot t and chooses an action α(t) to minimize the right-hand-side of the above inequality. Again using the concept of opportunistically minimizing an expectation, this is accomplished by greedily minimizing:

  value = V p̂(α(t)) − Σ_{i=1}^{2} Q_i(t)b̂_i(α(t), S(t))

We thus compare the following values and choose the action corresponding to the smallest (breaking ties arbitrarily):

• value[1] = V − Q1(t)S1(t) if α(t) = "Transmit over channel 1."
• value[2] = V − Q2(t)S2(t) if α(t) = "Transmit over channel 2."
• value[Idle] = 0 if α(t) = "Idle."

We now show that this intuitive algorithm leads to a provable power-backlog tradeoff: Average power can be pushed arbitrarily close to φ(λ1, λ2) by using a large value of V, at the expense of incurring an average queue backlog that is O(V).

3.2.2 ANALYSIS OF THE DRIFT-PLUS-PENALTY ALGORITHM

Because our decisions α(t) minimize the right-hand-side of the drift-plus-penalty inequality (3.25) on every slot t (given the observed Q(t)), we have:

  Δ(Q(t)) + V E{p(t) | Q(t)} ≤ B + V E{p̂(α*(t)) | Q(t)} + Σ_{i=1}^{2} Q_i(t)λ_i − E{Σ_{i=1}^{2} Q_i(t)b̂_i(α*(t), S(t)) | Q(t)}   (3.26)

where α*(t) is any other (possibly randomized) transmission decision that can be made on slot t. Now assume that λ is interior to Λ, and fix any value ε such that 0 ≤ ε ≤ εmax(λ). Plugging the S-only algorithm (3.22)-(3.24) into the right-hand-side of the above inequality and noting that this policy makes decisions independent of queue backlog yields:

  Δ(Q(t)) + V E{p(t) | Q(t)} ≤ B + V φ(λ1 + ε, λ2 + ε) + Σ_{i=1}^{2} Q_i(t)λ_i − Σ_{i=1}^{2} Q_i(t)(λ_i + ε)
                             = B + V φ(λ1 + ε, λ2 + ε) − ε Σ_{i=1}^{2} Q_i(t)     (3.27)
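The three-way comparison that defines the drift-plus-penalty rule is easy to write directly in code. A minimal sketch (the function name is our own):

```python
def dpp_decision(Q, S, V):
    """Pick alpha minimizing V*p(alpha) - sum_i Q_i*b_i(alpha, S):
    value[Idle] = 0, value[i] = V - Q_i*S_i.
    Ties prefer earlier entries (Idle first)."""
    values = {None: 0.0,                 # None represents "Idle"
              1: V - Q[0] * S[0],
              2: V - Q[1] * S[1]}
    return min(values, key=values.get)   # action with the smallest value

print(dpp_decision((10, 0), (1, 2), V=5))  # -> 1  (value[1] = -5 is smallest)
```

With Q = (10, 0), S = (1, 2), and V = 5, the values are 0, −5, and 5, so the rule transmits over channel 1; raising V makes idling more attractive, which is exactly the power-saving behavior analyzed next.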
Taking expectations of the above inequality and using the law of iterated expectations as before yields:

  E{L(Q(t+1))} − E{L(Q(t))} + V E{p(t)} ≤ B + V φ(λ1 + ε, λ2 + ε) − ε Σ_{i=1}^{2} E{Q_i(t)}

Summing the above over t in {0, 1, ..., T−1} for some positive integer T yields:

  E{L(Q(T))} − E{L(Q(0))} + V Σ_{t=0}^{T−1} E{p(t)} ≤ BT + V T φ(λ1 + ε, λ2 + ε) − ε Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Q_i(t)}   (3.28)

Rearranging terms in the above and neglecting non-negative quantities where appropriate yields the following two inequalities:

  (1/T) Σ_{t=0}^{T−1} E{p(t)} ≤ φ(λ1 + ε, λ2 + ε) + B/V + E{L(Q(0))}/(VT)
  (1/T) Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Q_i(t)} ≤ [B + V(φ(λ1 + ε, λ2 + ε) − (1/T) Σ_{t=0}^{T−1} E{p(t)})]/ε + E{L(Q(0))}/(εT)

where the first inequality follows by dividing (3.28) by VT, and the second follows by dividing (3.28) by εT. Taking limits as T → ∞ shows that:(1)

  p̄ = lim_{T→∞} (1/T) Σ_{t=0}^{T−1} E{p(t)} ≤ φ(λ1 + ε, λ2 + ε) + B/V                              (3.29)
  Q̄1 + Q̄2 = lim_{T→∞} (1/T) Σ_{t=0}^{T−1} Σ_{i=1}^{2} E{Q_i(t)} ≤ [B + V(φ(λ1 + ε, λ2 + ε) − p̄)]/ε   (3.30)

(1) In this simple example, the system evolves according to a countably infinite state space Discrete Time Markov Chain (DTMC), and it can be shown that the limits in (3.29) and (3.30) are well defined.

3.2.3 OPTIMIZING THE BOUNDS

The bounds (3.29) and (3.30) hold for any ε that satisfies 0 ≤ ε ≤ εmax(λ), and hence they can be optimized separately. Using ε = 0 in (3.29) thus yields:

  φ(λ1, λ2) ≤ p̄ ≤ φ(λ1, λ2) + B/V                                                (3.31)
where the first inequality follows because our algorithm stabilizes the network and thus cannot yield time average expected power lower than φ(λ1, λ2), the infimum time average expected power required for stability of any algorithm. Because p̄ ≥ φ(λ1, λ2), it can be shown that:

  φ(λ1 + ε, λ2 + ε) − p̄ ≤ φ(λ1 + ε, λ2 + ε) − φ(λ1, λ2) ≤ 2ε

where the final inequality holds because it requires at most one unit of energy to support each new packet, and so increasing the total input rate from λ1 + λ2 to λ1 + λ2 + 2ε increases the minimum required average power by at most 2ε. Plugging the above into (3.30) yields:

  Q̄1 + Q̄2 ≤ B/ε + 2V

The above holds for all ε that satisfy 0 ≤ ε ≤ εmax(λ), so plugging ε = εmax(λ) yields:

  Q̄1 + Q̄2 ≤ B/εmax(λ) + 2V                                                      (3.32)

The performance bounds (3.31)-(3.32) demonstrate an [O(1/V), O(V)] power-backlog tradeoff: We can use an arbitrarily large V to make B/V arbitrarily small, so that (3.31) implies the time average power p̄ is arbitrarily close to the optimum φ(λ1, λ2). This comes with a tradeoff: The average queue backlog bound in (3.32) is O(V).

3.2.4 SIMULATIONS OF THE DRIFT-PLUS-PENALTY ALGORITHM

Consider the previous example of Bernoulli arrivals with λ1 = 0.3 and λ2 = 0.7, which corresponds to point Y in Fig. 3.1(b). Then B = 1.45 and εmax(λ) = 0.12, so that the bounds (3.31) and (3.32) become:

  p̄ ≤ φ(λ1, λ2) + 1.45/V                                                        (3.33)
  Q̄1 + Q̄2 ≤ 1.45/0.12 + 2V                                                      (3.34)

Figs. 3.3 and 3.4 plot simulations for this system together with the above power and backlog bounds, again assuming independent Bernoulli arrivals. Values of V in the range 0 to 100 are shown, and each simulated data point represents a simulation over 2 × 10^6 slots using a particular value of V. It is clear from the figures that average power converges to the optimal p* = 0.7 as V increases, while average backlog increases linearly in V.

Performance can be significantly improved by noting that the drift-plus-penalty algorithm given in Section 3.2.1 never transmits from queue 1 unless Q1(t) ≥ V (else, value[1] would be positive), and so Q1(t) ≥ max[V − 1, 0] for all slots t ≥ 0, provided that this holds at t = 0. Similarly, the algorithm never transmits from queue 2 unless Q2(t) ≥ V/2, and so Q2(t) ≥ max[V/2 − 2, 0] for all slots t ≥ 0, provided this holds at t = 0. It follows that we can stack the queues with fake packets (called placeholder packets) that never get transmitted, as described in more detail in Section 4.8 of the next chapter. This placeholder technique yields the same power guarantee (3.33), but it has a significantly improved queue backlog bound given by:

  Q̄1 + Q̄2 ≤ 1.45/0.12 + 2V − max[V − 1, 0] − max[V/2 − 2, 0]   (with placeholders)

Thus, the average queue bound under the placeholder technique grows like 0.5V, rather than 2V as suggested in (3.34), a dramatic savings when V is large. Simulations of the placeholder technique are also shown in Figs. 3.3 and 3.4, with no noticeable difference in power expenditure: indeed, the simulated power expenditure curves for the cases with and without placeholders are indistinguishable in Fig. 3.3. The queue backlog improvements due to placeholders are quite significant (Fig. 3.4). A plot of queue values over the first 3000 slots is given in Chapter 4.
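The tradeoff is easy to reproduce in simulation. The sketch below is our own code (not the book's simulator), using the example's rates and channel statistics; empirically, average power decreases toward p* = 0.7 as V grows, while average backlog grows roughly linearly in V:

```python
import random

def simulate_dpp(V, T=200_000, seed=3):
    """Run the drift-plus-penalty rule and return (avg power, avg backlog)."""
    rng = random.Random(seed)
    Q = [0, 0]
    power = backlog = 0
    for _ in range(T):
        S1 = 1 if rng.random() < 0.7 else 0
        r = rng.random()
        S2 = 2 if r < 0.3 else (1 if r < 0.8 else 0)
        # value[Idle] = 0, value[i] = V - Q_i*S_i; pick the smallest.
        vals = [0.0, V - Q[0] * S1, V - Q[1] * S2]
        a = vals.index(min(vals))        # 0 = idle; 1, 2 = channels
        b = [0, 0]
        if a != 0:
            b[a - 1] = (S1, S2)[a - 1]
            power += 1                   # one unit of power per transmission
        A = [1 if rng.random() < 0.3 else 0, 1 if rng.random() < 0.7 else 0]
        Q = [max(Q[i] - b[i], 0) + A[i] for i in range(2)]
        backlog += sum(Q)
    return power / T, backlog / T

results = {V: simulate_dpp(V) for V in (1, 10, 100)}
for V, (p_avg, q_avg) in results.items():
    print(V, round(p_avg, 3), round(q_avg, 1))
```

Note the backlog at large V sits near the "placeholder" floors Q1 ≈ V and Q2 ≈ V/2, consistent with the discussion above.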
[Figure 3.3: Average power versus V with (λ1, λ2) = (0.3, 0.7), showing the upper bound, the simulations with and without placeholders (indistinguishable), and the optimal value p*. Figure 3.4: Average backlog versus V with (λ1, λ2) = (0.3, 0.7), showing the bounds and simulations with and without placeholders.]

3.3 GENERALIZATIONS

The reader can easily see that the analysis in this chapter, which considers an example system of 2 queues, can be repeated for a larger system of K queues. This holds for systems with more general channel states S(t), more general resource allocation decisions α(t), and for arbitrary rate functions b̂_k(α(t), S(t)) and "penalty functions" p̂(α(t)); in that case, the "min drift-plus-penalty" algorithm generalizes to choosing α(t) to maximize:

  Σ_{k=1}^{K} Q_k(t)b̂_k(α(t), S(t)) − V p̂(α(t))

In particular:

• The vector S(t) might have an infinite number of possible outcomes (rather than just 6 outcomes).
• The decision α(t) might represent one of an infinite number of possible power allocation options (rather than just one of three options). Alternatively, α(t) might represent one of an infinite number of more sophisticated physical layer actions that can take place on slot t (such as modulation, coding, beamforming, etc.).
• The rate function b̂_k(α(t), S(t)) can be any function that maps a resource allocation decision α(t) and a channel state vector S(t) into a transmission rate (and does not need to have the structure (3.2)).
• The "penalty" function p̂(α(t)) does not have to represent power, and it can be any general function of α(t).

The next chapter presents the general theory. It develops an important concept of virtual queues to ensure general time average equality and inequality constraints are satisfied. It also considers variable-V algorithms that achieve the exact minimum average penalty subject to mean rate stability (which typically incurs infinite average backlog). Finally, it shows how to analyze systems with non-i.i.d. and non-ergodic arrival and channel processes.
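The generalized rule above can be sketched generically. The function and the 3-queue instance below are hypothetical illustrations (our own names, not from the text):

```python
def dpp_choose(Q, S, actions, b_hat, p_hat, V):
    """Generic min drift-plus-penalty rule for K queues: pick the action
    maximizing sum_k Q_k * b_hat(k, a, S) - V * p_hat(a)."""
    def weight(a):
        return sum(Qk * b_hat(k, a, S) for k, Qk in enumerate(Q)) - V * p_hat(a)
    return max(actions, key=weight)

# Hypothetical 3-queue instance: action a serves queue a at rate S[a];
# a = None means idle; every transmission costs one unit of power.
S = (1, 2, 1)
Q = (4, 3, 9)
b = lambda k, a, S: S[k] if a == k else 0
p = lambda a: 0 if a is None else 1
print(dpp_choose(Q, S, [None, 0, 1, 2], b, p, V=2))  # -> 2
```

With V = 2 the weights are 0, 2, 4, and 7, so queue 2 is served; with a very large V the rule idles, exactly as in the 2-queue example.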
CHAPTER 4

Optimizing Time Averages

This chapter considers the problem, introduced in Chapter 1, of minimizing the time average of a network attribute subject to additional time average constraints. We first develop the main results of Lyapunov drift and Lyapunov optimization theory.

4.1 LYAPUNOV DRIFT AND LYAPUNOV OPTIMIZATION

Consider a system of N queues, and let Θ(t) = (Θ1(t), ..., ΘN(t)) be the queue backlog vector. Assume the Θ(t) vector evolves over slots t in {0, 1, 2, ...} according to some probability law. The components Θn(t) are real numbers and can possibly be negative. Allowing Θn(t) to take negative values is often useful for the virtual queues that are defined later. The reason we use the notation Θ(t) to represent a queue vector, instead of Q(t), is that in later sections we define Θ(t) = [Q(t), Z(t), H(t)], where Q(t) is a vector of actual queues in the network and Z(t), H(t) are suitably chosen virtual queues. As a scalar measure of the "size" of the vector Θ(t), define a quadratic Lyapunov function L(Θ(t)), as in (3.11) of Chapter 3, as follows:

  L(Θ(t)) = (1/2) Σ_{n=1}^{N} w_n Θ_n(t)^2                                        (4.1)

where {w_n} (n = 1, ..., N) are a collection of positive weights. We typically use w_n = 1 for all n, although different weights are often useful to allow queues to be treated differently. This function L(Θ(t)) is always non-negative, and it is equal to zero if and only if all components of Θ(t) are zero. Define the one-slot conditional Lyapunov drift Δ(Θ(t)) as follows:(1)

  Δ(Θ(t)) = E{L(Θ(t+1)) − L(Θ(t)) | Θ(t)}                                        (4.2)

This drift is the expected change in the Lyapunov function over one slot, given that the current state in slot t is Θ(t).

(1) Strictly speaking, better notation would be Δ(Θ(t), t), as the drift may be due to a non-stationary policy. However, we use Δ(Θ(t)) as a formal representation of the right-hand-side of (4.2).

4.1.1 LYAPUNOV DRIFT THEOREM

Theorem 4.1 (Lyapunov Drift) Consider the quadratic Lyapunov function (4.1), and assume E{L(Θ(0))} < ∞. Suppose there are constants B > 0, ε ≥ 0 such that the following drift condition holds for all slots τ in {0, 1, 2, ...} and all possible Θ(τ):

  Δ(Θ(τ)) ≤ B − ε Σ_{n=1}^{N} |Θ_n(τ)|                                           (4.3)

Then:
a) If ε ≥ 0, then all queues Θn(t) are mean rate stable.
b) If ε > 0, then all queues are strongly stable and:

  limsup_{t→∞} (1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θ_n(τ)|} ≤ B/ε                 (4.4)

Proof. We first prove part (b). Taking expectations of (4.3) and using the law of iterated expectations yields:

  E{L(Θ(τ+1))} − E{L(Θ(τ))} ≤ B − ε Σ_{n=1}^{N} E{|Θ_n(τ)|}

Summing the above over τ in {0, 1, ..., t−1} for some slot t > 0 and using the law of telescoping sums yields:

  E{L(Θ(t))} − E{L(Θ(0))} ≤ Bt − ε Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θ_n(τ)|}         (4.5)

Now assume that ε > 0. Dividing by εt, rearranging terms, and using the fact that E{L(Θ(t))} ≥ 0 yields:

  (1/t) Σ_{τ=0}^{t−1} Σ_{n=1}^{N} E{|Θ_n(τ)|} ≤ B/ε + E{L(Θ(0))}/(εt)            (4.6)

The above holds for all slots t > 0. Taking a limit as t → ∞ proves part (b).

To prove part (a), assume that (4.3) holds with ε ≥ 0. We have from (4.5) that for all slots t > 0:

  E{L(Θ(t))} − E{L(Θ(0))} ≤ Bt

Using the definition of L(Θ(t)) yields:

  (1/2) Σ_{n=1}^{N} w_n E{Θ_n(t)^2} ≤ E{L(Θ(0))} + Bt

Therefore, for all n in {1, ..., N} we have:

  E{Θ_n(t)^2} ≤ 2E{L(Θ(0))}/w_n + 2Bt/w_n                                        (4.7)

However, because the variance of |Θ_n(t)| cannot be negative, we have E{Θ_n(t)^2} ≥ E{|Θ_n(t)|}^2. Thus, for all slots t > 0 we have:

  E{|Θ_n(t)|} ≤ sqrt( 2E{L(Θ(0))}/w_n + 2Bt/w_n )

Dividing by t and taking a limit as t → ∞ proves that:

  lim_{t→∞} E{|Θ_n(t)|}/t ≤ lim_{t→∞} sqrt( 2E{L(Θ(0))}/(t^2 w_n) + 2B/(t w_n) ) = 0

Thus, all queues Θn(t) are mean rate stable, proving part (a).
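Theorem 4.1(b) is easy to illustrate on a toy chain (our own example, not from the text). For a single queue Θ(t+1) = max[Θ(t) − 1, 0] + A(t) with A(t) Bernoulli(0.3) and L = Θ^2/2, the bound (max[Q−b,0]+A)^2 ≤ Q^2 + A^2 + b^2 + 2Q(A−b) gives Δ ≤ B − εΘ with B = (E{A^2}+1)/2 = 0.65 and ε = 1 − 0.3 = 0.7, so the theorem predicts a time average backlog of at most B/ε ≈ 0.929:

```python
import random

# Simulate the chain and compare the empirical time average against B/eps.
rng = random.Random(42)
theta, total, T = 0, 0, 100_000
for _ in range(T):
    A = 1 if rng.random() < 0.3 else 0
    theta = max(theta - 1, 0) + A     # queue update with unit service
    total += theta
avg = total / T
print(avg <= 0.65 / 0.7)  # -> True
```

The empirical average (about λ = 0.3 here) is well inside the bound, which is typical: the drift bound is conservative.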
for all slots t > 0. 4. we have an associated stochastic “penalty” process y(t) whose time average we want to make less than (or close to) some target value y ∗ . The process y(t) can represent penalties incurred by control actions on slot t.3) holds with ≥ 0.8) (Lyapunov Optimization) Suppose L( (t)) and ymin are deﬁned by (4. V ≥ 0. then all queues are strongly stable with time average expected queue backlog bounded by B/ . all queues n (t) are mean rate stable.2 LYAPUNOV OPTIMIZATION THEOREM Suppose that. we have: ( (τ )) + V E {y(τ ) (τ )} ≤ B + V y ∗ − N  n=1 n (τ ) (4. .1. Further. LYAPUNOV DRIFT AND LYAPUNOV OPTIMIZATION 47 2 However.10) (4.7)). n (t)} ≤ (4. 2 The above theorem shows that if the drift condition (4. etc. . in addition to the queues (t) that we want to stabilize. We note that the proof reveals further detailed information concerning expected queue backlog for all slots t > 0.8). Further. showing how the affect of the initial condition (0) decays over time (see (4. 1. if V > 0 and penalty and queue backlog satisfy: 1 lim sup t→∞ t lim sup t→∞ t−1 τ =0 > 0 then time average expected E {y(τ )} ≤ y ∗ + n (τ )} B V (4.1) and (4. if > 0. proving part (a).1. 2. Assume the expected penalty is lower bounded by a ﬁnite (possibly negative) value ymin . ≥ 0. so that ( (t)) ≤ B. we have: E {y(t)} ≥ ymin Theorem 4. Suppose there are constants B ≥ 0. then all queues are mean rate stable.2 (4.9) Then all queues n (t) are mean rate stable. so that for all t and all possible control actions.7) Dividing by t and taking a limit as t → ∞ proves that: lim E { n (t)} t→∞ t ≤ lim t→∞ 2B 2E {L( (0))} + =0 twn t 2 wn Thus. because the variance of  Thus.11) 1 t t−1 N E { τ =0 n=1 ≤ B + V (y ∗ − ymin ) . . packet drops.6) and (4. we have: E { n (t) cannot be negative. and that E {L( (0))} < ∞.} and all possible values of (τ ). and y ∗ such that for all slots τ ∈ {0. we have E 2E {L( (0))} 2Bt + wn wn 2 n (t) ≥ E { n (t)} . such as power expenditures.4.
Finally, if $V = 0$, then (4.11) still holds, and if $\epsilon = 0$, then (4.10) still holds.

Proof. Fix any slot $\tau$. Because (4.9) holds for this slot, we can take expectations of both sides and use the law of iterated expectations to yield:

$$E\{L(\Theta(\tau+1))\} - E\{L(\Theta(\tau))\} + V E\{y(\tau)\} \leq B + V y^* - \epsilon \sum_{n=1}^{N} E\{|\Theta_n(\tau)|\}$$

Summing over $\tau \in \{0, 1, \ldots, t-1\}$ for some $t > 0$ and using the law of telescoping sums yields:

$$E\{L(\Theta(t))\} - E\{L(\Theta(0))\} + V \sum_{\tau=0}^{t-1} E\{y(\tau)\} \leq (B + V y^*)t - \epsilon \sum_{\tau=0}^{t-1} \sum_{n=1}^{N} E\{|\Theta_n(\tau)|\} \qquad (4.12)$$

Rearranging terms and neglecting non-negative terms when appropriate, it is easy to show that the above inequality directly implies the following two inequalities for all $t > 0$:

$$\frac{1}{t} \sum_{\tau=0}^{t-1} E\{y(\tau)\} \leq y^* + \frac{B}{V} + \frac{E\{L(\Theta(0))\}}{Vt} \qquad (4.13)$$

$$\frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{n=1}^{N} \epsilon E\{|\Theta_n(\tau)|\} \leq B + V(y^* - y_{min}) + \frac{E\{L(\Theta(0))\}}{t} \qquad (4.14)$$

where (4.13) follows by dividing (4.12) by $Vt$, and (4.14) follows by dividing (4.12) by $t$. Taking limits of the above as $t \to \infty$ proves (4.10) and (4.11). Rearranging (4.12) also yields:

$$E\{L(\Theta(t))\} \leq E\{L(\Theta(0))\} + (B + V(y^* - y_{min}))t$$

from which mean rate stability follows by an argument similar to that given in the proof of Theorem 4.1. $\Box$

Theorem 4.2 can be understood as follows: If for any parameter $V > 0$, we can design a control algorithm to ensure the drift condition (4.9) is satisfied on every slot $\tau$, then the time average expected penalty satisfies (4.10) and hence is either less than the target value $y^*$, or differs from $y^*$ by no more than a "fudge factor" $B/V$, which can be made arbitrarily small as $V$ is increased. However, the time average queue backlog bound increases linearly in the $V$ parameter, as shown by (4.11). This presents a performance-backlog tradeoff of $[O(1/V), O(V)]$. Because Little's Theorem tells us that average queue backlog is proportional to average delay (129), we often call this a performance-delay tradeoff. The proof reveals further details concerning the effect of the initial condition $\Theta(0)$ on time average expectations at any slot $t$ (see (4.13) and (4.14)).
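The $[O(1/V), O(V)]$ tradeoff above can be observed numerically. The following minimal sketch is not from the text: the single queue, the Bernoulli arrival rate $0.3$, and the unit-power service penalty are illustrative assumptions. For this toy system the per-slot drift-plus-penalty minimization reduces to serving only when $Q(t) > V$, so average penalty (power) stays near the arrival rate while average backlog grows roughly linearly in $V$:

```python
import random

def simulate(V, T=100_000, lam=0.3, seed=0):
    """Drift-plus-penalty for one queue with penalty y(t) = b(t) (one unit of
    power per service). Minimizing V*b - Q(t)*b over b in {0,1} gives the
    threshold rule: serve iff Q(t) > V."""
    rng = random.Random(seed)
    Q = 0
    total_power = 0
    total_backlog = 0
    for _ in range(T):
        a = 1 if rng.random() < lam else 0   # Bernoulli(lam) arrival
        b = 1 if Q > V else 0                # greedy drift-plus-penalty decision
        total_power += b
        total_backlog += Q
        Q = max(Q - b, 0) + a                # queue update of the form (4.23)
    return total_power / T, total_backlog / T

for V in [1, 5, 20]:
    p, q = simulate(V)
    print(f"V={V:>2}: avg power={p:.3f}, avg backlog={q:.2f}")
```

For every $V$ the long-run average power is pinned near the arrival rate (the queue must be emptied at the rate it fills), while the average backlog, and hence delay, increases with $V$.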
This result suggests the following control strategy: Every slot $\tau$, observe the current $\Theta(\tau)$ values and take a control action that, subject to the known $\Theta(\tau)$, greedily minimizes the drift-plus-penalty expression on the left-hand side of the desired drift inequality (4.9):

$$\Delta(\Theta(\tau)) + V E\{y(\tau) \mid \Theta(\tau)\} \qquad (4.15)$$

It follows that if on every slot $\tau$, there exists a particular control action that satisfies the drift requirement (4.9), then the drift-plus-penalty minimizing policy must also satisfy this drift requirement. For intuition, note that taking an action on slot $\tau$ to minimize the drift $\Delta(\Theta(\tau))$ alone would tend to push queues towards a lower congestion state, but it may incur a large penalty $y(\tau)$. Thus, we minimize a weighted sum of drift and penalty, where the penalty is scaled by an "importance" weight $V$, representing how much we emphasize penalty minimization. Using $V = 0$ corresponds to minimizing the drift $\Delta(\Theta(\tau))$ alone, which reduces to the Tassiulas-Ephremides technique for network stability in (7)(8). While this does not provide any guarantees on the resulting time average penalty $y(t)$ (as the bound (4.10) becomes infinity for $V = 0$), it still ensures strong stability by (4.11). The case for $V > 0$ includes a weighted penalty term in the greedy minimization, and corresponds to our technique for joint stability and performance optimization, developed for utility optimal flow control in (17)(18) and used for average power optimization in (20)(21) and for problems similar to the type (1.1)-(1.2), (1.5)-(1.6), (1.10)-(1.11) in (22).

4.1.3 PROBABILITY 1 CONVERGENCE

Here we present a version of the Lyapunov optimization theorem that treats probability 1 convergence of sample path time averages, rather than time average expectations. We have the following preliminary lemma, related to the Kolmogorov law of large numbers:

Lemma 4.3 Let $X(t)$ be a random process defined over $t \in \{0, 1, 2, \ldots\}$, and suppose that the following hold:

• $E\{X(t)^2\}$ is finite for all $t \in \{0, 1, 2, \ldots\}$, and:

$$\sum_{t=1}^{\infty} \frac{E\{X(t)^2\}}{t^2} < \infty$$

• There is a real-valued constant $\beta$ such that for all $t \in \{1, 2, 3, \ldots\}$ and all possible values of $X(t-1), X(t-2), \ldots, X(0)$, the conditional expectation satisfies:

$$E\{X(t) \mid X(t-1), X(t-2), \ldots, X(0)\} \leq \beta$$

Then:

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} X(\tau) \leq \beta \quad \text{(w.p.1)}$$
Theorem 4. . .p. y(1). we have: (4.18) hold.4 (Lyapunov Optimization with Probability 1 Convergence) Deﬁne L( (t)) by (4.50 4. . . > 0. and: ∞ t=1 E y(t)2 <∞ t2 (4. 2.} and all possible H(τ ). . y(0). .}. and all possible H(t). so that: (4. . . 1. we must condition on the full history H(t).1). (1).” A proof of this lemma is given in (138) as a simple application of the Kolmogorov law of large numbers for martingale differences. . all t. N}. t − 1}. . Speciﬁcally. . t} and values of y(τ ) for τ ∈ {0. .p. . V > 0. . OPTIMIZING TIME AVERAGES where “(w. .17) • There is a ﬁnite constant D > 0 such that for all n ∈ {1. 2. y(t − 1)} Deﬁne (t. . . . (t). for integers t ≥ 0 deﬁne: H(t)={ (0). Suppose there are constants B ≥ 0.16)(4. Rather than deﬁning a drift that conditions on (t). See (139)(140)(130)(141) for background on martingales and a statement and proof of the Kolmogorov law of large numbers. H(t))=E {L( (t + 1)) − L( (t))H(t)} Assume that: • The penalty process y(t) is deterministically lower bounded by a (possibly negative) constant ymin . assume that (0) is ﬁnite with probability 1.18) E ( n (t + 1) − n (t))4 H(t) ≤ D so that conditional fourth moments of queue changes are uniformly bounded.1)” stands for “with probability 1.1) • The second moments E y(t)2 are ﬁnite for all t ∈ {0. we have: (τ. and y ∗ such that for all slots τ ∈ {0. . 1. . . as before. The lemma is used in (138) to prove the probability 1 version of the Lyapunov optimization theorem given below. . . Let (t) be a vector of queues and y(t) a penalty process. . which includes values of (τ ) for τ ∈ {0. and suppose that assumptions (4. H(t)) by: (t.16) y(t) ≥ ymin ∀t (w. . . H(τ )) + V E {y(τ )H(τ )} ≤ B + V y ∗ − N  n=1 n (τ ) .
Then all queues $\Theta_n(t)$ are rate stable, and:

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} y(\tau) \leq y^* + \frac{B}{V} \quad \text{(w.p.1)} \qquad (4.19)$$

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{n=1}^{N} \epsilon |\Theta_n(\tau)| \leq B + V(y^* - y_{min}) \quad \text{(w.p.1)} \qquad (4.20)$$

Further, if these same assumptions hold, and if there is a value $y$ such that the following additional inequality also holds for all $\tau$ and all possible $H(\tau)$:

$$\Delta(\tau, H(\tau)) + V E\{y(\tau) \mid H(\tau)\} \leq B + V y$$

Then:

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} y(\tau) \leq y + \frac{B}{V} \quad \text{(w.p.1)} \qquad (4.21)$$

Proof. Fix $\Theta(0)$ as a given finite initial condition. Define the process $X(t)$ for $t \in \{0, 1, 2, \ldots\}$ as follows:

$$X(t) \triangleq L(\Theta(t+1)) - L(\Theta(t)) + V y(t) - B - V y^* + \epsilon \sum_{n=1}^{N} |\Theta_n(t)|$$

The conditions on $y(t)$ and $\Theta(t)$ are shown in (138) to ensure that the queues $\Theta_n(t)$ are rate stable, that $E\{X(t)^2\}$ is finite for all $t$, and that for all $t > 0$ and all possible values of $X(t-1), X(t-2), \ldots, X(0)$:

$$\sum_{t=1}^{\infty} \frac{E\{X(t)^2\}}{t^2} < \infty \ , \qquad E\{X(t) \mid X(t-1), X(t-2), \ldots, X(0)\} \leq 0$$

Thus, we can apply Lemma 4.3 to $X(t)$ to yield:

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} X(\tau) \leq 0 \quad \text{(w.p.1)} \qquad (4.22)$$

However, by definition of $X(t)$, we have for all $t > 0$:

$$\frac{1}{t} \sum_{\tau=0}^{t-1} X(\tau) = \frac{L(\Theta(t)) - L(\Theta(0))}{t} + \frac{1}{t} \sum_{\tau=0}^{t-1} \left[ V y(\tau) + \epsilon \sum_{n=1}^{N} |\Theta_n(\tau)| \right] - B - V y^*$$
Rearranging terms in the above inequality and neglecting non-negative terms where appropriate directly leads to the following two inequalities that hold for all $t > 0$:

$$\frac{1}{Vt} \sum_{\tau=0}^{t-1} X(\tau) \geq \frac{-L(\Theta(0))}{Vt} + \frac{1}{t} \sum_{\tau=0}^{t-1} y(\tau) - \left[\frac{B}{V} + y^*\right]$$

$$\frac{1}{t} \sum_{\tau=0}^{t-1} X(\tau) \geq \frac{-L(\Theta(0))}{t} + \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{n=1}^{N} \epsilon |\Theta_n(\tau)| - \left[B + V(y^* - y_{min})\right]$$

Taking limits of the above two inequalities and using (4.22) proves the results (4.19)-(4.20). A similar argument proves (4.21). $\Box$

Conditioning on the history $H(t)$ is needed to prove Theorem 4.4 via Lemma 4.3. A policy that greedily minimizes $\Delta(t, H(t)) + V E\{y(t) \mid H(t)\}$ every slot will also greedily minimize $\Delta(\Theta(t)) + V E\{y(t) \mid \Theta(t)\}$. In this text, we focus primarily on time average expectations of the type (4.10) and (4.11), with the understanding that the same bounds can be shown to hold for time averages (with probability 1) if the additional assumptions (4.16)-(4.18) hold.

4.2 GENERAL SYSTEM MODEL

Figure 4.1: An illustration of a general $K$-queue network with queues $Q_1(t), \ldots, Q_K(t)$, arrivals $a_k(t)$, services $b_k(t)$, attributes $y_l(t)$ for $l \in \{1, \ldots, L\}$ and $e_j(t)$ for $j \in \{1, \ldots, J\}$, random state $\omega(t)$, and control action $\alpha(t)$.

Consider now a system with queue backlog vector $Q(t) = (Q_1(t), \ldots, Q_K(t))$, as shown in Fig. 4.1. Queue dynamics are given by:

$$Q_k(t+1) = \max[Q_k(t) - b_k(t), 0] + a_k(t) \qquad (4.23)$$

where $a(t) = (a_1(t), \ldots, a_K(t))$ and $b(t) = (b_1(t), \ldots, b_K(t))$ are general functions of a random event $\omega(t)$ and a control action $\alpha(t)$:

$$a_k(t) = \hat{a}_k(\alpha(t), \omega(t)) \ , \qquad b_k(t) = \hat{b}_k(\alpha(t), \omega(t))$$
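As a quick executable check of the dynamics (4.23), the update can be applied componentwise; the numeric values below are hypothetical, and note that offered service exceeding the current backlog is simply wasted:

```python
def queue_update(Q, b, a):
    """One-slot update Q_k(t+1) = max[Q_k(t) - b_k(t), 0] + a_k(t), per queue."""
    return [max(q - bk, 0) + ak for q, bk, ak in zip(Q, b, a)]

# Queue 3 is offered 4 units of service but holds only 2; the excess is lost.
print(queue_update([5, 0, 2], [3, 1, 4], [1, 2, 0]))  # -> [3, 2, 0]
```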
Every slot $t$ the network controller observes $\omega(t)$ and chooses an action $\alpha(t) \in \mathcal{A}_{\omega(t)}$. The set $\mathcal{A}_{\omega(t)}$ is the action space associated with event $\omega(t)$. In addition to affecting these arrival and service variables, $\alpha(t)$ and $\omega(t)$ also determine the attribute vectors $x(t)$, $y(t)$, $e(t)$ according to general functions $\hat{x}_m(\alpha, \omega)$, $\hat{y}_l(\alpha, \omega)$, $\hat{e}_j(\alpha, \omega)$, as described in Section 1.2.

We assume that $\omega(t)$ is a stationary process with a stationary probability distribution $\pi(\omega)$. Assume that $\omega(t)$ takes values in some sample space $\Omega$. If $\Omega$ is a finite or countably infinite set, then for each $\omega \in \Omega$, $\pi(\omega)$ represents a probability mass function associated with the stationary distribution, and:

$$Pr[\omega(t) = \omega] = \pi(\omega) \quad \forall t \in \{0, 1, 2, \ldots\} \qquad (4.24)$$

If $\Omega$ is uncountably infinite, then we assume $\omega(t)$ is a random vector, and that $\pi(\omega)$ represents a probability density associated with the stationary distribution. The simplest model, which we mainly consider in this text, is the case when $\omega(t)$ is i.i.d. over slots $t$ with stationary probabilities $\pi(\omega)$.
4.2.1 BOUNDEDNESS ASSUMPTIONS
The arrival function $\hat{a}_k(\alpha, \omega)$ is assumed to be non-negative for all $\omega \in \Omega$ and all $\alpha \in \mathcal{A}_\omega$. The service function $\hat{b}_k(\cdot)$ and the attribute functions $\hat{x}_m(\cdot)$, $\hat{y}_l(\cdot)$, $\hat{e}_j(\cdot)$ can possibly take negative values. All of these functions are general (possibly non-convex and discontinuous). However, we assume that these functions, together with the stationary probabilities $\pi(\omega)$, satisfy the following boundedness properties: For all $t$ and all (possibly randomized) control decisions $\alpha(t) \in \mathcal{A}_{\omega(t)}$, we have:

$$E\{\hat{a}_k(\alpha(t), \omega(t))^2\} \leq \sigma^2 \quad \forall k \in \{1, \ldots, K\} \qquad (4.25)$$
$$E\{\hat{b}_k(\alpha(t), \omega(t))^2\} \leq \sigma^2 \quad \forall k \in \{1, \ldots, K\} \qquad (4.26)$$
$$E\{\hat{x}_m(\alpha(t), \omega(t))^2\} \leq \sigma^2 \quad \forall m \in \{1, \ldots, M\} \qquad (4.27)$$
$$E\{\hat{y}_l(\alpha(t), \omega(t))^2\} \leq \sigma^2 \quad \forall l \in \{1, \ldots, L\} \qquad (4.28)$$
$$E\{\hat{e}_j(\alpha(t), \omega(t))^2\} \leq \sigma^2 \quad \forall j \in \{1, \ldots, J\} \qquad (4.29)$$
for some finite constant $\sigma^2 > 0$. Further, for all $t$ and all actions $\alpha(t) \in \mathcal{A}_{\omega(t)}$, we require the expectation of $y_0(t)$ to be bounded by some finite constants $y_{0,min}$, $y_{0,max}$:

$$y_{0,min} \leq E\{\hat{y}_0(\alpha(t), \omega(t))\} \leq y_{0,max} \qquad (4.30)$$
4.3 OPTIMALITY VIA ω-ONLY POLICIES
For each $l \in \{0, 1, \ldots, L\}$, define $\overline{y}_l(t)$ as the time average expectation of $y_l(t)$ over the first $t$ slots under a particular control strategy:

$$\overline{y}_l(t) \triangleq \frac{1}{t} \sum_{\tau=0}^{t-1} E\{y_l(\tau)\}$$
where the expectation is over the randomness of the $\omega(\tau)$ values and the random control actions. Define time average expectations $\overline{a}_k(t)$, $\overline{b}_k(t)$, $\overline{e}_j(t)$ similarly. Define $\overline{y}_l$ and $\overline{e}_j$ as the limiting values of $\overline{y}_l(t)$ and $\overline{e}_j(t)$, assuming temporarily that these limits are well defined. We desire a control policy that solves the following problem:

Minimize: $\overline{y}_0$
Subject to:
1) $\overline{y}_l \leq 0 \quad \forall l \in \{1, \ldots, L\}$
2) $\overline{e}_j = 0 \quad \forall j \in \{1, \ldots, J\}$
3) Queues $Q_k(t)$ are mean rate stable $\forall k \in \{1, \ldots, K\}$
4) $\alpha(t) \in \mathcal{A}_{\omega(t)} \quad \forall t$
The above description of the problem is convenient, although we can state the problem more precisely, without assuming limits are well defined, as follows:

Minimize: $\limsup_{t \to \infty} \overline{y}_0(t)$ $\qquad (4.31)$
Subject to:
1) $\limsup_{t \to \infty} \overline{y}_l(t) \leq 0 \quad \forall l \in \{1, \ldots, L\}$ $\qquad (4.32)$
2) $\lim_{t \to \infty} \overline{e}_j(t) = 0 \quad \forall j \in \{1, \ldots, J\}$ $\qquad (4.33)$
3) Queues $Q_k(t)$ are mean rate stable $\forall k \in \{1, \ldots, K\}$ $\qquad (4.34)$
4) $\alpha(t) \in \mathcal{A}_{\omega(t)} \quad \forall t$ $\qquad (4.35)$
An example of such a problem is when we have a $K$-queue wireless network that must be stabilized subject to average power constraints $\overline{P}_l \leq P_l^{av}$ for each node $l \in \{1, \ldots, L\}$, where $\overline{P}_l$ represents the time average power of node $l$, and $P_l^{av}$ represents a pre-specified average power constraint. Suppose the goal is to maximize the time average of the total admitted traffic. Then $y_0(t)$ is $-1$ times the admitted traffic on slot $t$. We also define $y_l(t) = P_l(t) - P_l^{av}$, the difference between the power expenditure of node $l$ on slot $t$ and its time average constraint, so that $\overline{y}_l \leq 0$ corresponds to $\overline{P}_l \leq P_l^{av}$. In this example, there are no time average equality constraints, and so $J = 0$. See also Section 4.6 and Exercises 2.11, 4.7-4.14 for more examples.

Consider now the special class of stationary and randomized policies that we call ω-only policies, which observe $\omega(t)$ for each slot $t$ and independently choose a control action $\alpha(t) \in \mathcal{A}_{\omega(t)}$ as a pure (possibly randomized) function of the observed $\omega(t)$. Let $\alpha^*(t)$ represent the decisions under such an ω-only policy over time $t \in \{0, 1, 2, \ldots\}$. Because $\omega(t)$ has the stationary distribution $\pi(\omega)$ for all $t$, the expectations of the arrival, service, and attribute values are the same for all $t$:

$$E\{\hat{y}_l(\alpha^*(t), \omega(t))\} = \overline{y}_l \quad \forall l \in \{0, 1, \ldots, L\}$$
$$E\{\hat{e}_j(\alpha^*(t), \omega(t))\} = \overline{e}_j \quad \forall j \in \{1, \ldots, J\}$$
$$E\{\hat{a}_k(\alpha^*(t), \omega(t))\} = \overline{a}_k \quad \forall k \in \{1, \ldots, K\}$$
$$E\{\hat{b}_k(\alpha^*(t), \omega(t))\} = \overline{b}_k \quad \forall k \in \{1, \ldots, K\}$$
for some quantities $\overline{y}_l$, $\overline{e}_j$, $\overline{a}_k$, $\overline{b}_k$. In the case when $\Omega$ is finite or countably infinite, the expectations above can be understood as weighted sums over all $\omega$ values, weighted by the stationary distribution $\pi(\omega)$. Specifically:

$$E\{\hat{y}_l(\alpha^*(t), \omega(t))\} = \sum_{\omega \in \Omega} \pi(\omega) E\{\hat{y}_l(\alpha^*(t), \omega) \mid \omega(t) = \omega\}$$

The above expectations $\overline{y}_l$, $\overline{e}_j$, $\overline{a}_k$, $\overline{b}_k$ are finite under any ω-only policy because of the boundedness assumptions (4.25)-(4.30). In addition to assuming $\omega(t)$ is a stationary process, we make the following mild "law of large numbers" assumption concerning time averages (not time average expectations): Under any ω-only policy $\alpha^*(t)$ that yields expectations $\overline{y}_l$, $\overline{e}_j$, $\overline{a}_k$, $\overline{b}_k$ on every slot $t$, the infinite horizon time averages of $\hat{y}_l(\alpha^*(t), \omega(t))$, $\hat{e}_j(\alpha^*(t), \omega(t))$, $\hat{a}_k(\alpha^*(t), \omega(t))$, $\hat{b}_k(\alpha^*(t), \omega(t))$ are equal to $\overline{y}_l$, $\overline{e}_j$, $\overline{a}_k$, $\overline{b}_k$ with probability 1. For example:

$$\lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \hat{y}_l(\alpha^*(\tau), \omega(\tau)) = \overline{y}_l \quad \text{(w.p.1)}$$
where "(w.p.1)" means "with probability 1." This is a mild assumption that holds whenever $\omega(t)$ is i.i.d. over slots. This is because, by the law of large numbers, the resulting $\hat{y}_l(\alpha^*(t), \omega(t))$ process is i.i.d. over slots with finite mean $\overline{y}_l$. However, this also holds for a large class of other stationary processes, including stationary processes defined over finite state irreducible Discrete Time Markov Chains (as considered in Section 4.9). It does not hold, for example, for degenerate stationary processes where $\omega(0)$ can take different values according to some probability distribution, but is then held fixed for all slots thereafter, so that $\omega(t) = \omega(0)$ for all $t$.

Under these assumptions, we say that the problem (4.31)-(4.35) is feasible if there exists a control policy that satisfies the constraints (4.32)-(4.35). Assuming feasibility, define $y_0^{opt}$ as the infimum value of the cost metric (4.31) over all control policies that satisfy the constraints (4.32)-(4.35). This infimum is finite by (4.30). We emphasize that $y_0^{opt}$ considers all possible control policies that choose $\alpha(t) \in \mathcal{A}_{\omega(t)}$ over slots $t$, not just ω-only policies. However, in Appendix 4.A, it is shown that $y_0^{opt}$ can be computed in terms of ω-only policies. Specifically, it is shown that the set of all possible limiting time average expectations of the variables $[(y_l(t)), (e_j(t)), (a_k(t)), (b_k(t))]$, considering all possible algorithms, is equal to the closure of the set of all one-slot averages $[(\overline{y}_l), (\overline{e}_j), (\overline{a}_k), (\overline{b}_k)]$ achievable under ω-only policies. Further, the next theorem shows that if the problem (4.31)-(4.35) is feasible, then the utility $y_0^{opt}$ and the constraints $\overline{y}_l \leq 0$, $\overline{e}_j = 0$, $\overline{a}_k \leq \overline{b}_k$ can be achieved arbitrarily closely by ω-only policies.
Theorem 4.5 (Optimality over ω-only Policies) Suppose the $\omega(t)$ process is stationary with distribution $\pi(\omega)$, and that the system satisfies the boundedness assumptions (4.25)-(4.30) and the law of large numbers assumption specified above. If the problem (4.31)-(4.35) is feasible, then for any $\delta > 0$ there is an ω-only
policy $\alpha^*(t)$ that satisfies $\alpha^*(t) \in \mathcal{A}_{\omega(t)}$ for all $t$, and:

$$E\{\hat{y}_0(\alpha^*(t), \omega(t))\} \leq y_0^{opt} + \delta \qquad (4.36)$$
$$E\{\hat{y}_l(\alpha^*(t), \omega(t))\} \leq \delta \quad \forall l \in \{1, \ldots, L\} \qquad (4.37)$$
$$|E\{\hat{e}_j(\alpha^*(t), \omega(t))\}| \leq \delta \quad \forall j \in \{1, \ldots, J\} \qquad (4.38)$$
$$E\{\hat{a}_k(\alpha^*(t), \omega(t))\} \leq E\{\hat{b}_k(\alpha^*(t), \omega(t))\} + \delta \quad \forall k \in \{1, \ldots, K\} \qquad (4.39)$$

Proof. See Appendix 4.A. $\Box$
The inequalities (4.36)-(4.39) are similar to those seen in Chapter 3, which related the existence of such randomized policies to solutions of linear programs that yield the desired time averages. The stationarity of $\omega(t)$ simplifies the proof of Theorem 4.5 but is not crucial to its result. Similar results are derived in (15)(21)(136) without the stationarity assumption, but under the additional assumption that $\omega(t)$ can take at most a finite (but arbitrarily large) number of values and has well defined time averages.

We have stated Theorem 4.5 in terms of arbitrarily small values $\delta > 0$. It may be of interest to note that for most practical systems, there exists an ω-only policy that satisfies all inequalities (4.36)-(4.39) with $\delta = 0$. Appendix 4.A shows that this holds whenever the set of all one-slot expectations achievable under ω-only policies is closed. Thus, one may prefer a more "aesthetically pleasing" version of Theorem 4.5 that assumes this additional mild closure property in order to remove the appearance of "$\delta$" in the theorem statement. We have presented the theorem in the above form because it is sufficient for our purposes. In particular, we do not require the closure property in order to apply the Lyapunov optimization techniques developed next.
4.4 VIRTUAL QUEUES
To solve the problem (4.31)-(4.35), we first transform all inequality and equality constraints into queue stability problems. Specifically, define virtual queues $Z_l(t)$ and $H_j(t)$ for each $l \in \{1, \ldots, L\}$ and $j \in \{1, \ldots, J\}$, with update equations:

$$Z_l(t+1) = \max[Z_l(t) + y_l(t), 0] \qquad (4.40)$$
$$H_j(t+1) = H_j(t) + e_j(t) \qquad (4.41)$$

We assume throughout that initial conditions satisfy $Z_l(0) \geq 0$ for all $l \in \{1, \ldots, L\}$, $H_j(0) \in \mathbb{R}$ for all $j \in \{1, \ldots, J\}$, and that $E\{Z_l(0)^2\} < \infty$ and $E\{H_j(0)^2\} < \infty$ for all $l$ and $j$.

The virtual queue $Z_l(t)$ is used to enforce the $\overline{y}_l \leq 0$ constraint. Indeed, recall that if $Z_l(t)$ satisfies (4.40), then by our basic sample path properties in Chapter 2, we have for all $t > 0$:

$$\frac{Z_l(t)}{t} - \frac{Z_l(0)}{t} \geq \frac{1}{t} \sum_{\tau=0}^{t-1} y_l(\tau)$$

Taking expectations of the above and taking $t \to \infty$ shows:

$$\limsup_{t \to \infty} \frac{E\{Z_l(t)\}}{t} \geq \limsup_{t \to \infty} \overline{y}_l(t)$$

where we recall that $\overline{y}_l(t)$ is the time average expectation of $y_l(\tau)$ over $\tau \in \{0, \ldots, t-1\}$. Thus, if $Z_l(t)$ is mean rate stable, the left-hand side of the above inequality is 0, and so:

$$\limsup_{t \to \infty} \overline{y}_l(t) \leq 0$$

This means our desired time average constraint for $y_l(t)$ is satisfied. This turns the problem of satisfying a time average inequality constraint into a pure queue stability problem! This discussion is of course just a repeated derivation of Theorem 2.5 (as well as Exercise 2.11).

The $H_j(t)$ queue has a different structure, because it enforces an equality constraint rather than an inequality constraint, and can possibly be negative. The virtual queue $H_j(t)$ is designed to turn the time average equality constraint $\overline{e}_j = 0$ into a pure queue stability problem. It is easy to see by summing (4.41) that for any $t > 0$:

$$H_j(t) - H_j(0) = \sum_{\tau=0}^{t-1} e_j(\tau)$$

Taking expectations and dividing by $t$ yields:

$$\frac{E\{H_j(t)\}}{t} - \frac{E\{H_j(0)\}}{t} = \overline{e}_j(t)$$

Therefore, if $H_j(t)$ is mean rate stable, then:²

$$\lim_{t \to \infty} \overline{e}_j(t) = 0 \qquad (4.42)$$

so that the desired equality constraint for $e_j(t)$ is satisfied.

It follows that if we can design a control algorithm that chooses $\alpha(t) \in \mathcal{A}_{\omega(t)}$ for all $t$, makes all actual queues $Q_k(t)$ and virtual queues $Z_l(t)$, $H_j(t)$ mean rate stable, and yields a time average expectation of $y_0(t)$ that is equal to our target $y_0^{opt}$, then we have solved the problem (4.31)-(4.35). This transforms the original problem into a problem of minimizing the time average of a cost function subject to queue stability.

²Note by Jensen's inequality that $0 \leq |E\{H(t)\}| \leq E\{|H(t)|\}$, and so if $E\{|H(t)|\}/t \to 0$, then $|E\{H(t)\}|/t \to 0$.
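The two virtual queue updates (4.40)-(4.41) can be sketched directly; the numeric values are hypothetical. Note that a $Z_l$ queue is clipped at zero, while an $H_j$ queue can go negative:

```python
def virtual_queue_updates(Z, H, y, e):
    """Z_l(t+1) = max[Z_l(t) + y_l(t), 0]  -- enforces time average y_l <= 0.
       H_j(t+1) = H_j(t) + e_j(t)          -- enforces time average e_j = 0."""
    Z_next = [max(z + yl, 0.0) for z, yl in zip(Z, y)]
    H_next = [h + ej for h, ej in zip(H, e)]
    return Z_next, H_next

Z, H = virtual_queue_updates([2.0], [0.5], y=[-3.0], e=[-1.5])
print(Z, H)  # -> [0.0] [-1.0]
```

Keeping $Z_l(t)$ small certifies that the running sum of $y_l(\tau)$ never drifts persistently positive, while keeping $H_j(t)$ small certifies that the running sum of $e_j(\tau)$ stays near zero in both directions.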
4.5 THE MIN DRIFT-PLUS-PENALTY ALGORITHM

Let $\Theta(t) = [Q(t), Z(t), H(t)]$ be a concatenated vector of all actual and virtual queues, with update equations (4.23), (4.40), (4.41). If there are no equality constraints, we have $J = 0$ and we remove the $H_j(t)$ queues. If there are no inequality constraints, then $L = 0$ and we remove the $Z_l(t)$ queues. Define the Lyapunov function:

$$L(\Theta(t)) \triangleq \frac{1}{2} \sum_{k=1}^{K} Q_k(t)^2 + \frac{1}{2} \sum_{l=1}^{L} Z_l(t)^2 + \frac{1}{2} \sum_{j=1}^{J} H_j(t)^2 \qquad (4.43)$$

Lemma 4.6 Suppose $\omega(t)$ is i.i.d. over slots, and the boundedness assumptions in Section 4.2.1 hold. Under any control algorithm, the drift-plus-penalty expression has the following upper bound for all $t$, all possible values of $\Theta(t)$, and all parameters $V \geq 0$:

$$\Delta(\Theta(t)) + V E\{y_0(t) \mid \Theta(t)\} \leq B + V E\{y_0(t) \mid \Theta(t)\} + \sum_{k=1}^{K} Q_k(t) E\{a_k(t) - b_k(t) \mid \Theta(t)\} + \sum_{l=1}^{L} Z_l(t) E\{y_l(t) \mid \Theta(t)\} + \sum_{j=1}^{J} H_j(t) E\{e_j(t) \mid \Theta(t)\} \qquad (4.44)$$

where $B$ is a positive constant that satisfies the following for all $t$:

$$B \geq \frac{1}{2} \sum_{k=1}^{K} E\{a_k(t)^2 + b_k(t)^2 \mid \Theta(t)\} + \frac{1}{2} \sum_{l=1}^{L} E\{y_l(t)^2 \mid \Theta(t)\} + \frac{1}{2} \sum_{j=1}^{J} E\{e_j(t)^2 \mid \Theta(t)\} - \sum_{k=1}^{K} E\{\tilde{b}_k(t) a_k(t) \mid \Theta(t)\} \qquad (4.45)$$

where we recall that $\tilde{b}_k(t) \triangleq \min[Q_k(t), b_k(t)]$. Such a constant $B$ exists because $\omega(t)$ is i.i.d. over slots.

Proof. Squaring the queue update equation (4.23) and using the fact that $\max[q - b, 0]^2 \leq (q - b)^2$ yields:

$$Q_k(t+1)^2 \leq (Q_k(t) - b_k(t))^2 + a_k(t)^2 + 2\max[Q_k(t) - b_k(t), 0]\,a_k(t) = (Q_k(t) - b_k(t))^2 + a_k(t)^2 + 2(Q_k(t) - \tilde{b}_k(t))a_k(t)$$

Therefore:

$$\frac{Q_k(t+1)^2 - Q_k(t)^2}{2} \leq \frac{a_k(t)^2 + b_k(t)^2}{2} - \tilde{b}_k(t) a_k(t) + Q_k(t)[a_k(t) - b_k(t)] \qquad (4.46)$$

Similarly, squaring the virtual queue updates (4.40) and (4.41):

$$\frac{Z_l(t+1)^2 - Z_l(t)^2}{2} \leq \frac{y_l(t)^2}{2} + Z_l(t) y_l(t)$$

$$\frac{H_j(t+1)^2 - H_j(t)^2}{2} = \frac{e_j(t)^2}{2} + H_j(t) e_j(t) \qquad (4.47)$$

Taking conditional expectations of the above three equations and summing over $k \in \{1, \ldots, K\}$, $l \in \{1, \ldots, L\}$, $j \in \{1, \ldots, J\}$ gives a bound on $\Delta(\Theta(t))$. Adding $V E\{y_0(t) \mid \Theta(t)\}$ to both sides proves the result. $\Box$

Rather than directly minimize the expression $\Delta(\Theta(t)) + V E\{y_0(t) \mid \Theta(t)\}$ every slot $t$, our strategy actually seeks to minimize the bound given in the right-hand side of (4.44). This is done via the framework of opportunistically minimizing a (conditional) expectation as described in Section 1.8, and the resulting algorithm is given below.

Min Drift-Plus-Penalty Algorithm for solving (4.31)-(4.35): Every slot $t$, observe the current queue states $\Theta(t)$ and the random event $\omega(t)$, and make a control decision $\alpha(t) \in \mathcal{A}_{\omega(t)}$ as follows:

Minimize: $\displaystyle V \hat{y}_0(\alpha(t), \omega(t)) + \sum_{k=1}^{K} Q_k(t)[\hat{a}_k(\alpha(t), \omega(t)) - \hat{b}_k(\alpha(t), \omega(t))] + \sum_{l=1}^{L} Z_l(t) \hat{y}_l(\alpha(t), \omega(t)) + \sum_{j=1}^{J} H_j(t) \hat{e}_j(\alpha(t), \omega(t))$ $\qquad (4.48)$

Subject to: $\alpha(t) \in \mathcal{A}_{\omega(t)}$ $\qquad (4.49)$

Then update the virtual queues $Z_l(t)$ and $H_j(t)$ according to (4.40) and (4.41), and the actual queues $Q_k(t)$ according to (4.23).

A remarkable property of this algorithm is that it does not need to know the probabilities $\pi(\omega)$. After observing $\omega(t)$, it seeks to minimize a (possibly nonlinear, non-convex, and discontinuous) function of $\alpha$ over all $\alpha \in \mathcal{A}_{\omega(t)}$. Its complexity depends on the structure of the functions $\hat{a}_k(\cdot)$, $\hat{b}_k(\cdot)$, $\hat{y}_l(\cdot)$, $\hat{e}_j(\cdot)$. However, in the case when the set $\mathcal{A}_{\omega(t)}$ contains a finite (and small) number of possible control actions, the policy simply evaluates the function over each option and chooses the best one.

Before presenting the analysis, we note that the problem (4.48)-(4.49) may not have a well defined minimum when the set $\mathcal{A}_{\omega(t)}$ is infinite. Thus, rather than assuming our decisions obtain the exact minimum every slot (or come close to the infimum), we analyze the performance when our implementation comes within an additive constant of the infimum in the right-hand side of (4.44).

Definition 4.7 For a given constant $C \geq 0$, a C-additive approximation of the drift-plus-penalty algorithm is one that, every slot $t$ and given the current $\Theta(t)$, chooses a (possibly randomized) action $\alpha(t) \in \mathcal{A}_{\omega(t)}$ that yields a conditional expected value on the right-hand side of the drift expression (4.44) (given $\Theta(t)$) that is within a constant $C$ from the infimum over all possible control actions.

Definition 4.7 allows the deviation from the infimum to be in an expected sense, rather than a deterministic sense, which is useful in some applications. These C-additive approximations are also useful for implementations with out-of-date queue backlog information, and for achieving maximum throughput in interference networks via approximation algorithms, as shown in Chapter 6.
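When the action set is finite, the per-slot minimization (4.48)-(4.49) is a direct search over actions. The sketch below is an illustrative assumption, not code from the text: the function names and the toy single-queue usage (two actions, unit power penalty) are hypothetical, but the weighted expression being evaluated is exactly (4.48):

```python
def dpp_action(actions, omega, Q, Z, H, V, y0, a, b, y, e):
    """Choose alpha in A_omega minimizing expression (4.48):
       V*y0(alpha,omega) + sum_k Q_k*(a_k - b_k) + sum_l Z_l*y_l + sum_j H_j*e_j.
    Here y0, a, b, y, e are callables mapping (alpha, omega) to slot attributes."""
    def cost(alpha):
        return (V * y0(alpha, omega)
                + sum(q * (ak - bk)
                      for q, ak, bk in zip(Q, a(alpha, omega), b(alpha, omega)))
                + sum(z * yl for z, yl in zip(Z, y(alpha, omega)))
                + sum(h * ej for h, ej in zip(H, e(alpha, omega))))
    return min(actions, key=cost)

# Toy usage: one queue, actions "idle"/"serve", penalty = power = 1 per service.
act = dpp_action(
    actions=["idle", "serve"], omega=None, Q=[10.0], Z=[], H=[], V=3.0,
    y0=lambda al, om: 1.0 if al == "serve" else 0.0,
    a=lambda al, om: [0.0],
    b=lambda al, om: [1.0 if al == "serve" else 0.0],
    y=lambda al, om: [], e=lambda al, om: [])
print(act)  # -> serve  (cost: V*1 - Q*1 = 3 - 10 = -7 < 0)
```

With a small backlog (e.g., $Q = 1 < V = 3$), the same call returns "idle", reflecting the threshold behavior the weighted penalty term induces.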
Theorem 4.8 (Performance of Min Drift-Plus-Penalty Algorithm) Suppose that $\omega(t)$ is i.i.d. over slots with probabilities $\pi(\omega)$, the problem (4.31)-(4.35) is feasible, and that $E\{L(\Theta(0))\} < \infty$. Fix a value $C \geq 0$. If we use a C-additive approximation of the algorithm every slot $t$, then:

a) Time average expected cost satisfies:

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E\{y_0(\tau)\} \leq y_0^{opt} + \frac{B + C}{V} \qquad (4.50)$$

where $y_0^{opt}$ is the infimum time average cost achievable by any policy that meets the required constraints, and $B$ is defined in (4.45).

b) All queues $Q_k(t)$, $Z_l(t)$, $H_j(t)$ are mean rate stable, and all required constraints (4.32)-(4.35) are satisfied.

c) Suppose there are constants $\epsilon > 0$ and $\Phi(\epsilon)$ for which the Slater condition of Assumption A1 holds, stated below in (4.61)-(4.64). Then:

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{k=1}^{K} E\{Q_k(\tau)\} \leq \frac{B + C + V[\Phi(\epsilon) - y_0^{opt}]}{\epsilon} \qquad (4.51)$$

where $[\Phi(\epsilon) - y_0^{opt}] \leq y_{0,max} - y_{0,min}$, and $y_{0,min}$, $y_{0,max}$ are defined in (4.30).

The above theorem is for the case when $\omega(t)$ is i.i.d. over slots. The same algorithm can be shown to offer similar performance under more general ergodic $\omega(t)$ processes, as discussed in Section 4.9, as well as for non-ergodic processes. We note that the bounds given in (4.50) and (4.51) are not just infinite horizon bounds: Inequalities (4.58) and (4.59) in the below proof show that these bounds hold for all time $t > 0$ in the case when all initial queue backlogs are zero, and that a "fudge factor" that decays like $O(1/t)$ must be included if initial queue backlogs are nonzero.

Proof. Because, every slot $t$, our implementation comes within an additive constant $C$ of minimizing the right-hand side of the drift expression (4.44) over all $\alpha(t) \in \mathcal{A}_{\omega(t)}$, we have for each slot $t$:

$$\Delta(\Theta(t)) + V E\{y_0(t) \mid \Theta(t)\} \leq B + C + V E\{y_0^*(t) \mid \Theta(t)\} + \sum_{k=1}^{K} Q_k(t) E\{a_k^*(t) - b_k^*(t) \mid \Theta(t)\} + \sum_{l=1}^{L} Z_l(t) E\{y_l^*(t) \mid \Theta(t)\} + \sum_{j=1}^{J} H_j(t) E\{e_j^*(t) \mid \Theta(t)\} \qquad (4.52)$$

where $a_k^*(t) \triangleq \hat{a}_k(\alpha^*(t), \omega(t))$, $b_k^*(t) \triangleq \hat{b}_k(\alpha^*(t), \omega(t))$, $y_l^*(t) \triangleq \hat{y}_l(\alpha^*(t), \omega(t))$, $e_j^*(t) \triangleq \hat{e}_j(\alpha^*(t), \omega(t))$ are the resulting arrival, service, and attribute values under any alternative (possibly randomized) decision $\alpha^*(t) \in \mathcal{A}_{\omega(t)}$.

Now fix $\delta > 0$, and consider the ω-only policy $\alpha^*(t)$ that yields (4.36)-(4.39). Because this is an ω-only policy, and $\omega(t)$ is i.i.d. over slots, the resulting values of $y_0^*(t)$, $y_l^*(t)$, $a_k^*(t)$, $b_k^*(t)$, $e_j^*(t)$ are independent of the current queue backlogs $\Theta(t)$, and we have from (4.36)-(4.39):

$$E\{y_0^*(t) \mid \Theta(t)\} = E\{y_0^*(t)\} \leq y_0^{opt} + \delta \qquad (4.53)$$
$$E\{y_l^*(t) \mid \Theta(t)\} = E\{y_l^*(t)\} \leq \delta \quad \forall l \in \{1, \ldots, L\} \qquad (4.54)$$
$$|E\{e_j^*(t) \mid \Theta(t)\}| = |E\{e_j^*(t)\}| \leq \delta \quad \forall j \in \{1, \ldots, J\} \qquad (4.55)$$
$$E\{a_k^*(t) - b_k^*(t) \mid \Theta(t)\} = E\{a_k^*(t) - b_k^*(t)\} \leq \delta \quad \forall k \in \{1, \ldots, K\} \qquad (4.56)$$

Plugging these into the right-hand side of (4.52) and taking $\delta \to 0$ yields:

$$\Delta(\Theta(t)) + V E\{y_0(t) \mid \Theta(t)\} \leq B + C + V y_0^{opt} \qquad (4.57)$$

This is in the exact form for application of the Lyapunov Optimization Theorem (Theorem 4.2). Hence, from the above drift expression, all queues are mean rate stable, and so all required time average constraints are satisfied, which proves part (b). Further, we have for any $t > 0$ (from (4.13) of Theorem 4.2, or simply from taking iterated expectations and summing the telescoping series):

$$\frac{1}{t} \sum_{\tau=0}^{t-1} E\{y_0(\tau)\} \leq y_0^{opt} + \frac{B + C}{V} + \frac{E\{L(\Theta(0))\}}{Vt} \qquad (4.58)$$

which proves part (a) by taking a lim sup as $t \to \infty$.
To prove part (c), assume Assumption A1 holds (stated below), and consider the ω-only policy $\alpha^*(t)$ that yields (4.61)-(4.64). Plugging (4.61)-(4.64) into the right-hand side of the drift bound (4.52) yields:

$$\Delta(\Theta(t)) + V E\{y_0(t) \mid \Theta(t)\} \leq B + C + V \Phi(\epsilon) - \epsilon \sum_{k=1}^{K} Q_k(t)$$

Taking iterated expectations, summing the telescoping series, and rearranging terms as usual yields:

$$\frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{k=1}^{K} E\{Q_k(\tau)\} \leq \frac{B + C + V\left[\Phi(\epsilon) - \frac{1}{t}\sum_{\tau=0}^{t-1} E\{y_0(\tau)\}\right]}{\epsilon} + \frac{E\{L(\Theta(0))\}}{\epsilon t} \qquad (4.59)$$

However, because our algorithm satisfies all of the desired constraints of the optimization problem (4.31)-(4.35), its limiting time average expectation for $y_0(t)$ cannot be better than $y_0^{opt}$:

$$\liminf_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} E\{y_0(\tau)\} \geq y_0^{opt} \qquad (4.60)$$

Taking a lim sup of (4.59) as $t \to \infty$ and using (4.60) yields:

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \sum_{k=1}^{K} E\{Q_k(\tau)\} \leq \frac{B + C + V[\Phi(\epsilon) - y_0^{opt}]}{\epsilon}$$

which proves part (c). $\Box$

The following is the Assumption A1 needed in part (c) of Theorem 4.8.

Assumption A1 (Slater Condition): There are values $\epsilon > 0$ and $\Phi(\epsilon)$ (where $y_{0,min} \leq \Phi(\epsilon) \leq y_{0,max}$) and an ω-only policy $\alpha^*(t)$ that satisfies:

$$E\{\hat{y}_0(\alpha^*(t), \omega(t))\} = \Phi(\epsilon) \qquad (4.61)$$
$$E\{\hat{y}_l(\alpha^*(t), \omega(t))\} \leq 0 \quad \forall l \in \{1, \ldots, L\} \qquad (4.62)$$
$$E\{\hat{e}_j(\alpha^*(t), \omega(t))\} = 0 \quad \forall j \in \{1, \ldots, J\} \qquad (4.63)$$
$$E\{\hat{a}_k(\alpha^*(t), \omega(t))\} \leq E\{\hat{b}_k(\alpha^*(t), \omega(t))\} - \epsilon \quad \forall k \in \{1, \ldots, K\} \qquad (4.64)$$

Assumption A1 ensures strong stability of the $Q_k(t)$ queues; indeed, this fact is shown in Appendix 4.A (equation (4.96)), even without Assumption A1 (see Exercise 4.9). However, often the structure of a particular problem allows stronger deterministic queue bounds. A variation on the above proof that considers probability 1 convergence is treated in Exercise 4.15.

4.5.1 WHERE ARE WE USING THE I.I.D. ASSUMPTIONS?

In (4.53)-(4.56) of the above proof, we used equalities of the form $E\{y_l^*(t) \mid \Theta(t)\} = E\{y_l^*(t)\}$, which hold for any ω-only policy $\alpha^*(t)$ when $\omega(t)$ is i.i.d. over slots. Because past values of $\omega(\tau)$ for $\tau < t$ have influenced the current queue states $\Theta(t)$, this influence might skew the conditional distribution of $\omega(t)$ (given $\Theta(t)$) unless $\omega(t)$ is independent of the past. Thus, while the i.i.d. assumption is crucial for the above proof, it is not crucial for efficient performance of the algorithm, as shown in Section 4.9.

4.6 EXAMPLES

Here we provide examples of using the drift-plus-penalty algorithm for the same systems considered in Sections 2.3.1 and 2.3.2. More examples are given in Exercises 4.7-4.14.

4.6.1 DYNAMIC SERVER SCHEDULING

Example Problem: Consider the 3-queue, 2-server system described in Section 2.3.1. Define $\omega(t) \triangleq (a_1(t), a_2(t), a_3(t))$ as the random arrivals on slot $t$, and assume $\omega(t)$ is i.i.d. over slots with $E\{a_i(t)\} = \lambda_i$ and $E\{a_i(t)^2\} = E\{a_i^2\}$ for $i \in \{1, 2, 3\}$.

a) Suppose $(\lambda_1, \lambda_2, \lambda_3) \in \Lambda$, where we recall that $\Lambda$ is defined by the constraints $0 \leq \lambda_i \leq 1$ for all $i \in \{1, 2, 3\}$, and $\lambda_1 + \lambda_2 + \lambda_3 \leq 2$. State the drift-plus-penalty algorithm (with $V = 0$ and $C = 0$) for stabilizing all three queues.
b) Suppose the Slater condition (Assumption A1) holds for a value ε > 0. Derive a value B such that the time average queue backlog satisfies Q̄1 + Q̄2 + Q̄3 ≤ B/ε, where Q̄1 + Q̄2 + Q̄3 is the lim sup time average expected backlog in the system.

c) Suppose that choosing b(t) = (1, 1, 0) or b(t) = (1, 0, 1) consumes one unit of power per slot, but using the vector b(t) = (0, 1, 1) uses two units of power per slot. State the drift-plus-penalty algorithm (with V > 0 and C = 0) that seeks to minimize time average power subject to queue stability. Assuming the Slater condition of part (b), conclude that Q̄1 + Q̄2 + Q̄3 ≤ (B + V)/ε. Conclude also that p̄ ≤ p^opt + B/V, where p̄ is the lim sup time average expected power expenditure of the algorithm, and p^opt is the minimum possible time average power expenditure required for queue stability.

Solution: a) We have K = 3 with queues Q1(t), Q2(t), Q3(t). There is no penalty to minimize, so y0(t) = 0 (and so we also choose V = 0). There are no additional yl(t) or ej(t) attributes, and so L = J = 0. The control action α(t) determines the server allocations, so that α(t) = (b1(t), b2(t), b3(t)), and the set of possible action vectors is A = {(1, 1, 0), (1, 0, 1), (0, 1, 1)} (so that we choose which two queues to serve on each slot). The control action does not affect the arrivals, and so âk(α(t), ω(t)) = ak(t). The algorithm (4.48)-(4.49) with V = 0 reduces to observing the queue backlogs every slot t and choosing (b1(t), b2(t), b3(t)) as follows:

    Minimize:    −Σ_{k=1}^{3} Qk(t)bk(t)                                     (4.65)
    Subject to:  (b1(t), b2(t), b3(t)) ∈ {(1, 1, 0), (1, 0, 1), (0, 1, 1)}   (4.66)

Then update the queues Qk(t) according to (4.23). Note that the problem (4.65)-(4.66) is equivalent to minimizing Σ_{k=1}^{3} Qk(t)[ak(t) − bk(t)] subject to the same constraints, but to minimize this, it suffices to minimize only the terms we can control (so we can remove the Σ_{k=1}^{3} Qk(t)ak(t) term that is the same regardless of our control decision). It is easy to see that the problem (4.65)-(4.66) reduces to choosing the two largest queues to serve every slot, breaking ties arbitrarily. This simple policy does not require any knowledge of (λ1, λ2, λ3), yet ensures all queues are mean rate stable whenever possible!

b) From (4.45), and using the fact that L = J = 0 and bk(t)ak(t) ≥ 0, it suffices to find a value B that satisfies:

    B ≥ (1/2) Σ_{k=1}^{3} E{ak(t)² | Θ(t)} + (1/2) Σ_{k=1}^{3} E{bk(t)² | Θ(t)}

Because ak(t) is i.i.d. over slots, it is independent of Θ(t), and so E{ak(t)² | Θ(t)} = E{ak²}. Further, bk(t)² = bk(t) (because bk(t) ∈ {0, 1}), and b1(t) + b2(t) + b3(t) ≤ 2 for all t (regardless of Θ(t)). Thus, we can choose:

    B = (1/2) Σ_{k=1}^{3} E{ak²} + 1

Because Assumption A1 is satisfied and V = C = 0, we have from (4.51) that:

    Q̄1 + Q̄2 + Q̄3 ≤ B/ε
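To make the part (a) rule concrete, here is a minimal Python sketch (not from the text; the function names and the Bernoulli arrival rates are illustrative choices, picked so that λ1 + λ2 + λ3 ≤ 2) of the V = 0 decision rule and the queue update (4.23):

```python
import random

def serve_two_largest(queues):
    """V = 0 drift-plus-penalty decision for (4.65)-(4.66): minimizing
    -sum_k Q_k(t) b_k(t) over the three allowed service vectors is the
    same as serving the two largest queues (ties broken arbitrarily)."""
    order = sorted(range(3), key=lambda k: queues[k], reverse=True)
    b = [0, 0, 0]
    b[order[0]] = b[order[1]] = 1
    return b

def step(queues, arrivals):
    """Queue update (4.23): Q_k(t+1) = max[Q_k(t) - b_k(t), 0] + a_k(t)."""
    b = serve_two_largest(queues)
    return [max(q - s, 0) + a for q, s, a in zip(queues, b, arrivals)]

# Bernoulli arrivals with rates inside the capacity region (sum = 1.8 <= 2).
random.seed(1)
q = [0, 0, 0]
for _ in range(10000):
    q = step(q, [1 if random.random() < lam else 0 for lam in (0.5, 0.6, 0.7)])
print(q)  # backlogs remain small under this stabilizing policy
```

Note that the rule never consults (λ1, λ2, λ3), matching the observation that the policy requires no knowledge of the arrival rates.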
c) We now define the penalty y0(t) = ŷ0(b1(t), b2(t), b3(t)), where:

    ŷ0(b1(t), b2(t), b3(t)) = 1 if (b1(t), b2(t), b3(t)) ∈ {(1, 1, 0), (1, 0, 1)}
                              2 if (b1(t), b2(t), b3(t)) = (0, 1, 1)

Then the drift-plus-penalty algorithm (with V > 0) now observes (Q1(t), Q2(t), Q3(t)) every slot t and chooses a server allocation to solve:

    Minimize:    V ŷ0(b1(t), b2(t), b3(t)) − Σ_{k=1}^{3} Qk(t)bk(t)          (4.67)
    Subject to:  (b1(t), b2(t), b3(t)) ∈ {(1, 1, 0), (1, 0, 1), (0, 1, 1)}   (4.68)

This can be solved easily by comparing the value of (4.67) associated with each option:

• Option (1, 1, 0): value = V − Q1(t) − Q2(t).
• Option (1, 0, 1): value = V − Q1(t) − Q3(t).
• Option (0, 1, 1): value = 2V − Q2(t) − Q3(t).

Thus, every slot t we pick the option with the smallest of the above three values, breaking ties arbitrarily. This is again a simple dynamic algorithm that does not require knowledge of the rates (λ1, λ2, λ3). By (4.50), we know that the achieved time average power p̄ (where p̄ = ȳ0) satisfies p̄ ≤ p^opt + B/V, where B is defined in part (b). Further, because y0,max = 2 and y0,min = 1, we have from (4.51) that the resulting average backlog satisfies Q̄1 + Q̄2 + Q̄3 ≤ (B + (2 − 1)V)/ε, where ε is defined in (b). This illustrates the [O(1/V), O(V)] tradeoff between average power and average backlog.

The above problem assumes we must allocate exactly two servers on every slot. The problem can of course be modified if we allow the option of serving only 1 queue, or 0 queues, at some reduced power expenditure.
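The three-way comparison in part (c) is a one-line minimization. A small sketch (the function name and the sample backlog values are invented for illustration):

```python
def power_aware_allocation(V, Q1, Q2, Q3):
    """Evaluate V*y0(b) - sum_k Q_k(t) b_k(t) for the three options of
    (4.67): (1,1,0) and (1,0,1) cost one power unit, (0,1,1) costs two.
    Returns the service vector with the smallest value."""
    options = {
        (1, 1, 0): V * 1 - Q1 - Q2,
        (1, 0, 1): V * 1 - Q1 - Q3,
        (0, 1, 1): V * 2 - Q2 - Q3,
    }
    return min(options, key=options.get)

# Small V: backlogs dominate and the expensive option can win.
print(power_aware_allocation(1, 10, 50, 40))     # → (0, 1, 1)
# Large V: power dominates and a one-unit option is chosen.
print(power_aware_allocation(1000, 10, 50, 40))  # → (1, 1, 0)
```

The two calls illustrate the [O(1/V), O(V)] tension: raising V biases the rule toward cheaper power at the cost of letting backlogs grow.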
4.6.2 OPPORTUNISTIC SCHEDULING

Example Problem: Consider the 2-queue wireless system with ON/OFF channels described in Section 2.3.2 (see Fig. 2.2). Suppose channel vectors (S1(t), S2(t)) are i.i.d. over slots with Si(t) ∈ {ON, OFF}. However, suppose that new arrivals are not immediately sent into the queue, but are only admitted via a flow control decision.
Specifically, suppose that (A1(t), A2(t)) represents the random vector of new packet arrivals on slot t, where A1(t) is i.i.d. over slots and Bernoulli with Pr[A1(t) = 1] = λ1, and A2(t) is i.i.d. over slots and Bernoulli with Pr[A2(t) = 1] = λ2. Every slot a flow controller observes (A1(t), A2(t)) and makes an admission decision a1(t) ∈ {0, A1(t)}, a2(t) ∈ {0, A2(t)}. Packets that are not admitted are dropped.

a) Use the drift-plus-penalty method (with V > 0 and C = 0) to stabilize the queues while seeking to maximize the linear utility function of throughput w1ā1 + w2ā2, where w1 and w2 are given positive weights and āk represents the time average rate of data admitted to queue k.

b) Assuming the Slater condition of Assumption A1 holds for some value ε > 0, state the resulting utility and average backlog performance.

c) Redo parts (a) and (b) with the additional constraint that ā1 ≥ 0.1 (assuming this constraint, together with the others, is feasible).

Solution: a) We have K = 2 queues to stabilize. There are no other attributes yl(t) or ej(t), so L = J = 0. We thus have ω(t) = [(S1(t), S2(t)), (A1(t), A2(t))]. The control action is given by α(t) = [(α1(t), α2(t)), (β1(t), β2(t))], where αk(t) is a binary value that is 1 if we choose to admit the packet (if any) arriving to queue k on slot t, and βk(t) is a binary value that is 1 if we choose to serve queue k on slot t, with the constraint β1(t) + β2(t) ≤ 1. The arrival and service variables are given by âk(t) = âk(αk(t), Ak(t)) and b̂k(t) = b̂k(βk(t), Sk(t)) for k ∈ {1, 2}, where:

    âk(αk(t), Ak(t)) = αk(t)Ak(t),    b̂k(βk(t), Sk(t)) = βk(t)1{Sk(t)=ON}

where 1{Sk(t)=ON} is an indicator function that is 1 if Sk(t) = ON, and 0 else. We have penalty function y0(t) = −w1a1(t) − w2a2(t) (so that minimizing the time average of this penalty maximizes w1ā1 + w2ā2).

The drift-plus-penalty algorithm of (4.48) thus reduces to observing the queue backlogs (Q1(t), Q2(t)) and the current network state ω(t) = [(S1(t), S2(t)), (A1(t), A2(t))] every slot t and making flow control and transmission actions αk(t) and βk(t) to solve:

    Min:       −V[w1α1(t)A1(t) + w2α2(t)A2(t)] + Σ_{k=1}^{2} Qk(t)[αk(t)Ak(t) − βk(t)1{Sk(t)=ON}]
    Subj. to:  αk(t) ∈ {0, 1}, βk(t) ∈ {0, 1} ∀k ∈ {1, 2}, β1(t) + β2(t) ≤ 1

The flow control and transmission decisions appear in separate terms in the above problem, and so they can be chosen to minimize their respective terms separately. This reduces to the following simple algorithm:

• (Flow Control) For each k ∈ {1, 2}, choose αk(t) = 1 (so that we admit Ak(t) to queue k) whenever Vwk ≥ Qk(t), and choose αk(t) = 0 else.
• (Transmission) Choose (β1(t), β2(t)) subject to the constraints to maximize Q1(t)β1(t)1{S1(t)=ON} + Q2(t)β2(t)1{S2(t)=ON}. Specifically, we place the server at the queue that is ON and that has the largest value of queue backlog, breaking ties arbitrarily. This reduces to the "Longest Connected Queue" algorithm of (8).

Then update the queues Qk(t) according to (4.23).

b) We compute B from (4.45). Because L = J = 0, we choose B to satisfy:

    B ≥ (1/2) Σ_{k=1}^{2} E{ak(t)² | Θ(t)} + (1/2) Σ_{k=1}^{2} E{bk(t)² | Θ(t)}

Because arrivals are i.i.d. Bernoulli, they are independent of queue backlog, and so E{ak(t)² | Θ(t)} ≤ E{Ak(t)²} = E{Ak(t)} = λk. Further, bk(t)² = bk(t), and b1(t) + b2(t) ≤ 1 for all t. Thus we can choose:

    B = (λ1 + λ2 + 1)/2

It follows from (4.50) that:

    w1ā1 + w2ā2 ≥ utility^opt − B/V

where utility^opt is the maximum possible utility value subject to stability. Further, because y0,min = −(w1 + w2) and y0,max = 0, we have from (4.51):

    Q̄1 + Q̄2 ≤ (B + V(w1 + w2))/ε

c) The constraint ā1 ≥ 0.1 is equivalent to 0.1 − ā1 ≤ 0. This can be viewed as introducing an additional penalty y1(t) = 0.1 − a1(t). To enforce this constraint, we simply introduce a virtual queue Z1(t) as follows:

    Z1(t + 1) = max[Z1(t) + 0.1 − a1(t), 0]                              (4.69)

The drift-plus-penalty algorithm (4.48) reduces to observing the queue backlogs and network state ω(t) every slot t and making actions to solve:

    Min:       −V[w1α1(t)A1(t) + w2α2(t)A2(t)] + Σ_{k=1}^{2} Qk(t)[αk(t)Ak(t) − βk(t)1{Sk(t)=ON}] + Z1(t)[0.1 − α1(t)A1(t)]
    Subj. to:  αk(t) ∈ {0, 1}, βk(t) ∈ {0, 1} ∀k ∈ {1, 2}, β1(t) + β2(t) ≤ 1

This reduces to:

• (Flow Control) Choose α1(t) = 1 whenever Vw1 + Z1(t) ≥ Q1(t), and choose α1(t) = 0 else. Choose α2(t) = 1 whenever Vw2 ≥ Q2(t), and choose α2(t) = 0 else.
• (Transmission) Choose (β1(t), β2(t)) the same as in part (a).

Then update the virtual queue Z1(t) according to (4.69) at the end of the slot, and update the queues Qk(t) according to (4.23).
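The per-slot decisions of part (c) can be sketched in a few lines of Python. This is a minimal illustration, not the text's implementation: the helper names, the "ON"/"OFF" string encoding, and the rule of skipping empty ON queues (harmless, since serving an empty queue transmits nothing) are all choices made here for clarity.

```python
def slot_decisions(V, w, Q, Z1, A, S):
    """Flow control and transmission for one slot: admit to queue 1 iff
    V*w1 + Z1(t) >= Q1(t), admit to queue 2 iff V*w2 >= Q2(t), and serve
    the ON queue with the largest backlog (Longest Connected Queue)."""
    alpha = [0, 0]
    if A[0] == 1 and V * w[0] + Z1 >= Q[0]:
        alpha[0] = 1
    if A[1] == 1 and V * w[1] >= Q[1]:
        alpha[1] = 1
    beta = [0, 0]
    on = [k for k in range(2) if S[k] == "ON" and Q[k] > 0]
    if on:
        beta[max(on, key=lambda k: Q[k])] = 1
    return alpha, beta

def update(Q, Z1, alpha, beta, A, S):
    """Queue update (4.23) and virtual queue update (4.69)."""
    a = [alpha[k] * A[k] for k in range(2)]
    b = [beta[k] * (1 if S[k] == "ON" else 0) for k in range(2)]
    Qn = [max(Q[k] - b[k], 0) + a[k] for k in range(2)]
    Zn = max(Z1 + 0.1 - a[0], 0)
    return Qn, Zn
```

Setting Z1 = 0 everywhere recovers the part (a) algorithm, since the admission threshold then reduces to Vwk ≥ Qk(t).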
4.7 VARIABLE V ALGORITHMS

The [O(1/V), O(V)] performance-delay tradeoff suggests that if we use a variable parameter V(t) that gradually increases with time, then we can maintain mean rate stability while driving the time average penalty to its exact optimum value y0^opt. This is shown below, and is analogous to diminishing stepsize methods for static convex optimization problems (133)(134). While this variable V approach yields the exact optimum y0^opt, its disadvantage is that we achieve only mean rate stability and not strong stability, so that there is no finite bound on average queue size and average delay. In fact, it is known that for typical problems (except for those with a trivial structure), average backlog and delay necessarily grow to infinity as we push performance closer and closer to optimal, becoming infinity at the optimal point (50)(51)(52)(53). The very large queue sizes incurred by this variable V algorithm also make it more difficult to adapt to changes in system parameters, whereas fixed V algorithms can easily adapt.

Theorem 4.9 Suppose that ω(t) is i.i.d. over slots with probabilities π(ω), the problem (4.31)-(4.35) is feasible, and E{L(Θ(0))} < ∞. Suppose that every slot t, we implement a C-additive approximation that comes within C ≥ 0 of the infimum of a modified right-hand-side of (4.44), where the V parameter is replaced with V(t), defined:

    V(t) = V0(t + 1)^β    ∀t ∈ {0, 1, 2, ...}                            (4.70)

for some constants V0 > 0 and β such that 0 < β < 1. Then all queues are mean rate stable, all required constraints (4.32)-(4.35) are satisfied, and:

    lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{y0(τ)} = y0^opt

The manner in which the V0 and β parameters affect convergence is described in the proof, specifically in (4.72) and (4.73).

Proof. (Theorem 4.9) Repeating the proof of Theorem 4.8 by replacing V with V(t) for a given slot t, the equation (4.57) becomes:

    Δ(Θ(t)) + V(t)E{y0(t) | Θ(t)} ≤ B + C + V(t)y0^opt

Taking expectations of both sides of the above and using iterated expectations yields:

    E{L(Θ(t + 1))} − E{L(Θ(t))} + V(t)E{y0(t)} ≤ B + C + V(t)y0^opt      (4.71)

Noting that E{y0(t)} ≥ y0,min yields:

    E{L(Θ(t + 1))} − E{L(Θ(t))} ≤ B + C + V(t)(y0^opt − y0,min)

The above holds for all t ≥ 0. Summing over τ ∈ {0, ..., t − 1} yields:

    E{L(Θ(t))} − E{L(Θ(0))} ≤ (B + C)t + (y0^opt − y0,min) Σ_{τ=0}^{t−1} V(τ)

Using the definition of the Lyapunov function in (4.43) yields the following for all t > 0:

    Σ_{k=1}^{K} E{Qk(t)²} + Σ_{l=1}^{L} E{Zl(t)²} + Σ_{j=1}^{J} E{Hj(t)²} ≤ 2(B + C)t + 2E{L(Θ(0))} + 2(y0^opt − y0,min) Σ_{τ=0}^{t−1} V(τ)    (4.72)

Take any queue Qk(t). Because E{Qk(t)}² ≤ E{Qk(t)²}, we have:

    E{Qk(t)} ≤ sqrt[ 2(B + C)t + 2E{L(Θ(0))} + 2(y0^opt − y0,min) Σ_{τ=0}^{t−1} V(τ) ]

and the same bound holds for E{Zl(t)} and E{Hj(t)} for all l ∈ {1, ..., L}, j ∈ {1, ..., J}. Dividing both sides of the above inequality by t yields the following for all t > 0:

    E{Qk(t)}/t ≤ sqrt[ 2(B + C)/t + 2E{L(Θ(0))}/t² + (2(y0^opt − y0,min)/t²) Σ_{τ=0}^{t−1} V(τ) ]

and the same bound holds for all E{Zl(t)}/t and E{Hj(t)}/t. However, we have:

    0 ≤ (1/t²) Σ_{τ=0}^{t−1} V(τ) = (V0/t²) Σ_{τ=0}^{t−1} (1 + τ)^β ≤ (V0/t²) ∫_0^t (1 + v)^β dv = (V0/t²)[(1 + t)^{1+β} − 1]/(1 + β)

Because 0 < β < 1, taking a limit of the above as t → ∞ shows that (1/t²) Σ_{τ=0}^{t−1} V(τ) → 0. Using this and taking a limit of (4.72) shows that all queues are mean rate stable, and hence (by Section 4.4) all required constraints (4.32)-(4.35) are satisfied.

To prove that the time average expectation of y0(t) converges to y0^opt, consider again the inequality (4.71). Dividing both sides of (4.71) by V(t) yields:

    [E{L(Θ(t + 1))} − E{L(Θ(t))}]/V(t) + E{y0(t)} ≤ (B + C)/V(t) + y0^opt

Summing the above over τ ∈ {0, ..., t − 1} and collecting terms yields:

    E{L(Θ(t))}/V(t − 1) − E{L(Θ(0))}/V(0) + Σ_{τ=1}^{t−1} [1/V(τ − 1) − 1/V(τ)]E{L(Θ(τ))} + Σ_{τ=0}^{t−1} E{y0(τ)} ≤ t·y0^opt + (B + C) Σ_{τ=0}^{t−1} 1/V(τ)

Because V(t) is non-decreasing, we have for all τ ≥ 1:

    1/V(τ − 1) − 1/V(τ) ≥ 0

Using this in the above inequality and dividing by t yields:

    (1/t) Σ_{τ=0}^{t−1} E{y0(τ)} ≤ y0^opt + (B + C)(1/t) Σ_{τ=0}^{t−1} 1/V(τ) + E{L(Θ(0))}/(V(0)t)    (4.73)

However:

    0 ≤ (1/t) Σ_{τ=0}^{t−1} 1/V(τ) ≤ 1/(tV(0)) + (1/(V0 t)) ∫_0^t dv/(1 + v)^β = 1/(tV(0)) + (1/(V0 t))[(1 + t)^{1−β} − 1]/(1 − β)

Taking a limit as t → ∞ shows that this term vanishes, and so the lim sup of the left-hand-side in (4.73) is less than or equal to y0^opt. However, the policy satisfies all constraints (4.32)-(4.35), and so the lim inf must be greater than or equal to y0^opt (by the Appendix 4.A result (4.96)). Thus the limit exists and is equal to y0^opt. □

The queue backlogs under this algorithm can be viewed as a stochastic version of a Lagrange multiplier for classical static convex optimization problems (see (45)(37) for more intuition on this), and they need to be large to appropriately inform the stochastic optimizer about good decisions to take.
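The two vanishing averages used in the proof of Theorem 4.9 are easy to check numerically. A quick sketch (the values V0 = 10 and β = 0.5 are arbitrary choices satisfying the theorem's conditions):

```python
def V_schedule(V0, beta, t):
    """V(t) = V0 * (t + 1)^beta, as in (4.70)."""
    return V0 * (t + 1) ** beta

def proof_averages(V0, beta, t):
    """Return (1/t^2) * sum_{tau<t} V(tau) and (1/t) * sum_{tau<t} 1/V(tau).
    Both must vanish as t grows whenever 0 < beta < 1: the first gives
    mean rate stability, the second drives the penalty to y0^opt."""
    vs = [V_schedule(V0, beta, tau) for tau in range(t)]
    return sum(vs) / t**2, sum(1.0 / v for v in vs) / t

for t in (10**3, 10**4, 10**5):
    s1, s2 = proof_averages(10.0, 0.5, t)
    print(t, round(s1, 5), round(s2, 5))  # both columns shrink toward 0
```

Larger β accelerates the convergence of the penalty term but slows the decay of the backlog term, which is the convergence tradeoff the proof alludes to.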
4.8 PLACE-HOLDER BACKLOG

Here we present a simple delay improvement for the fixed-V drift-plus-penalty algorithm, called place-holder backlog (142)(143). This allows the same performance with reduced queue backlog. To develop the technique, we make the following three preliminary observations:

• The infinite horizon time average expected penalty and backlog bounds of Theorem 4.8 are insensitive to the initial condition Θ(0).
• All sample paths of backlog and penalty are the same under any service order for the Qk(t) queues, provided that the queueing dynamics satisfy (4.23). In particular, the results are the same if service is First-In-First-Out (FIFO) or Last-In-First-Out (LIFO).
• It is often the case that, under the drift-plus-penalty algorithm (or a particular C-additive approximation of it), some queues are never served until they have at least a certain minimum amount of backlog. In this case, we can trick the stochastic optimizer by making it think actual queue backlog is larger than it really is.

The third observation motivates the following definition.

Definition 4.10 (Place-Holder Values) A non-negative value Qk^place is a place-holder value for network queue Qk(t) with respect to a given algorithm if, for all possible sample paths, we have Qk(t) ≥ Qk^place for all slots t ≥ 0 whenever Qk(0) ≥ Qk^place. Likewise, a non-negative value Zl^place is a place-holder value for queue Zl(t) if, for all possible sample paths, we have Zl(t) ≥ Zl^place for all t ≥ 0 whenever Zl(0) ≥ Zl^place.

Clearly 0 is a place-holder value for all queues Qk(t) and Zl(t), but the idea is to compute the largest possible place-holder values. It is often easy to pre-compute positive place-holder values without knowing anything about the system probabilities. This has already been illustrated in the example minimum average power problem of the previous chapter (Section 3.2.2), where queue 2 is never served until it has at least a certain minimum amount of backlog. Exercises 4.8 and 4.11 provide further examples.

Suppose first we run the algorithm with actual initial backlog Qk(0) = Qk^place for all k ∈ {1, ..., K}. Because queue backlog never dips below Qk^place, none of this initial backlog would ever exit the system under LIFO! Thus, we can achieve the same performance by replacing this initial backlog with fake backlog, which reduces the actual backlog by an amount exactly equal to Qk^place. Specifically, we use place-holder backlogs Qk^place, Zl^place, so that:

    Qk(0) = Qk^place,    Zl(0) = Zl^place    ∀k ∈ {1, ..., K}, l ∈ {1, ..., L}

However, we initialize the actual backlog Qk^actual(0) = Zl^actual(0) = 0. We then operate the algorithm using the Qk(t) and Zl(t) values (not the actual values Qk^actual(t) and Zl^actual(t)), serving the actual data in any order we like (such as FIFO or LIFO). Whenever a transmission opportunity arises, we transmit only actual data whenever possible; because queue backlog never dips below the place-holder values, we never have to serve any fake data. The above discussion ensures that for all time t:

    Qk^actual(t) = Qk(t) − Qk^place,    Zl^actual(t) = Zl(t) − Zl^place    ∀t ≥ 0

Because the bounds in Theorem 4.8 are independent of the initial condition, the same penalty and backlog bounds are achieved, but the actual backlog is reduced by exactly Qk^place and Zl^place at every instant of time. This is a "free" reduction in the queue backlog: it does not affect the sample path, and hence does not affect the limiting time average penalty. We developed this method of place-holder bits in (143) for use in dynamic data compression problems, and in (142) for general constrained cost minimization problems (including multi-hop wireless networks with unreliable channels).

Fig. 4.2 below provides further insight. It shows a sample path of Q2(t) for the same example system of Section 3.2.2 (using V = 100), with place-holder value Q2^place = 48 used as the initial backlog. The figure illustrates that Q2(t) indeed never drops below 48, and the place-holder savings is the reduction of actual backlog by exactly 48 packets at every instant of time.
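The bookkeeping above can be sketched for a single queue in a few lines. This is an invented toy illustration (the threshold rule and the numbers are not from the text); the key requirement is that the place-holder is valid for the given rule, i.e., the algorithm never drives its (inflated) backlog below Q_place:

```python
def run_with_placeholder(Q_place, arrivals, service_rule):
    """Run one queue whose algorithm sees backlog Q(t) initialized to the
    fake value Q_place, while the actual stored data starts at 0.  When
    Q_place is a valid place-holder (Q(t) never dips below it), the actual
    backlog is always Q(t) - Q_place: a free reduction."""
    Q = Q_place                      # algorithm's (inflated) view
    actual = []
    for a in arrivals:
        b = service_rule(Q)          # decision uses the inflated backlog
        Q = max(Q - b, 0) + a        # update (4.23)
        actual.append(Q - Q_place)   # real data stored in the queue
    return actual

# Toy rule that only serves when backlog exceeds 2, so Q_place = 2 is a
# valid place-holder value for it.
rule = lambda Q: 1 if Q > 2 else 0
print(run_with_placeholder(2, [1, 1, 0, 0], rule))  # → [1, 1, 0, 0]
```

The actual backlog trajectory is identical to what the algorithm "sees" minus the constant offset, which is exactly the savings illustrated in Fig. 4.2.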
[Figure 4.2: A sample path of Q2(t) (in packets) over 3000 slots for the example system of Section 3.2.2, showing the place-holder value Q2^place and the corresponding savings in actual backlog.]

That LIFO can improve delay can be understood from Fig. 4.2: While the figure illustrates that Q2(t) never drops below Q2^place, the backlog actually increases until it reaches a "plateau" around 100 packets, and then oscillates with some noise about this value. Work in (45) shows that scaled queue backlog converges to a Lagrange multiplier of a related static optimization problem, and work in (37) shows that actual queue backlog oscillates very closely about this Lagrange multiplier. Specifically, it is shown in (37) that, under mild assumptions, the steady state backlog distribution decays exponentially in distance from the Lagrange multiplier value.

A more aggressive place-holder technique is developed in (37). The idea of (37) can be illustrated easily from Fig. 4.2: First, we can almost double the place-holder value in the figure, raising the horizontal line up to a level that is close to the minimum backlog value seen in the plateau. While we cannot guarantee that backlog will never drop below this new line, the idea is to show that such events occur rarely. The work (37) then develops an algorithm that uses a place-holder that is a distance of O(log²(V)) from the Lagrange multiplier, showing that deviations by more than this amount are rare and can be handled separately by dropping a small amount of packets. The result fundamentally changes the performance-backlog tradeoff from [O(1/V), O(V)] to [O(1/V), O(log²(V))] (within a logarithmic factor of the optimal tradeoff shown in (52)(51)(53)). A disadvantage of this aggressive approach is that Lagrange multipliers must be known in advance, which is difficult as they may depend on system statistics and they may be different for each queue in the system.

This is handled elegantly in a Last-In-First-Out (LIFO) implementation of the drift-plus-penalty method, developed in (54). Under the original place-holder technique, given in (143)(142), a LIFO implementation would achieve all of the savings of the original place-holder value of Q2^place = 48 (at the cost of never serving the first 48 packets).
Moreover, because the backlog oscillates closely about the plateau, a LIFO implementation would intuitively lead to delays for "most" packets that are on the order of the magnitude of the noise variations in the plateau area. Thus, LIFO can achieve the more aggressive place-holder gains without computing the Lagrange multipliers! This is formally proven in (55). Experiments with the LIFO drift-plus-penalty method on an actual multi-hop wireless network deployment in (54) show a dramatic improvement in delay (by more than an order of magnitude) for all but 2% of the packets.

4.9 NON-I.I.D. MODELS AND UNIVERSAL SCHEDULING

Here we show that the same drift-plus-penalty algorithm provides similar [O(1/V), O(V)] performance guarantees when ω(t) varies according to a more general ergodic (possibly non-i.i.d.) process. We then show it also provides efficient performance for arbitrary (possibly non-ergodic) sample paths. The main proof techniques are the same as those we have already developed, with the exception that we use a multi-slot drift analysis rather than a 1-slot drift analysis.

We consider the same system as in Section 4.2, with K queues with dynamics (4.23). For simplicity of exposition, we eliminate the attributes ej(t) associated with equality constraints (so that J = 0). We seek an algorithm for choosing α(t) ∈ A_ω(t) every slot to minimize ȳ0 subject to mean rate stability of all queues Qk(t), and subject to ȳl ≤ 0 for all l ∈ {1, ..., L}. The virtual queues Zl(t) for l ∈ {1, ..., L} are the same as before, defined in (4.40). For simplicity, we assume:

• The exact drift-plus-penalty algorithm of (4.48)-(4.49) is used, rather than a C-additive approximation (so that C = 0).
• The functions âk(·), b̂k(·), ŷl(·) are deterministically bounded, so that:

    0 ≤ âk(α(t), ω(t)) ≤ ak^max        ∀k ∈ {1, ..., K}, ∀ω(t), α(t) ∈ A_ω(t)       (4.74)
    0 ≤ b̂k(α(t), ω(t)) ≤ bk^max        ∀k ∈ {1, ..., K}, ∀ω(t), α(t) ∈ A_ω(t)       (4.75)
    yl^min ≤ ŷl(α(t), ω(t)) ≤ yl^max   ∀l ∈ {0, 1, ..., L}, ∀ω(t), α(t) ∈ A_ω(t)    (4.76)

Define Θ(t) = [Q(t), Z(t)], and define the Lyapunov function L(Θ(t)) as follows:

    L(Θ(t)) = (1/2) Σ_{k=1}^{K} Qk(t)² + (1/2) Σ_{l=1}^{L} Zl(t)²    (4.77)
We have the following preliminary lemma.

Lemma 4.11 (T-slot Drift) Suppose (4.74)-(4.76) hold. For any slot t, any queue backlogs Θ(t), and any integer T > 0, the drift-plus-penalty algorithm ensures that:

    L(Θ(t + T)) − L(Θ(t)) + V Σ_{τ=t}^{t+T−1} y0(α(τ), ω(τ))
      ≤ DT² + V Σ_{τ=t}^{t+T−1} ŷ0(α∗(τ), ω(τ))
        + Σ_{l=1}^{L} Zl(t) Σ_{τ=t}^{t+T−1} ŷl(α∗(τ), ω(τ))
        + Σ_{k=1}^{K} Qk(t) Σ_{τ=t}^{t+T−1} [âk(α∗(τ), ω(τ)) − b̂k(α∗(τ), ω(τ))]

where α∗(τ) for τ ∈ {t, ..., t + T − 1} is any sequence of alternative decisions that satisfy α∗(τ) ∈ A_ω(τ), where L(Θ(t)) is defined in (4.77), and the constant D is defined:

    D = (1/2) Σ_{k=1}^{K} [(ak^max)² + (bk^max)²] + (1/2) Σ_{l=1}^{L} max[(yl^min)², (yl^max)²]    (4.78)

Proof. From (4.46)-(4.47), we have for any slot τ:

    L(Θ(τ + 1)) − L(Θ(τ)) ≤ D + Σ_{k=1}^{K} Qk(τ)[âk(α(τ), ω(τ)) − b̂k(α(τ), ω(τ))] + Σ_{l=1}^{L} Zl(τ)ŷl(α(τ), ω(τ))

where D is defined in (4.78). We then add V ŷ0(α(τ), ω(τ)) to both sides. Because the drift-plus-penalty algorithm is designed to choose α(τ) to deterministically minimize the right-hand-side of the resulting inequality when this term is added, it follows that:

    L(Θ(τ + 1)) − L(Θ(τ)) + V ŷ0(α(τ), ω(τ)) ≤ D + V ŷ0(α∗(τ), ω(τ)) + Σ_{k=1}^{K} Qk(τ)[âk(α∗(τ), ω(τ)) − b̂k(α∗(τ), ω(τ))] + Σ_{l=1}^{L} Zl(τ)ŷl(α∗(τ), ω(τ))

where α∗(τ) is any other decision that satisfies α∗(τ) ∈ A_ω(τ). However, we now note that for all τ ∈ {t, ..., t + T − 1}:

    |Qk(τ) − Qk(t)| ≤ (τ − t) max[ak^max, bk^max]
    |Zl(τ) − Zl(t)| ≤ (τ − t) max[yl^max, −yl^min]

Plugging these in, it can be shown that:

    L(Θ(τ + 1)) − L(Θ(τ)) + V ŷ0(α(τ), ω(τ)) ≤ D + 2D(τ − t) + V ŷ0(α∗(τ), ω(τ)) + Σ_{k=1}^{K} Qk(t)[âk(α∗(τ), ω(τ)) − b̂k(α∗(τ), ω(τ))] + Σ_{l=1}^{L} Zl(t)ŷl(α∗(τ), ω(τ))

Summing the above over τ ∈ {t, ..., t + T − 1}, and using the fact that Σ_{τ=t}^{t+T−1} (τ − t) = (T − 1)T/2, yields the result. □

4.9.1 MARKOV MODULATED PROCESSES³

Here we present a method developed in (144) for proving that the [O(1/V), O(V)] behavior of the drift-plus-penalty algorithm is preserved in ergodic (but non-i.i.d.) contexts. Let η(t) be an irreducible (possibly not aperiodic) Discrete Time Markov Chain (DTMC) with a finite state space S. The random network event process ω(t) is modulated by the DTMC η(t) as follows: Whenever η(t) = i, the value of ω(t) is chosen independently with some distribution pi(ω). Let πi represent the stationary distribution over states i ∈ S. Then the stationary distribution of ω(t) is given by:

    Pr[ω(t) = ω] = Σ_{i∈S} πi pi(ω)

Such a distribution always exists (and is unique) for irreducible finite state Markov chains. It is well known that all πi probabilities are positive, that the time average fraction of time being in state i is πi with probability 1, and that 1/πi represents the (finite) mean recurrence time to state i, which is the average number of slots required to get back to state i given that we start in state i. Finally, it is known that second moments of recurrence times are also finite (see (132)(130) for more details on DTMCs).

Assume the state space S has a state "0" that we designate as a "renewal" state, and assume for simplicity that η(0) = 0. Let the sequence {T0, T1, T2, ...} represent the recurrence times to state 0. Clearly {Tr}_{r=0}^∞ is an i.i.d. sequence with E{Tr} = 1/π0 for all r. Define E{T} and E{T²} as the first and second moments of these recurrence times (so that E{T} = 1/π0). Define t0 = 0, and for integers r > 0 define tr as the time of the rth revisitation to state 0, so that tr = Σ_{j=1}^{r} Tj.

³ This subsection (Subsection 4.9.1) assumes familiarity with DTMC theory and can be skipped without loss of continuity.
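The renewal structure above is easy to simulate. A sketch with an invented symmetric two-state chain (so that π0 = 1/2 and the mean recurrence time of the renewal state is E{T} = 1/π0 = 2); the function and parameter names are choices made here, not notation from the text:

```python
import random

def simulate_modulated(P, emit, T, seed=0):
    """Run a DTMC with transition matrix P starting in renewal state 0;
    draw omega(t) from the per-state distribution emit[state] and record
    the recurrence times T_r of state 0."""
    rng = random.Random(seed)
    state, renewals, last, omegas = 0, [], 0, []
    for t in range(1, T + 1):
        r, cum = rng.random(), 0.0
        for j, p in enumerate(P[state]):
            cum += p
            if r < cum:
                state = j
                break
        omegas.append(emit[state](rng))
        if state == 0:
            renewals.append(t - last)  # one recurrence time T_r
            last = t
    return omegas, renewals

# Two symmetric states: pi_0 = 1/2, so E{T} = 2.
P = [[0.5, 0.5], [0.5, 0.5]]
emit = [lambda rng: 0, lambda rng: rng.random()]  # state-dependent omega
_, renewals = simulate_modulated(P, emit, 200000)
print(sum(renewals) / len(renewals))  # close to E{T} = 2
```

The empirical mean of the recorded recurrence times converges to 1/π0, and their empirical second moment estimates the E{T²} constant appearing in the drift bounds below.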
ω(τ )) − bk (α ∗ (τ ). NONI. ω(τ )) (tr ) ≤ DE T 2 ˆ tr +Tr −1 (4. where the expectation is over the random duration of the renewal period and the random events on each slot of this period. ω(τ )) ˆ yl (α ∗ (τ ). ω(τ ))] (tr ) ˆ where α ∗ (τ ) are decisions from any other policy. MODELS AND UNIVERSAL SCHEDULING 75 r > 0 deﬁne tr as the time of the rth revisitation to state 0. Next. so that tr = the variable slot drift ( (tr )) as follows: ( (tr ))=E {L( (tr+1 )) − L( (tr )) (tr )} r j =1 Tj .D. Thus: tr +Tr −1 ( (tr )) + V E τ =tr y0 (α(τ ). ω(τ )) (tr ) ≤ DE Tr2  (tr ) ˆ tr +Tr −1 +V E L τ =tr tr +Tr −1 y0 (α ∗ (τ ). note that the conditional expectations in the next three terms on the righthandside of the above inequality can be changed into pure expectations (given that tr is a renewal time) under the assumption that the policy α ∗ (τ ) is ωonly.11 and taking conditional expectations given (tr ). ω(τ )) ˆ + K l=1 tr +Tr −1 Zl (tr )E τ =tr + k=1 Qk (tr )E τ =tr ˆ [ak (α ∗ (τ ). By plugging t = tr and T = Tr into Lemma 4.I. First note that E Tr2  (tr ) = E T 2 because the renewal duration is independent of the queue state (tr ). ω(τ ))] ˆ . We now deﬁne This drift represents the expected change in the Lyapunov function from renewal time tr to renewal time tr+1 .4. ω(τ )) (tr ) ˆ + K l=1 tr +Tr −1 Zl (tr )E τ =tr + k=1 Qk (tr )E τ =tr ˆ [ak (α ∗ (τ ). ω(τ )) (tr ) ˆ yl (α ∗ (τ ).9.79) +V E L τ =tr tr +Tr −1 y0 (α ∗ (τ ). we have the following variableslot driftpluspenalty expression: tr +Tr −1 ( (tr )) + V E τ =tr y0 (α(τ ). ω(τ )) − bk (α ∗ (τ ).
The expectations in the final terms are expected rewards over a renewal period, where the expectation is over the random duration of the renewal period and the random events on each slot of this period. Because the policy α∗(t) is ω-only, the sum penalty over each renewal period is independent but identically distributed across periods, and so by basic renewal theory (130)(66), we have for all l ∈ {0, 1, ..., L} and all k ∈ {1, ..., K}:

    E{ Σ_{τ=tr}^{tr+Tr−1} ŷl(α∗(τ), ω(τ)) } = E{T} yl∗                                   (4.80)
    E{ Σ_{τ=tr}^{tr+Tr−1} [âk(α∗(τ), ω(τ)) − b̂k(α∗(τ), ω(τ))] } = E{T}(ak∗ − bk∗)       (4.81)

where yl∗, ak∗, bk∗ are the infinite horizon time average values achieved for the ŷl(α∗(t), ω(t)), âk(α∗(t), ω(t)), and b̂k(α∗(t), ω(t)) processes under the ω-only policy α∗(t). This basic renewal theory fact can easily be understood as follows (with the below equalities holding with probability 1):⁴

    yl∗ = lim_{R→∞} (1/tR) Σ_{τ=0}^{tR−1} ŷl(α∗(τ), ω(τ))
        = lim_{R→∞} [ (1/R) Σ_{r=0}^{R−1} Σ_{τ=tr}^{tr+Tr−1} ŷl(α∗(τ), ω(τ)) ] / [ (1/R) Σ_{r=0}^{R−1} Tr ]
        = E{ Σ_{τ=0}^{T0−1} ŷl(α∗(τ), ω(τ)) } / E{T}

where the final equality holds by the strong law of large numbers (noting that both the numerator and denominator are just time averages of i.i.d. quantities, because the policy α∗(t) is ω-only).

Plugging (4.80)-(4.81) into (4.79) yields:

    Δ(Θ(tr)) + V E{ Σ_{τ=tr}^{tr+Tr−1} y0(α(τ), ω(τ)) | Θ(tr) } ≤ D E{T²} + V E{T} y0∗ + Σ_{l=1}^{L} Zl(tr) E{T} yl∗ + Σ_{k=1}^{K} Qk(tr) E{T}(ak∗ − bk∗)

The above holds for any time averages {yl∗, ak∗, bk∗} that can be achieved by ω-only policies. However, by Theorem 4.5, we know that if the problem is feasible, then there is a single ω-only policy that achieves time averages y0∗ = y0^opt, yl∗ ≤ 0 for all l ∈ {1, ..., L}, and (ak∗ − bk∗) ≤ 0 for all k ∈ {1, ..., K} (or an infinite sequence of ω-only policies approaching these averages).

⁴ Because the processes are deterministically bounded and have time averages that converge with probability 1, the Lebesgue Dominated Convergence Theorem (145) ensures the time average expectations are the same as the pure time averages.
Plugging these averages into the above bound yields:

    Δ(Θ(tr)) + V E{ Σ_{τ=tr}^{tr+Tr−1} y0(α(τ), ω(τ)) | Θ(tr) } ≤ D E{T²} + V E{T} y0^opt

Taking expectations of the above, summing the resulting telescoping series over r ∈ {0, 1, ..., R − 1}, and dividing by V R E{T} yields:

    [E{L(Θ(tR))} − E{L(Θ(0))}]/(V R E{T}) + (1/(R E{T})) E{ Σ_{τ=0}^{tR−1} y0(α(τ), ω(τ)) } ≤ y0^opt + D E{T²}/(V E{T})

Because tR/R → E{T} with probability 1 (by the law of large numbers), it can be shown that the middle term has a lim sup that is equal to the lim sup time average expected penalty. Thus, assuming E{L(Θ(0))} < ∞:

    ȳ0 = lim sup_{t→∞} (1/t) Σ_{τ=0}^{t−1} E{y0(α(τ), ω(τ))} ≤ y0^opt + D E{T²}/(V E{T}) = y0^opt + O(1/V)    (4.82)

where we note that the constants D, E{T}, and E{T²} do not depend on V. Similarly, it can be shown that if the problem is feasible, then all queues are mean rate stable, and if the slackness condition of Assumption A1 holds, then sum average queue backlog is O(V) (144). This leads to the following theorem.

Theorem 4.12 (Markov Modulated Processes (144)) Assume the ω(t) process is modulated by the finite state DTMC described above, that the boundedness assumptions (4.74)-(4.76) hold, that E{L(Θ(0))} < ∞, and that the drift-plus-penalty algorithm is used every slot t. If the problem is feasible, then:
(a) The penalty satisfies (4.82), so that ȳ0 ≤ y0^opt + O(1/V).
(b) All queues are mean rate stable, and ȳl ≤ 0 for all l ∈ {1, ..., L}.
(c) If the Slackness Assumption A1 holds, then all queues Qk(t) are strongly stable with average backlog O(V).

4.9.2 NON-ERGODIC MODELS AND ARBITRARY SAMPLE PATHS

Now assume that the ω(t) process follows an arbitrary sample path, possibly one with non-ergodic behavior arising from arbitrary traffic, channels, and mobility. Because ω(t) follows an arbitrary sample path, usual "equilibrium" notions of optimality are not relevant, and so we use a different metric, called the T-slot lookahead metric, for evaluation of the drift-plus-penalty algorithm. We present a technique developed in (41)(40) for stock market trading, and modified in (39)(38) for use in wireless networks with arbitrary traffic. As before, continue to assume that the deterministic bounds (4.74)-(4.76) hold, and that the drift-plus-penalty algorithm is used every slot t. Specifically, let T and R be positive integers, and consider the first RT slots
ω(τ )) ˆ L rT +T −1 ≤ DT 2 + V K y0 (α ∗ (τ ). L} ˆ ˆ [ak (α(τ ).83) and yields cost cr . . Feasibility is often guaranteed when there is an “idle” action. we assume the inﬁmum cost is achievable. . For the rth frame (for r ∈ {0. ˆ ˆ 6 For simplicity. (bk (τ ))] every slot τ to be taken within the convex hull of the set of all possible values of ˜ ˜∗ ˆ [(yl (α. R − 1}).6 It is generally im∗ (τ ) decisions. . ω(τ ))). . Let α ∗ (τ ) represent the decisions that ∗ solve the T slot lookahead problem (4. ω(τ )) + ˆ Zl (rT ) l=1 τ =rT [yl (α ∗ (τ ). and can still be plugged into Lemma 4.83) (r+1)T −1 yl (α(τ ). ω(τ ))] ≤ 0 ∀k ∈ {1. ω(τ )) − bk (α ∗ (τ ). (r + 1)T − 1} ∗ The value cr thus represents the optimal empirical average penalty for frame r over all policies that have full knowledge of the future ω(τ ) values over the frame and that satisfy the constraints. . ω(τ )) ≤ 0 ∀l ∈ {1. 5Theorem 4. . . . . ω(τ ))] ˆ τ =rT rT +T −1 + k=1 Qk (rT ) τ =rT ˆ [ak (α ∗ (τ ). which can be used on all slots to trivially satisfy the constraints in the form 0 ≤ 0.83) over this frame to achieve cost cr . .5 We assume throughout that the constraints are feasible for the above problem. . . (ak (τ )). we can derive the same result by taking a limit over policies that approach the inﬁmum. ω(τ )) − bk (α(τ ). ω(τ ))] ˆ ∗ ≤ DT 2 + V T cr where the ﬁnal inequality follows by noting that the α ∗ (τ ) policy satisﬁes the constraints of the ∗ T slot lookahead problem (4. and treats the ω(τ ) values in this interval as known quantities: Minimize: Subject to: 1) τ =rT (r+1)T −1 cr = 1 T (r+1)T −1 y0 (α(τ ). K} ˆ 2) τ =rT 3) α(τ ) ∈ Aω(τ ) ∀τ ∈ {rT . Else. . ω(τ ))). 1. RT − 1} being divided into R frames of size T . but we skip this extension for simplicity of exposition. as these would require knowledge of the ω(τ ) values up to possible to solve for the α T slots into the future. . 
.13 holds exactly as stated in the extended case when c∗ is redeﬁned by a T slot lookahead problem that alr ˜∗ lows actions [(yl∗ (τ )). ∗ we deﬁne cr as the optimal cost associated with the following static optimization problem. . (bk (α. This problem has variables α(τ ) ∈ {rT . . . the α ∗ (τ ) values exist. . . . . . .78 4. such as the action of admitting and transmitting no data. . called the T slot lookahead problem. . ω(τ )))] under α ∈ Aω(τ ) . Frame r consists of slots τ ∈ {rT . . OPTIMIZING TIME AVERAGES {0. (r + 1)T − 1}. . However. (r + 1)T − 1}.11 to yield the following (using t = rT and T as the frame size): rT +T −1 L( (rT + T )) − L( (rT )) + V rT +T −1 τ =rT y0 (α(τ ). (ak (α. . ω(τ )) ˆ τ =rT (4.
.86) (4. . The above discussion proves part (a) of the following theorem: Theorem 4.83) is feasible for every frame r ∈ {0. If the driftpluspenalty algorithm is implemented every slot t. and rearranging terms yields: 1 RT RT −1 τ =0 1 y0 (α(τ ).85) holds for all integers R > 0. . ω(τ )) + T y0 . When R is large. ω(τ )) ≤ lim sup ˆ R→∞ R R−1 r=0 ∗ cr + DT /V ∗ where cr is the optimal cost in the T slot lookahead problem (4.78). ω(τ ))] ≤ − ∀k ∈ {1. . the ﬁnal term on the righthandside above goes to zero (this term is exactly zero if L( (0)) = 0). ω(τ )) − bk (α(τ ). . L} ˆ τ =0 (c) Suppose there exists an > 0 and a sequence of decisions α(τ ) ∈ Aω(τ ) that satisﬁes the following ˜ slackness assumptions for all frames r: rT +T −1 yl (α(τ ). In particular. we have that the time ∗ average cost is within O(1/V ) of the time average of the cr values.7 1 lim sup t→∞ t t−1 τ =0 1 y0 (α(τ ).I. . ω(τ )) + T y0 ≤ t =0 y0 (α(τ ). .4.13 (Universal Scheduling) Assume the ω(t) sample path satisﬁes the boundedness assumptions (4. NONI. . Fix any integers R > 0 and T > 0. . R − 1} (for any integer R > 0) yields: RT −1 R−1 L( (RT )) − L( (0)) + V τ =0 y0 (α(τ ).85) where we recall that α(τ ) represents the decisions under the driftpluspenalty algorithm. . R − 1}. and D is deﬁned in (4.74)(4. . ω(τ )) ≤ τ =0 y0 (α(τ ). using the fact that L( (RT )) ≥ 0. Dividing ˆ ˆ ˆ τ both sides by t and taking limits shows these limits are equal. L} ˆ ˜ 1 T rT +T −1 τ =rT τ =rT (4. ω(τ )) ≤ DT 2 R + V T ˆ r=0 ∗ cr (4. . . MODELS AND UNIVERSAL SCHEDULING 79 Summing the above over r ∈ {0.85).76). . The inequality (4. .84) Dividing by V T R. . Thus. . .9. 1. .D.87) ˆ ˜ [ak (α(τ ). (b) All actual and virtual queues are rate stable. . and assume the T slot lookahead problem (4. we have τ =0 y0 (α(τ ). ω(τ )) ≤ ˆ R R−1 r=0 ∗ cr + DT L( (0)) + V VTR (4. ω(τ )) ≤ 0 ∀l ∈ {1. 
K} ˆ ˜ 7 It is clear that the lim sup over times sampled every T slots is the same as the regular lim sup because the y (·) values are bounded. ω(τ )) ≤ 0 ∀l ∈ {1. . then: (a) The time average cost over the ﬁrst RT slots satisﬁes (4. and so we have: 1 lim sup t→∞ t t−1 yl (α(τ ).83) for frame r. ˆ0 t/T T t/T T min max Indeed. and that initial queue backlog is ﬁnite.
bk ] k=1 Proof.11 ˜ and using (4.76) imply that there is a ﬁnite constant F > 0 such that L( (RT )) ≤ F R for all R. This proves part (b). OPTIMIZING TIME AVERAGES Then: lim sup t→∞ 1 t t−1 K Qk (τ ) ≤ τ =0 k=1 DT + max min V (y0 − y0 ) + T −1 2 K max max max[ak . Part (a) has already been shown in the above discussion. L}. . . . .87) to yield: rT +T −1 K L( (rT + T )) − L( (rT )) + V τ =rT y0 (α(τ ). . . . it can then be shown that limR→∞ Qk (RT )/(RT ) = 0 for all k ∈ {1.86)(4. Further. . . . these limits that sample only on slots RT (as R → ∞) are clearly the same when taken over all t → ∞ because the queues can change by at most a constant proportional to T in between the sample times. ω(τ )) ≤ DT ˆ 2 max + V T y0 −T k=1 Qk (rT ) and hence: K max min L( (rT + T )) − L( (rT )) ≤ DT 2 + V T (y0 − y0 ) − T Qk (rT ) k=1 K T −1 max min ≤ DT 2 + V T (y0 − y0 ) − K T −1 k=1 j =0 max max j max[ak . bk ] k=1 Summing the above over r ∈ {0. .84) plus the boundedness assumptions (4. . . We provide a summary of parts (b) and (c): The inequality (4. Part (c) follows by plugging the policy α(τ ) for τ ∈ {rT . bk ] k=1 . bk ] k=1 j =0 max min = DT 2 + V T (y0 − y0 ) − k=1 j =0 K T −1 Qk (rT + j ) + Qk (rT + j ) + (T − 1)T 2 K max max max[ak . .74)(4. (r + 1)T − 1} into Lemma 4. .1. R − 1} yields: RT −1 K L( (RT )) − L( (0)) + τ =0 k=1 max min Qk (τ ) ≤ RDT 2 + RV T (y0 − y0 ) + R(T − 1)T 2 K max max max[ak . K} and limR→∞ Zl (RT )/(RT ) = 0 for all l ∈ {1. By an argument similar to part (a) of Theorem 4.80 4. .
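To make the T-slot lookahead target concrete, the per-frame optimal costs c*_r can be computed by brute force on a toy instance: a single queue, per-slot action x(t) in {0, 1} (transmit one packet at unit cost, or idle), and the per-frame constraint that service covers arrivals. This toy system and all names in it are our own illustration, not from the text:

```python
from itertools import product

T = 4
arrivals = [1, 0, 1, 1, 0, 0, 1, 0]   # two frames of T = 4 slots

def frame_cost(frame_arrivals):
    """Brute-force c_r*: the cheapest action vector x in {0,1}^T that
    serves at least the frame's arrivals, so that the per-frame
    constraint (1/T) * sum(a(t) - b(t)) <= 0 is satisfied."""
    need = sum(frame_arrivals)
    best = min(sum(x) for x in product([0, 1], repeat=T) if sum(x) >= need)
    return best / T

c_star = [frame_cost(arrivals[r * T:(r + 1) * T])
          for r in range(len(arrivals) // T)]
print(c_star)  # frame averages: 3/4 for the first frame, 1/4 for the second
```

Theorem 4.13 says the drift-plus-penalty time average cost comes within O(1/V) of the average of such c*_r values, even though the algorithm never sees future ω(τ).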
. but where Q1 (t) < Q1 (t + 1).87) are modiﬁed to also include slackness in the yl (·) constraints. this affects the error term DT /V . requiring V to also be increased as T increases.5. Q2 ) such that L(Q) = 2.86)(4. . Increasing V creates a larger queue backlog. .10. However. If the slackness assumptions (4. show that Qk ≤ 50 for all k ∈ {1.85) holds for all R and T . 4. and let f (t) and g(t) be discrete time real valued processes. √ b) If L(Q) > 25. Q2 (t)) = 2. and hence it can be viewed as a family of bounds that apply to the same sample path under the driftpluspenalty algorithm. It is remarkable that the driftpluspenalty algorithm can closely track such an “ideal” T slot lookahead algorithm. For any constants Q ≥ 0. . . We thus see a similar [O(1/V ). . as it is one that is deﬁned in terms of an ideal policy with T slot lookahead. Exercise 4. a ≥ 0. show that Qk > 50/K for at least one queue k ∈ {1.5. Suppose there is a nonnegative function L(Q(t)) such that . dividing by RT and taking a lim sup as R → ∞ yields: 1 lim sup RT R→∞ RT −1 K Qk (τ ) ≤ τ =0 k=1 DT + max min V (y0 − y0 ) T −1 + 2 K max max max[ak . QK ) and L(Q) = 2 K Q2 . K}. Note also that increasing the ∗ value of T changes the frame size and typically improves the cr values (as it allows these values to be achieved with a larger future lookahead). c) Let K = 2.2. O(V )] costbacklog tradeoff for this sample path context. EXERCISES 81 Using L( (RT )) ≥ 0. Give an example where L(Q1 (t). Plot the region of all nonnegative vectors (Q1 .3. R−1 ∗ 1 The target value R r=0 cr that we use for comparison does not represent the optimal cost that can be achieved over the full horizon RT if the entire future were known.1. K}. However. L(Q1 (t + 1). Let Q(t) be a discrete time vector process with Q(0) = 0. . b ≥ 0. Exercise 4. k=1 k √ a) If L(Q) ≤ 25.4. Q2 (t + 1)) = 2. . when T is large it still represents a meaningful target that is not trivial to achieve.10 EXERCISES 1 Let Q = (Q1 . . . 
show that:

(max[Q − b, 0] + a)^2 ≤ Q^2 + b^2 + a^2 + 2Q(a − b)

Exercise 4.3.
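This inequality is the basic algebraic step behind quadratic Lyapunov drift bounds: square the queue update max[Q − b, 0] + a and drop the cross term. A quick numerical spot-check (our own sketch): when Q ≥ b the two sides differ by −2ab ≤ 0, and when Q < b the difference is (Q − b)^2 + 2Qa ≥ 0, which is where the max[·, 0] helps.

```python
import random

def lhs(Q, a, b):
    # next-slot backlog squared under the update Q <- max[Q - b, 0] + a
    return (max(Q - b, 0.0) + a) ** 2

def rhs(Q, a, b):
    return Q * Q + b * b + a * a + 2.0 * Q * (a - b)

random.seed(0)
for _ in range(10000):
    Q = random.uniform(0.0, 100.0)
    a = random.uniform(0.0, 10.0)
    b = random.uniform(0.0, 10.0)
    assert lhs(Q, a, b) <= rhs(Q, a, b) + 1e-9
print("inequality holds on 10000 random (Q, a, b) samples")
```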
there is an ωonly policy α ∗ (t) (which depends on h) that satisﬁes: E y0 (α ∗ (t).max ˆ E yl (α ∗ (t). . ω.48)(4. hJ ) that consists only of values 1 and −1. Exercise 4. Suppose that ω is a Gaussian random variable with mean m and variance σ 2 . . . . . α = (α1 . . ω(t)) ≤ y0.8. . .5.89) (4.91) . ω(t)) − ˆ (4. . .90) ∀k ∈ {1. L} ˆ E ej (α ∗ (t). ω(t)) ≤ − ∀l ∈ {1. . ω)} change? c) Let ω = (ω1 . K ) be nonnegative vectors. Deﬁne c(α. ω(t)) ≤ E bk (α ∗ (t). ω) = ω2 + ω(3 − 2α) + α 2 . ) = K V αk − k log(1 + αk ωk ) . Compute E {c(α. OPTIMIZING TIME AVERAGES L(0) = 0. but assume the following modiﬁed Slater condition holds: Assumption A2: There is an > 0 such that for any J dimensional vector h = (h1 . ) }. . We choose α subject to 0 ≤ αk ≤ 1 for all k.88) (4. how choosing α(t) ∈ Aω(t) according to (4. (Probability 1 Convergence) Consider the ﬁxedV driftpluspenalty algorithm (4.6.82 4. .4. . . a) Compute the optimal choice of α (as a function of the observed ω) to minimize E {c(α. Does the optimal policy change? Does E {c(α. Deﬁne c(α. = ( 1 . . . .8. using the game of opportunistically minimizing an expectation described in Section 1. Design a policy that observes ω and chooses α to minimize E {c(α.44). ω)} under your optimal policy. .49) minimizes the righthandside of (4.49). ω(t)) = hj ∀j ∈ {1. (Opportunistically Minimizing an Expectation) Consider the game described in Section 1.48)(4. ω. . αK ). (The DriftPlusPenalty Method) Explain. and such that its conditional drift (Q(t)) satisﬁes the following every slot τ and for all possible Q(τ ): (Q(τ )) + E {f (τ )Q(τ )} ≤ E {g(τ )Q(τ )} a) Use the law of iterated expectations to prove that: E {L(Q(τ + 1))} − E {L(Q(τ ))} + E {f (τ )} ≤ E {g(τ )} b) Use telescoping sums together with part (a) to prove that for any t > 0: 1 t t−1 τ =0 E {f (τ )} ≤ 1 t t−1 τ =0 E {g(τ )} Exercise 4. . . and αk αj = 0 for k = j . . . . . ω)}. where log(·) denotes the natural logarithm k=1 and V ≥ 0. 
Hint: First compute the solution assuming that αk > 0. ωK ). . . K} (4. J } ˆ ˆ E ak (α ∗ (t). b) Suppose that ω is exponentially distributed with mean 1/λ. Exercise 4.
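For part (a) of Exercise 4.5, assuming α is unconstrained, setting dc/dα = −2ω + 2α = 0 after observing ω gives α = ω, with realized cost c(ω, ω) = 3ω and hence E{c} = 3m in the Gaussian case. A Monte Carlo sanity check of that value (our own sketch):

```python
import random
import statistics

random.seed(1)
m, sigma = 2.0, 1.5
samples = [random.gauss(m, sigma) for _ in range(200000)]
# After observing omega, choose alpha = omega, the minimizer of
# c = omega^2 + omega*(3 - 2*alpha) + alpha^2; the cost is then 3*omega.
costs = [w * w + w * (3.0 - 2.0 * w) + w * w for w in samples]
est = statistics.fmean(costs)
assert abs(est - 3.0 * m) < 0.05   # E{c} = 3m = 6 under the optimal policy
```

For part (b), replacing the sampler with random.expovariate(lam) gives E{c} = 3/λ by the same argument: the optimal policy α = ω does not depend on the distribution of ω, only the achieved expectation does.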
d. H(t)) as deﬁned in Section 4. .4. it can be shown that for all t and all possible H(t).88)(4. . ej (t). .max ⎡ − ⎣ L K J ⎤ Hj (t)⎦ Zl (t) + l=1 k=1 Qk (t) + j =1 b) Assume that (4. SK (t)) is a vector of channel attenuations. The data is stored in separate queues Q(t) = (Q1 (t). . .10. QK (t)) for transmission over K different channels. Power is allocated subject to P (t) ∈ A. where Pmax is a peak power constraint. and that the fourth moment assumption (4. . .3. aK (t)) every slot t.92) to obtain: (t.16)(4. H(t)) + V E {y0 (t)H(t)} ≤ B + C + V y0. hJ ) by: hj = −1 1 if Hj (t) ≥ 0 if Hj (t) < 0 Using this h. K}. and satisﬁes 0 ≤ Sk (t) ≤ 1 for all k.53)(4. we have (compare with (4. ak (t). . . max max and that 0 ≤ ak (t) ≤ ak for all t. . and plug this into the righthandside of (4.23). Assume that the vectors a(t) and S (t) are i.52)): ∗ (t. .91) into the righthandside of (4.56). PK (t)) according to bk (t) = log(1 + Sk (t)Pk (t)).17) hold for y0 (t). . Assume that S (t) is known at the beginning of each slot t. H(t)) + V E {y0 (t)H(t)} ≤ B + C + V E y0 (t)H(t) L J j =1 ∗ ∗ Qk (t)E ak (t) − bk (t)  H(t) + l=1 K Zl (t)E yl∗ (t)H(t) + ∗ Hj (t)E ej (t)H(t) + k=1 (4.7.92) to yield a probability 1 bound on the lim sup time average of y0 (t). . . Service variables bk (t) are determined by a power allocation vector P (t) = (P1 (t). for some ﬁnite constants ak . plug the ωonly policy α ∗ (t) from (4. The update equation is (4. . Use this with part (a) to obtain probability 1 bounds on the lim sup time average queue backlog via Theorem 4.i.). again by Theorem 4. EXERCISES 83 Using H(t) and (t.4. ω(t)). etc. Exercise 4.4. (Min Average Power (21)) Consider a wireless downlink with arriving data a(t) = (a1 (t). . where A is the set of all power vectors with at most one nonzero element and such that 0 ≤ Pk ≤ Pmax for all k ∈ {1. over slots. and S (t) = (S1 (t). . . .1. . where log(·) denotes the natural logarithm. α ˆl l a) Deﬁne h = (h1 .18) holds. 
bk (t) represent decisions under any other (possibly randomized) action ∗ (t) that can be made on slot t (so that y ∗ (t) = y (α ∗ (t). c) Now consider the ωonly policy that yields (4.92) ∗ ∗ ∗ where yl∗ (t). . . . . .
By how much can placeholder bits reduce average backlog from the bound given in part (c) of Exercise 4.7? This exercise computes a simple place placeholder Qk that is not the largest possible. P (t)]. Use Theorem 4. J = 0. Exercise 4.9.7. q ≥ 0. OPTIMIZING TIME AVERAGES a) Using ω(t)=(a(t). . y0 (t) = − K θk xk (t). . q such that V > 0. The control action is now a joint ﬂow control and power allocation decision α(t) = [x(t). where xk (t) is a ﬂow control decision for slot t. A more detailed analysis in (143) computes a larger placeholder value. . b) Assume we use an exact implementation of the algorithm in part (a) (so that C = 0). Use Theorem 4. 0] + xk (t) Exercise 4.84 4. θK } are a given set of positive weights. . Conclude that the algorithm from Exercise 4. 0] for all t greater than or equal to the time t ∗ for which this inequality ﬁrst holds.8c to give a bound on the time average sum of queue backlog in all queues. . av av We want the average power expenditure over each link k to be less than or equal to Pk . c) Assume Assumption A1 holds for a given > 0. . y0 (t) = K Pk (t). if q < V . α(t) = P (t).8. and that the problem is feasible. p ≥ 0. with the exception that it is now a wireless uplink. state the driftpluspenalty k=1 av algorithm for this problem. we have Vp − q log(1 + sp) > 0 whenever p > 0 (where log(·) denotes the natural logarithm). b) Use part (a) to conclude that Qk (t) ≥ max[V − log(1 + Pmax ). where {θ1 . (PlaceHolder Backlog) a) Show that for any values V . L = 0.8 to conclude that all queues are mean rate stable.7 chooses Pk (t) = 0 whenever Qk (t) < V . s. and compute a value B such that: 1 lim sup t→∞ t opt t−1 K E {Pk (τ )} ≤ Pav + B/V τ =0 k=1 opt where Pav is the minimum average power over any stabilizing algorithm. . The new goal is to maximize K a weighted sum of admission rates k=1 θk x k subject to queue stability and to all average power constraints. 
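For Exercise 4.7 part (a), minimizing the drift-plus-penalty expression separates into a one-queue search: serving queue k with power p contributes V·p − Qk(t)·log(1 + Sk(t)·p), whose unconstrained minimizer is p = Qk/V − 1/Sk. A sketch of the resulting per-slot rule (the function name and structure are our own, not from the text); since Sk(t) ≤ 1, any queue with Qk(t) < V gets zero power, consistent with Exercise 4.9 part (b):

```python
import math

def dpp_power_allocation(Q, S, V, Pmax):
    """One slot of the drift-plus-penalty rule: at most one channel k
    receives power, chosen to minimize V*p - Q[k]*log(1 + S[k]*p)
    over 0 <= p <= Pmax; idle if no channel yields a negative value."""
    best_k, best_p, best_val = None, 0.0, 0.0   # idling scores 0
    for k, (q, s) in enumerate(zip(Q, S)):
        if s <= 0.0:
            continue
        p = min(max(q / V - 1.0 / s, 0.0), Pmax)   # clipped minimizer
        val = V * p - q * math.log(1.0 + s * p)
        if val < best_val:
            best_k, best_p, best_val = k, p, val
    return best_k, best_p

# both queues are below V = 100, so the system idles this slot:
assert dpp_power_allocation([50.0, 30.0], [1.0, 0.5], V=100, Pmax=10) == (None, 0.0)
```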
Note that the constraints P k ≤ Pk should be enforced by virtual queues Zk (t) of the form (4. 0 ≤ s ≤ 1. S (t)). state the driftk=1 pluspenalty algorithm for a ﬁxed V in this context. made subject to the constraint 0 ≤ xk (t) ≤ ak (t) for all t. K} (satisfying Pk max ). where Pk av ≤ P is a ﬁxed constant for each k ∈ {1. (Maximum Throughput Subject to Peak and Average Power Constraints (21)) Consider the same system of Exercise 4. a) Using J = 0. . and a ﬁxed V . L = K.40) with a suitable deﬁnition of yk (t). and queue backlogs now satisfy: Qk (t + 1) = max[Qk (t) − bk (t). . p.
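The average power constraints in Exercise 4.8 are enforced by virtual queues. Using the standard update Z(t+1) = max[Z(t) + y(t), 0] with y(t) = Pk(t) − Pk_av (the exact form (4.40) is defined earlier in the text), rate stability of Z implies the time average of Pk(t) is at most Pk_av. A minimal sketch:

```python
def virtual_queue_update(Z, P_used, P_avg):
    """Virtual queue for an average power constraint: it grows by the
    power actually used and drains at the allowed average rate.  If
    Z(t)/t -> 0, the time average of P_used is at most P_avg."""
    return max(Z + P_used - P_avg, 0.0)

Z = 0.0
for P in [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]:   # hypothetical slot powers
    Z = virtual_queue_update(Z, P, P_avg=0.5)
print(Z)  # -> 0.0: the 3.0 units used match the 6 * 0.5 = 3.0 allowed
```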
2.8 to conclude that all queues are mean rate stable (and hence all average power constraints are met). It thus uses Qk (t − T ) in place of Qk (t) in (4. Let α ideal (t) be the optimal decision of (4.}. and 0 ≤ ak (t) ≤ amax and 0 ≤ bk (t) ≤ bmax for all k and all t. ∀T ∈ {1. ω(t)) + ˆ Qk (t)[ak (α approx (t). ω(t)) − bk (α approx (t).4. it does not have access to the current queue backlogs Qk (t). k e) Use part (d) and the sample path inputoutput inequality (2. . d) Show that the algorithm is such that Pk (t) = 0 whenever Zk (t) > Qk (t). This shows that queues are deterministically bounded. ω(t))] + C ˆ k=1 K V y0 (α ideal (t). Speciﬁcally. (OutofDate Queue Backlog Information) Consider the Kqueue problem with L = J = 0.10. Assume that all queues are initially empty. and compute values Qmax such that Qk (t) ≤ Qmax for all t ≥ 0 and k k all k ∈ {1. ω(t))] ≤ ˆ ˆ Qk (t)[ak (α ideal (t). K V y0 (α ˆ approx (t). 1. and compute a value B such that: lim inf t→∞ 1 t t−1 K θk E {xk (τ )} ≥ util opt − B/V τ =0 k=1 where util opt is the optimal weighted sum of admitted rates into the network under any algorithm that stabilizes the queues and satisﬁes all average power constraints.10. Conclude that max max max av Zk (t) ≤ Zk . and only receives delayed information Qk (t − T ) for some integer T ≥ 0. . The network controller attempts to perform the driftpluspenalty algorithm (4. c) Show that the algorithm is such that xk (t) = 0 whenever Qk (t) > V θk . EXERCISES 85 b) Use Theorem 4. compute a value C such that: Exercise 4. even without the Slater condition of Assumption A1. However. .49) in the ideal case when current queue backlogs Qk (t) are used.48). . . . .48)(4. K}. 3. Use part (e) to provide a guarantee on the lifetime of the link. .} f ) Suppose link k is a wireless transmitter with a battery that has initial energy Ek .49) every slot. and let α approx (t) be the implemented decision that uses the outofdate queue backlogs Qk (t − T ). 
where Zk is deﬁned Zk =Qmax + (Pmax − Pk ). ω(t)) − bk (α ideal (t).48)(4. Show that α approx (t) yields a Cadditive approximation for some ﬁnite constant C.3) to conclude that for any positive integer T . the total power expended by each link k over any T slot interval is deterministically av max less than or equal to T Pk + Zk . . for some ﬁnite constants amax and bmax . 2. That is: t0 +T −1 τ =t0 av max Pk (τ ) ≤ T Pk + Zk ∀t0 ∈ {0. . ω(t)) + ˆ k=1 .
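Part (f) then follows by arithmetic on the bound of part (e): since the energy spent in any T-slot interval is at most T·Pk_av + Zk_max, a battery with initial energy Ek cannot be depleted before roughly (Ek − Zk_max)/Pk_av slots. A sketch with hypothetical numbers:

```python
def lifetime_lower_bound(E, P_avg, Z_max):
    """Smallest T with T*P_avg + Z_max >= E; the link is guaranteed to
    survive at least this many slots (assuming E > Z_max)."""
    return (E - Z_max) / P_avg

assert lifetime_lower_bound(E=1000.0, P_avg=0.5, Z_max=100.0) == 1800.0
```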
S2 (t)) are independent and i. a2 ) = (3. which expends 8 units of power over the ﬁrst 9 slots under the policy that serves the queue that yields the largest Qi (t)bi (t) value. V = 200. . OPTIMIZING TIME AVERAGES This shows that we can still optimize the system and provide stability with outofdate queue backlog information. select a different set of channels to serve over slots {0. V = 0.i. P r[(S1 . where Si (t) ∈ {G. and queue backlogs for a two queue wireless downlink. S2 ) = (M. M. and 3 packets can be served when a channel is “Good. S2 ) = (G. where random delays without a deterministic bound are also considered.86 4. P r[(S1 . 4.3. (Simulation) Consider a 2queue system with time varying channels (S1 (t). B}.11. .” and 1 when “Bad. Repeat for V = 1. etc. V = 5. representing “Good. and remains idle if this is negative for both i = 1 and i = 2. Find the empirical average power expenditure and the empirical average queue backlog over 106 slots when V = 0. over slots with the same empirical distribution as that achieved over 9 slots in the table. 2)] = 1/9.3: Arrivals. . J = L = 0. Note: You should ﬁnd that the resulting minimum power that is approached as V is increased is the same as part (b). 2}.” “Bad” channel conditions for i ∈ {1. Simulate the system using the driftpluspenalty policy of choosing the queue i that maximizes Qi (t)bi (t) − V whenever this quantity is nonnegative. and is strictly less than the empirical power expenditure of part (a).” Exactly one unit of power is expended when we serve any channel (regardless of its condition). 1. .d. a) Given the full future arrival and channel events as shown in the table. c) Repeat part (b) in the case when arrival vectors (a1 (t). channel conditions. V = 100. Treatment of delayed queue information for Lyapunov drift arguments was perhaps ﬁrst used in (147). B)] = 2/9.” “Medium. V = 50. Exercise 4. V = 20. 
t   a1(t)  a2(t)  S1(t)  S2(t)  Q1(t)  Q2(t)
0     3      2      G      M      0      0
1     0      0      G      M      3      2
2     3      1      M      B      0      2
3     0      0      M      M      3      2
4     0      1      G      B      1      2
5     1      1      G      M      0      3
6     0      0      M      B      1      2
7     1      0      M      G      1      1
8     0      0      G      B      2      0

Figure 4.3: Arrivals, channel conditions, and queue backlogs for a two queue wireless downlink (the Qi(t) columns show backlog under the "Max Qi bi" policy).
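The sample path of Fig. 4.3 can be reproduced in a few lines. The backlogs in the figure correspond to breaking Q1·b1 = Q2·b2 ties in favor of queue 2, which is our inference from the figure rather than a stated rule; the count of served slots confirms the 8 units of power mentioned in the text, and the system is empty again on slot 9:

```python
rate = {"G": 3, "M": 2, "B": 1}   # packets served per slot by condition
a1 = [3, 0, 3, 0, 0, 1, 0, 1, 0]
a2 = [2, 0, 1, 0, 1, 1, 0, 0, 0]
S1 = list("GGMMGGMMG")
S2 = list("MMBMBMBGB")

Q = [0, 0]
power = 0
for t in range(9):
    b = [rate[S1[t]], rate[S2[t]]]
    w = [Q[0] * b[0], Q[1] * b[1]]
    if max(w) > 0:                      # serve the larger Q_i * b_i
        i = 1 if w[1] >= w[0] else 0    # ties go to queue 2 (assumption)
        Q[i] = max(Q[i] - b[i], 0)
        power += 1                      # one unit of power per service
    Q[0] += a1[t]
    Q[1] += a2[t]
print(power, Q)  # -> 8 [0, 0]
```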
If α(t) = {1. {3. If α(t) = {3. with no loss of power performance. using Qplace placeholder packets would reduce average backlog by exactly this amount. and creating a null packet if no packets from users 1 or 2 are available. User 1 desires to send data to user 2 and user 2 desires to send data to user 1. 0]=Qplace for all t. Q2 (t + 1) = max[Q2 (t) − b2 (t). Conclude that Qi (t) ≥ max[ V /3 − 3.12. S4 (t)). the base station makes a transmission decision α(t) ∈ {{1. provided that this inequality holds for Qi (0). users 1 and 2 can decode the new data if they keep copies of the previous packets they sent.4: An illustration of the 2 phases forming a cycle. 2. The queueing dynamics from one cycle to the next thus satisfy: Q1 (t + 1) = max[Q1 (t) − b1 (t). (Wireless Network Coding) Consider a system of 4 wireless users that communicate to each other through a base station (Fig. 0] + a4 (t) . . Each cycle t is divided into 2 phases: In the ﬁrst phase. a similar XOR operation is done for user 3 and 4 packets. Let t ∈ {0. In the second phase. Base Station Base Station p4 p1 4 1 3 p2 p3 2 p3+p4 p3+p4 p3+p4 p3+p4 2 1 3 4 Phase 1: Uplink transmission of different packets pi Phase 2: Downlink Broadcast of an XORed packet Figure 4. Exercise 4. 2}.4. for example. As in (148). 0] + a3 (t) . using TDMA or FDMA in the ﬁrst phase). 4. We assume all packets are labeled with sequence numbers. 3. 4}}. users 1. 0] + a1 (t) Q3 (t + 1) = max[Q3 (t) − b3 (t). 2}. 0] + a2 (t) . user 3 desires to send data to user 4 and user 4 desires to send data to user 3. OF F }. and 4 all send a new packet (if any) to the base station (this can be accomplished. Q4 (t + 1) = max[Q4 (t) − b4 (t). 2. . Hence. where Si (t) ∈ {ON. Assume that downlink channel conditions are timevarying and known at the beginning of each cycle. . XORing with 0 if only one packet is available. Likewise.10. Only users with ON channel states can receive the transmission. with channel state vector S(t) = (S1 (t). 
and the sequence numbers of both XORed packets are placed in a packet header.4). 4}. the headofline packets for users 1 and 2 are XORed together.} index a cycle. S3 (t). EXERCISES 87 d) Show that queue i is only served if Qi (t) ≥ V /3 . 1. S2 (t). The XORed packet (or null packet) is then broadcast to all users.
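The phase-2 XOR step can be sketched directly: each user keeps a copy of the packet it sent uplink and recovers its partner's packet by XORing the broadcast again (equal-length packets assumed here for simplicity; a missing packet plays the role of the null packet):

```python
def xor_packets(p_a, p_b):
    """Byte-wise XOR of two equal-length packets; None stands in for
    the null (all-zero) packet used when only one packet is available."""
    if p_a is None:
        return p_b
    if p_b is None:
        return p_a
    return bytes(x ^ y for x, y in zip(p_a, p_b))

p1, p2 = b"\x0f\xf0", b"\x33\x33"     # uplink packets from users 1 and 2
coded = xor_packets(p1, p2)           # single downlink broadcast
assert xor_packets(coded, p1) == p2   # user 1 recovers user 2's packet
assert xor_packets(coded, p2) == p1   # user 2 recovers user 1's packet
```

One broadcast thus replaces two separate downlink transmissions, which is what makes the coded mode α(t) = {1, 2} attractive when both queues are nonempty and both channels are ON.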
for k ∈ {1. λ2 . . over cycles.88 4. λ4 ) vectors for which there exists an Sonly policy α ∗ (t) such that: ˆ ˆ ˆ ˆ E b1 (α ∗ (t). c) Consider all possible Sonly algorithms that choose a transmission mode as a stationary and random function of the observed S(t) (and independent of queue backlog). λ3 + . Design a control policy that observes S(t) and chooses actions α(t) to minimize the righthandside of (4. ON) or S(t) = (ON.i. OF F. (A modiﬁed algorithm) Suppose the conditions of Theorem 4. where s = (S1 . ON. etc. . S(t)) = 1 0 if Sk (t) = ON and k ∈ α(t) otherwise and ak (t) is the number of packets arriving over the uplink from node k during cycle t (notice that data destined for node 1 arrives as the process a2 (t). S4 ). m(2) = 1. b2 (α ∗ (t). . Compute (Q(t)) and show it has the form: k=1 (Q(t)) ≤ B − E 4 k=1 Qk (t)[bk (t) − λm(k) ] Q(t) (4. λ4 + ) ∈ for some value > 0. ON. 1} is the number of packets transmitted over the downlink to node k during cycle t. ON. ON ) or (ON.d. S(t)). 2} under states (ON. m(4) = 3. and with bounded second moments. bk (t) ∈ {0. Conclude that the driftminimizing policy of part (b) makes all queues strongly stable. λ4 . 4}. 2} whenever S(t) = (ON. ON. ω(t) and choose an action α(t) ∈ Aω(t) that minimizes the 8 It can also be shown that an algorithm that always chooses α(t) = {1.13. However. 2.93) where m(1) = 2. Give an example where it is impossible to stabilize the system if the controller always chooses α(t) = {1. OF F. with probabilities πs = P r[S(t) = s]. λ2 . λ3 ) Suppose that (λ1 . satisfying: ˆ bk (t) = bk (α(t). and provide an upper bound on time average expected backlog. . although this is not always true. so that (λ1 + . over cycles with rate λk = E {ak (t)}. λ1 . ON. OF F ). a) Suppose that S(t) = (ON. 2} is the best choice in this case.8 1 b) Deﬁne L(Q(t)) = 2 4 Qk (t)2 . Deﬁne the Sonly throughput region as the set of all (λ1 . and where B < ∞. S(t)). λ2 + . suppose that every slot t we observe (t). ON. 
but where a more intelligent control choice would stabilize the system. Arrivals ak (t) are i. See (10) for related examples in the context of a 3 × 3 packet switch.d. S(t)) ≥ (λ2 . OF F. b4 (α ∗ (t).93) over all feasible control policies. OPTIMIZING TIME AVERAGES where Qk (t) is the integer number of packets waiting in the base station for transmission to destination k. OF F ) and when there are indeed two packets to serve will not necessarily work—we need to take queue length into account. S3 . ON. Suppose that S(t) is i.i. m(3) = 4. ON) and that Qk (t) > 0 for all queues k ∈ {1. 4}. Exercise 4. λ3 .8 hold. λ3 . S2 . λ4 ) is interior to . S(t)).). 3. It is tempting to assume that mode α(t) = {1. b3 (α ∗ (t). .
. and: ˆ bk (α(t).5. β(t). α(t) ∈ ˆ {1. ω(t)) = Sk (t) 0 if α(t) = k if α(t) = k (A(t). .The output of the compressor is a compressed packet ˆ of random size a(t) = a(A(t). Deﬁne functions m(A. we have: E a(A(t). β(t). β. c(t))A(t) = A. β.10. β. ˆ ˆ Note that a(·) and d(·) are random functions. . . β. Further assume there is a ﬁnite constant σ 2 such that for all (A. c(t) = c ˆ ˆ E d(A(t). ω(t)) (t) .d. β. c(t) = c ≤ σ2 ≤ σ2 . Further. EXERCISES 89 exact driftpluspenalty expression ( (t)) + V E y0 (α(t). β.i.44).8 hold. d(t)) is i. 1. the arrival process a(t) is generated as the output of a data compression operation. β). ω(t)) = ak (t). (t)) Compressor Distortion d(t) a(t) Q(t) b(t) Figure 4. c). causing a random distortion d(t) = d(A(t). β(t) = β. (DistortionAware Data Compression (143)) Consider a single queue Q(t) with dynamics (2.i. c(t)). β(t)) and chooses a data compression option c(t) ∈ {0. (S1 (t). .d. β. c(t))2 A(t) = A. a network controller observes (A(t). . . a) Show that the same performance guarantees of Theorem 4. c(t)). .i. c(t) = c ˆ ˆ δ(A. β(t). m(A. Assume the pair (a(t). δ(A. β(t). .4. c(t))A(t) = A. yl (t) = ej (t) = 0. over all slots with ˆ the same A(t). c) = E a(A(t). where β(t) ∈ B . rather than minimizing ˆ the upper bound on the righthandside of (4. β(t). . . Speciﬁcally. c(t). over slots. . b) Using (2. every slot t a new packet of size A(t) bits arrives to the system (where A(t) = 0 if no packet arrives). 4. 0) = A. β(t). . c(t) = c Assume that c(t) = 0 corresponds to no compression. β(t). As shown in Fig. β(t) = β. so that m(A. where B represents a set of different data types. c(t))2 A(t) = A.d.14. Every slot t. β(t) = β. c) as follows: Exercise 4. ω(t) = [(a1 (t). Assume the pair (A(t). . 0) = 0 for all (A. β. aK (t)).14.1). so that m(A.2). SK (t))]. state this algorithm (for C = 0) in the special case when L = J = 0. . . C) = 0 for all (A. C}. β(t) = β. where b(t) is an i. This packet has metadata β(t). c) = E d(A(t). ak (α(t). 
transmission rate process with bounded second moments. assume that c(t) = C corresponds to throwing the packet away. K} (representing a single queue that we serve every slot).5: A dynamic data compression system for Exercise 4. β(t)) is i. c) and δ(A. where c(t) indexes a collection of possible data compression algorithms. β).
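For the algorithm design requested in Exercise 4.14, the drift-plus-penalty rule reduces to a per-slot minimization of V·δ(A, β, c) + Q(t)·m(A, β, c) over the compression options c. A sketch with hypothetical m and δ tables (all numbers are made up for illustration):

```python
def choose_compression(Q, A, beta, V, options, m, delta):
    """Pick the option c minimizing V * E{distortion} + Q * E{output size},
    i.e., V*delta(A, beta, c) + Q*m(A, beta, c)."""
    return min(options, key=lambda c: V * delta(A, beta, c) + Q * m(A, beta, c))

# toy tables: c=0 no compression, c=1 lossy compression, c=2 drop packet
m_tab = {0: 1000.0, 1: 400.0, 2: 0.0}   # expected compressed size (bits)
d_tab = {0: 0.0, 1: 1.0, 2: 200.0}      # expected distortion
pick = choose_compression(Q=20.0, A=1000.0, beta=None, V=100.0,
                          options=[0, 1, 2],
                          m=lambda A, b, c: m_tab[c],
                          delta=lambda A, b, c: d_tab[c])
assert pick == 1   # scores: 20000 (c=0), 8100 (c=1), 20000 (c=2)
```

Larger backlog Q pushes the choice toward heavier compression (or dropping the packet), while larger V protects against distortion: the usual [O(1/V), O(V)] distortion-backlog tradeoff.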
1 L( (t)) = 2 K k=1 1 wk Qk (t) + 2 2 L l=1 1 Zl (t) + 2 2 J Hj (t)2 j =1 where {wk }K are a positive weights. so that node 3 can send and receive at the same time. 3}. How does the driftpluspenalty algorithm change? k=1 Y(t) a1(t) X(t) a2(t) Q2(t) 2(t) Q1(t) 1(t) Q3(t) 3(t) Figure 4. 1. A(t). ∀t . . . 3}.}. β(t).6. . 2. It is clear that this problem is feasible.6: The 3node multihop network for Exercise 4. 2. as we can always choose c(t) = C (although this would maximize distortion). β(t). 2. (MultiHop with Orthogonal Channels) Consider the 3node wireless network of Fig. c(t) Q(t) ˆ = E {m(A(t).90 4. We want to design an algorithm that minimizes the time average expected distortion d subject to queue stability. • (Power Allocation) Let μi (t) be the transmission rate at node i on slot t. c(t))Q(t). Exercise 4.16. c) and δ(A. 4. Use the driftpluspenalty framework (with ﬁxed V ) to design such an algorithm.15. β. c(t))Q(t) ˆ = E E a(A(t). c(t))Q(t)} (Weighted Lyapunov Functions) Recompute the driftpluspenalty bound in Lemma 4. This transmission rate depends on the channel state Si (t) and the power allocation decision Pi (t) by the following function: μi (t) = log(1 + Pi (t)Si (t)) ∀i ∈ {1. for i ∈ {1. c) are known. It has orthogonal channels. Hint: Use iterated expectations to claim that: E a(A(t).The network operates in discrete time with unit time slots t ∈ {0. β. β(t). OPTIMIZING TIME AVERAGES Assume the functions m(A. β(t).6 under the following modiﬁed Lyapunov function: Exercise 4.16. The network controller makes power allocation decisions and routing decisions.
over slots. state the driftpluspenalty algorithm for this problem. where a1 (t) is the amount of bits routed to queue 1.d. a1 (t) + a2 (t) = X(t) ∀t 0 ≤ Pi (t) ≤ 1 ∀i ∈ {1.d. ∗ E a2 (t) . P1 (t). a2 (t) ≥ 0 .10. Suppose there is a stationary and randomized policy that observes (X(t). a2 (t). P3 (t)). b) Suppose that V = 20. ∀t • (Routing) There are two arrival processes X(t) and Y (t).i.i. E log(1 + Pi∗ (t)Si (t)) for i ∈ {1. S3 (t)) and determines the power allocation decisions (P1 (t). 2. P3 (t)) based only on the observed vector ∗ (X(t). S2 (t). 3}. and makes ran∗ ∗ ∗ ∗ ∗ domized decisions (a1 (t). ∀t a) Using a ﬁxed parameter V > 0. State desirable properties for the expectations of E a1 (t) . S1 (t) = S2 (t) = S3 (t) = 1. a1 (t) + a2 (t) = X(t) ∀t It can be shown that the Lyapunov drift (Q(t)) satisﬁes the following every slot t: (Q(t)) ≤ B + Q1 (t)E {a1 (t) − μ1 (t)Q(t)} + Q2 (t)E {a2 (t) − μ2 (t)Q(t)} +Q3 (t)E {μ1 (t) + Y (t) − μ3 (t)Q(t)} where B is a positive constant. Your properties should be in the form of desirable inequalities. 2. The X(t) process can be routed to either queue 1 or 2. S2 (t). 3}. S2 (t). Y (t). S1 (t). The network controller observes X(t) every slot and makes decisions for (a1 (t). made subject to the following constraints: 0 ≤ Pi (t) ≤ 1 ∀i ∈ {1. S3 (t)). 3} a1 (t) ≥ 0 . P2 (t). P2 (t). Q1 (t) = 50. S1 (t). a2 (t) ≥ 0 . Let a1 (t) and a2 (t) represent the routing decision variables. over slots with E {X(t)} = λX and E {Y (t)} = λY . S2 (t). S3 (t)) every slot t. The Y (t) process goes directly into queue 3.4. the network controller observes the channels (S1 (t). Suppose (S1 (t). The algorithm should have separable power allocation and routing decisions. S3 (t)) is i. and a2 (t) is the amount of bits routed to queue 2. Y (t)) is i. We want to design a dynamic algorithm that solves the following problem: Minimize: Subject to: 1) 2) 3) P1 + P2 + P3 Qi (t) is mean rate stable ∀i ∈ {1. 2. Y (t). taking units of bits. 
What should the value of P1 (t) be under the driftpluspenalty algorithm? (give a numeric value) c) Suppose (X(t). EXERCISES 91 where log(·) denotes the natural logarithm. 2. . Q2 (t) = Q3 (t) = 20. a2 (t)) subject to the following constraints: a1 (t) ≥ 0 . Every time slot t. 3} that would ensure your algorithm of part (a) would make all queues mean rate stable with time average expected power expenditure given by: P 1 + P 2 + P 3 ≤ φ + B/V where φ is a desired value for the sum time average power.
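Under the displayed drift bound, the power decisions separate across nodes: node 1 weighs its rate by the differential backlog Q1 − Q3 (its output feeds queue 3), nodes 2 and 3 use Q2 and Q3 directly, and routing sends X(t) to the shorter of queues 1 and 2. A numerical sketch of part (b) under this reading (our own code, not from the text):

```python
def power_choice(W, S, V, Pmax=1.0):
    """Minimize V*P - W*log(1 + P*S) over 0 <= P <= Pmax, where W is
    the (differential) backlog weight of the transmitting node; the
    unconstrained minimizer is P = W/V - 1/S, then clipped."""
    if S <= 0.0 or W <= 0.0:
        return 0.0
    return min(max(W / V - 1.0 / S, 0.0), Pmax)

def route_choice(X, Q1, Q2):
    """Minimize Q1*a1 + Q2*a2 subject to a1 + a2 = X: route the whole
    batch to the shorter queue."""
    return (X, 0.0) if Q1 <= Q2 else (0.0, X)

# part (b): V = 20, Q1 = 50, Q2 = Q3 = 20, S1 = S2 = S3 = 1
V, Q1, Q2, Q3 = 20, 50, 20, 20
P1 = power_choice(Q1 - Q3, S=1.0, V=V)   # (50 - 20)/20 - 1 = 0.5
assert P1 == 0.5
```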
4.11 APPENDIX 4.A — PROVING THEOREM 4.5

This appendix characterizes the set of all possible time average expectations for the variables [(yl(t)), (ej(t)), (ak(t)), (bk(t))] defined in Section 4.2. It concludes with a proof of Theorem 4.5, which shows that optimality for the problem (4.31)-(4.35) can be defined over the class of ω-only policies. The proof involves the set theoretic concepts of convex sets, closed sets, limit points, and convergent subsequences. Specifically, we use the well known fact that if {x(t)}_{t=0}^∞ is an infinite sequence of vectors that are contained in some bounded set X ⊆ R^k (for some finite integer k > 0), then there must exist a convergent subsequence {x(ti)}_{i=1}^∞ that converges to a point x in the closure of X (see, for example, A14 of (145)). That is, there is a vector x in the closure of X and an infinite sequence of increasing positive integers {t1, t2, t3, ...} such that:

lim_{i→∞} x(ti) = x

4.11.1 THE REGION Γ

Let Γ represent the region of all [(y_l)_{l=0}^L, (e_j)_{j=1}^J, (a_k)_{k=1}^K, (b_k)_{k=1}^K] values that can be achieved by ω-only policies. The boundedness assumptions (4.25)-(4.30) ensure that the set Γ is bounded. It is easy to show that Γ is also convex by using an ω-only policy that is a mixture of two other ω-only policies. Equivalently, Γ can be viewed as the region of all one-slot expectations that can be achieved via randomized decisions when the ω(t) variable takes values according to its stationary distribution.

Now note that for any slot τ, assuming that ω(τ) has its stationary distribution, the one-slot expectation under any decision α(τ) ∈ A_ω(τ) is in the set Γ. In particular, if ω(τ) is in its stationary distribution for all slots τ, then for any policy that chooses α(τ) ∈ A_ω(τ) over time (including policies that are not ω-only):

E[(ŷl(α(τ), ω(τ))), (êj(α(τ), ω(τ))), (âk(α(τ), ω(τ))), (b̂k(α(τ), ω(τ)))] ∈ Γ

where the expectation is with respect to the random ω(τ) (which has the stationary distribution) and the possibly random decision α(τ) that is made by the policy in reaction to the observed ω(τ).9 This expectation is in Γ because any sample path of events that leads the policy to choose α(τ) on slot τ simply affects the conditional distribution of α(τ) given the observed ω(τ), and hence the expectation can be equally achieved by the ω-only policy that uses the same conditional distribution. This observation directly leads to the following simple lemma.

Lemma 4.17 If ω(τ) is in its stationary distribution for all slots τ, we have for any slot t > 0:

(1/t)·Σ_{τ=0}^{t−1} E[(ŷl(α(τ), ω(τ))), (êj(α(τ), ω(τ))), (âk(α(τ), ω(τ))), (b̂k(α(τ), ω(τ)))] ∈ Γ   (4.94)

9 We implicitly assume that the decision α(τ) on slot τ has a well defined conditional distribution given the observed ω(τ), even if that decision is from an arbitrary policy that is not an ω-only policy.
Proof. Each term in the time average is itself in Γ, and so the time average is also in Γ because Γ is convex. □

Thus, the finite horizon time average expectation under any policy cannot escape the set Γ, and any infinite horizon time average that converges to a limit point cannot escape the closure of Γ. In particular, if r* is a limit point of the time average on the left-hand-side of (4.94) over a subsequence of times ti that increase to infinity, then r* is in the closure of Γ. If the set Γ is closed, then any limit point r* is inside Γ and hence (by definition of Γ) can be exactly achieved as the one-slot average under some ω-only policy. If Γ is not closed, then r* can be achieved arbitrarily closely (i.e., within a distance δ, for any arbitrarily small δ > 0) by an ω-only policy.

4.11.2 CHARACTERIZING OPTIMALITY

Define Γ̃ as the set of all points [(yl), (ej), (ak), (bk)] in the closure of Γ that satisfy:

yl ≤ 0 ∀l ∈ {1, ..., L}, ej = 0 ∀j ∈ {1, ..., J}, ak ≤ bk ∀k ∈ {1, ..., K}   (4.95)

It can be shown that Γ̃, if nonempty, is closed and bounded. Intuitively, the set Γ̃ is the set of all time averages achievable by ω-only policies that meet the required time average constraints and that have time average expected arrivals less than or equal to time average expected service. If Γ̃ is nonempty, define y0* as the minimum value of y0 for which there is a point [(yl), (ej), (ak), (bk)] ∈ Γ̃. This naturally leads to the following characterization of optimality in terms of ω-only policies.

Theorem 4.18 Suppose the ω(t) process is stationary with distribution π(ω), that the problem (4.32)-(4.35) is feasible, and that the system satisfies the boundedness assumptions (4.25)-(4.30) and the law of large numbers assumption specified in Section 4.2. Let α(t) be any control policy that satisfies the constraints (4.32)-(4.35), and let r̄(t) represent the t-slot expected time average in the left-hand-side of (4.94) under this policy. Then:

a) Any limit point [(yl), (ej), (ak), (bk)] of {r̄(t)}_{t=1}^∞ is in the set Γ̃. In particular, the set Γ̃ is nonempty.

b) The time average expected penalty under the algorithm α(t) satisfies:

lim inf_{t→∞} (1/t)·Σ_{τ=0}^{t−1} E{ŷ0(α(τ), ω(τ))} ≥ y0*   (4.96)

Thus, no algorithm that satisfies the constraints (4.32)-(4.35) can yield a time average expected penalty smaller than y0*. We now show that y0* = y0^opt.
. (ak ). We now show that it is possible to achieve y0 . our “lawoflargeˆ number” assumption on ω(t) ensures the time averages of ak (α ∗ (t). (b∗ )] follows there is an ωonly algorithm α l j k k ∗ on every slot t.99) and that yield well deﬁned time averages [(yl ). (bk )] be the point in ˜ ∗ ∗ ∗ ∗ that has component y0 . ω(τ )) − bk (α(τ ). (bk )]. . Let [(yl∗ ). (ej ). (ej ). J }. .17 that r (t) is always inside the (bounded) set . ω(t)). . .32) and (4. ˜ is a subset of .33) are satisﬁed. (ak ). (ej ). Then by part (a). . and any such limit point is in the closure of . and so its y0 component (being the lim inf value in (4. (ej ). For ∗ ∗ ∗ simplicity. and then taking a convergent subsequence {ti } of {ti } that ensures the r (ti ) values converge to a limit point). Now consider a particular limit point [(yl ). (ej ). . that yield the lim inf by: 1 i→∞ ti lim ti −1 E y0 (α(τ ). . L} . . ˆ . by the samplepath inequality (2. It ∗ (t) with expectations exactly equal to [(y ∗ ). ej = 0 for all j ∈ {1. . let {ti }∞ be a subsequence of nonnegative integer time slots that increase i=1 to inﬁnity. we know that [(yl ). and so [(yl∗ ). (ak ). . Because is closed. (ak ). Hence. the time average penalty is y0 . . . and so y0 = y0 . Thus. (ak ). (ej ). J } Further. and hence we ﬁnd that: ak ≤ bk ∀k ∈ {1. note from Lemma 4. (bk )] ∈ ˜ .32).94 4. ω(τ )) = lim inf ˆ τ =0 t→∞ 1 t t−1 E y0 (α(τ ). it must be the case that: yl ≤ 0 ∀l ∈ {1. (bk )] for r (ti ) (such a subsequence can be constructed by ﬁrst taking a subsequence {ti } that achieves the lim inf. ω(t)) and bk (α ∗ (t). .33) are satisﬁed ∗ because yl∗ ≤ 0 for all l ∈ {1. (e∗ ). Further. K} (4. (4. ω(τ )) ˆ τ =0 (4. it has a limit point. . It follows that no control algorithm that satisﬁes the required constraints has a time average opt ∗ ∗ ∗ expected penalty less than y0 . (ak ).98) The results (4. L}. is the smallest possible y0 value of all points in ˜ . To prove part (b). OPTIMIZING TIME AVERAGES Proof. 
and the constraints (4. taking a limit of the above over the times ti → ∞ yields 0 ≥ ak − bk . we have for all ti > 0 and all k: 1 E {Qk (ti )} E {Qk (0)} − ≥ ti ti ti ti −1 τ =0 (4. . and let {ti }∞ be the subsequence of nonnegative integer time i=1 slots that increase to inﬁnity and satisfy: i→∞ lim r (ti ) = [(yl ). (ak ). . . (bk )] Because the constraints (4.97) ˆ E ak (α(τ ). To prove part (a). (bk )] is in the set ˜ . . . (ej ).97) and (4. (a ∗ ).5).99)) is ∗ ∗ greater than or equal to y0 because y0 . (bk )] ∈ . ω(τ )) ˆ Because the control policy makes all queues mean rate stable.98) imply that the limit point [(yl ). ej = 0 ∀j ∈ {1. we consider only the case when is closed.
4.11.36)(4. 2 We use this result to prove Theorem 4. (ak ). (ej ). (Theorem 4.39) with δ = 0. is closed. (ak ). yielding (4.39) with δ = 0. (b∗ )]. ∗ ak ∗ bk The above proof shows that if the assumptions of Theorem 4. . If is closed.5) Let [(yl∗ ). Because ∗ ≤ b∗ and the second moments of a (t) and b (t) are bounded by a ﬁnite constant σ 2 for all t. APPENDIX 4. (ej ). (ak ). (ej ). (a ∗ ). (bk )] be the point in ˜ that has component y0 (where opt ∗ y0 = y0 by Theorem 4. (bk )] ∈ and so there exists an ωonly policy α ∗ (t) that achieves the av∗ ∗ ∗ erages [(yl∗ ).5 hold and if the set then an ωonly policy exists that satisﬁes the inequalities (4. (ak ). (bk )] and thus satisﬁes (4. to [(yl 2 j k k α ∗ (t).18). Note by deﬁnition that ˜ is in the closure of .36)(4. then ∗ ∗ ∗ [(yl∗ ). ∗ ∗ ∗ ∗ Proof.39) for any δ > 0. ak k k k the Rate Stability Theorem (Theorem 2.A — PROVING THEOREM 4. ∗ ∗ ∗ then [(yl∗ ). (bk )] is a limit point of and so there is an ωonly policy that gets arbitrarily close ∗ ).36)(4.5 95 achieved under the ωonly algorithm are equal to and with probability 1. If is not closed.5. (ej ). (e∗ ).4) ensures that all queues Qk (t) are mean rate stable.
CHAPTER 5

Optimizing Functions of Time Averages

Here we use the drift-plus-penalty technique to develop methods for optimizing convex functions of time averages, for example, when maximizing network throughput-utility. A more general problem, without the entrywise nondecreasing assumption and for finding local optimums for non-convex functions of time averages, is considered in Section 5.4.

To begin, consider a discrete time queueing system Q(t) = (Q1(t), ..., QK(t)) with the standard update equation:

Qk(t + 1) = max[Qk(t) − bk(t), 0] + ak(t)   (5.1)

Let x(t) = (x1(t), ..., xM(t)) and y(t) = (y1(t), ..., yL(t)) be attribute vectors. As before, the arrival, service, and attribute variables are determined by general functions: ak(t) = âk(α(t), ω(t)), bk(t) = b̂k(α(t), ω(t)), xm(t) = x̂m(α(t), ω(t)), and yl(t) = ŷl(α(t), ω(t)). Consider now the following problem:

Maximize: φ(x̄)   (5.2)
Subject to: 1) ȳl ≤ 0 ∀l ∈ {1, ..., L}   (5.3)
2) All queues Qk(t) are mean rate stable   (5.4)
3) α(t) ∈ A_ω(t) ∀t   (5.5)

where φ(x) is a concave, continuous, and entrywise nondecreasing utility function defined over an appropriate region of R^M (such as the nonnegative orthant when xm(t) attributes are nonnegative, or all of R^M otherwise). Problems with the structure (5.2)-(5.5) arise, for example, when maximizing network throughput-utility, where x̄ represents a vector of achieved throughput and φ(x) is a concave function that measures network fairness. An example utility function that is useful when attributes xm(t) are nonnegative is:

φ(x) = Σ_{m=1}^M log(1 + νm·xm)   (5.6)

where νm are positive constants. This is useful because each component function log(1 + νm·xm) has a diminishing returns property as xm is increased, has maximum derivative νm, and is 0 when xm = 0. Another common example is:

φ(x) = Σ_{m=1}^M log(xm)   (5.7)
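The queue update (5.1) is the basic primitive used throughout the chapter. A minimal sketch (illustrative variable names, not from the text):

```python
def queue_update(Q, b, a):
    """One slot of the standard update Q(t+1) = max[Q(t) - b(t), 0] + a(t).

    Q, b, a are lists of current backlogs, offered service amounts,
    and new arrivals for each of the K queues.
    """
    return [max(Qk - bk, 0.0) + ak for Qk, bk, ak in zip(Q, b, a)]

Q = [5.0, 0.0, 2.0]
Q = queue_update(Q, b=[3.0, 1.0, 4.0], a=[1.0, 2.0, 0.5])
print(Q)  # [3.0, 2.0, 0.5]
```

Note that arrivals a(t) are added after the service max[·, 0], so arrivals on slot t cannot be served until slot t + 1 or later, matching the convention of the text.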
This corresponds to the proportional fairness objective (1)(2)(5). The function φ(x) does not need to be differentiable. An example nondifferentiable function that is concave, continuous, and entrywise nondecreasing is φ(x) = min[x1, x2, ..., xM].

The problem (5.2)-(5.5) is different from all of the problems seen in Chapter 4 because it involves a function of a time average. It does not conform to the structure required for the drift-plus-penalty framework of Chapter 4 unless the function φ(x) is linear, because a linear function of a time average is equal to the time average of the linear function. In the case when φ(x) is concave but nonlinear, maximizing the time average of φ(x(t)) is typically not the same as maximizing φ(x̄) (see Exercise 5.12 for a special case when it is the same). Below we transform the problem by adding a rectangle constraint and auxiliary variables in such a way that the transformed problem involves only time averages (not functions of time averages), so that the drift-plus-penalty framework of Chapter 4 can be applied.

5.0.3 THE RECTANGLE CONSTRAINT R

We consider the problem (5.2)-(5.5), augmented with the following rectangle constraint:

x̄ ∈ R   (5.8)

where R is defined:

R = {(x1, ..., xM) ∈ R^M | γm,min ≤ xm ≤ γm,max ∀m ∈ {1, ..., M}}

where γm,min and γm,max are finite constants (we typically choose γm,min = 0 in cases when attributes xm(t) are nonnegative). This rectangle constraint is useful because it limits the x̄ vector to a bounded region, and it will ensure that the auxiliary variables that we soon define are also bounded. Define φ^opt as the maximum utility associated with the above problem. While this x̄ ∈ R constraint may limit optimality, it is clear that φ^opt increases to the maximum utility of the problem without this constraint as the rectangle R is expanded. Further, φ^opt is exactly equal to the maximum utility of the original problem (5.2)-(5.5) whenever the rectangle R is chosen large enough to contain a time average attribute vector x̄ that is optimal for the original problem.

5.0.4 JENSEN'S INEQUALITY

The key step in analyzing the transformed problem is Jensen's inequality. Assume the concave utility function φ(x) is defined over the rectangle region x ∈ R. Let X = (X1, ..., XM) be a random vector that takes values in R. Jensen's inequality for concave functions states that:

E{X} ∈ R and E{φ(X)} ≤ φ(E{X})   (5.9)

Indeed, even though we stated Jensen's inequality in Section 1.8 in terms of convex functions f(x) with a reversed inequality E{f(X)} ≥ f(E{X}), this immediately implies (5.9) by defining f(X) = −φ(X). Now let γ(τ) = (γ1(τ), ..., γM(τ)) be an infinite sequence of random vectors that take values in the set R for τ ∈ {0, 1, 2, ...}. It is easy to show that Jensen's inequality for concave functions
directly implies the following for all t > 0 (see Exercise 5.3):

(1/t)·Σ_{τ=0}^{t−1} γ(τ) ∈ R and (1/t)·Σ_{τ=0}^{t−1} φ(γ(τ)) ≤ φ((1/t)·Σ_{τ=0}^{t−1} γ(τ))   (5.10)

(1/t)·Σ_{τ=0}^{t−1} E{γ(τ)} ∈ R and (1/t)·Σ_{τ=0}^{t−1} E{φ(γ(τ))} ≤ φ((1/t)·Σ_{τ=0}^{t−1} E{γ(τ)})   (5.11)

Taking limits of (5.11) as t → ∞ yields:

γ̄ ∈ R and φ̄(γ) ≤ φ(γ̄)

where γ̄ and φ̄(γ) are defined as the following limits:

γ̄ = lim_{t→∞} (1/t)·Σ_{τ=0}^{t−1} E{γ(τ)}, φ̄(γ) = lim_{t→∞} (1/t)·Σ_{τ=0}^{t−1} E{φ(γ(τ))}   (5.12)

where we temporarily assume the above limits exist. We have used the fact that the rectangle R is a closed set to conclude that a limit of vectors in R is also in R. In summary, whenever the limits γ̄ and φ̄(γ) exist, the utility function evaluated at the time average expectation γ̄ is greater than or equal to the time average expectation of φ(γ(t)). That is, we can conclude by Jensen's inequality that φ(γ̄) ≥ φ̄(γ).

5.0.5 AUXILIARY VARIABLES

Let γ(t) = (γ1(t), ..., γM(t)) be a vector of auxiliary variables chosen within the set R every slot, and define a new control action α̃(t) = (α(t), γ(t)) subject to α̃(t) ∈ [A_ω(t), R]. We consider the following modified problem:

Maximize: φ̄(γ)   (5.13)
Subject to: 1) ȳl ≤ 0 ∀l ∈ {1, ..., L}   (5.14)
2) γ̄m ≤ x̄m ∀m ∈ {1, ..., M}   (5.15)
3) All queues Qk(t) are mean rate stable   (5.16)
4) γ(t) ∈ R ∀t   (5.17)
5) α(t) ∈ A_ω(t) ∀t   (5.18)

where φ̄(γ) and γ̄ = (γ̄1, ..., γ̄M) are defined in (5.12). This transformed problem involves only time averages, rather than functions of time averages, and hence can be solved with the drift-plus-penalty framework of Chapter 4. Indeed, we can define the penalty y0(t) = −φ(γ(t)).

This transformed problem (5.13)-(5.18) relates to the original problem as follows: Suppose we have an algorithm that makes decisions α*(t) and γ*(t) over time t ∈ {0, 1, 2, ...} to solve the transformed problem. For simplicity, assume the solution meets all constraints (5.14)-(5.18) and yields a maximum value for the objective (5.13), where φ̄(γ*) is the maximum objective value, and assume all limiting time average expectations x̄*, γ̄*, φ̄(γ*) exist. Then:
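Jensen's inequality (5.9) for a concave φ can be sanity-checked numerically. A small sketch with φ(x) = log(1 + x), i.e., a single component of (5.6) with ν = 1 (the sample values are illustrative only):

```python
import math

def phi(x):
    # Concave, nondecreasing utility: one component of (5.6) with nu = 1.
    return math.log(1.0 + x)

samples = [0.0, 1.0, 4.0, 7.0]            # sample values of a component of X
mean = sum(samples) / len(samples)         # E{X} = 3.0
avg_phi = sum(phi(s) for s in samples) / len(samples)

print(round(phi(mean), 4))   # phi(E{X}) = log(4) ~ 1.3863
print(round(avg_phi, 4))     # E{phi(X)} ~ 1.0955
assert avg_phi <= phi(mean)  # Jensen: E{phi(X)} <= phi(E{X})
```

The same comparison with the time averages of (5.10)-(5.11) in place of expectations shows why pushing the auxiliary variables γ(t) toward a good time average γ̄ cannot overstate the achievable utility: φ evaluated at the average dominates the average of φ.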
• The decisions α*(t) produce time averages that satisfy all desired constraints of the original problem (5.2)-(5.5) (so that ȳl* ≤ 0 for all l and all queues Qk(t) are mean rate stable), and the resulting time average attribute vector x̄* satisfies φ(x̄*) ≥ φ̄(γ*). This is because:

φ(x̄*) ≥ φ(γ̄*) ≥ φ̄(γ*)

where the first inequality is due to (5.15) and the entrywise nondecreasing property of φ(x), and the second inequality is Jensen's inequality.

• φ̄(γ*) ≥ φ^opt. That is, the maximum utility of the transformed problem (5.13)-(5.18) is greater than or equal to φ^opt. This is shown in Exercise 5.2.

The above two observations imply that φ(x̄*) ≥ φ^opt. Thus, designing a policy to solve the transformed problem ensures all desired constraints of the original problem (5.2)-(5.5) are satisfied while producing a utility that is at least as good as φ^opt.

5.1 SOLVING THE TRANSFORMED PROBLEM

Following the drift-plus-penalty method (using a fixed V), we enforce the constraints ȳl ≤ 0 and γ̄m ≤ x̄m in the transformed problem (5.13)-(5.18) with virtual queues Zl(t) and Gm(t):

Zl(t + 1) = max[Zl(t) + yl(t), 0], ∀l ∈ {1, ..., L}   (5.19)
Gm(t + 1) = max[Gm(t) + γm(t) − xm(t), 0], ∀m ∈ {1, ..., M}   (5.20)

Define Θ(t) = [Q(t), Z(t), G(t)], and define the Lyapunov function:

L(Θ(t)) = (1/2)[Σ_{k=1}^K Qk(t)^2 + Σ_{l=1}^L Zl(t)^2 + Σ_{m=1}^M Gm(t)^2]

Assume that ω(t) is i.i.d., and that yl(t), xm(t), ak(t), bk(t) satisfy the boundedness assumptions (4.25)-(4.28). It is easy to show the drift-plus-penalty expression satisfies:

Δ(Θ(t)) − V·E{φ(γ(t)) | Θ(t)} ≤ D − V·E{φ(γ(t)) | Θ(t)} + Σ_{l=1}^L Zl(t)E{yl(t) | Θ(t)} + Σ_{k=1}^K Qk(t)E{ak(t) − bk(t) | Θ(t)} + Σ_{m=1}^M Gm(t)E{γm(t) − xm(t) | Θ(t)}   (5.21)

where D is a finite constant related to the worst-case second moments of yl(t), xm(t), ak(t), bk(t). A C-additive approximation chooses γ(t) ∈ R and α(t) ∈ A_ω(t) such that, given Θ(t), the right-hand-side of (5.21) is within C of its infimum value. A 0-additive approximation thus performs the following:

• (Auxiliary Variables) For each slot t, observe G(t) and choose γ(t) to solve:

Maximize: V·φ(γ(t)) − Σ_{m=1}^M Gm(t)γm(t)   (5.22)
Subject to: γm,min ≤ γm(t) ≤ γm,max ∀m ∈ {1, ..., M}   (5.23)
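For the separable utility (5.6), the auxiliary-variable selection step decouples across components m and has a closed form. A sketch under that assumption (with γm,min = 0; the closed form below is derived here, not quoted from the text):

```python
def choose_gamma(V, nu, G, gamma_max):
    """Maximize V*log(1 + nu*g) - G*g over 0 <= g <= gamma_max.

    Setting the derivative V*nu/(1 + nu*g) - G to zero gives
    g = V/G - 1/nu, clipped to [0, gamma_max]. If G = 0, the
    objective is nondecreasing in g, so choose g = gamma_max.
    """
    if G <= 0:
        return gamma_max
    g = V / G - 1.0 / nu
    return min(max(g, 0.0), gamma_max)

print(choose_gamma(V=100.0, nu=1.0, G=20.0, gamma_max=10.0))   # 4.0
print(choose_gamma(V=100.0, nu=1.0, G=200.0, gamma_max=10.0))  # 0.0
```

The second call illustrates the behavior used later in the chapter: once the virtual queue satisfies G > V·ν, the derivative V·ν/(1 + ν·g) − G is negative at g = 0, so the chosen auxiliary variable is 0 and the virtual queue stops growing.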
• (α(t) Decision) For each slot t, observe Θ(t) and ω(t), and choose α(t) ∈ A_ω(t) to minimize:

Σ_{k=1}^K Qk(t)[âk(α(t), ω(t)) − b̂k(α(t), ω(t))] + Σ_{l=1}^L Zl(t)ŷl(α(t), ω(t)) − Σ_{m=1}^M Gm(t)x̂m(α(t), ω(t))

• (Queue Update) Update the virtual queues Zl(t) and Gm(t) according to (5.19) and (5.20), and the actual queues Qk(t) by (5.1).

Define φ^max as an upper bound on φ(γ(t)) for all t, and assume it is finite:

φ^max = φ(γ1,max, γ2,max, ..., γM,max) < ∞   (5.24)

Define time average expectations x̄(t), γ̄(t), ȳl(t) by:

x̄(t) = (1/t)·Σ_{τ=0}^{t−1} E{x(τ)}, γ̄(t) = (1/t)·Σ_{τ=0}^{t−1} E{γ(τ)}, ȳl(t) = (1/t)·Σ_{τ=0}^{t−1} E{yl(τ)}   (5.25)

Theorem 5.1 Suppose the boundedness assumptions (4.25)-(4.28) hold, the function φ(x) is continuous, concave, and entrywise nondecreasing, the problem (5.2)-(5.8) (including the constraint x̄ ∈ R) is feasible, and E{L(Θ(0))} < ∞. If ω(t) is i.i.d. over slots and any C-additive approximation is used every slot, then all actual and virtual queues are mean rate stable and:

lim inf_{t→∞} φ(x̄(t)) ≥ φ^opt − (D + C)/V   (5.26)
lim sup_{t→∞} ȳl(t) ≤ 0, ∀l ∈ {1, ..., L}   (5.27)

where φ^opt is the maximum utility of the problem (5.2)-(5.8) (including the constraint x̄ ∈ R).

The following extended result provides average queue bounds and utility bounds for all slots t.

Theorem 5.2 Suppose the assumptions of Theorem 5.1 hold.

(a) If there is an ε > 0, an ω-only policy α*(t), and a finite constant φ such that the following Slater-type conditions hold:

E{ŷl(α*(t), ω(t))} ≤ 0, ∀l ∈ {1, ..., L}   (5.28)
E{âk(α*(t), ω(t)) − b̂k(α*(t), ω(t))} ≤ −ε, ∀k ∈ {1, ..., K}   (5.29)
γm,min ≤ E{x̂m(α*(t), ω(t))} ≤ γm,max, ∀m ∈ {1, ..., M}   (5.30)
φ(E{x̂(α*(t), ω(t))}) = φ   (5.31)

then all queues Qk(t) are strongly stable and for all t > 0 we have:

(1/t)·Σ_{τ=0}^{t−1} Σ_{k=1}^K E{Qk(τ)} ≤ [D + C + V(φ̄(γ*) − φ)]/ε + E{L(Θ(0))}/(εt)

where φ̄(γ*) is the maximum objective function value for the transformed problem (5.13)-(5.18).

(b) If all virtual and actual queues are initially empty (so that Θ(0) = 0) and if there are finite constants νm ≥ 0 such that for all γ̄(t) and all x̄(t):

φ(γ̄(t)) − φ(x̄(t)) ≤ Σ_{m=1}^M νm|γ̄m(t) − x̄m(t)|   (5.32)

then for all t > 0 we have:

φ(x̄(t)) ≥ φ^opt − (D + C)/V − Σ_{m=1}^M νm·E{Gm(t)}/t   (5.33)

where E{Gm(t)}/t is O(1/√t) for all m ∈ {1, ..., M}, regardless of whether or not the Slater assumptions (5.28)-(5.31) hold. Further, it can be shown that if (5.32) holds, γ(t) is chosen by (5.22)-(5.23), and xm(t) ≥ γm,min for all t, then Gm(t) ≤ V·νm + γm,max for all t (provided this holds at t = 0). In this case, E{Gm(t)}/t is O(1/t), better than the O(1/√t) bound given in the above theorem.

The assumption that all queues are initially empty, made in part (b) of the above theorem, is made only for convenience; the right-hand-side of (5.33) would be modified by subtracting the additional term E{L(Θ(0))}/(Vt) otherwise. We note that the νm constraint (5.32) needed in part (b) is satisfied for the example utility function in (5.6), but not for the proportionally fair utility function in (5.7). As before, the same algorithm can be shown to perform efficiently when the ω(t) process is non-i.i.d. (38)(39)(136)(42). This is because the auxiliary variables transform the problem to a structure that is the same as that covered by the ergodic theory and universal scheduling theory of Section 4.9. For example, the algorithm developed in this section (or C-additive approximations of the algorithm) often results in deterministically bounded queues (see flow control examples in Sections 5.2, 5.3 and Exercises 5.5, 5.7).

Proof. (Theorem 5.1) Because the C-additive approximation comes within C of minimizing the right-hand-side of (5.21), we have:

Δ(Θ(t)) − V·E{φ(γ(t)) | Θ(t)} ≤ D + C − V·φ(γ*) + Σ_{l=1}^L Zl(t)E{yl*(t) | Θ(t)} + Σ_{k=1}^K Qk(t)E{ak*(t) − bk*(t) | Θ(t)} + Σ_{m=1}^M Gm(t)E{γm* − xm*(t) | Θ(t)}   (5.34)
where γ* = (γ1*, ..., γM*) is any vector in R, and yl*(t), ak*(t), bk*(t), xm*(t) correspond to any alternative (possibly randomized) policy α*(t) ∈ A_ω(t).

Now note that feasibility of the problem (5.2)-(5.8) implies feasibility of the transformed problem (5.13)-(5.18).1 This together with Theorem 4.5 implies that for any δ > 0, there is an ω-only policy α*(t) ∈ A_ω(t) and a vector γ* ∈ R such that:

−φ(γ*) ≤ −φ^opt + δ
E{ŷl(α*(t), ω(t))} ≤ δ, ∀l ∈ {1, ..., L}
E{âk(α*(t), ω(t)) − b̂k(α*(t), ω(t))} ≤ δ, ∀k ∈ {1, ..., K}
E{γm* − x̂m(α*(t), ω(t))} ≤ δ, ∀m ∈ {1, ..., M}

Assuming that δ = 0 for convenience and plugging the above into (5.34) gives:2

Δ(Θ(t)) − V·E{φ(γ(t)) | Θ(t)} ≤ D + C − V·φ^opt   (5.35)

This is in the exact form for application of the Lyapunov Optimization Theorem (Theorem 4.2), and hence by that theorem (or, equivalently, by using iterated expectations and telescoping sums in the above inequality), we have for all t > 0:

(1/t)·Σ_{τ=0}^{t−1} E{φ(γ(τ))} ≥ φ^opt − (D + C)/V − E{L(Θ(0))}/(Vt)

By Jensen's inequality for the concave function φ(γ), we have for all t > 0:

φ(γ̄(t)) ≥ φ^opt − (D + C)/V − E{L(Θ(0))}/(Vt)   (5.36)

Taking a lim inf of both sides yields:

lim inf_{t→∞} φ(γ̄(t)) ≥ φ^opt − (D + C)/V   (5.37)

On the other hand, rearranging (5.35) yields:

Δ(Θ(t)) ≤ D + C + V(φ^max − φ^opt)

Thus, by the Lyapunov Drift Theorem (Theorem 4.1), we know that all queues Qk(t), Zl(t), Gm(t) are mean rate stable (in fact, E{Qk(t)}/t, E{Gm(t)}/t, and E{Zl(t)}/t are O(1/√t)). Mean rate stability of Zl(t) and Gm(t) together with Theorem 2.5 implies that (5.27) holds, and that for all m ∈ {1, ..., M}:

lim sup_{t→∞} [γ̄m(t) − x̄m(t)] ≤ 0

Using this with the continuity and entrywise nondecreasing properties of φ(x), it can be shown that:

lim inf_{t→∞} φ(γ̄(t)) ≤ lim inf_{t→∞} φ(x̄(t))

Using this in (5.37) proves (5.26). □

1 To see this, the transformed problem can just use the same α(t) decisions, and it can choose γ(t) = x̄ for all t.
2 The same result can be derived using δ > 0 and then taking a limit as δ → 0.
Proof. (Theorem 5.2) We first prove part (b). To begin, note that:

φ(γ̄(t)) = φ(x̄(t) + [γ̄(t) − x̄(t)])
≤ φ(x̄(t) + max[γ̄(t) − x̄(t), 0])   (5.38)
≤ φ(x̄(t)) + Σ_{m=1}^M νm·max[γ̄m(t) − x̄m(t), 0]   (5.39)

where (5.38) follows by the entrywise nondecreasing property of φ(x) (where the max[·] represents an entrywise max), and (5.39) follows by (5.32). By definition of Gm(t) in (5.20) and the sample path queue property (2.5), together with the fact that Gm(0) = 0, we have for all m ∈ {1, ..., M} and any t > 0:

Gm(t)/t ≥ (1/t)·Σ_{τ=0}^{t−1} γm(τ) − (1/t)·Σ_{τ=0}^{t−1} xm(τ)

Taking expectations above yields for all t > 0:

E{Gm(t)}/t ≥ γ̄m(t) − x̄m(t)  ⇒  E{Gm(t)}/t ≥ max[γ̄m(t) − x̄m(t), 0]   (5.40)

Substituting (5.40) into (5.39), and using (5.36) with E{L(Θ(0))} = 0, yields:

φ(x̄(t)) ≥ φ(γ̄(t)) − Σ_{m=1}^M νm·max[γ̄m(t) − x̄m(t), 0] ≥ φ^opt − (D + C)/V − Σ_{m=1}^M νm·E{Gm(t)}/t

which proves part (b) of the theorem.

To prove part (a), we plug the ω-only policy α*(t) from (5.28)-(5.31) (using γ*(t) = E{x̂(α*(t), ω(t))}) into (5.34). This directly leads to a version of part (a) of the theorem with φ̄(γ*) replaced with φ^max. A more detailed analysis shows this can be replaced with φ̄(γ*), because all constraints of the transformed problem are satisfied and so the lim sup time average objective can be no bigger than φ̄(γ*) (recall (4.96) of Theorem 4.18). □

5.2 A FLOW-BASED NETWORK MODEL

Here we apply the stochastic utility maximization framework to a simple flow-based network model, where we neglect the actual network queueing and develop a flow control policy that simply ensures the flow rate over each link is no more than the link capacity (similar to the flow-based models for internet and wireless systems in (2)(23)(29)(149)(150)). Section 5.3 treats a more extensive network model that explicitly accounts for all queues.

Suppose there are N nodes and L links, where each link l ∈ {1, ..., L} has a possibly time-varying link capacity bl(t), for slotted time t ∈ {0, 1, 2, ...}. Suppose there are M sessions, and let Am(t) represent the new arrivals to session m on slot t. Each session m ∈ {1, ..., M} has a particular source node and a particular destination node. The random network event ω(t) is thus:

ω(t) = [(b1(t), ..., bL(t)), (A1(t), ..., AM(t))]   (5.41)

The control action taken every slot is to first choose xm(t), the amount of type m traffic admitted into the network on slot t, according to:

0 ≤ xm(t) ≤ Am(t) ∀m ∈ {1, ..., M}, ∀t   (5.42)

The constraint (5.42) is just one example of a flow control constraint. We can easily modify this to the constraint xm(t) ∈ {0, Am(t)}, which either admits all newly arriving data or drops all of it. Alternatively, the flow controller could place all non-admitted data into a transport layer storage reservoir (rather than dropping it), as in (18)(22)(19)(17) (see also Section 5.6). One can model a network where all sources always have data to send by Am(t) = γm,max for all t, for some finite value γm,max used to limit the amount of data admitted to the network on any slot.

Next, we must specify a path for the newly arriving data from a collection of paths Pm associated with path options of session m on slot t (possibly being the set of all possible paths in the network from the source of session m to its destination). Here, a path is defined in the usual sense, being a sequence of links starting at the source, ending at the destination, and being such that the end node of each link is the start node of the next link. Let 1l,m(t) be an indicator variable that is 1 if the data xm(t) is selected to use a path that contains link l, and is 0 else. The (1l,m(t)) values completely specify the chosen paths for slot t, and hence the decision variable for slot t is given by:

α(t) = [(x1(t), ..., xM(t)), (1l,m(t))_{l∈{1,...,L}, m∈{1,...,M}}]

Let x̄ = (x̄1, ..., x̄M) be a vector of the infinite horizon time average admitted flow rates, where the time averages are defined:

x̄m = lim_{t→∞} (1/t)·Σ_{τ=0}^{t−1} E{xm(τ)}
avg(1l,m·xm) = lim_{t→∞} (1/t)·Σ_{τ=0}^{t−1} E{1l,m(τ)xm(τ)}
b̄l = lim_{t→∞} (1/t)·Σ_{τ=0}^{t−1} E{bl(τ)}

Our goal is to maximize the throughput-utility φ(x̄) subject to the constraint that the time average flow over each link l is less than or equal to the time average capacity of that link. Let φ(x) = Σ_{m=1}^M φm(xm) be a separable utility function, where each φm(x) is a continuous, concave, nondecreasing function in x. The infinite horizon utility optimization problem of interest is thus:

Maximize: Σ_{m=1}^M φm(x̄m)   (5.43)
Subject to: avg(Σ_{m=1}^M 1l,m·xm) ≤ b̄l ∀l ∈ {1, ..., L}   (5.44)
0 ≤ xm(t) ≤ Am(t) ∀m ∈ {1, ..., M}, ∀t   (5.45)
Deﬁne φ opt as the maximum utility associated with the above problem and subject to the additional constraint that: 0 ≤ x m ≤ γm. M}. observe the new arrivals Am (t). .46) for some ﬁnite values γm. M} (we choose γm. . . . M} observes Gm (t) and chooses γm (t) as the solution to: Maximize: V φm (γm (t)) − Gm (t)γm (t) Subject to: 0 ≤ γm (t) ≤ γm. and with R being all γ vectors that satisfy m=1 0 ≤ γm ≤ γm. . we are only ensuring the time average ﬂow rate on each link l satisﬁes (5.2)(5. . . . .m (t)xm (t) − bl (t). each session m ∈ {1.m xm bl 1 = t→∞ t lim 1 lim = t→∞ t = 1 t→∞ t lim t−1 E {xm (τ )} τ =0 t−1 E 1l. .m (t)) is in Pm . the virtual queue backlogs Gm (t).49) (5. . 0 (5. . Rather.max . As there are no actual queues Qk (t) in this model.m (t)Zl (t) l=1 Subject to: 0 ≤ xm (t) ≤ Am (t) The path speciﬁed by (1l. K = 0.min = 0 because attributes xm (t) are nonnegative).5) with yl (t)= M 1l. . . we are not explicitly accounting for such queueing dynamics.5 thus reduces to: • (Auxiliary Variables) Every slot t. and choose xm (t) and a path to maximize: Maximize: xm (t)Gm (t) − xm (t) L 1l.48) Gm (t + 1) = max[Gm (t) + γm (t) − xm (t). deﬁned by update equations: M Zl (t + 1) = max Zl (t) + m=1 1l.m (t)xm (t) − bl (t). OPTIMIZING FUNCTIONS OF TIME AVERAGES where the time averages are deﬁned: xm 1l. and the link queues Zl (t). 0] where γm (t) are auxiliary variables for m ∈ {1.50) • (Routing and Flow Control) For each slot t and each session m ∈ {1.106 5. This ﬁts the framework of the utility maximization problem (5. .m (τ )xm (τ ) τ =0 t−1 E {bl (τ )} τ =0 We emphasize that while the actual network can queue data at each link l. .47) (5. . The algorithm given in Section 5.44).0.max (5. M}.max ∀m ∈ {1. . .max for all m ∈ {1. . M} (5. we use only virtual queues Zl (t) and Gm (t).
5.2. A FLOW-BASED NETWORK MODEL 107

This reduces to the following: First find a shortest path from the source of session m to the destination of session m, using the link weights Z_l(t) as link costs. If the total weight of the shortest path is less than or equal to G_m(t), choose x_m(t) = A_m(t) and route this data over this single shortest path. Else, there is too much congestion in the network, and so we choose x_m(t) = 0 (thereby dropping all data A_m(t)).

• (Virtual Queue Updates) Update the virtual queues according to (5.47) and (5.48).

The shortest path routing in this algorithm is similar to that given in (149), which treats a flow-based network stability problem under the assumption that arriving traffic is admissible (so that flow control is not used). This problem with flow control was introduced in (39) using the universal scheduling framework of Section 4.9, where there are no probabilistic assumptions on the arrivals or time varying link capacities.

5.2.1 PERFORMANCE OF THE FLOW-BASED ALGORITHM

To apply Theorems 5.1 and 5.2, assume ω(t) = [(b_1(t), …, b_L(t)), (A_1(t), …, A_M(t))] is i.i.d. over slots, and that the b_l(t) and A_m(t) processes have bounded second moments. Suppose we use any C-additive approximation (where a 0-additive approximation is an exact implementation of the above algorithm). It follows from Theorem 5.1 that all virtual queues are mean rate stable, and so the time average constraints (5.43)-(5.44) are satisfied. Further, the achieved utility satisfies:

lim inf_{t→∞} φ(x̄(t)) ≥ φ^opt − (D + C)/V   (5.51)

where D is a finite constant related to the maximum second moments of A_m(t) and b_l(t). Thus, utility can be pushed arbitrarily close to optimal by increasing V.

We now show that, under some mild additional assumptions, the flow control structure of this algorithm yields tight deterministic bounds of size O(V) on the virtual queues. Note that the problem (5.43)-(5.46) is trivially feasible because it is always possible to satisfy the constraints by admitting no new arrivals on any slot. Suppose that A_m(t) ≤ A_{m,max} for all t, for some finite constant A_{m,max}. Further, assume the utility functions φ_m(x) have finite right derivatives at x = 0, given by constants ν_m ≥ 0, so that for any nonnegative x and y we have:

φ_m(x) − φ_m(y) ≤ ν_m (x − y)   (5.52)

It can be shown that if G_m(t) > V ν_m, then the solution to (5.49)-(5.50) is γ_m(t) = 0 (see Exercise 5.9). Because γ_m(t) acts as the arrival to virtual queue G_m(t) defined in (5.48), it follows that G_m(t) cannot increase on the next slot. Therefore, for all m ∈ {1, …, M}:

0 ≤ G_m(t) ≤ V ν_m + γ_{m,max}  ∀t ∈ {0, 1, 2, …}   (5.53)

provided that this is true for G_m(0) (which is indeed the case if G_m(0) = 0). This allows one to deterministically bound the queue sizes Z_l(t) for all l ∈ {1, …, L}:

0 ≤ Z_l(t) ≤ V ν^max + γ^max + M A_max  ∀t   (5.54)
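The routing and flow-control step is a plain shortest-path computation followed by an admission test: find a minimum-cost path under link costs Z_l(t), and admit A_m(t) only if the path cost is at most G_m(t). A minimal sketch (hypothetical names; the graph representation is an assumption, and a real implementation would use the network's own topology data) using Dijkstra's algorithm:

```python
import heapq

def admit_and_route(graph, src, dst, G_m, A_m):
    """Flow control via shortest path under virtual-queue link costs.

    graph: {node: [(neighbor, Z_link), ...]} with Z_link = Z_l(t) >= 0.
    Admits x_m(t) = A_m(t) on the min-cost path if that cost is <= G_m(t);
    otherwise admits nothing (x_m(t) = 0), dropping the arrivals this slot.
    """
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    cost = dist.get(dst, float("inf"))
    if cost > G_m:
        return 0, []  # too much congestion: admit nothing
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return A_m, path[::-1]
```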
108 5. γm.47)).2.54). OPTIMIZING FUNCTIONS OF TIME AVERAGES provided this holds at time 0.m.55) is a very strong deterministic bound that says no link is given more data than it can handle. 5.max . as these values change every slot. the queue value used differs from the ideal queue value by no more than an additive constant that is proportional to the maximum time delay. However. the data injected for use over link l is no more than V ν max + γ max + MAmax beyond the total capacity offered by the link over that interval: t0 +T −1 M t0 +T −1 1l.1).2. arriving as a process xm (t − τl. a practical implementation may use outofdate values Zl (t − τl. Amax are deﬁned as the maximum of all νm . note that if a link l satisﬁes Zl (t) ≤ V ν max + γ max . Hence. Amax = m∈{1. and so Zl (t) cannot increase on the next slot. we have Zl (t + 1) ≤ V ν max + γ max + MAmax because the queue can increase by at most MAmax on any slot (see update equation (5.. Am. in which case. it follows that over any interval of T slots (for any positive integer T and any initial slot t0 ).. γ max .55) 5. by the routing and ﬂow control algorithm. then on the next slot.3) with the deterministic bound on Zl (t) in (5.max values: ν max = m∈{1. the virtual queue updates for Zl (t) in (5.2 DELAYED FEEDBACK We note that it may be difﬁcult to use the exact queue values Zl (t) when solving for the shortest path. no session will choose a path that uses this link on the current slot. A more extensive treatment of delayed feedback for the case of networks without dynamic arrivals or channels is found in (150). γ max = m∈{1.47) are most easily done at each link l..max To prove this fact. In this case. and thus would incur a cost larger than Gm (t) for any session m..M} max νm .3 LIMITATIONS OF THIS MODEL While (5. it does not directly imply anything about the actual network queues (other than the links are not overloaded)..M} max γm. 
we are simply using a Cadditive approximation and the utility and queue bounds are adjusted accordingly (see Exercise 4. .. Further. because the links are not overloaded. the actual network queues will be stable and all data can arrive to its destination with (hopefully small) delay. then any path that uses this link incurs a cost larger than V ν max + γ max .t ) for some time delay τl.M} max Am. the actual admitted data xm (t) for that link may not be known until some time delay.10 and also Section 6. which uses a differential equation method. The (unproven) understanding is that......max .1. Else.m (τ )xm (τ ) ≤ τ =t0 m=1 τ =t0 bl (τ ) + V ν max + γ max + MAmax (5. and where ν max . Thus. provided that the maximum time delay is bounded. Using the sample path inequality (2.t that may depend on l and t.. as the virtual queue size cannot change by more than a ﬁxed amount every slot.t ). if Zl (t) > V ν max + γ max .
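The deterministic bounds (5.53)-(5.54) are easy to check empirically. The following toy single-link simulation is a sketch under stated assumptions: all parameters and the log-utility choice are illustrative, and with a single link the shortest-path admission test reduces to comparing Z(t) with G_m(t). The bounds are asserted on every slot.

```python
import random

def simulate_bounds(T=10_000, V=50.0, nu=1.0, gamma_max=2.0, A_max=2.0, M=3, b_max=4.0):
    """Simulate virtual queues G_m(t) and Z(t) for one link shared by M sessions
    and assert the deterministic bounds on every slot:
        G_m(t) <= V*nu + gamma_max,   Z(t) <= V*nu + gamma_max + M*A_max.
    Utility is phi_m(x) = log(1 + nu*x); flow control admits x_m(t) = A_m(t)
    only when the one-link 'path cost' satisfies Z(t) <= G_m(t)."""
    rng = random.Random(0)
    G = [0.0] * M
    Z = 0.0
    for _ in range(T):
        A = [rng.uniform(0, A_max) for _ in range(M)]
        b = rng.uniform(0, b_max)
        # flow control: admit only if the link's virtual backlog is affordable
        x = [A[m] if Z <= G[m] else 0.0 for m in range(M)]
        # auxiliary variables: maximize V*log(1+nu*g) - G[m]*g over [0, gamma_max]
        gam = [min(max(V / G[m] - 1.0 / nu, 0.0), gamma_max) if G[m] > 0 else gamma_max
               for m in range(M)]
        Z = max(Z + sum(x) - b, 0.0)
        G = [max(G[m] + gam[m] - x[m], 0.0) for m in range(M)]
        assert all(G[m] <= V * nu + gamma_max + 1e-9 for m in range(M))
        assert Z <= V * nu + gamma_max + M * A_max + 1e-9
    return Z, G
```

The assertions never fire: G_m only increases (by at most gamma_max) when G_m <= V*nu, and Z only increases (by at most M*A_max) when some session sees Z <= G_m, reproducing the induction arguments in the text.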
3 treats an actual multihop queueing network and allows such dynamic routing. while it is known that average queue congestion and delay is convex if a general stream of trafﬁc is probabilistically split (152).5. . . N}. . All data that is intended for destination node c ∈ {1. . . Suppose the network has N nodes and operates in slotted time. However. this is clearly an approximation because data in an actual network will traverse its path one link at a time.4. . and we let A(t) = (A1 (t). AM (t)) represent the vector of data that exogenously arrives to the transport layer for each session on slot t (measured either in integer units of packets or real units of bits). observe that the update equation for Zl (t) in (5. N} is called commodity c data. We assume that a transport layer ﬂow controller observes Am (t) every slot and decides how much of this data to add to the network layer at its source node and how much to drop (ﬂow control decisions are made to limit queue buffers and ensure the network is stable).47) can be interpreted as a queueing model where all admitted data on slot t is placed immediately on all links l of its path. . or even a bound on delay. (c) regardless of its particular session. Similar models are used in (23)(29)(150)(31). . MULTIHOP QUEUEING NETWORKS 109 One might approximate average congestion or delay on a link as a convex function of the time average ﬂow rate over the link. We assume that Qn (t) = 0 for all t. .3. Section 5. .42) on modiﬁcations of this constraint). Data delivery takes place by transmissions over possibly multihop paths. treating the actual queueing rather than using the ﬂowbased model of the previous section. Each session m ∈ {1. we emphasize that this is only an approximation and does not represent the actual network delay. and we deﬁne Qn (t) as the amount of commodity c data in node n on (n) slot t. Let Q(t) denote the matrix of current queue backlogs for all nodes and commodities. There are M sessions. . 
Let (xm (t))M be the collection of ﬂow control decision variables on m=1 slot t. These decisions are made subject to the constraints 0 ≤ xm (t) ≤ Am (t) (see also discussion after (5. .3 MULTIHOP QUEUEING NETWORKS Here we consider a general multihop network. Finally. so that there is no dynamic rerouting midpath. 3 Convex constraints can be incorporated using the generalized structure of Section 5. . as data that reaches its destination is removed from the network. . . All data is queued (c) according to its commodity. . . . Most problems involving optimization of actual network delay are difﬁcult and unsolved.3 However. let Mn denote the set of all sessions m ∈ {1. For each n ∈ {1. Indeed. . . . but engineering of the Lagrange multipliers (which are related to queue backlogs) associated with those utility functions. M} has a particular source node and destination node. 5. . this is not necessarily true (or relevant) for dynamically controlled networks. N} and c ∈ {1. M} that have source node n and commodity c. Such problems involve not only optimization of rate based utility functions. . It is assumed that the actual network stamps all data with its intended path. . . particularly when the control depends on the queue backlogs and delays themselves. as in (151)(129)(150).
e. although distributed approximations of maxweight transmission exist in this case (see Chapter 6). S(t)):5 b(I (t). j ) of the network is given by a general transmission rate function b(I (t). so that:4 ⎡ ⎤ Q(c) (t + 1) = max ⎣Q(c) (t) − n n N (c) (c) (c) μnj (t).56)). N}. j.56) j =1 This satisﬁes (5. The value of S(t) is an abstract and possibly multidimensional quantity that describes the current link conditions between all nodes under the current slot. c ∈ {1.. S(t)))i. The collection of all transmission rates that can be offered over each link (i.N }. then null data is sent.56) is modiﬁed to “≤” (22).. where this full amount is used if there is that much commodity c data available at node i. we assume that if there is not enough data to send at the offered rate. OPTIMIZING FUNCTIONS OF TIME AVERAGES The queue backlogs change from slot to slot as follows: N N Q(c) (t + 1) = Q(c) (t) − n n j =1 (c) μnj (t) + ˜ i=1 (c) μin (t) + ˜ (c) m∈Mn (c) xm (t) where μij (t) denotes the actual amount of commodity c data transmitted from node i to node j ˜ (i.i=j where I (t) is a general networkwide resource allocation decision (such as link scheduling. . It is useful to deﬁne transmission decision variables μij (t) as the bit rate offered by link (i.1 TRANSMISSION VARIABLES Let S(t) represent the topology state of the network on slot t. for wireless interference networks. 0⎦ + (c) N μin (t) + i=1 (c) m∈Mn (c) xm (t) (c) (5.j ∈{1. and if we deﬁne: N (c) bn (t)= j =1 N μnj (t) . ∀t ˜ For simplicity. . . all channels are coupled. . j ) to commodity c data. etc. our “maxweight” transmission algorithm (to be deﬁned in the next subsection) decouples to allow nodes to make transmission decisions based only on those components of the current topology state S(t) that relate to their own local channels.3. so that “=” in (5. 5 It is worth noting now that for networks with orthogonal channels.1)) to index (n..) 
and takes values in some abstract set IS(t) that possibly depends on the current S(t). observed on each slot t as in (22). Of course. c) (for Qn (t) in (5. (c) (c) an (t)= i=1 μin (t) + (c) m∈Mn (c) xm (t) 5. S(t)) = (bij (I (t).1) if we relate index k (for Qk (t) in (5. modulation.110 5. over link (i. j )) on slot t. . so that: μij (t) ≤ μij (t) ∀i. bandwidth selection.. 4 All results hold exactly as stated if this null data is not sent..
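The per-slot queue dynamics (5.56) can be sketched directly. The function below is a minimal illustration (names and the nested-list representation are assumptions, not from the text); it implements the "=" form of (5.56), in which null data is sent whenever a queue lacks enough data to fill its offered rates, and it assumes μ_nn^(c) = 0 as required by (5.58).

```python
def queue_update(Q, mu, x_in):
    """One slot of the multihop queue dynamics (5.56).

    Q[n][c]     : backlog of commodity c at node n
    mu[i][j][c] : offered rate from node i to node j for commodity c this slot
                  (mu[n][n][c] assumed zero)
    x_in[n][c]  : exogenous (flow-controller-admitted) commodity-c arrivals at n

    Q_n^(c)(t+1) = max[Q_n^(c)(t) - sum_j mu_nj^(c), 0] + sum_i mu_in^(c) + x_n^(c)
    """
    N, C = len(Q), len(Q[0])
    newQ = [[0.0] * C for _ in range(N)]
    for n in range(N):
        for c in range(C):
            out_rate = sum(mu[n][j][c] for j in range(N))
            in_rate = sum(mu[i][n][c] for i in range(N))
            newQ[n][c] = max(Q[n][c] - out_rate, 0.0) + in_rate + x_in[n][c]
    return newQ
```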
max . as in (22). .63) .58) are due to the commonsense observation that it makes no sense to transmit data from a node to itself. . The controller then chooses μij (t) variables subject to the following constraints: μij (t) ≥ 0 ∀i. .62) is a continuous.j.3 MULTIHOP NETWORK UTILITY MAXIMIZATION The rectangle R is deﬁned by all (γ1 . concave. .c∈{1. . 0] The algorithm of Section 5. there are no Zl (t) queues. M}. N} ∀i. The control action α(t) is deﬁned by: α(t)=[I (t).3..3.. all (μij (t)) that satisfy (5.5 is thus: (5.59) = (i) μij (t) = 0 ∀i.2 THE UTILITY OPTIMIZATION PROBLEM This problem ﬁts our general framework by deﬁning the random event ω(t)=[A(t).61) (5. One can easily incorporate additional constraints that restrict the set of allowable links that certain commodities are allowed to use. . S(t)]. N} (c) μii (t) N (c) (c) (5. we have auxiliary variables γm (t) and virtual queues Gm (t) for m ∈ {1. . c ∈ {1. transmission. .60)(5. . .58) (5. .62) augmented with the additional constraint x ∈ R. . Because we have not speciﬁed any additional constraints.5. Deﬁne φ opt as the maximum utility for the problem (5. (xm (t))M ] m=1 representing the resource allocation. . . However. . and entrywise nondecreasing utility func 5. j. . (μij (t))i.59).The action space Aω(t) (c) is deﬁned by the set of all I (t) ∈ IS(t) . .. Deﬁne x as the time average expectation of the vector x(t).57) (5.60) (5. Our objective is to solve the following problem: Maximize: Subject to: where φ(x) = tion. 5.N } . .0.3. M m=1 φm (xm ) (c) φ(x) α(t) ∈ Aω(t) ∀t (c) All queues Qn (t) are mean rate stable (5. c ∈ {1. N} μij ≤ bij (I (t). M}. . . with update: Gm (t + 1) = max[Gm (t) + γm (t) − xm (t). . . S(t)) c=1 Constraints (5. γM ) vectors such that 0 ≤ γm ≤ γm. j. . and all (xm (t)) that satisfy 0 ≤ xm (t) ≤ Am (t) for all m ∈ {1. . j ∈ {1.57)(5. MULTIHOP QUEUEING NETWORKS 111 Every slot the network controller observes the current S(t) and makes a resource alloca(c) tion decision I (t) ∈ IS(t) . 
or to keep transmitting data that has already arrived to its destination.
it can be shown that the queues Gm (t) are deterministically bounded. and xm (t) = 0 otherwise. .4 below.d. . OPTIMIZING FUNCTIONS OF TIME AVERAGES • (Auxiliary Variables) For each slot t.d. over slots. (c) The queues Qn (t) can be shown to be strongly stable with average size O(V ) under an additional Slatertype condition. • (Resource Allocation and Transmission) For each slot t.max (5.59) (c) (c) (c) (5.65) (c ) m This reduces to the “bangbang” ﬂow control decision of choosing xm (t) = Am (t) if Qnm (t) ≤ Gm (t). If the φm (x) functions are bounded with bounded right derivatives. It then chooses xm (t) to solve: m Maximize: Gm (t)xm (t) − Qnm (t)xm (t) Subject to: 0 ≤ xm (t) ≤ Am (t) (c ) (5. trafﬁc and channel processes. each session m ∈ {1. the network controller observes queue (c) (c) backlogs {Qn (t)} and the topology state S(t) and chooses I (t) ∈ IS(t) and {μij (t)} to solve: Maximize: Subject to: n.c Qn (t)[ N=1 μnj (t) − N μin (t)] j i=1 I (t) ∈ IS(t) and (5. and hence they can be observed easily. and: lim inf φ(x(t)) ≥ φ opt − (D + C)/V t→∞ (5.112 5.56).66) are described in Subsection 5. .64) • (Flow Control) For each slot t. Assuming that second moments of arrivals and service variables are ﬁnite.3. The resource allocation and transmission decisions that solve (5.1. each session m observes Am (t) and the queue values Gm (t). (cm ) Qnm (t) (where nm denotes the source node of session m. . we have that all virtual and actual queues are mean rate stable. Note that these queues are all local to the source node of the session.9 can be used to show that the same algorithm operates efﬁciently for noni. .i. even without the Slater condition. The theory of Section 4.i.67) where D is a constant related to the maximum second moments of arrivals and transmission rates.66) • (Queue Updates) Update the virtual queues Gm (t) according to (5. 
A slight modiﬁcation of the algorithm that results in a Cadditive approximation can deterministically bound all actual queues by a constant of size O(V ) (38)(42)(153). by Theorem 5. we state the performance of the algorithm under a general Cadditive approximation. and that ω(t) is i. Before covering this.57)(5. including processes that arise from arbitrary node mobility (38).63) and the actual queues (c) Qn (t) according to (5. and cm represents its destination). M} observes the current virtual queue Gm (t) and chooses auxiliary variable γm (t) to solve: Maximize: V φm (γm (t)) − Gm (t)γm (t) Subject to: 0 ≤ γm (t) ≤ γm.
5.3. MULTIHOP QUEUEING NETWORKS 113

5.3.4 BACKPRESSURE-BASED ROUTING AND RESOURCE ALLOCATION

By switching the sums in (5.66), it is easy to show that the resource allocation and transmission maximization reduces to the following generalized "max-weight" and "backpressure" algorithms (see (7)(22)): Every slot t, choose I(t) ∈ I_S(t) to maximize:

Σ_{i=1}^{N} Σ_{j=1}^{N} b_ij(I(t), S(t)) W_ij(t)   (5.68)

where W_ij(t) are weights defined by:

W_ij(t) = max_{c ∈ {1,…,N}} max[W_ij^(c)(t), 0]   (5.69)

and W_ij^(c)(t) are differential backlogs:

W_ij^(c)(t) = Q_i^(c)(t) − Q_j^(c)(t)

The transmission decision variables are then given by:

μ_ij^(c)(t) = { b_ij(I(t), S(t))  if c = c*_ij(t) and W_ij(t) ≥ 0 ; 0  otherwise }   (5.70)

where c*_ij(t) is defined as the commodity c ∈ {1, …, N} that maximizes the differential backlog W_ij^(c)(t) (breaking ties arbitrarily). This backpressure approach achieves throughput optimality because it explores all possible routes, but it may incur large delay. A useful C-additive approximation that experimentally improves delay is to combine the queue differential with a shortest path estimate for each link. This is proposed in (15) as an enhancement to backpressure routing, and it is shown to perform quite well in simulations given in (154)(22) ((154) extends to networks with unreliable channels). Related work that combines shortest paths and backpressure using the drift-plus-penalty method is developed in (155) to treat maximum hop count constraints. A related and very simple Last-In-First-Out (LIFO) implementation of backpressure that does not need Lagrange multiplier information is developed in (54), where experiments on wireless sensor networks show delay improvements by more than an order of magnitude over FIFO implementations (for all but 2% of the packets) while preserving efficient throughput (note that LIFO does not change the dynamics of (5.56)). Analysis of the LIFO rule and its connection to placeholders and Lagrange multipliers is in (55). A theory of more aggressive placeholder packets for delay improvement in backpressure is developed in (37), although that algorithm ideally requires knowledge of Lagrange multiplier information in advance.
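The per-link backpressure rule is simple to implement once a resource allocation is fixed. The sketch below is illustrative only (hypothetical names; it evaluates a single resource-allocation option, whereas the full max-weight algorithm would also maximize the weighted sum over all I(t) ∈ I_S(t)): for each link with positive offered rate, it selects the commodity with the largest nonnegative differential backlog.

```python
def backpressure_decision(Q, rates):
    """Generalized backpressure commodity selection for one slot.

    Q[n][c]     : commodity-c backlog at node n
    rates[i][j] : transmission rate b_ij available on link (i, j) this slot,
                  under one fixed resource-allocation option
    Returns {(i, j): (c_star, rate)} giving each served link's chosen commodity.
    """
    N = len(Q)
    decisions = {}
    for i in range(N):
        for j in range(N):
            if i == j or rates[i][j] <= 0:
                continue
            # differential backlogs W_ij^(c) = Q_i^(c) - Q_j^(c)
            diffs = [Q[i][c] - Q[j][c] for c in range(len(Q[i]))]
            c_star = max(range(len(diffs)), key=lambda c: diffs[c])
            if diffs[c_star] >= 0:  # serve only links with nonnegative weight
                decisions[(i, j)] = (c_star, rates[i][j])
    return decisions
```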
ω(t)) + gl (γ (t)). 0] ∀l ∈ {1.72) (5. . . . . and any solution to one can be used to construct a solution to the other (see Exercise 5. in that the maximum utility values are the same. .82) . xM ) ∈ RM γm.75) where f (x) and gl (x) are continuous and convex functions of x ∈ RM .75) to: Minimize: Subject to: 1) 2) 3) 4) 5) where we deﬁne: f (γ )= lim 1 t→∞ t t−1 y 0 + f (γ ) y l + gl (γ ) ≤ 0 ∀l ∈ {1. and R is an Mdimensional hyperrectangle deﬁned as: R = {(x1 .71)(5. .80) (5.9). . L} γ m = x m ∀m ∈ {1.max ∀m ∈ {1. deﬁne a virtual queue Zl (t) with update equation: Zl (t + 1) = max[Zl (t) + yl (α(t). We solve the transformed problem (5.48)(4. . . . γM (t)) be a vector of auxiliary variables that can be chosen within the set X ∩ R every slot t.76)(5.114 5. M}} where γm. . L} x∈X ∩R All queues Qk (t) are mean rate stable α(t) ∈ Aω(t) ∀t (5.49). L} ˆ (5. Let γ (t) = (γ1 (t).74) (5. as in the previous sections).71) (5. . OPTIMIZING FUNCTIONS OF TIME AVERAGES 5. M} All queues Qk (t) are mean rate stable γ (t) ∈ X ∩ R ∀t α(t) ∈ Aω(t) ∀t (5.77). we focus here on the ﬁxed V algorithm as speciﬁed in (4.min ≤ xm ≤ γm. .76) (5. .max are ﬁnite constants (this rectangle set R is only added to bound the auxiliary variables that we use. . . .81) E {f (γ (τ ))} τ =0 . For each inequality constraint (5. . gl (γ )= lim 1 t→∞ t t−1 E {gl (γ (τ ))} τ =0 It is not difﬁcult to show that this transformed problem is equivalent to the problem (5. .77) (5. .4 GENERAL OPTIMIZATION OF CONVEX FUNCTIONS OF TIME AVERAGES Here we provide a recipe for the following more general problem of optimizing convex functions of time averages: Minimize: Subject to: 1) 2) 3) 4) y 0 + f (x ) y l + gl (x) ≤ 0 ∀l ∈ {1.81) simply by restating the driftpluspenalty algorithm for this context. . . . .79) (5.71)(5. . . .min and γm. We transform the problem (5. While a variableV implementation can be developed. X is a closed and convex subset of RM .78) (5. .73) (5.75).
over slots.10. being zero if and only if x(t) is in the (closed) set X ∩ R.86) (5. L} t→∞ t→∞ (5.84) that is within a distance C from its inﬁmum value.4. . Further.30) hold. subject to a given (t).d.30) hold. all actual and virtual queues are mean rate stable.5. . and: lim sup y l (t) + gl (x(t)) ≤ 0 ∀l ∈ {1. Assume the boundedness assumptions (4.3 (Algorithm Performance) Suppose the boundedness assumptions (4. Z (t). If ω(t) is i. then: lim sup y 0 (t) + f (x(t)) ≤ y0 t→∞ opt opt + f opt + D+C V (5. an O(V ) backlog bound can also be derived under a Slater assumption. Now deﬁne a Cadditive approximation as any algorithm for choosing γ (t) ∈ X ∩ R and α(t) ∈ Aω(t) every slot t that.i. . we have the following drift bound: ( (t)) + V E {y0 (t) + f (γ (t)) (t)} ≤ D + V E {y0 (t) + f (γ (t)) (t)} L + l=1 K Zl (t)E {yl (t) + gl (γ (t)) (t)} Qk (t)E {ak (t) − bk (t) (t)} k=1 M + + m=1 Hm (t)E {γm (t) − xm (t) (t)} (5. GENERAL OPTIMIZATION OF CONVEX FUNCTIONS OF TIME AVERAGES 115 For each equality constraint (5. Theorem 5. See Exercise 5.83) Deﬁne (t) = [Q(t). As before.25)(4. .25)(4. . .78). For the Lyapunov function (4. X ∩ R) represents the distance between the vector x(t) and the set X ∩ R.71)(5.84) where D is a ﬁnite constant related to the worst case second moments of the arrival. yields a righthandside in (5. .85) where y0 + f opt represents the inﬁmum cost metric of the problem (5. Proof.d. and that ω(t) is i. the problem (5. and attribute vectors. X ∩ R) = 0 where dist(x(t).87) lim dist (x(t).43).75) over all feasible policies. H (t)].i. ω(t)) ∀m ∈ {1. and E {L( (0))} < ∞. M} (5.75) is feasible. 2 . Suppose the functions f (γ ) and gl (γ ) are upper and lower bounded by ﬁnite constants over γ ∈ X ∩ R. deﬁne a virtual queue Hm (t) with update equation: ˆ Hm (t + 1) = Hm (t) + γm (t) − xm (α(t). service. over slots and any Cadditive approximation is used every slot.71)(5. .
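The two virtual-queue types defined above for the transformed problem can be sketched in a few lines (hypothetical names; the vectors y, g_l(γ(t)), γ(t), x(t) are supplied by the controller each slot): Z_l(t) tracks the inequality constraint ȳ_l + ḡ_l(γ) ≤ 0 with a max[·, 0] update, while H_m(t) tracks the equality constraint γ̄_m = x̄_m and may go negative.

```python
def update_constraint_queues(Z, H, y, g_vals, gamma, x):
    """One-slot virtual-queue updates for the transformed problem.

    Inequality constraints:  Z_l(t+1) = max(Z_l(t) + y_l(t) + g_l(gamma(t)), 0)
    Equality constraints:    H_m(t+1) = H_m(t) + gamma_m(t) - x_m(t)
    """
    Z_next = [max(Z[l] + y[l] + g_vals[l], 0.0) for l in range(len(Z))]
    H_next = [H[m] + gamma[m] - x[m] for m in range(len(H))]
    return Z_next, H_next
```

Mean rate stability of Z_l enforces the inequality constraint in the time average, while mean rate stability of the (possibly negative) H_m forces the time averages of γ_m(t) and x_m(t) to coincide.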
as hard as combinatorial binpacking.1).88) (5. A related utilityproportional fairness objective is studied for static networks in (158). L} α(t) ∈ Aω(t) All queues Qk (t) are mean rate stable (5.92) The actual queues Qk (t) are assumed to satisfy (5.90) (5.116 5. 5. we seek an algorithm that satisﬁes the constraints (5. An application to risk management in network economics is given in Exercise 5.89) (5. and so we do not expect to ﬁnd a global optimum. where xav (t) is deﬁned as an empirical running time average of the attribute vector: xav (t)= xm (α(−1).89)(5. ω(−1)) ˆ 1 t t−1 τ =0 xm (τ ) if t > 0 if t = 0 . We use the driftpluspenalty framework with the same virtual queues as before: Zl (t + 1) = max[Zl (t) + yl (α(t).91) and that yields a local optimum of f (x).1: An example nonconcave utility function of a time average attribute. Performing such a general nonconvex optimization is.11. in some cases. .1). Deﬁne (t)=[Q(t).The stochastic problem we present here is developed in (43). Such problems are treated in a nonstochastic (static) network optimization setting in (156)(157). which treats a convex optimization problem that has a fairness interpretation with respect to a nonconcave utility function. OPTIMIZING FUNCTIONS OF TIME AVERAGES 5. . Applications of such problems include throughpututility maximization with f (x) given by −1 times a sum of nonconcave “sigmoidal” functions that give low utility until throughput exceeds a certain threshold (see Fig. Utility(x) Attribute x (such as throughput) Figure 5. Z (t). .5 NONCONVEX STOCHASTIC OPTIMIZATION Minimize: Subject to: f (x) y l ≤ 0 ∀l ∈ {1. Rather. ω(t)). and with partial derivatives ∂f (x)/∂xm having bounded magnitudes νm ≥ 0. .91) Consider now the problem: where f (x) is a possibly nonconvex function that is assumed to be continuously differentiable with upper and lower bounds fmin and fmax . xav (t)]. 0] ˆ (5.
in the sense that for any alternative vector x∗ that can be achieved as the time average of a policy that makes all queues mean rate stable and satisﬁes all required constraints.28) hold. For any V ≥ 0. . we have: M m=1 ∗ (xm − x m ) ∂f (x) D+C ≥− ∂xm V .i. .5.93). ω(t)) − bk (α(t).25)(4. . over slots. c) If all time averages converge. the boundedness assumptions (4. and the problem (5. ω(t)) ˆ ∂f (xav (t)) ∂xm Below we state the performance of the algorithm that observes queue backlogs every slot t and takes an action α(t) ∈ Aω(t) that comes within C of minimizing the righthandside of the drift expression (5. We thus have: ( (t)) + V E {P enalty(t) (t)} ≤ D + V E {P enalty(t) (t)} K + k=1 ˆ Qk (t)E ak (α(t).d. then the achieved limit is a near local optimum. For simplicity. we have: 1 t t−1 M E τ =0 m=1 xm (τ )∂f (xav (τ )) ∂xm ≤ 1 t t−1 M τ =0 m=1 ∗ xm E ∂f (xav (τ )) D+C + ∂xm V where D is a ﬁnite constant related to second moments of the ak (t). and for any Cadditive approximation of the above algorithm that is implemented every slot.93) The penalty we use is: M P enalty(t)= m=1 xm (α(t). Theorem 5. over slots.4 (NonConvex Stochastic Network Optimization (43)) Suppose ω(t) is i. ω(t)) (t) ˆ L + l=1 Zl (t)E yl (α(t).88)(5. assume that (0) = 0. Assume ω(t) k=1 l=1 is i. bk (t).91) is feasible.5. the function f (x) is bounded and continuously differentiable with partial derivatives bounded in magnitude by ﬁnite constants νm ≥ 0. L} t→∞ (b) For all t > 0 and for any alternative vector x∗ that can be achieved as the time average of a policy that makes all queues mean rate stable and satisﬁes all required constraints. Deﬁne L( (t))= 2 [ K Qk (t)2 + L Zl (t)2 ]. yl (t) processes. we have: (a) All queues Qk (t) and Zl (t) are mean rate stable and: lim sup y l (t) ≤ 0 ∀l ∈ {1. ω(t)) (t) ˆ (5. ω(−1)) can be viewed as an initial sample taken at time “t = −1” before the ˆ 1 network implementation begins. NONCONVEX STOCHASTIC OPTIMIZATION 117 where xm (α(−1). .i.d. 
so that there is a constant vector x such that xav (t) → x with probability 1 and x(t) → x.
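The distinguishing feature of the nonconvex penalty is that the gradient of f is evaluated at the empirical running average x_av(t), not at the current attribute vector. A minimal sketch of the two ingredients (hypothetical names; grad_f is any user-supplied gradient of the cost function):

```python
def update_running_average(x_avg, x_t, t):
    """Empirical running average used by the primal-dual method:
       x_av(t+1) = (t * x_av(t) + x(t)) / (t + 1),  for t = 0, 1, 2, ...
    """
    return [(t * xa + xt) / (t + 1) for xa, xt in zip(x_avg, x_t)]

def nonconvex_penalty(x_t, x_avg, grad_f):
    """Penalty(t) = sum_m x_m(t) * (d f / d x_m)(x_av(t)).

    The partial derivatives are frozen at the running average, so each slot's
    decision sees a linearized cost, which is what yields the local-optimality
    guarantee for nonconvex f.
    """
    g = grad_f(x_avg)
    return sum(xm * gm for xm, gm in zip(x_t, g))
```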
there exists an ωonly policy α ∗ (t) such that (43): E yl (α ∗ (t). and under the convergence assumptions of part (c). it can be shown that for any x∗ = (x1 . . . . Then all virtual and actual queues are mean rate stable (and so all constraints y l ≤ 0 are satisﬁed). . . ω(t)) ˆ ∂f (xav (t))  (t) ≤ D + C + V ∂xm M m=1 ∗ xm ∂f (xav (t)) ∂xm 6The same result can be derived by plugging in with δ > 0 and then taking a limit as δ → 0. . More precisely. and use any Cadditive approximation (where C is constant for all t). . . . the limiting x is a local optimum. ω(t)) − b ˆ ˆ dist(E x(α ∗ (t).93) with δ = 0 yields:6 M ( (t)) + V E m=1 xm (α(t). (Theorem 5. ω(t)) − b ˆ Then all queues Qk (t) are strongly stable with average size O(V ). L} ≤ − ∀k ∈ {1. Then: f x + ( x ∗ − x ) ≈ f (x ) + M m=1 ∗ (xm − x m ) ∂f (x) ≥ f (x) ∂xm Hence. Analogous to Theorem 4.4) Our proof uses the same driftpluspenalty technique as described in previous ∗ ∗ sections.94) (5. . ω(t)) E ak (α (t). assume the above holds with δ = 0. . the new cost achieved by taking a small step in any feasible direction is no less than the cost f (x) that we are already achieving. K} (5. . . ω(t)) ˆ ∗ ˆk (α ∗ (t). . . xM ) that is a limit point of x(t) under any policy that makes all queues mean rate stable and satisﬁes all constraints. and for any δ > 0. L} ˆ ∗ ˆk (α ∗ (t). K} E ak (α (t). . ω(t)) . . x∗ ) ≤ δ For simplicity of the proof. . the change in cost cost ( ) satisﬁes: lim cost ( ) →0 ≥0 Proof. in that: M ∗ (xm − x m ) m=1 ∂f (x) ≥0 ∂xm where x∗ is any alternative vector as speciﬁed in part (c). ω(t)) ≤ δ ∀l ∈ {1. That the inequality guarantee in part (e) demonstrates local optimality can be understood as follows: Suppose we start at our achieved time average attribute vector x. Plugging the above into the righthandside of (5.95) E yl (α ∗ (t). e) Suppose we use a variable V (t) algorithm with V (t)=V0 · (1 + t)d for V0 > 0 and 0 < d < 1.5. 
and we want to shift this in any feasible direction by moving towards another feasible vector x∗ by an amount (for some > 0). ω(t)) ≤ δ ∀k ∈ {1. .118 5. . . OPTIMIZING FUNCTIONS OF TIME AVERAGES d) Suppose there is an > 0 and an ωonly policy α ∗ (t) such that: ≤ 0 ∀l ∈ {1.
min is a bound on the expectation of xm (t) under any policy. Hence. the work (32)(33) used a partial derivative evaluated at the time average xav (t) to maximize a concave function of throughput in a multiuser wireless downlink with time varying channels. For the special case of convex problems. we know all queues are mean rate stable. . O(V )] performancecongestion tradeoff as the dual algorithm.95). The pure dual algorithm seems to provide stronger analytical guarantees for convex problems because: (i) It does not need a running time average xav (t) and hence can be shown to be robust to changes in system parameters (as in Section 4. On the other hand. . t − 1}. the system in (32)(33) assumed inﬁnite backlog in all queues (similar to Exercise 5. known to exist by the boundedness assumptions. it is not clear how long the system must run to approach convergence. it was shown that a related “ﬂuid limit” of the system has an optimal utility. 2 Using a penalty given by partial derivatives of the function evaluated at the empirical average attribute vector can be viewed as a “primaldual” operation that differs from our “puredual” approach for convex problems. but using an exponential weighted average. . The proof of part (e) is similar to that of Theorem 4.5. Unfortunately. NONCONVEX STOCHASTIC OPTIMIZATION 119 Taking expectations of the above drift bound (using the law of iterated expectations). 1. the drift is less than or equal to a ﬁnite constant.94)(5. (iii) It provides results for all t > 0 that show how long we must run the system to be close to the inﬁnite horizon . again for convex problems. if the algorithm is assumed to converge to well deﬁned time averages. There. and if we use a running time average xav (t) rather than an exponential average. It was also conjectured in (34) that the actual network will have utility that is close to this ﬂuid limit as a parameter β related to the exponential weighting is scaled (see Section 4. However.4 above shows that. 
this drift expression can also be rearranged as: M ( (t)) ≤ D + C + V m=1 ∗ νm (xm − xm.6). the analysis does not specify the size of β needed to achieve a nearoptimal utility.min ) where xm.9 and (42)(38)(17)). (ii) It does not require additional assumptions about convergence. rather than a running time average xav (t). Such a primaldual approach was ﬁrst used in context of convex network utility maximization problems in (32)(33)(34). This was extended in (34) to consider the primaldual technique for joint stability and performance optimization. summing the telescoping series over τ ∈ {0. The proof of part (c) follows by taking a limit of the result in part (b).2. and so by Theorem 4. Recent work in (36) considers related primaldual updates for convex problems. the primaldual algorithm achieves a similar [O(1/V ).9 and is omitted for brevity. proving part (a). so that there were no queue stability constraints.9 in (34)). and dividing by V t immediately yields the result of part (b). However. The proof of part (d) follows similarly by plugging in the policy α ∗ (t) of (5. and that this limit is “weakly” approached under appropriately scaled systems. and it shows the long term utility of the actual network is close to optimal as a parameter is scaled. .5. Speciﬁcally. Theorem 5. where the limits can be pushed through by the boundedness assumptions and the continuity assumption on the derivatives of f (x).
S (t)]. . let φk (a) be a continuous. . if one applies the pure dual technique with a nonconvex cost function f (x). where IS (t) is an abstract set that deﬁnes transmission options under channel state S (t). . . . . . . dK (t)). Drop decisions dk (t) are chosen subject to the constraints: 0 ≤ dk (t) ≤ Amax The resulting queue update equation is thus: Qk (t + 1) = max[Qk (t) − bk (t) − dk (t). The controller also chooses a channelaware transmission decision I (t) ∈ IS (t) . . Amax ] (5. . Let ω(t) = [A(t). . . . Let νk be the maximum rightderivative of φk (a) (which occurs at a = 0).These allow packets already admitted to the network layer queues Qk (t) to be dropped if their delay is too large. . . .96) where Amax is a ﬁnite constant. . The transmission rates are given by deterministic functions of I (t) and S (t): ˆ bk (t) = bk (I (t).6 WORST CASE DELAY Here we extend the utility optimization framework to enable O(V ) tradeoffs in worst case delay. one would get a global optimum of the time average f (x). However. Example utility functions that have this form are: φk (a) = log(1 + νk a) . . where A(t) = (A1 (t). we keep transport layer queues L(t) = (L1 (t). In addition to these queues. over slots. Related problems are treated in (76)(159). Assume that ω(t) is i. . It is assumed that Ak (t) ≤ Amax for all k and all t. .i. 0] + ak (t) ∀k ∈ {1. . 5. as it can achieve a local optimum for nonconvex problems. K}. This means that ak (t) is chosen from the Lk (t) + Ak (t) amount of data available on slot t. . . In addition. OPTIMIZING FUNCTIONS OF TIME AVERAGES limit guarantees. . S (t)) Second moments of bk (t) are assumed to be uniformly bounded. and S (t) = (S1 (t). concave. . . and is no more than Amax per slot (which limits the amount we can send into the network layer).d. . AK (t)) is a vector of new arrivals to the transport layer. LK (t)). QK (t)).120 5. which may not even be a local optimum of f (x). . 
This is where the primaldual technique shows its real potential. Consider a 1hop network with K queues Q(t) = (Q1 (t). . where Lk (t) stores incoming data before it is admitted to the network layer queue Qk (t) (as in (17)). aK (t)) subject to the constraints: 0 ≤ ak (t) ≤ min[Lk (t) + Ak (t). choose admission variables a(t) = (a1 (t). and nondecreasing utility function deﬁned over the interval 0 ≤ a ≤ Amax . and assume νk < ∞. K} (5.97) For each k ∈ {1. SK (t)) is a vector of channel conditions that affect transmission. . . Newly arriving data Ak (t) that is not immediately admitted into the network layer is stored in the transport layer queue Lk (t). . deﬁne packet drop decisions d(t) = (d1 (t). Every slot t.
We desire a solution to the following problem, defined in terms of a parameter ε > 0:

    Maximize:   Σ_{k=1}^K φ_k(ā_k) − Σ_{k=1}^K β ν_k d̄_k        (5.98)
    Subject to: All queues Q_k(t) are mean rate stable        (5.99)
                b̄_k ≥ ε   ∀k ∈ {1, ..., K}        (5.100)
                0 ≤ a_k(t) ≤ A_k(t)   ∀k ∈ {1, ..., K}, ∀t        (5.101)
                I(t) ∈ I_S(t)   ∀t        (5.102)

where β is a constant that satisfies 1 ≤ β < ∞. A larger value of β will trade packet drops at the network layer for packet non-admissions at the flow controller. We recommend choosing β such that 1 ≤ β ≤ 2. Note the following:

• The constraint (5.100) requires each queue to transmit with a time-average rate of at least ε. This constraint ensures all queues are getting at least a minimum rate of service. It is assumed throughout that this constraint is feasible. If the input rate E{A_k(t)} is less than ε, then this constraint is wasteful.

• The constraint (5.101) is different from the constraint (5.96); in particular, it is the same if there are no transport layer queues, so that L_k(t) = 0 for all t and all data is either admitted or dropped upon arrival. It turns out that optimal utility is the same with either constraint (5.101) or (5.96). Thus, we shall not enforce the constraint (5.101). Rather, the less stringent constraint (5.96) is used for the actual algorithm, but performance is measured with respect to the optimum utility achievable in the problem (5.98)-(5.102), which includes this constraint. If one prefers to enforce constraint (5.101), this is easily done with an appropriate virtual queue.

• An optimal solution to (5.98)-(5.102) has d̄_k = 0 for all k. This is because the penalty for dropping is βν_k, which is greater than or equal to the largest derivative of the utility function φ_k(a). Thus, it can be shown that it is always better to restrict data at the transport layer rather than admitting it and later dropping it. Hence, the objective (5.98) can equivalently be replaced by the objective of maximizing Σ_{k=1}^K φ_k(ā_k) and by adding the constraint d̄_k = 0 for all k.

This problem does not specify anything about worst-case delay. However, we soon develop an algorithm with worst-case delay of O(V) that comes within O(1/V) of optimizing the utility associated with the above problem (5.98)-(5.102). That is, we simply measure utility of our system with respect to the optimal utility of the problem (5.98)-(5.102).
5.6.1 THE PERSISTENT SERVICE QUEUE

To ensure worst-case delay is bounded, we define a persistent service queue, being a virtual queue Z_k(t) for each k ∈ {1, ..., K} with Z_k(0) = 0 and with dynamics:

    Z_k(t+1) = max[Z_k(t) − b_k(t) − d_k(t) + ε, 0]   if Q_k(t) > b_k(t) + d_k(t)
    Z_k(t+1) = 0                                        if Q_k(t) ≤ b_k(t) + d_k(t)        (5.103)

where ε > 0. We assume throughout that ε ≤ A_max. The condition Q_k(t) ≤ b_k(t) + d_k(t) is satisfied whenever the backlog Q_k(t) is cleared (by service and/or drops) on slot t. If this condition is not active, then Z_k(t) has a departure process that is the same as Q_k(t), but it has an arrival of size ε every slot. This is similar to (76) (where explicit delays are kept for each packet) and (159) (which uses a slightly different update).

The size of the queue Z_k(t) can provide a bound on the delay of the head-of-line data in queue Q_k(t) in a first-in-first-out (FIFO) system, as shown in the following lemma.

Lemma 5.5  Suppose Q_k(t) and Z_k(t) evolve according to (5.97) and (5.103), and that an algorithm is used that ensures Z_k(t) ≤ Z_k,max and Q_k(t) ≤ Q_k,max for all slots t ∈ {0, 1, 2, ...}, for some finite constants Z_k,max and Q_k,max. Assume service and drops are done in FIFO order. Then the worst-case delay of all non-dropped data in queue k is W_k,max, defined:

    W_k,max = (Q_k,max + Z_k,max)/ε        (5.104)

Proof. Fix a slot t. We show that all arrivals a(t) are either served or dropped on or before slot t + W_k,max. Note by (5.97) that arrivals a(t) are added to the queue backlog Q_k(t+1) and are first available for service on slot t+1. Suppose the claim is not true. It must be that Q_k(τ) > b_k(τ) + d_k(τ) for all τ ∈ {t+1, ..., t+W_k,max} (else, the backlog on slot τ would be cleared). Therefore, by (5.103), we have for all slots τ ∈ {t+1, ..., t+W_k,max}:

    Z_k(τ+1) = max[Z_k(τ) − b_k(τ) − d_k(τ) + ε, 0]

In particular, for all slots τ ∈ {t+1, ..., t+W_k,max}:

    Z_k(τ+1) ≥ Z_k(τ) − b_k(τ) − d_k(τ) + ε

Summing the above over τ ∈ {t+1, ..., t+W_k,max} yields:

    Z_k(t+W_k,max+1) − Z_k(t+1) ≥ − Σ_{τ=t+1}^{t+W_k,max} [b_k(τ) + d_k(τ)] + εW_k,max

Rearranging terms in the above inequality and using the fact that Z_k(t+1) ≥ 0 and Z_k(t+W_k,max+1) ≤ Z_k,max yields:

    εW_k,max ≤ Σ_{τ=t+1}^{t+W_k,max} [b_k(τ) + d_k(τ)] + Z_k,max        (5.105)

On the other hand, by the FIFO service, the sum of b_k(τ) + d_k(τ) over the interval τ ∈ {t+1, ..., t+W_k,max} must be strictly less than Q_k(t+1) (else, all data a(t), which is included at the end of the backlog Q_k(t+1), would have been cleared during this interval). Thus:

    Σ_{τ=t+1}^{t+W_k,max} [b_k(τ) + d_k(τ)] < Q_k(t+1) ≤ Q_k,max        (5.106)

Combining (5.106) and (5.105) yields:

    εW_k,max < Q_k,max + Z_k,max

which implies:

    W_k,max < (Q_k,max + Z_k,max)/ε

This contradicts (5.104). We reach a contradiction, proving the result. □

5.6.2 THE DRIFT-PLUS-PENALTY FOR WORST-CASE DELAY

As usual, we transform the problem (5.98)-(5.102) using auxiliary variables γ(t) = (γ_1(t), ..., γ_K(t)):

    Maximize:   Σ_{k=1}^K φ_k(γ̄_k) − Σ_{k=1}^K β ν_k d̄_k        (5.107)
    Subject to: ā_k ≥ γ̄_k   ∀k ∈ {1, ..., K}        (5.108)
                All queues Q_k(t) are mean rate stable        (5.109)
                b̄_k ≥ ε   ∀k ∈ {1, ..., K}        (5.110)
                0 ≤ a_k(t) ≤ A_k(t)   ∀k ∈ {1, ..., K}        (5.111)
                I(t) ∈ I_S(t)        (5.112)
                0 ≤ γ_k(t) ≤ A_max   ∀k ∈ {1, ..., K}        (5.113)

To enforce the constraints (5.108), define virtual queues G_k(t) by:

    G_k(t+1) = max[G_k(t) − a_k(t) + γ_k(t), 0]        (5.114)

Now define Θ(t) = [Q(t), Z(t), G(t)] as the combined queue vector, and define the Lyapunov function L(Θ(t)) by:

    L(Θ(t)) = (1/2) Σ_{k=1}^K [Q_k(t)² + Z_k(t)² + G_k(t)²]
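The persistent service queue update (5.103) and the delay bound (5.104) of Lemma 5.5 can be sketched as follows; this is a minimal illustration with our own function names, assuming the FIFO setting of the lemma:

```python
def z_update(Z_k, Q_k, b_k, d_k, eps):
    """Virtual queue update (5.103): Z_k resets to 0 whenever the backlog is
    cleared (Q_k <= b_k + d_k); otherwise it has the same departures as Q_k
    but a constant arrival of eps every slot."""
    if Q_k <= b_k + d_k:
        return 0.0
    return max(Z_k - b_k - d_k + eps, 0.0)

def worst_case_delay(Q_max, Z_max, eps):
    """Delay bound (5.104): all non-dropped FIFO data leaves within
    (Q_max + Z_max)/eps slots."""
    return (Q_max + Z_max) / eps
```

Note how a larger ε tightens the delay bound but, through constraint (5.110), demands more service from every queue.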
Using the fact that Z_k(t+1) ≤ max[Z_k(t) − b_k(t) − d_k(t) + ε, 0], it can be shown (as usual) that the Lyapunov drift satisfies:

    Δ(Θ(t)) − V E{ Σ_{k=1}^K [φ_k(γ_k(t)) − βν_k d_k(t)] | Θ(t) }
      ≤ B − V E{ Σ_{k=1}^K [φ_k(γ_k(t)) − βν_k d_k(t)] | Θ(t) }
        + Σ_{k=1}^K Z_k(t) E{ ε − b̂_k(I(t), S(t)) − d_k(t) | Θ(t) }
        + Σ_{k=1}^K Q_k(t) E{ a_k(t) − b̂_k(I(t), S(t)) − d_k(t) | Θ(t) }
        + Σ_{k=1}^K G_k(t) E{ γ_k(t) − a_k(t) | Θ(t) }        (5.115)

where B is a constant that satisfies:

    B ≥ (1/2) Σ_{k=1}^K E{ (ε − b_k(t) − d_k(t))² | Θ(t) }
        + (1/2) Σ_{k=1}^K E{ a_k(t)² + (b_k(t) − d_k(t))² + (γ_k(t) − a_k(t))² | Θ(t) }        (5.116)

Such a constant B exists by the boundedness assumptions on the processes. The algorithm that minimizes the right-hand-side of (5.115) thus observes Q(t), Z(t), G(t), S(t) every slot t, and does the following:

• (Auxiliary Variables) For each k ∈ {1, ..., K}, choose γ_k(t) to solve:

    Maximize:   V φ_k(γ_k(t)) − G_k(t)γ_k(t)        (5.117)
    Subject to: 0 ≤ γ_k(t) ≤ A_max        (5.118)

• (Flow Control) For each k ∈ {1, ..., K}, choose a_k(t) by:

    a_k(t) = min[L_k(t) + A_k(t), A_max]   if Q_k(t) ≤ G_k(t)
    a_k(t) = 0                              if Q_k(t) > G_k(t)        (5.119)

• (Transmission) Choose I(t) ∈ I_S(t) to maximize:

    Σ_{k=1}^K [Q_k(t) + Z_k(t)] b̂_k(I(t), S(t))        (5.120)
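For the example utility φ_k(γ) = log(1 + ν_k γ), the auxiliary variable problem of maximizing V φ_k(γ) − G_k(t)γ over [0, A_max] has a closed form: setting the derivative Vν_k/(1 + ν_k γ) equal to G_k(t) gives γ = V/G_k(t) − 1/ν_k, clipped to the interval. A sketch of the per-slot decisions follows (our function names; this is an illustration, not the text's pseudocode):

```python
def aux_var_log(V, G, nu, A_max):
    """Maximize V*log(1 + nu*g) - G*g over 0 <= g <= A_max.
    First-order condition V*nu/(1 + nu*g) = G gives g = V/G - 1/nu."""
    if G <= 0:
        return A_max                    # objective is nondecreasing when G = 0
    g = V / G - 1.0 / nu
    return max(0.0, min(g, A_max))      # returns 0 whenever G > V*nu

def flow_control(Q, G, L, A, A_max):
    """Flow control rule: admit as much as allowed iff Q <= G."""
    return min(L + A, A_max) if Q <= G else 0.0

def drop_decision(Q, Z, beta, V, nu, A_max):
    """Packet drop rule: offer to drop A_max iff Q + Z > beta*V*nu."""
    return A_max if Q + Z > beta * V * nu else 0.0
```

Note that the auxiliary variable is 0 exactly when G_k(t) > Vν_k, the property used repeatedly in the performance analysis.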
• (Packet Drops) For each k ∈ {1, ..., K}, choose d_k(t) by:

    d_k(t) = A_max   if Q_k(t) + Z_k(t) > βV ν_k
    d_k(t) = 0       if Q_k(t) + Z_k(t) ≤ βV ν_k        (5.121)

• (Queue Update) Update Q_k(t), Z_k(t), G_k(t) by (5.97), (5.103), (5.114).

In some cases, the above algorithm may choose a drop variable d_k(t) such that Q_k(t) < b_k(t) + d_k(t). In this case, all queue updates are kept the same (so the algorithm is unchanged), but it is useful to first transmit data with offered rate b_k(t) on slot t, and then drop only what remains.

5.6.3 ALGORITHM PERFORMANCE

Define Z_k,max, Q_k,max, G_k,max as follows:

    Z_k,max = βV ν_k + ε        (5.122)
    G_k,max = V ν_k + A_max        (5.123)
    Q_k,max = min[βV ν_k + A_max, V ν_k + 2A_max]        (5.124)

Theorem 5.6  If ε ≤ A_max, then for arbitrary sample paths the above algorithm ensures:

    Z_k(t) ≤ Z_k,max, Q_k(t) ≤ Q_k,max, G_k(t) ≤ G_k,max   ∀t

provided that these inequalities hold for t = 0, where Z_k,max, Q_k,max, G_k,max are defined in (5.122)-(5.124). Thus, worst-case delay W_k,max is given by:

    W_k,max = (Z_k,max + Q_k,max)/ε = O(V)

Proof. That G_k(t) ≤ G_k,max for all t follows by an argument similar to that given earlier for bounded virtual queues: because a_k(t) ≥ 0 and γ_k(t) ≤ A_max, the queue G_k(t) can increase by at most A_max per slot, and the auxiliary variable update (5.117)-(5.118) chooses γ_k(t) = 0 whenever G_k(t) > V ν_k, so that G_k(t) cannot increase beyond V ν_k + A_max.

To show the Q_k,max bound, we also see that if Q_k(t) > G_k,max, then Q_k(t) > G_k(t), and so the flow control decision (5.119) will choose a_k(t) = 0, and so Q_k(t) cannot increase. Because a_k(t) ≤ A_max, it follows that Q_k(t) ≤ G_k,max + A_max for all t. Likewise, it is clear that the packet drop decision (5.121) yields d_k(t) = A_max whenever Q_k(t) > βV ν_k. Thus, the arrivals are less than or equal to the offered drops whenever Q_k(t) > βV ν_k, and so Q_k(t) ≤ βV ν_k + A_max for all t. This proves the Q_k,max bound. The Z_k,max bound is proven similarly. The worst-case-delay result then follows immediately from Lemma 5.5. □
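The deterministic bounds of Theorem 5.6 are simple to compute; here is a minimal sketch (our function name) that returns (Z_k,max, Q_k,max, G_k,max, W_k,max) for given parameters:

```python
def deterministic_bounds(V, beta, nu, eps, A_max):
    """Worst-case bounds (5.122)-(5.124) of Theorem 5.6 for one queue, and the
    resulting delay bound W_max = (Z_max + Q_max)/eps, which grows as O(V)."""
    Z_max = beta * V * nu + eps
    G_max = V * nu + A_max
    Q_max = min(beta * V * nu + A_max, V * nu + 2 * A_max)
    W_max = (Z_max + Q_max) / eps
    return Z_max, Q_max, G_max, W_max
```

Doubling V roughly doubles the delay bound, while the utility gap shrinks as O(1/V): the usual backlog/utility tradeoff, here in worst-case rather than average form.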
The above theorem only uses the fact that packet drops d_k(t) take place according to the rule (5.121), flow control decisions a_k(t) take place according to the rule (5.119), and auxiliary variable decisions satisfy γ_k(t) = 0 whenever G_k(t) > V ν_k (a property of the solution to (5.117)-(5.118)). The I(t) decisions can be arbitrary and are not necessarily those that maximize (5.120). The fact that γ_k(t) = 0 whenever G_k(t) > V ν_k can be hardwired into the auxiliary variable decisions, even when they are chosen to approximately solve (5.117)-(5.118). Then the worst-case queue backlog and delay bounds given in Theorem 5.6 hold. The next theorem holds for any C-additive approximation for minimizing the right-hand-side of (5.115) that preserves the above basic properties. A 0-additive approximation performs the exact algorithm given above.

Theorem 5.7  Suppose ω(t) is i.i.d. over slots, that Q_k(0) ≤ Q_k,max, Z_k(0) ≤ Z_k,max, G_k(0) ≤ G_k,max for all k, and that ε ≤ A_max. Suppose any C-additive approximation for minimizing the right-hand-side of (5.115) is used such that flow control decisions a_k(t) take place according to the rule (5.119), packet drops take place according to the rule (5.121), and auxiliary variable decisions satisfy γ_k(t) = 0 whenever G_k(t) > V ν_k. Then the worst-case backlog and delay bounds of Theorem 5.6 hold, and achieved utility satisfies:

    liminf_{t→∞} [ Σ_{k=1}^K φ_k(ā_k(t)) − Σ_{k=1}^K βν_k d̄_k(t) ] ≥ φ* − (B + C)/V

where B is defined in (5.116), φ* is the optimal utility associated with the problem (5.98)-(5.102), and ā_k(t), d̄_k(t) are defined:

    ā_k(t) = (1/t) Σ_{τ=0}^{t−1} E{a_k(τ)},   d̄_k(t) = (1/t) Σ_{τ=0}^{t−1} E{d_k(τ)}

The theorem relies on the following fact, which can be proven using Theorem 4.5: For all δ > 0, there exists a vector γ* = (γ_1*, ..., γ_K*) and an ω-only policy [a*(t), I*(t), d*(t)] that chooses a*(t) as a random function of A(t), I*(t) as a random function of S(t), and d*(t) = 0 (so that it does not drop any data) such that:

    Σ_{k=1}^K φ_k(γ_k*) = φ*        (5.125)
    E{a_k*(t)} = γ_k*   ∀k ∈ {1, ..., K}        (5.126)
    E{b̂_k(I*(t), S(t))} ≥ E{a_k*(t)} − δ   ∀k ∈ {1, ..., K}        (5.127)
    E{b̂_k(I*(t), S(t))} ≥ ε − δ   ∀k ∈ {1, ..., K}        (5.128)
    γ_k* ≤ A_max,  0 ≤ a_k*(t) ≤ A_k(t),  I*(t) ∈ I_S(t)   ∀t        (5.129)

Proof. (Theorem 5.7) The C-additive approximation ensures by (5.115):

    Δ(Θ(t)) − V E{ Σ_{k=1}^K [φ_k(γ_k(t)) − βν_k d_k(t)] | Θ(t) }
      ≤ B + C − V E{ Σ_{k=1}^K [φ_k(γ_k*) − βν_k d_k*(t)] | Θ(t) }
        + Σ_{k=1}^K Z_k(t) E{ ε − b̂_k(I*(t), S(t)) − d_k*(t) | Θ(t) }
        + Σ_{k=1}^K Q_k(t) E{ a_k*(t) − b̂_k(I*(t), S(t)) − d_k*(t) | Θ(t) }
        + Σ_{k=1}^K G_k(t) E{ γ_k* − a_k*(t) | Θ(t) }

where d*(t), I*(t), a*(t) are any alternative decisions that satisfy I*(t) ∈ I_S(t), 0 ≤ d_k*(t) ≤ A_max, and 0 ≤ a_k*(t) ≤ min[L_k(t) + A_k(t), A_max] for all k ∈ {1, ..., K} and all t. Substituting the ω-only policy from (5.125)-(5.129) in the right-hand-side of the above inequality and taking δ → 0 yields:

    Δ(Θ(t)) − V E{ Σ_{k=1}^K [φ_k(γ_k(t)) − βν_k d_k(t)] | Θ(t) } ≤ B + C − V φ*

Using iterated expectations and telescoping sums as usual yields for all t > 0:

    (1/t) Σ_{τ=0}^{t−1} Σ_{k=1}^K E{ φ_k(γ_k(τ)) − βν_k d_k(τ) } ≥ φ* − (B + C)/V − E{L(Θ(0))}/(Vt)

Using Jensen's inequality for the concave functions φ_k(γ) yields for all t > 0:

    Σ_{k=1}^K [φ_k(γ̄_k(t)) − βν_k d̄_k(t)] ≥ φ* − (B + C)/V − E{L(Θ(0))}/(Vt)        (5.130)

However, because G_k(t) ≤ G_k,max for all t, it is easy to show (via (5.114) and (2.5)) that for all k and all slots t > 0:

    ā_k(t) ≥ max[γ̄_k(t) − G_k,max/t, 0]

Therefore, since φ_k(γ) is continuous and nondecreasing, it can be shown:

    liminf_{t→∞} Σ_{k=1}^K [φ_k(ā_k(t)) − βν_k d̄_k(t)] ≥ liminf_{t→∞} Σ_{k=1}^K [φ_k(γ̄_k(t)) − βν_k d̄_k(t)]

Using this in (5.130) proves the result. □
Because the network layer packet drops d_k(t) are inefficient, it can be shown that:

    limsup_{t→∞} Σ_{k=1}^K ν_k d̄_k(t) ≤ (B + C)/(V(β − 1)) + (φ*_{ε=0} − φ*)/(β − 1)

where φ* is the optimal solution to (5.98)-(5.102) for the given ε > 0, and φ*_{ε=0} is the solution to (5.98)-(5.102) with ε = 0 (which removes constraint (5.100)). Thus, network layer drops can be made arbitrarily small by either increasing β or V.

Footnote 7: If b̄_k ≥ ε for all k, then the final term (φ*_{ε=0} − φ*)/(β − 1) can be removed. Thus, if φ* = φ*_{ε=0}, then limsup_{t→∞} [ν_1 d̄_1(t) + ... + ν_K d̄_K(t)] ≤ (B̃ + C)/(V(β − 1)). Indeed, if virtual queues H_k(t+1) = max[H_k(t) − μ_k(t) + ε, 0] are added to enforce these constraints, the same holds with B̃, where B̃ adds second moment terms (μ_k(t) − ε)² to (5.116).

The above analysis allows for an arbitrary operation of the transport layer queues L_k(t): these can have either infinite buffer space, finite buffer space, or 0 buffer space. With 0 buffer space, all data that is not immediately admitted to the network layer is dropped, as in (17). Indeed, the above theorems only assume that L_k(t) ≥ 0 for all t.

5.7 ALTERNATIVE FAIRNESS METRICS

One type of fairness used in the literature is the so-called max-min fairness (see, for example, (129)(3)(5)(6)). Let (x̄_1, ..., x̄_M) represent average throughputs achieved by users {1, ..., M} under some stabilizing control algorithm, and let Λ denote the set of all possible (x̄_1, ..., x̄_M) vectors. A vector (x_1, ..., x_M) ∈ Λ is max-min fair if:

• It maximizes the lowest entry of (x_1, ..., x_M) over all possible vectors in Λ.
• It maximizes the second lowest entry over all vectors in Λ that satisfy the above condition.
• It maximizes the third lowest entry over all vectors in Λ that satisfy the above two conditions.
• and so on (see Chapter 6.2 in (129)).

This can be viewed as a sequence of nested optimizations, much different from the utility optimization framework treated in this chapter. For flow-based networks with capacitated links, one can reach a max-min fair allocation by starting from 0 and gradually increasing all flows equally until a bottleneck link is found, then increasing all non-bottlenecked flows equally, and so on. A token-based scheduling scheme is developed in (160) for achieving max-min fairness in one-hop wireless networks on graphs with link selections defined by matchings.

Alternatively, one can approximate max-min fairness using a concave utility function in a network with capacitated links. Indeed, it is shown in (3) that optimizing a sum of concave functions of the form g_α(x) = −1/x^α approaches a max-min fair point as α → ∞. It is likely that such an approach also holds for more general wireless networks with transmission rate allocation and scheduling. However, such functions are singular at x = 0 (preventing worst-case backlog bounds as in Exercises 5.6-5.7),
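The progressive filling procedure described above (raise all flow rates equally, freeze the flows crossing each newly saturated link, repeat) can be sketched for flow-based networks with capacitated links. This is an illustration under our own assumptions: `routes` maps each flow to the set of links it uses, every flow uses at least one link, and link names are hypothetical:

```python
def progressive_filling(routes, capacity):
    """Compute a max-min fair rate vector by progressive filling.
    routes: list of sets of link names, one set per flow.
    capacity: dict mapping link name -> capacity."""
    rate = [0.0] * len(routes)
    frozen = [False] * len(routes)
    residual = dict(capacity)
    while not all(frozen):
        # Links still carrying at least one unfrozen flow.
        active = [l for l in residual
                  if any(not frozen[i] and l in r for i, r in enumerate(routes))]
        # Largest equal increment before some active link saturates.
        inc = min(residual[l] / sum(1 for i, r in enumerate(routes)
                                    if not frozen[i] and l in r)
                  for l in active)
        for i, r in enumerate(routes):
            if not frozen[i]:
                rate[i] += inc
        for l in active:
            users = sum(1 for i, r in enumerate(routes) if not frozen[i] and l in r)
            residual[l] -= inc * users
        # Freeze flows that now cross a saturated (bottleneck) link.
        for i, r in enumerate(routes):
            if not frozen[i] and any(residual[l] <= 1e-9 for l in r):
                frozen[i] = True
    return rate
```

For two flows sharing a unit-capacity link and a third flow alone on a capacity-3 link, the procedure gives rates (0.5, 0.5, 3), the classic max-min fair allocation.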
and for large α they have very large derivatives g′_α(x) for small x > 0, which typically results in large queue backlog if used in conjunction with the drift-plus-penalty method. A simpler hard fairness approach seeks only to maximize the minimum throughput (161). This easily fits into the concave utility based drift-plus-penalty framework using the concave function g(x) = min[x_1, ..., x_M]. A "mixed" approach can also be considered, which seeks to maximize β min[x̄_1, ..., x̄_M] + Σ_{m=1}^M log(1 + x̄_m). The constant β is a large weight that ensures maximizing the minimum throughput has a higher priority than maximizing the logarithmic terms.

5.8 EXERCISES

Exercise 5.1. (Jensen's Inequality) Let φ(γ) be a concave function defined over a convex set R ⊆ R^M. Let γ(τ) be a sequence of random vectors in R for τ ∈ {0, 1, 2, ...}. Fix an integer t > 0, and define T as an independent and random time that is uniformly distributed over the integers {0, 1, ..., t − 1}. Define the random vector X = γ(T). Use (5.9) to prove (5.10)-(5.11).

Exercise 5.2. (Using Logarithmic Utilities) Give a closed form solution to the auxiliary variable update of (5.49)-(5.50) when:
a) φ(γ) = Σ_{m=1}^M log(γ_m), where log(·) denotes the natural logarithm.
b) φ(γ) = Σ_{m=1}^M log(1 + ν_m γ_m), where log(·) denotes the natural logarithm.

Exercise 5.3. (Transformed Problem with Auxiliary Variables) Let α'(t) be a policy that yields well defined averages x̄, with utility φ(x̄) = φ^opt, and that satisfies all constraints of problem (5.2)-(5.5). Construct a policy that satisfies all constraints of problem (5.13)-(5.18) (including the constraint x̄ ∈ R) and that yields the same utility value φ(x̄). Hint: Use γ(t) = x̄ for all t.

Exercise 5.4. (Hard Fairness (161)) Consider a system with M attributes x(t) = (x_1(t), ..., x_M(t)), where x_m(t) = x̂_m(α(t), ω(t)) for m ∈ {1, ..., M}. Assume there is a positive constant θ_max such that:

    0 ≤ x̂_m(α, ω) ≤ θ_max   ∀m ∈ {1, ..., M}, ∀ω, ∀α ∈ A_ω

The goal is to solve:

    Maximize:   min[x̄_1, ..., x̄_M]        (5.131)
    Subject to: 1) All queues are mean rate stable        (5.132)
                2) α(t) ∈ A_ω(t)   ∀t ∈ {0, 1, 2, ...}        (5.133)

See also Exercise 5.8.
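For the hard-fairness utility g(x) = min[x_1, ..., x_M], the auxiliary variable step maximizes V·min(γ) − Σ_m G_m(t)γ_m over 0 ≤ γ_m ≤ θ_max. Any optimum can set all entries to a common value θ (raising any entry above the minimum only costs weight), which reduces the step to maximizing the linear function (V − Σ_m G_m(t))·θ; θ is then θ_max or 0 depending on the sign, and the behavior at equality is exactly the tie-breaking issue raised in the exercise. A sketch (our function name):

```python
def hard_fairness_aux(V, G, theta_max, break_ties_low=True):
    """Maximize V*min(gamma) - sum(G[m]*gamma[m]) over 0 <= gamma[m] <= theta_max.
    All entries equal a common theta at an optimum; the one-dimensional objective
    (V - sum(G))*theta is linear, so theta is an endpoint of [0, theta_max]."""
    slope = V - sum(G)
    if slope > 0 or (slope == 0 and not break_ties_low):
        theta = theta_max
    else:
        theta = 0.0
    return [theta] * len(G)
```

The `break_ties_low` flag mirrors the exercise's observation that choosing the lowest possible auxiliary variables at ties makes the two formulations coincide.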
a) State the drift-plus-penalty algorithm for solving the following problem, with θ(t) as a new variable:

    Maximize:   θ̄
    Subject to: 1) x̄_m ≥ θ̄   ∀m ∈ {1, ..., M}
                2) 0 ≤ θ(t) ≤ θ_max   ∀t ∈ {0, 1, 2, ...}
                3) α(t) ∈ A_ω(t)   ∀t ∈ {0, 1, 2, ...}

b) State the utility-based drift-plus-penalty algorithm for solving the problem:

    Maximize:   min[x̄_1, x̄_2, ..., x̄_M]
    Subject to: α(t) ∈ A_ω(t)   ∀t ∈ {0, 1, 2, ...}

which is solved with auxiliary variables γ_m(t) with 0 ≤ γ_m(t) ≤ θ_max.

c) The problems in (a) and (b) both seek to maximize the minimum throughput. Show that if both algorithms "break ties" when choosing auxiliary variables by choosing the lowest possible values, then they are exactly the same algorithm. Show they are slightly different if ties are broken to choose the largest possible auxiliary variables, particularly in cases when some virtual queues are zero.

Exercise 5.5. (Bounded Virtual Queues) Consider the auxiliary variable optimization for γ_m(t) in (5.49)-(5.50), where φ_m(x) has the property that:

    φ_m(x) ≤ φ_m(0) + ν_m x   whenever 0 ≤ x ≤ γ_m,max

for a constant ν_m > 0. Show that if 0 ≤ γ_m(t) ≤ γ_m,max, we have:

    V φ_m(γ_m(t)) − G_m(t)γ_m(t) ≤ V φ_m(0) + (V ν_m − G_m(t))γ_m(t)

Use this to prove that γ_m(t) = 0 is the unique optimal solution to (5.49)-(5.50) whenever G_m(t) > V ν_m. Conclude from (5.48) that G_m(t) ≤ V ν_m + γ_m,max for all t, provided this is true at t = 0.

Exercise 5.6. (1-Hop Wireless System with Infinite Backlog) Consider a wireless system with M channels. Assume that each channel has an infinite backlog of data, so that there is always data to send. Transmission rates on slot t are given by b(t) = (b_1(t), ..., b_M(t)) with b_m(t) = b̂_m(α(t), ω(t)), where ω(t) = (S_1(t), ..., S_M(t)) is an observed channel state vector for slot t (assumed to be i.i.d. over slots), and α(t) is a control action chosen within a set A_ω(t). The goal is to choose α(t) every slot to maximize φ(b̄), where φ(b) is a concave and entrywise nondecreasing utility function.

a) Verify that, in this case, the utility-based drift-plus-penalty algorithm reduces to:
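The bounded-virtual-queue property of Exercise 5.5 can be checked by simulation: as long as γ_m(t) = 0 whenever G_m(t) > Vν_m and γ_m(t) ≤ γ_m,max otherwise, the queue never exceeds Vν_m + γ_m,max. A sketch (our function name; the service process here is an arbitrary bounded stand-in):

```python
import random

def simulate_G_bound(V, nu, gamma_max, slots=2000, seed=0):
    """Simulate G(t+1) = max[G(t) + gamma(t) - b(t), 0], where gamma(t) is any
    maximizer satisfying gamma(t) = 0 whenever G(t) > V*nu.  Returns the peak
    of G(t), which stays at or below V*nu + gamma_max (starting from G(0)=0)."""
    rng = random.Random(seed)
    G, G_peak = 0.0, 0.0
    for _ in range(slots):
        gamma = 0.0 if G > V * nu else gamma_max   # worst case: admit the max
        b = rng.uniform(0.0, 0.5)                  # arbitrary bounded service
        G = max(G + gamma - b, 0.0)
        G_peak = max(G_peak, G)
    return G_peak
```

The bound holds deterministically by induction: from G ≤ Vν_m the queue can grow to at most Vν_m + γ_m,max, and above Vν_m it cannot grow at all.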
• (Auxiliary Variables) Choose γ(t) = (γ_1(t), ..., γ_M(t)) to solve:

    Maximize:   V φ(γ(t)) − Σ_{m=1}^M G_m(t)γ_m(t)
    Subject to: 0 ≤ γ_m(t) ≤ γ_m,max   ∀m ∈ {1, ..., M}

• (Transmission) Observe ω(t) and choose α(t) ∈ A_ω(t) to maximize Σ_{m=1}^M G_m(t) b̂_m(α(t), ω(t)).

• (Virtual Queue Update) Update G_m(t) for all m ∈ {1, ..., M} according to:

    G_m(t+1) = max[G_m(t) + γ_m(t) − b̂_m(α(t), ω(t)), 0]

b) Suppose that φ(b) = Σ_{m=1}^M φ_m(b_m), where the functions φ_m(b_m) are continuous, concave, nondecreasing, with maximum right-derivative ν_m < ∞, so that φ_m(γ) ≤ φ_m(0) + ν_m γ for all γ ≥ 0. Prove that the auxiliary variable decisions above yield γ_m(t) = 0 if G_m(t) > V ν_m (see also Exercise 5.5). Conclude that 0 ≤ G_m(t) ≤ V ν_m + γ_m,max for all t, provided that this holds at t = 0.

c) Use (5.33) to conclude that if the conditions of part (b) hold, and if any C-additive approximation is used, then for all t > 0:

    φ(b̄(t)) ≥ φ^opt − (D + C)/V − (1/t) Σ_{m=1}^M ν_m (V ν_m + γ_m,max)

if all virtual queues are initially empty.

Exercise 5.7. (1-Hop Wireless System with Random Arrivals) Consider the same system as Exercise 5.6, with the exception that we have random arrivals A_m(t) and:

    Q_m(t+1) = max[Q_m(t) − b̂_m(α(t), ω(t)), 0] + x_m(t)

where x_m(t) is a flow control decision, made subject to 0 ≤ x_m(t) ≤ A_m(t). We want to maximize φ(x̄). Suppose φ(b) has the structure of Exercise 5.6(b).

a) State the new algorithm for this case.
b) Suppose 0 ≤ A_m(t) ≤ A_m,max for some finite constant A_m,max for all t. Using a similar argument, show that all queues G_m(t) and Q_m(t) are deterministically bounded.

Exercise 5.8. (Imperfect Channel Knowledge) Consider the general problem of Theorem 5.3, but under the assumption that ω(t) provides only a partial understanding of the channel for each queue Q_k(t), so that b̂_k(α(t), ω(t)) is a random function of α(t) and ω(t), i.i.d. over all slots with the same α(t) and ω(t), and assumed to have finite second moments regardless of the choice of α(t). Define:

    β_k(α, ω) = E{ b̂_k(α(t), ω(t)) | α(t) = α, ω(t) = ω }

Assume that the function β_k(α, ω) is known. State the modified algorithm that minimizes the right-hand-side of (5.84) in this case, where knowledge of the channel statistics is needed for computing the β_k(α, ω) functions. Hint: E{b_k(t)|Θ(t)} = E{E{b_k(t)|Θ(t), α(t), ω(t)}|Θ(t)} = E{β_k(α(t), ω(t))|Θ(t)}.

Note: Related problems with randomized service outcomes and Lyapunov drift are considered in (162)(163)(164)(154)(165)(161), and a max-weight learning framework is developed in (166) for the case of unknown statistics, where the β_k(α, ω) functions and their generalizations are unknown.

Exercise 5.9. (Proof of Theorem 5.3) We make use of the following fact, analogous to Theorem 4.5: If problem (5.71)-(5.75) is feasible, then for all δ > 0 there exists an ω-only policy α*(t) ∈ A_ω(t) such that:

    E{ x̂(α*(t), ω(t)) } = γ*   for some vector γ*
    E{ ŷ_0(α*(t), ω(t)) } + f(γ*) ≤ y_0^opt + f^opt + δ
    E{ ŷ_l(α*(t), ω(t)) } + g_l(γ*) ≤ δ   ∀l ∈ {1, ..., L}
    dist(γ*, X ∩ R) ≤ δ

For simplicity, in this proof, we assume the above holds for δ = 0. Further assume that the functions f(γ) and g_l(γ) are Lipschitz continuous, so that there are positive constants ν_m, β_l,m such that for all x(t) and γ(t), we have:

    |f(γ(t)) − f(x(t))| ≤ Σ_{m=1}^M ν_m |γ_m(t) − x_m(t)|
    |g_l(γ(t)) − g_l(x(t))| ≤ Σ_{m=1}^M β_l,m |γ_m(t) − x_m(t)|   ∀l ∈ {1, ..., L}

Exercise 5.10. (Equivalence of the Transformed Problem Using Auxiliary Variables)
a) Suppose that α*(t) is a policy that satisfies all constraints of the problem (5.71)-(5.75), yielding time averages x̄* and ȳ* and a cost value of ȳ_0* + f(x̄*). Show that this same policy also satisfies all constraints of problem (5.76)-(5.81), and yields the same cost value, if we define the auxiliary variable decisions to be γ(t) = x̄* for all t.
b) Suppose that α'(t), γ'(t) is a policy that satisfies all constraints of problem (5.76)-(5.81), yielding time averages x̄, ȳ_l and a cost value in (5.76) given by some value v. Show that this same policy also satisfies all constraints of the problem (5.71)-(5.75), with a cost ȳ_0 + f(x̄) ≤ v.
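For the imperfect-channel setting of Exercise 5.8, the hint's iterated expectation says the transmission step can simply use the known conditional mean rates β_k(α, ω) in place of the realized rates. A sketch over a finite action set (our names; `beta` is a hypothetical table mapping each action to its mean rate vector for the observed ω):

```python
def maxweight_expected(actions, Q, beta):
    """Pick the action alpha maximizing sum_k Q[k] * beta_k(alpha, omega),
    where beta[alpha] is the KNOWN vector of conditional mean rates for the
    current omega.  Iterated expectations justify weighting by the mean."""
    return max(actions, key=lambda a: sum(q * beta[a][k] for k, q in enumerate(Q)))
```

Here a large backlog on queue 2 can favor an action with a smaller total rate but a better mean rate on that queue, exactly as with realized rates in the perfect-knowledge case.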
a) Plug the above policy α*(t), together with the constant auxiliary vector γ(t) = γ*, into the right-hand-side of the drift bound (5.84), and add C (because of the C-additive approximation) to derive a simpler bound on the drift expression. The resulting right-hand-side should be: D + C + V(y_0^opt + f^opt).

b) Use the Lyapunov optimization theorem to prove that for all t > 0:

    (1/t) Σ_{τ=0}^{t−1} E{ y_0(τ) + f(γ(τ)) } ≤ y_0^opt + f^opt + (D + C)/V

and hence, by Jensen's inequality (with ȳ_0(t) and γ̄(t) defined by (5.24)):

    ȳ_0(t) + f(γ̄(t)) ≤ y_0^opt + f^opt + (D + C)/V

c) Manipulate the drift bound of part (a) to prove that Δ(Θ(t)) ≤ W for some finite constant W. Conclude that all virtual and actual queues are mean rate stable, that (4.7) holds for all t > 0, and so E{H_m(t)}/t ≤ √(2W/t).

d) Use (5.42) to prove that for all m ∈ {1, ..., M}:

    0 ≤ lim_{t→∞} |x̄_m(t) − γ̄_m(t)| ≤ lim_{t→∞} E{H_m(t)}/t = 0

Argue that γ̄(t) ∈ X ∩ R for all t.

e) Use part (b) and the Lipschitz assumptions to prove (5.86).

f) Use part (d) and the Lipschitz conditions to prove (5.85).

Exercise 5.11. (Profit Risk and Non-Convexity) Consider a K-queue system with arrival and service functions â_k(α(t), ω(t)) and b̂_k(α(t), ω(t)). Let p(t) = p̂(α(t), ω(t)) be a random profit variable that is i.i.d. over all slots for which we have the same α(t) and ω(t), and that has finite second moment regardless of the policy. Define:

    φ(α, ω) = E{ p̂(α(t), ω(t)) | α(t) = α, ω(t) = ω }
    ψ(α, ω) = E{ p̂(α(t), ω(t))² | α(t) = α, ω(t) = ω }

and assume the functions φ(α, ω) and ψ(α, ω) are known. The goal is to stabilize all queues while maximizing a linear combination of the profit minus the variance of the profit (where variance is a proxy for "risk"). Specifically, we want to maximize θ_1 p̄ − θ_2 Var(p), where θ_1 and θ_2 are positive constants, where the notation h̄ represents a time average expectation of a given process h(t), and where the variance is defined as Var(p) = p̄² with p̄² denoting the time average of p(t)² minus the square of the time average of p(t).

a) Define attributes p_1(t) = p(t), p_2(t) = p(t)². Write the problem using p̄_1 and p̄_2, and show this is a nonconvex stochastic network optimization problem.

b) State the "primal-dual" algorithm that minimizes the right-hand-side of (5.93) in this context. Hint: Note that: E{p_1(t)|Θ(t)} = E{E{p_1(t)|Θ(t), α(t), ω(t)}|Θ(t)} = E{φ(α(t), ω(t))|Θ(t)}.
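The risk-adjusted objective of Exercise 5.11 depends only on the two attribute averages p̄_1 (average profit) and p̄_2 (average squared profit). A sketch of evaluating it from samples (our function name); maximizing it is a nonconvex problem because the objective θ_1·p̄_1 − θ_2·(p̄_2 − p̄_1²) is not concave in (p̄_1, p̄_2) (the +θ_2·p̄_1² term is convex):

```python
def risk_adjusted_profit(p_samples, theta1, theta2):
    """Evaluate theta1*p1_bar - theta2*Var(p), where Var(p) = p2_bar - p1_bar**2
    is computed from the two attribute averages p1_bar = avg p(t) and
    p2_bar = avg p(t)**2."""
    n = len(p_samples)
    p1 = sum(p_samples) / n
    p2 = sum(p * p for p in p_samples) / n
    return theta1 * p1 - theta2 * (p2 - p1 * p1)
```

With a constant profit stream the variance penalty vanishes, while a volatile stream with the same mean scores strictly lower, which is the intended risk tradeoff.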
Exercise 5.12. (Optimization without Auxiliary Variables (17)(18)) Consider the problem (5.2)-(5.5). Assume that there is an ω-only policy α'(t) and a vector γ' = (γ'_1, ..., γ'_M), called the optimal operating point, such that φ(γ') = φ', where φ' is the maximum utility for the problem, and such that for all possible values of ω(t), we have:

    x̂_m(α'(t), ω(t)) = γ'_m   ∀m ∈ {1, ..., M}        (5.134)
    E{ ŷ_l(α'(t), ω(t)) } ≤ 0   ∀l ∈ {1, ..., L}        (5.135)
    E{ â_k(α'(t), ω(t)) } ≤ E{ b̂_k(α'(t), ω(t)) }   ∀k ∈ {1, ..., K}        (5.136)

The assumptions (5.134)-(5.136) are restrictive, particularly because (5.134) must hold deterministically for all ω(t) realizations. However, these assumptions can be shown to hold for the special case when x_m(t) represents the amount of data admitted to a network from a source m when: (i) All sources are "infinitely backlogged" and hence always have data to send, and (ii) Data can be admitted as a real number.

The Lyapunov drift can be shown to satisfy the following for some constant B > 0:

    Δ(Θ(t)) − V E{ φ(x̂(α(t), ω(t))) | Θ(t) } ≤ B − V E{ φ(x̂(α(t), ω(t))) | Θ(t) }
      + Σ_{l=1}^L Z_l(t) E{ y_l(α(t), ω(t)) | Θ(t) }
      + Σ_{k=1}^K Q_k(t) E{ a_k(α(t), ω(t)) − b_k(α(t), ω(t)) | Θ(t) }

Suppose every slot we observe Θ(t) and ω(t) and choose an action α(t) that minimizes the right-hand-side of the above drift inequality.

a) Assume ω(t) is i.i.d. over slots. Plug the alternative policy α'(t) into the right-hand-side above to get a greatly simplified drift expression.

b) Conclude from part (a) that Δ(Θ(t)) ≤ D + V(φ_max − φ') for all t, for some finite constant D and where φ_max is an upper bound on the instantaneous value of φ(x̂(·)) (assumed to be finite). Conclude that all actual and virtual queues are mean rate stable, and hence all desired inequality constraints are satisfied.

c) Use Jensen's inequality and part (a) (with iterated expectations and telescoping sums) to conclude that for all t > 0, we have:

    φ(x̄(t)) ≥ (1/t) Σ_{τ=0}^{t−1} E{ φ(x(τ)) } ≥ φ' − B/V − E{L(Θ(0))}/(Vt)
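The rule "choose α(t) to minimize the right-hand-side of the drift inequality" in Exercise 5.12 is a per-slot greedy minimization. A sketch over a finite action set; all argument names are hypothetical stand-ins for the x̂, ŷ, â, b̂ functions of the exercise:

```python
def greedy_action(actions, Q, Z, V, x_hat, y_hat, a_hat, b_hat, phi, omega):
    """Choose alpha minimizing the action-dependent part of the drift bound:
    -V*phi(x_hat) + sum_l Z[l]*y_hat[l] + sum_k Q[k]*(a_hat[k] - b_hat[k])."""
    def cost(a):
        x, y = x_hat(a, omega), y_hat(a, omega)
        arr, srv = a_hat(a, omega), b_hat(a, omega)
        return (-V * phi(x)
                + sum(z * yl for z, yl in zip(Z, y))
                + sum(q * (ak - bk) for q, ak, bk in zip(Q, arr, srv)))
    return min(actions, key=cost)
```

With one queue, a binary admit/idle action, and unit service, the rule admits when the queue is empty and idles when the backlog term dominates the utility term, which is the expected drift-plus-penalty behavior.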
where x̄(t) = (1/t) Σ_{τ=0}^{t−1} E{x(τ)} and x(τ) = x̂(α(τ), ω(τ)).

Exercise 5.13. (Delay-Limited Transmission (71)) Consider a K-user wireless system with arrival vector A(t) = (A_1(t), ..., A_K(t)) and channel state vector S(t) = (S_1(t), ..., S_K(t)) for each slot t ∈ {0, 1, 2, ...}. Define ω(t) = [A(t), S(t)] as the random network event observed every slot, assumed to be i.i.d. over slots. There is no queueing, and all data must either be transmitted in 1 slot or dropped (similar to the delay-limited capacity formulation of (70)). Thus, there are no actual queues in the system. Define α(t) ∈ A_ω(t) as a general control action, which affects how much of the data to transmit and the amount of power used according to general functions μ_k(α, ω) and p(α, ω):

    μ(t) = (μ_1(α(t), ω(t)), ..., μ_K(α(t), ω(t))),   p(t) = p(α(t), ω(t))

where μ(t) = (μ_1(t), ..., μ_K(t)) is the transmission vector and p(t) is the power used on slot t. Assume these are constrained as follows for all slots t:

    0 ≤ μ_k(t) ≤ A_k(t)   ∀k ∈ {1, ..., K},   0 ≤ p(t) ≤ p_max

for some finite constant p_max. Assume that A_k(t) ≤ A_k^max for all t, for some finite constants A_k^max for k ∈ {1, ..., K}. Let μ̄ be the time average expectation of the transmission vector μ(t), and let φ(μ) be a continuous, concave, and entrywise nondecreasing utility function of μ. The goal is to solve the following problem:

    Maximize:   φ(μ̄)
    Subject to: p̄ ≤ P_av

where p̄ is the time average expected power expenditure, and P_av is a pre-specified average power constraint. This is a special case of the general problem (5.2)-(5.5).

a) Use auxiliary variables γ(t) = (γ_1(t), ..., γ_K(t)), subject to 0 ≤ γ_k(t) ≤ A_k^max for all t, to write the corresponding transformed problem (as in (5.13)-(5.18)) for this case.

b) State the drift-plus-penalty algorithm that solves this transformed problem. Hint: Use a virtual queue Z(t) to enforce the constraint p̄ ≤ P_av, and use virtual queues G_k(t) to enforce the constraints μ̄_k ≥ γ̄_k for all k ∈ {1, ..., K}.

Multislot versions of this problem are treated in Chapter 7.

Exercise 5.14. (Delay-Limited Transmission with Errors (71)) Consider the same system as Exercise 5.13, but now assume that transmissions can have errors, so that μ_k(t) = μ̂_k(α(t), ω(t)) is a random transmission outcome (as in Exercise 5.8), with known expectations β_k(α(t), ω(t)) = E{μ_k(t)|α(t), ω(t)} for all k ∈ {1, ..., K}. Use iterated expectations (as in Exercise 5.8) to redesign the drift-plus-penalty algorithm for this case.
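The virtual queue suggested in the hint of Exercise 5.13 for the average power constraint p̄ ≤ P_av has the standard form; a minimal sketch (our function name):

```python
def power_queue_update(Z, p, P_av):
    """Virtual queue enforcing p_bar <= P_av:
    Z(t+1) = max[Z(t) + p(t) - P_av, 0].
    If Z(t) is mean rate stable, the time average power satisfies the
    constraint; a large Z(t) pushes the per-slot decision toward low power."""
    return max(Z + p - P_av, 0.0)
```

Spending above P_av on a slot grows Z(t), and spending below it drains Z(t), so the queue acts as a running "power debt" counter.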
CHAPTER 6

Approximate Scheduling

This chapter focuses on the max-weight problem that arises when scheduling for stability or maximum throughput-utility in a wireless network with interference. Previous chapters showed the key step is maximizing the expectation of a weighted sum of link transmission rates, or coming within an additive constant C of the maximum. The weights are related to queue backlogs for single-hop problems and differential backlogs for multihop problems. Algorithms that accomplish this for a given constant C ≥ 0 every slot are called C-additive approximations. For problems of network stability, previous chapters showed that C-additive approximations can be used to stabilize the network whenever arrival rates are inside the network capacity region, with average backlog and delay bounds that grow linearly with C. For problems of maximum throughput-utility, Chapter 5 showed that C-additive approximations can be used with a simple flow control rule to give utility within (B + C)/V of optimality (where B is a fixed constant and V is any nonnegative parameter chosen as desired), with average backlog that grows linearly in both V and C. Thus, C-additive approximations can be used to push network utility arbitrarily close to optimal, as determined by the parameter V.

Specifically, consider a (possibly multihop) network with L links, let $b(t) = (b_1(t), \ldots, b_L(t))$ be the vector of transmission rates offered over links $l \in \{1, \ldots, L\}$ on slot t, and let $W(t) = (W_1(t), \ldots, W_L(t))$ be a vector of weights for slot t. The goal is to make (possibly randomized) decisions for b(t) that come within an additive constant C of maximizing the following expectation:

$$\sum_{l=1}^{L} W_l(t)\,\mathbb{E}\{b_l(t) \mid W(t)\} \qquad (6.1)$$

where the expectation is with respect to the possibly random decision. Such max-weight problems can be very complex for wireless networks with interference. This is because a transmission on one link can affect transmissions on many other links, so that transmission decisions are coupled throughout the network.

In this chapter, we first consider a class of interference networks without time-varying channels and develop two C-additive approximation algorithms for this context. The first is a simple algorithm based on trading off computation complexity and delay. The second is a more elegant randomized transmission technique that admits a simple distributed implementation. We then present a multiplicative approximation theorem that holds for general networks with possibly time-varying channels. It guarantees constant-factor throughput results for algorithms that schedule transmissions within a multiplicative constant of the max-weight solution every slot.
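As a concrete illustration (not from the text), the max-weight value in (6.1) can be computed by brute force for a small network. The 3-link interference set `B` and the weights `W` below are hypothetical; any randomized rule whose conditional expected weighted rate stays within C of the brute-force optimum is, by definition, a C-additive approximation on that slot.

```python
from itertools import product

def max_weight_activation(weights, feasible):
    """Brute-force max-weight: pick b in B maximizing sum_l W_l(t) * b_l."""
    best_b, best_val = None, float("-inf")
    for b in feasible:
        val = sum(w * x for w, x in zip(weights, b))
        if val > best_val:
            best_b, best_val = b, val
    return best_b, best_val

# Hypothetical 3-link network: links 0 and 1 interfere and cannot be on together.
B = [b for b in product((0, 1), repeat=3) if not (b[0] and b[1])]
W = [5.0, 3.0, 2.0]
b_opt, val = max_weight_activation(W, B)
print(b_opt, val)  # (1, 0, 1) 7.0
```

The exhaustive search over B is exactly the step that becomes intractable when B is large, which motivates the approximation techniques of this chapter.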
6.1 COMPUTING OVER MULTIPLE SLOTS

We first consider the following simple technique for obtaining a C-additive approximation with arbitrarily low per-slot computation complexity, with a tradeoff of increasing the value of C linearly in the frame size T.

6.1.1 TIME-INVARIANT INTERFERENCE NETWORKS

Suppose the network is time-invariant, in that the channel conditions do not change and the transmission rate options are the same for all slots $t \in \{0, 1, 2, \ldots\}$. Assume that all transmissions are in units of packets, and each link can transmit at most one packet per slot. The transmission rate vector $b(t) = (b_1(t), \ldots, b_L(t))$ is thus a binary vector with $b_l(t) = 1$ if link l transmits a packet on slot t, and $b_l(t) = 0$ otherwise. We say that a binary vector b(t) is feasible if the set of links that correspond to "1" entries can be simultaneously activated for successful transmission. Define B as the collection of all feasible binary vectors, called the link activation set (7). The set B depends on the interference properties of the network.

Every slot t, the network controller observes the current link weights $W(t) = (W_1(t), \ldots, W_L(t))$ and chooses a (possibly random) $b(t) \in \mathcal{B}$, with the goal of maximizing the max-weight value (6.1). It is easy to show that the maximum is achieved by a deterministic choice $b^{opt}(t)$, where:

$$b^{opt}(t) = \arg\max_{b \in \mathcal{B}} \sum_{l=1}^{L} W_l(t) b_l$$

The amount of computation required to find an optimal vector $b^{opt}(t)$ depends on the structure of the set B. If this set is defined by all links that satisfy matching constraints, so that no two active links share a node, then $b^{opt}(t)$ can be found in polynomial time (via a centralized algorithm). However, the problem may be NP-hard for general sets B, so that no polynomial-time solution is available. A C-additive approximation to the max-weight problem finds a vector b(t) every slot t that satisfies:

$$\sum_{l=1}^{L} W_l(t)\,\mathbb{E}\{b_l(t) \mid W(t)\} \geq \max_{b \in \mathcal{B}} \sum_{l=1}^{L} W_l(t) b_l - C$$

Let C be a given nonnegative constant. Fix a positive integer T, and divide the timeline into successive frames of T slots. Define $t_r = rT$ as the start of frame r, for $r \in \{0, 1, 2, \ldots\}$. At the beginning of each frame r, the network controller observes the weights $W(t_r)$ and begins a computation to find $b^{opt}(t_r)$, possibly by exhaustively searching through all options in the set B. We assume the computation is completed within the T-slot frame. Meanwhile, for each frame $r \in \{1, 2, 3, \ldots\}$, the algorithm allocates the constant rate vector that was computed on the previous frame:

$$b(t) = b^{opt}(t_{r-1}) \quad \forall t \in \{t_r, \ldots, t_r + T - 1\}$$

Thus, every frame $r \in \{1, 2, 3, \ldots\}$ the network controller allocates the constant rate vector $b^{opt}(t_{r-1})$ for all slots of frame r, while also computing $b^{opt}(t_r)$ during that frame for use in frame r + 1 (see Fig. 6.1).
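A minimal sketch of the frame-based scheme described above, using a hypothetical 3-link interference set and a hypothetical weight process that drifts by at most θ per slot. Frame r serves the optimizer computed from the weights observed at the start of frame r − 1, and the per-slot loss against the instantaneous optimum stays within an additive constant of order LθT.

```python
import random
from itertools import product

def argmax_weight(W, B):
    """Return the activation in B maximizing the weighted sum."""
    return max(B, key=lambda b: sum(w * x for w, x in zip(W, b)))

def frame_based_schedule(weight_seq, B, T):
    """Frame r serves the optimizer computed from the weights observed at the
    start of frame r-1 (the computation is assumed to finish within a frame)."""
    schedule = {}
    b_prev = None
    for r in range(len(weight_seq) // T):
        t_r = r * T
        if b_prev is not None:
            for t in range(t_r, t_r + T):
                schedule[t] = b_prev
        b_prev = argmax_weight(weight_seq[t_r], B)
    return schedule

# Hypothetical 3-link network; links 0 and 1 interfere. Weights drift by at
# most theta per slot, so the per-slot loss is at most 2*L*theta*(2T - 1).
random.seed(0)
B = [b for b in product((0, 1), repeat=3) if not (b[0] and b[1])]
L, T, theta = 3, 4, 1.0
W = [[10.0, 8.0, 6.0]]
for _ in range(39):
    W.append([max(0.0, w + random.uniform(-theta, theta)) for w in W[-1]])
sched = frame_based_schedule(W, B, T)
for t, b in sched.items():
    best = argmax_weight(W[t], B)
    gap = (sum(w * x for w, x in zip(W[t], best))
           - sum(w * x for w, x in zip(W[t], b)))
    assert gap <= 2 * L * theta * (2 * T - 1) + 1e-9
```

The heavy `argmax_weight` computation runs once per frame rather than once per slot, which is the complexity amortization the section analyzes.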
Figure 6.1: An illustration of the frame structure for the algorithm of Section 6.1.1. During frame r, the controller implements $b^{opt}(t_{r-1})$ while computing $b^{opt}(t_r)$.

Now assume the maximum change in queue backlog over one slot is deterministically bounded, as is the maximum change in each link weight. Specifically, assume that no link weight can change by an amount more than θ over one slot, where θ is some positive constant. It follows that for any two slots $t_1 < t_2$:

$$|W_l(t_1) - W_l(t_2)| \leq \theta (t_2 - t_1)$$

Under this assumption, we now compute a value C such that the above algorithm is a C-additive approximation for all slots t ≥ T. Fix any slot t ≥ T, and let r represent the frame containing this slot. Note that $t - t_{r-1} \leq 2T - 1$. We have:

$$\sum_{l=1}^{L} W_l(t) b_l(t) = \sum_{l=1}^{L} W_l(t) b_l^{opt}(t_{r-1}) = \sum_{l=1}^{L} W_l(t_{r-1}) b_l^{opt}(t_{r-1}) + \sum_{l=1}^{L} \left(W_l(t) - W_l(t_{r-1})\right) b_l^{opt}(t_{r-1})$$
$$\geq \sum_{l=1}^{L} W_l(t_{r-1}) b_l^{opt}(t_{r-1}) - \theta (t - t_{r-1}) \sum_{l=1}^{L} b_l^{opt}(t_{r-1}) \geq \sum_{l=1}^{L} W_l(t_{r-1}) b_l^{opt}(t_{r-1}) - L\theta(2T-1) \qquad (6.2)$$

Further, because $b^{opt}(t_{r-1})$ solves the max-weight problem for weights $W(t_{r-1})$, we have:

$$\sum_{l=1}^{L} W_l(t_{r-1}) b_l^{opt}(t_{r-1}) = \max_{b \in \mathcal{B}} \sum_{l=1}^{L} W_l(t_{r-1}) b_l \geq \sum_{l=1}^{L} W_l(t_{r-1}) b_l^{opt}(t)$$
$$= \sum_{l=1}^{L} W_l(t) b_l^{opt}(t) - \sum_{l=1}^{L} \left[W_l(t) - W_l(t_{r-1})\right] b_l^{opt}(t) \geq \max_{b \in \mathcal{B}} \sum_{l=1}^{L} W_l(t) b_l - L\theta(2T-1) \qquad (6.3)$$

Combining (6.2) and (6.3) yields:

$$\sum_{l=1}^{L} W_l(t) b_l(t) \geq \max_{b \in \mathcal{B}} \sum_{l=1}^{L} W_l(t) b_l - 2L\theta(2T-1)$$

Taking conditional expectations gives:

$$\sum_{l=1}^{L} W_l(t)\,\mathbb{E}\{b_l(t) \mid W(t)\} \geq \max_{b \in \mathcal{B}} \sum_{l=1}^{L} W_l(t) b_l - 2L\theta(2T-1)$$

It follows that this algorithm yields a C-additive approximation for C = 2Lθ(2T − 1). The constant C is linear in the number of links L and in the frame size T.

Now let "complexity" represent the number of operations required to compute the max-weight solution (assuming for simplicity that this number is independent of the size of the weights $W_l(t)$). Because this complexity is amortized over T slots, the algorithm yields a per-slot computation complexity of complexity/T. This can be made as small as desired by increasing T, with a tradeoff in average queue backlog and average delay. This shows that maximum throughput can be achieved with arbitrarily low per-slot complexity. Unfortunately, the max-weight problem for networks with general activation sets B may be NP-hard, so that the only available computation algorithms have complexity that is exponential in the network size L. This means the frame size T must be chosen to be at least exponential in L to achieve polynomial per-slot complexity, which in turn incurs delay that is exponential in L.

The max-weight problem for N × N packet switches is a max-weight matching problem that can be computed in time that is polynomial in N. This technique was used in (167)(168) to reduce the per-slot complexity of scheduling in N × N packet switches. The work (168) uses this to provide a smooth complexity-delay tradeoff for switches, showing average delay of $O(N^{4-\alpha})$ is possible with per-slot complexity $O(N^\alpha)$, for any α such that 0 ≤ α ≤ 3.

6.1.2 RANDOMIZED SEARCHING FOR THE MAX-WEIGHT SOLUTION

The first low-complexity algorithm for full-throughput scheduling in time-invariant interference networks was perhaps (169), where new link activations are tried randomly and compared in the max-weight metric against the previously tried activation.
Intuitively, this works for the same reason as the frame-based scheme presented in the previous subsection: the randomized selection can be viewed as a (randomized) computation algorithm that solves the max-weight problem over a (variable-length) frame. The optimal solution is computed in some random number of slots T, where T is geometric with success probability equal to the number of optimal vectors in B divided by the size of the set B. This is analyzed with a different Markov chain argument in (169), and hence this yields maximum throughput. For example, in an N × N packet switch, the randomized method yields complexity that is O(N) and an average delay bound of O(N!). However, the deterministic method of (168) can achieve complexity that is O(N) with an average delay bound of $O(N^3)$. This is achieved by using α = 1 in the smooth complexity-delay tradeoff curve described in the previous subsection. A variation on the randomized algorithm of (169) for more complex networks is developed in (170).

All known methods for achieving throughput-utility within ε of optimality for networks with general interference constraints (and for arbitrary ε > 0) have either non-polynomial per-slot complexity, or non-polynomial delays and/or convergence times. This is not surprising: Suppose the problem of maximizing the number of activated links is NP-hard. If we could design an algorithm that, after a polynomial time T, has produced a throughput that is within 1/2 of the maximum sum throughput with high probability, then this algorithm (with high probability) must have selected a vector b(t) that is a max-size vector during some slot $t \in \{0, \ldots, T\}$ (else, the throughput would be at least 1 away from optimal). Thus, this could be used as a randomized algorithm for finding a max-size vector in polynomial time. Related NP-hardness results are developed in (171) for pure stability problems with low delay, even when arrival rates are very low.

6.1.3 THE JIANG-WALRAND THEOREM

Here we present a randomized algorithm that produces a C-additive approximation by allocating a link vector b(t) according to the steady-state solution of a particular reversible Markov chain. The Markov chain can easily be simulated, and it has a simple relation to distributed scheduling in a carrier sense multiple access (CSMA) system. While the implementation of the algorithm is more elegant than the deterministic computation method described in the previous subsection, its resulting delay bounds can be worse, related to the convergence time required for the Markov chain to approach steady state. Further, the value of C that this algorithm produces is linear in the network size. We first present the result, and then discuss the complexity associated with generating a vector with the desired distribution.

The following randomized algorithm for choosing $b(t) \in \mathcal{B}$ was developed in (172) for wireless systems with general interference constraints, and in (173) for scheduling in optical networks:

Max Link Weight Plus Entropy Algorithm: Every slot t, observe the current link weights $W(t) = (W_1(t), \ldots, W_L(t))$ and choose b(t) by randomly selecting a binary vector b =
$(b_1, \ldots, b_L) \in \mathcal{B}$ with the probability distribution:

$$p^*(b) = Pr[b(t) = b] = \frac{\exp\left(\sum_{l=1}^{L} W_l(t) b_l\right)}{A} \qquad (6.4)$$

where A is a normalizing constant that makes the distribution sum to 1.

The work (172) motivates this algorithm by the modified problem that computes a probability distribution p(b) over the set B to solve the following:

$$\text{Maximize:} \quad -\sum_{b\in\mathcal{B}} p(b)\log(p(b)) + \sum_{b\in\mathcal{B}} p(b)\sum_{l=1}^{L} W_l(t) b_l \qquad (6.5)$$
$$\text{Subject to:} \quad \sum_{b\in\mathcal{B}} p(b) = 1 \;, \quad 0 \leq p(b) \;\; \forall b \in \mathcal{B} \qquad (6.6)$$

where log(·) denotes the natural logarithm. This problem is equivalent to maximizing $H(p(\cdot)) + \sum_{l=1}^{L} W_l(t)\,\mathbb{E}\{b_l(t) \mid W(t)\}$, where H(p(·)) is the entropy (in nats) associated with the probability distribution p(b), and $\mathbb{E}\{b_l(t) \mid W(t)\}$ is the expected transmission rate over link l given that b(t) is selected according to the probability distribution p(b). Note that because the set B contains at most $2^L$ link activation sets, and the entropy of any probability distribution that contains at most k probabilities is at most log(k), we have for any probability distribution p(b):

$$0 \leq -\sum_{b\in\mathcal{B}} p(b)\log(p(b)) \leq L\log(2)$$

It follows that if we can find a probability distribution p(b) to solve the problem (6.5)-(6.6), then this produces a C-additive approximation to the max-weight problem (6.1), with C = L log(2). Thus, the max link-weight-plus-entropy algorithm is a C-additive approximation for the max-weight problem. It follows that such an algorithm can yield full throughput optimality, and can come arbitrarily close to utility optimality, with average backlog and delay expressions that are polynomial in the network size. Remarkably, the next theorem, developed in (172), shows that the probability distribution (6.4) is the desired distribution, in that it exactly solves the problem (6.5)-(6.6).

Theorem 6.1 (Jiang-Walrand Theorem (172)) The probability distribution p*(b) that solves (6.5)-(6.6) is given by (6.4).

Proof. The proof follows directly from the analysis techniques used in (172), although we organize the proof differently below. We first compute the value of the maximization objective under the particular distribution p*(b) given in (6.4). Using $\log(p^*(b)) = \sum_{l=1}^{L} W_l(t)b_l - \log(A)$, we have:

$$-\sum_{b\in\mathcal{B}} p^*(b)\log(p^*(b)) + \sum_{b\in\mathcal{B}} p^*(b)\sum_{l=1}^{L} W_l(t)b_l = \sum_{b\in\mathcal{B}} p^*(b)\log(A) - \sum_{b\in\mathcal{B}} p^*(b)\sum_{l=1}^{L} W_l(t)b_l + \sum_{b\in\mathcal{B}} p^*(b)\sum_{l=1}^{L} W_l(t)b_l = \log(A)$$

where we have used the fact that p*(b) is a probability distribution and hence sums to 1. We now show that the expression in the objective of (6.5) for any other distribution p(b) is no larger than log(A). To this end, consider any other distribution p(b). We have:

$$-\sum_{b\in\mathcal{B}} p(b)\log(p(b)) + \sum_{b\in\mathcal{B}} p(b)\sum_{l=1}^{L} W_l(t)b_l = -\sum_{b\in\mathcal{B}} p(b)\log\left(\frac{p(b)}{p^*(b)}\right) - \sum_{b\in\mathcal{B}} p(b)\log(p^*(b)) + \sum_{b\in\mathcal{B}} p(b)\sum_{l=1}^{L} W_l(t)b_l$$
$$\leq -\sum_{b\in\mathcal{B}} p(b)\log(p^*(b)) + \sum_{b\in\mathcal{B}} p(b)\sum_{l=1}^{L} W_l(t)b_l \qquad (6.7)$$
$$= -\sum_{b\in\mathcal{B}} p(b)\log(1/A) - \sum_{b\in\mathcal{B}} p(b)\sum_{l=1}^{L} W_l(t)b_l + \sum_{b\in\mathcal{B}} p(b)\sum_{l=1}^{L} W_l(t)b_l = \log(A)$$

where in (6.7), we have used the well-known Kullback-Leibler divergence result, which states that the divergence between any two distributions p(b) and p*(b) is nonnegative (174):

$$d_{KL}(p\|p^*) = \sum_{b\in\mathcal{B}} p(b)\log\left(\frac{p(b)}{p^*(b)}\right) \geq 0$$

Thus, the maximum value of the objective function (6.5) is log(A), which is achieved by the distribution p*(b), so that p*(b) is optimal for this objective, proving the result. □

Assume now the set B of all valid link activation vectors has a connectedness property, so that it is possible to get from any $b_1 \in \mathcal{B}$ to any other $b_2 \in \mathcal{B}$ by a sequence of adding or removing single links, where each step of the sequence produces another valid activation vector in B (this holds in the reasonable case when removing any activated link from an activation vector in B yields another activation vector in B). In this case, the distribution (6.4) is particularly interesting because it is the exact stationary distribution associated with a continuous-time ergodic Markov chain with state b(v), where v is a continuous-time variable that is not related to the discrete time index t for the current slot.
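The Jiang-Walrand theorem can be checked numerically on a small example. The interference set `B` and the weights `W` below are hypothetical; the sketch builds the distribution of (6.4) and verifies that its entropy-plus-weighted-rate objective equals log(A) and dominates randomly sampled alternative distributions, exactly as the KL divergence argument predicts.

```python
import math
import random
from itertools import product

# Hypothetical small interference set B and weights W(t).
B = [b for b in product((0, 1), repeat=3) if not (b[0] and b[1])]
W = [1.5, 1.0, 0.5]

def objective(p):
    """Entropy (in nats) plus expected weighted rate, the objective of (6.5)."""
    H = -sum(q * math.log(q) for q in p.values() if q > 0)
    rate = sum(q * sum(w * x for w, x in zip(W, b)) for b, q in p.items())
    return H + rate

A = sum(math.exp(sum(w * x for w, x in zip(W, b))) for b in B)
p_star = {b: math.exp(sum(w * x for w, x in zip(W, b))) / A for b in B}

# The optimal value is exactly log(A)...
assert abs(objective(p_star) - math.log(A)) < 1e-9
# ...and no other distribution does better (the KL divergence argument):
rng = random.Random(1)
for _ in range(100):
    raw = [rng.random() for _ in B]
    total = sum(raw)
    p = {b: x / total for b, x in zip(B, raw)}
    assert objective(p) <= objective(p_star) + 1e-9
print(round(objective(p_star), 6))
```

Since the entropy term is at most L log(2), serving according to p* costs at most C = L log(2) relative to the pure max-weight objective.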
Transitions for this Markov chain take place by having each link l such that $b_l(v) = 1$ deactivate at times according to an independent exponential distribution with rate μ = 1, and having each link l such that $b_l(v) = 0$ independently activate according to an exponential distribution with rate $\lambda_l = \exp(W_l(t))$, provided that turning this link ON does not violate the link constraints B. This has the form of a simple distributed algorithm where links independently turn ON or OFF, with Carrier Sense Multiple Access (CSMA) telling us if it is possible to turn a new link ON (see also (175)(172)(173)(176)(177) for details on this). That the resulting steady state is given by (6.4) can be shown by state space truncation arguments as in (129)(131).

Of course, we need to run such an algorithm in continuous time for a long enough time to reach a near steady state, and this all needs to be done within one slot to implement the result. Unfortunately, for general networks, the convergence of the Markov chain to near-steady-state takes a non-polynomial amount of time (else, we could solve NP-hard problems with efficient randomized algorithms). This is because the Markov chain can get "trapped" for long durations of time in certain suboptimal link activations (this is compensated for in the steady state distribution by getting "trapped" in a max-weight link activation for an even longer duration of time). However, it is known that for link activation sets with certain degree-2 properties, such as those formed by networks on rings, similar Markov chains require only a small (polynomial) time to reach near steady state (180)(181). This may explain why the simulations in (172) for networks with small degree provide good performance. Even computing the normalizing constant A for the distribution in (6.4) is known to be a "#P-complete" problem (178) (see also factor graph approximations in (179)). Alternatively, we can use a T-slot argument as in Section 6.1.1 to allow more time to reach the steady state, with the understanding that the queue backlog changes by an amount O(T) that yields an additional additive term in our C-additive approximation (see (176) for an argument in this direction using stochastic approximation theory).

6.2 MULTIPLICATIVE FACTOR APPROXIMATIONS

While C-additive approximations can push throughput and throughput-utility arbitrarily close to optimal, they may have large convergence times and delays as discussed in the previous section. It is often possible to provide low-complexity decisions for b(t) that come within a multiplicative factor of the max-weight solution. This section shows that such algorithms immediately lead to constant-factor stability and throughput-utility guarantees. The result holds for general networks, possibly with time-varying channels, and possibly with non-binary rate vectors.

Let S(t) describe the channel randomness on slot t (i.e., the topology state), and let I(t) be the transmission action on slot t, chosen within an abstract set $\mathcal{I}_{S(t)}$. The rate vector $b(t) = (b_1(t), \ldots, b_L(t))$ is determined by a general function of I(t) and S(t):

$$b_l(t) = \hat{b}_l(I(t), S(t)) \quad \forall l \in \{1, \ldots, L\} \qquad (6.8)$$
Definition 6.2 Let β, C be constants such that 0 < β ≤ 1 and C ≥ 0. A (β, C)-approximation is an algorithm that makes (possibly randomized) decisions $I(t) \in \mathcal{I}_{S(t)}$ every slot t to satisfy:

$$\sum_{l=1}^{L} W_l(t)\,\mathbb{E}\left\{\hat{b}_l(I(t), S(t)) \mid W(t)\right\} \geq \beta\,\mathbb{E}\left\{\sup_{I \in \mathcal{I}_{S(t)}} \sum_{l=1}^{L} W_l(t)\,\hat{b}_l(I, S(t)) \,\middle|\, W(t)\right\} - C$$

Under this definition, a (1, C)-approximation is the same as a C-additive approximation.

It is known that (β, C)-approximations can provide stability in single-hop or multihop networks whenever the arrival rates are interior to βΛ, the β-scaled version of the capacity region Λ (17)(22)(19)(182). For example, if β = 1/2, then stability is only guaranteed when arrival rates are at most half the distance to the capacity region boundary (so that the region where we can provide stability guarantees shrinks by 50%). Related constant-factor guarantees are available for joint scheduling and flow control to maximize throughput-utility, where the β-scaling goes inside the utility function (see (22)(19) for a precise scaled-utility statement, (137) for applications to cognitive radio, and (154) for applications to channels with errors). For simplicity, we prove this result only for the special case of achieving stability in a 1-hop network. This provides all of the necessary insight with the least amount of notation, and the reader is referred to the above references for proofs of the more general versions.

Consider a 1-hop network with L queues with dynamics:

$$Q_l(t+1) = \max[Q_l(t) - b_l(t),\, 0] + a_l(t) \quad \forall l \in \{1, \ldots, L\}$$

where the service variables $b_l(t)$ are determined by I(t) and S(t) by (6.8), and $a(t) = (a_1(t), \ldots, a_L(t))$ is the random vector of new data arrivals on slot t. Define $\omega(t) = [S(t), a(t)]$, and assume that ω(t) is i.i.d. over slots with some probability distribution. Define $\lambda_l = \mathbb{E}\{a_l(t)\}$ as the arrival rate to queue l. Define an S-only policy as a policy that independently chooses $I(t) \in \mathcal{I}_{S(t)}$ based only on a (possibly randomized) function of the observed S(t). Define Λ as the set of all vectors $(b_1, \ldots, b_L)$ that can be achieved as 1-slot expectations under S-only policies. That is, $(b_1, \ldots, b_L) \in \Lambda$ if and only if there is an S-only policy I*(t) that satisfies $I^*(t) \in \mathcal{I}_{S(t)}$ and:

$$\mathbb{E}\{\hat{b}_l(I^*(t), S(t))\} = b_l \quad \forall l \in \{1, \ldots, L\}$$

where the expectation is with respect to the distribution of S(t) and the possibly randomized decision I*(t) that is made in reaction to the observed S(t). For simplicity, assume the set Λ is closed. Recall that for any rate vector $(\lambda_1, \ldots, \lambda_L)$ in the capacity region, there is an S-only policy I*(t) that satisfies:

$$\mathbb{E}\{\hat{b}_l(I^*(t), S(t))\} \geq \lambda_l \quad \forall l \in \{1, \ldots, L\}$$

We say that a vector $(\lambda_1, \ldots, \lambda_L)$ is interior to the scaled capacity region βΛ if there is an ε > 0 such that:

$$(\lambda_1 + \varepsilon, \ldots, \lambda_L + \varepsilon) \in \beta\Lambda$$
Theorem 6.3 Consider the above 1-hop network with ω(t) i.i.d. over slots and with arrival rates $(\lambda_1, \ldots, \lambda_L)$. Suppose there is an ε > 0 such that:

$$(\lambda_1 + \varepsilon, \ldots, \lambda_L + \varepsilon) \in \beta\Lambda \qquad (6.10)$$

Assume second moments of the arrival and service rate processes are bounded. If a (β, C)-approximation is used for all slots t (where C ≥ 0 is a given constant), and if $\mathbb{E}\{L(Q(0))\} < \infty$, then the network is mean rate stable and strongly stable, with average queue backlog bound:

$$\limsup_{t\to\infty} \frac{1}{t} \sum_{\tau=0}^{t-1}\sum_{l=1}^{L} \mathbb{E}\{Q_l(\tau)\} \leq (B + C)/\varepsilon$$

where B is the constant from (6.9).

Proof. Fix slot t. Define $L(Q(t)) = \frac{1}{2}\sum_{l=1}^{L} Q_l(t)^2$, and recall that the Lyapunov drift satisfies (see (3.16)):

$$\Delta(Q(t)) \leq B + \sum_{l=1}^{L} Q_l(t)\lambda_l - \sum_{l=1}^{L} Q_l(t)\,\mathbb{E}\{\hat{b}_l(I(t), S(t)) \mid Q(t)\} \qquad (6.9)$$

where B is a positive constant that depends on the maximum second moments. Because our decision I(t) yields a (β, C)-approximation for maximizing the final term on the right-hand side of (6.9), we have:

$$\Delta(Q(t)) \leq B + C + \sum_{l=1}^{L} Q_l(t)\lambda_l - \beta\sum_{l=1}^{L} Q_l(t)\,\mathbb{E}\{\hat{b}_l(I^*(t), S(t)) \mid Q(t)\} \qquad (6.11)$$

where I*(t) is any other (possibly randomized) decision in the set $\mathcal{I}_{S(t)}$. Because (6.10) holds, we know that:

$$(\lambda_1/\beta + \varepsilon/\beta, \ldots, \lambda_L/\beta + \varepsilon/\beta) \in \Lambda$$

Thus, there exists an S-only policy I*(t) that satisfies:

$$\mathbb{E}\{\hat{b}_l(I^*(t), S(t)) \mid Q(t)\} = \mathbb{E}\{\hat{b}_l(I^*(t), S(t))\} \geq \lambda_l/\beta + \varepsilon/\beta \quad \forall l \in \{1, \ldots, L\} \qquad (6.12)$$

where the first equality above holds because I*(t) is S-only and hence independent of the queue backlogs Q(t). Plugging this policy into the right-hand side of (6.11) yields:

$$\Delta(Q(t)) \leq B + C + \sum_{l=1}^{L} Q_l(t)\lambda_l - \beta\sum_{l=1}^{L} Q_l(t)(\lambda_l/\beta + \varepsilon/\beta) = B + C - \varepsilon\sum_{l=1}^{L} Q_l(t) \qquad (6.13)$$

The result then follows by the Lyapunov drift theorem of Chapter 4. □
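A minimal simulation sketch of the theorem, under hypothetical rates: two queues share one server, and with probability β the server serves the longest queue (the max-weight choice here), so the expected weighted service rate is β times the max-weight value, i.e., a (β, 0)-approximation. Because the arrival rates sum to less than β, the queues stay stable.

```python
import random

def simulate(lam=(0.15, 0.15), beta=0.5, slots=20000, seed=0):
    """Two queues, one server serving one packet per slot. With probability
    beta serve the longest queue (a (beta, 0)-approximation of max-weight);
    otherwise idle. Returns the time-average total backlog."""
    rng = random.Random(seed)
    Q = [0, 0]
    total_backlog = 0
    for _ in range(slots):
        if rng.random() < beta:
            l = 0 if Q[0] >= Q[1] else 1
            Q[l] = max(Q[l] - 1, 0)
        for l in (0, 1):
            if rng.random() < lam[l]:  # Bernoulli arrivals
                Q[l] += 1
        total_backlog += sum(Q)
    return total_backlog / slots

# Rates sum to 0.3, interior to beta * {r1 + r2 <= 1} with beta = 0.5, so the
# theorem predicts strong stability: the average backlog stays small.
print(simulate())
```

Pushing the total arrival rate above β = 0.5 in this sketch makes the backlog grow without bound, matching the shrunken stability region the theorem describes.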
The above theorem can be intuitively interpreted as follows: any (perhaps approximate) effort to schedule transmissions to maximize the weighted sum of transmission rates translates into good network performance. More concretely, simple greedy algorithms with β = 1/2 and C = 0 (i.e., (1/2, 0)-approximation algorithms) exist for networks with matching constraints (where links can be simultaneously scheduled if they do not share a common node). Indeed, it can be shown that the greedy maximal match algorithm that first selects the largest-weight link (breaking ties arbitrarily), then selects the next-largest-weight link that does not conflict with the previous one, and so on, yields a (1/2, 0)-approximation, so that it comes within a factor β = 1/2 of the max-weight decision (see, for example, (137)). Different forms of approximate scheduling, not based on approximating the queue-based max-weight rule, are treated using maximal matchings for stable switch scheduling in (183)(102), for stable wireless networks in (184)(104)(103), for utility optimization in (185), and for energy optimization in (186). Distributed random access versions that produce (β, C)-approximations are considered in (154).
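The greedy maximal match rule described above can be sketched directly; the small weighted path graph below is hypothetical, chosen so that greedy is suboptimal yet still within the factor 1/2.

```python
from itertools import combinations

def greedy_maximal_match(edges):
    """Repeatedly pick the heaviest remaining link that conflicts with no
    previously selected link (links conflict if they share a node)."""
    chosen = []
    for (u, v, w) in sorted(edges, key=lambda e: -e[2]):
        if all(u not in (a, b) and v not in (a, b) for (a, b, _) in chosen):
            chosen.append((u, v, w))
    return chosen

def max_weight_match(edges):
    """Brute force over all edge subsets that form a matching (tiny graphs)."""
    best, best_w = [], 0.0
    for k in range(1, len(edges) + 1):
        for sub in combinations(edges, k):
            nodes = [x for (u, v, _) in sub for x in (u, v)]
            if len(nodes) == len(set(nodes)):
                w = sum(e[2] for e in sub)
                if w > best_w:
                    best, best_w = list(sub), w
    return best, best_w

# Hypothetical path graph where greedy is suboptimal but within factor 1/2:
edges = [(0, 1, 2.0), (1, 2, 3.0), (2, 3, 2.0)]
g = sum(e[2] for e in greedy_maximal_match(edges))
_, opt = max_weight_match(edges)
print(g, opt)  # 3.0 4.0
assert g >= opt / 2
```

Greedy grabs the middle link of weight 3.0, blocking the two outer links whose combined weight is 4.0, which is exactly the kind of loss the (1/2, 0) guarantee bounds.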
CHAPTER 7

Optimization of Renewal Systems

Here we extend the drift-plus-penalty framework to allow optimization over renewal systems. In previous chapters, we considered a slotted structure and assumed that every slot t a single random event ω(t) is observed, a single action α(t) is taken, and the combination of α(t) and ω(t) generates a vector of attributes (i.e., penalties or rewards) for that slot. Here, we change the slot structure to a renewal frame structure. We decompose the timeline into successive renewal frames. Renewal frames occur one after the other. The frame durations are variable and can depend on the decisions made over the course of the frame, and the start of each renewal frame is a time when the system state is "refreshed," which will be made precise below. This model allows a larger class of problems to be treated, including Markov decision problems, described in more detail in Section 7.2.

An example renewal system is a wireless sensor network that is repeatedly used to perform sensing tasks. Assume that each new task starts immediately when the previous task is completed. The duration of each task and the network resources used depend on the policy implemented for that task. Examples of this type are given in Section 7.4 and Exercise 7.6.

7.1 THE RENEWAL SYSTEM MODEL

Figure 7.1: An illustration of a sequence of renewal frames, with renewal times t[0] = 0, t[1], t[2], t[3], t[4] and frame durations T[0], T[1], T[2], T[3].

Consider a dynamic system over the continuous timeline t ≥ 0 (where t can be a real number). Define t[0] = 0, and let {t[0], t[1], t[2], ...} be a strictly increasing sequence that represents renewal events. For each $r \in \{0, 1, 2, \ldots\}$, the interval of time [t[r], t[r+1]) is the rth renewal frame. Denote T[r] = t[r+1] − t[r] as the duration of the rth renewal frame (see Fig. 7.1). At the start of each renewal frame r, the controller chooses a policy π[r] from some abstract policy space P. This policy is implemented over the course of the frame. There may be a sequence of random events during each frame r, and the policy π[r] specifies decisions that are made in reaction to these events. A policy is a contingency plan for making a sequence of decisions, where new random events might take place after each decision in the sequence. Thus, rather than specifying a single action to take on each frame r, we must specify a dynamic policy π[r] for the frame.

The size of the frame T[r] is random and may depend on the policy. Further, the policy on frame r generates a random vector of penalties $y[r] = (y_0[r], y_1[r], \ldots, y_L[r])$. We formally write the renewal size T[r] and the penalties $y_l[r]$ as random functions of π[r]:

$$T[r] = \hat{T}(\pi[r]) \;, \quad y_l[r] = \hat{y}_l(\pi[r]) \quad \forall l \in \{0, 1, \ldots, L\}$$

We make the following renewal assumptions:

• For any policy π ∈ P, the conditional distribution of (T[r], y[r]), given π[r] = π, is independent of the events and outcomes from past frames, and is identically distributed for each frame that uses the same policy π.

• The frame sizes T[r] are always strictly positive, and there are finite constants $T_{min}$, $T_{max}$, $y_{0,min}$, $y_{0,max}$ such that for all policies π ∈ P:

$$0 < T_{min} \leq \mathbb{E}\{\hat{T}(\pi[r]) \mid \pi[r] = \pi\} \leq T_{max} \qquad (7.1)$$
$$y_{0,min} \leq \mathbb{E}\{\hat{y}_0(\pi[r]) \mid \pi[r] = \pi\} \leq y_{0,max} \qquad (7.2)$$

In the special case when the system evolves in discrete time with unit time slots, all frame sizes T[r] are positive integers, and $T_{min} = 1$.

• There are finite constants $D^2$ and $y_{l,max}^2$ such that for all π ∈ P:

$$\mathbb{E}\{\hat{T}(\pi[r])^2 \mid \pi[r] = \pi\} \leq D^2 \;, \quad \mathbb{E}\{\hat{y}_l(\pi[r])^2 \mid \pi[r] = \pi\} \leq y_{l,max}^2 \quad \forall l \in \{1, \ldots, L\}$$

Thus, given π[r], $\hat{T}(\pi[r])$ and $\hat{y}_l(\pi[r])$ are random variables, and second moments are uniformly bounded, regardless of the policy.

7.1.1 THE OPTIMIZATION GOAL

Suppose we have an algorithm that chooses π[r] ∈ P at the beginning of each frame $r \in \{0, 1, 2, \ldots\}$. Assume temporarily that this algorithm yields well-defined frame averages $\overline{T}$ and $\overline{y}_l$ with probability 1, so that:

$$\lim_{R\to\infty} \frac{1}{R}\sum_{r=0}^{R-1} T[r] = \overline{T} \;\;\text{(w.p.1)} \;, \qquad \lim_{R\to\infty} \frac{1}{R}\sum_{r=0}^{R-1} y_l[r] = \overline{y}_l \;\;\text{(w.p.1)} \qquad (7.3)$$
Under mild boundedness assumptions on T[r] and $y_l[r]$ (for example, when these are deterministically bounded), the Lebesgue dominated convergence theorem ensures that the limiting values of $\overline{T}[R]$ and $\overline{y}_l[R]$ also converge to $\overline{T}$ and $\overline{y}_l$ whenever (7.3) holds (see Exercise 7.9).

We want to design an algorithm that chooses policies π[r] over each frame $r \in \{0, 1, 2, \ldots\}$ to solve the following problem:

$$\text{Minimize:} \quad \overline{y}_0/\overline{T} \qquad (7.4)$$
$$\text{Subject to:} \quad \overline{y}_l/\overline{T} \leq c_l \quad \forall l \in \{1, \ldots, L\} \qquad (7.5)$$
$$\pi[r] \in \mathcal{P} \quad \forall r \in \{0, 1, 2, \ldots\} \qquad (7.6)$$

where $(c_1, \ldots, c_L)$ are a given collection of real numbers that define time average cost constraints for each penalty. The value $\overline{y}_l/\overline{T}$ represents the time average penalty associated with the $y_l[r]$ process. To understand this, note that the time average penalty, sampled at renewal times, is given by:

$$\lim_{R\to\infty} \frac{\sum_{r=0}^{R-1} y_l[r]}{\sum_{r=0}^{R-1} T[r]} = \frac{\lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1} y_l[r]}{\lim_{R\to\infty}\frac{1}{R}\sum_{r=0}^{R-1} T[r]} = \frac{\overline{y}_l}{\overline{T}}$$

Hence, our goal is to minimize the time average associated with the $y_0[r]$ penalty, subject to the constraint that the time average associated with each $y_l[r]$ process is less than or equal to $c_l$. As before, we shall find it easier to work with time average expectations of the form:

$$\overline{T}[R] = \frac{1}{R}\sum_{r=0}^{R-1}\mathbb{E}\{T[r]\} \;, \quad \overline{y}_l[R] = \frac{1}{R}\sum_{r=0}^{R-1}\mathbb{E}\{y_l[r]\} \quad \forall l \in \{0, 1, \ldots, L\} \qquad (7.7)$$

7.1.2 OPTIMALITY OVER I.I.D. ALGORITHMS

Define an i.i.d. algorithm as one that, at the beginning of each new frame $r \in \{0, 1, 2, \ldots\}$, chooses a policy π[r] by independently and probabilistically selecting π ∈ P according to some distribution that is the same for all frames r. Let π*[r] represent such an i.i.d. algorithm. Then the values $\{\hat{T}(\pi^*[r])\}_{r=0}^{\infty}$ are independent and identically distributed (i.i.d.) over frames, as are $\{\hat{y}_l(\pi^*[r])\}_{r=0}^{\infty}$. Thus, by the law of large numbers, these have well-defined averages with probability 1, where the averages are equal to the expectations over one frame. We say that the problem (7.4)-(7.6) is feasible if there is an i.i.d. algorithm π*[r] that satisfies:

$$\frac{\mathbb{E}\{\hat{y}_l(\pi^*[r])\}}{\mathbb{E}\{\hat{T}(\pi^*[r])\}} \leq c_l \quad \forall l \in \{1, \ldots, L\} \qquad (7.8)$$

Assuming feasibility, we define $ratio_{opt}$ as the infimum value of the following quantity over all i.i.d. algorithms that meet the constraints (7.8):

$$\frac{\mathbb{E}\{\hat{y}_0(\pi^*[r])\}}{\mathbb{E}\{\hat{T}(\pi^*[r])\}}$$

The following lemma is an immediate consequence of these definitions:

Lemma 7.1 If there is an i.i.d. algorithm that satisfies the feasibility constraints (7.8), then for any δ > 0 there is an i.i.d. algorithm π*[r] that satisfies:

$$\mathbb{E}\{\hat{y}_0(\pi^*[r])\} \leq \mathbb{E}\{\hat{T}(\pi^*[r])\}(ratio_{opt} + \delta) \qquad (7.9)$$
$$\mathbb{E}\{\hat{y}_l(\pi^*[r])\} \leq \mathbb{E}\{\hat{T}(\pi^*[r])\}\, c_l \quad \forall l \in \{1, \ldots, L\} \qquad (7.10)$$

The value $ratio_{opt}$ is defined in terms of i.i.d. algorithms. It can be shown that, under mild assumptions, the value $ratio_{opt}$ is also the infimum of the objective function in the problem (7.4)-(7.6), which does not restrict to i.i.d. algorithms. However, rather than stating these assumptions and proving this result, we simply use $ratio_{opt}$ as our target, so that we desire to push the time average penalty objective as close as possible to the smallest value that can be achieved over i.i.d. algorithms. It is often useful to additionally assume that the following "Slater" assumption holds:

Slater Assumption for Renewal Systems: There is a value ε > 0 and an i.i.d. algorithm π*[r] such that:

$$\mathbb{E}\{\hat{y}_l(\pi^*[r])\} \leq \mathbb{E}\{\hat{T}(\pi^*[r])\}(c_l - \varepsilon) \quad \forall l \in \{1, \ldots, L\} \qquad (7.11)$$

This is similar in spirit to the Slater assumptions of Chapter 4.

7.2 DRIFT-PLUS-PENALTY FOR RENEWAL SYSTEMS

For each $l \in \{1, \ldots, L\}$, define virtual queues $Z_l[r]$ with $Z_l[0] = 0$, and with dynamics:

$$Z_l[r+1] = \max[Z_l[r] + y_l[r] - c_l T[r],\, 0] \quad \forall l \in \{1, \ldots, L\} \qquad (7.12)$$

Let Z[r] be the vector of queue values,
and define the Lyapunov function L(Z[r]) by:

$$L(Z[r]) = \frac{1}{2}\sum_{l=1}^{L} Z_l[r]^2 \qquad (7.13)$$

Define the conditional Lyapunov drift Δ(Z[r]) as:

$$\Delta(Z[r]) = \mathbb{E}\{L(Z[r+1]) - L(Z[r]) \mid Z[r]\}$$

Using the same techniques as in previous chapters, it is easy to show that:

$$\Delta(Z[r]) \leq B + \sum_{l=1}^{L} Z_l[r]\,\mathbb{E}\{\hat{y}_l(\pi[r]) - c_l\hat{T}(\pi[r]) \mid Z[r]\} \qquad (7.14)$$

where B is a finite constant that satisfies the following for all r and all possible Z[r]:

$$B \geq \frac{1}{2}\sum_{l=1}^{L}\mathbb{E}\{(y_l[r] - c_l T[r])^2 \mid Z[r]\} \qquad (7.15)$$

Such a finite constant B exists by the boundedness assumptions (7.1)-(7.3). The drift-plus-penalty for frame r thus satisfies:

$$\Delta(Z[r]) + V\,\mathbb{E}\{y_0[r] \mid Z[r]\} \leq B + V\,\mathbb{E}\{\hat{y}_0(\pi[r]) \mid Z[r]\} + \sum_{l=1}^{L} Z_l[r]\,\mathbb{E}\{\hat{y}_l(\pi[r]) \mid Z[r]\} - \sum_{l=1}^{L} Z_l[r]\, c_l\,\mathbb{E}\{\hat{T}(\pi[r]) \mid Z[r]\} \qquad (7.16)$$

This variable-frame drift methodology was developed in (56)(57) for optimizing delay in networks defined on Markov chains. However, the analysis in (56)(57) used a policy based on minimizing the right-hand side of the above inequality, which was only shown to be effective for pure feasibility problems (where $\hat{y}_0(\pi[r]) = 0$ for all r) or for problems where the frame durations are independent of the policy (see also Exercise 7.3). Our algorithm below, which minimizes the ratio of expected drift-plus-penalty over expected frame size, is inspired by the decision rule in (58) and can be applied to the general problem.

Renewal-Based Drift-Plus-Penalty Algorithm: At the beginning of each frame $r \in \{0, 1, 2, \ldots\}$, observe Z[r] and do the following:

• Choose a policy π[r] ∈ P that minimizes the following ratio:

$$\frac{\mathbb{E}\{V\hat{y}_0(\pi[r]) + \sum_{l=1}^{L} Z_l[r]\hat{y}_l(\pi[r]) \mid Z[r]\}}{\mathbb{E}\{\hat{T}(\pi[r]) \mid Z[r]\}} \qquad (7.17)$$

• Update the virtual queues $Z_l[r]$ by (7.12).

As before, we define a C-additive approximation to the ratio-minimizing decision as follows.

Definition 7.2 A policy π[r] is a C-additive approximation of the policy that minimizes (7.17) if:

$$\frac{\mathbb{E}\{V\hat{y}_0(\pi[r]) + \sum_{l=1}^{L} Z_l[r]\hat{y}_l(\pi[r]) \mid Z[r]\}}{\mathbb{E}\{\hat{T}(\pi[r]) \mid Z[r]\}} \leq C + \inf_{\pi\in\mathcal{P}}\frac{\mathbb{E}\{V\hat{y}_0(\pi) + \sum_{l=1}^{L} Z_l[r]\hat{y}_l(\pi) \mid Z[r]\}}{\mathbb{E}\{\hat{T}(\pi) \mid Z[r]\}}$$
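The ratio-minimizing step above can be sketched for a finite policy set when the conditional expectations of each policy are known. The two policies and the single constraint below are hypothetical numbers standing in for a real renewal model.

```python
def renewal_dpp_step(Z, policies, V, c):
    """One frame of the ratio rule over a finite policy set. Each policy is a
    tuple (E[y0], (E[y1], ..., E[yL]), E[T]) of conditional expectations given
    that policy is used (hypothetical numbers, not a real model)."""
    def ratio(p):
        y0, ys, T = p
        return (V * y0 + sum(z * y for z, y in zip(Z, ys))) / T
    best = min(policies, key=ratio)
    y0, ys, T = best
    # Virtual queue update (7.12) using the per-frame expectations:
    Z_next = [max(z + y - cl * T, 0.0) for z, y, cl in zip(Z, ys, c)]
    return best, Z_next

# Two hypothetical policies, one constraint y1/T <= 0.5. The second policy is
# slower but cheaper and meets the constraint with equality, so the rule
# selects it and the virtual queue never grows.
policies = [(2.0, (1.0,), 1.0), (1.0, (2.0,), 4.0)]
Z = [0.0]
for _ in range(50):
    pol, Z = renewal_dpp_step(Z, policies, V=10.0, c=(0.5,))
print(pol, Z)  # (1.0, (2.0,), 4.0) [0.0]
```

In a real system the expectations are replaced by the random outcomes $y_l[r]$ and T[r] of the implemented policy, and a C-additive approximation tolerates an additive error in the ratio comparison.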
In particular, if policy π[r] is a C-additive approximation, then:

    E{V ŷ_0(π[r]) + Σ_{l=1}^{L} Z_l[r] ŷ_l(π[r]) | Z[r]}
        ≤ CT_max + E{T̂(π[r]) | Z[r]} · [E{V ŷ_0(π*[r]) + Σ_{l=1}^{L} Z_l[r] ŷ_l(π*[r])} / E{T̂(π*[r])}]   (7.18)

where π*[r] is any i.i.d. algorithm that is chosen in P and is independent of queues Z[r]. In the above inequality, we have used the fact that:

    E{T̂(π[r]) | Z[r]} ≤ T_max

Theorem 7.3 (Renewal-Based Drift-Plus-Penalty Performance) Assume there is an i.i.d. algorithm π*[r] that satisfies the feasibility constraints (7.8). Suppose we implement the above renewal-based drift-plus-penalty algorithm using a C-additive approximation for all frames r, with initial condition Z_l[0] = 0 for all l ∈ {1, ..., L}. Then:

a) All queues Z_l[r] are mean rate stable, in that:

    lim_{R→∞} E{Z_l[R]}/R = 0   ∀l ∈ {1, ..., L}

b) For all l ∈ {1, ..., L} we have:

    lim sup_{R→∞} (ȳ_l[R] − c_l T̄[R]) ≤ 0   and so   lim sup_{R→∞} ȳ_l[R]/T̄[R] ≤ c_l

where ȳ_l[R] and T̄[R] are defined in (7.7).

c) The penalty process y_0[r] satisfies the following for all R > 0:

    ȳ_0[R] − ratio_opt T̄[R] ≤ (B + CT_max)/V

where B is defined in (7.15).

d) If the Slater assumption (7.11) holds for a constant ε > 0, then all queues Z_l[r] are strongly stable and satisfy the following for all R > 0:

    (1/R) Σ_{r=0}^{R-1} Σ_{l=1}^{L} E{Z_l[r]} ≤ VF/(ε T_min)             (7.19)
where the constant F is defined below in (7.23). Further, if for all l ∈ {1, ..., L}, y_l[r] is either deterministically lower bounded or deterministically upper bounded, then queues Z_l[r] are rate stable and:

    lim sup_{R→∞} [(1/R) Σ_{r=0}^{R-1} y_l[r]] / [(1/R) Σ_{r=0}^{R-1} T[r]] ≤ c_l   ∀l ∈ {1, ..., L}   (w.p.1)

Proof. (Theorem 7.3) Because we use a C-additive approximation every frame r, we know that (7.18) holds. Plugging (7.18) into the right-hand-side of the drift-plus-penalty inequality (7.16) yields:

    Δ(Z[r]) + V E{y_0[r] | Z[r]} ≤ B + CT_max
        + [E{T̂(π[r]) | Z[r]} / E{T̂(π*[r])}] (V E{ŷ_0(π*[r])} + Σ_{l=1}^{L} Z_l[r] E{ŷ_l(π*[r])})
        − Σ_{l=1}^{L} Z_l[r] c_l E{T̂(π[r]) | Z[r]}                       (7.20)

where π*[r] is any i.i.d. algorithm in P, which makes decisions independent of Z[r]. Now fix δ > 0, and plug the policy π*[r] that satisfies (7.9)-(7.10) into the right-hand-side of (7.20), to yield:

    Δ(Z[r]) + V E{y_0[r] | Z[r]} ≤ B + CT_max + E{T̂(π[r]) | Z[r]} V (ratio_opt + δ)

The above holds for all δ > 0. Taking a limit as δ → 0 yields:

    Δ(Z[r]) + V E{y_0[r] | Z[r]} ≤ B + CT_max + E{T̂(π[r]) | Z[r]} V ratio_opt   (7.21)

To prove part (a), we can rearrange (7.21) to yield:

    Δ(Z[r]) ≤ B + CT_max + V max[ratio_opt T_max, ratio_opt T_min] − V y_0,min

where we use max[ratio_opt T_max, ratio_opt T_min] because ratio_opt may be negative. This proves that all components Z_l[r] are mean rate stable by Theorem 4.1, proving part (a). The first lim sup statement in part (b) follows immediately from mean rate stability of Z_l[r] (via Theorem 2.5(b)). The second lim sup statement in part (b) follows from the first (see Exercise 7.4).

To prove part (c), we take expectations of (7.21), using the law of iterated expectations, to find:

    E{L(Z[r+1])} − E{L(Z[r])} + V E{y_0[r]} ≤ B + CT_max + E{T̂(π[r])} V ratio_opt

Summing over r ∈ {0, 1, ..., R−1} and dividing by RV yields:

    [E{L(Z[R])} − E{L(Z[0])}]/(RV) + (1/R) Σ_{r=0}^{R-1} E{y_0[r]} ≤ (B + CT_max)/V + ratio_opt (1/R) Σ_{r=0}^{R-1} E{T[r]}
Using the definitions of ȳ_0[R] and T̄[R] in (7.7) and noting that E{L(Z[R])} ≥ 0 and E{L(Z[0])} = 0 yields:

    ȳ_0[R] ≤ (B + CT_max)/V + ratio_opt T̄[R]

This proves part (c).

Part (d) follows from plugging the policy π*[r] from (7.11) into (7.20) to obtain:

    Δ(Z[r]) + V E{y_0[r] | Z[r]} ≤ B + CT_max + V [E{T̂(π[r]) | Z[r]} / E{T̂(π*[r])}] y_0,max − ε T_min Σ_{l=1}^{L} Z_l[r]

This can be written in the form:

    Δ(Z[r]) ≤ VF − ε T_min Σ_{l=1}^{L} Z_l[r]

where the constant F is defined:

    F = (B + CT_max)/V + max[y_0,max T_max/T_min, y_0,max T_min/T_max] − y_0,min   (7.23)

Thus, by Theorem 4.1, we have that (7.19) holds, so that all queues Z_l[r] are strongly stable. Further, because for all r ∈ {1, 2, ...} we have E{T[r] | T[0], ..., T[r−1]} ≥ T_min and E{T[r]² | T[0], T[1], ..., T[r−1]} ≤ D², from Lemma 4.3 it follows that:

    lim inf_{R→∞} (1/R) Σ_{r=0}^{R-1} T[r] ≥ T_min > 0   (w.p.1)

In the special case when the y_l[r] are deterministically bounded, we have by the Strong Stability Theorem (Theorem 2.8) that all queues Z_l[r] are rate stable. Thus, by Theorem 2.5(a):

    lim sup_{R→∞} (1/R) Σ_{r=0}^{R-1} [y_l[r] − c_l T[r]] ≤ 0   (w.p.1)

However:

    [(1/R) Σ_{r=0}^{R-1} y_l[r]] / [(1/R) Σ_{r=0}^{R-1} T[r]] − c_l
        = [(1/R) Σ_{r=0}^{R-1} (y_l[r] − c_l T[r])] / [(1/R) Σ_{r=0}^{R-1} T[r]]
        ≤ max[(1/R) Σ_{r=0}^{R-1} (y_l[r] − c_l T[r]), 0] · 1/[(1/R) Σ_{r=0}^{R-1} T[r]]   (7.22)
and so taking a lim sup of (7.22) yields:

    lim sup_{R→∞} [(1/R) Σ_{r=0}^{R-1} y_l[r]] / [(1/R) Σ_{r=0}^{R-1} T[r]] − c_l ≤ 0 · (1/T_min) = 0   (w.p.1)

This proves part (d). □

The above theorem shows that the time average penalty can be pushed to within O(1/V) of optimal (for arbitrarily large V). The tradeoff is that the virtual queues are O(V) in size, which affects the time required for the penalties to be close to their required time averages c_l. Finally, we note that Exercise 7.5 explores an alternative algorithm for the original problem (7.4)-(7.6). The alternative uses only a minimum of an expectation every frame, rather than a ratio of expectations.

7.2.1 ALTERNATE FORMULATIONS

In some cases, we care more about y_l itself, rather than y_l/T. Consider the following variation of problem (7.4)-(7.6):

    Minimize:    ȳ_0/T̄
    Subject to:  ȳ_l ≤ 0   ∀l ∈ {1, ..., L}
                 π[r] ∈ P   ∀r ∈ {0, 1, 2, ...}

This changes the constraints from ȳ_l/T̄ ≤ c_l to ȳ_l ≤ 0. However, this is just a special case of the original problem (7.4)-(7.6) with c_l = 0.

Now suppose we seek to minimize ȳ_0, rather than ȳ_0/T̄. The problem is:

    Minimize:    ȳ_0
    Subject to:  ȳ_l/T̄ ≤ c_l   ∀l ∈ {1, ..., L}
                 π[r] ∈ P   ∀r ∈ {0, 1, 2, ...}

This problem has a significantly different structure than (7.4)-(7.6), and it is considerably easier to solve. Indeed, Exercise 7.3 shows that it can be solved by minimizing an expectation every frame, rather than a ratio of expectations.

7.3 MINIMIZING THE DRIFT-PLUS-PENALTY RATIO

We rewrite the drift-plus-penalty ratio (7.17) in the following simplified form:

    E{a(π)} / E{b(π)}

where a(π) represents the numerator and b(π) the denominator, both expressed as a function of the policy π ∈ P. We note that T_max ≥ E{b(π)} ≥ T_min > 0 for all π ∈ P. Define θ* as the infimum
of the above ratio:

    θ* = inf_{π∈P} E{a(π)} / E{b(π)}                                    (7.24)

We want to understand how to find θ*. In the special case when E{b(π)} does not depend on the policy π (which holds when the expected renewal interval size is the same for all policies), the minimization is achieved by choosing π ∈ P to minimize E{a(π)}. This is important because the minimization of an expectation is typically much simpler than a minimization of the ratio of expectations, and it can often be accomplished through dynamic programming algorithms (64)(67)(57) and their special cases of stochastic shortest path algorithms.

To treat the case when E{b(π)} may depend on the policy, we use the following simple but useful lemmas.

Lemma 7.4 For any policy π ∈ P, we have:

    E{a(π) − θ* b(π)} ≥ 0                                               (7.25)

with equality if and only if policy π achieves the infimum ratio E{a(π)}/E{b(π)} = θ*.

Proof. By definition of θ*, we have for any policy π ∈ P:

    E{a(π)}/E{b(π)} ≥ inf_{π∈P} E{a(π)}/E{b(π)} = θ*

Multiplying both sides by E{b(π)} and noting that E{b(π)} > 0 yields (7.25). That equality holds if and only if E{a(π)}/E{b(π)} = θ* follows immediately. □

Lemma 7.5 We have:

    inf_{π∈P} E{a(π) − θ* b(π)} = 0                                     (7.26)

Further, for any value θ ∈ R, we have:

    inf_{π∈P} E{a(π) − θ b(π)} < 0   if θ > θ*                          (7.27)
    inf_{π∈P} E{a(π) − θ b(π)} > 0   if θ < θ*                          (7.28)
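The sign structure in Lemmas 7.4 and 7.5 can be checked numerically, and it is precisely what a bisection search over θ exploits: each test point requires minimizing an expectation rather than a ratio. Below is a small Python check on a toy finite policy class (not from the text; the (E{a(π)}, E{b(π)}) pairs are hypothetical, with the b-values standing in for expected frame sizes, so T_min ≤ E{b(π)} ≤ T_max with T_min > 0).

```python
# Numerical check of Lemmas 7.4-7.5 on a toy policy class, plus a bisection
# search for theta* driven by the sign of inf_pi E{a(pi) - theta*b(pi)}.
# The (E{a}, E{b}) pairs are hypothetical placeholders.

policies = [(3.0, 2.0), (2.0, 1.0), (5.0, 4.0)]   # (E{a(pi)}, E{b(pi)})

theta_star = min(a / b for a, b in policies)       # theta* = inf ratio (7.24)

# Lemma 7.4: E{a - theta* b} >= 0 for every policy, with equality at a minimizer.
assert all(a - theta_star * b >= 0 for a, b in policies)

def inf_linearized(theta):
    """inf over policies of E{a(pi) - theta*b(pi)}: an expectation, not a ratio."""
    return min(a - theta * b for a, b in policies)

# Lemma 7.5: the infimum is 0 at theta*, negative above it, positive below it.
assert abs(inf_linearized(theta_star)) < 1e-12     # (7.26)
assert inf_linearized(theta_star + 0.1) < 0        # (7.27)
assert inf_linearized(theta_star - 0.1) > 0        # (7.28)

# Bisection on theta using only the sign of inf_linearized:
lo, hi = 0.0, 10.0                                 # assumed bracket for theta*
for _ in range(50):
    mid = (lo + hi) / 2
    if inf_linearized(mid) > 0:
        lo = mid                                   # test point below theta*
    else:
        hi = mid                                   # test point at or above theta*
print(abs((lo + hi) / 2 - theta_star) < 1e-9)      # -> True
```

Because inf_π E{a(π) − θ b(π)} is nonincreasing in θ (each term decreases as θ grows, since b(π) > 0), the sign test brackets θ* and halves the interval at every stage.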
Proof. (Lemma 7.5) We know from Lemma 7.4 that we have for any policy π ∈ P:

    0 ≤ E{a(π) − θ* b(π)} = E{b(π)} [E{a(π)}/E{b(π)} − θ*] ≤ T_max [E{a(π)}/E{b(π)} − θ*]

Taking infimums over π ∈ P of the above yields:

    0 ≤ inf_{π∈P} E{a(π) − θ* b(π)} ≤ T_max inf_{π∈P} [E{a(π)}/E{b(π)} − θ*] = 0

where the final equality uses the definition of θ* in (7.24). This proves (7.26). To prove (7.27), suppose θ > θ*. Then:

    inf_{π∈P} E{a(π) − θ b(π)} = inf_{π∈P} E{a(π) − θ* b(π) − (θ − θ*) b(π)}
        ≤ inf_{π∈P} E{a(π) − θ* b(π)} − (θ − θ*) T_min
        = −(θ − θ*) T_min < 0

where we have used (7.26), together with the fact that E{b(π)} ≥ T_min and θ − θ* > 0. This proves (7.27). The proof of (7.28) is similar.

These lemmas suggest a bisection method for computing θ*: at each stage k, take a value θ_bisect^(k) from a known interval containing θ*, and compute inf_{π∈P} E{a(π) − θ_bisect^(k) b(π)}. If the result is positive, then we know θ_bisect^(k) < θ*, and if the result is negative, we know θ_bisect^(k) > θ*. If the result is zero, then θ_bisect^(k) = θ*. This is useful because each stage involves minimizing an expectation, rather than a ratio of expectations.

we have fini