
MIMO Downlink Scheduling in LTE and LTE-Advanced Systems

Honghai Zhang, Narayan Prasad, Sampath Rangarajan NEC Laboratories America {honghai,prasad,sampath}@nec-labs.com
Abstract: LTE and LTE-Advanced (LTE-A) broadband wireless data networks are characterized by their use of Orthogonal Frequency Division Multiple Access (OFDMA) and Multiple Input and Multiple Output (MIMO) techniques. Scheduling plays a vital role in such systems in exploiting multi-user, multi-channel and spatial diversity. We consider the downlink (DL) scheduling problem at the base station (BS) over such networks under a variety of practical constraints mandated by the 3GPP standards. Defining a new construct called transmission mode, which denotes a particular choice of MIMO operational mode, precoding matrix, transmission rank, as well as the modulation and coding schemes (MCSs) of up to two codewords, we show that both LTE and LTE-A systems require that in every scheduling interval, each scheduled user be served using only one transmission mode. We then prove that the resulting scheduling problems are NP-hard under both backlogged and finite queue traffic models. In each case, we re-formulate the scheduling problem as the maximization of a monotonic submodular function under a matroid constraint. This enables us to develop a unified low-complexity greedy algorithm that yields solutions guaranteed to be within 1/2 of the respective optima. Extensive performance evaluation in realistic settings reveals near-optimal performance of our proposed algorithm and shows that it significantly outperforms the state of the art, especially for the finite queue model.

I. INTRODUCTION

The emerging LTE and LTE-A broadband wireless data networks feature OFDMA-based downlink transmission schemes and MIMO techniques. In these networks, the transmission time is divided into scheduling intervals or subframes of 1 ms duration each. In each subframe, the available frequency band (typically 5-20 MHz) is divided into a large number of orthogonal narrow-band subcarriers (or tones). Contiguous subcarriers are then grouped into multiple resource blocks (RBs), which form the minimum allocation units in the frequency domain. Frequency-domain packet scheduling plays a critical role in exploiting multi-user and multi-channel diversity in LTE and LTE-A systems.

MIMO is a promising multiple-antenna technology that can substantially improve system spectral efficiency. MIMO provides three possible operational modes that are advantageous over single-antenna systems: transmit diversity, beamforming, and spatial multiplexing. Beamforming can be viewed as a special type of spatial multiplexing where only one layer (i.e., one data stream) is transmitted with a particular beam pattern. In the general case of spatial multiplexing, the transmission rank, which is equal to the number of transmitted streams, and the precoding matrix, which determines the beam pattern with which the streams are transmitted, can be chosen to optimize

the system capacity. Therefore, MIMO provides additional degrees of freedom for packet scheduling, where the operational mode (transmit diversity or spatial multiplexing), along with the choice of precoding matrix and transmission rank, becomes part of the scheduling decision. Ideally, the MIMO mode (including the precoding matrix and rank) and the MCS should be separately optimized for each RB in order to obtain the maximum possible gains in system capacity. In practice, however, doing so would entail a prohibitive amount of channel feedback from the users (to enable such fine-grained per-RB scheduling) as well as feedforward signaling to the users (to inform them of the scheduling decisions). Consequently, the LTE and LTE-A standards impose several practical constraints that limit the signaling overhead to manageable levels.

In this work, we incorporate several key constraints imposed by the 3GPP standards body on LTE and LTE-A [1], [2]: (i) a maximum limit on the number of users scheduled in a subframe, (ii) a common MIMO mode on all the RBs allocated to a particular user, and (iii) at most two codewords per user (a codeword is a transport block of information bits), with one MCS per codeword. Additionally, we consider the scheduling problems under both backlogged and finite queue traffic models.

A striking difference between LTE/LTE-A MIMO downlink scheduling and scheduling in single-antenna systems is that a user may be simultaneously allocated up to two codewords, where each codeword spans all the RBs assigned to that user but maps to only one MCS. As a result, each scheduled user receives at most two MCSs in a subframe, one per codeword. This is in sharp contrast to previous academic models of single-antenna systems, where a user can be assigned a separate MCS on each of its RBs.
To the best of our knowledge, no previous work has considered the DL scheduling problem for LTE and LTE-A incorporating the aforementioned practical constraints imposed by the standards. We expect such scheduling constraints to be imposed not only in LTE and LTE-A networks but also in future technology releases of the 3GPP standards. This motivates us to devise efficient algorithms for solving the problem.

We make three major contributions in this work. First, to facilitate the problem formulation, we define a new construct called transmission mode, which denotes a particular choice of a MIMO operational mode, precoding

matrix, transmission rank, and MCS(s) for the allocated codeword(s), and show that LTE and LTE-A require using a common transmission mode on all RBs allocated to a particular user. Second, we show that the formulated MIMO scheduling problems with the aforementioned constraints are NP-hard under both traffic models. Third, we develop a unified greedy algorithm that simultaneously considers all the aforementioned constraints and derive two solutions from it: one for the backlogged traffic model and the other for the finite queue traffic model. We prove that the algorithm achieves a 1/2-approximation guarantee under both traffic models. We note that it is also possible to obtain (1-1/e)-approximation algorithms by exploiting recent advances in submodular function maximization, although these algorithms have high complexity that is not yet conducive to practical implementation.

We conduct extensive simulations based on practical LTE channel models with MIMO and OFDMA to corroborate our analysis. Realistic performance evaluation shows that our proposed algorithm attains a performance gap well within 10% of the optimal values under most practical scenarios, much better than the worst-case guarantee of one half. Simulations show that our unified algorithm outperforms a best-known algorithm by nearly 10% for the backlogged traffic model and by up to 30% for the finite queue model.

A. Related work

Various flavors of the DL scheduling problem over OFDMA-based networks have been studied extensively [16], [3], [4], [5], [14], [6], [7], [11], [12], [9], [15], but hitherto the attention has mainly been on single-antenna systems. Even within the context of single-antenna systems, most of the existing works assume that the MCS for a user can be selected independently on each RB (which avoids the coupling constraint imposed by mandating a common MCS for each codeword across all assigned RBs) and/or only consider the backlogged traffic model.
One recent exception is [13], which considers the DL scheduling problem with a common MIMO mode constraint. However, the algorithm proposed in [13] only works with two possible MIMO modes: transmit diversity and a restricted spatial multiplexing technique that allows using only the identity matrix as the precoding matrix. Since [13] does not allow for complete spatial multiplexing involving several transmission ranks and precoding matrices, the assignment of a precoding matrix to users under the constraint of a common precoding matrix and transmission rank on the RBs allocated to a user does not arise. Additionally, standards-specific constraints, such as the common MCS requirement for each codeword across all RBs, the maximum of two codewords for every scheduled user, and the limit on the number of scheduled users, are not considered. We note that while [13] also obtained a 1/2-approximation for their simpler scheduling problem, their method, when extended to the case with more than two transmission modes, does not guarantee a 1/2-approximation and

in fact can only offer a guarantee that is inversely proportional to the number of transmission modes.

A few other works have considered a much smaller subset of the constraints or traffic models that we consider here. For example, Andrews and Zhang [5] considered the scheduling problem under the finite queue traffic model over single-antenna systems without MIMO modes and without a common MCS requirement. Kwan et al. [12] studied the single-antenna system scheduling problem with a common MCS requirement but proposed a greedy algorithm that does not offer any approximation guarantee. Thus, none of the existing works has considered all the aforementioned constraints simultaneously, all of which are essential to realize an implementable scheduling algorithm. It is substantially more challenging to address all these constraints together and offer approximation guarantees.

B. Paper organization

In Section II, we discuss the constraints imposed by LTE and LTE-A and introduce the model and problem formulation. In Section III, we present a unified algorithm that simultaneously considers all the aforementioned constraints and develop two different solutions for the problem: one applies to the backlogged traffic model and the other applies to the finite queue traffic model. Section IV is devoted to the proofs of the approximation guarantees of the proposed algorithms. We present the performance evaluation in Section V and conclude with Section VI.

II. PROBLEM FORMULATION

A. LTE downlink scheduling

In 3GPP LTE (and LTE-A), the transmission time is divided into subframes of 1 ms duration. In each subframe, the frequency band (typically 5-20 MHz) is divided into a large number of orthogonal narrow-band subcarriers. Subcarriers are then grouped into resource blocks (RBs), which form the minimum allocation unit in the frequency domain. LTE downlink scheduling is performed in the frequency domain of each subframe.
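To give a feel for the size of the allocation problem, the sketch below lists how many RBs a subframe contains for each standard LTE channel bandwidth. The numbers (12 subcarriers of 15 kHz per RB, and the RB count per bandwidth) come from the 3GPP LTE specifications rather than from this paper, and the names `N_RB` and `occupied_fraction` are purely illustrative.

```python
# Each LTE resource block spans 12 subcarriers of 15 kHz, i.e. 180 kHz.
RB_BANDWIDTH_KHZ = 12 * 15

# Number of RBs per channel bandwidth in MHz (values from the 3GPP LTE
# specifications, not from this paper).
N_RB = {1.4: 6, 3: 15, 5: 25, 10: 50, 15: 75, 20: 100}


def occupied_fraction(bw_mhz):
    """Fraction of the channel bandwidth actually covered by RBs."""
    return N_RB[bw_mhz] * RB_BANDWIDTH_KHZ / (bw_mhz * 1000)


for bw in (5, 10, 20):
    print(bw, "MHz ->", N_RB[bw], "RBs per subframe")
```

For the typical 5-20 MHz range mentioned above, the scheduler therefore handles between 25 and 100 RBs in every 1 ms subframe.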
The objective of the downlink scheduling problem is to allocate RBs to users in order to maximize a system-wide utility, given that all users experience different channel conditions on different RBs. It was shown in [17] that, in order to maximize the sum of all users' utility functions $\sum_{i=1}^{K} U_i(\bar{r}_i)$, where the utility $U_i$ of user $i$ is a function of the long-term average user rate $\bar{r}_i$, it is sufficient to maximize $\sum_i U_i'(\bar{r}_i^{(t)})\, r_i^{(t)}$ at each time instant $t$, where $r_i^{(t)}$ is the instantaneous rate of user $i$ and $\bar{r}_i^{(t)}$ is its average rate up to time $t$. Therefore, we consider the general problem of maximizing the weighted sum data rate in each subframe and omit the time index $t$ in the following discussion.

Assume that there are $N$ resource blocks (RBs) and $K$ wireless users, and that user $u$ has a weight $w_u$. Under a simple model where different data rates can be assigned independently to each RB, it is assumed that user $u$ receives a data rate $r_{u,n}$ on resource block $n$. For the convenience of future

discussion, we assume that the data rate $r_{u,n}$ is measured in bits per RB (which can be viewed as the bits per subcarrier multiplied by the number of subcarriers in each RB). The objective is to maximize the weighted sum rate over all RBs subject to the constraint that only one user can be allocated to each RB. Mathematically, the problem is to

$$
\begin{aligned}
\text{maximize}\quad & \sum_{n=1}^{N}\sum_{u=1}^{K} w_u\, r_{u,n}\, y_{u,n} \\
\text{subject to}\quad & \sum_{u=1}^{K} y_{u,n} = 1, \quad 1 \le n \le N
\end{aligned}
\tag{1}
$$

where $y_{u,n}$ is an indicator variable whose value is 1 if user $u$ is allocated to RB $n$, and the constraints require that only one user is allocated to each RB. If there is no other constraint, the scheduling problem (1) can be optimally solved by selecting the user that maximizes the weighted data rate on each RB. In other words, for each RB $n$, choose user $u(n) = \arg\max_u w_u r_{u,n}$ to schedule.

B. Constraints imposed by LTE and LTE-A with MIMO

MIMO provides three operational modes that are advantageous over single-antenna systems: transmit diversity, beamforming, and spatial multiplexing. For spatial multiplexing (see footnote 1), there are two more control knobs that can further optimize the system capacity: transmission rank and precoding matrix. Transmission rank is the number of transmitted spatial layers, where a spatial layer is a mapping of symbols to the transmit antenna ports; each layer can be viewed as a separate data stream. Transmission rank is limited by the minimum of the number of transmit and receive antennas.

In a MIMO transmission with $L$-layer spatial multiplexing, in each RB a sequence of data symbols is first segmented into $L$ layers to form a matrix $X$. Then, a precoding matrix $P$ is applied to map the multi-layered data symbols in $X$ to the transmitted signal $Y(X)$ before being launched into the air: $Y(X) = PX$. For spatial multiplexing, LTE allows up to 4 spatial layers (i.e., a maximum rank of 4) and up to 2 codewords per scheduled user, where a codeword is a transport block of information bits that is independently decoded. Codewords are mapped onto different spatial layers in a pre-defined manner.

To reduce the feedforward control signaling overhead and the user feedback overhead, LTE imposes the following practical constraints.

1) The number of users scheduled in a subframe cannot exceed a certain threshold. This constraint is required to limit the signaling overhead of resource allocations.
2) The MIMO operational mode (transmit diversity or spatial multiplexing), the rank, and the precoding matrix need to be the same on all the RBs allocated to the same user. This simplifies the transmitter design and reduces the signaling overhead.
3) The MCS has to be the same for each codeword spanning all the RBs allocated to a user in a subframe.
Footnote 1: As mentioned earlier, beamforming can be viewed as a special type of spatial multiplexing with one layer (i.e., one data stream).

This also reduces the signaling overhead of resource allocations.

Since at most two codewords per user are allowed in LTE, constraint 3 implies that each user may receive up to two MCSs, one for each codeword. Without loss of generality, we assume that two codewords (and thus two MCSs) are always allocated to a scheduled user, where the second codeword may take a dummy value to indicate the case of one codeword.

We define a transmission mode as the combination of the MIMO operational mode, the rank, the precoding matrix, and the two MCSs for the two codewords. For example, spatial multiplexing with 2 layers and precoding matrix index 1, where the first codeword uses an MCS comprising 16QAM modulation and a 1/2 coding rate and the second codeword uses QPSK-1/2, together form one transmission mode. Constraints 2 and 3 can then be combined into one constraint of a common transmission mode per user.

LTE-Advanced (LTE-A) is similar to LTE except that in LTE-A, on each RB, user-specific reference symbols can be transmitted to specify a precoding matrix for the allocated user. As a consequence, the precoding matrix can be independently chosen to optimize the channel capacity on each RB, although the transmission rank and the MCS on each codeword have to be the same. For LTE-A systems, if we define a transmission mode as the combination of the MIMO operational mode (transmit diversity or spatial multiplexing), the transmission rank, and the two MCSs for the two codewords, then LTE-A also requires that a common transmission mode be used on all the RBs allocated to a scheduled user. With this modified definition of the transmission mode, all the analysis and the algorithms in this paper carry over to LTE-A systems. For notational convenience, we focus on LTE systems hereafter.

C. Optimization framework

For LTE systems, we consider the downlink scheduling problem where the objective is to maximize the weighted sum rate subject to the practical constraints imposed by LTE as presented in the previous subsection. Note that the user weights can be updated in each subframe to maximize the sum of arbitrary concave utility functions [17]. For example, if our objective is to maximize a log utility function (which is equivalent to maintaining proportional fair scheduling), it is sufficient to maximize the weighted sum rate in each subframe, where the weight of each user is the inverse of the exponential moving average of the achieved rate.

Given the transmission mode $m$, let the MCSs for the two codewords be $\mu_1(m)$ and $\mu_2(m)$, respectively, both of which are uniquely determined by $m$. When there is no confusion, we omit the dependence on $m$ for brevity. The expected data rate for scheduling user $u$ on RB $n$ with mode $m$ is then given by

$$
r^{m}_{u,n} = T_{\mu_1}\bigl(1 - p^{\mu_1}_{u,n}\bigr) + T_{\mu_2}\bigl(1 - p^{\mu_2}_{u,n}\bigr)
\tag{2}
$$

where $T_{\mu_i}$ is the number of information bits (i.e., the transport block size) per RB carried with MCS $\mu_i$ on codeword $i$, and $p^{\mu_i}_{u,n}$ is the block error probability for user $u$ with transmission mode $m$ and MCS $\mu_i$ on RB $n$, for $i = 1, 2$. With this formulation, we assume that $T_m \triangleq T_{\mu_1} + T_{\mu_2}$ bits can be allocated on RB $n$, but $r^{m}_{u,n}$ ($\le T_m$) is the expected number of bits that can be successfully delivered. This model is more accurate than the typical assumption made in previous works (e.g., [5], [13]), where an unconstrained number $r^{m}_{u,n}$ of bits can be allocated on RB $n$ and all of them are then successfully delivered.

Comments on channel feedback: In order to compute $p^{\mu_i}_{u,n}$, the base station (BS, or eNodeB) requires channel state information from the users. Ideally, mobile terminals need to feed back, per RB, either the channel quality indicator (CQI) for every possible MIMO mode or a channel gain matrix. A mobile can reduce the amount of feedback by sending the CQIs for only a few MIMO modes that it deems most likely to be assigned to it by the BS. For example, in the LTE downlink, a mobile is allowed to select and report one transmission rank and one precoding matrix. In addition, per subband (a group of contiguous RBs) it is allowed to send two CQIs, one for each codeword, when the selected transmission rank is greater than 1, and one CQI when the rank is equal to 1. Moreover, the LTE precoding codebook has a nested structure wherein one or more column subsets of each precoding matrix are themselves valid lower-rank precoding matrices. Together these features enable the BS to determine the per-RB expected data rates for up to 4 MIMO modes (as well as 29 MCSs) based on a single feedback report received from that mobile, and possibly to override the user-recommended rank. A more detailed discussion of the rank override is presented in Appendix A.

Let $w_u$ be the weight of user $u$. We define $x^{m}_{u,n}$ as the indicator variable that is one if user $u$ is scheduled on RB $n$ with transmission mode $m$, and zero otherwise. Let $K$ and $M$ be the total number of users and transmission modes, respectively.
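As a small illustration of Eq. (2), the sketch below computes the expected number of delivered bits on one RB from the per-codeword transport block sizes and block error probabilities. The function name `expected_rate` and the numeric values are hypothetical and chosen only for illustration; a dummy second codeword is modeled here, by assumption, as a transport block of zero bits.

```python
def expected_rate(tbs_bits, bler):
    """Eq. (2): sum over codewords of T_i * (1 - p_i).

    tbs_bits: transport block sizes per RB for the two codewords (bits).
    bler:     block error probabilities of the two codewords on this RB.
    """
    return sum(t * (1.0 - p) for t, p in zip(tbs_bits, bler))


# Two codewords carrying 100 and 50 bits per RB with 10% and 20% block error.
rate_two = expected_rate((100, 50), (0.1, 0.2))

# One real codeword plus a dummy second codeword of zero bits (assumption).
rate_one = expected_rate((100, 0), (0.1, 0.0))

print(rate_two, rate_one)
```

The expected rate never exceeds the allocated $T_{\mu_1} + T_{\mu_2}$ bits, which is the distinction the text draws against models that treat every allocated bit as delivered.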
Define $\bar{K}$ as the maximum number of users that can be scheduled in a subframe. We consider two possible traffic models: a backlogged traffic model and a finite queue model. The following two subsections describe the problem formulation under these two models.

D. Backlogged traffic model

Under this model, we can use Eq. (2) to simplify the notation, and the scheduling problem is formulated as
$$
\begin{aligned}
\text{maximize}\quad & \sum_{u=1}^{K}\sum_{n=1}^{N}\sum_{m=1}^{M} w_u\, r^{m}_{u,n}\, x^{m}_{u,n} \\
\text{subject to}\quad & \sum_{m=1}^{M}\sum_{u=1}^{K} x^{m}_{u,n} = 1, \quad 1 \le n \le N \\
& \sum_{m=1}^{M} \max_{1 \le n \le N} x^{m}_{u,n} \le 1, \quad 1 \le u \le K \\
& \sum_{u=1}^{K}\sum_{m=1}^{M} \max_{1 \le n \le N} x^{m}_{u,n} \le \bar{K}
\end{aligned}
\tag{3}
$$

where the objective is to maximize the weighted sum data rate, the first set of constraints requires that on each RB at most one user with one transmission mode can be scheduled, the second set of constraints stipulates that each user can only have one transmission mode (i.e., one common transmission mode is used across all RBs allocated to the user), and the last constraint requires that the number of scheduled users cannot exceed $\bar{K}$. Note that the max function can be easily converted into linear inequality constraints, so the problem is a linear integer programming problem.

E. Finite queue model

Under this model, we assume that user $u$ has a queue size $Q_u$. Although the two MCSs $\mu_1(m)$ and $\mu_2(m)$ on the two codewords depend on the transmission mode $m$, we often omit the parameter $m$ for brevity when there is no confusion. Denote by $t^{1}_{n}$ and $t^{2}_{n}$ the number of information bits allocated to the two codewords on RB $n$, respectively. If RB $n$ is allocated to user $u$ with transmission mode $m$ (i.e., $x^{m}_{u,n} = 1$), the maximum number of bits allocated to the two codewords on RB $n$ is $T_{\mu_1(m)}$ and $T_{\mu_2(m)}$, respectively. So $0 \le x^{m}_{u,n} t^{i}_{n} \le T_{\mu_i(m)}$, $i = 1, 2$. The scheduling problem is then formulated as

$$
\begin{aligned}
\text{maximize}\quad & \sum_{u=1}^{K}\sum_{n=1}^{N}\sum_{m=1}^{M}\sum_{i=1}^{2} w_u\, x^{m}_{u,n}\, t^{i}_{n}\bigl(1 - p^{\mu_i}_{u,n}\bigr) \\
\text{subject to}\quad & \sum_{m=1}^{M}\sum_{u=1}^{K} x^{m}_{u,n} = 1, \quad 1 \le n \le N \\
& \sum_{m=1}^{M} \max_{1 \le n \le N} x^{m}_{u,n} \le 1, \quad 1 \le u \le K \\
& \sum_{u=1}^{K}\sum_{m=1}^{M} \max_{1 \le n \le N} x^{m}_{u,n} \le \bar{K} \\
& \sum_{m=1}^{M}\sum_{n=1}^{N} x^{m}_{u,n}\bigl(t^{1}_{n} + t^{2}_{n}\bigr) \le Q_u, \quad 1 \le u \le K \\
& 0 \le x^{m}_{u,n}\, t^{i}_{n} \le T_{\mu_i(m)}, \quad 1 \le n \le N,\ i = 1, 2
\end{aligned}
\tag{4}
$$

where the second-to-last constraint is the queue size limit for each user, and the last constraint specifies the maximum allowed number of bits allocated to one codeword on every RB.

F. Hardness results

We now establish the hardness results of the formulated problems.

Theorem 1: Problem (3) is NP-hard, and there exists $\epsilon > 0$ such that it is NP-hard to obtain a $(1-\epsilon)$-approximation to problem (4).

Proof: The problem with a common MIMO operational mode (either transmit diversity or spatial multiplexing) and without a user limit is shown to be NP-hard in [13]. That problem is a special case of problem (3) if we let $\bar{K} = K$ and $M = 2$. Thus problem (3) is NP-hard.

Andrews and Zhang showed in [5] that it is NP-hard to obtain a $(1-\epsilon)$-approximation to problem (2) of [5], which can be viewed as maximizing the weighted sum rate subject to a finite queue size limit, where the weight is chosen to be the queue size. The hardness result for problem (2) in [5] can be extended to the same problem with a general, arbitrary weight for each user, which is then a special case of problem (4) if we let $\bar{K} = K$, $M = 1$, set each block error probability $p$ to be either 1 or 0, and allow only one codeword on each RB. Therefore, it is NP-hard to obtain a $(1-\epsilon)$-approximation to problem (4).

III. A UNIFIED SCHEDULING ALGORITHM

Since the problems are NP-hard, we cannot expect to solve them optimally in polynomial time (unless P = NP) and therefore seek efficient sub-optimal solutions. We next develop a unified algorithm to solve the downlink scheduling problem and then specialize the solution to each of the traffic models. We then show that the unified algorithm achieves a constant approximation ratio with respect to the optimal solutions under both traffic models.

We devise the following unified scheduling algorithm to solve problems (3) and (4). Our objective is to construct a set $\mathcal{S}$ of scheduled users, together with a transmission mode $m(u)$ and a set of RBs $I(u)$ allocated to each user $u \in \mathcal{S}$. We also define a current value $V(n)$ on each RB $n$, representing the weighted data rate of the current allocation on RB $n$. Denote by $\mathcal{U}$ the set of candidate users to be scheduled. Initially $\mathcal{U} = \{1, \dots, K\}$, $\mathcal{S} = \emptyset$, and $V(n) = 0$ for $1 \le n \le N$. At each iteration, we first compute the value $v(u, n, m)$ of scheduling user $u \in \mathcal{U}$ on each RB $n$ with transmission mode $m$. We then compute the gain $g(u, m)$ of adding user $u$ with transmission mode $m$:
$$
g(u, m) = \sum_{n=1}^{N} \max\bigl(0,\; v(u, n, m) - V(n)\bigr).
\tag{5}
$$

$v(u, n, m)$ can be viewed as the weighted data rate of assigning user $u$ to RB $n$ with mode $m$. The exact computation of $v(u, n, m)$ (as well as $g(u, m)$) depends on the traffic model and will be discussed later. We next find the $u^*$ and $m^*$ that maximize $g(u, m)$. Let $(u^*, m^*) = \arg\max_{u \in \mathcal{U},\, 1 \le m \le M} g(u, m)$. We then add user $u^*$ to the scheduled user list (and remove it from $\mathcal{U}$) and set its transmission mode to $m^*$. For each RB $n$, if $v(u^*, n, m^*) > V(n)$, the RB is allocated to user $u^*$ and removed from its previous allocation, if any. Note that the RB allocation is temporary, as the RB may be re-allocated to another user later. $V(n)$ is then updated as

$$
V(n) = \max\bigl(V(n),\; v(u^*, n, m^*)\bigr).
\tag{6}
$$

The algorithm terminates when any of the following conditions is satisfied: i) $\mathcal{U} = \emptyset$, ii) $|\mathcal{S}| = \bar{K}$, or iii) $g(u^*, m^*) \le 0$. The pseudo-code of the algorithm is given in Algorithm 1. Two simplifications may be made to speed up the computation (they are not shown in Algorithm 1 for brevity). First, we can compute $m(u) = \arg\max_{1 \le m \le M} g(u, m)$, and whenever $g(u, m(u)) \le 0$, user $u$ is removed from $\mathcal{U}$. Second, if the computation of $v(u, n, m)$ does not depend on the existing allocations (as in the backlogged traffic model), it can be pre-computed during initialization. However, this cannot be applied to the finite queue model.

Algorithm 1 Unified Scheduling Algorithm
 1: $\mathcal{U} = \{1, \dots, K\}$, $\mathcal{S} = \emptyset$, $V(n) = 0$ for $1 \le n \le N$.
 2: while termination conditions not satisfied do
 3:   for all $u \in \mathcal{U}$ do
 4:     for $m = 1$ to $M$ do
 5:       Compute the value $v(u, n, m)$ on each RB $n$ and the gain $g(u, m)$.
 6:     end for
 7:   end for
 8:   $(u^*, m^*) = \arg\max_{u \in \mathcal{U},\, 1 \le m \le M} g(u, m)$.
 9:   If $g(u^*, m^*) \le 0$, go to 18.
10:   Add user $u^*$ to $\mathcal{S}$ and remove it from $\mathcal{U}$.
11:   for $n = 1$ to $N$ do
12:     if $v(u^*, n, m^*) > V(n)$ then
13:       Allocate RB $n$ to user $u^*$,
14:       $V(n) = v(u^*, n, m^*)$.
15:     end if
16:   end for
17: end while
18: Re-select the optimal transmission mode for each user with the set of allocated RBs.

Depending on whether the backlogged traffic model or the finite queue model is used, only the function $v(u, n, m)$ (and hence $g(u, m)$) needs to be computed differently. In the following we describe how to compute them under each model.

A. Backlogged traffic model

Under the backlogged traffic model, $v(u, n, m)$ is simply the weighted rate of allocating user $u$ on RB $n$ with transmission mode $m$:

$$
v(u, n, m) = w_u\, r^{m}_{u,n}.
\tag{7}
$$

$g(u, m)$ is the maximum total gain of adding user $u$ to $\mathcal{S}$ with transmission mode $m$ and is calculated using Eq. (5).

Algorithm complexity: As at most $\bar{K}$ users can be selected, the maximum number of iterations in the outer loop is $\bar{K}$. Each outer iteration takes $O(KMN)$ time (note that computing $g(u, m)$ requires $O(N)$ time). The total complexity of the algorithm is thus $O(K\bar{K}MN)$.

B. Finite queue model

Given $u$, $m$, and the finite queue size $Q_u$ for each user $u$, the value $v(u, n, m)$ is the weighted data rate of allocating user $u$ with $t^{1}_{n}$ and $t^{2}_{n}$ bits for the two codewords on RB $n$ with transmission mode $m$: $v(u, n, m) = w_u \sum_{i=1}^{2} t^{i}_{n}\bigl(1 - p^{\mu_i(m)}_{u,n}\bigr)$. We can see that the value $v(u, n, m)$ depends on the actual bit allocations and cannot be pre-computed. The major issue here is to find the best bit allocations (i.e., $t^{1}_{n}$ and $t^{2}_{n}$) to maximize

the gain $g(u, m)$ (defined in Eq. (5)) subject to the queue size constraint. For a given transmission mode $m$ (with MCSs $\mu_1$, $\mu_2$), $t^{i}_{n} \le T_{\mu_i}$. Therefore, there is a weighted rate increase only on RBs $n$ such that $w_u r^{m}_{u,n} > V(n)$ (where $r^{m}_{u,n}$ is defined in Eq. (2)). We define the subset of RBs $\mathcal{R} \triangleq \{1 \le n \le N : w_u r^{m}_{u,n} > V(n)\}$. Clearly there is no reason to allocate to user $u$ RBs outside of $\mathcal{R}$. Given a transmission mode $m$ with MCSs $\mu_1$, $\mu_2$, the maximum number of bits carried by one RB is $T_m = T_{\mu_1} + T_{\mu_2}$. So if the queue size $Q_u$ of user $u$ satisfies $Q_u \ge |\mathcal{R}|(T_{\mu_1} + T_{\mu_2})$, the problem reduces to the backlogged case and we simply apply Eqs. (7) and (5) to compute $v(u, n, m)$ and $g(u, m)$.

If $Q_u < T_m |\mathcal{R}|$, we need to find a subset of RBs in $\mathcal{R}$ (as well as the bit allocations) that maximizes the total gain of the allocations. We consider two possible scenarios. In the first scenario, only one codeword is allowed for the user (this can happen when either the user has only one receive antenna or the channel condition of the user is too weak to support multiple layers). In the second scenario, two codewords are allowed for the user. While the first scenario can be viewed as a special case of the second scenario where the second codeword takes a dummy value, the problem in the first scenario is easier and allows much more efficient (but still non-trivial) solutions. Note that even if there is only one codeword (and thus one layer), there are still multiple MIMO modes, as the MIMO mode also includes the choice of the precoding matrix (a precoding vector in this case) and the MIMO operational mode (transmit diversity or beamforming).

1) One codeword: In the case that only one codeword is allowed, we ignore the terms related to the second codeword and omit the codeword index $i$ in $\mu_i$ and $t^{i}_{n}$ to simplify the notation. Then, given user $u$ and transmission mode $m$ with MCS $\mu(m)$, the sub-problem of allocating bits to RBs in $\mathcal{R}$ can be formulated as

$$
\begin{aligned}
\max\quad & \sum_{n \in \mathcal{R}} \max\bigl(0,\; w_u\, t_n \bigl(1 - p^{\mu}_{u,n}\bigr) - V(n)\bigr) \\
\text{s.t.}\quad & \sum_{n \in \mathcal{R}} t_n \le Q_u \\
& 0 \le t_n \le T_{\mu}, \quad \text{for all } n \in \mathcal{R}.
\end{aligned}
\tag{8}
$$

After solving problem (8), we use the solution $\{t_n\}$ to obtain $v(u, n, m) = w_u t_n (1 - p^{\mu}_{u,n})$. We note that $\mu = \mu(m)$ depends on $m$. The optimal solution to problem (8) is given by Algorithm 2.

Algorithm 2 Optimal Bit Allocation with One Codeword
 1: if $Q_u \ge T_{\mu} |\mathcal{R}|$ then
 2:   Allocate all RBs in $\mathcal{R}$ to user $u$, stop.
 3: else
 4:   Let $k_1 \leftarrow \lfloor Q_u / T_{\mu} \rfloor$.
 5:   Sort the RBs in $\mathcal{R}$ in decreasing order of the gain $w_u T_{\mu}(1 - p^{\mu}_{u,n}) - V(n)$. Assume RBs are sorted in this order hereafter.
 6:   Allocate $T_{\mu}$ bits to each of the first $k_1$ RBs in $\mathcal{R}$.
 7:   $n_1 = \arg\max_{n \in \mathcal{R},\, n > k_1} \bigl[w_u (Q_u - k_1 T_{\mu})(1 - p^{\mu}_{u,n}) - V(n)\bigr]$. Allocate RB $n_1$ with $Q_u - k_1 T_{\mu}$ bits if the gain on RB $n_1$ is positive.
 8:   Denote by $H_1$ the allocation resulting from the previous two steps.
 9:   $n_2 = \arg\max_{n \in \mathcal{R},\, n \le k_1 + 1} p^{\mu}_{u,n}$, and allocate RB $n_2$ with $Q_u - k_1 T_{\mu}$ bits if the gain on RB $n_2$ is positive.
10:   Allocate $T_{\mu}$ bits to each of the first $k_1 + 1$ RBs except $n_2$.
11:   Let $H_2$ denote the allocation resulting from the previous two steps.
12:   Select the allocation having the larger gain between $H_1$ and $H_2$.
13: end if

Proposition 1: Algorithm 2 finds the optimal solution to problem (8). Moreover, the optimal solution found has at most one partially allocated RB, and if there is a partially allocated RB, it has the largest $p^{\mu}_{u,n}$ among all allocated RBs.

The proof is deferred to Appendix B; we comment that this proposition is important for the approximation guarantees of the algorithms to be established later.

Algorithm complexity: The major complexity of Algorithm 2 lies in sorting the RBs in Step 5, which requires $O(N \log N)$ time in the worst case. Since the greedy step will be executed $O(K\bar{K}M)$ times, the total complexity of Algorithm 1 for the finite queue model is $O(K\bar{K}MN \log N)$.

It may be tempting to provide simpler algorithms. For example, one simple solution is to sort the RBs in decreasing order of the gain $w_u T_{\mu}(1 - p^{\mu}_{u,n}) - V(n)$ and then allocate bits in this order. In Appendix C, we show that this straightforward solution is not optimal.

2) Two codewords: When there are two codewords, let $t^{1}_{n}, t^{2}_{n}$ be the number of bits allocated to the two codewords on RB $n$. Given user $u$ and transmission mode $m$ with MCSs $\mu_1$ and $\mu_2$, and the block error probabilities $p^{\mu_1}_{u,n}$ and $p^{\mu_2}_{u,n}$ for the two codewords on RB $n$, the sub-problem of bit allocation can be formulated as

$$
\begin{aligned}
\max\quad & \sum_{n \in \mathcal{R}} \Bigl[\, w_u \sum_{i=1}^{2} t^{i}_{n}\bigl(1 - p^{\mu_i}_{u,n}\bigr) - V(n) \Bigr]^{+} \\
\text{s.t.}\quad & \sum_{n \in \mathcal{R}} \sum_{i=1}^{2} t^{i}_{n} \le Q_u \\
& 0 \le t^{i}_{n} \le T_{\mu_i}, \quad \text{for all } n \in \mathcal{R},\ i = 1, 2.
\end{aligned}
\tag{9}
$$

where $[x]^{+}$ represents $\max(0, x)$. After solving problem (9), we let $v(u, n, m) = w_u \bigl(t^{1}_{n}(1 - p^{\mu_1}_{u,n}) + t^{2}_{n}(1 - p^{\mu_2}_{u,n})\bigr)$. We now develop the following optimal solution to problem (9) based on dynamic programming.

Define $g_n(t)$ as the maximum gain with a total of $t$ bits for RBs 1 to $n$ of $\mathcal{R}$ (without loss of generality, only RBs in $\mathcal{R}$ are considered and indexed). Among the total $t$ bits, let $s \le t$ bits be allocated to the $n$th RB, and $t - s$ bits be allocated to the first $n - 1$ RBs. The optimal way to allocate $s$ bits between the two codewords on RB $n$ is to first allocate to the codeword with a

Algorithm 3 Optimal Bit Allocation with Two Codewords
 1: if $Q_u \ge (T_{\mu_1} + T_{\mu_2}) |\mathcal{R}|$ then
 2:   Allocate all RBs in $\mathcal{R}$ to user $u$, stop.
 3: else
 4:   Compute $g_1(t)$ using Eq. (12) for $1 \le t \le Q_u$.
 5:   for $n = 1$ to $|\mathcal{R}|$ do
 6:     for $t = 1$ to $Q_u$ do
 7:       Compute $g_n(t)$ and $s_n(t)$ using Eqs. (10) and (11).
 8:     end for
 9:   end for
10:   for $n = |\mathcal{R}|$ down to 1 do
11:     $t_n = s_n\bigl(Q_u - \sum_{j=n+1}^{|\mathcal{R}|} t_j\bigr)$.
12:   end for
13:   Allocate $t_n$ bits to RB $n$ for $n = 1, \dots, |\mathcal{R}|$.
14:   If more than one codeword in an RB is not fully allocated, shift the data from one codeword to another such that only one codeword is partially allocated.
15: end if

The proof of the proposition is deferred to Appendix D. Algorithm complexity: The major complexity of Algorithm 3 lies in the recursive steps, which requires up to O(T ) time, where T is the maximum transport block size in one codeword of one RB among all possible MCSs. Thus, 2 Algorithm 3 requires O(N 2 T ) time. The total complexity of 2 Algorithm 1 in this case is then O(KKM N 2 T ). IV. A PPROXIMATION G UARANTEE We show that the proposed unied algorithm (Algorithm 1) achieves 1/2-approximation of the optimal solution to problems (3) and (4) under the backlogged trafc model and the nite queue model, respectively. Before showing our results, we present some basic known results on matroids and submodular functions. Denition 1 Let be a nite set and I a collection of subsets of . The system M = (, I) is called a matroid if (a) A I and B A B I. (b) For all F , every maximal member of I(F ) = {E : E F, E I} has the same cardinality. The members of I are called independent sets. A matroid (, I) is called a partition matroid if there exist i , i = 1, , k such that i j = for any i, j, i i = , and A I if and only if |A i | 1 for any i. Denition 2 Given a nite set , a real-valued function f dened on the subsets of is called submodular if, for all A, B such that A B, a B, f (B {a}) f (B) f (A {a}) f (A). f is called non-decreasing if for all a , A , f (A {a}) f (A). The problem of maximizing a non-decreasing submodular function over a matroid is to maximize {f (A) : A I} (13)

smaller block error probability until this codeword is full, and then to the other codeword. Let a = argmini=1,2 pi , and u,n b = 3 a. The optimal gain of allocating s bits to the nth RB is n hn (s) [wu ru (s) V (n)]+ where 0 s T1 + T2 , and
n ru (s)

= min(s, Ta )(1

p a ) u,n

+ [s Ta ] (1

pb ). u,n

The recursive equation for gn (t) is then gn (t) = sn (t) =


0smin(t,T1 +T2 )

max

(gn1 (t s) + hn (s)) (10) (gn1 (t s) + hn (s)) (11)

argmax
0smin(t,T1 +T2 )

where sn (t) is the optimal allocation on RB n when t bits are available for the rst n RBs in R. The initial condition is g1 (t) = h1 (min(t, T1 + T2 )). (12)

The dynamic programming algorithm based on the recursive equation (10) and the initial condition (12) is illustrated in Algorithm 3. Note that steps 10-13 use backtracking to nd the total optimal allocation tn on RB n, and once tn > 0 are determined, min(tn , Ta ) bits are allocated to the codeword a where a = arg mini=1,2 pi , and the remaining bits ([tn u,n Ta ]+ ) are allocated to the other codeword. The optimality of Algorithm 3 in solving problem (9) follows from Bellmans principle of optimality. Moreover, we show that the optimal solution exhibits the following property. Proposition 2 There exists an optimal solution to problem (9) such that: 1) at most one codeword in one RB is partially allocated (i.e., 0 < ti < Ti for some RB n and codeword i), n and 2) the partially allocated pair of codeword and RB has the largest block error probability (pi ) among all RB and u,n codeword pairs that receive allocations. 7
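As a concrete illustration of the dynamic program behind Algorithm 3 (Eqs. (10)-(12) for problem (9)), the following Python sketch tabulates $g_n(t)$ and backtracks the per-RB allocation. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `allocate_two_codewords`, the 0-based indexing, the convention $g_0(t) = 0$, and the clipping of $Q_u$ to the total capacity are all choices made here for illustration.

```python
from typing import List, Sequence, Tuple


def allocate_two_codewords(Q: int, T: Tuple[int, int],
                           p: Sequence[Tuple[float, float]],
                           V: Sequence[float],
                           w: float = 1.0) -> Tuple[float, List[int]]:
    """Hypothetical helper: solve problem (9) by the recursion of Eqs. (10)-(11).

    Q: queue size in bits; T: (T1, T2) codeword sizes per RB;
    p[n]: block error probabilities of the two codewords on RB n;
    V[n]: current best weighted rate on RB n; w: user weight.
    Returns (maximum gain, total bits placed on each RB).
    """
    n_rb = len(p)
    cap = T[0] + T[1]
    Q = min(Q, cap * n_rb)  # assumption: bits beyond total capacity are ignored

    def h(n: int, s: int) -> float:
        # Fill the codeword with the smaller error probability first.
        a = 0 if p[n][0] <= p[n][1] else 1
        b = 1 - a
        rate = min(s, T[a]) * (1 - p[n][a]) + max(0, s - T[a]) * (1 - p[n][b])
        return max(0.0, w * rate - V[n])

    # g[n][t]: best gain of the first n RBs with at most t bits; g[0][t] = 0.
    g = [[0.0] * (Q + 1) for _ in range(n_rb + 1)]
    choice = [[0] * (Q + 1) for _ in range(n_rb + 1)]
    for n in range(1, n_rb + 1):
        for t in range(Q + 1):
            best, best_s = -1.0, 0
            for s in range(min(t, cap) + 1):
                val = g[n - 1][t - s] + h(n - 1, s)
                if val > best:
                    best, best_s = val, s
            g[n][t], choice[n][t] = best, best_s

    # Backtrack from the last RB to recover the per-RB bit totals.
    bits = [0] * n_rb
    remaining = Q
    for n in range(n_rb, 0, -1):
        bits[n - 1] = choice[n][remaining]
        remaining -= bits[n - 1]
    return g[n_rb][Q], bits
```

For instance, `allocate_two_codewords(5, (2, 2), [(0.1, 0.5), (0.2, 0.3)], [0.0, 0.0])` places 2 bits on the first RB and 3 on the second. The triple loop mirrors the nested iteration of Algorithm 3 and is only practical for small bit budgets.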

Fisher et al. [10] proposed the following greedy algorithm to solve problem (13) and established a 1/2-approximation guarantee for it.

Algorithm 4 Greedy Algorithm on Matroid
1: Set $A = \emptyset$
2: repeat
3: Let $a^* = \arg\max_{a \in \Omega \setminus A,\ A \cup \{a\} \in \mathcal{I}} f(A \cup \{a\}) - f(A)$,
4: Set $A = A \cup \{a^*\}$
5: until no improvement can be made on $f(A)$

Lemma 1 Algorithm 4 achieves a 1/2-approximation bound to Problem (13).
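The greedy procedure of Algorithm 4 can be sketched in a few lines of Python. The example below is an illustration only: it uses a sum-of-per-RB-maxima objective over (user, mode) pairs, allows each user to appear at most once, and caps the number of selected users. The names `f`, `greedy`, `rate`, and `max_users` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple

Element = Tuple[int, int]  # (user index, transmission mode index)


def f(selected: List[Element], rate: Dict[Element, List[float]]) -> float:
    """Objective: on each RB, take the best weighted rate among selected pairs."""
    if not selected:
        return 0.0
    n_rb = len(next(iter(rate.values())))
    return sum(max(rate[e][c] for e in selected) for c in range(n_rb))


def greedy(rate: Dict[Element, List[float]],
           max_users: int) -> Tuple[List[Element], float]:
    """Repeatedly add the feasible element with the largest marginal gain."""
    selected: List[Element] = []
    used_users = set()
    while len(selected) < max_users:
        base = f(selected, rate)
        best_gain, best_elem = 0.0, None
        for elem in rate:
            if elem[0] in used_users:  # independence: one mode per user
                continue
            gain = f(selected + [elem], rate) - base
            if gain > best_gain:
                best_gain, best_elem = gain, elem
        if best_elem is None:  # no element improves f(A)
            break
        selected.append(best_elem)
        used_users.add(best_elem[0])
    return selected, f(selected, rate)
```

Each iteration re-evaluates the objective for every remaining element, which is simple but not the cheapest way to obtain marginal gains; tracking the per-RB maximum incrementally would avoid the repeated sums.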

We next prove the approximation guarantee under the backlogged traffic model.

Theorem 2 Under the backlogged traffic model, Algorithm 1 achieves a 1/2-approximation of the optimal solution.

Proof: Our goal is to map the unified scheduling algorithm (Algorithm 1) to the above greedy algorithm (Algorithm 4) for maximizing submodular functions under a matroid constraint. In order to do so, we need to i) construct a matroid, ii) map the objective function in problem (3) to a non-decreasing submodular function on the constructed matroid, and iii) prove that the greedy steps in each iteration of the two algorithms are equivalent.

We define the set of users as $\mathcal{U}$, the set of transmission modes (including MCS and MIMO modes) as $\mathcal{M}$, and the set of all RBs as $\mathcal{C}$. We define the complete set $\Omega = \{(u, m) : u \in \mathcal{U}, m \in \mathcal{M}\}$, and define a set $A \subseteq \Omega$ as an independent set iff i) $|A| \le \bar{K}$, where $\bar{K}$ is the maximum number of users that can be scheduled in a subframe, and ii) for any two elements $(u_1, m_1)$ and $(u_2, m_2)$ in $A$, $u_1 \ne u_2$. Then the members of the set $\mathcal{I}$ are all independent sets. It can be easily verified that the system $(\Omega, \mathcal{I})$ forms a matroid (in particular, a partition matroid). The objective function on each subset $A \subseteq \Omega$ is defined as

$$f(A) = \sum_{c \in \mathcal{C}} \max_{(u,m) \in A} w_u r_{u,c}^m.$$

We next show that the function $f$ defined above is submodular. For any $A \subseteq B \subseteq \Omega$ and $(u_0, m_0) \in \Omega \setminus B$,

$$f(A \cup \{(u_0, m_0)\}) - f(A) = \sum_{c \in \mathcal{C}} \max_{(u,m) \in A \cup \{(u_0,m_0)\}} w_u r_{u,c}^m - \sum_{c \in \mathcal{C}} \max_{(u,m) \in A} w_u r_{u,c}^m$$
$$= \sum_{c \in \mathcal{C}} \max\Big(0,\ w_{u_0} r_{u_0,c}^{m_0} - \max_{(u,m) \in A} w_u r_{u,c}^m\Big)$$
$$\ge \sum_{c \in \mathcal{C}} \max\Big(0,\ w_{u_0} r_{u_0,c}^{m_0} - \max_{(u,m) \in B} w_u r_{u,c}^m\Big)$$
$$= f(B \cup \{(u_0, m_0)\}) - f(B), \qquad (14)$$

where the inequality holds because $A \subseteq B$, and thus $\max_{(u,m) \in A} w_u r_{u,c}^m \le \max_{(u,m) \in B} w_u r_{u,c}^m$. The function $f$ is non-decreasing by its definition. Therefore, it is non-decreasing and submodular.

Finally, we need to verify that in each iteration of Algorithm 1, the selected $(u_0, m_0)$ maximizes $f(A \cup \{(u, m)\}) - f(A)$, where $A$ is the currently selected set of user and transmission mode pairs. This is immediate by comparing Eqs. (14) and (5) and noting that $v(u, n, m) = w_u r_{u,n}^m$ and $V(n) = \max_{(u,m) \in A} w_u r_{u,n}^m$ from Eq. (6).

We now show that the same conclusion holds for the finite queue model. We first consider the case where one codeword is used.

Theorem 3 Under the finite queue model with one codeword per user, Algorithm 1 along with Algorithm 2 achieves a 1/2-approximation of the optimal solution.

Proof: Since the transmission mode $m$ uniquely determines the MCS $\mu_1(m)$ of the only codeword on each RB $n$, we use $p_{u,n}^m$ to represent $p_{u,n}^{\mu_1(m)}$ and use $T_m$ to represent $T_{\mu_1(m)}$. We define the set $\Omega = \{(u, m, C_u) : u \in \mathcal{U}, m \in \mathcal{M}, C_u \subseteq \mathcal{C}\}$. Define a set $A \subseteq \Omega$ as an independent set iff i) $|A| \le \bar{K}$, where $\bar{K}$ is the maximum number of users that can be scheduled in a subframe, and ii) $u_1 \ne u_2$ for any two elements $(u_1, m_1, C_1)$ and $(u_2, m_2, C_2)$ in $A$. The collection $\mathcal{I}$ is the set of all independent sets. It can easily be verified that the system defined by $(\Omega, \mathcal{I})$ is a partition matroid.

Given an independent set $A$ (i.e., $A \in \mathcal{I}$), we define the following function $f$ over $A$. Let $A = \{(u_i, m_i, C_i) : 1 \le i \le |A|\}$. Each element $(u_i, m_i, C_i)$ in $A$ can be viewed as assigning a set of RBs $C_i$ to a user $u_i$ with transmission mode $m_i$. For a given element $(u, m, C_u) \in A$, we sort the RBs in $C_u$ in the increasing order of $p_{u,n}^m$ and let $n_j$ be the index of the $j$th sorted RB in $C_u$. Then we let the number of bits allocated on RB $n_j$ be

$$t_{u,n_j}^m = \begin{cases} T_m & \text{for } j = 1, \ldots, k_1, \\ Q_u - k_1 T_m & \text{for } j = k_1 + 1, \\ 0 & \text{for } j > k_1 + 1, \end{cases} \qquad (15)$$

where $k_1 = \lfloor Q_u / T_m \rfloor$. Note that $t_{u,n}^m$ only depends on $(u, m, C_u)$, not on the RB assignment of other users in $A$; thus it is well-defined. Define

$$f(A) = \sum_{n \in \mathcal{C}} \ \max_{(u,m,C_u) \in A,\ C_u \ni n} w_u t_{u,n}^m (1 - p_{u,n}^m). \qquad (16)$$

The submodularity of the function $f(A)$ can be proved similarly as in Theorem 2. We consider the following problem of maximizing a submodular function over a matroid,

$$\max \{ f(A) : A \in \mathcal{I} \}. \qquad (17)$$

It should be noted that an independent set $A$ may not correspond to an RB assignment in the sense that two users in $A$ may have overlapping RBs. However, we notice the following fact, which is proved in Appendix E.

Lemma 2 The optimal value of $f(A)$ in Eq. (17) is equal to the optimal objective value in problem (4) with the one-codeword model.

Therefore, in order to solve problem (4), it is sufficient to find the optimal solution for Eq. (17). Since the latter is a submodular maximization problem over a matroid, we can apply the same greedy procedure as in Algorithm 4 to obtain a 1/2-approximation. It remains to be shown that the greedy step (step 3) in Algorithm 4, when applied to solve problem (17), is equivalent to Algorithm 2, which is the greedy step in Algorithm 1 for the finite queue model. For the greedy step in Algorithm 4, we need to find $(u, m, C_u)$ that maximizes

$$f(A \cup \{(u, m, C_u)\}) - f(A) = \sum_{n \in C_u} \max\big(0,\ w_u t_{u,n}^m (1 - p_{u,n}^m) - \tilde{V}(n)\big), \qquad (18)$$

where $\tilde{V}(n) = \max_{(u,m,C_u) \in A,\ C_u \ni n} w_u t_{u,n}^m (1 - p_{u,n}^m)$. Although there is an exponential number of subsets $\{C_u\}$ of RBs for each user $u$, we do not need to enumerate all the subsets to find the best element $(u, m, C_u)$. Instead, for each pair $(u, m)$, we efficiently determine the optimal subset of RBs that achieves the maximum gain.

In the following, we prove by induction on the cardinality of $A$ that i) $\tilde{V}(n) = V(n)$ (where $V(n)$ is defined as in Eq. (6) with the finite queue model), and ii) Algorithm 2 finds the optimal solution to problem (18), thereby completing the proof.

In the initial step, $V(n) = 0$ and $A = \emptyset$, thus $\tilde{V}(n) = 0 = V(n)$ for all $n$. When $V(n) = 0$ for all $n$, Algorithm 2 reduces to sorting RBs in the increasing order of $p_{u,n}^m$, allocating $T_m$ bits to the first $k_1$ RBs, and allocating $Q_u - k_1 T_m$ bits to the $(k_1 + 1)$th RB. This also solves problem (18).

Now assume that both statements i) and ii) are true for $A$; we prove that they continue to hold after adding one more element $(u, m, C_u)$ to $A$. Firstly, note that Algorithm 2 optimally solves problem (8), which is very similar to problem (18), except that the latter only needs to find the optimal subset $C_u$ and the bit allocation among $C_u$ is fixed (in the increasing order of $p_{u,n}^m$), while the former allows jointly optimizing both RB selections and bit allocations. Nevertheless, Proposition 1 shows that, in the optimal solution to problem (8), only one partially allocated RB can occur, which is the one with the largest $p_{u,n}^m$ among all allocated RBs. Therefore, the optimal bit allocation for problem (8) is identical to that for problem (18). Thus, Algorithm 2 also finds the optimal solution to problem (18). Let $\tilde{V}_{\mathrm{new}}(n)$ and $V_{\mathrm{new}}(n)$ be the values after adding the best element in the current iteration. We have

$$\tilde{V}_{\mathrm{new}}(n) = \tilde{V}(n) + \max\big(0,\ w_u t_{u,n}^m (1 - p_{u,n}^m) - \tilde{V}(n)\big)$$
$$= V(n) + \max\big(0,\ w_u t_{u,n}^m (1 - p_{u,n}^m) - V(n)\big)$$
$$= V_{\mathrm{new}}(n). \qquad (19)$$

Thus the induction step is proved.

Finally, we provide the approximation results under the finite queue model with two codewords.

Theorem 4 Under the finite queue model with two codewords or mixed codewords (i.e., some users have one codeword and others have two codewords), Algorithm 1 along with Algorithm 2 and Algorithm 3 achieves a 1/2-approximation of the optimal solution.

Proof: Without loss of generality, we only consider the case that all users are allowed to have two codewords. In the case of mixed codewords, those users with only one codeword available to allocate can be viewed as having two codewords where the second codeword always has zero expected rate on each RB (e.g., the per-RB block error probability of the second codeword $p_{u,n}^2$ is always 1).

We define the set $\Omega = \{(u, m(\mu_1, \mu_2), C_u) : u \in \mathcal{U}, m \in \mathcal{M}, C_u \subseteq \mathcal{C}\}$, where an element $(u, m(\mu_1, \mu_2), C_u)$ is viewed as assigning a set of RBs $C_u$ to a user $u$ with transmission mode $m$, which contains MCS $\mu_1$ and $\mu_2$ on the first and second codewords, respectively. We note that the transmission mode $m$ itself includes other components such as the MIMO operational mode (transmit diversity or spatial multiplexing), the transmission rank, and the precoding matrix. Define a set $A \subseteq \Omega$ as an independent set iff i) $|A| \le \bar{K}$, where $\bar{K}$ is the maximum number of users that can be scheduled in a subframe, and ii) $u_1 \ne u_2$ for any two elements $(u_i, m_i(\mu_1, \mu_2), C_{u_i})$, $i = 1, 2$, in $A$. It can be verified that the system defined by $(\Omega, \mathcal{I})$ is a partition matroid.

Given an independent set $A$ (i.e., $A \in \mathcal{I}$), we define the following function $f$ on $A$. Let $A = \{(u_i, m_i(\mu_1, \mu_2), C_i) : 1 \le i \le |A|\}$. For a given element $(u, m(\mu_1, \mu_2), C_u) \in A$, we define the RB-codeword pair $(n, j)$ as the combination of the $j$th codeword and RB $n$, which has block error probability $p_{u,n}^j$ (note that this probability also depends on the transmission mode $m$). We sort the RB-codeword pairs in $C_u$ in the increasing order of $p_{u,n}^j$. Let $(n_l, j_l)$ be the RB and codeword pair with the $l$th smallest block error probability. Letting $t_{u,n}^j$ denote the number of bits allocated to the $j$th codeword on RB $n$, we have

$$t_{u,n_l}^{j_l} = \min\Big(T_{j_l},\ \Big[Q_u - \sum_{k=1}^{l-1} T_{j_k}\Big]^+\Big). \qquad (20)$$

We then define

$$f(A) = \sum_{n \in \mathcal{C}} \ \max_{(u, m(\mu_1,\mu_2), C_u) \in A,\ C_u \ni n} w_u \big(t_{u,n}^1 (1 - p_{u,n}^1) + t_{u,n}^2 (1 - p_{u,n}^2)\big).$$

We can prove the submodularity of the function $f(A)$ similarly as in Theorem 2. Now we consider the problem

$$\max \{ f(A) : A \in \mathcal{I} \}. \qquad (21)$$

Similar to Lemma 2, we can prove the following lemma.

Lemma 3 The optimal value of $f(A)$ in Eq. (21) is equal to the optimal objective value in problem (4) with the two-codeword model.

Therefore, to solve problem (4), we only need to solve the problem in Eq. (21), which is a submodular function maximization problem over a matroid. So we can apply the greedy procedure as in Algorithm 4 to obtain a 1/2-approximation solution. The greedy step in Algorithm 4 is to find $(u, m(\mu_1, \mu_2), C_u)$ to maximize

$$f(A \cup \{(u, m(\mu_1, \mu_2), C_u)\}) - f(A) = \sum_{n \in C_u} \max\Big(0,\ w_u \sum_{j=1}^{2} t_{u,n}^j (1 - p_{u,n}^j) - \tilde{V}(n)\Big), \qquad (22)$$

where $t_{u,n}^j$ is the number of bits allocated to the $j$th codeword on RB $n$, determined according to Eq. (20), and

$$\tilde{V}(n) = \max_{(u, m(\mu_1,\mu_2), C_u) \in A,\ C_u \ni n} w_u \sum_{j=1}^{2} t_{u,n}^j (1 - p_{u,n}^j).$$

It remains to show that Algorithm 3 optimally solves the above problem in Eq. (22) for a given user $u$ (so that the greedy step only needs to scan every user and find the optimal incremental gain). To do so, we prove the following lemma by induction on the cardinality of $A$, thus completing the proof.

Lemma 4 i) $\tilde{V}(n) = V(n)$, where $V(n)$ is defined in Eq. (6). ii) Algorithm 3 finds the optimal solution to problem (22).
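The fixed bit allocation of Eq. (20) and the incremental gain of Eq. (22) can be made concrete with a short Python sketch. This is a minimal illustration, assuming 0-based codeword indices and dictionary inputs; the names `fill_bits` and `marginal_gain` and the argument layout are hypothetical.

```python
from typing import Dict, Tuple


def fill_bits(Q: int, T: Tuple[int, int],
              p_rb: Dict[int, Tuple[float, float]]) -> Dict[Tuple[int, int], int]:
    """Eq. (20): fill RB-codeword pairs in increasing order of error probability.

    p_rb[n] holds the block error probabilities of codewords 0 and 1 on RB n.
    Returns the bits placed on each (RB, codeword) pair.
    """
    pairs = sorted((p_rb[n][j], n, j) for n in p_rb for j in (0, 1))
    bits: Dict[Tuple[int, int], int] = {}
    remaining = Q
    for _, n, j in pairs:
        bits[(n, j)] = min(T[j], max(0, remaining))
        remaining -= T[j]
    return bits


def marginal_gain(w: float, Q: int, T: Tuple[int, int],
                  p_rb: Dict[int, Tuple[float, float]],
                  V: Dict[int, float]) -> float:
    """Eq. (22): gain of adding a user with RB set p_rb.keys(), given current V."""
    bits = fill_bits(Q, T, p_rb)
    gain = 0.0
    for n in p_rb:
        rate = sum(bits[(n, j)] * (1 - p_rb[n][j]) for j in (0, 1))
        gain += max(0.0, w * rate - V.get(n, 0.0))
    return gain
```

With `Q = 5`, `T = (2, 2)` and error probabilities `{0: (0.1, 0.5), 1: (0.2, 0.3)}`, the fill order is (RB 0, codeword 0), (RB 1, codeword 0), (RB 1, codeword 1), and the last of these receives the single leftover bit.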

Fig. 1. Backlogged traffic: impact of # users. [Two panels: sector throughput (Mbps) and cell-edge throughput (Kbps) vs. number of users per cell. Curves: UB, Unified, and RefAlg1, each with 1 layer and 2 layers.]

Fig. 2. Backlogged traffic: impact of # RBs. [Two panels: sector throughput (Mbps) and cell-edge throughput (Kbps) vs. number of RBs. Curves: UB, Unified, and RefAlg1, each with 1 layer and 2 layers.]

The proof of Lemma 4 is deferred to Appendix F.

$(1 - 1/e)$-approximation algorithms. Since both problems (3) and (4) can be mapped to the problem of maximizing a submodular function over a matroid, we can apply the continuous greedy algorithm developed in [18], [8] to solve the problem, which has a $(1 - 1/e)$-approximation guarantee. Nevertheless, the continuous greedy algorithm requires a complexity of $O((KM)^7)$, where $K$ is the number of users and $M$ is the number of transmission modes. In a typical LTE setup, $M$ is larger than 10000. As a result, we consider this algorithm too complex for practical implementations.

V. PERFORMANCE EVALUATION

An event-driven OFDMA-MIMO simulator written in C++ is developed to evaluate the proposed LTE MIMO scheduling algorithms. A single-cell OFDMA MIMO downlink system based on LTE is considered, with cell radius varying from 500 m to 2000 m. Users are distributed uniformly at random within the cell but at least 250 m away from the BS. We consider a MIMO channel model incorporating path loss, log-normal shadowing, and multi-path Rayleigh fading. A Doppler fading equivalent to a velocity of 3 km/h (ITU Pedestrian-B model) is assumed in the simulation. We consider an LTE subframe with 10 to 50 RBs and a MIMO system with 4 transmit antennas and 2 receive antennas forming 1 or 2 layers depending on the channel conditions as well as the user capability. We assume that 16 rank-1 and 16 rank-2 precoding matrices and 29 possible MCSs are available, as defined in the LTE standard. The scheduling algorithm is performed on each subframe and the user weights are updated according to the proportional-fair algorithm between subframes. 20 random drops are simulated and each drop is executed for 500 subframes. The sector throughput and the cell-edge throughput in each drop are measured and the average values among the 20 drops are then reported, where the cell-edge throughput is defined as the 5th percentile of all user throughput. We assume that in each subframe, at most 10 users can be scheduled. Unless otherwise specified, we consider an LTE downlink system with 20 users, 20 RBs, and a cell radius of 1500 m.
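The two reported metrics can be illustrated with a small Python sketch. It is not the authors' C++ simulator; it assumes that the sector throughput of a drop is the sum of the per-user throughputs and that the 5th percentile is taken with the nearest-rank convention, neither of which is stated in the text. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (assumed convention), with q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def drop_metrics(user_throughputs: Sequence[float]) -> Tuple[float, float]:
    """Return (sector throughput, cell-edge throughput) for one drop."""
    return sum(user_throughputs), percentile(user_throughputs, 5)


def average_over_drops(drops: List[Sequence[float]]) -> Tuple[float, float]:
    """Average both metrics over all simulated drops."""
    metrics = [drop_metrics(d) for d in drops]
    n = len(metrics)
    return (sum(m[0] for m in metrics) / n,
            sum(m[1] for m in metrics) / n)
```

With 20 users per drop, the nearest-rank 5th percentile is simply the lowest user throughput in that drop; other percentile conventions would interpolate between the two lowest values.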

We evaluate two possible scenarios. In the first scenario, only 1 layer is supported due to the limited user capability. Multiple rank-1 precoding matrices are available for scheduling in this case. In the second scenario, up to 2 layers are supported and the rank (i.e., number of layers) as well as the precoding matrix is determined by the scheduling algorithm.

Reference Algorithm: We adapted Alg1 in [13] to consider the constraint of maximum user limit and multiple transmission modes. When the number of allocated users reaches the maximum limit, the rest of the RBs are only allocated among the set of users who already receive allocations. Alg1 in [13] is designed for the backlogged case only. We also adapted it for the finite queue model as follows. The scheduling is first determined based on the backlogged model. If a user receives allocations but does not have enough traffic to fill the allocated RBs, the unused RBs are then re-scheduled to the remaining users, until all RBs are allocated or the number of scheduled users reaches the limit.

Performance upper bound: We also establish a performance upper bound under the backlogged traffic model by relaxing both the common transmission mode constraint and the user number limit. Thus, to obtain the upper bound, we simply find the best combination of user and transmission mode on each RB (in terms of weighted rates) and compute the received rates based on this allocation. We note that such an upper bound is not possible for the finite queue model, where even if the user number limit and the common transmission mode constraint are lifted, the problem (due to the finite queue constraint alone) is NP-hard and does not even permit $(1 - \epsilon)$-approximation algorithms for a small $\epsilon > 0$, as established in [5].

A. Backlogged traffic

Figure 1 shows the aggregate sector throughput and the cell-edge throughput, along with the corresponding upper bound, as a function of the number of users. The aggregate throughput gradually increases due to multi-user diversity, while the cell-edge throughput decreases steadily as the number of users sharing the bandwidth increases. We can see that i) using two layers can significantly boost the aggregate throughput (over 20% compared to using 1 layer), ii) the average sector throughput improvement over the reference Alg1 is 7-12% and the cell-edge improvement is mostly 5-10%, and iii) the proposed algorithm performs well within 10% of the upper bound in all cases for both the aggregate throughput and the cell-edge throughput, which is significantly better than the worst-case guarantee of one half established in the analysis.

Figure 2 shows the impact of the number of RBs (i.e., available bandwidth) on the system performance. It is clearly seen that both the aggregate and the cell-edge throughput increase linearly with respect to the available bandwidth. Compared to Alg1 in [13], our unified algorithm achieves about 10% performance gain for both the cell-edge throughput and the average throughput, and the performance gain increases as the number of RBs increases. Similar to the previous case, using 2-layer MIMO can improve the sector throughput by 20-30%, while the cell-edge throughput is nearly unaffected. We also notice that when the number of RBs increases, the performance gap from the upper bound also increases. This is not surprising. When the number of RBs increases, the upper bound is allowed to choose the user and transmission mode independently on each RB and benefits more from the channel diversity. Thus, the increasing gap is more likely because the upper bound becomes looser. Nevertheless, even with 50 RBs, our unified algorithm still keeps the performance gap within 10%.

Figure 3 shows the system performance vs. the cell radius. It is interesting to observe that when the cell radius decreases, the gain of using 2-layer MIMO increases significantly. For example, when the cell radius is 500 m, the sector throughput of using 2-layer MIMO is 76% more than that of using 1-layer MIMO, while for a cell radius of 2000 m, the improvement is 24%. This is because when the cell radius decreases, the channel SNR increases and it is more beneficial to use spatial multiplexing in the high-SNR scenario. Again, we observe that our unified algorithm achieves an average improvement of 9% for the sector throughput and 6% for the cell-edge throughput, compared to the reference Alg1.

Fig. 3. Backlogged traffic: impact of cell radius. [Two panels: sector throughput (Mbps) and cell-edge throughput (Kbps) vs. cell radius (m). Curves: UB, Unified, and RefAlg1, each with 1 layer and 2 layers.]

B. Finite queue traffic model

We now investigate the case of finite queues, where a fixed traffic arrival rate is employed at every user in each run. Figure 4 shows the throughput vs. the traffic arrival rate. For our unified algorithm, while the aggregate user rate increases almost monotonically, the cell-edge user rate is initially flat, and then decreases after the arrival rate exceeds 200 Kbps. This is because when the cell is under-loaded, our algorithm can deliver more traffic for the cell-edge users. When the loading increases, interior users gradually take more resources, resulting in lower throughput for cell-edge users until a stable point is reached due to the proportional-fair-based weight update. In comparison, the reference scheme Alg1 always attempts to first allocate to users with higher weighted rate even if they only have a small amount of data. This results in a low cell-edge throughput even when the arrival rate is low. In this case, we observe a much larger performance gain (on the sector throughput) of the unified algorithm compared to Alg1 in [13], ranging from 9% to 31%, and the gain increases as the arrival rate decreases. Therefore, it is more beneficial to apply our unified algorithm in the more practical traffic model with finite queue length.

Fig. 4. Finite queue: impact of arrival rate. [Two panels: sector throughput (Mbps) and cell-edge throughput (Kbps) vs. arrival rate (Kbps). Curves: Unified and RefAlg1, each with 1 layer and 2 layers.]

VI. CONCLUSION

In this paper, we have studied the LTE and LTE-A downlink scheduling problem with MIMO and several practical constraints. Both a backlogged traffic model and a finite queue model are considered for the problem. We prove that the problems are NP-hard and then develop a unified algorithm that can achieve a 1/2-approximation for both traffic models. We also show the existence of $(1 - 1/e)$-approximation polynomial-time schemes, although they may not be amenable to practical implementations. For future work, we plan to apply variations of these algorithms to solve the scheduling problems in other OFDMA-based systems such as WiMAX.

REFERENCES
[1] 3GPP. UTRA-UTRAN long term evolution (LTE) and 3GPP system architecture evolution (SAE). http://www.3gpp.org/article/lte, 2005.
[2] 3GPP. TR 36.913 - Requirements for further advancements for E-UTRA (LTE-Advanced). http://www.3gpp.org/article/lte-advanced, 2010.
[3] M. Andrews, K. Kumaran, K. Ramanan, A. Stolyar, R. Vijayakumar, and P. Whiting. Scheduling in a queueing system with asynchronously varying service rates. Probability in the Engineering and Information Sciences, 18:191-217, 2004.
[4] M. Andrews, L. Qian, and A. Stolyar. Optimal utility based multiuser throughput allocation subject to throughput constraints. In IEEE Infocom, 2005.
[5] M. Andrews and L. Zhang. Scheduling algorithms for multi-carrier wireless data systems. In ACM Mobicom 2007, Sep. 2007.
[6] M. Assaad and A. Mourad. New frequency-time scheduling algorithms for 3GPP/LTE-like OFDMA air interface in the downlink. In IEEE Vehicular Technology Conference, May 2008.
[7] K. C. Beh, S. Armour, and A. Doufexi. Joint time-frequency domain proportional fair scheduler with HARQ for 3GPP LTE systems. In IEEE Vehicular Technology Conference, Sep. 2008.
[8] G. Calinescu, C. Chekuri, M. Pal, and J. Vondrak. Maximizing a monotone submodular function subject to a matroid constraint. To appear in SICOMP.
[9] N. Chen and S. Jordan. Downlink scheduling with probabilistic guarantees on short-term average throughputs. In IEEE Wireless Communications and Networking Conference (WCNC'08), Las Vegas, Nev, USA, March 2008.
[10] M. L. Fisher, L. A. Wolsey, and G. L. Nemhauser. An analysis of approximations for maximizing submodular set functions-II. Mathematical Programming Study, July 1978.
[11] P. Kela, J. Puttonen, N. Kolehmainen, T. Ristaniemi, T. Henttonen, and M. Moisio. Dynamic packet scheduling performance in UTRA long term evolution downlink. In the 3rd International Symposium on Wireless Pervasive Computing (ISWPC'08), May 2008.
[12] R. Kwan, C. Leung, and J. Zhang. Multiuser scheduling on the downlink of an LTE cellular system. Research Letters in Communications, 2008, Jan. 2008.
[13] S. Lee, S. Choudhury, A. Khoshnevis, S. Xu, and S. Lu. Downlink MIMO with frequency-domain packet scheduling for 3GPP LTE. In IEEE Infocom, Brazil, 2009.
[14] A. Pokhariyal, G. Monghal, K. I. Pedersen, P. E. Mogensen, C. Rosa, and T. E. Kolding. Frequency domain packet scheduling under fractional load for the UTRAN LTE downlink. In Proc. of the 65th IEEE Vehicular Technology Conference (VTC'07), pages 699-703, April 2007.
[15] B. Sadiq, S. J. Baek, and G. de Veciana. Delay-optimal opportunistic scheduling and approximations: the log rule. In IEEE Infocom, 2009.
[16] S. Shakkottai and A. L. Stolyar. Scheduling for multiple flows sharing a time-varying channel: the exponential rule. Analytic Methods in Applied Probability, 207:185-202, 2001.
[17] A. Stolyar. On the asymptotic optimality of the gradient scheduling algorithm for multiuser throughput allocation. Operations Research, 53(1), 2005.
[18] J. Vondrak. Optimal approximation for the submodular welfare problem in the value oracle model. In the 40th Annual ACM Symposium on Theory of Computing (STOC), 2008.

APPENDIX A
RANK-OVERRIDING IN LTE

In the LTE downlink with MIMO, when a user needs to report channel feedback, it reports a common transmission rank and precoding matrix for all subbands, along with a CQI value on each subband for every codeword. If the transmission rank is at least 2, the CQI values of two codewords are reported. Suppose that a user reports rank 4; then it will also report a rank-4 precoding matrix $W = [w_1, w_2, w_3, w_4]$ and 2 CQIs (for two codewords) on every subband (or for a subset of subbands it prefers). Using the user report, the base station (BS) can then compute a per-RB SINR estimate $\gamma_i$ for each layer $i$, $1 \le i \le 4$, which corresponds to the $i$th column $w_i$ of $W$. Note that since the user prefers a transmission rank of 4, it assumes a per-layer power of $P/4$ while computing its CQI, where $P$ is the total transmit power allocated on each RB. If the BS decides to reduce the rank to 2, the nested structure of the LTE codebook ensures that at least one rank-2 column subset of $W$, say $W' = [w_1, w_2]$, is a valid rank-2 precoding matrix. Then the per-layer power that can be used by the BS is $P/2$ instead of $P/4$. Using $(4/2)\gamma_i$, $1 \le i \le 2$, as the new SINRs corresponding to the two columns of $W'$ is a good and conservative choice, as it only considers the signal power increase (as a result of transmitting two layers instead of four) but not the decrease of interference from other layers. For a user-reported rank of $r$, which in LTE is limited to 4, the rank employed by the BS can vary from 1 to $r$, and the expected data rates for up to $r$ possible MIMO modes can be derived at the BS based on the user feedback report. The BS may choose to override the user-recommended rank based on these derived expected data rates. Rank override is particularly helpful when the assigned RBs are not in the set of preferred subbands reported by the user, and perhaps more so with a finite queue size.

APPENDIX B
PROOF OF PROPOSITION 1

We first prove the following lemma.

Lemma 5 There exists an optimal allocation such that only one RB is partially allocated.

The proof of the lemma is only sketched here. If two RBs are partially allocated, we can shift the allocation from one RB to the other. At least one way of shifting does not decrease the sum of the gains. The shifting can continue until either one RB is empty or one RB is fully allocated.

Now assume that there is one partially allocated RB and the remaining RBs are either not allocated or allocated with full capacity. Therefore, if the partially allocated RB is determined, we just pick $k_1$ other RBs with the highest gain. The sorted index $n$ of the partially allocated RB can be either greater than $k_1$, which leads to the allocation $H_1$, or less than or equal to $k_1$, which results in the allocation $H_2$. The algorithm returns the better allocation between $H_1$ and $H_2$. Thus it finds the optimal solution. This also proves that the obtained solution contains at most one partially allocated RB.

To prove the last statement in the proposition, if the partially allocated RB does not have the largest $p_{u,n}^m$ among all RBs receiving allocations, we can switch the allocation between the partially allocated RB and the one having the largest $p_{u,n}^m$ and receiving full allocation. It is not hard to see that the switched allocation has a higher gain. This contradicts the optimality of the solution, which completes the proof.

APPENDIX C
A NAIVE ALGORITHM FOR PROBLEM (8)

The problem appears simple and our rst thought is to have the following algorithm. Step 1, let k1 Qu /Tm Step 2, sort RBs in R in the decreasing order of Tm (1pm ) u,n V (n). Step 3, allocate Tm bits to the rst (sorted) k1 RBs. Step 4, allocate RB k1 + 1 with Qu k1 Tm bits if the gain of the allocation is positive, and 0 otherwise. We now show, through an example, that this algorithm is not optimal. Let Tm = 1000, pm = 0.1, pm = 0.2, pm = 1 2 3 0.3, V (1) = 399, V (2) = 0, V (3) = 200, Q = 1500 (the user

index u is omitted). According to the above algorithm, RBs 2 and 1 will be selected and allocated with 1000 bits and 500 bits, respectively, resulting in a total gain of 1049. But the optimal solution is to allocated RBs 2 and 3 with 1000 bits and 500 bits, respectively. The optimal gain is 1150. The above example also shows that replacing step 2 of the naive algorithm with sorting the RBs in the increasing order of pm does not work either. u,n A PPENDIX D P ROOF OF P ROPOSITION 2 Suppose the optimal solution ti , n = 1, , N, i = 1, 2 n contains two partially allocated RB-codewords, say tj and ti . n l We assume that the gain on both RBs n and l is positive (otherwise, the allocation on one of them can be removed j without any loss of the gain). Assume that pu,n > pi . We u,l can shift the allocation from the jth codeword on RB n to the ith codeword on RB l. Clearly, such shift does not decrease the total gain. Therefore, we can continue shifting until either the jth codeword on RB n is empty or the ith codeword on RB l becomes full. Finally, if the only partially allocated RB-codeword has smaller block error probability pi than a RB-codeword u,n receiving full allocation, we can shift the allocation from the latter to the former to obtain higher gain. This completes the proof. A PPENDIX E P ROOF OF L EMMA 2 Let OP T denote the optimal value of problem (4) with one codeword and A denote the optimal solution to f (A) in Eq. (17). Suppose that xm , t are the optimal solution to u,n n problem (4). We can construct a set A from it such that A does not contain overlapping RBs as follows. For each u, m with m n xu,n > 0, we dene an element (u, m, Cu ) A such that Cu = {n : xm = 1}. Due to the second constraint n,n in problem (4), one user can appear in A at most once, thereby satisfying the independent constraint for the set A. Due to the rst constraint in problem (4), the RBs allocated to different users have no overlap. 
For each set of RBs allocated to a user, by Proposition 1, only the RB with the largest $p^m_{u,n}$ among all RBs receiving allocations may be partially allocated. This coincides with the RB allocation strategy used when computing $f(A)$. Thus, the function value $f(A)$ is exactly $OPT$, and therefore $OPT \le f(A^*)$.

Next, we consider the optimal solution $A^*$ to $f(A)$ in Eq. (17). Since it may contain overlapping RBs, it may not correspond to a valid RB allocation. However, we can convert it to a valid RB assignment (i.e., a solution to problem (4)) as follows. Let $A^* = \{(u_i, m_i, C_i), 1 \le i \le |A^*|\}$. For each RB $n$, if $n$ belongs to only one $C_i$, we set $x^{m_i}_{u_i,n} = 1$. If $n$ belongs to multiple $C_i$'s, we find $u_j = \arg\max_{u_i : C_i \ni n} w_{u_i} t^{m_i}_{u_i,n} (1 - p^{m_i}_{u_i,n})$ (ties are broken arbitrarily), where $t^m_{u,n}$ is defined as in Eq. (15), and set $x^{m_j}_{u_j,n} = 1$. All remaining $x^m_{u,n}$ are set to 0. The resulting $x^m_{u,n}$ corresponds to a valid RB assignment (i.e., one without overlapping). Since the bit allocation strategy is the same, we have $OPT \ge f(A^*)$. Therefore, we conclude that $OPT = f(A^*)$.

APPENDIX F
PROOF OF LEMMA 4

For the second statement, it is sufficient to show that the optimal solutions to problems (9) and (22) (for a given user $u$) are equivalent, since Algorithm 3 solves problem (9) optimally. In the initial step, $V(n) = 0$ and $A = \emptyset$, thus $V_A(n) = 0 = V(n)$. Both problems (9) and (22) can be optimally solved by 1) allocating $Q_u$ bits to the RB-codeword pairs in increasing order of the block error probability $p^i_{u,n}$ for each user $u$, and 2) finding the user $u$ that has the largest gain. Therefore, they are equivalent.

Now assume that both statements i) and ii) hold for $A$; we prove that they continue to hold after adding a new element $(u, m({}_1, {}_2), C_u)$. We note that problems (9) and (22) both require finding a subset of RB-codeword pairs and allocating $Q_u$ bits to them. The difference is that the bit allocation in problem (22) is fixed according to Eq. (20), while that in problem (9) is chosen optimally. However, Proposition 2 shows that in the optimal solution to problem (9), bits are allocated in increasing order of $p^i_{u,n}$, which is identical to the order in Eq. (20). Therefore, the optimal solutions to problems (9) and (22) are equivalent. Let $V'(n)$ and $V'_A(n)$ be the values on RB $n$ after adding the best element in the current iteration. We have
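$$
\begin{aligned}
V'(n) &= V(n) + \max\Big(0,\; w_u \sum_{j=1}^{2} t^j_{u,n} (1 - p^j_{u,n}) - V(n)\Big) \\
&= V_A(n) + \max\Big(0,\; w_u \sum_{j=1}^{2} t^j_{u,n} (1 - p^j_{u,n}) - V_A(n)\Big) \\
&= V'_A(n).
\end{aligned}
\tag{23}
$$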

This completes the induction step.
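As a numerical check of the counterexample to the naive algorithm in Appendix C, the following sketch compares the naive allocation with an exhaustive search. It rests on assumptions made here rather than taken from the paper: the gain of placing $t$ bits on RB $n$ is taken to be $\max(0, t(1 - p_n) - V(n))$ with $w_u = 1$, the search is restricted to a grid of 100 bits, and all function names are hypothetical. The absolute gain values depend on that assumed gain definition; the point is only that the naive allocation is strictly worse.

```python
from itertools import product

def gain(t, p, v):
    """Assumed gain of placing t bits on an RB: max(0, t * (1 - p) - V(n)), w_u = 1."""
    return max(0.0, t * (1.0 - p) - v)

def naive_allocation(T, Q, p, V):
    """Steps 1-4 of the naive algorithm; returns the bits placed on each RB."""
    k1 = Q // T  # Step 1: number of fully loaded RBs
    # Step 2: sort RBs by the gain of a full allocation, largest first
    order = sorted(range(len(p)), key=lambda n: T * (1 - p[n]) - V[n], reverse=True)
    bits = [0] * len(p)
    for n in order[:k1]:  # Step 3: full allocation on the first k1 RBs
        bits[n] = T
    rest = Q - k1 * T
    if k1 < len(order) and rest > 0:  # Step 4: remainder on the next RB if it pays off
        n = order[k1]
        if gain(rest, p[n], V[n]) > 0:
            bits[n] = rest
    return bits

def total_gain(bits, p, V):
    """Sum of the per-RB gains of an allocation."""
    return sum(gain(t, p[n], V[n]) for n, t in enumerate(bits))

def brute_force(T, Q, p, V, step=100):
    """Exhaustive search over allocations on a grid of `step` bits (illustration only)."""
    levels = range(0, T + 1, step)
    best = max(
        (bits for bits in product(levels, repeat=len(p)) if sum(bits) <= Q),
        key=lambda bits: total_gain(bits, p, V),
    )
    return list(best)

# Parameters of the example in Appendix C (RBs 1, 2, 3 map to indices 0, 1, 2).
T, Q = 1000, 1500
p = [0.1, 0.2, 0.3]
V = [399, 0, 200]

naive = naive_allocation(T, Q, p, V)
best = brute_force(T, Q, p, V)
print("naive:", naive, total_gain(naive, p, V))
print("best: ", best, total_gain(best, p, V))
```

The naive rule places the remaining 500 bits on RB 1, whose existing value $V(1)$ absorbs most of the gain, whereas the exhaustive search places them on RB 3.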