
# MIMO Downlink Scheduling in LTE and LTE-Advanced Systems

Honghai Zhang, Narayan Prasad, Sampath Rangarajan NEC Laboratories America {honghai,prasad,sampath}@nec-labs.com

Abstract: LTE and LTE-Advanced (LTE-A) broadband wireless data networks are characterized by their use of Orthogonal Frequency Division Multiple Access (OFDMA) and Multiple Input and Multiple Output (MIMO) techniques. Scheduling plays a vital role in such systems in exploiting multi-user, multi-channel and spatial diversity. We consider the downlink (DL) scheduling problem at the base station (BS) over such networks under a variety of practical constraints mandated by the 3GPP standards. Defining a new construct called “transmission mode”, which denotes a particular choice of MIMO operational mode, precoding matrix, transmission rank, as well as the modulation and coding schemes (MCSs) of up to two codewords, we show that both LTE and LTE-A systems require that in every scheduling interval, each scheduled user be served using only one transmission mode. We then prove that the resulting scheduling problems are NP-hard under both backlogged and finite queue traffic models. In each case, we re-formulate the scheduling problem as the maximization of a monotonic submodular function under a matroid constraint. This enables us to develop a unified low-complexity greedy algorithm that yields solutions guaranteed to be within 1/2 of the respective optima. Extensive performance evaluation in realistic settings reveals near-optimal performance of our proposed algorithm and that it significantly outperforms the state of the art, especially for the finite queue model.

## I. INTRODUCTION

The emerging LTE and LTE-A broadband wireless data networks feature OFDMA-based downlink transmission schemes and MIMO techniques. In these networks, the transmission time is divided into scheduling intervals or subframes of 1 ms duration each. In each subframe, the available frequency band (typically 5-20 MHz) is divided into a large number of orthogonal narrow-band subcarriers (or tones). Contiguous subcarriers are then grouped into multiple resource blocks (RBs), which form the minimum allocation units in the frequency domain. Frequency-domain packet scheduling plays a critical role in exploiting multi-user and multi-channel diversity in LTE and LTE-A systems.

MIMO is a promising multiple-antenna technology that can substantially improve system spectrum efficiency. MIMO provides three possible operational modes that are advantageous over single antenna systems: transmit diversity, beamforming, and spatial multiplexing. Beamforming can be viewed as a special type of spatial multiplexing where only one layer (i.e., one data stream) is transmitted with a particular beam pattern. In the general case of spatial multiplexing, the transmission rank (which is equal to the number of transmitted streams) and the precoding matrix (which determines the beam pattern with which the streams are transmitted) can be chosen to optimize the system capacity. Therefore, MIMO provides additional degrees of freedom for packet scheduling, where the operational modes (transmit diversity or spatial multiplexing) along with the choice of precoding matrix and transmission rank become part of the scheduling decision to be made.

Ideally, the MIMO mode (including the precoding matrix and rank) and the MCS should be separately optimized for each RB in order to obtain the maximum possible gains in the system capacity. However, in practice, doing so would entail a prohibitive amount of channel feedback from the users (to enable such fine-grained per-RB scheduling) as well as feedforward signaling to the users (to inform them about the scheduling decisions). Consequently, the LTE and LTE-A standards impose several practical constraints which limit the signaling overhead to manageable levels.

In this work, we incorporate several key constraints imposed by the 3GPP standards body on LTE and LTE-A [1], [2], such as (i) a maximum limit on the number of users scheduled in a subframe, (ii) using a common MIMO mode on all the RBs allocated to a particular user, and (iii) assigning to a user at most two codewords (a codeword is a transport block of information bits) with one MCS per codeword. Additionally, we consider the scheduling problems under both backlogged and finite queue traffic models.

A striking difference of the LTE/LTE-A MIMO downlink scheduling compared to that in single-antenna systems is that a user may be simultaneously allocated up to two codewords and each codeword spans all the RBs assigned to that user but maps to only one MCS. As a result, each scheduled user may be assigned up to two codewords (each with its own MCS) in a subframe. This is in sharp contrast to the previous academic models of single-antenna systems where a user can be assigned a separate MCS on each of its RBs.
To the best of our knowledge, no previous work has considered the DL scheduling problem for LTE and LTE-A incorporating the aforementioned practical constraints imposed by the standards. We expect such scheduling constraints to be imposed not only in LTE and LTE-A networks but also in future technology releases of the 3GPP standards. This motivates us to devise efficient algorithms for solving the problem.

We make three major contributions in this work.

• First, to facilitate the problem formulation, we define a new construct called transmission mode, which denotes a particular choice of a MIMO operational mode, precoding matrix, transmission rank, and MCS(s) for the allocated codeword(s) and show that LTE and LTE-A require using a common transmission mode on all RBs allocated to a particular user.

• Second, we show that the formulated MIMO scheduling problems with the aforementioned constraints under both the traffic models are NP-hard.

• Third, we develop a unified greedy algorithm that simultaneously considers all the aforementioned constraints and develop two solutions for the problem: one for the backlogged traffic model and the other for the finite queue traffic model. We prove that it achieves a 1/2-approximation guarantee under both the traffic models. We note that it is also possible to obtain (1-1/e)-approximation algorithms by exploiting recent advances in submodular function maximization, although these algorithms have high complexity not yet conducive to practical implementation. We conduct extensive simulations based on practical LTE channel models with MIMO and OFDMA to corroborate our analysis. Realistic performance evaluation shows that our proposed algorithm attains a performance gap well within 10% of the optimal values under most practical scenarios, much better than the worst case guarantee of half. Simulations show that our unified algorithm outperforms a best-known algorithm by nearly 10% for the backlogged traffic model and up to 30% for the finite queue model.

### A. Related work

Various flavors of the DL scheduling problems over OFDMA-based networks have been studied extensively [16], [3], [4], [5], [6], [7], [9], [11], [12], [14], [15], but hitherto the attention has mainly been on single antenna systems. Even within the context of single-antenna systems, most of the existing works assume that the MCS for a user can be selected independently on each RB (which avoids the coupling constraint imposed by mandating a common MCS for each codeword across all assigned RBs) and/or only consider the backlogged traffic model.

One recent exception is [13] which considers the DL scheduling problem with a common MIMO mode constraint. However, the algorithm proposed in [13] only works with two possible MIMO modes: transmit diversity and a restricted spatial multiplexing technique which allows using only the identity matrix as the precoding matrix. Since [13] does not allow for complete spatial multiplexing involving several transmission ranks and precoding matrices, the assignment of precoding matrix to users under the constraint of a common precoding matrix and transmission rank on the RBs allocated to a user does not arise. Additionally, standards specific constraints such as the common MCS requirement for each codeword across all RBs, maximum of two codewords for every scheduled user and a limit on the number of scheduled users, all of which are essential to realize an implementable scheduling algorithm, are not considered. We note that while [13] also obtained a 1/2-approximation for their simpler scheduling problem, their method, when extended to the case with more than two transmission modes, does not guarantee a 1/2-approximation and in fact can only offer a guarantee that is inversely proportional to the number of transmission modes.

A few other works have considered a much smaller subset of the constraints or traffic models that we consider here. For example, Kwan et al. [12] studied the single antenna system scheduling problem with a common MCS requirement but proposed a greedy algorithm that does not offer any approximation guarantee. Andrews and Zhang [5] considered the scheduling problem under the finite queue traffic model over single antenna systems without MIMO modes and without a common MCS requirement. Thus, none of the existing works has considered all the aforementioned constraints simultaneously. It is substantially more challenging to address all these constraints together and offer approximation guarantees.

### B. Paper organization

In Section II, we discuss the constraints imposed by LTE and LTE-A and introduce the model and problem formulation. In Section III, we present a unified algorithm that simultaneously considers all the aforementioned constraints and develop two different solutions for the problem: one applies to the backlogged traffic model and the other applies to the finite-queue traffic model. Section IV is devoted to the proofs of the approximation guarantees of the proposed algorithms. We present the performance evaluation in Section V and conclude with Section VI.

## II. PROBLEM FORMULATION

### A. LTE downlink scheduling

In 3GPP LTE (and LTE-A), the transmission time is divided into subframes of 1 ms duration. In each subframe, the frequency base band (typically 5-20 MHz) is divided into a large number of orthogonal narrow-band subcarriers. Subcarriers are then grouped into resource blocks (RBs), which form the minimum allocation unit in the frequency domain. LTE downlink scheduling is performed in the frequency domain of each subframe. The objective of the downlink scheduling problem is to allocate RBs to users, given that all users experience different channel conditions on different RBs, in order to maximize a system-wide utility.

It was shown in [17] that, in order to maximize the sum of all users' utility functions $\sum_{i=1}^{K} U_i(\bar{r}_i)$, where the utility $U_i$ of user $i$ is a function of the long-term average user rate $\bar{r}_i$, it is sufficient to maximize $\sum_i U_i'(\bar{r}_i^{(t)})\, r_i^{(t)}$ at each time instant $t$, where $r_i^{(t)}$ is the instantaneous rate of user $i$ and $\bar{r}_i^{(t)}$ is its average rate up to time $t$. Therefore, we consider the general problem of maximizing the weighted sum data rate in each subframe and omit the time index $t$ in the following discussion.

Assume that there are $N$ resource blocks (RBs) and $K$ wireless users. Under a simple model where different data rates can be assigned independently to each RB, it is assumed that the user $u$ receives a data rate $r_{u,n}$ on resource block $n$, and user $u$ has a weight $w_u$.

For the convenience of future discussion, we assume that the data rate $r_{u,n}$ is measured in bits per RB (which can be viewed as the bits per sub-carrier multiplied by the number of sub-carriers in each RB). The objective is to maximize the weighted sum rate over all RBs subject to the constraint that only one user can be allocated to each RB. Mathematically, the problem is to

$$\text{maximize} \quad \sum_{n=1}^{N} \sum_{u=1}^{K} w_u\, r_{u,n}\, y_{u,n} \qquad \text{subject to} \quad \sum_{u=1}^{K} y_{u,n} = 1, \quad 1 \le n \le N \tag{1}$$

where $y_{u,n}$ is an indicator variable whose value is 1 if user $u$ is allocated to RB $n$, and the constraints require that only one user is allocated to each RB.

If there is no other constraint, the scheduling problem (1) can be optimally solved by selecting the user that maximizes the weighted data rate on each RB. In other words, for each RB $n$, choose user $u(n) = \arg\max_{u} w_u r_{u,n}$ to schedule. Note that the user weights can be updated in each subframe to maximize the sum of arbitrary concave utility functions [17]. For example, if our objective is to maximize a log utility function (which is equivalent to maintaining proportional fair scheduling), it is sufficient to maximize the weighted sum rate in each subframe, where the weight of each user is the inverse of the exponential moving average of the achieved rate.

### B. Constraints imposed by LTE and LTE-A with MIMO

MIMO provides three operational modes that are advantageous over single-antenna systems: transmit diversity, beamforming, and spatial multiplexing. For spatial multiplexing,$^1$ there are two more control knobs that can further optimize the system capacity: transmission rank and precoding matrix.

$^1$ As mentioned earlier, beamforming can be viewed as a special type of spatial multiplexing with one layer (i.e., one data stream).

In a MIMO transmission with $L$-layer spatial multiplexing, in each RB a sequence of data symbols is first segmented into $L$ layers to form a matrix $\bar{X}$, where a spatial layer is a mapping of symbols to the transmit antenna ports. For spatial multiplexing, each layer can be viewed as a separate data stream. Then, on each RB, a precoding matrix $P$ is applied to map the multi-layered data symbols in $\bar{X}$ to the transmitted signal $\bar{Y}(\bar{X})$ before being launched into the air: $\bar{Y}(\bar{X}) = P\bar{X}$. Transmission rank is the number of transmitted spatial layers. Transmission rank is limited by the minimum of the number of transmit and receive antennas. LTE allows up to 4 spatial layers (i.e., a maximum rank of 4) and up to 2 codewords per scheduled user, where a codeword is a transport block of information bits that is independently decoded. Codewords are mapped onto different spatial layers in a pre-defined manner.

To reduce the feedforward control signaling overhead and the user feedback overhead, LTE imposes the following practical constraints.

1) The number of users scheduled in a subframe cannot exceed a certain threshold. This constraint is required to limit the signaling overhead of resource allocations.

2) The MIMO operational mode (transmit diversity or spatial multiplexing), transmission rank, and the precoding matrix on all the RBs allocated to the same user need to be the same. This simplifies the transmitter design and reduces the signaling overhead.

3) The MCS has to be the same for each codeword spanning all the RBs allocated to a user in a subframe. This also reduces the signaling overhead of resource allocations.

Since at most two codewords per user are allowed in LTE, constraint 3 implies that each user may receive up to two MCSs, one for each codeword. For notational convenience, we assume that two codewords (and thus two MCSs) are always allocated to a scheduled user, where the second codeword may take a dummy value to indicate the case of one codeword.

We define a transmission mode as the combination of the MIMO operational mode, the precoding matrix, the rank, and the two MCSs for the two codewords. For example, spatial multiplexing with 2 layers and precoding matrix index 1, where the first codeword uses an MCS comprising 16QAM modulation and 1/2 coding rate and the second codeword uses QPSK-1/2, together forms one transmission mode. Then constraints 2 and 3 can be combined into one constraint of a common transmission mode per user.

LTE-Advanced (LTE-A) is similar to LTE except that in LTE-A, user-specific reference symbols can be transmitted to specify a precoding matrix for the allocated user. As a consequence, the precoding matrix can be independently chosen to optimize the channel capacity on each RB although the transmission rank and the MCS on each codeword have to be the same. For LTE-A systems, if we define a transmission mode as the combination of the MIMO operational mode (transmit diversity or spatial multiplexing), the rank, and the two MCSs for the two codewords, then it also requires that a common transmission mode be used on all the RBs allocated to a scheduled user. With this modified definition of the transmission mode, all the analysis and the algorithms in this paper can be carried over to LTE-A systems. Without loss of generality, we focus on LTE systems hereafter.

### C. Optimization framework

For LTE systems, we consider the downlink scheduling problem where the objective is to maximize the weighted sum rates subject to the practical constraints imposed by LTE as presented in the previous subsection. Given the transmission mode $m$, let the MCSs for the two codewords be $\phi_1(m)$ and $\phi_2(m)$, respectively, both of which are uniquely determined by $m$. When there is no confusion, we omit the dependence parameter $m$ for brevity. Now the expected data rate for scheduling user $u$ on RB $n$ with mode $m$ is given by

$$r^{m}_{u,n} = T_{\phi_1}\left(1 - p^{\phi_1}_{u,n}\right) + T_{\phi_2}\left(1 - p^{\phi_2}_{u,n}\right) \tag{2}$$

where $T_{\phi_i}$ is the number of information bits (a.k.a. transport block size) per RB carried with MCS $\phi_i$ on codeword $i$ and $p^{\phi_i}_{u,n}$ is the block error probability for user $u$ with transmission mode $m$ and MCS $\phi_i$ on RB $n$, for the two codewords $i = 1, 2$.
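The expected rate of Eq. (2) and the unconstrained per-RB rule for problem (1) can be illustrated with a minimal Python sketch. The function names `expected_rate` and `schedule_unconstrained`, and the list-based data layout, are assumptions made here for illustration; they do not come from the paper.

```python
from typing import List, Sequence


def expected_rate(tbs: Sequence[float], bler: Sequence[float]) -> float:
    """Eq. (2): sum over codewords of T_phi_i * (1 - p_phi_i).

    tbs[i]  -- transport block size (bits per RB) of codeword i
    bler[i] -- block error probability of codeword i on this RB
    """
    return sum(t * (1.0 - p) for t, p in zip(tbs, bler))


def schedule_unconstrained(weights: Sequence[float],
                           rates: Sequence[Sequence[float]]) -> List[int]:
    """Problem (1) with no further constraints.

    rates[u][n] is r_{u,n}. For every RB n, pick the user with the
    largest weighted rate w_u * r_{u,n}; ties go to the lowest index.
    """
    num_users = len(weights)
    num_rbs = len(rates[0])
    return [
        max(range(num_users), key=lambda u: weights[u] * rates[u][n])
        for n in range(num_rbs)
    ]


if __name__ == "__main__":
    # Two codewords: 100 bits at BLER 0.5 and 50 bits at BLER 0.25.
    print(expected_rate([100, 50], [0.5, 0.25]))
    # Two users, two RBs; user 1 has twice the weight of user 0.
    print(schedule_unconstrained([1.0, 2.0], [[10, 2], [4, 3]]))
```

This per-RB rule is only optimal when each RB can be decided independently; the common transmission mode and user-limit constraints introduced above couple the RBs, which is why the rule no longer applies to the constrained problems.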

Comments on channel feedback: In order to compute $p^{\phi_i}_{u,n}$, the base station (BS, or e-NodeB) requires necessary channel state information from the users. Ideally, mobile terminals need to feed back either the channel quality indicator (CQI) for every possible MIMO mode or a channel gain matrix, per RB. A mobile can reduce the amount of feedback by sending the CQIs for only a few MIMO modes that it deems are most likely to be assigned to it by the BS. For example, in the LTE downlink, a mobile is allowed to select and report one transmission rank and one precoding matrix. In addition it is allowed to send two CQIs (one CQI) per subband, whenever the selected transmission rank is greater than 1 (equal to one), where a subband is a group of contiguous RBs. Moreover, the LTE precoding codebook has a nested structure wherein one or more column subsets of each precoding matrix are themselves valid lower rank precoding matrices. Together these features enable the BS to determine the per-RB expected data rates for up to 4 MIMO modes (as well as 29 MCSs) based on a single feedback report received from that mobile and to possibly override the user recommended rank. A more detailed discussion on the rank-override is presented in Appendix A.

Let $K$, $M$ be the total number of users and transmission modes, respectively. Define $\bar{K}$ as the maximum number of users that can be scheduled in a subframe. Let $w_u$ be the weight of user $u$. We define $x^{m}_{u,n}$ as the indicator variable which is one if user $u$ is scheduled on RB $n$ with transmission mode $m$, and zero otherwise.

We consider two possible traffic models: a backlogged traffic model and a finite queue model. The following two subsections describe the problem formulation under these two models.

### D. Backlogged traffic model

Under this model, we can use Eq. (2) to simplify the notations and the scheduling problem is formulated as

$$
\begin{aligned}
\text{maximize} \quad & \sum_{u=1}^{K} \sum_{n=1}^{N} \sum_{m=1}^{M} w_u\, r^{m}_{u,n}\, x^{m}_{u,n} \\
\text{subject to} \quad & \sum_{m=1}^{M} \sum_{u=1}^{K} x^{m}_{u,n} = 1, \quad 1 \le n \le N \\
& \sum_{m=1}^{M} \max_{1 \le n \le N} x^{m}_{u,n} \le 1, \quad 1 \le u \le K \\
& \sum_{u=1}^{K} \sum_{m=1}^{M} \max_{1 \le n \le N} x^{m}_{u,n} \le \bar{K}
\end{aligned}
\tag{3}
$$

where the objective is to maximize the weighted sum data rate, the first set of constraints requires that on each RB, at most one user with one transmission mode can be scheduled, the second set of constraints stipulates that each user can only have one transmission mode (i.e., one common transmission mode is used across all RBs allocated to the user), and the last constraint requires that the maximum number of scheduled users cannot exceed $\bar{K}$. Note that the max function can be easily converted into a linear inequality constraint so the problem is a linear integer programming problem.

With this formulation, we assume that $T_m \triangleq T_{\phi_1} + T_{\phi_2}$ bits can be allocated on RB $n$ but $r^{m}_{u,n}$ ($\le T_m$) is the expected number of bits that can be successfully delivered. This model is more accurate than the typical assumption made in previous works (e.g., [5], [13]) where an unconstrained variable number $r_{u,n}$ of bits can be allocated on RB $n$ and then all of them are successfully delivered.

### E. Finite queue model

Under this model, we assume that user $u$ has a queue size $Q_u$. If RB $n$ is allocated to user $u$ with transmission mode $m$ (i.e., $x^{m}_{u,n} = 1$), the maximum number of bits allocated to the two codewords on RB $n$ is $T_{\phi_1(m)}$ and $T_{\phi_2(m)}$, respectively. Denote $t^1_n$, $t^2_n$ as the number of information bits allocated to the two codewords on RB $n$. So $0 \le x^{m}_{u,n} t^i_n \le T_{\phi_i(m)}$, $i = 1, 2$. Although the two MCSs $\phi_1(m)$ and $\phi_2(m)$ on the two codewords depend on the transmission mode $m$, we often omit the parameter $m$ for brevity when there is no confusion. The scheduling problem is then formulated as

$$
\begin{aligned}
\text{maximize} \quad & \sum_{u=1}^{K} \sum_{n=1}^{N} \sum_{m=1}^{M} \sum_{i=1}^{2} w_u\, x^{m}_{u,n}\, t^{i}_{n} \left(1 - p^{\phi_i}_{u,n}\right) \\
\text{subject to} \quad & \sum_{m=1}^{M} \sum_{u=1}^{K} x^{m}_{u,n} = 1, \quad 1 \le n \le N \\
& \sum_{m=1}^{M} \max_{1 \le n \le N} x^{m}_{u,n} \le 1, \quad 1 \le u \le K \\
& \sum_{u=1}^{K} \sum_{m=1}^{M} \max_{1 \le n \le N} x^{m}_{u,n} \le \bar{K} \\
& \sum_{m=1}^{M} \sum_{n=1}^{N} x^{m}_{u,n} \left(t^{1}_{n} + t^{2}_{n}\right) \le Q_u, \quad 1 \le u \le K \\
& 0 \le x^{m}_{u,n}\, t^{i}_{n} \le T_{\phi_i(m)}, \quad 1 \le n \le N, \ i = 1, 2
\end{aligned}
\tag{4}
$$

where the second to the last equation is the queue size limit for each user, and the last equation specifies the maximum allowed number of bits allocated to one codeword on every RB.

### F. Hardness results

We now establish the hardness results of the formulated problems.

Theorem 1: Problem (3) is NP-hard and there exists $\delta > 0$, such that it is NP-hard to obtain a $(1 - \delta)$-approximation to problem (4).

Proof: The problem with common MIMO operational mode (either transmit diversity or spatial multiplexing) without user limit is shown to be NP-hard in [13]. That problem is a special case of problem (3) if we let $\bar{K} = K$ and $M = 2$. Thus problem (3) is NP-hard.

Andrews and Zhang showed that it is NP-hard to obtain a $(1 - \delta)$-approximation solution to problem (2) in [5], which can be viewed as maximizing weighted sum rate subject to finite queue size limit, where the weight is chosen to be the queue size in [5]. The hardness result of problem (2) in [5] can be extended to the same problem but with a general, arbitrary weight for each user, which is then a special case of problem (4) if we let $\bar{K} = K$, $M = 1$, set $p$ to be either 1 or 0, and there is only one codeword on each RB. Therefore, it is NP-hard to obtain a $(1 - \delta)$-approximation to problem (4).

## III. A UNIFIED SCHEDULING ALGORITHM

Due to the NP-hardness of the problems, only sub-optimal solutions are possible. We next develop a unified algorithm to solve the downlink scheduling problem and then specialize the solution to apply to each of the traffic models. We then show that the unified algorithm achieves a constant approximation ratio of the optimal solutions under both the traffic models.

We devise the following unified scheduling algorithm to solve problems (3) and (4). Our objective is to construct a set $S$ of scheduled users, a transmission mode $m(u)$ and a set of RBs $I(u)$ allocated to each user $u \in S$. Denote $U$ as the set of candidate users to be scheduled. We also define a current value $V(n)$ on each RB $n$, representing the weighted data rate of the current allocation on RB $n$. Initially $U = \{1, \cdots, K\}$, $S = \emptyset$, and $V(n) = 0$ for $1 \le n \le N$.

At each iteration, we first compute the value $v(u, n, m)$ of scheduling user $u \in U$ on each RB $n$ with transmission mode $m$. The exact computation of $v(u, n, m)$ (as well as $g(u, m)$) depends on the traffic model and will be discussed later. We then compute the gain $g(u, m)$ of adding user $u$ with transmission mode $m$:

$$g(u, m) = \sum_{n=1}^{N} \max\left(0,\ v(u, n, m) - V(n)\right). \tag{5}$$

We next find the optimal $m$ and $u$ that maximize $g(u, m)$. Let

$$(u^*, m^*) = \arg\max_{u \in U,\ 1 \le m \le M} g(u, m). \tag{6}$$

We then add the user $u^*$ to the scheduled user list (and remove it from $U$), and assign its transmission mode to be $m^*$. For each RB $n$, if $v(u^*, n, m^*) > V(n)$, then it is allocated to user $u^*$ and removed from its previous allocation if it was allocated previously. Note that the RB allocation is temporary as it may be re-allocated to another user later. $V(n)$ is now updated with $V(n) = \max(V(n), v(u^*, n, m^*))$.

The algorithm terminates when any of the following conditions are satisfied: i) $U = \emptyset$, ii) $|S| = \bar{K}$, or iii) $g(u^*, m^*) \le 0$. The pseudo-code of the algorithm is illustrated in Algorithm 1.

Algorithm 1 Unified Scheduling Algorithm
1: $U = \{1, \cdots, K\}$, $S = \emptyset$, $V(n) = 0$ for $1 \le n \le N$.
2: while Termination conditions not satisfied do
3:   for all $u \in U$ do
4:     for $m = 1$ to $M$ do
5:       Compute the value $v(u, n, m)$ on each RB $n$ and the gain $g(u, m)$.
6:     end for
7:   end for
8:   $(u^*, m^*) = \arg\max_{u \in U,\ 1 \le m \le M} g(u, m)$.
9:   If $g(u^*, m^*) \le 0$, go to 18.
10:   Add user $u^*$ to $S$ and remove it from $U$.
11:   for $n = 1$ to $N$ do
12:     if $v(u^*, n, m^*) > V(n)$ then
13:       Allocate RB $n$ to user $u^*$.
14:       $V(n) = v(u^*, n, m^*)$.
15:     end if
16:   end for
17: end while
18: Re-select the optimal transmission mode for each user with the set of allocated RBs.

Depending on whether it is the backlogged traffic model or the finite queue model, only the function $v(u, n, m)$ (as well as $g(u, m)$) needs to be computed differently. In the following we describe the methods to compute them under each model.

Algorithm complexity: As at most $\bar{K}$ users can be selected, the maximum number of iterations in the outer loop is $\bar{K}$. Each outer iteration takes $O(KMN)$ time (note that it requires $O(N)$ to compute $g(u, m)$). The total complexity of the algorithm is thus $O(\bar{K}KMN)$.

Two simplifications may be made to speed up the computation (they are not illustrated in Algorithm 1 for brevity). First, if the computation of $v(u, n, m)$ does not depend on the existing allocations (as in the backlogged traffic model), it can be pre-computed during initialization. Second, we can compute $m(u) = \arg\max_{1 \le m \le M} g(u, m)$ (here $g(u, m)$ is the maximum total gain of adding user $u$ to $S$ with transmission mode $m$ and is calculated using Eq. (5)), and whenever $g(u, m(u)) \le 0$, user $u$ is removed from $U$.

### A. Backlogged traffic model

Under the backlogged traffic model, the value $v(u, n, m)$ is simply the weighted rate of allocating user $u$ on RB $n$ with transmission mode $m$:

$$v(u, n, m) = w_u\, r^{m}_{u,n}. \tag{7}$$

### B. Finite queue model

Given $u$, $m$ and the finite queue size $Q_u$ for each user $u$, $v(u, n, m)$ is the weighted data rate of allocating user $u$ with $t^1_n$ and $t^2_n$ bits for the two codewords on RB $n$ with transmission mode $m$:

$$v(u, n, m) = w_u \sum_{i=1}^{2} t^{i}_{n} \left(1 - p^{\phi_i(m)}_{u,n}\right).$$

We can see that the value $v(u, n, m)$ can be viewed as the weighted data rate of assigning user $u$ to RB $n$ with mode $m$. However, $v(u, n, m)$ depends on the actual bit allocations and cannot be pre-computed. The major issue here is to find the best bit allocations (i.e., $t^1_n$ and $t^2_n$) to maximize the gain $g(u, m)$ (defined in Eq. (5)) subject to the queue size constraint.
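As a concrete illustration of the main loop of Algorithm 1, the following Python sketch specializes it to the backlogged traffic model, where $v(u, n, m) = w_u r^{m}_{u,n}$. It is a minimal sketch under stated assumptions: the function name `unified_greedy` and the nested-list layout `rates[u][m][n]` are hypothetical, ties are broken toward the lowest user and mode index, and step 18 (re-selecting each user's mode on its final RB set) is omitted.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def unified_greedy(
    weights: Sequence[float],
    rates: Sequence[Sequence[Sequence[float]]],
    max_users: int,
) -> Tuple[Dict[int, int], List[Optional[int]], List[float]]:
    """Greedy user/mode selection for the backlogged model.

    rates[u][m][n] is the expected rate r^m_{u,n} of Eq. (2).
    Returns (mode_of, owner, V): the chosen mode per scheduled user,
    the user owning each RB (None if unallocated), and the per-RB
    current value V(n).
    """
    num_users = len(weights)
    num_modes = len(rates[0])
    num_rbs = len(rates[0][0])

    V = [0.0] * num_rbs
    owner: List[Optional[int]] = [None] * num_rbs
    mode_of: Dict[int, int] = {}
    candidates = set(range(num_users))

    # Terminate when no candidates remain or the user limit is reached.
    while candidates and len(mode_of) < max_users:
        best: Optional[Tuple[float, int, int]] = None
        for u in sorted(candidates):
            for m in range(num_modes):
                # Eq. (5): total positive improvement over current values.
                gain = sum(
                    max(0.0, weights[u] * rates[u][m][n] - V[n])
                    for n in range(num_rbs)
                )
                if best is None or gain > best[0]:
                    best = (gain, u, m)

        gain, u, m = best
        if gain <= 0:
            break  # no remaining user improves any RB

        candidates.remove(u)
        mode_of[u] = m
        for n in range(num_rbs):
            v = weights[u] * rates[u][m][n]
            if v > V[n]:
                owner[n] = u  # RB may be taken from an earlier user
                V[n] = v
    return mode_of, owner, V


if __name__ == "__main__":
    w = [1.0, 1.0]
    r = [
        [[5, 1, 1], [2, 2, 2]],  # user 0: modes 0 and 1 over 3 RBs
        [[1, 4, 4], [3, 3, 3]],  # user 1
    ]
    print(unified_greedy(w, r, max_users=2))
```

In the small example, user 1 is selected first with gain 9, and user 0 then takes over the first RB, which shows how an RB allocation made in one iteration can be reassigned in a later one.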

We define a subset of RBs $R \triangleq \{1 \le n \le N : w_u r^{m}_{u,n} > V(n)\}$. Clearly there is no reason to allocate to user $u$ RBs outside of $R$: only on RBs $n$ such that $w_u r^{m}_{u,n} > V(n)$ is there a weighted rate increase (where $r^{m}_{u,n}$ is defined in Eq. (2)). For a given transmission mode $m$ (with MCSs $\phi_1$, $\phi_2$), the maximum number of bits carried by one RB is $T_{\phi_1} + T_{\phi_2}$. So if the queue size $Q_u$ of user $u$ satisfies $Q_u \ge |R|(T_{\phi_1} + T_{\phi_2})$, the problem is reduced to the backlogged case and we simply apply Eq. (7) and (5) to compute $v(u, n, m)$ and $g(u, m)$. If $Q_u < T_m |R|$, we need to find a subset of RBs in $R$ (as well as the bit allocations) that maximizes the total gain of the allocations.

We consider two possible scenarios. In the first scenario, only one codeword is allowed for the user (this can happen when either the user has only one receiving antenna or the channel condition of the user is too weak to support multiple layers). Note that even if there is only one codeword (and thus one layer), there are still multiple MIMO modes as the MIMO mode also includes the choice of the precoding matrix (precoding vector in this case) and the MIMO operational mode (transmit diversity or beamforming). In the second scenario, two codewords are allowed for the user. While the first scenario can be viewed as a special case of the second scenario where the second codeword takes a dummy value, the problem in the first scenario is easier and allows much more efficient (but still non-trivial) solutions.

1) One codeword: In the case that only one codeword is allowed, we ignore the terms related to the second codeword and omit the codeword index $i$ in $\phi_i$ and $t^i_n$ to simplify the notations. Then, given user $u$ and transmission mode $m$ with MCS $\phi(m)$, the sub-problem of bit-allocation can be formulated as

$$
\begin{aligned}
\max \quad & \sum_{n \in R} \max\left(0,\ w_u t_n \left(1 - p^{\phi}_{u,n}\right) - V(n)\right) \\
\text{s.t.} \quad & \sum_{n \in R} t_n \le Q_u \\
& 0 \le t_n \le T_{\phi}, \quad \text{for all } n \in R.
\end{aligned}
\tag{8}
$$

We note that $\phi = \phi(m)$ depends on $m$. After solving problem (8), we use the solution $\{t_n\}$ to obtain $v(u, n, m) = w_u t_n (1 - p^{\phi}_{u,n})$.

It may be tempting to provide simpler algorithms. For example, one simple solution is to sort the RBs in the decreasing order of the gain $\left(w_u T_{\phi} (1 - p^{\phi}_{u,n}) - V(n)\right)$ and then allocate bits in this order. In Appendix C, we show that this straightforward solution is not optimal. We now present the following optimal solution to problem (8) as shown in Algorithm 2.

Algorithm 2 Optimal Bit Allocation with One Codeword
1: if $Q_u \ge T_{\phi} |R|$ then
2:   Allocate all RBs in $R$ to the user $u$, stop.
3: else
4:   Let $k_1 \leftarrow \lfloor Q_u / T_{\phi} \rfloor$
5:   Sort RBs in $R$ in the decreasing order of the gain $w_u T_{\phi} (1 - p^{\phi}_{u,n}) - V(n)$. Assume RBs are sorted in this order hereafter.
6:   Allocate $T_{\phi}$ bits to each of the first $k_1$ RBs in $R$.
7:   $n_1 = \arg\max_{n \in R,\ n > k_1} \left[ w_u (Q_u - k_1 T_{\phi})(1 - p^{\phi}_{u,n}) - V(n) \right]$. Allocate RB $n_1$ with $Q_u - k_1 T_{\phi}$ bits if the gain on RB $n_1$ is positive.
8:   Denote $H_1$ as the allocation resulting from the previous two steps.
9:   $n_2 = \arg\max_{n \in R,\ n \le k_1 + 1} p^{\phi}_{u,n}$ and allocate RB $n_2$ with $Q_u - k_1 T_{\phi}$ bits if the gain on RB $n_2$ is positive.
10:   Allocate $T_{\phi}$ bits to each of the first $k_1 + 1$ RBs except $n_2$.
11:   Let $H_2$ denote the allocation resulting from the previous two steps.
12:   Select the allocation having a larger gain between $H_1$ and $H_2$.
13: end if

Proposition 1: Algorithm 2 finds the optimal solution to problem (8). Moreover, the found optimal solution has at most one partially allocated RB and if there is a partially allocated RB, it has the largest $p^{\phi}_{u,n}$ among all allocated RBs.

The proof is deferred to Appendix B and we comment that this proposition is important for the approximation guarantees of the algorithms to be established later.

Algorithm complexity: The major complexity of Algorithm 2 lies in sorting RBs in Step 5, which requires $O(N \log N)$ in the worst case. Since the greedy step will be executed $O(\bar{K}KM)$ times, the total complexity of Algorithm 1 for the finite queue model is $O(\bar{K}KMN \log N)$.

2) Two codewords: When there are two codewords, let $t^1_n$, $t^2_n$ be the number of bits allocated to the two codewords on RB $n$. Given user $u$ and transmission mode $m$ with MCS $\phi_1$ and $\phi_2$, and the block error probabilities $p^{\phi_1}_{u,n}$ and $p^{\phi_2}_{u,n}$ for the two codewords on RB $n$, the sub-problem of allocating bits to RBs in $R$ can be formulated as

$$
\begin{aligned}
\max \quad & \sum_{n \in R} \left[ w_u \sum_{i=1}^{2} t^{i}_{n} \left(1 - p^{\phi_i}_{u,n}\right) - V(n) \right]^{+} \\
\text{s.t.} \quad & \sum_{n \in R} \sum_{i=1}^{2} t^{i}_{n} \le Q_u \\
& 0 \le t^{i}_{n} \le T_{\phi_i}, \quad i = 1, 2, \quad \text{for all } n \in R,
\end{aligned}
\tag{9}
$$

where $[x]^{+}$ represents $\max(0, x)$. After solving problem (9), we let $v(u, n, m) = w_u \left( t^1_n (1 - p^{\phi_1}_{u,n}) + t^2_n (1 - p^{\phi_2}_{u,n}) \right)$.

We now develop the following optimal solution to problem (9) based on dynamic programming. Define $g_n(t)$ as the maximum gain with total $t$ bits for RBs 1 to $n$ of $R$ (without loss of generality, only RBs in $R$ are considered and indexed). Among total $t$ bits, let $s \le t$ bits be allocated to the $n$th RB, and $t - s$ bits be allocated to the first $n - 1$ RBs. The optimal way to allocate $s$ bits between the two codewords on RB $n$ is to first allocate to the codeword with a smaller block error probability until this codeword is full, and then to the other codeword.
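The dynamic-programming idea for problem (9) can be sketched in Python as follows. This is an illustrative sketch, not the paper's Algorithm 3: the names `rb_gain` and `allocate_bits` are hypothetical, bits are treated as integers, both codewords are assumed to use the same transport block sizes on every RB, and the recursion starts from an assumed base case $g_0(t) = 0$ instead of a separate initial condition for the first RB.

```python
from typing import List, Sequence, Tuple


def rb_gain(s: int, w: float, tbs: Sequence[int],
            bler: Sequence[float], v_cur: float) -> float:
    """Gain of placing s bits on one RB.

    The codeword with the smaller block error probability is filled
    first; any overflow goes to the other codeword. The result is
    clipped at zero, matching the [x]^+ operator of problem (9).
    """
    a, b = (0, 1) if bler[0] <= bler[1] else (1, 0)
    delivered = (min(s, tbs[a]) * (1.0 - bler[a])
                 + max(0, s - tbs[a]) * (1.0 - bler[b]))
    return max(0.0, w * delivered - v_cur)


def allocate_bits(queue: int, w: float, tbs: Sequence[int],
                  bler: Sequence[Sequence[float]],
                  values: Sequence[float]) -> Tuple[float, List[int]]:
    """Maximize the total gain over the RBs subject to the queue limit.

    bler[n] holds the two block error probabilities on RB n and
    values[n] is the current value V(n). Returns the best total gain
    and the number of bits placed on each RB.
    """
    cap = tbs[0] + tbs[1]          # most bits one RB can carry
    num_rbs = len(values)
    g_prev = [0.0] * (queue + 1)   # assumed base case: g_0(t) = 0
    choices: List[List[int]] = []

    for n in range(num_rbs):
        g_cur = [0.0] * (queue + 1)
        s_cur = [0] * (queue + 1)
        for t in range(queue + 1):
            best, best_s = -1.0, 0
            for s in range(min(t, cap) + 1):
                val = g_prev[t - s] + rb_gain(s, w, tbs, bler[n], values[n])
                if val > best:     # strict: ties keep the smaller s
                    best, best_s = val, s
            g_cur[t], s_cur[t] = best, best_s
        choices.append(s_cur)
        g_prev = g_cur

    # Backtrack from the last RB to recover the per-RB bit counts.
    bits = [0] * num_rbs
    t = queue
    for n in reversed(range(num_rbs)):
        bits[n] = choices[n][t]
        t -= bits[n]
    return g_prev[queue], bits


if __name__ == "__main__":
    # Two RBs, 2 bits per codeword, 5 queued bits, V = [0, 1].
    print(allocate_bits(5, 1.0, (2, 2), [(0.0, 0.5), (0.5, 0.5)], [0.0, 1.0]))
```

The triple loop mirrors the structure of the recursion over RBs, available bits, and bits placed on the current RB, so its cost grows with the queue size and the per-RB capacity; a practical scheduler would likely need coarser bit granularity than this sketch uses.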

which requires up to O(TΦ ) time. and n ru (s) = min(s. Algorithm complexity: The major complexity of Algorithm 3 lies in the recursive steps. a real-valued function f deﬁned on the subsets of Ω is called submodular if. and once tn > 0 are determined. (12) for 1 ≤ t ≤ Qu 5: for n = 1 to |R| do 6: for t = 1 to Qu do 7: Compute gn (t) and sn (t) using Eqs. Tφa ) bits are allocated to the codeword a where a = arg mini=1. B ⊆ Ω such that A ⊆ B. we present some basic known results on matroids and submodular functions. 2 Algorithm 3 requires O(N 2 TΦ ) time. The optimal gain of allocating s bits to the nth RB is n hn (s) [wu ru (s) − V (n)]+ where 0 ≤ s ≤ Tφ1 + Tφ2 . 15: end if The proof of the proposition is deferred to Appendix D. I) is called a partition matroid if there exist Ωi . Deﬁnition 1 Let Ω be a ﬁnite set and I a collection of subsets of Ω. where TΦ is the maximum transport block size in one codeword of one RB among all possible MCSs. and u. The system M = (Ω. Let a = argmini=1. Tφ1 + Tφ2 )). f (B ∪ {a}) − f (B) ≤ f (A ∪ {a}) − f (A). (b) For all F ⊆ Ω. u. shift the data from one codeword to another such that only one codeword becomes partially allocated. i = 1. · · · . Deﬁnition 2 Given a ﬁnite set Ω. stop. The initial condition is g1 (t) = h1 (min(t.n + [s − Tφa ] (1 − + pφb ). 0 < ti < Tφi for some RB n and codeword i).2 pφi . respectively. ∪i Ωi = Ω. A ⊆ Ω. a ∈ Ω − B.Tφ1 +Tφ2 ) where sn (t) is the optimal allocation on RB n when t bits are available for the ﬁrst n RBs in R.n codeword pairs that receive allocations. The problem of maximizing a non-decreasing submodular function over a matroid is to maximize {f (A) : A ∈ I} (13) smaller block error probability until this codeword is full. and A ∈ I if and only if |A ∩ Ωi | ≤ 1 for any i. A matroid (Ω. The total complexity of 2 ¯ Algorithm 1 in this case is then O(KKM N 2 TΦ ). 12: end for 13: Allocate tn bits to RB n for n = 1. 
[10] proposed the following greedy algorithm 1 to solve problem (13) and established a 2 -approximation guarantee for the greedy algorithm. Note that steps 10-13 use backtracking to ﬁnd the total optimal allocation tn on RB n. 8: end for 9: end for 10: for n = |R| to 1 do |R| 11: tn = sn (Qu − j=n+1 tj ). The optimality of Algorithm 3 in solving problem (9) follows from Bellman’s principle of optimality. we show that the optimal solution exhibits the following property. f (A ∪ {a}) ≥ f (A). · · · |R|.Tφ1 +Tφ2 ) max (gn−1 (t − s) + hn (s)) (10) (gn−1 (t − s) + hn (s)) (11) argmax 0≤s≤min(t.. .e. min(tn . k such that Ωi ∩Ωj = ∅ for any i. (10) and (11). Thus. 3: else 4: Compute g1 (t) using Eq. We next prove the approximation guarantee under the backlogged trafﬁc model. The members of I are called independent sets. Before showing our results. 7 Fisher. IV. and the remaining bits ([tn − u. E ∈ I} has the same cardinality. A PPROXIMATION G UARANTEE We show that the proposed uniﬁed algorithm (Algorithm 1) achieves 1/2-approximation of the optimal solution to problems (3) and (4) under the backlogged trafﬁc model and the ﬁnite queue model. 14: If more than one codeword in a RB is not fully allocated. n and 2) the partially allocated pair of codeword and RB has the largest block error probability (pφi ) among all RB and u. for all A. Algorithm 4 Greedy Algorithm on Matroid 1: Set A = ∅ 2: repeat 3: Let a = argmaxa∈Ω−A. et al. (12) The dynamic programming algorithm based on the recursive equation (10) and the initial condition (12) is illustrated in Algorithm 3. and then to the other codeword. f is called non-decreasing if for all a ∈ Ω. I) is called a matroid if (a) A ∈ I and B ⊆ A ⇒ B ∈ I. 
every maximal member of I(F ) = {E : E ⊆ F.Algorithm 3 Optimal Bit Allocation with Two Codewords 1: if Qu ≥ (Tφ1 + Tφ2 )|R| then 2: Allocate all RBs in R to the user u.n The recursive equation for gn (t) is then gn (t) = sn (t) = 0≤s≤min(t.n Tφa ]+ ) are allocated to the other codeword.n b = 3 − a. Proposition 2 There exists an optimal solution to problem (9) such that: 1) at most one codeword in one RB is partially allocated (i. Tφa )(1 − p φa ) u.A∪{a}∈I f (A ∪ {a}) − f (A). j. 4: Set A = A ∪ {a} 5: until no improvement can be made on f (A) 1 Lemma 1 Algorithm 4 achieves a 2 -approximation bound to Problem (13). Moreover.2 pφi .
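The greedy procedure of Algorithm 4 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `greedy_on_matroid`, the independence oracle `independent`, the toy rate table `RATES`, and the limit `K_BAR` are all hypothetical. The toy objective sums, over two RBs, the largest weighted rate among the selected (user, mode) pairs, and the independence test allows each user at most once and at most `K_BAR` users in total.

```python
from typing import Callable, Hashable, Iterable, Set


def greedy_on_matroid(
    ground: Iterable[Hashable],
    is_independent: Callable[[Set[Hashable]], bool],
    f: Callable[[Set[Hashable]], float],
) -> Set[Hashable]:
    """Repeatedly add the feasible element with the largest marginal gain."""
    elements = list(ground)
    chosen: Set[Hashable] = set()
    while True:
        base = f(chosen)
        best, best_gain = None, 0.0
        for a in elements:
            if a in chosen:
                continue
            candidate = chosen | {a}
            if not is_independent(candidate):
                continue
            gain = f(candidate) - base
            if gain > best_gain:  # keep only strict improvements
                best, best_gain = a, gain
        if best is None:  # no feasible element improves f: stop
            return chosen
        chosen.add(best)


# Hypothetical toy instance: weighted rates of (user, mode) pairs on two RBs.
RATES = {
    ("a", 0): [5.0, 1.0],
    ("a", 1): [3.0, 2.0],
    ("b", 0): [1.0, 4.0],
    ("b", 1): [2.0, 2.0],
}
K_BAR = 2  # assumed limit on the number of scheduled users


def independent(A: Set[Hashable]) -> bool:
    """At most K_BAR elements, and each user appears at most once."""
    users = [u for (u, _mode) in A]
    return len(A) <= K_BAR and len(users) == len(set(users))


def weighted_rate(A: Set[Hashable]) -> float:
    """Sum over RBs of the best weighted rate among the selected pairs."""
    if not A:
        return 0.0
    return sum(max(RATES[e][c] for e in A) for c in range(2))


schedule = greedy_on_matroid(RATES.keys(), independent, weighted_rate)
print(sorted(schedule), weighted_rate(schedule))
```

Each pass re-evaluates `f` for every feasible candidate, so the sketch favors clarity over speed; a practical scheduler would track per-RB maxima incrementally.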

Theorem 2 Under the backlogged traffic model, Algorithm 1 achieves 1/2-approximation of the optimal solution.

Proof: Our goal is to map the unified scheduling algorithm (Algorithm 1) to the above greedy algorithm (Algorithm 4) for maximizing submodular functions under a matroid constraint. In order to do so, we need to i) construct a matroid, ii) map the objective function in problem (3) to a non-decreasing submodular function on the constructed matroid, and iii) prove that the greedy steps in each iteration of the two algorithms are equivalent.

We define the set of users as $\mathcal{U}$, the set of transmission modes (including MCS and MIMO modes) as $\mathcal{M}$, and the set of all RBs as $\mathcal{C}$. We define the set $\Omega = \{(u, m) : u \in \mathcal{U}, m \in \mathcal{M}\}$, and define a set $A$ as an independent set iff i) $|A| \le \bar{K}$, and ii) for any two elements $(u_1, m_1)$ and $(u_2, m_2)$ in $A$, $u_1 \ne u_2$. The collection $\mathcal{I}$ is the set of all independent sets. It can be easily verified that the system $(\Omega, \mathcal{I})$ forms a matroid (in particular, a partition matroid). The objective function on each subset $A \subset \Omega$ is defined as

$$f(A) = \sum_{c \in \mathcal{C}} \max_{(u,m) \in A} w_u r^m_{u,c}. \tag{14}$$

It is obvious that the function $f$ is non-decreasing by its definition. We next show that the function $f$ defined above is submodular. For any $A, B \subseteq \Omega$, $A \subset B$, $(u_0, m_0) \in \Omega - B$,

$$\begin{aligned}
f(A \cup \{(u_0, m_0)\}) - f(A)
&= \sum_{c \in \mathcal{C}} \max_{(u,m) \in A \cup \{(u_0, m_0)\}} w_u r^m_{u,c} - \sum_{c \in \mathcal{C}} \max_{(u,m) \in A} w_u r^m_{u,c} \\
&= \sum_{c \in \mathcal{C}} \max\Bigl(0,\; w_{u_0} r^{m_0}_{u_0,c} - \max_{(u,m) \in A} w_u r^m_{u,c}\Bigr) \\
&\ge \sum_{c \in \mathcal{C}} \max\Bigl(0,\; w_{u_0} r^{m_0}_{u_0,c} - \max_{(u,m) \in B} w_u r^m_{u,c}\Bigr) \\
&= f(B \cup \{(u_0, m_0)\}) - f(B),
\end{aligned}$$

where the inequality is because $A \subset B$, and thus $\max_{(u,m) \in A} w_u r^m_{u,c} \le \max_{(u,m) \in B} w_u r^m_{u,c}$.

Finally, we need to verify that in each iteration of Algorithm 1, the selected $(u_0, m_0)$ maximizes the function $f(A \cup \{(u, m)\}) - f(A)$, where $A$ is the present selected set of users and transmission mode pairs. This is immediate by comparing Eqs. (14) and (5) and noting that $v(u, n, m) = w_u r^m_{u,n}$ and $V(n) = \max_{(u,m) \in A} w_u r^m_{u,n}$ from Eq. (6). Therefore, Algorithm 1 achieves 1/2-approximation of the optimal solution.

We now show that the same conclusion holds for the finite queue model. We first consider the case where one codeword is used.

Theorem 3 Under the finite queue model with one codeword per-user, Algorithm 1 along with Algorithm 2 achieves 1/2-approximation of the optimal solution.

Proof: Since the transmission mode $m$ uniquely determines the MCS $\phi_1$ of the only codeword on each RB $n$, we use $p^m_{u,n}$ to represent $p^{\phi_1(m)}_{u,n}$ and use $T_m$ to represent $T_{\phi_1}$. We define the complete set $\Omega = \{(u, m, C_u) : u \in \mathcal{U}, m \in \mathcal{M}, C_u \subseteq \mathcal{C}\}$. Define a set $A \subset \Omega$ as an independent set iff i) $|A| \le \bar{K}$, and ii) $u_1 \ne u_2$ for any two elements $(u_1, m_1, C_1)$ and $(u_2, m_2, C_2)$ in $A$. Then the members of the set $\mathcal{I}$ are all independent sets. It can easily be verified that the system defined by $(\Omega, \mathcal{I})$ is a partition matroid.

Let $A = \{(u_i, m_i, C_i) : 1 \le i \le |A|\}$. Each element $(u_i, m_i, C_i)$ in $A$ can be viewed as assigning a set of RBs $C_i$ to a user $u_i$ with transmission mode $m_i$. It should be noted that an independent set $A$ may not correspond to a RB assignment in the sense that two users in $A$ may have overlapping RBs.

Given an independent set $A$ (i.e., $A \in \mathcal{I}$), we define the following function $f$ over $A$. For a given element $(u, m, C_u) \in A$, we sort the RBs in $C_u$ in the increasing order of $p^m_{u,n}$ and let $\sigma_j$ be the index of the $j$th sorted RB in $C_u$. Then we let the number of bits allocated on RB $\sigma_j$ be

$$t^m_{u,\sigma_j} = \begin{cases} T_m & \text{for } j = 1, \cdots, k_1, \\ Q_u - k_1 T_m & \text{for } j = k_1 + 1, \\ 0 & \text{for } j > k_1 + 1, \end{cases} \tag{15}$$

where $k_1 = \lfloor Q_u / T_m \rfloor$. Note that $t^m_{u,n}$ only depends on $(u, m, C_u)$, not on the RB assignment of other users in $A$, thus it is well-defined. Define

$$f(A) \triangleq \sum_{n \in \mathcal{C}} \max_{(u,m,C_u) \in A,\, C_u \ni n} w_u t^m_{u,n} (1 - p^m_{u,n}). \tag{16}$$

The submodularity of the function $f(A)$ can be proved similarly as in Theorem 2. We consider the following problem of maximizing a submodular function over a matroid,

$$\max f(A) : A \in \mathcal{I}. \tag{17}$$

Lemma 2 The optimal value of $f(A)$ in Eq. (17) is equal to the optimal objective value in problem (4) with one-codeword model, which is proved in Appendix E.

Therefore, in order to solve problem (4), it is sufficient to find the optimal solution for Eq. (17). Since the latter is a submodular maximization problem over a matroid, we can apply the same greedy procedure as in Algorithm 4 to obtain a 1/2-approximation. It remains to be shown that the greedy step (step 3) in Algorithm 4, when applied to solve problem (17), is equivalent to Algorithm 2, which is the greedy step in Algorithm 1 for the finite queue model.

For the greedy step in Algorithm 4, we need to find $(u, m, C_u)$ that maximizes $f(A \cup \{(u, m, C_u)\}) - f(A)$. Although there are an exponential number of subsets $\{C_u\}$ of RBs for each user $u$, fortunately, we do not need to enumerate all the subsets to find the best element $(u, m, C_u)$. Instead, for each pair of $u, m$, we efficiently determine the optimal subset of RBs that achieves the maximum gain. In particular, we notice the following fact:

$$f(A \cup \{(u, m, C_u)\}) - f(A) = \sum_{n \in C_u} \max\bigl(0,\; w_u t^m_{u,n} (1 - p^m_{u,n}) - \tilde{V}(n)\bigr), \tag{18}$$

where

$$\tilde{V}(n) = \max_{(u,m,C_u) \in A,\, C_u \ni n} w_u t^m_{u,n} (1 - p^m_{u,n}). \tag{19}$$

To complete the argument, we prove by induction on the cardinality of $A$ that i) $\tilde{V}(n) = V(n)$ ($V(n)$ is defined as in Eq. (6) with the finite queue model), and ii) Algorithm 2 finds the optimal solution to problem (18).

In the initial step, $V(n) = 0$ and $A = \emptyset$, thus $\tilde{V}(n) = 0 = V(n)$ for all $n$. When $\tilde{V}(n) = 0$ for all $n$, Algorithm 2 reduces to sorting RBs in the increasing order of $p^m_{u,n}$, allocating $T_m$ bits to the first $k_1$ RBs, and allocating $Q_u - k_1 T_m$ bits to the $(k_1 + 1)$th RB. This also solves problem (18).

Now assume that both statements i) and ii) are true for $A$; we prove that they continue to hold after adding one more element $(u, m, C_u)$ to $A$. Let $V'(n)$ and $\tilde{V}'(n)$ be the values after adding the best element in the current iteration. We have

$$\begin{aligned}
V'(n) &= V(n) + \max\bigl(0,\; w_u t^m_{u,n} (1 - p^m_{u,n}) - V(n)\bigr) \\
&= \tilde{V}(n) + \max\bigl(0,\; w_u t^m_{u,n} (1 - p^m_{u,n}) - \tilde{V}(n)\bigr) \\
&= \tilde{V}'(n).
\end{aligned}$$

For statement ii), firstly, note that Algorithm 2 optimally solves problem (8), which is very similar to problem (18), except that the latter only needs to find the optimal subset $C_u$ and the bit allocation among $C_u$ is fixed (in the increasing order of $p^m_{u,n}$), while the former allows jointly optimizing both RB selections and bit allocations. Nevertheless, Proposition 1 shows that, in the optimal solution to problem (8), only one partially allocated RB can occur, which is the one with the largest $p^m_{u,n}$ among all allocated RBs. Therefore, the optimal bit allocation for problem (8) is identical to that for problem (18), and Algorithm 2 also finds the optimal solution to problem (18). Thus the induction step is proved, thereby completing the proof.

Finally, we provide the approximation results under the finite queue model with two codewords.

Theorem 4 Under the finite queue model with two codewords or mixed codewords (i.e., some users have one codeword and others have two codewords), Algorithm 1 along with Algorithm 2 and Algorithm 3 achieves 1/2-approximation of the optimal solution.

Proof: Without loss of generality, we only consider the case that all users are allowed to have two codewords. In the case of mixed codewords, those users with only one codeword available to allocate can be viewed as having two codewords where the second codeword always has zero expected rate on each RB (e.g., the per-RB block error probability of the second codeword $p^{\phi_2}_{u,n}$ is always 1).

We define the set $\Omega = \{(u, m(\phi_1, \phi_2), C_u) : u \in \mathcal{U}, m \in \mathcal{M}, C_u \subseteq \mathcal{C}\}$, where an element $(u, m(\phi_1, \phi_2), C_u)$ is viewed as assigning a set of RBs $C_u$ to a user $u$ with transmission mode $m$ which contains MCS $\phi_1$ and $\phi_2$ on the first and second codewords, respectively. We note that the transmission mode $m$ itself includes other components such as the MIMO operational mode (transmit diversity or spatial multiplexing), the transmission rank, and the precoding matrix. Define a set $A \subset \Omega$ as an independent set iff i) $|A| \le \bar{K}$, and ii) $u_1 \ne u_2$ for any two elements $(u_i, m_i(\phi_1, \phi_2), C_{u_i})$, $i = 1, 2$ in $A$. It can be verified that the system defined by $(\Omega, \mathcal{I})$ is a partition matroid. Let $A = \{(u_i, m_i(\phi_1, \phi_2), C_i) : 1 \le i \le |A|\}$.

Given an independent set $A$ (i.e., $A \in \mathcal{I}$), we define the following function $f$ on $A$. For a given element $(u, m(\phi_1, \phi_2), C_u) \in A$, we define the RB-codeword pair $(n, j)$ as the combination of the $j$th codeword and RB $n$, which has block error probability $p^{\phi_j}_{u,n}$ (note that this probability also depends on the transmission mode $m$). We sort the RB-codeword pairs in $C_u$ in the increasing order of $p^{\phi_j}_{u,n}$. Let $(n_l, j_l)$ be the RB and codeword pair with the $l$th smallest block error probability. Letting $t^j_{u,n}$ denote the number of bits allocated to the $j$th codeword on RB $n$, we have

$$t^{j_l}_{u,n_l} = \min\Bigl(T_{\phi_{j_l}},\; \Bigl[Q_u - \sum_{k=1}^{l-1} T_{\phi_{j_k}}\Bigr]^+\Bigr). \tag{20}$$

We then define

$$f(A) \triangleq \sum_{n \in \mathcal{C}} \max_{(u,m(\phi_1,\phi_2),C_u) \in A,\, C_u \ni n} w_u \bigl(t^1_{u,n} (1 - p^{\phi_1}_{u,n}) + t^2_{u,n} (1 - p^{\phi_2}_{u,n})\bigr).$$

We can prove the submodularity of the function $f(A)$ similarly as in Theorem 2. Now we consider the problem

$$\max f(A) : A \in \mathcal{I}. \tag{21}$$

Similar to Lemma 2, we can prove the following lemma.

Lemma 3 The optimal value of $f(A)$ in Eq. (21) is equal to the optimal objective value in problem (4) with the two-codeword model.

Therefore, to solve problem (4), we only need to solve the problem in Eq. (21), which is a submodular function maximization problem over a matroid. So we can apply the greedy procedure as in Algorithm 4 to obtain a 1/2-approximation solution. The greedy step in Algorithm 4 is to find $(u, m(\phi_1, \phi_2), C_u)$ to maximize

$$f(A \cup \{(u, m(\phi_1, \phi_2), C_u)\}) - f(A) = \sum_{n \in C_u} \max\Bigl(0,\; w_u \Bigl(\sum_{j=1}^{2} \hat{t}^j_{u,n} (1 - p^{\phi_j}_{u,n})\Bigr) - \hat{V}(n)\Bigr), \tag{22}$$

where $\hat{t}^j_{u,n}$ is the number of bits allocated to the $j$th codeword on RB $n$ and it is determined according to Eq. (20), and

$$\hat{V}(n) = \max_{(u,m(\phi_1,\phi_2),C_u) \in A,\, C_u \ni n} w_u \sum_{j=1}^{2} t^j_{u,n} (1 - p^{\phi_j}_{u,n}).$$

It remains to show that Algorithm 3 optimally solves the above problem in Eq. (22) for a given user $u$ (so that the greedy step needs to scan every user and find the optimal incremental gain). To do so, we prove the following lemma by induction on the cardinality of $A$.

Lemma 4 i) $\hat{V}(n) = V(n)$, where $V(n)$ is defined in Eq. (6); ii) Algorithm 3 finds the optimal solution to problem (22).

Thus, the greedy step of Algorithm 4 coincides with that of Algorithm 1, thus completing the proof.
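Since the two-codeword result rests on the dynamic program of Eqs. (10)-(12), a compact Python sketch of that recursion may help. It is a sketch under assumptions rather than the authors' code: the function name `two_codeword_dp` and its argument layout are hypothetical, bits are treated as integers, and the recursion is started from $g_0(t) = 0$, which yields the same $g_1(t)$ as the initial condition (12) because $h_1$ is non-decreasing.

```python
from typing import List, Sequence, Tuple


def two_codeword_dp(
    Q: int,
    T: Tuple[int, int],
    p: Sequence[Tuple[float, float]],
    V: Sequence[float],
    w: float = 1.0,
) -> Tuple[float, List[int]]:
    """Return (optimal gain, bits per RB) for queue size Q.

    T[i] is the capacity of codeword i on one RB, p[n][i] its block error
    probability on RB n, V[n] the weighted rate already achieved on RB n.
    """
    n_rb = len(p)
    cap = T[0] + T[1]

    def h(n: int, s: int) -> float:
        # Fill the codeword with the smaller error probability first.
        a = 0 if p[n][0] <= p[n][1] else 1
        b = 1 - a
        rate = min(s, T[a]) * (1.0 - p[n][a]) + max(s - T[a], 0) * (1.0 - p[n][b])
        return max(w * rate - V[n], 0.0)

    g_prev = [0.0] * (Q + 1)          # g_0(t) = 0 for every t
    choice: List[List[int]] = []      # choice[n][t] plays the role of s_n(t)
    for n in range(n_rb):
        g = [0.0] * (Q + 1)
        s_n = [0] * (Q + 1)
        for t in range(Q + 1):
            best, arg = -1.0, 0
            for s in range(min(t, cap) + 1):
                val = g_prev[t - s] + h(n, s)   # Eq. (10)
                if val > best:
                    best, arg = val, s          # Eq. (11)
            g[t], s_n[t] = best, arg
        choice.append(s_n)
        g_prev = g

    # Backtrack from the last RB to recover the per-RB bit totals.
    alloc = [0] * n_rb
    t = Q
    for n in reversed(range(n_rb)):
        alloc[n] = choice[n][t]
        t -= alloc[n]
    return g_prev[Q], alloc


gain, bits = two_codeword_dp(25, (10, 10), [(0.1, 0.5), (0.2, 0.3)], [0.0, 0.0])
print(gain, bits)
```

In this toy call the 25 queued bits fill the codewords in increasing order of block error probability, which is the ordering that Eq. (20) fixes in advance.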

The proof of Lemma 4 is deferred to Appendix F.

(1 − 1/e)-approximation algorithms: Since both problems (3) and (4) can be mapped to the problem of maximizing a submodular function over a matroid, we can apply a process called continuous greedy algorithm, developed in [18], [8] to solve the problem, which has a $(1 - 1/e)$-approximation guarantee. Nevertheless, the continuous greedy algorithm requires a complexity of $O((KM)^7)$ where $K$ is the number of users and $M$ is the number of transmission modes. In a typical LTE setup, $M$ is larger than 10000. Thus, we feel that this algorithm is too complex for practical implementations.

V. PERFORMANCE EVALUATION

An event-driven OFDMA-MIMO simulator written in C++ is developed to evaluate the proposed LTE MIMO scheduling algorithms. A single-cell OFDMA MIMO downlink system based on LTE is considered with cell radius varying from 500m to 2000m. Users are uniformly randomly distributed within the cell but at least 250m away from the BS. We consider a MIMO channel model incorporating path loss, lognormal shadowing, and multi-path Rayleigh fading. A Doppler fading equivalent to a velocity of 3 km/hour (ITU Pedestrian-B model) is assumed in the simulation. We consider an LTE subframe with 10 to 50 RBs and a MIMO system with 4 transmit antennas and 2 receiving antennas forming 1 or 2 layers depending on the channel conditions as well as the user capability. We assume that in each subframe, at most 10 users can be scheduled. We assume that 16 rank-1 and 16 rank-2 precoding matrices, as defined in the LTE standard, and 29 possible MCSs are available. The scheduling algorithm is performed on each subframe and the user weights are updated according to the proportional-fair algorithm between subframes. 20 random drops are simulated and each drop is executed for 500 subframes. The sector throughput and the cell-edge throughput in each drop are measured and the average values among the 20 drops are then reported, where the cell-edge throughput is defined as the 5th percentile of all user throughput. Unless otherwise specified, we consider an LTE downlink system with 20 users, 20 RBs, and cell radius 1500m.

We evaluate two possible scenarios. In the first scenario, only 1 layer is supported due to the limited user capability. Multiple rank-1 precoding matrices are available for scheduling in this case. In the second scenario, up to 2 layers are supported and the rank (i.e., number of layers) as well as the precoding matrix is determined by the scheduling algorithm.

Reference Algorithm: We adapted the Alg1 in [13] to consider the constraint of maximum user limit and multiple transmission modes. When the number of allocated users reaches the maximum limit, the rest of RBs are only allocated among the set of users who already receive allocations. Alg1 in [13] is designed for the backlogged case only. We also adapted it for the finite queue model as follows. The scheduling is first determined based on the backlogged model. If a user receives allocations but does not have enough traffic to fill the allocated RBs, the unused RBs are then re-scheduled to the remaining users, until all RBs are allocated or the number of scheduled users reaches the limit.

Performance upper bound: We also establish a performance upper bound under the backlogged traffic model by relaxing both the common transmission mode constraint and the user number limit. As a result, to obtain the upper bound, we simply find the best combination of user and transmission mode on each RB (in terms of weighted rates) and compute the received rates based on this allocation. We note that such an upper bound is not possible for the finite queue model, where even if the user number limit and the common transmission mode constraint are lifted, the problem (due to the finite queue constraint alone) is NP-hard and does not even permit $(1 - \delta)$-approximation algorithms for a small $\delta > 0$, as established in [5].

A. Backlogged traffic

Figure 1 shows the aggregate sector throughput and the cell-edge throughput, along with the corresponding upper bound, as a function of the number of users. The aggregate throughput gradually increases due to multi-user diversity, while the cell-edge throughput decreases steadily as the number of users sharing the bandwidth increases. We can see that i) using two layers can significantly boost the aggregate throughput (over 20% compared to using 1 layer), ii) the sector throughput improvement compared to reference Alg1 is 7-12% and the cell-edge improvement is mostly 5-10%, and iii) the proposed algorithm performs well within 10% of the upper bound in all cases for both the aggregate throughput and the cell-edge throughput, which is significantly better than the worst-case guarantee of half, as established in the analysis.

Fig. 1. Backlogged traffic: impact of # users. [Two panels: sector throughput (Mbps) and cell-edge throughput (Kbps) versus number of users per cell; curves for UB, Unified, and RefAlg1, each with 1 layer and 2 layers.]

Figure 2 shows the impact of the number of RBs (i.e., available bandwidth) on the system performance. It is clearly seen that both the aggregate and the cell-edge throughput increase linearly with respect to the available bandwidth. Compared to Alg1 in [13], our unified algorithm achieves about 10% performance gain for both the cell-edge throughput and the average throughput and the performance gain increases as the number of RBs increases. We also notice that when the number of RBs increases, the performance gap from the upper bound also increases. This is not surprising. When the number of RBs increases, the upper bound allows choosing the user and transmission mode independently on each RB and benefits more from the channel diversity. Thus, the increasing gap is more likely because the upper bound becomes looser. Nevertheless, even with 50 RBs, our unified algorithm still retains the performance gap within 10%.

Fig. 2. Backlogged traffic: impact of # RBs. [Two panels: sector throughput (Mbps) and cell-edge throughput (Kbps) versus number of RBs; curves for UB, Unified, and RefAlg1, each with 1 layer and 2 layers.]

Figure 3 shows the system performance vs. the cell radius. Again, we observe that our unified algorithm achieves an average improvement of 9% for the sector throughput and 6% for the cell-edge throughput, compared to the reference Alg1. It is interesting to observe that when the cell radius decreases, the gain of using 2-layer MIMO increases significantly. For example, when the cell radius is 500m, the sector throughput of using 2-layer MIMO is 76% more than that of using 1-layer MIMO, while for the cell radius of 2000m, the improvement is 24%. This is because when the cell radius decreases, the channel SNR increases and it is more beneficial to use spatial multiplexing in the high-SNR scenario.

Fig. 3. Backlogged traffic: impact of cell radius. [Two panels: sector throughput (Mbps) and cell-edge throughput (Kbps) versus cell radius (m); curves for UB, Unified, and RefAlg1, each with 1 layer and 2 layers.]

B. Finite queue traffic model

We now investigate the case of finite queue, where a fixed traffic arrival rate is employed at every user in each run. Figure 4 shows the throughput vs. the traffic arrival rate. Similar to the previous case, using 2-layer MIMO can improve the sector throughput by 20-30%, while the cell-edge throughput is nearly unaffected. In this case, we observe a much larger performance gain (on the sector throughput) of the unified algorithm compared to the Alg1 in [13], ranging from 9% to 31%, and the gain increases as the arrival rate decreases. Therefore, it is more beneficial to apply our unified algorithm in the more practical traffic model with finite queue length.

For our unified algorithm, the cell-edge user rate is initially flat, and then decreases after the arrival rate is more than 200 Kbps, while the aggregate user rate increases almost monotonically. This is because, once the cell is no longer under-loaded and the loading increases, interior users gradually take more resources, resulting in lower throughput for cell-edge users until a stable point is reached due to the proportional-fair-based weight update. In comparison, the reference scheme Alg1 always attempts to first allocate to users with higher weighted rate even if they only have a small amount of data. This results in a low cell-edge throughput even when the arrival rate is low. Thus, our algorithm can deliver more traffic for the cell-edge users.

Fig. 4. Finite queue: impact of arrival rate. [Two panels: sector throughput (Mbps) and cell-edge throughput (Kbps) versus arrival rate (Kbps); curves for Unified and RefAlg1, each with 1 layer and 2 layers.]

VI. CONCLUSION

In this paper, we have studied the LTE and LTE-A downlink scheduling problem with MIMO and several practical constraints. Both a backlogged traffic model and a finite queue model are considered for the problem. We prove that the problems are NP-hard and then develop a unified algorithm that can achieve 1/2-approximation for both the traffic models. We also show the existence of $(1 - 1/e)$-approximation polynomial-time schemes although they may not be amenable to practical implementations. For the future work, we plan to apply variations of these algorithms to solve the scheduling problems in other OFDMA-based systems such as WiMAX.

REFERENCES

[1] 3GPP. TR 36.913: Requirements for further advancements for E-UTRA (LTE-Advanced). http://www.3gpp.org/article/lte-advanced, 2010.
[2] 3GPP. UTRA-UTRAN long term evolution (LTE) and 3GPP system architecture evolution (SAE). http://www.3gpp.org/article/lte, 2005.
[3] M. Andrews, K. Kumaran, K. Ramanan, A. Stolyar, R. Vijayakumar, and P. Whiting. Scheduling in a queueing system with asynchronously varying service rates. Probability in the Engineering and Information Sciences, 18:191–217, 2004.
[4] M. Andrews, L. Qian, and A. Stolyar. Optimal utility based multiuser throughput allocation subject to throughput constraints. In IEEE Infocom, 2005.

[5] M. Andrews and L. Zhang. Scheduling algorithms for multi-carrier wireless data systems. In ACM Mobicom 2007, Sep. 2007.
[6] M. Assaad and A. Mourad. New frequency-time scheduling algorithms for 3GPP/LTE-like OFDMA air interface in the downlink. In IEEE Vehicular Technology Conference, May 2008.
[7] K. C. Beh, S. Armour, and A. Doufexi. Joint time-frequency domain proportional fair scheduler with HARQ for 3GPP LTE systems. In IEEE Vehicular Technology Conference, Sep. 2008.
[8] G. Calinescu, C. Chekuri, M. Pal, and J. Vondrak. Maximizing a monotone submodular function subject to a matroid constraint. To appear in SICOMP.
[9] N. Chen and S. Jordan. Downlink scheduling with probabilistic guarantees on short-term average throughputs. In IEEE Wireless Communications and Networking Conference (WCNC'08), Las Vegas, Nev., USA, March 2008.
[10] M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey. An analysis of approximations for maximizing submodular set functions-II. Mathematical Programming Study, July 1978.
[11] P. Kela, J. Puttonen, N. Kolehmainen, T. Ristaniemi, T. Henttonen, and M. Moisio. Dynamic packet scheduling performance in UTRA long term evolution downlink. In the 3rd International Symposium on Wireless Pervasive Computing (ISWPC'08), May 2008.
[12] R. Kwan, C. Leung, and J. Zhang. Multiuser scheduling on the downlink of an LTE cellular system. Research Letters in Communications, 2008.
[13] S. Lee, S. Choudhury, A. Khoshnevis, S. Xu, and S. Lu. Downlink MIMO with frequency-domain packet scheduling for 3GPP LTE. In IEEE Infocom, Brazil, 2009.
[14] A. Pokhariyal, G. Monghal, K. I. Pedersen, P. E. Mogensen, I. Z. Kovacs, C. Rosa, and T. E. Kolding. Frequency domain packet scheduling under fractional load for the UTRAN LTE downlink. In Proc. of the 65th IEEE Vehicular Technology Conference (VTC'07), pages 699–703, April 2007.
[15] B. Sadiq, S. Baek, and G. de Veciana. Delay-optimal opportunistic scheduling and approximations: the log rule. In IEEE Infocom, 2009.
[16] S. Shakkottai and A. Stolyar. Scheduling for multiple flows sharing a time-varying channel: the exponential rule. Analytic Methods in Applied Probability, 207:185–202, 2001.
[17] A. Stolyar. On the asymptotic optimality of the gradient scheduling algorithm for multiuser throughput allocation. Operations Research, 53(1), 2005.
[18] Jan Vondrak. Optimal approximation for the submodular welfare problem in the value oracle model. In the 40th annual ACM symposium on Theory of computing (STOC), 2008.

APPENDIX A
RANK-OVERRIDING IN LTE

In the LTE downlink with MIMO, when a user needs to report channel feedback, it reports a common transmission rank and precoding matrix for all subbands, along with a CQI value on each subband for every codeword. If the transmission rank is at least 2, the CQI values of two codewords are reported. For a user reported rank of $r$, which in LTE is limited to 4, the rank employed by the BS can vary from 1 to $r$ and the expected data rates for up to $r$ possible MIMO modes can be derived at the BS based on the user feedback report. The BS may choose to override the user recommended rank based on these derived expected data rates. Rank-override is particularly helpful when the assigned RBs are not in the set of preferred subbands reported by the user, and perhaps more so with a finite queue size.

Suppose that a user reports rank 4; then it will also report a rank-4 precoding matrix $W = [w_1, w_2, w_3, w_4]$ and 2 CQIs (for two codewords) on every subband (or for a subset of subbands it prefers). Note that since the user prefers a transmission rank of 4, it assumes a per-layer power of $P/4$ while computing its CQI, where $P$ is the total transmit power allocated on each RB. Using the user report, the base station (BS) can then compute a per-RB SINR estimate $\gamma_i$ for each layer $i : 1 \le i \le 4$ which corresponds to the $i$th column $w_i$ of $W$. If the BS decides to reduce the rank to 2, the nested structure of the LTE codebook ensures that at least one rank-2 column subset of $W$, say $W' = [w_1, w_2]$, is a valid rank-2 precoding matrix. Then the per-layer power that can be used by the BS is $P/2$ instead of $P/4$. Using $(4/2)\gamma_i$, $1 \le i \le 2$ as the new SINRs corresponding to the two columns of $W'$ is a good and conservative choice as it only considers the signal power increase (as a result of transmitting two layers instead of four) but not the decrease of interference from other layers.

APPENDIX B
PROOF OF PROPOSITION 1

We first prove the following lemma.

Lemma 5 There exists an optimal allocation such that only one RB is partially allocated.

The proof of the lemma is only sketched here. If two RBs are partially allocated, we can shift the allocation from one RB to the other. At least one way of shifting does not decrease the sum of gain. The shifting can continue until either one RB is empty or one RB is fully allocated. Thus it completes the proof.

Now assume that there is one partially allocated RB and the remaining RBs are either not allocated or allocated with full capacity. Therefore, if the partially allocated RB is determined, we just pick $k_1$ other RBs with the highest gain. The sorted index $n$ of the partially allocated RB can be either greater than $k_1$, which leads to the allocation $H_1$, or less than or equal to $k_1$, which results in the allocation $H_2$. The algorithm returns the better allocation between $H_1$ and $H_2$. Thus it finds the optimal solution. This also proves that the obtained solution contains at most one partially allocated RB.

To prove the last statement in the proposition, if the partially allocated RB does not have the largest $p^m_{u,n}$ among all RBs receiving allocations, we can switch the allocation between the partially allocated RB and the one having the largest $p^m_{u,n}$ and receiving full allocation. It is not hard to see that the switched allocation has a higher gain. This contradicts the optimality of the solution.

APPENDIX C
A NAIVE ALGORITHM FOR PROBLEM (8)

The problem appears simple and our first thought is to have the following algorithm.
Step 1, let $k_1 \leftarrow \lfloor Q_u / T_m \rfloor$.
Step 2, sort RBs in $\mathcal{R}$ in the decreasing order of $T_m (1 - p^m_{u,n}) - V(n)$.
Step 3, allocate $T_m$ bits to the first (sorted) $k_1$ RBs.
Step 4, allocate RB $k_1 + 1$ with $Q_u - k_1 T_m$ bits if the gain of the allocation is positive.

We now show, through an example, that this algorithm is not optimal. Let $T_m = 1000$, $p^m_1 = 0.1$, $p^m_2 = 0$, $p^m_3 = 0.3$, $V(1) = 399$, $V(2) = 0$, $V(3) = 200$, $Q = 1500$ (the user index $u$ is omitted).

According to the above algorithm, RBs 2 and 1 will be selected and allocated 1000 bits and 500 bits, respectively, resulting in a total gain of 1049. But the optimal solution is to allocate RBs 2 and 3 with 1000 bits and 500 bits, respectively. The optimal gain is 1150. The above example also shows that replacing step 2 of the naive algorithm with "sorting the RBs in the increasing order of $p^m$" does not work either.

APPENDIX D
PROOF OF PROPOSITION 2

Suppose the optimal solution contains two partially allocated RB-codewords, say $t^j_n$ and $t^i_l$ (the index $u$ is omitted). Assume that $p^{\phi_j}_n > p^{\phi_i}_l$. We assume that the gain on both RBs $n$ and $l$ is positive (otherwise, the allocation on one of them can be removed without any loss of the gain). We can shift the allocation from the $j$th codeword on RB $n$ to the $i$th codeword on RB $l$. Clearly, such a shift does not decrease the total gain. Thus, we can continue shifting until either the $j$th codeword on RB $n$ is empty or the $i$th codeword on RB $l$ becomes full. However, if the only partially allocated RB-codeword has a smaller block error probability $p^{\phi_i}$ than an RB-codeword receiving full allocation, we can shift the allocation from the latter to the former to obtain a higher gain. This completes the proof.

APPENDIX E
PROOF OF LEMMA 2

Let $OPT$ denote the optimal value of problem (4) with one codeword and $A^*$ denote the optimal solution to $f(A)$ in Eq. (17). Suppose that $x^{*m}_{u,n}$, $t^{*m}_{u,n}$ are the optimal solution to problem (4). We can construct a set $A$ from it such that $A$ does not contain overlapping RBs as follows. For each $u, m$ with $\sum_n x^{*m}_{u,n} > 0$, we define an element $(u, m, C_u) \in A$ such that $C_u = \{n : x^{*m}_{u,n} = 1\}$. Due to the first constraint in problem (4), one user can appear in $A$ at most once, thereby satisfying the independence constraint for the set $A$. Due to the second constraint in problem (4), the RBs allocated to different users have no overlap. Therefore, the function value $f(A)$ is exactly $OPT$. Thus, $OPT \le f(A^*)$.

Next, we consider the optimal solution $A^*$ to $f(A)$ in Eq. (17). Since it may contain overlapping RBs, it may not correspond to a valid RB allocation. However, we can convert it to a valid RB assignment (i.e., a solution to problem (4)) as follows. Let $A^* = \{(u_i, m_i, C_i),\ 1 \le i \le |A^*|\}$. For each RB $n$, $n = 1, \cdots, N$, if $n$ only belongs to one $C_i$, then we set $x^{m_i}_{u_i,n} = 1$. If $n$ belongs to multiple $C_i$s, we find $u_j = \arg\max_{u : C_u \ni n} w_u t^m_{u,n}(1 - p^m_{u,n})$ (ties are broken arbitrarily), where $t^m_{u,n}$ is defined as in Eq. (15), and set $x^{m_j}_{u_j,n} = 1$. All remaining $x^m_{u,n} = 0$. The resulting $x^m_{u,n}$ corresponds to a valid RB assignment (i.e., without overlapping). This coincides with the RB allocation strategy when computing $f(A)$. Therefore, we have $OPT \ge f(A^*)$. Finally, we conclude that $OPT = f(A^*)$.

APPENDIX F
PROOF OF LEMMA 4

In the initial step, $V(n) = 0$ and $A = \emptyset$, thus $\hat{V}(n) = 0 = V(n)$. Now assume that both statements i) and ii) hold for $A$; we prove that they continue to hold after adding a new element $(u, m(\phi_1, \phi_2), C_u)$. Let $\hat{V}'(n)$ and $V'(n)$ be the value on RB $n$ after adding the best element in the current iteration, respectively. We have

$$V'(n) = V(n) + \max\Big(0,\ w_u \sum_{j=1}^{2} t^j_{u,n}\big(1 - p^{\phi_j}_{u,n}\big) - V(n)\Big) = \hat{V}(n) + \max\Big(0,\ w_u \sum_{j=1}^{2} \hat{t}^j_{u,n}\big(1 - p^{\phi_j}_{u,n}\big) - \hat{V}(n)\Big) = \hat{V}'(n). \tag{23}$$

This completes the induction step.

For the second statement, it is sufficient to show that the optimal solutions to problems (9) and (22) (for a given user $u$) are equivalent, since Algorithm 3 solves problem (9) optimally. We note that problems (9) and (22) both require finding a subset of RB-codeword pairs and allocating $Q_u$ bits to them. The difference is that the bit allocation in (22) is fixed according to Eq. (20), while that in Eq. (9) is chosen optimally. Proposition 2 shows that in the optimal solution to problem (9), only the RB with the largest $p^m_{u,n}$ among all RBs receiving allocations may be partially allocated. For each set of RBs allocated to a user, bits are allocated in the increasing order of $p^{\phi_i}_{u,n}$, which is identical to that in Eq. (20). Therefore, they are equivalent. Therefore, by Proposition 1, both problems (9) and (22) can be optimally solved by 1) allocating $Q_u$ bits to all RB-codeword pairs in the increasing order of the block error probability $p^{\phi_i}_{u,n}$ for each user $u$, and 2) finding the user $u$ that has the largest gain. Since the bit allocation strategy is the same, the optimal solutions to both problems (9) and (22) are equivalent.
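The two mechanical steps used in these proofs can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the paper's Algorithm 3: the function names (`allocate_bits`, `expected_gain`, `resolve_overlaps`), the data layout, and all numbers are hypothetical. `allocate_bits` assumes the set of RB-codeword pairs for a user is already fixed and ignores the competing value $V(n)$; it fills pairs in increasing order of block error probability, so at most one pair is left partially allocated. `resolve_overlaps` mirrors the conversion in the proof of Lemma 2, giving each contested RB to the claimant with the largest weighted expected rate.

```python
from typing import Dict, List, Set, Tuple


def allocate_bits(queue_bits: float, capacity: List[float],
                  bler: List[float]) -> List[float]:
    """Fill RB-codeword pairs in increasing order of block error probability.

    capacity[k] is the most bits pair k can carry and bler[k] is its block
    error probability (hypothetical inputs). Only the last pair that is
    filled can end up partially allocated.
    """
    alloc = [0.0] * len(capacity)
    remaining = queue_bits
    for k in sorted(range(len(capacity)), key=lambda i: bler[i]):
        if remaining <= 0:
            break
        alloc[k] = min(capacity[k], remaining)
        remaining -= alloc[k]
    return alloc


def expected_gain(weight: float, alloc: List[float],
                  bler: List[float]) -> float:
    """Weighted expected number of delivered bits: w * sum_k t_k * (1 - p_k)."""
    return weight * sum(t * (1.0 - p) for t, p in zip(alloc, bler))


def resolve_overlaps(elements: List[Tuple[int, str, Set[int]]],
                     weight: Dict[int, float],
                     rate: Dict[Tuple[int, str, int], float],
                     bler: Dict[Tuple[int, str, int], float],
                     num_rbs: int) -> Dict[int, Tuple[int, str]]:
    """Turn a set of (user, mode, RB set) elements into a non-overlapping
    assignment: each RB goes to the claimant maximizing w_u * t * (1 - p)."""
    assignment: Dict[int, Tuple[int, str]] = {}
    for n in range(num_rbs):
        claimants = [(u, m) for (u, m, rbs) in elements if n in rbs]
        if claimants:
            assignment[n] = max(
                claimants,
                key=lambda um: weight[um[0]]
                * rate[(um[0], um[1], n)]
                * (1.0 - bler[(um[0], um[1], n)]),
            )
    return assignment


# Hypothetical example: 1500 queued bits, three pairs of 1000 bits each.
alloc = allocate_bits(1500, [1000, 1000, 1000], [0.3, 0.1, 0.2])
gain = expected_gain(1.0, alloc, [0.3, 0.1, 0.2])

# Hypothetical example: two users both claim RB 1.
elements = [(0, "a", {0, 1}), (1, "b", {1, 2})]
weights = {0: 1.0, 1: 2.0}
rates = {(u, m, n): 100.0 for (u, m, rbs) in elements for n in rbs}
blers = {(u, m, n): 0.1 for (u, m, rbs) in elements for n in rbs}
assignment = resolve_overlaps(elements, weights, rates, blers, num_rbs=3)
```

In the first example the pair with the lowest error probability is filled completely and the next one receives the remaining 500 bits, leaving the least reliable pair empty. In the second example the contested RB 1 goes to user 1, whose weight is larger.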