Abstract—One of the significant challenges for managing machine-to-machine (M2M) communication in cellular networks, such as LTE-A, is the overload of the radio access network due to very many machine type communication devices (MTCDs) requesting access in burst traffic. This problem can be addressed well by applying an access class barring (ACB) mechanism to regulate the number of MTCDs simultaneously participating in random access (RA). In this regard, here we present a novel deep reinforcement learning algorithm, first for dynamically adjusting the ACB factor in a uniform priority network. The algorithm is then further enhanced to accommodate heterogeneous MTCDs with different quality of service (QoS) requirements. Simulation results show that the ACB factor controlled by the proposed algorithm coincides with the theoretical optimum in a uniform priority network, and achieves higher access probability, as well as lower delay, for each priority class when there are heterogeneous QoS requirements.

I. INTRODUCTION

Machine-to-machine (M2M) communications, also known as Machine-Type Communications (MTC), feature a wide range and large number of autonomous devices communicating with little or no human intervention, and are an essential part of the internet of things (IoT) and future wireless systems [1]. To satisfy versatile massive M2M traffic characteristics, from best effort applications like water/gas metering systems and environmental monitoring, to ultra-reliable ones such as healthcare, public safety and mission-critical industry, enhancing cellular networks is a promising means to accommodate such heterogeneous requirements. However, as current LTE-A networks have predominantly been used to support Human-to-Human (H2H) communications, there are problems for MTCDs accessing LTE networks. Due to an anticipated massive number of MTCDs, each carrying a small amount of data to be transmitted, simultaneous transmission attempts from these MTCDs require large-scale synchronization and will result in data traffic flow congestion in the radio access network (RAN) [2], [3]. M2M communications can cause signaling congestion in several ways: an external event can trigger very many MTCDs to become active from an idle state and access the network at once, or a massive number of scheduled MTCDs can periodically request network access to report data.

Each MTCD performs a random access (RA) procedure for initial uplink access and to synchronize with the base station (BS/eNodeB) in LTE-A. Recent studies have shown that the actual RA procedure is not efficient for managing massive simultaneous access, as the physical random access channel (PRACH) suffers from a large number of MTCDs competing for resources [4], [5]. Consequently, several methods have been proposed to control the congestion and provide better network performance.

In LTE-A, 3GPP suggests the use of access class barring (ACB), and subsequently extended access barring (EAB), for congestion control in the PRACH. ACB is a probability-based solution that limits the number of MTCDs simultaneously requesting network access and performing RA. The optimal value of the ACB factor that best reduces congestion and access delay, assuming the BS knows the number of backlogged users, was determined in [6]. In [7], the effectiveness of an ACB method in highly congested environments was evaluated according to key performance indicators such as delay and energy consumption. A Markov-chain-based traffic-load estimation scheme according to network collision status was developed in [8]. In [9], the authors proposed a dynamic RACH preamble allocation scheme based on the ACB factor.

As MTCDs have a diverse range of applications, quality of service (QoS) requirements in M2M communications are highly variable. Therefore, meeting the growing range of QoS requirements for MTC devices is an urgent area for research. However, little attention has been given to QoS provisions where MTCDs with different requirements are treated accordingly. QoS requirements for various M2M services include delay requirements as a primary concern. As MTCDs can have various QoS requirements, a multiple access class barring (MACB) scheme is proposed in [10], which assigns distinct access probabilities to different classes of MTCDs requiring different service levels. Similar mechanisms are presented in [11] and [12], but the parameters are adjusted according to estimated traffic analysed with partial information of the network.

Reinforcement learning (RL) is a type of machine learning technique that mimics the fundamental way in which humans learn. A reinforcement learning agent evolves by interacting with the environment through observing it, taking actions and receiving immediate reward feedback, while the goal of the agent is to select actions that maximize cumulative future rewards. Deep reinforcement learning (DRL) is an enhanced version of RL that uses a deep neural network (DNN) to approximate the action-value function.
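As a concrete illustration of the probability-based barring just described, one per-device ACB check before a random access opportunity can be sketched as follows. This is a minimal sketch, not the paper's implementation: `acb_check` is a name chosen here, and the exponential barring-time formula mirrors the form used in the paper's simulation configuration.

```python
import random

def acb_check(p_acb, n_barring=0, rng=random):
    """One ACB check before a random access opportunity (RAO).

    The device draws q uniformly from [0, 1) and may transmit its
    preamble only if q < P_ACB. Otherwise it stays silent and backs
    off for an exponentially growing barring time,
    T_barring = (0.7 + 0.6 * rand) * 2**N_barring,
    where N_barring counts previously failed ACB checks.
    Returns (may_transmit, barring_time).
    """
    if rng.random() < p_acb:
        return True, 0.0            # passed: transmit in this RAO
    t_barring = (0.7 + 0.6 * rng.random()) * 2 ** n_barring
    return False, t_barring         # barred: wait before retrying
```

With `p_acb = 1.0` every device transmits; lowering the factor thins out contention on the PRACH at the cost of barring delay, which is exactly the trade-off the ACB factor controls.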
TABLE I. Traffic models

Characteristics     Traffic Model 1                  Traffic Model 2
Number of MTCDs     1000, 3000, 5000, 10000, 30000   1000, 3000, 5000, 10000, 30000
Arrival dist.       Uniform dist. over T             Beta(3,4) dist. over T
Dist. period (T)    60 seconds                       10 seconds

TABLE II. MTCD priority classes

Class name       Application example     QoS requirement
High priority    Seismic alarm/E-care    extremely strict delay
Low priority     Consumer electronics    medium delay
Scheduled        Smart meters            delay tolerant
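The two activation patterns in Table I can be sampled directly. The sketch below is illustrative only (`activation_times` is a helper name chosen here, not from the paper); it draws activation instants for either traffic model using the standard library.

```python
import random

def activation_times(n_devices, model, T, rng=random):
    """Sample device activation instants per the 3GPP TR 37.868 traffic models.

    Model 1: activations uniform over the distribution period T
             (non-synchronized access).
    Model 2: activations follow a Beta(3, 4) distribution over T,
             modelling a synchronized burst (e.g., after an alarm event).
    """
    if model == 1:
        return [rng.uniform(0, T) for _ in range(n_devices)]
    # random.betavariate draws from Beta(alpha, beta) on [0, 1]; scale to [0, T]
    return [T * rng.betavariate(3, 4) for _ in range(n_devices)]

random.seed(0)
burst = activation_times(30000, model=2, T=10)
# Beta(3, 4) has mean 3/7, so the burst clusters around t ≈ 4.3 s of the 10 s window
```

Plotting a histogram of `burst` reproduces the single sharp arrival peak that makes traffic model 2 the stress case for the PRACH.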
C. Network Configuration

To evaluate congestion solution proposals, 3GPP TR 37.868 [15] defines two different traffic models (see Table I) for the evaluation of network performance with M2M communications. This gives examples of how each type of MTCD reestablishes connections with the eNB in typical scenarios. Traffic model 1 can be viewed as a typical scenario in which M2M devices access the network uniformly over a period of time, i.e., in a non-synchronized manner. Traffic model 2 can be considered as an extreme scenario in which a large number of M2M devices access the network in a highly synchronized manner, e.g., after an application alarm that activates them.

In traffic model 2, each MTC device is activated at time 0 ≤ t ≤ T with probability g(t), following a beta distribution with parameters α = 3, β = 4:

    g(t) = t^(α−1) (T − t)^(β−1) / (T^(α+β−1) B(α, β))        (1)

where B(·) is the beta function.

According to the RA procedure, the eNB does not know MTCD identifiers until msg3's are successfully received. Thus, to allocate distinguishing RA resources with respect to different QoS requirements, we classify MTCDs into 3 categories: high priority, low priority and scheduled, as shown in Table II, and assign them different resources through separate ACB factors.

The high priority category includes public safety devices, healthcare applications, etc., whose traffic features a low frequency of occurrence, extremely short delay constraints and high channel access success rate requirements. This type of traffic is best represented by traffic model 2, in which an unpredicted event may trigger thousands of MTCDs.

The low priority category includes consumer electronics, factory management sensors, etc., which feature looser delay constraints and medium channel access success rate requirements. The activating distribution can be represented by traffic model 1, in which the devices are uniformly activated without a burst.

The scheduled priority category contains delay tolerant devices such as smart meters. A large number of these devices report data periodically to the eNB during a short period, e.g., every half hour, and this burst of MTC traffic is the main factor causing the RACH overload. They fit into traffic model 2, in which tens of thousands of them are periodically activated at the same time.

Network performance is evaluated in a single cell environment, where the high priority, low priority and scheduled categories co-exist in the cell. Thus the network is subjected to different access intensities: for the high priority category, we consider that each event-triggered burst contains 10,000 MTCDs, activated in a Beta(3, 4) distribution over 10 s; for the low priority category, we consider 4,000 MTCDs whose access attempts are distributed uniformly over time, with an arrival rate of 400 per second; scheduled devices follow traffic model 2, and each scheduled burst contains 30,000 MTCDs, activated in a Beta(3, 4) distribution over 10 s. In the simulation period, high priority burst traffic seldom happens, while scheduled priority MTCDs cause periodic burst traffic more often.

There is 1 RAO every 5 ms and M = 54 out of the 64 available preambles are used for contention-based RA. Under these conditions, the system offers 200 RAOs per second. preambleTransMax, the maximum allowable number of msg1 transmissions, is set to 10. The eNB broadcasts an ACB factor P_ACB as part of the system information before each random access opportunity (RAO). In each random access channel, an MTC device which has not yet connected to the network generates a random number q ∈ [0, 1]. If q < P_ACB, then the requested packet will be sent. Otherwise, the MTC device stays silent and waits for T_barring until the next RAO, in which both the new activations in the next slot and the backlogged users will perform an ACB check before transmission. We apply an exponential barring time, T_barring = (0.7 + 0.6 × rand) × 2^N_barring, where rand = U[0, 1) and N_barring is the number of failed ACB checks. When the RA attempt of a UE fails, we also apply an exponential backoff policy, where the backoff time depends on the number of preamble transmissions P_t attempted previously, as T_BO = U(0, 10 × 2^(P_t − 1)). When P_t > 10, the network is declared unavailable by the UE and the problem is reported to the upper layer.

We are interested in estimating the total time it takes for the eNodeB to collect each user's data. If a preamble is successfully transmitted, the actual user data will then be transmitted without contention on the PUSCH via scheduled transmissions that take a fixed time. Therefore, the time for all the MTC devices to successfully transmit Step 1 preamble sequences dominates the total delay. We define the delay D_i as the total number of random access opportunities before MTCD i is successfully connected to the network after the RA procedure.

III. DEEP REINFORCEMENT LEARNING BASED ACB FACTOR CONTROLLING ALGORITHM

A. Optimising single ACB factor in uniform priority network

In this section, we present the DRL-based ACB factor controlling algorithm with respect to uniform priority. In networks where the QoS of MTCDs is not an important concern, or broadcasting multiple ACB factors is impractical, we implement the DRL-based algorithm with a single ACB factor. As all MTCDs
in the cell are uniformly treated, the optimization problem is simply to maximize the total number of MTCDs successful in RA over a particular period.

If the system knows the number of request packets waiting to be transmitted in each RAO, then the theoretical optimal ACB factor P*_ACB can be numerically derived to maximize the number of MTCDs successful in each RA procedure [6]. The expected number of successful preamble transmissions K when N MTCDs are prepared to request network access is:

    E[K | N = n] = n P_ACB (1 − P_ACB / M)^(n−1)        (2)

where P_ACB is the ACB factor, M is the number of available preambles and n represents the number of request packets waiting to be transmitted in the current RAO. By taking the derivative of (2), the maximum expected K is achieved when P*_ACB = min(1, M/n).

Remark 1: The theoretical optimal ACB factor with respect to M available preambles and n access requests is P*_ACB = min(1, M/n).

However, in practice, the eNB cannot acquire the exact number of MTCDs requesting packet transmission in each RAO. The information it has is limited to the number of successful transmissions and the number of preambles collided, as well as the time each received packet has backed off, obtained by examining those MTCDs successful in finishing RA. Thus, the eNB can only estimate the upcoming traffic based on such limited information. Moreover, there is an inherent trade-off in choosing the ACB factor P_ACB. When P_ACB is too large, there will be many preambles transmitted in each RAO, and there will be a large number of collisions on most of the preambles. On the other hand, when P_ACB is too small, very few users will be able to pass the ACB check and transmit their preambles, resulting in fewer collisions, but then the network resources are under-utilised.

Therefore, we present a deep reinforcement learning (DRL) algorithm, which learns from experience and chooses the best action according to its estimated future reward. We first present a generalized form of DRL. DRL comprises two phases: an offline DNN training phase and an online reinforcement learning phase. The offline training phase takes in data observed from randomly chosen actions and trains the DNN to fit the correlation between a state-action pair (s, a) and the corresponding value function Q(s, a), which represents the expected cumulative reward with discount of staying in state s and taking action a. The value function Q(s, a) is given as:

    Q(s, a) = E[ Σ_{k≥0} μ^k r_k | s_0 = s, a_0 = a ]        (3)

where μ is the reward discount factor and r_k is the immediate reward at step k. Randomly sampling from the experience memory in this offline procedure can smooth out learning and avoid oscillations or divergence in parameters.

Then, based on the offline-built DNN, deep Q-learning is adopted to further improve the online control of the dynamic ACB factor. In each decision epoch t_k, the DRL agent derives the estimated Q value from the DNN with the input of the current state s_k and each available action a_k. Then we apply the ε-greedy policy to select the execution action a_k. More specifically, with probability (1 − ε) we follow the greedy policy and select the action with the highest Q value, and with probability ε we select a random action.

After taking an action, the DRL agent observes another experience e_k and stores it into the experience memory D. After that, the DRL agent updates the weight parameters θ of the DNN with N_B samples from the experience memory D every T epochs to avoid oscillation. In our implementation, for the DNN construction, we employ a feed-forward neural network that has one hidden layer of fully-connected units with 10 neurons. We set the mini-batch capacity N_B = 32 and the reward discount factor μ = 0.9.

Since DRL employs a DNN as a function approximator for the action-value, it can deal with a large or continuous state space and action space, which is very suitable for continuous management of the dynamic ACB factor. The state space, action space and reward function are defined as follows:

State Space: The state space of the DRL agent consists of 4 components: the number of MTCDs that successfully accessed the network through an RA procedure during the last RAO, N_s; the number of preambles detected as collided during the last RAO, N_c; the average delay D_avg of successfully accessed MTCDs during the last RAO; and the currently broadcast ACB rate P_ACB.

    s = {N_s, N_c, D_avg, P_ACB}        (4)

Action space: The action space of the DRL agent is the ACB factor P'_ACB to be broadcast prior to the upcoming RAO.

    a = P'_ACB        (5)

Reward: The reward needs to represent the objective of the algorithm, which is maximizing the total number of MTCDs successful in RA procedures in each RAO. Thus, we define the immediate reward that the DRL agent receives as the number of successful accesses after performing action a in the upcoming RAO, N'_s.

The complete DRL-based ACB factor controlling algorithm is presented in Algorithm 1.
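Remark 1's closed form can be checked numerically against Equation (2). The sketch below is a verification aid, not part of the paper's algorithm: it assumes the paper's M = 54 preambles, and `expected_success` / `optimal_acb` are names chosen here.

```python
def expected_success(n, p, m=54):
    """E[K | N = n] = n * p * (1 - p/M)**(n-1), from Equation (2)."""
    return n * p * (1 - p / m) ** (n - 1)

def optimal_acb(n, m=54):
    """Remark 1: P*_ACB = min(1, M/n)."""
    return min(1.0, m / n)

# Grid-search P_ACB for a congested RAO with n = 500 backlogged requests.
n, m = 500, 54
grid = [i / 10000 for i in range(1, 10001)]
p_best = max(grid, key=lambda p: expected_success(n, p, m))
# p_best lands at ~0.108, matching min(1, 54/500) = 0.108
```

At that optimum roughly M/e ≈ 20 of the 54 preambles succeed per RAO, which is why an agent that merely tracks min(1, M/n) is the natural benchmark for the DRL controller.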
where N_s = {N_s^1, N_s^2, N_s^3} represents the success number of each category, and D_avg = {D_avg^1, D_avg^2, D_avg^3} represents the average delay of each category.

Action space: We let the DRL agent make decisions on the ACB factors P_ACB^1, P_ACB^2, P_ACB^3 simultaneously. Specifically, the action space of the DRL agent is these 3 ACB rates to be broadcast prior to the upcoming RAO.

    a = {P_ACB^1', P_ACB^2', P_ACB^3'}        (8)

[Figure: ACB factor P_ACB versus random access opportunity (RAO), comparing the DRL-based algorithm, the theoretical optimum and dynamic ACB.]
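To see how distinct broadcast factors differentiate the classes, one RAO's worth of per-class ACB checks can be sketched as follows. The factor values below are illustrative only, not taken from the paper, and `passed_per_class` is a helper name chosen here.

```python
import random

# Illustrative per-class ACB factors (hypothetical values): the high
# priority class is barred least, the bursty scheduled class most.
P_ACB = {"high": 0.9, "low": 0.3, "scheduled": 0.05}

def passed_per_class(backlog, p_acb, rng=random):
    """Count, per class, how many backlogged MTCDs pass the ACB check
    in one RAO when each class uses its own broadcast factor."""
    return {c: sum(rng.random() < p_acb[c] for _ in range(n))
            for c, n in backlog.items()}

random.seed(0)
counts = passed_per_class({"high": 1000, "low": 1000, "scheduled": 1000}, P_ACB)
# high-priority devices pass far more often than scheduled ones
```

Under equal backlogs, the pass counts order themselves by class priority, which is the mechanism behind the per-class delay differences reported in the results.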
[Fig. 3 plot: per-RAO device counts versus random access opportunity (RAO).]
Fig. 3. Temporal distribution of MTCD activation, access and success number when the DRL algorithm is activated.

We consider the following metrics to evaluate the performance of the proposed algorithm. Access delay: the time delay between the first activation and the completion of the RA procedure for MTCDs that successfully access the network. Access success probability, P_s: the probability of successfully completing the random access procedure within the maximum allowable access delay.

The simulation results are shown in Table III, where the average delay is given in milliseconds. By applying the DRL-based ACB controlling algorithm, the success probability is very high in all MTCD classes, and when compared with the prioritized RA scheme [16], the DRL algorithm achieves both a higher success rate and a lower average delay.

To illustrate the effect of the DRL-ACB algorithm with heterogeneous classes, we obtain the newly activated number of both high priority class and scheduled class MTCDs, the number of initial accesses of each category, as well as their respective successful access number in each RAO. The results are shown in Fig. 3, where we observe that the dynamic ACB factors applied to different classes cause different probabilities of access. The high priority class receives a higher ACB factor, allowing it to succeed faster in the RA procedure, while when scheduled MTCDs are activated, some of them are rejected due to a low ACB factor, ensuring better performance for higher priority MTCDs. When the number of scheduled MTCDs is even greater, causing overload of the RA channel, the ACB factor is properly adjusted to avoid the majority of preambles colliding.

V. CONCLUSIONS

In this paper, the random access network (RAN) overload issue in cellular networks, such as LTE-A, has been addressed. […] simultaneously satisfy distinct requirements.

REFERENCES

[1] Z. Dawy, W. Saad, A. Ghosh, J. G. Andrews, and E. Yaacoub, "Toward massive machine type cellular communications," IEEE Wireless Communications, vol. 24, no. 1, pp. 120–128, 2017.
[2] H. S. Dhillon, H. Huang, and H. Viswanathan, "Wide-area wireless communication challenges for the Internet of Things," IEEE Communications Magazine, vol. 55, no. 2, pp. 168–174, 2017.
[3] S.-Y. Lien, K.-C. Chen, and Y. Lin, "Toward ubiquitous massive accesses in 3GPP machine-to-machine communications," IEEE Communications Magazine, vol. 49, no. 4, 2011.
[4] L. Ferdouse, A. Anpalagan, and S. Misra, "Congestion and overload control techniques in massive M2M systems: a survey," Transactions on Emerging Telecommunications Technologies, vol. 28, no. 2, 2017.
[5] A. Laya, L. Alonso, and J. Alonso-Zarate, "Is the random access channel of LTE and LTE-A suitable for M2M communications? A survey of alternatives," IEEE Communications Surveys and Tutorials, vol. 16, no. 1, pp. 4–16, 2014.
[6] S. Duan, V. Shah-Mansouri, and V. W. Wong, "Dynamic access class barring for M2M communications in LTE networks," in Globecom Workshops (GC Wkshps), 2013 IEEE, pp. 4747–4752, IEEE, 2013.
[7] I. Leyva-Mayorga, L. Tello-Oquendo, V. Pla, J. Martinez-Bauset, and V. Casares-Giner, "Performance analysis of access class barring for handling massive M2M traffic in LTE-A networks," in Communications (ICC), 2016 IEEE International Conference on, pp. 1–6, IEEE, 2016.
[8] H. He, Q. Du, H. Song, W. Li, Y. Wang, and P. Ren, "Traffic-aware ACB scheme for massive access in machine-to-machine networks," in Communications (ICC), 2015 IEEE International Conference on, pp. 617–622, IEEE, 2015.
[9] H.-Y. Hwang, S.-M. Oh, C. Lee, J. H. Kim, and J. Shin, "Dynamic RACH preamble allocation scheme," in Information and Communication Technology Convergence (ICTC), 2015 International Conference on, pp. 770–772, IEEE, 2015.
[10] N. Zangar, S. Gharbi, and M. Abdennebi, "Service differentiation strategy based on MACB factor for M2M communications in LTE-A networks," in Consumer Communications & Networking Conference (CCNC), 2016 13th IEEE Annual, pp. 693–698, IEEE, 2016.
[11] U. Phuyal, A. T. Koc, M.-H. Fong, and R. Vannithamby, "Controlling access overload and signaling congestion in M2M networks," in Signals, Systems and Computers (ASILOMAR), 2012 Conference Record of the Forty Sixth Asilomar Conference on, pp. 591–595, IEEE, 2012.
[12] N. Li, C. Cao, and C. Wang, "Dynamic resource allocation and access class barring scheme for delay-sensitive devices in machine to machine (M2M) communications," Sensors, vol. 17, no. 6, p. 1407, 2017.
[13] V. Mnih, K. Kavukcuoglu, D. Silver, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[14] D. Silver, A. Huang, C. J. Maddison, et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484–489, 2016.
[15] 3GPP, "TR 37.868: Study on RAN improvements for machine type communications."
[16] J.-P. Cheng, C.-h. Lee, and T.-M. Lin, "Prioritized random access with dynamic access barring for RAN overload in 3GPP LTE-A networks," in GLOBECOM Workshops (GC Wkshps), 2011 IEEE, pp. 368–372, IEEE, 2011.