Access control in NB-IoT networks: a deep reinforcement learning strategy

(1) Access control in NB-IoT networks: a deep reinforcement learning strategy. Yassine Hadjadj-Aoul. To cite this version: Yassine Hadjadj-Aoul. Access control in NB-IoT networks: a deep reinforcement learning strategy. GDR ARC - Session Automation and Communication Networks, Nov 2020, Virtual, France. hal-03135194. HAL Id: hal-03135194. https://hal.inria.fr/hal-03135194. Submitted on 8 Feb 2021.

(2) ACCESS CONTROL IN NB-IOT NETWORKS: A DEEP REINFORCEMENT LEARNING STRATEGY. Yassine Hadjadj-Aoul, Associate professor, Univ Rennes, IRISA/INRIA, Dionysos team-project. Link for questions: shorturl.at/prNW5. Thursday, November 26th, 2020.

(3) PLAN. Introduction; Access overview of IoT devices; A model for the access; Efficient support of a massive number of IoT devices using reinforcement learning; Conclusions.

(4) INTRODUCTION. Massive access of IoT devices.

(5) THE IOT IS GOING TO BE BIG, THOUGH NOBODY REALLY KNOWS HOW BIG …
[Infographic of industry forecasts, with sources dated January 2013, May 2015 and November 2018: 28.1 billion units by 2020; 25 billion units by 2021; 25 billion M2M connections by 2022, of which 2.6 billion are cellular; $200 billion and $1.7 trillion quoted for service revenues in 2020 and global solution revenues by 2020; $1.7 trillion global economic value in 2020; $1.2 trillion global opportunity by 2022.]

(6) HOW TO HANDLE SUCH A LARGE NUMBER OF DEVICES?
A large share of IoT devices will be served by short-range radio technologies.
  Unlicensed spectrum (e.g., Wi-Fi and Bluetooth): costless, but suited to limited QoS and security requirements.
A significant proportion will be enabled by wide area networks (WANs).
  Unlicensed Low Power Wide Area (LPWA), e.g., LoRa, Sigfox: very limited demands on throughput, reliability and QoS.
  Licensed spectrum (4G, NB-IoT, 5G, …): largely responsible for wireless connectivity on a global scale, and adapted to deliver reliable, secure and diverse IoT services.

(7) CELLULAR NETWORK ARCHITECTURE: CONGESTION LOCALIZATION
A huge number of devices, but a limited number of resources (i.e., number of opportunities to connect).
Random access: the only (and simplest) way to access the network, and the most critical area.
Complex traffic patterns: Poisson (e.g., credit machines in shops), Uniform (e.g., traffic lights), Beta (e.g., event-driven).
Different classes of IoT (including prioritized M2M).
Adlen Ksentini, Yassine Hadjadj-Aoul, and Tarik Taleb: “Cellular-based Machine Type Communication: Overload control”. In IEEE Network, Vol. 26, Issue 6, pp. 54-60 (November 2012).

(8) RISK OF CONGESTION COLLAPSE AT THE RAN
Even with 54 opportunities, the risk of congestion is still high …
« RAN overload control … is identified as the first priority improvement area » … 3GPP TR 37.868.
[Figure, panels (a) and (b): number of Beta arrivals and number of successful RA, each plotted against time (s) over 0 to 10 s.]
Meriam Bouzouita, Yassine Hadjadj-Aoul, Nawel Zangar, Sami Tabbane: “On the risk of congestion collapse in heavily congested M2M networks”. In proc. of IEEE ISNCC, Hammamet, Tunisia (May 2016).

(9) ACCESS OVERVIEW OF IOT DEVICES. Understanding the origin of the problem.

(10) RANDOM ACCESS
Msg1, Preamble Transmission: each device randomly selects a preamble. The attempt is successful if no other device selected the same preamble, and results in a collision otherwise.
Msg2, RAR Message: TA + UL grant + T-CRNTI.
Msg3, RRC Connection Request: terminal ID.
Msg4, RRC Connection Setup.
TA: Timing Advance. T-CRNTI: Temporary Cell Radio Network Identifier.

(11) A MODEL FOR THE ACCESS. Fluid model approximating the access process.

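(12) MODEL FOR ACCESS
The access can be modeled using the classical « Balls into Bins » problem: N bins ~ N opportunities to connect, M balls ~ M IoT devices. A bin with no ball is an idle preamble, a bin with exactly one ball is a successful preamble, and a bin with several balls is a collision.
$N_I$: number of idle preambles (number of bins with no ball): $N_I(M) = N \left(1 - \frac{1}{N}\right)^{M}$
$N_S$: number of successful accesses (number of bins with 1 ball): $N_S(M) = M \left(1 - \frac{1}{N}\right)^{M-1}$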
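
As an illustration of the balls-into-bins model, the sketch below evaluates the two closed-form expressions, $N_I(M) = N(1 - 1/N)^M$ and $N_S(M) = M(1 - 1/N)^{M-1}$, and compares them with a small Monte Carlo experiment. The function names, the number of runs and the seed are assumptions made for this example; they do not come from the talk.

```python
import random


def expected_idle(n_preambles: int, m_devices: int) -> float:
    """Expected number of idle preambles: N * (1 - 1/N)^M."""
    return n_preambles * (1 - 1 / n_preambles) ** m_devices


def expected_success(n_preambles: int, m_devices: int) -> float:
    """Expected number of successful preambles: M * (1 - 1/N)^(M-1)."""
    if m_devices == 0:
        return 0.0
    return m_devices * (1 - 1 / n_preambles) ** (m_devices - 1)


def simulate_frame(n_preambles: int, m_devices: int, rng: random.Random):
    """Throw M balls into N bins; return (idle, successful, collided) bins."""
    counts = [0] * n_preambles
    for _ in range(m_devices):
        counts[rng.randrange(n_preambles)] += 1
    idle = counts.count(0)
    success = counts.count(1)
    return idle, success, n_preambles - idle - success


def monte_carlo(n_preambles: int, m_devices: int, runs: int = 5000, seed: int = 0):
    """Average idle and successful preambles over `runs` independent frames."""
    rng = random.Random(seed)
    total_idle = total_success = 0
    for _ in range(runs):
        idle, success, _ = simulate_frame(n_preambles, m_devices, rng)
        total_idle += idle
        total_success += success
    return total_idle / runs, total_success / runs


if __name__ == "__main__":
    n, m = 54, 54
    print("analytical :", expected_idle(n, m), expected_success(n, m))
    print("simulated  :", monte_carlo(n, m))
```

With N = 54 and M = 54, the analytical value of $N_S$ is about 20.05, and the simulated average should fall close to it.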

(13) HOW TO DETERMINE THE OPTIMAL NUMBER OF CONTENDING DEVICES?
Method 1: it can be determined by Monte Carlo simulations.
Method 2: it can be determined analytically, through the analysis of $N_S$.
[Figure: contending devices (0 to 300) vs. successful accesses (0 to 25). The number of successful accesses is maximized, at about 20.06, when the number of contending devices is 54, i.e., the number of opportunities N.]
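
As a sketch of the analytical route (Method 2), the maximum of the balls-into-bins expression $N_S(M) = M(1 - 1/N)^{M-1}$ can be located by treating the number of contending devices M as a continuous variable (an assumption made here for convenience) and setting the derivative to zero:

```latex
\begin{align*}
N_S(M) &= M\,q^{M-1}, \qquad q = 1 - \tfrac{1}{N},\\
\frac{dN_S}{dM} &= q^{M-1}\,\bigl(1 + M \ln q\bigr) = 0
  \;\Longrightarrow\;
  M^\ast = -\frac{1}{\ln\!\left(1 - \tfrac{1}{N}\right)} \approx N - \tfrac{1}{2},\\
N = 54:\quad M^\ast &\approx 53.5, \qquad
  N_S(53) = N_S(54) = 54\left(\tfrac{53}{54}\right)^{53} \approx 20.05 .
\end{align*}
```

Over the integers the maximum is therefore reached at 53 or 54 contending devices, which is consistent with the simulated point (54, ~20.06).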

(14) SOME EXISTING APPROACHES TO TACKLE THE CONGESTION AT THE ACCESS
Access planning: limits the burden, but is insufficient since some devices react to events which cannot be timed.
Grouping devices.
Pull-based scheme: a paging message may also include a back-off time for the MTC.
Separate RACH resources for MTC: splitting the preambles into H2H group(s) and MTC group(s), or allocating PRACH occasions in time or frequency to either H2H or MTC devices.
Dynamic allocation of RACH resources.
Access Class Barring (ACB): UE individual Access Class Barring; Extended Access Barring.
Meriam Bouzouita, Yassine Hadjadj-Aoul, Nawel Zangar, Sami Tabbane, César Viho: « A random access model for M2M communications in LTE-Advanced mobile networks ». In « Modeling and simulation of computer networks and systems », Elsevier/Morgan Kaufmann, pp. 577-599 (2015).

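(15) FOCUS ON THE ACB
The network broadcasts a per-class ACB factor p on the BCCH.
Each backlogged IoT device selects a random q. If q < p, the device starts the RA and transmits a preamble; otherwise it waits for the ACB barring time before trying again. Each backlogged device therefore accesses with probability p.
After a successful transmission, the device proceeds with the remaining RA steps. After an unsuccessful one, it goes back to the backlog, unless the maximum number of retransmissions has been reached, in which case the access fails.
[Figure: flowchart from arrival, through the backlogged IoT devices and the ACB check, to the IoT devices that could attempt access.]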
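
The ACB check can be sketched in a few lines. The example below is a deliberately simplified illustration, not the simulator used in the talk: it assumes q is drawn uniformly in [0, 1), it ignores the barring time and the retransmission limit, and it lets the admitted devices contend for the preambles within a single frame. The function names are hypothetical.

```python
import random


def acb_check(p: float, rng: random.Random) -> bool:
    """One ACB check: draw q uniformly in [0, 1) and pass if q < p."""
    return rng.random() < p


def acb_frame(backlog: int, p: float, n_preambles: int, rng: random.Random):
    """Apply ACB to every backlogged device, then let admitted devices contend.

    Returns (admitted, successes): the number of devices that passed the
    check, and the number of preambles selected by exactly one device.
    """
    admitted = sum(acb_check(p, rng) for _ in range(backlog))
    counts = [0] * n_preambles
    for _ in range(admitted):
        counts[rng.randrange(n_preambles)] += 1
    successes = counts.count(1)
    return admitted, successes


if __name__ == "__main__":
    rng = random.Random(0)
    # 200 backlogged devices, 54 preambles: compare an open and a barred access.
    print("p = 1.00:", acb_frame(200, 1.00, 54, rng))
    print("p = 0.27:", acb_frame(200, 0.27, 54, rng))
```

With p = 1 all 200 devices contend and most preambles collide, whereas p = 0.27 admits about 54 devices on average, which is close to the number of contenders that maximizes the successes for 54 preambles.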

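(16) A FLUID MODEL FOR THE ACCESS
[Figure: flow diagram. Arrivals enter $X_1$ (backlogged IoT devices, $x_1$) at rate $\lambda$. A flow $p\,x_1$ gets access with probability p and moves to $X_2$ (IoT devices that could attempt access, $x_2$); the remaining $(1 - p)\,x_1$ moves to $x_{1,L}$ and returns at rate $\mu_1 x_{1,L}$. With $M = x_2$ contending devices, $N_S(M) = q_N^{x_2 - 1} x_2$ attempts are successful; the remaining $(1 - q_N^{x_2 - 1})\,x_2$ moves to $x_{2,L}$ and returns at rate $\mu_2 x_{2,L}$.]
With $q_N = 1 - \frac{1}{N}$:
$\frac{dx_1}{dt} = \lambda - x_1 + \mu_1 x_{1,L}$
$\frac{dx_2}{dt} = p\,x_1 + \mu_2 x_{2,L} - x_2$
$\frac{dx_{1,L}}{dt} = (1 - p)\,x_1 - \mu_1 x_{1,L}$
$\frac{dx_{2,L}}{dt} = \left(1 - q_N^{x_2 - 1}\right) x_2 - \mu_2 x_{2,L}$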
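
To make the fluid model concrete, the sketch below integrates its four equations, $dx_1/dt = \lambda - x_1 + \mu_1 x_{1,L}$, $dx_2/dt = p\,x_1 + \mu_2 x_{2,L} - x_2$, $dx_{1,L}/dt = (1-p)\,x_1 - \mu_1 x_{1,L}$ and $dx_{2,L}/dt = (1 - q_N^{x_2-1})\,x_2 - \mu_2 x_{2,L}$ with $q_N = 1 - 1/N$, using a plain forward Euler scheme. The parameter values ($\lambda = 5$, $p = 0.5$, $\mu_1 = \mu_2 = 0.5$, N = 54), the step size and the function names are assumptions chosen for illustration only.

```python
def fluid_rhs(state, lam, p, mu1, mu2, n_preambles):
    """Right-hand side of the fluid model; also returns the success flow."""
    x1, x2, x1l, x2l = state
    q = 1.0 - 1.0 / n_preambles
    success = q ** (x2 - 1.0) * x2          # N_S(x2): successful attempts
    dx1 = lam - x1 + mu1 * x1l
    dx2 = p * x1 + mu2 * x2l - x2
    dx1l = (1.0 - p) * x1 - mu1 * x1l
    dx2l = (1.0 - q ** (x2 - 1.0)) * x2 - mu2 * x2l
    return (dx1, dx2, dx1l, dx2l), success


def integrate(lam, p, mu1, mu2, n_preambles, dt=0.01, t_end=200.0):
    """Forward Euler integration from an empty system.

    Returns the final state (x1, x2, x1L, x2L) and the final success flow.
    """
    state = (0.0, 0.0, 0.0, 0.0)
    for _ in range(int(round(t_end / dt))):
        deriv, _ = fluid_rhs(state, lam, p, mu1, mu2, n_preambles)
        state = tuple(s + dt * d for s, d in zip(state, deriv))
    _, success = fluid_rhs(state, lam, p, mu1, mu2, n_preambles)
    return state, success


if __name__ == "__main__":
    state, success = integrate(lam=5.0, p=0.5, mu1=0.5, mu2=0.5, n_preambles=54)
    print("steady state (x1, x2, x1L, x2L):", state)
    print("success flow:", success)
```

Summing the four equations gives $d(x_1 + x_2 + x_{1,L} + x_{2,L})/dt = \lambda - q_N^{x_2-1} x_2$, so at a stable equilibrium the success flow equals the arrival rate. In this lightly loaded example the integration settles on that balance.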

(17) EFFICIENT SUPPORT OF IOT DEVICES. Estimating the contention at the access.

(18) CHALLENGES AT THE ACCESS
Setting up a control action: what is the optimal number of contending devices $x_2^*$? It is the best target for a control strategy.
How to estimate the number of contending devices (in states $X_1$ and $X_2$)? Difficulty: there is no direct way to know it.
What is the best control action to optimize the number of contending devices? An optimal barring strategy. KPIs: delay, energy, number of abandons, number of attempts… Difficulty: the model is nonlinear and non-affine in the control.
How to prioritize the contending devices (sharing the same resources)? Per-class estimation, per-class barring.
Meriam Bouzouita, Yassine Hadjadj-Aoul, Nawel Zangar, Gerardo Rubino: “Estimating the number of contending IoT devices in 5G networks: revealing the invisible”. In Wiley, Transactions on Emerging Telecommunications Technologies (TETT) (August 2018).

(19) TOWARDS THE USE OF LEARNING TECHNIQUES FOR ACCESS CONTROL

(20) WHY USE DEEP REINFORCEMENT LEARNING?
The blocking factor calculation requires a good knowledge of the number of terminals willing to attempt access, but this number is not available in the network: the state of the network is not observable. It is possible to estimate this number, but the estimate is subject to noise.
The traffic pattern is very complex.
Lack of data: we cannot use supervised learning.
Deep reinforcement learning techniques have been shown to be effective in making predictions even when the data is very noisy.

(21) PROBLEM FORMULATION: MARKOV DECISION PROCESS (MDP) DEFINITION
MDP: $M = (S, A, p, r)$
$S$: state space. $s_k = (\hat{x}_{2,k}, \hat{x}_{2,k-1}, \ldots, \hat{x}_{2,k-H+1})$, where $H$ is the horizon and $k$ the time step (each new frame). With this window of estimates, $s_k$ better reflects the real state.
$A$: action space. The action is the blocking factor $p$; it is continuous and deterministic.
$p(s' \mid s, a)$: transition probability, related to the environment (not known).
$r(s, a, s')$: reward (revenue) of the transition $(s, a, s')$: $r_k = \frac{1}{N H} \sum_{i=k-H+1}^{k} N_{S,i}$
Objective: find the blocking probability that maximizes the average reward.
[Figure: agent-environment loop exchanging observation, revenue, state and action.]
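
The state and reward of this formulation can be sketched as a sliding window over the last H frames. The class below is only an illustration: its name and methods are hypothetical, the window is initialized with zeros, and the reward is assumed to be the number of successful accesses averaged over the window and normalized by the number of preambles N (a normalization that is consistent with the percentages reported in the results, but that is an assumption here).

```python
from collections import deque


class AccessObservation:
    """Sliding window of the last H estimates and success counts."""

    def __init__(self, horizon: int, n_preambles: int):
        self.horizon = horizon
        self.n_preambles = n_preambles
        # Newest value first, so the state reads (x_k, x_{k-1}, ..., x_{k-H+1}).
        self.estimates = deque([0.0] * horizon, maxlen=horizon)
        self.successes = deque([0] * horizon, maxlen=horizon)

    def update(self, x2_estimate: float, n_success: int) -> None:
        """Record one new frame: estimated contenders and observed successes."""
        self.estimates.appendleft(x2_estimate)
        self.successes.appendleft(n_success)

    def state(self) -> tuple:
        """State s_k: the H most recent estimates of contending devices."""
        return tuple(self.estimates)

    def reward(self) -> float:
        """Reward r_k: successes over the window, normalized by N * H."""
        return sum(self.successes) / (self.n_preambles * self.horizon)


if __name__ == "__main__":
    obs = AccessObservation(horizon=3, n_preambles=12)
    for estimate, success in [(5.0, 2), (6.0, 3), (7.0, 4), (8.0, 5)]:
        obs.update(estimate, success)
    print(obs.state())   # the three most recent estimates, newest first
    print(obs.reward())  # (5 + 4 + 3) / (12 * 3)
```

Keeping a window rather than a single estimate is one way to smooth the noise of the per-frame estimates before they reach the agent.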

(22) HOW TO SOLVE THE PROBLEM?
Twin Delayed Deep Deterministic policy gradient algorithm (TD3):
Deterministic approach.
Deals with continuous action spaces.
Addresses the overestimation of value estimates.
Performs better than DDPG, PPO, …
Fujimoto, S.; van Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. arXiv 2018, arXiv:1802.09477.
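
The way TD3 limits overestimation can be illustrated by its critic target, as described in the cited paper: the target action is perturbed by clipped noise, and the smaller of two target critics is used. The sketch below is not the implementation used in the talk. The actor and critics are passed as plain callables, the action is a scalar blocking factor assumed to lie in [0, 1], the default hyperparameters are illustrative, and the terminal-state flag is omitted.

```python
import random


def clip(value: float, low: float, high: float) -> float:
    """Clamp value to the interval [low, high]."""
    return max(low, min(high, value))


def td3_target(reward, next_state, target_actor, target_q1, target_q2, rng,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               a_low=0.0, a_high=1.0):
    """TD3 critic target: y = r + gamma * min(Q1', Q2')(s', a').

    a' is the target actor's action plus clipped Gaussian noise
    (target policy smoothing), clamped to the valid action range.
    """
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, a_low, a_high)
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))  # clipped double Q-learning
    return reward + gamma * q_min


def should_update_actor(step: int, policy_delay: int = 2) -> bool:
    """Delayed policy updates: refresh the actor every `policy_delay` steps."""
    return step % policy_delay == 0


if __name__ == "__main__":
    rng = random.Random(0)
    y = td3_target(
        reward=0.3,
        next_state=(5.0, 6.0, 7.0),
        target_actor=lambda s: 0.5,        # stand-in for the target policy
        target_q1=lambda s, a: 2.0 + a,    # stand-ins for the twin critics
        target_q2=lambda s, a: 1.0 + a,
        rng=rng,
    )
    print("critic target:", y)
```

Taking the minimum of the two critics biases the target downwards, which is the mechanism the paper proposes against value overestimation. The delayed actor update and the noise on the target action are its two other ingredients.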

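(23) ARRIVAL REGULATION SYSTEM
[Figure: control loop. At step k the blocking factor $p_k$ (action $a_k$) is applied; the system returns the reward $r_k$ and the next state $s_{k+1}$, forming the transition $(s_k, a_k, r_k, s_{k+1})$; the next action $a_{k+1}$ ($p_{k+1}$) is then selected.]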

(24) PERFORMANCE EVALUATION
Simulator: discrete event simulator developed from scratch.
Compared strategies: ADAPT; PID controller; TD3 (proposed).
Arrival process of IoT devices: Poisson, MTBA = 0.018 s.
Preambles: number of preambles $N = 12$; arrival frequency: 0.1 s.
Others: measurement horizon $H = 10$.
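
To give an idea of the offered load in this setup, the sketch below generates Poisson arrivals with exponential inter-arrival times of mean 0.018 s and counts them per frame. It assumes that MTBA stands for the mean time between arrivals and that the 0.1 s "arrival frequency" is the period between two access opportunities; both readings are assumptions, as are the function name and the seed.

```python
import random


def arrivals_per_frame(mtba: float, frame: float, n_frames: int, seed: int = 0):
    """Count Poisson arrivals in each frame.

    Inter-arrival times are exponential with mean `mtba` seconds; frames
    last `frame` seconds. Returns a list of per-frame arrival counts.
    """
    rng = random.Random(seed)
    counts = [0] * n_frames
    horizon = frame * n_frames
    t = rng.expovariate(1.0 / mtba)
    while t < horizon:
        index = min(int(t / frame), n_frames - 1)  # guard against rounding
        counts[index] += 1
        t += rng.expovariate(1.0 / mtba)
    return counts


if __name__ == "__main__":
    counts = arrivals_per_frame(mtba=0.018, frame=0.1, n_frames=20000)
    mean = sum(counts) / len(counts)
    print("mean arrivals per frame:", mean)  # expected value: 0.1 / 0.018
```

Under these assumptions about 5.6 new devices arrive per frame on average, which is more than the roughly 4.6 successes that 12 preambles can deliver at best, so the backlog cannot be cleared without access control.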

(25) THE ACCESS PROBABILITY FOR THE CONSIDERED STRATEGIES.

(26) THE AVERAGE REWARD OF THE CONSIDERED STRATEGIES.
[Figure: one plot per strategy, with average rewards of 20.33%, 22.84% and 29.25%.]

(27) THE STATUS OF THE PREAMBLES
Optimal success: 4.61; optimal attempts: 11.49.
[Figure: one plot per strategy, with average success 2.47 for 23.52 average attempts; average success 2.74 for 17.15 average attempts; average success 3.52 for 15.70 average attempts.]
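
The optimal values quoted on this slide can be reproduced from the balls-into-bins model with N = 12 preambles. The sketch below assumes the continuous relaxation of that model, in which the expected number of successes $M(1 - 1/N)^{M-1}$ is maximized at $M^* = -1/\ln(1 - 1/N)$; the function names are hypothetical.

```python
import math


def optimal_attempts(n_preambles: int) -> float:
    """Continuous maximizer of M * (1 - 1/N)^(M-1): M* = -1 / ln(1 - 1/N)."""
    return -1.0 / math.log(1.0 - 1.0 / n_preambles)


def expected_success(n_preambles: int, m_devices: float) -> float:
    """Expected number of successful preambles for M contending devices."""
    return m_devices * (1.0 - 1.0 / n_preambles) ** (m_devices - 1.0)


n = 12
m_star = optimal_attempts(n)
best = expected_success(n, m_star)
print(f"optimal attempts: {m_star:.2f}")  # 11.49
print(f"optimal success : {best:.2f}")    # 4.61
print(f"success per preamble: {best / n:.1%}")
```

The three measured averages (2.47, 2.74 and 3.52 successes) all sit below this optimum of 4.61, with attempt counts above the optimal 11.49.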

(28) CONCLUSIONS
We proposed a mechanism to control the congestion of IoT access networks.
We proposed a fluid model of the access, which allows determining the optimal objective.
We exploited recent advances in deep reinforcement learning, through the use of the TD3 algorithm.
Simulation results show the superiority of the proposed approach, despite the lack of accurate data.
Future work: improve the estimation of the number of attempts.
