
Access control in NB-IoT networks: a deep reinforcement learning strategy


Academic year: 2021


Full text

(1) Access control in NB-IoT networks: a deep reinforcement learning strategy. Yassine Hadjadj-Aoul.
To cite this version: Yassine Hadjadj-Aoul. Access control in NB-IoT networks: a deep reinforcement learning strategy. GDR ARC - Session Automation and Communication Networks, Nov 2020, Virtual, France. ⟨hal-03135194⟩.
HAL Id: hal-03135194. https://hal.inria.fr/hal-03135194. Submitted on 8 Feb 2021.
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

(2) ACCESS CONTROL IN NB-IOT NETWORKS: A DEEP REINFORCEMENT LEARNING STRATEGY. Yassine Hadjadj-Aoul, Associate professor, Univ Rennes, IRISA/INRIA, Dionysos team-project. Link for questions: shorturl.at/prNW5. Thursday, November 26th, 2020.

(3) PLAN
- Introduction
- Access overview of IoT devices
- A model for the access
- Efficient support of a massive number of IoT devices using reinforcement learning
- Conclusions

(4) INTRODUCTION. Massive access of IoT devices.

(5) THE IOT IS GOING TO BE BIG THOUGH NOBODY REALLY KNOWS HOW BIG … 28.1 BILLION Units by 2020. 25 BILLION. 25 BILLION. Units by 2021. M2M connections by 2022 OF WHICH. $1.7 TRILLION. $200 BILLION. 2.6 BILLION ARE CELLULAR. SERVICE REVENUES IN 2020. GLOBAL SOLUTION REVENUES BY 2020. $1.7 TRILLION GLOBAL ECONOMIC VALUE IN 2020 Source: May 2015. Source: November 2018. $1.2 TRILLION GLOBAL OPPORTUNITY BY 2022 Source: January 2013.

(6) HOW TO HANDLE SUCH A LARGE NUMBER OF DEVICES?
A large share of IoT devices will be served by short-range radio technologies:
- Unlicensed spectrum (e.g., Wi-Fi and Bluetooth)
- Free to use, but …
- Limited QoS and security requirements
A significant proportion will be enabled by wide area networks (WANs):
- Unlicensed Low Power Wide Area (LPWA): LoRa, Sigfox, …
  - Very limited demands on throughput, reliability and QoS
- Licensed spectrum: 4G, NB-IoT, 5G, …
  - Largely responsible for wireless connectivity on a global scale
  - Adapted to deliver reliable, secure and diverse IoT services

(7) CELLULAR NETWORK ARCHITECTURE: CONGESTION LOCALIZATION
A huge number of devices, but a limited number of resources (i.e., number of opportunities to connect).
Random access:
- The only way to access the network (simplest)
- The most critical area
Complex traffic pattern:
- Poisson (e.g., credit machines in shops), Uniform (e.g., traffic lights), Beta (e.g., event driven)
Different classes of IoT (including prioritized M2M).
Adlen Ksentini, Yassine Hadjadj-Aoul, and Tarik Taleb: "Cellular-based Machine Type Communication: Overload control". In IEEE Network, Vol. 26, Issue 6, pp. 54-60 (November 2012).

(8) RISK OF CONGESTION COLLAPSE AT THE RAN
Even with 54 opportunities, the risk of congestion is still high.
"RAN overload control … is identified as the first priority improvement area" (3GPP TR 37.868).
[Figure: two panels, (a) and (b), plotting the number of Beta arrivals and the number of successful RA against time (0 to 10 s).]
Meriam Bouzouita, Yassine Hadjadj-Aoul, Nawel Zangar, Sami Tabbane: "On the risk of congestion collapse in heavily congested M2M networks". In proc. of IEEE ISNCC, Hammamet, Tunisia (May 2016).

(9) ACCESS OVERVIEW OF IOT DEVICES. Understanding the origin of the problem.

(10) RANDOM ACCESS
Each device randomly selects a preamble: a preamble picked by a single device is successful, while a preamble picked by several devices leads to a collision.
- Msg1: Preamble transmission
- Msg2: RAR message: TA + UL grant + T-CRNTI
- Msg3: RRC Connection Request: terminal ID
- Msg4: RRC Connection Setup
TA: Timing Advance. T-CRNTI: Temporary Cell Radio Network Identifier.

(11) A MODEL FOR THE ACCESS. Fluid model approximating the access process.

(12) MODEL FOR ACCESS
The access could be modeled using the classical "Balls into Bins" problem: N bins ~ N opportunities to connect, M balls ~ M IoT devices.
- $N_I$: # of idle preambles (# of bins with no ball): $N_I(M) = N \left(1 - \frac{1}{N}\right)^{M}$
- $N_S$: # of successful accesses (# of bins with exactly 1 ball): $N_S(M) = M \left(1 - \frac{1}{N}\right)^{M-1}$
Each preamble ends up in one of three states: collision, successful or idle.
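The balls-into-bins expectations on this slide can be checked numerically. The sketch below is only an illustration: the function names, the number of runs and the seed are assumptions, not part of the talk. It evaluates the two closed-form expressions and compares them with a direct simulation in which M devices each pick one of N preambles uniformly at random.

```python
import random


def expected_idle(n_preambles: int, n_devices: int) -> float:
    """Expected number of preambles picked by no device: N * (1 - 1/N)**M."""
    return n_preambles * (1.0 - 1.0 / n_preambles) ** n_devices


def expected_success(n_preambles: int, n_devices: int) -> float:
    """Expected number of preambles picked by exactly one device: M * (1 - 1/N)**(M - 1)."""
    if n_devices == 0:
        return 0.0
    return n_devices * (1.0 - 1.0 / n_preambles) ** (n_devices - 1)


def simulate_frame(n_preambles: int, n_devices: int, rng: random.Random):
    """Throw M balls into N bins once; return (idle, successful, collided) bin counts."""
    counts = [0] * n_preambles
    for _ in range(n_devices):
        counts[rng.randrange(n_preambles)] += 1
    idle = counts.count(0)
    success = counts.count(1)
    return idle, success, n_preambles - idle - success


def monte_carlo(n_preambles: int, n_devices: int, runs: int = 5000, seed: int = 0):
    """Average idle and successful preamble counts over independent frames."""
    rng = random.Random(seed)  # hypothetical seed, only for repeatability
    idle_sum = success_sum = 0
    for _ in range(runs):
        idle, success, _collided = simulate_frame(n_preambles, n_devices, rng)
        idle_sum += idle
        success_sum += success
    return idle_sum / runs, success_sum / runs


if __name__ == "__main__":
    n, m = 54, 54
    print("analytic   :", expected_idle(n, m), expected_success(n, m))
    print("monte carlo:", monte_carlo(n, m))
```

With N = M = 54 the analytic number of successes comes out close to 20, and the simulated average should agree with it up to sampling noise.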

(13) HOW TO DETERMINE THE OPTIMAL NUMBER OF CONTENDING DEVICES?
- Method 1: can be determined by Monte Carlo simulations.
- Method 2: can be determined analytically, through the analysis of $N_S$.
[Figure: contending devices (0 to 300) vs. successful accesses; the curve peaks at (54, ~20.06).]
The number of successful accesses is maximized when the number of contending devices equals the number of opportunities N (here, 54).
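For the analytical route, one possible sketch is to maximize $N_S(M) = M (1 - 1/N)^{M-1}$ directly. Treating M as a real number and setting the derivative $(1 - 1/N)^{M-1}\,[1 + M \ln(1 - 1/N)]$ to zero gives $M^* = -1/\ln(1 - 1/N)$, which is just under N. The function names and the search bound `m_max` below are illustrative assumptions.

```python
import math


def expected_success(n_preambles: int, n_devices: float) -> float:
    """N_S(M) = M * (1 - 1/N)**(M - 1); M may be real-valued here."""
    return n_devices * (1.0 - 1.0 / n_preambles) ** (n_devices - 1.0)


def best_integer_load(n_preambles: int, m_max: int = 300) -> int:
    """Exhaustive search for an integer M in [1, m_max] that maximizes N_S(M)."""
    return max(range(1, m_max + 1), key=lambda m: expected_success(n_preambles, m))


def best_continuous_load(n_preambles: int) -> float:
    """Root of dN_S/dM = 0, i.e. M* = -1 / ln(1 - 1/N) (requires N > 1)."""
    return -1.0 / math.log(1.0 - 1.0 / n_preambles)


if __name__ == "__main__":
    n = 54
    m_int = best_integer_load(n)
    print(m_int, expected_success(n, m_int), best_continuous_load(n))
```

Note that $N_S(N-1) = N_S(N)$ exactly, so for N = 54 the integer search may return 53 or 54 depending on floating-point rounding; both give about 20.05 successes, in line with the simulated peak reported on the slide.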

(14) SOME EXISTING APPROACHES TO TACKLE THE CONGESTION AT THE ACCESS
- Access planning: limits the burden, but is insufficient since some devices react to events which cannot be timed.
- Grouping devices
- Pull-based scheme: a paging message may also include a back-off time for the MTC.
- Separate RACH resources for MTC: splitting the preambles into H2H group(s) and MTC group(s), or allocating PRACH occasions in time or frequency to either H2H or MTC devices.
- Dynamic allocation of RACH resources
- Access Class Barring (ACB): UE individual Access Class Barring, Extended Access Barring.
Meriam Bouzouita, Yassine Hadjadj-Aoul, Nawel Zangar, Sami Tabbane, César Viho: "A random access model for M2M communications in LTE-Advanced mobile networks". In "Modeling and simulation of computer networks and systems", Elsevier/Morgan Kaufmann, pp. 577-599 (2015).

(15) FOCUS ON THE ACB
The per-class ACB factor p is broadcast on the BCCH. After its arrival, each backlogged IoT device proceeds as follows:
1. Select a random q.
2. If q < p, start the RA and transmit a preamble; otherwise, wait for the ACB barring time and try again.
3. If the transmission is successful, carry on with the remaining RA steps; otherwise, retry until the maximum number of retransmissions is reached, which results in a failure.
In short, among the backlogged IoT devices, each one attempts access with probability p.
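The barring test itself is small enough to sketch in code. The example below is a simplified illustration under stated assumptions: all backlogged devices share one ACB factor p, the barring timer and the retransmission limit are not modeled, and the function names are hypothetical.

```python
import random


def passes_acb(p: float, rng: random.Random) -> bool:
    """ACB test: draw q uniformly in [0, 1) and allow access only if q < p."""
    return rng.random() < p


def ra_opportunity(backlog: int, p: float, n_preambles: int, rng: random.Random):
    """Run one RA opportunity for `backlog` devices sharing the same ACB factor p.

    Returns (attempts, successes): the devices that passed the barring test, and
    the devices whose randomly chosen preamble was picked by nobody else.
    """
    picks = [0] * n_preambles
    attempts = 0
    for _ in range(backlog):
        if passes_acb(p, rng):
            picks[rng.randrange(n_preambles)] += 1
            attempts += 1
    return attempts, picks.count(1)


if __name__ == "__main__":
    rng = random.Random(42)  # hypothetical seed
    print(ra_opportunity(backlog=40, p=0.3, n_preambles=12, rng=rng))
```

Lowering p thins out the contenders, which is the lever the control strategies discussed later in the talk act on.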

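(16) A FLUID MODEL FOR THE ACCESS
[Diagram: arrivals enter state $X_1$ (backlogged IoT devices) at rate $\lambda$. Devices pass the ACB test with probability $p$ and move to state $X_2$ (IoT devices that could attempt access) at rate $p\,x_1$; barred devices, $(1-p)\,x_1$, wait in $x_{1,L}$ and return at rate $\mu_1 x_{1,L}$. From $X_2$, the successful attempts $N_S(M)$ leave at rate $q_N^{x_2-1} x_2$; collided devices, $(1 - q_N^{x_2-1})\,x_2$, wait in $x_{2,L}$ and return at rate $\mu_2 x_{2,L}$.]
With $q_N = 1 - \frac{1}{N}$:
$$\frac{dx_1}{dt} = \lambda - x_1 + \mu_1 x_{1,L}$$
$$\frac{dx_2}{dt} = p\,x_1 + \mu_2 x_{2,L} - x_2$$
$$\frac{dx_{1,L}}{dt} = (1 - p)\,x_1 - \mu_1 x_{1,L}$$
$$\frac{dx_{2,L}}{dt} = \left(1 - q_N^{x_2 - 1}\right) x_2 - \mu_2 x_{2,L}$$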
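The four balance equations of this slide (for $x_1$, $x_2$, $x_{1,L}$ and $x_{2,L}$, with $q_N = 1 - 1/N$) can be integrated numerically to see how the backlog evolves for a fixed barring factor. The sketch below uses explicit Euler steps; the solver, the step size and all parameter values are illustrative assumptions rather than values taken from the talk.

```python
def fluid_rhs(state, lam, p, mu1, mu2, n_preambles):
    """Right-hand side of the four fluid equations; returns (dx1, dx2, dx1L, dx2L)."""
    x1, x2, x1_l, x2_l = state
    q_n = 1.0 - 1.0 / n_preambles
    success = (q_n ** (x2 - 1.0)) * x2   # successful attempts leaving the system
    collided = x2 - success              # (1 - q_N**(x2 - 1)) * x2
    dx1 = lam - x1 + mu1 * x1_l
    dx2 = p * x1 + mu2 * x2_l - x2
    dx1_l = (1.0 - p) * x1 - mu1 * x1_l
    dx2_l = collided - mu2 * x2_l
    return dx1, dx2, dx1_l, dx2_l


def integrate(state, dt, steps, **params):
    """Explicit Euler integration (a simple, assumed choice of solver)."""
    for _ in range(steps):
        derivs = fluid_rhs(state, **params)
        state = tuple(x + dt * dx for x, dx in zip(state, derivs))
    return state


if __name__ == "__main__":
    # Hypothetical parameters: 3 arrivals per time unit, 12 preambles.
    params = dict(lam=3.0, p=0.8, mu1=0.5, mu2=0.5, n_preambles=12)
    print(integrate((0.0, 0.0, 0.0, 0.0), dt=0.01, steps=20000, **params))
```

Summing the four derivatives leaves only the arrival rate minus the success rate, so at equilibrium the successful attempts balance the arrivals. Such an equilibrium exists only while the arrival rate stays below the maximum of $x_2\, q_N^{x_2 - 1}$.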

(17) EFFICIENT SUPPORT OF IOT DEVICES. Estimating the access’s contention.

(18) CHALLENGES AT THE ACCESS: setting up a control action
- What is the optimal number of contending devices $x_2^*$?
  - Best target for a control strategy
- How to estimate the number of contending devices (in states $X_1$ and $X_2$)?
  - Difficulty: no direct way to know it
- What is the best control action to optimize the number of contending devices?
  - Optimal barring strategy
  - KPI: delay, energy, number of abandons, number of attempts…
  - Difficulty: nonlinear model, non-affine in control
- How to prioritize the contending devices (sharing the same resources)?
  - Per-class estimation, per-class barring
Meriam Bouzouita, Yassine Hadjadj-Aoul, Nawel Zangar, Gerardo Rubino: "Estimating the number of contending IoT devices in 5G networks: revealing the invisible". In Wiley, Transactions on Emerging Telecommunications Technologies (TETT) (August 2018).

(19) TOWARDS THE USE OF LEARNING TECHNIQUES FOR ACCESS CONTROL

(20) WHY USE DEEP REINFORCEMENT LEARNING?
- The blocking factor calculation requires good knowledge of the number of terminals willing to attempt access
  - But this number is not available in the network: the state of the network is not observable.
  - It is possible to estimate this number, but the estimate is subject to noise.
- The traffic pattern is very complex
- Lack of data
  - We cannot use supervised learning.
- Deep reinforcement learning techniques have been shown to be effective in making predictions even when the data is very noisy.

(21) PROBLEM FORMULATION: MARKOV DECISION PROCESS (MDP) DEFINITION
MDP: $M = (S, A, p, r)$
- State $S$: state space
  - $s_k = (\hat{x}_2^{k}, \hat{x}_2^{k-1}, \dots, \hat{x}_2^{k-H+1})$
  - $H$: horizon
  - $k$: time step (each new frame)
  - $s_k$ reflects the real state better.
- Action $A$: action space
  - $p$: blocking factor
  - Continuous, deterministic action
- $p(s' \mid s, a)$: transition probability
  - Related to the environment (not known)
- Reward $r(s, a, s')$: the reward of the transition $(s, a, s')$
  - $r_k = \frac{1}{N H} \sum_{i=k-H+1}^{k} N_S^{i}$
Objective: find the blocking probability that maximizes the average reward.
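One way to picture the state and reward of this slide is a sliding window over the last H frames. The sketch below assumes that the state stacks the H most recent estimates of the number of contending devices (newest first) and that the reward is the number of successful accesses over the window, normalized by N and H. The class and method names are hypothetical, and the normalization is a reading of the slide rather than a confirmed definition.

```python
from collections import deque


class AccessObserver:
    """Keeps the last H per-frame measurements and derives s_k and r_k from them."""

    def __init__(self, horizon: int, n_preambles: int):
        self.horizon = horizon
        self.n_preambles = n_preambles
        # Newest entry sits at index 0; entries older than H frames fall off the right.
        self.estimates = deque([0.0] * horizon, maxlen=horizon)
        self.successes = deque([0] * horizon, maxlen=horizon)

    def record(self, x2_estimate: float, n_success: int) -> None:
        """Store the measurements of the frame that just ended."""
        self.estimates.appendleft(x2_estimate)
        self.successes.appendleft(n_success)

    def state(self):
        """State s_k: the H most recent estimates, newest first."""
        return tuple(self.estimates)

    def reward(self) -> float:
        """Reward r_k: successes over the window, normalized by N * H (assumed)."""
        return sum(self.successes) / (self.n_preambles * self.horizon)


if __name__ == "__main__":
    obs = AccessObserver(horizon=3, n_preambles=12)
    for estimate, success in [(5.0, 3), (6.0, 4), (7.0, 2)]:
        obs.record(estimate, success)
    print(obs.state(), obs.reward())
```

Stacking several noisy estimates, rather than using only the latest one, is what lets the agent smooth out estimation noise.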

(22) HOW TO SOLVE THE PROBLEM?
Twin Delayed Deep Deterministic policy gradient algorithm (TD3):
- Deterministic approach
- Deals with continuous action spaces
- Solves the problem of overestimation in value estimation
- Performs better than DDPG, PPO, …
Fujimoto, S.; van Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. arXiv 2018, arXiv:1802.09477.
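To make the overestimation fix concrete, here is a minimal sketch of the TD3 learning target as described by Fujimoto et al.: the target action is smoothed with clipped noise, and the smaller of two target critics is used. The networks are stood in for by plain Python callables, the action range [0, 1] is assumed because the action is a blocking factor, and the default hyperparameters are illustrative rather than the ones used in the talk.

```python
import random


def clip(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def td3_target(reward, next_state, actor_target, critic1_target, critic2_target,
               rng, gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=0.0, action_high=1.0):
    """Clipped double-Q target: r + gamma * min(Q1', Q2') at a smoothed target action."""
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(actor_target(next_state) + noise, action_low, action_high)
    # Taking the minimum of the two target critics counters value overestimation.
    q_next = min(critic1_target(next_state, next_action),
                 critic2_target(next_state, next_action))
    return reward + gamma * q_next


def should_update_actor(step: int, policy_delay: int = 2) -> bool:
    """Delayed policy updates: refresh the actor only every `policy_delay` critic updates."""
    return step % policy_delay == 0


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in callables; in practice these would be neural networks.
    y = td3_target(1.0, (0.0,), lambda s: 0.5,
                   lambda s, a: 2.0 + a, lambda s, a: 1.0 + a, rng)
    print(y)
```

Both critics are then regressed towards this single target, while the actor is updated less often, as `should_update_actor` illustrates.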

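(23) ARRIVAL REGULATION SYSTEM
[Diagram: control loop between the agent and the access network. At step k the action $a_k$ sets the blocking factor $p_k$; the network returns the reward $r_k$ and the next state $s_{k+1}$, forming the transition $(s_k, a_k, r_k, s_{k+1})$; the next action $a_{k+1}$ then sets $p_{k+1}$.]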

(24) PERFORMANCE EVALUATION
- Simulator: discrete event simulator developed from scratch
- Compared strategies: ADAPT, PID controller, TD3 (proposed)
- Arrival process of IoT devices: Poisson, MTBA = 0.018 s
- Preambles: number of preambles $N = 12$; arrival frequency: 0.1 s
- Others: measurement horizon $H = 10$

(25) THE ACCESS PROBABILITY FOR THE CONSIDERED STRATEGIES.

(26) THE AVERAGE REWARD OF THE CONSIDERED STRATEGIES. Average = 20.33%. Average = 22.84%. Average = 29.25%.

(27) THE STATUS OF THE PREAMBLES
- Optimal success: 4.61; optimal attempts: 11.49
- Ave. success: 2.47; ave. attempts: 23.52
- Ave. success: 2.74; ave. attempts: 17.15
- Ave. success: 3.52; ave. attempts: 15.70
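The "optimal" figures on this slide can be recovered from the balls-into-bins expression for the expected number of successful accesses, treating the number of attempts M as a real number and using N = 12 preambles as in the evaluation setup. This is a worked check under those assumptions, not a derivation shown in the talk.

```latex
\begin{align*}
N_S(M) &= M\left(1-\tfrac{1}{N}\right)^{M-1}, \qquad
\frac{dN_S}{dM} = \left(1-\tfrac{1}{N}\right)^{M-1}
  \left[1 + M\ln\left(1-\tfrac{1}{N}\right)\right] = 0 \\
M^{*} &= -\frac{1}{\ln\left(1-\tfrac{1}{12}\right)} \approx 11.49 \\
N_S(M^{*}) &= \frac{M^{*}}{e\left(1-\tfrac{1}{12}\right)} \approx 4.61 \\
\frac{N_S(M^{*})}{N} &\approx \frac{4.61}{12} \approx 38.4\%
\end{align*}
```

Dividing the three reported average successes by N = 12 gives roughly 20.6%, 22.8% and 29.3%, close to the average rewards of 20.33%, 22.84% and 29.25% on slide (26). That is consistent with a reward normalized by the number of preambles, although this reading is an inference rather than something the slides state.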

(28) CONCLUSIONS
- We proposed a mechanism to control the congestion of IoT access networks
  - We proposed a fluid model of the access, which allows the optimal objective to be determined.
  - We exploited recent advances in deep reinforcement learning, through the use of the TD3 algorithm.
- Simulation results show the superiority of the proposed approach
  - Despite the lack of accurate data
- Future work:
  - Improve the estimation of the number of attempts.
