• Aucun résultat trouvé

Cognitive Policy Based SON Management Demonstrator


Academic year: 2021

Partager "Cognitive Policy Based SON Management Demonstrator"


Texte intégral


HAL Id: hal-01815821


Submitted on 14 Jun 2018

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Cognitive Policy Based SON Management Demonstrator

Tony Daher, Sana Jemaa, Laurent Decreusefond

To cite this version:

Tony Daher, Sana Jemaa, Laurent Decreusefond. Cognitive Policy Based SON Management

Demon-strator. ICIN 2018, Feb 2018, paris, France. �hal-01815821�


Cognitive Policy Based SON Management


Tony Daher, Sana Ben Jemaa

Orange Labs

44 Avenue de la Republique 92320 Chatillon, France Email:{tony.daher,sana.benjemaa}@orange.com

Laurent Decreusefond

Telecom ParisTech

23 avenue d’Italie, 75013 Paris, France Email:laurent.decreusefond@mines-telecom.fr

Abstract—Policy Based SON Management

(PBSM) framework has been introduced to manage Self-Organizing Networks (SON) func-tions in a way that they fulfill all together the operator global goals and provide a unique self-organized network that can be controlled as a whole. This framework mainly translates operator global objectives into policies to be followed by individual SON functions. To cope with the complexity of radio networks due to the impact of radio environment and traffic dy-namics, we propose to empower the PBSM with cognition capability. We propose a Cognitive PBSM (CPBSM) that relies on a Reinforcement Learning (RL) algorithm which learns the opti-mal configuration of SON functions and steers them towards the global operator objectives. The visitor will see how changing the opera-tor objectives leads to a reconfiguration of the SON functions in such a way that the new objectives are fulfilled. The operation of a RL based cognitive management will be illustrated and the exploration/exploitation and scalability dilemmas will be explained.

I. Introduction

The automation of Radio Access Networks (RAN) management became a market reality with the standardization [1] and the commercial release of Self Organized Networks (SON) solutions. To-day, SON functions are running in several oper-ational networks to replace individual tasks such as neighboring relations configuration, load man-agement related parameter settings, etc. Hence several SON functions run simultaneously in the same network, each of them fulfilling a specific objective. These individual objectives may some-times be conflictual and trade-offs need to be found with respect to the operator strategy. Policy Based SON Management (PBSM) framework has been introduced to orchestrate SON functions in a way that they fulfill all together operator global goals [2]. Finding the optimal mapping between the operator goals and the actual actions or behaviors

of the SON functions is a complex problem for sev-eral reasons. First, SON functions are provided by RAN vendors as black boxes with limited leverages on their behavior, through the configuration of the SON function parameters such as thresholds, step size, parameters intervals, etc that we note as SON function Configuration Value (SCV) sets. Besides, the behavior of a SON function running on a given cell, for a given SCV set, depends on the radio environment, namely the propagation, and on the traffic distribution and dynamics. To cope with this complexity we propose to empower the PBSM with cognition capability illustrated in figure 1 [3]. The proposed Cognitive PBSM (CPBSM) relies on a Reinforcement Learning (RL) algorithm that learns the optimal configuration of SON functions and steers them towards fulfilling the global operator objectives.

Fig. 1: Cognitive PBSM

The learning technique considered for the demonstrator is a centralized online learning tech-nique, based on Multi-Armed Bandits (MAB) [4].


MAB is a RL problem, formulated as a sequential decision problem, where at each iteration an agent is confronted with a set of actions called arms, each when pulled, generates a reward. The agent is only aware of the reward of the arm after pulling it. The objective is to find a strategy that permits to identify the optimal action, while maximizing the cumulated rewards obtained during the process. In our case, a learning agent on top of the SON func-tions takes acfunc-tions by configuring all the instances of the SON functions in the network, as shown in figure 2.

Fig. 2: Cognitive PBSM Functional Scheme II. Demonstrator Description

A. Scenario

Fig. 3: Network Topology

The demonstration shows a cognitive manage-ment system for radio access mobile networks. Based on operator-defined objectives, such as max-imizing end-user average bit-rates and minimiz-ing average cell loads, the cognitive management system automatically configures and controls the operation of SON Functions. The demonstrator is based on an LTE-A radio access network simulator

with several SON functions deployed on a hetero-geneous network, with realistic topology and prop-agation modeling. The considered network portion is presented in figure 3. The macro-cellular layer corresponds to a portion of a real network topology with real-like network parameters and accurate ray tracing based propagation. Small cells are added to the network using standard propagation. We consider a stationary and unbalanced traffic dis-tribution.The demonstration scenario considers 3 SON functions:

• Mobility Load Balacing (MLB) deployed on each macro-cell, in charge of setting the mo-bility parameters to balance the load between neighboring macro cells.

• Cell Range Expansion (CRE) deployed on

each small cell, in charge of balancing the load between the small cell and the neighbor macro cell.

• Enhanced Inter Cell Interference

Coordina-tion (eICIC) deployed on each macro cell having at least 1 small cell deployed in its coverage region. The eICIC’s task is to protect small cell edge users from the high levels of in-terference generated by the macro cell, by tun-ing the number of Almost Blank Subframes (ABS) transmitted by the macro cell (ABS contains only control signals transmitted at reduced power).

For this scenario, we will define the following KPIs:

Li,c,t is the load of cell i

Lc,t is the average load in the considered


Tc,t is the average user throughput in the

considered section

T0c,tis the average pico cell edge user

through-put in the considered section

c ∈ C, where C is the set of actions, which

is in this case the set of SCV set combinations for the deployed SON instances in the considered section. t represents the iteration. The variables were normalized between 0 and 1. The reward function reflecting the operator’s objective, and that the learning algorithm seeks to maximize, is therefore:

rc,t= ω1(1 − σc,t) + ω2Tc,t+ ω3T0c,t (1)

Where the load variance is:

σc,t =


i=0(Li,c,t− Lc,t)



B is the number of cells in the considered section

and ω1, ω2 and ω3 are priority weights set by the


B. Demonstrator

The CPBSM demonstrator consists of 3 main components that are: the Operator Objectives

Panel, the Learning Panel and the Performance Evaluation Panel.

In the Operator Objectives Panel, the operator defines its objectives by giving priorities to several proposed criteria such as load balance, end user throughput or interference level at cell edge. An example of operator objective panel is illustrated in figure 4.

Fig. 4: Operator Objectives Panel

The Learning Panel shows the evolution of the CPBSM’s action selection statistics, as well as the generated rewards. This panel, illustrated in figure 5, shows how the learning process progresses. The action selection statistics graph shows the evolu-tion of the online learning agent decisions. In the first stages of the learning, the demonstrator shows an almost random action selection, due to the lack of knowledge of the learning agent. After several iterations, the agent’s decisions start converging towards the optimal action. This is also shown in the reward function evolution graph, where we can see that the generated rewards keep increasing with time, to finally converge to a stationary state. Furthermore, the demonstration shows that the CPBSM, through its learning agent, is able to adapt to objective changes, i.e. when the operator changes its objectives, the CPBSM will perform a brief learning phase, before converging back to a new optimal action.

Fig. 5: Learning Process Evolution

The Performance Evaluation Panel compares the KPIs performances of the CPBSM with a none-objective driven, static configuration of SON functions as shown in figure 6. It is important to note that the static configuration of SON func-tions is a typical configuration deployed on real field networks. The objective is to illustrate the gain that we can obtain by adopting the CPBSM with respect to typical performances that can be obtained today with SON.

Fig. 6: Performance Comparison III. Conclusion

The demonstrator shows how cognition capabil-ity allows the RAN management system to always make optimal decisions. The visitor will see how changing the operator objectives leads to a recon-figuration of the SON functions in such a way that the new objectives are fulfilled. The operation of a RL based cognitive management will be illustrated and the exploration/exploitation and scalability dilemmas will be explained.


[1] 3GPP TR 36.814, "Technical Specification Group Radio Access Network; Evolved Univer-sal Terrestrial Radio Access (E-UTRA); Fur-ther advancements for E-UTRA physical layer aspects (Release 9)", March 2010.

[2] SEMAFOUR project web page http://fp7-semafour.eu/.

[3] T. Daher, S. Ben Jemaa and L. Decreusefond, "Cognitive Management of Self-Organized Ra-dio Networks Based on Multi Armed Bandit,"

IEEE Personal Indoor and Mobile Radio Com-munications (PIMRC), 2017.

[4] P. Auer, N. Cesa-Bianchi and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Machine learning, 47 (2-3), pp. 235-256, 2002


Documents relatifs

Our results show that spatial biases favoring reward learning in the right (vs. left) hemifield were associated with increased reward responses in the left hemisphere and

We examined the effects of these two motivational constructs, predicting that regular exposure to cognitively demanding situations during the life span may result in older

In section 3, we apply this construction to the games model, define real-time strategies and give the Galois con- nection that make all the denotations of λ-terms real-time.. In

Finally, in order to show that any effects of interface preferences were related to matching with the user’s cognitive style rather than just a preference for any adaptive interface,

We will not review all of this research but will only describe different types of stimulus control that differ with regard to the specific aspects of stimuli in the environment

After knowledge has been acquired, thoughts on conflicts, trust, gaps, nov- elty, and overlaps are derived through reasoning over the graph. Conflicts link back to theory of mind

One of the ways to do this is to apply cognitive visualization technology which en- ables to elaborate a special learning strategy (technique) that involves trainees into

Dans un premier temps, je souhaite remercier M Picot pour son soutien, sa disponibilité et sa patience à notre égard lors de nos doutes. Il nous a accompagnés, écoutés et a

observations of leve!s of programming skill development (cf. .Accounts of novice - expert ditferences in programming ability among.. Further difi"erentiatl~~n will inevitably

The g.eneral concept of "developmental leveln at the abstract theoretical levels of preoperational, concrete operational, and formal operational intellectual

The management is distributed over the system and a hierarchical dependency is set on 3 layers, each having a different level of knowledge of the system and the associated

Une étude a été faite en Brésil Pour étudier la prévalence et l'incidence de la NP liée au traitement antinéoplasique chez les personnes atteintes de myélome multiple

Cognitive and Language Development in

We investigate the impact of text-based factors (readability and problem topic) on the solving of mathematics story problems using a corpus of N = 3394 students

While the relation ‘is quality measurement of’ cannot be used to connect cognitive functioning assay scores to the cognitive functions they provide information for, we

Este trabalho foi realizado no Instituto Barreiro de Biotecnologia IBB, sediado na Fazenda Barreiro Ltda., município de Silvânia, Estado de Goiás, para avaliar os efeitos de

Published in Karen Littleton, Christine Howe (Eds), Educational Dialogues.. Understanding and Promoting

En outre, plusieurs études dans des modèles animaux et sur des biopsies rénales chez l’homme suggèrent la présence de TEM dans les reins fibreux ou, tout au moins, l’acqui-

exhibit acceptor defects, both as caesium and copper vacancies, the major defect depending on the conditions.. Of all the copper vacancies, V Cu1 is the easiest

In the present study, in a sample of 13 males and 22 females between 20 and 36 years of age, we jointly examined four self-estimates of cognitive ability (mathematical, verbal,

D’une part, en utilisant la classification de premier ordre issue des deux échelles les plus utilisées pour évaluer les stratégies de coping chez les patients atteints d’un

MDLChunker represents them using sets of initial symbols or features called hereafter canonical chunks (in the example of section 2, this is the initial alphabet used

The values manifested through the other types of use we have seen earlier flow from this condition: designata physically present within the situation of utterance which are