Multi-Armed Bandit Learning in IoT Networks

HAL Id: hal-02013839

https://hal.inria.fr/hal-02013839

Submitted on 11 Feb 2019

Multi-Armed Bandit Learning in IoT Networks

Remi Bonnefoi, Lilian Besson

To cite this version:

Remi Bonnefoi, Lilian Besson. Multi-Armed Bandit Learning in IoT Networks. Journée des Doctorants de l’IETR, Jul 2017, Rennes, France. ⟨hal-02013839⟩

Poster for PhD Student Day, July 2017, Rennes (France)

By: Remi Bonnefoi & Lilian Besson, First.Last@CentraleSupelec.fr, SCEE Team, CentraleSupélec Rennes & IETR

1. INTRODUCTION & GOAL

Goal: fit more objects into an “Internet of Things” network while keeping a good Quality of Service.

• Hypothesis: each object chooses a channel $k \in \{1, \dots, K\}$ to use for each communication.

• Idea: use on-line Machine Learning algorithms?

• Not so easy: each device makes its own decisions, without central control or communication, and has limited CPU, memory, etc.

• ⟹ Solution: decentralized MAB algorithms!

2. MODEL: TIME/FREQUENCY PROTOCOL

DEVICES IN THE NETWORK

Figure 1: Time-frequency slotted protocol.

Frame = fixed-duration uplink slot ↗ (end-devices transmit their packets) + Ack delay + downlink slot ↙ (the base station replies with an Ack if the packet was well received).

Model: one base station and K = 10 RF channels (of the same bandwidth).

S + D = 2000 end-devices in the network, with a very low duty cycle (one message every 1000 frames).

They are separated into two groups:

• S static devices: poor RF abilities, so each uses only one channel to communicate with the base station. Their choice is fixed in time (stationary) and independent (i.i.d.). The static devices generate the interfering traffic. Their (unknown) assignment to the K channels is $S = (S_1, \dots, S_K)$.

• D dynamic devices: richer RF abilities, so they can use all the available channels by quickly reconfiguring their RF transceiver on the fly (dynamically).
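To illustrate why the number of devices sharing a channel matters, here is a minimal sketch. It assumes a collision model that the poster does not spell out: each device transmits independently in a given frame with probability p = 1/1000 (the duty cycle above), and a packet is lost as soon as any other device on the same channel transmits in the same frame. The function name and its parameters are hypothetical.

```python
def success_probability(n_devices_on_channel: int, p: float = 1e-3) -> float:
    """Probability that a transmitting device sees no collision.

    Assumption (not stated on the poster): every other device on the same
    channel transmits independently with probability p in this frame, and
    any simultaneous transmission causes a collision.
    """
    others = max(n_devices_on_channel - 1, 0)
    # No collision means none of the other devices transmits in this frame.
    return (1.0 - p) ** others
```

Under these assumptions, 2000 devices spread evenly over 10 channels (200 per channel) give a success probability of roughly 0.82, and moving a device to a less crowded channel increases it.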

3. SOME BASELINE ALGORITHMS

Performance = successful transmission rate.

Three algorithms are used for baseline comparison.

• Naive algorithm: every one of the D dynamic devices chooses its channel uniformly at random, $k_i(t) \sim \mathcal{U}(\{1, \dots, K\})$.

• Optimal algorithms: an exact algorithm (or a greedy approximation of it), for the case where a centralized agent can assign the D dynamic devices to channels.

Warning: these are inapplicable in practice, as we need a decentralized approach, but they give a baseline for comparison.
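The two kinds of baseline can be sketched as follows. The poster does not define the greedy approximation, so the version below is only one plausible reading, assumed here: a centralized agent that knows the static assignment places the dynamic devices one by one on the currently least loaded channel. The function names are hypothetical.

```python
import random


def naive_choice(K: int, rng=random) -> int:
    """Naive baseline: pick a channel index in {0, ..., K-1} uniformly at random."""
    return rng.randrange(K)


def greedy_assignment(static_counts, D):
    """Centralized greedy baseline (an assumed interpretation, not the poster's).

    static_counts[k] is the number of static devices on channel k.
    Returns the number of dynamic devices placed on each channel.
    """
    loads = list(static_counts)
    dynamic_counts = [0] * len(loads)
    for _ in range(D):
        # Put the next dynamic device on the least loaded channel so far.
        k = min(range(len(loads)), key=loads.__getitem__)
        loads[k] += 1
        dynamic_counts[k] += 1
    return dynamic_counts
```

For example, with static loads `[3, 1, 0]` and five dynamic devices, this greedy rule ends with three devices on every channel.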

4. MULTI-ARMED BANDITS ALGORITHMS

Every time $t \in \mathbb{N}^*$ that a dynamic device needs to send a packet:

1. it chooses a channel $A(t) \in \{1, \dots, K\}$

2. it sends an uplink packet ↗ on that channel

3. then it observes a binary reward $r_{A(t)}(t) \in \{0, 1\}$ (1 if the Ack ↙ is well received, 0 if there is a collision)

4.1. UPPER CONFIDENCE BOUND ALGO.

Simple frequentist approach:

• Number of selections of channel k, up to time t: $N_k(t) := \sum_{\tau=1}^{t} \mathbb{1}(A(\tau) = k)$

• Accumulated rewards: $X_k(t) := \sum_{\tau=1}^{t} r_k(\tau) \times \mathbb{1}(A(\tau) = k)$

• UCB1 uses a confidence term (with parameter $\alpha > 0$), $B_k(t) := \sqrt{\alpha \log(t) / N_k(t)}$, to compute its index (an upper confidence bound): $U_k(t) := X_k(t)/N_k(t) + B_k(t) = \widehat{\mu}_k(t) + B_k(t)$

• It uses $U_k(t)$ to decide the channel for the next step: $A(t + 1) \in \arg\max_{1 \le k \le K} U_k(t)$

⟹ UCB1 is a deterministic index policy.
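The UCB1 index above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class and method names are hypothetical, channels are indexed from 0, the default α = 0.5 is taken from the legend of Figure 3, and playing every channel once before using the index (so that N_k(t) > 0) is a common convention assumed here.

```python
import math


class UCB1:
    """Sketch of a UCB1 agent for one dynamic device (names are illustrative)."""

    def __init__(self, K: int, alpha: float = 0.5):
        self.K = K
        self.alpha = alpha
        self.t = 0            # number of transmissions made so far
        self.N = [0] * K      # N_k(t): selections of channel k
        self.X = [0.0] * K    # X_k(t): accumulated rewards on channel k

    def choose(self) -> int:
        # Assumed initialization: try each channel once so that N_k(t) > 0.
        for k in range(self.K):
            if self.N[k] == 0:
                return k

        def index(k: int) -> float:
            mean = self.X[k] / self.N[k]                                 # empirical mean
            bonus = math.sqrt(self.alpha * math.log(self.t) / self.N[k])  # B_k(t)
            return mean + bonus                                          # U_k(t)

        return max(range(self.K), key=index)

    def update(self, k: int, reward: float) -> None:
        # reward is 1 if the Ack was received, 0 otherwise.
        self.t += 1
        self.N[k] += 1
        self.X[k] += reward
```

A device would call `choose()` before each transmission and `update(k, reward)` once it knows whether the Ack arrived. The memory cost is two counters per channel, which fits the low-resource constraint mentioned in the introduction.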

4.2. THOMPSON SAMPLING ALGORITHM

Old algorithm (1933), Bayesian approach:

• Start with a flat Beta prior, $\mathrm{Beta}(1, 1)$, on the (unknown) parameter $\mu_k \in [0, 1]$.

• At time t, the posterior counts the successes and failures of channel k: $\Pi_k(t) = \mathrm{Beta}\big(1 + X_k(t),\; 1 + N_k(t) - X_k(t)\big)$

• Then sample a random index for each channel from the posteriors: $I_k(t) \sim \Pi_k(t)$

• And choose: $A(t + 1) \in \arg\max_{1 \le k \le K} I_k(t)$

⟹ TS is a randomized index policy.
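The Thompson Sampling steps above can be sketched with the standard library only. As before, this is an illustrative sketch under assumptions: the class and method names and the optional `seed` argument are hypothetical, and channels are indexed from 0. The posterior sample uses `random.Random.betavariate`.

```python
import random


class ThompsonSampling:
    """Sketch of a Thompson Sampling agent for binary rewards (names are illustrative)."""

    def __init__(self, K: int, seed=None):
        self.K = K
        self.N = [0] * K   # N_k(t): selections of channel k
        self.X = [0] * K   # X_k(t): number of successes (Acks) on channel k
        self.rng = random.Random(seed)

    def choose(self) -> int:
        # Draw I_k(t) ~ Beta(1 + successes, 1 + failures) for every channel.
        samples = [
            self.rng.betavariate(1 + self.X[k], 1 + self.N[k] - self.X[k])
            for k in range(self.K)
        ]
        # Play the channel with the largest sampled index.
        return max(range(self.K), key=samples.__getitem__)

    def update(self, k: int, reward: int) -> None:
        # reward is 1 if the Ack was received, 0 otherwise.
        self.N[k] += 1
        self.X[k] += reward
```

Because the index is sampled rather than computed, two devices with identical histories can still pick different channels, which is what makes the policy randomized.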

5. QUICK CONVERGENCE OF MAB ALGORITHMS

[Figure 2 plots, four panels. x-axis: number of slots (×10^5, from 2 to 10); y-axis: successful transmission rate; curves: UCB, Thompson sampling, Optimal, Good sub-optimal, Random.]

Figure 2: Performance of 2 MAB algorithms, compared to baseline algorithms (naive or optimal), when the proportion of dynamic end-devices in the network increases, for 10%, 30%, 50% and up to 100% (limit scenario).

⟹ Almost optimal performance! ⟹ Very quick convergence!

6. NEAR-OPTIMAL PERFORMANCE

[Figure 3 plot. x-axis: proportion of dynamic devices (ticks from 0.1 to 0.9); y-axis: gain compared to random channel selection (from -0.02 to 0.16); curves: Optimal strategy, UCB1 (α = 0.5), Thompson sampling.]

Figure 3: Learning with UCB1 and TS, with more and more dynamic devices. ⟹ For any configuration, TS converges quickly to near-optimal performance!

7. CONCLUSIONS

• Our approach is simple to set up: every dynamic object runs a simple on-line Multi-Armed Bandit algorithm to learn the quality of each channel, and aims at the most available channel.

• Economical: low runtime complexity and low memory requirements.

In a fully decentralized manner, dynamic devices learn to fit in the channels almost optimally!

• Convergence is very quick: about 50 communications per device are enough!

• Surprising result: stochastic MAB algorithms also work very well in non-stochastic environments!

⟹ With many dynamic objects in an IoT network, MAB learning helps to improve the successful transmission rate and increases the quality of service.

9. THANKS TO . . .

– ADDI association for the “PhD Students Day” 2017!

– Magnet Inria team for the Workshop on Decentralized ML 2017!

– CentraleSupélec, IETR, Univ. Rennes 1, Inria (Lille) & CNRS

– Our advisors: Christophe Moy, Émilie Kaufmann & J. Palicot.

8. MAIN REFERENCES

More on-line → http://lbo.k.vu/JdD2017

[BBM+17] R. Bonnefoi, L. Besson, C. Moy, E. Kaufmann, and J. Palicot (2017). Multi-Armed Bandit Learning in IoT Networks: Learning helps even in non-stationary settings. Sent to the CrownCom 2017 conference in May 2017.

[MPD16] C. Moy, J. Palicot, and S. J. Darak (2016). Proof-of-Concept System for Opportunistic Spectrum Access in Multi-user
