• Aucun résultat trouvé

Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits

N/A
N/A
Protected

Academic year: 2021

Partager "Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits"

Copied!
3
0
0

Texte intégral

(1)

HAL Id: hal-03201526

https://hal.archives-ouvertes.fr/hal-03201526

Submitted on 19 Apr 2021

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits

Thibaut Cuvelier, Richard Combes, Eric Gourdin

To cite this version:

Thibaut Cuvelier, Richard Combes, Eric Gourdin. Statistically Efficient, Polynomial-Time Algo- rithms for Combinatorial Semi-Bandits. SIGMETRICS 2021, ACM, Jun 2021, Virtual Event, China.

�10.1145/3410220.3453926�. �hal-03201526�

(2)

Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits

Thibaut Cuvelier

Centrale-Supelec, L2S and Orange Labs

Gif-sur-Yvette, France [email protected]

Richard Combes

Centrale-Supelec, L2S Gif-sur-Yvette, France [email protected]

Eric Gourdin

Orange Labs Chatillon, France [email protected]

ABSTRACT

We consider combinatorial semi-bandits over a setX ⊂ {0,1}d where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret boundR(T) = O

d(

lnm)2(lnT)

min

afterT rounds, wherem=maxx∈X1x. How- ever, ESCB has computational complexityO(|X |), which is typi- cally exponential ind, and cannot be used in large dimensions. We propose the first algorithm that is both computationally and statisti- cally efficient for this problem with regretR(T)=O

d(lnm)2(lnT)

min

and computational asymptotic complexity

O(δT1poly(d)), whereδT is a function which vanishes arbitrarily slowly. Our approach involves carefully designing AESCB, an ap- proximate version of ESCB with the same regret guarantees. We show that, whenever budgeted linear maximization overXcan be solved up to a given approximation ratio, AESCB is implementable in polynomial timeO(δT1poly(d)) by repeatedly maximizing a linear function overXsubject to a linear budget constraint, and showing how to solve these maximization problems efficiently. Ad- ditional algorithms, proofs and numerical experiments are given in the complete version of this work.

CCS CONCEPTS

•Mathematics of computing→Combinatorial optimization;

•Theory of computation→Reinforcement learning;

KEYWORDS

Bandits, Combinatorial Bandits, Combinatorial Optimization ACM Reference Format:

Thibaut Cuvelier, Richard Combes, and Eric Gourdin. 2021. Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits. In Abstract Proceedings of the 2021 ACM SIGMETRICS / International Confer- ence on Measurement and Modeling of Computer Systems (SIGMETRICS ’21 Abstracts), June 14–18, 2021, Virtual Event, China.ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/3410220.3453926

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored.

For all other uses, contact the owner /author(s).

SIGMETRICS ’21 Abstracts, June 14–18, 2021, Virtual Event, China

© 2021 Copyright held by the owner/author(s).

ACM ISBN 978-1-4503-8072-0/21/06.

https://doi.org/10.1145/3410220.3453926

1 COMBINATORIAL SEMI-BANDITS

We consider combinatorial semi-bandits: time is discrete, and at timest=1, ...,T a learner chooses a decisionx(t)∈ X, whereX ⊂ {0,1}dis a combinatorial set which is known to the learner. SetX may be any combinatorial set, including the bases of a matroid, the set of paths in some graph, the set of matchings in a bipartite graph, etc. The problem dimension isd, and we definem=maxx∈X1x the size of the largest decision. After selecting decisionx(t), the learner then receives a rewardZ(t)x(t)and observes a feedback vectorY(t)=(x1(t)Z1(t), . . . ,xd(t)Zd(t)), whereZ(t)∈[0,1]dis a random vector.

We assume that (Z(t))t are i.i.d. with meanθ ∈ [0,1]d and that the entries ofZ(t)are independent as well. Vectorθis initially unknown to the learner, and must be learnt by repetitively selecting decisions and observing subsequent feedback. Fori∈ {1, ...,d}, if xi(t) =1, then the learner obtains a noisy realization ofθi and nothing otherwise, so that decisions must be carefully selected to obtain a good estimate ofθ. This is the "semi-bandit feedback"

model. Sinceθis unknown to the learner, decisionx(t)must be selected solely as a function of the feedback information available at timet, i.e.Y(t−1), ...,Y(1).

The expected reward received by selecting decisionx ∈ Xis θx(i.e. rewards are linear in the decision), so thatθirepresents the amount of reward received by selectingxi =1. The optimal decision isx∈arg maxx∈Xx}(there may be several optimal decisions). We define the reward gap∆x(x−x), i.e. the amount of regret incurred to the learner by selecting decisionx instead ofx. We denote by∆min =minx:

x>0x the smallest non-null gap.

The goal of the learner is to minimize the regret, which is simply the difference in terms of expected cumulative rewards between the learner and an oracle who knows the latent vectorθin advance and who always selects the optimal decisionx, that is:

R(T)= XT t=1

E(∆x(t)).

Known algorithms for this problem include CUCB [2], ESCB [1]

and TS [3].

2 THE AESCB ALGORITHM

We now propose AESCB (Approximate-ESCB), an algorithm that approximates ESCB and enjoys the same regret bound, while being implementable with polynomial complexity (unlike ESCB). The AESCB algorithm requires two sequences(εtt), which quantify the level of approximation at each time step. We define the following

(3)

SIGMETRICS ’21 Abstracts, June 14–18, 2021, Virtual Event, China Thibaut Cuvelier, Richard Combes & Eric Gourdin

statistics, fori=1, ...,d: ni(t)=

t−1

X

t=1

xi(t) θˆ(t)=

Pt−1

t=1xi(t)Zi(t) max(1,Pt−1

t=1xi(t)) σi2(t)=

f(t)

2ni(t) ifni(t)≥1 +∞ otherwise.

where, at timet,ni(t)is the number of samples obtained forθi,θˆi(t) is the estimate ofθi, andσi2(t)is proportional to the variance of estimateθˆi(t).f(t)is defined as lnt+4mln lnt. We denote byn(t)= (ni(t))i=1,...,d,θˆ(t)=(θˆi(t))i=1,...,d, andσ2(t) =(σi2(t))i=1,...,d

the corresponding vectors.

Definition 2.1 (AESCB). The AESCB algorithm with approxima- tion factors(εtt)t≥1is the policy which at any timet≥1selects a decisionx(t)verifying:

arg max

x∈X

{θˆ(t)x+q

σ2(t)x} ≤δt+θˆ(t)x(t)+ 1 εt

2(t)x(t)

where ties are broken arbitrarily.

When(εtt) = (1,0)for allt ≥ 1, AESCB reduces to ESCB.

Our first main result is Theorem 2.2, which provides a regret upper bound for AESCB. We show that, if one chooses approximation parameters(εtt)withεt=ε>0 some fixed number andδtany sequence such that limt→∞δt =0, then AESCB verifies the same (state-of-the-art) regret as ESCB up to a multiplicative constant. For m-sets, knapsack sets, and source destination paths, we chooseε=1.

For spanning trees, matroids, matchings, and matroid intersection, we chooseε = 12 (see Section 3). This choice of parameters does not require any knowledge about the time horizonT, nor about the unknown problem parametersθ, nor about the minimal gap∆min. Nevertheless, if∆minis known as well, we can selectδtto yield an even better algorithm; however, knowing this parameter is by no means required. We can show that, with this choice of parameters, AESCB can be implemented in polynomial time.

Theorem 2.2 (Regret of AESCB). The regret of AESCB with parameters(εtt)admits the following upper bound for allT ≥1:

R(T)≤C4(m)+2d m3

2min + 24d f (T) (mint≤Tεt)2min

&

lnm 1.61 '2

+4

T

X

t=1

δt1(∆min≤4δt).

withC4(m)a positive number that solely depends onm. By corollary, forεt =εandlimt→∞δt =0, we have:

R(T)=O d(lnm)2 1

min

lnT!

as T → ∞.

Similarly, withεt =εandδt < 14min, we have, for allT ≥1: R(T)≤C4(m)+2d m3

2min +24d f (T) ε2min

&

lnm 1.61 '2

.

3 AESCB IN POLYNOMIAL TIME

We now show a technique to implement AESCB that ensures poly- nomial time complexity. While our methodology is generic, the precise value of the computational complexity depends on the combinatorial setX. Our approach involves three steps: rounding and scaling to ensure that the weights are integer, then solving a budgeted linear maximization overXseveral times, and finally maximizing over the budget to obtain the result. Given timet, statis- ticsθˆ(t)andσ2(t), and approximation factors(εtt), the method works as follows.

Step 1: rounding and scaling.Definea(t)andb(t): ξ(t)=⌈m/δt⌉.

ai(t)=⌈ξ(t)θˆi(n)⌉,i∈ {1, ...,d} bi(t)=ξ(t)2σi2(t),i∈ {1, ...,d}

Step 2: budgeted linear maximization.For alls ∈ {0, ...,mξ(t)}, compute ¯xs(t), anεt-optimal solution to budgeted linear maximiza- tion problem:

s(t)≥εt

max

x∈X:a(t)xs{b(t)x}

and a(t)s(t)≥s.

Step 3: optimizing over a budget.Return decisionx(t): x(t)=x¯s(t)(t)with

s(t)∈ arg max

s=0,...,mξ(t)

( s+ 1

εt

qb(t)s(t) )

.

a(t)is defined using a ceiling operation in order to ensure that a(t)x has an integer value for anyx ∈ X, whileb(t)does not need to have integer entries. Theorem 3.1 states that this technique returns the decision chosen by AESCB, in a time proportional to solving budgeted linear maximization at mostmξ(t)times (where ξ(t)is bounded by a polynomial ind), and that the input parameters a(t)andb(t)are positive vectors and where the entries ofa(t)are

in{1, ...,ξ(t)}. For many combinatorial sets of interest, budgeted

linear maximization overXcan be done in polynomial time in the dimension, so that AESCB is indeed implementable in polynomial time, see the complete version of this work where we provide algorithms to do so.

Theorem 3.1. The above algorithm returns a decisionx(t)∈ X verifying the AESCB definition. It does so by maximizingxb(t)sub- ject toxa(t)≥soverXat mostmξ(t)times with input parameters a(t)andb(t), wherea(t)∈ {1, ...,ξ(t)}d andb(t)∈Rd.

4 CONCLUSION

We propose AESCB, the first algorithm which enjoys both the state- of-the art regret bound of ESCB and polynomial computational complexity.

REFERENCES

[1] Richard Combes, M. Sadegh Talebi, Alexandre Proutiere, and Marc Lelarge. 2015.

Combinatorial Bandits Revisited. InProc. of NIPS.

[2] Branislav Kveton, Zheng Wen, Azin Ashkan, and Csaba Szepesvari. 2015. Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits. InProc. of AISTATS. [3] Siwei Wang and Wei Chen. 2018. Thompson Sampling for Combinatorial Semi-

Bandits. InProc. of ICML.

Références

Documents relatifs

We propose AESCB (approximate ESCB), the first algorithm that is both computationally and statistically efficient for this problem: it has a state-of-the-art regret bound and

To conclude these preliminaries, we recall known fast algorithms for three polynomial matrix subroutines used in our determinant algorithm: multiplication with unbalanced

Furthermore, as shown by Exam- ple 1, in order to determine this differential Hilbert function it may be necessary to compute derivatives of the original equations up to order ν,

In this paper we consider the computational complexity of solving initial-value problems defined with analytic ordinary differential equations (ODEs) over unbounded domains in R n and

Pollard did not analyze the asymptotic complexity of his method, but it can be shown that its recursive use for the arithmetic

We describe a polynomial-time algorithm to compute the homotopic Fr´echet distance between two given polygonal curves in the plane minus a given set of obstacles, which are

DLAL keeps the same properties as LAL (P- completeness and polynomial bound on execution) but ensures the complexity bound on the lambda-term it- self: if a term is typable one

Characterizing complexity classes over the reals requires both con- vergence within a given time or space bound on inductive data (a finite approximation of the real number)