
A minimax and asymptotically optimal algorithm for stochastic bandits


Academic year: 2021



Related documents

In this paper, the uncoordinated spectrum access problem, where each user is allowed to choose N channels at each timeslot, was modeled as a multi-user MAB with multiple plays

T ) (Audibert & Bubeck, 2010) and the risk-aversion setup has Ω(T^{2/3}) (Sani et al., 2012); therefore, our worst-case result implies that minimizing the quantile-based regret

Differential privacy of the Laplace mechanism (see Theorem 4 in Dwork and Roth 2013): for any real function g of the data, a mechanism adding Laplace noise with scale parameter β

In what follows, we put Lemma 1 to use and prove improved high-probability performance guarantees for several well-studied variants of the non-stochastic bandit problem, namely,

The first term is due to the difference in the mean–variance of the best arm and the arms pulled by the algorithm, while the second term denotes the additional variance introduced

This result illustrates the fact that in this active learning problem (where the goal is to estimate the mean values of the arms), the performance of the algorithms that rely on

Then, we propose algorithms for regret minimization in parametric stochastic bandit problems whose arms belong to a certain family

1) Effect of oxidative stress on antioxidant enzyme activities and on markers of oxidative stress. In this part, the impact of oxidative stress on