
Non-Stationary Markov Decision Processes: a Worst-Case Approach using Model-Based Reinforcement Learning


Academic year: 2021


Full text



Figure 1: Tree structure and results from the Non-Stationary bridge experiment.
Figure 2: Discounted return of the three algorithms for various values of .
Figure 1: The Non-Stationary bridge environment.
Table 1: Summary of the number of experiment repetitions, the number of sampled tasks, the number of episodes, the maximum episode length, and the upper bounds on the number of collected samples.
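For context, the discounted return reported in Figure 2 is the standard cumulative objective of reinforcement learning. A minimal sketch in conventional notation (assuming an episode of length H, a discount factor \gamma \in [0,1), and time-indexed rewards r_t, as is usual for non-stationary MDPs; the paper's own notation may differ):

$$ G \;=\; \sum_{t=0}^{H-1} \gamma^{t} \, r_t(s_t, a_t) $$

where s_t and a_t denote the state and action at decision epoch t.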

