
Approximate Policy Iteration for Generalized Semi-Markov Decision Processes: an Improved Algorithm


Academic year: 2021





Related documents

… there are tight temporal constraints, delaying a task does not reduce the expected value; EOC and OCr perform much the same when resources are large. Improvements are much larger …

We prove that the number of queries needed before finding an optimal policy is upper-bounded by a polynomial in the size of the problem, and we present experimental results …

In order to exemplify the behavior of least-squares methods for policy iteration, we apply two such methods (offline LSPI and online, optimistic LSPI) to the car-on-the-hill …

The analysis for the second step is what this work has been about. In our Theorems 3 and 4, we have provided upper bounds that relate the errors at each iteration of API/AVI to …

For the classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter makes it possible to control the balance between the estimation error of …

For many classes of dynamical systems, including all POMDPs, it has been shown that special sets of such future events, called core tests, can fully represent the state …

For the last algorithm introduced, CBMPI, our analysis indicated that the main parameter of MPI controls the balance of errors (between value function approximation and estimation …)
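
Several of the excerpts above concern approximate and modified policy iteration (API/MPI), and two of them point to MPI's main parameter as the knob that balances approximation and estimation error. For orientation only, here is a minimal Python sketch of the generic MPI loop; the two-state MDP, its transition and reward numbers, and the helper modified_policy_iteration are hypothetical illustrations, not code from any of the listed papers. The parameter m is the number of truncated evaluation backups per greedy improvement step: m = 1 recovers value iteration, and letting m grow recovers exact policy iteration.

import numpy as np

# Hypothetical 2-state, 2-action MDP, for illustration only.
# P[a][s, s'] is a transition probability; R[a][s] is an expected reward.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # action 0
     np.array([[0.5, 0.5], [0.0, 1.0]])]   # action 1
R = [np.array([1.0, 0.0]),                 # action 0
     np.array([0.0, 2.0])]                 # action 1
gamma = 0.95  # discount factor

def modified_policy_iteration(m, iters=100):
    # Generic MPI loop: m truncated evaluation backups per improvement step.
    n_states = len(R[0])
    V = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(iters):
        # Greedy improvement: pick the action maximizing the one-step lookahead.
        Q = np.stack([R[a] + gamma * P[a] @ V for a in range(len(P))])
        policy = Q.argmax(axis=0)
        # Truncated (approximate) evaluation of the greedy policy: m backups.
        for _ in range(m):
            V = np.array([R[policy[s]][s] + gamma * P[policy[s]][s] @ V
                          for s in range(n_states)])
    return policy, V

policy, V = modified_policy_iteration(m=5)
print("greedy policy:", policy, "value estimate:", V)

In the approximate setting studied by the papers above, the exact backups in the inner loop are replaced by fitted estimates from samples, which is where the approximation/estimation trade-off governed by m arises.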
