Approximate Policy Iteration for Generalized Semi-Markov Decision Processes: an Improved Algorithm
Related documents
there are tight temporal constraints, delaying a task does not reduce the expected value, EOC and OCr are nearly the same when resources are large. Improvements are much larger
We prove that the number of queries needed before finding an optimal policy is upper-bounded by a polynomial in the size of the problem, and we present experimental results
In order to exemplify the behavior of least-squares methods for policy iteration, we apply two such methods (offline LSPI and online, optimistic LSPI) to the car-on-the-hill
The analysis of the second step is the focus of this work. In our Theorems 3 and 4, we have provided upper bounds that relate the errors at each iteration of API/AVI to
For the classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter allows one to control the balance between the estimation error of
For many classes of dynamical systems, including all POMDPs, it has been shown that special sets of such future events, called the set of core tests, can fully represent the state,
For the last introduced algorithm, CBMPI, our analysis indicated that the main parameter of MPI controls the balance of errors (between value function approximation and estimation
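Two of the related excerpts refer to the "main parameter" of modified policy iteration (MPI). As a hedged, tabular illustration only (not CBMPI or the classification-based implementation those excerpts analyze), the sketch below calls that parameter `m`. Each iteration takes one greedy improvement step and then applies the current policy's Bellman operator `m` times, so `m = 1` recovers value iteration and a large `m` approaches policy iteration. The two-state MDP, the function names, and the iteration count are hypothetical choices made for the example.

```python
from typing import List, Tuple

# Hypothetical tabular MDP encoding (an assumption for this sketch):
# P[a][s][t] = probability of moving from state s to state t under action a
# R[s][a]    = expected immediate reward for taking action a in state s
Transitions = List[List[List[float]]]
Rewards = List[List[float]]


def q_values(P: Transitions, R: Rewards, v: List[float], gamma: float) -> List[List[float]]:
    """Q[s][a] = R[s][a] + gamma * sum_t P[a][s][t] * v[t]."""
    n_actions, n_states = len(P), len(v)
    return [[R[s][a] + gamma * sum(P[a][s][t] * v[t] for t in range(n_states))
             for a in range(n_actions)]
            for s in range(n_states)]


def greedy(P: Transitions, R: Rewards, v: List[float], gamma: float) -> List[int]:
    """Greedy policy w.r.t. v (ties broken toward the lowest action index)."""
    return [max(range(len(P)), key=lambda a: q[a]) for q in q_values(P, R, v, gamma)]


def modified_policy_iteration(P: Transitions, R: Rewards, gamma: float, m: int,
                              n_iter: int = 200) -> Tuple[List[float], List[int]]:
    """Tabular MPI: m = 1 is value iteration; large m approaches policy iteration."""
    n_states = len(R)
    v = [0.0] * n_states
    for _ in range(n_iter):
        pi = greedy(P, R, v, gamma)  # policy improvement step
        for _ in range(m):           # partial evaluation: m synchronous backups of T_pi
            v = [R[s][pi[s]] + gamma * sum(P[pi[s]][s][t] * v[t] for t in range(n_states))
                 for s in range(n_states)]
    return v, greedy(P, R, v, gamma)


# Hypothetical 2-state, 2-action MDP: action 0 = stay, action 1 = switch state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
R = [[0.0, 0.0],  # state 0 earns nothing
     [1.0, 0.0]]  # state 1 earns 1 per step for staying

if __name__ == "__main__":
    for m in (1, 5, 50):
        print(m, modified_policy_iteration(P, R, gamma=0.9, m=m))
```

With discount 0.9, every choice of `m` should reach the same optimal policy (switch out of state 0, stay in state 1); `m` changes how much evaluation work is done per improvement step, which is the trade-off the exact tabular setting makes visible before any approximation error enters.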