
Feature Discovery in Reinforcement Learning using Genetic Programming

Academic year: 2021

Related documents

For evaluating the individuals and testing the performance of the discovered features, we employed two different RL algorithms: λ policy iteration for the Tetris problem, and

In order to exemplify the behavior of least-squares methods for policy iteration, we apply two such methods (offline LSPI and online, optimistic LSPI) to the car-on-the-hill

As predicted by the theoretical bounds, the per-step regret $\widehat{\Delta}$ of UCRL2 significantly increases as the number of states increases, whereas the average regret of our RLPA is

The analysis for the second step is what this work has been about. In our Theorems 3 and 4, we have provided upper bounds that relate the errors at each iteration of API/AVI to

For the last introduced algorithm, CBMPI, our analysis indicated that the main parameter of MPI controls the balance of errors (between value function approximation and estimation

The LTU is particularly high among the low-schooled employed and mainly affects young adults (25-44 years of age), especially women. The national priorities for the fight

Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy

[Benallegue 91] Benallegue A., "Contribution à la commande dynamique adaptative des robots manipulateurs rapides", Thèse de Doctorat, Université Pierre et Marie Curie, Paris,