Feature Discovery in Reinforcement Learning using Genetic Programming
Texte intégral
Documents relatifs
For evaluating the individuals and testing the performance of the discovered features, we employed two different RL algorithms: λ policy iteration for the Tetris problem, and
In order to exemplify the behavior of least-squares methods for policy iteration, we apply two such methods (offline LSPI and online, optimistic LSPI) to the car-on- the-hill
As predicted by the theoretical bounds, the per-step regret b ∆ of UCRL2 significantly increases as the number of states increases, whereas the average regret of our RLPA is
The analysis for the second step is what this work has been about. In our Theorems 3 and 4, we have provided upper bounds that relate the errors at each iteration of API/AVI to
For the last introduced algorithm, CBMPI, our analysis indicated that the main parameter of MPI controls the balance of errors (between value function approximation and estimation
The LTU is particularly high among the low schooled employed and mainly affects the young adults (25-44 years of age) specially women. The national priorities for the fight
Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy
[Benallegue 91] Bennallegue A., "Contribution à la commande dynamique adaptative des robots manipulateurs rapides", Thèse de Doctorat, Université Pierre et Marie Curie, Paris,