• Aucun résultat trouvé

Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris

N/A
N/A
Protected

Academic year: 2021

Partager "Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris"

Copied!
55
0
0

Texte intégral

Figure

Figure 1: Visualizing λ Policy Iteration on the greedy partition sketch: Following (Bertsekas and Tsitsiklis, 1996, page 226), one can decompose the value space as a collection of polyhedra, such that each polyhedron corresponds to a region where one polic
Figure 2: λ Policy Iteration, a fundamental algorithm for Reinforcement Learn- Learn-ing: The left drawing, taken from chapter 10.1 the book of Sutton and Barto (1998), represents “two of the most important dimensions” of Reinforcement Learning methods
Figure 3: This simple deterministic MDP is used to show that λ Policy Iteration cannot be analysed in terms of contraction (see text for details).
Figure 4: Left: a screenshot of a Tetris game. Right: the seven existing shapes.
+3

Références

Documents relatifs

Similarly to the 1-dimensional case, we shall prove the compactness of functions with equi- bounded energy by establishing at first a lower bound on the functionals E h.. Thanks to

In this study, we provide new exponential and polynomial bounds to the ratio functions mentioned above, improving im- mediate bounds derived to those established in the

Even though the theory of optimal control states that there exists a stationary policy that is optimal, Scherrer and Lesner (2012) recently showed that the performance bound of

The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.. L’archive ouverte pluridisciplinaire HAL, est

In trials of CHM, study quality and results depended on language of publication: CHM trials published in English were of higher methodological quality and showed smaller effects

Spruill, in a number of papers (see [12], [13], [14] and [15]) proposed a technique for the (interpolation and extrapolation) estimation of a function and its derivatives, when

For the last introduced algorithm, CBMPI, our analysis indicated that the main parameter of MPI controls the balance of errors (between value function approximation and estimation

Shi, Bounds on the matching energy of unicyclic odd-cycle graphs, MATCH Commun.. Elumalai, On energy of graphs,