Récemment recherché

Aucun résultat trouvé

Étiquettes

Aucun résultat trouvé

Document

Aucun résultat trouvé

Accueil Écoles Thèmes

Connexion

OpenDP a free Reinforcement Learning toolbox for discrete time control problems

Partager "OpenDP a free Reinforcement Learning toolbox for discrete time control problems"

N/A

N/A

Protected

Année scolaire: 2021

Info

Protected

Academic year: 2021

Partager "OpenDP a free Reinforcement Learning toolbox for discrete time control problems"

Copied!

2

0

0

2

0

0

Chargement.... (Voir le texte intégral maintenant)

Télécharger maintenant ( 2 Page )

Texte intégral

(1)

HAL Id: inria-00117392

https://hal.inria.fr/inria-00117392

Submitted on 1 Dec 2006

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

OpenDP a free Reinforcement Learning toolbox for discrete time control problems

Sylvain Gelly, Olivier Teytaud

To cite this version:

Sylvain Gelly, Olivier Teytaud. OpenDP a free Reinforcement Learning toolbox for discrete time

control problems. NIPS Workshop on Machine Learning Open Source Software, Dec 2006, Whistler

(B.C.). �inria-00117392�

(2)

OpenDP a free Reinforcement Learning toolbox for discrete time control problems

OpenDP (http://opendp.sourceforge.net) is a free software (under GPL) written in C++ for control problems: (i) with discrete time steps (either finite horizon, or approximation of infinite horizon by temporal-ring for stationnary problems), (ii) where the transition function newState = T ( state, decision ) can be encoded in C++, (iii) possibly stochastic (with the function T above depending on some random process).

OpenDP is designed to be a general framework for all control problems of the previous form. One can easily add new solving methods to the framework and can compare the results to already existing methods. OpenDP includes some benchmark problems, with parameterizable dimension and difficulty. Some benchmarks are stochastic and some are deterministic. Other benchmark prob- lems can easily be added.

OpenDP includes policy search methods in their simplest way, but its core solving method is stochastic dynamic programming. In this method, the Bell- man value is computed backward in time by iterating: (i) sampling points in the state domain, (ii) for each point, compute the best expected value given the Bellman function, (iii) the examples and values given (i) and (ii) to learn a regression which gives the next Bellman function.

For each of the three steps considered above, a toolbox is provided including classical methods. For the sampling step, we provide classes generating some classical quasi random sequences (low discrepancy sequences with or without scrambling, low dispersion, ...). For the continuous-optimisation steps, OpenDP is interfaced with famous optimisation frameworks as Opt++, Coin DFO, EO, Beagle, which include for example quasi-Newton algorithms, Hooke&Jeeves, CMAES, SAES and some other algorithms. For the learning step, OpenDP is interfaced with the Weka library and so we include all the regressions imple- mented in Weka. OpenDP is also interfaced with the Torch library (including SVM and Neural Networks) and the FANN library. The algorithms included in the interfaced toolboxes are available as plugins and so OpenDP can be used even if the libraries are not present.

The download area of OpenDP can be found at http://opendp.sourceforge.

net/. OpenDP is in beta stage of developpement. Some typical command-lines provide extensive experiments (see the webpage above), providing a simple way of testing modifications of the source code. OpenDP also comes with a graph- ical user interface written in Qt, with curve visualizations in 2D, and Bellman function visualization in 3D.

The OpenDP source code is cross-platform. OpenDP actually works under Linux and MS Windows and binary-packages can be downloaded for theses platforms. The building process and some mecanisms have to be ported to Mac OS X.

Acknowledgment

Authors are Sylvain Gelly

¹

, Olivier Teytaud

¹

, J´er´emie Mary

²

. This work was supported in part by the Pascal Network of Excellence.

1

TAO (INRIA), LRI, UMR CNRS, Univ. Paris-Sud; email firstname.lastname@lri.fr.

2

Grappa, Univ. Lille UMR CNRS, Jeremie.Mary@grappa.univ-lille3.fr

1

Références

Télécharger maintenant ( PDF - 2 Page - 104.37 KB )

Documents relatifs

Finite-Time Stabilization of Longitudinal Control for Autonomous Vehicles via a Model-Free Approach

Comparison of the longitudinal speed errors (ground truth) between classic and adaptive model- free control on real driver reference data..

An exponential turnpike theorem for dissipative discrete time optimal control problems

We have derived a condition for establishing an exponential turnpike property for nonlinear discrete time finite horizon optimal control problems, given in terms of a bound on

Discrete-time probabilistic approximation of path-dependent stochastic control problems

Next in Section 3, we provide a probabilistic interpretation of the numerical scheme, by showing that the numerical solution is equivalent to the value function of a

A discrete duality finite volume method for elliptic problems with corner singularities

Therefore, in order to obtain the corresponding error estimates, we combine the error analysis of the DDFV scheme given in [DO 05] for smooth solutions of the Laplacian problem,

Fast input-free observers for LPV discrete-time systems

Abstract— This paper deals with fast input-free partial state estimation for discrete-time LPV systems through functional observers.. The system inputs are assumed to be unknown

Characterization of LOCOS and oxidized mesa isolation in deep-sub micrometer SOI NMOS processes

Figure 8 shows a section of the basic test structure used to study sidegating. The layout of the gate lines and the alternating metal connections to the active

Chemistry models for major gas species estimation and tar prediction in fluidized bed biomass gasification

An accurate simulation of the biomass gasification process requires an appropriately detailed chemistry model to capture the various chemical processes inside the

Using SysML for Smart Surface Modeling

As an example of possible re- finement we have presented SysML diagrams corresponding to the distributed MEMS air-flow surface micromanipulator from [1]. As SysML is a language, not

Téléchargez tous les documents en téléchargeant vos documents d'étude.

Votre document sera enrichi, partagé sur 123dok FR pour vous aider à étudier.

Documents relatifs

La Turquie et le rêve européen

La Turquie et le rêve européen

13

0

0

Amérique latine : la croissance en sous-régime

Amérique latine : la croissance en sous-régime

5

0

0

Morfogênese in vitro em explantes foliares e radiculares de Jacarandá da Bahia.

Morfogênese in vitro em explantes foliares e radiculares de Jacarandá da Bahia.

11

0

0

الفاصلة القرآنية بين الإيقاع و الدلالة – سورة الحاقة أنموذجا-

الفاصلة القرآنية بين الإيقاع و الدلالة – سورة الحاقة أنموذجا-

80

0

0

Parallel manipulators. Part 2 : Theory. Singular configurations and Grassmann geometry

Parallel manipulators. Part 2 : Theory. Singular configurations and Grassmann geometry

70

0

0

Acoustic leak detection survey strategies for water distribution pipes

Acoustic leak detection survey strategies for water distribution pipes

6

0

0

Intérêt et faisabilité de l'échographie dans la prise en charge des traumatismes thoraciques aux urgences

Intérêt et faisabilité de l'échographie dans la prise en charge des traumatismes thoraciques aux urgences

38

0

0

CHROMAgar Mastitis medium: assessment of the new formulation and comparison with the previous one

CHROMAgar Mastitis medium: assessment of the new formulation and comparison with the previous one

1

0

0