
Aggregating Optimistic Planning Trees for Solving Markov Decision Processes



Full text


Figure 1: Comparison of ASOP to OP-MDP, UCT, and FSSS on the inverted pendulum benchmark problem, showing the sum of discounted rewards for simulations of 50 time steps.
Figure 2: Comparison of different planning strategies (on the same problem as in Figure 1).
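The quantity plotted in Figure 1 is the discounted return of a 50-step simulation. As a reference, a minimal sketch of that computation (the discount factor 0.95 is a placeholder, not a value taken from the paper):

```python
def discounted_return(rewards, gamma=0.95):
    """Sum of discounted rewards: r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# A 50-step simulation yields rewards = [r_0, ..., r_49];
# discounted_return(rewards) is the quantity plotted in Figure 1.
```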

Related documents

… a single “empirical” MDP based on the collected information, in which states are represented by forests: on a transition, the forest is partitioned into groups by successor states …
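The excerpt only states that empirical states are forests of planning trees, split by successor state on each transition. A minimal sketch of that bookkeeping, assuming (this is illustrative, not the paper's code) that each planning-tree node records the (successor, child) pairs observed for each action:

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Node:
    # children[action] is a list of (successor_state, child_node) pairs
    # recorded while the planning tree was grown (illustrative structure)
    children: dict = field(default_factory=dict)

def partition_forest(forest, action):
    """Split a forest (the set of trees standing for one empirical state)
    by successor state: all children reached via `action` that share a
    successor together form the forest representing that successor."""
    groups = defaultdict(list)
    for node in forest:
        for successor, child in node.children.get(action, []):
            groups[successor].append(child)
    return dict(groups)  # successor_state -> forest (list of nodes)
```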

We have proposed BOP (for “Bayesian Optimistic Planning”), a new model-based Bayesian reinforcement learning algorithm that extends the principle of the OP-MDP algorithm [10], [9] …
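The principle of optimistic planning referred to here is to always expand the leaf whose upper bound on the achievable return is largest. A minimal sketch of such a bound, assuming rewards normalized to [0, 1] so that the return remaining below a leaf at depth d is at most gamma^d / (1 - gamma):

```python
def optimistic_bound(disc_reward_sum, depth, gamma):
    """Upper bound on the return obtainable through a leaf at `depth`:
    the discounted reward already accumulated along its path, plus the
    best case gamma^depth / (1 - gamma) for every step after it."""
    return disc_reward_sum + gamma ** depth / (1.0 - gamma)

# Optimistic planning repeatedly expands the leaf with the largest bound.
```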

To our knowledge, this is the first simple regret bound available for closed-loop planning in stochastic MDPs (closed-loop policies are state-dependent, rather than open-loop …)
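A minimal sketch of the closed-loop/open-loop distinction, assuming a hypothetical `env` with `reset()` and `step(action)` returning a (state, reward) pair:

```python
def run_open_loop(env, plan):
    """Open-loop: the action sequence is fixed in advance, so it cannot
    react to which states the stochastic transitions actually produce."""
    state = env.reset()
    for action in plan:
        state, reward = env.step(action)
    return state

def run_closed_loop(env, policy, horizon):
    """Closed-loop: each action comes from a state-dependent policy, so
    the behaviour adapts to the realized random transitions."""
    state = env.reset()
    for _ in range(horizon):
        state, reward = env.step(policy(state))
    return state
```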

This demonstrates that using a regularized pseudoinverse matrix of proportionality factors is an effective way to take into account the entanglement of all sensors and coils …
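The excerpt does not say which regularization is used; a common choice is Tikhonov (ridge) regularization, sketched here with an illustrative regularization weight `lam`:

```python
import numpy as np

def regularized_pinv(A, lam=1e-3):
    """Tikhonov-regularized pseudoinverse: (A^T A + lam*I)^-1 A^T.
    The ridge term lam damps the near-singular directions created by
    strongly coupled columns (here: entangled sensors and coils)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T)
```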

The proposed algorithm has been applied to a real problem: locating stations for a car-sharing service for electric vehicles. The good results we obtained in comparison to PLS …

Since the propagation-search value iteration method has a time-consuming and sometimes complex pre-processing part on MDPs, in this section we present a new algorithm …
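For reference against the value-iteration-style solvers the excerpt mentions, a minimal sketch of plain value iteration on a tabular MDP (`P`, `R`, and `gamma` are placeholders, not parameters from the cited work):

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Standard value iteration on a finite MDP.
    P[a] is the |S| x |S| transition matrix of action a, R[a] the
    corresponding |S| reward vector; iterates the Bellman optimality
    backup until the value function stops changing."""
    V = np.zeros(P[0].shape[0])
    while True:
        Q = np.stack([R[a] + gamma * (P[a] @ V) for a in range(len(P))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```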

Optimal Reactive Power Dispatch (ORPD) is a special case of the OPF problem in which the control parameters are the variables that have a close relationship with the reactive power.

… there are tight temporal constraints; delaying a task does not reduce the expected value; EOC and OCr are quite the same when resources are large. Improvements are much larger …