Top 20: Non-Stationary Markov Decision Processes, a Worst-Case Approach using Model-Based Reinforcement Learning

Non-Stationary Markov Decision Processes, a Worst-Case Approach using Model-Based Reinforcement Learning

... in non-stationary stochastic environments. We study Markov Decision Processes (MDPs) evolving over time and consider Model-Based Reinforcement Learning ...

19

Lexicographic refinements in stationary possibilistic Markov Decision Processes

... by using certainty equivalents, as in [11 ... simulate a large number of times: simulation may be used to generate samples on which expert elicitation is ... possibilistic reinforcement ...

22

Efficient Policies for Stationary Possibilistic Markov Decision Processes

... horizon case is concerned, other types of lexicographic refinements could be ... in a single trajectory and consider only those which are observed. A second perspective of this work will be to define ...

12

Limits of Multi-Discounted Markov Decision Processes

... all based on the fact that discounted MDPs admit pure stationary optimal ... In a classical approach (see for example [15]), existence of pure stationary optimal strategies in discounted ...

13

Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

... algorithms based on modifications of R-MAX and the recent Bayesian exploration bonus (BEB) approach ... of stationary unstructured, finite state and action spaces, we provide a kind of ...

10

A Reinforcement Learning Approach to Interval Constraint Propagation

... follows a pure exploitation strategy, enhancing it with a good initial guess yields a more focused ... is a better choice to keep α = 0.9 as the default value for bcrl-j in case the ...

24

A cautious approach to generalization in reinforcement learning

... in a similar setting as the CGRL algorithm, while not exploiting the weak prior knowledge about the environment, do not output a lower bound on the return of the policy h they infer from the sample of ...

10

A Reinforcement Learning Approach to Protein Loop Modeling

... represented using simplified models where the degrees of freedom (DOFs) are the dihedral bond ... provide a good review of the techniques applied to loop sampling ...

3

Incorporating Bayesian networks in Markov Decision Processes

... are a special class of Bayesian networks that can be used for modeling time series data and represent stochastic proc- ... of a sequence of time slices (which are often ... be a primal ... sented ...

11

Toggling a genetic switch using reinforcement learning

... mCherry). A typical input is a targeted induction of the gene expression, which can be achieved by, ... to a target region in the state-space), minimal burden control (minimal expression of ...

11

Clinical data based optimal STI strategies for HIV: a reinforcement learning approach

... to a high level, consequently activating an adaptive immune ... by a set of Ordinary Differential Equations (ODEs), and deduction of STI strategies from them is done by using methods from ...

8

Clinical data based optimal STI strategies for HIV: a reinforcement learning approach

... Fitted Q iteration computes from F the functions $\hat{Q}_1, \hat{Q}_2, \ldots, \hat{Q}_N$, approximations of $Q_1, Q_2, \ldots, Q_N$. Computation done iteratively by solving a sequence of standard supervised learning ...

22

Clinical data based optimal STI strategies for HIV: a reinforcement learning approach

... by a set of Ordinary Differential Equations (ODEs), and deduction of STI strategies from them is done by using methods from control ... however a complex ... identify a model of the HIV ...

7

Lightweight Verification of Markov Decision Processes with Rewards

... infection case study is based on [21] and initially comprises the following sets of linked nodes: a set containing one node infected by a virus, a set with no infected nodes and ...

16

Voronoi model learning for batch mode reinforcement learning

... learned model also leads to high returns on the real ... Voronoi Reinforcement Learning algorithm This algorithm approximates the reward function ρ and the system dynamics f using piecewise ...

10

Decentralized Control of Partially Observable Markov Decision Processes Using Belief Space Macro-Actions

... Observable Markov Decision Processes using Belief Space Macro-actions Shayegan Omidshafiei, Ali-akbar Agha-mohammadi, Christopher Amato, Jonathan ... Observable Markov Decision ...

9

Reinforcement learning based design of sampling policies under cost constraints in Markov Random Fields

... combines a parameterized representation of the value of a policy, the construction of a batch of simulated trajectories of the MDP and a backwards induction ... problem. Based on an ...

35

Asymptotic properties of constrained Markov decision processes

... The multidisciplinary open archive HAL is intended for the deposit and dissemination of research-level scientific documents, whether published or not, originating from educational ...

22