# Top PDF: On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes

### On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes

the usual situation when γ is close to ... consequence of this result is that the problem of “computing an approximately optimal non-stationary policy” is much simpler than ...

5

### Algorithmic aspects of mean–variance optimization in Markov decision processes

the literature. For example, (Guo, Ye, & Yin, 2012) consider a mean-variance optimization problem, but subject to a constraint on the vector of expected rewards starting from each ...

26

### Lexicographic refinements in possibilistic decision trees and finite-horizon Markov decision processes

of this work, not unrelated, is to develop simulation-based algorithms for finding lexicographic solutions to ... making use of simulated trajectories of states and ...

26

### Lexicographic refinements in stationary possibilistic Markov Decision Processes

of lexicographic refinements to finite horizon possibilistic Markov decision processes and proposes a value iteration algorithm that looks for policies optimal ...

22

### Lexicographic refinements in possibilistic decision trees and finite-horizon Markov decision processes

decision theory has been proposed twenty years ago and has had several extensions since ...pealing for its ability to handle qualitative decision problems, possibilistic decision ...

27

### Lightweight Verification of Markov Decision Processes with Rewards

the classic ‘sparse sampling algorithm’ for large, infinite horizon, discounted ... approximating the best action from a current state, using a stochastic depth-first ... search. ...

16

### Scalable Verification of Markov Decision Processes

The Kearns algorithm [13] is the classic ‘sparse sampling algorithm’ for large, infinite horizon, discounted ... approximating the best action from a current state ...

13

### Limits of Multi-Discounted Markov Decision Processes

the mean–payoff parity and the priority weighted function are both generalizations of parity and mean–payoff functions they have radically different properties. The main ...

13

### Smart Sampling for Lightweight Verification of Markov Decision Processes

The classic algorithms to solve MDPs are ‘policy iteration’ and ‘value iteration’ ... algorithms for MDPs may use value iteration applied to probabilities [1, ... solve the same problem ...

14

### Efficient Policies for Stationary Possibilistic Markov Decision Processes

the infinite horizon case is concerned, other types of lexicographic refinements could be ... One of these options could be to avoid the duplication of the set ...

12

### On the link between infinite horizon control and quasi-stationary distributions

the definition of the controlled non-linear branching processes and to the proof of preliminary ... Using the criteria of [CV15a], we also state in ...

31

### Exact aggregation of absorbing Markov processes using quasi-stationary distribution

the conditions under which an absorbing Markovian finite process (in discrete or continuous time) can be transformed into a new aggregated process conserving the Markovian property, whose ...

10

### A Stochastic Minimax Optimal Control Problem on Markov Chains with Infinite Horizon

22

### Collision Avoidance for Unmanned Aircraft using Markov Decision Processes

of hand-crafting a collision avoidance algorithm for every combination of sensor and aircraft configuration, we investigate the automatic generation of collision avoidance ...

23

### Aggregating Optimistic Planning Trees for Solving Markov Decision Processes

The optimistic part of the algorithm allows a deep exploration of the ... At the same time, it biases the expression maximized by π̂ in (4) towards near-optimal actions ...

9

### Approximate solution methods for partially observable Markov and semi-Markov decision processes

the local minimum was found, which also shows that the approach of finite-state controller with policy gradient is quite effective for this ... problem. The initial policy has ...

169

### On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems

of playing a policy with the highest expected reward, and the regret grows as the logarithm of T ... bounds for the regret have been derived (see Auer et ... Though ...

25

### Incorporating Bayesian networks in Markov Decision Processes

for the first time period, in the case of using the BN, is i_2, whereas it is i_3 that has a smaller SD (hence, it is costlier) for the case of using a ...

11

### DetH*: Approximate Hierarchical Solution of Large Markov Decision Processes

for large, factored Markov decision ... types of leverage to the problem: it shortens the horizon using an automatically generated temporal hierarchy and it reduces ...

9