Precisely, any probability π can be decomposed as a pair (p, Q), where p is a probability over K and Q is a transition probability from K to L. Then one may consider v_θ as a function of (p, Q), and this function is concave with respect to p. A dual notion of II-convexity was also introduced, and the notions of I-concave and II-convex envelopes were the building blocks of the system of functional equations characterizing the limit value. Based on this characterization, a construction of asymptotically optimal strategies (i.e. strategies that are almost optimal in G_θ(π),
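As a sketch, assuming π is a probability on the product K × L (an assumption consistent with, but not stated in, this excerpt), the decomposition above is the standard disintegration into a marginal and a kernel:

```latex
% Disintegration of \pi on K \times L into a marginal p and a kernel Q.
% Standard construction; the notation p, Q follows the text above.
p(k) \;=\; \sum_{\ell \in L} \pi(k,\ell),
\qquad
Q(\ell \mid k) \;=\; \frac{\pi(k,\ell)}{p(k)} \quad \text{whenever } p(k) > 0.
```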

Outline of the paper. Section 2 recalls the classical notions about simple stochastic games. In Section 3, we show that the different qualitative criteria are equivalent in finite turn-based stochastic tail games, and define a new notion of qualitative determinacy. Section 4 takes on the quantitative problems, and shows how a qualitative algorithm can be used to compute the values of a finite turn-based stochastic tail game. The existence of optimal strategies for both players in finite turn-based stochastic tail games also follows from the proofs, as well as the fact that optimal strategies are no more complex than almost-sure strategies.

our algorithm finds the Blackwell optimal policy f∗ for player 1 in the MDP Γ_1(g).
If t_2 ≥ 0, then we focus on the state s_{t_1+t_2} = s_τ, which is the last state examined by our algorithm.
The actions available in state s_τ are A_2(s_τ) ≡ X ∪ {a_i}, where X = {a_1, …, a_{i−1}, a_{i+1}, …, a_n} and n ≥ 2 by hypothesis. By the induction hypothesis, the algorithm finds the uniform discount optimal strategies for both players in the game Γ_τ^X without cycling. Since no uniform improvements are possible in Γ_τ^X, by the definition of uniform optimal strategies, the algorithm looks for a uniform adjacent improvement g′, where g′(a_i | s_τ) = 1. There are now two possibilities.
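The improve-until-stable idea in the passage above can be illustrated with generic policy iteration for a discounted MDP. This is a minimal sketch, not the paper's algorithm; the two-state transition data and the discount factor below are invented for illustration.

```python
# Minimal policy-iteration sketch for a discounted MDP (illustrative only;
# the transition data and discount below are made up, not from the paper).
import numpy as np

def policy_iteration(P, r, gamma=0.9):
    """P[a][s, s'] transition matrices, r[a][s] rewards.

    Alternates exact policy evaluation and greedy improvement; stops when
    no improvement is possible, i.e. the policy is discount optimal.
    """
    n_states = P[0].shape[0]
    n_actions = len(P)
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([r[policy[s]][s] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: greedy one-step lookahead.
        q = np.array([[r[a][s] + gamma * P[a][s] @ v for a in range(n_actions)]
                      for s in range(n_states)])
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):  # no improvement: optimal
            return policy, v
        policy = new_policy

# Two states, two actions: in state 0, moving to state 1 (action 1)
# is the only way to collect the recurring reward of state 1.
P = [np.array([[1.0, 0.0], [0.0, 1.0]]),   # action 0: stay put
     np.array([[0.0, 1.0], [0.0, 1.0]])]   # action 1: move to state 1
r = [np.array([0.0, 1.0]), np.array([0.0, 1.0])]
policy, v = policy_iteration(P, r)
# policy selects action 1 in state 0; v = [9.0, 10.0] with gamma = 0.9.
```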

According to Martin’s theorem [Mar98] these values are equal, and this common value is called the value of vertex v and denoted val(v).
1.3. Optimal and ǫ-optimal strategies. By definition of the value, for each ǫ > 0 there exist ǫ-optimal strategies σ_ǫ for player Max and τ_ǫ for player Min such that for every vertex
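The sentence above is cut off; in the standard sup-inf formulation the guarantee reads along the following lines (the payoff φ and the expectation notation are assumed here, not taken from the excerpt):

```latex
% \epsilon-optimality guarantees for the strategies named above
% (payoff \varphi and expectation notation are our assumptions).
\inf_{\tau}\, \mathbb{E}^{\sigma_\epsilon,\tau}_{v}[\varphi] \;\ge\; \mathrm{val}(v) - \epsilon,
\qquad
\sup_{\sigma}\, \mathbb{E}^{\sigma,\tau_\epsilon}_{v}[\varphi] \;\le\; \mathrm{val}(v) + \epsilon .
```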

holds, thus Φ_x(Σ̂) ⊆ Σ and Φ_e(σ♯) is a strategy in G. According to the second part of
Proposition 10, the strategy Φ_e(σ♯) is both optimal in G and deterministic stationary.
Therefore Max has an optimal deterministic stationary strategy in G.
To find an optimal deterministic stationary strategy for player Min in G it suffices to choose as a separation state a state controlled by player Min with at least two actions available. Such a state exists because G is not a one-player game. By a reasoning symmetric to the one developed previously we can construct another pair of optimal strategies (σ⋆, τ⋆) in G; however, now the strategy τ⋆ of player Min will be

The interest in such a result is threefold.
First, we think that establishing a very strong link between two apparently different classes of games has its own intrinsic interest.
Discounted games were thoroughly studied in the past, and our result shows that algorithms for such games can, in principle, be used to solve parity games (admittedly, everything depends on how close to 1 the discount factor should be for the two types of games to become close enough, and this remains open). Another point concerns the stability of solutions (optimal strategies and game values) under small perturbations. When we examine stochastic games, a natural question is: where do the transition probabilities come from? If they come from observation, then the values of the transition probabilities are not exact. On the other hand, algorithms for stochastic games use only rational transition probabilities; thus even if we know the exact probabilities, we replace them by close rational values. What is the impact of such approximations on solutions; are optimal strategies stable under small perturbations? Usually we tacitly assume that this is the case, but it would be better to be sure. Since the Blackwell-optimal strategies studied in Section 4 are stable under small perturbations of the discount factor (because they do not depend on the discount factor), this adds some credibility to the claim that Blackwell optimal strategies are stable for parity games.
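The perturbation question above can be probed numerically on a toy example: perturb the transition probabilities of a small Markov reward chain and measure how the discounted values move. The chain, rewards, and perturbation size below are invented for illustration.

```python
# Toy check: discounted values vary continuously with the transition
# probabilities (the two-state chain below is invented for illustration).
import numpy as np

def discounted_value(P, r, gamma=0.95):
    """Value v = (I - gamma * P)^{-1} r of a Markov reward chain."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)

P = np.array([[0.5, 0.5],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
v = discounted_value(P, r)

# Perturb each row by eps while keeping rows stochastic.
eps = 1e-3
P_eps = P + np.array([[eps, -eps], [-eps, eps]])
v_eps = discounted_value(P_eps, r)
gap = np.max(np.abs(v_eps - v))  # stays small for small eps
```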

Near-Optimal Strategies for Nonlinear and Uncertain Networked Control Systems
Lucian Buşoniu Romain Postoyan Jamal Daafouz
Abstract—We consider problems where a controller communicates with a general nonlinear plant via a network, and must optimize a performance index. The system is modeled in discrete time and may be affected by a class of stochastic uncertainties that can take finitely many values. Admissible inputs are constrained to belong to a finite set. Exploiting some optimistic planning algorithms from the artificial intelligence field, we propose two control strategies that take into account the communication constraints induced by the use of the network. Both strategies send in a single packet long-horizon solutions, such as sequences of inputs. Our analysis characterizes the relationship between computation, near-optimality, and transmission intervals. In particular, the first strategy imposes at each transmission a desired near-optimality, which we show is related to an imposed transmission period; for this setting, we analyze the required computation. The second strategy has a fixed computation budget, and within this constraint it adapts the next transmission instant to the last state measurement, leading to a self-triggered policy. For this case, we guarantee long transmission intervals. Examples and simulation experiments are provided throughout the paper.

6 Conclusion
In this paper we have described an algorithm to synthesize optimal strategies for a sub-class of priced timed game automata. The algorithm is based on the work described in [6], where we proved this problem was decidable (under some hypotheses we recall in this paper). Moreover, we also provide an implementation of our algorithm in HyTech and demonstrate it on small case studies. In a recent paper [2], Alur et al. addressed a related problem, i.e. “compute the optimal cost within k steps”. They give a complexity bound for this restricted “bounded” problem and prove that the splitting incurred by the computation of the optimal cost within k steps only yields an exponential number (in k and the size of the automaton) of subregions. They do not consider the problem of strategy synthesis.

– in addition to the previous new results on optimal cost computation that extend the ones in [14, 1], we also tackle the problem of strategy synthesis. In particular, we study the properties of the strategies (memorylessness, cost-dependence) needed to achieve the optimal cost, which is a natural question that arises in game theory. For example, in the setting of [1], it could be the case that in two instances of the unfolding of the game, the values of a strategy for a given state are different. In this paper we prove that if an optimal strategy exists, then one can effectively construct an optimal strategy which only depends on the current state and on the accumulated cost since the beginning of the play. We also prove that under some assumptions, if an optimal strategy exists, then a state-based cost-independent strategy exists and can be effectively computed (Theorem 7).

IMED-GS2 strategy. At time step t > 1 the choice of user b_t is no longer strategy-dependent but is imposed by the sequence of users (b_t)_{t>1}, which is assumed to be deterministic in the uncontrolled scenario. The learner only chooses an arm a_t to pull knowing user b_t. We define IMED-GS2 to be the strategy consisting of pulling an arm with minimum index in Algorithm 3 of Appendix C. IMED-GS2 shares the same advantages and shortcomings as IMED-GS: it does not exploit the structure of the problem optimally, but it works well in practice (see Section 4) and has a low computational complexity.
IMED-GS?2 strategy. In order to explore optimally according to the graph structure in the uncontrolled scenario, we also track the optimal numbers of pulls. β may at first glance be different from 1_B; this requires some normalizations. First, for all time steps t > 1, n^opt(t) now denotes a solution of the empirical version of (2) with β = (β̂_b(t))_{b∈B}, where β̂_b(t) = log(N_b(t))/log(t) estimates the log-frequency β_b of user b ∈ B. Second, we have to consider normalized indexes Ĩ_{a,b}(t) = I_{a,b}(t)/β̂_b(t) for couples (a, b) ∈ A × B in order to have Ĩ_{a,b}(T) ∼ log(T), as in the controlled scenario. An additional difficulty is that at a given time step t > 1, while the indexes indicate to explore, the currently tracked user (see Equation 7) is likely to be different from the user b_t with whom the learner deals. This difficulty is easy to circumvent by postponing the exploration until the learner deals with the tracked user. Priority in exploration phases is given first to delayed forced exploration and delayed exploration based on solving optimization problem (2), then to exploration based on current indexes (see Algorithm 4 in Appendix C). IMED-GS?2 corresponds essentially to IMED-GS? with some delays, due to the fact that the tracked and the current users may be different. This has no impact on the optimality of IMED-GS?2 since the log-frequencies of users are enforced to be positive.
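The normalization step described above can be sketched in a few lines: estimate the log-frequency of a user as β̂_b(t) = log(N_b(t))/log(t) and divide the raw index by it. The counts and the raw index value below are placeholders; the IMED index computation itself is not reproduced here.

```python
# Sketch of the normalization used above:
#   beta_hat_b(t) = log(N_b(t)) / log(t)      (estimated log-frequency)
#   I_tilde_{a,b}(t) = I_{a,b}(t) / beta_hat_b(t)   (normalized index)
# The counts and raw index below are placeholders, not from the paper.
import math

def beta_hat(N_b, t):
    """Estimated log-frequency of user b after t steps (needs N_b >= 2, t >= 2)."""
    return math.log(N_b) / math.log(t)

def normalized_index(I_ab, N_b, t):
    return I_ab / beta_hat(N_b, t)

t = 1000
N_b = 100          # user b appeared 100 times out of 1000 (placeholder)
raw_index = 4.0    # placeholder raw index I_{a,b}(t)
norm = normalized_index(raw_index, N_b, t)
# beta_hat = log(100)/log(1000) = 2/3, so norm = 6.0
```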

WITH IMPERFECT EFFICACY AND DURABILITY OF PROTECTION
FRANCESCO SALVARANI AND GABRIEL TURINICI
Abstract. We analyze a model of an agent-based vaccination campaign against influenza with imperfect vaccine efficacy and durability of protection. We prove the existence of a Nash equilibrium by Kakutani’s fixed point theorem in the context of non-persistent immunity. Subsequently, we propose and test a novel numerical method to find the equilibrium. Various issues of the model are then discussed, such as the dependence of the optimal policy on the imperfections of the vaccine, as well as the best vaccination timing. The numerical results show that, under specific circumstances, some counter-intuitive behaviors are optimal, such as, for example, an increase in the fraction of vaccinated individuals when the efficacy of the vaccine decreases, up to a threshold. The possibility of finding optimal strategies at the individual level can help public health decision makers in designing efficient vaccination campaigns and policies.

by RL to compute new STI strategies.
The paper is structured as follows. Section 2 formalizes the problem of learning optimal strategies from a set of trajectories and introduces the algorithms used in our simulations. Section 3 reports simulation results obtained by using the RL-based approach to determine optimal STI strategies from clinical data. Instead of actual clinical data, we have used synthetic ones obtained from simulations with a computer model of the HIV infection dynamics. In Section 4, we suggest ways to overcome difficulties that may arise when relying on real-life data rather than numerically generated ones. Section 5 concludes, and Appendix A gathers information about the mathematical model of HIV dynamics used in the data generation process.

The paper is structured as follows. Section II formalizes the problem of learning optimal strategies from a set of trajectories and introduces the algorithms used in our simulations. Section III reports simulation results obtained by using the RL-based approach to determine optimal STI strategies from clinical data. Instead of actual clinical data, we have used synthetic ones obtained from simulations with an ODE model of the HIV infection dynamics. In Section IV, we suggest ways to overcome difficulties that may arise when relying on real-life data rather than numerically generated ones. Section V concludes, and the Appendix gathers information about the mathematical model of HIV dynamics used in the data generation process.

• or a penalization (more precisely, a decreasing function) of the final proportion of Wolbachia-infected mosquitoes at the final time of the experiment. Note that the time horizon will be considered fixed in this case.
This will lead us to introduce two large families of relevant optimization problems in order to model this issue. Analyzing them will allow us to discuss optimal mosquito-release strategies and also the robustness of the properties of the solutions with respect to the modeling choices (in particular, the choice of the functional we optimize).

While the linear influence function is consistent with the well-established Friedkin-Johnsen model, the influence of a camp on a node might not increase linearly with the corresponding investment. In fact, several social and economic settings follow the law of diminishing marginal returns, which says that for higher investments, the marginal returns (influence in this context) are lower for a marginal increase in investment. An example of this law is watching a particular product advertisement on television: as we watch the advertisement more times, its marginal influence on us tends to get lower. A concave influence function naturally captures this law. We study such an influence function in the settings of both unbounded and bounded investment per node, and relate it to the skewness of investment in optimal strategies as well as to user perception of fairness. We study this in Section IV.
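A toy illustration of diminishing marginal returns with a concave influence function; the square-root form is our choice for illustration, not the paper's.

```python
# Diminishing marginal returns under a concave influence function.
# h(x) = sqrt(x) is chosen purely for illustration, not from the paper.
import math

def influence(x):
    return math.sqrt(x)

# Marginal influence of one extra unit of investment, low vs. high levels:
marginal_low = influence(2) - influence(1)       # ~0.414
marginal_high = influence(101) - influence(100)  # ~0.050
# Concavity makes the marginal return shrink as investment grows.
```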

Mosquitoes are responsible for the transmission of many diseases such as dengue fever, zika or chikungunya. One way to control the spread of these diseases is to use the sterile insect technique (SIT), which consists in a massive release of sterilized male mosquitoes. This strategy aims at reducing the total population over time, and has the advantage of being specific to the targeted species, unlike the use of pesticides. In this article, we study the optimal release strategies in order to maximize the efficiency of this technique. We consider simplified models that describe the dynamics of eggs, males, females and sterile males in order to optimize the release protocol. We determine optimal strategies in a precise way, which allows us to tackle numerically the underlying optimization problem in a very simple way. We also present some numerical simulations to illustrate our results.

Discussion
Quantitative growth laws are empirical regularities pointing at fundamental properties of microbial life [50]. Recent work has led to the precise theoretical formulation of growth laws and has shown that they can be derived from basic assumptions on the molecular processes responsible for the assimilation of nutrients and their conversion to biomass [11, 13, 15, 17, 18]. The growth laws are uniquely defined under the hypothesis that microorganisms allocate resources in such a way as to maximize their growth rate. Several of the above-mentioned studies have analyzed feedback control strategies on the molecular level enabling cells to achieve optimal resource allocation in a robust manner. These control strategies exploit information on the physiological state of the cell to adjust the (relative) rate of synthesis of different classes of proteins (ribosomes, metabolic enzymes, …). Whereas the growth laws describe microbial growth at steady state, most microorganisms live in complex, continuously changing environments. Despite some precursory work [25, 26], questions about the dynamics of microbial growth remain largely unanswered: Which resource allocation schemes are optimal in changing environments? Which dynamical control strategies lead to (near-)optimal resource allocation? How do these strategies compare with those actually implemented by microorganisms?
