Optimal strategies

Optimal strategies in repeated games with incomplete information: the dependent case

Precisely, any probability π can be decomposed as a pair (p, Q), where p is a probability over K and Q is a transition probability from K to L. One may then consider v_θ as a function of (p, Q), and this function is concave with respect to p. A dual notion of II-convexity was also introduced, and the notions of I-concave and II-convex envelopes were the building blocks of the system of functional equations characterizing the limit value. Based on this characterization, a construction of asymptotically optimal strategies (i.e. strategies that are almost optimal in G_θ(π)) was obtained.
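The decomposition can be written out explicitly; as a sketch, assuming K and L are finite sets, a joint probability π on K × L factors into a marginal p and a conditional Q:

```latex
p(k) \;=\; \sum_{l \in L} \pi(k, l),
\qquad
Q(l \mid k) \;=\; \frac{\pi(k, l)}{p(k)} \quad \text{whenever } p(k) > 0 .
```

With this identification, the concavity statement above reads: for each fixed Q, the map p ↦ v_θ(p, Q) is concave.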

Optimal Strategies in Turn-Based Stochastic Tail Games

Outline of the paper. Section 2 recalls the classical notions about simple stochastic games. In Section 3, we show that the different qualitative criteria are equivalent in finite turn-based stochastic tail games, and define a new notion of qualitative determinacy. Section 4 takes on the quantitative problems, and shows how a qualitative algorithm can be used to compute the values of a finite turn-based stochastic tail game. The existence of optimal strategies for both players in finite turn-based stochastic tail games also follows from the proofs, as well as the fact that optimal strategies are no more complex than almost-sure strategies.

Algorithms for uniform optimal strategies in two-player zero-sum stochastic games with perfect information

our algorithm finds the Blackwell optimal policy f* for player 1 in the MDP Γ_1(g). If t_2 ≥ 0, then we focus on the state s_{t_1+t_2} = s_τ, which is the last one examined by our algorithm. The actions available in state s_τ are A_2(s_τ) ≡ X ∪ {a_i}, where X = {a_1, …, a_{i−1}, a_{i+1}, …, a_n} and n ≥ 2 by hypothesis. By the induction hypothesis, we suppose that the algorithm finds the uniform discount optimal strategies for both players in the game Γ_τ^X without cycling. Since no uniform improvements are possible in Γ_τ^X, by definition of uniform optimal strategies, the algorithm looks for a uniform adjacent improvement g′, where g′(a_i | s_τ) = 1. There are now two possibilities.

Optimal Strategies in Zero-Sum Repeated Games with Incomplete Information: The Dependent Case

Optimal strategies in repeated games with incomplete information: the dependent case. Fabien Gensbittel, Miquel Oliu-Barton. Abstract. Using the duality techniques introduced by De Meyer (1996a, 1996b), De Meyer and Marino (2005) provided optimal strategies for both players in finitely repeated games with incomplete information on two sides, in the independent case. In this note, we extend both the duality techniques and the construction of optimal strategies to the case of general type dependence.

Optimal Strategies in Perfect-Information Stochastic Games with Tail Winning Conditions

According to Martin's theorem [Mar98] these values are equal, and this common value is called the value of vertex v, denoted val(v). 1.3. Optimal and ε-optimal strategies. By definition of the value, for each ε > 0 there exist ε-optimal strategies σ_ε for player Max and τ_ε for player Min, such that, for every vertex v, σ_ε guarantees Max at least val(v) − ε and τ_ε guarantees Min at most val(v) + ε.
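In standard notation (a sketch; the exact payoff functional depends on the winning condition), the two quantities that Martin's theorem equates are the lower and upper values,

```latex
\underline{\mathrm{val}}(v) \;=\; \sup_{\sigma}\,\inf_{\tau}\; \mathbb{P}^{\sigma,\tau}_{v}(\mathrm{Win}),
\qquad
\overline{\mathrm{val}}(v) \;=\; \inf_{\tau}\,\sup_{\sigma}\; \mathbb{P}^{\sigma,\tau}_{v}(\mathrm{Win}),
```

and a strategy σ_ε for Max is ε-optimal when it guarantees winning probability at least val(v) − ε against every counter-strategy τ.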

Pure and Stationary Optimal Strategies in Perfect-Information Stochastic Games with Global Preferences

holds, thus Φ_x(Σ̂) ⊆ Σ and Φ_e(σ♯) is a strategy in G. According to the second part of Proposition 10, the strategy Φ_e(σ♯) is both optimal in G and deterministic stationary. Therefore Max has an optimal deterministic stationary strategy in G. To find an optimal deterministic stationary strategy for player Min in G, it suffices to choose as a separation state a state controlled by player Min with at least two actions available. Such a state exists because G is not a one-player game. By a reasoning symmetric to the one developed previously, we can construct another pair of optimal strategies (σ⋆, τ⋆) in G; however, now the strategy τ⋆ of player Min will be

Blackwell Optimal Strategies in Priority Mean-Payoff Games

The interest in such a result is threefold. First, we think that establishing a very strong link between two apparently different classes of games has its own intrinsic interest. Discounted games were thoroughly studied in the past, and our result shows that algorithms for such games can, in principle, be used to solve parity games (admittedly, everything depends on how close to 1 the discount factor must be for the two types of games to become close enough, and this remains open). Another point concerns the stability of solutions (optimal strategies and game values) under small perturbations. When we examine stochastic games, a natural question is where the transition probabilities come from. If they come from observation, then the values of the transition probabilities are not exact. On the other hand, algorithms for stochastic games use only rational transition probabilities; thus, even if we know the exact probabilities, we replace them by close rational values. What is the impact of such approximations on solutions, and are optimal strategies stable under small perturbations? Usually we tacitly assume that this is the case, but it would be better to be sure. Since the Blackwell-optimal strategies studied in Section 4 are stable under small perturbations of the discount factor (because they do not depend on the discount factor), this adds some credibility to the claim that Blackwell optimal strategies are stable for parity games.

Near-optimal strategies for nonlinear and uncertain networked control systems

Near-Optimal Strategies for Nonlinear and Uncertain Networked Control Systems. Lucian Buşoniu, Romain Postoyan, Jamal Daafouz. Abstract—We consider problems where a controller communicates with a general nonlinear plant via a network, and must optimize a performance index. The system is modeled in discrete time and may be affected by a class of stochastic uncertainties that can take finitely many values. Admissible inputs are constrained to belong to a finite set. Exploiting some optimistic planning algorithms from the artificial intelligence field, we propose two control strategies that take into account the communication constraints induced by the use of the network. Both strategies send long-horizon solutions, such as sequences of inputs, in a single packet. Our analysis characterizes the relationship between computation, near-optimality, and transmission intervals. In particular, the first strategy imposes at each transmission a desired near-optimality, which we show is related to an imposed transmission period; for this setting, we analyze the required computation. The second strategy has a fixed computation budget, and within this constraint it adapts the next transmission instant to the last state measurement, leading to a self-triggered policy. For this case, we guarantee long transmission intervals. Examples and simulation experiments are provided throughout the paper.

Synthesis of Optimal Strategies Using HyTech

6 Conclusion. In this paper we have described an algorithm to synthesize optimal strategies for a sub-class of priced timed game automata. The algorithm is based on the work described in [6], where we proved that this problem is decidable (under some hypotheses that we recall in this paper). Moreover, we also provide an implementation of our algorithm in HyTech and demonstrate it on small case studies. In a recent paper [2], Alur et al. addressed a related problem, i.e., "compute the optimal cost within k steps". They give a complexity bound for this restricted "bounded" problem and prove that the splitting incurred by the computation of the optimal cost within k steps only yields an exponential number (in k and the size of the automaton) of subregions. They do not consider the problem of strategy synthesis.

Optimal Strategies in Priced Timed Game Automata

– in addition to the previous new results on optimal cost computation, which extend those of [14, 1], we also tackle the problem of strategy synthesis. In particular, we study the properties of the strategies (memorylessness, cost-dependence) needed to achieve the optimal cost, a natural question that arises in game theory. For example, in the setting of [1], it could be the case that in two instances of the unfolding of the game, the values of a strategy for a given state are different. In this paper we prove that if an optimal strategy exists, then one can effectively construct an optimal strategy which depends only on the current state and on the cost accumulated since the beginning of the play. We also prove that, under some assumptions, if an optimal strategy exists then a state-based cost-independent strategy exists and can be effectively computed (Theorem 7).

Optimal strategies for biomass productivity maximization in a photobioreactor using natural light

Optimal control of bioreactors has been studied for many years, whether for metabolite production (Tartakovsky, Ulitzur & Sheintuch 1995), ethanol fermentation (Wang & Shyu 1997), baker's yeast production (Wu, Chen & Chiou 1985) or, more generally, optimal control of fed-batch processes taking kinetic uncertainties into account (Smets, Claes, November, Bastin & Van Impe 2004). The control of photobioreactors is, however, much scarcer in the literature, though the influence of self-shading on the optimal setpoint (Masci, Bernard & Grognard 2010) or on an MPC control algorithm (Becerra-Celis, Hafidi, Tebbani, Dumur & Isambert 2008) for productivity optimization has already been studied. Light variation was mostly absent (Masci et al. 2010, Becerra-Celis et al. 2008) or considered to be an input that could be manipulated in order to impose the physiological state of the microalgae (Marxen, Vanselow, Lippemeier, Hintze, Ruser & Hansen 2005) or to maximize productivity as one of the parameters of bioreactor design (Suh & Lee 2003). The optimization we consider can also be classified with the Maximal Sustainable Yield problem, which is classical e.g. in fisheries and has been tackled in constant and some specific periodic environments (Clark 1990, Dong, Chen & Sun 2007); it is distinct from classical periodic optimal control, where periodicity was not inherent to the system (Guardabassi, Locatelli & Rinaldi 1974). The present result is an important generalization of the result of Grognard, Akhmetzhanov, Masci & Bernard (2010), which otherwise has not been tackled yet in the literature.

Optimal Strategies for Graph-Structured Bandits

IMED-GS₂ strategy. At time step t > 1, the choice of user b_t is no longer strategy-dependent but is imposed by the sequence of users (b_t)_{t>1}, which is assumed to be deterministic in the uncontrolled scenario. The learner only chooses an arm a_t to pull, knowing user b_t. We define IMED-GS₂ to be the strategy consisting of pulling an arm with minimum index in Algorithm 3 of Appendix C. IMED-GS₂ has the same advantages and shortcomings as IMED-GS: it does not exploit the structure of the problem optimally, but it works well in practice (see Section 4) and has a low computational complexity. IMED-GS⋆₂ strategy. In order to explore optimally according to the graph structure in the uncontrolled scenario, we also track the optimal numbers of pulls. β may at first glance differ from 1/B; this requires some normalizations. First, for all time steps t > 1, n^opt(t) now denotes a solution of the empirical version of (2) with β = (β̂_b(t))_{b∈B}, where β̂_b(t) = log(N_b(t))/log(t) estimates the log-frequency β_b of user b ∈ B. Second, we have to consider normalized indexes Ĩ_{a,b}(t) = I_{a,b}(t)/β̂_b(t) for couples (a, b) ∈ A × B in order to have Ĩ_{a,b}(T) ∼ log(T) as in the controlled scenario. An additional difficulty is that at a given time step t > 1, while the indexes indicate to explore, the currently tracked user (see Equation 7) is likely to be different from the user b_t with whom the learner deals. This difficulty is easy to circumvent by postponing the exploration until the learner deals with the tracked user. Priority in exploration phases is given first to delayed forced exploration and delayed exploration based on solving optimization problem (2), then to exploration based on current indexes (see Algorithm 4 in Appendix C). IMED-GS⋆₂ corresponds essentially to IMED-GS⋆ with some delays, due to the fact that the tracked and current users may differ. This has no impact on the optimality of IMED-GS⋆₂, since the log-frequencies of users are enforced to be positive.
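The index-minimization rule underlying the IMED family can be illustrated in the plain (unstructured) bandit case. This is a sketch of the generic IMED index for Bernoulli rewards, not of the graph-structured variants IMED-GS₂/IMED-GS⋆₂ themselves; function names are illustrative.

```python
import math

def kl_bernoulli(p, q):
    """Kullback-Leibler divergence KL(p, q) between Bernoulli distributions."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def imed_indexes(means, counts):
    """Generic IMED index I_a = N_a * KL(mu_a, mu*) + log(N_a), where mu* is
    the best empirical mean; an arm with minimum index is pulled next."""
    mu_star = max(means)
    return [n * kl_bernoulli(m, mu_star) + math.log(n)
            for m, n in zip(means, counts)]

def imed_choose(means, counts):
    """Pick an arm with minimum index (ties broken by smallest arm id)."""
    idx = imed_indexes(means, counts)
    return min(range(len(idx)), key=lambda a: idx[a])
```

Note that the index of the empirically best arm reduces to log(N_a), so exploitation and exploration are compared on a single scale; the graph-structured strategies add constrained optimal-pull tracking and, in the uncontrolled scenario, the normalization by β̂_b(t) described above.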

Optimal individual strategies for influenza vaccines with imperfect efficacy and durability of protection

Francesco Salvarani and Gabriel Turinici. Abstract. We analyze a model of agent-based vaccination campaigns against influenza with imperfect vaccine efficacy and durability of protection. We prove the existence of a Nash equilibrium by Kakutani's fixed point theorem in the context of non-persistent immunity. Subsequently, we propose and test a novel numerical method to find the equilibrium. Various issues of the model are then discussed, such as the dependence of the optimal policy on the imperfections of the vaccine, as well as the best vaccination timing. The numerical results show that, under specific circumstances, some counter-intuitive behaviors are optimal, such as, for example, an increase in the fraction of vaccinated individuals when the efficacy of the vaccine decreases, up to a threshold. The possibility of finding optimal strategies at the individual level can help public health decision makers design efficient vaccination campaigns and policies.

Clinical data based optimal STI strategies for HIV: a reinforcement learning approach

by RL to compute new STI strategies. The paper is structured as follows. Section 2 formalizes the problem of learning optimal strategies from a set of trajectories and introduces the algorithms used in our simulations. Section 3 reports simulation results obtained by using the RL-based approach to determine optimal STI strategies from clinical data. Instead of actual clinical data, we have used synthetic data obtained from simulations with a computer model of the HIV infection dynamics. In Section 4, we suggest ways to overcome difficulties that may arise when relying on real-life data rather than numerically generated data. Section 5 concludes, and Appendix A gathers information about the mathematical model of HIV dynamics used in the data generation process.
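As a toy illustration of what "learning optimal strategies from a set of trajectories" means, here is a minimal tabular value-iteration sweep over a fixed batch of observed transitions. The paper's actual method uses function-approximation-based RL; the states, actions, rewards and discount below are made-up illustrative assumptions.

```python
from collections import defaultdict

def batch_q_iteration(transitions, gamma=0.9, n_iter=50):
    """Tabular Q-value iteration over a fixed batch of (s, a, r, s_next)
    transitions: a toy stand-in for learning a treatment strategy from
    recorded trajectories, without further interaction with the system."""
    actions = {a for _, a, _, _ in transitions}
    Q = defaultdict(float)
    for _ in range(n_iter):
        new_q = defaultdict(float)
        for s, a, r, s_next in transitions:
            # Bellman backup, using only transitions present in the batch.
            new_q[(s, a)] = r + gamma * max(Q[(s_next, b)] for b in actions)
        Q = new_q
    return Q

# Hypothetical two-state trajectory data: 'treat' eventually pays off.
batch = [("s0", "treat", 0.0, "s1"),
         ("s1", "treat", 1.0, "s1"),
         ("s0", "wait", 0.0, "s0")]
Q = batch_q_iteration(batch)
best = max(["treat", "wait"], key=lambda a: Q[("s0", a)])
print(best)  # the greedy action at s0 under the learned Q-values
```

The greedy policy extracted from Q is the learned strategy; with real clinical trajectories the state would be a patient descriptor and the actions treatment decisions.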

Clinical data based optimal STI strategies for HIV: a reinforcement learning approach

The paper is structured as follows. Section II formalizes the problem of learning optimal strategies from a set of trajectories and introduces the algorithms used in our simulations. Section III reports simulation results obtained by using the RL-based approach to determine optimal STI strategies from clinical data. Instead of actual clinical data, we have used synthetic ones obtained from simulations with an ODE model of the HIV infection dynamics. In Section IV, we suggest ways to overcome difficulties that may arise when relying on real-life data rather than numerically generated ones. Section V concludes and the Appendix gathers information about the mathematical model of HIV dynamics used in the data generation process.

Optimal release strategies for mosquito population replacement

• or a penalization (more precisely, a decreasing function) of the proportion of Wolbachia-infected mosquitoes at the final time of the experiment. Note that the time horizon will be considered fixed in this case. This will lead us to introduce two large families of relevant optimization problems in order to model this issue. Analyzing them will allow us to discuss optimal strategies for mosquito releases, as well as the robustness of the properties of the solutions with respect to the modeling choices (in particular, the choice of the functional we optimize).

Optimal investment strategies for competing camps in a social network: a broad framework

While the linear influence function is consistent with the well-established Friedkin-Johnsen model, the influence of a camp on a node might not increase linearly with the corresponding investment. In fact, several social and economic settings follow the law of diminishing marginal returns, which says that for higher investments, the marginal returns (influence, in this context) are lower for a marginal increase in investment. An example of this law is watching a particular product advertisement on television: as we watch the advertisement more times, its marginal influence on us tends to get lower. A concave influence function naturally captures this law. We study such an influence function in the settings of both unbounded and bounded investment per node, and relate it to the skewness of investment in optimal strategies as well as to user perception of fairness. We study this in Section IV.
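A one-line numerical check of the diminishing-returns property: with a concave influence function (the square root is chosen here purely for illustration, not the specific function studied in the paper), each additional unit of investment yields a smaller marginal influence.

```python
import math

def influence(investment):
    # Concave influence function; the square-root form is an illustrative
    # assumption, not the model used in the paper.
    return math.sqrt(investment)

# Marginal influence of each successive unit of investment.
marginals = [influence(x + 1) - influence(x) for x in range(5)]

# Diminishing marginal returns: each extra unit helps strictly less.
print(all(a > b for a, b in zip(marginals, marginals[1:])))
```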

Optimal control strategies for the sterile mosquitoes technique

Mosquitoes are responsible for the transmission of many diseases such as dengue fever, Zika or chikungunya. One way to control the spread of these diseases is to use the sterile insect technique (SIT), which consists in a massive release of sterilized male mosquitoes. This strategy aims at reducing the total population over time, and has the advantage of being specific to the targeted species, unlike the use of pesticides. In this article, we study optimal release strategies in order to maximize the efficiency of this technique. We consider simplified models that describe the dynamics of eggs, males, females and sterile males in order to optimize the release protocol. We determine optimal strategies in a precise way, which allows us to tackle the underlying optimization problem numerically in a very simple way. We also present some numerical simulations to illustrate our results.


Dynamical allocation of cellular resources as an optimal control problem: Novel insights into microbial growth strategies

Discussion. Quantitative growth laws are empirical regularities pointing at fundamental properties of microbial life [50]. Recent work has led to the precise theoretical formulation of growth laws and has shown that they can be derived from basic assumptions on the molecular processes responsible for the assimilation of nutrients and their conversion to biomass [11, 13, 15, 17, 18]. The growth laws are uniquely defined under the hypothesis that microorganisms allocate resources in such a way as to maximize their growth rate. Several of the above-mentioned studies have analyzed feedback control strategies on the molecular level enabling cells to achieve optimal resource allocation in a robust manner. The control strategies exploit information on the physiological state of the cell to adjust the (relative) rate of synthesis of different classes of proteins (ribosomes, metabolic enzymes, …). Whereas the growth laws describe microbial growth at steady state, most microorganisms live in complex, continuously changing environments. Despite some precursory work [25, 26], questions about the dynamics of microbial growth remain largely unanswered: Which resource allocation schemes are optimal in changing environments? Which dynamical control strategies lead to (near-)optimal resource allocation? How do these strategies compare with those actually implemented by microorganisms?