### On Submodular Value Functions of Dynamic Programming

### On Aggregators and Dynamic Programming

To sum up, Theorem 4.2 and Theorem 4.1 both give conditions for the existence of a solution to the Bellman equation. These conditions are, however, not comparable in the general case: on the one hand, weak continuity is weaker than uniform continuity in v, but on the other hand, Theorem 4.2 requires an interval [v, v̄] such that B([v, v̄]) ⊂ [v, v̄]. A conceivable difficulty with Theorem 4.2 stems from the actual possibility of identifying a pair of functions v and v̄ which fulfils the assumptions of this theorem. The following class of examples is useful as it introduces a general method for finding these two candidate values. Broadly speaking, it is based upon the value function of an aggregator that dominates the primitive aggregator A.
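For a finite-state discounted problem, the two candidate bounds can often be taken constant. A minimal numerical sketch (the tiny MDP below is illustrative, not from the paper) shows an interval [v_lo, v_hi] that the Bellman operator B maps into itself, as Theorem 4.2 requires:

```python
import numpy as np

def bellman(v, P, r, beta):
    """One application of the Bellman operator B for a finite MDP.
    P[a] is the transition matrix under action a, r[a] the reward vector."""
    return np.max([r[a] + beta * P[a] @ v for a in range(len(P))], axis=0)

# Toy 2-state, 2-action MDP (illustrative values only).
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.6, 0.4]])]
r = [np.array([1.0, 0.0]), np.array([0.5, 0.8])]
beta = 0.9

# Constant bounds r_min/(1-beta) and r_max/(1-beta) give an interval
# [v_lo, v_hi] that B maps into itself: B is monotone, and
# B(v_lo) >= r_min + beta*r_min/(1-beta) = v_lo component-wise
# (symmetrically for v_hi).
r_min = min(x.min() for x in r)
r_max = max(x.max() for x in r)
v_lo = np.full(2, r_min / (1 - beta))
v_hi = np.full(2, r_max / (1 - beta))

assert np.all(bellman(v_lo, P, r, beta) >= v_lo)
assert np.all(bellman(v_hi, P, r, beta) <= v_hi)

# Iterating B from either end squeezes onto the unique fixed point.
lo, hi = v_lo.copy(), v_hi.copy()
for _ in range(500):
    lo, hi = bellman(lo, P, r, beta), bellman(hi, P, r, beta)
print(np.max(hi - lo))  # the gap shrinks geometrically at rate beta
```

The same construction underlies the dominating-aggregator idea: any aggregator whose fixed point is computable supplies the bracketing functions.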
### Dynamic programming for optimal control of stochastic McKean-Vlasov dynamics

the noise W^0. Then, by reformulating the original control problem into a stochastic control problem where the conditional law P^{W^0}_{X_t} is the sole controlled state variable, driven by the random noise W^0, and by showing the continuity of the value function in the Wasserstein space of probability measures, we are able to prove a dynamic programming principle (DPP) for our stochastic McKean-Vlasov control problem. Next, to exploit the DPP, we use a notion of differentiability with respect to probability measures introduced by P.L. Lions in his lectures at the Collège de France and detailed in the accompanying notes. This notion of derivative is based on the lifting of functions of measures into functions defined on the Hilbert space of square-integrable random variables distributed according to the "lifted" probability measure. By combining this with a special Itô chain rule for flows of conditional distributions, we derive the dynamic programming Bellman equation for the stochastic McKean-Vlasov control problem, which is a fully nonlinear second-order partial differential equation (PDE) in the infinite-dimensional Wasserstein space of probability measures. By adapting standard arguments to our context, we prove the viscosity property of the value function for the Bellman equation from the dynamic programming principle. To complete our PDE characterization of the value function with a uniqueness result, it is convenient to work in the lifted Hilbert space of square-integrable random variables instead of the Wasserstein metric space of probability measures, in order to rely on general results for viscosity solutions of second-order Hamilton-Jacobi-Bellman equations in separable Hilbert spaces. We also state a verification theorem, useful for obtaining an analytic feedback form of the optimal control when there is a smooth solution to the Bellman equation.
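The Lions notion of derivative mentioned above can be summarised as follows (a sketch in the notation standard for this subject; the symbols u, μ, X are ours, not taken from this excerpt):

```latex
% Lift u : \mathcal{P}_2(\mathbb{R}^d) \to \mathbb{R} to the Hilbert space
% L^2(\Omega;\mathbb{R}^d) of square-integrable random variables:
U(X) \;=\; u\bigl(\mathcal{L}(X)\bigr),
% where \mathcal{L}(X) is the law of X. Then u is said to be differentiable
% at \mu if U is Fr\'echet-differentiable at some (hence any) X with
% \mathcal{L}(X) = \mu, and the derivative is represented as
D U(X) \;=\; \partial_\mu u\bigl(\mathcal{L}(X)\bigr)(X),
\qquad \partial_\mu u(\mu) : \mathbb{R}^d \to \mathbb{R}^d .
```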
Finally, we apply our results to the class of linear-quadratic (LQ) stochastic McKean-Vlasov control problems, for which one can obtain explicit solutions, and we illustrate with an example arising from an interbank systemic risk model.
### Dynamic programming approach to principal-agent problems

The main contribution of our paper is the following: we provide a systematic method to solve any problem of this sort, including those in which the Agent can also control the volatility of the output process, and not just the drift. We first used that method to solve a Principal-Agent problem which had not been solved before in a precursor to this paper, Cvitanić, Possamaï and Touzi, for the special case of CARA utility functions, showing that the optimal contract depends not only on the output value (in a linear way, because of CARA preferences), but also on the risk the output has been exposed to, via its quadratic variation. In the examples section of the present paper, we also show how to solve other problems of this type by our method, problems which had previously been solved by ad hoc methods, on a case-by-case basis. We expect there will be many other applications involving Principal-Agent problems of this type, which have not been previously solved, and which our method will be helpful in solving. The present paper includes all the above cases as special cases (up to some technical considerations), considering a multi-dimensional model with arbitrary utility functions and the Agent's effort affecting both the drift and the volatility of the output, that is, both the return and the risk. Let us also point out that there is no need for any Markovian-type assumptions for using our approach, a point which also generalizes earlier results.
### Approximate dynamic programming with a fuzzy parameterization

Key words: approximate dynamic programming, fuzzy approximation, value iteration, convergence analysis.

1 Introduction

Dynamic programming (DP) is a powerful paradigm for solving optimal control problems, thanks to its mild assumptions on the controlled process, which can be nonlinear or stochastic [3, 4]. In the DP framework, a model of the process is assumed to be available, and the immediate performance is measured by a scalar reward signal. The controller then maximizes the long-term performance, measured by the cumulative reward. DP algorithms can be extended to work without requiring a model of the process, in which case they are usually called reinforcement learning (RL) algorithms. Most DP and RL algorithms work by estimating an optimal value function, i.e., the maximal cumulative reward as a function of the process state and possibly also of the control action. Representing value functions exactly is
### Value function for regional control problems via dynamic programming and Pontryagin maximum principle

In this paper we focus on regional deterministic optimal control problems, i.e., problems where the dynamics and the cost functional may be different in several regions of the state space and present discontinuities at their interface. Under the assumption that optimal trajectories have a locally finite number of switchings (no Zeno phenomenon), we use the duplication technique to show that the value function of the regional optimal control problem is the minimum over all possible structures of trajectories of value functions associated with classical optimal control problems settled over fixed structures, each of them being the restriction to some submanifold of the value function of a classical optimal control problem in higher dimension. The lifting duplication technique is thus seen as a kind of desingularization of the value function of the regional optimal control problem. In turn, we extend to regional optimal control problems the classical sensitivity relations and we prove that the regularity of this value function is the same as (i.e., not more degenerate than) that of the higher-dimensional classical optimal control problem that lifts the problem.
### Approximating Submodular Functions Everywhere

Contributions. The extensive literature on submodular functions motivates us to investigate other fundamental questions concerning their structure. How much information is contained in a submodular function? How much of that information can be obtained in just a few value oracle queries? Can an auctioneer efficiently estimate a player's utility function if it is submodular? To address these questions, we consider the problem of approximating a submodular function f everywhere while performing only a polynomial number of queries. More precisely, the problem we study is:
### Allowing non-submodular score functions in distributed task allocation

F(s, A′) ≥ F(s, A) for all A′ ⊂ A   (2)

Equation (2) roughly means that a particular task cannot increase in value because of the presence of other assignments. Although many score functions typically used in task allocation satisfy this submodularity condition (for example, in the information theory community), many do not. It is simple to demonstrate that the distributed greedy multi-assignment problem may fail to converge with a non-submodular score function, even with as few as 2 tasks and 2 agents. Consider the following example, where the notation for an agent's task group is (task ID, task score), and the sequential order added moves from left to right. The structure of this simple algorithm is that each agent produces bids on a set of desired tasks, then shares these with the other agents. This process repeats until no agent has incentive to deviate from their current allocation. In the following examples, the nominal score achieved for servicing a task is defined as T. The actual value achieved for servicing the task may be a function of other things the agent has already committed to. In the above example, ε is defined as some value, 0 < ε < T.

Example 1: Allocations with a submodular score function
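A small sketch makes condition (2) concrete: a brute-force check of the marginal-score condition on two toy score functions (both functions and their values are hypothetical, not the paper's examples):

```python
from itertools import combinations

def is_submodular(marginal, tasks):
    """Check condition (2): the marginal score of a task never increases
    as the agent's existing bundle A grows, i.e. A' ⊂ A ⇒ F(s, A') ≥ F(s, A)."""
    for s in tasks:
        others = [t for t in tasks if t != s]
        for k in range(len(others) + 1):
            for A in combinations(others, k):
                for j in range(len(A) + 1):
                    for Ap in combinations(A, j):  # every A' ⊆ A
                        if marginal(s, set(Ap)) < marginal(s, set(A)) - 1e-12:
                            return False
    return True

T, eps = 10.0, 3.0  # nominal task score and a synergy bonus, 0 < eps < T

# Submodular example: each extra commitment halves the marginal value.
sub = lambda s, A: T / (2 ** len(A))
# Non-submodular example: a task is worth *more* once another task is held,
# which is exactly the kind of synergy that makes the greedy auction cycle.
nonsub = lambda s, A: T + (eps if A else 0.0)

print(is_submodular(sub, ["t1", "t2"]))     # True
print(is_submodular(nonsub, ["t1", "t2"]))  # False
```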
### Metric Learning with Submodular Functions

Table 5: Accuracy of KNN with different metric learning algorithms and their running times in seconds. We use the same datasets as in the previous experiment, but we let the ξ of the ξ-additive measure vary from 1 to min(10, m). A value of ξ = 1 means that there is no interaction between features, and only singletons are considered. Increasing ξ adds orders of interaction, and finally reaches the order of interaction tackled by the LEML approach without the ξ-additive method. It can be seen that each time we decrease ξ, the number of free parameters of f(S) is divided by 2, so that the running time of the method is now very reasonable, even for quite high-dimensional data. Table 5 also gives the results obtained through a grid search over ξ (last column). Interestingly, one can see that LEML-ξ often gives better results than LEML, showing that using all the m-tuple-wise combinations is not always necessary, and may even penalize performance (e.g. balance, ionosphere, liver, and sonar).
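For concreteness, the parameter count of a ξ-additive set function on m features can be sketched as follows (one Möbius coefficient per subset of size at most ξ, a common convention; the paper's exact count may differ by normalisation constraints):

```python
from math import comb

def n_params(m, xi):
    """Free parameters of a ξ-additive set function on m features:
    one Möbius coefficient per nonempty subset of size 1..ξ."""
    return sum(comb(m, k) for k in range(1, xi + 1))

m = 10
for xi in (1, 2, 3, m):
    print(xi, n_params(m, xi))
# xi = 1 keeps only the m singletons (no feature interactions);
# xi = m recovers the full 2^m - 1 coefficients of a general set function.
```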
### An approximate dynamic programming approach to solving dynamic oligopoly models

Observe that since |X|·|X|·(N + 1) will typically be substantially smaller than |Y|, the use of this approximation architecture makes the linear program in Algorithm 3 a tractable program. For example, in models in which the state space has millions of billions of states, only thousands of basis functions are required. Note that our selection of basis functions produces the most general possible separable approximation: for each x, j, the function f_x^j(s) can take a different value for each s ∈ {0, ..., N}. Our selection of basis functions generalizes approximation architectures in which the value function is approximated as a linear combination of moments of the industry state. Moment-based approximations have previously been used in large-scale stochastic control problems that arise in macroeconomics (Krusell and Smith 1998), though with a very different approach and algorithm than ours. In our computational experiments we observe that moving beyond simple linear combinations of, say, the first two moments of the industry state is valuable: the simpler architecture often fails to produce good approximations to MPE for our computational examples, while our proposed architecture does.
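A simplified sketch of such a separable architecture (dropping the dependence on the firm's own state that the |X|·|X|·(N + 1) count suggests; the state and names below are ours):

```python
import numpy as np

def separable_features(s, N):
    """One-hot basis for the most general separable approximation:
    V(s) ≈ sum_x f_x(s[x]), with each f_x free on {0,...,N}.
    s is the industry state: s[x] = number of firms in bucket x."""
    X = len(s)
    phi = np.zeros((X, N + 1))
    for x, count in enumerate(s):
        phi[x, count] = 1.0
    return phi.ravel()  # length X*(N+1); weights w give V(s) = w @ phi

# Any separable V is some weight vector w; moment architectures are a
# special case, since sum_x s[x] and sum_x s[x]**2 are separable in x.
s = [0, 3, 1]                 # hypothetical state with N = 4
phi = separable_features(s, 4)
print(phi.shape)              # (15,)
```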
### Air-Combat Strategy Using Approximate Dynamic Programming

Unmanned Aircraft Systems (UAS) have the potential to perform many of the dangerous missions currently flown by manned aircraft. Yet the complexity of some tasks, such as air combat, has precluded UAS from successfully carrying out these missions autonomously. This paper presents a formulation of a level-flight, fixed-velocity, one-on-one air combat maneuvering problem and an approximate dynamic programming (ADP) approach for computing an efficient approximation of the optimal policy. In the version of the problem formulation considered, the aircraft learning the optimal policy is given a slight performance advantage. This ADP approach provides a fast response to a rapidly changing tactical situation, long planning horizons, and good performance without explicit coding of air combat tactics. The method's success is due to extensive feature development, reward shaping, and trajectory sampling. An accompanying fast and effective rollout-based policy extraction method is used to accomplish on-line implementation. Simulation results are provided that demonstrate the robustness of the method against an opponent beginning from both offensive
### A CMOS Current-Mode Dynamic Programming Circuit

In conventional DP computation, the Bellman equation is evaluated iteratively and sequentially, and such time-consuming sequential iterations result in substantial computational delay. The notion of the "curse of dimensionality", as coined by Bellman, refers to the vast computational effort required for the numerical solution of the Bellman equation when a large number of state variables is subjected to the optimization objective function. Both the computational delay and the hardware resource requirements grow explosively as the problem size increases. To accelerate the DP computation, a first-order ordinary differential equation (ODE) system was proposed by Lam and Tong, which can be employed to transform the sequential DP algorithm into a continuous-time parallel computational network that enables high-speed convergence to Bellman's optimality criterion. Here, a CMOS current-mode analog circuit is presented to provide a highly portable and low-power implementation of the proposed network architecture. A detailed analysis of computational speed, power consumption, and network convergence is presented. Realization of a circuit of reasonable size is demonstrated to exemplify the design principles. A procedure to test and validate the fabricated circuit is discussed. We have also investigated the error models, and the results lead to a compensation scheme whereby the errors due to nonideal current sources and device mismatch are minimized.
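The idea of turning the sequential Bellman iteration into a continuous-time flow can be sketched numerically. The ODE below is a generic relaxation of the shortest-path Bellman equation in the spirit described above, not the Lam-Tong system or the circuit equations themselves, and the 4-node graph is hypothetical:

```python
import numpy as np

INF = 1e9
# Hypothetical 4-node DAG; c[i][j] = edge cost, node 3 is the goal.
c = np.array([[INF, 1.0, 4.0, INF],
              [INF, INF, 2.0, 6.0],
              [INF, INF, INF, 1.0],
              [INF, INF, INF, INF]])

def rhs(V):
    """Right-hand side of the continuous-time relaxation
    dV_i/dt = min_j (c_ij + V_j) - V_i, with the goal clamped to 0.
    At steady state, dV/dt = 0 recovers Bellman's equation."""
    out = np.min(c + V[None, :], axis=1) - V
    out[3] = 0.0  # boundary condition at the goal node
    return out

V = np.zeros(4)  # any initial guess
dt = 0.1
for _ in range(2000):
    V = V + dt * rhs(V)  # forward-Euler integration of the network
print(V[:3])  # approaches the shortest-path costs [4, 3, 1]
```

All node equations update in parallel at each step, which is the property the analog network exploits.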
### Branch-and-bound strategies for dynamic programming

in the state space Q in low-speed (tape, disk) computer storage. It is common knowledge that in real problems excessive high-speed storage requirements can present a serious implementation problem.

### Branch-and-bound strategies for dynamic programming

Let v̄ be an upper bound on the objective-function value of any optimal solution to the original discrete optimization problem P. Then, since P̂ is a representation of P, it follows that

### Dynamic Programming for Mean-Field Type Control

the extension of the HJB dynamic programming approach. The second difficulty is related to the well-posedness of the HJB or adjoint equation, because it is set on an infinite domain in space. The third difficulty is the lack of a proof of convergence for the two algorithms suggested here, an HJB-based fixed point and a steepest descent based on the calculus of variations. When it converges, the fixed-point method is preferred, but there is no known way to guarantee convergence even to a local minimum; as for the steepest descent, we found it somewhat hard to use, mainly because it generates irregular oscillating solutions; some bounds on the derivatives of u need to be added in penalised form to the criterion. Numerically, both algorithms are cursed by the asymptotic behaviour of the solution of the adjoint state at infinity. So, when possible, the Riccati semi-analytical feedback solution is the best. Finally, but this applies only to this toy problem, the pure feedback solution is nearly optimal, easy to compute, and stable. Note also that this semi-analytical solution is a validation test, since it has been recovered by both algorithms.
### Optimal routing configuration clustering through dynamic programming

2 Clustered Robust Routing problem In this section we briefly present the CRR problem [SFC + 18] and introduce some key notations. Given a network topology and a sequence of traffic matrices (TMs), which might represent the evolution over time of end-to-end connections (i.e., demands), the CRR problem consists in splitting the sequence into smaller clusters of contiguous TMs with a single routing configuration applied to each cluster. Since a single routing configuration is applied to all TMs of the cluster, the goal is to find the clusters and corresponding routing configurations that minimize the worst deviation with respect to Dynamic-TE.
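Because the clusters must be contiguous in the sequence, the problem lends itself to a textbook dynamic program over split points. A minimal sketch, where the per-cluster cost is a hypothetical stand-in for the worst deviation with respect to Dynamic-TE:

```python
from functools import lru_cache

def cluster_tms(cost_fn, n, k):
    """DP over contiguous clusterings: split TMs 0..n-1 into exactly k
    contiguous clusters, minimising the worst (max) per-cluster cost.
    cost_fn(i, j) = cost of serving TMs i..j with one routing
    configuration (stand-in for the worst deviation vs Dynamic-TE)."""
    @lru_cache(maxsize=None)
    def best(i, m):  # best split of TMs i..n-1 into m clusters
        if m == 1:
            return cost_fn(i, n - 1)
        return min(max(cost_fn(i, j), best(j + 1, m - 1))
                   for j in range(i, n - m + 1))
    return best(0, k)

# Hypothetical per-cluster cost: spread of a scalar "load" inside it.
load = [1, 2, 2, 9, 9, 10, 3, 3]
cost = lambda i, j: max(load[i:j+1]) - min(load[i:j+1])
print(cluster_tms(cost, len(load), 3))  # → 1: clusters [1,2,2] [9,9,10] [3,3]
```

Memoisation makes the search over all contiguous splits polynomial, which is the structural advantage the CRR formulation exploits.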
### Local minimization algorithms for dynamic programming equations

which requires a parametric maximization (minimization) over the control set U. For its discrete analogue G(V), a common practice in the literature is to compute the minimization by comparison, i.e., by evaluating the expression on a finite set of elements of U (see for instance [1, 12, 17] and references therein). In contrast to the comparison approach, the contribution of this paper is to demonstrate that an accurate realization of the min-operation on the right-hand side of (1.3) can have an important impact on the optimal controls that are determined on the basis of the dynamic programming principle. In this respect, the reader can take a preview of Figure 4.2, where the differences between optimal control fields obtained with different minimization routines can be appreciated. Previous work on the construction of minimization routines for this problem dates back to the proposal of Brent's algorithm for solving high-dimensional Hamilton-Jacobi-Bellman equations, and to a fast semi-Lagrangian algorithm for front propagation problems. In this latter reference, the authors determine the minimizer of a specific Hamiltonian by means of an explicit formula. Moreover, for local optimization strategies in dynamic programming we refer to earlier work on Brent's algorithm and on a Bundle Newton method.
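The contrast between minimization by comparison and an accurate realization of the min-operation can be sketched on a scalar problem (the Hamiltonian H below is hypothetical, and golden-section search stands in for the Brent-type routines discussed):

```python
import math

def H(u):
    """A hypothetical scalar Hamiltonian to be minimised over U = [0, 1]."""
    return (u - 0.3137) ** 2 + 0.1 * math.sin(8 * u)

def min_by_comparison(f, a, b, n):
    """Common practice: evaluate f on n grid points and keep the best."""
    pts = [a + (b - a) * i / (n - 1) for i in range(n)]
    return min(pts, key=f)

def golden_section(f, a, b, tol=1e-8):
    """Accurate local refinement of the min-operation on a unimodal bracket."""
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

u_grid = min_by_comparison(H, 0.0, 1.0, 11)   # coarse comparison: u = 0.5
u_ref = golden_section(H, max(0.0, u_grid - 0.1), min(1.0, u_grid + 0.1))
print(u_grid, u_ref)  # the refined control attains a strictly lower H
```

In a dynamic programming loop, this per-state refinement is what changes the resulting optimal control fields.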
### Risk-aware decision making and dynamic programming

∑_{t=0}^{∞} β^t r(x_t, u_t, x_{t+1}),   (1)

with β ∈ (0, 1) the discount factor. In finite-horizon problems the sum is truncated and β ∈ (0, 1]. The random variable depends on the initial distribution q and on the policy π of the agent. The superscripts q, π stress that dependence but are omitted in the sequel when the context is clear. The agent applies a stochastic Markov policy π ∈ Σ. Stochastic policies are often relevant in a risk-sensitive context: see Section 4.1. The policy may be time-varying. Even if the process is observable, in many risk-sensitive setups it is necessary to extend the state space and define auxiliary variables (see Section 2.2). Therefore, let I refer to such an extended state space, identified with the information space of the agent. The random selection of an action in A according to a distribution conditioned on the information state i_t ∈ I is written
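One common instance of the state extension alluded to above is to let the information state carry, besides the physical state x_t, the discounted reward accumulated so far, so that the policy can react to it. A minimal simulation sketch of (1) under this extension (the chain and the policy are hypothetical):

```python
import random

random.seed(0)
beta = 0.9

def step(x, u):
    """Hypothetical controlled chain on {0,1,2}: next state and reward."""
    x_next = (x + u + random.choice([0, 1])) % 3
    return x_next, float(x == x_next)

def rollout(policy, x0, T):
    """Simulate (1): the discounted sum  sum_t beta^t r(x_t, u_t, x_{t+1}).
    The information state i_t = (x_t, c_t) carries the discounted reward
    accumulated so far, c_t, as an auxiliary variable."""
    x, c, disc = x0, 0.0, 1.0
    for t in range(T):
        u = policy((x, c))        # the policy may depend on banked reward
        x, r = step(x, u)
        c += disc * r
        disc *= beta
    return c

# A risk-sensitive-flavoured policy: play safe once enough reward is banked.
policy = lambda i: 0 if i[1] > 2.0 else 1
print(rollout(policy, 0, 50))
```

The return is bounded by 1/(1 − β) here, since the per-step reward is at most 1.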
### Adaptive aggregation methods for discounted dynamic programming

(The smoothing of the error after a single successive approximation step in this example is a coincidence. In general, several successive approximation steps will be required.)

### Dynamic Programming for Mean-field type Control

Based on the variation of J with respect to u, we have used 100 iterations of a gradient method with fixed step size ω = 0.3. The parameters of the problem are T = 2, h = 0.1, θ = 1, σ² = (2/3)θ, and ρ_0, ρ_T are as in the cited reference. The numerical method for the PDEs is a centered space-time finite element method of degree 1 on a mesh of 150 points and 40 time steps.