On the time discretization of stochastic optimal control problems: the dynamic programming approach

Joseph Frédéric Bonnans, Justina Gianatti, Francisco Silva

To cite this version:

Joseph Frédéric Bonnans, Justina Gianatti, Francisco Silva. On the time discretization of stochastic optimal control problems: the dynamic programming approach. ESAIM: Control, Optimisation and Calculus of Variations, EDP Sciences, 2019. doi:10.1051/cocv/2018045. hal-01474285v2.


Abstract. In this work we consider the time discretization of stochastic optimal control problems. Under general assumptions on the data, we prove the convergence of the value functions associated with the discrete time problems to the value function of the original problem. Moreover, we prove that any sequence of optimal solutions of the discrete problems is minimizing for the continuous one. As a consequence of the Dynamic Programming Principle for the discrete problems, the minimizing sequence can be taken in discrete time feedback form.

1. Introduction

Stochastic optimal control problems in continuous time have been extensively studied during the last decades. This important area of research has a wide range of applications, for example in economics, mathematical finance and engineering. Usually, there are two approaches to deal with these problems. The first one is based on Bellman's Dynamic Programming Principle (DPP), which allows one to characterize the value function as the unique viscosity solution of the associated Hamilton-Jacobi-Bellman (HJB) equation [21, 22, 23]. The second one is the variational approach, which deals with extensions of the Pontryagin maximum principle [26] to the stochastic framework.

For a detailed account of the theory and historical remarks we refer the reader to the classical monographs [30, 19, 14].

Almost independently, another active field of research in the last decades has been the optimal control of discrete time processes with a general state space. In that framework, controls at time $k$ (also called policies) are probability measures on the action space, which depend on the history of the states up to time $k$ and of the chosen actions up to time $k-1$. Given a control, an action is chosen according to the probability measure associated with this control, and this fixes the transition probability function between the states at times $k$ and $k+1$. The literature on this subject is too extensive to survey here, and we refer the reader to the classical monographs [12, 3, 15, 27, 31], and the references therein, for a comprehensive presentation and historical remarks. Generally speaking, the assumptions in this theory are rather general, and a common theme is the investigation of the existence of optimal (or $\varepsilon$-optimal) Markov policies, i.e. policies for which the chosen control at time $k$ depends only on the value of the state at time $k$.

In this work, we consider a continuous time stochastic optimal control problem with deterministic coefficients and a finite horizon $T>0$. Given a time grid with diameter $h>0$, we study its natural time discretization. While we consider only uniform grids, our analysis extends easily to general time grids. The state equation is discretized with the classical explicit stochastic Euler scheme. Since at the continuous level we consider the strong formulation of the state equation (see [30, Chapter 2, Section 4.1]), i.e. the control acts pathwise on the state on a fixed probability space, it is natural to consider a similar formulation at the discrete level. In this case, the controls are assumed to be adapted to the filtration generated by the increments of the Brownian motion on the time grid. In this sense, and similarly to the continuous time case (see [30, Chapter 2, Section 4.1 and Section 4.2]), our formulation is more specific than the one described in the previous paragraph, which is more related to the weak formulation of the continuous problem (see [30, Chapter 2, Section 4.2]).

Date: March 2, 2018.

2010 Mathematics Subject Classification. Primary: 93E20, 49L20; Secondary: 90C15, 93C55.

Key words and phrases. Stochastic Control, Discrete Time Systems, Dynamic Programming Principle, Value Function, Feedback Control.

The first and second authors thank the Laboratoire de Finance des Marchés de l'Energie for its support. The first and third authors acknowledge the support of the project iCODE: "Large-scale systems and Smart grids: distributed decision making" and of the Gaspard Monge Program for Optimization and Operations Research (PGMO). The second author was also supported by the Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas (CIFASIS).

The study of the discrete time case arises from different objectives. For instance, it can be used to prove the existence of optimal controls for the continuous time problem, as limits of optimal discrete time policies (see e.g. [9] and [20]). Another application is to derive the DPP for the continuous time problem as a consequence of this property in the discrete time case (see e.g. [19] and [25]). We point out that in [19, 17] and [25], given a discrete time control, the associated state solves the continuous time stochastic differential equation, and so the state is not discrete in time. Finally, discrete time problems appear naturally as the first step in obtaining a numerical approximation of the continuous time problem, the second step being the discretization of the state space (see e.g. [20]) or the resolution by Monte Carlo methods.

The simplicity of our pathwise formulation and the regularity of the coefficients defining the continuous problem, which, as we will see, yield the continuity of the optimal cost as a function of the initial state, allow us to simplify the proof of the DPP for the discrete time problem by arguing as in [5]. Thus, we do not have to deal with delicate measurability issues as in [3]. Although we consider controls adapted to the filtrations generated by the increments of the Brownian motion, a consequence of the DPP is the existence of discrete time optimal feedback (or Markov) controls. This important property of the discrete time case contrasts with the continuous time case, where the existence of an optimal feedback control can be ensured only in exceptional cases (see [30, Chapter 5, Section 6] and Remark 3.7). In some sense, this is similar to the existence issues for continuous time Stochastic Differential Equations (SDEs): the Euler scheme is always well-posed, even when the continuous time SDE does not admit an explicit solution.

We study several properties of the discrete time value functions $V^h$, which are analogous to those of their continuous time counterparts, such as Lipschitz continuity and semiconcavity with respect to the state variable on bounded sets. When $V^h$ is extended by linear interpolation to the entire interval $[0,T]$, we prove that it is Hölder continuous in time, on bounded sets of the space variable. Using an approximation result by Krylov (see [19, Lemma 6, Section 3.2, p. 143]), we also prove, with a direct approach, the local uniform convergence of $V^h$ to $V$, the value function of the continuous time problem. Since we work under quite general assumptions, this convergence result is more general than those already proved in [9], where, under stronger assumptions, weak convergence to a feedback control of the continuous time problem is shown, and in [14, Chapter 9]. The convergence of the value functions could probably also be proved by analytical methods based on viscosity solution theory (see for instance [7] and [8] for the deterministic case and [11], [2] and [14, Chapter 9] for the stochastic one); however, our direct approach allows us to prove the important fact that optimal (or $\varepsilon$-optimal) discrete time controls form a minimizing sequence for the continuous time problem. In particular, there always exists a minimizing sequence of discrete time optimal feedback controls. In addition, under some convexity (respectively, strong convexity) assumptions, we obtain the weak (respectively, strong) convergence of the discrete time optimal controls to a solution of the original problem. In this general framework, we have not established error estimates for the discrete value functions; we refer the reader to [17] and [18] where, under additional assumptions, the author tackles this problem.

The paper is organized as follows. In Section 2 we state the continuous time and discrete time problems and our main assumptions. We also provide some technical and fundamental estimates, proved in the Appendix, relating the continuous time and discrete time states associated with piecewise constant controls. In Section 3 we prove the DPP for the discrete time problem and show the existence of optimal feedback controls; the continuity of the discrete value function plays an important role here and simplifies the proof. Next, in Section 4, we prove several regularity properties of $V^h$, which are analogous to those of $V$. Finally, in Section 5, we prove in Theorem 5.2 the local uniform convergence of $V^h$ to $V$ and that the sequence of discrete time optimal controls is a minimizing sequence for the continuous problem. Under some convexity assumptions, the convergence of this sequence to an optimal control of the continuous problem is also shown.

2. Preliminaries

We begin by defining the problems we are interested in.

2.1. Continuous time problem. Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space on which an $m$-dimensional standard Brownian motion $W(\cdot)$ is defined. For every $s\in[0,T]$, we set $\mathbb F^s=\{\mathcal F_t^s\}_{t\in[s,T]}$, where, for $t\in[s,T]$, $\mathcal F_t^s$ is the completion of $\sigma(W(r)-W(s)\,:\,s\le r\le t)$ by the $\mathbb P$-null sets of $\mathcal F$.

Let $s\in[0,T]$ and $x\in\mathbb R^n$; we consider the following controlled Stochastic Differential Equation (SDE):

(1) $\mathrm dy_u^{s,x}(t)=f(t,y_u^{s,x}(t),u(t))\,\mathrm dt+\sigma(t,y_u^{s,x}(t),u(t))\,\mathrm dW(t),\quad t\in\,]s,T[\,,\qquad y_u^{s,x}(s)=x,$

where $f:[0,T]\times\mathbb R^n\times\mathbb R^r\to\mathbb R^n$ and $\sigma:[0,T]\times\mathbb R^n\times\mathbb R^r\to\mathbb R^{n\times m}$ are given maps. In the notation above, $y_u^{s,x}$ denotes the state function, with values in $\mathbb R^n$, and $u$ the control, with values in $\mathbb R^r$. We define the cost functional

(2) $J^{s,x}(u):=\mathbb E\left[\int_s^T\ell(t,y_u^{s,x}(t),u(t))\,\mathrm dt+g(y_u^{s,x}(T))\right],$

where $\ell:[0,T]\times\mathbb R^n\times\mathbb R^r\to\mathbb R$ and $g:\mathbb R^n\to\mathbb R$ are given maps. A precise definition of the control space, i.e. the domain of the functional $J^{s,x}$ in (2), and assumptions on the data ensuring that $y_u^{s,x}$ is well defined will be given in the next sections.

Let $U_{ad}$ be a non-empty compact subset of $\mathbb R^r$ and define

(3) $\mathcal U_{ad}^s:=\left\{u\in(H^2_{\mathbb F^s})^r\,:\,u(t,\omega)\in U_{ad}\text{ for almost all (a.a.) }(t,\omega)\in[s,T]\times\Omega\right\},$

where

(4) $H^2_{\mathbb F^s}:=\left\{v\in L^2([s,T]\times\Omega)\,:\,\text{the process }(t,\omega)\in[s,T]\times\Omega\mapsto v(t,\omega)\text{ is }\mathbb F^s\text{-adapted}\right\},$

endowed with the $L^2([s,T]\times\Omega)$ norm.

Then, for fixed $s\in[0,T]$ and $x\in\mathbb R^n$, the control problem that we consider is

$(P_{s,x})\qquad \inf\ J^{s,x}(u)\quad\text{subject to }u\in\mathcal U_{ad}^s.$

The value function of the continuous problem, $V:[0,T]\times\mathbb R^n\to\mathbb R$, is defined as

(5) $V(s,x):=\inf_{u\in\mathcal U_{ad}^s}J^{s,x}(u).$

2.2. Discrete time problem. Let us introduce a discrete time approximation of the above problem. Given $N\in\mathbb N\setminus\{0\}$, define $h:=T/N$. Let us set $t_k=kh$ ($k=0,\dots,N$) and consider the sequence of independent and identically distributed (i.i.d.) $\mathbb R^m$-valued random vectors defined as $\Delta W_{k+1}:=W(t_{k+1})-W(t_k)$, for $k=0,\dots,N-1$, and $\Delta W_0:=0$ in $\Omega$. Then $\mathbb E(\Delta W_k)=0$ and $\mathbb E(\Delta W_k^{i_1}\Delta W_k^{i_2})=h\,\delta_{i_1i_2}$ (where $\delta_{i_1i_2}=1$ if $i_1=i_2$ and $\delta_{i_1i_2}=0$ otherwise). For each $k=0,\dots,N$ we consider the filtration $\{\mathcal F_j^k\}_{j=k}^N$, where

(6) $\mathcal F_k^k:=\{\emptyset,\Omega\}\quad\text{and}\quad\mathcal F_j^k:=\sigma(\Delta W_{k'}\,;\,k+1\le k'\le j)\ \text{ for all }j\in\{k+1,\dots,N\}.$

Let us set $L^2_{\mathcal F_j^k}:=L^2(\Omega,\mathcal F_j^k,\mathbb P)$. For $k=0,\dots,N-1$, we consider the admissible control sets

(7) $\mathcal U_k^h:=\left\{u=(u_k,\dots,u_{N-1})\in\Pi_{j=k}^{N-1}L^2_{\mathcal F_j^k}\,:\,u_j(\omega)\in U_{ad}\ \mathbb P\text{-a.s.},\ j=k,\dots,N-1\right\},$

where $\Pi_{j=k}^{N-1}L^2_{\mathcal F_j^k}$ is endowed with the norm $\|u\|^2_{\mathcal U_k^h}:=h\sum_{j=k}^{N-1}\mathbb E|u_j|^2$.

Given $u\in\mathcal U_k^h$, $x\in\mathbb R^n$ and $k=0,\dots,N-1$, we recursively define the state $y_j^{k,x,u}$, $j=k,\dots,N$, associated with $u$, $x$ and $k$ as

(8) $y_{j+1}^{k,x,u}=y_j^{k,x,u}+hf(t_j,y_j^{k,x,u},u_j)+\sigma(t_j,y_j^{k,x,u},u_j)\Delta W_{j+1},\quad j=k,\dots,N-1,\qquad y_k^{k,x,u}=x.$

Note that, under assumption (H1) below, $y_j^{k,x,u}\in L^2_{\mathcal F_j^k}$ for all $j=k,\dots,N$.

Finally, we associate to each $u\in\mathcal U_k^h$ the cost function

(9) $J_k^h(x,u):=\mathbb E\left[h\sum_{j=k}^{N-1}\ell(t_j,y_j^{k,x,u},u_j)+g(y_N^{k,x,u})\right].$

Then, for fixed $k$ and $x$, the discrete time control problem that we will consider is

$(P_{k,x}^h)\qquad\inf\ J_k^h(x,u)\quad\text{subject to }u\in\mathcal U_k^h.$

In this case, the value functions $\{V_k^h:\mathbb R^n\to\mathbb R\,|\,k=0,\dots,N\}$ are defined over $\mathbb R^n$ as

(10) $V_N^h(x):=g(x),\quad\text{and}\quad V_k^h(x):=\inf_{u\in\mathcal U_k^h}J_k^h(x,u)\ \text{ for }k=0,\dots,N-1.$

The main result of this work is the convergence of an extension of $V_{(\cdot)}^h(\cdot)$ to $[0,T]\times\mathbb R^n$ to the value function $V$. In the next section we introduce the main assumptions of this work.
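To fix ideas, the following sketch (not the authors' code) simulates the Euler scheme (8) and estimates the cost (9) by Monte Carlo in a one-dimensional example; the concrete data f, sigma, ell, g, the horizon and the choice $U_{ad}=[-1,1]$ are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Illustrative sketch: simulate the Euler scheme (8) and estimate the cost (9)
# by Monte Carlo, for example data with n = m = r = 1 and U_ad = [-1, 1].
# Only the recursion and the cost follow the text; f, sigma, ell, g below are
# assumptions made for the example.

T, N, n_mc = 1.0, 50, 10_000
h = T / N
t = np.linspace(0.0, T, N + 1)

f     = lambda t, y, u: -y + u              # drift f(t, y, u) (example)
sigma = lambda t, y, u: 0.5 + 0.0 * y       # diffusion sigma(t, y, u) (example)
ell   = lambda t, y, u: y**2 + 0.1 * u**2   # running cost ell(t, y, u) (example)
g     = lambda y: y**2                      # final cost g(y) (example)

def J0h(x, u_seq, rng):
    """Monte Carlo estimate of J_0^h(x, u) for a deterministic control sequence
    u_seq = (u_0, ..., u_{N-1}), admissible since it is trivially adapted."""
    y = np.full(n_mc, float(x))
    cost = np.zeros(n_mc)
    for j in range(N):
        dW = rng.normal(0.0, np.sqrt(h), size=n_mc)   # Delta W_{j+1} ~ N(0, h)
        cost += h * ell(t[j], y, u_seq[j])            # Riemann-type sum in (9)
        y = y + h * f(t[j], y, u_seq[j]) + sigma(t[j], y, u_seq[j]) * dW  # (8)
    return np.mean(cost + g(y))

rng = np.random.default_rng(0)
print(J0h(1.0, np.zeros(N), rng))   # cost of the constant control u = 0
```

For a fixed control the estimate converges to $J_0^h(x,u)$ as the number of samples grows; optimizing over $\mathcal U_k^h$ is the harder part, addressed by the DPP of Section 3.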

2.3. Assumptions. We present the hypotheses that we consider in this paper.

(H1) Assumptions on the dynamics:

(a) The maps $\varphi=f,\sigma$ are $\mathcal B([0,T]\times\mathbb R^n\times\mathbb R^r)$ measurable.

(b) For almost all $t\in[0,T]$ the map $(y,u)\mapsto\varphi(t,y,u)$ is $C^1$ and there exists a constant $L_1>0$ such that for almost all $t\in[0,T]$ and for all $y\in\mathbb R^n$ and $u\in U_{ad}$ we have

(11) $|\varphi(t,y,u)|\le L_1[|y|+|u|+1],\qquad|\varphi_y(t,y,u)|+|\varphi_u(t,y,u)|\le L_1,$

where $\varphi_y(t,y,u):=D_y\varphi(t,y,u)$ and $\varphi_u(t,y,u):=D_u\varphi(t,y,u)$.

(c) There exists an increasing modulus of continuity $\omega_1:[0,+\infty[\,\to[0,+\infty[$ such that for $\varphi=f,\sigma$, and for all $y\in\mathbb R^n$, $u\in U_{ad}$, $s,t\in[0,T]$, we have

(12) $|\varphi(s,y,u)-\varphi(t,y,u)|\le\omega_1(|s-t|).$

(H2) Assumptions on the cost:

(a) The maps $\ell$ and $g$ are respectively $\mathcal B([0,T]\times\mathbb R^n\times\mathbb R^r)$ and $\mathcal B(\mathbb R^n)$ measurable.

(b) For almost all $t\in[0,T]$ the map $(y,u)\mapsto\ell(t,y,u)$ is $C^1$, and there exists $L_2>0$ such that for all $y\in\mathbb R^n$ and $u\in U_{ad}$,

(13) $|\ell(t,y,u)|\le L_2[|y|+|u|+1]^2,\qquad|\ell_y(t,y,u)|+|\ell_u(t,y,u)|\le L_2[|y|+|u|+1],$

where $\ell_y(t,y,u):=D_y\ell(t,y,u)$ and $\ell_u(t,y,u):=D_u\ell(t,y,u)$.

(c) There exists an increasing modulus of continuity $\omega_2:[0,+\infty[\,\to[0,+\infty[$ such that for all $y\in\mathbb R^n$, $u\in U_{ad}$ and $s,t\in[0,T]$ we have

(14) $|\ell(s,y,u)-\ell(t,y,u)|\le\omega_2(|s-t|).$

(d) The map $y\mapsto g(y)$ is $C^1$ and there exists $L_2>0$ such that for all $y\in\mathbb R^n$,

(15) $|g(y)|\le L_2[|y|+1]^2,\qquad|\nabla g(y)|\le L_2[|y|+1].$

In order to keep the notation as simple as possible, we define $L:=\max\{L_1,L_2\}$ and $\omega:=\max\{\omega_1,\omega_2\}$.

Remark 2.1. Under assumption (H1), for any $u\in\mathcal U_{ad}^s$ the state equation (1) admits a unique strong solution; see the proof of [24, Proposition 2.1]. For the sake of completeness, we recall the following particular instance of [24, Proposition 2.1], which is valid under our assumption (H1).

Proposition 2.1. Assume that (H1) holds true. Then, for any $u\in\mathcal U_{ad}^0$, the continuous time state equation (1) admits a unique strong solution $y\in L^2(\Omega;C([0,T];\mathbb R^n))$, and the following estimate holds:

(16) $\mathbb E\left[\sup_{s\in[0,t]}|y(s)|^2\right]\le C\,\mathbb E\left[|x|^2+\left(\int_0^t|f(s,0,u(s))|\,\mathrm ds\right)^2+\int_0^t|\sigma(s,0,u(s))|^2\,\mathrm ds\right].$

Moreover, for any $\gamma\ge1$, there exists $C_\gamma>0$ such that

(17) $\mathbb E\left[\sup_{s\in[0,t]}|y_u^{0,x}(s)-y_u^{0,\bar x}(s)|^\gamma\right]\le C_\gamma|x-\bar x|^\gamma.$

Under assumptions (H1) and (H2) we can prove the main result presented in Section 5 (Theorem 5.2), which gives the convergence of the discrete value function to the continuous one.

The contribution of this work, in the context of the existing literature, is that our assumptions are rather general; see for instance [11], where the coefficients are bounded. Moreover, using the DPP proved in Section 3, we obtain the existence of discrete optimal feedback controls, which will be shown to form a minimizing sequence for the continuous problem.

We end this section by providing some technical results which will be needed in the next sections.

2.4. Estimates on the states of the discrete and continuous formulations. We begin this section by providing some estimates for the discrete state, and then for the difference between two discrete states with different initial conditions. Finally, we also prove some estimates for the difference between the discrete and continuous time states associated with the same discrete time control, extended to a piecewise constant control on each interval $[t_k,t_{k+1}[$. The proofs of these results are given in the Appendix.

In order to keep the notation as simple as possible, we assume that the initial time is $k=0$ and that the initial state $x$ is fixed, and we omit these indexes in the state.

We start by providing an estimate of the norm of the state in terms of the control and the initial condition. For a fixed $h$, let $u^h=(u_k)_{k=0}^{N-1}\in\mathcal U_0^h$ be a given discrete control and define the associated discrete state $y^h=(y_k)_{k=0}^N$ as the solution of (8).

Let us underline that all the constants involved in the following results are independent of $h$.

Lemma 2.2. Assume that (H1) holds. Then, there exists $C>0$ such that

(18) $\mathbb E\left[\max_{k=0,\dots,N}|y_k|^2\right]\le C\left[|x|^2+\|u^h\|^2_{\mathcal U_0^h}+1\right].$

Proof. See the Appendix.

Lemma 2.3. Assume that (H1) holds. Then, for every $p\ge2$, there exists $C_p>0$ such that for all $x,y\in\mathbb R^n$ we have

(19) $\mathbb E\left[\max_{k=0,\dots,N}|y_k^x-y_k^y|^p\right]\le C_p|x-y|^p,$

where $(y_k^x)_{k=0}^N$ and $(y_k^y)_{k=0}^N$ are the solutions of (8) associated with the control $u^h$ and the initial states $x$ and $y$, respectively.

Proof. See the Appendix.

Let $u_c^h\in H^2_{\mathbb F^0}$ be defined as $u_c^h(t):=u_k$ for all $t\in[t_k,t_{k+1})$, and let $y$ be the associated continuous time state, i.e. the unique solution of

(20) $\mathrm dy(t)=f(t,y(t),u_c^h(t))\,\mathrm dt+\sigma(t,y(t),u_c^h(t))\,\mathrm dW(t),\quad t\in\,]0,T[\,,\qquad y(0)=x.$

The existence and uniqueness of the solution associated with $u_c^h$ is guaranteed by Remark 2.1. Now, we are going to analyse the relationship between $y$, the solution of (20) associated with $u_c^h$, and $y^h$, the solution of (8) associated with $u^h$. Note that $\|u_c^h\|_{H^2_{\mathbb F^0}}=\|u^h\|_{\mathcal U_0^h}$.

Lemma 2.4. Assume that (H1) holds true. Then, there exists $C>0$ such that for all $k=0,\dots,N-1$,

(21) $\mathbb E\left[\sup_{t_k\le t<t_{k+1}}|y(t)-y_k|^2\right]\le Ch\left[|x|^2+\|u^h\|^2_{\mathcal U_0^h}+1\right]+C\,\omega(h)^2.$

Proof. See the Appendix.
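Lemma 2.4 can be illustrated numerically. The sketch below (same illustrative data as before, and again not part of the paper) compares the discrete state of (8) with a fine-grid Euler approximation of (20) driven by the same Brownian path, which serves as a proxy for the continuous time state; for these time-independent, smooth example coefficients the observed maximal mean-square gap should shrink roughly linearly in h. All names and parameters are example choices.

```python
import numpy as np

# Illustration of Lemma 2.4: the coarse state of (8) versus a fine-grid Euler
# proxy for the continuous state (20), driven by the same Brownian increments.

T, x0 = 1.0, 1.0
f     = lambda t, y, u: -y + u       # example drift, as above
sigma = lambda t, y, u: 0.5 + 0.0 * y  # example diffusion, as above

def max_gap(N, refine=64, n_mc=2000, seed=0):
    """Estimate max_k E[sup_{t in [t_k, t_{k+1})} |y(t) - y_k|^2], with the sup
    approximated over the fine grid (refine points per coarse step)."""
    rng = np.random.default_rng(seed)
    h, hf = T / N, T / (N * refine)
    y_c = np.full(n_mc, x0)          # coarse state, scheme (8)
    y_f = np.full(n_mc, x0)          # fine-grid proxy for the continuous state
    worst = 0.0
    for k in range(N):
        tk, u = k * h, 0.0           # piecewise constant control u_c^h = 0
        gap = np.abs(y_f - y_c) ** 2
        dW_sum = np.zeros(n_mc)
        for i in range(refine):      # fine steps inside [t_k, t_{k+1})
            dW = rng.normal(0.0, np.sqrt(hf), n_mc)
            y_f = y_f + hf * f(tk + i * hf, y_f, u) + sigma(tk + i * hf, y_f, u) * dW
            dW_sum += dW
            gap = np.maximum(gap, np.abs(y_f - y_c) ** 2)
        y_c = y_c + h * f(tk, y_c, u) + sigma(tk, y_c, u) * dW_sum  # coarse step
        worst = max(worst, gap.mean())
    return worst

for N in (10, 20, 40, 80):           # the estimate should decay roughly like h
    print(N, max_gap(N))
```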

3. Discrete Time Dynamic Programming Principle

Throughout this section we consider $h=T/N$ fixed. Our goal in this section is to prove the DPP for problem $(P_{k,x}^h)$, parameterized by the initial discrete time $k$ and the initial state $x$. Besides its own importance, the DPP implies the existence of discrete time optimal feedback controls, and it is also useful to prove the convergence of the discrete value functions to the continuous one, as we will see in Section 5.

Our aim is to prove the following DPP for $\{V_k^h\,:\,k=0,\dots,N\}$:

(22) $V_k^h(x)=\inf_{u\in U_{ad}}\left\{h\ell(t_k,x,u)+\mathbb E\left[V_{k+1}^h\big(x+hf(t_k,x,u)+\sigma(t_k,x,u)\Delta W_{k+1}\big)\right]\right\},$

for all $k=0,\dots,N-1$, where we recall that $\{V_k^h\,:\,k=0,\dots,N\}$ was defined in (10). Note that (H2), the compactness of $U_{ad}$ and Lemma 2.2 imply that $V_k^h(x)\in\mathbb R$ for all $k=0,\dots,N$ and $x\in\mathbb R^n$.
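Before turning to the proof, we note that the recursion (22) can be read as an implementable backward algorithm once the state space is also discretized, which is the "second step" mentioned in the introduction. The following sketch (not from the paper) assumes $n=m=1$ and the same illustrative data as above: the Gaussian expectation is computed by Gauss-Hermite quadrature, $V_{k+1}^h$ is linearly interpolated on a truncated state grid, and the infimum over $U_{ad}$ is approximated on a finite control grid.

```python
import numpy as np

# Numerical companion to (22), under illustrative data and grids: backward
# recursion for V_k^h with Gauss-Hermite quadrature for E[.] and a truncated,
# interpolated state grid. All discretization choices are assumptions.

T, N = 1.0, 50
h = T / N
f     = lambda t, y, u: -y + u
sigma = lambda t, y, u: 0.5 + 0.0 * y
ell   = lambda t, y, u: y**2 + 0.1 * u**2
g     = lambda y: y**2

xs = np.linspace(-3.0, 3.0, 241)           # state grid (truncation of R^n)
us = np.linspace(-1.0, 1.0, 21)            # finite approximation of U_ad
# Gauss-Hermite nodes/weights: E[phi(Z)] ~ sum_i w_i phi(sqrt(2) z_i)/sqrt(pi)
z, w = np.polynomial.hermite.hermgauss(9)

V = g(xs)                                   # V_N^h = g
for k in range(N - 1, -1, -1):
    tk = k * h
    Vnew = np.empty_like(V)
    for i, x in enumerate(xs):
        best = np.inf
        for u in us:
            # next state x + h f + sigma * DW_{k+1}, with DW_{k+1} = sqrt(h) Z
            xnext = x + h * f(tk, x, u) + sigma(tk, x, u) * np.sqrt(2 * h) * z
            EV = np.dot(w, np.interp(xnext, xs, V)) / np.sqrt(np.pi)
            best = min(best, h * ell(tk, x, u) + EV)
        Vnew[i] = best
    V = Vnew

print(V[np.searchsorted(xs, 0.0)])          # approximate V_0^h(0)
```

This is only a numerical companion to (22); the convergence analysis of such fully discrete schemes (grid truncation and interpolation included) is outside the scope of this paper.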

We will need the following result, which proves that the discrete value function is Lipschitz continuous with respect to the state variable on bounded sets.

Lemma 3.1. Assume that (H1) and (H2) hold. Then, there exists a constant $C>0$ (independent of $h$) such that for all $x,y\in\mathbb R^n$ and for all $u\in\mathcal U_k^h$, we have

(23) $\left|J_k^h(x,u)-J_k^h(y,u)\right|\le C[|x|+|y|+1]\,|x-y|,\quad\forall k=0,\dots,N-1.$

As a consequence,

(24) $\left|V_k^h(x)-V_k^h(y)\right|\le C[|x|+|y|+1]\,|x-y|,\quad\forall k=0,\dots,N.$

Proof. For notational convenience, we omit the indexes $k$ and $u$ in the states. For fixed $k$ and $u\in\mathcal U_k^h$ we have

(25) $|J_k^h(x,u)-J_k^h(y,u)|\le\mathbb E\left[h\sum_{j=k}^{N-1}|\ell(t_j,y_j^x,u_j)-\ell(t_j,y_j^y,u_j)|+|g(y_N^x)-g(y_N^y)|\right].$

By (H2) we have, for all $k\le j\le N-1$,

(26) $\mathbb E\left[|\ell(t_j,y_j^x,u_j)-\ell(t_j,y_j^y,u_j)|\right]\le\mathbb E\left[\int_0^1|\ell_y(t_j,y_j^x+s(y_j^y-y_j^x),u_j)(y_j^y-y_j^x)|\,\mathrm ds\right]$
$\le\mathbb E\left[\int_0^1 L(1+|y_j^x|+s|y_j^y-y_j^x|+|u_j|)\,|y_j^y-y_j^x|\,\mathrm ds\right]$
$\le L\,\mathbb E\left[|y_j^y-y_j^x|+|y_j^x||y_j^y-y_j^x|+|y_j^y-y_j^x|^2+|u_j||y_j^y-y_j^x|\right].$

Since the set $U_{ad}$ is compact, it is bounded by a positive constant $M_U$, and so, by the Cauchy-Schwarz inequality, Lemma 2.2 and Lemma 2.3, there exists $C_0>0$ such that

(27) $\mathbb E\left[|y_j^y-y_j^x|\right]\le\left(\mathbb E[|y_j^y-y_j^x|^2]\right)^{1/2}\le C_0|x-y|,$
$\mathbb E\left[|y_j^x||y_j^y-y_j^x|\right]\le\left(\mathbb E[|y_j^x|^2]\right)^{1/2}\left(\mathbb E[|y_j^y-y_j^x|^2]\right)^{1/2}\le C_0\left[|x|^2+TM_U^2+1\right]^{1/2}|x-y|,$
$\mathbb E\left[|u_j||y_j^y-y_j^x|\right]\le\left(\mathbb E[|u_j|^2]\right)^{1/2}\left(\mathbb E[|y_j^y-y_j^x|^2]\right)^{1/2}\le C_0M_U|x-y|,$
$\mathbb E\left[|y_j^y-y_j^x|^2\right]\le C_0|x-y|^2\le C_0[|x|+|y|]\,|x-y|.$

We conclude that there exists $C_1>0$ such that

(28) $\mathbb E\left[|\ell(t_j,y_j^x,u_j)-\ell(t_j,y_j^y,u_j)|\right]\le C_1[|x|+|y|+1]\,|x-y|.$

Analogously, we can deduce that

(29) $\mathbb E\left[|g(y_N^x)-g(y_N^y)|\right]\le C_1[|x|+|y|+1]\,|x-y|.$

Since $C_1$ in (28) does not depend on $j$, we conclude that

(30) $|J_k^h(x,u)-J_k^h(y,u)|\le(T+1)C_1[|x|+|y|+1]\,|x-y|,$

and (23) follows. Since the set $\mathcal U_k^h$ is bounded, and the constant $C_1$ obtained is independent of $u\in\mathcal U_k^h$, relation (24) easily follows from the inequality

(31) $|V_k^h(x)-V_k^h(y)|\le\sup_{u\in\mathcal U_k^h}|J_k^h(x,u)-J_k^h(y,u)|.$

In order to prove the DPP (22), for all $x\in\mathbb R^n$ we define

(32) $S_k^h(x):=\inf_{u\in U_{ad}}\left\{h\ell(t_k,x,u)+\mathbb E\left[V_{k+1}^h(y_{k+1}^{k,x,u})\right]\right\},\quad\forall k=0,\dots,N-1,\qquad S_N^h(x):=g(x),$

where

(33) $y_{k+1}^{k,x,u}=x+hf(t_k,x,u)+\sigma(t_k,x,u)\Delta W_{k+1}.$

Lemma 3.2. For all $x\in\mathbb R^n$ and for all $k=0,\dots,N-1$, the map $u\in U_{ad}\mapsto\mathbb E[V_{k+1}^h(y_{k+1}^{k,x,u})]$ is Lipschitz continuous. As a consequence, $S_k^h(x)\in\mathbb R$ for all $x\in\mathbb R^n$, and there exists $u_{k,x}\in U_{ad}$ at which the minimum in the first relation in (32) is attained.

Proof. Let $u,v\in U_{ad}$; then, by (H1) and the Itô isometry,

(34) $\mathbb E\left[|y_{k+1}^{k,x,u}-y_{k+1}^{k,x,v}|^2\right]\le\mathbb E\left[|h(f(t_k,x,u)-f(t_k,x,v))+(\sigma(t_k,x,u)-\sigma(t_k,x,v))\Delta W_{k+1}|^2\right]$
$\le 2h^2\,\mathbb E\left[|f(t_k,x,u)-f(t_k,x,v)|^2\right]+2h\,\mathbb E\left[|\sigma(t_k,x,u)-\sigma(t_k,x,v)|^2\right]$
$\le 2h^2L^2|u-v|^2+2hL^2|u-v|^2.$

Thus, by Lemma 2.2, Lemma 3.1 and the Cauchy-Schwarz inequality, there exist $C_0>0$ and $C>0$ such that

(35) $\mathbb E\left[\left|V_{k+1}^h(y_{k+1}^{k,x,u})-V_{k+1}^h(y_{k+1}^{k,x,v})\right|\right]\le\mathbb E\left[C_0\left[|y_{k+1}^{k,x,u}|+|y_{k+1}^{k,x,v}|+1\right]|y_{k+1}^{k,x,u}-y_{k+1}^{k,x,v}|\right]$
$\le\left(\mathbb E\left[C_0^2\left[|y_{k+1}^{k,x,u}|+|y_{k+1}^{k,x,v}|+1\right]^2\right]\right)^{1/2}\left(\mathbb E\left[|y_{k+1}^{k,x,u}-y_{k+1}^{k,x,v}|^2\right]\right)^{1/2}$
$\le C[|x|+1]\,|u-v|.$

By (35), we conclude that $U_{ad}\ni u\mapsto\mathbb E[V_{k+1}^h(y_{k+1}^{k,x,u})]$ is Lipschitz continuous and, hence, using (H2) and the compactness of $U_{ad}$, we get that $S_k^h(x)$ is finite and the minimum in (32) is attained.

Lemma 3.3. Under assumptions (H1) and (H2), there exists $C>0$ (independent of $h$) such that for all $x,y\in\mathbb R^n$ and $k=0,\dots,N-1$,

(36) $\left|S_k^h(x)-S_k^h(y)\right|\le C[|x|+|y|+1]\,|x-y|.$

Proof. By the definition of $S_k^h$ and Lemma 3.1, there exists $C_0>0$ such that

(37) $|S_k^h(x)-S_k^h(y)|\le\sup_{u\in U_{ad}}\left\{h|\ell(t_k,x,u)-\ell(t_k,y,u)|+\mathbb E\left[|V_{k+1}^h(y_{k+1}^{k,x,u})-V_{k+1}^h(y_{k+1}^{k,y,u})|\right]\right\}$
$\le hC_0[|x|+|y|+1]\,|x-y|+\sup_{u\in U_{ad}}\mathbb E\left[C_0\left[|y_{k+1}^{k,x,u}|+|y_{k+1}^{k,y,u}|+1\right]|y_{k+1}^{k,x,u}-y_{k+1}^{k,y,u}|\right].$

By the Cauchy-Schwarz inequality, we obtain

(38) $\mathbb E\left[\left[|y_{k+1}^{k,x,u}|+|y_{k+1}^{k,y,u}|+1\right]|y_{k+1}^{k,x,u}-y_{k+1}^{k,y,u}|\right]\le\left(\mathbb E\left[\left[|y_{k+1}^{k,x,u}|+|y_{k+1}^{k,y,u}|+1\right]^2\right]\right)^{1/2}\left(\mathbb E\left[|y_{k+1}^{k,x,u}-y_{k+1}^{k,y,u}|^2\right]\right)^{1/2}.$

By Lemma 2.2 and Lemma 2.3, and since $U_{ad}$ is bounded, we conclude that there exists $C_1>0$, independent of $u$, such that

(39) $\left(\mathbb E\left[\left[|y_{k+1}^{k,x,u}|+|y_{k+1}^{k,y,u}|+1\right]^2\right]\right)^{1/2}\left(\mathbb E\left[|y_{k+1}^{k,x,u}-y_{k+1}^{k,y,u}|^2\right]\right)^{1/2}\le C_1[|x|+|y|+1]\,|x-y|.$

Combining (37) and (39), we deduce that (36) holds true.

Now we prove the DPP. Since $V_k^h$ is continuous, we can directly prove the result following the arguments in [5], without needing to embed our problem in the general framework of [3].

Theorem 3.4 (DPP). Under assumptions (H1) and (H2), for all $x\in\mathbb R^n$ we have

(40) $V_k^h(x)=S_k^h(x)\quad\forall k=0,\dots,N.$

Proof. We start by proving that $V_k^h(x)\ge S_k^h(x)$. Let $u=(u_k,\dots,u_{N-1})$ be any element of $\mathcal U_k^h$. By the definition of the set $\mathcal U_k^h$, we can write $u_j$, for $j\in\{k+1,\dots,N-1\}$, as a measurable function of the increments of the Brownian motion, i.e. $u_j(\Delta W_{k+1},\dots,\Delta W_j)$. For fixed $\Delta\omega_{k+1}\in\mathbb R^m$ and $j=k+1,\dots,N-1$, we define the maps

(41) $\hat u_j(\Delta\omega_{k+1}):\omega\in\Omega\mapsto u_j(\Delta\omega_{k+1},\Delta W_{k+2}(\omega),\dots,\Delta W_j(\omega)).$

Setting $\hat u(\Delta\omega_{k+1})=(\hat u_{k+1}(\Delta\omega_{k+1}),\dots,\hat u_{N-1}(\Delta\omega_{k+1}))$, we obtain $\hat u(\Delta\omega_{k+1})\in\mathcal U_{k+1}^h$. Then we can also define

(42) $\hat y_{k+1}(\Delta\omega_{k+1}):=x+hf(t_k,x,u_k)+\sigma(t_k,x,u_k)\Delta\omega_{k+1},$

which is deterministic, and, for $j=k+2,\dots,N$,

(43) $\hat y_j(\Delta\omega_{k+1}):\omega\in\Omega\mapsto y_j^{k+1,\hat y_{k+1}(\Delta\omega_{k+1}),\hat u(\Delta\omega_{k+1})}(\omega).$

Then, by the independence of the increments of the Brownian motion, we have

(44) $\mathbb E\left[h\sum_{j=k+1}^{N-1}\ell(t_j,y_j^{k,x,u},u_j)+g(y_N^{k,x,u})\right]=\int_{\mathbb R^m}\mathbb E\left[h\sum_{j=k+1}^{N-1}\ell(t_j,\hat y_j(\Delta\omega_{k+1}),\hat u_j(\Delta\omega_{k+1}))+g(\hat y_N(\Delta\omega_{k+1}))\right]\mathrm d\mathbb P_{\Delta W_{k+1}}(\Delta\omega_{k+1}),$

where $\mathbb P_{\Delta W_{k+1}}$ is the measure induced from $\mathbb P$ by $\Delta W_{k+1}$ on $\mathbb R^m$ (see [1, Section 4.13]). Thus,

(45) $\mathbb E\left[h\sum_{j=k+1}^{N-1}\ell(t_j,y_j^{k,x,u},u_j)+g(y_N^{k,x,u})\right]=\int_{\mathbb R^m}\left(h\ell(t_{k+1},\hat y_{k+1}(\Delta\omega_{k+1}),\hat u_{k+1}(\Delta\omega_{k+1}))+\mathbb E\left[h\sum_{j=k+2}^{N-1}\ell(t_j,\hat y_j(\Delta\omega_{k+1}),\hat u_j(\Delta\omega_{k+1}))+g(\hat y_N(\Delta\omega_{k+1}))\right]\right)\mathrm d\mathbb P_{\Delta W_{k+1}}(\Delta\omega_{k+1})$
$\ge\int_{\mathbb R^m}V_{k+1}^h(\hat y_{k+1}(\Delta\omega_{k+1}))\,\mathrm d\mathbb P_{\Delta W_{k+1}}(\Delta\omega_{k+1})$
$=\mathbb E\left[V_{k+1}^h\big(x+hf(t_k,x,u_k)+\sigma(t_k,x,u_k)\Delta W_{k+1}\big)\right].$

Since $u_k\in L^2_{\mathcal F_k^k}$ and $\mathcal F_k^k=\{\emptyset,\Omega\}$, for all $u=(u_k,\dots,u_{N-1})\in\mathcal U_k^h$ we have

(46) $h\ell(t_k,x,u_k)+\mathbb E\left[h\sum_{j=k+1}^{N-1}\ell(t_j,y_j^{k,x,u},u_j)+g(y_N^{k,x,u})\right]\ge h\ell(t_k,x,u_k)+\mathbb E\left[V_{k+1}^h\big(x+hf(t_k,x,u_k)+\sigma(t_k,x,u_k)\Delta W_{k+1}\big)\right]\ge S_k^h(x).$

Minimizing with respect to $u\in\mathcal U_k^h$ on the left-hand side, we deduce

(47) $V_k^h(x)\ge S_k^h(x).$

We next prove the converse inequality by an induction argument. It is clear from the definitions that

(48) $V_N^h(x)=S_N^h(x)\quad\text{and}\quad V_{N-1}^h(x)=S_{N-1}^h(x),\quad\forall x\in\mathbb R^n.$

Now, let $\varepsilon$ be a positive number and, for each $x\in\mathbb R^n$, let $\delta_x>0$ be such that $C[|x|+|y|+1]|x-y|<\varepsilon/2$ for all $y$ with $|x-y|<\delta_x$, where $C$ is the maximum of the constants given in Lemma 3.1 and Lemma 3.3. Then, for all $k=0,\dots,N$ and $u\in\mathcal U_k^h$,

(49) $\max\left\{|J_k^h(x,u)-J_k^h(y,u)|,\,|V_k^h(x)-V_k^h(y)|,\,|S_k^h(x)-S_k^h(y)|\right\}<\frac{\varepsilon}{2},\quad\forall y:|x-y|<\delta_x.$

Since $\mathbb R^n$ is a Lindelöf space, i.e. every open cover has a countable subcover, there exists a sequence $(\xi_i)_{i\in\mathbb N}\subset\mathbb R^n$ such that $\mathbb R^n=\bigcup_{i=1}^\infty B(\xi_i,\delta_{\xi_i})$. In order to obtain a disjoint union, we can define

(50) $\hat B_1:=B(\xi_1,\delta_{\xi_1}),\quad\text{and}\quad\hat B_k:=B(\xi_k,\delta_{\xi_k})\setminus\big(\cup_{j=1}^{k-1}\hat B_j\big),\quad\forall k>1.$

Let $k<N-1$, and assume that $V_n^h\equiv S_n^h$ for all $n=k+1,\dots,N$. Since $U_{ad}$ is compact, by Lemma 3.2, for $j=k,\dots,N-1$ there exists $u_j^i\in U_{ad}$ such that

(51) $h\ell(t_j,\xi_i,u_j^i)+\mathbb E\left[V_{j+1}^h\big(\xi_i+hf(t_j,\xi_i,u_j^i)+\sigma(t_j,\xi_i,u_j^i)\Delta W_{j+1}\big)\right]=S_j^h(\xi_i).$

We define the measurable function $u_j(x)=\sum_{i=1}^\infty u_j^i\chi_{\hat B_i}(x)$. Let $x\in\mathbb R^n$ and let $i$ be such that $x\in\hat B_i$ (see (50)). Then,

(52) $h\ell(t_j,x,u_j(x))+\mathbb E\left[V_{j+1}^h\big(x+hf(t_j,x,u_j(x))+\sigma(t_j,x,u_j(x))\Delta W_{j+1}\big)\right]$
$=h\ell(t_j,x,u_j^i)+\mathbb E\left[V_{j+1}^h\big(x+hf(t_j,x,u_j^i)+\sigma(t_j,x,u_j^i)\Delta W_{j+1}\big)\right]$
$\le\frac{\varepsilon}{2}+h\ell(t_j,\xi_i,u_j^i)+\mathbb E\left[V_{j+1}^h\big(\xi_i+hf(t_j,\xi_i,u_j^i)+\sigma(t_j,\xi_i,u_j^i)\Delta W_{j+1}\big)\right]$
$=\frac{\varepsilon}{2}+S_j^h(\xi_i)$
$\le S_j^h(x)+\varepsilon.$

Now, we fix $x\in\mathbb R^n$ and take $\bar u_k=u_k(x)\in U_{ad}$. We define inductively $\bar u_j:\Omega\to\mathbb R^r$, for all $k<j\le N-1$, as

(53) $\bar u_{j+1}(\omega):=u_{j+1}\Big(y_{j+1}^{k,x,(\bar u_k,\dots,\bar u_j)}(\omega)\Big),$

where $\big(y_r^{k,x,(\bar u_k,\dots,\bar u_j)}\big)_{r=k}^{j+1}$ satisfies the first equation in (8) for $\bar u_k,\dots,\bar u_j$, and $y_k^{k,x,(\bar u_k,\dots,\bar u_j)}=x$. Since the function $u_{j+1}$ is measurable and bounded, and $y_{j+1}^{k,x,(\bar u_k,\dots,\bar u_j)}$ is measurable with respect to $\mathcal F_{j+1}^k$, we obtain that $\bar u=(\bar u_k,\dots,\bar u_{N-1})\in\mathcal U_k^h$. Using the same ideas as in (44)-(45), and the assumption $V_j^h\equiv S_j^h$ for all $k<j\le N-1$, we get that

(54) $J_k^h(x,\bar u)=h\ell(t_k,x,\bar u_k)+\mathbb E\left[h\sum_{j=k+1}^{N-1}\ell\big(t_j,y_j^{k,x,\bar u},u_j(y_j^{k,x,\bar u})\big)+g(y_N^{k,x,\bar u})\right]$
$=h\ell(t_k,x,\bar u_k)+\mathbb E\left[h\sum_{j=k+1}^{N-1}\ell\big(t_j,y_j^{k,x,\bar u},u_j(y_j^{k,x,\bar u})\big)+V_N^h(y_N^{k,x,\bar u})\right]$
$\le h\ell(t_k,x,\bar u_k)+\mathbb E\left[h\sum_{j=k+1}^{N-2}\ell\big(t_j,y_j^{k,x,\bar u},u_j(y_j^{k,x,\bar u})\big)+S_{N-1}^h(y_{N-1}^{k,x,\bar u})\right]+\varepsilon$
$=h\ell(t_k,x,\bar u_k)+\mathbb E\left[h\sum_{j=k+1}^{N-2}\ell\big(t_j,y_j^{k,x,\bar u},u_j(y_j^{k,x,\bar u})\big)+V_{N-1}^h(y_{N-1}^{k,x,\bar u})\right]+\varepsilon$
$\le h\ell(t_k,x,\bar u_k)+\mathbb E\left[V_{k+1}^h(y_{k+1}^{k,x,\bar u})\right]+(N-1)\varepsilon$
$\le S_k^h(x)+N\varepsilon,$

where the inequality in the third line above is obtained by using (52). We conclude that

(55) $V_k^h(x)\le J_k^h(x,\bar u)\le S_k^h(x)+N\varepsilon.$

Since $\varepsilon>0$ is arbitrary, we obtain

(56) $V_k^h(x)\le S_k^h(x),$

from which the result follows.

The following remark will be used in the proof of the main result of Section 5.

Remark 3.5. Given $k\in\{0,\dots,N-1\}$, we introduce the following sets of controls:

(57) $\bar{\mathcal U}_k^h:=\left\{u\in\Pi_{j=k}^{N-1}L^2_{\mathcal F_j^0}\,:\,u_j\in U_{ad},\ \mathbb P\text{-a.s. }\forall j=k,\dots,N-1\right\},$
$\bar{\mathcal U}_k^1:=\left\{u\in\Pi_{j=k}^{N-1}L^2_{\mathcal F_{t_j}^0}\,:\,u_j\in U_{ad},\ \mathbb P\text{-a.s. }\forall j=k,\dots,N-1\right\},$
$\bar{\mathcal U}_k^2:=\left\{u\in\Pi_{j=k}^{N-1}L^2_{\mathcal F_{t_j}^{t_k}}\,:\,u_j\in U_{ad},\ \mathbb P\text{-a.s. }\forall j=k,\dots,N-1\right\},$

and the associated value functions:

(58) $\bar V_N^h(x):=g(x),\quad\bar V_k^h(x):=\inf_{u\in\bar{\mathcal U}_k^h}J_k^h(x,u),\quad k=0,\dots,N-1,$
$\bar V_N^{h,1}(x):=g(x),\quad\bar V_k^{h,1}(x):=\inf_{u\in\bar{\mathcal U}_k^1}J_k^h(x,u),\quad k=0,\dots,N-1,$
$\bar V_N^{h,2}(x):=g(x),\quad\bar V_k^{h,2}(x):=\inf_{u\in\bar{\mathcal U}_k^2}J_k^h(x,u),\quad k=0,\dots,N-1.$

We can observe that all the results of the current section, including the DPP, remain true if we deal with any of the sets in (57). Indeed, the proofs are based on the fact that the processes are adapted to the given filtration and that the increments of the Brownian motion are independent. Since $\bar V_N^h(x)=\bar V_N^{h,1}(x)=\bar V_N^{h,2}(x)=V_N^h(x)=g(x)$, and $\{\bar V_k^h\}$, $\{\bar V_k^{h,1}\}$, $\{\bar V_k^{h,2}\}$ and $\{V_k^h\}$ satisfy (40), we have

(59) $\bar V_k^h(x)=\bar V_k^{h,1}(x)=\bar V_k^{h,2}(x)=V_k^h(x),\quad\forall k=0,\dots,N.$

3.1. Feedback optimal control. The aim of this section is to prove that there exists an optimal feedback control for $(P_{k,x}^h)$. For notational convenience, for all $0\le k\le N-1$ we define the function $F_k:\mathbb R^n\times U_{ad}\to\mathbb R$ as

(60) $F_k(x,u):=h\ell(t_k,x,u)+\mathbb E\left[V_{k+1}^h\big(x+hf(t_k,x,u)+\sigma(t_k,x,u)\Delta W_{k+1}\big)\right].$

Then, by the DPP, for all $x\in\mathbb R^n$ and $k=0,\dots,N-1$, we have

(61) $V_k^h(x)=\inf_{u\in U_{ad}}F_k(x,u)\quad\text{and}\quad V_N^h(x)=g(x).$

Based on a measurable selection theorem due to Schäl [28, Theorem 5.3.1], we can prove the following result.

Proposition 3.6. Under the above assumptions, for all $k=0,\dots,N-1$ there exists a measurable function $\bar u_k:\mathbb R^n\to U_{ad}$ such that

(62) $F_k(x,\bar u_k(x))=V_k^h(x),$

for all $x\in\mathbb R^n$.

Proof. Arguing as in the proof of Lemma 3.3, it is easy to check that $(x,u)\mapsto F_k(x,u)$ is continuous. Since $U_{ad}$ is compact, we can apply [28, Theorem 5.3.1], and the result follows.

Remark 3.7. As a corollary of the previous results and the DPP, in this discrete framework we always have a discrete time feedback (also called Markov) optimal control. Indeed, the sequence of measurable functions $\bar u_0,\dots,\bar u_{N-1}$ given by the previous proposition defines the optimal control $\bar u=(\bar u_0(y_0),\bar u_1(y_1),\dots,\bar u_{N-1}(y_{N-1}))$, where $(y_0,\dots,y_N)$ is defined recursively as

(63) $y_{k+1}=y_k+hf(t_k,y_k,\bar u_k(y_k))+\sigma(t_k,y_k,\bar u_k(y_k))\Delta W_{k+1}\quad\forall k=0,\dots,N-1,\qquad y_0=x.$

Let us point out an interesting phenomenon, not underlined enough in the literature, which shows the power of the DPP. In the continuous time case it is well known (see e.g. [13, Chapter VI] and [14, Chapters 3 and 4]) that if the Hamilton-Jacobi-Bellman equation associated with the stochastic control problem, which is a consequence of the DPP in continuous time, admits a solution $v$ which is regular enough, then we can construct an optimal feedback control. This is known as a verification result and, under standard assumptions, usually holds when $\sigma$ does not depend on $u$ and, setting $a=\sigma\sigma^\top$, we have that

$\sum_{1\le i,j\le n}a_{i,j}(x,t)\xi_i\xi_j\ge c|\xi|^2\quad\forall\xi\in\mathbb R^n,$

for some $c>0$. In particular, if we fix $(t,x)\in[0,T[\times\mathbb R^n$, we get the existence of an optimal feedback policy for the problem associated with $V(t,x)$. The main feature of this analysis is that the existence of an optimum is obtained without some of the usual convexity assumptions required in the strong formulation (see e.g. [30, Chapter 2, Section 5.2]). On the other hand, as we have just seen, for the problem obtained by discretizing the time variable we always have the existence of a feedback control without any extra assumption. This is still an infinite dimensional problem, for which existence does not follow by standard methods, but it is a consequence of the DPP.
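In the same illustrative setting, the feedback construction of Proposition 3.6 and Remark 3.7 can be mimicked numerically: during the backward pass one stores, at each grid node, a minimizer of (61), and then simulates the controlled chain (63) forward. The names xs, us, z, w, N, h, f, sigma, ell, g below refer to the DPP sketch above; the per-node tabulated argmin and the nearest-node lookup are implementation choices, not part of the paper.

```python
import numpy as np

# Backward pass keeping V_k^h and an (approximate) minimizer of (61) on the grid.
V_table = np.empty((N + 1, xs.size))
U_table = np.empty((N, xs.size))
V_table[N] = g(xs)
for k in range(N - 1, -1, -1):
    tk = k * h
    for i, x in enumerate(xs):
        vals = []
        for u in us:
            xnext = x + h * f(tk, x, u) + sigma(tk, x, u) * np.sqrt(2 * h) * z
            vals.append(h * ell(tk, x, u)
                        + np.dot(w, np.interp(xnext, xs, V_table[k + 1])) / np.sqrt(np.pi))
        j = int(np.argmin(vals))
        V_table[k, i], U_table[k, i] = vals[j], us[j]

def feedback(k, y):
    """Approximate feedback map u_k: nearest-node lookup of the tabulated argmin."""
    return U_table[k, np.abs(xs[:, None] - y[None, :]).argmin(axis=0)]

# Forward simulation of the controlled chain (63).
rng = np.random.default_rng(1)
y = np.full(5000, 0.0)
cost = np.zeros_like(y)
for k in range(N):
    u = feedback(k, y)
    cost += h * ell(k * h, y, u)
    y = y + h * f(k * h, y, u) + sigma(k * h, y, u) * rng.normal(0.0, np.sqrt(h), y.size)
print(cost.mean() + g(y).mean(), "vs V_0^h(0) ~", V_table[0, np.searchsorted(xs, 0.0)])
```

Up to grid, quadrature and Monte Carlo errors, the simulated average cost of the feedback policy should match the computed value $V_0^h$ at the initial point, which is the content of (62)-(63) in this toy example.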

4. Other regularity properties of the value function

In this section we prove some regularity properties of the value functions of the continuous and discrete problems. Some of them will be used in the proof of our main result in the next section, and the others are interesting in themselves.

In the following result we show the local (in space) Hölder continuity in time of $V$, as well as an analogous result for its discrete counterpart $\{V_k^h\,;\,k=0,\dots,N\}$ defined in (10). The former result is classical; see e.g. [29, Section 3.4] and [30, Chapter 4, Proposition 3.1]. However, for the sake of completeness, we prove here a version adapted to our assumptions.

In the statement of the following result we use the right-hand side of (2) (respectively (9)) to extend $J^{s,x}(\cdot)$ (respectively $J_k^h(x,\cdot)$) to $\mathcal U_{ad}^0$ (respectively $\mathcal U_0^h$).

Theorem 4.1. Under assumptions (H1) and (H2), there exists $C>0$ (independent of $h$) such that for all $x\in\mathbb R^n$, $u\in\mathcal U_{ad}^0$ and $s,t\in[0,T]$,

(64) $|J^{s,x}(u)-J^{t,x}(u)|\le C[1+|x|^2]\,|s-t|^{1/2},$

and for all $u\in\mathcal U_0^h$ and $r,k=0,\dots,N$,

(65) $|J_r^h(x,u)-J_k^h(x,u)|\le C[1+|x|^2]\,|k-r|^{1/2}h^{1/2}.$

As a consequence,

(66) $|V(s,x)-V(t,x)|\le C[1+|x|^2]\,|s-t|^{1/2}\quad\forall x\in\mathbb R^n,\ s,t\in[0,T],$

and

(67) $|V_r^h(x)-V_k^h(x)|\le C[1+|x|^2]\,|k-r|^{1/2}h^{1/2}\quad\forall x\in\mathbb R^n,\ r,k=0,\dots,N.$
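As a quick empirical sanity check of (67) in the illustrative example above (reusing V_table, xs and h from the feedback sketch), one can monitor the ratio between $|V_{k+1}^h(x)-V_k^h(x)|$ and $[1+|x|^2]h^{1/2}$ on the interior of the state grid; the estimate predicts that it stays bounded as $h\to0$. The interior slice is an arbitrary choice meant to avoid grid-truncation artifacts.

```python
import numpy as np

# Empirical look at the time regularity (67) for consecutive time levels
# (r = k + 1), using V_table, xs, h from the feedback sketch above.
interior = slice(40, -40)                            # avoid truncation artifacts
num = np.abs(np.diff(V_table, axis=0))[:, interior]  # |V_{k+1}^h - V_k^h|
den = np.sqrt(h) * (1.0 + xs[interior] ** 2)         # [1 + |x|^2] h^{1/2}
print("max ratio:", (num / den).max())
```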

Proof. First of all, note that for all $s\in[0,T]$ and $x\in\mathbb R^n$ we have

(68) $V(s,x)=\inf_{u\in\mathcal U_{ad}^0}J^{s,x}(u).$

Indeed, it is clear that $V(s,x)\ge\inf_{u\in\mathcal U_{ad}^0}J^{s,x}(u)$. On the other hand, if $u\in\mathcal U_{ad}^0$, then for all $s\le t\le T$ the random variable $u(t)$ is $\mathcal F_t^0$-measurable, and so there exists a measurable map $u_t\big((\omega_r)_{0\le r\le s},(\omega_r-\omega_s)_{s\le r\le t}\big)$ such that $u(t,\omega)=u_t\big((W_r(\omega))_{0\le r\le s},(W_r(\omega)-W_s(\omega))_{s\le r\le t}\big)$, $\mathbb P$-a.s. ([1]). Then, if we fix $(\omega_r)_{0\le r\le s}$, we can define the function

(69) $\hat u_t\big((\omega_r)_{0\le r\le s}\big):\omega\in\Omega\mapsto u_t\big((\omega_r)_{0\le r\le s},(W_r(\omega)-W_s(\omega))_{s\le r\le t}\big),$

which is $\mathcal F_t^s$-measurable. By the independence of the increments of the Brownian motion, we obtain the converse inequality (see e.g. [5, Remark 5.2]).

Without loss of generality, assume that $0\le s\le t\le T$. We consider a fixed initial state $x$ and a control $u\in\mathcal U_{ad}^0$. For simplicity we denote $y^s:=y_u^{s,x}$, $y^t:=y_u^{t,x}$ and, for $t\le r\le T$ and $\varphi=f,\sigma$,

(70) $\Delta y(r):=y^s(r)-y^t(r)\quad\text{and}\quad\Delta\varphi(r):=\varphi(r,y^s(r),u(r))-\varphi(r,y^t(r),u(r)).$

Then we obtain, for $t\le r\le T$,

(71) $\Delta y(r)=\int_s^t f(\tau,y^s(\tau),u(\tau))\,\mathrm d\tau+\int_s^t\sigma(\tau,y^s(\tau),u(\tau))\,\mathrm dW(\tau)+\int_t^r\Delta f(\tau)\,\mathrm d\tau+\int_t^r\Delta\sigma(\tau)\,\mathrm dW(\tau).$

By assumption (H1), the Cauchy-Schwarz inequality and the Itô isometry, we have

(72) $\mathbb E\left[|\Delta y(r)|^2\right]\le4[|s-t|+1]\int_s^t3L^2\big(1+\mathbb E[|y^s(\tau)|^2]+\mathbb E[|u(\tau)|^2]\big)\,\mathrm d\tau+4[|s-t|+1]\int_t^rL^2\,\mathbb E[|\Delta y(\tau)|^2]\,\mathrm d\tau.$
