
Decomposition of High Dimensional Aggregative Stochastic Control Problems


Academic year: 2021


HAL Id: hal-02917014

https://hal.inria.fr/hal-02917014v3

Preprint submitted on 17 Sep 2020 (v3), last revised 20 Oct 2021 (v4)

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Adrien Seguret, Clémence Alasseur, J Frédéric Bonnans, Antonio de Paola, Nadia Oudjane, Vincenzo Trovato

To cite this version:

Adrien Seguret, Clémence Alasseur, J Frédéric Bonnans, Antonio de Paola, Nadia Oudjane, et al. Decomposition of High Dimensional Aggregative Stochastic Control Problems. 2020. ⟨hal-02917014v3⟩

ADRIEN SEGURET†, CLÉMENCE ALASSEUR‡, J. FRÉDÉRIC BONNANS§, ANTONIO DE PAOLA¶, NADIA OUDJANE‖, AND VINCENZO TROVATO#

Abstract. We consider the framework of high dimensional stochastic control problems in which the controls are aggregated in the cost function. As a first contribution, we introduce a modified problem whose optimal control is, under some reasonable assumptions, an ε-optimal solution of the original problem. As a second contribution, we present a decentralized algorithm whose convergence to the solution of the modified problem is established. Finally, we study the application to a problem of coordination of energy production and consumption of domestic appliances.

Key words. Stochastic optimization, Lagrangian decomposition, Uzawa’s algorithm, stochastic gradient.

AMS subject classifications. 93E20, 65K10, 90C25, 90C39, 90C15.

1. Introduction. The present article aims at solving a high dimensional stochastic control problem $(P_1)$ involving a large number $n$ of agents indexed by $i \in \{1, \dots, n\}$, of the form:

(1.1) $(P_1)$
$$\inf_{u \in \mathcal{U}} J(u), \qquad J(u) := \mathbb{E}\Big( F_0\Big(\frac{1}{n}\sum_{i=1}^n u^i\Big) + \frac{1}{n}\sum_{i=1}^n F_i(u^i, X^{i,u^i}) \Big).$$

The state $X^{i,u^i}$ of each agent is driven by an independent Brownian motion $W^i$ (no common noise), so that potential interactions between the agents' dynamics are only due to the non-anticipative controls $u^i$, which are supposed to be progressively measurable w.r.t. the Brownian noise $W = (W^i)_{i \in \{1,\dots,n\}}$. We emphasize the specific structure of this problem: the cost function is the sum of terms $F_i$ that are additively separable across agents and of a coupling term $F_0$ that is a function of the aggregate strategy $\frac{1}{n}\sum_{i=1}^n u^i$.

1.1. Motivations. This work is motivated by its potential applications for large-scale coordination of flexible appliances, to support power system operation in a context of increasing penetration of renewables. One type of appliance that has been consistently investigated in the last few years, for its intrinsic flexibility and potential for network support, comprises thermostatically controlled loads (TCLs) such as refrigerators or air conditioners. Several papers have already investigated the potential of dynamic demand control and frequency response services of TCLs [21] and how the population recovers from significant perturbations [3]. The coordination of TCLs can be performed in a centralized way, as in [8]. However, this approach raises challenging problems in terms of communication requirements and customer privacy. A common objective can be reached with a fully distributed approach, as in [25], where each TCL is able to calculate its own actions (ON/OFF switching) to pursue a common objective. This paper is related to the work of De Paola et al. [4], where each agent represents a flexible TCL device. In [4] a distributed solution is presented for the operation of a population of $n = 2 \times 10^7$ refrigerators providing frequency support and load shifting. The authors adopt a game-theoretic framework, modelling the TCLs as price-responsive rational agents that schedule their energy consumption and allocate their frequency response provision in order to minimize their operational costs. The potential practical application of our work also considers a large population of TCLs which, contrary to [4], have stochastic dynamics. The proposed approach is able to minimize the overall system costs in a distributed way, with each TCL determining its optimal power consumption profile in response to price signals.

Submitted to the editors September 17, 2020.

Funding: The first, second, third and fifth authors thank the FiME Lab (Institut Europlace de Finance). The third author was supported by the PGMO project “Optimal control of conservation equations”, itself supported by iCODE (IDEX Paris-Saclay) and the Hadamard Mathematics LabEx.

† OSIRIS department, EDF Lab, Paris-Saclay, France ([email protected]).

‡ OSIRIS department, EDF Lab, Paris-Saclay, France ([email protected]).

§ CMAP, Ecole polytechnique, 91128 Palaiseau, France, and Inria-Saclay ([email protected]).

¶ Center for Sustainable Power Distribution, University of Bath, Bath, UK ([email protected]).

‖ OSIRIS department, EDF Lab, Paris-Saclay, France ([email protected]).

# Department of Electrical and Electronic Engineering, Imperial College London, London, UK ([email protected]).

1.2. Related literature. The considered problem belongs to the class of stochastic control problems: one looks for strategies minimizing the expectation of an objective function under specific constraints. One of the main approaches proposed in the literature to tackle this problem is to use random trees: this consists in replacing the almost sure constraints, induced by non-anticipativity, by a finite number of constraints to get a finite set of scenarios (see [9] and [19]). Once the tree structure is built, the problem is solved by different decomposition methods such as scenario decomposition [18] or dynamic splitting [20]. The main objective of the scenario method is to reduce the problem to an approximate deterministic one. The present paper focuses on high dimensional noise problems with a large number of time steps, for which this approach is not feasible.

The idea of reducing a single high dimensional problem to a large number of low dimensional ones has been widely studied in the deterministic case. In deterministic and stochastic problems, a possibility is to use time decomposition thanks to the Dynamic Programming Principle [1], taking advantage of the Markov property of the system. However, this method requires a specific time structure of the cost function and fails when applied to problems for which the state space dimension is greater than five. One can deal with the curse of dimensionality, under continuous linear-convex assumptions, by using the Stochastic Dual Dynamic Programming algorithm (SDDP) [15] to get upper and lower bounds of the value function, using polyhedral approximations. Though the almost-sure convergence of a broad class of SDDP algorithms has been proved [17], there is no guarantee on the speed of convergence and there is no good stopping test. In [14], a stopping criterion based on a dual version of SDDP, which gives a deterministic upper bound for the primal problem, is proposed. SDDP is well adapted to medium sized population problems ($n \le 30$), whereas it fails for problems of a magnitude similar to that of the present paper ($n > 1000$).

It is natural for this type of high dimensional problem to investigate decomposition techniques in the spirit of Dual Approximate Dynamic Programming (DADP). DADP has been developed in PhD theses (see [7], [12]). This approach is characterized by a price decomposition of the problem, where the stochastic constraints are projected on subspaces such that the associated Lagrange multiplier is adapted for dynamic programming. Then the optimal multiplier is estimated by implementing Uzawa’s algorithm. To this end, in [12], Uzawa’s algorithm, formulated in a Hilbert setting, is extended to a Banach space. DADP has been applied in different cases, such as a storage management problem for electrical production in [7, Chapter 4] and hydro valley management [2]. In the present paper, in the same vein as DADP, we propose a price decomposition approach restricted to deterministic prices. This new approach takes advantage of the large population size in order to introduce an auxiliary problem where the coupling term is purely deterministic.

1.3. Contributions. We consider the following approximation of problem $(P_1)$:

(1.2) $(P_2)$
$$\inf_{u \in \mathcal{U}} \tilde J(u), \qquad \tilde J(u) := F_0\Big(\frac{1}{n}\sum_{i=1}^n \mathbb{E}(u^i)\Big) + \frac{1}{n}\,\mathbb{E}\Big(\sum_{i=1}^n F_i(u^i, X^{i,u^i})\Big).$$

As a first contribution, this paper shows that, under some convexity and regularity assumptions on $F_0$ and $(F_i)_{i \in \{1,\dots,n\}}$, any solution of problem $(P_2)$ is an $\varepsilon_n$-solution of $(P_1)$, with $\varepsilon_n \to 0$ when $n \to \infty$. In addition, a price decomposition approach is easier for $(P_2)$ than for $(P_1)$, since the Lagrange multiplier is deterministic for $(P_2)$, whereas it is stochastic for $(P_1)$. Since computing the dual cost of $(P_2)$ is expensive, we propose the Stochastic Uzawa and Sampled Stochastic Uzawa algorithms, which rely on the Robbins-Monro algorithm in the spirit of the stochastic gradient. Convergence is established. We check the effectiveness of the Stochastic Uzawa algorithm in a linear quadratic Gaussian framework, and we apply the Sampled Stochastic Uzawa algorithm to a model of a power system, inspired by the work of A. De Paola et al. [4].

2. General framework. Let $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ be a complete filtered probability space on which $W = (W^i)_{i=1,\dots,n}$ is an $n$-dimensional Brownian motion, such that for any $t \in [0,T]$ and $i \in \{1,\dots,n\}$, $W^i_t$ takes values in $\mathbb{R}$, and which generates the filtration $\mathbb{F} = (\mathcal{F}_t)_{0 \le t \le T}$. $\mathbb{P}$ stands for the Wiener measure associated with this filtration and $\mathbb{F}$ for the filtration augmented by all $\mathbb{P}$-null sets.

The following notations are used:

$$\mathbb{X} := \Big\{\varphi : \Omega \to C([0,T],\mathbb{R}) \;\Big|\; \varphi(\cdot) \text{ is } \mathbb{F}\text{-adapted},\ \|\varphi\|_{\infty,2} := \mathbb{E}\Big(\sup_{s \in [0,T]} |\varphi(s)|^2\Big)^{1/2} < \infty\Big\},$$

$$L^2(0,T) := \Big\{\varphi : [0,T] \to \mathbb{R} \;\Big|\; \int_0^T |\varphi(t)|^2\,dt < \infty\Big\},$$

$$\mathbb{U} := \Big\{\varphi : [0,T] \times \Omega \to \mathbb{R} \;\Big|\; \varphi(\cdot) \text{ is } \mathbb{F}\text{-prog. measurable},\ \mathbb{E}\int_0^T |\varphi(t)|^2\,dt < \infty\Big\},$$

and for any $i \in \{1,\dots,n\}$, the feasible set of controls is defined by:

(2.1)
$$\mathcal{U}_i := \big\{v \in \mathbb{U} \text{ and } v_t(\omega) \in [-M_i, M_i] \text{ for a.a. } (t,\omega) \in [0,T] \times \Omega\big\},$$

and we set $M := \max_{i \in \{1,\dots,n\}} M_i$. The set of admissible controls is $\mathcal{U} := \mathcal{U}_1 \times \dots \times \mathcal{U}_n$, whose elements are denoted by $u := (u^1, \dots, u^n)$.

Each local agent $i = 1,\dots,n$ is supposed to control its state variable through the control process $u^i \in \mathcal{U}_i$ and is subject to independent uncertainties. More specifically, the state process of each agent, $X^{i,u^i} = (X^{i,u^i}_t)_{t \in [0,T]}$, for $i = 1,\dots,n$, takes values in $\mathbb{R}$ and follows the dynamics

(2.2)
$$\begin{cases} dX^{i,u^i}_t = \mu_i(t, u^i_t, X^{i,u^i}_t)\,dt + \sigma_i(t, X^{i,u^i}_t)\,dW^i_t, & \text{for } t \in [0,T],\ i \in \{1,\dots,n\}, \\ X^{i,u^i}_0 = x^i_0 \in \mathbb{R}. \end{cases}$$

Without loss of generality, the initial states $x^i_0$ are supposed to be deterministic. The process $X^i$ is $\mathbb{F}$-progressively measurable. For all $i$, $\mathcal{F}^i$ stands for the natural filtration of the Brownian motion $W^i$.
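To make the dynamics (2.2) concrete, the following sketch simulates one agent's state with an Euler-Maruyama discretization. It is only an illustration under stated assumptions: the drift `mu`, the diffusion `sigma`, the feedback rule `control` and all numerical values are hypothetical choices that do not come from the paper, and the time discretization itself is an assumption (the paper works in continuous time).

```python
import math
import random


def simulate_state(mu, sigma, control, x0, T, steps, rng):
    """Euler-Maruyama discretization of dX = mu(t, u, X) dt + sigma(t, X) dW."""
    dt = T / steps
    x = x0
    path = [x0]
    for k in range(steps):
        t = k * dt
        # Non-anticipative control: it only sees the path up to the current time.
        u = control(t, path)
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over [t, t + dt]
        x = x + mu(t, u, x) * dt + sigma(t, x) * dw
        path.append(x)
    return path


# Hypothetical coefficients: linear drift, constant volatility, and a
# bang-bang control taking values in [-M, M].
M = 1.0
mu = lambda t, u, x: -0.5 * x + u
sigma = lambda t, x: 0.2
control = lambda t, path: -M if path[-1] > 0 else M

path = simulate_state(mu, sigma, control, x0=0.3, T=1.0, steps=200,
                      rng=random.Random(0))
```

Each agent would run such a simulation with its own independent noise source, which mirrors the absence of common noise in the model.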

2.1. On the well-posedness of $(P_1)$. In this section, the assumptions needed for $(P_1)$ to be well posed are studied.

Assumption 2.1. For any $i \in \{1,\dots,n\}$, the functions $\mu_i$ and $\sigma_i$ are continuous w.r.t. $(u,x)$ uniformly in $t$. In addition there exists $K_i > 0$ such that, for any $t \in [0,T]$ and $\nu \in [-M,M]$:

(2.3)
$$\begin{aligned} |\mu_i(t,\nu,x) - \mu_i(t,\nu,y)| + |\sigma_i(t,\nu,x) - \sigma_i(t,\nu,y)| &\le K_i |x - y|, \\ |\mu_i(t,\nu,x)| + |\sigma_i(t,\nu,x)| &\le K_i (1 + |x|), \end{aligned} \qquad \text{for any } x, y \in \mathbb{R}.$$

Lemma 2.2. Let $i \in \{1,\dots,n\}$ and let $v \in \mathcal{U}_i$ be a control process. If Assumption 2.1 holds, then there exists a unique process $X^{i,v} \in \mathbb{X}$ satisfying (2.2) (in the strong sense) such that for any $p \in [1,\infty)$:

(2.4)
$$\mathbb{E}\Big(\sup_{0 \le t \le T} |X^{i,v}_t|^p\Big) < C(p, T, x_0, K) < \infty.$$

Proof. The proof of the existence and uniqueness of a solution of (2.2) relies on [13, Theorem 3.6, Chapter 2]. The inequality is a result of [13, Theorem 4.4, Chapter 2].

Let $F_0 : L^2(0,T) \to \bar{\mathbb{R}}$ and $F_i : L^2(0,T) \times C([0,T],\mathbb{R}) \to \bar{\mathbb{R}}$ be proper and lower semicontinuous functions, and suppose that there exists $\hat u \in \mathcal{U}$ such that:

(2.5)
$$\mathbb{E}\Big(F_0\Big(\frac{1}{n}\sum_{i=1}^n \hat u^i\Big)\Big) < \infty.$$

Define $G_i : L^2(0,T) \to \bar{\mathbb{R}}$ by $G_i(u^i) = F_i(u^i, X^{i,u^i})$ for any $u^i \in \mathcal{U}_i$. Additional assumptions are formulated below.

Assumption 2.3. For any $i \in \{1,\dots,n\}$:

(i) $u^i \mapsto \mathbb{E}(G_i(u^i))$ is strictly convex.

(ii) there exists a positive integer $p$ such that $F_i$ has $p$-polynomial growth, i.e. there exists $K > 0$ such that for any $x^i \in C([0,T],\mathbb{R})$ and $u^i \in L^2(0,T)$:
$$|F_i(u^i, x^i)| \le K\Big(1 + \sup_{0 \le t \le T} |x^i_t|^p\Big).$$

Assumption 2.3.(i) holds in different cases, as in the example below.

Example 2.4. For any $i \in \{1,\dots,n\}$, there exist $g_i : L^2(0,T) \to \mathbb{R}$ and $h_i : C[0,T] \to \mathbb{R}$ such that for any $(v,X) \in L^2(0,T) \times C[0,T]$, $F_i(v,X) = g_i(v) + h_i(X)$, and there exist five $L^\infty([0,T])$ scalar functions $\alpha_i$, $\beta_i$, $\gamma_i$, $\xi_i$ and $\theta_i$ such that for any $(t,\nu,x) \in [0,T] \times [-M,M] \times \mathbb{R}$:

(2.6)
$$\mu_i(t,\nu,x) = \alpha_i(t)\nu + \beta_i(t)x + \gamma_i(t) \quad \text{and} \quad \sigma_i(t,x) = \xi_i(t)x + \theta_i(t).$$

Then Assumption 2.3.(i) is satisfied if either of the following holds:

(i) $g_i$ is strictly convex and $h_i$ is convex.

(ii) for a.e. $t \in [0,T]$, $\alpha_i(t) \ne 0$, $g_i$ is convex and $h_i$ is strictly convex.

Indeed, for any $i \in \{1,\dots,n\}$, $u, v \in \mathcal{U}_i$, $\delta \in [0,1]$ and $t \in [0,T]$, it holds from (2.6) that $X^{i,\delta u + (1-\delta)v}_t = \delta X^{i,u}_t + (1-\delta) X^{i,v}_t$. If point (i) holds, then:

(2.7)
$$h_i(X^{i,\delta u + (1-\delta)v}) \le \delta h_i(X^{i,u}) + (1-\delta) h_i(X^{i,v}).$$

Assumption 2.3.(i) follows from (2.7) and the strict convexity of $g_i$. Similarly, if point (ii) holds, then the inequality in (2.7) is strict, and Assumption 2.3.(i) follows using also the convexity of $g_i$.
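The key step of Example 2.4 is that affine dynamics map a convex combination of controls to the same convex combination of states along a fixed Brownian path. The sketch below checks this numerically on an Euler discretization. Everything in it is an assumption made for illustration: the coefficients are taken constant in time, their values are arbitrary, and the controls are random open-loop sequences.

```python
import math
import random


def euler_affine(control, noise, x0, dt, a, b, c, xi, th):
    """Euler scheme for dX = (a*u + b*X + c) dt + (xi*X + th) dW on a fixed noise path."""
    x = x0
    out = [x0]
    for u, dw in zip(control, noise):
        x = x + (a * u + b * x + c) * dt + (xi * x + th) * dw
        out.append(x)
    return out


rng = random.Random(42)
steps, dt = 100, 0.01
noise = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(steps)]  # shared Brownian increments
u = [rng.uniform(-1, 1) for _ in range(steps)]
v = [rng.uniform(-1, 1) for _ in range(steps)]
delta = 0.3
params = dict(x0=1.0, dt=dt, a=0.8, b=-0.5, c=0.1, xi=0.3, th=0.2)

# State driven by the mixed control versus the mixture of the two states.
mix = [delta * p + (1 - delta) * q for p, q in zip(u, v)]
x_mix = euler_affine(mix, noise, **params)
x_u = euler_affine(u, noise, **params)
x_v = euler_affine(v, noise, **params)
gap = max(abs(m - (delta * p + (1 - delta) * q))
          for m, p, q in zip(x_mix, x_u, x_v))
```

Here `gap` is zero up to floating point rounding, because each Euler step is affine in the pair (state, control) and the weights `delta` and `1 - delta` sum to one.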

Remark 2.5. If, for any $i \in \{1,\dots,n\}$, Assumption 2.3.(i) holds, then $G_i$ is weakly lower semicontinuous (w.l.s.c.) on $\mathcal{U}_i$. Indeed, $G_i$ is convex, finite valued and bounded on bounded subsets of $L^2(0,T)$ (from the polynomial growth of $F_i$ and inequality (2.4)); thus $G_i$ is continuous on $\mathcal{U}_i$, and a convex continuous function is w.l.s.c.

From now on, Assumptions 2.1 and 2.3 are in force. The following lemma ensures the well-posedness of $(P_1)$.

Lemma 2.6. Suppose that $F_0$ is convex. Then $J$ reaches its minimum over $\mathcal{U}$ at a unique point.

Proof. Clearly the control $\hat u \in \mathcal{U}$ defined in (2.5) is feasible. The existence and uniqueness of a minimum is proved by considering a minimizing sequence $\{u^k\}$ of $J$ over $\mathcal{U}$. The set $\mathcal{U}$ being bounded and weakly closed, there exists a subsequence $\{u^{k_\ell}\}$ which weakly converges to a certain $u^* \in \mathcal{U}$. Using Assumptions 2.3.(i)(ii) and the convexity of $F_0$, it follows that $\liminf J(u^{k_\ell}) \ge J(u^*)$ and thus $u^*$ is a solution of $(P_1)$. The uniqueness is due to the strict convexity of $u^i \mapsto \mathbb{E}(F_i(u^i, X^{i,u^i}))$.

Remark 2.7. This kind of stochastic optimization problem is illustrated in Section 7 with a problem of coordination of a large population of domestic appliances, where a system operator has to meet the demand while producing at low cost. The state $X^i_t$ can represent, for instance, the temperature or the battery level of agent $i$ at time $t$, and $u^i_t$ its own power generation or consumption. $F_0$ can be interpreted as the cost of satisfying the demand and, for any $i$, $F_i$ as the cost connected to the proper functioning of the TCLs (characterized by individual cost functions, comfort constraints, etc.).

3. Approximating the optimization problem. In this section, the link between problems $(P_1)$ and $(P_2)$ is analyzed.

Assumption 3.1. Problem $(P_2)$ admits a unique solution.

Notice that, by using the same techniques as for Lemma 2.6, one can prove that the above assumption is satisfied when $F_0$ is convex.

We have the following key result.

Theorem 3.2. Under Assumption 3.1, $\tilde J$ reaches its minimum over $\mathcal{U}$ at a unique point $\tilde u \in \mathcal{U}$ such that, for any $i$, $\tilde u^i$ is $\mathcal{F}^i$-adapted, and thus for any $j \ne i$, $\tilde u^i$ and $\tilde u^j$ are mutually independent.

Proof. Fix $i \in \{1,\dots,n\}$. Using the convexity and l.s.c. properties of $u^i \mapsto \mathbb{E}(G_i(u^i))$ and Jensen’s inequality, we get:

(3.1)
$$\mathbb{E}\big(G_i(u^i) \,\big|\, W^i\big) \ge G_i\big(\mathbb{E}(u^i \,|\, W^i)\big).$$

On the other hand, $(u^1,\dots,u^n) \mapsto F_0\big(\frac{1}{n}\sum_{i=1}^n \mathbb{E}(u^i)\big)$ is invariant when taking the conditional expectation, thus
$$F_0\Big(\frac{1}{n}\sum_{i=1}^n \mathbb{E}(u^i)\Big) = F_0\Big(\frac{1}{n}\sum_{i=1}^n \mathbb{E}\big(\mathbb{E}(u^i \,|\, W^i)\big)\Big).$$
From Assumption 3.1, we know that there exists a solution $u^*$ to $(P_2)$. We set $\tilde u := \big(\mathbb{E}(u^{*1}|W^1), \mathbb{E}(u^{*2}|W^2), \dots, \mathbb{E}(u^{*n}|W^n)\big)$. For any $i$, $\tilde u^i := \mathbb{E}(u^{*i}|W^i)$ is $\mathcal{F}^i$-adapted. Using the definition of $u^*$ and (3.1), one can derive that
$$\inf_{u \in \mathcal{U}} \tilde J(u) = \tilde J(u^*) \ge \tilde J(\tilde u).$$
Hence $\tilde u$ is also a minimizer and, by uniqueness of the solution, $\tilde u = u^*$.

Let $\hat{\mathcal{U}}$ be the subset of $\mathcal{U}$ associated with decentralized controls, in the sense that:

(3.2)
$$\hat{\mathcal{U}} := \{u \in \mathcal{U} \mid u^i \text{ is } \mathcal{F}^i\text{-adapted for all } i \in \{1,\dots,n\}\}.$$

From Theorem 3.2, if Assumption 3.1 holds, then:

(3.3)
$$\min_{u \in \hat{\mathcal{U}}} \tilde J(u) = \min_{u \in \mathcal{U}} \tilde J(u).$$

Remark 3.3. If Assumption 3.1 is not satisfied, we can prove by the same arguments that for any $\varepsilon > 0$ there exists an $\varepsilon$-optimal solution such that the individual controls are mutually independent.

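Lemma 3.4. If $F_0$ is Lipschitz with constant $\gamma$, then an optimal solution in $\hat{\mathcal{U}}$ of problem $(P_2)$ is an $\varepsilon$-optimal solution in $\hat{\mathcal{U}}$ of problem $(P_1)$, with $\varepsilon = 2\gamma M \sqrt{T/n}$.

Proof. By assumption, there exists a number $\gamma > 0$ such that for all $x, y \in H_1$ we have $|F_0(x) - F_0(y)| < \gamma \|x - y\|_{H_1}$. For any $u \in \mathcal{U}$ we set:

(3.4)
$$\hat u^i := u^i - \mathbb{E}(u^i).$$

Using the Jensen and Hölder inequalities, $\mathbb{E}|Y| \le (\mathbb{E}|Y|^2)^{1/2}$, the fact that for any $j \ne i$, $u^i$ and $u^j$ are mutually independent, and that $u^i$ is bounded by $M$, we have for all $u \in \hat{\mathcal{U}}$:

(3.5)
$$\begin{aligned} \Big|\mathbb{E}\Big(F_0\Big(\frac{1}{n}\sum_{i=1}^n u^i\Big)\Big) - F_0\Big(\frac{1}{n}\sum_{i=1}^n \mathbb{E}(u^i)\Big)\Big| &\le \mathbb{E}\Big|F_0\Big(\frac{1}{n}\sum_{i=1}^n u^i\Big) - F_0\Big(\frac{1}{n}\sum_{i=1}^n \mathbb{E}(u^i)\Big)\Big| \\ &\le \frac{\gamma}{n}\,\mathbb{E}\Big(\Big\|\sum_{i=1}^n \hat u^i\Big\|_{L^2(0,T)}\Big) \\ &\le \frac{\gamma}{n}\Big(\mathbb{E}\Big\|\sum_{i=1}^n \hat u^i\Big\|^2_{L^2(0,T)}\Big)^{1/2} \\ &= \frac{\gamma}{n}\Big(\int_0^T \operatorname{Var}\Big(\sum_{i=1}^n u^i_t\Big)\,dt\Big)^{1/2} \\ &\le \gamma\, n^{-1/2} M \sqrt{T}. \end{aligned}$$

Let $\tilde u^*$ denote a minimizer of $\tilde J$ on $\hat{\mathcal{U}}$. Then, using (3.5), for any $u \in \hat{\mathcal{U}}$ it holds:

(3.6)
$$J(\tilde u^*) \le \tilde J(\tilde u^*) + \gamma\, n^{-1/2} M \sqrt{T} \le \tilde J(u) + \gamma\, n^{-1/2} M \sqrt{T} \le J(u) + 2\gamma\, n^{-1/2} M \sqrt{T}.$$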
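The last two steps of (3.5) rest on a variance computation that the proof leaves implicit. Written out, and assuming as in the proof that the controls $u^i$ are mutually independent with $|u^i_t| \le M$ (so that $\operatorname{Var}(u^i_t) \le M^2$), it reads:

```latex
\mathbb{E}\Big\|\sum_{i=1}^n \hat u^i\Big\|_{L^2(0,T)}^2
  = \int_0^T \operatorname{Var}\Big(\sum_{i=1}^n u^i_t\Big)\,dt
  = \int_0^T \sum_{i=1}^n \operatorname{Var}(u^i_t)\,dt
  \le n\,T\,M^2,
\qquad\text{hence}\qquad
\frac{\gamma}{n}\Big(\mathbb{E}\Big\|\sum_{i=1}^n \hat u^i\Big\|_{L^2(0,T)}^2\Big)^{1/2}
  \le \frac{\gamma}{n}\sqrt{n\,T\,M^2}
  = \gamma M \sqrt{T/n}.
```

The first equality uses Fubini's theorem together with $\mathbb{E}(\hat u^i_t) = 0$, and the second uses independence, which makes the variance of the sum equal to the sum of the variances.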

Assumption 3.5. $F_0$ is Gâteaux differentiable with $c$-Lipschitz derivative.

Theorem 3.6. Suppose $F_0$ is convex. Then the following $\varepsilon$-optimality results hold:

(i) For any $u \in \mathcal{U}$, $\tilde J(u) \le J(u)$.

(ii) Suppose Assumption 3.5 holds. Then any optimal solution of problem $(P_2)$ is an $\varepsilon$-optimal solution of problem $(P_1)$, where $\varepsilon = 2cTM^2/n$.

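Proof. Proof of point (i). By Jensen’s inequality, we have that:

(3.7)
$$F_0\Big(\frac{1}{n}\sum_{i=1}^n \mathbb{E}(u^i)\Big) \le \mathbb{E}\Big(F_0\Big(\frac{1}{n}\sum_{i=1}^n u^i\Big)\Big), \qquad \forall u \in \mathcal{U},$$

which gives the result.

Proof of point (ii). Since $F_0$ is convex and differentiable with a $c$-Lipschitz differential, one can derive, for any $u \in \hat{\mathcal{U}}$ and a.s.:

(3.8)
$$\begin{aligned} F_0\Big(\frac{1}{n}\sum_{i=1}^n u^i\Big) - F_0\Big(\frac{1}{n}\sum_{i=1}^n \mathbb{E}[u^i]\Big) &\le \frac{1}{n}\Big\langle \nabla F_0\Big(\frac{1}{n}\sum_{i=1}^n u^i\Big), \sum_{i=1}^n \hat u^i \Big\rangle_{L^2(0,T)} \\ &= \frac{1}{n}\Big\langle \nabla F_0\Big(\frac{1}{n}\sum_{i=1}^n u^i\Big) - \nabla F_0\Big(\frac{1}{n}\sum_{i=1}^n \mathbb{E}[u^i]\Big), \sum_{i=1}^n \hat u^i \Big\rangle_{L^2(0,T)} \\ &\quad + \frac{1}{n}\Big\langle \nabla F_0\Big(\frac{1}{n}\sum_{i=1}^n \mathbb{E}(u^i)\Big), \sum_{i=1}^n \hat u^i \Big\rangle_{L^2(0,T)} \\ &\le \frac{c}{n^2}\Big\|\sum_{i=1}^n \hat u^i\Big\|^2_{L^2(0,T)} + \frac{1}{n}\Big\langle \nabla F_0\Big(\frac{1}{n}\sum_{i=1}^n \mathbb{E}(u^i)\Big), \sum_{i=1}^n \hat u^i \Big\rangle_{L^2(0,T)}, \end{aligned}$$

where $\hat u^i$ is defined in (3.4). Taking the expectation of (3.8), noting that
$$\mathbb{E}\Big\langle \nabla F_0\Big(\frac{1}{n}\sum_{i=1}^n \mathbb{E}[u^i]\Big), \sum_{i=1}^n \hat u^i \Big\rangle_{L^2(0,T)} = 0,$$
and using the mutual independence of the controls and their boundedness, we get as in (3.5):

(3.9)
$$\frac{c}{n^2}\,\mathbb{E}\Big\|\sum_{i=1}^n \hat u^i\Big\|^2_{L^2(0,T)} = \frac{c}{n^2}\int_0^T \sum_{i=1}^n \operatorname{Var}(u^i_t)\,dt \le \frac{c}{n}\,T M^2.$$

Let $\tilde u^*$ denote a minimizer of $\tilde J$ on $\hat{\mathcal{U}}$. Then, using (3.3), (3.9) and (3.7), for any $u' \in \mathcal{U}$ we have:

(3.10)
$$J(\tilde u^*) \le \tilde J(\tilde u^*) + \frac{c}{n}\,T M^2 \le \tilde J(u') + \frac{c}{n}\,T M^2 \le J(u') + \frac{2c}{n}\,T M^2.$$

Thus, for $\varepsilon = 2cTM^2/n$, $\tilde u^*$ constitutes an $\varepsilon$-optimal solution to the stochastic control problem $(P_1)$.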
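The $1/n$ rate in Theorem 3.6 can be observed on a deliberately simple toy case. The sketch below is not taken from the paper: it assumes a static scalar setting with $T = 1$, the quadratic coupling $F_0(x) = x^2$ (whose derivative is 2-Lipschitz, so $c = 2$), and hypothetical controls drawn independently and uniformly in $[-M, M]$. In that case the gap $\mathbb{E}\,F_0(\frac{1}{n}\sum u^i) - F_0(\frac{1}{n}\sum \mathbb{E}\,u^i)$ is the variance of the average, namely $M^2/(3n)$, which sits below the bound $cTM^2/n$ of (3.9).

```python
import random


def aggregate_gap(n, M, samples, seed=0):
    """Monte Carlo estimate of E[F0(mean u)] - F0(mean E[u]) for F0(x) = x**2,
    with u^1..u^n independent and uniform on [-M, M] (so E[u^i] = 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        avg = sum(rng.uniform(-M, M) for _ in range(n)) / n
        total += avg ** 2
    # F0(mean E[u]) = F0(0) = 0, so the average of avg**2 is the gap itself.
    return total / samples


M = 1.0
for n in (10, 100):
    estimate = aggregate_gap(n, M, samples=5000)
    exact = M ** 2 / (3 * n)   # variance of the average of n uniforms on [-M, M]
    bound = 2 * M ** 2 / n     # c * T * M**2 / n with c = 2, T = 1
    print(n, estimate, exact, bound)
```

Multiplying the population size by ten divides the gap by roughly ten, which is the behaviour the theorem predicts for independent bounded controls.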

Proposition 3.7. If $F_0$ is convex, then we have the following inequalities:

(3.11)
$$J(\tilde u) - \tilde J(\tilde u) \ge J(\tilde u) - J(u^*) \ge 0,$$

where $\tilde u$ and $u^*$ are respectively the optimal controls of problems $(P_2)$ and $(P_1)$.

Proof. From Jensen's inequality and by definition of $\tilde u$ we have:

(3.12)
$$J(u^*) \ge \tilde J(u^*) \ge \tilde J(\tilde u).$$

Hence $-\tilde J(\tilde u) \ge -J(u^*)$, and adding $J(\tilde u)$ to both sides gives the first inequality in (3.11). The second inequality holds because $u^*$ minimizes $J$.

Remark 3.8. An approximation scheme to compute $\tilde u$ is provided in Section 5. The practical interest of inequality (3.11) is that one can compute an upper bound for the error $J(\tilde u) - J(u^*)$, which can be automatically derived from this approximation.

4. Dualization and Decentralization of problem $(P_2)$. From now on, in addition to Assumptions 2.1 and 2.3, the assumption that $F_0$ is convex is in force. Problem $(P_2)$ defined in (1.2) is dualized in order to decouple the controls.

The optimization problem $(P_2)$ is equivalent to:

(4.1) $(P_3)$
$$\min_{u \in \mathcal{U},\, v \in \mathcal{V}} \bar J(u,v), \qquad \bar J(u,v) := F_0(v) + \frac{1}{n}\,\mathbb{E}\Big(\sum_{i=1}^n F_i(u^i, X^{i,u^i})\Big), \qquad \text{s.t. } g(u,v) = 0,$$

where $g(u,v) := \frac{1}{n}\sum_{i=1}^n \mathbb{E}(u^i) - v$ and $\mathcal{V} := \{v \in L^2(0,T) : |v(t)| \le 2M \ \forall t \in [0,T]\}$.

The Lagrangian function associated with the constrained optimization problem $(P_3)$ is $\mathcal{L} : \mathcal{U} \times L^2(0,T) \times L^2(0,T) \to \bar{\mathbb{R}}$, defined by:

(4.2)
$$\mathcal{L}(u,v,\lambda) := \bar J(u,v) + \Big\langle \lambda, \frac{1}{n}\sum_{i=1}^n \mathbb{E}(u^i) - v \Big\rangle_{L^2(0,T)}.$$

The dual problem $(D)$ associated with $(P_3)$ is:

(4.3) $(D)$
$$\max_{\lambda \in L^2(0,T)} \mathcal{W}(\lambda), \quad \text{where } \mathcal{W}(\lambda) := \min_{u \in \mathcal{U},\, v \in \mathcal{V}} \mathcal{L}(u,v,\lambda).$$

The problem is said to be qualified if it is still feasible after a small perturbation of the constraint, in the following sense:

(4.4) There exists $\varepsilon > 0$ such that $B_{L^2(0,T)}(0,\varepsilon) \subset g(\mathcal{U}, \mathcal{V})$,

where $B_{L^2(0,T)}(0,\varepsilon)$ is the open ball of radius $\varepsilon$ in $L^2(0,T)$ and $g$ has been defined in (4.1).

Lemma 4.1. Problem $(P_3)$ is qualified.

Proof. Choose $\varepsilon := M$. Then $B_{L^2(0,T)}(0,\varepsilon) \subset B_{L^2(0,T)}(0,2M) = g(0,\mathcal{V}) \subset g(\mathcal{U},\mathcal{V})$, where $g(0,\mathcal{V})$ and $g(\mathcal{U},\mathcal{V})$ are respectively the images by $g$ of $\{0\} \times \mathcal{V}$ and $\mathcal{U} \times \mathcal{V}$. The conclusion follows.

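By Assumption 2.3, Lemma 4.1 and the convexity of $F_0$, strong duality holds: $\mathcal{W}(\lambda^*) = \tilde J(u^*)$, where $\lambda^* \in \arg\max_{\lambda \in L^2(0,T)} \mathcal{W}(\lambda)$ and $u^* \in \arg\min_{u \in \mathcal{U},\, v \in \mathcal{V}} \mathcal{L}(u, v, \lambda^*)$.

Since the set of admissible controls $\mathcal{U} = \mathcal{U}_1 \times \dots \times \mathcal{U}_n$ is a Cartesian product, and by strict convexity of $u^i \mapsto \mathbb{E}(F_i(u^i, X^{i,u^i}))$, each component $u^{*i}$ can be uniquely determined by solving the following subproblem:

$$u^{*i} = \arg\min_{u^i \in \hat{\mathcal{U}}_i} \Big\{ \mathbb{E}\Big( F_i(u^i, X^{i,u^i}) + \langle \lambda^*, u^i \rangle_{L^2(0,T)} \Big) \Big\},$$

where $\hat{\mathcal{U}}_i := \{u^i \in \mathcal{U}_i \mid u^i \text{ is } \mathcal{F}^i\text{-adapted}\}$.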

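Remark 4.2. By using the same argument as in Theorem 3.2, one can prove:

(4.5)
$$\min_{u^i \in \hat{\mathcal{U}}_i} \Big\{ \mathbb{E}\Big( F_i(u^i, X^{i,u^i}) + \langle \lambda^*, u^i \rangle_{L^2(0,T)} \Big) \Big\} = \min_{u^i \in \mathcal{U}_i} \Big\{ \mathbb{E}\Big( F_i(u^i, X^{i,u^i}) + \langle \lambda^*, u^i \rangle_{L^2(0,T)} \Big) \Big\}.$$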
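The price decomposition of this section can be checked by hand on a toy instance. The sketch below is an illustration only, under assumptions that are not in the paper: a static, deterministic, scalar problem with $F_0(v) = (v - d)^2/2$ and $F_i(u) = (u - a_i)^2/2$, and bounds on $u$ and $v$ assumed inactive. The names `a`, `d`, `dual_value` and `primal_value` are hypothetical. For a given price `lam`, each agent solves its own subproblem independently, and the dual function has the closed form $\lambda(\bar a - d) - \lambda^2$, maximized at $\lambda^* = (\bar a - d)/2$.

```python
def dual_value(lam, a, d):
    """Dual function for F0(v) = (v-d)^2/2 and F_i(u) = (u-a_i)^2/2, bounds inactive."""
    v = d + lam                      # minimizer of F0(v) - lam * v
    us = [ai - lam for ai in a]      # each agent minimizes F_i(u) + lam * u on its own
    n = len(a)
    local = sum(0.5 * (u - ai) ** 2 + lam * u for u, ai in zip(us, a)) / n
    return 0.5 * (v - d) ** 2 - lam * v + local


def primal_value(us, a, d):
    """Cost of the aggregative problem: coupling term on the mean plus local terms."""
    n = len(a)
    mean_u = sum(us) / n
    local = sum(0.5 * (u - ai) ** 2 for u, ai in zip(us, a)) / n
    return 0.5 * (mean_u - d) ** 2 + local


a = [0.2, 0.5, 1.1, 1.4]                  # hypothetical individual targets
d = 0.4                                   # hypothetical aggregate target
lam_star = (sum(a) / len(a) - d) / 2      # maximizer of the closed-form dual
u_star = [ai - lam_star for ai in a]      # decentralized responses to the optimal price
```

At `lam_star` the dual and primal values coincide, which is the strong duality stated above, and the aggregate constraint holds because the mean of `u_star` equals `d + lam_star`.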

5. Stochastic Uzawa and Sampled Stochastic Uzawa algorithms.

5.1. Continuous time setting. We recall that Assumptions 2.1 and 2.3 are in force, as well as the convexity of $F_0$. This section aims at proposing an algorithm to find a solution of the dual problem (4.3).

For all $i \in \{1,\dots,n\}$ and $\lambda \in L^2(0,T)$, we define the optimal control $u^i(\lambda)$:

(5.1)
$$u^i(\lambda) := \arg\min_{u^i \in \hat{\mathcal{U}}_i} \Big\{ \mathbb{E}\Big( F_i(u^i, X^{i,u^i}) + \langle \lambda, u^i \rangle_{L^2(0,T)} \Big) \Big\},$$

which is well defined since $u^i \mapsto \mathbb{E}(F_i(u^i, X^{i,u^i}))$ is strictly convex.

For any $\lambda \in L^2(0,T)$, the subset $V(\lambda)$ is defined by:

(5.2)
$$V(\lambda) := \arg\min_{v \in \mathcal{V}} \big\{ F_0(v) - \langle \lambda, v \rangle_{L^2(0,T)} \big\}.$$

Since $F_0$ is convex and $\mathcal{V}$ is bounded, $V(\lambda)$ is a nonempty subset of $\mathcal{V}$ and is reduced to a singleton if $F_0$ is strictly convex. To any $\lambda \in L^2(0,T)$ we associate a function $v(\lambda)$, which is a selection of $V(\lambda)$.

Uzawa’s algorithm seems particularly well suited to this problem. However, at each dual iteration $k$ and for any $i \in \{1,\dots,n\}$, the update of $\lambda^{k+1}$ would require computing the quantities $\mathbb{E}[u^i(\lambda^k)]$, which is hard in practice. Therefore two algorithms are proposed where, at each iteration $k$, $\lambda^{k+1}$ is updated thanks to a realization of $u^i(\lambda^k)$.

For any real valued function $F$ defined on $L^2(0,T)$, $F^*$ stands for its Fenchel conjugate.

Lemma 5.1. Assumption 3.5 holds iff $F_0^*$ is proper and strongly convex.

Proof. (i) Let Assumption 3.5 hold. Since $F_0$ is proper, convex and l.s.c., $F_0^*$ is l.s.c. and proper. From the Lipschitz property of the gradient of $F_0$, it holds that $\operatorname{dom}(F_0) = L^2(0,T)$.

Let $s, \tilde s \in \operatorname{dom}(F_0^*)$ be such that there exist $\lambda_s \in \partial F_0^*(s)$ and $\mu_{\tilde s} \in \partial F_0^*(\tilde s)$. From the differentiability, l.s.c. and convexity of $F_0$, it follows that $s = \nabla F_0(\lambda_s)$ and $\tilde s = \nabla F_0(\mu_{\tilde s})$. By Assumption 3.5 and the extended Baillon-Haddad theorem [16, Theorem 3.1], $\nabla F_0$ is cocoercive. In other words:

(5.3)
$$\begin{aligned} \langle s - \tilde s, \lambda_s - \mu_{\tilde s} \rangle_{L^2(0,T)} &= \langle \nabla F_0(\lambda_s) - \nabla F_0(\mu_{\tilde s}), \lambda_s - \mu_{\tilde s} \rangle_{L^2(0,T)} \\ &\ge \frac{1}{c}\,\|\nabla F_0(\lambda_s) - \nabla F_0(\mu_{\tilde s})\|^2_{L^2(0,T)} = \frac{1}{c}\,\|s - \tilde s\|^2_{L^2(0,T)}. \end{aligned}$$

Therefore $\partial F_0^*$ is strongly monotone, which implies the strong convexity of $F_0^*$.

(ii) Conversely, assume that $F_0^*$ is proper and strongly convex. Then there exist $\alpha, \beta > 0$ such that for any $s \in \operatorname{dom}(F_0^*)$: $F_0^*(s) \ge \alpha \|s\|^2_{L^2(0,T)} - \beta$, and $F_0$ being convex, l.s.c. and proper, for any $\lambda \in L^2(0,T)$ it holds:

(5.4)
$$F_0(\lambda) \le \sup_{s \in L^2(0,T)} \Big( \langle s, \lambda \rangle_{L^2(0,T)} - \alpha \|s\|^2_{L^2(0,T)} \Big) + \beta = \frac{\|\lambda\|^2_{L^2(0,T)}}{4\alpha} + \beta.$$

Thus $F_0$ is proper and uniformly upper bounded over bounded sets and therefore is locally Lipschitz. In addition, from the strong convexity of $F_0^*$ and the convexity of $F_0$, for any $\lambda \in L^2(0,T)$, $\partial F_0(\lambda)$ is a singleton. Thus $F_0$ is everywhere Gâteaux differentiable.

Let $\lambda, \mu \in L^2(0,T)$. Since $F_0^*$ is strongly convex, the function $s \mapsto F_0^*(s) - \langle \lambda, s \rangle_{L^2(0,T)}$ (resp. $s \mapsto F_0^*(s) - \langle \mu, s \rangle_{L^2(0,T)}$) has a unique minimum point $s_\lambda$ (resp. $s_\mu$), characterized by $\lambda \in \partial F_0^*(s_\lambda)$ and $\mu \in \partial F_0^*(s_\mu)$. From the strong convexity of $F_0^*$, the strong monotonicity of $\partial F_0^*$ holds:
$$\langle \mu - \lambda, s_\mu - s_\lambda \rangle_{L^2(0,T)} \ge \frac{1}{c}\,\|s_\mu - s_\lambda\|^2_{L^2(0,T)},$$
where $c > 0$ is a constant related to the strong convexity of $F_0^*$. Using that $s_\lambda = \nabla F_0(\lambda)$ and $s_\mu = \nabla F_0(\mu)$, it holds:

(5.5)
$$\langle \mu - \lambda, \nabla F_0(\mu) - \nabla F_0(\lambda) \rangle_{L^2(0,T)} \ge \frac{1}{c}\,\|\nabla F_0(\mu) - \nabla F_0(\lambda)\|^2_{L^2(0,T)},$$

meaning that $\nabla F_0$ is cocoercive. Applying the Cauchy-Schwarz inequality to the left hand side of the previous inequality, the Lipschitz property of $\nabla F_0$ follows.

Lemma 5.2. If Assumption 3.5 holds, then $\mathcal{W}$ is strongly concave.

Proof. For any $\lambda \in L^2(0,T)$, we have $\mathcal{W}(\lambda) = -F_0^*(\lambda) + \frac{1}{n}\sum_{i=1}^n F_i^*(\lambda)$, where for any $i \in \{1,\dots,n\}$, $F_i^*$ is convex and, from Lemma 5.1, $F_0^*$ is strongly convex.

We introduce the function $f : L^2(0,T) \to L^2(0,T)$ where, for any $\lambda \in L^2(0,T)$:

(5.6)
$$f(\lambda) := \frac{1}{n}\sum_{i=1}^n \mathbb{E}(u^i(\lambda)) - v(\lambda).$$

From the boundedness of $\mathcal{U}$ and $\mathcal{V}$, it easily follows that there exists a finite positive real $M_1$ such that for any $\lambda \in L^2(0,T)$:

(5.7)
$$\|f(\lambda)\|^2_{L^2(0,T)} \le M_1.$$

For any $\lambda \in L^2(0,T)$, we denote by $\partial(-\mathcal{W}(\lambda))$ the subgradient of $-\mathcal{W}$ at $\lambda$. Therefore, for any $\lambda \in L^2(0,T)$:

(5.8)
$$\partial(-\mathcal{W}(\lambda)) \ni -f(\lambda).$$

The iterative algorithm, proposed as an approximation scheme for $\lambda^* \in \arg\max_\lambda \mathcal{W}(\lambda)$, is summarized in the Stochastic Uzawa Algorithm 5.1.

Algorithm 5.1 Stochastic Uzawa

1: Initialization $\lambda^0 \in L^2(0,T)$, set $\{\rho_k\}$ satisfying Assumption 5.4.

2: $k \leftarrow 0$.

3: for $k = 0, 1, \dots$ do

4: $v^k \leftarrow v(\lambda^k)$ where $v(\lambda^k) \in V(\lambda^k)$, this set being defined in (5.2).

5: $u^{i,k} \leftarrow u^i(\lambda^k)$ where $u^i(\lambda^k)$ is defined in (5.1) for any $i \in \{1,\dots,n\}$.

6: Generate $n$ independent realizations of Brownian motions $(W^{1,k+1}, \dots, W^{n,k+1})$, independent also of $\{W^{i,p} : 1 \le i \le n,\ p \le k\}$.

7: Compute the associated state realizations $(X^{1,u^1(\lambda^k)}, \dots, X^{n,u^n(\lambda^k)})$.

8: $Y^{k+1} \leftarrow \frac{1}{n}\sum_{i=1}^n u^i(\lambda^k)(W^{i,k+1}) - v(\lambda^k)$.

9: $\lambda^{k+1} \leftarrow \lambda^k + \rho_k Y^{k+1}$.
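The update loop of Algorithm 5.1 can be sketched on a scalar, one-period toy problem in which every step has a closed form. This is only an illustration under assumptions that are not in the paper: agent $i$ observes a noise $\xi^i \sim N(0,1)$ and has local cost $(u - \xi^i)^2/2$, the coupling cost is $F_0(v) = (v - d)^2/2$, and the function and parameter names are hypothetical. With these choices the local solution is $\xi^i - \lambda$ clipped to $[-M, M]$, and $v(\lambda)$ is $d + \lambda$ clipped to $[-2M, 2M]$.

```python
import random


def clip(x, lo, hi):
    """Project x onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def stochastic_uzawa(n, d, M, iterations, a=1.0, b=2.0, seed=0):
    """Toy scalar version of the Stochastic Uzawa iteration; returns the final multiplier."""
    rng = random.Random(seed)
    lam = 0.0
    for k in range(iterations):
        rho = a / (b + k)                     # step size of the form a / (b + k)
        v = clip(d + lam, -2 * M, 2 * M)      # step 4: minimizer of F0(v) - lam * v
        total = 0.0
        for _ in range(n):
            xi = rng.gauss(0.0, 1.0)          # step 6: fresh independent noise per agent
            total += clip(xi - lam, -M, M)    # steps 5 and 8: realized local control
        y = total / n - v                     # step 8: noisy dual ascent direction
        lam += rho * y                        # step 9: multiplier update
    return lam


lam = stochastic_uzawa(n=50, d=1.0, M=10.0, iterations=2000)
```

When the bounds are inactive, the expected direction is $-2\lambda - d$, so the iterates should settle near $\lambda^* = -d/2$. Only the realized controls enter the update, never their expectations, which is the point of the stochastic variant.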

Remark 5.3. For the purpose of notation, $u^i(\lambda^k)(W^{i,k+1})$ in step 8 corresponds to the realization of $u^i(\lambda^k)$ resulting from a realization of the Brownian motion $W^{i,k+1}$.

At any dual iteration $k$ of Algorithm 5.1, $Y^{k+1}$ is an estimator of $\mathbb{E}\big(\frac{1}{n}\sum_{i=1}^n u^i(\lambda^k)(W^{i,k+1}) - v(\lambda^k)\big)$. Therefore an alternative approach, proposed in the Sampled Stochastic Uzawa Algorithm 5.2, consists in performing fewer simulations at each iteration, by taking $m < n$, at the risk of performing more dual iterations, to estimate the same quantity.

Algorithm 5.2 Sampled Stochastic Uzawa
1: Initialization of $m$, a positive integer, and $\check\lambda^0 \in L^2(0,T)$; set $\{\rho_k\}$ satisfying Assumption 5.4.
2: $k \leftarrow 0$.
3: for $k = 0, 1, \ldots$ do
4:     $v^k \leftarrow v(\check\lambda^k)$, where $v(\check\lambda^k) \in V(\check\lambda^k)$, this set being defined in (5.2).
5:     Generate $m$ i.i.d. discrete random variables $I_1^k, \ldots, I_m^k$ uniformly in $\{1, \ldots, n\}$.
6:     $u^{I_j^k,k} \leftarrow u^{I_j^k}(\check\lambda^k)$, where $u^{I_j^k}(\check\lambda^k)$ is defined in (5.1), for any $j \in \{1, \ldots, m\}$.
7:     Generate $m$ independent realizations of Brownian motions $(W^{I_1^k,k+1}, \ldots, W^{I_m^k,k+1})$, independent also of $\{W^{i,p} : 1 \le i \le m,\ p \le k\}$.
8:     Compute the associated state realizations $(X^{I_1^k,\,u^{I_1^k}(\check\lambda^k)}, \ldots, X^{I_m^k,\,u^{I_m^k}(\check\lambda^k)})$.
9:     $\check Y^{k+1} \leftarrow \frac{1}{m}\sum_{j=1}^{m} u^{I_j^k}(\check\lambda^k)(W^{I_j^k,k+1}) - v(\check\lambda^k)$.
10:    $\check\lambda^{k+1} \leftarrow \check\lambda^k + \rho_k \check Y^{k+1}$.
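The sampling step of Algorithm 5.2 can be illustrated in the same spirit. The following is a minimal sketch under assumptions that are not from the paper: a time grid of `T` steps replaces $L^2(0,T)$, the responses `u = a[idx] - lam + noise` and `v = lam / c` are hypothetical closed forms standing in for (5.1) and (5.2), and the step size is fixed to $\rho_k = 1/(1+k)$. The function name `sampled_stochastic_uzawa` is illustrative.

```python
import numpy as np


def sampled_stochastic_uzawa(a, c, m, n_iter, rng):
    """Toy sampled dual ascent using only m simulated agents per iteration.

    Returns the final multiplier and the total number of simulations m * n_iter.
    """
    n, T = a.shape
    lam = np.zeros(T)                           # initial multiplier
    for k in range(n_iter):
        v = lam / c                             # toy stand-in for v(lam^k)
        idx = rng.integers(0, n, size=m)        # I_1^k, ..., I_m^k, i.i.d. uniform
        noise = rng.standard_normal((m, T))     # fresh draws for the sampled agents
        u = a[idx] - lam + noise                # toy stand-in for u^{I_j^k}(lam^k)
        y = u.mean(axis=0) - v                  # sampled estimate of the dual direction
        lam = lam + y / (1.0 + k)               # rho_k = 1/(1+k)
    return lam, m * n_iter


rng = np.random.default_rng(0)
a = rng.uniform(0.0, 2.0, size=(200, 3))
lam_hat, n_sims = sampled_stochastic_uzawa(a, c=2.0, m=5, n_iter=4000, rng=rng)
lam_star = a.mean(axis=0) * 2.0 / 3.0           # zero of the mean direction in this toy model
print("simulations:", n_sims)
print("max abs error:", np.max(np.abs(lam_hat - lam_star)))
```

Because the indices are drawn uniformly, the sampled average is an unbiased estimate of the full average over the `n` agents in this toy model; the price is a noisier direction at each iteration.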

The complexity of the Sampled Stochastic Uzawa Algorithm 5.2 is proportional to $m \times K$, where $K$ is the total number of dual iterations and $m$ the number of simulations performed at each iteration. The error $\mathbb{E}(\|\lambda^{k+1} - \lambda\|^2)$ for $\lambda \in S$ is the sum of the square of the bias (which depends only on $K$ and not on $m$) and the variance (which depends on both $K$ and $m$). Therefore this algorithm enables a bias-variance trade-off for a given complexity. Similarly, for a given error, it makes it possible to optimize the complexity of the algorithm.
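The bias-variance splitting invoked here is the standard Hilbert-space identity, written below for a generic final iterate $\lambda^{K}$ with finite second moment and a deterministic $\lambda \in S$. The notation $\lambda^{K}$ for the final iterate is an assumption made for this illustration.

```latex
\begin{align*}
\mathbb{E}\,\|\lambda^{K}-\lambda\|^2_{L^2(0,T)}
  &= \underbrace{\|\mathbb{E}[\lambda^{K}]-\lambda\|^2_{L^2(0,T)}}_{\text{squared bias}}
   + \underbrace{\mathbb{E}\,\|\lambda^{K}-\mathbb{E}[\lambda^{K}]\|^2_{L^2(0,T)}}_{\text{variance}},
\end{align*}
% the cross term 2\,\mathbb{E}\langle \lambda^{K}-\mathbb{E}[\lambda^{K}],\,
% \mathbb{E}[\lambda^{K}]-\lambda\rangle vanishes because
% \mathbb{E}\big[\lambda^{K}-\mathbb{E}[\lambda^{K}]\big] = 0.
```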

Some assumptions on the step size are introduced.

Assumption 5.4. The sequence $(\rho_k)_k$ is such that: $\rho_k > 0$, $\sum_{k=1}^{\infty} \rho_k = \infty$ and $\sum_{k=1}^{\infty} (\rho_k)^2 < \infty$.

Note that a sequence of the form $\rho_k := \frac{a}{b+k}$, with $(a, b) \in \mathbb{R}_+ \times \mathbb{R}_+$ and $a > 0$, satisfies Assumption 5.4.
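For completeness, the two summability conditions can be checked directly for $\rho_k = a/(b+k)$, assuming $a > 0$ and $b \ge 0$ (a short verification, not part of the original text):

```latex
\begin{align*}
\sum_{k=1}^{\infty} \frac{a}{b+k}
  &\ge \frac{a}{b+1}\sum_{k=1}^{\infty}\frac{1}{k} = \infty
  && \text{since } b+k \le (b+1)k \text{ for } k \ge 1, \\
\sum_{k=1}^{\infty} \frac{a^2}{(b+k)^2}
  &\le a^2 \sum_{k=1}^{\infty}\frac{1}{k^2} = \frac{a^2\pi^2}{6} < \infty
  && \text{since } b+k \ge k .
\end{align*}
```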

Let us denote $S := \arg\max_{\lambda \in L^2(0,T)} \mathcal{W}(\lambda)$, where $S$ is nonempty because of the strong convexity of $-\mathcal{W}$.

The following result establishes the convergence of the Stochastic Uzawa Algorithm 5.1:

Theorem 5.5. Let Assumption 5.4 hold. Then:
(i) $\{\|\lambda^k - \lambda\|^2_{L^2(0,T)}\}$ converges a.s., for all $\lambda \in S$.
(ii) $\mathcal{W}(\lambda^k) \xrightarrow[k \to \infty]{} \max_{\lambda \in L^2(0,T)} \mathcal{W}(\lambda)$ a.s.
(iii) $\{\lambda^k\}$ weakly converges to some $\bar\lambda \in S$ in $L^2(0,T)$ a.s.
(iv) If Assumption 3.5 holds, then a.s. $\{\lambda^k\}$ converges to $\bar\lambda$ in $L^2(0,T)$, with $S := \{\bar\lambda\}$.

Though the proof is similar to that of [6, Theorem 3.6], the current framework differs from the one of that reference, and for the convenience of the reader we provide the proof.

We first state two lemmas.

Lemma 5.6 (Robbins-Siegmund). Let $\{\mathcal{G}_k\}$ be an increasing sequence of $\sigma$-algebras and let $d_k$, $a_k$, $b_k$ and $c_k$ be nonnegative random variables adapted to $\mathcal{G}_k$. Assume that $\mathbb{E}(d_{k+1} \mid \mathcal{G}_k) \le d_k(1 + a_k) + b_k - c_k$, and that $\sum_{k=1}^{\infty} a_k < \infty$ a.s. and $\sum_{k=1}^{\infty} b_k < \infty$ a.s. Then, with probability one, $\{d_k\}$ is convergent and $\sum_{k=1}^{\infty} c_k < \infty$.

Proof. See [5], Theorem 1.3.12.

Lemma 5.7. Let $\{\alpha_k\}$ be a nonnegative deterministic sequence and $\{\beta_k\}$ a nonnegative random sequence adapted to $\{\mathcal{G}_k\}$. Assume that $\sum_{k=1}^{\infty} \alpha_k = \infty$ and $\mathbb{E}\big(\sum_{k=1}^{\infty} \alpha_k \beta_k\big) < \infty$. Moreover, assume that $\beta_k - \mathbb{E}(\beta_{k+1} \mid \mathcal{G}_k) \le c\,\alpha_k$ a.s. for all $k$ and some $c > 0$. Then $\beta_k \xrightarrow{\text{a.s.}} 0$.

Proof. See [6], Proposition 3.2.

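Proof of Theorem 5.5. First consider point (i). Let $\lambda \in S$. For any $k$, let $\mathcal{G}_{k+1}$ be the $\sigma$-algebra defined by
$$\mathcal{G}_{k+1} := \sigma\big(\{W^{i,p} : 1 \le i \le n,\ p \le k+1\}\big), \tag{5.9}$$
so that $\{\mathcal{G}_k\}$ is a filtration.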

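Using the definition of $Y^{k+1} \in L^2(0,T)$ in line 8 of the Stochastic Uzawa Algorithm 5.1, we have:
$$\begin{aligned}
\|\lambda^{k+1} - \lambda\|^2_{L^2(0,T)} &= \|\lambda^k + \rho_k Y^{k+1} - \lambda\|^2_{L^2(0,T)} \\
&= \|\lambda^k - \lambda\|^2_{L^2(0,T)} + 2\rho_k \langle \lambda^k - \lambda, Y^{k+1} \rangle_{L^2(0,T)} + (\rho_k)^2 \|Y^{k+1}\|^2_{L^2(0,T)}.
\end{aligned} \tag{5.10}$$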

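Since $Y^{k+1}$ is independent from $\mathcal{G}_k$, and using (5.7), it follows that:
$$\mathbb{E}\big(\|Y^{k+1}\|^2_{L^2(0,T)} \mid \mathcal{G}_k\big) = \mathbb{E}\Big(\Big\|\frac{1}{n}\sum_{i=1}^{n} u^i(\lambda^k)(W^{i,k+1}) - v(\lambda^k)\Big\|^2_{L^2(0,T)}\Big) \le M_1. \tag{5.11}$$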

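Since $\lambda^k$ is $\mathcal{G}_k$-measurable and $\mathbb{E}[Y^{k+1} \mid \mathcal{G}_k] = f(\lambda^k)$, we have that:
$$\begin{aligned}
\mathbb{E}\big[\|\lambda^{k+1} - \lambda\|^2_{L^2(0,T)} \mid \mathcal{G}_k\big]
&= \|\lambda^k - \lambda\|^2_{L^2(0,T)} + 2\rho_k\, \mathbb{E}\big(\langle \lambda^k - \lambda, Y^{k+1} \rangle \mid \mathcal{G}_k\big) + (\rho_k)^2\, \mathbb{E}\big[\|Y^{k+1}\|^2_{L^2(0,T)} \mid \mathcal{G}_k\big] \\
&\le \|\lambda^k - \lambda\|^2_{L^2(0,T)} + 2\rho_k \langle \lambda^k - \lambda, f(\lambda^k) \rangle + (\rho_k)^2 M_1 \\
&\le \|\lambda^k - \lambda\|^2_{L^2(0,T)} + (\rho_k)^2 M_1 - 2\rho_k \big(\mathcal{W}(\lambda) - \mathcal{W}(\lambda^k)\big).
\end{aligned} \tag{5.12}$$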
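The last step of (5.12) can be unpacked as follows, under the assumption (suggested by the reference to (5.8), which is not restated in this section) that $f(\lambda^k)$ is a supergradient of the concave function $\mathcal{W}$ at $\lambda^k$:

```latex
\begin{align*}
\mathcal{W}(\lambda) &\le \mathcal{W}(\lambda^k)
   + \langle f(\lambda^k), \lambda - \lambda^k \rangle_{L^2(0,T)}
   && \text{(supergradient inequality)} \\
\Longrightarrow\quad
2\rho_k \langle \lambda^k - \lambda, f(\lambda^k) \rangle_{L^2(0,T)}
  &\le -2\rho_k \big( \mathcal{W}(\lambda) - \mathcal{W}(\lambda^k) \big)
   && \text{(rearranging, } \rho_k > 0\text{)} .
\end{align*}
```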

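In the last inequality, we used the concavity of $\mathcal{W}$ and (5.8). With $d_k := \|\lambda^k - \lambda\|^2_{L^2(0,T)}$, we set:
$$a_k = 0, \qquad b_k = (\rho_k)^2 M_1, \qquad c_k = 2\rho_k \big(\mathcal{W}(\lambda) - \mathcal{W}(\lambda^k)\big). \tag{5.13}$$
We have that $\sum_{k=1}^{\infty} a_k < \infty$ a.s. and, by Assumption 5.4, $\sum_{k=1}^{\infty} b_k < \infty$ a.s. Clearly, $a_k$ and $b_k$ are nonnegative; $c_k$ is nonnegative since $\lambda \in S$. By Lemma 5.6, the sequence $\{\|\lambda^k - \lambda\|^2_{L^2(0,T)}\}$ converges a.s.

Now we show point (ii) thanks to Lemma 5.7.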

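By Lemma 5.6, $\sum_{k=1}^{\infty} \rho_k \big(\mathcal{W}(\lambda) - \mathcal{W}(\lambda^k)\big) < \infty$ a.s. Taking the expected value on both sides of (5.12), we get, using the deterministic version of Lemma 5.6, that $\mathbb{E}\big(\sum_{k=1}^{\infty} \rho_k (\mathcal{W}(\lambda) - \mathcal{W}(\lambda^k))\big) < \infty$. By concavity of $\mathcal{W}$ and the Cauchy-Schwarz inequality, we have:
$$\mathcal{W}(\lambda^{k+1}) - \mathcal{W}(\lambda^k) \le \langle f(\lambda^k), \lambda^{k+1} - \lambda^k \rangle \le \rho_k \|f(\lambda^k)\|\, \|Y^{k+1}\|. \tag{5.14}$$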

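Let $\tau_M$ be the stopping time $\tau_M := \inf\{k : \|\lambda^k\| > M\}$ for $M \in \mathbb{N}$. The sequence $\{\beta_k\}$ is defined by:
$$\beta_k := \begin{cases} \mathcal{W}(\lambda) - \mathcal{W}(\lambda^k) & \text{if } \{\tau_M > k\}, \\ \mathcal{W}(\lambda) - \mathcal{W}(\lambda^{\tau_M}) & \text{otherwise,} \end{cases} \qquad \text{with } \beta_{\tau_M + k} = \beta_{\tau_M},\ k \ge 1. \tag{5.15}$$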
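The stopped sequence of (5.15) is simple to compute along one simulated trajectory. The helper below is a hypothetical illustration (the name `stopped_gap` and its array inputs are not from the paper): `gaps[k]` plays the role of $\mathcal{W}(\lambda) - \mathcal{W}(\lambda^k)$ and `norms[k]` the role of $\|\lambda^k\|$.

```python
import numpy as np


def stopped_gap(gaps, norms, M):
    """Return beta_k: equal to gaps[k] while k < tau_M, frozen at gaps[tau_M] afterwards.

    tau_M is the first index k with norms[k] > M (infinite if no such index exists).
    """
    gaps = np.asarray(gaps, dtype=float)
    norms = np.asarray(norms, dtype=float)
    exceed = np.flatnonzero(norms > M)     # indices where the bound M is violated
    beta = gaps.copy()
    if exceed.size > 0:
        tau = exceed[0]                    # stopping time tau_M on this trajectory
        beta[tau:] = gaps[tau]             # freeze the sequence from tau_M onwards
    return beta


# The norm first exceeds M = 2 at index 2, so the gap is frozen at gaps[2] = 1.0.
print(stopped_gap([3.0, 2.0, 1.0, 0.5], [0.0, 1.0, 5.0, 1.0], M=2.0).tolist())
# -> [3.0, 2.0, 1.0, 1.0]
```

On trajectories where the bound is never exceeded, the helper returns the gaps unchanged, which mirrors the fact that $\beta_k$ coincides with $\mathcal{W}(\lambda) - \mathcal{W}(\lambda^k)$ on $\{\tau_M = \infty\}$.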

Notice that if $\|\lambda^k\| \le M$, there exists, by (5.11) and (5.7), $M_0 > 0$ such that
$$\|f(\lambda^k)\|\, \mathbb{E}\big(\|Y^{k+1}\| \mid \mathcal{G}_k\big) \le (M_0)^2. \tag{5.16}$$

Now, $\beta_k - \beta_{k+1} = \mathbf{1}_{\tau_M > k} \big(\mathcal{W}(\lambda^{k+1}) - \mathcal{W}(\lambda^k)\big)$, and therefore, by taking the conditional expectation on both sides, noticing that $\mathbf{1}_{\tau_M > k}$ is $\mathcal{G}_k$-measurable, and considering (5.14) and (5.16), we get $\beta_k - \mathbb{E}(\beta_{k+1} \mid \mathcal{G}_k) \le \rho_k (M_0)^2$.

By Lemma 5.7 (applied with $\alpha_k = \rho_k$ and $c = (M_0)^2$), on the set $B_M := \{\tau_M = \infty\}$, $\beta_k$ converges to 0 and coincides with $\mathcal{W}(\lambda) - \mathcal{W}(\lambda^k)$. Since $\|\lambda^k - \lambda\|$ converges a.s., $\|\lambda^k\|$ is bounded in probability, and therefore the probability of the set $B_M$ can be made arbitrarily close to 1 by choosing $M$ large. Since $\mathbb{P}\big(\cup_{M=1}^{\infty} B_M\big) = 1$, we may infer that $\mathcal{W}(\lambda) - \mathcal{W}(\lambda^k)$ converges to 0 almost surely.

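For point (iii), since $\{\|\lambda^k - \lambda\|^2\}$ converges a.s. for all $\lambda \in S$, it is bounded in probability, so the sequence $\{\lambda^k\}$ generated by the algorithm has a.s. a weak accumulation point $\bar\lambda$ (the point $\bar\lambda$ is random in general). Let $\{\lambda^{k_m}\}$ be a subsequence such that $\lambda^{k_m} \rightharpoonup \bar\lambda$. Since $\mathcal{W}$ is concave and upper semi-continuous, it is weakly upper semi-continuous, so that
$$\mathcal{W}(\bar\lambda) \ge \lim_{m \to \infty} \mathcal{W}(\lambda^{k_m}) = \mathcal{W}(\tilde\lambda), \tag{5.17}$$
where $\tilde\lambda$ denotes any element of $S$ and the equality follows from point (ii).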

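In particular, $\bar\lambda \in S$. To show uniqueness, let $\lambda_1, \lambda_2 \in S$ be two weak limits of subsequences of $\{\lambda^k\}$, i.e. $\lambda^{k_m} \rightharpoonup \lambda_1$ and $\lambda^{k_l} \rightharpoonup \lambda_2$. Then
$$\|\lambda^{k_m} - \lambda_2\|^2 = \|\lambda^{k_m} - \lambda_1\|^2 + \|\lambda_1 - \lambda_2\|^2 + 2\langle \lambda^{k_m} - \lambda_1, \lambda_1 - \lambda_2 \rangle, \tag{5.18}$$
$$\|\lambda^{k_l} - \lambda_1\|^2 = \|\lambda^{k_l} - \lambda_2\|^2 + \|\lambda_2 - \lambda_1\|^2 + 2\langle \lambda^{k_l} - \lambda_2, \lambda_2 - \lambda_1 \rangle, \tag{5.19}$$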

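so by weak convergence of each subsequence, (5.18) and (5.19) are combined to obtain
$$\lim_{m \to \infty} \big(\|\lambda^{k_m} - \lambda_2\|^2 - \|\lambda^{k_m} - \lambda_1\|^2\big) = \|\lambda_2 - \lambda_1\|^2, \tag{5.20}$$
$$\lim_{l \to \infty} \big(\|\lambda^{k_l} - \lambda_1\|^2 - \|\lambda^{k_l} - \lambda_2\|^2\big) = \|\lambda_2 - \lambda_1\|^2. \tag{5.21}$$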

By a.s. convergence of the sequence $\{\|\lambda^k - \lambda\|^2\}$ for all $\lambda \in S$, the limit of each subsequence is equal to the limit of the entire sequence with probability one, so $\lim_{m \to \infty} \|\lambda^{k_m} - \lambda_1\|^2 = \lim_{k \to \infty} \|\lambda^k - \lambda_1\|^2 =: l_1$ and similarly $\lim_{m \to \infty} \|\lambda^{k_m} - \lambda_2\|^2 = \lim_{k \to \infty} \|\lambda^k - \lambda_2\|^2 =: l_2$. Therefore (5.20) and (5.21) imply $l_2 - l_1 = \|\lambda_1 - \lambda_2\|^2 = l_1 - l_2$, meaning $\|\lambda_1 - \lambda_2\|^2 = 0$, and thus the weak limits coincide. Therefore $\{\lambda^k\}$ is weakly convergent to a unique limit with probability one.

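Finally, the last statement can now be proved. By strong convexity, $-\mathcal{W}$ has a unique minimum $\bar\lambda$, so $S = \{\bar\lambda\}$. By strong convexity, there exists $\mu > 0$ such that
$$\mathcal{W}(\bar\lambda) - \mathcal{W}(\lambda^k) \ge -\langle f(\bar\lambda), \lambda^k - \bar\lambda \rangle_{L^2(0,T)} + \frac{\mu}{2} \|\lambda^k - \bar\lambda\|^2_{L^2(0,T)}. \tag{5.22}$$
Since $-\langle f(\bar\lambda), \lambda^k - \bar\lambda \rangle_{L^2(0,T)} \ge 0$ by optimality of $\bar\lambda$, $\lim_{k \to \infty} \mathcal{W}(\bar\lambda) - \mathcal{W}(\lambda^k) = 0$ a.s. implies $\lim_{k \to \infty} \|\lambda^k - \bar\lambda\|_{L^2(0,T)} = 0$ a.s.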
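The final implication can be written as an explicit bound. The sketch below assumes that the strong-concavity inequality (5.22) holds in its usual quadratic form, with the squared norm on the right-hand side, which is a reading of the statement rather than something the extracted text confirms:

```latex
\begin{align*}
\frac{\mu}{2}\,\|\lambda^k - \bar\lambda\|^2_{L^2(0,T)}
  &\le \mathcal{W}(\bar\lambda) - \mathcal{W}(\lambda^k)
     + \langle f(\bar\lambda), \lambda^k - \bar\lambda \rangle_{L^2(0,T)} \\
  &\le \mathcal{W}(\bar\lambda) - \mathcal{W}(\lambda^k)
  \;\xrightarrow[k \to \infty]{}\; 0 \quad \text{a.s.,}
\end{align*}
% where the second inequality uses
% \langle f(\bar\lambda), \lambda^k - \bar\lambda \rangle \le 0 (optimality of \bar\lambda)
% and the limit is point (ii) of Theorem 5.5.
```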

We recall the definition of $\bar J(u, v)$ in (4.1) and we define $\bar u$:
$$\bar u := \arg\min_{u \in U} \Big\{ \mathbb{E}\Big( \sum_{i=1}^{n} \big[ F_i(u^i, X^{i,u^i}) + \langle \bar\lambda, u^i \rangle_{L^2(0,T)} \big] \Big) \Big\}. \tag{5.23}$$
If $F_0$ is strictly convex, then we define:
$$\bar v := \arg\min_{v \in V} \; F_0(v) + \langle \bar\lambda, v \rangle_{L^2(0,T)}. \tag{5.24}$$
If Assumption 3.5 holds and $F_0$ is strictly convex, $(\bar u, \bar v, \bar\lambda)$ is a saddle point and $\bar u$ is the unique minimizer of $\tilde J$ in $U$.
