HAL Id: hal-02917014
https://hal.inria.fr/hal-02917014v3
Preprint submitted on 17 Sep 2020 (v3), last revised 20 Oct 2021 (v4)
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Stochastic Control Problems
Adrien Seguret, Clémence Alasseur, J Frédéric Bonnans, Antonio de Paola, Nadia Oudjane, Vincenzo Trovato
To cite this version:
Adrien Seguret, Clémence Alasseur, J Frédéric Bonnans, Antonio de Paola, Nadia Oudjane, et al..
Decomposition of High Dimensional Aggregative Stochastic Control Problems. 2020. �hal-02917014v3�
ADRIEN SEGURET†, CLÉMENCE ALASSEUR‡, J. FRÉDÉRIC BONNANS§, ANTONIO 3
DE PAOLA¶, NADIA OUDJANEk, AND VINCENZO TROVATO# 4
Abstract. We consider the framework of high dimensional stochastic control problem, in which 5
the controls are aggregated in the cost function. As first contribution we introduce a modified 6
problem, whose optimal control is under some reasonable assumptions anε-optimal solution of the 7
original problem. As second contribution, we present a decentralized algorithm whose convergence 8
to the solution of the modified problem is established. Finally, we study the application to a problem 9
of coordination of energy production and consumption of domestic appliances.
10
Key words. Stochastic optimization, Lagrangian decomposition, Uzawa’s algorithm, stochastic 11
gradient.
12
AMS subject classifications. 93E20,65K10, 90C25, 90C39, 90C15.
13
1. Introduction. The present article aims at solving a high dimensional
14
stochastic control problem (P
1) involving a large number n of agents indexed by
15
i ∈ {1, · · · , n} , of the form:
16
(1.1) (P
1)
inf
u∈U
J (u)
J(u) := E F
0( 1 n
n
X
i=1
u
i) + 1 n
n
X
i=1
F
i(u
i, X
i,ui)
! .
17
The dynamics of the state of each agent X
i,uiis driven by independent Brownian mo-
18
tions W
i(no common noise) so that potential interactions between agents dynamics
19
is only due to the non anticipative controls u
isupposed to be progressively measur-
20
able w.r.t. to the Brownian noise W = (W
i)
i∈{1,···,n}. We emphasize, the specific
21
structure of that problem whose cost function is the sum of, on one side, additively
22
separable terms F
ibetween agents and a coupling term F
0, function of the aggregate
23
strategies 1 n
n
X
i=1
u
i.
24
1.1. Motivations. This work is motivated by its potential applications for large-
25
scale coordination of flexible appliances, to support power system operation in a con-
26
text of increasing penetration of renewables. One type of appliances that has been
27
consistently investigated in the last few years, for its intrinsic flexibility and potential
28
∗Submitted to the editors September 17, 2020
Funding: The first, second, third and fifth author thank the FiME Lab (Institut Europlace de Finance). The third author was supported by the PGMO project “Optimal control of conservation equations", itself supported by iCODE(IDEX Paris-Saclay) and the Hadamard Mathematics LabEx.
†OSIRIS department, EDF Lab, Paris-Saclay, France([email protected]).
‡OSIRIS department, EDF Lab, Paris-Saclay, France([email protected]).
§CMAP, Ecole polytechnique, 91128 Palaiseau, France, and Inria- Saclay([email protected]).
¶Center for Sustainable Power Distribution University of Bath, Bath, UK([email protected]).
kOSIRIS department, EDF Lab, Paris-Saclay, France([email protected]).
#Department of Electrical and Electronic Engineering, Imperial College London, London, UK([email protected]).
1
for network support, includes thermostatically controlled loads (TCLs) such as re-
29
frigerators or air conditioners. Several papers have already investigated the potential
30
of dynamic demand control and frequency response services of TCLs [21] and how
31
the population recovers from significant perturbations [3]. The coordination of TCLs
32
can be performed in a centralized way, like in [8]. However this approach raises chal-
33
lenging problems in terms of communication requirements and customer privacy. A
34
common objective can be reached in a fully distributed approach, like in [25], where
35
each TCL is able to calculate its own actions (ON/OFF switching) to pursue a com-
36
mon objective. This paper is related to the work of De Paola et al. [4], where each
37
agent represents a flexible TCL device. In [4] a distributed solution is presented for
38
the operation of a population of n = 2 × 10
7refrigerators providing frequency sup-
39
port and load shifting. They adopt a game-theory framework, modelling the TCLs as
40
price-responsive rational agents that schedule their energy consumption and allocate
41
their frequency response provision in order to minimize their operational costs. The
42
potential practical application of our work also considers a large population of TCLS
43
which, contrarily to [4], have stochastic dynamics. The proposed approach is able to
44
minimize the overall system costs in a distributed way, with each TCL determining
45
its optimal power consumption profile in response to price signals.
46
1.2. Related literature. The considered problem belongs to the class of
47
stochastic control: looking for strategies minimizing the expectation of an objective
48
function under specific constraints. One of the main approaches proposed in the litera-
49
ture to tackle this problem is to use random trees: this consists in replacing the almost
50
sure constraints, induced by non-anticipativity, by a finite number of constraints to
51
get a finite set of scenarios (see. [9] and [19]). Once the tree structure is built, the
52
problem is solved by different decomposition methods such as scenario decomposition
53
[18] or dynamic splitting [20]. The main objective of the scenario method is reducing
54
the problem to an approximated deterministic one. The paper focuses on high dimen-
55
sional noise problems with large number of time steps, for which this approach is not
56
feasible. The idea of reducing a single high dimensional problem to a large number
57
with low dimension has been widely studied in the deterministic case. In determinis-
58
tic and stochastic problems a possibility is to use time decomposition thanks to the
59
Dynamic Programming Principle [1] taking advantage of Markov property of the sys-
60
tem. However, this method requires a specific time structure of the cost function and
61
fails when applied to problems for which the state space dimension is greater than
62
five. One can deal with the curse of dimensionality, under continuous linear-convex
63
assumptions, by using the Stochastic Dual Dynamic Programming algorithm (SDDP)
64
[15] to get upper and lower bounds of the value function, using polyhedral approxi-
65
mations. Though the almost-sure convergence of a broad class of SDDP algorithms
66
has been proved [17], there is no guarantee on the speed of the convergence and there
67
is no good stopping test. In [14], a stopping criteria based on a dual version of SDDP,
68
which gives a deterministic upper-bound for the primal problem, is proposed. SDDP
69
is well-adapted for medium sized population problems ( n ≤ 30 ), whereas it fails for
70
problems with magnitude similar to one of the present paper ( n > 1000 ). It is natural
71
for this type of high dimensional problem to investigate decomposition techniques in
72
the spirit of the Dual Approximation Dynamic Programming (DADP). DADP has
73
been developed in PhD theses (see [7], [12]). This approach is characterized by a
74
price decomposition of the problem, where the stochastic constraints are projected
75
on subspaces such that the associated Lagrangian multiplier is adapted for dynamic
76
programming. Then the optimal multiplier is estimated by implementing Uzawa’s
77
algorithm. To this end in [12], the Uzawa’s algorithm, formulated in a Hilbert set-
78
ting, is extended to a Banach space. DADP has been applied in different cases, such
79
as storage management problem for electrical production in [7, chapter 4] and hydro
80
valley management [2]. In the proposed paper, in the same vein as DADP we propose
81
a price decomposition approach restricted to deterministic prices. This new approach
82
takes advantage of the large population number in order to introduce an auxiliary
83
problem where the coupling term is purely deterministic.
84
1.3. Contributions. We consider the following approximation of problem (P
1) :
85
(1.2) (P
2)
u∈U
inf J ˜ (u) J ˜ (u) := F
01 n
n
X
i=1
E (u
i)
! + 1
n E
n
X
i=1
F
i(u
i, X
i,ui)
! .
86
As a first contribution, this paper shows that under some convexity and regularity
87
assumptions on F
0and (F
i)
i∈{1,...,n}, any solution of problem (P
2) is an ε
n-solution
88
of (P
1) , with ε
n→ 0 when n → ∞ . In addition, an approach of price decomposition
89
for (P
2) is easier than for (P
1) , since the Lagrange multiplier is deterministic for (P
2) ,
90
whereas it is stochastic for (P
1) . Since computing the dual cost of (P
2) is expensive,
91
we propose Stochastic Uzawa and Sampled Stochastic Uzawa algorithms relying on
92
Robbins Monroe algorithm in the spirit of the stochastic gradient. Its convergence
93
is established. We check the effectiveness of the Stochastic Uzawa algorithm on a
94
linear quadratic Gaussian framework, and we apply the Sampled Stochastic Uzawa
95
algorithm to a model of power system, inspired by the work of A. De Paola et al. [4].
96
2. General framework. Let (Ω, F, F , P ) be a complete filtered probability
97
space on which W = (W
i)
i=1,...,nis a n− dimensional Brownian motion, such that
98
for any t ∈ [0, T ] and i ∈ {1, . . . , n} , W
titakes value in R, and generates the filtration
99
F = (F
t)
0≤t≤T. P stands for the Wiener measure associated with this filtration and
100
F for the augmented filtration by all P-null sets.
101
The following notations are used:
102
X := {ϕ : Ω → C([0, T ], R ) | ϕ(·) is F − adapted , kϕk
∞,2:= E ( sup
s∈[0,T]
|ϕ(s)|
2)
12< ∞}, L
2(0, T ) := {ϕ : [0, T ] → R |
Z
T 0|ϕ(t)|
2dt < ∞},
U := {ϕ : [0, T ] × Ω → R | ϕ(·) is F − prog. measurable , E Z
T0
|ϕ(t)|
2dt < ∞},
103
and for any i ∈ {1, . . . , n} , the feasible set of controls is defined by:
104
(2.1) U
i:= {v ∈ U and v
t(ω) ∈ [−M
i, M
i], for a.a. (t, ω) ∈ [0, T ] × Ω},
105
and we set M := max
i∈{1,...,n}
M
i. The set of admissible controls is U := U
1× . . . × U
n,
106
whose elements are denoted by u := (u
1, . . . , u
n) .
107
Each local agent i = 1, . . . , n is supposed to control its state variable through the
108
control process u
i∈ U
iand suffers from independent uncertainties. More specifically,
109
the state process of each agent, X
i,ui= (X
ti,ui)
t∈[0;T], for i = 1, . . . , n takes values in
110
R and follows the dynamics
111
(2.2)
dX
ti,ui= µ
i(t, u
it, X
ti,ui)dt + σ
i(t, X
ti,ui)dW
ti, for t ∈ [0, T ], for i ∈ {1, . . . , n}
X
0,ui i= x
i0∈ R .
112
Without loss of generality, the initial states x
i0are supposed to be deterministic.
113
The process X
iis F-progressively measurable. For all i , F
istands for the natural
114
filtration of the Brownian motion W
i.
115
2.1. On the well-posedness of (P
1). In this section, the assumptions needed
116
for (P
1) to be well posed are studied.
117
Assumption 2.1. For any i ∈ {1, . . . , n} , the functions µ
iand σ
iare continuous
118
w.r.t (u, x) uniformly in t . In addition there exists K
i> 0 such that, for any t ∈ [0, T ]
119
and ν ∈ [−M, M ] :
120
(2.3)
|µ
i(t, ν, x) − µ
i(t, ν, y)| + |σ
i(t, ν, x) − σ
i(t, ν, y)| ≤ K
i|x − y|,
|µ
i(t, ν, x)| + |σ
i(t, ν, x)| ≤ K
i(1 + |x|), for any x, y ∈ R .
121
Lemma 2.2. Let i ∈ {1, . . . , n} and v ∈ U
ibe a control process. If Assumption
122
2.1 holds, then there exists a unique process X
i,v∈ X satisfying (2.2) (in the strong
123
sense) such that for any p ∈ [1, ∞):
124
(2.4) E
sup
0≤t≤T
|X
ti,v|
p< C (p, T, x
0, K ) < ∞ .
125
Proof. The proof for the existence and uniqueness of a solution of (2.2) relies on
126
[13, Theorem 3.6, Chapter 2]. The inequality is a result of [13, Theorem 4.4, Chapter
127
2 ].
128
Let F
0: L
2(0, T ) → R ¯ and F
i: L
2(0, T ) × C([0, T ], R ¯ ) → R be proper and lower
129
semi continuous functions, and there exists u ˆ ∈ U such that:
130
(2.5) E F
0( 1
n
n
X
i=1
ˆ u
i)
!
< ∞.
131
Define G
i: L
2(0, T ) → R ¯ by G
i(u
i) = F
i(u
i, X
i,ui) for any u
i∈ U
i. Additional
132
assumptions are formulated below.
133
Assumption 2.3. For any i ∈ {1, . . . , n} :
134
(i) u
i7→ E (G
i(u
i)) is strictly convex.
135
(ii) there exists a positive integer p such that F
ihas p -polynomial growth, i.e
136
there exists K > 0 such that for any x
i∈ C([0, T ], R ) and u
i∈ L
2(0, T ) :
137
|F
i(u
i, x
i)| ≤ K(1 + sup
0≤t≤T
|x
it|
p) .
138
Assumption 2.3.(1) holds in different cases, like in the example below.
139
Example 2.4. For any i ∈ {1, . . . , n} , there exists g
i: L
2(0, T ) → R and h
i:
140
C[0, T ] → R such that for any (v, X) ∈ L
2(0, T ) × C[0, T ] , F
i(v, X) = g
i(v) + h
i(X )
141
and there exists five L
∞([0, T ]) scalar functions α
i, β
i, γ
i, ξ
iand θ
isuch that for any
142
(t, ν, x) ∈ [0, T ] × [−M, M ] × R:
143
(2.6) µ
i(t, ν, x) = α
i(t)ν + β
i(t)x + γ
i(t) and σ
i(x, t) = ξ
i(t)x + θ
i(t).
144
Then Assumption 2.3.(i) is satisfied if:
145
(i) g
iis strictly convex and h
iconvex.
146
(ii) for a.e. t ∈ [0, T ] , α(t) 6= 0 , g
iis convex and h
istrictly convex.
147
Indeed, for any i ∈ {1, . . . , n} , u, v ∈ U, δ ∈ [0, 1] and t ∈ [0, 1] , it holds from (2.6)
148
that X
i,δu+(1−δ)vt
= δX
ti,u+ (1 − δ)X
ti,v. If point (i) holds, then:
149
(2.7) h
i(X
i,δu+(1−δ)v) ≤ δh
i(X
i,u) + (1 − δ)h
i(X
i,v).
150
Assumption 2.3.(i) follows from (2.7) and strict convexity of g
i.
151
Similarly, if point (ii) holds, then the inequality in (2.7) is strict, and Assumption
152
2.3.(i) follows using also the convexity of g
i.
153
Remark 2.5. If for any i ∈ {1, . . . , n} , Assumption 2.3.(i) holds, then G
iis w.l.s.c.
154
on U
i. Indeed, G
ibeing convex, finite valued and bounded on bounded subsets
155
of L
2(0, T ) ( from the polynomial growth of F
iand the inequality (2.4) ) ,thus G
iis
156
continuous on U
i.
157
From now on, Assumptions 2.1 and 2.3 are in force in the sequel.
158
The following lemma ensures the well-posedness of (P
1) .
159
Lemma 2.6. Suppose that F
0is convex. Then J reaches its minimum over U at
160
a unique point.
161
Proof. Clearly the control u ˆ ∈ U defined in (2.5) is feasible. The existence and
162
uniqueness of a minimum is proved by considering a minimizing sequence {u
k} of
163
J over U . The set U being bounded and weakly closed, there exists a sub-sequence
164
{u
k`} which weakly converges to a certain u
∗∈ U . Using Assumptions 2.3.(i)(ii) and
165
convexity of F
0, it follows that lim inf J (u
k`) ≥ J (u
∗) and thus u
∗is a solution of
166
(P
1) . The uniqueness is due to the strict convexity of u
i7→ E (F
i(u
i, X
i,ui)) .
167
Remark 2.7. This kind of stochastic optimization problem is illustrated in Section
168
7 with a problem of coordination of a large population of domestic appliances, where
169
a system operator has to meet the demand while producing at low cost. The state X
ti170
can represent for instance the temperature or the battery level of the agent i at time t ,
171
and u
itits proper power generation or consumption. F
0can be assimilated to the cost
172
function to satisfy the demand, and for any i , F
ito the cost function connected to the
173
proper functioning of the TCLs (characterized by individual cost function, comfort
174
constraints, etc...).
175
3. Approximating the optimization problem. In this section, the link be-
176
tween problems (P
1) and (P
2) is analyzed.
177
Assumption 3.1. Problem (P
2) admits a unique solution.
178
Notice that by using the same techniques as for Lemma 2.6, one can prove that the
179
above assumption is satisfied when F
0is convex.
180
We have the following key result.
181
Theorem 3.2. Under Assumption 3.1, J ˜ reaches its minimum over U at a unique
182
point, u ˜ ∈ U , such that for any i, u ˜
iis F
i−adapted and thus for any j 6= i, u ˜
iand u ˜
j183
are mutually independent.
184
Proof. Fix i ∈ {1, . . . , n} , using the convexity and l.s.c properties of
185
u
i7→ E (G
i(u
i)) and Jensen’s inequality, we get:
186
(3.1) E G
i(u
i)|W
i≥ G
iE (u
iW
i)
.
187
On the other hand (u
1, . . . , u
n) 7→ F
0( 1 n
n
X
i=1
E (u
i)) is invariant when taking the con-
188
ditional expectation, thus F
01 n
n
X
i=1
E (u
i)
!
= F
01 n
n
X
i=1
E E (u
i|W
i)
!
. From As-
189
sumption 3.2, we know that there exists a solution u
∗to (P
2) .
190
We set u ˜ := ( E (u
∗1|W
1), E (u
∗2|W
2), . . . , E (u
∗n|W
n)) . For any i , u ˜
i:=
191
E (u
∗i|W
i) is F
i-adapted. Using the definition of u
∗and (3.1), one can derive that
192 u∈U
inf
J(u) = ˜ ˜ J(u
∗) ≥ J ˜ (˜ u) .
193
Let U ˆ be a subset of U associated to decentralized controls, in the sense that:
194
(3.2) U ˆ := {u ∈ U | u
iis F
i− adapted for all i ∈ {1, . . . , n}}
195
From Theorem 3.2, if Assumption 3.1 holds, then:
196
(3.3) min
u∈Uˆ
J ˜ (u) = min
u∈U
J ˜ (u).
197 198
Remark 3.3. If Assumption 3.1 isn’t satisfied, we can prove by same arguments
199
that for any ε > 0 there exists an ε -optimal solution such that the individual controls
200
are mutually independent.
201
Lemma 3.4. If F
0is Lipschitz with constant γ, then an optimal solution in U ˆ of
202
problem (P
2) is an ε-optimal solution in U ˆ of problem (P
1), with ε = 2γM p T /n.
203
Proof. Indeed, there exists a number γ such that γ > 0 and for all x, y ∈ H
1we
204
have |F
0(x) − F
0(y)| < γkx − yk
H1. We set for any u ∈ U :
205
(3.4) u ˆ
i:= u
i− E (u
i).
206
Using the Jensen and Hölder inequalities, ( E |Y |) ≤ ( E |Y |
2)
12, the fact that for any
207
j 6= i , u
iand u
jare mutually independent, and that u
iis bounded by M , we have
208
∀u ∈ U ˆ :
209
(3.5)
| E F
0( 1 n
n
X
i=1
u
i)) − F
0( 1 n
n
X
i=1
E (u
i)
!
| ≤ E |F
0( 1 n
n
X
i=1
u
i)) − F
0( 1 n
n
X
i=1
E (u
i))|
!
≤ γ n E (k
n
X
i=1
ˆ
u
ik
L2(0,T))
≤ γ n E (k
n
X
i=1
ˆ
u
ik
2L2(0,T))
12= γ n
Z
T 0Var(
n
X
i=1
u
it)dt
12≤ γ n
12M √
T .
210
Let u ˜
∗denote a minimizer of J ˜ on U ˆ , then using (3.5) for any u ∈ U ˆ it holds:
211
(3.6) J (˜ u
∗) ≤ J(˜ ˜ u
∗) + γ n
12M √
T ≤ J ˜ (u) + γ n
12M √
T ≤ J(u) + 2γ n
12M √
212
T .
Assumption 3.5. F
0is Gâteaux differentiable with c -Lipschitz derivative.
213
Theorem 3.6. Suppose F
0is convex, then the following ε-optimality results hold:
214
(i) For any u ∈ U , J ˜ (u) ≤ J (u).
215
(ii) Suppose Assumption 3.5 holds, then any optimal solution of problem (P
2) is
216
an ε-optimal solution (where ε = 2cT M
2/n) of problem (P
1).
217
Proof. Proof of point (i).
218
By Jensen’s inequality, we have that:
219
(3.7) F
0( 1
n
n
X
i=1
E (u
i)) ≤ E (F
0( 1 n
n
X
i=1
u
i)), ∀u ∈ U ,
220
which gives the result.
221
Proof of point (ii).
222
Since F
0is convex, differentiable, with a c -Lipschitz differential, one can derive for
223
any u ∈ U ˆ and a.s.:
224
(3.8)
F
0( 1 n
n
X
i=1
u
i) − F
0( 1 n
n
X
i=1
E [u
i])
≤ 1
n h∇F
0( 1 n
n
X
i=1
u
i) ,
n
X
i=1
ˆ
u
ii
L2(0,T)= 1
n h(∇F
0( 1 n
n
X
i=1
u
i) − ∇F
0( 1 n
n
X
i=1
E [u
i])) ,
n
X
i=1
ˆ
u
ii
L2(0,T)+ 1
n h∇F
0( 1 n
n
X
i=1
E (u
i)) ,
n
X
i=1
ˆ
u
ii
L2(0,T)≤ c n
2k
n
X
i=1
ˆ
u
ik
2L2(0,T)+ 1
n h∇F
0( 1 n
n
X
i=1
E (u
i)) ,
n
X
i=1
ˆ
u
ii
L2(0,T), )
225
where u ˆ
iis defined in (3.4). Taking the expectation of (3.8), E h∇F
0( 1
n
n
X
i=1
E [u
i]) ,
n
X
i=1
ˆ
u
ii
L2(0,T)!
= 0
, and using the mutual independence of the controls and their boundedness we get as
226
in (3.5):
227
(3.9) c
n
2E k
n
X
i=1
ˆ
u
ik
2L2(0,T)= c n
2Z
T 0n
X
i=1
Var(u
it)dt ≤ c n T M
2.
228
Let u ˜
∗denote a minimizer of J ˜ on U ˆ , then using (3.3), (3.9) and (3.7), for any u
0∈ U
229
we have:
230
(3.10) J (˜ u
∗) ≤ J ˜ (˜ u
∗) + c
n T M
2≤ J ˜ (u
0) + c
n T M
2≤ J (u
0) + 2c n T M
2.
231
Thus for ε = 2cT M
2/n , u ˜
∗constitutes an ε -optimal solution to the stochastic control
232
problem (P
1) .
233
Proposition 3.7. If F
0is convex, then we have the following inequalities:
234
(3.11) J(˜ u) − J(˜ ˜ u) ≥ J(˜ u) − J(u
∗) ≥ 0,
235
where u ˜ and u
∗are respectively the optimal controls of problems (P
2) and (P
1).
236
Proof. From Jensen inequality and by definition of u ˜ we have:
237
(3.12) J (u
∗) ≥ J(u ˜
∗) ≥ J ˜ (˜ u),
238
therefore from the two previous inequalities and adding J (˜ u) we get (3.12).
239
Remark 3.8. An approximation scheme to compute u ˜ is provided in Section 5.
240
The practical interest of inequality (3.11) is that one can compute an upper bound
241
for the error J (˜ u)−J (u
∗) , that can be automatically derived from this approximation.
242
4. Dualization and Decentralization of problem (P
2). From now on, in
243
addition to Assumptions 2.1 and 2.3, the assumption that F
0is convex is in force in
244
the sequel. The problem (P
2) defined in (1.2) is dualized in order to decouple the
245
controls in this problem.
246 247
The optimization problem (P
2) is equivalent to:
248
(4.1) (P
3)
min
u∈U,v∈V
J(u, v), ¯ J(u, v) := ¯ F
0(v) + 1
n E
n
X
i=1
F
i(u
i, X
i,ui)
! , s.t g(u, v) = 0,
249
where g(u, v) := 1 n
n
X
i=1
E (u
i) − v and V := {ν ∈ L
2(0, T )|; |v(t)| ≤ 2M ∀t ∈ [0, T ]} .
250
The Lagrangian function associated with the constrained optimization problem (P
3)
251
is: L : U × L
2(0, T ) × L
2(0, T ) − → ¯
R defined by:
252
(4.2) L(u, v, λ) := ¯ J(u, v) + hλ, 1 n
n
X
i=1
E (u
i) − vi
L2(0,T).
253
The dual problem (D) associated with (P
3) is:
254
(4.3) (D) max
λ∈L2(0,T)
W(λ), where W(λ) := min
u∈U,v∈V
L(u, v, λ).
255
The problem is said to be qualified if it is still feasible after a small perturbation of
256
the constraint, in the following sense:
257
(4.4) There exists ε > 0 such that B
L2(0,T)(0, ε) ⊂ g(U , V ),
258
where B
L2(0,T)(0, ε) is the open ball of radius ε in L
2(0, T ) and g has been defined in
259
(4.1).
260
Lemma 4.1. Problem (P
3) is qualified.
261
Proof. Choose ε := M . Then: B
L2(0,T)(0, ε) ⊂ B
L2(0,T)(0, 2M ) = g(0, V) ⊂
262
g(U , V) , where g(0, V) and g(U , V) are respectively the image by g of {0} × V and
263
U × V . The conclusion follows.
264
By Assumption 2.3, Lemma 4.1 and the convexity of F
0, the strong duality holds:
265
W(λ
∗) = ˜ J (u
∗) , where λ
∗∈ arg max
λ∈L2(0,T)
W(λ) and u
∗∈ arg min
u∈U,v∈V
L(λ
∗, u, v).
266
Since the set of admissible controls U = U
1× . . . × U
nis a Cartesian product and by strict convexity of u
i7→ E (F
i(u
i, X
i,ui)) , each component u
∗ican be uniquely determined by solving the following sub problem:
u
∗i= arg min
ui∈Uˆi
n E
F
i(u
i, X
i,ui) + hλ
∗, u
ii
L2(0,T)o , where U ˆ
i:= {u ∈ U
i| u
iis F
i− adapted } .
267
Remark 4.2. By using the same argument as in Theorem 3.2, one can prove:
268
(4.5)
min
ui∈Uˆi
n E
F
i(u
i, X
i,ui) + hλ
∗, u
ii
L2(0,T)o
= min
ui∈Ui
n E
F
i(u
i, X
i,ui) + hλ
∗, u
ii
L2(0,T)o .
269
5. Stochastic Uzawa and Sampled Stochastic Uzawa algorithms.
270
5.1. Continuous time setting. We recall that Assumptions 2.1 and 2.3 are in
271
force, as well as convexity of F
0.
272
This section aims at proposing an algorithm to find a solution of the dual problem
273
(4.3).
274
For all i ∈ {1, . . . n} , and λ ∈ L
2(0, T ) , we define the optimal control u
i(λ) :
275
(5.1) u
i(λ) := arg min
ui∈Uˆi
n E
F
i(u
i, X
i,ui) + hλ, u
ii
L2(0,T)o ,
276
which is well defined since u
i→ E (F
i(u
i, X
i,ui)) is strictly convex.
277
For any λ ∈ L
2(0, T ) , the subset V (λ) is defined by:
278
(5.2) V (λ) := arg min
v∈V
{F
0(v) − hλ, vi
L2(0,T)}.
279
Since F
0is convex and V is bounded, V (λ) is a non empty subset of V and is reduced
280
to a singleton if F
0is strictly convex.
281
For any λ ∈ L
2(0, T ) , a function v(λ) , which is a selection of V (λ) , is associated.
282
Uzawa’s algorithm seems particularly adapted for this problem. However at each
283
dual iteration k and any i ∈ {1, . . . , n} , for the update of λ
k+1, one would have to
284
compute the quantities E [u
i(λ
k)] , which is hard in practice. Therefore two algorithms
285
are proposed where at each iteration k , λ
k+1is updated thanks to a realization of
286
u
i(λ
k) .
287
For any real valued function F defined on L
2(0, T ) , F
∗stands for its Fenchel
288
conjugate.
289
Lemma 5.1. Assumption 3.5 holds iff F
0∗is proper and strongly convex.
290
Proof. (i) Let Assumption 3.5 hold. Since F
0is proper, convex and l.s.c., F
0∗291
is l.s.c. proper. From the Lipschitz property of the gradient of F
0, it holds that
292
dom (F
0) = L
2(0, T ) .
293
Let s, s ˜ ∈ dom (F
0∗) such that there exist λ
s∈ ∂F
0∗(s) and µ
˜s∈ ∂F
0∗(˜ s) . From
294
the differentiability, l.s.c. and convexity of F
0, it follows that: s = ∇F
0(λ
s) and
295
˜
s = ∇F
0(µ
˜s) . By Assumption 3.5 and the extended Baillon-Haddad theorem [16,
296
Theorem 3.1], ∇F
0is cocoervice. In other words:
297
(5.3)
hs − s, λ ˜
s− µ
s˜i
L2(0,T)= h∇F
0(λ
s) − ∇F
0(µ
˜s), λ
s− µ
˜si
L2(0,T)≥ 1
c k∇F
0(λ
s) − ∇F
0(µ
s˜)k
2L2(0,T)= 1
c ks − sk ˜
2L2(0,T).
298
Therefore ∂F
0∗is strongly monotone, which implies the strong convexity of F
0∗.
299
(ii) Conversely, assume that F
0∗is proper and strongly convex. Then there exist
300
α, β > 0 such that for any s ∈ dom (F
0∗) : F
0∗(s) ≥ αksk
2L2(0,T)− β , and F
0being
301
convex, l.s.c. and proper, for any λ ∈ L
2(0, T ) it holds:
302
(5.4) F
0(λ) ≤ sup
s∈L2(0,T)
hs, λi
L2(0,T)− αksk
L2(0,T)2+ β = kλk
2/α + β.
303
Thus F
0is proper and uniformly upper bounded over bounded sets and therefore is
304
locally Lipschitz. In addition, from the strong convexity of F
0∗and the convexity
305
of F
0, for any λ ∈ L
2(0, T ) , ∂F
0(λ) is a singleton. Thus F
0is everywhere Gâteaux
306
differentiable.
307
Let λ, µ ∈ L
2(0, T ) . Since F
0∗is strongly convex, the functions F
0∗(s)−hλ, si
L2(0,T) 308(resp. F
0∗(s) − hµ, si
L2(0,T)) has a unique minimum point s
λ(resp. s
µ), characterized
309
by: λ ∈ ∂F
0∗(s
λ) and µ ∈ ∂F
0∗(s
µ) . From the strong convexity of F
0∗, the strong
310
monotonicity of ∂F
0∗holds: hµ − λ, s
µ− s
λi
L2(0,T)≥ 1
c ks
µ− s
λk
2L2(0,T), where c > 0
311
is a constant related to the strong convexity of F
0∗. Using that s
λ= ∇F
0(λ) and
312
s
µ= ∇F
0(µ) , it holds:
313
(5.5) h µ − λ, ∇F
0(µ) − ∇F
0(λ) i
L2(0,T)≥ 1
c k∇F
0(µ) − ∇F
0(λ)k
2L2(0,T),
314
meaning that ∇F
0is cocoercive. Applying the Cauchy–Schwarz inequality to the left
315
hand side of the previous inequality, the Lipschitz property of ∇F
0follows.
316
Lemma 5.2. If Assumption 3.5 holds, then W is strongly concave.
317
Proof. For any λ ∈ L
2(0, T ) , we have: W(λ) = − F
0∗(λ) + 1 n
n
X
i=1
F
i∗(λ) , where
318
for any i ∈ {1, . . . , n} , F
i∗is convex and from Lemma 5.1 F
0∗is strongly convex.
319
We introduce the function f : L
2(0, T ) → L
2(0, T ) where for any λ ∈ L
2(0, T ) :
320
(5.6) f(λ) := 1
n
n
X
i=1
E (u
i(λ)) − v(λ).
321
From the boundedness of U and V , it easily follows that there exists a finite positive
322
real M
1such that for any λ ∈ L
2(0, T ) :
323
(5.7) kf (λ)k
2L2(0,T)≤ M
1.
324
For any λ ∈ L
2(0, T ) , we denote by ∂(−W (λ)) the subgradient of −W at λ . Therefore
325
for any λ ∈ L
2(0, T ) :
326
(5.8) ∂(−W(λ)) 3 −f (λ).
327
The iterative algorithm, proposed as an approximation scheme for λ
∗∈ arg max
λ
W(λ) ,
328
is summarized in the Stochastic Uzawa Algorithm 5.1.
329
Algorithm 5.1 Stochastic Uzawa
1:
Initialization λ
0∈ L
2(0, T ) , set {ρ
k} satisfying Assumption 5.4.
2:
k ← 0 .
3:
for k = 0, 1, . . . do
4:
v
k← v(λ
k) where v(λ
k) ∈ V (λ
k) , this set being defined in (5.2).
5:
u
i,k← u
i(λ
k) where u
i(λ
k) is defined in (5.1) for any i ∈ {1, . . . , n} .
6:
Generate n independent realizations of Brownian motions (W
1,k+1, . . . , W
n,k+1) , independent also with {W
i,p: 1 ≤ i ≤ n, p ≤ k} .
7:
Compute the associated state realizations (X
1,u1(λk), . . . , X
n,un(λk)) .
8:
Y
k+1← 1 n
n
X
i=1
u
i(λ
k)(W
i,k+1) − v(λ
k) .
9:
λ
k+1← λ
k+ ρ
kY
k+1.
Remark 5.3. For the purpose of notation, u
i(λ
k)(W
i,k+1) in (8) corresponds to
330
the realization of u
i(λ
k) resulting from a realization of the Brownian W
i,k+1.
331
At any dual iteration k of Algorithm 5.1, Y
k+1is an estimator of
332
E ( 1 n
n
X
i=1
u
i(λ
k)(W
i,k+1) − v(λ
k)) . Therefore an alternative approach proposed in the
333
Sampled Stochastic Uzawa Algorithm 5.2 consists in performing less simulations at
334
each iteration, by taking m < n , at the risk of performing more dual iterations, to
335
estimate the quantity E ( 1 n
n
X
i=1
u
i(λ
k)(W
i,k+1) − v(λ
k)) .
Algorithm 5.2 Sampled Stochastic Uzawa
1:
Initialization of m a positive integer and λ ˇ
0∈ L
2(0, T ) , set {ρ
k} satisfying As- sumption 5.4.
2:
k ← 0 .
3:
for k = 0, 1, . . . do
4:
v
k← v(ˇ λ
k) where v(ˇ λ
k) ∈ V (ˇ λ
k) , this set being defined in (5.2).
5:
Generate m i.i.d discrete random variables I
1k, . . . , I
mkuniformly in {1, . . . , n} .
6:
u
Ikj,k← u
Ijk(ˇ λ
k) where u
Ijk(ˇ λ
k) is defined in (5.1) for any j ∈ {1, . . . , m} .
7:
Generate m independent realizations of Brownian motions (W
I1k,k+1, . . . , W
Imk,k+1) , independent also with {W
i,p: 1 ≤ i ≤ m, p ≤ k} .
8:
Compute the associated state realizations (X
Ik1,uIk1(ˇλk), . . . , X
Imk,uIkm(ˇλk)) .
9:
Y ˇ
k+1← 1 m
m
X
j=1
u
Ikj(ˇ λ
k)(W
Ijk,k+1) − v(ˇ λ
k)
10:
ˇ λ
k+1← λ ˇ
k+ ρ
kY ˇ
k+1.
336
The complexity of the Stochastic Uzawa Algorithm 5.2 is proportional to m × K ,
337
where K is the total number of dual iterations and m the number of simulations
338
performed at each iteration. The error E (kλ
k+1− λ
∗k
2) for λ
∗∈ S is the sum of
339
the square of the bias (which only depends on K and not on m ) and the variance
340
(which both depends on K and m ). Therefore this algorithm enables a bias variance
341
trade-off for a given complexity. Similarly for a given error it enables to optimize the
342
complexity of the algorithm.
343
Some assumptions on the step size are introduced.
344
Assumption 5.4. The sequence (ρ
k)
kis such that: ρ
k> 0 ,
∞
X
k=1
ρ
k= ∞ and
345
∞
X
k=1
(ρ
k)
2< ∞ .
346
Note that a sequence of the form ρ
k:= a
b + k , with (a, b) ∈ R
∗+× R
+, satisfies
347
Assumption 5.4.
348
Let us denote S := arg max
λ∈L2(0,T)
W(λ) , where S is nonempty because of the strong
349
convexity of W .
350
The following result establishes the convergence of the Stochastic Uzawa Algo-
351
rithm 5.1:
352
Theorem 5.5. Let Assumption 5.4 hold, then:
353
(i) {kλ
k− λk
2L2(0,T)} converges a.s., for all λ ∈ S.
354
(ii) W(λ
k) − −−− →
k→∞
max
λ∈L2(0,T)
W(λ) a.s.
355
(iii) {λ
k} weakly converges to some λ ¯ ∈ S in L
2(0, T ) a.s.
356
(iv) If Assumption 3.5 holds, then a.s. {λ
k} converges to ¯ λ in L
2(0, T ), with
357
S := { ¯ λ}.
358
Though the proof is similar to [6, Theorem 3.6], the current framework is different
359
from the one of that reference, and for the convenience of the reader we provide the
360
proof.
361
We first state two lemmas.
362
Lemma 5.6 (Robbins-Siegmund). Let {G
k} be an increasing sequence of σ-
363
algebra and d
k, a
k, b
kand c
kbe nonnegative random variables adapted to G
k. Assume
364
that: E (d
k+1|G
k) ≤ d
k(1 + a
k) + b
k− c
kand
∞
X
k=1
a
k< ∞ a.s.,
∞
X
k=1
b
k< ∞ a.s. Then
365
with probability one, {d
k} is convergent and it holds that
∞
X
k=1
c
k< ∞.
366
Proof. See [5], Theorem 1.3.12.
367
Lemma 5.7. Let {α
k} be a nonnegative deterministic sequence and {β
k} a non-
368
negative random sequence adapted to {G
k}. Assume that
∞
X
k=1
α
k= ∞ a.s. and
369
E (
∞
X
k=1
α
kβ
k) < ∞ a.s. Moreover assume that β
k− E (β
k+1|G
k) ≤ cα
ka.s. for all
370
k and some c > 0. Then β
k−−→
a.s.0.
371
Proof. See [6], Proposition 3.2.
372
Proof of Theorem 5.5. First consider point (i). Let λ ∈ S . For any k , G
k+1is the
373
filtration defined by:
374
(5.9) G
k+1:= σ {W
i,p} : 1 ≤ i ≤ n, p ≤ k + 1}
.
375
Using the definition of Y
k+1∈ L
2(0, T ) line 8 in the Stochastic Uzawa Algorithm 5.1,
376
we have:
377
(5.10)
kλ
k+1− λk
2L2(0,T)= kλ
k+ ρ
kY
k+1− λk
2L2(0,T)= kλ
k− λk
2L2(0,T)+ 2ρ
khλ
k− λ, Y
k+1i
L2(0,T)+(ρ
k)
2kY
k+1k
2L2(0,T).
378
Since Y
k+1is independent from G
k, and using (5.7), it follows that:
379
(5.11) E (kY
k+1k
2L2(0,T)|G
k) = E k 1 n
n
X
i=1
u
i(λ
k)(W
i,k+1) − v(λ
k)k
2L2(0,T)!
≤ M
1380
Since λ
kis G
k-measurable and that E [Y
k+1|G
k] = f (λ
k) , we have that:
381
(5.12)
E [kλ
k+1− λk
2L2(0,T)|G
k]
= kλ
k− λk
2L2(0,T)+ 2ρ
kE (hλ
k− λ, Y
k+1i|G
k) + (ρ
k)
2E [kY
k+1k
2L2(0,T)|G
k]
≤ kλ
k− λk
2L2(0,T)+ 2ρ
khλ
k− λ, f(λ
k)i + (ρ
k)
2M
1≤ kλ
k− λk
2L2(0,T)+ (ρ
k)
2M
1− 2ρ
k(W(λ) − W(λ
k)).
382
In the last inequality, we used the concavity of W and (5.8). We set:
383
(5.13) a
k= 0, b
k= (ρ
k)
2M
1, c
k= 2ρ
k(W(λ) − W(λ
k)),
384
We have that
∞
X
k=1
a
k< ∞ a.s. and
∞
X
k=1
b
k< ∞ a.s. Clearly, a
kand b
kare nonnega-
385
tive; c
kis nonnegative since λ ∈ S . By Lemma 5.6, the sequence {kλ
k− λk
2L2(0,T)}
386
converges a.s. Now we show point (ii) thanks to Lemma 5.7.
387
By Lemma 5.6:
∞
X
k=1
ρ
k(W(λ) − W (λ
k)) < ∞ a.s . Taking the expected value in
388
both side of (5.12), we get, using the deterministic version of Lemma 5.6 that:
389
E
∞
X
k=1
ρ
k(W(λ) − W(λ
k))
!
< ∞ . By concavity of W and the Cauchy-Schwarz in-
390
equality, we have:
391
(5.14) W(λ
k+1) − W(λ
k) ≤ hf (λ
k), λ
k+1− λ
ki ≤ ρ
kkf(λ
k)k kY
k+1k.
392
Let τ
Mbe the stopping time τ
M:= inf{k : kλ
kk > M } for M ∈ N.
393
The sequence {β
k} is defined by:
394
(5.15) β
k:=
W(λ) − W(λ
k) if {τ
M> k},
W(λ) − W(λ
τM) otherwise , with β
τM+k= β
τM, k ≥ 1.
395
Notice that if kλ
kk ≤ M , there exists by (5.11) and (5.7) M
0> 0 such that
396
(5.16) kf (λ
k)k E (kY
k+1k|G
k) ≤ (M
0)
2.
397
Now: β
k−β
k+1= 1
τM>k(W(λ
k+1) − W(λ
k)) , and therefore by taking the conditional
398
expectation on both sides, noticing that 1
τM>kis G
k-measurable, and considering
399
(5.14) and (5.16), we get β
k− E (β
k+1|G
k) ≤ ρ
k(M
0)
2.
400
By Lemma 5.7, on the set B
M:= {τ
M= ∞} , β
kconverges to 0 and coincides with
401
W(λ) − W(λ
k) . Since kλ
k− λk converges a.s, kλ
kk is bounded in probability and
402
therefore the probability of the set B
Mca be made arbitrarily close to 1 by choosing
403
M large. Since P (∪
∞M=1B
M) = 1 , we may infer that W(λ) − W (λ
k) converges to 0
404
almost surely.
405 406
For point (ii), since {kλ
k− λk
2} converges a.s. for all λ ∈ S , it is bounded
407
in probability, so the sequence {λ
k} generated by the algorithm has a.s. a weak
408
accumulation point λ ¯ (the point ¯ λ is random in general). Let {λ
km} such that λ
km*
409
λ ¯ . Since W is concave and upper semi-continuous, it is weakly upper semi-continuous,
410
(5.17) W(¯ λ) ≥ lim
m→∞
W(λ
km) = W(˜ λ).
411
In particular, λ ¯ ∈ S . To show uniqueness, let λ
1, λ
2∈ S be two distinct weak limits
412
of {λ
k} , i.e λ
km* λ
1and λ
kl* λ
2. Then
413
(5.18) kλ
km− λ
2k
2= kλ
km− λ
1k
2+ kλ
1− λ
2k
2+ 2hλ
km− λ
1, λ
1− λ
2i,
414 415
(5.19) kλ
kl− λ
1k
2= kλ
k1− λ
2k
2+ kλ
2− λ
1k
2+ 2hλ
kl− λ
2, λ
2− λ
1i,
416
so by weak convergence of each subsequence, (5.18) and (5.19) are combined to obtain
417
(5.20) lim
m→∞
kλ
km− λ
2k
2− kλ
km− λ
1k
2= kλ
2− λ
1k
2,
418 419
(5.21) lim
l→∞
kλ
kl− λ
1k
2− kλ
kl− λ
2k
2= kλ
2− λ
1k
2.
420
By a.s. convergence of the sequence {kλ
k− λk
2} for all λ ∈ S , the limit of each
421
subsequence is equal to the limit of the entire sequence with probability one, so
422
m→∞
lim kλ
km− λ
1k
2= lim
k→∞
kλ
k− λ
1k
2=: l
1and similarly lim
m→∞
kλ
km− λ
2k
2=
423
lim
k→∞
kλ
k−λ
2k
2=: l
2. Therefore (5.20) and (5.21) imply l
2− l
1= kλ
1− λ
2k
2= l
1−l
2,
424
meaning kλ
1− λ
2k
2= 0 and thus the weak limits coincide. Therefore {λ
k} is weakly
425
convergent to a unique limit with probability one.
426
Finally, the last statement can now be proved. By strong convexity, −W has an
427
unique minimum λ ¯ , so S = { λ} ¯ . By strong convexity, there exists a µ > 0 such that
428
(5.22) W(¯ λ) − W(λ
k) ≥ −hf (¯ λ), λ
k− ¯ λi
L2(0,T)+ µ
2 kλ
k− λk ¯
L2(0,T).
429
Since −hf (¯ λ), λ
k− ¯ λi
L2(0,T)> 0 , by optimality of λ ¯ , lim
k→∞
W(¯ λ) − W(λ
k) = 0 a.s.
430
implies lim
k→∞
kλ
k− λk ¯
L2(0,T)= 0 a.s.
431
We recall the definition of J(u, v) ¯ in (4.1) and we define u ¯ :
432
(5.23) u ¯ := arg min
u∈U
( E
n
X
i=1
F
i(u
i, X
i,ui) + h λ, u ¯
ii
L2(0,T)!) ,
433
If F
0is strictly convex, then we define:
434
(5.24) ¯ v := arg min
v∈V
F
0(v) + h λ, vi ¯
L2(0,T).
435
If Assumption 3.5 holds and F
0is strictly convex, (¯ u, v, ¯ ¯ λ) is a saddle point and u ¯ is
436
the unique minimizer of J ˜ in U .
437