Extended McKean-Vlasov optimal stochastic control applied to smart grid management

Academic year: 2021

HAL Id: hal-02181227

https://hal.archives-ouvertes.fr/hal-02181227v2

Preprint submitted on 12 Jan 2021

Emmanuel Gobet, Maxime Grangereau

To cite this version:

Emmanuel Gobet, Maxime Grangereau. Extended McKean-Vlasov optimal stochastic control applied to smart grid management. 2021. ⟨hal-02181227v2⟩

Extended McKean-Vlasov optimal stochastic control applied to smart grid management*†

Emmanuel GOBET‡ and Maxime GRANGEREAU§

Abstract

We study the mathematical modeling of the energy management system of a smart grid, related to an aggregated consumer equipped with renewable energy production (e.g. PV panels) and storage facilities (batteries), and connected to the electrical public grid. The consumer controls the use of the storage facilities in order to reduce the random fluctuations of his residual load on the public grid, so that intermittent renewable energy is better used, leading globally to a much greener carbon footprint. The optimization problem is described in terms of an extended McKean-Vlasov stochastic control problem. Using the Pontryagin principle, we characterize the optimal storage control as the solution of a certain McKean-Vlasov Forward-Backward Stochastic Differential Equation (possibly with jumps), for which we prove existence and uniqueness. Quasi-explicit solutions are derived when the cost functions may not be linear-quadratic, using a perturbation approach. Numerical experiments support the study.

1 Introduction

General context in energy management. The energy sector is currently facing major changes because of the rising concern about climate change, the search for energy efficiency and the need to reduce carbon footprint. In particular, the share of renewable energy (RE for short) production has increased in most industrialized countries over the last few years, and further effort has to be made to limit the temperature increase to well below 2°C by 2100, as targeted by the 2015 Paris agreement. However, even if these renewable energies allow a huge reduction of carbon footprint during the energy production phase, they raise a major issue: the amount of energy produced is intermittent and uncertain, in contrast with more conventional energy production units (coal/gas-fired units, or nuclear power plants).

Reducing uncertainty of net residual consumption. Since electricity production has to meet consumption at all spatial and time scales, the load balancing operations become harder in this uncertain context, which leads to higher operating costs for the whole electricity system; furthermore, it sometimes leads to ecologically catastrophic solutions such as the use of coal units to compensate for the deficit of clean energy production. See [Mor+14] for an overview on how to integrate renewables in electricity markets. Therefore, a major challenge is to smooth the electricity consumption by better predicting RE production and better managing the energy system. We address the latter in the context of a consumer equipped with its own RE production (e.g. PV panels), and formalize the problem as a stochastic control problem of McKean-Vlasov (MKV for short) type that we solve theoretically and numerically.

*This research is part of the Finance for Energy Market (FiME) Lab (Institut Europlace de Finance) and the ANR project CAESARS (ANR-15-CE05-0024). The work benefits from the support of the Siebel Energy Institute (Calls for Proposals #2, 2016).

†This work has been presented at the FOREWER conference Paris-June 2017, at the MCM2017 conference Montreal-July 2017, at the conference "Stochastic control, BSDEs and new developments" Roscoff-September 2017, at the conference "Advances in Stochastic Analysis for Risk Modeling" CIRM-November 2017, at the Workshop "Mean-Field Games, Energy and Environment" London-February 2018, at the conference "Advances in Modelling and Control for Power Systems of the Future" Paris-September 2018, at the conference "APS Informs" Brisbane-July 2019, at the MCM2019 conference Sydney-July 2019. The authors wish to thank the participants for their feedback.

‡Email: emmanuel.gobet@polytechnique.edu. Centre de Mathématiques Appliquées (CMAP), CNRS, Ecole Polytechnique, Institut Polytechnique de Paris, Route de Saclay, 91128 Palaiseau Cedex, France. Corresponding author.

§Email: maxime.grangereau@edf.fr. Electricité de France, Department OSIRIS, 7 Boulevard Gaspard Monge, 91120 Palaiseau cedex,

More specifically, we study a decentralized mechanism aimed at reducing the variability of residual consumption on the electricity network; thus, the network could be operated at lower costs and with a lower carbon footprint. In this mechanism, a consumer has to commit in advance (say $T$ = one day ahead, to match the usual working of day-ahead markets) to a predefined load profile and then has to command his system optimally and dynamically according to his stochastic consumption/production. Both the optimal load profile and the optimal control are the outputs of the stochastic control problem described below. The above model is a simplified prototype of a smart grid (as defined by the European Commission¹): our so-called consumer is considered as an association of small consumers, with possibly individual RE production and individual storage facilities, that we aggregate and consider as a whole.

General setting and methodology. We take the point of view of a consumer supplied with energy by its own intermittent sources (PV panels for instance) and by the electrical public grid. We consider the situation where the non-flexible consumption and the intermittent production are exogenous and cannot be predicted perfectly: a stochastic model should be used for both of them. See [Bad+18] about a recent methodology for deriving a probabilistic forecast for solar irradiance (and thus PV production). To smooth his residual consumption, the consumer can take advantage of storage facilities (for instance conventional batteries, electrical vehicle batteries, heating network, flywheel, etc.) which we consider as a whole. At time $t$, his control is denoted by $u_t$, the level of storage is represented by $X^u_t$, and its net consumption on the electrical public grid is $P^{\text{grid},u}_t$. The (deterministic) committed load profile is the curve $(P^{\text{grid,com.}}_t : 0 \le t \le T)$. Optimal control of a single micro-grid has already been considered in the literature, without the optimal committed load profile. A popular approach, albeit without theoretical optimality guarantee, is Model Predictive Control [SSM16]. In discrete-time settings, Stochastic Dynamic Programming [IMM14; Wu+16] and Stochastic Dual Dynamic Programming [Pac+18] are popular approaches to get theoretical optimality guarantees. Long-term aging of the battery equipping a micro-grid is taken into account by a two-time-scale decomposition in [Car+19]. Continuous-time optimal control problems are considered in [Hey+15] in a deterministic setting, and in [Hey+16] in a stochastic environment. By jointly optimizing with the profile $P^{\text{grid,com.}}$, we change the nature of the stochastic control problem, compared to these works. We shall consider general filtrations with processes possibly exhibiting jumps, to account for sudden variations of solar irradiance or consumption for instance.

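In short, in a simplified setting, the optimization criterion takes the form of the following cost functional

$$\mathbb{E}\left[\int_0^T \left( C_t P^{\text{grid},u}_t + \frac{\mu}{2}u_t^2 + \frac{\nu}{2}\left(X^u_t - \frac12\right)^2 + l_1\left(P^{\text{grid},u}_t - P^{\text{grid,com.}}_t\right)\right)dt + \frac{\gamma}{2}\left(X^u_T - \frac12\right)^2\right],$$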

minimized over admissible controls $(u_t)_t$. The first term in the above cost functional is the cost of buying electricity from the electrical public grid, at a price $C_t$ which can be random. The second term accounts for a penalization of the use of the storage (e.g. aging cost in the case of a battery). The third and fifth terms penalize the deviation from the desired state of charge of the storage, which we define as $\frac12$ by convention. The fourth term is a penalization (through a convex loss function $l_1$) of the deviation of the power supplied by the electrical public grid $P^{\text{grid},u}$ from the commitment profile $P^{\text{grid,com.}}$. If the latter were exogenously given, the problem would take the form of a standard stochastic control problem. In our model, it is endogenous and we set

$$P^{\text{grid,com.}}_t = \mathbb{E}\left[P^{\text{grid},u}_t\right]. \tag{1.1}$$

This choice is inspired by the quadratic case for $l_1$: indeed, solving the optimal stochastic control for a given $P^{\text{grid,com.}}$, and then minimizing the resulting cost functional over $P^{\text{grid,com.}}$ would lead to (1.1), as the reader can easily check. Doing so, we obtain a stochastic control problem of MKV type, see later.
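The check left to the reader can be sketched as follows, under the illustrative assumptions that $l_1(x) = \alpha x^2$ with $\alpha > 0$ and that $P^{\text{grid},u}_t$ is square integrable for a fixed admissible control $u$. Only the fourth term of the cost functional depends on the deterministic commitment, and for each $t$ and each constant $c \in \mathbb{R}$ a bias-variance decomposition gives

$$\mathbb{E}\left[\alpha\left(P^{\text{grid},u}_t - c\right)^2\right] = \alpha\,\mathrm{Var}\left(P^{\text{grid},u}_t\right) + \alpha\left(\mathbb{E}\left[P^{\text{grid},u}_t\right] - c\right)^2,$$

which is minimal exactly at $c = \mathbb{E}[P^{\text{grid},u}_t]$. Since a joint minimization over the control and the commitment profile can be carried out in either order, the optimal commitment associated with a control $u$ is the one given in (1.1).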

Going back to the applications, once the optimal control $(u_t : 0 \le t \le T)$ has been identified, the consumer can commit to the profile $P^{\text{grid,com.}}$ as in (1.1) and then execute the optimal command, so that the variability of its residual consumption on the electrical public grid is minimized in a consistent way. On the side of the electricity supplier on the electrical public grid, since the consumption is smoothed, the operating costs are lower and the use of "brown" generation units can likely be avoided. We shall highlight that, presumably, good loss functions $l_1$ should penalize the consumption exceedance more than the consumption deficit: indeed, exceedance possibly requires the use of extra production units with high carbon footprint, which is clearly to be avoided as often as possible. A typical example of loss function would be:

$$l_1(x) = \alpha x^2 + \alpha_+ \max(x, 0)^2; \tag{1.2}$$

see Figure 1 for an example with $\alpha_+ = 1$, $\alpha = 1$. This choice is somehow related to generalized risk measures accounting for both left and right tails of the distribution, such as expectiles [Bel+14]. Another point to stress is the need to account for jumps in the production/consumption dynamics: the consumption might have discontinuities as appliances/devices are switched on/off, and the power production of a solar panel might suddenly drop to zero if a cloud hides the sun. To summarize, in order to fit application needs, we shall consider non-quadratic loss functions and a probabilistic setting of general filtration (allowing jumps).

Figure 1: Loss function $l_1$ penalizing more the consumption exceedance
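The asymmetry of the loss (1.2) can be made concrete with a few lines of Python. This is only a minimal sketch: the function names `asymmetric_loss` and `asymmetric_loss_derivative` and the default values of the parameters are illustrative choices, not taken from the paper.

```python
def asymmetric_loss(x, alpha=1.0, alpha_plus=1.0):
    """Loss (1.2): alpha * x^2 + alpha_plus * max(x, 0)^2.

    x > 0 is a consumption exceedance with respect to the commitment,
    x < 0 is a consumption deficit.
    """
    return alpha * x * x + alpha_plus * max(x, 0.0) ** 2


def asymmetric_loss_derivative(x, alpha=1.0, alpha_plus=1.0):
    """Derivative of (1.2); it is continuous, so the loss is C^1."""
    return 2.0 * alpha * x + 2.0 * alpha_plus * max(x, 0.0)


if __name__ == "__main__":
    # Same absolute deviation, different penalty on each side.
    for deviation in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(deviation, asymmetric_loss(deviation))
```

With $\alpha = \alpha_+ = 1$, an exceedance of one unit is penalized twice as much as a deficit of one unit, while the derivative remains Lipschitz continuous.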

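MKV stochastic control problems: background results. We embed the previous example in a more general setting:

$$\left.\begin{aligned} \mathcal{J}(u) &:= \mathbb{E}\left[\int_0^T l\Big(t, \omega, u_t, X^u_t, \mathbb{E}\big[g(t, \omega, u_t, X^u_t)\big]\Big)dt + \psi\Big(\omega, X^u_T, \mathbb{E}\big[k(\omega, X^u_T)\big]\Big)\right] \\ \text{s.t. } X^u_t &= x_0 + \int_0^t \phi(s, \omega, u_s, X^u_s)\,ds \end{aligned}\right\} \longrightarrow \min_u. \tag{1.3}$$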

The functions $l, g, \psi, k, \phi$ depend on time, control, state variable and on the ambient randomness $\omega$; precise assumptions are given later. Note that the control only appears in the drift of the state variable: we could also have considered a more general model $X^u_t = x_0 + \int_0^t \phi(s, \omega, u_s, X^u_s)\,ds + Z_t$ where $Z$ is a càdlàg semimartingale (independent of $u$), but actually, this extended model is equivalent to the current one by setting $\tilde X^u_t = X^u_t - Z_t$ as a new state variable and by adjusting the (already random) coefficients. Besides, note that the above dynamics for $X^u$ is compatible with usual battery dynamics [Hus+07], like for example models of the form

$$\frac{d\,(\text{State of charge})}{dt} = \text{constant} \cdot \text{Battery power}. \tag{1.4}$$

The problem (1.3) is of McKean-Vlasov (MKV) type since the distribution of $(u, X^u)$ enters into the cost functional. But since this is through generalized moments via the functions $g$ and $k$, the interactions are so-called scalar, which avoids using the notion of derivatives with respect to probability measures, while maintaining some interesting flexibility. For a full account on control of Stochastic Differential Equations (SDE for short) of MKV type and the link with Mean Field Games, see the recent books [CD18] and in particular Chapter 6 of Volume I. However, in the above reference, only the distribution of the SDE enters in the coefficients, not that of the control as in our setting. We refer to this more general setting as extended MKV stochastic optimal control.

Studies in such an extended framework are quite unusual in the literature. In [PW16], the general discrete case is studied. In [Yon13] and very recently in [BP18], both the probability distributions of the state and control variables appear in the dynamics of the state and the cost function, but only through their first and second order moments (Linear-Quadratic problems, LQ for short). In [PW18], the cost functional and the dynamics both depend on the joint probability distribution of the state and control variables, but the authors consider closed-loop controls, which allows them to consider the probability distribution of the state variable only: in our setting, we do not make any Markovian assumptions for the characterization of the optimal control. During the preparation of this work (started in 2016), we became aware of the recent preprint [ABVC19] which also deals with extended MKV stochastic optimal control, with fully non-linear interaction and Markovian dynamics, in the case of a Brownian filtration.

Our contributions. In contrast with the previous references, we do not restrict ourselves to the LQ setting, we deal with extended MKV stochastic optimal control without Markovian assumptions, and we do not assume that the underlying filtration is Brownian (allowing jump processes). Besides, apart from "expected" results about existence/uniqueness, we provide some numerical approximations by using a perturbation analysis around the LQ case. We shall insist that MKV stochastic control is a very recent field and numerical methods are still in their infancy; see [Ang+19] for a scheme based on tree methods for solving some MKV Forward-Backward SDEs (FBSDE for short) that characterize optimal stochastic controls. Our perturbation approach is different from theirs. As a consequence, we design an effective numerical scheme to address the problem raised by the optimal management of storage facilities able to reduce the variability of residual electricity consumption on the electrical public grid, in the context of uncertain production/consumption of an aggregated consumer. This presumably opens the door to a wider use of these approaches in real smart grid applications.

Now let us go into the details of the mathematical/computational arguments. For characterizing the optimal control, we follow a quite standard methodology (see e.g. [CD15]), although details are quite different. This is done in three steps: necessary first order conditions, which become sufficient under additional convexity assumptions, and existence of solutions to the first order equations. The derivation of the first order conditions follows the stochastic Pontryagin principle, see for instance [Ben88; Pen90; CD15]. This is achieved for general running and terminal cost functions. In particular, to account for jumps in the production/consumption dynamics, our mathematical analysis is performed in the context of a general filtration. It gives rise to an optimality system (see Theorems 2.2 and 2.3), composed of a forward degenerate SDE and of a backward SDE (the adjoint equation), with possibly discontinuous martingale term, and an optimality condition linking the values and probability laws of the state and control variables with the adjoint variable.

In Section 2.4, we establish that this system of equations has a unique solution under some regularity conditions, an invertibility assumption and for a small time horizon $T$ (see Theorem 2.4). The condition on $T$ is quite explicit from the proof, which makes the verification on practical examples easy. Here the proof has to be specific and restricted to small time because of the non-Brownian filtration and the non-Markovian dynamics: indeed, we can invoke neither a drift-monotonicity condition, as in [PT99], nor a non-degeneracy condition, as in [DG06]. In Section 2.5, we discuss how the unique solution to the first order condition may or may not be the optimal solution; we provide a counter-example (Proposition 2.6), which is interesting in its own right. We believe that this kind of situation is already known but we could not find an appropriate reference.

Then we show in Section 2.7 that the necessary optimality conditions established in Theorem 2.3 become sufficient if we assume some convexity conditions on the Hamiltonian and the terminal cost. We shall highlight that the usual Hamiltonian [CD15] (when the distribution of the control is not optimized) can not match with our framework; alternatively, we define a version in expectation (Lemma 2.9). The final optimality result is stated in Theorem 2.10.

In Section 3, we apply our study to the toy model presented in the introduction, motivated by practical applications to smart grid management. To get a tractable and effective solution, we perform a perturbation approach around the LQ case. We establish error bounds and, as an approximation, we select the expansion with second order error terms. Numerical experiments illustrate the performance and accuracy of the method, as well as the behavior of the optimally controlled system.

Long and technical proofs are postponed to Section 4 in order to smooth the reading.

Notations. We list the most common notations used in all this work.

• Numbers, vectors, matrices. $\mathbb{R}$, $\mathbb{N}$, $\mathbb{N}^*$ denote respectively the set of real numbers, integers and positive integers. The notation $|x|$ stands for the Euclidean norm of a vector $x$, without further reference to its dimension. For a given matrix $A \in \mathbb{R}^p \otimes \mathbb{R}^d$, $A^\top$ refers to its transpose. Its norm is that induced by the Euclidean norm, i.e. $|A| := \sup_{x \in \mathbb{R}^d, |x| = 1} |Ax|$. Recall that $|A^\top| = |A|$. For $p \in \mathbb{N}$

• Functions, derivatives. When a function (or a process) $\psi$ depends on time, we write indifferently $\psi_t(z)$ or $\psi(t, z)$ for the value of $\psi$ at time $t$, where $z$ represents all other arguments of $\psi$.

For a smooth function $g : \mathbb{R}^q \to \mathbb{R}^p$, $g_x$ represents the Jacobian matrix of $g$ with respect to $x$, i.e. the matrix $(\partial_{x_j} g_i)_{i,j} \in \mathbb{R}^p \otimes \mathbb{R}^q$. However, a subscript $x_t$ refers to the value of a process $x$ at time $t$ (and not to a partial derivative with respect to $t$). We also introduce $\nabla_x f := f_x^\top$.

• Probability. To model the random uncertainty on the time interval $[0, T]$ ($T > 0$ fixed), we consider a complete filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{0 \le t \le T}, \mathbb{P})$; we assume that the filtration $\{\mathcal{F}_t\}_{0 \le t \le T}$ is right-continuous and augmented with the $\mathbb{P}$-null sets. For a vector/matrix-valued random variable $V$, its conditional expectation with respect to the sigma-field $\mathcal{F}_t$ is denoted by $\mathbb{E}_t[V] = \mathbb{E}[V \mid \mathcal{F}_t]$. Denote by $\mathcal{P}$ the $\sigma$-field of predictable sets of $[0, T] \times \Omega$.

All the quantities impacted by the control $u$ are upper-indexed by $u$, like $Z^u$ for instance.

As usual, càdlàg processes stand for processes that are right-continuous with left-hand limits. All the martingales are considered with their càdlàg modifications.

• Spaces. Let $k \in \mathbb{N}^*$. We define $L^2([0, T], \mathbb{R}^k)$ (resp. $L^\infty([0, T], \mathbb{R}^k)$) as the Banach space of deterministic functions $f$ on $[0, T]$ with values in $\mathbb{R}^k$ such that $\int_0^T |f_t|^2 dt < +\infty$ (resp. $\sup_{t \in [0,T]} |f(t)| < +\infty$). Since the arrival space $\mathbb{R}^k$ will be unimportant, we will skip the reference to it in the notation and write the related norms as

$$\|f\|_{L^2_T} := \left(\int_0^T |f(t)|^2 dt\right)^{1/2}, \qquad \|f\|_{L^\infty_T} := \sup_{t \in [0,T]} |f(t)|.$$

Let $p \ge q \ge 1$. The Banach space of $\mathbb{R}^k$-valued random variables $X$ such that $\mathbb{E}[|X|^p] < +\infty$ is denoted by $L^p(\Omega, \mathbb{R}^k)$, or simply $L^p$; the associated norm is

$$\|X\|_{L^p_\Omega} := \mathbb{E}\left[|X|^p\right]^{1/p}.$$

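The Banach space $H^{p,q}([0, T] \times \Omega, \mathbb{R}^k)$ (resp. $H^{p,q}_{\mathcal{P}}([0, T] \times \Omega, \mathbb{R}^k)$) is the set of all F-progressively measurable (resp. F-predictable) processes $\psi : [0, T] \times \Omega \to \mathbb{R}^k$ such that $\int_0^T \mathbb{E}\left[|\psi_t|^q\right]^{p/q} dt < +\infty$. Here again we will omit the reference to $\mathbb{R}^k$, which will be clear from the context. The associated norm is

$$\|\psi\|_{H^{p,q}} := \left(\int_0^T \mathbb{E}\left[|\psi_t|^q\right]^{p/q} dt\right)^{1/p}.$$

The Banach space $H^{\infty,q}([0, T] \times \Omega, \mathbb{R}^k)$ stands for the elements of $H^{p,q}([0, T] \times \Omega, \mathbb{R}^k)$ satisfying $\sup_{t \in [0,T]} \mathbb{E}\left[|\psi_t|^q\right] < +\infty$, and the related norm is

$$\|\psi\|_{H^{\infty,q}([0,T] \times \Omega, \mathbb{R}^k)} := \sup_{t \in [0,T]} \mathbb{E}\left[|\psi_t|^q\right]^{1/q}.$$

We shall most often consider $p = q = 2$.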

2 Stochastic control and MKV-FBSDEs

The aim is to analyze the control problem of minimizing (1.3). We first discuss the smart grid setting and the class of admissible controls $u$; second we derive the first-order condition (Pontryagin principle), which takes the form of a MKV-FBSDE; third we derive sufficient conditions for existence and uniqueness of a solution to this MKV-FBSDE; fourth, in the absence of convexity conditions, we provide a counter-example to optimality; last, with suitable convexity assumptions, we establish that the MKV-FBSDE solution characterizes the optimal control.

2.1 Stochastic model and smart grid framework

As explained in the introduction, (1.3) may describe the optimal energy management of an aggregated consumer with storage facilities (e.g. battery), with his own RE production (e.g. building equipped with solar panels), and with a connection to the electrical public grid. The management horizon $T$ is typically short, e.g. 24 hours, for reasons explained in the introduction.

The control is made through an $\mathbb{R}^d$-valued vector process $u = (u_t : 0 \le t \le T)$, $d \in \mathbb{N}^*$. We consider $u$ as an $\mathcal{F}_t$-predictable process in $H^{2,2}_{\mathcal{P}}$: the intuition behind it is that decisions occurring at time $t$ have to be made in accordance with the information available up to this time. This is coherent with the smart grid application. In particular, there has to be a slight delay between sudden events and the decisions taken by the controller, whence the predictability assumption.

The dynamics of the system are represented by an $\mathbb{R}^p$-valued state variable, denoted by $X$, which satisfies the following ODE

$$X^u_t = x_0 + \int_0^t \phi(s, \omega, u_s, X^u_s)\,ds. \tag{2.1}$$

The state variable can include various information in the smart grid application, like for example the state of charge of the battery (see (1.4)), the PV production, the building electricity consumption, etc. Moreover, the possible dependence in time of $\phi(\cdot)$ is a degree of freedom suitable to account for energy losses over time or aging of the battery, both impacting the state of charge of the battery.

The cost functional is described by $\mathcal{J}(u)$, given in (1.3). In the smart grid application, Markovian-type costs would take the form, for instance, $l(t, \omega, u, x, \bar g) = \tilde l(t, Z_t(\omega), u, x, \bar g)$ where $Z$ would represent a multidimensional stochastic factor modeling the evolution of the exogenous uncontrolled variables (weather, consumption, ...), but we also allow non-Markovian models. In the sequel, we omit $\omega$ when we write terms inside $\mathcal{J}(u)$ and $X^u$, since it is now clear that we deal with random coefficients. All in all, the optimal control problem we study is

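$$\left.\begin{aligned} \mathcal{J}(u) &:= \mathbb{E}\left[\int_0^T l\Big(t, u_t, X^u_t, \mathbb{E}\big[g(t, u_t, X^u_t)\big]\Big)dt + \psi\Big(X^u_T, \mathbb{E}\big[k(X^u_T)\big]\Big)\right] \\ \text{s.t. } X^u_t &= x_0 + \int_0^t \phi(s, u_s, X^u_s)\,ds \end{aligned}\right\} \longrightarrow \min_{u \in H^{2,2}_{\mathcal{P}}}. \tag{2.2}$$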

Last, we summarize the coefficients of the toy example described in the introduction.

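Example 2.1 (Smart grid toy example). Let $P^{\text{load}}$ be the difference between the consumer's instantaneous local consumption and his RE production: we assume this is a process in $H^{2,2}([0, T] \times \Omega, \mathbb{R})$. The control $u \in H^{2,2}_{\mathcal{P}}([0, T] \times \Omega, \mathbb{R})$ corresponds to the power supplied by the battery, while the state $X^u$ corresponds to the normalized state of charge of the battery, whose dynamics is linear with respect to the control $u$, see [Hey+15]:

$$X^u_t = x_0 - \frac{1}{E_{\max}} \int_0^t u_s\,ds.$$

If $P^{\text{grid},u}$ is the power supplied by the electrical public grid, the power balance imposes that $P^{\text{load}}_{t-} = P^{\text{grid},u}_{t-} + u_t$. Then set $d = p = 1$ and

$$\begin{aligned} l(t, \omega, u, x, \bar g) &:= C_{t-}(\omega)\left(P^{\text{load}}_{t-}(\omega) - u\right) + \frac{\mu_t}{2}u^2 + \frac{\nu_t}{2}\left(x - \frac12\right)^2 + l_1\left(P^{\text{load}}_{t-}(\omega) - u - \bar g\right), \\ g(t, \omega, u, x) &:= P^{\text{load}}_{t-}(\omega) - u, \\ \psi(\omega, x, \bar k) &:= \frac{\gamma}{2}\left(x - \frac12\right)^2, \qquad k(\omega, x) := 0, \\ \phi(t, u, x) &:= -\frac{u}{E_{\max}}. \end{aligned} \tag{2.3}$$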

The time-dependent coefficients $\mu_t$ and $\nu_t$ give the flexibility to include hourly effects in the management. We recall that the convex loss function $l_1$ may take the form (1.2). Considering the left-hand limit $t-$ in the above definitions is a technicality to fulfill the following assumptions.
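To fix ideas, the cost of the toy example can be evaluated empirically on a finite set of scenarios. The sketch below is not the numerical method of the paper: it assumes a uniform time grid, an explicit Euler step for the state of charge, constant coefficients (`price`, `mu`, `nu`, `gamma`, `e_max`), and it replaces every expectation by an average over the given paths. The function name `empirical_cost` and its signature are hypothetical.

```python
def empirical_cost(p_load, u, loss, x0=0.5, dt=1.0, price=0.0,
                   mu=1.0, nu=1.0, gamma=1.0, e_max=1.0):
    """Empirical version of the toy cost (2.3) on a set of sample paths.

    p_load[i][k]: residual load of path i on time step k
    u[i][k]:      battery power of path i on time step k (u > 0 discharges)
    loss:         callable playing the role of l_1
    Returns the averaged cost and the commitment profile of (1.1).
    """
    n_paths = len(p_load)
    n_steps = len(p_load[0])
    # Power balance: P_load = P_grid + u.
    p_grid = [[p_load[i][k] - u[i][k] for k in range(n_steps)]
              for i in range(n_paths)]
    # Commitment (1.1): expectation of the grid power, here a path average.
    p_com = [sum(p_grid[i][k] for i in range(n_paths)) / n_paths
             for k in range(n_steps)]
    total = 0.0
    for i in range(n_paths):
        x = x0
        running = 0.0
        for k in range(n_steps):
            running += dt * (price * p_grid[i][k]
                             + 0.5 * mu * u[i][k] ** 2
                             + 0.5 * nu * (x - 0.5) ** 2
                             + loss(p_grid[i][k] - p_com[k]))
            # Euler step for dX/dt = -u / E_max.
            x -= dt * u[i][k] / e_max
        total += running + 0.5 * gamma * (x - 0.5) ** 2
    return total / n_paths, p_com


if __name__ == "__main__":
    quadratic = lambda d: d * d
    load = [[1.0], [3.0]]
    print(empirical_cost(load, [[0.0], [0.0]], quadratic))
    print(empirical_cost(load, [[-1.0], [1.0]], quadratic))
```

In the two-scenario demonstration, the second control flattens the grid power completely, at the price of a storage usage cost and of a terminal deviation of the state of charge; this is the trade-off that the optimal control has to resolve.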

2.2 Standing assumptions

From now on, we assume the following hypotheses hold. When we refer to a constant, we mean a finite deterministic constant.

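(H.x) $x_0 \in L^2$ and is $\mathcal{F}_0$-measurable.

(H.l) $l : (t, \omega, u, x, \bar g) \in [0, T] \times \Omega \times \mathbb{R}^d \times \mathbb{R}^p \times \mathbb{R}^q \mapsto l(t, \omega, u, x, \bar g) \in \mathbb{R}$ is $\mathcal{P} \otimes \mathcal{B}(\mathbb{R}^d) \otimes \mathcal{B}(\mathbb{R}^p) \otimes \mathcal{B}(\mathbb{R}^q)$-measurable. Furthermore, $l(\cdot, \cdot, 0, 0, 0) \in H^{1,1}$, $l$ is continuously differentiable in $(u, x, \bar g)$ with the growth condition

$$|\nabla_u l(t, \omega, u, x, \bar g)| + |\nabla_x l(t, \omega, u, x, \bar g)| + |\nabla_{\bar g} l(t, \omega, u, x, \bar g)| \le C\left(|u| + |x| + |\bar g|\right) + C^{(0)}_l(t, \omega)$$

for any $(t, u, x, \bar g) \in [0, T] \times \mathbb{R}^d \times \mathbb{R}^p \times \mathbb{R}^q$ a.s., for some constant $C$ and some random process $C^{(0)}_l$ in $H^{2,2}$.

(H.g) $g : (t, \omega, u, x) \in [0, T] \times \Omega \times \mathbb{R}^d \times \mathbb{R}^p \mapsto g(t, \omega, u, x) \in \mathbb{R}^q$ is $\mathcal{P} \otimes \mathcal{B}(\mathbb{R}^d) \otimes \mathcal{B}(\mathbb{R}^p)$-measurable. Furthermore, $g(\cdot, \cdot, 0, 0) \in H^{2,1}$, $g$ is continuously differentiable in $(u, x)$ and there exist constants $C_{g,u}$ and $C_{g,x}$ such that

$$|\nabla_x g(t, \omega, u, x)| \le C_{g,x} \quad \text{and} \quad |\nabla_u g(t, \omega, u, x)| \le C_{g,u}$$

for any $(t, u, x) \in [0, T] \times \mathbb{R}^d \times \mathbb{R}^p$ a.s.

(H.ψ) $\psi : (\omega, x, \bar k) \in \Omega \times \mathbb{R}^p \times \mathbb{R}^r \mapsto \psi(\omega, x, \bar k) \in \mathbb{R}$ is $\mathcal{F}_T \otimes \mathcal{B}(\mathbb{R}^p) \otimes \mathcal{B}(\mathbb{R}^r)$-measurable. Furthermore, $\psi(\cdot, 0, 0) \in L^1_\Omega$, $\psi$ is continuously differentiable in $(x, \bar k)$ and the growth condition

$$|\nabla_x \psi(\omega, x, \bar k)| + |\nabla_{\bar k} \psi(\omega, x, \bar k)| \le C\left(|x| + |\bar k|\right) + C^{(0)}_\psi(\omega)$$

holds for any $(x, \bar k) \in \mathbb{R}^p \times \mathbb{R}^r$ a.s., for some constant $C$ and some random variable $C^{(0)}_\psi$ in $L^2_\Omega$.

(H.k) $k : (\omega, x) \in \Omega \times \mathbb{R}^p \mapsto k(\omega, x) \in \mathbb{R}^r$ is $\mathcal{F}_T \otimes \mathcal{B}(\mathbb{R}^p)$-measurable. Furthermore, $k(\cdot, 0) \in L^1$, $k$ is continuously differentiable in $x$ and there exists a constant $C_{k,x}$ such that

$$|\nabla_x k(\omega, x)| \le C_{k,x}$$

holds for any $x \in \mathbb{R}^p$ a.s.

(H.φ) $\phi : (t, \omega, u, x) \in [0, T] \times \Omega \times \mathbb{R}^d \times \mathbb{R}^p \mapsto \phi(t, \omega, u, x) \in \mathbb{R}^p$ is $\mathcal{P} \otimes \mathcal{B}(\mathbb{R}^d) \otimes \mathcal{B}(\mathbb{R}^p)$-measurable. Furthermore, $\phi(\cdot, \cdot, 0, 0) \in H^{2,2}$, $\phi$ is continuously differentiable in $(u, x)$ and there exist constants $C_{\phi,u}$ and $C_{\phi,x}$ such that

$$|\nabla_u \phi(t, \omega, u, x)| \le C_{\phi,u} \quad \text{and} \quad |\nabla_x \phi(t, \omega, u, x)| \le C_{\phi,x}$$

hold for any $(t, u, x) \in [0, T] \times \mathbb{R}^d \times \mathbb{R}^p$ a.s.

It is easy to check these conditions in Example 2.1.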

As a consequence of (H.φ), the dynamics of $X^u$ in (2.1) is an ODE with Lipschitz-continuous stochastic coefficient: existence and uniqueness stem from the Cauchy existence theorem for ODEs, applied $\omega$ by $\omega$. In addition, we easily show

$$|X^u_t| \le |x_0| + \int_0^t \left(|\phi(s, 0, 0)| + C_{\phi,u}|u_s| + C_{\phi,x}|X^u_s|\right)ds \le C_T\left(|x_0| + \int_0^t \left(|\phi(s, 0, 0)| + C_{\phi,u}|u_s|\right)ds\right)$$

where the second inequality comes from Gronwall's lemma. Then one directly shows that, since $u$ and $\phi(\cdot, 0, 0)$ are in $H^{2,2}$, $X^u$ is in $H^{\infty,2} \subset H^{2,2}$. Then, a careful inspection of the assumptions (H.l)-(H.g)-(H.ψ)-(H.k) shows that the cost $\mathcal{J}(u)$ is finite.

2.3 Necessary condition for optimality

For admissible controls $u$ and $v$, we now provide a representation of the derivative $\dot{\mathcal{J}}(u, v) = \partial_\varepsilon \mathcal{J}(u + \varepsilon v)|_{\varepsilon = 0}$, using an adjoint process $Y^u$.

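Theorem 2.2 (Gâteaux derivatives). Let $u \in H^{2,2}_{\mathcal{P}}$ and set $\bar g^u_t := \mathbb{E}\left[g(t, u_t, X^u_t)\right]$. Let $\tilde L^u$ be the unique solution of

$$\tilde L^u_0 = \mathrm{Id}_p, \qquad \frac{d\tilde L^u_t}{dt} = \tilde L^u_t \nabla_x \phi(t, u_t, X^u_t).$$

Then $\tilde L^u$ is invertible and its inverse satisfies (see Lemma 4.1)

$$(\tilde L^u_0)^{-1} = \mathrm{Id}_p, \qquad \frac{d(\tilde L^u_t)^{-1}}{dt} = -\nabla_x \phi(t, u_t, X^u_t)(\tilde L^u_t)^{-1}.$$

Define also $L^u := ((\tilde L^u)^{-1})^\top$. The following $\mathbb{R}^p$-valued process $Y^u$ is well defined as a càdlàg process in $H^{\infty,2}$:

$$\begin{aligned} Y^u_t = {} & \mathbb{E}_t\left[(\tilde L^u_t)^{-1}\tilde L^u_T\left(\nabla_x \psi\big(X^u_T, \mathbb{E}[k(X^u_T)]\big) + \nabla_x k(X^u_T)\,\mathbb{E}\Big[\nabla_{\bar k}\psi\big(X^u_T, \mathbb{E}[k(X^u_T)]\big)\Big]\right)\right] \\ & + \mathbb{E}_t\left[\int_t^T (\tilde L^u_t)^{-1}\tilde L^u_s\left(\nabla_x l(s, u_s, X^u_s, \bar g^u_s) + \nabla_x g(s, u_s, X^u_s)\,\mathbb{E}\big[\nabla_{\bar g} l(s, u_s, X^u_s, \bar g^u_s)\big]\right)ds\right]. \end{aligned} \tag{2.4}$$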

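In particular, there exists an $\mathbb{R}^p$-valued càdlàg martingale $M^u$ in $H^{\infty,2}$, vanishing at time 0, such that $(Y^u, M^u)$ is the unique solution in $H^{\infty,2} \times H^{\infty,2}$ of the following BSDE in $(Y, M)$:

$$\begin{cases} -dY_t = \left(\nabla_x \phi(t, u_t, X^u_t)Y_t + \nabla_x l(t, u_t, X^u_t, \bar g^u_t) + \nabla_x g(t, u_t, X^u_t)\,\mathbb{E}\big[\nabla_{\bar g} l(t, u_t, X^u_t, \bar g^u_t)\big]\right)dt - dM_t, \\ Y_T = \nabla_x \psi\big(X^u_T, \mathbb{E}[k(X^u_T)]\big) + \nabla_x k(X^u_T)\,\mathbb{E}\Big[\nabla_{\bar k}\psi\big(X^u_T, \mathbb{E}[k(X^u_T)]\big)\Big]. \end{cases} \tag{2.5}$$

Besides, for any $u, v \in H^{2,2}_{\mathcal{P}}$, the directional derivative $\dot{\mathcal{J}}(u, v)$ exists and is given by

$$\dot{\mathcal{J}}(u, v) = \mathbb{E}\left[\int_0^T \left(l_u(t, u_t, X^u_t, \bar g^u_t) + \mathbb{E}\big[l_{\bar g}(t, u_t, X^u_t, \bar g^u_t)\big]\,g_u(t, u_t, X^u_t) + (Y^u_{t-})^\top \phi_u(t, u_t, X^u_t)\right)v_t\,dt\right].$$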

The proof is postponed to Subsection 4.1. At the optimal control $u$ (whenever it exists), the above derivative $\dot{\mathcal{J}}(u, v)$ must be 0, in any direction $v \in H^{2,2}_{\mathcal{P}}$. Take for instance $v$ given by: $\forall t \in [0, T]$,

$$v_t := l_u(t, u_t, X^u_t, \bar g^u_t) + \mathbb{E}\big[l_{\bar g}(t, u_t, X^u_t, \bar g^u_t)\big]\,g_u(t, u_t, X^u_t) + (Y^u_{t-})^\top \phi_u(t, u_t, X^u_t),$$

which ensures that $v \in H^{2,2}_{\mathcal{P}}$ under our assumptions. This justifies the following statement.

Theorem 2.3 (Necessary condition for optimality). Under the notations and assumptions of Theorem 2.2, if a control $u \in H^{2,2}_{\mathcal{P}}$ is optimal, then there exists a unique couple $(X^u, Y^u) \in H^{\infty,2} \times H^{\infty,2}$ fulfilling (2.1) and (2.4) such that

$$l_u(t, u_t, X^u_t, \bar g^u_t) + \mathbb{E}\big[l_{\bar g}(t, u_t, X^u_t, \bar g^u_t)\big]\,g_u(t, u_t, X^u_t) + (Y^u_{t-})^\top \phi_u(t, u_t, X^u_t) = 0 \tag{2.6}$$

holds $dt \otimes d\mathbb{P}$-a.e.
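As an illustration, the optimality system can be specialized to the toy model of Example 2.1. What follows is a sketch obtained by direct substitution of the coefficients (2.3), assuming in addition that $l_1$ is continuously differentiable (which is the case for (1.2)); the shorthand $D_t$ is introduced here for readability only. Since $\nabla_x \phi = 0$, one has $\tilde L^u \equiv 1$, and since $\nabla_x g = 0$ and $k = 0$, the adjoint process (2.4) reduces to

$$Y^u_t = \mathbb{E}_t\left[\gamma\left(X^u_T - \frac12\right) + \int_t^T \nu_s\left(X^u_s - \frac12\right)ds\right].$$

With $l_u = -C_{t-} + \mu_t u - l_1'(P^{\text{load}}_{t-} - u - \bar g)$, $l_{\bar g} = -l_1'(P^{\text{load}}_{t-} - u - \bar g)$, $g_u = -1$ and $\phi_u = -1/E_{\max}$, the first order condition (2.6) becomes

$$\mu_t u_t - C_{t-} - l_1'(D_t) + \mathbb{E}\left[l_1'(D_t)\right] - \frac{Y^u_{t-}}{E_{\max}} = 0, \qquad D_t := P^{\text{load}}_{t-} - u_t - \bar g^u_t,$$

where $D_t$ is the deviation of the grid power from its expectation, i.e. from the commitment (1.1).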

2.4 Solvability of the MKV Forward-Backward SDE

Our aim is now to provide sufficient conditions to ensure existence of a solution to the system of forward-backward equations (2.1)-(2.4)-(2.6), which we call MKV-FBSDE. For this, we strengthen the previous assumptions.

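(H.l.2) (H.l) holds and there exist constants $C^l_{*,\star}$, where $*$ stands for $x$ and $\bar g$, and $\star$ stands for $u$, $x$ or $\bar g$, such that:

$$|\nabla_x l(t, \omega, u_1, x_1, \bar g_1) - \nabla_x l(t, \omega, u_2, x_2, \bar g_2)| \le C^l_{x,u}|u_1 - u_2| + C^l_{x,x}|x_1 - x_2| + C^l_{x,\bar g}|\bar g_1 - \bar g_2|,$$

$$|\nabla_{\bar g} l(t, \omega, u_1, x_1, \bar g_1) - \nabla_{\bar g} l(t, \omega, u_2, x_2, \bar g_2)| \le C^l_{\bar g,u}|u_1 - u_2| + C^l_{\bar g,x}|x_1 - x_2| + C^l_{\bar g,\bar g}|\bar g_1 - \bar g_2|$$

holds for any $(u_1, u_2, x_1, x_2, \bar g_1, \bar g_2) \in \mathbb{R}^d \times \mathbb{R}^d \times \mathbb{R}^p \times \mathbb{R}^p \times \mathbb{R}^q \times \mathbb{R}^q$, $dt \times d\mathbb{P}$-a.e.

(H.g.2) (H.g) holds and $g$ is affine-linear in $x$, of the form $g(t, u, x) = a^{(g)}_t x + b^{(g)}(t, u)$.

(H.ψ.2) (H.ψ) holds and there exist constants $C^\psi_{*,\star}$, where $*$ and $\star$ stand for $x$ or $\bar k$, such that:

$$|\nabla_x \psi(x_1, \bar k_1) - \nabla_x \psi(x_2, \bar k_2)| \le C^\psi_{x,x}|x_1 - x_2| + C^\psi_{x,\bar k}|\bar k_1 - \bar k_2|,$$

$$|\nabla_{\bar k} \psi(x_1, \bar k_1) - \nabla_{\bar k} \psi(x_2, \bar k_2)| \le C^\psi_{\bar k,x}|x_1 - x_2| + C^\psi_{\bar k,\bar k}|\bar k_1 - \bar k_2|$$

holds for any $(x_1, x_2, \bar k_1, \bar k_2) \in \mathbb{R}^p \times \mathbb{R}^p \times \mathbb{R}^r \times \mathbb{R}^r$, $dt \times d\mathbb{P}$-a.e.

(H.k.2) (H.k) holds and $k$ is affine-linear in $x$, of the form $k(x) = a^{(k)} x + b^{(k)}$.

(H.φ.2) (H.φ) holds and the dynamics of $X^u$ is affine-linear in $x$, given by $\phi(t, u, x) = a^{(\phi)}_t x + b^{(\phi)}(t, u)$.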

Observe again that this set of conditions is consistent with Example 2.1. We now aim at establishing the solvability of the system composed of (2.1), (2.5) and (2.6). We are going to show that this system has a unique solution for a small enough time horizon T, hence the existence and uniqueness of a solution to the optimal control problem, under the sufficient conditions of Theorem 2.10.

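Theorem 2.4. Assume (H.l.2)-(H.g.2)-(H.ψ.2)-(H.k.2)-(H.φ.2) hold. Assume furthermore that there exists a $\mathcal{P} \otimes \mathcal{B}(\mathbb{R}^p) \otimes \mathcal{B}(\mathbb{R}^p) \otimes \mathcal{B}(\mathbb{R}^q) \otimes \mathcal{B}(\mathbb{R}^q)$-measurable function $h : (t, \omega, x, y, \bar g, \bar\lambda) \mapsto h(t, \omega, x, y, \bar g, \bar\lambda) \in \mathbb{R}^d$ such that

$$\begin{aligned} & l_u(t, u_t, X^u_t, \bar g^u_t) + \mathbb{E}\big[l_{\bar g}(t, u_t, X^u_t, \bar g^u_t)\big]\,g_u(t, u_t, X^u_t) + (Y^u_{t-})^\top \phi_u(t, u_t, X^u_t) = 0, \quad d\mathbb{P} \otimes dt\text{-a.e.} \\ \iff {} & u_t = h\Big(t, X^u_t, Y^u_{t-}, \bar g^u_t, \mathbb{E}\big[\nabla_{\bar g} l(t, \omega, u_t, X^u_t, \bar g^u_t)\big]\Big), \quad d\mathbb{P} \otimes dt\text{-a.e.} \end{aligned} \tag{2.7}$$

If $h$ is Lipschitz continuous in $(x, y, \bar g, \bar\lambda)$, with Lipschitz constants denoted by $C_{h,x}$, $C_{h,y}$, $C_{h,\bar g}$, $C_{h,\bar\lambda}$, and if the process $(t, \omega) \mapsto h(t, \omega, 0, 0, 0, 0)$ belongs to $H^{2,2}_{\mathcal{P}}$, then

$$\Theta : \begin{cases} H^{2,2}_{\mathcal{P}} \to H^{2,2}_{\mathcal{P}} \\ u \mapsto \tilde u \end{cases}, \quad \text{where } \Theta(u)_t := \tilde u_t = h\Big(t, \omega, X^u_t, Y^u_{t-}, \bar g^u_t, \mathbb{E}\big[\nabla_{\bar g} l(t, u_t, X^u_t, \bar g^u_t)\big]\Big), \quad d\mathbb{P} \otimes dt\text{-a.e.},$$

is well defined and Lipschitz continuous. If moreover

$$C_{h,\bar g}\,C_{g,u} + C_{h,\bar\lambda}\left(C^l_{\bar g,u} + C^l_{\bar g,\bar g}\,C_{g,u}\right) < 1, \tag{2.8}$$

then for $T$ small enough, $\Theta$ is a contraction and therefore has a unique fixed point $u^\star$. In that case, there exists a unique $u \in H^{2,2}_{\mathcal{P}}$ satisfying (2.1)-(2.5)-(2.6) and $u = u^\star$.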

The proof is available in Subsection 4.2. Regarding the existence of a fixed point when the time interval $[0, T]$ is arbitrarily large, observe that, in contrast with [PT99] and [DG06] for instance, in our setting we can rely neither on a monotonicity condition on the drifts nor on a non-degeneracy condition. This is why we restrict ourselves to a small time horizon.

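Remark 2.5. If one can exhibit a $\mathcal{P} \otimes \mathcal{B}(\mathbb{R}^p) \otimes \mathcal{B}(\mathbb{R}^p) \otimes \mathcal{B}(\mathbb{R}^q) \otimes \mathcal{B}(\mathbb{R}^q)$-measurable function $h$ such that for all $(\tilde u, x, y, \bar g, \bar\lambda) \in \mathbb{R}^d \times \mathbb{R}^p \times \mathbb{R}^p \times \mathbb{R}^q \times \mathbb{R}^q$:

\[
d\mathbb{P} \otimes dt\text{-a.e.},\ l_u(t, \omega, \tilde u, x, \bar g) + \bar\lambda^{\top} g_u(t, \omega, \tilde u, x) + y^{\top} \phi_u(t, \omega, \tilde u, x) = 0 \iff d\mathbb{P} \otimes dt\text{-a.e.},\ \tilde u = h(t, \omega, x, y, \bar g, \bar\lambda),
\]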


2.5 Existence and uniqueness of critical point do not necessarily imply existence of a minimum

If there exists a unique solution to the first order optimality condition (unique critical point), and under other assumptions such as continuity and growth properties, it is tempting to conclude that this point is a minimum. However, this is not necessarily the case in infinite dimension. This section aims at clarifying this fact by providing an example where continuity, coercivity and uniqueness of the critical point are ensured, but no minimum exists. Therefore, extra conditions are necessary to get the existence of a minimum, see later the discussion in Section 2.6.

Proposition 2.6. Set

\[
F : L^2_1 := L^2([0, 1], \mathbb{R}) \to \mathbb{R}, \qquad u \mapsto \big(\|u\|^2_{L^2_1} - 1\big)^2 + \int_0^1 t\, |u_t|^2\, dt.
\]

Then $F$ satisfies the following properties:

1. Continuity: $F$ is continuous.

2. Coercivity: $F(u)$ tends to $+\infty$ when $\|u\|_{L^2_1}$ tends to $+\infty$.

3. Existence and uniqueness of critical point: $F$ is Gateaux-differentiable and has a unique critical point.

However, $F$ does not have a minimum.

The proof is postponed to Subsection 4.3. The function $F$ defined in this example cannot be quasi-convex (and a fortiori $F$ cannot be convex), since it would then have a minimum, as stated in the next section.
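The absence of a minimum can be made concrete with a minimizing sequence. The sketch below is only an illustration and not the argument of Subsection 4.3: it assumes the sequence $u^{(n)} = \sqrt{n}\,1_{[0, 1/n]}$ (a choice made here, with unit $L^2$-norm) and evaluates $F$ with a midpoint rule. A direct computation gives $F(u^{(n)}) = 1/(2n)$, so the values decrease to 0, whereas $F(u) = 0$ would require both $\|u\|_{L^2_1} = 1$ and $u = 0$ almost everywhere.

```python
import numpy as np

N_GRID = 100_000                          # number of midpoint cells on [0, 1]
t = (np.arange(N_GRID) + 0.5) / N_GRID    # cell midpoints
dt = 1.0 / N_GRID

def F(u):
    """Midpoint-rule value of F(u) = (||u||^2 - 1)^2 + int_0^1 t |u_t|^2 dt."""
    norm_sq = np.sum(u**2) * dt
    return (norm_sq - 1.0) ** 2 + np.sum(t * u**2) * dt

def u_seq(n):
    """Illustrative sequence sqrt(n) * 1_{[0, 1/n]} (an assumption of this sketch)."""
    return np.where(t < 1.0 / n, np.sqrt(n), 0.0)

ns = [1, 10, 100, 1000]
values = [F(u_seq(n)) for n in ns]
for n, val in zip(ns, values):
    print(f"n = {n:5d}   F(u_n) = {val:.6f}   1/(2n) = {1.0 / (2 * n):.6f}")
```

The infimum 0 is approached by concentrating the mass of $u$ near $t = 0$, where the weight $t$ vanishes; such a sequence has no strong limit in $L^2_1$, which is consistent with the lack of a minimizer.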

2.6 Existence of an optimal control

We now give sufficient conditions for the existence of an optimal control, i.e. existence of a minimizer of $\mathcal{J}$. In such a favorable case, and if the necessary optimality conditions (2.1)-(2.4)-(2.6) have a unique solution $u^*$, then $u^*$ is the unique minimum of $\mathcal{J}$. We start with a general result.

Theorem 2.7. Let $E$ be a reflexive Banach space, let $F : E \to \mathbb{R}$ be a lower semi-continuous, quasi-convex function which satisfies the coercivity condition $\lim_{\|u\|_E \to +\infty} F(u) = +\infty$. Then $F$ has a minimum on $E$.

Proof. We adapt the arguments of [Bre10, Corollary 3.23, p. 71], where the operator considered is assumed to be continuous and convex. However, the hypothesis can be relaxed to lower semi-continuity and quasi-convexity of the function $F$, since we only need closedness and convexity of the sub-level sets $\Gamma(F)_\alpha := \{u \in E \mid F(u) \le \alpha\}$ for all $\alpha \in \mathbb{R}$.

Let us add a few comments. In the finite dimensional case, any lower semi-continuous and coercive function has a minimum (since any closed and bounded set is compact). In the infinite dimensional case, the example in Subsection 2.5 illustrates that this may not be the case without the quasi-convexity assumption. Besides, note that without the coercivity condition, the existence of a minimum may not hold, even in finite dimension (take $E = \mathbb{R}$ and $F(x) = \exp(x)$). Moreover, without the lower semi-continuity of $F$, the result may fail as well (take $F : (-\infty, 0] \to \mathbb{R}$ defined by $F(x) = |x|\, 1_{x<0} + 1_{x=0}$, which is coercive and convex).

Apply the previous result with $E = \mathbb{H}^{2,2}$ and $F = \mathcal{J}$: $E$ is a Hilbert space, thus a reflexive Banach space. The functional $\mathcal{J}$ is continuous, hence lower semi-continuous. Therefore, we have proved the following.

Corollary 2.8. Assume that $\mathcal{J}$ defined in (2.2) is quasi-convex and that $\lim_{\|u\|_{\mathbb{H}^{2,2}} \to +\infty} \mathcal{J}(u) = +\infty$. Then the optimal control problem has a solution $u^* \in \mathbb{H}^{2,2}_{\mathcal{P}}$.


2.7 Sufficient condition for optimality

Let us now give conditions under which the necessary optimality conditions are sufficient. In addition to (H.x)-(H.g)-(H.l)-(H.k)-(H.ψ)-(H.φ), we assume the following conditions.

(Conv)

1. The mapping $\mathcal{T} : L^2 \to \mathbb{R}$, $X \mapsto \mathbb{E}\big[\psi(X, \mathbb{E}[k(X)])\big]$ is convex.

2. The mapping $\mathcal{I} : \mathbb{H}^{2,2}_{\mathcal{P}} \times \mathbb{H}^{\infty,2} \to \mathbb{R}$, $(\tilde u, X) \mapsto \int_0^T \mathbb{E}\big[l\big(t, \tilde u_t, X_t, \mathbb{E}[g(t, \tilde u_t, X_t)]\big)\big]\, dt$ is convex.

3. The mapping $\phi : [0, T] \times \mathbb{R}^d \times \mathbb{R}^p \to \mathbb{R}^p$, $(t, u, X) \mapsto \phi(t, u, X)$ is affine-linear in $(u, X)$.

Lemma 2.9. Under (Conv), $\mathcal{J}$ is convex. If furthermore $\mathcal{I}$ is strictly convex in $\tilde u$, or $\mathcal{I}$ is strictly convex in $X$ and $\phi_u$ has full column rank (which implies $p \ge d$) for almost every $t$ in $[0, T]$, then $\mathcal{J}$ is strictly convex.

Proof. Under the assumption on $\phi$, $u \mapsto X^u$ is affine-linear. This yields the first result, using the fact that the composition of a convex function with an affine-linear function is convex. If $\mathcal{I}$ is strictly convex in $u$ then so is $\mathcal{J}$. If $\phi_u$ has full column rank, $u \mapsto X^u$ is an affine-linear injection and if, besides, $\mathcal{I}$ is strictly convex in $X$, we get that $\mathcal{J}$ is strictly convex.

Let us emphasize the difference with the usual stochastic maximum principle (when distributions do not enter in the cost functions). In that case, i.e. without the dependence w.r.t. $\mathbb{E}[g(t, \tilde u_t, X_t)]$ of the running cost and w.r.t. $\mathbb{E}[k(X)]$ of the terminal cost, the sufficient optimality condition is the affine-linearity in $(u, X)$ of $\phi$, the point-wise convexity in $(u, x)$ of $(t, u, x) \mapsto l(t, u, x)$ for any $t$, and the point-wise convexity of $\psi$ in $x$.

In the current MKV setting, it would be tempting to require $\xi : (t, u, x) \mapsto l(t, u, x, \mathbb{E}[g(t, u, x)])$ to be convex in $(u, X) \in L^2_\Omega \times L^2_\Omega$ for any $t$, and $X \mapsto \psi(X, \mathbb{E}[k(X)])$ to be convex in $X$ in $L^2_\Omega$. However, even for the simple linear-quadratic case with $d = p = q = 1$, i.e. $l(t, u, x, \bar g) = (1 + \kappa) u^2 - \kappa \bar g^2$, $g(t, u, x) = u$, $\phi(t, u, x) = u$, $\psi = 0$, with parameter $\kappa > 0$, this fails to be true. Indeed, denoting $\zeta(u) = \xi(t, u, x)$, we get:

\[
\zeta\Big(\frac{u_1 + u_2}{2}\Big) - \frac{1}{2}\big(\zeta(u_1) + \zeta(u_2)\big) = \frac{1}{4}\Big\{ \kappa \big(\mathbb{E}[u_1 - u_2]\big)^2 - (1 + \kappa)(u_1 - u_2)^2 \Big\}.
\]

Now if $u_1$ is a Bernoulli random variable with parameter $\frac{1}{2}$, and $u_2 = -u_1$, then on the set $\{\omega : u_1(\omega) = u_2(\omega) = 0\}$ of positive probability, the above equals $\frac{\kappa}{4} > 0$, which violates the convexity condition for these $\omega$. On the contrary, $\mathbb{E}\big[\zeta\big(\frac{u_1 + u_2}{2}\big) - \frac{\zeta(u_1) + \zeta(u_2)}{2}\big] \le 0$ for $\kappa \ge 0$, and it is easy to see that $\mathbb{E}[\zeta(u)]$ is convex in $u$ for such $\kappa$. This discussion clarifies why the correct convexity condition for the integrated Hamiltonian $\mathcal{I}$ or the point-wise one $H$ is in expectation and not $\omega$-wise, as stated in (Conv).
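The counterexample can be checked on a two-point probability space. The sketch below is a minimal illustration, assuming $\kappa = 1$ and a sample space with two equally likely outcomes (both choices are made here for illustration): it evaluates the midpoint gap $\zeta(\frac{u_1+u_2}{2}) - \frac{1}{2}(\zeta(u_1)+\zeta(u_2))$ outcome by outcome and then in expectation.

```python
import numpy as np

kappa = 1.0                       # illustrative value, any kappa > 0 works
prob = np.array([0.5, 0.5])       # two equally likely outcomes
u1 = np.array([0.0, 1.0])         # Bernoulli(1/2) random variable
u2 = -u1

def zeta(u):
    """zeta(u) = (1 + kappa) u^2 - kappa (E[u])^2, evaluated outcome by outcome."""
    return (1.0 + kappa) * u**2 - kappa * (prob @ u) ** 2

# Positive entries violate omega-wise convexity.
gap = zeta(0.5 * (u1 + u2)) - 0.5 * (zeta(u1) + zeta(u2))
expected_gap = prob @ gap

print("omega-wise gap:", gap)          # first entry is kappa / 4 on {u1 = 0}
print("expected gap  :", expected_gap)
```

On the outcome where $u_1 = 0$ the gap is $\kappa/4 > 0$, while its expectation is non-positive, in line with the requirement that convexity be imposed in expectation.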

We now summarize all the results for having existence and uniqueness of an optimal stochastic control. This is one of the main results of this section.


Theorem 2.10. Assume (H.x)-(H.g)-(H.l)-(H.k)-(H.ψ)-(H.φ)-(Conv) hold.

1. If $\mathcal{J}$ defined in (2.2) satisfies the coercivity condition $\lim_{\|u\|_{\mathbb{H}^{2,2}} \to +\infty} \mathcal{J}(u) = +\infty$, then there exists an optimal control $u^\star \in \mathbb{H}^{2,2}_{\mathcal{P}}$, i.e. a minimum of $\mathcal{J}$ on $\mathbb{H}^{2,2}_{\mathcal{P}}$.

2. $u^\star$ is an optimal control for the problem (2.2) if and only if there exists $(X^\star, Y^\star) \in \mathbb{H}^{\infty,2} \times \mathbb{H}^{\infty,2}$ such that $(u^\star, X^\star, Y^\star)$ fulfills (2.1)-(2.5)-(2.6).

3. If $\mathcal{J}$ is strictly convex, then it admits at most one minimizer.

Proof. 1. This is a direct consequence of Corollary 2.8 and Lemma 2.9.

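2. If $(u^\star, X^{u^\star}, Y^{u^\star})$ satisfies (2.1)-(2.5)-(2.6), then $\dot{\mathcal{J}}(u^\star, v) = 0$ for any $v \in \mathbb{H}^{2,2}_{\mathcal{P}}$ according to Theorem 2.2. Besides, under our assumptions, $\mathcal{J}$ is convex and therefore, for all $v \in \mathbb{H}^{2,2}_{\mathcal{P}}$ and $t \in (0, 1]$,

\[
\mathcal{J}(v) - \mathcal{J}(u^\star) \ge \frac{\mathcal{J}(u^\star + t(v - u^\star)) - \mathcal{J}(u^\star)}{t}.
\]

By taking the limit when $t \to 0$, we obtain $\mathcal{J}(v) - \mathcal{J}(u^\star) \ge \dot{\mathcal{J}}(u^\star, v - u^\star) = 0$, hence the optimality of $u^\star$. The direct implication $\Rightarrow$ has been established in Theorem 2.3.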

3 Effective computation and approximation of battery control

3.1 Model/Context

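For simplicity, we assume one-dimensional processes ($p = q = r = 1$), but the results can be easily extended to any dimension, since the arguments are based on the solution of Linear-Quadratic FBSDEs, which are well known (see [Yon06]). Let us consider the following toy problem:

\[
\begin{aligned}
\min_{u \in \mathbb{H}^{2,2}_{\mathcal{P}}}\ & \mathbb{E}\bigg[ \int_0^T \Big\{ C_{t-} P^{\mathrm{grid},u}_{t-} + \frac{\mu}{2} u_t^2 + \frac{\nu}{2}\Big(X^u_t - \frac{1}{2}\Big)^2 + l\Big(P^{\mathrm{grid},u}_{t-} - \mathbb{E}\big[P^{\mathrm{grid},u}_{t-}\big]\Big) \Big\}\, dt + \frac{\gamma}{2}\Big(X^u_T - \frac{1}{2}\Big)^2 \bigg] \\
\text{s.t.}\ & X^u_t = x - \frac{1}{E_{\max}} \int_0^t u_s\, ds, \qquad P^{\mathrm{grid},u}_{t-} = P^{\mathrm{load}}_{t-} - u_t.
\end{aligned}
\]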

This model is the same as the one presented in the introduction and has the same interpretation. We consider the following hypothesis:

(Toy)

1. The parameters $\mu, \nu, \gamma$ are deterministic and satisfy $\mu > 0$, $\nu \ge 0$, $\gamma \ge 0$.

2. The mapping $l$ is deterministic, convex, continuously differentiable with the growth condition $|l'(x)| \le C_{l,x}(1 + |x|)$ for all $x$, for some constant $C_{l,x} > 0$.

3. $P^{\mathrm{load}} \in \mathbb{H}^{2,2}$ and $C \in \mathbb{H}^{2,2}$ are $\mathbb{F}$-adapted and càdlàg.

Under assumptions (Toy), (H.x)-(H.g)-(H.l)-(H.k)-(H.ψ)-(H.φ)-(Conv) hold. Besides, one can show the strict convexity of $\mathcal{J}$. Then, it remains to apply Theorem 2.10 to conclude the following.

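Proposition 3.1. Under assumptions (Toy), there exists a unique optimal control $u \in \mathbb{H}^{2,2}_{\mathcal{P}}$. Besides, there exist unique processes $X^u \in \mathbb{H}^{\infty,2}$ and $Y^u \in \mathbb{H}^{\infty,2}$ such that $(u, X^u, Y^u)$ satisfies the following McKean-Vlasov Forward Backward SDE:

\[
\begin{cases}
X^u_t = x - \dfrac{1}{E_{\max}} \displaystyle\int_0^t u_s\, ds, \\[2mm]
Y^u_t = \mathbb{E}_t\Big[ \displaystyle\int_t^T \nu\Big(X^u_s - \frac{1}{2}\Big)\, ds + \gamma\Big(X^u_T - \frac{1}{2}\Big) \Big], \\[2mm]
\mu u_t - C_{t-} - l'\Big(P^{\mathrm{load}}_{t-} - u_t - \mathbb{E}\big[P^{\mathrm{load}}_{t-} - u_t\big]\Big) + \mathbb{E}\Big[ l'\Big(P^{\mathrm{load}}_{t-} - u_t - \mathbb{E}\big[P^{\mathrm{load}}_{t-} - u_t\big]\Big) \Big] = \dfrac{Y^u_{t-}}{E_{\max}}.
\end{cases}
\tag{3.1}
\]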


Although we can derive specific results for the control problem under assumption (Toy) (see Propositions 3.1 and 3.4), solving explicitly the system (3.1) remains difficult for general convex $l$. To get approximation results, we consider a specific form of $l$.

(ToyBis) The mapping $l$ is given by $l(x) := \frac{\lambda}{2} x^2 + \frac{\varepsilon(\lambda + \mu)}{2} (x^+)^2$ with $\lambda \ge 0$, $|\varepsilon| < 1$.

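From the application point of view, recall that we want to penalize consumption excess (compared to the commitment) more than consumption deficit. The asymmetry parameter $\varepsilon$ should thus be taken non-negative. Under assumptions (Toy) and (ToyBis), the last equation in (3.1) reads:

\[
(\lambda + \mu) u_t - \lambda \mathbb{E}[u_t] - C_{t-} - \lambda\Big(P^{\mathrm{load}}_{t-} - \mathbb{E}\big[P^{\mathrm{load}}_{t-}\big]\Big) - \varepsilon(\lambda + \mu)\Big(P^{\mathrm{load}}_{t-} - u_t - \mathbb{E}\big[P^{\mathrm{load}}_{t-} - u_t\big]\Big)^+ + \varepsilon(\lambda + \mu)\, \mathbb{E}\Big[ \Big(P^{\mathrm{load}}_{t-} - u_t - \mathbb{E}\big[P^{\mathrm{load}}_{t-} - u_t\big]\Big)^+ \Big] = \frac{Y^u_{t-}}{E_{\max}}.
\]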
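As a small illustration of (ToyBis), the sketch below codes the loss $l$ and its derivative $l'(x) = \lambda x + \varepsilon(\lambda + \mu) x^+$, checks the derivative by central finite differences, and compares the penalty of an excess and of a deficit of the same size. The function names are hypothetical; the numerical values of $\lambda$, $\mu$, $\varepsilon$ are those reported in the parameter table of Section 3.3.3.

```python
import numpy as np

def loss(x, lam, mu, eps):
    """l(x) = lam/2 x^2 + eps (lam + mu)/2 (x^+)^2 from (ToyBis)."""
    pos = np.maximum(x, 0.0)
    return 0.5 * lam * x**2 + 0.5 * eps * (lam + mu) * pos**2

def loss_prime(x, lam, mu, eps):
    """l'(x) = lam x + eps (lam + mu) x^+."""
    return lam * x + eps * (lam + mu) * np.maximum(x, 0.0)

lam, mu, eps = 0.49, 0.01, 0.2
x = np.linspace(-2.0, 2.0, 9)
step = 1e-6
finite_diff = (loss(x + step, lam, mu, eps) - loss(x - step, lam, mu, eps)) / (2 * step)

print("max derivative mismatch:", np.max(np.abs(finite_diff - loss_prime(x, lam, mu, eps))))
print("l(+1) =", loss(1.0, lam, mu, eps), "  l(-1) =", loss(-1.0, lam, mu, eps))
```

With $\varepsilon > 0$, a positive deviation costs more than a negative one of the same magnitude, which is the asymmetry sought.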

We now provide a first order expansion of the solution of this problem with respect to the parameter $\varepsilon$, as $\varepsilon \to 0$.

3.2 Computation of first order expansion

3.2.1 Preliminary result

The computation of a first order expansion of the solution of the MKV FBSDE (3.1) will rely extensively on the following result.

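Proposition 3.2. Let $a, b, c, e, f, g$ be deterministic real parameters with $a > 0$, $g > 0$, $b \ge 0$ and $e \ge 0$. Let $(h_t)_t$ be a stochastic process in $\mathbb{H}^{2,2}_{\mathcal{P}}$ and $x_0 \in L^2(\Omega)$ be $\mathcal{F}_0$-measurable. Define:

\[
\theta_t :=
\begin{cases}
\dfrac{1}{2}\Big(1 + e\sqrt{\dfrac{ag}{b}}\Big) \exp\big(\sqrt{abg}\,(T - t)\big) + \dfrac{1}{2}\Big(1 - e\sqrt{\dfrac{ag}{b}}\Big) \exp\big(\sqrt{abg}\,(t - T)\big) & \text{if } b > 0, \\[2mm]
e a g (T - t) + 1 & \text{if } b = 0,
\end{cases}
\tag{3.2}
\]

\[
p_t := -\frac{d\theta_t}{dt}\, \frac{1}{a g \theta_t}, \tag{3.3}
\]

\[
\pi_t = \frac{1}{\theta_t}\Big( f - \int_t^T \big(a p_s \mathbb{E}_t[h_s] - c\big)\, \theta_s\, ds \Big). \tag{3.4}
\]

Define $x$, $y$ and $v$ by:

\[
\begin{cases}
x_t = x_0 \dfrac{\theta_t}{\theta_0} - \displaystyle\int_0^t (a g \pi_s + a h_s)\, \dfrac{\theta_t}{\theta_s}\, ds, \\[2mm]
y_t = p_t x_t + \pi_t, \\[1mm]
v_t = g p_t x_t + g \pi_{t-} + h_t.
\end{cases}
\tag{3.5}
\]

Then $(x, y, v)$ is a solution in $\mathbb{H}^{\infty,2} \times \mathbb{H}^{\infty,2} \times \mathbb{H}^{2,2}_{\mathcal{P}}$ of the Forward-Backward system:

\[
\begin{cases}
x_t = x_0 - \displaystyle\int_0^t a v_s\, ds, \\[2mm]
y_t = \mathbb{E}_t\Big[ \displaystyle\int_t^T (b x_s + c)\, ds + e x_T + f \Big], \\[2mm]
v_t = g y_{t-} + h_t.
\end{cases}
\tag{3.6}
\]

Besides, for $T$ small enough, this solution to (3.6) is the unique one in $\mathbb{H}^{\infty,2} \times \mathbb{H}^{\infty,2} \times \mathbb{H}^{2,2}_{\mathcal{P}}$. The proof is postponed to Subsection 4.4.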
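To make (3.2)-(3.3) tangible, the sketch below evaluates $\theta$ and $p$ in closed form and checks numerically that $p$ satisfies the Riccati equation $p'_t = a g\, p_t^2 - b$ with $p_T = e$. This equation is not stated above: it is what one obtains by inserting the ansatz $y_t = p_t x_t + \pi_t$ into (3.6), and it is used here only as a sanity check. The function name and the parameter values are illustrative assumptions.

```python
import numpy as np

def theta_and_p(t, T, a, b, e, g):
    """Return (theta_t, p_t) from (3.2)-(3.3) for scalar or array-valued t."""
    t = np.asarray(t, dtype=float)
    if b > 0:
        rate = np.sqrt(a * b * g)
        k = e * np.sqrt(a * g / b)
        grow, decay = np.exp(rate * (T - t)), np.exp(rate * (t - T))
        theta = 0.5 * (1 + k) * grow + 0.5 * (1 - k) * decay
        dtheta = -0.5 * rate * (1 + k) * grow + 0.5 * rate * (1 - k) * decay
    else:
        theta = e * a * g * (T - t) + 1.0
        dtheta = np.full_like(t, -e * a * g)
    return theta, -dtheta / (a * g * theta)

# Illustrative (hypothetical) parameter values, not those of the experiments.
T, a, b, e, g = 1.0, 0.5, 0.1, 2.0, 1.5
t = np.linspace(0.0, T, 11)[1:-1]          # interior points of [0, T]
delta = 1e-5
_, p = theta_and_p(t, T, a, b, e, g)
_, p_plus = theta_and_p(t + delta, T, a, b, e, g)
_, p_minus = theta_and_p(t - delta, T, a, b, e, g)

# Central finite difference of p compared with the Riccati right-hand side.
riccati_residual = (p_plus - p_minus) / (2 * delta) - (a * g * p**2 - b)
print("max |p' - (a g p^2 - b)| =", np.max(np.abs(riccati_residual)))
```

The terminal values $\theta_T = 1$ and $p_T = e$ can be read off the same function by evaluating it at $t = T$.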

Remark 3.3. Uniqueness of the solution of the FBSDE (3.6) could be proved for an arbitrary time horizon $T$, using the fact that (3.6) characterizes the solution of a (linear-quadratic) stochastic control problem which has a unique solution (as the associated cost function is continuous, convex and coercive [Bre10, Corollary 3.23, p. 71]).


3.2.2 Average processes

We introduce the following notations for the average (in the sense of expectation) of the solutions of (3.1): $\bar u := \mathbb{E}[u]$, $\bar X := \mathbb{E}[X^u]$, $\bar Y := \mathbb{E}[Y^u]$, $\bar C := \mathbb{E}[C]$.

By taking the expectation in (3.1), we immediately get the following simple but remarkable result: the average processes do not depend onl.

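Proposition 3.4. Assume (Toy). Then $(\bar u, \bar X, \bar Y)$ solves

\[
\begin{cases}
\bar X_t = \mathbb{E}[x] - \dfrac{1}{E_{\max}} \displaystyle\int_0^t \bar u_s\, ds, \\[2mm]
\bar Y_t = \displaystyle\int_t^T \nu\Big(\bar X_s - \frac{1}{2}\Big)\, ds + \gamma\Big(\bar X_T - \frac{1}{2}\Big), \\[2mm]
\bar u_t = \dfrac{\bar Y_{t-}}{\mu E_{\max}} + \dfrac{\bar C_{t-}}{\mu}.
\end{cases}
\tag{3.7}
\]

In particular, $(\bar u, \bar X, \bar Y)$ does not depend on $l$.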
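The cancellation behind this result can be written in one line. Taking expectations in the third equation of (3.1), the two terms involving $l'$ have the same mean and cancel; the shorthand $D_t$ below is introduced only for readability and is not a notation of the paper.

```latex
\[
\mu \bar u_t - \bar C_{t-}
  - \mathbb{E}\big[l'(D_t)\big] + \mathbb{E}\big[l'(D_t)\big]
  = \frac{\bar Y_{t-}}{E_{\max}},
\qquad
D_t := P^{\mathrm{load}}_{t-} - u_t - \mathbb{E}\big[P^{\mathrm{load}}_{t-} - u_t\big],
\]
\[
\text{hence} \quad
\bar u_t = \frac{\bar Y_{t-}}{\mu E_{\max}} + \frac{\bar C_{t-}}{\mu}.
\]
```

The first two lines of (3.7) follow in the same way, since the forward and backward equations of (3.1) are affine-linear in $(u, X^u)$.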

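Note that the FBSDE (3.7) is explicitly solvable, as a particular case of Equation (3.6) with $x_0 := \mathbb{E}[x]$, $a = \frac{1}{E_{\max}}$, $b = \nu$, $c = -\frac{\nu}{2}$, $e = \gamma$, $f = -\frac{\gamma}{2}$, $g = \frac{1}{\mu E_{\max}}$ and $h_t = \frac{\bar C_{t-}}{\mu}$.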

3.2.3 Notations

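From now on, assume that (Toy) and (ToyBis) hold. From Proposition 3.4, $(\bar u, \bar X, \bar Y)$ does not depend on $\varepsilon$. We denote the processes $u$, $X^u$ and $Y^u$ by $u^{(\varepsilon)}$, $X^{(\varepsilon)}$ and $Y^{(\varepsilon)}$ respectively, to insist on the dependency w.r.t. the parameter $\varepsilon$. $(u^{(\varepsilon)}, X^{(\varepsilon)}, Y^{(\varepsilon)})$ satisfies (3.1) with $l'(x) = \lambda x + \varepsilon(\lambda + \mu) x^+$.

For the ease of the proofs, let us define the recentered processes

\[
u^{\Delta,(\varepsilon)} := u^{(\varepsilon)} - \bar u, \quad X^{\Delta,(\varepsilon)} := X^{(\varepsilon)} - \bar X, \quad Y^{\Delta,(\varepsilon)} := Y^{(\varepsilon)} - \bar Y, \quad P^{\mathrm{load},\Delta} := P^{\mathrm{load}} - \mathbb{E}\big[P^{\mathrm{load}}\big], \quad C^{\Delta} := C - \mathbb{E}[C].
\]

Then, $(u^{\Delta,(\varepsilon)}, X^{\Delta,(\varepsilon)}, Y^{\Delta,(\varepsilon)})$ satisfies:

\[
\begin{cases}
X^{\Delta,(\varepsilon)}_t = x - \mathbb{E}[x] - \dfrac{1}{E_{\max}} \displaystyle\int_0^t u^{\Delta,(\varepsilon)}_s\, ds, \\[2mm]
Y^{\Delta,(\varepsilon)}_t = \mathbb{E}_t\Big[ \displaystyle\int_t^T \nu X^{\Delta,(\varepsilon)}_s\, ds + \gamma X^{\Delta,(\varepsilon)}_T \Big], \\[2mm]
\mu u^{\Delta,(\varepsilon)}_t - C^{\Delta}_{t-} - \lambda\big(P^{\mathrm{load},\Delta}_{t-} - u^{\Delta,(\varepsilon)}_t\big) - \varepsilon(\lambda + \mu)\big(P^{\mathrm{load},\Delta}_{t-} - u^{\Delta,(\varepsilon)}_t\big)^+ + \varepsilon(\lambda + \mu)\, \mathbb{E}\Big[\big(P^{\mathrm{load},\Delta}_{t-} - u^{\Delta,(\varepsilon)}_t\big)^+\Big] = \dfrac{Y^{\Delta,(\varepsilon)}_{t-}}{E_{\max}}.
\end{cases}
\tag{3.8}
\]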

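We now seek a first order expansion of the solution of (3.1) w.r.t. $\varepsilon$, as $\varepsilon \to 0$. Equivalently, as the average processes do not depend on $\varepsilon$ (see Proposition 3.4), we will perform it for the recentered processes, by showing

\[
u^{\Delta,(\varepsilon)} = u^{\Delta,(0)} + \varepsilon \dot u + o(\varepsilon), \quad X^{\Delta,(\varepsilon)} = X^{\Delta,(0)} + \varepsilon \dot X + o(\varepsilon), \quad Y^{\Delta,(\varepsilon)} = Y^{\Delta,(0)} + \varepsilon \dot Y + o(\varepsilon),
\]

where $\dot u$, $\dot X$ and $\dot Y$ are suitable processes in $\mathbb{H}^{2,2}_{\mathcal{P}} \times \mathbb{H}^{2,2} \times \mathbb{H}^{2,2}$ (independent of $\varepsilon$) and the convergence $o(\varepsilon)/\varepsilon \to 0$ as $\varepsilon \to 0$ holds in $\mathbb{H}^{2,2}$-norm.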

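Proposition 3.5. Assume (Toy) and (ToyBis). Then $(u^{\Delta,(0)}, X^{\Delta,(0)}, Y^{\Delta,(0)})$ satisfies:

\[
\begin{cases}
X^{\Delta,(0)}_t = x - \mathbb{E}[x] - \dfrac{1}{E_{\max}} \displaystyle\int_0^t u^{\Delta,(0)}_s\, ds, \\[2mm]
Y^{\Delta,(0)}_t = \mathbb{E}_t\Big[ \displaystyle\int_t^T \nu X^{\Delta,(0)}_s\, ds + \gamma X^{\Delta,(0)}_T \Big], \\[2mm]
u^{\Delta,(0)}_t = \dfrac{Y^{\Delta,(0)}_{t-}}{(\lambda + \mu) E_{\max}} + \dfrac{C^{\Delta}_{t-} + \lambda P^{\mathrm{load},\Delta}_{t-}}{\lambda + \mu}.
\end{cases}
\tag{3.9}
\]

Observe that the solution of the FBSDE (3.9) is known in closed form, as a particular case of Equation (3.6) with $x_0 := x - \mathbb{E}[x]$, $a = \frac{1}{E_{\max}}$, $b = \nu$, $c = 0$, $e = \gamma$, $f = 0$, $g = \frac{1}{(\lambda + \mu) E_{\max}}$ and $h_t = \frac{C^{\Delta}_{t-} + \lambda P^{\mathrm{load},\Delta}_{t-}}{\lambda + \mu}$.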


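Proposition 3.6. Assume (Toy) and (ToyBis). Define the finite differences

\[
\dot u^{(\varepsilon)} := \frac{u^{\Delta,(\varepsilon)} - u^{\Delta,(0)}}{\varepsilon}, \quad \dot X^{(\varepsilon)} := \frac{X^{\Delta,(\varepsilon)} - X^{\Delta,(0)}}{\varepsilon}, \quad \dot Y^{(\varepsilon)} := \frac{Y^{\Delta,(\varepsilon)} - Y^{\Delta,(0)}}{\varepsilon},
\]

which solve

\[
\begin{cases}
\dot X^{(\varepsilon)}_t = -\dfrac{1}{E_{\max}} \displaystyle\int_0^t \dot u^{(\varepsilon)}_s\, ds, \\[2mm]
\dot Y^{(\varepsilon)}_t = \mathbb{E}_t\Big[ \displaystyle\int_t^T \nu \dot X^{(\varepsilon)}_s\, ds + \gamma \dot X^{(\varepsilon)}_T \Big], \\[2mm]
\dot u^{(\varepsilon)}_t = \dfrac{\dot Y^{(\varepsilon)}_{t-}}{(\lambda + \mu) E_{\max}} + \big(P^{\mathrm{load},\Delta}_{t-} - u^{\Delta,(\varepsilon)}_t\big)^+ - \mathbb{E}\Big[\big(P^{\mathrm{load},\Delta}_{t-} - u^{\Delta,(\varepsilon)}_t\big)^+\Big].
\end{cases}
\tag{3.10}
\]

Besides, for small enough time horizon $T$, $(\dot u^{(\varepsilon)}, \dot X^{(\varepsilon)}, \dot Y^{(\varepsilon)})$ is uniformly bounded in $\mathbb{H}^{2,2}_{\mathcal{P}} \times \mathbb{H}^{2,2} \times \mathbb{H}^{2,2}$ as $\varepsilon \to 0$. Define $(\dot u, \dot X, \dot Y)$ as a solution (unique when $T$ is small enough) to

\[
\begin{cases}
\dot X_t = -\dfrac{1}{E_{\max}} \displaystyle\int_0^t \dot u_s\, ds, \\[2mm]
\dot Y_t = \mathbb{E}_t\Big[ \displaystyle\int_t^T \nu \dot X_s\, ds + \gamma \dot X_T \Big], \\[2mm]
\dot u_t = \dfrac{\dot Y_{t-}}{(\lambda + \mu) E_{\max}} + \big(P^{\mathrm{load},\Delta}_{t-} - u^{\Delta,(0)}_t\big)^+ - \mathbb{E}\Big[\big(P^{\mathrm{load},\Delta}_{t-} - u^{\Delta,(0)}_t\big)^+\Big].
\end{cases}
\tag{3.11}
\]

Then, for small enough time horizon $T$, the finite differences $(\dot u^{(\varepsilon)}, \dot X^{(\varepsilon)}, \dot Y^{(\varepsilon)})$ are close (at order 1 in $\varepsilon$) to $(\dot u, \dot X, \dot Y)$:

\[
\|\dot u^{(\varepsilon)} - \dot u\|_{\mathbb{H}^{2,2}_{\mathcal{P}}} + \|\dot X^{(\varepsilon)} - \dot X\|_{\mathbb{H}^{2,2}} + \|\dot Y^{(\varepsilon)} - \dot Y\|_{\mathbb{H}^{2,2}} = O(\varepsilon).
\]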

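The proof is postponed to Subsection 4.5. Note again that the FBSDE (3.11) is explicitly solvable, as a particular case of Equation (3.6) with $x_0 := 0$, $a = \frac{1}{E_{\max}}$, $b = \nu$, $c = 0$, $e = \gamma$, $f = 0$, $g = \frac{1}{(\lambda + \mu) E_{\max}}$ and $h_t = \big(P^{\mathrm{load},\Delta}_{t-} - u^{\Delta,(0)}_t\big)^+ - \mathbb{E}\big[\big(P^{\mathrm{load},\Delta}_{t-} - u^{\Delta,(0)}_t\big)^+\big]$.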

Collecting all the previous results, we get the following theorem, which fully characterizes the first order expansion of the solution to the control problem.

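Theorem 3.7. Assume (Toy) and (ToyBis) hold. For small enough time horizon $T$, the unique solution $(u^{(\varepsilon)}, X^{(\varepsilon)}, Y^{(\varepsilon)})$ in $\mathbb{H}^{2,2}_{\mathcal{P}} \times \mathbb{H}^{2,2} \times \mathbb{H}^{2,2}$ of (3.1) can be expanded at first order w.r.t. $\varepsilon$ (with error of second order as $\varepsilon \to 0$):

\[
u^{(\varepsilon)} = \bar u + u^{\Delta,(0)} + \varepsilon \dot u + O(\varepsilon^2), \quad X^{(\varepsilon)} = \bar X + X^{\Delta,(0)} + \varepsilon \dot X + O(\varepsilon^2), \quad Y^{(\varepsilon)} = \bar Y + Y^{\Delta,(0)} + \varepsilon \dot Y + O(\varepsilon^2),
\]

where errors $O(\varepsilon^2)$ are measured in $\mathbb{H}^{2,2}$-norm, with $(\bar u, \bar X, \bar Y)$ solution of (3.7), $(u^{\Delta,(0)}, X^{\Delta,(0)}, Y^{\Delta,(0)})$ solution of (3.9) and $(\dot u, \dot X, \dot Y)$ solution of (3.11).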

We shall emphasize that all terms in these expansions are solutions of FBSDEs of the form (3.6) for different input parameters (see Table 1) and thus they are explicitly solvable.

For other problems with more regularity (notice that $x \mapsto (x^+)^2$ is not twice continuously differentiable), the previous approach could actually be extended to a second order expansion or even higher order, but it would lead to more and more nested FBSDEs: on the mathematical side, there is no hard obstacle to derive these equations under appropriate regularity conditions. The concerns would be rather on the computational side since it would require larger and larger computational time.

3.3 Effective simulation of first order expansion of optimal control

3.3.1 Models for random uncertainties

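We assume the electricity price $C$ is constant ($\bar C = C$ and $C^{\Delta} = 0$), and we suppose $P^{\mathrm{load}}$ is given by $P^{\mathrm{load}} = P^{\mathrm{cons}} - P^{\mathrm{sun}}$, where $P^{\mathrm{cons}}$ and $P^{\mathrm{sun}}$ are two independent scalar SDEs, representing respectively the consumption and the photovoltaic power production. For the consumption $P^{\mathrm{cons}}$, we use the jump process:

\[
dP^{\mathrm{cons}}_t = -\rho^{\mathrm{cons}} \big(P^{\mathrm{cons}}_t - p^{\mathrm{cons,ref}}_t\big)\, dt + h^{\mathrm{cons}}\, dN^{\mathrm{cons}}_t, \tag{3.12}
\]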


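where $N^{\mathrm{cons}}$ is a compensated Poisson process with intensity $\lambda^{\mathrm{cons}}$. Regarding the PV production, we follow [Bad+18] by setting $P^{\mathrm{sun}} = P^{\mathrm{sun,max}} X^{\mathrm{sun}}$, where $P^{\mathrm{sun,max}} : [0, T] \to \mathbb{R}$ is a deterministic function (the clear sky model) and $X^{\mathrm{sun}}$ solves a Fisher-Wright type SDE whose dynamics are

\[
dX^{\mathrm{sun}}_t = -\rho^{\mathrm{sun}} \big(X^{\mathrm{sun}}_t - x^{\mathrm{sun,ref}}_t\big)\, dt + \sigma^{\mathrm{sun}} \big(X^{\mathrm{sun}}_t\big)^{\alpha} \big(1 - X^{\mathrm{sun}}_t\big)^{\beta}\, dW_t, \tag{3.13}
\]

with $\alpha, \beta \ge 1/2$. As proved in [Bad+18], there is a strong solution to the above SDE and the solution $X^{\mathrm{sun}}$ takes values in $[0, 1]$.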
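For readers who want to reproduce trajectories of (3.12)-(3.13), the sketch below gives one possible Euler-type discretization. It is an assumption-laden illustration, not the scheme used for the experiments: the reference levels are taken constant (the paper uses time-dependent curves), the Euler step for $X^{\mathrm{sun}}$ is clipped to $[0, 1]$ to keep the fractional powers well defined, and the function name and default arguments are hypothetical. The remaining defaults are the values listed in Section 3.3.3.

```python
import numpy as np

def simulate_paths(n_paths, n_steps, tau, rng,
                   rho_cons=0.9, h_cons=5.0, lam_cons=0.5, p_ref=10.0,
                   rho_sun=0.75, x_ref=0.5, sigma_sun=0.8, alpha=0.8, beta=0.7,
                   p_cons0=10.0, x_sun0=0.5):
    """Euler scheme for P^cons (3.12) and X^sun (3.13) with constant references."""
    p_cons = np.empty((n_paths, n_steps + 1))
    x_sun = np.empty((n_paths, n_steps + 1))
    p_cons[:, 0] = p_cons0
    x_sun[:, 0] = x_sun0
    for k in range(n_steps):
        # Increment of the compensated Poisson process over one time step.
        d_n = rng.poisson(lam_cons * tau, size=n_paths) - lam_cons * tau
        p = p_cons[:, k]
        p_cons[:, k + 1] = p - rho_cons * (p - p_ref) * tau + h_cons * d_n
        # Euler-Maruyama step for the Fisher-Wright type diffusion.
        d_w = rng.normal(0.0, np.sqrt(tau), size=n_paths)
        x = x_sun[:, k]
        diffusion = sigma_sun * x**alpha * (1.0 - x) ** beta
        x_next = x - rho_sun * (x - x_ref) * tau + diffusion * d_w
        x_sun[:, k + 1] = np.clip(x_next, 0.0, 1.0)   # clipping is an assumption
    return p_cons, x_sun

rng = np.random.default_rng(0)
p_cons, x_sun = simulate_paths(n_paths=1000, n_steps=48, tau=0.5, rng=rng)
print(p_cons.shape, x_sun.min(), x_sun.max())
```

The PV production is then obtained as $P^{\mathrm{sun}} = P^{\mathrm{sun,max}} X^{\mathrm{sun}}$ once a clear sky profile is supplied.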

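Since the drifts are affine-linear, the conditional expectations of the solutions are known in closed form (this property is intensively used in [BSS05]):

\[
\mathbb{E}_t\big[P^{\mathrm{sun}}_s\big] = \Big( \frac{P^{\mathrm{sun}}_t}{P^{\mathrm{sun,max}}_t} \exp\big(-\rho^{\mathrm{sun}}(s - t)\big) + \int_t^s \rho^{\mathrm{sun}} x^{\mathrm{sun,ref}}_\tau \exp\big(-\rho^{\mathrm{sun}}(s - \tau)\big)\, d\tau \Big) P^{\mathrm{sun,max}}_s, \tag{3.14}
\]

\[
\mathbb{E}_t\big[P^{\mathrm{cons}}_s\big] = P^{\mathrm{cons}}_t \exp\big(-\rho^{\mathrm{cons}}(s - t)\big) + \int_t^s \rho^{\mathrm{cons}} p^{\mathrm{cons,ref}}_\tau \exp\big(-\rho^{\mathrm{cons}}(s - \tau)\big)\, d\tau, \tag{3.15}
\]

for $s \ge t$. This will allow us to speed up computations of the conditional expectations $\mathbb{E}_t\big[P^{\mathrm{load}}_s\big]$ as required when deriving the optimal control.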
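The formulas (3.14)-(3.15) share the same structure, which the sketch below evaluates with a trapezoidal rule. The function name and the constant reference level are assumptions made for the example; with a constant reference $p$, the integral is explicit and the forecast reduces to $p + (P_t - p) e^{-\rho (s - t)}$, which serves as a check.

```python
import numpy as np

def conditional_mean(z_t, t, s, rho, ref, n_quad=2001):
    """E_t[Z_s] for dZ = -rho (Z - ref_t) dt + martingale, as in (3.15).

    ref must be a function accepting a NumPy array of times.
    """
    tau = np.linspace(t, s, n_quad)
    integrand = rho * ref(tau) * np.exp(-rho * (s - tau))
    # Trapezoidal rule on the grid tau.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(tau))
    return z_t * np.exp(-rho * (s - t)) + integral

rho_cons = 0.9                                    # value from Section 3.3.3
p_ref = lambda tau: np.full_like(tau, 10.0)       # hypothetical constant reference

forecast = conditional_mean(15.0, 0.0, 2.0, rho_cons, p_ref)
closed_form = 10.0 + (15.0 - 10.0) * np.exp(-rho_cons * 2.0)
print(forecast, closed_form)
```

For (3.14), the same function can be applied to $X^{\mathrm{sun}}_t = P^{\mathrm{sun}}_t / P^{\mathrm{sun,max}}_t$ with $\rho^{\mathrm{sun}}$ and $x^{\mathrm{sun,ref}}$, and the result multiplied by $P^{\mathrm{sun,max}}_s$.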

3.3.2 FBSDE Parameters

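Algorithm 1 Sample of a path of $(x, y, v)$, solution of (3.6)

1: Inputs: $x_0 \in L^2$, $a > 0$, $b \ge 0$, $c \in \mathbb{R}$, $e \ge 0$, $f \in \mathbb{R}$, $g > 0$, $h \in \mathbb{H}^{2,2}_{\mathcal{P}}$, $N_T > 0$

2: Sample $x_0$ and set $x(0) \leftarrow x_0$. Set $\tau = T / N_T$.

3: for $n = 0, \dots, N_T - 1$ do

4:     Compute the conditional expectations $(\mathbb{E}_{n\tau}[h_s])_{n\tau \le s \le T}$

5:     Compute $\pi(n\tau)$ by numerical integration, as given in (3.4)

6:     Compute $p(n\tau)$ as in (3.3)

7:     $v(n\tau) \leftarrow g\, p(n\tau)\, x(n\tau) + g\, \pi(n\tau) + h(n\tau)$

8:     $x((n + 1)\tau) \leftarrow x(n\tau) - a\, v(n\tau)\, \tau$

9: end for

10: return $(x, y, v)$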

Proposition 3.2 is repeatedly used to solve the affine-linear FBSDEs for $(\bar u, \bar X, \bar Y)$, $(u^{\Delta,(0)}, X^{\Delta,(0)}, Y^{\Delta,(0)})$ and $(\dot u, \dot X, \dot Y)$ arising in the first order expansion of the optimal control w.r.t. $\varepsilon$ (see Theorem 3.7). In Algorithm 1 we give the pseudo-code of the scheme used to compute solutions of FBSDEs of the form (3.6).
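A possible Python transcription of Algorithm 1 is sketched below for the special case of a deterministic input $h$, so that $\mathbb{E}_{n\tau}[h_s] = h(s)$ and Step 4 is trivial (this is the situation of the average processes). It is not the authors' code: the function names, the trapezoidal quadrature and the parameter values in the usage lines are assumptions. For random $h$, the conditional expectations would have to be supplied separately.

```python
import numpy as np

def theta_and_p(t, T, a, b, e, g):
    """theta_t and p_t from (3.2)-(3.3), for scalar or array-valued t."""
    t = np.asarray(t, dtype=float)
    if b > 0:
        rate, k = np.sqrt(a * b * g), e * np.sqrt(a * g / b)
        grow, decay = np.exp(rate * (T - t)), np.exp(rate * (t - T))
        theta = 0.5 * (1 + k) * grow + 0.5 * (1 - k) * decay
        dtheta = -0.5 * rate * (1 + k) * grow + 0.5 * rate * (1 - k) * decay
    else:
        theta = e * a * g * (T - t) + 1.0
        dtheta = np.full_like(t, -e * a * g)
    return theta, -dtheta / (a * g * theta)

def solve_lq_fbsde(x0, T, a, b, c, e, f, g, h, n_steps, n_quad=200):
    """Euler scheme of Algorithm 1 when h is a deterministic function of time."""
    tau = T / n_steps
    grid = tau * np.arange(n_steps + 1)
    x, y, v = (np.empty(n_steps + 1) for _ in range(3))
    x[0] = x0
    for n in range(n_steps + 1):
        t = grid[n]
        theta_t, p_t = theta_and_p(t, T, a, b, e, g)
        s = np.linspace(t, T, n_quad)                  # quadrature nodes on [t, T]
        theta_s, p_s = theta_and_p(s, T, a, b, e, g)
        integrand = (a * p_s * h(s) - c) * theta_s
        integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(s))
        pi_t = (f - integral) / theta_t                # Step 5, formula (3.4)
        y[n] = p_t * x[n] + pi_t                       # second line of (3.5)
        v[n] = g * y[n] + h(t)                         # Step 7
        if n < n_steps:
            x[n + 1] = x[n] - a * v[n] * tau           # Step 8
    return grid, x, y, v

# Usage with illustrative (hypothetical) parameters and h = 0, c = f = 0.
grid, x, y, v = solve_lq_fbsde(1.0, 1.0, 0.5, 0.1, 0.0, 2.0, 0.0, 1.5,
                               lambda s: 0.0 * s, n_steps=2000)
theta, _ = theta_and_p(grid, 1.0, 0.5, 0.1, 2.0, 1.5)
print("max deviation from x0 * theta_t / theta_0:", np.max(np.abs(x - theta / theta[0])))
```

With $h = 0$ and $c = f = 0$, formula (3.5) gives $x_t = x_0 \theta_t / \theta_0$, which the Euler scheme reproduces up to a discretization error of order $\tau$.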

In Table 1, we give the correspondence between the input parameters $(a, b, c, e, f, g, h_t)$ for the generic FBSDE of Proposition 3.2 and the parameters defining the 3 FBSDEs. Merged columns indicate common values of parameters. As the data involved in the system defining $(\bar u, \bar X, \bar Y)$ is deterministic, one only needs to perform numerical integrations to compute $\pi$ and therefore $(\bar u, \bar X, \bar Y)$. For $(u^{\Delta,(0)}, X^{\Delta,(0)}, Y^{\Delta,(0)})$ and $(\dot u, \dot X, \dot Y)$, it becomes a bit more involved. Let us provide some details on the implementation.

• For the computation of $(u^{\Delta,(0)}, X^{\Delta,(0)}, Y^{\Delta,(0)})$, the conditional expectations $(\mathbb{E}_{n\tau}[h_s])_{n\tau \le s \le T}$ are given by affine-linear combinations of $P^{\mathrm{cons}}_{n\tau}$ and $P^{\mathrm{sun}}_{n\tau}$ with deterministic coefficients, depending on $s$ and $n$, by assumption on our models for $P^{\mathrm{cons}}$ and $P^{\mathrm{sun}}$ (see (3.14)-(3.15)). Therefore, $\pi(n\tau)$ is also given by an affine-linear combination of $P^{\mathrm{cons}}_{n\tau}$ and $P^{\mathrm{sun}}_{n\tau}$ with deterministic coefficients. This allows us to speed up Steps 4 and 5 in Algorithm 1.

• For the computation of $(\dot u, \dot X, \dot Y)$, the conditional expectations $\big(\mathbb{E}_{n\tau}\big[\big(P^{\mathrm{load},\Delta}_s - u^{\Delta,(0)}_s\big)^+\big]\big)_{n\tau \le s \le T}$ at Step 4 are estimated by Monte-Carlo methods. The procedure for doing so is given in Algorithm 2. This Step 4 has a complexity of order $O((N_T - n) M_0)$, which is the most costly step in the loop of Algorithm 1; hence sampling $(\dot u, \dot X, \dot Y)$ has a computational cost of order O(N²


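Parameter | $(\bar u, \bar X, \bar Y)$ | $(u^{\Delta,(0)}, X^{\Delta,(0)}, Y^{\Delta,(0)})$ | $(\dot u, \dot X, \dot Y)$
--- | --- | --- | ---
$a$ | $\frac{1}{E_{\max}}$ (common to the three systems) | |
$b$ | $\nu$ (common to the three systems) | |
$c$ | $-\frac{\nu}{2}$ | $0$ (common to the last two systems) |
$e$ | $\gamma$ (common to the three systems) | |
$f$ | $-\frac{\gamma}{2}$ | $0$ (common to the last two systems) |
$g$ | $\frac{1}{\mu E_{\max}}$ | $\frac{1}{(\lambda + \mu) E_{\max}}$ (common to the last two systems) |
$h_t$ | $\frac{\bar C_{t-}}{\mu}$ | $\frac{C^{\Delta}_{t-} + \lambda P^{\mathrm{load},\Delta}_{t-}}{\lambda + \mu}$ | $\big(P^{\mathrm{load},\Delta}_{t-} - u^{\Delta,(0)}_t\big)^+ - \mathbb{E}\big[\big(P^{\mathrm{load},\Delta}_{t-} - u^{\Delta,(0)}_t\big)^+\big]$

Table 1: Table of parameters needed to compute the expansion terms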

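Algorithm 2 Evaluation of $\big(\mathbb{E}_{n\tau}\big[\big(P^{\mathrm{load},\Delta}_s - u^{\Delta,(0)}_s\big)^+\big]\big)_{s = n\tau, \dots, N_T \tau}$

1: Inputs: $n < N_T$, $X^{\Delta,(0)}_{n\tau}$, $P^{\mathrm{sun}}_{n\tau}$, $P^{\mathrm{load}}_{n\tau}$, $M_0 > 0$

2: Initialization: $(R[n], R[n + 1], \dots, R[N_T]) \leftarrow (0, 0, \dots, 0)$.

3: Compute $u^{\Delta,(0)}(n\tau)$ using a similar procedure as in Algorithm 1.

4: $R[n] \leftarrow \big(P^{\mathrm{load},\Delta}_{n\tau} - u^{\Delta,(0)}_{n\tau}\big)^+$.

5: for $m = 1, \dots, M_0$ do

6:     for $k = n + 1, \dots, N_T$ do

7:         Sample $(P^{\mathrm{cons}}_{k\tau}, P^{\mathrm{sun}}_{k\tau})$ conditionally to $(P^{\mathrm{cons}}_{(k-1)\tau}, P^{\mathrm{sun}}_{(k-1)\tau})$ using (3.12)-(3.13), independently from all other random variables simulated so far.

8:         Compute $u^{\Delta,(0)}_{k\tau}$ with Steps 5 to 8 of Algorithm 1 with the data of the FBSDE (3.9). Compute $X^{\Delta,(0)}_{k\tau}$.

9:         $R[k] \leftarrow R[k] + \frac{1}{M_0} \big(P^{\mathrm{load},\Delta}_{k\tau} - u^{\Delta,(0)}_{k\tau}\big)^+$

10:     end for

11: end for

12: return $(R[n], R[n + 1], \dots, R[N_T])$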

3.3.3 Numerical values of parameters

We report the values chosen for the next experiments.

Parameters for smart grid. We consider the following values for the time horizon, the size of the storage system and the initial value of its normalized state of charge.

Parameter | $T$ | $E_{\max}$ | $x_0$
--- | --- | --- | ---
Value | 24 h | 200 kWh | 0.5

Parameters for uncertain consumption/production. The following table gives the values of the parameters used in the modeling of the underlying exogenous stochastic processes impacting the system.


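Process | Parameter | Value
--- | --- | ---
$P^{\mathrm{sun}}$ | $\rho^{\mathrm{sun}}$ | 0.75 h$^{-1}$
$P^{\mathrm{sun}}$ | $x^{\mathrm{sun,ref}}_t$ | 0.5
$P^{\mathrm{sun}}$ | $\sigma^{\mathrm{sun}}$ | 0.8
$P^{\mathrm{sun}}$ | $\alpha$ | 0.8
$P^{\mathrm{sun}}$ | $\beta$ | 0.7
$P^{\mathrm{sun}}$ | $P^{\mathrm{sun,max}}$ | see Figure 2a
$P^{\mathrm{cons}}$ | $\rho^{\mathrm{cons}}$ | 0.9 h$^{-1}$
$P^{\mathrm{cons}}$ | $p^{\mathrm{cons,ref}}$ | see Figure 2b
$P^{\mathrm{cons}}$ | $h^{\mathrm{cons}}$ | 5 kW
$P^{\mathrm{cons}}$ | $\lambda^{\mathrm{cons}}$ | 0.5 h$^{-1}$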

In Figure 2, we plot the time-evolution of the deterministic functions $P^{\mathrm{sun,max}}$ and $p^{\mathrm{cons,ref}}$, 10 independent samples of processes $P^{\mathrm{sun}}$ and $P^{\mathrm{cons}}$, and the time-evolution of quantiles (computed with $M_1 = 100000$ i.i.d. simulations).

Parameters of input data and optimization problem. The values of the parameters of the optimization problem are chosen such that:

1. the state of charge of the battery remains close to a reference level, which we set to $0.5$,

2. we observe a clear reduction of the random fluctuation of $P^{\mathrm{grid}}$ on the time interval.

The following table gives the values of the parameters of the cost functional of the control problem.

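Parameter | $\varepsilon$ | $\lambda$ | $\mu$ | $\nu$ | $\gamma$ | $C$
--- | --- | --- | --- | --- | --- | ---
Value | 0.2 | 0.49 | 0.01 | 0.1 | 500 | 0.27
Unit | - | euros.kW$^{-2}$.h$^{-1}$ | euros.kW$^{-2}$.h$^{-1}$ | euros.h$^{-1}$ | euros | euros.kW.h$^{-1}$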

Time discretization. The average processes $(\bar u, \bar X)$ are computed explicitly (up to numerical integration), while the recentered processes $(u^{\Delta,(0)}, X^{\Delta,(0)})$ and first order correction processes $(\dot u, \dot X)$ are computed using discretization schemes (detailed in Algorithms 1 and 2) with time-step equal to $0.5$ h.

Monte-Carlo simulations. To compute the first order correction, we need Monte-Carlo estimations, as explained in Algorithm 2. We choose $M_0 = 4000$. For assessing the statistical performances of the optimal control associated to a symmetric loss function ($\varepsilon = 0$), we consider $M_1 = 100000$ macro-runs. Among those $M_1$ trajectories, we only consider the first $M_2 = 4000$ trajectories for the computation of the first order corrections associated to $\varepsilon = 0.2$.

3.3.4 Results from the experiments

Computational time. The simulations have been performed in Python 3.7, on an Intel Core i7 PC at 2.1 GHz with 16 GB of memory. We have computed the optimal control associated to a symmetric penalization of deviations of $P^{\mathrm{grid}}$ from its average ($\varepsilon = 0$) for $M_1 = 100000$ i.i.d. simulations, which takes about 3 seconds. The computation of the first order correction when $\varepsilon = 0.2$ for $M_2 = 4000$ i.i.d. simulations takes about 80 minutes.

Reduction of fluctuations. We plot the time-evolution of quantiles (see Figure 3) of the power supplied by the network in 3 cases: using no flexibility, with optimal control of the battery with $\varepsilon = 0$, and with the approximated optimal control associated to $\varepsilon = 0.2$ respectively. The comparison of the first graph with the two others shows that the quantiles are much closer to each other in the case of storage use, meaning that the variability of the power supplied by the grid has been much reduced, as expected. However, the difference between the optimal control with symmetric and asymmetric loss functions is hardly visible on these plots.

Impact of first order correction. Overall, the effect of the first order correction $\dot u$ (which has theoretically an average value of $0$) is to lower the probability of large upper deviations of $P^{\mathrm{grid}}$ from its expectation. This is quite visible if


Figure 2: Graphical statistics of the evolution of $P^{\mathrm{sun}}$ and $P^{\mathrm{cons}}$. Panels: (a) time evolution of $P^{\mathrm{sun,max}}$, accounting for clear sky model; (b) time evolution of $p^{\mathrm{cons,ref}}$, accounting for intraday peaks; (c) example trajectories of $P^{\mathrm{sun}}$; (d) example trajectories of $P^{\mathrm{cons}}$; (e) time evolution of quantiles of $P^{\mathrm{sun}}$; (f) time evolution of quantiles of $P^{\mathrm{cons}}$.

we plot the time-evolution of quantiles of the deviations $P^{\mathrm{grid}}(t) - \mathbb{E}\big[P^{\mathrm{grid}}(t)\big]$ for the case $\varepsilon = 0$, in green in Figure 4, referred to as "LQ", and $\varepsilon = 0.2$, in red, referred to as "First Order Correction". In Figure 4, we have represented, from top to bottom, the quantiles of $P^{\mathrm{grid}}(t) - \mathbb{E}\big[P^{\mathrm{grid}}(t)\big]$ associated to levels 99%, 95%, 80%, 50%, 20%, 5% and 1%. We observe that the empirical estimations of the lower quantiles are left unchanged, while the upper quantiles 99% and 95% have been notably decreased, which was the effect sought by the choice of this loss function. To have an even clearer visualization of the change of distribution of the deviations $P^{\mathrm{grid}} - \mathbb{E}\big[P^{\mathrm{grid}}\big]$, we have represented the empirical histograms of $P^{\mathrm{grid}}(T) - \mathbb{E}\big[P^{\mathrm{grid}}(T)\big]$ for both cases $\varepsilon = 0$ in Figure 5a (M


Figure 3: Quantiles of $P^{\mathrm{grid}}$ as a function of time. Panels: (a) without flexibility; (b) with controlled flexibility ($\varepsilon = 0$); (c) with controlled flexibility ($\varepsilon = 0.2$).

Figure 4: Time evolution of quantiles of deviations $P^{\mathrm{grid}} - \mathbb{E}\big[P^{\mathrm{grid}}\big]$

$\varepsilon = 0.2$ in Figure 5b ($M_2 = 4000$ i.i.d. simulations). Observe that the impact of the first order term is to break the symmetry of the distribution around $0$, and to reduce the probability of the highest values of $P^{\mathrm{grid}}(T) - \mathbb{E}\big[P^{\mathrm{grid}}(T)\big]$. These results suggest that we have reached our goal of reducing the probability of high upper deviations of $P^{\mathrm{grid}}$ from its average.


Figure 5: Empirical histograms of deviations $P^{\mathrm{grid}}(T) - \mathbb{E}\big[P^{\mathrm{grid}}(T)\big]$. Panels: (a) symmetric penalization ($\varepsilon = 0$); (b) asymmetric penalization ($\varepsilon = 0.2$).

Distribution of state of charge of the battery. As the first order correction term has only a minor impact on the distribution of the state of charge of the storage system, we only consider the case with $\varepsilon = 0$ in this paragraph. Figure 6 shows the time-evolution of the quantiles 95%, 50%, 5% of the state of charge of the battery with $\varepsilon = 0$ (computed using $M_1 = 100000$ i.i.d. simulations) for several initial conditions on the state of charge of the battery, namely $x_0 = 0.75$, $x_0 = 0.5$ and $x_0 = 0.25$. What we observe is that, independently of the initial condition chosen, the state of charge remains between $0.15$ and $0.75$ with high probability. Besides, the terminal distribution of the state of charge is quite independent of the initial condition: the terminal values of the quantiles (levels 95%, 50%, 5%) of the state of charge are almost the same for all initial conditions $x_0 = 0.75$, $x_0 = 0.5$ and $x_0 = 0.25$. This is presumably due to the term in the cost functional which penalizes the deviations of the state of charge from a medium value (here $1/2$).

Figure 6: Time evolution of the empirical quantiles 95%, 50%, 5% of the state of charge of the storage system

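Simulation-based bound on approximation error of first order expansion. Following the proof of Proposition 3.6, with our choice of parameters, we obtain an upper bound on the error in the approximation of the optimal control $u^{(\varepsilon)}$:

\[
\|u^{(\varepsilon)} - \bar u - u^{\Delta,(0)} - \varepsilon \dot u\|_{\mathbb{H}^{2,2}} = \varepsilon \|\dot u^{(\varepsilon)} - \dot u\|_{\mathbb{H}^{2,2}} \le \frac{4 \varepsilon^2}{(1 - \alpha(T))(1 - \alpha(T) - 2\varepsilon)} \big\|\big(P^{\mathrm{load},\Delta} - u^{\Delta,(0)}\big)^+\big\|_{\mathbb{H}^{2,2}}.
\]

We would like to obtain a bound on the relative error committed, $\frac{\|u^{(\varepsilon)} - \bar u - u^{\Delta,(0)} - \varepsilon \dot u\|_{\mathbb{H}^{2,2}}}{\|u^{(\varepsilon)}\|_{\mathbb{H}^{2,2}}}$, for which we use the triangle inequality:

\[
\begin{aligned}
\frac{\|u^{(\varepsilon)} - \bar u - u^{\Delta,(0)} - \varepsilon \dot u\|_{\mathbb{H}^{2,2}}}{\|u^{(\varepsilon)}\|_{\mathbb{H}^{2,2}}}
&\le \frac{\|u^{(\varepsilon)} - \bar u - u^{\Delta,(0)} - \varepsilon \dot u\|_{\mathbb{H}^{2,2}}}{\|\bar u + u^{\Delta,(0)} + \varepsilon \dot u\|_{\mathbb{H}^{2,2}} - \|u^{(\varepsilon)} - \bar u - u^{\Delta,(0)} - \varepsilon \dot u\|_{\mathbb{H}^{2,2}}} \\
&\le \frac{4 \varepsilon^2}{(1 - \alpha(T))(1 - \alpha(T) - 2\varepsilon)}\, \frac{\big\|\big(P^{\mathrm{load},\Delta} - u^{\Delta,(0)}\big)^+\big\|_{\mathbb{H}^{2,2}}}{\|\bar u + u^{\Delta,(0)} + \varepsilon \dot u\|_{\mathbb{H}^{2,2}} - \|u^{(\varepsilon)} - \bar u - u^{\Delta,(0)} - \varepsilon \dot u\|_{\mathbb{H}^{2,2}}} \\
&\le \frac{4 \varepsilon^2 \big\|\big(P^{\mathrm{load},\Delta} - u^{\Delta,(0)}\big)^+\big\|_{\mathbb{H}^{2,2}}}{(1 - \alpha(T))(1 - \alpha(T) - 2\varepsilon)\, \|\bar u + u^{\Delta,(0)} + \varepsilon \dot u\|_{\mathbb{H}^{2,2}} - 4 \varepsilon^2 \big\|\big(P^{\mathrm{load},\Delta} - u^{\Delta,(0)}\big)^+\big\|_{\mathbb{H}^{2,2}}}.
\end{aligned}
\]

In the last inequality, we used the fact that $\|u^{(\varepsilon)} - \bar u - u^{\Delta,(0)} - \varepsilon \dot u\|_{\mathbb{H}^{2,2}}$ is asymptotically small compared to $\|\bar u + u^{\Delta,(0)} + \varepsilon \dot u\|_{\mathbb{H}^{2,2}}$ when $\varepsilon$ goes to 0, as well as the previous bound on $\|u^{(\varepsilon)} - \bar u - u^{\Delta,(0)} - \varepsilon \dot u\|_{\mathbb{H}^{2,2}}$. Hence we obtain an upper bound which depends only on quantities which can be estimated by simulations in the algorithm. This is very convenient to assess the relative accuracy of our approximation. The right-hand side of the last inequality is estimated using the $M_2 = 4000$ simulations of the first order expansion and we find a value of 0.03. In other words, the relative error is smaller than 3% when taking the first order expansion of the control instead of its true value. Note that we do not take into account errors due to the time discretization or due to residual noise in the Monte-Carlo estimations.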
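The final bound is a simple function of three estimated quantities. The sketch below is a hypothetical helper, not the authors' code: `alpha_T` stands for the quantity $\alpha(T)$ from the proof of Proposition 3.6 and must be supplied by the caller, and the two norms are assumed to have been estimated beforehand by Monte-Carlo. The numbers in the usage line are placeholders chosen for illustration, not results of the paper.

```python
def relative_error_bound(eps, alpha_T, norm_pos_part, norm_expansion):
    """Upper bound on the relative error of the first order expansion.

    norm_pos_part  : estimate of ||(P^{load,Delta} - u^{Delta,(0)})^+|| in H^{2,2}
    norm_expansion : estimate of ||u_bar + u^{Delta,(0)} + eps * u_dot|| in H^{2,2}
    """
    factor = (1.0 - alpha_T) * (1.0 - alpha_T - 2.0 * eps)
    if factor <= 0.0:
        raise ValueError("the bound requires 1 - alpha(T) - 2 eps > 0 and alpha(T) < 1")
    numerator = 4.0 * eps**2 * norm_pos_part
    denominator = factor * norm_expansion - numerator
    if denominator <= 0.0:
        raise ValueError("the expansion norm is too small for the bound to be informative")
    return numerator / denominator

# Placeholder inputs, for illustration only.
bound = relative_error_bound(eps=0.2, alpha_T=0.1, norm_pos_part=1.0, norm_expansion=10.0)
print(f"relative error bound: {bound:.4f}")
```

The two guards reflect the conditions under which the chain of inequalities above is meaningful: the contraction factor must be positive, and the denominator of the last bound must remain positive.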

4 Proofs

4.1 Proof of Theorem 2.2

a) Observe first that, in view of (H.φ) and Lemma 4.1, $\tilde L^u$ and $(\tilde L^u)^{-1}$ are uniformly bounded by a constant $dt \times d\mathbb{P}$-a.e. (take $A : (t, \omega) \mapsto \nabla_x \phi(t, \omega, u_t, X^u_t)$). Therefore, and owing to (H.x)-(H.g)-(H.l)-(H.k)-(H.ψ)-(H.φ), the random variable inside the conditional expectation defining $Y^u$ in (2.4) is bounded by

\[
\Gamma_T := C_T \Big( C^{(0)}_\psi + |X^u_T| + \mathbb{E}[|k(0)|] + \mathbb{E}\big[|X^u_T|\big] \Big) + C_T \int_0^T \Big( C^{(0)}_l(s) + |X^u_s| + |u_s| + \mathbb{E}\big[|g(s, 0, 0)|\big] + \mathbb{E}[|u_s|] + \mathbb{E}\big[|X^u_s|\big] + \mathbb{E}\big[C^{(0)}_l(s)\big] \Big)\, ds
\]

for some constant $C_T$ depending on the bounds in (H.x)-(H.g)-(H.l)-(H.k)-(H.ψ)-(H.φ). Hence by the Cauchy-Schwarz inequality, for some other constant $C_T$:

\[
\mathbb{E}\big[|\Gamma_T|^2\big] \le C_T \Big( \mathbb{E}\Big[\big(C^{(0)}_\psi\big)^2\Big] + \mathbb{E}\big[|X^u_T|^2\big] + \mathbb{E}[|k(0)|]^2 \Big) + C_T \int_0^T \Big( \mathbb{E}\Big[\big(C^{(0)}_l(s)\big)^2\Big] + \mathbb{E}\big[|g(s, 0, 0)|\big]^2 + \mathbb{E}\big[|u_s|^2\big] + \mathbb{E}\big[|X^u_s|^2\big] \Big)\, ds.
\]

Note that this bound is finite and independent of $t$ (since $C^{(0)}_\psi \in L^2_\Omega$, $X^u \in \mathbb{H}^{\infty,2} \subset \mathbb{H}^{2,2}$, $k(0) \in L^1$, $C^{(0)}_l \in \mathbb{H}^{2,2}$, $g(\cdot, 0, 0) \in \mathbb{H}^{2,1}$ and $u \in \mathbb{H}^{2,2}$). Consequently $Y^u \in \mathbb{H}^{\infty,2}$.

b) Now observe that, by definition of $Y^u$,

\[
N^u_t := \tilde L^u_t Y^u_t + \int_0^t \tilde L^u_s \Big( \nabla_x l(s, u_s, X^u_s, \bar g^u_s) + \nabla_x g(s, u_s, X^u_s)\, \mathbb{E}\big[\nabla_{\bar g} l(s, u_s, X^u_s, \bar g^u_s)\big] \Big)\, ds = \mathbb{E}_t\big[N^u_T\big], \tag{4.1}
\]

with $N^u_T$ square integrable (using the same arguments as before) and therefore, $N^u$ is a càdlàg martingale in $\mathbb{H}^{\infty,2}$. As a by-product, we obtain that $Y^u$ is a semi-martingale, whose dynamics now have to be identified.

c) To justify that $Y^u$ defined in (2.4) solves the BSDE (2.5) for some $M^u$, left-multiply both sides of (4.1) by $(\tilde L^u_t)^{-1}$, then apply the integration by parts formula in [Pro03, Corollary 2, p. 68] to $(\tilde L^u)^{-1} N^u$ and use the fact that $(\tilde L^u)^{-1}$ is continuous with finite variation. After reorganizing terms and using that $N^u$ has countably many jumps, we retrieve (2.5) with $M^u_t := \int_0^{t+} (\tilde L^u_s)^{-1}\, dN^u_s$ (which is also a càdlàg martingale in $\mathbb{H}^{\infty,2}$, see [Pro03, Theorem 20 p. 63, Corollary 3 p. 73, Theorem 29 p. 75]).
