Empirical Regression Method for Backward Doubly Stochastic Differential Equations

HAL Id: hal-01152886

https://hal.archives-ouvertes.fr/hal-01152886

Preprint submitted on 21 May 2015


Empirical Regression Method for Backward Doubly Stochastic Differential Equations

Achref Bachouch

Humboldt-Universität zu Berlin

Emmanuel Gobet

Ecole Polytechnique

Anis Matoussi

University of Le Mans

May 18, 2015

Abstract

In this paper we design a numerical scheme for approximating Backward Doubly Stochastic Differential Equations (BDSDEs for short), which represent solutions of Stochastic Partial Differential Equations (SPDEs). We first apply a time discretization and then decompose the value function on a basis of functions. The basis functions are deterministic and depend only on time-space variables, while the decomposition coefficients depend on the external Brownian motion $B$.

The coefficients are evaluated through an empirical regression scheme, performed conditionally on $B$. We establish non-asymptotic error estimates, conditionally on $B$, and deduce how to tune the parameters to obtain convergence both conditionally and unconditionally on $B$. We also provide numerical experiments.

Keywords: Backward Doubly Stochastic Differential Equations, discrete Dynamic Programming Equations, empirical regression scheme, SPDEs.

MSC2010: Primary 60H10, Secondary 62G08

1 Introduction

Backward Doubly Stochastic Differential Equations (BDSDEs for short) are classical tools to give Feynman-Kac representations for stochastic semilinear PDEs; see the seminal work [PP94]. The BDSDE $(Y^{t,x}, Z^{t,x})$ of our interest has the following form:
$$Y_s^{t,x} = \Phi(X_T^{t,x}) + \int_s^T f(r, X_r^{t,x}, Y_r^{t,x}, Z_r^{t,x})\,dr + \int_s^T h(r, X_r^{t,x}, Y_r^{t,x}, Z_r^{t,x})\,\overleftarrow{dB_r} - \int_s^T Z_r^{t,x}\,dW_r, \qquad (1)$$

Institute for Mathematics, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany. Email: achref.bachouch@gmail.com.

Centre de Mathématiques Appliquées, Ecole Polytechnique and CNRS, Route de Saclay, 91128 Palaiseau Cedex, France. Email: emmanuel.gobet@polytechnique.edu.

Risk and Insurance Institute and Laboratoire Manceau de Mathématiques, University of Le Mans, Avenue Olivier Messiaen, 72085 Le Mans Cedex 09, France. Email: anis.matoussi@univ-lemans.fr.

where $(X_s^{t,x})_{t\le s\le T}$ is a $d$-dimensional diffusion process starting from $x$ at time $t$, driven by the finite $d$-dimensional Brownian motion $(W_t)_{0\le t\le T}$. Here $T>0$ is fixed and the differential term with $\overleftarrow{dB_t}$ refers to the backward stochastic integral with respect to an $l$-dimensional Brownian motion $B$ independent of $W$. In addition, $W$ and $B$ are defined on a filtered probability space $(\Omega, \mathcal F, \mathbb P)$, where we define the sigma-fields $\mathcal F^W_{t,s} := \sigma\{W_r - W_t,\ t\le r\le s\}$, $\mathcal F^B_{s,T} := \sigma\{B_r - B_s,\ s\le r\le T\}$, $\mathcal F^W := \mathcal F^W_{0,T}$, $\mathcal F^B := \mathcal F^B_{0,T}$, $\mathcal F := \mathcal F^W \vee \mathcal F^B$, all completed with the $\mathbb P$-null sets. To account for the measurability of the solution to (1), we define the collection of sigma-fields (for fixed $t \in [0,T]$)
$$\mathcal F^t_s := \mathcal F^W_{t,s} \vee \mathcal F^B_{s,T},$$
and we know that the solution is such that $Y_s^{t,x}$ is $\mathcal F^t_s$-measurable for any $s\in[t,T]$ and $Z_s^{t,x}$ is $\mathcal F^t_s$-measurable for a.e. $s\in[t,T]$.

Moreover, the random map $(t,x)\mapsto (Y_t^{t,x}, Z_t^{t,x})$ provides the solution of the following SPDE, and its gradient times $\sigma$, at the point $(t,x)$:
$$u(t,x) = \Phi(x) + \int_t^T \big[\mathcal L u(s,x) + f\big(s,x,u(s,x),(\nabla_x u\,\sigma)(s,x)\big)\big]\,ds + \int_t^T g\big(s,x,u(s,x),(\nabla_x u\,\sigma)(s,x)\big)\,\overleftarrow{dB_s},$$
where $\mathcal L$ is the infinitesimal generator of $X$ (see [PP94, Theorem 3.1] for details).

Such SPDEs appear in various applications such as pathwise stochastic control problems, the Zakai equation in filtering, and stochastic control with partial observations.

Several generalizations to more general nonlinear SPDEs have been developed, following different notions of weak solutions, namely Sobolev solutions [K99, BM01, MS02] and stochastic viscosity solutions [LS98, BuM01, LS02]. Generally, the approaches used to solve SPDEs numerically are analytic and based on time-space discretizations of the equations, achieved by methods such as finite differences, finite elements and spectral Galerkin methods [GN95, G99, W05, GK10, JK10].

Only recently have some works paid attention to the simulation and approximation of (1): see [Abo09, Ama13, BBMM13] for time discretization under various assumptions, [Abo11] for an attempt at an implementable numerical scheme using regression methods without full convergence results, and [SYY08] for a scheme based on random walks.

In this work, we consider an empirical regression scheme (also known as regression Monte-Carlo or least-squares Monte-Carlo) for solving the discrete-time BDSDE arising in [BBMM13]. This approach (inspired by [GLW05, LGW06] and more recently by [GT15b]) is increasingly popular and known to handle high-dimensional problems well, in contrast to schemes based on random walks. Our original contribution is the analysis of the regression scheme for approximating BDSDEs and its proof of convergence, with non-asymptotic error estimates giving the most accurate possible control of the convergence w.r.t. all the parameters. Here we adapt the tools for the regression error analysis of discrete BSDE approximations, developed recently in [GT15b] in a quite general context; these tools allow us to analyse the regression error in the doubly stochastic framework.

We recall the different strategies of approximation using least-squares algorithms, to better motivate our approach. For the sake of clarity, assume standard Lipschitz and boundedness assumptions (detailed later) and $t=0$. Start with the case $h\equiv 0$, i.e. the usual BSDE case, and consider a time-discretization scheme which takes the form (in [LGW06]) of a One-step forward Dynamic Programming (ODP for short) equation: $Y_{t_N} = \Phi(X_{t_N})$ and, for all $i \in \{N-1,\dots,0\}$,
$$Y_{t_i} = \mathbb E\big[Y_{t_{i+1}} + f(t_i, X_{t_i}, Y_{t_{i+1}}, Z_{t_i})\Delta_i \mid \mathcal F^W_{0,t_i}\big], \qquad (2)$$
$$\Delta_i Z_{t_i} = \mathbb E\big[Y_{t_{i+1}} \Delta W_i^\top \mid \mathcal F^W_{0,t_i}\big],$$
where $\top$ denotes the transpose operator, $t_i \in \pi$, $\pi := \{t_0 := 0, \dots, t_N := T\}$ being a discrete time grid of the interval $[0,T]$, $\Delta_i := t_{i+1} - t_i$ and $\Delta W_i := W_{t_{i+1}} - W_{t_i}$. Since $(X_{t_i})_i$ forms a Markov chain, there exist deterministic (but unknown) measurable functions $y_i(.)$ and $z_i(.)$ such that $Y_{t_i} = y_i(X_{t_i})$ and $Z_{t_i} = z_i(X_{t_i})$. The functions $y_i(.)$ and $z_i(.)$ are solutions of least-squares problems in $L_2(\Omega, \mathbb P, \mathcal F_{t_i})$ and can be approximated on a finite-dimensional subspace, whose coefficients are computed using Monte-Carlo simulations. Now, for the case $h \neq 0$, a similar algorithm is proposed in [Abo11], where the equation for $Y_{t_i}$ is replaced by

$$Y_{t_i} = \mathbb E\big[Y_{t_{i+1}} + f(t_i, X_{t_i}, Y_{t_{i+1}}, Z_{t_i})\Delta_i + h(t_i, X_{t_i}, Y_{t_{i+1}}, Z_{t_{i+1}})\Delta B_i \mid \mathcal F^W_{0,t_i} \vee \mathcal F^B_{0,T}\big],$$
where $\Delta B_i := B_{t_{i+1}} - B_{t_i}$, and similarly for the $Z$-component. The author has then designed an empirical least-squares algorithm by taking approximations in a space of functions of the variables $(X_{t_i}, B_{t_{k+1}} - B_{t_k} : i \le k \le N)$: thus the dimension of this problem is $\dim = d + l\times N$, which goes to infinity as the discretization parameter $N \to +\infty$. Since we know from the usual error analysis on BSDEs [LGW06, GT15b, GT15a] that the convergence rates are of the form $N^{-c_1/(c_2 + \dim)}$, where $c_1, c_2$ are positive constants and $\dim$ is the dimension of the explanatory variables, it seems hopeless to establish the convergence of the above algorithm.

Our strategy of approximation is different from the above, and it leads to a convergent scheme. It is inspired by viewing the SPDE as a PDE driven by an auxiliary independent noise (here the Brownian motion $B$): we compute $Y_{t_i}$ as a function $x \mapsto y_i(\Delta B, x)$ for given Brownian increments $\Delta B$. As a consequence, the dimension of the problem is still $d$, but the regression schemes and the error analysis are to be performed conditionally on $B$. This raises new difficulties, in particular because the theory of BDSDEs is well posed unconditionally on $B$. Moreover, as a difference with [Abo11], we incorporate in our scheme an additional improvement inspired by [GT15b] in the BSDE setting, where the discrete BSDE is considered in the form of a Multi-step forward Dynamic Programming (MDP for short) equation given by

$$Y_{t_i} = \mathbb E\Big[\Phi(X_{t_N}) + \sum_{k=i}^{N-1} f(t_k, X_{t_k}, Y_{t_{k+1}}, Z_{t_k})\Delta_k \,\Big|\, \mathcal F^W_{0,t_i}\Big],$$

and similarly for $Z_{t_i}$. Using the tower property of conditional expectations, we note that the ODP (based on (2)) and MDP formulations coincide. But combined with empirical regression approximations they differ, and it is proved in [GT15b] that the MDP scheme leads to better error estimates than the ODP scheme, in particular for the $Y$-component: indeed, the quadratic error is the average of the local error terms rather than their sum.

In this work, we specialise our analysis to the case where $f$ and $h$ do not depend on $z$, i.e. we only approximate $Y$. We believe this simplification makes for easier reading (even in the "simple" BSDE case, as in [GT15b, GT15a], the analysis is rather involved), and the essence of our methodology remains unchanged if $f$ and $h$ depend on $z$. This simplified setting already raises new issues (about a priori estimates and stability), which we partly overcome but which will still deserve deeper investigation in the future in order to handle more general $f$ and $h$.

The organisation of the paper is as follows. In Section 2 we give preliminaries on BDSDEs and the assumptions we will use; we then define the discrete BDSDE to be solved, in MDP form, and establish a priori estimates that will be useful in the regression error analysis. In Section 3, we present the Least-Squares MDP algorithm designed to approximate the solution of the discrete BDSDE of Section 2, and we give the full analysis of the regression error, conditionally and unconditionally on the Brownian motion $B$. Section 4 is dedicated to numerical tests.

Usual notations. If $x$ is in a Euclidean space $E$, $|x|$ denotes its norm. If $\varphi$ is a vector-valued function defined on $E$, $|\varphi|_\infty$ denotes its sup-norm. If $\nu$ is a probability measure on $E$, $|.|_\nu$ stands for the $L_2$-norm w.r.t. the measure $\nu$. If $X$ is an $E$-valued random variable with distribution $\nu$, we may write $|.|_X := |.|_\nu$. Last, if $A$ is a matrix, $|A|$ stands for its Hilbert-Schmidt norm.

2 Preliminaries and notations

This section gathers preliminary results to be used in the discussion of the approximation of the solution and its convergence. Actually, in the sequel we only consider the solution $(X_t, Y_t, Z_t)$ with initial condition $t=0$: extending the results to other $t$ is rather straightforward. Thus, from now on, we omit the dependence w.r.t. $t$ and simply write $(X, Y, Z)$; the starting point $X_0$ is given.

2.1 Forward Backward Doubly Stochastic Differential Equation

Recall the setting related to the filtered probability space given in the introduction. Let $x \in \mathbb R^d$ be given and consider $(X_s)_{0\le s\le T}$ the solution of the following SDE:
$$dX_s = b(X_s)\,ds + \sigma(X_s)\,dW_s \quad \text{for } s\in[0,T], \qquad X_0 = x, \qquad (3)$$
where $b$ and $\sigma$ are two given functions on $\mathbb R^d$, with values in $\mathbb R^d$ and $\mathbb R^d \otimes \mathbb R^d$ respectively, satisfying the following standard Lipschitz assumption.

Assumption (H1). There exists a non-negative constant $K$ such that
$$|b(x) - b(x')| + |\sigma(x) - \sigma(x')| \le K|x - x'|, \qquad \forall x, x' \in \mathbb R^d.$$
This implies the existence of a unique strong solution to (3). Besides, we consider the following BDSDE:
$$\begin{cases} -dY_s = f(s, X_s, Y_s)\,ds + h(s, X_s, Y_s)\,\overleftarrow{dB_s} - Z_s\,dW_s, & s \in [0,T],\\ \ \ Y_T = \Phi(X_T) =: \xi, \end{cases} \qquad (4)$$
where $f$ and $h$ are respectively real-valued and $\mathbb R^l$-valued functions on $[0,T]\times\mathbb R^d\times\mathbb R$, and $\Phi$ is a real-valued function on $\mathbb R^d$ ($h$ is considered as a row vector).

Our standing assumptions to study (4) are the following.

Assumption (H2). There exist non-negative constants $C_f$, $C_h$, $C_\xi$, $L_f$, $L_h$ and $L_\xi$ such that

i) $|f(s_1,x_1,y_1) - f(s_2,x_2,y_2)| \le L_f\big(\sqrt{|s_1-s_2|} + |x_1-x_2| + |y_1-y_2|\big)$ for all $s_1, s_2 \in [0,T]$, $x_1, x_2 \in \mathbb R^d$ and $y_1, y_2 \in \mathbb R$;

ii) $|h(s_1,x_1,y_1) - h(s_2,x_2,y_2)| \le L_h\big(\sqrt{|s_1-s_2|} + |x_1-x_2| + |y_1-y_2|\big)$ for all $s_1, s_2 \in [0,T]$, $x_1, x_2 \in \mathbb R^d$ and $y_1, y_2 \in \mathbb R$;

iii) $|f(s,x,0)|$ and $|h(s,x,0)|$ are uniformly bounded on $[0,T]\times\mathbb R^d$ by $C_f$ and $C_h$ respectively;

iv) $|\Phi(x_1) - \Phi(x_2)| \le L_\xi|x_1-x_2|$ for all $x_1, x_2 \in \mathbb R^d$;

v) $\Phi$ is uniformly bounded on $\mathbb R^d$ by $C_\xi$.

Pardoux and Peng [PP94, Theorem 1.1] proved that, under the previous assumptions, there exists a unique solution $(Y,Z) \in \mathcal S^2([0,T]) \times \mathcal H^2_d([0,T])$ to the BDSDE (4), where

• $\mathcal H^2_d([0,T])$ denotes the set of (classes of $d\mathbb P\times dt$ a.e. equal) $\mathbb R^d$-valued jointly measurable processes $\{\psi_s;\ s\in[0,T]\}$ such that $\mathbb E\big[\int_0^T |\psi_s|^2\,ds\big] < +\infty$ and $\psi_s$ is $\mathcal F^0_s$-measurable for a.e. $s\in[0,T]$;

• $\mathcal S^2([0,T])$ denotes the set of real-valued continuous processes $\{\psi_s;\ s\in[0,T]\}$ such that $\mathbb E\big[\sup_{0\le s\le T}|\psi_s|^2\big] < +\infty$ and $\psi_s$ is $\mathcal F^0_s$-measurable for any $s\in[0,T]$.

2.2 Time-discretization scheme for the decoupled Forward-BDSDE

In order to approximate the solution of the Forward-BDSDE (3)-(4), we introduce the following discretized version. Let
$$\pi = \{t_0 := 0 < t_1 < \dots < t_N := T\}$$
be a partition of the time interval $[0,T]$ with time steps $\Delta_i := t_{i+1} - t_i$, $0\le i\le N-1$. Throughout this work, we will use the notations $\Delta W_i := W_{t_{i+1}} - W_{t_i}$ and $\Delta B_i := B_{t_{i+1}} - B_{t_i}$, for $i = 0,\dots,N-1$.

The forward component $X$ is approximated by the classical Euler scheme:
$$X^\pi_{t_0} = x, \qquad X^\pi_{t_{i+1}} = X^\pi_{t_i} + b(X^\pi_{t_i})\Delta_i + \sigma(X^\pi_{t_i})\Delta W_i, \quad i = 0,\dots,N-1.$$
It is known that, as $\max_{0\le i\le N-1}\Delta_i \to 0$, one has $\sup_{0\le i\le N} \mathbb E\big[|X_{t_i} - X^\pi_{t_i}|^2\big] \to 0$.
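The Euler scheme above can be sketched in a few lines; the following is a minimal illustration (function and variable names are ours, not the paper's), simulating $M$ independent paths on a uniform grid:

```python
import numpy as np

def euler_scheme(x0, b, sigma, T, N, M, rng):
    """Simulate M paths of the Euler scheme X^pi on the uniform grid t_i = i*T/N.
    b maps R^d -> R^d and sigma maps R^d -> R^{d x d} (illustrative helper)."""
    d = len(x0)
    dt = T / N
    X = np.empty((M, N + 1, d))
    X[:, 0, :] = x0
    for i in range(N):
        # Brownian increments Delta W_i ~ N(0, dt * I_d)
        dW = rng.normal(0.0, np.sqrt(dt), size=(M, d))
        for m in range(M):
            X[m, i + 1] = X[m, i] + b(X[m, i]) * dt + sigma(X[m, i]) @ dW[m]
    return X

# toy example: 1-d mean-reverting drift, unit diffusion
rng = np.random.default_rng(0)
paths = euler_scheme(np.array([1.0]), lambda x: -x, lambda x: np.eye(1), 1.0, 50, 200, rng)
print(paths.shape)  # (200, 51, 1)
```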

The solution $Y$ of (4) is approximated by $Y^\pi$, defined by the following Multi-step forward Dynamic Programming (MDP) equation: for $i = N-1,\dots,0$, we set
$$Y^\pi_{t_i} = \mathbb E_i\Big[\Phi(X^\pi_T) + \sum_{k=i}^{N-1}\big(f(t_k, X^\pi_{t_k}, Y^\pi_{t_{k+1}})\Delta_k + h(t_{k+1}, X^\pi_{t_{k+1}}, Y^\pi_{t_{k+1}})\Delta B_k\big)\Big] \qquad (5)$$
$$= \mathbb E_i\big[Y^\pi_{t_{i+1}} + f(t_i, X^\pi_{t_i}, Y^\pi_{t_{i+1}})\Delta_i + h(t_{i+1}, X^\pi_{t_{i+1}}, Y^\pi_{t_{i+1}})\Delta B_i\big],$$
where $\mathbb E_i[.]$ denotes the conditional expectation w.r.t. $\mathcal G_i$, defined by
$$\mathcal G_i := \sigma(\Delta W_j,\ 0\le j\le i-1) \vee \mathcal F^{\Delta B}, \qquad \mathcal F^{\Delta B} := \sigma(\Delta B_j,\ 0\le j\le N-1).$$
Observe that $(\mathcal G_i)_{0\le i\le N-1}$ is a discrete filtration associated to the time grid $\pi$. We recall from [BBMM13, Theorem 4.1] the following convergence result for the time-discretization error. Set
$$\mathrm{Error}_N(Y,Z) := \max_{0\le i\le N-1} \sup_{t_i\le s\le t_{i+1}} \mathbb E\big[|Y_s - Y^\pi_{t_i}|^2\big] + \sum_{i=0}^{N-1} \mathbb E\Big[\int_{t_i}^{t_{i+1}} |Z_s - Z^\pi_{t_i}|^2\,ds\Big].$$
Thus, we have

Theorem 1 Under Assumptions (H1)-(H2), and assuming in addition that $b$, $\sigma$, $\Phi$, $f$ and $h$ are of class $C^2$ with bounded derivatives up to order 2, there exists a non-negative constant $C_{(6)}$ (independent of $\pi$) such that
$$\mathrm{Error}_N(Y,Z) \le C_{(6)} \max_{0\le i\le N-1}\Delta_i. \qquad (6)$$

We note that in the case we are dealing with (i.e. drivers independent of the variable $z$), we do not have to approximate the control process $Z$, since it does not enter into the approximation of $Y$.

Extra notations. Our aim being the Monte-Carlo approximation of the discrete BDSDE solution for a given time grid $\pi$, we alleviate the notation by simply writing $X_i, Y_i$ for $X^\pi_{t_i}, Y^\pi_{t_i}$. Furthermore, we write
$$\Delta B := \{\Delta B_j,\ 0\le j\le N-1\}.$$
With this notation, and since the data are Lipschitz (coefficients of the BDSDE and of the Euler scheme), the following lemma is easy to check by combining Equation (5) with a recursion argument.

Lemma 1 Under Assumptions (H1)-(H2), for each $i \in \{0,\dots,N\}$ there exists a locally Lipschitz function $y_i : (\mathbb R^l)^N \times \mathbb R^d \to \mathbb R$ such that
$$Y_i = y_i(\Delta B, X_i). \qquad (7)$$

2.3 A priori estimates

In this section, we establish a priori estimates on discrete BDSDEs. These estimates will be needed later in the regression analysis. In the case of pure BSDEs, they are rather standard (see [GT15b] among other references): on the one hand, we take advantage of the driver being independent of $Z$ to provide slightly stronger estimates than usual; on the other hand, the BDSDE setting with the $\Delta B$ contribution is a source of difficulty in the analysis.

We aim at comparing two discrete BDSDEs, $Y_{1,.}$ and $Y_{2,.}$, defined as follows. For $j = 1,2$, we set $Y_{j,N} := \xi_j$ and, for all $i = N-1,\dots,0$,
$$Y_{j,i} = \widetilde{\mathbb E}_i\big[Y_{j,i+1} + f_{j,i}(Y_{j,i+1})\Delta_i + h_{j,i+1}(Y_{j,i+1})\Delta B_i\big], \qquad (8)$$
where

• $\widetilde{\mathbb E}_i[\cdot]$ is the conditional expectation w.r.t. $\mathcal G_i \vee \widetilde{\mathcal G}$, where $\widetilde{\mathcal G}$ is a sigma-field independent of $W$ and $B$;

• $(\omega, y) \mapsto f_{1,i}(\omega, y) := f_{1,i}(y)$ and $(\omega, y) \mapsto f_{2,i}(\omega, y) := f_{2,i}(y)$ are real-valued, $[\mathcal G_i \vee \widetilde{\mathcal G}] \otimes \mathcal B(\mathbb R)$-measurable functions on $\mathbb R$;

• $(\omega, y) \mapsto h_{1,i+1}(\omega, y) := h_{1,i+1}(y)$ and $(\omega, y) \mapsto h_{2,i+1}(\omega, y) := h_{2,i+1}(y)$ are $\mathbb R^l$-valued, $[\mathcal G_{i+1} \vee \widetilde{\mathcal G}] \otimes \mathcal B(\mathbb R)$-measurable functions on $\mathbb R$.

The above choice of measurability is coherent with the MDP equation (5) and with the further feature that solutions at later times are built using extra independent Monte-Carlo simulations (to be associated to the sigma-field $\widetilde{\mathcal G}$). We set
$$\delta Y_i := Y_{1,i} - Y_{2,i}, \qquad \delta\xi := \xi_1 - \xi_2,$$
$$\delta f_i := f_{1,i}(Y_{1,i+1}) - f_{2,i}(Y_{1,i+1}), \qquad \delta h_{i+1} := h_{1,i+1}(Y_{1,i+1}) - h_{2,i+1}(Y_{1,i+1}).$$

Now, we are in a position to state the following lemma, which gives a local estimate on the solutions of two discrete BDSDEs.

Lemma 2 We assume that, for $j \in \{1,2\}$, $\xi_j$ belongs to $L_2(\mathcal G_N \vee \widetilde{\mathcal G})$ and that, for $i \in \{0,\dots,N-1\}$, $f_{j,i}(Y_{j,i+1})$ and $h_{j,i+1}(Y_{j,i+1})$ belong to $L_2(\mathcal G_{i+1} \vee \widetilde{\mathcal G})$. In addition, we assume that $f_{2,i}$ and $h_{2,i+1}$ are Lipschitz continuous with Lipschitz constants $L_{f_{2,i}}$ and $L_{h_{2,i+1}}$ (possibly $\mathcal G_0 \vee \widetilde{\mathcal G}$-measurable). Then we have
$$|\delta Y_i| \le (1 + L_{f_{2,i}}\Delta_i + L_{h_{2,i+1}}|\Delta B_i|)\,\widetilde{\mathbb E}_i[|\delta Y_{i+1}|] + \widetilde{\mathbb E}_i[|\delta f_i|]\,\Delta_i + \widetilde{\mathbb E}_i[|\delta h_{i+1}|]\,|\Delta B_i|. \qquad (9)$$
Proof. From (8), we have
$$\delta Y_i = \widetilde{\mathbb E}_i\big[\delta Y_{i+1} + \{\delta f_i + f_{2,i}(Y_{1,i+1}) - f_{2,i}(Y_{2,i+1})\}\Delta_i\big] + \widetilde{\mathbb E}_i\big[\{\delta h_{i+1} + h_{2,i+1}(Y_{1,i+1}) - h_{2,i+1}(Y_{2,i+1})\}\Delta B_i\big].$$
Applying the triangle and Jensen inequalities, then using the Lipschitz assumptions on $f_{2,i}$ and $h_{2,i+1}$, gives the estimate (9).

By propagating the above result, we obtain a global stability result on the solutions of two discrete BDSDEs. The proof is easy and left to the reader.

Proposition 1 Under the notations and assumptions of Lemma 2, the following estimate holds a.s. for all $i \in \{1,\dots,N\}$:
$$\Gamma_i|\delta Y_i| \le \Gamma_N \widetilde{\mathbb E}_i[|\delta\xi|] + \sum_{k=i}^{N-1} \Gamma_k\big(\widetilde{\mathbb E}_i[|\delta f_k|]\,\Delta_k + \widetilde{\mathbb E}_i[|\delta h_{k+1}|]\,|\Delta B_k|\big),$$
where $\Gamma_i := \prod_{j=0}^{i-1}(1 + L_{f_{2,j}}\Delta_j + L_{h_{2,j+1}}|\Delta B_j|)$ and $\Gamma_0 := 1$.

As an application of the above proposition, we can derive an a.s. upper bound for the solution of the discrete BDSDE (5). Such an upper bound is required in the subsequent empirical regression algorithm.

Proposition 2 Under Assumptions (H1) and (H2), the solution of the discrete BDSDE (5)-(7) has an a.s. upper bound, uniform w.r.t. $i \in \{0,\dots,N\}$:
$$|y_i(\Delta B, .)|_\infty \le C_y^{\Delta B} := e^{L_f T + L_h \sum_{j=0}^{N-1}|\Delta B_j|}\Big(C_\xi + C_f T + C_h \sum_{j=0}^{N-1}|\Delta B_j|\Big).$$
Proof. Apply Proposition 1 with $Y_{1,i} := 0$ (taking $\xi_1 := 0$, $f_{1,i} \equiv 0$, $h_{1,i} \equiv 0$) and $Y_{2,i} := Y_i = y_i(\Delta B, X_i)$ (taking $\xi_2 := \Phi(X_N)$, $f_{2,i}(y) := f(t_i, X_i, y)$ and $h_{2,i+1}(y) := h(t_{i+1}, X_{i+1}, y)$). Combined with Assumption (H2), this gives
$$\Gamma_i|y_i(\Delta B, .)|_\infty \le \Gamma_N C_\xi + \sum_{k=i}^{N-1} \Gamma_k(C_f\Delta_k + C_h|\Delta B_k|).$$
We obtain the announced result by observing that

$$\Gamma_i \le \exp\Big(\sum_{j=0}^{i-1}\big[L_f\Delta_j + L_h|\Delta B_j|\big]\Big) \le \exp\Big(L_f T + L_h \sum_{j=0}^{N-1}|\Delta B_j|\Big). \qquad (10)$$
Observe that, unfortunately, this a.s. upper bound explodes in probability as $N \to \infty$, because
$$\sum_{j=0}^{N-1}|\Delta B_j| \ge \Big(\sum_{j=0}^{N-1}|\Delta B_j|^2\Big)\Big/\max_{0\le j\le N-1}|\Delta B_j| \sim T\Big/\max_{0\le j\le N-1}|\Delta B_j| \to +\infty$$
in probability. On the other hand, it is valid in rather great generality under the assumptions of our setting ($f$ and $h$ Lipschitz). Nevertheless, an easy improvement can be obtained provided that $f$ and $h$ are uniformly bounded, avoiding the exponential factor: indeed, from (5) we directly have
$$|y_i(\Delta B, .)|_\infty \le C_\xi + T|f|_\infty + |h|_\infty \sum_{j=0}^{N-1}|\Delta B_j| \quad \text{a.s.},$$
yielding another upper bound which still explodes as $N \to \infty$, but at a slower rate. Lastly, we know that $\sup_{N\ge 1}\sup_{0\le i\le N} \mathbb E|y_i(\Delta B, X_i)|^2 < +\infty$ (see [BBMM13]), which shows the gap between a.s. and $L_2$ estimates. This is a difficulty intrinsic to the study of pathwise properties of BDSDEs: to our knowledge, obtaining good pathwise estimates is an open question.
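Proposition 2's bound is a closed-form function of the realized increments, so the truncation level used later in the algorithm (the soft-thresholding level of eq. (17)) can be computed directly; a minimal sketch, with hypothetical argument names:

```python
import numpy as np

def cy_bound(dB, Lf, Lh, Cxi, Cf, Ch, T):
    """A.s. bound C_y^{Delta B} from Proposition 2, given the realized
    Brownian increments dB = (Delta B_0, ..., Delta B_{N-1})."""
    S = np.sum(np.abs(dB))  # sum of |Delta B_j|
    return np.exp(Lf * T + Lh * S) * (Cxi + Cf * T + Ch * S)

# with Lf = Lh = 0 the bound reduces to Cxi + Cf*T
print(cy_bound(np.zeros(4), 0.0, 0.0, 1.0, 2.0, 0.0, 1.0))  # 3.0
```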

3 Regression Monte-Carlo scheme

In this section, we design an algorithm to approximate the conditional expectations involved in (5) using linear least-squares methods (empirical regressions). We also analyse its convergence.

Since we are to perform regressions conditionally on $\Delta B$, it is clearer to write $(\Omega, \mathcal F, \mathbb P)$ as a product space $(\Omega^{\Delta W} \times \Omega^{\Delta B}, \mathcal F^{\Delta W} \otimes \mathcal F^{\Delta B}, \mathbb P^{\Delta W} \otimes \mathbb P^{\Delta B})$, where $\mathcal F^{\Delta W} = \sigma(\Delta W_j : 0\le j\le N-1)$ and $\mathcal F^{\Delta B} = \sigma(\Delta B_j : 0\le j\le N-1)$, coherently with the discrete BDSDE (5) to solve. This is our convention from now on. The conditional expectation w.r.t. $\mathcal F^{\Delta B}$ is then denoted by $\mathbb E^{\Delta B}[.]$.

3.1 Preliminaries on Ordinary Least-Squares (OLS)

In the following, we recall the definition of least-squares regression as stated in [GT15b] and specialise it to our framework. The general probability space $(\widetilde\Omega \times \Omega^{\Delta B}, \widetilde{\mathcal F} \otimes \mathcal F^{\Delta B}, \widetilde{\mathbb P} \otimes \mathbb P^{\Delta B})$ below (which will be larger than $(\Omega, \mathcal F, \mathbb P)$) accounts for the extra simulations used in the regression Monte-Carlo algorithm.

Definition 1 Let $n \ge 1$. We consider the two probability spaces $(\widetilde\Omega \times \Omega^{\Delta B}, \widetilde{\mathcal F} \otimes \mathcal F^{\Delta B}, \widetilde{\mathbb P} \otimes \mathbb P^{\Delta B})$ and $(\mathbb R^n, \mathcal B(\mathbb R^n), \nu)$. Let

• $S$ be an $\widetilde{\mathcal F} \otimes \mathcal F^{\Delta B} \otimes \mathcal B(\mathbb R^n)$-measurable $\mathbb R$-valued function such that $S(\widetilde\omega, \Delta B, .) \in L_2(\mathcal B(\mathbb R^n), \nu)$ for $\widetilde{\mathbb P} \otimes \mathbb P^{\Delta B}$-a.e. $(\widetilde\omega, \Delta B) \in \widetilde\Omega \times \Omega^{\Delta B}$;

• $\mathcal K$ be the linear vector subspace of $L_2(\mathcal B(\mathbb R^n), \nu)$ spanned by $\mathcal F^{\Delta B} \otimes \mathcal B(\mathbb R^n)$-measurable $\mathbb R$-valued functions $\{p_j(\Delta B, .),\ j \ge 1\}$.

The least-squares approximation of $S$ in the space $\mathcal K$ with respect to $\nu$ is the $\widetilde{\mathbb P} \otimes \mathbb P^{\Delta B} \otimes \nu$-a.e. unique, $\widetilde{\mathcal F} \otimes \mathcal F^{\Delta B} \otimes \mathcal B(\mathbb R^n)$-measurable function $S^\star$ given by
$$S^\star(\widetilde\omega, \Delta B, .) := \arg\inf_{\phi \in \mathcal K} \int |S(\widetilde\omega, \Delta B, x) - \phi(x)|^2\,\nu(dx).$$
Then we say that $S^\star$ solves $\mathrm{OLS}(S, \mathcal K, \nu)$.

In the same manner, let $M := M^{\Delta B}$ be a positive integer-valued $\mathcal F^{\Delta B}$-random variable and let
$$\nu_M := \frac{1}{M^{\Delta B}} \sum_{m=1}^{M^{\Delta B}} \delta_{\mathcal X^{(m)}}$$
be a discrete probability measure on $(\mathbb R^n, \mathcal B(\mathbb R^n))$, where $\delta_x$ is the Dirac measure at $x$ and $(\mathcal X^{(m)} : \widetilde\Omega \to \mathbb R^n,\ m \ge 1)$ is an infinite sequence of i.i.d. random variables. For an $\widetilde{\mathcal F} \otimes \mathcal F^{\Delta B} \otimes \mathcal B(\mathbb R^n)$-measurable real-valued function $S$ such that $|S(\widetilde\omega, \Delta B, \mathcal X^{(m)}(\widetilde\omega))| < +\infty$ for all $m$ and $\widetilde{\mathbb P} \otimes \mathbb P^{\Delta B}$-a.e. $(\widetilde\omega, \Delta B) \in \widetilde\Omega \times \Omega^{\Delta B}$, the least-squares approximation of $S$ in the space $\mathcal K$ with respect to $\nu_M$ is the ($\widetilde{\mathbb P} \otimes \mathbb P^{\Delta B}$-a.e.) unique, $\widetilde{\mathcal F} \otimes \mathcal F^{\Delta B} \otimes \mathcal B(\mathbb R^n)$-measurable function $S^\star$ given by
$$S^\star(\widetilde\omega, \Delta B, .) := \arg\inf_{\phi \in \mathcal K} \frac{1}{M^{\Delta B}} \sum_{m=1}^{M^{\Delta B}} \big|S(\widetilde\omega, \Delta B, \mathcal X^{(m)}(\widetilde\omega)) - \phi(\mathcal X^{(m)}(\widetilde\omega))\big|^2. \qquad (11)$$
Then we say that $S^\star$ solves $\mathrm{OLS}(S, \mathcal K, \nu_M)$.

Due to (7), the MDP equation (5) is interpreted in terms of Definition 1 as follows: for all $i \in \{0,\dots,N-1\}$, $y_i(\Delta B, .)$ is the measurable function given by
$$y_i(\Delta B, .) = \text{solution of } \mathrm{OLS}\big(\mathcal Y_i(\Delta B, .), \mathcal K_i, \nu_i\big), \qquad (12)$$
where $\nu_i := \mathbb P \circ (X_i, \dots, X_N)^{-1}$, $\mathcal K_i$ is any dense subset of the real-valued functions belonging to $L_2(\mathcal B(\mathbb R^d), \mathbb P \circ (X_i)^{-1})$, and
$$\mathcal Y_i(\Delta B, x_{i:N}) := \Phi(x_N) + \sum_{k=i}^{N-1}\big(f(t_k, x_k, y_{k+1}(\Delta B, x_{k+1}))\Delta_k + h(t_{k+1}, x_{k+1}, y_{k+1}(\Delta B, x_{k+1}))\Delta B_k\big), \qquad (13)$$
with $x_{i:N} := (x_i, \dots, x_N) \in (\mathbb R^d)^{N-i+1}$. To make the algorithm implementable, the infinite-dimensional space $\mathcal K_i$ and the exact measure $\nu_i$ in (12) are replaced respectively by a finite-dimensional space and an empirical measure.

3.2 Notations and algorithm

The solution $y_i(\Delta B, .)$ of (12) will be approximated in a finite-dimensional functional linear space, defined hereafter. Because the algorithm and the regression analysis are performed conditionally on $\Delta B$, it is important that the number $M$ of data points and the function space $\mathcal K$ in Definitions 1 and 2 depend on $\Delta B$. This is a significant difference with [GT15b], where $\mathcal K$ and $M$ are not stochastic.

Definition 2 (Finite-dimensional approximation spaces) For each $i \in \{0,\dots,N-1\}$, the finite-dimensional approximation space $\mathcal K^{\Delta B}_{Y,i}$ (of cardinality $K^{\Delta B}_{Y,i}$, a finite $\mathcal F^{\Delta B}$-random variable) is given by
$$\mathcal K^{\Delta B}_{Y,i} := \mathrm{span}\{p^j_i(\Delta B, .),\ j = 1,\dots,K^{\Delta B}_{Y,i}\},$$
where, for all $j$, the $\mathcal F^{\Delta B} \otimes \mathcal B(\mathbb R^d)$-measurable function $p^j_i : \Omega^{\Delta B} \times \mathbb R^d \to \mathbb R$ satisfies the condition $\mathbb E^{\Delta B}\big[|p^j_i(\Delta B, X_i)|^2\big] < +\infty$.

The best approximation error of $y_i(\Delta B, .)$ on the linear space $\mathcal K^{\Delta B}_{Y,i}$ is given by
$$\tau^{\Delta B,Y}_{1,i} := \inf_{\phi \in \mathcal K^{\Delta B}_{Y,i}} \mathbb E^{\Delta B}\big[|y_i(\Delta B, X_i) - \phi(X_i)|^2\big].$$
The computation of the OLS (12) involves the law of $X_i, \dots, X_N$, which is replaced by the empirical measure defined as follows.

The computation of the OLS (12) involves the law ofXi, . . . , XN, which is replaced by the empirical measure, defined as follows.

Definition 3 (Simulations and empirical measure) For any i ∈ {0, . . . , N − 1}, let Mi∆B be1 the number of Monte-Carlo simulations used for the regression at

1Mi∆B may depend on ∆B to allow an optimal tuning of parameters as a function of ∆B, see Corollary 1. To avoid overfitting, we assume w.l.o.g. Mi∆B KY,i∆B.

(13)

time ti: namely, we sample independent copies of Xi:N := (Xi, . . . , XN), that we denote by

Ci :={Xi:N(i,m), m≥1}

and that we call cloud of simulationsat time ti. For the algorithm we will use only the firstMi∆B simulations ofCi, however for the sake of clarity in the analysis error it is more convenient to write Ci with an infinite sequence.

In addition, we assume that the clouds {Ci;i= 0, . . . , N−1} are sampled inde- pendently. The random variables (Xi:N(i,m): 0≤i≤N−1, m≥1)are supported by a probability space(Ω(M),F(M),P(M)) and we define the empirical probability measure associated to the cloudCi:

νi,M := 1 Mi∆B

Mi∆B

X

m=1

δ(Xi(i,m),...,XN(i,m)).

The L2-norm w.r.t νi,M will be denoted as usually as |.|νi,M (and |.|ν for another measureν).

Then, the full probability space used to analyse the following algorithm is ( ¯Ω,F¯,P¯) = (Ω,F,P)⊗(Ω(M),F(M),P(M)). Within this extended probability space, we keep the same notation for probability and expectation, whenever unambiguous, for the sake of simplicity.

The algorithm is defined as follows.

Algorithm 1 (Least-Squares MDP (LSMDP) Algorithm) We define $y^{(M)}_i(\Delta B, .)$, for all $i$, by a backward induction. We start with
$$y^{(M)}_N(\Delta B, .) := \Phi(.)$$
and, for $i = N-1,\dots,0$, we define
$$\psi^{(M)}_i(\Delta B, .) \text{ as the solution of } \mathrm{OLS}\big(\mathcal Y^{(M)}_i(\Delta B, .), \mathcal K^{\Delta B}_{Y,i}, \nu_{i,M}\big), \qquad (14)$$
where, for any $x_{i:N} := (x_i, \dots, x_N) \in (\mathbb R^d)^{N-i+1}$, we set
$$\mathcal Y^{(M)}_i(\Delta B, x_{i:N}) := \Phi(x_N) + \sum_{k=i}^{N-1}\big(f(t_k, x_k, y^{(M)}_{k+1}(\Delta B, x_{k+1}))\Delta_k + h(t_{k+1}, x_{k+1}, y^{(M)}_{k+1}(\Delta B, x_{k+1}))\Delta B_k\big). \qquad (15)$$
After that, we set
$$y^{(M)}_i(\Delta B, .) := \big[\psi^{(M)}_i(\Delta B, .)\big]_i, \qquad (16)$$
where $[.]_i$ is the soft-thresholding operator defined by
$$[y]_i := -C_y^{\Delta B} \vee y \wedge C_y^{\Delta B}, \qquad (17)$$
$C_y^{\Delta B}$ being the bound computed in Proposition 2. Any other (and better) upper bound on $|y_i(\Delta B, .)|_\infty$ could advantageously replace $C_y^{\Delta B}$.
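A compressed sketch of the backward induction may help fix ideas. For brevity, it reuses a single cloud of paths and the one-step form of (5) instead of the multi-step response (15) with an independent cloud per date, so it only illustrates the structure (regression at each date, then soft thresholding), not the exact algorithm; all names are ours:

```python
import numpy as np

def lsmdp(X, dB, f, h, Phi, dt, basis, Cy):
    """Illustrative backward least-squares loop for one realized path of
    Delta B. X has shape (M, N+1) (1-d forward paths); simplifying assumption:
    a single shared cloud and one-step responses, unlike Algorithm 1."""
    M, Np1 = X.shape
    N = Np1 - 1
    t = dt * np.arange(N + 1)
    y_next = Phi(X[:, N])                      # terminal condition y_N = Phi
    coefs = [None] * N
    for i in range(N - 1, -1, -1):
        # one-step response built from the previously fitted y_{i+1}
        resp = y_next + f(t[i], X[:, i], y_next) * dt \
                      + h(t[i + 1], X[:, i + 1], y_next) * dB[i]
        A = np.column_stack([p(X[:, i]) for p in basis])   # design matrix
        alpha, *_ = np.linalg.lstsq(A, resp, rcond=None)   # OLS step, cf. (14)
        psi = A @ alpha
        y_next = np.clip(psi, -Cy, Cy)         # soft thresholding [.]_i, cf. (17)
        coefs[i] = alpha
    return y_next, coefs

# trivial check: with f = h = 0 and constant terminal condition, y_0 = Phi
rng = np.random.default_rng(0)
Xs = rng.normal(size=(100, 6))
y0, coefs = lsmdp(Xs, np.zeros(5), lambda t, x, y: 0.0 * y,
                  lambda t, x, y: 0.0 * y, lambda x: 2.0 + 0.0 * x,
                  0.2, [lambda x: np.ones_like(x), lambda x: x], Cy=10.0)
```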

The subsequent statements will be made in terms of the squared approximation error of $y_i$ in the linear space $\mathcal K^{\Delta B}_{Y,i}$ with respect to the empirical measure $\nu_{i,M}$, defined by
$$\tau^{\Delta B,Y}_{1,i,M} := \mathbb E^{\Delta B}\Big[\inf_{\phi \in \mathcal K^{\Delta B}_{Y,i}} \big|y_i(\Delta B, .) - \phi\big|^2_{\nu_{i,M}}\Big]. \qquad (18)$$
A simple argument, based on interchanging $\mathbb E^{\Delta B}$ and the infimum and on the independence between the simulations $(X^{(i,m)}_{i:N},\ m \ge 1)$ and $\Delta B$, yields the following bound.

Lemma 3 For all $i \in \{0,\dots,N-1\}$, we have $\tau^{\Delta B,Y}_{1,i,M} \le \tau^{\Delta B,Y}_{1,i}$.

3.3 Main result: non-asymptotic error estimates for the regression scheme

The following theorem gives the conditional regression error of Algorithm 1 for approximating the solution of (5). It is measured in terms of
$$\eta^{Y,\Delta B}_{i,M} := \sqrt{\mathbb E^{\Delta B}\Big[\big|y_i(\Delta B, .) - y^{(M)}_i(\Delta B, .)\big|^2_{\nu_{i,M}}\Big]}, \qquad (19)$$
$$\eta^{Y,\Delta B}_i := \sqrt{\mathbb E^{\Delta B}\Big[\big|y_i(\Delta B, X_i) - y^{(M)}_i(\Delta B, X_i)\big|^2\Big]}. \qquad (20)$$
Actually, by using uniform concentration-of-measure estimates, we can switch from one error to the other, up to a small error term; see Proposition 4 below in our specific setting, or more generally Proposition 5.

Theorem 2 Under Assumptions (H1)-(H2), for any $i \in \{0,\dots,N-1\}$ we have
$$\eta^{Y,\Delta B}_{i,M} \le \delta_i + \sqrt{2}\,\exp\Big(\sqrt{2}L_f T + \sqrt{2}L_h \sum_{k=i}^{N-1}|\Delta B_k|\Big) \sum_{k=i}^{N-2}(L_f\Delta_k + L_h|\Delta B_k|)\,\delta_{k+1}, \qquad (21)$$
where, for all $k$ in $\{0,\dots,N-1\}$,
$$\delta_k := \big(\tau^{\Delta B,Y}_{1,k,M}\big)^{1/2} + \Big(\frac{\mathrm{card}(\mathcal K^{\Delta B}_{Y,k})}{M^{\Delta B}_k}\Big)^{1/2}\sigma^Y_k(\Delta B) + \sqrt{2028}\,C_y^{\Delta B} \sum_{j=k}^{N-2}(L_f\Delta_j + L_h|\Delta B_j|)\sqrt{\frac{(\mathrm{card}(\mathcal K^{\Delta B}_{Y,j+1}) + 1)\log(3M^{\Delta B}_{j+1})}{M^{\Delta B}_{j+1}}}, \qquad (22)$$
with
$$\sigma^Y_k(\Delta B) := C_\xi + T(L_f C_y^{\Delta B} + C_f) + (L_h C_y^{\Delta B} + C_h)\sum_{j=k}^{N-1}|\Delta B_j|. \qquad (23)$$

Theorem 2 gives explicit non-asymptotic error estimates for the algorithm, since the constants in the error upper bound depend explicitly on the time grid $\pi$ and on the path $(\Delta B_k)_{0\le k\le N-1}$. As in [GT15b], this allows an easy tuning of the convergence parameters $\mathcal K^{\Delta B}_{Y,i}$ and $M^{\Delta B}_i$ to obtain a.s. convergence given the external noise $B$, in the spirit of SPDEs. The subsequent convergence result (Corollary 1) is made possible by the Lipschitz regularity of the unknown solution $x \mapsto y_i(\Delta B, x)$, which is stated as follows.

Proposition 3 Under Assumptions (H1)-(H2), for any $x, x' \in \mathbb R^d$ and $i_0 \in \{0,\dots,N-1\}$ we have
$$|y_{i_0}(\Delta B, x) - y_{i_0}(\Delta B, x')| \le C^{\Delta B}_{(24)}|x - x'|, \qquad (24)$$
where
$$C^{\Delta B}_{(24)} := C_{(25)}\,e^{L_f T + L_h \sum_{j=0}^{N-1}|\Delta B_j|}\Big(L_\xi + L_f T + L_h \sum_{j=0}^{N-1}|\Delta B_j|\Big).$$

Proof. Set $X^{i_0,x}_i = x$ for $i \le i_0$ and let $(X^{i_0,x}_i)_{i\ge i_0}$ be the Euler scheme starting from $x$ at time $i_0$. We apply Proposition 1 with $Y_{1,i} := y_i(\Delta B, X^{i_0,x}_i)$ (taking $\xi_1 := \Phi(X^{i_0,x}_N)$, $f_{1,i}(.) := f(t_i, X^{i_0,x}_i, .)$, $h_{1,i+1}(.) := h(t_{i+1}, X^{i_0,x}_{i+1}, .)$) and $Y_{2,i} := y_i(\Delta B, X^{i_0,x'}_i)$ (taking $\xi_2 := \Phi(X^{i_0,x'}_N)$, $f_{2,i}(.) := f(t_i, X^{i_0,x'}_i, .)$ and $h_{2,i+1}(.) := h(t_{i+1}, X^{i_0,x'}_{i+1}, .)$). With Assumption (H2), we get
$$\Gamma_i\big|y_i(\Delta B, X^{i_0,x}_i) - y_i(\Delta B, X^{i_0,x'}_i)\big| \le \Gamma_N L_\xi \mathbb E_i\big[|X^{i_0,x}_N - X^{i_0,x'}_N|\big] + \sum_{k=i}^{N-1}\Gamma_k\Big(L_f \mathbb E_i\big[|X^{i_0,x}_k - X^{i_0,x'}_k|\big]\Delta_k + L_h \mathbb E_i\big[|X^{i_0,x}_{k+1} - X^{i_0,x'}_{k+1}|\big]|\Delta B_k|\Big).$$
Using (10) and taking $i = i_0$ in the above inequality, we get
$$|y_{i_0}(\Delta B, x) - y_{i_0}(\Delta B, x')| \le e^{L_f T + L_h \sum_{j=0}^{N-1}|\Delta B_j|}\Big(L_\xi \mathbb E_{i_0}\big[|X^{i_0,x}_N - X^{i_0,x'}_N|\big] + \sum_{k=i_0}^{N-1}\big\{L_f \mathbb E_{i_0}\big[|X^{i_0,x}_k - X^{i_0,x'}_k|\big]\Delta_k + L_h \mathbb E_{i_0}\big[|X^{i_0,x}_{k+1} - X^{i_0,x'}_{k+1}|\big]|\Delta B_k|\big\}\Big).$$
Since the coefficients $b$ and $\sigma$ are globally Lipschitz, there exists a non-negative constant $C_{(25)}$ such that
$$\sup_{i_0\le k\le N} \mathbb E_{i_0}\big[|X^{i_0,x}_k - X^{i_0,x'}_k|\big] \le C_{(25)}|x - x'|. \qquad (25)$$
The last estimate concludes the proof.

3.4 Convergence of the algorithm and complexity

We are now in a position to study in detail the convergence of Algorithm 1, by choosing appropriately the approximation space $\mathcal K^{\Delta B}_{Y,i}$ and the number of simulations $M^{\Delta B}_i$. The analysis is handled conditionally on $\Delta B$. To simplify the presentation, we take $\Delta_i = \mathrm{Cst} = T/N$ and let $\mathcal K^{\Delta B}_{Y,i}$ not depend on $i$; less restrictive choices are nevertheless possible in light of the general estimates of Theorem 2.

Approximation spaces. Since the unknown function $y_i(\Delta B, .)$ is Lipschitz (Proposition 3), it is enough to consider piecewise-constant approximations. Let $D^{\Delta B}$ be a large hypercube of $\mathbb R^d$ centered on $X_0 = x$, that is $D^{\Delta B} = \prod_{k=1}^d (x_k - H^{\Delta B}, x_k + H^{\Delta B}]$ for some parameter $H^{\Delta B}$ large enough. Then $D^{\Delta B}$ can be partitioned into a finite number of small hypercubes $C^{\Delta B}_{j_1,\dots,j_d}$ of edge $\rho^{\Delta B} > 0$, i.e.
$$D^{\Delta B} = \bigcup_{j_1,\dots,j_d} C^{\Delta B}_{j_1,\dots,j_d}, \qquad C^{\Delta B}_{j_1,\dots,j_d} = \prod_{k=1}^d \big(x_k - H^{\Delta B} + j_k\rho^{\Delta B},\ x_k - H^{\Delta B} + (j_k+1)\rho^{\Delta B}\big],$$
and $j_k \in \{0,\dots, 2H^{\Delta B}/\rho^{\Delta B} - 1\}$. To simplify the exposition, we neglect the rounding effect by assuming $2H^{\Delta B}/\rho^{\Delta B}$ is an integer. The number of hypercubes is $(2H^{\Delta B}/\rho^{\Delta B})^d$, which equals $\mathrm{card}(\mathcal K^{\Delta B}_{Y,i})$, since on each hypercube the approximation is piecewise constant.
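The hypercube partition amounts to locating, for each sample, the multi-index $(j_1,\dots,j_d)$ of its cell; a sketch (our helper, with a half-open-interval convention that differs from the paper's only on a Lebesgue-null boundary set):

```python
import numpy as np

def cube_index(x, x0, H, rho):
    """Multi-index (j_1,...,j_d) of the hypercube of edge rho containing x,
    for the domain D = prod_k (x0_k - H, x0_k + H]; returns None if x is
    outside D (illustrative helper, not the paper's code)."""
    j = np.floor((x - (x0 - H)) / rho).astype(int)
    K = int(round(2 * H / rho))               # cubes per coordinate
    if np.any(j < 0) or np.any(j >= K):
        return None
    return tuple(j)

# d = 2, H = 1, rho = 0.5: 4 cubes per coordinate, 16 in total
print(cube_index(np.array([-0.9, 0.9]), np.zeros(2), 1.0, 0.5))  # (0, 3)
```

The piecewise-constant regression on this basis reduces, cell by cell, to averaging the responses of the samples falling in that cell.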

Recall that under (H1), $X_i$ has finite moments of any order, i.e. for any $q > 0$,
$$\sup_{0\le i\le N} \mathbb E[|X_i - x|^q] \le C_{q,(26)}, \qquad (26)$$
for a constant independent of $N$. With Proposition 3 at hand, we easily upper bound the squared approximation error as follows:
$$\tau^{\Delta B,Y}_{1,i,M} \le \tau^{\Delta B,Y}_{1,i} := \inf_{\phi \in \mathcal K^{\Delta B}_{Y,i}} \mathbb E^{\Delta B}\big[|y_i(\Delta B, X_i) - \phi(X_i)|^2\big]$$
$$\le \mathbb E^{\Delta B}\big[|y_i(\Delta B, X_i)|^2\,\mathbf 1_{\{X_i \notin D^{\Delta B}\}}\big] + \sum_{j_1,\dots,j_d} \mathbb E^{\Delta B}\big[|y_i(\Delta B, X_i) - y_i(\Delta B, x_{j_1,\dots,j_d})|^2\,\mathbf 1_{\{X_i \in C^{\Delta B}_{j_1,\dots,j_d}\}}\big]$$
for an arbitrary point $x_{j_1,\dots,j_d}$ in the hypercube $C^{\Delta B}_{j_1,\dots,j_d}$,
$$\le (C_y^{\Delta B})^2\, C_{2,(26)}\,(H^{\Delta B})^{-2} + (C^{\Delta B}_{(24)}\rho^{\Delta B})^2,$$
using the Markov inequality with $q = 2$ for the first term and the Lipschitz property of $y_i(\Delta B, .)$ for the second. To get a squared approximation error $\tau^{\Delta B,Y}_{1,i,M}$ of order $N^{-1}$ (in coherence with Theorem 1), it is enough to choose
$$H^{\Delta B} = C_y^{\Delta B}\sqrt N, \qquad \rho^{\Delta B} = \frac{1}{C^{\Delta B}_{(24)}\sqrt N}.$$

However, this is not sufficient to make the contribution to $\eta^{Y,\Delta B}_{i,M}$ of magnitude $N^{-1/2}$, because of the summation over $k$ in (21). An appropriate choice is
$$H^{\Delta B} = e^{c\sum_{k=0}^{N-1}|\Delta B_k|}N^{3/2}, \qquad \rho^{\Delta B} = e^{-c\sum_{k=0}^{N-1}|\Delta B_k|}N^{-3/2}.$$
For $c$ large enough (and explicit w.r.t. the model data), this shows that the $\tau^{\Delta B,Y}_{1,k,M}$-errors contribute to $\eta^{Y,\Delta B}_{i,M}$ as $CN^{-1/2}$ for a deterministic constant $C$. With the above choice, we have
$$\mathrm{card}(\mathcal K^{\Delta B}_{Y,i}) = \big(2e^{2c\sum_{k=0}^{N-1}|\Delta B_k|}N^3\big)^d.$$

Number of simulations. A careful analysis of the upper bound (21) shows that choosing
$$M^{\Delta B}_i := N^{3d+5}\exp\Big(c_0\sum_{k=0}^{N-1}|\Delta B_k|\Big) \qquad (27)$$
for $c_0$ large enough implies
$$\eta^{Y,\Delta B}_{i,M} \le C_{(28)}\sqrt{\frac{\log(N+1)}{N}} \quad \text{a.s.}, \qquad (28)$$
for some deterministic constant $C_{(28)} > 0$. We have thus proved the first part of the following result.

for some deterministic constant C(28) > 0. We have proved the first part of the following result.

Corollary 1 (Convergence of the algorithm) For the uniform time grid with $N$ time steps and for a.s. any discrete path $(\Delta B_k)_{0\le k\le N-1}$, the empirical regression algorithm with appropriate choices of $\mathcal K^{\Delta B}_{Y,k}$ and $M^{\Delta B}_k$ yields an error in $L_2$, conditionally on $\Delta B$, bounded by $C_{(28)}\sqrt{\log(N+1)/N}$, where $C_{(28)}$ is deterministic.

Furthermore, the complexity of the algorithm is $\mathcal C \sim N\sum_{i=0}^{N-1}M^{\Delta B}_i$ up to a deterministic constant. Thus:

• conditionally on $\Delta B$, the complexity is of order $\mathcal C = O(N^{3d+7})$;

• in expectation, the complexity is of order $\mathbb E[\mathcal C] \approx \tilde c\,N^{3d+7}\exp(\tilde c\sqrt N)$ for some $\tilde c > 0$.

It remains to justify the second part, regarding the complexity $\mathcal C$. The latter is directly evaluated by counting the elementary operations, as in [GT15b]; its evaluation, conditionally and unconditionally on $\Delta B$, is then easily obtained in view of (27).

As for simple BSDEs, the curse of dimensionality occurs. But here the effect of the external Brownian motion is seemingly much more determinant: it is responsible for the factors $\exp\big(c_0\sum_{k=0}^{N-1}|\Delta B_k|\big)$ and $\exp(\tilde c\sqrt N)$. In other words, the convergence holds, but on average at a logarithmic speed w.r.t. the accuracy. In a pathwise sense, the approximation may be more or less accurate depending on the realization of $\Delta B$, which is intuitively meaningful.
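The $\sqrt N$ growth behind the factor $\exp(\tilde c\sqrt N)$ can be checked directly: since $\Delta B_k \sim \mathcal N(0, T/N)$, one has $\mathbb E|\Delta B_k| = \sqrt{2T/(\pi N)}$, hence $\mathbb E\big[\sum_k|\Delta B_k|\big] = \sqrt{2TN/\pi}$. A quick Monte-Carlo illustration (our sketch, for $l = 1$):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1.0
for N in (10, 100, 1000):
    # 5000 independent realizations of the increments Delta B_0..Delta B_{N-1}
    dB = rng.normal(0.0, np.sqrt(T / N), size=(5000, N))
    S = np.abs(dB).sum(axis=1)                 # sum of |Delta B_k|
    # empirical mean of S versus the closed form sqrt(2*T*N/pi)
    print(N, round(float(S.mean()), 2), round(float(np.sqrt(2 * T * N / np.pi)), 2))
```

The two columns agree, confirming that the exponent in $\mathbb E[\mathcal C]$ grows like $\sqrt N$.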

3.5 Proof of Theorem 2

The following proposition is useful to interchange the errors (19) and (20).

Proposition 4 For all $i \in \{0,\dots,N-1\}$, we have
$$\big(\eta^{Y,\Delta B}_i\big)^2 \le 2\big(\eta^{Y,\Delta B}_{i,M}\big)^2 + (C_y^{\Delta B})^2\,\frac{2028\,(\mathrm{card}(\mathcal K^{\Delta B}_{Y,i}) + 1)\log(3M^{\Delta B}_i)}{M^{\Delta B}_i}.$$
Proof. Let $i \in \{0,\dots,N-1\}$. Write
$$\mathbb E^{\Delta B}\Big[\big|y_i(\Delta B, X_i) - y^{(M)}_i(\Delta B, X_i)\big|^2\Big] \le 2\,\mathbb E^{\Delta B}\Big[\big|y_i(\Delta B, .) - y^{(M)}_i(\Delta B, .)\big|^2_{\nu_{i,M}}\Big]$$
$$+\ \mathbb E^{\Delta B}\Big[\Big(\mathbb E\big[\big|y_i(\Delta B, X_i) - y^{(M)}_i(\Delta B, X_i)\big|^2 \,\big|\, \Delta B, \{\mathcal C_k : k\ge i\}\big] - 2\big|y_i(\Delta B, .) - y^{(M)}_i(\Delta B, .)\big|^2_{\nu_{i,M}}\Big)_+\Big],$$
and apply Proposition 5 with $p = 2$, $\lambda = C_y^{\Delta B}$ (owing to Proposition 2), $\mathcal K = \mathcal K^{\Delta B}_{Y,i}$ and $M = M^{\Delta B}_i$.

To prove Theorem 2, we need a few extra notations.

1. We define the $\sigma$-fields $\mathcal G_i := \mathcal F^{\Delta B} \vee \sigma(\mathcal C_{i+1},\dots,\mathcal C_{N-1})$ and $\mathcal G^{1:M}_i := \mathcal G_i \vee \sigma\big(X^{(i,m)}_i : m\ge 1\big)$, for all $i = 0,\dots,N-1$.

2. We define $\psi_i(\Delta B, .)$ as the solution of $\mathrm{OLS}\big(\mathcal Y_i(\Delta B, .), \mathcal K^{\Delta B}_{Y,i}, \nu_{i,M}\big)$, for all $i = 0,\dots,N-1$. This is the OLS solution when the functions $f$ and $h$ are computed with the exact solution, as opposed to the definition (14) of $\psi^{(M)}_i(\Delta B, .)$.
