## On the Monte Carlo simulation of BSDEs: an improvement on the Malliavin weights

### D. Crisan∗, K. Manolarakis†, N. Touzi‡

**Abstract**

Monte Carlo simulation of backward stochastic differential equations offers a probabilistic numerical method for the approximation of solutions of semilinear PDEs. We propose a generic framework for the analysis of such numerical approximations. The general results are used to revisit the convergence of the algorithm suggested by Bouchard and Touzi [6]. By dropping the higher order terms in the Skorohod integrals resulting from the Malliavin integration by parts in [6], we introduce a variant of the latter algorithm which allows for a significant reduction of the numerical complexity. We prove the convergence of this improved Malliavin-based algorithm, and we derive a bound on the induced error. In particular, we show that the price to pay for our simplification is the use of a more accurate localizing function.

Key words: BSDEs, weak approximations, Monte Carlo methods, Malliavin calculus

Mathematics subject classification: 65C30; 65C05; 60H07; 62G08

∗Department of Mathematics, Imperial College London, 180 Queen’s Gate, London, SW7 2AZ, UK.

†Department of Mathematics, Imperial College London, 180 Queen’s Gate, London, SW7 2AZ, UK. Research partially supported by EPSRC and the Greek State Scholarship Foundation.

‡Centre de Mathématiques Appliquées, École Polytechnique, UMR CNRS 7641, 91128 Palaiseau Cedex, France.

### 1 Introduction

Let $(\Omega, \mathcal{F}, P)$ be a probability space on which we have defined the decoupled forward–backward stochastic differential equation (BSDE henceforth)

$$
\begin{aligned}
dX_t &= b(X_t)\,dt + \sigma(X_t)\,dW_t \\
-dY_t &= f(t, X_t, Y_t, Z_t)\,dt - Z_t\,dW_t
\end{aligned}
\tag{1.1}
$$

with the initial condition $X_0 = x$ for the forward component and the final condition $Y_1 = \Phi(X_1)$ for the backward component. In (1.1), $W = \{W_t,\ t \geq 0\}$ is a $d$-dimensional Brownian motion. The coefficients of the system (1.1) are given by the functions $b : \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma : \mathbb{R}^d \to \mathbb{R}^{d \times d}$ for the forward component, and $f : [0,1] \times \mathbb{R}^d \times \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}$ for the backward part. Also, $\Phi : \mathbb{R}^d \to \mathbb{R}$ is a real-valued function. A solution of the forward–backward system (1.1) is the triplet $\Theta_t = (X_t, Y_t, Z_t)$. The approximation of the forward

part of the solution $X$ is well understood (see, for example, [14] and the references therein). The focus of this paper is on obtaining approximations for the process $Y$. Discrete-time schemes based on the approximation of the Brownian motion have been analysed by Chevance [7], Coquet, Mackevičius and Mémin [11], Briand, Delyon and Mémin [17], Antonelli and Kohatsu-Higa [1], Ma, Protter, San Martín and Torres [13], and Bally and Pagès [2].

In contrast with the above literature, we concentrate on approximations of solutions of backward stochastic differential equations based on the Monte Carlo simulation of the underlying Brownian motion, thus continuing the studies of Bouchard and Touzi [6], Zhang [18] and Gobet, Lemor and Warin [9]. See also Bouchard and Elie [5] for the case of jump-diffusions.

As a first contribution, we propose a generic framework for the analysis of such approximations. By isolating the one-step approximation operator of the algorithm, we provide a transparent set of sufficient conditions under which the rate of convergence of the approximation can be studied. The main tool is an expansion of the error as a Trotter product. This generic framework applies to various numerical methods. For instance, the above

methodology is applied in [8] in order to prove convergence of a probabilistic numerical algorithm where the regressions are estimated by using the cubature method of Lyons and Victoir [15] to approximate the distribution of (X, W ).

This methodology is applied to the Malliavin integration-by-parts based algorithm suggested in [6], whose main idea is to exploit the representation of the regression function $r(x) := E[Y \mid X = x] = E[Y \varepsilon_{\{x\}}(X)]/E[\varepsilon_{\{x\}}(X)]$, where $\varepsilon_{\{x\}}$ denotes the Dirac mass at $x$, as

$$
r(x) = \frac{E\left[\, Y\, \mathbf{1}_{\mathbb{R}^d_+}(X - x)\, S^h\big(\phi(X - x)\big) \,\right]}{E\left[\, \mathbf{1}_{\mathbb{R}^d_+}(X - x)\, S^h\big(\phi(X - x)\big) \,\right]}
$$

for smooth random variables $X$ and $Y$ valued respectively in $\mathbb{R}^d$ and $\mathbb{R}^k$, for some integers $d, k \geq 1$. Here $S^h$ is an iterated Skorohod integral, and $\phi$ is a smooth function with $\phi(0) = 1$, called the localizing function.

The Malliavin integration-by-parts algorithm of [6] turns out to have high numerical complexity, mainly due to the following two facts:

• the Heaviside function $\mathbf{1}_{\mathbb{R}^d_+}$, obtained by integrating up the Dirac measure; this numerical difficulty can be dealt with by appealing to advanced sorting algorithms from the operations research literature, see e.g. [3];

• the weights obtained from the Malliavin integration by parts consist of iterated Skorohod integrals with an exploding number of terms when converted into Itô stochastic integrals (which is necessary for the simulation step).

The second main contribution of this paper is to suggest an improvement of the Malliavin integration-by-parts based algorithm by reducing the numerical complexity of the above weights. To do this, we first completely revisit the convergence results of [6]. In particular, our methodology allows us to clarify some points in the proofs of [6]. We next suggest dropping higher order terms of the weights by a judicious expansion. This leads to a reduction of the number of terms at least from $4^d$ to $2^d$, where $d$ is the dimension of the forward state variable. Hence the number of elementary calculations is cut at least by a factor of $2^d$; for instance, with $d = 5$ we reduce the effort by more than 96%!

The price to pay for such a reduction in the numerical complexity is a change of scale in the localizing function $\phi_h(x) := \phi(x h^{-\alpha})$ of [4] and [6], where $h$ is the time step. While $\alpha = 1/2$ is the optimal scale in [6], we find in the present setting that $\alpha$ must be chosen strictly larger than $1/2$.

The paper is organised as follows. After collecting some preliminaries in Section 2, we report our one-step operator approach for the discretisation of BSDEs in Section 3. We provide a review of the Monte Carlo method of [6] in Section 4. Section 5 introduces our main modification by truncating the weights, and provides the corresponding asymptotic results for the error analysis.

### 2 Preliminaries

We start by giving some notation and the assumptions that we will be using throughout the paper. We work on a complete probability space $(\Omega, \mathcal{F}, P, \{\mathcal{F}_t\})$ and on the finite time interval $[0, 1]$, on which a $d$-dimensional Brownian motion $\{W_t,\ t \in [0, 1]\}$ is defined. The filtration of the space is the augmented Brownian filtration. We denote by $M^n$ the space of $n \times n$ matrices equipped with the norm $|A| := \big(\sum_{i,j=1}^n |a_{ij}|^2\big)^{1/2}$. We denote by $A^*$ the transpose of an element of $M^n$. The Euclidean spaces are equipped with the usual Euclidean norm. The inner product of two elements of $\mathbb{R}^d$ is denoted by $x \cdot y$.

We will be using the following assumptions on the coefficients of system (1.1):

(H1a) The functions $b : \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma : \mathbb{R}^d \to M^d$ are Lipschitz continuous,

$$ |b(x) - b(y)| + |\sigma(x) - \sigma(y)| \leq K|x - y|, $$

and $\sigma$ is uniformly positive definite.

(H1b) The functions $b, \sigma, \sigma^{-1} \in C^\infty_b$.

(H2) The driver of the BSDE $f : [0, 1] \times \mathbb{R}^d \times \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}$ is Lipschitz in the spatial variables:

$$ |f(t, x, y, z) - f(t, x', y', z')| \leq K\big(|x - x'| + |y - y'| + |z - z'|\big). $$

The terminal condition is a Lipschitz function of the state process, $\xi = \Phi(X_1)$, with the same Lipschitz constant as above.

A solution to the system (1.1) is an adapted triplet of processes $(X, Y, Z)$ taking values in $\mathbb{R}^d, \mathbb{R}, \mathbb{R}^d$, respectively, such that

$$ E\left[ \sup_{0 \leq t \leq 1} |Y_t|^2 + \int_0^1 |Z_s|^2\, ds \right] < \infty. \tag{2.1} $$

Following, for example, Theorem 2.1 of [10], such a solution exists when conditions (H1a) and (H2) are satisfied. In what follows we will primarily be working with these two hypotheses. The enhanced condition (H1b) will be required to show additional smoothness, in the Malliavin sense, for the Euler scheme associated with the forward component of (3.2).

### 3 Discretizing the BSDE

Consider a partition of the time interval, $\pi = \{0 = t_0 < t_1 < \ldots < t_n = 1\}$, along which we aim to discretize the system (1.1). Denote by $\pi_i = t_i - t_{i-1}$ the time steps and by $|\pi| := \max_{1 \leq i \leq n} \pi_i$ the partition mesh. For the discretization of the Brownian motion we write $\Delta W_{i+1} = W_{t_{i+1}} - W_{t_i}$, and $\Delta W^j_{i+1}$ for its $j$-th entry, $j = 1, 2, \ldots, d$. The forward component of system (1.1) is approximated by an Euler scheme. In particular, if

$$ X^\pi_t(s, x) := x + b(x)(t - s) + \sigma(x)(W_t - W_s), $$

then the Euler scheme corresponding to the partition $\pi$ is defined as $X^\pi_{t_0} = x$, $X^\pi_{t_{i+1}} := X^\pi_{t_{i+1}}(t_i, X^\pi_{t_i})$, $i = 0, \ldots, n-1$. The following estimate is standard (see Theorem 10.2.2 of [14]):

$$ \max_{0 \leq i \leq n-1} E\left[ \sup_{t_i \leq t \leq t_{i+1}} |X_t - X^\pi_t|^2 \right] \leq C(1 + |x|^2)\,|\pi|. \tag{3.1} $$
For the discretization of the backward components of the solution, we define recursively on $\pi$ the random variables $Y^\pi_{t_i}, Z^\pi_{t_i}$, $i = 0, \ldots, n$, as follows:

$$
\begin{aligned}
Y^\pi_{t_n} &= \Phi(X^\pi_{t_n}), \qquad Z^\pi_{t_n} = 0, \\
Z^\pi_{t_{i-1}} &= \pi_i^{-1}\, E_{i-1}\big[ Y^\pi_{t_i}\, \Delta W_{t_i} \big], \\
Y^\pi_{t_{i-1}} &= E_{i-1}\big[ Y^\pi_{t_i} \big] + \pi_i f\big( t_{i-1}, X^\pi_{t_{i-1}}, Y^\pi_{t_{i-1}}, Z^\pi_{t_{i-1}} \big).
\end{aligned}
\tag{3.2}
$$

Above, $E_i$ stands for the conditional expectation with respect to the $\sigma$-field $\mathcal{F}^\pi_{t_i} = \sigma\big( X^\pi_{t_j} : j \leq i \big)$, the natural filtration of the Euler scheme. When the terminal condition $\xi$ is not path dependent, the random variables $Y^\pi_{t_i}, Z^\pi_{t_i}$ are (deterministic) functions of the underlying $X^\pi_{t_i}$, and the above conditional expectations are actually regressions:

$$ E_{i-1}\big[ Y^\pi_{t_i}\, \Delta W_{t_i} \big] = E\big[ Y^\pi_{t_i}\, \Delta W_{t_i} \mid X^\pi_{t_{i-1}} \big], \qquad E_{i-1}\big[ Y^\pi_{t_i} \big] = E\big[ Y^\pi_{t_i} \mid X^\pi_{t_{i-1}} \big]. $$
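To make the backward recursion (3.2) concrete, here is a minimal sketch in which the conditional expectations $E_{i-1}[\cdot]$ are replaced by least-squares regression on a small polynomial basis — a crude stand-in for the simulation-based operators discussed below, not the paper's Malliavin estimator. We take $b = 0$, $\sigma = 1$, $\Phi(x) = x$ and the linear driver $f(t,x,y,z) = -ry$, for which $Y_0 = e^{-r}x_0$ is known in closed form; since $f$ is linear in $y$, the implicit equation for $Y^\pi_{t_{i-1}}$ is solved exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N, r, x0 = 20, 100_000, 0.1, 1.0
h = 1.0 / n

dW = rng.normal(0.0, np.sqrt(h), size=(n, N))
X = np.vstack([np.full(N, x0), x0 + np.cumsum(dW, axis=0)])  # X[i] ~ X^pi_{t_i}

def regress(target, x):
    # least-squares projection onto span{1, x, x^2}: a stand-in for the
    # regression E[target | X^pi_{t_i} = x]
    A = np.vander(x, 3)
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return A @ coef

Y = X[n]                                       # terminal condition Phi(x) = x
for i in range(n, 0, -1):
    Z = regress(Y * dW[i - 1], X[i - 1]) / h   # Z^pi_{t_{i-1}} of (3.2), unused by this driver
    Y = regress(Y, X[i - 1]) / (1.0 + r * h)   # implicit step for f = -r*y
Y0 = Y.mean()
# closed form: Y_0 = exp(-r) * x0
```

The regression basis, driver and coefficients are all illustrative assumptions; the point is only the backward flow of (3.2).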

The proof of the following theorem may be found in [6], or in a slightly different formulation in [18] for the path dependent case.

Theorem 3.1. Associated with the backward scheme (3.2), we define the step processes $Y^\pi_t = \sum_{i=0}^{n-1} Y^\pi_{t_i} \mathbf{1}_{[t_i, t_{i+1})}(t) + Y^\pi_{t_n} \mathbf{1}_{\{t_n\}}(t)$ and, respectively, $Z^\pi_t = \sum_{i=0}^{n-1} Z^\pi_{t_i} \mathbf{1}_{[t_i, t_{i+1})}(t) + Z^\pi_{t_n} \mathbf{1}_{\{t_n\}}(t)$. Then

$$ E\left[ \sup_{0 \leq t \leq 1} |Y_t - Y^\pi_t|^2 + \int_0^1 |Z_t - Z^\pi_t|^2\, dt \right] \leq C|\pi| \tag{3.3} $$

for a constant $C$ independent of the partition.

The above scheme is not directly implementable: special care needs to be taken with the computation of the involved conditional expectations. However, it does highlight the important fact that, to obtain a fully implementable scheme, we need to approximate the regression functions as well as the underlying process. In this spirit we start by defining the following operators.

Definition 3.2. Let $C^1_b = \{g \mid g : \mathbb{R}^d \to \mathbb{R},\ \text{Lipschitz}\}$. Then define the nonlinear operators $R_i : C^1_b \to C^1_b$, $i = n, \ldots, 0$, as follows:

$$
\begin{aligned}
R_n g(x) &= g(x), \\
R_i g(x) &= E\left[ g\big(X^\pi_{t_{i+1}}(t_i, x)\big) \right] + \pi_{i+1} f\left( t_i, x, R_i g(x), \pi_{i+1}^{-1} E\left[ g\big(X^\pi_{t_{i+1}}(t_i, x)\big)\, \Delta W_{i+1} \right] \right).
\end{aligned}
\tag{3.4}
$$

Remark 3.3. In this paper we will work with forward–backward systems where the terminal condition is a Lipschitz function of $X_1$. When an operator from the above family is applied to a function which is $C^1_b$, and the driver $f$ of the BSDE is once continuously differentiable, then the function $R_i g$ is also differentiable with bounded first order derivatives. This is just an easy consequence of the chain rule and the implicit function theorem (see [18] for details). In fact, there exists a uniform bound on the first derivative of the iteration of the family $\{R_i\}_i$:

$$ \max_{0 \leq i \leq n-1} \big\| \nabla R_i \cdots R_{n-1} \Phi(x) \big\|_\infty < \infty. $$

Note that the operator $R_i$ is well defined for a larger class of functions $g$, not necessarily in $C^1_b$. For instance, $R_i g$ could be defined for functions $g$ of polynomial growth.

Using a backward induction argument, we have the following representation:

$$ Y^\pi_{t_i} = R_i R_{i+1} \cdots R_{n-1} \Phi(X^\pi_{t_i}) \tag{3.5} $$

for $i = 0, 1, \ldots, n$. The next result uses the notation

$$ \| f \|_{L^p(P^x_{t_i, t_{i+1}})} := \Big( E\big[ |f(X^\pi_{t_{i+1}})|^p \mid X^\pi_{t_i} = x \big] \Big)^{1/p}. $$

Lemma 3.4. For every $i = 0, \ldots, n-1$ the operator $R_i$ enjoys the following Lipschitz-type property:

$$ |R_i g_1 - R_i g_2|(x) \leq \frac{1 + C\pi_{i+1}}{1 - K\pi_{i+1}}\, \| g_1 - g_2 \|_{L^p(P^x_{t_i, t_{i+1}})} \tag{3.6} $$

for some $p > 1$, where $K$ is the Lipschitz constant of $f$ and $C$ is a generic constant depending on $d$, $p$ and $K$.

Proof. Fix a value of $i$. Starting from the definition of $R_i$ we have that

$$
\begin{aligned}
R_i g_1(x) - R_i g_2(x) = {}& E\left[ (g_1 - g_2)\big(X_{t_{i+1}}(t_i, x)\big) \right] \\
&+ \pi_{i+1} \Big( f\big( t_i, x, R_i g_1(x), \pi_{i+1}^{-1} E\big[ g_1(X_{t_{i+1}}(t_i, x))\, \Delta W_{i+1} \big] \big) \\
&\qquad\quad - f\big( t_i, x, R_i g_2(x), \pi_{i+1}^{-1} E\big[ g_2(X_{t_{i+1}}(t_i, x))\, \Delta W_{i+1} \big] \big) \Big).
\end{aligned}
$$

By the mean value theorem and the $K$-Lipschitz property of $f$, we can find deterministic functions $\nu : \mathbb{R}^d \to \mathbb{R}$ and $\zeta : \mathbb{R}^d \to \mathbb{R}^d$, both bounded by $K$, such that

$$
\begin{aligned}
& f\big( t_i, x, R_i g_1(x), \pi_{i+1}^{-1} E\big[ g_1(X_{t_{i+1}}(t_i, x))\, \Delta W_{i+1} \big] \big) - f\big( t_i, x, R_i g_2(x), \pi_{i+1}^{-1} E\big[ g_2(X_{t_{i+1}}(t_i, x))\, \Delta W_{i+1} \big] \big) \\
&\quad = \nu(x)\big( R_i g_1(x) - R_i g_2(x) \big) + \pi_{i+1}^{-1}\, \zeta(x) \cdot E\left[ (g_1 - g_2)\big(X_{t_{i+1}}(t_i, x)\big)\, \Delta W_{i+1} \right],
\end{aligned}
$$

and we have that

$$ \big(1 - \pi_{i+1}\nu(x)\big)\big( R_i g_1(x) - R_i g_2(x) \big) = E\left[ (g_1 - g_2)\big(X_{t_{i+1}}(t_i, x)\big) \right] + \zeta(x) \cdot E\left[ (g_1 - g_2)\big(X_{t_{i+1}}(t_i, x)\big)\, \Delta W_{i+1} \right]. \tag{3.7} $$

From (3.7) we deduce that, with $q$ the conjugate exponent of $p$ and $2k \geq q$,

$$
\begin{aligned}
(1 - \pi_{i+1}K)\, |R_i g_1(x) - R_i g_2(x)| &\leq \left| E\left[ (g_1 - g_2)\big(X_{t_{i+1}}(t_i, x)\big)\big(1 + \zeta(x) \cdot \Delta W_{i+1}\big) \right] \right| \\
&\leq \| g_1 - g_2 \|_{L^p(P^x_{t_i, t_{i+1}})}\, E\left[ |1 + \zeta(x) \cdot \Delta W_{i+1}|^q \right]^{1/q} \\
&\leq \| g_1 - g_2 \|_{L^p(P^x_{t_i, t_{i+1}})}\, E\left[ \big(1 + \zeta(x) \cdot \Delta W_{i+1}\big)^{2k} \right]^{1/(2k)}.
\end{aligned}
$$

Since $\zeta$ is bounded by the Lipschitz constant $K$ of $f$, and the odd moments of the Gaussian increment vanish, we have that

$$ E\left[ \big(1 + \zeta(x) \cdot \Delta W_{i+1}\big)^{2k} \right] = \sum_{j=0}^{2k} \binom{2k}{j} E\left[ (\zeta(x) \cdot \Delta W_{i+1})^j \right] = \sum_{j=0}^{k} \binom{2k}{2j} E\left[ (\zeta(x) \cdot \Delta W_{i+1})^{2j} \right] \leq 1 + C\pi_{i+1}, $$

We need to clarify how the above result iterates over the family $\{R_i\}_i$, a point which will be used in the sequel. We apply Lemma 3.4 to $|R_k R_{k+1} h(x) - R_k R_{k+1} g(x)|$ and to $|R_{k+1} h(x) - R_{k+1} g(x)|$ to get

$$ |R_k R_{k+1} h(x) - R_k R_{k+1} g(x)| \leq \frac{1 + C\pi_{k+1}}{1 - K\pi_{k+1}}\, \| R_{k+1} h - R_{k+1} g \|_{L^p(P^x_{t_k, t_{k+1}})} $$

and

$$ |R_{k+1} h(x) - R_{k+1} g(x)| \leq \frac{1 + C\pi_{k+2}}{1 - K\pi_{k+2}}\, \| h - g \|_{L^p(P^x_{t_{k+1}, t_{k+2}})}. $$

Then, by the semigroup property,

$$ |R_k R_{k+1} h(x) - R_k R_{k+1} g(x)| \leq \frac{1 + C\pi_{k+1}}{1 - K\pi_{k+1}} \cdot \frac{1 + C\pi_{k+2}}{1 - K\pi_{k+2}}\, \| h - g \|_{L^p(P^x_{t_k, t_{k+2}})}. $$

In general it holds that

$$ |R_0 \cdots R_i g(x) - R_0 \cdots R_i h(x)| \leq \prod_{j=1}^{i+1} \frac{1 + C\pi_j}{1 - K\pi_j}\, \| g - h \|_{L^p(P^x_{t_0, t_{i+1}})} $$

for all $i = 0, \ldots, n-1$.

As mentioned above, the conditional expectations involved in (3.2) are not computable in most cases. Hence, the next step is to substitute the expectation operator $E[\cdot]$ in (3.4) with a simulation-based approximating operator $\hat E[\cdot]$ that is explicitly computable. The simulation could rely on a Monte Carlo method, as is the case in [6] and [12], or on an evaluation on a tree, as in [8]. Given such an operator $\hat E[\cdot]$, we define for $i = n, \ldots, 0$

$$
\begin{aligned}
\hat R_n g(x) &= g(x), \\
\hat R_i g(x) &= \hat E\left[ g\big(X^\pi_{t_{i+1}}(t_i, x)\big) \right] + \pi_{i+1} f\left( t_i, x, \hat R_i g(x), \frac{1}{\pi_{i+1}} \hat E\left[ g\big(X^\pi_{t_{i+1}}(t_i, x)\big)\, \Delta W_{i+1} \right] \right).
\end{aligned}
\tag{3.8}
$$

Given the above family of empirical operators, we define as the approximating value of $Y_{t_i}$

$$ \hat Y^\pi_{t_i} = \hat R_i \hat R_{i+1} \cdots \hat R_{n-1} \Phi(X^\pi_{t_i}). $$

We are now able to express the error between the backward Euler scheme $Y^\pi$ and the empirical value $\hat Y^\pi$ as an error between the operators $E[\cdot]$ and $\hat E[\cdot]$. Let

$$ R_{i,n-1}\, g := R_i \cdots R_{n-1}\, g, \qquad \hat R_{i,n-1}\, g := \hat R_i \cdots \hat R_{n-1}\, g. $$

Then the global error at each step $i = 0, \ldots, n-1$ is given by

$$ Y^\pi_{t_i} - \hat Y^\pi_{t_i} = R_{i,n-1}\Phi(x) - \hat R_{i,n-1}\Phi(x). \tag{3.9} $$

We also set by convention $\Delta W^0_{i+1} := 1$, and we introduce the error

$$ V[g; l](x) := \big( E - \hat E \big)\left[ g\big(X^\pi_{t_{i+1}}(t_i, x)\big)\, \Delta W^l_{i+1} \right] $$

and denote the local $p$-errors at time $t_i$ by

$$ \mathcal{E}^p_i[g; l] := \left\| V[g; l]\big(X^\pi_{t_i}\big) \right\|_p, \tag{3.10} $$

where $i = 0, 1, \ldots, n-1$, $l = 0, 1, \ldots, d$ and $p > 1$.

Theorem 3.5. Assume that the partition is equidistant ($\pi_i = 1/n$) and that (H1a) and (H2) are satisfied. Then, for any $p > 1$ and $i = 0, \ldots, n-1$, the following holds:

$$ |Y^\pi_{t_i} - \hat Y^\pi_{t_i}| \leq \frac{C}{|\pi|} \max_{\substack{0 \leq l \leq d \\ 0 \leq i \leq n-1}} \mathcal{E}^p_i\big[ \hat R_{i+1,n-1}\Phi;\, l \big]. $$

Proof. In this proof C will be a constant which may vary from line to line. It is clearly sufficient to prove the result for i = 0.

We start from the definition of the regression and the empirical regression operators. For any function $g$ such that the corresponding expectations are well defined, we have that

$$
\begin{aligned}
R_i g(x) - \hat R_i g(x) = {}& (E - \hat E)\left[ g\big(X^\pi_{t_{i+1}}(t_i, x)\big) \right] \\
&+ \pi_{i+1} \Big( f\big( t_i, x, R_i g(x), \pi_{i+1}^{-1} E\big[ g(X^\pi_{t_{i+1}}(t_i, x))\, \Delta W_{i+1} \big] \big) \\
&\qquad\quad - f\big( t_i, x, \hat R_i g(x), \pi_{i+1}^{-1} \hat E\big[ g(X^\pi_{t_{i+1}}(t_i, x))\, \Delta W_{i+1} \big] \big) \Big).
\end{aligned}
$$

Using the Lipschitz property of $f$, we may argue as in the proof of Lemma 3.4 to deduce that

$$ |R_i g(x) - \hat R_i g(x)| \leq \frac{K}{1 - K\pi_{i+1}} \left\{ \left| (E - \hat E)\big[ g(X^\pi_{t_{i+1}}(t_i, x)) \big] \right| + \left| (E - \hat E)\big[ g(X^\pi_{t_{i+1}}(t_i, x))\, \Delta W_{i+1} \big] \right| \right\}. \tag{3.11} $$

We now go back to (3.9) and develop the error as a Trotter product; that is, we add and subtract $n$ terms, where every term is a combination of $n - i$ empirical backward projections and $i$ perfect ones, $R_{0,i}\hat R_{i+1,n-1}\Phi(X^\pi_{t_0})$:

$$ |Y^\pi_{t_0} - \hat Y^\pi_{t_0}| = \left| R_{0,n-1}\Phi(X^\pi_{t_0}) - \hat R_{0,n-1}\Phi(X^\pi_{t_0}) \right| \leq \sum_{i=0}^{n-1} \left| R_{0,i-1}\hat R_{i,n-1}\Phi(X^\pi_{t_0}) - R_{0,i}\hat R_{i+1,n-1}\Phi(X^\pi_{t_0}) \right|. \tag{3.12} $$

We may apply the iterated estimate following Lemma 3.4 to every term in the sum to deduce that

$$ \left| R_{0,i}\hat R_{i+1,n-1}\Phi(x) - R_{0,i-1}\hat R_{i,n-1}\Phi(x) \right| \leq \prod_{j=1}^{i} \frac{1 + C\pi_{j+1}}{1 - K\pi_{j+1}}\, \left\| (R_i - \hat R_i)\hat R_{i+1,n-1}\Phi\big(X^\pi_{t_i}(0, x)\big) \right\|_p. $$

Using (3.11) we conclude that

$$ \left\| (R_i - \hat R_i)\hat R_{i+1} \cdots \hat R_{n-1}\Phi\big(X^\pi_{t_i}(0, x)\big) \right\|_p \leq \frac{K(d+1)}{1 - K\pi_{i+1}}\, \max_{0 \leq l \leq d} \mathcal{E}^p_i\big[ \hat R_{i+1,n-1}\Phi;\, l \big]. \tag{3.13} $$

Plugging these estimates into (3.12) we have that

$$
\begin{aligned}
|Y^\pi_{t_0} - \hat Y^\pi_{t_0}| &\leq \frac{K(d+1)}{1 - K|\pi|} \sum_{i=0}^{n-1} \prod_{j=1}^{i} \frac{1 + C\pi_{j+1}}{1 - K\pi_{j+1}}\, \max_{0 \leq l \leq d} \mathcal{E}^p_i\big[ \hat R_{i+1,n-1}\Phi;\, l \big] \\
&\leq \frac{K(d+1)}{1 - K|\pi|}\, \max_{\substack{0 \leq l \leq d \\ 0 \leq i \leq n-1}} \mathcal{E}^p_i\big[ \hat R_{i+1,n-1}\Phi;\, l \big]\, \sum_{i=0}^{n-1} \prod_{j=1}^{i} \frac{1 + C\pi_{j+1}}{1 - K\pi_{j+1}},
\end{aligned}
$$

and the proof is completed by noting that $\sum_{i=0}^{n-1} \prod_{j=1}^{i} \frac{1 + C\pi_{j+1}}{1 - K\pi_{j+1}} \leq \frac{e^{C+K}}{|\pi|}$.
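The counting bound used in this last step is easy to check numerically; the sketch below evaluates the sum of products for an equidistant partition with illustrative (assumed) values of $C$ and $K$.

```python
import math

n, C, K = 200, 1.0, 0.5                  # illustrative constants, equidistant mesh
mesh = 1.0 / n
ratio = (1 + C * mesh) / (1 - K * mesh)  # the factor (1 + C*pi) / (1 - K*pi)
total = sum(ratio**i for i in range(n))  # sum_{i=0}^{n-1} prod_{j=1}^{i} ratio
bound = math.exp(C + K) / mesh           # the claimed upper bound e^{C+K} / |pi|
print(total, bound)
```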

### 4 The Malliavin calculus approach

We will now depart from the abstract level of the previous section to analyze the method presented in [6]. Throughout this section the stronger condition (H1b) is assumed to hold. Also, the partition is assumed to be equidistant so that πi = |π|, ∀i. The choice for the

simulation-based approximation $\hat E[\cdot]$ of $E[\cdot]$ is based on a representation of the conditional expectation obtained by means of the (Malliavin) integration by parts formula. The exact formula of this representation involves some weights, which are in fact iterated Skorohod integrals. So, we first introduce the related notation.

We denote by $J_k$ the subset of $\mathbb{N}^k$ of elements $I = (i_1, \ldots, i_k)$ that satisfy $1 \leq i_1 < \ldots < i_k \leq d$. Trivially, we set $J_0 = \emptyset$. Given two elements $I \in J_k$, $J \in J_q$, we define their concatenation $I * J := (r_1, \ldots, r_l)$, with $k \wedge q \leq l \leq d \wedge (k + q)$, $r_i \in I \cup J$ for every $i \leq l$, and $1 \leq r_1 < \ldots < r_l \leq d$. Finally, for any $J \in J_k$, $k = 0, 1, \ldots, d$, we write $J^c$ for its complementary set, that is, the unique set such that $J * J^c = (1, 2, \ldots, d)$.

Given a matrix valued process $h$, with columns denoted by $h^i$, and a random variable $F$, we denote

$$ S^h_i[F] := \int_0^\infty F\, (h^i_t)^*\, dW_t, $$

and, for a multi-index $I = (i_1, \ldots, i_k)$,

$$ S^h_I[F] := S^h_{i_1} \circ \cdots \circ S^h_{i_k}[F], $$

where the integrals are understood in the Skorohod sense. We extend the definition to $I = \emptyset$ by setting $S^h_\emptyset[F] = F$. Let $\phi$ be a bounded and continuous real-valued function with $\phi(0) = 1$. We will say that $\phi$ is a smooth localizing function if

$$ \partial_I \phi(x) \in C^0_b(\mathbb{R}^d) \quad \text{for any } k = 0, 1, \ldots, d,\ I \in J_k; $$

trivially we set $\partial_\emptyset \phi = \phi$. We denote the collection of all smooth localizing functions by $\mathcal{L}$.

For integrability reasons, we will follow [6] by requiring the localizing function to satisfy the condition:

$$ \sum_{I \in \cup_{k=1}^d J_k} \int_{\mathbb{R}^d} |x|^m\, \partial_I \phi(x)^2\, dx < \infty. \tag{4.1} $$

By $\mathcal{L}_0$ we denote the collection of all localizing functions satisfying (4.1). For $i = 0, 1, \ldots, n$, we consider the matrix valued process

$$ h_i(t) = \frac{1}{|\pi|}\, \sigma^{-1}(X^\pi_{t_{i-1}})\, \mathbf{1}_{[t_{i-1}, t_i)}(t) - \frac{1}{|\pi|}\, \sigma^{-1}(X^\pi_{t_i}) \left( I_d + |\pi| \nabla b(X^\pi_{t_i}) + \sum_{j=1}^d \nabla\sigma^j(X^\pi_{t_i})\, \Delta W^j_{i+1} \right) \mathbf{1}_{[t_i, t_{i+1})}(t). $$

Under (H1b), the Euler scheme is infinitely many times differentiable in the Malliavin sense, and the derivative is given by

$$
\begin{aligned}
D_t X^\pi_{t_1} &= \sigma(X^\pi_{t_0})\, \mathbf{1}_{t \leq t_1}, \\
D_t X^\pi_{t_i} &= D_t X^\pi_{t_{i-1}} + |\pi| \nabla b(X^\pi_{t_{i-1}})\, D_t X^\pi_{t_{i-1}} + \sum_{j=1}^d \nabla\sigma^j(X^\pi_{t_{i-1}})\, D_t X^\pi_{t_{i-1}}\, \Delta W^j_i + \sigma(X^\pi_{t_{i-1}})\, \mathbf{1}_{(t_{i-1}, t_i]}(t).
\end{aligned}
$$

Observe that the process $h_i(t)$ satisfies the identities

$$ \int_0^1 D_t X^\pi_{t_i}\, h_i(t)\, dt = I_d, \qquad \int_0^1 D_t X^\pi_{t_{i+1}}\, h_i(t)\, dt = 0. \tag{4.2} $$

Also, for all $I \in \cup_{k=0}^d J_k$, $l = 0, \ldots, d$, and $\phi \in \mathcal{L}_0$, the iterated integrals

$$ S^{h_i}_I\left[ \Delta W^l_{i+1}\, \phi(X^\pi_{t_i} - x) \right] = \sum_{J \subseteq I} (-1)^{|J|}\, \partial_J \phi(X^\pi_{t_i} - x)\, S^{h_i}_{J^c}\left[ \Delta W^l_{i+1} \right] \tag{4.3} $$

are well defined and belong to $\mathbb{D}^{1,2}$, the space of once Malliavin differentiable random variables. The latter equality follows from the first property of (4.2). The following representation is reported from [4] and [6].

Theorem 4.1. Let $\rho$ be a real-valued function, $0 \leq l \leq d$, and assume that the random variable $F = \rho(X^\pi_{t_{i+1}})\Delta W^l_{i+1} \in L^2$. Then for all localizing functions $\phi_1, \phi_F \in \mathcal{L}_0$:

$$ E\left[ F \mid X^\pi_{t_i} = x \right] = \frac{E\left[ Q^F[h_i, \phi_F](x) \right]}{E\left[ Q^1[h_i, \phi_1](x) \right]}, \tag{4.4} $$

where

$$ Q^F[h_i, \phi](x) := H_x(X^\pi_{t_i})\, \rho(X^\pi_{t_{i+1}})\, S^{h_i}_{(1, \ldots, d)}\left[ \Delta W^l_{i+1}\, \phi(X^\pi_{t_i} - x) \right], $$

and $H_x(y) := \prod_{i=1}^d \mathbf{1}_{x_i \leq y_i}$ is the Heaviside function in $d$ dimensions.
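In the simplest non-trivial setting, the representation (4.4) can be verified by direct simulation. The sketch below takes $d = 1$, $b = 0$, $\sigma = 1$ and an equidistant grid with $n = 2$, so that $X^\pi_{t_i} = x_0 + W_{t_i}$ and the weight process coincides with its truncation; with the constant localizer $\phi \equiv 1$ (used purely for illustration — it violates the integrability condition (4.1)), the weight $S^{h_1}[\phi(X^\pi_{t_1} - x)]$ reduces to $(W_{t_1} - \Delta W_2)/|\pi|$.

```python
import numpy as np

rng = np.random.default_rng(42)
N, h = 400_000, 0.5                     # paths; |pi| with t1 = 0.5, t2 = 1
x0, x = 0.0, 0.5

W1 = rng.normal(0.0, np.sqrt(h), N)     # W_{t1}
dW2 = rng.normal(0.0, np.sqrt(h), N)    # W_{t2} - W_{t1}
X1, X2 = x0 + W1, x0 + W1 + dW2         # Euler scheme is exact here: X = x0 + W

S = (W1 - dW2) / h                      # weight S^{h_1}[phi(X1 - x)] with phi = 1
H = (X1 >= x).astype(float)             # Heaviside H_x(X1) in d = 1

rho = lambda y: y                       # test function for F = rho(X2)
est = np.mean(H * rho(X2) * S) / np.mean(H * S)
# exact regression: E[X2 | X1 = x] = x, here 0.5
print(est)
```

All numerical choices (seed, path count, grid) are arbitrary; the denominator estimates the density of $X^\pi_{t_1}$ at $x$, as noted below.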

Observe that $q^1_i(x) = E\left[ Q^1[h_i, \phi_1](x) \right]$ is the density of $X^\pi_{t_i}$. The above representation paves the way for a Monte Carlo approach to the problem of conditioning. In particular, it suggests substituting the regression function $E\left[ F \mid X^\pi_{t_i} = x \right]$ with a ratio of empirical means. However, we should note that in the simulation there could be integrability issues: if we consider $N$ independent random variables $\{B_k(x)\}_{k=1}^N$ with $E[B_k(x)] = q^1_i(x)$ for all $k$, then the estimator of the denominator, $\hat q^1_i(x) = N^{-1}\sum_{k=1}^N B_k(x)$, has a Gaussian asymptotic distribution. For this reason, in [6] (Lemma 3.3) the authors take advantage of preliminary bounds available for $Y^\pi_{t_i}$ and $E_{i-1}\big[Y^\pi_{t_i}\Delta W^l_i\big]$. In particular, there exist functions $\underline{\Psi}_i(x), \overline{\Psi}_i(x), \underline{\zeta}_i(x), \overline{\zeta}_i(x)$ of polynomial growth such that

$$
\begin{aligned}
\underline{\Psi}_i(X^\pi_{t_i}) &\leq Y^\pi_{t_i} \leq \overline{\Psi}_i(X^\pi_{t_i}), \\
\underline{\zeta}_{i-1}(X^\pi_{t_{i-1}}) &\leq E_{i-1}\big[ Y^\pi_{t_i}\Delta W^l_i \big] \leq \overline{\zeta}_{i-1}(X^\pi_{t_{i-1}})
\end{aligned}
\tag{4.5}
$$

for $i = 0, \ldots, n$ and $l = 0, \ldots, d$. With the above bounds at hand, let us now build the backward algorithm. Denote by $(\Omega^0, \mathcal{F}^0, P^0)$ the original probability space. Consider $nN$ copies of the Euler scheme approximation of the forward diffusion; that is, we will be making use of $N$ copies at every point $t_k$ on the grid. We define $n$ probability spaces $\{(\Omega^i, \mathcal{F}^i, P^i)\}_{i=1}^n$, where on each of these spaces $N$ copies live, i.e. the copies $\{X^{\pi,j}\}_{j \in N_i}$, $N_i := \{Ni + 1, \ldots, N(i+1)\}$, live on the space $\Omega^i$. Further on, we consider the product probability space

$$ \big( \bar\Omega, \bar{\mathcal{F}}, \bar P \big) = \big( \otimes_{i=0}^n \Omega^i,\ \otimes_{i=0}^n \mathcal{F}^i,\ \otimes_{i=0}^n P^i \big) \tag{4.6} $$

and we lift all processes to this space. Integration with respect to the measure $\bar P$ is denoted by $\bar E$, while integration with respect to the marginal measures $P^i$, $P^i \times P^k$ is denoted by $E^i$, $E^{i,k}$, etc.
The operator $\hat E$ suggested by Theorem 4.1 is defined as follows for $i = n-1, \ldots, 1$:

$$ \hat E_i\left[ g\big(X^\pi_{t_{i+1}}(t_i, x)\big)\, \Delta W^l_{i+1} \right] = T^\zeta_i\Big[ \tilde E_i\big[ g\big(X^\pi_{t_{i+1}}(t_i, x)\big)\, \Delta W^l_{i+1} \big] \Big] \tag{4.7} $$

with $T^\zeta_i(y) = \overline{\zeta}_i(X^\pi_{t_i}) \wedge y \vee \underline{\zeta}_i(X^\pi_{t_i})$ for any $y \in \mathbb{R}$, and

$$ \tilde E_i\left[ g\big(X^\pi_{t_{i+1}}(t_i, x)\big)\, \Delta W^l_{i+1} \right] = \frac{\hat Q^F_i[h_i, \phi_F](x)}{\hat Q^1_i[h_i, \phi_1](x)}, $$

for possibly different localizing functions $\phi_1, \phi_F \in \mathcal{L}_0$, and

$$
\begin{aligned}
\hat Q^F_i[h_i, \phi_F](x) &= \frac{1}{N} \sum_{l \in N_{i+1}} H_x\big(X^{\pi,l}_{t_i}\big)\, g\big(X^{\pi,l}_{t_{i+1}}\big)\, S^{h^l_i}\Big[ \phi_F\big(X^{\pi,l}_{t_i} - x\big)\, \Delta W^{l,k}_{i+1} \Big], \\
\hat Q^1_i[h_i, \phi_1](x) &= \frac{1}{N} \sum_{l \in N_{i+1}} H_x\big(X^{\pi,l}_{t_i}\big)\, S^{h^l_i}\Big[ \phi_1\big(X^{\pi,l}_{t_i} - x\big) \Big].
\end{aligned}
$$

Here, $\Delta W^{l,0}_{i+1} = 1$ and, for $1 \leq k \leq d$, $\Delta W^{l,k}_{i+1}$ is the $k$-th component of the $l$-th copy in the $N_{i+1}$ bundle of copies of the increment $\Delta W_{i+1}$. Moreover, $h^l_i$ stands for the evaluation of the function $h_i$ using the $l$-th copy of the Euler scheme. At time $t_0$, $\tilde E$ is just the empirical mean

$$ \tilde E_0\left[ g\big(X_{t_1}(0, x)\big)\, \Delta W^l_1 \right] = \frac{1}{N} \sum_{j \in N_1} g\big(X^{\pi,j}_{t_1}\big)\, \Delta W^{j,l}_1. $$

In the current setup, (3.8) becomes

$$
\begin{aligned}
\tilde R_n g(x) &= g(x), \\
\tilde R_i g(x) &= \hat E\left[ g\big(X^\pi_{t_{i+1}}(t_i, x)\big) \right] + |\pi| f\left( t_i, x, \tilde R_i g(x), |\pi|^{-1} \hat E\left[ g\big(X^\pi_{t_{i+1}}(t_i, x)\big)\, \Delta W_{i+1} \right] \right), \\
\hat R_i g(x) &= T^\Psi_i\big( \tilde R_i g(x) \big), \qquad i = 0, 1, \ldots, n,
\end{aligned}
\tag{4.8}
$$

where $T^\Psi_i(F) = \underline{\Psi}_i \vee F \wedge \overline{\Psi}_i$.

Remark 4.2. The conclusions of Theorem 4.1 remain valid when applied to a random function $\rho$. In particular, if $\rho := \rho(x, \xi)$, where $\xi$ is independent of $\mathcal{F}^\pi$, the representation (4.4) is true, only now both sides involve random functions instead of deterministic ones. A careful inspection of the algorithm shows that (4.4) needs to be applied to such functions.

The key point is Theorem 5.1 in [6], which describes the local error.

Theorem 4.3. For $0 \leq i \leq n-1$ and $0 \leq l \leq d$, consider the random variable $F = \rho(X^\pi_{t_{i+1}}, \omega)\, \Delta W^l_{i+1}$, where $\rho(\cdot, \omega)$ is a random function with $\omega \in \Omega^{i+2} \times \ldots \times \Omega^n$. For $\phi_F \in \mathcal{L}_0$, let $q^F_i(x) := E^0\left[ Q^F[h_i, \phi_F](x) \right]$ be defined as in Theorem 4.1, so that $r_i(x) := q^F_i(x)/q^1_i(x) = E^{i+1,\ldots,n}\left[ F \mid X^\pi_{t_i} = x \right]$, and consider the regression estimator

$$ \hat r_i(x) := T^\zeta_i\big(\tilde r_i(x)\big), \quad \text{with} \quad \tilde r_i(x) := \frac{\hat Q^F_i[h_i, \phi](x)}{\hat Q^1_i[h_i, \psi](x)}. $$

Then, for any $p > 1$, we have the error estimate:

$$ \left\| r_i(X^\pi_{t_i}) - \hat r_i(X^\pi_{t_i}) \right\|_p \leq \frac{1}{N^{1/2p}}\, \big(2\Gamma(F)\big)^{1/p}, \tag{4.9} $$

where

$$ \Gamma(F) := \int_{\mathbb{R}^d} \Big[ \gamma(x)\, \big\| Q^F[h_i, \phi](x) \big\|_{L^2(\bar P)} + \big(|r_i(x)| + \gamma(x)\big)\, \big\| Q^1[h_i, \psi](x) \big\|_{L^2(\bar P)} \Big]\, dx $$

and

$$ \gamma(x) := \big|\overline{\Psi}_i(x) - \underline{\Psi}_i(x)\big| \vee \big|\overline{\zeta}_i(x) - \underline{\zeta}_i(x)\big|. $$

For the sake of completeness, we include the proof of Theorem 4.3 in the Appendix. In particular, the factorisation (4.6) of the probability space $\big(\bar\Omega, \bar{\mathcal{F}}, \bar P\big)$ clarifies some points in the proof of [6].

The above result provides us with a control on the error of the simulation of the conditional expectation. The error is controlled by means of the parameter $N$, which is the number of simulated paths used at time $t_i$ for the Monte Carlo method. However, there is a further negative influence of the partition mesh $|\pi|$ hidden in the functional $\Gamma$. In particular, the $L^2$ norms of the involved random variables explode when the partition mesh shrinks. The exact rate of explosion is given in the following lemma, reported from [6].

Lemma 4.4. Let $\mu : \mathbb{R}^d \to \mathbb{R}$ be a function of polynomial growth:

$$ \sup_{x \in \mathbb{R}^d} \frac{|\mu(x)|}{1 + |x|^m} < \infty \quad \text{for some } m \geq 1. $$

For $\phi \in \mathcal{L}_0$, set $\phi_\pi(x) := \phi(|\pi|^{-1/2} x)$ and let $F$ be defined as in Theorem 4.3. Assume further that $F \in L^{2+\epsilon}$ for some $\epsilon > 0$. Then

$$ \max_{1 \leq i \leq n} \int_{\mathbb{R}^d} \mu(x)\, \big\| Q^F[h_i, \phi_\pi](x) \big\|_{L^2(\bar P)}\, dx \leq \frac{C}{|\pi|^{d/4}}. $$

We are now able to conclude the estimate for the global error of the algorithm.

Theorem 4.5. Assume that (H1b) and (H2) hold true. Then for the Malliavin calculus algorithm the following error estimate holds:

$$ |Y^\pi_{t_0} - \hat Y^\pi_{t_0}| \leq \frac{C}{|\pi|^{1 + d/4p}\, N^{1/2p}}. $$

Proof. An easy induction argument shows that $V := \hat R_{i+1,n-1} g(x) = \rho(x, \omega)$ for some random function $\rho$ with $\omega \in \Omega^{i+2} \times \ldots \times \Omega^n$, so that Theorem 4.3 applies and gives, for every $0 \leq l \leq d$:

$$ \left\| (E - \hat E)\left[ \rho\big(X_{t_{i+1}}(t_i, x), \omega\big)\, \Delta W^l_{i+1} \right] \Big|_{x = X^\pi_{t_i}} \right\|_p \leq \frac{1}{N^{1/2p}} \left( \int_{\mathbb{R}^d} \Big[ \gamma(x)\, \big\| Q^V[h_i, \phi_\pi](x) \big\|_{L^2(\bar P)} + \big(|r(x)| + \gamma(x)\big)\, \big\| Q^1[h_i, \psi_\pi](x) \big\|_{L^2(\bar P)} \Big]\, dx \right)^{1/p}. $$

The above, combined with Theorem 3.5, gives

$$ |Y^\pi_{t_0} - \hat Y^\pi_{t_0}| \leq \frac{C}{|\pi|}\, \frac{1}{N^{1/2p}} \left( \int_{\mathbb{R}^d} \Big[ \gamma(x)\, \big\| Q^V[h_i, \phi_\pi](x) \big\|_{L^2(\bar P)} + \big(|r(x)| + \gamma(x)\big)\, \big\| Q^1[h_i, \psi_\pi](x) \big\|_{L^2(\bar P)} \Big]\, dx \right)^{1/p}. $$

The functions $\gamma(x)$ and $r(x)$ are of linear growth (see [6]), so Lemma 4.4 finishes the proof.

### 5 An improvement on the Malliavin weights

The Malliavin calculus algorithm provides us with an efficient method for the numerical solution of a BSDE. However one serious drawback of this method is that it can become quite heavy from the implementation viewpoint when the dimension is high. Our aim in this section is to present a variation of the algorithm presented in [6] that reduces the computational effort. We will show that one can consider the function

$$ \bar h_i(t) = \frac{1}{|\pi|} \left( \sigma^{-1}(X^\pi_{t_{i-1}})\, \mathbf{1}_{[t_{i-1}, t_i)}(t) - \sigma^{-1}(X^\pi_{t_i})\, \mathbf{1}_{[t_i, t_{i+1})}(t) \right) \tag{5.1} $$

in place of $h_i$, and in effect $S^{\bar h_i}$ in place of $S^{h_i}$, and form a backward induction scheme based on these new weights. Let us try to appreciate the gain from this truncation. To simplify things, in the definition of $h$, we treat $\frac{1}{|\pi|}\sigma^{-1}(X^\pi_{t_i})\sum_{j=1}^d \nabla\sigma^j(X^\pi_{t_i})\,\Delta W^j_{i+1}$ as being one term (when in fact there are $d$). Under this assumption, we may deduce that the iterated Skorohod integral $S^{h_i}[\phi(X^\pi_{t_i} - x)\,\Delta W^l_{i+1}]$ consists of $4^d$ terms, while $S^{\bar h_i}[\phi(X^\pi_{t_i} - x)\,\Delta W^l_{i+1}]$ consists of $2^d$. Hence, in the truncated algorithm, we have to put in only $(1/2)^d$ of the effort required for the original algorithm when computing the weights. For example, in five dimensions, we make approximately 4% of the original effort. To add to this, observe that the above assumption is quite moderate. The following should also be considered:

• By excluding the last part of h from our computations we are able to avoid d + 1 terms for every integral instead of two.

• Avoiding the last part of h, we avoid the computation of the derivatives of the matrix valued function σ(·).

• Lastly, the last part of h is also the anticipating part of it. Hence, it is the one contributing most to the Malliavin correction terms in the Skorohod integrals, which are now excluded from the computations.

The price that has to be paid for the reduction in the computational effort is that more copies of the Euler scheme need to be used to obtain error estimates equivalent to those of the previous section. This will be made precise in Theorem 5.6.
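The bookkeeping behind this trade-off is elementary; under the simplified count used above (the $\nabla\sigma$ sum treated as a single term), the full and truncated weights carry $4^d$ and $2^d$ terms respectively, so the relative saving is $1 - (1/2)^d$:

```python
# Term counts for the iterated Skorohod weights under the simplified
# bookkeeping of the text: 4^d terms for S^{h_i}, 2^d for the truncated
# S^{h_bar_i}, so the relative saving is 1 - (1/2)^d.
for d in (1, 2, 3, 5, 10):
    full, truncated = 4**d, 2**d
    saving = 1.0 - truncated / full
    print(f"d={d}: {full} vs {truncated} terms, saving {saving:.1%}")
```

For $d = 5$ the saving is $1 - 1/32 = 96.875\%$, matching the figure quoted in the Introduction.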

Remark 5.1. The truncation of the $h$ process does not affect the denominators in the representation of Theorem 4.1. Indeed, a repeated application of Lemma 3.2.1 of [16] shows that

$$
\begin{aligned}
q^1(x) &= E\left[ H_x(X^\pi_{t_i})\, E_i\big[ S^{h_i}[\phi(X^\pi_{t_i} - x)] \big] \right] \\
&= E\left[ H_x(X^\pi_{t_i})\, E_i\Big[ S^{h_i}_{-I}[\phi(X^\pi_{t_i} - x)]\, \tfrac{1}{|\pi|}\big({-\sigma^{-1}(X^\pi_{t_{i-1}})}\big)^1 \cdot \Delta W_i \Big] \right] \tag{5.2} \\
&= E\left[ H_x(X^\pi_{t_i})\, S^{\bar h_i}_1\Big[ E_i\big[ S^{h}_{-I}[\phi(X^\pi_{t_i} - x)] \big] \Big] \right] = \ldots = E\left[ H_x(X^\pi_{t_i})\, S^{\bar h_i}[\phi(X^\pi_{t_i} - x)] \right].
\end{aligned}
$$

In other words, in the denominator the truncation is exact and no precision is lost. Given this, our effort will focus on the truncation of the numerator.

Observing that $\int D_t X^\pi_{t_i}\, \bar h_i(t)\, dt = I_d$, we see that expansion (4.3) holds for $\bar h_i$ as well:

$$ S^{\bar h_i}_I\left[ \Delta W^l_{i+1}\, \phi(X^\pi_{t_i} - x) \right] = \sum_{J \subseteq I} (-1)^{|J|}\, \partial_J \phi(X^\pi_{t_i} - x)\, S^{\bar h_i}_{J^c}\left[ \Delta W^l_{i+1} \right] \tag{5.3} $$

for every $0 \leq l \leq d$. Hence, in order to analyze the error due to the truncation of $h_i$, we will concentrate on the difference $S^{h_i}_{J^c}[\Delta W^l_{i+1}] - S^{\bar h_i}_{J^c}[\Delta W^l_{i+1}]$ for every $J \subseteq I$ and $0 \leq l \leq d$.

In the rest of this section, we analyse the error due to the truncation in order to better understand the price to be paid for the above gain in complexity. For this purpose we need to redefine the regression function according to the new “truncated” weights. In the same spirit as Theorem 4.1, we introduce an approximating expectation operator $\bar E[\cdot]$ as follows. For $R = \rho(X^\pi_{t_{i+1}})\,\Delta W^l_{i+1}$, $1 \leq i \leq n$ and $0 \leq l \leq d$, we write

$$ \bar q^R(x) := E\left[ Q^R[\bar h_i, \phi^\pi_R](x) \right], \qquad \bar q^1(x) := E\left[ Q^1[\bar h_i, \phi^\pi_1](x) \right] = q^1(x), $$

where, given any $\phi_1, \phi_R \in \mathcal{L}_0$, we denote $\phi^\pi(x) := \phi(x/|\pi|^\alpha)$, and set

$$
\bar E\left[ \rho\big(X^\pi_{t_{i+1}}(t_i, x)\big)\, \Delta W^l_{i+1} \right] =
\begin{cases}
T^\zeta_i\big( \bar q^R(x)/\bar q^1(x) \big), & l = 1, \ldots, d, \\[2pt]
T^\Psi_i\big( \bar q^R(x)/\bar q^1(x) \big), & l = 0.
\end{cases}
\tag{5.4}
$$

Observe that here we have introduced a normalization in the localizing function, dividing by $|\pi|^\alpha$. In the previous section we had $\alpha = 1/2$, whereas here we need to consider $\alpha > 1/2$. This is necessary for the control of the error, and it is exactly what will lead us to worse variance estimates in comparison with the previous section; in other words, it will lead us to require the generation of more copies of the forward Euler scheme (larger $N$) to obtain a rate of convergence of the same order as in the previous section.

The next step is, using this new regression operator $\bar E[\cdot]$, to redefine the operators $R_i$ of (3.4) as

$$
\begin{aligned}
\check{\bar R}_i g(x) &= \bar E\left[ g\big(X^\pi_{t_{i+1}}(t_i, x)\big) \right] + |\pi| f\left( t_i, x, \check{\bar R}_i g(x), |\pi|^{-1} \bar E\left[ g\big(X^\pi_{t_{i+1}}(t_i, x)\big)\, \Delta W_{i+1} \right] \right), \\
\bar R_i g(x) &= T^\Psi_i\big( \check{\bar R}_i g(x) \big),
\end{aligned}
\tag{5.5}
$$

for $i = n-1, \ldots, 0$ and any function $g$ in a suitable class. The iteration of the above family will give us the values

$$ \bar Y^\pi_{t_i} := \bar R_i \cdots \bar R_{n-1} \Phi(x) $$

for $i = n-1, \ldots, 0$, where $\Phi(\cdot)$ is the function of the terminal condition of the BSDE. Finally, we introduce the family of operators $\{\hat{\bar R}_i\}_{i=1}^n$ that corresponds to the simulation-based estimation of the family $\{\bar R_i\}_{i=1}^n$, in a Monte Carlo fashion similar to that of the previous section. That is, the definition of the family $\{\hat{\bar R}_i\}_{i=1}^n$ is similar to (4.8), with the difference that we use $\bar h_i(t)$ in place of $h_i(t)$ for every step $i$.

To produce meaningful error estimates, we first need to look at the $L^p$ norms of the iterated Skorohod integrals. We first introduce some additional notation. Let

$$ \hat h_i(t) := h_i(t) - \bar h_i(t) = -\frac{1}{|\pi|}\, \sigma^{-1}(X^\pi_{t_i}) \left( |\pi| \nabla b(X^\pi_{t_i}) + \sum_{j=1}^d \nabla\sigma^j(X^\pi_{t_i})\, \Delta W^j_{i+1} \right) \mathbf{1}_{[t_i, t_{i+1})}(t). $$

For $\gamma \in \{\pm 1\}^{|I|}$, $I \in \cup_{k=1}^d J_k$, and $i = 1, \ldots, d$, denote

$$ S_{I, \gamma}[F] := S_{i_1, \gamma_1} \circ \cdots \circ S_{i_{|I|}, \gamma_{|I|}}[F], $$

where

$$ S_{i, +1}[F] := \delta\big(F \hat h^i\big) \quad \text{and} \quad S_{i, -1}[F] := \delta\big(F \bar h^i\big). $$

Then, by an easy induction argument, we see that for any $I_m \subseteq I$:

$$ S^{h_i}_{I_m}\left[ \Delta W^l_{i+1} \right] - S^{\bar h_i}_{I_m}\left[ \Delta W^l_{i+1} \right] = \sum_{\gamma \in \{\pm 1\}^m \setminus \{(-1, \ldots, -1)\}} S_{I_m, \gamma}\left[ \Delta W^l_{i+1} \right]. \tag{5.6} $$

When these integrals are of length one, the following estimate follows directly from the properties of Skorohod integration:

$$ S^{\bar h_i}_j\left[ \Delta W^l_{i+1} \right] = \Delta W^l_{i+1}\, \bar h^j_i(t_{i-1})\, \Delta W_i + \Delta W^l_{i+1}\, \bar h^j_i(t_i)\, \Delta W_{i+1} - \int_{t_i}^{t_{i+1}} \mathrm{Tr}\left[ D_s(\Delta W^l_{i+1})\, \bar h^j_i(t_i) \right] ds, $$

which implies that

$$ \left\| S^{\bar h_i}_j[\Delta W^l_{i+1}] \right\|_p \leq \frac{C}{\sqrt{|\pi|}}, \qquad \left\| D_{s_1, \ldots, s_\rho} S^{\bar h_i}_j[\Delta W^l_{i+1}] \right\|_p \leq \frac{C}{|\pi|}. $$

We also note that

$$ S^{\hat h_i}_j\left[ \Delta W^l_{i+1} \right] = \Delta W^l_{i+1}\, \hat h^j_i(t_{i-1})\, \Delta W_i + \Delta W^l_{i+1}\, \hat h^j_i(t_i)\, \Delta W_{i+1} - \int_{t_i}^{t_{i+1}} \mathrm{Tr}\left[ D_s(\Delta W^l_{i+1})\, \hat h^j_i(t_i) \right] ds, $$

which implies that

$$ \left\| S^{\hat h_i}_j \right\|_p \leq C, \qquad \left\| D_t S^{\hat h_i}_j \right\|_p \leq \frac{C}{\sqrt{|\pi|}}, \quad \text{but} \quad \left\| D_{s_1, \ldots, s_\rho} S^{\hat h_i}_j \right\|_p \leq \frac{C}{|\pi|}. $$

For the general case, we have the following estimate on the explosion rate. The proof is in the spirit of Lemma 6.1 of [6].

Lemma 5.2. For $1 \le k, m \le d$, consider a vector of times $\tau = (s_1, s_2, \ldots, s_k) \in [t_{i-1}, t_{i+1}]^k$ and a multi-index $I_m \in J_m$. Then for every $\gamma \in \{\pm 1\}^m \setminus \{(-1, \ldots, -1)\}$:
\[
\| D_\tau S_{I_m,\gamma}[\Delta W^l_{i+1}] \|_p \le C|\pi|^{(1-k-m)/2}, \qquad 0 \le l \le d. \tag{5.7}
\]
Moreover
\[
\| S^{h_i}[\Delta W^l_{i+1}] - S^{\bar h_i}[\Delta W^l_{i+1}] \|_p \le \frac{C}{|\pi|^{(d-1)/2}}
\]
and, for $p > 1$,
\[
\| S^{h_i}[\Delta W^l_{i+1}] \|_p,\ \| S^{\bar h_i}[\Delta W^l_{i+1}] \|_p = O(|\pi|^{-d/2}).
\]

Proof. We only prove (5.7), as the remaining estimates follow the same line of argument as in [6]. For simplicity we write $S_{I,\gamma}[\Delta W^l_{i+1}] = S_{I,\gamma}$. The proof is by induction on the length of the multi-index. Consider $I_k = (j_1, \ldots, j_k)$ and $\gamma \in \{\pm 1\}^k$, and assume for simplicity that $\gamma_1 = -1$. Then
\[
\| D_\tau S_{I_k,\gamma} \|_p = \Big\| D_\tau\Big( S_{I_{k-1},\gamma^-}\,S_{j_1,\gamma_1}[1] - \int_{t_{i-1}}^{t_{i+1}} D_s S_{I_{k-1},\gamma^-}\,\bar h^{j_1}(s)\,ds \Big) \Big\|_p
\]
\[
\le \sum_{\Lambda \subseteq \{t_1,\ldots,t_l\}} \bigg( \| D_\Lambda S_{I_{k-1},\gamma^-} \|_{2p}\,\| D_{\Lambda^c}\delta(\bar h^{j_1}) \|_{2p} + \int_{t_{i-1}}^{t_{i+1}} \| D_{\Lambda,s} S_{I_{k-1},\gamma^-} \|_{2p}\,\| D_{\Lambda^c}\bar h^{j_1}(s) \|_{2p}\,ds \bigg),
\]

where we have used the elementary rule $\delta(F u) = F\,\delta(u) - \int_0^1 D_t F\,u_t\,dt$ and Exercise 1.2.14, p. 34 of [16]. The term involving the Riemann integral is

\[
\int_{t_{i-1}}^{t_{i+1}} \| D_{\Lambda,s} S_{I_{k-1},\gamma^-} \|_{2p}\,\| D_{\Lambda^c}\bar h^{j_1}(s) \|_{2p}\,ds \le 2C|\pi|\,|\pi|^{(1-|\Lambda|-1-k+1)/2}\,|\pi|^{-1} = C|\pi|^{(1-|\Lambda|-k)/2}.
\]

As for the other terms, if $\Lambda = \tau$, then by the induction hypothesis and the remark above
\[
\| D_\tau S_{I_{k-1},\gamma^-} \|_{2p}\,\| \delta(\bar h^{j_1}) \|_{2p} \le C|\pi|^{(1-l-k+1)/2}\,|\pi|^{-1/2} = C|\pi|^{(1-l-k)/2},
\]
and if $\Lambda \subset \tau$,
\[
\| D_\Lambda S_{I_{k-1},\gamma^-} \|_{2p}\,\| D_{\Lambda^c}\delta(\bar h^{j_1}) \|_{2p} \le C|\pi|^{(1-|\Lambda|-k+1)/2}\,|\pi|^{-1} = C|\pi|^{(-|\Lambda|-k)/2} \le C|\pi|^{(1-l-k)/2},
\]
as $|\Lambda| \le l-1$. Putting all the estimates together we obtain the result.
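The elementary rule $\delta(F u) = F\,\delta(u) - \int_0^1 D_t F\,u_t\,dt$ used in the proof can be sanity-checked by simulation in the simplest case: take $F = W_1$ and $u \equiv 1$ on $[0,1]$, so that $\delta(u) = W_1$, $D_t F = 1$, and the rule gives $\delta(W_1 u) = W_1^2 - 1$. The Skorohod duality $E[\delta(W_1 u)\,G] = E[\int_0^1 W_1\,D_t G\,dt]$ with the test variable $G = W_1^2$ (our own choice) then reads $E[(W_1^2 - 1)W_1^2] = E[2W_1^2] = 2$:

```python
import random

random.seed(3)
n = 200_000
lhs = rhs = 0.0
for _ in range(n):
    w = random.gauss(0.0, 1.0)       # W_1
    skorohod = w * w - 1.0           # delta(W_1 u) = W_1^2 - 1 by the product rule
    lhs += skorohod * (w * w)        # delta(W_1 u) * G  with  G = W_1^2
    rhs += w * (2.0 * w)             # int_0^1 W_1 * D_t G dt,  D_t G = 2 W_1
lhs /= n
rhs /= n
print(lhs, rhs)  # both should be close to 2
```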

Remark 5.3. As in the previous section, an easy induction argument shows that for any $i = 0, 1, \ldots, n-1$, the iteration of the family $\{\hat{\bar R}_i\}_{i=1}^n$ produces random functions that may be written as $\hat{\bar R}_i \cdots \hat{\bar R}_n \Phi(x, \chi)$, with $\chi$ a random variable. Observe that, in the notation of the previous section, $\chi$ can be shown to be a functional of the copies of the Euler scheme $\{X^{\pi,l},\ l = i+1, \ldots, n\}$.

To obtain a rate of convergence for this variation of the algorithm presented in [6], we need to compare $Y^\pi_{t_0} = R_0 \cdots R_n \Phi(x)$ with $\hat{\bar Y}^\pi_{t_0} = \hat{\bar R}_0 \cdots \hat{\bar R}_n \Phi(x)$ when $X_0 = x$. The first step is to look at the local error between $R_i$ and $\bar R_i$, $i = 0, \ldots, n-1$, when applied to random functions of the form described in Remark 5.3.

Theorem 5.4. Given a localizing function $\phi \in L_0$, let $\phi^\pi(x) := \phi(|\pi|^{-\alpha} x)$, and consider the definition of the regression (4.4) and of the truncated regression functions (5.4) with localizing function $\phi^\pi$. Then there exists a constant $C$ independent of the partition such that:

1. If $\varrho$ is a Lipschitz real valued function, then for any $i = 0, 1, \ldots, n-1$ and $p > 1$:
\[
\big\| R_i \varrho(X^\pi_{t_i}) - \bar R_i \varrho(X^\pi_{t_i}) \big\|_p \le C|\pi|^{\frac{1}{p}\left(\alpha + \frac{1}{2}\right)}.
\]

2. Let $\chi$ be a random variable independent of $\mathcal F^\pi_1$ and assume that $\varrho(X^\pi_{t_{i+1}}, \chi)\,\Delta W^l_{i+1} \in L^q$, $q > 1$, with $\varrho$ a real valued random function and $l = 0, 1, \ldots, d$. Then
\[
\big\| R_i \varrho(X^\pi_{t_i}) - \bar R_i \varrho(X^\pi_{t_i}) \big\|_p \le C|\pi|^{\frac{\alpha}{p}}\,\| \varrho(X^\pi_{t_{i+1}}, \chi) \|_p^{1/p}.
\]

Proof. Let us first assume that we are working with a Lipschitz deterministic function $\varrho$. Using the notation and arguments presented in Theorem 3.5, where here we consider $\bar E \equiv \hat E$, we know that there exists a constant $C$ such that
\[
\| (R_i - \bar R_i)\varrho(X^\pi_{t_i}) \|_p \le \frac{C}{1 - K|\pi|}\,\max_{0 \le l \le d} E^p_i[\varrho; l].
\]
Hence we may focus on the right hand side and, for that, we fix a value of $l \in \{0, \ldots, d\}$. Let
\[
\nu_i(x) := \bar q^R(x)/\bar q^1(x).
\]

Using the truncation, we have that
\[
E^p_i[\varrho; l]^p = \Big\| (E - \bar E)\big[ \varrho(X^\pi_{t_{i+1}}(t_i, X^\pi_{t_i}))\,\Delta W^l_{i+1} \big] \Big\|_p^p \le E\Big[ \big| \nu_i(X^\pi_{t_i}) - r_i(X^\pi_{t_i}) \big|^p \wedge \gamma(X^\pi_{t_i})^p \Big] \le E\Big[ \big| \nu_i(X^\pi_{t_i}) - r_i(X^\pi_{t_i}) \big|\,\gamma(X^\pi_{t_i})^{p-1} \Big],
\]

and
\[
E\Big[ \big| \nu_i(X^\pi_{t_i}) - r_i(X^\pi_{t_i}) \big|\,\gamma(X^\pi_{t_i})^{p-1} \Big] = E\bigg[ \bigg| \frac{q^R(X^\pi_{t_i}) - \bar q^R(X^\pi_{t_i})}{q^1(X^\pi_{t_i})} \bigg|\,\gamma(X^\pi_{t_i})^{p-1} \bigg] = \int_{\mathbb R^d} E\Big[ H_x(X^\pi_{t_i})\,\varrho(X^\pi_{t_{i+1}})\,\big( S^{h_i} - S^{\bar h_i} \big)\big[ \Delta W^l_{i+1}\,\phi^\pi(X^\pi_{t_i} - x) \big] \Big]\,\gamma(x)^{p-1}\,dx.
\]

Using the representation (4.3) we get
\[
E\big[ |\nu_i(X^\pi_{t_i}) - r_i(X^\pi_{t_i})|\,\gamma(X^\pi_{t_i})^{p-1} \big] = \int_{\mathbb R^d} E\Big\{ H_x(X^\pi_{t_i})\,\varrho(X^\pi_{t_{i+1}}) \sum_{J \in \cup_{k \le d-1} J_k} (-1)^{|J|}\,\partial_J \phi^\pi(X^\pi_{t_i} - x)\,\big( S^{h_i}_{J^c} - S^{\bar h_i}_{J^c} \big)[\Delta W^l_{i+1}] \Big\}\,\gamma(x)^{p-1}\,dx. \tag{5.8}
\]

Arguing as in (5.2), with the help of Lemma 3.2.1 of [16], we have
\[
E\Big[ \big( S^{h_i}_{J^c} - S^{\bar h_i}_{J^c} \big)[\Delta W^l_{i+1}] \,\Big|\, \mathcal F_{t_i} \Big] = 0.
\]

So we may condition the integrand in (5.8) with respect to $\mathcal F_{t_i}$ to get
\[
E\big[ |\nu_i(X^\pi_{t_i}) - r_i(X^\pi_{t_i})|\,\gamma(X^\pi_{t_i})^{p-1} \big] = \int_{\mathbb R^d} E\Big\{ H_x(X^\pi_{t_i})\,\big( \varrho(X^\pi_{t_{i+1}}) - \varrho(X^\pi_{t_i}) \big) \sum_{J \in \cup_{k \le d-1} J_k} (-1)^{|J|}\,\partial_J \phi^\pi(X^\pi_{t_i} - x)\,\big( S^{h_i}_{J^c} - S^{\bar h_i}_{J^c} \big)[\Delta W^l_{i+1}] \Big\}\,\gamma(x)^{p-1}\,dx.
\]

Our next step is to perform an $\omega$-by-$\omega$ change of variables in the Riemann integral by setting $u = (X^\pi_{t_i} - x)/|\pi|^\alpha$. Observe that the definition of the localizing function $\phi^\pi$ yields $\partial_J \phi^\pi(x) = |\pi|^{-\alpha|J|}\,\partial_J \phi(x/|\pi|^\alpha)$. Hence we have that
\[
E\big[ |\nu_i(X^\pi_{t_i}) - r_i(X^\pi_{t_i})|\,\gamma(X^\pi_{t_i})^{p-1} \big] \le \sum_{J \in \cup_{k \le d-1} J_k} |\pi|^{\alpha(d-|J|)} \int_{\mathbb R^d_+} E\Big\{ \big( \varrho(X^\pi_{t_{i+1}}) - \varrho(X^\pi_{t_i}) \big)\,\big( S^{h_i}_{J^c} - S^{\bar h_i}_{J^c} \big)[\Delta W^l_{i+1}]\,\gamma(X^\pi_{t_i} - |\pi|^\alpha u)^{p-1} \Big\}\,|\partial_J \phi(u)|\,du.
\]

We now apply Hölder's inequality and, using the assertion of Lemma 5.2, the Lipschitz assumption on $\varrho$ and the obvious estimate $\| X^\pi_{t_{i+1}} - X^\pi_{t_i} \|_k \le C|\pi|^{1/2}$ for any $k \ge 1$, we have
\[
E\big[ |\nu_i(X^\pi_{t_i}) - r_i(X^\pi_{t_i})|\,\gamma(X^\pi_{t_i})^{p-1} \big] \le C \sum_{J \in \cup_{k \le d-1} J_k} |\pi|^{\frac12 + (\alpha - \frac12)(d-|J|)}\,\| \varrho(X^\pi_{t_{i+1}}) - \varrho(X^\pi_{t_i}) \|_p \int_{\mathbb R^d_+} \| \gamma(X^\pi_{t_i} - |\pi|^\alpha u)^{p-1} \|_{q_2}\,|\partial_J \phi(u)|\,du
\]
\[
\le C \sum_{J \in \cup_{k \le d-1} J_k} |\pi|^{1 + (\alpha - \frac12)(d-|J|)} \int_{\mathbb R^d_+} \| \gamma(X^\pi_{t_i} - |\pi|^\alpha u)^{p-1} \|_{q_2}\,|\partial_J \phi(u)|\,du \tag{5.9}
\]

for some $q_2 > 1$ and a constant $C$ independent of the partition. Now, since the function $\gamma(\cdot)$ is of polynomial growth we have, with $r$ an integer, that
\[
\int_{\mathbb R^d_+} E\big[ \gamma(X^\pi_{t_i} - |\pi|^\alpha u)^{q_2(p-1)} \big]^{1/q_2}\,|\partial_J \phi(u)|\,du \le C \int_{\mathbb R^d_+} E\bigg[ \Big( 1 + \sum_{k=0}^r \binom{r}{k}\,|X^\pi_{t_i}|^k\,\big| |\pi|^\alpha u \big|^{r-k} \Big)^{q_2(p-1)} \bigg]^{1/q_2}\,|\partial_J \phi(u)|\,du,
\]

and since $|\partial_J \phi(u)|$ integrates against polynomials, by the assumptions on the localizing function, we obtain the first result. The second part of the theorem follows from (5.9) without the last step of the estimation.

As already mentioned, here we need to consider an $\alpha > 1/2$, and this has a negative impact, relative to the results of the previous section, on the integrated variance that controls the error of the simulation-based estimation of the family $\{\bar R_i\}_{i=0}^{n-1}$. In particular, using the assumptions and the notation of Lemma 4.4, we may show with identical arguments that

\[
\limsup_{|\pi| \to 0}\ |\pi|^{\alpha d/2} \max_{1 \le i \le n} \int_{\mathbb R^d} \mu(x)\,\| Q^R[\bar h_i, \phi^\pi](x) \|_2\,dx < \infty, \tag{5.10}
\]
where $\phi^\pi(x) := \phi(x/|\pi|^\alpha)$ and $\mu(\cdot)$ is a function of polynomial growth.

In our main result we will make use of the following version of Gronwall's lemma, whose proof follows from a straightforward induction.

Lemma 5.5. Let $a_i, b_i$, $i = 0, 1, \ldots, n$ be two sequences of positive numbers such that
\[
a_i \le b_i + \varepsilon \sum_{j=i+1}^n a_j, \qquad i = 0, \ldots, n-1,
\]
with $\varepsilon > 0$. Then we have that
\[
\sum_{i=0}^n a_i \le \sum_{i=0}^n (1+\varepsilon)^i b_i + (1+\varepsilon)^n a_n.
\]
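As a quick numerical check of Lemma 5.5 (our own, on randomly generated data), one can build a sequence satisfying the hypothesis with equality, which is the tightest admissible case, and compare the two sides:

```python
import random

random.seed(2)
n, eps = 50, 0.02   # eps = O(1/n), as in the application of the lemma
b = [random.random() + 0.01 for _ in range(n + 1)]  # positive b_i

# tightest admissible sequence: a_i = b_i + eps * sum_{j > i} a_j, with a_n = b_n
a = [0.0] * (n + 1)
a[n] = b[n]
for i in range(n - 1, -1, -1):
    a[i] = b[i] + eps * sum(a[i + 1:])

lhs = sum(a)
rhs = sum((1 + eps) ** i * b[i] for i in range(n + 1)) + (1 + eps) ** n * a[n]
print(lhs <= rhs)  # True
```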

Theorem 5.6. Let $\phi, \phi^\pi$ be as in Theorem 5.4 and consider the three families of operators $\{R_i\}_{i=0}^{n-1}$, $\{\bar R_i\}_{i=0}^{n-1}$, $\{\hat{\bar R}_i\}_{i=0}^{n-1}$ defined with the function $\phi^\pi$ on the partition $\pi$. As before, we have that $Y^\pi_{t_i} = R_i \cdots R_n \Phi(X^\pi_{t_i}(0,x))$ for all $i = 0, \ldots, n-1$, and we set
\[
\hat{\bar Y}^\pi_{t_i} := \hat{\bar R}_i \cdots \hat{\bar R}_n \Phi(X^\pi_{t_i}(0,x)).
\]

Then, for any $p > 1$ and $\alpha \ge (3p-1)/2$, we have
\[
\max_{0 \le i \le n-1} \| Y^\pi_{t_i} - \hat{\bar Y}^\pi_{t_i} \|_p \le C\bigg( |\pi|^{\frac{1}{p}\left(\frac12 + \alpha\right) - 1} + \frac{1}{|\pi|^{1 + \alpha d/2p}\,N^{1/2p}} \bigg)
\]
for a constant $C$ independent of the partition.

Remark 5.7. The result above reports the rate of convergence for the global error. The restriction on $\alpha$ appears because the use of Lemma 5.5 at the end of our proof calls for an $\varepsilon = O(1/n)$. Observe that with an equidistant partition, say $|\pi| = 1/n$, this value of $\alpha$ gives an error of order $1/\sqrt n$, as long as we use $N = n^{3p + (3p-1)\frac{d}{2}}$ copies of the Euler scheme at every point of the partition.
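The arithmetic behind this remark can be checked with exact fractions: with $|\pi| = 1/n$, $\alpha = (3p-1)/2$ and $N = n^{3p + (3p-1)d/2}$, both terms of the bound in Theorem 5.6 reduce to $n^{-1/2}$. The helper below is our own sketch of that computation.

```python
from fractions import Fraction as F

def error_exponents(p, d):
    # exponents of n in the two error terms of Theorem 5.6, for |pi| = 1/n,
    # alpha = (3p - 1)/2 and N = n^(3p + (3p - 1) d / 2)
    alpha = F(3 * p - 1, 2)
    # first term |pi|^((1/2 + alpha)/p - 1) = n^(1 - (1/2 + alpha)/p)
    e1 = 1 - (F(1, 2) + alpha) / p
    # second term |pi|^(-(1 + alpha d/(2p))) * N^(-1/(2p))
    n_exp = 3 * p + alpha * d              # N = n^(n_exp)
    e2 = (1 + alpha * d / (2 * p)) - n_exp / (2 * p)
    return e1, e2

for p in (2, 3, 5):
    for d in (1, 2, 3):
        assert error_exponents(p, d) == (F(-1, 2), F(-1, 2))
print("both error terms are O(n^(-1/2))")
```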

Proof of Theorem 5.6: We estimate $|Y^\pi_{t_0} - \hat{\bar Y}^\pi_{t_0}|$ and remark that the estimate for $i > 0$ follows by identical arguments. For ease of presentation we write $R_{i,j} := R_i \cdots R_j$ and $\hat{\bar R}_{i,j} := \hat{\bar R}_i \cdots \hat{\bar R}_j$, with the convention that these products equal the identity when $j < i$, and we start with a Trotter product expansion for the error:
\[
|Y^\pi_{t_0} - \hat{\bar Y}^\pi_{t_0}| = \big| R_0 \cdots R_{n-1}\Phi(x) - \hat{\bar R}_0 \cdots \hat{\bar R}_{n-1}\Phi(x) \big| \le \sum_{i=0}^{n-1} \big| R_{0,i}\,\hat{\bar R}_{i+1,n-1}\Phi(x) - R_{0,i-1}\,\hat{\bar R}_{i,n-1}\Phi(x) \big|
\]
\[
\le \sum_{i=0}^{n-1} \big\| (R_i - \hat{\bar R}_i)\,\hat{\bar R}_{i+1,n-1}\Phi(X^\pi_{t_i}) \big\|_p,
\]

where we have used Lemma 3.4, with $p > 1$. For every $i$ we clearly have
\[
\big\| (R_i - \hat{\bar R}_i)\,\hat{\bar R}_{i+1,n-1}\Phi(X^\pi_{t_i}) \big\|_p \le \big\| (R_i - \bar R_i)\,\hat{\bar R}_{i+1,n-1}\Phi(X^\pi_{t_i}) \big\|_p + \big\| (\bar R_i - \hat{\bar R}_i)\,\hat{\bar R}_{i+1,n-1}\Phi(X^\pi_{t_i}) \big\|_p. \tag{5.11}
\]
To estimate the second term above, we may appeal to Theorem 4.3. The proof of the latter carries through in this case to give us
\[
\big\| (\bar R_i - \hat{\bar R}_i)\,\hat{\bar R}_{i+1,n-1}\Phi(X^\pi_{t_i}) \big\|_p \le \frac{2}{N^{1/2}} \bigg( \int_{\mathbb R^d} \gamma(x)\,\| Q^R[\bar h_i, \phi](x) \|_p + \big( |r(x)| + \gamma(x) \big)\,\| Q^1[\bar h_i, \psi](x) \|_p\,dx \bigg)^{1/p}.
\]

Plugging the estimate (5.10) in the above we get
\[
\big\| (\bar R_i - \hat{\bar R}_i)\,\hat{\bar R}_{i+1,n-1}\Phi(X^\pi_{t_i}) \big\|_p \le \frac{C}{|\pi|^{\alpha d/2p}\,N^{1/2p}}. \tag{5.12}
\]

As for the first term in (5.11) we have
\[
\big\| (R_i - \bar R_i)\,\hat{\bar R}_{i+1,n-1}\Phi(X^\pi_{t_i}) \big\|_p \le \big\| (R_i - \bar R_i)\,R_{i+1,n-1}\Phi(X^\pi_{t_i}) \big\|_p + \big\| (R_i - \bar R_i)\big( \hat{\bar R}_{i+1,n-1} - R_{i+1,n-1} \big)\Phi(X^\pi_{t_i}) \big\|_p.
\]

We know from Remark 3.3 that the functions $R_{i+1,n-1}\Phi$ are Lipschitz uniformly in $i$. Hence we apply the first assertion of Theorem 5.4 to the first term above, and the second assertion to the other one, to obtain
\[
\big\| (R_i - \bar R_i)\,\hat{\bar R}_{i+1,n-1}\Phi(X^\pi_{t_i}) \big\|_p \le C\Big( |\pi|^{\frac1p\left(\alpha + \frac12\right)} + |\pi|^{\frac{\alpha}{p}}\,\big\| \big( \hat{\bar R}_{i+1,n-1} - R_{i+1,n-1} \big)\Phi(X^\pi_{t_{i+1}}) \big\|_p^{1/p} \Big). \tag{5.13}
\]

We would now like to eliminate the $1/p$-root above. To this end, observe that for any $a > 0$ and $p > 1$ it holds that $a^{1/p} \le a + 1$. Hence
\[
\big\| \big( \hat{\bar R}_{i+1,n-1} - R_{i+1,n-1} \big)\Phi(X^\pi_{t_{i+1}}) \big\|_p^{1/p} \le |\pi|^{\frac{1}{2p} - \frac12}\,\big\| \big( \hat{\bar R}_{i+1,n-1} - R_{i+1,n-1} \big)\Phi(X^\pi_{t_{i+1}}) \big\|_p + |\pi|^{\frac{1}{2p}}. \tag{5.14}
\]
Putting together (5.12), (5.13), (5.14) we have that

\[
\big\| (R_i - \hat{\bar R}_i)\,\hat{\bar R}_{i+1,n-1}\Phi(X^\pi_{t_i}) \big\|_p \le C\bigg( |\pi|^{\frac1p\left(\alpha + \frac12\right)} + \frac{1}{|\pi|^{\alpha d/2p}\,N^{1/2p}} + |\pi|^{\frac{\alpha}{p} + \frac{1}{2p} - \frac12}\,\big\| \big( \hat{\bar R}_{i+1,n-1} - R_{i+1,n-1} \big)\Phi(X^\pi_{t_{i+1}}) \big\|_p \bigg),
\]

which provides, by performing a Trotter product expansion on the last term,
\[
\big\| (R_i - \hat{\bar R}_i)\,\hat{\bar R}_{i+1,n-1}\Phi(X^\pi_{t_i}) \big\|_p \le C\bigg( |\pi|^{\frac1p\left(\alpha + \frac12\right)} + \frac{1}{|\pi|^{\alpha d/2p}\,N^{1/2p}} + |\pi|^{\frac{\alpha}{p} + \frac{1}{2p} - \frac12} \sum_{j=i+1}^{n-1} \big\| \big( \hat{\bar R}_j - R_j \big)\,\hat{\bar R}_{j+1,n-1}\Phi(X^\pi_{t_j}) \big\|_p \bigg),
\]

and the proof is completed by appealing to Lemma 5.5.

### Appendix

Proof of Theorem 4.3. Using the bounds on $r_i$ we have
\[
E^{0,i+1}\Big[ \big( r_i(X^\pi_{t_i}) - \hat r_i(X^\pi_{t_i}) \big)^p \Big] \le E^{0,i+1}\Big[ |\tilde r_i(X^\pi_{t_i}) - r_i(X^\pi_{t_i})|^p \wedge \gamma(X^\pi_{t_i})^p \Big] = E^{0,i+1}\Bigg[ \bigg| \frac{\varepsilon^F(X^\pi_{t_i}) - r_i(X^\pi_{t_i})\,\varepsilon^1(X^\pi_{t_i})}{\hat Q^1(X^\pi_{t_i})} \bigg|^p \wedge \gamma(X^\pi_{t_i})^p \Bigg],
\]
where
\[
\varepsilon^F(x) := \hat Q^F(x) - q^F_i(x) \quad \text{and} \quad \varepsilon^1(x) := \hat Q^1(x) - q^1_i(x).
\]

We will later use the fact that
\[
E^{i+1}|\varepsilon^F(x)| \le \| \varepsilon^F(x) \|_{L^2(P^{i+1})} \le \frac{1}{N^{1/2}}\,V^0[Q^F]^{1/2},
\]
where $V^0[Q^F]$ is the variance of the random variable $Q^F[h_i; \phi]$ with respect to the measure $P^0$. Notice that the last inequality holds in the marginal measure $P^{i+1}$. A similar result is true for $\varepsilon^1$.

Next, for $x \in \mathbb R^d$ let us consider the event
\[
M(x) = \Big\{ \omega : \big| \hat Q^1(x, \omega) - q^1_i(x) \big| \le \tfrac12\,q^1_i(x) \Big\}.
\]

Using this set we may split the expectation above:
\[
E^{0,i+1}\Bigg[ \bigg| \frac{\varepsilon^F(X^\pi_{t_i}) - r_i(X^\pi_{t_i})\,\varepsilon^1(X^\pi_{t_i})}{\hat Q^1(X^\pi_{t_i})} \bigg|^p \wedge \gamma(X^\pi_{t_i})^p \Bigg] \le E^{0,i+1}\Bigg[ 2\,\bigg| \frac{\varepsilon^F(X^\pi_{t_i}) - r_i(X^\pi_{t_i})\,\varepsilon^1(X^\pi_{t_i})}{q^1_i(X^\pi_{t_i})} \bigg|\,\gamma(X^\pi_{t_i})^{p-1}\,\mathbf 1_{M(X^\pi_{t_i})} \Bigg] + E^{0,i+1}\Big[ \gamma(X^\pi_{t_i})^p\,\mathbf 1_{M^c(X^\pi_{t_i})} \Big],
\]
where we have used the inequality $a^p \wedge b^p \le a\,b^{p-1}$. For the first term on the right hand side we compute
\[
2 \int_{\mathbb R^d} E^{i+1}\Big[ \big| \varepsilon^F(x) - r_i(x)\,\varepsilon^1(x) \big| \Big]\,\gamma(x)^{p-1}\,dx \le 2 \int_{\mathbb R^d} \Big( \| \varepsilon^F(x) \|_{L^2(P^{i+1})} + |r_i(x)|\,\| \varepsilon^1(x) \|_{L^2(P^{i+1})} \Big)\,\gamma(x)^{p-1}\,dx
\]
\[
= \frac{2}{N^{1/2}} \int_{\mathbb R^d} \Big( V^0_R(x)^{1/2} + |r_i(x)|\,V^0_1(x)^{1/2} \Big)\,\gamma(x)^{p-1}\,dx \le \frac{2}{N^{1/2}} \int_{\mathbb R^d} \Big( \| Q^F \|_{L^2(P^0)} + |r_i(x)|\,\| Q^1 \|_{L^2(P^0)} \Big)\,\gamma(x)^{p-1}\,dx.
\]

As for the second term, we estimate it by means of the Chebyshev inequality:
\[
E^{0,i+1}\Big[ \gamma(X^\pi_{t_i})^p\,\mathbf 1_{M^c(X^\pi_{t_i})} \Big] = E^0\Big[ E^{i+1}\big[ \gamma(X^\pi_{t_i})^p\,\mathbf 1_{M^c(X^\pi_{t_i})} \big] \Big] = E^0\Big[ \gamma(X^\pi_{t_i})^p\,P^{i+1}\big[ M^c(X^\pi_{t_i}) \big] \Big]
\]
\[
\le E^0\bigg[ \gamma(X^\pi_{t_i})^p\,\frac{1}{q^1_i(X^\pi_{t_i})}\,E^{i+1}\big[ 2\,| \hat Q^1(X^\pi_{t_i}) - q^1_i(X^\pi_{t_i}) | \big] \bigg] = \int_{\mathbb R^d} \gamma(x)^p\,E^{i+1}\big[ 2\,| \hat Q^1(x) - q^1_i(x) | \big]\,dx \le \frac{2}{N^{1/2}} \int_{\mathbb R^d} \gamma(x)^p\,V_{Q^1}(x)^{1/2}\,dx.
\]

We now have an estimate of the error with respect to the measure $dP^0 \times dP^{i+1}$. The claimed result follows from an application of Fubini's theorem.

### References

[1] F. Antonelli and A. Kohatsu-Higa. Filtration stability of backward SDEs. Applied Analysis and its Applications, 18:11–37, 2000.

[2] V. Bally and G. Pagès. Error analysis of the quantization algorithm for obstacle problems. Stochastic Processes and their Applications, pages 1–40, 2003.

[3] J. L. Bentley. Multidimensional divide-and-conquer. In Programming Techniques, R. Rivest, editor, pages 214–229, 1980.

[4] B. Bouchard, I. Ekeland, and N. Touzi. On the Malliavin approach to Monte Carlo approximation of conditional expectations. Finance and Stochastics, 8:45–71, 2004.

[5] B. Bouchard and R. Elie. Discrete time approximation of decoupled forward-backward SDE with jumps. Stochastic Processes and their Applications, to appear, 2006.

[6] B. Bouchard and N. Touzi. Discrete time approximation and Monte Carlo simulation for backward stochastic differential equations. Stochastic Processes and their Applications, 111:175–206, 2004.

[7] D. Chevance. Numerical methods for backward stochastic differential equations. In Numerical Methods in Finance, 1997.

[8] D. O. Crisan and K. Manolarakis. Numerical solution for a BSDE using the cubature method. Preprint, 2007.

[9] J.-P. Lemor, E. Gobet, and X. Warin. Rate of convergence of an empirical regression method for solving generalized backward stochastic differential equations. Bernoulli, pages 889–916, 2006.

[10] N. El Karoui, S. Peng, and M. Quenez. Backward stochastic differential equations in finance. Mathematical Finance, 7(1):1–71, 1997.

[11] F. Coquet, V. Mackevicius, and J. Mémin. Stability in D of martingales and backward equations under discretization of filtration. Stochastic Processes and their Applications, 75:235–248, 1998.

[12] E. Gobet, J.-P. Lemor, and X. Warin. A regression-based Monte Carlo method to solve backward stochastic differential equations. Annals of Applied Probability, 15(3):2172–2202, 2005.

[13] J. Ma, P. Protter, J. San Martín, and S. Torres. Numerical methods for backward stochastic differential equations. Annals of Applied Probability, 12(1):302–316, 2002.

[14] P. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations. Springer, 1999.

[15] T. Lyons and N. Victoir. Cubature on Wiener space. Proceedings of the Royal Society of London A, 460:169–198, 2004.

[16] D. Nualart. The Malliavin Calculus and Related Topics. Springer-Verlag, 1996.

[17] P. Briand, B. Delyon, and J. Mémin. Donsker-type theorem for BSDEs. Electronic Communications in Probability, 6:1–14, 2001.

[18] J. Zhang. A numerical scheme for BSDEs. Annals of Applied Probability, 14(1):459–488, 2004.