
(1)

Unconstrained Recursive Importance Sampling and applications

V. Lemaire & G. Pagès

LPMA, Univ. Paris 6

9 February 2009, CMAP-X

(2)

Basic problem of Numerical Probability: compute by Monte Carlo simulation

$$m = \mathbb{E}\, F(X), \qquad X : (\Omega, \mathcal{A}, \mathbb{P}) \to \mathbb{R}^q \ \text{or} \ (H, (\,\cdot\,|\,\cdot\,)_H) \ \text{Hilbert space}.$$

Examples in Finance: option premium, greek parameters, etc.

If $F(X) \in L^2(\mathbb{P})$,

$$\Big\| \mathbb{E}\, F(X) - \frac{1}{M} \sum_{k=1}^{M} F(X_k) \Big\|_2 = \sqrt{\frac{\mathrm{Var}(F(X))}{M}},$$

and the Central Limit Theorem, Law of the Iterated Logarithm, etc., are ruled by $\mathrm{Var}(F(X))$.

This suggests variance reduction.
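As a concrete baseline, here is a minimal sketch of the crude Monte Carlo estimator and its confidence interval (Python with NumPy; the call-type payoff is a hypothetical example chosen only for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    def crude_mc(F, sample_X, M):
        """Crude Monte Carlo: estimate of E[F(X)] and a 95% CI half-width."""
        values = F(sample_X(M))
        mean = values.mean()
        # The CI half-width shrinks like sqrt(Var(F(X))/M): reducing Var(F(X))
        # is as effective as increasing the sample size M.
        half_width = 1.96 * values.std(ddof=1) / np.sqrt(M)
        return mean, half_width

    # Illustrative payoff: F(x) = (e^x - K)_+ with X ~ N(0,1).
    K = 1.0
    mean, hw = crude_mc(lambda x: np.maximum(np.exp(x) - K, 0.0),
                        lambda M: rng.standard_normal(M), M=100_000)
    print(f"E[F(X)] ~ {mean:.4f} +/- {hw:.4f}")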

(3)

One tool (in $\mathbb{R}^d$): importance sampling. Let

$$X \sim p(x)\,\lambda_d(dx) \quad \text{and} \quad X_\theta \sim p_\theta(x)\,dx.$$

$$\mathbb{E}(F(X)) = \int_{\mathbb{R}^d} F(x)\,p(x)\,dx = \int_{\mathbb{R}^d} F(x)\,\frac{p}{p_\theta}(x)\,p_\theta(x)\,dx = \mathbb{E}\Big[ F(X_\theta)\,\frac{p}{p_\theta}(X_\theta) \Big].$$

Resulting minimization problem: $\min_\theta V(\theta)$ with

$$V(\theta) = \mathbb{E}\Big[ F(X_\theta)^2\,\frac{p^2}{p_\theta^2}(X_\theta) \Big] = \mathbb{E}\Big[ F(X)^2\,\frac{p}{p_\theta}(X) \Big].$$

If $\theta \mapsto p_\theta(x)$ is log-concave and $\lim_{|\theta|\to\infty} p_\theta(x) = 0$, then $V$ is convex and $\lim_{|\theta|\to\infty} V(\theta) = +\infty$, so that

$$\mathrm{Argmin}_\theta V = \{\nabla_\theta V = 0\} \neq \emptyset.$$

(4)

Stochastic recursive zero search: Robbins-Monro algorithm.

$$V(\theta) = \mathbb{E}(v(\theta, X)) \ \text{``}\Longrightarrow\text{''}\ \nabla V(\theta) = \mathbb{E}\Big[ \underbrace{\frac{\partial v}{\partial \theta}(\theta, X)}_{\text{local gradient}} \Big] = \mathbb{E}\Big[ \underbrace{w(\theta, X)}_{\text{pseudo-local gradient}} \Big].$$

Let $\rho : \mathbb{R}^d \to (0, +\infty)$ be a Borel function. Then set

$$H(\theta, X) := \rho(\theta)\, w(\theta, X).$$

(5)

Theorem (Extended Robbins-Monro algorithm). Let $X_1, X_2, \ldots, X_n, \ldots$ be i.i.d. with distribution $\mathcal{L}(X)$ on $(\Omega, \mathcal{A}, \mathbb{P})$ (easy to simulate). Let $\gamma = (\gamma_n)_{n\ge 1}$ satisfy

$$\sum_n \gamma_n = +\infty \quad \text{and} \quad \sum_n \gamma_n^2 < +\infty.$$

Set

$$\theta_{n+1} = \theta_n - \gamma_{n+1} H(\theta_n, X_{n+1}), \qquad \theta_0 \in \mathbb{R}^d.$$

Then, if (WARNING!)

$$\forall\, \theta \in \mathbb{R}^d, \quad \|H(\theta, X)\|_2 \le C(1 + |\theta|),$$

there exists a random vector $\theta^* : (\Omega, \mathcal{A}, \mathbb{P}) \to \{\nabla V = 0\}$ such that

$$\theta_n \xrightarrow{a.s.} \theta^* \quad \text{as } n \to \infty.$$
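A minimal generic sketch of this recursion (Python/NumPy; the toy mean field $H(\theta, X) = \theta - X$, whose zero is $\mathbb{E}X$, is a hypothetical illustration, not from the slides):

    import numpy as np

    def robbins_monro(H, sample_X, theta0, n_iter,
                      gamma=lambda n: 1.0 / (n + 100)):
        """Extended Robbins-Monro: theta_{n+1} = theta_n - gamma_{n+1} H(theta_n, X_{n+1}).

        gamma must satisfy sum gamma_n = +inf, sum gamma_n^2 < +inf
        (e.g. gamma_n = c/(n + n0))."""
        rng = np.random.default_rng(1)
        theta = np.asarray(theta0, dtype=float)
        for n in range(n_iter):
            X = sample_X(rng)
            theta = theta - gamma(n + 1) * H(theta, X)
        return theta

    # Toy target: {E H(., X) = 0} = {theta = E X}.
    theta_star = robbins_monro(H=lambda t, x: t - x,
                               sample_X=lambda rng: rng.normal(2.0, 1.0),
                               theta0=0.0, n_iter=100_000)
    print(theta_star)  # ~ 2.0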

(6)

Why?

Key 1: set $\mathcal{F}_n = \sigma(\theta_0, X_1, \ldots, X_n)$. Then

$$\mathbb{E}\big( H(\theta_n, X_{n+1}) \,|\, \mathcal{F}_n \big) = \big( \mathbb{E}\, H(\theta, X) \big)_{|\theta = \theta_n} = \rho(\theta_n)\nabla V(\theta_n)$$

so that

$$\theta_{n+1} = \underbrace{\theta_n - \gamma_{n+1}\,\rho(\theta_n)\nabla V(\theta_n)}_{\text{Newton-Raphson-like algo.}} - \gamma_{n+1} \underbrace{\big( H(\theta_n, X_{n+1}) - \mathbb{E}( H(\theta_n, X_{n+1}) \,|\, \mathcal{F}_n ) \big)}_{\text{martingale disturbance term}}.$$

Key 2: if $\sup_n \mathbb{E}(|H(\theta_n, X_{n+1})|^2) < +\infty$ and $\sum_{n\ge 1} \gamma_n^2 < +\infty$, then

$$\mathbb{E}\, M_n^2 \le \sum_{n\ge 1} \gamma_n^2\, \mathbb{E}(|H(\theta_n, X_{n+1})|^2) < +\infty \quad \text{so that} \quad M_n \longrightarrow M_\infty \in L^2(\mathbb{P}).$$

The disturbance term fades faster than $\gamma_n$.

(7)

Stochastic Approximation =

Deterministic zero search procedures + Monte Carlo (simulation)

(8)

I. Recursive variance reduction: the Arouna-Lapeyre (2003) algorithm revisited (Lemaire-P., 2008)

Importance sampling by mean translation:

$$X \sim \mathcal{N}(0; I_d), \qquad \mathbb{E}\, F(X) = \int_{\mathbb{R}^d} F(x)\, e^{-\frac{|x|^2}{2}}\, \frac{dx}{(2\pi)^{d/2}}, \qquad p_\theta(x) = p(x-\theta) \ \text{with} \ p(x) = \frac{e^{-\frac{|x|^2}{2}}}{(2\pi)^{d/2}}.$$

Cameron-Martin formula:

$$\mathbb{E}\, F(X) = \mathbb{E}\big( F(X+\theta)\, e^{-(\theta|X) - \frac{|\theta|^2}{2}} \big). \tag{1}$$

The $\theta$ with the lowest variance is the solution to

$$\min_\theta V(\theta) \quad \text{with} \quad V(\theta) = \mathbb{E}\big[ F(X+\theta)^2\, e^{-2(\theta|X)}\, e^{-|\theta|^2} \big].$$

If $F$ is smooth: large deviation approach (see Glasserman et al. 1999).

(9)

Comparison with GHS (Glasserman et al.):

$$\min_{\theta\in\mathbb{R}^d} \mathbb{E}\big[ F(X)^2\, e^{-(\theta|X) + \frac12|\theta|^2}\, \mathbf{1}_D(X) \big] \quad \text{where } D := \{F > 0\}.$$

$$\mathbb{E}\big[ F(X)^2\, e^{-(\theta|X) + \frac12|\theta|^2}\, \mathbf{1}_D(X) \big] = (2\pi)^{-d/2} \int_D e^{2\log F(x) - (\theta|x) + \frac12|\theta|^2 - \frac12|x|^2}\, dx \approx C \max_{x\in D} \exp\Big( 2\log F(x) - (\theta|x) + \frac12|\theta|^2 - \frac12|x|^2 \Big)$$

so that the above minimization problem amounts to

$$\min_{\theta\in\mathbb{R}^d} \max_{x\in D} \Big( 2\log F(x) - (\theta|x) + \frac12|\theta|^2 - \frac12|x|^2 \Big).$$

(10)

Following Arouna: second change of variable,

$$V(\theta) = e^{\frac{|\theta|^2}{2}}\ \mathbb{E}\big[ F^2(X)\, e^{-(\theta|X)} \big]$$

so that

$$\nabla V(\theta) = e^{\frac{|\theta|^2}{2}}\ \mathbb{E}\big[ F^2(X)\, e^{-(\theta|X)}\, (\theta - X) \big],$$

which yields

$$\theta_{n+1} = \theta_n - \gamma_{n+1} \underbrace{e^{\frac{|\theta_n|^2}{2}}\, F^2(X_{n+1})\, e^{-(\theta_n|X_{n+1})}\, (\theta_n - X_{n+1})}_{= H(\theta_n, X_{n+1})}.$$

Unfortunately:

$$\liminf_{|\theta|\to\infty} \frac{\|H(\theta, X)\|_2}{|\theta|} = +\infty \ldots$$

Consequence: it does explode in general.

(11)

Remedy: Arouna suggests a constrained version of the algorithm with a slow relaxation of the constraint (i.e. a slowly increasing sequence of compact sets):

an algorithm with repeated projections "à la Chen",

which is the mathematical formalisation of repeated trials.

A.s. convergence holds, with a CLT (Lelong 2007) once stabilization has occurred.

In practice: the choice of the compact sets needs much care, in connection with the step sequence.

(12)

New approach (Lemaire-P. (2008)): third change of variable, to plug $\theta$ back into the payoff $F$!

$$\nabla V(\theta) = e^{\frac{|\theta|^2}{2}}\ \mathbb{E}\big[ F^2(X)\, e^{-(\theta|X)}\, (\theta - X) \big] = e^{|\theta|^2}\ \mathbb{E}\big[ F^2(X-\theta)\,(2\theta - X) \big].$$

Growth control of $F$ at infinity (sub-multiplicativity):

$$|F(x+y)| \le (a + b|F(x)|)(a + b|F(y)|) + c.$$

$$\theta_{n+1} = \theta_n - \gamma_{n+1} \underbrace{\frac{F(X_{n+1}-\theta_n)^2}{1 + F(-\theta_n)^2}\, (2\theta_n - X_{n+1})}_{=: \widetilde{H}(\theta_n, X_{n+1})}.$$

The function $\widetilde{H}$ satisfies the linear growth assumption in $L^2(\mathbb{P})$.

Hence a.s. convergence to an optimal $\theta^* \in \mathrm{Argmin}\, V$, with CLT, LIL, etc.
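A sketch of this unconstrained recursion in the Gaussian mean-translation case (Python/NumPy; the payoff, step sequence and iteration count are illustrative assumptions, not prescribed by the slides):

    import numpy as np

    def adaptive_theta(F, d, n_iter=100_000, seed=0):
        """Unconstrained recursive importance sampling, X ~ N(0, I_d):
        theta_{n+1} = theta_n - gamma * F(X-theta)^2/(1+F(-theta)^2) * (2 theta - X)."""
        rng = np.random.default_rng(seed)
        theta = np.zeros(d)
        for n in range(n_iter):
            X = rng.standard_normal(d)
            gamma = 1.0 / (n + 1000)   # sum = inf, sum of squares < inf
            H = F(X - theta)**2 / (1.0 + F(-theta)**2) * (2.0 * theta - X)
            theta -= gamma * H
        return theta

    # Illustrative payoff (d = 1): F(x) = (e^x - K)_+.
    K, d = 2.0, 1
    F = lambda x: np.maximum(np.exp(np.sum(np.atleast_1d(x))) - K, 0.0)
    theta_star = adaptive_theta(F, d)
    print("theta* ~", theta_star)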

(13)

II. Extension: mean translation for log-concave p.d.f.

Still $p_\theta(x) = p(x-\theta)$, now with $p$ log-concave and $\lim_{|x|\to\infty} p(x) = 0$.

We make the following assumption on the probability density $p$: $\exists\, a \in [1,2]$ such that

$$\text{(i)}\ \Big| \frac{\nabla p}{p} \Big|(x) = O(|x|^{a-1}) \ \text{as } |x|\to\infty, \qquad \text{(ii)}\ \exists\, \delta > 0, \ \log p(x) + \delta|x|^a \ \text{is convex}.$$

The same approach, based on three translations of means, with $\mathbb{E}\big( |X|^{2(a-1)} F^2(X) \big) < +\infty$ and $F$ sub-multiplicative:

$$H(\theta, x) := \underbrace{\frac{F^2(x-\theta)}{1 + F^2(-\theta)}}_{\text{self-control}}\ \underbrace{e^{-\delta|\theta|^a}\, \frac{p^2(x-\theta)}{p(x)\, p(x-2\theta)}}_{\le\, C}\ \frac{\nabla p(x-2\theta)}{p(x-2\theta)},$$

(14)

satisfies

$$\mathbb{E}\, H(\theta, X) = \rho(\theta)\nabla V(\theta) \quad \text{and} \quad \|H(\theta, X)\|_2 \le C(1 + |\theta|),$$

so that the extended R-M theorem applies.

(15)

III. Esscher transform

Set $\psi(\theta) := \log \mathbb{E}\big( e^{(\theta|X)} \big) < +\infty$, $\theta \in \mathbb{R}^d$, and

$$\forall\, \theta \in \mathbb{R}^d, \quad p_\theta(x) = e^{(\theta|x) - \psi(\theta)}\, p(x), \quad x \in \mathbb{R}^d.$$

Assume that both $X = X^{(0)}$ and $X^{(\theta)} = g(\theta, U)$, $U \sim \mathcal{U}([0,1])$, can be simulated at the same "reasonable" cost.

$$V(\theta) = \mathbb{E}\big[ F^2(X)\, e^{-(\theta|X) + \psi(\theta)} \big].$$

Suppose that the function $\psi$ satisfies

(i) $\lim_{|\theta|\to\infty} \frac{\psi(\theta)}{|\theta|} = +\infty$ or $\lim_{|\theta|\to\infty} \psi(\theta) - 2\psi(\theta/2) = +\infty$;

(ii) $\exists\, \delta > 0$, $\theta \mapsto \psi(\theta) - \delta|\theta|^2$ is concave.

Assume the payoff $F$ satisfies

$$\forall\, \theta \in \mathbb{R}^d, \quad \mathbb{E}\big[ |X|\, F^2(X)\, e^{-(\theta|X)} \big] < +\infty.$$

(16)

Then

$$\nabla V(\theta) = \mathbb{E}\Big[ (\nabla\psi(\theta) - X)\, F^2(X)\, e^{-(\theta|X) + \psi(\theta)} \Big] = \mathbb{E}\Big[ \big( \nabla\psi(\theta) - X^{(-\theta)} \big)\, F^2\big( X^{(-\theta)} \big) \Big]\, e^{\psi(\theta) + \psi(-\theta)}.$$

Then the recursive procedure

$$\theta_{n+1} = \theta_n - \gamma_{n+1}\, H\big( \theta_n, \underbrace{g(-\theta_n, U_{n+1})}_{=: \widetilde{X}_{n+1}} \big)$$

where

– $(U_n)_{n\ge 1}$ is an i.i.d. sequence of $\mathcal{U}([0,1])$-distributed r.v., so that $\widetilde{X}_{n+1} \sim \mathcal{L}(X^{(-\theta_n)})$,

– $H(\theta, x) := e^{-\frac{\lambda}{2d}|\nabla\psi(\theta)|}\, F^2(x)\, \big( \nabla\psi(\theta) - x \big),$

satisfies

$$\theta_n \xrightarrow{a.s.} \theta^*, \qquad \theta^* \ \text{a} \ \{\nabla V = 0\}\text{-valued r.v.}$$

(17)

IV. Functional setting: path-dependent diffusions

We consider a $d$-dimensional Itô process $X = (X_t)_{t\in[0,T]}$ solution to the S.D.E.

$$(E_{b,\sigma,W}) \equiv dX_t = b(t, X^t)\,dt + \sigma(t, X^t)\,dW_t, \qquad X_0 = x \in \mathbb{R}^d, \tag{2}$$

where $X^t$ denotes the process stopped at $t$.

– Includes diffusions and their Euler scheme.

$$(H_{b,\sigma}) \equiv \begin{cases} \text{(i)} & b \ \text{and} \ \sigma \ \text{continuous on} \ [0,T]\times C([0,T],\mathbb{R}^d), \\ \text{(ii)} & \forall\, t\in[0,T], \ \forall\, x, y \in C([0,T],\mathbb{R}^d), \quad |b(t,y)-b(t,x)| + \|\sigma(t,y)-\sigma(t,x)\| \le C_{b,\sigma}\,\|x-y\|_{\sup}, \end{cases}$$

ensures existence and uniqueness of a strong solution.

Aim: compute by Monte Carlo simulation $\mathbb{E}\, F(X)$, with

$$F : C([0,T],\mathbb{R}^d) \to \mathbb{R}, \qquad F(X) \in L^1(\mathbb{P}) \ \text{and} \ \mathbb{P}(F(X) \neq 0) > 0.$$

(18)

An extension: replace $\theta \in \mathbb{R}^d$ by

$$\underbrace{\varphi(X^t)}_{\text{driver}} \times\, \theta(t)$$

with $\varphi : C([0,T],\mathbb{R}^d) \to \mathbb{R}^q$ bounded and $\theta \in L^2_T := L^2([0,T], dt)$.

For this talk, $q = 1$ and

$$\forall\, \xi \in C([0,T],\mathbb{R}^d), \quad \varphi(\xi) \equiv 1.$$

Tool: triple application of Girsanov's theorem.

(19)

• Representations of $\mathbb{E}\, F(X)$ [Girsanov 1]:

$$\mathbb{E}\, F(X) = e^{-\frac12 \|\theta\|^2_{L^2_T}}\ \mathbb{E}\Big[ F(X^{(\theta)})\, e^{-\int_0^T \theta(s)\,dW_s} \Big],$$

where

$$dX^{(\theta)}_t = \big( b(t, X^{(\theta),t}) + \theta(t)\,\sigma(t, X^{(\theta),t}) \big)\,dt + \sigma(t, X^{(\theta),t})\,dW_t, \qquad X_0 = x \in \mathbb{R}^d.$$

• Variance minimization [Girsanov 2]:

$$\min_{\theta\in L^2_T} V(\theta) \quad \text{or} \quad \min_{\theta\in E} V(\theta), \qquad E = \mathrm{span}(e_1,\ldots,e_m) \subset L^2_T,$$

where

$$V(\theta) := e^{-\|\theta\|^2_{L^2_T}}\ \mathbb{E}\Big[ F^2(X^{(\theta)})\, e^{-2\int_0^T \theta(s)\,dW_s} \Big] = e^{\frac12\|\theta\|^2_{L^2_T}}\ \mathbb{E}\Big[ F^2(X)\, e^{-\int_0^T \theta(s)\,dW_s} \Big].$$

(20)

• Gradient representation [Girsanov 3]: under $(H_{b,\sigma})$ and $\mathbb{E}(F(X)^{2+\delta}) < +\infty$, $V$ is log-convex and $\lim_{\|\theta\|_{L^2_T}\to+\infty} V(\theta) = +\infty$ (Fatou), hence $\mathrm{Argmin}_{\theta\in L^2_T} V \neq \emptyset$ and $\mathrm{Argmin}_{\theta\in E} V \neq \emptyset$ for every closed (finite-dimensional) subspace $E$ of $L^2_T$.

$V$ is differentiable at every $\theta \in L^2_T$ with gradient $\nabla V(\theta) \in L^2_T$: $\forall\, \psi \in L^2_T$,

$$(\nabla V(\theta)\,|\,\psi)_{L^2_T} = e^{\frac12\|\theta\|^2_{L^2_T}}\ \mathbb{E}\Bigg[ F^2(X)\, e^{-\int_0^T \theta(s)\,dW_s} \Bigg( (\theta|\psi)_{L^2_T} - \int_0^T \psi(s)\,dW_s \Bigg) \Bigg]$$

$$= e^{\|\theta\|^2_{L^2_T}}\ \mathbb{E}\Bigg[ F^2\big(X^{(-\theta)}\big) \Bigg( 2(\theta|\psi)_{L^2_T} - \int_0^T \psi(s)\,dW_s \Bigg) \Bigg]. \tag{3}$$

(21)

• "Sub-linearity correction" in (3): follows from an a priori (strong) control of $X - X^{(\theta)}$ (both living on the same probability space) in an appropriate norm. Let $p \ge 1$. For every $\theta \in L^2_T$, $X$ and $X^{(\theta)}$ are strong solutions of $E_{b,\sigma,W}$ and $E_{b+\sigma\theta,\sigma,W}$, and

$$\Big\| \sup_{t\in[0,T]} |X_t - X^{(\theta)}_t| \Big\|_p \le C_{b,\sigma}\, e^{C_{b,\sigma}T}\, \Big\| \int_0^T |\sigma(s, X^{(\theta),s})\,\theta(s)|\,ds \Big\|_p \le C_\sigma\, \|\theta\|_{L^1_T}\, \big( 1 + \big\| \|X\|_{\sup} \big\|_{p(1+\delta)} \big)\, e^{\frac12 p\delta \|\theta\|^2_{L^2_T}}.$$

(22)

If the functional $F$ satisfies

$$\forall\, x \in C([0,T],\mathbb{R}^d), \quad |F(x)| \le C_F\,(1 + \|x\|^r_{\sup}),$$

let $E = \mathrm{span}(e_1, \ldots, e_m) \subset L^2_T$ (to be specified). The algorithm defined by

$$\theta_{n+1} = \theta_n - \gamma_{n+1}\, H_r\big( \theta_n, X^{(-\theta_n)}, W^{(n+1)} \big)$$

with $(W^{(n)})_{n\ge 1}$ i.i.d. Brownian motions, $X^{(-\theta_n)} = G(-\theta_n, W^{(n+1)})$ solution to $(E_{b-\sigma\theta_n,\sigma}, W^{(n+1)})$, and, $\forall\, i \in \{1, \ldots, m\}$,

$$H_{r,i}(\theta, x, W) := \big( H_r(\theta, x, W)\,|\,e_i \big)_{L^2_T} = \frac{F^2(x)}{1 + \|\theta\|^{2r}_{L^2_T}} \Bigg( 2(\theta|e_i)_{L^2_T} - \int_0^T e_i(s)\,dW_s \Bigg),$$

satisfies

$$\theta_n \xrightarrow{a.s.} \theta^*_E \in \mathrm{Argmin}_E V.$$

(23)

Practical implementation

Phase 1: compute $\theta^* \approx \theta_{M_{RM}}$ using $M_{RM}$ iterations of the recursive procedure.

Phase 2: compute $\mathbb{E}\, F(X)$ via a Monte Carlo simulation of size $M_{MC}$:

$$\mathbb{E}\, F(X) \approx \frac{1}{M_{MC}} \sum_{m=1}^{M_{MC}} e^{-\frac12\|\theta^*\|^2_{L^2_T}}\, F\big( X^{(\theta^*),(m)} \big)\, e^{-\int_0^T \theta^*(s)\,dW^{(m)}_s},$$

with $W^{(m)}$ independent Brownian motions.

$M_{RM} \ll M_{MC}$ (e.g. $M_{RM} \approx M_{MC}/10$).

Alternative: adaptive coupling (for mean translation only).
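In the finite-dimensional Gaussian case of Part I, the two phases can be sketched as follows (Python/NumPy; adaptive_theta is the phase-1 sketch given after slide (12), and the Cameron-Martin identity (1) supplies the phase-2 weights; sizes are illustrative):

    import numpy as np

    def two_phase_estimate(F, d, M_RM=20_000, M_MC=200_000, seed=1):
        """Phase 1: adaptive search of theta*. Phase 2: importance-sampled
        Monte Carlo via E F(X) = E[F(X+theta) exp(-(theta|X) - |theta|^2/2)]."""
        theta = adaptive_theta(F, d, n_iter=M_RM, seed=seed)   # phase 1
        rng = np.random.default_rng(seed + 1)
        X = rng.standard_normal((M_MC, d))
        weights = np.exp(-X @ theta - 0.5 * theta @ theta)
        values = np.array([F(x + theta) for x in X]) * weights
        return values.mean(), 1.96 * values.std(ddof=1) / np.sqrt(M_MC)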

(24)

Numerical experiments (I): NIG Call option. $X \stackrel{d}{=} NIG(\alpha, \beta, \delta, \mu)$, i.e.

$$\mathbb{P}_X(dx) = \frac{\alpha\delta\, K_1\big( \alpha\sqrt{\delta^2 + (x-\mu)^2} \big)}{\pi\sqrt{\delta^2 + (x-\mu)^2}}\ e^{\delta\sqrt{\alpha^2 - \beta^2} + \beta(x-\mu)}\, dx,$$

with $K_1$ the modified Bessel function of the second kind.

$$F(x) = 50\,(e^x - K)_+, \qquad \alpha = 2, \ \beta = 0.2, \ \delta = 0.8, \ \mu = 0.04.$$

$M_{RM} = 100\,000$, $M_{MC} = 1\,000\,000$.

Translation

    for n = 0 to M do
        X ~ NIG(alpha, beta, gamma, delta)
        theta = theta - 1/(n + 1000) * H1(theta, X)
    for n = 0 to N do
        X ~ NIG(alpha, beta, gamma, delta)
        mean = mean + F(X + theta) * p(X + theta) / p(X)

Esscher transform

    for n = 0 to M do
        X ~ NIG(alpha, beta - theta, gamma, delta)
        theta = theta - 1/(n + 1000) * H2(theta, X)
    for n = 0 to N do
        X ~ NIG(alpha, beta + theta, gamma, delta)
        mean = mean + F(X) * exp(-theta * X)
    mean = mean * exp(psi(theta))

(25)

    K    mean   crude var   var. ratio translation (theta*)   var. ratio Esscher (theta*)
    0.6  42.19  8538        5.885  (0.791)                    56.484 (1.322)
    0.8  34.19  8388        7.525  (0.903)                    39.797 (1.309)
    1.0  27.66  8176        9.218  (0.982)                    32.183 (1.294)
    1.2  22.60  7930        10.068 (1.017)                    29.232 (1.280)
    1.4  18.76  7677        9.956  (1.026)                    28.496 (1.268)

Table 1: Variance reduction for different strikes.

(26)

[Figure 1: Densities of $X$ (crude), $X + \theta^*$ (translation) and $X^{(\theta^*)}$ (Esscher) in the case $K = 1$.]

(27)

Numerical experiments (II): Simplified Spark Spread

We now consider an exchange option between gas and electricity (called a spark spread). A simplified form of the payoff is

$$F(X) = 50\,\big( e^{X_{elec}} - c\, e^{X_{gas}} - K \big)_+,$$

where $X_{elec} \sim NIG(2, 0.2, 0.8, 0.04)$ and $X_{gas} \sim NIG(1.4, 0.2, 0.2, 0.04)$ are independent.

• 300 000 iterations of Robbins-Monro

• 3 000 000 iterations of Monte Carlo

(28)

Simplified Spark Spread option - results

    K    c    mean    crude var   var. ratio translation   var. ratio Esscher
    0.6  0.2  33.235  8378.4      5.2609                    27.455
    0.6  0.4  26.534  8133.3      5.0604                    28.669
    0.6  0.6  21.587  7862.7      4.8046                    30.649
    0.6  0.8  17.931  7595.2      4.5839                    33.656
    0.6  1    15.184  7344.2      4.4064                    37.489
    0.8  0.2  26.908  8160.1      5.1366                    28.876
    0.8  0.4  21.725  7884.9      4.844                     31.018
    0.8  0.6  17.955  7612.5      4.6031                    34.166
    0.8  0.8  15.156  7357.3      4.416                     38.167
    0.8  1    13.027  7123.9      4.2685                    42.781

(29)

Numerical experiments (III): Barrier options and local volatility models

Pseudo-CEV model ($\alpha \in (0,1]$):

$$dX_t = r X_t\,dt + \sigma X_t^\alpha\, \frac{X_t}{\sqrt{1 + X_t^2}}\, dW_t, \qquad X_0 = x > 0.$$

Down-and-in Call:

$$F(X) = (X_T - K)_+\ \mathbf{1}_{\{ \min_{0\le t\le T} X_t \le L \}}.$$

Market parameters: $X_0 = 100$, $r = 0.04$, pseudo-volatility $\sigma = 7$, $\alpha = 0.5$.

Contract parameters: $K = 115$, $L = 65$, $T = 1$.

Time discretization: continuous Euler scheme (Brownian bridge) with step $T/n$, $n = 100$.

$M_{RM} = 50\,000$, $M_{MC} = 500\,000$.

(30)

Down & In Call option - Brownian interpolation

• $(\bar{X}_{t_k})$ Euler scheme with step $t_k = \frac{kT}{n}$, $n = 100$.

• Brownian bridge interpolation and pre-conditioning:

$$\mathbb{E}\, F(\bar{X}) = \mathbb{E}\Big[ \mathbb{E}\big( F(\bar{X}) \,|\, \bar{X}_{t_1}, \cdots, \bar{X}_{t_n} \big) \Big] = \mathbb{E}\Bigg[ (\bar{X}_T - K)_+ \Bigg( 1 - \prod_{k=0}^{n-1} p\big( \bar{X}_{t_k}, \bar{X}_{t_{k+1}} \big) \Bigg) \Bigg],$$

with

$$p(x_k, x_{k+1}) = 1 - \mathbb{P}\Big( \min_{t\in[0,t_1]} W_t \le \frac{L - x_k}{\sigma(x_k)} \ \Big|\ W_{t_1} = \frac{x_{k+1} - x_k}{\sigma(x_k)} \Big) = \begin{cases} 0 & \text{if } L \ge \min(x_k, x_{k+1}), \\ 1 - e^{-\frac{2(L - x_k)(L - x_{k+1})}{\sigma^2(x_k)(t_{k+1} - t_k)}} & \text{otherwise}. \end{cases}$$
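A sketch of this pre-conditioned payoff along one simulated Euler path (Python/NumPy; the local volatility function sigma_loc is a placeholder for the pseudo-CEV coefficient above):

    import numpy as np

    def survival_prob(x0, x1, L, sig, dt):
        """Brownian-bridge probability that the interpolated path stays above L
        between two Euler nodes x0, x1 (0 if a node is already at or below L)."""
        if min(x0, x1) <= L:
            return 0.0
        return 1.0 - np.exp(-2.0 * (L - x0) * (L - x1) / (sig**2 * dt))

    def down_and_in_payoff(path, L, K, sigma_loc, dt):
        """(X_T - K)_+ * (1 - prod_k p(X_k, X_{k+1})): conditional probability
        that the continuous path crosses the barrier L, given the Euler nodes."""
        prod = 1.0
        for x0, x1 in zip(path[:-1], path[1:]):
            prod *= survival_prob(x0, x1, L, sigma_loc(x0), dt)
        return max(path[-1] - K, 0.0) * (1.0 - prod)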

(31)

Basis of $L^2([0,1], \mathbb{R})$

• a polynomial (shifted Legendre) basis: $\forall\, n \ge 0$, $\forall\, t \in [0,1]$,

$$e_n(t) = P_n(2t - 1) \quad \text{where} \quad P_n(t) = \frac{1}{2^n n!}\, \frac{d^n}{dt^n}\,(t^2 - 1)^n; \tag{ShLeg}$$

• the Karhunen-Loève basis: $\forall\, n \ge 0$, $\forall\, t \in [0,1]$,

$$e_n(t) = \sqrt{2}\, \sin\Big( \big( n + \tfrac12 \big)\pi t \Big); \tag{KL}$$

• the Haar basis: $\forall\, n \ge 0$, $\forall\, k = 0, \ldots, 2^n - 1$, $\forall\, t \in [0,1]$,

$$\psi_{n,k}(t) = 2^{n/2}\, \psi(2^n t - k) \tag{Haar}$$

where

$$\psi(t) = \begin{cases} 1 & \text{if } t \in [0, \tfrac12), \\ -1 & \text{if } t \in [\tfrac12, 1), \\ 0 & \text{otherwise}. \end{cases}$$
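The three families are straightforward to evaluate numerically; a sketch (Python/NumPy, using numpy.polynomial.legendre for the shifted Legendre polynomials):

    import numpy as np
    from numpy.polynomial import legendre

    def shifted_legendre(n, t):
        """e_n(t) = P_n(2t - 1) on [0, 1]."""
        coeffs = np.zeros(n + 1); coeffs[n] = 1.0
        return legendre.legval(2.0 * np.asarray(t) - 1.0, coeffs)

    def karhunen_loeve(n, t):
        """e_n(t) = sqrt(2) sin((n + 1/2) pi t)."""
        return np.sqrt(2.0) * np.sin((n + 0.5) * np.pi * np.asarray(t))

    def haar(n, k, t):
        """psi_{n,k}(t) = 2^{n/2} psi(2^n t - k), k = 0, ..., 2^n - 1."""
        u = 2.0**n * np.asarray(t) - k
        return 2.0**(n / 2) * (np.where((0 <= u) & (u < 0.5), 1.0, 0.0)
                               - np.where((0.5 <= u) & (u < 1.0), 1.0, 0.0))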

(32)

Results for trivial driver $\varphi \equiv 1$

    Basis             Dim.   Mean     CI 95%     Variance ratio
    Constant          1      3.1836   ±0.0251    2.6297
    Shifted Legendre  2      3.1830   ±0.0223    3.3258
                      4      3.1815   ±0.0215    3.5670
                      8      3.1813   ±0.0215    3.5659
    Karhunen-Loève    2      3.1852   ±0.0187    4.7254
                      4      3.1862   ±0.0183    4.9385
                      8      3.1918   ±0.0178    5.2183
    Haar              2      3.1834   ±0.0215    3.5699
                      4      3.1871   ±0.0186    4.7896
                      8      3.1864   ±0.0177    5.2675

Table 2: Variance ratios obtained for different bases in the local volatility model (K = 115, L = 65, variance of the crude Monte Carlo: 206.52).

(33)

Representations of the optimal variance reducer

[Figure: $\theta^*$ represented on the constant (cst), polynomial (poly), Haar and KL bases, with 2 basis functions.]

(34)

[Figure: same representation with 4 basis functions.]

(35)

[Figure: same representation with 8 basis functions.]

(36)

With a non-trivial driver:

$$\varphi(t, \bar{X}^t) = \big( \bar{p}_k, \ 1 - \bar{p}_k \big), \quad \text{with} \quad \bar{p}_k = \mathbb{P}\Big( \min_{0\le s\le kT/n} \bar{X}_s \le L \ \Big|\ \bar{X}_0, \ldots, \bar{X}_{kT/n} \Big),$$

and $E = (\mathbb{R}\,\mathbf{1}_{[0,T]})^2$, so that the variance reducer reads

$$\alpha\, \bar{p}_k + \beta\,(1 - \bar{p}_k).$$

(37)

Results with non-trivial driver $\varphi$

    K    L    Mean     CI 95%     Var. Ratio (Crude)   alpha     beta
    85   65   3.1827   ±0.0127    10.02 (20.28)        -0.3057   1.5522
    85   75   6.4115   ±0.0190    9.96  (45.03)        -0.1428   1.7985
    95   65   3.1846   ±0.0124    10.65 (19.08)        -0.1141   1.9139
    95   75   6.4117   ±0.0199    9.07  (49.42)        -0.0029   1.9814
    95   85   11.4478  ±0.0293    8.03  (106.99)       0.1898    1.8937
    105  65   3.1835   ±0.0135    8.98  (22.65)        0.1487    1.9628
    105  75   6.4120   ±0.0209    8.21  (54.59)        0.1493    2.0060
    105  85   11.4458  ±0.0295    7.88  (108.94)       0.2503    1.8737
    105  95   18.6060  ±0.0345    9.83  (149.07)       0.5594    1.4343
    115  65   3.1817   ±0.0148    7.38  (27.54)        0.3062    1.6884
    115  75   6.4112   ±0.0209    8.18  (54.79)        0.1928    1.8119
    115  85   11.4470  ±0.0289    8.24  (104.16)       0.2599    1.7430
    115  95   18.6061  ±0.0346    9.79  (149.76)       0.5755    1.4313

Table 3: Variance reduction for different strikes and barrier levels in the local volatility model.

(38)

Results with non-trivial driver $\varphi$ in the Black-Scholes model

    K    L    Mean     CI 95%     Var. Ratio (Crude)   alpha     beta
    85   65   2.5738   ±0.0115    13.49 (16.56)        -0.1752   1.6685
    85   75   6.0489   ±0.0186    14.26 (43.39)        0.0493    1.9191
    95   65   2.5704   ±0.0110    14.64 (15.26)        0.0524    1.9987
    95   75   6.0492   ±0.0190    13.67 (45.25)        0.1557    2.0560
    95   85   11.5970  ±0.0301    12.23 (112.92)       0.4108    2.1226
    105  65   2.5687   ±0.0122    12.03 (18.56)        0.3888    2.1423
    105  75   6.0548   ±0.0206    11.66 (53.08)        0.3895    2.1720
    105  85   11.5953  ±0.0308    11.67 (118.32)       0.4524    2.1608
    105  95   19.2882  ±0.0348    17.17 (151.04)       0.6619    1.7910
    115  65   2.5706   ±0.0135    9.75  (22.90)        0.5473    1.8903
    115  75   6.0530   ±0.0211    11.16 (55.42)        0.4591    1.9371
    115  85   11.5976  ±0.0297    12.55 (109.98)       0.4807    2.0008
    115  95   19.2958  ±0.0347    17.21 (150.67)       0.7217    1.6380

Table 4: Variance reduction for different strikes and barrier levels in the Black & Scholes model (r = 0.04, σ = 0.7).

(39)

Computing VaR and CVaR by Monte Carlo simulation

O. Bardou & N. Frikha & G. Pagès

LPMA, Univ. Paris 6

(40)

1 Definitions

Let $X : (\Omega, \mathcal{A}, \mathbb{P}) \to \mathbb{R}^d$ be a random vector (structure variable) and let $\varphi : \mathbb{R}^d \to \mathbb{R}$ be a Borel function representing a loss.

Definition (Value-at-Risk). Let $\alpha \in (0,1)$ be the confidence level.

$$\mathrm{VaR}_\alpha(\varphi(X)) := \inf\{ \xi \ |\ \mathbb{P}(\varphi(X) \le \xi) \ge \alpha \},$$

i.e. the lowest $\alpha$-quantile of $\varphi(X)$.

If $\varphi(X)$ has a continuous distribution (no atom), then $\mathbb{P}\big( \varphi(X) \le \mathrm{VaR}_\alpha(\varphi(X)) \big) = \alpha$, and if the distribution of $\varphi(X)$ has no "hole" then $\mathrm{VaR}_\alpha(\varphi(X))$ is unique.

(41)

Definitions. (a) Conditional Value-at-Risk (CVaR) (at level $\alpha$). As soon as $\varphi(X) \in L^1(\mathbb{P})$, the Conditional Value-at-Risk is defined by

$$\mathrm{CVaR}_\alpha(\varphi(X)) := \mathbb{E}\big[ \varphi(X) \,|\, \varphi(X) \ge \mathrm{VaR}_\alpha(\varphi(X)) \big].$$

It represents the mean loss once in the "stress" zone.

(b) $\Psi$-Conditional Value-at-Risk ($\Psi$-CVaR). If $\Psi(\varphi(X)) \in L^1(\mathbb{P})$,

$$\Psi\text{-}\mathrm{CVaR}_\alpha(\varphi(X)) := \mathbb{E}\big[ \Psi(\varphi(X)) \,|\, \varphi(X) \ge \mathrm{VaR}_\alpha(\varphi(X)) \big].$$

It provides more insight into the distribution of $\varphi(X)$ in the "stress" zone.

(42)

2 Rockafellar and Uryasev representation formula

Proposition. Let $V$ and $V_\Psi$ be the functions defined by

$$V(\xi) = \mathbb{E}[v(\xi, X)] \quad \text{and} \quad V_\Psi(\xi) = \mathbb{E}[w_\Psi(\xi, X)] \tag{4}$$

where

$$v(\xi, x) := \xi + \frac{1}{1-\alpha}\,(\varphi(x) - \xi)_+ \tag{5}$$

and

$$w_\Psi(\xi, x) := \Psi(\xi) + \frac{1}{1-\alpha}\,\big( \Psi(\varphi(x)) - \Psi(\xi) \big)\, \mathbf{1}_{\{\varphi(x) \ge \xi\}}. \tag{6}$$

Assume $\mathcal{L}(\varphi(X))$ is continuous. Then the function $V$ is convex and differentiable, and any point of the set

$$\arg\min V = \{ V' = 0 \} = \{ \xi \ |\ \mathbb{P}(\varphi(X) \le \xi) = \alpha \}$$

is a $\mathrm{VaR}_\alpha(\varphi(X))$.

(43)

Furthermore,

$$\mathrm{CVaR}_\alpha(\varphi(X)) = \min_{\xi\in\mathbb{R}} V(\xi) \quad \text{and, for every } \xi^*_\alpha \in \arg\min V, \quad \Psi\text{-}\mathrm{CVaR}_\alpha(\varphi(X)) = V_\Psi(\xi^*_\alpha).$$

Proof. The function $V$ is convex since the functions $\xi \mapsto (\varphi(x) - \xi)_+$ are convex for every $x \in \mathbb{R}^d$. $V$ is differentiable with derivative

$$V'(\xi) = 1 - \frac{1}{1-\alpha}\,\mathbb{P}(\varphi(X) > \xi)$$

and reaches its absolute minimum at any $\xi^*_\alpha$ satisfying $\mathbb{P}(\varphi(X) > \xi^*_\alpha) = 1 - \alpha$, i.e. $\mathbb{P}(\varphi(X) \le \xi^*_\alpha) = \alpha$. Moreover,

$$V(\xi^*_\alpha) = \xi^*_\alpha + \frac{\mathbb{E}[(\varphi(X) - \xi^*_\alpha)_+]}{\mathbb{P}(\varphi(X) > \xi^*_\alpha)} = \frac{\xi^*_\alpha\, \mathbb{E}[\mathbf{1}_{\{\varphi(X) > \xi^*_\alpha\}}] + \mathbb{E}[(\varphi(X) - \xi^*_\alpha)_+]}{\mathbb{P}(\varphi(X) > \xi^*_\alpha)} = \mathbb{E}\big[ \varphi(X) \,|\, \varphi(X) > \xi^*_\alpha \big].$$

Likewise $V_\Psi(\xi^*_\alpha) = \Psi\text{-}\mathrm{CVaR}_\alpha(\varphi(X))$.
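A quick numerical sanity check of this representation (Python/NumPy; the Gaussian loss and level α = 0.95 are illustrative assumptions): minimizing a Monte Carlo estimate of V over a grid recovers the empirical α-quantile, and the minimum value recovers the empirical CVaR.

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, sample = 0.95, rng.standard_normal(100_000)  # loss phi(X) = X ~ N(0,1)

    def V(xi):
        """Rockafellar-Uryasev objective V(xi) = xi + E[(phi(X) - xi)_+]/(1 - alpha)."""
        return xi + np.maximum(sample - xi, 0.0).mean() / (1.0 - alpha)

    grid = np.linspace(1.0, 2.5, 301)
    values = [V(xi) for xi in grid]
    xi_star = grid[int(np.argmin(values))]
    print("argmin V ~", xi_star, "vs quantile:", np.quantile(sample, alpha))
    print("min V    ~", min(values), "vs empirical CVaR:",
          sample[sample >= np.quantile(sample, alpha)].mean())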

(44)

3 Stochastic gradient and companion procedure

$\mathrm{VaR}_\alpha$ computation. The derivative $V'$ admits the representation

$$V'(\xi) = \mathbb{E}[H(\xi, X)] \quad \text{with} \quad H(\xi, x) := \frac{\partial v}{\partial \xi}(\xi, x) = 1 - \frac{1}{1-\alpha}\,\mathbf{1}_{\{\varphi(x) \ge \xi\}}.$$

Stochastic gradient defined by

$$\xi_{n+1} = \xi_n - \gamma_{n+1}\, H(\xi_n, X_{n+1}), \quad n \ge 0, \quad \xi_0 \in L^1(\mathbb{P}), \tag{7}$$

where

– $(X_n)_{n\ge 1}$ is an i.i.d. sequence of r.v. distributed as $X$, independent of $\xi_0$,

– $(\gamma_n)_{n\ge 1}$ is a step sequence (decreasing to 0) satisfying

$$(A1) \equiv \sum_{n\ge 1} \gamma_n = +\infty \quad \text{and} \quad \sum_{n\ge 1} \gamma_n^2 < +\infty.$$

(45)

$\Psi$-$\mathrm{CVaR}_\alpha$ computation.

Temporarily assume that $\xi_n \to \xi^*_\alpha = \mathrm{VaR}_\alpha(\varphi(X))$ $\mathbb{P}$-a.s.

– Naive idea: compute the function $V_\Psi$ at $\xi^*_\alpha = \mathrm{VaR}_\alpha(\varphi(X))$:

$$\Psi\text{-}\mathrm{CVaR}_\alpha = V_\Psi(\xi^*_\alpha) = \mathbb{E}[w_\Psi(\xi^*_\alpha, X)]$$

using a regular Monte Carlo simulation,

$$\frac{1}{n} \sum_{k=0}^{n-1} w_\Psi(\xi^*_\alpha, X_{k+1}), \qquad X_k \ \text{i.i.d. with distribution } \mathcal{L}(X).$$

– Alternative idea: an adaptive "companion" procedure of the quantile search algorithm: replace $\xi^*_\alpha$ by $\xi_k$ at step $k$, i.e.

$$C_n = \frac{1}{n} \sum_{k=0}^{n-1} w_\Psi(\xi_k, X_{k+1}), \quad n \ge 1, \quad C_0 = 0.$$

(46)

$(C_n)_{n\ge 0}$ is the sequence of empirical means of the non-i.i.d. sequence $(w_\Psi(\xi_k, X_{k+1}))_{k\ge 0}$ and can be written recursively:

$$C_{n+1} = C_n - \frac{1}{n+1}\,\big( C_n - w_\Psi(\xi_n, X_{n+1}) \big), \quad n \ge 0, \quad C_0 = 0.$$

– Why $\gamma_n$ instead of $\frac{1}{n}$? . . .

$$C_{n+1} = C_n - \gamma_{n+1}\,\big( C_n - w_\Psi(\xi_n, X_{n+1}) \big), \quad n \ge 0, \quad C_0 = 0.$$

The resulting algorithm reads (a sketch in code follows):

$$\begin{cases} \xi_{n+1} = \xi_n - \gamma_{n+1}\, H(\xi_n, X_{n+1}), & \xi_0 \in L^1(\mathbb{P}), \ n \ge 0, \\ C_{n+1} = C_n - \gamma_{n+1}\,\big( C_n - w_\Psi(\xi_n, X_{n+1}) \big), & C_0 = 0, \ n \ge 0. \end{cases}$$
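A minimal sketch of this coupled recursion (Python/NumPy, taking Ψ = id so that C_n targets the plain CVaR; the Gaussian loss and step constants are illustrative assumptions):

    import numpy as np

    def var_cvar(sample_loss, alpha, n_iter=500_000, seed=0):
        """Coupled Robbins-Monro: xi_n -> VaR_alpha, C_n -> CVaR_alpha (Psi = id)."""
        rng = np.random.default_rng(seed)
        xi, C = 0.0, 0.0
        for n in range(n_iter):
            x = sample_loss(rng)
            gamma = 1.0 / (n + 100)
            # w(xi_n, X_{n+1}) = xi + (x - xi)_+ / (1 - alpha), with the pre-update xi
            w = xi + max(x - xi, 0.0) / (1.0 - alpha)
            # H(xi, x) = 1 - 1_{x >= xi} / (1 - alpha)
            xi -= gamma * (1.0 - (x >= xi) / (1.0 - alpha))
            C -= gamma * (C - w)
        return xi, C

    # Illustrative loss phi(X) = X ~ N(0,1): VaR_0.95 ~ 1.6449, CVaR_0.95 ~ 2.0627.
    print(var_cvar(lambda rng: rng.standard_normal(), alpha=0.95))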

(47)

3.1 The a.s. convergence of the quantile search algorithm

A (slightly) more general result than in Part III. . .

Theorem (Extended Robbins-Monro Theorem). Let $H : \mathbb{R}^q \times \mathbb{R}^d \to \mathbb{R}^q$ be a Borel function and let $X$ be an $\mathbb{R}^d$-valued random vector such that $\mathbb{E}[|H(\xi, X)|] < \infty$ for every $\xi \in \mathbb{R}^q$. Then set

$$\forall\, \xi \in \mathbb{R}^q, \quad h(\xi) = \mathbb{E}[H(\xi, X)].$$

Suppose that the function $h$ is continuous and that $\mathcal{T}^* := \{h = 0\}$ satisfies

$$\forall\, \xi \in \mathbb{R}^q \setminus \mathcal{T}^*, \ \forall\, \xi^* \in \mathcal{T}^*, \quad \langle \xi - \xi^*, h(\xi) \rangle > 0. \tag{8}$$

Let $(\gamma_n)_{n\ge 1}$ be a decreasing step sequence satisfying

$$(A1) \equiv \sum_n \gamma_n = +\infty \quad \text{and} \quad \sum_n \gamma_n^2 < +\infty.$$

Suppose that

$$\forall\, \xi \in \mathbb{R}^q, \quad \|H(\xi, X)\|_2 \le C(1 + |\xi|) \tag{9}$$

(which implies that $|h(\xi)|^2 \le C(1 + |\xi|^2)$).

(48)

Let $(X_n)_{n\ge 1}$ be an i.i.d. sequence of r.v. with distribution $\mathcal{L}(X)$, let $\xi_0$ be a random vector independent of $(X_n)_{n\ge 1}$ satisfying $\mathbb{E}[|\xi_0|] < \infty$, all defined on the same probability space $(\Omega, \mathcal{A}, \mathbb{P})$. Let $(r_n)_{n\ge 1}$ be an $\mathcal{F}_n$-measurable remainder sequence satisfying

$$\sum_{n\ge 1} \gamma_n |r_n| < \infty \quad \text{a.s.}$$

Then the recursive procedure defined by

$$\xi_n = \xi_{n-1} - \gamma_n H(\xi_{n-1}, X_n) + \gamma_n r_n, \quad n \ge 1,$$

satisfies: there exists a r.v. $\xi^*$ such that $\xi_n \xrightarrow{a.s.} \xi^*$ and $\xi^* \in \mathcal{T}^*$ a.s. The convergence also holds in $L^p(\mathbb{P})$, $p \in (0,2)$.

Application to quantile search:

$$H(\xi, x) = 1 - \frac{1}{1-\alpha}\,\mathbf{1}_{\{\varphi(x) \ge \xi\}}, \quad r_n \equiv 0 \quad \Longrightarrow \quad \xi_n \xrightarrow{a.s.} \xi^*_\alpha, \ \text{a} \ \mathrm{VaR}_\alpha(\varphi(X)).$$

(49)

3.2 The a.s. convergence of the companion algorithm

Representation of $\gamma_n$:

$$\gamma_n = \gamma_0\,\frac{\Delta_n}{S_n}, \quad n \ge 0, \quad \text{with} \quad S_n = \sum_{k=0}^{n} \Delta_k, \qquad \gamma_0 := \sup_{n\ge 1} \gamma_n + 1. \tag{10}$$

Conversely,

$$\Delta_{n+1} = \Delta_n\, \frac{\gamma_{n+1}}{\gamma_n}\, \frac{\gamma_0}{\gamma_0 - \gamma_{n+1}}, \quad n \ge 0, \quad \Delta_0 = 1.$$

Elementary computations show that

$$\sum_n \gamma_n = +\infty \ \Longrightarrow\ \lim_{n\to+\infty} S_n = +\infty,$$

and

$$C_n = \frac{1}{S_n} \sum_{k=0}^{n-1} \Delta_{k+1}\, w_\Psi(\xi_k, X_{k+1}) = \frac{1}{S_n} \Bigg( \sum_{k=0}^{n-1} \Delta_{k+1}\, V_\Psi(\xi_k) + \sum_{k=0}^{n-1} \Delta_{k+1}\, \Delta N_{k+1} \Bigg)$$

(50)

where

$$\Delta N_{k+1} = w_\Psi(\xi_k, X_{k+1}) - \mathbb{E}\big( w_\Psi(\xi_k, X_{k+1}) \,|\, \mathcal{F}^X_k \big) = w_\Psi(\xi_k, X_{k+1}) - V_\Psi(\xi_k)$$

is an $\mathcal{F}^X_n$-martingale increment.

– The first sum on the right-hand side converges to $V_\Psi(\xi^*_\alpha) = \Psi\text{-}\mathrm{CVaR}_\alpha(\varphi(X))$ owing to the continuity of $V_\Psi$ at $\xi^*_\alpha$ and Cesàro's lemma.

– The convergence to 0 of the second sum follows from the a.s. convergence of the martingale

$$N^\gamma_n := \sum_{k=1}^{n} \gamma_k\, \Delta N_k, \quad n \ge 1,$$

and the Kronecker lemma, since $\gamma_n = \gamma_0 \Delta_n / S_n$.

(51)

The sequence $(N^\gamma_n)_{n\ge 1}$ is an $\mathcal{F}_n$-martingale since the $\Delta N_k$'s are martingale increments and

$$\mathbb{E}\big( (\Delta N_n)^2 \,|\, \mathcal{F}^X_{n-1} \big) \le \frac{1}{(1-\alpha)^2}\ \mathbb{E}\Big[ \big( \Psi(\varphi(X)) - \Psi(\xi) \big)^2 \Big]_{|\xi = \xi_{n-1}}.$$

The continuity of $\Psi$ at $\xi^*_\alpha$ and the a.s. convergence of $\xi_k$ toward $\xi^*_\alpha$ imply that

$$\sup_{n\ge 1} \mathbb{E}\big[ (\Delta N_n)^2 \,|\, \mathcal{F}_{n-1} \big] < \infty \quad \text{a.s.}$$

Consequently, Assumption (A1) implies

$$\langle N^\gamma \rangle_\infty = \sum_{n\ge 1} \gamma_n^2\, \mathbb{E}\big[ (\Delta N_n)^2 \,|\, \mathcal{F}_{n-1} \big] < \infty,$$

which in turn yields the a.s. convergence of $(N^\gamma_n)_{n\ge 1}$.

(52)

3.3 Rate of convergence (I): CLT

Set $Z_n = (\xi_n, C_n)$, with $\gamma_n = \frac{\kappa}{n}$, $n \ge 1$. Under appropriate assumptions, with $z^* := \big( \mathrm{VaR}_\alpha(\varphi(X)),\ \Psi\text{-}\mathrm{CVaR}_\alpha(\varphi(X)) \big)$,

$$\sqrt{n}\,(Z_n - z^*) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}\big( 0,\ \sqrt{\kappa}\, D(\kappa) \big)$$

where $D(\kappa)$ depends on $v$, $w_\Psi$ and $[Dh(z^*)]^{-1}$, with

$$h(\xi, c) = \Big( \mathbb{E}\Big( \frac{\partial v}{\partial \xi}(\xi, X) \Big),\ c - \mathbb{E}\big( w_\Psi(\xi, X) \big) \Big),$$

and $\kappa > \frac{1}{2\,\Re(\lambda_{\min})}$, where $\lambda_{\min}$ is the eigenvalue of $Dh(z^*)$ with the lowest real part.

To be compared to the regular Newton-Raphson algorithm (if $d = 1$ . . . ).

Question: how to minimize the asymptotic variance $\sqrt{\kappa}\, D(\kappa)$ as a function of $\kappa$?

κ D(κ) as a function of κ?

κ = 1

(53)

3.4 Rate of convergence (II): Averaging principle

Theorem (Ruppert and Polyak's Averaging Principle). Suppose that the $\mathbb{R}^d$-valued sequence $(Z_n)_{n\ge 0}$ is defined recursively by

$$Z_{n+1} = Z_n - \gamma_{n+1}\,\big( h(Z_n) + \epsilon_{n+1} \big)$$

where $(\epsilon_n)_{n\ge 1}$ is an $L^{2+\eta}$-bounded sequence of martingale increments such that

$$\exists\, \Gamma \in \mathcal{S}^+(d, \mathbb{R}) \ \text{such that} \ \mathbb{E}\big[ \epsilon_{n+1}\epsilon_{n+1}^t \,|\, \mathcal{F}_n \big] \xrightarrow{a.s.} \Gamma.$$

Assume that

$$\{h = 0\} = \{z^*\} \quad \text{and} \quad M = Dh(z^*) \ \text{exists in} \ GL(d, \mathbb{R}),$$

is repulsive (i.e. all its eigenvalues are positive), and

$$h(z) = Dh(z^*)(z - z^*) + O(|z - z^*|^2).$$

(54)

Set $\gamma_n = \frac{\gamma_1}{n^a}$ with $\frac12 < a < 1$, and

$$\bar{Z}_{n+1} := \frac{Z_0 + \cdots + Z_n}{n+1} = \bar{Z}_n - \frac{1}{n+1}\,\big( \bar{Z}_n - Z_n \big), \quad n \ge 0.$$

Then

$$\sqrt{n}\,\big( \bar{Z}_n - z^* \big) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}(0, D^*) \quad \text{with} \quad D^* = M^{-1}\,\Gamma\,(M^{-1})^T,$$

which is optimal in terms of "statistical efficiency".

Comments. A proof of this result is given in Duflo's book Algorithmes stochastiques (p. 169).

Application to $\mathrm{VaR}_\alpha(\varphi(X))$ and $\Psi$-$\mathrm{CVaR}_\alpha(\varphi(X))$ computation. Assume the distribution of $\varphi(X)$ has a positive density $f_{\varphi(X)} > 0$. Set $\xi^*_\alpha := \mathrm{VaR}_\alpha(\varphi(X))$ (unique). Then $D^*$ is given by

$$D^* = \begin{pmatrix} \dfrac{\alpha(1-\alpha)}{f^2_{\varphi(X)}(\xi^*_\alpha)} & \dfrac{\alpha}{(1-\alpha)\, f_{\varphi(X)}(\xi^*_\alpha)}\ \mathbb{E}\big[ (\Psi(\varphi(X)) - \Psi(\xi^*_\alpha))\, \mathbf{1}_{\{\varphi(X) \ge \xi^*_\alpha\}} \big] \\[2ex] \dfrac{\alpha}{(1-\alpha)\, f_{\varphi(X)}(\xi^*_\alpha)}\ \mathbb{E}\big[ (\Psi(\varphi(X)) - \Psi(\xi^*_\alpha))\, \mathbf{1}_{\{\varphi(X) \ge \xi^*_\alpha\}} \big] & \dfrac{1}{(1-\alpha)^2}\ \mathrm{Var}\big( (\Psi(\varphi(X)) - \Psi(\xi^*_\alpha))\, \mathbf{1}_{\{\varphi(X) \ge \xi^*_\alpha\}} \big) \end{pmatrix}$$
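A sketch of the averaging principle wrapped around the coupled recursion above (Python; $\gamma_n = 1/n^a$ with $a = 0.75$ and a running Cesàro mean; constants are illustrative assumptions):

    import numpy as np

    def var_cvar_averaged(sample_loss, alpha, n_iter=500_000, a=0.75, seed=0):
        """Ruppert-Polyak averaging: run the slowly-stepped recursion
        gamma_n = 1/n^a (1/2 < a < 1) and return the Cesaro mean of the iterates."""
        rng = np.random.default_rng(seed)
        xi, C = 0.0, 0.0
        xi_bar, C_bar = 0.0, 0.0
        for n in range(1, n_iter + 1):
            x = sample_loss(rng)
            gamma = 1.0 / n**a
            w = xi + max(x - xi, 0.0) / (1.0 - alpha)
            xi -= gamma * (1.0 - (x >= xi) / (1.0 - alpha))
            C -= gamma * (C - w)
            # running average: Z_bar_n = Z_bar_{n-1} - (Z_bar_{n-1} - Z_n)/n
            xi_bar -= (xi_bar - xi) / n
            C_bar -= (C_bar - C) / n
        return xi_bar, C_bar

    print(var_cvar_averaged(lambda rng: rng.standard_normal(), alpha=0.95))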

(55)

4 Speeding up the procedure

By the way, $\mathrm{VaR}_\alpha(\varphi(X))$ is about rare events! If $\alpha \approx 1$, the matrix $D^*$ explodes!

In practice, the algorithm remains frozen!

(Classical) solution: variance reduction by importance sampling!

Assume $\mathcal{L}(X) = p(x)\lambda_d(dx)$.

Example: twist the distribution of $X$ by mean translation to minimize both

– $\mathrm{Var}\big( \mathbf{1}_{\{\varphi(X) \ge \xi^*_\alpha\}} \big)$ (asymptotic variance of $\mathrm{VaR}_\alpha$),

– $\mathrm{Var}\big( (\Psi(\varphi(X)) - \xi^*_\alpha)\, \mathbf{1}_{\{\varphi(X) \ge \xi^*_\alpha\}} \big)$ (asymptotic variance of $\mathrm{CVaR}_\alpha$).

(56)

This means finding parameters $\theta$ and $\mu$ that minimize the two functions

$$Q_1(\theta) := \mathbb{E}\Big[ \mathbf{1}_{\{\varphi(X) \ge \xi^*_\alpha\}}\ \frac{p(X)}{p(X - \theta)} \Big], \qquad Q_2(\mu) := \mathbb{E}\Big[ \big( \Psi(\varphi(X)) - \xi^*_\alpha \big)^2\, \mathbf{1}_{\{\varphi(X) \ge \xi^*_\alpha\}}\ \frac{p(X)}{p(X - \mu)} \Big].$$

Objections. 1. Explosion! We know how to deal with it (see Part II). 2. The target $\xi^*_\alpha$ is unknown! We make it "implicit".

(57)

4.1 General log-concave setting from Part II

If $p$ satisfies the assumptions of the log-concave setting, i.e. $\exists\, a \in [1,2]$ such that

$$\text{(i)}\ \Big| \frac{\nabla p}{p} \Big|(x) = O(|x|^{a-1}) \ \text{as } |x|\to\infty, \qquad \text{(ii)}\ \exists\, \rho > 0, \ \log p(x) + \rho|x|^a \ \text{is convex},$$

then the first step of the machinery of Part II yields: the optimal variance reducers $(\theta^*_\alpha, \mu^*_\alpha)$ are zeros of

$$\nabla Q_1(\theta) = 0 \quad \text{and} \quad \nabla Q_2(\mu) = 0.$$
