Unconstrained Recursive Importance Sampling and applications

V. Lemaire & G. Pagès
(*) LPMA, Univ. Paris 6
—
9 February 2009, CMAP-X
Basic problem of Numerical Probability: compute by Monte Carlo simulation

    m = E[F(X)],   X : (Ω, A, P) → R^q or (H, (.|.)_H) Hilbert space.

Examples in Finance: option premia, greek parameters, etc. If F(X) ∈ L²(P),

    || E[F(X)] − (1/M) Σ_{k=1}^M F(X_k) ||_2 = sqrt( Var(F(X)) / M ),

and the Central Limit Theorem, Law of the Iterated Logarithm, etc., are ruled by Var(F(X)).
This suggests variance reduction.
One tool (in R^d): importance sampling. Let X ∼ p(x)λ_d(dx) and X_θ ∼ p_θ(x)dx. Then

    E[F(X)] = ∫_{R^d} F(x) p(x) dx = ∫_{R^d} F(x) (p/p_θ)(x) p_θ(x) dx = E[ F(X_θ) (p/p_θ)(X_θ) ].

Resulting minimization problem: min_θ V(θ) with

    V(θ) = E[ F(X_θ)² (p²/p_θ²)(X_θ) ] = E[ F(X)² (p/p_θ)(X) ].

If θ ↦ p_θ(x) is log-concave and lim_{|θ|→∞} p_θ(x) = 0, then V is convex and lim_{|θ|→∞} V(θ) = +∞, so that

    Argmin_θ V = {∇V = 0} ≠ ∅.
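As a minimal illustration (not from the talk), the importance-sampling identity and its effect on the variance can be checked numerically in dimension 1 for a Gaussian mean shift; the payoff F and the shift θ = 1 below are arbitrary choices.

```python
import numpy as np

# Minimal 1-d check of E F(X) = E[F(X_theta) (p/p_theta)(X_theta)] for a
# Gaussian mean shift: p = N(0,1), p_theta = N(theta,1), so that
# (p/p_theta)(x) = exp(-theta*x + theta^2/2). Payoff and theta are arbitrary.

rng = np.random.default_rng(0)
F = lambda x: np.maximum(np.exp(x) - 1.0, 0.0)  # toy call-type payoff

def shifted_estimator(theta, M=200_000):
    x = theta + rng.standard_normal(M)              # X_theta ~ N(theta, 1)
    y = F(x) * np.exp(-theta * x + 0.5 * theta**2)  # F(X_theta) (p/p_theta)(X_theta)
    return y.mean(), y.var()

m0, v0 = shifted_estimator(0.0)   # crude Monte Carlo
m1, v1 = shifted_estimator(1.0)   # same expectation, different variance
```

Both runs target the same m = E[F(X)]; only the empirical variance changes with θ, which is exactly what the minimization of V exploits.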
Stochastic recursive zero search: the Robbins-Monro algorithm. If

    V(θ) = E[v(θ, X)]  “⇒”  ∇V(θ) = E[ ∂v/∂θ (θ, X) ]   (local gradient)
                                   = E[ w(θ, X) ]         (pseudo-local gradient).

Let ρ : R^d → (0, +∞) be a Borel function and set H(θ, x) := ρ(θ) w(θ, x).

Theorem (Extended Robbins-Monro algorithm). Let X_1, X_2, ..., X_n be i.i.d. with distribution L(X) on (Ω, A, P) (easy to simulate). Let γ = (γ_n)_{n≥1} satisfy

    Σ_n γ_n = +∞  and  Σ_n γ_n² < +∞.

Set

    θ_{n+1} = θ_n − γ_{n+1} H(θ_n, X_{n+1}),   θ_0 ∈ R^d.

Then, if (WARNING!)

    ∀ θ ∈ R^d,  ||H(θ, X)||_2 ≤ C(1 + |θ|),

there exists a random vector θ*_∞ : (Ω, A, P) → {∇V = 0} such that θ_n → θ*_∞ a.s. as n → ∞.
Why?

Key 1: with F_n = σ(θ_0, X_1, ..., X_n),

    E[ H(θ_n, X_{n+1}) | F_n ] = ( E[H(θ, X)] )_{|θ=θ_n} = ρ(θ_n) ∇V(θ_n),

so that

    θ_{n+1} = θ_n − γ_{n+1} ρ(θ_n) ∇V(θ_n)                                  (Newton-Raphson-like algorithm)
              − γ_{n+1} ( H(θ_n, X_{n+1}) − E[H(θ_n, X_{n+1}) | F_n] )       (martingale disturbance term).

Key 2: if sup_n E[|H(θ_n, X_{n+1})|²] < +∞ and Σ_{n≥1} γ_n² < +∞, then

    sup_n E[M_n²] ≤ Σ_{n≥1} γ_n² E[|H(θ_n, X_{n+1})|²] < +∞,

so that M_n → M_∞ ∈ L²(P): the disturbance term fades faster than γ_n.

Stochastic Approximation = deterministic zero search procedures + Monte Carlo (simulation).
I. Recursive variance reduction: the Arouna-Lapeyre (2003) algorithm revisited (Lemaire-P., 2008)

Importance sampling by mean translation: X ∼ N(0; I_d),

    E[F(X)] = ∫_{R^d} F(x) e^{−|x|²/2} dx/(2π)^{d/2},   p_θ(x) = p(x − θ)  with  p(x) = e^{−|x|²/2}/(2π)^{d/2}.

Cameron-Martin formula:

    E[F(X)] = E[ F(X + θ) e^{−(θ|X)} e^{−|θ|²/2} ].   (1)

The θ with the lowest variance is the solution to min_θ V(θ) with

    V(θ) = E[ F(X + θ)² e^{−2(θ|X)} e^{−|θ|²} ].

If F is smooth: large deviation approach (see Glasserman et al. 1999).
Comparison with GHS (Glasserman et al.):

    min_{θ∈R^d} E[ F(X)² e^{−⟨θ|X⟩ + |θ|²/2} 1_D(X) ],   where D := {F > 0}.

Now

    E[ F(X)² e^{−⟨θ|X⟩ + |θ|²/2} 1_D(X) ] = (2π)^{−d/2} ∫_D e^{2 log F(x) − ⟨θ|x⟩ + |θ|²/2 − |x|²/2} dx
                                          ≈ C max_{x∈D} exp( 2 log F(x) − ⟨θ|x⟩ + |θ|²/2 − |x|²/2 ),

so that the above minimization problem amounts to

    min_{θ∈R^d} max_{x∈D} ( 2 log F(x) − ⟨θ|x⟩ + |θ|²/2 − |x|²/2 ).
Following Arouna: a second change of variable gives

    V(θ) = e^{|θ|²/2} E[ F²(X) e^{−(X|θ)} ],

so that

    ∇V(θ) = e^{|θ|²/2} E[ F²(X) e^{−(θ|X)} (θ − X) ],

which yields

    θ_{n+1} = θ_n − γ_{n+1} H(θ_n, X_{n+1}),   H(θ, x) := e^{|θ|²/2} F²(x) e^{−(θ|x)} (θ − x).
Unfortunately,

    liminf_{|θ|→∞} ||H(θ, X)||_2 / |θ| = +∞ ...

Consequence: the algorithm does explode in general.

Remedy: Arouna suggests a constrained version of the algorithm with a slow relaxation of the constraint (i.e. a slowly increasing sequence of compact sets): an algorithm with repeated projections “à la Chen”, which is the mathematical formalization of repeated trials. A.s. convergence holds, with a CLT (Lelong 2007) once stabilization has occurred. In practice, the choice of the compact sets needs much care in connection with the step sequence.
New approach (Lemaire-P. (2008)): a third change of variable to plug θ back into the payoff F!

    ∇V(θ) = e^{|θ|²/2} E[ F²(X) e^{−(θ|X)} (θ − X) ] = e^{|θ|²} E[ F²(X − θ) (2θ − X) ].

Add a growth control of F at infinity (sub-multiplicativity):

    |F(x + y)| ≤ (a + b|F(x)|)(a + b|F(y)|) + c

⇓

    θ_{n+1} = θ_n − γ_{n+1} H̃(θ_n, X_{n+1}),   H̃(θ, x) := [ F(x − θ)² / (1 + F(−θ)²) ] (2θ − x).

The function H̃ satisfies the linear growth assumption in L²(P). Hence a.s. convergence to an optimal θ* ∈ Argmin V, with CLT, LIL, etc.
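A toy 1-d sketch of the resulting two-phase procedure (Phase 1: run the recursion; Phase 2: plain Monte Carlo at the frozen θ with the Cameron-Martin weight). The payoff F(x) = (x − 1)_+ and the step sequence γ_n = 5/(n + 500) are ad hoc choices for illustration, not the authors' calibrated settings.

```python
import numpy as np

# Unconstrained recursion for X ~ N(0,1) and the hypothetical payoff (x - 1)_+:
# H~(theta, x) = F(x - theta)^2 / (1 + F(-theta)^2) * (2 theta - x).

rng = np.random.default_rng(2)
F = lambda x: np.maximum(x - 1.0, 0.0)

theta = 0.0
xs = rng.standard_normal(200_000)
for n, x in enumerate(xs, start=1):          # Phase 1: search for theta*
    gamma = 5.0 / (n + 500)
    H = F(x - theta) ** 2 / (1.0 + F(-theta) ** 2) * (2.0 * theta - x)
    theta -= gamma * H

M = 200_000                                  # Phase 2: weighted Monte Carlo
x = rng.standard_normal(M)
shifted = F(x + theta) * np.exp(-theta * x - 0.5 * theta**2)
crude = F(rng.standard_normal(M))
```

The shifted estimator keeps the same mean for any frozen θ (formula (1)); only its variance depends on how close θ is to the optimum.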
II. Extension: mean translation for log-concave p.d.f.

Still p_θ(x) = p(x − θ), now with p log-concave and lim_{|x|→∞} p(x) = 0. We make the following assumption on the probability density p:

    ∃ a ∈ [1, 2] such that
    (i)  |∇p/p|(x) = O(|x|^{a−1}) as |x| → ∞,
    (ii) ∃ δ > 0 such that log p(x) + δ|x|^a is convex.

With the same approach based on three translations of the mean, if E[|X|^{2(a−1)} F²(X)] < +∞ and F is sub-multiplicative, then

    H(θ, x) := [ F²(x − θ) / (1 + F²(−θ)) ]                          (self-control)
               × e^{−2δ|θ|^a} [ p²(x − θ) / (p(x) p(x − 2θ)) ]       (≤ C)
               × ∇p(x − 2θ)/p(x − 2θ)

satisfies E[H(θ, X)] = ρ(θ)∇V(θ) and ||H(θ, X)||_2 ≤ C(1 + |θ|), so that the extended R-M Theorem applies.
III. Esscher transform

Set ψ(θ) := log E[e^{(θ|X)}] < +∞, θ ∈ R^d, and

    ∀ θ ∈ R^d,  p_θ(x) = e^{(θ|x) − ψ(θ)} p(x),   x ∈ R^d.

Assume that both X = X^{(0)} and X^{(θ)} = g(θ, U), U ∼ U([0,1]), can be simulated at the same “reasonable” cost. Then

    V(θ) = E[ F²(X) e^{−(θ|X) + ψ(θ)} ].

Suppose that the function ψ satisfies

    (i)  lim_{|θ|→∞} ψ(θ)/|θ| = +∞  or  lim_{|θ|→∞} ψ(θ) − 2ψ(θ/2) = +∞,
    (ii) ∃ δ > 0 such that θ ↦ ψ(θ) − δ|θ|² is concave,

and that the payoff F satisfies

    ∀ θ ∈ R^d,  E[ |X| F²(X) e^{(θ|X)} ] < +∞.

Then

    ∇V(θ) = E[ (∇ψ(θ) − X) F²(X) e^{−(θ|X) + ψ(θ)} ]
          = E[ (∇ψ(θ) − X^{(−θ)}) F²(X^{(−θ)}) ] e^{ψ(θ) − ψ(−θ)},

and the recursive procedure

    θ_{n+1} = θ_n − γ_{n+1} H(θ_n, X_{n+1}),   X_{n+1} := g(−θ_n, U_{n+1}),

where
– (U_n)_{n≥1} is an i.i.d. sequence (so that X_{n+1} ∼ L(X^{(−θ_n)})),
– H(θ, x) := e^{−(λ/2)√d |∇ψ(−θ)|} F²(x) (∇ψ(θ) − x),

satisfies θ_n → θ*_∞ a.s., where θ*_∞ is a {∇V = 0}-valued random vector.
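By the very definition of p_θ, the Esscher change of measure is unbiased: E[F(X)] = E[F(X^{(θ)}) e^{−(θ|X^{(θ)}) + ψ(θ)}]. This can be checked numerically (an illustration of my own, not from the talk) for X ~ Exp(1), where ψ(θ) = −log(1 − θ) for θ < 1 and the tilted law is Exp(1 − θ); the payoff and θ = 0.5 are arbitrary.

```python
import numpy as np

# Esscher identity check for X ~ Exp(1): psi(theta) = -log(1 - theta) and
# X^(theta) ~ Exp(1 - theta). The tail payoff (x - 3)_+ is a toy choice.

rng = np.random.default_rng(3)
F = lambda x: np.maximum(x - 3.0, 0.0)
theta = 0.5
psi = -np.log(1.0 - theta)

M = 400_000
x_crude = rng.exponential(1.0, M)
x_tilt = rng.exponential(1.0 / (1.0 - theta), M)   # numpy takes the scale = mean
est_crude = F(x_crude)
est_tilt = F(x_tilt) * np.exp(-theta * x_tilt + psi)
```

Tilting the samples toward the tail leaves the mean unchanged (here E[(X − 3)_+] = e^{−3}) while reducing the variance of the estimator.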
IV. Functional setting: pathwise dependent diffusions

We consider a d-dimensional Itô process X = (X_t)_{t∈[0,T]} solution to the S.D.E.

    (E_{b,σ,W}) ≡ dX_t = b(t, X^t) dt + σ(t, X^t) dW_t,   X_0 = x ∈ R^d,   (2)

where X^t denotes the process stopped at t.
– Includes diffusions and the Euler scheme.

    (H_{b,σ}) ≡ (i)  b and σ continuous on [0, T] × C([0, T], R^d),
                (ii) ∀ t ∈ [0, T], ∀ x, y ∈ C([0, T], R^d),
                     |b(t, y) − b(t, x)| + ||σ(t, y) − σ(t, x)|| ≤ C_{b,σ} ||x − y||_sup

ensures existence and uniqueness of a strong solution.
Aim: compute by Monte Carlo simulation E[F(X)], with F : C([0, T], R^d) → R, F(X) ∈ L¹(P) and P(F(X) ≠ 0) > 0.

An extension: replace θ ∈ R^d by

    ϕ(X^t) × θ(t),   ϕ the “driver”,

with ϕ : C([0, T], R^d) → R^q bounded and θ ∈ L²_T := L²([0, T], dt). For this talk, q = 1 and ϕ(ξ) ≡ 1 for every ξ ∈ C([0, T], R^d).

Tool: triple application of Girsanov's Theorem.
• Representation of E[F(X)] [Girsanov 1]:

    E[F(X)] = e^{−(1/2)||θ||²_{L²_T}} E[ F(X^{(θ)}) e^{−∫_0^T θ(s) dW_s} ],

where

    dX_t^{(θ)} = ( b(t, X^{(θ),t}) + θ(t) σ(t, X^{(θ),t}) ) dt + σ(t, X^{(θ),t}) dW_t,   X_0^{(θ)} = x ∈ R^d.

• Variance minimization [Girsanov 2]:

    min_{θ∈L²_T} V(θ)  or  min_{θ∈E} V(θ),   E = span(e_1, ..., e_m) ⊂ L²_T,

where

    V(θ) := e^{−||θ||²_{L²_T}} E[ F²(X^{(θ)}) e^{−2∫_0^T θ(s) dW_s} ]
          = e^{(1/2)||θ||²_{L²_T}} E[ F²(X) e^{−∫_0^T θ(s) dW_s} ].

• Gradient representation [Girsanov 3]: assume (H_{b,σ}) and E[F(X)^{2+δ}] < +∞. Then V is log-convex and lim_{||θ||_{L²_T}→+∞} V(θ) = +∞ (by Fatou), hence Argmin_{L²_T} V ≠ ∅ and Argmin_E V ≠ ∅ for E a closed (finite-dimensional) subspace of L²_T.
V is differentiable at every θ ∈ L²_T, with gradient ∇V(θ) ∈ L²_T: for every ψ ∈ L²_T,

    (∇V(θ)|ψ)_{L²_T} = e^{(1/2)||θ||²_{L²_T}} E[ F²(X) e^{−∫_0^T θ(s) dW_s} ( (θ|ψ)_{L²_T} − ∫_0^T ψ(s) dW_s ) ]
                     = e^{||θ||²_{L²_T}} E[ F²(X^{(−θ)}) ( 2(θ|ψ)_{L²_T} − ∫_0^T ψ(s) dW_s ) ].   (3)
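The first Girsanov representation can be sanity-checked numerically in a toy Black-Scholes model with a constant θ, since there X_T is exactly simulable and no Euler scheme is needed (all parameter values below are arbitrary, and this is my own check, not the speakers' experiment).

```python
import numpy as np
from math import erf, exp, log, sqrt

# [Girsanov 1] check for dX = r X dt + sigma X dW, F(X) = (X_T - K)_+ and a
# constant theta: E F(X) = e^{-theta^2 T/2} E[F(X^(theta)_T) e^{-theta W_T}],
# where X^(theta)_T amounts to shifting W_T by theta*T.

rng = np.random.default_rng(7)
x0, r, sigma, T, K, theta, M = 100.0, 0.04, 0.2, 1.0, 110.0, 1.0, 400_000

Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
f0 = x0 * exp(r * T)                                   # forward value
d1 = (log(f0 / K) + 0.5 * sigma**2 * T) / (sigma * sqrt(T))
ref = f0 * Phi(d1) - K * Phi(d1 - sigma * sqrt(T))     # E(X_T - K)_+ in closed form

w = sqrt(T) * rng.standard_normal(M)                   # W_T ~ N(0, T)
payoff = lambda wT: np.maximum(
    x0 * np.exp((r - 0.5 * sigma**2) * T + sigma * wT) - K, 0.0)

crude = payoff(w)
girsanov = np.exp(-0.5 * theta**2 * T) * payoff(w + theta * T) * np.exp(-theta * w)
```

Both estimators target the same closed-form value; in the functional setting of the slides the same identity holds path by path with θ(t)σ(t, ·) added to the drift.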
• “Sub-linearity correction” in (3): follows from an a priori (strong) control of X − X^{(θ)} (both living on the same space) in an appropriate space. Let p ≥ 1. For every θ ∈ L²_T, X and X^{(θ)} are strong solutions of E_{b,σ,W̃} and E_{b+σθ,σ,W̃}, and

    || sup_{t∈[0,T]} |X_t − X_t^{(θ)}| ||_p ≤ C_{b,σ} e^{C_{b,σ}T} || ∫_0^T |σ(s, X^{(θ),s}) θ(s)| ds ||_p
                                            ≤ C_σ ||θ||_{L¹_T} ( 1 + || ||X||_sup ||_{p(1+δ)} ) e^{(1/(2pδ))||θ||²_{L²_T}}.

Assume the functional F satisfies

    ∀ x ∈ C([0, T], R^d),  |F(x)| ≤ C_F (1 + ||x||^r_sup).

Let E = span(e_1, ..., e_m) ⊂ L²_T (to be specified). The algorithm defined by

    θ_{n+1} = θ_n − γ_{n+1} H_r(θ_n, X^{(−θ_n)}, W^{(n+1)}),

with (W^{(n)})_{n≥1} i.i.d. Brownian motions, X^{(−θ_n)} = G(−θ_n, W^{(n+1)}) solution to (E_{b−σθ_n, σ, W^{(n+1)}}), and, for every i ∈ {1, ..., m},

    H_{r,i}(θ, x, W) := (H_r(θ, x, W)|e_i)_{L²_T} = [ F²(x) / (1 + ||θ||^{2r}_{L²_T}) ] ( 2(θ|e_i)_{L²_T} − ∫_0^T e_i(s) dW_s ),

satisfies θ_n → θ*_E ∈ Argmin_E V a.s.
Practical implementation

Phase 1: compute θ* = θ_N using M_RM iterations.
Phase 2: compute E[F(X)] via a Monte Carlo simulation of size M_MC:

    E[F(X)] ≈ (1/M_MC) Σ_{m=1}^{M_MC} e^{−(1/2)||θ*||²_{L²_T}} F( (X^{(θ*)})^{(m)} ) e^{−∫_0^T θ*(s) dW_s^{(m)}},

with W^{(m)} independent Brownian motions. Take M_RM ≪ M_MC (e.g. M_RM ≈ M_MC/10).

Alternative: adaptive coupling (for mean translation only).
Numerical experiments (I): NIG Call option. X ∼ NIG(α, β, δ, µ), with density

    P_X(dx) = [ αδ K₁( α sqrt(δ² + (x − µ)²) ) / ( π sqrt(δ² + (x − µ)²) ) ] e^{δ sqrt(α² − β²) + β(x − µ)} dx,

where K₁ is the modified Bessel function of the second kind.

    F(x) = 50 (e^x − K)_+,   α = 2, β = 0.2, δ = 0.8, µ = 0.04,
    M_RM = 100 000,  M_MC = 1 000 000.
Translation:

    for n = 0 to M do
        X ~ NIG(alpha, beta, gamma, delta)
        theta = theta - 1/(n+1000) * H1(theta, X)
    for n = 0 to N do
        X ~ NIG(alpha, beta, gamma, delta)
        mean = mean + F(X) * p(X + theta) / p(X)

Esscher transform:

    for n = 0 to M do
        X ~ NIG(alpha, beta - theta, gamma, delta)
        theta = theta - 1/(n+1000) * H2(theta, X)
    for n = 0 to N do
        X ~ NIG(alpha, beta + theta, gamma, delta)
        mean = mean + F(X) * exp(-theta * X)
    mean = mean * exp(psi(theta))
    K     mean    crude var   var. ratio translation (θ)   var. ratio Esscher (θ)
    0.6   42.19   8538        5.885  (0.791)               56.484 (1.322)
    0.8   34.19   8388        7.525  (0.903)               39.797 (1.309)
    1.0   27.66   8176        9.218  (0.982)               32.183 (1.294)
    1.2   22.60   7930        10.068 (1.017)               29.232 (1.280)
    1.4   18.76   7677        9.956  (1.026)               28.496 (1.268)

Table 1: Variance reduction for different strikes.
Figure 1: Densities of X (crude), X + θ (translation) and X^{(θ)} (Esscher) in the case K = 1.
Numerical experiments (II): Simplified Spark Spread

We now consider an exchange option between gas and electricity (called a spark spread). A simplified form of the payoff is

    F(X) = 50 ( e^{X_elec} − c e^{X_gas} − K )_+,

where X_elec ∼ NIG(2, 0.2, 0.8, 0.04) and X_gas ∼ NIG(1.4, 0.2, 0.2, 0.04) are independent.
• 300 000 iterations of Robbins-Monro
• 3 000 000 iterations of Monte Carlo
Simplified Spark Spread option – results

    K     c     mean     crude var   var. ratio translation   var. ratio Esscher
    0.6   0.2   33.235   8378.4      5.2609                   27.455
          0.4   26.534   8133.3      5.0604                   28.669
          0.6   21.587   7862.7      4.8046                   30.649
          0.8   17.931   7595.2      4.5839                   33.656
          1     15.184   7344.2      4.4064                   37.489
    0.8   0.2   26.908   8160.1      5.1366                   28.876
          0.4   21.725   7884.9      4.844                    31.018
          0.6   17.955   7612.5      4.6031                   34.166
          0.8   15.156   7357.3      4.416                    38.167
          1     13.027   7123.9      4.2685                   42.781
Numerical experiments (III): Barrier options and local volatility models

Pseudo-CEV model (α ∈ (0, 1]):

    dX_t = r X_t dt + σ X_t^α ( X_t / sqrt(1 + X_t²) ) dW_t,   X_0 = x > 0.

Down-and-in Call:

    F(X) = (X_T − K)_+ 1_{ { min_{t≤T} X_t ≤ L } }.

Market parameters: X_0 = 100, r = 0.04, pseudo-volatility σ = 7, α = 0.5.
Contract parameters: K = 115, L = 65, T = 1.
Time discretization: continuous Euler scheme (Brownian bridge) with step T/n, n = 100.
M_RM = 50 000,  M_MC = 500 000.
Down & In Call option – Brownian interpolation

• (X̄_{t_k}) Euler scheme with step t_k = kT/n, n = 100.
• Brownian bridge interpolation and pre-conditioning:

    E[F(X̄)] = E[ E[ F(X̄) | X̄_{t_1}, ..., X̄_{t_n} ] ]
             = E[ (X̄_T − K)_+ ( 1 − Π_{k=0}^{n−1} p(X̄_{t_k}, X̄_{t_{k+1}}) ) ],

with

    p(x_k, x_{k+1}) = 1 − P( min_{t∈[t_k,t_{k+1}]} W_t ≤ (L − x_k)/σ(x_k) | W_{t_{k+1}} − W_{t_k} = (x_{k+1} − x_k)/σ(x_k) )
                    = 0                                                         if L ≥ min(x_k, x_{k+1}),
                    = 1 − e^{ −2(L − x_k)(L − x_{k+1}) / ( σ²(x_k)(t_{k+1} − t_k) ) }   otherwise.
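The closed-form non-crossing probability p(x_k, x_{k+1}) can be re-derived by brute force, sampling a Brownian bridge between the two Euler points on a fine grid (a check of my own; the diffusion coefficient is frozen over the step, and all numerical values are arbitrary test values).

```python
import numpy as np

def p_no_cross(x0, x1, L, sig, dt):
    """P(Brownian interpolation between x0 and x1 stays above the barrier L)."""
    if min(x0, x1) <= L:
        return 0.0
    return 1.0 - np.exp(-2.0 * (L - x0) * (L - x1) / (sig**2 * dt))

rng = np.random.default_rng(8)
x0, x1, L, sig, dt = 0.5, 0.3, 0.0, 1.0, 1.0
m, paths = 2000, 20_000
h = dt / m
b = np.full(paths, x0)
mn = b.copy()
for i in range(m - 1):                      # sequential Brownian bridge sampling
    rem = dt - i * h
    mean = b + (x1 - b) * (h / rem)
    var = sig**2 * h * (rem - h) / rem
    b = mean + np.sqrt(var) * rng.standard_normal(paths)
    mn = np.minimum(mn, b)
mn = np.minimum(mn, x1)
mc_cross = (mn <= L).mean()                 # discretized crossing frequency
exact_cross = 1.0 - p_no_cross(x0, x1, L, sig, dt)
```

The discretized minimum slightly under-detects crossings (the well-known discrete-barrier bias), so the brute-force frequency sits a little below the closed form; this is exactly the bias the pre-conditioning formula removes.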
Basis of L²([0,1], R)

• a polynomial basis: ∀ n ≥ 0, ∀ t ∈ [0,1],

    P̃_n(t) = P_n(2t − 1)  where  P_n(t) = (1/(2^n n!)) d^n/dt^n (t² − 1)^n;   (ShLeg)

• the Karhunen-Loève basis: ∀ n ≥ 0, ∀ t ∈ [0,1],

    e_n(t) = √2 sin( (n + 1/2) π t );   (KL)

• the Haar basis: ∀ n ≥ 0, ∀ k = 0, ..., 2^n − 1, ∀ t ∈ [0,1],

    ψ_{n,k}(t) = 2^{n/2} ψ(2^n t − k),   (Haar)

where ψ(t) = 1 if t ∈ [0, 1/2),  −1 if t ∈ [1/2, 1),  0 otherwise.
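The KL and Haar families above are orthonormal in L²([0,1]); a quick midpoint-rule check (illustrative only):

```python
import numpy as np

# Numerical orthonormality check of the KL and Haar bases on [0,1].

def kl(n, t):
    return np.sqrt(2.0) * np.sin((n + 0.5) * np.pi * t)

def haar(n, k, t):
    u = (2.0**n) * t - k                    # psi_{n,k}(t) = 2^{n/2} psi(2^n t - k)
    return (2.0 ** (n / 2.0)) * (((0.0 <= u) & (u < 0.5)).astype(float)
                                 - ((0.5 <= u) & (u < 1.0)).astype(float))

N = 200_000
t = (np.arange(N) + 0.5) / N                # midpoints: integral on [0,1] ~ mean
kl_norm = (kl(0, t) ** 2).mean()            # ~ 1
kl_orth = (kl(0, t) * kl(1, t)).mean()      # ~ 0
haar_norm = (haar(1, 0, t) ** 2).mean()     # ~ 1
haar_orth = (haar(1, 0, t) * haar(1, 1, t)).mean()  # ~ 0 (disjoint supports)
```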
Results for trivial driver ϕ = Id

    Basis            Dim.   Mean     CI 95%     Variance ratio
    Constant         1      3.1836   ±0.0251    2.6297
    ShiftLegendre    2      3.1830   ±0.0223    3.3258
                     4      3.1815   ±0.0215    3.5670
                     8      3.1813   ±0.0215    3.5659
    Karhunen-Loève   2      3.1852   ±0.0187    4.7254
                     4      3.1862   ±0.0183    4.9385
                     8      3.1918   ±0.0178    5.2183
    Haar             2      3.1834   ±0.0215    3.5699
                     4      3.1871   ±0.0186    4.7896
                     8      3.1864   ±0.0177    5.2675

Table 2: Variance ratio obtained for different bases in the local volatility model (K = 115, L = 65, variance of the crude Monte Carlo: 206.52).
Representations of the optimal variance reducer

[Plots of the optimal variance reducer for the constant, polynomial, Haar and KL bases, with 2, 4 and 8 basis functions.]
With a non-trivial driver

    ϕ(t, X̄^t) = ( p̄_k, 1 − p̄_k ),   with  p̄_k = P( min_{0≤s≤kT/n} X̄_s ≤ L | X̄_0, ..., X̄_{kT/n} ),

and E = (R 1_{[0,T]})², so that the variance reducer reads α p̄_k + β (1 − p̄_k).
Results with non-trivial driver ϕ

    K     L    Mean      CI 95%     Var.Ratio (Crude)   α         β
    85    65   3.1827    ±0.0127    10.02 (20.28)       -0.3057   1.5522
          75   6.4115    ±0.0190    9.96  (45.03)       -0.1428   1.7985
    95    65   3.1846    ±0.0124    10.65 (19.08)       -0.1141   1.9139
          75   6.4117    ±0.0199    9.07  (49.42)       -0.0029   1.9814
          85   11.4478   ±0.0293    8.03  (106.99)      0.1898    1.8937
    105   65   3.1835    ±0.0135    8.98  (22.65)       0.1487    1.9628
          75   6.4120    ±0.0209    8.21  (54.59)       0.1493    2.0060
          85   11.4458   ±0.0295    7.88  (108.94)      0.2503    1.8737
          95   18.6060   ±0.0345    9.83  (149.07)      0.5594    1.4343
    115   65   3.1817    ±0.0148    7.38  (27.54)       0.3062    1.6884
          75   6.4112    ±0.0209    8.18  (54.79)       0.1928    1.8119
          85   11.4470   ±0.0289    8.24  (104.16)      0.2599    1.7430
          95   18.6061   ±0.0346    9.79  (149.76)      0.5755    1.4313

Table 3: Variance reduction for different strikes and barrier levels in the local volatility model.
Results with non-trivial driver ϕ in the Black-Scholes model

    K     L    Mean      CI 95%     Var.Ratio (Crude)   α         β
    85    65   2.5738    ±0.0115    13.49 (16.56)       -0.1752   1.6685
          75   6.0489    ±0.0186    14.26 (43.39)       0.0493    1.9191
    95    65   2.5704    ±0.0110    14.64 (15.26)       0.0524    1.9987
          75   6.0492    ±0.0190    13.67 (45.25)       0.1557    2.0560
          85   11.5970   ±0.0301    12.23 (112.92)      0.4108    2.1226
    105   65   2.5687    ±0.0122    12.03 (18.56)       0.3888    2.1423
          75   6.0548    ±0.0206    11.66 (53.08)       0.3895    2.1720
          85   11.5953   ±0.0308    11.67 (118.32)      0.4524    2.1608
          95   19.2882   ±0.0348    17.17 (151.04)      0.6619    1.7910
    115   65   2.5706    ±0.0135    9.75  (22.90)       0.5473    1.8903
          75   6.0530    ±0.0211    11.16 (55.42)       0.4591    1.9371
          85   11.5976   ±0.0297    12.55 (109.98)      0.4807    2.0008
          95   19.2958   ±0.0347    17.21 (150.67)      0.7217    1.6380

Table 4: Variance reduction for different strikes and barrier levels in the Black-Scholes model (r = 0.04, σ = 0.7).
Computing VaR and CVaR by Monte Carlo simulation

O. Bardou & N. Frikha & G. Pagès
(*) LPMA, Univ. Paris 6
—
1 Definitions

Let X : (Ω, A, P) → R^d be a random vector (structure variable) and let ϕ : R^d → R be a Borel function representing a loss.

Definition (Value-at-Risk). Let α ∈ (0,1) be the confidence level.

    VaR_α(ϕ(X)) := inf { ξ | P(ϕ(X) ≤ ξ) ≥ α },

i.e. the lowest α-quantile of ϕ(X). If ϕ(X) has a continuous distribution (no atom), then P(ϕ(X) ≤ ξ) = α; if, moreover, the distribution of ϕ(X) has no “hole”, then VaR_α(ϕ(X)) is unique.

Definitions. (a) Conditional Value-at-Risk (CVaR) (at level α). As soon as ϕ(X) ∈ L¹(P), the conditional Value-at-Risk is defined by

    CVaR_α(ϕ(X)) := E[ ϕ(X) | ϕ(X) ≥ VaR_α(ϕ(X)) ].

It represents the mean loss once in the “stress” zone.

(b) Ψ-Conditional Value-at-Risk (Ψ-CVaR). If Ψ(ϕ(X)) ∈ L¹(P),

    Ψ-CVaR_α(ϕ(X)) := E[ Ψ(ϕ(X)) | ϕ(X) ≥ VaR_α(ϕ(X)) ].

It provides more insight into the distribution of ϕ(X) in the “stress” zone.
2 Rockafellar and Uryasev representation formula

Proposition. Let V and V_Ψ be the functions defined by

    V(ξ) = E[v(ξ, X)]  and  V_Ψ(ξ) = E[w_Ψ(ξ, X)],   (4)

where

    v(ξ, x) := ξ + (1/(1 − α)) (ϕ(x) − ξ)_+   (5)

and

    w_Ψ(ξ, x) := Ψ(ξ) + (1/(1 − α)) ( Ψ(ϕ(x)) − Ψ(ξ) ) 1_{ϕ(x)≥ξ}.   (6)

Assume L(ϕ(X)) is continuous. Then the function V is convex and differentiable, and any point of the set

    arg min V = {V' = 0} = { ξ | P(ϕ(X) ≤ ξ) = α }

is a VaR_α(ϕ(X)). Furthermore,

    CVaR_α(ϕ(X)) = min_{ξ∈R} V(ξ)  and, for every ξ*_α ∈ arg min V,  Ψ-CVaR_α(ϕ(X)) = V_Ψ(ξ*_α).

Proof. The function V is convex since the functions ξ ↦ (ϕ(x) − ξ)_+ are convex for every x ∈ R^d. V is differentiable with derivative V'(ξ) = 1 − (1/(1 − α)) P(ϕ(X) > ξ) and reaches its absolute minimum at any ξ*_α satisfying

    P(ϕ(X) > ξ*_α) = 1 − α,   i.e.  P(ϕ(X) ≤ ξ*_α) = α.

Moreover,

    V(ξ*_α) = ξ*_α + E[(ϕ(X) − ξ*_α)_+] / P(ϕ(X) > ξ*_α)
            = ( ξ*_α E[1_{ϕ(X)>ξ*_α}] + E[(ϕ(X) − ξ*_α)_+] ) / P(ϕ(X) > ξ*_α)
            = E[ ϕ(X) | ϕ(X) > ξ*_α ].

Likewise V_Ψ(ξ*_α) = Ψ-CVaR_α(ϕ(X)). ♦
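The proposition is easy to check numerically (my own sketch, not part of the talk) for ϕ(X) = X ~ N(0,1) and α = 0.95, where VaR ≈ 1.6449 and CVaR = φ(VaR)/(1 − α) ≈ 2.0627: minimizing a Monte Carlo estimate of V over a grid recovers both.

```python
import numpy as np

# Grid minimization of V(xi) = xi + E[(X - xi)_+]/(1 - alpha), estimated by
# Monte Carlo with common samples, for X ~ N(0,1) and alpha = 0.95.

rng = np.random.default_rng(4)
alpha = 0.95
x = rng.standard_normal(1_000_000)

xis = np.linspace(0.0, 3.0, 151)
V = np.array([xi + np.maximum(x - xi, 0.0).mean() / (1.0 - alpha) for xi in xis])
i = int(V.argmin())
var_est, cvar_est = xis[i], V[i]          # minimizer ~ VaR, minimum ~ CVaR
```

V is flat near its minimum (V' vanishes there), so the minimum value (CVaR) is estimated much more accurately than the minimizer (VaR), a point that matters for the stochastic algorithms below.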
3 Stochastic gradient and companion procedure

VaR_α computation. The derivative V' admits the representation

    V'(ξ) = E[H(ξ, X)]  with  H(ξ, x) := ∂v/∂ξ (ξ, x) = 1 − (1/(1 − α)) 1_{ϕ(x)≥ξ}.

Stochastic gradient defined by

    ξ_{n+1} = ξ_n − γ_{n+1} H(ξ_n, X_{n+1}),   n ≥ 0,   ξ_0 ∈ L¹(P),   (7)

where
– (X_n)_{n≥1} is an i.i.d. sequence of r.v. distributed as X, independent of ξ_0,
– (γ_n)_{n≥1} is a step sequence (decreasing to 0) satisfying

    (A1) ≡ Σ_{n≥1} γ_n = +∞  and  Σ_{n≥1} γ_n² < +∞.
Ψ-CVaR_α computation. Temporarily assume that ξ_n → ξ*_α = VaR_α(ϕ(X)) P-a.s.

– Naive idea: compute the function V_Ψ at ξ*_α = VaR_α(ϕ(X)),

    Ψ-CVaR_α = V_Ψ(ξ*_α) = E[w_Ψ(ξ*_α, X)],

using a regular Monte Carlo simulation,

    (1/n) Σ_{k=0}^{n−1} w_Ψ(ξ*_α, X'_{k+1}),   X'_k i.i.d. with distribution L(X).

– Alternative idea: an adaptive “companion” procedure of the quantile search algorithm: replace ξ*_α by ξ_k at step k, i.e.

    C_n = (1/n) Σ_{k=0}^{n−1} w_Ψ(ξ_k, X_{k+1}),   n ≥ 1,   C_0 = 0.

(C_n)_{n≥0} is the sequence of empirical means of the non-i.i.d. sequence (w_Ψ(ξ_k, X_{k+1}))_{k≥0}; it can be written recursively:

    C_{n+1} = C_n − (1/(n+1)) ( C_n − w_Ψ(ξ_n, X_{n+1}) ),   n ≥ 0,   C_0 = 0.

– Why γ_n rather than 1/n?... One may also average with the step γ_{n+1}:

    C_{n+1} = C_n − γ_{n+1} ( C_n − w_Ψ(ξ_n, X_{n+1}) ),   n ≥ 0,   C_0 = 0.

The resulting algorithm reads

    ξ_{n+1} = ξ_n − γ_{n+1} H(ξ_n, X_{n+1}),   ξ_0 ∈ L¹(P),   n ≥ 0,
    C_{n+1} = C_n − γ_{n+1} ( C_n − w_Ψ(ξ_n, X_{n+1}) ),   C_0 = 0,   n ≥ 0.
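A sketch of the coupled procedure for ϕ(X) = X ~ N(0,1), Ψ = Id and α = 0.95 (true values VaR ≈ 1.6449, CVaR ≈ 2.0627); the step γ_n = 20/(n + 200) for ξ and the 1/n-weighted companion are ad hoc tuning choices, not the authors' settings.

```python
import numpy as np

# Coupled recursion: xi_{n+1} = xi_n - gamma_{n+1} H(xi_n, X_{n+1}) for the
# quantile, plus the running mean C_n of w(xi_k, X_{k+1}) for the CVaR.

rng = np.random.default_rng(5)
alpha, n_iter = 0.95, 400_000
xs = rng.standard_normal(n_iter)

xi, C = 0.0, 0.0
for n in range(1, n_iter + 1):
    x = xs[n - 1]
    w = xi + max(x - xi, 0.0) / (1.0 - alpha)        # w(xi_n, X_{n+1}), Psi = Id
    C -= (C - w) / n                                 # companion empirical mean
    xi -= (20.0 / (n + 200)) * (1.0 - (x >= xi) / (1.0 - alpha))  # quantile step
```

Note that C is computed with the pre-update ξ_n, matching the summand w_Ψ(ξ_k, X_{k+1}); because V is flat at its minimum, the early (inaccurate) iterates of ξ barely bias C.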
3.1 The a.s. convergence of the quantile search algorithm

A (slightly) more general result than in Part III...

Theorem (Extended Robbins-Monro Theorem). Let H : R^q × R^d → R^q be a Borel function and X an R^d-valued random vector such that E[|H(ξ, X)|] < ∞ for every ξ ∈ R^q. Set

    ∀ ξ ∈ R^q,  h(ξ) = E[H(ξ, X)].

Suppose that the function h is continuous and that T* := {h = 0} satisfies

    ∀ ξ ∈ R^q \ T*, ∀ ξ* ∈ T*,  ⟨ξ − ξ*, h(ξ)⟩ > 0.   (8)

Let (γ_n)_{n≥1} be a decreasing step sequence satisfying

    (A1) ≡ Σ_n γ_n = +∞  and  Σ_n γ_n² < +∞.

Suppose that

    ∀ ξ ∈ R^q,  ||H(ξ, X)||_2 ≤ C(1 + |ξ|)   (9)

(which implies |h(ξ)|² ≤ C(1 + |ξ|²)). Let (X_n)_{n≥1} be an i.i.d. sequence of r.v. with distribution L(X), let ξ_0 be a random vector independent of (X_n)_{n≥1} with E[|ξ_0|] < ∞, all defined on the same probability space (Ω, A, P). Let (r_n)_{n≥1} be an F_n-measurable remainder sequence satisfying

    Σ_n γ_n |r_n| < ∞.

Then the recursive procedure defined by

    ξ_n = ξ_{n−1} − γ_n H(ξ_{n−1}, X_n) + γ_n r_n,   n ≥ 1,

satisfies: there exists a r.v. ξ_∞ such that ξ_n → ξ_∞ a.s. and ξ_∞ ∈ T* a.s. The convergence also holds in L^p(P), p ∈ (0,2).

Application to quantile search:

    H(ξ, x) = 1 − (1/(1 − α)) 1_{ϕ(x)≥ξ},   r_n ≡ 0
    ⇓
    ξ_n → ξ*_α a.s., where ξ*_α is a VaR_α(ϕ(X)) (an α-quantile of ϕ(X)).
3.2 The a.s. convergence of the companion algorithm

Representation of γ_n:

    γ_n = γ_0 Δ_n / S_n,   n ≥ 0,   with  S_n = Σ_{k=0}^n Δ_k,   γ_0 := sup_{n≥1} γ_n + 1.   (10)

Conversely,

    Δ_{n+1} = Δ_n (γ_{n+1}/γ_n) ( γ_0 / (γ_0 − γ_{n+1}) ),   n ≥ 0,   Δ_0 = 1.

Elementary computations show that

    Σ_n γ_n = +∞  ⇒  lim_{n→+∞} S_n = +∞,

and

    C_n = (1/S_n) Σ_{k=0}^{n−1} Δ_{k+1} w_Ψ(ξ_k, X_{k+1})
        = (1/S_n) ( Σ_{k=0}^{n−1} Δ_{k+1} V_Ψ(ξ_k) + Σ_{k=0}^{n−1} Δ_{k+1} ΔN_{k+1} ),

where

    ΔN_{k+1} = w_Ψ(ξ_k, X_{k+1}) − E[ w_Ψ(ξ_k, X_{k+1}) | F_k^X ] = w_Ψ(ξ_k, X_{k+1}) − V_Ψ(ξ_k)

is an F_n^X-martingale increment.

– The first sum on the right-hand side converges to V_Ψ(ξ*_α) = Ψ-CVaR_α(ϕ(X)) owing to the continuity of V_Ψ at ξ*_α and Cesàro's Lemma.
– The convergence to 0 of the second sum will follow from the a.s. convergence of the martingale

    N_n^γ := Σ_{k=1}^n γ_k ΔN_k,   n ≥ 1,

and the Kronecker Lemma, since γ_n = γ_0 Δ_n / S_n.

The sequence (N_n^γ)_{n≥1} is an F_n^X-martingale since the ΔN_k's are martingale increments, and

    E[ (ΔN_n)² | F_{n−1}^X ] ≤ (1/(1 − α)²) E[ (Ψ(ϕ(X)) − Ψ(ξ))² ]_{|ξ=ξ_{n−1}}.

The continuity of Ψ at ξ*_α and the a.s. convergence of ξ_k toward ξ*_α imply that

    sup_{n≥1} E[ (ΔN_n)² | F_{n−1}^X ] < ∞  a.s.

Consequently, Assumption (A1) implies

    ⟨N^γ⟩_∞ = Σ_{n≥1} γ_n² E[ (ΔN_n)² | F_{n−1}^X ] < ∞,

which in turn yields the a.s. convergence of (N_n^γ)_{n≥1}. ♦
3.3 Rate of convergence (I): CLT

Set Z_n = (ξ_n, C_n), with γ_n = κ/n, n ≥ 1. Under appropriate assumptions, with z* := (VaR_α(ϕ(X)), Ψ-CVaR_α(ϕ(X))),

    √n (Z_n − z*) → N( 0, √κ D*(κ) )  in distribution,

where D*(κ) depends on v, w_Ψ and [Dh(z*)]^{−1}, with

    h(ξ, c) = ( E[H(ξ, X)], c − E[w_Ψ(ξ, X)] ),

and κ > 1/(2 ℜ(λ_min)), λ_min being the eigenvalue of Dh(z*) with the lowest real part. To be compared to the regular Newton-Raphson algorithm (if d = 1 ...).

Question: how to minimize the asymptotic variance √κ D*(κ) as a function of κ?
3.4 Rate of convergence (II): Averaging principle

Theorem (Ruppert and Polyak's Averaging Principle). Suppose that the R^d-valued sequence (Z_n)_{n≥0} is defined recursively by

    Z_{n+1} = Z_n − γ_{n+1} ( h(Z_n) + ε_{n+1} ),

where (ε_n)_{n≥1} is an L^{2+η}-bounded sequence of martingale increments such that

    ∃ Γ ∈ S^+(d, R)  such that  E[ ε_{n+1} ε_{n+1}^t | F_n ] → Γ  a.s.

Assume that {h = 0} = {z*}, that M = Dh(z*) exists, lies in GL(d, R) and is repulsive (i.e. all its eigenvalues have positive real parts), and that

    h(z) = Dh(z*)(z − z*) + O(|z − z*|²).

Set γ_n = γ_1/n^a with 1/2 < a < 1, and

    Z̄_{n+1} := (Z_0 + ... + Z_n)/(n + 1) = Z̄_n − (1/(n+1)) ( Z̄_n − Z_n ),   n ≥ 0.

Then

    √n ( Z̄_n − z* ) → N(0, D*)  in distribution,

with D* = M^{−1} Γ (M^{−1})^T, which is optimal in terms of “statistical efficiency”.
Comments. A proof of this result is given in Duflo’s book Algorithmes stochastiques (p.169).
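The averaging principle can be sketched on the quantile recursion itself (illustrative constants of my own; Gaussian toy case with α = 0.95, true VaR ≈ 1.6449): slow steps γ_n = n^{−3/4} plus a running average of the iterates.

```python
import numpy as np

# Ruppert-Polyak averaging on the quantile search: gamma_n = n^{-3/4}
# (1/2 < a < 1) and bar-Z_n the running average of the iterates.

rng = np.random.default_rng(6)
alpha, n_iter = 0.95, 400_000
xs = rng.standard_normal(n_iter)

xi, xbar = 0.0, 0.0
for n in range(1, n_iter + 1):
    xi -= (1.0 / n**0.75) * (1.0 - (xs[n - 1] >= xi) / (1.0 - alpha))
    xbar += (xi - xbar) / n            # bar-Z_{n} = bar-Z_{n-1} - (bar-Z_{n-1} - Z_{n-1})/n
```

The averaged iterate reaches the optimal CLT variance without having to tune the constant κ of the γ_n = κ/n schedule.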
Application to VaR_α(ϕ(X)) and Ψ-CVaR_α(ϕ(X)) computation. Assume the distribution of ϕ(X) has a positive density f_{ϕ(X)} > 0 and set ξ*_α := VaR_α(ϕ(X)) (unique). Then D* is given by the symmetric matrix

    D* = ( a_11  a_12 )
         ( a_12  a_22 ),

with

    a_11 = α(1−α) / f_{ϕ(X)}(ξ*_α)²,
    a_12 = ( α / ((1−α) f_{ϕ(X)}(ξ*_α)) ) E[ (Ψ(ϕ(X)) − Ψ(ξ*_α)) 1_{ϕ(X)≥ξ*_α} ],
    a_22 = (1/(1−α)²) Var( (Ψ(ϕ(X)) − Ψ(ξ*_α)) 1_{ϕ(X)≥ξ*_α} ).
4 Speeding up the procedure

By the way, VaR_α(ϕ(X)) is about rare events! If α ≈ 1, the matrix D* explodes! In practice, the algorithm remains frozen!

(Classical) solution: variance reduction by importance sampling! Assume L(X) = p(x)λ_d(dx).

Example: twist the distribution of X by mean translation to minimize both

    Var( 1_{ϕ(X)≥ξ*_α} )                        (asymptotic variance of VaR_α),
    Var( (Ψ(ϕ(X)) − ξ*_α) 1_{ϕ(X)≥ξ*_α} )      (asymptotic variance of CVaR_α),

i.e. find parameters θ* and µ* that minimize the two functions

    Q*_1(θ) := E[ 1_{ϕ(X)≥ξ*_α} p(X) / p(X − θ) ],
    Q*_2(µ) := E[ (Ψ(ϕ(X)) − ξ*_α)² 1_{ϕ(X)≥ξ*_α} p(X) / p(X − µ) ].

Objections. 1. Explosion! We know how to cope with it (see Part II). 2. The target ξ*_α is unknown! We make it “implicit”.
4.1 General log-concave setting from Part II

If p satisfies the assumptions of the log-concave setting,

    ∃ a ∈ [1, 2] such that
    (i)  |∇p/p|(x) = O(|x|^{a−1}) as |x| → ∞,
    (ii) ∃ ρ > 0 such that log p(x) + ρ|x|^a is convex,

then the first step of the machinery of Part II yields: the optimal variance reducers (θ*_α, µ*_α) are zeros of

    ∇Q*_1(θ) = 0  and  ∇Q*_2(µ) = 0,  where