Quantization and martingale couplings

(1)

HAL Id: hal-03083022

https://hal.archives-ouvertes.fr/hal-03083022

Preprint submitted on 18 Dec 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Quantization and martingale couplings

Benjamin Jourdain, Gilles Pagès

To cite this version:

Benjamin Jourdain, Gilles Pagès. Quantization and martingale couplings. 2020. �hal-03083022�

(2)

Quantization and martingale couplings

Benjamin Jourdain^∗‡ Gilles Pag`es^†‡

Abstract

Quantization provides a very natural way to preserve the convex order when approximating two ordered probability measures by two finitely supported ones. Indeed, when the convex order dominating original probability measure is compactly supported, it is smaller than any of its dual quantizations while the dominated original measure is greater than any of its stationary (and therefore any of its optimal) quadratic primal quantization. Moreover, the quantization errors then correspond to martingale couplings between each original probability measure and its quantization. This permits to prove that any martingale coupling between the original probability measures can be approximated by a martingale coupling between their quantizations in Wassertein distance with a rate given by the quantization errors but also in the much finer adapted Wassertein distance. As a consequence, while the stability of (Weak) Martingale Op- timal Transport problems with respect to the marginal distributions has only been established in dimension 1 so far, their value function computed numerically for the quantized marginals converges in any dimension to the value for the original probability measures as the numbers of quantization points go to∞.

AMS Subject Classification (2010): 60E15, 65C50, 65D32, 60J22, 60G42.

Introduction

For d∈N^∗ ^and ^{µ, ν} ^{in the set} ^P⁽R^d) of probability measures on R^d, we say that µ is smaller thanν in the convex order and denoteµ≤_cvx ν if

∀ϕ:R^d^→R^convex ^, Z

R^d

ϕ(x)µ(dx)≤ Z

R^d

ϕ(y)ν(dy), (0.1)

when the integrals make sense (since any real valued convex function is bounded from below by an affine function R

R^dϕ(x)µ(dx) makes sense in R^{∪ {+∞}} ^{as soon as} ^R_R^d^|x|µ(dx) ^< ^{+∞). For} p ≥ 1, we denote by P_p(R^d^{) =} ^{µ^{∈ P(}R^d^{) :} ^R_R^d^|x|^p^µ(dx) ^< ^+∞} the Wasserstein space with indexpoverR^d^{. When}^{µ, ν} ^{∈ P}¹⁽R^d), according to the Strassen theorem [50], µ≤_cvxν if and only if there exists a martingale coupling between µ and ν that is a probability measure π(dx, dy) on R^d^×R^dwith marginals R

y∈R^dπ(dx, dy) and R

x∈R^dπ(dx, dy) equal toµ(dx) and ν(dy) respectively such that π(dx, dy) = µ(dx)π_x(dy) for some Markov kernel π_x(dy) with the martingale property:

∗Universit´e Paris-Est, Cermics (ENPC), INRIA, F-77455 Marne-la-Vall´ee, France. E-mail:

[email protected]

†Laboratoire de Probabilités, Statistique et Modélisation, UMR 8001, Campus Pierre et Marie Curie, Sorbonne Université case 158, 4, pl. Jussieu, F-75252 Paris Cedex 5, France. E-mail:[email protected]

‡This research benefited from the support of the “Chaire Risques Financiers”, Fondation du Risque

(3)

∀x∈R^d^,^π^x^{∈ P}¹⁽R^d^{) and} ^R_R^d^yπ^x^{(dy) =}x. We denote byP(µ, ν) the set of probability measures on R^d^×R^d with respective marginalsµ and ν and by M(µ, ν) the subset ofP(µ, ν) consisting of martingale couplings.

Let (µ, ν) belong to the set P_≤× P_p(R^d) of couples of elements of P_p(R^d) with the first one smaller than the second in the convex order. In its simplest form, the Martingale Optimal Transport problem consists in computing

Vc(µ, ν) = inf

π∈M(µ,ν)

Z

R^d^×R^d

c(x, y)π(dx, dy)

and the optimal martingale couplings achieving this infimum for some measurable cost function c:R^d^×R^d ^→ R. When the interest rate is zero, for an exotic option written on dtraded assets with payoff given by the functioncof their prices at timessandt, thenV_c(µ, ν) (resp. −V_−c(µ, ν)) provides a robust lower (resp. upper) price bound when µ and ν are the respective joint laws of these dassets at times s and t (for instance obtained by calibration of a model to vanilla option prices). Since its introduction in [9], this MOT problem has received recently a great attention in the financial mathematics literature. In particular, the structure of martingale optimal transport couplings [11, 13, 15, 22, 28], continuous time formulations [16, 21, 27], links with the Skorokhod embedding problem [8], numerical methods [2, 1, 14, 25, 26] and stability properties [6, 33, 52] have been investigated. The MOT problem is a particular instance where the measurable cost function C:R^d^{× P}¹⁽R^d⁾^→Ris linear in the measure component (C(x, η) =R

R^dc(x, y)η(dy)) of the Weak Martingale Optimal Transport problem

π∈M(µ,ν)inf Z

R^d

C(x, πx)µ(dx)

introduced in [6] by adding the martingale constraint to the Weak Optimal Transport problem introduced in [23].

To devise a numerical procedure devoted to the computation of the value and of the optimal couplings in the MOT and WMOT problems, a first natural step consists in approximating µ and ν by finitely supported probability measures which are still in the convex order. To our best knowledge, few studies consider the problem of preserving the convex order while approximating a sequence of probability measures. We mention the thesis of Baker [7] who proposes the following construction in dimension d = 1. Let for η ∈ P(R^{) and} ^u ^∈ ^(0,^1), ^Fη⁻¹(u) = inf{x ∈ R ^: η((−∞, x]) ≥ u} be the quantile of η of order u. For (µ, ν) ∈ P_≤× P₁(R^{) and} ^{N, K} ^∈ N^∗ ^with N/K∈N^∗^{, one has}

1 N

N

X

i=1

δ

NR_Nⁱ

i−1 N

Fµ⁻¹(u)du

≤_cvx 1 K

K

X

i=1

δ

KR_Kⁱ

i−1 K

Fν⁻¹(u)du

.

Dual (or Delaunay) quantizationintroduced by Pag`es and Wilbertz [43] and further studied in [44, 45, 46] gives another way to preserve the convex order in dimension d= 1 when using the same grid to quantize both probability measures.

In two recent papers [1, 2], Alfonsi, Corbetta and Jourdain propose to restore for (µ, ν) ∈ P_≤× P₁(R^d) the convex ordering from any finitely supported approximations ˜µ and ˜ν of µ and ν. In dimension d= 1, one may define the increasing (resp. decreasing) convex order by adding

(4)

the constraint that the test functionϕis non-decreasing (resp. non-increasing) in (0.1). According to [1], the convex order restauration can be achieved by keeping ˜µ (resp. ν) and replacing ˜˜ ν (resp. ˜µ) by the supremum (resp. infimum) between ˜µand ˜ν for the increasing convex order when R

Rxν(dx)˜ ≤R

Rxµ(dx) and the decreasing convex order when˜ R

Rx˜ν(dx)≥R

Rxµ(dx). The convex,˜ increasing convex and decreasing convex orders are nicely characterized in terms of the potential function obtained as the anti-derivative of the cumulative distribution function (or of the quantile function). The supremum and infimum of two probability measures for one of these orders can be computed using their potential functions. For a general dimension d, [2] suggests to keep ˜ν and replace ˜µ by its projection on the set of probability measures dominated by ˜ν for the quadratic Wasserstein distance W2 (see (0.2) below for the definition of this distance). This projection can be computed by solving a quadratic optimization problem with linear constraints.

In the present paper, when ν is compactly supported, we investigate the combined approximation of µ by some quadratic-optimal primal quantization and ofν by some dual quantization.

By construction, any quadratic-optimal primal quantization of µ satisfies a stationarity property which implies that it is smaller than µ in the convex order. On the other hand, any dual quantization of ν is greater than this probability measure in the convex order. Therefore the convex order betweenµand ν is preserved by this combined approximation. Notice that, in contrast with the previous approaches, it cannot be generalized to the convex order preserving approximation of more than two probability measures. Moreover the dual quantization approximation is only possi- ble for compactly supported probability measures. In contrast with these restrictions, we will see that the studied approach proves to provide robust approximations of (Weak) Martingale Optimal Transport problems even in dimensiond≥2.

The first section of the paper is devoted to primal (or Voronoi) quantization. For µ∈ P_p(R^d⁾ with p ≥ 1, we show that an element of the set P(R^d^{, N}) of probability measures on R^d ^whose support contains at most N points is an Lp-optimal N-quantization of µ iff it is a Wp-projection of µon P(R^d^{, N}) where the Wasserstein distance W_p is defined by

Wp(µ, ν)^p = inf

π∈P(µ,ν)

Z

R^d^×R^d

|x−y|^pπ(dx, dy). (0.2) In the quadratic p = 2 case, any stationary and therefore any optimal N-quantization ˆµ^N of the measure µis smaller thanµ in the convex order. MoreoverW₂(ˆµ^N, µ) =M₂(ˆµ^N, µ) where

M_p(η, ν) = inf

π∈M(η,ν)

Z

R^d^×R^d

|x−y|^pπ(dx, dy) for (η, ν)∈ P_≤× P₁(R^d^). ^(0.3) This enables us to check that any quadratic-optimal primal quantizer for µ remains quadratic- optimal for each probability measure smaller than µ and greater than its associated quantization for the convex order.

The second section deals with the dual quantization of a compactly supported probability measure µ which is obtained by minimizing Mp(µ, η) over the subset P_≥µ(R^d^{, N}^{) of} ^P(R^d^{, N}⁾ consisting in probability measures greater than µ in the convex order. Since M(µ, η) ⊂ P(µ, η), M_p(µ, η) ≥ W_p(µ, η) and, in general, the inequality is strict. It turns out, that, even in the quadratic p = 2 case, Lp-optimal dual quantizations of µ are not necessarily Wp-projections of µ on P_≥µ(R^d^{, N}). Nevertheless, we check that any quadratic-optimal dual quantizer of µ remains quadratic-optimal for each probability measure greater thanµand smaller than its associated dual quantization in the convex order.

(5)

In the third section, we consider two probability measures µ, ν ∈ P(R^d) such that µ ≤_cvx ν with ν compactly supported. Since the quantization errors between µ and any of its quadratic- optimal primalN-quantization ˆµ^N and between ν and any of itsLp-optimal dualK-quantization ˇ

ν^K correspond to martingale couplings, we are able to approximate in Wasserstein distance on P(R^d^×R^d) any martingale coupling π ∈ M(µ, ν) by a martingale coupling ¯π^N,K ∈ M(ˆµ^N,νˇ^K) with a rate given by the quantization errors. We also check that, asN, K → ∞, ¯π^N,Kconverges toπ for the much finer adapted Wasserstein distance defined in (0.5) below which captures the temporal structure of probability distributions with two time marginals. Numerous financial applications of this adapted Wasserstein distance have been investigated in [3]. According to [4], the topology induced by this distance is equal to the other adapted topologies which had been introduced in particular in view of financial applications.

The adapted Wasserstein is particularly well suited to deal with Weak Martingale Optimal Transport problems. While their stability with respect to the marginal distributions has only been established in dimension 1 so far [10], this enables us to check in Section 4 that their value function computed numerically for the quantized marginals converge in any dimension to the value for the original probability measures as the numbers of quantization points go to ∞.

For the reader’s convenience, we now list the notations and basic properties that have been introduced in the above text.

Definitions and notations.

Let d∈N^∗

• | · |denotes the canonical Euclidean norm onR^d^.

•conv(A) denotes the (closed) convex hull ofA⊂R^dand card(A) or|A|its cardinality (depending on the context).

•LetP(R^d) denote the set of probability measures onR^dendowed with its Borel sigma-fieldB(R^d^).

• Forp≥1, let P_p(R^d^{) =}^{µ^{∈ P(}R^d^{) :}^R_R^d^|x|^p^µ(dx)^<^∞}.

• Forp≥1, let P_≤× P_p(R^d^{) =}^{{(µ, ν)}^{∈ P}p(R^d⁾^{× P}p(R^d^{) :}^µ^≤cvx ν}.

• For every integer N ≥1, we denote by P(R^d^{, N}) the set of distributions on R^d whose support contains at mostN points.

• Forµ, ν ∈ P(R^d^{), let} P(µ, ν) =n

π∈ P(R^d^×R^d^{) :} ^∀A^{∈ B(}R^d^{), π(A}^×R^d^{) =}^{µ(A) and}^π(R^d^×^{A) =}^ν(A) o

denote the set of couplings betweenµand ν. Forπ∈ P(µ, ν), we denote byπ_x(dy) the (µ(dx) a.e.

unique) Markov kernel such that π(dx, dy) =µ(dx)πx(dy).

• Forµ, ν ∈ P₁(R^d^{) with}^µ^≤^cvx^{ν, let} M(µ, ν) =

π ∈ P(µ, ν) : µ(dx) a.e.

Z

R^d

yπx(dy) =x

denote the (non empty according to the Strassen theorem) set of martingale couplings between µ and ν.

• Forp≥1 and µ, ν∈ P(R^d^{), let} Wp(µ, ν) = inf

π∈P(µ,ν)

Z

R^d^×R^d

|x−y|^pπ(dx, dy) 1/p

≤ ∞

(6)

denote the Wasserstein distance with indexp. This is a complete metric onP_p(R^d^).

• Forp≥1 and (µ, ν)∈ P_≤× P₁(R^d^{), let} Mp(µ, ν) = inf

π∈M(µ,ν)

Z

R^d^×R^d

|x−y|^pπ(dx, dy) 1/p

≤ ∞.

Since M(µ, ν)⊂ P(µ, ν), we clearly have, W_p(µ, ν)≤M_p(µ, ν). In the quadratic p= 2 case, when µ, ν ∈ P₂(R^d^),

∀π ∈ M(µ, ν), Z

R^d^×R^d

|y−x|²π(dx, dy) = Z

R^d

|y|²ν(dy)−2 Z

R^d

x.

Z

R^d

yπx(dy)µ(dx) + Z

R^d

|x|²µ(dx)

= Z

R^d

|y|²ν(dy)− Z

R^d

|x|²µ(dx)

so that

M₂²(µ, ν) = Z

R^d

|y|²ν(dy)− Z

R^d

|x|²µ(dx). (0.4)

• Forπ ∈ P(µ, ν) and ˜π ∈ P(˜µ,˜ν) we consider the adapted Wasserstein distance with indexp≥1 betweenπ(dx, dy) =µ(dx)πx(dy) and ˜π(d˜x, d˜y) = ˜µ(d˜x)˜πx˜(d˜y) :

AWp(π,π) =˜ inf

m∈P(µ,˜µ)

Z

R^d^×R^d

(|x−x|˜^p+W_p^p(πx,π˜x˜))m(dx, d˜x) 1/p

. (0.5)

1 Primal (Voronoi) quantization and Wasserstein projection

In this subsection, we make a connection between primal quantization and various projections (in the Wasserstein sense), including, in the quadratic case, with the one mentioned above in the introduction. Let us first recall the following basic facts about the (primal) Voronoi quantization of µ∈ P_p(R^d^{) with}^p^≥1 (see [24, 41, 37] among others):

– Let Γ = {x₁, . . . , x_N} ⊂ R^d denote a finite subset of size N. The L^p-quantization error modulusep(Γ, µ) satisfies

e_p(Γ, µ)^p = Z

R^d

|x−Proj_Γ(x)|^pµ(dx) (1.6) where Proj_Γdenotes a Borel nearest neighbour projection on Γ satisfying|x−Proj_Γ(x)|= dist(x,Γ).

IfX ∼µ, the random variable Proj_Γ(X) with law µ◦Proj⁻¹_Γ is called Γ-quantization of X.

– For any level N ≥1, there exists an optimal grid orN-quantizerΓ_p,N such that ep,N(µ) := inf

n

ep(Γ, µ) : Γ⊂R^d^, ^{|Γ| ≤}^N o

=ep Γp,N, µ .

When card(supp(µ)) ≤ N then Γ_p,N = supp(µ) and when card(supp(µ)) > N, then Γ_p,N has exactlyN pairwise distinct elements. Moreover, Theorem 4.2 in [24] ensures that

µ

{x∈R^d^:^∃y^{6= ˜}^y^∈^Γ^p,N^, ^|x⁻^y|⁼^|x⁻^y|^˜ ^{= dist(x,}^Γ^p,N^)}

= 0, (1.7)

(7)

so thatµ◦Proj⁻¹_Γ

p,N does not depend on the choice of the Borel nearest neighbour projection Proj_Γ_p,N on Γ_p,N. We denote this probability measure by ˆµ^Γ^p,N. In the same way, whenX ∼µ, Proj_Γ

p,N(X) a.s. does not depend on the Borel nearest neighbour projection Proj_Γ_p,N and is denoted ˆX^Γ^p,N.

– In the quadratic case (p= 2), any optimal quantization grid Γ_2,N (possibly not unique) and its induced quantization ˆX^Γ^2,N = Proj_Γ_2,N(X) with distribution ˆµ^Γ^2,N satisfy a stationarity (or self-consistency) property (see e.g. [24], [41] or [37], Proposition 5.1 among others) that is

E ^X^|^X^ˆ^Γ^2,N^{= ˆ}^X^Γ^2,N ^{so that ˆ}^µ^Γ^2,N ^≤^cvx ^µ. ^(1.8) Indeed, the support of the distribution of E ^X^|^X^ˆ^Γ^2,N is equal to ˜Γ_2,N :={E^(X|^X^ˆ^Γ^2,N ⁼^{x), x}^∈ Γ2,N} (when card(supp(µ)) ≤ N, ˆX^Γ^2,N = X and ˜Γ2,N = Γ2,N = supp(µ)), contains at most N points ande2(˜Γ2,N, µ)²=E^(dist(X,^Γ^˜^2,N⁾²⁾^≤E^(|X⁻E ^X^|^X^ˆ^Γ^2,N^|²). As a consequence,

e2,N(µ)² ≤e2(˜Γ2,N, µ)² ≤E^(|X⁻E ^X^|^X^ˆ^Γ^2,N^|²^{) =}E^(|X⁻^X^ˆ^Γ^2,N^|²⁾⁻E^(|^X^ˆ^Γ^2,N ⁻E ^X^|^X^ˆ^Γ^2,N^|²⁾

=e_2,N(µ)²−E^(|^X^ˆ^Γ^2,N ⁻E ^X^|^X^ˆ^Γ^2,N^|²^),

so that the second term in the right-hand side vanishes. Therefore the distribution of ( ˆX^Γ^2,N, X) belongs to M(ˆµ^Γ^2,N, µ) and

e²_2,N(µ) =E^[|X⁻^X^ˆ^Γ^2,N^|²^{] =}E^[|X|²^]⁻E^[|^X^ˆ^Γ^2,N^|²^{] =}^M2²(ˆµ^Γ^2,N, µ). (1.9) Proposition 1.1 Let p∈[1,+∞) and µ∈ P_p(R^d^).

(a) Let Γ⊂R^d be a finite set andP(Γ) denote the subset of Γ-supported distributions. Then Wp µ,P(Γ)

:= inf

ν∈P(Γ)Wp(µ, ν) =ep(Γ, µ) :=

dist(.,Γ) L^p(µ)

and for any Borel nearest neighbour projection Proj_Γ onΓ, µ◦Proj⁻¹_Γ is a W_p-projection of µ on P(Γ).

(b) The probability measure ν ∈ P(R^d^{, N}⁾ ^{is a} ^W^p-projection of µ on P(R^d^{, N}⁾ ^iff ^ν ^{= ˆ}^µ^Γ^N ^for some Lp-optimal N-quantizer ΓN of µ. Moreover, Wp(µ,P(R^d^{, N}^{)) =}^e^p,N^(µ).

(c) Quadratic case (p = 2). A subset Γ of R^d with cardinality at most N is a quadratic optimal N-quantizer of µiff there exists a probability measure ν∈ P_≤µ(R^d^{, N}⁾ ^{such that}^{ν(Γ) = 1} ^{and one} of the following equivalent conditions is satisfied

— ν is a W2-projection of µ onP_≤µ(R^d^{, N}⁾ ^i.e.

W₂(µ, ν) =W₂(µ,P≤µ(R^d^{, N}^{)) :=} ^inf

η∈P_≤µ(R^d^,N)

W₂(µ, η),

— R

R^d|x|²ν(dx) = sup_η∈P_≤µ₍_Rd,N)

R

R^d|x|²η(dx).

Moreover, we then have W2(ν, µ) =M2(ν, µ) andν = ˆµ^Γ.

Apart from the interpretation in terms of Wp-projection, the first statement can be found in Lemma 3.4 p.33 [24]. Before proving the proposition, let us state and check some easy consequence of the necessary and sufficient condition in (b).

(8)

Corollary 1.2 Let Γ_2,N be a quadratic optimal N-quantizer of µ∈ P₂(R^d). Then for any probability measure ν such that µˆ^Γ^2,N ≤_cvx ν ≤_cvx µ, Γ_2,N is a quadratic optimal N-quantizer of ν and ˆ

ν^Γ^2,N = ˆµ^Γ^2,N.

Proof of Corollary 1.2. Since ˆµ^Γ^2,N ∈ P_≤ν(R^d^{, N}⁾ ^{⊂ P}^≤µ⁽R^d^{, N}^{) and} ^R_R^d^|x|²^µ^ˆ^Γ^2,N^{(dx) =} sup_η∈P_≤µ₍_Rd,N)

R

R^d|x|²η(dx) by the necessary condition in Proposition 1.1 (c), one has Z

R^d

|x|²µˆ^Γ^2,N(dx) = sup

η∈P≤ν(R^d^,N)

Z

R^d

|x|²η(dx).

Therefore, by the sufficient condition in Proposition 1.1 (c), Γ_2,Nis a quadratic optimalN-quantizer

of ν and ˆν^Γ^2,N = ˆµ^Γ^2,N. 2

Proof of Proposition 1.1. (a) Letν∈ P(Γ) andπ ∈ P(µ, ν). Thenµ(dx) a.e.,πx(Γ) = 1 so that πx(dy) a.e. |x−y| ≥dist(x,Γ). Therefore

Z

R^d^×R^d

|x−y|^pπ(dx, dy)≥ Z

dist(x,Γ)^pµ(dx) =e_p(Γ, µ)^p. Taking the infimum over π ∈ P(µ, ν) and ν∈ P(Γ), we deduce that Wp µ,P(Γ)p

≥ ep(Γ, µ)^p. Now let Proj_Γ denote a Borel nearest neighbour projection on Γ. Since µ◦Proj⁻¹_Γ ∈ P(Γ) and µ◦(I_d,Proj_Γ)⁻¹ ∈ P(µ, µ◦Proj⁻¹_Γ ), we have

W_p^p µ,P(Γ)

≤W_p^p(µ, µ◦Proj⁻¹_Γ )≤ Z

R^d

|x−Proj_Γ(x)|^pµ(dx) =e_p(Γ, µ)^p,

where the equality follows from (1.6). Therefore the inequalities are equalities andµ◦Proj⁻¹_Γ is a Wp-projection ofµ on P(Γ).

(b) Let Γ_p,N be an L^p-optimal N-quantizer of µ. Then, for any subset Γ of R^d with at most N points, ep(Γp,N, µ) =ep,N(µ)≤ep(Γ, µ). With (a), we deduce that

Wp(µ,µˆ^Γ^p,N)≤Wp(µ,P(Γ)).

By taking the infimum over Γ, we deduce that Wp(µ,µˆ^Γ^p,N) ≤ Wp(µ,P(R^d^{, N}^{)) and ˆ}^µ^Γ^p,N ^{is a} Wp-projection ofµ on P(R^d^{, N}^{) so that}

ep,N(µ) =ep(µ,Γp,N) =Wp(µ,µˆ^Γ^p,N) =Wp(µ,P(R^d^{, N}^)).

Conversely, let ν be a projection of µ on P(R^d^{, N}) and let ΓN ={x ∈R^d ^:^ν({x})^>^{0}. The} cardinality of Γ_N is at most N. For any subset Γ ofR^d with at mostN points,

W_p(µ,P(Γ_N))≤W_p(µ, ν) =W_p(µ,P(R^d^{, N}⁾⁾^≤^Wp(µ,P(Γ)). (1.10) Therefore, by (a),e_p(Γ_N, µ)≤e_p(Γ, µ) and since Γ is arbitrary, we deduce that Γ_N is anL^p-optimal N-quantizer of µ. Moreover, the choice Γ = Γ_N in (1.10) implies that the first inequality is an equality so that, with (a),Wp^p(µ, ν) =R

R^ddist(x,ΓN)^pµ(dx). Hence, for anyWp-optimal coupling π∈ P(µ, ν),

Z

R^d^×R^d

(|y−x|^p−dist(x,ΓN)^p)π(dx, dy) = 0.

(9)

Since 1 =ν(Γ_N) =R

R^dπx(Γ_N)µ(dx), π(dx, dy) a.e., |y−x|^p ≥dist(x,Γ_N)^p. Therefore, π(dx, dy) a.e. y ∈Γ_N and |y−x|= dist(x,Γ_N). With (1.7), we conclude that µ(dx) a.e. there is a unique point ˆx^Γ^N ∈ ΓN such that |ˆx^Γ^N −x|= dist(x,ΓN) andπx(dy) =δ_ˆ_x^ΓN(dy). Therefore the second marginalν ofπ is equal to ˆµ^Γ^N.

(c) In the quadratic case (p = 2), by (b), (1.8) and (1.9), any W₂-projection ν of µ on P(R^d^{, N}⁾ is characterized by the existence of a quadratic optimal N-quantizer Γ2,N of µ such that ν = ˆ

µ^Γ^2,N, belongs to the smaller setP_≤µ(R^d^{, N}) and satisfies e²_2,N(µ) =M₂²(ν, µ). Therefore the W2- projections ofµonP(R^d^{, N}^{) and on}^P^≤µ⁽R^d^{, N}) coincide. To conclude the proof, let us check that ν is such a projection iffR

R

R^d|x|²η(dx). Ifη ∈ P_≤µ(R^d^{, N}), then, by the comparison between W2 and M2 given in the introduction and (0.4), one has

W₂²(µ, η)≤M₂²(η, µ) = Z

R^d

|y|²µ(dy)− Z

R^d

|x|²η(dx). (1.11) Letν be aW₂-projection ofµ onP≤µ(R^d^{, N}). Using (b) for the third equality then (1.11) for the last inequality and the last equality, we obtain that

Z

R^d

|y|²µ(dy)− Z

R^d

|x|²ν(dx) =M₂²(ν, µ) =e_2,N(µ)² =W₂² µ,P(R^d^{, N}⁾

=W₂² µ,P_≤µ(R^d^{, N}⁾^≤ ^inf

η∈P≤µ(R^d^,N)

M₂²(η, µ)

= Z

R^d

|y|²µ(dy)− sup

Z

R^d

|x|²η(dx).

Since ν∈ P_≤µ(R^d^{, N}), the two inequalities are equalities. Therefore Z

R^d

|x|²ν(dx) = sup

Z

R^d

|x|²η(dx)

and since W₂²(µ,P(R^d^{, N}⁾ ^≤ ^W2²(µ, ν) ≤ M₂²(ν, µ), these two inequalities are equalities and W₂²(µ, ν) =M₂²(ν, µ). Moreover,

e_2,N(µ)² =W₂² µ,P_≤µ(R^d^{, N}⁾⁼ Z

R^d

|y|²µ(dy)− sup

Z

R^d

|x|²η(dx).

If ν ∈ P_≤µ(R^d^{, N}) is such that R

R

R^d|x|²η(dx), the last equality combined with (1.11) written forη=ν ensures that ν is aW2-projection of µ onP_≤µ(R^d^{, N}^). ² Remark about uniqueness. As a consequence of Proposition 1.1 (b), it turns out that the uniqueness of W_p-projections of µ on P(R^d^{, N}), that of distributions ˆµ^N of L^p-optimal N-quantizations and that of L^p-optimal N-quantizers are equivalent. In dimension d= 1, for p= 2, distributions with log-concave densities have a unique optimal N-quantizer (see Kiefer [35]) hence this projection is unique. In higher dimension, a general result seems difficult to reach: indeed, the N(0;I_d) distribution, being invariant under the action ofO(d,R) (orthogonal transforms), so are the (hence infinite) sets of its optimal quantizers at levelsN ≥2.

Let us recall the sharp rate of convergence of the L^p-quantization error stated for instance in Theorem 5.2 [37].

(10)

Theorem 1.3 (Pierce Lemma for primal quantization) Let p ≥ 1 and η > 0. For every dimension d ≥ 1, there exists a real constant Ce_d,η,p^vor > 0 such that, for every random vector X : (Ω,A,P⁾^→R^d^,

e_p,N(X)≤Ce_d,η,p^vor N⁻¹^dσ_p+η(X) (1.12) where, for every r >0,σ_r(X) = inf_a∈_RdkX−ak_r≤+∞.

Example. Let us explicit the quadratic optimal primal N-quantizer of µ(dx) = ¹^]0,1[₂^√^(x)_x dx. In [20], by checking that the distortion function has a unique critical point, Fort and Pag`es prove uniqueness and derive semi-closed forms for optimal quantizers of three families of one-dimensional distributions indexed by ρ >0 : the exponential distributions 1(0,∞)(x)ρe^−ρxdx, the power distri- butionsρ1_(0,1)(x)x^ρ−1dxon the interval (0,1) which includeµforρ= ¹₂ and the power distributions ρ1_(1,∞)(x)x^−(1+ρ)dx on the interval (1,+∞). For µ(dx) and p = 2, their semi-closed form relies on the inductive solution of third degree equations. Working with a different parametrization, we will end up with much simpler second degree equations. We have F_µ(x) = 1_{x≥0}√

x and F_µ⁻¹(u) =u² foru∈(0,1). We parametrize the boundaries of the optimal quadratic Voronoi cells by x_k+¹

2 = F_µ⁻¹(qk) = q²_k for k ∈ {0, . . . , N} where 0 = q0 < q1 < q2 < . . . < qN−1 < qN = 1.

In particular x₀₊1

2 = 0 and x_N+1

2 = 1. By the stationarity property (1.8), for k ∈ {1, . . . , N}, xk = _q ¹

k−qk−1

R^xk+ 12

x_k−1 2

xµ(dx) (by (1.7), µ({x_k−1

2}) = µ({x_k+1

2}) = 0 and we do not need to worry about the inclusion of the boundary points in the integration interval). The equality x_k+¹

2 = ^x^k^+x₂^k+1 valid fork∈ {1, . . . , N−1} also writes F_µ⁻¹(q_k) = 1

2

1 qk−qk−1

Z qk

qk−1

F_µ⁻¹(u)du+ 1 qk+1−qk

Z qk+1

q_k

F_µ⁻¹(u)du

! .

For the above choice ofµ, we obtain q_k² = 1

6 q_k²+q_kqk−1+q_k−1² +q²_k+1+q_k+1q_k+q_k²

so that q_k+1² +q_k+1q_k+q_kqk−1+q_k−1² −4q_k² = 0.

We deduce that qk+1 = ck+1q1, where c²_k+1 +ck+1ck +ckck−1+c²_k−1−4c²_k = 0 so that ck+1 =

q

17c²_k−4c_kck−1−4c²_k−1−c_k

2 . Starting fromc0 = 0 andc1= 1, we easily compute inductively the factors c_k (for instance c₂ =

√17−1

2 ). To ensure q_N = 1, we need q₁ = _c¹

N. Therefore q_k = _c^c^k

N for k∈ {0, . . . , N}and the optimal quadratic primal grid is given by

x_k= 1 q_k−qk−1

Z q_k qk−1

F_µ⁻¹(u)du= q_k²+q_kqk−1+q_k−1²

3 = c²_k+c_kck−1+c²_k−1

3c²_N , k∈ {1, . . . , N}.

Of course for a < b, when µ(dx) = ¹^]a,b[^(x)

2√

(b−a)(x−a)dx, thenxk=a+ (b−a)^c

2

k+ckck−1+c²_k−1

3c²_N and when µ(dx) = ¹^]a,b[^(x)

2√

(b−a)(b−x)dx,x_N+1−k=b−(b−a)^c

2

k+ckck−1+c²_k−1 3c²_N .

(11)

2 Dual (Delaunay) quantization

We assume throughout this section that µis compactly supported. Let X: (Ω,A,P⁾^→R^d ^be a random vector lying in L^∞(P) with distribution µ. Optimal dual (or Delaunay) quantization as introduced in [44] relies on the best approximation which can be achieved by a discrete random vector ˇX that satisfies a certain stationarity assumption on the extended probability space (Ω× [0,1],A ⊗ B([0,1]),P^⊗^{λ) where} ^B([0,^{1]) and} ^λ respectively denote the Borel sigma field and the Lebesgue measure on the interval [0,1]. To be more precise, we define, forp∈[1,+∞),

d_p,N(X) = inf

Xˇ

n

X−Xˇ

p: ˇX : (Ω×[0,1],A ⊗ B([0,1]),P^⊗^λ)^→R^d^, card ˇX(Ω×[0,1])

≤N and E^{( ˇ}^{X|X) =}^X o

.

For every levelN ≥d+1, the set of such ˇXis not empty. Indeed, one may choosed+1 points whose convex hull has a non empty interior and includes the support of µ. Then the unique probability measure supported on these points with the same expectation asµbelongs to the setP_≥µ(R^d^{, d+1)} of distributions dominating µ for the convex order and supported by at mostd+ 1 elements. By Lemma 2.22 in [34], we see that for eachν ∈ P_≥µ(R^d^{, N}) and each martingale couplingπ∈ M(µ, ν), there exists on (Ω×[0,1],A ⊗ B([0,1]),P^⊗λ) a random vector ˇX such that (X,X) is distributedˇ according to π and therefore satisfies E^{( ˇ}^{X|X) =}^{X. Hence}

dp,N(X)^p = inf

ν∈P≥µ(R^d^,N)

π∈M(µ,ν)inf Z

R^d^×R^d

|y−x|^pπ(dx, dy) = inf

ν∈P≥µ(R^d^,N)

M_p^p(µ, ν). (2.13) As a consequence, d_p,N(X) only depends on the distributionµ ofX and can subsequently also be denoted d_p,N(µ). Next, one easily checks thatP_≥µ(R^d^{, N}^{) =}^SΓ∈GNP_≥µ(Γ) where

G_N ={Γ⊂R^d with cardinality≤N and such that supp(µ)⊂conv(Γ)}, andP_≥µ(Γ) ={ν ∈ P(R^d^{) :}^µ^≤cvxν and ν(Γ) = 1}.

For Γ ∈ G_N, there exists a dual projection Proj^del_Γ : conv Γ

×[0,1] → Γ, also called a splitting operator, which satisfies, beyond measurability, the following stationarity property

∀y∈conv Γ ,

Z 1 0

Proj^del_Γ (y, u)du=y, (2.14) from which one derives the dual stationarity property

E ^Proj^delΓ (X, U) X

=X whenU ∼ U([0,1]) is independent of X. (2.15) The stationarity property remains valid as soon as X is conv(Γ)-valued and implies that the distribution of Proj^del_Γ (X, U) belongs to P_≥µ(Γ) which is therefore non empty.

For Γ∈ G_N, let

d_p(µ,Γ)^p= inf

ν∈P≥µ(Γ)M_p^p(µ, ν), so that d_p,N(µ)^p= infΓ∈G_Nd_p(µ,Γ).

(12)

In dimension d = 1, when Γ = {x₁, x2, . . . , x_N} with x1 < x2 < . . . < x_N, the probability measure minimizing ν 7→ Mp^p(µ, ν) over P_≥µ(Γ) is the distribution ˇµ^Γ of Proj^del_Γ (X, U) for the splitting operator

Proj^del_Γ (x, u) =

N−1

X

i=1

1_[x_i_,x_i+1₎(x)

1_{u≤xi+1−x

xi+1−xi}x_i+ 1_{u>xi+1−x xi+1−xi}x_i+1

+ 1_{x=x_N_}x_N. Moreover, the coupling minimizing R

R^×R|x−y|^pπ(dx, dy) over M(µ,µˇ^Γ) is the distribution of (X,Proj^del_Γ (X, U)). Last, according to the remark after Proposition 10 in [44], whenµ≤_cvx η with ηcompactly supported in [x1, xN], then ˇµ^Γ≤_cvxηˇ^Γ. This can be seen using the affine interpolation on Γ

ˇ

ϕ^Γ(x) := 1_(−∞,x₁_)∪[x_N_,+∞)(x)ϕ(x) +

N−1

X

i=1

1_[x_i_,x_i+1₎(x)

x_i+1−x xi+1−xi

ϕ(x_i) + x−x_i xi+1−xi

ϕ(x_i+1)

of a convex functionϕ:R^→R^{. Indeed ˇ}^ϕ^Γ is still convex and one has Z

R

ϕ(x)ˇµ^Γ(dx) = Z

R

ˇ

ϕ^Γ(x)µ(dx)≤ Z

R

ˇ

ϕ^Γ(x)η(dx) = Z

R

ϕ(x)ˇη^Γ(dx).

According to the introduction in [2], this convex order preservation does not generalize to higher dimensions where the minimizers are not so easy to express.

Whatever the dimension d ∈ N^∗, for every level N ≥ d+ 1, there exists an L^p-optimal dual quantization grid Γ^del_p,N and a splitting operator Proj^del_Γdel

p,N

(see [44]) such that dp,N(µ) =dp(µ,Γ^del_p,N) =kX−Proj^del_Γdel

p,N(X, U)k_p and Proj^del_Γdel

p,N

(X, U) takes each value in Γ^del_p,N with positive probability. For more details on this dual projection, see [44, 45] where this notion has been developed and analyzed. We will see in the examples that even in dimension one, the convex order is not preserved by optimal dual quantization.

Notice that by Proposition 1.1 (b), the inequalityWp^p(µ, ν)≤Mp^p(µ, ν) valid forν∈ P_≥µ(R^d^{, N}⁾ and (2.13),

e_p,N(µ) =W_p(µ,P(R^d^{, N}⁾⁾^≤^Wp(µ,P_≥µ(R^d^{, N}⁾⁾^≤ ^inf

ν∈P≥µ(R^d^,N)

M_p(ν, µ) =d_p,N(µ). (2.16) We may wonder whether the last inequality is an equality. Combining the tightness of any sequence of probability measures inP≥µ(R^d^{, N}) minimizing theWp-distance toµdeduced from the inequality

∀ν∈ P(R^d^), Z

R^d

|x|^pν(dx)≤2^p−1 Z

R^d

|x|^pµ(dx) +W_p^p(µ, ν)

,

the closedness ofP_≥µ(R^d^{, N}) for the weak convergence topology and the lower semi-continuity of the Wasserstein distance for this topology (see for instance Remark 6.12 p97 [51]), we obtain the

(13)

existence of aWp-projection ˜µ of µon P_≥µ(R^d^{, N}). The last inequality in (2.16) is an equality iff the set

π ∈ P(µ,µ) :˜ W_p^p(µ,µ) =˜ Z

R^d^×R^d

|y−x|^pπ(dx, dy)

of Wp-optimal couplings between µ and some Wp-projection ˜µ of µ on P_≥µ(R^d^{, N}) intersects M(µ,µ). Moreover, ˜˜ µ is then the distribution of an L^p-optimal dual N-quantization of µ. But there is no reason why the intersection should be non empty. We also may wonder whether, by a somewhat naive symmetry with the situation described in Proposition 1.1 (b) for the Voronoi quantization, the distribution of anL^p-optimal dual N-quantization ofµcoincides with aW_p-projection

˜

µofµonP_≥µ(R^d^{, N}). According to the examples below, this property holds whenµis the uniform distribution on the interval [0,1] (nevertheless Wp(U[0,1],P_≥U[0,1](R^d^{, N}⁾⁾^{< d}^p,N^(U[0,1])) but is not true in general.

Note that, in the quadratic case p= 2, for ν ∈ P_≥µ(R^d^{, N}^{), since} ^M2²(µ, ν) =R

R^d|y|²ν(dy)− R

R^d|x|²µ(dx),

d_2,N(µ)²= inf

ν∈P≥µ(R^d^,N)

Z

R^d

|y|²ν(dy)− Z

R^d

|x|²µ(dx). (2.17) Proposition 2.1 In the quadratic case p = 2, any optimal dual quantization grid Γ^del_2,N remains optimal for each probability measureηsuch thatµ≤_cvx η≤_cvxµˇ_N whereµˇ_N denotes the distribution of Proj^del_Γdel

2,N(X, U).

Proof. Let η be such that µ ≤_cvx η ≤_cvx µˇ_N. Since inf_ν∈P_≥µ₍_Rd,N)

R

R^d|y|²ν(dy) is attained for ν = ˇµ_N, so is inf_ν∈P_≥η₍_Rd,N)

R

R^d|y|²ν(dy). Therefore, (2.17) written withηreplacingµimplies that d_2,N(η)²=

Z

R^d

|y|²µˇ_N(dy)− Z

R^d

|x|²η(dx) =M₂²(η,µˇ_N).

With the definition of d₂(η,Γ^del_2,N), we deduce that d_2,N(η)² ≥ d₂(η,Γ^del_2,N)². Therefore Γ^del_2,N is an

optimal dual quadratic quantization grid for η. 2

Examples. (a) Let µ = U[0,1] where, for two real numbers a < b, U[a, b] denotes the uniform distribution on [a, b] with density ¹^[a,b]_b−a^(x) with respect to the Lebesgue measure. We consider its approximation by probability measures in P(R^{, N}). A generic element of P(R^{, N}^{) writes}

ν_N =

N

X

k=1

p_kδ_x_k

with x₁ ≤x₂ ≤. . . ≤x_N and (p₁, . . . , p_N) ∈[0,1]^N satisfying P_N

k=1p_k = 1. We will consider the particular choices ˆµ_N = _N¹ PN

k=1δ^2k−1

2N

and ˇµ_N = _2(N−1)¹ δ₀ + _N−1¹ PN−1 k=2 δ^k−1

N−1 + _2(N−1)¹ δ₁ of the respective distributions of the optimal primal and dual quantizations ofµon N points. For p≥1, let us recover that ˆµN is the Wp-projection ofU[0,1] onP(R^{, N}) (consequence of Proposition 1.1 (b)) and check that ˇµ_N is the W_p-projection of U[0,1] on P_≥U[0,1](R^{, N}). The image of µ by (0,1)3u7→F_ν⁻¹

N(u)−u is equal to

ηN :=p1U[x1−p1, x1] +p2U[x2−(p1+p2), x2−p1] +. . .+pNU[x_N −1, xN −(p1+. . .+pN−1)].