Stochastic domination for iterated convolutions and catalytic majorization


www.imstat.org/aihp 2009, Vol. 45, No. 3, 611–625

DOI: 10.1214/08-AIHP175

Stochastic domination for iterated convolutions and catalytic majorization ¹

Guillaume Aubrun and Ion Nechita

Université de Lyon, Université Lyon 1, CNRS, UMR 5208 Institut Camille Jordan, Batiment du Doyen Jean Braconnier, 43, boulevard du 11 novembre 1918, F - 69622 Villeurbanne Cedex, France. E-mail: aubrun@math.univ-lyon1.fr; nechita@math.univ-lyon1.fr

Received 24 September 2007; revised 26 March 2008; accepted 4 April 2008

Abstract. We study how iterated convolutions of probability measures compare under stochastic domination. We give necessary and sufficient conditions for the existence of an integer $n$ such that $\mu^{*n}$ is stochastically dominated by $\nu^{*n}$ for two given probability measures $\mu$ and $\nu$. As a consequence we obtain a similar theorem on the majorization order for vectors in $\mathbb{R}^d$. In particular we prove results about catalysis in quantum information theory.

Résumé. Nous étudions comment les convolutions itérées des mesures de probabilités se comparent pour la domination stochastique. Nous donnons des conditions nécessaires et suffisantes pour l’existence d’un entier $n$ tel que $\mu^{*n}$ soit stochastiquement dominée par $\nu^{*n}$, étant données deux mesures de probabilités $\mu$ et $\nu$. Nous obtenons en corollaire un théorème similaire pour des vecteurs de $\mathbb{R}^d$ et la relation de Schur-domination. Plus spécifiquement, nous démontrons des résultats sur la catalyse en théorie quantique de l’information.

MSC: Primary 60E15; secondary 94A05

Keywords: Stochastic domination; Iterated convolutions; Large deviations; Majorization; Catalysis

Introduction and notations

This work is a continuation of [1], where we study the phenomenon of catalytic majorization in quantum information theory. A probabilistic approach to this question involves stochastic domination, which we introduce in Section 1, and its behavior with respect to the convolution of measures. We give in Section 2 a condition on measures $\mu$ and $\nu$ for the existence of an integer $n$ such that $\mu^{*n}$ is stochastically dominated by $\nu^{*n}$. We gather further topological and geometrical aspects in Section 3. Finally, we apply these results to our original problem of catalytic majorization. In Section 4 we introduce the background for quantum catalytic majorization and we state our results. Section 5 contains the proofs and in Section 6 we consider an infinite dimensional version of catalysis.

We now introduce some notation and recall basic facts about probability measures. We write $\mathcal{P}(\mathbb{R})$ for the set of probability measures on $\mathbb{R}$. We denote by $\delta_x$ the Dirac mass at the point $x$. If $\mu\in\mathcal{P}(\mathbb{R})$, we write $\operatorname{supp}\mu$ for the support of $\mu$. We write respectively $\min\mu\in[-\infty,+\infty)$ and $\max\mu\in(-\infty,+\infty]$ for $\min\operatorname{supp}\mu$ and $\max\operatorname{supp}\mu$. We also write $\mu(a,b)$ and $\mu[a,b]$ as shorthand for $\mu((a,b))$ and $\mu([a,b])$. The convolution of two measures $\mu$ and $\nu$ is denoted $\mu*\nu$. Recall that if $X$ and $Y$ are independent random variables with respective laws $\mu$ and $\nu$, the law of $X+Y$ is given by $\mu*\nu$. The results of this paper are stated for convolutions of measures; they admit immediate translations into the language of sums of independent random variables. For $\lambda\in\mathbb{R}$, the function $e_\lambda$ is defined by $e_\lambda(x)=\exp(\lambda x)$.

¹ Research was supported in part by the European Network Phenomena in High Dimensions, FP6 Marie Curie Actions, MCRN-511953.


1. Stochastic domination

A natural way of comparing two probability measures is given by the following relation.

Definition 1.1. Let $\mu$ and $\nu$ be two probability measures on the real line. We say that $\mu$ is stochastically dominated by $\nu$, and we write $\mu\le_{st}\nu$, if

$$\forall t\in\mathbb{R},\quad \mu[t,\infty)\le\nu[t,\infty). \tag{1}$$

Stochastic domination is an order relation on $\mathcal{P}(\mathbb{R})$ (in particular, $\mu\le_{st}\nu$ and $\nu\le_{st}\mu$ imply $\mu=\nu$). The following result [9,16] provides useful characterizations of stochastic domination.

Theorem. Let $\mu$ and $\nu$ be probability measures on the real line. The following are equivalent:

(1) $\mu\le_{st}\nu$.

(2) Sample path characterization. There exists a probability space $(\Omega,\mathcal{F},\mathbb{P})$ and two random variables $X$ and $Y$ on $\Omega$ with respective laws $\mu$ and $\nu$, so that

$$\forall\omega\in\Omega,\quad X(\omega)\le Y(\omega).$$

(3) Functional characterization. For any increasing function $f:\mathbb{R}\to\mathbb{R}$ such that both integrals exist,

$$\int f\,d\mu\le\int f\,d\nu.$$

It is easily checked that stochastic domination is well behaved with respect to convolution.

Lemma 1.2. Let $\mu_1,\mu_2,\nu_1,\nu_2$ be probability measures on the real line. If $\mu_1\le_{st}\nu_1$ and $\mu_2\le_{st}\nu_2$, then $\mu_1*\mu_2\le_{st}\nu_1*\nu_2$.

Lemma 1.3. Let $\mu$ and $\nu$ be two probability measures on the real line such that $\mu\le_{st}\nu$. Then, for all $n\ge 2$, $\mu^{*n}\le_{st}\nu^{*n}$.

For fixed $\mu$ and $\nu$, it follows from Lemma 1.2 that the set of integers $k$ such that $\mu^{*k}\le_{st}\nu^{*k}$ is stable under addition.

In general, $\mu^{*n}\le_{st}\nu^{*n}$ does not imply $\mu^{*(n+1)}\le_{st}\nu^{*(n+1)}$. Here is a typical example:

Example 1.4. Let $\mu$ and $\nu$ be the probability measures defined as

$$\mu=0.4\,\delta_0+0.6\,\delta_2,\qquad \nu=0.8\,\delta_1+0.2\,\delta_3.$$

It is straightforward to verify (see Fig. 1) that:

• For $k=2$, and therefore for all even $k$, we have $\mu^{*k}\le_{st}\nu^{*k}$.

• For $k$ odd, we have $\mu^{*k}\le_{st}\nu^{*k}$ only for $k\ge 9$.
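The claims of Example 1.4 can be checked by direct computation. The following is a minimal sketch, not part of the original paper: it represents a finitely supported measure as a dictionary `{atom: mass}` with exact rational masses (the helper names `convolve`, `conv_power`, `tail` and `dominated` are illustrative choices), and tests inequality (1) at the atoms of both measures, which suffices because both tail functions are step functions that only jump at atoms.

```python
from fractions import Fraction

def convolve(p, q):
    """Convolution of two finitely supported measures given as {atom: mass}."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0) + pa * qb
    return out

def conv_power(p, k):
    """k-fold convolution power of p, for k >= 1."""
    result = p
    for _ in range(k - 1):
        result = convolve(result, p)
    return result

def tail(p, t):
    """Mass that p gives to the half-line [t, +infinity)."""
    return sum(w for x, w in p.items() if x >= t)

def dominated(p, q):
    """True if p is stochastically dominated by q (checked at all atoms)."""
    return all(tail(p, t) <= tail(q, t) for t in set(p) | set(q))

# The measures of Example 1.4, with exact rational weights.
mu = {0: Fraction(2, 5), 2: Fraction(3, 5)}
nu = {1: Fraction(4, 5), 3: Fraction(1, 5)}

good = [k for k in range(1, 13)
        if dominated(conv_power(mu, k), conv_power(nu, k))]
print(good)  # [2, 4, 6, 8, 9, 10, 11, 12]
```

Exact rationals matter here: for $k=2$ the two tails coincide at $t=4$ (both equal $0.36$), so a floating-point comparison could go either way.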

Other examples show that the minimal $n$ such that $\mu^{*n}\le_{st}\nu^{*n}$ can be arbitrarily large. This is the content of the next proposition.

Proposition 1.5. For every integer $n$, there exist compactly supported probability measures $\mu$ and $\nu$ such that $\mu^{*n}\le_{st}\nu^{*n}$ and, for all $1\le k\le n-1$, $\mu^{*k}\not\le_{st}\nu^{*k}$.


Fig. 1. Cumulative distribution functions of $\mu^{*k}$ (solid line) and $\nu^{*k}$ (dotted line) from Example 1.4 for $k=1,2,3,9$.

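Proof. Let $\mu=\varepsilon\,\delta_{-2n}+(1-\varepsilon)\,\delta_1$ and let $\nu$ be the uniform measure on $[0,2]$, where $0<\varepsilon<1$ will be defined later. For $k\ge 1$,

$$\mu^{*k}=\sum_{i=0}^{k}\binom{k}{i}(1-\varepsilon)^i\varepsilon^{k-i}\,\delta_{i-2n(k-i)}.$$

Note that $\operatorname{supp}(\nu^{*k})\subset\mathbb{R}_+$, while for $1\le k\le n$, the only part of $\mu^{*k}$ charging $\mathbb{R}_+$ is the Dirac mass at the point $k$. This implies that

$$\mu^{*k}\le_{st}\nu^{*k}\iff\mu^{*k}[k,+\infty)\le\nu^{*k}[k,+\infty).$$

We have $\mu^{*k}[k,+\infty)=(1-\varepsilon)^k$ and $\nu^{*k}[k,+\infty)=1/2$. It remains to choose $\varepsilon$ so that $(1-\varepsilon)^n<1/2<(1-\varepsilon)^{n-1}$.

2. Stochastic domination for iterated convolutions and Cramér’s theorem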

In light of previous examples, we are going to study the following extension of stochastic domination:

Definition 2.1. We define a relation $\le^*_{st}$ on $\mathcal{P}(\mathbb{R})$ as follows:

$$\mu\le^*_{st}\nu\iff\exists n\ge 1\ \text{s.t.}\ \mu^{*n}\le_{st}\nu^{*n}.$$

It turns out that, when defined on $\mathcal{P}(\mathbb{R})$, this relation is not an order relation, because of pathological, poorly integrable measures. Indeed, there exist two probability measures $\mu$ and $\nu$ such that $\mu\neq\nu$ and $\mu*\mu=\nu*\nu$ (see [7], p. 479).

Therefore, the relation $\le^*_{st}$ is not antisymmetric. For this reason, we restrict ourselves to sufficiently integrable measures (however, most of what follows generalizes to wider classes of measures). This is quite usual when studying orderings of probability measures; see [16] for examples of such situations.

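Definition 2.2. A measure $\mu$ on $\mathbb{R}$ is said to be exponentially integrable if $\int e_\lambda\,d\mu<+\infty$ for all $\lambda\in\mathbb{R}$ [recall that $e_\lambda(x)=\exp(\lambda x)$]. We write $\mathcal{P}_{\exp}(\mathbb{R})$ for the set of exponentially integrable probability measures.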

Notice that the space of exponentially integrable measures is stable under convolution.

Proposition 2.3. When restricted to $\mathcal{P}_{\exp}(\mathbb{R})$, the relation $\le^*_{st}$ is a partial order.

Proof. One has to check only the antisymmetry property, the other two being obvious. Let $k$ and $l$ be two integers such that $\mu^{*k}\le_{st}\nu^{*k}$ and $\nu^{*l}\le_{st}\mu^{*l}$. Then $\mu^{*kl}\le_{st}\nu^{*kl}\le_{st}\mu^{*kl}$ and therefore $\mu^{*kl}=\nu^{*kl}$. But if $\mu$ and $\nu$ are exponentially integrable, this implies that $\mu=\nu$. One can see this in the following way: if we denote the moments of $\mu$ by

$$m_p(\mu)=\int x^p\,d\mu(x),$$

one checks by induction on $p$ that $m_p(\mu)=m_p(\nu)$ for all $p\in\mathbb{N}$. On the other hand, exponential integrability implies that $m_{2p}(\mu)^{1/2p}\le Cp$ for some constant $C$, so that Carleman’s condition is satisfied (see [7], p. 224). Therefore $\mu$ is determined by its moments and $\mu=\nu$.

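We would like to give a description of the relation $\le^*_{st}$, for example, similar to the functional characterization of $\le_{st}$. We start with the following lemma.

Lemma 2.4. Let $\mu,\nu\in\mathcal{P}_{\exp}(\mathbb{R})$ be such that $\mu\le^*_{st}\nu$. Then the following inequalities hold:

(a) $\forall\lambda>0$, $\int e_\lambda\,d\mu\le\int e_\lambda\,d\nu$,

(b) $\forall\lambda<0$, $\int e_\lambda\,d\mu\ge\int e_\lambda\,d\nu$,

(c) $\int x\,d\mu(x)\le\int x\,d\nu(x)$,

(d) $\min\mu\le\min\nu$,

(e) $\max\mu\le\max\nu$.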

Proof. Let $\mu\le^*_{st}\nu$ and $\lambda>0$. Since $\mu^{*n}\le_{st}\nu^{*n}$ for some $n$, we get from the functional characterization of $\le_{st}$ that

$$\int e_\lambda\,d\mu^{*n}\le\int e_\lambda\,d\nu^{*n}.$$

It remains to notice that

$$\int e_\lambda\,d\mu^{*n}=\left(\int e_\lambda\,d\mu\right)^n$$

and we get (a). The proof of (b) is completely symmetric, while (c) follows also from the functional characterization. Conditions (d) and (e) are obvious since $\min(\mu^{*n})=n\min(\mu)$ and $\max(\mu^{*n})=n\max(\mu)$.

The following proposition shows that the necessary conditions of Lemma 2.4 are “almost sufficient.”

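Proposition 2.5. Let $\mu,\nu\in\mathcal{P}_{\exp}(\mathbb{R})$. Assume that the following inequalities hold:

(a) $\forall\lambda>0$, $\int e_\lambda\,d\mu<\int e_\lambda\,d\nu$,

(b) $\forall\lambda<0$, $\int e_\lambda\,d\nu<\int e_\lambda\,d\mu$,

(c) $\int x\,d\mu(x)<\int x\,d\nu(x)$,

(d) $\max\mu<\max\nu$,

(e) $\min\mu<\min\nu$.

Then $\mu\le^*_{st}\nu$, and more precisely there exists an integer $N\in\mathbb{N}$ such that for any $n\ge N$, $\mu^{*n}\le_{st}\nu^{*n}$.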

We give in Proposition 3.6 a counter-example showing that Proposition 2.5 is not true when stated with non-strict inequalities.
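As an illustration, the measures $\mu=0.4\,\delta_0+0.6\,\delta_2$ and $\nu=0.8\,\delta_1+0.2\,\delta_3$ of Example 1.4 satisfy the hypotheses of Proposition 2.5. The sketch below is not from the paper: it tests (a) and (b) only on a finite grid of values of $\lambda$ (an assumption of the sketch, a numerical sanity check rather than a proof) and then records the values $n\le 30$ for which domination fails. All function names are illustrative.

```python
import math
from fractions import Fraction

mu = {0: Fraction(2, 5), 2: Fraction(3, 5)}
nu = {1: Fraction(4, 5), 3: Fraction(1, 5)}

def laplace(p, lam):
    """Integral of e_lambda against the finitely supported measure p."""
    return sum(float(w) * math.exp(lam * x) for x, w in p.items())

def mean(p):
    return sum(w * x for x, w in p.items())

def convolve(p, q):
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0) + pa * qb
    return out

def dominated(p, q):
    """p <=_st q, checked at the atoms of both measures."""
    tail = lambda m, t: sum(w for x, w in m.items() if x >= t)
    return all(tail(p, t) <= tail(q, t) for t in set(p) | set(q))

# Hypotheses (a)-(e); (a) and (b) are only sampled on a grid of lambda values.
grid = [j / 10 for j in range(1, 51)]
cond_a = all(laplace(mu, l) < laplace(nu, l) for l in grid)
cond_b = all(laplace(nu, -l) < laplace(mu, -l) for l in grid)
cond_c = mean(mu) < mean(nu)
cond_d = max(mu) < max(nu)   # max/min of a dict act on its keys (the atoms)
cond_e = min(mu) < min(nu)

# Values of n <= 30 for which mu^{*n} is NOT dominated by nu^{*n}.
failures = []
p, q = mu, nu
for n in range(1, 31):
    if not dominated(p, q):
        failures.append(n)
    p, q = convolve(p, mu), convolve(q, nu)
print(cond_a, cond_b, cond_c, cond_d, cond_e, failures)
```

In this range the last failure occurs at $n=7$, which is consistent with the conclusion of the proposition with $N=8$ for this particular pair.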

We are going to use Cramér’s theorem on large deviations. The cumulant generating function $\Lambda_\mu$ of the probability measure $\mu$ is defined for any $\lambda\in\mathbb{R}$ by

$$\Lambda_\mu(\lambda)=\log\int e_\lambda\,d\mu.$$

It is a convex function taking values in $\mathbb{R}$. Its convex conjugate $\Lambda^*_\mu$, sometimes called the Cramér transform, is defined as

$$\Lambda^*_\mu(t)=\sup_{\lambda\in\mathbb{R}}\bigl(\lambda t-\Lambda_\mu(\lambda)\bigr).$$

Note that $\Lambda^*_\mu:\mathbb{R}\to[0,+\infty]$ is a smooth convex function, which takes the value $+\infty$ on $\mathbb{R}\setminus[\min\mu,\max\mu]$. Moreover, for $t\in(\min\mu,\max\mu)$, the supremum in the definition of $\Lambda^*_\mu(t)$ is attained at a unique point $\lambda_t$, with $\lambda_t>0$ if $t>\int x\,d\mu(x)$ and $\lambda_t<0$ if $t<\int x\,d\mu(x)$. Also, $\Lambda^*_\mu\bigl(\int x\,d\mu(x)\bigr)=0$ since $\Lambda'_\mu(0)=\int x\,d\mu(x)$. We now state Cramér’s theorem. The theorem can be equivalently stated in the language of sums of i.i.d. random variables [5,9].

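Theorem (Cramér’s theorem). Let $\mu\in\mathcal{P}_{\exp}(\mathbb{R})$. Then for any $t\in\mathbb{R}$,

$$\lim_{n\to\infty}\frac{1}{n}\log\mu^{*n}[tn,+\infty)=\begin{cases}0 & \text{if } t\le\int x\,d\mu(x),\\ -\Lambda^*_\mu(t) & \text{otherwise,}\end{cases} \tag{2}$$

$$\lim_{n\to\infty}\frac{1}{n}\log\bigl(1-\mu^{*n}(tn,+\infty)\bigr)=\begin{cases}0 & \text{if } t\ge\int x\,d\mu(x),\\ -\Lambda^*_\mu(t) & \text{otherwise.}\end{cases} \tag{3}$$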
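To make the quantities in (2) concrete, here is a small numerical sketch for the two-point measure $\mu=0.4\,\delta_0+0.6\,\delta_2$. It is an illustration only, not part of the paper: the supremum defining $\Lambda^*_\mu$ is approximated by a ternary search over a bounded range of $\lambda$ (an assumption that is adequate for this example), and the tail $\mu^{*n}[tn,+\infty)$ is computed in log-space from the binomial formula to avoid underflow. The names `cgf`, `cramer_transform` and `log_tail` are hypothetical.

```python
import math

atoms = {0: 0.4, 2: 0.6}  # mu = 0.4*delta_0 + 0.6*delta_2

def cgf(lam):
    """Cumulant generating function Lambda_mu(lambda)."""
    return math.log(sum(w * math.exp(lam * x) for x, w in atoms.items()))

def cramer_transform(t, lo=-40.0, hi=40.0, iters=200):
    """Approximate sup over lambda of lambda*t - cgf(lambda).

    The objective is concave in lambda, so ternary search applies.
    """
    g = lambda lam: lam * t - cgf(lam)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g((lo + hi) / 2)

def log_tail(n, t):
    """log of mu^{*n}[t*n, +inf); the n-fold sum is 2*Binomial(n, 0.6)."""
    j_min = max(0, math.ceil(t * n / 2))
    logs = [math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(0.6) + (n - j) * math.log(0.4)
            for j in range(j_min, n + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

t = 1.6                      # above the mean 1.2, below max(mu) = 2
rate = cramer_transform(t)
empirical = -log_tail(2000, t) / 2000
print(rate, empirical)       # the two values should be close for large n
```

For this measure the transform has a closed form, a relative entropy between Bernoulli laws with parameters $t/2$ and $0.6$, which gives an independent check of the numerical supremum.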

Proof of Proposition 2.5. Note that the hypotheses imply that the quantities $\max\mu$ and $\min\nu$ are finite. We write also $M_\mu=\int x\,d\mu(x)$ and $M_\nu=\int x\,d\nu(x)$. For $n\ge 1$, define $(f_n)$ and $(g_n)$ by

$$f_n(t)=\mu^{*n}[tn,+\infty),\qquad g_n(t)=\nu^{*n}[tn,+\infty).$$

We need to prove that $f_n\le g_n$ on $\mathbb{R}$ for $n$ large enough. If $t>\max\mu$, the inequality is trivial since $f_n(t)=0$. Similarly, if $t<\min\nu$ we have $g_n(t)=1$ and there is nothing to prove.

Fix a real number $t_0$ such that $M_\mu<t_0<M_\nu$. We first work on the interval $I=[t_0,\max\mu]$. By Cramér’s theorem, the sequences $(f_n^{1/n})$ and $(g_n^{1/n})$ converge respectively on $I$ toward $f$ and $g$ defined by

$$f(t)=\exp\bigl(-\Lambda^*_\mu(t)\bigr),\qquad g(t)=\begin{cases}1 & \text{if } t_0\le t\le M_\nu,\\ \exp\bigl(-\Lambda^*_\nu(t)\bigr) & \text{if } M_\nu\le t\le\max\mu.\end{cases}$$

Note that $f$ and $g$ are continuous on $I$. We claim also that $f<g$ on $I$. The inequality is clear on $[t_0,M_\nu]$ since $f<1$. If $t\in(M_\nu,\max\mu]$, note that the supremum in the definition of $\Lambda^*_\nu(t)$ is attained for some $\lambda>0$ (to show this we used hypothesis (d)). Using (a) and the definition of the convex conjugate, it implies that $\Lambda^*_\nu(t)<\Lambda^*_\mu(t)$, that is, $f(t)<g(t)$. We now use the following elementary fact: if a sequence of non-increasing functions defined on a compact interval $I$ converges pointwise toward a continuous limit, then the convergence is actually uniform on $I$ (for a proof see [15], Part 2, Problem 127; this statement is attributed to Pólya or to Dini depending on authors). We apply this result to both $(f_n^{1/n})$ and $(g_n^{1/n})$; and since $f<g$, uniform convergence implies that for $n$ large enough, $f_n^{1/n}<g_n^{1/n}$ on $I$, and thus $f_n\le g_n$.

Finally, we apply a similar argument on the interval $J=[\min\nu,t_0]$, except that we consider the sequences $(1-f_n)^{1/n}$ and $(1-g_n)^{1/n}$, and we use (3) to compute the limit. We omit the details since the argument is totally symmetric.

We have thus shown that for $n$ large enough, $f_n\le g_n$ on $I\cup J$, and thus on $\mathbb{R}$. This is exactly the conclusion of the proposition.

3. Geometry and topology of $\le^*_{st}$

We investigate here the topology of the relation $\le^*_{st}$. We first need to define an adequate topology on $\mathcal{P}_{\exp}(\mathbb{R})$. This space can be topologized in several ways, an important point for us being that the map $\mu\mapsto\int e_\lambda\,d\mu$ should be continuous.

Definition 3.1. A function $f:\mathbb{R}\to\mathbb{R}$ is said to be subexponential if there exist constants $c,C$ such that for every $x\in\mathbb{R}$

$$|f(x)|\le C\exp\bigl(c|x|\bigr).$$


Definition 3.2. Let $\tau$ be the topology defined on the space of exponentially integrable measures, generated by the family of seminorms $(N_f)$,

$$N_f(\mu)=\left|\int f\,d\mu\right|,$$

where $f$ belongs to the class of continuous subexponential functions.

The topology $\tau$ is a locally convex vector space topology. It can be shown that the relation $\le^*_{st}$ is not $\tau$-closed (see Proposition 3.6). However, we can give a functional characterization of its closure. This is the content of the following theorem.

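Theorem 3.3. Let $R\subset\mathcal{P}_{\exp}(\mathbb{R})^2$ be the set of couples $(\mu,\nu)$ of exponentially integrable probability measures such that $\mu\le^*_{st}\nu$. Then

$$\overline{R}=\Bigl\{(\mu,\nu)\in\mathcal{P}_{\exp}(\mathbb{R})^2\ \text{s.t.}\ \forall\lambda\ge 0,\ \int e_\lambda\,d\mu\le\int e_\lambda\,d\nu\ \text{and}\ \forall\lambda\le 0,\ \int e_\lambda\,d\mu\ge\int e_\lambda\,d\nu\Bigr\}, \tag{4}$$

the closure being taken with respect to the topology $\tau$.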

Proof. Let us write $X$ for the set on the right-hand side of (4). We get from Lemma 2.4 that $R\subset X$. Moreover, it is easily checked that $X$ is $\tau$-closed, therefore $\overline{R}\subset X$. Conversely, we are going to show that the set of couples $(\mu,\nu)$ satisfying the hypotheses of Proposition 2.5 is $\tau$-dense in $X$. Let $(\mu,\nu)\in X$. We get from the inequalities satisfied by $\mu$ and $\nu$ that:

• $\int x\,d\mu(x)\le\int x\,d\nu(x)$ (taking derivatives at $\lambda=0$),

• $\min\mu\le\min\nu$ (taking $\lambda\to-\infty$),

• $\max\mu\le\max\nu$ (taking $\lambda\to+\infty$).

We want to define two sequences $(\mu_n,\nu_n)$ which $\tau$-converge toward $(\mu,\nu)$, with $\mu_n\le_{st}\mu$ and $\nu\le_{st}\nu_n$, and for which the above inequalities become strict. Assume for example that $\max\mu=\max\nu=+\infty$ and $\min\mu=\min\nu=-\infty$. Then we can define $\mu_n$ and $\nu_n$ as follows: let $\varepsilon_n=\mu[n,+\infty)$ and $\eta_n=\nu(-\infty,-n]$, and set

$$\mu_n=\mu|_{(-\infty,n)}+\varepsilon_n\delta_n,\qquad \nu_n=\nu|_{(-n,+\infty)}+\eta_n\delta_{-n}.$$

We check using dominated convergence that $\lim\mu_n=\mu$ and $\lim\nu_n=\nu$ with respect to $\tau$, while by Proposition 2.5 we have $\mu_n\le^*_{st}\nu_n$. The other cases are treated in a similar way: we can always play with small Dirac masses to make all inequalities strict (for example, if $\max\mu=\max\nu=M<+\infty$, replace $\nu$ by $(1-\varepsilon)\nu+\varepsilon\delta_{M+1}$, and so on).

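A more comfortable way of describing the relation $\le^*_{st}$ is given by the following sets:

Definition 3.4. Let $\nu\in\mathcal{P}_{\exp}(\mathbb{R})$. We define $D(\nu)$ to be the following set:

$$D(\nu)=\bigl\{\mu\in\mathcal{P}_{\exp}(\mathbb{R})\ \text{s.t.}\ \mu\le^*_{st}\nu\bigr\}.$$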

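Using the ideas in the proof of Theorem 3.3, it can easily be shown that for $\nu\in\mathcal{P}_{\exp}(\mathbb{R})$ such that $\min\nu>-\infty$, one has

$$\overline{D(\nu)}=\Bigl\{\mu\in\mathcal{P}_{\exp}(\mathbb{R})\ \text{s.t.}\ \forall\lambda\ge 0,\ \int e_\lambda\,d\mu\le\int e_\lambda\,d\nu\ \text{and}\ \forall\lambda\le 0,\ \int e_\lambda\,d\mu\ge\int e_\lambda\,d\nu\Bigr\}, \tag{5}$$

where the closure is taken in the topology $\tau$. However, for measures $\nu$ with $\min\nu=-\infty$, the condition (e) of Proposition 2.5 is violated and we do not know if the relation (5) holds.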


Another consequence of Eq. (5) is that the $\tau$-closure of $D(\nu)$ is a convex set. It is not clear that the set $D(\nu)$ itself is convex. We shall see in Proposition 3.7 that this is not the case in general for measures $\nu\notin\mathcal{P}_{\exp}(\mathbb{R})$. Note also that for fixed $\nu\in\mathcal{P}(\mathbb{R})$ the set $\{\mu\in\mathcal{P}(\mathbb{R})\ \text{s.t.}\ \mu\le_{st}\nu\}$ is easily checked to be convex.

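Remark 3.5. One can analogously define for $\mu\in\mathcal{P}_{\exp}(\mathbb{R})$ the “dual” set

$$E(\mu)=\bigl\{\nu\in\mathcal{P}_{\exp}(\mathbb{R})\ \text{s.t.}\ \mu\le^*_{st}\nu\bigr\}.$$

Results about $D(\nu)$ or $E(\mu)$ are equivalent. Indeed, let $\mu^{\leftrightarrow}$ be the measure defined for a Borel set $B$ by $\mu^{\leftrightarrow}(B)=\mu(-B)$. We have $\mu\le^*_{st}\nu\iff\nu^{\leftrightarrow}\le^*_{st}\mu^{\leftrightarrow}$ and therefore $E(\mu)=D(\mu^{\leftrightarrow})^{\leftrightarrow}$.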

We now give an example showing that the relation $\le^*_{st}$ is not $\tau$-closed.

Proposition 3.6. There exists a probability measure $\nu\in\mathcal{P}_{\exp}(\mathbb{R})$ such that the set $D(\nu)$ is not $\tau$-closed. Consequently, the set $R$ appearing in (4) is not closed either.

Proof. Let us start with a simplified sketch of the proof. By the examples of Section 1, for each positive integer $k$, one can find probability measures $\mu_k$ and $\nu_k$ such that $\mu_k\in D(\nu_k)$, while $\mu_k^{*k}\not\le_{st}\nu_k^{*k}$. We sum properly rescaled and normalized versions of these measures in order to obtain two probability measures $\mu$ and $\nu$ such that $\mu\notin D(\nu)$. However, successive approximations $\tilde\mu_n$ of $\mu$ are shown to satisfy $\tilde\mu_n\le^*_{st}\nu$, which implies $\mu\in\overline{D(\nu)}$ and thus $D(\nu)\neq\overline{D(\nu)}$.

We now work out the details. For $k\ge 1$, let $a_k=(k+2)!$, $b_k=(k+2)!+1$ and $\gamma_k=c\exp(-k^k)$, where the constant $c$ is chosen so that $\sum_k\gamma_k=1$. We check that $(a_k)$ and $(b_k)$ satisfy the following inequalities:

$$(k-1)b_k+b_{k-1}<ka_k, \tag{6}$$

$$kb_k<a_{k+1}. \tag{7}$$

It follows from Proposition 1.5 that for each $k\in\mathbb{N}$ there exist $\mu_k$ and $\nu_k$, probability measures with compact support, such that $\mu_k\in D(\nu_k)$ while $\mu_k^{*k}\not\le_{st}\nu_k^{*k}$. Moreover, we can assume that $\operatorname{supp}(\mu_k)\subset(a_k,b_k)$ and $\operatorname{supp}(\nu_k)\subset(a_k,b_k)$. Indeed, we can apply to both measures a suitable affine transformation (increasing affine transformations preserve stochastic domination and are compatible with convolution). We now define $\mu$ and $\nu$ as

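$$\mu=\sum_{k=1}^{\infty}\gamma_k\mu_k\quad\text{and}\quad\nu=\sum_{k=1}^{\infty}\gamma_k\nu_k.$$

Note that the sequence $(\gamma_k)$ has been chosen to tend very quickly to 0 to ensure that $\mu$ and $\nu$ are exponentially integrable. We also introduce the following sequences of measures:

$$\tilde\mu_n=\sum_{k=1}^{n}\gamma_k\mu_k+\Bigl(\sum_{k=n+1}^{\infty}\gamma_k\Bigr)\delta_0,\qquad \tilde\nu_n=\sum_{k=1}^{n}\gamma_k\nu_k+\Bigl(\sum_{k=n+1}^{\infty}\gamma_k\Bigr)\delta_0.$$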

One checks using Lebesgue’s dominated convergence theorem that the sequences $(\tilde\mu_n)$ and $(\tilde\nu_n)$ converge respectively toward $\mu$ and $\nu$ for the topology $\tau$. Note also that these sequences are increasing with respect to stochastic domination, so that $\tilde\nu_n\le_{st}\nu$. For fixed $k$, $\mu_k$ and $\nu_k$ satisfy the hypotheses of Proposition 2.5 and thus the same holds for $\tilde\mu_n$ and $\tilde\nu_n$. Therefore $\tilde\mu_n\in D(\tilde\nu_n)\subset D(\nu)$. This proves that $\mu\in\overline{D(\nu)}$.

We now prove by contradiction that $\mu\notin D(\nu)$. Assume that $\mu\in D(\nu)$, i.e., $\mu^{*k}\le_{st}\nu^{*k}$ for some $k\ge 1$. Let $s_k=ka_k$ and $t_k=kb_k$. Fix a sequence $i_1,\dots,i_k$ of non-zero integers. Set $m=\mu_{i_1}*\cdots*\mu_{i_k}$ or $m=\nu_{i_1}*\cdots*\nu_{i_k}$. We know that $\operatorname{supp}(m)\subset(a,b)$, with $a=\sum_{j=1}^{k}a_{i_j}$ and $b=\sum_{j=1}^{k}b_{i_j}$. It is possible to locate $\operatorname{supp}(m)$ precisely using the inequalities (6) and (7).

(a) If $i_j>k$ for some $j$, then $a\ge a_{k+1}>t_k$ and therefore $\operatorname{supp}(m)\subset(t_k,+\infty)$.

(b) If $i_j=k$ for all $j$, then $a=s_k$ and $b=t_k$ and therefore $\operatorname{supp}(m)\subset(s_k,t_k)$.

(c) If $i_j\le k$ for all $j$ and $i_{j_0}<k$ for some $j_0$, then $b\le b_{k-1}+(k-1)b_k<s_k$ and therefore $\operatorname{supp}(m)\subset[0,s_k)$.

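Consequently,

$$\mu^{*k}[t_k,+\infty)=\sum_{i_1,\dots,i_k}\gamma_{i_1}\cdots\gamma_{i_k}\,\mu_{i_1}*\cdots*\mu_{i_k}[t_k,+\infty)=\sum_{i_1,\dots,i_k\ \text{satisfying (a)}}\gamma_{i_1}\cdots\gamma_{i_k}=\nu^{*k}[t_k,+\infty).$$

Moreover, because of (b) and (c), we get that for $s_k\le t\le t_k$,

$$\mu^{*k}[t,t_k)=\gamma_k^k\,\mu_k^{*k}[t,t_k)=\gamma_k^k\,\mu_k^{*k}[t,+\infty)$$

and similarly

$$\nu^{*k}[t,t_k)=\gamma_k^k\,\nu_k^{*k}[t,+\infty).$$

We assumed that $\mu^{*k}\le_{st}\nu^{*k}$, i.e., $\mu^{*k}[t,+\infty)\le\nu^{*k}[t,+\infty)$ for all $t$. If $t\le t_k$, since $\mu^{*k}[t_k,+\infty)=\nu^{*k}[t_k,+\infty)$, we get that $\mu^{*k}[t,t_k)\le\nu^{*k}[t,t_k)$. Since $\gamma_k>0$, this implies that for all $t\ge s_k$, $\mu_k^{*k}[t,+\infty)\le\nu_k^{*k}[t,+\infty)$. This contradicts the fact that $\mu_k^{*k}\not\le_{st}\nu_k^{*k}$. Therefore $\mu\in\overline{D(\nu)}\setminus D(\nu)$, and so $D(\nu)$ is not closed.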

We now give an example of what can happen if we consider measures with poor integrability properties.

Proposition 3.7. There exists a probability measure $\nu\in\mathcal{P}(\mathbb{R})$ such that the set

$$\bigl\{\mu\in\mathcal{P}(\mathbb{R})\ \text{s.t.}\ \mu\le^*_{st}\nu\bigr\} \tag{8}$$

is not convex.

The difference between Eq. (8) and our definition of $D(\nu)$ is that here we do not suppose the measures to be exponentially integrable.

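Proof of Proposition 3.7. We rely on the following fact which we already alluded to (see [7], p. 479): there exist two distinct real characteristic functions $\varphi_1$ and $\varphi_2$ such that $\varphi_1^2=\varphi_2^2$ identically. Consider now the measures $\mu$ and $\nu$ with respective characteristic functions $\varphi_1$ and $\varphi_2$, i.e., $\varphi_1(t)=\int e^{itx}\,d\mu(x)$ and $\varphi_2(t)=\int e^{itx}\,d\nu(x)$. Obviously, we have $\nu\le^*_{st}\nu$ and $\mu\le^*_{st}\nu$ since $\mu^{*2}=\nu^{*2}$. Let $\chi=\tfrac12\mu+\tfrac12\nu$ and let us show that $\chi\not\le^*_{st}\nu$. We have

$$\chi^{*2n}=\frac{1}{2^{2n}}\sum_{i=0}^{2n}\binom{2n}{i}\mu^{*i}*\nu^{*(2n-i)}=\frac{1}{2^{2n}}\Bigl[\sum_{i\ \text{even}}\binom{2n}{i}\nu^{*2n}+\sum_{i\ \text{odd}}\binom{2n}{i}\nu^{*(2n-1)}*\mu\Bigr].$$

Thus $\chi^{*2n}\le_{st}\nu^{*2n}$ is equivalent to $\nu^{*(2n-1)}*\mu\le_{st}\nu^{*2n}$. Let us show that this is impossible. Indeed, the measures $\nu^{*(2n-1)}*\mu$ and $\nu^{*2n}$ have real characteristic functions and thus they are symmetric probability measures. Note however that two symmetric probability distributions cannot be compared with $\le_{st}$ unless they are equal. But it cannot be that $\nu^{*(2n-1)}*\mu=\nu^{*2n}$ because their characteristic functions are different: since $\varphi_1^2=\varphi_2^2$, at any point $\xi$ where $\varphi_1(\xi)\neq\varphi_2(\xi)$ we have $\varphi_1(\xi)=-\varphi_2(\xi)\neq 0$, hence $\varphi_2(\xi)^{2n-1}\varphi_1(\xi)=-\varphi_2(\xi)^{2n}\neq\varphi_2(\xi)^{2n}$. A similar argument shows that $\chi^{*(2n+1)}\not\le_{st}\nu^{*(2n+1)}$.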

We conclude this section with a few remarks on a relation which is very similar to $\le^*_{st}$. It is the analogue of catalytic majorization in quantum information theory (see Section 4).

Definition 3.8. Let $\mu,\nu\in\mathcal{P}_{\exp}(\mathbb{R})$. We say that $\mu$ is catalytically stochastically dominated by $\nu$, and write $\mu\le^C_{st}\nu$, if there exists a probability measure $\pi\in\mathcal{P}_{\exp}(\mathbb{R})$ such that $\mu*\pi\le_{st}\nu*\pi$.

The following lemma shows a connection between the two relations.


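Lemma 3.9. Let $\mu,\nu\in\mathcal{P}_{\exp}(\mathbb{R})$. Assume $\mu\le^*_{st}\nu$. Then $\mu\le^C_{st}\nu$.

Proof. Assume that $\mu^{*n}\le_{st}\nu^{*n}$ for some $n$. Let $\pi$ be the probability measure defined by

$$\pi=\frac{1}{n}\sum_{k=0}^{n-1}\mu^{*k}*\nu^{*(n-1-k)}.$$

Let also $\rho$ be the measure defined by

$$\rho=\frac{1}{n}\sum_{k=1}^{n-1}\mu^{*k}*\nu^{*(n-k)}.$$

Then one has $\mu*\pi=\frac{1}{n}\mu^{*n}+\rho$ and $\nu*\pi=\frac{1}{n}\nu^{*n}+\rho$, and since $\mu^{*n}\le_{st}\nu^{*n}$ this implies $\mu*\pi\le_{st}\nu*\pi$. Since $\pi\in\mathcal{P}_{\exp}(\mathbb{R})$, we get $\mu\le^C_{st}\nu$.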
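The catalyst $\pi$ built in this proof can be computed explicitly for finitely supported measures. The sketch below is an illustration, not part of the paper: it uses the measures $\mu=0.4\,\delta_0+0.6\,\delta_2$ and $\nu=0.8\,\delta_1+0.2\,\delta_3$ of Example 1.4 with $n=2$, for which $\mu\not\le_{st}\nu$ but $\mu^{*2}\le_{st}\nu^{*2}$. The function names are hypothetical, and measures are stored as `{atom: mass}` dictionaries with exact rational masses.

```python
from fractions import Fraction

def convolve(p, q):
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0) + pa * qb
    return out

def dominated(p, q):
    """p <=_st q, checked at the atoms of both measures."""
    tail = lambda m, t: sum(w for x, w in m.items() if x >= t)
    return all(tail(p, t) <= tail(q, t) for t in set(p) | set(q))

def catalyst(mu, nu, n):
    """pi = (1/n) * sum_{k=0}^{n-1} mu^{*k} * nu^{*(n-1-k)}."""
    pi = {}
    for k in range(n):
        term = {0: Fraction(1)}          # Dirac mass at 0, the unit for convolution
        for _ in range(k):
            term = convolve(term, mu)
        for _ in range(n - 1 - k):
            term = convolve(term, nu)
        for x, w in term.items():
            pi[x] = pi.get(x, 0) + w / n
    return pi

mu = {0: Fraction(2, 5), 2: Fraction(3, 5)}
nu = {1: Fraction(4, 5), 3: Fraction(1, 5)}

pi = catalyst(mu, nu, 2)                 # here pi = (mu + nu) / 2
print(dominated(mu, nu))                                 # False
print(dominated(convolve(mu, pi), convolve(nu, pi)))     # True
```

For $n=2$ the catalyst is simply the mixture $\tfrac12(\mu+\nu)$, so $\mu*\pi$ and $\nu*\pi$ share the component $\tfrac12\,\mu*\nu$ and differ only through $\mu^{*2}$ versus $\nu^{*2}$.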

From Theorem 3.3 and Lemma 3.9 one can easily derive

Corollary 3.10. The analogue of Theorem 3.3 is true if we substitute $\le^*_{st}$ with $\le^C_{st}$.

4. Catalytic majorization

This section is dedicated to the study of the majorization relation, the notion which was the initial motivation of this work. The majorization relation provides, much as the stochastic domination for probability measures, a partial order on the set of probability vectors. Originally introduced in linear algebra [3,12], it has found many applications in quantum information theory with the work of Nielsen [13]. We shall not focus on quantum-theoretical aspects of majorization; we refer the interested reader to [1] and references therein. Here, we study majorization by adapting previously obtained results for stochastic domination.

The majorization relation is defined for probability vectors, i.e., vectors $x\in\mathbb{R}^N$ with non-negative components ($x_i\ge 0$) which sum up to one ($\sum_i x_i=1$). Before defining majorization precisely, let us introduce some notation.

For $d\in\mathbb{N}^*$, let $P_d$ be the set of $d$-dimensional probability vectors: $P_d=\{x\in\mathbb{R}^d\ \text{s.t.}\ x_i\ge 0,\ \sum x_i=1\}$. Consider also the set of finitely supported probability vectors $P_{<\infty}=\bigcup_{d>0}P_d$. We equip $P_{<\infty}$ with the $\ell^1$ norm defined by $\|x\|_1=\sum_i|x_i|$. For a vector $x\in P_{<\infty}$, we write $x_{\max}$ for the largest component of $x$ and $x_{\min}$ for its smallest non-zero component. In this section we shall consider only finitely supported vectors. For the general case, see Section 6. We shall identify an element $x\in P_d$ with the corresponding element in $P_{d'}$ ($d'>d$) or $P_{<\infty}$ obtained by appending null components at the end of $x$.

Next, we define $x^\downarrow$, the decreasing rearrangement of a vector $x\in P_d$, as the vector which has the same coordinates as $x$ up to permutation and such that $x_i^\downarrow\ge x_{i+1}^\downarrow$ for all $1\le i<d$. We can now define majorization in terms of the ordered vectors:

Definition 4.1. For $x,y\in P_d$ we say that $x$ is majorized by $y$, and we write $x\prec y$, if for all $k\in\{1,\dots,d\}$

$$\sum_{i=1}^{k}x_i^\downarrow\le\sum_{i=1}^{k}y_i^\downarrow. \tag{9}$$

Note however that there are several equivalent definitions of majorization which do not use the ordering of the vectors $x$ and $y$ (see [3] for further details):

Proposition 4.2. The following assertions are equivalent:

(1) $x\prec y$,

(2) $\forall t\in\mathbb{R}$, $\sum_{i=1}^{d}|x_i-t|\le\sum_{i=1}^{d}|y_i-t|$,

(3) $\forall t\in\mathbb{R}$, $\sum_{i=1}^{d}(x_i-t)^+\le\sum_{i=1}^{d}(y_i-t)^+$, where $z^+=\max(z,0)$,

(4) there is a bistochastic matrix $B$ such that $x=By$.

There are two operations on probability vectors which are of particular interest to us: the tensor product and the direct sum. For $x=(x_1,\dots,x_d)\in P_d$ and $x'=(x'_1,\dots,x'_{d'})\in P_{d'}$, we define the tensor product $x\otimes x'$ as the vector $(x_ix'_j)_{ij}\in P_{dd'}$. We also define the direct sum $x\oplus x'$ as the concatenated vector $(x_1,\dots,x_d,x'_1,\dots,x'_{d'})\in\mathbb{R}^{d+d'}$. Note that if we take $\oplus$-convex combinations, we get probability vectors: $\lambda x\oplus(1-\lambda)x'\in P_{d+d'}$.

The construction which permits us to use tools from stochastic domination in the framework of majorization is the following (inspired by [11]): to a probability vector $z\in P_{<\infty}$ we associate a probability measure $\mu_z$ defined by:

$$\mu_z=\sum_i z_i\,\delta_{\log z_i}.$$

These measures behave well with respect to tensor products:

$$\mu_{x\otimes y}=\mu_x*\mu_y.$$

The connection between majorization and stochastic domination is provided by the following lemma.

Lemma 4.3. Let $x,y\in P_{<\infty}$. Assume that $\mu_x\le_{st}\mu_y$. Then $x\prec y$.

Proof. We can assume that $x=x^\downarrow$ and $y=y^\downarrow$. Note that

$$\mu_x[t,\infty)=\sum_{i:\,\log x_i\ge t}x_i=\sum_{i:\,x_i\ge\exp(t)}x_i.$$

Thus, for all $u>0$,

$$\sum_{i:\,x_i\ge u}x_i\le\sum_{i:\,y_i\ge u}y_i.$$

To start, use $u=y_1$ to conclude that $x_1\le y_1$. Notice that it suffices to show that $\sum_{i=1}^{k}x_i\le\sum_{i=1}^{k}y_i$ only for those $k$ such that $x_k>y_k$ (indeed, if $x_k\le y_k$, the $k$th inequality in (9) can be deduced from the $(k-1)$th inequality). Consider such a $k$ and let $x_k>u>y_k$. We get:

$$\sum_{i=1}^{k}x_i\le\sum_{i:\,x_i\ge u}x_i\le\sum_{i:\,y_i\ge u}y_i\le\sum_{i=1}^{k}y_i,$$

which completes the proof of the lemma.

Remark 4.4. The converse of this lemma does not hold. Indeed, consider $x=(0.5,0.5)$ and $y=(0.9,0.1)$. Obviously, $x\prec y$ but $1=\mu_x[\log 0.5,\infty)>\mu_y[\log 0.5,\infty)=0.9$ and thus $\mu_x\not\le_{st}\mu_y$.
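Lemma 4.3 and Remark 4.4 can be illustrated numerically. The sketch below is not from the paper and makes one simplifying choice: since $\log$ is increasing, $\mu_x\le_{st}\mu_y$ is equivalent to $\sum_{i:\,x_i\ge u}x_i\le\sum_{i:\,y_i\ge u}y_i$ for all $u>0$, so the code works with the thresholds $u$ directly (tested at the non-zero coordinates of both vectors) and never evaluates a logarithm. The function names are hypothetical.

```python
from fractions import Fraction as F

def upper_mass(z, u):
    """Total weight of the coordinates of z that are >= u (u > 0)."""
    return sum(v for v in z if v >= u)

def measure_dominated(x, y):
    """mu_x <=_st mu_y, tested at the non-zero coordinates of x and y."""
    thresholds = {v for v in list(x) + list(y) if v > 0}
    return all(upper_mass(x, u) <= upper_mass(y, u) for u in thresholds)

def majorized(x, y):
    """x is majorized by y (vectors padded with zeros to equal length)."""
    d = max(len(x), len(y))
    xs = sorted(list(x) + [F(0)] * (d - len(x)), reverse=True)
    ys = sorted(list(y) + [F(0)] * (d - len(y)), reverse=True)
    sx = sy = F(0)
    for a, b in zip(xs, ys):
        sx, sy = sx + a, sy + b
        if sx > sy:
            return False
    return True

# Remark 4.4: majorization holds but domination of the measures fails.
x, y = [F(1, 2), F(1, 2)], [F(9, 10), F(1, 10)]
print(majorized(x, y), measure_dominated(x, y))   # True False

# A pair (chosen for this sketch) where domination holds, hence majorization.
u, v = [F(1, 2), F(3, 10), F(1, 5)], [F(3, 5), F(2, 5)]
print(measure_dominated(u, v), majorized(u, v))   # True True
```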

We can describe the majorization relation by the sets

$$S_d(y)=\{x\in P_d\ \text{s.t.}\ x\prec y\},$$

where $y$ is a finitely supported probability vector. Mathematically, such a set is characterized by the following lemma, which is a simple consequence of Birkhoff’s theorem on bistochastic matrices:

Lemma 4.5. For $y$ a $d$-dimensional probability vector, the set $S_d(y)$ is a polytope whose extreme points are $y$ and its permutations.

The initial motivation for our work was the following phenomenon discovered in quantum information theory (see [10] and respectively [2]). It turns out that additional vectors can act as catalysts for the majorization relation: there are vectors $x,y,z\in P_{<\infty}$ such that $x\nprec y$ but $x\otimes z\prec y\otimes z$; in such a situation we say that $x$ is catalytically majorized (or trumped) by $y$ and we write $x\prec_T y$. Another form of catalysis is provided by multiple copies of vectors: we can find vectors $x$ and $y$ such that $x\nprec y$ but still, for some $n\ge 2$, $x^{\otimes n}\prec y^{\otimes n}$; in this case we write $x\prec_M y$. We have thus two new order relations on probability vectors, analogues of $\le^C_{st}$ and respectively $\le^*_{st}$. As before, for $y\in P_d$, we introduce the sets

$$T_d(y)=\{x\in P_d\ \text{s.t.}\ x\prec_T y\}$$

and

$$M_d(y)=\{x\in P_d\ \text{s.t.}\ x\prec_M y\}.$$

It turns out that the relations $\prec_T$ and $\prec_M$ (and thus the sets $T_d(y)$ and $M_d(y)$) are not as simple as $\prec$ and $S_d(y)$. It is known that the inclusion $M_d(y)\subset T_d(y)$ holds (this is the analogue of Lemma 3.9) and that it can be strict [8].
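A concrete instance of catalysis can be verified in a few lines. The vectors used below are not taken from this paper: they are a standard example from the entanglement catalysis literature, and the code checks the claim directly rather than relying on the attribution. The helper names `tensor` and `majorized` are illustrative.

```python
from fractions import Fraction as F

def tensor(x, z):
    """Tensor product of two probability vectors, as a flat list."""
    return [a * b for a in x for b in z]

def majorized(x, y):
    """x is majorized by y (vectors padded with zeros to equal length)."""
    d = max(len(x), len(y))
    xs = sorted(list(x) + [F(0)] * (d - len(x)), reverse=True)
    ys = sorted(list(y) + [F(0)] * (d - len(y)), reverse=True)
    sx = sy = F(0)
    for a, b in zip(xs, ys):
        sx, sy = sx + a, sy + b
        if sx > sy:
            return False
    return True

x = [F(4, 10), F(4, 10), F(1, 10), F(1, 10)]
y = [F(1, 2), F(1, 4), F(1, 4), F(0)]
z = [F(6, 10), F(4, 10)]          # the catalyst

print(majorized(x, y))                          # False: 0.4 + 0.4 > 0.5 + 0.25
print(majorized(tensor(x, z), tensor(y, z)))    # True
```

Here $x\nprec y$ because the second partial sum fails, while after tensoring with $z$ all eight partial-sum inequalities hold, one of them with equality, which is why exact rationals are used.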

In general, the sets $T_d(y)$ and $M_d(y)$ are neither closed nor open, and although $T_d(y)$ is known to be convex, nothing is known about the convexity of $M_d(y)$ (such questions have been intensively studied in the physical literature; see [4,6] and the references therein). As explained in [1] it is natural from a mathematical point of view to introduce the sets $T_{<\infty}(y)=\bigcup_{d\in\mathbb{N}}T_d(y)$ and $M_{<\infty}(y)=\bigcup_{d\in\mathbb{N}}M_d(y)$. A key notion in characterizing them is Schur-convexity:

Definition 4.6. A function $f:P_d\to\mathbb{R}$ is said to be

• Schur-convex if $f(x)\le f(y)$ whenever $x\prec y$,

• Schur-concave if $f(x)\ge f(y)$ whenever $x\prec y$,

• strictly Schur-convex if $f(x)<f(y)$ whenever $x\prec y$ and $x^\downarrow\neq y^\downarrow$,

• strictly Schur-concave if $f(x)>f(y)$ whenever $x\prec y$ and $x^\downarrow\neq y^\downarrow$.

Examples are provided as follows: if $\Phi:\mathbb{R}\to\mathbb{R}$ is a (strictly) convex/concave function, then the function $h:P_d\to\mathbb{R}$ defined by $h(x_1,\dots,x_d)=\Phi(x_1)+\cdots+\Phi(x_d)$ is (strictly) Schur-convex/Schur-concave.

For $x\in P_d$ and $p\in\mathbb{R}$, we define $N_p(x)$ as

$$N_p(x)=\sum_{\substack{1\le i\le d\\ x_i>0}}x_i^p.$$

We will also use the Shannon entropy $H$,

$$H(x)=-\sum_{i=1}^{d}x_i\log x_i.$$

Note that $-H(x)$ is the derivative of $p\mapsto N_p(x)$ at $p=1$ and that $N_0(x)$ is the number of non-zero components of the vector $x$. These functions satisfy the following properties:

(1) If $p>1$, $N_p$ is strictly Schur-convex on $P_{<\infty}$.

(2) If $0<p<1$, $N_p$ is strictly Schur-concave on $P_{<\infty}$.

(3) If $p<0$, $N_p$ is strictly Schur-convex on $P_d$ for any $d$. However, for $p<0$, it is not possible to compare vectors with a different number of non-zero components.

(4) $H$ is strictly Schur-concave on $P_{<\infty}$.

One possible way of describing the relations $\prec_M$ and $\prec_T$ is to find a family (the smallest possible) of Schur-convex functions which characterizes them. In this direction, Nielsen conjectured the following result:

Conjecture 4.7. Fix a vector $y\in P_d$ with non-zero coordinates. Then $\overline{T_d(y)}=\overline{M_d(y)}$ and they both are equal to the set of $x\in P_d$ satisfying:

(C1) For $p\ge 1$, $N_p(x)\le N_p(y)$.

(C2) For $0<p\le 1$, $N_p(x)\ge N_p(y)$.

(C3) For $p<0$, $N_p(x)\le N_p(y)$.

Here, the closures are taken in $\mathbb{R}^d$ (recall that neither $M_d(y)$ nor $T_d(y)$ is closed). By the previous remarks, any vector in $T_d(y)$ or $M_d(y)$ (and by continuity, also in the closures) must satisfy conditions (C1)–(C3). Recently, Turgut [17] provided a complete characterization of the set $T_d(y)$, which implies in particular that Nielsen’s conjecture is true for $T_d(y)$. His method, completely different from ours, consists in solving a discrete approximation of the problem using elementary algebraic techniques. Note however that the inclusion $M_d(y)\subset T_d(y)$ is strict in general, and thus the characterization of $M_d(y)$ is still open. We shall now focus on the set $M_d(y)$. Conjecture 4.7 can be reformulated as follows: if $x,y\in P_d$ satisfy (C1)–(C3), then there exists a sequence $(x_n)$ in $M_d(y)$ such that $(x_n)$ converges to $x$. If we relax the condition that $x_n$ and $y$ have the same dimension, we can prove the following two theorems.

Theorem 4.8. If $x,y\in P_d$ satisfy (C1), then there exists a sequence $(x_n)$ in $M_{<\infty}(y)$ such that $(x_n)$ converges to $x$ in $\ell^1$-norm.

Theorem 4.9. If $x,y\in P_d$ satisfy (C1)–(C2), then there exists a sequence $(x_n)$ in $M_{d+1}(y)$ such that $(x_n)$ converges to $x$.

Since $M_d(y)\subset T_d(y)$, both theorems have direct analogues for $T_{<\infty}(y)$ and respectively $T_{d+1}(y)$. Theorem 4.8 restates the authors’ previous result in [1]; however, the proof presented in the next section is more transparent than the previous one. Theorem 4.9 answers a question of [1]. It is an intermediate result between Theorem 4.8 and Conjecture 4.7.

5. Proof of the theorems

We show here how to derive Theorems 4.8 and 4.9. We first state a proposition which is the translation of Proposition 2.5 in terms of majorization.

Proposition 5.1. Let $x,y\in P_{<\infty}$. Assume that $x$ and $y$ have non-zero coordinates, and respective dimensions $d_x$ and $d_y$. Assume that:

(1) $x_{\min}<y_{\min}$.

(2) $x_{\max}<y_{\max}$.

(3) $H(x)>H(y)$.

(4) $N_p(x)<N_p(y)$ for all $p\in\,]1,+\infty[$.

(5) $N_p(x)>N_p(y)$ for all $p\in\,]-\infty,1[$.

Then there exists an integer $N$ such that for all $n\ge N$, we have $x^{\otimes n}\prec y^{\otimes n}$.

It is important to notice that since $N_0(x)=d_x$ and $N_0(y)=d_y$, the conditions of the proposition can be satisfied only when $d_x>d_y$. This is the main reason why our approach fails to prove Conjecture 4.7.

Proof. One checks that the probability measures $\mu_x$ and $\mu_y$ associated to the vectors $x$ and $y$ satisfy the hypotheses of Proposition 2.5. Indeed, for $p\in\mathbb{R}$, one has

$$N_p(x)=\int e_\lambda\,d\mu_x,\quad\text{with } \lambda=p-1.$$

As $\mu_x^{*n}=\mu_{x^{\otimes n}}$, there exists an integer $N$ such that for $n\ge N$, we have $\mu_{x^{\otimes n}}\le_{st}\mu_{y^{\otimes n}}$. It remains to apply Lemma 4.3 in order to complete the proof.
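The dictionary between majorization quantities and the measures $\mu_x$ used in this proof can be sanity-checked numerically. The sketch below is an illustration, not part of the paper: it verifies the identity $N_p(x)=\int e_{p-1}\,d\mu_x$, the multiplicativity of $N_p$ under tensor products, and, on a finite grid of values of $p$ (an assumption of the sketch), hypotheses (1)–(5) of Proposition 5.1 for the pair $x=(0.5,0.3,0.2)$, $y=(0.6,0.4)$, which is chosen here for illustration.

```python
import math

def N(p, z):
    """N_p(z): sum of z_i^p over the non-zero coordinates."""
    return sum(v ** p for v in z if v > 0)

def laplace_mu(lam, z):
    """Integral of e_lambda against mu_z = sum_i z_i * delta_{log z_i}."""
    return sum(v * math.exp(lam * math.log(v)) for v in z if v > 0)

def entropy(z):
    return -sum(v * math.log(v) for v in z if v > 0)

def tensor(x, z):
    return [a * b for a in x for b in z]

x = [0.5, 0.3, 0.2]
y = [0.6, 0.4]

# N_p(x) equals the Laplace transform of mu_x at lambda = p - 1.
identity_ok = all(math.isclose(N(p, x), laplace_mu(p - 1, x))
                  for p in (-2.0, 0.0, 0.5, 1.0, 3.0))
# N_p is multiplicative for tensor products, mirroring mu_{x (x) y} = mu_x * mu_y.
mult_ok = all(math.isclose(N(p, tensor(x, y)), N(p, x) * N(p, y))
              for p in (-2.0, 0.5, 3.0))

# Hypotheses (1)-(5) of Proposition 5.1, with p sampled on a grid.
h1 = min(x) < min(y)
h2 = max(x) < max(y)
h3 = entropy(x) > entropy(y)
h4 = all(N(1 + j / 10, x) < N(1 + j / 10, y) for j in range(1, 51))
h5 = all(N(1 - j / 10, x) > N(1 - j / 10, y) for j in range(1, 61))
print(identity_ok, mult_ok, h1, h2, h3, h4, h5)
```

Note that $x$ has three non-zero coordinates and $y$ only two, in line with the remark that the hypotheses force $d_x>d_y$.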

The main idea used in the following proofs is to slightly modify the vector $x$ so that the couple $(x,y)$ satisfies the hypotheses of Proposition 5.1.
