www.imstat.org/aihp 2009, Vol. 45, No. 3, 611–625
DOI: 10.1214/08-AIHP175
© Association des Publications de l’Institut Henri Poincaré, 2009
Stochastic domination for iterated convolutions and catalytic majorization 1
Guillaume Aubrun and Ion Nechita
Université de Lyon, Université Lyon 1, CNRS, UMR 5208 Institut Camille Jordan, Batiment du Doyen Jean Braconnier, 43, boulevard du 11 novembre 1918, F - 69622 Villeurbanne Cedex, France. E-mail: aubrun@math.univ-lyon1.fr; nechita@math.univ-lyon1.fr
Received 24 September 2007; revised 26 March 2008; accepted 4 April 2008
Abstract. We study how iterated convolutions of probability measures compare under stochastic domination. We give necessary and sufficient conditions for the existence of an integer $n$ such that $\mu^{*n}$ is stochastically dominated by $\nu^{*n}$ for two given probability measures $\mu$ and $\nu$. As a consequence we obtain a similar theorem on the majorization order for vectors in $\mathbb{R}^d$. In particular we prove results about catalysis in quantum information theory.
Résumé. We study how iterated convolutions of probability measures compare under stochastic domination. We give necessary and sufficient conditions for the existence of an integer $n$ such that $\mu^{*n}$ is stochastically dominated by $\nu^{*n}$, given two probability measures $\mu$ and $\nu$. As a corollary we obtain a similar theorem for vectors of $\mathbb{R}^d$ and the Schur-domination relation. More specifically, we prove results on catalysis in quantum information theory.
MSC: Primary 60E15; secondary 94A05
Keywords: Stochastic domination; Iterated convolutions; Large deviations; Majorization; Catalysis
Introduction and notations
This work is a continuation of [1], where we study the phenomenon of catalytic majorization in quantum information theory. A probabilistic approach to this question involves stochastic domination, which we introduce in Section 1, and its behavior with respect to the convolution of measures. We give in Section 2 a condition on measures $\mu$ and $\nu$ for the existence of an integer $n$ such that $\mu^{*n}$ is stochastically dominated by $\nu^{*n}$. We gather further topological and geometrical aspects in Section 3. Finally, we apply these results to our original problem of catalytic majorization. In Section 4 we introduce the background for quantum catalytic majorization and we state our results. Section 5 contains the proofs and in Section 6 we consider an infinite dimensional version of catalysis.
We now introduce some notation and recall basic facts about probability measures. We write $\mathcal{P}(\mathbb{R})$ for the set of probability measures on $\mathbb{R}$. We denote by $\delta_x$ the Dirac mass at the point $x$. If $\mu \in \mathcal{P}(\mathbb{R})$, we write $\operatorname{supp}\mu$ for the support of $\mu$. We write respectively $\min\mu \in [-\infty,+\infty)$ and $\max\mu \in (-\infty,+\infty]$ for $\min\operatorname{supp}\mu$ and $\max\operatorname{supp}\mu$. We also write $\mu(a,b)$ and $\mu[a,b]$ as shorthand for $\mu((a,b))$ and $\mu([a,b])$. The convolution of two measures $\mu$ and $\nu$ is denoted $\mu * \nu$. Recall that if $X$ and $Y$ are independent random variables with respective laws $\mu$ and $\nu$, the law of $X+Y$ is given by $\mu * \nu$. The results of this paper are stated for convolutions of measures; they admit immediate translations into the language of sums of independent random variables. For $\lambda \in \mathbb{R}$, the function $e_\lambda$ is defined by $e_\lambda(x) = \exp(\lambda x)$.
1 Research was supported in part by the European Network Phenomena in High Dimensions, FP6 Marie Curie Actions, MCRN-511953.
1. Stochastic domination
A natural way of comparing two probability measures is given by the following relation.
Definition 1.1. Let $\mu$ and $\nu$ be two probability measures on the real line. We say that $\mu$ is stochastically dominated by $\nu$, and we write $\mu \le_{st} \nu$, if
\[ \forall t \in \mathbb{R}, \quad \mu[t,\infty) \le \nu[t,\infty). \tag{1} \]
Stochastic domination is an order relation on $\mathcal{P}(\mathbb{R})$ (in particular, $\mu \le_{st} \nu$ and $\nu \le_{st} \mu$ imply $\mu = \nu$). The following result [9,16] provides useful characterizations of stochastic domination.
Theorem. Let $\mu$ and $\nu$ be probability measures on the real line. The following are equivalent:
(1) $\mu \le_{st} \nu$.
(2) Sample path characterization. There exists a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and two random variables $X$ and $Y$ on $\Omega$ with respective laws $\mu$ and $\nu$, so that
\[ \forall \omega \in \Omega, \quad X(\omega) \le Y(\omega). \]
(3) Functional characterization. For any increasing function $f : \mathbb{R} \to \mathbb{R}$ such that both integrals exist,
\[ \int f \, d\mu \le \int f \, d\nu. \]
It is easily checked that stochastic domination is well behaved with respect to convolution.
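Lemma 1.2. Let $\mu_1, \mu_2, \nu_1, \nu_2$ be probability measures on the real line. If $\mu_1 \le_{st} \nu_1$ and $\mu_2 \le_{st} \nu_2$, then $\mu_1 * \mu_2 \le_{st} \nu_1 * \nu_2$.
Lemma 1.3. Let $\mu$ and $\nu$ be two probability measures on the real line such that $\mu \le_{st} \nu$. Then, for all $n \ge 2$, $\mu^{*n} \le_{st} \nu^{*n}$.
For fixed $\mu$ and $\nu$, it follows from Lemma 1.2 that the set of integers $k$ such that $\mu^{*k} \le_{st} \nu^{*k}$ is stable under addition.
In general, $\mu^{*n} \le_{st} \nu^{*n}$ does not imply $\mu^{*(n+1)} \le_{st} \nu^{*(n+1)}$. Here is a typical example: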
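Example 1.4. Let $\mu$ and $\nu$ be the probability measures defined as
\[ \mu = 0.4\,\delta_0 + 0.6\,\delta_2, \qquad \nu = 0.8\,\delta_1 + 0.2\,\delta_3. \]
It is straightforward to verify (see Fig. 1) that:
• For $k = 2$, and therefore for all even $k$, we have $\mu^{*k} \le_{st} \nu^{*k}$.
• For $k$ odd, we have $\mu^{*k} \le_{st} \nu^{*k}$ only for $k \ge 9$.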
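The claims of Example 1.4 can be checked by direct computation. The following Python sketch is an illustration added for this purpose and is not part of the original argument; the helper names `convolve`, `conv_power`, `tail` and `st_dominated` are hypothetical. It represents a finitely supported measure as a dictionary of atoms with exact rational masses, and tests inequality (1) at the atoms of both measures, which suffices because both tail functions are constant between consecutive atoms.

```python
from fractions import Fraction


def convolve(mu, nu):
    """Convolution of two finitely supported measures given as {atom: mass}."""
    out = {}
    for a, p in mu.items():
        for b, q in nu.items():
            out[a + b] = out.get(a + b, 0) + p * q
    return out


def conv_power(mu, k):
    """k-fold convolution power of mu, for k >= 1."""
    result = mu
    for _ in range(k - 1):
        result = convolve(result, mu)
    return result


def tail(mu, t):
    """Mass of the half-line [t, +infinity)."""
    return sum(p for a, p in mu.items() if a >= t)


def st_dominated(mu, nu):
    """True if mu is stochastically dominated by nu (tested at all atoms)."""
    return all(tail(mu, t) <= tail(nu, t) for t in set(mu) | set(nu))


# Measures of Example 1.4, with exact rational masses.
mu = {0: Fraction(2, 5), 2: Fraction(3, 5)}
nu = {1: Fraction(4, 5), 3: Fraction(1, 5)}

for k in range(1, 11):
    print(k, st_dominated(conv_power(mu, k), conv_power(nu, k)))
```

With these inputs the loop reports domination for $k = 2, 4, 6, 8, 9, 10$ and failure for $k = 1, 3, 5, 7$, in agreement with the example.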
Other examples show that the minimal $n$ such that $\mu^{*n} \le_{st} \nu^{*n}$ can be arbitrarily large. This is the content of the next proposition.
Proposition 1.5. For every integer $n$, there exist compactly supported probability measures $\mu$ and $\nu$ such that $\mu^{*n} \le_{st} \nu^{*n}$ and, for all $1 \le k \le n-1$, $\mu^{*k} \not\le_{st} \nu^{*k}$.
Fig. 1. Cumulative distribution functions of $\mu^{*k}$ (solid line) and $\nu^{*k}$ (dotted line) from Example 1.4 for $k = 1, 2, 3, 9$.
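Proof. Let $\mu = \varepsilon\delta_{-2n} + (1-\varepsilon)\delta_1$ and let $\nu$ be the uniform measure on $[0,2]$, where $0 < \varepsilon < 1$ will be chosen later. For $k \ge 1$,
\[ \mu^{*k} = \sum_{i=0}^{k} \binom{k}{i} (1-\varepsilon)^i \varepsilon^{k-i} \, \delta_{i-2n(k-i)}. \]
Note that $\operatorname{supp}(\nu^{*k}) \subset \mathbb{R}_+$, while for $1 \le k \le n$, the only part of $\mu^{*k}$ charging $\mathbb{R}_+$ is the Dirac mass at the point $k$. This implies that
\[ \mu^{*k} \le_{st} \nu^{*k} \iff \mu^{*k}[k,+\infty) \le \nu^{*k}[k,+\infty). \]
We have $\mu^{*k}[k,+\infty) = (1-\varepsilon)^k$ and $\nu^{*k}[k,+\infty) = 1/2$. It remains to choose $\varepsilon$ so that $(1-\varepsilon)^n < 1/2 < (1-\varepsilon)^{n-1}$.
2. Stochastic domination for iterated convolutions and Cramér’s theorem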
In light of previous examples, we are going to study the following extension of stochastic domination:
Definition 2.1. We define a relation $\le^*_{st}$ on $\mathcal{P}(\mathbb{R})$ as follows:
\[ \mu \le^*_{st} \nu \iff \exists n \ge 1 \text{ s.t. } \mu^{*n} \le_{st} \nu^{*n}. \]
It turns out that, when defined on $\mathcal{P}(\mathbb{R})$, this relation is not an order relation, because of pathological, poorly integrable measures. Indeed, there exist two probability measures $\mu$ and $\nu$ such that $\mu \ne \nu$ and $\mu * \mu = \nu * \nu$ (see [7], p. 479).
Therefore, the relation $\le^*_{st}$ is not antisymmetric. For this reason, we restrict ourselves to sufficiently integrable measures (however, most of what follows generalizes to wider classes of measures). This is quite usual when studying orderings of probability measures; see [16] for examples of such situations.
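Definition 2.2. A measure $\mu$ on $\mathbb{R}$ is said to be exponentially integrable if $\int e_\lambda \, d\mu < +\infty$ for all $\lambda \in \mathbb{R}$ [recall that $e_\lambda(x) = \exp(\lambda x)$]. We write $\mathcal{P}_{\exp}(\mathbb{R})$ for the set of exponentially integrable probability measures.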
Notice that the space of exponentially integrable measures is stable under convolution.
Proposition 2.3. When restricted to $\mathcal{P}_{\exp}(\mathbb{R})$, the relation $\le^*_{st}$ is a partial order.
Proof. One has to check only the antisymmetry property, the other two being obvious. Let $k$ and $l$ be two integers such that $\mu^{*k} \le_{st} \nu^{*k}$ and $\nu^{*l} \le_{st} \mu^{*l}$. Then $\mu^{*kl} \le_{st} \nu^{*kl} \le_{st} \mu^{*kl}$ and therefore $\mu^{*kl} = \nu^{*kl}$. But if $\mu$ and $\nu$ are exponentially integrable, this implies that $\mu = \nu$. One can see this in the following way: if we denote the moments of $\mu$ by $m_p(\mu) = \int x^p \, d\mu(x)$, one checks by induction on $p$ that $m_p(\mu) = m_p(\nu)$ for all $p \in \mathbb{N}$. On the other hand, exponential integrability implies that $m_{2p}(\mu)^{1/2p} \le Cp$ for some constant $C$, so that Carleman’s condition is satisfied (see [7], p. 224). Therefore $\mu$ is determined by its moments and $\mu = \nu$.
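We would like to give a description of the relation $\le^*_{st}$, for example, similar to the functional characterization of $\le_{st}$. We start with the following lemma.
Lemma 2.4. Let $\mu, \nu \in \mathcal{P}_{\exp}(\mathbb{R})$ be such that $\mu \le^*_{st} \nu$. Then the following inequalities hold:
(a) $\forall \lambda > 0$, $\int e_\lambda \, d\mu \le \int e_\lambda \, d\nu$,
(b) $\forall \lambda < 0$, $\int e_\lambda \, d\mu \ge \int e_\lambda \, d\nu$,
(c) $\int x \, d\mu(x) \le \int x \, d\nu(x)$,
(d) $\min\mu \le \min\nu$,
(e) $\max\mu \le \max\nu$.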
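Proof. Let $\mu \le^*_{st} \nu$ and $\lambda > 0$. Since $\mu^{*n} \le_{st} \nu^{*n}$ for some $n$, we get from the functional characterization of $\le_{st}$ that
\[ \int e_\lambda \, d\mu^{*n} \le \int e_\lambda \, d\nu^{*n}. \]
It remains to notice that
\[ \int e_\lambda \, d\mu^{*n} = \Bigl( \int e_\lambda \, d\mu \Bigr)^n \]
and we get (a). The proof of (b) is completely symmetric, while (c) also follows from the functional characterization.
Conditions (d) and (e) are obvious since $\min(\mu^{*n}) = n\min(\mu)$ and $\max(\mu^{*n}) = n\max(\mu)$.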
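As an illustration of Lemma 2.4, the necessary conditions can be evaluated numerically for the measures of Example 1.4, for which $\mu^{*2} \le_{st} \nu^{*2}$. The sketch below is an addition, not part of the original proof; the names `mgf`, `mean` and `check_necessary_conditions` are hypothetical, and conditions (a) and (b) are tested only on a finite grid of values of $\lambda$, so the output is evidence rather than a proof.

```python
import math


def mgf(measure, lam):
    """Integral of e_lambda against a finitely supported measure {atom: mass}."""
    return sum(p * math.exp(lam * a) for a, p in measure.items())


def mean(measure):
    """First moment of a finitely supported measure."""
    return sum(p * a for a, p in measure.items())


def check_necessary_conditions(mu, nu, lambdas):
    """Evaluate conditions (a)-(e) of Lemma 2.4, with (a) and (b) on a grid."""
    a = all(mgf(mu, lam) <= mgf(nu, lam) for lam in lambdas if lam > 0)
    b = all(mgf(mu, lam) >= mgf(nu, lam) for lam in lambdas if lam < 0)
    c = mean(mu) <= mean(nu)
    d = min(mu) <= min(nu)   # smallest atom of each measure
    e = max(mu) <= max(nu)   # largest atom of each measure
    return a, b, c, d, e


# Measures of Example 1.4.
mu = {0: 0.4, 2: 0.6}
nu = {1: 0.8, 3: 0.2}

# Grid of non-zero values of lambda between -5 and 5.
lambdas = [s * j / 10 for j in range(1, 51) for s in (-1, 1)]
print(check_necessary_conditions(mu, nu, lambdas))
```

For this particular pair the grid is not essential: writing $u = e^\lambda$, one has $\int e_\lambda \, d\nu - \int e_\lambda \, d\mu = 0.2\,(u-1)\bigl((u-1)^2+1\bigr)$, which has the sign of $\lambda$.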
The following proposition shows that the necessary conditions of Lemma 2.4 are “almost sufficient.”
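Proposition 2.5. Let $\mu, \nu \in \mathcal{P}_{\exp}(\mathbb{R})$. Assume that the following inequalities hold:
(a) $\forall \lambda > 0$, $\int e_\lambda \, d\mu < \int e_\lambda \, d\nu$,
(b) $\forall \lambda < 0$, $\int e_\lambda \, d\nu < \int e_\lambda \, d\mu$,
(c) $\int x \, d\mu(x) < \int x \, d\nu(x)$,
(d) $\max\mu < \max\nu$,
(e) $\min\mu < \min\nu$.
Then $\mu \le^*_{st} \nu$, and more precisely there exists an integer $N \in \mathbb{N}$ such that for any $n \ge N$, $\mu^{*n} \le_{st} \nu^{*n}$.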
We give in Proposition 3.6 a counterexample showing that Proposition 2.5 does not hold when stated with non-strict inequalities.
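We are going to use Cramér’s theorem on large deviations. The cumulant generating function $\Lambda_\mu$ of the probability measure $\mu$ is defined for any $\lambda \in \mathbb{R}$ by
\[ \Lambda_\mu(\lambda) = \log \int e_\lambda \, d\mu. \]
It is a convex function taking values in $\mathbb{R}$. Its convex conjugate $\Lambda^*_\mu$, sometimes called the Cramér transform, is defined as
\[ \Lambda^*_\mu(t) = \sup_{\lambda \in \mathbb{R}} \bigl( \lambda t - \Lambda_\mu(\lambda) \bigr). \]
Note that $\Lambda^*_\mu : \mathbb{R} \to [0,+\infty]$ is a smooth convex function, which takes the value $+\infty$ on $\mathbb{R} \setminus [\min\mu, \max\mu]$. Moreover, for $t \in (\min\mu, \max\mu)$, the supremum in the definition of $\Lambda^*_\mu(t)$ is attained at a unique point $\lambda_t$. Furthermore, $\lambda_t > 0$ if $t > \int x \, d\mu(x)$ and $\lambda_t < 0$ if $t < \int x \, d\mu(x)$. Also, $\Lambda^*_\mu(\int x \, d\mu(x)) = 0$ since $\Lambda'_\mu(0) = \int x \, d\mu(x)$. We now state Cramér’s theorem. The theorem can be equivalently stated in the language of sums of i.i.d. random variables [5,9].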
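Theorem (Cramér’s theorem). Let $\mu \in \mathcal{P}_{\exp}(\mathbb{R})$. Then for any $t \in \mathbb{R}$,
\[ \lim_{n \to \infty} \frac{1}{n} \log \mu^{*n}[tn,+\infty) = \begin{cases} 0 & \text{if } t \le \int x \, d\mu(x), \\ -\Lambda^*_\mu(t) & \text{otherwise,} \end{cases} \tag{2} \]
\[ \lim_{n \to \infty} \frac{1}{n} \log \bigl( 1 - \mu^{*n}(tn,+\infty) \bigr) = \begin{cases} 0 & \text{if } t \ge \int x \, d\mu(x), \\ -\Lambda^*_\mu(t) & \text{otherwise.} \end{cases} \tag{3} \]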
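To make the Cramér transform concrete, here is a small numerical sketch (an addition, with hypothetical helper names `cgf` and `cramer_transform`). It approximates the supremum defining $\Lambda^*_\mu(t)$ by a ternary search, which is legitimate because $\lambda \mapsto \lambda t - \Lambda_\mu(\lambda)$ is concave; the search interval $[-50, 50]$ is an assumption that must contain the maximizer $\lambda_t$. The result is compared with the classical closed form for a Bernoulli measure $(1-p)\delta_0 + p\delta_1$, namely $t\log(t/p) + (1-t)\log((1-t)/(1-p))$ for $0 < t < 1$.

```python
import math


def cgf(measure, lam):
    """Cumulant generating function of a finitely supported measure."""
    return math.log(sum(p * math.exp(lam * a) for a, p in measure.items()))


def cramer_transform(measure, t, lo=-50.0, hi=50.0, iters=200):
    """Approximate sup over lambda in [lo, hi] of lambda*t - cgf(lambda).

    Ternary search; valid because the objective is concave in lambda.
    """
    def objective(lam):
        return lam * t - cgf(measure, lam)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2)


# Bernoulli measure with p = 0.3 (an arbitrary illustrative choice).
p = 0.3
bern = {0: 1 - p, 1: p}
t = 0.5
closed_form = t * math.log(t / p) + (1 - t) * math.log((1 - t) / (1 - p))
print(cramer_transform(bern, t), closed_form)
```

At $t = \int x \, d\mu(x) = p$ the same routine returns a value numerically equal to zero, as stated before the theorem.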
Proof of Proposition 2.5. Note that the hypotheses imply that the quantities $\max\mu$ and $\min\nu$ are finite. We also write $M_\mu = \int x \, d\mu(x)$ and $M_\nu = \int x \, d\nu(x)$. For $n \ge 1$, define $(f_n)$ and $(g_n)$ by
\[ f_n(t) = \mu^{*n}[tn,+\infty), \qquad g_n(t) = \nu^{*n}[tn,+\infty). \]
We need to prove that $f_n \le g_n$ on $\mathbb{R}$ for $n$ large enough. If $t > \max\mu$, the inequality is trivial since $f_n(t) = 0$. Similarly, if $t < \min\nu$ we have $g_n(t) = 1$ and there is nothing to prove.
Fix a real number $t_0$ such that $M_\mu < t_0 < M_\nu$. We first work on the interval $I = [t_0, \max\mu]$. By Cramér’s theorem, the sequences $(f_n^{1/n})$ and $(g_n^{1/n})$ converge respectively on $I$ toward $f$ and $g$ defined by
\[ f(t) = \exp\bigl( -\Lambda^*_\mu(t) \bigr), \qquad g(t) = \begin{cases} 1 & \text{if } t_0 \le t \le M_\nu, \\ \exp\bigl( -\Lambda^*_\nu(t) \bigr) & \text{if } M_\nu \le t \le \max\mu. \end{cases} \]
Note that $f$ and $g$ are continuous on $I$. We also claim that $f < g$ on $I$. The inequality is clear on $[t_0, M_\nu]$ since $f < 1$. If $t \in (M_\nu, \max\mu]$, note that the supremum in the definition of $\Lambda^*_\nu(t)$ is attained for some $\lambda > 0$ (to show this we use hypothesis (d)). Using (a) and the definition of the convex conjugate, this implies that $\Lambda^*_\nu(t) < \Lambda^*_\mu(t)$, that is, $f(t) < g(t)$. We now use the following elementary fact: if a sequence of non-increasing functions defined on a compact interval $I$ converges pointwise toward a continuous limit, then the convergence is actually uniform on $I$ (for a proof see [15], Part 2, Problem 127; this statement is attributed to Pólya or to Dini depending on authors). We apply this result to both $(f_n^{1/n})$ and $(g_n^{1/n})$; since $f < g$, uniform convergence implies that for $n$ large enough, $f_n^{1/n} < g_n^{1/n}$ on $I$, and thus $f_n \le g_n$.
Finally, we apply a similar argument on the interval $J = [\min\nu, t_0]$, except that we consider the sequences $(1-f_n)^{1/n}$ and $(1-g_n)^{1/n}$, and we use (3) to compute the limit. We omit the details since the argument is totally symmetric.
We have thus shown that for $n$ large enough, $f_n \le g_n$ on $I \cup J$, and thus on $\mathbb{R}$. This is exactly the conclusion of the proposition.
3. Geometry and topology of $\le^*_{st}$
We investigate here the topology of the relation $\le^*_{st}$. We first need to define an adequate topology on $\mathcal{P}_{\exp}(\mathbb{R})$. This space can be topologized in several ways, an important point for us being that the map $\mu \mapsto \int e_\lambda \, d\mu$ should be continuous.
Definition 3.1. A function $f : \mathbb{R} \to \mathbb{R}$ is said to be subexponential if there exist constants $c, C$ such that for every $x \in \mathbb{R}$,
\[ |f(x)| \le C \exp(c|x|). \]
Definition 3.2. Let $\tau$ be the topology defined on the space of exponentially integrable measures, generated by the family of seminorms $(N_f)$
\[ N_f(\mu) = \Bigl| \int f \, d\mu \Bigr|, \]
where $f$ belongs to the class of continuous subexponential functions.
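The topology $\tau$ is a locally convex vector space topology. It can be shown that the relation $\le^*_{st}$ is not $\tau$-closed (see Proposition 3.6). However, we can give a functional characterization of its closure. This is the content of the following theorem.
Theorem 3.3. Let $\mathcal{R} \subset \mathcal{P}_{\exp}(\mathbb{R})^2$ be the set of couples $(\mu, \nu)$ of exponentially integrable probability measures such that $\mu \le^*_{st} \nu$. Then
\[ \overline{\mathcal{R}} = \Bigl\{ (\mu, \nu) \in \mathcal{P}_{\exp}(\mathbb{R})^2 \text{ s.t. } \forall \lambda \ge 0,\ \int e_\lambda \, d\mu \le \int e_\lambda \, d\nu \ \text{ and } \ \forall \lambda \le 0,\ \int e_\lambda \, d\mu \ge \int e_\lambda \, d\nu \Bigr\}, \tag{4} \]
the closure being taken with respect to the topology $\tau$.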
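Proof. Let us write $\mathcal{X}$ for the set on the right-hand side of (4). We get from Lemma 2.4 that $\mathcal{R} \subset \mathcal{X}$. Moreover, it is easily checked that $\mathcal{X}$ is $\tau$-closed, therefore $\overline{\mathcal{R}} \subset \mathcal{X}$. Conversely, we are going to show that the set of couples $(\mu, \nu)$ satisfying the hypotheses of Proposition 2.5 is $\tau$-dense in $\mathcal{X}$. Let $(\mu, \nu) \in \mathcal{X}$. We get from the inequalities satisfied by $\mu$ and $\nu$ that:
• $\int x \, d\mu(x) \le \int x \, d\nu(x)$ (taking derivatives at $\lambda = 0$),
• $\min\mu \le \min\nu$ (taking $\lambda \to -\infty$),
• $\max\mu \le \max\nu$ (taking $\lambda \to +\infty$).
We want to define two sequences $(\mu_n, \nu_n)$ which $\tau$-converge toward $(\mu, \nu)$, with $\mu_n \le_{st} \mu$ and $\nu \le_{st} \nu_n$, and for which the above inequalities become strict. Assume for example that $\max\mu = \max\nu = +\infty$ and $\min\mu = \min\nu = -\infty$.
Then we can define $\mu_n$ and $\nu_n$ as follows: let $\varepsilon_n = \mu[n,+\infty)$ and $\eta_n = \nu(-\infty,-n]$, and set
\[ \mu_n = \mu|_{(-\infty,n)} + \varepsilon_n \delta_n, \qquad \nu_n = \nu|_{(-n,+\infty)} + \eta_n \delta_{-n}. \]
We check using dominated convergence that $\lim \mu_n = \mu$ and $\lim \nu_n = \nu$ with respect to $\tau$, while by Proposition 2.5 we have $\mu_n \le^*_{st} \nu_n$. The other cases are treated in a similar way: we can always play with small Dirac masses to make all inequalities strict (for example, if $\max\mu = \max\nu = M < +\infty$, replace $\nu$ by $(1-\varepsilon)\nu + \varepsilon\delta_{M+1}$, and so on).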
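A more comfortable way of describing the relation $\le^*_{st}$ is given by the following sets:
Definition 3.4. Let $\nu \in \mathcal{P}_{\exp}(\mathbb{R})$. We define $D(\nu)$ to be the following set:
\[ D(\nu) = \{ \mu \in \mathcal{P}_{\exp}(\mathbb{R}) \text{ s.t. } \mu \le^*_{st} \nu \}. \]
Using the ideas in the proof of Theorem 3.3, it can easily be shown that for $\nu \in \mathcal{P}_{\exp}(\mathbb{R})$ such that $\min\nu > -\infty$, one has
\[ \overline{D(\nu)} = \Bigl\{ \mu \in \mathcal{P}_{\exp}(\mathbb{R}) \text{ s.t. } \forall \lambda \ge 0,\ \int e_\lambda \, d\mu \le \int e_\lambda \, d\nu \ \text{ and } \ \forall \lambda \le 0,\ \int e_\lambda \, d\mu \ge \int e_\lambda \, d\nu \Bigr\}, \tag{5} \]
where the closure is taken in the topology $\tau$. However, for measures $\nu$ with $\min\nu = -\infty$, the condition (e) of Proposition 2.5 is violated and we do not know if the relation (5) holds.
Another consequence of Eq. (5) is that the $\tau$-closure of $D(\nu)$ is a convex set. It is not clear whether the set $D(\nu)$ itself is convex. We shall see in Proposition 3.7 that this is not the case in general for measures $\nu \notin \mathcal{P}_{\exp}(\mathbb{R})$. Note also that for fixed $\nu \in \mathcal{P}(\mathbb{R})$ the set $\{ \mu \in \mathcal{P}(\mathbb{R}) \text{ s.t. } \mu \le_{st} \nu \}$ is easily checked to be convex.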
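Remark 3.5. One can analogously define for $\mu \in \mathcal{P}_{\exp}(\mathbb{R})$ the “dual” set
\[ E(\mu) = \{ \nu \in \mathcal{P}_{\exp}(\mathbb{R}) \text{ s.t. } \mu \le^*_{st} \nu \}. \]
Results about $D(\nu)$ or $E(\mu)$ are equivalent. Indeed, let $\mu^{\leftrightarrow}$ be the measure defined for a Borel set $B$ by $\mu^{\leftrightarrow}(B) = \mu(-B)$. We have $\mu \le^*_{st} \nu \iff \nu^{\leftrightarrow} \le^*_{st} \mu^{\leftrightarrow}$ and therefore $E(\mu) = D(\mu^{\leftrightarrow})^{\leftrightarrow}$.
We now give an example showing that the relation $\le^*_{st}$ is not $\tau$-closed.
Proposition 3.6. There exists a probability measure $\nu \in \mathcal{P}_{\exp}(\mathbb{R})$ such that the set $D(\nu)$ is not $\tau$-closed. Consequently, the set $\mathcal{R}$ appearing in (4) is not closed either.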
Proof. Let us start with a simplified sketch of the proof. By the examples of Section 1, for each positive integer $k$, one can find probability measures $\mu_k$ and $\nu_k$ such that $\mu_k \in D(\nu_k)$, while $\mu_k^{*k} \not\le_{st} \nu_k^{*k}$. We sum properly rescaled and normalized versions of these measures in order to obtain two probability measures $\mu$ and $\nu$ such that $\mu \notin D(\nu)$. However, successive approximations $\tilde\mu_n$ of $\mu$ are shown to satisfy $\tilde\mu_n \le^*_{st} \nu$, which implies $\mu \in \overline{D(\nu)}$ and thus $D(\nu) \ne \overline{D(\nu)}$.
We now work out the details. For $k \ge 1$, let $a_k = (k+2)!$, $b_k = (k+2)! + 1$ and $\gamma_k = c\exp(-k^k)$, where the constant $c$ is chosen so that $\sum_k \gamma_k = 1$. We check that $(a_k)$ and $(b_k)$ satisfy the following inequalities:
\[ (k-1)b_k + b_{k-1} < k a_k, \tag{6} \]
\[ k b_k < a_{k+1}. \tag{7} \]
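Inequalities (6) and (7) are elementary, and they can also be confirmed mechanically for small $k$. The following Python sketch is an added sanity check with hypothetical helper names `a`, `b` and `inequalities_hold`; it uses exact integer arithmetic and extends the formula for $b_k$ to $k = 0$ so that (6) makes sense for $k = 1$ (this extension is an assumption of the sketch).

```python
from math import factorial


def a(k):
    """a_k = (k + 2)!"""
    return factorial(k + 2)


def b(k):
    """b_k = (k + 2)! + 1"""
    return factorial(k + 2) + 1


def inequalities_hold(k):
    """Check inequalities (6) and (7) for a given k >= 1."""
    ineq6 = (k - 1) * b(k) + b(k - 1) < k * a(k)
    ineq7 = k * b(k) < a(k + 1)
    return ineq6 and ineq7


print(all(inequalities_hold(k) for k in range(1, 51)))
```

In general, (6) reduces to $k + (k+1)! < (k+2)!$ and (7) to $k < 3\,(k+2)!$, both of which hold for every $k \ge 1$.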
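It follows from Proposition 1.5 that for each $k \in \mathbb{N}$ there exist $\mu_k$ and $\nu_k$, probability measures with compact support, such that $\mu_k \in D(\nu_k)$ while $\mu_k^{*k} \not\le_{st} \nu_k^{*k}$. Moreover, we can assume that $\operatorname{supp}(\mu_k) \subset (a_k, b_k)$ and $\operatorname{supp}(\nu_k) \subset (a_k, b_k)$.
Indeed, we can apply to both measures a suitable affine transformation (increasing affine transformations preserve stochastic domination and are compatible with convolution). We now define $\mu$ and $\nu$ as
\[ \mu = \sum_{k=1}^{\infty} \gamma_k \mu_k \quad \text{and} \quad \nu = \sum_{k=1}^{\infty} \gamma_k \nu_k. \]
Note that the sequence $(\gamma_k)$ has been chosen to tend very quickly to 0 to ensure that $\mu$ and $\nu$ are exponentially integrable. We also introduce the following sequences of measures:
\[ \tilde\mu_n = \sum_{k=1}^{n} \gamma_k \mu_k + \Bigl( \sum_{k=n+1}^{\infty} \gamma_k \Bigr) \delta_0, \qquad \tilde\nu_n = \sum_{k=1}^{n} \gamma_k \nu_k + \Bigl( \sum_{k=n+1}^{\infty} \gamma_k \Bigr) \delta_0. \]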
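One checks using Lebesgue’s dominated convergence theorem that the sequences $(\tilde\mu_n)$ and $(\tilde\nu_n)$ converge respectively toward $\mu$ and $\nu$ for the topology $\tau$. Note also that these sequences are increasing with respect to stochastic domination, so that $\tilde\nu_n \le_{st} \nu$. For fixed $k$, $\mu_k$ and $\nu_k$ satisfy the hypotheses of Proposition 2.5 and thus the same holds for $\tilde\mu_n$ and $\tilde\nu_n$. Therefore $\tilde\mu_n \in D(\tilde\nu_n) \subset D(\nu)$. This proves that $\mu \in \overline{D(\nu)}$.
We now prove by contradiction that $\mu \notin D(\nu)$. Assume that $\mu \in D(\nu)$, i.e., $\mu^{*k} \le_{st} \nu^{*k}$ for some $k \ge 1$. Let $s_k = k a_k$ and $t_k = k b_k$. Fix a sequence $i_1, \dots, i_k$ of non-zero integers. Set $m = \mu_{i_1} * \cdots * \mu_{i_k}$ or $m = \nu_{i_1} * \cdots * \nu_{i_k}$. We know that $\operatorname{supp}(m) \subset (a, b)$, with $a = \sum_{j=1}^{k} a_{i_j}$ and $b = \sum_{j=1}^{k} b_{i_j}$. It is possible to locate $\operatorname{supp}(m)$ precisely using the inequalities (6) and (7).
(a) If $i_j > k$ for some $j$, then $a \ge a_{k+1} > t_k$ and therefore $\operatorname{supp}(m) \subset (t_k, +\infty)$.
(b) If $i_j = k$ for all $j$, then $a = s_k$ and $b = t_k$ and therefore $\operatorname{supp}(m) \subset (s_k, t_k)$.
(c) If $i_j \le k$ for all $j$ and $i_{j_0} < k$ for some $j_0$, then $b \le b_{k-1} + (k-1)b_k < s_k$ and therefore $\operatorname{supp}(m) \subset [0, s_k)$.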
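Consequently,
\[ \mu^{*k}[t_k,+\infty) = \sum_{i_1,\dots,i_k} \gamma_{i_1} \cdots \gamma_{i_k} \, \mu_{i_1} * \cdots * \mu_{i_k}[t_k,+\infty) = \sum_{i_1,\dots,i_k \text{ satisfying (a)}} \gamma_{i_1} \cdots \gamma_{i_k} = \nu^{*k}[t_k,+\infty). \]
Moreover, because of (b) and (c), we get that for $s_k \le t \le t_k$,
\[ \mu^{*k}[t,t_k) = \gamma_k^k \, \mu_k^{*k}[t,t_k) = \gamma_k^k \, \mu_k^{*k}[t,+\infty) \]
and similarly
\[ \nu^{*k}[t,t_k) = \gamma_k^k \, \nu_k^{*k}[t,+\infty). \]
We assumed that $\mu^{*k} \le_{st} \nu^{*k}$, i.e., $\mu^{*k}[t,+\infty) \le \nu^{*k}[t,+\infty)$ for all $t$. If $t \le t_k$, since $\mu^{*k}[t_k,+\infty) = \nu^{*k}[t_k,+\infty)$, we get that $\mu^{*k}[t,t_k) \le \nu^{*k}[t,t_k)$. Since $\gamma_k > 0$, this implies that for all $t \ge s_k$, $\mu_k^{*k}[t,+\infty) \le \nu_k^{*k}[t,+\infty)$. This contradicts the fact that $\mu_k^{*k} \not\le_{st} \nu_k^{*k}$. Therefore $\mu \in \overline{D(\nu)} \setminus D(\nu)$, and so $D(\nu)$ is not closed.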
We now give an example of what can happen if we consider measures with poor integrability properties.
Proposition 3.7. There exists a probability measure $\nu \in \mathcal{P}(\mathbb{R})$ such that the set
\[ \{ \mu \in \mathcal{P}(\mathbb{R}) \text{ s.t. } \mu \le^*_{st} \nu \} \tag{8} \]
is not convex.
The difference between Eq. (8) and our definition of $D(\nu)$ is that here we do not suppose the measures to be exponentially integrable.
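Proof of Proposition 3.7. We rely on the following fact which we already alluded to (see [7], p. 479): there exist two distinct real characteristic functions $\varphi_1$ and $\varphi_2$ such that $\varphi_1^2 = \varphi_2^2$ identically. Consider now the measures $\mu$ and $\nu$ with respective characteristic functions $\varphi_1$ and $\varphi_2$, i.e., $\varphi_1(t) = \int e^{itx} \, d\mu(x)$ and $\varphi_2(t) = \int e^{itx} \, d\nu(x)$. Obviously, we have $\nu \le^*_{st} \nu$ and $\mu \le^*_{st} \nu$ since $\mu^{*2} = \nu^{*2}$. Let $\chi = \frac{1}{2}\mu + \frac{1}{2}\nu$ and let us show that $\chi \not\le^*_{st} \nu$. We have
\[ \chi^{*2n} = \frac{1}{2^{2n}} \sum_{i=0}^{2n} \binom{2n}{i} \mu^{*i} * \nu^{*(2n-i)} = \frac{1}{2^{2n}} \Bigl[ \sum_{i \text{ even}} \binom{2n}{i} \nu^{*2n} + \sum_{i \text{ odd}} \binom{2n}{i} \nu^{*(2n-1)} * \mu \Bigr]. \]
Thus $\chi^{*2n} \le_{st} \nu^{*2n}$ is equivalent to $\nu^{*(2n-1)} * \mu \le_{st} \nu^{*2n}$. Let us show that this is impossible. Indeed, the measures $\nu^{*(2n-1)} * \mu$ and $\nu^{*2n}$ have real characteristic functions and thus they are symmetric probability measures. Note however that two symmetric probability distributions cannot be compared with $\le_{st}$ unless they are equal. But it cannot be that $\nu^{*(2n-1)} * \mu = \nu^{*2n}$ because their characteristic functions are different ($\varphi_1(\xi) = \varphi_2(\xi)$ iff $\varphi_1(\xi) = 0$). A similar argument shows that $\chi^{*(2n+1)} \not\le_{st} \nu^{*(2n+1)}$.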
We conclude this section with a few remarks on a relation which is very similar to $\le^*_{st}$. It is the analogue of catalytic majorization in quantum information theory (see Section 4).
Definition 3.8. Let $\mu, \nu \in \mathcal{P}_{\exp}(\mathbb{R})$. We say that $\mu$ is catalytically stochastically dominated by $\nu$, and write $\mu \le^C_{st} \nu$, if there exists a probability measure $\pi \in \mathcal{P}_{\exp}(\mathbb{R})$ such that $\mu * \pi \le_{st} \nu * \pi$.
The following lemma shows a connection between the two relations.
Lemma 3.9. Let $\mu, \nu \in \mathcal{P}_{\exp}(\mathbb{R})$. Assume $\mu \le^*_{st} \nu$. Then $\mu \le^C_{st} \nu$.
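Proof. Assume that $\mu^{*n} \le_{st} \nu^{*n}$ for some $n$. Let $\pi$ be the probability measure defined by
\[ \pi = \frac{1}{n} \sum_{k=0}^{n-1} \mu^{*k} * \nu^{*(n-1-k)}. \]
Let also $\rho$ be the measure defined by
\[ \rho = \frac{1}{n} \sum_{k=1}^{n-1} \mu^{*k} * \nu^{*(n-k)}; \]
then one has $\mu * \pi = \frac{1}{n}\mu^{*n} + \rho$ and $\nu * \pi = \frac{1}{n}\nu^{*n} + \rho$, and since $\mu^{*n} \le_{st} \nu^{*n}$ this implies $\mu * \pi \le_{st} \nu * \pi$. Since $\pi \in \mathcal{P}_{\exp}(\mathbb{R})$, we get $\mu \le^C_{st} \nu$.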
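The catalyst constructed in this proof is explicit, so it can be computed for the measures of Example 1.4, where $n = 2$ works. The sketch below is an added illustration; the helpers `convolve`, `conv_power`, `st_dominated` and `catalyst` are hypothetical names, measures are dictionaries of atoms with exact rational masses, and $\mu^{*0}$ is taken to be $\delta_0$.

```python
from fractions import Fraction


def convolve(mu, nu):
    """Convolution of two finitely supported measures given as {atom: mass}."""
    out = {}
    for a, p in mu.items():
        for b, q in nu.items():
            out[a + b] = out.get(a + b, 0) + p * q
    return out


def conv_power(mu, k):
    """k-fold convolution power of mu; the 0-th power is the Dirac mass at 0."""
    result = {0: Fraction(1)}
    for _ in range(k):
        result = convolve(result, mu)
    return result


def st_dominated(mu, nu):
    """True if mu is stochastically dominated by nu (tested at all atoms)."""
    def tail(m, t):
        return sum(p for a, p in m.items() if a >= t)
    return all(tail(mu, t) <= tail(nu, t) for t in set(mu) | set(nu))


def catalyst(mu, nu, n):
    """The measure pi of the proof: average of mu^{*k} * nu^{*(n-1-k)}, k < n."""
    pi = {}
    for k in range(n):
        term = convolve(conv_power(mu, k), conv_power(nu, n - 1 - k))
        for atom, mass in term.items():
            pi[atom] = pi.get(atom, 0) + mass / n
    return pi


mu = {0: Fraction(2, 5), 2: Fraction(3, 5)}
nu = {1: Fraction(4, 5), 3: Fraction(1, 5)}
pi = catalyst(mu, nu, 2)            # here pi = (mu + nu) / 2
print(st_dominated(mu, nu))                                # no domination
print(st_dominated(convolve(mu, pi), convolve(nu, pi)))    # catalysed
```

For $n = 2$ the catalyst is simply $\pi = \frac{1}{2}(\mu + \nu)$, and the comparison of $\mu * \pi$ with $\nu * \pi$ reduces to that of $\mu^{*2}$ with $\nu^{*2}$, exactly as in the proof.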
From Theorem 3.3 and Lemma 3.9 one can easily derive
Corollary 3.10. The analogue of Theorem 3.3 is true if we substitute $\le^*_{st}$ with $\le^C_{st}$.
4. Catalytic majorization
This section is dedicated to the study of the majorization relation, the notion which was the initial motivation of this work. The majorization relation provides, much as the stochastic domination for probability measures, a partial order on the set of probability vectors. Originally introduced in linear algebra [3,12], it has found many applications in quantum information theory with the work of Nielsen [13]. We shall not focus on quantum-theoretical aspects of majorization; we refer the interested reader to [1] and references therein. Here, we study majorization by adapting previously obtained results for stochastic domination.
The majorization relation is defined for probability vectors, i.e., vectors $x \in \mathbb{R}^N$ with non-negative components ($x_i \ge 0$) which sum up to one ($\sum_i x_i = 1$). Before defining majorization precisely, let us introduce some notation.
For $d \in \mathbb{N}^*$, let $P_d$ be the set of $d$-dimensional probability vectors: $P_d = \{ x \in \mathbb{R}^d \text{ s.t. } x_i \ge 0,\ \sum x_i = 1 \}$. Consider also the set of finitely supported probability vectors $P_{<\infty} = \bigcup_{d>0} P_d$. We equip $P_{<\infty}$ with the $\ell_1$ norm defined by $\|x\|_1 = \sum_i |x_i|$. For a vector $x \in P_{<\infty}$, we write $x_{\max}$ for the largest component of $x$ and $x_{\min}$ for its smallest non-zero component. In this section we shall consider only finitely supported vectors. For the general case, see Section 6. We shall identify an element $x \in P_d$ with the corresponding element in $P_{d'}$ ($d' > d$) or $P_{<\infty}$ obtained by appending null components at the end of $x$.
Next, we define $x^\downarrow$, the decreasing rearrangement of a vector $x \in P_d$, as the vector which has the same coordinates as $x$ up to permutation and such that $x^\downarrow_i \ge x^\downarrow_{i+1}$ for all $1 \le i < d$. We can now define majorization in terms of the ordered vectors:
Definition 4.1. For $x, y \in P_d$ we say that $x$ is majorized by $y$, and we write $x \prec y$, if for all $k \in \{1, \dots, d\}$
\[ \sum_{i=1}^{k} x^\downarrow_i \le \sum_{i=1}^{k} y^\downarrow_i. \tag{9} \]
Note however that there are several equivalent definitions of majorization which do not use the ordering of the vectors $x$ and $y$ (see [3] for further details):
Proposition 4.2. The following assertions are equivalent:
(1) $x \prec y$,
(2) $\forall t \in \mathbb{R}$, $\sum_{i=1}^{d} |x_i - t| \le \sum_{i=1}^{d} |y_i - t|$,
(3) $\forall t \in \mathbb{R}$, $\sum_{i=1}^{d} (x_i - t)^+ \le \sum_{i=1}^{d} (y_i - t)^+$, where $z^+ = \max(z, 0)$,
(4) there is a bistochastic matrix $B$ such that $x = By$.
There are two operations on probability vectors which are of particular interest to us: the tensor product and the direct sum. For $x = (x_1, \dots, x_d) \in P_d$ and $x' = (x'_1, \dots, x'_{d'}) \in P_{d'}$, we define the tensor product $x \otimes x'$ as the vector $(x_i x'_j)_{ij} \in P_{dd'}$. We also define the direct sum $x \oplus x'$ as the concatenated vector $(x_1, \dots, x_d, x'_1, \dots, x'_{d'}) \in \mathbb{R}^{d+d'}$. Note that if we take $\oplus$-convex combinations, we get probability vectors: $\lambda x \oplus (1-\lambda) x' \in P_{d+d'}$.
The construction which permits us to use tools from stochastic domination in the framework of majorization is the following (inspired by [11]): to a probability vector $z \in P_{<\infty}$ we associate a probability measure $\mu_z$ defined by
\[ \mu_z = \sum_i z_i \, \delta_{\log z_i}. \]
These measures behave well with respect to tensor products:
\[ \mu_{x \otimes y} = \mu_x * \mu_y. \]
The connection between majorization and stochastic domination is provided by the following lemma.
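Lemma 4.3. Let $x, y \in P_{<\infty}$. Assume that $\mu_x \le_{st} \mu_y$. Then $x \prec y$.
Proof. We can assume that $x = x^\downarrow$ and $y = y^\downarrow$. Note that
\[ \mu_x[t,\infty) = \sum_{i : \log x_i \ge t} x_i = \sum_{i : x_i \ge \exp(t)} x_i. \]
Thus, for all $u > 0$, $\sum_{i : x_i \ge u} x_i \le \sum_{i : y_i \ge u} y_i$. To start, use $u = y_1$ to conclude that $x_1 \le y_1$. Notice that it suffices to show that $\sum_{i=1}^{k} x_i \le \sum_{i=1}^{k} y_i$ only for those $k$ such that $x_k > y_k$ (indeed, if $x_k \le y_k$, the $k$th inequality in (9) can be deduced from the $(k-1)$th inequality). Consider such a $k$ and let $x_k > u > y_k$. We get:
\[ \sum_{i=1}^{k} x_i \le \sum_{i : x_i \ge u} x_i \le \sum_{i : y_i \ge u} y_i \le \sum_{i=1}^{k} y_i, \]
which completes the proof of the lemma.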
Remark 4.4. The converse of this lemma does not hold. Indeed, consider $x = (0.5, 0.5)$ and $y = (0.9, 0.1)$. Obviously, $x \prec y$ but $1 = \mu_x[\log 0.5, \infty) > \mu_y[\log 0.5, \infty) = 0.9$ and thus $\mu_x \not\le_{st} \mu_y$.
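Lemma 4.3 and Remark 4.4 can be illustrated on small vectors. The Python sketch below is an addition with hypothetical helper names `majorized`, `mass_above` and `st_dominated`. It uses the identity $\mu_z[\log u, \infty) = \sum_{i : z_i \ge u} z_i$ from the proof above, so no logarithm is actually computed, and it tests the tail inequality only at thresholds $u$ equal to a positive component of either vector, which suffices because both sides are constant between consecutive component values.

```python
from fractions import Fraction


def majorized(x, y):
    """True if x is majorized by y (vectors padded with zeros if needed)."""
    d = max(len(x), len(y))
    xs = sorted(x, reverse=True) + [0] * (d - len(x))
    ys = sorted(y, reverse=True) + [0] * (d - len(y))
    sx = sy = 0
    for xi, yi in zip(xs, ys):
        sx += xi
        sy += yi
        if sx > sy:
            return False
    return True


def mass_above(z, u):
    """Mass given by mu_z to [log u, +infinity): sum of components >= u."""
    return sum(zi for zi in z if zi >= u)


def st_dominated(x, y):
    """True if mu_x is stochastically dominated by mu_y."""
    thresholds = {zi for zi in list(x) + list(y) if zi > 0}
    return all(mass_above(x, u) <= mass_above(y, u) for u in thresholds)


# Vectors of Remark 4.4: majorization holds, stochastic domination fails.
x = [Fraction(1, 2), Fraction(1, 2)]
y = [Fraction(9, 10), Fraction(1, 10)]
print(majorized(x, y), st_dominated(x, y))

# A pair (chosen for illustration) where domination, hence majorization, holds.
u = [Fraction(2, 5), Fraction(3, 10), Fraction(3, 10)]
v = [Fraction(3, 5), Fraction(2, 5)]
print(st_dominated(u, v), majorized(u, v))
```

The second pair is an arbitrary choice made for this sketch; it shows the direction of the implication in Lemma 4.3, while the first pair shows that the converse fails.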
We can describe the majorization relation by the sets
\[ S_d(y) = \{ x \in P_d \text{ s.t. } x \prec y \}, \]
where $y$ is a finitely supported probability vector. Mathematically, such a set is characterized by the following lemma, which is a simple consequence of Birkhoff’s theorem on bistochastic matrices:
Lemma 4.5. For $y$ a $d$-dimensional probability vector, the set $S_d(y)$ is a polytope whose extreme points are $y$ and its permutations.
The initial motivation for our work was the following phenomena discovered in quantum information theory (see [10] and [2], respectively). It turns out that additional vectors can act as catalysts for the majorization relation: there are vectors $x, y, z \in P_{<\infty}$ such that $x \not\prec y$ but $x \otimes z \prec y \otimes z$; in such a situation we say that $x$ is catalytically majorized (or trumped) by $y$ and we write $x \prec_T y$. Another form of catalysis is provided by multiple copies of vectors: we can find vectors $x$ and $y$ such that $x \not\prec y$ but still, for some $n \ge 2$, $x^{\otimes n} \prec y^{\otimes n}$; in this case we write $x \prec_M y$. We thus have two new order relations on probability vectors, analogues of $\le^C_{st}$ and $\le^*_{st}$ respectively. As before, for $y \in P_d$, we introduce the sets
\[ T_d(y) = \{ x \in P_d \text{ s.t. } x \prec_T y \} \]
and
\[ M_d(y) = \{ x \in P_d \text{ s.t. } x \prec_M y \}. \]
It turns out that the relations $\prec_T$ and $\prec_M$ (and thus the sets $T_d(y)$ and $M_d(y)$) are not as simple as $\prec$ and $S_d(y)$.
It is known that the inclusion $M_d(y) \subset T_d(y)$ holds (this is the analogue of Lemma 3.9) and that it can be strict [8].
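A concrete instance of catalysis may help fix ideas. The sketch below is an addition; it checks a classical numerical example from the literature on entanglement catalysis, due to Jonathan and Plenio, with $x = (0.4, 0.4, 0.1, 0.1)$, $y = (0.5, 0.25, 0.25, 0)$ and catalyst $z = (0.6, 0.4)$. The helper names `majorized` and `tensor` are hypothetical, and exact rational arithmetic is used so that ties in the partial sums are handled correctly.

```python
from fractions import Fraction as F


def majorized(x, y):
    """True if x is majorized by y (vectors of equal length)."""
    sx = sy = 0
    for xi, yi in zip(sorted(x, reverse=True), sorted(y, reverse=True)):
        sx += xi
        sy += yi
        if sx > sy:
            return False
    return True


def tensor(x, z):
    """Tensor product of two probability vectors, as a flat list."""
    return [xi * zj for xi in x for zj in z]


x = [F(4, 10), F(4, 10), F(1, 10), F(1, 10)]
y = [F(1, 2), F(1, 4), F(1, 4), F(0)]
z = [F(6, 10), F(4, 10)]

print(majorized(x, y))                          # fails at the second partial sum
print(majorized(tensor(x, z), tensor(y, z)))    # holds once z is attached
```

Here $x \not\prec y$ because $0.4 + 0.4 > 0.5 + 0.25$, while all eight partial sums of $x \otimes z$ are bounded by those of $y \otimes z$, so that $x \prec_T y$.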
In general, the sets $T_d(y)$ and $M_d(y)$ are neither closed nor open, and although $T_d(y)$ is known to be convex, nothing is known about the convexity of $M_d(y)$ (such questions have been intensively studied in the physical literature; see [4,6] and the references therein). As explained in [1], it is natural from a mathematical point of view to introduce the sets $T_{<\infty}(y) = \bigcup_{d \in \mathbb{N}} T_d(y)$ and $M_{<\infty}(y) = \bigcup_{d \in \mathbb{N}} M_d(y)$. A key notion in characterizing them is Schur-convexity:
Definition 4.6. A function $f : P_d \to \mathbb{R}$ is said to be
• Schur-convex if $f(x) \le f(y)$ whenever $x \prec y$,
• Schur-concave if $f(x) \ge f(y)$ whenever $x \prec y$,
• strictly Schur-convex if $f(x) < f(y)$ whenever $x \prec y$ and $x^\downarrow \ne y^\downarrow$,
• strictly Schur-concave if $f(x) > f(y)$ whenever $x \prec y$ and $x^\downarrow \ne y^\downarrow$.
Examples are provided as follows: if $\Phi : \mathbb{R} \to \mathbb{R}$ is a (strictly) convex/concave function, then the function $h : P_d \to \mathbb{R}$ defined by $h(x_1, \dots, x_d) = \Phi(x_1) + \cdots + \Phi(x_d)$ is (strictly) Schur-convex/Schur-concave.
For $x \in P_d$ and $p \in \mathbb{R}$, we define $N_p(x)$ as
\[ N_p(x) = \sum_{\substack{1 \le i \le d \\ x_i > 0}} x_i^p. \]
We will also use the Shannon entropy $H$:
\[ H(x) = -\sum_{i=1}^{d} x_i \log x_i. \]
Note that $-H(x)$ is the derivative of $p \mapsto N_p(x)$ at $p = 1$ and that $N_0(x)$ is the number of non-zero components of the vector $x$. These functions satisfy the following properties:
(1) If $p > 1$, $N_p$ is strictly Schur-convex on $P_{<\infty}$.
(2) If $0 < p < 1$, $N_p$ is strictly Schur-concave on $P_{<\infty}$.
(3) If $p < 0$, $N_p$ is strictly Schur-convex on $P_d$ for any $d$. However, for $p < 0$, it is not possible to compare vectors with a different number of non-zero components.
(4) $H$ is strictly Schur-concave on $P_{<\infty}$.
One possible way of describing the relations≺Mand≺T is to find a family (the smallest possible) of Schur-convex functions which characterizes them. In this direction, Nielsen conjectured the following result:
Conjecture 4.7. Fix a vector $y \in P_d$ with non-zero coordinates. Then $\overline{T_d(y)} = \overline{M_d(y)}$ and they both are equal to the set of $x \in P_d$ satisfying:
(C1) For $p \ge 1$, $N_p(x) \le N_p(y)$.
(C2) For $0 < p \le 1$, $N_p(x) \ge N_p(y)$.
(C3) For $p < 0$, $N_p(x) \le N_p(y)$.
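Conditions (C1)–(C3) are easy to evaluate numerically for given vectors. The following Python sketch is an addition with hypothetical names `N` and `satisfies_c1_c3`; it tests the conditions only at finitely many exponents $p$ and with a small floating-point tolerance, so a positive answer is evidence and not a proof that the conditions hold for all $p$.

```python
def N(p, x):
    """N_p(x): sum of x_i**p over the non-zero components of x."""
    return sum(xi ** p for xi in x if xi > 0)


def satisfies_c1_c3(x, y, grid, tol=1e-12):
    """Test (C1)-(C3) at the exponents in `grid`; the value p = 0 is skipped."""
    for p in grid:
        nx, ny = N(p, x), N(p, y)
        if p >= 1 and nx > ny + tol:
            return False          # (C1) fails at this p
        if 0 < p < 1 and nx < ny - tol:
            return False          # (C2) fails at this p
        if p < 0 and nx > ny + tol:
            return False          # (C3) fails at this p
    return True


# Exponents from -3 to 5 in steps of 0.25 (an arbitrary illustrative grid).
grid = [j / 4 for j in range(-12, 21)]

x = [0.5, 0.5]
y = [0.9, 0.1]
print(satisfies_c1_c3(x, y, grid))   # x is majorized by y
print(satisfies_c1_c3(y, x, grid))   # the reverse direction
```

Since $x \prec y$ for this pair, the Schur-convexity properties listed above guarantee (C1)–(C3) for every $p$, and the grid test is consistent with that; in the reverse direction the test already fails at $p = 2$.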
Here, the closures are taken in $\mathbb{R}^d$ (recall that neither $M_d(y)$ nor $T_d(y)$ is closed). By the previous remarks, any vector in $T_d(y)$ or $M_d(y)$ (and by continuity, also in the closures) must satisfy conditions (C1)–(C3). Recently, Turgut [17] provided a complete characterization of the set $T_d(y)$, which implies in particular that Nielsen’s conjecture is true for $\overline{T_d(y)}$. His method, completely different from ours, consists in solving a discrete approximation of the problem using elementary algebraic techniques. Note however that the inclusion $M_d(y) \subset T_d(y)$ is strict in general, and thus the characterization of $M_d(y)$ is still open. We shall now focus on the set $M_d(y)$. Conjecture 4.7 can be reformulated as follows: if $x, y \in P_d$ satisfy (C1)–(C3), then there exists a sequence $(x_n)$ in $M_d(y)$ such that $(x_n)$ converges to $x$. If we relax the condition that $x_n$ and $y$ have the same dimension, we can prove the following two theorems.
Theorem 4.8. If $x, y \in P_d$ satisfy (C1), then there exists a sequence $(x_n)$ in $M_{<\infty}(y)$ such that $(x_n)$ converges to $x$ in $\ell_1$-norm.
Theorem 4.9. If $x, y \in P_d$ satisfy (C1)–(C2), then there exists a sequence $(x_n)$ in $M_{d+1}(y)$ such that $(x_n)$ converges to $x$.
Since $M_d(y) \subset T_d(y)$, both theorems have direct analogues for $T_{<\infty}(y)$ and $T_{d+1}(y)$ respectively. Theorem 4.8 restates the authors’ previous result in [1]; however, the proof presented in the next section is more transparent than the previous one. Theorem 4.9 answers a question of [1]. It is an intermediate result between Theorem 4.8 and Conjecture 4.7.
5. Proof of the theorems
We show here how to derive Theorems 4.8 and 4.9. We first state a proposition which is the translation of Proposition 2.5 in terms of majorization.
Proposition 5.1. Let $x, y \in P_{<\infty}$. Assume that $x$ and $y$ have non-zero coordinates, and respective dimensions $d_x$ and $d_y$. Assume that:
(1) $x_{\min} < y_{\min}$.
(2) $x_{\max} < y_{\max}$.
(3) $H(x) > H(y)$.
(4) $N_p(x) < N_p(y)$ for all $p \in \, ]1, +\infty[$.
(5) $N_p(x) > N_p(y)$ for all $p \in \, ]-\infty, 1[$.
Then there exists an integer $N$ such that for all $n \ge N$, we have $x^{\otimes n} \prec y^{\otimes n}$.
It is important to notice that since $N_0(x) = d_x$ and $N_0(y) = d_y$, the conditions of the proposition can be satisfied only when $d_x > d_y$. This is the main reason why our approach fails to prove Conjecture 4.7.
Proof. One checks that the probability measures $\mu_x$ and $\mu_y$ associated to the vectors $x$ and $y$ satisfy the hypotheses of Proposition 2.5. Indeed, for $p \in \mathbb{R}$, one has
\[ N_p(x) = \int e_\lambda \, d\mu_x, \quad \text{with } \lambda = p - 1. \]
As $\mu_x^{*n} = \mu_{x^{\otimes n}}$, there exists an integer $N$ such that for $n \ge N$, we have $\mu_{x^{\otimes n}} \le_{st} \mu_{y^{\otimes n}}$. It remains to apply Lemma 4.3 in order to complete the proof.
The main idea used in the following proofs is to slightly modify the vector $x$ so that the couple $(x, y)$ satisfies the hypotheses of Proposition 5.1.