DOI: 10.1051/ps/2009003 · www.esaim-ps.org

ENTROPIC PROJECTIONS AND DOMINATING POINTS

Christian Léonard 1

Abstract. Entropic projections and dominating points are solutions to convex minimization problems related to conditional laws of large numbers. They appear in many areas of applied mathematics such as statistical physics, information theory, mathematical statistics, ill-posed inverse problems or large deviation theory. By means of convex conjugate duality and functional analysis, criteria are derived for the existence of entropic projections, generalized entropic projections and dominating points. Representations of the generalized entropic projections are obtained. It is shown that they are the "measure component" of the solutions to some extended entropy minimization problem. This approach leads to new results and offers a unifying point of view. It also makes it possible to extend previous results on the subject by removing unnecessary topological restrictions. As a by-product, new proofs of already known results are provided.

Mathematics Subject Classification. 60F10, 60F99, 60G57, 46N10.

Received January 11, 2008. Revised January 12, 2009.

1. Introduction

Entropic projections and dominating points are solutions to convex minimization problems related to con- ditional laws of large numbers. They appear in many areas of applied mathematics such as statistical physics, information theory, mathematical statistics, ill-posed inverse problems or large deviation theory.

1.1. Conditional laws of large numbers

Suppose that the empirical measures

L_n := (1/n) Σ_{i=1}^n δ_{Z_i},  n ≥ 1,  (1.1)

of the Z-valued random variables Z_1, Z_2, ... (δ_z is the Dirac measure at z) obey a Large Deviation Principle (LDP) in the set P_Z of all probability measures on Z with the rate function I. This approximately means that P(L_n ∈ A) ≍ exp[−n inf_{P∈A} I(P)] as n → ∞, for A ⊂ P_Z. With regular enough subsets A and C of P_Z, we can expect that for "all" A,

lim_{n→∞} P(L_n ∈ A | L_n ∈ C) = { 1, if A ∋ P; 0, otherwise }  (1.2)

Keywords and phrases. Conditional laws of large numbers, random measures, large deviations, entropy, convex optimization, entropic projections, dominating points, Orlicz spaces.

1 Modal-X, Université Paris Ouest, Bât. G, 200 av. de la République, 92000 Nanterre, France; leonard@u-paris10.fr

Article published by EDP Sciences. © EDP Sciences, SMAI 2010.


where P is a minimizer of I on C. To see this, remark that (formally) P(L_n ∈ A | L_n ∈ C) ≍ exp[−n(inf_{P∈A∩C} I(P) − inf_{P∈C} I(P))] as n → ∞. A rigorous statement is given at Theorem 7.1.

If (Z_1, ..., Z_n) is exchangeable for each n, then (1.2) is equivalent to

L(Z_1 | L_n ∈ C) → P as n → ∞,

which means that conditionally on L_n ∈ C, the law of any tagged "particle" (here we chose the first one) tends to P as n tends to infinity.

If I is strictly convex and C is convex, P is unique and (1.2) roughly means that conditionally on L_n ∈ C, as n tends to infinity L_n tends to the solution P of the minimization problem

minimize I(P) subject to P ∈ C,  P ∈ P_Z.  (1.3)

Such conditional Laws of Large Numbers (LLN) appear in information theory and in statistical physics where they are often called Gibbs conditioning principles (see [9], Sect. 7.3 and the references therein). If the variables Z_i are independent and identically distributed with law R, the LDP for the empirical measures is given by Sanov's theorem and the rate function I is the relative entropy

I(P|R) = ∫_Z log(dP/dR) dP,  P ∈ P_Z.

Instead of the empirical probability measure of a random sample, one can consider another kind of random measure. Let z_1, z_2, ... be deterministic points in Z such that the empirical measure (1/n) Σ_{i=1}^n δ_{z_i} converges to R ∈ P_Z. Let W_1, W_2, ... be a sequence of independent random real variables. The random measure of interest is

L_n = (1/n) Σ_{i=1}^n W_i δ_{z_i}  (1.4)

where the W_i's are interpreted as random weights. If the weights are independent copies of W, as n tends to infinity, L_n tends to the deterministic measure EW.R and obeys the LDP in the space M_Z of measures on Z with rate function I(Q) = ∫_Z γ*(dQ/dR) dR, Q ∈ M_Z, where γ* is the Cramér transform of the law of W. In case the W_i's are not identically distributed, but have a law which depends (continuously) on z_i, one can again show that under additional assumptions L_n obeys the LDP in M_Z with rate function

I(Q) = { ∫_Z γ*_z(dQ/dR(z)) R(dz), if Q ≺ R; +∞, otherwise },  Q ∈ M_Z  (1.5)

where γ*_z is the Cramér transform of W_z. As γ* is the convex conjugate of the log-Laplace transform of W, it is a convex function: I is a convex integral functional which is often called an entropy. Again, conditional LLNs hold for L_n and lead to the entropy minimization problem:

minimize I(Q) subject to Q ∈ C,  Q ∈ M_Z.  (1.6)

The large deviations of these random measures and their conditional LLNs enter the framework of Maximum Entropy in the Mean (MEM) which has been studied among others by Csiszár et al. [7], Dacunha-Castelle and Gamboa [8], Gamboa and Gassiat [12] and Najim [21], and also [9] (Thm. 7.2.3). This problem also arises in the context of statistical physics. It has been studied among others by Boucher et al. [3] and Ellis et al. [11].

The relative entropy corresponds to γ*(t) = t log t − t + 1 in (1.5): the problem (1.3) is a special case of (1.6) with the additional constraint that Q(Z) = 1.
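To make this concrete, here is a one-line verification (added for the reader) that the integrand γ*(t) = t log t − t + 1 reduces to the relative entropy under the unit mass constraint: writing Q = fR with Q(Z) = 1,

```latex
\int_{\mathcal Z}\gamma^*\!\Big(\frac{\mathrm{d}Q}{\mathrm{d}R}\Big)\,\mathrm{d}R
  = \int_{\mathcal Z}\big(f\log f - f + 1\big)\,\mathrm{d}R
  = \int_{\mathcal Z}\log\frac{\mathrm{d}Q}{\mathrm{d}R}\,\mathrm{d}Q
    + \big(R(\mathcal Z) - Q(\mathcal Z)\big)
  = I(Q|R),
```

since R(Z) = Q(Z) = 1 when both R and Q are probability measures.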


In this paper, the constraint set C is assumed to be convex, as is almost always done in the literature on the subject. This allows us to rely on convex analysis, saddle-point theory or on the geometric theory of projection on convex sets.

The convex indicator of a subset A of X is denoted by

ι_A(x) = { 0, if x ∈ A; +∞, otherwise },  x ∈ X.  (1.7)

Note that although ι_A is a convex function if and only if A is a convex set, we call it a convex indicator in any case to differentiate it from the probabilistic indicator 1_A.

1.2. Examples

We give a short list of popular entropies. They are described at (1.5) and characterized by their integrand γ*.

(1) The relative entropy: γ*(t) = t log t − t + 1 + ι_{t≥0}.
(2) The reverse relative entropy: γ*(t) = t − log t − 1 + ι_{t>0}.
(3) The Fermi-Dirac entropy: γ*(t) = (1/2)[(1+t) log(1+t) + (1−t) log(1−t)] + ι_{−1≤t≤1}.
(4) The L^p norm (1 < p < ∞): γ*(t) = |t|^p/p and
(5) the L^p entropy (1 < p < ∞): γ*(t) = t^p/p + ι_{t≥0}.

Note that the reverse relative entropy is P ↦ I(R|P).

In the case of the relative and reverse relative entropies, the global minimizer is R and one can interpret I(P|R) and I(R|P) as some type of distances between P and R. Note also that the positivity of the minimizers of (1.6) is guaranteed by {γ* < ∞} ⊂ [0, ∞). Consequently, the restriction that P is a probability measure is ensured by the unit mass constraint P(Z) = 1 alone.

A very interesting Bayesian interpretation of (1.6) is obtained in [8,12] in terms of the LDPs for L_n defined at (1.4). The above entropies correspond to iid weights W distributed as follows:

(1) the relative entropy: Law(W) = Poisson(1);
(2) the reverse relative entropy: Law(W) = Exponential(1);
(3) the Fermi-Dirac entropy: Law(W) = (δ_{−1} + δ_{+1})/2;
(4) the L^2 norm: Law(W) = Normal(0,1).
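These correspondences can be checked numerically (a sketch added here, not part of the original text; it assumes SciPy is available): the Cramér transform of a weight law W is the convex conjugate of its log-Laplace transform, and for Poisson(1) and Normal(0,1) it should reproduce the relative-entropy and L² integrands listed above.

```python
import math
from scipy.optimize import minimize_scalar

def cramer(log_laplace, t, bound=50.0):
    """Cramér transform: gamma*(t) = sup_s { s*t - log_laplace(s) }."""
    res = minimize_scalar(lambda s: log_laplace(s) - s * t,
                          bounds=(-bound, bound), method="bounded")
    return -res.fun  # negate: we minimized the negative of the supremand

# W ~ Poisson(1): log E[e^{sW}] = e^s - 1, Cramér transform t*log t - t + 1.
for t in [0.5, 1.0, 2.0, 3.7]:
    assert abs(cramer(lambda s: math.exp(s) - 1, t)
               - (t * math.log(t) - t + 1)) < 1e-6

# W ~ Normal(0,1): log E[e^{sW}] = s^2/2, Cramér transform t^2/2 (L^2 norm case).
for t in [-1.5, 0.0, 2.0]:
    assert abs(cramer(lambda s: s * s / 2, t) - t * t / 2) < 1e-6
```

The supremum is attained at s = log t in the Poisson case and at s = t in the Gaussian case, which is why a bounded one-dimensional search suffices.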

1.3. Entropic and generalized entropic projections

The minimizers of (1.6) are called entropic projections. It may happen that, even though the minimum is not attained, every minimizing sequence converges to some measure Q* which does not belong to C. This intriguing phenomenon was discovered by Csiszár [5]. Such a Q* is called a generalized entropic projection.

In the special case where I is the relative entropy, Csiszár obtained existence results in [4] together with dual equalities. His proofs are based on geometric properties of the relative entropy; no convex analysis is needed. Based on the same geometric ideas, he later obtained in [5] a powerful Gibbs conditioning principle for noninteracting particles. For general entropies as in (1.5), he studies the problem of existence of entropic and generalized entropic projections in [6].

The minimization problem (1.6) is interesting in its own right, even when conditional LLNs are not at stake.

The literature on this subject is huge; see for instance [13,29] and the references therein.

1.4. The extended minimization problem

As will be seen, it is worth introducing an extension of (1.6) to take the generalized projections into account.

A solution to (1.6) is in A_Z: the space of all Q ∈ M_Z which are absolutely continuous with respect to R. We consider an extension Ī of the entropy I to a vector space L_Z which is the direct sum L_Z = A_Z ⊕ S_Z of A_Z and a vector space S_Z of singular linear forms (acting on numerical functions) which may not be σ-additive.


Any ℓ in L_Z is uniquely decomposed into ℓ = ℓ^a + ℓ^s with ℓ^a ∈ A_Z and ℓ^s ∈ S_Z, and Ī has the following shape:

Ī(ℓ) = I(ℓ^a) + I^s(ℓ^s)

where I^s is a positively homogeneous function on S_Z. See (2.9) for the precise description of Ī. For instance, the extended relative entropy is

Ī(ℓ|R) = I(ℓ^a|R) + sup{⟨ℓ^s, u⟩; u, ∫_Z e^u dR < ∞},  ℓ ∈ L_Z,

and actually ⟨ℓ^s, u⟩ = 0 for any ℓ such that Ī(ℓ) < ∞ and any u such that ∫_Z e^{a|u|} dR < ∞ for all a > 0. The reverse entropy, L¹-norm and L¹-entropy also admit nontrivial extensions. On the other hand, the extensions of the Fermi-Dirac, L^p-norm and L^p-entropy with p > 1 are trivial: {k ∈ S_Z; I^s(k) < ∞} = {0}.

The extended problem is

minimize Ī(ℓ) subject to ℓ ∈ C,  ℓ ∈ L_Z.  (1.8)

In fact, Ī is chosen to be the largest convex lower semicontinuous extension of I to L_Z with respect to some weak topology. This guarantees tight relations between (1.6) and (1.8). In particular, one can expect that their values are equal for a large class of convex sets C.

Even if I is strictly convex, Ī isn't strictly convex in general since I^s is positively homogeneous, so that (1.8) may admit several minimizers.

Examples will be given of interesting situations where (1.6) is not attained in A_Z while (1.8) is attained in L_Z.

1.5. Dominating points

Let the constraint set C be described by

C = {Q ∈ M_Z; TQ ∈ C}  (1.9)

where C is a subset of a vector space X and T: M_Z → X is a linear operator. As a typical example, one can think of

TQ = ∫_Z θ(z) Q(dz)  (1.10)

where θ: Z → X is some function and the integral should be taken formally for the moment. With L_n given at (1.1) or (1.4), if T is regular enough, we obtain by the contraction principle that X_n := TL_n = (1/n) Σ_{i=1}^n θ(Z_i) ∈ X or X_n := TL_n = (1/n) Σ_{i=1}^n W_i θ(z_i) obeys the LDP in X with rate function J(x) = inf{I(Q); Q ∈ M_Z, TQ = x}, x ∈ X. Once again, the conditional LLN for X_n is of the form: For "all" A ⊂ X,

lim_{n→∞} P(X_n ∈ A | X_n ∈ C) = { 1, if A ∋ x*; 0, otherwise }

where x* is a solution to the minimization problem

minimize J(x) subject to x ∈ C,  x ∈ X.  (1.11)

The minimizers of (1.11) are called dominating points. This notion was introduced by Ney [22,23] in the special case where (Z_i)_{i≥1} is an iid sequence in Z = R^d and θ is the identity, i.e. X_n = (1/n) Σ_{i=1}^n Z_i. Later, Einmahl and Kuelbs [10,14] have extended this study to a Banach space Z. In this iid case, J is the Cramér transform of the law of Z_1.


1.6. Presentation of the results

We treat the problems of existence of entropic projections and dominating points in a unified way, taking advantage of the mapping TQ = x. Hence, we mainly concentrate on the entropic projections and then transport the results to the dominating points.

It will be proved at Proposition 5.4 that the entropic projection exists on C if the supporting hyperplanes of C are directed by sufficiently integrable functions. When the supporting hyperplanes are not integrable enough, the representation of the generalized projection is still available and given at Theorem 5.6.

It will appear that the generalized projection is the "measure" part of the minimizers of (1.8). For instance, with the relative entropy I(·|R), the projection exists in C if its supporting hyperplanes are directed by functions u such that ∫_Z e^{α|u|} dR < ∞ for all α > 0, see Proposition 5.7. If these u only satisfy ∫_Z e^{α|u|} dR < ∞ for some α > 0, the projection may not exist in C, but the generalized projection is computable: its Radon-Nikodym derivative with respect to R is characterized at Proposition 5.8.

We find again some already known results of Csiszár [5,6] and of Einmahl and Kuelbs [10,14], with different proofs and a new point of view. The representations of the generalized projections are new results. The conditions on C to obtain dominating points are improved, and an interesting phenomenon noticed in [14] is clarified at Remark 6.8 by connecting it with the generalized entropic projection.

Finally, a probabilistic interpretation of the singular components of the generalized projections is proposed at Section 7. It is obtained in terms of conditional LLNs.

The main results are Theorems 4.1, 4.4, 5.6 and 6.7.

1.7. Outline of the paper

At Section 2, we give precise formulations of the entropy minimization problems (1.6) and (1.8). Then we recall at Theorems 2.7 and 2.8 results from [19] about the existence and uniqueness of the solutions of these problems, related dual equalities and the characterizations of their solutions in terms of integral representations.

Examples of standard entropies and constraints are presented at Section 3.

We show at Theorem 4.4 in Section 4 that under "critical" constraints, although the problem (1.6) may not be attained, its minimizing sequences may converge in some sense to some measure Q*: the generalized entropic projection.

Section 5 is mainly a restatement of Sections 2 and 4 in terms of entropic projections. The results are also stated explicitly for the special important case of the relative entropy.

Section 6 is devoted to dominating points. As they are continuous images of entropic projections, the main results of this section are corollaries of the results of Section 5.

At Section 7, we illustrate our results in terms of conditional LLNs, see (1.2). In particular, the generalized projections and singular components of the minimizers of (1.8) are interpreted in terms of these conditional LLNs.

1.8. Notation

Let X and Y be topological vector spaces. The algebraic dual space of X is X*, the topological dual space of X is X'. The topology of X weakened by Y is σ(X, Y) and we write ⟨X, Y⟩ to specify that X and Y are in separating duality.

Let f: X → [−∞, +∞] be an extended numerical function. Its convex conjugate with respect to ⟨X, Y⟩ is

f*(y) = sup_{x∈X} {⟨x, y⟩ − f(x)} ∈ [−∞, +∞],  y ∈ Y.  (1.12)

Its subdifferential at x with respect to ⟨X, Y⟩ is

∂_Y f(x) = {y ∈ Y; f(x + ξ) ≥ f(x) + ⟨y, ξ⟩, ∀ξ ∈ X}.

If no confusion occurs, we write ∂f(x).

(6)

Let A be a subset of X; its intrinsic core is icor A = {x ∈ A; ∀x' ∈ aff A, ∃t > 0, [x, x + t(x' − x)[ ⊂ A} where aff A is the affine space spanned by A. Let us denote dom f = {x ∈ X; f(x) < ∞} the effective domain of f and icordom f the intrinsic core of dom f.

The convex indicator ι_A of a subset A of X is defined at (1.7).

We write

I_ϕ(u) := ∫_Z ϕ(z, u(z)) R(dz) = ∫_Z ϕ(u) dR

and I = I_{γ*} for short, instead of (1.5).

The inf-convolution of f and g is f□g(z) = inf{f(x) + g(y); x, y: x + y = z}. An index of notation is provided at the end of the article.

2. Minimizing entropy under convex constraints

In this section, the main results of [19] are recalled. They are Theorems 2.7 and 2.8 below and their statements necessitate the notion of Orlicz spaces. First, we recall basic definitions and notions about these function spaces, which are natural extensions of the standard L^p spaces.

2.1. Orlicz spaces

The fact that the generalized projection may not belong to C is connected with some properties of the Orlicz spaces associated with I. Let us recall some basic definitions and results. A standard reference is [24].

A set Z is furnished with a σ-finite nonnegative measure R on a σ-field which is assumed to be R-complete.

A function ρ: Z × R → [0, ∞] is said to be a Young function if for R-almost every z, ρ(z, ·) is a convex even [0, ∞]-valued function on R such that ρ(z, 0) = 0 and there exists a measurable function z ↦ s_z > 0 such that 0 < ρ(z, s_z) < ∞.

In the sequel, every numerical function on Z is supposed to be measurable.

Definition 2.1 (the Orlicz spaces L_ρ and E_ρ). The Orlicz space associated with ρ is defined by L_ρ = {u: Z → R; ‖u‖_ρ < +∞} where the Luxemburg norm ‖·‖_ρ is defined by

‖u‖_ρ = inf{β > 0; ∫_Z ρ(z, u(z)/β) R(dz) ≤ 1}

and R-a.e. equal functions are identified. Hence,

L_ρ = {u: Z → R; ∃α_o > 0, ∫_Z ρ(z, α_o u(z)) R(dz) < ∞}.

A subspace of interest is

E_ρ = {u: Z → R; ∀α > 0, ∫_Z ρ(z, α u(z)) R(dz) < ∞}.

Taking ρ_p(z, s) = |s|^p/p with p ≥ 1 gives L_{ρ_p} = E_{ρ_p} = L^p and the corresponding Luxemburg norm is ‖·‖_{ρ_p} = p^{−1/p} ‖·‖_p where ‖·‖_p is the usual L^p norm.

The convex conjugate ρ* of a Young function ρ is still a Young function, so that L_{ρ*} is also an Orlicz space.

Hölder's inequality in Orlicz spaces is

⟨u, v⟩ ≤ 2 ‖u‖_ρ ‖v‖_{ρ*},  u ∈ L_ρ, v ∈ L_{ρ*}.  (2.1)

For instance, with ρ*_p(t) = |t|^q/q, 1/p + 1/q = 1, (2.1) is ⟨u, v⟩ ≤ 2 p^{−1/p} q^{−1/q} ‖u‖_p ‖v‖_q. Note that 2 p^{−1/p} q^{−1/q} ≥ 1 with equality when p = 2.


Theorem 2.2 (representation of E'_ρ). Suppose that ρ is a finite Young function. Then, the dual space of E_ρ is isomorphic to L_{ρ*}, the Orlicz space associated with the Young function ρ* which is the convex conjugate of ρ.

A continuous linear form ℓ ∈ L'_ρ is said to be singular if for all u ∈ L_ρ, there exists a decreasing sequence of measurable sets (A_n) such that R(∩_n A_n) = 0 and for all n ≥ 1, ⟨ℓ, u 1_{Z\A_n}⟩ = 0.

Proposition 2.3. Let us assume that ρ is finite. Then, ℓ ∈ L'_ρ is singular if and only if ⟨ℓ, u⟩ = 0 for all u in E_ρ.

Let us denote respectively L_{ρ*}R = {fR; f ∈ L_{ρ*}} and L^s_ρ the subspaces of L'_ρ of all absolutely continuous and singular forms.

Theorem 2.4 (representation of L'_ρ). Let ρ be any Young function. The dual space of L_ρ is isomorphic to the direct sum L'_ρ = L_{ρ*}R ⊕ L^s_ρ. This implies that any ℓ ∈ L'_ρ is uniquely decomposed as

ℓ = ℓ^a + ℓ^s  (2.2)

with ℓ^a ∈ L_{ρ*}R and ℓ^s ∈ L^s_ρ.

In the decomposition (2.2), ℓ^a is called the absolutely continuous part of ℓ while ℓ^s is its singular part.

The function ρ is said to satisfy the Δ₂-condition if

there exist κ > 0, s_o ≥ 0 such that ∀s ≥ s_o, ∀z ∈ Z, ρ_z(2s) ≤ κ ρ_z(s).  (2.3)

When R is bounded, in order that E_ρ = L_ρ, it is enough that ρ satisfies the Δ₂-condition. Consequently, in this situation we have L'_ρ = L_{ρ*}R so that L^s_ρ reduces to the null vector space.

We shall see below that the important case of the relative entropy leads to a Young function ρ which doesn't satisfy the Δ₂-condition; non-trivial singular components will appear.
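A small numeric probe (added here as an illustration) makes the Δ₂ dichotomy concrete: for the power Young function ρ(s) = |s|^q/q the ratio ρ(2s)/ρ(s) is the constant 2^q, so (2.3) holds, while for λ̄(s) = e^{|s|} − |s| − 1 arising from the relative entropy the ratio grows roughly like e^s, so (2.3) fails:

```python
import math

def ratio(rho, s):
    """The Delta_2 quantity rho(2s)/rho(s)."""
    return rho(2 * s) / rho(s)

power = lambda s: abs(s) ** 3 / 3                    # rho_3: satisfies Delta_2
exp_young = lambda s: math.exp(abs(s)) - abs(s) - 1  # relative-entropy case

# Power case: the ratio is identically 2^3 = 8, so Delta_2 holds with kappa = 8.
for s in [0.1, 1.0, 10.0]:
    assert abs(ratio(power, s) - 8.0) < 1e-9

# Exponential case: rho(2s)/rho(s) ~ e^s is unbounded, so Delta_2 fails.
assert ratio(exp_young, 5.0) > 100
assert ratio(exp_young, 20.0) > ratio(exp_young, 5.0) * 1000
```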

2.2. The assumptions

Let R be a positive measure on a space Z and take a [0, ∞]-valued measurable function γ* on Z × R to define I as at (1.5). In order to define the constraint in a way similar to (1.10), take a function

θ: Z → X_o

where X_o is the algebraic dual space of some vector space Y_o. Let us collect the assumptions on R, γ*, θ and C.

2.3. Assumptions (A).

(A_R) It is assumed that the reference measure R is a bounded positive measure on a space Z endowed with some R-complete σ-field.

(A_C) C is a convex subset of X_o.

(A_{γ*}) Assumptions on γ*.
(1) γ*(·, t) is z-measurable for all t and for R-almost every z ∈ Z, γ*(z, ·) is a lower semicontinuous strictly convex [0, +∞]-valued function on R.
(2) It is also assumed that for R-almost every z ∈ Z, γ*(z, ·) attains a unique minimum denoted by m(z), the minimum value is γ*_z(m(z)) = 0, ∀z ∈ Z, and there exist a(z), b(z) > 0 such that 0 < γ*(z, m(z) + a(z)) < ∞ and 0 < γ*(z, m(z) − b(z)) < ∞.
(3) ∫_Z γ*((1+α)m) dR + ∫_Z γ*((1−α)m) dR < ∞, for some α > 0.

The function γ_z is the convex conjugate of the lower semicontinuous convex function γ*_z = (γ*_z)**. Defining

λ(z, s) = γ(z, s) − m(z)s,  z ∈ Z, s ∈ R,  (2.4)

we see that for R-a.e. z, λ_z is a nonnegative convex function and it vanishes at s = 0.


(A_θ) Assumptions on θ.
(1) for any y ∈ Y_o, the function z ∈ Z ↦ ⟨y, θ(z)⟩ ∈ R is measurable;
(2) for any y ∈ Y_o, ⟨y, θ(·)⟩ = 0, R-a.e. implies that y = 0;
(∃) ∀y ∈ Y_o, ∃α > 0, ∫_Z λ̄(α⟨y, θ⟩) dR < ∞.

Since

λ̄(z, s) = max[λ(z, s), λ(z, −s)] ∈ [0, ∞],  z ∈ Z, s ∈ R,  (2.5)

is a Young function, one can consider the corresponding Orlicz spaces L_{λ̄} and L_{λ̄*} where λ̄*(z, ·) is the convex conjugate of λ̄(z, ·).

Remark 2.5 (some comments about these assumptions).

(1) Assuming R to be σ-finite is standard when working with integral functionals. Here, R is assumed to be bounded for simplicity.

(2) Thanks to the theory of normal integrands [25], γ* = (γ*)** is jointly measurable. The function m is measurable since its graph {(z, u): u = m(z)} is measurable. Indeed, it is the (z, u)-projection of the intersection of the horizontal hyperplane u = 0 with the measurable graph {(t, z, u): γ*_z(t) = u} of γ*. Note that we use the uniqueness assumption in (A²_{γ*}) to avoid a measurable selection argument. As a consequence, λ is a jointly measurable function.

(3) The assumption (A³_{γ*}) implies that m is in the Orlicz space L_{λ̄*}, see (2.5). In turn, the centered form ℓ − mR is in L_{λ̄*}R whenever ℓ is. This will be used later without warning. It also follows from (2.4) that γ*_z(t) = λ*_z(t − m(z)). These two facts allow us to write

I_{γ*}(fR) = I_{λ*}(fR − mR),  f ∈ L_{λ̄*}

where λ*_z is the convex conjugate of λ_z.

(4) The assumption (A^∃_θ) is equivalent to ⟨y, θ(·)⟩ ∈ L_{λ̄} for all y ∈ Y_o, see (2.7) below.

2.4. Examples

We consider the entropies which were introduced after (1.7) and give the corresponding functions λ and λ̄.

(1) The relative entropy: λ(s) = e^s − s − 1, λ̄(s) = λ(|s|).
(2) The reverse relative entropy: λ(s) = −log(1 − s) − s + ι_{s<1}, λ̄(s) = λ(|s|).
(3) The Fermi-Dirac entropy: λ(s) = λ̄(s) = log cosh s.
(4) The L^p norm (1 < p < ∞): λ(s) = λ̄(s) = |s|^q/q with 1/p + 1/q = 1 and
(5) the L^p entropy (1 < p < ∞): λ(s) = { 0, if s ≤ 0; s^q/q, if s ≥ 0 }, λ̄(s) = |s|^q/q.

2.5. The entropy minimization problems (P_C) and (P̄_C)

Because of Remark 2.5(3), under (A³_{γ*}) the effective domain of I given at (1.5) is included in L_{λ̄*}R and the entropy functional to be considered is defined by

I(fR) = ∫_Z γ*(f) dR,  f ∈ L_{λ̄*}.  (2.6)

Assuming (A^∃_θ), that is

⟨Y_o, θ(·)⟩ ⊂ L_{λ̄},  (2.7)

Hölder's inequality in Orlicz spaces, see (2.1), allows us to define the constraint operator T: ℓ ∈ L'_{λ̄} ↦ ⟨θ, ℓ⟩ ∈ X_o by:

⟨y, ⟨θ, ℓ⟩⟩_{Y_o, X_o} = ⟨⟨y, θ⟩, ℓ⟩_{L_{λ̄}, L'_{λ̄}},  ∀y ∈ Y_o.  (2.8)

If ℓ = Q ∈ L_{λ̄*}R ⊂ M_Z, one writes TQ = ⟨θ, Q⟩ = ∫_Z θ dQ to mean (2.8); this is an X_o-valued weak integral.


Example 2.6 (moment constraint). This is the easiest constraint to think of. Let θ = (θ_k)_{1≤k≤K} be a measurable function from Z to X_o = R^K. The moment constraint is specified by the operator

∫_Z θ dℓ = (∫_Z θ_k dℓ)_{1≤k≤K} ∈ R^K,

which is defined for each ℓ ∈ M_Z which integrates all the real valued measurable functions θ_k.

The minimization problem (1.6) becomes

minimize I(Q) subject to ∫_Z θ dQ ∈ C,  Q ∈ L_{λ̄*}R  (P_C)

where C is a convex subset of X_o. The extended entropy is defined by

Ī(ℓ) = I(ℓ^a) + I^s(ℓ^s),  ℓ ∈ L'_{λ̄}  (2.9)

where, using the notation of Theorem 2.4,

I^s(ℓ^s) = ι*_{dom I_γ}(ℓ^s) = sup{⟨ℓ^s, u⟩; u ∈ L_{λ̄}, I_γ(u) < ∞} ∈ [0, ∞].  (2.10)

It is proved in [18] that Ī is the greatest convex σ(L'_{λ̄}, L_{λ̄})-lower semicontinuous extension of I to L'_{λ̄} ⊃ L_{λ̄*}R.

The associated extended minimization problem is

minimize Ī(ℓ) subject to ⟨θ, ℓ⟩ ∈ C,  ℓ ∈ L'_{λ̄}.  (P̄_C)

2.6. Good and critical constraints

If the Young function λ̄ doesn't satisfy the Δ₂-condition (2.3), as for instance with the relative entropy and the reverse relative entropy (see items (1) and (2) of the examples above), the small Orlicz space E_{λ̄} may be a proper subset of L_{λ̄}. Consequently, for some functions θ, the integrability property

⟨Y_o, θ(·)⟩ ⊂ E_{λ̄}  (2.11)

or equivalently

∀y ∈ Y_o, ∫_Z λ̄(⟨y, θ⟩) dR < ∞  (A^∀_θ)

may not be satisfied while the weaker property

∀y ∈ Y_o, ∃α > 0, ∫_Z λ̄(α⟨y, θ⟩) dR < ∞  (A^∃_θ)

which is equivalent to (2.7), holds. In this situation, analytical complications occur, see Section 4. This is the reason why constraints satisfying (A^∀_θ) are called good constraints, while constraints satisfying (A^∃_θ) but not (A^∀_θ) are called critical constraints.

2.7. Definitions of Y, X, T^⋆, Γ* and (D_C)

These objects will be necessary to state the relevant dual problems. The general hypotheses (A) are assumed.

The space Y. Because of the hypotheses (A²_θ) and (A^∃_θ), Y_o can be identified with the subspace ⟨Y_o, θ(·)⟩ of L_{λ̄}. The space Y is the extension of Y_o which is isomorphic to the ‖·‖_{λ̄}-closure of ⟨Y_o, θ(·)⟩ in L_{λ̄}.


The space X. The topological dual space of Y is X = Y' ⊂ X_o. X is identified with L'_{λ̄}/ker T. Since λ̄ is finite, we can apply Proposition 2.3. It tells us that under the assumption (A^∀_θ), TL^s_{λ̄} = {0} so that X ≅ L_{λ̄*}R/ker T.

The operator T^⋆. Let us define the adjoint T^⋆: X' → L''_{λ̄} for all ω ∈ X' by: ⟨ℓ, T^⋆ω⟩_{L'_{λ̄}, L''_{λ̄}} = ⟨Tℓ, ω⟩_{X, X'}, ∀ℓ ∈ L'_{λ̄}. We have the inclusions Y_o ⊂ Y ⊂ X'. The adjoint operator T* is the restriction of T^⋆ to Y_o. With some abuse of notation, we still denote T^⋆y = ⟨y, θ⟩ for y ∈ Y. This can be interpreted as a dual bracket between X*_o and X_o since T^⋆y = ⟨ỹ, θ⟩ R-a.e. for some ỹ ∈ X*_o.

The function Γ. The basic dual problem associated with (P_C) and (P̄_C) is

maximize inf_{x∈C} ⟨y, x⟩ − Γ(y),  y ∈ Y_o

where

Γ(y) = I_γ(⟨y, θ⟩),  y ∈ Y_o.  (2.12)

Let us denote

Γ*(x) = sup_{y∈Y_o} {⟨y, x⟩ − I_γ(⟨y, θ⟩)},  x ∈ X_o  (2.13)

its convex conjugate. It is shown in [16] (Sect. 4) that dom Γ* ⊂ X.

The dual problem (D_C). Another dual problem associated with (P_C) and (P̄_C) is

maximize inf_{x∈C∩X} ⟨y, x⟩ − I_γ(⟨y, θ⟩),  y ∈ Y.  (D_C)

2.8. Solving (P_C)

In this section, we study (P_C) under the good constraint hypothesis (A^∀_θ) which imposes that T^⋆Y ⊂ E_{λ̄} and T(L_{λ̄*}R) ⊂ X.

The extended dual problem (D̄_C). The extended dual problem associated with (P_C) is

maximize inf_{x∈C∩X} ⟨ω, x⟩ − I_γ(⟨ω, θ⟩),  ω ∈ Ỹ  (D̄_C)

where Ỹ ⊂ X' is some cone which contains Y_o. Its exact definition is given at Appendix A.

Theorem 2.7. Suppose that

(1) the hypotheses (A) and (A^∀_θ) are satisfied;

(2) the convex set C is assumed to be such that

T⁻¹C ∩ L_{λ̄*}R = ∩_{y∈Y} {fR ∈ L_{λ̄*}R; ∫_Z ⟨y, θ⟩ f dR ≥ a_y}  (2.14)

for some subset Y ⊂ X*_o with ⟨y, θ⟩ ∈ E_{λ̄} for all y ∈ Y and some function y ∈ Y ↦ a_y ∈ R. In other words, T⁻¹C ∩ L_{λ̄*}R is a σ(L_{λ̄*}R, E_{λ̄})-closed convex subset of L_{λ̄*}R.

Then:

(a) The dual equality for (P_C) is

inf (P_C) = sup (D_C) = sup (D̄_C) = inf_{x∈C} Γ*(x) ∈ [0, ∞].

(b) If C ∩ dom Γ* ≠ ∅ or equivalently C ∩ T dom I ≠ ∅, then (P_C) admits a unique solution Q̂ in L_{λ̄*}R and any minimizing sequence (Q_n)_{n≥1} converges to Q̂ with respect to the topology σ(L_{λ̄*}R, L_{λ̄}).

Figure 1. [Figure: the level set {I = const.} "tangent" to T⁻¹C, the reference measure R, the minimizer Q̂ and the normal direction T^⋆ω̃.]

Suppose that in addition C ∩ icordom Γ* ≠ ∅ or equivalently C ∩ icor(T dom I) ≠ ∅.

(c) Let us define x̂ := ∫_Z θ dQ̂ in the weak sense with respect to the duality ⟨Y_o, X_o⟩. There exists ω̃ ∈ Ỹ such that

(a) x̂ ∈ C ∩ dom Γ*
(b) ⟨ω̃, x̂⟩_{X', X} ≤ ⟨ω̃, x⟩_{X', X}, ∀x ∈ C ∩ dom Γ*
(c) Q̂(dz) = γ'_z(⟨ω̃, θ(z)⟩) R(dz).
(2.15)

Furthermore, Q̂ ∈ L_{λ̄*}R and ω̃ ∈ Ỹ satisfy (2.15) if and only if Q̂ solves (P_C) and ω̃ solves (D̄_C).

(d) Of course, (2.15)(c) implies

x̂ = ∫_Z θ γ'(⟨ω̃, θ⟩) dR  (2.16)

in the weak sense. Moreover,
1. x̂ minimizes Γ* on C;
2. I(Q̂) = Γ*(x̂) = ∫_Z γ* ∘ γ'(⟨ω̃, θ⟩) dR < ∞; and
3. I(Q̂) + ∫_Z γ(⟨ω̃, θ⟩) dR = ∫_Z ⟨ω̃, θ⟩ dQ̂.

Proof. This result is [19] (Thm. 3.2).

Figure 1 illustrates items (a) and (b) of (2.15) which, with TQ̂ = x̂ and TQ = x ∈ C, can be rewritten: Q̂ ∈ T⁻¹C and ⟨T^⋆ω̃, Q − Q̂⟩ ≥ 0, ∀Q ∈ T⁻¹C: the shaded area.

As R is the unconstrained minimizer of I, the level sets of I form an increasing family of convex sets which expand from R. One sees that the hyperplane {Q: ⟨T^⋆ω̃, Q − Q̂⟩ = 0} separates the convex set T⁻¹C and the convex level set of I which is "tangent" to T⁻¹C. Informally, the gradient of I at Q̂ is a normal vector of T⁻¹C at Q̂: it must be a multiple of T^⋆ω̃. Expressing this orthogonality relation by means of the convex conjugation (2.13) leads to the representation formula (2.15)(c). Clearly, the Hahn-Banach theorem has something to do with the existence of T^⋆ω̃. This picture is a guide for the proof in [19].

Following the terminology of Ney [22,23], as it shares the properties (2.15)(a,b) and (2.16), the minimizer x̂ is called a dominating point of C with respect to Γ* (see Def. 6.1 below).
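On a finite space, the representation formula (2.15)(c) and the dual equality can be observed directly. The sketch below (an illustrative toy example assuming SciPy is available; the 5-point space and its numbers are hypothetical, not from the paper) takes γ*(t) = t log t − t + 1, so γ(s) = e^s − 1 and γ'(s) = e^s, with a single moment constraint ∫ θ dQ = x: the primal minimizer should coincide with the exponential tilt e^{ω̃θ} built from the dual maximizer ω̃.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

rng = np.random.default_rng(1)
theta = rng.normal(size=5)            # theta : Z -> R on a 5-point space Z
r = np.full(5, 0.2)                   # reference measure R (uniform weights)
x = 0.3                               # moment constraint: int theta dQ = x

gamma_star = lambda f: f * np.log(f) - f + 1

# Primal (P_C): minimize sum gamma*(f) dR over densities f with int theta f dR = x.
primal = minimize(lambda f: np.sum(gamma_star(f) * r), np.ones(5),
                  bounds=[(1e-9, None)] * 5, tol=1e-12,
                  constraints=[{"type": "eq",
                                "fun": lambda f: np.sum(theta * f * r) - x}])

# Dual: maximize omega*x - int (e^{omega*theta} - 1) dR over omega in R.
dual = minimize_scalar(lambda w: np.sum((np.exp(w * theta) - 1) * r) - w * x,
                       bounds=(-20, 20), method="bounded")
w_opt = dual.x

# Representation (2.15)(c): dQ/dR = gamma'(omega*theta) = e^{omega*theta}.
assert np.allclose(primal.x, np.exp(w_opt * theta), atol=1e-3)
# Dual equality: inf (P_C) = sup of the dual.
assert abs(primal.fun - (-dual.fun)) < 1e-5
```

The dual here is a smooth concave one-dimensional problem, which is why the entropic projection can be recovered from a single scalar search.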

2.9. Solving (P̄_C)

In this section, we study (P̄_C) under the critical constraint hypothesis (A^∃_θ) which imposes that T^⋆Y ⊂ L_{λ̄}.

By Theorem 2.4, we have L''_ρ = [L_ρ ⊕ L^s_{ρ*}] ⊕ L'^s_ρ. For any ζ ∈ L''_ρ = (L_{ρ*}R ⊕ L^s_ρ)', let us denote the restrictions ζ₁ = ζ_{|L_{ρ*}R} and ζ₂ = ζ_{|L^s_ρ}. Since (L_{ρ*}R)' ≅ L_ρ ⊕ L^s_{ρ*}, we see that any ζ ∈ L''_ρ is uniquely decomposed into

ζ = ζ₁^a + ζ₁^s + ζ₂  (2.17)

with ζ₁ = ζ₁^a + ζ₁^s, ζ₁^a ∈ L_ρ, ζ₁^s ∈ L^s_{ρ*} and ζ₂ ∈ L'^s_ρ.


The extended dual problem (D̄_C). The extended dual problem associated with (P̄_C) is

maximize inf_{x∈C∩X} ⟨ω, x⟩ − I_γ([T^⋆ω]₁^a),  ω ∈ Ỹ  (D̄_C)

where Ỹ ⊂ X' is some cone which contains Y_o. Its exact definition is given at Appendix A.

Theorem 2.8. Suppose that

(1) the hypotheses (A) are satisfied,

(2) the convex set C is assumed to be such that

T⁻¹C ∩ L'_{λ̄} = ∩_{y∈Y} {ℓ ∈ L'_{λ̄}; ⟨⟨y, θ⟩, ℓ⟩ ≥ a_y}  (2.18)

for some subset Y ⊂ X*_o with ⟨y, θ⟩ ∈ L_{λ̄} for all y ∈ Y and some function y ∈ Y ↦ a_y ∈ R. In other words, T⁻¹C is a σ(L'_{λ̄}, L_{λ̄})-closed convex subset of L'_{λ̄}.

Then:

(a) The dual equality for (P̄_C) is

inf (P̄_C) = inf_{x∈C} Γ*(x) = sup (D_C) = sup (D̄_C) ∈ [0, ∞].

(b) If C ∩ dom Γ* ≠ ∅ or equivalently C ∩ T dom Ī ≠ ∅, then (P̄_C) admits solutions in L'_{λ̄}, any minimizing sequence admits σ(L'_{λ̄}, L_{λ̄})-cluster points and every such point is a solution to (P̄_C).

Suppose that in addition we have C ∩ icordom Γ* ≠ ∅ or equivalently C ∩ icor(T dom Ī) ≠ ∅. Then:

(c) Let ℓ̂ ∈ L'_{λ̄} be any solution to (P̄_C) and denote x̂ := Tℓ̂. There exists ω̄ ∈ Ỹ such that

(a) x̂ ∈ C ∩ dom Γ*
(b) ⟨ω̄, x̂⟩_{X', X} ≤ ⟨ω̄, x⟩_{X', X}, ∀x ∈ C ∩ dom Γ*
(c) ℓ̂ ∈ γ'_z([T^⋆ω̄]₁^a) R + D([T^⋆ω̄]₂)
(2.19)

where we used notation (2.17) and D([T^⋆ω̄]₂) is some cone in L^s_{λ̄} which is pointed at zero and whose direction depends on [T^⋆ω̄]₂ (see Appendix A for its precise definition).

There exists some ω̃ ∈ X*_o such that

[T^⋆ω̄]₁^a = ⟨ω̃, θ(·)⟩_{X*_o, X_o}

is a measurable function which can be approximated in some sense (see Appendix A) by sequences (⟨y_n, θ(·)⟩)_{n≥1} with y_n ∈ Y_o and ∫_Z λ̄(⟨y_n, θ⟩) dR < ∞ for all n.

Furthermore, ℓ̂ ∈ L'_{λ̄} and ω̄ ∈ Ỹ satisfy (2.19) if and only if ℓ̂ solves (P̄_C) and ω̄ solves (D̄_C).

(d) Of course, (2.19)(c) implies x̂ = ∫_Z θ γ'(⟨ω̃, θ⟩) dR + ⟨θ, ℓ̂^s⟩. Moreover,
1. x̂ minimizes Γ* on C,
2. Ī(ℓ̂) = Γ*(x̂) = ∫_Z γ* ∘ γ'(⟨ω̃, θ⟩) dR + sup{⟨ℓ̂^s, u⟩; u ∈ dom I_γ} < ∞ and
3. Ī(ℓ̂) + ∫_Z γ(⟨ω̃, θ⟩) dR = ∫_Z ⟨ω̃, θ⟩ dℓ̂^a + ⟨[T^⋆ω̄]₂, ℓ̂^s⟩.

Proof. This result is [19] (Thm. 4.2).

The exact definitions of Y, Ỹ and D, as well as the precise statement of Theorem 2.8(c), are given at Appendix A. In particular, the complete statement of Theorem 2.8(c) is given at Theorem A.1.

Figure 2 illustrates (2.19) which, with x̂ = Tℓ̂, can be rewritten: ℓ̂ ∈ T⁻¹C and ⟨T^⋆ω̄, ℓ − ℓ̂⟩ ≥ 0, ∀ℓ ∈ T⁻¹C.

As in Figure 1, one sees that the hyperplane {ℓ: ⟨T^⋆ω̄, ℓ − ℓ̂⟩ = 0} separates the convex set T⁻¹C and the convex level set of Ī which is "tangent" to T⁻¹C: the shaded area. The same kind of conclusions follow.

Figure 2. [Figure: the level set {Ī = const.} "tangent" to T⁻¹C, the reference measure R, the solution set Q̂ + D of (P̄_C) and the normal direction T^⋆ω̄.]

The measure Q̂ = ℓ̂^a = γ'_z([T^⋆ω̄]₁^a) R is the absolutely continuous part of ℓ̂; its expression is given at (4.1) below. The set of solutions ℓ̂ of (P̄_C) is represented by the bold type segment; they all share the same absolutely continuous part Q̂. The relation (2.19)(c) can be rewritten: ℓ̂ ∈ Q̂ + D, which is the convex cone with vertex Q̂ and direction D. The cone D only contains singular directions. One sees with (2.9) and (2.10) that D contributes to the positively homogeneous part I^s of Ī. This is the reason why we decided to draw a flat part for the level lines of Ī when they cut Q̂ + D.

Figure 2 is only a guide for illustrating Theorem 2.8. It shouldn't be taken too seriously. For instance, a better finite dimensional analogue would have been given by Ī(x, y, z) = x² + |y + z|, which requires a 3D graphical representation.

3. Some examples

3.1. Some examples of entropies

Important examples of entropies occur in statistical physics, probability theory and mathematical statistics.

Among them the relative entropy plays a prominent role.

Relative entropy

The reference measure R is assumed to be a probability measure. The relative entropy of Q ∈ M_Z with respect to R ∈ P_Z is

I(Q|R) = { ∫_Z log(dQ/dR) dQ, if Q ≺ R and Q ∈ P_Z; +∞, otherwise },  Q ∈ M_Z.  (3.1)

It corresponds to

γ*_z(t) = { t log t − t + 1, if t > 0; 1, if t = 0; +∞, if t < 0 },

m(z) = 1 and

λ_z(s) = e^s − s − 1,  s ∈ R, z ∈ Z.  (3.2)


A variant

Taking the same γ* and removing the unit mass constraint gives

H(Q|R) = { ∫_Z (dQ/dR log(dQ/dR) − dQ/dR + 1) dR, if 0 ≤ Q ≺ R; +∞, otherwise },  Q ∈ M_Z.

This entropy is the rate function of (1.4) when (W_i)_{i≥1} is an iid sequence of Poisson(1) random weights. If R is σ-finite, it is the rate function of the LDP of normalized Poisson random measures, see [15].

Extended relative entropy

Since λ(s) = e^s − s − 1 and R ∈ P_Z is a bounded measure, we have λ̄(s) = τ(s) := e^{|s|} − |s| − 1 and the relevant Orlicz spaces are

L_{τ*} = {f: Z → R; ∫_Z |f| log |f| dR < ∞},
E_τ = {u: Z → R; ∀α > 0, ∫_Z e^{α|u|} dR < ∞},
L_τ = {u: Z → R; ∃α > 0, ∫_Z e^{α|u|} dR < ∞},

since τ*(t) = (|t| + 1) log(|t| + 1) − |t|. The extended relative entropy is defined by

Ī(ℓ|R) = I(ℓ^a|R) + sup{⟨ℓ^s, u⟩; u, ∫_Z e^u dR < ∞},  ℓ ∈ O_exp  (3.3)

where ℓ = ℓ^a + ℓ^s is the decomposition into absolutely continuous and singular parts of ℓ in L'_τ = L_{τ*}R ⊕ L^s_τ, and

O_exp = {ℓ ∈ L'_τ; ℓ ≥ 0, ⟨ℓ, 1⟩ = 1}.  (3.4)

Note that O_exp depends on R and that for all ℓ ∈ O_exp, ℓ^a ∈ P_Z ∩ L_{τ*}R.

3.2. Some examples of constraints

Let us consider the two standard constraints which are the moment constraints and the marginal constraints.

Moment constraints

Let θ = (θ_k)_{1≤k≤K} be a measurable function from Z to X_o = R^K. The moment constraint is specified by the operator

∫_Z θ dℓ = (∫_Z θ_k dℓ)_{1≤k≤K} ∈ R^K,

which is defined for each ℓ ∈ M_Z which integrates all the real valued measurable functions θ_k. The adjoint operator is

T*y = ⟨y, θ⟩ = Σ_{1≤k≤K} y_k θ_k,  y = (y_1, ..., y_K) ∈ R^K.
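For this moment-constraint setting, the defining relation (2.8) of the adjoint, ⟨y, Tℓ⟩ = ⟨⟨y, θ⟩, ℓ⟩, can be checked mechanically on a finite space (a toy numerical example added here; all sizes and values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
K, n = 3, 7
theta = rng.normal(size=(n, K))       # theta = (theta_k): Z -> R^K, |Z| = n
f = rng.uniform(0.1, 1.0, size=n)     # density of ell = f.R
r = np.full(n, 1.0 / n)               # reference measure R (uniform)

T_ell = theta.T @ (f * r)             # T ell = (int theta_k d ell)_k in R^K

y = rng.normal(size=K)
lhs = y @ T_ell                       # <y, T ell> in R^K
rhs = np.sum((theta @ y) * f * r)     # <T* y, ell> = int <y, theta> d ell
assert np.isclose(lhs, rhs)           # adjoint relation (2.8) holds exactly
```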
