HAL Id: hal-00460631

https://hal.archives-ouvertes.fr/hal-00460631

Submitted on 1 Mar 2010


Inference in phi-families of distributions

Bruno Pelletier

To cite this version:

Bruno Pelletier. Inference in phi-families of distributions. Statistics: A Journal of Theoretical and Applied Statistics, 2011, 45 (3), pp. 223-236. ⟨hal-00460631⟩


INFERENCE IN ϕ-FAMILIES OF DISTRIBUTIONS

Bruno PELLETIER

Institut de Mathématiques et de Modélisation de Montpellier
UMR CNRS 5149, Equipe de Probabilités et Statistique

Université Montpellier II, CC 051

Place Eugène Bataillon, 34095 Montpellier Cedex 5, France
pelletier@math.univ-montp2.fr

Abstract

This paper is devoted to the study of the parametric family of multivariate distributions obtained by minimizing a convex functional under linear constraints. Under certain assumptions on the convex functional, it is established that this family admits an affine parametrization, and parametric estimation from an i.i.d. random sample is studied. It is also shown that the members of this family are the limit distributions arising in inference based on empirical likelihood. As a consequence, given a probability measure µ0 and an i.i.d. random sample drawn from µ0, nonparametric confidence domains on the generalized moments of µ0 are obtained.

Index Terms— Parametric statistics, maximum entropy, ϕ-divergence, empirical likelihood, generalized moment.

AMS 2000 Classification: 62F10, 62G05.

1 Introduction

Exponential families of distributions cover a large number of useful distributions, and their properties have long been studied. It is well known that an exponential family of distributions may be derived by maximizing the entropy under several moment constraints. The entropy, also called the relative entropy or the Shannon entropy I(µ), of a probability measure µ on a space X is defined by

    I(µ) = −∫_X log(dµ/dµ0)(x) µ0(dx),

where µ0 is a reference measure. In this definition, the entropy may take infinite values when µ is not absolutely continuous with respect to µ0.
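On a finite space the integral above reduces to a finite sum, which makes the definition easy to check numerically. A minimal sketch (the three-point space and the weights are illustrative choices, not from the paper):

```python
import math

def entropy(mu, mu0):
    """I(mu) = -sum_x log(mu(x)/mu0(x)) mu0(x), for measures on a finite
    space given as lists of weights, with mu absolutely continuous w.r.t. mu0."""
    if any(m == 0 and m0 > 0 for m, m0 in zip(mu, mu0)):
        return -math.inf  # dmu/dmu0 vanishes on a mu0-charged point
    return -sum(m0 * math.log(m / m0) for m, m0 in zip(mu, mu0) if m0 > 0)

mu0 = [0.2, 0.3, 0.5]                       # reference probability measure
print(entropy(mu0, mu0) == 0.0)             # True: the entropy vanishes at mu = mu0
print(entropy([0.5, 0.3, 0.2], mu0) > 0)    # True, by Jensen's inequality
```

For a probability measure µ, Jensen's inequality gives I(µ) ≥ −log µ(X) = 0, with equality only at µ = µ0, which the two printed checks illustrate.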

The negative entropy, i.e. −I(µ), is a convex functional of its argument µ. Several other types of (negative) entropy-like convex functionals have been defined and used mainly in the context of linear inverse problems and moment problems (Borwein and Lewis, 1991, 1993a, 1993b; Dacunha-Castelle and Gamboa, 1990; Decarreau et al., 1992; Gamboa and Gassiat, 1997). In these problems, the objective is to reconstruct an unknown measure µ from the observation y of generalized moments of µ, or Φ-moments of µ, i.e., the data y is related to µ by

    y = ∫_X Φ(x) µ(dx),    (1.1)

where Φ is a known map from X to R^k. Recovering the measure µ from the data y is an ill-posed inverse problem in the sense that a solution may not exist for every y in R^k (e.g., in the case of perturbed data), and if a solution exists, it may neither be unique nor depend continuously on the data. In the field of inverse problems, regularization methods are very popular to cope with these issues. In particular, regularization by entropy amounts to minimizing a negative entropy-like convex functional Iϕ(µ) over all measures µ subject to the constraint (1.1).

The convex functional Iϕ is defined by

    Iϕ(µ) = ∫_X ϕ(dµ/dµ0 (x)) µ0(dx),    (1.2)

where ϕ is a convex function on R. Under certain conditions on ϕ and the data y, Borwein and Lewis (1991, 1993a, 1993b) have shown that the problem of minimizing Iϕ(µ) subject to the constraint (1.1) admits a unique solution µ̂ which may be written as

    µ̂ = ϕ*′(⟨ω, Φ(x)⟩) µ0,    (1.3)

where ϕ*′ is the derivative of the Fenchel-Legendre transform ϕ* of ϕ, and where ω is a vector of scalar parameters obtained as the unique solution to a dual optimization problem.

The present paper focuses on the family of probability measures which are of the form (1.3), further referred to as a ϕ-family. These measures also arise as the limit distributions in inference based on empirical likelihood, under certain conditions on the function ϕ which turn the functional (1.2) into a ϕ-divergence (Liese and Vajda, 1987; Keziou, 2003; Broniatowski and Keziou, 2006; Pardo, 2006).

To see this, let µ0 be a probability measure, and suppose that we are interested in µ0 only through its Φ-moment y0 = ∫_X Φ(x) µ0(dx). Basically, the method of empirical likelihood introduced in Owen (1988, 2001) amounts to minimizing the Kullback-Leibler divergence K(µ; Pn) between the empirical measure Pn of the random sample and a measure µ ≪ Pn satisfying the constraints of the model. In this setting, the statistic

    Tn(y) = inf{ K(µ; Pn) : µ ≪ Pn and ∫_X Φ(x) µ(dx) = y }    (1.4)

is used to test for y0 as well as to construct a nonparametric confidence domain on y0. Recently, several authors (Keziou, 2003; Broniatowski, 2004; Bertail, 2006; Broniatowski and Keziou, 2006) have proposed to use other convex statistical divergences of the form (1.2) in lieu of the Kullback-Leibler divergence. This leads to alternative statistics of the form (1.4) which are intimately related to the ϕ-family considered herein. Indeed, as exposed further in the paper, for a feasible y, the infimum in (1.4) is attained by a random discrete measure which converges to a member of the ϕ-family, i.e., a probability measure of the form (1.3).

The paper is organized as follows. The ϕ-family of distributions is introduced in Section 2. In Section 3, we show that the ϕ-family admits an affine parametrization. Section 4 is devoted to the estimation of the affine parameter of a member of the family from an i.i.d. random sample. In Section 5, we show that the ϕ-family is the limit family of distributions arising in empirical likelihood. Next, nonparametric confidence domains on the Φ-moment of the underlying probability measure are derived. Technical results are postponed to an Appendix at the end of the paper.

2 Notation and definitions

Let (X, µ0) be a finite measure space, where X is a measurable subset of R^d. Let Φ1, …, Φk be k functions in L2(X, µ0) such that the maps 1, Φ1, …, Φk are linearly independent. We shall denote by Φ = (Φ1, …, Φk) the map X → R^k, and by Φ̃ = (1, Φ1, …, Φk) the map X → R^{k+1}. The set of finite measures and the set of probability measures on X will be denoted respectively by M(X) and M₊¹(X).

Let ϕ : R → R ∪ {+∞} be an extended function satisfying the following assumption.

Assumption 1

(i) dom(ϕ) = (0, +∞);

(ii) ϕ is strictly convex and essentially smooth;

(iii) lim_{x→∞} ϕ(x)/x = κ ∈ (0, +∞];

(iv) ϕ is C² on the interior of dom(ϕ).

We recall that a proper convex function ϕ is said to be essentially smooth if it is differentiable on the interior of its domain, supposed non-empty, and if |∇ϕ(xi)| → ∞ whenever xi is a sequence converging to a boundary point of dom(ϕ) (Rockafellar, 1970, Chap. 26). Note that since dom(ϕ) = (0, +∞), we have ϕ(x) = +∞ for all x < 0, and that the Fenchel-Legendre transform of ϕ, further denoted by ϕ*, may be written as

    ϕ*(u) = sup_{x≥0} { xu − ϕ(x) }.

From this definition, it follows that ϕ* is monotone increasing, so that its derivative ϕ*′ ≥ 0. Under conditions (i) and (iii), we have dom(ϕ*) = (−∞, κ). The essential smoothness of ϕ implies that ϕ* is strictly convex. At last, ϕ*′ is invertible, with (ϕ*′)⁻¹ = ϕ′.
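These conjugacy properties can be checked numerically on a concrete ϕ. The sketch below uses ϕ(x) = x log x − x + 1 (the entropy function of Example 2.1 later in the paper, for which ϕ′(x) = log x and ϕ*′(u) = eᵘ), with a brute-force grid evaluation of the supremum defining ϕ*:

```python
import math

phi = lambda x: x * math.log(x) - x + 1   # strictly convex, dom(phi) = (0, inf)
phi_prime = lambda x: math.log(x)
phi_star_prime = lambda u: math.exp(u)    # closed form of (phi*)' for this phi

def phi_star(u):
    # brute-force sup_{x > 0} { x*u - phi(x) } over a fine grid
    return max(x * u - phi(x) for x in (i / 1000.0 for i in range(1, 20000)))

# (phi*)' inverts phi': phi*'(phi'(x)) = x
for x in [0.5, 1.0, 3.0]:
    assert abs(phi_star_prime(phi_prime(x)) - x) < 1e-12

# the grid supremum agrees with the closed form phi*(u) = exp(u) - 1
assert abs(phi_star(0.7) - (math.exp(0.7) - 1)) < 1e-3
```

The same grid check can be run for any ϕ satisfying Assumption 1, provided the grid covers the maximizer ϕ*′(u).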

As explained in the Introduction, the aim of this paper is to study the family of measures minimizing the convex functional Iϕ defined in (1.2) under the moment constraints (1.1). Solutions to this problem have been obtained by Borwein and Lewis (1991) (see also Borwein and Lewis, 1993a, 1993b). More precisely, we have the following result.

Theorem 2.1 Let ϕ be a strictly convex function satisfying Assumption 1, and let ỹ ∈ R^{k+1}. Consider the following primal problem:

    Minimize Iϕ(µ) := ∫_X ϕ(dµ/dµ0 (x)) µ0(dx)
    subject to µ ∈ M(X), µ ≪ µ0, and ∫_X Φ̃(x) µ(dx) = ỹ.

Suppose that there exists at least one solution µ̄ with Iϕ(µ̄) finite. Let ū be the unique solution of the dual problem:

    Maximize ⟨ỹ, u⟩ − ∫_X ϕ*(⟨u, Φ̃(x)⟩) µ0(dx)
    subject to u ∈ R^{k+1}.

Suppose that ess sup ⟨ū, Φ̃(x)⟩ < κ. Then the unique optimal solution of the primal problem is given by

    µ̄ = ϕ*′(⟨ū, Φ̃(x)⟩) µ0,

with dual attainment.

We are now in a position to define the ϕ-family of probability measures. To this aim, consider the parametric family F̃ of finite measures on X defined by

    F̃ = { µ̃ξ̃ := ϕ*′(⟨ξ̃, Φ̃(x)⟩) µ0 ; ξ̃ ∈ Ξ̃ },    (2.1)

where

    Ξ̃ = { ξ̃ ∈ R^{k+1} : ess sup ⟨ξ̃, Φ̃(x)⟩ < κ },    (2.2)

and where the essential supremum is taken with respect to µ0. For all ξ̃ in Ξ̃, the Radon-Nikodym derivative of µ̃ξ̃ with respect to µ0 is in L∞(X, µ0) by Lemma A.1. Then we define the ϕ-family F as the set of probability measures in F̃, i.e., we set

    F = F̃ ∩ M₊¹(X).    (2.3)

Some examples of possible choices for the convex function ϕ satisfying Assumption 1 are provided below.

Example 2.1 Consider the function ϕ defined by

    ϕ(x) = x log(x) − x + 1 if x > 0,
           1                if x = 0,
           +∞               if x < 0.

We have dom(ϕ) = (0, +∞) and κ = +∞. The convex conjugate of ϕ is given by ϕ*(u) = exp(u) − 1, and dom(ϕ*) = R. Then ϕ*′(u) = exp(u), and the family F is therefore an exponential family. Also, in this case the functional Iϕ corresponds to the Kullback-Leibler divergence when restricted to probability measure arguments.


Example 2.2 Consider the function ϕ defined by

    ϕ(x) = 2(√x − 1)² if x ≥ 0,
           +∞          if x < 0.

We have dom(ϕ) = (0, +∞) and κ = 2. The convex conjugate of ϕ is given by

    ϕ*(u) = 2u/(2 − u) if u < 2,
            +∞          if u ≥ 2.

We have dom(ϕ*) = (−∞, 2), and ϕ*′(u) = 4/(2 − u)² on (−∞, 2). When restricted to probability measure arguments, Iϕ corresponds to the Hellinger distance.
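The closed forms in this example can be cross-checked by evaluating the supremum defining ϕ* on a grid (a rough numerical sketch; the grid bounds and test points are arbitrary choices):

```python
import math

phi = lambda x: 2 * (math.sqrt(x) - 1) ** 2   # Example 2.2, kappa = 2
phi_star_exact = lambda u: 2 * u / (2 - u)    # valid for u < 2
phi_star_prime_exact = lambda u: 4 / (2 - u) ** 2

def phi_star_grid(u, xmax=50.0, step=1e-3):
    # brute-force sup_{x >= 0} { x*u - phi(x) }
    n = int(xmax / step)
    return max(i * step * u - phi(i * step) for i in range(n + 1))

for u in [-1.0, 0.0, 1.0]:
    assert abs(phi_star_grid(u) - phi_star_exact(u)) < 1e-3
# the derivative of phi* equals the maximizer x*(u) = 4/(2-u)^2
assert abs(phi_star_prime_exact(1.0) - 4.0) < 1e-12
```

At u = 1, for instance, the maximizer is x* = 4 and ϕ*(1) = 2, which the grid supremum reproduces to grid accuracy.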

3 Parametrization of F

Consider the set S̃ of Φ̃-moments of the measures in F̃, i.e.,

    S̃ = { ∫_X Φ̃(x) µ̃ξ̃(dx) : ξ̃ ∈ Ξ̃ }.    (3.1)

Theorem 3.1 Suppose that Assumption 1 holds. The map Ψ̃ : Ξ̃ → S̃ defined by

    Ψ̃(ξ̃) = ∫_X Φ̃(x) µ̃ξ̃(dx)

is a diffeomorphism from Ξ̃ to S̃.

Proof. Clearly Ψ̃ is surjective, and it is differentiable by Lemma A.2. Now we proceed to show that Ψ̃ is injective. Consider the map U : Ξ̃ → R defined by

    U(ξ̃) = ∫_X ϕ*(⟨ξ̃, Φ̃(x)⟩) µ0(dx).

Note that U(ξ̃) is well-defined for all ξ̃ ∈ Ξ̃ by Lemma A.1, and differentiable by Lemma A.2. Then the Φ̃-moments of µ̃ξ̃ are obtained by differentiating U, i.e., we have

    Ψ̃(ξ̃) = ∫_X Φ̃(x) µ̃ξ̃(dx) = ∇U(ξ̃).


Furthermore, given u, v ∈ Ξ̃ and α ∈ (0, 1), we have

    U(αu + (1−α)v) = ∫_X ϕ*( α⟨u, Φ̃⟩ + (1−α)⟨v, Φ̃⟩ ) µ0(dx) < αU(u) + (1−α)U(v),

since ϕ* is strictly convex. Hence U is strictly convex. Consequently, the gradient map ξ̃ ↦ ∇U(ξ̃) is injective, and so is Ψ̃.

There remains to show that Ψ̃⁻¹ is differentiable. To this aim, consider the map H : Ξ̃ × S̃ → R^{k+1} defined by

    H(ξ̃; ỹ) = ∇U(ξ̃) − ỹ,

so that Ψ̃⁻¹(ỹ) is the unique solution (in ξ̃) of the equation H(ξ̃, ỹ) = 0. Differentiating H with respect to ξ̃, we obtain

    ∂_{ξ̃i} Hj(ξ̃; ỹ) = ∫_X Φ̃i(x) Φ̃j(x) ϕ*′′(⟨ξ̃, Φ̃(x)⟩) µ0(dx),

where (Hj)_{j=1,…,k+1} are the components of H. Note that, for all ξ̃ ∈ Ξ̃, the integral above is finite, since the map x ↦ ϕ*′′(⟨ξ̃, Φ̃(x)⟩) is in L∞(X, µ0) by Lemma A.1, and since the components of Φ̃ are in L2(X, µ0). Furthermore, ϕ*′′ is strictly positive by the strict convexity of ϕ*, so that the matrix [∂_{ξ̃i} Hj(ξ̃; ỹ)]_{i,j} is the Gram matrix of the scalar products of the maps 1, Φ1, …, Φk w.r.t. the measure ϕ*′′(⟨ξ̃, Φ̃(x)⟩) µ0. Since these maps are linearly independent, this matrix is positive-definite. Consequently, for all (ξ̃, ỹ), D_ξ̃ H(ξ̃, ỹ) is an invertible linear map. The continuity and differentiability of Ψ̃⁻¹ then follow from the Implicit Function Theorem.

Now let

    Ξ = { ξ̃ ∈ Ξ̃ : µ̃ξ̃(X) = 1 },    (3.2)

and let iΞ : Ξ → R^{k+1} be the canonical embedding of Ξ in R^{k+1}. Then we may rewrite the family F as

    F = { µξ := ϕ*′(⟨iΞ(ξ), Φ̃(x)⟩) µ0 : ξ ∈ Ξ }.    (3.3)

Let

    S = { ∫_X Φ(x) µξ(dx) : ξ ∈ Ξ }.    (3.4)

As an immediate consequence of the theorem above, we obtain the following result.

Theorem 3.2 Suppose that Assumption 1 holds. The map Ψ : Ξ → S defined by

    Ψ(ξ) = ∫_X Φ(x) µξ(dx)

is a diffeomorphism from Ξ to S.

We are now in a position to provide an affine parametrization of the family F.

Theorem 3.3 Suppose that Assumption 1 holds. There exist a unique subset Θ of R^k diffeomorphic to Ξ and a unique differentiable map g : Θ → R such that

    F = { µθ := ϕ*′(g(θ) + ⟨θ, Φ(x)⟩) µ0 ; θ ∈ Θ }.

Proof. Let us write ξ̃ ∈ Ξ̃ ⊂ R^{k+1} as ξ̃ = (α, β), with α ∈ R and β ∈ R^k, so that

    ϕ*′(⟨ξ̃, Φ̃(x)⟩) = ϕ*′(α + ⟨β, Φ(x)⟩).

Furthermore, let π1 and π2 be the projections on R and R^k respectively, i.e., (α, β) = (π1(ξ̃), π2(ξ̃)), and let F : π1(Ξ̃) × π2(Ξ̃) → R ∪ {+∞} be the map defined by

    F(α, β) = ∫_X ϕ*′(α + ⟨β, Φ(x)⟩) µ0(dx) − 1.

Note that F takes infinite values on the complement of Ξ̃ in π1(Ξ̃) × π2(Ξ̃), and that we have

    Ξ = { (α, β) : F(α, β) = 0 }.

First, we have

    ∂α F(α, β) = ∫_X ϕ*′′(α + ⟨β, Φ(x)⟩) µ0(dx) > 0,

since ϕ* is strictly convex. Hence, for all (α, β), Dα F(α, β) is an invertible linear map from π1(Ξ̃) to itself. Second, Ξ is connected, since Ξ is homeomorphic to S by Theorem 3.2 and S is connected. The existence and uniqueness of the map g now follow from a global version of the Implicit Function Theorem (see e.g., Dieudonné, 1972, pp. 265-266, or Blot, 1991), and g is defined on Θ := π2(Ξ), which is diffeomorphic to Ξ.

As in the proof of Theorem 3.3, we shall write R^{k+1} as R × R^k and denote by π1 and π2 the projections from R^{k+1} on R and R^k, respectively. The maps introduced so far then fit into the following chain of diffeomorphisms: Ξ is diffeomorphic to F via ξ ↦ µξ; the canonical embedding iΞ identifies Ξ with iΞ(Ξ) ⊂ R^{k+1}; the projection π2 maps iΞ(Ξ) diffeomorphically onto Θ; and Ψ is a diffeomorphism from Ξ to S. In this chain, the map m is a diffeomorphism from Θ to S and is defined by

    m(θ) = ∫_X Φ(x) µθ(dx),    (3.5)

i.e., m is the inverse of the map π2 ∘ iΞ ∘ Ψ⁻¹.

4 Inference in F

In this section, we consider the estimation of a parameter θ0 ∈ Θ based on an i.i.d. random sample X1, …, Xn drawn from µθ0, which, by Theorem 3.3, may be written as µθ0 = ϕ*′(g(θ0) + ⟨θ0, Φ(x)⟩) µ0.

Let us start by drawing some consequences of the results in Section 3. If we denote by yθ0 the Φ-moment of µθ0, i.e.,

    yθ0 = ∫_X Φ(x) µθ0(dx),

then we have θ0 = m⁻¹(yθ0). In practice, though, and depending on the choice of ϕ, it may be difficult to derive explicit expressions for the maps g and m, apart from the special case of an exponential family. However, the results of Borwein and Lewis (1991, 1993a, 1993b) exposed in Theorem 2.1 provide a convenient algorithm to compute the value of θ0 given the moment yθ0, without explicit expressions for the maps g and m. First of all, we may write S = π2( S̃ ∩ ({1} × R^k) ). Consider the vector ỹθ0 = (1, yθ0) in S̃. Then Ψ̃⁻¹(ỹθ0) lies in iΞ(Ξ) ⊂ Ξ̃, so we obtain

    θ0 = (π2 ∘ Ψ̃⁻¹)(ỹθ0).

Second, from the proof of Theorem 3.1, for all ỹ in S̃, Ψ̃⁻¹(ỹ) is the unique solution to the following minimization problem:

    Minimize ∫_X ϕ*(⟨u, Φ̃(x)⟩) µ0(dx) − ⟨ỹ, u⟩
    subject to u ∈ R^{k+1}.

Consequently, θ0 may be evaluated by taking the k last components of the unique minimizer over R^{k+1} of the map

    u := (u0, …, uk) ↦ ∫_X ϕ*( u0 + Σ_{i=1}^k ui Φi(x) ) µ0(dx) − ( u0 + Σ_{i=1}^k ui yθ0,i ),    (4.1)

i.e., letting ū := (ū0, …, ūk) be the unique minimizer in (4.1), we have θ0 = (ū1, …, ūk). In addition, we also have g(θ0) = ū0. Another interest of this procedure is that the map in (4.1) is convex. So evaluating θ0 from yθ0 requires solving an unconstrained convex minimization problem, for which efficient numerical algorithms are available.

These observations lead us to estimate θ0 by minimizing the empirical version of (4.1). More precisely, let ŷn be the empirical Φ-moment of µθ0 associated with the sample X1, …, Xn, i.e.,

    ŷn = (1/n) Σ_{i=1}^n Φ(Xi),    (4.2)

set ỹn = (1, ŷn), and let Pn be the empirical measure associated with the random sample. Then we define the estimate θ̂n as a minimizer over R^{k+1} of the map

    u ↦ ∫_X ϕ*(⟨u, Φ̃(x)⟩) Pn(dx) − ⟨ỹn, u⟩,

which is the empirical version of (4.1). Indeed, θ̂n is an M-estimator, and on the probability event that ŷn lies in the set S, we may write

    θ̂n = m⁻¹(ŷn).    (4.3)

Next, by the law of large numbers, ŷn belongs to S for n large enough, with probability one. Consequently, since m is a diffeomorphism from Θ to S, it follows that θ̂n converges in probability to θ0, and since ŷn is asymptotically normally distributed, it follows that θ̂n is in turn asymptotically normal. Finally, we have the following theorem.
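In the exponential-family case of Example 2.1 with X = {0, 1}, Φ(x) = x and µ0 uniform (the toy setting sketched after Theorem 3.3), m has the closed form m(θ) = eᶿ/(1 + eᶿ), so (4.3) reads θ̂n = log(ŷn/(1 − ŷn)). A simulation sketch under these illustrative assumptions:

```python
import math
import random

random.seed(0)
theta0 = 0.8
p0 = math.exp(theta0) / (1 + math.exp(theta0))   # m(theta0) for this toy family

n = 20000
sample = [1 if random.random() < p0 else 0 for _ in range(n)]
y_hat = sum(sample) / n                          # empirical Phi-moment, eq. (4.2)
theta_hat = math.log(y_hat / (1 - y_hat))        # theta_hat_n = m^{-1}(y_hat_n), eq. (4.3)

assert abs(theta_hat - theta0) < 0.1             # consistency of the M-estimator
```

Repeating the simulation over many samples would also exhibit the asymptotic normality of √n(θ̂n − θ0) stated in Theorem 4.1.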

Theorem 4.1 Suppose that Assumption 1 holds. The sequence √n(θ̂n − θ0) converges in distribution to a normal distribution with mean 0 and covariance matrix given by

    Σ = γ(θ0)⁻² [ Cov_{µ′θ0} Φ(X) ]⁻¹ Cov_{µθ0} Φ(X) [ Cov_{µ′θ0} Φ(X) ]⁻¹,

where

    γ(θ) = ∫_X ϕ*′′( g(θ) + ⟨θ, Φ(x)⟩ ) µ0(dx),

and where µ′θ0 is the measure defined by

    µ′θ0 = γ(θ0)⁻¹ ϕ*′′( g(θ0) + ⟨θ0, Φ(x)⟩ ) µ0.

Proof. Since ŷn is asymptotically normal, and since m is a diffeomorphism, it follows from standard arguments on moment estimators (see e.g. van der Vaart, 1998, Theorem 4.1, p. 36) that √n(θ̂n − θ0) converges to a normal distribution with mean 0 and covariance matrix

    Σ = m′⁻¹_{θ0} Cov_{µθ0} Φ(X) ( m′⁻¹_{θ0} )^t,

where m′_{θ0} is the derivative of m at θ0. We have

    ∂mj/∂θi (θ) = ∫_X Φj(x) [ ∂g/∂θi (θ) + Φi(x) ] ϕ*′′( g(θ) + ⟨θ, Φ(x)⟩ ) µ0(dx),    (4.4)

and

    ∂g/∂θi (θ) = − [ ∫_X Φi(x) ϕ*′′( g(θ) + ⟨θ, Φ(x)⟩ ) µ0(dx) ] / [ ∫_X ϕ*′′( g(θ) + ⟨θ, Φ(x)⟩ ) µ0(dx) ],    (4.5)

since ∫_X ϕ*′( g(θ) + ⟨θ, Φ(x)⟩ ) µ0(dx) = 1. Reporting (4.5) in (4.4) yields the desired result.


5 Nonparametric inference on the Φ-moment

Let X1, …, Xn be an i.i.d. random sample drawn from a probability measure µ0 on X. Suppose that we are interested in µ0 only through its Φ-moment y0 = ∫_X Φ(x) µ0(dx). As exposed in the Introduction, the method of empirical likelihood (Owen, 1988, 2001) amounts to minimizing the Kullback-Leibler divergence between the empirical measure Pn of the random sample and a measure µ satisfying the constraints of the model and absolutely continuous with respect to Pn. Replacing the Kullback-Leibler divergence by a ϕ-divergence provides an alternative statistic to test for y0, as well as to construct a confidence domain on y0.

First of all, let Pn be the empirical measure associated with the random sample X1, …, Xn. Define the functional Iϕⁿ(µ) over M(X) by

    Iϕⁿ(µ) = ∫_X ϕ( dµ/dPn (x) ) Pn(dx)

whenever µ ≪ Pn, and set Iϕⁿ(µ) = +∞ otherwise. Observe that if Iϕⁿ(µ) is finite, then µ is a discrete measure concentrated on the Xi's. Additional conditions on ϕ are needed to ensure that Iϕ is a divergence between probability measures. More precisely, we shall need the following assumption.

Assumption 2

(i) ϕ(1) = 0;

(ii) ϕ(x)/x → +∞ as x → +∞, i.e., κ = +∞.

For all y ∈ S, we shall let ỹ = (1, y), and we consider the following primal problem:

    Minimize Iϕⁿ(µ)
    subject to µ ∈ M(X), µ ≪ Pn, and ∫_X Φ̃(x) µ(dx) = ỹ.

The dual optimization problem is:

    Maximize ⟨ỹ, ṽ⟩ − ∫_X ϕ*(⟨ṽ, Φ̃(x)⟩) Pn(dx)
    subject to ṽ ∈ R^{k+1}.

Let Ωn be the probability event that a solution to the dual problem exists, this solution being further denoted by ξ̃n. Then, by Theorem 2.1, on Ωn the unique primal solution is given by

    µ̃n = (1/n) Σ_{i=1}^n ϕ*′(⟨ξ̃n, Φ̃(Xi)⟩) δ_{Xi}.    (5.1)

The convergence of µ̃n may be analysed using known results on M-estimators (see e.g., van de Geer, 2000, Chap. 12, and van der Vaart, 1998, Chap. 5). In essence, the concavity of the objective function in the dual program (i.e., the convexity of its negative) is sufficient to establish the convergence of ξ̃n to ξ̃ in probability, where ξ̃ = Ψ̃⁻¹(ỹ).

More precisely, since y ∈ S, we have ỹ = ∫_X Φ̃(x) µ̃ξ̃(dx). Consequently, by the law of large numbers, it follows that P(Ωn) → 1 as n → ∞. On Ωn, ξ̃n is the point of minimum of the map ṽ ↦ ∫_X hṽ(x) Pn(dx), where

    hṽ(x) = ϕ*(⟨ṽ, Φ̃(x)⟩) − ⟨ṽ, ỹ⟩.

Since ṽ ↦ hṽ(x) is continuous and convex for µ0-almost every x, and since, by Lemma A.2, for ε > 0 small enough,

    ∫_X sup_{ṽ∈Bε(ξ̃)} |hṽ(x)| µ0(dx) < ∞,

where Bε(ξ̃) is the Euclidean ball centered at ξ̃ and of radius ε, it follows that

    ξ̃n → ξ̃ in probability as n → ∞.    (5.2)

As a consequence, we obtain the convergence of µ̃n to the member of the family F having Φ-moment y, which is stated below without proof.

Theorem 5.1 Suppose that Assumptions 1 and 2 hold. Then for all y ∈ S, µ̃n converges weakly to the probability measure µ̃ξ̃, in probability, where ξ̃ = Ψ̃⁻¹(ỹ).

Additionally, since ξ̃n converges in probability to ξ̃, applying Theorem 5.23 in van der Vaart (1998, p. 53) yields

    √n (ξ̃n − ξ̃) = −Vξ̃⁻¹ (1/√n) Σ_{i=1}^n [ ϕ*′(⟨ξ̃, Φ̃(Xi)⟩) Φ̃(Xi) − ỹ ] + oP(1),    (5.3)

where Vξ̃ is the matrix defined by

    Vξ̃ = [ ∫_X Φ̃i(x) Φ̃j(x) ϕ*′′(⟨ξ̃, Φ̃(x)⟩) µ0(dx) ]_{i,j}.    (5.4)

Now consider the statistic Tn(y) defined by

    Tn(y) = inf{ Iϕⁿ(µ) : ∫_X Φ̃(x) µ(dx) = ỹ }.    (5.5)

Then we have the following result, which proves that a confidence domain on the Φ-moment y0 and a convergent test for y0 may be based on the statistic Tn(y).

Theorem 5.2 Suppose that Assumptions 1 and 2 hold. Suppose in addition that ϕ* is C³ on R and that, for all j, k, l, there exists ε > 0 such that

    sup_{ṽ∈Bε(ξ̃)} | ϕ*′′′(⟨ṽ, Φ̃(x)⟩) Φ̃j(x) Φ̃k(x) Φ̃l(x) | ≤ hjkl(x)

for some µ0-integrable functions hjkl, where Bε(ξ̃) denotes the ball centered at ξ̃ and of radius ε.

(i) If y ≠ y0, then

    √n ( Tn(y) − Iϕ(y) ) →D N(0, σ²) as n → ∞,

where

    σ² = ∫_X ϕ*(⟨ξ̃, Φ̃(x)⟩)² µ0(dx) − [ ∫_X ϕ*(⟨ξ̃, Φ̃(x)⟩) µ0(dx) ]².

(ii) If y = y0, then

    (2n/ϕ′′(1)) Tn(y) →D χ²(k) as n → ∞.

Proof. By dual attainment, we have

    Tn(y) = ⟨ξ̃n, ỹ⟩ − (1/n) Σ_{i=1}^n ϕ*(⟨ξ̃n, Φ̃(Xi)⟩).

Let us start with the following decomposition of the sum in the preceding equation:

    (1/n) Σ_{i=1}^n ϕ*(⟨ξ̃n, Φ̃(Xi)⟩)
      = (1/n) Σ_{i=1}^n ϕ*(⟨ξ̃, Φ̃(Xi)⟩)
      + (1/n) Σ_{i=1}^n ϕ*′(⟨ξ̃, Φ̃(Xi)⟩) ⟨Φ̃(Xi), ξ̃n − ξ̃⟩
      + (1/2n) Σ_{i=1}^n ϕ*′′(⟨ξ̃, Φ̃(Xi)⟩) ⟨Φ̃(Xi), ξ̃n − ξ̃⟩²
      + Rn,

where

    Rn = (1/6n) Σ_{i=1}^n ϕ*′′′(⟨ξ̃ + αn(ξ̃n − ξ̃), Φ̃(Xi)⟩) ⟨Φ̃(Xi), ξ̃n − ξ̃⟩³,

for some αn ∈ (0, 1). Since the sequence √n(ξ̃n − ξ̃) is uniformly tight, and since for all j, k, l the functions x ↦ sup_{ṽ∈Bε(ξ̃)} |ϕ*′′′(⟨ṽ, Φ̃(x)⟩) Φ̃j(x) Φ̃k(x) Φ̃l(x)| are dominated by µ0-integrable functions by assumption, we conclude that

    nRn = oP(1).    (5.6)

First, suppose that y ≠ y0. In this case, it suffices to consider the decomposition at order two. Set

    z̃n = (1/n) Σ_{i=1}^n ϕ*′(⟨ξ̃, Φ̃(Xi)⟩) Φ̃(Xi).

The Central Limit Theorem entails that the sequence √n(z̃n − ỹ) is uniformly tight. Then we may write

    Tn(y) − Iϕ(y)
      = ⟨ξ̃n, ỹ⟩ − (1/n) Σ_{i=1}^n ϕ*(⟨ξ̃n, Φ̃(Xi)⟩) − ⟨ξ̃, ỹ⟩ + ∫_X ϕ*(⟨ξ̃, Φ̃(x)⟩) µ0(dx)
      = ⟨ξ̃n, ỹ⟩ − (1/n) Σ_{i=1}^n ϕ*(⟨ξ̃, Φ̃(Xi)⟩) − ⟨z̃n, ξ̃n − ξ̃⟩ − oP(1/√n) − ⟨ξ̃, ỹ⟩ + ∫_X ϕ*(⟨ξ̃, Φ̃(x)⟩) µ0(dx)
      = ⟨ξ̃n − ξ̃, ỹ − z̃n⟩ − (1/n) Σ_{i=1}^n ϕ*(⟨ξ̃, Φ̃(Xi)⟩) + ∫_X ϕ*(⟨ξ̃, Φ̃(x)⟩) µ0(dx) + oP(1/√n).

But √n ⟨ξ̃n − ξ̃, ỹ − z̃n⟩ → 0 in probability, and so the first statement follows from the Central Limit Theorem.

Second, suppose that y = y0. Then ξ̃ = ξ̃0, with ⟨ξ̃0, Φ̃(x)⟩ = ϕ′(1) µ0-almost everywhere, and for all i = 1, …, n the following relations hold:

    ϕ*(⟨ξ̃0, Φ̃(Xi)⟩) = ϕ*(ϕ′(1)) = ϕ′(1),
    ϕ*′(⟨ξ̃0, Φ̃(Xi)⟩) = ϕ*′(ϕ′(1)) = 1,
    ϕ*′′(⟨ξ̃0, Φ̃(Xi)⟩) = ϕ*′′(ϕ′(1)) = 1/ϕ′′(1).

Let ŷn = (1/n) Σ_{i=1}^n Φ(Xi) and set ỹn = (1, ŷn). Let V̄n be the matrix defined by

    V̄n = (1/n) Σ_{i=1}^n Φ̃(Xi) Φ̃(Xi)^t.

Then we obtain

    Tn(y) − Iϕ(y)
      = ⟨ξ̃n, ỹ⟩ − ϕ′(1) − ⟨ỹn, ξ̃n − ξ̃0⟩ − (1/(2ϕ′′(1))) (ξ̃n − ξ̃0)^t V̄n (ξ̃n − ξ̃0) − oP(1/n) − ⟨ξ̃0, ỹ⟩ + ∫_X ϕ*(⟨ξ̃0, Φ̃(x)⟩) µ0(dx)
      = ⟨ξ̃n − ξ̃0, ỹ − ỹn⟩ − (1/(2ϕ′′(1))) (ξ̃n − ξ̃0)^t V̄n (ξ̃n − ξ̃0) + oP(1/n),

since ∫_X ϕ*(⟨ξ̃0, Φ̃(x)⟩) µ0(dx) = ϕ′(1). From (5.3), we have

    √n (ξ̃n − ξ̃0) = −Vξ̃0⁻¹ √n (ỹn − ỹ0) + oP(1),

where the matrix Vξ̃0 is defined in (5.4). Since ϕ*′′(⟨ξ̃0, Φ̃(x)⟩) = 1/ϕ′′(1) µ0-almost everywhere, (5.4) gives ϕ′′(1) Vξ̃0 = E[Φ̃(X) Φ̃(X)^t]; moreover V̄n → E[Φ̃(X) Φ̃(X)^t] elementwise as n → ∞. Since Iϕ(y) = 0 when y = y0, combining the two preceding displays yields

    Tn(y) = (ϕ′′(1)/2) (ỹn − ỹ0)^t ( E[Φ̃(X) Φ̃(X)^t] )⁻¹ (ỹn − ỹ0) + oP(1/n).    (5.7)

Letting Σ = Cov_{µ0} Φ(X), we may write

    E[Φ̃(X) Φ̃(X)^t] = [ 1        y0^t
                         y0   Σ + y0 y0^t ].

Using the following relation for an invertible matrix defined by blocks:

    [ A  B ]⁻¹   [ A⁻¹ + A⁻¹B(D − CA⁻¹B)⁻¹CA⁻¹   −A⁻¹B(D − CA⁻¹B)⁻¹ ]
    [ C  D ]   = [ −(D − CA⁻¹B)⁻¹CA⁻¹             (D − CA⁻¹B)⁻¹     ],

we obtain the expression of the inverse:

    ( E[Φ̃(X) Φ̃(X)^t] )⁻¹ = [ 1 + y0^t Σ⁻¹ y0   −y0^t Σ⁻¹
                              −Σ⁻¹ y0            Σ⁻¹       ].    (5.8)

Reporting (5.8) in (5.7), and since ỹn − ỹ0 = (0, ŷn − y0), yields

    (2n/ϕ′′(1)) Tn(y) = n (ŷn − y0)^t Σ⁻¹ (ŷn − y0) + oP(1),

from which the result follows.
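The block-inversion step used in this proof can be sanity-checked numerically. The sketch below does so in pure Python for k = 1, where the matrix E[Φ̃Φ̃^t] is 2×2 with bottom-right block Σ + y0² (the scalar values of Σ and y0 are arbitrary illustrative choices):

```python
# For k = 1 the matrix E[tilde-Phi tilde-Phi^t] is
#   [[1, y0], [y0, Sigma + y0^2]],
# and (5.8) claims its inverse is
#   [[1 + y0^2/Sigma, -y0/Sigma], [-y0/Sigma, 1/Sigma]].
y0, Sigma = 0.7, 1.3

M = [[1.0, y0], [y0, Sigma + y0 * y0]]
M_inv = [[1 + y0 * y0 / Sigma, -y0 / Sigma], [-y0 / Sigma, 1 / Sigma]]

# check that M @ M_inv is the 2x2 identity
for i in range(2):
    for j in range(2):
        prod = sum(M[i][l] * M_inv[l][j] for l in range(2))
        assert abs(prod - (1.0 if i == j else 0.0)) < 1e-12
```

The same check, with a general-purpose linear-algebra routine in place of the hand-written loop, extends to any k.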

A Technical Lemma

Lemma A.1 Suppose that ϕ satisfies Assumption 1.

(i) For all p ∈ {0, 1, 2} and for all ξ̃ ∈ Ξ̃, the map fp : X → R defined by

    fp(x) = ϕ*⁽ᵖ⁾(⟨ξ̃, Φ̃(x)⟩)

is µ0-integrable, where ϕ*⁽ᵖ⁾ denotes the p-th derivative of ϕ*.

(ii) Furthermore, for p = 1 or p = 2, fp is in L∞(X, µ0).

Proof. Let us start by recalling the properties of ϕ*. First, since ϕ is essentially smooth, ϕ* is strictly convex, and since dom(ϕ) = (0, +∞), ϕ* is monotone increasing. Consequently, ϕ*′ and ϕ*′′ are positive, and additionally, ϕ*′ is monotone increasing. Combining these facts entails that ϕ*′′(u) → 0 as u → −∞. At last, ϕ*(u)/u → 0 as u → −∞, since inf dom(ϕ) = 0.

Given ξ̃ ∈ Ξ̃, let a = ess sup ⟨ξ̃, Φ̃(x)⟩ < κ by definition of Ξ̃.

For p = 0, since ϕ*(u)/u → 0 as u → −∞, there exists α < 0 such that |ϕ*(u)| ≤ |u| whenever u ≤ α. Let

    A = { x ∈ X : ⟨ξ̃, Φ̃(x)⟩ ≤ α }.

First, for µ0-a.e. x, we have

    |f0(x) 1_{Ac}(x)| ≤ sup_{u∈[α,a]} |ϕ*(u)| < ∞,

and second,

    |f0(x) 1_A(x)| ≤ ⟨|ξ̃|, |Φ̃(x)|⟩.

Since Φ̃ is µ0-integrable, and since µ0 is finite, we conclude that f0 is µ0-integrable.

For p = 1, since ϕ*′ is positive and monotone increasing, we have 0 ≤ f1(x) ≤ ϕ*′(a) µ0-a.e., and so f1 is in L∞(X, µ0).

For p = 2, since ϕ*′′ is positive with ϕ*′′(u) → 0 as u → −∞, we have 0 ≤ f2(x) ≤ sup_{u∈(−∞,a]} ϕ*′′(u) µ0-a.e., so f2 is in L∞(X, µ0).

Lemma A.2 For all p ∈ {0, 1, 2} and for all ξ̃ ∈ Ξ̃, there exist ε > 0 and a µ0-integrable function h such that

    sup_{ṽ∈Bε(ξ̃)} | ϕ*⁽ᵖ⁾(⟨ṽ, Φ̃(x)⟩) | ≤ h(x),

where Bε(ξ̃) is the Euclidean ball centered at ξ̃ and of radius ε. Moreover, for p = 1 or p = 2, h may be taken as a constant function.

Proof. Choose ε small enough that the ball Bε(ξ̃) is included in a cube in turn included in Ξ̃, and denote by ṽi the vertices of this cube, for i = 1, …, 2^{k+1}. For all ṽ ∈ Ξ̃, let C(ṽ) = ess sup ⟨ṽ, Φ̃(x)⟩, which is strictly less than κ by construction. Then, for all ṽ ∈ Bε(ξ̃) and for µ0-almost every x, we have

    ⟨ξ̃, Φ̃(x)⟩ − ε‖Φ̃(x)‖ ≤ ⟨ṽ, Φ̃(x)⟩ ≤ max_i C(ṽi),    (A.1)

where ‖Φ̃(x)‖ denotes the Euclidean norm in R^{k+1}, and where the upper inequality follows from the convexity of the cube. Since ϕ* is monotone increasing, it follows that

    sup_{ṽ∈Bε(ξ̃)} | ϕ*(⟨ṽ, Φ̃(x)⟩) | ≤ max{ | ϕ*( max_i C(ṽi) ) | ; | ϕ*( ⟨ξ̃, Φ̃(x)⟩ − ε‖Φ̃(x)‖ ) | },

for µ0-a.e. x. Since µ0 is a finite measure, it is sufficient to prove that the second term in the maximum is µ0-integrable. As in the proof of Lemma A.1, let α ≤ 0 be such that |ϕ*(u)| ≤ |u| for all u ≤ α, and let

    A = { x ∈ X : ⟨ξ̃, Φ̃(x)⟩ − ε‖Φ̃(x)‖ ≤ α }.

We have

    | ϕ*( ⟨ξ̃, Φ̃(x)⟩ − ε‖Φ̃(x)‖ ) | 1_{Ac}(x) ≤ sup_{u∈[α, max_i C(ṽi)]} |ϕ*(u)|,

and

    | ϕ*( ⟨ξ̃, Φ̃(x)⟩ − ε‖Φ̃(x)‖ ) | 1_A(x) ≤ | ⟨ξ̃, Φ̃(x)⟩ − ε‖Φ̃(x)‖ |,

and ∫_X | ⟨ξ̃, Φ̃(x)⟩ − ε‖Φ̃(x)‖ | µ0(dx) is finite, since the components of Φ̃ are in L2(X, µ0) and since µ0 is finite. This proves the result for p = 0.

For p = 1, since ϕ*′ is positive and monotone increasing, the result follows directly from (A.1).

For p = 2, the result follows from (A.1) and the fact that ϕ*′′ is positive with ϕ*′′(u) → 0 as u → −∞.

References

[1] Bertail, P. (2006). Empirical likelihood in some semiparametric models. Bernoulli, Vol. 12, pp. 299-331.

[2] Blot, J. (1991). On global implicit functions. Nonlinear Analysis, Theory, Methods and Applications, Vol. 17, pp. 947-959.

[3] Borwein, J.M. and Lewis, A.S. (1991). Duality relationships for entropy-like minimization problems. SIAM J. Control and Optimization, Vol. 29, pp. 325-338.

[4] Borwein, J.M. and Lewis, A.S. (1993a). Partially-finite programming in L1 and the existence of maximum entropy estimates. SIAM J. Optimization, Vol. 3, pp. 248-267.

[5] Borwein, J.M. and Lewis, A.S. (1993b). On the failure of maximum entropy reconstruction for Fredholm equations and other infinite systems. Mathematical Programming, Vol. 63, pp. 251-261.

[6] Broniatowski, M. (2004). Estimation of the Kullback-Leibler divergence. Mathematical Methods of Statistics, Vol. 12, pp. 391-409.

[7] Broniatowski, M. and Keziou, A. (2006). Minimization of φ-divergences on sets of signed measures. Studia Scientiarum Mathematicarum Hungarica, Vol. 43, pp. 403-442.

[8] Dacunha-Castelle, D. and Gamboa, F. (1990). Maximum d'entropie et problèmes des moments. Annales de l'Institut Henri Poincaré, Section B, Vol. 26, pp. 567-596.

[9] Decarreau, A., Hilhorst, D., Lemaréchal, C. and Navaza, J. (1992). Dual methods in entropy maximization. Application to some problems in crystallography. SIAM Journal on Optimization, Vol. 2, pp. 173-197.

[10] Dieudonné, J. (1972). Eléments d'Analyse, Tome I, Fondements de l'Analyse Moderne. Gauthier-Villars, Paris.

[11] Gamboa, F. and Gassiat, E. (1997). Bayesian methods and maximum entropy for ill-posed inverse problems. The Annals of Statistics, Vol. 25, pp. 328-350.

[12] Keziou, A. (2003). Dual representation of φ-divergence and applications. C. R. Math. Acad. Sci. Paris, Vol. 336, pp. 857-862.

[13] Liese, F. and Vajda, I. (1987). Convex Statistical Distances. Teubner Texts in Mathematics, Vol. 95, Leipzig.

[14] Owen, A.B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, Vol. 75, pp. 237-249.

[15] Owen, A.B. (2001). Empirical Likelihood. Chapman and Hall.

[16] Pardo, L. (2006). Statistical Inference Based on Divergence Measures. Chapman & Hall/CRC.

[17] Rockafellar, R.T. (1970). Convex Analysis. Princeton University Press, Princeton, New Jersey.

[18] van de Geer, S. (2000). Empirical Processes in M-Estimation. Cambridge University Press.

[19] van der Vaart, A.W. (1998). Asymptotic Statistics. Cambridge University Press.
