Divergences and duality for estimation and test under moment condition model


HAL Id: hal-00451831

https://hal.archives-ouvertes.fr/hal-00451831v2

Submitted on 9 Apr 2010



Michel Broniatowski, Amor Keziou

To cite this version:

Michel Broniatowski, Amor Keziou. Divergences and duality for estimation and test under moment condition model. Journal of Statistical Planning and Inference, Elsevier, 2012, 142 (9), pp. 2554-2573.


ON GENERALIZED EMPIRICAL LIKELIHOOD METHODS

MICHEL BRONIATOWSKI∗ AND AMOR KEZIOU∗∗

Abstract. We introduce estimation and test procedures through divergence minimization for models satisfying linear constraints with unknown parameter. These procedures extend the empirical likelihood (EL) method and share common features with the generalized empirical likelihood (GEL) approach. We treat the problems of existence and characterization of the divergence projections of probability measures on sets of signed finite measures. Our approach allows us to obtain the limit distributions of the estimates and test statistics (including the EL ones) under alternatives and misspecification. The asymptotic behavior of the estimates and test statistics is studied both under the model and under alternatives including misspecification, using the dual representation of the divergences and the explicit forms of the divergence projections. An approximation to the power function is deduced, as well as the sample size which ensures a desired power for a given alternative.

Keywords: Empirical likelihood; Generalized empirical likelihood; Minimum divergence; Efficiency; Power function; Duality; Divergence projection.

MSC (2000) Classification: 62G05; 62G10; 62G15; 62G20; 62G35.

1. Introduction and notation
2. Statistical divergences
3. Minimum divergence estimates
4. Dual representation of φ-divergences under constraints
5. Asymptotic properties of the estimates of the parameter and the estimates of the divergences
5.1. Under the model
5.2. Asymptotic properties of the estimates of the divergences for a given value of the parameter
5.3. Under misspecification
6. Simulation results: Approximation of the power function of the empirical likelihood ratio test
7. Concluding remarks and possible developments
8. Appendix
References

Date: April 2010.

1. Introduction and notation

Statistical models are often defined through estimating equations

$$E[g(X,\theta)] = 0,$$

where $g(X,\theta)$ is some vector-valued function of a random vector $X \in \mathbb{R}^m$ and a parameter vector $\theta \in \Theta \subset \mathbb{R}^d$. The function $g$ has $l$ real-valued functions $g_j$ as its components. Examples of such models are numerous; see e.g. Qin and Lawless (1994), Haberman (1984), Sheehy (1987), McCullagh and Nelder (1983), Owen (2001) and the references therein. Denoting by $M^1$ the collection of all probability measures (p.m.) on $\mathbb{R}^m$, the submodel $\mathcal{M}^1_\theta$, associated to a given value $\theta$ of the parameter, consists of all distributions $Q$ satisfying the linear constraints induced by $g(\cdot,\theta)$, namely

$$\mathcal{M}^1_\theta := \left\{ Q \in M^1 \text{ such that } \int g(x,\theta)\, dQ(x) = 0 \right\}.$$

The statistical model which we consider can be written as

$$\mathcal{M}^1 := \bigcup_{\theta \in \Theta} \mathcal{M}^1_\theta. \tag{1.1}$$

Let $X_1, \dots, X_n$ denote an i.i.d. sample of $X$ with unknown distribution $P_0$. We denote by $\theta_0$, if it exists, the value of the parameter such that $P_0$ belongs to $\mathcal{M}^1_{\theta_0}$, namely the value satisfying $E[g(X,\theta_0)] = 0$, and we naturally assume that $\theta_0$ is unique. This paper addresses the following two natural questions:

Problem 1: Does $P_0$ belong to the model $\mathcal{M}^1$?

Problem 2: When $P_0$ is in the model, what is the value $\theta_0$ of the parameter for which $E[g(X,\theta_0)] = 0$? Can we perform tests about $\theta_0$? Can we construct confidence regions for $\theta_0$?

We note that these problems have been investigated by many authors. Hansen (1982) considered the generalized method of moments (GMM). Hansen et al. (1996) introduced the continuous updating (CU) estimate. The empirical likelihood (EL) approach, developed by Owen (1988) and Owen (1990), has been investigated in the context of model (1.1) by Qin and Lawless (1994) and Imbens (1997), who introduced the EL estimator. The recent econometric literature focuses on such models; Newey and Smith (2004) provided a class of estimates, called generalized empirical likelihood (GEL) estimates, which contains the EL and CU estimates. Schennach (2007) discussed the asymptotic properties of the empirical likelihood estimate under misspecification; she showed the important fact that the EL estimate may cease to be root-n consistent when the functions defining the moment conditions are unbounded. Among other results pertaining to EL, Newey and Smith (2004) stated that the EL estimate, when bias corrected, enjoys optimality properties in terms of efficiency among all bias-corrected GEL estimates, including the GMM one. Also, Corcoran (1998) and Baggerly (1998) proved that, in a class of minimum discrepancy statistics (called power divergence statistics), the EL ratio is the only one that is Bartlett correctable. Confidence regions for the parameter $\theta_0$ have been considered in the seminal paper by Owen (1990). Problems 1 and 2 have been handled via the EL approach in Qin and Lawless (1994) and in Newey and Smith (2004) under the null hypothesis $H_0: P_0 \in \mathcal{M}^1$; however, the limit distributions of the EL estimate and of the EL test statistic under misspecification have not been obtained so far. Our contribution is as follows:

(1) Our approach is based on minimum discrepancy estimates; it extends the EL method and shares common features with minimum distance and GEL techniques, using merely divergences. We present a wide class of estimates, test statistics and confidence regions for the parameter $\theta_0$, as well as various test statistics for Problems 1 and 2, all depending on the choice of the divergence.

(2) The limit distribution of the EL test statistic under the alternative and under misspecification has remained an open problem to date. The present paper fills this gap: we give the limit distributions of the proposed estimates and test statistics (including the EL ones) for Problems 1 and 2, under the null hypotheses, under alternatives and under misspecification.


(3) The limit distributions of the test statistics under the alternatives and misspecification are used to give an approximation to the power function and the sample size which ensures a desired power for a given alternative.

(4) We extend confidence region (C.R.) estimation techniques based on EL (see Owen (1990)), providing a wide range of such C.R.’s, each one depending upon a specific criterion.

From the point of view of the statistical criterion under consideration, the main advantage of a divergence-based approach lies in the fact that it gives access to the statistical properties of the estimates and test statistics under the alternative, including misspecification, which cannot be achieved in the classical EL context. In the case of parametric models of densities, White (1982) studied the asymptotic properties of the parametric maximum likelihood estimate and of the parametric likelihood ratio statistic under misspecification. Broniatowski and Keziou (2009) stated the consistency and obtained the limit distributions of the minimum divergence estimates and of the corresponding test statistics (including the parametric likelihood ones), both under the null hypotheses and under the alternatives, from which they deduced an approximation to the power function.

In this paper, we extend the above results to the semiparametric models (1.1), in the global context of empirical divergences, including the EL method.

The paper is organized as follows. Section 2 describes the statistical divergences used in the sequel. Section 3 is devoted to the description of the estimation and test procedures. In Section 4, we adapt the formalism of Lagrangian duality to the context of statistical divergences, and we use it to give practical formulas (for the study and the numerical computation) of the proposed estimates and test statistics. Section 5 deals with the asymptotic properties of the estimates and test statistics. Simulation results are given in Section 6. All proofs are postponed to the Appendix.

2. Statistical divergences

We first set some general definitions and notation. Let $P$ be some p.m. Denote by $M$ the space of all signed finite measures (s.f.m.) on $\mathbb{R}^m$. Let $\varphi$ be a convex function from $\mathbb{R}$ to $[0,+\infty]$ with $\varphi(1) = 0$, and such that its domain $\operatorname{dom}\varphi := \{x \in \mathbb{R} \text{ such that } \varphi(x) < \infty\}$ is an interval with endpoints $a < 1 < b$ (which may be finite or infinite). We assume that $\varphi$ is closed¹. For any s.f.m. $Q$ absolutely continuous with respect to (a.c.w.r.t.) $P$, the φ-divergence between $Q$ and the p.m. $P$ is defined through

$$D_\varphi(Q,P) := \int_{\mathbb{R}^m} \varphi\left(\frac{dQ}{dP}(x)\right) dP(x), \tag{2.1}$$

in which $\frac{dQ}{dP}(\cdot)$ denotes the Radon-Nikodym derivative. When $Q$ is not a.c.w.r.t. $P$, we set $D_\varphi(Q,P) = +\infty$. For any p.m. $P$, the mapping $Q \in M \mapsto D_\varphi(Q,P)$ is convex and takes nonnegative values. When $Q = P$, then $D_\varphi(Q,P) = 0$. Furthermore, if the function $x \mapsto \varphi(x)$ is strictly convex on a neighborhood of $x = 1$, then

$$D_\varphi(Q,P) = 0 \text{ if and only if } Q = P. \tag{2.2}$$

¹ The closedness of φ means that if $a$ or $b$ is finite, then $\varphi(x) \to \varphi(a)$ when $x \downarrow a$, and $\varphi(x) \to \varphi(b)$ when $x \uparrow b$.

All the above properties are presented in Csiszár (1963), Csiszár (1967) and Liese and Vajda (1987), Chapter 1, for φ-divergences defined on the set $M^1$ of all p.m.'s. When the φ-divergences are defined on $M$, the same arguments as those developed on $M^1$ hold. When defined on $M^1$, the Kullback-Leibler (KL), modified Kullback-Leibler ($KL_m$), $\chi^2$, modified $\chi^2$ ($\chi^2_m$), Hellinger ($H$) and $L^1$ divergences are respectively associated to the convex functions $\varphi(x) = x\log x - x + 1$, $\varphi(x) = -\log x + x - 1$, $\varphi(x) = \frac{1}{2}(x-1)^2$, $\varphi(x) = \frac{1}{2}(x-1)^2/x$, $\varphi(x) = 2(\sqrt{x}-1)^2$ and $\varphi(x) = |x-1|$. All these divergences, except the $L^1$ one, belong to the class of power divergences introduced in Cressie and Read (1984) (see also Liese and Vajda (1987) and Pardo (2006)). They are defined through the class of convex functions

$$x \in \mathbb{R}_+^* \mapsto \varphi_\gamma(x) := \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma - 1)} \quad \text{if } \gamma \in \mathbb{R}\setminus\{0,1\}, \tag{2.3}$$

and by $\varphi_0(x) := -\log x + x - 1$ and $\varphi_1(x) := x\log x - x + 1$. So the KL-divergence is associated to $\varphi_1$, the $KL_m$ to $\varphi_0$, the $\chi^2$ to $\varphi_2$, the $\chi^2_m$ to $\varphi_{-1}$ and the Hellinger distance to $\varphi_{1/2}$. We extend the definition of the power divergence functions $Q \in M^1 \mapsto D_{\varphi_\gamma}(Q,P)$ onto the whole set $M$ of signed finite measures as follows. When the function $x \mapsto \varphi_\gamma(x)$ is not defined on $]-\infty,0[$, or when $\varphi_\gamma$ is defined on $\mathbb{R}$ but is not convex, we extend the definition of $\varphi_\gamma$ through

$$x \in \mathbb{R} \mapsto \varphi_\gamma(x)\,\mathbf{1}_{[0,+\infty[}(x) + (+\infty)\,\mathbf{1}_{]-\infty,0[}(x). \tag{2.4}$$

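Note, for instance, that for the $\chi^2$-divergence the corresponding function $\varphi(x) = \frac{1}{2}(x-1)^2$ is convex and defined on the whole of $\mathbb{R}$. In this paper, for technical reasons, we assume that the φ functions are strictly convex on their domain $(a,b)$, twice continuously differentiable on the interior of their domain, and satisfy $\varphi(1) = 0$, $\varphi'(1) = 0$ and $\varphi''(1) = 1$. We also assume that $\varphi$ is "essentially smooth" in the sense that $\lim_{x\downarrow a}\varphi'(x) = -\infty$ if $a > -\infty$ and $\lim_{x\uparrow b}\varphi'(x) = +\infty$ if $b < +\infty$. Note that the power functions $\varphi_\gamma$, see (2.4), satisfy the above conditions when $\gamma \le 1$ or when $\gamma$ is an even integer; this covers all the standard divergences listed above (for $\gamma > 1$ not an even integer, $\varphi_\gamma'(x)$ tends to the finite limit $-1/(\gamma-1)$ as $x \downarrow 0$).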
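The normalisation conditions above can be checked numerically. The following Python sketch is ours, not the authors': it evaluates the power functions $\varphi_\gamma$ of (2.3), applies the extension (2.4) (keeping even-integer $\gamma$, such as $\gamma = 2$, on the whole real line, since $\varphi_\gamma$ is convex there) and approximates $\varphi(1)$, $\varphi'(1)$ and $\varphi''(1)$ by central finite differences. The names `phi_gamma` and `normalisation` and the step `h` are illustrative choices.

```python
import math


def phi_gamma(x: float, gamma: float) -> float:
    """Power function phi_gamma of (2.3), extended to the real line as in (2.4)."""
    if float(gamma).is_integer() and gamma >= 2 and int(gamma) % 2 == 0:
        # Even integer gamma (e.g. gamma = 2, the chi^2 case): phi_gamma is
        # already defined and convex on the whole real line, so it is kept as is.
        return (x ** gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))
    if x < 0.0:
        return math.inf  # extension (2.4): +infinity on ]-inf, 0[
    if gamma == 0.0:  # modified Kullback-Leibler (KL_m)
        return math.inf if x == 0.0 else -math.log(x) + x - 1.0
    if gamma == 1.0:  # Kullback-Leibler, with the convention 0 log 0 = 0
        return 1.0 if x == 0.0 else x * math.log(x) - x + 1.0
    if x == 0.0 and gamma < 0.0:
        return math.inf  # x ** gamma blows up at 0 when gamma < 0
    return (x ** gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))


def normalisation(gamma: float, h: float = 1e-3):
    """Central finite-difference values of phi(1), phi'(1) and phi''(1)."""
    f = lambda x: phi_gamma(x, gamma)
    d1 = (f(1.0 + h) - f(1.0 - h)) / (2.0 * h)
    d2 = (f(1.0 + h) - 2.0 * f(1.0) + f(1.0 - h)) / h ** 2
    return f(1.0), d1, d2


for gam in (-1.0, 0.0, 0.5, 1.0, 2.0, 3.0):
    print(gam, normalisation(gam))  # values close to (0, 0, 1) are expected
```

Note that $\varphi_{1/2}(4) = 2(\sqrt{4}-1)^2 = 2$, which gives a quick sanity check of the Hellinger member of the family.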

Definition 2.1. Let $\Omega$ be some subset of $M$. The φ-divergence between the set $\Omega$ and a p.m. $P$ is defined by

$$D_\varphi(\Omega,P) := \inf_{Q\in\Omega} D_\varphi(Q,P).$$

A finite measure $Q^* \in \Omega$ such that $D_\varphi(Q^*,P) < \infty$ and

$$D_\varphi(Q^*,P) \le D_\varphi(Q,P) \quad \text{for all } Q \in \Omega$$

is called a projection of $P$ on $\Omega$. This projection may not exist, or may not be uniquely defined.

3. Minimum divergence estimates

Let $X_1,\dots,X_n$ denote an i.i.d. sample of a random vector $X \in \mathbb{R}^m$ with distribution $P_0$. Let $P_n$ be the empirical measure pertaining to this sample, namely

$$P_n(\cdot) := \frac{1}{n}\sum_{i=1}^n \delta_{X_i}(\cdot),$$

in which $\delta_x$ denotes the Dirac measure at point $x$. We set our statistical approach in the global context of s.f.m.'s with total mass 1 satisfying linear constraints:

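$$\mathcal{M}_\theta := \left\{ Q \in M \text{ such that } \int_{\mathbb{R}^m} dQ(x) = 1 \text{ and } \int_{\mathbb{R}^m} g(x,\theta)\,dQ(x) = 0 \right\} \tag{3.1}$$

and

$$\mathcal{M} := \bigcup_{\theta\in\Theta}\mathcal{M}_\theta, \tag{3.2}$$

sets of signed finite measures that replace $\mathcal{M}^1_\theta$ and $\mathcal{M}^1$. Enlarging the model (1.1) to the model (3.2) brings a number of improvements upon existing results; this is argued at the end of the present Section. The "plug-in" estimate of $D_\varphi(\mathcal{M}_\theta,P_0)$ is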

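$$\widehat{D}_\varphi(\mathcal{M}_\theta,P_0) := \inf_{Q\in\mathcal{M}_\theta} D_\varphi(Q,P_n) = \inf_{Q\in\mathcal{M}_\theta}\int_{\mathbb{R}^m}\varphi\left(\frac{dQ}{dP_n}(x)\right)dP_n(x). \tag{3.3}$$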


If the projection $Q_n$ of $P_n$ on $\mathcal{M}_\theta$ exists, then clearly $Q_n$ is a s.f.m. (or possibly a p.m.) a.c.w.r.t. $P_n$; this means that the support of $Q_n$ must be included in the set $\{X_1,\dots,X_n\}$. So define the sets

$$\mathcal{M}^{(n)}_\theta := \left\{ Q \in M \;\middle|\; Q \text{ a.c.w.r.t. } P_n,\ \sum_{i=1}^n Q(X_i) = 1 \text{ and } \sum_{i=1}^n Q(X_i)\,g(X_i,\theta) = 0 \right\}, \tag{3.4}$$

which may be seen as subsets of $\mathbb{R}^n$. Then the plug-in estimate (3.3) can be written as

$$\widehat{D}_\varphi(\mathcal{M}_\theta,P_0) = \inf_{Q\in\mathcal{M}^{(n)}_\theta}\frac{1}{n}\sum_{i=1}^n\varphi\big(nQ(X_i)\big). \tag{3.5}$$

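In the same way, $D_\varphi(\mathcal{M},P_0) := \inf_{\theta\in\Theta}\inf_{Q\in\mathcal{M}_\theta} D_\varphi(Q,P_0)$ can be estimated by

$$\widehat{D}_\varphi(\mathcal{M},P_0) = \inf_{\theta\in\Theta}\inf_{Q\in\mathcal{M}^{(n)}_\theta}\frac{1}{n}\sum_{i=1}^n\varphi\big(nQ(X_i)\big). \tag{3.6}$$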

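By uniqueness of $\arg\inf_{\theta\in\Theta} D_\varphi(\mathcal{M}_\theta,P_0)$, and since the infimum is reached at $\theta = \theta_0$ under the model, we estimate $\theta_0$ through

$$\widehat{\theta}_\varphi = \arg\inf_{\theta\in\Theta}\inf_{Q\in\mathcal{M}^{(n)}_\theta}\frac{1}{n}\sum_{i=1}^n\varphi\big(nQ(X_i)\big). \tag{3.7}$$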

Enlarging $\mathcal{M}^1$ to $\mathcal{M}$, and accordingly extending the definitions of the φ functions to $]-\infty,+\infty[$ and of the φ-divergences to the whole space $M$ of s.f.m.'s, is motivated by the following arguments:

- If the domain $(a,b)$ of the function $\varphi$ is included in $[0,+\infty[$, then minimizing over $\mathcal{M}^1$ or over $\mathcal{M}$ leads to the same estimates and test statistics. Hence both approaches coincide, for instance, in the case of the $KL_m$, KL, modified $\chi^2$ and Hellinger divergences.

- Let $\theta$ be a given value in $\Theta$. Denote by $Q^1_n$ and $Q_n$ respectively the projections of $P_n$ on $\mathcal{M}^1_\theta$ and on $\mathcal{M}_\theta$. If $Q^1_n$ satisfies $0 < Q^1_n(X_i) < 1$ for all $i = 1,\dots,n$, then it coincides with $Q_n$, i.e. $Q^1_n = Q_n$. Therefore, in this case too, both approaches lead to the same estimates and test statistics.

- It may occur that, for some $\theta$ in $\Theta$ and some $i = 1,\dots,n$, $Q^1_n(X_i)$ is a boundary value of $[0,1]$; then the first-order conditions are not met, which makes the calculation of the estimates over the sets of p.m.'s $\mathcal{M}^1_\theta$ and $\mathcal{M}^1$ genuinely difficult. However, when $\mathcal{M}^1$ is replaced by $\mathcal{M}$, this problem no longer arises, in particular when $\operatorname{dom}\varphi = \mathbb{R}$, which is the case, for instance, of the $\chi^2$-divergence. Other arguments are given in Remark 4.5 below.

The empirical likelihood paradigm (see Owen (1988), Owen (1990), Qin and Lawless (1994) and Owen (2001)) enters as a special case of the statistical issues related to estimation and tests based on φ-divergences with $\varphi(x) = \varphi_0(x) = -\log x + x - 1$, namely on the $KL_m$-divergence. Indeed, it is straightforward to see that the empirical log-likelihood ratio statistic for testing $P_0 \in \mathcal{M}$ against $P_0 \notin \mathcal{M}$ can be written, in the context of φ-divergences, as $2n\widehat{D}_{KL_m}(\mathcal{M},P_0)$, and that the EL estimate of $\theta_0$ can be written as $\widehat{\theta}_{KL_m} = \arg\inf_{\theta\in\Theta}\widehat{D}_{KL_m}(\mathcal{M}_\theta,P_0)$; see Remark 4.3 below. In the case of the power functions $\varphi = \varphi_\gamma$, the corresponding estimates (3.7) belong to the class of GEL estimates introduced by Newey and Smith (2004), and the statistics (3.5) are the empirical Cressie-Read statistics introduced by Baggerly (1998) and Corcoran (1998).

The constrained optimization problems (3.5), (3.6) and (3.7) can be transformed into unconstrained ones by means of "duality" arguments, which we briefly recall hereunder from Rockafellar (1970). On the other hand, obtaining the asymptotic behavior of the estimates and test statistics under misspecification or under alternative hypotheses requires handling existence conditions for, and characterizing, the projection of $P_0$ on the submodel $\mathcal{M}_\theta$ or on the entire model $\mathcal{M}$. This will also be done through duality, in the following Section.

4. Dual representation of φ-divergences under constraints

This Section is central for our purposes. Indeed, it provides the explicit form of the proposed estimates by transforming the constrained problems (3.5) into unconstrained ones, using Lagrangian duality, a classical tool in optimization theory. This Section adapts this formalism to the context of divergences and to the present statistical setting. The Lagrangian "dual" problems corresponding to the "primal" problem

$$\inf_{Q\in\mathcal{M}_\theta} D_\varphi(Q,P_0) \tag{4.1}$$

and to its empirical counterpart (3.5) make use of the Fenchel-Legendre transform of $\varphi$, defined through

$$\psi : t \in \mathbb{R} \mapsto \psi(t) := \sup_{x\in\mathbb{R}}\{tx - \varphi(x)\}. \tag{4.2}$$

The "dual" problems associated to (4.1) and (3.5) are respectively

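$$\sup_{t\in\mathbb{R}^{1+l}}\left\{ t_0 - \int_{\mathbb{R}^m}\psi\Big(t_0 + \sum_{j=1}^l t_j\,g_j(x,\theta)\Big)\,dP_0(x)\right\}, \tag{4.3}$$

and

$$\sup_{t\in\mathbb{R}^{1+l}}\left\{ t_0 - \frac{1}{n}\sum_{i=1}^n\psi\Big(t_0 + \sum_{j=1}^l t_j\,g_j(X_i,\theta)\Big)\right\}. \tag{4.4}$$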

In the following propositions, we state sufficient conditions under which the primal problems (4.1) and (3.5) coincide respectively with the dual ones (4.3) and (4.4). First, recall some properties of the convex conjugate $\psi$ of $\varphi$; for the proofs we refer to Rockafellar (1970), Section 26. The function $\psi$ is convex and closed, and its domain is an interval with endpoints

$$a^* = \lim_{x\to-\infty}\frac{\varphi(x)}{x}, \qquad b^* = \lim_{x\to+\infty}\frac{\varphi(x)}{x}, \tag{4.5}$$

satisfying $a^* < 0 < b^*$, and $\psi(0) = 0$. The strict convexity of $\varphi$ on its domain $(a,b)$ is equivalent to the condition that its conjugate $\psi$ is essentially smooth, i.e., differentiable with

$$\lim_{t\downarrow a^*}\psi'(t) = -\infty \text{ if } a^* > -\infty, \qquad \lim_{t\uparrow b^*}\psi'(t) = +\infty \text{ if } b^* < +\infty. \tag{4.6}$$

Conversely, $\varphi$ is essentially smooth on its domain $(a,b)$ if and only if $\psi$ is strictly convex on its domain $(a^*,b^*)$. Recall that, in all the sequel, $\varphi$ is assumed to be essentially smooth. Hence $\psi$ is strictly convex on its domain $(a^*,b^*)$, and it holds that

$$a^* = \lim_{x\downarrow a}\varphi'(x), \qquad b^* = \lim_{x\uparrow b}\varphi'(x),$$

and

$$\psi(t) = t\,{\varphi'}^{-1}(t) - \varphi\big({\varphi'}^{-1}(t)\big), \quad \text{for all } t \in\, ]a^*,b^*[. \tag{4.7}$$

It also holds that $\psi$ is twice continuously differentiable on $]a^*,b^*[$, with

$$\psi'(t) = {\varphi'}^{-1}(t) \quad \text{and} \quad \psi''(t) = \frac{1}{\varphi''\big({\varphi'}^{-1}(t)\big)}. \tag{4.8}$$


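In particular, since $\varphi'(1) = 0$ and $\varphi''(1) = 1$, we have $\psi'(0) = 1$ and $\psi''(0) = 1$. Since $\varphi$ is assumed to be closed, we have $\varphi(a) = \lim_{x\downarrow a}\varphi(x)$ and $\varphi(b) = \lim_{x\uparrow b}\varphi(x)$, which may be finite or infinite. Likewise, by closedness of $\psi$, we have

$$\psi(a^*) = \lim_{t\downarrow a^*}\psi(t) \quad \text{and} \quad \psi(b^*) = \lim_{t\uparrow b^*}\psi(t).$$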

Finally, the first and second derivatives of $\varphi$ at $a$ and $b$ are defined to be the limits of $\varphi'(x)$ and $\varphi''(x)$ when $x \downarrow a$ and when $x \uparrow b$. The first and second derivatives of $\psi$ at $a^*$ and $b^*$ are defined in a similar way. In Table 1, we give the convex conjugates $\psi$ of some functions $\varphi$ associated to standard divergences, together with their domains $(a,b)$ and $(a^*,b^*)$.

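Table 1. Convex conjugates for some standard divergences.

| $D_\varphi$ | $\varphi$ | $\operatorname{dom}\varphi$ | $\operatorname{dom}\psi$ | $\psi$ |
|---|---|---|---|---|
| $D_{KL_m}$ | $-\log x + x - 1$ | $]0,+\infty[$ | $]-\infty,1[$ | $-\log(1-t)$ |
| $D_{KL}$ | $x\log x - x + 1$ | $[0,+\infty[$ | $\mathbb{R}$ | $e^t - 1$ |
| $D_{\chi^2_m}$ | $\frac{(x-1)^2}{2x}$ | $]0,+\infty[$ | $]-\infty,\frac{1}{2}]$ | $1 - \sqrt{1-2t}$ |
| $D_{\chi^2}$ | $\frac{1}{2}(x-1)^2$ | $\mathbb{R}$ | $\mathbb{R}$ | $\frac{1}{2}t^2 + t$ |
| $D_H$ | $2(\sqrt{x}-1)^2$ | $[0,+\infty[$ | $]-\infty,2[$ | $\frac{2t}{2-t}$ |
| $D_{\varphi_\gamma}$ | $\frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma-1)}$ | depends on $\gamma$ | depends on $\gamma$ | $\frac{1}{\gamma}(\gamma t - t + 1)^{\gamma/(\gamma-1)} - \frac{1}{\gamma}$ |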
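Table 1 can be checked numerically from definition (4.2). The sketch below is an illustration under our own assumptions, not part of the paper: it approximates $\psi(t) = \sup_x\{tx - \varphi(x)\}$ by a ternary search, which is legitimate because $x \mapsto tx - \varphi(x)$ is concave, and compares the result with the closed forms of the table. The search brackets and the iteration count are arbitrary choices; the bracket must contain the maximiser ${\varphi'}^{-1}(t)$ for each tested value of $t$.

```python
import math

# (phi, closed-form psi from Table 1, search bracket inside dom phi)
TABLE = {
    "KL_m": (lambda x: -math.log(x) + x - 1.0, lambda t: -math.log(1.0 - t), (1e-9, 1e4)),
    "KL": (lambda x: x * math.log(x) - x + 1.0, lambda t: math.exp(t) - 1.0, (1e-9, 1e4)),
    "chi2_m": (lambda x: 0.5 * (x - 1.0) ** 2 / x, lambda t: 1.0 - math.sqrt(1.0 - 2.0 * t), (1e-9, 1e4)),
    "chi2": (lambda x: 0.5 * (x - 1.0) ** 2, lambda t: 0.5 * t * t + t, (-1e4, 1e4)),
    "H": (lambda x: 2.0 * (math.sqrt(x) - 1.0) ** 2, lambda t: 2.0 * t / (2.0 - t), (1e-9, 1e4)),
}


def conjugate(phi, t, lo, hi, iters=300):
    """Approximate psi(t) = sup_x {t x - phi(x)} of (4.2) by ternary search on [lo, hi].

    Valid because x -> t x - phi(x) is concave; the bracket must contain the maximiser.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if t * m1 - phi(m1) < t * m2 - phi(m2):
            lo = m1  # the maximiser lies in [m1, hi]
        else:
            hi = m2  # the maximiser lies in [lo, m2]
    x = 0.5 * (lo + hi)
    return t * x - phi(x)


for name, (phi, psi, (lo, hi)) in TABLE.items():
    for t in (-1.0, -0.5, 0.0, 0.3):
        print(name, t, conjugate(phi, t, lo, hi), psi(t))
```

The same script can be used to check $\psi'(0) = 1$ for each row, in agreement with (4.8), by a central finite difference of the closed-form $\psi$.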

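Proposition 4.1. Let $\theta$ be a given value in $\Theta$. If there exists $Q_0$ in $\mathcal{M}^{(n)}_\theta$ such that

$$a < nQ_0(X_i) < b, \quad \text{for all } i = 1,\dots,n, \tag{4.9}$$

then

$$\inf_{Q\in\mathcal{M}^{(n)}_\theta} D_\varphi(Q,P_n) = \sup_{t\in\mathbb{R}^{1+l}}\left\{ t_0 - \frac{1}{n}\sum_{i=1}^n\psi\Big(t_0 + \sum_{j=1}^l t_j\,g_j(X_i,\theta)\Big)\right\} \tag{4.10}$$

with dual attainment. Conversely, if there exists a dual optimal solution $\widehat{t}$ such that

$$a^* < \widehat{t}_0 + \sum_{j=1}^l \widehat{t}_j\,g_j(X_i,\theta) < b^*, \quad \text{for all } i = 1,\dots,n, \tag{4.11}$$

then the equality (4.10) holds, and the unique optimal solution of the primal problem $\inf_{Q\in\mathcal{M}^{(n)}_\theta} D_\varphi(Q,P_n)$, namely the projection of $P_n$ on $\mathcal{M}^{(n)}_\theta$, is given by

$$Q_n(X_i) = \frac{1}{n}\,{\varphi'}^{-1}\Big(\widehat{t}_0 + \sum_{j=1}^l \widehat{t}_j\,g_j(X_i,\theta)\Big), \quad i = 1,\dots,n,$$

where $\widehat{t}$ is a solution of the equations

$$\begin{cases} 1 - \dfrac{1}{n}\displaystyle\sum_{i=1}^n {\varphi'}^{-1}\Big(\widehat{t}_0 + \sum_{k=1}^l \widehat{t}_k\,g_k(X_i,\theta)\Big) = 0, \\ -\dfrac{1}{n}\displaystyle\sum_{i=1}^n g_j(X_i,\theta)\,{\varphi'}^{-1}\Big(\widehat{t}_0 + \sum_{k=1}^l \widehat{t}_k\,g_k(X_i,\theta)\Big) = 0, \quad j = 1,\dots,l. \end{cases}$$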
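Proposition 4.1 suggests a direct numerical route: maximise the concave dual objective in (4.10) over $t \in \mathbb{R}^{1+l}$ and read off the projection through ${\varphi'}^{-1}$. The sketch below assumes the KL divergence, for which $\psi(t) = e^t - 1$ and ${\varphi'}^{-1} = \psi' = \exp$, so that $a^* = -\infty$, $b^* = +\infty$ and condition (4.11) is automatic. It uses a damped Newton iteration; the function name, the damping rule and the mean constraint $g(x,\theta) = x - \theta$ of the example are our own choices, not the authors' implementation.

```python
import numpy as np


def kl_dual_projection(G, iters=100, tol=1e-12):
    """Solve the empirical dual problem (4.10) for the KL divergence.

    G is the n x l matrix with rows g(X_i, theta). Returns the dual solution
    t_hat (length 1 + l), the projection weights Q_n(X_i) and the dual value.
    """
    n, l = G.shape
    Gbar = np.hstack([np.ones((n, 1)), G])        # rows (1, g_1(X_i), ..., g_l(X_i))
    e0 = np.zeros(l + 1)
    e0[0] = 1.0

    def dual(t):
        # t_0 - (1/n) sum_i psi(t^T gbar_i), with psi(u) = exp(u) - 1
        return t[0] - np.mean(np.exp(Gbar @ t) - 1.0)

    t = np.zeros(l + 1)
    for _ in range(iters):
        w = np.exp(Gbar @ t)                        # psi'(u) = phi'^{-1}(u) = exp(u)
        grad = e0 - Gbar.T @ w / n                  # the first-order system of Prop. 4.1
        if np.max(np.abs(grad)) < tol:
            break
        hess = -(Gbar.T * w) @ Gbar / n             # psi''(u) = exp(u) as well
        step = np.linalg.solve(hess, grad)          # Newton update: t_new = t - step
        s, f0 = 1.0, dual(t)
        while dual(t - s * step) < f0 - 1e-12 and s > 1e-8:
            s *= 0.5                                # damping: do not decrease the dual
        t = t - s * step
    Q = np.exp(Gbar @ t) / n                        # Q_n(X_i) = phi'^{-1}(t^T gbar_i) / n
    return t, Q, dual(t)


rng = np.random.default_rng(0)
x = rng.standard_normal(200)
theta = x.mean() + 0.1                              # a theta for which P_n is not in M_theta
t_hat, Q_n, d_hat = kl_dual_projection((x - theta)[:, None])
print(t_hat, Q_n.sum(), Q_n @ (x - theta), d_hat)
```

At the returned $\widehat t$ one can check that $Q_n$ satisfies the two constraints defining $\mathcal{M}^{(n)}_\theta$, and that the primal value $\frac{1}{n}\sum_i\varphi(nQ_n(X_i))$ agrees with the dual value, as (4.10) states.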

Remark 4.1. For the $\chi^2$-divergence, we have $a = -\infty$ and $b = +\infty$. Hence condition (4.9) holds whenever $\mathcal{M}^{(n)}_\theta$ is not empty. More generally, the above Proposition holds for any φ-divergence whose φ function satisfies $\operatorname{dom}\varphi = \mathbb{R}$.


Remark 4.2. Assume that $g(x,\theta) := x - \theta$. Then, for any divergence $D_\varphi$ with $\operatorname{dom}\varphi = ]0,+\infty[$, which is the case of the modified $\chi^2$ divergence and of the modified Kullback-Leibler divergence (or, equivalently, of the EL method), condition (4.9) means that $\theta$ is an interior point of the convex hull of the data $X_1,\dots,X_n$. This is precisely what is checked in Owen (1990), p. 100, for the EL method.

For the asymptotic counterpart of the above results, we have the following result; see Theorem 1 in Broniatowski and Keziou (2006).

Proposition 4.2. Let $\theta$ be a given value in $\Theta$. Assume that $\int|g_j(x,\theta)|\,dP_0(x) < \infty$ for all $j = 1,\dots,l$. If there exists $Q_0$ in $\mathcal{M}_\theta$ with $D_\varphi(Q_0,P_0) < \infty$ and²

$$a < \inf_x\frac{dQ_0}{dP_0}(x) \le \sup_x\frac{dQ_0}{dP_0}(x) < b, \quad P_0\text{-a.s.}, \tag{4.12}$$

then

$$\inf_{Q\in\mathcal{M}_\theta} D_\varphi(Q,P_0) = \sup_{t\in\mathbb{R}^{1+l}}\left\{ t_0 - \int_{\mathbb{R}^m}\psi\Big(t_0 + \sum_{j=1}^l t_j\,g_j(x,\theta)\Big)\,dP_0(x)\right\} \tag{4.13}$$

with dual attainment. Conversely, if there exists a dual optimal solution $t^*$ which is an interior point of the set

$$\left\{ t\in\mathbb{R}^{1+l} \text{ such that } \int_{\mathbb{R}^m}\Big|\psi\Big(t_0 + \sum_{j=1}^l t_j\,g_j(x,\theta)\Big)\Big|\,dP_0(x) < \infty\right\}, \tag{4.14}$$

then the dual equality (4.13) holds, and the unique optimal solution $Q^*_\theta$ of the primal problem $\inf_{Q\in\mathcal{M}_\theta} D_\varphi(Q,P_0)$, namely the projection of $P_0$ on $\mathcal{M}_\theta$, is given by

$$\frac{dQ^*_\theta}{dP_0}(x) = {\varphi'}^{-1}\Big(t^*_0 + \sum_{j=1}^l t^*_j\,g_j(x,\theta)\Big),$$

where $t^*$ is a solution of

$$\begin{cases} 1 - \displaystyle\int{\varphi'}^{-1}\Big(t^*_0 + \sum_{k=1}^l t^*_k\,g_k(x,\theta)\Big)\,dP_0(x) = 0, \\ -\displaystyle\int g_j(x,\theta)\,{\varphi'}^{-1}\Big(t^*_0 + \sum_{k=1}^l t^*_k\,g_k(x,\theta)\Big)\,dP_0(x) = 0, \quad j = 1,\dots,l. \end{cases} \tag{4.15}$$

Furthermore, $t^*$ is unique if the functions $\mathbf{1}_{\mathbb{R}^m}, g_1(\cdot,\theta),\dots,g_l(\cdot,\theta)$ are linearly independent in the sense that

$$P_0\Big\{x \;\Big|\; t_0 + \sum_{j=1}^l t_j\,g_j(x,\theta) \neq 0\Big\} > 0 \quad \text{for all } t\in\mathbb{R}^{1+l} \text{ with } t \neq 0.$$

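For the sake of brevity and clarity, we introduce some additional notation. Denote by $\mathbf{g}$ the vector-valued function $(\mathbf{1}_{\mathbb{R}^m}, g_1,\dots,g_l)^T$. For any p.m. $P$ and any measurable function $f$ on $\mathbb{R}^m$, $Pf$ denotes the integral $\int_{\mathbb{R}^m}f(x)\,dP(x)$. Let

$$m(x,\theta,t) := t_0 - \psi\big(t^T\mathbf{g}(x,\theta)\big), \quad \text{for all } x\in\mathbb{R}^m,\ \theta\in\Theta\subset\mathbb{R}^d,\ t\in\mathbb{R}^{1+l}. \tag{4.16}$$

Note that the sup in (4.10) and (4.13) can be restricted respectively to the sets

$$\Lambda_n(\theta) := \left\{ t\in\mathbb{R}^{1+l} \;\middle|\; a^* < t^T\mathbf{g}(X_i,\theta) < b^*, \text{ for all } i = 1,\dots,n\right\} \tag{4.17}$$

and

$$\Lambda(\theta) := \left\{ t\in\mathbb{R}^{1+l} \;\middle|\; \int_{\mathbb{R}^m}\Big|\psi\Big(t_0 + \sum_{j=1}^l t_j\,g_j(x,\theta)\Big)\Big|\,dP_0(x) < \infty\right\}. \tag{4.18}$$

² The strict inequalities in (4.12) mean that $P_0\big\{x\in\mathbb{R}^m \mid \frac{dQ_0}{dP_0}(x) \le a\big\} = P_0\big\{x\in\mathbb{R}^m \mid \frac{dQ_0}{dP_0}(x) \ge b\big\} = 0$.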

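In view of the above propositions, we redefine the estimates (3.5), (3.6) and (3.7) as follows:

$$\widehat{D}_\varphi(\mathcal{M}_\theta,P_0) := \sup_{t\in\Lambda_n(\theta)}\frac{1}{n}\sum_{i=1}^n m(X_i,\theta,t) = \sup_{t\in\Lambda_n(\theta)} P_n m(\theta,t), \tag{4.19}$$

$$\widehat{D}_\varphi(\mathcal{M},P_0) := \inf_{\theta\in\Theta}\sup_{t\in\Lambda_n(\theta)}\frac{1}{n}\sum_{i=1}^n m(X_i,\theta,t) = \inf_{\theta\in\Theta}\sup_{t\in\Lambda_n(\theta)} P_n m(\theta,t) \tag{4.20}$$

and

$$\widehat{\theta}_\varphi := \arg\inf_{\theta\in\Theta}\sup_{t\in\Lambda_n(\theta)}\frac{1}{n}\sum_{i=1}^n m(X_i,\theta,t) = \arg\inf_{\theta\in\Theta}\sup_{t\in\Lambda_n(\theta)} P_n m(\theta,t). \tag{4.21}$$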

Remark 4.3. When $\varphi(x) = -\log x + x - 1$, the estimate (3.7) clearly coincides with the EL one, so that it can be seen as the value of the parameter which minimizes the $KL_m$-divergence between the model $\mathcal{M}$ and the empirical measure $P_n$ of the data. The statistic $2n\widehat{D}_{KL_m}(\mathcal{M},P_0)$, see (3.6), coincides with the empirical likelihood ratio statistic associated to the null hypothesis $H_0: P_0\in\mathcal{M}$ against the alternative $H_1: P_0\notin\mathcal{M}$. The dual representation of $\widehat{D}_{KL_m}(\mathcal{M},P_0)$, see (4.20), is

$$\widehat{D}_{KL_m}(\mathcal{M},P_0) = \inf_{\theta\in\Theta}\sup_{t\in\Lambda_n(\theta)}\left\{ t_0 + \frac{1}{n}\sum_{i=1}^n\log\Big(1 - t_0 - \sum_{j=1}^l t_j\,g_j(X_i,\theta)\Big)\right\}.$$

For a given $\theta\in\Theta$, the $KL_m$-projection $Q_n$ of $P_n$ on $\mathcal{M}_\theta$ is given by (see Proposition 4.1)

$$\frac{1}{Q_n(X_i)} = n\Big(1 - t^*_0 - \sum_{j=1}^l t^*_j\,g_j(X_i,\theta)\Big), \quad i = 1,\dots,n,$$

which, after multiplying by $Q_n(X_i)$, summing over $i$ and using the constraints $\sum_i Q_n(X_i) = 1$ and $\sum_i Q_n(X_i)\,g_j(X_i,\theta) = 0$, yields $t^*_0 = 0$. Therefore $t_0$ can be omitted and, after changing $t_j$ into $-t_j$, the above representation can be rewritten as

$$\widehat{D}_{KL_m}(\mathcal{M},P_0) = \inf_{\theta\in\Theta}\sup_{t_1,\dots,t_l}\left\{\frac{1}{n}\sum_{i=1}^n\log\Big(1 + \sum_{j=1}^l t_j\,g_j(X_i,\theta)\Big)\right\}$$

and then

$$\widehat{\theta}_{KL_m} = \widehat{\theta}_{EL} = \arg\inf_{\theta\in\Theta}\sup_{t_1,\dots,t_l}\left\{\frac{1}{n}\sum_{i=1}^n\log\Big(1 + \sum_{j=1}^l t_j\,g_j(X_i,\theta)\Big)\right\},$$

in which the sup is taken over the set

$$\left\{(t_1,\dots,t_l)\in\mathbb{R}^l \;\middle|\; -1 < \sum_{j=1}^l t_j\,g_j(X_i,\theta) < +\infty, \text{ for all } i = 1,\dots,n\right\}.$$

This is the ordinary dual representation of the EL estimate; see Qin and Lawless (1994) and Owen (2001).
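For the scalar mean of Remark 4.2, $g(x,\theta) = x - \theta$ and $l = 1$, the inner supremum above is a one-dimensional concave problem whose maximiser $t_1$ solves $\sum_i (X_i-\theta)/(1 + t_1(X_i-\theta)) = 0$ on the interval where all $1 + t_1(X_i-\theta) > 0$. A minimal sketch under this assumption, using bisection (our choice of root finder), returns $2n\widehat{D}_{KL_m}(\mathcal{M}_\theta,P_0) = 2\sum_i\log(1 + t_1(X_i-\theta))$, i.e. the EL ratio statistic at a fixed $\theta$, together with the weights $Q_n(X_i) = 1/\{n(1 + t_1(X_i-\theta))\}$. The function name and the data are hypothetical.

```python
import math


def el_mean(x, theta, iters=200):
    """EL statistic 2n * D_KLm(M_theta, P_0) and weights Q_n(X_i) for g(x, theta) = x - theta."""
    z = [xi - theta for xi in x]
    if not (min(z) < 0.0 < max(z)):
        raise ValueError("theta must lie inside the convex hull of the data (Remark 4.2)")
    # admissible t: 1 + t z_i > 0 for all i, i.e. -1/max(z) < t < -1/min(z)
    lo, hi = -1.0 / max(z), -1.0 / min(z)
    score = lambda t: sum(zi / (1.0 + t * zi) for zi in z)  # decreasing in t
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    n = len(z)
    weights = [1.0 / (n * (1.0 + t * zi)) for zi in z]
    stat = 2.0 * sum(math.log(1.0 + t * zi) for zi in z)
    return stat, weights


x = [1.2, 0.4, 2.5, 1.9, 0.7, 1.1, 3.0, 0.2]
stat, w = el_mean(x, 1.0)
print(stat, sum(w))
```

At $\theta$ equal to the sample mean the statistic vanishes and all weights equal $1/n$; moving $\theta$ away from the mean increases the statistic, and the weights always satisfy the two constraints of $\mathcal{M}^{(n)}_\theta$.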


Remark 4.4. Consider the power divergences associated to the power functions $\varphi_\gamma$; see (2.3) and (2.4). We show that the estimates $\widehat{\theta}_{\varphi_\gamma}$ belong to the class of GEL estimators introduced by Newey and Smith (2004). The projection $Q_n$ of $P_n$ on $\mathcal{M}_\theta$ is given by

$$Q_n(X_i) = \frac{1}{n}\left[(\gamma-1)\Big(t^*_0 + \sum_{j=1}^l t^*_j\,g_j(X_i,\theta)\Big) + 1\right]^{1/(\gamma-1)}, \quad i = 1,\dots,n.$$

Using the constraint $\sum_{i=1}^n Q_n(X_i) = 1$, we can express $t^*_0$ in terms of $t^*_1,\dots,t^*_l$, and hence the sup in the dual representation (4.21) can be reduced to a subset of $\mathbb{R}^l$, as in Newey and Smith (2004).

When $\varphi(x) = \frac{1}{2}(x-1)^2$, $\widehat{\theta}_\varphi$ coincides with the continuous updating estimator of Hansen et al. (1996).

Remark 4.5 (Numerical calculation of the estimates and the specific role of the $\chi^2$-divergence). The computation of $\widehat{t}(\theta)$ for fixed $\theta\in\Theta$, i.e. the solution of the empirical counterpart of (4.15) given in Proposition 4.1, is difficult when handling a generic divergence. In the case of the $\chi^2$-divergence, i.e. when $\varphi(x) = \frac{1}{2}(x-1)^2$, optimizing over all s.f.m.'s, this system is linear, since ${\varphi'}^{-1}(t) = 1 + t$; we thus easily obtain an explicit form for $\widehat{t}(\theta)$, which in turn allows for a single gradient descent when optimizing over $\Theta$. This procedure is also useful for calculating the estimates for all other divergences (for which the corresponding system is nonlinear), including EL, since it provides an easy starting point for the resulting double gradient descent.
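For the $\chi^2$-divergence, $\psi(t) = \frac{1}{2}t^2 + t$ and ${\varphi'}^{-1}(t) = 1 + t$, so the system of Proposition 4.1 becomes $S\,\widehat t = e_0 - \bar{\mathbf{g}}_n$, where $S = n^{-1}\sum_i\mathbf{g}(X_i,\theta)\mathbf{g}(X_i,\theta)^T$, $\bar{\mathbf{g}}_n$ is the empirical mean of $\mathbf{g}(X_i,\theta)$ and $e_0 = (1,0,\dots,0)^T$. Substituting $\widehat t$ into $t_0 - P_n\psi(t^T\mathbf{g})$ and using a block inverse of $S$ (a short calculation of ours) gives $\widehat{D}_{\chi^2}(\mathcal{M}_\theta,P_0) = \frac{1}{2}\widehat g^T\widehat C^{-1}\widehat g$, with $\widehat g$ and $\widehat C$ the empirical mean and (centred, $1/n$) covariance of $g(X_i,\theta)$, a continuous-updating quadratic form. The sketch below computes the closed form, checks that identity, and replaces the outer gradient descent by a crude grid search over $\theta$. The moment function (two moment conditions for a $N(\theta,1)$ sample), the simulated data and the function names are hypothetical.

```python
import numpy as np


def g(x, theta):
    """Hypothetical moments for X ~ N(theta, 1): E[X - theta] = E[X^2 - theta^2 - 1] = 0."""
    return np.column_stack([x - theta, x ** 2 - theta ** 2 - 1.0])


def chi2_dual(G):
    """Closed-form dual solution t_hat and estimate of D_chi2(M_theta, P_0) (Remark 4.5)."""
    n = G.shape[0]
    Gbar = np.hstack([np.ones((n, 1)), G])          # rows (1, g(X_i, theta))
    S = Gbar.T @ Gbar / n
    rhs = -Gbar.mean(axis=0)
    rhs[0] += 1.0                                   # e_0 - empirical mean of (1, g)
    t_hat = np.linalg.solve(S, rhs)                 # the linear first-order system
    u = Gbar @ t_hat
    value = t_hat[0] - np.mean(0.5 * u ** 2 + u)    # t_0 - P_n psi(t^T g)
    return t_hat, value


rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=500)

# identity check at one theta: dual value = 0.5 * gbar' C^{-1} gbar
G = g(x, 1.9)
gbar = G.mean(axis=0)
C = np.cov(G, rowvar=False, bias=True)
_, d_hat = chi2_dual(G)
print(d_hat, 0.5 * gbar @ np.linalg.solve(C, gbar))

# crude outer minimisation over theta, standing in for the gradient descent
grid = np.linspace(1.5, 2.5, 1001)
profile = [chi2_dual(g(x, th))[1] for th in grid]
theta_hat = grid[int(np.argmin(profile))]
print(theta_hat)
```

In practice one would hand the profile $\theta \mapsto \widehat{D}_{\chi^2}(\mathcal{M}_\theta,P_0)$ to a proper optimizer, and use its minimiser and the corresponding $\widehat t(\theta)$ as starting values for the Newton iterations of another divergence, as suggested above.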

5. Asymptotic properties of the estimates of the parameter and the estimates of the divergences

5.1. Under the model. This Section addresses Problems 1 and 2, aiming at testing the null hypothesis $H_0: P_0\in\mathcal{M}$ against the alternative $H_1: P_0\notin\mathcal{M}$. We give the limit distributions of the proposed test statistics, which are the estimated divergences between the model $\mathcal{M}$ and $P_0$. We also derive the limit distributions of the estimates of $\theta_0$. The following two Theorems extend Theorems 3.1 and 3.2 in Newey and Smith (2004) to the divergence-based approach; the assumptions which we consider match those of these two Theorems.

Assumption 1. (a) $P_0\in\mathcal{M}$ and $\theta_0\in\Theta$ is the unique solution to $E[g(X,\theta)] = 0$; (b) $\Theta\subset\mathbb{R}^d$ is compact; (c) $g(X,\theta)$ is continuous at each $\theta\in\Theta$ with probability one; (d) $E[\sup_{\theta\in\Theta}\|g(X,\theta)\|^\alpha] < \infty$ for some $\alpha > 2$; (e) the matrix $\Omega := E[g(X,\theta_0)\,g(X,\theta_0)^T]$ is nonsingular.

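Theorem 5.1. Under Assumption 1, the estimate $\widehat{\theta}_\varphi$ exists and converges to $\theta_0$ in probability, $\frac{1}{n}\sum_{i=1}^n g(X_i,\widehat{\theta}_\varphi) = O_P(1/\sqrt{n})$, $\widehat{t}(\widehat{\theta}_\varphi) := \arg\sup_{t\in\Lambda_n(\widehat{\theta}_\varphi)} P_n m(\widehat{\theta}_\varphi,t)$ exists and belongs to $\operatorname{int}(\Lambda_n(\widehat{\theta}_\varphi))$ with probability approaching one as $n\to\infty$, and $\widehat{t}(\widehat{\theta}_\varphi) = O_P(1/\sqrt{n})$.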

In order to obtain asymptotic normality, we need some additional assumptions. Denote by $G$ the matrix $G := E[\partial g(X,\theta_0)/\partial\theta]$.

Assumption 2. (a) $\theta_0\in\operatorname{int}(\Theta)$; (b) with probability one, $g(X,\theta)$ is continuously differentiable in a neighborhood $N$ of $\theta_0$ and $E[\sup_{\theta\in N}\|\partial g(X,\theta)/\partial\theta\|] < \infty$; (c) $\operatorname{rank}(G) = d$.

Theorem 5.2. Assume that Assumptions 1 and 2 hold. Then:

(1) $\sqrt{n}\,(\widehat{\theta}_\varphi - \theta_0)$ converges in distribution to a centered normal vector with covariance matrix $V := \big(G^T\Omega^{-1}G\big)^{-1}$.

(2) If $l > d$, the statistic $2n\widehat{D}_\varphi(\mathcal{M},P_0)$ converges in distribution to a $\chi^2$ random variable with $l - d$ degrees of freedom.


Remark 5.1. The above Theorem allows one to perform statistical tests (of the model) with asymptotic level $\alpha$. Consider the null hypothesis

$$H_0: P_0\in\mathcal{M} \quad \text{against the alternative} \quad H_1: P_0\notin\mathcal{M}. \tag{5.1}$$

The critical region is then

$$C_\varphi := \left\{ 2n\widehat{D}_\varphi(\mathcal{M},P_0) > q_{(1-\alpha)}\right\},$$

where $q_{(1-\alpha)}$ is the $(1-\alpha)$-quantile of the $\chi^2(l-d)$ distribution. When $\varphi(x) = -\log x + x - 1$, the corresponding test is the empirical likelihood ratio one; see Qin and Lawless (1994).
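A sketch of the test of Remark 5.1 for an over-identified model with $l = 2$ moment conditions and $d = 1$ parameter, assuming the $\chi^2$ divergence so that the closed form $\widehat{D}_{\chi^2}(\mathcal{M}_\theta,P_0) = \frac{1}{2}\widehat g^T\widehat C^{-1}\widehat g$ obtained from Remark 4.5 applies. The infimum over $\Theta$ is approximated on a grid, and the $\chi^2(1)$ quantile uses the fact that a $\chi^2(1)$ variable is the square of a standard normal one. The moment function $g(x,\theta) = (x-\theta,\ x^2-\theta^2-1)$, the simulated data, the grid and the function names are illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist


def d_chi2(x, theta):
    """Empirical D_chi2(M_theta, P_0) = 0.5 * gbar' C^{-1} gbar for g = (x - theta, x^2 - theta^2 - 1)."""
    G = np.column_stack([x - theta, x ** 2 - theta ** 2 - 1.0])
    gbar = G.mean(axis=0)
    C = np.cov(G, rowvar=False, bias=True)
    return 0.5 * gbar @ np.linalg.solve(C, gbar)


def model_test(x, alpha=0.05, grid=np.linspace(-5.0, 5.0, 4001)):
    """Reject H0: P_0 in M when 2n inf_theta D_chi2 exceeds the chi2(l - d) quantile."""
    stat = 2 * len(x) * min(d_chi2(x, th) for th in grid)  # grid search replaces inf over Theta
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2       # chi2(1) quantile, since l - d = 1
    return stat, q, stat > q


rng = np.random.default_rng(2)
stat_null, q_null, reject_null = model_test(rng.normal(1.0, 1.0, size=400))  # P_0 in the model
stat_alt, q_alt, reject_alt = model_test(rng.normal(1.0, 2.0, size=400))     # misspecified
print(stat_null, q_null, reject_null)
print(stat_alt, q_alt, reject_alt)
```

The second sample has variance 4, so no $\theta$ satisfies both moment conditions and $P_0\notin\mathcal{M}$; the statistic is then expected to be far above the critical value, whereas under the model it behaves like a $\chi^2(1)$ draw.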

5.2. Asymptotic properties of the estimates of the divergences for a given value of the parameter. For a given $\theta\in\Theta$, consider the problem of testing the null hypothesis $H_0: P_0\in\mathcal{M}_\theta$ against two different families of alternative hypotheses: $H_1: P_0\notin\mathcal{M}_\theta$ and $H'_1: P_0\in\mathcal{M}\setminus\mathcal{M}_\theta$. These two tests address different situations, since $H_1$ may include misspecification of the model. We present two different test statistics, each pertaining to one of these situations, and derive their limit distributions both under $H_0$ and under the alternatives. As a by-product, we also derive confidence regions for the true value $\theta_0$ of the parameter. We will state the convergence in probability of $\widehat{D}_\varphi(\mathcal{M}_\theta,P_0)$ to $D_\varphi(\mathcal{M}_\theta,P_0)$, and we will obtain the limit law of $\widehat{D}_\varphi(\mathcal{M}_\theta,P_0)$ both when $P_0\in\mathcal{M}_\theta$ and when $P_0\notin\mathcal{M}_\theta$. Note that when $P_0\in\mathcal{M}_\theta$, necessarily $\theta = \theta_0$, since the true value $\theta_0$ of the parameter is assumed to be unique.

Assumption 3. (a) $P_0\in\mathcal{M}_\theta$ and $\theta$ is the unique solution to $E[g(X,\theta)] = 0$; (b) $E[\|g(X,\theta)\|^\alpha] < \infty$ for some $\alpha > 2$; (c) the matrix $\Omega := E[g(X,\theta)\,g(X,\theta)^T]$ is nonsingular.

Theorem 5.3. Under Assumption 3, we have

(1) $\widehat{t}(\theta) := \arg\sup_{t \in \Lambda(\theta)} P_n m(\theta, t)$ exists and belongs to $\mathrm{int}(\Lambda(\theta))$ with probability approaching one as $n \to \infty$, and $\widehat{t}(\theta) = O_P(1/\sqrt{n})$;

(2) the statistic $2n\,\widehat{D}_\phi(\mathcal{M}^\theta, P_0)$ converges in distribution to a $\chi^2(l)$ random variable.
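The $\chi^2(l)$ limit in part (2) can be probed by simulation. The sketch below is an illustration under our own choices rather than the authors' code: it fixes θ = 1/3 in the model of Section 6 and draws samples from $P_0 = U([-1,1])$, for which $E[X] = 0$ and $E[X^2] = 1/3$, so that $H_0: P_0 \in \mathcal{M}^\theta$ holds with $l = 2$. It assumes that for EL the empirical divergence is the dual $\sup_\lambda \frac{1}{n}\sum_i \log(1+\lambda^T g(X_i,\theta))$, and it uses the closed form $-2\log\alpha$ for the $(1-\alpha)$-quantile of $\chi^2(2)$. The function names are hypothetical.

```python
import math
import random


def el_dual_2d(gs, iters=100):
    """sup over lam in R^2 of (1/n) sum_i log(1 + lam . g_i), by damped Newton ascent."""
    n = len(gs)

    def value(lam):
        us = [1.0 + lam[0] * g0 + lam[1] * g1 for g0, g1 in gs]
        return -math.inf if min(us) <= 0.0 else sum(map(math.log, us)) / n

    lam, f = [0.0, 0.0], 0.0
    for _ in range(iters):
        s0 = s1 = h00 = h01 = h11 = 0.0          # gradient and minus Hessian (both times n)
        for g0, g1 in gs:
            u = 1.0 + lam[0] * g0 + lam[1] * g1
            s0, s1 = s0 + g0 / u, s1 + g1 / u
            h00, h01, h11 = h00 + g0 * g0 / u**2, h01 + g0 * g1 / u**2, h11 + g1 * g1 / u**2
        if max(abs(s0), abs(s1)) < 1e-10 * n:
            break
        det = h00 * h11 - h01 * h01
        step = ((s0 * h11 - s1 * h01) / det, (h00 * s1 - h01 * s0) / det)  # Newton direction
        t = 1.0
        while t > 1e-12:                         # backtracking keeps 1 + lam . g_i > 0
            cand = [lam[0] + t * step[0], lam[1] + t * step[1]]
            fc = value(cand)
            if fc >= f:
                break
            t /= 2.0
        else:
            break
        lam, f = cand, fc
    return f


def fixed_theta_stat(xs, theta):
    """2n * D_hat(M^theta, P0) for g(x, theta) = (x, x^2 - theta) in the EL case."""
    return 2.0 * len(xs) * el_dual_2d([(x, x * x - theta) for x in xs])


def mc_rejection_rate(n=100, reps=200, alpha=0.05, seed=0):
    """Empirical level of 'reject H0 when 2n D_hat > chi2(2) quantile' with P0 = U([-1, 1])."""
    rng = random.Random(seed)
    q = -2.0 * math.log(alpha)                   # (1 - alpha)-quantile of chi2(l), l = 2
    rejections = 0
    for _ in range(reps):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        rejections += fixed_theta_stat(xs, 1.0 / 3.0) > q
    return rejections / reps
```

For moderate n the returned rejection rate should lie in the vicinity of α, up to Monte Carlo error and finite-sample distortion; increasing `n` and `reps` tightens the comparison.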

In order to obtain the limit distribution of the test statistic $2n\,\widehat{D}_\phi(\mathcal{M}^\theta, P_0)$ under the alternative $H_1: P_0 \notin \mathcal{M}^\theta$, including misspecification, the following assumption is needed.

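Assumption 4. (a) $P_0 \notin \mathcal{M}^\theta$, and $t^*(\theta) := \arg\sup_{t \in \Lambda(\theta)} E[m(X,\theta,t)]$ exists and is an interior point of $\Lambda(\theta)$; (b) $E\left[\sup_{t \in N} |m(X,\theta,t)|\right] < \infty$ for some compact set $N \subset \Lambda(\theta)$ such that $t^*(\theta) \in \mathrm{int}(N)$;

(c) the functions $\mathbf{1}_{\mathbb{R}^m}, g_1, \dots, g_l$ are linearly independent in the following sense:

$$P_0\left\{ x \,\middle|\, t_0 + \sum_{j=1}^{l} t_j\, g_j(x,\theta) \neq 0 \right\} > 0 \quad \text{for all } t \in \mathbb{R}^{1+l} \text{ with } t \neq 0.$$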

Condition (c) above ensures the strict concavity of the function $t \in \Lambda(\theta) \mapsto E[m(X,\theta,t)]$; otherwise $t^*(\theta)$ may not be uniquely defined, possibly implying inconsistency of $\widehat{t}(\theta)$.

Theorem 5.4. Under Assumption 4, when $P_0 \notin \mathcal{M}^\theta$, we have

(1) $\widehat{t}(\theta)$ converges in probability to $t^*(\theta)$;

(2) $\widehat{D}_\phi(\mathcal{M}^\theta, P_0)$ converges in probability to $D_\phi(\mathcal{M}^\theta, P_0)$.

We now give the limit distribution of the test statistics under $H_1$. We need the following additional condition.

Assumption 5. (a) With probability one, the function $t \mapsto m(X,\theta,t)$ is $C^3$ in a neighborhood $N(t^*(\theta))$ of $t^*(\theta)$, and all third-order partial derivatives (with respect to $t$) of $\{t \mapsto m(X,\theta,t);\ t \in N(t^*(\theta))\}$ are dominated by some $P_0$-integrable function;

(b) $E\left[m(X,\theta,t^*(\theta))^2\right] < \infty$, $E\left[\|\partial m(X,\theta,t^*(\theta))/\partial t\|^2\right] < \infty$, and the matrix $E\left[\partial^2 m(X,\theta,t^*(\theta))/\partial t^2\right]$ exists and is nonsingular.


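Theorem 5.5. Under Assumptions 4 and 5, we have

(1) $\sqrt{n}\left(\widehat{t}(\theta) - t^*(\theta)\right)$ converges in distribution to a centered normal vector with covariance matrix

$$\left[E[m''(X,\theta,t^*)]\right]^{-1} E\left[m'(X,\theta,t^*)\, m'(X,\theta,t^*)^T\right] \left[E[m''(X,\theta,t^*)]\right]^{-1},$$

where $t^* := t^*(\theta)$ and $m'$, $m''$ denote the gradient and the Hessian of $t \mapsto m(X,\theta,t)$;

(2) $\sqrt{n}\left(\widehat{D}_\phi(\mathcal{M}^\theta, P_0) - D_\phi(\mathcal{M}^\theta, P_0)\right)$ converges in distribution to a centered normal random variable with variance

$$\sigma^2(\theta) = E\left[m(X,\theta,t^*(\theta))^2\right] - \left[E\left[m(X,\theta,t^*(\theta))\right]\right]^2.$$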
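A plug-in version of part (2) gives an estimate of $\sigma^2(\theta)$ and, through it, of the approximate power (5.3) below. The sketch assumes that the dual function has the form $m(x,\theta,t) = t_0 - \phi^*(t_0 + \sum_j t_j g_j(x,\theta))$, with $\phi^*$ the convex conjugate of φ. For EL, $\phi^*(u) = -\log(1-u)$, the supremum over $t_0$ is attained at $t_0 = 0$, and $m(X_i,\theta,\widehat{t})$ reduces to $\log(1+\widehat{\lambda}\,g(X_i,\theta))$ with $\lambda = -t_1$. To keep the sketch short it uses a hypothetical scalar constraint $g(x,\theta) = x - \theta$ with bounded data from $U([-1, 1.5])$ and θ = 0, so that $P_0 \notin \mathcal{M}^\theta$; all function names are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance


def el_lambda_1d(gs, iters=60):
    """Maximizer of (1/n) sum log(1 + lam * g_i) over scalar lam, by bisection.

    The score sum g_i / (1 + lam * g_i) decreases strictly from +inf to -inf on the
    admissible interval (-1/max g, -1/min g); this requires min g < 0 < max g.
    """
    lo, hi = -1.0 / max(gs), -1.0 / min(gs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(g / (1.0 + mid * g) for g in gs) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def divergence_and_variance(xs, theta):
    """Plug-in estimates of D_phi(M^theta, P0) and sigma^2(theta) for g(x, theta) = x - theta."""
    gs = [x - theta for x in xs]
    lam = el_lambda_1d(gs)
    ms = [math.log(1.0 + lam * g) for g in gs]   # m(X_i, theta, t_hat) under the assumed EL form
    return fmean(ms), pvariance(ms)              # pvariance = mean(m^2) - mean(m)^2


def approx_power(n, d, sigma, alpha=0.05):
    """Approximation (5.3); here l = 1, so q is the (1 - alpha)-quantile of chi2(1)."""
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    return 1.0 - NormalDist().cdf(math.sqrt(n) / sigma * (q / (2.0 * n) - d))


if __name__ == "__main__":
    rng = random.Random(3)
    xs = [rng.uniform(-1.0, 1.5) for _ in range(1000)]   # E[X] = 0.25, so P0 is not in M^0
    d_hat, s2_hat = divergence_and_variance(xs, 0.0)
    print(d_hat, s2_hat, approx_power(50, d_hat, math.sqrt(s2_hat)))
```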

Remark 5.2. Let θ be a given value in Θ. Consider the test problem of the null hypothesis

(5.2) $H_0: P_0 \in \mathcal{M}^\theta$ against $H_1: P_0 \notin \mathcal{M}^\theta$.

In view of Theorem 5.3 part (2), we reject $H_0$ against $H_1$ at asymptotic level α when $2n\,\widehat{D}_\phi(\mathcal{M}^\theta, P_0)$ exceeds the $(1-\alpha)$-quantile $q_{1-\alpha}$ of the $\chi^2(l)$ distribution. Theorem 5.5 part (2) is useful to give an approximation to the power function

$$P_0 \notin \mathcal{M}^\theta \mapsto \beta(P_0) := P_0\left[2n\,\widehat{D}_\phi(\mathcal{M}^\theta, P_0) > q_{1-\alpha}\right].$$

We then obtain the following approximation:

$$\beta(P_0) \approx 1 - F_N\left(\frac{\sqrt{n}}{\sigma(\theta)}\left[\frac{q_{1-\alpha}}{2n} - D_\phi(\mathcal{M}^\theta, P_0)\right]\right), \tag{5.3}$$

where $F_N$ is the cumulative distribution function of the standard normal distribution. From this approximation, we can derive the approximate sample size that ensures a desired power β for a given alternative $P_0 \notin \mathcal{M}^\theta$. Let $n_0$ be the positive root of the equation

$$\beta = 1 - F_N\left(\frac{\sqrt{n}}{\sigma(\theta)}\left[\frac{q_{1-\alpha}}{2n} - D_\phi(\mathcal{M}^\theta, P_0)\right]\right),$$

i.e., for β > 1/2,

$$n_0 = \frac{(a+b) + \sqrt{a(a+2b)}}{2\,D_\phi(\mathcal{M}^\theta, P_0)^2},$$

with $a := \sigma(\theta)^2\left[F_N^{-1}(1-\beta)\right]^2$ and $b := q_{1-\alpha}\, D_\phi(\mathcal{M}^\theta, P_0)$ (for β < 1/2, the sign in front of the square root is reversed). The required sample size is then $\lfloor n_0 \rfloor + 1$, where $\lfloor n_0 \rfloor$ is the integer part of $n_0$.
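The sample-size computation can be checked numerically. The sketch below uses hypothetical function names; $D_\phi(\mathcal{M}^\theta,P_0)$, $\sigma(\theta)$ and $q_{1-\alpha}$ are inputs that would in practice come from a pilot estimate. It solves the defining equation for $n_0$ directly: with $z = F_N^{-1}(1-\beta)$ and $u = \sqrt{n}$, the equation is the quadratic $D u^2 + \sigma(\theta) z u - q_{1-\alpha}/2 = 0$, whose unique positive root gives $n_0 = u^2$. Working with $u$ avoids having to choose the sign in front of the square root in a closed form for $n_0$; the plus sign is the relevant one when β > 1/2.

```python
import math
from statistics import NormalDist


def approx_power(n, d, sigma, q):
    """Approximation (5.3): 1 - F_N(sqrt(n) / sigma * (q / (2n) - D))."""
    return 1.0 - NormalDist().cdf(math.sqrt(n) / sigma * (q / (2.0 * n) - d))


def required_sample_size(d, sigma, q, beta):
    """Return (floor(n0) + 1, n0), where n0 solves beta = approx_power(n0, d, sigma, q).

    With z = F_N^{-1}(1 - beta) and u = sqrt(n), the equation is equivalent to
    d * u**2 + sigma * z * u - q / 2 = 0, whose unique positive root gives n0 = u**2.
    """
    z = NormalDist().inv_cdf(1.0 - beta)
    u = (-sigma * z + math.sqrt((sigma * z) ** 2 + 2.0 * q * d)) / (2.0 * d)
    n0 = u * u
    return math.floor(n0) + 1, n0


if __name__ == "__main__":
    q = -2.0 * math.log(0.05)        # (1 - alpha)-quantile of chi2(2), i.e. l = 2 constraints
    # Hypothetical alternative with D = 0.05 and sigma(theta) = 0.3, target power 0.8.
    print(required_sample_size(0.05, 0.3, q, 0.8))
```

Since the approximate power is increasing in n whenever $D_\phi(\mathcal{M}^\theta,P_0) > 0$, the returned integer is the smallest sample size whose approximate power reaches β.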

Remark 5.3 (Generalized empirical likelihood ratio test). For testing $H_0: P_0 \in \mathcal{M}^\theta$ against the alternative $H_1': P_0 \in \mathcal{M} \setminus \mathcal{M}^\theta$, we propose to use the statistic

$$2n S_n^\phi := 2n\left[\widehat{D}_\phi(\mathcal{M}^\theta, P_0) - \inf_{\theta' \in \Theta} \widehat{D}_\phi(\mathcal{M}^{\theta'}, P_0)\right], \tag{5.4}$$

which converges in distribution to a $\chi^2(d)$ random variable under $H_0$ when Assumptions 1 and 2 hold. This can be proved using arguments similar to those of Theorems 5.2 and 5.3. We then reject $H_0$ at asymptotic level α when $2nS_n^\phi > q_{1-\alpha}$, the $(1-\alpha)$-quantile of the $\chi^2(d)$ distribution. Under $H_1'$ and when Assumptions 1, 2, 4 and 5 hold, it can be proved, as in Theorem 5.5, that

$$\sqrt{n}\left(S_n^\phi - D_\phi(\mathcal{M}^\theta, P_0)\right) \tag{5.5}$$

converges in distribution to a centered normal random variable with variance

$$\sigma^2(\theta) := E\left[m(X,\theta,t^*(\theta))^2\right] - \left(E\left[m(X,\theta,t^*(\theta))\right]\right)^2.$$

So, as in the above remark, we obtain the approximation

$$\beta(P_0) \approx 1 - F_N\left(\frac{\sqrt{n}}{\sigma(\theta)}\left[\frac{q_{1-\alpha}}{2n} - D_\phi(\mathcal{M}^\theta, P_0)\right]\right) \tag{5.6}$$

to the power function $P_0 \in \mathcal{M} \setminus \mathcal{M}^\theta \mapsto P_0\left[2nS_n^\phi > q_{1-\alpha}\right]$. The approximate sample size required to achieve a desired power for a given alternative can be obtained as in the above remark.

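Remark 5.4 (Confidence region for the parameter). For a fixed level α, using the convergence in distribution of (5.4), the set

$$\left\{\theta \in \Theta \ \text{such that}\ 2nS_n^\phi \le q_{1-\alpha}\right\}$$

is an asymptotic confidence region for $\theta_0$ with asymptotic level $1-\alpha$, where $q_{1-\alpha}$ is the $(1-\alpha)$-quantile of the $\chi^2(d)$ distribution.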
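Remarks 5.3 and 5.4 can be sketched together for the model of Section 6, where $d = 1$. The code below is our illustration with hypothetical names. It assumes that for EL the empirical divergence is the dual $\sup_\lambda \frac{1}{n}\sum_i \log(1+\lambda^T g(X_i,\theta))$, computes $\inf_{\theta'}\widehat{D}_\phi(\mathcal{M}^{\theta'},P_0)$ by golden-section search, and returns the confidence region $\{\theta : 2nS_n^\phi \le q_{1-\alpha}\}$. Because the profile $\theta \mapsto \widehat{D}_\phi(\mathcal{M}^\theta,P_0)$ is convex in this example, the region is an interval whose end points are located by bisection; this also assumes that $2nS_n^\phi$ exceeds $q_{1-\alpha}$ near the edges of the feasible range of θ, where the empirical divergence blows up.

```python
import math
import random
from statistics import NormalDist


def el_dual_2d(gs, iters=100):
    """sup over lam in R^2 of (1/n) sum_i log(1 + lam . g_i), by damped Newton ascent."""
    n = len(gs)

    def value(lam):
        us = [1.0 + lam[0] * g0 + lam[1] * g1 for g0, g1 in gs]
        return -math.inf if min(us) <= 0.0 else sum(map(math.log, us)) / n

    lam, f = [0.0, 0.0], 0.0
    for _ in range(iters):
        s0 = s1 = h00 = h01 = h11 = 0.0          # gradient and minus Hessian (both times n)
        for g0, g1 in gs:
            u = 1.0 + lam[0] * g0 + lam[1] * g1
            s0, s1 = s0 + g0 / u, s1 + g1 / u
            h00, h01, h11 = h00 + g0 * g0 / u**2, h01 + g0 * g1 / u**2, h11 + g1 * g1 / u**2
        if max(abs(s0), abs(s1)) < 1e-10 * n:
            break
        det = h00 * h11 - h01 * h01
        step = ((s0 * h11 - s1 * h01) / det, (h00 * s1 - h01 * s0) / det)  # Newton direction
        t = 1.0
        while t > 1e-12:                         # backtracking keeps 1 + lam . g_i > 0
            cand = [lam[0] + t * step[0], lam[1] + t * step[1]]
            fc = value(cand)
            if fc >= f:
                break
            t /= 2.0
        else:
            break
        lam, f = cand, fc
    return f


def golden_min(f, lo, hi, tol=1e-7):
    """Golden-section search for the minimizer of a convex function on [lo, hi]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - r * (hi - lo), lo + r * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - r * (hi - lo)
            fc = f(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + r * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)


def el_confidence_region(xs, alpha=0.05):
    """theta_hat and the interval {theta : 2n S_n(theta) <= q} for g(x, theta) = (x, x^2 - theta)."""
    n = len(xs)
    neg = [x for x in xs if x < 0.0]
    pos = [x for x in xs if x > 0.0]
    lo, hi = -max(neg) * min(pos), -min(neg) * max(pos)     # feasible range of theta
    lo, hi = lo + 1e-6 * (hi - lo), hi - 1e-6 * (hi - lo)
    profile = lambda th: el_dual_2d([(x, x * x - th) for x in xs])
    th_hat = golden_min(profile, lo, hi)
    d_min = profile(th_hat)                                 # inf over theta' of D_hat(M^theta', P0)
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2        # chi2(d) quantile with d = 1

    def boundary(inside, outside):
        for _ in range(40):                                 # bisection on 2n S_n(theta) - q
            mid = 0.5 * (inside + outside)
            if 2.0 * n * (profile(mid) - d_min) <= q:
                inside = mid
            else:
                outside = mid
        return inside

    return th_hat, (boundary(th_hat, lo), boundary(th_hat, hi))


if __name__ == "__main__":
    rng = random.Random(2)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]       # theta_0 = 1/3 under U([-1, 1])
    print(el_confidence_region(xs))
```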

5.3. Under misspecification. We address Problem 1 by stating the limit distribution of the proposed test statistics under the alternative $H_1: P_0 \notin \mathcal{M}$. This requires the introduction of $Q^*_{\theta^*}$, the projection of $P_0$ on $\mathcal{M}$. Assumption 6 below ensures the existence of the “pseudo-true” value $\theta^*$ as well as the existence of the projection $Q^*_{\theta^*}$ of $P_0$ on $\mathcal{M}$, and states some other necessary regularity conditions.

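Assumption 6. (a) Θ is compact, and $\theta^* := \arg\inf_{\theta \in \Theta} \sup_{t \in \Lambda(\theta)} E[m(X,\theta,t)]$ exists and is unique; (b) $g(X,\theta)$ is continuous at each $\theta \in \Theta$ with probability one; (c) $E\left[\sup_{\theta \in \Theta,\, t \in N(\theta)} |m(X,\theta,t)|\right] < \infty$, where $N(\theta) \subset \Lambda(\theta)$ is a compact set such that $t^*(\theta) \in \mathrm{int}(N(\theta))$; (d) the functions $\mathbf{1}_{\mathbb{R}^m}, g_1, \dots, g_l$ are linearly independent in the following sense:

$$P_0\left\{ x \,\middle|\, t_0 + \sum_{j=1}^{l} t_j\, g_j(x,\theta) \neq 0 \right\} > 0 \quad \text{for all } t \in \mathbb{R}^{1+l} \text{ with } t \neq 0.$$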

Theorem 5.6. Under Assumption 6, we have

(1) $\|\widehat{t}(\theta) - t^*(\theta)\|$ converges in probability to 0 uniformly in $\theta \in \Theta$;

(2) $\widehat{\theta}_\phi$ converges in probability to $\theta^*$;

(3) $\widehat{D}_\phi(\mathcal{M}, P_0)$ converges in probability to $D_\phi(\mathcal{M}, P_0)$.

The asymptotic normality of the test statistics under misspecification requires the following additional conditions.

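Assumption 7. (a) $\theta^* \in \mathrm{int}(\Theta)$; (b) with probability one, the function $(\theta, t) \mapsto m(X,\theta,t)$ is $C^3$ in a neighborhood $N \subset \Theta \times \Lambda(\Theta)$ of $(\theta^*, t^*(\theta^*))$, and all the third-order partial derivative functions are dominated on $N$ by some $P_0$-integrable function; (c) $E\left[m(X,\theta^*,t^*(\theta^*))^2\right]$, $E\left[\|\partial m(X,\theta^*,t^*(\theta^*))/\partial t\|^2\right]$ and $E\left[\|\partial m(X,\theta^*,t^*(\theta^*))/\partial\theta\|^2\right]$ are finite, and the matrix

$$S := \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}$$

exists and is nonsingular, where $S_{11} := E\left[\partial^2 m(X,\theta^*,t^*(\theta^*))/\partial t^2\right]$, $S_{12} = S_{21}^T := E\left[\partial^2 m(X,\theta^*,t^*(\theta^*))/\partial t\,\partial\theta\right]$ and $S_{22} := E\left[\partial^2 m(X,\theta^*,t^*(\theta^*))/\partial\theta^2\right]$.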

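Theorem 5.7. Under Assumptions 6 and 7, we have

(1) the vector

$$\sqrt{n}\begin{pmatrix} \widehat{t}(\widehat{\theta}_\phi) - t^*(\theta^*) \\ \widehat{\theta}_\phi - \theta^* \end{pmatrix}$$

converges in distribution to a centered normal vector with covariance matrix $W = S^{-1} M S^{-1}$, where

$$M := E\left[\begin{pmatrix} \frac{\partial}{\partial t} m(X,\theta^*,t^*(\theta^*)) \\ \frac{\partial}{\partial \theta} m(X,\theta^*,t^*(\theta^*)) \end{pmatrix}\begin{pmatrix} \frac{\partial}{\partial t} m(X,\theta^*,t^*(\theta^*)) \\ \frac{\partial}{\partial \theta} m(X,\theta^*,t^*(\theta^*)) \end{pmatrix}^T\right];$$

(2) $\sqrt{n}\left(\widehat{D}_\phi(\mathcal{M}, P_0) - D_\phi(\mathcal{M}, P_0)\right)$ converges in distribution to a centered normal random variable with variance

$$\sigma^2(\theta^*) = E\left[m(X,\theta^*,t^*(\theta^*))^2\right] - \left[E\left[m(X,\theta^*,t^*(\theta^*))\right]\right]^2.$$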

Remark 5.5. In the case of EL, i.e., when $\phi(x) = -\log x + x - 1$, Assumption 6(c) implies (see (4.12)) that

$$-\infty < \inf_x \left(t_0 + t^T g(x,\theta)\right) \le \sup_x \left(t_0 + t^T g(x,\theta)\right) < 1 \quad P_0\text{-a.s.}$$

for all $\theta \in N(\theta^*)$ and $t \in N(\theta)$. This imposes a restriction on the model when the support of $P_0$ is unbounded. Indeed, when the support of $P_0$ is, for example, the whole space $\mathbb{R}^m$, the condition above does not hold when $g$ is unbounded. By contrast, the same condition may hold for other divergences associated with φ functions with $\mathrm{dom}\,\phi = \mathbb{R}$.
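The restriction in Remark 5.5 already shows up at the sample level. For a scalar constraint, and with $t_0 = 0$ and $\lambda = -t_1$ (our parametrization), the EL dual is only defined for λ such that $1 + \lambda\, g(X_i) > 0$ for every observation. The sketch below uses the hypothetical choice $g(x) = x$ with $P_0 = N(0,1)$, whose support is unbounded. On nested samples, the admissible interval $(-1/\max_i X_i,\ -1/\min_i X_i)$ can only shrink as n grows, and it collapses towards {0}, so a nonzero limit $t^*$ cannot stay in its interior. The function name is illustrative.

```python
import random


def el_admissible_interval(gs):
    """Open interval of scalar lam with 1 + lam * g_i > 0 for every i (needs min g < 0 < max g)."""
    return -1.0 / max(gs), -1.0 / min(gs)


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]   # unbounded support, g(x) = x
widths = {}
for n in (100, 1_000, 100_000):
    lo, hi = el_admissible_interval(xs[:n])          # nested samples: the interval can only shrink
    widths[n] = hi - lo
    print(n, round(lo, 3), round(hi, 3))
```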

Remark 5.6. Theorem 5.7 is useful for the computation of the power function. For testing the null hypothesis $H_0: P_0 \in \mathcal{M}$ against the alternative $H_1: P_0 \notin \mathcal{M}$, the power function is

$$P_0 \notin \mathcal{M} \mapsto \beta(P_0) := P_0\left[2n\,\widehat{D}_\phi(\mathcal{M}, P_0) > q_{1-\alpha}\right]. \tag{5.7}$$

Using Theorem 5.7 part (2), we obtain the following approximation to the power function (5.7):

$$\beta(P_0) \approx 1 - F_N\left(\frac{\sqrt{n}}{\sigma(\theta^*)}\left[\frac{q_{1-\alpha}}{2n} - D_\phi(\mathcal{M}, P_0)\right]\right), \tag{5.8}$$

where $F_N$ is the cumulative distribution function of the standard normal distribution. From this approximation of $\beta(P_0)$, the approximate sample size that ensures a given power β for a given alternative $P_0 \notin \mathcal{M}$ can be obtained as follows. Let $n_0$ be the positive root of the equation

$$\beta = 1 - F_N\left(\frac{\sqrt{n}}{\sigma(\theta^*)}\left[\frac{q_{1-\alpha}}{2n} - D_\phi(\mathcal{M}, P_0)\right]\right),$$

i.e., for β > 1/2,

$$n_0 = \frac{(a+b) + \sqrt{a(a+2b)}}{2\,D_\phi(\mathcal{M}, P_0)^2},$$

with $a := \sigma(\theta^*)^2\left[F_N^{-1}(1-\beta)\right]^2$ and $b := q_{1-\alpha}\, D_\phi(\mathcal{M}, P_0)$ (for β < 1/2, the sign in front of the square root is reversed). The required sample size is then $\lfloor n_0 \rfloor + 1$, where $\lfloor n_0 \rfloor$ is the integer part of $n_0$.

6. Simulation results: Approximation of the power function of the empirical likelihood ratio test

We illustrate by simulation the accuracy of the power approximation (5.8) in the case of the EL method, i.e., when $\phi(x) = -\log x + x - 1$. Consider the test problem of the composite null hypothesis $H_0: P_0 \in \mathcal{M}$ against the alternative $H_1: P_0 \notin \mathcal{M}$, where $\mathcal{M} = \bigcup_{\theta \in \mathbb{R}} \mathcal{M}^\theta$ and $\mathcal{M}^\theta$ is the set of all s.f.m.'s satisfying the constraints $\int dQ(x) = 1$ and $\int g(x,\theta)\,dQ(x) = 0$ with $g(x,\theta) := (x, x^2 - \theta)^T$, namely

$$\mathcal{M}^\theta := \left\{ Q \ \text{such that}\ \int_{\mathbb{R}} dQ(x) = 1 \ \text{and}\ \int_{\mathbb{R}} g(x,\theta)\,dQ(x) = 0 \right\},$$

where $\theta \in \mathbb{R}$ is the parameter of interest. We consider the asymptotic level α = 0.05 and the alternatives $P_0 := U([-1, 1+\epsilon]) \notin \mathcal{M}$ for different values of ε in the interval $]0, 1]$. Note that when ε = 0, the uniform distribution $U([-1, 1])$ belongs to the model $\mathcal{M}$. For this model, one can also show that all assumptions of Theorem 5.2 are satisfied when ε = 0, and that all assumptions of Theorem 5.7 are met under the alternatives. In Figure 1, the power function (5.7) is plotted (continuous line) for sample sizes n = 50, 100, 200 and 500 and for different values of ε. Each power entry was obtained by Monte Carlo from 1000 independent runs. The approximation (5.8) is plotted (dashed line) as a function of ε. The estimates $\widehat{\theta}_\phi$ and $\widehat{D}_\phi(\mathcal{M}, P_0)$ are calculated using the Newton algorithm. We observe from Figure 1 that the approximation is accurate even for moderate sample sizes.

Figure 1. Approximation of the power function. [Four panels, n = 50, 100, 200 and 500: the power function (“Power”), its approximation (“Approxim.”) and the level (“Level”) plotted against the alternatives ε ∈ [0, 1]; vertical axis “Power function and its approximation”, from 0 to 1.]
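The experiment of this section can be reproduced in miniature. In this example θ ranges over all of ℝ, so the constraint $\int (x^2-\theta)\,dQ(x) = 0$ can always be met by taking $\theta = \int x^2\,dQ(x)$; hence $\mathcal{M} = \{Q : \int dQ = 1,\ \int x\,dQ(x) = 0\}$ and $2n\widehat{D}_\phi(\mathcal{M},P_0)$ coincides with the EL statistic for the single constraint $E[X] = 0$ (this reduction is our observation, used to keep the sketch short; at θ* the component of the dual variable attached to $x^2-\theta$ vanishes). The sketch further assumes that, for EL, $m(X,\theta^*,t^*(\theta^*)) = \log(1+\lambda^* X)$ with λ* the maximizer of $E[\log(1+\lambda X)]$, and it replaces $P_0 = U([-1, 1+\epsilon])$ by an equally weighted midpoint grid when computing $D_\phi(\mathcal{M},P_0)$ and $\sigma(\theta^*)$. The number of Monte Carlo runs is much smaller than the 1000 used for Figure 1, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance


def el_lambda_1d(gs, iters=60):
    """Maximizer of (1/n) sum log(1 + lam * g_i); bisection on the decreasing score."""
    lo, hi = -1.0 / max(gs), -1.0 / min(gs)      # admissible lam (needs min g < 0 < max g)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(g / (1.0 + mid * g) for g in gs) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chi2_1_quantile(alpha):
    return NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2   # chi2(l - d) with l - d = 1


def approx_power(eps, n, alpha=0.05, grid=4000):
    """Approximation (5.8) for P0 = U([-1, 1 + eps]), eps > 0, with P0 replaced by a midpoint grid."""
    w = 2.0 + eps
    pts = [-1.0 + (j + 0.5) * w / grid for j in range(grid)]
    lam = el_lambda_1d(pts)
    ms = [math.log(1.0 + lam * x) for x in pts]
    d, sigma = fmean(ms), math.sqrt(pvariance(ms))        # D_phi(M, P0) and sigma(theta*)
    q = chi2_1_quantile(alpha)
    return 1.0 - NormalDist().cdf(math.sqrt(n) / sigma * (q / (2.0 * n) - d))


def mc_power(eps, n, reps=200, alpha=0.05, seed=0):
    """Monte Carlo estimate of the power (5.7) of the EL test of E[X] = 0."""
    rng = random.Random(seed)
    q = chi2_1_quantile(alpha)
    hits = 0
    for _ in range(reps):
        xs = [rng.uniform(-1.0, 1.0 + eps) for _ in range(n)]
        lam = el_lambda_1d(xs)
        hits += 2.0 * sum(math.log(1.0 + lam * x) for x in xs) > q   # 2n D_hat > q
    return hits / reps


if __name__ == "__main__":
    for eps in (0.1, 0.3, 0.5):
        print(eps, mc_power(eps, 100), round(approx_power(eps, 100), 3))
```

The printed pairs (Monte Carlo power, approximation (5.8)) can be compared for a few values of ε; agreement can only be expected up to Monte Carlo error.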

7. Concluding remarks and possible developments

We have proposed new estimates and tests for models satisfying linear constraints with unknown parameter through divergence-based methods which generalize the EL approach. This leads to the limit distributions of the test statistics and of the estimates under alternatives and under misspecification, which cannot be obtained from the likelihood point of view. Consistency of the test statistics under the alternatives is the starting point for the study of the optimality of the tests through the Bahadur approach; also, the generalized Neyman-Pearson optimality of the EL test (as developed by Kitamura (2001)) can be adapted to empirical divergence-based methods. Many problems remain to be studied in the future, such as the choice of the divergence which leads to an optimal (in some sense) estimator or test in terms of efficiency and/or robustness. Preliminary simulation results show that the Hellinger divergence enjoys good properties in terms of efficiency and robustness; see Broniatowski and Keziou (2008). Comparisons under local alternatives should also be developed.

8. Appendix

Proof of Theorem 5.1. The arguments used for the proof of Theorem 3.1 in Newey and Smith (2004) still hold when their criterion function $(\theta,\lambda) \in \Theta \times \mathbb{R}^l \mapsto \frac{1}{n}\sum_{i=1}^{n} \rho(\lambda^T g(X_i,\theta))$ is replaced by our function $(\theta,t) \in \Theta \times \mathbb{R}^{1+l} \mapsto \frac{1}{n}\sum_{i=1}^{n} m(X_i,\theta,t)$. In particular, $\max_{i \le n} \widehat{t}(\widehat{\theta}_\phi)^T g(X_i, \widehat{\theta}_\phi)$ tends to 0 in probability, which implies that $\widehat{t}(\widehat{\theta}_\phi) \in \mathrm{int}(\Lambda_n(\widehat{\theta}_\phi))$ with probability approaching one as $n \to \infty$, since $a^* < 0 < b^*$.

Proof of Theorem 5.2. The proof is similar to that of Theorem 3.2 in Newey and Smith (2004) and is therefore omitted.

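Proof of Theorem 5.3. (1) This is a particular case of Theorem 5.1, taking $\Theta = \{\theta\}$. (2) The first-order conditions $P_n\,\partial m(\theta,\widehat{t})/\partial t = 0$ are satisfied with probability approaching one as $n \to \infty$. Hence, by a Taylor expansion, we obtain

$$0 = P_n \frac{\partial m(\theta,\widehat{t})}{\partial t} = P_n \frac{\partial m(\theta,0)}{\partial t} + \left[P_n \frac{\partial^2 m(\theta,\bar{t})}{\partial t^2}\right]^T \widehat{t}, \tag{8.1}$$

where $\bar{t} \in \mathbb{R}^{1+l}$ is a vector inside the segment that links 0 and $\widehat{t}$. By the uniform weak law of large numbers (UWLLN) and the dominated convergence theorem, $P_n\,\partial^2 m(\theta,\bar{t})/\partial t^2$ tends in probability to

$$E\left[\frac{\partial^2 m(X,\theta,0)}{\partial t^2}\right] = -\begin{pmatrix} 1 & 0^T \\ 0 & \Omega \end{pmatrix} =: -M,$$

which is nonsingular and symmetric. Hence, we can write

$$\sqrt{n}\,\widehat{t} = M^{-1}\sqrt{n}\,P_n \frac{\partial m(\theta,0)}{\partial t} + o_P(1). \tag{8.2}$$

Using similar arguments, we also get

$$\widehat{D}_\phi(\mathcal{M}^\theta, P_0) = P_n m(\theta,\widehat{t}) = \left[P_n \frac{\partial m(\theta,0)}{\partial t}\right]^T \widehat{t} - \frac{1}{2}\,\widehat{t}^T M\, \widehat{t} + o_P(1/n).$$

From this, using (8.2), we obtain

$$\widehat{D}_\phi(\mathcal{M}^\theta, P_0) = \frac{1}{2}\left[P_n \frac{\partial m(\theta,0)}{\partial t}\right]^T M^{-1} \left[P_n \frac{\partial m(\theta,0)}{\partial t}\right] + o_P(1/n).$$

This yields

$$2n\,\widehat{D}_\phi(\mathcal{M}^\theta, P_0) = \left[\sqrt{n}\,P_n \frac{\partial m(\theta,0)}{\partial t}\right]^T M^{-1} \left[\sqrt{n}\,P_n \frac{\partial m(\theta,0)}{\partial t}\right] + o_P(1). \tag{8.3}$$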

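On the other hand, direct calculation shows that the first component of $\partial m(X,\theta,0)/\partial t$ vanishes and that its last $l$ components have covariance matrix Ω, so that, by the central limit theorem, $\sqrt{n}\,P_n\,\partial m(\theta,0)/\partial t$ converges in distribution to a centered normal vector with covariance matrix $\begin{pmatrix} 0 & 0^T \\ 0 & \Omega \end{pmatrix}$. Combining this with (8.3) and the block-diagonal form of M, the right-hand side of (8.3) converges in distribution to a $\chi^2(l)$ random variable, which concludes the proof.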
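Expansion (8.3) can be checked numerically in the simplest setting. The sketch below uses illustrative names and our own choice of the scalar constraint $g(x) = x$ under $P_0 = U([-1,1])$. It assumes that for EL $2n\widehat{D}_\phi(\mathcal{M}^\theta,P_0) = 2\sum_i \log(1+\widehat{\lambda}\,g(X_i))$ with $\widehat{\lambda}$ the dual maximizer, and compares it with the quadratic statistic $n\,\bar{g}^2/\widehat{\Omega}$, where $\widehat{\Omega} = \frac{1}{n}\sum_i g(X_i)^2$; under $H_0$ the difference should vanish in probability as n grows.

```python
import math
import random


def el_lambda_1d(gs, iters=60):
    """Maximizer of (1/n) sum log(1 + lam * g_i) over scalar lam, by bisection on the score."""
    lo, hi = -1.0 / max(gs), -1.0 / min(gs)      # admissible lam (needs min g < 0 < max g)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(g / (1.0 + mid * g) for g in gs) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def el_vs_quadratic(gs):
    """Return (2n D_hat, n * gbar^2 / Omega_hat) for the scalar constraint E[g(X)] = 0."""
    n = len(gs)
    lam = el_lambda_1d(gs)
    el_stat = 2.0 * sum(math.log(1.0 + lam * g) for g in gs)
    gbar = sum(gs) / n
    omega_hat = sum(g * g for g in gs) / n        # sample analogue of Omega = E[g g^T]
    return el_stat, n * gbar * gbar / omega_hat


if __name__ == "__main__":
    rng = random.Random(5)
    for n in (50, 500, 5000):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # g(x) = x, so H0 holds under U([-1, 1])
        print(n, el_vs_quadratic(xs))
```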

Proof of Theorem 5.4. (1) First, note that $t^*(\theta)$ is unique, since $t \in \Lambda(\theta) \mapsto E[m(X,\theta,t)]$ is strictly concave by condition (c) and $\Lambda(\theta)$ is a convex set. By the UWLLN, using the continuity of $m(X,\theta,t)$ in $t$ and condition (b), we obtain

$$\sup_{t \in N}\left|P_n m(\theta,t) - E[m(X,\theta,t)]\right| \to 0 \tag{8.4}$$

in probability, i.e., the convergence is uniform in $t$ over the compact set $N$. Using this, the fact that $t^*(\theta) := \arg\sup_{t \in \Lambda(\theta)} P_0 m(\theta,t)$ is unique and belongs to $\mathrm{int}(N)$, and the strict concavity of $t \mapsto P_0 m(\theta,t)$, we conclude that any value

$$\bar{t} := \arg\sup_{t \in N} P_n m(\theta,t) \tag{8.5}$$

converges in probability to $t^*(\theta)$; see, e.g., Theorem 5.7 in van der Vaart (1998). We end the proof by showing that $\widehat{t}(\theta)$ belongs to $\mathrm{int}(N)$ with probability approaching one as $n \to \infty$, and therefore converges to $t^*(\theta)$. Indeed, since for $n$ sufficiently large any value $\bar{t}$ lies in the interior of $N$, the concavity of $t \mapsto P_n m(\theta,t)$ implies that no point $t$ in the complement of $\mathrm{int}(N)$ can maximize $P_n m(\theta,t)$ over $t \in \mathbb{R}^{1+l}$; hence $\widehat{t}(\theta)$ must lie in $\mathrm{int}(N)$.

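(2) We have $\widehat{D}_\phi(\mathcal{M}^\theta, P_0) = P_n m(\theta,\widehat{t}) = P_n m(\theta,\bar{t})$, where the second equality holds for $n$ sufficiently large. Hence we can write

$$\left|\widehat{D}_\phi(\mathcal{M}^\theta, P_0) - D_\phi(\mathcal{M}^\theta, P_0)\right| = \left|P_n m(\theta,\bar{t}) - P_0 m(\theta,t^*)\right| \le \left|P_n m(\theta,\bar{t}) - P_0 m(\theta,\bar{t})\right| + \left|P_0 m(\theta,\bar{t}) - P_0 m(\theta,t^*)\right|.$$

The first term tends to 0 in probability by (8.4), and the second term tends to 0 in probability by the dominated convergence theorem, using assumption (b) and the convergence of $\bar{t}$ to $t^*(\theta)$ established in part (1).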

Proof of Theorem 5.5. (1) By a Taylor expansion, there exists $\bar{t} \in \mathbb{R}^{l+1}$ inside the segment that links $\widehat{t}$ and $t^*$ such that

$$0 = P_n m'(\theta,\widehat{t}) = P_n m'(\theta,t^*) + \left(P_n m''(\theta,t^*)\right)^T(\widehat{t} - t^*) + \frac{1}{2}\,(\widehat{t} - t^*)^T\, P_n m'''(\theta,\bar{t})\,(\widehat{t} - t^*). \tag{8.6}$$

By condition (a) and the law of large numbers (LLN), we get $P_n m'''(\theta,\bar{t}) = O_P(1)$. Hence, since $\widehat{t} - t^* = o_P(1)$ by Theorem 5.4, we can write the last term on the right-hand side of (8.6) as $o_P(1)(\widehat{t} - t^*)$. On the other hand, by the WLLN, $P_n m''(\theta,t^*)$ converges in probability to the matrix $P_0 m''(\theta,t^*)$. Writing $P_n m''(\theta,t^*)$ as $P_0 m''(\theta,t^*) + o_P(1)$, we obtain from (8.6)

$$-P_n m'(\theta,t^*) = \left(P_0 m''(\theta,t^*) + o_P(1)\right)(\widehat{t} - t^*). \tag{8.7}$$

Since $t^*$ is an interior maximizer of $t \mapsto P_0 m(\theta,t)$, we have $P_0 m'(\theta,t^*) = 0$, so that the central limit theorem (CLT) gives $\sqrt{n}\,P_n m'(\theta,t^*) = O_P(1)$, which by (8.7) implies that $\sqrt{n}(\widehat{t} - t^*) = O_P(1)$. Hence, from (8.7), we get

$$\sqrt{n}(\widehat{t} - t^*) = \left[-P_0 m''(\theta,t^*)\right]^{-1}\sqrt{n}\,P_n m'(\theta,t^*) + o_P(1). \tag{8.8}$$

The CLT concludes the proof of part (1). (2) Using the facts that $\widehat{t} - t^* = O_P(1/\sqrt{n})$ and $P_n m'(\theta,t^*) = P_0 m'(\theta,t^*) + o_P(1) = o_P(1)$, we obtain

$$\sqrt{n}\left(\widehat{D}_\phi(\mathcal{M}^\theta, P_0) - D_\phi(\mathcal{M}^\theta, P_0)\right) = \sqrt{n}\left(\widehat{D}_\phi(\mathcal{M}^\theta, P_0) - P_0 m(\theta,t^*)\right) = \sqrt{n}\left(P_n m(\theta,t^*) - P_0 m(\theta,t^*)\right) + o_P(1),$$

and the CLT concludes the proof.

Proof of Theorem 5.6. (1) First note that condition (d) implies that the function $t \in \Lambda(\theta) \mapsto E[m(X,\theta,t)]$ is strictly concave for all $\theta \in \Theta$. Hence, under condition (c), $t^*(\theta)$ is unique for all $\theta \in \Theta$. By the UWLLN, using the continuity of $m(X,\theta,t)$ in θ and t and condition (c), we obtain the uniform convergence in probability over the compact set $\{(\theta,t) \mid \theta \in \Theta,\ t \in N(\theta)\}$:

$$\sup_{\theta \in \Theta,\ t \in N(\theta)}\left|P_n m(\theta,t) - P_0 m(\theta,t)\right| \to 0. \tag{8.9}$$
