ASYMPTOTIC NORMALITY IN MIXTURE MODELS

SARA VAN DE GEER

Abstract. We study the estimation of a linear function $\theta_0 = \int a\,dF_0$ of a distribution $F_0$, using i.i.d. observations of the mixture $p_{F_0} = \int k(\cdot, y)\,dF_0(y)$. Let $\hat F_n$ be the maximum likelihood estimator of $F_0$ and $\hat\theta_n = \int a\,d\hat F_n$. We examine the asymptotic distribution of $\hat\theta_n$. A problem here is that usually $\hat F_n$ does not dominate $F_0$. Our main aim is to show that this can be overcome by considering the convex combination $\alpha\hat F_n + (1 - \alpha)F_0$, with $\alpha < 1$.

1. Introduction

Let $X_1, \ldots, X_n$ be independent identically distributed random variables on $(\mathcal{X}, \mathcal{A})$, with distribution $P$. Suppose that for some $\sigma$-finite measure $\mu$,

$$p = \frac{dP}{d\mu} \in \Big\{ p_F = \int k(\cdot, y)\,dF(y) : F \in \Lambda \Big\},$$

where $\Lambda$ is the class of all probability measures on a measurable space $(\mathcal{Y}, \mathcal{B})$, and where $k : \mathcal{X} \times \mathcal{Y} \to [0, \infty)$ is a given kernel with $\int k(x, y)\,d\mu(x) = 1$ for all $y \in \mathcal{Y}$. So

$$p = p_{F_0} = \int k(\cdot, y)\,dF_0(y)$$

for some $F_0 \in \Lambda$.

In this paper, we assume that $F_0$ is unknown, and that the maximum likelihood estimator $\hat F_n$ of $F_0$ exists. The latter is (not necessarily uniquely) defined by

$$\sum_{i=1}^n \log p_{\hat F_n}(X_i) = \max_{F \in \Lambda} \sum_{i=1}^n \log p_F(X_i).$$

Consider now functionals of the form

$$\theta(F) = \int a\,dF, \qquad F \in \Lambda,$$

with $a$ a given function on $(\mathcal{Y}, \mathcal{B})$. Write $\theta_0 = \int a\,dF_0$ and $\hat\theta_n = \int a\,d\hat F_n$. We shall investigate the asymptotic behaviour (and efficiency) of the estimator $\hat\theta_n$ of $\theta_0$. An important issue in this context is the appropriate differentiability of $\theta(F)$. Let us briefly sketch the main idea.

ESAIM: Probability and Statistics is an electronic journal with URL address http://www.emath.fr/ps.

Received by the journal 8 March 1995. Revised 2 June 1995. Accepted for publication 28 September 1995.

Consider a pair of random variables $(X, Y)$ with values in $\mathcal{X} \times \mathcal{Y}$, let $k(\cdot, y)$ be the density of $X$ given $Y = y$, and $F_0$ the distribution of $Y$. For a function $b : \mathcal{X} \to \mathbb{R}$, write

$$E(b(X) \mid Y = y) = A^* b(y).$$

Thus

$$A^* b(y) = \int k(x, y)\,b(x)\,d\mu(x).$$

Write for $h : \mathcal{Y} \to \mathbb{R}$,

$$E(h(Y) \mid X = x) = A_{F_0} h(x),$$

where

$$A_F h(x) = \frac{\int k(x, y)\,h(y)\,dF(y)}{p_F(x)}.$$

If for some $b$,

$$E(b(X) \mid Y = y) = a(y) \quad \text{for } F_0\text{-almost all } y, \tag{1.1}$$

then clearly

$$E(b(X)) = \theta_0.$$

So if (1.1) holds for some $b \in L_2(P)$ not depending on $F_0$, then $(1/n)\sum_{i=1}^n b(X_i)$ is a $\sqrt{n}$-consistent and asymptotically normal estimator of $\theta_0$. We say that $\theta(F)$ is differentiable at $F_0$ if a solution of (1.1) exists. Note that if $k(\cdot, y)$ is a complete family for $y$ in the support of $F_0$, there is at most one solution of (1.1). In general, there may also be several solutions, in which case we would like to take the one with the smallest variance. But such a solution possibly depends on $F_0$. The arguments below indicate that perhaps the maximum likelihood procedure automatically picks the best solution, with the estimator $\hat F_n$ plugged in for $F_0$.
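The role of a solution $b$ of (1.1) can be illustrated numerically. The following is only a minimal sketch under assumptions that are not part of the paper: the kernel is taken to be the $N(y, 1)$ density, $F_0$ is a hypothetical two-point distribution, and $a(y) = y$, so that $b(x) = x$ satisfies $E(b(X) \mid Y = y) = a(y)$. The sample mean of $b(X_i)$ should then be close to $\theta_0$, with an error of order $n^{-1/2}$.

```python
import numpy as np

def simulate_mixture(n, rng):
    """Draw X_i from the mixture p_{F_0}: Y ~ F_0, then X | Y = y ~ N(y, 1)."""
    # Hypothetical F_0 (an assumption): mass 0.3 at y = -1 and mass 0.7 at y = 2.
    y = rng.choice([-1.0, 2.0], size=n, p=[0.3, 0.7])
    return y + rng.standard_normal(n)

def b(x):
    """Solves (1.1) for a(y) = y, because E(X | Y = y) = y for this kernel."""
    return x

rng = np.random.default_rng(0)
theta_0 = 0.3 * (-1.0) + 0.7 * 2.0   # theta_0 = int a dF_0 for the assumed F_0
n = 200_000
x = simulate_mixture(n, rng)
theta_hat = b(x).mean()                    # (1/n) * sum_i b(X_i)
std_err = b(x).std(ddof=1) / np.sqrt(n)    # estimated standard error
print(f"theta_0 = {theta_0:.3f}, estimate = {theta_hat:.3f}, std. error = {std_err:.4f}")
```

In this toy setting $b$ does not depend on $F_0$, which is exactly the favourable case described above; the paper is concerned with situations where such an explicit $b$ is not available.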

We shall now discuss the solution with the smallest variance. First, we center the functions. Instead of $a$ in (1.1), we consider the gradient of $\theta(F)$, which is defined as

$$\dot\theta_F = a - \int a\,dF.$$

If for some $h_F \in L_\infty(F)$ with $\int h_F\,dF = 0$,

$$A^* A_F h_F = \dot\theta_F, \quad F\text{-a.s.}, \tag{1.2}$$

we call

$$b_F = A_F h_F \tag{1.3}$$

the efficient influence curve at $\theta(F)$. Note that $b_F$ is also centered now: $\int b_F\,dP_F = 0$. It follows from Van der Vaart (1991) that $(1/n)\int b_{F_0}^2\,dP$ is a lower bound for the asymptotic variance of an estimator of $\theta_0$. He considers parametric submodels with Hilbert space structure to arrive at results of this type in a very general context. In our case, the parametric submodel would be indexed by $\Lambda_F = \{F_t : (dF_t)^{1/2} = (1 + \tfrac12 t h_F)(dF)^{1/2},\ |t| \text{ small}\}$. For this reason, we call $h_F$ a direction (along which one can consider a submodel).

The assumptions $h_F \in L_\infty(F)$ and $\int h_F\,dF = 0$ ensure that indeed $\Lambda_F \subset \Lambda$. Suppose now that the efficient influence curves $b_{\hat F_n}$ and $b_{F_0}$ exist. So $b_{\hat F_n} = A_{\hat F_n} h_{\hat F_n}$ for some $h_{\hat F_n} \in L_\infty(\hat F_n)$, and

$$A^* b_{\hat F_n} = \dot\theta_{\hat F_n}, \quad \hat F_n\text{-a.s.} \tag{1.4}$$

Observe that $\hat F_n$ is an interior point of $\{\hat F_{n,t} : d\hat F_{n,t} = (1 + t h_{\hat F_n})\,d\hat F_n,\ |t| \text{ small}\}$. So we have

$$\frac{d}{dt} \sum_{i=1}^n \log p_{\hat F_{n,t}}(X_i) \Big|_{t=0} = 0,$$

or

$$\sum_{i=1}^n b_{\hat F_n}(X_i) = 0.$$

Write this as

$$\int b_{\hat F_n}\,dP_n = 0,$$

with $P_n$ the empirical distribution based on $X_1, \ldots, X_n$ (see (2.1)). Now, let us compare this with $\int b_{\hat F_n}\,dP$. Changing the order of integration gives

$$\int b_{\hat F_n}\,dP = \int A^* b_{\hat F_n}\,dF_0.$$

Moreover, if $\hat F_n$ dominates $F_0$, then (1.4) yields

$$\int A^* b_{\hat F_n}\,dF_0 = \int \dot\theta_{\hat F_n}\,dF_0 = \theta_0 - \hat\theta_n.$$

So then we have the identity of van der Laan (1993, 1994):

$$\hat\theta_n - \theta_0 = \int b_{\hat F_n}\,d(P_n - P). \tag{1.5}$$

Finally, if $b_{\hat F_n}$ converges to $b_{F_0}$ in an appropriate sense, one obtains asymptotic normality (and efficiency) of $\hat\theta_n$.

The problem is now that the assumption that $\hat F_n$ dominates $F_0$ is often not valid. Nevertheless, van der Laan (1993) presents some examples that show that the identity (1.5) can hold even when $\hat F_n$ does not dominate $F_0$. We shall however be concerned with the situation where the identity (1.5) is not necessarily true.

Our approach is to consider for each $0 \le \alpha < 1$ and $F \in \Lambda$, the convex combination

$$F_\alpha = \alpha F + (1 - \alpha) F_0.$$

Then indeed, $\hat F_{n,\alpha} = \alpha\hat F_n + (1 - \alpha)F_0$ dominates $F_0$. We shall obtain asymptotic normality of $\hat\theta_n$ by choosing $\alpha = \hat\alpha_n$ in such a way that it tends to one at an appropriate speed.

The paper is organized as follows. In Section 2, we derive a linear approximation for $\hat\theta_n$. Asymptotic normality and efficiency follow from this. The conditions we need to arrive at the result are consistency of the maximum likelihood estimator (obtained by separate means), differentiability of $\theta(F)$ at $F = F_\alpha$, $0 \le \alpha < 1$, bounded directions and suitable continuity conditions on the influence curves. A discussion of these conditions can be found in Section 3. Section 4 presents some examples.

2. Asymptotic normality

As pseudo-metric on $\Lambda$, we take the Hellinger distance between the mixtures:

$$d(F, \tilde F) = h(p_F, p_{\tilde F}) = \Big( \frac12 \int \big(\sqrt{p_F} - \sqrt{p_{\tilde F}}\big)^2\,d\mu \Big)^{1/2}, \qquad F, \tilde F \in \Lambda.$$

Consistency of $\hat F_n$ in this metric holds in fairly general situations. It is closely related to the further assumptions as stated in Condition 1 (see Section 3.1 for more details).
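To make the pseudo-metric concrete, the sketch below evaluates the Hellinger distance between two mixtures by numerical integration. It is only an illustration: the normal location kernel, the discrete mixing distributions and the integration grid are all assumptions chosen for the example, and the function names are hypothetical.

```python
import numpy as np

def normal_kernel(x, y):
    """k(x, y): density of N(y, 1) at x (an illustrative choice of kernel)."""
    return np.exp(-0.5 * (x - y) ** 2) / np.sqrt(2.0 * np.pi)

def mixture_density(x, support, weights):
    """p_F(x) = sum_j w_j k(x, y_j) for a discrete mixing distribution F."""
    x = np.asarray(x, dtype=float)[:, None]
    return normal_kernel(x, np.asarray(support)[None, :]) @ np.asarray(weights)

def hellinger(support_f, weights_f, support_g, weights_g, grid):
    """d(F, G) = (1/2 int (sqrt(p_F) - sqrt(p_G))^2 dmu)^(1/2), trapezoidal rule on `grid`."""
    p_f = mixture_density(grid, support_f, weights_f)
    p_g = mixture_density(grid, support_g, weights_g)
    integrand = (np.sqrt(p_f) - np.sqrt(p_g)) ** 2
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid))
    return np.sqrt(0.5 * integral)

grid = np.linspace(-12.0, 12.0, 4001)   # mu is Lebesgue measure, truncated to [-12, 12]
d_same = hellinger([0.0, 1.0], [0.5, 0.5], [0.0, 1.0], [0.5, 0.5], grid)
d_near = hellinger([0.0, 1.0], [0.5, 0.5], [0.0, 1.1], [0.5, 0.5], grid)
d_far = hellinger([0.0, 1.0], [0.5, 0.5], [5.0, 6.0], [0.5, 0.5], grid)
print(d_same, d_near, d_far)
```

With the factor $1/2$ in the definition, the distance always lies between 0 and 1; mixing distributions that are close give mixtures that are close in this metric, while the converse depends on identifiability.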

We use the notation

$$P_n = \frac1n \sum_{i=1}^n \delta_{X_i}, \tag{2.1}$$

i.e. $P_n$ is the empirical distribution that puts mass $1/n$ at each of the observations $X_1, \ldots, X_n$. Moreover, we shall make frequent use of stochastic order symbols. If $\{Z_n\}$ is a sequence of real-valued random variables, and $\{k_n\}$ a sequence of positive numbers, then we say that $Z_n = O_P(k_n)$ if

$$\lim_{M \to \infty} \limsup_{n \to \infty} P(|Z_n| > k_n M) = 0.$$

Similarly, $Z_n = o_P(k_n)$ means that for all $\eta > 0$,

$$\lim_{n \to \infty} P(|Z_n| > k_n \eta) = 0.$$

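Condition 1 (consistency and rates). The estimators $\hat F_n$ and $\hat\theta_n$ are consistent, i.e.

$$d(\hat F_n, F_0) = o_P(1) \tag{2.2}$$

and

$$|\hat\theta_n - \theta_0| = o_P(1). \tag{2.3}$$

Moreover, for some $\tau_n^2 = o(n^{-1/2})$,

$$\int \log\Big( \frac{2 p_{\hat F_n}}{p_{\hat F_n} + p_{F_0}} \Big)\,dP_n = O_P(\tau_n^2). \tag{2.4}$$

Condition 2 (differentiability in a neighbourhood of $F_0$ and existence of efficient influence curves). For some $\delta > 0$, and for all $0 \le \alpha < 1$ and $F \in \Lambda$ with $d(F, F_0) \le \delta$, we have

$$A^* b_{F_\alpha} = \dot\theta_{F_\alpha}, \quad F_\alpha\text{-a.s.}, \tag{2.5}$$

for $b_{F_\alpha}$ satisfying

$$b_{F_\alpha}(x) = A_{F_\alpha} h_{F_\alpha}(x), \qquad p_{F_\alpha}(x) > 0, \tag{2.6}$$

for some $h_{F_\alpha} \in L_\infty(F_\alpha)$, with $\int h_{F_\alpha}\,dF_\alpha = 0$.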

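Condition 3 (control on the direction $h_{F_\alpha}$). For some $\delta > 0$ and $M < \infty$,

$$\sup_{d(F, F_0) \le \delta}\ \sup_{0 \le \alpha < 1}\ \sup_{y \in \mathrm{support}(F_\alpha)} |(1 - \alpha)\,h_{F_\alpha}(y)| \le M. \tag{2.7}$$

Condition 4 (control on the efficient influence curves $b_{F_\alpha}$). The information for estimating $\theta_0$ is positive:

$$\int b_{F_0}^2\,dP > 0. \tag{2.8}$$

Moreover, the influence curves are uniformly bounded: for some $\delta > 0$,

$$\sup_{d(F, F_0) \le \delta}\ \sup_{0 \le \alpha < 1}\ \sup_{p_{F_\alpha}(x) > 0} |b_{F_\alpha}(x)| < \infty. \tag{2.9}$$

Finally, for $0 < \delta_n \to 0$,

$$\sup_{d(F, F_0) \le \delta_n}\ \sup_{0 \le \alpha < 1} \Big| \int b_{F_\alpha}^2\,dP_n - \int b_{F_0}^2\,dP \Big| = o_P(1) \tag{2.10}$$

and

$$\sup_{d(F, F_0) \le \delta_n}\ \sup_{0 \le \alpha < 1} \Big| \int (b_{F_\alpha} - b_{F_0})\,d(P_n - P) \Big| = o_P(n^{-1/2}). \tag{2.11}$$

Theorem 2.1. Assume that conditions 1-4 are met. Then

$$\hat\theta_n - \theta_0 = \int b_{F_0}\,d(P_n - P) + o_P(n^{-1/2}). \tag{2.12}$$

Proof. By (2.4), we can find a non-random increasing function $\phi : (0, \infty) \to (0, \infty)$, satisfying

$$\phi(x) \to 0 \quad \text{for } x \to 0, \tag{2.13}$$

$$\frac{x}{\phi(x)} \le \frac12 \quad \text{for all } x > 0, \tag{2.14}$$

and

$$\frac{x}{\phi(x)} \to 0 \quad \text{for } x \to 0, \tag{2.15}$$

such that $\tau_n^2 = o(n^{-1/2})\,\phi(n^{-1/2})$. This gives

$$\int \log\Big( \frac{2 p_{\hat F_n}}{p_{\hat F_n} + p_{F_0}} \Big)\,dP_n = o_P(n^{-1/2})\,\phi(n^{-1/2}). \tag{2.16}$$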

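Choose

$$1 - \hat\alpha_n = \frac{|\hat\theta_n - \theta_0| + n^{-1/2}}{\phi(|\hat\theta_n - \theta_0| + n^{-1/2})}. \tag{2.17}$$

Then $\hat\alpha_n < 1$ and by (2.14), $\hat\alpha_n \ge 1/2$. Since $|\hat\theta_n - \theta_0| = o_P(1)$, (2.15) implies that $(1 - \hat\alpha_n) = o_P(1)$. Furthermore, (2.13) yields

$$\frac{|\hat\theta_n - \theta_0|}{1 - \hat\alpha_n} = o_P(1) \tag{2.18}$$

as well as

$$\frac{n^{-1/2}}{1 - \hat\alpha_n} = o_P(1). \tag{2.19}$$

Define

$$\tilde F_n = \hat\alpha_n \hat F_n + (1 - \hat\alpha_n) F_0. \tag{2.20}$$

Let $\delta > 0$ be small enough, so that (2.5), (2.6), (2.7) and (2.9) in conditions 2-4 are fulfilled for this value of $\delta$, and let $D_n$ be the set

$$D_n = \{ d(\hat F_n, F_0) \le \delta \}.$$

Because $\hat\alpha_n < 1$, we can find on $D_n$ an influence curve $b_{\tilde F_n}$ and a direction $h_{\tilde F_n}$, such that

$$A^* b_{\tilde F_n} = \dot\theta_{\tilde F_n}$$

and

$$b_{\tilde F_n}(x) = A_{\tilde F_n} h_{\tilde F_n}(x), \qquad x \in \{p_{\tilde F_n} > 0\}.$$

We can write

$$\int b_{\tilde F_n}\,dP_n\,\mathbf{1}\{D_n\} = \int b_{\tilde F_n}\,d(P_n - P)\,\mathbf{1}\{D_n\} + \int b_{\tilde F_n}\,dP\,\mathbf{1}\{D_n\}. \tag{2.21}$$

Because $d(\hat F_n, F_0) = o_P(1)$, we have that $\mathbf{1}\{D_n\} = 1 + o_P(1)$. So, using (2.11),

$$\int b_{\tilde F_n}\,d(P_n - P)\,\mathbf{1}\{D_n\} = \int b_{F_0}\,d(P_n - P) + o_P(n^{-1/2}) = O_P(n^{-1/2}). \tag{2.22}$$

Since $\tilde F_n$ dominates $F_0$, and $\hat\alpha_n = 1 + o_P(1)$,

$$\int b_{\tilde F_n}\,dP\,\mathbf{1}\{D_n\} = -\hat\alpha_n(\hat\theta_n - \theta_0)\,\mathbf{1}\{D_n\} = -(1 + o_P(1))(\hat\theta_n - \theta_0). \tag{2.23}$$

Insert (2.22) and (2.23) into (2.21) to get that

$$\int b_{\tilde F_n}\,dP_n\,\mathbf{1}\{D_n\} = \int b_{F_0}\,d(P_n - P) + o_P(n^{-1/2}) - (1 + o_P(1))(\hat\theta_n - \theta_0). \tag{2.24}$$

Let

$$\hat t_n = \frac{\int b_{\tilde F_n}\,dP_n}{(1 - \hat\alpha_n) \int b_{F_0}^2\,dP}\,\mathbf{1}\{D_n\}. \tag{2.25}$$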

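Equality (2.24), together with (2.8), (2.18) and (2.19), implies that

$$\hat t_n = o_P(1). \tag{2.26}$$

Take $M$ as in (2.7) and $E_n = D_n \cap \{|\hat t_n| < 1/M\}$. Define on $E_n$,

$$d\tilde F_n(\hat t_n) = \big(1 + \hat t_n (1 - \hat\alpha_n)\,h_{\tilde F_n}\big)\,d\tilde F_n.$$

Then on $E_n$,

$$\tilde F_n(\hat t_n) \in \Lambda. \tag{2.27}$$

Moreover, from (2.26) we know that $\mathbf{1}\{E_n\} = 1 + o_P(1)$, so that by (2.9) and (2.10),

$$\begin{aligned}
\int \log\Big(\frac{p_{\tilde F_n(\hat t_n)}}{p_{\tilde F_n}}\Big)\,dP_n\,\mathbf{1}\{E_n\}
&= \int \log\big(1 + \hat t_n(1 - \hat\alpha_n)\,b_{\tilde F_n}\big)\,dP_n\,\mathbf{1}\{E_n\} \\
&= \hat t_n(1 - \hat\alpha_n)\int b_{\tilde F_n}\,dP_n\,\mathbf{1}\{E_n\} - \frac12\,\hat t_n^2 (1 - \hat\alpha_n)^2 \int b_{\tilde F_n}^2\,dP_n\,\mathbf{1}\{E_n\}\,(1 + o_P(1)) \\
&= \frac12\,\frac{\big(\int b_{\tilde F_n}\,dP_n\big)^2}{\int b_{F_0}^2\,dP}\,\mathbf{1}\{E_n\}\,(1 + o_P(1)).
\end{aligned} \tag{2.28}$$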

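Since $\hat F_n$ maximizes the likelihood, and (2.27) holds on $E_n$, we have

$$\int \log p_{\hat F_n}\,dP_n\,\mathbf{1}\{E_n\} \ge \int \log p_{\tilde F_n(\hat t_n)}\,dP_n\,\mathbf{1}\{E_n\}. \tag{2.29}$$

Furthermore, the concavity of the log-function and the fact that $\hat\alpha_n \ge 1/2$ yield

$$\int \log p_{\tilde F_n}\,dP_n \ge (2\hat\alpha_n - 1)\int \log p_{\hat F_n}\,dP_n + 2(1 - \hat\alpha_n)\int \log\Big(\frac{p_{\hat F_n} + p_{F_0}}{2}\Big)\,dP_n. \tag{2.30}$$

Combine (2.28), (2.29) and (2.30) to find

$$2(1 - \hat\alpha_n)\int \log\Big(\frac{2 p_{\hat F_n}}{p_{\hat F_n} + p_{F_0}}\Big)\,dP_n \ge \frac12\,\frac{\big(\int b_{\tilde F_n}\,dP_n\big)^2}{\int b_{F_0}^2\,dP}\,\mathbf{1}\{E_n\}\,(1 + o_P(1)). \tag{2.31}$$

From (2.16) and (2.17), we know that the left-hand side of this inequality is

$$(1 - \hat\alpha_n)\,o_P(n^{-1/2})\,\phi(n^{-1/2}) = \big(|\hat\theta_n - \theta_0| + n^{-1/2}\big)\,\frac{\phi(n^{-1/2})}{\phi(|\hat\theta_n - \theta_0| + n^{-1/2})}\,o_P(n^{-1/2}) = \big(|\hat\theta_n - \theta_0| + n^{-1/2}\big)\,o_P(n^{-1/2}),$$

where in the last step, we used that $\phi$ is increasing. In view of (2.24), the right-hand side of (2.31) is of the form

$$\Big(O_P(n^{-1/2}) - (1 + o_P(1))(\hat\theta_n - \theta_0)\Big)^2\,\frac{1 + o_P(1)}{\int b_{F_0}^2\,dP}.$$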

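So we find from (2.31) that

$$|\hat\theta_n - \theta_0|^2 \le \max\Big\{ O_P(n^{-1}),\ \big(|\hat\theta_n - \theta_0| + n^{-1/2}\big)\,o_P(n^{-1/2}) \Big\},$$

which implies $|\hat\theta_n - \theta_0| = O_P(n^{-1/2})$. But then, the left-hand side of (2.31) is $o_P(n^{-1})$, so that it reads

$$\Big( \int b_{F_0}\,d(P_n - P) + o_P(n^{-1/2}) - (1 + o_P(1))(\hat\theta_n - \theta_0) \Big)^2 (1 + o_P(1)) = o_P(n^{-1}).$$

In other words,

$$\hat\theta_n - \theta_0 = \int b_{F_0}\,d(P_n - P) + o_P(n^{-1/2}). \qquad \square$$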

3. Comments on conditions 1-4

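Conditions 1 and 4 can be verified using the concept of entropy. Therefore, we introduce the following definitions. Let $Q$ be a probability measure on $(\mathcal{X}, \mathcal{A})$ and $\mathcal{G} \subset L_q(Q)$, $q \ge 1$.

Definition 1. The $\epsilon$-covering number $N_q(\epsilon, \mathcal{G}, Q)$ of $\mathcal{G}$ is defined as the number of balls with radius $\epsilon$ necessary to cover $\mathcal{G}$. More precisely, let $\{g_j\}_{j=1}^m$ be such that for each $g \in \mathcal{G}$ there is a $j \in \{1, \ldots, m\}$ such that

$$\int |g - g_j|^q\,dQ \le \epsilon^q. \tag{3.1}$$

Then $N_q(\epsilon, \mathcal{G}, Q)$ is the smallest $m$ for which such a collection $\{g_j\}_{j=1}^m$ exists. The $\epsilon$-entropy of $\mathcal{G}$ is $H_q(\epsilon, \mathcal{G}, Q) = \log N_q(\epsilon, \mathcal{G}, Q) \vee 1$.

Definition 2. Let $\{[g_j^L, g_j^U]\}_{j=1}^m$ be such that for all $g \in \mathcal{G}$ there is a $j \in \{1, \ldots, m\}$ such that

$$g_j^L \le g \le g_j^U \tag{3.2}$$

and

$$\int (g_j^U - g_j^L)^q\,dQ \le \epsilon^q. \tag{3.3}$$

Write $N_q^B(\epsilon, \mathcal{G}, Q)$ for the smallest $m$ for which such a collection $\{[g_j^L, g_j^U]\}_{j=1}^m$ exists. Then $H_q^B(\epsilon, \mathcal{G}, Q) = \log N_q^B(\epsilon, \mathcal{G}, Q) \vee 1$ is called the $\epsilon$-entropy with bracketing.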

3.1. On Condition 1

Suppose that $\mathcal{Y}$ is a locally compact Hausdorff space with countable base, and that $\mathcal{B}$ is the Borel $\sigma$-algebra. Let $C_0$ be the class of all functions $c : \mathcal{Y} \to \mathbb{R}$ that vanish at infinity. If $k(x, \cdot) \in C_0$ for $\mu$-almost all $x \in \mathcal{X}$, then $d(\hat F_n, F_0) \to 0$ almost surely (see Pfanzagl (1988)).

Denote the class of all measures $F$ on $(\mathcal{Y}, \mathcal{B})$ with $F(\mathcal{Y}) \le 1$ by $\bar\Lambda$. The vague topology on $\bar\Lambda$ is the smallest topology such that $F \mapsto \int c\,dF$ is continuous for every $c \in C_0$. Let $\rho$ be the metric corresponding to the vague topology. We say that $F_0$ is identifiable (for the metric $\rho$) if for all $F \in \bar\Lambda$, $d(F, F_0) = 0$ implies $\rho(F, F_0) = 0$. If $F_0$ is identifiable, then $d(\hat F_n, F_0) \to 0$ almost surely implies $\rho(\hat F_n, F_0) \to 0$ almost surely. So in particular, then $|\hat\theta_n - \theta_0| \to 0$ almost surely, whenever $a \in C_0$. More details can be found in e.g. Pfanzagl (1988) or Van de Geer (1993a).

Let us now investigate the rate of convergence for the log-likelihood ratio. Note first of all that

$$0 \le \int \log\Big(\frac{2 p_{\hat F_n}}{p_{\hat F_n} + p_{F_0}}\Big)\,dP_n \le \frac12 \int \log\Big(\frac{p_{\hat F_n}}{p_{F_0}}\Big)\,dP_n,$$

so that nothing is lost (and indeed something could be gained) by comparing $p_{\hat F_n}$ with the convex combination $(p_{\hat F_n} + p_{F_0})/2$ instead of with $p_{F_0}$.

Lemma 3.1. Suppose that

$$\int_0^1 \sqrt{H_2^B(\epsilon, \mathcal{G}, P)}\,d\epsilon < \infty, \tag{3.4}$$

where

$$\mathcal{G} = \Big\{ \sqrt{\frac{p_F + p_{F_0}}{p_{F_0}}} : F \in \Lambda \Big\}.$$

Then

$$\int \log\Big(\frac{p_{\hat F_n}}{p_{F_0}}\Big)\,dP_n = O_P(\tau_n^2)$$

with $\tau_n^2 = o(n^{-1/2})$.

Proof. See Van de Geer (1995). A slight modification can be found in Wong and Shen (1992). $\square$

We call the left-hand side of (3.4) the entropy integral. If the entropy integral diverges, suboptimal rates can emerge (see Birgé and Massart (1993)).

Lemma 3.2 below makes use of the special structure of the mixing model. We need the following notation: for $\sigma \ge 0$,

$$\omega_1^2(\sigma) = \int_{p_{F_0} \le \sigma} p_{F_0}\,d\mu$$

and

$$\omega_2^2(\sigma) = \int_{p_{F_0} > \sigma} \frac{1}{p_{F_0}}\,d\mu.$$

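Lemma 3.2. Let $\mathcal{K} = \{k(\cdot, y) : y \in \mathcal{Y}\}$. Suppose that the functions in $\mathcal{K}$ are uniformly bounded, and that for all probability measures $Q$ with finite support,

$$N_2(\epsilon, \mathcal{K}, Q) \le A\,\epsilon^{-w}, \qquad \epsilon > 0, \tag{3.5}$$

where the constants $A$ and $w$ do not depend on $Q$. Let $0 \le \sigma_n \to 0$, $\tau_n \ge \omega_1(\sigma_n) \vee n^{-(2+w)/(4+4w)}\,(\omega_2(\sigma_n))^{w/(2+2w)}$. Then

$$\int \log\Big(\frac{2 p_{\hat F_n}}{p_{\hat F_n} + p_{F_0}}\Big)\,dP_n = O_P(\tau_n^2). \tag{3.6}$$

Proof. See Van de Geer (1993b). $\square$

Lemma 3.2 may not yield the optimal rate, but all we need here is a rate faster than $n^{-1/2}$. This is the case if $\omega_2(\sigma_n) = o(n^{1/(2w)})$. To put it differently, if we define

$$\omega_1^{-1}(\tau) = \sup\{\sigma : \omega_1(\sigma) \le \tau^2\},$$

then $\tau_n^2 = o(n^{-1/2})$ if

$$\omega_2(\omega_1^{-1}(\tau)) = o(\tau^{-2/w}) \quad \text{for } \tau \downarrow 0.$$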

3.2. On Condition 2

It is natural to require (2.5) and (2.6) for $\alpha = 0$. The fact that we need these equalities for all $0 \le \alpha < 1$ is closely related to being able to estimate the efficient influence curve. However, it should be noted that we only assume $b_F$ to exist for certain $F$ that dominate $F_0$.

In some applications, $b_{\hat F_n}$ does exist, but this usually will not help to simplify the proofs (see Section 4 for an example).

3.3. On Condition 3

Here, we assume that $h_{F_\alpha}$ behaves like $1/(1 - \alpha)$. Clearly, this allows misbehaviour for $\alpha \to 1$, but it also requires $h_{F_\alpha}$ to be bounded. In some applications, this reduces to assuming that $h_{F_0}$ is bounded. The proof of Theorem 2.1 reveals that we need that for all $t$ sufficiently small, $dF_\alpha(t)/dF_\alpha = 1 + t(1 - \alpha)\,h_{F_\alpha}$ exists and is non-negative, i.e., that $F_\alpha(t) \in \Lambda$. If for some function $g$, $dg/dF_0$ exists and is bounded, say by $C$, then also $dg/dF_\alpha$ exists and $(1 - \alpha)\,dg/dF_\alpha$ is bounded by $C$. This is the reason why in Examples 4.1 and 4.3 our approach works. But it fails in Example 4.2!

3.4. On Condition 4

Let us present a brief overview of some results from empirical process theory that can be applied in this context. We cite them from Pollard (1984) and Ossiander (1987). Throughout, we assume that the necessary measurability conditions are satisfied.

Consider a class $\mathcal{G}$ of functions on $(\mathcal{X}, \mathcal{A})$, with envelope

$$G = \sup_{g \in \mathcal{G}} |g|.$$

The class $\mathcal{G}$ is called a Glivenko-Cantelli class if

$$\sup_{g \in \mathcal{G}} \Big| \int g\,d(P_n - P) \Big| \to 0 \quad \text{almost surely}.$$

It is called a Donsker class if $\sqrt{n} \int g\,d(P_n - P)$ converges in distribution to a mean zero Gaussian process on $\mathcal{G}$. This limiting process is assumed to have continuous sample paths with respect to $\rho_P(\cdot, \cdot)$, with

$$\rho_P(g_1, g_2)^2 = \int (g_1 - g_2)^2\,dP - \Big( \int (g_1 - g_2)\,dP \Big)^2.$$

Necessary and sufficient conditions for $\mathcal{G}$ to be a Glivenko-Cantelli class are:

$$G \in L_1(P)$$

and

$$H_1(\epsilon, \mathcal{G}, P_n) = o_P(n), \qquad \epsilon > 0.$$

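If $\mathcal{G}$ is a Donsker class, then the asymptotic equicontinuity condition holds, i.e. for all $\eta > 0$ there exists a $\delta > 0$ such that

$$\limsup_{n \to \infty} P\Big( \sup_{g \in \mathcal{G}_\delta} \Big| \sqrt{n} \int g\,d(P_n - P) \Big| > \eta \Big) < \eta, \tag{3.7}$$

with $\mathcal{G}_\delta = \{(g_1 - g_2) : g_1, g_2 \in \mathcal{G},\ \rho_P(g_1, g_2) \le \delta\}$. Sufficient conditions for $\mathcal{G}$ to be a Donsker class are:

$$G \in L_2(P)$$

and

$$\int_0^1 \sqrt{H_2^B(\epsilon, \mathcal{G}, P)}\,d\epsilon < \infty. \tag{3.8}$$

Condition (3.8) may be replaced by

$$\int_0^1 \sqrt{H(\epsilon, \mathcal{G})}\,d\epsilon < \infty, \tag{3.9}$$

where

$$H(\epsilon, \mathcal{G}) = \sup_Q H_2\Big( \epsilon \Big( \int G^2\,dQ \Big)^{1/2}, \mathcal{G}, Q \Big), \tag{3.10}$$

and where the supremum is taken over all probability measures $Q$ with finite support.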

Suppose now that (2.9) is met, and that for $0 < \delta_n \to 0$,

$$\sup_{d(F, F_0) \le \delta_n}\ \sup_{0 \le \alpha < 1} \int (b_{F_\alpha} - b_{F_0})^2\,dP \to 0. \tag{3.11}$$

Then it is clear from the above that (2.10) and (2.11) are fulfilled if one of the entropy conditions (3.8) or (3.9) holds, with

$$\mathcal{G} = \{ b_{F_\alpha} : d(F, F_0) \le \delta,\ 0 \le \alpha < 1 \}.$$

In many applications however, no explicit expressions are available for the efficient influence curves, so that it may still be difficult to check the entropy conditions.

4. Examples

Example 4.1: interval censored observations, case I

Suppose one observes $X_i = (T_i, \Delta_i)$, where $\Delta_i = \mathbf{1}\{Y_i \le T_i\}$, $Y_i$ and $T_i$ are independent random variables, both with values in a bounded interval, say $(0, 1]$, and where $T_i$ has (unknown) distribution $G$, $i = 1, \ldots, n$. This is one of the models studied in Groeneboom and Wellner (1992). The density of $X_i$ with respect to $G \times \nu$, $\nu$ being the counting measure on $\{0, 1\}$, is

$$p_{F_0}(t, \Delta) = \Delta\,F_0(t) + (1 - \Delta)(1 - F_0(t)) = \int k(t, \Delta, y)\,dF_0(y),$$

with $k(t, \Delta, y) = \Delta\,\mathbf{1}\{y \le t\} + (1 - \Delta)\,\mathbf{1}\{y > t\}$.
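The model above is easy to simulate, which may help to fix ideas. The sketch below is illustrative only and rests on assumptions not made in the text: both $F_0$ and $G$ are taken to be uniform on $(0, 1]$, and the maximum likelihood estimator is computed through the standard characterization (discussed in Groeneboom and Wellner (1992), not derived here) that its values at the ordered observation times are the isotonic regression of the indicators, obtained with the pool-adjacent-violators algorithm. The function name `pava` is hypothetical.

```python
import numpy as np

def pava(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values` (equal weights)."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

rng = np.random.default_rng(2)
n = 5000
y = rng.uniform(0.0, 1.0, n)          # unobserved Y_i ~ F_0 (assumed uniform)
t = rng.uniform(0.0, 1.0, n)          # observation times T_i ~ G (assumed uniform)
delta = (y <= t).astype(float)        # Delta_i = 1{Y_i <= T_i}

order = np.argsort(t)
t_sorted = t[order]
F_hat = pava(delta[order])            # values of the MLE at the ordered T_i

# Under the assumed F_0(t) = t, compare the estimate with the truth.
mean_abs_error = np.mean(np.abs(F_hat - t_sorted))
print(f"mean |F_hat - F_0| at the observation times: {mean_abs_error:.3f}")
```

The fitted values form a step function with far fewer jumps than observations, which is one way to see why $\hat F_n$ cannot dominate a continuous $F_0$.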

Lemma 4.1. Suppose $G$ has density $g$ with respect to Lebesgue measure. Assume that $\dot a(y) = da(y)/dy$ exists and

$$\Big| \frac{\dot a(t)}{g(t)} \Big| \le C_1 < \infty, \qquad t \in (0, 1], \tag{4.1}$$

and

$$\Big| \frac{d(\dot a(t)/g(t))}{dF_0(t)} \Big| \le C_2 < \infty, \qquad t \in (0, 1]. \tag{4.2}$$

Then

$$\hat\theta_n - \theta_0 = \int b_{F_0}\,d(P_n - P) + o_P(n^{-1/2}),$$

where

$$b_{F_0}(t, \Delta) = -\Delta\,(1 - F_0(t))\,\frac{\dot a(t)}{g(t)} + (1 - \Delta)\,F_0(t)\,\frac{\dot a(t)}{g(t)}, \qquad t \in (0, 1],\ \Delta \in \{0, 1\}.$$

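Proof. Condition 1 is met: $d(\hat F_n, F_0) \to 0$ almost surely, $|\hat\theta_n - \theta_0| \to 0$ almost surely and

$$\int \log\Big(\frac{p_{\hat F_n}}{p_{F_0}}\Big)\,dP_n = O_P(n^{-2/3}).$$

This follows from the theory in Subsection 3.1 (see also Groeneboom and Wellner (1992) and Van de Geer (1993a)). Define

$$b_{F_\alpha}(t, \Delta) = -\Delta\,(1 - F_\alpha(t))\,\frac{\dot a(t)}{g(t)} + (1 - \Delta)\,F_\alpha(t)\,\frac{\dot a(t)}{g(t)}, \qquad t \in (0, 1],\ \Delta \in \{0, 1\}.$$

Then

$$\begin{aligned}
A^* b_{F_\alpha}(y) &= -\int_{y-}^1 (1 - F_\alpha(t))\,\dot a(t)\,dt + \int_0^{y-} F_\alpha(t)\,\dot a(t)\,dt \\
&= a(y) - a(0) - \int_0^1 (1 - F_\alpha(t-))\,\dot a(t)\,dt \\
&= \dot\theta_{F_\alpha}(y), \qquad y \in (0, 1].
\end{aligned}$$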

Because $F_\alpha$ dominates $F_0$, $dF_0/dF_\alpha$ exists. So

$$\begin{aligned}
h_{F_\alpha}(y) &= -\frac{d\big(F_\alpha(y)(1 - F_\alpha(y))\,\dot a(y)/g(y)\big)}{dF_\alpha(y)} \\
&= (2F_\alpha(y) - 1)\,\frac{\dot a(y)}{g(y)} - F_\alpha(y-)(1 - F_\alpha(y-))\,\frac{d(\dot a(y)/g(y))}{dF_0(y)}\,\frac{dF_0(y)}{dF_\alpha(y)}
\end{aligned}$$

exists too. Moreover, for $p_{F_\alpha}(t, \Delta) > 0$,

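$$A_{F_\alpha} h_{F_\alpha}(t, \Delta) = \Delta\,\frac{\int_0^t h_{F_\alpha}\,dF_\alpha}{F_\alpha(t)} + (1 - \Delta)\,\frac{\int_t^1 h_{F_\alpha}\,dF_\alpha}{1 - F_\alpha(t)} = b_{F_\alpha}(t, \Delta).$$

Thus, Condition 2 is fulfilled.

Clearly,

$$|h_{F_\alpha}(y)| \le C_1 + \frac{C_2}{1 - \alpha}.$$

Therefore, Condition 3 holds too.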

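Finally, we shall verify Condition 4. Note first that

$$\int b_{F_0}^2\,dP = \int F_0(t)(1 - F_0(t))\,\frac{(\dot a(t))^2}{g(t)}\,dt > 0$$

and

$$|b_{F_\alpha}(t, \Delta)| \le C_1.$$

To check (2.10) and (2.11), we use the theory of Subsection 3.4. We have

$$d^2(F_\alpha, F_0) = \frac12 \Big( \int \big(\sqrt{F_\alpha} - \sqrt{F_0}\big)^2\,dG + \int \big(\sqrt{1 - F_\alpha} - \sqrt{1 - F_0}\big)^2\,dG \Big)$$

and, since $(F_\alpha - F_0)^2 \le 2\big[\tfrac12(\sqrt{F_\alpha} - \sqrt{F_0})^2 + \tfrac12(\sqrt{1 - F_\alpha} - \sqrt{1 - F_0})^2\big]$ pointwise,

$$\int (b_{F_\alpha} - b_{F_0})^2\,dP = \int (F_\alpha(t) - F_0(t))^2\,\frac{(\dot a(t))^2}{g(t)}\,dt \le 2\,C_1^2\,d^2(F_\alpha, F_0).$$

So indeed, (3.11) is satisfied: for $0 < \delta_n \to 0$,

$$\sup_{d(F, F_0) \le \delta_n}\ \sup_{0 \le \alpha < 1} \int (b_{F_\alpha} - b_{F_0})^2\,dP \to 0.$$

Also (3.8) holds, namely for $\mathcal{G} = \{b_{F_\alpha} : d(F, F_0) \le \delta,\ 0 \le \alpha < 1\}$, we have for some constant $A$,

$$H_2^B(\epsilon, \mathcal{G}, P) \le A\,\frac{1}{\epsilon}, \qquad \epsilon > 0.$$

This follows from entropy calculations for monotone functions (see Birman and Solomjak (1967) and, for the extension to entropy with bracketing, Van de Geer (1991)). Therefore, (2.10) and (2.11) are fulfilled. In view of Theorem 2.1, this completes the proof. $\square$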
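The variance expression used in the proof can be checked by simulation. The following sketch makes simplifying assumptions that are not part of the lemma: $F_0$ and $G$ are both uniform on $(0, 1]$ and $a(y) = y$, so that $\dot a / g \equiv 1$ and $\int F_0 (1 - F_0)\,\dot a^2 / g\,dt = \int_0^1 t(1 - t)\,dt = 1/6$. The helper name `influence_curve` is hypothetical.

```python
import numpy as np

def influence_curve(t, delta, F0, a_dot, g):
    """b_{F_0}(t, delta) as given in Lemma 4.1."""
    ratio = a_dot(t) / g(t)
    return -delta * (1.0 - F0(t)) * ratio + (1.0 - delta) * F0(t) * ratio

# Assumed toy model: F_0 and G uniform, a(y) = y.
F0 = lambda t: t
a_dot = lambda t: np.ones_like(t)
g = lambda t: np.ones_like(t)

rng = np.random.default_rng(1)
n = 400_000
t = rng.uniform(0.0, 1.0, n)       # T_i ~ G
y = rng.uniform(0.0, 1.0, n)       # Y_i ~ F_0
delta = (y <= t).astype(float)     # Delta_i = 1{Y_i <= T_i}

b = influence_curve(t, delta, F0, a_dot, g)
mc_mean = b.mean()                 # should be near 0: b_{F_0} is centered
mc_second_moment = np.mean(b ** 2) # should be near int b_{F_0}^2 dP
closed_form = 1.0 / 6.0            # int_0^1 t (1 - t) dt under the assumptions above
print(mc_mean, mc_second_moment, closed_form)
```

Under these assumptions the asymptotic variance of $\sqrt{n}(\hat\theta_n - \theta_0)$ suggested by the lemma is $1/6$; other choices of $F_0$, $G$ and $a$ only require changing the three lambdas.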

The result of Lemma 4.1 was established earlier in Groeneboom and Wellner (1992). They use specific properties of the maximum likelihood estimator $\hat F_n$, such as the distance between successive jumps of $\hat F_n$ being smaller than $n^{-1/3}\log n$ with large probability. In this sense, local properties of $\hat F_n$ had to be obtained first, before one could arrive at the asymptotic behaviour of such global quantities as the mean of the maximum likelihood estimator. It inspired us to develop an alternative proof, which is hopefully applicable in more general situations.

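The model for interval censored observations gives a good insight into the difficulties that arise due to the fact that $\hat F_n$ does not dominate $F_0$. Let us have a closer look for the case $a(y) = y$. It is known that $\hat F_n$ has finite support, say $\hat z_1 < \cdots < \hat z_m$. Define

$$\hat g(t) = \frac{G(\hat z_j) - G(\hat z_{j-1})}{\hat z_j - \hat z_{j-1}}, \qquad t \in (\hat z_{j-1}, \hat z_j],$$

and

$$b_{\hat F_n}(t, \Delta) = -\Delta\,\frac{1 - \hat F_n(t)}{\hat g(t)} + (1 - \Delta)\,\frac{\hat F_n(t)}{\hat g(t)}.$$

Then for $y \in \{\hat z_1, \ldots, \hat z_m\}$,

$$A^* b_{\hat F_n}(y) = y - \hat\theta_n = \dot\theta_{\hat F_n}(y).$$

However, integrating $A^* b_{\hat F_n}$ over all $y$ against $F_0$ gives

$$\int A^* b_{\hat F_n}\,dF_0 = -(\hat\theta_n - \tilde\theta_n),$$

with

$$\tilde\theta_n = \int \int_0^y \frac{1}{\hat g(t)}\,dG(t)\,dF_0(y).$$

In general, $\tilde\theta_n \ne \theta_0$, unless $G$ happens to be the uniform distribution on $(0, 1]$.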

Write

$$\begin{aligned}
h_{\hat F_n}(y) &= -\frac{d\big(\hat F_n(y)(1 - \hat F_n(y))/\hat g(y)\big)}{d\hat F_n(y)} \\
&= \frac{2\hat F_n(y) - 1}{\hat g(y)} + \frac{\hat F_n(y-)(1 - \hat F_n(y-))}{\hat g(y)\,\hat g(y-)}\,\frac{d\hat g(y)}{d\hat F_n(y)}.
\end{aligned}$$

In order to have that at $\hat F_n$, the derivative of the likelihood equals zero in the direction $h_{\hat F_n}$, one must have that this direction is bounded. This leads to showing that the jumps of $\hat F_n$ are large enough.

Example 4.2: interval censored observations, case II

Let $X_i = (T_i, U_i, \Delta_i, \Gamma_i)$, with $\Delta_i = \mathbf{1}\{Y_i \le T_i\}$, $\Gamma_i = \mathbf{1}\{T_i < Y_i \le U_i\}$ and $Y_i$ independent of $(T_i, U_i)$, $i = 1, \ldots, n$. We assume bounded support, say $T_i$, $U_i$ and $Y_i \in (0, 2]$, and that $T_i < U_i$. This model is also studied in Groeneboom and Wellner (1992). The rate of convergence of the log-likelihood ratio can
