ASYMPTOTIC NORMALITY IN MIXTURE MODELS
SARA VAN DE GEER
Abstract. We study the estimation of a linear functional $\theta_0 = \int a\,dF_0$ of a distribution $F_0$, using i.i.d. observations from the mixture $p_{F_0} = \int k(\cdot, y)\,dF_0(y)$. Let $\hat F_n$ be the maximum likelihood estimator of $F_0$ and $\hat\theta_n = \int a\,d\hat F_n$. We examine the asymptotic distribution of $\hat\theta_n$. A problem here is that usually, $\hat F_n$ does not dominate $F_0$. Our main aim is to show that this can be overcome by considering the convex combination $\alpha \hat F_n + (1-\alpha) F_0$, with $\alpha < 1$.
1. Introduction
Let $X_1, \ldots, X_n$ be independent identically distributed random variables on $(\mathcal X, \mathcal A)$, with distribution $P$. Suppose that for some $\sigma$-finite measure $\mu$,
$$p = \frac{dP}{d\mu} \in \Big\{ p_F = \int k(\cdot, y)\,dF(y) : F \in \Lambda \Big\},$$
where $\Lambda$ is the class of all probability measures on a measurable space $(\mathcal Y, \mathcal B)$, and where $k : \mathcal X \times \mathcal Y \to [0, \infty)$ is a given kernel with $\int k(x, y)\,d\mu(x) = 1$ for all $y \in \mathcal Y$. So
$$p = p_{F_0} = \int k(\cdot, y)\,dF_0(y)$$
for some $F_0 \in \Lambda$.

In this paper, we assume that $F_0$ is unknown, and that the maximum likelihood estimator $\hat F_n$ of $F_0$ exists. The latter is (not necessarily uniquely) defined by
$$\sum_{i=1}^n \log p_{\hat F_n}(X_i) = \max_{F \in \Lambda} \sum_{i=1}^n \log p_F(X_i).$$
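To make the definition of the maximum likelihood estimator concrete, the following is a minimal sketch that maximizes the log-likelihood over mixing distributions supported on a fixed grid, using EM-type weight updates. The grid restriction, the standard normal location kernel, the sample size and the function names `npmle_on_grid` and `normal_kernel` are all assumptions made for illustration; they are not taken from the paper, and the grid solution only approximates the maximizer over all probability measures.

```python
import numpy as np

def npmle_on_grid(x, grid, kernel, n_iter=500):
    """EM iterations for the weights w_j of p_F = sum_j w_j k(., y_j)."""
    K = kernel(x[:, None], grid[None, :])      # K[i, j] = k(X_i, y_j)
    w = np.full(len(grid), 1.0 / len(grid))    # start from uniform weights
    loglik = []
    for _ in range(n_iter):
        p = K @ w                              # mixture density p_F(X_i)
        loglik.append(np.log(p).sum())         # log-likelihood before the update
        w = w * (K / p[:, None]).mean(axis=0)  # EM update; weights still sum to 1
    return w, np.array(loglik)

def normal_kernel(x, y):
    # Assumed kernel: k(x, y) is the N(y, 1) density evaluated at x.
    return np.exp(-0.5 * (x - y) ** 2) / np.sqrt(2.0 * np.pi)

rng = np.random.default_rng(0)
y = rng.choice([-1.0, 2.0], size=400, p=[0.3, 0.7])   # unobserved Y_i ~ F_0
x = y + rng.standard_normal(400)                      # observed X_i
grid = np.linspace(-4.0, 5.0, 46)                     # assumed support grid
w_hat, loglik = npmle_on_grid(x, grid, normal_kernel)
theta_hat = float(grid @ w_hat)                       # plug-in estimate for a(y) = y
```

Each update cannot decrease the log-likelihood, which is the usual EM property; the plug-in value `theta_hat` corresponds to $\int a\,d\hat F_n$ with $a(y) = y$ under the grid restriction.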
Consider now functionals of the form
$$\theta(F) = \int a\,dF, \quad F \in \Lambda,$$
with $a$ a given function on $(\mathcal Y, \mathcal B)$. Write $\theta_0 = \int a\,dF_0$ and $\hat\theta_n = \int a\,d\hat F_n$. We shall investigate the asymptotic behaviour (and efficiency) of the estimator $\hat\theta_n$ of $\theta_0$. An important issue in this context is the appropriate differentiability of $\theta(F)$. Let us briefly sketch the main idea.

ESAIM: Probability and Statistics is an electronic journal with URL address http://www.emath.fr/ps.
Received by the journal 8 March 1995. Revised 2 June 1995. Accepted for publication 28 September 1995.
Consider a pair of random variables $(X, Y)$ with values in $\mathcal X \times \mathcal Y$, let $k(\cdot, y)$ be the density of $X$ given $Y = y$, and $F_0$ the distribution of $Y$. For a function $b : \mathcal X \to \mathbb R$, write
$$E(b(X) \mid Y = y) = A^* b(y).$$
Thus
$$A^* b(y) = \int k(x, y)\,b(x)\,d\mu(x).$$
Write for $h : \mathcal Y \to \mathbb R$, $E(h(Y) \mid X = x) = A_{F_0} h(x)$, where
$$A_F h(x) = \frac{\int k(x, y)\,h(y)\,dF(y)}{p_F(x)}.$$
If for some $b$,
$$E(b(X) \mid Y = y) = a(y) \quad \text{for } F_0\text{-almost all } y, \tag{1.1}$$
then clearly
$$E(b(X)) = \theta_0.$$
So if (1.1) holds for some $b \in L_2(P)$ not depending on $F_0$, then $(1/n)\sum_{i=1}^n b(X_i)$ is a $\sqrt n$-consistent and asymptotically normal estimator of $\theta_0$. We say that $\theta(F)$ is differentiable at $F_0$ if a solution of (1.1) exists. Note that if $k(\cdot, y)$ is a complete family for $y$ in the support of $F_0$, there is at most one solution of (1.1). In general, there may also be several solutions, in which case we would like to take the one with the smallest variance. But such a solution possibly depends on $F_0$. The arguments below indicate that perhaps the maximum likelihood procedure automatically picks the best solution, with the estimator $\hat F_n$ plugged in for $F_0$.

We shall now discuss the solution with the smallest variance. First, we center the functions. Instead of
$a$ in (1.1), we consider the gradient of $\theta(F)$, which is defined as
$$\dot\theta_F = a - \int a\,dF.$$
If for some $h_F \in L_\infty(F)$ with $\int h_F\,dF = 0$,
$$A^* A_F h_F = \dot\theta_F, \quad F\text{-a.s.}, \tag{1.2}$$
we call
$$b_F = A_F h_F \tag{1.3}$$
the efficient influence curve at $\theta(F)$. Note that $b_F$ is also centered now: $\int b_F\,dP_F = 0$. It follows from Van der Vaart (1991) that $(1/n)\int b_{F_0}^2\,dP$ is a lower bound for the asymptotic variance of an estimator of $\theta_0$. He considers parametric submodels with Hilbert space structure to arrive at results of this type in a very general context. In our case, the parametric submodel would be indexed by
$$\Lambda_F = \big\{ F_t : (dF_t)^{1/2} = (1 + \tfrac12 t h_F)(dF)^{1/2},\ |t| \text{ small} \big\}.$$
For this reason, we call $h_F$ a direction (along which one can consider a submodel). The assumptions $h_F \in L_\infty(F)$ and $\int h_F\,dF = 0$ ensure that indeed $\Lambda_F \subset \Lambda$.

Suppose now that the efficient influence curves $b_{\hat F_n}$ and $b_{F_0}$ exist. So $b_{\hat F_n} = A_{\hat F_n} h_{\hat F_n}$ for some $h_{\hat F_n} \in L_\infty(\hat F_n)$, and
$$A^* b_{\hat F_n} = \dot\theta_{\hat F_n}, \quad \hat F_n\text{-a.s.} \tag{1.4}$$
Observe that $\hat F_n$ is an interior point of $\{\hat F_{n,t} : d\hat F_{n,t} = (1 + t h_{\hat F_n})\,d\hat F_n,\ |t| \text{ small}\}$. So we have
$$\frac{d}{dt} \sum_{i=1}^n \log p_{\hat F_{n,t}}(X_i)\Big|_{t=0} = 0,$$
or
$$\sum_{i=1}^n b_{\hat F_n}(X_i) = 0.$$
Write this as
$$\int b_{\hat F_n}\,dP_n = 0,$$
with $P_n$ the empirical distribution based on $X_1, \ldots, X_n$ (see (2.1)). Now, let us compare this with $\int b_{\hat F_n}\,dP$. Changing the order of integration gives
$$\int b_{\hat F_n}\,dP = \int A^* b_{\hat F_n}\,dF_0.$$
Moreover, if $\hat F_n$ dominates $F_0$, then (1.4) yields
$$\int A^* b_{\hat F_n}\,dF_0 = \int \dot\theta_{\hat F_n}\,dF_0 = \theta_0 - \hat\theta_n.$$
So then we have the identity of van der Laan (1993, 1994):
$$\hat\theta_n - \theta_0 = \int b_{\hat F_n}\,d(P_n - P). \tag{1.5}$$
Finally, if $b_{\hat F_n}$ converges to $b_{F_0}$ in an appropriate sense, one obtains asymptotic normality (and efficiency) of $\hat\theta_n$.

The problem is now that the assumption that $\hat F_n$ dominates $F_0$ is often not valid. Nevertheless, van der Laan (1993) presents some examples that show that the identity (1.5) can hold even when $\hat F_n$ does not dominate $F_0$. We shall however be concerned with the situation where the identity (1.5) is not necessarily true.

Our approach is to consider for each $0 \le \alpha < 1$ and $F \in \Lambda$, the convex combination
$$F_\alpha = \alpha F + (1 - \alpha) F_0.$$
Then indeed, $\hat F_{n,\alpha} = \alpha \hat F_n + (1-\alpha) F_0$ dominates $F_0$. We shall obtain asymptotic normality of $\hat\theta_n$ by choosing $\alpha = \hat\alpha_n$ in such a way that it tends to one at an appropriate speed.

The paper is organized as follows. In Section 2, we derive a linear approximation for $\hat\theta_n$. Asymptotic normality and efficiency follow from this. The conditions we need to arrive at the result are consistency of the maximum likelihood estimator (obtained by separate means), differentiability of $\theta(F)$ at $F = F_\alpha$, $0 \le \alpha < 1$, bounded directions and suitable continuity conditions on the influence curves. A discussion of these conditions can be found in Section 3. Section 4 presents some examples.

2. Asymptotic normality
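As pseudo-metric on $\Lambda$, we take the Hellinger distance between the mixtures:
$$d(F, \tilde F) = h(p_F, p_{\tilde F}) = \Big( \frac12 \int \big(\sqrt{p_F} - \sqrt{p_{\tilde F}}\big)^2\,d\mu \Big)^{1/2}, \quad F, \tilde F \in \Lambda.$$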
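As a rough numerical illustration of this pseudo-metric, the sketch below approximates the Hellinger distance between two mixture densities by a Riemann sum. The normal location kernel, the integration grid and the helper names `hellinger` and `mixture_density` are assumptions chosen for the example, not part of the paper.

```python
import numpy as np

def hellinger(p, q, dx):
    """h(p, q) = sqrt(0.5 * integral of (sqrt p - sqrt q)^2), as a Riemann sum."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx)

def mixture_density(x, atoms, weights):
    """p_F(x) for a discrete F and the (assumed) N(y, 1) kernel."""
    k = np.exp(-0.5 * (x[:, None] - atoms[None, :]) ** 2) / np.sqrt(2.0 * np.pi)
    return k @ weights

x = np.linspace(-12.0, 12.0, 4801)       # integration grid for Lebesgue measure
dx = x[1] - x[0]
p0 = mixture_density(x, np.array([-1.0, 2.0]), np.array([0.3, 0.7]))
p1 = mixture_density(x, np.array([-1.0, 2.0]), np.array([0.5, 0.5]))
d01 = hellinger(p0, p1, dx)              # d(F, F~) for the two mixing laws

# Sanity check against a closed form: for N(0, 1) versus N(m, 1),
# h^2 = 1 - exp(-m^2 / 8) under the normalisation with the factor 1/2.
q0 = mixture_density(x, np.array([0.0]), np.array([1.0]))
q2 = mixture_density(x, np.array([2.0]), np.array([1.0]))
h_sq = hellinger(q0, q2, dx) ** 2
```

With the factor $1/2$ in the definition, the distance always lies in $[0, 1]$, and it vanishes whenever the two mixing distributions produce the same mixture density, which is why $d$ is only a pseudo-metric on the mixing distributions.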
Consistency of $\hat F_n$ in this metric holds in fairly general situations. It is closely related to the further assumptions as stated in Condition 1 (see Section 3.1 for more details).

We use the notation
$$P_n = \frac1n \sum_{i=1}^n \delta_{X_i}, \tag{2.1}$$
i.e. $P_n$ is the empirical distribution that puts mass $1/n$ at each of the observations $X_1, \ldots, X_n$. Moreover, we shall make frequent use of stochastic order symbols. If $\{Z_n\}$ is a sequence of real-valued random variables, and $\{k_n\}$ a sequence of positive numbers, then we say that $Z_n = O_P(k_n)$ if
$$\lim_{M \to \infty} \limsup_{n \to \infty} P(|Z_n| > k_n M) = 0.$$
Similarly, $Z_n = o_P(k_n)$ means that for all $\epsilon > 0$,
$$\lim_{n \to \infty} P(|Z_n| > \epsilon k_n) = 0.$$
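Condition 1. (consistency and rates). The estimators $\hat F_n$ and $\hat\theta_n$ are consistent, i.e.
$$d(\hat F_n, F_0) = o_P(1) \tag{2.2}$$
and
$$|\hat\theta_n - \theta_0| = o_P(1). \tag{2.3}$$
Moreover, for some $\delta_n^2 = o(n^{-1/2})$,
$$\int \log\Big( \frac{2 p_{\hat F_n}}{p_{\hat F_n} + p_{F_0}} \Big)\,dP_n = O_P(\delta_n^2). \tag{2.4}$$

Condition 2. (differentiability in a neighbourhood of $F_0$ and existence of efficient influence curves). For some $\eta > 0$, and for all $0 \le \alpha < 1$ and $F \in \Lambda$ with $d(F, F_0) \le \eta$, we have
$$A^* b_{F_\alpha} = \dot\theta_{F_\alpha}, \quad F_\alpha\text{-a.s.}, \tag{2.5}$$
for $b_{F_\alpha}$ satisfying
$$b_{F_\alpha}(x) = A_{F_\alpha} h_{F_\alpha}(x), \quad p_{F_\alpha}(x) > 0, \tag{2.6}$$
for some $h_{F_\alpha} \in L_\infty(F_\alpha)$, with $\int h_{F_\alpha}\,dF_\alpha = 0$.

Condition 3. (control on the direction $h_{F_\alpha}$). For some $\eta > 0$ and $M < \infty$,
$$\sup_{d(F, F_0) \le \eta}\ \sup_{0 \le \alpha < 1}\ \sup_{y \in \mathrm{support}(F_\alpha)} |(1 - \alpha) h_{F_\alpha}(y)| \le M. \tag{2.7}$$

Condition 4. (control on the efficient influence curves $b_{F_\alpha}$). The information for estimating $\theta_0$ is positive:
$$\int b_{F_0}^2\,dP > 0. \tag{2.8}$$
Moreover, the influence curves are uniformly bounded: for some $\eta > 0$,
$$\sup_{d(F, F_0) \le \eta}\ \sup_{0 \le \alpha < 1}\ \sup_{p_{F_\alpha}(x) > 0} |b_{F_\alpha}(x)| < \infty. \tag{2.9}$$
Finally, for $0 < \eta_n \to 0$,
$$\sup_{d(F, F_0) \le \eta_n}\ \sup_{0 \le \alpha < 1} \int b_{F_\alpha}^2\,dP_n = \int b_{F_0}^2\,dP + o_P(1) \tag{2.10}$$
and
$$\sup_{d(F, F_0) \le \eta_n}\ \sup_{0 \le \alpha < 1} \int (b_{F_\alpha} - b_{F_0})\,d(P_n - P) = o_P(n^{-1/2}). \tag{2.11}$$

Theorem 2.1. Assume that Conditions 1-4 are met. Then
$$\hat\theta_n - \theta_0 = \int b_{F_0}\,d(P_n - P) + o_P(n^{-1/2}). \tag{2.12}$$

Proof. By (2.4), we can find a non-random increasing function $\phi : (0, \infty) \to (0, \infty)$, satisfying
$$\phi(x) \to 0 \quad \text{for } x \to 0, \tag{2.13}$$
$$\frac{x}{\phi(x)} \le \frac12 \quad \text{for all } x > 0, \tag{2.14}$$
and
$$\frac{x}{\phi(x)} \to 0 \quad \text{for } x \to 0, \tag{2.15}$$
such that $\delta_n^2 = o(n^{-1/2})\,\phi(n^{-1/2})$. This gives
$$\int \log\Big( \frac{2 p_{\hat F_n}}{p_{\hat F_n} + p_{F_0}} \Big)\,dP_n = o_P(n^{-1/2})\,\phi(n^{-1/2}). \tag{2.16}$$
Choose
$$1 - \hat\alpha_n = \frac{|\hat\theta_n - \theta_0| + n^{-1/2}}{\phi(|\hat\theta_n - \theta_0| + n^{-1/2})}. \tag{2.17}$$
Then $\hat\alpha_n < 1$ and by (2.14), $\hat\alpha_n \ge 1/2$. Since $|\hat\theta_n - \theta_0| = o_P(1)$, (2.15) implies that $(1 - \hat\alpha_n) = o_P(1)$. Furthermore, (2.13) yields
$$\frac{|\hat\theta_n - \theta_0|}{1 - \hat\alpha_n} = o_P(1) \tag{2.18}$$
as well as
$$\frac{n^{-1/2}}{1 - \hat\alpha_n} = o_P(1). \tag{2.19}$$
Define
$$\tilde F_n = \hat\alpha_n \hat F_n + (1 - \hat\alpha_n) F_0. \tag{2.20}$$
Let $\eta > 0$ be small enough, so that (2.5), (2.6), (2.7) and (2.9) in Conditions 2-4 are fulfilled for this value of $\eta$, and let $D_n$ be the set
$$D_n = \{ d(\hat F_n, F_0) \le \eta \}.$$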
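The choice of the mixing weight in (2.17) can be illustrated numerically. The sketch below uses one function satisfying (2.13)-(2.15), written `phi` here; both the label and the specific form $2\max(\sqrt{x}, x)$ are assumptions made for this example. The additional rate requirement linking this function to (2.4) depends on the model and is not checked.

```python
import numpy as np

def phi(x):
    """Assumed example: phi(x) = 2 * max(sqrt(x), x).

    It is increasing, phi(x) -> 0 as x -> 0, and x / phi(x) = min(sqrt(x), 1) / 2,
    so x / phi(x) <= 1/2 for all x > 0 and x / phi(x) -> 0 as x -> 0.
    """
    return 2.0 * np.maximum(np.sqrt(x), x)

def alpha_hat(abs_err, n):
    """Weight from (2.17): 1 - alpha = u / phi(u), u = |theta_hat - theta_0| + n^(-1/2)."""
    u = abs_err + n ** -0.5
    return 1.0 - u / phi(u)

for n in (10**2, 10**4, 10**6):
    err = n ** -0.5                       # pretend the estimation error is of order n^(-1/2)
    a = alpha_hat(err, n)
    print(n, a, err / (1.0 - a))          # alpha approaches 1, err / (1 - alpha) shrinks
```

In this sketch the weight stays in $[1/2, 1)$ and approaches one, but slowly enough that both $|\hat\theta_n - \theta_0|/(1-\hat\alpha_n)$ and $n^{-1/2}/(1-\hat\alpha_n)$ still tend to zero, which is what (2.18) and (2.19) require.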
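Because $\hat\alpha_n < 1$, we can find on $D_n$ an influence curve $b_{\tilde F_n}$ and a direction $h_{\tilde F_n}$, such that
$$A^* b_{\tilde F_n} = \dot\theta_{\tilde F_n}$$
and
$$b_{\tilde F_n}(x) = A_{\tilde F_n} h_{\tilde F_n}(x), \quad x \in \{ p_{\tilde F_n} > 0 \}.$$
We can write
$$\int b_{\tilde F_n}\,dP_n\,\mathbf 1\{D_n\} = \int b_{\tilde F_n}\,d(P_n - P)\,\mathbf 1\{D_n\} + \int b_{\tilde F_n}\,dP\,\mathbf 1\{D_n\}. \tag{2.21}$$
Because $d(\hat F_n, F_0) = o_P(1)$, we have that $\mathbf 1\{D_n\} = 1 + o_P(1)$. So, using (2.11),
$$\int b_{\tilde F_n}\,d(P_n - P)\,\mathbf 1\{D_n\} = \int b_{F_0}\,d(P_n - P) + o_P(n^{-1/2}) = O_P(n^{-1/2}). \tag{2.22}$$
Since $\tilde F_n$ dominates $F_0$, and $\hat\alpha_n = 1 + o_P(1)$,
$$\int b_{\tilde F_n}\,dP\,\mathbf 1\{D_n\} = -\hat\alpha_n(\hat\theta_n - \theta_0)\,\mathbf 1\{D_n\} = -(1 + o_P(1))(\hat\theta_n - \theta_0). \tag{2.23}$$
Insert (2.22) and (2.23) into (2.21) to get that
$$\int b_{\tilde F_n}\,dP_n\,\mathbf 1\{D_n\} = \int b_{F_0}\,d(P_n - P) + o_P(n^{-1/2}) - (1 + o_P(1))(\hat\theta_n - \theta_0). \tag{2.24}$$
Let
$$\hat t_n = \frac{\int b_{\tilde F_n}\,dP_n}{(1 - \hat\alpha_n) \int b_{F_0}^2\,dP}\,\mathbf 1\{D_n\}. \tag{2.25}$$
Equality (2.24), together with (2.8), (2.18) and (2.19), implies that
$$\hat t_n = o_P(1). \tag{2.26}$$
Take $M$ as in (2.7) and $E_n = D_n \cap \{ |\hat t_n| < 1/M \}$. Define on $E_n$,
$$d\tilde F_n(\hat t_n) = \big(1 + \hat t_n (1 - \hat\alpha_n) h_{\tilde F_n}\big)\,d\tilde F_n.$$
Then on $E_n$,
$$\tilde F_n(\hat t_n) \in \Lambda. \tag{2.27}$$
Moreover, from (2.26) we know that $\mathbf 1\{E_n\} = 1 + o_P(1)$, so that by (2.9) and (2.10),
$$\int \log\Big( \frac{p_{\tilde F_n(\hat t_n)}}{p_{\tilde F_n}} \Big)\,dP_n\,\mathbf 1\{E_n\} = \int \log\big(1 + \hat t_n (1 - \hat\alpha_n) b_{\tilde F_n}\big)\,dP_n\,\mathbf 1\{E_n\} \tag{2.28}$$
$$= \hat t_n (1 - \hat\alpha_n) \int b_{\tilde F_n}\,dP_n\,\mathbf 1\{E_n\} - \frac12 \hat t_n^2 (1 - \hat\alpha_n)^2 \int b_{\tilde F_n}^2\,dP_n\,\mathbf 1\{E_n\}(1 + o_P(1))$$
$$= \frac12 \frac{\big(\int b_{\tilde F_n}\,dP_n\big)^2}{\int b_{F_0}^2\,dP}\,\mathbf 1\{E_n\}(1 + o_P(1)).$$
Since $\hat F_n$ maximizes the likelihood, and (2.27) holds on $E_n$, we have
$$\int \log p_{\hat F_n}\,dP_n\,\mathbf 1\{E_n\} \ge \int \log p_{\tilde F_n(\hat t_n)}\,dP_n\,\mathbf 1\{E_n\}. \tag{2.29}$$
Furthermore, the concavity of the log-function and the fact that $\hat\alpha_n \ge 1/2$ yield
$$\int \log p_{\tilde F_n}\,dP_n \ge (2\hat\alpha_n - 1) \int \log p_{\hat F_n}\,dP_n + 2(1 - \hat\alpha_n) \int \log\Big( \frac{p_{\hat F_n} + p_{F_0}}{2} \Big)\,dP_n. \tag{2.30}$$
Combine (2.28), (2.29) and (2.30) to find
$$2(1 - \hat\alpha_n) \int \log\Big( \frac{2 p_{\hat F_n}}{p_{\hat F_n} + p_{F_0}} \Big)\,dP_n \ge \frac12 \frac{\big(\int b_{\tilde F_n}\,dP_n\big)^2}{\int b_{F_0}^2\,dP}\,\mathbf 1\{E_n\}(1 + o_P(1)). \tag{2.31}$$
From (2.16) and (2.17), we know that the left-hand side of this inequality is
$$(1 - \hat\alpha_n)\,o_P(n^{-1/2})\,\phi(n^{-1/2}) = \big(|\hat\theta_n - \theta_0| + n^{-1/2}\big) \frac{\phi(n^{-1/2})}{\phi(|\hat\theta_n - \theta_0| + n^{-1/2})}\,o_P(n^{-1/2}) = \big(|\hat\theta_n - \theta_0| + n^{-1/2}\big)\,o_P(n^{-1/2}),$$
where in the last step, we used that $\phi$ is increasing. In view of (2.24), the right-hand side of (2.31) is of the form
$$\big( O_P(n^{-1/2}) - (1 + o_P(1))(\hat\theta_n - \theta_0) \big)^2\,\frac{1 + o_P(1)}{\int b_{F_0}^2\,dP}.$$
So we find from (2.31) that
$$|\hat\theta_n - \theta_0|^2 \le \max\big\{ O_P(n^{-1}),\ (|\hat\theta_n - \theta_0| + n^{-1/2})\,o_P(n^{-1/2}) \big\},$$
which implies $|\hat\theta_n - \theta_0| = O_P(n^{-1/2})$. But then, the left-hand side of (2.31) is $o_P(n^{-1})$, so that it reads
$$\Big( \int b_{F_0}\,d(P_n - P) + o_P(n^{-1/2}) - (1 + o_P(1))(\hat\theta_n - \theta_0) \Big)^2 (1 + o_P(1)) = o_P(n^{-1}).$$
In other words,
$$\hat\theta_n - \theta_0 = \int b_{F_0}\,d(P_n - P) + o_P(n^{-1/2}). \qquad \Box$$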
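The concavity step (2.30) in the proof may be easier to follow when the convex decomposition behind it is written out. The block below is a sketch of that step only; it uses the definition (2.20) of $\tilde F_n$ and the fact that $\hat\alpha_n \ge 1/2$, and the layout is an addition for illustration.

```latex
% The mixture density of \tilde F_n as a convex combination with weights
% (2\hat\alpha_n - 1) \ge 0 and 2(1 - \hat\alpha_n) \ge 0, which sum to one:
\begin{align*}
p_{\tilde F_n}
  &= \hat\alpha_n\, p_{\hat F_n} + (1 - \hat\alpha_n)\, p_{F_0} \\
  &= (2\hat\alpha_n - 1)\, p_{\hat F_n}
     + 2(1 - \hat\alpha_n)\, \frac{p_{\hat F_n} + p_{F_0}}{2}.
\end{align*}
% Concavity of the logarithm, applied pointwise:
\[
\log p_{\tilde F_n}
  \ge (2\hat\alpha_n - 1) \log p_{\hat F_n}
     + 2(1 - \hat\alpha_n) \log\Big( \frac{p_{\hat F_n} + p_{F_0}}{2} \Big).
\]
% Integrating both sides with respect to P_n gives (2.30).
```

The requirement $\hat\alpha_n \ge 1/2$, guaranteed by (2.14), is exactly what makes the first weight non-negative.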
3. Comments on conditions 1-4
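Conditions 1 and 4 can be verified using the concept of entropy. Therefore, we introduce the following definitions. Let $Q$ be a probability measure on $(\mathcal X, \mathcal A)$ and $\mathcal G \subset L_q(Q)$, $q \ge 1$.

Definition 1. The $\delta$-covering number $N_q(\delta, \mathcal G, Q)$ of $\mathcal G$ is defined as the number of balls with radius $\delta$ necessary to cover $\mathcal G$. More precisely, let $\{g_j\}_{j=1}^m$ be such that for each $g \in \mathcal G$ there is a $j \in \{1, \ldots, m\}$ such that
$$\int |g - g_j|^q\,dQ \le \delta^q. \tag{3.1}$$
Then $N_q(\delta, \mathcal G, Q)$ is the smallest $m$ for which such a collection $\{g_j\}_{j=1}^m$ exists. The $\delta$-entropy of $\mathcal G$ is $H_q(\delta, \mathcal G, Q) = \log N_q(\delta, \mathcal G, Q) \vee 1$.

Definition 2. Let $\{[g_j^L, g_j^U]\}_{j=1}^m$ be such that for all $g \in \mathcal G$ there is a $j \in \{1, \ldots, m\}$ such that
$$g_j^L \le g \le g_j^U \tag{3.2}$$
and
$$\int (g_j^U - g_j^L)^q\,dQ \le \delta^q. \tag{3.3}$$
Write $N_q^B(\delta, \mathcal G, Q)$ for the smallest $m$ for which such a collection $\{[g_j^L, g_j^U]\}_{j=1}^m$ exists. Then $H_q^B(\delta, \mathcal G, Q) = \log N_q^B(\delta, \mathcal G, Q) \vee 1$ is called the $\delta$-entropy with bracketing.

3.1. On Condition 1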
Suppose that $\mathcal Y$ is a locally compact Hausdorff space with countable base, and that $\mathcal B$ is the Borel $\sigma$-algebra. Let $C_0$ be the class of all functions $c : \mathcal Y \to \mathbb R$ that vanish at infinity. If $k(x, \cdot) \in C_0$ for $\mu$-almost all $x \in \mathcal X$, then $d(\hat F_n, F_0) \to 0$ almost surely (see Pfanzagl (1988)).

Denote the class of all measures $F$ on $(\mathcal Y, \mathcal B)$ with $F(\mathcal Y) \le 1$ by $\bar\Lambda$. The vague topology on $\bar\Lambda$ is the smallest topology such that $F \mapsto \int c\,dF$ is continuous for every $c \in C_0$. Let $\rho$ be the metric corresponding to the vague topology. We say that $F_0$ is identifiable (for the metric $\rho$) if for all $F \in \bar\Lambda$, $d(F, F_0) = 0$ implies $\rho(F, F_0) = 0$. If $F_0$ is identifiable, then $d(\hat F_n, F_0) \to 0$ almost surely implies $\rho(\hat F_n, F_0) \to 0$ almost surely. So in particular, then $|\hat\theta_n - \theta_0| \to 0$ almost surely, whenever $a \in C_0$. More details can be found in e.g. Pfanzagl (1988) or Van de Geer (1993a).

Let us now investigate the rate of convergence for the log-likelihood ratio. Note first of all that
$$0 \le \int \log\Big( \frac{2 p_{\hat F_n}}{p_{\hat F_n} + p_{F_0}} \Big)\,dP_n \le \frac12 \int \log\Big( \frac{p_{\hat F_n}}{p_{F_0}} \Big)\,dP_n,$$
so that nothing is lost (and indeed something could be gained) by comparing $p_{\hat F_n}$ with the convex combination $(p_{\hat F_n} + p_{F_0})/2$ instead of with $p_{F_0}$.

Lemma 3.1. Suppose that
$$\int_0^1 \sqrt{H_2^B(\delta, \mathcal G, P)}\,d\delta < \infty, \tag{3.4}$$
where
$$\mathcal G = \Big\{ \sqrt{\frac{p_F + p_{F_0}}{p_{F_0}}} : F \in \Lambda \Big\}.$$
Then
$$\int \log\Big( \frac{p_{\hat F_n}}{p_{F_0}} \Big)\,dP_n = O_P(\delta_n^2) \quad \text{with } \delta_n^2 = o(n^{-1/2}).$$

Proof. See Van de Geer (1995). A slight modification can be found in Wong and Shen (1992). $\Box$
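The two-sided inequality stated just before Lemma 3.1 rests on two elementary facts, which the following sketch spells out. It only uses that $\hat F_n$ maximizes the likelihood over all probability measures and that the logarithm is concave; the layout is an addition for illustration.

```latex
% Lower bound: (\hat F_n + F_0)/2 is itself a probability measure, with mixture
% density (p_{\hat F_n} + p_{F_0})/2, so the maximum likelihood property gives
\[
\int \log p_{\hat F_n}\, dP_n
  \ge \int \log\Big( \frac{p_{\hat F_n} + p_{F_0}}{2} \Big)\, dP_n ,
\qquad\text{i.e.}\qquad
\int \log\Big( \frac{2 p_{\hat F_n}}{p_{\hat F_n} + p_{F_0}} \Big)\, dP_n \ge 0 .
\]
% Upper bound: by concavity of the logarithm,
\[
\log\Big( \frac{p_{\hat F_n} + p_{F_0}}{2} \Big)
  \ge \tfrac12 \log p_{\hat F_n} + \tfrac12 \log p_{F_0} ,
\]
% and subtracting both sides from \log p_{\hat F_n} yields, pointwise,
\[
\log\Big( \frac{2 p_{\hat F_n}}{p_{\hat F_n} + p_{F_0}} \Big)
  \le \tfrac12 \log\Big( \frac{p_{\hat F_n}}{p_{F_0}} \Big) .
\]
```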
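We call the left-hand side of (3.4) the entropy integral. If the entropy integral diverges, suboptimal rates can emerge (see Birgé and Massart (1993)).

Lemma 3.2 below makes use of the special structure of the mixing model. We need the following notation: for $\sigma \ge 0$,
$$\tau_1^2(\sigma) = \int_{p_{F_0} \le \sigma} p_{F_0}\,d\mu$$
and
$$\tau_2^2(\sigma) = \int_{p_{F_0} > \sigma} \frac{1}{p_{F_0}}\,d\mu.$$

Lemma 3.2. Let $\mathcal K = \{ k(\cdot, y) : y \in \mathcal Y \}$. Suppose that the functions in $\mathcal K$ are uniformly bounded, and that for all probability measures $Q$ with finite support,
$$N_2(\delta, \mathcal K, Q) \le A \delta^{-w}, \quad \delta > 0, \tag{3.5}$$
where the constants $A$ and $w$ do not depend on $Q$. Let $0 \le \sigma_n \to 0$, $\delta_n \ge \tau_1(\sigma_n) \vee n^{-(2+w)/(4+4w)} (\tau_2(\sigma_n))^{w/(2+2w)}$. Then
$$\int \log\Big( \frac{2 p_{\hat F_n}}{p_{\hat F_n} + p_{F_0}} \Big)\,dP_n = O_P(\delta_n^2). \tag{3.6}$$

Proof. See Van de Geer (1993b). $\Box$

Lemma 3.2 may not yield the optimal rate, but all we need here is a rate faster than $n^{-1/2}$. This is the case if $\tau_2(\sigma_n) = o(n^{1/(2w)})$. To put it differently, if we define $\tau_1^{-1}(\delta) = \sup\{ \sigma : \tau_1(\sigma) \le \delta^2 \}$, then $\delta_n^2 = o(n^{-1/2})$ if
$$\tau_2(\tau_1^{-1}(\delta)) = o(\delta^{-2/w}) \quad \text{for } \delta \downarrow 0.$$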
3.2. On Condition 2
It is natural to require (2.5) and (2.6) for $\alpha = 0$. The fact that we need these equalities for all $0 \le \alpha < 1$ is closely related to being able to estimate the efficient influence curve. However, it should be noted that we only assume $b_F$ to exist for certain $F$ that dominate $F_0$.

In some applications, $b_{\hat F_n}$ does exist, but this usually will not help to simplify the proofs (see Section 4 for an example).

3.3. On Condition 3
Here, we assume that $h_{F_\alpha}$ behaves like $1/(1 - \alpha)$. Clearly, this allows misbehaviour for $\alpha \to 1$, but it also requires $h_{F_\alpha}$ to be bounded. In some applications, this reduces to assuming that $h_{F_0}$ is bounded. The proof of Theorem 2.1 reveals that we need that for all $t$ sufficiently small, $dF_\alpha(t)/dF_\alpha = 1 + t(1 - \alpha) h_{F_\alpha}$ exists and is non-negative, i.e., that $F_\alpha(t) \in \Lambda$. If for some function $g$, $dg/dF_0$ exists and is bounded, say by $C$, then also $dg/dF_\alpha$ exists and $(1 - \alpha)\,dg/dF_\alpha$ is bounded by $C$. This is the reason why in Examples 4.1 and 4.3 our approach works. But it fails in Example 4.2!

3.4. On Condition 4
Let us present a brief overview of some results from empirical process theory that can be applied in this context. We cite them from Pollard (1984) and Ossiander (1987). Throughout, we assume that the necessary measurability conditions are satisfied.

Consider a class $\mathcal G$ of functions on $(\mathcal X, \mathcal A)$, with envelope
$$G = \sup_{g \in \mathcal G} |g|.$$
The class $\mathcal G$ is called a Glivenko-Cantelli class if
$$\sup_{g \in \mathcal G} \Big| \int g\,d(P_n - P) \Big| \to 0 \quad \text{almost surely}.$$
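A classical instance of a Glivenko-Cantelli class is the collection of indicators of half-lines, for which the supremum above is the Kolmogorov-Smirnov distance between the empirical and the true distribution function. The sketch below evaluates it for uniform samples; the choice of the uniform distribution, the sample sizes and the function name `sup_deviation` are assumptions made for illustration.

```python
import numpy as np

def sup_deviation(sample):
    """sup over t of |P_n(-inf, t] - P(-inf, t]| for P = Uniform(0, 1)."""
    x = np.sort(sample)
    n = len(x)
    upper = np.arange(1, n + 1) / n - x    # empirical cdf minus t, just after each jump
    lower = x - np.arange(0, n) / n        # t minus empirical cdf, just before each jump
    return max(upper.max(), lower.max())

rng = np.random.default_rng(1)
devs = {n: sup_deviation(rng.uniform(size=n)) for n in (100, 10_000, 100_000)}
```

The supremum is attained at a sample point because the empirical distribution function is a step function and the uniform distribution function is continuous and increasing, which is why only the sorted observations need to be inspected.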
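It is called a Donsker class if $\sqrt n \int g\,d(P_n - P)$ converges in distribution to a mean zero Gaussian process on $\mathcal G$. This limiting process is assumed to have continuous sample paths with respect to $\rho_P(\cdot, \cdot)$, with
$$\rho_P(g_1, g_2)^2 = \int (g_1 - g_2)^2\,dP - \Big( \int (g_1 - g_2)\,dP \Big)^2.$$
Necessary and sufficient conditions for $\mathcal G$ to be a Glivenko-Cantelli class are:
$$G \in L_1(P)$$
and
$$H_1(\delta, \mathcal G, P_n) = o_P(n), \quad \delta > 0.$$
If $\mathcal G$ is a Donsker class, then the asymptotic equicontinuity condition holds, i.e. for all $\epsilon > 0$ there exists a $\delta > 0$ such that
$$\limsup_{n \to \infty} P\Big( \sup_{g \in \mathcal G_\delta} \Big| \sqrt n \int g\,d(P_n - P) \Big| > \epsilon \Big) < \epsilon, \tag{3.7}$$
with $\mathcal G_\delta = \{ (g_1 - g_2) : g_1, g_2 \in \mathcal G,\ \rho_P(g_1, g_2) \le \delta \}$. Sufficient conditions for $\mathcal G$ to be a Donsker class are:
$$G \in L_2(P)$$
and
$$\int_0^1 \sqrt{H_2^B(\delta, \mathcal G, P)}\,d\delta < \infty. \tag{3.8}$$
Condition (3.8) may be replaced by
$$\int_0^1 \sqrt{\mathcal H(\delta, \mathcal G)}\,d\delta < \infty, \tag{3.9}$$
where
$$\mathcal H(\delta, \mathcal G) = \sup_Q H_2\Big( \delta \Big( \int G^2\,dQ \Big)^{1/2}, \mathcal G, Q \Big) \tag{3.10}$$
and where the supremum is taken over all probability measures $Q$ with finite support.

Suppose now that (2.9) is met, and that for $0 < \eta_n \to 0$,
$$\sup_{d(F, F_0) \le \eta_n}\ \sup_{0 \le \alpha < 1} \int (b_{F_\alpha} - b_{F_0})^2\,dP \to 0. \tag{3.11}$$
Then it is clear from the above that (2.10) and (2.11) are fulfilled if one of the entropy conditions (3.8) or (3.9) holds, with
$$\mathcal G = \{ b_{F_\alpha} : d(F, F_0) \le \eta,\ 0 \le \alpha < 1 \}.$$
In many applications however, no explicit expressions are available for the efficient influence curves, so that it may still be difficult to check the entropy conditions.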
4. Examples
Example 4.1: interval censored observations, case I
Suppose one observes $X_i = (T_i, \Delta_i)$, where $\Delta_i = \mathbf 1\{Y_i \le T_i\}$, $Y_i$ and $T_i$ are independent random variables, both with values in a bounded interval, say $(0, 1]$, and where $T_i$ has (unknown) distribution $G$, $i = 1, \ldots, n$. This is one of the models studied in Groeneboom and Wellner (1992). The density of $X_i$ with respect to $G \times \nu$, $\nu$ being the counting measure on $\{0, 1\}$, is
$$p_{F_0}(t, \delta) = \delta F_0(t) + (1 - \delta)(1 - F_0(t)) = \int k(t, \delta, y)\,dF_0(y)$$
with $k(t, \delta, y) = \delta\,\mathbf 1\{y \le t\} + (1 - \delta)\,\mathbf 1\{y > t\}$.

Lemma 4.1. Suppose $G$ has density $g$ with respect to Lebesgue measure. Assume that $\dot a(y) = da(y)/dy$ exists and
$$\Big| \frac{\dot a(t)}{g(t)} \Big| \le C_1 < \infty, \quad t \in (0, 1], \tag{4.1}$$
and
$$\Big| \frac{d(\dot a(t)/g(t))}{dF_0(t)} \Big| \le C_2 < \infty, \quad t \in (0, 1]. \tag{4.2}$$
Then
$$\hat\theta_n - \theta_0 = \int b_{F_0}\,d(P_n - P) + o_P(n^{-1/2}),$$
where
$$b_{F_0}(t, \delta) = -\delta (1 - F_0(t)) \frac{\dot a(t)}{g(t)} + (1 - \delta) F_0(t) \frac{\dot a(t)}{g(t)}, \quad t \in (0, 1],\ \delta \in \{0, 1\}.$$
( ^F
nF
0)!0 almost surely,j^n;0j!0 almostsurely and Z
log(
p
F^np
F0)dP
n =O
P(n
;2=3):
This follows from the theory in Subsection 3.1 (see also Groeneboom and Wellner (1992) and Van de Geer (1993a)). Dene
b
F(t
) =;(1;F
(t
))_a
(t
)g
(t
) + (1;)F
(t
)_a
(t
)g
(t
)t
2(01] 2f01g:
ThenA
b
F(y
) = ;Z
1
y;(1;
F
(t
))_a
(t
)dt
+Z y;
0
F
(t
)_a
(t
)dt
=
a
(y
);a
(0);Z
1
0
(1;
F
(t
;))_a
(t
)dt
=F(y
)y
2(01]:
Because
F
dominatesF
0,dF
0=dF
exists. Soh
F(y
) =;d
(F
(y
)(1;F
(y
))_a
(y
)=g
(y
))dF
(y
)= (2
F
(y
);1)_a
(y
)g
(y
) +F
(y
;)(1;F
(y
;))g
2(y
)d
(_a
(y
)=g
(y
))dF
0(y
)dF
0(y
)dF
(y
) exists too. Moreover, forp
F(t
)>
0,A
Fh
F(t
) =Rt
0
h
FdF
F
(t
) + (1;)R
t1
h
FdF
1;
F
(t
) =b
F(t
):
Thus, Condition 2 is fullled.Clearly,
j
h
F(y
)jC
1+C
2 (1; ):
Therefore, Condition 3 holds too.Finally, we shall verify Condition 4. Note rst that
Z
b
2F0dP
=Z
F
0(t
)(1;F
0(t
))(_a
(t
))2g
(t
)dt >
0 andj
b
F(t
)jC
1:
To check (2.10) and (2.11), we use the theory of Subsection 3.4. We have
d
2(FF
0) = 12(Z (pF
;pF
0)2dG
+Z
(p1;
F
;p1;F
0)2dG
) andZ
(
b
F;b
F0)2dP
=Z (
F
(t
);F
0(t
))2(_a
(t
))2g
(t
)dt C
12d
2(F
F
0):
So indeed, (3.11) is satised: for 0<
n !0,d(FFsup0)n sup
0<1
Z
(
b
F;b
F0)2dP
!0:
Also (3.8) holds, namely forG=f
b
F:d
(FF
0) 0<
1g, we have for some constantA
,H
2B(GP
)A
1>
0:
This follows from entropy calculations for monotone functions (see Birman and Solomjak (1967) and, for the extension to entropy with bracketing, Van de Geer (1991)). Therefore, (2.10) and (2.11) are fullled. In view of
Theorem 2.1, this completes the proof. tu
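For readers who want to experiment with Lemma 4.1, the following sketch simulates the case I model with $a(y) = y$ and computes the maximum likelihood estimator at the ordered observation times as the isotonic regression of the indicators, a characterization discussed in Groeneboom and Wellner (1992). The Beta(2, 2) choice for $F_0$, the uniform choice for $G$, the sample size, the helper name `pava` and the convention used to turn the fitted values into a mean are all assumptions made for illustration.

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: non-decreasing least-squares fit to y."""
    blocks = []                                   # each block is [sum, count]
    for v in y:
        blocks.append([float(v), 1])
        # merge while the previous block mean exceeds the last block mean
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

rng = np.random.default_rng(2)
n = 2000
y = rng.beta(2.0, 2.0, size=n)          # unobserved Y_i, assumed F_0 = Beta(2, 2), mean 1/2
t = rng.uniform(size=n)                 # observation times T_i, assumed G = Uniform(0, 1)
delta = (y <= t).astype(float)          # Delta_i = 1{Y_i <= T_i}

order = np.argsort(t)
t_sorted = t[order]
F_hat = pava(delta[order])              # fitted values of F_0 at the ordered T_i

# Mean of F_hat via integral of (1 - F_hat(t)) over (0, 1], treating F_hat as a
# right-continuous step function that is 0 before the first observation time.
knots = np.concatenate([t_sorted, [1.0]])
theta_hat = t_sorted[0] + np.sum((1.0 - F_hat) * np.diff(knots))
```

The estimator is only determined at the observation times, so the step-function convention used for `theta_hat` is one of several reasonable choices; under the assumptions above the value should be close to the true mean $1/2$.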
The result of Lemma 4.1 was established earlier in Groeneboom and Wellner (1992). They use specific properties of the maximum likelihood estimator $\hat F_n$, such as the distance between successive jumps of $\hat F_n$ being smaller than $n^{-1/3} \log n$ with large probability. In this sense, local properties of $\hat F_n$ had to be obtained first, before one could arrive at the asymptotic behaviour of such global quantities as the mean of the maximum likelihood estimator. It inspired us to develop an alternative proof, which is hopefully applicable in more general situations.

The model for interval censored observations gives a good insight into the difficulties that arise due to the fact that $\hat F_n$ does not dominate $F_0$. Let us have a closer look for the case $a(y) = y$. It is known that $\hat F_n$ has finite support, say $\hat z_1 < \cdots < \hat z_m$. Define
$$\hat g(t) = \frac{G(\hat z_j) - G(\hat z_{j-1})}{\hat z_j - \hat z_{j-1}}, \quad t \in (\hat z_{j-1}, \hat z_j],$$
and
$$b_{\hat F_n}(t, \delta) = -\delta\,\frac{1 - \hat F_n(t)}{\hat g(t)} + (1 - \delta)\,\frac{\hat F_n(t)}{\hat g(t)}.$$
Then for $y \in \{\hat z_1, \ldots, \hat z_m\}$,
$$A^* b_{\hat F_n}(y) = y - \hat\theta_n = \dot\theta_{\hat F_n}(y).$$
However,
$$\int A^* b_{\hat F_n}\,dF_0 = -(\hat\theta_n - \tilde\theta_n)$$
with
$$\tilde\theta_n = \int \int^{y} \frac{1}{\hat g(t)}\,dG(t)\,dF_0(y).$$
In general, $\tilde\theta_n \ne \theta_0$, unless $G$ happens to be the uniform distribution on $(0, 1]$.

Write
$$h_{\hat F_n}(y) = -\frac{d\big( \hat F_n(y)(1 - \hat F_n(y))/\hat g(y) \big)}{d\hat F_n(y)} = \frac{2\hat F_n(y) - 1}{\hat g(y)} + \frac{\hat F_n(y-)(1 - \hat F_n(y-))}{\hat g(y)\,\hat g(y-)} \frac{d\hat g(y)}{d\hat F_n(y)}.$$
In order to have that at $\hat F_n$, the derivative of the likelihood equals zero in the direction $h_{\hat F_n}$, one must have that this direction is bounded. This leads to showing that the jumps of $\hat F_n$ are large enough.

Example 4.2: interval censored observations, case II
Let $X_i = (T_i, U_i, \Delta_i, \Gamma_i)$, with $\Delta_i = \mathbf 1\{Y_i \le T_i\}$, $\Gamma_i = \mathbf 1\{T_i < Y_i \le U_i\}$ and $Y_i$ independent of $(T_i, U_i)$, $i = 1, \ldots, n$
. We assume bounded support, sayT
iU
iand