
DOI:10.1051/ps:2008014 www.esaim-ps.org

ESTIMATION AND TESTS IN FINITE MIXTURE MODELS OF NONPARAMETRIC DENSITIES

Odile PONS 1

Abstract. The aim is to study the asymptotic behavior of estimators and tests for the components of identifiable finite mixture models of nonparametric densities with a known number of components.

Conditions for identifiability of the mixture components and convergence of identifiable parameters are given. The consistency and weak convergence of the identifiable parameters and test statistics are presented for several models.

Résumé. In mixture models of nonparametric densities, one question is to determine the asymptotic behaviour of estimators and of test statistics on the identifiable components. Nonparametric mixture models of a known number of densities are considered. Conditions for identifiability and for the convergence of the identifiable parameters and functions are presented. The behaviour of the test statistics is described and estimators of the components of the densities are defined in several cases.

Mathematics Subject Classification. 62G10, 62H10.

Keywords and phrases. Mixture models, nonparametric densities, test, weak convergence.

Received April 25, 2007. Revised January 16, 2008.

1 pons.odile@free.fr

Article published by EDP Sciences. © EDP Sciences, SMAI 2009.

1. Introduction

Consider a real random variable $X$ on a probability space $(\Omega,\mathcal A,P_0)$ and a family $\mathcal F$ of densities with respect to a probability $\mu$ on $(\Omega,\mathcal A)$, such that the density $f_0=\mathrm dP_0/\mathrm d\mu$ belongs to $\mathcal F$ and the functions of $\mathcal F$ are $L^2(\mu)$-integrable; $\mathcal F$ is then a metric space with the metric $d_2(f,f')=\|f-f'\|_{L^2(\mu)}$.

For a fixed number $p$, the mixture of $p$ components of $\mathcal F$ is defined by a vector of $p$ unknown densities $f^{(p)}=(f_1,\dots,f_p)$ of $\mathcal F^p$ with unknown proportions in $\mathcal S_p=\{\lambda^{(p)}=(\lambda_1,\dots,\lambda_p)\in\,]0,1[^p;\ \sum_{j=1}^p\lambda_j=1\}$. The density of the mixture is $g_{f^{(p)},\lambda^{(p)}}=\sum_{j=1}^p\lambda_jf_j$ and we consider the functional set $\mathcal G_p=\{g_{f^{(p)},\lambda^{(p)}};\ f^{(p)}\in\mathcal F^p,\ \lambda^{(p)}\in\mathcal S_p\}$.

The densities and the proportions of a mixture $g_{f^{(p)},\lambda^{(p)}}$ of $\mathcal G_p$ are identifiable if, for any $f'^{(p)}$ in $\mathcal F^p$ and $\lambda'^{(p)}$ in $\mathcal S_p$, the equality $\sum_{j=1}^p\lambda_jf_j=\sum_{j=1}^p\lambda'_jf'_j$, $\mu$-a.s., implies that there exists a permutation $\pi$ of $\{1,\dots,p\}$ such that $\lambda_j=\lambda'_{\pi(j)}$ and $f_j=f'_{\pi(j)}$ $\mu$-a.s., for $j=1,\dots,p$. The identifiability condition is written as a condition on the parameters of a nonparametric family $\mathcal F$: it requires that the densities cannot be confounded and that a mixture of two components $f_1$ and $f_2$ of $\mathcal F$ does not belong to $\mathcal F$.
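As a concrete illustration of the set $\mathcal G_p$, the following minimal numerical sketch evaluates and samples a two-component mixture; the normal components and the proportions $(0.3,0.7)$ are purely illustrative assumptions, since $\mathcal F$ is nonparametric here.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
lam = np.array([0.3, 0.7])                       # proportions lambda^(p) in S_p
comps = [norm(-1.0, 0.5), norm(2.0, 1.0)]        # hypothetical components f_1, f_2

def g(x):
    # mixture density g(x) = sum_j lambda_j f_j(x)
    return sum(l * c.pdf(x) for l, c in zip(lam, comps))

def sample(n):
    # draw a component label j with probability lambda_j, then draw from f_j
    labels = rng.choice(len(lam), size=n, p=lam)
    return np.concatenate([comps[k].rvs(size=np.sum(labels == k), random_state=rng)
                           for k in range(len(lam))])

x = sample(1000)
print(x.mean(), g(np.array([-1.0, 0.0, 2.0])))   # mean close to 0.3*(-1) + 0.7*2 = 1.1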

For a parametric mixture of two components $g=\lambda f+(1-\lambda)f_0$ of $\mathcal G_2$, the likelihood ratio test (LRT) for the hypothesis $H_0\colon g=f_0$ is studied through a reparametrization, because the information matrix is not positive definite and either $\lambda$ or $f$ may be considered as a nuisance parameter under $H_0$. Conditions and a proof of the convergence of the LRT when $f_0$ is known (admixture or contamination model) were given in [4,5] for parametric families of two or $p\ge2$ components, for a test of a known density of $\mathcal G_q$ against $\mathcal G_p$, $q<p$. The results were generalized to an unknown density $f_0$ by plugging an estimator of $f_0$ into the same statistic, for identifiable families, and they are easily adapted for testing sub-models of the semi-parametric mixture $g=\sum_{j=1}^p\lambda_jf(\eta(\cdot-\mu))$, with $f$ in $\mathcal F$ and $\theta=(\eta,\mu)$ in a Borel compact subset of $\mathbb R^2$. The methods are extended here to nonparametric density families $\mathcal G_2$ with identifiable components: the parameter of interest for an admixture is $\lambda d_2(f,f_0)$, which is zero if and only if $H_0$ holds; for a mixture it is $\lambda d_2(f_1,f_2)$, with $\lambda\le1/2$. Wald's method [8] for the proof of consistency in the parametric case is no longer sufficient, and the asymptotic distribution is not Gaussian, as was already known for parametric distributions. The present conditions are sufficient to remove the assumption of uniform convergence of small-order terms present in most papers on expansions of the LRT in parametric models [1–4,6]. Under the alternative, direct estimators of $\lambda$ and $f$ are built and they satisfy the usual convergences. The tests are extended to finite mixtures of nonparametric densities and the estimators to mixtures of two nonparametric symmetric densities.

2. Admixture models for a density

Let $(X_1,\dots,X_n)$ be an i.i.d. sample whose density has two mixture components, $g=\lambda f+(1-\lambda)f_0$ in $\mathcal G_{2,0}$, the subset of $\mathcal G_2$ of the mixtures with a known $f_0$. Denote $d_{\chi^2}(f,f')=\int(f-f')^2f'^{-1}\,\mathrm d\mu$ and assume that

A1: $\lambda$ and $\mathcal F$ are identifiable: if $\lambda f+(1-\lambda)f_0=\lambda'f'+(1-\lambda')f_0$, $\mu$-a.s., with $\lambda$ and $\lambda'$ in $[0,1]$, $f$ and $f'$ in $\mathcal F$, then $\lambda=\lambda'$ and $f=f'$, $\mu$-a.s.;

A2: there exists a function $\varphi_0$ in $L^2(P_0)$ s.t.

$\limsup_{d_2(f,f_0)\to0}\;d_2^{-1}(f,f_0)\,\|f_0^{-1}(f-f_0)-d_2(f,f_0)\varphi_0\|_{L^2(P_0)}=0.$

For $f$ in $\mathcal F$ and $\lambda\in\,]0,1[$, denote $U=\{d_2(f,f_0);f\in\mathcal F\}\subset\mathbb R^+$,

$\alpha_f=\lambda d_2(f,f_0),\qquad \varphi_f=d_2^{-1}(f,f_0)\,f_0^{-1}(f-f_0)\ \text{if }f\ne f_0,$ (2.1)

and $\varphi_f=\varphi_0$, defined by A2, if $f=f_0$; let $D_{\mathcal F}=\{\varphi_f;f\in\mathcal F\}$ denote the $L^2(P_0)$-closure, defined using A2, of the set of functions $\varphi_f$. Let $E_0$ denote the expectation under $P_0$; further conditions are

A3: $E_0\sup_{f\in\mathcal F}|\log f|<\infty$ and $E_0\sup_{\varphi\in D_{\mathcal F}}|\varphi(X)|^3<\infty$;

A4: $\sup_{\varphi\in D_{\mathcal F}}|n^{-1/2}\sum_{i=1}^n\varphi(X_i)-G_\varphi|$ tends to zero in $P_0$-probability, with $G_\varphi$ a centred Gaussian process with variance $\Sigma_\varphi=E_0\varphi^2(X)$ and covariance $\Sigma_{\varphi_1\varphi_2}=E_0\varphi_1(X)\varphi_2(X)$.

Remark 1. The $L^2(P_0)$-norm of the convergence in A2 is a $d_{\chi^2}$-derivability of $\mathcal F$ at $f_0$. It is not symmetric with respect to its arguments $f$ and $f_0$, unlike the $L^2$ distance with respect to $\mu$, and it is weaker than the Hellinger-derivability of $\mathcal F$. It appears naturally in the convergence, under the null hypothesis, of the variance of the likelihood ratio after the reparametrization. The function $\varphi_0$ is related to the direction of the approximation of $f_0$ by $f$.

Remark 2. By A2–A3 and the definition of $D_{\mathcal F}$, $E\varphi(X)=0$ and the weak convergence of $n^{-1/2}\sum_{i=1}^n\varphi(X_i)$ holds for every $\varphi$ of $D_{\mathcal F}$, and also for $n^{-1/2}\sum_{i=1}^n\{\sup_{D_{\mathcal F}}\varphi(X_i)-E_0\sup_{D_{\mathcal F}}\varphi(X)\}$ by integrability of $\sup_{D_{\mathcal F}}\varphi^2(X)$. Condition A4 is satisfied under a stronger condition on the dimension of $D_{\mathcal F}$.

2.1. Tests for the density $f_0$

Under the alternative, a density is written

$g_{\lambda,f}=f_0\{1+\lambda f_0^{-1}(f-f_0)\}=f_0(1+\alpha_f\varphi_f),\qquad \alpha_f\ge0.$
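The reparametrization can be checked numerically: with $\mu$ the Lebesgue measure and hypothetical normal choices for $f$ and $f_0$, the admixture density $\lambda f+(1-\lambda)f_0$ coincides with $f_0(1+\alpha_f\varphi_f)$ up to rounding errors.

import numpy as np
from scipy.stats import norm

f0, f = norm(0.0, 1.0).pdf, norm(0.5, 1.0).pdf   # hypothetical f_0 and f
lam = 0.2
x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]

d2 = np.sqrt(np.sum((f(x) - f0(x)) ** 2) * dx)   # d_2(f, f_0) = ||f - f_0||_{L2(mu)}
alpha = lam * d2                                 # alpha_f = lambda d_2(f, f_0)
phi = (f(x) - f0(x)) / (d2 * f0(x))              # phi_f = d_2^{-1}(f, f_0) f_0^{-1}(f - f_0)

lhs = lam * f(x) + (1.0 - lam) * f0(x)           # g_{lambda, f}
rhs = f0(x) * (1.0 + alpha * phi)                # f_0(1 + alpha_f phi_f)
print(np.max(np.abs(lhs - rhs)))                 # zero up to floating-point rounding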


The parameter set for a test of $f_0$ against an admixture alternative is $\{\alpha_f\ge0;\,f\in\mathcal F\}$ and the hypothesis $H_0$ is equivalent to the existence of a function $f$ in $\mathcal F$ such that $\alpha_f=0$, or $\inf_{f\in\mathcal F}\alpha_f=0$. For a sample $(X_i)_{1\le i\le n}$ of $X$, the LRT statistic is written

$T_n=2\sup_{\lambda\in\,]0,1[}\,\sup_{f\in\mathcal F}\,\sum_{1\le i\le n}\{\log g_{\lambda,f}(X_i)-\log f_0(X_i)\}=2\sup_{\alpha\in U}\,\sup_{\varphi\in D_{\mathcal F}}l_n(\alpha,\varphi),$

with $l_n(\alpha,\varphi)=\sum_{1\le i\le n}\log\{1+\alpha\varphi(X_i)\}$.
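In practice $T_n$ can only be approximated over a finite dictionary of alternatives; the sketch below maximizes $l_n(\alpha,\varphi)$ on a grid of $\alpha$ for a few hypothetical normal candidates $f$, which is a simplifying assumption and not the paper's procedure.

import numpy as np
from scipy.stats import norm

def Tn(x, f0, candidates, grid):
    # T_n = 2 sup_phi sup_alpha l_n(alpha, phi), over a finite dictionary of candidates f
    dx, best = grid[1] - grid[0], 0.0
    for f in candidates:
        d2 = np.sqrt(np.sum((f(grid) - f0(grid)) ** 2) * dx)   # d_2(f, f_0)
        a = (f(x) - f0(x)) / (d2 * f0(x))                      # a_i = phi_f(X_i)
        for alpha in np.linspace(0.0, d2, 200)[1:]:            # alpha_f = lambda d_2(f, f_0), lambda in ]0,1[
            v = 1.0 + alpha * a
            if np.all(v > 0):                                  # keep the mixture density positive at the data
                best = max(best, np.sum(np.log(v)))            # l_n(alpha, phi)
    return 2.0 * best

rng = np.random.default_rng(1)
x = rng.normal(size=200)                                       # sample drawn under H_0: f_0 = N(0,1)
grid = np.linspace(-10.0, 10.0, 4001)
cands = [norm(m, 1.0).pdf for m in (0.5, 1.0, 2.0)]            # hypothetical alternatives f
print(Tn(x, norm(0.0, 1.0).pdf, cands, grid))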

Lemma 2.1. The estimator $\widehat\alpha_n(\varphi)=\arg\max_{\alpha\in U}l_n(\alpha,\varphi)$ converges $P_0$-a.s. to zero, uniformly on $D_{\mathcal F}$.

Proof. For any $\varphi\in D_{\mathcal F}$, the MLE $\widehat g_{\varphi,n}=f_0\{1+\widehat\alpha_n\varphi\}$ of a density $g_{\alpha,\varphi}=f_0\{1+\alpha\varphi\}$, with $\alpha\ge0$ and $\widehat\alpha_n\ge0$, converges at the parametric rate, therefore $\sup_{\mathbb R}|\widehat g_{\varphi,n}-f_0|\le\sup_{\mathbb R}|\widehat g_n-f_0|$ for any other estimator $\widehat g_n$ built without this parametric model. For every $\varepsilon>0$ there exists $\delta_\varepsilon>0$ s.t. for $n$ large enough

$P_0\{\sup_{\varphi\in D_{\mathcal F}}\widehat\alpha_n(\varphi)>\varepsilon\}\le P_0\big\{\sup_{\varphi\in D_{\mathcal F}}n^{-1}\{l_n(\widehat\alpha_n(\varphi),\varphi)-l_n(0,\varphi)\}>\delta_\varepsilon\big\}\le P_0\big\{\int_{f_0>0}\sup_{\varphi\in D_{\mathcal F}}\log\{1+\widehat\alpha_n(\varphi)\varphi\}\,\mathrm dP_n>\delta_\varepsilon\big\},$

for the empirical d.f. $P_n$ of $P_0$. Let $\bar g_n=\sup_{\varphi\in D_{\mathcal F}}f_0\{1+\widehat\alpha_n(\varphi)\varphi\}$, then

$P_0\{\sup_{\varphi\in D_{\mathcal F}}\widehat\alpha_n(\varphi)>\varepsilon\}\le P_0\Big\{\int_{f_0>0}\log\frac{\bar g_n}{f_0}\,\mathrm dP_n>\delta_\varepsilon\Big\}\le P_0\Big\{\int_{f_0>0}\log\frac{\widehat g_n}{f_0}\,\mathrm dP_n>\delta_\varepsilon\Big\}\le P_0\Big\{\int_{f_0>0}\log\frac{\widehat g_n}{f_0}\,\mathrm d(P_n-P_0)>\delta_\varepsilon\Big\}$

and this sequence tends to zero by the weak convergence of $P_n$.

Let $Y_n$ be the process defined by $Y_n(\varphi)=n^{-1/2}\sum_{i=1}^n\varphi(X_i)$, with variance-covariance $\Sigma$, let $Z_n=\Sigma^{-1/2}Y_n$, and let $Z$ be a centred and continuous Gaussian process on $D_{\mathcal F}$ with covariance $\Sigma_{\varphi_1}^{-1/2}\Sigma_{\varphi_1\varphi_2}\Sigma_{\varphi_2}^{-1/2}$.

Theorem 2.2. Under $H_0$, $\sup_{\varphi\in D_{\mathcal F}}|n^{1/2}\widehat\alpha_n(\varphi)-(\Sigma_\varphi)^{-1/2}Z(\varphi)1_{\{Z(\varphi)>0\}}|$ converges in probability to zero and $T_n$ converges weakly to the variable $\sup_{\varphi\in D_{\mathcal F}}Z^2(\varphi)1_{\{Z(\varphi)>0\}}$.

Proof. Let $\varphi$ in $D_{\mathcal F}$, let $\dot l_n(\cdot,\varphi)$ be the derivative of $l_n(\cdot,\varphi)$ with respect to $\alpha$, and let $\widehat\alpha_n(\varphi)$ be the maximizer under the constraint $\widehat\alpha_n\ge0$; then $\widehat\alpha_n(\varphi)=0$ or it is solution of

$0=n^{-1/2}\dot l_n(\widehat\alpha_n(\varphi),\varphi)1_{\{\widehat\alpha_n(\varphi)>0\}}=n^{-1/2}\sum_i\frac{a_i(\varphi)}{1+\widehat\alpha_n(\varphi)a_i(\varphi)}\,1_{\{\widehat\alpha_n(\varphi)>0\}}.$ (2.2)

For every $\varphi\in D_{\mathcal F}$, let $a_i\equiv a_i(\varphi)=\varphi(X_i)$ and $R_{1n}=n^{-1}\sum_ia_i^3(1+\widehat\alpha_na_i)^{-1}$. If $\widehat\alpha_n(\varphi)>0$, the score equation is simply written

$0=Y_n(\varphi)-n^{1/2}\widehat\alpha_n(\varphi)\big\{n^{-1}\sum_ia_i^2(\varphi)-\widehat\alpha_n(\varphi)R_{1n}(\varphi)\big\}.$ (2.3)

By A3, $\sup_{\varphi\in D_{\mathcal F}}|n^{-1}\sum_i\{\varphi^2(X_i)-\Sigma_\varphi\}|$ converges $P_0$-a.s. to zero and (2.3) becomes $0=Y_n(\varphi)-n^{1/2}\widehat\alpha_n(\varphi)\{\Sigma_\varphi+o_p(1)-\widehat\alpha_n(\varphi)R_{1n}(\varphi)\}$. As proved in [7],

$n^{1/2}\widehat\alpha_n(\varphi)=(\Sigma_\varphi)^{-1/2}Z_n(\varphi)1_{\{Z_n(\varphi)>0\}}+o_p(1).$ (2.4)

By integration of $R_{1n}$, $l_n(\widehat\alpha_n(\varphi),\varphi)=n^{1/2}\widehat\alpha_n(\varphi)Y_n(\varphi)-\frac12\widehat\alpha_n^2(\varphi)\sum_i\varphi^2(X_i)+R_{2n}(\varphi)$, where

$R_{2n}=\sum_i\int_0^{\widehat\alpha_n(\varphi)}\frac{a_i^3\alpha^2}{1+\alpha a_i}\,\mathrm d\alpha=\sum_i\frac{\widehat\alpha_n\,\alpha_n^{*2}\,a_i^3}{1+\alpha_n^*a_i},$

with $\alpha_n^*(\varphi)$ between $0$ and $\widehat\alpha_n(\varphi)$. As the functions $\alpha\mapsto a_i^3(1+\alpha a_i)^{-1}$ are decreasing and from the bound established for $R_{1n}$, $n^{-1}|R_{2n}|\le\widehat\alpha_n^3\,n^{-1}|\sum_ia_i^3|$. Then $\sup_{D_{\mathcal F}}n^{-1}|R_{2n}(\varphi)|=o_p(1)$ and

$l_n(\widehat\alpha_n(\varphi),\varphi)=n^{1/2}\widehat\alpha_n(\varphi)Y_n(\varphi)-\frac n2\widehat\alpha_n^2(\varphi)\Sigma_\varphi+o_p(1)$

uniformly on $D_{\mathcal F}$. The expression (2.4) of $\widehat\alpha_n(\varphi)$ implies a uniform approximation of $T_n$ as $T_n=2\sup_{\varphi\in D_{\mathcal F}}l_n(\widehat\alpha_n(\varphi),\varphi)=\sup_{\varphi\in D_{\mathcal F}}[Z_n^2(\varphi)1_{\{Z_n(\varphi)>0\}}]+o_P(1)$.
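Once $D_{\mathcal F}$ is discretized, the limiting variable $\sup_\varphi Z^2(\varphi)1_{\{Z(\varphi)>0\}}$ can be simulated; the $3\times3$ correlation matrix below, standing for $\Sigma_{\varphi_1}^{-1/2}\Sigma_{\varphi_1\varphi_2}\Sigma_{\varphi_2}^{-1/2}$ on three directions, is an illustrative assumption (for a single direction the limit is the half-chi-square $\frac12\delta_0+\frac12\chi^2_1$).

import numpy as np

rng = np.random.default_rng(2)
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.6],
              [0.3, 0.6, 1.0]])                            # assumed correlations of Z on three directions
L = np.linalg.cholesky(R)

def limit_draws(n_sim):
    z = rng.standard_normal((n_sim, R.shape[0])) @ L.T     # centred Gaussian vector with correlation R
    return np.max(np.where(z > 0.0, z ** 2, 0.0), axis=1)  # sup_phi Z^2(phi) 1{Z(phi) > 0}

draws = limit_draws(100_000)
print(np.quantile(draws, 0.95))                            # approximate 95% critical value for T_n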

With $p$ additional components in the mixture under the alternative, the metric is written $d_2(f,f')=\sum_{k=1}^p\|f_k-f'_k\|_{L^2(\mu)}$ for two densities $f=(f_1,\dots,f_p)$ and $f'=(f'_1,\dots,f'_p)$. The density $g_{\lambda,f}=(1-\lambda)f_0+\sum_{k=1}^p\lambda_kf_k$ becomes

$g_{\lambda,f}=f_0\big\{1+\sum_{k=1}^p\lambda_kf_0^{-1}(f_k-f_0)\big\}=f_0(1+\alpha_f^T\varphi_f),$

where the coefficient of $f_0$ satisfies $\lambda=\sum_{k=1}^p\lambda_k$, and $\alpha_f=(\alpha_{fk})_{1\le k\le p}$ and $\varphi_f=(\varphi_{fk})_{1\le k\le p}$ are given by

$\alpha_{fk}=\lambda_kd_2(f_k,f_{0k}),\qquad \varphi_{fk}=d_2^{-1}(f_k,f_{0k})\,f_{0k}^{-1}(f_k-f_{0k})\ \text{if }f\ne f_0,$ (2.5)

and $\varphi_{0k}$ defined as in A2 if $f_k=f_{0k}$. The hypothesis $H_0$ is equivalent to $\alpha_f=0$ and the result of Lemma 2.1 still holds. The process $Y_n(\varphi)$ is a $p$-dimensional process with a diagonal variance $\Sigma=(\Sigma_{\varphi k})_{1\le k\le p}$, and Theorem 2.2 adapts to each component with the same proof, using tensor products and norms of the vectors already defined when necessary. Denoting $1_{\{Z(\varphi)>0\}}$ the vector of indicators $1_{\{Z_k(\varphi)>0\}}$, we obtain

Theorem 2.3. Under $H_0$, $\sup_{\varphi\in D_{\mathcal F}}|n^{1/2}\widehat\alpha_n(\varphi)-(\Sigma_\varphi)^{-1/2}Z(\varphi)1_{\{Z(\varphi)>0\}}|$ converges in probability to zero and $T_n$ converges weakly to the variable $\sup_{\varphi\in D_{\mathcal F}}\|Z(\varphi)1_{\{Z(\varphi)>0\}}\|^2$.

When $f_0$ is already a known mixture of functions in $\mathcal F$, the result is extended as in [5] for mixtures of several parametric densities.

2.2. Estimation of the densities and mixture coefficients

Under the alternative, $\lambda_0\notin\{0,1\}$ and the model for the mixture density is $g_0=\lambda f+(1-\lambda)f_0$, where $\lambda$ belongs to a closed subset of $]0,1[$. We consider a class of functions $\mathcal F$ with compact supports, not all confounded with the support $\operatorname{supp}F_0$ of the distribution function $F_0$; then both $\lambda_0$ and $f$ are identifiable and may be estimated. If all the functions of $\mathcal F$ have the same support, further conditions are necessary for the identifiability of the components of $g_0$. Estimators of $\lambda$ and $f$ are obtained after the estimation of the different supports.


Let $G_n$ be the empirical estimator of the d.f. of the variable $X$, with support $\operatorname{supp}G_n=[\min_iX_i,\max_iX_i]$, and

$G_n(x)=n^{-1}\sum_{i=1}^n1_{\{X_i\in\operatorname{supp}F_0\}}1_{\{X_i\le x\}}-F_0(x)$

a consistent estimator of $(G_0-F_0)1_{\operatorname{supp}F\cap\operatorname{supp}F_0}=\lambda(F-F_0)1_{\operatorname{supp}F\cap\operatorname{supp}F_0}$, with support estimated by $\operatorname{supp}G_n$. An estimator of $\lambda_0$ is deduced from a classification of the observations into three intervals,

$I_{1n}=\operatorname{supp}G_n,\qquad I_{2n}=\operatorname{supp}F_0\setminus\operatorname{supp}G_n,\qquad I_{3n}=\operatorname{supp}G_n\setminus\operatorname{supp}F_0.$

The following identities,

$\int_{I_{2n}}\mathrm dG_n=(1-\lambda)\int_{I_{2n}}\mathrm dF_0,$

$\int_{I_{1n}}1_{\{x\le t\}}\,\mathrm dG_n(x)=\lambda\int_{I_{1n}}1_{\{x\le t\}}\,\mathrm d\widehat F_n(x)+(1-\lambda)\int_{I_{1n}}1_{\{x\le t\}}\,\mathrm dF_0(x),$

$\int_{I_{3n}}1_{\{x\le t\}}\,\mathrm dG_n(x)=\lambda\int_{I_{3n}}1_{\{x\le t\}}\,\mathrm d\widehat F_n(x),$

imply simple expressions for the estimators of $\lambda_0$ and $F$ on $I_{1n}$ and $I_{3n}$:

$\widehat\lambda_n=1-\frac{\int_{I_{1n}}\mathrm dG_n}{\int_{I_{1n}}\mathrm dF_0},$ (2.6)

$\int_{I_{1n}}1_{\{x\le t\}}\,\mathrm d\widehat F_n(x)=\widehat\lambda_n^{-1}\Big\{\int_{I_{1n}}1_{\{x\le t\}}\,\mathrm dG_n(x)-\frac{\int_{I_{1n}}\mathrm dG_n}{\int_{I_{1n}}\mathrm dF_0}\int_{I_{1n}}1_{\{x\le t\}}\,\mathrm dF_0(x)\Big\},$

$\int_{I_{3n}}1_{\{x\le t\}}\,\mathrm d\widehat F_n(x)=\widehat\lambda_n^{-1}\int_{I_{3n}}1_{\{x\le t\}}\,\mathrm dG_n(x).$
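The ratio structure of (2.6) can be illustrated numerically: on the part of $\operatorname{supp}F_0$ charged only by $f_0$, the sample mass equals $(1-\lambda)$ times the $F_0$-mass. The choices $f_0=\mathrm U(0,1)$, $f=\mathrm U(0.5,1.5)$ and $\lambda=0.3$ in this sketch are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)
n, lam = 5000, 0.3
from_f = rng.random(n) < lam                               # latent component labels of the mixture
x = np.where(from_f, rng.uniform(0.5, 1.5, n), rng.uniform(0.0, 1.0, n))

only_f0 = (x >= 0.0) & (x < 0.5)                           # region of supp F_0 where only f_0 contributes
F0_mass = 0.5                                              # integral of dF_0 over that region
lam_hat = 1.0 - only_f0.mean() / F0_mass                   # empirical analogue of 1 - (int dG_n)/(int dF_0)
print(lam_hat)                                             # close to 0.3 for large n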

Let $K$ be a symmetric kernel in $L^2(\mu)$ and $h$ a window, and let $\widehat g_n(x)=n^{-1}\sum_{i=1}^nK_h(x-X_i)$ be an estimator of $g_0$; then the unknown density of the mixture is estimated by

$\widehat f_n(x)=\widehat\lambda_n^{-1}\Big\{\widehat g_n(x)-\frac{\int_{I_{1n}}\mathrm dG_n}{\int_{I_{1n}}\mathrm dF_0}\,f_0(x)\Big\},\qquad \text{if }x\in I_{1n},$

$\widehat f_n(x)=\widehat\lambda_n^{-1}\widehat g_n(x),\qquad \text{if }x\in I_{3n}.$ (2.7)
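A sketch of a plug-in kernel estimator following the structure of (2.7): on the overlap region, $\widehat f_n=\widehat\lambda_n^{-1}\{\widehat g_n-(1-\widehat\lambda_n)f_0\}$, and outside $\operatorname{supp}F_0$ simply $\widehat\lambda_n^{-1}\widehat g_n$. The simulated data reuse the assumptions of the previous sketch and scipy's Gaussian kernel estimator stands in for $\widehat g_n$.

import numpy as np
from scipy.stats import gaussian_kde, uniform

rng = np.random.default_rng(3)
n, lam = 5000, 0.3
from_f = rng.random(n) < lam
x = np.where(from_f, rng.uniform(0.5, 1.5, n), rng.uniform(0.0, 1.0, n))
lam_hat = 1.0 - ((x >= 0.0) & (x < 0.5)).mean() / 0.5      # lambda_hat as in the previous sketch

g_hat = gaussian_kde(x)                                    # kernel estimator of the mixture density g_0
f0_pdf = uniform(0.0, 1.0).pdf                             # the known component f_0
t1 = np.linspace(0.55, 0.95, 5)                            # points of the overlap region (role of I_1n)
t3 = np.linspace(1.05, 1.45, 5)                            # points outside supp F_0 (role of I_3n)
f_hat_1 = (g_hat(t1) - (1.0 - lam_hat) * f0_pdf(t1)) / lam_hat
f_hat_3 = g_hat(t3) / lam_hat
print(np.round(f_hat_1, 2), np.round(f_hat_3, 2))          # both close to 1, the U(0.5, 1.5) density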

Assume that the support of $G$ is bounded, that $\sigma_{1,\lambda}^2=\int x^2\,\mathrm d(1-G_n)(x)-\big(\int x\,\mathrm d(1-G_n)(x)\big)^2<\infty$ if $\inf\operatorname{supp}F_0<\inf\operatorname{supp}F$, and that $\sigma_{2,\lambda}^2=\int x^2\,\mathrm dG_n(x)-\big(\int x\,\mathrm dG_n(x)\big)^2<\infty$ if $\inf\operatorname{supp}F_0>\inf\operatorname{supp}F$; then $n^{1/2}(\max_iX_i-\max\operatorname{supp}G)$ and $n^{1/2}(\min_iX_i-\min\operatorname{supp}G)$ converge weakly to centered Gaussian variables with variances $\sigma_{1,\lambda}^2$ and $\sigma_{2,\lambda}^2$. It follows that:

Proposition 2.4. The estimators $\widehat\lambda_n$, $\widehat F_n$ and $\widehat f_n$ are uniformly $P_0$-a.s. consistent. The variable $n^{1/2}(\widehat\lambda_n-\lambda)$ converges weakly to a centered Gaussian variable, $n^{1/2}(\widehat F_n-F)$ converges weakly to a transformed Brownian bridge and, if $\mathcal F$ is a subset of $C^2$ and $h=o(n^{-1/5})$, $n^{2/5}(\widehat f_n-f)$ converges weakly to a centered Gaussian process.


3. Mixture model of nonparametric densities

Consider a mixture of two unknown distributions with densities in a family $\mathcal F$ of concave, unimodal and symmetric densities on $\mathbb R$,

$g_{\lambda,f_1,f_2}=\lambda f_1+(1-\lambda)f_2.$

An identifiable mixture of two symmetric densities of $\mathcal F$ does not belong to $\mathcal F$, even with overlapping densities, since the mixture density is not concave or not symmetric, except when the components have the same mean, in which case they are unidentifiable. Let $H_0$ be the null hypothesis of a single unknown distribution of $\mathcal F$. The distribution $f_0$ under $H_0$ may be estimated by a symmetric estimator $\widehat f_{0n}$: let $\theta_0$ be the center of symmetry of $f_0$, estimated by the median $\widehat\theta_n$ of the sample, and consider the transformed sample $X'_i=X_i$ if $X_i\le\widehat\theta_n$, $X'_i=2\widehat\theta_n-X_i$ if $X_i>\widehat\theta_n$; hence $(X'_1,\dots,X'_n)$ is a sample with asymptotic distribution $a_0^{-1}F_0\,1_{]-\infty,\theta_0]}$, with $a_0=\int_{-\infty}^{\theta_0}\mathrm dF_0$. The density $f_0$ is estimated by a kernel estimator $\widehat f_{0n}(x)=n^{-1}\sum_{i=1}^nK_h(x-X'_i)$ on $]-\infty,\widehat\theta_n]$ and, by symmetry, $\widehat f_{0n}(x)=\widehat f_{0n}(2\widehat\theta_n-x)$ on $]\widehat\theta_n,\infty[$; it is $P_0$-a.s. uniformly consistent on $\mathbb R$.

In the following, any other constraint on the form of the set $\mathcal F$ may obviously replace the symmetry, for example an asymmetry with given proportions between both sides of the functions. The main condition about $\mathcal F$ is the identifiability of the components of mixtures of functions belonging to $\mathcal F$, and mixtures cannot themselves belong to $\mathcal F$. Under $H_0$, any estimator $\widehat f_{0n}$ of the single unknown density $f_0$ satisfying the relevant constraint may be used.

A test of a mixture $g_{\lambda,f_1,f_2}=\lambda f_1+(1-\lambda)f_2$ in $\mathcal G_2$, with $f_1$ and $f_2$ in $\mathcal F$, against the alternative of a single unknown distribution of $\mathcal F$ may be performed by replacing $f_0$ by $\widehat f_{0n}$ in the expression of $T_n$, with the statistic

$S_n=2\sup_{\lambda\in\,]0,1/2]}\,\sup_{f\in\mathcal F}\,\sum_{1\le i\le n}\{\log g_{\lambda,f,\widehat f_{0n}}(X_i)-\log\widehat f_{0n}(X_i)\}.$

The conditions for the convergence of $S_n$ are

A1': $\lambda$ and $\mathcal F$ are identifiable: if $\lambda f_1+(1-\lambda)f_2=\lambda'f'_1+(1-\lambda')f'_2$, $\mu$-a.s., with $\lambda$ and $\lambda'\in\,]0,1/2]$, $f_1$, $f_2$, $f'_1$ and $f'_2\in\mathcal F$, then $\lambda=\lambda'$, $f_1=f'_1$ and $f_2=f'_2$, $\mu$-a.s.;

A2': for all $f'$ in $\mathcal F$, the $L^2(P_0)$-derivative $\varphi_{f'}$ at $f'$ exists:

$\limsup_{d_2(f,f')\to0}\;d_2^{-1}(f,f')\,\|f'^{-1}(f-f')-d_2(f,f')\varphi_{f'}\|_{L^2(P_0)}=0.$

For all $f$ and $f'$ in $\mathcal F$, denote $\alpha_{f,f'}=\lambda d_2(f,f')$, $\varphi_{f,f'}=d_2^{-1}(f,f')\,f'^{-1}(f-f')$ if $f\ne f'$ and $\varphi_{f,f}=\varphi_f$, $U_{f'}=\{d_2(f,f');f\in\mathcal F\}\subset\mathbb R^+$; $D_{\mathcal F,f'}=\{\varphi_{f,f'};f\in\mathcal F\}$ and $D_{\mathcal F}=\{\varphi_{f,f'};f,f'\in\mathcal F\}$.

A3': $E_0\sup_{\lambda\in\,]0,1]}\sup_{f_1,f_2\in\mathcal F}|\log\{\lambda f_1+(1-\lambda)f_2\}|<\infty$ and there exists a sequence of neighborhoods $V_{0n}$ of $f_0$ in $\mathcal F$, containing $(\widehat f_{0n})_n$ and converging to $f_0$, s.t.

$E_0\sup_{\widehat f_{0n}\in V_{0n}}\ \sup_{\varphi_n\in D_{\mathcal F,\widehat f_{0n}}}|\varphi_n(X)|^3<\infty;$

A4': $\sup_{\widehat f_{0n}\in V_{0n}}\sup_{\varphi_n\in D_{\mathcal F,\widehat f_{0n}}}|n^{-1/2}\sum_{i=1}^n\varphi_n(X_i)-G_\varphi|$ converges in $P_0$-probability to zero, with $G_\varphi$ defined as in A4.

The mixture density is written $g_{\lambda,f,f'}=\lambda f+(1-\lambda)f'=f'(1+\alpha_{f,f'}\varphi_{f,f'})$ and the parameter set for a test of $H_0$ is $\{\alpha_{f,f'};f,f'\in\mathcal F\}$. A sequence of parameters of smaller size is sufficient for the test with the statistic $S_n$; it is defined by $\{(\alpha_{f,\widehat f_{0n}})_n;f\in\mathcal F\}$. Let $u_{n,f}=d_2(f,\widehat f_{0n})$, $\alpha_{n,f}=\lambda u_{n,f}$, $\varphi_{n,f}=\varphi_{f,\widehat f_{0n}}$ and $g_{n,f}=\widehat f_{0n}(1+\alpha_{n,f}\varphi_{n,f})$, and let $U_n=\{u_{n,f};f\in\mathcal F\}$ and $D_{\mathcal F,n}=\{\varphi_{n,f};f\in\mathcal F\}$. The statistic $S_n$ is then written

$S_n=2\sup_{\alpha_n\in U_n/2}\,\sup_{\varphi_n\in D_{\mathcal F,n}}\sum_{1\le i\le n}\log\{1+\alpha_n\varphi_n(X_i)\}.$
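The statistic $S_n$ can be approximated by plugging a symmetrized estimator $\widehat f_{0n}$, as in the earlier sketch, into the grid-search routine used for $T_n$, now with $\lambda$ restricted to $]0,1/2]$ so that $\alpha_n\le d_2(f,\widehat f_{0n})/2$; the finite dictionary of normal candidates is again a simplifying assumption.

import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 300)                              # sample under H_0: a single symmetric density

theta = np.median(x)
kde = gaussian_kde(np.where(x <= theta, x, 2.0 * theta - x))
f0n = lambda t: 0.5 * np.where(t <= theta, kde(t), kde(2.0 * theta - t))   # symmetrized f_hat_0n

def Sn(x, f0, candidates, grid):
    dx, best = grid[1] - grid[0], 0.0
    for f in candidates:
        d2 = np.sqrt(np.sum((f(grid) - f0(grid)) ** 2) * dx)   # d_2(f, f_hat_0n)
        a = (f(x) - f0(x)) / (d2 * f0(x))                      # phi_{n,f}(X_i)
        for alpha in np.linspace(0.0, d2 / 2.0, 200)[1:]:      # lambda <= 1/2, hence alpha_n <= d_2 / 2
            v = 1.0 + alpha * a
            if np.all(v > 0):
                best = max(best, np.sum(np.log(v)))
    return 2.0 * best

grid = np.linspace(-10.0, 10.0, 4001)
cands = [norm(m, 1.0).pdf for m in (0.5, 1.0, 2.0)]            # hypothetical candidates f in F
print(Sn(x, f0n, cands, grid))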


Under Conditions A1'–A4', the proofs of Section 2 are easily extended, with the same notations.

Lemma 3.1. For every $\varphi_n$ in $D_{\mathcal F,n}$, the estimator $\widehat\alpha_n(\varphi_n)=\arg\max_{\alpha\in U_n/2}l_n(\alpha,\varphi_n)$ is such that $\sup_{\varphi_n\in D_{\mathcal F,n}}|\widehat\alpha_n(\varphi_n)|$ converges $P_0$-a.s. to zero.

Theorem 3.2. Under $H_0$, $\sup_{\varphi_n\in D_{\mathcal F,n}}|n^{1/2}\widehat\alpha_n(\varphi_n)-(\Sigma_\varphi)^{-1/2}Z(\varphi)1_{\{Z(\varphi)>0\}}|$ converges in probability to zero and the statistic $S_n$ converges weakly to $\sup_{\varphi\in D_{\mathcal F}}Z^2(\varphi)1_{\{Z(\varphi)>0\}}$.

As in Section 2, the convergence rate is due to the sum of $n$ i.i.d. variables and does not depend on the convergence rate of $\varphi_n$.

With a vector of $p$ additional components $f=(f_1,\dots,f_p)$ added to an unknown density $f'$ with true value $f_0$ in the mixture under the alternative, $g_{\lambda,f,f'}=(1-\lambda)f'+\sum_{k=1}^p\lambda_kf_k$ in $\mathcal G_p$ is written, with the notations of A2', as

$g_{\lambda,f,f'}=f_0(1+\alpha_{f,f'}^T\varphi_{f,f'});$

the hypothesis $H_0$ is equivalent to $\alpha_f=0$ and the results of Lemma 2.1 and Theorem 2.3 hold. They are extended to a model with a mixture of identifiable components under $H_0$, as in Section 2.

Under the alternative of a mixture of two functions of $\mathcal F$ with different supports, the components of the true density $g_0=\lambda_0f_{1,0}+(1-\lambda_0)f_{2,0}$ may be estimated. For densities with known or estimated supports (densities with an estimable center of symmetry), $\lambda_0$ is estimated as in (2.6) and both symmetric densities are estimated by an estimator similar to $\widehat f_{0n}$ defined in this section. If only one center of symmetry, of $f_{1,0}$ or of $f_{2,0}$, may be estimated, one density is estimated by this method and the other density is defined by (2.7) where $f_0$ is replaced by $\widehat f_{0n}$. If both supports are unknown, a learning sample is necessary for the estimation of the supports of the densities.

References

[1] J.M. Azaïs, E. Gassiat and C. Mercadier, Asymptotic distribution and local power of the likelihood ratio test for mixtures: bounded and unbounded cases. Bernoulli 12 (2006) 775–799.
[2] K. Fukumizu, Likelihood ratio of unidentifiable models and multilayer neural networks. Ann. Statist. 31 (2003) 833–851.
[3] J.K. Ghosh and P.K. Sen, On the asymptotic performance of the log-likelihood ratio statistic for the mixture model and related results, in Proc. of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. II, edited by L.M. Le Cam and R.A. Olshen (1985) 789–806.
[4] M. Lemdani and O. Pons, Likelihood ratio tests in mixture models. C. R. Acad. Sci. Paris, Ser. I 322 (1995) 399–404.
[5] M. Lemdani and O. Pons, Likelihood ratio tests in contamination models. Bernoulli 5 (1999) 705–719.
[6] X. Liu and Y. Shao, Asymptotics for likelihood ratio tests under loss of identifiability. Ann. Statist. 31 (2003) 807–832.
[7] O. Pons, Estimation et tests dans les modèles de mélanges de lois et de ruptures. Hermès-Science Lavoisier, London and Paris (2009).
[8] A. Wald, Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20 (1949) 595–601.
