
The LAN property for McKean-Vlasov models in a mean-field regime


arXiv:2205.05932v1 [math.ST] 12 May 2022

LAETITIA DELLA MAESTRA AND MARC HOFFMANN

ABSTRACT. We establish the local asymptotic normality (LAN) property for estimating a multidimensional parameter in the drift of a system of $N$ interacting particles observed over a fixed time horizon in a mean-field regime $N \to \infty$. By implementing the classical theory of Ibragimov and Hasminski, we obtain in particular sharp results for the maximum likelihood estimator that go beyond its simple asymptotic normality, thanks to Hájek's convolution theorem and strong controls of the likelihood process that yield asymptotic minimax optimality (up to constants). Our structural results shed some light on the accompanying nonlinear McKean-Vlasov experiment, and enable us to derive simple and explicit criteria for identifiability and non-degeneracy of the Fisher information matrix. These conditions are also of interest for other recent studies on parametric inference for interacting diffusions.

Mathematics Subject Classification (2010): 62C20, 62F12, 62F99, 62M99.

Keywords: Parametric estimation; LAN property; maximum likelihood estimation; statistics and PDE; interacting particle systems; McKean-Vlasov models.

CONTENTS

1. Introduction
1.1. Motivation
1.2. Setting
1.3. Results and organisation of the paper
2. Construction and properties of the statistical model
2.1. Notation
2.2. Model assumptions
2.3. The companion McKean-Vlasov product experiment
2.4. Identifiability and non-degeneracy of the Fisher information
3. Main results
3.1. The LAN property
3.2. Maximum likelihood estimation and properties
4. Examples
4.1. McKean-like models
4.2. Generalised linear like models
4.3. A double layer potential model
4.4. A genuinely non-linear example
5. Proof of the main results
5.1. Preliminaries: couplings
5.2. Proof of Theorem 17
5.3. Proof of Theorem 19
6. Remaining proofs
6.1. Proof of Proposition 9
6.2. Proof of Proposition 10
6.3. Proof of Proposition 11
6.4. Proof of Proposition 16
7. Appendix
7.1. Proof of Lemma 7
7.2. Proof of Lemma 8
7.3. Proof of Lemma 20
7.4. Proof of Lemma 21
References

Date: May 13, 2022.

1. INTRODUCTION

1.1. Motivation. Collective dynamics models are becoming increasingly popular in modelling complex stochastic systems, with a wide range of applications, from mathematical biology (neurosciences, Baladron et al. [2]; structured models in population dynamics, Mogilner et al. [40], Burger et al. [8]) to social sciences (opinion dynamics, Chazelle et al. [13]; cooperative behaviours, Canuto et al. [9]) and finance (systemic risk, Fouque and Sun [17]), or, more recently, mean-field games (Cardaliaguet et al. [10], Cardaliaguet and Lehalle [11]). Whereas stochastic systems of interacting particles and the associated nonlinear Markov processes in the sense of McKean [38] date back to the 1960s and have been studied extensively for more than half a century, see e.g. [7, 44, 45, 39, 47] among a myriad of references, the development of statistical inference in this setting is only emerging, with some notable exceptions like Löcherbach [35] in large time, or e.g. Kasonga [27] or Bishwal [4] in a mean-field limit. Recently, Giesecke et al. [22] and Sharrock, Kantas, Parpas and Pavliotis [43] revisited the work of Kasonga and considered a parametric framework where consistent and asymptotically normal contrast estimators are constructed. Several other parametric frameworks (with various observation schemes and asymptotic regimes) have also been considered recently, like [14, 34, 49] or Genon-Catalot and Laredo [20, 21]. There also exist recent results in nonparametric inference: we mention our work [16] and Belomestny et al. [3], together with studies in identification like [31, 32, 33] or learning [30, 36, 37].

The present paper, close in spirit to [4, 22, 27] and [43] (in their so-called offline case), considers a parametric framework in a mean-field regime over a fixed time horizon. We take a deeper look at the asymptotic structure of the associated statistical experiment, in the sense of local asymptotic normality (LAN), in order to derive strong results for the maximum likelihood estimator, both in asymptotic distribution and in an asymptotic minimax sense (up to constants) for various loss functions. For simplicity, we stick to continuous observations, but we briefly explain how to move to a discrete data setting. We also look for simple and explicit criteria that enable us to verify identifiability and non-degeneracy of the model. This is a non-trivial issue in the context of nonlinear McKean-Vlasov models, and it is often overlooked in the literature.

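1.2. Setting. We have a parameter of interest $\vartheta$ lying in a compact set $\Theta \subset \mathbb{R}^p$ (with non-empty interior), for some fixed $p \geq 1$. For some fixed time horizon $T > 0$, we continuously observe a stochastic system of $N$ interacting particles

$$X^{(N)} = (X_t^1, \ldots, X_t^N)_{t \in [0,T]}, \tag{1}$$

evolving in a Euclidean ambient space $\mathbb{R}^d$, that solves

$$\begin{cases} dX_t^i = b(\vartheta; t, X_t^i, \mu_t^{(N)})\,dt + \sigma(t, X_t^i)\,dB_t^i, & 1 \leq i \leq N,\ t \in [0,T], \\ \mathcal{L}(X_0^1, \ldots, X_0^N) = \mu_0^{\otimes N}, \end{cases} \tag{2}$$

where $\mu_t^{(N)} = N^{-1} \sum_{i=1}^N \delta_{X_t^i}$ is the empirical measure of the system. The $(B_t^i)_{t \in [0,T]}$ are independent $\mathbb{R}^d$-valued Brownian motions. The initial condition $\mu_0$, the drift $b$ and the diffusion coefficient $\sigma$ are at least sufficiently regular so that

$$\mu^{(N)} = (\mu_t^{(N)})_{t \in [0,T]} \to \mu = (\mu_t)_{t \in [0,T]}$$

weakly as $N \to \infty$, where $\mu$ is a family of probability measures that solves (in a weak sense) the parabolic nonlinear equation

$$\begin{cases} \partial_t \mu + \mathrm{div}\big(b(\vartheta; \cdot, \mu)\,\mu\big) = \tfrac{1}{2} \sum_{k,k'=1}^d \partial^2_{kk'}\big(c_{kk'}\,\mu\big), & t \in [0,T], \\ \mu_{t=0} = \mu_0, \end{cases} \tag{3}$$

with $c = \sigma\sigma^\top$. We will write $\mu^\vartheta = (\mu_t^\vartheta)_{t \in [0,T]}$ to emphasise the dependence on $\vartheta$. In this context, we are interested in estimating from the data (1) the parameter $\vartheta \in \Theta$ of the function $(\vartheta; t, x, \nu) \mapsto b(\vartheta; t, x, \nu) \in \mathbb{R}^d$. Asymptotics are taken as $N \to \infty$.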

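A particular case of interest that covers many examples is when the dependence of $b$ on the measure variable is linear: we then have

$$b(\vartheta; t, X_t^i, \mu_t^{(N)}) = \int_{\mathbb{R}^d} \widetilde{b}(\vartheta; X_t^i, y)\,\mu_t^{(N)}(dy) = N^{-1} \sum_{j=1}^N \widetilde{b}(\vartheta; X_t^i, X_t^j), \tag{4}$$

for some function $\widetilde{b} : \Theta \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$. A typical form is $\widetilde{b}(\vartheta; x, y) = G_\vartheta(x) + F_\vartheta(x - y)$, where $G_\vartheta, F_\vartheta : \mathbb{R}^d \to \mathbb{R}^d$ play the role of a common external force acting on the system and of an interaction force, respectively.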
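To make the dynamics (2) with a drift of the form (4) concrete, the following sketch simulates the particle system with an Euler-Maruyama scheme in dimension $d = 1$. It is an illustration only: the drift $b(\vartheta; x, \mu) = -\vartheta_1 x - \vartheta_2 (x - \int y\,\mu(dy))$ (that is, $G_\vartheta(x) = -\vartheta_1 x$ and $F_\vartheta(x - y) = -\vartheta_2 (x - y)$), the constant diffusion coefficient, the standard Gaussian initial law and the function name `simulate_particles` are assumptions chosen for the example, not taken from the paper.

```python
import numpy as np


def simulate_particles(theta, n_particles=200, n_steps=200, horizon=1.0,
                       sigma=1.0, seed=0):
    """Euler-Maruyama scheme for dX^i = b(theta; X^i, mu^N) dt + sigma dB^i (d = 1).

    Hypothetical drift, chosen for illustration only:
        b(theta; x, mu) = -theta[0] * x - theta[1] * (x - mean(mu)).
    Returns an array of shape (n_steps + 1, n_particles).
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    paths = np.empty((n_steps + 1, n_particles))
    paths[0] = rng.standard_normal(n_particles)  # i.i.d. start, mu_0 = N(0, 1)
    for k in range(n_steps):
        x = paths[k]
        # The empirical measure enters the drift only through its mean here.
        drift = -theta[0] * x - theta[1] * (x - x.mean())
        noise = rng.standard_normal(n_particles)
        paths[k + 1] = x + drift * dt + sigma * np.sqrt(dt) * noise
    return paths


if __name__ == "__main__":
    paths = simulate_particles((0.5, 1.0))
    print(paths.shape)
```

With `sigma=0.0` and `theta=(0.0, 1.0)` the scheme only contracts the particles towards their common mean, which is a convenient sanity check of the interaction term.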

1.3. Results and organisation of the paper. In Section 2, we rigorously construct the (sequence of) statistical experiment(s) generated by the observation (1) under the dynamics (2), which we denote by $(\mathcal{E}^N)_{N \geq 1}$. It is well defined and regular in the classical sense of Ibragimov and Hasminski [24] under strong integrability of the initial condition $\mu_0$ and standard smoothness assumptions on the drift $\vartheta \mapsto b(\vartheta; \cdot)$ and the diffusion matrix $c = \sigma\sigma^\top$, see Assumptions 1, 2, 3 and 4 and Proposition 6. The study of the identifiability of $\mathcal{E}^N$ and of the non-degeneracy of its information matrix $\mathbb{I}_{\mathcal{E}^N}(\vartheta)$ is simplified via the accompanying experiment $\mathcal{G}^{\otimes N}$, where $\mathcal{G}$ is generated by the continuous observation of a solution to the McKean-Vlasov equation

$$dX_t = b(\vartheta; t, X_t, \mu_t^\vartheta)\,dt + \sigma(t, X_t)\,dB_t, \quad t \in [0,T], \qquad \mathcal{L}(X_0) = \mu_0,$$

for a standard Brownian motion $(B_t)_{t \in [0,T]}$ on $\mathbb{R}^d$, and where $\mu_t^\vartheta$ is the marginal distribution of the solution at time $t$. In particular, in the case of representation (4), $\mathcal{E}^N$ and $\mathcal{G}^{\otimes N}$ do not separate asymptotically by a simple entropy argument, see Proposition 10, and we always have the convergence of the corresponding Fisher information matrices:

$$N^{-1}\,\mathbb{I}_{\mathcal{E}^N}(\vartheta) \to \mathbb{I}_{\mathcal{G}}(\vartheta)$$

in a mean-field limit $N \to \infty$, as established in Proposition 11. This approximation is the gateway to explicit identifiability and non-degeneracy criteria, as detailed in Section 2.4. In particular, under additional regularity assumptions, we obtain a quite simple criterion for $\mathbb{I}_{\mathcal{G}}(\vartheta)$ to be non-degenerate in Proposition 15, namely the property that one of the functions

$$x \mapsto \nabla_\vartheta (c^{-1/2} b)^j(\vartheta; 0, x, \mu_0)^\top z, \quad j = 1, \ldots, d, \tag{5}$$

is not identically vanishing, for every $z \in \mathbb{R}^p$ with $|z| = 1$, where $c^{-1/2}$ denotes the inverse of a square root of $c = \sigma\sigma^\top$. We use the componentwise notation $f = (f^j)_{1 \leq j \leq d}$, the $f^j$ being real-valued functions. In particular, (5) has the advantage of involving only the initial condition $\mu_0$ in the measure argument and not the whole flow $(\mu_t^\vartheta)_{t \in [0,T]}$, which is (almost) never explicit. Having a simple criterion to achieve the non-degeneracy of the Fisher information seems to have been somewhat overlooked in the literature (where it is usually simply assumed to hold true), and our result is thus of interest for other studies.

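In Section 3, we state the main result of the paper, Theorem 17, where we establish the LAN property: if we reparametrise the experiments via $\vartheta = \vartheta_0 + N^{-1/2} u$ locally around a fixed point $\vartheta_0$, with $u \in \mathbb{R}^p$ being now the unknown parameter, then both $\mathcal{E}^N$ and $\mathcal{G}^{\otimes N}$ look like a Gaussian shift: we observe

$$Y^N = u + \mathbb{I}_{\mathcal{G}}(\vartheta_0)^{-1/2}\,\xi,$$

where $\xi$ is a standard Gaussian random vector in $\mathbb{R}^p$. This has important consequences in terms of existence and properties of optimal procedures: we have Hájek's convolution theorem (Corollary 18), namely for any estimator $\widehat{\vartheta}_N$,

$$\liminf_{N \to \infty}\ \sup_{|\vartheta' - \vartheta| \leq \delta} \mathbb{E}_{P^N_{\vartheta'}}\Big[ w\big( N^{1/2}\,\mathbb{I}_{\mathcal{G}}(\vartheta)^{1/2} (\widehat{\vartheta}_N - \vartheta') \big) \Big] \geq \mathbb{E}[w(\xi)], \tag{6}$$

for small enough $\delta > 0$, where $P^N_\vartheta$ is the distribution of the data when the parameter is $\vartheta$ and $w$ is an arbitrary loss function satisfying some regularity properties. The bound (6) is achieved by the maximum likelihood estimator $\widehat{\vartheta}_N^{\,\mathrm{mle}}$ obtained by maximising the contrast

$$\vartheta \mapsto \ell^N(\vartheta; X^{(N)}) = \sum_{i=1}^N \int_0^T \Big( (c^{-1} b)(\vartheta; t, X_t^i, \mu_t^{(N)})^\top dX_t^i - \tfrac{1}{2} \big| (c^{-1/2} b)(\vartheta; t, X_t^i, \mu_t^{(N)}) \big|^2 dt \Big). \tag{7}$$

This implies in particular the convergence

$$\sqrt{N}\,\big( \widehat{\vartheta}_N^{\,\mathrm{mle}} - \vartheta \big) \to \mathcal{N}\big( 0, \mathbb{I}_{\mathcal{G}}(\vartheta)^{-1} \big) \tag{8}$$

in distribution. Moreover, we have in Theorem 19 the minimax asymptotic optimality of $\widehat{\vartheta}_N^{\,\mathrm{mle}}$, in the sense that

$$\mathcal{R}^N_w(\widehat{\vartheta}_N^{\,\mathrm{mle}}; \Theta) = \inf_{\widehat{\vartheta}_N} \mathcal{R}^N_w(\widehat{\vartheta}_N; \Theta)\,(1 + o(1)),$$

where $\mathcal{R}^N_w(\widehat{\vartheta}_N; \Theta) = \sup_{\vartheta \in \Theta} \mathbb{E}_{P^N_\vartheta}\big[ w\big( N^{1/2}\,\mathbb{I}_{\mathcal{G}}(\vartheta)^{1/2} (\widehat{\vartheta}_N - \vartheta) \big) \big]$ is the classical minimax risk. Thus the LAN property enables us to obtain considerably stronger results than simply (8).

In Section 4, we investigate several non-trivial examples that generalise the results of [27], and where our identifiability and non-degeneracy criteria easily apply. We treat in particular the case of a kinetic mean-field double layer potential that may serve as a representative model for swarming models, see in particular [6] and the references therein. The proofs are delayed until Sections 5 and 6, with an appendix (Section 7) that contains useful technical results.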

In practice, maximising the function (7) is not feasible, since only discrete data are available. It is then reasonable to replace the ideal observation (1) by the more realistic

$$X^{(N,m)} = (X_t^1, \ldots, X_t^N)_{t \in \{t_0^m, \ldots, t_m^m\}},$$

where $(0 = t_0^m < t_1^m < \ldots < t_m^m = T)$ is a subdivision of $[0,T]$ with mesh

$$\max_{1 \leq j \leq m} (t_j^m - t_{j-1}^m) \leq C m^{-1}.$$

We thus have $(m+1) \times N$ data with values in $\mathbb{R}^d$. We may then replace (7) by

$$\vartheta \mapsto N^{-1} \sum_{i=1}^N \sum_{j=1}^m \Big( (c^{-1} b)\big(\vartheta; t_{j-1}^m, X^i_{t_{j-1}^m}, \mu^{(N)}_{t_{j-1}^m}\big)^\top \big( X^i_{t_j^m} - X^i_{t_{j-1}^m} \big) - \tfrac{1}{2} \big| (c^{-1/2} b)\big(\vartheta; t_{j-1}^m, X^i_{t_{j-1}^m}, \mu^{(N)}_{t_{j-1}^m}\big) \big|^2 (t_j^m - t_{j-1}^m) \Big).$$

Assuming the function $(t, x) \mapsto (c^{-1/2} b)(\vartheta; t, x, \mu_t^{(N)})$ to be smooth, we may safely expect the discrete approximation to be close to its continuous counterpart up to an additional error of order $m^{-1/2}$, by standard high-frequency discretisation techniques, see the textbooks of Jacod and co-authors [1, 25, 26]. In particular, if $m \gg N$, the same results as for continuous observations are likely to hold true.
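The discretised contrast can be sketched numerically as follows. This is a minimal illustration under assumptions that are not in the paper: dimension $d = 1$, $\sigma \equiv 1$, and a hypothetical drift $b(\vartheta; x, \mu) = -\vartheta_1 x - \vartheta_2 (x - \int y\,\mu(dy))$ that is linear in $\vartheta$, so that the discretised contrast is quadratic in $\vartheta$ and its maximiser solves a $2 \times 2$ linear system. The initial law $\mathcal{N}(2, 1)$ is chosen with a non-zero mean because, in this toy model, the two components of $\nabla_\vartheta b$ coincide when the mean of $\mu_t$ is zero, in which case only $\vartheta_1 + \vartheta_2$ can be recovered. The names `simulate` and `discretised_mle` are illustrative.

```python
import numpy as np


def simulate(theta, n_particles, n_steps, horizon, rng):
    """Euler scheme for the hypothetical drift -theta_1 x - theta_2 (x - mean)."""
    theta = np.asarray(theta, dtype=float)
    dt = horizon / n_steps
    x = np.empty((n_steps + 1, n_particles))
    x[0] = 2.0 + rng.standard_normal(n_particles)  # mu_0 = N(2, 1), an assumption
    for k in range(n_steps):
        features = np.stack([-x[k], -(x[k] - x[k].mean())])  # shape (2, N)
        noise = rng.standard_normal(n_particles)
        x[k + 1] = x[k] + (theta @ features) * dt + np.sqrt(dt) * noise
    return x, dt


def discretised_mle(x, dt):
    """Maximiser of the discretised contrast when b is linear in theta and c = 1.

    The contrast reads theta^T g - 0.5 * theta^T H theta, with
    g = sum_{i,j} grad_theta b * increment and H = sum_{i,j} grad grad^T * dt,
    so the maximiser solves H theta = g.
    """
    prev = x[:-1]                       # left endpoints, shape (m, N)
    increments = np.diff(x, axis=0)     # X_{t_j} - X_{t_{j-1}}, shape (m, N)
    features = np.stack(
        [-prev, -(prev - prev.mean(axis=1, keepdims=True))]
    )                                   # grad_theta b, shape (2, m, N)
    g = np.einsum("kjn,jn->k", features, increments)
    h = np.einsum("kjn,ljn->kl", features, features) * dt
    return np.linalg.solve(h, g)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x, dt = simulate([0.5, 1.0], n_particles=1000, n_steps=200, horizon=1.0, rng=rng)
    print(discretised_mle(x, dt))
```

Here the observation grid coincides with the simulation grid, so the sketch says nothing about the discretisation error of order $m^{-1/2}$ discussed above; it only illustrates the form of the estimator.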

2. CONSTRUCTION AND PROPERTIES OF THE STATISTICAL MODEL

2.1. Notation. The dimension $d \geq 1$ of the state space $\mathbb{R}^d$ and the dimension $p \geq 1$ of the parameter space $\Theta$, as well as the time horizon $T > 0$, are fixed once and for all. We write $|\cdot|$ for the Euclidean norm on $\mathbb{R}^q$ ($q = p$, $d$ or any other integer, depending on the context) or for a matrix norm on $\mathbb{R}^p \otimes \mathbb{R}^p$ fixed throughout.

We consider functions that are mappings defined on products of metric spaces (typically $\Theta \times [0,T] \times \mathbb{R}^d \times \mathcal{P}_1$ or subsets of these) with values in $\mathbb{R}$ or $\mathbb{R}^d$. Here, $\mathcal{P}_1$ denotes the set of probability measures on $\mathbb{R}^d$ with a first moment, endowed with the Wasserstein 1-metric

$$W_1(\mu, \nu) = \inf_{m \in \Gamma(\mu, \nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - y|\,m(dx, dy) = \sup_{|\varphi|_{\mathrm{Lip}} \leq 1} \int_{\mathbb{R}^d} \varphi\,d(\mu - \nu),$$

where $\Gamma(\mu, \nu)$ denotes the set of probability measures on the product space $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $\mu$ and $\nu$. For a probability measure $\mu$ on $\mathbb{R}^d$, we also set

$$m_r(\mu) = \int_{\mathbb{R}^d} |y|^r\,\mu(dy)$$

for its moment of order $r \geq 1$, and we say that $\mu \in \mathcal{P}_r$ if $m_r(\mu)$ is finite. All the functions in the paper are implicitly measurable with respect to the Borel sigma-field induced by the product topology. An $\mathbb{R}^d$-valued function $f$ is written componentwise as $f = (f^k)_{1 \leq k \leq d}$, where the $f^k$ are real-valued. We denote by $\partial_{\vartheta_k}$, $\nabla_\vartheta$, $\partial^2_{\vartheta_k \vartheta_l}$ respectively the partial derivative of a function with respect to the $k$-th component $\vartheta_k$, the gradient of a real-valued function with respect to $\vartheta$, and the second order partial derivative of a function with respect to the $k$-th and $l$-th components $\vartheta_k, \vartheta_l$.

Finally, we repeatedly use the notation $C$ for a positive number that depends neither on $N$ nor on $\vartheta$, that may vary from line to line and that we call a constant, although it usually depends on some other (fixed) quantities of the model. In most cases, it is explicitly computable.
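As a small numerical companion to these definitions, the sketch below computes $W_1$ and $m_r$ for empirical measures on the real line. It relies on a standard fact that is not stated in the paper: for two empirical measures on $\mathbb{R}$ with the same number of atoms, an optimal coupling matches the sorted samples. The function names are illustrative.

```python
import numpy as np


def wasserstein1_empirical(x, y):
    """W_1 between two empirical measures on R with the same number of atoms.

    In dimension one, the monotone (sorted) coupling is optimal, so W_1 is the
    average absolute difference between the order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes samples of equal size")
    return float(np.mean(np.abs(x - y)))


def moment(x, r):
    """Empirical moment m_r of order r >= 1."""
    return float(np.mean(np.abs(np.asarray(x, dtype=float)) ** r))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.standard_normal(1000)
    print(wasserstein1_empirical(sample, sample + 3.0))  # a pure shift by 3
    print(moment(sample, 2))
```

The dual formulation gives a quick consistency check: taking the 1-Lipschitz function $\varphi(x) = x$ shows that the difference of the two sample means never exceeds the value returned by `wasserstein1_empirical`.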

2.2. Model assumptions.

Well-posedness of the model and its associated statistical experiment. We work under the following strong integrability property for the initial condition $\mu_0$.

Assumption 1. For every $r \geq 1$, we have $\mu_0 \in \mathcal{P}_r$.

As for the diffusion matrix $\sigma : [0,T] \times \mathbb{R}^d \to \mathbb{R}^d \otimes \mathbb{R}^d$, we make the following strong ellipticity and Lipschitz smoothness assumption.

Assumption 2. The diffusion matrix $\sigma$ is measurable and for some $C \geq 0$, we have

$$|\sigma(t, x') - \sigma(t, x)| \leq C |x' - x|.$$

Moreover, $c = \sigma\sigma^\top$ is such that $\sigma_-^2 |y|^2 \leq (c(t, x) y)^\top y \leq \sigma_+^2 |y|^2$ for some $\sigma_\pm > 0$.

As for the drift part $b : \Theta \times [0,T] \times \mathbb{R}^d \times \mathcal{P}_1 \to \mathbb{R}^d$, we work under usual Lipschitz smoothness assumptions.

Assumption 3. The drift $b$ is measurable and for some $C \geq 0$, we have

$$\sup_{t \in [0,T],\, \vartheta \in \Theta} \big| b(\vartheta; t, x', \nu') - b(\vartheta; t, x, \nu) \big| \leq C \big( |x' - x| + W_1(\nu', \nu) \big),$$

and there exists some $\vartheta_0 \in \Theta$ such that

$$b_0 = \sup_{t \in [0,T]} |b(\vartheta_0; t, 0, \delta_0)| < \infty.$$

We let $|b|_{\mathrm{Lip}}$ denote the smallest $C \geq 0$ for which Assumption 3 holds.

Assumptions 1, 2 and 3 together are sufficient to guarantee the well-posedness of the statistical model: there exists a unique weak solution to (2) for every $\vartheta \in \Theta$, hence the data $X^{(N)}$ of (1) are well defined. More precisely, we let $\mathcal{C}^N = \mathcal{C}([0,T], (\mathbb{R}^d)^N)$ denote the space of continuous functions from $[0,T]$ to $(\mathbb{R}^d)^N$, equipped with the natural filtration $(\mathcal{F}_t)_{t \in [0,T]}$ induced by the canonical mappings

$$X_t^{(N)}(\omega) = \big( X_t^1(\omega), \ldots, X_t^N(\omega) \big) = \omega_t.$$

For $\mu_0 \in \mathcal{P}_1$ and $\vartheta \in \Theta$, the probability $P^N_\vartheta$ on $(\mathcal{C}^N, \mathcal{F}^N)$ under which the canonical process $X^{(N)} = (X_t^{(N)})_{t \in [0,T]}$ is a solution of (2) for the initial condition $\mu_0^{\otimes N}$ is uniquely defined under Assumptions 1, 2 and 3. Recommended references (that cover our set of assumptions) are the textbook by Carmona and Delarue [12] and the lecture notes of Lacker [28]. Moreover, for every $\vartheta \in \Theta$, the parabolic nonlinear equation (3) has a unique probability solution $\mu = (\mu_t^\vartheta)_{t \in [0,T]}$ and we have the weak convergence $\mu_t^{(N)} \to \mu_t^\vartheta$ under $P^N_\vartheta$, for every $\vartheta \in \Theta$.

We thus study under Assumptions 1, 2 and 3 the (sequence of) statistical experiment(s) generated by the observation (1) under the dynamics (2), which we realise as

$$(\mathcal{E}^N)_{N \geq 1} = \big( \mathcal{C}^N, \mathcal{F}^N, (P^N_\vartheta,\ \vartheta \in \Theta) \big)_{N \geq 1}.$$

Note that at this stage, we do not impose any identifiability assumption, i.e. we do not assume that the mapping $\vartheta \mapsto P^N_\vartheta$ is one-to-one. We will discuss that matter together with the non-degeneracy of the model later in Section 2.4.

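Regularity of the experiment $\mathcal{E}^N$. In order to study the regularity of the model, we need specific smoothness properties for the function $\vartheta \mapsto b(\vartheta; \cdot)$.

Assumption 4. There exist $r_1, r_2 \geq 1$ and $C > 0$ such that for every point $\vartheta$ in the interior of $\Theta$, the function $\vartheta \mapsto b(\vartheta; t, x, \nu)$ is twice differentiable and for every $1 \leq \ell, \ell' \leq p$,

$$\sup_{t \in [0,T]} \big( |\partial_{\vartheta_\ell} b(\vartheta; t, x, \nu)| + |\partial^2_{\vartheta_\ell \vartheta_{\ell'}} b(\vartheta; t, x, \nu)| \big) \leq C \big( 1 + |x|^{r_1} + m_{r_2}(\nu) \big),$$

$$\sup_{t \in [0,T]} \big| \partial_{\vartheta_\ell} b(\vartheta; t, x', \nu') - \partial_{\vartheta_\ell} b(\vartheta; t, x, \nu) \big| \leq C \big( |x' - x| + W_1(\nu', \nu) \big).$$

The smoothness properties of the map $\vartheta \mapsto b(\vartheta; \cdot)$ granted by Assumption 4 enable us to explore further the regularity of the experiment $\mathcal{E}^N$. First, note that we have a log-likelihood by setting

$$\ell^N(\vartheta; X^{(N)}) = \sum_{i=1}^N \int_0^T (c^{-1} b)(\vartheta; t, X_t^i, \mu_t^{(N)})^\top dX_t^i - \frac{1}{2} \sum_{i=1}^N \int_0^T \big| (c^{-1/2} b)(\vartheta; t, X_t^i, \mu_t^{(N)}) \big|^2 dt, \tag{9}$$

where $c^{-1/2}$ is fixed once and for all. Indeed, by Girsanov's theorem, the laws $P^N_\vartheta$ are all absolutely continuous with respect to $W^N$, defined as the unique probability on $(\mathcal{C}^N, \mathcal{F}^N)$ under which the processes

$$\Big( \int_0^t c^{-1/2}(s, X_s^i)\,dX_s^i \Big)_{t \in [0,T]}, \quad 1 \leq i \leq N,$$

are independent standard Brownian motions on $\mathbb{R}^d$, together with $\mathcal{L}(X_0^1, \ldots, X_0^N) = \mu_0^{\otimes N}$. In turn, for every $\vartheta \in \Theta$,

$$\frac{dP^N_\vartheta}{dW^N}(X^{(N)}) = \exp\big( \ell^N(\vartheta; X^{(N)}) \big)$$

holds $W^N$-almost surely. We further write $L^N(\vartheta; X^{(N)}) = \exp\big( \ell^N(\vartheta; X^{(N)}) \big)$ for the likelihood process, indexed by the parameter $\vartheta \in \Theta$. We recall one possible classical definition of a regular statistical experiment, following [24].

Definition 5. The dominated (sequence of) experiment(s) $(\mathcal{E}^N)_{N \geq 1}$ is regular if

(i) $\vartheta \mapsto L^N(\vartheta; X^{(N)})$ is differentiable for every $\vartheta$ in (the interior of) $\Theta$, $W^N$-almost surely,
(ii) $\vartheta \mapsto \nabla_\vartheta L^N(\vartheta; X^{(N)})$ is continuous in quadratic $W^N$-mean, for every $\vartheta$ in (the interior of) $\Theta$,
(iii) we have finite Fisher information

$$\mathbb{E}_{P^N_\vartheta}\big[ |\nabla_\vartheta \ell^N(\vartheta; X^{(N)})|^2 \big] < \infty$$

for every $\vartheta$ in (the interior of) $\Theta$.

Proposition 6. Under Assumptions 1, 2, 3 and 4, the (sequence of) experiment(s) $(\mathcal{E}^N)_{N \geq 1}$ is regular.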

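(Sketch of) Proof. By exchanging the order of the differentiation with respect to $\vartheta$ and the stochastic integral, we have

$$\partial_{\vartheta_k} \ell^N(\vartheta; X^{(N)}) = \sum_{i=1}^N \int_0^T \partial_{\vartheta_k} (c^{-1} b)(\vartheta; t, X_t^i, \mu_t^{(N)})^\top dX_t^i - \sum_{i=1}^N \int_0^T \partial_{\vartheta_k} (c^{-1/2} b)(\vartheta; t, X_t^i, \mu_t^{(N)})^\top (c^{-1/2} b)(\vartheta; t, X_t^i, \mu_t^{(N)})\,dt.$$

We obtain the representation

$$\partial_{\vartheta_k} \ell^N(\vartheta; X^{(N)}) = \sum_{i=1}^N \int_0^T \partial_{\vartheta_k} (c^{-1/2} b)(\vartheta; t, X_t^i, \mu_t^{(N)})^\top dB_t^{i,N,\vartheta}, \tag{10}$$

where the

$$(B_t^{i,N,\vartheta})_{t \in [0,T]} = \Big( \int_0^t c^{-1/2}(s, X_s^i) \big( dX_s^i - b(\vartheta; s, X_s^i, \mu_s^{(N)})\,ds \big) \Big)_{t \in [0,T]}, \quad 1 \leq i \leq N,$$

are independent Brownian motions on $\mathbb{R}^d$ under $P^N_\vartheta$. The properties (i), (ii) and (iii) are then a simple consequence of Assumption 4 together with the following moment bound.

Lemma 7. Under Assumptions 1, 2, 3, for every $r \geq 1$, we have

$$\sup_{\vartheta \in \Theta,\, t \in [0,T],\, N \geq 1} \mathbb{E}_{P^N_\vartheta}\big[ |X_t^i|^r \big] < \infty.$$

Note that $\mathbb{E}_{P^N_\vartheta}[|X_t^i|^r]$ does not depend on $i$. The proof of Lemma 7 is given in Appendix 7.1.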

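Finally, we have a notion of Fisher information matrix by setting

$$\mathbb{I}_{\mathcal{E}^N}(\vartheta) = \mathbb{E}_{P^N_\vartheta}\big[ \nabla_\vartheta \ell^N(\vartheta; X^{(N)})\,\nabla_\vartheta \ell^N(\vartheta; X^{(N)})^\top \big].$$

Thanks to (10), we also have

$$\mathbb{I}_{\mathcal{E}^N}(\vartheta) = \sum_{i=1}^N \Big( \mathbb{E}_{P^N_\vartheta}\Big[ \int_0^T \partial_{\vartheta_\ell} (c^{-1/2} b)(\vartheta; t, X_t^i, \mu_t^{(N)})^\top \partial_{\vartheta_{\ell'}} (c^{-1/2} b)(\vartheta; t, X_t^i, \mu_t^{(N)})\,dt \Big] \Big)_{1 \leq \ell, \ell' \leq p}. \tag{11}$$

2.3. The companion McKean-Vlasov product experiment. We let $\mathcal{C} = \mathcal{C}([0,T], \mathbb{R}^d)$ denote the space of continuous functions from $[0,T]$ to $\mathbb{R}^d$, equipped with the natural filtration $(\mathcal{F}_t)_{0 \leq t \leq T}$ induced by the canonical mapping $X_t(\omega) = \omega_t$. For every $\vartheta \in \Theta$, we let $P_\vartheta$ denote the unique law under which the process

$$(B_t^\vartheta)_{t \in [0,T]} = \Big( \int_0^t c^{-1/2}(s, X_s) \big( dX_s - b(\vartheta; s, X_s, \mu_s^\vartheta)\,ds \big) \Big)_{t \in [0,T]}$$

is a standard Brownian motion on $\mathbb{R}^d$, appended with the condition $\mathcal{L}(X_0) = \mu_0$, and $\mu^\vartheta = (\mu_t^\vartheta)_{t \in [0,T]}$ is a probability solution of (3). The family $(P_\vartheta)_{\vartheta \in \Theta}$ is well defined under Assumptions 1, 2, 3. In particular, the canonical process $X$ on $(\mathcal{C}, \mathcal{F}_T)$ is a solution to the McKean-Vlasov equation

$$\begin{cases} dX_t = b(\vartheta; t, X_t, \mu_t^\vartheta)\,dt + \sigma(t, X_t)\,dB_t^\vartheta, & t \in [0,T], \\ \mathcal{L}(X_0) = \mu_0. \end{cases} \tag{12}$$

The following result is the counterpart of Lemma 7. Note in particular that the marginals of $P_\vartheta$ coincide with the solution $\mu^\vartheta = (\mu_t^\vartheta)_{t \in [0,T]}$ of the Fokker-Planck equation (3).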

Lemma 8. Under Assumptions 1, 2, 3, for every $r \geq 1$, we have

$$\sup_{\vartheta \in \Theta,\, t \in [0,T]} \mathbb{E}_{P_\vartheta}\big[ |X_t|^r \big] = \sup_{\vartheta \in \Theta,\, t \in [0,T]} \int_{\mathbb{R}^d} |x|^r\,\mu_t^\vartheta(dx) < \infty.$$

The proof is given in Section 7.2. We also have the following smoothness property in the parameter $\vartheta$, the proof of which is delayed until Section 6.1.

Proposition 9. Under Assumptions 1, 2, 3 and 4, the mapping $\vartheta \mapsto \mu_t^\vartheta$ is Lipschitz continuous in the Wasserstein-1 metric $W_1$, uniformly in $t \in [0,T]$.

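We next consider the limit experiment

$$\mathcal{G} = \big( \mathcal{C}, \mathcal{F}_T, (P_\vartheta)_{\vartheta \in \Theta} \big)$$

and its $N$-fold counterpart

$$\mathcal{G}^{\otimes N} = \big( \mathcal{C}^N, \mathcal{F}_T^N, (P_\vartheta^{\otimes N})_{\vartheta \in \Theta} \big),$$

which serves as an approximation for the experiment $\mathcal{E}^N$. Inspired by classical propagation of chaos techniques (see in particular [29]), we can easily show that the measures $P^N_\vartheta$ and $P_\vartheta^{\otimes N}$ are statistically indistinguishable when the drift is of the form

$$b(\vartheta; t, x, \nu) = \int_{\mathbb{R}^d} \widetilde{b}(\vartheta; t, x, y)\,\nu(dy), \tag{13}$$

for some kernel $\widetilde{b}(\vartheta; \cdot) : [0,T] \times \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ such that

$$\sup_{t \in [0,T],\, \vartheta \in \Theta} \big| \widetilde{b}(\vartheta; t, x, y) \big| \leq C \big( 1 + |x|^{r_1} + |y|^{r_2} \big) \tag{14}$$

for some $r_1, r_2 \geq 1$, a situation that covers most of our examples, see Section 4 below. More precisely, we have the following result.

Proposition 10. Under Assumptions 1, 2, 3, if $b$ has moreover the form (13)-(14), we have

$$\limsup_{N \to \infty}\ \sup_{\vartheta \in \Theta} \mathbb{E}_{P_\vartheta^{\otimes N}}\Big[ \log \frac{dP_\vartheta^{\otimes N}}{dP^N_\vartheta} \Big] < \infty. \tag{15}$$

In particular, if

$$\sup_{\vartheta \in \Theta} \int_0^T \int_{\mathbb{R}^d \times \mathbb{R}^d} \big| \widetilde{b}(\vartheta; t, x, y) \big|^2 (\mu_t^\vartheta \otimes \mu_t^\vartheta)(dx, dy)\,dt < 4,$$

then

$$\limsup_{N \to \infty}\ \sup_{\vartheta \in \Theta} \big\| P^N_\vartheta - P_\vartheta^{\otimes N} \big\|_{TV} < 1, \tag{16}$$

where $\| \cdot \|_{TV}$ denotes the total variation distance.

The proof is given in Section 6.2. Some remarks are in order:

1) The estimate (15) tells us that it is impossible to statistically discriminate between $P^N_\vartheta$ and $P_\vartheta^{\otimes N}$ asymptotically. More precisely, inequality (16) shows in particular that, provided $\widetilde{b}$ is not too big or $T$ is not too large, there exists no test of the null $H_0 : P^N_\vartheta = P_\vartheta^{\otimes N}$ against the alternative $H_1 : P^N_\vartheta \neq P_\vartheta^{\otimes N}$ with asymptotically arbitrarily small errors of the first and second kind in the limit $N \to \infty$.

2) We will actually prove a stronger result in Section 3 below, showing that both $(\mathcal{E}^N)_{N \geq 1}$ and $(\mathcal{G}^{\otimes N})_{N \geq 1}$ share the LAN property, with the same asymptotic Fisher information.

3) Finally, (15) may hold in wider generality when the dependence of the drift on the measure variable is nonlinear, as soon as we have some differentiability in the following sense: there exists $\partial_\nu b(\vartheta; t, x, \cdot) : \mathbb{R}^d \times \mathcal{P}_1 \to \mathbb{R}^d$ such that

$$b(\vartheta; t, x, \nu') - b(\vartheta; t, x, \nu) = \int_0^1 \int_{\mathbb{R}^d} \partial_\nu b\big( \vartheta; t, x, y, \lambda \nu' + (1 - \lambda) \nu \big)\,(\nu' - \nu)(dy)\,d\lambda$$

for every $\nu, \nu' \in \mathcal{P}_1$, and $\partial_\nu b(\vartheta; t, x, \cdot)$ satisfies additional smoothness properties. Iterating the operator $\partial_\nu$, if $\partial_\nu^k b(\vartheta; t, x, \cdot) : (\mathbb{R}^d)^k \times \mathcal{P}_1 \to \mathbb{R}^d$ exists and satisfies some smoothness and integrability properties, we may expect (15) to hold as soon as $k \geq d/2$. We refer to Assumption 4 and Proposition 19 of [16], where this approach is developed.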

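We also have a log-likelihood in the experiment $\mathcal{G}^{\otimes N}$ by setting

$$\ell^N(\vartheta; X^{(N)}) = \sum_{i=1}^N \int_0^T (c^{-1} b)(\vartheta; t, X_t^i, \mu_t^\vartheta)^\top dX_t^i - \frac{1}{2} \sum_{i=1}^N \int_0^T \big| (c^{-1/2} b)(\vartheta; t, X_t^i, \mu_t^\vartheta) \big|^2 dt. \tag{17}$$

This is the same argument as before: the laws $P_\vartheta^{\otimes N}$ are all absolutely continuous with respect to $W^N$, and for every $\vartheta \in \Theta$,

$$\frac{dP_\vartheta^{\otimes N}}{dW^N}(X^{(N)}) = \exp\big( \ell^N(\vartheta; X^{(N)}) \big)$$

holds $W^N$-almost surely.

Finally, under Assumptions 1, 2, 3 and 4, the (sequence of) experiment(s) $\mathcal{G}^{\otimes N}$ is also a regular model and its (normalised) Fisher information $\mathbb{I}_{\mathcal{G}}(\vartheta) = N^{-1}\,\mathbb{I}_{\mathcal{G}^{\otimes N}}(\vartheta)$ is given by

$$N^{-1}\,\mathbb{E}_{P_\vartheta^{\otimes N}}\big[ \nabla_\vartheta \ell^N(\vartheta; X^{(N)})\,\nabla_\vartheta \ell^N(\vartheta; X^{(N)})^\top \big] = \sum_{j=1}^d \int_0^T \int_{\mathbb{R}^d} \nabla_\vartheta (c^{-1/2} b)^j(\vartheta; t, x, \mu_t^\vartheta)\,\nabla_\vartheta (c^{-1/2} b)^j(\vartheta; t, x, \mu_t^\vartheta)^\top \mu_t^\vartheta(dx)\,dt$$

$$= \Big( \int_0^T \int_{\mathbb{R}^d} \partial_{\vartheta_\ell} (c^{-1/2} b)(\vartheta; t, x, \mu_t^\vartheta)^\top \partial_{\vartheta_{\ell'}} (c^{-1/2} b)(\vartheta; t, x, \mu_t^\vartheta)\,\mu_t^\vartheta(dx)\,dt \Big)_{1 \leq \ell, \ell' \leq p}.$$

Moreover, the mapping $\vartheta \mapsto \mathbb{I}_{\mathcal{G}}(\vartheta)$ is smooth and appears as the (normalised) asymptotic information of $\mathcal{E}^N$:

Proposition 11. Under Assumptions 1, 2, 3 and 4, the mapping $\vartheta \mapsto \mathbb{I}_{\mathcal{G}}(\vartheta)$ is Lipschitz continuous. Moreover, for every $\vartheta$ in (the interior of) $\Theta$, we have

$$N^{-1}\,\mathbb{I}_{\mathcal{E}^N}(\vartheta) \to \mathbb{I}_{\mathcal{G}}(\vartheta)$$

as $N \to \infty$, where $\mathbb{I}_{\mathcal{E}^N}(\vartheta)$ is the Fisher information matrix of the experiment $\mathcal{E}^N$ defined in (11) above.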

The proof is given in Section 6.3.
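The convergence $N^{-1}\,\mathbb{I}_{\mathcal{E}^N}(\vartheta) \to \mathbb{I}_{\mathcal{G}}(\vartheta)$ can be observed numerically in a toy model. The sketch below assumes (this is not an example from the paper) $d = p = 1$, $\sigma \equiv 1$, $\mu_0 = \mathcal{N}(0, 1)$ and the hypothetical drift $b(\vartheta; x, \mu) = -\vartheta (x - \int y\,\mu(dy))$. For this model, a direct computation on the variance of $X_t$ gives $\mathbb{I}_{\mathcal{G}}(\vartheta) = \int_0^T v_t\,dt$ with $v_t = \frac{1}{2\vartheta} + (1 - \frac{1}{2\vartheta}) e^{-2\vartheta t}$, and the same computation on the centred particles gives $N^{-1}\,\mathbb{I}_{\mathcal{E}^N}(\vartheta) = (1 - 1/N)\,\mathbb{I}_{\mathcal{G}}(\vartheta)$. The code compares this with a Monte Carlo evaluation of formula (11) on Euler-simulated systems; all function names are illustrative.

```python
import numpy as np


def limit_information(theta, horizon):
    """I_G(theta) = int_0^T Var(X_t) dt for dX = -theta (X - E[X]) dt + dB, Var(X_0) = 1."""
    a = 2.0 * theta
    return horizon / a + (1.0 - 1.0 / a) * (1.0 - np.exp(-a * horizon)) / a


def particle_information(theta, n_particles, horizon, n_steps=400, n_rep=4000, seed=0):
    """Monte Carlo estimate of N^{-1} I_{E^N}(theta), following formula (11).

    Here d_theta b = -(x - empirical mean) and c = 1, so the integrand is the
    empirical variance (normalised by N) of each simulated system.
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    x = rng.standard_normal((n_rep, n_particles))  # n_rep independent systems
    total = 0.0
    for _ in range(n_steps):
        centred = x - x.mean(axis=1, keepdims=True)
        total += np.mean(centred ** 2) * dt  # average over particles and replicates
        x = x - theta * centred * dt + np.sqrt(dt) * rng.standard_normal(x.shape)
    return total


if __name__ == "__main__":
    exact = limit_information(1.0, 1.0)
    for n in (5, 50):
        print(n, particle_information(1.0, n, 1.0) / exact)  # close to 1 - 1/n
```

In this specific model the normalised information of the particle system differs from its limit by an explicit factor $1 - 1/N$; no such closed form should be expected in general.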

2.4. Identifiability and non-degeneracy of the Fisher information.

Motivation. In the preceding section, we have built $\mathcal{E}^N$ and $\mathcal{G}^{\otimes N}$ (equivalently $\mathcal{G}$) as possibly redundant, in the sense that the mappings $\vartheta \mapsto P^N_\vartheta$ and $\vartheta \mapsto P_\vartheta$ are not necessarily one-to-one on $\Theta$. Having a well-posed parametrisation is required since we wish to have at least consistent estimators. Arguing asymptotically, we only need to work in the limit model $\mathcal{G}$.

Also, asymptotic identifiability is linked to the non-degeneracy of the (normalised) Fisher information matrix $\mathbb{I}_{\mathcal{G}}$. Following [42], see also [48], we say that a point $\vartheta$ in (the interior of) $\Theta$ is regular if $\vartheta' \mapsto \mathbb{I}_{\mathcal{G}}(\vartheta')$ has constant rank in a neighbourhood of $\vartheta$, and the experiment $\mathcal{G}$ is called locally identifiable at $\vartheta$ if the mapping $\vartheta' \mapsto P_{\vartheta'}$ is injective in a neighbourhood of $\vartheta$. We have the following classical result (that goes back at least to Cramér [15]):

Proposition 12 (Theorem 1 in [42]). If $\vartheta$ is regular, then $\mathcal{G}$ is locally identifiable at $\vartheta$ if and only if $\mathbb{I}_{\mathcal{G}}(\vartheta)$ has full rank.

Unfortunately, there is no hope of obtaining a global result that links the two notions except in very specific cases, see Proposition 16 below. We next give ad hoc assumptions that provide sufficient (and independent) conditions for both identifiability and non-degeneracy of the Fisher information.

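An identifiability assumption. We first have a relatively weak assumption that guarantees global identifiability in $\mathcal{G}$.

Assumption 13. For all $\vartheta \in \Theta$, for $P_\vartheta$-almost all $\omega$, for all $\vartheta' \neq \vartheta$, the functions $t \mapsto b(\vartheta; t, X_t(\omega), \mu_t^\vartheta)$ and $t \mapsto b(\vartheta'; t, X_t(\omega), \mu_t^{\vartheta'})$ are not $dt$-a.e. equal.

Assumption 13 is relatively standard in the literature on statistics of random processes and is minimal (see e.g. [19] in a somewhat analogous context). Indeed, by Girsanov's theorem, for two different parameters $\vartheta, \vartheta' \in \Theta$, the laws $P_\vartheta$ and $P_{\vartheta'}$ are mutually absolutely continuous and

$$\log \frac{dP_{\vartheta'}}{dP_\vartheta}(X) = \int_0^T \big( (c^{-1} b)(\vartheta'; s, X_s, \mu_s^{\vartheta'}) - (c^{-1} b)(\vartheta; s, X_s, \mu_s^\vartheta) \big)^\top dX_s - \frac{1}{2} \int_0^T \Big( \big| (c^{-1/2} b)(\vartheta'; s, X_s, \mu_s^{\vartheta'}) \big|^2 - \big| (c^{-1/2} b)(\vartheta; s, X_s, \mu_s^\vartheta) \big|^2 \Big) ds.$$

Having Assumption 13 fail for some $\vartheta'$ implies $\frac{dP_{\vartheta'}}{dP_\vartheta}(X) = 1$ $P_\vartheta$-almost surely, i.e. $P_{\vartheta'} = P_\vartheta$. Assumption 13 may be difficult to check in practice. Yet, it is satisfied as soon as the mapping $\vartheta \mapsto \big( (t, x) \mapsto b(\vartheta; t, x, \mu_t^\vartheta) \big)$ is one-to-one. Also, for certain forms of the likelihood, we have other criteria, see Proposition 16 below.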

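Non-degeneracy of the information. We need some notation. For any $\vartheta, \vartheta' \in \Theta$ such that the segment $[\vartheta, \vartheta'] = \{ \vartheta + \lambda (\vartheta' - \vartheta),\ \lambda \in [0,1] \} \subset \Theta$ and a function $\varphi$ defined on $\Theta$, we set

$$\varphi([\vartheta, \vartheta']) = \int_0^1 \varphi\big( \vartheta + \lambda (\vartheta' - \vartheta) \big)\,d\lambda.$$

Definition 14. The statistical experiment $\mathcal{G}$ is non-degenerate if

$$\inf_{[\vartheta, \vartheta'] \subset \Theta} \det \mathbb{E}_{P_\vartheta}\big[ \nabla_\vartheta \ell^1([\vartheta, \vartheta'])\,\nabla_\vartheta \ell^1([\vartheta, \vartheta'])^\top \big] > 0, \tag{18}$$

where $\det$ denotes the determinant and $\ell^1$ is the log-likelihood (17) with $N = 1$.

Equivalently, we can rewrite (18) as

$$\inf \det \Big( \sum_{j=1}^d \int_0^T \int_{\mathbb{R}^d} \nabla_\vartheta (c^{-1/2} b)^j([\vartheta, \vartheta']; t, x, \mu_t^\vartheta)\,\nabla_\vartheta (c^{-1/2} b)^j([\vartheta, \vartheta']; t, x, \mu_t^\vartheta)^\top \mu_t^\vartheta(dx)\,dt \Big) > 0,$$

where the infimum is taken over all segments $[\vartheta, \vartheta'] \subset \Theta$. Obviously, if $\mathcal{G}$ is non-degenerate, taking $\vartheta' = \vartheta$, Definition 14 boils down to

$$\inf_{\vartheta \in \Theta} \det \mathbb{I}_{\mathcal{G}}(\vartheta) > 0, \tag{19}$$

i.e. $\vartheta \mapsto \mathbb{I}_{\mathcal{G}}(\vartheta)$ has full rank uniformly in $\vartheta$, and we recover the usual non-degeneracy of the Fisher information. The somewhat stronger non-degeneracy criterion that we pick in Definition 14 enables us to check the assumptions of the theory of Ibragimov and Hasminski for obtaining sharp properties for the maximum likelihood estimator (see in particular Step 2 of the proof of Theorem 19 in Section 5.3 below). In explicit examples, proving (18) is no more difficult than proving (19), see Section 4 below.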

Checking (18) or (19) in practice. A special difficulty for the statistical analysis of $\mathcal{E}^N$, or rather $\mathcal{G}$, lies in the asymptotic form (12), with the presence of $(\mu_t^\vartheta)_{0 \leq t \leq T}$ in the drift, which is never explicit, except in very special cases with a specific moment structure in the measure dependence, see Section 4 below.

It is noteworthy that (18) can usually be tested in a simple way given an explicit parametrisation. Indeed, Definition 14 is equivalent to

$$\inf_{[\vartheta, \vartheta'] \subset \Theta}\ \min_{|z| = 1}\ \sum_{j=1}^d \int_0^T \int_{\mathbb{R}^d} \big( \nabla_\vartheta (c^{-1/2} b)^j([\vartheta, \vartheta']; t, x, \mu_t^\vartheta)^\top z \big)^2 \mu_t^\vartheta(dx)\,dt > 0.$$

Under Assumptions 1, 2, 3, we have that $\mu_t^\vartheta(dx) = \mu_t^\vartheta(x)\,dx$ is absolutely continuous on $\mathbb{R}^d$ for $t > 0$, and we may pick a version $\mu_t^\vartheta$ of the density that is continuous and positive on $\mathbb{R}^d$. This follows from classical Gaussian tail estimates for the solution of parabolic equations. We refer for example to Corollary 8.2.2 of [5]. By a simple continuity argument, it is then sufficient to show that there cannot exist a segment $[\vartheta, \vartheta'] \subset \Theta$ and some $|z| = 1$ such that the function

$$x \mapsto \int_0^T \sum_{j=1}^d \big( \nabla_\vartheta (c^{-1/2} b)^j([\vartheta, \vartheta']; t, x, \mu_t^\vartheta)^\top z \big)^2 dt$$

vanishes identically, or, as soon as we have continuity in $t$ as $t \to 0$, that one of the functions

$$x \mapsto \nabla_\vartheta (c^{-1/2} b)^j([\vartheta, \vartheta']; 0, x, \mu_0)^\top z, \quad j = 1, \ldots, d,$$

does not vanish identically. This last criterion has the advantage of avoiding the term $\mu_t^\vartheta$ for $t > 0$.

We gather these observations in the following:

Proposition 15. Work under Assumptions 1, 2, 3 and 4. Assume moreover that the functions

$$t \mapsto \nabla_\vartheta (c^{-1/2} b)^j([\vartheta, \vartheta']; t, x, \mu_t^\vartheta), \quad j = 1, \ldots, d,$$

are all continuous at $t = 0$ for every $[\vartheta, \vartheta'] \subset \Theta$ and almost every $x \in \mathbb{R}^d$. If, for every $[\vartheta, \vartheta'] \subset \Theta$ and any $z \in \mathbb{R}^p$ with $|z| = 1$, one of the functions

$$x \mapsto \nabla_\vartheta (c^{-1/2} b)^j([\vartheta, \vartheta']; 0, x, \mu_0)^\top z, \quad j = 1, \ldots, d, \tag{20}$$

does not vanish identically, then $\mathcal{G}$ is non-degenerate in the sense of Definition 14.

We specifically apply this criterion in the examples of Section 4 and check that the criterion (20) is particularly simple to establish when the dependence of the function $b$ on the measure argument is of the form (13).
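A criterion of the type (20) lends itself to a quick numerical sanity check. The sketch below is a proxy under assumptions that are not in the paper: $d = 1$, $p = 2$, $c \equiv 1$ and the hypothetical drift $b(\vartheta; x, \mu) = -\vartheta_1 x - \vartheta_2 (x - \int y\,\mu(dy))$, which is linear in $\vartheta$, so that its gradient does not depend on the segment $[\vartheta, \vartheta']$. Writing $g(x) = \nabla_\vartheta b(\vartheta; 0, x, \mu_0)$, the smallest eigenvalue of $\sum_x g(x) g(x)^\top$ over a grid of points $x$ is zero exactly when some unit vector $z$ makes $x \mapsto g(x)^\top z$ vanish on the whole grid. The function name `criterion_gap` is illustrative.

```python
import numpy as np


def criterion_gap(mean0, grid=None):
    """Smallest eigenvalue of the averaged Gram matrix of g(x) = grad_theta b(theta; 0, x, mu_0).

    Hypothetical drift: b = -theta_1 x - theta_2 (x - mean(mu)), so that
    g(x) = (-x, -(x - mean0)), where mean0 is the mean of mu_0.
    """
    if grid is None:
        grid = np.linspace(-3.0, 3.0, 61)
    gradients = np.stack([-grid, -(grid - mean0)], axis=1)  # shape (len(grid), 2)
    gram = gradients.T @ gradients / len(grid)
    return float(np.linalg.eigvalsh(gram)[0])  # eigenvalues come in ascending order


if __name__ == "__main__":
    print(criterion_gap(0.0))  # numerically zero: z proportional to (1, -1) annihilates g
    print(criterion_gap(2.0))  # bounded away from zero
```

For a centred $\mu_0$ the two components of the gradient coincide, the proxy vanishes, and the sufficient criterion gives no conclusion; in this toy model the mean of $\mu_t^\vartheta$ then stays at zero, so only $\vartheta_1 + \vartheta_2$ enters the limiting drift.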

A case of equivalence between global identifiability and non-degeneracy of the information. We revisit Theorem 3 in [42] to obtain the following criterion:

Proposition 16. Work under Assumptions 1, 2, 3 and 4. Assume that the log-likelihood $\ell^N(\vartheta; X^{(N)})$ in $\mathcal{E}^N$ defined by (9) has the form

$$\ell^N(\vartheta; X^{(N)}) = \vartheta^\top G^N(X^{(N)}) + \vartheta^\top H^N(X^{(N)})\,\vartheta, \tag{21}$$

where $G^N$ and $H^N$ are functions of the trajectory $X^{(N)}$ with values in $\mathbb{R}^p$ and $\mathbb{R}^p \otimes \mathbb{R}^p$ respectively, and $(H^N)^\top = H^N$ is symmetric. If $\Theta_0 \subset \Theta$ is a convex set such that $\mathbb{I}_{\mathcal{G}}(\vartheta)$ is non-singular for every $\vartheta \in \Theta_0$, then both $(\mathcal{E}^N)_{N \geq 1}$ and $\mathcal{G}$ are identifiable on $\Theta_0$.

By identifiability of the sequence of experiments $(\mathcal{E}^N)_{N \geq 1}$, we mean injectivity of the mapping $\vartheta \mapsto (P^N_\vartheta)_{N \geq 1}$ (i.e. simultaneously for every $N \geq 1$). The proof is given in Section 6.4. In the specific case of McKean type models, which date back to [38, 44, 46] and are widely used in practice (see e.g. [12, 17] or [27] in statistics), we have in some instances a representation like (21) and explicit formulas for $\mathbb{I}_{\mathcal{G}}(\vartheta)$, which gives global identifiability for free as soon as $\mathbb{I}_{\mathcal{G}}(\vartheta)$ is non-degenerate. See the examples in Section 4.

3. MAIN RESULTS

3.1. The LAN property. The local asymptotic normality property of a statistical model characterises its regularity: it expresses the fact that the experiment locally resembles a Gaussian shift in an optimal scale driven by the Fisher information. It has powerful consequences in terms of properties of optimal procedures via the celebrated Hájek convolution theorem [23]. More precisely, the sequence of experiments $(\mathcal{E}^N)_{N \geq 1}$ satisfies the LAN property at $\vartheta \in \Theta$ with information rate $N\,\mathbb{I}_{\mathcal{G}}(\vartheta)$ if

$$\log \frac{dP^N_{\vartheta + (N \mathbb{I}_{\mathcal{G}}(\vartheta))^{-1/2} u}}{dP^N_\vartheta} = u^\top \xi^N_\vartheta - \tfrac{1}{2} |u|^2 + r_N(\vartheta, u), \tag{22}$$

where $\xi^N_\vartheta$ converges in distribution under $P^N_\vartheta$ to a standard Gaussian variable in $\mathbb{R}^p$ and $r_N(\vartheta, u) \to 0$ in $P^N_\vartheta$-probability. Of course, the convergence (22) is meaningful only if $\vartheta + (N\,\mathbb{I}_{\mathcal{G}}(\vartheta))^{-1/2} u \in \Theta$ and is well defined, i.e. if $\det \mathbb{I}_{\mathcal{G}}(\vartheta) > 0$. This is granted for instance for $\vartheta$ in the interior of $\Theta$ for large enough $N$ and under (19).
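The expansion (22) can be made tangible on a toy model in which the log-likelihood is exactly quadratic in the parameter. The sketch below assumes (this is not an example from the paper) $d = p = 1$, $\sigma \equiv 1$, $\mu_0 = \mathcal{N}(0, 1)$ and the hypothetical drift $b(\vartheta; x, \mu) = -\vartheta (x - \int y\,\mu(dy))$. With $S_N = \sum_i \int_0^T \partial_\vartheta b\,dB_t^{i,N,\vartheta}$ and $J_N = \sum_i \int_0^T |\partial_\vartheta b|^2 dt$, the log-likelihood ratio at $\vartheta + h$ equals $h S_N - \frac{1}{2} h^2 J_N$, so that in (22) one may take $\xi^N_\vartheta = S_N / \sqrt{N \mathbb{I}_{\mathcal{G}}(\vartheta)}$ and $r_N = -\frac{1}{2} u^2 \big( J_N / (N \mathbb{I}_{\mathcal{G}}(\vartheta)) - 1 \big)$. The code simulates many independent systems and returns both quantities; the closed form used for $\mathbb{I}_{\mathcal{G}}$ is specific to this toy model, and the function names are illustrative.

```python
import numpy as np


def lan_statistics(theta, n_particles=100, horizon=1.0, n_steps=200, n_rep=1000, seed=0):
    """Return (xi, ratio) over n_rep independent Euler-simulated particle systems.

    xi    : normalised score S_N / sqrt(N I_G), expected to be close to N(0, 1)
    ratio : J_N / (N I_G), expected to be close to 1
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    a = 2.0 * theta
    # I_G(theta) for this toy model: integral over [0, T] of Var(X_t), Var(X_0) = 1.
    info = horizon / a + (1.0 - 1.0 / a) * (1.0 - np.exp(-a * horizon)) / a
    x = rng.standard_normal((n_rep, n_particles))
    score = np.zeros(n_rep)
    observed = np.zeros(n_rep)
    for _ in range(n_steps):
        centred = x - x.mean(axis=1, keepdims=True)
        db = np.sqrt(dt) * rng.standard_normal(x.shape)  # Brownian increments under P_theta
        score += np.sum(-centred * db, axis=1)           # d_theta b = -(x - mean)
        observed += np.sum(centred ** 2, axis=1) * dt
        x = x - theta * centred * dt + db
    xi = score / np.sqrt(n_particles * info)
    ratio = observed / (n_particles * info)
    return xi, ratio


def log_likelihood_ratio(u, xi, ratio):
    """Log-likelihood ratio at theta + u / sqrt(N I_G) in the quadratic toy model."""
    return u * xi - 0.5 * u ** 2 * ratio


if __name__ == "__main__":
    xi, ratio = lan_statistics(1.0)
    print(xi.mean(), xi.var(), ratio.mean())
```

The remainder $r_N$ is then $-\frac{1}{2} u^2 (\texttt{ratio} - 1)$, which is small when the empirical variance of the particles concentrates around its limit.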

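Theorem 17. Work under Assumptions 1, 2, 3, 4 and 13. Assume moreover that $\mathcal{G}$ is non-degenerate according to Definition 14. For every $\vartheta$ in (the interior of) $\Theta$, the sequence of experiments $(\mathcal{E}^N)_{N \geq 1}$ is locally asymptotically normal at $\vartheta$ with information rate $N\,\mathbb{I}_{\mathcal{G}}(\vartheta)$. The same result holds for $(\mathcal{G}^{\otimes N})_{N \geq 1}$.

Several remarks are in order:

1) Theorem 17 is the most powerful result one can obtain about the structure of $(\mathcal{E}^N)_{N \geq 1}$ and $(\mathcal{G}^{\otimes N})_{N \geq 1}$: it tells us that around a given point $\vartheta_0$, if we parametrise the experiment locally via $\vartheta = \vartheta_0 + N^{-1/2} u$, with $u \in \mathbb{R}^p$ being the unknown parameter, then the experiments look like the simplest possible experiment, namely a Gaussian shift

$$Y^N = u + \mathbb{I}_{\mathcal{G}}(\vartheta_0)^{-1/2}\,\xi + o(1),$$

where $\xi$ is a standard normal $\mathcal{N}(0, \mathrm{Id}_{\mathbb{R}^p})$ and $o(1)$ is a small term that vanishes in $P^N_\vartheta$ or $P_\vartheta^{\otimes N}$ probability, locally uniformly in $u$.

2) The fact that both $(\mathcal{E}^N)_{N \geq 1}$ and $(\mathcal{G}^{\otimes N})_{N \geq 1}$ share the LAN property with the same asymptotic Fisher information quantifies their asymptotic similarity, see in particular Proposition 10.

3) The LAN property has several consequences in terms of strong properties of the maximum likelihood estimator, see Theorem 19 below. In particular, the first simple consequence is given in terms of exact asymptotic minimax lower bounds: we call a centrally symmetric function $w : \mathbb{R}^p \to [0, \infty)$ such that the sets $\{ w < c \}$, $c > 0$, are all convex a polynomial loss function if it admits a polynomial majorant.

Corollary 18. In the setting of Theorem 17, let $w$ be a polynomial loss function. Then, for any estimator $\widehat{\vartheta}_N$ in $\mathcal{E}^N$ and any sufficiently small $\delta > 0$, for every $\vartheta$ in (the interior of) $\Theta$ for which $\det \mathbb{I}_{\mathcal{G}}(\vartheta) > 0$, we have

$$\liminf_{N \to \infty}\ \sup_{|\vartheta' - \vartheta| \leq \delta} \mathbb{E}_{P^N_{\vartheta'}}\Big[ w\big( N^{1/2}\,\mathbb{I}_{\mathcal{G}}(\vartheta)^{1/2} (\widehat{\vartheta}_N - \vartheta') \big) \Big] \geq (2\pi)^{-p/2} \int_{\mathbb{R}^p} w(x) \exp\big( -\tfrac{1}{2} |x|^2 \big)\,dx.$$

The same result holds true for $\widehat{\vartheta}_N$ in $\mathcal{G}^{\otimes N}$, replacing $P^N_\vartheta$ by $P_\vartheta^{\otimes N}$.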


Corollary 18 is a simple application of the Hájek convolution theorem, given the LAN property of Theorem 17, see e.g. Theorem II.12.1 (and in particular Remark III.12.1) in [24]. It provides a sharp local asymptotic minimax bound, up to constants. We shall see below that the maximum likelihood estimator achieves this bound.
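For concreteness, the right-hand side of the bound in Corollary 18 can be evaluated numerically. The sketch below handles only the case $p=1$ with a plain trapezoidal rule; the function name `minimax_bound_1d` and its parameters are illustrative choices and do not come from the paper.

```python
import numpy as np


def minimax_bound_1d(loss, half_width=12.0, n_grid=200_001):
    """Approximate (2*pi)^(-1/2) * integral of w(x) * exp(-x^2 / 2) dx for p = 1.

    `loss` is a vectorised loss function w; the integral is truncated to
    [-half_width, half_width], where the Gaussian weight is negligible
    for losses of polynomial growth.
    """
    x = np.linspace(-half_width, half_width, n_grid)
    y = loss(x) * np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    dx = x[1] - x[0]
    # Composite trapezoidal rule written out by hand.
    return float(dx * (y.sum() - 0.5 * (y[0] + y[-1])))


if __name__ == "__main__":
    print("quadratic loss:", minimax_bound_1d(lambda x: x**2))
    print("absolute loss: ", minimax_bound_1d(np.abs))
```

For the quadratic loss $w(x)=x^2$ the bound is the variance of a standard Gaussian, namely 1, and for $w(x)=|x|$ it is $\sqrt{2/\pi}$.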

3.2. Maximum likelihood estimation and properties. We elaborate on the properties of the maximum likelihood estimator by relying on (a uniform version of) the LAN property of Theorem 17. It implies several fine results that go beyond the usual asymptotic weak expansions obtained from an ad hoc study of the form of the estimator, as is usually the case in the literature.

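Theorem 19. Work under Assumptions 1, 2, 3, 4 and 13. Then, for large enough $N$, the solution $\widehat{\vartheta}^{\,\mathrm{mle}}_N$ to

$$L^N(\widehat{\vartheta}^{\,\mathrm{mle}}_N;X^{(N)}) = \sup_{\vartheta\in\Theta} L^N(\vartheta;X^{(N)}) \tag{23}$$

is well-defined. Moreover, the following asymptotic upper bounds are valid: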

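(i) if $G$ is non-degenerate in the sense of Definition 14,

$$\sqrt{N}\big(\widehat{\vartheta}^{\,\mathrm{mle}}_N-\vartheta\big)\to\mathcal{N}\big(0,I_G(\vartheta)^{-1}\big)$$

in $P^N_\vartheta$-distribution as $N\to\infty$.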

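(ii) For every polynomial loss function $w$ and any $\vartheta$ in the interior of $\Theta$, we have exact local asymptotic minimax optimality:

$$\limsup_{N\to\infty}\ \sup_{|\vartheta'-\vartheta|\leq\delta} E_{P^N_{\vartheta'}}\Big[w\big(N^{1/2}I_G(\vartheta)^{1/2}(\widehat{\vartheta}^{\,\mathrm{mle}}_N-\vartheta')\big)\Big] \to (2\pi)^{-p/2}\int_{\mathbb{R}^p} w(x)\exp\big(-\tfrac{1}{2}|x|^2\big)\,dx$$

as $\delta\to 0$.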

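(iii) For every polynomial loss function $w$ and any (non-empty) open set $\Theta_0\subset\Theta$, we have global asymptotic minimax optimality:

$$R^N_w(\widehat{\vartheta}^{\,\mathrm{mle}}_N;\Theta_0) = \inf_{\widehat{\vartheta}_N} R^N_w(\widehat{\vartheta}_N;\Theta_0)\,(1+o(1))$$

as $N\to\infty$, where

$$R^N_w(\widehat{\vartheta}_N;\Theta_0) = \sup_{\vartheta\in\Theta_0} E_{P^N_\vartheta}\Big[w\big(N^{1/2}I_G(\vartheta)^{1/2}(\widehat{\vartheta}_N-\vartheta)\big)\Big].$$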

Some further remarks: 1) We recover in (i) the classical asymptotic properties of the maximum likelihood estimator that are given in the literature, but the result is supplemented by the much stronger convergence in (ii), which matches in particular the lower bound of Corollary 18. 2) We finally obtain global asymptotic minimax optimality in (iii), which is the parametric analogue (in a much more precise way) of our minimax results of Section 4 in [16] in the nonparametric case.

4. EXAMPLES

In this section, we elaborate on specific examples that appear in the literature and in applications. We first revisit the linear McKean model studied at length in [27]. In Section 4.1, we slightly extend example (1.3) of [27] from $p=2$ to $p=3$. In Section 4.2, we develop an example of a generalised linear form and show in particular how our identifiability and non-degeneracy criteria of Section 2.4 are easily implementable and avoid the machinery of [27]. In Section 4.3, we develop a non-trivial example of a kinetic mean-field model with a double layer potential that may serve in many applications, like swarming models or more general individual-based models, see [6] and the references therein. We finally develop a genuinely non-linear example, i.e. when the measure argument is not linear like in (4), as for instance in the examples of [41]. Assumption 1 is in force throughout.

4.1. McKean-like models. In many applications, (2) takes the explicit form

$$dX^i_t = (\vartheta_1 X^i_t+\vartheta_2)\,dt-\vartheta_3 N^{-1}\sum_{j=1}^N (X^i_t-X^j_t)\,dt+dB^i_t,\quad i=1,\dots,N, \tag{24}$$

with $X^i_t\in\mathbb{R}$. The parameter is $\vartheta=(\vartheta_1\ \vartheta_2\ \vartheta_3)$. In [27] the case $\vartheta_2=0$ is studied at length in particular. In our setting, we can encompass a more general situation with $X^i_t\in\mathbb{R}^d$ for some arbitrary $d\geq 1$ and replace $\vartheta_3$ by a parameter in $\mathbb{R}^d\otimes\mathbb{R}^d$ as well as $\vartheta_2$ by a parameter in $\mathbb{R}^d$. In this case, Assumptions 2, 3 and 4 are readily checked. Likewise, the identifiability and non-degeneracy assumptions can be obtained with some extra care on the initial condition. We elaborate on a specific case below.

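Likelihood equations. To keep notation simple, we detail the case $p=3$ with $\vartheta=(\vartheta_1\ \vartheta_2\ \vartheta_3)\in\Theta$, where $\Theta$ is a compact subset of $\mathbb{R}^3$, for an ambient dimension $d=1$, with $\vartheta_1\neq\vartheta_3$ and $\vartheta_1\neq 0$. Introduce

$$A^N_t(x)=\begin{pmatrix} x^2 & x & -\langle\cdot-x,\mu^{(N)}_t\rangle^2\\ x & 1 & 0\\ -\langle\cdot-x,\mu^{(N)}_t\rangle^2 & 0 & \langle\cdot-x,\mu^{(N)}_t\rangle^2\end{pmatrix},\qquad B^N_t(x)=\begin{pmatrix} x\\ 1\\ \langle\cdot-x,\mu^{(N)}_t\rangle\end{pmatrix},$$

where we use the bracket notation $\langle\cdot,\nu\rangle$ to denote integration w.r.t. the measure $\nu$. Define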

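$$A^N_T=\int_0^T\langle A^N_t(x),\mu^{(N)}_t\rangle\,dt \tag{25}$$

and

$$B^N_T=N^{-1}\sum_{i=1}^N\int_0^T B^N_t(X^i_t)\,dX^i_t. \tag{26}$$

Thanks to the linearity in $\vartheta$ of the drift $b(\vartheta;t,x,\nu)=\vartheta_1x+\vartheta_2-\vartheta_3\int_{\mathbb{R}}(x-y)\,\nu(dy)$, the likelihood equations are explicit and the maximum likelihood estimator $\widehat{\vartheta}^{\,\mathrm{mle}}_N$ solves

$$A^N_T\,\widehat{\vartheta}^{\,\mathrm{mle}}_N=B^N_T. \tag{27}$$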
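The linear system (27) has a least-squares structure, which the following derivation makes visible. It is only a sketch, under two assumptions that are not restated in this section: that $\mu^{(N)}_t$ denotes the empirical measure of the particles at time $t$, and that the log-likelihood of the unit-diffusion system (24) has the usual Girsanov form. The symbol $\ell^N$ is introduced here for illustration only.

```latex
% Assumption: Girsanov-type log-likelihood, normalised by N, unit diffusion,
% with the drift written as b(\vartheta;t,x,\mu^{(N)}_t) = \vartheta^\top B^N_t(x).
\[
\ell^N(\vartheta)
 = \frac1N\sum_{i=1}^N\int_0^T \vartheta^\top B^N_t(X^i_t)\,dX^i_t
 - \frac{1}{2N}\sum_{i=1}^N\int_0^T \big(\vartheta^\top B^N_t(X^i_t)\big)^2\,dt .
\]
% Setting the gradient in \vartheta to zero gives the normal equations:
\[
\Big(\int_0^T \big\langle B^N_t(x)\,B^N_t(x)^\top,\ \mu^{(N)}_t\big\rangle\,dt\Big)\,\vartheta = B^N_T .
\]
% For any probability measure \mu on \mathbb{R} with mean m and finite second moment:
\[
\int \langle\cdot-x,\mu\rangle\,\mu(dx)=0,
\qquad
\int x\,\langle\cdot-x,\mu\rangle\,\mu(dx)
 = m^2-\int x^2\,\mu(dx)
 = -\int \langle\cdot-x,\mu\rangle^2\,\mu(dx).
\]
% Hence \langle B^N_t B^{N\top}_t, \mu^{(N)}_t\rangle = \langle A^N_t, \mu^{(N)}_t\rangle,
% and the normal equations coincide with (25)-(27).
```

The two centring identities explain the zero entry and the repeated term $\langle\cdot-x,\mu^{(N)}_t\rangle^2$ in $A^N_t(x)$: after integration against $\mu^{(N)}_t$, the cross terms of the outer product reduce to minus the variance and to zero.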

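Moreover, the Fisher information matrix is given by

$$I_G(\vartheta)=\int_0^T\langle A_t(\vartheta;x),\mu^\vartheta_t\rangle\,dt,$$

with

$$A_t(\vartheta;x)=\begin{pmatrix} x^2 & x & -\langle\cdot-x,\mu^\vartheta_t\rangle^2\\ x & 1 & 0\\ -\langle\cdot-x,\mu^\vartheta_t\rangle^2 & 0 & \langle\cdot-x,\mu^\vartheta_t\rangle^2\end{pmatrix}.$$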
