
HAL Id: hal-03261827, https://hal.archives-ouvertes.fr/hal-03261827. Preprint submitted on 16 Jun 2021.



CENTRAL LIMIT THEOREM FOR BIFURCATING MARKOV CHAINS UNDER L²-ERGODIC CONDITIONS

S. VALÈRE BITSEKI PENDA AND JEAN-FRANÇOIS DELMAS

Abstract. Bifurcating Markov chains (BMC) are Markov chains indexed by a full binary tree representing the evolution of a trait along a population where each individual has two children. We provide a central limit theorem for additive functionals of BMC under L²-ergodic conditions with three different regimes. This completes the pointwise approach developed in a previous work. As an application, we study the elementary case of the symmetric bifurcating autoregressive process, which justifies the non-trivial hypothesis considered on the transition kernel of the BMC. We illustrate in this example the phase transition observed in the fluctuations.

Keywords: Bifurcating Markov chains, bifurcating auto-regressive process, binary trees, fluctuations for tree indexed Markov chain, density estimation.

Mathematics Subject Classification (2020): 60J05, 60F05, 60J80.

1. Introduction

Bifurcating Markov chains (BMC) are a class of stochastic processes indexed by a regular binary tree which satisfy the branching Markov property (see below for a precise definition). This model represents the evolution of a trait along a population where each individual has two children.

We refer to [4] for references on this subject. The recent study of BMC models was motivated by the understanding of the cell division mechanism (where the trait of an individual is given by its growth rate). The first BMC model, the “symmetric” bifurcating auto-regressive process (BAR), see Section 4.1 for more details in a Gaussian framework, was introduced by Cowan & Staudte [6] in order to analyze cell lineage data. In [8], Guyon studied the “asymmetric” BAR in order to prove statistical evidence of aging in Escherichia coli.

In this paper, our objective is to establish a central limit theorem for additive functionals of BMC. This will be done for the class of functions which belong to L⁴(µ), where µ is the invariant probability measure associated to the auxiliary Markov chain given by the genealogical evolution of an individual taken at random in the population. This paper completes the pointwise approach developed in [4] in a very close framework. Let us emphasize that the L²-approach is an important step toward the kernel approximation of the densities of the transition kernel of the BMC and of the invariant probability measure µ, which will be developed in a companion paper. The main contribution of this paper, with respect to [4], is the derivation of a non-trivial hypothesis on the transition kernel given in Assumption 2.4 (i). More precisely, let the random variable (X, Y, Z) model the trait of the mother, X, and the traits of its two children, Y and Z. Notice that we do not assume that, conditionally on X, the random variables Y and Z are independent or have the same distribution. In this setting, µ is the distribution of an individual picked at random in the stationary regime. From an ergodic point of view, it would be natural to assume some L²(µ) continuity, in the sense that for some finite constant M and all functions f and g:

E_{X∼µ}[f(Y)² g(Z)²] ≤ M E_{Y∼µ}[f(Y)²] E_{Z∼µ}[g(Z)²],

where E_{W∼µ} means that the random variable W has distribution µ. However, this condition is not always true, even in the simplest case of the symmetric BAR model; see the comments in Remark 2.5 and the detailed computation in Section 4. This motivates the introduction of Assumption 2.4 (i), which allows us to recover the results from [4] in the context of the L² approach, and in particular the three regimes: the sub-critical, critical and super-critical regimes. Since the results are similar and the proofs follow the same steps, we only provide a detailed proof in the sub-critical case. To finish, let us mention that the numerical study on the symmetric BAR, see Section 4.2, illustrates the phase transitions for the fluctuations. We also provide an example where the asymptotic variance in the critical regime is 0; this happens when the considered function is orthogonal to the second eigenspace of the associated Markov chain.

The paper is organized as follows. In Section 2, we present the model and give the assumptions: we introduce the BMC model in Section 2.1, we give the assumptions under which our results will be stated in Section 2.2, and we give some useful notations in Section 2.3. In Section 3, we state our main results: the sub-critical case in Section 3.1, the critical case in Section 3.2 and the super-critical case in Section 3.3. In Section 4, we study the special case of the symmetric BAR process. The proofs of the results in the sub-critical case, given in Section 5, which are in the same spirit as [4], rely essentially on explicit second moment computations and precise upper bounds on fourth moments for BMC, which are recalled in Section 6. The proof of the results in the critical case is an adaptation of the sub-critical case, in the same spirit as in [4]; the interested reader can find the details in [3]. The proof of the results in the super-critical case does not involve the original Assumption 2.4 (i); it is not reproduced here as it is very close to its counterpart in [4].

2. Models and assumptions

2.1. Bifurcating Markov chain: the model. We denote by N the set of non-negative integers and set N* = N \ {0}. If (E, 𝒮) is a measurable space, then B(E) (resp. B_b(E), resp. B₊(E)) denotes the set of (resp. bounded, resp. non-negative) R-valued measurable functions defined on E. For f ∈ B(E), we set ‖f‖ = sup{|f(x)|, x ∈ E}. For a finite measure λ on (E, 𝒮) and f ∈ B(E), we shall write ⟨λ, f⟩ for ∫ f(x) dλ(x) whenever this integral is well defined. For p ≥ 1 and f ∈ B(E), we set ‖f‖_{L^p(λ)} = ⟨λ, |f|^p⟩^{1/p} and we define the space L^p(λ) = {f ∈ B(E); ‖f‖_{L^p(λ)} < +∞} of p-integrable functions with respect to λ. For n ∈ N*, the product space Eⁿ is endowed with the product σ-field.

Let (S, 𝒮) be a measurable space. Let Q be a probability kernel on S × 𝒮, that is: Q(·, A) is measurable for all A ∈ 𝒮, and Q(x, ·) is a probability measure on (S, 𝒮) for all x ∈ S. For any f ∈ B_b(S), we set for x ∈ S:

(1) (Qf)(x) = ∫_S f(y) Q(x, dy).

We define (Qf), or simply Qf, for f ∈ B(S) as soon as the integral (1) is well defined, and we have Qf ∈ B(S). For n ∈ N, we denote by Qⁿ the n-th iterate of Q, defined by Q⁰ = Id, the identity map on B(S), and Q^{n+1}f = Qⁿ(Qf) for f ∈ B_b(S).

Let P be a probability kernel on S × 𝒮^{⊗2}, that is: P(·, A) is measurable for all A ∈ 𝒮^{⊗2}, and P(x, ·) is a probability measure on (S², 𝒮^{⊗2}) for all x ∈ S. For any g ∈ B_b(S³) and h ∈ B_b(S²), we set for x ∈ S:

(2) (Pg)(x) = ∫_{S²} g(x, y, z) P(x, dy, dz) and (Ph)(x) = ∫_{S²} h(y, z) P(x, dy, dz).

We define (Pg) (resp. (Ph)), or simply Pg for g ∈ B(S³) (resp. Ph for h ∈ B(S²)), as soon as the corresponding integral (2) is well defined, and we have that Pg and Ph belong to B(S).

We now introduce some notations related to the regular binary tree. We set T₀ = G₀ = {∅}, G_k = {0,1}^k and T_k = ⋃_{0≤r≤k} G_r for k ∈ N*, and T = ⋃_{r∈N} G_r. The set G_k corresponds to the k-th generation, T_k to the tree up to the k-th generation, and T to the complete binary tree. For i ∈ T, we denote by |i| the generation of i (|i| = k if and only if i ∈ G_k) and set iA = {ij; j ∈ A} for A ⊂ T, where ij is the concatenation of the two sequences i, j ∈ T, with the convention that ∅i = i∅ = i.

We recall the definition of a bifurcating Markov chain from [8].

Definition 2.1. We say a stochastic process indexed by T, X = (X_i, i ∈ T), is a bifurcating Markov chain (BMC) on a measurable space (S, 𝒮) with initial probability distribution ν on (S, 𝒮) and probability kernel P on S × 𝒮^{⊗2} if:

- (Initial distribution.) The random variable X_∅ is distributed as ν.
- (Branching Markov property.) For a sequence (g_i, i ∈ T) of functions belonging to B_b(S³), we have for all k ≥ 0:

E[ ∏_{i∈G_k} g_i(X_i, X_{i0}, X_{i1}) | σ(X_j; j ∈ T_k) ] = ∏_{i∈G_k} P g_i(X_i).

Let X = (X_i, i ∈ T) be a BMC on a measurable space (S, 𝒮) with initial probability distribution ν and probability kernel P. We define three probability kernels P₀, P₁ and Q on S × 𝒮 by:

P₀(x, A) = P(x, A × S), P₁(x, A) = P(x, S × A) for (x, A) ∈ S × 𝒮, and Q = (P₀ + P₁)/2.

Notice that P₀ (resp. P₁) is the restriction to S of the first (resp. second) marginal of P. Following [8], we introduce an auxiliary Markov chain Y = (Y_n, n ∈ N) on (S, 𝒮) with Y₀ distributed as X_∅ and transition kernel Q. The distribution of Y_n corresponds to the distribution of X_I, where I is chosen independently of X and uniformly at random in generation G_n. We shall write E_x when X_∅ = x (i.e. when the initial distribution ν is the Dirac mass at x ∈ S).

We end this section with a useful inequality and the Gaussian BAR model.

Remark 2.2. By convention, for f, g ∈ B(S), we define the function f ⊗ g ∈ B(S²) by (f ⊗ g)(x, y) = f(x)g(y) for x, y ∈ S, and introduce the notations:

f ⊗_sym g = (f ⊗ g + g ⊗ f)/2 and f^{⊗2} = f ⊗ f.

Notice that P(g ⊗_sym 1) = Q(g) for g ∈ B₊(S). For f ∈ B₊(S), as f ⊗ f ≤ f² ⊗_sym 1 (since f(x)f(y) ≤ (f(x)² + f(y)²)/2), we get:

(3) P(f^{⊗2}) = P(f ⊗ f) ≤ P(f² ⊗_sym 1) = Q(f²).

Example 2.3 (Gaussian bifurcating autoregressive process). We will consider the real-valued Gaussian bifurcating autoregressive process (BAR) X = (X_u, u ∈ T), where for all u ∈ T:

X_{u0} = a₀ X_u + b₀ + ε_{u0} and X_{u1} = a₁ X_u + b₁ + ε_{u1},

with a₀, a₁ ∈ (−1, 1), b₀, b₁ ∈ R, and ((ε_{u0}, ε_{u1}), u ∈ T) an independent sequence of bivariate Gaussian N(0, Γ) random vectors independent of X_∅, with covariance matrix, for σ > 0 and ρ ∈ R such that |ρ| ≤ σ²:

Γ = ( σ²  ρ
      ρ   σ² ).

Then the process X = (X_u, u ∈ T) is a BMC with transition probability P given by:

P(x, dy, dz) = (2π √(σ⁴ − ρ²))^{-1} exp( −(σ²/(2(σ⁴ − ρ²))) g(x, y, z) ) dy dz,

with:

g(x, y, z) = (y − a₀x − b₀)² − 2ρσ^{-2}(y − a₀x − b₀)(z − a₁x − b₁) + (z − a₁x − b₁)².

The transition kernel Q of the auxiliary Markov chain is defined by:

Q(x, dy) = (2 √(2πσ²))^{-1} ( e^{−(y−a₀x−b₀)²/(2σ²)} + e^{−(y−a₁x−b₁)²/(2σ²)} ) dy.

2.2. Assumptions. We assume that µ is an invariant probability measure for Q.

We first state some regularity assumptions on the kernels P and Q and on the invariant measure µ which we will use later on. Notice first that by Cauchy-Schwarz we have for f, g ∈ L⁴(µ):

|P(f ⊗ g)|² ≤ P(f² ⊗ 1) P(1 ⊗ g²) ≤ 4 Q(f²) Q(g²),

so that, as µ is an invariant measure of Q:

(4) ‖P(f ⊗ g)‖_{L²(µ)} ≤ 2 ‖Q(f²)‖^{1/2}_{L²(µ)} ‖Q(g²)‖^{1/2}_{L²(µ)} ≤ 2 ‖f‖_{L⁴(µ)} ‖g‖_{L⁴(µ)},

and similarly for f, g ∈ L²(µ):

(5) ⟨µ, P(f ⊗ g)⟩ ≤ 2 ‖f‖_{L²(µ)} ‖g‖_{L²(µ)}.

We shall in fact assume that P (in fact only its symmetrized version) is, in a sense, an L²(µ) operator; see also Remark 2.5 below.

Assumption 2.4. There exists an invariant probability measure, µ, for the Markov transition kernel Q.

(i) There exists a finite constant M such that for all f, g, h ∈ L²(µ):

(6) ‖P(Qf ⊗_sym Qg)‖_{L²(µ)} ≤ M ‖f‖_{L²(µ)} ‖g‖_{L²(µ)},
(7) ‖P(P(Qf ⊗_sym Qg) ⊗_sym Qh)‖_{L²(µ)} ≤ M ‖f‖_{L²(µ)} ‖g‖_{L²(µ)} ‖h‖_{L²(µ)},
(8) ‖P(f ⊗_sym Qg)‖_{L²(µ)} ≤ M ‖f‖_{L⁴(µ)} ‖g‖_{L²(µ)}.

(ii) There exists k₀ ∈ N such that the probability measure νQ^{k₀} has a bounded density, say ν₀, with respect to µ. That is:

νQ^{k₀}(dy) = ν₀(y) µ(dy) and ‖ν₀‖ < +∞.

Remark 2.5. Let µ be an invariant probability measure of Q. If there exists a finite constant M such that for all f, g ∈ L²(µ):

(9) ‖P(f ⊗ g)‖_{L²(µ)} ≤ M ‖f‖_{L²(µ)} ‖g‖_{L²(µ)},

then we deduce that (6), (7) and (8) hold. Condition (9) is much more natural and simpler than the latter ones, and it allows us to give shorter proofs. However, Condition (9) appears to be too strong, even in the simplest case of the symmetric BAR model developed in Example 2.3 with a₀ = a₁ and b₀ = b₁. Let a denote the common value of a₀ and a₁. In fact, according to the value of a ∈ (−1, 1) in the symmetric BAR model, there exists k₁ ∈ N such that for all f, g ∈ L²(µ):

(10) ‖P(Q^{k₁}f ⊗ Q^{k₁}g)‖_{L²(µ)} ≤ M ‖f‖_{L²(µ)} ‖g‖_{L²(µ)},

with k₁ increasing with |a|. Since Assumption 2.4 (i) is only necessary for the asymptotic normality in the case |a| ∈ [0, 1/√2] (corresponding to the sub-critical and critical regimes), it will be enough to consider k₁ = 1 (but it is not sufficient to consider k₁ = 0). For this reason, we consider (6), that is, (10) with k₁ = 1. A similar remark holds for (7) and (8). In a sense, Condition (10) (and the similar extensions of (7) and (8)) is in the same spirit as item (ii) of Assumption 2.4: one uses iterates of Q to get smoothness on the kernel P and on the initial distribution ν.

Remark 2.6. Let µ be an invariant probability measure of Q, and assume that the transition kernel P has a density, denoted by p, with respect to the measure µ^{⊗2}, that is: P(x, dy, dz) = p(x, y, z) µ(dy) µ(dz) for all x ∈ S. Then the transition kernel Q has a density, denoted by q, with respect to µ, that is: Q(x, dy) = q(x, y) µ(dy) for all x ∈ S, with q(x, y) = 2^{-1} ∫_S (p(x, y, z) + p(x, z, y)) µ(dz). We set:

(11) h(x) = ( ∫_S q(x, y)² µ(dy) )^{1/2}.

Assume that:

(12) ‖P(h^{⊗2})‖_{L²(µ)} < +∞,
(13) ‖P(P(h^{⊗2}) ⊗_sym h)‖_{L²(µ)} < +∞,

and that there exists a finite constant C such that for all f ∈ L⁴(µ):

(14) ‖P(f ⊗_sym h)‖_{L²(µ)} ≤ C ‖f‖_{L⁴(µ)}.

Since |Qf| ≤ ‖f‖_{L²(µ)} h, we deduce that (12), (13) and (14) imply respectively (6), (7) and (8).

We consider the following ergodic property of Q, which in particular implies that µ is indeed the unique invariant probability measure for Q. We refer to [7, Section 22] for a detailed account of L²(µ)-ergodicity (and in particular to Definition 22.2.2 therein on exponentially convergent Markov kernels).

Assumption 2.7. The Markov kernel Q has a (unique) invariant probability measure µ, and Q is L²(µ) exponentially convergent, that is, there exist α ∈ (0, 1) and a finite constant M such that for all f ∈ L²(µ):

(15) ‖Qⁿf − ⟨µ, f⟩‖_{L²(µ)} ≤ M αⁿ ‖f‖_{L²(µ)} for all n ∈ N.

We consider the following stronger ergodic property based on a second spectral gap. (Notice in particular that Assumption 2.8 implies Assumption 2.7.)

Assumption 2.8. The Markov kernel Q has a (unique) invariant probability measure µ, and there exist α ∈ (0, 1), a finite non-empty set J of indices, distinct complex eigenvalues {α_j, j ∈ J} of the operator Q with |α_j| = α, non-zero complex projectors {R_j, j ∈ J} defined on L²_C(µ), the C-vector space spanned by L²(µ), such that R_j ∘ R_{j'} = R_{j'} ∘ R_j = 0 for all j ≠ j' (so that Σ_{j∈J} R_j is also a projector defined on L²_C(µ)), and a positive sequence (β_n, n ∈ N) converging to 0, such that for all f ∈ L²(µ), with θ_j = α_j/α:

(16) ‖Qⁿf − ⟨µ, f⟩ − αⁿ Σ_{j∈J} θ_jⁿ R_j(f)‖_{L²(µ)} ≤ β_n αⁿ ‖f‖_{L²(µ)} for all n ∈ N.


Assumptions 2.7 and 2.8, stated in an L² framework, correspond to [4, Assumptions 2.4 and 2.6], stated in a pointwise framework. The structural Assumption 2.4 on the transition kernel P replaces the structural assumption [4, Assumption 2.2] on the set of considered functions.

Remark 2.9. Assume that Q has a density q with respect to an invariant probability measure µ such that h ∈ L²(µ), where h is defined in (11), that is:

∫_{S²} q(x, y)² µ(dx) µ(dy) < +∞.

Then the operator Q is a non-negative Hilbert-Schmidt operator (and hence a compact operator) on L²(µ). It is well known that in this case, except for the possible value 0, the spectrum of Q is equal to the set σ_p(Q) of eigenvalues of Q; σ_p(Q) is a countable set with 0 as the only possible accumulation point, and for all λ ∈ σ_p(Q) \ {0}, the eigenspace associated to λ is finite-dimensional (we refer e.g. to [2, Chap. 4] for more details). In particular, if 1 is the only eigenvalue of Q with modulus 1 and if it has multiplicity 1 (that is, the corresponding eigenspace is reduced to the constant functions), then Assumptions 2.7 and 2.8 also hold. Let us mention that q(x, y) > 0 µ(dx)⊗µ(dy)-a.s. is a standard condition which implies that 1 is the only eigenvalue of Q with modulus 1 and that it has multiplicity 1; see for example [1].

2.3. Notations for averages of different functions over different generations. Let X = (X_u, u ∈ T) be a BMC on (S, 𝒮) with initial probability distribution ν and probability kernel P. Recall that Q is the induced Markov kernel. We shall assume that µ is an invariant probability measure of Q. For a finite set A ⊂ T and a function f ∈ B(S), we set:

M_A(f) = Σ_{i∈A} f(X_i).

We shall be interested in the cases A = G_n (the n-th generation) and A = T_n (the tree up to the n-th generation). We recall from [8, Theorem 11 and Corollary 15] that, under a geometric ergodicity assumption, we have, for f a continuous bounded real-valued function defined on S, the following convergences in L²(µ) (resp. a.s.):

(17) lim_{n→∞} |G_n|^{-1} M_{G_n}(f) = ⟨µ, f⟩ and lim_{n→∞} |T_n|^{-1} M_{T_n}(f) = ⟨µ, f⟩.

Using Lemma 5.1 and the Borel-Cantelli theorem, one can prove that (17) also holds, with the L²(µ) and a.s. convergences, under Assumptions 2.4 (ii) and 2.7.
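As a quick numerical illustration of (17) in the symmetric BAR setting of Example 2.3 (our own sketch, reusing the hypothetical simulate_bar defined there, with a₀ = a₁ = a, b₀ = b₁ = 0 and ρ = 0), the empirical average of f(x) = x² over a late generation should be close to ⟨µ, f⟩ = σ²/(1 − a²), since µ = N(0, σ²/(1 − a²)) in this case (see Section 4.1):

```python
import numpy as np

a, sigma, n = 0.5, 1.0, 18
gens = simulate_bar(n=n, a0=a, a1=a, b0=0.0, b1=0.0,
                    sigma=sigma, rho=0.0, x0=0.0, rng=1)
empirical = np.mean(gens[n] ** 2)    # |G_n|^{-1} M_{G_n}(f) with f(x) = x^2
limit = sigma**2 / (1 - a**2)        # <mu, f> for mu = N(0, sigma^2/(1-a^2))
print(empirical, limit)              # the two values should be close
```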

We shall now consider the corresponding fluctuations. We will frequently use the following notation for f ∈ L¹(µ):

f̃ = f − ⟨µ, f⟩.

In order to study the asymptotics of M_{G_{n−ℓ}}(f̃), we shall consider the contribution of the descendants of the individual i ∈ T_{n−ℓ} for n ≥ ℓ ≥ 0:

(18) N^ℓ_{n,i}(f) = |G_n|^{-1/2} M_{iG_{n−|i|−ℓ}}(f̃),

where iG_{n−|i|−ℓ} = {ij, j ∈ G_{n−|i|−ℓ}} ⊂ G_{n−ℓ}. For all k ∈ N such that n ≥ k + ℓ, we have:

M_{G_{n−ℓ}}(f̃) = √|G_n| Σ_{i∈G_k} N^ℓ_{n,i}(f); in particular, taking k = 0, M_{G_{n−ℓ}}(f̃) = √|G_n| N^ℓ_{n,∅}(f).

(8)

Let f = (f_ℓ, ℓ ∈ N) be a sequence of elements of L¹(µ). We set for n ∈ N and i ∈ T_n:

(19) N_{n,i}(f) = Σ_{ℓ=0}^{n−|i|} N^ℓ_{n,i}(f_ℓ) = |G_n|^{-1/2} Σ_{ℓ=0}^{n−|i|} M_{iG_{n−|i|−ℓ}}(f̃_ℓ).

We deduce that Σ_{i∈G_k} N_{n,i}(f) = |G_n|^{-1/2} Σ_{ℓ=0}^{n−k} M_{G_{n−ℓ}}(f̃_ℓ), which gives for k = 0:

(20) N_{n,∅}(f) = |G_n|^{-1/2} Σ_{ℓ=0}^{n} M_{G_{n−ℓ}}(f̃_ℓ).

The notation N_{n,∅} means that we consider the average from the root ∅ to the n-th generation.

Remark 2.10. We shall consider in particular the following two simple cases. Let f ∈ L¹(µ) and consider the sequence f = (f_ℓ, ℓ ∈ N). If f₀ = f and f_ℓ = 0 for ℓ ∈ N*, then we get:

N_{n,∅}(f) = |G_n|^{-1/2} M_{G_n}(f̃).

If f_ℓ = f for all ℓ ∈ N, then we shall write f = (f, f, …), and we get, as |T_n| = 2^{n+1} − 1 and |G_n| = 2ⁿ:

N_{n,∅}(f) = |G_n|^{-1/2} M_{T_n}(f̃) = √(2 − 2^{-n}) |T_n|^{-1/2} M_{T_n}(f̃).

Thus, we will deduce the fluctuations of M_{T_n}(f) and M_{G_n}(f) from the asymptotics of N_{n,∅}(f).

Because of condition (ii) in Assumption 2.4, which roughly states that after k₀ generations the distribution of the induced Markov chain is absolutely continuous with respect to the invariant measure µ, it is better to consider only generations k ≥ k₀ for some k₀ ∈ N, and thus to remove the first k₀ − 1 generations in the quantity N_{n,∅}(f) defined in (20).

To study the asymptotics of N_{n,∅}(f), it is convenient to write, for n ≥ k ≥ 1:

(21) N_{n,∅}(f) = |G_n|^{-1/2} Σ_{r=0}^{k−1} M_{G_r}(f̃_{n−r}) + Σ_{i∈G_k} N_{n,i}(f).

If f = (f, f, …) is the infinite sequence of the same function f, this becomes:

N_{n,∅}(f) = |G_n|^{-1/2} M_{T_n}(f̃) = |G_n|^{-1/2} M_{T_{k−1}}(f̃) + Σ_{i∈G_k} N_{n,i}(f).

3. Main results

3.1. The sub-critical case: 2α² < 1. We shall consider, when well defined, for a sequence f = (f_ℓ, ℓ ∈ N) of measurable real-valued functions defined on S, the quantities:

(22) Σ^sub(f) = Σ₁^sub(f) + 2Σ₂^sub(f),

where:

(23) Σ₁^sub(f) = Σ_{ℓ≥0} 2^{-ℓ} ⟨µ, f̃_ℓ²⟩ + Σ_{ℓ≥0, k≥0} 2^{k−ℓ} ⟨µ, P((Q^k f̃_ℓ)^{⊗2})⟩,

(24) Σ₂^sub(f) = Σ_{0≤ℓ<k} 2^{-ℓ} ⟨µ, f̃_k Q^{k−ℓ} f̃_ℓ⟩ + Σ_{0≤ℓ<k, r≥0} 2^{r−ℓ} ⟨µ, P(Q^r f̃_k ⊗_sym Q^{k−ℓ+r} f̃_ℓ)⟩.

The proof of the next result is detailed in Section 5.

Theorem 3.1. Let X be a BMC with kernel P and initial distribution ν such that Assumptions 2.4 and 2.7 are in force with α ∈ (0, 1/√2). We have the following convergence in distribution for all sequences f = (f_ℓ, ℓ ∈ N) bounded in L⁴(µ) (that is, sup_{ℓ∈N} ‖f_ℓ‖_{L⁴(µ)} < +∞):

N_{n,∅}(f) →(d) G as n → ∞,

where G is a centered Gaussian random variable with variance Σ^sub(f) given by (22), which is well defined and finite.

Notice that the variance Σ^sub(f) already appears in the sub-critical pointwise approach case, see [4, (15) and Theorem 3.1]. Then, arguing similarly as in [4, Section 3.1], we deduce that if Assumptions 2.4 and 2.7 are in force with α ∈ (0, 1/√2), then for f ∈ L⁴(µ) we have the following convergences in distribution:

(25) |G_n|^{-1/2} M_{G_n}(f̃) →(d) G₁ and |T_n|^{-1/2} M_{T_n}(f̃) →(d) G₂ as n → ∞,

where G₁ and G₂ are centered Gaussian random variables with respective variances Σ^sub_G(f) = Σ^sub(f), with f = (f, 0, 0, …), and Σ^sub_T(f) = Σ^sub(f)/2, with f = (f, f, …), given in [4, Corollary 3.3], which are well defined and finite.

3.2. The critical case: 2α² = 1. In the critical case α = 1/√2, we shall denote by R_j the projector on the eigenspace associated to the eigenvalue α_j, with α_j = θ_j α and |θ_j| = 1, for j in the finite set of indices J. Since Q is a real operator, we get that if α_j is a non-real eigenvalue, so is its conjugate ᾱ_j; we shall denote by R̄_j the projector associated to ᾱ_j. Recall that the sequence (β_n, n ∈ N) in Assumption 2.8 is non-increasing and bounded from above by 1. For a measurable real-valued function f defined on S, we set, when this is well defined:

(26) f̂ = f̃ − Σ_{j∈J} R_j(f), with f̃ = f − ⟨µ, f⟩.

We shall consider, when well defined, for a sequence f = (f_ℓ, ℓ ∈ N) of measurable real-valued functions defined on S, the quantities:

(27) Σ^crit(f) = Σ₁^crit(f) + 2Σ₂^crit(f),

where:

(28) Σ₁^crit(f) = Σ_{k≥0} 2^{-k} ⟨µ, P f*_{k,k}⟩ = Σ_{k≥0} 2^{-k} Σ_{j∈J} ⟨µ, P(R_j(f_k) ⊗_sym R̄_j(f_k))⟩,

(29) Σ₂^crit(f) = Σ_{0≤ℓ<k} 2^{-(k+ℓ)/2} ⟨µ, P f*_{k,ℓ}⟩,

with, for k, ℓ ∈ N:

f*_{k,ℓ} = Σ_{j∈J} θ_j^{ℓ−k} R_j(f_k) ⊗_sym R̄_j(f_ℓ).

Notice that f*_{k,ℓ} = f*_{ℓ,k} and that f*_{k,ℓ} is real-valued, as the complex conjugate of θ_j^{ℓ−k} R_j(f_k) ⊗ R̄_j(f_ℓ) is θ_{j'}^{ℓ−k} R_{j'}(f_k) ⊗ R̄_{j'}(f_ℓ) for j' such that α_{j'} = ᾱ_j, and thus R_{j'} = R̄_j.

The technical proof of the next result is omitted, as it is an adaptation of the proof of Theorem 3.1 in the sub-critical case, in the same spirit as the proof of [4, Theorem 3.4] (critical case) is an adaptation of the proof of [4, Theorem 3.1] (sub-critical case). The interested reader can find the details in [3].

Theorem 3.2. Let X be a BMC with kernel P and initial distribution ν such that Assumptions 2.4 (with k₀ ∈ N), 2.7 and 2.8 are in force with α = 1/√2. We have the following convergence in distribution for all sequences f = (f_ℓ, ℓ ∈ N) bounded in L⁴(µ) (that is, sup_{ℓ∈N} ‖f_ℓ‖_{L⁴(µ)} < +∞):

n^{-1/2} N_{n,∅}(f) →(d) G as n → ∞,

where G is a centered Gaussian random variable with variance Σ^crit(f) given by (27), which is well defined and finite.

Notice that the variance Σ^crit(f) already appears in the critical pointwise approach case, see [4, (20) and Theorem 3.4]. Then, arguing similarly as in [4, Section 3.2], we deduce that if Assumptions 2.4 (with k₀ ∈ N), 2.7 and 2.8 are in force with α = 1/√2, then for f ∈ L⁴(µ) we have the following convergences in distribution:

(30) (n|G_n|)^{-1/2} M_{G_n}(f̃) →(d) G₁ and (n|T_n|)^{-1/2} M_{T_n}(f̃) →(d) G₂ as n → ∞,

where G₁ and G₂ are centered Gaussian random variables with respective variances Σ^crit_G(f) = Σ^crit(f), with f = (f, 0, 0, …), and Σ^crit_T(f) = Σ^crit(f)/2, with f = (f, f, …), given in [4, Corollary 3.6], which are well defined and finite.

3.3. The super-critical case: 2α² > 1. We consider the super-critical case α ∈ (1/√2, 1). This case is very similar to the super-critical case in the pointwise approach, see [4, Section 3.3], so we only mention the most interesting results without proof. The interested reader can find the details in [3].

We shall assume that Assumptions 2.4 (ii) and 2.8 hold. In particular, we do not assume Assumption 2.4 (i). Recall (16), with the eigenvalues {α_j = θ_j α, j ∈ J} of Q with modulus equal to α (i.e. |θ_j| = 1) and the projector R_j on the eigenspace associated to the eigenvalue α_j. Recall that the sequence (β_n, n ∈ N) in Assumption 2.8 can (and will) be chosen non-increasing and bounded from above by 1. We shall consider the filtration H = (H_n, n ∈ N) defined by H_n = σ(X_i, i ∈ T_n). The next lemma exhibits martingales related to the projectors R_j.

Lemma 3.3. Let X be a BMC with kernel P and initial distribution ν such that Assumptions 2.4 (ii) and 2.8 are in force with α ∈ (1/√2, 1) in (16). Then, for all j ∈ J and f ∈ L²(µ), the sequence M_j(f) = (M_{n,j}(f), n ∈ N), with:

M_{n,j}(f) = (2α_j)^{-n} M_{G_n}(R_j(f)),

is an H-martingale which converges a.s. and in L²(ν) to a random variable, say M_{∞,j}(f).

The next result corresponds to [4, Corollary 3.13] in the pointwise approach.

Corollary 3.4. Let X be a BMC with kernel P and initial distribution ν such that Assumptions 2.4 (ii) and 2.8 are in force with α ∈ (1/√2, 1) in (16). Assume that α is the only eigenvalue of Q with modulus equal to α (so that J is reduced to a singleton, say {j₀}). Then we have for f ∈ L²(µ):

(2α)^{-n} M_{G_n}(f̃) →(P) M_{∞,j₀}(f) and (2α)^{-n} M_{T_n}(f̃) →(P) (2α/(2α − 1)) M_{∞,j₀}(f) as n → ∞,

where M_{∞,j₀}(f) is the random variable defined in Lemma 3.3.


4. Application to the study of the symmetric BAR

4.1. Symmetric BAR. We consider a particular case, from [6], of the real-valued bifurcating autoregressive process (BAR) of Example 2.3, and we keep the same notations. Let a ∈ (−1, 1) and assume that a = a₀ = a₁, b₀ = b₁ = 0 and ρ = 0. In this particular case, the BAR has a symmetric kernel, as:

P(x, dy, dz) = Q(x, dy) Q(x, dz).

We have Qf(x) = E[f(ax + σG)] and, more generally, Qⁿf(x) = E[f(aⁿx + √(1 − a^{2n}) σ_a G)], where G is a standard N(0, 1) Gaussian random variable and σ_a = σ(1 − a²)^{-1/2}. The kernel Q admits a unique invariant probability measure µ, which is N(0, σ_a²) and whose density, still denoted by µ, with respect to the Lebesgue measure is given by:

µ(x) = (√(1 − a²)/√(2πσ²)) exp(−(1 − a²)x²/(2σ²)).

The densities p (resp. q) of the kernel P (resp. Q) with respect to µ^{⊗2} (resp. µ) are given by:

p(x, y, z) = q(x, y) q(x, z)

and:

q(x, y) = (1 − a²)^{-1/2} exp( −(y − ax)²/(2σ²) + (1 − a²)y²/(2σ²) ) = (1 − a²)^{-1/2} e^{−(a²y² + a²x² − 2axy)/(2σ²)}.

Notice that q is symmetric. The operator Q (in L²(µ)) is a symmetric integral Hilbert-Schmidt operator whose eigenvalues are given by σ_p(Q) = (aⁿ, n ∈ N); their algebraic multiplicity is one, and the corresponding eigenfunctions (ḡ_n(x), n ∈ N) are defined for n ∈ N by:

ḡ_n(x) = g_n(σ_a^{-1} x),

where g_n is the Hermite polynomial of degree n (g₀ = 1 and g₁(x) = x). Let R be the orthogonal projection on the vector space generated by ḡ₁, that is, Rf = ⟨µ, f ḡ₁⟩ ḡ₁, or equivalently, for x ∈ R:

(31) Rf(x) = σ_a^{-1} x E[G f(σ_a G)].
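As a sanity check (ours, not from the paper), both the eigenvalue relation Qḡ₁ = a ḡ₁ and formula (31) can be verified by Monte Carlo, using Qf(x) = E[f(ax + σG)] and µ = N(0, σ_a²):

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma, x = 0.6, 1.0, 1.7
sigma_a = sigma / np.sqrt(1 - a**2)
G = rng.standard_normal(10**6)

# Q g1_bar(x) = E[(a x + sigma G)/sigma_a] should equal a * g1_bar(x) = a x/sigma_a.
print(np.mean((a * x + sigma * G) / sigma_a), a * x / sigma_a)

# (31) with the test function f(t) = t^3: E[G f(sigma_a G)] = 3 sigma_a^3,
# so R f(x) should be close to 3 sigma_a^2 x.
f = lambda t: t**3
print(x / sigma_a * np.mean(G * f(sigma_a * G)), 3 * sigma_a**2 * x)
```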

Recall h defined in (11). It is not difficult to check that:

h(x) = (1 − a⁴)^{-1/4} exp( (a²(1 − a²)/(1 + a²)) x²/(2σ²) ) for x ∈ R,

and that h ∈ L²(µ) (that is, ∫_{R²} q(x, y)² µ(x)µ(y) dx dy < +∞). Using elementary computations, it is possible to check that Qh ∈ L⁴(µ) if and only if |a| < 3^{-1/4} (whereas h ∈ L⁴(µ) if and only if |a| < 3^{-1/2}). As P is symmetric, we get P(h^{⊗2}) ≤ (Qh)², and thus (12) holds for |a| < 3^{-1/4}. We also get, using the Cauchy-Schwarz inequality, that ‖P(f ⊗_sym h)‖_{L²(µ)} = ‖(Qf)(Qh)‖_{L²(µ)} ≤ ‖f‖_{L⁴(µ)} ‖Qh‖_{L⁴(µ)}, and thus (14) holds for |a| < 3^{-1/4}. Some elementary computations give that (13) also holds for |a| ≤ 0.724 (but (13) fails for |a| ≥ 0.725). (Notice that 2^{-1/2} < 0.724 < 3^{-1/4}.) As a consequence of Remark 2.6, if |a| ≤ 0.724, then (6)-(8) are satisfied, and thus (i) of Assumption 2.4 holds.

Notice that νQ^k is the probability distribution of a^k X_∅ + √(1 − a^{2k}) σ_a G, with G a N(0, 1) random variable independent of X_∅. So property (ii) of Assumption 2.4 holds in particular if ν has compact support (with k₀ = 1), or if ν has a density with respect to the Lebesgue measure, which we still denote by ν, such that ‖ν/µ‖ is finite (with k₀ ∈ N). Notice that if ν is the probability distribution N(0, ρ₀²), then ρ₀ > σ_a (resp. ρ₀ ≤ σ_a) implies that (ii) of Assumption 2.4 fails (resp. is satisfied).

Using that (ḡ_n/√(n!), n ∈ N) is an orthonormal basis of L²(µ) and the Parseval identity, it is easy to check that Assumption 2.8 holds with J = {j₀}, α_{j₀} = α = a, β_n = aⁿ and R_{j₀} = R.

4.2. Numerical studies: illustration of the phase transitions for the fluctuations. We consider the symmetric BAR model from Section 4.1 with a = α ∈ (0, 1). Recall that α is an eigenvalue of Q with multiplicity one, and we denote by R the orthogonal projection on the one-dimensional eigenspace associated to α; the expression of R is given in (31).

In order to illustrate the effect of the geometric rate of convergence α on the fluctuations, we plot, for A_n ∈ {G_n, T_n}, the slope, say b_{α,n}, of the regression line of log(Var(|A_n|^{-1} M_{A_n}(f))) versus log(|A_n|), as a function of the geometric rate of convergence α; a minimal version of this experiment is sketched below. In the classical cases (e.g. Markov chains), the points are expected to be distributed around the horizontal line y = −1. For n large, we have log(|A_n|) ≈ n log(2) and, for the symmetric BAR model, the convergences (25) for α < 1/√2, (30) for α = 1/√2, and Corollary 3.4 for α > 1/√2 yield that b_{α,n} ≈ h₁(α) with h₁(α) = log(α² ∨ 2^{-1})/log(2), as soon as the limiting Gaussian random variable in (25) and (30), or M_∞(f) in Corollary 3.4, is non-zero.
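Here is a minimal sketch of this experiment (our own code, reusing the hypothetical simulate_bar of Example 2.3; the paper's figures use 1000 trees and n = 15, which is heavier than the settings below):

```python
import numpy as np

def slope_b(alpha, n=10, trees=200, p=1, sigma=1.0, seed=0):
    """Regression slope of log Var(|G_k|^{-1} M_{G_k}(x^p)) on log |G_k|."""
    rng = np.random.default_rng(seed)
    means = np.array([
        [np.mean(g ** p) for g in simulate_bar(n, alpha, alpha, 0.0, 0.0,
                                               sigma, 0.0, 0.0, rng)]
        for _ in range(trees)
    ])                                # shape (trees, n+1): one mean per generation
    ks = np.arange(2, n + 1)          # skip the first, very small generations
    logvar = np.log(means[:, ks].var(axis=0))
    loggen = ks * np.log(2.0)         # log |G_k| = k log 2
    return np.polyfit(loggen, logvar, 1)[0]

h1 = lambda a: np.log(max(a**2, 0.5)) / np.log(2.0)
print(slope_b(0.5, p=1), h1(0.5))     # sub-critical: slope near -1
print(slope_b(0.9, p=1), h1(0.9))     # super-critical: slope near log(0.81)/log 2
```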

For our illustrations, we consider the empirical moments of order p ∈ {1, …, 4}, that is, we use the functions f(x) = x^p. As we can see in Figures 1 and 2, these curves present two trends, with a phase transition around the rate α = 1/√2 for p ∈ {1, 3} and around the rate α² = 1/√2 for p ∈ {2, 4}. For convergence rates α ∈ (0, 1/√2), the trend is similar to that of the classic cases. For convergence rates α ∈ (1/√2, 1), the trend differs from that of the classic cases: one can observe that the slope b_{α,n} increases with the value of the geometric convergence rate α. We also observe that for α > 1/√2, the empirical curves agree with the graph of h₁(α) = log(α² ∨ 2^{-1})/log(2) for f(x) = x^p when p is odd, see Figure 1. However, the empirical curves do not agree with the graph of h₁ for f(x) = x^p when p is even, see Figure 2; they agree instead with the graph of the function h₂(α) = log(α⁴ ∨ 2^{-1})/log(2). This is due to the fact that for p even, the function f(x) = x^p belongs to the kernel of the projector R (which is clear from formula (31)), and thus M_∞(f) = 0. In fact, in those two cases, one should take into account the projection on the eigenspace associated to the third eigenvalue, which in this particular case is equal to α². Intuitively, this indeed gives a rate of order h₂. Therefore, the normalization given for f(x) = x^p with p even is not correct.

5. Proof of Theorem 3.1

In the following proofs, we will denote by C any unimportant finite constant which may vary from line to line (in particular, C depends neither on n nor on f).

Let (p_n, n ∈ N) be a non-decreasing sequence of elements of N such that, for all λ > 0:

(32) p_n < n, lim_{n→∞} p_n/n = 1 and lim_{n→∞} (n − p_n − λ log(n)) = +∞.

When there is no ambiguity, we write p for p_n.

Let i, j ∈ T. We write i ≼ j if j ∈ iT. We denote by i ∧ j the most recent common ancestor of i and j, defined as the only u ∈ T such that if v ∈ T, v ≼ i and v ≼ j, then v ≼ u. We also define the lexicographic order: i ≤ j if either i ≼ j, or v0 ≼ i and v1 ≼ j for v = i ∧ j. Let X = (X_i, i ∈ T) be a BMC with kernel P and initial measure ν. For i ∈ T, we define the σ-field:

F_i = σ(X_u; u ∈ T such that u ≤ i).

By construction, the σ-fields (F_i; i ∈ T) are nested, as F_i ⊂ F_j for i ≤ j.

[Figure 1 shows four panels (“Study of |G_15|^{-1}M_{G_15}(x)”, “Study of |G_15|^{-1}M_{G_15}(x³)”, “Study of |T_15|^{-1}M_{T_15}(x)” and “Study of |T_15|^{-1}M_{T_15}(x³)”), each plotting the slope b_{α,n} against the ergodicity rate α, with the theoretical curve and the empirical curve on 1000 trees.]

Figure 1. Slope b_{α,n} (empirical mean and confidence interval, in black) of the regression line of log(Var(|A_n|^{-1} M_{A_n}(f))) versus log(|A_n|), as a function of the geometric ergodicity rate α, for n = 15, A_n ∈ {G_n, T_n} and f(x) = x^p with p ∈ {1, 3}. In this case, we have R(f) ≠ 0, where R is the projector defined by formula (31). One can see that the empirical curve (in black) is close to the graph (in red) of the function h₁(α) = log(α² ∨ 2^{-1})/log(2) for α ∈ (0, 1).

We define, for n ∈ N, i ∈ G_{n−p_n} and a sequence f = (f_ℓ, ℓ ∈ N) of elements of L⁴(µ), the martingale increments:

(33) ∆_{n,i}(f) = N_{n,i}(f) − E[N_{n,i}(f) | F_i] and ∆_n(f) = Σ_{i∈G_{n−p_n}} ∆_{n,i}(f).

[Figure 2 shows four panels (“Study of |G_15|^{-1}M_{G_15}(x²)”, “Study of |G_15|^{-1}M_{G_15}(x⁴)”, “Study of |T_15|^{-1}M_{T_15}(x²)” and “Study of |T_15|^{-1}M_{T_15}(x⁴)”), each plotting the slope b_{α,n} against the ergodicity rate α, with the graphs of log(α² ∨ 2^{-1})/log(2) and log(α⁴ ∨ 2^{-1})/log(2) and the empirical curve on 1000 trees.]

Figure 2. Slope b_{α,n} (empirical mean and confidence interval, in black) of the regression line of log(Var(|A_n|^{-1} M_{A_n}(f))) versus log(|A_n|), as a function of the geometric ergodicity rate α, for n = 15, A_n ∈ {G_n, T_n} and f(x) = x^p with p ∈ {2, 4}. In this case, we have R(f) = 0, where R is the projector defined by formula (31). One can see that the empirical curve (in black) does not agree with the graph (dashed line, in red) of the function h₁(α) = log(α² ∨ 2^{-1})/log(2) for 2α² > 1, but is close to the graph (in blue) of the function h₂(α) = log(α⁴ ∨ 2^{-1})/log(2) for α ∈ (0, 1).

Thanks to (19), we have:

Σ_{i∈G_{n−p_n}} N_{n,i}(f) = |G_n|^{-1/2} Σ_{ℓ=0}^{p_n} M_{G_{n−ℓ}}(f̃_ℓ) = |G_n|^{-1/2} Σ_{k=n−p_n}^{n} M_{G_k}(f̃_{n−k}).

Using the branching Markov property and (19), we get for i ∈ G_{n−p_n}:

E[N_{n,i}(f) | F_i] = E[N_{n,i}(f) | X_i] = |G_n|^{-1/2} Σ_{ℓ=0}^{p_n} E_{X_i}[ M_{G_{p_n−ℓ}}(f̃_ℓ) ].

We deduce from (21) with k = n − p_n that:

(34) N_{n,∅}(f) = ∆_n(f) + R₀(n) + R₁(n),

with:

(35) R₀(n) = |G_n|^{-1/2} Σ_{k=0}^{n−p_n−1} M_{G_k}(f̃_{n−k}) and R₁(n) = Σ_{i∈G_{n−p_n}} E[N_{n,i}(f) | F_i].

We first state a very useful lemma, which holds in the sub-critical, critical and super-critical cases.

Lemma 5.1. Let X be a BMC with kernel P and initial distribution ν such that (ii) from Assumption 2.4 (with k₀ ∈ N) is in force. There exists a finite constant C such that for all f ∈ B₊(S) and all n ≥ k₀, we have:

(36) |G_n|^{-1} E[M_{G_n}(f)] ≤ C ‖f‖_{L¹(µ)} and |G_n|^{-1} E[M_{G_n}(f)²] ≤ C Σ_{k=0}^{n} 2^k ‖Q^k f‖²_{L²(µ)}.

Proof. Using the first moment formula (73), (ii) from Assumption 2.4 and the fact that µ is invariant for Q, we get:

|G_n|^{-1} E[M_{G_n}(f)] = ⟨ν, Qⁿf⟩ ≤ ‖ν₀‖ ⟨µ, Q^{n−k₀}f⟩ = ‖ν₀‖ ⟨µ, f⟩.

We also have:

|G_n|^{-1} E[M_{G_n}(f)²] = ⟨ν, Qⁿ(f²)⟩ + Σ_{k=0}^{n−1} 2^k ⟨ν, Q^{n−k−1} P((Q^k f)^{⊗2})⟩
  ≤ ⟨ν, Qⁿ(f²)⟩ + Σ_{k=0}^{n−1} 2^k ⟨ν, Q^{n−k}((Q^k f)²)⟩
  ≤ ⟨ν, Qⁿ(f²)⟩ + Σ_{k=0}^{n−k₀} 2^k ⟨ν, Q^{n−k}((Q^k f)²)⟩ + Σ_{k=n−k₀+1}^{n−1} 2^k ⟨ν, Q^{k₀}((Q^{n−k₀} f)²)⟩
  ≤ C Σ_{k=0}^{n−k₀} 2^k ‖Q^k f‖²_{L²(µ)},

where we used the second moment formula (74) for the equality, (3) for the first inequality, the Jensen inequality for the second, and (ii) from Assumption 2.4 together with the fact that µ is invariant for Q for the last. □

We set for k ∈ N*:

(37) c_k(f) = sup_{n∈N} ‖f_n‖_{L^k(µ)} and q_k(f) = sup_{n∈N} ‖Q(f_n^k)‖^{1/k}.

We will denote by C any unimportant finite constant which may vary from line to line (in particular, C depends neither on n nor on f, but may depend on k₀ and ‖ν₀‖).

Remark 5.2. Recall k₀ given in Assumption 2.4 (ii). Let f = (f_ℓ, ℓ ∈ N) be a bounded sequence in L⁴(µ). We have:

(38) N_{n,∅}(f) = N^{[k₀]}_{n,∅}(f) + |G_n|^{-1/2} Σ_{ℓ=0}^{k₀−1} M_{G_ℓ}(f̃_{n−ℓ}),

where we set:

(39) N^{[k₀]}_{n,∅}(f) = |G_n|^{-1/2} Σ_{ℓ=0}^{n−k₀} M_{G_{n−ℓ}}(f̃_ℓ).

Using the Cauchy-Schwarz inequality, we get:

(40) |G_n|^{-1/2} | Σ_{ℓ=0}^{k₀−1} M_{G_ℓ}(f̃_{n−ℓ}) | ≤ C c₂(f) |G_n|^{-1/2} + |G_n|^{-1/2} Σ_{ℓ=0}^{k₀−1} M_{G_ℓ}(|f_{n−ℓ}|).

Since the sequence f is bounded in L⁴(µ) and since k₀ is finite, we have, for all ℓ ∈ {0, …, k₀−1}, lim_{n→∞} |G_n|^{-1/2} M_{G_ℓ}(|f_{n−ℓ}|) = 0 a.s., and then (using (40)):

lim_{n→∞} |G_n|^{-1/2} | Σ_{ℓ=0}^{k₀−1} M_{G_ℓ}(f̃_{n−ℓ}) | = 0 a.s.

Therefore, thanks to (38), the study of N_{n,∅}(f) is reduced to that of N^{[k₀]}_{n,∅}(f).

Recall that (p_n, n ∈ N) is such that (32) holds. Assume that n is large enough so that n − p_n − 1 ≥ k₀. We have:

N^{[k₀]}_{n,∅}(f) = ∆_n(f) + R₀^{[k₀]}(n) + R₁(n),

where ∆_n(f) and R₁(n) are defined in (33) and (35), and:

R₀^{[k₀]}(n) = |G_n|^{-1/2} Σ_{k=k₀}^{n−p_n−1} M_{G_k}(f̃_{n−k}).

Lemma 5.3. Under the assumptions of Theorem 3.1, we have the following convergence:

lim_{n→∞} E[R₀^{[k₀]}(n)²] = 0.

Proof. Assume n − p ≥ k₀. We write:

R₀^{[k₀]}(n) = |G_n|^{-1/2} Σ_{k=k₀}^{n−p−1} Σ_{i∈G_{k₀}} M_{iG_{k−k₀}}(f̃_{n−k}).

We have Σ_{i∈G_{k₀}} E[M_{iG_{k−k₀}}(f̃_{n−k})²] = E[M_{G_{k₀}}(h_{k,n})], where:

h_{k,n}(x) = E_x[M_{G_{k−k₀}}(f̃_{n−k})²].

We deduce from (ii) of Assumption 2.4, see (36), that E[M_{G_{k₀}}(h_{k,n})] ≤ C ⟨µ, h_{k,n}⟩. We also have:

⟨µ, h_{k,n}⟩ = E_µ[M_{G_{k−k₀}}(f̃_{n−k})²] ≤ C 2^k Σ_{ℓ=0}^{k} 2^ℓ ‖Q^ℓ f̃_{n−k}‖²_{L²(µ)} ≤ C 2^k c₂²(f) Σ_{ℓ=0}^{k} 2^ℓ α^{2ℓ} ≤ C 2^k c₂²(f),

where we used (36) for the first inequality (notice that one can take k₀ = 0 in this case, as we consider the expectation E_µ), (15) for the second, and 2α² < 1 for the last. We deduce, using the triangle inequality for the L² norm in the expression of R₀^{[k₀]}(n), that:

E[R₀^{[k₀]}(n)²] ≤ C c₂²(f) |G_n|^{-1} ( Σ_{k=k₀}^{n−p−1} 2^{k/2} )² ≤ C c₂²(f) 2^{−p},

and, since lim_{n→∞} p_n = +∞ by (32), this gives lim_{n→∞} E[R₀^{[k₀]}(n)²] = 0. □
