CENTRAL LIMIT THEOREM FOR BIFURCATING MARKOV CHAINS S. VAL`ERE BITSEKI PENDA AND JEAN-FRANC¸ OIS DELMAS Abstract.

(1)

S. VAL`ERE BITSEKI PENDA AND JEAN-FRANC¸ OIS DELMAS

Abstract. Bifurcating Markov chains (BMC) are Markov chains indexed by a full binary tree representing the evolution of a trait along a population where each individual has two children.

We provide a central limit theorem for general additive functionals of BMC, and prove the existence of three regimes. This corresponds to a competition between the reproducing rate (each individual has two children) and the ergodicity rate for the evolution of the trait. This is in contrast with the work of Guyon (2007), where the considered additive functionals are sums of martingale increments, and only one regime appears. Our result can be seen as a discrete time version, but with general trait evolution, of results in the time continuous setting of branching particle system from Adamczak and Mi lo´s (2015), where the evolution of the trait is given by an Ornstein-Uhlenbeck process.

Keywords: Bifurcating Markov chains, tree indexed Markov chain, binary trees, central limit theorem.

Mathematics Subject Classification (2020): 60J05, 60F05, 60J80.

1. Introduction

Bifurcating Markov chains are a class of stochastic processes indexed by regular binary tree and which satisfy the branching Markov property (see below for a precise definition). This model represents the evolution of a trait along a population where each individual has two children. To the best of our knowledge, the term bifurcating Markov chain (BMC) appears for the first time in the work of Basawa and Zhou [4]. But, it was Guyon who, in [17], highlighted and developed a theory of asymmetric bifurcating Markov chains. Since the works of Guyon, BMC theory has been enriched from probabilistic and statistical point of view and several extensions and models using BMC have been studied; we can cite the works (see also the references therein) of Bercu, de Saporta & G´egout-Petit [5], Delmas & Marsalle [14], Bitseki, Djellout & Guillin [8], Bitseki, Hoffmann & Olivier [9], Doumic, Hoffmann, Krell & Robert [16], Bitseki & Olivier [10, 11] and Hoffmann & Marguet [19].

The recent study of BMC models was motivated by the understanding of the cell division mechanism (where the trait of an individual is given by its growth rate). The first model of BMC, named “symmetric” bifurcating auto-regressive process (BAR) were introduced by Cowan

& Staudte [13] in order to analyze cell lineage data. Since the works of Cowan and Staudte, many extensions of their model were studied in Markovian and non-Markovian setting (see for e.g. [10]

and references therein). In particular, in [17], Guyon has studied “asymmetric” BAR in order to prove statistical evidence of aging in Escherichia Coli, giving a new approach to the problem studied in [23]. Let us also note that BMC have been used recently in several statistical works to study the estimator of the cell division rate [16, 9, 19]. Moreover, another studies, such as [15], can be generalized using the BMC theory (we refer to the conclusion therein).

1

(2)

In this paper, our objective is to establish a central limit theorem for additive functionals of BMC. With respect to this objective, notice that asymptotic results for BMC have been studied in [17] (law of large numbers and central limit theorem) and in [8] (moderate deviations principle and strong law of large numbers). See [14] for the law of large numbers and central limit theorem for BMC on Galton-Watson tree. Notice also that recently, limit theorems, in particular law of large numbers, has been studied for branching Markov process, see [20] and [12], and that large values of parameters in stable BAR process allows to exhibit two regimes, see [3]. However, the central limit theorems which appear in [17, 5, 14] have been done for additive functionals using increments of martingale, which implies in particular that the functions considered depend on the traits of the mother and its two daughters. The study of the case where the functions depend only on the trait of a single individual has not yet been treated for BMC (in this case it is not useful to solve the Poisson equation and to write additive functional as sums of martingale increments as the error term on the last generation is not negligible in general). For such functions, the central limit theorems have been studied recently for branching Markov processes and for superprocesses [1, 21, 22, 24]. Our results can be seen as a discrete version of those given in the previous works, but with general ergodic hypothesis on the evolution of the trait. Unlike the results given in [17, 5, 14], we observe three regimes (sub-critical, critical and super-critical), which correspond to a competition between the reproducing rate (here a mother has two daughters) and the ergodicity rate for the evolution of the trait along a lineage taken uniformly at random. This phenomenon already appears in the works of Athreya [2]. For BMC models, we stress that the three regimes already appears for moderate deviations and deviation inequalities in [8, 7, 6].

We follow the approach of [17, 14] and consider ergodic theorem with respect to the point- wise convergence. However, unlike the latter papers, we provide a different normalization for the fluctuations according to the regime being critical, sub-critical and super-critical, see respectively Corollaries 3.3, 3.6 and 3.13. We shall explicit in a forthcoming paper, that those results allow to recover the one regime result from [17] for additive functionals given by a sum of martingale increments.

The paper is organized as follows. We introduce the BMC model in Section 2.1 and consider the sets of assumptions in the spirit of [17] in Section 2.2. The main results are presented in Section 3:

see Section 3.1 for results in the sub-critical case, with technical proofs in Section 4; see Section 3.2 for results in the critical case, with technical proofs in Section 5; and see Section 3.3 for results in the super-critical case, with technical proofs in Section 6. The proof relies essentially on explicit second moments computations and precise upper bounds of fourth moments for BMC, which are recalled in Section 7.

2. Models and assumptions

2.1. Bifurcating Markov chain: the model. We denote byNthe set of non-negative integers andN^∗=N\ {0}. If (E,E) is a measurable space, thenB(E) (resp. B_b(E), resp. B₊(E)) denotes the set of (resp. bounded, resp. non-negative) R-valued measurable functions defined onE. For f ∈B(E), we setkfk∞= sup{|f(x)|, x∈E}. For a finite measureλon (E,E) andf ∈B(E) we shall writehλ, fiforR

f(x) dλ(x) whenever this integral is well defined. For n∈N^∗, the product spaceEⁿ is endowed with the productσ-field E^⊗n. If (E, d) is a metric space, thenE will denote its Borelσ-field and the setC_b(E) (resp. C₊(E)) denotes the set of bounded (resp. non-negative) R-valued continuous functions defined onE.

Let (S,S) be a measurable space. LetQbe a probability kernel onS×S, that is: Q(·, A) is measurable for all A∈S, andQ(x,·) is a probability measure on (S,S) for allx∈S. For any

(3)

f ∈B_b(S), we set forx∈S:

(1) (Qf)(x) =

Z

S

f(y)Q(x,dy).

We define (Qf), or simplyQf, forf ∈B(S) as soon as the integral (1) is well defined, and we have Qf ∈B(S). For n∈N, we denote byQⁿ then-th iterate ofQ defined byQ⁰ =Id, the identity map onB(S), and Qⁿ⁺¹f =Qⁿ(Qf) forf ∈B_b(S).

LetP be a probability kernel onS×S^⊗2, that is: P(·, A) is measurable for allA∈S^⊗2, and P(x,·) is a probability measure on (S²,S^⊗2) for all x∈S. For anyg ∈B_b(S³) andh∈B_b(S²), we set forx∈S:

(2) (P g)(x) = Z

S²

g(x, y, z)P(x,dy,dz) and (P h)(x) = Z

S²

h(y, z)P(x,dy,dz).

We define (P g) (resp. (P h)), or simplyP gforg∈B(S³) (resp. P hforh∈B(S²)), as soon as the corresponding integral (2) is well defined, and we have that P gandP hbelong toB(S).

We now introduce some notations related to the regular binary tree. We set T₀ =G₀= {∅}, G_k ={0,1}^k andT_k =S

0≤r≤kG_r fork∈N^∗, and T=S

r∈NG_r. The setG_k corresponds to the k-th generation,T_k to the tree up to the k-th generation, and T the complete binary tree. For i ∈ T, we denote by|i| the generation ofi (|i| = k if and only if i∈ G_k) andiA ={ij;j ∈A} for A⊂T, where ij is the concatenation of the two sequencesi, j ∈T, with the convention that

∅i=i∅=i.

We recall the definition of bifurcating Markov chain from [17].

Definition 2.1. We say a stochastic process indexed byT,X = (Xi, i∈T), is a bifurcating Markov chain (BMC) on a measurable space (S,S) with initial probability distribution ν on (S,S) and probability kernel PonS×S^⊗2 if:

- (Initial distribution.) The random variable X_∅ is distributed asν.

- (Branching Markov property.) For a sequence(gi, i∈T)of functions belonging toB_b(S³), we have for all k≥0,

Eh Y

i∈G_k

gi(Xi, Xi0, Xi1)|σ(Xj;j∈T_k)i

= Y

i∈G_k

Pgi(Xi).

LetX= (Xi, i∈T) be a BMC on a measurable space (S,S) with initial probability distribution ν and probability kernelP. We define three probability kernelsP0, P1and QonS×S by:

P0(x, A) =P(x, A×S), P1(x, A) =P(x, S×A) for (x, A)∈S×S, and Q=1

2(P0+P1).

Notice thatP0(resp. P1) is the restriction of the first (resp. second) marginal ofPtoS. Following [17], we introduce an auxiliary Markov chain Y = (Yn, n ∈N) on (S,S) withY0 distributed as X_∅ and transition kernelQ. The distribution ofYn corresponds to the distribution of XI, where I is chosen independently fromX and uniformly at random in generationG_n. We shall writeE_x whenX∅=x(i.e. the initial distributionν is the Dirac mass atx∈S).

We end this section with a useful notation. By convention, for f, g ∈ B(S), we define the function f⊗g∈B(S²) by (f⊗g)(x, y) =f(x)g(y) forx, y∈S and introduce the notations:

f ⊗^symg= 1

2(f ⊗g+g⊗f) and f⊗²=f⊗f.

Notice that P(g⊗^sym1) =Q(g) forg∈B₊(S).

(4)

2.2. Assumptions. For a set F ⊂ B(S) of R-valued functions, we write F² = {f²;f ∈ F}, F ⊗F ={f0⊗f1;f0, f1 ∈ F}, and P(E) = {P f;f ∈ E} whenever a kernel P act on a set of functions E. Following [17], we state a structural assumption on the set of functions we shall consider.

Assumption 2.2. Let F⊂B(S) be a set ofR-valued functions such that:

(i) F is a vector subspace which contains the constants;

(ii) F²⊂F;

(iii) F ⊂L¹(ν);

(iv) F⊗F ⊂L¹(P(x,·)) for allx∈S, andP(F⊗F)⊂F.

The condition (iv) implies that P0(F) ⊂ F, P1(F) ⊂ F as well as Q(F) ⊂ F. Notice that if f ∈ F, then even if |f| does not belong to F, using conditions (i) and (ii), we get, with g = (1 +f²)/2, that |f| ≤g and g ∈F. Typically, when (S, d) is a metric space, the set F can be the setC_b(S) of bounded real-valued functions, or the set of smooth real-valued functions such that all derivatives have at most polynomials growth.

Following [17], we also consider the following ergodic properties forQ.

Assumption 2.3. There exists a probability measure µ on (S,S) such that F ⊂L¹(µ) and for all f ∈F, we have the point-wise convergencelimn→∞Qⁿf =hµ, fiand there existsg∈F with:

(3) |Qⁿ(f)| ≤g for alln∈N.

We consider also the following geometrical ergodicity.

Assumption 2.4. There exists a probability measure µ on (S,S) such that F ⊂ L¹(µ), and α∈(0,1) such that for allf ∈F there existsg∈F such that:

(4) |Qⁿf− hµ, fi| ≤αⁿg for all n∈N.

A sequencef= (fℓ, ℓ∈N) of elements ofF satisfies uniformly (3) and (4) if there isg∈F such that:

(5) |Qⁿ(fℓ)| ≤g and |Qⁿfℓ− hµ, fℓi| ≤αⁿg for alln, ℓ∈N.

This implies in particular that |fℓ| ≤ g and |hµ, fℓi| ≤ hµ, gi. Notice that (5) trivially holds if f takes finitely distinct values (i.e. the subset{fℓ;ℓ∈N} ofF is finite) each satisfying (3) and (4).

Example 2.5. Let (S, d) be a metric space, S its Borelσ-field, and Y a Markov chain uniformly geometrically ergodici.e. there exists α∈(0,1) and a finite constant Csuch that for allx∈S:

(6) kQⁿ(x,·)−µk^{T V} ≤Cαⁿ,

where, for a signed finite measure π on (S,S), its total variation norm is defined by kπk^{T V} = sup_f∈B(S),kfk_∞≤1|hπ, fi|. Then, taking for F the set of R-valued continuous bounded function C_b(S), we get that properties (i-iii) from Assumption 2.2 and Assumption 2.4 hold. In particular, Equation (6) implies that (4) holds withg=Ckfk∞.

We consider the stronger ergodic property based on a second spectral gap.

Assumption 2.6. There exists a probability measure µ on (S,S) such that F ⊂ L¹(µ), and α ∈ (0,1), a finite non-empty set J of indices, distinct complex eigenvalues {αj, j ∈ J} of the operator Q with |αj| =α, non-zero complex projectors {R_j, j ∈J} defined on CF, the C-vector space spanned by F, such that R_j ◦R_j′ = R_j′ ◦R_j = 0 for all j 6=j^′ (so that P

j∈JR_j is also

(5)

a projector defined on CF) and a positive sequence(βn, n∈N)converging to0, such that for all f ∈F there existsg∈F and, withθj=αj/α:

(7)

Qⁿ(f)− hµ, fi −αⁿX

j∈J

θⁿ_j R_j(f)

≤βnαⁿg for all n∈N.

Without loss of generality, we shall assume that the sequence (βn, n∈N) in Assumption 2.6 is non-increasing and bounded from above by 1.

Remark 2.7. In [17], only the structural Assumption 2.2 and the ergodic Assumption 2.3 were assumed. If F contains a setAof bounded functions which is separating (that is two probability measures which coincides on Aare equal), then Assumption 2.2 and 2.3 imply in particular that µis the only invariant measure ofQ. Notice that the geometric ergodicity Assumption 2.4 implies Assumption 2.3, and that Assumption 2.6 implies Assumption 2.4 (with the sameαbut possibly different functiong).

Example 2.8. We consider the real-valued Gaussian symmetric bifurcating autoregressive process (BAR) X= (Xu, u∈T) where for allu∈T\{∅}:

Xu=aXv+εv,

where vis the parent of u, that isu=v0 oru=v1,a∈(−1,1), and (εv, v∈T) are independent Gaussian random variablesN(0, σ²) withσ >0. We obtain:

P(x, dy, dz) =Q(x, dy)Q(x, dz) with Qf(x) =E[f(ax+σG)],

where G is a standard N(0,1) Gaussian random variable. More generally we have Qⁿf(x) = E

f aⁿx+√

1−a²ⁿσaG

, where σa =σ(1−a²)^−1/2. The kernelQ admits a unique invariant probability measure µ, which is Gaussian N(0, σ_a²). The operator Q (on L²(µ)) is a symmetric integral Hilbert-Schmidt operator whose eigenvalues are given by σp(Q) = (aⁿ, n ∈ N), their algebraic multiplicity is one and the corresponding eigen-functions (¯gn(x), n∈ N) are defined for n∈Nby ¯gn(x) =gn σ_a⁻¹x

, wheregn is the Hermite polynomial of degreen. In particular, we have ¯g0= 1 and ¯g1(x) =σ⁻¹_a x. LetRbe the orthogonal projection on the vector space generated by ¯g1, that isRf =hµ, f¯g1ig¯1or equivalently, forx∈R:

(8) Rf(x) =σ⁻¹_a xE[Gf(σaG)].

Consider F the set of functions f ∈ C²(R) such that f, f^′ and f^′′ have at most polynomial growth. And assume that the probability distributionν has all its moments, which is equivalent to say thatF⊂L¹(ν). Then the setFsatisfies Assumption 2.2. We also have thatF ⊂L¹(µ). Then, it is not difficult to check directly that Assumption 2.6 also holds with J ={j0}, αj0 = α=a, βn=aⁿ andR_j₀ =R(and also Assumptions 2.3 and 2.4 hold).

2.3. Notations for average of different functions over different generations. Let X = (Xu, u∈T) be a BMC on (S,S) with initial probability distributionν, and probability kernelP. Recall Qis the induced Markov kernel. We assume thatµis an invariant probability measure of Q.

For a finite setA⊂Tand a functionf ∈B(S), we set:

MA(f) =X

i∈A

f(Xi).

We shall be interested in the casesA=G_n (then-th generation) and A=T_n (the tree up to the n-th generation). We recall from [17, Theorem 11 and Corollary 15] that under Assumptions 2.2

(6)

and 2.3 (resp. and also Assumption 2.4), we have for f ∈F the following convergence in L²(µ) (resp. a.s.):

(9) lim

n→∞|G_n|⁻¹MG_n(f) =hµ, fi and lim

n→∞|T_n|⁻¹MT_n(f) =hµ, fi.

We shall now consider the corresponding fluctuations. We will use frequently the following notation:

f˜=f− hµ, fi forf ∈L¹(µ).

In order to study the asymptotics of MG_n−ℓ( ˜f), we shall consider the contribution of the descendants of the individuali∈T_n−ℓ forn≥ℓ≥0:

(10) N_n,i^ℓ (f) =|G_n|^−1/2MiG_{n−|i|−ℓ}( ˜f),

whereiG_{n−|i|−ℓ}={ij, j∈G_{n−|i|−ℓ}} ⊂G_n−ℓ. For allk∈Nsuch that n≥k+ℓ, we have:

MG_n−ℓ( ˜f) =p

|G_n| X

i∈G_k

N_n,i^ℓ (f) =p

|G_n|N_n,∅^ℓ (f).

Letf= (fℓ, ℓ∈N) be a sequence of elements ofL¹(µ). We set forn∈Nandi∈T_n:

(11) Nn,i(f) =

n−|i|

X

ℓ=0

N_n,i^ℓ (fℓ) =|G_n|^−1/2

n−|i|

X

ℓ=0

MiG_{n−|i|−ℓ}( ˜fℓ).

InNn,i, we consider the contribution of the descendants ofiup to generationn. We deduce that P

i∈G_kNn,i(f) =|G_n|^−1/2Pn−k

ℓ=0 MG_n−ℓ( ˜fℓ) which gives fork= 0:

(12) N_n,∅(f) =|G_n|^−1/2

n

X

ℓ=0

MG_n−ℓ( ˜fℓ).

In N_n,∅, we consider the contribution of all the individual from generation 0 up to generationn.

We shall prove the convergence in law ofN_n,∅(f) in the following sections.

Remark 2.9. We shall consider in particular the following two simple cases. Letf ∈ L¹(µ) and consider the sequencef= (fℓ, ℓ∈N). Iff0=f andfℓ= 0 forℓ∈N^∗, then we get:

N_n,∅(f) =|G_n|^−1/2MG_n( ˜f).

Iffℓ=f forℓ∈N, then we shall writef = (f, f, . . .), and we get, as|T_n|= 2ⁿ⁺¹−1 and|G_n|= 2ⁿ: Nn,∅(f) =|G_n|^−1/2MT_n( ˜f) =√

2−2⁻ⁿ |T_n|^−1/2MT_n( ˜f).

Thus, we will easily deduce the fluctuations ofMT_n(f) andMG_n(f) from the asymptotics ofN_n,∅(f).

To study the asymptotics ofN_n,∅(f), it is convenient to write for n≥k≥1:

(13) N_n,∅(f) =|G_n|^−1/2

k−1

X

r=0

MG_r( ˜fn−r) + X

i∈G_k

Nn,i(f).

Iff = (f, f, . . .) is the infinite sequence of the same functionf, this becomes:

(14) N_n,∅(f) =|G_n|^−1/2MT_n( ˜f) =|G_n|^−1/2MT_k−1( ˜f) + X

i∈G_k

Nn,i(f).

(7)

In the proofs, we will denote by C any unimportant finite constant which may vary from line to line (in particular C does not depend on n ∈ N nor on the considered sequence of functions f= (fℓ, ℓ∈N)).

3. Main results

3.1. The sub-critical case: 2α² < 1. We shall consider, when well defined, for a sequence f= (fℓ, ℓ∈N) of measurable real-valued functions defined onS, the quantities:

(15) Σ^sub(f) = Σ^sub₁ (f) + 2Σ^sub₂ (f), where:

Σ^sub₁ (f) =X

ℓ≥0

2^−ℓhµ,f˜_ℓ²i+ X

ℓ≥0, k≥0

2^k−ℓhµ,P

(Q^kf˜ℓ)⊗² i, (16)

Σ^sub₂ (f) = X

0≤ℓ<k

2^−ℓhµ,f˜kQ^k−ℓf˜ℓi+ X

0≤ℓ<k r≥0

2^r−ℓhµ,P

Q^rf˜k⊗^symQ^k−ℓ+rf˜ℓ

i. (17)

We have the following result whose proof is given in Section 4.

Theorem 3.1. Let X be a BMC with kernel P and initial distribution ν such that Assumptions 2.2 and 2.4 are in force with α∈(0,1/√

2). We have the following convergence in distribution for all sequencef= (fℓ, ℓ∈N)of elements of F satisfying Assumptions 2.4 uniformly, that is (5) for someg∈F:

N_n,∅(f) −−−−→_n→∞^(d) G,

whereGis a centered Gaussian random variable with varianceΣ^sub(f)given by (15), which is well defined and finite.

The convergence in distribution of Nn,∅(f) allows to recover the convergence in distribution of the average over different successive generations |G_n|^−1/2(MG_n( ˜f0), . . . , MG_n−k( ˜fk)). Notice the limit is a Gaussian random vector (G1, . . . , Gk). A priori the random variablesG1, . . . , Gk are not independent because of the interaction coming from (17). In contrast, it was proved in [14], that the average over different successive generations of martingale increments converges to Gaussian independent random variables.

Remark 3.2. Forf ∈B(S), when it is well defined, we set:

(18) Σ^subG (f) =hµ,f˜²i+X

k≥0

2^khµ,P

Q^kf˜⊗²

i and Σ^subT (f) = Σ^subG (f) + 2Σ^subT,2(f), where

Σ^subT,2(f) =X

k≥1

hµ,f˜Q^kf˜i+X

k≥1r≥0

2^rhµ,P

Q^rf˜⊗^symQ^r+kf˜ i.

If we take f = (f,0,0, . . .), we have Σ^sub(f) = Σ^subG (f). If we take f = (f, f, . . .), the infinite sequence of the same functionf, we have Σ^sub(f) = 2Σ^subT (f).

As a direct consequence of Remarks 3.2 and 2.9, and the more general Theorem 3.1, we get the following result.

(8)

Corollary 3.3. Let X be a BMC with kernel Pand initial distribution ν such that Assumptions 2.2 and 2.4 are in force with α∈(0,1/√

2). Let f ∈F. Then, we have the following convergence in distribution:

|G_n|^−1/2MG_n( ˜f) −−−−→_n→∞^(d) G1 and |T_n|^−1/2MT_n( ˜f) −−−−→_n→∞^(d) G2,

where G1 andG2 are centered Gaussian random variables with respective variances Σ^subG (f)and Σ^subT (f)given in (18), which are well defined and finite.

Proof of Corollary 3.3. Take the infinite sequencef= (f,0,0,· · ·), where only the first component is non-zero, to deduce from Theorem 3.1 the convergence in distribution of |G_n|^−1/2MG_n( ˜f) = N_n,∅(f). Next, take the infinite sequencef=f = (f, f, . . .) of the same functionf in Theorem 3.1 and use (14) and as well as limn→∞|G_n|/|T_n| = 1/2, to get the convergence in distribution for

|T_n|^−1/2MT_n( ˜f) = (|G_n|/|T_n|)^1/2N_n,∅(f).

3.2. The critical case2α²= 1. In the critical caseα= 1/√

2, we shall denote byR_jthe projector on the eigen-space associated to the eigenvalueαj withαj =θjα, |θj|= 1 and forj in the finite set of indices J. Since Q is a real operator, we get that if αj is a non real eigenvalue, so is αj. We shall denote by R_j the projector associated to αj. Recall that the sequence (βn, n ∈ N) in Assumption 2.6 can (and will) be chosen non-increasing and bounded from above by 1. For all measurable real-valued functionf defined onS, we set, when this is well defined:

(19) fˆ= ˜f −X

j∈J

R_j(f) with f˜=f− hµ, fi.

We shall consider, when well defined, for a sequence f = (fℓ, ℓ ∈ N) of measurable real-valued functions defined onS, the quantities:

(20) Σ^crit(f) = Σ^crit₁ (f) + 2Σ^crit₂ (f), where:

Σ^crit₁ (f) =X

k≥0

2^−khµ,Pf_k,k^∗ i=X

k≥0

2^−kX

j∈J

hµ,P(R_j(fk)⊗^symR_j(fk))i, (21)

Σ^crit₂ (f) = X

0≤ℓ<k

2^−(k+ℓ)/2hµ,Pf_k,ℓ^∗ i, (22)

with, fork, ℓ∈N:

(23) f_k,ℓ^∗ =X

j∈J

θ^ℓ−k_j R_j(fk)⊗^symR_j(fℓ).

Notice thatf_k,ℓ^∗ =f_ℓ,k^∗ and thatf_k,ℓ^∗ is real-valued asθ^ℓ−k_j R_j(fk)⊗R_j(fℓ) =θ^ℓ−k_j′ R_j′(fk)⊗R_j′(fℓ) forj^′ such thatαj^′ =αj and thusR_j′ =R_j.

We shall consider sequences f = (fℓ, ℓ ∈ N) of elements of F which satisfies Assumption 2.6 uniformly, that is such that there existsg∈F with:

We deduce that there exists a finite constant cJ depending only on{αj, j ∈J} such that for all ℓ∈N,n∈N,j0∈J:

(25) |fℓ| ≤g, |f˜ℓ| ≤g, |hµ, fℓi| ≤ hµ, gi,

X

j∈J

θ_jⁿR_j(fℓ)

≤2g and |R_j₀(fℓ)| ≤cJg,

(9)

where for the last inequality, we used that the Vandermonde matrix (θⁿ_j; j∈J, n∈ {0, . . . ,|J|−1}) is invertible. Notice that (24) holds in particular if (7) holds for all f ∈F and f= (fn, n ∈N) takes finitely distinct values inF (i.e. the set{fℓ;ℓ∈N} ⊂F is finite). The proof of the following result is given in Section 5.

Theorem 3.4. LetX be a BMC with kernelPand initial distributionν. Assume that Assumptions 2.2 and 2.6 hold withα= 1/√

2. We have the following convergence in distribution for all sequence f= (fℓ, ℓ∈N)of elements ofF satisfying Assumptions 2.6 uniformly, that is (24) for someg∈F:

n^−1/2N_n,∅(f) −−−−→_n→∞^(d) G,

where G is a Gaussian real-valued random variable with varianceΣ^crit(f)given by (20), which is well defined and finite.

Remark 3.5. Forf ∈B(S), when it is well defined, we set:

(26) Σ^critG (f) =X

j∈J

hµ,P(R_j(f)⊗^symR_j(f))i and Σ^critT (f) = Σ^critG (f) + 2Σ^crit_T,2(f), where

Σ^crit_T,2(f) =X

j∈J

√ 1

2θj−1hµ,P(R_j(f)⊗^symR_j(f))i.

If we take f = (f,0,0, . . .), we have Σ^crit(f) = Σ^critG (f). If we take f = (f, f, . . .), the infinite sequence of the same functionf, we have Σ^crit(f) = 2Σ^critT (f).

As a direct consequence of Remarks 3.5 and 2.9, and the more general Theorem 3.4, we get the following result. The proof which mimic the proof of Corollary 3.3 is left to the reader.

Corollary 3.6. Let X be a BMC with kernel Pand initial distribution ν such that Assumptions 2.2 and 2.6 are in force with α= 1/√

2. Letf ∈F. Then, we have the following convergence in distribution:

(n|G_n|)^−1/2MG_n( ˜f) −−−−→_n→∞^(d) G1, and (n|T_n|)^−1/2MT_n( ˜f) −−−−→_n→∞^(d) G2,

where G1 and G2 are centered Gaussian real-valued random variables with respective variance Σ^critG (f)andΣ^critT (f)given in (26), which are well defined and finite.

Remark 3.7. We stress that the variances Σ^critT (f) and Σ^critG (f) can take the value 0. This is the case in particular if the projection off on the eigenspace corresponding to the eigenvaluesαjequal 0 for allj∈J. In the symmetric BAR model developed in Example 2.8 whereJ is reduced to a singleton and the projector is given by (8), we deduce that if a=α= 1/√

2 then Σ^critG (f) = Σ^critT (f) = 0 if E[G f(σaG)] = 0, whereG is a standardN(0,1) Gaussian random variable. This is in particular the case iff is even.

3.3. The super-critical case 2α² > 1. We consider the super-critical case α ∈(1/√

2,1). We shall assume that Assumption 2.6 holds. Recall (7) with the eigenvalues {αj = θjα, j ∈ J} of Q, with modulus equal toα (i.e. |θj|= 1) and the projectorR_j on the eigen-space associated to eigenvalueαj. Recall that the sequence (βn, n∈N) in Assumption 2.6 can (and will ) be chosen non-increasing and bounded from above by 1.

We shall consider the filtrationH= (H_n, n∈) defined byH_n=σ(Xi, i∈T_n). The next lemma, whose the proof is given in Section 6.1, exhibits martingales related to the projectorR_j.

(10)

Lemma 3.8. Let X be a BMC with kernel Pand initial distributionν. Assume that Assumption 2.2 and 2.6 hold with α ∈ (1/√

2,1) in (7). Then, for all j ∈ J and f ∈ F, the sequence Mj(f) = (Mn,j(f), n∈N), with

Mn,j(f) = (2αj)⁻ⁿMG_n(R_j(f)),

is aH-martingale which converges a.s. and inL² to a random variable, sayM∞,j(f).

Now, we state the main result of this section, whose proof is given in Section 6.2. Recall that θj =αj/αand|θj|= 1 andM∞,j is defined in Lemma 3.8.

Theorem 3.9. LetX be a BMC with kernelPand initial distributionν. Assume that Assumptions 2.2 and 2.6 hold with α∈(1/√

2,1) in (7). We have the following convergence in probability for all sequence f = (fℓ, ℓ ∈N) of elements of F satisfying Assumptions 2.6 uniformly, that is (24) holds for some g∈F:

(2α²)^−n/2N_n,∅(f)−X

ℓ∈N

(2α)^−ℓX

j∈J

θ^n−ℓ_j M∞,j(fℓ) −−−−→_n→∞^P 0.

Remark 3.10. We stress that if for allℓ∈N, the orthogonal projection offℓ on the eigen-spaces corresponding to the eigenvalues 1 andαj,j ∈J, equal 0, thenM∞,j(fℓ) = 0 for allj ∈J and in this case, we have

(2α²)^−n/2N_n,∅(f) −−−−→_n→∞^P 0.

As a direct consequence of Theorem 3.9 and Remark 2.9, we deduce the following results. Recall that ˜f =f − hµ, fi.

Corollary 3.11. Under the assumptions of Theorem 3.9, we have for allf ∈F: (2α)⁻ⁿMT_n( ˜f)−X

j∈J

θⁿ_j(1−(2αθj)⁻¹)⁻¹M∞,j(f)

P

−−−−→_n→∞ 0

(2α)⁻ⁿMG_n( ˜f)−X

j∈J

θ_jⁿM∞,j(f)

P

−−−−→_n→∞ 0.

Proof. We first takef= (f, f, . . .) and nextf= (f,0, . . .) in Theorem 3.9, and then use (12).

We directly deduce the following two Corollaries.

Corollary 3.12. Under the hypothesis of Theorem 3.9, if α is the only eigen-value of Q with modulus equal toα(and thusJ is reduced to a singleton), then we have:

(2α²)^−n/2Nn,∅(f) −−−−→_n→∞^P X

ℓ∈N

(2α)^−ℓM∞(fℓ),

where, forf ∈F,M∞(f) = limn→∞(2α)⁻ⁿMG_n(R(f)), andRis the projection on the eigen-space associated to the eigen-valueα.

The next Corollary is a direct consequence of Corollary 3.12.

Corollary 3.13. Let X be a BMC with kernel P and initial distribution ν. Assume that As- sumption 2.2 and 2.6 hold with α∈(1/√

2,1) in (7). Assumeαis the only eigen-value ofQ with modulus equal toα(and thusJ is reduced to a singleton), then we have for f ∈F:

(2α)⁻ⁿMG_n( ˜f)

P

−−−−→_n→∞ M∞(f) and (2α)⁻ⁿMT_n( ˜f)

P

−−−−→_n→∞ 2α

2α−1M∞(f), where M∞(f)is a random variable defined in Corollary 3.12.

(11)

4. Proof of Theorem 3.1

Let (pn, n∈N) be a non-decreasing sequence of elements ofN^∗ such that, for allλ >0:

(27) pn < n, lim

n→∞pn/n= 1 and lim

n→∞n−pn−λlog(n) = +∞. When there is no ambiguity, we writepforpn.

Leti, j∈T. We write i4j ifj∈iT. We denote byi∧j the most recent common ancestor of i and j, which is defined as the onlyu∈T such that ifv ∈T and v4i, v 4j then v 4u. We also define the lexicographic orderi ≤j if either i4j or v0 4i and v1 4j for v =i∧j. Let X = (Xi, i∈T) be aBM C with kernel Pand initial measureν. Fori∈T, we define theσ-field:

F_i={Xu;u∈Tsuch thatu≤i}. By construction, theσ-fields (F_i;i∈T) are nested asF_i⊂F_j fori≤j.

We define forn∈N,i∈G_n−p

n andf∈F^Nthe martingale increments:

(28) ∆n,i(f) =Nn,i(f)−E[Nn,i(f)|F_i] and ∆n(f) = X

i∈G_n−_pn

∆n,i(f).

Thanks to (11), we have:

X

i∈G_n−pn

Nn,i(f) =|G_n|^−1/2

pn

X

ℓ=0

MG_n−ℓ( ˜fℓ) =|G_n|^−1/2

n

X

k=n−pn

MG_k( ˜fn−k).

Using the branching Markov property, and (11), we get fori∈G_n−p

n: E[Nn,i(f)|F_i] =E[Nn,i(f)|Xi] =|G_n|^−1/2

pn

X

ℓ=0

E_X

i

hMG_pn−ℓ( ˜fℓ)i . We deduce from (13) with k=n−pn that:

(29) N_n,∅(f) = ∆n(f) +R0(n) +R1(n), with

(30) R0(n) =|G_n|^−1/2

n−pn−1

X

k=0

MG_k( ˜f_n−k) and R1(n) = X

i∈G_n−_pn

E[Nn,i(f)|F_i]. We have the following elementary lemma.

Lemma 4.1. Under the assumptions of Theorem 3.1, we have the following convergence:

n→∞lim E[R0(n)²] = 0.

Proof. For allk≥1, we have:

E_x[MG_k( ˜fn−k)²]≤2^kg1(x) +

k−1

X

ℓ=0

2^k+ℓα^2ℓ Q^k−ℓ−1(P(g2⊗g2)) (x)

≤2^kg1(x) + 2^k

k−1

X

ℓ=0

(2α²)^ℓg3(x)

≤2^kg4(x),

with g1, g2, g3, g4∈F and where we used (70), (5) twice and (3) twice (withf andg replaced by 2(g²+hµ, gi²) andg1, and withf andgreplaced bygandg2) for the first inequality, (3) (withf

(12)

andgreplaced byP(g2⊗g2) andg3) for the second, and that 2α²<1 andg4=g1+ (1−2α²)⁻¹g3

for the last. As g4∈F ⊂L¹(ν), this implies thatE[MG_k( ˜f_n−k)²]≤c²2^k for some finite constant c which does not depend onn ork. We can takec large enough, so that this upper bound holds also fork= 0 and alln∈N, thanks to (5). We deduce that:

(31) E[R0(n)²]^1/2≤ |G_n|^−1/2

n−p−1

X

k=0

E[MG_k( ˜fn−k)²]^1/2≤c2^−n/2

n−p−1

X

k=0

2^k/2≤3c2^−p/2.

Use that limn→∞p=∞to conclude.

We have the following lemma.

n→∞lim E

R1(n)²

= 0.

Proof. We set forp≥ℓ≥0:

(32) R1(ℓ, n) = X

i∈G_n−p

E

N_n,i^ℓ (fℓ)|F_i , so that, thanks to (11), R1(n) =Pp

ℓ=0R1(ℓ, n). We have fori∈G_n−p:

|G_n|^1/2E

N_n,i^ℓ (fℓ)|F_i

=Eh

MiGp−ℓ( ˜fℓ)|Xi

i=E_X_ih

MG_p−ℓ( ˜fℓ)i

=|G_p−ℓ|Q^p−ℓf˜ℓ(Xi), where we used definition (10) ofN_n,i^ℓ for the first equality, the Markov property ofXfor the second and (69) for the third. We deduce that:

R1(ℓ, n) =|G_n|^−1/2|G_p−ℓ|MG_n−p(Q^p−ℓf˜ℓ).

Using (70), we get:

E_x

R1(ℓ, n)²

=|G_n|⁻¹|G_p−ℓ|²E_x

MG_n−p(Q^p−ℓf˜ℓ)2

=|G_n|⁻¹|G_p−ℓ|²2^n−pQ^n−p (Q^p−ℓf˜ℓ)² (x) +|G_n|⁻¹|G_p−ℓ|²

n−p−1

X

k=0

2^n−p+kQ^{n−p−k−1} P

Q^k+p−ℓf˜ℓ⊗² (x).

We deduce that:

E_x

R1(ℓ, n)²

≤α^2(p−ℓ)2^p−2ℓQ^n−p(g²)(x) + 2^p−2ℓ

n−p−1

X

k=0

α^2(k+p−ℓ)2^kQ^{n−p−k−1}(P(g⊗g))

≤α^2(p−ℓ)2^p−2ℓ g1(x) +

n−p−1

X

k=0

(2α²)^kg2(x)

! (33)

≤(2α²)^p(2α)^−2ℓg3(x),

withg1, g2, g3∈F and where we used (5) for the first inequality, (3) twice (withf andgreplaced by g² and g1 and by P(g⊗g) and g2) for the second, and that 2α² < 1 for the last. Since g3∈F ⊂L¹(ν), this gives thatE

R1(ℓ, n)²

≤(2α²)^p(2α)^−2ℓhν, g3i. We deduce that:

E

R1(n)²1/2

≤

p

X

ℓ=0

E

R1(ℓ, n)²1/2

≤a1,nhν, g3i^1/2,

(13)

with the sequence (a1,n, n∈N) defined by:

a1,n= (2α²)^p/2

p

X

ℓ=0

(2α)^−ℓ.

Notice the sequence (a1,n, n∈N) converges to 0 since limn→∞p=∞, 2α²<1 and

p

X

ℓ=0

(2α)^−ℓ≤







2α/(2α−1) if 2α >1,

p+ 1 if 2α= 1,

(2α)^−p/(1−2α) if 2α <1.

We conclude that lim_n→∞E

R1(n)²

= 0.

We now study the bracket of ∆n:

(34) V(n) = X

i∈G_n−pn

E

∆n,i(f)²|F_i . Using (11) and (28), we write:

(35) V(n) =|G_n|⁻¹ X

i∈G_n−_pn

E_X

i





pn

X

ℓ=0

MG_pn−ℓ( ˜fℓ)

!2

−R2(n) =V1(n) + 2V2(n)−R2(n), with:

V1(n) =|G_n|⁻¹ X

i∈G_n−pn pn

X

ℓ=0

E_X

i

hMG_pn_−ℓ( ˜fℓ)²i , V2(n) =|G_n|⁻¹ X

i∈G_n−_pn

X

0≤ℓ<k≤pn

E_X

i

hMG_pn−ℓ( ˜fℓ)MG_pn−k( ˜fk)i , R2(n) = X

i∈G_n−_pn

E[Nn,i(f)|Xi]².

n→∞lim E[R2(n)] = 0.

Proof. We define the sequence (a2,n, n∈N) forn∈Nby:

a2,n= 2^−p

p

X

ℓ=0

(2α)^ℓ

!2

.

Notice that the sequence (a2,n, n∈N) converges to 0 since limn→∞p=∞, 2α²<1 and

p

X

ℓ=0

(2α)^ℓ≤







(2α)^p+1/(2α−1) if 2α >1,

p+ 1 if 2α= 1,

1/(1−2α) if 2α <1.

(14)

We now compute E_x[R2(n)].

E_x[R2(n)] =|G_n|⁻¹ X

i∈G_n−p

E_x



E_x

" _p X

ℓ=0

MiG_p−ℓ( ˜fℓ)|Xi

#2



=|G_n|⁻¹ X

i∈G_n−p

E_x





p

X

ℓ=0

E_X

i

hMG_p−ℓ( ˜fℓ)i

!2



=|G_n|⁻¹|G_n−p|Q^n−p X^p

ℓ=0

|G_p−ℓ|Q^p−ℓf˜ℓ

2! (x)

≤2^−p

p

X

ℓ=0

(2α)^p−ℓ

!2

Q^n−p(g²)(x) (36)

≤a2,ng1(x),

withg1∈F and where we used the definition ofNn,i(f) for the first equality, the Markov property ofX for the second, (69) for the third, (5) for the first inequality, and (3) (withf andg replaced byg² and g1) for the last. We conclude that lim_n→∞E[R2(n)] = 0, using that hν, g1iif finite as

g1∈F ⊂L¹(ν).

We have the following technical lemma.

Lemma 4.4. Under the assumptions of Theorem 3.1, we have thatΣ^sub₂ (f)defined in (17) is well defined and finite, and that a.s. limn→∞V2(n) = Σ^sub₂ (f)<+∞.

Proof. Using (71), we get:

(37) V2(n) =V5(n) +V6(n),

with

V5(n) =|G_n|⁻¹ X

i∈G_n−p

X

0≤ℓ<k≤p

2^p−ℓQ^p−k

f˜kQ^k−ℓf˜ℓ

(Xi),

V6(n) =|G_n|⁻¹ X

i∈G_n−p

X

0≤ℓ<k<p p−k−1

X

r=0

2^p−ℓ+rQ^{p−1−(r+k)} P

(Xi).

We consider the termV6(n). We have:

(38) V6(n) =|G_n−p|⁻¹MG_n−p(H6,n), with:

H6,n= X

0≤ℓ<k r≥0

h⁽ⁿ⁾_k,ℓ,r1{r+k<p} and h⁽ⁿ⁾_k,ℓ,r= 2^r−ℓQ^{p−1−(r+k)} P

.

Using (4) and sinceP(Q^r(F)⊗Q^k−ℓ+r(F))⊂F and limn→∞p= +∞, we have that:

n→∞lim h⁽ⁿ⁾_k,ℓ,r=hk,ℓ,r,

(15)

where the constant hk,ℓ,r is equal to 2^r−ℓhµ,P

i. Using (4), we also have that:

|h⁽ⁿ⁾_k,ℓ,r| ≤2^r−ℓα^k−ℓ+2rQ^{p−1−(r+k)}(P(g⊗g))

≤2^r−ℓα^k−ℓ+2rg∗,

withg_∗∈F (which does not depend onn, r, kandℓ) and where we used (5) for the first inequality and (3) (withf andg replaced byP(g⊗g) andg∗). Taking the limit, we also deduce that:

|hk,ℓ,r| ≤2^r−ℓα^k−ℓ+2rg∗. Define the constant

H6(f) = X

0≤ℓ<k r≥0

hk,ℓ,r = X

0≤ℓ<k r≥0

2^r−ℓhµ,P

i

which is finite as

(39) X

0≤ℓ<k, r≥0

2^r−ℓα^k−ℓ+2r= 2α

(1−α)(1−2α²) <+∞. Using (4) (withf andg replaced byP

and gk,ℓ,r), we deduce that:

|h⁽ⁿ⁾_k,ℓ,r−hk,ℓ,r| ≤2^r−ℓα^{p−1−(r+k)}gk,ℓ,r. Setr0∈N^∗andgr0=P

0≤ℓ<k;r≥0;k∨r≤r⁰gk,ℓ,r. Notice thatgr0 belongs toF and is non-negative.

Furthermore, we have:

|H6,n−H6(f)| ≤ X

0≤ℓ<k r≥0 k∨r≤r⁰

2^r−ℓα^{p−1−(r+k)}gr0+ X

0≤ℓ<k r≥0 r∨k>r⁰

|h⁽ⁿ⁾_k,ℓ,r|1_{r+k<p}+|hk,ℓ,r|

≤(r0+ 1)²2^r⁰⁺¹α^p−1−2r⁰gr0+γ1(r0)g∗, with

γ1(r0) = X

0≤ℓ<k r≥0 r∨k>r⁰

2^r−ℓα^k−ℓ+2r.

Using (9) withnreplaced byn−pandf replaced by g_∗andgr0, and that lim_n→∞α^p = 0 as well as lim_n→∞n−p=∞, we deduce that:

lim sup

n→∞ |G_n−p|⁻¹MG_n−p(|H6,n−H6(f)|)≤γ(r0)hµ, g∗i.

Thanks to (39), we get by dominated convergence that limr0→∞γ1(r0) = 0. This implies that:

n→∞lim |G_n−p|⁻¹MG_n−p(|H6,n−H6(f)|) = 0.

Since|G_n−p|⁻¹MG_n−p(·) is a probability measure, we deduce from (38) that a.s.:

n→∞lim V6(n) = lim

n→∞|G_n−p|⁻¹MG_n−p(H6,n) =H6(f) = X

0≤ℓ<k r≥0

2^r−ℓhµ,P

i.

Similarly, we get that a.s. limn→∞V5(n) =H5(f), with the finite constantH5(f) defined by:

H5(f) = X

0≤ℓ<k

2^−ℓhµ,f˜kQ^k−ℓf˜ℓi.

(16)

Notice that Σ^sub₂ (f) =H5(f) +H6(f) is finite thanks to (5) and (39). This finishes the proof.

Using similar arguments as in the proof of Lemma 4.4, we get the following result.

Lemma 4.5. Under the assumptions of Theorem 3.1, we have thatΣ^sub₁ (f)in (16) is well defined and finite, and that a.s. lim_n→∞V1(n) = Σ^sub₁ (f).

Proof. Using (70), we get:

(40) V1(n) =V3(n) +V4(n),

with

V3(n) =|G_n|⁻¹ X

i∈G_n−p p

X

ℓ=0

2^p−ℓQ^p−ℓ( ˜f_ℓ²)(Xi),

V4(n) =|G_n|⁻¹ X

i∈G_n−p p−1

X

ℓ=0 p−ℓ−1

X

k=0

2^p−ℓ+kQ^{p−1−(ℓ+k)} P

Q^kf˜ℓ⊗² (Xi).

We consider the termV4(n). We have:

(41) V4(n) =|G_n−p|⁻¹MG_n−p(H4,n), with:

(42) H4,n= X

ℓ≥0, k≥0

h⁽ⁿ⁾_ℓ,k1_{ℓ+k<p} and h⁽ⁿ⁾_ℓ,k = 2^k−ℓQ^{p−1−(ℓ+k)} P

Q^kf˜ℓ⊗² . Using (4), we have that:

n→∞lim h⁽ⁿ⁾_ℓ,k =hℓ,k, where the constanthℓ,k is equal to 2^k−ℓhµ,P

Q^kf˜ℓ⊗²

i. We also have that:

|h⁽ⁿ⁾_ℓ,k| ≤2^k−ℓα^2kQ^{p−1−(ℓ+k)}(P(g⊗g))

≤2^k−ℓα^2kg∗,

withg∗∈F (which does not depend onn, ℓandk) and where we used (5) for the first inequality and (3) (withf andg replaced byP(g⊗g) andg∗). Taking the limit, we also deduce that:

|hℓ,k| ≤2^k−ℓα^2kg_∗. Define the constant

H4(f) = X

ℓ≥0, k≥0

hℓ,k, which is finite as:

(43) X

ℓ≥0, k≥0

2^k−ℓα^2k = 2/(1−2α²)<+∞.

Using (4) (withf andg replaced byP

Q^kf˜ℓ⊗²

andgℓ,k), we deduce that:

|h⁽ⁿ⁾_ℓ,k−hℓ,k| ≤2^k−ℓα^{p−1−(ℓ+k)}gℓ,k,