Adequacy of the Harris path with the expected contour

Chapter 7. On the inference for size-constrained Galton-Watson trees

7.2 Inferring σ⁻¹ from a forest

7.2.1 Adequacy of the Harris path with the expected contour

Let τ_n ∼ GW_n(µ), where the offspring distribution µ is critical (mean 1) and unknown. By virtue of Theorem 7.1.3, the asymptotic average behavior of the normalized Harris process (n^{−1/2} H[τ_n](2nt), 0 ≤ t ≤ 1) is given by (2σ⁻¹E_t, 0 ≤ t ≤ 1), where σ⁻¹ is of course also unknown. We propose to estimate σ⁻¹ by minimizing the L²-error defined by

$$\lambda \mapsto \left\| \frac{H[\tau_n](2n\,\cdot)}{\sqrt{n}} - 2\lambda E \right\|_2^2 .$$

The solution of this least-squares problem is well known and is given by

$$\hat\lambda[\tau_n] = \frac{\langle H[\tau_n](2n\,\cdot),\, E\rangle}{2\sqrt{n}\,\|E\|_2^2}. \tag{7.4}$$
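As an illustration, the estimator (7.4) can be computed from a discretized path as in the following minimal sketch. It assumes that the normalized Harris path n^{−1/2}H[τ_n](2nt) has already been sampled on a uniform grid of [0,1], and that the expected contour is E_t = 𝔼[e_t] = √(8t(1−t)/π), a closed form that follows from the Bessel-bridge representation of the excursion but is not restated in this section. The names `expected_contour`, `trapezoid` and `lambda_hat` are hypothetical, and the integrals are approximated by the trapezoidal rule.

```python
import math


def expected_contour(t):
    """Assumed closed form of the expected contour: E_t = sqrt(8 t (1 - t) / pi)."""
    return math.sqrt(max(0.0, 8.0 * t * (1.0 - t) / math.pi))


def trapezoid(values, step):
    """Trapezoidal rule for samples taken on a uniform grid with spacing `step`."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def lambda_hat(normalized_path):
    """Least-squares estimator (7.4).

    `normalized_path` holds the values n^{-1/2} H[tau_n](2 n t) for
    t = 0, 1/m, ..., 1, so the factor sqrt(n) is already absorbed.
    """
    m = len(normalized_path) - 1
    step = 1.0 / m
    contour = [expected_contour(i / m) for i in range(m + 1)]
    inner = trapezoid([y * e for y, e in zip(normalized_path, contour)], step)
    norm_sq = trapezoid([e * e for e in contour], step)
    return inner / (2.0 * norm_sq)


# Toy check: a path exactly equal to 2 * c * E should give back c.
c = 0.7
toy_path = [2.0 * c * expected_contour(i / 1000) for i in range(1001)]
estimate = lambda_hat(toy_path)
print(estimate)
```

On a real tree, the input would be the Harris path rescaled by √n; the toy path above only checks that the projection recovers the coefficient of 2E.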

Corollary 7.2.1. When n goes to infinity, we have λ̂[τ_n] →^(d) σ⁻¹Λ∞, where the real random variable Λ∞ is defined by

$$\Lambda_\infty = \frac{\langle e, E\rangle}{\|E\|_2^2}.$$

Proof. The result directly follows from Theorem 7.1.2 because the functional x ↦ ⟨x, E⟩ is continuous on C([0,1]). □
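To make the continuous-mapping step explicit, the computation can be sketched as follows. It assumes that Theorem 7.1.2, which is not restated in this section, gives the convergence in distribution of n^{−1/2}H[τ_n](2n·) to 2σ⁻¹e in C([0,1]), in agreement with the average behavior 2σ⁻¹E quoted above:

$$\hat\lambda[\tau_n] = \frac{\langle n^{-1/2} H[\tau_n](2n\,\cdot),\, E\rangle}{2\,\|E\|_2^2} \xrightarrow{(d)} \frac{\langle 2\sigma^{-1} e,\, E\rangle}{2\,\|E\|_2^2} = \sigma^{-1}\,\frac{\langle e, E\rangle}{\|E\|_2^2} = \sigma^{-1}\Lambda_\infty .$$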

Remark 7.2.2. The convergence in distribution stated in Corollary 7.2.1 may seem unsatisfactory: it means that λ̂[τ_n] is not a consistent estimator of σ⁻¹, so the least-squares strategy looks inadequate. Nevertheless, one cannot expect a stronger convergence from the observation of a single stochastic process within a finite window of time. This is why we focus on estimating the parameter of interest σ⁻¹ from a forest of conditioned Galton-Watson trees. This statistical framework is also considered in [9].

Computing λ̂[τ_n] is only a first step in the estimation of the inverse standard deviation from a large number of conditioned Galton-Watson trees. As a consequence, the distribution of the limit variable Λ∞ is of primary importance.

Proposition 7.2.3. The random variable Λ∞ admits a density f_{Λ∞} with respect to the Lebesgue measure. Furthermore,

$$\mathbb{E}[\Lambda_\infty] = 1. \tag{7.5}$$

Proof. The existence of a density was already known [70, 71] for the random variable ∫₀¹ e_s ds. In these papers, the study relies on the analysis of the double Laplace transform

$$\lambda \mapsto \int_0^\infty \exp(-\lambda t)\, \mathbb{E}\left[\exp\left(-t \int_0^1 e_s\, ds\right)\right] dt.$$

Thanks to the Feynman-Kac formula, the authors express this quantity in terms of Airy functions. They then invert the Laplace transform by analytical methods. Unfortunately, their method does not extend to our case. Indeed, in their case, an expression of the double Laplace transform given above is derived from the Feynman-Kac formula for standard Brownian motion, which tells us that the function

$$u(t,x) = \mathbb{E}_x\left[ f(B_t) \exp\left( \int_0^t B_s\, ds \right) \right], \quad \forall (t,x) \in \mathbb{R}_+ \times \mathbb{R},$$

is the solution of the PDE

$$\begin{cases} \partial_t u(t,x) = \tfrac{1}{2}\Delta u(t,x) + x\,u(t,x), & \forall x \in \mathbb{R},\ t \in \mathbb{R}_+, \\ u(0,x) = f(x), & \forall x \in \mathbb{R}. \end{cases}$$

In this case, taking the Laplace transform in time of u leads to an ODE whose solution can be expressed in terms of Airy functions (see [52]). In our case, the PDE becomes inhomogeneous in time, which makes such a transformation useless. As a consequence, one cannot obtain any information by this method.

That is why we propose a new method using Malliavin calculus and the representation of the Brownian excursion as a three-dimensional Bessel bridge (7.3) to show that Λ∞ admits a density.

We consider the probability space (C([0,1],ℝ³), ℱ, W), where C([0,1],ℝ³) is endowed with the topology of uniform convergence, ℱ is the corresponding Borel σ-field and W is the Wiener measure. Let T be the continuous linear operator defined by

$$T : C([0,1],\mathbb{R}^3) \to C([0,1],\mathbb{R}^3), \quad \varphi \mapsto \big( T\varphi(s) = \varphi_s - s\varphi_1 \big).$$

Let also Γ be the following function,

$$\Gamma : \varphi \mapsto \int_0^1 \|\varphi(s)\|\, E_s\, ds,$$

where ‖x‖ denotes the Euclidean norm on ℝ³. With these notations and (7.3), the pushforward measure of W through the map

$$F : \varphi \mapsto \Gamma(T\varphi)$$

is the law of ‖E‖₂²Λ∞. In other words, the random variable F is equal in distribution to ‖E‖₂²Λ∞. Now, for every ϕ in C([0,1],ℝ³) such that Leb({t ∈ ℝ₊ : ϕ(t) = 0}) = 0, where Leb denotes the Lebesgue measure, Γ is Fréchet differentiable at the point ϕ. Indeed, set

$$D_\varphi\Gamma : C([0,1],\mathbb{R}^3) \to \mathbb{R}, \quad h \mapsto \int_0^1 \frac{\langle \varphi(s), h(s) \rangle}{\|\varphi(s)\|}\, E_s\, ds.$$

Then, some straightforward manipulations give

[…]

Now, the Cauchy-Schwarz inequality entails

[…]

is well-defined (because the integrand is bounded by 2) and goes to zero as ‖h‖∞ goes to zero. This proves that D_ϕΓ is the Fréchet derivative of Γ at the point ϕ. Now, since T is linear, F is Fréchet differentiable at every ϕ such that Leb({t ∈ ℝ₊ : ϕ(t) = 0}) = 0, and D_ϕF = D_{Tϕ}Γ ∘ T.

We now show that F belongs to the Malliavin-Sobolev space D^{1,2} (see [75, p. 25-27] for the definition of this space). Let h be an element of L²([0,1],ℝ³). It is easily seen that

[…]

But in the right-hand side of the last inequality, we have, using Jensen's inequality,

[…]

From this, using the results of [75, p. 35], we have that F belongs to the space D^{1,2}.
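The formula for D_ϕΓ can be checked numerically on an example. The sketch below compares a finite-difference quotient of Γ with the candidate derivative, for one arbitrarily chosen path ϕ that never vanishes and one direction h. The closed form E_t = √(8t(1−t)/π) for the expected contour is an assumption not restated in this section, and all function names, the grid size and the test functions are hypothetical.

```python
import math

M = 400  # number of grid steps on [0, 1]
GRID = [i / M for i in range(M + 1)]


def expected_contour(t):
    """Assumed closed form of the expected contour E_t."""
    return math.sqrt(max(0.0, 8.0 * t * (1.0 - t) / math.pi))


def integrate(values):
    """Trapezoidal rule on the uniform grid GRID."""
    return (sum(values) - 0.5 * (values[0] + values[-1])) / M


def norm(v):
    """Euclidean norm on R^3."""
    return math.sqrt(sum(c * c for c in v))


def gamma(path):
    """Discretized Gamma(phi) = int_0^1 ||phi(s)|| E_s ds."""
    return integrate([norm(path(s)) * expected_contour(s) for s in GRID])


def d_gamma(path, direction):
    """Discretized candidate derivative: int_0^1 <phi, h> / ||phi|| E_s ds."""
    return integrate([
        sum(a * b for a, b in zip(path(s), direction(s))) / norm(path(s))
        * expected_contour(s)
        for s in GRID
    ])


def phi(s):
    """Example path in R^3 that stays away from the origin."""
    return (math.cos(s), math.sin(s), 1.0 + s)


def h(s):
    """Example direction of perturbation."""
    return (s, 1.0 - s, s * s)


eps = 1e-6


def shifted(s):
    """The perturbed path phi + eps * h."""
    return tuple(p + eps * d for p, d in zip(phi(s), h(s)))


finite_difference = (gamma(shifted) - gamma(phi)) / eps
print(finite_difference, d_gamma(phi, h))
```

The two printed values should agree up to an error of order eps, since the Euclidean norm is smooth away from the origin. This is only a consistency check on one example, not a substitute for the argument above.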

Before going further, let us recall some facts on the Malliavin derivative. When working with the probability space (C([0,1],ℝ³), ℱ, W), it is known (see Section 1.2.1 in [75]) that there are strong connections between the Malliavin derivative and the Fréchet derivative of a random variable G of D^{1,2} defined from (C([0,1],ℝ³), ℱ, W) to ℝ. Since the Fréchet derivative D_ωG of G at the point ω is a continuous linear form from C([0,1],ℝ³) into ℝ, it can be identified with a triple (µ^ω₁, µ^ω₂, µ^ω₃) of σ-finite measures on ℝ such that

$$D_\omega G\, h = \sum_{i=1}^{3} \int_{[0,1]} h^i_s\, \mu^\omega_i(ds), \quad \forall h \in C([0,1],\mathbb{R}^3).$$

In such a case, the Malliavin derivative of G is the random process belonging to L²([0,1],ℝ³) given by {(µ^ω₁((u,1]), µ^ω₂((u,1]), µ^ω₃((u,1])), u ∈ [0,1]}.

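In our case, since

$$D_\varphi F\, h = \int_0^1 h_s \cdot \left[ \frac{\varphi_s - s\varphi_1}{\|\varphi_s - s\varphi_1\|}\, E_s\, ds - \left( \int_0^1 \frac{v\,(\varphi_v - v\varphi_1)}{\|\varphi_v - v\varphi_1\|}\, E_v\, dv \right) \delta_1(ds) \right],$$

it follows that the Malliavin derivative of F is given by

$$DF = \left( \int_0^1 \frac{(\omega_s - s\omega_1)\, E_s}{\|\omega_s - s\omega_1\|}\, (\mathbf{1}_{s>u} - s)\, ds,\ u \in [0,1] \right) \in L^2([0,1],\mathbb{R}^3).$$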

Now, since DF is W-almost everywhere nonzero (in L²([0,1],ℝ³)), [75, Theorem 2.1.2] gives the existence of a density for the push-forward measure of W by F with respect to the Lebesgue measure. □

It should be noted that the weak limit of λ̂[τ_n] has mean equal to σ⁻¹ by (7.5). Moreover, it can be shown that the random variable Λ∞ is square integrable. Indeed, since the function E is bounded, we have

$$0 \le \Lambda_\infty \le C \int_0^1 e_t\, dt$$

for some positive constant C. Now, it is known that the random variable ∫₀¹ e_t dt admits moments of all orders (see for instance [71]).

The variance of Λ∞ can then be evaluated numerically in order to compare our method with other estimators. To this end, we use Monte Carlo simulations to produce a sample with the same law as Λ∞. This leads to

Var(Λ∞) ≈ 0.0690785.
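A Monte Carlo evaluation of this kind can be sketched as follows. This is not the simulation code used to obtain the value above, only a minimal illustration under stated assumptions: the excursion e is simulated as the Euclidean norm of a three-dimensional Brownian bridge on a uniform grid, in the spirit of the representation (7.3), and the expected contour is taken to be E_t = √(8t(1−t)/π), a closed form not restated in this section. The function names, the grid size, the sample size and the seed are all hypothetical choices.

```python
import math
import random


def expected_contour(t):
    """Assumed closed form of the expected contour E_t."""
    return math.sqrt(max(0.0, 8.0 * t * (1.0 - t) / math.pi))


def sample_lambda_inf(rng, m=100):
    """One approximate draw of Lambda_inf = <e, E> / ||E||_2^2 on m grid steps."""
    dt = 1.0 / m
    sd = math.sqrt(dt)
    # Three-dimensional Brownian motion sampled at times 0, 1/m, ..., 1.
    walk = [(0.0, 0.0, 0.0)]
    for _ in range(m):
        x, y, z = walk[-1]
        walk.append((x + rng.gauss(0.0, sd),
                     y + rng.gauss(0.0, sd),
                     z + rng.gauss(0.0, sd)))
    end = walk[-1]
    inner = 0.0
    norm_sq = 0.0
    # Endpoints are skipped: they contribute nothing because E_0 = E_1 = 0.
    for i in range(1, m):
        t = i / m
        bridge = [w - t * w1 for w, w1 in zip(walk[i], end)]
        excursion = math.sqrt(sum(c * c for c in bridge))
        contour = expected_contour(t)
        inner += excursion * contour * dt
        norm_sq += contour * contour * dt
    return inner / norm_sq


rng = random.Random(0)
draws = [sample_lambda_inf(rng) for _ in range(2000)]
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(mean, variance)
```

With such a coarse grid and a small sample, the empirical mean should only be expected to land near 1, consistent with (7.5), and the empirical variance near the value reported above. A finer grid and a larger sample would be needed to reproduce several significant digits.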

At this point, it is quite interesting to compare our approach to the one developed in [9]. As in the present paper, the authors of [9] construct estimators for the inverse standard deviation of the offspring distribution of a forest of conditioned critical Galton-Watson trees. Their strategy relies on the distance to the root of a uniformly sampled vertex v of the considered tree τ_n ∼ GW_n(µ),

$$\hat\delta[\tau_n] = \frac{h(v)}{\sqrt{n}},$$

where we recall that h(v) is the height of v in the tree. Using Theorem 7.1.2, it has been shown that δ̂[τ_n] converges in law, when the number of nodes n goes to infinity, towards σ⁻¹∆∞, where the random variable ∆∞ follows the Rayleigh distribution with scale parameter 1 [9, Proposition 4], with density

$$\forall x \in \mathbb{R}_+, \quad f_{\Delta_\infty}(x) = x \exp\left( -\tfrac{1}{2} x^2 \right).$$

This was not noticed in [9], but we emphasize that δ̂[τ_n] is biased, because 𝔼[∆∞] = √(π/2) ≠ 1. Nevertheless, one may avoid this issue by considering the rescaled quantity √(2/π) δ̂[τ_n], which converges to σ⁻¹√(2/π)∆∞, whose mean is σ⁻¹. As a consequence, λ̂[τ_n] and √(2/π) δ̂[τ_n] are two quantities that are directly computable from the tree τ_n and may be used to estimate the inverse standard deviation. We propose to compare them through their respective asymptotic dispersions. A first comparison may be done by computing the variances of Λ∞ and √(2/π)∆∞. One has

$$\mathrm{Var}\left( \sqrt{\tfrac{2}{\pi}}\, \Delta_\infty \right) \approx 0.2732395 \quad \text{and} \quad \mathrm{Var}(\Lambda_\infty) \approx 0.0690785.$$
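The first of these two values can be recovered in closed form. As a sketch, using the standard moments of the Rayleigh distribution with scale parameter 1, namely 𝔼[∆∞] = √(π/2) and 𝔼[∆∞²] = 2 (the second moment is not stated in the text and is recalled here as a known fact):

$$\mathrm{Var}\left( \sqrt{\tfrac{2}{\pi}}\, \Delta_\infty \right) = \frac{2}{\pi}\left( \mathbb{E}[\Delta_\infty^2] - \mathbb{E}[\Delta_\infty]^2 \right) = \frac{2}{\pi}\left( 2 - \frac{\pi}{2} \right) = \frac{4}{\pi} - 1 \approx 0.2732395.$$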

This difference in the dispersions is quite apparent in Figure 7.3, where the densities of √(2/π)∆∞ and Λ∞ are displayed. Consequently, one may expect better results in terms of dispersion from our strategy.

From the document Processus de branchements non Markoviens en dynamique et génétique des populations (pages 141-145)