Deviation results for sparse tables in hashing with linear probing

(1)

HAL Id: hal-01097276

https://hal.archives-ouvertes.fr/hal-01097276v3

Preprint submitted on 15 Jan 2021 (v3), last revised 14 Dec 2021 (v4)

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Deviation results for sparse tables in hashing with linear probing

Thierry Klein, A Lagnoux, P Petit

To cite this version:

Thierry Klein, A Lagnoux, P Petit. Deviation results for sparse tables in hashing with linear probing.

2014. �hal-01097276v3�

(2)

Deviation results for sparse tables in hashing with linear probing

1, Thierry Klein¹, Agnès Lagnoux², and Pierre Petit³

1Institut de Mathématiques de Toulouse; UMR5219. Université de Toulouse;

ENAC - Ecole Nationale de l’Aviation Civile , Université de Toulouse, France

CNRS. UT2J, F-31058 Toulouse, France.

CNRS. UT3, F-31062 Toulouse, France.

January 14, 2021

Abstract

We consider the model of hashing with linear probing and we establish the moderate and large deviations for the total displacement in sparse tables. In this context, Weibull- like-tailed random variables appear. Deviations for sums of such heavy-tailed random variables are studied in [20,21]. Here we adapt the proofs therein to deal with conditioned sums of such variables and solve the open question in [8]. By the way, we establish the deviations of the total displacement in full tables, which can be derived from the deviations of empirical processes of i.i.d. random variables established in [28].

Keywords: large deviations, hashing with linear probing, parking problem, Brownian motion, Airy distribution, Łukasiewicz random walk, empirical processes, conditioned sums of i.i.d.

random variables, triangular arrays, Weibull-like distribution.

AMS subject classification: 60F10; 60C05; 60G50; 68W40.

1 Introduction

Hashing with linear probing is a classical model in theoretical computer science that appeared in the 50’s. It has been studied from a mathematical point of view firstly by Knuth in [14].

Here is a simple description given in [6].

A table of length m, T[1..m] is set up, as well as a hash function h that maps keys from some domain to the interval [1..m] of table addresses. A collection of n elements with n ≤ m are entered sequentially into the table according to the following rule: Each element x is placed at the first unoccupied location starting from h(x) in cyclic order, namely the first of h(x), h(x) + 1, . . . , m, 1, 2, . . . , h(x)−1.

For more details on the model, we refer to [6, 9, 18, 3, 2, 11]. The length of the move of each element x is called the displacement of x and the sum of all displacements, denoted by d_m,n, is called the total displacement. In its seminal papers [14, 15], Knuth assumes that

(3)

all the sequence of hash addresses h(x) are independent and uniformly distributed on J1, mK, computes exact expressions ofE[d_m,n] and Var(d_m,n), and provides their asymptotic behaviors.

The limit distribution ofd_m,n remains unknown until 1998: in [6], Flajolet, Poblete, and Viola give the limit distribution of d_m,n for full tables (n = m) and for sparse tables (n/m = µ ∈ (0,1)) using combinatorial arguments. In [4], Chassaing and Marckert recover the previous results in the full case via a probabilistic approach. They prove that d_m,n is the area under the Łukasiewicz random walk (also called Breadth First Search random walk) associated to a Galton-Watson tree with Poisson progeny. Consequently, the limit distribution of the total displacement d_m,m is that of the area under the Brownian excursion, which involves the Airy distribution.

In [9], reformulating the problem in terms of conditioned sums of random variables, Janson establishes the limit distribution of d_m,n in all cases with probabilistic tools. In [10], Janson extends the central limit theorem in the sparse case to a general model of conditioned sums of random variables. The corresponding Berry-Esseen bounds are proved by Klein, Lagnoux, and Petit in [12]. Concerning the deviations of such conditioned models, Gamboa, Klein, and Prieur give an answer in the case of light-tailed random variables (see [8]). Unfortunately, their results cannot be applied to the model of hashing with linear probing since this model involves heavy-tailed random variables.

In this paper, we establish the moderate and large deviations for the total displacement d_m,n in sparse tables. Deviations for heavy-tailed random variables are studied by several authors (e.g., [24, 17, 20, 21, 22]) and a good survey can be found in Mikosch [19]. In the context of hashing with linear probing, Weibull-like-tailed random variables appear. Deviations for sums of such variables are studied by Nagaev in [20, 21]. Here we adapt its proofs to deal with conditioned sums of such variables. We also need to establish the deviations of d_m,n for full tables, which can be derived from the deviations of empirical processes of i.i.d. random variables established by Wu in [28].

The paper is organized as follows. In Section 2, we state the main results for full and sparse tables. The proofs for full tables are given in Section 3 and those for sparse tables can be found in Section 5. In Section 4, we expose Janson’s reformulation of the model and provide several useful estimates required in Section 5.

2 Setting and main results

2.1 Model

An equivalent formulation of the problem of hashing can be made in terms of the discrete version of the classical parking problem described, for instance, by Knuth [16]:

A certain one-way street has m parking spaces in a row numbered 1 to m. A man and his dozing wife drive by, and suddenly, she wakes up and orders him to park immediately. He dutifully parks at the first available space [...].

More precisely, the model describes the following experiment. Let n 6 m. n cars enter sequentially into a circular parking uniformly at random. The parking spaces are numbered clockwise. A car that intends to park at an occupied space moves to the next empty space, always moving clockwise. The length of the move is called the displacement of the car and we are interested in the sum of all displacements which is a random variable denoted bydm,n. When all cars are parked, there are N = m−n empty spaces. These divide the occupied spaces into blocks of consecutive spaces. We consider that the empty space following a block belongs to this block.

For example, assume that n= 8, m = 10, and (6,9,1,9,9,6,2,5) are the addresses where the cars land. This sequence of (hash) addresses is called a hash sequence of lengthm and size n.

(4)

Let d_i be the displacement of car i. Then d₁ = d₂ =d₃ = 0. The car number 4 should park on the 9thspace which is occupied by the 2nd car; thus it moves one space ahead and parks on the 10thspace so that d₄ = 1. The car number 5 should park on the 9thspace. Since the 9th, the 10th, and the 1st spaces are occupied, d₅ = 3. And so on: d₆ = 1, d₇ = 1, d₈ = 0.

Here, the total displacement is equal to d_10,8 = 1 + 3 + 1 + 1 = 6. In our example, there are two blocks: the first one containing spaces 9, 10, 1, 2, 3 (occupied), and space 4 (empty), and the second one containing spaces 5, 6, 7 (occupied), and space 8 (empty).

In this paper, we are interested in the deviations of the total displacement d_m,n in sparse tables. To do so, we need the large deviation behavior of d_m,n in full tables. By the way, we also established the moderate deviation behavior of d_m,n in full tables.

2.2 Deviations in full tables

In this section, we first recall some already existing results for the total displacement d_m,m in full tables (n =m). As mentioned in the introduction, Knuth in [15] and Flajolet et al. in [6, Theorem 2]) derive the asymptotic behavior of the expectation and the variance of dm,m:

E[d_m,m] ∼

m→∞

√2π

4 m^3/2 and Var(d_m,m) ∼

m→∞

10−3π

24 m³. (1)

The following result was first established in [6, Theorem 3].

Theorem 1 (Standard deviations). For full tables, the distribution of the total displacement d_m,m/m^3/2 is asymptotically distributed as the area Aunder the standard Brownian excursion, in the sense that, for all δ >0,

P(d_m,m>m^3/2δ)−−−→

m→∞ P(A>δ).

In this paper, we establish the probabilities of deviation for the total displacement d_m,m. Theorem 2 (Moderate deviations). For all α ∈(3/2,2)and for all δ >0,

1

m^2α−3 logP(d_m,m>m^αδ)−−−→

m→∞ −6δ². Theorem 3 (Large deviations). For all δ >0,

−1

mlogP(d_m,m>m²δ)−−−→

m→∞ J(δ) :=







(¹₂ −δ)·λ(δ) + log(1−(¹₂ +δ)·λ(δ)) if δ <1/2

∞ if δ>1/2,

where λ(δ) is the smallest solution of the equation in λ

λ·

δ+ 1 2

−1

(1−e^λ) =λ. (2)

Observe that lower deviations are trivial: for allα∈(3/2,2] and all mlarge enough,P(d_m,m− E[d_m,m]6 −m^αδ) = 0 because of the positiveness of d_m,m and using (1). When dealing with the very large deviations, the same trivial behavior occurs both for upper and lower deviations:

for all α >2 and all m large enough, P(d_m,m−E[d_m,m]>m^αδ) = 0 and P(d_m,m−E[d_m,m]6

−m^αδ) = 0, since d_m,m 6m(m−1)/2.

Remark 4. Ifδ = 0, λ(0) = 0 is the unique solution of Equation (2). If δ∈(0,1/2), Equation (2) has two solutions: λ(δ) < 0 and 0. Moreover, J(δ) ∼ 6δ² as δ → 0 (we recover the rate function of the moderate deviations in Theorem 2) and J(δ)→+∞as δ→1/2.

(5)

Remark 5. The conclusions of Theorems 1, 2, and 3 are still valid replacing δ by δ+o(1), since the rate functions are continuous. In particular, one may replace d_m,m byd_m,n as soon as m−n m^α−1 (for instance, in the almost full case where n =m−1). Indeed, naturally coupling d_m,n and d_m,m by adding m−n balls, one has

|d_m,n−d_m,m|6(m−1) + (m−2) +· · ·+n∼m(m−n)m^α, whence

P(d_m,n >m^αδ) =P(d_m,m >m^α(δ+o(1))).

Moreover, using a new probabilistic approach developed in [4], Theorem 1 was extended in [9, Theorems 1.1 and 2.2] to the case (m−n)/√

m→a∈[0,∞): for all δ>0, P(d_m,n >m^3/2δ)−−−→

m→∞ P(W_a >δ), with

W_a=

Z 1 0

maxs6t (b(t)−b(s)−a(t−s))dt where b is a Brownian bridge b on [0,1] periodically extended to R.

2.3 Deviations in sparse tables

In this section, we consider asymptotics in (m, n) with m→ ∞ andn/m→µ∈(0,1) (sparse case). This definition of the sparse case is a slight extension of that of [6] (n/m=µ∈(0,1)).

Set N =m−n. By [6, Theorem 5], E[d_m,n] ∼

m→∞

µ²

2(1−µ)²N and Var(d_m,n) ∼

m→∞σ²(µ)N, (3)

where (cf. [6, Theorem 5])

σ²(µ) := 6µ²−6µ³+ 4µ⁴−µ⁵

12(1−µ)⁵ . (4)

The following result was first proved in [6, Theorem 6] while another probabilistic proof was given in [9].

Theorem 6 (Standard deviations). The distribution of the total displacement d_m,n is asymp- totically Gaussian distributed, in the sense that, for all y,

P

d_m,n−E[d_m,n]6N^1/2y−−−→

m→∞ P(Z 6y) where Z ∼ N(0, σ²(µ)).

In this paper, we establish the probabilities of deviation of the total displacement d_m,n in the sparse case.

Theorem 7 (Lower moderate deviations). For all α∈(1/2,1) and for all y>0, 1

N^2α−1 logP(d_m,n−E[d_m,n]6−N^αy)−−−→

m→∞ − y²

2σ²(µ). (5)

Theorem 8 (Lower large deviations). For all y>0, 1

N logP(d_m,n−E[d_m,n]6−N y)−−−→

m→∞ −Λ^∗

1

1−µ, µ²

2(1−µ)² −y

, (6)

where Λ^∗ is the Fenchel-Legendre transform of the function Λ : R² →(−∞,∞] defined by Λ(s, t) = log

∞

X

l=1

e^(s−µ)l(µl)^l−1

l! E[e^td^l,l−1].

(6)

For all α > 1, we have, asymptotically, P(d_m,n −E[d_m,n] 6 −N^αy) = 0, since d_m,n > 0 and E[d_m,n] is asymptotically linear inN.

Theorem 9 (Upper deviations).

(i) For all α∈(1/2,2/3) and for all y>0, 1

N^2α−1 logP(d_m,n−E[d_m,n]>N^αy)−−−→

m→∞ − y²

2σ²(µ). (7)

(ii) For all y>0,

1

N^1/3 logP(d_m,n−E[d_m,n]>N^2/3y)−−−→

m→∞ −I(y) (8) with

I(y) :=







y²

2σ²(µ) if y6y(µ),

q(µ)(1−t(y))^1/2y^1/2+^t(y)_2σ2²(µ)^y² if y > y(µ), where

q(µ) := inf

0<δ<1/2

√1

δ(κ(µ) +J(δ)),

κ(µ) :=µ−log(µ)−1∈(0,∞), J has been defined in Theorem 3, y(µ) := 3 (q(µ)σ²(µ))^2/3/2, and t(y) is defined for y > y(µ) as the smallest root of the cubic equation in t ∈[0,1]

t³−t²+q²(µ)σ⁴(µ) 4y³ = 0.

(iii) For all α ∈(2/3,2)and for all y>0, 1

N^α/2 logP(d_m,n−E[d_m,n]>N^αy)−−−→

m→∞ −q(µ)y^1/2. (9) (iv) For all y>0,

1

N log P(dm,n−E[dm,n]>N²y)

−−−→m→∞







−inf

δ>0

q_y

δ(κ(µ) +J(δ)) + Λ^∗₀

1

1−µ −^q^y_δ

if y < µ²/(2(1−µ)²)

−∞ if y>µ²/(2(1−µ)²),

(10) where Λ^∗₀ is the Fenchel-Legendre transform of the function Λ₀: R → (−∞,∞] defined by Λ₀(s) := Λ(s,0) and Λ has been defined in Theorem 8.

For allα >2, we have, asymptotically, P(d_m,n−E[d_m,n]>N^αy) = 0, since dm,n 6n(n−1)/2 and N is asymptotically linear in n.

Remark 10. Observe that, for α= 2/3 and for all y >0, I(y) = inf

t∈[0,1]f(t), where

f(t) =

q(µ)(1−t)^1/2y^1/2+ t²y² 2σ²(µ)

.

If y 6 y1(µ) = 3 (q(µ)σ²(µ))^2/3/2^4/3, then f is decreasing and its minimum y²/(2σ²(µ)) is attained at t= 1. Ify > y₁(µ),f has two local minima, at 1 and att(y), corresponding to the smallest of the two roots in [0,1] off⁰(t) = 0 which is equivalent tot³−t²+q²(µ)σ⁴(µ)/(4y³) =

(7)

0. If y₁(µ) < y 6 y(µ), the minimum is attained at 1, and at t(y) otherwise. Let c = q²(µ)σ⁴(µ)/(4y³). One can prove thatt(y) = 2Re(z) + 1/3 where z is the only complex cube root of

1 27− c

2+i

s c 27− c²

4 having argument in (π/3,2π/3).

Remark 11 (Typical events of deviation in Theorem 9).

(i) All the displacements within the blocks are small but their sum has a Gaussian contribution.

(iii) One block has a large size δ^−1/2N^α/2y^1/2 and the displacement within this block is N^αy, δ being chosen by the optimization in q(µ) (two competing terms).

(ii) One block has a large size and the displacement within this block is large (N_n^α(1−t(y))y) and the sum of the other displacements has a Gaussian contribution (two extra competing terms).

(iv) One block has a large size δ^−1/2N y^1/2 and the displacement within this block is N²y which forces the sum of the length of the other blocks to be abnormally small, that is m− δ^−1/2N y^1/2 ∼N((1−µ)⁻¹−δ^−1/2y^1/2) (three competing terms).

3 Proofs: full tables

Here, we take n=m. All limits are considered as m→ ∞ unless stated otherwise.

3.1 Upper moderate deviations - Theorem 2

For allm>1, let (U_m,i)_16i6mbe a sequence of i.i.d. random variables with uniform distribution U on [0,1] and let, for all i ∈ J1, mK, Vm,i = dmUm,ie. Note that (Vm,i)_16i6m is a sequence of i.i.d. random variables uniformly distributed on J1, mK and, for all i = 1, . . . , m, V_m,i corresponds to the hash address of item i. Let us define

S_m(k) :=

m

X

i=1

1Vm,i6k=L_m([0, k/m]),

where L_m is the empirical measure associated to the random sequence (U_m,i)_16i6m.

As in [9, Lemma 2.1], we extend the definition of Sm(k) for k ∈ Z so that the sequence (S_m(k)−k)k∈Z be m-periodic, whence

minl<k{S_m(l)−l}= min

16l6m{S_m(l)−l}.

Thus, by [9, Equation (2.1) and Lemma 2.1], the total displacementd_m,m is given by d_m,m=

m

X

k=1

S_m(k)−k− min

16l6m{S_m(l)−l}

=m

m

X

k=1

(L_m− U)([0, k/m])−m² min

16l6m{(L_m− U)([0, l/m])}

=m²

Z ₁

0

(L_m− U)([0, y])dy− inf

06y61, y∈Q

{(L_m− U)([0, y])}+A_m m

where A_m ∈[−3/2,−1/2]. Let α∈(3/2,2). Then, P(dm,m>m^αδ) =P

√ m m^α−3/2

Z ₁

0

(Lm− U)([0, y])dy− inf

06y61, y∈Q

{(Lm− U)([0, y])}+Am

m

>δ

=P

ϕ

√ m

m^α−3/2(L_m− U)

>δ_m

,

(8)

where δ_m = δ−A_m/m^−(α−1) → δ and, for any measure ν ∈ M([0,1]) (the space of signed measures on [0,1]),

ϕ(ν) :=

Z ₁

0

ν([0, y])dy− inf

06y61, y∈Q

ν([0, y]). (11)

Since

ϕ(ν) =

Z 1 0

(1−x)dν(x)− inf

06y61, y∈Q

Z 1

0 1[0,y](x)dν(x)

by Fubini’s theorem,ϕis a measurable function whenM([0,1]) is equipped with the σ-algebra generated by the applications ν 7→ν(f) for all bounded measurable function f on [0,1]. By [28, Equation (1.3)], we have

1

m^2α−3logP

ϕ

√ m

m^α−3/2(L_m− U)

>δ

→ −inf

(1 2

Z 1 0

dν dy(y)

2

dy; ν ∈ M([0,1]), ν U, ϕ(ν)>δ, ν([0,1]) = 0

)

=−inf

1 2

Z 1 0

G⁰(y)²dy ; G∈AC([0,1]),

Z 1 0

G(y)dy−minG>δ, G(0) =G(1) = 0

, (12) where AC([0,1]) is the space of absolutely continuous functions on [0,1]. We have used the fact that inf{G(y) ; 0 6y61, y ∈Q} = minG by the continuity of G on [0,1]. If G is a minimizer in (12) and if ˜y is a minimizer of G, then ˜G(y) = G(˜y +y)−G(˜y) is also a a minimizer in (12). The addition ˜y+y has to be understood as the addition on the torus T. Moreover, min ˜G= 0 and ^R₀¹G˜⁰(y)²dy=^R₀¹G⁰(y)²dy. So

inf

1 2

Z 1 0

G⁰(y)²dy ; G∈AC([0,1]),

Z 1 0

G(y)dy−minG>δ, G(0) =G(1) = 0

= inf

1 2

Z 1 0

G⁰(y)²dy; G∈AC([0,1]),

Z 1 0

G(y)dy>δ, G(0) =G(1) = 0

. (13)

Using the method of Lagrange multipliers, if G is a minimizer of

1 2

Z 1 0

G⁰(y)²dy; G∈AC([0,1]),

Z 1 0

G(y)dy =δ

, (14)

then there exists λ∈R such that 1 2

Z 1 0

G⁰(y)²dy+λ

Z 1 0

G(y)dy = 0.

Differentiating with respect to G, we get

∀h∈ C_c^∞((0,1))

Z 1 0

G⁰(y)h⁰(y) +λh(y)dy= 0.

Integrating by parts, one has

∀h∈ C_c^∞((0,1))

Z 1 0

G⁰(y)−λyh⁰(y)dy= 0.

By Du Bois-Reymond’s lemma in [5, p.184], the function y7→G⁰(y)−λy is constant, so G is a quadratic polynomial. Getting back to (14), the minimizer is given by

G(y) = 6δy(1−y)

(9)

and the infimum is 6δ². Moreover, we check that G(0) = G(1) = 0 as required in (13).

Consequently,

1

m^2α−3 logP ϕ

√m

m^α−3/2 (L_m− U)

!

>δ

!

→ −6δ². Using the fact that, for Φ = ϕ((√

m/m^α−3/2) (L_m− U)),

P(Φ>δ+ε)6P(Φ>δ_m)6P(Φ>δ−ε) (15) for all ε >0 and all m large enough, we conclude to the result in Theorem 2 by continuity of the function δ 7→ −6δ².

3.2 Upper large deviations - Theorem 3

The result for δ = 0 is trivial. Suppose thatδ >0. We have P

d_m,m >m²δ=P

Z ₁

0

(L_m− U)([0, y])dy− inf

06y61, y∈Q

{(L_m− U)([0, y])}+ A_m m >δ

=P

ϕ(L_m− U)>δ− A_m m

,

where ϕis the measurable function defined in (11). By Sanov’s theorem (see, e.g., [28, Equa- tion (1.1)]), we have

1

mlogP(ϕ(L_m− U)>δ)

→ −inf

(Z 1 0

dν dy(y)

log

dν dy(y)

dy; ν ∈ M⁺₁([0,1]), ν U, ϕ(ν− U)>δ

)

=−inf

(Z 1 0

F⁰(y) logF⁰(y)dy ; F ∈AC([0,1]), F⁰ >0,

Z 1 0

(F −id)(y)dy−min(F −id)>δ, F(0) = 0, F(1) = 1

)

, (16)

where M⁺₁([0,1]) is the space of probability measures on [0,1]. If F is a minimizer in (16) and if ˜y is a minimizer of F −id, then ˜F(y) =F(˜y+y)−F(˜y) is also a a minimizer in (16).

Moreover, min( ˜F −id) = 0 and ^R₀¹F˜⁰(y) log ˜F⁰(y)dy =^R₀¹F⁰(y) logF⁰(y)dy. So

−inf

(Z 1 0

F⁰(y) logF⁰(y)dy ; F ∈AC([0,1]), F⁰ >0,

Z 1 0

(F −id)(y)dy−min(F −id)>δ, F(0) = 0, F(1) = 1

)

=−inf

(Z 1 0

F⁰(y) logF⁰(y)dy ; F ∈AC([0,1]), F⁰ >0,

Z 1 0

F(y)dy>δ+1 2, F(0) = 0, F(1) = 1

)

=−infK,

where K: AC([0,1])→[0,∞] is the convex function defined by K(F) =





 R1

0 F⁰(y) log(F⁰(y))dy if F⁰ >0,^R₀¹F(y)dy>δ+¹₂, F(0) = 0,F(1) = 1

∞ otherwise.

(10)

It is a standard convex optimization problem, a minimizer of which is F¯(y) =a(1−e^λy), where







a(1−e^λ) = 1 a=δ+ ¹₂ − ¹_λ.

(One can see that a > 1 and λ 6 0.) Indeed, by the definition of a convex function and the subdifferential, it suffices to check that 0 belongs to the subdifferential of K at ¯F. For all h∈AC([0,1]) and for all t >0,

K( ¯F +th) =





 R1

0( ¯F⁰+th⁰) log( ¯F⁰+th⁰) if F⁰+th⁰ >0, ^R₀¹h>0,h(0) =h(1) = 0

∞ otherwise.

Differentiating under the integral sign with respect to t and integrating by parts gives K⁰( ¯F;h) =

Z 1 0

h⁰(y)(log ¯F⁰(y) + 1)dy=−λ

Z 1 0

h(y)dy>0

since h(0) =h(1) = 0, ^R₀¹h(y)dy> 0, and λ <0. It remains to compute the value of K at ¯F and to apply the same argument of continuity as in (15) to get the required result.

4 Interlude

4.1 Janson’s reformulation

Here, we consider (m, n) withm → ∞andn/m→µ∈(0,1). As a consequence,N =m−n →

∞. To make notation clearer, we make quantities depend on N and all limits are considered as N → ∞unless stated otherwise. In the next section, we are interested in the deviations of d_m_N_,n_N in that regime, which is called the sparse case (see [6, 9] for this denomination with slight variants). In this section, we introduce a reformulation of the model of hashing with linear probing due to Janson in [9] and prove some preliminary results.

For all N > 1, we consider a vector of random variables (X_N, Y_N) defined as follows. We assume that X_N is distributed according to the Borel distribution with parameter µ_N :=

n_N/m_N ∈(0,1), i.e.

∀l ∈J1,∞J P(X_N =l) =e^−µ^N^l(µ_Nl)^l−1

l! (17)

(see, e.g., [6] or [9] for more details). In some places, for the ease of computation, we may also use the parametrizationλ_N =e^−µ^Nµ_N to get an equivalent definition of the Borel distribution:

P(X_N =l) = 1 T(λ_N)

l^l−1λ^l_N

l! , (18)

where T is the tree function (see, e.g., [7, p. 127]). Furthermore, we assume that Y_N given {X_N =l} is distributed as dl,l−1.

Let (X_N,i, Y_N,i)_16i6N be an i.i.d. sample distributed as (X_N, Y_N) and define, for allk ∈J1, NK, S_N,k :=

k

X

i=1

X_N,i and T_N,k :=

k

X

i=1

Y_N,i.

To lighten notation, let S_N := S_N,N and T_N := T_N,N. Notice that, for all N > 1, E[X_N] = (1−µ_N)⁻¹ =m_N/N, so E[S_N] =m_N. Moreover,P(S_N =m_N)>0 and we have the following identity (see [9, Lemma 4.1]):

L(d_m_N_,n_N) = L(T_N|S_N =m_N).