Sudoku Latin Square Sampling for Markov Chain Simulation


Rami El Haddad, Joseph El Maalouf, Christian Lécot, and Pierre L'Ecuyer

Abstract We are interested in Monte Carlo simulations of discrete-time Markov chains on discrete and totally ordered state spaces. To improve simulation efficiency, we use a technique previously introduced in the context of quasi-Monte Carlo simulation of an array of $N$ Markov chains. This method simulates the $N$ copies of the chain simultaneously, reorders the chains at each step by increasing order of their states, and samples the next state by using $N$ two-dimensional points in the unit square. The first coordinate of each point is used to match a chain, and the second coordinate is used to sample the next state by inversion from its cumulative distribution function conditional on the current state. We study the case where the $N$ points are obtained at each step from Sudoku Latin square sampling, which means that (1) if the unit square is uniformly divided into $N$ identical subsquares, exactly one point lies in each subsquare, (2) for each axis, the $N$ projections of the points are distributed with exactly one projection in each of the $N$ subintervals of length $1/N$ that partition the unit interval, and (3) in both cases, each individual point has the uniform distribution in the subsquare and interval to which it belongs. We prove that the variance of the Sudoku Latin square sampling estimator is $O(N^{-3/2})$. The same convergence rate is obtained when property (2) is removed, which gives simple stratified sampling. However, in our numerical experiments, we observe empirically a much smaller variance and better efficiency with Sudoku Latin square sampling than with simple stratified sampling alone.

Rami El Haddad · Joseph El Maalouf

Laboratoire de Mathématiques et Applications, U.R. Mathématiques et modélisation, Faculté des sciences, Université Saint-Joseph, B.P. 7-5208, Mar Mikhaël Beyrouth 1104 2020, Liban

Christian Lécot

Université Grenoble Alpes, Université Savoie Mont Blanc, CNRS, LAMA, 73000 Chambéry, France

Pierre L'Ecuyer

DIRO, Université de Montréal, C.P. 6128, Succ. Centre-Ville, Montréal, H3C 3J7, Canada


1 Introduction

We consider a discrete-time Markov chain $\{X_n, n \ge 0\}$ over a countable state space $\mathcal{X}$, where $X_n \in \mathcal{X}$ is the state at step $n$, and we are interested in estimating by simulation the expected cost at step $n$, $\mathbb{E}[c(X_n)]$, for one or several cost functions $c : \mathcal{X} \to [0, \infty)$ (we assume non-negative cost functions for simplicity). If the state space is finite with small cardinality, and sometimes when the chain has a very special structure, it is possible to compute the exact distribution of $X_n$ and the exact expected cost at step $n$, for any $n$. Otherwise, one can use standard Monte Carlo (MC): simulate the chain until step $n$, repeat $N$ times independently, and average the $N$ realizations of $c(X_n)$. The main drawback of this general approach is its slow convergence: the variance of the Monte Carlo estimator of $\mathbb{E}[c(X_n)]$ typically converges as $O(N^{-1})$ for any $n$.

A (deterministic) quasi-Monte Carlo (QMC) method for Markov chains has been proposed in [9] for the case where the chain has a totally ordered state space. The method simulates an array of $N$ copies of the chain in parallel. At each step $n$, it reorders the chains by increasing order of their states, and it uses two-dimensional quasi-random points to move them ahead by one step. Convergence (in the deterministic sense) of the average cost to the expectation when $N \to \infty$ was established, and the QMC approach outperformed plain MC in numerical experiments. However, QMC error bounds are typically too loose and inconvenient for practical error assessment. A randomized quasi-Monte Carlo (RQMC) approach named Array-RQMC, which resembles the previous QMC scheme, was proposed and analyzed in [11, 12], in the setting of a Markov chain model with general state space. The method was shown to provide an unbiased estimator of $\mathbb{E}[c(X_n)]$ for any $n$, and variance bounds for this estimator were proved under certain conditions. In particular, it was proved that for a Markov chain with a one-dimensional state space, if stratified sampling as in [2, 8] is used at each step to advance the array of chains by one step, and under some technical conditions, the variance converges as $O(N^{-3/2})$, which beats Monte Carlo. In numerical experiments with Markov chains having one-dimensional and higher-dimensional states, the empirical variance was typically much smaller than the Monte Carlo variance, and was often observed to decrease at better rates than $O(N^{-3/2})$, sometimes even faster than $O(N^{-2})$: see [12, 13, 14], for example.

However, no proof of these faster rates is available so far, and the $O(N^{-3/2})$ rate has been proved only for ordinary stratification of the unit square into identical subsquares, for a one-dimensional state. A related convergence-rate result worth mentioning was obtained in [7], in the context of particle filters. The authors proved that if the RQMC point set used at each step is a $(t, m, s)$-net with a nested uniform scramble [18], and if the states are sorted using a Hilbert curve when their dimension is larger than 1, then the variance of the Array-RQMC estimator converges as $o(N^{-1})$, which is faster than Monte Carlo.

The aim of this paper is to increase our theoretical understanding of the method by expanding the class of sampling methods for which an $O(N^{-3/2})$ convergence rate is proved. We revisit the simple stratified sampling (SSS) setting and we consider a Sudoku Latin square sampling (SLSS) setting, which combines two-dimensional stratified sampling with Latin hypercube sampling [15, 20]. Our theoretical results are complemented by three numerical experiments in which we observe a significantly lower variance with SLSS than with simple stratification.

SLSS turns out to be a special case of the U-sampling method of [21] for sampling $N$ points in the unit hypercube. U-sampling first generates a random orthogonal array-based Latin hypercube design, which is a selection of $N$ small cubic boxes of side length $1/N$ that form an orthogonal array of strength $t$ [16, 17] and a Latin hypercube at the same time. Then it samples one point uniformly inside each selected small box, independently across the boxes. For the special case where $t = 2$, this type of design (the selection of the boxes) gives a Sudoku Latin square [19] for each two-dimensional projection of the points. Thus, each two-dimensional projection satisfies the properties (1) to (3) mentioned in the abstract. An example of a Sudoku Latin square is given in Fig. 1. Since our SLSS is in two dimensions, there is a single two-dimensional projection and it must form a Sudoku Latin square. Sudoku Latin squares are also studied in [22], although these authors only consider discrete designs and space-filling constructions, not the sampling of random points uniformly in the unit hypercube.

Fig. 1 Example of a Sudoku Latin square with 16 points.

A different sampling method that generalizes SLSS to more than two dimensions was studied in [5]. In $d$ dimensions, that method generates $N = p^d$ points in such a way that (1) there is always one point per subcube when we partition the $d$-dimensional unit cube into $N$ identical subcubes, (2) there is one value in each subinterval when we project all the points onto a single coordinate to obtain $N$ values in the unit interval and partition the unit interval into $N$ subintervals of length $1/N$, and (3) each point taken individually has the uniform distribution in the subcube to which it belongs. Variance bounds have been obtained when the integral of a function over the unit cube is estimated by the average of the function values at the $N$ points, under certain assumptions on the integrand. Different types of variance results, in terms of the ANOVA decomposition of the integrand, were proved in [21] for the same type of integration problem with U-sampling. SLSS is the two-dimensional special case of each of these two methods.


Our paper is the first to study the use of SLSS in the context of simulating an array of Markov chains. SLSS imposes more structure on the point set than simple stratified sampling of the unit square. Our aim is to investigate whether, and by how much, this additional structure reduces the variance of expected-cost estimators.

The remainder is organized as follows. In Sect. 2, we define our setting for Monte Carlo simulation of discrete-state Markov chains and explain how we proceed with plain (standard) Monte Carlo, with SSS, and with SLSS. With SSS, the way we map the points to chain states at each step follows [6, 9] and differs from what was done in [12]. In Sect. 3, we analyze the variance of these schemes. We prove that the variance of the simulation estimator of an expected state-dependent cost at any given step $n$ is $O(N^{-3/2})$ for both SSS and SLSS. This beats the known rate of $O(N^{-1})$ for standard Monte Carlo. Results of computational experiments and comparisons between standard Monte Carlo, SSS, and SLSS are given in Sect. 4. The empirical convergence rates of the variance for SSS and SLSS are close to those established in Sect. 3, but the variance with SLSS is significantly smaller than with SSS. In Sect. 5, we give the technical proofs of some results, and we conclude in Sect. 6.

2 Monte Carlo Simulations of Markov Chains

Let $\{X_n, n \ge 0\}$ be a time-homogeneous discrete-time Markov chain over a countable and totally ordered state space $\mathcal{X}$. Without loss of generality, one can assume that $\mathcal{X} = \mathbb{N}$ or $\mathbb{Z}$. Let $P(i,j) = \mathbb{P}(X_{n+1} = j \mid X_n = i)$ denote the transition probabilities and $\mathbf{P} = (P(i,j) : (i,j) \in \mathcal{X}^2)$ the transition probability matrix. We denote by $\mu_n(i) = \mathbb{P}(X_n = i)$ the state probabilities at step $n$ and by $\mu_n = (\mu_n(i) : i \in \mathcal{X})$ the probability vector for step $n$. We assume that the initial probability vector $\mu_0$ is given (often, it is degenerate over a single state).

For $i, j \in \mathcal{X}$ we set
$$q_j(i) := \sum_{h \le j} P(i,h). \tag{1}$$

We define the conditional cumulative distribution function $F_i(j) := \mathbb{P}(X_{n+1} \le j \mid X_n = i) = q_j(i)$. If $I$ denotes the unit interval $(0,1]$, we have the disjoint union $I = \bigcup_{j \in \mathcal{X}} I_{i,j}$, where $I_{i,j} := (q_{j-1}(i), q_j(i)]$. Hence, for any $i \in \mathcal{X}$ and $u \in I$, there exists a unique $j \in \mathcal{X}$ such that $u \in I_{i,j}$; we denote it by $F_i^{-1}(u)$. If $X_n = i$, then $F_i^{-1}(u)$ is the next state.
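The inversion rule $F_i^{-1}(u)$ can be sketched in Python as follows. This is a minimal illustration, assuming a finite state space $\{0, 1, \ldots, m-1\}$ and a transition row given as a plain list; the function name `inverse_cdf` is a hypothetical choice, not part of the method's description. A binary search on the cumulative sums $q_j(i)$ returns the unique $j$ with $q_{j-1}(i) < u \le q_j(i)$, and states with $P(i,j) = 0$ (empty intervals $I_{i,j}$) are never returned.

```python
import bisect
from itertools import accumulate


def inverse_cdf(row, u):
    """Return F_i^{-1}(u), the unique j with q_{j-1}(i) < u <= q_j(i).

    `row` is a hypothetical finite transition row [P(i, 0), P(i, 1), ...]
    over the states 0, 1, ..., len(row) - 1, and u is assumed to lie in (0, 1].
    """
    q = list(accumulate(row))        # q[j] = q_j(i) = sum_{h <= j} P(i, h)
    j = bisect.bisect_left(q, u)     # first index with q[j] >= u
    return min(j, len(row) - 1)      # guard: q[-1] may be slightly below 1 after round-off


# Example row: P(i, 0) = 0.2, P(i, 1) = 0.5, P(i, 2) = 0.3
row = [0.2, 0.5, 0.3]
next_states = [inverse_cdf(row, u) for u in (0.1, 0.5, 0.95)]   # [0, 1, 2]
```

`bisect_left` is used (rather than `bisect_right`) because the intervals $I_{i,j}$ are closed on the right, so $u = q_j(i)$ must map to $j$.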

Let $\delta_i$ be the Dirac measure at $i$, defined by $\delta_i(j) = 1$ if $j = i$ and $\delta_i(j) = 0$ otherwise.

For any integer $n \ge 0$, the distribution $\mu_n$ is approximated by
$$\hat\mu_n := \frac{1}{N}\sum_{k=1}^{N} \delta_{i_k^n},$$
where $N$ is a fixed integer and $i_1^n, \ldots, i_N^n$ are calculated iteratively.

First, a set $(i_k^0 : 1 \le k \le N)$ of $N$ states is sampled from $\mu_0$; several techniques are proposed in [4]. In many applications, the initial state of the chain is fixed, and then $\hat\mu_0 = \mu_0$.

We describe the transition from step n to n + 1 for three Monte Carlo methods.

We introduce $\tilde\mu_{n+1} := \hat\mu_n \mathbf{P}$ as an intermediate distribution (which is not used in actual computations). This $\tilde\mu_{n+1}$ is an approximation of $\mu_{n+1}$, but unlike $\hat\mu_n$ it is generally not an equally weighted sum of Dirac measures, so an additional step is needed. We formulate this step as a quadrature problem: the MC methods correspond to quadrature algorithms, possibly combined with variance reduction techniques. To that end, let us consider an arbitrary sequence $s = (s(i) : i \in \mathcal{X})$ (a column vector); we assume that $s$ is non-negative, just to avoid worrying about convergence of series. Then
$$\tilde\mu_{n+1}s = \hat\mu_n \mathbf{P} s = \frac{1}{N}\sum_{k=1}^{N}\sum_{j \in \mathcal{X}} P(i_k^n, j)\, s(j).$$

Let $\mathbf{1}_k$ be the indicator function of the interval $I_k := ((k-1)/N, k/N]$ and let $\mathbf{1}_{i,j}$ denote the indicator function of the interval $I_{i,j}$. If we associate to $s$ the function $C_s^n$ defined by
$$C_s^n(u) := \sum_{k=1}^{N}\sum_{j \in \mathcal{X}} \mathbf{1}_k(u_1)\,\mathbf{1}_{i_k^n, j}(u_2)\, s(j), \qquad u = (u_1, u_2) \in I^2, \tag{2}$$
then we have
$$\tilde\mu_{n+1}s = \int_{I^2} C_s^n(u)\, du.$$

We obtain $\hat\mu_{n+1}$ by approximating this integral with a Monte Carlo estimator. In the following, if $m$ is an integer, we denote $[1, m] := \{1, 2, \ldots, m\}$. The notation $U \sim \mathcal{U}(E)$ means that $U$ is a random variable uniformly distributed over the set $E$.
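To make the quadrature formulation concrete, here is a minimal Python sketch, assuming a finite state space $\{0, \ldots, m-1\}$ and using hypothetical helper names (`make_C`, `tilde_mu_s`, `midpoint_integral`). Since exactly one indicator $\mathbf{1}_k(u_1)$ is nonzero, namely for $k = \lceil N u_1 \rceil$, and exactly one $\mathbf{1}_{i_k^n, j}(u_2)$ is nonzero, the double sum in (2) collapses to $C_s^n(u) = s(F^{-1}_{i_k^n}(u_2))$. The sketch compares a midpoint-rule integral of $C_s^n$ over $I^2$ with the exact value of $\tilde\mu_{n+1}s$ on a toy three-state chain.

```python
import bisect
import math
from itertools import accumulate


def make_C(states, P, s):
    """Return the function (u1, u2) -> C_s^n(u1, u2) of Eq. (2).

    Hypothetical setting: states = [i_1^n, ..., i_N^n], P[i] is the transition
    row of state i over 0..m-1, and s is a list indexed by state.
    """
    N = len(states)
    cdf = {i: list(accumulate(P[i])) for i in set(states)}

    def C(u1, u2):
        k = min(max(1, math.ceil(N * u1)), N)   # index of the interval I_k containing u1
        q = cdf[states[k - 1]]
        j = min(bisect.bisect_left(q, u2), len(q) - 1)   # j = F_{i_k}^{-1}(u2)
        return s[j]

    return C


def tilde_mu_s(states, P, s):
    """Exact value of tilde mu_{n+1} s = (1/N) sum_k sum_j P(i_k^n, j) s(j)."""
    return sum(sum(pij * s[j] for j, pij in enumerate(P[i])) for i in states) / len(states)


def midpoint_integral(C, m=200):
    """Midpoint-rule approximation of the integral of C over I^2 on an m x m grid."""
    h = 1.0 / m
    return h * h * sum(C((a + 0.5) * h, (b + 0.5) * h) for a in range(m) for b in range(m))


# Hypothetical three-state chain; s(j) = j, so tilde mu_{n+1} s is a mean next state.
P = {0: [0.5, 0.5, 0.0], 1: [0.25, 0.5, 0.25], 2: [0.0, 0.5, 0.5]}
states, s = [0, 1, 1, 2], [0, 1, 2]
exact = tilde_mu_s(states, P, s)
approx = midpoint_integral(make_C(states, P, s))
```

In this toy case every breakpoint of $C_s^n$ (multiples of $1/4$ in both coordinates) falls on the $200 \times 200$ grid, so the midpoint rule reproduces the exact value up to round-off; in general it is only an approximation.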

2.1 Standard Monte Carlo

The transition from step $n$ to step $n+1$ acts as follows: if the state of the chain is $i$, i.e., $X_n = i$, then a random number $U \sim \mathcal{U}(I)$ is generated and the new state of the chain is $F_i^{-1}(U)$, i.e., $X_{n+1} = F_{X_n}^{-1}(U)$. The operation is repeated $N$ times independently, in order to advance $N$ copies of the chain. With our notation, this may be written as follows. Let $\{U_k : 1 \le k \le N\}$ be independent random variables with $U_k \sim \mathcal{U}(I)$; then
$$i_k^{n+1} = F_{i_k^n}^{-1}(U_k), \qquad 1 \le k \le N.$$
That is, if, for any non-negative sequence $s$,
$$\hat X_s^{n+1} := \frac{1}{N}\sum_{k=1}^{N} C_s^n\left(\frac{k}{N}, U_k\right) \tag{3}$$
(the first argument $k/N$ lies in $I_k$, so that the $k$-th chain is the one advanced with $U_k$), then
$$\hat\mu_{n+1}s = \hat X_s^{n+1}. \tag{4}$$
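A minimal Python sketch of the standard MC transition and of the resulting estimator of $\mu_n s$ follows, assuming finite transition rows and a fixed initial state (so $\hat\mu_0 = \mu_0$); the function names are hypothetical. Each chain is advanced independently by inversion with its own uniform $U_k \in (0,1]$.

```python
import bisect
import random
from itertools import accumulate


def mc_step(states, P, rng):
    """One standard-MC transition: i_k^{n+1} = F_{i_k^n}^{-1}(U_k) with U_k i.i.d. U(I).

    P[i] is a hypothetical finite transition row [P(i, 0), P(i, 1), ...].
    """
    new_states = []
    for i in states:
        q = list(accumulate(P[i]))
        u = 1.0 - rng.random()                     # rng.random() is in [0, 1); u is in (0, 1]
        new_states.append(min(bisect.bisect_left(q, u), len(q) - 1))
    return new_states


def mc_estimate(i0, P, s, n, N, seed=0):
    """Estimate mu_n s with N independent copies started from the fixed state i0."""
    rng = random.Random(seed)
    states = [i0] * N
    for _ in range(n):
        states = mc_step(states, P, rng)
    return sum(s[i] for i in states) / N


# Hypothetical symmetric three-state chain started at 1: by symmetry E[X_n] = 1 for all n.
P = {0: [0.5, 0.5, 0.0], 1: [0.25, 0.5, 0.25], 2: [0.0, 0.5, 0.5]}
estimate = mc_estimate(1, P, s=[0, 1, 2], n=5, N=10_000)
```

The chain is chosen so that the exact answer is known ($X_n$ and $2 - X_n$ have the same law), which makes it easy to see the $O(N^{-1/2})$ statistical error of the estimate.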

2.2 Simple Stratified Sampling

We suppose that $N = p^2$ for some integer $p > 0$. The transition from $n$ to $n+1$ has two steps: renumbering of the states and numerical integration.

(S1) The states are relabeled so that $i_1^n \le \cdots \le i_N^n$. This technique was used in the QMC context and ensures theoretical and numerical convergence of the scheme (see [9]).

(S2) Consider a partition of $I^2$ into $N$ squares $I_\ell = H_{\ell_1} \times H_{\ell_2}$ where, for $\ell = (\ell_1, \ell_2) \in [1,p]^2$, $H_{\ell_1} := ((\ell_1 - 1)/p, \ell_1/p]$ and $H_{\ell_2} := ((\ell_2 - 1)/p, \ell_2/p]$. Let $\{V_\ell : \ell \in [1,p]^2\}$ be independent random variables, where $V_\ell = (V_{\ell,1}, V_{\ell,2}) \sim \mathcal{U}(I_\ell)$.

For an arbitrary non-negative sequence $s$, let
$$\hat Y_s^{n+1} := \frac{1}{N}\sum_{\ell \in [1,p]^2} C_s^n(V_\ell); \tag{5}$$
then
$$\hat\mu_{n+1}s = \hat Y_s^{n+1}. \tag{6}$$
If $u \in I$, let
$$\kappa(u) := \lceil Nu \rceil, \tag{7}$$
where $\lceil x \rceil$ is the least integer greater than or equal to $x$. Hence equation (6) means that the next states are calculated as follows:
$$i^{n+1}_{(\ell_1 - 1)p + \ell_2} = F^{-1}_{i^n_{\kappa(V_{\ell,1})}}(V_{\ell,2}), \qquad \ell \in [1,p]^2$$
(the numbering of the states $i_k^{n+1}$ is arbitrary). The first projection $V_{\ell,1}$ of $V_\ell$ is used for selecting the state at step $n$ and the second projection $V_{\ell,2}$ is used for performing the transition to step $n+1$. Note that with this scheme, the mapping between the $N$ points and the $N$ states is not necessarily one-to-one: it is possible to pick the same state more than once and to leave out some of the states. This differs from the SSS scheme used in [12, 14].
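One SSS transition, steps (S1) and (S2), can be sketched in Python as below. This is an illustrative sketch under the assumptions $N = p^2$ and finite transition rows; the function name and the returned list `used` (the chain indices $\kappa(V_{\ell,1})$) are hypothetical additions that make the selection visible.

```python
import bisect
import math
import random
from itertools import accumulate


def sss_step(states, P, rng):
    """One SSS transition of Sect. 2.2, assuming N = p^2 and finite rows P[i].

    Returns the new states and the chain indices kappa(V_{l,1}) that were used.
    """
    N = len(states)
    p = math.isqrt(N)
    if p * p != N:
        raise ValueError("SSS requires N = p^2")
    srt = sorted(states)                                        # (S1)
    new_states, used = [], []
    for l1 in range(1, p + 1):                                  # (S2): one point per square I_l
        for l2 in range(1, p + 1):
            v1 = (l1 - 1 + 1.0 - rng.random()) / p             # V_{l,1} ~ U(H_{l1})
            v2 = (l2 - 1 + 1.0 - rng.random()) / p             # V_{l,2} ~ U(H_{l2})
            k = min(max(1, math.ceil(N * v1)), N)               # kappa(V_{l,1}) of Eq. (7)
            used.append(k)
            q = list(accumulate(P[srt[k - 1]]))
            new_states.append(min(bisect.bisect_left(q, v2), len(q) - 1))
    return new_states, used


P = {0: [0.5, 0.5, 0.0], 1: [0.25, 0.5, 0.25], 2: [0.0, 0.5, 0.5]}   # hypothetical chain
new_states, used = sss_step([2, 0, 1, 1] * 4, P, random.Random(7))
distinct_chains = len(set(used))   # often smaller than N = 16: some chains are reused
```

All points of column $\ell_1$ select chains among $p(\ell_1 - 1) + 1, \ldots, p\ell_1$, but independently of each other, which is why `distinct_chains` is typically smaller than $N$.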


2.3 Sudoku Latin Square Sampling

As before, we assume $N = p^2$, and the transition from $n$ to $n+1$ has two steps: renumbering of the states and numerical integration.

(S1) The states are relabeled so that $i_1^n \le \cdots \le i_N^n$.

(S2) We consider the same partition of $I^2$ as before: $I_\ell$ for $\ell \in [1,p]^2$. Let $\{W_\ell : \ell \in [1,p]^2\}$ be random variables, where $W_\ell = (W_{\ell,1}, W_{\ell,2})$, with
$$W_{\ell,1} = \frac{\ell_1 - 1}{p} + \frac{\sigma_1(\ell_2) - 1 + \xi_\ell^1}{p^2}, \qquad W_{\ell,2} = \frac{\ell_2 - 1}{p} + \frac{\sigma_2(\ell_1) - 1 + \xi_\ell^2}{p^2}.$$
Here $\sigma_1$ and $\sigma_2$ are random permutations of $[1,p]$, and $\xi_\ell^1 \sim \mathcal{U}(I)$ and $\xi_\ell^2 \sim \mathcal{U}(I)$; all these random variables are independent. Each $W_\ell$ takes its values in $I_\ell$, and the point set $\{W_\ell : \ell \in [1,p]^2\}$ has the following properties:

(P1) for any $\ell \in [1,p]^2$, there is a unique point of this set in the square $I_\ell$;
(P2) for any $k \in [1,N]$, there is a unique point of this set in the rectangle $I_k \times I$, and a unique point in the rectangle $I \times I_k$.
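The construction of the points $W_\ell$ and properties (P1) and (P2) can be checked with a short Python sketch. It follows the formulas above literally (one permutation $\sigma_1$ shared by all rows and one permutation $\sigma_2$ shared by all columns); the function names are hypothetical.

```python
import math
import random


def slss_points(p, rng):
    """Sudoku Latin square sample of N = p^2 points, following the formulas for W_l.

    sigma1 and sigma2 are uniform random permutations of [1, p]; the xi's are i.i.d. U(I).
    Returns a dict mapping l = (l1, l2) to W_l = (W_{l,1}, W_{l,2}).
    """
    sigma1 = rng.sample(range(1, p + 1), p)      # sigma1[l2 - 1] = sigma_1(l2)
    sigma2 = rng.sample(range(1, p + 1), p)      # sigma2[l1 - 1] = sigma_2(l1)
    pts = {}
    for l1 in range(1, p + 1):
        for l2 in range(1, p + 1):
            xi1, xi2 = 1.0 - rng.random(), 1.0 - rng.random()   # uniform on (0, 1]
            w1 = (l1 - 1) / p + (sigma1[l2 - 1] - 1 + xi1) / p**2
            w2 = (l2 - 1) / p + (sigma2[l1 - 1] - 1 + xi2) / p**2
            pts[(l1, l2)] = (w1, w2)
    return pts


def interval_index(u, n):
    """Index of the interval ((j - 1)/n, j/n] containing u, i.e. ceil(n u), clamped to [1, n]."""
    return min(max(1, math.ceil(n * u)), n)


def has_slss_properties(pts, p):
    """Check (P1): W_l lies in I_l, and (P2): each strip I_k x I and I x I_k holds one point."""
    N = p * p
    p1 = all(interval_index(w1, p) == l1 and interval_index(w2, p) == l2
             for (l1, l2), (w1, w2) in pts.items())
    cols = sorted(interval_index(w1, N) for w1, _ in pts.values())
    rows = sorted(interval_index(w2, N) for _, w2 in pts.values())
    return p1 and cols == rows == list(range(1, N + 1))


example = slss_points(4, random.Random(3))       # N = 16 points, as in Fig. 1
```

Property (P2) holds because $\lceil N W_{\ell,1} \rceil = p(\ell_1 - 1) + \sigma_1(\ell_2)$, and $\ell \mapsto (\ell_1, \sigma_1(\ell_2))$ is a bijection of $[1,p]^2$.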

In addition, $W_\ell \sim \mathcal{U}(I_\ell)$. For an arbitrary non-negative sequence $s$, let
$$\hat Z_s^{n+1} := \frac{1}{N}\sum_{\ell \in [1,p]^2} C_s^n(W_\ell). \tag{8}$$
Then
$$\hat\mu_{n+1}s = \hat Z_s^{n+1}. \tag{9}$$
Due to property (P2), the mapping (see (7))
$$\ell := (\ell_1, \ell_2) \in [1,p]^2 \mapsto \kappa(W_{\ell,1}) \in [1,N]$$
is one-to-one: each state of step $n$ is used exactly once for a transition (this is not the case with SSS). Equation (9) means that the next states are calculated as follows:
$$i^{n+1}_{(\ell_1 - 1)p + \ell_2} = F^{-1}_{i^n_{\kappa(W_{\ell,1})}}(W_{\ell,2}), \qquad \ell \in [1,p]^2$$
(as before, the numbering of the states $i_k^{n+1}$ is arbitrary). The first projection $W_{\ell,1}$ of $W_\ell$ is used for selecting the state at step $n$ and the second projection $W_{\ell,2}$ is used for performing the transition to step $n+1$.
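A corresponding Python sketch of one SLSS transition is given below, under the same assumptions as before ($N = p^2$, finite transition rows, hypothetical function name). The only change with respect to simple stratification is how the point in each square $I_\ell$ is placed; as a consequence, `sorted(used)` equals `[1, 2, ..., N]` (up to floating-point round-off at subinterval boundaries), which is the one-to-one property implied by (P2).

```python
import bisect
import math
import random
from itertools import accumulate


def slss_step(states, P, rng):
    """One SLSS transition of Sect. 2.3, assuming N = p^2 and finite rows P[i].

    Returns the new states and the chain indices kappa(W_{l,1}) that were used.
    """
    N = len(states)
    p = math.isqrt(N)
    if p * p != N:
        raise ValueError("SLSS requires N = p^2")
    srt = sorted(states)                                   # (S1)
    sigma1 = rng.sample(range(1, p + 1), p)                # sigma_1(l2) = sigma1[l2 - 1]
    sigma2 = rng.sample(range(1, p + 1), p)                # sigma_2(l1) = sigma2[l1 - 1]
    new_states, used = [], []
    for l1 in range(1, p + 1):                             # (S2) with the points W_l
        for l2 in range(1, p + 1):
            w1 = (l1 - 1) / p + (sigma1[l2 - 1] - 1 + 1.0 - rng.random()) / N
            w2 = (l2 - 1) / p + (sigma2[l1 - 1] - 1 + 1.0 - rng.random()) / N
            k = min(max(1, math.ceil(N * w1)), N)           # kappa(W_{l,1})
            used.append(k)
            q = list(accumulate(P[srt[k - 1]]))
            new_states.append(min(bisect.bisect_left(q, w2), len(q) - 1))
    return new_states, used


P = {0: [0.5, 0.5, 0.0], 1: [0.25, 0.5, 0.25], 2: [0.0, 0.5, 0.5]}   # hypothetical chain
new_states, used = slss_step([2, 0, 1, 1] * 4, P, random.Random(7))
```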

3 Convergence Analysis

In this section we prove, for each method, that the estimator of the expected cost at each step is unbiased, and we establish that the variance of the estimator is $O(N^{-1})$ for standard MC and $O(N^{-3/2})$ for SSS and SLSS, where $N$ is the number of simulation paths. In the following, $\lambda$ is the Lebesgue measure and $\lambda_2$ the two-dimensional Lebesgue measure; we write $|E|$ for the number of elements of a set $E$. We use the sequences $s_h$, for $h \in \mathcal{X}$:
$$s_h(i) := \begin{cases} 1 & \text{if } i \le h,\\ 0 & \text{otherwise.} \end{cases}$$
The total variation of a sequence $s = (s(i) : i \in \mathcal{X})$ is defined by
$$\mathrm{TV}(s) := \sum_{i \in \mathcal{X}} |s(i+1) - s(i)|. \tag{10}$$
We use below the total variation of $q_h$, for $h \in \mathcal{X}$. Recall from (1) that $q_h(i) = \mathbb{P}(X_{n+1} \le h \mid X_n = i)$, and from (10) that
$$\mathrm{TV}(q_h) = \sum_{i \in \mathcal{X}} \left|\mathbb{P}(X_{n+1} \le h \mid X_n = i+1) - \mathbb{P}(X_{n+1} \le h \mid X_n = i)\right|.$$
In the following, we assume that
$$M := \sup_{h \in \mathcal{X}} \mathrm{TV}(q_h) < +\infty.$$
There are situations in which $q_h$ is monotone and situations in which $M < 1$ (or both), but this is not always true. See [13, 14] for examples and further discussion.

Our $M$ corresponds to $\Lambda_j$ in [12].
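For a finite state space, $\mathrm{TV}(q_h)$ and $M$ can be computed directly from the transition matrix. The following Python sketch assumes states $\{0, \ldots, m-1\}$ and restricts the sum in (10) to consecutive pairs inside the state space; the function names are hypothetical. The example chain is stochastically monotone (each $q_h$ is non-increasing in $i$), and it has $M < 1$.

```python
from itertools import accumulate


def total_variation(seq):
    """TV(s) = sum_i |s(i+1) - s(i)|, summed over consecutive entries of a finite sequence."""
    return sum(abs(b - a) for a, b in zip(seq, seq[1:]))


def sup_tv_q(P):
    """M = sup_h TV(q_h), with q_h(i) = sum_{j <= h} P(i, j).

    P is a hypothetical finite transition matrix (list of rows over states 0..m-1);
    the sum defining TV is restricted to pairs (i, i+1) inside the state space.
    """
    cdf = [list(accumulate(row)) for row in P]        # cdf[i][h] = q_h(i)
    m = len(P)
    return max(total_variation([cdf[i][h] for i in range(m)]) for h in range(m))


# Here q_0 = (0.5, 0.25, 0), q_1 = (1, 0.75, 0.5), q_2 = (1, 1, 1), so M = 0.5.
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
M = sup_tv_q(P)
```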

3.1 Standard Monte Carlo

Lemma 1. Let $s$ be a non-negative sequence. The standard Monte Carlo estimator of $\tilde\mu_{n+1}s$,
$$\hat X_s^{n+1} := \frac{1}{N}\sum_{k=1}^{N} C_s^n\left(\frac{k}{N}, U_k\right),$$
has the following properties.

1. $\hat X_s^{n+1}$ is unbiased.
2. If $s = s_h$, for $h \in \mathcal{X}$, then
$$\mathrm{Var}(\hat X_{s_h}^{n+1}) \le \frac{1}{4N}.$$

Proof.

1. We have
$$\mathbb{E}\left[C_s^n\left(\frac{k}{N}, U_k\right)\right] = \sum_{j \in \mathcal{X}} P(i_k^n, j)\, s(j),$$
so that $\mathbb{E}[\hat X_s^{n+1}] = \tilde\mu_{n+1}s$.

2. The variable
$$C_{s_h}^n\left(\frac{k}{N}, U_k\right) = \sum_{j \in \mathcal{X},\, j \le h} \mathbf{1}_{i_k^n, j}(U_k)$$
is a Bernoulli random variable, with variance at most $1/4$. Since the $U_k$ are independent, $\mathrm{Var}(\hat X_{s_h}^{n+1}) \le N \cdot \frac{1}{4} \cdot \frac{1}{N^2} = \frac{1}{4N}$. ∎

We then obtain an error bound by using the same techniques as in [12]. We assume that, for any non-negative sequence $s$, the standard Monte Carlo estimator $\hat\mu_0 s$ of $\mu_0 s$ is unbiased and that, for any $h \in \mathcal{X}$,
$$\mathrm{Var}(\hat\mu_0 s_h) \le \frac{x_0}{N}$$
for some $x_0 \ge 0$ (as noted before, in many applications $\hat\mu_0 = \mu_0$).

Proposition 1. For the standard Monte Carlo method, the following holds:

1. for any non-negative sequence $s$, $\mathbb{E}[\hat\mu_n s] = \mu_n s$;
2. for any $h \in \mathcal{X}$,
$$\mathrm{Var}(\hat\mu_n s_h) \le \frac{x_n}{N}, \qquad \text{where } x_{n+1} = M^2 x_n + \frac{1}{4} \quad (n \ge 0).$$

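Proof.

1. We have
$$\mu_{n+1}s - \hat\mu_{n+1}s = \mu_{n+1}s - \tilde\mu_{n+1}s + \tilde\mu_{n+1}s - \hat\mu_{n+1}s = \mu_n \mathbf{P}s - \hat\mu_n \mathbf{P}s + \tilde\mu_{n+1}s - \hat X_s^{n+1},$$
so, by using Lemma 1, the result follows by induction.

2. The variables $\mu_{n+1}s_h - \tilde\mu_{n+1}s_h$ and $\tilde\mu_{n+1}s_h - \hat\mu_{n+1}s_h$ are uncorrelated, and $\tilde\mu_{n+1}s_h - \hat\mu_{n+1}s_h = \tilde\mu_{n+1}s_h - \hat X_{s_h}^{n+1}$ is a centered variable; consequently
$$\mathrm{Var}(\hat\mu_{n+1}s_h) = \mathbb{E}\left[(\mu_{n+1}s_h - \tilde\mu_{n+1}s_h)^2\right] + \mathbb{E}\left[(\tilde\mu_{n+1}s_h - \hat\mu_{n+1}s_h)^2\right]. \tag{11}$$
For any $i \in \mathcal{X}$, we have $\mathbf{P}s_h(i) = F_i(h)$; hence
$$\begin{aligned}
\mu_{n+1}s_h - \tilde\mu_{n+1}s_h &= \mu_n \mathbf{P}s_h - \hat\mu_n \mathbf{P}s_h = \sum_{i \in \mathcal{X}} \mu_n(i) F_i(h) - \sum_{i \in \mathcal{X}} \hat\mu_n(i) F_i(h)\\
&= -\sum_{i \in \mathcal{X}} \mu_n s_i\, (F_{i+1}(h) - F_i(h)) + \sum_{i \in \mathcal{X}} \hat\mu_n s_i\, (F_{i+1}(h) - F_i(h))\\
&= \sum_{i \in \mathcal{X}} (\hat\mu_n s_i - \mu_n s_i)(F_{i+1}(h) - F_i(h)).
\end{aligned}$$
On the one hand, since $F_i(h) = q_h(i)$, we write
$$\begin{aligned}
\mathbb{E}\left[(\mu_{n+1}s_h - \tilde\mu_{n+1}s_h)^2\right] &= \mathbb{E}\left[\left(\sum_{i \in \mathcal{X}} (\hat\mu_n s_i - \mu_n s_i)(q_h(i+1) - q_h(i))\right)^2\right]\\
&= \sum_{(i,j) \in \mathcal{X}^2} \mathbb{E}\left[(\hat\mu_n s_i - \mu_n s_i)(q_h(i+1) - q_h(i))(\hat\mu_n s_j - \mu_n s_j)(q_h(j+1) - q_h(j))\right]\\
&\le \sum_{(i,j) \in \mathcal{X}^2} \sqrt{\mathrm{Var}(\hat\mu_n s_i)\,\mathrm{Var}(\hat\mu_n s_j)}\; |q_h(i+1) - q_h(i)| \times |q_h(j+1) - q_h(j)|,
\end{aligned}$$
where the inequality follows from the Cauchy–Schwarz inequality and the unbiasedness of $\hat\mu_n s_i$. By the induction hypothesis, $\mathrm{Var}(\hat\mu_n s_i) \le x_n/N$ for all $i$, so this term is at most $\frac{x_n}{N}\mathrm{TV}(q_h)^2 \le \frac{M^2 x_n}{N}$. On the other hand, Lemma 1 gives
$$\mathbb{E}\left[(\tilde\mu_{n+1}s_h - \hat\mu_{n+1}s_h)^2\right] = \mathbb{E}\left[(\tilde\mu_{n+1}s_h - \hat X_{s_h}^{n+1})^2\right] \le \frac{1}{4N}.$$
So, by using (11), $\mathrm{Var}(\hat\mu_{n+1}s_h) \le (M^2 x_n + 1/4)/N = x_{n+1}/N$, and the result follows by induction.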

∎

The bounds for $\mathrm{Var}(\hat Y_{s_h}^{n+1})$ (SSS) and $\mathrm{Var}(\hat Z_{s_h}^{n+1})$ (SLSS) are not as easily obtained; the proofs of Lemmas 2 and 3 are given in Sect. 5.

3.2 Simple Stratified Sampling

Lemma 2. Let $s$ be a non-negative sequence. The SSS estimator of $\tilde\mu_{n+1}s$,
$$\hat Y_s^{n+1} := \frac{1}{N}\sum_{\ell \in [1,p]^2} C_s^n(V_\ell),$$
has the following properties.

1. $\hat Y_s^{n+1}$ is unbiased.
2. If $s = s_h$, for $h \in \mathcal{X}$, then
$$\mathrm{Var}(\hat Y_{s_h}^{n+1}) \le \frac{1}{4N^{3/2}}\left(\mathrm{TV}(q_h) + 2\right).$$

A similar result (with the same $N^{-3/2}$ order) was established in [12], but the SSS method studied there differs from the one used here, so we provide a different proof (see Sect. 5). Intermediate results from this proof (eqs. (12) and (13)) are reused later in the analysis of SLSS.

The proof of the next result is similar to the proof of Proposition 1. We assume that, for any non-negative sequence $s$, the SSS estimator $\hat\mu_0 s$ of $\mu_0 s$ is unbiased and that, for any $h \in \mathcal{X}$,
$$\mathrm{Var}(\hat\mu_0 s_h) \le \frac{y_0}{N^{3/2}}$$
for some $y_0 \ge 0$.

Proposition 2. For the SSS method, the following holds:

1. for any non-negative sequence $s$, $\mathbb{E}[\hat\mu_n s] = \mu_n s$;
2. for any $h \in \mathcal{X}$,
$$\mathrm{Var}(\hat\mu_n s_h) \le \frac{y_n}{N^{3/2}}, \qquad \text{where } y_{n+1} = M^2 y_n + \frac{M + 2}{4} \quad (n \ge 0).$$

3.3 Sudoku Latin Square Sampling

Lemma 3. Let $s$ be a non-negative sequence. The SLSS estimator of $\tilde\mu_{n+1}s$,
$$\hat Z_s^{n+1} := \frac{1}{N}\sum_{\ell \in [1,p]^2} C_s^n(W_\ell),$$
has the following properties.

1. $\hat Z_s^{n+1}$ is unbiased.
2. If $s = s_h$, for $h \in \mathcal{X}$, and if $q_h$ is a piecewise monotonic sequence with $r$ pieces, then
$$\mathrm{Var}(\hat Z_{s_h}^{n+1}) \le \frac{1}{N^{3/2}}\left[\left(\frac{13}{4} + r\right)(\mathrm{TV}(q_h) + 2) + 2(\mathrm{TV}(q_h) + 2)^2\right].$$

The constant involved in the $O(N^{-3/2})$ bound on $\mathrm{Var}(\hat Z_{s_h}^{n+1})$ (SLSS) is larger than the corresponding constant for $\mathrm{Var}(\hat Y_{s_h}^{n+1})$ (SSS), since $r \ge 1$ in Lemma 3; this might suggest worse performance for SLSS. However, the examples of Sect. 4 show that this is not necessarily the case.

The proof of the next result is similar to the proof of Proposition 1. We assume that, for any non-negative sequence $s$, the SLSS estimator $\hat\mu_0 s$ of $\mu_0 s$ is unbiased and that, for any $h \in \mathcal{X}$,
$$\mathrm{Var}(\hat\mu_0 s_h) \le \frac{z_0}{N^{3/2}}$$
for some $z_0 \ge 0$.

Proposition 3. For the SLSS method, the following holds:

1. for any non-negative sequence $s$, $\mathbb{E}[\hat\mu_n s] = \mu_n s$;
2. for any $h \in \mathcal{X}$,
$$\mathrm{Var}(\hat\mu_n s_h) \le \frac{z_n}{N^{3/2}},$$
where
$$z_{n+1} = M^2 z_n + \left(\frac{13}{4} + r\right)(M + 2) + 2(M + 2)^2 \quad (n \ge 0).$$

Remark 1. The bounds in Propositions 1, 2, and 3 increase exponentially with $n$ when $M > 1$, and remain bounded if $M < 1$, which is not uncommon; see [14].
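Remark 1 can be made explicit by unrolling the recursions. As a worked step (an elementary derivation added here for illustration), any sequence satisfying $a_{n+1} = M^2 a_n + c$ with $c \ge 0$, which covers $x_n$ (with $c = 1/4$), $y_n$ (with $c = (M+2)/4$), and $z_n$ (with $c = (13/4 + r)(M+2) + 2(M+2)^2$), satisfies

$$a_n = M^{2n} a_0 + c\sum_{k=0}^{n-1} M^{2k} = \begin{cases} M^{2n} a_0 + c\,\dfrac{1 - M^{2n}}{1 - M^2}, & M \ne 1,\\[6pt] a_0 + nc, & M = 1. \end{cases}$$

Hence $a_n \le a_0 + c/(1 - M^2)$ for all $n$ when $M < 1$, $a_n$ grows linearly in $n$ when $M = 1$, and $a_n$ grows geometrically like $M^{2n}$ when $M > 1$ (unless $a_0 = c = 0$).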

Remark 2. We have bounded the variance of each estimator for test sequences of the form $s_h$, for $h \in \mathcal{X}$. We obtain a bound for a non-negative cost function $c$ by the same reasoning as in Proposition 1 (see [12]):
$$\mu_n c - \hat\mu_n c = \sum_{i \in \mathcal{X}} (\hat\mu_n s_i - \mu_n s_i)(c(i+1) - c(i)),$$
hence
$$\mathbb{E}\left[(\mu_n c - \hat\mu_n c)^2\right] \le \sum_{(i,j) \in \mathcal{X}^2} \sqrt{\mathrm{Var}(\hat\mu_n s_i)\,\mathrm{Var}(\hat\mu_n s_j)}\; |c(i+1) - c(i)| \times |c(j+1) - c(j)|,$$
and then
$$\mathrm{Var}(\hat\mu_n c) \le \mathrm{TV}(c)^2 \times \sup_{h \in \mathcal{X}} \mathrm{Var}(\hat\mu_n s_h).$$

4 Numerical Examples

In this section, we compare standard Monte Carlo with the variance reduction strategies analyzed previously, SSS and SLSS, on three examples. For each example, each strategy, and each $N$ considered, we compute the unbiased estimator $\hat\mu_n s$ of $\mu_n s$ for the selected $n$, replicate this 100 times, and compute the empirical variance Var of the 100 realizations of $\hat\mu_n s$. We then plot $\log_{10} \mathrm{Var}$ as a function of $\log_{10} N$. Assuming that $\mathrm{Var} \approx K N^{-\alpha}$ for some positive constants $K$ and $\alpha$, we estimate the variance rate $\alpha$ by linear regression. We also compute the (empirical) efficiency of each simulation estimator, defined as the inverse of the product of Var and the CPU time [10], and we plot $\log_{10}$ efficiency as a function of $\log_{10} N$. Note that for standard Monte Carlo, the efficiency does not depend on $N$, since the variance decreases as $N^{-1}$ while the CPU time grows linearly in $N$. For SSS and SLSS, the efficiency accounts for the additional work required to compute the estimators, such as sorting the states at each step.
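The regression and efficiency computations described above can be sketched as follows. The function names are hypothetical, and we assume that the 100 replicate estimates and the CPU time are already available for each $N$; using the unbiased sample variance from Python's `statistics` module is also an assumption, since the text does not specify the normalization.

```python
import math
import statistics


def fit_variance_rate(Ns, variances):
    """Least-squares fit of log10(Var) = log10(K) - alpha * log10(N); returns (alpha, K)."""
    xs = [math.log10(n) for n in Ns]
    ys = [math.log10(v) for v in variances]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope, 10 ** (my - slope * mx)


def empirical_variance(replicates):
    """Unbiased sample variance of independent replicates (e.g. 100 copies of hat mu_n s)."""
    return statistics.variance(replicates)


def efficiency(variance, cpu_seconds):
    """Empirical efficiency: inverse of the product of the variance and the CPU time."""
    return 1.0 / (variance * cpu_seconds)


# Synthetic check with Var = 3 N^(-3/2): the fit should recover alpha = 1.5 and K = 3.
Ns = [p * p for p in (10, 20, 50, 100, 200)]
alpha, K = fit_variance_rate(Ns, [3.0 * n ** -1.5 for n in Ns])
```

Fitting in log-log scale gives each value of $N$ the same weight regardless of the magnitude of its variance, which matches the way the results are plotted.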

4.1 A Geo/Geo/1 Queue

We consider a discrete-time Geo/Geo/1 queue (see [1]): the queue is empty at the initial time. During each unit of time, the customer in service (if there is one) completes it with probability 0.5, and one new customer arrives with probability 0.6.
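Because the queue starts empty and can gain at most one customer per time unit, only finitely many states are reachable by time $n$, so the exact value of $\mathbb{E}[X_n]$ can be computed by iterating $\mu_{k+1} = \mu_k \mathbf{P}$. The Python sketch below does this under assumptions the text does not spell out: $X_n$ is taken to be the number of customers in the system, a service completion (only if the system is non-empty) and an arrival occur independently within a time unit, and a customer who arrives in a time unit cannot leave in that same unit. The function names are hypothetical.

```python
def geo_geo_1_row(i, p_dep=0.5, p_arr=0.6):
    """Transition probabilities {j: P(i, j)} from i customers in the system.

    Assumed timing (not specified in the text): in a time unit, a service completion
    (only if i >= 1) and an arrival occur independently, and a customer who
    arrives in a time unit cannot leave in that same unit.
    """
    if i == 0:
        return {0: 1 - p_arr, 1: p_arr}
    return {
        i - 1: p_dep * (1 - p_arr),                       # departure, no arrival
        i: p_dep * p_arr + (1 - p_dep) * (1 - p_arr),     # both or neither
        i + 1: (1 - p_dep) * p_arr,                       # arrival, no departure
    }


def exact_distribution(n, i0=0):
    """mu_n computed exactly by n steps of mu_{k+1} = mu_k P (finitely many reachable states)."""
    mu = {i0: 1.0}
    for _ in range(n):
        nxt = {}
        for i, w in mu.items():
            for j, pij in geo_geo_1_row(i).items():
                nxt[j] = nxt.get(j, 0.0) + w * pij
        mu = nxt
    return mu


def exact_mean(n):
    """E[X_n] for the queue started empty: a reference value for the simulation estimates."""
    return sum(j * w for j, w in exact_distribution(n).items())


mean_12 = exact_mean(12)
```

With this timing convention, $\mathbb{E}[X_1] = 0.6$ and $\mathbb{E}[X_2] = 0.9$; a different convention would change the transition probabilities and hence the reference value.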

We estimate the mean number of customers in the queue at time $n = 12$. Figure 2 (top) shows $\log_{10} \mathrm{Var}$ as a function of $\log_{10} N$ on the left and $\log_{10}$ efficiency as a function of $\log_{10} N$ on the right, for $N = 10^2, 50^2, 100^2, \ldots, 1\,000^2$. We see from the plots that SSS and SLSS give not only smaller variances than standard MC (for the same $N$) but also better efficiencies, and that SLSS outperforms SSS. The regression estimates of $\alpha$ are given in the first row of Table 1. They agree with the $O(N^{-3/2})$ upper bounds established in Sect. 3.

4.2 A Gambler in a Casino

A gambler goes to a casino for four hours. He plans to play the same game every ten seconds (so he will play 1 440 times). In this game, for each Euro that he bids, he gets 0 with probability 0.9 and $m \in \{1, 2, \ldots, 10\}$ with probability 0.01 each. His policy is the following: if he has more than 100 Euros, he bids 2 Euros, but if he has 100 Euros or less, he bids only 1. To make sure that he can play during the whole four hours, he brings 2 780 Euros with him. The model is a Markov chain on the state space $E = [0, 28\,700]$. We estimate the probability that the gambler has more than 1 500 Euros at the end of the game. Here we use $N = 10^2, 20^2, 30^2, \ldots, 200^2$. The results are reported in the middle rows of Fig. 2 and Table 1. We find that SSS and SLSS produce both smaller variances and better efficiencies than standard MC for large enough $N$, and that SLSS outperforms SSS. The regression estimate of $\alpha$ for SLSS corresponds to the bound established in Sect. 3, but for SSS it is better. However, the variance itself is smaller for SLSS than for SSS, and the better rate of SSS might not hold beyond the observed range (it is likely caused by a few poor values of Var for the smallest values of $N$).

4.3 Diffusion

The 1-D diffusion equation
$$\frac{\partial c}{\partial t}(x,t) = D\,\frac{\partial^2 c}{\partial x^2}(x,t), \quad x \in \mathbb{R},\ t > 0, \qquad c(x,0) = c_0(x), \quad x \in \mathbb{R}$$
(where $c_0 \ge 0$ and $\int_{\mathbb{R}} c_0(x)\,dx = 1$) may be discretized with a time step $\Delta t$ and a spatial step $\Delta x$, and the solution is approximated using a random walk: $P(i, i-1) = P(i, i+1) = D\Delta t/\Delta x^2$, $P(i,i) = 1 - 2D\Delta t/\Delta x^2$ (see [3] for this approach in a QMC context). Here we take $D = 1$ and $c_0$ the indicator function of the interval $[-1/2, 1/2]$; we want to approximate $\int_{-1/2}^{1/2} c(x,T)\,dx$. We choose $T = 1$, with $\Delta t = 6.25 \times 10^{-4}$ and $\Delta x = 5.0 \times 10^{-2}$. Here we take $N = 11^2, 19^2, 31^2, \ldots, 199^2$. In a previous version of the paper, we had $N = 10^2, 20^2, 30^2, \ldots, 200^2$, but this gave oscillations in the SLSS outcomes, with better results when $p = \sqrt{N}$ was a multiple of 20, because of interactions with other discretization parameters. To avoid this, we changed our choices of $p$: we now use prime numbers near the previous ones. The bottom part of Fig. 2 and the last row of Table 1 give the results, which are very similar to those of the previous example. Again, SLSS outperforms the other methods.
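The transition probabilities of the random walk, and the constraint they impose, can be checked with a small Python sketch; the function name is hypothetical, and the check $D\Delta t/\Delta x^2 \le 1/2$ is simply the condition for $P(i,i) \ge 0$.

```python
def random_walk_probs(D, dt, dx):
    """(P(i, i-1), P(i, i), P(i, i+1)) for the random-walk discretization of c_t = D c_xx."""
    r = D * dt / dx**2
    if r > 0.5:
        raise ValueError("need D * dt / dx**2 <= 1/2, otherwise P(i, i) < 0")
    return r, 1.0 - 2.0 * r, r


left, stay, right = random_walk_probs(D=1.0, dt=6.25e-4, dx=5.0e-2)
n_steps = round(1.0 / 6.25e-4)      # number of chain steps needed to reach T = 1
```

With the parameters of this example, $D\Delta t/\Delta x^2 = 6.25 \times 10^{-4} / (2.5 \times 10^{-3}) = 0.25$, so the walk moves left or right with probability 1/4 each and stays put with probability 1/2, and reaching $T = 1$ takes $T/\Delta t = 1\,600$ steps of the chain.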

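Table 1 Estimated order $\alpha$ of the sample variance ($\mathrm{Var} \approx K N^{-\alpha}$): comparison of standard Monte Carlo (MC), SSS, and SLSS for three examples and estimators

| Example | Estimated quantity | MC | SSS | SLSS |
|---|---|---|---|---|
| Geo/Geo/1 queue | $\mathbb{E}[X_{12}]$ | 0.99 | 1.50 | 1.51 |
| Gambler in a casino | $\mathbb{P}(X_{1440} > 1\,500)$ | 1.02 | 1.76 | 1.53 |
| Diffusion | $\int_{-1/2}^{1/2} c(x,T)\,dx$ | 0.98 | 1.61 | 1.49 |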

5 The Proofs

5.1 Proof of Lemma 2

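Proof.

1. Since $V_\ell$ has density $N$ on $I_\ell$, we have
$$\mathbb{E}[C_s^n(V_\ell)] = N \sum_{k=1}^{N}\sum_{j \in \mathcal{X}} \left(\int_{I_\ell} \mathbf{1}_k(u_1)\,\mathbf{1}_{i_k^n, j}(u_2)\, du\right) s(j).$$
Hence, summing over $\ell \in [1,p]^2$ (the squares $I_\ell$ partition $I^2$),
$$\mathbb{E}[\hat Y_s^{n+1}] = \frac{1}{N}\sum_{k=1}^{N}\sum_{j \in \mathcal{X}} P(i_k^n, j)\, s(j) = \tilde\mu_{n+1}s.$$

2. The function $C_{s_h}^n$ is the indicator function of the set
$$J_h^n := \bigcup_{k=1}^{N} I_k \times \left(\bigcup_{j \in \mathcal{X},\, j \le h} I_{i_k^n, j}\right) = \bigcup_{k=1}^{N} I_k \times (0, q_h(i_k^n)]. \tag{12}$$
The variable $C_{s_h}^n(V_\ell)$ is a Bernoulli random variable with expectation $f_{h,\ell}^n = N\lambda_2(J_h^n \cap I_\ell)$. Here $f_{h,\ell}^n = 1$ if $I_\ell \subset J_h^n$ and $f_{h,\ell}^n = 0$ if $I_\ell \cap J_h^n = \emptyset$. Consequently, $\mathrm{Var}(C_{s_h}^n(V_\ell)) = f_{h,\ell}^n(1 - f_{h,\ell}^n) \le 1/4$, and $\mathrm{Var}(C_{s_h}^n(V_\ell)) = 0$ if $I_\ell \subset J_h^n$ or if $I_\ell \cap J_h^n = \emptyset$. Since the $V_\ell$ are independent,
$$\mathrm{Var}(\hat Y_{s_h}^{n+1}) \le \frac{1}{4N^2}\left|\{\ell \in [1,p]^2 : I_\ell \not\subset J_h^n \text{ and } I_\ell \cap J_h^n \ne \emptyset\}\right|.$$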

a. If $I_\ell \not\subset J_h^n$, then there exists $(u_1, u_2)$ which belongs to $I_\ell$ and not to $J_h^n$; since this $u_1$ is in some $I_k$, we have
$$\exists k \in [1,N],\ \exists (u_1, u_2) \in I_\ell :\ u_1 \in I_k \subset \left(\frac{\ell_1 - 1}{p}, \frac{\ell_1}{p}\right] \text{ and } u_2 \notin (0, q_h(i_k^n)],$$
so that
$$\exists k \in \{p(\ell_1 - 1) + 1, p(\ell_1 - 1) + 2, \ldots, p\ell_1\},\ \exists u_2 \in \left(\frac{\ell_2 - 1}{p}, \frac{\ell_2}{p}\right] :\ u_2 > q_h(i_k^n);$$
consequently
$$\ell_2 > p \min_{p(\ell_1 - 1) < k \le p\ell_1} q_h(i_k^n).$$

Fig. 2 Comparison of standard Monte Carlo (MC) to SSS and SLSS (Sudoku) for three examples. Each row shows the sample variance (left) and the efficiency (right) of 100 copies of the calculation as a function of $N$ (log-log scale): (1) Geo/Geo/1 queue, $\mathbb{E}[X_{12}]$, $N = 10^2, 50^2, 100^2, \ldots, 1\,000^2$; (2) gambler in a casino, $\mathbb{P}(X_{1440} > 1\,500)$, $N = 10^2, 20^2, 30^2, \ldots, 200^2$; (3) diffusion, $\int_{-1/2}^{1/2} c(x,T)\,dx$, $N = 11^2, 19^2, 31^2, \ldots, 199^2$.

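b. Analogously, if $I_\ell \cap J_h^n \ne \emptyset$, then there exists $(u_1, u_2)$ which belongs to both $I_\ell$ and $J_h^n$, and we eventually obtain
$$\ell_2 < p \max_{p(\ell_1 - 1) < k \le p\ell_1} q_h(i_k^n) + 1.$$

For each $\ell_1$, cases a and b leave at most $p(\max - \min) + 2$ admissible values of $\ell_2$, where the max and min are taken over $p(\ell_1 - 1) < k \le p\ell_1$. We then have the following bounds:
$$\begin{aligned}
&\left|\{\ell \in [1,p]^2 : I_\ell \not\subset J_h^n \text{ and } I_\ell \cap J_h^n \ne \emptyset\}\right|\\
&\qquad \le p\left(\sum_{\ell_1 = 1}^{p}\left(\max_{p(\ell_1 - 1) < k \le p\ell_1} q_h(i_k^n) - \min_{p(\ell_1 - 1) < k \le p\ell_1} q_h(i_k^n)\right) + 2\right) \le N^{1/2}\left(\sum_{i \in \mathcal{X}} |q_h(i+1) - q_h(i)| + 2\right),
\end{aligned}$$
because the states are relabeled so that $i_1^n \le \cdots \le i_N^n$. Consequently,
$$\left|\{\ell \in [1,p]^2 : I_\ell \not\subset J_h^n \text{ and } I_\ell \cap J_h^n \ne \emptyset\}\right| \le N^{1/2}(\mathrm{TV}(q_h) + 2), \tag{13}$$
and the result follows. ∎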
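The counting argument behind (13) can be checked numerically. The Python sketch below assumes a toy three-state chain and uses hypothetical function names; it counts the squares $I_\ell$ cut by the boundary of $J_h^n$ using the criteria of cases a and b (some threshold $q_h(i_k^n)$ in column $\ell_1$ is below $\ell_2/p$, and some threshold is above $(\ell_2 - 1)/p$), then compares the count with $N^{1/2}(\mathrm{TV}(q_h) + 2)$.

```python
import random


def boundary_square_count(qvals, p):
    """Count squares I_l with I_l not inside J_h^n and I_l intersecting J_h^n.

    qvals[k-1] = q_h(i_k^n) for the sorted states, so J_h^n = U_k I_k x (0, qvals[k-1]].
    Column l1 holds chains k = p(l1-1)+1, ..., p*l1. Row l2 is a boundary square iff
    some threshold is < l2/p (case a) and some threshold is > (l2-1)/p (case b).
    """
    count = 0
    for l1 in range(1, p + 1):
        column = qvals[p * (l1 - 1): p * l1]
        lo, hi = min(column), max(column)
        count += sum(1 for l2 in range(1, p + 1) if lo < l2 / p and hi > (l2 - 1) / p)
    return count


def check_bound_13(p, h, seed=0):
    """Return (count, bound) with bound = N^{1/2} (TV(q_h) + 2) of Eq. (13), on a toy chain."""
    P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]     # hypothetical 3-state chain
    q = [sum(P[i][: h + 1]) for i in range(3)]                    # q_h(i) for i = 0, 1, 2
    tv = sum(abs(q[i + 1] - q[i]) for i in range(2))              # TV(q_h)
    rng = random.Random(seed)
    states = sorted(rng.choice([0, 1, 2]) for _ in range(p * p))  # step (S1): sorted states
    count = boundary_square_count([q[i] for i in states], p)
    return count, p * (tv + 2)


count, bound = check_bound_13(p=10, h=0)
```

Sorting the states (S1) is what makes the column ranges $\max - \min$ add up to at most $\mathrm{TV}(q_h)$; with unsorted states the count could be of order $N$ rather than $N^{1/2}$.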

5.2 Proof of Lemma 3

Proof.

1. Since W

`

∼ U (I

_`

), the demonstration is the same as in Lemma 2.

2. In the following, we have many summations with `, `

⁰

, m, m

⁰

∈ [1, p]

²

. In order to lighten the notations, we omit this set. We have

Var(b Z

_sⁿ⁺¹

h

) = V

₀

(b Z

_sⁿ⁺¹

h

) + 1

N

²

∑

(`,`⁰):`6=`⁰

Cov C

_sⁿ

h

(W

_`

),C

ⁿ_s

h

(W

_`⁰

) ,

where

(17)

V

₀

(b Z

_sⁿ⁺¹

h

) := 1

N

²

∑

`

Var C

_sⁿ

h

(W

_`

) . a. The function C

_sⁿ

h

is the indicator function of the set J

_hⁿ

defined by (12). Since W

_`

∼ U (I

_`

), we have, as in Lemma 2:

V

₀

(b Z

_sⁿ⁺¹

h

) ≤ 1

4N

²

{` ∈ [1, p]

²

; I

_`

6⊂ J

_hⁿ

and I

_`

∩ J

_hⁿ

6= /0}

.

From the bound (13), we deduce V

₀

(b Z

_sⁿ⁺¹

h

) ≤ 1

4N

^3/2

(TV (q

_h

) + 2).

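b. We split $\operatorname{Var}(\widehat{Z}^{n+1}_{s_h}) = V_0(\widehat{Z}^{n+1}_{s_h}) + V_1(\widehat{Z}^{n+1}_{s_h}) + V_2(\widehat{Z}^{n+1}_{s_h}) + V_3(\widehat{Z}^{n+1}_{s_h})$, with
$$V_1\big(\widehat{Z}^{n+1}_{s_h}\big) := \frac{1}{N^2}\sum_{(\ell,\ell'):\,\ell_1\neq\ell'_1,\ \ell_2=\ell'_2}\operatorname{Cov}\big(C^n_{s_h}(W_\ell),\, C^n_{s_h}(W_{\ell'})\big),$$
$$V_2\big(\widehat{Z}^{n+1}_{s_h}\big) := \frac{1}{N^2}\sum_{(\ell,\ell'):\,\ell_1=\ell'_1,\ \ell_2\neq\ell'_2}\operatorname{Cov}\big(C^n_{s_h}(W_\ell),\, C^n_{s_h}(W_{\ell'})\big),$$
$$V_3\big(\widehat{Z}^{n+1}_{s_h}\big) := \frac{1}{N^2}\sum_{(\ell,\ell'):\,\ell_1\neq\ell'_1,\ \ell_2\neq\ell'_2}\operatorname{Cov}\big(C^n_{s_h}(W_\ell),\, C^n_{s_h}(W_{\ell'})\big).$$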

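We introduce the $N^2$ squares $I_{\ell,m} = H_{\ell_1,m_1}\times H_{\ell_2,m_2}$, where, for $(\ell,m)\in[1,p]^4$:
$$H_{\ell_1,m_1} := \left(\frac{\ell_1-1}{p} + \frac{m_1-1}{N},\ \frac{\ell_1-1}{p} + \frac{m_1}{N}\right],\qquad H_{\ell_2,m_2} := \left(\frac{\ell_2-1}{p} + \frac{m_2-1}{N},\ \frac{\ell_2-1}{p} + \frac{m_2}{N}\right].$$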
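The squares $I_{\ell,m}$ make the Sudoku structure explicit: the point of $I_\ell$ lies in one $I_{\ell,m}$, and the Latin-square property forces distinct $m_2$ within a row of subsquares and distinct $m_1$ within a column. The Python sketch below generates one such sample. The specific construction, $m_1 = \pi(\ell_2)$ and $m_2 = \sigma(\ell_1)$ with two independent uniform random permutations $\pi, \sigma$ of $[1,p]$, is our assumption, chosen because it matches the index patterns in the covariance sums that follow ($m_1 = m'_1$ within a row, $m_2 = m'_2$ within a column); the generator used in the paper may differ. The function returns interval indices so that the properties can be checked in integer arithmetic.

```python
import random


def sudoku_points(p, rng):
    """Draw N = p * p points in [0, 1)^2 with the Sudoku Latin square property.

    Assumed construction (illustrative only): the point of subsquare I_l lies in
    the small square I_{l,m} with m1 = pi(l2) and m2 = sigma(l1), where pi and
    sigma are independent uniform random permutations of {1, ..., p}; inside
    I_{l,m} it is uniform. Returns tuples (x, y, ix, iy), where ix and iy are the
    0-based indices of the 1/N-intervals containing x and y.
    """
    n = p * p
    pi = rng.sample(range(1, p + 1), p)     # pi[l2 - 1] = m1
    sigma = rng.sample(range(1, p + 1), p)  # sigma[l1 - 1] = m2
    points = []
    for l1 in range(1, p + 1):
        for l2 in range(1, p + 1):
            m1, m2 = pi[l2 - 1], sigma[l1 - 1]
            ix = p * (l1 - 1) + m1 - 1  # H_{l1,m1} = (ix/N, (ix+1)/N]
            iy = p * (l2 - 1) + m2 - 1  # H_{l2,m2} = (iy/N, (iy+1)/N]
            # rng.random() is in [0, 1), so x lies in [ix/N, (ix+1)/N); the
            # endpoint convention only matters on a set of measure zero.
            x = (ix + rng.random()) / n
            y = (iy + rng.random()) / n
            points.append((x, y, ix, iy))
    return points


if __name__ == "__main__":
    for x, y, ix, iy in sudoku_points(3, random.Random(7)):
        print(f"({x:.3f}, {y:.3f})  x-interval {ix}, y-interval {iy}")
```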

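We have
$$\begin{aligned}
V_1\big(\widehat{Z}^{n+1}_{s_h}\big) &= \sum_{(\ell,\ell'):\,\ell_1\neq\ell'_1,\ \ell_2=\ell'_2}\Bigg(\frac{N}{p-1}\sum_{(m,m'):\,m_1=m'_1,\ m_2\neq m'_2}\lambda_2(I_{\ell,m}\cap J^n_h)\,\lambda_2(I_{\ell',m'}\cap J^n_h) - \lambda_2(I_\ell\cap J^n_h)\,\lambda_2(I_{\ell'}\cap J^n_h)\Bigg),\\
V_2\big(\widehat{Z}^{n+1}_{s_h}\big) &= \sum_{(\ell,\ell'):\,\ell_1=\ell'_1,\ \ell_2\neq\ell'_2}\Bigg(\frac{N}{p-1}\sum_{(m,m'):\,m_1\neq m'_1,\ m_2=m'_2}\lambda_2(I_{\ell,m}\cap J^n_h)\,\lambda_2(I_{\ell',m'}\cap J^n_h) - \lambda_2(I_\ell\cap J^n_h)\,\lambda_2(I_{\ell'}\cap J^n_h)\Bigg),\\
V_3\big(\widehat{Z}^{n+1}_{s_h}\big) &= \sum_{(\ell,\ell'):\,\ell_1\neq\ell'_1,\ \ell_2\neq\ell'_2}\Bigg(\frac{N}{(p-1)^2}\sum_{(m,m'):\,m_1\neq m'_1,\ m_2\neq m'_2}\lambda_2(I_{\ell,m}\cap J^n_h)\,\lambda_2(I_{\ell',m'}\cap J^n_h) - \lambda_2(I_\ell\cap J^n_h)\,\lambda_2(I_{\ell'}\cap J^n_h)\Bigg).
\end{aligned}\tag{18}$$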
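The weights $N/(p-1)$ and $N/(p-1)^2$ in (18) are consistent with the following reading: each $I_{\ell,m}$ has area $1/N^2$, so $\mathrm{E}[C^n_{s_h}(W_\ell)\,C^n_{s_h}(W_{\ell'})] = N^4\sum_{(m,m')}P(m,m')\,\lambda_2(I_{\ell,m}\cap J^n_h)\,\lambda_2(I_{\ell',m'}\cap J^n_h)$, and if $(m,m')$ is uniform over the $N(p-1)$ (respectively $N(p-1)^2$) admissible pairs, then $N^2P(m,m') = N/(p-1)$ (respectively $N/(p-1)^2$). The Python sketch below checks this uniformity exactly, with rational arithmetic, assuming for illustration that the point of $I_\ell$ lies in the small square with $m_1 = \pi(\ell_2)$ and $m_2 = \sigma(\ell_1)$, for independent uniform random permutations $\pi, \sigma$ of $[1,p]$. This construction is a hypothetical choice, not necessarily the paper's.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def pair_law(p, l, lp):
    """Exact joint law of (m, m') for two subsquares l != l', under the assumed
    construction m1 = pi(l2), m2 = sigma(l1) with pi, sigma uniform permutations."""
    perms = list(permutations(range(1, p + 1)))
    counts = Counter()
    for pi in perms:
        for sigma in perms:
            m = (pi[l[1] - 1], sigma[l[0] - 1])
            mp = (pi[lp[1] - 1], sigma[lp[0] - 1])
            counts[(m, mp)] += 1
    total = len(perms) ** 2
    return {cfg: Fraction(c, total) for cfg, c in counts.items()}


if __name__ == "__main__":
    p = 3
    n = p * p
    same_row = pair_law(p, (1, 2), (3, 2))  # l1 != l1', l2 == l2'
    crossed = pair_law(p, (1, 1), (2, 3))   # l1 != l1', l2 != l2'
    print(set(same_row.values()), "vs", Fraction(1, n * (p - 1)))
    print(set(crossed.values()), "vs", Fraction(1, n * (p - 1) ** 2))
```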

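i. Only the subsquares $I_\ell$ that meet $J^n_h$ without being contained in it contribute, since otherwise $C^n_{s_h}(W_\ell)$ is almost surely constant. We therefore have
$$V_1\big(\widehat{Z}^{n+1}_{s_h}\big) = \sum_{\ell:\,I_\ell\not\subset J^n_h,\ I_\ell\cap J^n_h\neq\emptyset}\ \sum_{\ell':\,\ell'_1\neq\ell_1,\ \ell'_2=\ell_2} V_1(\ell,\ell'),$$
where
$$V_1(\ell,\ell') := \frac{N}{p-1}\sum_{(m,m'):\,m_1=m'_1,\ m_2\neq m'_2}\lambda_2(I_{\ell,m}\cap J^n_h)\,\lambda_2(I_{\ell',m'}\cap J^n_h) - \lambda_2(I_\ell\cap J^n_h)\,\lambda_2(I_{\ell'}\cap J^n_h).$$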

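We split $V_1(\ell,\ell') = \widehat{V}_1(\ell,\ell') + \check{V}_1(\ell,\ell')$, with
$$\widehat{V}_1(\ell,\ell') := \frac{N}{p-1}\sum_{(m,m'):\,m_1=m'_1,\ m_2\neq m'_2}\lambda_2(I_{\ell,m}\cap J^n_h)\,\lambda_2(I_{\ell',m'}\cap J^n_h) - \frac{N}{p}\sum_{(m,m'):\,m_1=m'_1}\lambda_2(I_{\ell,m}\cap J^n_h)\,\lambda_2(I_{\ell',m'}\cap J^n_h),$$
$$\check{V}_1(\ell,\ell') := p\sum_{(m,m'):\,m_1=m'_1}\lambda_2(I_{\ell,m}\cap J^n_h)\,\lambda_2(I_{\ell',m'}\cap J^n_h) - \lambda_2(I_\ell\cap J^n_h)\,\lambda_2(I_{\ell'}\cap J^n_h)$$
(the two sums that are subtracted and added cancel, since $N/p = p$).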

On one side,
$$\widehat{V}_1(\ell,\ell') = N\sum_{m}\lambda_2(I_{\ell,m}\cap J^n_h)\left(\frac{1}{p(p-1)}\sum_{m':\,m'_1=m_1,\ m'_2\neq m_2}\lambda_2(I_{\ell',m'}\cap J^n_h) - \frac{1}{p}\,\lambda_2(I_{\ell',m}\cap J^n_h)\right). \tag{19}$$

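Since both terms inside the parentheses lie in $[0, 1/(pN^2)]$ (each $\lambda_2(I_{\ell',m'}\cap J^n_h)$ is at most $1/N^2$) and $\sum_m\lambda_2(I_{\ell,m}\cap J^n_h)\le\lambda_2(I_\ell) = 1/N$, we have $|\widehat{V}_1(\ell,\ell')|\le 1/(pN^2)$, and so
$$\left|\sum_{\ell':\,\ell'_1\neq\ell_1,\ \ell'_2=\ell_2}\widehat{V}_1(\ell,\ell')\right| \le \frac{p-1}{pN^2}.$$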
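The split $V_1(\ell,\ell') = \widehat{V}_1(\ell,\ell') + \check{V}_1(\ell,\ell')$ and the bound $|\widehat{V}_1(\ell,\ell')|\le 1/(pN^2)$ can be checked numerically. The Python sketch below assumes hypothetical values $q_h(i^n_k)\in[0,1]$, $k = 1,\dots,N$, for already sorted chains, and uses that the first-coordinate range $H_{\ell_1,m_1}$ of $I_{\ell,m}$ is exactly the strip $((k-1)/N, k/N]$ of chain $k = p(\ell_1-1)+m_1$, on which $J^n_h$ is $(0, q_h(i^n_k)]$ in the second coordinate. The function names are illustrative.

```python
import random


def cell(q, p, l1, l2, m1, m2):
    """lambda_2(I_{l,m} ∩ J): the x-range of I_{l,m} is the strip of chain
    k = p(l1-1) + m1, on which J is (0, q[k-1]] in the y direction."""
    n = p * p
    k = p * (l1 - 1) + m1
    lower = (l2 - 1) / p + (m2 - 1) / n
    return min(max(q[k - 1] - lower, 0.0), 1.0 / n) / n


def v1_split(q, p, l, lp):
    """Return (V_1(l,l'), hat V_1(l,l'), check V_1(l,l')) for l1 != l1', l2 == l2'."""
    (l1, l2), (lp1, lp2) = l, lp
    n = p * p
    s_neq = s_all = 0.0  # sums over m1 = m1', with and without m2 != m2'
    sq = sq_p = 0.0      # lambda_2(I_l ∩ J) and lambda_2(I_l' ∩ J)
    for m1 in range(1, p + 1):
        for m2 in range(1, p + 1):
            a = cell(q, p, l1, l2, m1, m2)
            sq += a
            sq_p += cell(q, p, lp1, lp2, m1, m2)
            for mp2 in range(1, p + 1):
                prod = a * cell(q, p, lp1, lp2, m1, mp2)  # m1' = m1
                s_all += prod
                if mp2 != m2:
                    s_neq += prod
    v1 = n / (p - 1) * s_neq - sq * sq_p
    v_hat = n / (p - 1) * s_neq - n / p * s_all
    v_check = p * s_all - sq * sq_p
    return v1, v_hat, v_check


if __name__ == "__main__":
    rng = random.Random(5)
    p = 3
    q = [rng.random() for _ in range(p * p)]  # hypothetical q_h(i^n_k)
    v1, v_hat, v_check = v1_split(q, p, (1, 2), (3, 2))
    print(v1, v_hat + v_check, abs(v_hat), "<=", 1 / (p * (p * p) ** 2))
```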

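On the other side,
$$\check{V}_1(\ell,\ell') = \sum_{m_1\in[1,p]}\lambda_2\big((H_{\ell_1,m_1}\times H_{\ell_2})\cap J^n_h\big)\sum_{m'_1\in[1,p]}\check{V}_1(\ell,\ell',m_1,m'_1),$$
where
$$\check{V}_1(\ell,\ell',m_1,m'_1) := \lambda_2\big((H_{\ell'_1,m_1}\times H_{\ell_2})\cap J^n_h\big) - \lambda_2\big((H_{\ell'_1,m'_1}\times H_{\ell_2})\cap J^n_h\big).$$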

We have
$$\check{V}_1(\ell,\ell',m_1,m'_1) = \frac{1}{N}\left(\lambda\Big(H_{\ell_2}\cap\big(0, q_h(i^n_{p(\ell'_1-1)+m_1})\big]\Big) - \lambda\Big(H_{\ell_2}\cap\big(0, q_h(i^n_{p(\ell'_1-1)+m'_1})\big]\Big)\right).$$

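As we have
$$\big|\check{V}_1(\ell,\ell',m_1,m'_1)\big| \le \frac{1}{N}\,\lambda\left(H_{\ell_2}\cap\left[\min_{p(\ell'_1-1)<k\le p\ell'_1} q_h(i^n_k),\ \max_{p(\ell'_1-1)<k\le p\ell'_1} q_h(i^n_k)\right]\right),$$
and since the sum over $m'_1$ has $p$ terms, $\sum_{m_1}\lambda_2\big((H_{\ell_1,m_1}\times H_{\ell_2})\cap J^n_h\big)\le\lambda_2(I_\ell) = 1/N$ and $p/N^2 = 1/(Np)$, we deduce
$$\big|\check{V}_1(\ell,\ell')\big| \le \frac{1}{Np}\,\lambda\left(H_{\ell_2}\cap\left[\min_{p(\ell'_1-1)<k\le p\ell'_1} q_h(i^n_k),\ \max_{p(\ell'_1-1)<k\le p\ell'_1} q_h(i^n_k)\right]\right).$$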

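Consequently,
$$\sum_{\ell':\,\ell'_1\neq\ell_1,\ \ell'_2=\ell_2}\big|\check{V}_1(\ell,\ell')\big| \le \frac{1}{Np}\sum_{\ell'_1\in[1,p]:\,\ell'_1\neq\ell_1}\lambda\left(H_{\ell_2}\cap\left[\min_{p(\ell'_1-1)<k\le p\ell'_1} q_h(i^n_k),\ \max_{p(\ell'_1-1)<k\le p\ell'_1} q_h(i^n_k)\right]\right). \tag{20}$$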

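Since $q_h$ is a piecewise monotonic sequence, and because the states are relabeled so that $i^n_1\le\cdots\le i^n_N$, the intervals
$$\left(\min_{p(\ell'_1-1)<k\le p\ell'_1} q_h(i^n_k),\ \max_{p(\ell'_1-1)<k\le p\ell'_1} q_h(i^n_k)\right),\qquad \ell'_1\in[1,p],$$
are pairwise disjoint on each of the $r$ pieces where $q_h$ is monotonic. Since $\lambda(H_{\ell_2}) = 1/p$, we obtain
$$\sum_{\ell':\,\ell'_1\neq\ell_1,\ \ell'_2=\ell_2}\big|\check{V}_1(\ell,\ell')\big| \le \frac{r}{Np}\,\lambda(H_{\ell_2}) \le \frac{r}{N^2}.$$
Adding the bound on $\sum_{\ell'}\widehat{V}_1(\ell,\ell')$ gives $\frac{p-1}{pN^2} + \frac{r}{N^2} = \frac{(r+1)p-1}{pN^2}$ for each $\ell$, and so, using the bound (13):
$$\big|V_1\big(\widehat{Z}^{n+1}_{s_h}\big)\big| \le \frac{(r+1)p-1}{pN^{3/2}}\left(\mathrm{TV}(q_h) + 2\right).$$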
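For a monotone $q_h$ ($r = 1$), the following Python sketch evaluates $V_1(\widehat{Z}^{n+1}_{s_h})$ directly from (18) and checks both $\max_\ell\sum_{\ell'}|\check{V}_1(\ell,\ell')|\le r/N^2$ and the final bound. It sums over all $\ell$, not only the subsquares that straddle the boundary of $J^n_h$; this is harmless because the terms $V_1(\ell,\ell')$ for the other $\ell$ are zero. The state space, the profile of $q_h$ and the function names are hypothetical.

```python
import random


def cell(q, p, l1, l2, m1, m2):
    """lambda_2(I_{l,m} ∩ J): the x-range of I_{l,m} is the strip of chain
    k = p(l1-1) + m1, on which J is (0, q[k-1]] in the y direction."""
    n = p * p
    k = p * (l1 - 1) + m1
    lower = (l2 - 1) / p + (m2 - 1) / n
    return min(max(q[k - 1] - lower, 0.0), 1.0 / n) / n


def v1_terms(q, p, l, lp):
    """(V_1(l, l'), check V_1(l, l')) for l1 != l1' and l2 == l2'."""
    (l1, l2), (lp1, lp2) = l, lp
    n = p * p
    s_neq = s_all = sq = sq_p = 0.0
    for m1 in range(1, p + 1):
        for m2 in range(1, p + 1):
            a = cell(q, p, l1, l2, m1, m2)
            sq += a
            sq_p += cell(q, p, lp1, lp2, m1, m2)
            for mp2 in range(1, p + 1):
                prod = a * cell(q, p, lp1, lp2, m1, mp2)  # m1' = m1
                s_all += prod
                if mp2 != m2:
                    s_neq += prod
    return n / (p - 1) * s_neq - sq * sq_p, p * s_all - sq * sq_p


def v1_total(q_sorted, p):
    """V_1 over ordered pairs (l, l') with l1 != l1', l2 == l2', and the largest
    per-l value of sum_{l'} |check V_1(l, l')|."""
    total, worst = 0.0, 0.0
    for l1 in range(1, p + 1):
        for l2 in range(1, p + 1):
            acc = 0.0
            for lp1 in range(1, p + 1):
                if lp1 != l1:
                    v1, v_check = v1_terms(q_sorted, p, (l1, l2), (lp1, l2))
                    total += v1
                    acc += abs(v_check)
            worst = max(worst, acc)
    return total, worst


if __name__ == "__main__":
    rng = random.Random(3)
    p, n_states = 4, 25
    n = p * p
    # Hypothetical decreasing profile, so r = 1.
    q = [1.0 - (i / (n_states - 1)) ** 2 for i in range(n_states)]
    states = sorted(rng.randrange(n_states) for _ in range(n))
    v1, worst = v1_total([q[i] for i in states], p)
    tv = sum(abs(b - a) for a, b in zip(q, q[1:]))
    print(worst, "<=", 1 / n ** 2)
    print(abs(v1), "<=", (2 * p - 1) / (p * n ** 1.5) * (tv + 2))
```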

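ii. We have
$$V_2\big(\widehat{Z}^{n+1}_{s_h}\big) = \sum_{\ell:\,I_\ell\not\subset J^n_h,\ I_\ell\cap J^n_h\neq\emptyset}\ \sum_{\ell':\,\ell'_1=\ell_1,\ \ell'_2\neq\ell_2} V_2(\ell,\ell'),$$
where
$$V_2(\ell,\ell') := \frac{N}{p-1}\sum_{(m,m'):\,m_1\neq m'_1,\ m_2=m'_2}\lambda_2(I_{\ell,m}\cap J^n_h)\,\lambda_2(I_{\ell',m'}\cap J^n_h) - \lambda_2(I_\ell\cap J^n_h)\,\lambda_2(I_{\ell'}\cap J^n_h).$$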

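We split $V_2(\ell,\ell') = \widehat{V}_2(\ell,\ell') + \check{V}_2(\ell,\ell')$, with
$$\widehat{V}_2(\ell,\ell') := \frac{N}{p-1}\sum_{(m,m'):\,m_1\neq m'_1,\ m_2=m'_2}\lambda_2(I_{\ell,m}\cap J^n_h)\,\lambda_2(I_{\ell',m'}\cap J^n_h) - \frac{N}{p}\sum_{(m,m'):\,m_2=m'_2}\lambda_2(I_{\ell,m}\cap J^n_h)\,\lambda_2(I_{\ell',m'}\cap J^n_h),$$
$$\check{V}_2(\ell,\ell') := p\sum_{(m,m'):\,m_2=m'_2}\lambda_2(I_{\ell,m}\cap J^n_h)\,\lambda_2(I_{\ell',m'}\cap J^n_h) - \lambda_2(I_\ell\cap J^n_h)\,\lambda_2(I_{\ell'}\cap J^n_h).$$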

On one side,
$$\widehat{V}_2(\ell,\ell') = N\sum_{m}\lambda_2(I_{\ell,m}\cap J^n_h)\left(\frac{1}{p(p-1)}\sum_{m':\,m'_1\neq m_1,\ m'_2=m_2}\lambda_2(I_{\ell',m'}\cap J^n_h) - \frac{1}{p}\,\lambda_2(I_{\ell',m}\cap J^n_h)\right). \tag{21}$$

Since both terms inside the parentheses lie in $[0, 1/(pN^2)]$, we have $|\widehat{V}_2(\ell,\ell')|\le 1/(pN^2)$, and so
$$\left|\sum_{\ell':\,\ell'_1=\ell_1,\ \ell'_2\neq\ell_2}\widehat{V}_2(\ell,\ell')\right| \le \frac{p-1}{pN^2}.$$

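On the other side,
$$\check{V}_2(\ell,\ell') = \sum_{m_2\in[1,p]}\lambda_2\big((H_{\ell_1}\times H_{\ell_2,m_2})\cap J^n_h\big)\sum_{m'_2\in[1,p]}\check{V}_2(\ell,\ell',m_2,m'_2),$$
where
$$\check{V}_2(\ell,\ell',m_2,m'_2) := \lambda_2\big((H_{\ell_1}\times H_{\ell'_2,m_2})\cap J^n_h\big) - \lambda_2\big((H_{\ell_1}\times H_{\ell'_2,m'_2})\cap J^n_h\big);$$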

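we have
$$\check{V}_2(\ell,\ell',m_2,m'_2) = \frac{1}{N}\sum_{m'_1\in[1,p]}\left(\lambda\Big(H_{\ell'_2,m_2}\cap\big(0, q_h(i^n_{p(\ell_1-1)+m'_1})\big]\Big) - \lambda\Big(H_{\ell'_2,m'_2}\cap\big(0, q_h(i^n_{p(\ell_1-1)+m'_1})\big]\Big)\right).$$
Note that the difference in the parentheses is equal to $0$ if $\ell'_2 \neq \ell'_2(\ell_1, m'_1) := \big\lfloor p\, q_h(i^n_{p(\ell_1-1)+m'_1})\big\rfloor + 1$, since both terms are then equal to $0$ or both to $1/N$. Consequently,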

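$$\begin{aligned}
\left|\sum_{\ell':\,\ell'_1=\ell_1,\ \ell'_2\neq\ell_2}\check{V}_2(\ell,\ell')\right|
&\le \frac{1}{N}\sum_{m_2\in[1,p]}\lambda_2\big((H_{\ell_1}\times H_{\ell_2,m_2})\cap J^n_h\big)\\
&\quad\times\sum_{m'}\left|\lambda\Big(H_{\ell'_2(\ell_1,m'_1),m_2}\cap\big(0, q_h(i^n_{p(\ell_1-1)+m'_1})\big]\Big) - \lambda\Big(H_{\ell'_2(\ell_1,m'_1),m'_2}\cap\big(0, q_h(i^n_{p(\ell_1-1)+m'_1})\big]\Big)\right|\\
&\le \frac{1}{N}\sum_{m_2\in[1,p]}\lambda_2\big((H_{\ell_1}\times H_{\ell_2,m_2})\cap J^n_h\big) \le \frac{1}{N}\,\lambda_2(I_\ell) = \frac{1}{N^2},
\end{aligned}$$
where the second inequality holds because the sum over $m'\in[1,p]^2$ has $N$ terms, each at most $1/N$.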
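The key fact behind the last inequality is that, for each chain of column $\ell_1$, only one row index $\ell'_2$ can produce a nonzero difference. The Python sketch below checks numerically that $\lambda(H_{\ell'_2,m_2}\cap(0,q]) - \lambda(H_{\ell'_2,m'_2}\cap(0,q])$ vanishes for all $m_2, m'_2$ unless $\ell'_2 = \lfloor p\,q\rfloor + 1$. The helper names and the tolerance used to absorb floating-point rounding are illustrative choices.

```python
import math
import random


def overlap(lo, hi, q):
    """Length of (lo, hi] ∩ (0, q] for 0 <= lo <= hi."""
    return max(0.0, min(hi, q) - lo)


def contributing_rows(p, q, tol=1e-12):
    """Rows l2' in [1, p] for which lambda(H_{l2',m2} ∩ (0, q]) differs from
    lambda(H_{l2',m2'} ∩ (0, q]) for at least one pair (m2, m2')."""
    n = p * p
    rows = set()
    for l2p in range(1, p + 1):
        base = (l2p - 1) / p
        lengths = [overlap(base + (m2 - 1) / n, base + m2 / n, q)
                   for m2 in range(1, p + 1)]
        if max(lengths) - min(lengths) > tol:  # some difference is nonzero
            rows.add(l2p)
    return rows


if __name__ == "__main__":
    rng = random.Random(17)
    p = 4
    for _ in range(5):
        q = rng.random()
        print(f"q = {q:.3f}: rows {sorted(contributing_rows(p, q))}, "
              f"floor(p*q) + 1 = {math.floor(p * q) + 1}")
```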

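For each $\ell$, the two bounds add up to $\frac{p-1}{pN^2} + \frac{1}{N^2} = \frac{2p-1}{pN^2}$. And so, using the bound (13):
$$\big|V_2\big(\widehat{Z}^{n+1}_{s_h}\big)\big| \le \frac{2p-1}{pN^{3/2}}\left(\mathrm{TV}(q_h) + 2\right).$$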
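Unlike the bound on $V_1$, the bound on $V_2$ does not use the monotonicity of $q_h$. The Python sketch below computes $V_2(\widehat{Z}^{n+1}_{s_h})$ from (18) for arbitrary hypothetical values of $q_h$ on a small state space, with the chains sorted by state, so that the bound can be exercised numerically; the function names are illustrative.

```python
import random


def cell(q, p, l1, l2, m1, m2):
    """lambda_2(I_{l,m} ∩ J): the x-range of I_{l,m} is the strip of chain
    k = p(l1-1) + m1, on which J is (0, q[k-1]] in the y direction."""
    n = p * p
    k = p * (l1 - 1) + m1
    lower = (l2 - 1) / p + (m2 - 1) / n
    return min(max(q[k - 1] - lower, 0.0), 1.0 / n) / n


def square(q, p, l1, l2):
    """lambda_2(I_l ∩ J)."""
    return sum(cell(q, p, l1, l2, m1, m2)
               for m1 in range(1, p + 1) for m2 in range(1, p + 1))


def v2_total(q_sorted, p):
    """V_2 = sum over ordered pairs (l, l') with l1 == l1', l2 != l2' of
    N/(p-1) * sum_{m1 != m1', m2 == m2'} (...) - lambda_2(I_l ∩ J) lambda_2(I_l' ∩ J)."""
    n = p * p
    total = 0.0
    for l1 in range(1, p + 1):
        for l2 in range(1, p + 1):
            sq = square(q_sorted, p, l1, l2)
            for lp2 in range(1, p + 1):
                if lp2 == l2:
                    continue
                s = 0.0
                for m1 in range(1, p + 1):
                    for m2 in range(1, p + 1):
                        a = cell(q_sorted, p, l1, l2, m1, m2)
                        for mp1 in range(1, p + 1):
                            if mp1 != m1:  # m1' != m1 and m2' = m2
                                s += a * cell(q_sorted, p, l1, lp2, mp1, m2)
                total += n / (p - 1) * s - sq * square(q_sorted, p, l1, lp2)
    return total


if __name__ == "__main__":
    rng = random.Random(13)
    p, n_states = 3, 20
    n = p * p
    q = [rng.random() for _ in range(n_states)]  # hypothetical, non-monotone q_h
    states = sorted(rng.randrange(n_states) for _ in range(n))
    tv = sum(abs(b - a) for a, b in zip(q, q[1:]))
    v2 = v2_total([q[i] for i in states], p)
    print(abs(v2), "<=", (2 * p - 1) / (p * n ** 1.5) * (tv + 2))
```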

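iii. We have
$$V_3\big(\widehat{Z}^{n+1}_{s_h}\big) = \sum_{\ell:\,I_\ell\not\subset J^n_h,\ I_\ell\cap J^n_h\neq\emptyset}\ \ \sum_{\substack{\ell':\,\ell'_1\neq\ell_1,\ \ell'_2\neq\ell_2\\ I_{\ell'}\not\subset J^n_h,\ I_{\ell'}\cap J^n_h\neq\emptyset}} V_3(\ell,\ell'),$$
where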