Sudoku Latin Square Sampling for Markov Chain Simulation
Rami El Haddad, Joseph El Maalouf, Christian Lécot, and Pierre L'Ecuyer
Abstract We are interested in Monte Carlo simulations of discrete-time Markov chains on discrete and totally ordered state spaces. To improve simulation efficiency, we use a technique previously introduced in the context of quasi-Monte Carlo simulation of an array of $N$ Markov chains. This method simulates the $N$ copies of the chain simultaneously, reorders the chains at each step by increasing order of their states, and samples the next state by using $N$ two-dimensional points in the unit square. The first coordinate of each point is used to match a chain, and the second coordinate is used to sample the next state by inversion from its cumulative distribution function conditional on the current state. We study the case where the $N$ points are obtained at each step from Sudoku Latin square sampling, which means that (1) if the unit square is uniformly divided into $N$ identical subsquares, exactly one point lies in each subsquare, (2) for each axis, the $N$ projections of the points are distributed with exactly one projection in each of the $N$ subintervals of length $1/N$ that partition the unit interval, and (3) in both cases, each individual point has the uniform distribution in the subsquare and interval to which it belongs. We prove that the variance of the Sudoku Latin square sampling estimator is of order $O(N^{-3/2})$. The same convergence rate is obtained when property (2) is removed, which gives simple stratified sampling. However, in our numerical experiments, we observe empirically a much smaller variance and better efficiency with the Sudoku Latin square sampling than with simple stratified sampling alone.
Rami El Haddad · Joseph El Maalouf
Laboratoire de Mathématiques et Applications, U.R. Mathématiques et modélisation, Faculté des sciences, Université Saint-Joseph, B.P. 7-5208, Mar Mikhaël Beyrouth 1104 2020, Liban
Christian Lécot
Université Grenoble Alpes, Université Savoie Mont Blanc, CNRS, LAMA, 73000 Chambéry, France
Pierre L'Ecuyer
DIRO, Université de Montréal, C.P. 6128, Succ. Centre-Ville, Montréal, H3C 3J7, Canada
1 Introduction
We consider a discrete-time Markov chain $\{X_n, n \ge 0\}$ over a countable state space $\mathcal{X}$, where $X_n \in \mathcal{X}$ is the state at step $n$, and we are interested in estimating by simulation the expected cost at step $n$, $\mathbb{E}[c(X_n)]$, for one or several cost functions $c : \mathcal{X} \to [0, \infty)$ (we assume non-negative cost functions for simplicity). If the state space is finite with small cardinality, and sometimes when the chain has a very special structure, it is possible to compute the exact distribution of $X_n$ and the exact expected cost at step $n$, for any $n$. Otherwise, one can use standard Monte Carlo (MC): simulate the chain until step $n$, repeat $N$ times independently, and average the $N$ realizations of $c(X_n)$. The main drawback of this general approach is its slow convergence: the variance of the Monte Carlo estimator of $\mathbb{E}[c(X_n)]$ typically converges as $O(N^{-1})$ for any $n$.
A (deterministic) quasi-Monte Carlo (QMC) method for Markov chains has been proposed in [9] for the case where the chain has a totally ordered state space. The method simulates an array of $N$ copies of the chain in parallel. At each step $n$, it reorders the chains by increasing order of their states, and it uses two-dimensional quasi-random points to move them ahead by one step. Convergence (in the deterministic sense) of the average cost to the expectation when $N \to \infty$ was established, and the QMC approach outperformed plain MC in numerical experiments. However, QMC error bounds are typically too loose and inconvenient for practical error assessment. A randomized quasi-Monte Carlo (RQMC) approach named Array-RQMC, which resembles the previous QMC scheme, was proposed and analyzed in [11, 12], in the setting of a Markov chain model with general state space. The method was shown to provide an unbiased estimator of $\mathbb{E}[c(X_n)]$ for any $n$, and variance bounds for this estimator were proved under certain conditions. In particular, it was proved that for a Markov chain with a one-dimensional state space, if stratified sampling as in [2, 8] is used at each step to advance the array of chains by one step, and under some technical conditions, the variance converges as $O(N^{-3/2})$, which beats Monte Carlo. In numerical experiments with Markov chains having one-dimensional and higher-dimensional states, the empirical variance was typically much smaller than the Monte Carlo variance, and was often observed to decrease at better rates than $O(N^{-3/2})$, sometimes even faster than $O(N^{-2})$: see [12, 13, 14], for example.
However, no proof of these faster rates is available so far, and the $O(N^{-3/2})$ rate has been proved only for ordinary stratification of the unit square in identical subsquares, for a one-dimensional state. A related convergence-rate result worth mentioning was obtained in [7], in the context of particle filters. The authors proved that if the RQMC point set used at each step is a $(t, m, s)$-net with a nested uniform scramble [18] and if the states are sorted using a Hilbert curve when their dimension is larger than 1, then the variance of the Array-RQMC estimator converges as $o(N^{-1})$, which is faster than Monte Carlo.
The aim of this paper is to increase our theoretical understanding of the method by expanding the class of sampling methods for which an $O(N^{-3/2})$ convergence rate is proved. We revisit the simple stratified sampling (SSS) setting and we consider a Sudoku Latin square sampling (SLSS) setting, which combines two-dimensional stratified sampling with Latin hypercube sampling [15, 20]. Our theoretical results are consolidated by three numerical experiments in which we observe a significantly lower variance with SLSS than with simple stratification.
SLSS turns out to be a special case of the U-sampling method of [21] for sampling $N$ points in the unit hypercube. U-sampling first generates a random orthogonal array-based Latin hypercube design, which is a selection of $N$ small cubic boxes of side length $1/N$ that form an orthogonal array of strength $t$ [16, 17] and a Latin hypercube at the same time. Then it samples one point uniformly inside each selected small box, independently across the boxes. For the special case where $t = 2$, this type of design (the selection of the boxes) gives a Sudoku Latin square [19] for each two-dimensional projection of the points. Thus, each two-dimensional projection satisfies the properties (1) to (3) mentioned in the abstract. An example of a Sudoku Latin square is given in Fig. 1. Since our SLSS is in two dimensions, there is a single two-dimensional projection and it must form a Sudoku Latin square. Sudoku Latin squares are also studied in [22], although these authors consider only discrete designs and space-filling constructions, not the sampling of random points uniformly in the unit hypercube.
Fig. 1 Example of a Sudoku Latin square with 16 points.
A different sampling method that generalizes the SLSS to more than two dimensions was studied in [5]. In $d$ dimensions, that method generates $N = p^d$ points in a way that (1) there is always one point per subcube when we partition the $d$-dimensional unit cube into $N$ identical subcubes, (2) there is one value in each subinterval when we project all the points over a single coordinate to obtain $N$ values in the unit interval, and we partition the unit interval into $N$ subintervals of length $1/N$, and (3) each point taken individually has the uniform distribution in the subcube to which it belongs. Variance bounds have been obtained when the integral of a function over the unit cube is estimated by the average of the function values at the $N$ points, under certain assumptions on the integrand. Different types of variance results, in terms of the ANOVA decomposition of the integrand, were proved in [21] for the same type of integration problem with U-sampling. SLSS is the two-dimensional special case of each of these two methods.
Our paper is the first to study the use of SLSS in the context of simulating an array of Markov chains. SLSS gives stronger constructions than simple stratified points over the unit square. Our aim is to investigate if, and how much, this strengthening has an impact on the variance of expected cost estimators.
The remainder is organized as follows. In Sect. 2, we define our setting for Monte Carlo simulation of discrete-state Markov chains and explain how we proceed with plain (standard) Monte Carlo, with SSS, and with SLSS. With SSS, the way we map the points to chain states at each step follows [6, 9] and differs from what was done in [12]. In Sect. 3, we analyze the variance of these schemes. We prove that the variance of the simulation estimator of an expected state-dependent cost at any given step $n$ is $O(N^{-3/2})$ for both SSS and SLSS. This beats the known rate of $O(N^{-1})$ for standard Monte Carlo. Results of computational experiments and comparison between standard Monte Carlo, SSS, and SLSS are given in Sect. 4. The empirical convergence rates of the variance for SSS and SLSS are close to those established in Sect. 3, but the variance with SLSS is significantly smaller than with SSS. In Sect. 5, we give the technical proofs of some results and we conclude in Sect. 6.
2 Monte Carlo Simulations of Markov Chains
Let $\{X_n, n \ge 0\}$ be a stationary discrete-time Markov chain over a countable and ordered state space $\mathcal{X}$. Without loss of generality one can assume that $\mathcal{X} = \mathbb{N}$ or $\mathbb{Z}$. Let $P(i, j) = \mathbb{P}(X_{n+1} = j \mid X_n = i)$ denote the transition probabilities and $\mathbf{P} = (P(i, j) : (i, j) \in \mathcal{X}^2)$ the transition probability matrix. We denote by $\mu_n(i) = \mathbb{P}(X_n = i)$ the state probabilities at step $n$ and $\mu_n = (\mu_n(i) : i \in \mathcal{X})$ the probability vector for step $n$. We assume that the initial probability vector $\mu_0$ is given (often, it is degenerate over a single state).
For $i, j \in \mathcal{X}$ we set
$$q_j(i) := \sum_{h \le j} P(i, h). \tag{1}$$
We define the conditional cumulative distribution function $F_i(j) := \mathbb{P}(X_{n+1} \le j \mid X_n = i) = q_j(i)$. If $I$ denotes the unit interval $(0, 1]$, we have a disjoint union $I = \bigcup_{j \in \mathcal{X}} I_{i,j}$, where $I_{i,j} := (q_{j-1}(i), q_j(i)]$. Hence, for any $i \in \mathcal{X}$ and $u \in I$, there exists a unique $j \in \mathcal{X}$ such that $u \in I_{i,j}$; we denote it by $F_i^{-1}(u)$. If $X_n = i$, then $F_i^{-1}(u)$ is the next state.
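The inversion step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the chain has finitely many states labelled 0, 1, ..., each row of the transition matrix is stored as a list, and the helper name `inverse_cdf` is hypothetical (it does not come from the paper).

```python
from bisect import bisect_left
from itertools import accumulate


def inverse_cdf(row, u):
    """Return F_i^{-1}(u) for u in (0, 1], where row[j] = P(i, j).

    Assumption: states are labelled 0, 1, ..., len(row) - 1.
    """
    cdf = list(accumulate(row))  # cdf[j] = q_j(i)
    # Smallest j with u <= q_j(i), i.e. u lies in (q_{j-1}(i), q_j(i)].
    j = bisect_left(cdf, u)
    # Guard against a final cumulative sum slightly below 1 due to rounding.
    return min(j, len(row) - 1)


row = [0.25, 0.5, 0.25]
next_states = [inverse_cdf(row, u) for u in (0.25, 0.26, 0.75, 1.0)]
```

With this row, the intervals are $(0, 0.25]$, $(0.25, 0.75]$ and $(0.75, 1]$, so the four values of $u$ map to the states 0, 1, 1 and 2.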
Let $\delta_i$ be the Dirac measure at $i$, defined by
$$\delta_i(j) = \begin{cases} 1 & \text{if } j = i, \\ 0 & \text{otherwise.} \end{cases}$$
For any integer $n$, the distribution $\mu_n$ is approximated by
$$\hat{\mu}_n := \frac{1}{N} \sum_{k=1}^{N} \delta_{i_k^n},$$
where $N$ is a fixed integer and $i_1^n, \dots, i_N^n$ are calculated iteratively.
First, a set $(i_k^0 : 1 \le k \le N)$ of $N$ states is sampled from $\mu_0$: several techniques are proposed in [4]. In many applications, the initial state of the chain is fixed, and then $\hat{\mu}_0 = \mu_0$.
We describe the transition from step $n$ to $n + 1$ for three Monte Carlo methods. We introduce $\tilde{\mu}_{n+1} := \hat{\mu}_n \mathbf{P}$ as an intermediate distribution (which is not used in effective calculations). This $\tilde{\mu}_{n+1}$ is an approximation of $\mu_{n+1}$, but it is generally not an equally-weighted sum of Dirac measures, like $\hat{\mu}_n$, so that an additional step is needed. We formulate this step as a quadrature: the MC methods correspond to quadrature algorithms, possibly combined with variance reduction techniques. To that end, let us consider an arbitrary sequence $s = (s(i) : i \in \mathcal{X})$ (a column vector); we assume that $s$ is non-negative, just to avoid worrying about convergence of series. Then
$$\tilde{\mu}_{n+1} s = \hat{\mu}_n \mathbf{P} s = \frac{1}{N} \sum_{k=1}^{N} \sum_{j \in \mathcal{X}} P(i_k^n, j)\, s(j).$$
Let $1_k$ be the indicator function of the interval $I_k := ((k - 1)/N, k/N]$ and let $1_{i,j}$ denote the indicator function of the interval $I_{i,j}$. If we associate to $s$ the function $C_s^n$ defined by
$$C_s^n(u) := \sum_{k=1}^{N} \sum_{j \in \mathcal{X}} 1_k(u_1)\, 1_{i_k^n, j}(u_2)\, s(j), \quad u = (u_1, u_2) \in I^2, \tag{2}$$
then we have
$$\tilde{\mu}_{n+1} s = \int_{I^2} C_s^n(u)\, du.$$
We obtain $\hat{\mu}_{n+1}$ by approximating the integral with Monte Carlo estimation. In the following, if $m$ is an integer, we denote $[1, m] := \{1, 2, \dots, m\}$. The notation $U \sim \mathcal{U}(E)$ means that $U$ is a random variable uniformly distributed over the set $E$.
2.1 Standard Monte Carlo
The transition from step $n$ to step $n + 1$ acts as follows: if the state of the chain is $i$, i.e. $X_n = i$, then a random number $U$ with $U \sim \mathcal{U}(I)$ is generated and the new state of the chain is $F_i^{-1}(U)$, i.e. $X_{n+1} = F_{X_n}^{-1}(U)$. The operation is repeated $N$ times independently, in order to advance $N$ copies of the chain. With our notation, this may be written as follows. Let $\{U_k : 1 \le k \le N\}$ be independent random variables with $U_k \sim \mathcal{U}(I)$; then
$$i_k^{n+1} = F_{i_k^n}^{-1}(U_k), \quad 1 \le k \le N.$$
That is, if, for any non-negative sequence $s$,
$$\hat{X}_s^{n+1} := \frac{1}{N} \sum_{k=1}^{N} C_s^n\!\left(\frac{k - 1}{N}, U_k\right), \tag{3}$$
then
$$\hat{\mu}_{n+1} s = \hat{X}_s^{n+1}. \tag{4}$$
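As an illustration of the standard Monte Carlo transition, the following Python sketch advances $N$ independent copies by inversion and averages a cost at step $n$. It assumes a finite state space labelled 0, 1, ..., a transition matrix stored as a list of rows, and a fixed initial state; the names `mc_step` and `mc_estimate` are hypothetical.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def mc_step(states, P, rng):
    """Advance N independent copies of the chain by one step.

    states: current states i_1^n, ..., i_N^n (labels 0..len(P)-1, an assumption).
    P: transition matrix as a list of rows.
    """
    cdfs = [list(accumulate(row)) for row in P]
    new_states = []
    for i in states:
        u = 1.0 - rng.random()       # U_k uniform on (0, 1]
        j = bisect_left(cdfs[i], u)  # inversion: F_i^{-1}(U_k)
        new_states.append(min(j, len(P) - 1))
    return new_states


def mc_estimate(P, start, n_steps, n_copies, cost, rng):
    """Average of c(X_n) over n_copies independent paths started at `start`."""
    states = [start] * n_copies
    for _ in range(n_steps):
        states = mc_step(states, P, rng)
    return sum(cost(i) for i in states) / n_copies


# Deterministic two-state chain that swaps states at every step.
P_swap = [[0.0, 1.0], [1.0, 0.0]]
estimate = mc_estimate(P_swap, 0, 3, 5, lambda i: i, random.Random(1))
```

The swap chain is used here only because its outcome does not depend on the random numbers, which makes the sketch easy to check by hand.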
2.2 Simple Stratified Sampling
We suppose that $N = p^2$, for some integer $p > 0$. The transition from $n$ to $n + 1$ has two steps: renumbering of the states and numerical integration.
(S1) The states are relabeled so that $i_1^n \le \dots \le i_N^n$. The technique was used in the QMC context and ensures theoretical and numerical convergence of the scheme (see [9]).
(S2) Consider a partition of $I^2$ into $N$ squares: $I_\ell = H_{\ell_1} \times H_{\ell_2}$, where, for $\ell = (\ell_1, \ell_2) \in [1, p]^2$: $H_{\ell_1} := ((\ell_1 - 1)/p, \ell_1/p]$ and $H_{\ell_2} := ((\ell_2 - 1)/p, \ell_2/p]$. Let $\{V_\ell : \ell \in [1, p]^2\}$ be independent random variables, where $V_\ell = (V_{\ell,1}, V_{\ell,2}) \sim \mathcal{U}(I_\ell)$.
For an arbitrary non-negative sequence $s$, let
$$\hat{Y}_s^{n+1} := \frac{1}{N} \sum_{\ell \in [1,p]^2} C_s^n(V_\ell), \tag{5}$$
then
$$\hat{\mu}_{n+1} s = \hat{Y}_s^{n+1}. \tag{6}$$
If $u \in I$, let
$$\kappa(u) := \lceil N u \rceil, \tag{7}$$
where $\lceil x \rceil$ is the least integer greater than or equal to $x$. Hence equation (6) means that the next states are calculated as follows:
$$i^{n+1}_{(\ell_1 - 1)p + \ell_2} = F^{-1}_{i^n_{\kappa(V_{\ell,1})}}(V_{\ell,2}), \quad \ell \in [1, p]^2$$
(the numbering of the states $i_k^{n+1}$ is arbitrary). The first projection $V_{\ell,1}$ of $V_\ell$ is used for selecting the state at step $n$ and the second projection $V_{\ell,2}$ is used for performing the transition to step $n + 1$. Note that with this scheme, the mapping between the $N$ points and the $N$ states is not necessarily one-to-one: it is possible to pick the same state more than once and leave out some of the states. This differs from the SSS scheme used in [12, 14].
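Steps (S1) and (S2) of the SSS transition can be sketched as follows. This is a minimal illustration, assuming a finite state space labelled 0, 1, ... and a transition matrix stored as a list of rows; the function names `sss_points` and `sss_step` are hypothetical, and indices are 0-based in the code (so `l1` stands for $\ell_1 - 1$).

```python
import math
import random
from bisect import bisect_left
from itertools import accumulate


def sss_points(p, rng):
    """One point V_l, uniform in each of the p*p subsquares I_l of (0, 1]^2."""
    return [((l1 + 1.0 - rng.random()) / p, (l2 + 1.0 - rng.random()) / p)
            for l1 in range(p) for l2 in range(p)]


def sss_step(states, P, p, rng):
    """One SSS transition for N = p*p chains (states labelled 0..len(P)-1)."""
    n = p * p
    ordered = sorted(states)  # (S1): relabel so that states are non-decreasing
    cdfs = [list(accumulate(row)) for row in P]
    new_states = []
    for v1, v2 in sss_points(p, rng):  # (S2)
        k = min(max(math.ceil(n * v1), 1), n)  # kappa(v1) = ceil(N * v1)
        i = ordered[k - 1]                     # chain selected by the first coordinate
        j = bisect_left(cdfs[i], v2)           # transition by inversion
        new_states.append(min(j, len(P) - 1))
    return new_states


P_swap = [[0.0, 1.0], [1.0, 0.0]]  # deterministic two-state chain, for checking only
new_states = sss_step([0, 1, 1, 0], P_swap, 2, random.Random(0))
```

Because the first coordinates of the points in a given column of subsquares are independent, two of them may fall in the same interval $I_k$, which is why a state can be selected twice while another is skipped.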
2.3 Sudoku Latin Square Sampling
As before, we assume $N = p^2$, and the transition from $n$ to $n + 1$ has two steps: renumbering of the states and numerical integration.
(S1) The states are relabeled so that $i_1^n \le \dots \le i_N^n$.
(S2) We consider the same partition of $I^2$ as before: $I_\ell$ for $\ell \in [1, p]^2$. Let $\{W_\ell : \ell \in [1, p]^2\}$ be random variables, where $W_\ell = (W_{\ell,1}, W_{\ell,2})$, with
$$W_{\ell,1} = \frac{\ell_1 - 1}{p} + \frac{\sigma_1(\ell_2) - 1 + \xi_\ell^1}{p^2}, \qquad W_{\ell,2} = \frac{\ell_2 - 1}{p} + \frac{\sigma_2(\ell_1) - 1 + \xi_\ell^2}{p^2}.$$
Here $\sigma_1$ and $\sigma_2$ are random permutations of $[1, p]$, and $\xi_\ell^1 \sim \mathcal{U}(I)$ and $\xi_\ell^2 \sim \mathcal{U}(I)$. All these random variables are independent. Each point $W_\ell$ lies in $I_\ell$, and the point set $\{W_\ell : \ell \in [1, p]^2\}$ has the following properties:
(P1) for any $\ell \in [1, p]^2$, there is a unique point of this set in each square $I_\ell$,
(P2) for any $k \in [1, N]$, there is a unique point of this set in each rectangle $I \times I_k$ or $I_k \times I$.
In addition $W_\ell \sim \mathcal{U}(I_\ell)$. For an arbitrary non-negative sequence $s$, let
$$\hat{Z}_s^{n+1} := \frac{1}{N} \sum_{\ell \in [1,p]^2} C_s^n(W_\ell). \tag{8}$$
Then
$$\hat{\mu}_{n+1} s = \hat{Z}_s^{n+1}. \tag{9}$$
Due to property (P2), the mapping (see (7))
$$\ell := (\ell_1, \ell_2) \in [1, p]^2 \mapsto \kappa(W_{\ell,1}) \in [1, N]$$
is one-to-one: each state of step $n$ is considered exactly once for a transition (this is not the case with SSS). Equation (9) means that the next states are calculated as follows:
$$i^{n+1}_{(\ell_1 - 1)p + \ell_2} = F^{-1}_{i^n_{\kappa(W_{\ell,1})}}(W_{\ell,2}), \quad \ell \in [1, p]^2$$
(as before, the numbering of the states $i_k^{n+1}$ is arbitrary). The first projection $W_{\ell,1}$ of $W_\ell$ is used for selecting the state at step $n$ and the second projection $W_{\ell,2}$ is used for performing the transition to step $n + 1$.
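The construction of the SLSS points and the resulting transition can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: it assumes a finite state space labelled 0, 1, ..., uses 0-based indices (`l1` stands for $\ell_1 - 1$ and `sigma1[l2]` for $\sigma_1(\ell_2) - 1$), and the names `slss_points` and `slss_step` are hypothetical.

```python
import math
import random
from bisect import bisect_left
from itertools import accumulate


def slss_points(p, rng):
    """Sudoku Latin square sample of N = p*p points in (0, 1]^2."""
    sigma1 = list(range(p))  # plays the role of sigma_1 - 1
    sigma2 = list(range(p))  # plays the role of sigma_2 - 1
    rng.shuffle(sigma1)
    rng.shuffle(sigma2)
    n = p * p
    points = []
    for l1 in range(p):
        for l2 in range(p):
            xi1 = 1.0 - rng.random()  # uniform on (0, 1]
            xi2 = 1.0 - rng.random()
            w1 = l1 / p + (sigma1[l2] + xi1) / n
            w2 = l2 / p + (sigma2[l1] + xi2) / n
            points.append((w1, w2))
    return points


def slss_step(states, P, p, rng):
    """One SLSS transition for N = p*p chains (states labelled 0..len(P)-1)."""
    n = p * p
    ordered = sorted(states)  # (S1)
    cdfs = [list(accumulate(row)) for row in P]
    new_states = []
    for w1, w2 in slss_points(p, rng):  # (S2)
        k = min(max(math.ceil(n * w1), 1), n)  # kappa(w1); one-to-one by (P2)
        i = ordered[k - 1]
        j = bisect_left(cdfs[i], w2)
        new_states.append(min(j, len(P) - 1))
    return new_states


points = slss_points(4, random.Random(0))
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
moved = slss_step([2, 0, 1, 1], identity, 2, random.Random(0))
```

With the identity transition matrix, every chain keeps its state, so the output of `slss_step` is a rearrangement of the input states; this reflects the fact that each state is used exactly once.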
3 Convergence Analysis
In this section we prove, for each method, that the estimator of the expected cost at each step is unbiased and we establish that the variance of the estimator used is $O(N^{-1})$ for standard MC and $O(N^{-3/2})$ for SSS and SLSS, where $N$ is the number of simulation paths. In the following, $\lambda$ is the Lebesgue measure and $\lambda_2$ the two-dimensional Lebesgue measure; we write $|E|$ for the number of elements of a set $E$. We use the sequence $s_h$, for $h \in \mathcal{X}$:
$$s_h(i) := \begin{cases} 1 & \text{if } i \le h, \\ 0 & \text{otherwise.} \end{cases}$$
The total variation of a sequence $s = (s(i) : i \in \mathcal{X})$ is defined by
$$\mathrm{TV}(s) := \sum_{i \in \mathcal{X}} |s(i + 1) - s(i)|. \tag{10}$$
We use below the total variation of $q_h$, for $h \in \mathcal{X}$. We recall that we have from (1): $q_h(i) = \mathbb{P}(X_{n+1} \le h \mid X_n = i)$, and from (10):
$$\mathrm{TV}(q_h) = \sum_{i \in \mathcal{X}} |\mathbb{P}(X_{n+1} \le h \mid X_n = i + 1) - \mathbb{P}(X_{n+1} \le h \mid X_n = i)|.$$
In the following, we assume that
$$M := \sup_{h \in \mathcal{X}} \mathrm{TV}(q_h) < +\infty.$$
There are situations for which $q_h$ is monotone and situations for which $M < 1$ (or both), but this is not always true. See [13, 14] for examples and further discussion. Our $M$ corresponds to $\Lambda_j$ in [12].
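For a chain with finitely many states, the constant $M$ can be computed directly from the transition matrix. The sketch below assumes states labelled 0, 1, ..., with the sum in (10) restricted to consecutive pairs of states inside the state space; the function name `total_variation_bound` is hypothetical.

```python
from itertools import accumulate


def total_variation_bound(P):
    """M = max over h of TV(q_h), with q_h(i) = sum_{j <= h} P(i, j).

    Assumption: finite state space 0..len(P)-1, so the sum over i runs over
    consecutive pairs (i, i + 1) of existing states only.
    """
    cdfs = [list(accumulate(row)) for row in P]  # cdfs[i][h] = q_h(i)
    size = len(P)
    return max(
        sum(abs(cdfs[i + 1][h] - cdfs[i][h]) for i in range(size - 1))
        for h in range(size)
    )


P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
M = total_variation_bound(P)
```

For this three-state example, $q_0 = (0.5, 0.25, 0)$, $q_1 = (1, 0.75, 0.5)$ and $q_2 = (1, 1, 1)$, so $\mathrm{TV}(q_0) = \mathrm{TV}(q_1) = 0.5$, $\mathrm{TV}(q_2) = 0$ and $M = 0.5 < 1$.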
3.1 Standard Monte Carlo
Lemma 1. Let $s$ be a non-negative sequence. The standard Monte Carlo estimator of $\tilde{\mu}_{n+1} s$:
$$\hat{X}_s^{n+1} := \frac{1}{N} \sum_{k=1}^{N} C_s^n\!\left(\frac{k - 1}{N}, U_k\right)$$
has the following properties.
1. $\hat{X}_s^{n+1}$ is unbiased.
2. If $s = s_h$, for $h \in \mathcal{X}$, then
$$\mathrm{Var}(\hat{X}_{s_h}^{n+1}) \le \frac{1}{4N}.$$
Proof.
1. We have
$$\mathbb{E}\left[C_s^n\!\left(\frac{k - 1}{N}, U_k\right)\right] = \sum_{j \in \mathcal{X}} P(i_k^n, j)\, s(j),$$
so that $\mathbb{E}[\hat{X}_s^{n+1}] = \tilde{\mu}_{n+1} s$.
2. The variable
$$C_{s_h}^n\!\left(\frac{k - 1}{N}, U_k\right) = \sum_{j \in \mathcal{X},\, j \le h} 1_{i_k^n, j}(U_k)$$
is a Bernoulli random variable, with variance $\le 1/4$. Hence the result.
$\square$
We then obtain an error bound by using the same techniques as in [12]. We assume that, for any non-negative sequence $s$, the standard Monte Carlo estimator $\hat{\mu}_0 s$ of $\mu_0 s$ is unbiased and that, for any $h \in \mathcal{X}$,
$$\mathrm{Var}(\hat{\mu}_0 s_h) \le \frac{x_0}{N},$$
for some $x_0 \ge 0$ (as noticed before, in many applications, $\hat{\mu}_0 = \mu_0$).
Proposition 1. For the standard Monte Carlo method, the following hold:
1. for any non-negative sequence $s$,
$$\mathbb{E}[\hat{\mu}_n s] = \mu_n s,$$
2. for any $h \in \mathcal{X}$,
$$\mathrm{Var}(\hat{\mu}_n s_h) \le \frac{x_n}{N},$$
where $x_{n+1} = M^2 x_n + 1/4$ $(n \ge 0)$.
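Proof.
1. We have
$$\mu_{n+1} s - \hat{\mu}_{n+1} s = \mu_{n+1} s - \tilde{\mu}_{n+1} s + \tilde{\mu}_{n+1} s - \hat{\mu}_{n+1} s = \mu_n \mathbf{P} s - \hat{\mu}_n \mathbf{P} s + \tilde{\mu}_{n+1} s - \hat{X}_s^{n+1},$$
so, by using Lemma 1, the result follows by induction.
2. The variables $\mu_{n+1} s_h - \tilde{\mu}_{n+1} s_h$ and $\tilde{\mu}_{n+1} s_h - \hat{\mu}_{n+1} s_h$ are uncorrelated and $\tilde{\mu}_{n+1} s_h - \hat{\mu}_{n+1} s_h = \tilde{\mu}_{n+1} s_h - \hat{X}_{s_h}^{n+1}$ is a centered variable; consequently
$$\mathrm{Var}(\hat{\mu}_{n+1} s_h) = \mathbb{E}\left[(\mu_{n+1} s_h - \tilde{\mu}_{n+1} s_h)^2\right] + \mathbb{E}\left[(\tilde{\mu}_{n+1} s_h - \hat{\mu}_{n+1} s_h)^2\right]. \tag{11}$$
For any $i \in \mathcal{X}$, we have $\mathbf{P} s_h(i) = F_i(h)$, hence
$$\begin{aligned}
\mu_{n+1} s_h - \tilde{\mu}_{n+1} s_h &= \mu_n \mathbf{P} s_h - \hat{\mu}_n \mathbf{P} s_h = \sum_{i \in \mathcal{X}} \mu_n(i) F_i(h) - \sum_{i \in \mathcal{X}} \hat{\mu}_n(i) F_i(h) \\
&= -\sum_{i \in \mathcal{X}} \mu_n s_i \,(F_{i+1}(h) - F_i(h)) + \sum_{i \in \mathcal{X}} \hat{\mu}_n s_i \,(F_{i+1}(h) - F_i(h)) \\
&= \sum_{i \in \mathcal{X}} (\hat{\mu}_n s_i - \mu_n s_i)(F_{i+1}(h) - F_i(h)).
\end{aligned}$$
On the one hand, we write
$$\begin{aligned}
\mathbb{E}\left[(\mu_{n+1} s_h - \tilde{\mu}_{n+1} s_h)^2\right] &= \mathbb{E}\left[\left(\sum_{i \in \mathcal{X}} (\hat{\mu}_n s_i - \mu_n s_i)(q_h(i + 1) - q_h(i))\right)^{2}\right] \\
&= \sum_{(i,j) \in \mathcal{X}^2} \mathbb{E}\left[(\hat{\mu}_n s_i - \mu_n s_i)(q_h(i + 1) - q_h(i))(\hat{\mu}_n s_j - \mu_n s_j)(q_h(j + 1) - q_h(j))\right] \\
&\le \sum_{(i,j) \in \mathcal{X}^2} \sqrt{\mathrm{Var}(\hat{\mu}_n s_i)\,\mathrm{Var}(\hat{\mu}_n s_j)}\; |q_h(i + 1) - q_h(i)| \times |q_h(j + 1) - q_h(j)|.
\end{aligned}$$
On the other hand, Lemma 1 gives
$$\mathbb{E}\left[(\tilde{\mu}_{n+1} s_h - \hat{\mu}_{n+1} s_h)^2\right] = \mathbb{E}\left[(\tilde{\mu}_{n+1} s_h - \hat{X}_{s_h}^{n+1})^2\right] \le \frac{1}{4N}.$$
So, by using (11), the result follows by induction.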
$\square$
The bounds for $\mathrm{Var}(\hat{Y}_{s_h}^{n+1})$ (SSS) and $\mathrm{Var}(\hat{Z}_{s_h}^{n+1})$ (SLSS) are not so easily obtained, and the proofs of Lemmas 2 and 3 are given in Sect. 5.
3.2 Simple Stratified Sampling
Lemma 2. Let $s$ be a non-negative sequence. The SSS estimator of $\tilde{\mu}_{n+1} s$:
$$\hat{Y}_s^{n+1} := \frac{1}{N} \sum_{\ell \in [1,p]^2} C_s^n(V_\ell)$$
has the following properties.
1. $\hat{Y}_s^{n+1}$ is unbiased.
2. If $s = s_h$, for $h \in \mathcal{X}$, then
$$\mathrm{Var}(\hat{Y}_{s_h}^{n+1}) \le \frac{1}{4N^{3/2}}\,(\mathrm{TV}(q_h) + 2).$$
A similar result (with the same $N^{-3/2}$ order) was established in [12], but the SSS method studied there differs from the one used here, so we provide a different proof (see Sect. 5). Intermediate results from this proof (eqs. (12) and (13)) will be reused afterwards for the analysis of SLSS.
The proof of the next result is similar to the proof of Proposition 1. We assume that, for any non-negative sequence $s$, the SSS estimator $\hat{\mu}_0 s$ of $\mu_0 s$ is unbiased and that, for any $h \in \mathcal{X}$,
$$\mathrm{Var}(\hat{\mu}_0 s_h) \le \frac{y_0}{N^{3/2}}$$
for some $y_0 \ge 0$.
Proposition 2. For the SSS method, the following hold:
1. for any non-negative sequence $s$,
$$\mathbb{E}[\hat{\mu}_n s] = \mu_n s,$$
2. for any $h \in \mathcal{X}$,
$$\mathrm{Var}(\hat{\mu}_n s_h) \le \frac{y_n}{N^{3/2}},$$
where $y_{n+1} = M^2 y_n + (M + 2)/4$ $(n \ge 0)$.
3.3 Sudoku Latin Square Sampling
Lemma 3. Let $s$ be a non-negative sequence. The SLSS estimator of $\tilde{\mu}_{n+1} s$:
$$\hat{Z}_s^{n+1} := \frac{1}{N} \sum_{\ell \in [1,p]^2} C_s^n(W_\ell)$$
has the following properties.
1. $\hat{Z}_s^{n+1}$ is unbiased.
2. If $s = s_h$, for $h \in \mathcal{X}$, and if $q_h$ is a piecewise monotonic sequence, with $r$ pieces, then
$$\mathrm{Var}(\hat{Z}_{s_h}^{n+1}) \le \frac{1}{N^{3/2}} \left( \left(\frac{13}{4} + r\right)(\mathrm{TV}(q_h) + 2) + 2\,(\mathrm{TV}(q_h) + 2)^2 \right).$$
The constant involved in the $O(N^{-3/2})$ bound of $\mathrm{Var}(\hat{Z}_{s_h}^{n+1})$ (SLSS) is larger than the corresponding constant for $\mathrm{Var}(\hat{Y}_{s_h}^{n+1})$ (SSS), since $r \ge 1$ in Lemma 3; this would suggest degraded performance. But in the examples of Sect. 4 we see that it is not necessarily the case.
The proof of the next result is similar to the proof of Proposition 1. We assume that, for any non-negative sequence $s$, the SLSS estimator $\hat{\mu}_0 s$ of $\mu_0 s$ is unbiased and that, for any $h \in \mathcal{X}$,
$$\mathrm{Var}(\hat{\mu}_0 s_h) \le \frac{z_0}{N^{3/2}},$$
for some $z_0 \ge 0$.
Proposition 3. For the SLSS method, the following hold:
1. for any non-negative sequence $s$,
$$\mathbb{E}[\hat{\mu}_n s] = \mu_n s,$$
2. for any $h \in \mathcal{X}$,
$$\mathrm{Var}(\hat{\mu}_n s_h) \le \frac{z_n}{N^{3/2}},$$
where $z_{n+1} = M^2 z_n + (13/4 + r)(M + 2) + 2(M + 2)^2$ $(n \ge 0)$.
Remark 1. The bounds in Propositions 1, 2, and 3 increase exponentially with $n$ when $M > 1$, and remain bounded if $M < 1$, which is not uncommon; see [14].
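The behaviour described in Remark 1 can be checked numerically by iterating the three recurrences of Propositions 1, 2, and 3. The following sketch is only an illustration; the function name `variance_constants` and the default initial values (zero, as for a fixed initial state) are assumptions.

```python
def variance_constants(M, n_steps, r=1, x0=0.0, y0=0.0, z0=0.0):
    """Iterate the recurrences for x_n (MC), y_n (SSS) and z_n (SLSS)."""
    x, y, z = x0, y0, z0
    for _ in range(n_steps):
        x = M * M * x + 0.25
        y = M * M * y + (M + 2) / 4
        z = M * M * z + (13 / 4 + r) * (M + 2) + 2 * (M + 2) ** 2
    return x, y, z


one_step = variance_constants(0.5, 1)
contracting = [variance_constants(0.5, n)[0] for n in (1, 2, 50)]
expanding = [variance_constants(2.0, n)[0] for n in (1, 2, 3)]
```

With $M = 0.5$ the constant $x_n$ increases towards the fixed point $(1/4)/(1 - M^2) = 1/3$, whereas with $M = 2$ it is multiplied by roughly $M^2 = 4$ at each step.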
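Remark 2. The variance of each estimator is bounded for a test sequence of the form $s_h$, for $h \in \mathcal{X}$. We obtain a bound for a non-negative cost function $c$ by the same reasoning as in Proposition 1 (see [12]):
$$\mu_n c - \hat{\mu}_n c = \sum_{i \in \mathcal{X}} (\hat{\mu}_n s_i - \mu_n s_i)(c(i + 1) - c(i)),$$
hence
$$\mathbb{E}\left[(\mu_n c - \hat{\mu}_n c)^2\right] \le \sum_{(i,j) \in \mathcal{X}^2} \sqrt{\mathrm{Var}(\hat{\mu}_n s_i)\,\mathrm{Var}(\hat{\mu}_n s_j)}\; |c(i + 1) - c(i)| \times |c(j + 1) - c(j)|,$$
and then
$$\mathrm{Var}(\hat{\mu}_n c) \le \mathrm{TV}(c)^2 \times \sup_{h \in \mathcal{X}} \mathrm{Var}(\hat{\mu}_n s_h).$$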
4 Numerical Examples
In this section, we compare standard Monte Carlo to the variance reduction strategies analyzed previously: SSS and SLSS, for three examples. For each example, each strategy, and each $N$ considered, we compute the unbiased estimator $\hat{\mu}_n s$ of $\mu_n s$ for the selected $n$, replicate this 100 times, and compute the empirical variance Var of the 100 realizations of $\hat{\mu}_n s$. We then plot $\log_{10} \mathrm{Var}$ as a function of $\log_{10} N$. Assuming that $\mathrm{Var} \approx K N^{-\alpha}$ for some positive constants $K$ and $\alpha$, we estimate the variance rate $\alpha$ by linear regression. We also compute the (empirical) efficiency of each simulation estimator, defined as the inverse of the product of Var by the CPU time [10], and we plot $\log_{10}$ efficiency as a function of $\log_{10} N$. Note that for standard Monte Carlo, the efficiency does not depend on $N$. For SSS and SLSS, it takes into account the additional work to compute the estimators.
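The regression estimate of $\alpha$ and the efficiency measure can be sketched as follows. This is a minimal illustration with ordinary least squares on the log-log data; the function names `variance_rate` and `efficiency`, and the synthetic variances used to exercise them, are assumptions and not data from the experiments.

```python
import math
import statistics


def variance_rate(ns, variances):
    """Least-squares slope of log10(Var) against log10(N), returned as alpha."""
    xs = [math.log10(n) for n in ns]
    ys = [math.log10(v) for v in variances]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # Var ~ K * N**(-alpha)


def efficiency(variance, cpu_seconds):
    """Inverse of the product of the variance by the CPU time."""
    return 1.0 / (variance * cpu_seconds)


# Synthetic check: variances that follow K * N**(-1.5) exactly.
ns = [10 ** 2, 50 ** 2, 100 ** 2]
alpha = variance_rate(ns, [3.0 * n ** -1.5 for n in ns])
```

On the synthetic data above the estimated rate is 1.5 up to rounding, since the points lie exactly on a line of slope $-1.5$ in log-log scale.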
4.1 A Geo/Geo/1 Queue
We consider a discrete-time Geo/Geo/1 queue (see [1]): the queue is empty at the initial time. During each unit of time, the customer in service (if there is one) completes it with probability 0.5, and one new customer arrives with probability 0.6.
We estimate the mean number of customers in the queue at time $n = 12$. Figure 2 (top) shows $\log_{10} \mathrm{Var}$ as a function of $\log_{10} N$ on the left and $\log_{10}$ efficiency as a function of $\log_{10} N$ on the right, for $N = 10^2, 50^2, 100^2, \dots, 1\,000^2$. We find from the plots that SSS and SLSS give not only smaller variances than standard MC (for the same $N$), but also better efficiencies, and that SLSS outperforms SSS. The regression estimates of $\alpha$ are given in the first row of Table 1. For SSS and SLSS, they match the upper bounds of $O(N^{-3/2})$ established in Sect. 3.
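Because the queue starts empty, it holds at most 12 customers at time $n = 12$, so the state space can be truncated at 12 and the distribution of $X_{12}$ can be propagated exactly as a reference. The sketch below does this under assumptions that are not stated in the text: arrival and service completion are independent within a time unit, and a customer arriving at an empty queue cannot complete service in the same time unit. The function names are hypothetical.

```python
def geo_geo_1_matrix(max_state, p_arrival=0.6, p_service=0.5):
    """Transition matrix on states 0..max_state (number of customers).

    Assumptions (not from the text): independent arrival and service events,
    no same-slot service for a customer arriving at an empty queue, and no
    arrival accepted in state max_state (truncation).
    """
    size = max_state + 1
    P = [[0.0] * size for _ in range(size)]
    for i in range(size):
        if i == 0:
            up, down = p_arrival, 0.0
        else:
            up = p_arrival * (1.0 - p_service)    # arrival, no departure
            down = (1.0 - p_arrival) * p_service  # departure, no arrival
        if i == max_state:
            up = 0.0
        if i > 0:
            P[i][i - 1] = down
        if i < max_state:
            P[i][i + 1] = up
        P[i][i] = 1.0 - up - down
    return P


def distribution_after(P, n_steps, start=0):
    """Probability vector mu_n obtained by n_steps products mu <- mu P."""
    size = len(P)
    mu = [0.0] * size
    mu[start] = 1.0
    for _ in range(n_steps):
        mu = [sum(mu[i] * P[i][j] for i in range(size)) for j in range(size)]
    return mu


P = geo_geo_1_matrix(12)
mu_12 = distribution_after(P, 12)
mean_customers = sum(j * prob for j, prob in enumerate(mu_12))
```

Under a different convention for the order of arrivals and departures within a time unit, the first row of the matrix would change, so the value of `mean_customers` should be read as depending on the assumptions above.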
4.2 A Gambler in a Casino
A gambler is going to a casino for four hours. He plans to play the same game every ten seconds (so he will play 1 440 times). At this game, for each Euro that he bids, he gets 0 with probability 0.9 and $m \in \{1, 2, \dots, 10\}$ with probability 0.01 each. His policy is the following: if he has more than 100 Euros, he plays 2 Euros, but if he has 100 Euros or less, he plays only 1. To make sure that he can play during the four hours, he brings 2 780 Euros with him. The model is a Markov chain on state space $E = [0, 28\,700]$. We estimate the probability that the gambler has more than 1 500 Euros at the end of the game. Here we use $N = 10^2, 20^2, 30^2, \dots, 200^2$. The results are reported in the middle rows of Fig. 2 and Table 1. We find that SSS and SLSS produce both smaller variances and better efficiencies than standard MC for large enough $N$, and that SLSS outperforms SSS. The regression estimate of $\alpha$ for SLSS corresponds to the bound established in Sect. 3, but for SSS it is better. However, the variance itself is smaller for SLSS than for SSS, and the better rate of SSS might not hold beyond the observed range (it is likely caused by a few poor values of Var for the smallest values of $N$).
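The figures quoted for this example (1 440 games, an initial capital of 2 780 Euros, and a state space reaching 28 700) can be checked with a short sketch. It assumes that the bid is paid up front, so that a bid $b$ with payoff $m$ per Euro changes the capital by $b(m - 1)$; this reading is an assumption, chosen because it reproduces the stated bounds. The function name `play` is hypothetical.

```python
def play(capital, payoffs):
    """Capital after a sequence of games; payoffs[t] is the return per Euro bid.

    Assumption: the bid is paid up front, so a bid b with payoff m changes
    the capital by b * (m - 1).
    """
    for m in payoffs:
        bid = 2 if capital > 100 else 1  # policy described in the text
        capital += bid * (m - 1)
    return capital


games = 4 * 3600 // 10           # one game every ten seconds for four hours
worst = play(2780, [0] * games)  # lose every game
best = play(2780, [10] * games)  # obtain the maximum payoff at every game
```

Here `games` equals 1 440; losing every game brings the capital to exactly 0 after the last game (1 340 games at 2 Euros, then 100 games at 1 Euro), and winning the maximum every time gives $2\,780 + 1\,440 \times 18 = 28\,700$.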
4.3 Diffusion
The 1-D diffusion equation
$$\frac{\partial c}{\partial t}(x, t) = D\, \frac{\partial^2 c}{\partial x^2}(x, t), \quad x \in \mathbb{R},\ t > 0, \quad \text{and} \quad c(x, 0) = c_0(x), \quad x \in \mathbb{R}$$
(where $c_0 \ge 0$ and $\int_{\mathbb{R}} c_0(x)\, dx = 1$) may be discretized with a time step $\Delta t$ and a spatial step $\Delta x$, and the solution is approximated using a random walk: $P(i, i - 1) = P(i, i + 1) = D \Delta t / \Delta x^2$, $P(i, i) = 1 - 2 D \Delta t / \Delta x^2$ (we refer to [3] in a QMC context). Here we specify $D = 1$ and $c_0$ is the indicator function of the interval $[-1/2, 1/2]$; we want to approximate $\int_{-1/2}^{1/2} c(x, T)\, dx$. We choose $T = 1$, with $\Delta t = 6.25 \times 10^{-4}$ and $\Delta x = 5.0 \times 10^{-2}$. Here we take $N = 11^2, 19^2, 31^2, \dots, 199^2$. In a previous version of the paper, we had $N = 10^2, 20^2, 30^2, \dots, 200^2$, but this gave oscillations in the SLSS outcomes, with better results when $p = \sqrt{N}$ was a multiple of 20, because of interactions with other discretization parameters. To avoid this, we changed our choices of $p$ and now use prime numbers near the previous ones. The bottom part of Fig. 2 and the last row of Table 1 give the results, which are very similar to those of the previous example. Again, SLSS outperforms the other methods.
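As a quick arithmetic check of the discretization parameters given above, the following sketch evaluates the random-walk transition probabilities and the number of steps needed to reach time $T$. The variable names are illustrative only.

```python
D, dt, dx, T = 1.0, 6.25e-4, 5.0e-2, 1.0

p_move = D * dt / dx ** 2    # P(i, i - 1) = P(i, i + 1)
p_stay = 1.0 - 2.0 * p_move  # P(i, i)
n_steps = round(T / dt)      # number of transitions needed to reach time T
```

With these values, $D \Delta t / \Delta x^2 = 6.25 \times 10^{-4} / 2.5 \times 10^{-3} = 0.25$, so the walk moves left or right with probability $1/4$ each, stays put with probability $1/2$, and the chain is simulated for 1 600 steps.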
Table 1 Calculation of order $\alpha$ of the sample variance: comparison of standard Monte Carlo (MC), SSS, and SLSS for three examples and estimators

| Example | Calculation | MC | SSS | SLSS |
|---|---|---|---|---|
| Geo/Geo/1 queue | $\mathbb{E}[X_{12}]$ | 0.99 | 1.50 | 1.51 |
| Gambler in a casino | $\mathbb{P}(X_{1\,440} > 1\,500)$ | 1.02 | 1.76 | 1.53 |
| Diffusion | $\int_{-1/2}^{1/2} c(x, T)\, dx$ | 0.98 | 1.61 | 1.49 |
5 The Proofs
5.1 Proof of Lemma 2
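Proof.
1. We have
$$\mathbb{E}[C_s^n(V_\ell)] = N \sum_{k=1}^{N} \sum_{j \in \mathcal{X}} \left( \int_{I_\ell} 1_k(u_1)\, 1_{i_k^n, j}(u_2)\, du \right) s(j).$$
Hence
$$\mathbb{E}[\hat{Y}_s^{n+1}] = \frac{1}{N} \sum_{k=1}^{N} \sum_{j \in \mathcal{X}} P(i_k^n, j)\, s(j) = \tilde{\mu}_{n+1} s.$$
2. The function $C_{s_h}^n$ is the indicator function of the set
$$J_h^n := \bigcup_{k=1}^{N} \left( I_k \times \bigcup_{j \in \mathcal{X},\, j \le h} I_{i_k^n, j} \right) = \bigcup_{k=1}^{N} I_k \times (0, q_h(i_k^n)]. \tag{12}$$
The variable $C_{s_h}^n(V_\ell)$ is a Bernoulli random variable, with expectation $f_{h,\ell}^n = N \lambda_2(J_h^n \cap I_\ell)$. Here $f_{h,\ell}^n = 1$ if $I_\ell \subset J_h^n$ and $f_{h,\ell}^n = 0$ if $I_\ell \cap J_h^n = \emptyset$. Consequently, $\mathrm{Var}(C_{s_h}^n(V_\ell)) = f_{h,\ell}^n (1 - f_{h,\ell}^n) \le 1/4$ and $\mathrm{Var}(C_{s_h}^n(V_\ell)) = 0$ if $I_\ell \subset J_h^n$ or if $I_\ell \cap J_h^n = \emptyset$, so that
$$\mathrm{Var}(\hat{Y}_{s_h}^{n+1}) \le \frac{1}{4N^2} \left| \{ \ell \in [1, p]^2 : I_\ell \not\subset J_h^n \text{ and } I_\ell \cap J_h^n \ne \emptyset \} \right|.$$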
Fig. 2 Comparison of standard Monte Carlo (MC) to SSS and SLSS (Sudoku) for three examples. Each panel shows the sample variance (left) and the efficiency (right) of 100 copies of the calculation, as a function of $N$ (log-log scale).
1. Geo/Geo/1 queue: calculation of $\mathbb{E}[X_{12}]$, with $N = 10^2, 50^2, 100^2, \dots, 1\,000^2$.
2. Gambler in a casino: calculation of $\mathbb{P}(X_{1\,440} > 1\,500)$, with $N = 10^2, 20^2, 30^2, \dots, 200^2$.
3. Diffusion: calculation of $\int_{-1/2}^{1/2} c(x, T)\, dx$, with $N = 11^2, 19^2, 31^2, \dots, 199^2$.

a. If $I_\ell \not\subset J_h^n$, then there exists $(u_1, u_2)$ which belongs to $I_\ell$ and not to $J_h^n$; since this $u_1$ is in some $I_k$, we have:
$$\exists k \in [1, N],\ \exists (u_1, u_2) \in I_\ell : u_1 \in I_k \subset \left( \frac{\ell_1 - 1}{p}, \frac{\ell_1}{p} \right] \text{ and } u_2 \notin (0, q_h(i_k^n)],$$
so that
$$\exists k \in \{ p(\ell_1 - 1) + 1, p(\ell_1 - 1) + 2, \dots, p\ell_1 \},\ \exists u_2 \in \left( \frac{\ell_2 - 1}{p}, \frac{\ell_2}{p} \right] : u_2 > q_h(i_k^n),$$
consequently
$$\ell_2 > p \min_{p(\ell_1 - 1) < k \le p\ell_1} q_h(i_k^n).$$
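b. Analogously, if $I_\ell \cap J_h^n \ne \emptyset$, then there exists $(u_1, u_2)$ which belongs to $I_\ell$ and also to $J_h^n$; and we eventually obtain:
$$\ell_2 < p \max_{p(\ell_1 - 1) < k \le p\ell_1} q_h(i_k^n) + 1.$$
We then have the following bounds:
$$\begin{aligned}
\left| \{ \ell \in [1, p]^2 : I_\ell \not\subset J_h^n \text{ and } I_\ell \cap J_h^n \ne \emptyset \} \right|
&\le p \left( \sum_{\ell_1 = 1}^{p} \left( \max_{p(\ell_1 - 1) < k \le p\ell_1} q_h(i_k^n) - \min_{p(\ell_1 - 1) < k \le p\ell_1} q_h(i_k^n) \right) + 2 \right) \\
&\le N^{1/2} \left( \sum_{i \in \mathcal{X}} |q_h(i + 1) - q_h(i)| + 2 \right),
\end{aligned}$$
because the states are relabeled so that $i_1^n \le \dots \le i_N^n$. Consequently,
$$\left| \{ \ell \in [1, p]^2 : I_\ell \not\subset J_h^n \text{ and } I_\ell \cap J_h^n \ne \emptyset \} \right| \le N^{1/2} (\mathrm{TV}(q_h) + 2), \tag{13}$$
and the result follows.
$\square$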
5.2 Proof of Lemma 3
Proof.
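1. Since $W_\ell \sim \mathcal{U}(I_\ell)$, the proof is the same as in Lemma 2.
2. In the following, we have many summations with $\ell, \ell', m, m' \in [1, p]^2$. To lighten the notation, we omit this set. We have
$$\mathrm{Var}(\hat{Z}_{s_h}^{n+1}) = V_0(\hat{Z}_{s_h}^{n+1}) + \frac{1}{N^2} \sum_{(\ell, \ell') : \ell \ne \ell'} \mathrm{Cov}\left( C_{s_h}^n(W_\ell), C_{s_h}^n(W_{\ell'}) \right),$$
where
$$V_0(\hat{Z}_{s_h}^{n+1}) := \frac{1}{N^2} \sum_{\ell} \mathrm{Var}\left( C_{s_h}^n(W_\ell) \right).$$
a. The function $C_{s_h}^n$ is the indicator function of the set $J_h^n$ defined by (12). Since $W_\ell \sim \mathcal{U}(I_\ell)$, we have, as in Lemma 2:
$$V_0(\hat{Z}_{s_h}^{n+1}) \le \frac{1}{4N^2} \left| \{ \ell \in [1, p]^2 : I_\ell \not\subset J_h^n \text{ and } I_\ell \cap J_h^n \ne \emptyset \} \right|.$$
From the bound (13), we deduce
$$V_0(\hat{Z}_{s_h}^{n+1}) \le \frac{1}{4N^{3/2}} (\mathrm{TV}(q_h) + 2).$$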
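b. We split $\mathrm{Var}(\hat{Z}_{s_h}^{n+1}) = V_0(\hat{Z}_{s_h}^{n+1}) + V_1(\hat{Z}_{s_h}^{n+1}) + V_2(\hat{Z}_{s_h}^{n+1}) + V_3(\hat{Z}_{s_h}^{n+1})$, with
$$V_1(\hat{Z}_{s_h}^{n+1}) := \frac{1}{N^2} \sum_{(\ell, \ell') : \ell_1 \ne \ell'_1,\, \ell_2 = \ell'_2} \mathrm{Cov}\left( C_{s_h}^n(W_\ell), C_{s_h}^n(W_{\ell'}) \right),$$
$$V_2(\hat{Z}_{s_h}^{n+1}) := \frac{1}{N^2} \sum_{(\ell, \ell') : \ell_1 = \ell'_1,\, \ell_2 \ne \ell'_2} \mathrm{Cov}\left( C_{s_h}^n(W_\ell), C_{s_h}^n(W_{\ell'}) \right),$$
$$V_3(\hat{Z}_{s_h}^{n+1}) := \frac{1}{N^2} \sum_{(\ell, \ell') : \ell_1 \ne \ell'_1,\, \ell_2 \ne \ell'_2} \mathrm{Cov}\left( C_{s_h}^n(W_\ell), C_{s_h}^n(W_{\ell'}) \right).$$
We introduce the $N^2$ squares $I_{\ell,m} = H_{\ell_1, m_1} \times H_{\ell_2, m_2}$, where, for $(\ell, m) \in [1, p]^4$:
$$H_{\ell_1, m_1} := \left( (\ell_1 - 1)/p + (m_1 - 1)/N,\ (\ell_1 - 1)/p + m_1/N \right],$$
$$H_{\ell_2, m_2} := \left( (\ell_2 - 1)/p + (m_2 - 1)/N,\ (\ell_2 - 1)/p + m_2/N \right].$$
We have
$$V_1(\hat{Z}_{s_h}^{n+1}) = \sum_{(\ell, \ell') : \ell_1 \ne \ell'_1,\, \ell_2 = \ell'_2} \left( \frac{N}{p - 1} \sum_{(m, m') : m_1 = m'_1,\, m_2 \ne m'_2} \lambda_2(I_{\ell,m} \cap J_h^n)\, \lambda_2(I_{\ell',m'} \cap J_h^n) - \lambda_2(I_\ell \cap J_h^n)\, \lambda_2(I_{\ell'} \cap J_h^n) \right),$$
$$V_2(\hat{Z}_{s_h}^{n+1}) = \sum_{(\ell, \ell') : \ell_1 = \ell'_1,\, \ell_2 \ne \ell'_2} \left( \frac{N}{p - 1} \sum_{(m, m') : m_1 \ne m'_1,\, m_2 = m'_2} \lambda_2(I_{\ell,m} \cap J_h^n)\, \lambda_2(I_{\ell',m'} \cap J_h^n) - \lambda_2(I_\ell \cap J_h^n)\, \lambda_2(I_{\ell'} \cap J_h^n) \right),$$
$$V_3(\hat{Z}_{s_h}^{n+1}) = \sum_{(\ell, \ell') : \ell_1 \ne \ell'_1,\, \ell_2 \ne \ell'_2} \left( \frac{N}{(p - 1)^2} \sum_{(m, m') : m_1 \ne m'_1,\, m_2 \ne m'_2} \lambda_2(I_{\ell,m} \cap J_h^n)\, \lambda_2(I_{\ell',m'} \cap J_h^n) - \lambda_2(I_\ell \cap J_h^n)\, \lambda_2(I_{\ell'} \cap J_h^n) \right).$$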
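i. We have
$$V_1(\hat{Z}_{s_h}^{n+1}) = \sum_{\ell : I_\ell \not\subset J_h^n,\, I_\ell \cap J_h^n \ne \emptyset}\ \sum_{\ell' : \ell'_1 \ne \ell_1,\, \ell'_2 = \ell_2} V_1(\ell, \ell'),$$
where
$$V_1(\ell, \ell') := \frac{N}{p - 1} \sum_{(m, m') : m_1 = m'_1,\, m_2 \ne m'_2} \lambda_2(I_{\ell,m} \cap J_h^n)\, \lambda_2(I_{\ell',m'} \cap J_h^n) - \lambda_2(I_\ell \cap J_h^n)\, \lambda_2(I_{\ell'} \cap J_h^n).$$
We split $V_1(\ell, \ell') = \hat{V}_1(\ell, \ell') + \check{V}_1(\ell, \ell')$, with
$$\hat{V}_1(\ell, \ell') := \frac{N}{p - 1} \sum_{(m, m') : m_1 = m'_1,\, m_2 \ne m'_2} \lambda_2(I_{\ell,m} \cap J_h^n)\, \lambda_2(I_{\ell',m'} \cap J_h^n) - \frac{N}{p} \sum_{(m, m') : m_1 = m'_1} \lambda_2(I_{\ell,m} \cap J_h^n)\, \lambda_2(I_{\ell',m'} \cap J_h^n),$$
$$\check{V}_1(\ell, \ell') := p \sum_{(m, m') : m_1 = m'_1} \lambda_2(I_{\ell,m} \cap J_h^n)\, \lambda_2(I_{\ell',m'} \cap J_h^n) - \lambda_2(I_\ell \cap J_h^n)\, \lambda_2(I_{\ell'} \cap J_h^n).$$
On one side
$$\hat{V}_1(\ell, \ell') = N \sum_{m} \lambda_2(I_{\ell,m} \cap J_h^n) \times \left( \frac{1}{p(p - 1)} \sum_{m' : m'_1 = m_1,\, m'_2 \ne m_2} \lambda_2(I_{\ell',m'} \cap J_h^n) - \frac{1}{p}\, \lambda_2(I_{\ell',m} \cap J_h^n) \right).$$
Since both terms inside the parentheses are bounded by $1/(pN^2)$, we have $|\hat{V}_1(\ell, \ell')| \le 1/(pN^2)$ and so
$$\left| \sum_{\ell' : \ell'_1 \ne \ell_1,\, \ell'_2 = \ell_2} \hat{V}_1(\ell, \ell') \right| \le \frac{p - 1}{pN^2}.$$
On the other side
$$\check{V}_1(\ell, \ell') = \sum_{m_1 \in [1,p]} \lambda_2((H_{\ell_1, m_1} \times H_{\ell_2}) \cap J_h^n) \sum_{m'_1 \in [1,p]} \check{V}_1(\ell, \ell', m_1, m'_1),$$
where
$$\check{V}_1(\ell, \ell', m_1, m'_1) := \lambda_2((H_{\ell'_1, m_1} \times H_{\ell_2}) \cap J_h^n) - \lambda_2((H_{\ell'_1, m'_1} \times H_{\ell_2}) \cap J_h^n).$$
We have
$$\check{V}_1(\ell, \ell', m_1, m'_1) = \frac{1}{N} \left( \lambda\left( H_{\ell_2} \cap (0, q_h(i^n_{p(\ell'_1 - 1) + m_1})] \right) - \lambda\left( H_{\ell_2} \cap (0, q_h(i^n_{p(\ell'_1 - 1) + m'_1})] \right) \right).$$
As we have
$$|\check{V}_1(\ell, \ell', m_1, m'_1)| \le \frac{1}{N} \times \lambda\left( H_{\ell_2} \cap \left[ \min_{p(\ell'_1 - 1) < k \le p\ell'_1} q_h(i_k^n),\ \max_{p(\ell'_1 - 1) < k \le p\ell'_1} q_h(i_k^n) \right] \right),$$
we deduce
$$|\check{V}_1(\ell, \ell')| \le \frac{1}{Np}\, \lambda\left( H_{\ell_2} \cap \left[ \min_{p(\ell'_1 - 1) < k \le p\ell'_1} q_h(i_k^n),\ \max_{p(\ell'_1 - 1) < k \le p\ell'_1} q_h(i_k^n) \right] \right).$$
Consequently
$$\left| \sum_{\ell' : \ell'_1 \ne \ell_1,\, \ell'_2 = \ell_2} \check{V}_1(\ell, \ell') \right| \le \frac{1}{Np} \times \sum_{\ell'_1 \in [1,p] : \ell'_1 \ne \ell_1} \lambda\left( H_{\ell_2} \cap \left[ \min_{p(\ell'_1 - 1) < k \le p\ell'_1} q_h(i_k^n),\ \max_{p(\ell'_1 - 1) < k \le p\ell'_1} q_h(i_k^n) \right] \right).$$
Since $q_h$ is a piecewise monotonic sequence, and because the states are relabeled so that $i_1^n \le \dots \le i_N^n$, the intervals
$$\left( \min_{p(\ell'_1 - 1) < k \le p\ell'_1} q_h(i_k^n),\ \max_{p(\ell'_1 - 1) < k \le p\ell'_1} q_h(i_k^n) \right), \quad \ell'_1 \in [1, p],$$
are pairwise disjoint on each of the $r$ pieces where $q_h$ is monotonic, and we obtain
$$\left| \sum_{\ell' : \ell'_1 \ne \ell_1,\, \ell'_2 = \ell_2} \check{V}_1(\ell, \ell') \right| \le \frac{r}{Np}\, \lambda(H_{\ell_2}) \le \frac{r}{N^2}.$$
And so, using the bound (13):
$$|V_1(\hat{Z}_{s_h}^{n+1})| \le \frac{(r + 1)p - 1}{pN^{3/2}}\, (\mathrm{TV}(q_h) + 2).$$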
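ii. We have
$$V_2(\hat{Z}_{s_h}^{n+1}) = \sum_{\ell : I_\ell \not\subset J_h^n,\, I_\ell \cap J_h^n \ne \emptyset}\ \sum_{\ell' : \ell'_1 = \ell_1,\, \ell'_2 \ne \ell_2} V_2(\ell, \ell'),$$
where
$$V_2(\ell, \ell') := \frac{N}{p - 1} \times \sum_{(m, m') : m_1 \ne m'_1,\, m_2 = m'_2} \lambda_2(I_{\ell,m} \cap J_h^n)\, \lambda_2(I_{\ell',m'} \cap J_h^n) - \lambda_2(I_\ell \cap J_h^n)\, \lambda_2(I_{\ell'} \cap J_h^n).$$
We split $V_2(\ell, \ell') = \hat{V}_2(\ell, \ell') + \check{V}_2(\ell, \ell')$, with
$$\hat{V}_2(\ell, \ell') := \frac{N}{p - 1} \sum_{(m, m') : m_1 \ne m'_1,\, m_2 = m'_2} \lambda_2(I_{\ell,m} \cap J_h^n)\, \lambda_2(I_{\ell',m'} \cap J_h^n) - \frac{N}{p} \sum_{(m, m') : m_2 = m'_2} \lambda_2(I_{\ell,m} \cap J_h^n)\, \lambda_2(I_{\ell',m'} \cap J_h^n),$$
$$\check{V}_2(\ell, \ell') := p \sum_{(m, m') : m_2 = m'_2} \lambda_2(I_{\ell,m} \cap J_h^n)\, \lambda_2(I_{\ell',m'} \cap J_h^n) - \lambda_2(I_\ell \cap J_h^n)\, \lambda_2(I_{\ell'} \cap J_h^n).$$
On one side
$$\hat{V}_2(\ell, \ell') = N \sum_{m} \lambda_2(I_{\ell,m} \cap J_h^n) \times \left( \frac{1}{p(p - 1)} \sum_{m' : m'_1 \ne m_1,\, m'_2 = m_2} \lambda_2(I_{\ell',m'} \cap J_h^n) - \frac{1}{p}\, \lambda_2(I_{\ell',m} \cap J_h^n) \right).$$
Since both terms inside the parentheses are bounded by $1/(pN^2)$, we have $|\hat{V}_2(\ell, \ell')| \le 1/(pN^2)$ and so
$$\left| \sum_{\ell' : \ell'_1 = \ell_1,\, \ell'_2 \ne \ell_2} \hat{V}_2(\ell, \ell') \right| \le \frac{p - 1}{pN^2}.$$
On the other side
$$\check{V}_2(\ell, \ell') = \sum_{m_2 \in [1,p]} \lambda_2((H_{\ell_1} \times H_{\ell_2, m_2}) \cap J_h^n) \sum_{m'_2 \in [1,p]} \check{V}_2(\ell, \ell', m_2, m'_2),$$
where
$$\check{V}_2(\ell, \ell', m_2, m'_2) := \lambda_2((H_{\ell_1} \times H_{\ell'_2, m_2}) \cap J_h^n) - \lambda_2((H_{\ell_1} \times H_{\ell'_2, m'_2}) \cap J_h^n);$$
we have
$$\check{V}_2(\ell, \ell', m_2, m'_2) = \frac{1}{N} \sum_{m'_1 \in [1,p]} \left( \lambda\left( H_{\ell'_2, m_2} \cap (0, q_h(i^n_{p(\ell_1 - 1) + m'_1})] \right) - \lambda\left( H_{\ell'_2, m'_2} \cap (0, q_h(i^n_{p(\ell_1 - 1) + m'_1})] \right) \right).$$
Note that the difference in the parentheses is equal to 0 if $\ell'_2 \ne \ell'_2(\ell_1, m'_1) := \lceil n\, q_h(i^n_{p(\ell_1 - 1) + m'_1}) \rceil + 1$. Consequently
$$\begin{aligned}
\left| \sum_{\ell' : \ell'_1 = \ell_1,\, \ell'_2 \ne \ell_2} \check{V}_2(\ell, \ell') \right|
&\le \frac{1}{N} \sum_{m_2 \in [1,p]} \lambda_2((H_{\ell_1} \times H_{\ell_2, m_2}) \cap J_h^n) \\
&\quad \times \sum_{m'} \left| \lambda\left( H_{\ell'_2(\ell_1, m'_1), m_2} \cap (0, q_h(i^n_{p(\ell_1 - 1) + m'_1})] \right) - \lambda\left( H_{\ell'_2(\ell_1, m'_1), m'_2} \cap (0, q_h(i^n_{p(\ell_1 - 1) + m'_1})] \right) \right| \\
&\le \frac{1}{N} \sum_{m_2 \in [1,p]} \lambda_2((H_{\ell_1} \times H_{\ell_2, m_2}) \cap J_h^n) \le \frac{1}{N}\, \lambda_2(I_\ell) = \frac{1}{N^2}.
\end{aligned}$$
And so, using the bound (13):
$$|V_2(\hat{Z}_{s_h}^{n+1})| \le \frac{2p - 1}{pN^{3/2}}\, (\mathrm{TV}(q_h) + 2).$$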
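iii. We have
$$V_3(\hat{Z}_{s_h}^{n+1}) = \sum_{\ell : I_\ell \not\subset J_h^n,\, I_\ell \cap J_h^n \ne \emptyset}\ \ \sum_{\substack{\ell' : \ell'_1 \ne \ell_1,\, \ell'_2 \ne \ell_2 \\ I_{\ell'} \not\subset J_h^n,\, I_{\ell'} \cap J_h^n \ne \emptyset}}$$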