Randomized Quasi-Monte Carlo:

Academic year: 2022

(2) Randomized Quasi-Monte Carlo: Theory, Choice of Discrepancy, and Applications, featuring randomly-shifted lattice rules

Pierre L’Ecuyer and David Munger
Informatique et Recherche Opérationnelle, Université de Montréal

1. Monte Carlo (MC), quasi-MC (QMC), randomized QMC (RQMC).
2. Lattice rules and RQMC variance.
3. Weighted discrepancies and choice of weights.
4. Several examples.

(4) Basic Monte Carlo setting

Want to estimate
$$\mu = \mu(f) = \int_{[0,1)^s} f(\mathbf{u})\, d\mathbf{u} = E[f(\mathbf{U})],$$
where $f : [0,1)^s \to \mathbb{R}$ and $\mathbf{U}$ is a uniform r.v. over $[0,1)^s$.

Standard Monte Carlo:
- Generate $n$ independent copies of $\mathbf{U}$, say $\mathbf{U}_1, \dots, \mathbf{U}_n$;
- estimate $\mu$ by $\hat\mu_n = \frac{1}{n} \sum_{i=1}^{n} f(\mathbf{U}_i)$.

Almost sure convergence as $n \to \infty$ (strong law of large numbers). For a confidence interval of level $1 - \alpha$, can use the central limit theorem:
$$P\left[\mu \in \left(\hat\mu_n - \frac{c_\alpha S_n}{\sqrt{n}},\ \hat\mu_n + \frac{c_\alpha S_n}{\sqrt{n}}\right)\right] \approx 1 - \alpha,$$
where $S_n^2$ is any consistent estimator of $\sigma^2 = \mathrm{Var}[f(\mathbf{U})]$.
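As an illustration of the estimator and confidence interval above, here is a minimal Python sketch. The function name `mc_estimate`, the test integrand, and the use of the sample standard deviation for $S_n$ are choices made for this example only, not taken from the slides.

```python
import numpy as np
from statistics import NormalDist

def mc_estimate(f, s, n, alpha=0.05, seed=0):
    """Crude Monte Carlo estimate of the integral of f over [0,1)^s.

    Returns the sample mean and the CLT-based half-width c_alpha * S_n / sqrt(n).
    """
    rng = np.random.default_rng(seed)
    u = rng.random((n, s))                 # n independent uniform points
    y = f(u)                               # f is assumed to act row-wise
    mu_hat = y.mean()
    s_n = y.std(ddof=1)                    # one consistent estimator of sigma
    c_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return mu_hat, c_alpha * s_n / np.sqrt(n)

# Hypothetical integrand f(u) = u_1 + u_2 + u_3, whose exact integral is 1.5.
mu_hat, half_width = mc_estimate(lambda u: u.sum(axis=1), s=3, n=100_000)
```

The half-width shrinks like $n^{-1/2}$, which is the rate that QMC and RQMC try to improve on.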

(6) Quasi-Monte Carlo (QMC)

Replace the random points $\mathbf{U}_i$ by a set of deterministic points $P_n = \{\mathbf{u}_0, \dots, \mathbf{u}_{n-1}\}$ that cover $[0,1)^s$ more evenly.

This $P_n$ is called a highly-uniform or low-discrepancy point set if some measure of discrepancy between the empirical distribution of $P_n$ and the uniform distribution converges to 0 faster than for independent random points.

Main construction methods: lattice rules and digital nets (Korobov, Hammersley, Halton, Sobol’, Faure, Niederreiter, etc.).

(8) Simplistic solution: rectangular grid

$$P_n = \{(i_1/d, \dots, i_s/d) \text{ such that } 0 \le i_j < d \ \ \forall j\}, \quad \text{where } n = d^s.$$

[Figure: the grid points in the unit square; axes $u_{i,1}$ and $u_{i,2}$.]

Quickly becomes impractical when $s$ increases. Moreover, each one-dimensional projection has only $d$ distinct points, each two-dimensional projection has only $d^2$ distinct points, etc.

(14) Example: lattice with $s = 2$, $n = 101$, $a = 12$

$$P_n = \{(x/m,\ (ax/m) \bmod 1) : x = 0, \dots, m-1\} = \{(x/101,\ (12x/101) \bmod 1) : x = 0, \dots, 100\}$$
(here $m = n = 101$).

[Figure: the 101 lattice points in the unit square; axes $u_{i,1}$ and $u_{i,2}$.]

Here, each one-dimensional projection is $\{0, 1/n, \dots, (n-1)/n\}$.

Two problems: (1) there is a point at $(0, 0)$, and (2) how to estimate the error?
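To compare the projections of the grid and of this lattice, the following Python sketch builds both point sets and counts the distinct values of the first coordinate. The helper names `grid_points` and `korobov_points` are hypothetical, and the grid size $d = 10$ is chosen only so that $n = 100$ is close to 101.

```python
import numpy as np
from itertools import product

def grid_points(d, s):
    """Rectangular grid with n = d**s points (i_1/d, ..., i_s/d)."""
    return np.array(list(product(range(d), repeat=s))) / d

def korobov_points(n, a, s):
    """Korobov lattice: u_i = (i * (1, a, a^2 mod n, ...) mod n) / n."""
    z = np.array([pow(a, j, n) for j in range(s)])
    i = np.arange(n)[:, None]
    return (i * z % n) / n

grid = grid_points(10, 2)              # n = 100 points
lat = korobov_points(101, 12, 2)       # n = 101 points, the example above

grid_proj = len(np.unique(grid[:, 0]))   # distinct values of the first coordinate
lat_proj = len(np.unique(lat[:, 0]))
```

With almost the same number of points, the grid has only 10 distinct values per coordinate while the lattice has 101.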

(15) Randomized quasi-Monte Carlo (RQMC)

An RQMC estimator of $\mu$ has the form
$$\hat\mu_{n,\mathrm{rqmc}} = \frac{1}{n} \sum_{i=0}^{n-1} f(\mathbf{U}_i),$$
with $P_n = \{\mathbf{U}_0, \dots, \mathbf{U}_{n-1}\} \subset (0,1)^s$ an RQMC point set:

(i) each point $\mathbf{U}_i$ has the uniform distribution over $(0,1)^s$;
(ii) $P_n$ as a whole is a low-discrepancy point set.

$E[\hat\mu_{n,\mathrm{rqmc}}] = \mu$ (unbiased).

Can perform $m$ independent realizations $X_1, \dots, X_m$ of $\hat\mu_{n,\mathrm{rqmc}}$, then estimate $\mu$ and $\mathrm{Var}[\hat\mu_{n,\mathrm{rqmc}}]$ by their sample mean $\bar X_m$ and sample variance $S_m^2$ (also unbiased). Temptation: assume that $\bar X_m$ has the normal distribution.

(16) Example: lattice with $s = 2$, $n = 101$, $a = 12$. [Figure, shown over several overlays: the lattice points in the unit square, with a shift vector $\mathbf{U}$ marked on one overlay; axes $u_{i,1}$ and $u_{i,2}$.]

(20) Generalized antithetic variates and RQMC

$$\mathrm{Var}[\hat\mu_{n,\mathrm{rqmc}}] = \frac{1}{n^2} \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} \mathrm{Cov}[f(\mathbf{U}_i), f(\mathbf{U}_j)] = \frac{\mathrm{Var}[f(\mathbf{U}_i)]}{n} + \frac{2}{n^2} \sum_{i<j} \mathrm{Cov}[f(\mathbf{U}_i), f(\mathbf{U}_j)].$$

We want to make the last sum as negative as possible. Special cases: antithetic variates ($n = 2$), Latin hypercube sampling (LHS), randomized quasi-Monte Carlo (RQMC).

(24) Lattice rules

Integration lattice:
$$L_s = \left\{ \mathbf{v} = \sum_{j=1}^{s} z_j \mathbf{v}_j \ \text{ such that each } z_j \in \mathbb{Z} \right\},$$
where $\mathbf{v}_1, \dots, \mathbf{v}_s \in \mathbb{R}^s$ are linearly independent over $\mathbb{R}$ and where $L_s$ contains $\mathbb{Z}^s$.

Lattice rule: Take $P_n = \{\mathbf{u}_0, \dots, \mathbf{u}_{n-1}\} = L_s \cap [0,1)^s$.

Lattice rule of rank 1: $\mathbf{u}_i = i \mathbf{v}_1 \bmod 1$ for $i = 0, \dots, n-1$, where $n \mathbf{v}_1 = \mathbf{z} = (z_1, \dots, z_s) \in \{0, 1, \dots, n-1\}^s$.

Korobov rule: $\mathbf{z} = (1, a, a^2 \bmod n, \dots)$.

For any $\mathfrak{u} \subset \{1, \dots, s\}$, the projection $L_s(\mathfrak{u})$ of $L_s$ is also a lattice.

Random shift modulo 1: generate a single point $\mathbf{U}$ uniformly over $(0,1)^s$ and add it to each point of $P_n$, modulo 1, coordinate-wise: $\mathbf{U}_i = (\mathbf{u}_i + \mathbf{U}) \bmod 1$. Each $\mathbf{U}_i$ is uniformly distributed over $[0,1)^s$.
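The random shift and the use of $m$ independent randomizations to estimate the variance can be sketched as follows. The function names `lattice_points` and `rqmc_replicates`, the seed, and the test integrand are assumptions made for this example.

```python
import numpy as np

def lattice_points(n, z):
    """Rank-1 lattice points u_i = (i * z mod n) / n, for i = 0, ..., n-1."""
    return (np.arange(n)[:, None] * np.asarray(z) % n) / n

def rqmc_replicates(f, points, m, rng):
    """m independent random shifts modulo 1 of the same point set.

    Returns the sample mean and sample variance of the m RQMC averages.
    """
    reps = np.empty(m)
    for r in range(m):
        shift = rng.random(points.shape[1])          # one uniform shift U
        reps[r] = f((points + shift) % 1.0).mean()   # average over shifted points
    return reps.mean(), reps.var(ddof=1)

rng = np.random.default_rng(1)
pts = lattice_points(101, (1, 12))                   # the Korobov example, a = 12
mean, var = rqmc_replicates(lambda u: u.sum(axis=1), pts, m=20, rng=rng)
```

Here `var` estimates the variance of a single RQMC average; the variance of the overall mean over the $m$ replicates would be `var / m`.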

(26) A small lattice shifted by the red vector, modulo 1. [Figure: left panel, the eight lattice points numbered 0 to 7 with the shift vector, axes $u_{i,1}$ and $u_{i,2}$; right panel, the same points after the shift modulo 1, axes $U_{i,1}$ and $U_{i,2}$.]

(29) Can generate the shift uniformly in the parallelotope determined by basis vectors, or in any shifted copy of it. [Figure: two panels over the unit square, with axes $u_{i,1}$, $u_{i,2}$ and $U_{i,1}$, $U_{i,2}$.]

(32) Perhaps less obvious: Can generate it in any of the colored shapes below. [Figure: two panels over the unit square showing colored regions; axes $U_{i,1}$ and $U_{i,2}$.]

(33) Generating the shift uniformly in one tile

Proposition. Let $R \subset [0,1)^s$ be such that
$$\{R_i = (R + \mathbf{u}_i) \bmod 1,\ i = 0, \dots, n-1\}$$
is a partition of $[0,1)^s$ in $n$ regions of volume $1/n$. Then, sampling the random shift $\mathbf{U}$ uniformly in any given $R_i$ is equivalent to sampling it uniformly in $[0,1)^s$.

The error function $g_n(\mathbf{U}) = \hat\mu_{n,\mathrm{rqmc}} - \mu$ over any $R_i$ is the same as over $R$.
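The reason the error function repeats from tile to tile is that shifting by $\mathbf{U}$ or by $(\mathbf{U} + \mathbf{u}_k) \bmod 1$ produces the same set of shifted points. The short Python check below illustrates this numerically; the lattice ($n = 101$, $\mathbf{z} = (1, 12)$), the integrand, and the seed are arbitrary choices for the example.

```python
import numpy as np

n, z = 101, np.array([1, 12])
pts = (np.arange(n)[:, None] * z % n) / n      # rank-1 lattice points u_i

def f(u):
    """Hypothetical test integrand f(u_1, u_2) = u_1 * u_2."""
    return u[:, 0] * u[:, 1]

def shifted_mean(shift):
    """RQMC average for a given shift (the error g_n plus the constant mu)."""
    return f((pts + shift) % 1.0).mean()

rng = np.random.default_rng(2)
U = rng.random(2)
base = shifted_mean(U)
# Moving the shift by any lattice point u_k leaves the estimator unchanged.
same = [np.isclose(base, shifted_mean((U + pts[k]) % 1.0), atol=1e-12)
        for k in range(n)]
```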

(34) Error function $g_n(\mathbf{u})$ for $f(u_1, u_2) = (u_1 - 1/2)(u_2 - 1/2)$. [Figure: plot of $g_n$ over the unit square; color scale from about $-0.2$ to $0.3$.]

(35) Error function $g_n(\mathbf{u})$ for $f(u_1, u_2) = (u_1 - 1/2) + (u_2 - 1/2)$. [Figure: plot of $g_n$ over the unit square; color scale from about $-1$ to $1$.]

(36) Error function $g_n(\mathbf{u})$ for $f(u_1, u_2) = u_1 u_2 (u_1 - 1/2)(u_2 - 1/2)$. [Figure: plot of $g_n$ over the unit square; color scale from about $-1.5 \times 10^{-2}$ to $1 \times 10^{-2}$.]

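(37) Variance expression

Suppose $f$ has Fourier expansion
$$f(\mathbf{u}) = \sum_{\mathbf{h} \in \mathbb{Z}^s} \hat f(\mathbf{h})\, e^{2\pi\sqrt{-1}\, \mathbf{h}^t \mathbf{u}}.$$

For a randomly shifted lattice, the exact variance is (always)
$$\mathrm{Var}[\hat\mu_{n,\mathrm{rqmc}}] = \sum_{\mathbf{0} \ne \mathbf{h} \in L_s^*} |\hat f(\mathbf{h})|^2,$$
where $L_s^* = \{\mathbf{h} \in \mathbb{R}^s : \mathbf{h}^t \mathbf{v} \in \mathbb{Z} \text{ for all } \mathbf{v} \in L_s\} \subseteq \mathbb{Z}^s$ is the dual lattice.

From the viewpoint of variance reduction, an optimal lattice for $f$ minimizes the square “discrepancy” $D^2(P_n) = \mathrm{Var}[\hat\mu_{n,\mathrm{rqmc}}]$.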
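The variance formula can be checked numerically in a one-dimensional case. The sketch below uses the hypothetical integrand $f(u) = 2\pi(u - 1/2)$, for which $|\hat f(h)|^2 = 1/h^2$ when $h \ne 0$, and the $n$-point lattice $\{0, 1/n, \dots, (n-1)/n\}$, whose dual is $n\mathbb{Z}$. The truncation of the dual sum and the midpoint grid of shifts are numerical shortcuts chosen for this example.

```python
import numpy as np

n = 8
pts = np.arange(n) / n                       # one-dimensional lattice, dual = n * Z

def f(u):
    """f(u) = 2*pi*(u - 1/2): integral 0, |f_hat(h)|^2 = 1/h^2 for h != 0."""
    return 2.0 * np.pi * (u - 0.5)

# Sum of |f_hat(h)|^2 over nonzero h in the dual lattice (truncated at 10^6 terms).
k = np.arange(1, 1_000_001, dtype=float)
var_dual = 2.0 * np.sum(1.0 / (n * k) ** 2)

# Mean squared error over a fine midpoint grid of shifts U in [0, 1).
M = 10_000
shifts = (np.arange(M) + 0.5) / M
errors = np.array([f((pts + U) % 1.0).mean() for U in shifts])
var_shift = np.mean(errors ** 2)

closed_form = np.pi ** 2 / (3 * n ** 2)      # exact value of the dual sum
```

All three quantities agree to several digits, as the formula predicts.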

(39) $$\mathrm{Var}[\hat\mu_{n,\mathrm{rqmc}}] = \sum_{\mathbf{0} \ne \mathbf{h} \in L_s^*} |\hat f(\mathbf{h})|^2.$$

If $f$ has square-integrable mixed partial derivatives up to order $\alpha$, and the periodic continuation of its derivatives up to order $\alpha - 1$ is continuous across the unit cube boundaries, then
$$|\hat f(\mathbf{h})|^2 = O\left((\max(1, |h_1|) \cdots \max(1, |h_s|))^{-2\alpha}\right).$$

Moreover, there is a $\mathbf{v}_1 = \mathbf{v}_1(n)$ such that
$$P_{2\alpha} \stackrel{\mathrm{def}}{=} \sum_{\mathbf{0} \ne \mathbf{h} \in L_s^*} (\max(1, |h_1|) \cdots \max(1, |h_s|))^{-2\alpha} = O(n^{-2\alpha + \delta}).$$

This is the variance for a worst-case $f$ having $|\hat f(\mathbf{h})|^2 = (\max(1, |h_1|) \cdots \max(1, |h_s|))^{-2\alpha}$.

Beware of the hidden factor in the $O$ when $s$ is large. This worst-case function may be far from representative in applications.

(43) Baker’s transformation

Want to make the periodic continuation continuous.

If $f(0) \ne f(1)$, define $\tilde f$ by $\tilde f(1 - u) = \tilde f(u) = f(2u)$ for $0 \le u \le 1/2$. This $\tilde f$ has the same integral as $f$ and $\tilde f(0) = \tilde f(1)$.

[Figure: $f$ and the folded $\tilde f$ over $[0, 1]$; axis ticks at 0, 1/2, 1.]

For smooth $f$, can reduce the variance to $O(n^{-4+\delta})$ (Hickernell 2002). The resulting $\tilde f$ is also symmetric with respect to $u = 1/2$. In practice, we transform the points $\mathbf{U}_i$ instead of $f$.

(47) One-dimensional case

Random shift followed by baker’s transformation. Along each coordinate, stretch everything by a factor of 2 and fold. Same as replacing $U_j$ by $\min[2U_j, 2(1 - U_j)]$.

Gives locally antithetic points in intervals of size $2/n$.

[Figure: the points on $[0, 1]$ before and after the shift and the fold; axis ticks at 0, 0.5, 1, with the shift $U/n$ marked.]
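Applying the transformation to the points rather than to $f$ takes one line. The sketch below is an illustration only; the function name `baker`, the lattice, and the seed are assumptions of this example.

```python
import numpy as np

def baker(u):
    """Baker's (tent) transformation, coordinate-wise: min(2u, 2(1 - u))."""
    return 1.0 - np.abs(2.0 * u - 1.0)

n, z = 101, np.array([1, 12])
pts = (np.arange(n)[:, None] * z % n) / n        # rank-1 lattice points
rng = np.random.default_rng(3)
shifted = (pts + rng.random(2)) % 1.0            # random shift modulo 1
folded = baker(shifted)                          # then stretch by 2 and fold
```

The integrand is then evaluated at `folded` instead of `shifted`, which amounts to integrating $\tilde f$ with the shifted lattice.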

(49) Searching for a lattice that minimizes
$$\mathrm{Var}[\hat\mu_{n,\mathrm{rqmc}}] = \sum_{\mathbf{0} \ne \mathbf{h} \in L_s^*} |\hat f(\mathbf{h})|^2$$
is impractical, because:
- the Fourier coefficients are usually unknown,
- there are infinitely many,
- must do it for each $f$.

We nevertheless want to see how far we can go in that direction.

We start with a simple function for which we know the Fourier expansion. Even then, the discrepancy involves an infinite number of terms!

Possible ideas: Truncate the sum to a finite subset $B$:
$$\sum_{\mathbf{0} \ne \mathbf{h} \in L_s^* \cap B} |\hat f(\mathbf{h})|^2,$$
or to the largest $q$ square coefficients $|\hat f(\mathbf{h})|^2$. But hard to implement!

(51) Dual-space exploration

The following makes sense if the $|\hat f(\mathbf{h})|^2$ tend to decrease with each $|h_j|$.

Start with a large set $\mathcal{L}$ of lattices (or generating vectors $\mathbf{v}_1$, for given $n$). Search for vectors $\mathbf{h}$ with large weights $w(\mathbf{h}) = |\hat f(\mathbf{h})|^2$, via a neighborhood search starting at $\mathbf{h} = \mathbf{0}$, keeping a sorted list (as in Dijkstra’s shortest path algorithm), and eliminate (successively) from $\mathcal{L}$ the lattices whose dual contains $\mathbf{h}$ for the next largest $w(\mathbf{h})$, until a single lattice remains.

Example of neighborhood $N(\mathbf{h})$: only one coordinate differs, by one unit.

Component-by-component version: For $j = 1, 2, \dots, s$, we apply the algorithm for a set $\mathcal{L}$ of $j$-dimensional lattices with common (fixed) $j - 1$ first coordinates, and determine the $j$th coordinate by visiting the $j$-dimensional vectors $\mathbf{h}$.

When the $|\hat f(\mathbf{h})|$ are unknown, we may think of estimating them as needed, dynamically.

(52) Algorithm Dual-Space-Exploration(lattice set $\mathcal{L}$, weights $w$):

    Q ← N(0)      // vectors h to be visited, sorted by weight w(h)
    M ← N(0)      // vectors h that have already entered Q
    while |L| > 1 do
        h ← remove first from Q
        for all lattices L_s ∈ L such that h ∈ L_s^* do
            remove L_s from L
            if |L| = 1 then
                return the single lattice L_s ∈ L and exit
            end if
        end for
        for all h' ∈ N(h) \ M do
            add h' to M and to Q with priority (weight) w(h')
        end for
    end while
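A possible Python rendering of this algorithm for rank-1 lattices is sketched below. Several points are assumptions of this sketch rather than part of the slides: lattices are represented by integer generating vectors $\mathbf{z}$ (so that $\mathbf{h}$ is in the dual iff $\mathbf{h} \cdot \mathbf{z} \equiv 0 \pmod n$), the zero vector is placed in $M$ from the start so that it is never visited, a `max_visits` guard bounds the search, and the example weight $w(\mathbf{h}) = \prod_{h_j \ne 0} h_j^{-2}$ is a $P_2$-type weight chosen for illustration.

```python
import heapq

def neighbors(h):
    """Vectors that differ from h in exactly one coordinate, by one unit."""
    for j in range(len(h)):
        for step in (-1, 1):
            yield h[:j] + (h[j] + step,) + h[j + 1:]

def in_dual(h, z, n):
    """h lies in the dual of the rank-1 lattice generated by z/n iff h.z = 0 mod n."""
    return sum(hj * zj for hj, zj in zip(h, z)) % n == 0

def dual_space_exploration(candidates, n, weight, max_visits=100_000):
    lattices = set(candidates)             # the set L of candidate vectors z
    zero = (0,) * len(next(iter(lattices)))
    visited = {zero}                       # M (assumption: 0 is never queued)
    queue = []                             # Q, a max-heap on w(h) via negation
    for h in neighbors(zero):
        visited.add(h)
        heapq.heappush(queue, (-weight(h), h))
    for _ in range(max_visits):
        if len(lattices) <= 1 or not queue:
            break
        _, h = heapq.heappop(queue)        # next largest weight
        for z in sorted(lattices):
            if in_dual(h, z, n):
                lattices.remove(z)
                if len(lattices) == 1:
                    return next(iter(lattices))
        for h2 in neighbors(h):
            if h2 not in visited:
                visited.add(h2)
                heapq.heappush(queue, (-weight(h2), h2))
    return min(lattices) if lattices else None

def p2_weight(h):
    """Illustrative weight: product of h_j^(-2) over the nonzero coordinates."""
    w = 1.0
    for hj in h:
        if hj != 0:
            w /= hj * hj
    return w

# Tiny example: choose a in z = (1, a) for n = 5.
best = dual_space_exploration([(1, a) for a in range(1, 5)], n=5, weight=p2_weight)
```

In this tiny example, $a = 1$ and $a = 4$ are eliminated first because their duals contain $(1, -1)$ and $(1, 1)$, which have the largest weight among the vectors that eliminate any candidate.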

(53) An example

Take the product V-shaped function
$$f(\mathbf{u}) = \prod_{j=1}^{s} \frac{|4u_j - 2| + c_j}{1 + c_j},$$
so
$$\hat f(\mathbf{h}) = \prod_{\{j \,:\, h_j \text{ is odd}\}} \frac{4}{(1 + c_j)\, \pi^2 h_j^2}.$$

Dimensions $s = 5$ and $10$. Constants $c_j = 1,\ j,\ j^2,\ j^3$.
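The coefficient formula is easy to evaluate directly. In the sketch below, the function names are hypothetical, and returning 0 for a nonzero even $h_j$ is my reading of the formula (the Fourier coefficients of $|4u - 2|$ vanish at nonzero even frequencies); the slide writes only the product over odd $h_j$. With $s = 2$ and $c_j = j$, the values for $\mathbf{h} = (1, 1)$ and $(1, 3)$ come out as about $2.74 \times 10^{-2}$ and $3.04 \times 10^{-3}$, which match the numbers displayed in the exploration figure, assuming that figure uses these parameters.

```python
import math

def v_shaped(u, c):
    """Product V-shaped function: prod_j (|4 u_j - 2| + c_j) / (1 + c_j)."""
    return math.prod((abs(4 * uj - 2) + cj) / (1 + cj) for uj, cj in zip(u, c))

def fourier_coefficient(h, c):
    """Coefficient from the slide's formula; 0 if some nonzero h_j is even."""
    coef = 1.0
    for hj, cj in zip(h, c):
        if hj == 0:
            continue                       # factor 1 for a zero frequency
        if hj % 2 == 0:
            return 0.0                     # assumption stated in the lead-in
        coef *= 4.0 / ((1 + cj) * math.pi ** 2 * hj ** 2)
    return coef

c = [1, 2]                                 # c_j = j with s = 2
w11 = fourier_coefficient((1, 1), c)
w13 = fourier_coefficient((1, 3), c)
```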

(66) The Dual Exploration Algorithm in Action. [Figure, built up over several overlays: the vectors $\mathbf{h}$ reached by the search, drawn as a graph and labeled with their weights. Colors distinguish vectors in $Q$, vectors in $M \setminus Q$, and the vector being visited. Weights shown in the last overlay: $(\pm 1, 1)$: $2.74 \times 10^{-2}$; $(\pm 1, 3)$ and $(\pm 3, 1)$: $3.04 \times 10^{-3}$; $(\pm 1, 5)$ and $(\pm 5, 1)$: $1.10 \times 10^{-3}$; $(\pm 3, 3)$: $3.38 \times 10^{-4}$.]

(67) Evolution of the Dual Exploration Algorithm, for $s = 2$, $c_j = j$, $n = 1021$. [Figure: two panels plotted against the iteration number (0 to about 4,500): the merit value on a log scale (about $10^{-8}$ to $10^{-4}$) and the number $|\mathcal{L}|$ of remaining lattices (0 to 1,000).]

(68) Estimated variance vs $n$ for $s = 5$. [Figure: four log-log panels of variance versus $n$ (from $2^6$ to $2^{12}$), one for each of $c_j = 1$, $c_j = j$, $c_j = j^2$, $c_j = j^3$; legend entry: exploration.]

(69) Estimated variance vs $n$ for $s = 10$. [Figure: four log-log panels of variance versus $n$ (from $2^6$ to $2^{12}$), one for each of $c_j = 1$, $c_j = j$, $c_j = j^2$, $c_j = j^3$; legend entry: exploration.]

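(70) ANOVA decomposition

The Fourier expansion has too many terms to handle. As a cruder expansion, we can write $f(\mathbf{u}) = f(u_1, \dots, u_s)$ as:
$$f(\mathbf{u}) = \sum_{\mathfrak{u} \subseteq \{1, \dots, s\}} f_{\mathfrak{u}}(\mathbf{u}) = \mu + \sum_{i=1}^{s} f_{\{i\}}(u_i) + \sum_{1 \le i < j \le s} f_{\{i,j\}}(u_i, u_j) + \cdots$$
where
$$f_{\mathfrak{u}}(\mathbf{u}) = \int_{[0,1)^{|\bar{\mathfrak{u}}|}} f(\mathbf{u})\, d\mathbf{u}_{\bar{\mathfrak{u}}} - \sum_{\mathfrak{v} \subset \mathfrak{u}} f_{\mathfrak{v}}(\mathbf{u}_{\mathfrak{v}}),$$
and the Monte Carlo variance decomposes as
$$\sigma^2 = \sum_{\mathfrak{u} \subseteq \{1, \dots, s\}} \sigma_{\mathfrak{u}}^2, \quad \text{where } \sigma_{\mathfrak{u}}^2 = \mathrm{Var}[f_{\mathfrak{u}}(\mathbf{U})].$$

Sensitivity indices: $S_{\mathfrak{u}} = \sigma_{\mathfrak{u}}^2 / \sigma^2$. Can be estimated by MC or RQMC.

Heuristic intuition: Make sure the projections of $P_n$ are very uniform for the important subsets $\mathfrak{u}$ (i.e., with large $S_{\mathfrak{u}}$).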
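One standard way to estimate the first-order indices $S_{\{j\}}$ by MC is the "pick-freeze" covariance estimator, sketched below. The slides do not say which estimator the authors used, so this is a swapped-in textbook technique; the function name and the test function are assumptions of this example.

```python
import numpy as np

def first_order_indices(f, s, n, seed=0):
    """Pick-freeze MC estimates of the first-order indices S_{j}, j = 1..s.

    Uses Cov[f(A), f(C)] = sigma_{j}^2 when C shares only coordinate j with A.
    """
    rng = np.random.default_rng(seed)
    a = rng.random((n, s))
    b = rng.random((n, s))
    ya = f(a)
    var = ya.var(ddof=1)                   # estimate of the total variance
    indices = np.empty(s)
    for j in range(s):
        c = b.copy()
        c[:, j] = a[:, j]                  # keep coordinate j, resample the rest
        yc = f(c)
        cov = np.mean((ya - ya.mean()) * (yc - yc.mean()))
        indices[j] = cov / var
    return indices

# Hypothetical test function f(u) = u_1 + 2 u_2: exact indices are 0.2 and 0.8.
S = first_order_indices(lambda u: u[:, 0] + 2 * u[:, 1], s=2, n=200_000)
```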

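(71) Shift-invariant discrepancy

In a reproducing kernel Hilbert space (RKHS) with kernel $K$, and randomly-shifted points, the relevant discrepancy corresponds to the shift-invariant kernel
$$K_{\mathrm{sh}}(\mathbf{u}_i, \mathbf{u}_j) := E[K(\mathbf{U}_i, \mathbf{U}_j)] = E[K(\mathbf{u}_i + \mathbf{U}, \mathbf{u}_j + \mathbf{U})] = E[K(\mathbf{u}_i - \mathbf{u}_j + \mathbf{U}, \mathbf{U})].$$

The mean square discrepancy can be written as
$$E[D^2(P_n)] = \frac{1}{n} \sum_{i=0}^{n-1} \sum_{\mathbf{0} \ne \mathbf{h} \in \mathbb{Z}^s} w(\mathbf{h})\, e^{2\pi\sqrt{-1}\, \mathbf{h}^t \mathbf{u}_i} = \sum_{\mathbf{0} \ne \mathbf{h} \in L_s^*} w(\mathbf{h}) \quad \text{(for a lattice)}.$$

Key issue: choice of the weights $w(\mathbf{h})$.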

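(73) Regrouping by projections: projection weights

Denote $\mathfrak{u}(\mathbf{h}) = \mathfrak{u}(h_1, \dots, h_s)$ the set of indices $j$ for which $h_j \ne 0$. We have
$$E[D^2(P_n)] = \sum_{\mathfrak{u} \subseteq \{1, \dots, s\}} \ \sum_{\mathbf{h} \,:\, \mathfrak{u}(\mathbf{h}) = \mathfrak{u}} w(\mathbf{h})\, \frac{1}{n} \sum_{i=0}^{n-1} e^{2\pi\sqrt{-1}\, \mathbf{h}^t \mathbf{u}_i} = \sum_{\mathfrak{u} \subseteq \{1, \dots, s\}} D_{\mathfrak{u}}^2(P_n).$$

The RKHS decomposes as a direct sum and the RQMC variance has a corresponding decomposition
$$\mathrm{Var}[\hat\mu_{n,\mathrm{rqmc}}] = \sum_{\mathfrak{u} \subseteq \{1, \dots, s\}} \mathrm{Var}[\hat\mu_{n,\mathrm{rqmc}}(f_{\mathfrak{u}})].$$

Restriction on the weights: take $w(\mathbf{h}) = \gamma_{\mathfrak{u}(\mathbf{h})} \prod_{j \in \mathfrak{u}(\mathbf{h})} h_j^{-2\alpha}$ for all $\mathbf{h}$. Those projection weights are the so-called general weights. Anyhow, how should we choose them?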

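(74) Example: a weighted Sobolev space

Space of functions with integrable partial derivatives. RKHS with kernel
$$K(\mathbf{u}, \mathbf{x}) = \sum_{\mathfrak{u} \subseteq \{1, \dots, s\}} \gamma_{\mathfrak{u}} \prod_{j \in \mathfrak{u}} 2\pi^2 \left[ B_2((u_j - x_j) \bmod 1)/2 + (u_j - 0.5)(x_j - 0.5) \right],$$
where $B_2(u) = u^2 - u + 1/6$. The shift-invariant kernel is
$$K_{\mathrm{sh}}(\mathbf{u}, \mathbf{x}) = \sum_{\mathfrak{u} \subseteq \{1, \dots, s\}} \gamma_{\mathfrak{u}} \prod_{j \in \mathfrak{u}} 2\pi^2 B_2((u_j - x_j) \bmod 1)$$
and the corresponding mean square discrepancy for a randomly-shifted lattice rule with $\mathbf{v}_1 = (v_1, \dots, v_s) = \mathbf{z}/n$ is
$$E[D^2(P_n)] = \frac{1}{n} \sum_{i=1}^{n} \sum_{\mathfrak{u} \subseteq \{1, \dots, s\}} \gamma_{\mathfrak{u}} \prod_{j \in \mathfrak{u}} 2\pi^2 B_2((i\, v_j) \bmod 1).$$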
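For product weights $\gamma_{\mathfrak{u}} = \prod_{j \in \mathfrak{u}} \gamma_j$, the sum over nonempty subsets collapses to a product, since $\sum_{\emptyset \ne \mathfrak{u}} \prod_{j \in \mathfrak{u}} a_j = \prod_j (1 + a_j) - 1$, so the discrepancy can be computed in $O(ns)$ operations. The sketch below makes two assumptions not stated on the slide: product weights, and a sum restricted to nonempty $\mathfrak{u}$. The function names are hypothetical.

```python
import numpy as np

def b2(x):
    """Bernoulli polynomial B_2(x) = x^2 - x + 1/6."""
    return x * x - x + 1.0 / 6.0

def mean_square_discrepancy(z, n, gamma):
    """E[D^2(P_n)] for the rank-1 lattice z/n with product weights gamma_j.

    Sums over nonempty subsets u via prod_j (1 + a_j) - 1.
    """
    z = np.asarray(z)
    gamma = np.asarray(gamma, dtype=float)
    u = (np.arange(n)[:, None] * z % n) / n          # points (i z mod n) / n
    terms = 1.0 + gamma * 2.0 * np.pi ** 2 * b2(u)   # one factor per coordinate
    return terms.prod(axis=1).mean() - 1.0

# One-dimensional sanity checks against the dual-lattice sum of h^(-2):
d1 = mean_square_discrepancy([1], 1, [1.0])   # dual = Z:   sum = pi^2 / 3
d2 = mean_square_discrepancy([1], 2, [1.0])   # dual = 2Z:  sum = pi^2 / 12
```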

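(75) From the Fourier expansion of $B_2$, we also have
$$E[D^2(P_n)] = \frac{1}{n} \sum_{i=1}^{n} \sum_{\mathfrak{u} \subseteq \{1, \dots, s\}} \gamma_{\mathfrak{u}} \prod_{j \in \mathfrak{u}} \sum_{h_j \ne 0} h_j^{-2}\, e^{2\pi\sqrt{-1}\, i h_j v_j} = \sum_{\mathbf{0} \ne \mathbf{h} \in L_s^*} \gamma_{\mathfrak{u}(\mathbf{h})} \prod_{j \in \mathfrak{u}(\mathbf{h})} h_j^{-2} = \sum_{\mathbf{0} \ne \mathbf{h} \in L_s^*} w(\mathbf{h}).$$

For those weights, we have $w(\mathbf{h}) = |\hat f(\mathbf{h})|^2$ for the function
$$f(\mathbf{u}) = \sum_{\mathfrak{u} \subseteq \{1, \dots, s\}} (2\pi)^{|\mathfrak{u}|}\, \gamma_{\mathfrak{u}}^{1/2} \prod_{j \in \mathfrak{u}} (u_j - 0.5),$$
so $E[D^2(P_n)]$ is the RQMC variance for this $f$.

On the other hand, the ANOVA variance components for this $f$ are
$$\sigma_{\mathfrak{u}}^2 = (4\pi^2)^{|\mathfrak{u}|}\, \gamma_{\mathfrak{u}} \prod_{j \in \mathfrak{u}} \mathrm{Var}[U - 0.5] = (4\pi^2/12)^{|\mathfrak{u}|}\, \gamma_{\mathfrak{u}} = (\pi^2/3)^{|\mathfrak{u}|}\, \gamma_{\mathfrak{u}}.$$

The optimal weights for this $f$ are then
$$\gamma_{\mathfrak{u}} = (3/\pi^2)^{|\mathfrak{u}|}\, \sigma_{\mathfrak{u}}^2 \approx (0.30396)^{|\mathfrak{u}|}\, \sigma_{\mathfrak{u}}^2.$$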

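(77) Using the same kernel and a different heuristic argument, Wang and Sloan (2006) come up with weights that generalize to (they do this for product weights only):
$$\gamma_{\mathfrak{u}}^2 = (45/\pi^4)^{|\mathfrak{u}|}\, \sigma_{\mathfrak{u}}^2,$$
that is,
$$\gamma_{\mathfrak{u}} = (\sqrt{45}/\pi^2)^{|\mathfrak{u}|}\, \sigma_{\mathfrak{u}} \approx (0.6797)^{|\mathfrak{u}|}\, \sigma_{\mathfrak{u}}.$$

With $\gamma_{\mathfrak{u}} = 1$, we obtain the classical (unweighted) $P_2$.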

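(79) Weighted $P_{2\alpha}$:
$$P_{2\alpha} = \sum_{\mathbf{0} \ne \mathbf{h} \in L_s^*} \gamma_{\mathfrak{u}(\mathbf{h})} (\max(1, |h_1|) \cdots \max(1, |h_s|))^{-2\alpha}.$$

Variance for a worst-case function whose square Fourier coefficients are $|\hat f(\mathbf{h})|^2 = \gamma_{\mathfrak{u}(\mathbf{h})} (\max(1, |h_1|) \cdots \max(1, |h_s|))^{-2\alpha}$. This is the RQMC variance for the function
$$f(\mathbf{u}) = \sum_{\mathfrak{u} \subseteq \{1, \dots, s\}} \sqrt{\gamma_{\mathfrak{u}}} \prod_{j \in \mathfrak{u}} \frac{(2\pi)^\alpha}{\alpha!} B_\alpha(u_j).$$

We also have for this $f$:
$$\sigma_{\mathfrak{u}}^2 = \gamma_{\mathfrak{u}} \left[ \mathrm{Var}[B_\alpha(U)]\, \frac{(4\pi^2)^\alpha}{(\alpha!)^2} \right]^{|\mathfrak{u}|} = \gamma_{\mathfrak{u}} \left[ |B_{2\alpha}(0)|\, \frac{(4\pi^2)^\alpha}{(2\alpha)!} \right]^{|\mathfrak{u}|}.$$

For $\alpha = 1$, we should take $\gamma_{\mathfrak{u}} = (3/\pi^2)^{|\mathfrak{u}|}\, \sigma_{\mathfrak{u}}^2 \approx (0.30396)^{|\mathfrak{u}|}\, \sigma_{\mathfrak{u}}^2$.
For $\alpha = 2$, we should take $\gamma_{\mathfrak{u}} = (45/\pi^4)^{|\mathfrak{u}|}\, \sigma_{\mathfrak{u}}^2 \approx (0.46197)^{|\mathfrak{u}|}\, \sigma_{\mathfrak{u}}^2$.

The ratios weight / variance should decrease exponentially with $|\mathfrak{u}|$.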
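The two constants follow from inverting the bracket in the expression for $\sigma_{\mathfrak{u}}^2$, which gives a per-coordinate ratio $\rho_\alpha = (2\alpha)! / ((4\pi^2)^\alpha |B_{2\alpha}(0)|)$. The small Python check below uses the Bernoulli numbers $B_2 = 1/6$ and $B_4 = -1/30$; the symbol $\rho_\alpha$ and the function name `rho` are notation introduced for this example.

```python
from math import factorial, pi

# Bernoulli numbers B_2 and B_4 (values of B_{2 alpha}(0) for alpha = 1, 2).
BERNOULLI = {2: 1.0 / 6.0, 4: -1.0 / 30.0}

def rho(alpha):
    """Per-coordinate ratio gamma_u / sigma_u^2 = rho(alpha) ** |u|."""
    return factorial(2 * alpha) / ((4 * pi ** 2) ** alpha * abs(BERNOULLI[2 * alpha]))

rho1 = rho(1)   # 3 / pi^2
rho2 = rho(2)   # 45 / pi^4
```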

(82) Heuristics for choosing the weights

Idea 1: take $\gamma_{\mathfrak{u}} \approx \sigma_{\mathfrak{u}}^2$ or $\gamma_{\mathfrak{u}} \approx S_{\mathfrak{u}}$ for each $\mathfrak{u}$. Too simplistic.

Idea 2: Just take simple order-dependent weights. For example, $\gamma_{\mathfrak{u}} = 1$ for $|\mathfrak{u}| \le d$ and $\gamma_{\mathfrak{u}} = 0$ otherwise. Wang (2007) recommends this with $d = 2$.

Idea 3: In general, one can define a simple parametric model for the weights and then estimate the parameters by matching the ANOVA variances (e.g., Wang and Sloan 2006). For example, $\gamma_{\mathfrak{u}} = \prod_{j \in \mathfrak{u}} \gamma_j$ for some constants $\gamma_j \ge 0$ (product weights). Fewer parameters: take $\gamma_j = a \beta^j$ for $a, \beta > 0$ (geometric).

With a weighted $P_\alpha$-type criterion, we should have $\gamma_{\mathfrak{u}} = \rho^{|\mathfrak{u}|}\, \sigma_{\mathfrak{u}}^2$ for some $\rho > 0$.

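(83) Proposal 4: A strategy for order-dependent weights

Assume $\gamma_{\mathfrak{u}} = \Gamma_{|\mathfrak{u}|}$. Need to select $\Gamma_1, \dots, \Gamma_s$.

For each $\mathfrak{u}$, let $v_{\mathfrak{u}}^2$ be an estimate of the optimal $\gamma_{\mathfrak{u}}$. Strategy: take $\Gamma_r$ as the average
$$\Gamma_r = \binom{s}{r}^{-1} \sum_{\{\mathfrak{u} \,:\, |\mathfrak{u}| = r\}} v_{\mathfrak{u}}^2.$$

Here, scaling all weights by the same factor changes nothing.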
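The averaging step can be sketched as follows. The dictionary layout (subsets stored as `frozenset` keys, missing subsets treated as 0) and the numerical values are made up for this illustration.

```python
from itertools import combinations
from math import comb

def order_dependent_weights(v2, s):
    """Gamma_r = average of v_u^2 over all subsets u of {1..s} with |u| = r.

    v2 maps frozenset(u) to the estimate v_u^2; missing subsets count as 0.
    """
    gammas = {}
    for r in range(1, s + 1):
        total = sum(v2.get(frozenset(u), 0.0)
                    for u in combinations(range(1, s + 1), r))
        gammas[r] = total / comb(s, r)
    return gammas

# Hypothetical estimates for s = 3.
v2 = {
    frozenset({1}): 0.5, frozenset({2}): 0.3, frozenset({3}): 0.1,
    frozenset({1, 2}): 0.06, frozenset({1, 3}): 0.03,
}
Gamma = order_dependent_weights(v2, 3)
```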

(84) Proposal 5: A strategy for product weights

Ignore one-dimensional projections; they are the same for all lattices.

The idea is to fit the estimated “optimal weights” over all two-dimensional projections via a least-squares procedure. Then we rescale all the weights by a constant factor to match the ratio of average estimated “optimal weights” over the three-dimensional projections to that over the two-dimensional projections.

Let $\tau_j$ be the unscaled weight for projection $j$. We first minimize
$$R = \sum_{k=1}^{s} \sum_{j=1}^{k-1} \left( \tau_j \tau_k - v_{\{j,k\}}^2 \right)^2.$$

Differentiating w.r.t. $\tau_j$ and equating to 0, we obtain, for each $j$,
$$\tau_j \sum_{k=1,\, k \ne j}^{s} \tau_k^2 = \sum_{k=1,\, k \ne j}^{s} \tau_k\, v_{\{j,k\}}^2.$$

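(85) This can be solved by an iterative fixed-point algorithm:
$$\tau_j^{(0)} = \max_{k,l = 1, \dots, s} v_{\{k,l\}}, \qquad \tau_j^{(i+1)} = \frac{\sum_{k=1,\, k \ne j}^{s} \tau_k^{(i)}\, v_{\{j,k\}}^2}{\sum_{k=1,\, k \ne j}^{s} \left( \tau_k^{(i)} \right)^2}, \quad \text{for } i = 0, 1, 2, \dots$$

We then rescale the weights via $\gamma_j = c\, \tau_j$ where the constant $c$ satisfies
$$\frac{\sum_{k=1}^{s} \sum_{j=1}^{k-1} \tau_j \tau_k}{\sum_{k=1}^{s} \sum_{j=1}^{k-1} \sum_{l=1}^{j-1} \tau_j \tau_k \tau_l} = c\, \frac{\sum_{k=1}^{s} \sum_{j=1}^{k-1} v_{\{j,k\}}^2}{\sum_{k=1}^{s} \sum_{j=1}^{k-1} \sum_{l=1}^{j-1} v_{\{j,k,l\}}^2}.$$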
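A sketch of this fitting procedure is given below. One deliberate difference from the slide: the code updates the $\tau_j$ sequentially within each sweep (Gauss-Seidel style, each update being an exact minimization of $R$ in one coordinate), rather than all at once from the previous iterate. That substitution, the function name, the number of sweeps, and the 0-based indexing are choices made for this example.

```python
from itertools import combinations

def fit_product_weights(v2_pairs, v2_triples, s, sweeps=500):
    """Least-squares fit of tau_j tau_k to v2_pairs, then rescaling by c.

    v2_pairs / v2_triples map frozensets of 0-based indices to v_u^2.
    Returns (gamma, tau) with gamma_j = c * tau_j.
    """
    tau = [max(v2_pairs.values()) ** 0.5] * s       # tau^(0) = max v_{k,l}
    for _ in range(sweeps):
        for j in range(s):                           # sequential coordinate updates
            num = sum(tau[k] * v2_pairs[frozenset((j, k))]
                      for k in range(s) if k != j)
            den = sum(tau[k] ** 2 for k in range(s) if k != j)
            tau[j] = num / den
    pair_sum = sum(tau[j] * tau[k] for j, k in combinations(range(s), 2))
    triple_sum = sum(tau[j] * tau[k] * tau[l]
                     for j, k, l in combinations(range(s), 3))
    # c solves: pair_sum / triple_sum = c * sum(v2_pairs) / sum(v2_triples).
    c = (pair_sum / triple_sum) / (sum(v2_pairs.values()) / sum(v2_triples.values()))
    return [c * t for t in tau], tau

# Synthetic check: if v_u^2 is exactly a product of t_j, the fit recovers t.
t = [1.0, 2.0, 3.0]
pairs = {frozenset((j, k)): t[j] * t[k] for j, k in combinations(range(3), 2)}
triples = {frozenset((0, 1, 2)): t[0] * t[1] * t[2]}
gamma, tau = fit_product_weights(pairs, triples, 3)
```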

(87) Idea 6: Control the shortest vector in dual lattice, for each projection.

Spectral test for LCGs (Knuth, Fishman, etc.):
$$\min_{2 \le r \le t_1} \frac{\ell_{\{1, \dots, r\}}}{\ell_r^*(n)}.$$

Lemieux and L’Ecuyer (2000, etc.) maximize
$$M_{t_1, \dots, t_d} = \min\left[ \min_{2 \le r \le t_1} \frac{\ell_{\{1, \dots, r\}}}{\ell_r^*(n)},\ \ \min_{2 \le r \le d}\ \min_{\substack{\mathfrak{u} = \{j_1, \dots, j_r\} \subset \{1, \dots, s\} \\ 1 = j_1 < \cdots < j_r \le t_r}} \frac{\ell_{\mathfrak{u}}}{\ell_r^*(n)} \right],$$
where $\ell_{\mathfrak{u}}$ is the length of a shortest vector in $L_s^*(\mathfrak{u})$ and $\ell_r^*(n)$ is a theoretical upper bound on this length, in $r$ dimensions.

Advantages: The computing times of the $\ell_{\mathfrak{u}}$ are almost independent of $n$, although exponential in $|\mathfrak{u}|$. Poor lattices can be eliminated quickly: search is fast.

This can of course be generalized by adding weights to projections.

(90) Searching for lattice parameters

Korobov lattices. Search for $\mathbf{z} = (1, a, a^2, \dots)$ over all admissible integers $a$.

Component by component (CBC) construction. Let $z_1 = 1$; for $j = 2, \dots, s$, find $z_j \in \{1, \dots, n-1\}$, $\gcd(z_j, n) = 1$, such that $(z_1, \dots, z_{j-1}, z_j)$ minimizes the selected discrepancy for the first $j$ dimensions.

Partial randomized CBC construction. Let $z_1 = 1$; for $j = 2, \dots, s$, try $r$ random $z_j \in \{1, \dots, n-1\}$, $\gcd(z_j, n) = 1$, and retain the one for which $(z_1, \dots, z_{j-1}, z_j)$ minimizes the selected discrepancy for the first $j$ dimensions.
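The full CBC construction can be sketched in a few lines once a discrepancy is chosen. The example below uses the weighted $P_2$ criterion with product weights as the "selected discrepancy"; that choice, the brute-force $O(s n^2)$ search, and the function names are assumptions of this sketch (fast CBC implementations exist but are not shown).

```python
import math
import numpy as np

def p2_product(z, n, gamma):
    """Weighted P_2 with product weights for the rank-1 lattice z/n."""
    u = (np.arange(n)[:, None] * np.asarray(z) % n) / n
    b2 = u * u - u + 1.0 / 6.0
    terms = 1.0 + np.asarray(gamma, dtype=float) * 2.0 * np.pi ** 2 * b2
    return terms.prod(axis=1).mean() - 1.0

def cbc(n, s, gamma):
    """Component-by-component search: fix z_1 = 1, then add one z_j at a time."""
    z = [1]
    candidates = [c for c in range(1, n) if math.gcd(c, n) == 1]
    for j in range(1, s):
        best = min(candidates,
                   key=lambda c: p2_product(z + [c], n, gamma[:j + 1]))
        z.append(best)
    return z

z = cbc(31, 3, [1.0, 1.0, 1.0])
```

For the partial randomized variant, `candidates` would be replaced at each step by $r$ values drawn at random from the admissible set.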

(91) Example: stochastic activity network [Elmaghraby 1977]

Each arc $j$ has random length $V_j = F_j^{-1}(U_j)$. Let $T = f(U_1, \dots, U_{13})$ = length of longest path from node 1 to node 9. Want to estimate $q(x) = P[T > x]$ for a given constant $x$.

[Figure: the network, with nodes 1 (source) to 9 (sink) and arcs labeled $V_1$ to $V_{13}$.]

(94) To estimate $q(x)$ by MC, we generate $n$ independent realizations of $T$, say $T_1, \dots, T_n$, and take $(1/n) \sum_{i=1}^{n} I[T_i > x]$.

For RQMC, we replace the $n$ realizations of $(U_1, \dots, U_{13})$ by the $n$ points of a randomly-shifted lattice.

Illustration: $V_j \sim \mathrm{Normal}(\mu_j, \sigma_j^2)$ for $j = 1, 2, 4, 11, 12$, and $V_j \sim \mathrm{Exponential}(1/\mu_j)$ otherwise.

The $\mu_j$: 13.0, 5.5, 7.0, 5.2, 16.5, 14.7, 10.3, 6.0, 4.0, 20.0, 3.2, 3.2, 16.5.

CMC estimator. Generate the $V_j$’s only for the 8 arcs that do not belong to the cut $\mathcal{L} = \{5, 6, 7, 9, 10\}$, and replace $I[T > x]$ by its conditional expectation given those $V_j$’s,
$$P[T > x \mid \{V_j,\ j \notin \mathcal{L}\}].$$
This makes the integrand continuous in the $U_j$’s.
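The crude estimator of $q(x)$ can be sketched as a function of a matrix of uniforms, so that the same code accepts either independent points (MC) or a randomly-shifted lattice (RQMC). The six paths are the ones listed on the ANOVA slide of this example. Two things are assumptions of this sketch and do not come from the slides: the standard deviations $\sigma_j$ are not given, so the code uses a placeholder $\sigma_j = \mu_j / 4$, and $\mathrm{Exponential}(1/\mu_j)$ is read as an exponential with rate $1/\mu_j$, i.e. mean $\mu_j$.

```python
import numpy as np
from statistics import NormalDist

MU = [13.0, 5.5, 7.0, 5.2, 16.5, 14.7, 10.3, 6.0, 4.0, 20.0, 3.2, 3.2, 16.5]
NORMAL_ARCS = {1, 2, 4, 11, 12}
PATHS = [(1, 5, 11), (2, 6, 11), (1, 3, 6, 11),
         (1, 4, 7, 12, 13), (1, 4, 8, 9, 13), (1, 4, 8, 10, 11)]
SIGMA_RATIO = 0.25   # ASSUMPTION: sigma_j = mu_j / 4 (not stated in the slides)

def longest_path(u):
    """Map an (n, 13) array of uniforms to the n longest-path lengths T."""
    u = np.clip(u, 1e-15, 1 - 1e-15)             # keep inverse CDFs finite
    v = np.empty_like(u)
    for j in range(1, 14):
        mu, col = MU[j - 1], u[:, j - 1]
        if j in NORMAL_ARCS:
            dist = NormalDist(mu, SIGMA_RATIO * mu)
            v[:, j - 1] = [dist.inv_cdf(x) for x in col]   # inversion, normal
        else:
            v[:, j - 1] = -mu * np.log1p(-col)             # inversion, mean mu
    lengths = [sum(v[:, j - 1] for j in path) for path in PATHS]
    return np.max(lengths, axis=0)

def estimate_q(u, x):
    """Fraction of the n realizations with T > x."""
    return float(np.mean(longest_path(u) > x))

rng = np.random.default_rng(4)
u = rng.random((2000, 13))                        # crude MC points
q30, q64, q100 = (estimate_q(u, x) for x in (30, 64, 100))
```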


(96) ANOVA variances for the stochastic activity network. [Figure: stacked bars giving the percentage of total variance carried by the ANOVA terms of order 1 to 7, for x = 64, x = 100, CMC x = 30, CMC x = 64, and CMC x = 100.]

(97) ANOVA decomposition.
There are six paths from 1 to 9: {1, 5, 11}, {2, 6, 11}, {1, 3, 6, 11}, {1, 4, 7, 12, 13}, {1, 4, 8, 9, 13}, {1, 4, 8, 10, 11}.
Intuition: the important projections should be only the subsets of those paths.
Fraction of the total variance that lies in these projections:
  x = 30: 88.8 %
  x = 64: 80.6 % (crude MC), 99.5 % (conditional MC)
  x = 100: 96.3 % (crude MC), 100 % (conditional MC)

(98) Lattices of rank 1 with CBC. Stochastic activity network (x = 64). [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Curves: MC; Sobol'; M13,13,13,13,13,13; P2 order 2 only; P2 product with $v_u^2 = (3/\pi^2)^{|u|}\sigma_u^2$; P2 product with $v_u^2 = (45/\pi^4)^{|u|}\sigma_u^2$; P2 product, Wang & Sloan (2006); reference slope $n^{-2}$.]

(99) Lattices of rank 1 with CBC. Stochastic activity network (x = 100). [Figure: same axes and curves as for x = 64.]

(100) Lattices of rank 1 with CBC. Stochastic activity network (CMC, x = 30). [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Curves: MC; Sobol'; M13,13,13,13,13,13; weighted M13,13,13,13,13,13; P2 order 2 only; P2 order with $v_u^2 = (3/\pi^2)^{|u|}\sigma_u^2$; P2 product with $v_u^2 = (3/\pi^2)^{|u|}\sigma_u^2$, with and without the baker transformation; P2 product with $v_u^2 = (45/\pi^4)^{|u|}\sigma_u^2$; P2 product, Wang & Sloan (2006); reference slope $n^{-2}$.]

(101) Lattices of rank 1 with CBC. Stochastic activity network (CMC, x = 64). [Figure: same axes and curves as for CMC, x = 30.]

(102) Lattices of rank 1 with CBC. Stochastic activity network (CMC, x = 100). [Figure: same axes and curves as for CMC, x = 30.]

(103) Random vs. full CBC. Stochastic activity network (CMC, x = 30). [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Curves: full CBC (P2 product); random CBC (P2 product).]

(104) Random vs. full CBC. Stochastic activity network (CMC, x = 64). [Figure: same axes and curves.]

(105) Random vs. full CBC. Stochastic activity network (CMC, x = 100). [Figure: same axes and curves.]

(106) Prime vs. power-of-2 number of points. Stochastic activity network (CMC, x = 30). [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Curves: prime n (P2 product); n a power of 2 (P2 product).]

(107) Prime vs. power-of-2 number of points. Stochastic activity network (CMC, x = 64). [Figure: same axes and curves.]

(108) Prime vs. power-of-2 number of points. Stochastic activity network (CMC, x = 100). [Figure: same axes and curves.]

(109) Korobov vs. CBC. Stochastic activity network (CMC, x = 30). [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Criteria: M32,24,16,12; M13,13,13,13,13,13; P2 product. Solid lines: CBC. Dashed lines: Korobov.]

(110) Korobov vs. CBC. Stochastic activity network (CMC, x = 64). [Figure: same axes and curves.]

(111) Korobov vs. CBC. Stochastic activity network (CMC, x = 100). [Figure: same axes and curves.]

(112) Histograms, for n = 8191 and $m = 10^4$ replications. [Figure: three histograms for x = 100, with probability on the vertical axis: single MC draw, MC estimator, and RQMC estimator.]

(113) Histograms. [Figure: three histograms for CMC, x = 100, with probability on the vertical axis: single MC draw, MC estimator, and RQMC estimator.]

(114) Function of a multinormal vector.
Let $\mu = E[f(U)] = E[g(Y)]$ where $Y = (Y_1, \dots, Y_s) \sim N(0, \Sigma)$.
For example, if the payoff of a financial derivative is a function of the values taken by a c-dimensional geometric Brownian motion (GBM) at d observation times $0 < t_1 < \cdots < t_d = T$, then we have s = cd.
To generate Y: decompose $\Sigma = AA^t$, generate $Z = (Z_1, \dots, Z_s) = (\Phi^{-1}(U_1), \dots, \Phi^{-1}(U_s)) \sim N(0, I)$, and return Y = AZ.
Choice of A? Cholesky factorization: A is lower triangular.

(119) Principal component decomposition (PCA): $A = PD^{1/2}$ where $D = \mathrm{diag}(\lambda_s, \dots, \lambda_1)$ (eigenvalues of Σ in decreasing order) and the columns of P are the corresponding unit-length eigenvectors. With this A, $Z_1$ accounts for the maximum amount of variance of Y, then $Z_2$ for the maximum amount of variance conditional on $Z_1$, and so on.
Function of a Brownian motion: the payoff depends on a c-dimensional Brownian motion $\{X(t), t \ge 0\}$ observed at times $0 = t_0 < t_1 < \cdots < t_d$.
Sequential (or random walk) method: generate $X(t_1)$, then $X(t_2) - X(t_1)$, then $X(t_3) - X(t_2)$, etc.
Brownian bridge (BB) sampling: suppose $d = 2^m$. Generate $X(t_d)$, then $X(t_{d/2})$ conditional on $(X(0), X(t_d))$, then $X(t_{d/4})$ conditional on $(X(0), X(t_{d/2}))$, and so on. The first few N(0, 1) r.v.'s already sketch the path trajectory.
Each of these methods corresponds to some matrix A. The choice has a large impact on the ANOVA decomposition of f.
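The three choices of A can be compared numerically. The sketch below, in Python, assumes a one-dimensional standard Brownian motion, for which $\Sigma_{ij} = \min(t_i, t_j)$ and the sequential method coincides with the Cholesky factor. The function names are hypothetical, and `bb_matrix` assumes that the number of observation times is a power of 2.

```python
import numpy as np

def bm_covariance(t):
    """Covariance of a standard Brownian motion at times t: min(t_i, t_j)."""
    t = np.asarray(t, dtype=float)
    return np.minimum.outer(t, t)

def cholesky_matrix(cov):
    """Lower-triangular A; for Brownian motion this is the sequential method."""
    return np.linalg.cholesky(cov)

def pca_matrix(cov):
    """A = P D^{1/2}, with eigenvalues sorted in decreasing order."""
    lam, P = np.linalg.eigh(cov)              # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]
    return P[:, order] * np.sqrt(lam[order])

def bb_matrix(t):
    """Brownian bridge A for d = 2^m times: Z_1 gives X(t_d), then midpoints."""
    d = len(t)
    times = np.concatenate(([0.0], np.asarray(t, dtype=float)))
    rows = np.zeros((d + 1, d))               # rows[k] expresses X(t_k) in terms of Z
    rows[d, 0] = np.sqrt(times[d])
    k, step = 1, d
    while step > 1:
        half = step // 2
        for left in range(0, d, step):
            right, mid = left + step, left + half
            tl, tm, tr = times[left], times[mid], times[right]
            # conditional mean is a linear interpolation of the two end points
            rows[mid] = ((tr - tm) * rows[left] + (tm - tl) * rows[right]) / (tr - tl)
            # conditional standard deviation multiplies the next fresh normal
            rows[mid, k] = np.sqrt((tm - tl) * (tr - tm) / (tr - tl))
            k += 1
        step = half
    return rows[1:]
```

Each matrix satisfies $AA^t = \Sigma$; what differs is how much of the total variance the first columns carry, which is the quantity that matters for RQMC.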

(125) Example: pricing an Asian option.
Single asset, s observation times $t_1, \dots, t_s$. We want to estimate $E[f(U)]$, where
$$f(U) = e^{-r t_s} \max\Bigl(0,\ \frac{1}{s} \sum_{j=1}^{s} S(t_j) - K\Bigr)$$
and $\{S(t), t \ge 0\}$ is a geometric Brownian motion. We have $f(U) = g(Y)$ where $Y = (Y_1, \dots, Y_s) \sim N(0, \Sigma)$.
Let S(0) = 100, K = 100, r = 0.05, $t_s = 1$, and $t_j = jT/s$ for $1 \le j \le s$. We consider σ = 0.2, 0.5 and s = 3, 6, 12.
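As an illustration, the payoff can be evaluated with the sequential method and plain MC as in the Python sketch below. It assumes the usual risk-neutral dynamics $S(t) = S(0)\exp[(r - \sigma^2/2)t + \sigma B(t)]$, which the slide does not spell out, and the function name `asian_payoff` is hypothetical.

```python
import numpy as np

def asian_payoff(Z, s0=100.0, K=100.0, r=0.05, sigma=0.5, T=1.0):
    """Discounted payoff for each row of Z, an (n, s) array of N(0, 1) variates."""
    n, s = Z.shape
    dt = T / s
    t = dt * np.arange(1, s + 1)                      # t_j = j T / s
    B = np.sqrt(dt) * np.cumsum(Z, axis=1)            # Brownian motion, sequential method
    S = s0 * np.exp((r - 0.5 * sigma ** 2) * t + sigma * B)   # assumed GBM dynamics
    return np.exp(-r * T) * np.maximum(0.0, S.mean(axis=1) - K)

rng = np.random.default_rng(0)
payoffs = asian_payoff(rng.standard_normal((200_000, 6)))
estimate = payoffs.mean()                             # crude MC estimate of E[f(U)]
half_width = 1.96 * payoffs.std(ddof=1) / np.sqrt(len(payoffs))
```

For RQMC, the rows of `Z` would instead be obtained by applying $\Phi^{-1}$ to the points of a randomly-shifted lattice, possibly followed by a BB or PCA reordering of the coordinates.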

(127) ANOVA variances for the Asian option, with S(0) = 100, K = 100, r = 0.05, σ = 0.5. [Figure: stacked bars giving the percentage of total variance carried by the ANOVA terms of order 1 to 7, for s = 3, 6, 12 and for sequential (seq.), BB, and PCA sampling.]

(128) Total variance per coordinate for the Asian option (s = 6), with S(0) = 100, K = 100, r = 0.05, σ = 0.5. [Figure: stacked bars giving the percentage of total variance carried by coordinates 1 to 6, for sequential, BB, and PCA sampling.]

(129) Lattices of rank 1 with CBC. Asian option (s = 6, sequential), with S(0) = 100, K = 100, r = 0.05, σ = 0.5. [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Curves: MC; Sobol'; M6,6,6,6,6,6; P2 order 2 only; P2 product with $v_u^2 = \sigma_u^2$; P2 product with $v_u^2 = (45/\pi^4)^{|u|}\sigma_u^2$; P2 product, Wang & Sloan (2006); reference slope $n^{-2}$.]

(130) Lattices of rank 1 with CBC. Asian option (BB), s = 6, same parameters. [Figure: same axes and curves.]

(131) Lattices of rank 1 with CBC. Asian option (PCA), s = 6, same parameters. [Figure: same axes and curves.]

(132) Lattices of rank 1 with CBC. Asian option (sequential), s = 12, same parameters. [Figure: same axes and curves, with M12,12,12,12,12,12 in place of M6,6,6,6,6,6.]

(133) Lattices of rank 1 with CBC. Asian option (BB), s = 12, same parameters. [Figure: same axes and curves as for s = 12, sequential.]

(134) Lattices of rank 1 with CBC. Asian option (PCA), s = 12, same parameters. [Figure: same axes and curves as for s = 12, sequential.]

(135) Random vs. full CBC. Asian option (seq.), s = 12, S(0) = 100, K = 100, r = 0.05, σ = 0.5. [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Curves: full CBC (P2 product); random CBC (P2 product).]

(136) Random vs. full CBC. Asian option (BB), s = 12, same parameters. [Figure: same axes and curves.]

(137) Random vs. full CBC. Asian option (PCA), s = 12, same parameters. [Figure: same axes and curves.]

(138) Prime vs. power-of-2 number of points. Asian option (seq.), s = 12, same parameters. [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Curves: prime n (P2 product); n a power of 2 (P2 product).]

(139) Prime vs. power-of-2 number of points. Asian option (BB), s = 12, same parameters. [Figure: same axes and curves.]

(140) Prime vs. power-of-2 number of points. Asian option (PCA), s = 12, same parameters. [Figure: same axes and curves.]

(141) Korobov vs. CBC. Asian option (seq.), s = 6, same parameters. [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Criteria: M32,24,16,12; M6,6,6,6,6,6; P2 product. Solid lines: CBC. Dashed lines: Korobov.]

(142) Korobov vs. CBC. Asian option (BB), s = 6, same parameters. [Figure: same axes and curves.]

(143) Korobov vs. CBC. Asian option (PCA), s = 6, same parameters. [Figure: same axes and curves.]

(144) Korobov vs. CBC. Asian option (seq.), s = 12, same parameters. [Figure: same axes and curves, with M12,12,12,12,12,12 in place of M6,6,6,6,6,6.]

(145) Korobov vs. CBC. Asian option (BB), s = 12, same parameters. [Figure: same axes and curves as for s = 12, seq.]

(146) Korobov vs. CBC. Asian option (PCA), s = 12, same parameters. [Figure: same axes and curves as for s = 12, seq.]

(147) Histograms for the Asian option, s = 6, sequential. [Figure: three histograms, with probability on the vertical axis: single MC draw, MC estimator, and RQMC estimator.]

(148) Histograms for the Asian option, s = 6, PCA. [Figure: three histograms, with probability on the vertical axis: single MC draw, MC estimator, and RQMC estimator.]

(149) A down-and-in Asian option with barrier B.
Same as for the Asian option, except that the payoff is zero unless $\min_{1 \le j \le s} S(t_j) \le 80$.

(150) ANOVA variances for the down-and-in Asian option, with S(0) = K = 100, r = 0.05, σ = 0.2, B = 80. [Figure: stacked bars giving the percentage of total variance carried by the ANOVA terms of order 1 to 7, for s = 3, 6, 12 and for sequential (seq.), BB, and PCA sampling.]

(151) Total variance per coordinate for the down-and-in Asian option (s = 6), with S(0) = K = 100, r = 0.05, σ = 0.2, B = 80. [Figure: stacked bars giving the percentage of total variance carried by coordinates 1 to 6, for sequential, BB, and PCA sampling.]

(152) Lattices of rank 1 with CBC. Down-and-in (seq.), s = 12, S(0) = K = 100, r = 0.05, σ = 0.5, B = 80. [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Curves: MC; Sobol'; M6,6,6,6,6,6; P2 order with $v_u^2 = (45/\pi^4)^{|u|}\sigma_u^2$; P2 product with $v_u^2 = (45/\pi^4)^{|u|}\sigma_u^2$; P2 product with $\gamma_j = 0.5$; reference slope $n^{-2}$.]

(153) Lattices of rank 1 with CBC. Down-and-in (BB), s = 6, same parameters. [Figure: same axes and curves.]

(154) Lattices of rank 1 with CBC. Down-and-in (PCA), s = 6, same parameters. [Figure: same axes and curves.]

(155) Lattices of rank 1 with CBC. Down-and-in (seq.), s = 12, same parameters. [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Curves: MC; Sobol'; M12,12,12,12,12,12; P2 order with $v_u^2 = (45/\pi^4)^{|u|}\sigma_u^2$; P2 product with $\gamma_j = 0.5$; reference slope $n^{-2}$.]

(156) Lattices of rank 1 with CBC. Down-and-in (BB), s = 12, same parameters. [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Curves: MC; Sobol'; M12,12,12,12,12,12; P2 order with $v_u^2 = (45/\pi^4)^{|u|}\sigma_u^2$; P2 product with $v_u^2 = (45/\pi^4)^{|u|}\sigma_u^2$; P2 product with $\gamma_j = 0.5$; reference slope $n^{-2}$.]

(157) Lattices of rank 1 with CBC. Down-and-in (PCA), s = 12, same parameters. [Figure: same axes and curves as for s = 12, BB.]

(158) Call on the maximum of 6 assets.
Each of the 6 asset prices obeys a GBM with $s_0 = 100$, r = 0.05, σ = 0.2. The pairwise correlation between the Brownian motions is 0.3.
The assets pay a dividend at rate 0.10, which means that the effective risk-free rate can be taken as $r' = 0.05 - 0.10 = -0.05$.

(159) ANOVA variances for the maximum of 6 assets, with S(0) = K = 100, r = 0.05, σ = 0.5, ρ = 0.3. [Figure: stacked bars giving the percentage of total variance carried by the ANOVA terms of order 1 to 6, for Cholesky and PCA.]

(160) Total variance per coordinate for the maximum of 6 assets, with S(0) = K = 100, r = 0.05, σ = 0.5, ρ = 0.3. [Figure: stacked bars giving the percentage of total variance carried by coordinates 1 to 6, for Cholesky and PCA.]

(161) Lattices of rank 1 with CBC. Maximum of 6 assets (Cholesky), same parameters. [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Curves: MC; Sobol'; M6,6,6,6,6,6; P2 order with $v_u^2 = (45/\pi^4)^{|u|}\sigma_u^2$; P2 product with $v_u^2 = (45/\pi^4)^{|u|}\sigma_u^2$; P2 product with $\gamma_j = 0.5$; reference slope $n^{-2}$.]

(162) Lattices of rank 1 with CBC. Maximum of 6 assets (PCA), same parameters. [Figure: same axes and curves.]

(163) Prime vs. power-of-2 number of points. Maximum of 6 assets, same parameters. [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Curves: prime n (P2 order); n a power of 2 (P2 order).]

(164) Korobov vs. CBC. Maximum of 12 assets with S(0) = K = 100, r = 0.05, σ = 0.5, ρ = 0.3. [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Criteria: M32,24,16,12; M12,12,12,12,12,12; P2 order. Solid lines: CBC. Dashed lines: Korobov.]

(165) Discrete choice with multinomial mixed logit probability, maximum likelihood estimation.
The utility of alternative j for individual q is
$$U_{q,j} = \beta_q^t x_{q,j} + \epsilon_{q,j} = \sum_{\ell=1}^{s} \beta_{q,\ell}\, x_{q,j,\ell} + \epsilon_{q,j},$$
where
$\beta_q^t = (\beta_{q,1}, \dots, \beta_{q,s})$ gives the tastes of individual q,
$x_{q,j}^t = (x_{q,j,1}, \dots, x_{q,j,s})$ gives the attributes of alternative j for individual q,
$\epsilon_{q,j}$ is the noise; Gumbel of mean 0 and scale parameter λ = 1.
Individual q selects the alternative with the largest utility $U_{q,j}$. We can observe the $x_{q,j}$ and the choices $y_q$, but not the rest.

(166) Logit model: for $\beta_q$ fixed, j is chosen with probability
$$L_q(j \mid \beta_q) = \frac{\exp[\beta_q^t x_{q,j}]}{\sum_{a \in A(q)} \exp[\beta_q^t x_{q,a}]}$$
where A(q) is the set of available alternatives for q.
For a random individual, suppose $\beta_q$ is random with density $f_\theta$, which depends on an (unknown) parameter vector θ. We want to estimate θ from the data (the $x_{q,j}$ and $y_q$). The unconditional probability of choosing j is
$$p_q(j, \theta) = \int L_q(j \mid \beta)\, f_\theta(\beta)\, d\beta.$$
It depends on A(q), j, and θ.

(168) Maximum likelihood: maximize the log of the joint probability of the sample, w.r.t. θ:
$$\ln L(\theta) = \ln \prod_{q=1}^{m} p_q(y_q, \theta) = \sum_{q=1}^{m} \ln p_q(y_q, \theta).$$
There is no formula for $p_q(j, \theta)$, but we can use MC or RQMC, for each q and fixed θ. Generate n realizations of β from $f_\theta$, say $\beta_q^{(1)}(\theta), \dots, \beta_q^{(n)}(\theta)$, and estimate $p_q(y_q, \theta)$ by
$$\hat p_q(y_q, \theta) = \frac{1}{n} \sum_{i=1}^{n} L_q(y_q \mid \beta_q^{(i)}(\theta)).$$
Then we can find the maximizer $\hat\theta$ of $\ln \prod_{q=1}^{m} \hat p_q(y_q, \theta)$ w.r.t. θ.
We take 4 alternatives, with independent attributes, respectively N(1, 1), N(1, 1), N(0.5, 1), N(0.5, 1). We try s = 5, 10, 15. $\beta_q$ is a vector of s independent N(1, 1) random variables.
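The MC estimator of the choice probabilities can be sketched as follows in Python, for one individual and the fixed parameter values of the example above. The function names are hypothetical, and the attributes are drawn once with an arbitrary seed purely for illustration.

```python
import numpy as np

def logit_prob(beta, x):
    """Logit probabilities L_q(j | beta) for each row of beta.

    beta: (n, s) array of taste vectors; x: (J, s) array of attributes.
    Returns an (n, J) array whose rows sum to 1.
    """
    v = beta @ x.T                            # utilities without the Gumbel noise
    v = v - v.max(axis=1, keepdims=True)      # shift for numerical stability
    e = np.exp(v)
    return e / e.sum(axis=1, keepdims=True)

def mc_choice_prob(x, n, rng):
    """Crude MC estimate of p_q(j, theta) for all j, with beta ~ N(1, I)."""
    s = x.shape[1]
    beta = rng.normal(1.0, 1.0, size=(n, s))
    return logit_prob(beta, x).mean(axis=0)

rng = np.random.default_rng(0)
s = 5
means = np.array([1.0, 1.0, 0.5, 0.5])        # attribute means of the 4 alternatives
x = rng.normal(means[:, None], 1.0, size=(4, s))
p_hat = mc_choice_prob(x, 10_000, rng)
```

In a maximum likelihood routine, the same underlying uniforms would typically be reused for every value of θ so that the estimated log-likelihood is a smooth function of θ; that refinement is not shown here.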

(171) ANOVA variances for the mixed logit model. [Figure: stacked bars giving the percentage of total variance carried by the ANOVA terms of order 1 to 7, for s = 5 and s = 15, individuals 1 and 2.]

(172) Total variance per coordinate. Mixed logit model (s = 5). [Figure: stacked bars giving the percentage of total variance carried by coordinates 1 to 5, for individuals 1 and 2.]

(173) Total variance per coordinate. Mixed logit model (s = 15). [Figure: stacked bars giving the percentage of total variance carried by coordinates 1 to 15, for individuals 1 and 2.]

(174) Lattices of rank 1 with CBC. Mixed logit model (s = 10, individual 1). [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Curves: MC; Sobol'; M10,10,10,10,10,10; P2 order with $v_u^2 = (45/\pi^4)^{|u|}\sigma_u^2$; P2 product with $v_u^2 = (45/\pi^4)^{|u|}\sigma_u^2$; P2 product, Wang & Sloan (2006); P2 product with $\gamma_j = 0.5$; reference slope $n^{-2}$.]

(175) Lattices of rank 1 with CBC. Mixed logit model (s = 10, individual 2). [Figure: same axes and curves.]

(176) Lattices of rank 1 with CBC. Mixed logit model (s = 15, individual 1). [Figure: same axes and curves, with M15,15,15,15,15,15 in place of M10,10,10,10,10,10.]

(177) Lattices of rank 1 with CBC. Mixed logit model (s = 15, individual 2). [Figure: same axes and curves as for s = 15, individual 1.]

(178) Random vs. full CBC. Mixed logit model (s = 10, individual 1). [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Curves: full CBC (P2 order); random CBC (P2 order).]

(179) Random vs. full CBC. Mixed logit model (s = 10, individual 2). [Figure: same axes and curves.]

(180) Prime vs. power-of-2 number of points. Mixed logit model (s = 10, individual 1). [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Curves: prime n (P2 order); n a power of 2 (P2 order).]

(181) Prime vs. power-of-2 number of points. Mixed logit model (s = 10, individual 2). [Figure: same axes and curves.]

(182) Korobov vs. CBC. Mixed logit model (s = 10, individual 1). [Figure: variance versus n on log-log axes, n from $2^6$ to $2^{14}$. Criteria: M32,24,16,12; M10,10,10,10,10,10; P2 order. Solid lines: CBC. Dashed lines: Korobov.]

(183) Korobov vs. CBC. Mixed logit model (s = 10, individual 2). [Figure: same axes and curves.]

(184) References for the material of this talk (some were added afterward):
1. P. L’Ecuyer and D. Munger, “On the Choice of Figure of Merit for Randomly-Shifted Lattice Rules”, in Monte Carlo and Quasi Monte Carlo Methods 2010, H. Wozniakowski and L. Plaskota, Eds., Springer-Verlag, Berlin, 2012, 133–159.
2. P. L’Ecuyer and C. Lemieux, “Variance Reduction via Lattice Rules”, Management Science 46, 9 (2000), 1214–1235.
3. P. L’Ecuyer, “Quasi-Monte Carlo Methods in Finance”, Proceedings of the 2004 Winter Simulation Conference, IEEE Press, 2004, 1645–1655.
4. P. L’Ecuyer, “Quasi-Monte Carlo Methods with Applications in Finance”, Finance and Stochastics 13, 3 (2009), 307–349.
5. P. L’Ecuyer and D. Munger, “Algorithm 958: LatticeBuilder: A General Software Tool for Constructing Rank-1 Lattice Rules”, ACM Transactions on Mathematical Software 42, 2, Article 15, 2016.
