
1. Monte Carlo and Randomized Quasi-Monte Carlo Density Estimation by Conditioning. Pierre L'Ecuyer, joint work with Amal Ben Abdellah and Florian Puchhammer. Optimization Days, Montreal, May 2019.

2. What this talk is about.

Monte Carlo (MC) simulation is widely used to estimate the expectation $E[X]$ of a random variable $X$ and to compute a confidence interval on $E[X]$; the estimator $\bar X_n$ satisfies $\mathrm{MSE} = \mathrm{Var}[\bar X_n] = O(n^{-1})$.

But simulation usually provides information to do much more! The output data can be used to estimate the entire distribution of $X$, e.g., the cumulative distribution function (cdf) $F$ of $X$, defined by $F(x) = P[X \le x]$, or its density $f$ defined by $f(x) = F'(x)$.

If $X_1, \dots, X_n$ are $n$ independent realizations of $X$, the empirical cdf
$$\hat F_n(x) = \frac{1}{n} \sum_{i=1}^n I[X_i \le x]$$
is unbiased for $F(x)$ at all $x$, and $\mathrm{Var}[\hat F_n(x)] = O(n^{-1})$.

However, for a continuous r.v. $X$, the density $f$ provides a better visual idea of the distribution. Here we focus on estimating $f$ over $[a,b] \subset \mathbb{R}$. Can we have $E[(\hat f_n(x) - f(x))^2] = O(n^{-1})$?

3. Small example: A stochastic activity network.

The network gives precedence relations between activities. Activity $k$ has a random duration $Y_k$ (also the length of arc $k$) with known cdf $F_k(y) := P[Y_k \le y]$. The project duration $X$ is the (random) length of the longest path from the source to the sink. One can look at a deterministic equivalent of $X$, at $E[X]$, at the cdf, at the density, etc.

We want to estimate the density of $X$, $f(x) = F'(x) = \frac{d}{dx} P[X \le x]$.

The sample cdf $\hat F_n(x) = \frac{1}{n}\sum_{i=1}^n I[X_i \le x]$ is an unbiased estimator of the cdf $F(x) = P[X \le x]$. However, the sample derivative $\hat F'_n(x)$ is useless to estimate $f(x)$, because it is 0 almost everywhere.

[Figure: directed activity network with arcs $Y_1, \dots, Y_{13}$ from the source (node 0) to the sink (node 8).]

4. Numerical illustration from Elmaghraby (1977).

$Y_k \sim N(\mu_k, \sigma_k^2)$ for $k = 1, 2, 4, 11, 12$, and $Y_k \sim \mathrm{Expon}(1/\mu_k)$ otherwise, with $\mu_1, \dots, \mu_{13}$: 13.0, 5.5, 7.0, 5.2, 16.5, 14.7, 10.3, 6.0, 4.0, 20.0, 3.2, 3.2, 16.5.

Results of an experiment with $n = 100{,}000$: mean $= 64.2$, $X_{\mathrm{det}} = 48.2$, $\hat\xi_{0.99} = 131.8$. Note: $X$ is not normal!

[Figure: frequency histogram of the $n$ realizations of $X$.]

5. Density estimation.

Suppose we estimate the density $f$ over a finite interval $[a,b]$. Let $\hat f_n(x)$ denote the density estimator at $x$, with sample size $n$. We use the following measures of error:

MISE = mean integrated squared error $= \int_a^b E[(\hat f_n(x) - f(x))^2]\, dx$ = IV + ISB,
IV = integrated variance $= \int_a^b \mathrm{Var}[\hat f_n(x)]\, dx$,
ISB = integrated squared bias $= \int_a^b (E[\hat f_n(x)] - f(x))^2\, dx$.

6. Density estimation (continued).

Simple histogram: Partition $[a,b]$ into $m$ intervals of size $h = (b-a)/m$ and define
$$\hat f_n(x) = \frac{n_j}{nh} \quad \text{for } x \in I_j = [a+(j-1)h,\, a+jh), \quad j = 1, \dots, m,$$
where $n_j$ is the number of observations $X_i$ that fall in interval $j$.

Kernel density estimator (KDE): Select a kernel $k$ (a unimodal symmetric density centered at 0) and a bandwidth $h > 0$ (a horizontal stretching factor for the kernel). The KDE is
$$\hat f_n(x) = \frac{1}{nh} \sum_{i=1}^n k\!\left(\frac{x - X_i}{h}\right).$$
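As an added illustration (not part of the original slides), here is a minimal Python sketch of the Gaussian-kernel KDE defined above; the kernel choice, the sample data, and the evaluation grid are assumptions made only for the example.

```python
import numpy as np

def kde(x_eval, data, h):
    """Gaussian-kernel KDE: (1/(n h)) * sum_i k((x - X_i)/h) at each point of x_eval."""
    z = (x_eval[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

# Example usage with simulated data (assumed standard normal, for illustration).
rng = np.random.default_rng(0)
data = rng.standard_normal(10_000)
x = np.linspace(-3.0, 3.0, 101)
fhat = kde(x, data, h=0.1)
```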

7. KDE bandwidth selection: an illustration in $s = 1$ dimension.

KDE (blue) vs. true density (red) with RQMC point sets with $n = 2^{19}$: midpoint rule (left), stratified sample of $U = F(X)$ (right).

[Figure: two panels comparing the KDE to the true density for the two point sets.]

8. KDE asymptotic convergence with Monte Carlo for smooth $f$.

For any $g: \mathbb{R} \to \mathbb{R}$, define
$$R(g) = \int_a^b (g(x))^2\, dx, \qquad \mu_r(g) = \int_{-\infty}^{\infty} x^r g(x)\, dx, \quad r = 0, 1, 2, \dots$$

For histograms and KDEs, when $n \to \infty$ and $h \to 0$:
$$\mathrm{AMISE} = \mathrm{AIV} + \mathrm{AISB} \sim \frac{C}{nh} + B h^{\alpha}.$$

            C          B                       α
Histogram   1          R(f')/12                2
KDE         μ0(k²)     (μ2(k))² R(f'')/4       4

9. The asymptotically optimal $h$ is
$$h^* = \left(\frac{C}{B\alpha n}\right)^{1/(\alpha+1)},$$
and it gives $\mathrm{AMISE} = K n^{-\alpha/(1+\alpha)}$.

            C          B                       α    h*                                      AMISE
Histogram   1          R(f')/12                2    (n R(f')/6)^{-1/3}                      O(n^{-2/3})
KDE         μ0(k²)     (μ2(k))² R(f'')/4       4    (μ0(k²) / ((μ2(k))² R(f'') n))^{1/5}    O(n^{-4/5})

To estimate $h^*$, one can estimate $R(f')$ and $R(f'')$ via a KDE (plug-in method). This is under the simplifying assumption that $h$ must be the same over all of $[a,b]$.
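As a quick sanity check on the KDE line of this table (my addition, not from the slides): with a Gaussian kernel, $\mu_0(k^2) = 1/(2\sqrt{\pi})$ and $\mu_2(k) = 1$, and if the target density is $N(0, \sigma^2)$ then $R(f'') = 3/(8\sqrt{\pi}\sigma^5)$, so the $h^*$ formula reduces to the familiar $(4/3)^{1/5}\sigma n^{-1/5}$ rule of thumb. The snippet below simply evaluates both expressions; the normal-target assumption is mine.

```python
import numpy as np

def h_star_kde(mu0_k2, mu2_k, R_f2, n):
    # h* = (mu0(k^2) / ((mu2(k))^2 R(f'') n))^(1/5), as in the table above
    return (mu0_k2 / (mu2_k**2 * R_f2 * n)) ** 0.2

sigma, n = 1.0, 2**19
mu0_k2 = 1.0 / (2.0 * np.sqrt(np.pi))           # integral of k^2 for the Gaussian kernel
R_f2 = 3.0 / (8.0 * np.sqrt(np.pi) * sigma**5)  # R(f'') for the N(0, sigma^2) density
print(h_star_kde(mu0_k2, 1.0, R_f2, n))         # same value as the rule of thumb below
print((4.0 / 3.0) ** 0.2 * sigma * n ** -0.2)
```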

10. Can we take the stochastic derivative of an estimator of $F$?

Can we estimate the density $f(x) = F'(x)$ by the derivative of an estimator of $F(x)$? A simple candidate cdf estimator is the empirical cdf
$$\hat F_n(x) = \frac{1}{n} \sum_{i=1}^n I[X_i \le x].$$
However, $d\hat F_n(x)/dx = 0$ almost everywhere, so this cannot be a useful density estimator! We need a smoother estimator of $F$.

11. Conditional Monte Carlo (CMC) for derivative estimation.

Idea: Replace the indicator $I[X_i \le x]$ by its conditional cdf given filtered information:
$$F(x \mid \mathcal{G}) \stackrel{\mathrm{def}}{=} P[X \le x \mid \mathcal{G}],$$
where the sigma-field $\mathcal{G}$ contains not enough information to reveal $X$, but enough to compute $F(x \mid \mathcal{G})$, and is chosen so that the following holds:

Assumption 1. For all realizations of $\mathcal{G}$, $F(x \mid \mathcal{G})$ is a continuous function of $x$ over $[a,b]$, differentiable except perhaps over a denumerable set of points $D(\mathcal{G}) \subset [a,b]$, and $F'(x \mid \mathcal{G}) = dF(x \mid \mathcal{G})/dx$ (when it exists) is bounded uniformly in $x$ by a random variable $\Gamma$ such that $E[\Gamma^2] \le K_\gamma < \infty$.

Theorem 1: Under Assumption 1, for $x \in [a,b]$, $E[F'(x \mid \mathcal{G})] = f(x)$ and $\mathrm{Var}[F'(x \mid \mathcal{G})] < K_\gamma$.

Theorem 2: If $\mathcal{G} \subset \tilde{\mathcal{G}}$ both satisfy Assumption 1, then $\mathrm{Var}[F'(x \mid \mathcal{G})] \le \mathrm{Var}[F'(x \mid \tilde{\mathcal{G}})]$.

Conditional density estimator (CDE) with sample size $n$: $\hat f_{\mathrm{cde},n}(x) = \frac{1}{n} \sum_{i=1}^n F'(x \mid \mathcal{G}^{(i)})$, where $\mathcal{G}^{(1)}, \dots, \mathcal{G}^{(n)}$ are $n$ independent realizations of $\mathcal{G}$.

12. Example 1: A sum of independent random variables.

$X = Y_1 + \cdots + Y_d$, where the $Y_j$ are independent and continuous with cdf $F_j$ and density $f_j$, and $\mathcal{G}$ is defined by hiding $Y_k$ for an arbitrary $k$:
$$\mathcal{G} = \mathcal{G}_k \stackrel{\mathrm{def}}{=} S_{-k} = Y_1 + \cdots + Y_{k-1} + Y_{k+1} + \cdots + Y_d.$$
We have
$$F(x \mid \mathcal{G}_k) = P[X \le x \mid S_{-k}] = P[Y_k \le x - S_{-k}] = F_k(x - S_{-k})$$
and the density estimator becomes $F'(x \mid \mathcal{G}_k) = f_k(x - S_{-k})$: a shifted density of $Y_k$.

The idea of using CMC for density estimation was introduced by Asmussen (2018) for this special case, with $k = d$ and the same $F_j$ for all $j$.
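A minimal Monte Carlo sketch of this CDE (my addition): the hidden-term density $f_k$ evaluated at $x - S_{-k}$ and averaged over $n$ samples. The choice of exponential $Y_j$'s and the parameter values are assumptions for illustration only.

```python
import numpy as np

def cde_sum_exponentials(x_eval, n, d, rate=1.0, rng=None):
    """CDE for X = Y_1 + ... + Y_d with i.i.d. Y_j ~ Expon(rate), hiding Y_d:
    average of f_d(x - S_{-d}) over n independent samples."""
    rng = rng or np.random.default_rng()
    Y = rng.exponential(1.0 / rate, size=(n, d))
    S_minus_d = Y[:, :-1].sum(axis=1)                        # sum with the last term removed
    t = x_eval[None, :] - S_minus_d[:, None]                 # arguments x - S_{-d}
    fd = np.where(t > 0.0, rate * np.exp(-rate * t), 0.0)    # density of Y_d
    return fd.mean(axis=0)

x = np.linspace(0.5, 10.0, 50)
fhat = cde_sum_exponentials(x, n=2**16, d=3)
```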

13. Example 2: Generalization.

Let $X = h(Y_1, \dots, Y_d)$ and define $\mathcal{G}_k$ again by erasing a continuous $Y_k$: $\mathcal{G}_k = (Y_1, \dots, Y_{k-1}, Y_{k+1}, \dots, Y_d)$.

Example: $X = (Y_1 + Y_2^2)/Y_3$ where $Y_3 > 0$.

If $k = 3$, since $X \le x$ if and only if $Y_3 \ge (Y_1 + Y_2^2)/x$, we have
$$F(x \mid \mathcal{G}_3) = P(X \le x \mid Y_1, Y_2) = 1 - F_3((Y_1 + Y_2^2)/x),$$
and the density estimator at $x$ is $F'(x \mid \mathcal{G}_3) = f_3((Y_1 + Y_2^2)/x)\,(Y_1 + Y_2^2)/x^2$.

If $k = 2$, then
$$F(x \mid \mathcal{G}_2) = P(X \le x \mid Y_1, Y_3) = P(|Y_2| \le (Y_3 x - Y_1)^{1/2}) = F_2(Z) - F_2(-Z),$$
where $Z = (Y_3 x - Y_1)^{1/2}$, and the density estimator at $x$ is
$$F'(x \mid \mathcal{G}_2) = (f_2(Z) + f_2(-Z))\, dZ/dx = (f_2(Z) + f_2(-Z))\, Y_3/(2Z).$$
This second estimator can be problematic if $Z$ can take values near 0; this shows that a good choice of $k$ can be crucial in general.
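A short sketch (my addition) of the $k = 3$ estimator above, $F'(x \mid \mathcal{G}_3) = f_3((Y_1 + Y_2^2)/x)\,(Y_1 + Y_2^2)/x^2$, averaged over Monte Carlo samples. The distributions chosen for $Y_1, Y_2, Y_3$ (standard normal, standard normal, lognormal so that $Y_3 > 0$) are assumptions for illustration.

```python
import numpy as np
from scipy import stats

def cde_ratio_hide_y3(x_eval, n, rng=None):
    """CDE for X = (Y1 + Y2^2)/Y3, hiding Y3 (so G3 = (Y1, Y2))."""
    rng = rng or np.random.default_rng()
    Y1 = rng.standard_normal(n)
    Y2 = rng.standard_normal(n)
    W = Y1 + Y2**2                                   # the part we condition on
    t = W[:, None] / x_eval[None, :]                 # (Y1 + Y2^2)/x
    f3 = stats.lognorm.pdf(t, s=0.5)                 # assumed density of Y3 (zero for t <= 0)
    return (f3 * t / x_eval[None, :]).mean(axis=0)   # f3(t) * (Y1 + Y2^2)/x^2

x = np.linspace(0.1, 8.0, 60)
fhat = cde_ratio_hide_y3(x, n=2**16)
```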

14. Example 3: Discontinuity issues.

Let $X = \max(Y_1, Y_2)$ where $Y_1$ and $Y_2$ are independent and continuous. With $\mathcal{G} = \mathcal{G}_2$ (we hide $Y_2$):
$$P[X \le x \mid Y_1 = y] = \begin{cases} F_2(x) & \text{if } x \ge y; \\ 0 & \text{if } x < y. \end{cases}$$
If $F_2(y) > 0$, this function is discontinuous at $x = y$, so Assumption 1 does not hold. The method does not work in this case.

Same problem if $X = \min(Y_1, Y_2)$. With $\mathcal{G} = \mathcal{G}_2$, we have
$$P[X \le x \mid Y_1 = y] = \begin{cases} F_2(x) & \text{if } x < y; \\ 1 & \text{if } x \ge y. \end{cases}$$
If $F_2(y) < 1$, this function is also discontinuous at $x = y$.

15. Elementary quasi-Monte Carlo (QMC) bounds (recall).

Integration error for $g: [0,1)^s \to \mathbb{R}$ with point set $P_n = \{u_0, \dots, u_{n-1}\} \subset [0,1)^s$:
$$E_n = \frac{1}{n} \sum_{i=0}^{n-1} g(u_i) - \int_{[0,1)^s} g(u)\, du.$$

Koksma-Hlawka inequality: $|E_n| \le V_{\mathrm{HK}}(g)\, D^*(P_n)$, where
$$V_{\mathrm{HK}}(g) = \sum_{\emptyset \ne v \subseteq S} \int_{[0,1)^s} \left| \frac{\partial^{|v|} g(u)}{\partial u_v} \right| du \quad \text{(Hardy-Krause (HK) variation)},$$
$$D^*(P_n) = \sup_{u \in [0,1)^s} \left| \mathrm{vol}[0,u) - \frac{|P_n \cap [0,u)|}{n} \right| \quad \text{(star discrepancy)}.$$

There are explicit point sets for which $D^*(P_n) = O((\log n)^{s-1}/n) = O(n^{-1+\epsilon})$ for all $\epsilon > 0$, and explicit RQMC constructions for which $E[E_n] = 0$ and $\mathrm{Var}[E_n] = O(n^{-2+\epsilon})$ for all $\epsilon > 0$. With ordinary Monte Carlo (MC), one has $\mathrm{Var}[E_n] = O(n^{-1})$.
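To make the RQMC ingredient concrete (my addition, not from the slides), here is a randomly shifted rank-1 lattice rule, one of the randomizations used later; the generating vector below is an arbitrary example, not a recommended one.

```python
import numpy as np

def shifted_lattice(n, gen_vector, rng=None):
    """Rank-1 lattice rule {i * v / n mod 1, i = 0..n-1} with one random shift modulo 1."""
    rng = rng or np.random.default_rng()
    i = np.arange(n)[:, None]
    shift = rng.random(len(gen_vector))
    return (i * np.asarray(gen_vector) / n + shift) % 1.0   # n points in [0,1)^s

points = shifted_lattice(2**14, gen_vector=[1, 6229, 2691])  # arbitrary example vector
```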

16. Combining RQMC with the KDE.

Done in Ben Abdellah, L'Ecuyer, Owen, Puchhammer (2019).

Difficulty: The KDE has a very large variation when the bandwidth $h$ is small (as needed to reduce the bias). So unless the (effective) dimension is very small, RQMC reduces the MISE only modestly.

17. Applying RQMC to the CDE.

To apply RQMC to the CDE, we must be able to write the density estimator as a function of $u \in [0,1)^s$:
$$F(x \mid \mathcal{G}) = \tilde g(x, u), \qquad F'(x \mid \mathcal{G}) = \tilde g'(x, u) = d\tilde g(x, u)/dx,$$
for some $\tilde g: [a,b] \times [0,1)^s \to \mathbb{R}$ for which $\tilde g'(x, \cdot)$ has bounded HK variation for each $x$.

CDE sample: $\tilde g'(x, U_1), \dots, \tilde g'(x, U_n)$, where $\{U_1, \dots, U_n\}$ is an RQMC point set over $[0,1)^s$.

If $\tilde g'(x, \cdot)$ does not have bounded variation, RQMC can still be worthwhile, although there is no guarantee.

18. Example: sum of independent random variables (again).

$X = Y_1 + \cdots + Y_d$, where the $Y_j$ are independent and continuous with cdf $F_j$ and density $f_j$, and $\mathcal{G}$ is defined by hiding $Y_k$ for an arbitrary $k$:
$$\mathcal{G}_k \stackrel{\mathrm{def}}{=} S_{-k} = Y_1 + \cdots + Y_{k-1} + Y_{k+1} + \cdots + Y_d = F_1^{-1}(U_1) + \cdots + F_{k-1}^{-1}(U_{k-1}) + F_{k+1}^{-1}(U_{k+1}) + \cdots + F_d^{-1}(U_d).$$
We have $F(x \mid \mathcal{G}_k) = F_k(x - S_{-k}) = \tilde g(x, U)$ and the density estimator is $F'(x \mid \mathcal{G}_k) = f_k(x - S_{-k}) = \tilde g'(x, U)$, where $U = (U_1, \dots, U_d)$. If $\tilde g'(x, \cdot)$ has bounded HK variation, then $\mathrm{MISE} = O(n^{-2+\epsilon})$.
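A hedged end-to-end sketch (my addition) of this CDE driven by a randomly shifted lattice, for standard normal $Y_j$ and $k = d$; the generating vector and the parameter values are arbitrary examples.

```python
import numpy as np
from scipy import stats

def cde_sum_rqmc(x_eval, n, d, gen_vector, rng=None):
    """CDE for X = Y_1 + ... + Y_d with Y_j ~ N(0,1), hiding Y_d, driven by one
    randomly shifted rank-1 lattice in [0,1)^(d-1) (len(gen_vector) must be d - 1)."""
    rng = rng or np.random.default_rng()
    i = np.arange(n)[:, None]
    u = (i * np.asarray(gen_vector) / n + rng.random(d - 1)) % 1.0   # RQMC points
    S_minus_d = stats.norm.ppf(u).sum(axis=1)        # S_{-d} by inversion
    t = x_eval[None, :] - S_minus_d[:, None]
    return stats.norm.pdf(t).mean(axis=0)            # average of f_d(x - S_{-d})

x = np.linspace(-4.0, 4.0, 64)
fhat = cde_sum_rqmc(x, n=2**14, d=3, gen_vector=[1, 6229])  # arbitrary example vector
```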

19. Experimental setting for numerical experiments.

We want to test the method on some examples. For each method and each $n$ considered, we compute the CDE with $n$ samples, evaluate it at a set of $n_e$ evaluation points over $[a,b]$, repeat this $n_r$ times, compute the variance at each evaluation point, and estimate the IV.

We repeat this for $n = 2^{14}, \dots, 2^{19}$ and fit the model $\mathrm{IV} = K n^{-\nu}$ by linear regression: $\log_2 \mathrm{IV} \approx \log_2 K - \nu \log_2 n$ (see the sketch after the point-set list below). We report $\hat\nu$ and also the IV for $n = 2^{19}$.

MC and RQMC point sets:
- MC: independent points,
- Lat+s: lattice rule with a random shift modulo 1,
- Lat+s+b: lattice rule with a random shift modulo 1 + baker's transformation,
- Sob+LMS: Sobol' points with left matrix scrambling (LMS) + digital random shift.
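A small sketch (my addition) of the regression step described above; the IV values below are made up, only to show the fit of $\log_2 \mathrm{IV}$ against $\log_2 n$.

```python
import numpy as np

def fit_rate(ns, ivs):
    """Fit IV = K * n^(-nu) by least squares on log2(IV) = log2(K) - nu * log2(n)."""
    slope, intercept = np.polyfit(np.log2(ns), np.log2(ivs), deg=1)
    return -slope, 2.0**intercept          # (nu_hat, K_hat)

ns = 2.0 ** np.arange(14, 20)
ivs = 1e-3 * ns**-2.0 * (1.0 + 0.05 * np.random.default_rng(1).standard_normal(6))  # fake data
nu_hat, K_hat = fit_rate(ns, ivs)
```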

20. Displacement of a cantilever beam (Bingham 2017).

Displacement $X$ of a cantilever beam with horizontal load $Y_2$ and vertical load $Y_3$:
$$X = h(Y_1, Y_2, Y_3) = \frac{\kappa}{Y_1} \sqrt{\frac{Y_2^2}{w^4} + \frac{Y_3^2}{t^4}}, \qquad (1)$$
where $\kappa = 5 \times 10^5$, $w = 4$, $t = 2$, and $Y_1, Y_2, Y_3$ are independent normals, $Y_j \sim N(\mu_j, \sigma_j^2)$:

Symbol   Description        μj           σj
Y1       Young's modulus    2.9 × 10^7   1.45 × 10^6
Y2       Horizontal load    500          100
Y3       Vertical load      1000         100

The goal is to estimate the density of $X$ over $[3.1707, 5.6675]$, which covers about 99% of the density (it clips 0.5% on each side).

21. Conditioning on $\mathcal{G}_1 = \{Y_2, Y_3\}$ means hiding $Y_1$. We have
$$X = \frac{\kappa}{Y_1} \sqrt{\frac{Y_2^2}{w^4} + \frac{Y_3^2}{t^4}} \le x \quad \text{if and only if} \quad Y_1 \ge \frac{\kappa}{x} \sqrt{\frac{Y_2^2}{w^4} + \frac{Y_3^2}{t^4}} \stackrel{\mathrm{def}}{=} W_1(x) \stackrel{\mathrm{def}}{=} W_1.$$
For $x > 0$,
$$F(x \mid \mathcal{G}_1) = P[Y_1 \ge W_1 \mid W_1] = 1 - \Phi((W_1 - \mu_1)/\sigma_1)$$
and
$$F'(x \mid \mathcal{G}_1) = -\frac{\phi((W_1 - \mu_1)/\sigma_1)\, W_1'(x)}{\sigma_1} = \frac{\phi((W_1 - \mu_1)/\sigma_1)\, W_1(x)}{x\,\sigma_1}.$$
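A plain Monte Carlo sketch (my addition) of the $\mathcal{G}_1$ estimator on this slide, using the cantilever parameters from the previous slide; the sample size, the evaluation grid, and the use of MC rather than RQMC are my choices.

```python
import numpy as np
from scipy import stats

KAPPA, W, T = 5e5, 4.0, 2.0
MU1, SIG1 = 2.9e7, 1.45e6     # Y1: Young's modulus
MU2, SIG2 = 500.0, 100.0      # Y2: horizontal load
MU3, SIG3 = 1000.0, 100.0     # Y3: vertical load

def cde_cantilever_g1(x_eval, n, rng=None):
    """Average of F'(x | G1) = phi((W1 - mu1)/sigma1) * W1(x) / (x * sigma1)."""
    rng = rng or np.random.default_rng()
    Y2 = rng.normal(MU2, SIG2, n)
    Y3 = rng.normal(MU3, SIG3, n)
    root = np.sqrt(Y2**2 / W**4 + Y3**2 / T**4)
    W1 = KAPPA * root[:, None] / x_eval[None, :]            # W1(x), one row per sample
    return (stats.norm.pdf((W1 - MU1) / SIG1) * W1 / (x_eval[None, :] * SIG1)).mean(axis=0)

x = np.linspace(3.1707, 5.6675, 64)
fhat = cde_cantilever_g1(x, n=2**16)
```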

22. Suppose we condition on $\mathcal{G}_2 = \{Y_1, Y_3\}$ instead, i.e., hide $Y_2$. We have
$$X \le x \quad \text{if and only if} \quad Y_2^2 \le w^4\left((x Y_1/\kappa)^2 - Y_3^2/t^4\right) \stackrel{\mathrm{def}}{=} W_2.$$
If $W_2 \le 0$, then $F'(x \mid \mathcal{G}_2) = 0$. If $W_2 > 0$,
$$F(x \mid \mathcal{G}_2) = P[-\sqrt{W_2} \le Y_2 \le \sqrt{W_2} \mid W_2] = \Phi((\sqrt{W_2} - \mu_2)/\sigma_2) - \Phi(-(\sqrt{W_2} + \mu_2)/\sigma_2)$$
and
$$F'(x \mid \mathcal{G}_2) = \frac{\left(\phi((\sqrt{W_2} - \mu_2)/\sigma_2) + \phi(-(\sqrt{W_2} + \mu_2)/\sigma_2)\right) w^4 x (Y_1/\kappa)^2}{\sigma_2 \sqrt{W_2}} > 0.$$

For conditioning on $\mathcal{G}_3$, the analysis is the same as for $\mathcal{G}_2$, by symmetry, and we get
$$F'(x \mid \mathcal{G}_3) = \frac{\left(\phi((\sqrt{W_3} - \mu_3)/\sigma_3) + \phi(-(\sqrt{W_3} + \mu_3)/\sigma_3)\right) t^4 x (Y_1/\kappa)^2}{\sigma_3 \sqrt{W_3}} > 0$$
for $W_3 > 0$, where $W_3$ is defined in a similar way as $W_2$.

23. Instead of choosing a single conditioning $k$, we can take a convex combination:
$$\hat f(x) = \alpha_1 F'(x \mid \mathcal{G}_1) + \alpha_2 F'(x \mid \mathcal{G}_2) + \alpha_3 F'(x \mid \mathcal{G}_3),$$
where $\alpha_1 + \alpha_2 + \alpha_3 = 1$. This is equivalent to taking $F'(x \mid \mathcal{G}_1)$ as the main estimator and the other two as control variates (CV). We can use CV theory to optimize the $\alpha_j$'s (see the sketch after the table below).

            ν̂                                    − log2 MISE (n = 2^19)
            KDE    G1    G2    G3    comb.        KDE    G1    G2    G3    comb.
MC          0.80   0.97  0.98  0.99  0.98         14.7   19.3  14.5  22.8  22.5
Lat+s       —      2.06  2.82  2.04  2.02         —      38.9  25.4  41.5  41.5
Lat+s+b     —      2.26  2.55  1.98  2.07         —      44.3  23.3  45.5  46.0
Sob+LMS     0.96   2.21  2.03  2.21  2.21         20.5   44.0  23.6  45.7  46.1

For $n = 2^{19}$, the MISE is about $2^{-14.7}$ for the usual KDE+MC and $2^{-46}$ for the new CDE+RQMC; i.e., the MISE is divided by more than $2^{31} \approx 2$ billion.
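A sketch (my addition) of one way to optimize the $\alpha_j$'s pointwise by standard control-variate theory: take $F'(x \mid \mathcal{G}_1)$ as the main estimator and the zero-mean differences $F'(x \mid \mathcal{G}_2) - F'(x \mid \mathcal{G}_1)$ and $F'(x \mid \mathcal{G}_3) - F'(x \mid \mathcal{G}_1)$ as controls; estimating the covariances from the same samples introduces a small bias that is ignored here.

```python
import numpy as np

def cv_combination(A1, A2, A3):
    """A1, A2, A3: per-sample values (shape (n,)) of three unbiased estimators of the
    same density value. Returns the variance-minimizing combination
    a1*A1 + a2*A2 + a3*A3 with a1 + a2 + a3 = 1, plus the weights."""
    C = np.vstack([A2 - A1, A3 - A1])                   # controls with zero expectation
    Sigma = np.cov(C)                                   # 2x2 covariance of the controls
    c = np.array([np.cov(C[0], A1)[0, 1], np.cov(C[1], A1)[0, 1]])
    gamma = -np.linalg.solve(Sigma, c)                  # optimal CV coefficients
    alphas = np.array([1.0 - gamma.sum(), gamma[0], gamma[1]])
    estimate = alphas @ np.array([A1.mean(), A2.mean(), A3.mean()])
    return estimate, alphas
```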

24. Comparison for the CDE with a linear combination of the 3 estimators, for the cantilever example.

[Figure: $\log_2 \mathrm{IV}$ vs. $\log_2 n$ for independent points, lattice+shift, lattice+shift+baker, and Sobol+LMS.]

24. CMC for the SAN example.

We want to estimate the density of the longest path length $X$. CMC estimator of $P[X \le x]$: $F(x \mid \mathcal{G}) = P[X \le x \mid \{Y_j,\, j \notin \mathcal{L}\}]$ for a minimal cut $\mathcal{L}$. Example: $\mathcal{L} = \{5, 6, 7, 9, 10\}$, with $Y_j = F_j^{-1}(U_j)$. This estimator is continuous in the $U_j$'s and in $x$.

[Figure: the activity network again, with arcs $Y_1, \dots, Y_{13}$ from the source (node 0) to the sink (node 8).]

25. For each $j \in \mathcal{L}$, let $P_j$ be the length of the longest path that goes through arc $j$ when we exclude $Y_j$ from that length. Then
$$F(x \mid \mathcal{G}) = P[X \le x \mid \{Y_j : j \notin \mathcal{L}\}] = \prod_{j \in \mathcal{L}} F_j[x - P_j]$$
and
$$F'(x \mid \mathcal{G}) = \sum_{j \in \mathcal{L}} f_j[x - P_j] \prod_{l \in \mathcal{L},\, l \ne j} F_l[x - P_l],$$
if $f_j$ exists for all $j \in \mathcal{L}$.

Under this conditioning, the cdf of every path length is continuous in $x$, and so is $F(\cdot \mid \mathcal{G})$, and Assumption 1 holds, so $F'(x \mid \mathcal{G})$ is an unbiased density estimator.
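A small sketch (my addition) of the product-rule formula above for one realization, assuming the conditional path lengths $P_j$ for the arcs in the cut have already been obtained from a longest-path computation; the exponential durations and the numerical values are made up for illustration.

```python
import numpy as np

def san_cde(x, P, cdfs, pdfs):
    """F'(x | G) = sum_j f_j(x - P_j) * prod_{l != j} F_l(x - P_l) for one realization;
    P lists the P_j for the arcs j in the minimal cut, and cdfs[j], pdfs[j] are the
    cdf and density of Y_j."""
    Fvals = np.array([F(x - p) for F, p in zip(cdfs, P)])
    fvals = np.array([f(x - p) for f, p in zip(pdfs, P)])
    total = 0.0
    for j in range(len(P)):
        total += fvals[j] * np.prod(np.delete(Fvals, j))
    return total

# Hypothetical example: 3 arcs in the cut, Y_j ~ Expon(1), made-up P_j values.
expo_cdf = lambda t: np.where(t > 0.0, 1.0 - np.exp(-t), 0.0)
expo_pdf = lambda t: np.where(t > 0.0, np.exp(-t), 0.0)
value = san_cde(60.0, P=[40.0, 35.0, 50.0], cdfs=[expo_cdf] * 3, pdfs=[expo_pdf] * 3)
```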

26. Estimated MISE = $K n^{-\nu}$, for the KDE and for the CDE with CMC, for the SAN example.

              ν̂      − log2 MISE (n = 2^19)
KDE  MC       0.77    20.9
KDE  Lat+s    0.75    22.0
KDE  Sob+LMS  0.76    22.0
CDE  MC       0.99    25.5
CDE  Lat+s    1.26    29.9
CDE  Sob+LMS  1.25    29.9

With RQMC, we observe a convergence rate near $O(n^{-1.25})$ for the IV and the MISE. For $n = 2^{19}$, by using the new CDE+RQMC rather than the usual KDE+MC, the MISE is divided by about $2^9 \approx 500$.

27. Conclusion.

- The CDE is an unbiased density estimator with a better convergence rate for the IV and the MISE. Combining it with RQMC can provide an even better rate, and sometimes huge MISE reductions.
- Future: density estimation for a function of the state of a Markov chain, using Array-RQMC.
- What if we cannot find a $\mathcal{G}$ for which Assumption 1 holds and $F'(x \mid \mathcal{G})$ is easy to compute? Current work: a density estimator based on likelihood ratio derivative estimation.
- Lots of potential applications.

Some related references

- S. Asmussen. Conditional Monte Carlo for sums, with applications to insurance and finance. Annals of Actuarial Science, prepublication, 1-24, 2018.
- S. Asmussen and P. W. Glynn. Stochastic Simulation. Springer-Verlag, 2007.
- A. Ben Abdellah, P. L'Ecuyer, A. B. Owen, and F. Puchhammer. Density estimation by randomized quasi-Monte Carlo. Submitted, 2018.
- J. Dick and F. Pillichshammer. Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte Carlo Integration. Cambridge University Press, Cambridge, U.K., 2010.
- P. L'Ecuyer. A unified view of the IPA, SF, and LR gradient estimation techniques. Management Science, 36:1364-1383, 1990.
- P. L'Ecuyer. Quasi-Monte Carlo methods with applications in finance. Finance and Stochastics, 13(3):307-349, 2009.
- P. L'Ecuyer. Randomized quasi-Monte Carlo: An introduction for practitioners. In P. W. Glynn and A. B. Owen, editors, Monte Carlo and Quasi-Monte Carlo Methods 2016, 2017.
- P. L'Ecuyer and G. Perron. On the convergence rates of IPA and FDC derivative estimators for finite-horizon stochastic simulations. Operations Research, 42(4):643-656, 1994.
- D. W. Scott. Multivariate Density Estimation. Wiley, 2015.
