
$$\le P_0\left(\big\|\hat{f}_{n,\alpha} - E_0\hat{f}_{n,\alpha}\big\|_\infty > (M_0 - C_d)\,\epsilon_n^{2\alpha/(2\alpha+d)}\right).$$

Now apply (7.7) of Lemma 7.6 with $x = C_I^2 n\epsilon_n^2$; since $\epsilon_n \gg \sqrt{\log n/n}$,

$$Q_1\sqrt{\frac{2^{\sum_{l=1}^d h_{n,l}(\alpha)}\log n}{n}} + \sqrt{2Q_2}\,C_I\,2^{\sum_{l=1}^d h_{n,l}(\alpha)/2}\,\epsilon_n \le 2\sqrt{2Q_2}\,C_I\,\epsilon_n^{2\alpha/(2\alpha+d)}$$

when $n$ is large enough. Therefore, take $M_0 > 2\sqrt{2Q_2}\,C_I + C_d$ and we will have $E_0\Phi_n \le e^{-C_I^2 n\epsilon_n^2}$.

For the Type II error, with $f \in \{\mathcal{B}^\alpha_{\infty,\infty}(R) : \|f - f_0\|_\infty > M\rho_n\epsilon_n\}$ such that $M > M_0$, we apply the reverse triangle inequality twice to yield

$$E_f(1 - \Phi_n) = P_f\left(\big\|\hat{f}_{n,\alpha} - f_0\big\|_\infty \le M_0\rho_n\epsilon_n\right) \le P_f\left(\big\|\hat{f}_{n,\alpha} - E_f\hat{f}_{n,\alpha}\big\|_\infty \ge \|f - f_0\|_\infty - M_0\rho_n\epsilon_n - \big\|E_f\hat{f}_{n,\alpha} - f\big\|_\infty\right).$$

Now since $f \in \mathcal{B}^\alpha_{\infty,\infty}(R)$, we apply (7.6) again to conclude $\|E_f\hat{f}_{n,\alpha} - f\|_\infty \le C_d\,\epsilon_n^{2\alpha/(2\alpha+d)}$. Hence we are in the same situation as in the Type I error case, and the same argument shows that, for $M$ and $n$ large enough, $\sup_{f\in\mathcal{B}^\alpha_{\infty,\infty}(R):\,\|f-f_0\|_\infty > M\rho_n\epsilon_n} E_f(1 - \Phi_n) \le e^{-C_{II}^2 n\epsilon_n^2}$ for some constant $C_{II} > 0$, where $\rho_n = \epsilon_n^{-d/(2\alpha+d)}$. The last statement is proved using the master theorem (see Theorem 3 of [11]) once we have tests with exponential errors. The Kullback-Leibler neighborhood and prior complement criteria follow the same steps as in Lemma 7.4, which is used to prove the $L^2$-contraction rate.

7 Technical lemmas

The lemma below quantifies the error incurred when a Riemann sum is approximated by its integral version; it is useful for obtaining size estimates of the various discrete sums appearing in this paper. Throughout, let $\langle f, g\rangle$ denote the inner product of two functions $f, g$ in a Hilbert space.

Lemma 7.1. Suppose $\partial^d f/(\partial x_1\cdots\partial x_d) \in L^1$. Then, if the fixed design points are chosen such that (3.3) holds, we have for some constant $C > 0$,

$$\left|\;\left|\sum_{i=1}^n f(X_i)\right| - n\left|\int_{[0,1]^d} f(x)\,dx\right|\;\right| \le C\,I(f), \qquad I(f) := \left\|\frac{\partial^d f}{\partial x_1\cdots\partial x_d}\right\|_{1}.$$

Proof. Let $\mathbb{G}_n$ be the empirical distribution of the design points, so that $\sum_{i=1}^n f(X_i) = n\int_{[0,1]^d} f\,d\mathbb{G}_n$. By the triangle inequality,

$$\left|\sum_{i=1}^n f(X_i)\right| \le n\left|\int_{[0,1]^d} f(x)\,dU(x)\right| + n\left|\int_{[0,1]^d} f(x)\,d(\mathbb{G}_n - U)(x)\right|,$$

where $U(x)$ is the Uniform$([0,1]^d)$ cumulative distribution function. Thus, the first term is $n|\int_{[0,1]^d} f(x)\,dx|$. To bound the second term, observe that by multivariate integration by parts,

$$\left|\int_{[0,1]^d} f(x)\,d(\mathbb{G}_n - U)(x)\right| \lesssim \|\mathbb{G}_n - U\|_\infty\, I(f) \lesssim n^{-1}\,I(f),$$

where the last inequality follows from (3.3). For the other direction, use the reverse triangle inequality. This, together with the upper bound established above, proves the result.
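As a quick numerical illustration of this discrepancy bound (in one dimension, and for the reconstructed form of the statement above), the sketch below compares $|\sum_i f(X_i)|$ with $n|\int_0^1 f|$ for the left-endpoint fixed design $X_i = (i-1)/n$, for which $\|\mathbb{G}_n - U\|_\infty = 1/n$. The test function is an illustrative choice; the error stays bounded as $n$ grows, in contrast with the $\sqrt{n}$ growth one would see for i.i.d. uniform design points (cf. Remark 7.3).

```python
import numpy as np

# Hypothetical smooth test function with integrable derivative on [0, 1]
b = 2 * np.pi
f = lambda x: np.exp(x) * np.cos(b * x)
# Exact integral of f over [0, 1]: (e - 1) / (1 + 4 pi^2)
I = (np.exp(1) * (np.cos(b) + b * np.sin(b)) - 1.0) / (1 + b ** 2)

for n in (10**2, 10**3, 10**4, 10**5):
    x = np.arange(n) / n              # fixed design with ||G_n - U||_inf = 1/n
    err = abs(abs(f(x).sum()) - n * abs(I))
    print(f"n = {n:6d}   | |sum f(X_i)| - n |int f| | = {err:.4f}")  # stays O(1)
```

Here the error settles near $|f(1) - f(0)|/2$, a constant, consistent with the $O(1)$ bound $C\,I(f)$ above.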

Lemma 7.2. Under the assumption of (3.3),

$$\left|\;\left|(\Psi_a^T\Psi_b)_{c,e}\right| - n\left|\langle\psi_{a,c},\psi_{b,e}\rangle\right|\;\right| \lesssim \prod_{l=1}^d 2^{(a_l+b_l)/2}, \qquad (7.1)$$

$$\sum_{i=1}^n |\psi_{j,k}(X_i)| \lesssim n\prod_{l=1}^d 2^{-j_l/2}. \qquad (7.2)$$

Proof. To prove the first statement (7.1), apply Lemma 7.1 with $f(x) = \psi_{a,c}(x)\psi_{b,e}(x)$. By construction, the support of the CDV wavelet $\psi_{a,c}$ is some compact set $I_a$ whose Lebesgue measure is $O\big(\prod_{l=1}^d 2^{-a_l}\big)$. Therefore, we can restrict the domain of integration to $I_a \cap I_b$, the intersection of the supports of $\psi_{a,c}$ and $\psi_{b,e}$. Since the wavelets and their derivatives are uniformly bounded, and using the orthonormality of the basis, we can upper bound $I(f)$, up to some constant multiple, by $\prod_{l=1}^d 2^{(a_l+b_l)/2}$. The assertion then follows by appealing to Lemma 7.1.

To prove (7.2), let $f(x) = |\psi_{j,k}(x)|$ and note that $\int_{[0,1]^d} f(x)\,dx = \prod_{l=1}^d \int_0^1 |\psi_{j_l,k_l}(x_l)|\,dx_l$. If $\psi_{j_l,k_l}$ is an interior CDV wavelet, then by a change of variable,

$$\int_0^1 |\psi_{j_l,k_l}(x_l)|\,dx_l = \int_{2^{-j_l}(-N_l+1+k_l)}^{2^{-j_l}(N_l+k_l)} 2^{j_l/2}\,|\psi_{k_l}(2^{j_l}x_l)|\,dx_l \lesssim 2^{-j_l/2},$$

where the constant in $\lesssim$ above does not depend on $j_l$. The same argument holds for the boundary-corrected case, where the lower limit of the integral is replaced by $0$ if the $0$-boundary is considered, and the upper limit by $1$ for the $1$-boundary. As a result, $\int_{[0,1]^d} f(x)\,dx \lesssim \prod_{l=1}^d 2^{-j_l/2}$. By restricting the domain of integration to the support of $\psi_{j,k}$, i.e., $I_j$, the error term $I(f)$ from Lemma 7.1 is of the order $\prod_{l=1}^d 2^{j_l/2}$, which is $O(n\prod_{l=1}^d 2^{-j_l/2})$ whenever $2^{\sum_{l=1}^d j_l} \lesssim n$, and (7.2) follows.

Remark 7.3. For uniform random design, the stochastic version of Lemma 7.2 can be deduced from Bernstein's inequality (see (3.24) in Theorem 3.1.7 of [13]). In this case, (7.1) becomes

$$P\left[\left|(\Psi_a^T\Psi_b)_{c,e} - n\langle\psi_{a,c},\psi_{b,e}\rangle\right| \lesssim \sqrt{n}\prod_{l=1}^d 2^{(a_l+b_l)/2}\right] \ge 1 - 2e^{-2^{\sum_{l=1}^d (a_l+b_l)/2}},$$

$$P\left[\left|(\Psi_a^T B)_{c,m}\right| \lesssim \sqrt{n}\prod_{l=1}^d 2^{a_l/2}\right] \ge 1 - 2e^{-2^{\sum_{l=1}^d a_l}}.$$

The extra $\sqrt{n}$ is due to the fact that $\|\mathbb{G}_n - U\|_\infty = O_P(n^{-1/2})$ by Donsker's theorem in the random case, instead of the rate $O(n^{-1})$ for the fixed design case as in (3.3). In particular, if $j_l \le \widetilde{J}_{n,l}$, $l = 1,\dots,d$, where $\widetilde{J}_{n,l}$ is increasing with $n$, and if $2^{\sum_{l=1}^d \widetilde{J}_{n,l}} = o(\sqrt{n})$, then $(\Psi_j^T\Psi_j)_{k,k} \asymp n$ by the orthonormality of $\psi_{j,k}$, and $\sum_{i=1}^n |\psi_{j,k}(X_i)| \lesssim n\prod_{l=1}^d 2^{-j_l/2}$ with probability at least $1 - 2e^{-2^{\sum_{l=1}^d j_l}}$.
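The rate gap between the two designs is easy to see empirically. The following one-dimensional sketch estimates $\|\mathbb{G}_n - U\|_\infty$ for a regular fixed grid and for i.i.d. Uniform$(0,1)$ draws; the products $n\,\|\mathbb{G}_n - U\|_\infty$ and $\sqrt{n}\,\|\mathbb{G}_n - U\|_\infty$ stabilize, matching the $O(n^{-1})$ and $O_P(n^{-1/2})$ rates invoked above.

```python
import numpy as np

rng = np.random.default_rng(1)

def kolmogorov_distance(pts):
    """sup_t |G_n(t) - t| for points in [0, 1] against the Uniform CDF."""
    x = np.sort(pts)
    n = len(x)
    upper = np.arange(1, n + 1) / n - x   # deviation just after each jump of G_n
    lower = x - np.arange(0, n) / n       # deviation just before each jump
    return max(upper.max(), lower.max())

for n in (10**3, 10**4, 10**5):
    fixed = (np.arange(n) + 0.5) / n      # regular fixed design
    random = rng.uniform(size=n)          # i.i.d. uniform random design
    print(f"n = {n:6d}   n * D(fixed) = {n * kolmogorov_distance(fixed):.3f}"
          f"   sqrt(n) * D(random) = {n**0.5 * kolmogorov_distance(random):.3f}")
```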

The lemma below gives the $L^2$-posterior contraction rate for the spike-and-slab prior in nonparametric regression models. It shows in particular that there is an extra logarithmic factor in the rate, reflecting the fact that separable selection rules (coefficient-wise spike-and-slab) incur at least a logarithmic penalty when estimating $f$ adaptively under a global $L^2$-loss; a schematic sketch of such a coefficient-wise prior is given below.
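As a point of reference for the prior in (3.1), the following is a minimal one-dimensional sketch of a coefficient-wise spike-and-slab draw. The Laplace slab and the resolution-decaying inclusion probability $\omega_j$ are illustrative assumptions only, not the specific choices made in (3.1).

```python
import numpy as np

rng = np.random.default_rng(42)

def spike_and_slab_draw(J, omega, slab_scale=1.0):
    """One prior draw of wavelet coefficients theta_{j,k}, 0 <= j < J:
    theta_{j,k} = 0 with probability 1 - omega(j) (the spike), and is
    drawn from a Laplace slab otherwise. Both choices are illustrative."""
    return {(j, k): rng.laplace(scale=slab_scale) if rng.uniform() < omega(j) else 0.0
            for j in range(J) for k in range(2 ** j)}

omega = lambda j: 2.0 ** (-j)   # hypothetical resolution-decaying inclusion probability
theta = spike_and_slab_draw(J=6, omega=omega)
active = sum(v != 0.0 for v in theta.values())
print(f"{active} active coefficients out of {len(theta)}")
```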

Lemma 7.4. Under the hierarchical spike-and-slab prior in (3.1), there exist constants $M, P_4 > 0$ such that for any $0 < \alpha_l < \eta + 1$, $l = 1,\dots,d$, and uniformly over $f_0 \in \mathcal{B}^\alpha_{\infty,\infty}(R)$,

$$E_0\,\Pi\left(\|f - f_0\|_n + |\sigma^2 - \sigma_0^2| > M(n/\log n)^{-\alpha/(2\alpha+d)} \,\Big|\, Y\right) \le n^{-P_4}.$$

Proof. We will use the master theorem (see Theorem 3 of [11]) by constructing test functions with exponentially small error probabilities, and by verifying that the prior gives sufficient mass to a Kullback-Leibler neighborhood around $(f_0, \sigma_0^2)$. Let $\epsilon_n \to 0$ with $n\epsilon_n^2 \to \infty$. For $l = 1,\dots,d$, we choose $J_{n,l}(\alpha)$ such that

$$0.5\,U(n/\log n)^{\alpha/\{\alpha_l(2\alpha+d)\}} \le 2^{J_{n,l}(\alpha)} \le U(n/\log n)^{\alpha/\{\alpha_l(2\alpha+d)\}}$$

for some large enough constant $U > 0$. For $\Theta_n := \{\theta : \theta_{j,k} = \theta_{j,k}\mathbb{1}\{j \le J_n(\alpha)\}\ \text{for all}\ k\}$, define sieves $\mathcal{F}_n := \{f : \theta \in \Theta_n\}$ consisting of functions with wavelet expansions truncated at levels $J_{n,l}(\alpha) - 1$, $l = 1,\dots,d$.

For any $f \in \mathcal{F}_n$, by the property of the $L^2$-projection, $\|f - K_{J_n(\alpha)}(f_0)\|_n \le \|f - f_0\|_n$. Then by the triangle inequality, $\|f - f_0\|_n \le \|f - K_{J_n(\alpha)}(f_0)\|_n + \|K_{J_n(\alpha)}(f_0) - f_0\|_\infty \lesssim \|f - K_{J_n(\alpha)}(f_0)\|_n + \sum_{l=1}^d 2^{-\alpha_l J_{n,l}(\alpha)}$, where the last inequality follows from Proposition 3.3 since $f_0 \in \mathcal{B}^\alpha_{\infty,\infty}(R)$. In view of Lemma 7.7, $\|f - K_{J_n(\alpha)}(f_0)\|_n^2 \asymp \|\vartheta - \vartheta_0\|^2 + \|\tilde\theta - \tilde\theta_0\|^2$, since $2^{\sum_{l=1}^d J_{n,l}(\alpha)} = o(n)$. Here the tilde in $\tilde\theta$ denotes the truncated mother wavelet coefficients.

We then conclude that there are constants $W_1, W_2 > 0$ such that, for $f \in \mathcal{F}_n$,

$$W_1\|\tilde\theta - \tilde\theta_0\| \le \|f - f_0\|_n \le W_2\left[\|\vartheta - \vartheta_0\| + \|\tilde\theta - \tilde\theta_0\| + (\log n/n)^{\alpha/(2\alpha+d)}\right] \qquad (7.3)$$

by the definition of $J_{n,l}(\alpha)$. Let us define sieve slices $\mathcal{F}_{n,j} := \{f \in \mathcal{F}_n : j\epsilon_n < \|f - f_0\|_n + |\sigma^2 - \sigma_0^2| \le (j+1)\epsilon_n\}$ for any integer $j \ge M$. It follows from (7.3) above that

$$\mathcal{F}_{n,j} \subset \left\{\|\tilde\theta - \tilde\theta_0\| \le (2/W_1)\,j\epsilon_n,\ |\sigma^2 - \sigma_0^2| \le 2j\epsilon_n\right\}.$$

By calculating the covering number of the Euclidean ball on the right-hand side, we conclude that $\mathcal{F}_{n,j}$ has an $\epsilon_n$-net of at most $e^{Cjn\epsilon_n^2}$ points for some constant $C > 0$, provided $2^{\sum_{l=1}^d J_{n,l}(\alpha)} \lesssim n\epsilon_n^2$. Then by Lemma 1 of [22], there exists a test $\phi_{n,j}$ with exponentially small error probabilities for testing $f = f_0$ against $f \in \mathcal{F}_{n,j}$, obtained by maximizing over three tests corresponding to the cases $|\sigma^2 - \sigma_0^2| \le \sigma_0^2/2$, $\sigma^2 > 3\sigma_0^2/2$ and $\sigma^2 < \sigma_0^2/2$. Then, using the arguments outlined in the proof of Theorem 9 in [11], we conclude that $\phi_n = \sup_{j\ge M}\phi_{n,j}$ is a test with exponentially small Type I and Type II errors, thus fulfilling the testing requirement of the master theorem.
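To make the covering computation explicit (a standard volumetric bound): an $\epsilon_n$-net of a Euclidean ball of radius $r$ in $D$ dimensions has at most $(3r/\epsilon_n)^D$ points, so with $r = (2/W_1)j\epsilon_n$ and ambient dimension $D \lesssim 2^{\sum_{l=1}^d J_{n,l}(\alpha)} \lesssim n\epsilon_n^2$,

$$N\big(\epsilon_n, \mathcal{F}_{n,j}, \|\cdot\|\big) \le \left(\frac{6j}{W_1}\right)^{D} \le e^{D\log(6j/W_1)} \le e^{Cjn\epsilon_n^2},$$

since $\log(6j/W_1) \lesssim j$ for $j \ge M$; the interval for $\sigma^2$ contributes only a further factor polynomial in $j$.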

To characterize prior concentration, let $K(p,q) := \int p\log(p/q)\,d\mu$ be the Kullback-Leibler divergence and $V(p,q) := \int p\,[\log(p/q) - K(p,q)]^2\,d\mu$, where $\mu$ is the Lebesgue measure. Define the Kullback-Leibler neighborhood $B_n(\epsilon_n) := \{(f,\sigma^2) : n^{-1}\sum_{i=1}^n K(p_{f_0,i}, p_{f,i}) \le \epsilon_n^2,\ n^{-1}\sum_{i=1}^n V(p_{f_0,i}, p_{f,i}) \le \epsilon_n^2\}$, with $p_{g,i}$ being the density of $N[g(X_i), \sigma^2]$. After some calculation,

$$\frac{1}{n}\sum_{i=1}^n K(p_{f_0,i}, p_{f,i}) = \frac{1}{2}\log\frac{\sigma^2}{\sigma_0^2} - \frac{1}{2}\left(1 - \frac{\sigma_0^2}{\sigma^2}\right) + \frac{\|f - f_0\|_n^2}{2\sigma^2},$$

$$\frac{1}{n}\sum_{i=1}^n V(p_{f_0,i}, p_{f,i}) = \frac{1}{2}\left(1 - \frac{\sigma_0^2}{\sigma^2}\right)^2 + \frac{\sigma_0^2\,\|f - f_0\|_n^2}{\sigma^4}.$$
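These two identities are elementary but easy to get wrong; the following self-contained sketch (a hypothetical scalar instance, with $\|f - f_0\|_n^2$ replaced by a single squared mean shift $\delta^2$) verifies both displays by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(7)

# Scalar instance: p_{f0} = N(m0, s0^2), p_f = N(m, s^2), so that
# ||f - f0||_n^2 in the displays is replaced by delta^2 = (m - m0)^2.
m0, s0, m, s = 0.0, 1.0, 0.7, 1.3
delta2 = (m - m0) ** 2

x = m0 + s0 * rng.standard_normal(10**6)  # draws from p_{f0}
log_ratio = (np.log(s / s0) + (x - m) ** 2 / (2 * s ** 2)
             - (x - m0) ** 2 / (2 * s0 ** 2))

K_formula = 0.5 * np.log(s**2 / s0**2) - 0.5 * (1 - s0**2 / s**2) + delta2 / (2 * s**2)
V_formula = 0.5 * (1 - s0**2 / s**2) ** 2 + s0**2 * delta2 / s**4
print("K:", log_ratio.mean(), "vs formula", K_formula)  # agree to ~3 decimals
print("V:", log_ratio.var(),  "vs formula", V_formula)
```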

Hence, there are constants $W_3, \widetilde{W}_3 > 0$ such that $B_n(\epsilon_n) \supset \{\|f - f_0\|_n \le W_3\epsilon_n,\ |\sigma^2 - \sigma_0^2| \le \widetilde{W}_3\epsilon_n\}$. By (7.3), take $\epsilon_n \ge (3W_2/W_3)(\log n/n)^{\alpha/(2\alpha+d)}$ and we have $B_n(\epsilon_n) \supset \{\|\vartheta - \vartheta_0\| \le W_3\epsilon_n/(3W_2),\ \|\tilde\theta - \tilde\theta_0\| \le W_3\epsilon_n/(3W_2),\ |\sigma^2 - \sigma_0^2| \le \widetilde{W}_3\epsilon_n\}$. Therefore, by the assumed independence of the priors, $\Pi[B_n(\epsilon_n)]$ can be lower bounded by

$$\Pi\left(\|\vartheta - \vartheta_0\| \le \frac{W_3}{3W_2}\,\epsilon_n\right)\,\Pi\left(\|\tilde\theta - \tilde\theta_0\| \le \frac{W_3}{3W_2}\,\epsilon_n\right)\,\Pi\left(|\sigma^2 - \sigma_0^2| \le \widetilde{W}_3\epsilon_n\right). \qquad (7.4)$$

Since $\pi_\sigma$ is continuous and $\pi_\sigma(\cdot) > 0$ by assumption, we have

$$\Pi\left(|\sigma^2 - \sigma_0^2| \le \widetilde{W}_3\epsilon_n\right) \ge 2\widetilde{W}_3\epsilon_n \inf_{|u - \sigma_0^2| \le \widetilde{W}_3\epsilon_n}\pi_\sigma(u) = 2\widetilde{W}_3\epsilon_n\,\pi_\sigma(\sigma_0^2)\,[1 + o(1)],$$

which is greater than $e^{-H_1\log n}$ for some constant $H_1 > 0$, where the last assertion follows since $\epsilon_n \gtrsim n^{-1/2}$ by assumption. For a set $A$ in some Euclidean space, we denote by $\mathrm{vol}(A)$ the volume of $A$.

Let $\widetilde{N} = \prod_{l=1}^d \cdots$ for some constant $H_2 > 0$. We lower bound the second factor in (7.4) accordingly. Note that $j$ takes on all $2^d - 1$ possible combinations of the $Q_l$'s, and each combination has the form of such configurations. Thus, the sum over $j \in \mathcal{K}_n(\alpha)^c$ in (7.5) is bounded above, up to some universal constant. By the assumption in (3.2), and denoting $\widetilde{J} = \prod_{l=1}^d \cdots$, we obtain a lower bound $e^{-H_4 n\epsilon_n^2}$ for some constant $H_4 > 0$. Therefore, by multiplying all the lower bounds obtained for (7.4), it follows that $\Pi[B_n(\epsilon_n)] \ge e^{-Cn\epsilon_n^2}$ for some constant $C > 0$, provided $(\log n/n)\,2^{\sum_{l=1}^d J_{n,l}(\alpha)} \lesssim \epsilon_n^2$.

Corollary 7.5. As a consequence of Lemma 7.4 above, we have, with posterior probability at least $1 - n^{-P_4}$,

$$\|\vartheta - \vartheta_0\| \lesssim (n/\log n)^{-\alpha/(2\alpha+d)}, \qquad \|\theta - \theta_0\| \lesssim (n/\log n)^{-\alpha/(2\alpha+d)}.$$

Proof. If $f$ has a wavelet expansion as in (3.1) at resolution $J_n$, then by the property of the $L^2$-projection and Lemma 7.2, we have $\|f - f_0\|_n \ge \|f - K_{J_n}(f_0)\|_n \gtrsim (\|\vartheta - \vartheta_0\|^2 + \|\theta - \theta_0\|^2)^{1/2}$, since $2^{\sum_{l=1}^d J_{n,l}} = o(n)$ by assumption. The result follows by applying Lemma 7.4.

Lemma 7.6. Project $f$ onto the wavelet basis at resolution $J_n$ and write the regression model in (1.1) (assuming known $\sigma = \sigma_0$) as $Y = \Psi\theta + \varepsilon$, with $\Psi$ the wavelet basis matrix and $\varepsilon \sim N(0, \sigma_0^2 I)$. Let $\hat{f}_n(x) := \psi_{J_n}(x)^T(\Psi^T\Psi)^{-1}\Psi^T Y$ be the corresponding least squares estimator, with $\psi_{J_n}(x)$ the vector of all wavelet functions at resolution $J_n$ evaluated at $x$. Let $2^{\sum_{l=1}^d J_{n,l}} = o(n)$. Then for any $f \in \mathcal{B}^\alpha_{\infty,\infty}(R)$, there exist constants $C, Q_1, Q_2 > 0$ such that

$$\big\|E_f\hat{f}_n - f\big\|_\infty \le C\sum_{l=1}^d 2^{-\alpha_l J_{n,l}}, \qquad (7.6)$$

and, for any $x \ge 0$,

$$P_f\left(\big\|\hat{f}_n - E_f\hat{f}_n\big\|_\infty \ge Q_1\sqrt{\frac{2^{\sum_{l=1}^d J_{n,l}}\log n}{n}} + \sqrt{2Q_2\,n^{-1}2^{\sum_{l=1}^d J_{n,l}}\,x}\right) \le e^{-x}. \qquad (7.7)$$

Proof. By Proposition 3.3, for any $f \in \mathcal{B}^\alpha_{\infty,\infty}(R)$, there is a $\xi$ such that $\|f - \psi_{J_n}(\cdot)^T\xi\|_\infty \lesssim \sum_{l=1}^d 2^{-\alpha_l J_{n,l}}$. Therefore, by adding and subtracting $\Psi\xi$ and using the triangle inequality, $\|E_f\hat{f}_n - f\|_\infty$ is

$$\big\|\psi_{J_n}(\cdot)^T(\Psi^T\Psi)^{-1}\Psi^T F - f\big\|_\infty \le \big\|\psi_{J_n}(\cdot)^T(\Psi^T\Psi)^{-1}\Psi^T(F - \Psi\xi)\big\|_\infty + \big\|\psi_{J_n}(\cdot)^T\xi - f\big\|_\infty,$$

where the second term is $O(\sum_{l=1}^d 2^{-\alpha_l J_{n,l}})$. For a matrix $A$, let $\|A\|_{(\infty,\infty)} = \max_i\sum_j|a_{ij}|$ (maximum absolute row sum) and $\|A\|_{(1,1)} = \max_j\sum_i|a_{ij}|$ (maximum absolute column sum). Using Hölder's inequality $|x^Ty| \le \|x\|_1\|y\|_\infty$ and the induced matrix norm bound $\|Ax\|_1 \le \|A\|_{(1,1)}\|x\|_1$, the first term is bounded by

$$\sup_{x\in[0,1]^d}\|\psi_{J_n}(x)\|_1\,\big\|(\Psi^T\Psi)^{-1}\big\|_{(1,1)}\,\|\Psi^T\|_{(1,1)}\,\|F - \Psi\xi\|_\infty. \qquad (7.8)$$

By (6.3) with $r = 0$, it holds that $\|\psi_{J_n}(x)\|_1 \lesssim 2^{\sum_{l=1}^d J_{n,l}/2}$ uniformly in $x \in [0,1]^d$. Note that since each entry of $\Psi$ is a dilated version of the base CDV wavelet with compact support, $\Psi^T\Psi$ is banded. Furthermore, by choosing $2^{\sum_{l=1}^d J_{n,l}} = o(n)$, all eigenvalues of $\Psi^T\Psi$ are $\asymp n$ by virtue of Lemma 7.7. Therefore, by appealing to Lemma A.4 of [30], we conclude that $\|(\Psi^T\Psi)^{-1}\|_{(\infty,\infty)} \lesssim n^{-1}$. Since $(\Psi^T\Psi)^{-1}$ is symmetric, it follows that $\|(\Psi^T\Psi)^{-1}\|_{(1,1)} = \|(\Psi^T\Psi)^{-1}\|_{(\infty,\infty)} \lesssim n^{-1}$. Now observe that $\|\Psi^T\|_{(1,1)} = \max_{1\le i\le n}\sum_{j,k}|\psi_{j,k}(X_i)| \le \sup_{x\in[0,1]^d}\|\psi_{J_n}(x)\|_1$, and this is $O(2^{\sum_{l=1}^d J_{n,l}/2})$ as shown above. It now remains to bound $\|F - \Psi\xi\|_\infty$, and we know it is $O(\sum_{l=1}^d 2^{-\alpha_l J_{n,l}})$ by Proposition 3.3. Combine everything and use the assumption that $2^{\sum_{l=1}^d J_{n,l}} \le n$ to conclude (7.6).

By construction, $\hat{f}_n - E_f\hat{f}_n \sim \mathrm{GP}(0, \sigma_0^2\Sigma_{J_n})$, where the covariance kernel is $\Sigma_{J_n}(x,y) = \psi_{J_n}(x)^T(\Psi^T\Psi)^{-1}\psi_{J_n}(y)$ for any $x, y \in [0,1]^d$. Since the wavelets are uniformly bounded, applying (6.3) with $r = 0$ we can deduce that for any $x \in [0,1]^d$, $\|\psi_{J_n}(x)\|^2 \le \max_{j\le J_n\mathbb{1}_d,\,k}|\psi_{j,k}(x)|\sum_{j\le J_n\mathbb{1}_d,\,k}|\psi_{j,k}(x)| \lesssim \prod_{l=1}^d 2^{J_{n,l}}$. Then, by appealing to Lemma 7.7,

$$\sup_{x\in[0,1]^d}\Sigma_{J_n}(x,x) \lesssim \big\|(\Psi^T\Psi)^{-1}\big\|_{(2,2)}\sup_{x\in[0,1]^d}\|\psi_{J_n}(x)\|^2 \le Q_2\,n^{-1}2^{\sum_{l=1}^d J_{n,l}}$$

for some constant $Q_2 > 0$. By Borell's inequality (see Proposition A.2.1 of [29] or Theorem 2.5.8 in [13]), we have for any $x \ge 0$ that

$$P_f\left(\big\|\hat{f}_n - E_f\hat{f}_n\big\|_\infty \ge E_f\big\|\hat{f}_n - E_f\hat{f}_n\big\|_\infty + \sqrt{2Q_2\,n^{-1}2^{\sum_{l=1}^d J_{n,l}}\,x}\right) \le e^{-x}. \qquad (7.9)$$

Define $\eta := \Psi^T\varepsilon$. Observe that $\hat{f}_n - E_f\hat{f}_n = \psi_{J_n}(\cdot)^T(\Psi^T\Psi)^{-1}\eta$, and by Hölder's inequality, $E_f\|\hat{f}_n - E_f\hat{f}_n\|_\infty$ is bounded above by

$$\sup_{x\in[0,1]^d}\|\psi_{J_n}(x)\|_1\,\big\|(\Psi^T\Psi)^{-1}\big\|_{(\infty,\infty)}\,E\|\eta\|_\infty \lesssim n^{-1}2^{\sum_{l=1}^d J_{n,l}/2}\,E\|\eta\|_\infty,$$

in view of the bounds established in (7.8).

in view of the bounds established in (7.8). Let us index the rows and columns ofΨTΨwith multi-indices of the form (j,k). Since η ∼N(0, σ20ΨTΨ), we can apply Lemma 2.3.4 of [13]

to conclude that for Zj,k ∼ N(0,1) i.i.d. with (j,k) running across all the indices of the wavelet series up to resolution Jn,

Ekηk ≤ max

where we have utilized Lemma 7.7 to upper bound the diagonals ofΨTΨ, under the assump-tion that 2Pdl=1Jn,l =o(n). Combine all these established bounds back into (7.9) to deduce (7.7).
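The estimator and the two bounds in (7.6)–(7.7) can be exercised numerically. The sketch below substitutes a Haar basis for the CDV wavelets (an illustrative simplification: Haar is the simplest orthonormal wavelet basis on $[0,1]$) and a regular fixed design; the truth $f_0$ is a hypothetical Lipschitz function. It reports the sup-norm bias of $E_f\hat{f}_n$ and checks that $\sup_x \Sigma_{J_n}(x,x)\,n/2^{J_n}$ stays bounded, as in the display preceding (7.9).

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_design(x, J):
    """Basis matrix: constant column plus Haar wavelets 2^{j/2} psi(2^j x - k),
    0 <= j < J, 0 <= k < 2^j. A stand-in for the CDV basis used in the paper."""
    cols = [np.ones_like(x)]
    for j in range(J):
        for k in range(2 ** j):
            t = 2.0 ** j * x - k
            cols.append(2.0 ** (j / 2) * (((t >= 0) & (t < 0.5)).astype(float)
                                          - ((t >= 0.5) & (t < 1.0)).astype(float)))
    return np.column_stack(cols)

n, J, sigma0 = 2000, 5, 0.1
x = (np.arange(n) + 0.5) / n                    # regular fixed design
f0 = lambda t: np.sin(2 * np.pi * t) + 0.5 * t  # hypothetical Lipschitz truth
Psi = haar_design(x, J)
G = Psi.T @ Psi

# Least squares estimator f_hat = psi(.)^T (Psi^T Psi)^{-1} Psi^T Y on the grid
y = f0(x) + sigma0 * rng.standard_normal(n)
f_hat = Psi @ np.linalg.solve(G, Psi.T @ y)

# Bias term as in (7.6): sup-norm of E_f f_hat - f, of order 2^{-J} here
bias = np.max(np.abs(Psi @ np.linalg.solve(G, Psi.T @ f0(x)) - f0(x)))
print("sup-norm bias:", bias)

# Variance kernel: sup_x Sigma(x,x) <= Q2 * 2^J / n, cf. the display before (7.9)
Sigma_diag = np.einsum("ij,jk,ik->i", Psi, np.linalg.inv(G), Psi)
print("sup_x Sigma(x,x) * n / 2^J:", Sigma_diag.max() * n / 2 ** J)
```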

Lemma 7.7. For any vector $\theta$, we have

$$\|\theta\|^2\left(n - C\,2^{\sum_{l=1}^d J_{n,l}}\right) \le \theta^T\Psi^T\Psi\theta \le \|\theta\|^2\left(n + C\,2^{\sum_{l=1}^d J_{n,l}}\right)$$

for some constant $C > 0$, by applying the third display of Section 5 in [2]. In particular, all eigenvalues of $\Psi^T\Psi$ are $\asymp n$ whenever $2^{\sum_{l=1}^d J_{n,l}} = o(n)$. The last statement follows since the maximum or minimum eigenvalue is the maximization or minimization of $\theta^T\Psi^T\Psi\theta/\|\theta\|^2$ over $\theta \ne 0$.
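A quick empirical check of the eigenvalue claim, again with a Haar stand-in for the CDV basis and a regular fixed design: with $2^{J_n} = 16$ columns held fixed while $n$ grows (so $2^{J_n} = o(n)$), the extreme eigenvalues of $\Psi^T\Psi$ divided by $n$ stay close to 1.

```python
import numpy as np

def haar_design(x, J):
    """Same Haar stand-in for the CDV basis as in the previous sketch."""
    cols = [np.ones_like(x)]
    for j in range(J):
        for k in range(2 ** j):
            t = 2.0 ** j * x - k
            cols.append(2.0 ** (j / 2) * (((t >= 0) & (t < 0.5)).astype(float)
                                          - ((t >= 0.5) & (t < 1.0)).astype(float)))
    return np.column_stack(cols)

for n in (500, 2000, 8000):
    x = (np.arange(n) + 0.5) / n          # regular fixed design
    Psi = haar_design(x, J=4)             # 2^J = 16 columns, o(n)
    ev = np.linalg.eigvalsh(Psi.T @ Psi)
    print(f"n = {n:5d}   eig_min/n = {ev.min()/n:.4f}   eig_max/n = {ev.max()/n:.4f}")
```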

References

[1] Belitser, E. and Ghosal, S. (2003). Adaptive Bayesian inference on the mean of an infinite-dimensional normal distribution. Ann. Statist., 31(2):536–559.

[2] Cai, T. T. (2002). On adaptive wavelet estimation of a derivative and other related linear inverse problems. J. Statist. Plann. Inference, 108:329–349.

[3] Castillo, I. (2014). On Bayesian supremum norm contraction rates. Ann. Statist., 42(5):2058–2091.

[4] Castillo, I., Schmidt-Hieber, J., and van der Vaart, A. (2015). Bayesian linear regression with sparse priors. Ann. Statist., 43(5):1986–2018.

[5] Castillo, I. and van der Vaart, A. (2012). Needles and straw in a haystack: Posterior concentration for possibly sparse sequences. Ann. Statist., 40(4):2069–2101.

[6] Chipman, H. A., Kolaczyk, E. D., and McCulloch, R. E. (1997). Adaptive Bayesian wavelet shrinkage. Journal of the American Statistical Association, 92(440):1413–1421.

[7] Cohen, A., Daubechies, I., and Vial, P. (1993). Wavelets on the interval and fast wavelet transforms. Applied and Computational Harmonic Analysis, 1(1):54–81.

[8] Abramovich, F., Sapatinas, T., and Silverman, B. W. (1998). Wavelet thresholding via a Bayesian approach. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 60(4):725–749.

[9] George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423):881–889.

[10] Ghosal, S., Ghosh, J. K., and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist., 28(2):500–531.

[11] Ghosal, S. and van der Vaart, A. W. (2007). Convergence rates of posterior distributions for noniid observations. Ann. Statist., 35(1):192–223.

[12] Giné, E. and Nickl, R. (2011). Rates of contraction for posterior distributions in $L^r$-metrics, $1 \le r \le \infty$. Ann. Statist., 39(6):2883–2911.

[13] Giné, E. and Nickl, R. (2016). Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge University Press, New York.

[14] Hoffmann, M., Rousseau, J., and Schmidt-Hieber, J. (2015). On adaptive posterior concentration rates. Ann. Statist., 43(5):2259–2295.

[15] Ishwaran, H. and Rao, J. S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Ann. Statist., 33(2):730–773.

[16] Johnstone, I. M. and Silverman, B. W. (2005). Empirical Bayes selection of wavelet thresholds. Ann. Statist., 33(4):1700–1752.

[17] Kleijn, B. J. K. and van der Vaart, A. W. (2006). Misspecification in infinite-dimensional Bayesian statistics. Ann. Statist., 34(2):837–877.

[18] Li, F. and Zhang, N. R. (2010). Bayesian variable selection in structured high-dimensional covariate spaces with applications in genomics. J. Amer. Statist. Assoc., 105(491):1202–1214.

[19] Meyer, Y. (1992). Wavelets and Operators. Cambridge University Press.

[20] Mitchell, T. J. and Beauchamp, J. J. (1988). Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404):1023–1032.

[21] Ray, K. (2013). Bayesian inverse problems with non-conjugate priors. Electron. J. Stat., 7:2516–2549.

[22] Salomond, J.-B. (2018). Supplement for “Testing un-separated hypotheses by estimating a distance”. Bayesian Anal., 13(2):461–484.

[23] Schumaker, L. (2007). Spline Functions: Basic Theory. Cambridge University Press, New York, third edition.

[24] Scricciolo, C. (2014). Adaptive Bayesian density estimation in $L^p$-metrics with Pitman-Yor or normalized inverse-Gaussian process kernel mixtures. Bayesian Anal., 9(2):475–520.

[25] Shen, W. and Ghosal, S. (2016). Adaptive Bayesian density regression for high-dimensional data. Bernoulli, 22(1):396–420.

[26] Shen, X. and Wasserman, L. (2001). Rates of convergence of posterior distributions. Ann. Statist., 29(3):687–714.

[27] Sniekers, S. (2015). Credible sets in nonparametric regression. PhD thesis, Leiden University.

[28] van der Vaart, A. W. and van Zanten, J. H. (2009). Adaptive Bayesian estimation using a Gaussian random field with inverse gamma bandwidth. Ann. Statist., 37(5B):2655–2675.

[29] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer-Verlag New York, Inc.

[30] Yoo, W. W. and Ghosal, S. (2016). Supremum norm posterior contraction and credible sets for nonparametric multivariate regression. Ann. Statist., 44(3):1069–1102.

[31] Yoo, W. W. and Ghosal, S. (2018). Bayesian mode and maximum estimation and accelerated rates of contraction. To appear in Bernoulli.
