Multi-level Bayes and MAP monotonicity testing

(1)

HAL Id: hal-02293008

https://hal.archives-ouvertes.fr/hal-02293008

Preprint submitted on 20 Sep 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires

Multi-level Bayes and MAP monotonicity testing

Yuri Golubev, Christophe Pouet

To cite this version:

Yuri Golubev, Christophe Pouet. Multi-level Bayes and MAP monotonicity testing. 2019. �hal- 02293008�

(2)

Multi-level Bayes and MAP monotonicity testing

Golubev Yu. ^∗ and Pouet C. ^†

September 20, 2019

Abstract

In this paper, we develop Bayes and maximum a posteriori probability (MAP) approaches to monotonicity testing. In order to simplify this problem, we consider a simple white Gaussian noise model and with the help of the Haar transform we reduce it to the equivalent problem of testing positivity of the Haar coefficients. This approach permits, in particular, to understand links between monotonicity testing and sparse vectors detection, to construct new tests, and to prove their optimality without supplementary assumptions. The main idea in our construction of multi-level tests is based on some invariance prop- erties of specific probability distributions. Along with Bayes and MAP tests, we construct also adaptive multi-level tests that are free from the prior information about the sizes of non-monotonicity segments of the function.

Keywords: Haar transform, Bayes and MAP tests, multi-level hypothesis testing, stable distributions, type I and II error probabilities, critical signal- noise ratio.

AMS Subject Classification 2010: Primary 62C20; secondary 62J05.

1 Introduction

The literature on non-parametric monotonicity testing deals usually with the model

Y =f(X) +ξ,

∗Institute for Information Transmission Problems, Moscow, Russia. e-mail:

[email protected]

†Centrale Marseille, Aix Marseille Univ, CNRS, I2M, Marseille, France. e-mail:

[email protected]

(3)

whereY is a scalar dependent random variable,X a scalar independent random variable,f(·) an unknown function, andξan unobserved scalar random variable withE{ξ|X}= 0. We are interested in testing the null hypothesis, H₀ thatf(x) is increasing against the alternative,H₁ that there arex1 and x₂ such thatx₁< x₂ and f(x₁) > f(x₂). The decision is to be made based on the i.i.d. sample{X_i, Y_i}_1≤i≤n from the distribution of (X, Y). Typical applications of monotonicity testing are related to econometric models, see, e.g., Chetverikov [4].

Usual approaches to this problem have in their core simple heuristic ideas and assumptions. So, the tests proposed in Gijbels et. al. [9] and Ghosal, Sen, and van der Vaart [8] are based on the signs of (Y_i+k−Y_i)(X_i+k−X_i).

Hall and Heckman [10] developed a test based on the slopes of local linear estimates off(·). Along with these papers we can cite Schlee [15], Bowman, Jones, and Gijbels [2], Dümbgen and Spokoiny [6], Durot [7], Baraud, Huet, and Laurent [1], Wang and Meyer [17], and Chetverikov [4]. As to typical hypothesis aboutf(·), it is often assumed that f(x) is a Lipschitz function, i.e.,

|f(y)−f(x)| ≤L|y−x|, where the constantL <∞ may be known or unknown.

In this paper, we look at the problem of monotonicity testing from a little different and less intuitive viewpoint. As we will see below, our approach permits, in particular, to understand links between this problem and sparse vectors detection and to construct new powerful tests. In order to simplify technical details and to get rid of supplementary assumptions, we begin with monotonicity testing of an unknown functionf(t), t∈[0,1], in the so-called white noise model similar to that one considered in [6]. So, it is assumed we have at our disposal the noisy data

Y(t) =f(t) +σn(t), t∈[0,1], (1) where n(·) is a standard white Gaussian noise and σ >0 is a known noise level. With the help of these observations we want to test

the null hypothesis

H₀ : f⁰(t)≥0,for all t∈[0,1], vs. the alternative

H₁ : f⁰(t)<0,for somet∈[0,1].

Our approach to this problem is based on estimating the following linear functionals:

θ_h,t(f)^def= 1 h

Z t+h t

f(u)du− 1 h

Z t t−h

f(u)du

(4)

for allh, tthat are admissible, i.e., such that [t−h, t+h]⊆[0,1]. It is clear thatθ_h,t(f)/hmay be interpreted as approximations of the derivativef⁰(t) since

lim

h→0

θ_h,t(f)

h =f⁰(t), for any givent∈(0,1).

With the help of (1), the functionalsθ_h,t(f) are estimated as follows:

θˆh,t(Y) = 1 h

Z t+h t

Y(u)du− 1 h

Z t

t−hY(u)du and these estimates admit the obvious representation

θˆ_h,t(Y) =θ_h,t(f) +σ_hξ_h,t, (2) where

σ_h =σ r2

h, ξ_h,t= 1

√ 2h

Z _t+h

t

n(u)du− Z t

t−h

n(u)du

∼ N(0,1).

Notice that ifH₀is true, thenθh,t(f)≥0 for all admissibleh, t, otherwise (H₁ is true) there exist h⁰, t⁰ such that θ_h⁰_,t⁰(f) < 0. That is why in what follows we will focus on testing

the null hypothesis

H₀ : θ_h,t(f)≥0, for all admissibleh, t vs. the alternative

H₁ : θ_h,t(f)<0, for some admissibleh, t

(3)

based on the observations (2).

Let us denote for brevity

θ_h,t=θ_h,t(f), θˆ_h,t = ˆθ_h,t(Y).

In order to explain our approach to the problem (3), we begin with the simple case assuming thath, tare given. So, we have to test two composite hypotheses

H^h,t₀ :θ_h,t ≥0 vs.H^h,t₁ :θ_h,t<0.

Intuitively, the most powerful test with the type I error probability α rejectsH^h,t₀ if

θˆ_h,t ≤ −σ_htα, (4)

(5)

wheretα isα-value of the standard Gaussian distribution, i.e., a solution to Φ(tα) = 1−α,

where

Φ(x) = 1

√2π Z x

−∞

exp

−x² 2

dx.

Of course, there exist a lot of motivations for this test. In this paper, we make use of the so-called improper Bayes approach assuming thatθ_h,tin (2) is a random variable uniformly distributed on the interval [0, A], A >0, if H^h,t₀ is true, and on [−A,0] ifH^h,t₁ is true. So, we observe a random variable θˆ_h,t with the probability density

p^A₀(x|H^h,t₀ is true) = 1 A

Z A 0

exp

−(x−θ)² 2σ²_h

dθ and

p^A₁(x|H^h,t₁ is true) = 1 A

Z 0

−A

exp

−(x−θ)² 2σ_h²

dθ.

Thus, we deal with the simple hypothesis testing and by the Neyman- Pearson lemma, the most powerful test at significance level α rejects H^h,t₀ when

pÂ₁(ˆθh,t) pÂ₀(ˆθ_h,t) ≥tÂ_α.

Taking the limit in this equation asA→ ∞, we arrive at the improper Bayes test that rejects H^h,t₀ if

S θˆ_h,t

σ_h

≥t⁰_α, (5)

where

S(x) = Z 0

−∞

exp−(x−θ)²/2dθ Z ∞

0

exp−(x−θ)²/2dθ

= 1

Φ(x)−1. (6)

SinceS(x) is decreasing inx∈R, the tests (4) and (5) are obviously equivalent.

In what follows, we will make use of the following asymptotic result:

S(x) =

1 +O 1

x² √

2π(1−x) exp x²

2

, asx→ −∞. (7)

(6)

Along with this method, one can apply the maximum likelihood (ML) or minimax approaches. Finally, all these methods result in (4) but their initial forms are different. For instance, the ML test rejectsH^h,t₀ when

max

θ<0 exp−(ˆθ_h,t−θ)²/(2σ_h²) max

θ>0 exp−(ˆθ_h,t−θ)²/(2σ_h²) = exp

− θˆ_h,t²

2σ_h²sign(ˆθ_h,t)

≥t⁰⁰_α. (8) Emphasize that from a viewpoint of testing H^h,t₀ vs. H^h,t₁ there is no difference between (8) and (5), but the aggregation of these methods for testingH₀ vs. H₁ from (3) results in different tests. In this paper, we make use of the tests defined by (5) since their aggregation is simple.

In order to aggregate the statistical tests, we will make use of the so- called multi-resolution approach assuming that

1. h belongs to the following set of dyadic bandwidths H^def=

1 2,1

4, . . . 1 2^k, . . .

;

2. tbelongs to the family of dyadic grids G_h, h∈ H, defined by G_h ^def= h,3h, . . . ,1−h , h∈ H.

There are simple arguments motivating these assumptions

• random variables ξ_h,t and ξ_h⁰_,t⁰ in (2) are independent if {h, t} 6=

{h⁰, t⁰}. This fact simplifies significantly the statistical analysis of tests.

• ^ph/2 ˆθh,t are the Haar coefficients admitting a fast computation in the discrete version of (1).

2 Testing at a given resolution level

Let us fix some bandwidth h ∈ H and denote for brevity by n_h = 1/(2h).

In this section, we focus on testing the null hypothesis

H^h₀ : θ_h,t≥0 for allt∈ G_h vs. the alternative

H^h₁ : θ_h,t<0 for somet∈ G_h.

In order to construct Bayes and MAP tests, we assume that for given h∈ H

(7)

• the set{θ_h,t, t∈ G_h}contains the only one negative entry θh,τ;

• τ is an unobservable random variable uniformly distributed on G_h. 2.1 A Bayes test

With the arguments used in deriving (5), we get the following Bayes test:

H^h₀ is rejected if

1 nh

X

t∈G_h

S θˆ_h,t

σh

≥t^B_α,

whereS(·) is defined by (6). The critical levelt^B_α is defined by a conservative way, i.e., as a solution to

maxΘ≥0P_Θ 1

n_h X

t∈G_h

S θˆh,t

σ_h

> t^B_α

=α,

where hereP_Θstands for the measure generated by observations ˆθ_h,t defined by (2) for given Θ ={θ_h,t, h∈ H, t∈ G_h}.

It follows from Mudholkar’s theorem [12], see also Theorem 6.2.1 in [16], that for any Θ with nonnegative entriesθ_h,t≥0

P_Θ 1

n_h X

t∈G_h

S θˆ_h,t

σ_h

> x

≤P 1

n_h X

t∈G_h

S(ξ_h,t)≥x

(9)

and, thus, t^B_α may be computed as a solution to P

1 n_h

X

t∈G_h

S(ξ_h,t)≥t^B_α

=α. (10)

Therefore our next step is to study the following random variable:

B_h(ξ)^def= 1 n_h

X

t∈G_h

S(ξ_h,t).

2.1.1 A weak approximation of Bh(ξ)

We begin with computing a weak limit of B_h(ξ) as h → 0. Recall some standard definitions (see, e.g., [13]).

(8)

Definition. Let X1 and X2 be independent copies of a random variable X.

ThenX is said to be stable if for any constants a >0 andb >0the random variable aX₁+bX₂ has the same distribution as cX+dfor some constants c >0 andd.

In the class of stable distributions there is an interesting sub-class of the so-called stable distributions with the index of stabilityα= 1. For brevity, we will call them 1-stable distributions. The formal definition of this class is as follows:

Definition. A random variableX is called 1-stable if its characteristic func- tion can be written as

Eexp(itX) = exp

µit− |ct| −i2β|c|

π tlog(|t|)

. (11)

The next theorem shows that the weak limit of B_h(ξ) −log(n_h) is a 1-stable distribution.

Theorem 1.

h→0limEexpitB_h(ξ)−log(n_h) +γ = exp

itlog 1

|t|−π|t|

2

, where γ ≈0.57721 is Euler’s constant.

In other words, this theorem states that

h→0lim

B_h(ξ)−log(n_h) +γ]=^D ζ, whereζ is a 1-stable random variable (see (11)) with

µ= 0, c= π

2, β = 1. (12)

Apparently,ζ appeared firstly in [5]. Emphasize also that this random variable originate usually in Bayes hypothesis testing related to sparse vectors, see e.g. [3], [11].

The probability distribution of ζ has the following invariance property that plays an important role in Bayes tests aggregation.

Proposition 1.Letζkbe i.i.d. copies ofζ andπ¯be a probability distribution onZ⁺ with a bounded entropy. Then

∞

X

k=1

π¯_k

ζ_k−log 1 π¯k

D

=ζ. (13)

The proof of (13) follows immediately from (11) and (12).

(9)

2.1.2 A strong approximation of Bh(ξ)

Theorem1is not very informative about the tail behavior of the distribution of B_h(ξ). However, for obtaining a good approximation of t^B_α in (10) this behavior may play a crucial role because in some applicationsαmay be very small (of order 10⁻⁷) and so, the Monte-Carlo method and Theorem1 may not be good in this case.

Therefore our goal is to find an approximation of B_h(ξ) that controls well the tail of its distribution. Fortunately, this can be easily done. It is clear that

Φ(ξ_k)=^D U_k,

whereU_k are i.i.d. random variables uniformly distributed on [0,1]. Hence Bh(ξ)=^D 1

n_h

nh

X

k=1

1 U_k −1

= 1 n_h

nh

X

i=1

1 U_(k) −1,

where U_(k) is a non-decreasing permutation of U_k, k = 1, . . . , n_h. The distribution of U₍₁₎, U₍₂₎, . . . , U_(n_h₎ can be easily obtained with the help of the Pyke theorem [14]

U_(k)=^D E_k

E_n_h₊₁, (14)

where

E_k =

k

X

l=1

κl

is the cumulative sum of i.i.d. standard exponentially distributed random variables κl

Pκl ≥y = exp(−y).

In other words,E_k∼Gamma(k,1). With this in mind, we obtain B_h(ξ)=^D

1 +O

1

√n_h ⁿh

X

k=1

1 E_k −1

=

1 +O 1

√n_h ⁿh

X

k=1

1 E_k − 1

k

+

nh

X

k=1

1 k

−1.

(15)

Next, we make use of the following simple equations:

E

^∞ X

k=nh+1

1 E_k −1

k

2m1/(2m)

≤O 1

√n_h

(10)

and nh

X

k=1

1

k = log(nh) +γ+O 1

n_h

.

So, substituting them in (15), we arrive at the following theorem.

Theorem 2. Let

ζ^◦ =

∞

X

k=1

1 E_k −1

k

. (16)

Then

Bh(ξ)−log(nh) +γ =^D 1 +O(εh)ζ^◦+ 2γ−1 +O(εh) log(nh), (17) where ε_h is such that

E ε^2m_h ^1/(2m)≤ C_m

√nh

. (18)

Remark. The random variable ζ in Theorem 1 admits the following repre- sentation

ζ =

∞

X

k=1

1 E_k − 1

k

+ 2γ−1.

Notice also that it follows immediately from (17) that convergence rate in Theorem 1 islog(n_h)/√

n_h, i.e., as h→0, EexpitB_h(ξ)−log(n_h) +γ = exp

itlog 1

|t|−π|t|

2 +O

log(n_h)

√n_h

. Figure 1illustrates numerically Theorem2 and the above remark show- ing log-tail approximation error

∆(x;n_h) = log^hPB_h{ξ)−log(n_h) +γ ≥x ⁱ−log^hPζ^◦+ 2γ−1≥x ⁱ. computed with the help of the Monte-Carlo method with 0.5·10⁶ replica- tions. This picture shows that even for smalln_h = 4 the approximation (17) works very good.

2.2 A MAP test

Similarly to the Bayes test, we can construct the MAP test that rejectsH^h₀ if

maxt∈G_h

1 n_hS

θˆ_h,t σ_h

≥t^M_α ,

(11)

Figure 1: Log-tail approximation errors ∆(x;n_h) forn_h = 4 andn_h= 1024.

wheret^M_α is defined as a solution to maxΘ≥0P_Θ

maxt∈Gh

1 nh

S θˆ_h,t

σh

> t^M_α

=α.

Similarly to (9),t^M_α may be obtained from P

maxt∈G_h

S(ξ_h,t) n_h > t^M_α

=α.

As to the limit distribution of max_t∈G_hS(ξ_h,t)/n_h, as h → 0, it follows immediately from (14) that

maxt∈G_h

S(ξ_h,t) n_h

=D 1 +εh

κ , ash→0, (19) whereκ is a standard exponential random variable andε_h satisfies (18).

3 Multi-level testing

3.1 MAP multi-level tests

A heuristic idea behind our construction of multi-level MAP tests for (3) is related to (19) and consists in computing a positive deterministic function U_h, h ∈ H, bounding from above the random process log(1/κh), h ∈ H, whereκh are independent standard exponential random variables. In other words, we are looking forU_h such that

ζ^U = sup

h∈H

log 1

κh

−U_h

(12)

would be a non-degenerate random variable.

LetqÛ_α be α-value of ζÛ, i.e., solution to PζÛ ≥q_αÛ =α.

Therefore with (19), upper bounding random process log(1/κh) by U_h, we arrive at the test that rejectsH₀ if

sup

h∈H

maxt∈Gh

log 1

nh

S θˆ_h,t

σh

−U_h

≥q_α^U. (20) Computing q^U_α is based on the following simple fact. Assume that

K^U = log X

h∈H

e^−U^h

<∞.

Then

sup

h∈H

log 1

κh

−U_h

−K^U = log^D 1

κ. (21)

The proof of this identity is very simple. Indeed, Pⁿsup

h∈H

log 1 κh

−U_h−K^U > x^o

= 1− ^Y

h∈H

Pⁿlog 1 κh

≤U_h+x+K^U^o

= 1−exp

−^X

h∈H

exp

−x−Uh−K^U

= 1−exp[−exp(−x)].

Let us we denote

¯π_h = e^−U^h

X

h∈H

e^−U^h, then (21) can be rewritten in the following form:

Proposition 2. Let π¯ be a probability distribution on H. Then sup

h∈H

log 1

κh

+ log(¯πh) D

= log 1

κ. (22)

Therefore with the help of (21) we can compute α-critical level q_α^U in (20)

q_α^U =q_α^κ+K^U,

(13)

where

q_α^κ =−log

log 1 1−α

isα-value of log(1/κ).

Summarizing (see (20)), the MAP multi-level test rejectsH₀ if sup

h∈H

Z_h^M + log(¯π_h) ≥q_α^κ, (23) where

Z_h^M = max

t∈G_hlog 1

n_hS θˆh,t

σ_h

and ¯π is a probability distribution on H.

In order to study the performance of this method, we analyze the type II error probability. For given{ρ, τ :ρ∈ H, τ ∈ G_g} andA∈R⁺ define

Θ_ρ,τ(A) =ⁿθ_h,t:θρ,τ =−A; θ_h,t ≥0,(h, t)6= (ρ, τ)^o. (24) In other words, we consider the situation, where all shifts θ_h,t in (2) are positive except the only one. The position of the negative entry {ρ, τ}

and its amplitude are unknown, but it is assumed that {ρ, τ} are random variables with the distribution defined by

• P{ρ=h}= ¯πh,

• P{τ =t|ρ=h}=n⁻¹_h ,

where ¯π is a probability distribution on H with a bounded entropy Hπ¯ = ^X

h∈H

¯πhlog 1 π¯_h.

In what follows, we will deal with priors ¯π with large uncertainties assuming that ¯π→0, or more precisely, sup_h∈Hπ¯_h →0,but such that

lim

π→0¯

1 log[H¯π]

X

h∈H

π¯_h

H_π_¯−log 1 π¯h

= 0. (25)

In particular, we will consider the following class of prior distributions:

π¯_h = ¯π^ω,ν_h =ν

log₂(1/h) ω

^∞ X

k=1

ν k

ω

≈ 1 ων

log₂(1/h) ω

. (26)

(14)

This class is characterized by the bandwidth ω > 1 and the probability density ν(x), x ∈ R⁺, which is assumed to be continuous, bounded, and with

Hν = Z ∞

0

ν(x) log 1

ν(x)dx <∞, Z ∞

0

ν(x) log(x+ 1)dx <∞. (27) A typical example of a such distribution is the uniform one that corresponds toν(x) = 1, x∈[0,1].

It is clear that ¯π_h^ω,ν →0 as ω→ ∞ and that Condition (25) holds.

Let us begin with the case, where the prior distribution is known, the case of unknown ¯π will be considered later in Section 4.

The type II error probability over Θ_ρ,τ(A) of the MAP test (23) is defined as follows:

β_ρ,τ^M(A) = sup

Θ∈Θρ,τ(A)

P_Θⁿmax

h∈H

Z_h^M + log(¯π_h)≤q_α^κ^o. Our goal is to study the average type II error probability

β¯_¯_π^M(A) = ^X

h∈H

π¯h

n_h X

t∈Gh

β_h,t^M(Ah), where here and belowA={A_h, h∈ H}.

Denote for brevity

R_h(q, H) = 2[q+ log(n_h) +H]−log[4π(q+ log(n_h) +H)]

and

log^∗(x) = log[log(x)], H_π^∗_¯ = log(Hπ¯).

The next theorem shows that Rh(q^κ_α, Hπ¯) is a critical signal/noise ratio.

Roughly speaking, this means that if Ah

σ_h

¯

≤π ^qRh(q^κ_α, Hπ¯) +x

for any given x > 0, then the MAP multi-level test cannot discriminate betweenH₀ and H₁. Otherwise, if

Ah

σ_h

π¯

≥^qR_h(q_α^κ, Hπ¯) +^qH_π^∗_¯, for some >0, then reliable testing is possible.

In the next theorem, E_π_¯ stands for the expectation w.r.t. ¯π.

(15)

Theorem 3. Suppose (25) holds. If for somex∈Rand >0

¯lim

π→0

1 H_π^∗_¯E_¯_π

A_h σh

−x 2

+H_¯_π^∗−R_h(q_α^κ, H_π_¯)

+

= 0, (28)

then

¯lim

π→0

β¯_π^M_¯ (A)≥(1−α)[1−Φ(x)]. (29) If for some >0

¯lim

π→0

1 H_¯_π^∗E_π_¯

Rh(q_α^κ, H¯π) + 2 q

H_π^∗_¯A_h σh

−A²_h σ²_h

+

= 0, (30)

then

¯lim

π→0

β¯_π^M_¯ (A) = 0. (31)

3.2 Multi-level Bayes tests

To construct these tests, let us consider the following statistics:

Z_h^B= 1 n_h

X

t∈Gh

S θˆh,t

σ_h

−log(nh)−γ+ 1, h∈ H.

When all θ_h,t = 0, in view of Theorem 2, these random variables are ap- proximated by the family of independent and identically distributed random variablesζ_h^◦, h∈ H, defined by (16). An important property of this family is provided by (13), which is used in our construction multi-level Bayes tests.

More precisely, the multi-level Bayes test rejectsH₀ if X

h∈H

π¯_h

Z_h^B−log 1

¯πh

≥q_α^◦, whereq_α^◦ isα-value ofζ^◦.

The type II error probability over Θ_ρ,τ(A) (see (24)) is defined by β_ρ,τ^B (A) = sup

Θ∈Θ_ρ,τ(Aρ)

P_Θ X

h∈H

π¯_h

Z_h^B−log 1 π¯h

≤q_α^◦

and our goal is to analyze the average type II error probability β¯_π^B_¯(A) = ^X

h∈H

π¯_h n_h

X

τ∈G_h

β_h,τ^B (A_h).

(16)

Theorem 4. Suppose (25) holds and for somex∈Rand >0

¯lim

π→0

1 H_π_¯^∗E_π_¯

A_h σ_h −x

2

+H_π^∗_¯−R_h[log(q_α^◦), H_¯_π]

+

= 0, (32) then

¯lim

π→0

β¯^B_¯_π(A)≥(1−α)[1−Φ(x)]. (33) If for some >0

lim

¯π→0

1 H_π_¯^∗E_π_¯

R_hlog(q_α^◦), H_π_¯+ 2^qH_π^∗_¯A_h σh

−A²_h σ_h²

+

= 0, (34)

then

¯lim

π→0

β¯_¯_π^B(A) = 0. (35)

Remark. Notice that asα→0

log(q^◦_α) = (1 +o(1))q^κ_α = (1 +o(1)) log 1 α.

Therefore, since H_¯_π → ∞ as π¯ → 0, conditions (28) and (32) along with (30) and (34) are almost equivalent. This means that in the considered statistical problem there is no substantial difference between MAP and Bayes tests.

4 Adaptive multi-level tests

The main drawback of the MAP and Bayes tests is related to their depen- dence on the prior distribution ¯πthat is hardly known in practice. Therefore our next goal is to construct a test that, on the one hand, does not depend on ¯π, but on the other hand, has a nearly optimal critical signal-noise ratio.

In order to simplify our presentation, we will deal with the class of prior distributions ¯π^ω,ν defined by (26). The entropy of ¯π^ω,ν obviously satisfies

Hπ¯^ω,ν = log(ω) +Hν +o(1), ω→ ∞, (36) and therefore denote for brevity

Reh(q, ω) = 2[q+ log(nh) + log(ω)]−log[4π(q+ log(nh) + log(ω)]. (37) With (36), Condition (25) is checked easily and the next result follows immediately from Theorem3.

(17)

Corollary 1. If for some x∈R and >0

ω→∞lim 1

log^∗(ω)E_π_¯^ω,ν A_h

σh

−x 2

+log^∗(ω)−Re_h(q_α^κ, ω)

+

= 0, then

ω→∞lim

β¯_π^M_¯ω,ν(A)≥(1−α)[1−Φ(x)].

If for some >0

ω→∞lim 1

log^∗(ω)]E_π_¯^ω,ν

Re_h(q_α^κ, ω) + 2 q

log^∗(ω)A_h σ_h − A²_h

σ_h²

+

= 0, then

ω→∞lim

β¯_¯_π^Mω,ν(A) = 0.

In order to construct an adaptive test, let us compute a nearly minimal functionU_h in (21). We begin with

ψ₀(x) = 1 + log(x), x∈R⁺, and then iterate this functionmtimes

ψ_l(x) =ψ₀ψl−1(x), l= 1, . . . , m.

Finally, for givenε∈(0,1), define L^m,ε(k) =−log

1

ε[ψ_m(k)]^ε − 1 ε[ψ_m(k+ 1)]^ε

, k∈Z⁺. (38) Sinceψ_m(1) = 1, it is clear that

∞

X

k=1

exp[−L^m,ε(k)] = 1 ε.

In what follows, we will make use of the following approximation of L^m,ε(k) for large k. Denote (see (38))

Le^m,ε(k) =−log

−1 ε

[dψm(k)]^−ε dk

=−log

1 [ψ_m(k)]^1+ε

dψm(k) dk

= log(k) + log[ψ0(k)] +· · ·+ log[ψm−1(k)] + (1 +ε) log[ψm(k)].

(39)

Since

d²ψ_m(k) dk²

dψ_m(k) dk =O

1 k

,

(18)

Figure 2: The functions L^m,ε(·) and the approximation errors ∆^m,ε(·) for m= 1,2 and ε= 0.1.

by the Taylor formula we obtain from (38)

∆^m,ε(k) =L^e^m,ε(k)−L^m,ε(k) =O 1

ψm(k)

. (40)

Figure2 shows thatL^m,ε(k) and approximation errors ∆^m,ε(k).

Since h∈ H={2⁻¹, . . . ,2^−k, . . .}, we choose U_h =L^m,ε[log₂(1/h)]

and in view of (38) we arrive at the following prior distribution:

Π_h = 1

{ψ_m[log₂(1/h)]}^ε − 1

{ψ_m[log₂(1/h) + 1]}^ε, h∈ H.

It is easy to check that the entropy of Π is unbounded and this is why this distribution might be viewed as an improper prior.

The MAP test associated with Π rejects H₀ when maxh∈H

Z_h^M + log Π_h ≥q_α^κ

and its type II error probability over Θ_ρ,τ(A) (see (24)) is defined by β_ρ,τ^Π (A) = sup

Θ∈Θ_ρ,τ(A)

P_Θⁿmax

h∈H

Z_h^M + log(Π_h)≤q_α^κ^o. Denote for brevity

R⁺(q^κ_α, ω) =R(q^e _α^κ, ω) + log^∗(ω), (41)

(19)

whereR(qe α, ω) is defined by (37), and let β¯_π^Π_¯ω,ν A) = ^X

h∈H

π¯_h^ω,ν n_h

X

t∈G_h

β_h,t^Π(A_h) be the average type II error probability.

Theorem 5. If for some x∈R and >0

ω→∞lim 1

log^∗(ω)E_π_¯^ω,ν A_h

σ_h −x 2

+log^∗(ω)−R⁺_h(q_α^κ, ω)

+

= 0, then

ω→∞lim

β¯_π^Π_¯ω,ν(A)≥(1−α)[1−Φ(x)].

If for some >0

ω→∞lim 1

log^∗(ω)E_¯_π^ω,ν

R⁺_h(q^κ_α, ω) + 2 q

log^∗(ω)A_h σ_h − A²_h

σ_h²

+

= 0, then

ω→∞lim

β¯_¯_π^Πω,ν(A) = 0.

This theorem and Corollary1 demonstrate that the critical signal-noise ratio of the adaptive test is only slightly greater, see (41), (by the additive term log^∗(ω)) than the one of the MAP test that knows the prior distribution π¯^ω,ν.

(20)

5 Appendix

5.1 Proof of Theorem 1

With a simple algebra we obtain logEexpitBh(ξ) =nhlog

1

√2π Z ∞

−∞

cos tS(x)

n_h

e^−x²^/2dx + i

√ 2π

Z ∞

−∞

sin tS(x)

n_h

e^−x²^/2dx

=n_hlog

1 + Z ∞

−∞

cos

t n_h

1 Φ(x)−1

−1

dΦ(x) + i

Z ∞

−∞

sin t

nh

1 Φ(x)−1

dΦ(x)

=nhlog

1 + Z 1

0

cos

t n_h

1 u −1

−1

du + i

Z 1 0

sin t

n_h 1

u −1

du

=n_hlog

1 + Z ∞

0

1 (1 +u)²

cos

tu n_h

−1

du + i

Z ∞ 0

1 (1 +u)² sin

tu nh

du

=nhlog

1 + |t|

n_h Z ∞

0

cos(u)−1

(|t|/n_h+u)² du+ it n_h

Z ∞ 0

sin(u) (|t|/n_h+u)²du

.

(42)

It is clear that as n_h → ∞ Z ∞

0

cos(u)−1

(|t|/n_h+u)²du→ Z ∞

0

cos(u)−1

u² du=−π

2. (43)

Let us choose_h<1 such that

h→0limh = 0, lim

h→0hnh=∞.

(21)

Then we get by the Taylor formula Z ∞

0

sin(u)

(|t|/n_h+u)² du= Z h

0

sin(u)

(|t|/n_h+u)²du+ Z ∞

h

sin(u) (|t|/n_h+u)² du

= Z _h

0

u

(|t|/n_h+u)² du+O Z _h

0

u³

(|t|/n_h+u)² du

+ (1 +o(1)) Z ∞

_h

sin(u) u² du

= log

1 +_hn_h

|t|

− _hn_h (|t|+_hn_h) + (1 +o(1))

Z ∞ _h

sin(u)

u² du+O(²_h).

(44)

Next, integrating by parts, we obtain Z ∞

x

sin(z)

z² dz=− Z ∞

x

sin(z) z d

log1

z

=sin(x) x log1

x + Z ∞

x

zcos(z)−sin(z) z² log1

zdz.

Hence, asx→0 Z ∞

x

sin(z)

z² dz= log 1 x +

Z ∞ 0

zcos(z)−sin(z) z² log1

zdz+O

x²log1 x

= log 1

x + (1−γ) +O

x²log1 x

, whereγ is Euler’s constant.

With this equation we continue (44) as follows:

Z ∞ 0

sin(u)

(|t|/n_h+u)² du= log 1

|t|+ log(nh)−γ+O |t|

_hn_h

+O

²_hlog 1 _h

. Substituting this equation and (43) in (42), we get

logEexpitB_h(ξ) =−π|t|

2 + it

log 1

|t|+ log(n_h)−γ

+o(1), thus, proving the theorem.