A LOWER BOUND FOR A MEASURE OF PERFORMANCE
OF AN EMPIRICAL BAYES ESTIMATOR
ROXANA CIUMARA
We derive a lower bound for a measure of performance of an empirical Bayes estimator of the scale parameter $\theta$ of a Pareto distribution, using a weighted squared-error loss function and assuming a prior distribution uniform on $(0,1)$.
AMS 2000 Subject Classification: 62P05.
Key words: empirical Bayes, weighted squared-error loss, rate of convergence, measure of performance.
1. INTRODUCTION
Liang [2] and Tiwari and Zalkikar [5] proved some results concerning the performance of the empirical Bayes estimator of the scale parameter of the Pareto distribution under a squared-error loss function. In fact, Liang [2] relaxed the conditions imposed by Tiwari and Zalkikar [5] on the prior distribution and found a rate of convergence of order $n^{-3/2}$ – as compared to $n^{-1/2}$ found in [5] – for the sequence of empirical Bayes estimators.
In [4], a weighted squared-error loss function was considered and an empirical Bayes estimator for the scale parameter of the Pareto distribution was proposed, the weights being given by a function which satisfies certain properties. It was proved that, under certain conditions, the empirical Bayes estimator of the scale parameter is asymptotically optimal, with a corresponding rate of convergence of order $n^{-2/3}$.
In this paper we derive a lower bound for the difference between the overall Bayes risk of the sequence of empirical Bayes estimators of the scale parameter of a Pareto distribution found in [4] and the Bayes risk of the Bayes estimator of the same parameter. This difference is taken as a measure of performance of the empirical Bayes estimators. The loss function we use here is of weighted quadratic type.
The paper is structured as follows. In Section 2 we describe the Pareto distribution with a known shape parameter $\alpha$ and an unknown scale parameter $\theta$, and state the conditions that must be satisfied in order to obtain the results of Section 3. We define the Bayes risk for a weighted squared-error loss and the overall Bayes risk for a sequence of empirical Bayes estimators. Next, asymptotic optimality and the rate of convergence for a sequence of empirical Bayes estimators are defined. Moreover, we recall some results of Preda and Ciumara [4] that we will apply in order to obtain the main result of this paper.

MATH. REPORTS 9(59), 3 (2007), 257–266
In Section 3, considering a uniform prior distribution, we find a lower bound for the difference between the overall Bayes risk of the sequence of empirical Bayes estimators and the Bayes risk of the Bayes estimator.
The result obtained here generalizes the result of Liang [2] in the sense that if the weights function is constant and equal to unity, then we recover the case presented there. In fact, if the weights function satisfies certain properties, then the lower bound is the same as in the squared-error loss function case.
2. PRELIMINARIES
We consider a random variable $X$ having a Pareto distribution for a given $\theta$. The parameter $\theta$ is a value of a random variable $\Theta$ that has a prior distribution function $G : (0,\infty) \to [0,1]$. Then the probability density function of $X \mid \Theta = \theta$ is
$$f(x \mid \theta) = \frac{\alpha\theta^{\alpha}}{x^{\alpha+1}},$$
where $x > \theta$, $\alpha > 0$ and $\theta > 0$. The shape parameter $\alpha$ is known, while the scale parameter $\theta$ is unknown. The marginal density of $X$ is
$$f(x) = \int_0^{\min(x,m)} f(x \mid \theta)\,\mathrm{d}G(\theta) = \int_0^{\min(x,m)} f(x \mid \theta)\, g(\theta)\,\mathrm{d}\theta, \quad\text{where } \mathrm{d}G(\theta) = g(\theta)\,\mathrm{d}\theta.$$
If $\varphi(\theta) = \alpha\theta^{\alpha}$ and $u(x) = \frac{1}{x^{\alpha+1}}$, then $f(x \mid \theta) = \varphi(\theta) u(x)$ and
$$f(x) = u(x)\int_0^{\min(x,m)} \alpha\theta^{\alpha}\,\mathrm{d}G(\theta).$$
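For illustration, the conditional model is easy to simulate: since $\Pr(X > x \mid \Theta = \theta) = (\theta/x)^{\alpha}$ for $x > \theta$, inverse-transform sampling gives $X = \theta U^{-1/\alpha}$ with $U$ uniform on $(0,1)$. The Python sketch below is not part of the original analysis, and the values of $\alpha$ and $\theta$ are arbitrary; it draws such a sample and compares its mean with the theoretical mean $\alpha\theta/(\alpha-1)$, valid for $\alpha > 1$.

```python
import random

def sample_pareto(alpha, theta, n, rng):
    # Inverse-transform sampling: the survival function of f(x | theta) is
    # (theta / x)**alpha for x > theta, so X = theta * U**(-1/alpha).
    return [theta * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

rng = random.Random(0)
alpha, theta = 3.0, 1.0
xs = sample_pareto(alpha, theta, 50_000, rng)

mean = sum(xs) / len(xs)   # should be close to alpha*theta/(alpha-1) = 1.5
```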
We impose (see [2]) the following conditions on the prior distribution $G$:

(A1) $G(m) = 1$ for some known positive real number $m$.

(A2) If $a^* = \sup\{\theta \mid G(\theta) = 0\}$, then $f$ is a decreasing function of $x$ on $(a^*, m]$.
As in [4], we consider the weighted squared-error loss $L : \mathbb{R}_+^2 \to \mathbb{R}_+$ defined as $L(x,\theta) = w(\theta)(x-\theta)^2$, with the weights function $w : \mathbb{R}_+ \to \mathbb{R}_+^*$ continuous and differentiable. The conditions (see [4]) that $w$ has to satisfy are

(A3) $\exists\, c_1 \in \mathbb{R}_+^*$ such that $w(\theta) \le c_1$, $\forall \theta \in \mathbb{R}_+$.

(A4) $\exists\, c_2 \in \mathbb{R}_+^*$ such that $0 \le w(\theta) + \theta w'(\theta) \le c_2$, $\forall \theta \in \mathbb{R}_+$, and $\exists\, \varepsilon > 0$ such that $w(\theta) + \theta w'(\theta) > \varepsilon$ on $(0,m]$.

(A5) $\exists\, \varepsilon_0 > 0$ such that $\varepsilon_0 < q(x) = E(w(\Theta) \mid X = x)$, $\forall x \in (0,m]$.
Then the Bayes estimator of $\theta$ given $X = x$ is
$$\varphi_G(x) = \frac{E(\Theta w(\Theta) \mid X = x)}{E(w(\Theta) \mid X = x)}, \tag{2.1}$$
assuming that all posterior expectations involved in the above expression exist and that $E(w(\Theta) \mid X = x) \neq 0$.
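For orientation, with constant weights the estimator (2.1) reduces to the posterior mean, i.e., the ordinary squared-error loss case treated in [2]:

```latex
% With w(\theta) \equiv 1, formula (2.1) becomes
\varphi_G(x)
  = \frac{E(\Theta \cdot 1 \mid X = x)}{E(1 \mid X = x)}
  = E(\Theta \mid X = x),
% the Bayes estimator under ordinary squared-error loss.
```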
The Bayes risk of $\varphi_G$ is $R(G,\varphi_G) = E\left[w(\Theta)\left(\varphi_G(X)-\Theta\right)^2\right]$, where the expectation is taken with respect to $(X,\Theta)$.
Let $X_1, X_2, \ldots, X_n$ be the past data, independent and identically distributed random variables with probability density function $f(x)$. We denote $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$, and $\varphi_n(X) = \varphi_n(X, \mathbf{X}_n)$ is the empirical Bayes estimator of the parameter $\theta$ based on the past data $\mathbf{X}_n$ and the present observation $X$.
Definition 2.1. The conditional Bayes risk of $\varphi_n$ given $\mathbf{X}_n$ is $R(G,\varphi_n \mid \mathbf{X}_n) = E\left[w(\Theta)\left(\varphi_n(X)-\Theta\right)^2 \mid \mathbf{X}_n\right]$, and the overall Bayes risk of $\varphi_n$ is $R(G,\varphi_n) = E\left(R(G,\varphi_n \mid \mathbf{X}_n)\right)$, where the expectation is taken with respect to $\mathbf{X}_n$.
Because $\varphi_G$ is the Bayes estimator, we have $R(G,\varphi_G) \le R(G,\varphi_n \mid \mathbf{X}_n)$ for every vector $\mathbf{X}_n$ of past data and every $n \in \mathbb{N}^*$. Moreover,
$$R(G,\varphi_G) \le R(G,\varphi_n), \quad \forall n \in \mathbb{N}^*. \tag{2.2}$$
Definition 2.2. The nonnegative difference $R(G,\varphi_n) - R(G,\varphi_G)$ is a measure of performance of the empirical Bayes estimator $\varphi_n$.
We recall (see [3]) that a sequence of empirical Bayes estimators $(\varphi_n)_{n\ge1}$ is said to be asymptotically optimal if
$$R(G,\varphi_n) - R(G,\varphi_G) \xrightarrow[n\to\infty]{} 0.$$
Moreover, if $R(G,\varphi_n) - R(G,\varphi_G) = O(\alpha_n)$, where $(\alpha_n)_{n\ge1}$ is a sequence of real numbers with $\alpha_n > 0$ and $\alpha_n \xrightarrow[n\to\infty]{} 0$, then $(\varphi_n)_{n\ge1}$ is said to be asymptotically optimal with convergence rate of order $\alpha_n$.
In [4] there were proved Theorems 2.1 and 2.2 below.

Theorem 2.1. Under assumptions (A1)–(A5), the Bayes estimator of the scale parameter of the Pareto distribution is given by
$$\varphi_G(x) = \begin{cases} x\tilde w(x) - \dfrac{\tilde M(x)}{x^{\alpha+1} f(x)} & \text{if } 0 < x \le m,\\[6pt] m\tilde w(m) - \dfrac{\tilde M(m)}{m^{\alpha+1} f(m)} & \text{if } x > m, \end{cases} \tag{2.3}$$
where $\tilde w(x) = \dfrac{w(x)}{q(x)}$, $\tilde M(x) = \dfrac{M(x)}{q(x)}$ and $M(x) = \displaystyle\int_0^x \theta^{\alpha+1}\left(w(\theta) + \theta w'(\theta)\right)\mathrm{d}F(\theta)$, $F$ being the distribution function corresponding to the marginal density $f$.
If $(b_n)_{n\ge1}$ is a sequence of strictly positive numbers satisfying $b_n \xrightarrow[n\to\infty]{} 0$ and $n b_n \xrightarrow[n\to\infty]{} \infty$, we define
$$f_n(x) = \frac{F_n(x+b_n) - F_n(x)}{b_n},$$
where $F_n$ is the empirical distribution function based on $X_1, X_2, \ldots, X_n$. We note that $f_n(x)$ can be expressed as
$$f_n(x) = \frac{1}{n b_n}\sum_{j=1}^n I_{(x,\, x+b_n]}(X_j). \tag{2.4}$$
Moreover, $E(f_n(x)) \xrightarrow[n\to\infty]{} f(x)$. Thus, $f_n(x)$ is a consistent estimator of $f(x)$ (see [1]).
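The estimator (2.4) is a one-sided histogram-type density estimate and is straightforward to transcribe. In the sketch below the sample and the bandwidth are arbitrary illustrative values, not taken from the paper.

```python
def f_n(x, data, b):
    # Estimator (2.4): fraction of observations falling in (x, x + b],
    # divided by the bandwidth b.
    count = sum(1 for xj in data if x < xj <= x + b)
    return count / (len(data) * b)

data = [0.12, 0.20, 0.24, 0.40]
# Observations in (0.15, 0.25] are 0.20 and 0.24, so f_n = 2 / (4 * 0.1) = 5.0.
fn_value = f_n(0.15, data, 0.1)
```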
In [4] there is constructed a consistent estimator of $M(x)$, namely,
$$M_n(x) = \frac{1}{n}\sum_{j=1}^n X_j^{\alpha+1}\left(w(X_j) + X_j w'(X_j)\right) I_{(0,x)}(X_j). \tag{2.5}$$
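The estimator (2.5) can be transcribed just as directly. For the example we take constant weights, $w \equiv 1$, so that $w(\theta) + \theta w'(\theta) = 1$; this choice is ours, made only to keep the sketch short.

```python
def M_n(x, data, alpha, wp=lambda t: 1.0):
    # Estimator (2.5); wp(t) stands for w(t) + t * w'(t) (here w ≡ 1, so wp ≡ 1).
    return sum(xj ** (alpha + 1) * wp(xj) for xj in data if 0 < xj < x) / len(data)

data = [0.5, 1.0, 2.0]
# With alpha = 1, only 0.5 lies in (0, 0.8), so M_n(0.8) = 0.5**2 / 3.
mn_value = M_n(0.8, data, alpha=1.0)
```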
Next, the empirical Bayes estimator of the scale parameter $\theta$ proposed in [4] is given by
$$\varphi_n(X) = \left[\left(X\tilde w(X) - \frac{\tilde M_n(X)}{X^{\alpha+1} f_n(X)}\right)\vee 0\right] I_{(0,m]}(X) + \left[\left(m\tilde w(m) - \frac{\tilde M_n(m)}{m^{\alpha+1} f_n(m)}\right)\vee 0\right] I_{(m,\infty)}(X), \tag{2.6}$$
where $\tilde M_n = \dfrac{M_n}{q}$ and $a \vee b = \max(a,b)$.
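Combining the last two estimators gives a compact sketch of (2.6). The sketch again assumes constant weights $w \equiv 1$, so that $q \equiv 1$, $\tilde w = w$ and $\tilde M_n = M_n$; these simplifications are ours, made only so the example is self-contained (with general weights, $q(x) = E(w(\Theta) \mid X = x)$, which (2.6) takes as given, would also be needed).

```python
def phi_n(X, data, alpha, m, b):
    # Empirical Bayes estimator (2.6) specialized to w ≡ 1 (hence q ≡ 1).
    n = len(data)
    def f_n(x):                       # density estimate (2.4)
        return sum(1 for xj in data if x < xj <= x + b) / (n * b)
    def M_n(x):                       # estimator (2.5) with w(t) + t*w'(t) = 1
        return sum(xj ** (alpha + 1) for xj in data if 0 < xj < x) / n
    point = X if X <= m else m        # for X > m the estimate is taken at m
    fn = f_n(point)
    if fn == 0.0:                     # guard: the ratio in (2.6) needs f_n > 0
        return 0.0
    return max(point - M_n(point) / (point ** (alpha + 1) * fn), 0.0)

estimate = phi_n(0.5, [0.2, 0.4, 0.6], alpha=1.0, m=1.0, b=0.3)
```

The $\vee\,0$ truncation in (2.6) keeps the estimate nonnegative, matching the fact that the scale parameter is positive.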
The asymptotic optimality of the empirical Bayes estimator (2.6) is asserted by the result below.

Theorem 2.2. If $(b_n)_{n\ge1}$ is a sequence of strictly positive numbers satisfying $b_n \xrightarrow[n\to\infty]{} 0$ and $n b_n \xrightarrow[n\to\infty]{} \infty$, $(\varphi_n)_n$ is the sequence of empirical Bayes estimators (2.6) and $\varphi_G$ is the Bayes estimator (2.3), then
$$R(G,\varphi_n) - R(G,\varphi_G) = O\left(\frac{1}{n}\right) + O\left(\frac{1}{n b_n}\right) + O\left(b_n^2\right).$$
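Balancing the terms of Theorem 2.2 recovers the rate quoted in the Introduction: the choice $b_n \propto n^{-1/3}$ equalizes the last two terms and dominates the first, giving the order $n^{-2/3}$ obtained in [4].

```latex
% Trade-off between the variance-type term and the bias-type term:
\frac{1}{n b_n} = b_n^{2} \iff b_n = n^{-1/3},
\qquad\text{whence}\qquad
O\!\Bigl(\tfrac{1}{n}\Bigr) + O\!\Bigl(\tfrac{1}{n b_n}\Bigr) + O\bigl(b_n^{2}\bigr)
  = O\bigl(n^{-2/3}\bigr).
```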
3. A LOWER BOUND FOR $R(G,\varphi_n) - R(G,\varphi_G)$ WHEN $G$ IS A UNIFORM DISTRIBUTION FUNCTION
We suppose that $\alpha > 1$, $m = 1$ and $G$ is the cumulative distribution function of the uniform distribution on $(0,1)$. Therefore, $g(\theta) = 1$ for $\theta \in (0,1)$. Then
$$f(x) = \begin{cases} \dfrac{\alpha}{\alpha+1} & \text{if } 0 < x \le 1,\\[6pt] \dfrac{\alpha}{\alpha+1}\cdot\dfrac{1}{x^{\alpha+1}} & \text{if } x > 1. \end{cases}$$
It is clear that $f$ is nonincreasing on $(0,\infty)$, so conditions (A1) and (A2) are fulfilled.
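As a numerical check (ours, not the paper's), the two branches above can be compared with the defining integral $f(x) = x^{-(\alpha+1)}\int_0^{\min(x,1)} \alpha\theta^{\alpha}\,\mathrm{d}\theta$ evaluated by a midpoint rule; the value of $\alpha$ is an arbitrary choice.

```python
def marginal(x, alpha, steps=100_000):
    # f(x) = x**-(alpha+1) * integral of alpha * t**alpha over (0, min(x, 1)),
    # evaluated with the midpoint rule.
    upper = min(x, 1.0)
    h = upper / steps
    integral = sum(alpha * ((k + 0.5) * h) ** alpha for k in range(steps)) * h
    return integral / x ** (alpha + 1)

alpha = 2.0
inside = marginal(0.5, alpha)   # constant branch: alpha/(alpha+1) = 2/3
outside = marginal(2.0, alpha)  # tail branch: (2/3) * 2**-3 = 1/12
```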
Remark 3.1. For $x < 1$, with the notation of Section 2 we have $E(f_n(x)) = f(x)$, provided $n$ is large enough that $x + b_n \le 1$.
We will prove that
$$R(G,\varphi_n) - R(G,\varphi_G) \ge O\left(\frac{1}{n b_n}\right) + O\left(b_n^2\right).$$
In order to do this we need some preliminary results, which will be proved following the lines of Liang [2].
First, we take $\delta \in \left(0, \frac{1}{4}\right)$ and for each $x \in (0, 1-\delta]$ we define
$$B_n(x) = I\left\{\frac{\tilde M_n(x)}{x^{\alpha+1} f_n(x)} \le x\tilde w(x)\right\} \quad\text{and}\quad B_n^c(x) = I\left\{\frac{\tilde M_n(x)}{x^{\alpha+1} f_n(x)} > x\tilde w(x)\right\}.$$

Lemma 3.1. We have $\lim_{n\to\infty} E(B_n(x)) = 1$ and $\lim_{n\to\infty} E(B_n^c(x)) = 0$.
Proof. We consider only the case where $n$ is sufficiently large that $b_n < \delta$. Note that for $x \in (0, 1-\delta]$ we have $E(f_n(x)) = f(x)$, and then
$$E\left(\tilde M_n(x) - x^{\alpha+2}\tilde w(x) f_n(x)\right) = \tilde M(x) - x^{\alpha+2}\tilde w(x) f(x),$$
which we suppose to be finite and different from zero. Next, we evaluate
$$\mathrm{Var}\left(\tilde M_n(x) - x^{\alpha+2}\tilde w(x) f_n(x)\right) \le \frac{2}{n}\cdot\frac{1}{q^2(x)}\,\mathrm{Var}\left(X_j^{\alpha+1}\left(w(X_j)+X_j w'(X_j)\right) I_{(0,x)}(X_j)\right) + \frac{2}{n}\cdot\frac{x^{2(\alpha+2)}\tilde w^2(x)}{b_n^2}\,\mathrm{Var}\left(I_{(x,x+b_n]}(X_j)\right).$$
Because
$$\mathrm{Var}\left(X_j^{\alpha+1}\left(w(X_j)+X_j w'(X_j)\right) I_{(0,x)}(X_j)\right) \le c_2^2$$
and
$$\mathrm{Var}\left(I_{(x,x+b_n]}(X_j)\right) = \int_x^{x+b_n}\frac{\alpha}{\alpha+1}\,\mathrm{d}y - \left(\int_x^{x+b_n}\frac{\alpha}{\alpha+1}\,\mathrm{d}y\right)^2 = b_n f(x)\left(1 - f(x) b_n\right),$$
we have
$$\mathrm{Var}\left(\tilde M_n(x) - x^{\alpha+2}\tilde w(x) f_n(x)\right) \le \frac{2}{n}\cdot\frac{c_2^2}{q^2(x)} + \frac{2 x^{2(\alpha+2)}\tilde w^2(x) f(x)}{n b_n} \xrightarrow[n\to\infty]{} 0.$$
Furthermore, the Chebyshev inequality ([1]) yields
$$E(B_n^c(x)) = \Pr\left(\frac{\tilde M_n(x)}{x^{\alpha+1} f_n(x)} > x\tilde w(x)\right) = \Pr\left(\left[\tilde M_n(x) - x^{\alpha+2} f_n(x)\tilde w(x)\right] - \left[\tilde M(x) - x^{\alpha+2} f(x)\tilde w(x)\right] > -\left[\tilde M(x) - x^{\alpha+2} f(x)\tilde w(x)\right]\right) \le \frac{\mathrm{Var}\left(\tilde M_n(x) - x^{\alpha+2} f_n(x)\tilde w(x)\right)}{\left(\tilde M(x) - x^{\alpha+2}\tilde w(x)\frac{\alpha}{\alpha+1}\right)^2} \xrightarrow[n\to\infty]{} 0.$$
Then $\lim_{n\to\infty} E(B_n^c(x)) = 0$ and $\lim_{n\to\infty} E(B_n(x)) = 1$. $\square$
Lemma 3.2. We have
$$\int_0^1 E\left[q(x)\left(\varphi_n(x)-\varphi_G(x)\right)^2\right] f(x)\,\mathrm{d}x \ge O\left(\frac{1}{n b_n}\right).$$
Proof. For $x \in (0,1)$ we have
$$\varphi_G(x) = x\tilde w(x) - \frac{\tilde M(x)}{x^{\alpha+1} f(x)} = \frac{x w(x)}{q(x)} - \frac{M(x)}{q(x)\cdot x^{\alpha+1} f(x)} \quad\text{and}\quad \varphi_n(x) = x\tilde w(x) - \frac{\tilde M_n(x)}{x^{\alpha+1} f_n(x)} = \frac{x w(x)}{q(x)} - \frac{M_n(x)}{q(x)\cdot x^{\alpha+1} f_n(x)}.$$
Then
$$\varphi_n(x) - \varphi_G(x) = \frac{1}{q(x)}\left(\frac{M(x)}{x^{\alpha+1} f(x)} - \frac{M_n(x)}{x^{\alpha+1} f_n(x)}\right).$$
As in the proof of Theorem 4.1 of [4], we have
$$R(G,\varphi_n) - R(G,\varphi_G) = \int_0^1 E\left[q(x)\left(\varphi_n(x)-\varphi_G(x)\right)^2\right] f(x)\,\mathrm{d}x + E\left[q(1)\left(\varphi_n(1)-\varphi_G(1)\right)^2\right]\cdot\left(1-F(1)\right).$$
We see that
$$\int_0^1 E\left[q(x)\left(\varphi_n(x)-\varphi_G(x)\right)^2\right] f(x)\,\mathrm{d}x \ge \int_0^{1-\delta} E\left[q(x)\left(\varphi_n(x)-\varphi_G(x)\right)^2 B_n(x)\right] f(x)\,\mathrm{d}x,$$
where, using that $f \equiv \frac{\alpha}{\alpha+1}$ on $(0,1]$,
$$q(x)\left(\varphi_n(x)-\varphi_G(x)\right)^2 B_n(x) = \frac{1}{q(x)}\left(\frac{M_n(x)}{x^{\alpha+1} f_n(x)} - \frac{M(x)}{x^{\alpha+1} f(x)}\right)^2 B_n^2(x) = \frac{1}{q(x)}\left(\frac{\alpha\left(M_n(x)-M(x)\right) - (\alpha+1) M(x)\left(f_n(x)-f(x)\right)}{\alpha x^{\alpha+1} f_n(x)}\right)^2 B_n^2(x).$$
We know that $E(f_n(x)) = f(x)$, $\mathrm{Var}(f_n(x)) = \frac{1}{n b_n} f(x)\left(1-f(x) b_n\right)$ and $\lim_{n\to\infty} b_n = 0$. Therefore, by the Central Limit Theorem ([1]), we get
$$\frac{(\alpha+1) M(x)}{\sqrt{q(x)}}\sqrt{n b_n}\left(f_n(x)-f(x)\right) \xrightarrow[n\to\infty]{d} N\left(0, \frac{(\alpha+1)^2 M^2(x)}{q(x)}\, f(x)\right).$$
Moreover,
$$\frac{\alpha\sqrt{n b_n}}{\sqrt{q(x)}}\left(M_n(x)-M(x)\right) \xrightarrow[n\to\infty]{P} 0, \qquad \alpha x^{\alpha+1} f_n(x) \xrightarrow[n\to\infty]{P} \alpha x^{\alpha+1} f(x),$$
and $\lim_{n\to\infty} E(B_n(x)) = 1$. It follows from the Slutsky theorem that
$$\sqrt{n b_n}\;\frac{\frac{\alpha}{\sqrt{q(x)}}\left(M_n(x)-M(x)\right) - \frac{(\alpha+1) M(x)}{\sqrt{q(x)}}\left(f_n(x)-f(x)\right)}{\alpha x^{\alpha+1} f_n(x)}\; B_n(x) \xrightarrow[n\to\infty]{d} N\left(0, \left(\frac{\alpha+1}{\alpha}\right)^2 \frac{M^2(x)}{x^{2\alpha+2}\, q(x)\, f(x)}\right).$$
Now, we get
$$\liminf_{n\to\infty} E\left[n b_n\, q(x)\left(\varphi_n(x)-\varphi_G(x)\right)^2 B_n(x)\right] \ge \left(\frac{\alpha+1}{\alpha}\right)^2 \frac{M^2(x)}{x^{2\alpha+2}\, q(x)\, f(x)},$$
for all $x \in (0, 1-\delta]$ and $b_n < \delta$. By Fatou's lemma we obtain
$$\liminf_{n\to\infty} \int_0^{1-\delta} E\left[n b_n\, q(x)\left(\varphi_n(x)-\varphi_G(x)\right)^2 B_n(x)\right] f(x)\,\mathrm{d}x \ge \int_0^{1-\delta} \liminf_{n\to\infty} E\left[n b_n\, q(x)\left(\varphi_n(x)-\varphi_G(x)\right)^2 B_n(x)\right] f(x)\,\mathrm{d}x \ge \int_0^{1-\delta} \left(\frac{M(x)}{x^{\alpha+1} f(x)}\right)^2 \frac{1}{q(x)}\,\mathrm{d}x \ge \frac{\varepsilon^2}{c_1(\alpha+2)^2}\cdot\frac{(1-\delta)^3}{3},$$
the last inequality holding because $M(x) \ge \varepsilon f(x)\frac{x^{\alpha+2}}{\alpha+2}$ by (A4) and $q(x) \le c_1$ by (A3). Finally, we have
$$\int_0^1 E\left[q(x)\left(\varphi_n(x)-\varphi_G(x)\right)^2\right] f(x)\,\mathrm{d}x \ge O\left(\frac{1}{n b_n}\right). \qquad\square$$
Lemma 3.3. For $\alpha \ge 1$ we have $E\left[q(1)\left(\varphi_n(1)-\varphi_G(1)\right)^2\right] \ge O\left(b_n^2\right)$.

Proof. Denote
$$A_n = \left\{\sqrt{q(1)}\left|\varphi_n(1)-\varphi_G(1)\right| \ge c_n\right\}, \quad\text{where } c_n = \frac{M(1)}{\sqrt{q(1)}\, f(1)}\, b_n.$$
Then
$$E\left[q(1)\left(\varphi_n(1)-\varphi_G(1)\right)^2\right] = E\left[q(1)\left(\varphi_n(1)-\varphi_G(1)\right)^2 B_n(1)\right] + E\left[q(1)\left(\varphi_n(1)-\varphi_G(1)\right)^2 B_n^c(1)\right].$$
We have
$$E\left[q(1)\left(\varphi_n(1)-\varphi_G(1)\right)^2 B_n^c(1)\right] = q(1)\varphi_G^2(1)\Pr\left(B_n^c(1)\right),$$
since $\varphi_n(1) = 0$ on $B_n^c(1)$. Furthermore,
$$E\left[q(1)\left(\varphi_n(1)-\varphi_G(1)\right)^2 B_n(1)\right] \ge c_n^2 \Pr\left(A_n \cap B_n(1)\right) \ge c_n^2 \Pr\left(\left\{M_n(1) - f_n(1)\left(\frac{M(1)}{f(1)} + c_n\sqrt{q(1)}\right) \ge 0\right\} \cap B_n(1)\right).$$
Thus,
$$E\left[q(1)\left(\varphi_n(1)-\varphi_G(1)\right)^2\right] \ge c_n^2 \Pr\left(\left\{M_n(1) - f_n(1)\left(\frac{M(1)}{f(1)} + c_n\sqrt{q(1)}\right) \ge 0\right\} \cap B_n(1)\right) + q(1)\varphi_G^2(1)\Pr\left(B_n^c(1)\right) \ge c_n^2 \Pr\left(M_n(1) - f_n(1)\left(\frac{M(1)}{f(1)} + c_n\sqrt{q(1)}\right) \ge 0\right),$$
since for $n$ sufficiently large we have $c_n^2 \le q(1)\varphi_G^2(1)$, because $c_n = \frac{M(1)}{\sqrt{q(1)}\, f(1)}\, b_n \xrightarrow[n\to\infty]{} 0$. Now, we get
$$E\left[q(1)\left(\varphi_n(1)-\varphi_G(1)\right)^2\right] \ge c_n^2 \Pr\left(M_n(1) - f_n(1)\left(\frac{M(1)}{f(1)} + c_n\sqrt{q(1)}\right) \ge 0\right) = c_n^2 \Pr\left(\left[M_n(1) - f_n(1)\left(\frac{M(1)}{f(1)} + c_n\sqrt{q(1)}\right)\right] - E\left[M_n(1) - f_n(1)\left(\frac{M(1)}{f(1)} + c_n\sqrt{q(1)}\right)\right] \ge -E\left[M_n(1) - f_n(1)\left(\frac{M(1)}{f(1)} + c_n\sqrt{q(1)}\right)\right]\right).$$
We also note that
$$E\left[M_n(1) - f_n(1)\left(\frac{M(1)}{f(1)} + c_n\sqrt{q(1)}\right)\right] = M(1)\left[1 - \frac{1}{b_n}\int_1^{1+b_n}\frac{\mathrm{d}y}{y^{\alpha+1}} - \frac{c_n\sqrt{q(1)}\, f(1)}{b_n\, M(1)}\int_1^{1+b_n}\frac{\mathrm{d}y}{y^{\alpha+1}}\right] \ge M(1)\left[1 - \frac{1}{b_n}\int_1^{1+b_n}\frac{\mathrm{d}y}{y^2} - \int_1^{1+b_n}\frac{\mathrm{d}y}{y^2}\right] = 0,$$
the inequality holding because $\frac{c_n\sqrt{q(1)}\, f(1)}{b_n\, M(1)} = 1$ and, for $\alpha \ge 1$, $\frac{1}{y^{\alpha+1}} \le \frac{1}{y^2}$ whenever $y \ge 1$.
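The final equality can be checked directly from the antiderivative $\int y^{-2}\,\mathrm{d}y = -y^{-1}$:

```latex
\frac{1}{b_n}\int_1^{1+b_n}\frac{\mathrm{d}y}{y^{2}}
   = \frac{1}{b_n}\Bigl(1-\frac{1}{1+b_n}\Bigr)
   = \frac{1}{1+b_n},
\qquad
\int_1^{1+b_n}\frac{\mathrm{d}y}{y^{2}} = \frac{b_n}{1+b_n},
% so the bracket equals 1 - 1/(1+b_n) - b_n/(1+b_n) = 0.
```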
Hence, by the last relation and the Central Limit Theorem ([1]),
$$\Pr\left(M_n(1) - f_n(1)\left(\frac{M(1)}{f(1)} + c_n\sqrt{q(1)}\right) \ge 0\right) \ge \Pr\left(\left[M_n(1) - f_n(1)\left(\frac{M(1)}{f(1)} + c_n\sqrt{q(1)}\right)\right] - E\left[M_n(1) - f_n(1)\left(\frac{M(1)}{f(1)} + c_n\sqrt{q(1)}\right)\right] \ge 0\right) \xrightarrow[n\to\infty]{} \frac{1}{2}.$$
Now, we can conclude that
$$E\left[q(1)\left(\varphi_n(1)-\varphi_G(1)\right)^2\right] \ge O\left(c_n^2\right) = O\left(b_n^2\right). \qquad\square$$
The above results allow us to establish the main result of this paper.

Theorem 3.1. We have
$$R(G,\varphi_n) - R(G,\varphi_G) \ge O\left(\frac{1}{n b_n}\right) + O\left(b_n^2\right).$$
Proof. By Theorem 4.1 in [4] and Lemmas 3.1–3.3, we have
$$R(G,\varphi_n) - R(G,\varphi_G) = \int_0^1 E\left[q(x)\left(\varphi_n(x)-\varphi_G(x)\right)^2\right] f(x)\,\mathrm{d}x + E\left[q(1)\left(\varphi_n(1)-\varphi_G(1)\right)^2\right]\cdot\left(1-F(1)\right) \ge O\left(\frac{1}{n b_n}\right) + O\left(b_n^2\right). \qquad\square$$
REFERENCES
[1] M. Iosifescu, G. Mihoc and R. Theodorescu, Teoria probabilităţilor şi statistica matematică. Editura Tehnică, Bucureşti, 1966.
[2] T.C. Liang, Convergence rates for empirical Bayes estimation of the scale parameter in a Pareto distribution. Comput. Statist. Data Anal. 16 (1993), 35–45.
[3] V. Preda, Teoria deciziilor statistice. Ed. Academiei Române, Bucureşti, 1992.
[4] V. Preda and R. Ciumara, Convergence rates in empirical Bayes problems with weighted squared error loss: The Pareto distribution case. Rev. Roumaine Math. Pures Appl. 52 (2007), 6.
[5] R.C. Tiwari and J.N. Zalkikar, Empirical Bayes estimation of the scale parameter in a Pareto distribution. Comput. Statist. Data Anal. 10 (1990), 261–270.
Received 30 November 2006

Academy of Economic Studies
Department of Mathematics
Calea Dorobanţilor 15-17
010552 Bucharest, Romania
roxana ciumara@yahoo.com