CONVERGENCE RATES IN EMPIRICAL BAYES PROBLEMS WITH A WEIGHTED SQUARED-ERROR LOSS. THE PARETO DISTRIBUTION CASE



VASILE PREDA and ROXANA CIUMARA

We study the problem of estimating the scale parameter θ of a Pareto distribution under a weighted squared-error loss through the empirical Bayes approach. An empirical Bayes estimator is proposed and some of its asymptotic optimality properties are established. Moreover, under certain conditions, the proposed empirical Bayes estimator is asymptotically optimal with rate of convergence of order $n^{-2/3}$.

AMS 2000 Subject Classification: 62P05.

Key words: empirical Bayes, weighted squared-error loss, asymptotic optimality, rate of convergence.

1. INTRODUCTION

Robbins [12] argued that, for some estimation problems, the information obtained at each step can be used to improve the decision at the next step. These procedures, known as empirical Bayes or adaptive methods, have been studied by many authors, among them Johns [4], Samuel [13], Berger and Berliner [2], Preda [9, 11], Tiwari and Zalkikar [16], Liang [5] and Singh [15].

The usefulness of empirical Bayes estimation in practical statistical applications depends on the rate at which the overall risk converges to the optimal (Bayes) risk.

The problem of convergence rates for empirical Bayes estimators was studied, under a squared-error loss, by Lin [6], Preda [8], Tiwari and Zalkikar [16] and Liang [5].

Tiwari and Zalkikar [16] found that, under certain conditions, the empirical Bayes estimator of the scale parameter of the Pareto distribution is asymptotically optimal with rate of convergence of order $n^{-1/2}$. Liang [5] used the same squared-error loss but relaxed the conditions of Tiwari and Zalkikar [16], and proved that the empirical Bayes estimator proposed there for the same parameter is asymptotically optimal with rate of convergence of order $n^{-2/3}$.

REV. ROUMAINE MATH. PURES APPL., 52 (2007), 6, 673–682


In this paper, we consider a weighted squared-error loss and propose an empirical Bayes estimator for the scale parameter of the Pareto distribution.

We assume that the weights are given by a function satisfying certain properties.

In Section 2, we describe the Pareto distribution with a known shape parameter α and unknown scale parameter θ. Furthermore, the conditions that have to be satisfied in order to obtain the results from Sections 3 and 4 are stated. We define the Bayes risk for a weighted squared-error loss and the overall Bayes risk for a sequence of empirical Bayes estimators. Next, asymptotic optimality and rate of convergence for a sequence of empirical Bayes estimators are defined.

In Section 3, under the conditions imposed in Section 2, we propose an empirical Bayes estimator of the unknown scale parameter of the Pareto distribution for a class of prior distributions. Detailed studies of the Pareto distribution can be found in Arnold [1] and Preda [10].

In Section 4, we study asymptotic optimality and prove that, under the assumed conditions, the rate of convergence is of order $n^{-2/3}$.

2. SOME PRELIMINARIES

Let $X$ be a random variable having a Pareto distribution with probability density function

$$f(x\mid\theta) = \frac{\alpha\theta^{\alpha}}{x^{\alpha+1}},$$

where $x > \theta$, $\alpha > 0$ and $\theta > 0$. The shape parameter $\alpha$ is known and the scale parameter $\theta$ is unknown. We suppose that $\theta$ is a value of a random variable $\Theta$ with prior distribution function $G:(0,\infty)\to[0,1]$. In this case, the marginal density of $X$ is given by

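$$f(x) = \int_0^{\min(x,m)} f(x\mid\theta)\,dG(\theta) = \int_0^{\min(x,m)} f(x\mid\theta)\,g(\theta)\,d\theta,$$

where $dG(\theta) = g(\theta)\,d\theta$ and $m$ is the constant in condition (A1) below.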

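Writing $f(x\mid\theta) = \varphi(\theta)u(x)$, where $\varphi(\theta) = \alpha\theta^{\alpha}$ and $u(x) = 1/x^{\alpha+1}$, we get

$$f(x) = u(x)\int_0^{\min(x,m)} \alpha\theta^{\alpha}\,dG(\theta) \quad\text{or}\quad x^{\alpha+1}f(x) = \int_0^{\min(x,m)} \alpha\theta^{\alpha}\,dG(\theta).$$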

As for the prior distributionG, we impose the conditions below.

Conditions on G (Liang [5]):


(A1) $G(m) = 1$ for some known positive real number $m$.

(A2) If $a^{*} = \sup\{\theta \mid G(\theta) = 0\}$, then $f$ is a decreasing function of $x$ on $(a^{*}, m]$.

We consider the problem of estimating the parameter $\theta$ under a weighted squared-error loss $L:\mathbb{R}_+^2\to\mathbb{R}_+$ defined as

$$L(x,\theta) = w(\theta)(x-\theta)^2,$$

with a continuous and differentiable weight function $w:\mathbb{R}_+\to\mathbb{R}_+^{*}$. The robustness of loss functions of this type was studied by Makov [7].

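Next, we suppose that $w$ satisfies the conditions below.

Conditions on $w$:

(A3) There exists $c_1 \in \mathbb{R}_+^{*}$ such that $w(\theta) \le c_1$ for all $\theta \in \mathbb{R}_+$.

(A4) There exists $c_2 \in \mathbb{R}_+^{*}$ such that $0 \le w(\theta) + \theta w'(\theta) \le c_2$ for all $\theta \in \mathbb{R}_+$, and there exists $\varepsilon > 0$ such that $w(\theta) + \theta w'(\theta) > \varepsilon$ on $(0, m]$.

(A5) There exists $\varepsilon_0 > 0$ such that $\varepsilon_0 < q(x) = E(w(\Theta)\mid X = x)$ for all $x \in (0, m]$.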

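Example 2.1. If $w:\mathbb{R}_+\to\mathbb{R}_+$, $w(\theta) = 1/(1+\theta)$, then $0 \le w(\theta) \le 1$ and $w(\theta) + \theta w'(\theta) = 1/(1+\theta)^2$, so $0 \le w(\theta) + \theta w'(\theta) \le 1$ (and this quantity is at least $1/4$ on $(0,1]$); thus conditions (A3) and (A4) are satisfied. If we consider the uniform prior distribution on $[0,1]$ and $m = 1$, then condition (A5) also holds, since $1/2 < q(x) \le 1$.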
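The bound on $q(x)$ in Example 2.1 can be checked numerically. The sketch below assumes the setting of the example ($w(\theta) = 1/(1+\theta)$, uniform prior on $[0,1]$, $m = 1$) together with an illustrative shape parameter $\alpha = 2$ that the example does not fix. For $0 < x \le 1$ the posterior density of $\Theta$ given $X = x$ is proportional to $\theta^{\alpha}$ on $(0, x)$, so $q(x)$ is a ratio of two one-dimensional integrals, approximated here by a midpoint rule. The function names are hypothetical.

```python
def midpoint(func, a, b, n=20000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def w(theta):
    """Weight function of Example 2.1."""
    return 1.0 / (1.0 + theta)


def q(x, alpha=2.0, m=1.0):
    """q(x) = E(w(Theta) | X = x) under a uniform prior on [0, m].

    Given X = x, the posterior density of Theta is proportional to
    theta**alpha on (0, min(x, m)), so q(x) is a ratio of two integrals.
    """
    upper = min(x, m)
    num = midpoint(lambda t: w(t) * t ** alpha, 0.0, upper)
    den = midpoint(lambda t: t ** alpha, 0.0, upper)
    return num / den


for x in (0.05, 0.3, 0.7, 1.0):
    print(f"x = {x:4.2f}   q(x) = {q(x):.4f}")
```

For $\alpha = 2$ the value at $x = 1$ can also be obtained by hand: $q(1) = 3(\ln 2 - 1/2) \approx 0.579$, which lies above $1/2$ as claimed.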

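The Bayes estimator of $\theta$ given $X = x$ is

$$\varphi_G(x) = \arg\min_{\delta} E\bigl(L(\delta,\Theta)\mid X = x\bigr) = \frac{E(\Theta w(\Theta)\mid X = x)}{E(w(\Theta)\mid X = x)}, \tag{2.1}$$

assuming that all posterior expectations involved in the above expression exist and $E(w(\Theta)\mid X = x) \neq 0$ (the minimizer is obtained by setting the derivative $2E(w(\Theta)(\delta-\Theta)\mid X = x)$ equal to zero).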
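As a sanity check of (2.1), the sketch below uses a hypothetical discrete posterior (support points and probabilities chosen only for illustration) and the weight of Example 2.1. It compares the closed-form ratio $E(\Theta w(\Theta)\mid X=x)/E(w(\Theta)\mid X=x)$ with a brute-force grid minimization of the posterior expected loss $E(w(\Theta)(\delta-\Theta)^2\mid X=x)$ over $\delta$.

```python
def w(theta):
    """Any positive weight works; this one is taken from Example 2.1."""
    return 1.0 / (1.0 + theta)


# Hypothetical discrete posterior of Theta given X = x.
support = [0.2, 0.5, 0.9, 1.4]
probs = [0.1, 0.4, 0.3, 0.2]


def posterior_loss(delta):
    """E(w(Theta) * (delta - Theta)**2 | X = x) for the discrete posterior."""
    return sum(p * w(t) * (delta - t) ** 2 for t, p in zip(support, probs))


# Closed form (2.1): the w-weighted posterior mean.
num = sum(p * w(t) * t for t, p in zip(support, probs))
den = sum(p * w(t) for t, p in zip(support, probs))
bayes = num / den

# Brute-force minimization over a fine grid of candidate estimates in [0, 2].
grid = [i / 100000 for i in range(200001)]
best = min(grid, key=posterior_loss)

print(f"closed form: {bayes:.5f}   grid minimizer: {best:.5f}")
```

Because $w$ is decreasing, the weighted estimate lies below the unweighted posterior mean, which is $0.77$ for these illustrative numbers.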

The Bayes risk of $\varphi_G$ is

$$R(G,\varphi_G) = E\bigl(L(\varphi_G(X),\Theta)\bigr) = E\bigl[w(\Theta)(\varphi_G(X)-\Theta)^2\bigr],$$

where the expectation is taken with respect to $(X,\Theta)$.

Let $X_1, X_2, \dots, X_n$ be the past data, independent and identically distributed random variables with probability density function $f(x)$. Denote $\mathbf{X}_n = (X_1, X_2, \dots, X_n)$ and let $\varphi_n(X) = \varphi_n(X, \mathbf{X}_n)$ be the empirical Bayes estimator of the parameter $\theta$ based on the past data $\mathbf{X}_n$ and the present observation $X$.

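The conditional Bayes risk of $\varphi_n$ given $\mathbf{X}_n$ is

$$R(G,\varphi_n\mid\mathbf{X}_n) = E\bigl[w(\Theta)(\varphi_n(X)-\Theta)^2 \mid \mathbf{X}_n\bigr],$$

and

$$R(G,\varphi_n) = E\bigl(R(G,\varphi_n\mid\mathbf{X}_n)\bigr)$$

is the overall Bayes risk of $\varphi_n$. Here the expectation is taken with respect to $\mathbf{X}_n$.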

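Since $\varphi_G$ is the Bayes estimator, that is, $\varphi_G(x) = \arg\min_{\delta} E(L(\delta,\Theta)\mid X = x)$, we have

$$R(G,\varphi_G) \le R(G,\varphi_n\mid\mathbf{X}_n)$$

for every vector of past data $\mathbf{X}_n$ and every $n \in \mathbb{N}^{*}$. Moreover,

$$R(G,\varphi_G) \le R(G,\varphi_n), \quad \forall n \in \mathbb{N}^{*}. \tag{2.2}$$

Thus, $R(G,\varphi_n) - R(G,\varphi_G)$ is nonnegative and can be used as a measure of performance of the empirical Bayes estimator $\varphi_n$.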

Definition 2.1 (Robbins [12], Preda [11], Liang [5]). A sequence $(\varphi_n)_{n\ge1}$ of empirical Bayes estimators is said to be asymptotically optimal if

$$R(G,\varphi_n) - R(G,\varphi_G) \xrightarrow[n\to\infty]{} 0.$$

Moreover, if $R(G,\varphi_n) - R(G,\varphi_G) = O(\alpha_n)$, where $(\alpha_n)_{n\ge1}$ is a sequence of real numbers with $\alpha_n > 0$ and $\alpha_n \to 0$ as $n\to\infty$, then $(\varphi_n)_{n\ge1}$ is said to be asymptotically optimal with convergence rate of order $\alpha_n$.

3. THE EMPIRICAL BAYES ESTIMATOR

In order to propose an empirical Bayes estimator, we first have to derive the Bayes estimator.

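Theorem 3.1. Under conditions (A1)–(A5), the Bayes estimator of the scale parameter of the Pareto distribution is given by

$$\varphi_G(x) = \begin{cases} x\bar{w}(x) - \dfrac{\bar{M}(x)}{x^{\alpha+1}f(x)}, & 0 < x \le m,\\[2mm] m\bar{w}(m) - \dfrac{\bar{M}(m)}{m^{\alpha+1}f(m)}, & x > m, \end{cases} \tag{3.1}$$

where $\bar{w}(x) = w(x)/q(x)$, $\bar{M}(x) = M(x)/q(x)$ and $M(x) = \int_0^x \theta^{\alpha+1}\bigl(w(\theta) + \theta w'(\theta)\bigr)\,dF(\theta)$, $F$ being the marginal distribution function of $X$.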

Proof. On account of (2.1), we first evaluate the numerator $E(\Theta w(\Theta)\mid X = x)$ of the Bayes estimator.

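For $0 < x \le m$, integrating by parts, we have

$$E(\Theta w(\Theta)\mid X = x) = x w(x) - \frac{1}{x^{\alpha+1}f(x)}\int_0^x \theta^{\alpha+1}\bigl(w(\theta) + \theta w'(\theta)\bigr)\,dF(\theta),$$

that is,

$$E(\Theta w(\Theta)\mid X = x) = x w(x) - \frac{M(x)}{x^{\alpha+1}f(x)}, \tag{3.2}$$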


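where $M(x) = \int_0^x \theta^{\alpha+1}\bigl(w(\theta) + \theta w'(\theta)\bigr)\,dF(\theta)$. It follows from condition (A4) that $M(x) \ge 0$. Since $E(\Theta w(\Theta)\mid X = x) \ge 0$, we get

$$\frac{M(x)}{x^{\alpha+1}f(x)} \le x w(x) \tag{3.3}$$

and

$$E(\Theta w(\Theta)\mid X = x) \le x w(x). \tag{3.4}$$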
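The integration by parts behind (3.2) can be spelled out as follows. The function $H$ is auxiliary notation introduced only for this sketch; by the representation of $x^{\alpha+1}f(x)$ in Section 2, $H(x) = x^{\alpha+1}f(x)$ for $0 < x \le m$, and the boundary term at $0$ vanishes because $w$ is bounded by (A3).

```latex
% Auxiliary notation (for this sketch only):
%   H(\theta) = \int_0^{\theta} \alpha t^{\alpha}\, dG(t), so H(x) = x^{\alpha+1} f(x) for 0 < x \le m.
\begin{aligned}
E\bigl(\Theta w(\Theta) \mid X = x\bigr)
  &= \frac{1}{x^{\alpha+1} f(x)} \int_0^{x} \theta w(\theta)\, dH(\theta) \\
  &= \frac{1}{x^{\alpha+1} f(x)} \Bigl[\, x w(x) H(x)
     - \int_0^{x} H(\theta)\bigl(w(\theta) + \theta w'(\theta)\bigr)\, d\theta \Bigr] \\
  &= x w(x) - \frac{1}{x^{\alpha+1} f(x)}
     \int_0^{x} \theta^{\alpha+1}\bigl(w(\theta) + \theta w'(\theta)\bigr) f(\theta)\, d\theta \\
  &= x w(x) - \frac{M(x)}{x^{\alpha+1} f(x)} .
\end{aligned}
```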

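For $x > m$ we have

$$E(\Theta w(\Theta)\mid X = x) = \frac{m^{\alpha+2}w(m)f(m)}{x^{\alpha+1}f(x)} - \frac{M(m)}{x^{\alpha+1}f(x)}.$$

Since (by the representation of $x^{\alpha+1}f(x)$ in Section 2, with $\min(x,m) = m$)

$$x^{\alpha+1}f(x) = m^{\alpha+1}f(m) \tag{3.5}$$

for $x > m$, we get

$$E(\Theta w(\Theta)\mid X = x) = m w(m) - \frac{M(m)}{m^{\alpha+1}f(m)}, \quad \text{for all } x > m. \tag{3.6}$$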

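Because $E(w(\Theta)\mid X = x) = q(x) > 0$, $\bar{w}(x) = w(x)/q(x)$ and $\bar{M}(x) = M(x)/q(x)$, for $0 < x \le m$ we obviously have

$$\varphi_G(x) = x\bar{w}(x) - \frac{\bar{M}(x)}{x^{\alpha+1}f(x)}.$$

For $x > m$ the posterior distribution of $\Theta$ given $X = x$ is proportional to $\alpha\theta^{\alpha}\,dG(\theta)$ on $(0, m]$, which does not depend on $x$; hence $q(x) = q(m)$ and

$$\varphi_G(x) = m\bar{w}(m) - \frac{\bar{M}(m)}{m^{\alpha+1}f(m)}.$$

Thus, $\varphi_G(x) = \varphi_G(m)$ for $x > m$.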

We proved above that $M(x)/(x^{\alpha+1}f(x)) \le x w(x)$ for $0 < x \le m$. Since we imposed condition (A5) and $q(x) > 0$, dividing this inequality by $q(x)$ gives

$$\frac{\bar{M}(x)}{x^{\alpha+1}f(x)} \le x\bar{w}(x). \tag{3.7}$$

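Now we can express the Bayes estimator of $\theta$ as

$$\varphi_G(x) = \begin{cases} x\bar{w}(x) - \dfrac{\bar{M}(x)}{x^{\alpha+1}f(x)}, & 0 < x \le m,\\[2mm] m\bar{w}(m) - \dfrac{\bar{M}(m)}{m^{\alpha+1}f(m)}, & x > m, \end{cases}$$

which is (3.1).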
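Theorem 3.1 can be checked numerically in the setting of Example 2.1 ($w(\theta) = 1/(1+\theta)$, uniform prior on $[0,1]$, $m = 1$), with the illustrative choice $\alpha = 2$. In that setting the marginal density is $f(x) = \alpha/(\alpha+1)$ on $(0,1]$ and $f(x) = \alpha/((\alpha+1)x^{\alpha+1})$ for $x > 1$. The sketch compares the ratio of posterior integrals in (2.1) with formula (3.1); all function names are hypothetical.

```python
ALPHA, M_UPPER = 2.0, 1.0  # illustrative shape parameter; m = 1 as in Example 2.1


def midpoint(func, a, b, n=20000):
    """Composite midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def w(t):
    return 1.0 / (1.0 + t)


def w_prime(t):
    return -1.0 / (1.0 + t) ** 2


def f(x):
    """Marginal density of X under the uniform prior on [0, 1]."""
    c = ALPHA / (ALPHA + 1.0)
    return c if x <= M_UPPER else c / x ** (ALPHA + 1.0)


def q(x):
    """E(w(Theta) | X = x); the posterior is proportional to theta**ALPHA on (0, min(x, m))."""
    a = min(x, M_UPPER)
    return midpoint(lambda t: w(t) * t ** ALPHA, 0.0, a) / midpoint(lambda t: t ** ALPHA, 0.0, a)


def bayes_direct(x):
    """phi_G(x) from (2.1), as a ratio of posterior integrals."""
    a = min(x, M_UPPER)
    num = midpoint(lambda t: t * w(t) * t ** ALPHA, 0.0, a)
    den = midpoint(lambda t: w(t) * t ** ALPHA, 0.0, a)
    return num / den


def big_m(x):
    """M(x) = integral over (0, x) of theta^(alpha+1) (w + theta w') f(theta) dtheta."""
    return midpoint(lambda t: t ** (ALPHA + 1.0) * (w(t) + t * w_prime(t)) * f(t), 0.0, x)


def bayes_formula(x):
    """phi_G(x) from (3.1), with w_bar = w / q and M_bar = M / q."""
    z = min(x, M_UPPER)  # for x > m the estimator is evaluated at m
    qz = q(z)
    return z * w(z) / qz - big_m(z) / (qz * z ** (ALPHA + 1.0) * f(z))


for x in (0.1, 0.5, 0.9, 1.0, 3.0):
    print(f"x = {x:3.1f}   (2.1): {bayes_direct(x):.6f}   (3.1): {bayes_formula(x):.6f}")
```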


Let $(b_n)_{n\ge1}$ be a sequence of strictly positive real numbers such that $b_n \to 0$ and $nb_n \to \infty$ as $n \to \infty$. We define

$$f_n(x) = \frac{F_n(x+b_n) - F_n(x)}{b_n},$$

where $F_n$ is the empirical distribution function based on $X_1, X_2, \dots, X_n$. We note that $f_n(x)$ can be expressed as

$$f_n(x) = \frac{1}{nb_n}\sum_{j=1}^{n} I_{(x,\,x+b_n]}(X_j). \tag{3.8}$$

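Moreover, $E(f_n(x)) = \dfrac{F(x+b_n) - F(x)}{b_n} \xrightarrow[n\to\infty]{} f(x)$, so $f_n(x)$ is asymptotically unbiased; since also $\operatorname{Var}(f_n(x)) \le \dfrac{F(x+b_n)-F(x)}{nb_n^2} \to 0$ because $nb_n \to \infty$, $f_n(x)$ is a consistent estimator of $f(x)$ (Iosifescu, Mihoc and Theodorescu [3]).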
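Estimator (3.8) is a forward-difference (histogram-type) density estimate. The minimal sketch below, with hypothetical function names, computes it both from the empirical distribution function and as the indicator sum in (3.8). The bandwidth $b_n = n^{-1/3}$ in the usage lines is only an illustrative choice satisfying $b_n \to 0$ and $nb_n \to \infty$, and the uniform test data are likewise an assumption of the example.

```python
import random
from bisect import bisect_right


def make_density_estimator(sample, bandwidth):
    """Return x -> f_n(x) = (F_n(x + b_n) - F_n(x)) / b_n."""
    data = sorted(sample)
    n = len(data)

    def ecdf(t):
        # F_n(t): fraction of observations <= t.
        return bisect_right(data, t) / n

    def f_n(x):
        return (ecdf(x + bandwidth) - ecdf(x)) / bandwidth

    return f_n


def f_n_by_counting(sample, bandwidth, x):
    """The same estimator written as the indicator sum (3.8)."""
    hits = sum(1 for xj in sample if x < xj <= x + bandwidth)
    return hits / (len(sample) * bandwidth)


# Usage: Uniform(0, 1) data, whose true density equals 1 on (0, 1).
random.seed(0)
sample = [random.random() for _ in range(10000)]
f_n = make_density_estimator(sample, len(sample) ** (-1 / 3))
print(f_n(0.5))
```

Sorting once and using binary search makes each evaluation $O(\log n)$, whereas the indicator form costs $O(n)$ per point; both return the same count of observations in $(x, x+b_n]$.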

Next, define

$$M_n(x) = \frac{1}{n}\sum_{j=1}^{n} X_j^{\alpha+1}\bigl(w(X_j) + X_j w'(X_j)\bigr) I_{(0,x)}(X_j). \tag{3.9}$$

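We can easily see that $E(M_n(x)) = M(x)$, since $X_1, X_2, \dots, X_n$ are independent and identically distributed random variables with distribution function $F$:

$$E(M_n(x)) = \frac{1}{n}\,n\,E\Bigl[X_1^{\alpha+1}\bigl(w(X_1) + X_1 w'(X_1)\bigr) I_{(0,x)}(X_1)\Bigr] = \int_0^x \theta^{\alpha+1}\bigl(w(\theta) + \theta w'(\theta)\bigr)\,dF(\theta) = M(x).$$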

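Since, moreover, the summands in (3.9) lie in $[0, x^{\alpha+1}c_2]$ by (A4), the law of large numbers shows that $M_n(x)$ is a consistent estimator of $M(x)$ (Iosifescu, Mihoc and Theodorescu [3]).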
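The unbiasedness of $M_n(x)$ can be illustrated by simulation. The sketch below assumes the setting of Example 2.1 ($w(\theta) = 1/(1+\theta)$, uniform prior on $[0,1]$, $m = 1$) with the illustrative shape $\alpha = 2$. It draws $X$ by first drawing $\Theta$ and then inverting the Pareto distribution function $1 - (\theta/x)^{\alpha}$, and compares $M_n(x)$ with $M(x)$ computed by a midpoint rule, using the marginal density $f(\theta) = \alpha/(\alpha+1)$ on $(0,1]$ valid in this setting. Sample size, seed and function names are arbitrary choices for the example.

```python
import random

ALPHA = 2.0  # illustrative shape parameter


def w(t):
    return 1.0 / (1.0 + t)


def w_prime(t):
    return -1.0 / (1.0 + t) ** 2


def draw_x(rng):
    """One draw of X: Theta ~ Uniform(0, 1), then X | Theta ~ Pareto(ALPHA, Theta)."""
    theta = rng.random()
    u = 1.0 - rng.random()              # u in (0, 1], avoids division by zero
    return theta * u ** (-1.0 / ALPHA)  # inverts F(x | theta) = 1 - (theta / x)**ALPHA


def m_n(sample, x):
    """M_n(x) from (3.9)."""
    total = sum(
        xj ** (ALPHA + 1.0) * (w(xj) + xj * w_prime(xj))
        for xj in sample
        if 0.0 < xj < x
    )
    return total / len(sample)


def m_exact(x, steps=20000):
    """M(x) for 0 < x <= 1, using f(theta) = ALPHA / (ALPHA + 1) on (0, 1]."""
    h = x / steps
    c = ALPHA / (ALPHA + 1.0)
    return h * sum(
        t ** (ALPHA + 1.0) * (w(t) + t * w_prime(t)) * c
        for t in ((i + 0.5) * h for i in range(steps))
    )


rng = random.Random(12345)
sample = [draw_x(rng) for _ in range(200000)]
for x in (0.4, 0.8, 1.0):
    print(f"x = {x}:  M_n(x) = {m_n(sample, x):.5f}   M(x) = {m_exact(x):.5f}")
```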

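The empirical Bayes estimator of the scale parameter $\theta$ that we propose is given by

$$\varphi_n(X) = \Bigl[\Bigl(X\bar{w}(X) - \frac{\bar{M}_n(X)}{X^{\alpha+1}f_n(X)}\Bigr) \vee 0\Bigr] I_{(0,m]}(X) + \Bigl[\Bigl(m\bar{w}(m) - \frac{\bar{M}_n(m)}{m^{\alpha+1}f_n(m)}\Bigr) \vee 0\Bigr] I_{(m,\infty)}(X), \tag{3.10}$$

where $\bar{M}_n = M_n/q$ and $a \vee b = \max(a,b)$.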
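A sketch of (3.10) in the setting of Example 2.1 follows; it is an illustration, not the authors' implementation. Formula (3.10) involves $q(x) = E(w(\Theta)\mid X = x)$ through $\bar{w}$ and $\bar{M}_n$, so the sketch evaluates $q$ by numerical integration under the known uniform prior; that, the shape $\alpha = 2$, the bandwidth $b_n = n^{-1/3}$ and the convention of returning $0$ when $f_n$ vanishes (where (3.10) is undefined) are assumptions made only so that the example runs. The Bayes estimator is computed directly from (2.1) for comparison. All names are hypothetical.

```python
import random
from bisect import bisect_right

ALPHA, M_UPPER = 2.0, 1.0  # illustrative; uniform prior on [0, 1] as in Example 2.1


def w(t):
    return 1.0 / (1.0 + t)


def w_prime(t):
    return -1.0 / (1.0 + t) ** 2


def midpoint(func, a, b, n=20000):
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def q(x):
    """E(w(Theta) | X = x); posterior proportional to theta**ALPHA on (0, min(x, m))."""
    a = min(x, M_UPPER)
    return midpoint(lambda t: w(t) * t ** ALPHA, 0, a) / midpoint(lambda t: t ** ALPHA, 0, a)


def bayes(x):
    """phi_G(x) = E(Theta w(Theta) | X = x) / E(w(Theta) | X = x), from (2.1)."""
    a = min(x, M_UPPER)
    num = midpoint(lambda t: t * w(t) * t ** ALPHA, 0, a)
    return num / midpoint(lambda t: w(t) * t ** ALPHA, 0, a)


def phi_n(data, x, bandwidth):
    """Sketch of the empirical Bayes estimator (3.10); `data` must be sorted."""
    z = min(x, M_UPPER)  # for x > m, (3.10) evaluates everything at m
    n = len(data)
    f_n = (bisect_right(data, z + bandwidth) - bisect_right(data, z)) / (n * bandwidth)
    if f_n == 0.0:
        return 0.0  # assumption: (3.10) is undefined here; return the truncation value
    m_n = sum(t ** (ALPHA + 1) * (w(t) + t * w_prime(t)) for t in data if 0 < t < z) / n
    qz = q(z)
    return max(z * w(z) / qz - (m_n / qz) / (z ** (ALPHA + 1) * f_n), 0.0)


rng = random.Random(2024)
for n in (1000, 100000):
    # Theta ~ Uniform(0, 1), then X | Theta ~ Pareto(ALPHA, Theta) by inversion.
    data = sorted(rng.random() * (1.0 - rng.random()) ** (-1.0 / ALPHA) for _ in range(n))
    b_n = n ** (-1.0 / 3.0)
    print(n, round(phi_n(data, 0.6, b_n), 4), round(bayes(0.6), 4))
```

With the larger sample the empirical Bayes value at $x = 0.6$ should be close to the Bayes value, in line with the consistency of $f_n$ and $M_n$.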


4. ASYMPTOTIC OPTIMALITY OF THE PROPOSED EMPIRICAL BAYES ESTIMATOR

In this section we study the asymptotic optimality of the empirical Bayes estimator (3.10). Our analysis is based on conditions (A1)–(A5). The main result is as follows.

Theorem 4.1. If $(b_n)_{n\ge1}$ is a sequence of strictly positive real numbers such that $b_n \to 0$ and $nb_n \to \infty$ as $n\to\infty$, $(\varphi_n)_{n\ge1}$ is the sequence of empirical Bayes estimators (3.10) and $\varphi_G$ is the Bayes estimator (3.1), then

$$R(G,\varphi_n) - R(G,\varphi_G) = O\Bigl(\frac{1}{n}\Bigr) + O\Bigl(\frac{1}{nb_n}\Bigr) + O\bigl(b_n^2\bigr).$$

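Proof. Since $R(G,\varphi_G) \le R(G,\varphi_n)$ and condition (A3) holds, we have

$$0 \le R(G,\varphi_n) - R(G,\varphi_G) = E\bigl(R(G,\varphi_n\mid\mathbf{X}_n)\bigr) - E\bigl[w(\Theta)(\varphi_G(X)-\Theta)^2\bigr] \le c_1 E\bigl[(\varphi_n(X)-\varphi_G(X))^2\bigr].$$

Moreover,

$$c_1 E\bigl[(\varphi_n(X)-\varphi_G(X))^2\bigr] = c_1\Bigl(\int_0^m E\bigl[(\varphi_n(x)-\varphi_G(x))^2\bigr] f(x)\,dx + \int_m^\infty E\bigl[(\varphi_n(x)-\varphi_G(x))^2\bigr] f(x)\,dx\Bigr).$$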
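The first inequality of the proof can be made explicit. Assuming, as is standard in this setting, that the past data $\mathbf{X}_n$ are independent of $(X,\Theta)$, expanding the square and conditioning on $(X,\mathbf{X}_n)$ makes the cross term vanish, because by (2.1) $E(w(\Theta)(\varphi_G(X)-\Theta)\mid X) = q(X)\varphi_G(X) - E(\Theta w(\Theta)\mid X) = 0$; condition (A3) then gives $q(X) \le c_1$. A sketch of the computation:

```latex
\begin{aligned}
R(G,\varphi_n) - R(G,\varphi_G)
 &= E\Bigl[w(\Theta)\bigl\{(\varphi_n(X)-\Theta)^2 - (\varphi_G(X)-\Theta)^2\bigr\}\Bigr] \\
 &= E\bigl[w(\Theta)(\varphi_n(X)-\varphi_G(X))^2\bigr]
    + 2\,E\Bigl[(\varphi_n(X)-\varphi_G(X))\,
      E\bigl(w(\Theta)(\varphi_G(X)-\Theta)\mid X,\mathbf{X}_n\bigr)\Bigr] \\
 &= E\bigl[q(X)\,(\varphi_n(X)-\varphi_G(X))^2\bigr]
    \;\le\; c_1\,E\bigl[(\varphi_n(X)-\varphi_G(X))^2\bigr].
\end{aligned}
```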

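For $x > m$ we have $\varphi_G(x) = \varphi_G(m)$ and $\varphi_n(x) = \varphi_n(m)$. Therefore, $\varphi_n(x) - \varphi_G(x) = \varphi_n(m) - \varphi_G(m)$ and, consequently,

$$\int_m^\infty E\bigl[(\varphi_n(x)-\varphi_G(x))^2\bigr] f(x)\,dx = E\bigl[(\varphi_n(m)-\varphi_G(m))^2\bigr]\,\bigl(1-F(m)\bigr).$$

Assume now that $0 < x \le m$. In this case, on account of condition (A4) and inequality (3.7), we have $0 \le \varphi_G(x) \le x\bar{w}(x) = xw(x)/q(x)$, and $0 \le \varphi_n(x) \le x\bar{w}(x)$ because $M_n(x) \ge 0$. We thus obtain

$$|\varphi_n(x) - \varphi_G(x)| \le x\bar{w}(x) = \frac{xw(x)}{q(x)}.$$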

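So, considering the expressions of $\varphi_n(x)$ and $\varphi_G(x)$ (the truncation at $0$ in (3.10) can only decrease the distance to $\varphi_G(x) \ge 0$) and following the same reasoning as in Singh [14], we get

$$\begin{aligned} E\bigl[(\varphi_n(x)-\varphi_G(x))^2\bigr] &\le E\Bigl[\Bigl(\frac{M_n(x)}{q(x)\,x^{\alpha+1}f_n(x)} - \frac{M(x)}{q(x)\,x^{\alpha+1}f(x)}\Bigr)^2\Bigr] \\ &\le \frac{8}{f^2(x)q^2(x)}\,E\Bigl[\Bigl(\frac{M_n(x)-M(x)}{x^{\alpha+1}}\Bigr)^2\Bigr] + \frac{8}{f^2(x)q^2(x)}\Bigl[\Bigl(\frac{M(x)}{x^{\alpha+1}f(x)}\Bigr)^2 + \frac{x^2w^2(x)}{2}\Bigr]\,E\bigl[(f_n(x)-f(x))^2\bigr]. \end{aligned}$$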

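Since $E(M_n(x)) = M(x)$, we have

$$E\Bigl[\Bigl(\frac{M_n(x)-M(x)}{x^{\alpha+1}}\Bigr)^2\Bigr] = \operatorname{Var}\Bigl(\frac{M_n(x)}{x^{\alpha+1}}\Bigr) = \operatorname{Var}\Bigl(\frac{1}{n}\sum_{j=1}^{n}\frac{X_j^{\alpha+1}}{x^{\alpha+1}}\bigl(w(X_j) + X_j w'(X_j)\bigr) I_{(0,x)}(X_j)\Bigr) \le \frac{c_2^2}{n}$$

from condition (A4), since each summand lies in $[0, c_2]$. Moreover,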

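$$E\bigl[(f_n(x)-f(x))^2\bigr] = \operatorname{Var}(f_n(x)) + \bigl(E(f_n(x))-f(x)\bigr)^2,$$

with

$$\operatorname{Var}(f_n(x)) = \operatorname{Var}\Bigl(\frac{1}{nb_n}\sum_{j=1}^{n} I_{(x,\,x+b_n]}(X_j)\Bigr) \le \frac{1}{nb_n^2}\int_x^{x+b_n} f(y)\,dy \le \frac{f(x)}{nb_n},$$

where the last inequality holds because of condition (A2), while, from Liang [5],

$$\bigl(E(f_n(x))-f(x)\bigr)^2 \le \frac{f^2(x)(\alpha+1)^2 b_n^2}{4x^2}.$$

Finally, by (3.3),

$$\Bigl(\frac{M(x)}{x^{\alpha+1}f(x)}\Bigr)^2 + \frac{x^2w^2(x)}{2} \le x^2w^2(x) + \frac{x^2w^2(x)}{2} \le 2x^2w^2(x).$$

Now, on account of the above expressions and (A3), for $0 < x \le m$ we have

$$E\bigl[(\varphi_n(x)-\varphi_G(x))^2\bigr] \le \frac{8c_2^2}{nf^2(x)q^2(x)} + \frac{16x^2c_1^2}{nb_nf(x)q^2(x)} + \frac{4c_1^2(\alpha+1)^2b_n^2}{q^2(x)}$$

and

$$E\bigl[(\varphi_n(m)-\varphi_G(m))^2\bigr] \le \frac{8c_2^2}{nf^2(m)q^2(m)} + \frac{16m^2c_1^2}{nb_nf(m)q^2(m)} + \frac{4c_1^2(\alpha+1)^2b_n^2}{q^2(m)} = O\Bigl(\frac{1}{n}\Bigr) + O\Bigl(\frac{1}{nb_n}\Bigr) + O\bigl(b_n^2\bigr).$$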


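Then, since (A2) and (A5) hold, we get

$$\int_0^m E\bigl[(\varphi_n(x)-\varphi_G(x))^2\bigr] f(x)\,dx \le \frac{1}{n}\cdot\frac{8mc_2^2}{\varepsilon_0^2 f(m)} + \frac{1}{nb_n}\cdot\frac{16m^3c_1^2}{3\varepsilon_0^2} + b_n^2\,\frac{4c_1^2(\alpha+1)^2}{\varepsilon_0^2} = O\Bigl(\frac{1}{n}\Bigr) + O\Bigl(\frac{1}{nb_n}\Bigr) + O\bigl(b_n^2\bigr).$$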

Summarizing the results obtained so far, we have

$$0 \le R(G,\varphi_n) - R(G,\varphi_G) = O\Bigl(\frac{1}{n}\Bigr) + O\Bigl(\frac{1}{nb_n}\Bigr) + O\bigl(b_n^2\bigr).$$
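Theorem 4.1 leaves the bandwidth $b_n$ free. One admissible choice, used here only as an illustration, is $b_n = n^{-1/3}$: it satisfies $b_n \to 0$ and $nb_n = n^{2/3} \to \infty$, it is of the order that minimizes $1/(nb_n) + b_n^2$, and it yields the rate $n^{-2/3}$ stated in the abstract:

```latex
\begin{aligned}
&\frac{d}{db}\Bigl(\frac{1}{nb} + b^2\Bigr) = -\frac{1}{nb^2} + 2b = 0
  \iff b = (2n)^{-1/3}, \\
&b_n = n^{-1/3}: \qquad \frac{1}{nb_n} = n^{-2/3}, \qquad b_n^2 = n^{-2/3},
  \qquad \frac{1}{n} = o\bigl(n^{-2/3}\bigr), \\
&\text{hence} \quad R(G,\varphi_n) - R(G,\varphi_G) = O\bigl(n^{-2/3}\bigr).
\end{aligned}
```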

Remark 4.1. Under the conditions of Theorem 3.1, if $w(\theta) = 1$, then we recover the Bayes estimator and the empirical Bayes estimator of Liang [5], respectively.

REFERENCES

[1] B.C. Arnold, Pareto Distributions. International Co-operative Publishing House, Fairland, MD, 1983.

[2] J. Berger and L.M. Berliner, Robust Bayes and empirical Bayes analysis with ε-contaminated priors. Ann. Statist. 14 (1986), 2, 461–486.

[3] M. Iosifescu, G. Mihoc and R. Theodorescu, Teoria probabilităţilor şi statistica matematică [Probability theory and mathematical statistics]. Ed. Tehnică, Bucureşti, 1966.

[4] M.V. Johns, Jr., Nonparametric empirical Bayes procedures. Ann. Math. Statist. 28 (1957), 649–669.

[5] T.C. Liang, Convergence rates for empirical Bayes estimation of the scale parameter in a Pareto distribution. Comput. Statist. Data Anal. 16 (1993), 35–45.

[6] P.E. Lin, Rates of convergence in empirical Bayes estimation problems: Continuous case. Ann. Statist. 3 (1975), 155–164.

[7] U. Makov, Loss robustness via Fisher-weighted squared-error loss function. Insurance Math. Econom. 16 (1995), 1–6.

[8] V. Preda, Entropia ponderată şi problema de selecţie neparametrică [Weighted entropy and the nonparametric selection problem]. Stud. Cerc. Mat. 34 (1982), 169–181.

[9] V. Preda and V. Craiu, Probleme de decizie multiplă [Multiple decision problems]. Tipografia Univ. Bucureşti, 1980.

[10] V. Preda, Informational characterizing of Pareto and power distributions. Bull. Math. Soc. Sci. Math. R.S. Roumanie (N.S.) 28 (76) (1984), 77–79.

[11] V. Preda, Teoria deciziilor statistice [Statistical decision theory]. Ed. Academiei Române, 1992.

[12] H. Robbins, An empirical Bayes approach to statistics. In: Proc. Third Berkeley Sympos. Math. Statist. Probab. 1 (1956), 157–163.

[13] E. Samuel, An empirical Bayes approach to the testing of certain parametric hypotheses. Ann. Math. Statist. 34 (1963), 1370–1385.

[14] R.S. Singh, Applications of estimators of a density and its derivatives to certain statistical problems. J. Roy. Statist. Soc. Ser. B 39 (1977), 357–363.


[15] R.S. Singh, Empirical Bayes estimation in Lebesgue-exponential families with rates near the best possible rate. Ann. Statist. 7 (1979), 890–902.

[16] R.C. Tiwari and J.N. Zalkikar, Empirical Bayes estimation of the scale parameter in a Pareto distribution. Comput. Statist. Data Anal. 10 (1990), 261–270.

Received 11 December 2006

University of Bucharest
Faculty of Mathematics and Computer Science
Str. Academiei 14
010014 Bucharest, Romania

and

Academy of Economic Studies
Department of Mathematics
Calea Dorobantilor 15-17
010552 Bucharest, Romania