CONVERGENCE RATES IN EMPIRICAL BAYES PROBLEMS WITH A WEIGHTED SQUARED-ERROR
LOSS. THE PARETO DISTRIBUTION CASE
VASILE PREDA and ROXANA CIUMARA
We study the problem of estimating the scale parameter θ of a Pareto distribution under a weighted squared-error loss through the empirical Bayes approach. An empirical Bayes estimator is proposed and some asymptotic optimality properties are established. In particular, under certain conditions, the proposed empirical Bayes estimator is asymptotically optimal with rate of convergence of order $n^{-2/3}$.
AMS 2000 Subject Classification: 62P05.
Key words: empirical Bayes, weighted squared-error loss, asymptotic optimality, rate of convergence.
1. INTRODUCTION
Robbins [12] argued that, for some estimation problems, the information obtained at each step can be used to improve the decision at the next step. These procedures, known as empirical Bayes or adaptive methods, have been studied by many authors, among them Johns [4], Samuel [13], Berger and Berliner [2], Preda [9, 11], Tiwari and Zalkikar [16], Liang [5] and Singh [15].
The usefulness of empirical Bayes estimation in practical statistical applications depends on the rate at which the overall risk converges to the optimal (Bayes) risk.
Convergence rates for empirical Bayes estimators under a squared-error loss were studied by Lin [6], Preda [8], Tiwari and Zalkikar [16] and Liang [5].
Tiwari and Zalkikar [16] showed that, under certain conditions, the empirical Bayes estimator of the scale parameter of the Pareto distribution is asymptotically optimal with rate of convergence of order $n^{-1/2}$. Liang [5], using the same squared-error loss, relaxed the conditions of Tiwari and Zalkikar [16] and proved that the empirical Bayes estimator he proposed for the same parameter of the Pareto distribution is asymptotically optimal with rate of convergence of order $n^{-2/3}$.
REV. ROUMAINE MATH. PURES APPL.,52(2007),6, 673–682
In this paper, we consider a weighted squared-error loss and propose an empirical Bayes estimator for the scale parameter of the Pareto distribution.
We assume that the weights are given by a function which satisfies certain properties.
In Section 2, we describe the Pareto distribution with a known shape parameter α and unknown scale parameter θ. Furthermore, the conditions that have to be satisfied in order to obtain the results from Sections 3 and 4 are stated. We define the Bayes risk for a weighted squared-error loss and the overall Bayes risk for a sequence of empirical Bayes estimators. Next, asymptotic optimality and rate of convergence for a sequence of empirical Bayes estimators are defined.
In Section 3, under the conditions imposed in Section 2, we propose an empirical Bayes estimator of the unknown scale parameter of the Pareto distribution for a class of prior distributions. A useful study of the Pareto distribution can be found in Arnold [1] and Preda [10].
In Section 4, we study asymptotic optimality and prove that, under the assumed conditions, the rate of convergence is of order $n^{-2/3}$.
2. SOME PRELIMINARIES
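Let $X$ be a random variable having a Pareto distribution with probability density function
$$f(x\mid\theta) = \frac{\alpha\theta^{\alpha}}{x^{\alpha+1}},$$
where $x > \theta$, $\alpha > 0$ and $\theta > 0$. The shape parameter $\alpha$ is known and the scale parameter $\theta$ is unknown. We suppose that $\theta$ is a value of a random variable $\Theta$ with prior distribution function $G\colon (0,\infty) \to [0,1]$. In this case, the marginal density of $X$ is given by
$$f(x) = \int_0^{\min(x,m)} f(x\mid\theta)\,dG(\theta) = \int_0^{\min(x,m)} f(x\mid\theta)\,g(\theta)\,d\theta, \quad \text{where } dG(\theta) = g(\theta)\,d\theta,$$
and $m$ is the upper bound of the support of $G$ given in condition (A1) below.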
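Writing $f(x\mid\theta) = \varphi(\theta)u(x)$, where $\varphi(\theta) = \alpha\theta^{\alpha}$ and $u(x) = 1/x^{\alpha+1}$, we get
$$f(x) = u(x)\int_0^{\min(x,m)} \alpha\theta^{\alpha}\,dG(\theta), \quad \text{or} \quad x^{\alpha+1}f(x) = \int_0^{\min(x,m)} \alpha\theta^{\alpha}\,dG(\theta).$$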
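As a quick numerical illustration of this mixture representation, the sketch below assumes a prior $G$ uniform on $(0, m]$ with $m = 1$ and $\alpha = 2$. These choices and the helper names are illustrative only and do not come from the paper. The sketch evaluates $f(x)$ by Simpson quadrature of the mixture integral and compares it with the closed form $\alpha\min(x,m)^{\alpha+1}/((\alpha+1)m\,x^{\alpha+1})$, which holds for this particular prior. It also shows that $x^{\alpha+1}f(x)$ stops depending on $x$ once $x \ge m$.

```python
ALPHA = 2.0   # known shape parameter (illustrative value, not from the paper)
M_SUP = 1.0   # the constant m of (A1); prior G assumed uniform on (0, M_SUP]


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(a + k * h)
    return total * h / 3.0


def pareto_pdf(x, theta):
    """f(x | theta) = alpha * theta^alpha / x^(alpha + 1) for x >= theta (x = theta is a null set)."""
    return ALPHA * theta ** ALPHA / x ** (ALPHA + 1) if x >= theta else 0.0


def marginal_pdf(x):
    """f(x): integral of f(x | theta) g(theta) over (0, min(x, m)), with g = 1/m."""
    return simpson(lambda th: pareto_pdf(x, th) / M_SUP, 0.0, min(x, M_SUP))


def closed_form_pdf(x):
    """Closed form of f(x) for the uniform prior on (0, m]."""
    t = min(x, M_SUP)
    return ALPHA * t ** (ALPHA + 1) / ((ALPHA + 1) * M_SUP * x ** (ALPHA + 1))


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 5.0):
        # for x >= m the last column equals the constant m^(alpha+1) f(m)
        print(x, marginal_pdf(x), x ** (ALPHA + 1) * marginal_pdf(x))
```

The constancy of $x^{\alpha+1}f(x)$ beyond $m$ is the identity that reappears as (3.5) in Section 3.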
As for the prior distribution $G$, we impose the conditions below.
Conditions on $G$ (Liang [5]):
(A1) $G(m) = 1$ for some known positive real number $m$.
(A2) If $a^{*} = \sup\{\theta \mid G(\theta) = 0\}$, then $f$ is a decreasing function of $x$ on $(a^{*}, m]$.
We consider the problem of estimating the parameter $\theta$ under the weighted squared-error loss $L\colon \mathbb{R}_+^2 \to \mathbb{R}_+$ defined by
$$L(x,\theta) = w(\theta)(x-\theta)^2,$$
where the weight function $w\colon \mathbb{R}_+ \to \mathbb{R}_+^{*}$ is continuous and differentiable. The robustness of loss functions of this type was studied by Makov [7].
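Next, we suppose that $w$ satisfies the conditions below.
Conditions on $w$:
(A3) There exists $c_1 \in \mathbb{R}_+^{*}$ such that $w(\theta) \le c_1$ for all $\theta \in \mathbb{R}_+$.
(A4) There exists $c_2 \in \mathbb{R}_+^{*}$ such that $0 \le w(\theta) + \theta w'(\theta) \le c_2$ for all $\theta \in \mathbb{R}_+$, and there exists $\varepsilon > 0$ such that $w(\theta) + \theta w'(\theta) > \varepsilon$ on $(0, m]$.
(A5) There exists $\varepsilon_0 > 0$ such that $\varepsilon_0 < q(x) = E(w(\Theta)\mid X = x)$ for all $x \in (0, m]$.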
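Example 2.1. If $w\colon \mathbb{R}_+ \to \mathbb{R}_+$, $w(\theta) = \frac{1}{1+\theta}$, then $w(\theta) + \theta w'(\theta) = \frac{1}{(1+\theta)^2}$. Hence $0 \le w(\theta) \le 1$ and $0 \le w(\theta) + \theta w'(\theta) \le 1$, with $w(\theta) + \theta w'(\theta) \ge \frac14$ on $(0,1]$, so conditions (A3) and (A4) are satisfied. If we consider the uniform prior distribution on $[0,1]$ and $m = 1$, then condition (A5) also holds, since $\frac12 < q(x) \le 1$.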
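The claims of Example 2.1 can be checked numerically on a grid. In the sketch below, $\alpha = 2$ is an illustrative assumption because the example does not fix $\alpha$. The grid and the function names are also hypothetical. The sketch computes $q(x)$ from the fact that, under the uniform prior on $(0,1]$, the posterior density of $\Theta$ given $X = x$ is proportional to $\theta^{\alpha}$ on $(0, \min(x,1))$.

```python
ALPHA = 2.0  # illustrative shape parameter; Example 2.1 does not fix it


def w(theta):
    """Weight function of Example 2.1."""
    return 1.0 / (1.0 + theta)


def w_prime(theta):
    """Derivative of w."""
    return -1.0 / (1.0 + theta) ** 2


def simpson(g, a, b, n=400):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(a + k * h)
    return total * h / 3.0


def q(x):
    """q(x) = E(w(Theta) | X = x) under the uniform prior on (0, 1] (here m = 1)."""
    t = min(x, 1.0)
    num = simpson(lambda th: w(th) * th ** ALPHA, 0.0, t)
    den = simpson(lambda th: th ** ALPHA, 0.0, t)
    return num / den


GRID = [k / 200 for k in range(1, 201)]  # 200 points of (0, 1]

if __name__ == "__main__":
    print("max w:", max(w(t) for t in GRID))                       # (A3) with c1 = 1
    print("range of w + t w':",
          min(w(t) + t * w_prime(t) for t in GRID),
          max(w(t) + t * w_prime(t) for t in GRID))                # (A4) on (0, 1]
    print("range of q:", min(q(x) for x in GRID), max(q(x) for x in GRID))  # (A5)
```

At $x = 1$ the posterior mean has the closed form $q(1) = 3(\ln 2 - \tfrac12) \approx 0.579$, which lies inside $(\tfrac12, 1]$ as claimed.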
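The Bayes estimator of $\theta$ given $X = x$ is
$$\varphi_G(x) = \arg\min_{d} E\big(L(d,\Theta)\mid X = x\big) = \frac{E(\Theta w(\Theta)\mid X = x)}{E(w(\Theta)\mid X = x)}, \tag{2.1}$$
assuming that all posterior expectations involved in this expression exist and that $E(w(\Theta)\mid X = x) \neq 0$.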
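For completeness, the elementary argument behind (2.1) is written out below. It assumes that $q(x) = E(w(\Theta)\mid X = x) > 0$ and that $E(\Theta^2 w(\Theta)\mid X = x)$ is finite. Under these assumptions the posterior expected loss is a convex quadratic in the estimate $d$, so completing the square identifies its minimizer:

```latex
E\big(w(\Theta)(d-\Theta)^2 \mid X=x\big)
  = q(x)\,d^2 - 2d\,E\big(\Theta w(\Theta)\mid X=x\big) + E\big(\Theta^2 w(\Theta)\mid X=x\big)
  = q(x)\Big(d - \frac{E(\Theta w(\Theta)\mid X=x)}{q(x)}\Big)^2
    + E\big(\Theta^2 w(\Theta)\mid X=x\big) - \frac{E(\Theta w(\Theta)\mid X=x)^2}{q(x)}.
```

Since $q(x) > 0$, the minimum over $d$ is attained exactly at $d = E(\Theta w(\Theta)\mid X = x)/q(x)$. This is the ratio in (2.1).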
The Bayes risk of $\varphi_G$ is
$$R(G,\varphi_G) = E\big(L(\varphi_G(X),\Theta)\big) = E\big(w(\Theta)(\varphi_G(X)-\Theta)^2\big),$$
where the expectation is taken with respect to $(X,\Theta)$.
Let $X_1, X_2, \ldots, X_n$ be the past data, independent and identically distributed random variables with probability density function $f(x)$. Denote $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$ and let $\varphi_n(X) = \varphi_n(X, \mathbf{X}_n)$ be the empirical Bayes estimator of the parameter $\theta$ based on the past data $\mathbf{X}_n$ and the present observation $X$.
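The conditional Bayes risk of $\varphi_n$ given $\mathbf{X}_n$ is
$$R(G,\varphi_n\mid\mathbf{X}_n) = E\big(w(\Theta)(\varphi_n(X)-\Theta)^2 \mid \mathbf{X}_n\big),$$
and
$$R(G,\varphi_n) = E\big(R(G,\varphi_n\mid\mathbf{X}_n)\big)$$
is the overall Bayes risk of $\varphi_n$. Here the expectation is taken with respect to $\mathbf{X}_n$.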
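Since $\varphi_G$ is the Bayes estimator, that is, $\varphi_G(x)$ minimizes $E(L(d,\Theta)\mid X = x)$ over $d$, we have
$$R(G,\varphi_G) \le R(G,\varphi_n\mid\mathbf{X}_n)$$
for every vector $\mathbf{X}_n$ of past data and every $n \in \mathbb{N}^{*}$. Moreover,
$$R(G,\varphi_G) \le R(G,\varphi_n) \quad \forall n \in \mathbb{N}^{*}. \tag{2.2}$$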
Thus, $R(G,\varphi_n) - R(G,\varphi_G)$ is nonnegative and can be used as a measure of the performance of the empirical Bayes estimator $\varphi_n$.
Definition 2.1 (Robbins [12], Preda [11], Liang [5]). A sequence $(\varphi_n)_{n\ge1}$ of empirical Bayes estimators is said to be asymptotically optimal if
$$R(G,\varphi_n) - R(G,\varphi_G) \xrightarrow[n\to\infty]{} 0.$$
Moreover, if $R(G,\varphi_n) - R(G,\varphi_G) = O(\alpha_n)$, where $(\alpha_n)_{n\ge1}$ is a sequence of real numbers with $\alpha_n > 0$ and $\alpha_n \xrightarrow[n\to\infty]{} 0$, then $(\varphi_n)_{n\ge1}$ is said to be asymptotically optimal with convergence rate of order $\alpha_n$.
3. THE EMPIRICAL BAYES ESTIMATOR
In order to propose an empirical Bayes estimator, we first have to derive the Bayes estimator.
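Theorem 3.1. Under conditions (A1)–(A5), the Bayes estimator of the scale parameter of the Pareto distribution is given by
$$\varphi_G(x) = \begin{cases} x\bar w(x) - \dfrac{\bar M(x)}{x^{\alpha+1}f(x)}, & 0 < x \le m,\\[2mm] m\bar w(m) - \dfrac{\bar M(m)}{m^{\alpha+1}f(m)}, & x > m, \end{cases} \tag{3.1}$$
where $\bar w(x) = \dfrac{w(x)}{q(x)}$, $\bar M(x) = \dfrac{M(x)}{q(x)}$, $M(x) = \displaystyle\int_0^x \theta^{\alpha+1}\big(w(\theta) + \theta w'(\theta)\big)\,dF(\theta)$, and $F$ is the distribution function of the marginal density $f$.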
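Proof. On account of (2.1), we first evaluate the numerator $E(\Theta w(\Theta)\mid X = x)$ of the Bayes estimator.
For $0 < x \le m$, we integrate by parts with respect to $t^{\alpha+1}f(t) = \int_0^t \alpha\theta^{\alpha}\,dG(\theta)$, which is valid for $0 < t \le m$, and use $dF(\theta) = f(\theta)\,d\theta$. This gives
$$E(\Theta w(\Theta)\mid X = x) = \frac{1}{x^{\alpha+1}f(x)}\int_0^x \theta w(\theta)\,\alpha\theta^{\alpha}\,dG(\theta) = x w(x) - \frac{1}{x^{\alpha+1}f(x)}\int_0^x \theta^{\alpha+1}\big(w(\theta)+\theta w'(\theta)\big)\,dF(\theta),$$
that is,
$$E(\Theta w(\Theta)\mid X = x) = x w(x) - \frac{M(x)}{x^{\alpha+1}f(x)}, \tag{3.2}$$
where $M(x) = \int_0^x \theta^{\alpha+1}\big(w(\theta)+\theta w'(\theta)\big)\,dF(\theta)$. It follows from condition (A4) that $M(x) \ge 0$. Since $E(\Theta w(\Theta)\mid X = x) \ge 0$ (because $\Theta > 0$ and $w > 0$), we get
$$\frac{M(x)}{x^{\alpha+1}f(x)} \le x w(x) \tag{3.3}$$
and
$$E(\Theta w(\Theta)\mid X = x) \le x w(x). \tag{3.4}$$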
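For $x > m$, the same integration by parts, now over $(0, m]$ since $G(m) = 1$, gives
$$E(\Theta w(\Theta)\mid X = x) = \frac{m^{\alpha+2}w(m)f(m)}{x^{\alpha+1}f(x)} - \frac{M(m)}{x^{\alpha+1}f(x)}.$$
Since
$$x^{\alpha+1}f(x) = m^{\alpha+1}f(m) \tag{3.5}$$
for $x > m$, we get
$$E(\Theta w(\Theta)\mid X = x) = m w(m) - \frac{M(m)}{m^{\alpha+1}f(m)} \quad \text{for all } x > m. \tag{3.6}$$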
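Because $E(w(\Theta)\mid X = x) = q(x) > 0$, $\bar w(x) = w(x)/q(x)$ and $\bar M(x) = M(x)/q(x)$, for $0 < x \le m$ we obviously have
$$\varphi_G(x) = x\bar w(x) - \frac{\bar M(x)}{x^{\alpha+1}f(x)}.$$
For $x > m$, the posterior distribution of $\Theta$ given $X = x$ is proportional to $\theta^{\alpha}\,dG(\theta)$ on $(0, m]$ and does not depend on $x$. Hence $q(x) = q(m)$ and
$$\varphi_G(x) = m\bar w(m) - \frac{\bar M(m)}{m^{\alpha+1}f(m)}.$$
Thus, $\varphi_G(x) = \varphi_G(m)$ for $x > m$.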
We proved above that $\dfrac{M(x)}{x^{\alpha+1}f(x)} \le x w(x)$ for $0 < x \le m$. Condition (A5) ensures $q(x) > 0$, so dividing this inequality by $q(x)$ gives
$$\frac{\bar M(x)}{x^{\alpha+1}f(x)} \le x\bar w(x). \tag{3.7}$$
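Now, we can express the Bayes estimator of $\theta$ as
$$\varphi_G(x) = \begin{cases} x\bar w(x) - \dfrac{\bar M(x)}{x^{\alpha+1}f(x)}, & 0 < x \le m,\\[2mm] m\bar w(m) - \dfrac{\bar M(m)}{m^{\alpha+1}f(m)}, & x > m. \end{cases}$$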
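The identity of Theorem 3.1 can be sanity-checked numerically. The sketch below assumes, for illustration only, the setting of Example 2.1 with $\alpha = 2$: weight $w(\theta) = 1/(1+\theta)$, a prior uniform on $(0, 1]$, and $m = 1$. The helper names are hypothetical. It computes $\varphi_G$ in two ways and compares them:

- directly from definition (2.1);
- from formula (3.1), with $M(x)$ integrated against the marginal density $f$.

```python
ALPHA = 2.0  # illustrative shape parameter
M_SUP = 1.0  # m in (A1); prior G assumed uniform on (0, M_SUP]


def w(t):
    """Weight of Example 2.1."""
    return 1.0 / (1.0 + t)


def w_prime(t):
    """Derivative of w."""
    return -1.0 / (1.0 + t) ** 2


def simpson(g, a, b, n=2000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(a + k * h)
    return total * h / 3.0


def marginal_pdf(x):
    """Marginal density f(x) under the uniform prior on (0, m], in closed form."""
    if x <= M_SUP:
        return ALPHA / ((ALPHA + 1) * M_SUP)
    return ALPHA * M_SUP ** ALPHA / ((ALPHA + 1) * x ** (ALPHA + 1))


def posterior_mean(h, x):
    """E(h(Theta) | X = x); the posterior density is proportional to theta^alpha on (0, min(x, m))."""
    t = min(x, M_SUP)
    num = simpson(lambda th: h(th) * th ** ALPHA, 0.0, t)
    den = simpson(lambda th: th ** ALPHA, 0.0, t)
    return num / den


def bayes_direct(x):
    """phi_G(x) computed from definition (2.1)."""
    return posterior_mean(lambda th: th * w(th), x) / posterior_mean(w, x)


def big_m(x):
    """M(x) = int_0^x theta^(alpha+1) (w(theta) + theta w'(theta)) f(theta) d theta."""
    return simpson(lambda th: th ** (ALPHA + 1) * (w(th) + th * w_prime(th)) * marginal_pdf(th), 0.0, x)


def bayes_formula(x):
    """phi_G(x) computed from (3.1); for x > m the formula is evaluated at m."""
    z = min(x, M_SUP)
    qz = posterior_mean(w, z)  # q(z) = E(w(Theta) | X = z)
    return z * w(z) / qz - big_m(z) / (qz * z ** (ALPHA + 1) * marginal_pdf(z))


if __name__ == "__main__":
    for x in (0.2, 0.5, 1.0, 2.5):
        print(x, bayes_direct(x), bayes_formula(x))
```

The two columns should agree up to quadrature error. The sketch also reflects two facts from the proof: $\varphi_G(x) = \varphi_G(m)$ for $x > m$, and $0 \le \varphi_G(x) \le x\bar w(x)$ on $(0, m]$.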
Let $(b_n)_{n\ge1}$ be a sequence of strictly positive real numbers such that $b_n \xrightarrow[n\to\infty]{} 0$ and $nb_n \xrightarrow[n\to\infty]{} \infty$. We define
$$f_n(x) = \frac{F_n(x+b_n) - F_n(x)}{b_n},$$
where $F_n$ is the empirical distribution function based on $X_1, X_2, \ldots, X_n$. We note that $f_n(x)$ can be expressed as
$$f_n(x) = \frac{1}{nb_n}\sum_{j=1}^n I_{(x,x+b_n]}(X_j). \tag{3.8}$$
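Moreover, $E(f_n(x)) = \dfrac{F(x+b_n)-F(x)}{b_n} \xrightarrow[n\to\infty]{} f(x)$ at every point $x$ where $f$ is continuous. In addition, $\operatorname{Var}(f_n(x)) \le \dfrac{F(x+b_n)-F(x)}{nb_n^2} \xrightarrow[n\to\infty]{} 0$ because $nb_n \to \infty$. Thus, $f_n(x)$ is a consistent estimator of $f(x)$ (Iosifescu, Mihoc and Theodorescu [3]).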
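A minimal sketch of the window estimator (3.8), with hypothetical helper names. It counts the past observations in $(x, x+b_n]$. As a check, it agrees with the difference quotient of the empirical distribution function $F_n$, which is built here with the standard-library `bisect` module.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return F_n, where F_n(t) = #{j : X_j <= t} / n."""
    data = sorted(sample)
    n = len(data)
    return lambda t: bisect_right(data, t) / n


def window_density(x, sample, b):
    """f_n(x) = (1/(n b)) #{j : x < X_j <= x + b}, i.e. formula (3.8)."""
    if b <= 0:
        raise ValueError("the bandwidth b must be strictly positive")
    count = sum(1 for xj in sample if x < xj <= x + b)
    return count / (len(sample) * b)


if __name__ == "__main__":
    sample = [0.1, 0.25, 0.3, 0.45, 0.9]
    F_n = empirical_cdf(sample)
    x, b = 0.2, 0.3
    print(window_density(x, sample, b))   # 3 points in (0.2, 0.5], so 3 / (5 * 0.3)
    print((F_n(x + b) - F_n(x)) / b)      # the same value up to rounding, via the ECDF
```

The estimator is a one-sided (forward) window, matching the difference quotient $(F_n(x+b_n) - F_n(x))/b_n$. A symmetric window would be a different estimator from the one analysed here.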
Next, define
$$M_n(x) = \frac1n\sum_{j=1}^n X_j^{\alpha+1}\big(w(X_j) + X_j w'(X_j)\big) I_{(0,x)}(X_j). \tag{3.9}$$
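We can easily see that $E(M_n(x)) = M(x)$, since $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables:
$$E(M_n(x)) = \frac1n\,n\,E\Big(X_1^{\alpha+1}\big(w(X_1) + X_1 w'(X_1)\big) I_{(0,x)}(X_1)\Big) = \int_0^x \theta^{\alpha+1}\big(w(\theta) + \theta w'(\theta)\big)\,dF(\theta) = M(x).$$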
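Thus, $M_n(x)$ is an unbiased estimator of $M(x)$. Under (A4) its summands are bounded by $c_2 x^{\alpha+1}$, so by the law of large numbers it is also consistent (Iosifescu, Mihoc and Theodorescu [3]).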
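Because (3.9) is a plain sample average, it takes one line of code. The sketch below is illustrative, and the following are all assumptions: the weight of Example 2.1, $\alpha = 2$, the uniform prior on $(0,1]$ used to simulate past data, and the helper names. On one simulated sample it compares $M_n(x)$ with $M(x)$, computed by quadrature against the marginal density $f(\theta) = \alpha/(\alpha+1)$ on $(0,1]$.

```python
import random

ALPHA = 2.0  # illustrative shape parameter


def w(t):
    """Weight of Example 2.1."""
    return 1.0 / (1.0 + t)


def w_prime(t):
    """Derivative of w."""
    return -1.0 / (1.0 + t) ** 2


def m_n(x, sample, alpha=ALPHA):
    """M_n(x) = (1/n) sum_j X_j^(alpha+1) (w(X_j) + X_j w'(X_j)) 1{0 < X_j < x}, formula (3.9)."""
    total = sum(xj ** (alpha + 1) * (w(xj) + xj * w_prime(xj)) for xj in sample if 0.0 < xj < x)
    return total / len(sample)


def exact_m(x, alpha=ALPHA, panels=2000):
    """M(x) for the uniform prior on (0, 1] and x <= 1, where f(theta) = alpha/(alpha+1); Simpson rule."""
    def g(th):
        return th ** (alpha + 1) * (w(th) + th * w_prime(th)) * alpha / (alpha + 1)
    h = x / panels
    total = g(0.0) + g(x) + sum((4.0 if k % 2 else 2.0) * g(k * h) for k in range(1, panels))
    return total * h / 3.0


def simulate(n, alpha=ALPHA, seed=0):
    """Past data: Theta ~ Uniform(0, 1), then X = Theta * U^(-1/alpha) ~ Pareto(alpha, Theta)."""
    rng = random.Random(seed)
    return [rng.random() * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


if __name__ == "__main__":
    data = simulate(200_000, seed=1)
    print(m_n(0.8, data), exact_m(0.8))  # close for large n (law of large numbers)
```

The inverse-transform step uses $P(X > x \mid \theta) = (\theta/x)^{\alpha}$ for $x \ge \theta$. The factor `1.0 - rng.random()` lies in $(0, 1]$, which avoids a division by zero.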
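The empirical Bayes estimator of the scale parameter $\theta$ that we propose is
$$\varphi_n(X) = \left[\Big(X\bar w(X) - \frac{\bar M_n(X)}{X^{\alpha+1}f_n(X)}\Big) I_{(0,m]}(X) \vee 0\right] + \left[\Big(m\bar w(m) - \frac{\bar M_n(m)}{m^{\alpha+1}f_n(m)}\Big) I_{(m,\infty)}(X) \vee 0\right], \tag{3.10}$$
where $\bar M_n = M_n/q$ and $a \vee b = \max(a, b)$.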
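Putting (3.8), (3.9) and (3.10) together gives the sketch below. It is an illustration, not the authors' implementation, and rests on three assumptions:

- $q$ enters (3.10) through $\bar w$ and $\bar M_n$, and the sketch takes it as a caller-supplied function.
- Returning $0$ when the window around the evaluation point is empty (so $f_n = 0$) is a hypothetical convention that (3.10) leaves unspecified.
- The usage example takes $w \equiv 1$ (so $q \equiv 1$, the case of Remark 4.1), a uniform prior on $(0,1]$ and $\alpha = 2$. In this setting the Bayes estimator at $x \le 1$ is $E(\Theta\mid X = x) = 3x/4$.

All function names are hypothetical.

```python
import random


def f_n(x, sample, b):
    """Window density estimator (3.8): (1/(n b)) #{j : x < X_j <= x + b}."""
    return sum(1 for xj in sample if x < xj <= x + b) / (len(sample) * b)


def m_n(x, sample, alpha, w, w_prime):
    """Statistic (3.9): (1/n) sum of X_j^(alpha+1) (w(X_j) + X_j w'(X_j)) over 0 < X_j < x."""
    total = sum(xj ** (alpha + 1) * (w(xj) + xj * w_prime(xj)) for xj in sample if 0.0 < xj < x)
    return total / len(sample)


def phi_n(x, sample, b, alpha, m, w, w_prime, q):
    """Empirical Bayes estimator (3.10) at the present observation x > 0.

    q(z) = E(w(Theta) | X = z) is supplied by the caller; for x > m the
    estimator is evaluated at m, and negative values are truncated at 0.
    """
    z = min(x, m)
    density = f_n(z, sample, b)
    if density == 0.0:
        return 0.0  # hypothetical convention for an empty window
    value = z * w(z) / q(z) - m_n(z, sample, alpha, w, w_prime) / (q(z) * z ** (alpha + 1) * density)
    return max(value, 0.0)  # the max(., 0) truncation in (3.10)


def unit_weight(t):
    """w = 1 (and hence q = 1), the squared-error case of Remark 4.1."""
    return 1.0


def zero_derivative(t):
    """Derivative of the constant weight."""
    return 0.0


def simulate(n, alpha, m, seed=0):
    """Past data: Theta ~ Uniform(0, m) (illustrative prior), X | Theta ~ Pareto(alpha, Theta)."""
    rng = random.Random(seed)
    return [m * rng.random() * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


if __name__ == "__main__":
    alpha, m, n = 2.0, 1.0, 200_000
    data = simulate(n, alpha, m, seed=1)
    b = n ** (-1.0 / 3.0)  # a bandwidth with b -> 0 and n b -> infinity
    for x in (0.25, 0.5, 0.75):
        est = phi_n(x, data, b, alpha, m, unit_weight, zero_derivative, unit_weight)
        print(x, est, 0.75 * x)  # empirical Bayes estimate vs the Bayes value 3x/4
```

For a general weight, $q$ depends on the unknown prior. The example therefore uses the case $w \equiv 1$, where no knowledge of $G$ is needed.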
4. ASYMPTOTIC OPTIMALITY OF THE PROPOSED EMPIRICAL BAYES ESTIMATOR
In this section we study the asymptotic optimality of the proposed empirical Bayes estimator. Our analysis is based on conditions (A1)–(A5). The main result is as follows.
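Theorem 4.1. Assume that:

- $(b_n)_{n\ge1}$ is a sequence of strictly positive real numbers such that $b_n \xrightarrow[n\to\infty]{} 0$ and $nb_n \xrightarrow[n\to\infty]{} \infty$;
- $(\varphi_n)_n$ is the sequence of empirical Bayes estimators (3.10);
- $\varphi_G$ is the Bayes estimator (3.1).

Then
$$R(G,\varphi_n) - R(G,\varphi_G) = O\Big(\frac1n\Big) + O\Big(\frac{1}{nb_n}\Big) + O(b_n^2).$$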
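Proof. By (2.1), $E\big(w(\Theta)(\varphi_G(X)-\Theta)\mid X = x\big) = \varphi_G(x)q(x) - E(\Theta w(\Theta)\mid X = x) = 0$. Expand $(\varphi_n(X)-\Theta)^2$ around $\varphi_G(X)$. Since the past data $\mathbf{X}_n$ are independent of $(X,\Theta)$, the cross term has zero expectation. Using $R(G,\varphi_G) \le R(G,\varphi_n)$ and condition (A3), we obtain
$$0 \le R(G,\varphi_n) - R(G,\varphi_G) = E\big(R(G,\varphi_n\mid\mathbf{X}_n)\big) - E\big(w(\Theta)(\varphi_G(X)-\Theta)^2\big) = E\big(w(\Theta)(\varphi_n(X)-\varphi_G(X))^2\big) \le c_1 E\big((\varphi_n(X)-\varphi_G(X))^2\big).$$
Moreover,
$$0 \le R(G,\varphi_n) - R(G,\varphi_G) \le c_1 E\big((\varphi_n(X)-\varphi_G(X))^2\big) = c_1\left(\int_0^m E\big((\varphi_n(x)-\varphi_G(x))^2\big)f(x)\,dx + \int_m^\infty E\big((\varphi_n(x)-\varphi_G(x))^2\big)f(x)\,dx\right).$$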
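For $x > m$ we have $\varphi_G(x) = \varphi_G(m)$ and $\varphi_n(x) = \varphi_n(m)$. Therefore, $\varphi_n(x) - \varphi_G(x) = \varphi_n(m) - \varphi_G(m)$ and, consequently,
$$\int_m^\infty E\big((\varphi_n(x)-\varphi_G(x))^2\big)f(x)\,dx = E\big((\varphi_n(m)-\varphi_G(m))^2\big)\,(1-F(m)).$$
Assume now that $0 < x \le m$. In this case, condition (A4) and (3.7) give $0 \le \varphi_G(x) \le x\bar w(x) = \dfrac{xw(x)}{q(x)}$. Also, $0 \le \varphi_n(x) \le x\bar w(x)$ because $M_n(x) \ge 0$. We thus obtain
$$|\varphi_n(x)-\varphi_G(x)| \le x\bar w(x) = \frac{xw(x)}{q(x)}.$$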
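So, considering the expressions of $\varphi_n(x)$ and $\varphi_G(x)$ and following the same reasoning as in Singh [14], we get
$$E\big((\varphi_n(x)-\varphi_G(x))^2\big) \le E\left(\min\left\{\left|\frac{M_n(x)}{q(x)\,x^{\alpha+1}f_n(x)} - \frac{M(x)}{q(x)\,x^{\alpha+1}f(x)}\right|,\ \frac{xw(x)}{q(x)}\right\}^2\right)$$
$$\le \frac{8}{f^2(x)q^2(x)}\,E\left(\frac{M_n(x)-M(x)}{x^{\alpha+1}}\right)^2 + \frac{8}{f^2(x)q^2(x)}\left(\Big(\frac{M(x)}{x^{\alpha+1}f(x)}\Big)^2 + \frac{x^2w^2(x)}{2}\right)E\big((f_n(x)-f(x))^2\big).$$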
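Since $E(M_n(x)) = M(x)$, we have
$$E\left(\frac{M_n(x)-M(x)}{x^{\alpha+1}}\right)^2 = \operatorname{Var}\left(\frac{M_n(x)}{x^{\alpha+1}}\right) = \operatorname{Var}\left(\frac1n\sum_{j=1}^n \frac{X_j^{\alpha+1}}{x^{\alpha+1}}\big(w(X_j)+X_jw'(X_j)\big)I_{(0,x)}(X_j)\right) \le \frac{c_2^2}{n},$$
since, by condition (A4), each summand lies in $[0, c_2]$. Moreover,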
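$$E\big((f_n(x)-f(x))^2\big) = \operatorname{Var}(f_n(x)) + \big(E(f_n(x)) - f(x)\big)^2,$$
with
$$\operatorname{Var}(f_n(x)) = \operatorname{Var}\left(\frac{1}{nb_n}\sum_{j=1}^n I_{(x,x+b_n]}(X_j)\right) \le \frac{1}{nb_n^2}\int_x^{x+b_n} f(y)\,dy \le \frac{f(x)}{nb_n},$$
where the last inequality holds because of condition (A2). From Liang [5], the bias satisfies
$$\big(E(f_n(x)) - f(x)\big)^2 \le \frac{f^2(x)(\alpha+1)^2 b_n^2}{4x^2}.$$
Finally,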
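$$\left(\frac{M(x)}{x^{\alpha+1}f(x)}\right)^2 + \frac{x^2w^2(x)}{2} \le x^2w^2(x) + \frac{x^2w^2(x)}{2} \le 2x^2w^2(x),$$
where the first inequality follows from (3.3). Now, on account of the above expressions and condition (A3), for $0 < x \le m$ we have
$$E\big((\varphi_n(x)-\varphi_G(x))^2\big) \le \frac{8c_2^2}{nf^2(x)q^2(x)} + \frac{16x^2c_1^2}{nb_nf(x)q^2(x)} + \frac{4c_1^2(\alpha+1)^2b_n^2}{q^2(x)}$$
and
$$E\big((\varphi_n(m)-\varphi_G(m))^2\big) \le \frac{8c_2^2}{nf^2(m)q^2(m)} + \frac{16m^2c_1^2}{nb_nf(m)q^2(m)} + \frac{4c_1^2(\alpha+1)^2b_n^2}{q^2(m)} = O\Big(\frac1n\Big) + O\Big(\frac{1}{nb_n}\Big) + O(b_n^2).$$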
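Then, since (A2) and (A5) hold, we have $f(x) \ge f(m)$ wherever $f(x) > 0$ in $(0, m]$, and $q(x) > \varepsilon_0$. We get
$$\int_0^m E\big((\varphi_n(x)-\varphi_G(x))^2\big)f(x)\,dx \le \frac1n\cdot\frac{8mc_2^2}{\varepsilon_0^2 f(m)} + \frac{1}{nb_n}\cdot\frac{16m^3c_1^2}{3\varepsilon_0^2} + b_n^2\,\frac{4c_1^2(\alpha+1)^2}{\varepsilon_0^2} = O\Big(\frac1n\Big) + O\Big(\frac{1}{nb_n}\Big) + O(b_n^2).$$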
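Summarizing the results obtained so far, we have
$$0 \le R(G,\varphi_n) - R(G,\varphi_G) = O\Big(\frac1n\Big) + O\Big(\frac{1}{nb_n}\Big) + O(b_n^2).$$
In particular, the choice $b_n = n^{-1/3}$ balances the last two terms and yields $R(G,\varphi_n) - R(G,\varphi_G) = O(n^{-2/3})$, the rate announced in Section 1.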
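The choice of $b_n$ can be illustrated numerically. The sketch below uses a hypothetical stand-in for the bound of Theorem 4.1, with all unknown constants set to 1, namely $1/n + 1/(nb) + b^2$. Minimizing $1/(nb) + b^2$ over $b > 0$ gives $b = (2n)^{-1/3}$. The log-log slope of the minimized bound in $n$ then approaches $-2/3$.

```python
import math


def bound(n, b, c0=1.0, c1=1.0, c2=1.0):
    """Hypothetical stand-in for the bound of Theorem 4.1: c0/n + c1/(n b) + c2 b^2."""
    return c0 / n + c1 / (n * b) + c2 * b * b


def best_bandwidth(n, c1=1.0, c2=1.0):
    """Minimizer over b > 0 of c1/(n b) + c2 b^2: from -c1/(n b^2) + 2 c2 b = 0, b = (c1/(2 c2 n))^(1/3)."""
    return (c1 / (2.0 * c2 * n)) ** (1.0 / 3.0)


def loglog_slope(n_small, n_large):
    """Slope of log(bound at the optimal bandwidth) versus log(n) between two sample sizes."""
    g_small = bound(n_small, best_bandwidth(n_small))
    g_large = bound(n_large, best_bandwidth(n_large))
    return (math.log(g_large) - math.log(g_small)) / (math.log(n_large) - math.log(n_small))


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        b = best_bandwidth(n)
        print(n, b, bound(n, b))
    print("slope:", loglog_slope(10**6, 10**9))  # approximately -0.667
```

The constant in front of $n^{-2/3}$ depends on the unknown constants of the proof. Only the order $b_n \asymp n^{-1/3}$ carries over to the general case.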
Remark 4.1. Under the conditions of Theorem 3.1, if $w(\theta) = 1$, then we recover the Bayes estimator and, respectively, the empirical Bayes estimator of Liang [5].
REFERENCES
[1] B.C. Arnold, Pareto Distributions. International Co-operative Publishing House, Fairland, MD, 1983.
[2] J. Berger and L.M. Berliner, Robust Bayes and empirical Bayes analysis with ε-contaminated priors. Ann. Statist. 14 (1986), 2, 461–486.
[3] M. Iosifescu, G. Mihoc and R. Theodorescu, Teoria probabilităților și statistica matematică. Ed. Tehnică, București, 1966.
[4] M.V. Johns, Jr., Nonparametric empirical Bayes procedures. Ann. Math. Statist. 28 (1957), 649–669.
[5] T.C. Liang, Convergence rates for empirical Bayes estimation of the scale parameter in a Pareto distribution. Comput. Statist. Data Anal. 16 (1993), 35–45.
[6] P.E. Lin, Rates of convergence in empirical Bayes estimation problems: Continuous case. Ann. Statist. 3 (1975), 155–164.
[7] U. Makov, Loss robustness via Fisher-weighted squared-error loss function. Insurance Math. Econom. 16 (1995), 1–6.
[8] V. Preda, Entropia ponderată și problema de selecție neparametrică. Stud. Cerc. Mat. 34 (1982), 169–181.
[9] V. Preda and V. Craiu, Probleme de decizie multiplă. Tipografia Univ. București, 1980.
[10] V. Preda, Informational characterizing of Pareto and power distributions. Bull. Math. Soc. Sci. Math. R.S. Roumanie (N.S.) 28 (76) (1984), 77–79.
[11] V. Preda, Teoria deciziilor statistice. Ed. Academiei Române, 1992.
[12] H. Robbins, An empirical Bayes approach to statistics. In: Proc. Third Berkeley Sympos. Math. Statist. Probab. 1 (1956), 157–163.
[13] E. Samuel, An empirical Bayes approach to the testing of certain parametric hypotheses. Ann. Math. Statist. 34 (1963), 1370–1385.
[14] R.S. Singh, Applications of estimators of a density and its derivatives to certain statistical problems. J. Roy. Statist. Soc. Ser. B 39 (1977), 357–363.
[15] R.S. Singh, Empirical Bayes estimation in Lebesgue-exponential families with rates near the best possible rate. Ann. Statist. 7 (1979), 890–902.
[16] R.C. Tiwari and J.N. Zalkikar, Empirical Bayes estimation of the scale parameter in a Pareto distribution. Comput. Statist. Data Anal. 10 (1990), 261–270.
Received 11 December 2006
University of Bucharest
Faculty of Mathematics and Computer Science
Str. Academiei 14
010014 Bucharest, Romania
and
Academy of Economic Studies
Department of Mathematics
Calea Dorobantilor 15-17
010552 Bucharest, Romania