On linear estimates with nearly minimum variance By GUNNAR BLOM

(1)

A R K I V F O R M A T E M A T I K B a n d 3 n r 31

C o m m u n i c a t e d 23 M a y 1956 b y HAXALD C ~

On linear estimates with nearly m i n i m u m variance

B y GUNNAR BLOM

1. I n t r o d u c t i o n

L e t z be a r a n d o m variable with a continuous cumulative distribution function F [ ( z - / ~ ) / a ] which depends upon two unknown p a r a m e t e r s # a n d a. Con- sider an ordered r a n d o m sample

Z(1) ~ z ( 2 ) ~ . . . ~Z(n )

of z-values. I f the m e a n s and covariances of the reduced order statistics x~ z ( ° - # ( i = l , 2 ... n)

are known, it is possible to find linear unbiased m i n i m u m variance estimates gl,z(o and ~ g2,z(o

i - 1 i = l

of /~ a n d a respectively (Lloyd, 1952). These estimates m a y be called best unbiased estimates. A serious d r a w b a c k of the solution is t h a t in m o s t cases it involves v e r y time-consuming numerical calculations.

The object of this p a p e r is to show t h a t , under general conditions, it is possible to find a convenient approximation to the best solution which m a y be t e r m e d a nearly best unbiased estimate. The variance of this estimate is, as some examples will show, often v e r y little in excess of the m i n i m u m variance.

T h e method presupposes t h a t t h e means (but not the covariances) of t h e variables x~ are known.

B y a slight modification of the method it m a y be used also when neither t h e means nor the covariances are known. The resulting estimates will be called nearly best, nearly unbiased estimates.

B o t h types of estimates mentioned above m a y be derived from a t h e o r e m given in the n e x t section.

2. A t h e o r e m o n linear e s t i m a t e s

Denote b y E x~ a n d coy (x~, xj) the means and covariances of the variables xf ( i = 1 , 2 . . . n). P u t p~=i/(n+l) and q~=l-p~. Further, define 2 i = G ( p t ) , where G(u) is the inverse function of F(x).

(2)

C. BLOM, Linear estimates with nearly minimum variance Consider a linear estimate

U

= }

gt z(o

i=1

(2.1)

of the p a r a m e t e r k l # + k~a, where k I and k~ are given constants.

T h e o r e m . Let

00, 01 . . . On, On+l

be any set o/ quantities subject to the conditions

0 o = 0 , + 1 = 0 and 0 , * 0 ( i = 1 , 2 . . . n).

Replace the coe[/icients gi in (2.1) by new quantities ho, hi . . . h,~

de/(ned, apart [rom an additive constant, by the relations

I[ the conditions

are [ul/illed, where Cli = 0 t - 0i+l ;

g, = O, (h~ - h i - l ) ( i = 1 . . . n ) .

Cl, h,= kl ; ~ C2, h,= k~.

i=O i=O

Ce~=O, Ex~-O,+lExi+l ( i = 0 , 1 . . . n)

(2.2)

(2.3)

(2.4)

then U in (2.1) /s an unbiased estimate of kltz +k~(r.

Further, the variance o/ U can be written

w h e r ~

(,~ [ 1 ~ ( 1 "h]~' ]

var U = - ~ - 2 ~ - l ,~oh~- ~n~-l ,~o ,1 j

R = ~ g~ g~ R~ s,

i,/=l

Rij being defined by

• P i q ] (Oi ~)'~1 + R1/ (i --~ j).

COY (X, x j ) = ~ - ~

+ ~ R, (2.5)

The t h e o r e m holds for a n y choice of quantities 0~. I t is, however, useful only when t h e y are chosen so t h a t R in (2.5) behaves in a suitable way. I n particular, it is desirable to choose 0 , so t h a t R tends rapidly to zero when n increases. When F(x) has a derivative [(x) which is continuous (except, pos- sibly, in the end-points of the range of variation), we take 0 ~ = / ( ~ ) .

(3)

ARKIV FOR MATEMATIK. B d 3 nr 31

3. D e r i v a t i o n o f t h e n e a r l y b e s t u n b i a s e d e s t i m a t e

T h e variance of U given in (2.5) is, a p a r t f r o m the last t e r m , proportional to

Z = h? 1 ht

L e t now Z be minimized with respect to t h e h~ subject to the side conditions (2.3). T h e resulting quantities hf obviously provide an unbiased e s t i m a t e of k l # + k2a, b u t there is no guarantee t h a t t h e true m i n i m u m of vat U is obtained, since R is neglected. Determining ht in this w a y a n d using (2.2) we find as coefficients of the nearly best estimate

g, = 0t [al ( C l f - C1 ~-1) + as (C2~ - Co.i_~)], where a 1 a n d a 2 are two multipliers. Introducing the notation

i = 0

D = d l l d22 - dl~, we h a v e

1

a 1 = ~ (kl d2~ - k s d12),

1

a 2 = ~ ( - k 1 d21 + k2 dlx ).

(3.1)

W h e n the distribution is symmetrical, dl~= d21= 0.

The coefficients of the nearly best estimates of /~ a n d a are obtained b y t a k i n g k x = 1, k 2 = 0 and k 1 = 0 , k e = 1 respectively in these relations. I t should be noticed t h a t t h e solution depends u p o n t h e first and second order differences of the sequences {0,} a n d {O, Ex,} ( i = 0 , 1 . . . n + l ) .

The nearly best estimates of # a n d a h a v e been determined in the case n = 5 for some special functions F(x) for which best estimates are known. I n these examples the quantities 0, have been p u t equal to /(2i).

As a measure of t h e efficiency of the estimates the quotient of the variance of the best estimate a n d the variance of the nearly best estimate has been used.

(The last-mentioned q u a n t i t y has been determined accurately b y aid of existing tables, not from the approximation formula obtained b y neglecting R in (2.5).) The result of the calculations is given in Table 1. As seen from the table t h e loss of efficiency b y using nearly best instead of best estimates is v e r y small in these examples. (The estimate of // in the normal case is of little interest b u t has been included for completeness.)

I t should be mentioned t h a t , in the rectangular case, the nearly best e s t i m a t e is identical with the well-known best estimate based upon t h e extreme values of t h e sample.

W h e n F (x) has a continuous derivative which vanishes in the end-points of the range of variation, it m a y sometimes be convenient to replace the second

(4)

G, BLOM, Linear estimates with nearly minimum variance

Table 1.

Efficiency of nearly best linear estimates.

Distribution

R e c t a n g u l a r

Normal TrianOgul ar Extreme value Right triangular

P a r a m e t e r

1O0 % 100 %

99.8 99.7

94.4 99.6

98.8 96.6

99.9 99.8

Best estimate given by Lloyd Godwin Sarhan Lieblein Downton

Efficieney ffi quotient of variance of best linear unbiased estimate and variance of nearly best linear unbiased estimate.

order differences in (3.1) b y the derivatives of /(x). I f this modification is used in t h e case of the normal distribution, the nearly best estimate of/~ is replaced b y the sample m e a n and the nearly best estimate of a b y t h e e s t i m a t e

iffil

t = 1

The variance of this estimate is practically the s a m e as t h a t of the best esti- m a t e , the loss of efficiency being a b o u t 0 . 1 % when

n=5.

I t deserves to be mentioned t h a t the estimates obtained b y t h e modified m e t h o d are, a p a r t from the values of t h e multipliers, identical with the esti- m a t e s studied b y J u n g (1955) in the case of a class of distributions which, however, does n o t include t h e normal distribution.

4. Nearly best, nearly unbiased estimates

W h e n the m e a n s of the variables xf are unknown, the a p p r o x i m a t i o n

Ex,,.,,G(n i--~

1)

is used in t h e definition of C2t in (2.4). I n other respects the procedure is the s a m e as in t h e preceding section. The resulting estimates will in general be biased. B y a suitable choice of constants ~ a n d / ~ the bias m a y , however, often be v e r y small. General rules for the selection of the constants m a y be formu- lated. The variance of the estimates obtained in this w a y is often practically t h e same as t h a t of t h e nearly best unbiased estimate.

Finally it deserves to be mentioned t h a t b y a s t u d y of the a s y m p t o t i c variance of n e a r l y best estimates, it is possible to obtain information a b o u t the n a t u r e of estimates which are non-regular in t h e sense used b y Cram~r (1946, Ch. 32).

Details concerning the m e t h o d described in this p a p e r will be given in a forthcoming publication.

(5)

XRKIV FOR MATEMATIK. B d 3 n r 3 1

R E F E R E N C E S

CRAM~R, H. (1946), Mathematical Methods of Statistics. Princeton.

DOWNTO1% F. (1954), Least-squares estimates using ordered observations. Ann. Math. Star.

25, 303.

GODWIN, H. J . (t949), On t h e estimation of dispersion b y linear s y s t e m a t i c statistics. Bio- m e t r i k a 36, 92.

Ju~G, J. (1955), On linear estimates defined b y a continuous weight function. Arkiv fSr M a t e m a t i k B d 3 m" 15, 199.

LIEBL~.I~, J. (1954), A new m e t h o d of analyzing extreme-value data. Nat. Adv. Comm.

Aeronat. Techn. Note 3053.

LLOYD, E. H. (1952), Least-squares estimation of location a n d scale p a r a m e t e r s using order statistics. B i o m e t r i k a 39, 88.

SARHAN, A. E. (1954), E s t i m a t i o n of t h e m e a n a n d s t a n d a r d deviation b y order statistics.

Ann. Math. Star. 25, 317.

Tryckt den 27 december 1956 Uppsala 1956. Almqvist & Wiksells Boktryckerl AB