• Aucun résultat trouvé

Comparison of tests for non-parametric hypotheses By ERIK RUIST

N/A
N/A
Protected

Academic year: 2022

Partager "Comparison of tests for non-parametric hypotheses By ERIK RUIST "

Copied!
31
0
0

Texte intégral

(1)

A R K I V F O R M A T E M A T I K B a n d 3 n r 9

Communicated 10 March 1954 by ARNE WESTGItEN a n d HARALD CRA~I~-R

Comparison of tests for non-parametric hypotheses By ERIK RUIST

W i t h 1 figure in t h e t e x t

L S u m m a r y

The main object of this paper is to find a criterion for comparison of two tests for non-parametric hypotheses, taking advantage of the qualitative information t h a t m a y exist. After a detailed analysis of the problem and some earlier suggestions for its solution (sections 2-4), a criterion is suggested in section 5. I n order to apply it to concrete case, a location problem is specified in section 6. The rank tests to be compared are analyzed in section 7, and the comparison b y w a y of the criterion is carried out in section 8. I t turns out t h a t sign tests, sometimes slightly modified , are v e r y often optimal according to the criterion used.

2. Statistical hypotheses testing

The general problem of testing statistical hypotheses can be described in the following way. We are interested in the distribution F of a stochastic variable X, which m a y generally be thought of as a vector {X1, X~ . . . Xn} the elements of which m a y correspond to different dimensions of the variable a n d / o r to several observations on the same variable. Let us make the following basic assumptions.

I n the sample space there is a a-algebra of subsets (:~). All probability measures are defined on (~). All functions introduced in the following arc measurable (:~), and all sets belong to (~).

We know t h a t F E D , a class of probability distributions. After drawing a sample, i.e. observing a sample point x, we want to decide to which of a number of mutually exclusive sub-classes o)1, co 2 .. . . , w m of D we shall refer F. I n the most common cases, m = 2. The problem is then often stated to be the testing of the null hypothesis Hi:

FEool,

against the alternative hypothesis H2:

FEw~.

If eoi is not topologically equiv- alent to a finite-dimensional Euclidean space, H/ is said to be a non-parametric h y p o t h e s i s / I n the following, we shall be mainly concerned with such hypotheses.

1 T h e n a m e is n o t a v e r y g o o d one, as co i is o f t e n c h a r a c t e r i z e d b y a p a r a m e t e r . A s a n e x a m p l e , eot m a y c o n t a i n a l l F t h e m a r g i n a l d i s t r i b u t i o n s of w h i c h , w i t h r e s p e c t t o a l l

Xi,

h a v e p o s i t i v e m e d i a n s . T h e t e r m s e e m s t o h a v e b e e n f i r s t u s e d b y WO•FOWITZ (1942) f o r t h e t r u l y n o n - p a r a - m e t r i c t w o - s a m p l e p r o b l e m w h e r e eo I c o n t a i n s all F t h a t a r e i n v a r i a n t w i t h r e s p e c t t o a l l p e r - m u t a t i o n s of X1, . . . ,

Xn,

w h i l e to~ c o n t a i n s a l l ~w i n v a r i a n t o n l y t o p e r m u t a t i o n s within" e a c h of t h e s e q u e n c e s X1, . . . , X k a n d X k + 1 . . . . ,

Xn,

b u t n o t b e t w e e n t h e m .

(2)

E. RU~ST, N o n - p a r a m e t r i c h y p o t h e s e s

Let us denote the decision to accept H i b y d~ (i = 1, 2). I t is then required to con- struct a test ~, i.e. a function ~ (x) of the sample point x, such t h a t 0 < ~ ( x ) _< 1, indicating for every sample point the probability with which decision d~ should be made. B y the s i z e of the test we shall mean

= sup P (d~ ] ~, F) = sup f ~ (x) d F (x)

F E ~ 1 ~e0l

where the integral is to be taken over the entire sample space. If P (d~ [ ~, F) = 0¢

for all F 6 w ~ the test is said to be s i m i l a r with respect to wl. Usually, with each test (x) there is associated a real-valued function y~ (x) such t h a t

(])

1 if ~ ( x ) > c (x)= p if y J ( x ) = c

0 if v2(x)<c

where c and p are parameters, 0 _< p _< 1. The choice of c and p determines the value of ~. B y changing c and p a n y value of ~ (0 _< a _< 1) can be obtained. To every value of a there is only one relevant combination of c and p (if P {~(x) = c I F 6 W l } = O, the value of p is irrelevant). Thus y~(x) m a y be said to generate a [ a m i l y o] test~¢

~b = {~ (x)} where to each value of ¢¢ corresponds one, and only one, ~ (x). If necessary, we shM1 denote this member of the family b y ~ . According to this definition, the t test is a test family, generated b y

~o (x) = ~ - m ~ _ 1.

8

Another type of tests which is of interest in the present context is obtained if c and p are replaced b y functions c (x) and p (x) of the sample point. This is the case with the invariance tests investigated by LEH~e~N a n d ST~n~ (1949). Denote b y x (1), x c21 . . . . , x ¢M) the sample points obtained from x by attaching in all possible w a y s plus and minus signs to the absolute values of the components of x, a n d let t h e order among the sample points be defined b y

~0 (x (1)) > ~o (x (2)) > - - - > ~o (x(M)).

Then we define c ( x ) = ~fl(x a+tM~])) where [M~] denotes the largest integer less t h a a or equal to M~. The value of p ( x ) is determined so as to give to the test the desired

1 x) size a. A simple example is the following test of s y m m e t r y around 0: Let n~0 ( = ~,, the sample mean. Suppose t h a t the sample values are 6, 3, - 1, 4, 2, 3, so t h a t yJ(x) =

= 17. If we choose c¢ =0.05, we get [M~] = 3. Further, ~0(x m) = 19, ~o(x (~) = 17,.

~p(x

TM)

= 15, ~(x (4~) = 13, so c ( x ) = 13, and since ~p(x) > c ( x ) we find ~0(x) = 1, i.e.

we reject the hypothesis of symmetry.

F o r this type of tests also, a n y value of :¢ could be obtained by the use of one ~o (x).

Thus, this function generates one family of tests when c is a parameter, independent.

of x, and another family when c (x) takes values depending on x in the w a y described above. We m a y call them the fixed-limit test family and the permutation test family

(3)

ARKIV FOR MATEMATIK. B d 3 n r 9 g e n e r a t e d b y ~v (x). I t is clear t h a t a n y strictly increasing f u n c t i o n of v 2 (x) will g e n e r a t e t h e same families as ~v(x). HO~.FFDr~O (1952) has s h o w n t h a t , subject t o certain restrictions, t h e t w o families g e n e r a t e d b y a f u n c t i o n ~v (x) are a s y m p t o t i c a l l y equiva- l e n t for increasing n. I n this p a p e r we shall be concerned w i t h these t y p e s of tests exclusively, a n d b y " a t e s t " we shall m e a n a test belonging to a fixed-limit or a p e r m u t a t i o n test family.

A test f a m i l y m a y be said t o be

distribution-/tee

in a class co of distributions, if t h e distribution of q~ (x) is t h e same for all FEco. This means, in t h e cases considered here, t h a t t h e probabilities P {q~ (x) = 1}, P (q0 (x) = p}, a n d P {~0 (x) = 0} are t h e s a m e for all FEco. Thus, t h e t test f a m i l y is distribution-free in t h e class of n o r m a l distributions w i t h m e a n m. W h e n testing a n o n - p a r a m e t r i c h y p o t h e s i s H 1, it is often c o n v e n i e n t t o use a test f r o m a f a m i l y t h a t is distribution-free in t h e c o r r e s p o n d i n g cor Accordingly, these tests are usually called n o n - p a r a m e t r i c . As t h e y are also used i n other cases, a n d as t h e y often test t h e value of some p a r a m e t e r , t h e n a m e is mis- leading, a n d t h e t e r m distribution-free will be used here.

There is a simple relation between similar tests a n d distribution-free test families as is seen f r o m t h e following

T h e o r e m 1. A test f a m i l y is distribution-free in cos, if a n d o n l y if all tests be- longing to it are similar w i t h respect to co 1.

P r o o f : F o r t h e sake of clarity, d e n o t e c a n d p b y c~ a n d p~. Then, for a n y given a n d F , let

~s (~, F ) = P {~ (x) = 1 I~, F } = P {~v (x) > c~ I F }

~ ( ~ , F) = P {~(~)

=p~ [~, F} = P

{~(~) = c~ IF}

~ ( ~ , F) = P

{q0 (x)

= 0 ]~, F} = P

{~v(x) < c~ I F } .

3

B y definition, ~ ~ (~, F ) = 1 a n d ~s (~, F ) + p~ ~ , (a, F) = ~ for all

Fecos.

N o w suppose t h a t for t w o distributions F ' , lv"eco s t h e i n e q u a l i t y z , ( a , F ' ) ~ = z ~ ( ~ , F " ) i s t r u e a t least for some i a n d for some c¢, s a y ~1- T h e n define c¢, = ~s (as, F ' ) a n d

~3 = ~l(~S, F ' ) + ~ ( ~ s , F ' ) . T o reach these sizes for

F',

we m u s t m a k e

B u t , as t h e test is similar,

a n d

p~, = 0 p~0 = 1.

~1(~,, F " ) = z 1 ( ~ 3 , F " ) = z 1 ( ~ i , F " ) = ~

~ ( ~ z , F " ) =~2(~1, F " ) = ~ 3 - ~ 2

~ 3 (~1, F ' ) = ~3 (~1, F " ) = 1 - ~3-

(4)

E . R U I S T ,

Non-parametric hypotheses

Thus, the probabilities must be equal for a n y value of a, and the test family is distribution-free, which completes the proof of the first part of the theorem. As the converse of this is trivial, the second p a r t follows.

3. Optimality criteria

The choice among all possible tests in a given situation is guided b y the general aim of keeping the probability of a wrong decision as small as possible. F o r a test ~%, the probability of wrongly making decision d2 is at most ~, and the probability of wrongly making decision d I is

P (dl [ qga, F eo~2).

Its complement,

P (d 2 ] q~a, F Eeo2) =

= 1 - P (dll ~0~, F Eo)2) = f ~a (x)d F (x) is called the

power

of the test, and is a function of F. Several criteria have been suggested for discriminating between the tests.

Some of these pick out one test as the best one, weighting in some w a y the seriousness of the different kinds of wrong decisions. Others point out the

test ]amily

to be used, leaving the size of the test to be determined subjectively. These criteria are based on the power function only. Sometimes the choice of size is supposed to be made prior to the choice of a test family, b u t this is generally equivalent to using the reverse order, as in most cases all tests of a family have the same optimality proper- ties.

Suppose first t h a t the size of the test is chosen subjectively. Then in the simple case when Wl and w2 each contain only one distribution, say _F i and F2, it is clearly desirable to use a test family which for a n y given ~ maximizes

P(d2 Iq~a, F2),

i.e.

most power]ul

tests. Even in more general cases there sometimes, though rarely, exists a

unilormly most power/ul

test family, the tests ~' of which satisfy

P(d 2

[q~, F) = max

P(d 2

[ ~ , F) for all FE(o2.

c a

When no uniformly most powerful tests exist, the choice between several test families is not so obvious. If, however, two tests of size :¢ satisfy

P (d 2 ]~0~, F) _> P (d2 ]q~, 2 ' ) for all FEw2

P (d2[ q~'~, F) > P (d21 q~, F)

for some FE~o 2

~0~ is said to be

uni/ormly more power/ul

t h a n q~ (WiLI), 1942), a n d ~9~ is clearly pre- ferred to V~- A test for which no uniformly more powerful test exists m a y be called

or-admissible

(L~HMANN, 1947, called such a test admissible). A test t h a t is more powerful t h a n a n y other test of the same size for at least one

FEw2

is obviously a-admissible. A test family all tests of which are a-admissible m a y be called admissible.

Test families t h a t are n o t admissible are usually undesirable.

For the choice between admissible test families some additional criterion, more or less subjectively chosen, is necessary. I t m a y have the effect of disqualifying a class of test families having a property t h a t is regarded as undesirable. An example of such a property is bias, which means t h a t there exists a F ' E o 2 for which

B(d2]

~0~, F ' ) < ~¢. If all biased families are excluded from the class of admissible test families, there m a y sometimes be only one left, the uniformly most powerful unbiased test family.

Another method of choosing between admissible families is to take special regard to the power against some specific

FEw,..

Thus, for the case where co i contains only

(5)

ARKIV FSR MATEMATIK. B d 3 n r 9

one element, a n d co~ is a p a r a m e t r i c family of distributions, thus ~ being a finite- dimensional p a r a m e t e r space, N E ~ A ~ and PERSON (1936) suggested t h a t a m o n g all unbiased tests of size a, t h e test with the steepest power function in the vicinity of col should be regarded as best. T h e y knew t h a t the criterion could be criticized, since usually a wrong decision d 1 is not so serious when F is close to co x as when it is far from it. T h e y hoped, however, t h a t a good test according to this criterion would also behave well f a r t h e r a w a y from co~.

W ~ D (1942) suggested another criterion, saying t h a t a t e s t q" is

most stringent,

if it minimizes

m a x [sup p ( ~

1 ~ ,

F) - P (d~

I ~ ,

F)].

Thus, the criterion takes account of the m a x i m a l deviation f r o m the envelope power function.

F r o m a n intuitive point of view, an o p t i m a l i t y criterion should preferably t a k e account of the whole of the power function. LINDLEY (1953) tried to do this, saying essentially t h a t a test is optimal, if it minimizes a weighted average p r o b a b i l i t y of a wrong decision. Thus, he considered not only P (di

I q,

F eco~) b u t also P(d~

[

q, Fecol) a n d accordingly he obtained a unique optimal test, not a t e s t family. I n the situation described above, if co~ is a one-parameter family {F(0)), a test is optimal according to Lindley if it minimizes

VlP (d~lq, Fecol)+ f %(o) P(dl]q~, F(O))dO

where v~ and v~ (0) are fixed weights. The weights are derived b y a combined evalua- tion of the belief in the plausibility of the distribution a n d the seriousness of com- mitting the corresponding error. I t seems v e r y difficult to determine such weights in a given situation, a n d hence the applicability of the criterion is restricted.

WXLD (1950) a t t a c k e d the problem differently, b u t he also introduced numerical weights, first to all

Ffico2,

indicating the seriousness of, or the loss suffered from, m a k i n g decision d 1 when the true distribution is F. Considerations of s y m m e t r y led to the introduction of corresponding weights in col, indicating for a n y Ffico 1 the seriousness of making t h e wrong decision d 2. L e t us denote the weight function b y

W(F, dJ.

F o r completeness, let

W(F,

di) = 0 for FEco~ (i = 1, 2). Now, for a n y a n d a n y F, the expected loss equals r@, F) = r~@, F) + r~(% F ) where

ri( % F) =P(di[ % F) W(F, di).

A test q ' is said to be

uni/ormly better

t h a n ~ if

r @ ' , F) < r ( q , F) for all F e / 2 , a n d r ( 9 ' , F ) < r@, F) for some Fc~Q which is in this case equivalent to

P(d, lqJ, F)>P(d,]%F) forallFeco~,i=l, 2, and

P (di ] ~0', F) > P (di] % F) for some

F ecol.

This concept is seen to be related to the concept " u n i f o r m l y more powerful", b u t is m o r e general, since it admits a comparison of two tests of different sizes, a n d stronger,

(6)

~.. RUIST, Non-parametric hypotheses

since it takes account of col as well as of co 2. A test for which no uniformly better test exists is called admissible. This is a stronger concept t h a n co-admissibility.

Admissibility implies ~-admissibility, b u t the c o n t r a r y is n o t necessarily true.

For the choice between admissible tests, WXLD introduced the m i n i m a x principle, saying t h a t the test ~' is optimal in the minimax sense if

sup r (~', F ) = rain sup r (~0, F).

F~.Q ~o Fefl

B y this criterion also, we do not get a family of tests of different sizes, but usually only one test, owing to the fact t h a t we have evaluated the relative seriousness of committing the two possible kinds of error. According to a theorem of SVE~DRUI"

(1952), a minimax test is admissible, if it is unique, b u t not necessarily otherwise.

For a n y test ~0, the distribution F ' for which r (~0, F ' ) = sup r (~, F )

FE£)

is, if it exists, called a least favorable distribution with respect to ~0. WALD showed that, subject to rather mild restrictions, the minimax test ¢ ' satisfies

r (~', F ' ) = m i n r (q, F')

¢

if F ' is a least favorable distribution with respect to ~0'. Thus, a minimax test is optimal for its least favorable distributions, meaning inter alia t h a t if F'Eco2, q~' is more powerful relative ix) F' than a n y other test of the same size.

Let, as above, eo 1 contain only one element and co 2 be a parametric family, a n d W ( F , d~) = k i for F$co i. Then, for all unbiased tests,

r l ( ~ p , ~ ) <

]c1(1

- - 0¢),

but if the power function is continuous, r l ( q ~ , F ) - + k l ( 1 - ~ ) as F - + o J r Thus, the m a x i m u m risk in e% is kl (1 - ~ ) + e where e-->0. I n 0) 1 the risk r 2 is always = k2:¢.

Then, a test is minimax if it is optimal for F-->oJl, and of size ¢¢ so t h a t k1(1 - ~¢) =

= k2~, i.e. ~ kl . Such a test has higher power in the close vicinity of 0) 1, i.e.

k 1 + k 2

steeper power function in t h a t vicinity, t h a n a n y other test, and is thus optimal according to the N e y m a n - P e a r s o n criterion. This is a slight modification of a theorem b y SVV, RDRUP (1953, p. 85).

The minimax approach is applicable to a n y s e t / 2 and is thus of great generality.

I t has, however, been subjected to some criticism. Thus, it is said not to be practical to let the worst possible situation guide one's behavior. Nor is it usually possible t o define the absolute value of the loss for different F, i.e. the weight function. Speci- fically, the effect of wrongly making decision d 1 m a y be of a kind quite different from t h a t of wrongly making decision d2. Thus, the choice of a test within the " b e s t "

test family cannot be made on the basis of quantitative judgement. The practical applicability of the minimax principle to non-parametric problems is limited also b y the fact t h a t the problem of finding the minimax test for a given situation h a s so far been solved only for some special cases.

HO~.FFD~TG (1951) has used the minimax criterion in a w a y slightly different from WALD'S. Thus, he agrees with the second part of the criticism cited above, t h a t it is

(7)

ARKIV F6R MATEMATIK. B d 3 n r 9

often impossible to introduce a weight function in f2, a n d to c o m p a r e the two kinds of wrong decisions. Consequently, he restricts himself to defining a weight function W ( F , dl) in eo~ only. We m i g h t call such a weight function incomplete. HOEFFDINO defines a test ~ of given size c¢ as being of m i n i m a x risk if

sup r 1 ( ~ , F) = rain sup r 1 ( ~ , F).

Feeoz q~ Fem2

H e further points out t h a t although it m a y not be realistic to give numerical weights to different Few2, i t is usually possible to r a n k all pairs F 1, F26co 2 according to whether the weight of F 1 should be greater than, equal to, or smaller t h a n t h a t of F2.

This is equivalent to partitioning eo 2 into disjoint sets ~2 (a) where a is a real valued p a r a m e t e r such t h a t

W ( F 1, dl) <_ W(F2, dl) if F l e f 2 ( a l ) and F~6f2(a2) where a 1 < a 2, a n d W(F1, dl) = W(Fa, dl) if F , , F26~Q(a).

We m a y call such a weight function a-ordered incomplete.

Now, HOEFFDINO has shown t h a t if a test ~ satisfies the condition inf P ( d p . { q ~ ' , F ) = m a x inf P ( d 2 { T ~ , F ) for all a

F e ~ ta) q~o~ FefJ(a)

it minimizes, a m o n g all tests of size ~, the m a x i m u m risk r 1 ( ~ , F) with respect to all a-ordered incomplete weight functions. Thus, HOEFFDING'S criterion gives tests t h a t are optimal in a great m a n y situations, b u t it does not, like WALD'S, choose a test, only a test family. IqOEFFDINO gives examples of such test families of maximin power with respect to {~Q (a)}. As in the case of complete weight functions, we m a y speak of F'6o~ 2 as a least favorable distribution in eo~ relative to a test family ¢ , if for a n y

r I ( ~ , iV') = s u p r I (0%, F).

F~co~

B y restricting F to f2 (a) we get b y the same definition a least favorable distribution in f2 (a). Since W (F, dl) is constant within f2 (a), the condition is in this case equivalent to

P (d 2 { ~ , F ' ) = inf P (dz {~=, F).

Fef2 (a)

So far there is no general way of establishing the existence of such a test family of m a x i m i n power in a given situation, nor of finding it, if it exists. Thus, even after the introduction of HOEFFDINO'S criterion, it is n o t always possible to find a n o p t i m a l test. This is especially t r u e when ~Q is a non-parametric class of distributions, i.e.

when little or nothing is known of the functional form of F.

The m a i n results in finding optimal tests with respect to n o n - p a r a m e t r i c classes of distributions are due to LE~MA~N and STEIN (1949). T h e y found t h a t if m 1 contains all distributions i n v a r i a n t with respect to p e r m u t a t i o n s of the variables X1, X2, • •., X~, a n d w2 contains only one element, say F ' , the m o s t powerful test f a m i l y is a p e r m u t a - tion family with yJ (x) = / ' (x), the density function of F'. B y a theorem of HUNT a n d STEIn, t h e y extended the results to some cases where wz is a p a r a m e t e r space.

The results are v e r y i m p o r t a n t , as the invariance hypothesis tested is a very c o m m o n one, including as special cases the two-sample problem a n d the problem of s y m m e t r y .

(8)

E. RUIST,

Non-parametric hypotheses

However, the c o m p u t a t i o n a l work necessary for the tests obtained is prohibitive unless the sample is very small, say less t h a n 10. Further, the tests' p r o p e r t y of being uniformly m o s t powerful is retained only as long as eo 1 is a v e r y wide class of distri- butions.

Thus, in this case as well as in others where no o p t i m a l test has been found, it is necessary to use less powerful tests, and to decide which of two given tests is to be considered the better.

4. C o m p a r i n g t w o t e s t s or t e s t families

Of course, the criteria referred to above are capable of picking out t h e better of two tests as well as of finding a best test. Thus, if one test is uniformly more powerful t h a n another, it should generally be preferred to the second one. I f none of the tests is uniformly more powerful t h a n the other one, we have to a p p l y some other criterion, which is sometimes difficult if ~Q is not a parametric space. F o r several distribution- free tests, such comparisons h a v e been made, based on different criteria, the choice between these generally depending more upon what is possible t h a n on what is desirable. To understand the background of these comparisons, it is necessary t o r e m e m b e r under w h a t circumstances distribution-free tests are being used.

Besides being distribution-free, these tests h a v e often the p r o p e r t y of being v e r y simple to compute, and t h a t is the reason w h y t h e y are applied in two distinctly different situations:

1. W h e n the n u m b e r of observations is large, a n d computational ease m a y be w o r t h some loss of power. E v e n if the form of the distribution is unknown, most of t h e s t a n d a r d tests could h a v e been applied in this case, as the ~ (x) corresponding t o t h e m are often a s y m p t o t i c a l l y normally distributed for almost a n y •.

2. When the n u m b e r of observations is small, and the functional f o r m of the dis- t r i b u t i o n is unknown. The s t a n d a r d tests could not be applied, as the sample size is too small for the a s y m p t o t i c normality to be relied upon. This is v e r y often t h e case in statistical problems relating to the social sciences.

F o r the first case, it is of course of interest to see how m u c h power is lost b y n o t using t h e m o s t powerful test. I t is quite enough for this purpose to m a k e the inves- tigation under the assumption t h a t F is normal in some sense, and t h a t the n u m b e r of observations is so large t h a t a s y m p t o t i c results m a y be used. The comparison is ordinarily m a d e in terms of relative efficiency. The

e//iciency

of test ~ relative t o q~' is said to be e, if the power of ~ using n observations is in some sense equivalent to the power of q~', using e n observations. I f ~ ' is a uniformly m o s t powerful test, e m a y be called t h e absolute efficiency of q~. The first to compute t h e efficiency of a distribution-free test seems to have been COC~RAN (1937), who used a sign test (see below, p. 147) for testing Student's hypothesis, a n d found its efficiency (relative to the uniformly m o s t powerful unbiased t test) to t e n d to 2 / g as n increased infinitely. B y his definition the powers are equivalent if their slopes are equal in the near vicinity of to r This measure of efficiency is of course related to the N e y m a n - Pearson definition of an optimal test, referred to above, a n d m a y be called

asymptotic

local e//iciency

{BLoMQWST, 1950). The same measure was obtained for Wilcoxon's one-sample test {cf. p. 147) b y PITMA~ (1948), who found it to be 3/~. Thus, according to N e y m a n - P e a r s o n ' s criterion, Wilcoxon's test should be preferred to the sign t e s t when ~o 2 contains only normal distributions a n d the sample is large. I~TMA~ applied t h e same technique to Wilcoxon's two-sample test, whose a s y m p t o t i c local efficiency

(9)

ARKIV FOR MATEMATIK. B d 3 n r 9

relative to the t test was found to be 3 / ~ = 0.96 if w~ contained normal distri- butions, 1 for rectangular distributions, a n d 81/64 for a / ' ( 3 ) distribution, giving a rough indication of the applicability of the test for big samples.

I t was felt, however, t h a t the local efficiency of the distribution-free tests decreases when the sample size increases, and thus it was of interest to c o m p u t e the relative local efficiency of some tests also for small samples. VAN D~R VA~_RT (1950) c o m p u t e d the first t w o derivatives of the power functio~ of t h e Wilcoxon two-sample t e s t for 0) 2 normal a n d n _< 6, a n d compared t h e m with t h e corresponding measures for the t test with the same n u m b e r of observations. H e did not work out the value of the relative efficiency as defined above, b u t a p p r o x i m a t e calculations based on his results indicate t h a t t h e local efficiency of Wileoxon's t e s t relative to the t t e s t is a t least 0.95 for these sample sizes.

However, it seemed necessary to look not only for local properties of the tests, b u t for more general properties of the power functions. WALSH (1946, 1949) m a d e extensive calculations for different tests when ~ is a class of n o r m a l distributions, a n d when the power for different FEe% depends only on one p a r a m e t e r , say 0.

H e chose to introduce a new definition of equivalence of power functions and o b t a i n e d w h a t m a y perhaps be called

average e[ficiency.

According to WALSH'S definition, t w o power functions are equivalent if the integral of their difference over the range of 0 corresponding to 0) 2 is equal to 0. This definition is obviously closely related t o LINDLEY'S o p t i m a l i t y criterion, except t h a t WALSH gives the same weight to all I n the examples given above, the power efficiency in some sense was c o m p u t e d for v e r y restricted we, m a i n l y normal ones. Where, moreover, only a s y m p t o t i c expressions were derived, t h e y point mainly to the usefulness of the tests in case 1, a n d m a y serve as rough guides only for case 2.

W i t h o u t assuming normality or some other functional form of the distribution, v e r y few results on the power of distribution-free tests exist. HOEFFDING (1952) has shown t h a t under certain restrictions the tests belonging to t h e p e r m u t a t i o n family generated b y a function yJ (x) h a v e a s y m p t o t i c a l l y the same power against a certain m2 as the corresponding tests of the fixed limit family generated b y y~ (x).

Sometimes, it is possible to define a partition (Q (a)) such t h a t the power for c e r t a i n tests is constant within every ~ (a). I f so, the methods used for p a r a m e t r i c p r o b l e m s can be directly applied. LEHMAN~ (1953) found such a p a r t i t i o n a n d a corresponding class of tests for the two-sample problem. Thus, within this class the tests m a y be compared using, say, WXLSH'S method. I t is, however, rarely possible to find such a partition. Thus, there does not seem to be a n y a t t e m p t so far to develop a general m e t h o d for comparing two distribution-free tests when the reason for their use is t h e second one listed above. Curiously enough, little has been done even for t h e d e v e l o p m e n t of o p t i m a l tests except on a very general or a v e r y specified level.

This m a y be the reason for the common complaint t h a t too m u c h information is lost when using a distribution-free test. Therefore standard tests are applied even when the conditions for their strict validity are not satisfied. Very often, the informa- tion " l o s t " never existed, b u t there m a y be some information regarding t h e distri- bution, e.g. t h a t it is symmetric, unimodal, etc. This information is usually l o s t when using a distribution-free test, since m o s t of t h e m are distribution-free with respect to v e r y general classes of distributions. To increase the applicability of distri- bution-free tests, it seems i m p o r t a n t to find optimal tests with respect to m o r e restricted classes ~ , or in a n y case to r a n k given tests with respect to such classes.

(10)

E . R U I S T ,

Non-parametric hypotheses

F o r the comparison of tests, the m i n i m a x principle seems v e r y well fitted. The criticism against it (cf. p. 138) m a y to some extent be t a k e n into account. If HO~FF- DINO'S m e t h o d of arranging the elements of wz according to their distance from w 1 is used, no numerical weights h a v e to be given to them. I f moreover, Q is restricted t o include only those distributions t h a t are relevant, using all a priori information, t h e m e t h o d of looking primarily to the least favorable of these cannot be criticized as much as when ~2 is too wide. The' following section is devoted to the development of the theoretical set-up necessary for a comparison based on these principles.

5. The n e w c r i t e r i o n

I n consequence with the m i n i m a x principle, we m a y define a test ~ ' as being better in the m i n i m a x sense t h a n another test ~ with respect to a given weight function, if

sup r @', F) < sup r @, F).

F e ~ . F~.O

Of course, the best test according to this criterion is identical with the m i n i m a x t e s t defined on p. 138.

Specifically, we m a y compare all tests ~ E ~ a n d find the best one among these in the m i n i m a x sense. L e t us call it ~{,~. Then, b y definition,

sup r (~0 ~m~, F ) = rain sup r (~, F).

F e . Q 9 ~ F~.Q

Different weight functions will usually yield different ~v ~ . I f for two families ¢ ' a n d ¢ the ~'~'~ relative to a n y weight function belonging to a given class is b e t t e r t h a n the corresponding ~'~), ¢ ' m a y be said to be superior to ¢ in t h e m i n i m a x sense with respect to t h a t class.

As in the case of optimal tests, we m a y take regard to r I only, a n d compare tests of the same size. Thus, ~ m a y be said to be conditionally better t h a n qa in the m i n m a x sense if

sup r 1 (~0~, (F) < sup r I @~, F).

FEOJ I .~'G~ 2

Now, b y an obvious extension of HOEFFDING'S (1951) t h e o r e m 1 on o p t i m a l tests, we h a v e the following

T h e o r e m 2. I f for two tests of size ~,

(2) inf P ( d 2 1 ~ , F) > inf P ( d 2 ] ~ , F ) for all a,

F~bQ ( a ) F~.(2 ( a )

t h e n ~ is conditionally b e t t e r t h a n ~ in the m i n i m a x sense with respect to a n y a-ordered incomplete weight function.

The proof is analogous to HOEFFDINO'S: I f (2) is true, t h e n (3) sup W~ sup P (dl I ~ , F ) < sup W~ sup P (d 11 ~ , F)

a Fe.Q ( a ) a F~.Q ( a )

where Wa is the constant value of W (F, dl) for all F G ~ (a).

(11)

ARKIV F(JR MATEMATIK, B d 3 mr 9

(3) m a y also be written

sup sup W (F, dl) P (d 11 q~, F) < sup sup W (F, dl) P (d 1 [ ~ , F)

a F e ~ ( a ) a F ~ Q (a)

which is b y definition equivalent to

sup rl (~,', F ) < sup r~ (9~, F )

FEo~ Fern2

a n d T~ is conditionally b e t t e r t h a n ~ in the m i n i m a x sense.

Now, if for all c¢ the tests of a f a m i l y ~b' are conditionally b e t t e r t h a n those of another f a m i l y ~5 with respect to all a-ordered incomplete weight functions, t h e n ~b' m a y be said to be conditionally superior to ~b in the m i n i m a x sense w i t h respect to { ~ (a)}.

A weight function where W (F, dl) is a-ordered a n d where V for all F E~01 W ( F , d2)= 0 for a l l F f i o ) 3

:where V is a constant, m a y be called an a-ordered complete weight function. Using such a weight function, we have for all tests of distribution-free families

% (q~, F) = P (d~ I ~ , F) W (F, d2) = ~ V for all F E0)1.

F u r t h e r r = r 1 + r 2, and r i

((~,

F) = 0 for F(~o~ i. Thus, if ~b' is conditionally superior t o ¢ with respect to the corresponding a-ordered incomplete weight function, it

~ollows t h a t

sup r ( q ' , F) _< sup r ( ~ , F ) for a n y ~,

Fe~9 F e ~

i.e. q" is a t least as good as ~ in the m i n i m a x sense. Now, let

~(m)

be of size %. Then ~ ! m u s t be a t least as good as ~(m), a n d a f o r t i o r i q'(m) is also a t least as good as ~(~) for a n y a-ordered complete weight function. Thus, we h a v e established

T h e o r e m 3. I f two test families ~b' a n d ~ are distribution-free with respect to oJ1, a n d ff q)' is conditionally superior to (or equivalent to) ¢ in the m i n i m a x sense with respect to {~(a)}, t h e n ~b' is never inferior to ~b in the m i n i m a x sense with respect t o the class of a-ordered complete weight functions.

As a corollary to Theorems 2 a n d 3 we m a y state:

A test f a m i l y ~b' of m a x i m i n power with respect to {~2 (a)} is conditionally superior t o a n y other test family, with the exception of other test families of m a x i m i n power which, if t h e y exist, are equivalent to ~b' in the m i n i m a x sense, with respect to a n y a-ordered incomplete weight function. I f ~ ' is distribution-free, it is f u r t h e r m o r e superior (or equivalent) to a n y other distribution-free test family, but not necessarily t o a n y other test family, with respect to a n y a-ordered complete weight function.

The consequence of the theorems is t h a t for distribution-free test families, if (2) is satisfied for all :% ~b' is b e t t e r t h a n q} according to r a t h e r general o p t i m a l i t y criteria, whether it is adequate to compare the two kinds of wrong decisions or not. F o r test

(12)

£ . RUIST,

Non-parametric hypotheses

families t h a t are not distribution-free, the relation between the tests can be assured to hold only for each size ~ separately, b u t even so, this w a y of comparing two tests seems attractive. Thus we m a y very often reduce the problem of comparing tw~

test families to the following steps:

a) fix co x and cog.,

b) find a partition {/2 (a)} of eos such t h a t F 1 m a y be said to be further a w a y from Wl t h a n F s if

F1e/2(al) ,

FsE/2(as), and a I > a s,

c) for arbitrary fixed a and ~, find out which test has the bigger lower limit of it~

power function, and

d) see if this relation is independent of a and ~.

I t may, of course, happen t h a t one test is better for some a or a, and worse f o r others. I n t h a t case, the weight function has to be specified numerically in o r d e r to get a discrimination between the tests, and the use of the minimax principle m a y be more severely criticized. I n the following sections, however, we shall be able t o show t h a t there are interesting cases where the relation between the tests is inde- pendent of a and a, and thus a fairly unambiguous choice between them can be made.

Before going over to the examples it is important to notice t h a t the properties of superiority a n d conditional superiority depend rather heavily on the choice of /2.

If /2 is very wide, the test family ¢ ' m a y be superior to ¢ , but i f / 2 is narrowed down, corresponding to an increase of a priori information, ~b m a y well become superior to ~b'. If this is the case, it would be of interest to find o u t when this change takes place.

6. Location tests

I n the remaining part of the paper, the general principles, given in the previous.

section, of comparing two test families, are applied to the problem of testing the~

value of the median ~z in a one-dimensional distribution. The discussion is confined to the one-sided case of testing bt = 0 against ~t > 0. The sample is assumed to consist.

of n independent observations drawn from the population to be tested.

I n order to be able to introduce successively increasing information (by decreasing:

/2) symmetrically into eo 1 and 092, we shall use the following notation:

Let F denote the distribution of the one-dimensional variable, and (X) the cumulative distribution function belonging to it, M 1 the class of all one-dimensional distributions with [z = 0, M s the class of all one-dimensional distributions with ~z > 0,

/2' a class of one-dimensional distributions being considered, e.g. distributions'.

with symmetrical density functions, co 1 = M 1 N /2'

eos = M s N /2' /2 = cox U co2.

B y different choices o f / 2 ' the problem m a y be given different degrees of generality..

I f / 2 ' contains only normal populations, it is the c/assical Student problem of testing the mean of a normal population with unknown variance. For changing/2', or equiva- lently, changing degrees of information, the order between the tests according to.

o u r optimality criterion will probably change. We shall be concerned with a com- parison of tests at a gradual decrease o f / 2 ' from the most general case to more spe- cific ones.

(13)

ARKIV FSR M_~TEMATIK. B d 3 nr 9 Like HOEFFDING, we shall define a p a r t i t i o n of 0)2, a n d s a y t h a t of t w o distribu- tions in eo~, t h a t one is f a r t h e r a w a y f r o m 0) 1 which h a s t h e bigger a = ½ - F (0).

Besides being m a t h e m a t i c a l l y simple, this seems to be a r e l e v a n t measure in several kinds of applications. F ( 0 ) will be d e n o t e d b y q. F o r convenience, we shall write 0)z (q) for Q (½ - q), a n d q-ordered for (½ - q)-ordered w e i g h t functions.

I n t h e following c o m p a r i s o n of tests, r a n k tests will p l a y a p r e d o m i n a n t part, a n d it seems a p p r o p r i a t e first to e x a m i n e this t y p e of tests.

7. R a n k test families

A r a n k test f a m i l y m a y be defined in the following way. L e t t h e sample p o i n t be x = {x,} = {e~y~} (i = 1, 2 . . . n),

w h e r e Yi = ] xi ] a n d e i = sgn x i, We shall assume t h a t t h e d i s t r i b u t i o n s are continuous, so t h a t P {x i = 0} = 0, a n d P {y~ = yj} = 0 for i ~= j. T h e o b s e r v a t i o n s m a y t h e n be o r d e r e d in such a w a y t h a t

Y l < Y~ < "'" < Yn"

I f ~v (x) is a f u n c t i o n of {el} only, t h e fixed limit test f a m i l y a n d t h e p e r m u t a t i o n test

~amily g e n e r a t e d b y ~v (x) coincide a n d constitute a r a n k test f a m i l y . A n y sequence

~1, ez . . . . , e~ defines a region in t h e sample space. There are M = 2 n such basic cells (WALs~, 1949). L e t us d e n o t e t h e m b y Bj (~ = 1 . . . M).

F o r distributions with density functions s y m m e t r i c a l a r o u n d 0, P { x e B j I F e 0 ) ~ } = T ~ 1 for a n y 7".

T h u s , all r a n k test families are distribution-free in t h e class of such s y m m e t r i c a l distributions.

A n y ~v defines a n ordering of the Bj, s a y Bt,, Bt,, . . . , B t M, in the sense t h a t

~) (Btl) ~ (Bt,) ~ - . . . ~ ) (BtM). A r a n k test f a m i l y is c o m p l e t e l y defined b y t h e sequence tl, t2, . . . , tM. Thus, a n y strictly increasing f u n c t i o n of ~v will g e n e r a t e t h e same r a n k test f a m i l y as %

A n a l t e r n a t i v e w a y of describing certain r a n k test families was p o i n t e d o u t b y W~LSH (1949), w h o a r r a n g e d t h e observations in increasing order of m a g n i t u d e . L e t us call t h e $th o b s e r v a t i o n in t h a t order x(,), so t h a t

~(1) < x ( 2 ) < . . . < X ( n ) . T h e n he suggested t h e tests

1 if m i n [½ (xc) + x(~)), ½ (x(k) + x(l)), ...] > 0 (x) = 0 otherwise

where i, ~, k, l, ... are d e t e r m i n e d so as to give t h e desired size to t h e test.

WALSH s h o w e d t h a t an increase in one or several of t h e i, ~, k, l , . . . a m o u n t s t o increasing t h e region where ~ ( x ) = 1 b y one or m o r e basic cells. This means,

(14)

E. RUIST,

Non-parametric hypotheses

in t h e p r e s e n t n o t a t i o n , t h a t to each test f a m i l y of WALSH'S t y p e c o r r e s p o n d s an ordering of t h e Bs, or equivalently, a f u n c t i o n ~o (x). T h e opposite is n o t true, a t is easily seen b y an e x a m p l e : T a k e for simplicity n = 5 . Then, if

5 10

~p (x) = ~ ~, i, for c = 4 we get P {~o (x) = c} = 0 a n d :c = 3-~. Here, ~o ( + + + ) = i = l

= ~ 0 ( + + + + - ) = 5 . I n order to get q 0 ( x ) = l for these sequences also i n WALSH'S f o r m u l a t i o n , we can a t m o s t require rain [½ (x(2) + x(~)), x(3)] > 0 b u t that.

is also satisfied b y t h e sequences ( - + + + - ), a n d ( + - + - + ) w i t h ~0 (x) = 3~

a n d b y ( + - + + - ) with ~0 (x) = 1. WXLSE'S f o r m u l a t i o n of t h e tests is, h o w - ever, v e r y simple in n u m e r i c a l applications a n d could p r o f i t a b l y b y used w h e n - ever possible.

T h e m o s t powerful r a n k test f a m i l y for t h e case when 0) 3 contains one d i s t r i b u t i o n F ' o n l y is g e n e r a t e d b y the f u n c t i o n

~p(B~) = P (xe Bj [ F'}

(cf. LEHlVIANN a n d STEIN, 1949, a n d

HOEFFDING,

1951).

W h e n working exclusively with r a n k tests we m a y i n t r o d u c e slightly m o d i f i e d admissibility concepts. Thus, a test is :c-admissible among rank tests if no u n i f o r m l y m o r e powerful r a n k test exists, a n d it is admissible among rank tests if no u n i f o r m l y b e t t e r r a n k test exists. These t w o concepts are, however, equivalent. I t is t r i v i a l t h a t admissibility a m o n g r a n k tests implies :c-admissibility a m o n g r a n k tests, a n d t h e converse is p r o v e d b y

T h e o r e m 4. A r a n k test t h a t is :c-admissible a m o n g r a n k tests is also admissible a m o n g r a n k tests for a n y w e i g h t function.

P r o o f : L e t ~0~, be :c-admissible a m o n g r a n k tests. N o w suppose t h e t h e o r e m were n o t true. T h e n there would exist a r a n k t e s t ~0~, t h a t is u n i f o r m l y b e t t e r t h a n '

a) Suppose first t h a t :c~ > :cl- B u t t h e n

P (d 2 [ q0~,, F ) > P (d2 [ q0~,, F) for all F E co:

a n d T~, c a n n o t be u n i f o r m l y b e t t e r t h a n q:,.

b) I f :c1:0~2, a n d

P (as I ~0=,, F) > p (d~ I ~'.., F)

for all F E o l , for all F E co2,

with t h e i n e q u a l i t y sign t r u e for at least one F C Q. B u t since t h e tests a r e similar a n d of t h e same size,

P (d 21 q~a,, F) = P (d~ [ ~0~,, F ) = :cl, for all F e o l .

Thus, t h e i n e q u a l i t y sign m u s t hold for some F E co2. B u t t h a t w o u l d i m p l y t h a t ~0~, is u n i f o r m l y m o r e powerful t h a n q0~,, which is impossible, since q0~, is a-admissible a m o n g r a n k tests.

(15)

ARKIV f O R MATEMATIK. B d 3 n r 9

c) If ~ < ~1, we m a y compare ~ with the test of size gl f r o m the s a m e family. As the power with respect to a n y absolutely continuous F E co2 is a n increasing function of :¢,

P(d2]qD~ ~, F)<P(d2]q)~ ,, F) for all F ~eo 2.

Thus, if

P (d2 I q~,, F) _> P (d 2 [ q~',, F) for all F E o)2

~0~, would be uniformly more powerful t h a n ~v~,, which is impossible b y defini- tion. This completes the proof.

A family of r a n k tests, all m e m b e r s of which are admissible a m o n g r a n k tests, m a y be called rank-admissible. Although we m a y perhaps, because of its simplicity, accept a r a n k test family t h a t is not admissible, we would hardly do so, were it n o t rank-admissible.

F o r later reference, we list a few r a n k test families below:

1. The sign test/amily, a p p a r e n t l y first mentioned b y :FISHER (1925), and inves- tigated b y I ) I x o ~ and M o o d (1946). I t is generated b y ~(x) = ~ ei.

i = l

I n a sense, the sign tests are quite crude, as ~o has the same value for several B~..

I t will be convenient in this connection to introduce a n o t h e r notation. L e t T~ be t h e t o t a l i t y of all Bj with ~fl (Bj) = k. To point out which Bj E T k, t h e y will occasionally be n u m b e r e d Bkl , Bg~ . . . B~,%. Since P ( T ~ [eOl) will be r a t h e r big, the tests depend heavily on the outcome of the r a n d o m experiment determining t h e decision w h e n T(x) = p . HEMI~LRIJK (1950 a, b) pointed out, however, t h a t a sign test m a y be regarded as basic to every other test in this situation. Thus, a n y test for the m e d i a n of a s y m m e t r i c a l distribution m a y be considered as a combination of a sign t e s t a n d a conditonal two-sample test, testing whether or not the positive and the negative observations come from identical populations. The functions generating these tests m a y be called ~Pl (x) for the sign tests a n d ~02 (x) for the two-sample tests. Then, t h e combined test family is generated b y a function ~v (~0 I, ~02). HEMV.LRIJK suggested a rule for the construction of this function, b u t gave no justification for it. H o w e v e r , in analyzing different tests, it m a y sometimes be convenient to discuss separately t h e choice of ~o 2 and the construction of ~v. As an example of this, w e shall describe below three test families which are based on the same ~%, b u t where the construction of ~o is different. The c o m m o n ~02 is ~P2 = ~ ei i, and the test families are:

2a. Wilc, oxon's one-sample test /amily, proposed b y WILCOXON (1945). I t h a s

~o = ~02. This test m a y be regarded as the r a n k analogue of the p e r m u t a t i o n t e s t family generated b y ~o (x) = ~, suggested b y FISHIER (1935).

2b. The Wileoxon-Hemelri~k test/amily. The principle of constructing ~o given b y HEMELmJX is easily expressed in the present notation. L e t B k i be n u m b e r e d so t h a t for all k

W2 (Bk ~) > ~V2 (Bk ~) > ' - " > ~v~ (B~ ~ ) . Then, for k = ~ l (B~) > 0 ,

1/i if ~02 (B~) >Y~2 (B~(~+~))

~p (Bz~) = 1/(i + r) if ~2 (Bki) =~V 2 (Bk(i+l)) . . . ~2 (B~(~,r)) >~v2 (Bk(l+r+l)).

F o r k __< 0, ~ is not defined.

(16)

~. RUIST,

Non-parametric hypotheses

2c. The Wilcox(m-sign test/amily.

Here, ~0 = n2~01 + yJ~ which means t h a t

~fl(Bki ) >

>~0(B~j) if k > l, a n d t h e value of ~% is of relevance if k = 1 only. This is a sign test f a m i l y , modified so as to m a k e t h e influence of the r a n d o m e x p e r i m e n t smaller.

To m a k e the difference b e t w e e n test families 2 a - c clear, we give a n e x a m p l e of t h e orders, defined b y t h e different % of t h e Bj when n = 5.

Order among

B i

v2x =~]e~ ~p2 = ~] e4 i 2a 2b 2c

T s : B a x + + + + + 5 15 1 1 1

T a: B 3 ~ - + + + + 3 13 2 2 , 3 2

B 3 z + - + + + 3 11 3 4 , 5 3

B a a + + - - + + 3 9 4 , 5 6 4

B a ~ + + + - + 3 7 6 , 7 7, 8 , 9 5

B a s + + + + - 3 5 8, 9 , 1 0 10 6

T a: B a x - - + + + 1 9 4 , 5 2 , 3 7

B a s - + - + + 1 7 6 , 7 4 , 5 8

B ~ 3 - + + - + 1 5 8, 9 , 1 0 7, 8 , 9 9, 10

B x , + - - + + 1 5 8, 9 , 1 0 7, 8 , 9 9 , 1 0

e t c .

Given a r a n k test family, F ' is a least f a v o r a b l e distribution in w~ (q) relative to it, if

P(Bq]F')= rain ~ P ( B q I F )

for all s = l , 2 , . . . , M .

t = l F~eo2(q) i = 1

T h u s , P ( B s ] F ' ) should, as f a r as possible, be a decreasing f u n c t i o n of ~0 (Bs).

T h e value of t h e m i n i m u m p o w e r depends on t h e restrictions expressed b y t h e size of ~ ' . U n d e r all circumstances, however, for all F E 092 (q), a n d for all

m k n - - I t n + k

Bkf e Tk, ~ P(BktIF) =q 2

( l - - q ) z . This means t h a t if for a r a n k test f a m i l y

t = l

g e n e r a t e d b y a f u n c t i o n ~0 (YJl, ~02) w i t h YJ2 (Bkl) --> yJ~ (Bk2) >-- "'" --> ~02

(Bkmk),

~ P (BkllF)

is m i n i m i z e d b y t h e same d i s t r i b u t i o n F ' for all s a n d k, t h e n F ' is a least f a v o r a b l e d i s t r i b u t i o n in eo z (q) relative to all tests w i t h t h e same ~P2, irrespective of t h e f o r m of ~o. Thus, t h e search for a least f a v o r a b l e distribu- t i o n c a n m o s t p r o f i t a b l y s t a r t b y considering each Tk separately.

As t h e r a n k tests m a k e no use of the absolute m a g n i t u d e s of t h e y's, a n y topo- logical t r a n s f o r m a t i o n of t h e y scale does n o t affect t h e test. B y choosing a suitable t r a n s f o r m a t i o n t h e calculations m a y be m a d e easier. F o r instance, t h e positive side of t h e d i s t r i b u t i o n m a y be m a d e r e c t a n g u l a r over t h e range (0, 1 - q). T h e same t r a n s f o r m a t i o n is t h e n applied t o t h e absolute values of t h e negative side. As in this w a y t h e positive side is m a d e identical f o r all distributions with c o m m o n q, o n l y t h e n e g a t i v e side will be of relevance in t h e following discussion. T h e transfor- m a t i o n is defined b y

x' = I F (J x l) - q] sgn x.

T h e t r a n s f o r m e d c u m u l a t i v e d i s t r i b u t i o n f u n c t i o n b e c o m e s

(17)

ARKIV FOR MATEM.~TIK. B d 3 nr 9

IF, I-F-'

0

+q

T h e d e n s i t y f u n c t i o n is m o s t easily expressed in t e r m s of x :

lr (:~') = 0

d F (x) / (x) d ~ ' ( - z )

1 [ 0

l ( - x )

for

x' < q-- I

q - - l _ x ' < 0

0 < _ x ' < l - q x ' > l - q .

for

x' < q - 1

q - - l _ < x ' < 0

0 < _ x ' < l - q x'>_ 1--q.

I t should be n o t e d t h a t f o r q = ½, i.e. F Eco I, F T is f o r all s y m m e t r i c a l distribu- t i o n s r e c t a n g u l a r ( - ½, ½). Thus, if ~ ' contains o n l y s y m m e t r i c a l distributions, co 1 is r e d u c e d to one element.

As an e x a m p l e of t h e effect of t h e t r a n s f o r m a t i o n for F E co2, let us r e g a r d t h e C a u c h y d i s t r i b u t i o n w i t h F ( x ) = ½ + 1 - arc t g ( x - a) where q = ½ - 1 are t g a, a n d F E c % ( q ) . F o r q - l < _ x ' < 0 , we g e t

F -1 ( q - x ' ) = a +

t g

( q - x ' - ½)

• F T (x') = ½ - - arc tg (2 a + tg 7~ (q - x' - ½)) 1

7~

and, after some calculation,

/T (X') = [1 + 2 a (a -- a cos 2 7~ x' -- sin 2 ~ x')] -1.

This f u n c t i o n has a m i n i m u m for

x ' = q - 1

2 , a n d is s y m m e t r i c a l a r o u n d this p o i n t w i t h / r ( q - 1) = / r (0) = 1.

F o r t h e n o r m a l distribution, t h e explicit expression for F r

(x')

is difficult to obtain, b u t b y expressing [ r

(x')

in t e r m s of x, it is possible to o b t a i n its value in some points. Thus, for N(½, a2), where a S has t o be d e t e r m i n e d so as to m a k e F (0) = q,

/ r (x') = exp{ a~ } for q - 1 ~ x ' < 0 .

As

x ' = q - 1

corresponds to x = - ~ a n d x ' = 0 to

x=O,

we get

/ T ( q - - 1 ) = O

a n d [ r (0) = 1.

d / r ( X ' ) in t e r m s of

x,

a n d g e t W e c a n also express

dx'

d/r(x') dx' c°nstant'exp ( i

• 2 a - - ~ ( x ~ + 3 x + ~ )

}

for q - l _ x ' < 0 .

(18)

~.. RUIST,

Non-parametric hypotheses

Thus, for x" = q - 1, or x = - ~ , t h e derivative is infinite. As x and x' increase to 0, it is first decreasing, then increasing, b u t is always positive.

Generally, for a n y distribution with symmetrical a n d unimodal density function,

/ ( x ) / / ( - x)

can never exceed 1 for x < 0, as long as q < ½, a n d accordingly/T(X')

<_ 1

for

x' < O.

A large group of t r a n s f o r m e d distributions h a v e s y m m e t r i c a l equivalents in ~ ' , as is shown b y

T h e o r e m 5. To every

J(z')

satisfying

(4)

J(x')

= 0 for

x'<_q-1

strictly increasing~ for q - l < x ' < 0

< x ' + l -

q J

= x ' + q for 0 < x ' < 1 - q

= 1 for x ' > l - q

there corresponds a t least one s y m m e t r i c a l distribution for which F r ( x ' ) = J ( x ' ) . P r o o f : L e t F h a v e /z = ½. Then, according to the s y m m e t r y ,

(5)

F ( x ) = I - F ( 1 - x ) .

Now, t a k e fying

in the interval (0, ½) an a r b i t r a r y increasing function

F 1 (x),

saris- F1 (0) = q a n d F 1 (½) = ½.

Then, define F (x) successively in the following w a y :

F (x) =

F I (x) 1 - F 1 (1 - x ) J [ - F 1 ( - x ) + q ] J [ q - 1 + F 1 (1 + x ) ] 1 - J [ q - F1 ( x - 1)]

1 - J [q - 1 + F I (2 - x)]

etc.

for 0 _ < x < ½

½ < x < l - ½ < _ x < 0 - l _ < x < - ½

l < x < 1 ½ 1 ½ < x < 2

using the recursive f o r m u l a

J [ q - F ( - x ) ]

for x < 0

(6)

F ( x ) = 1 - F ( 1 - x )

for x > 0 .

B y construction,

F(x)

satisfies the s y m m e t r y condition (5). I t remains to show t h a t

F(x)

is a cumulative distribution function, i.e. t h a t it is non-decreasing, a n d t h a t F ( - oo) = 0 , and F ( c ¢ ) = 1.

(19)

ARKIV FSa MATEMATIK. Bd 3 nr 9 I n the recursive formula (6), it is seen t h a t F (x) is increasing within a n y interval of the left-hand side if it is increasing in the corresponding interval of the right-hand side. Thus, since F t (x) is increasing, F ( x ) is increasing within a n y interval where the formula is applied. B u t we must also demand t h a t the value of F (x) for x > 0 within one interval (i, i + 1) shall be greater t h a n t h a t obtained in the previous interval (i - 1, i), thus

(7) F ( x ) > F ( x - 1 ) f o r i _ < x _ < i + l , i = 1 , 2 , . . . B u t according to (6),

F ( x ) = 1 - $'(1 - x ) = 1 - J [ q - F ( x - 1)].

P u t x' = q - F ( x - 1). Then (7) becomes

1 - J ( x ' ) > q - x ' for x' < 0 or J ( x ' ) < x ' + l - q .

This condition is satisfied for q - 1 < x' < 0 according to (4), and F (x) is increasing in the whole of the interval 0 < F ( x ) < 1. Thus, F ( x ) is a cumulative distribution function.

A consequence of theorem 5 is t h a t to a n y class g2T of FT with cumulative distri- bution functions satisfying (4), there corresponds a class g2' of symmetrical distri- butions, and to a certain extent we m a y in the following discussion consider the classes g2' and g2~ interchangeably.

8. C o m p a r i s o n o f l o c a t i o n test s

A. We are now in a position to make the comparison between test families outlined in section 6 above. For g 2 ' = the class of all absolutely continuous distributions, HOEF]~DINO (1951) proved t h a t the sign test family is of maximin power with respect to {0) 2 (q)}. Thus, according to the corollary to our Theorems 2 and 3, it is condition- ally superior to a n y other test family and superior to a n y other rank test family with respect to q-ordered weight functions, in both cases with the possible exception of other test families of maximin power. However, if there exists another family of maximin power, its tests m a y be uniformly more powerful and uniformly better t h a n the sign tests, in which case the sign test family will not be admissible or even rank-admissible. HOE•FDING did not discuss this possibihty. We shall study it in subsections B and E below.

B. Let us now reduce g2' to distributions with symmetrical density functions a n d call this class D~. Then the mean of the distribution is, if it exists, equal to the median, and the tests m a y be considered as tests of the mean. We shall see t h a t the situation is not changed from A above. For let FEw1 and GEo:~(q) be two distri- butions with density functions

1-2q ( q '~[1~11

/ (~) - 2 (1 - q) \ V - - q / q ~l[x]l g (x) = (1 -- 2 q) ~,i -- q/

(20)

E. RUIST,

Non-parametric hypotheses

where Ix] means the largest integer

<_x,

also for x < 0. Thus, ] [x]] = [] x l] for x > 0, and for all integer x, and

[I z I]

+ 1 for non-integer x < 0.

Now, since P {x = an integer} ---- 0, we get t I~l g = 2" q"-

(x,)

(l-q)"+

where n_ and n+ are the number of negative and positive among the n observations.

F o r fixed n and q, this is obviously a function of n+ only. B y N e y m a n - P e a r s o n ' s lemma, a test family generated b y ~(x) = n+, or, equivalently, b y y~(x) = n + - n , is most powerful for the case when wx = F and w2 (q) = G. This family is the sign test family. As a special ease of HOEFFDr~G'S (1951) theorems 1 and 2, a similar test 9o" is of minimax risk r 1 with respect to a n y q-ordered incomplete weight func- tion, if for a n y fixed q it is most powerful for a case where eo 1 = F ' and co 2 (q) = F~', and

P (d219O~, F) _> P (d, [ 9~, F'q') for all F e e0~ (q).

I n other words,

F'~'

should be a least favorable distribution in co 2 (q) to 9O~. As the equality sign holds true for all tests of the sign test family, it is, according to our previous results, conditionally superior or equivalent (since there m a y be more t h a n one test family of minimax risk) to a n y other test family and superior or equivalent to a n y other r a n k test family, with respect to a n y q-ordered weight function.

The transform of G, as well as of the distribution used b y I-IoEFFI)ING (1951, p. 91) tO demonstrate the optimality of the sign test family, has the density function [r(X') = ~ for q - 1 _< x' < 0. For reference purposes it m a y be convenient to give a name to this distribution, and we are going to call it the

bi-rectangular

distri- bution. I t gives equal probabilities to all sequences B k ~ belonging to one T k. Thus, in the case when co2 contains this distribution only, all tests with ~v = k ~ + ~P2, where k > max [ Y~2 [, are equivalent a n d most powerful rank tests. Let us call test families generated by such functions

modi]ied sign test ]amilies.

The tests belonging to such a family are essentially sign tests, b u t the random order between the Bk~

within a T k t h a t is characteristic for the sign test family is replaced b y a more or less complete order. The Wilcoxon-sign test family, described above, is an example of a modified sign test family.

We shall now state two i m p o r t a n t theorems on modified sign test families. These theorems will be fundamental to the following discussion.

T h e o r e m 6. If the bi-rectangular distribution belongs to ~o 2 (q), no rank test family t h a t is not a modified sign test family can be superior in the minimax sense to the sign test family with respect to all q-ordered weight functions.

T h e o r e m 7. The sign test family is not rank-admissible if there exists a modified sign test family, the tests of which are uniformly more powerful t h a n those of the sign test family. Such modified sign test families are all equally good in the minimax sense and also as good as the sign test family, b u t superior to a n y other rank test family.

Références

Documents relatifs

For verification of compatibility, the behavior of interacting components, at their interfaces, is modeled by labeled Petri nets with labels representing the requested and

They live in a lovely cottage in Stratford-upon-Avon, England. The house is not very big but my grandmother keeps it clean and tidy. It has only got one floor and the attic. In

We will use the characterization of rank-one convexity through Jensen’s inequality for laminates [8] so that we are interested in determining the exact range for the constant c so

These results are deduced by a general characterization theorem for families of subsets of a Hausdorff locally convex topological vector subspace of.. An

Parametric partial differential equations, partial differential equations with random coefficients, uniform convergence, adaptive methods, operator equations.. ∗ Research supported

In medical school, we learned that health services play a relatively small role in a person’s health status and that the social determinants of health, such as employment,

In recent focused discussions with members of support organizations in Melbourne, Australia, family carers of people with dementia emphasized that a great deal of stress

We mention current limitations that we face with these more advanced uses, in which we need to convince the type checker that the type families we define satisfy certain properties?.