A NNALES DE L ’I. H. P., SECTION B
M. A. S TEPHENS
Components of goodness-of-fit statistics
Annales de l’I. H. P., section B, tome 10, n
o1 (1974), p. 37-54
<http://www.numdam.org/item?id=AIHPB_1974__10_1_37_0>
© Gauthier-Villars, 1974, tous droits réservés.
L’accès aux archives de la revue « Annales de l’I. H. P., section B » (http://www.elsevier.com/locate/anihpb) implique l’accord avec les condi- tions générales d’utilisation (http://www.numdam.org/conditions). Toute uti- lisation commerciale ou impression systématique est constitutive d’une infraction pénale. Toute copie ou impression de ce fichier doit conte- nir la présente mention de copyright.
Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques
http://www.numdam.org/
Components of goodness-of-fit statistics
M. A. STEPHENS Ann. Inst. Henri Poincaré,
Vol. X, n° 1, 1974, p. 37-54.
Section B : Calcul des Probabilités et Statistique.
SUMMARY. - A test that a random
sample
comes from acompletely specified
continuous distribution can be reduced totesting
that agiven
set of n values Xi is
uniformly
distributed between 0 and 1.Suppose Fn(x)
is the E. D. F.
(empirical
distributionfunction)
of the xi, i. e.F~(x) = (num-
ber of
x~ x)/n,
and considerYn(x)
=Fn(x)-x.
Several well-knowngood-
ness-of-fit statistics are based on
integrals involving y~(x).
In a recentpaper, Durbin and Knott
suggested
apartition
of the Cramervon-Mises statisticW2
into an infinite set ofcomponents.
These are related to the coefficients whenYn(x)
isexpanded
as a Fourier sine series in the inter- val0,
1. Thecomponents
werecompared,
forasymptotic
power, with the entire statisticW~.
This idea is now extended in various ways. In
particular
the Watsonstatistic
U2
and theAnderson-Darling A2
arepartitioned
and the compo- nents examined.U2 depends
on a Fourier Seriesexpansion
andA2
onassociated
Legendre
functions.Components
ofW2
andU2
have interest-ing distributions,
andasymptotic
power studies of a moregeneral
naturethan those of Durbin and Knott
bring
outproperties
of thesecomponents
and also of the parent statistics.’
RESUME. - Un test
qu’un
echantillon aléatoireprovient
d’une distribu- tion continue donneepeut
être réduit au testqu’un
ensemble de n valeurs x;est uniformément distribue entre 0 et 1.
Supposons
queF"(x)
est la distri-bution
empirique
des xi, c’est-à-dire(nombre de xi x)/n,
et consi-dérons
yn(x)
=Fn(x)-x.
Plusieursstatistiques
bien connuesd’adequation
sont basees sur des
intégrales comprenant
Dans un articlerecent,
Durbin et Knott ontsuggéré
unepartition
de lastatistique W2
de Cra-mer-von-Mises en une infinite de
composantes.
Celles-ci sont reliees aux coefficients deyn(x)
en series de Fourier en sinus sur l’intervalle0,
1. LesAnnales de 1’Institut Henri Poincaré - Section B - Vol.
37
composantes
ont étécomparées
avec lastatistique W2 asymptotiquement.
Cette idee est étendue dans cet article de diverses manières. En
parti-
culier les
statistiques
de WatsonU2
et deAnderson-Darling A2
sontdécomposées
et leurscomposantes
examinees. Lastatistique U2 depend
d’un
développement
en série deFourier,
tandis queA2
estdéveloppée
enfonction de
Legendre
associees. Lescomposantes
deW2
et deU2
ont desdistributions
interessantes et des formesasymptotiques
d’une natureplus générale
que celles de Durbin et Knottqui
fontapparaître
lesparticularites
de ces
composantes
et desstatistiques apparentées.
INTRODUCTION
In a recent paper, Durbin and Knott
(1972) partitioned
the Cra-mer-von Mises statistic
W2,
used for tests ofgoodness-of-fit,
into an infi-nite set of
components.
For the case when thehypothesised
distribution function iscompletely specified,
thesecomponents
have theinteresting property
thatthey
all have the samedistribution,
to within a constant.Durbin and Knott
(henceforth
referred toby DK) suggested
the use ofsome of the
components
as teststatistics,
andinvestigated
theirasympto-
tic powers,compared
withW2,
for the case when the distribution testedon
Ho
isN(0, 1),
and when thefamily
of alternatives has a different mean or a different variance.They
alsosuggested
how theAnderson-Darling
statistic
A2
could besimilarly partitioned
and gave some power results forA2
and the Watson statisticU2.
These statistics are all defined below.The purpose of this paper is to
provide
various extensions of the DK results with the intention of furtherilluminating
theproperties
of thecomponents
and the entirestatistics, particularly
withrespect
to power.The extensions are
a) U2
is alsopartitioned
intocomponents,
which also have identical and well knowndistributions;
b)
theasymptotic
power studies forW2, U2
andA2
are extended tothe more
general
alternative ofchanges
in both mean and variance of the normaldistribution,
andcomponents
ofU2
andA2
areincluded;
c)
power studies(this phrase always
refers toasymptotic power)
forthe three statistics are
given
for a test for theexponential
distribution.d)
A method ofshowing
power on contour maps is introduced as avery
straightforward
way to compare statistics.Annales de 1’Institut Henri Poincaré - Section B
COMPONENTS OF GOODNESS-OF-FIT STATISTICS 39
The results of these studies
bring
out the relative merits ofW2
andU2
in
detecting primarily
mean shifts or variance shiftsrespectively,
and alsothe overall
superiority
ofA2 (though
results aregiven
tosuggest
that the results forA2
in DK Table4,
varianceshift,
are inerror). They
also revealthat,
among thecomponents,
the firstcomponent
ofU2
is better than anycomponent
ofW2
in a test fornormality against
the realistic alternative which allows for a shift in both mean and variance.The extensions above involve coefficients which enter also into related work on tests for
normality
andexponentiality
whenparameters
in the distribution mustbe
estimated from thedata;
this isbriefly
described inSection 10.
1. THE TEST STATISTICS
Throughout
the paper, we shall followclosely
the DK notation. The definitions of the test statistics are :where the
original
observations are Yt, Y2 ... Ynfrom,
onHo,
a distribu-tion
F(y) ;
x~ =F(y;),
andFn(x)
is theempirical
distribution function(EDF)
of the xi. The
integrals
are all between limits 0 and1;
the statistics often have a suffix n, as inDK,
but this will be omitted. For comments on these statistics see, e. g.DK,
and also Pearson andHartley (1972), Stephens (1970).
2. COMPONENTS OF
UZ
Define
~/M(F~(x) - x),
0 x _ 1. ForW2,
DKexpand yn(x)
as a Fourier sine series in the interval
0,1;
ifwhere
Vol.
Further,
Define also
The last
equality
in(1)
isgiven by
DK and the result(3)
follows in the sameway.
For
U2, expand Yn(x)
as a Fourier series into both sine and cosine compo- nents :then
It then follows that
where E* will
always
indicate summation over even valuesonly
of the index.Il we now define components
and
Annales de 1’Institut Henri Poincaré - Section B
COMPONENTS OF GOODNESS-OF-FIT STATISTICS 41
we have the
parallel
results :DK observed that the
S~
all have the same nulldistribution;
all theR~,
for
even j,
also have the same null distribution.Further,
both these distri- butions arise when n random unit vectors are summed.3. REPRESENTATION ON A CIRCLE
This emerges
clearly
whenthe x;
are recorded around a circle.Suppose OS,
OT are
rectangular
axes for coordinates s, t, and let thecircle,
centre0,
radius 1 cut OS in A =(1, 0)
and B =( - 1, 0).
Markpoints Pi
on thecircle so that the
polar
coordinate0;
= nx;; 0 is measured anticlockwise from OS as initial line. The arclength APi
is then also when the xtare
uniformly
distributed between 0 and 1 thepoints P;
are uniform around the arc AB. Let the vectorsOP,,
i =1, 2,
..., n have vector sum(resul- tant) R 1, length R 1.
ThenS
1 = E cos is the component ofR
1 on thes-axis, and T
1 = E sin is the component on the t-axis.Suppose
now each0;
ismultiplied by
aninteger j,
and then thepoints,
now called
P;,
marked on the circle. Let theresultant
of vectorsOP;
becalled
length R~; clearly S~
andT~
are thecomponents
of on the twoaxes. If the xt were
originally uniform,
the setP;
will beuniformly
distri-buted around the
appropriate portion
of arelength jn;
it is then clear that allS~ for j
even, have the samedistribution,
that ofS2 ;
it is not soeasily
seen thatS~, for j odd,
also has thisdistribution,
but this isproved by
DK.Also, Tj,j
even, has the distributionof S2,
but notT J when j
isodd,
because the
original
observations were on a semicircle andsymmetry
is lost. It follows thatR2j,
allj,
has the same distribution asR2.
The twobasic
distributions,
those ofR2
and ofS~ (or S2)
are therefore those of thelength
of the resultant of n unit vectorsrandomly
oriented around acircle,
and of a component of this resultant on any
arbitrary
axis. These distri- butions have arisen in manyapplications,
and that ofR2
inparticular
hasbeen much studied.
Significance points
forR2
aregiven
in Greenwoodand Durand
(1955).
Goodapproximate
values for bothR2
andS2
statis-tics are obtainable from tables in
Stephens ( 1969).
Forreasonably large
n, is wellapproximated by
a standard normaldistribution,
and2R2/n by
the
x2 distribution,
i. e. u = has the distributionF(u)
= 1 -e-u,
u > 0.Thus,
to make a test based onthe j - th
component ofW2
orU2
onecan calculate the related value of
S~
or ofR~
and refer it to theS2
orR2
distribution
using significance points
asgiven
above. DK alsogives
exactpoints
forz"~ _ (2/n) 1 ~2 S~.
4. NON-NULL DISTRIBUTIONS. GENERAL
The circle
diagram helps
also inexamining
the distributions of compo-nents when
Ho
is nolonger
true. In what follows we areexpanding (for W2)
results
put
verysuccinctly
inDK, § 5,
andadding
new results forU2. Sup-
pose the
hypothesized
distributionF(y),
in theW2
andU2 definition,
isnot the true distribution of y; let this be
FT(y).
When the mean ofFT(y)
is greater than that
of F(y),
the valuesof x;
move towards1,
and thepoints Pi
move round the circle towards B. The first
component
ofW2
will tend to benegative,
so whentesting Ho against
thisalternative, Si
1 is tested forsignificance
in the lower tail.Conversely,
fornegative
meanshift, S
1 istested for
significance
in the upper tail. Whenangles (Ji
are alldoubled,
it is clear that the
points
all tend to lie in the lower semicircle forpositive
mean shift and vice versa. In either case
S2
will not belarge, though T2
would be. When the variance of
FT(y)
isgreater
than that ofF(y),
the valuesof x~ move towards 0 and
1; S~ will
not belarge,
but when theangles
aredoubled,
thepoints
tend to lie in the first and fourthquadrants,
andS2
will be
large
andpositive.
When the true variance is smaller than that ofF(y), S2
will benegative.
ThusS2
tends to detect achange
in variance.R2, being
a combination ofS2
andT2,
will growlarger
for either a meanshift or a variance shift.
These
comments will beamplified
later with nume-rical
comparisons.
We shall seethat,
for tests in the normalfamily, Si
1detects
only
mean shift andS2 only
varianceshift,
butR2
detects a combi-nation of both. The
weighting
of thecomponents gives Si
1 the main role inW2,
andR2
the main role inU2;
thusW2
tends to detect a differencein mean from that
given by Ho,
andU2
a difference in variance. This confirmsprevious
observation(DK; Stephens, 1970;
Pearson and Hart-ley, 1972).
5. NON-NULL DISTRIBUTIONS : COMPONENTS OF
W2
AND OFU2
DK show how
asymptotic
power may be calculated.They
make theassumption
that the tested distribution is of the formF(y, (J) with j9
a vec-tor
of p
parameters. OnHo, fl
isspecified equal
toflo,
and forHA,
the alter-native to
Ho, 0
takes the formj9o
+where y
is a constant vector.Annales de 1’Institut Henri Poincaré - Section B
COMPONENTS OF GOODNESS-OF-FIT STATISTICS 43
Thus for the alternative the
sample
is assumed to comealways
from thesame
family
ofdistributions, with § approaching
itsHo
value as n ~ oo.This is restrictive
compared
with the situationusually envisaged
ingood-
ness-of-fit tests, but enables power studies to be made.
Basically
this is because DK show that in such cases, thezn~
areasymptotically normally
distributed with variance 1 as
before,
but with non-zero means whenHo
is not true; thus the power of a
component
caneasily
be found. Letz~
be a random variable with the
limiting
distribution of as n - oo.Let 03B8i
be the i-thcomponent
or 03B8. Define to beCF(y,
evaluatedat
0
=Qo,
and written in terms of x =F(y, 00);
let be the i-th compo- nent of a vectorg(x).
Letvector
have the i-th componentDK show that the mean of
z~
isIllustrations.
They
illustrate for the two cases where the distribution tested is thenormal,
and 03B8(with only
onecomponent) represents
either themean or the variance. Thus on
Ho
the tested distribution ifN(0, 1 )
and forthe alternative called Mean
Shift
it is1)
and for that called VarianceShift
it isN(0,
1 + For the extension of their results to cover botha mean shift and a
variance shift,
let~o
be(0,1)
and let y be(yl, y2); then,
on
HA, Q
isUo
+ i. =(yl/n,
1 +y2/n).
Thecorresponding
compo-nents of
g(x)
arewhere x is
given by
The 03B4ji values,
for =1,2,
are then found from(7),
andasymptotically znj
has then a normaldistribution,
with mean 1 + and variance 1.Similarly,
it is easy to show thatzn~
will beasymptotically
distribu-ted
1 ),
where vector~;
has ;-th component(i
=1,2)
Then =
z ~
+zn~
hasasymptotically
a non-centralX2 distribution,
d. f.
2,
andnon-centrality
parameterVol. X, n° 1 - 1974.
6. MOMENTS OF
W~
ANDU 2
We now turn to
finding
the moments ofW2
andU2
on the alternative distribution.Suppose
and with valuesrespectively 0.1666,
0.02222 and
0.008466,
are theasymptotic
mean, variance and third cumu-lant of
W~
onHo.
OnHA,
it iseasily
shown that these parameters becomewhere we write
Q~
for( y’~~)’ -
1 +’‘~2a j2)2.
These results may be extended in an obvious way tohigher
cumulants.For
U2,
the nuilasymptotic
cumulants arerespectively
,uo =0.08333,
75
= 0.002777 and X30 = 0.000265. Define nowQ~ _ (y’~~)2
= +y, b~, ) ;
then,
onH,~ :
’
7. COMPARISONS WITH BEST TESTS
DK use the results for z~ to compare the
asymptotic
power ofz
with thatgiven by
the best test when the shift is one of either meanonly
orvariance
only.
For a test of meanshift,
the best statistic is y ; for a5 %
Annales de 1’Institut Henri Poincaré - Section B
45 COMPONENTS OF GOODNESS-OF-FIT STATISTICS
twotailed test, one
rejects Ho if I Jny
> 1.96.Suppose yl
ispositive;
y, is of course zero. The power, for
given
yl, is thenSince is
asymptotically N(0, 1 ),
the power of the test basedon y is
50 % when y
1 = 1.96(taking
the secondprobability
to bezero)
and is 95
% when 03B31 =
3.605.Since z~ is
N(y 1 b~ 1, 1),
theasymptotic
power of the test based onz"~
can
easily
be found for these valuesof yl, and
can becompared
with the 50%
and 95
%
powers of the best test. DKgive
some results in their Table3 ;
forexample, when y
1 =3.605,
so that y has 0.95 power, zihas
power 0.928 and Z 2 has power 0.05.For the variance shift alternative
(y
1 =0),
the best test is based onsample
variance
S2,
and uses the result that the distribution of.The values
of Y2
for50 %
and95 %
best power are 2.772 and 5.098. Parallel calculations to those for the mean shiftgive
theasymptotic
powers of components z,,;against
the best statistics2.
Results aregiven
inDK,
Table 3.8. MEAN AND VARIANCE SHIFT.
PROPERTIES OF COMPONENTS
For this case, of course, no best alternative test exists. An
orthogonality
between the coefficients 6 now leads to
interesting
results for the various statistics. For the test ofnormality, g 1 (x) -
0.5 is an evenfunction,
and~~ 1,
equation (7),
is zero for all even valuesof j. Similarly, b~2
is zero for allodd j,
since
g2(x) -
0.5 is an odd function.Also, ~i
1 will be zerofor j odd,
and~~2 -
0for j
even. Thus~~1 ~~2
= 0 and~~ 1 ~~2
= 0 for allj.
Then z 1 willhave mean
for
any valueof y2,
so that the powerof z for
agiven change
in mean remains the same no matter how
large
is thechange
invariance;
inthe
example given above, when y
=3.605,
z~ 1 has power 0.928 for allchanges
in variance.Similarly,
z2 has meany2~~2,
and hence constantpower for a
given change
invariance,
for allchanges
in mean. These twostatistics
respectively
detectchanges
in meanonly
or in varianceonly.
The
orthogonality
alsosimplifies
thenon-centrality
parameter forR~;
(11)
becomesVol. X, n° 1 - 1974.
9. MEAN AND VARIANCE SHIFT.
CUMULANTS OF
W2
ANDU2
The cumulants of
W2
andU 2 given
in(12)
and(13)
alsosimplify
consi-derably.
For i = 1 and
2,
letWhere,
asbefore, ~*
indicates the sum over even valuesonly of j.
Then
(12) becomes,
forW~.
and,
forU2
Note that this
gives
theinteresting
resultthat,
for the test fornormality,
the increase in any cumulant for a mean and variance shift alternative is the sum of the increases for the
corresponding
mean shiftonly
and variance shiftonly.
10. RELATED RESULTS
It
happens
that the values of and~~~,
necessary for the calculation ofB~r
andB ~,
were known to the author becausethey
ariseagain
in the
following
connection.Suppose
a test is to be made for the dis-tribution
F(x ; 0),
but 03B8 is not known and must be estimated from the data.Under certain
conditions,
theasymptotic
distributions ofW2
andU2
are still
expressible
in the form of an infinite sum ofweighted x2 variables,
Annales de 1’Institut Henri Poincaré - Section B
COMPONENTS OF GOODNESS-OF-FIT STATISTICS 47
as
given by (2)
and(4),
but theweights
are nolonger 2.
In the cal-culation of the new
weights,
one needsgt(x)
and as definedin §
5. Thesequantities
were found in order to calculate theweights
for the cases whereF(x; 0)
is the normaldistribution,
and 0 is the vector(p, ~2),
one or bothof which must be
estimated,
and also whereF(x; 0)
= 1 - exp( - 8x),
0 to be estimated. From the
weights
were calculatedasymptotic
percen-tage points
forW2, U2
andA2
for these cases(Stephens, 1971).
When a
parameter
is to beestimated,
theasymptotic
mean ofW2
orU2
decreases. Let
Ai, A~
be the decreases for the caseswhere,
in a test for nor-mality,
theparameter
estimated is the mean or variancerespectively,
andlet
A3
be the decrease when both are to be estimated. Then it may be shown thatwhere gl (x), g2(x)
aregiven
in(8)
and A 11,A22
in( 15).
ForU2
thecorrespond- ing
decreases areIt
happens
that theintegrals in 03941 and 03942
forW2
can bedirectly calculated,
so that a check is available on the accuracy
of A11
andA22
calculated from the 6’s.Similarly, integrals
exist forO1
andA~
forU2,
andparallel
resultshold for the statistic
A2.
11. THE STATISTIC
A~
This statistic may also be
partitioned (DK),
Section6):
let u~ =2xt - 1,
and let -
then
where
c_
and
Pi.)
areLegendre polynomials.
In
particular
and
where u is the mean of the Ui. The Ui are uniform
( - 1,1)
and thesestatistics, particularly
u, have arisen in other connections. Tables ofpercentage points
of u and of B are inStephens (1966;
B has the same distribution as T in thatpaper)
so tests based on thesecomponents
can be made. It wouldnot be difficult to
give good approximations
topercentage points
forhigher
order
components
also.Asymptotically,
eachz"J
isN(0,1)
onHo;
non-nullasymptotic theory
is similar to that for
W2.
Acomponent zn~
will beasymptotically normal,
with variance 1 and
mean 03B4j
where03B4j
hascomponents 03B4ji given by
PJ(2x - 1 )
is a normalised Ferrar’s associatedLegendre function,
andg‘(x)
is determined
by Ho
as in Section 5. Thusagain asymptotic
powers of com-ponents
may be found. For the normal case, withg~(x) given by (8), ~i
1(corresponding
to mean shiftalternatives)
is 0for j
even, = 0for j odd,
so, as withW2,
individual oddcomponents
will have constant power for agiven
meanshift,
no matter how the variancechanges,
and vice versafor even
components.
For the entire
statistic,
formulas(12)
and(13)
and later(15) (16)
and(17),
will hold for
A2, using
the of(22) and,
in the denominators of theformulas, replacing j2n2 by j(j
+1), j4n4 by j2(j
+1 )2,
andj6n6 by j3(j
+1 )3.
Thedecreases in mean when a test is made with estimated
parameters
are now,using
the notation of section10,
the results
correspond
to those in( 18)
forW2.
As for
W2
andU2,
the~~~
were available fromprevious
work on testswith estimated parameters, so non-null cumulants of
A2
could befound,
and the distribution
approximated.
Annales de 1’Institut Henri Poincaré - Section B
49 COMPONENTS OF GOODNESS-OF-FIT STATISTICS
12. POWER STUDIES : TEST FOR NORMALITY
Power studies have been made on
W2, U2
andA2,
when the nullhypo-
thesis is that the distribution is
N(0,1)
and on the alternative it isThese make use of the coefficients
~~i, already
available fromprevious work,
as described above. Theasymptotic
power of acomponent S
ofW~
can
easily
be found from normaltables, using
the fact thatasymptotically z"~
is
1) where ~
= = +y2~~2, Similarly
the power of a compo- nentR~
ofU2
can be foundusing
non-centralx2
tables and(14).
For thecomplete
statisticsW2, U2
andA2,
the cumulants of the non-nullasympto-
tic distribution were found from(16)
and(17),
and these were used to fitan
approximating
distribution of the form a + with constants a,b,
p chosen so that the first three cumulants match the true values. Acomputer
programgiving
thexp integral
was then used to find the power.Only
two-sided tests have been
considered,
for components, and one-sided tests for the entire statistics. DK used anotherapproximation;
their calculations have beenrepeated using
the a +b~2p form,
and results agree with theirTable 4 very
closely, certainly
to the accuracyrequired
for power compa-risons, except
at onepoint.
These are the results forA2,
varianceshift,
for valuesof y2
= 2.77 and5.095,
the values used in their Table4,
columns 3and 4
(y
is thenzero).
Their values are .40 and.89,
but our calculationsgive
.15 and .52. Monte Carlo power studies were made to check this
discrepancy using
1 000samples
of size 100 for y2 = 2.77 and 4 300samples
of size 100 for y2 = 5.095. Since we have toextrapolate
from a finitesample
size n = 100to n - oo, we record also the Monte Carlo and theoretical results for
W2
andU2.
Thefollowing
tablegives
theproportion
ofsamples
declaredsignificant using
a5 °/
test.The results suggest that the DK values for
A2
are incorrect.13. CONTOUR MAPS OF POWER
Graphical
methodsprovide
an effective way to compare the abilities of the statistics to detect combinations ofchanges
in mean and variance.In
Figures
1 and 2 valuesof 03B321
andy2 representing changes
in mean and invariance
respectively
are recorded on the x-axis andy-axis.
On thegraphs,
lines of constant power, called
isodynes,
have been drawn.In
Figure 1, they
are shown forW2, U2
and forR2,
the firstcomponent of U2 (called U 1
on thegraph)
and for power valuesof 20, 40, 60, 80 % .
In
Figure 2, isodynes
are drawn forA2, W2
andU2.
Thecomplete
setare shown for
A2,
butonly
the 40%
and 80% isodynes
forW2
andU2.
For
comparison
withDK
Table4,
note that the four columns in that tablecorrespond
topoints
on thegraphs
with coordinates(YI, y2) equal
to
(3.84,0) (13.00,0) (0,7.67) (0,25.95).
Annales de 1’Institut Henri Poincaré - Section B FIG. 1. - Test for Normality : W~, U2 and components.
COMPONENTS OF GOODNESS-OF-FIT STATISTICS 51
Comments on the
graphs. (a)
A statistic will be morepowerful
than ano-ther if its
isodyne
lies nearer theorigin.
Thus we can see that for this situa-tion,
a test forN(0,1)
with a normalalternative, A2
iseverywhere
morepowerful
thanW2. U2 however,
becomes better thanA2
as the varianceshift,
on thealternative,
becomeslarger.
Aninteresting
way to make compa- risons is to calculate the areas enclosedby
anisodyne
and the axes, for agiven
statistic andgiven
power. This cannot beabsolute,
since itdepends
on the
parameters
on the axes, but since the choiceyi, y2 gives isodynes
which are
nearly straight lines-incidentally
aninteresting
result theseareas have been calculated and are
given
in Table I. This table shows aclear overall
superiority
forA2.
(b) Turning
now tocomponents,
we know thatS~,
thej-th component
of
W~,
will haveisodynes parallel
to they-axis for j
odd and to the x-axisfor j
even. Similar results hold for thecomponents of A2.
Theintercepts
forthe first two components, denoted
by
asubscript (e.
g.WI, W~)
are shownin
Figures
1 and 2. Theisodynes
forW 1
andA 1
cut the x-axis almost atthe same
points
as those of the entirestatistic;
butas y2 increases, they
moveFIG. 2. - Test for Normality : W2, U2 and A2.
parallel
tothe y2
axis while that ofW2
moves towards the axis. Thus there would benothing gained
inusing
the firstcomponent
rather than the entire statistic. If one were more anxious toguard against
a varianceshift, however,
the second
components W~
andA~
will beconsiderably
better than theparent
statistic for smallchanges
in mean. The area-measure for thesecomponents
isinfinite, reflecting
the totalinsensitivity
to oneparameter.
(c)
The firstcomponent
ofU2
has much betterproperties;
itsisodynes,
from
( 14)
arestraight lines,
from axis to axis and it isnearly
aspowerful
as
U 2
itself. Area-measures aregiven
in Table 1.TABLE I. -
Comparison ofW2, U2 and A2 isodynes.
The table
gives
theapproximate
area enclosedby
theisodyne
and the two axes, to the nearest unit.In the
general goodness-of-fit situation,
one would want a test statisticto be
sensitive, ideally,
to achange
in the entiredistribution; certainly,
if one tests for
N(0, 1 )
and holds to a normal distribution on thealternative,
one would
hope
that a test statistic would be sensitive tochanges
in bothparameters J1
and62.
Thus in acomparison
of componentsR2
would seemto be the
best,
in this situation.Intuitively,
this isappealing;
if oneplots
the
points
on acircle,
it seems more intuitive to use the entire resultant of the n unit vectors soformed,
rather than acomponent
on aspecified
axis.14. POWER
STUDIES :
TEST FOR EXPONENTIALITY~ . The
comparisons
above may be extended to a test forexponentiality,
where the null
hypothesis Ho
is to test that theparent
distribution isF(y)
= 1 - exp( - 0y) with 0
=1, and,
onHA,
thefamily
remains expo-nential,
with B = 1 +~,~3/. Jn.
The powertheory
follows apattern
similarto that for the normal. The
single
parameter y3gives
asingle
b~
is then found from(7)
or(n) and
from(10).
Means ofcomponents,
and cumulants of the entirestatistics,
aregiven
as for the normal case, but withonly
onecomponent
and~~.
Values of these constants were Annales de l’Institut Henri Poincaré - Section BCOMPONENTS OF GOODNESS-OF-FIT STATISTICS 53
again
available to the author from work described in Section10,
and leadto power studies summarized in
Figure
3 and Table II. The best test usesstatistic y;
asymptotically,
onHo, n(8y - 1)
isN(0, 1 ).
For the best test tohave
50 % asymptotic
power, 03B33 =1.96,
i. e.y3 - 3.84,
and for95 %
power
y3
= 13.00. Thesepoints
also are shown inFigure
3. In thisFigure,
powers are
compared directly
sinceonly
oneparameter
in involved. Resultson components are
given
in Table II to avoid too much detail inFigure
3.Comments.
A2
isagain
the best statistic of thethree,
withW2
a closesecond; U2
isrelatively
weak. For all threestatistics,
the first componentsare almost as efficient as the entire
statistic,
but results for second compo- nents areconsiderably
poorer.~. Test Exponentiality: W ~, U ‘, A-’.
TABLE II. -
Test for exponentiality.
Values of
y3
togive
powerP( %)
forA2, W2
andU2,
and their first components.
Final remarks. The combined results
suggest
that there is very littlegained
in
taking
acomponent
rather than the entirestatistic,
in the usual case ofa very
general
alternative to the null. Thetechnique
ofasymptotic
power calculation introducedby
DK and extended here is very useful togive
measures of
comparison
for the various statistics and theisodyne graphs
illustrate the results
probably
better than extensive tables. Thisparticular study
bears out results inStephens ( 1972),
whereisodyne graphs
are usedfor
comparisons
based on Monte Carlostudies,
thatA2
is a very valuable statistic in a wide class of situations. On the theoreticalside,
it will be interest-ing
to see extensions of the DK work tofinding components
for the case whereparameters
of the distribution are estimated beforeFn(x)
is found.ACKNOWLEDGEMENTS
The author attended the Conference on uses of the
empirical
distri-bution hosted
by
S. U. N. Y. at Buffalo inSeptember 1971,
where a firstdraft of the Durbin and Knott paper was
presented,
and expresses hisgratitude
to Professor E. Parzen for the invitation toparticipate
and toN. S. F. for their
support.
This work was alsosupported by
the U. S. officeof Naval Research and
by
the National Research Council ofCanada,
andthe author expresses his
gratitude
to theseagencies:
also to the FrenchGovernment,
and to Professor J. R.Barra,
for avisiting Professorship
and invitation to visit the
University
ofGrenoble, 1971-1972,
where thiswork was
begun.
REFERENCES
J. DURBIN and M. KNOTT, Components of Cramer-von Mises Statistics I. Jour. Royal
Statist. Assn. B., t. 34, 1972, p. 290-307.
J. A. GREENWOOD and D. DURAND, The distribution of length and components of the
sum of n random unit vectors. Ann. Math. Statist., t. 26, 1955, p. 233-246.
E. S. PEARSON and H. O. HARTLEY, Biometrika Tables for Statisticians, vol. 2 : Cambridge : Cambridge University Press, 1972.
M. A. STEPHENS, Statistics connected with the uniform distribution; percentage points and application to tests for randomness of directions. Biometrika, t. 53, 1966, p. 235-240.
M. A. STEPHENS, Tests for randomness of directions against two circular alternatives.
J. Amer. Statist. Assn., t. 64, 1969, p. 250-289.
M. A. STEPHENS, Use of the Kolmogorov-Smirnov, Cramer-von Mises and related sta- tistics without extensive tables. Jour. Royal Statist. Assn. B., t. 32, 1970, p. 115-122.
M. A. STEPHENS, Asymptotic results for goodness-of-fit statistics when parameters must be estimated. Tech. Report., Department of Statistics, Stanford Univ., 1971.
M. A. STEPHENS, E. D. F. Statistics for Goodness-of-fit. Parts 1, 2, 3. Tech. Report, Depart-
ment of Statistics, Stanford Univ.. 1972.
(Manuscrit reçu le 18 octobre 1973) .
Annales de 1’Institut Henri Poincaré - Section B