P ERSI D IACONIS D AVID F REEDMAN
A dozen de Finetti-style results in search of a theory
Annales de l’I. H. P., section B, tome 23, n
oS2 (1987), p. 397-423
<http://www.numdam.org/item?id=AIHPB_1987__23_S2_397_0>
© Gauthier-Villars, 1987, tous droits réservés.
L’accès aux archives de la revue « Annales de l’I. H. P., section B » (http://www.elsevier.com/locate/anihpb) implique l’accord avec les condi- tions générales d’utilisation (http://www.numdam.org/conditions). Toute uti- lisation commerciale ou impression systématique est constitutive d’une infraction pénale. Toute copie ou impression de ce fichier doit conte- nir la présente mention de copyright.
Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques
A dozen de Finetti-style results
in search of
atheory
Persi
DIACONIS (1)
David
FREEDMAN (2)
Department of Statistics, Stanford, California 94305
Department of Statistics, University of California, Berkeley, California 94720
ABSTRACT. - The first k coordinates of a
point uniformly
distributedover the
n-sphere
areindependent
standard normalvariables,
in the limitas n - oo with k fixed. If k - oo the theorem still
holds,
even in the senseof variation
distance, provided k
= o(n).
The main result of this paper is afairly sharp
bound on the variation distance. The boundgives
anotherproof
of the fact thatorthogonally
invariantprobabilities
on R °° are scalemixtures of sequences of iid standard normals. Similar results are
given
for the
exponential, geometric
and Poisson distributions. We do not have theright general
theorem.Key words : de Finetti’s theorem, exchangeable, symmetric, variation distance, binomial, multinomial, Poisson, geometric, normal, orthogonally invariant.
( 1)
Research partially supported by NSF Grant DMS86-00235.(~)
Research partially supported by NSF Grant DMS86-01634. I worked on this paper while visiting IIASA, Laxenburg, Austria, in 1983.Classification A. M.S. : 60 G 10, 60 J 05.
Annales de l’Institut Henri Poincaré - Probabilités et Statistiques - 0294-1449 Sup. 87/02/Vol. 23/397/27/$4,70/(g) Gauthier-Villars
distribué sur la
sphere
de dimension n secomportent
comme des variablesgaussiennes
réduitesindépendantes quand n -~
oo avec k fixe. Si k - oo le théorème reste vrai meme au sens de la distance de lavariation,
pourvuque k
= o(n).
Leprincipal
résultat de cet article est une borne assezprecise
sur la distance de la variation. Cette borne donne une autre dimension du fait que des
probabilités
invariantes par les transformationsorthogonales
sur R °° sont des
melanges
de suites de variablesgaussiennes
réduitesindépendantes.
On donne des résultatsanalogues
pour des distributionsexponentielles géométriques
et de Poisson. Nous ne savons pas de théorèmereprésentant
le bon cadregeneral.
1. INTRODUCTION
Let ~
be chosen at random on the surface of thesphere
{03BE: 03A3 03BE2i=n}.
Then03BE1, ... , 03BEk
are for kfixed,
in the limit oo,independent
standard normal variables. This result isusually - but
wethink
incorrectly - attributed
to Poincaré( 1912).
Thehistory,
and theconnection with
Lévy’s work,
will be discussed in Section 6 below. We allow k = o(n),
agrowth
condition which is necessary as well as sufficient.We
get
areasonably sharp
bound on the variation distance between the law ...,~~
and the law of kindependent
standard normals: this bound isessentially
2k /n.
More
formally,
letQnrk
be the law ...,~k,
when( ~ 1,
...,~k, 03BEk+1,..., 03BEn)
isuniformly
distributed over the surface of thesphere
{ 1;: = r2 .
LetIf
be the law of ..., where the03B6’s
areindependent
standard normals. Section 2 provesThe order
k/n
isright, although
the 2 is notsharp.
Theinequality
hascontent
only
when k( 1 /2) n - 3.
sequence
orthogonally
invariant if for every n, the law ofX 1,
...,Xn
is invariant under all
orthogonal
transformations of R n. A sequence isorthogonally
invariant iff it is a scale mixture of iid standardnormals,
aresult
usually
attributed toSchoenberg ( 1938);
and see Freedman(1962).
This theorem is false for finite sequences;
indeed,
isorthogonally
invariant but not a mixture of normals.
Inequality ( 1), however,
does lead to a finite version of therepresentation
theorem,
and then the infinite version followsby
a passage to the limit.For the finite
version,
suppose(X 1,
... ,Xn)
are northogonally
invariantrandom variables. Let
Pk
be the law ofXi,
...,Xk;
recall that~1, ~2, ...
are
independent
standard normal variablesand Pa
is the law ofa~l, ... ,
LetP~,k = where Il
is aprobability
on[0, oo).
(2)
THEOREM. -If
...,Xn
areorthogonally invariant,
there is aprobability
~, on[0, oo)
such thatfor
1 _ k__
n -4,
In
short,
the first k of northogonally
invariant variables are within about2 k/n
of a scale mixture of iid standard normals. This is almost immediate from(1). Indeed,
consider the class oforthogonally
invariantprobabilities Pn
in Rn. This is convex, and thetypical
extremepoint
is theuniform distribution on the
sphere
of radius r.Clearly,
ifPE Pn,
where ~, is the P-law
By convexity,
it isenough
to prove(2)
for the extreme and that is(1).
Themixing
measure can be taken as the law of/ - ~T~ 03A3 X?.
B ~ ~=1
Again,
the~
rate issharp, although
the 2 is wrong. This is a bit morecomplicated
to argue: ifP=Q~,~,~
and~
is bounded away from1,
(Diaconis
andFreedman, 1980,
sec.4)
on the binomial.The infinite case, ie
Schoenberg’s representation
theorem fororthogonal invariance,
follows from the finite. LetXi, X2,
... be an infinite sequenceof
orthogonally
invariant randomvariables,
with law P on R °°. LetPa
be the law of ... where the
ç’s
are iid standard normals. LetP, =
(3)
THEOREM. - LetX 2,
... beorthogonally
invariant. Then there isa
unique y
withProof -
Foreach n,
let ...,Xn
be the first n variables in the sequence, with lawPn. Clearly, Apply
theorem(2), getting
aprobability Jln
on[0, oo )
withThe sequence ~" is
tight,
becausegoes to zero if n ~ oo and then oc --~ oo : but
Prob ~ ~
>a ~
-~ 1 asoo . For
existence, let fl
be asubsequential
limit of fln:clearly, P~
weak staralong
thesubsequence.
Foruniqueness,
determines
by
theuniqueness
theorem forLaplace
transforms. For a moregeneral
and lessanalytic proof
ofuniqueness,
see Dubins andFreedman
(1979,
Theorem3.4).
For an even more abstract version ofuniqueness,
see Diaconis and Freedman( 1984,
Theorem4 . 15).
0 We have similar results in three other cases.Again,
the rates aresharp,
but not the constants. The
argument
from the bound to the finite version of de Finetti’s theorem to be the infinite is the same in all cases, and will not be discussed in detail eachtime;
nor will thesharpness
of the rates.If ~ _ (~1,
...,~n)
is uniform on thesimplex ~~ >__ 0
then~1,
...,~k
arenearly independent
andexponential
withparameter n/s :
the variation error is at most
This leads to a characterization of mixtures of
independent exponential variables,
as uniform on thesimplex given
the sum. Details on the boundare in Section 3.
The
geometric
This follows the
pattern
for theexponential, restricting
attention tononnegative integer-valued
variables. The exact bound is morecompli-
cated :
Details on the bound are in Section 4.
The Poisson
The
analog
of Poincaré’s theorem is a Poissonapproximation
to themultinomial:
Drop
s balls into n boxes. Count the numberfalling
intobox
1,
box2,
..., box k. These k counts arenearly
iid Poisson variables withparameter s/n:
the variation error at most1.2 k/n, according
toKersten
( 1963);
also see Vervaat(1970).
Mixtures of iid Poisson variablescan now be characterized as
being conditionally
multinomialgiven
theirsum. Details on the bound are in Section 5.
2. THE NORMAL CASE
We
begin
with somegeneral
remarks on variation distance. Let P andQ
beprobabilities
on the measurable space(Q, F).
ThenIf F is
understood,
it maydropped.
The 2 is a conventional nuisance factor.Clearly,
the sup
being
taken over all F-measurable cp with0 cp _
1.If P and
Q
areabsolutely
continuous withrespect
to a 03C3-finite referencemeasure p,
having densities p
and q, thenwhere f ~ =fwhenf>O
and 0 otherwise.Let E be a sub o-field of F which is
sufficient
in the the sense that forall
(2. 4)
LEMMA. -If 03A3~F is sufficient,
Proof. -
In the otherdirection,
ifthen
where
(p=P(A X)
and(2 . 2)
was used for theinequality.
0We are now
ready
to proveinequality (1).
Proof of (1).
- Since variation distance is invariant under 1-1mappings,
e. g.
scaling,
it suffices to taker = In
soc~ = r/~n =1.
Recall thatn
~ _ ( ~ ~ ~ ~ ~ ~ ~ ~x~ ~+1,
... ,~n)
is uniform on thewhile çt,
i
~ ...
are iidN(0, 1).
LetQ
be theQnrk-law
...+ ~~
andP
thelaw ...
+ ~k . By
lemma( 2 . 4),
We realize
Q
as the law ...,~n/R,
wherethe law of
i. e., n
times a beta[k/2, (n - k)/2]
variable(Cramer, 1946,
sec.18 . 4).
Thus, Q
hasdensity
On the other
hand, P
isxk
withdensity
(Cramer, 1946,
sec.18 .1 ) . By (2. 3)
Clearly,
where
We must estimate h and
A,
and this is the core of theproof.
Webegin
with
h,
and claimIndeed,
an easy calculation shows thatNext,
we claimIndeed, I-’ (z + 1 ) = 2 I-’ (z),
soThen
This proves
(2. 8). Combining (2. 7-8) gives
If k is even with 1
~ k __ n - 3,
then(2 . 9)
and(2.6)
showIf k is odd with 1
_ k _ n - 4,
then k + 1 is even, andThis
completes
theproof.
0(2.10)
Remark. - Lemma(2. 4)
is usedonly
tosimplify
thecalculations;
k
indeed,
witht2 = ~ x2,
thedensity
of at ...xk)
is1
and 0 for
(2.11)
Remark onunique lifting.
- Theuniqueness part
of Theorem(3)
is not
surprising,
becauseuniqueness always
holds for infinite versions of de Finetti’s theorem. For finiteversions,
the situation is morecomplicated:
for
example,
ifPp
makes ...,Xn
iid coin tosses with successprobabil- ity
p, thenonly
determines the first n moments of 11, so 11 isnot
unique.
Fororthogonally
invariantprobabilities, however, uniqueness
holds even for the finite version of de Finetti’s theorem - and a little more.
To state the
result,
letCn
be the convex set of allorthogonally
invariantprobabilities
onRn,
andPE Cn}, iff 03C0=Pk
is theP-law of the first k
coordinates,
for some Then(i) Cn
is asimplex
with extremepoints
theQnrn.
(ii)
P isuniquely
determinedby Pk.
(iii) Cnk
is asimplex.
(iv)
The extremepoints
ofCnk
are theProof - (i)
P= where 03BB is the P-law of(ii)
Itsuffices to show that
P1
determines P.By orthogonal invariance,
the characteristic function of Pdepends only on t tf,
say as (p( t ).
Now the characteristic function of
P~
isand this determines (p, so
P, by Lévy’s uniqueness
theorem. ThatPk
determines P is the
unique lifting property.
Claims(iii)
and(iv)
followby general arguments
from theunique lifting property.
0(2 . 12)
Remark onsharpness. -
The rates in( 1)
and(2)
aresharp,
butnot the constants.
Indeed,
let k and n tend to oo, with lim supk/n
1.We think we can prove is
minimized,
at leastasymptotically,
when u ispoint
mass at 1. NowHere,
andk
with Z a standard normal.
Inf ormally, ~ ~2
is n. beta[k/2, (n - k)/2]
whichis
asymptotically
normal with mean k and variancenearly 2 k 12014- );
while 03A303B62i
isxfl
which is aboutN (k, 2 k). Multiplying
a centered normali
variable
by 1- 9 changes
the distributionby (p(0)
in variationdistance,
and cp(6) ,: y8
for small 8. Details areomitted,
but see the next remark.(2.13)
A moregeneral
result. - We belive that we haveproved
the resultin
(2. 12)
for afairly
broad class ofexponential families, including
eg scale mixtures of gammas with fixedshape,
orshape
mixtures of gammas with fixed scale. Werequire uniformly
bounded standardized fourth moment and some smoothness in the carrier measure, in order toget
uniform bounds like(1)
or(2).
See Diaconis and Freedman(1986).
Morespecifi- cally,
let I be an open interval(a, b), possibly
infinite. Leth >_ 0
on I belocally integrable.
LetA = ( a, fl)
be open, and for we have theexponential
withdensity
’
where
Let mx be the mean of
P~,
anda~
thevariance,
and~r~
the characteristic function. Asusual,
mx is continuous andstrictly increasing.
We assume(i)
A ismaximal,
in the sensethat
mx - a as 03BB ~ a while b as03BB ~ 03B2.
Let ...,
X n
be iidP~.
ThenS =X
+ ...+ Xn
is sufficient for À.Let
Q~~
be the usual version of theregular
conditional distribution forXl,
...,Xn given
S = s. Let Let chosen so= s/n.
Let n and
These estimates are uniform in s with
Dropping
theuniformity conditions,
the estimates still holdprovided
~,* is fixed in I ands/n
--~but the error
depends
on ~,*. Under suchconditions,
we can also prove thefollowing.
(c) Suppose
8 with 0 6 1. Fix ~,*E I;
for each n, choose s ass/n
=Then ~ Qnsk - Pk~~~Qnsk- Pk03BB*~
+ o( 1 ).
The idea of the
proof
for e. g.(a)
is this: letfk , 1
be theP03BB-density
ofX~ + ... +Xk.
Then the norm in(a)
is(s - t)/fn, ~* (s)
can be estimatedby Edgeworth.
Fork = o (n)
thisturns
out,
to agood approximation,
to beFor
(C),
letp, = lfk ’ x y (dX) be the P -density
of X1
+ ... Let Q
be
the
Qnsk-law
°fX i
+ ... +Xk,
withdensity
q. Letf= fk,
x*. ThenChoose for K the interval
essentially
whereQ > P~.
The firstintegral
isc~ ( 8) + o{ 1 ) .
The second ispositive,
up too { 1): only point
masses need be considered for ~,. We havea similar bound for k = o
(n).
These estimates
apply
to the normal andexponential
distributions.They
do notapply fully
to thegeometric
or Poisson cases, because the standardized fourth moments blow up near zero.However,
the estimates doapply locally,
and demonstrate thesharpness
of thek/n
rate.3. THE EXPONENTIAL CASE
The main result of this section shows that the uniform distribution on a
simplex
hasapproximately independent exponential
coordinates. Some of thereasoning
in theprevious
section will be useful in this connection and can be abstracted as a lemma. This is a more direct version of(Diaconis
andFreedman, 1980,
p.757).
(3.1)
LEMMA:Then
cp (4) = o,
cp isstrictly decreasing, strictly
concave, and cp(N - ) = -
r.(b)
For0 x l, let f(x) = -(I-x) log (l-x)-x.
Thenf(O)=O, f
isstrictly decreasing, strictly
concave, andf ( 1- ) _ -1.
Proof -
Claims(a)
and(b)
areelementary; (c)
and(d)
follow. Claim(e)
iselementary.
For thelog
of the left side is bounded aboveby
The
log
of theright
side is bounded belowby
The
expression
in( 3 . 2)
is smaller than that in( 3 . 3), by
Jensen’sinequa- lity. 0
We are now
ready
to state and prove theanalog
ofinequality ( 1)
forthe uniform. Let
P~
be the lawof ~ 1, ... , ~~
which are iidexponentials
with
parameter ~,,
soP~ ~ ~L > y ~ = e -’~ y. Given § 1 +
...+ ~n = s,
the~’s
areconditionally
uniform on thesimplex.
LetQnsk
be the law of~1, ... , ~k where ~ _ ( ~ 1,
...,~k, ~k + 1, ... , ~n)
is uniform on thesimplex ~i >__ 0
forall f =
-s ~ ~
The next result shows that isnearly Pn~~, provided
i
J
k = o (n). Informally,
this is because~1 + ... + ~n
ispractically
constant,so the
conditioning
is immaterial.Proof -
It suffices to do the case s = n. The sum is sufficient so lemma(2.4) applies,
andwhere
Q
is theQ"sk
lawof §1
+ ...+ 03BEk
andP
is theP’1law of 03B61
+ ...+ 03B6k.
We realize
Q
as the law of ..., n where S= ~ 1
+ ... +~n
andthe
ç’s
are iid standardexponentials. So, Q
is the law ofi. e., n times a beta
( k, n - k~ variable,
withdensity
On the other
hand, P
is gamma, withdensity
As before
and we must estimate
f/g
= Ah,
whereWe claim
Indeed,
thelog
of the left side is(3.8)
Remark. - The rate but not the constant issharp.
Lemma(2.4)
is
only
to avoid tediouscalculation,
since theQnsk density
iswhere
t = x 1 + ... + xk _ s
and all(3.9)
Remark. - Heuristic versions of(3.4)
are known: The uniform distribution on thesimplex
can berepresented
asthe joint
law of thespacings
of n - 1points dropped
at random into the unitinterval,
and thespacings
areapproximately independent exponentials
for many purposes.See Feller
( 1971,
p.74)
or Diaconis and Efron(1986)
for further discus- sion.Rigorous
versions andapplications
aregiven by
LeCam( 1958), Pyke (1965),
and Holst(1979).
(3.10)
de Finetti’s Theorem. - LetR+=[0, oo);
Let ...,Xn
be thecoordinate variables on
R +
andS = X 1 + ... + Xn.
LetCn
be the class ofprobabilities
onR +
which share with the iidexponentials
theproperty
thatgiven
S = s, the conditionaljoint
distribution ofX ~, ... , Xn
is uniformon the
simplex.
If P is aprobability
onR~,
letPn
be the P-law of the first n coordinates. The infinite form of de Finetti’s theorem asserts that if aprobability
P onR ~
hasCn
for every n, then P is aunique
scalemixture of iid
exponential
variables. The infinite theorem follows from the finiteversion,
which isgiven
in the next remark.(3.11)
Finite de Finetti. - RecallCn
from(3. 10). Clearly, P~
ECn;
so isP~~ =
for anyprobability
~. on[0, oo).
If i. e., P isconditionally
uniformgive
the sum, then there is a y such that for allk_n-2,
In other
words,
if nnonnegative
random variables areconditionally
uni-form on the
simplex given
their sum, the first k = o(n)
are to within about2 k/n
a scale mixture of iidexponentials.
As in the normal case, thek/n
rate is
sharp,
but not the constant 2.(3.12)
Another characterizationof Cn. -
Withprevious notation
P ECn
iff
P(A) = P(A + x)
for all Borel sets A and alln-tuples
of realWith £ R +
i
suppose
PECn.
Then P= so it isenough
to prove thatQ (A) = Q (A + x),
whereQ
stands forQnsn,
the uniform distribution onthe
simplex.
But and arecongruent,
andhence of
equal Lebesgue
measure. In the otherdirection,
supposeP(A) =P(A +x).
LetVS
be aregular
conditional distribution for Pgiven S = s,
so forPS -1-almost
all s. For anyparticular
B and x,as one sees
by integrating
over S > s: takeA = B n { S > s}.
The invariance must hold for e. g. allspheres
B with rational center andradius,
and allrational x,
forcing V~
to beLebesgue
measureon ~ S = s ~.
0Some lemmas on the beta will be
developed
in order to prove theunique lifting property.
(3. 13)
LEMMA. -If X
is beta(p, q)
andindependently
Y is beta(p + q, r)
then XY is beta(p,
q +r).
Proof. -
LetU, V,
W beindependent
gamma variables withparameters
p, q, r
respectively.
ThenU/(U + V)
is beta(p, q) independently
of U + Vwhich is gamma
(p
+q),
so the trivialidentity
proves the lemma.
0
(3 .14)
COROLLARY. -If X
has a betadistribution,
thenlog
X isinfinitely
divisible.
Proof -
If X is beta(p, q),
then e. g. X can berepresented
as beta(p, q-E).
betaE).
O(3.15)
COROLLARY. -If X
is beta withgiven
parameters, and Y ~ o isindependent of X,
then the lawof XY
determines the lawof Y.
Proof - By (3.14),
the characteristic function oflog
X never vani-shes. 0
(3. 16)
Remark onunique lifting.
- Theanalog
of(2. 11)
holds in this context too. Asbefore,
letCn
be the convex set of allprobabilities
on thepositive
orthant of R~’ which areconditionally
uniformgiven
the sums ofso
and it is
enough
tocompute À
fromPx.
To avoidtrivialites,
suppose1 _ k n - 2.
The critical case isk =1,
and it isenough
tocompute À
fromPl.
Now is s. beta(1, n -1).
SoPl
is the law ofSX,
where S and Xare
independent,
S has the lawÀ,
and X is beta(1, n -1 ). Finally, P~
determines X
by ( 3 .15) .
Another
argument
forunique lifting
starts from the characterization(3.12)
ofCn.
Take withyi >_ o;
takeThe
upshot
is that forPECn,
and
Pi
determines P.4. THE GEOMETRIC CASE
Let
Pp
be law of03B61...03B6n,
which are iidgeometric
variables with para-meter p,
soPnp{03B6i=j} _ ( 1- p) p’ 1, ... Given §1
+ ...+03B6n = s,
the
ç’s
are uniform on thesimplex.
Let be the law of~~ ... ~k
where~=(~1,
... ,~ ~k + 1 ~ ~
...~")
is uniform on thesimplex
The
analog
of( 1)
is thefollowing
theorem.(4.1)
THEOREM. -Let p = s/(n + s).
For1 c k _ n - 3,
Proof -
Herescaling
is notfeasible,
so all values of s must beconsidered. Let
t = j 1 + ... + jk. Clearly,
By
the "stars and bars" lemma( Feller, 1968,
sec.II . S),
there areof
nonnegative integers
with sum s. Then fort=j1
+... +jk~s,Dividing (4. 3) by (4. 2),
the ratioQ,~Sk (j 1, ~ ~ ~ , jk)lPp (j 1, ~ ~ . , jk)
is seen toequal
This is
N/D,
whereNow
by
Lemma( 3 . 1 .f ) ~
The balance of theargument
is omitted. 0(4. 5)
Remark. - The usual remark onsharpness
of rates is omitted.(4. 6).
Remark. - Inphysics,
the conditional distribution of iidgeometric
variables
given
their sum(viz,
the uniform on thesimplex)
is referred toas
"Bose-Einstein";
the conditional distribution of iid Poissonsgiven
theirsum
(the multinomial)
is "Maxwell-Boltzman". See Feller( 1968,
pp. 40ff).
(4 . 7)
de Finetti’s theorem. - LetZ +
denote thenonnegative integers.
Let
Xi,
...,Xn
be the coordinate variables onZ +
andS = X 1
+ ...+ X".
Let
C"
be the class ofprobabilities
P onZ +
which share with the iidgeometrics
theproperty
thatgiven S = s,
theconditional joint
distributiono~ ~ 1 ~ ~ ~ ~ ~ Xn
isQnsn,
the uniform on thesimplex.
If P is aprobability
every n, then P is a
unique
mixture of iidgeometric
variables. The infinite theorem follows from the finiteversion, given
in the next remark.(4 . 8)
Finite de Finetti. -Clearly, P’p
EC";
and so is =Pp
~,(dp )
forany
probability
~, on[o, 1].
If P EC",
there is such that for k _ n -3,
In other
words,
if nnonnegative integer-valued
randomvariables
areconditionally
uniform on thesimplex given
their sum, the firstk = o (n)
are to within about 2
k/n
a mixture of iidgeometrics.
The rate issharp,
but not the constant.
(4. 9)
Another characterizationof Cn. -
Withprevious notation,
PGCn
iff
P ( A) = P ( A + x)
for all sets A and alln-tuples x = x 1 + ... + xn
ofn
integers provided A c Z +
andA + x c Z + . By A + x,
ofi
course, we
mean {a+x: aEA}.
Theproof
is omitted aselementary.
(4.10) Unique lifting.
- Continue with the notation of(4 . 7).
LetThen
P~,
the P-law ofXi,
...,Xk,
determines P. To avoidtrivialities,
suppose
n >_ 2
and k =1. Let ~, be thePn-law
of S. Theproblem
is tocompute À
fromPl,
Butforj=0, 1,
...By taking
successive differences n -1times,
one recovers03BB(s)/(n+s-1 s)
and hence03BB(s).
Aless-algebraic proof
starts from(4 . 9).
5. THE POISSON CASE
Let
P~
be the lawof ~ 1... ~n
which are iid Poisson variables with parameter ~,. GivenS = ~ 1-~
...+ ~~ = s,
theç’s
are multinomial. LetQnsk
be the law
... , ~k given S - s,
Proof - By sufficiency, where Q
is binomial withs trials and success
probability k/n,
whileP
is Poisson withparameter ks/n.
Nowappeal
to Kersten( 1963).
0(5 . 2)
Finite de Finetti. - LetZ +
denote thenonnegative integers;
letX 1... X"
be the coordinate function onZ +,
and S = X 1 + ... +Xn.
ThenPECn
iffQnsn
is the P-law ofXi,
...,Xn given
S = s.Clearly,
andso is
P~n =
for anyprobability ~
on[0, oo).
IfPE Cn,
there is asuch
that for kI
n,2
In other
words,
if nnonnegative integer-valued
random variables areconditionally
multinomial(n, s/n, ... , sin) given
their sum, the firstk = o (n)
are to within aboutk/n
a mixture of iid Poissons. The rate issharp
but not the constant.(5. 3) Unique lifting.
- LetP E Cn.
ThenPk,
the P-law ofXl,
...,Xk,
determines P. To avoid
trivialities,
supposen >_ 2
and k =1. Let 03BB be theP-law of S. The
problem
is tocompute À
fromPl,
ButLet
and
with radius of convergence
n/n -1. Clearly,
and since cp is
analytic
on the disk{x: I x ( n/n -1 ~
it is determinedby
itsderivatives at 1.
(5. 4)
More onunique lifting.
- We consider(4. lo)
and(5. 3)
a bit moregenerally.
Let M(i, j)
be an infinite stochastic matrix with entries whichare
positive for j _
i and vanishfor j> i, i. e.,
M is a lowertriangular
matrix on the
nonnegative integers Z + .
Let 03BB be aprobability
distributionon
Z"+.
LetM ( i, . )
be the law ofX 1
whenX 1
+...--X~
isuniformly
distributed on the
simplex {X 1 +
...+ X n = i~,
thatis, M (i, . )
is theQ;i corresponding
to thegeometric.
Then X M determines À.by ( 4 . 10) .
Like-wise for the M
corresponding
to themultinomial, by (5.3).
Forgeneral M,
a formal inversealways
exists.Indeed,
letMn
be the upper n x n submatrix of M: thenMn+ 1= Mn 1
when it can.However,
M is notnecessarily
1-1. Thecounterexample M,
with domain thepositive integers,
is
Algebraically,
for i =3, 5, 7,
...For
f=4, 6, 8,
...Let
Let
Then ~, M = ~,’ M.
6. HISTORY
1.
Poincare, Borel,
and MaxwellThe
asymptotic normality
of the first k coordinates of a randompoint
on the
n-sphere
is a theoremusually
attributed to Poincare(1912):
seee. g. McKean
(1973,
p.197),
Letac(1981, p. 412),
orBillingsley (1979,
p.
342). However,
after adiligent search,
we were unable to find the result in Poincare. The closest we came was a brief mention of statistical mech- anics at p.43,
and Liouville’s theorem at pp. 145 ff.The earliest reference to the theorem we could find in the
probability
literature was Borel
(1914, Chapter V).
Inequation (12)
on p.66,
Borelgives
asharp
statement of the theorem for k = 1. On pp.90-93,
he makesthe connection with the kinetic
theory
of gas.(We
continue with ournotation,
rather thanswitching
tohis.)
Consider mparticles,
each with 3velocity coordinates,
all denoted x 1, ..., xn, with n = 3 m. Eachparticle
has the same mass, to be denoted
by
c. Thesystem
is constrained to lieon the surface of constant kinetic
energy, -
c03A3 xf
=h .
A uniform distri- 2 î=lbution on this energy surface is assumed
(Liouville’s
theorem is the usual -butpartial -justification). Now xi
tends in distribution toN (0,
2some sense, but for variation distance k = o
(n)
isrequired.
Borel was
aiming
for "the usual form of Maxwell’stheorem",
that theempirical
distribution of x 1, ..., xn tends to normal. From a modernperspective,
this isquite
easy. IfZi,
...,Z~
are iidN (0,1) variables,
then theirempirical
distribution tends toN (0,1).
So must theempirical
2
1 ’
n 2distribution
ofZ1/R",
...,Zn/Rn,
whereR2n=1 n03A3 Zf -
1 a. e. One refer-ence of historical interest is Maxwell
(1875,
p.309);
asecond,
Maxwell[1878, eqn (49-55)],
is more technical and focused on the law of the individual x’s.The
application
to kinetictheory
is anexample
of what mathematicalphysicists
now call theequivalence ofensembles.
If H(x)
is theHamiltonian,
one can work with the "microcanonical
distribution",
the uniform distribu- tion on{x: H (x) = h2~. Alternatively,
one can work with the Gibb’s distri- butionG(dx)
on Rn. This is aprobability
whosedensity
isproportional
The constant c is chosen so
H (x) G (dx) = h2.
Theequivalence
of ensembles obtains when
for some wide class of
f unctions g,
and for somewell-specified meanning
of the
approximate equality.
To
get
Borel’sexample,
takeH x =1 c x2 + ... + x2
Theorem(1)
gives
theequivalence
of ensembles as n tends toinfinity,
for theg’s depending
on o(n)
coordinates.Physicists
tend to assume that theequiva-
lence
always holds, although currently
there is a move towardsrigor.
Theleading
researchers areDobrushin, Lanford,
and Ruelle. A convenient reference is Ruelle(1978, Chapter 1).
2. Paul
Levy
Levy
made extensive use of versions of theorem(1)
for k finite indiscussing
means on function spaces. The material is firstpresented
inLevy ( 1951).
Also McKean( 1973)
who describes Brownian motion"the uniform distribution on the
sphere
of radiusWe will
attempt
here to sketch the connection to theorem(1). Levy’s
idea was to define the mean value M of U as the
expected
value of U where U is afunctional,
andf
is chosen at random on the unitsphere
ofLZ [o,1].
This isclearly insane,
because there is norotationally
invariantcountably
additiveprobability
on thatsphere. Nothing daunted, Levy
defines
M (U)
for certain Uby
alimiting
process.Following
Gateaux1919
he discretizcs0 1
asp 1 ( 1 2 .. f n-1 n, n]
and con-siders the
approximation fn
tof
obtainedby averaging f
over theseintervals.
So fn
is constant on each interval. Let ai, ... , a,~ be the valuesof fn
on the n intervals. NowMn(U)
is defined asE{U{fn)}, when fn
ischosen at random on the =
I,
i. e.,f n (t)
dt = 1. If~0
M~ (U)
converges oo, the limit is declared to beM (U).
Forexample,
suppose at least
formally
thatU(/)=p /t - )
where p is a real-valued function. ThenMJU)==p fn(1 2)],
whoseexpectation by ( 1)
tends toSimilar
things
can be be done if Udepends
onf at
o(n)
coordinates.3. de Finetti
The
original
theorem(de Finetti, 1931)
states that anexchangeable
process of 0’s and l’s is a
unique
mixture ofcoin-tossing
processes. Themove from
~0,1 ~
to acompact
Hausdorff space is due to Hewitt andSavage (1955),
which alsogives
somehistory.
The result is false for abstract spaces(Dubins
andFreedman, 1979).
Stam
(1982) proved
a version of theorem(1)
with an error bound invariation
distance, assuming k
= o( /).
Hegives
someinteresting applica-
tions to
geometrical probability theory.
Gallardo(1983)
and Yor(1985)
gave an
argument (with k fixed) using
Brownian motion in n dimensions.Freedman and Lane
( 1980,
Lemmas 1 and2)
showed how to derive theconvergence of an
empirical
distribution ofdependent
random variables("the
usual form of Maxwell’stheorem")
from information on thelimiting
behavior of
pairwise joint
distributions.Indeed,
in thegeneral setup
of(2.13),
suppose Then theempirical
distribution of xi, ..., xn convergesweakly
inQsns-probability
toP03BB.
This is becauseQnsn
isexhange- able,
andP203BB by (2.13).
5. Recent literature on de Finetti’s theorem
The infinite
representation
theorems were discussed in Freedman( 1962, 1963),
who gave characterizations for mixtures of the variousexponential
families. Diaconis and Freedman
( 1980)
gave a finite form of de Finetti’s theorem for 0 -1variables,
with an error bound. Thepresent
note is asequel, carrying
out theanalysis
in four additional families of distributions.Zaman
(1986)
has results for Markov chains. See Eaton(1981)
on thenormal case, and Diaconis-Eaton-Lauritzen
( 1986)
for vector-valued ran-dom variables. Partial
exchangeability
is discussed from variousperspecti-
ves in Diaconis and Freedman
(1984),
Aldous(1985),
Lauritzen(1984),
Ressell
( 1985).
Finite versions of these results do not seem to be available in anydegree
ofgenerality.
Local central limit theorems arerelevant,
asin Martin-Lof
( 1970);
or conditioned limittheorems,
as in Csiszar(1984)
and
Zabell ( 1980).
6. Corrections
We would like to correct some errors in Diaconis and Freedman
( 1980).
On p.
757,
inequation (36) replace
o(k/n) by
o(k/n).
In thatequation
and
(41),
assume k - oo,although
theargument
doesgive
a result fork = 4 ( 1 ).
On p.764,
in the lastremark,
the urn is to containrn =1
redballs and bn
=n-1 black balls. Let H be thehypergeometric
distributionment from this urn. Let
Bp
be the binomial distribution with k trials andsuccess
probability
p. Then The minimal value foris essentially k 2 /n2,
forD. ALDOUS, Exchangeability and Related Topics, Springer Lecture Notes in Math., No.
1196, 1985.
E. BOREL, Introduction géométrique à quelques théories physiques, Gauthier-Villars, Paris,
1914.
P. C. BILLINGSLEY, Probability and Measure, Wiley, New York, 1979.
I. CSISZÁR, Sanov Property, Generalized I Projections, and a Conditioned Limit Theorem,
Ann. Prob., Vol. 12, 1984, p. 768-793.
B. DE FINETTI, Funzioni caratteristica di un fenomeno aleatorio, Atti della R. Academia Nazionale dei Lincei Ser 6, Memorie Classe di Scienze Fisiche, Mathematiche e Naturali, Vol. 4, 1931, pp. 251-299.
B. DE FINETTI, La prévision: ses lois logiques, ses sources subjectives, Ann. Inst. Henri Poincaré, Vol. 7, 1937, pp. 1-66.
P. DIACONIS, Finite Forms of de Finetti’s Theorem on Exchangeability, Synthèse, Vol. 36, 1978, pp. 271-281.
P. DIACONIS, M. EATON and S. LAURITZEN, (1986). Finite de Finetti Theorems, 1986, Unpublished manuscript.
P. DIACONIS and B. EFRON, Probabilistic-Geometric Theorems Arising from the Analysis
of Contingency Tables, Contributions to the Theory and Application of Statistics, A.
GELFAND Ed. Academic Press, Boston, pp. 103-125.
P. DIACONIS and D. FREEDMAN, Finite Exchangeable Sequences, Ann. Prob., Vol. 8, 1980, pp. 745-764.
P. DIACONIS and D. FREEDMAN, Partial Exchangeability and Sufficiency, in Statistics, Applications, and New Directions, J. K. GHOSH and J. ROY Eds., 1984, pp. 205-236, Indian Statistical Institute, Calcutta.
P. DIACONIS and D. FREEDMAN, A Finite Version of de Finetti’s Theorem for Exponential
Families with Uniform Asymptotic Estimates, Technical report, Department of Statistics, University of California, Berkeley, 1986.
L. DUBINS and D. FREEDMAN, Exchangeable Processes Need Not be Mixtures of Independent Identically Distributed Random Variables, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, Vol. 48, 1979, pp. 115-132.
M. EATON, On the Projections of Isotropic Distributions, Ann. Stat., 9, 1981, pp. 391-400.
W. FELLER, An Introduction to Probability Theory and Its Applications, Vol. I, 3rd Edition, Wiley, New York, 1968.
W. FELLER, An Introduction to Probability Theory and Its Applications, Vol. II, 2nd Edition, Wiley, New York, 1971.
D. FREEDMAN, Invariants Under Mixing which Generalize de Finetti’s Theorem, Ann. Math.
Statist., Vol. 33, 1962, pp. 916-923.
REFERENCES