A NNALES DE L ’I. H. P., SECTION B
M ARGRIT G AUGLHOFER
A. T. B HARUCHA -R EID
ε-entropy of sets of probability distribution functions and their Fourier-Stieltjes transforms
Annales de l’I. H. P., section B, tome 9, n
o2 (1973), p. 113-144
<http://www.numdam.org/item?id=AIHPB_1973__9_2_113_0>
© Gauthier-Villars, 1973, tous droits réservés.
L’accès aux archives de la revue « Annales de l’I. H. P., section B » (http://www.elsevier.com/locate/anihpb) implique l’accord avec les condi- tions générales d’utilisation (http://www.numdam.org/conditions). Toute uti- lisation commerciale ou impression systématique est constitutive d’une infraction pénale. Toute copie ou impression de ce fichier doit conte- nir la présente mention de copyright.
Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques
http://www.numdam.org/
~-Entropy of sets
of probability distribution functions and their Fourier-Stieltjes transforms
Margrit GAUGLHOFER and A. T. BHARUCHA-REID Center for Research in Probability, Department of Mathematics
Wayne State University, Detroit, Michigan 48202
Ann. Inst. Henri Poincaré, Section B :
Calcul des Probabilités et Statistique.
SECTION I
’
INTRODUCTION
1.1. Introduction
At the present time there are three main known notions of entropy:
measure-theoretic or
probabilistic, epsilon (E)
or metric, andtopological.
Historically,
measure-theoretic entropy was the first notion of entropyintroduced,
and has foundwidespread applications
inprobability theory,
informationtheory,
andphysics.
For discussions of measure-theoretic entropy we refer to Khinchin
[l3]
andRényi [23, Chap. IX].
The concept of 8-entropy
(also
called metricentropy)
was introducedby Kolmogorov [14, 15]
in order toclassify
compact metric spacesaccording
to their massiveness.The basic definitions are as follows. Let A be a subset of a metric space
X,
and let 8 > 0 begiven.
Afamily E1, E2,
...,E~
of subsets of X is said to be an~-covering of
A if the diameter of eachEk
does notexceed 28 and if the sets
Ek
cover A. For agiven
8 > 0, the number ndepends
on thecovering family { Ek ~,
but min n is an inva-riant of the set A. The
logarithm
H~(A)
=log2 N,(A)
is the ~-entropy
of
A.Annales de I’Institut Henri Poincaré - Section B - Vol. IX, n° 2 - 1973. 9 Vol. IX, n° 2, 1973, p. 113 -144.
114 MARGRIT GAUGLHOFER AND A. T. BHARUCHA-REID
A related notion is that of the
B-capacity
of A. Points x2, ..., x~of A are said to be
~-distinguishable
if the distance between eachpair
ofthem exceeds s. The number
Mg(A)
= max m is an invariant of the setA ;
and thelogarithm
C,(A)
=log2 M~(A)
is the
~-capacity of
A.In
general, H~(A)
andC~(A)
increaserapidly
to + oo as B --~0 ;
and theirasymptotic
behavior serves to describe the compact set A.Kolmogorov
and Vituskin(cf. [28])
used the notion of B-entropy in their famous work on Hilbert’s 13thproblem.
At the present time the concept of 8-entropy isbeing
used tostudy
a number ofproblems
inanalysis
andprobability.
In the field ofanalysis
we refer inparticular
toapplications
in
approximation theory (cf.,
Lorentz[l7, 18]).
Someapplications
ofs-entropy in
probability theory
arebriefly
summarized in the next sub- section.The newest concept of entropy is that of
topological
entropy. We referto
Adler,
et al[l for
a discussion oftopological
entropy and some of itsapplications.
Within recent years a number of mathematicians have
investigated
the
relationships
between the different concepts of entropy. We referto
Goodwyn [12],
who showed thattopological
entropy bounds measure- theoretic entropy ; and toDinaburg [5]
who studied therelationship
between
topological
entropy and s-entropy.1.2.
s Entropy
inProbability Theory
Although
the notion of e-entropy was defined and introduced in connec-tion with a
nonprobabilistic problem,
in recent years a number of mathe- maticians have utilized the concept of B-entropy tostudy
a number ofproblems
inprobability theory.
a.
Kolmogorov
and Tihomirov[l S]
have used B-entropy tostudy
certain
problems
in theprobabilistic theory
ofapproximate
transmission ofsignals ;
and Prosser[21]
hascomputed
the B-entropy andB:capacity
of certain
time-varying
channels(i.
e. certain bounded linear operatorson a Hilbert
space)
in communicationtheory. Compact, Hilbert-Schmidt,
and convolution operators were considered.Posner and Rodemich
[20]
have used B-entropy to formulate andstudy
certain
problems
which arise in thestudy
of datacompression.
TheirAnnales de l’lnstitut Henri Poincaré - Section B
115 E-ENTROPY OF SETS OF PROBABILITY DISTRIBUTION FUNCTIONS
results are of great interest in the design and analysis Of systems lor trans-
mitting
andreceiving
communicationsignals.
b. Let
(S, d)
be a compact metric space, and letC(S)
denote the Banach space of continuous real-valued functions on S. Strassen andDudley [26]
utilized the notion of e-entropy to obtain conditions for which the central limit theorem holds for random variables with values in
C(S).
c. Chevet
[2], Chevet, et al [3]
andDudley [6, 7]
haveemployed
B-entropyto
investigate
thesample path continuity
of certain random functions with values in aseparable
metric spaces. These papers are of greatimportance
in the
theory
of random functions sincethey
introduce a newanalytic approach (based
onB-entropy)
to thestudy
ofsample path continuity.
We refer also to the papers of
Dudley [8]
and Sudakov[27].
1.3.
Summary
Following
the introduction of the notion of a-entropyby Kolmogorov,
a number of mathematicians have been concerned with the
computation
of the a-entropy of certain concrete function spaces, and have utilized these results to
study
certainproblems
inanalysis (cf., Kolmogorov
andTihomirov
[15],
Lorentz[18],
and Vituskin[29]).
In this paper westudy
the R-entropy of certain sets of
probability
distribution functions of real- valued randomvariables,
and we also consider the B-entropy of the sets of theirFourier-Stieltjes
transforms(or
characteristicfunctions).
In Section II we
give
some basicdefinitions,
and present a summary of someproperties
of B-entropy andB-capacity
which will be utilized insubsequent
sections. Section III is devoted to astudy
of the B-entropy of certain sets of truncatedprobability
distribution functions. In Sec- tion IV we consider the 8-entropy of the sets of associated Fourier-Stieltjes
transforms.SECTION II
s-ENTROPY : BASIC DEFINITIONS AND PROPERTIES In this section we
give
some basicdefinitions,
and present a summary of someproperties
of B-entropy andB-capacity
of sets in function spaces.For a detailed treatment of the B-entropy and
B-capacity
of sets in functionspaces we refer to
Kolmogorov
and Tihomirov[15].
Let A be a nonempty set in a metric space X.
Vol. IX, n° 2 - 1973.
116 MARGRIT GAUGLHOFER AND A. T. BHARUCHA-REID
DEFINITION 1.1. - A collection U of sets U c X is called an
e-covering
of A if ~ U ~
A andd(U)
2~ for allUeU,
whered(U)
denotes theUE1I
diameter of the set U.
DEFINITION 1.2. - A set U c X is called
e-separated
if any two distinctpoints
of U are at a distance greater than e from each other.DEFINITION 1.3. - A set A is said to be
totally
bounded(or pre-compact)
is for every e > 0 there exists a finite
e-covering
of A.Remarks :
(i)
A set A istotally
bounded if andonly
if for every e > 0 every8-separated
set is finite.(ii) Clearly,
any compact set istotally
bounded.
Throughout
this paper we restrict our attention tototally
boundedsets.
DEFINITION 1.4.2014 Let
N,(A)
be the minimal number of sets in an e-coverof A. Then
H,(A)
=log2 N,(A)
is called the e-entropyof
A.DEFINITION 1 . 5. - Let
M~(A)
be the maximal number ofpoints
inan
e-separated
subset of A. ThenC,(A)
=log2 M,(A)
is called the e-capa-city of
A.The
following properties
hold:(P 1) Ht:(A)
andC~(A)
arenondecreasing
functions as e decreases.(P2) H£(A) - CE(A).
For a discussion of the above
properties
we refer toKolmogorov
and Tihomirov
[15].
Remark :
(iii)
It is easy to show that for any bounded set A in n-dimen- sional Euclidean spaceRn having
interiorpoints log - , where f - g
meanslim f(e)/g(e)
= 1 .e
The above fact has
given
rise to thefollowing generalization
of dimension.DEFINITION 1.6. - The upper metric dimension of a set A is defined as
dm(A)
=lim sup H~(A)/log (1/~)
The metric dimension of A is defined as
dm(A)
= lim £-0H,(A)/log ( 1/E).
if this limit exists.
Annales de l’lnstitut Henri Poincaré - Section B
6-ENTROPY OF SETS OF PROBABILITY DISTRIBUTION FUNCTIONS 117
We note that for convex, infinite-dimensional subsets A of a Banach space the metric dimension is
always equal
to + oo, andis, therefore,
useless as a measure to
distinguish
the massiveness of sets of this type.DEFINITION 1.7. - The exponent
(or
orderof growth) of
entropy r of A is defined asIf the lim sup is
equal
to alimit,
then r is often called the metric orderof
A.DEFINITION 1. 8. - The
logarithmic
orderof growth of entropy
is defined asIf the lim sup is
equal
to alimit,
thena(A)
is often called thefunc-
tional dimension
of
A.SECTION III e-ENTROPY OF SETS
OF PROBABILITY DISTRIBUTION FUNCTIONS
3 .1. Introduction
Let
(Q, d, ,u)
be acomplete probability
measure space; and consider the measurable space(R, ~),
where R is the real line and f1I is thea-algebra
of Borel subsets of R. A
mapping
x : S2 -~ R is said to be a real-valued random variableif {
m : EB ~
E ~ for all B E ~. Associated with every random variable is a real-valued functionF( ç)
defined as follows:The
mapping
x induces a measure v in thefollowing
manner:It is easy to show that v is a
probability
measure; hence(R, v)
is calledthe induced
probability
space.Vol. IX, n° 2 - 1973.
118 MARGRIT GAUGLHOFER AND A. T. BHARUCHA-REID
The function
F(~)
definedby (3 .1 )
is called a distributionfunction.
Thefollowing properties
of distributions are well-known(cf.,
Gnedenkoand
Kolmogorov [11, Chap. 1] ]
or Loève[16, Chap. IV]).
(1)
F isnon-decreasing;
(2)
F is left-continuous:Properties (1)-(3) imply
thefollowing
additional property:(4)
Theonly
discontinuities a distribution function can have arejumps;
and the set of discontinuities is at most countable.
Let %
denote the collection of all distribution functions F. We definea metric d
on %
as follows: ForF,
Ge S
The metric defined
by (3.3)
is called theLévy
metric, and(~, d)
is calledthe
Levy
space. We state thefollowing
basicresult,
theproof
of whichis
given
in Gnedenko andKolmogorov [77, Chap. 2]:
THEOREM 3.1. - 1’he
Levy
space(~, d) is a complete separable
metricspace.
In Section 3.2 we compute the e-entropy of the set of truncated dis- tributions
(~N, d);
and in Section3. 3
we compute the e-entropy of the set of truncated discrete distributions.3.2.
e-Entropy
of Setsof Truncated
Probability
Distribution FunctionsLet
JN
denote the set of distribution functions truncated in the follow- ine wav :Clearly
1 c ... ; and~N T ~
as N --~ oo.We first observe that
~N
with the uniform metricp(F, G) = sup |F(03BE) - G(03B6) I
Annales de l’lnstitut Henri Poincaré - Section B
e-ENTROPY OF SETS OF PROBABILITY DISTRIBUTION FUNCTIONS 119
is not
totally
bounded for anyN;
hence a finite e-entropy cannot be obtained for(~N, p).
Infact,
the set{ Fk ~
whereis a
non-finite, e-separated
set for any e1/2.
This follows from the fact that for n > m,We will now show that
fjN
with theLevy
metric d has finite e-entropyfor any e >
0,
and any finite N.We now estimate
d)
-the e-entropy ofd).
This computa- tion will be carried out in two stages: we first compute an upper bound for and then compute a lower bound ford).
1. An upper bound. An upper bound for
d)
isgiven by
thelogarithm
of the number of elements in ane-covering
of(~N, d).
Suchan
e-covering
can be obtained in thefollowing
way: LetAn element of the
covering
is a set of distribution functions whosegraphs
are contained in the set
where
lp(x)
is anondecreasing
step function with thefollowing properties :
The step
function ~
has ajump
at x - 2e of the same size(except
whenthe value 1 is
reached)
as the function lp at x. We call the set(3. 6)
an e-cor-ridor. A
graphical representation
of an 8-corridor isgiven
inFigure
1.The
graph
of any distribution function is contained in such an s-corri- dor ; for if(xo, F(xo))
EQik, then,
as xincreases,
the fact that F is non-decreasing implies
that thegraph
has to leaveQik
and go into eitherVol. IX, n° 2 - 1973.
120 MARGRIT GAUGLHOFER AND A. T. BHARUCHA-REID
The e-corridor can be extended
(or
builtup)
in a manner whichdepends
on the
graph
of the distribution function F.By
the construction of thefunctions 03C8 and 03C6,
any two distribution functions F and Gbelonging
to the same 8-corridor have a
Levy
distanced(F, G) qJ)
~ 2e.We first compute the number of elements in an
e-covering
of(~N, d).
If the square say, is reached
by m
distinctcorridors,
and iby
h distinctcorridors,
thenQi, k
is reachedby m
+ h corridors.Therefore,
the entries in the table of
Figure
2 indicate the number of distinct corridorsAnnales de l’lnstitut Henri Poincaré - Section B
e-ENTROPY OF SETS OF PROBABILITY DISTRIBUTION FUNCTIONS 121
the
s-covering
hasreaching
therespective
square. The sum of the number of corridors reach-ing
the squares i =1, 2, ..., [1/2E]
+ 1(thats is,
the sum ofnumbers in the last
column)
is the number of distinct elements in thee-covering
defined above ; we remark that ifN/e
is aninteger, then,
hereand in what
follows,
has to bereplaced by
l: and if1/2e
is an
integer,
then[1/2E]
has to bereplaced by (1/2e) -
1.Since the number of corridors
reaching Qik
isgiven by
the binomial coefficientelements. Hence for any finite N and e >
0,
the number Sgiven by (3. 7)
is finite. Therefore
d) log S(E) (3 . 8)
is finite for any finite N and any e > 0.
2. A lower bound. It follows from
property (P2) (cf.
Sect.II),
whichexpresses the
relationship
between e-entropy ande-capacity,
that a lowerbound for
d)
isgiven by
thelogarithm of
the number of elementsin a
2e-separated
subset of(~N, d).
Let
~(~) _ ~ F ~
for x - - N +2E[N/E], F(x)
isequal
to the lowerboundary
function of somee-corridor,
while for x > - N +2E[N/E], F(x)
=1 ~ .
LEMMA 3.1. - For any e’ > e,
At(e’)
is a2e-separated
subsetof (~N, d).
Proof.
- Let F and G denote two suchboundary
step functions. In order thatthey
be distinctthey
must differ on some interval(a, b)
of
length
2e’ for a value of at least 2e’. Without loss ofgenerality
we cantake
F(x) G(x)
on(a, b].
ThenF(x)
+ 6G(x)
for all x E(a, b]
andall 03B4 2E’. Since F is constant on
(a, b], F(x)
=F(x
+b)
aslong
as x,x + ~ are both contained in
(a, b].
Put x = b - 6. Then x E(a, b]
andF(x + a) + ~ G(x). (3.9)
Vol. IX, n° 2 - 1973.
122 MARGRIT GAUGLHOFER AND A. T. BHARUCHA-REID
Now,
assumed(F, G) = 5o
2e’.By
definition of theLevy metric,
thisimplies
that forbo
~2e’,
for all x. But
(3.10)
is a contradiction to(3.9).
Hence we can concludethat
d(F, G) >
2e’ > 2e forany E’
> e. Thus the above set of functions is2e-separated.
The number of distinct elements in the above
2e’ -separated
set isequal
toHence
~ ’
M2£(~N, d)
>R(s’) (for
all 8’ >s).
which
implies
thatd) > R(4
Hence
LEMMA 3.2.
Proof.
It is known(cf.,
Riordan[25],
p.7)
thatwhere M = min
(m, n, - 1). Therefore,
and thus
Annales de l’lnstitut Henri Poincaré - Section B
E-ENTROPY OF SETS OF PROBABILITY DISTRIBUTION FUNCTIONS 123
Combining
the results obtained above(i.
e.(3.8), (3.11)
and Lemma3.2),
we can state the
following
result.THEOREM 3.2. - The e-entropy
d) of
the setof
truncatedproba- bility
distributionfunctions defined by (3.4) satisfies
where
We now consider the estimation of the exponent, or order of
growth,
of the entropy of
Using Stirling’s approximation
formulawe obtain for p
sufficiently large
Therefore
Substituting
a =[N/EJ,
p =[1/2E],
we obtain for small eVol. IX, n° 2 - 1973.
124 MARGRIT GAUGLHOFER AND A. T. BHARUCHA-REID
Similarly
Since
log (1
+x) ~
x for small x, we obtain forlarge
NSimilarly
Both and
are of order
and
and
Noting
thatlog
x x, we have for any fixed Nand,
for any fixed eFurther
Annales de l’Institut Henri Poincaré - Section B
e-ENTROPY OF SETS OF PROBABILITY DISTRIBUTION FUNCTIONS 125
3.3.
e-Entropy
of Setsof Truncated Discrete
Probability
Distribution FunctionsLet
~
denote the class of discreteprobability
distribution functions truncated at - N andN;
i. e.,~ = { F E
F is constant except forN
jumps
at x = k of size0, k
= -N,
...,N, pk
= 1.
It followsfrom the results of Section 3.2 that the number of e-corridors necessary
to cover
~N
isequal
to the number of elements in a2e-separated
subset.Hence the number of elements is
equal
toK=V
Hence we have
K=V
THEOREM 3.3. - The e-entropy
of
the set(~N, d) of
truncated discreteprobability
distributionfunctions
isgiven by
Next we estimate the order of
d).
If we put a = 2N and p =[1/2E]
in
(3.13)
we obtainNow
Vol. IX, n° 2 - 1973.
126 MARGRIT GAUGLHOFER AND A. T. BHARUCHA-REID
Since
2N 1 2E + 1 ,
is small for E smallTherefore,
for any fixedN
and,
for any fixed eThus the metric order of
~
isand the
logarithmic
order isSECTION IV
e-ENTROPY OF SETS OF FOURIER-STIELTJES
TRANSFORMS OF PROBABILITY DISTRIBUTION FUNCTIONS
41 Introduction
Given a
probability
distribution functionF(~),
theFourier-Stieltjes transform
ofF,
that isAnnales de l’lnstitut Henri Poincaré - Section B
127 E-ENTROPY OF SETS OF PROBABILITY DISTRIBUTION FUNCTIONS
is called the characteristic
function
of F. Thefollowing properties
of charac-teristic functions are well-known
(cf., Chung [4, Chap. VI],
Gnedenkoand
Kolmogorov [11, Chap. II],
Loève[16, Chap. IV],
or Lukacs[l9]):
(1) I .f(t) I f(o) = 1 ~ (2) .f ( - t) = f (t);
(3) f
isuniformly
continuous on the realline;
(4) f
ispositive-definite.
We remark that to all functions F + c, where c is an
arbitrary
constant,corresponds
the sameFourier-Stieltjes
transformf
The converse(and, therefore,
the 1-1correspondence
betweenprobability
distribution func- tions and characteristicfunctions)
follows from thefollowing
inversionformula for
Fourier-Stieltjes
transforms:provided
a and b(a b)
arecontinuity points
of F.Also, (4.2)
holds forall
a, b E
R(a b), provided
F is normalized.We now state the
following
basic result(cf.,
Gnedenko and Kolmo- gorov[11,
p.53]).
THEOREM 4.1. - Let
Fm
F denoteprobability
distributionfunctions
andlet
f", f
denote the associated characteristicfunctions.
ThenF"
converges to Fweakly
as n -~ oo(i.
e., theLevy
distanceF) -~
0 as n -~oo) if
andonly if fn(t) -> f(t) uniformly
in every boundedinterval t ~
T.Let §
denote the space of all characteristic functions. In view of the above theorem the metrization chosenfor §
is as follows.DEFINITION 4.1. - If
f and g
are two characteristic functions in~,
let
where !! f -
=Sup I .f (t) -
tE[a,b]
Vol. IX, n° 2 - 1973.
128 MARGRIT GAUGLHOFER AND A. T. BHARUCHA-REID
Proof
Assumep( f", f ) -~
0. Then to every e > 0 there exists an N such that e for all n >N;
that isfor all n > N. This
implies 11 fn - f
for all n> N. But the last statement says thatf" -~ f uniformly
in any bounded interval.Therefore, by
Theorem4.1, d(Fm F) -~
0 as n - oo.Conversely,
ifF) -~
0 as n - oo,then £ - f uniformly
inevery bounded interval
[-
mn,mn];
thatis,
for every e > 0 there existsan
N(e, m)
such that( ~ fn - ,~
e for all n >N(e, m),
whereN(e, m)
isnondecreasing
as m increases.Now !! f - g I( - m~,mn]
- 2 forany m
(since f (t) - g(t) ~ ~ f (t) + g(t) ~ 2).
Given s >0,
choose Msuch that
Now,
choose N suche/2
for all n > N. Conse-quently, II f" - f ~/2
for all n > N and all M. Thereforefor all n > N.
Let
~N
denote the space of all characteristic functionscorresponding
to truncated
probability
distribution functions F E and let~N
denotethe space of all characteristic functions
corresponding
to discreteproba- bility
distributions F E~N.
For
oN
we have thefollowing
result.LEMMA 4 . 2. - The metric p restricted to
~N
coincides with theuniform
metric
( ~ f -
= supI f (t) - g(t) I .
teR
Annales de l’lnstitut Henri Poincaré - Section B
E-ENTROPY OF SETS OF PROBABILITY DISTRIBUTION FUNCTIONS 129
Proof.
Iff (t)
= thenf (t
+2~)
=f (t).
Thereforek
It is of interest to note that the fact that two compact metric spaces
(X, d)
and
(Y, p)
areequivalent (i.
e., there is a one-to-onecorrespondence
betweenX and Y, and
d(xn, x)
- 0 if andonly
ifp( y", y)
--~0)
does notimply
that
corresponding
sets in X and Y have e-entropy of same order. Wegive
thefollowing counterexample.
Let
[0,1],
X =In. X, being the product
of compact spaces, is
n=1 1
compact in the cartesian
product topology.
Let eachIn
be metrized in two ways.and
Clearly,
for anyfixed n, dn
and pn areequivalent
metrics onIn.
Let X be metrized
correspondingly:
For x= ~ xn ~ , y
={ yn ~
in Xdefine
d(x, y)
= supdn(xm yn)
andp(x, y)
= supy").
n n
Let
dn(In)
andpn(In)
denote the diameter ofIn according
to themetrics dn
and pn,
respectively. Clearly,
--~ 0 and - 0 as n - 00.Thus
(cf., Dungundji [9],
Theorem7.2,
p.190), d
and p both metrize the cartesianproduct topology
of the spaceX,
and are thereforeequivalent.
Next we compute the e-entropy of X with respect to both metrics. For
n =
1, 2,
...,[1/2E]
letIn
be divided into[1/2nE]
+ 1 intervalsI",kn
ofVol. IX, n° 2 - 1973. 10
130 MARGRIT GAUGLHOFER AND A. T. BHARUCHA-REID
length
2m(with
theexception
of the lastinterval,
which may beshorter).
Then,
for x~, y"belonging
to the sameinterval,
If n >
[ 1 /2E], then,
for any xn, yn ~1m
[1/2~j o0
Thus,
for xand y
in xfl In, d(x, y) = sup _ 2e.
n=l I [l/2s]+l 1
Therefore,
thefamily
is an
e-covering
of X withcardinality
where lm.km
is the left-handend-point
of some intervalIm.km.
Two distinctpoints
of S are at a distance >2s,
sincethus sup 2e.
Therefore,
for any e’ e, S is a2e’-separated
n
set of X.
Clearly,
S is of the samecardinality
asNow.
Using Stirling’s approximation formula,
Annales de l’lnstitut Henri Poincaré - Section B
131 E-ENTROPY OF SETS OF PROBABILITY DISTRIBUTION FUNCTIONS
Therefore
Similarly,
the order ofH~(X, p)
can be obtained. In this case, forn =
1, 2,
...,[log 1 /2E],
the intervalsIn
are divided into[ I /2E ~ 2"]
+ 1intervals of
length
2E ~ 2"(or
less for the lastone).
The sets ~l’ and S’
(obtained
in ananalogous
way asabove)
havecardinality
where p =
[log 1/2~].
Thereforeand
Hence
H~(X, d)
andH~(X, p)
are of different order.In view of the
above,
it is necessary toconsider,
besides(~N, d)
and(~N, d ),
also the spaces( ~N, p)
and( ~N, p).
In Section 4 . 2 we
give
an estimate of the order of the e-entropyof ( ~N, p),
in Section 4. 3 we compute bounds for the e-entropy of
( ~N, p),
and inSection 4.4 we estimate the order of the bounds on
H~( ~N, p). Finally,
in Section 4.5 we present some comments on the
entropies d)
and
P).
4.2. An Estimate of the Order of the
e-Entropy
of(~N, p)
Any f E ~N
has arepresentation
Put
Vol. IX, n° 2 - 1973.
132 MARGRIT GAUGLHOFER AND A. T. BHARUCHA-REID
where z is
complex.
It is known(cf.,
Lukacs[19],
p.137,
Theorem7.2.1)
that
f (z)
is an entirefunctions;
alsoUsing
the notation ofKolmogorov
and Tihomirov([IS],
p.330), ~N
iscontained in class of entire functions
satisfying ( f (z) ( _
If is considered with the
metric [[ f I I
=I f(z) I,
wherezeK
K =
{
z( 1 ~ ,
then it is known(cf., Kolmogorov
and Tihomirov[IS],
Theorem
20,
p.330)
that. . , .
We remark that Erohin
[10]
has shown that the above result is notchanged
if K is
replaced by
anarbitrary
continuum in thez-plane.
We now prove the
following
result.THEOREM 4. 2. -
where
E
’: if limf (E) f() ~)
1.1E~8)
-
Proof. - Clearly, p)
_HE(~ 1,N, P~.
We will now show thatLet pm denote the metric
Annales de l’Institut Henri Poincaré - Section B
e-ENTROPY OF SETS OF PROBABILITY DISTRIBUTION FUNCTIONS 133
Since the interval
[ -
mn, is compact andconnected,
it is a continuum in thez-plane,
and we havefor any m.
Now
for any M.
Thus,
a minima! ~-cover for03C1n)
is also an e-cover foryn~)’
and thereforefor every n.
Letting
n -~ oo, we obtainOn the other
hand,
sincea maximal
28-separated
set in(~ i ,N, P 1 ~
is also a28-separated.
set in(~ i ,N, P~ ;
and therefore
Vol. IX, n° 2 - 1973.
134 MARGRIT GAUGLHOFER AND A. T. BHARUCHA-REID
(4.6)
and(4.7) together imply (4.4),
and thus the theorem isproved.
4 . 3 . Bounds for the
8-Entropy
ofp)
In order to find an upper bound for
p),
we construct an 8-cover-ing
of~N,
the class of characteristic functionscorresponding
to the discrete distributions in~N.
Givenf E ~N, f
has arepresentation
where pk
= 1. Nowf
induces a vector(m _ N, ... , mo, ... , mN)
ofk= -N
.
positive
oddintegers
in thefollowing
way : letLet
U((~)j~= _~)
be the set of all functions/e ~ inducing
the same vector(~=-N. Clearly,
any function is in someU((~=_N).
Any
two functionsbelonging
to the same setU((~= _~)
are at a dis-tance less than or
equal
to 2s. This can be verified as follows IfAnnales de l’lnstitut Henri Poincaré - Section B
e-ENTROPY OF SETS OF PROBABILITY DISTRIBUTION FUNCTIONS 135
But
N
Similarly, g(t) -
,
~ 2N
+ 1I
E.Therefore, f (t) - g(t) 2E,
for all t, thusp( f, g)
2E.Thus,
the class of sets is ane-covering
of
(~N, p).
The number of sets in it isequal
to the number of differentvectors -N induced
by
all the characteristic functions of~N.
ToE count their
numbers,
we introduce thefollowing abbreviations: ~ _
2N+1
N
LEMMA 4.3. is an odd
integer
between so =2(m
+N)
+ 1andsi =2(M +N)+
1.N
Proof
-Clearly ¿
as a sum of an odd number of oddintegers,
k= -N
N
is odd.
Since pk - 1, (2N +1)
maxPk 1,
thereforeVol. IX, n° 2 - 1973.
136 MARGRIT GAUGLHOFER AND A. T. BHARUCHA-REID
Thus one of the
m’s
is at least2.[1 2~] + 1; the other at least 1,
so
mk >_
so =2(m
+N)
+ 1. The maximum value any of themks
can take on is 2M +
1 ;
if this value is taken on, all othermks
must bel,
N
so
_2(M+N)+
1.We now count the number of vectors
(mk)k = - N
which lead to a sumN
I mk = 2s + 1,
then take the sum with respect to s, m + N s M +N.k= -N
LEMMA 4 . 4. - There are
N + S
distinct vectors (mk)Nk=-N whose com-LEMMA 4.4. - There are
2N distinct vectors (mk)Nk=-N whose com-
ponents add up to 2s +
1 ~
i. e. there are(N + s com p o sitions of 2s + 1
with 2N + 1 odd parts. ~’~~
Proof
Thegenerating
function forcompositions
with m odd parts iswhere the coefficient of tn is the number of
compositions
of n with modd parts
(cf.,
Riordan[24],
p.124). Now,
therefore
For m = 2N + 1, the coefficient of t2s + 1 is
This
completes
theproof
of Lemma 4.4.Annales de l’lnstitut Henri Poincaré - Section B
e-ENTROPY OF SETS OF PROBABILITY DISTRIBUTION FUNCTIONS 137
The total number of distinct induced vectors
(mk)k = - N,
and thereforethe number of elements in the
8-covering
definedabove,
isFrom the above and an
application
of Lemma3.2,
we havewhere m =
[ 1 /2E],
M= - 2 E
.In order to find a lower bound for
H£( ~N, p),
we construct a28-separated
N
subset of
~N.
An element/(t)= ~
is definedby
a vector(pk)k = - N.
Consider the set
where m =
[ 1 /2E].
We now show that two elements
f,
g definedby
distinct vectorsand
(qk)k = - N
inD, respectively,
are at a distance greater than orequal
to 2E. Let pk =
rk.1
, qk =sk.1.
Since the two vectors aredistinct,
forP q
at least one
k,
qk, whichimplies
that skp,and,
since rk, sk, p and qare
integers, rkq - skp| >-
1. Let rkq - skp =dk,
= ak. Then 1Pk - qk -
~ dk~
Pq
We
require
an estimate ofp( f, g),
whereLet ett = z.
Vol. IX, n° 2 - 1973,
138 MARGRIT GAUGLHOFER AND A. T. BHARUCHA-REID
LEMMA 4.5. - Let
ak, k
=0,
..., n beintegers,
0for
at least one k.7’hen ~
n
Proof
Letw(z)
= Since the maximum of ananalytic
func-0
tion is assumed on the
boundary,
.If ao =
0,
let i denote the index of the first coefficient which is notequal
to zero, i. e. a 1 =
0,
..., a;- 1 =0,
at 5~ 0. ThenTherefore,
We now determine the number of elements in the
28-separated
set definedby
the vectors in D. We first count the number of vectors -N whichlead to a sum p, then take the sum with respect to p, p =
1, 2,
...,LEMMA 4.6. - T’here are
p r - 1 2N
+1
vectors(rk)Nk=-N having
exactly
r components notequal
to zero,adding
up to p.Proof
- There are(2N + 1 r ) ways to place
the r non-zero numbers
in the 2N + 1
positions.
Thegenerating
function forcompositions
withexactly
r parts isgr(t)
=(t
+t2
+ ~ ~ ~)r.
Thus the number ofcomposi-
tions of
p into
r parts is the coefficient of tP in theexpansion
of whichis
equal to p i. - 1 1 (cf.,
Riordan[24],
p. .124). Therefore
theproduct
2N+ 1
.p- 1 gives
the number of vectors with rpositive
compo-nents and sum
equal
to p.Annales de l’lnstitut Henri Poincaré - Section B
139
e-ENTROPY OF SETS OF PROBABILITY DISTRIBUTION FUNCTIONS
If we take the sum of the above
expression,
first with respect tor min
(p,
2N +1),
then with respect to p[~/~],
we find the totalnumber R of elements in the set D.
In our notation we use the
boundary
conventionfor n = 0, 1, ... ; m =
1, 2,
.... Therefore ’This
expression
can besimplified,
if use is made of thefollowing
identities(cf.,
Riordan[25],
p. 3 and p.9).
(4.14), together
with theboundary
convention(4.12),
and(4.13) imply
Thus
Using
the above results and Lemma3.2,
we haveVol. IX, n° 2 - 1973.
140 MARGRIT GAUGLHOFER AND A. T. BHARUCHA-REID
We now utilise the
following
result ofKolmogorov
and Tihomirov([15],
TheoremXXI,
p.331)
to obtain an estimate of the order ofH~(~N, p).
Let denote the
class
of allentire, 2n-periodic
functionsf (z)
withI .f (Z) I -
p > 1. Thenwhere
F;,N
is considered in the uniform metric.If we let
p! 1,
and note thatthen the above result can be extended to p =
1;
that isG
Now,
forf E ~N, f(t)
has arepresentation given by (4. 8).
DefineClearly f (z)
is an entirefunctions;
and1T 1T
Therefore, ~N
c andusing (4.18)
we have4.4. The Estimation of the Order of the Bounds of
p)
In the last section an upper bound of
p)
was found to beAnnales de l’lnstitut Henri Poincaré - Section B
e-ENTROPY OF SETS OF PROBABILITY DISTRIBUTION FUNCTIONS 141
where
Now
Noting
thatM ~ N,
we find thate
Thus,
the upper bound(4. lo)
ofp)
is of order less than orequal
toas e -
0,
for any fixedN; and
less than orequal to O(N)
asN ~ oo, for any fixed e.
A lower bound for
p)
isVol. IX, n° 2 - 1973.