A NNALES DE L ’I. H. P., SECTION B
M ICHEL T ALAGRAND
The missing factor in Hoeffding’s inequalities
Annales de l’I. H. P., section B, tome 31, n
o4 (1995), p. 689-702
<http://www.numdam.org/item?id=AIHPB_1995__31_4_689_0>
© Gauthier-Villars, 1995, tous droits réservés.
L’accès aux archives de la revue « Annales de l’I. H. P., section B » (http://www.elsevier.com/locate/anihpb) implique l’accord avec les condi- tions générales d’utilisation (http://www.numdam.org/conditions). Toute uti- lisation commerciale ou impression systématique est constitutive d’une infraction pénale. Toute copie ou impression de ce fichier doit conte- nir la présente mention de copyright.
Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques
http://www.numdam.org/
Vol. 31, n° 4, 1995, p. 689-702 Probabilités et Statistiques
The Missing factor in Hoeffding’s inequalities
by
Michel TALAGRAND
University of Paris VI and The Ohio State University.
ABSTRACT. - A celebrated paper of
Hoeffding
establishessharp
boundsfor the tails of sums of bounded
independent
random variables. Anelementary
observation allows toimprove
these bounds tooptimal
orderunder mild conditions.
(The
method wepresent
also allows toimprove
many other
exponential bounds.)
RÉSUMÉ. - Un article célèbre
d’Hoeffding
établit des bornes pour la déviation d’une somme de variables aléatoiresindépendantes
parrapport
àsa moyenne. Combinant la méthode
d’Hoeffding
avec la transformationd’Esscher,
on montre comment sous deshypothèses supplémentaires
minimes ces bornes
peuvent
être améliorées d’unefaçon
essentiellementoptimale.
1. INTRODUCTION
Consider
independent
centered random variables(r.v.)
(Throughout
thepaper, X
willalways
dénote a centeredr.v.)
Thestudy Équipe
d’Analyse, Tour 48, U.A. au C.N.R.S. n° 754, Université Paris VI, 4, place Jussieu, 75230 Paris Cedex 05 and Department of Mathematics, 231 West 18th Avenue, Columbus, Ohio 43210, U..S.A.Annales de l ’Institut Henri Poincaré - Probabilités et Statistiques - 0246-0203 Vol. 3ï/95/04/$ Gauthier- Villars
of the tail
certainly
has along history.
In the case where the r.v. areidentically
distributed(and E exp 03B1|X1|
oo for some a >0),
H. Cramerproved
in 1938that,
ifp2
=then,
if x =o(nl/6),
we haveAsymptotic expansions,
howeverprecise,
do not diminish the need forinequalities
valid for alln, t.
Suchinequalities
have been obtained inparticular by Yu,
V.Prokhorov,
G. Bennett, W.Hoeffding,
S. V.Nagaev.
For the purpose of
simplicity,
let us discussHoeffding’s
bounds. The thrustof his paper is
(in
the i.i.d.case)
to obtain bounds thatdepend only
onp2
=EX21
and on b =sup Xi .
He obtains a boundfor a certain function H
(that
will be described in(1.5) below).
Thesharpness
of this bound can be tested in the case of the binomiallaw,
i. e.when
P(Xi
=1- p)
= p,P(Xi
=-p)
=1-p,
for some 0p
1. In thatcase
p2
=p(l - p),
b = 1 - p. For the binomiallaw, straight computation using Sterling’ s
formulae allows to evaluate >t).
Inparticular,
in
one then finds
that, min(p, 1 - p),
and some constantK,
for t »
(On
the otherhand, (1.2)
isoptimal
for t =b).
The bounds of
[H], [N],
are all obtained from theinequality
As
pointed
outby Hoeffding, (1.2)
is the best that can be obtained from( 1.4).
The purpose of thepresent
paper is to show how thetechnique
used toAnnales de l ’lnstitut Henri Poincaré - Probabilités et Statistiques
obtain
( 1.1 )
can be combined with the calculationsinvolving
the use of( 1.4)
to obtain under very mild extra conditions the
missing
factorexemplified by (1.3)
ininequalities
like(1.2), yielding
bounds oftruly optimal
order. Themethod we present is very
general.
We feel however that we serve better the readerby concentrating
on twosimple
results rather thanby trying
tobe
encyclopedic. (Adaptation
of the method to other cases isroutine.)
We consider the function
Thus
8(0)
=1/2,
and it is easy to see that 8 decreases for x > 0. It is shown in[I-K]
p. 17 and[K-J],
p. 505 thatWe consider the function
We denote
by
K a universal constant, that may vary at each occurrence.THEOREM 1.1. - Consider
independent
centered r,v. and let~2 - E( ~ Xi ) 2, p2 - ~2 ~n.
Assume thatXZ
bfor
all i n, and 2~_12~2
that |Xi| KB’ B for
all i n.Then, for
0 t KB, we haveComments. -
1)
SinceB(t/Q)
is of order it dominatesB/03C3
for therange of t considered. In
particular (1.6) yields
2)
For t =o(~/~-~~),
we have = +o(l).
This is the range where
(1.1)
holds.By
contrast, Theorem 1.1 holds forVol. 31,n° 4-1995.
much
larger
values of t. Itimplies
inparticular,
thatgiven
c > 0, there isa constant
K(c),
such that if0 t
then3)
Forsimplicity
we have made the blanketassumption B, although quite
less isrequired. However,
we do not know how toimprove
Theorem 3 of
[H]
without furtherassumption.
4) Certainly
one can obtain a rather small numerical value forK,
but numericalcomputations
are better left to others with the talent for it.THEOREM 1.2. - Consider
independent
r.v.(YZ)Zn
such that 0 ~ 1.Set tc =
n-1 03A3 EYi.
Thenfor
K t/ K,
we havein
Comments. -
1)
Thequantity
exp/~(1 2014 ~),
1 -)
isexactly
the bound(2.1 )
of[H] .
When the r.v. z have the common meantc, Theorem 1.2 is
closely
related to Theorem 1.1 as is seenby considering Xi
andusing
thepessimistic
bound o-{t).
Thepoint
of Theorem 1.2 is that the
variables
need not have a common mean.2)
When0-2
is of order theBerry-Essen
theorem allows toget
rid of therestriction t ~
K.The
point
of Theorems1.1,
1.2 is that for values of t up to ordercr2,
weget
both theoptimal exponential
term, and theessentially
correctfactor in front of it. In the situation of Theorem
1.2,
the worst case istc ==
1/2.
In that case, the exponent of theexponential
term is very closeto -2t2/n
for values of t of order up ton3~4 .
This isessentially
the range whereasymptotic expressions
like( 1.1 ) hold,
so much of thestrength
ofTheorem 1.2 is lost when one
replaces
the last termby
Thecorresponding corollary
is nonethelessworthy
to state.COROLLARY 1.3. - In the situation
of
Theorem 1.2 wehave, for
t >K,
thatAnnales de l’Institut Henri Poincaré - Probabilités et Statistiques
The paper is
organized
as follows. The basic observation ispresented
in Section 2. The
elementary
estimates needed to takeadvantage
of it arepresented
in Section3;
and the Theorems areproved
in Section 4.2. THE APPROACH For i
~
n, consider the functiongiven by
Since,
in this paper, we assume thatXi
isbounded, 03C8i
isinfinitely
differentiable. We
in The basic
inequality
is asfollows.
PROPOSITION 2.1. - Consider s > 0, and t =
~’ ( s ).
Thenwhere
pz
= and whereh(s)
isgiven by (2.4)
below.The term
h(s)
will be shown to be of lower order. Theapproach
is nowas follows. Given t > 0, to obtain a bound for
P(~ Xi
>t),
weapply
ia
(2.1)
where s is chosen so that t =~’(s).
We then find a lower bound for sand9" (s) (and
thus an upper bound forB(sp)).
We then observe thatand we note that the
quantity
M>0inf is boundedby
theHoeffding bounds,
since it is how these bounds are derived.To prove
Proposition
2.1, we follow Feller[F],
p. 554. We set S= 2: Xi.
i~n We consider the r.v.
"
so that ER = 1.
(The
Esschertransformation). Thus,
we can consider theprobability Q
such thatdQ
= RdP. Thus dP ==Vol. 31,n° 4-1995.
We have
where the last
equality
followsby integration by
parts. We now observe that the r.v.Xi
are stillindependent
whenQ
is the basicprobability.
Wecan thus
appeal
to theBerry-Essen
theorem as inFeller,
p.542,
to getwhere a
= J SdQ, h(s) = 6r(~)p ~(~),
and where
Since
by
differentiation we haveso that a = t.
Plugging (2.3)
into(2.2)
andintegrating by part again yields
Making
thechange
of variable u = t + pv, the last term is seen to beby
a standardcomputation.
To finish theproof,
it sufficesto see that
p2
=~" ( s ) .
This is seenby differentiating again
the relation(2.5).
aAnnales de l ’lnstitut Henri Poincaré - Probabilités et Statistiques
3. ESTIMATES The basic lemma is
completely elementary.
LEMMA 3.1. - Consider a
(bounded)
r.v. X with EX = 0,T2
=EX 2,
03B3 =
EX3.
Setp(s)
=log EesX.
ThenProof. - a) By computation,
we havecp"(s)
=g(s)(Eesx)-2,
where(This might
be thepoint
to observe thatg(s) >
0by Cauchy-Schwartz,
so that
(//’(~) ~ 0).
ThusThus, by Cauchy-Schwartz,
we haveg"(s)
>0,
so thatg’(s)
>g’ (0) =
q,and
g(s)
>g(0)
+ s-y =TZ
+ s~y.b)
Consider the functiong(s)
=E(XeSX).
Theng~3>(s)
= >0, so that
g"(s)
>g"(0)
= "I,g’(s)
>g’(0)
+ -ys =T2
+ andg(s)
>Tzs
+7zs~2.
c)
Consider the functionf(s)
= Thenf"(s)
=so that
f’(s) f’(0)
= and the resultby integration.
DLEMMA 3.2. - Assume that
for
some number B > 0, we have Bfor
all i n and
EX~ B2, E’X3
>-BEX2. Then, if
ssatisfies
t =we have
Vol. 31, n° 4-1995.
Proof of (3.1). -
We haveBy
Lemma3.1,
c, weget, setting a;
=EX 2
so that
In
particular,
sincelog(l
+x)
> x -x2/2
for x >0,
we haveProof of (3.2). - By
Lemma3.1, b,
wehave, setting
~y2 =~X~
By
Lemma3.1,
c,setting ((s) = (esB -
sB -1)/B~,
we haveThus, combining (3.4), (3.6), (3.7),
we haveAnnales de l’Institut Henri Poincaré - Probabilités et Statistiques
Denote
by f(s)
theright
hand side. Since~(s) (e - 2)~ s2
fors 1/B, ~(~) (e - 1)~ ~ 2s,
one sees that if0 s
so =1/3B, f’(s) > 0,
so thatf (s)
increases. Sinceq$1’
increases(as
noted in theproof
of Lemma
3.1),
it follows that if t =~%’(s) f (so),
wehave s
so, so that~(s) s2
and s Sincef (so)
>Q2~6B,
we have shown thatfor t
we have s(so
that inparticular
sB1/3).
Proof of (3.3). - By
Lemma3.1,
a, we haveBy
Lemma3.1,
c, and since sB 1, we haveso that
and
Combining
with(3.5)
and the fact that s2t/03C32 yield
the result. © Remark. - Rather thanassuming B,
one could obtain aweaker,
but sufficient result
assuming only
2.We are now
ready
for the main result.THEOREM 3.3. - There exists a universal constant K with the
following
property. Assume that
for
some number B > 0, the r. v.x~ satisfies
Then, ~r2 / K B,
we haveVol. 3i, n° 4-1995.
Proof. -
Let tsa2/6B,
and let ssatisfy t
=’lj;’ (s). Thus, by (3.2)
wehave
sB 1/3.
We observe thatThus, by (3.8),
we haveh(s)
We now show
that,
if~ ~/2~jB,
we haveIndeed,
Thus,
it suffices to prove thefollowing
lemma.LEMMA 3.4.
Proof. -
We observe that-B’(A)
=1 203C0 - 03BB03B8(03BB), so that
and the result follows
by elementary
estimates. D Remark. -Actually
lim-8’(~)
= and lim-~ZB’(~)
=It remains to observe that
But we have noted that
1/;" 2: 0,
sothat 9
is convex. And the minimumof
~(t6) 2014
ut occurs at thepoint
where the derivative is zero, i.e. for~’ (u)
= t. oAnnales de l ’lnstitut Henri Poincaré - Probabilités et Statistiques
4. PROOFS OF THE THEOREMS
Theorem 1.1 is a direct consequence of Theorem
3.3,
combined withHoeffding’ s
result thatWe now turn to the
proof
of Theorem 1.2. Forsimplicity
we set=
~,), 1 -
First we recall thatHoeffding
proves thatThus, recalling
thatXi = 5l - EYi by
Theorem3.3,
for some universalconstants
Ko, Ki,
we havethat,
for t0’2/ Ko
We now show that
By concavity
of the functionx(1 - x),
it suffices to show thatEYi(l - EY2 ) .
This follows from the fact thatfor - p x
1 - p, wehave x2 p(l- p) +x(1 - 2p),
sothat,
if-p X 1 - p
and EX = 0,we have
EX 2 p(l - p).
The main
difficulty
in theproof
of Theorem 1.2 is that it canhappen
that
o-2 tKo,
so that we cannot use(4.2).
The way around thisdifficulty
is to show in that case
that,
unless t is verysmall, using (1.2)
with b = 1yields
a better bound than(1.7). (Thus,
in that range(1.7)
is indeed weaker than(1.2).
Thepoint, however,
is that(1.7)
involvesonly
firstmoments.) By
directcomputation,
we observe theprecious
fact thatVol. 31, n° 4-1995.
In
particular
We
fix t,
and we assumeo-2 C tKo,
so thatp2 tKo/no
Forx ~ t/n,
we have
Thus, by (4.4)
andintegration
weget
and thus
On the other
hand,
for x~(1 - /~)/2,
we haveThus, by (4.5)
andintegration,
weget for t nu(1 - ~,)/2,
thatSince
9(A)
>(~/27r(l
+A))’B
the boundgiven by (1.7)
islarger
thanwhile, by (4.6),
the boundgiven by (1.2)
is less thanAnnales de l’lnstitut Henri Poincaré - Probabilités et Statistiques
It follows
that,
for some universal constantK,
if t> K, t
thenBZ Bl. (Observe
that thenThus,
we have shown thatprovided t 2: K,
we canalways
assumet so that
(4.3)
holds. We now observe that(provided Ko
islarge
enough)
the function .increases
for t > Ko
anda2 > Kot. Indeed,
we haveand the claim follows from Lemma 3.4. Theorem 1.2 is
proved.
DWe now prove
Corollary
1.3.By (4.5),
we haveso
that, by integration
since
(x
+a)3 - a3
is minimum at a =-x /2. Thus, by integration
By (4.2),
we haveVol. 31, n ° 4-1995.
If t 2: n7~8,
this is better than the bound of(1.8)
for nsufficiently large (which
we cancertainly assume).
Thus it suffices to consider the caset n7~8. For x t/n,
we haveThus, by (4.4) again
Thus, by (4.5),
and the argument of Theorem1.2,
it suffices to consider the case t Then(1.7)
holds.By (4.9),
we have2~/?T; and,
as shown in theproof
of Theorem2.1,
we have[B] G. BENNETT, Probability inequalities for the sum of independent random variables, Amer.
Stat. Assoc. J., 57, 1962, pp. 33-45.
[F] W. FELLER, An introduction to Probability Theory and its applications, Vol. II (second edition), J. Wiley and Sons, 1971.
[H] W. HOEFFDING, Probability inequalities for sums of bounded random variables, Amer.
Stat. Assoc. J., 58, 1963, pp. 13-30.
[K-J] S. KOTZ and N. L. JOHNSON, Encyclopedia of Statistical Sciences, Wiley, New York, 1982.
[I-K] K. ITO and H. P. MCKEAN, Diffusion processes and their sample paths, 1974, Springer Verlag.
[N] S. V. NAGAEV, Large deviations of sums of independent random variables, Ann. Probab., 7, 1977, pp. 745-789.
[P-U] I. S. PINELIS and S. A. UTEV, Exact exponential bounds for sums of independent random variables, Theory Probab. Appl., 1990, pp. 340-345.
[P] YU V. PROKHOROV, An extremal problem in probability theory, Theory of Probability
and its applications, 4, 1959, pp. 201-3.
(Manuscript received September 24, 1993;
revised June 30, 1994. )
Annales de l’lnstitut Henri Poincaré - Probabilités et Statistiques REFERENCES