JOURNAL DE THÉORIE DES NOMBRES DE BORDEAUX
MICHEL DEKKING
PETER VAN DER WAL
Uniform distribution modulo one and binary search trees
Journal de Théorie des Nombres de Bordeaux, tome 14, no. 2 (2002), p. 415-424
<http://www.numdam.org/item?id=JTNB_2002__14_2_415_0>
© Université Bordeaux 1, 2002, all rights reserved.
Access to the archives of the journal « Journal de Théorie des Nombres de Bordeaux » (http://jtnb.cedram.org/) implies agreement with the general terms of use (http://www.numdam.org/conditions). Any commercial use or systematic printing constitutes a criminal offence. Any copy or printout of this file must contain the present copyright notice.
Article digitized as part of the programme Numérisation de documents anciens mathématiques
by MICHEL DEKKING and PETER VAN DER WAL
On the occasion of the 65th birthday of Michel Mendès France
RÉSUMÉ. From a sequence $x = (x_k)_{k=1}^{\infty}$ of distinct numbers in the interval $[0,1]$ one can build a binary tree by placing these numbers successively at the nodes according to a "left-right" algorithm (this amounts to sorting the numbers with the Quicksort algorithm). Let $H_n(x)$ denote the height of the tree obtained from the numbers $x_1, \dots, x_n$. It is evident that $\log n/\log 2 - 1 \le H_n(x) \le n - 1$. If the sequence $x$ is obtained as values of independent random variables uniform on $[0,1]$, then it is known that there exists $c > 0$ such that $H_n(x) \sim c \log n$ ($n \to \infty$) almost surely. Recently, Devroye and Goudjil showed that if the $x$ are Weyl sequences, i.e., $x_k = \{\alpha k\}$, $k = 1, 2, \dots$, where $\alpha$ is a uniform random variable on $[0,1]$, then $H_n(x) \sim (12/\pi^2) \log n \log\log n$, $n \to \infty$, in probability. In this paper we consider the class of all uniformly distributed sequences $x$, for which we show that necessarily $H_n(x) = o(n)$ as $n \to \infty$.

ABSTRACT.
Any sequence $x = (x_k)_{k=1}^{\infty}$ of distinct numbers from $[0,1]$ generates a binary tree by storing the numbers consecutively at the nodes according to a left-right algorithm (or equivalently by sorting the numbers according to the Quicksort algorithm). Let $H_n(x)$ be the height of the tree generated by $x_1, \dots, x_n$. Obviously $\log n/\log 2 - 1 \le H_n(x) \le n - 1$. If the sequences $x$ are generated by independent random variables having the uniform distribution on $[0,1]$, then it is well known that there exists $c > 0$ such that $H_n(x) \sim c \log n$ as $n \to \infty$ for almost all sequences $x$. Recently Devroye and Goudjil have shown that if the sequences $x$ are Weyl sequences, i.e., defined by $x_k = \{\alpha k\}$, $k = 1, 2, \dots$, and $\alpha$ is distributed uniformly at random on $[0,1]$, then $H_n(x) \sim (12/\pi^2) \log n \log\log n$ as $n \to \infty$ in probability. In this paper we consider the class of all uniformly distributed sequences $x$, and we show that the only behaviour that is excluded by the equidistribution of $x$ is that of the worst case, i.e., for these $x$ we necessarily have $H_n(x) = o(n)$ as $n \to \infty$.

1. Introduction
The starting point for the present work is a paper published in 1981, written by Michel Mendès France and the first author, titled "Uniform distribution modulo one: a geometric viewpoint" ([3]). The central idea of that paper is to associate to any sequence $x = (x_k)$ of real numbers a curve $\Gamma(x)$ in the (complex) plane by putting $\Gamma(x) = \gamma((0, \infty))$, where [...] and $\gamma(t)$ is linear [...]. The curve $\Gamma(x)$ gives finer information on the sequence $x$ than if one considers $x$ as a subset $\{x_1, x_2, \dots\}$ of [...]. See e.g. Figure 1, which displays a part of $\Gamma((\pi k^2))$.
FIGURE 1. $\Gamma((\pi k^2))$.
As noted in [3] the subpatterns of $\Gamma$ which appear on the spiral consist of 113 segments, the numerator of one of the continued fraction convergents of x. This has been further explained in [4] and [1].
The first author has been involved for the last few years in the study of trees generated by algorithms (see e.g. [2]). Two important algorithms, Quicksort [...] trees. In this paper we shall consider the question of what one can say about the shape of trees generated by the binary search tree algorithm when the input sequence $x$ is equidistributed; in other words, we vary on the title above: uniform distribution modulo one - a viewpoint from a tree.

2. Binary search trees

A binary search tree is constructed from a sequence of distinct real numbers, called the keys, as follows. The first key is placed at the top of the tree, called the root. The next key is directed to the left subtree if it is smaller than the root key, and to the right subtree if it is larger. In general, keys enter the left or right subtree through the root node and then the process of turning left and right is repeated until the end of a branch is reached. We illustrate this with an example.
FIGURE 2. Building the tree from the sequence .5, .2, .4, .6, .3, .1.
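The left-right insertion rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the paper: the function name `insert_levels` and the dictionary-based node storage are assumptions made here, and the keys are those of Figure 2.

```python
def insert_levels(keys):
    """Insert distinct keys one by one; return {key: level}, root at level 0.

    Hypothetical helper for illustration; assumes all keys are distinct.
    """
    children = {}   # key -> [left child key, right child key]
    levels = {}
    root = None
    for key in keys:
        if root is None:                    # the first key becomes the root
            root = key
            children[key] = [None, None]
            levels[key] = 0
            continue
        node = root
        while True:
            side = 0 if key < node else 1   # 0 = turn left, 1 = turn right
            child = children[node][side]
            if child is None:               # end of a branch reached
                children[node][side] = key
                children[key] = [None, None]
                levels[key] = levels[node] + 1
                break
            node = child
    return levels


levels = insert_levels([.5, .2, .4, .6, .3, .1])
height = max(levels.values())   # largest occupied level
```

For these six keys the deepest key is .3, which turns left at .5, right at .2 and left at .4, so it sits at level 3.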
Consider the sequence of keys $x_1 = .5$, $x_2 = .2$, $x_3 = .4$, $x_4 = .6$, $x_5 = .3$, $x_6 = .1$: the first key is .5, and it will be placed at the root of the tree. The second key, .2, goes to the left subtree because .2 < .5. The third key, .4, first goes to the left subtree, because .4 < .5. After this it gets compared to .2, and because .4 > .2, goes to the right. This process continues until all keys are placed in the tree, and we have reached the final tree in Figure 2.

We shall denote by $T(x)$ the tree generated by a sequence $x$ with elements from $[0,1]$, and by $T_n(x)$ the tree generated by the first $n$ elements $x_1, \dots, x_n$.

Since the nodes at each level double in number, the ordinary representation of the tree will quickly get messy. We therefore choose a special embedding of the binary tree in the (complex) plane, where the branches are scaled by a factor $1/\sqrt{2}$ at each level. More precisely, if we code a level $n$ node $v$ by the vector $(d_1, \dots, d_n)$, where $d_k = 1$ when the level $k$ node on the unique path from $v$ to the root is a right son of his father, and $d_k = -1$ if it is a left son, then the embedding is given by
[...]
FIGURE 3. Embedding a binary search tree in the plane.
When we apply this algorithm to the first 1500 elements of the sequence $x = (\{\pi k^2\})$ (here $\{y\}$ denotes the fractional part of a number $y$), and use this embedding, we obtain Figure 4.
FIGURE 4. $T_{1500}((\{\pi k^2\}))$.
Apparently we do not now see the structure imposed by the continued fraction expansion of $\pi$ as in Figure 1. However, we will see this structure in [...] (see Figure 5), and this will be explained in the next section.

An important characteristic of a binary search tree is the height $H_n(x)$ of $T_n(x)$. Here the height is defined as the largest level that is occupied by a key (the level of the root is 0). Obviously we have for all $x$ and all $n$
$$\frac{\log n}{\log 2} - 1 \le H_n(x) \le n - 1.$$
It is customary for the analysis of the algorithm to assume that the keys are generated by independent random variables having the uniform distribution on $[0,1]$. For this so-called random permutation model almost all sequences behave essentially in the same way:

Theorem 1 (cf. [5, 9]). Under the random permutation model $H_n(x) \sim c \log n$ as $n \to \infty$ almost surely for some $c > 0$ ($c = 4.31107\ldots$).
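As a hedged numerical companion to Theorem 1 and to the trivial bounds above, the following sketch builds trees from pseudo-random keys and compares the height with $\log n$. The helper names `bst_height` and `random_height`, the seed and the sample sizes are choices made here for illustration; a finite simulation only suggests logarithmic growth and says nothing rigorous about the constant $c$.

```python
import math
import random


def bst_height(keys):
    """Height of the binary search tree built by inserting distinct keys in order."""
    keys = list(keys)
    root = keys[0]
    children = {root: [None, None]}   # key -> [left child, right child]
    height = 0
    for key in keys[1:]:
        node, level = root, 0
        while True:
            side = 0 if key < node else 1
            level += 1
            child = children[node][side]
            if child is None:          # free position found at this level
                children[node][side] = key
                children[key] = [None, None]
                break
            node = child
        height = max(height, level)
    return height


def random_height(n, seed=0):
    """Height of the tree built from n pseudo-random uniform keys (assumed distinct)."""
    rng = random.Random(seed)
    return bst_height([rng.random() for _ in range(n)])


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        h = random_height(n)
        # Ratio to the natural logarithm, to be compared informally with c.
        print(n, h, round(h / math.log(n), 2))
```

A sorted input such as `bst_height(range(1, 51))` realises the upper bound $n - 1$, while a perfectly balanced insertion order such as `[4, 2, 6, 1, 3, 5, 7]` gives height 2.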
Now typically, in fact almost surely, the sequences $x$ generated by the random permutation model are uniformly distributed. A natural question is therefore what can be said about $H_n(x)$ for an arbitrary equidistributed sequence $x$.
FIGURE 5.

3. Binary search trees generated by Weyl sequences

To answer the previous question we can profit from a nice analysis performed by Devroye and Goudjil ([6]) of a subclass of equidistributed sequences [...]. They are able to give a precise description of the height of the tree generated by such sequences.

Theorem 2 ([6]). Let $\alpha$ be an irrational number in $[0,1]$ with continued fraction expansion $\alpha = [a_1, a_2, \dots]$, and convergents $p_n/q_n$. Then
[...]
and
[...]

Devroye and Goudjil deduce from this that for most (i.e., almost all if $\alpha$ is chosen according to the uniform distribution on $[0,1]$) sequences the height behaves as
$$H_n(x) \sim \frac{12}{\pi^2} \log n \log\log n,$$
where the convergence is in probability as $n \to \infty$. They furthermore deduce the following result on "large height" behaviour.

Theorem 3 ([6]). Let $(h_n)$ be a monotone sequence of real numbers decreasing to 0 at any slow rate. Then there exists $\alpha$ such that
(2) [...] $n h_n$
for infinitely many $n$.

Clearly (1) above implies that "small height" behaviour is obtained for $\alpha = \tau := \tfrac{1}{2}(\sqrt{5} - 1)$. We then have
[...]
See Figure 6 for the embedding of this tree.

Our goal will be to show that if one takes arbitrary equidistributed sequences $x$, one cannot do better than in (2), i.e., $H_n(x) = o(n)$, but it is possible to fill the gap between [...] and [...]. This will be done in the next two sections.

4. The height of trees generated by uniformly distributed sequences
We first show that denseness of a sequence $x$ already forces some regularity on $T(x)$.
Proposition 4. Let $x = (x_k)$ be a dense sequence of distinct numbers in $[0,1]$. Then for each $\varepsilon > 0$ there exists $N$ such that all level $N$ nodes in $T(x)$ are occupied, and such that any two neighbours $v$ and $\bar v$ on level $N$ have keys that differ by less than $\varepsilon$.
FIGURE 6. [...]
Proof. Fix an $\varepsilon > 0$. By the denseness of $(x_k)$, between any two $x_n$ and $x_m$ an $x_k$ will appear for some $k > n, m$ (and numbers arbitrarily close to 0 and 1 will appear). Clearly this implies that all levels will be filled out in the end. Now let $n_0$ be so large that for any $x_k$ in $\{x_1, x_2, \dots, x_{n_0}\}$ there exists $x_l$ such that [...]. Next, choose $N$ such that $N > H_{n_0}$; then obviously level $N$ is as yet completely empty. According to the observation at the beginning of this proof, there exists $n_1$ such that the complete level $N$ is occupied by keys from $\{x_1, \dots, x_{n_1}\}$. We claim that for any two neighbours $v$ and $\bar v$ of this level (say $v$ to the left of $\bar v$) their keys $x_k$ and $x_{\bar k}$ differ by less than $\varepsilon$.

Suppose not, i.e., suppose that $|x_k - x_{\bar k}| \ge \varepsilon$. Let $v \wedge \bar v$ be the closest common ancestor of $v$ and $\bar v$, i.e., $v \wedge \bar v$ is on the path from $v$ and that from $\bar v$ to the root, and its level is maximal with this property. Let $x_j$ be the key at $v \wedge \bar v$. Then clearly $x_k < x_j < x_{\bar k}$. Let $x_l$ be the smallest number so that $l \le n_0$ and [...] (since [...] $\varepsilon$, either $x_l$ exists, or there will exist a largest number $x_r$ with $x_j < x_r < x_{\bar k}$ and the proof will proceed similarly). Since $l \le n_0 < k$, and since there are no other $x_m$ between $x_k$ and $x_l$, $x_l$ has been assigned to a node $v^*$ on the path from the root to $v$. Then there are two possibilities (writing $v' \prec v''$ if two nodes $v'$ and $v''$ are on the same path to the root and the level of $v'$ is smaller than the level of $v''$): $v^* \prec v \wedge \bar v$ or $v \wedge \bar v \prec v^*$.

Case 1: $v^* \prec v \wedge \bar v$. [...] and $v$ are on the same path: since $x_l$ has gone to the right subtree of $v^*$, but since $x_k < x_l$, $x_k$ has gone to the left subtree of $v^*$.

Case 2: $v \wedge \bar v \prec v^*$. Then we get a contradiction with the fact that $v$ and $\bar v$ are neighbouring nodes: now $x_k$ must have gone to the left subtree both at $v \wedge \bar v$ and at $v^*$. □
We next show that equidistribution of $x$ gives still more structure on $T(x)$.
Theorem 5. If $x = (x_k)_{k \ge 1}$ is a uniformly distributed sequence in $[0,1]$, then $H_n(x) = o(n)$ as $n \to \infty$.

Proof. Suppose $H_n(x) \ge c\,n$ infinitely often for some $c > 0$. Choose $N'$ such that the first node of level $N'$ has a key which is smaller than $c/2$, and the last node of level $N'$ has a key which is larger than $1 - c/2$. By the proposition above there is a level $N > N'$, such that the keys coming from $x$ at level $N$, which we denote by $y_1 < y_2 < \dots < y_{2^N}$, have the property that
[...]
For [...], where $n$ is large, we define
[...] = height of the subtree rooted at the node with key $y_j$.
Then there is a $j$ such that
[...]
for infinitely many $n$. Keys $x_k$ which will be moved to the subtree rooted at the $j$th node (with label $y_j$) should at least satisfy [...] $x_k < y_{j+1}$, hence this implies that infinitely often
[...]
This contradicts the fact that the left hand side converges to [...], which is strictly smaller than $c$. □
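Theorem 5 can be probed numerically, with the usual caveat that a finite computation proves nothing about the limit. The sketch below builds trees from the equidistributed Weyl sequence $(\{\tau k\})$ with $\tau = \tfrac{1}{2}(\sqrt{5} - 1)$, the value mentioned in Section 3, and records $H_n/n$. The helper names and the values of $n$ are illustrative assumptions, and floating-point arithmetic stands in for exact fractional parts.

```python
import math


def bst_height(keys):
    """Height of the binary search tree built by inserting distinct keys in order."""
    keys = list(keys)
    root = keys[0]
    children = {root: [None, None]}   # key -> [left child, right child]
    height = 0
    for key in keys[1:]:
        node, level = root, 0
        while True:
            side = 0 if key < node else 1
            level += 1
            child = children[node][side]
            if child is None:
                children[node][side] = key
                children[key] = [None, None]
                break
            node = child
        height = max(height, level)
    return height


def weyl_sequence(alpha, n):
    """First n terms {alpha * k}, k = 1..n, of a Weyl sequence (floating point)."""
    return [(alpha * k) % 1.0 for k in range(1, n + 1)]


TAU = (math.sqrt(5) - 1) / 2   # golden-ratio conjugate

# Ratio H_n / n for a few illustrative tree sizes.
ratios = {n: bst_height(weyl_sequence(TAU, n)) / n for n in (250, 500, 1000, 2000)}
```

For this particular sequence the ratios are expected to be small and to shrink as $n$ grows, in line with $H_n(x) = o(n)$; other equidistributed sequences may approach zero far more slowly.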

5. Small height

Let $x = (x_1, x_2, \dots) = (\tfrac12, \tfrac14, \tfrac34, \tfrac18, \tfrac58, \tfrac38, \dots)$ denote the van der Corput sequence. Fix $c$ in the interval $(1, 2)$ and define an index sequence $i(c) = (i_1, i_2, \dots)$ by
[...]
Note that the sequence $i(c)$ is strictly increasing. We obtain a sequence $y(c)$ in the following way. We first place the entry $x_{2^{n-1}} = 2^{-n}$ at position $i_n$ in $y(c)$ for $n = 1, 2, \dots$ and then place the remaining entries of $x$ along the open positions, in the same order they appeared in $x$. Note that $y(2)$ is again equal to the van der Corput sequence.

FIGURE 7. [...]

Lemma 6. For $c \in (1, 2]$, the sequence $y(c)$ is equidistributed on the unit interval and
[...]
From this lemma it immediately follows that
[...]

Proof. Note that all sequences $y(c)$ generate the same binary search tree and that at time $n$, the height of the tree is equal to the number of powers of $\tfrac12$ stored in the tree minus 1. Hence, [...]

To see that the sequence $y(c)$ is equidistributed, note that the number of entries in $(x_1, \dots, x_n)$ that do not appear in $(y_1, \dots, y_n)$ is at most
[...]
where we used that $x_1 = y_1$. Hence, for $0 \le a < b \le 1$,
[...]
Since [...], it follows that $\frac{1}{n}\,|\{k \le n : a \le y_k < b\}|$ converges to $b - a$ and hence the sequence $y(c)$ is equidistributed. □
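The height statement used in the proof can be checked directly for the van der Corput sequence itself (the case $c = 2$). The sketch below is an illustration under stated assumptions: `van_der_corput` computes the base-2 radical inverse, which reproduces the terms listed at the start of this section, and both function names are chosen here. The rearranged sequences $y(c)$ are not constructed.

```python
def van_der_corput(k):
    """k-th term (k >= 1) of the base-2 van der Corput sequence.

    The binary digits of k are mirrored about the binary point,
    e.g. k = 6 = 110 in binary gives 0.011 in binary, i.e. 3/8.
    """
    x, weight = 0.0, 0.5
    while k:
        if k & 1:
            x += weight
        weight /= 2
        k >>= 1
    return x


def bst_height(keys):
    """Height of the binary search tree built by inserting distinct keys in order."""
    keys = list(keys)
    root = keys[0]
    children = {root: [None, None]}   # key -> [left child, right child]
    height = 0
    for key in keys[1:]:
        node, level = root, 0
        while True:
            side = 0 if key < node else 1
            level += 1
            child = children[node][side]
            if child is None:
                children[node][side] = key
                children[key] = [None, None]
                break
            node = child
        height = max(height, level)
    return height


first_terms = [van_der_corput(k) for k in range(1, 7)]
# Heights H_n of the trees built from x_1, ..., x_n for n = 1, ..., 64.
heights = [bst_height([van_der_corput(k) for k in range(1, n + 1)])
           for n in range(1, 65)]
```

In this range the computed heights equal $\lfloor \log_2 n \rfloor$, which is the number of powers of $\tfrac12$ among $x_1, \dots, x_n$ minus 1, since $x_{2^{m-1}} = 2^{-m}$.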

References
[1] M. V. BERRY, J. GOLDBERG, Renormalisation of curlicues. Nonlinearity 1 (1988), 1-26.
[2] F. M. DEKKING, S. DE GRAAF, L. E. MEESTER, On the node structure of binary search trees. In Mathematics and Computer Science - Algorithms, Trees, Combinatorics and Probabilities (Eds. Gardy, D. and Mokkadem, A.), pages 31-40. Birkhäuser, Basel, 2000.
[3] F. M. DEKKING, M. MENDÈS FRANCE, Uniform distribution modulo one: a geometrical viewpoint. J. Reine Angew. Math. 329 (1981), 143-153.
[4] J.-M. DESHOUILLERS, Geometric aspect of Weyl sums. In Elementary and analytic theory of numbers (Warsaw, 1982), pages 75-82. PWN, Warsaw, 1985.
[5] L. DEVROYE, A note on the height of binary search trees. J. Assoc. Comput. Mach. 33 (1986), 489-498.
[6] L. DEVROYE, A. GOUDJIL, A study of random Weyl trees. Random Structures Algorithms 12 (1998), 271-295.
[7] C. A. R. HOARE, Quicksort. Comput. J. 5 (1962), 10-15.
[8] H. M. MAHMOUD, Evolution of random search trees. Wiley-Interscience Series in Discrete Mathematics and Optimization. John Wiley & Sons Inc., New York, 1992.
[9] B. PITTEL, Asymptotical growth of a class of random trees. Ann. Probab. 13 (1985), 414-427.
Michel DEKKING, Peter VAN DER WAL
Thomas Stieltjes Institute for Mathematics and Delft University of Technology
Faculty ITS, Department CROSS
Mekelweg 4, 2628 CD Delft The Netherlands