Uniform distribution modulo one and binary search trees

(1)

J

OURNAL DE

T

HÉORIE DES

N

OMBRES DE

B

ORDEAUX

M

ICHEL

D

EKKING

P

ETER

V

AN DER

W

AL

Uniform distribution modulo one and binary search trees

Journal de Théorie des Nombres de Bordeaux, tome

14, n

o

_{2 (2002),}

p. 415-424

<http://www.numdam.org/item?id=JTNB_2002__14_2_415_0>

L’accès aux archives de la revue « Journal de Théorie des Nombres de Bordeaux » (http://jtnb.cedram.org/) implique l’accord avec les condi-tions générales d’utilisation (http://www.numdam.org/conditions). Toute uti-lisation commerciale ou impression systématique est constitutive d’une infraction pénale. Toute copie ou impression de ce fichier doit conte-nir la présente mention de copyright.

Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques

(2)

Uniform distribution modulo

one

and

binary

search

trees

par MICHEL DEKKING et PETER VAN DER WAL

On the occasion of the 65th _{birthday of}Michel France

RÉSUMÉ. On _peutconstruire à _partird’une suite x =

(xk)~k=1

de

nombres distincts de l’intervalle

_[0,1]

un arbre binaire en

_plaçant

successivement ces nombres sur les noeuds selon un

_algorithme

"gauche-droite"

(cela

revient à classer les nombres selon

l’algori-thme

_Quicksort).

On note

_Hn(x)

la hauteur de l’arbre obtenu à

partir

des nombres x1, ... , xn . Il est évident que

logn/log2-1 ~ Hn(x) ~ n-1.

Si la suite x est obtenue comme valeurs de variables aléatoires

indépendantes

uniformes sur

[0,1],

alors on sait

qu’il

existe c > 0

tel _que

_{Hn (x) ~}

_{c log n,}

_(n

~

_~),

_{presque-sûrement Récemment,}

Devroye

et

_Goudjil

ont montré que si les x sont les suites de

_Weyl,

i.e.,

xk =

{03B1k},

k =

1,2,...,

où 03B1 est une variable aléatoire uniforme sur

[0,1],

alors

Hn (x) ~

(12/03C02)log n log log n, n ~

~,

en

probabilité.

Dans ce

_papier

nous considérons la classe de toutes les suites x

uniformément

_réparties

_pour

_lesquelles

nous montrons _quel’on a nécessairement

_{Hn (x)}

=

o(n)

quand n

~ ~.

ABSTRACT.

_Any

_{sequence x}=

(xk)~k=1

of distinct numbers from

[0,1]

generates a

binary

tree

_{by storing}

the numbers

_{consecutively}

at the nodes

_according

to a

left-right algorithm

(or

equivalently

by sorting

the numbers

_according

to the

_Quicksort

_algorithm).

Let

_Hn(x)

be the

_height

of the tree

_generated

_by

xl, ... , xn.

Ob-viously

log n/log 2 -1 ~ Hn (x) ~ n - 1.

If the _{sequences x}are

generated by independent

random variables

having

the uniform distribution on

[0, 1],

then it is well known that there exists c > 0 such that

_{Hn (x) ~}

_{c log n}

as n ~ ~ for almost all sequences x.

_Recently

_Devroye

and

Goudjil

have shown that

(3)

if the _{sequences x}are

_Weyl

_sequences,

_i.e.,

defined

_by

xk =

{03B1k},

k =

_{1, 2, ... ,}

and 03B1 is distributed

_uniformly

at random on

[0, 1]

then

Hn(x) ~

(12/03C02)log n log log n

as n ~ ~ in

probability.

In this _paperwe consider the class of all

uniformly

distributed

sequences x, and we show that the

_only

behaviour that is excluded

by

the

_{equidistribution}

of x is that of the worst case, i.e., for these x we

necessarily

have

Hn (x)

=

o(n)

as n ~ ~.

1. Introduction

The

_{starting point}

for the

_present

work is a _paper

published

in

1981,

written

_by

Michel Mend6s France and the first author titled "Uniform dis-tribution modulo one: a

_{geometric viewpoint"}

([3]).

The central idea of that

paper is to associate to any sequence

(xk)

of real numbers a curve

r(x)

in

the

_(complex)

_plane

_{by putting}

_r(x)

=

7((0, oo)),

where

and

_1’(t)

is linear The curve

r(x)

_gives

finer information on the _{sequence x}than if one considers x as a subset

~2? -" }

of See

e.g.

Figure

1 which

displays

a

_part

of

r((7rk 2)).

FIGURE 1.

_{r((7T~)) .}

As noted in

_[3]

the

_subpatterns

of r which _appearon the

spiral

consist of

113

_segments,

the numerator of one of the continued fraction

_convergents

of x. This has been further

explained

in

[4]

and

[1].

The first author has been involved the last few years in the

study

of trees

generated by algorithms

(see

e.g.

[2]).

Two

important

algorithms,

Quicksort

(4)

trees. In this paper we shall consider the

_question

what one can _sayabout

the

_shape

of trees

_{generated by}

the

_binary

search tree

_algorithm

when the

input

sequence x is

equidistributed,

in other words we _varyon the title

above: uniform distribution modulo one - a

_viewpoint

from a tree.

2.

_Binary

search trees

A

_binary

search tree is constructed from a _sequenceof distinct real

num-bers,

called the

_keys,

as follows. The first

key

is

placed

at the

_top

of the

tree,

called the root. The next

_key

is directed to the left subtree if it is smaller than the root

_key,

and to the

_right

subtree if it is

_larger.

In

_general,

keys

enter the left or

_right

subtree

_through

the root node and then the

process of

turning

left and

right

is

repeated

until the end of a branch is

reached. We illustrate this with an

example.

FIGURE 2.

_Building

the tree from the _sequence

_{.5, .2, .4, .6, .3, .1.}

Consider the sequence of

keys

x1 -

.5,

X2 =

.2,

X3 =

.4,

X4 =

.6, z5 =

.3, xs

= .1: the first

key

is

.5,

and it will be

placed

at the root of the tree.

The second

_key,

_.2,

goes to the left subtree because .2 .5. The third

_key,

.4,

first _goesto the left

_subtree,

because .4 .5. After this it

_gets

_compared

to

_.2,

and because .4 >

_.2,

_goesto the

_right.

This _processcontinues until all

keys

are

placed

in the

_tree,

and we have reached the final tree in

Figure

2.

We shall denote

_{T (x)}

the tree

_{generated by}

a _{sequence x}with elements

from

_~0,1~,

and

_{Tn (z)}

the tree

_{generated by}

the first n elements _{xl, ... , xn.}

Since the nodes at each level double in

_number,

the

_ordinary

_{representation}

of will

_quickly

_get

messy. We therefore choose a

special

embedding

of the

_binary

tree in the

_(complex)

_plane,

where the branches are scaled

_by

a factor

vlr2-

at each level. More

precisely,

if we code a level n node v

by

the vector

_{(dl, ... , dn),}

_{where dk}

= 1 when the level k node _onthe

_unique

path

from v to the root is a

right

son of his

father,

and

dk - -1

if it is a

left _son,then the

_embedding

is

_{given by}

(5)

FIGURE 3.

_Embedding

a

_binary

search tree in the

_plane.

When we

_apply

this

algorithm

to the first 1500 elements of the sequence

x =

(17rk 21)

(here

{y}

denotes the fractional

_part

of _anumber

y),

and use

this

_embedding

we obtain

_Figure

4.

FIGURE 4.

_{T,500 (firk 21).}

Apparently

we do not see now the structure

_{imposed by}

the continued fraction

_expansion

of 7r as in

Figure

1.

However,

we mill see this structure in

_(see

_Figure

_5),

and this will be

_explained

in the next section.

An

_important

characteristic of a

binary

search tree is the

_height

_Hn(x)

of

_7~(~c).

Here the

_height

is defined as the

_largest

level that is

_occupied

_by

(6)

a

_key

(the

level of the root is

0).

Obviously

we have for all x and all n

It is

_customary

for the

_analysis

of the

_algorithm

to assume that the

_keys

are

generated

by independent

random variables

_having

the uniform distribution

on

_(0,1~.

For this so called random

permutation

model almost all _sequences

behave

_essentially

in the same _way:

Theorem 1

_(cf.

_{[5, 9]).}

Under the random

_permutation

model

_{Hn(x) ~}

c log n

as n --~ oo almost

surely for

some c > 0

_(c

= 4.31107...

).

Now

_typically,

in fact almost

_surely,

the _{sequences x}

_{generated by}

the random

_permutation

model are

uniformly

distributed. A natural

_question

is therefore what can be said about

Hn(x)

for an

arbitrary

equidistributed

sequence x.

FIGURE 5.

3.

_Binary

search trees

_{generated by Weyl}

_sequences

To answer the

_{previous question}

we can

profit

from a nice

_analysis

per-formed

_{by Devroye}

and

_Goudjil

_([6])

of a subclass of

_{equidistributed}

(7)

They

are able to

_give

a

_precise

description

of the

height

of the tree

generated

by

such _sequences.

Theorem 2

_([6]).

Let a be an irrational numbers in

[0, 1]

with continued

fraction

expansions

a =

~al, a2, ... ~,

and

_convergents

Pn/qn.

Then

and

Devroye

and

_Goudjil

deduce from this that for most

_(i.e.,

almost all if

a is chosen

_according

to the uniform distribution on

[0, 1])

_sequencesthe

height

behaves as

where the convergence is in

probability

as n ~ 00.

They

furthermore deduce the

_following

result on

_"large

height"

behaviour.

Theorem 3

_([6]).

Let

_(hn)

be a monotone sequence

of

real numbers

de-creasing

0 at _anyslow rate. Then there exists a such that

(2)

nhn

for infinitely

many n.

Clearly

(1)

above

_implies

that "small

_height"

behaviour is obtained for

0: = _T

:= ~(B/5 2013 1).

We then have

See

_Figure

6 for the

_embedding

of this tree.

Our

_goal

will be to show that if one takes

_{arbitrary equidistributed}

se-quences x, one can not do better than in

_(2),

_i.e.,

=

o(~c),

but it is

possible

to _{fill the gap between}

I~

and This will be done in the next two sections.

4. The

_height

of trees

_{generated by uniformly}

distributed

sequences

We first show that denseness of a _{sequence x}

_already

forces some

regu-larity

on

T(x).

Proposition

4. Let be a dense _sequence

of

distinct numbers in

~0,1~.

Then

_for

each - >

0

there exists N such that all level N nodes in

are

occupied,

and such that _anytwo

neighbours

v and v on level N

(8)

FIGURE 6. +

~)k~) .

Proof.

Fix an E > 0.

_By

the denseness of

_(xk),

between any two Xn and

zm an Xk will appear for some k > _n,m

(and

numbers

_arbitrarily

close to 0 and 1 will

_{appear) .}

_Clearly

this

_implies

that all levels will be filled out in the end. Now let no be so

_large

that for _any_xkin

{~1,~2?’’ -

there

exists Xl such that

§-

Next,

choose N such that N >

_HnO,

then

obviously

level N is

_yet

_completely

_empty.

_According

to the observation at the

_beginning

of this

_proof,

there exists _nlsuch that the

_complete

level N is

_{occupied by keys}

from We claim that for _anytwo

_neighbours

v and E, of this level

(say v

to the left of

v)

their

_keys

_xk

_{and xk}

differ less

than E.

_Suppose

_not,

_i.e.,

_supposethat

lXk -

ê. Let v A P be the

closest common ancestor of v and

_v,

is on the

path

from v and

that from v to the

_root,

and its level is maximal with this

_property.

Let xj be the

key

at v A ÎÍ. Then

clearly

x~ _{xk .}

Let xi

be the smallest number so that 1 ~ no and

(since

ê, either xt

exists,

or there will exist a

_largest

number x,

with Xj

zr _xiéand the

_proof

will

_proceed

_{similarly) .}

Since I no

k,

and since there are no other xm

between Xk

and Xl, xe has been

assigned

to

a node v* on the

path

from the root to v. Then there are two

_{possibilities}

(writing

v’

v"

if two nodes v’ and v" are on the same

_path

to the root and the level of v’ is smaller than the level of

_{v") :}

v* v 1B ¡; or v 1B ¡; v*.

(9)

and v are on the same

path:

since _Xl has _goneto the

_right

subtree of

_v*,

but since _xk _XI,xk has gone to the

le f t

subtree of v*.

Case 2: v 1B v v* . Then we

_get

a contradiction with the fact that v and

v are

_neighbouring

nodes: _{now Xk must}have _goneto the left subtree both

at v 1B v and at v * . 0

We next show that

_{equidistribution}

of x

_gives

still more structure on

T(x).

Theorem 5.

_{If X}

=

(Xk)k>l

_is_a

_{uniforml y}

distributed

sequence in

~0,1~,

then =

o(n)

_as_{n -}_oo.

Proo, f. Suppose

c ~ n

infinitely

often for some c > 0.

Choose N’ such that the first node of level N’ has a

key

which is smaller

than

_c/2,

and the last node of level

N’

has a

_key

which is

_larger

than

1 -

_c/2.

_By

the

_proposition

above there is a level N >

N’,

such that the

keys coming

from x at level

_N,

which we denote

_by

_yl _y2 ... y2N

have the

_property

that

For where n is

large,

we define

=

height

of the subtree rooted _atthe node with

key

y;.

Then there is

_{a j}

such that

for

_infinitely

_manyn.

Keys zk

which will be moved to the subtree rooted at the

_jth

node

_(with

label

_yj)

should at least

_satisfy

_xx _Yj+l,hence this

_implies

that

infinitely

often

This contradicts the fact that the left hand side _convergesto which is

_strictly

smaller than c.

D

5. Small

_height

Let x =

_{(~1,~2?. ... )}

=

(3? ~ ~ ~ §? §) -

... )

denote the van der

Corput

sequence. Fix c in the interval

( 1, 2)

and define an index _sequence

I(c)

=

(21, 22, ’ ... )

by

,

Note that the sequence

i(c)

is

strictly increasing.

We obtain a _sequence

(10)

FIGURE 7.

following

way. We first

place

the

entry

X2n-l =

2n

at

position

in

y(c)

for n =

_{l, 2, ...}

and then

place

the

_remaining

entries of _x

along

the

open

positions,

in the same order

_{they appeared}

in x. Note that

y(2)

is

again

equal

to the van der

Corput

_sequence.

Lemma 6. For c E

_{(1, 2~,}

the _sequence

_y(c)

is

_{equidistributed}

on the unit

interval and

From this lemma it

_immediately

follows that

Proof.

Note that all _sequences

_y(c)

_generate

the same

_binary

search tree and that at time _n,the

_height

of the tree is

_equal

to the number of _powers

of 1

stored in the tree minus 1.

_Hence,

if n >

(11)

and if n

r cn-ll

To see that the _sequence

y(c)

is

_{equidistributed,}

note that the number of entries in

_{(xl, ... , zn)}

that do not appear in

(yl, ... , y~)

is at most

where we used that xl =

yl.

Hence,

for 0 a b 1

Since

it follows n : a _yk

b}

I

_convergesto b - a and hence the

sequence

y(c)

is

equidistributed.

0

References

[1] M. V. BERRY, J. GOLDBERG, Renormalisation of curlicues. Nonlinearity 1 _(1988),1-26.

[2] F. M. DEKKING, S. DE GRAAF, L. E. MEESTER, On the node structure of binary search trees. In Mathematics and Computer Science - Algorithms, Trees, Combinatorics and Probabilities

(Eds. Gardy, D. and Mokkadem, A.), pages 31-40. Birkhauser, Basel, 2000.

[3] F. M. DEKKING, M. MENDÈS FRANCE, Uniform distribution modulo one: a _geometrical

view-point. J. Reine Angew. Math. 329 _(1981),143-153.

[4] J.-M. DESHOUILLERS, Geometric aspect of Weyl sums. In _Elementaryand _{analytic theory}of numbers _{(Warsaw, 1982),}pages 75-82. PWN, Warsaw, 1985.

[5] L. DEVROYE, A note on the height of binary search trees. J. Assoc. Comput. Mach. 33 _(1986), 489-498.

[6] L. DEVROYE, A. _GOUDJIL,A _{study of}random Weyl trees. Random Structures _Algorithms12

(1998), 271-295.

[7] C. A. R. _{HOARE, Quicksort. Comput.}J. 5 _(1962),10-15.

[8] H. M. MAHMOUD, Evolution of random search trees. John _Wiley& Sons _Inc.,New York,

1992. _{Wiley-Interscience}Series in Discrete Mathematics and Optimization.

[9] B. _{PITTEL, Asymptotical growth of}a class _ofrandom trees. Ann. Probab. 13 _(1985),414-427.

Michel _DEKKING,Peter VAN DER WAL

Thomas _StieltjesInstitute for Mathematics and Delft _Universityof _Technology

Faculty ITS, Department CROSS

Mekelweg 4, 2628 CD Delft The Netherlands