HAL Id: hal-00620805
https://hal-upec-upem.archives-ouvertes.fr/hal-00620805
Submitted on 24 Feb 2013
Enumerative combinatorics on words
Dominique Perrin
To cite this version:
Dominique Perrin. Enumerative combinatorics on words. Crapo Henri, Rota Gian-Carlo. Algebraic
Combinatorics and Computer Science, Springer-Verlag, pp.391-430, 2001. hal-00620805
Dominique Perrin
Institut Gaspard Monge, Université de Marne-la-Vallée,
77454 Marne-la-Vallée Cedex 2, France.
perrin@univ-mlv.fr
Abstract
We present the state of the art in the field of generating series for formal languages. The emphasis is on regular languages and rational series. The paper covers aspects including regular trees and the Kraft-McMillan inequality, as well as necklaces and zeta functions.
Contents
1 Introduction
2 Regular sequences and automata
2.1 Regular sequences
2.2 Finite automata
2.3 Beyond regular sequences
3 Enumeration on regular trees
3.1 Graphs and trees
3.2 Regular sequences and trees
3.3 Approximate eigenvector
3.4 The multiset construction
3.5 Generating sequence of leaves
3.6 Generating sequence of nodes
4 Generating sequences of prefix codes
4.1 Trees and prefix codes
4.2 Bifix codes
5.1 Subshifts of finite type
5.2 Circular codes
5.3 Zeta functions
1 Introduction
Generating series, also called generating functions, play an important role in combinatorial mathematics. Many enumeration problems can be solved by transferring the basic operations on sets into algebraic operations on formal series, leading to a solution of the enumeration problem. The famous paper by Doubilet, Rota and Stanley, 'The idea of generating function' [41], places the subject in a general mathematical frame that allows one to present in a unified way the diverse sorts of generating functions, from the ordinary ones to the exponential or even Dirichlet ones.
Their place within the field of combinatorics on words is particular. It was indeed M. P. Schützenberger's point of view that sets of words can be considered as series in several non-commutative variables. The generating series of the set then appears as the image of the non-commutative series through a homomorphism. This gives rise to a rich domain in which an interplay between classical commutative algebra and combinatorics on words is present.
In these lectures, I will survey several aspects of these generating functions on words. The emphasis is on the most elementary case, corresponding to sets of words which can be defined using a finite automaton, usually called regular. The corresponding series are actually rational. Two special cases will be considered in turn. The first one is the case of sets of words corresponding to leaves in a tree, usually called prefix codes. A recent result due to Frédérique Bassino, Marie-Pierre Béal and myself [10] is presented. It completely characterizes the generating series of regular prefix codes. The second one is the case of sets of words considered up to a cyclic permutation, often called necklaces. The corresponding generating series are the zeta functions of symbolic dynamics.
A word on the terminology used here. We constantly use the term regular where a richer terminology is often used. In particular, what we call here a regular sequence is, in Eilenberg's terminology, an $\mathbb{N}$-rational sequence (see [22], [42] or [18]).
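We consider the set $A^*$ of all words on a given alphabet $A$. A subset of $A^*$ is often called a formal language. For sets $X, Y \subset A^*$, we denote
\[ X + Y = X \cup Y, \]
\[ XY = \{xy \mid x \in X,\ y \in Y\}, \]
\[ X^* = \{x_1 x_2 \cdots x_n \mid x_i \in X,\ n \ge 0\}. \]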
We say that the pair $(X, Y)$ is unambiguous if for each $z \in XY$ there is at most one pair $(x, y) \in X \times Y$ such that $z = xy$.
We say that a set of nonempty words $X$ is a code if for each $x \in X^*$ there is at most one sequence $(x_1, x_2, \ldots, x_n)$ with $x_i \in X$ such that $x = x_1 x_2 \cdots x_n$ (one also says that $X$ is uniquely decipherable). A particular case of a code is a prefix code. It is a set of words $X$ such that no element of $X$ is a prefix of another one. It is easy to see that such a set is either reduced to the empty word, or does not contain the empty word and is then a code.
The length distribution of a set of words $X$ is the sequence $u_X = (u_n)_{n \ge 0}$ with
\[ u_n = \operatorname{Card}(X \cap A^n). \]
We also denote by $u_X$ the formal series
\[ u_X(z) = \sum_{n \ge 0} u_n z^n, \]
which is the ordinary generating series of the sequence $u_X$.
For example, the length distribution of $X = A^*$ is $u(z) = \frac{1}{1 - kz}$, where $k = \operatorname{Card}(A)$.
The entropy of a formal language $X$ is
\[ h(X) = \log(1/\rho), \]
where $\rho$ is the radius of convergence of the series $u_X(z)$. It is well defined provided $X$ is infinite, and thus $\rho$ is finite. If the alphabet $A$ has $k$ elements, we have $h(X) \le \log k$.
The following result relates the basic operations on sets with operations on series.
Proposition 1 The following properties hold for any subsets $X, Y$ of $A^*$.
(i) If $X \cap Y = \emptyset$, then $u_{X+Y} = u_X + u_Y$.
(ii) If the pair $(X, Y)$ is unambiguous, then $u_{XY} = u_X u_Y$.
(iii) If $X$ is a code, then $u_{X^*} = 1/(1 - u_X)$.
Proof. The first two formulae are clear. If $X$ is a code, every word in $X^*$ has a unique decomposition as a product of words in $X$. This implies that $u_{X^n} = (u_X)^n$ and thus
\[ u_{X^*} = 1 + u_X + \cdots + u_{X^n} + \cdots = 1/(1 - u_X). \]
Example 1 The set $X = \{b, ab\}$ is a prefix code. The series $u_{X^*}$ is
\[ u_{X^*}(z) = \frac{1}{1 - z - z^2}. \]
Let $(F_n)_{n \ge 0}$ be the sequence of Fibonacci numbers defined by $F_0 = 0$, $F_1 = 1$, and $F_{n+2} = F_{n+1} + F_n$. It follows from the recurrence relation that
\[ \frac{z}{1 - z - z^2} = \sum_{n \ge 0} F_n z^n. \]
Consequently, $u_{X^*}(z) = \sum_{n \ge 0} F_{n+1} z^n$. It can also be proved by a combinatorial argument that the number of words of length $n$ in $X^*$ is $F_{n+1}$.
There are several variants of the generating series considered above. One may first define
\[ p_X(z) = \sum_{n \ge 0} \frac{u_n}{k^n} z^n, \]
where $k = \operatorname{Card}(A)$. The coefficient of $z^n$ in $p_X(z)$ is the probability for a word of length $n$ to be in the set $X$. The relation between $u_X$ and $p_X$ is simple, since $p_X(z) = u_X(z/k)$. Another variant of the generating series is the exponential generating series of the sequence $(u_n)_{n \ge 0}$, defined as
\[ e(z) = \sum_{n \ge 0} \frac{u_n}{n!} z^n. \]
We will also use the zeta function of a sequence $(u_n)_{n \ge 1}$, defined as
\[ \zeta(z) = \exp \sum_{n \ge 1} \frac{u_n}{n} z^n. \]
We consider sequences of natural integers $s = (s_n)_{n \ge 0}$. We shall not distinguish between such a sequence and the formal series $s(z) = \sum_{n \ge 0} s_n z^n$.
We usually denote a vector indexed by elements of a set $Q$, also called a $Q$-vector, with boldface symbols. For $\mathbf{v} = (v_q)_{q \in Q}$ we say that $\mathbf{v}$ is nonnegative, denoted $\mathbf{v} \ge 0$ (resp. positive, denoted $\mathbf{v} > 0$), if $v_q \ge 0$ (resp. $v_q > 0$) for all $q \in Q$. The same conventions are used for matrices. A nonnegative $Q \times Q$-matrix $M$ is said to be irreducible if, for all indices $p, q$, there is an integer $m$ such that $(M^m)_{p,q} > 0$. The matrix is primitive if there is an integer $m$ such that $M^m > 0$.
The adjacency matrix of a graph $G = (Q, E)$ is the $Q \times Q$-matrix $M$ such that for each $p, q \in Q$, the integer $M_{p,q}$ is the number of edges from $p$ to $q$. The adjacency matrix of a graph $G$ is irreducible iff the graph is strongly connected. It is primitive if, moreover, the g.c.d. of the lengths of the cycles in $G$ is 1.
Let $G$ be a finite graph and let $I$, $T$ be two sets of vertices. For each $n \ge 0$, let $s_n$ be the number of distinct paths of length $n$ from a vertex of $I$ to a vertex of $T$. The sequence $s = (s_n)_{n \ge 0}$ is called the sequence recognized by $(G, I, T)$, or also by $G$ if $I$ and $T$ are already specified. When $I = \{i\}$ and $T = \{t\}$, we simply write $(G, i, t)$ instead of $(G, \{i\}, \{t\})$.
A sequence $s = (s_n)_{n \ge 0}$ of nonnegative integers is said to be regular if it is recognized by such a triple $(G, I, T)$, where $G$ is finite. We say that the triple $(G, I, T)$ is a representation of the sequence $s$. The vertices of $I$ are called initial and those of $T$ terminal. Two representations are said to be equivalent if they recognize the same sequence.
A representation $(G, I, T)$ is said to be trim if every vertex of $G$ is on some path from $I$ to $T$. It is clear that any representation is equivalent to a trim one.
A well-known result in the theory of finite automata allows one to use a particular representation of any regular sequence $s$ such that $s_0 = 0$. One can always choose in this case a representation $(G, i, t)$ of $s$ with a unique initial vertex $i$ and a unique final vertex $t \ne i$ such that no edge is entering vertex $i$ and no edge is going out of vertex $t$. Such a representation is called a normalized representation (see for example [37], page 14).
Let $(G, i, t)$ be a trim normalized representation. If we merge the initial vertex $i$ and the final vertex $t$ in a single vertex still denoted by $i$, we obtain a new graph denoted by $\bar{G}$, which is strongly connected. The triple $(\bar{G}, i, i)$ is called the closure of $(G, i, t)$.
Let $s$ be a regular sequence such that $s_0 = 0$. The star $s^*$ of the sequence $s$ is the sequence defined by
\[ s^*(z) = \frac{1}{1 - s(z)}. \]
Proposition 2 If $(G, i, t)$ is a normalized representation of $s$, its closure $(\bar{G}, i, i)$ recognizes the sequence $s^*$.
Proof. The sequence $s$ is the length distribution of the paths of first return to vertex $i$ in $\bar{G}$, that is, of finite paths going from $i$ to $i$ without passing through vertex $i$ in between. The length distribution of the set of all returns to $i$ is thus $1 + s(z) + s^2(z) + \cdots = 1/(1 - s(z))$.
An equivalent definition of regular sequences uses vectors instead of the sets $I, T$. Let $\mathbf{i}$ be a $Q$-row vector of nonnegative integers and let $\mathbf{t}$ be a $Q$-column vector of nonnegative integers. We say that $(G, \mathbf{i}, \mathbf{t})$ recognizes the sequence $s = (s_n)_{n \ge 0}$ if for each integer $n \ge 0$
\[ s_n = \mathbf{i} M^n \mathbf{t}, \]
where $M$ is the adjacency matrix of $G$. The proof that both definitions are equivalent follows from the fact that the family of regular sequences is closed under addition (see [22]). A triple $(G, \mathbf{i}, \mathbf{t})$ recognizing a sequence $s$ is also called a representation of $s$, and two representations are called equivalent if they recognize the same sequence.
A sequence $s = (s_n)_{n \ge 0}$ of nonnegative integers is rational if it satisfies a recurrence relation with integral coefficients. Equivalently, $s$ is rational if there exist two polynomials $p(z), q(z)$ with integral coefficients and with $q(0) = 1$ such that
\[ s(z) = \frac{p(z)}{q(z)}. \]
Figure 1: The Fibonacci graph.
For example, the sequence $s$ defined by $s(z) = \frac{z}{1 - z - z^2}$ is the sequence of Fibonacci numbers, also defined by $s_0 = 0$, $s_1 = 1$ and $s_{n+1} = s_n + s_{n-1}$. It is recognized by the graph of Figure 1 with $I = \{1\}$ and $T = \{2\}$.
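The formula $s_n = \mathbf{i} M^n \mathbf{t}$ can be evaluated directly. The sketch below is only illustrative: since the figure is not reproduced here, the edge set (edges $1 \to 1$, $1 \to 2$ and $2 \to 1$) is an assumption, chosen because it yields the Fibonacci numbers, and the function name `recognized_sequence` is hypothetical.

```python
def recognized_sequence(M, i, t, length):
    """First `length` terms of s_n = i M^n t, for an adjacency matrix M,
    a row vector i and a column vector t given as plain lists."""
    size = len(M)
    row = list(i)  # holds the row vector i M^n, starting with n = 0
    terms = []
    for _ in range(length):
        terms.append(sum(row[q] * t[q] for q in range(size)))
        row = [sum(row[p] * M[p][q] for p in range(size)) for q in range(size)]
    return terms


# Assumed adjacency matrix of a two-vertex graph: edges 1->1, 1->2, 2->1.
M = [[1, 1],
     [1, 0]]
s = recognized_sequence(M, i=[1, 0], t=[0, 1], length=10)
```

With $I = \{1\}$ and $T = \{2\}$ encoded as characteristic vectors, `s` lists the numbers of paths of each length from vertex 1 to vertex 2.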
Section 3.6).
A theorem of Soittola [42], also found independently in [27], characterizes those rational sequences which are regular. We say that a rational sequence has a dominating root either if it is a polynomial or if it has a real positive pole which is strictly smaller than the modulus of any other one. A sequence $r$ is a merge of the sequences $r_i$ if there is an integer $p$ such that
\[ r(z) = \sum_{i=0}^{p-1} z^i r_i(z^p). \]
Theorem 1 (Soittola) A sequence of nonnegative integers $r = (r_n)_{n \ge 0}$ is regular if and only if it is a merge of rational sequences having a dominating root.
This result shows that it is decidable whether a rational series is regular (see [42]). In the positive case, there is an algorithm computing a representation of the sequence.
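In terms of coefficients, the merge defined above interleaves the sequences $r_0, \ldots, r_{p-1}$: the identity $r(z) = \sum_i z^i r_i(z^p)$ says that $r_{np+i}$ is the $n$-th term of $r_i$. A minimal sketch, where the function name `merge` and the sample data are illustrative only:

```python
def merge(sequences, length):
    """First `length` terms of the merge r of the p given sequences,
    i.e. r[n * p + i] = sequences[i][n]."""
    p = len(sequences)
    return [sequences[n % p][n // p] for n in range(length)]


# Sample data: powers of 2 and powers of 3, each with a dominating root.
powers_of_two = [2 ** n for n in range(4)]
powers_of_three = [3 ** n for n in range(4)]
merged = merge([powers_of_two, powers_of_three], 8)
```

Here `merged` alternates between the two geometric sequences, which is a typical regular sequence without a single dominating root.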
2.2 Finite automata
We present here a brief introduction to the concepts used in automata theory. For a general reference, see [38] or [22].
An automaton over the alphabet $A$ is composed of a set $Q$ of states, a set $E \subset Q \times A \times Q$ of edges or transitions, and two sets $I, T \subset Q$ of initial and terminal states.
A path in the automaton $\mathcal{A}$ is a sequence
\[ (p_1, a_1, p_2), (p_2, a_2, p_3), \ldots, (p_n, a_n, p_{n+1}) \]
of consecutive edges. Its label is the word $x = a_1 a_2 \cdots a_n$. A path is successful if it starts in an initial state and ends in a terminal state. The set recognized by the automaton is the set of labels of its successful paths.
An automaton is deterministic if, for each state $p$ and each letter $a$, there is at most one edge which starts at $p$ and is labeled by $a$. The term right resolving is also used.
Example 2 Let $\mathcal{A}$ be the automaton given in Figure 2 with 1 as unique initial and terminal state. It recognizes the set $X^*$ where $X$ is the prefix code $X = \{b, ab\}$.
Figure 2: Golden mean automaton.
A set of words $X$ over $A$ is regular if it can be recognized by a finite automaton.
It is a classical result that a set of words is regular iff it can be obtained by a finite number of operations union, product and star, starting from the finite sets.
The following result is also classical (see [22] for example).
Proposition 3 Every regular set can be recognized by a finite deterministic automaton having a unique initial state.
The following theorem is of fundamental importance. It belongs to the early folklore of automata theory.
Theorem 2 The length distributions of regular sets are the regular sequences.
Proof. Let $X$ be a regular set. By Proposition 3, it can be recognized by a deterministic automaton $\mathcal{A}$. Since $\mathcal{A}$ is deterministic, there is at most one path with given label, origin and end. Thus the number of paths of length $n$ from the initial state to a terminal state is equal to the number $u_n$ of words of $X$ of length $n$.
Conversely, let $u$ be a regular sequence enumerating the paths in a graph $G$ from $I$ to $T$. We consider the graph $G$ as an automaton in which all edges carry distinct labels. Let $X$ be the set of labels of paths from $I$ to $T$. The sequence $u$ is the length distribution of the set $X$.
Example 3 If $X = a^*b$, then
\[ u_X(z) = \frac{z}{1 - z}. \]
2.3 Beyond regular sequences
There are several natural classes of series beyond the rational ones. The algebraic series are those satisfying an algebraic equation. More generally,
terms is given by a rational fraction (see [26]).
The class of algebraic series is linked with the class of context-free sets (see [23]). A typical example of a context-free set is the set of words on the binary alphabet $\{a, b\}$ having as many $a$'s as $b$'s. We compute below its length distribution, which is an algebraic series.
Example 4 The set of words on $A = \{a, b\}$ having an equal number of occurrences of $a$ and $b$ is a submonoid of $A^*$ generated by a prefix code $D$. Since any word of $D^*$ of length $2n$ is obtained by choosing $n$ positions among $2n$, we have
\[ u_{D^*}(z) = \sum_{n \ge 0} \binom{2n}{n} z^{2n}. \]
By a simple application of the binomial formula, we obtain
\[ u_{D^*}(z) = (1 - 4z^2)^{-1/2}. \]
This follows indeed from the simple identity
\[ \binom{-1/2}{n} = \frac{1}{(-4)^n} \binom{2n}{n}. \]
We have $u_D(z) = 1 - 1/u_{D^*}(z)$ and thus
\[ u_D(z) = 1 - \sqrt{1 - 4z^2}. \]
Thus $u_D(z)$ is an algebraic series, solution of the equation
\[ f^2 - 2f + 4z^2 = 0. \]
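The computation of Example 4 can be verified on truncated power series with exact integer arithmetic. This is a sketch only; the helper `series_inverse` and the truncation order `length` are choices made for the illustration.

```python
from math import comb


def series_inverse(a, length):
    """Coefficients of 1/a(z) up to order length - 1, for an integer
    power series a with a[0] == 1 (given by at least `length` terms)."""
    b = [1] + [0] * (length - 1)
    for n in range(1, length):
        b[n] = -sum(a[j] * b[n - j] for j in range(1, n + 1))
    return b


length = 12
# u_{D*}(z) = sum over n of binom(2n, n) z^(2n)
u_star = [comb(n, n // 2) if n % 2 == 0 else 0 for n in range(length)]
inverse = series_inverse(u_star, length)
# u_D(z) = 1 - 1/u_{D*}(z)
u_D = [1 - inverse[0]] + [-c for c in inverse[1:]]
```

The first nonzero coefficients of `u_D` are 2, 2, 4, 10, 28 (at $z^2, z^4, \ldots, z^{10}$), in agreement with the expansion of $1 - \sqrt{1 - 4z^2}$; for instance the two words of $D$ of length 2 are $ab$ and $ba$.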
3 Enumeration on regular trees
We now turn to the study of generating sequences linked with trees. Actually, we do not enumerate trees but objects within a tree, like the nodes or the leaves at each level. This is actually equivalent to the enumeration of particular sets of words, namely prefix-closed sets and prefix codes, as we shall see below (Section 4).
3.1 Graphs and trees
In this paper, we use directed multigraphs, i.e. graphs with possibly several edges with the same origin and the same end. We simply call them graphs in all that follows. We denote by $G = (Q, E)$ a graph with $Q$ as set of vertices and $E$ as set of edges. We also say that $G$ is a graph on the set $Q$.
A tree $T$ on a set of nodes $N$ with a root $r \in N$ is a function $T : N \setminus \{r\} \to N$ which associates to each node distinct from the root its father $T(n)$, in such a way that, for each node $n$, there is a nonnegative integer $h$ such that $T^h(n) = r$. The integer $h$ is the height of the node $n$.
A tree is $k$-ary if each node has at most $k$ children. A node without children is called a leaf. A node which is not a leaf is called internal. A node $n$ is a descendant of a node $m$ if $m = T^h(n)$ for some $h \ge 0$. A $k$-ary tree is complete if all internal nodes have exactly $k$ children and have at least one descendant which is a leaf.
For each node $n$ of a tree $T$, the subtree rooted at $n$, denoted $T_n$, is the tree obtained by restricting the set of nodes to the descendants of $n$.
Two trees $S, T$ are isomorphic, denoted $S \equiv T$, if there is a map which transforms $S$ into $T$ by permuting the children of each node. Equivalently, $S \equiv T$ if there is a bijective map $f : N \to M$ from the set of nodes of $S$ onto the set of nodes of $T$ such that $f \circ S = T \circ f$. Such a map $f$ is called an isomorphism.
If $T$ is a tree with $N$ as set of nodes, the quotient graph of $T$ is the graph $G = (Q, E)$ where $Q$ and $E$ are defined as follows. The set $Q$ is the quotient of $N$ by the equivalence $n \equiv m$ if $T_n \equiv T_m$. Let $\bar{m}$ denote the class of a node $m$. The number of edges from $\bar{m}$ to $\bar{n}$ is the number of children of $m$ equivalent to $n$.
Conversely, the set of paths in a graph with given origin is a tree. Indeed, let $G = (Q, E)$ be a graph. Let $r \in Q$ be a particular vertex and let $N$ be the set of paths in $G$ starting at $r$. The tree $T$ having $N$ as set of nodes and such that $T(p_0, p_1, \ldots, p_n) = (p_0, p_1, \ldots, p_{n-1})$ is called the covering tree of $G$ starting at $r$.
Both constructions are mutually inverse in the sense that any tree $T$ is isomorphic to the covering tree of its quotient graph starting at the image of the root.
Proposition 4 Let $T$ be a tree with root $r$. Let $G$ be its quotient graph and let $i$ be the vertex of $G$ which is the class of the root of $T$. For each vertex $q$ of $G$ and for each $n \ge 0$, the number of paths of length $n$ from $i$ to $q$ is equal to the number of nodes of $T$ at height $n$ in the class of $q$.
A tree is said to be regular if it has only a finite number of non-isomorphic subtrees, i.e. if its quotient graph is finite.
Figure 3: A regular tree.
Figure 4: And its quotient graph.
For example, the infinite tree represented in Figure 3 is a regular tree. Its quotient graph is represented in Figure 4.
3.2 Regular sequences and trees
If $T$ is a tree, its generating sequence of leaves is the sequence of numbers $s = (s_n)_{n \ge 0}$, where $s_n$ is the number of leaves at height $n$. We also simply say that $s$ is the generating sequence of $T$.
The following result is a direct consequence of the definitions.
Theorem 3 The generating sequence of a regular tree is a regular sequence.
Proof. Let $T$ be a regular tree and let $G$ be its quotient graph. Since $T$ is regular, $G$ is finite. The leaves of $T$ form an equivalence class $t$. By Proposition 4, the generating sequence of $T$ is recognized by $(G, i, t)$, where $i$ is the class of the root of $T$.
We say that a sequence $s = (s_n)_{n \ge 0}$ satisfies the Kraft inequality for the integer $k$ if
\[ \sum_{n \ge 0} s_n k^{-n} \le 1, \]
i.e., using the formal series $s(z) = \sum_{n \ge 0} s_n z^n$, if
\[ s(1/k) \le 1. \]
We say that $s$ satisfies the strict Kraft inequality for $k$ if $s(1/k) < 1$.
The following result is well known (see [4], page 35, for example).
Theorem 4 A sequence $s$ is the generating sequence of a $k$-ary tree iff it satisfies the Kraft inequality for the integer $k$.
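For a finite sequence, the "if" direction of Theorem 4 can be made concrete by the classical greedy assignment of codewords in order of increasing length. The sketch below uses that standard construction, which is not necessarily the one used in the proof referred to in the text; the names `kraft_sum` and `prefix_code` are illustrative. Each leaf at height $n$ is described by a word of length $n$ over $\{0, \ldots, k-1\}$.

```python
from fractions import Fraction


def kraft_sum(s, k):
    """Exact value of sum of s[n] / k^n for a finite sequence s."""
    return sum(Fraction(s_n, k ** n) for n, s_n in enumerate(s))


def prefix_code(s, k):
    """Leaves of a k-ary tree with s[n] leaves at height n, as tuples of
    digits in {0, ..., k-1}. Raises ValueError if Kraft's inequality fails."""
    if kraft_sum(s, k) > 1:
        raise ValueError("the sequence violates the Kraft inequality")
    words = []
    value = 0  # index of the first free node at the current height
    for n, s_n in enumerate(s):
        if n > 0:
            value *= k  # free nodes at height n are children of the free ones above
        for _ in range(s_n):
            digits, rest = [], value
            for _ in range(n):  # write `value` with exactly n digits in base k
                rest, d = divmod(rest, k)
                digits.append(d)
            words.append(tuple(reversed(digits)))
            value += 1
    return words


words = prefix_code([0, 1, 1, 2], 2)
```

For the sequence $s = (0, 1, 1, 2)$ and $k = 2$, which satisfies Kraft's equality, the result is the prefix code $\{0, 10, 110, 111\}$.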
Let us consider the case of equality in Kraft's inequality. If $s(1/k) = 1$, then any tree $T$ having $s$ as generating sequence is complete. The converse property is not true in general (see [22], p. 231). However, it is a classical result that when $T$ is a complete regular tree, its generating sequence satisfies $s(1/k) = 1$ (see Proposition 8).
For the sake of a complete description of the construction described above in the proof of Theorem 4, we have to specify the choice made at each step among the leaves at height $n$. A possible policy is to choose to give as many children as possible to the nodes which are not leaves and of maximal height.
If we start with a finite sequence $s$ satisfying Kraft's inequality, the above method builds a finite tree with generating sequence equal to $s$. It is not true that this incremental method gives a regular tree when we start with a regular sequence, as shown in the following example.
Let $s(z) = z^2/(1 - 2z^2)$. Since $s(1/2) = 1/2$, we may apply the Kraft construction to build a binary tree with length distribution $s$. The result is the tree $T(X)$ where $X$ is the set of prefixes of the set
\[ Y = \bigcup_{n \ge 0} 0 1^n 0 \{0, 1\}^n, \]
which is not regular.
If $s$ is a regular sequence such that $s_0 = 0$, there exists a regular tree $T$ having $s$ as generating sequence. Indeed, let $(G, i, t)$ be a normalized representation of $s$. The generating sequence of the covering tree of $G$ starting at $i$ is $s$. When $s$ satisfies the Kraft inequality for $k$, it is however not true that the regular covering tree obtained is $k$-ary, as shown in the following example.
Let $s$ be the regular sequence recognized by the graph of Figure 5 on the left with $i = 1$ and $t = 4$. We have $s(z) = 3z^2/(1 - z^2)$. Furthermore $s(1/2) = 1$ and thus $s$ satisfies Kraft's equality for $k = 2$. However, there are four edges going out of vertex 2 and its regular covering tree starting at 1 is 4-ary. A solution for this example is given by the graph of Figure 5 on the right. It recognizes $s$ and its covering tree starting at 1 is the regular binary tree of Figure 3.
Figure 5: Graphs recognizing $s(z) = 3z^2/(1 - z^2)$.
The aim of Section 3.5 is to build, from a regular sequence $s$ that satisfies the Kraft inequality for an integer $k$, a tree with generating sequence $s$ which is both regular and $k$-ary.
3.3 Approximate eigenvector
Let $M$ be the adjacency matrix of a graph $G$. By the Perron-Frobenius theorem (see [25] for a general presentation and [30], [28] or [11] for the link with graphs and regular sequences), the nonnegative matrix $M$ has a nonnegative real eigenvalue of maximal modulus denoted by $\lambda$, also called the spectral radius of the matrix.
When $G$ is strongly connected, the matrix is irreducible and the Perron-Frobenius theorem asserts that the dimension of the eigenspace of the matrix $M$ corresponding to $\lambda$ is equal to one, and that there is a positive eigenvector associated to $\lambda$.
Let $k$ be an integer. A $k$-approximate eigenvector of a nonnegative matrix $M$ is, by definition, an integral vector $\mathbf{v} \ge 0$ such that
\[ M\mathbf{v} \le k\mathbf{v}. \]
One has the following result (see [30], p. 152).
Proposition 5 An irreducible nonnegative matrix $M$ with spectral radius $\lambda$ admits a positive $k$-approximate eigenvector iff $k \ge \lambda$.
For a proof, see [30], p. 152. When $M$ is the adjacency matrix of a graph $G$, we also say that $\mathbf{v}$ is a $k$-approximate eigenvector of $G$. The computation of an approximate eigenvector can be obtained by the use of Franaszek's algorithm (see for example [30]). It can be shown that there exists a $k$-approximate eigenvector with elements bounded above by $k^{2n}$, where $n$ is the dimension of $M$ [5]. Thus the size of the coefficients of a $k$-approximate eigenvector is bounded above by an exponential in $n$ and can be in the worst case of this order of magnitude.
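Checking that a given vector is a $k$-approximate eigenvector only requires one matrix-vector product. The sketch below tests the inequality $M\mathbf{v} \le k\mathbf{v}$ componentwise; the function name and the sample matrix (a two-vertex graph with edges $1 \to 1$, $1 \to 2$, $2 \to 1$, whose spectral radius is the golden ratio) are assumptions made for the illustration, and the code does not implement Franaszek's algorithm.

```python
def is_approximate_eigenvector(M, v, k):
    """True when v is a nonnegative integer vector with M v <= k v
    in every component."""
    size = len(M)
    if any(x < 0 for x in v):
        return False
    return all(
        sum(M[p][q] * v[q] for q in range(size)) <= k * v[p]
        for p in range(size)
    )


# Sample matrix with spectral radius (1 + sqrt(5)) / 2, about 1.618.
M = [[1, 1],
     [1, 0]]
ok_for_2 = is_approximate_eigenvector(M, [1, 1], 2)
ok_for_1 = is_approximate_eigenvector(M, [1, 1], 1)
```

Here the all-ones vector works for $k = 2$, which exceeds the spectral radius, but not for $k = 1$.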
The following result is well known. It links the radius of convergence of a sequence with the spectral radius of the associated matrix.
Proposition 6 Let $s$ be a regular sequence recognized by a trim representation $(G, I, T)$. Let $M$ be the adjacency matrix of $G$. The radius of convergence of $s$ is the inverse of the maximal eigenvalue of $M$.
Proof. The maximal eigenvalue of $M$ is $\lambda = \limsup_{n \ge 0} \sqrt[n]{\|M^n\|}$, where $\|\cdot\|$ is any of the equivalent matrix norms. Let $\rho$ be the radius of convergence of $s$ and, for each $p, q \in Q$, let $\rho_{pq}$ be the radius of convergence of the sequence $u_{pq} = (M^n_{pq})_{n \ge 0}$. Then $1/\lambda = \min \rho_{pq}$. Since $(G, I, T)$ is trim, we have $\rho \le \rho_{pq}$ for all $p, q \in Q$. On the other hand, $\min \rho_{pq} \le \rho$ since $s$ is a sum of some of the sequences $u_{pq}$. Thus $\rho = \min \rho_{pq}$, which concludes the proof.
As a consequence of this result, the radius of convergence $\rho$ of a regular sequence $s$ is a pole. Indeed, with the above notation, $s(z) = \mathbf{i}(1 - Mz)^{-1}\mathbf{t}$. Then $\det(I - Mz)$ is a denominator of the rational fraction $s$, and the poles of $s$ are among the inverses of the eigenvalues of $M$. Since $1/\lambda$ is the radius of convergence of $s$, it has to be a pole of $s$. In particular, $s$ diverges for $z = \rho$.
The following result, due to Berstel, is also well known. It allows one to compute the radius of convergence of the star of a sequence.
Proposition 7 Let $s$ be a regular sequence. The radius of convergence of the series $s^*(z) = 1/(1 - s(z))$ is the unique positive real number $r$ such that $s(r) = 1$.
For a proof, see [22], pp. 211-214, [18], p. 82 or [11], p. 84. As a consequence, we obtain the following result.
Proposition 8 Let $s$ be a regular sequence and let $\lambda$ be the inverse of the radius of convergence of $s^*$. The sequence $s$ satisfies the strict Kraft inequality $s(1/k) < 1$ (resp. the equality $s(1/k) = 1$) if and only if $\lambda < k$ (resp. $\lambda = k$).
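The number $r$ with $s(r) = 1$ can be approximated numerically. The sketch below is restricted, as a simplifying assumption, to finite sequences $s$ (polynomials with nonnegative coefficients and $s_0 = 0$), for which $s$ is increasing on the positive reals so that bisection applies; the function name `star_radius` is illustrative.

```python
def star_radius(coefficients, tolerance=1e-12):
    """Approximate the positive r with s(r) = 1, where s is the polynomial
    with the given nonnegative coefficients (constant term 0, not all 0)."""
    if coefficients[0] != 0 or not any(coefficients):
        raise ValueError("expected s_0 = 0 and a nonzero sequence")

    def s(z):
        return sum(c * z ** n for n, c in enumerate(coefficients))

    low, high = 0.0, 1.0
    while s(high) < 1:  # enlarge the bracket until s(high) >= 1
        high *= 2
    while high - low > tolerance:
        mid = (low + high) / 2
        if s(mid) < 1:
            low = mid
        else:
            high = mid
    return (low + high) / 2


r_strict = star_radius([0, 1, 1])  # s(z) = z + z^2, with s(1/2) = 3/4 < 1
r_equal = star_radius([0, 1, 2])   # s(z) = z + 2z^2, with s(1/2) = 1
```

For $s(z) = z + z^2$ one finds $r \approx 0.618$, so $1/r \approx 1.618 < 2$, consistent with the strict inequality $s(1/2) < 1$; for $s(z) = z + 2z^2$ one finds $r = 1/2$, the case of equality for $k = 2$.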
We have thus proved the following result, which is the basis of the constructions of the next sections.
Proposition 9 Let $s$ be a regular sequence satisfying Kraft's inequality $s(1/k) \le 1$. Let $(G, i, t)$ be a normalized representation of $s$ and let $(\bar{G}, i, i)$ be the closure of $(G, i, t)$. The adjacency matrix $\bar{M}$ of $\bar{G}$ admits a $k$-approximate eigenvector.
Actually, under the hypothesis of Proposition 9, the graph $G$ itself also admits a $k$-approximate eigenvector. Indeed, let $\bar{\mathbf{w}} = (\bar{w}_q)_{q \in Q \setminus \{t\}}$ be a $k$-approximate eigenvector of $\bar{G}$. Then the vector $\mathbf{w} = (w_q)_{q \in Q}$ defined by $w_q = \bar{w}_q$ for $q \ne t$ and $w_t = \bar{w}_i$ is a $k$-approximate eigenvector of $G$. This is illustrated in the following example.
Figure 6: The graphs $G$ and $\bar{G}$.
Let us for example consider again $s(z) = 3z^2/(1 - z^2)$ (see Figure 5). The sequence $s$ is recognized by the normalized representation $(G, 1, 4)$ where $G$ is the graph represented on the left of Figure 6. The graph $\bar{G}$ is represented on the right. The vectors
\[ \mathbf{w} = \begin{bmatrix} 3 \\ 2 \\ 1 \\ 3 \end{bmatrix}, \qquad \bar{\mathbf{w}} = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} \]
are 2-approximate eigenvectors of $G$ and $\bar{G}$ respectively.
3.4 The multiset construction
In this section, we present the main construction used in this paper. It can be considered as a version with multiplicities of the subset construction used in automata theory to replace a finite automaton by an equivalent deterministic one. We use only unlabeled graphs, but the construction can be easily generalized to graphs with edges labeled by symbols from an alphabet.
Our construction is also linked with one used by D. Lind to build a positive matrix with given spectral radius (see [30], especially Lemma 11.1.9).
We use for convenience the term multiset of elements of a set $Q$ as a synonym of $Q$-vector. If $\mathbf{u} = (u_q)_{q \in Q}$ is such a multiset, the coefficient $u_q$ is also called the multiplicity of $q$. The degree of $\mathbf{u}$ is the sum $\sum_{q \in Q} u_q$ of all multiplicities.
We start with a triple $(G, \mathbf{i}, \mathbf{t})$ where $G = (Q, E)$ is a finite graph and $\mathbf{i}$ (resp. $\mathbf{t}$) is a row (resp. column) $Q$-vector. We denote by $M$ the adjacency matrix of $G$.
Let $m$ be a positive integer. We define another triple $(H, \mathbf{J}, \mathbf{X})$ which is said to be obtained by the multiset construction. The graph $H$ is called an extension of the graph $G$. The extension is not unique and depends, as we shall see, on some arbitrary choices. The set $S$ of vertices of $H$ is formed of multisets of elements of $Q$ of total degree at most $m$. Thus, an element of $S$ is a nonnegative vector $\mathbf{u} = (u_q)_{q \in Q}$ with indices in $Q$ such that $\sum_{q \in Q} u_q \le m$. This condition ensures that $H$ is a finite graph.
We now describe the set of edges of the graph $H$ by defining its adjacency matrix $N$. Let $U$ be the $S \times Q$-matrix defined by $U_{\mathbf{u},q} = u_q$. Then $N$ is any nonnegative $S \times S$-matrix which satisfies
\[ NU = UM. \]
Equivalently, for all $\mathbf{u} \in S$,
\[ \sum_{\mathbf{v} \in S} N_{\mathbf{u},\mathbf{v}} \mathbf{v} = \mathbf{u}M. \]
Let us comment informally on the above formula. We can describe the construction of the graph $H$ as a sequence of choices. If we reach a vertex $\mathbf{u}$ of $H$, we partition the multiset $\mathbf{u}M$ of vertices reachable from the vertices composing $\mathbf{u}$ into multisets of degree at most $m$ to define the vertices reachable from $\mathbf{u}$ in $H$. The integer $N_{\mathbf{u},\mathbf{v}}$ is the multiplicity of $\mathbf{v}$ in the partition. The formula simply expresses the fact that the result is indeed a partition. In general, there are several possible partitions. The matrix $U$ is called the transfer matrix of the extension.
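Let $\mathbf{J}$ be the $S$-row vector such that $J_{\mathbf{i}} = 1$ and $J_{\mathbf{u}} = 0$ for $\mathbf{u} \ne \mathbf{i}$ (this supposes that $\mathbf{i}$ itself belongs to $S$). Let $\mathbf{X}$ be the $S$-column vector such that $X_{\mathbf{u}} = \mathbf{u}\mathbf{t}$.
Thus
\[ \mathbf{J}U = \mathbf{i}, \qquad \mathbf{X} = U\mathbf{t}. \]
To avoid unnecessary complexity, we only keep in $S$ the vertices reachable from $\mathbf{i}$. Thus, we replace the set $S$ by the set of elements $\mathbf{u}$ of $S$ such that there is a path from $\mathbf{i}$ to $\mathbf{u}$.
The number of multisets of degree at most $m$ on a set $Q$ with $n$ elements is at most $\frac{n^{m+1} - 1}{n - 1}$. Thus the number of vertices of a multiset extension is of order $n^m$. It is polynomial in $n$ if $m$ is taken as a constant.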
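Figure 7: The graphs $G$ and $H$.
Let for example $G$ be the graph represented in Figure 7 on the left. The graph $H$ represented on the right is a multiset extension of $G$ with
\[ \mathbf{i} = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad \mathbf{t} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \]
The matrices $M$, $N$ and $U$ and the vectors $\mathbf{J}$, $\mathbf{X}$ are
\[ M = \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}, \quad N = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}, \quad U = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \quad \mathbf{J} = \begin{bmatrix} 1 & 0 \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \]
In this case, the matrix $U$ is invertible and the matrices $M, N$ are conjugate.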
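The basic property of an extension is the following one.
Proposition 10 Let $H$ be an extension of $G$. The triple $(H, \mathbf{J}, \mathbf{X})$ is equivalent to $(G, \mathbf{i}, \mathbf{t})$.
Proof. For each $n \ge 0$, we have
\[ UM^n = N^nU. \]
Consequently,
\[ \mathbf{J}N^n\mathbf{X} = \mathbf{J}N^nU\mathbf{t} = \mathbf{J}UM^n\mathbf{t} = \mathbf{i}M^n\mathbf{t}. \]
This shows that $(H, \mathbf{J}, \mathbf{X})$ recognizes $s$.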
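Proposition 10 can be checked on the small example of Figure 7, using the matrices $M$, $N$, $U$ displayed above. In this sketch the column vector $(0, 1)$ displayed next to $\mathbf{i}$ is taken as the terminal vector $\mathbf{t}$; the helper names `mat_mul` and `sequence` are illustrative.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(A[r][j] * B[j][c] for j in range(len(B))) for c in range(len(B[0]))]
        for r in range(len(A))
    ]


def sequence(row, A, column, length):
    """First `length` terms of row * A^n * column."""
    terms = []
    current = column  # holds A^n * column
    for _ in range(length):
        terms.append(mat_mul(row, current)[0][0])
        current = mat_mul(A, current)
    return terms


M = [[2, 1], [0, 1]]
N = [[1, 1], [0, 2]]
U = [[1, 0], [1, 1]]
i, t = [[1, 0]], [[0], [1]]
J, X = [[1, 0]], [[0], [1]]

from_G = sequence(i, M, t, 8)
from_H = sequence(J, N, X, 8)
```

Both triples recognize the sequence $s_n = 2^n - 1$, and the relations $NU = UM$, $\mathbf{J}U = \mathbf{i}$ and $\mathbf{X} = U\mathbf{t}$ hold for these matrices.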
We will also make use of the following additional property of extensions.
Proposition 11 Let $H$ be an extension of $G$. Let $M$ (resp. $N$) be the adjacency matrix of $G$ (resp. $H$) and let $U$ be the transfer matrix. If $\mathbf{w}$ is a $k$-approximate eigenvector of $M$, the vector $\mathbf{W} = U\mathbf{w}$ is a $k$-approximate eigenvector of $N$. If $\mathbf{w}$ is positive, then $\mathbf{W}$ is also positive.
Proof. We have
\[ N\mathbf{W} = NU\mathbf{w} = UM\mathbf{w} \le kU\mathbf{w} = k\mathbf{W}. \]
Since all rows of $U$ are distinct from 0, the vector $\mathbf{W}$ is positive whenever $\mathbf{w}$ is positive.
In the next section, we will choose a particular extension of the graph $G$, called admissible, which is defined as follows. Let $\mathbf{w}$ be a positive $Q$-vector and let $m$ be a positive integer. Let $H$ be an extension of $G$, let $U$ be the transfer matrix, and let $\mathbf{W} = U\mathbf{w}$. We say that $H$ is admissible with respect to $\mathbf{w}$ and $m$ if for each $\mathbf{u} \in S$, all but possibly one of the vertices $\mathbf{v}$ such that $(\mathbf{u}, \mathbf{v})$ is an edge of $H$ satisfy $W_{\mathbf{v}} \equiv 0 \bmod m$.
Theorem 5 For any graph $G$ on $Q$, any positive $Q$-vector $\mathbf{w}$ and any integer $m > 0$, the graph $G$ admits an admissible extension with respect to $\mathbf{w}$ and $m$.
The proof relies on the following combinatorial lemma. This lemma is also used in a similar context by Adler et al. and Marcus [34], [1]. It is actually presented in [3] as a nice variant of the pigeon-hole principle.
Lemma 1 Let $w_1, w_2, \ldots, w_m$ be positive integers. Then there is a nonempty subset $S \subset \{1, 2, \ldots, m\}$ such that $\sum_{q \in S} w_q$ is divisible by $m$.
Proof. The partial sums $w_1, w_1 + w_2, w_1 + w_2 + w_3, \ldots, w_1 + w_2 + \cdots + w_m$ either are all distinct (mod $m$), or two are congruent (mod $m$). In the former case, at least one partial sum must be congruent to 0 (mod $m$). In the latter, there are $1 \le p < r \le m$ such that
\[ w_1 + w_2 + \cdots + w_p \equiv w_1 + w_2 + \cdots + w_r \pmod{m}. \]
Hence $w_{p+1} + w_{p+2} + \cdots + w_r \equiv 0 \pmod{m}$.
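The proof of Lemma 1 is effective: scanning the partial sums modulo $m$ finds a block of consecutive terms whose sum is divisible by $m$. A minimal sketch, where the function name `divisible_block` is illustrative and the empty partial sum (residue 0) is included so that both cases of the proof are handled uniformly:

```python
def divisible_block(w):
    """Given m = len(w) positive integers, return (p, r) with 0 <= p < r <= m
    such that w[p] + ... + w[r - 1] is divisible by m."""
    m = len(w)
    seen = {0: 0}  # residue of a partial sum -> number of terms summed
    total = 0
    for r, x in enumerate(w, start=1):
        total = (total + x) % m
        if total in seen:
            return seen[total], r
        seen[total] = r
    # m + 1 partial sums cannot have m + 1 distinct residues modulo m
    raise AssertionError("unreachable by the pigeon-hole principle")


p, r = divisible_block([3, 2, 2, 3])
```

For the weights $3, 2, 2, 3$ and $m = 4$, the block found consists of the second and third terms, whose sum is 4.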
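Proof of Theorem 5. We build progressively the set of edges of $H$. Let $\mathbf{u}$ be an element of $S$. We prove by induction on the degree $d(\mathbf{u}M) = \sum_{q \in Q} (\mathbf{u}M)_q$ of $\mathbf{u}M$ that there exist $\mathbf{v}_1, \ldots, \mathbf{v}_n \in S$ such that $\mathbf{u}M = \sum_{i=1}^{n} \mathbf{v}_i$ and $W_{\mathbf{v}_i} \equiv 0 \bmod m$ for $1 \le i \le n - 1$. If $\mathbf{u}M \in S$, i.e. if $d(\mathbf{u}M) \le m$, we choose $n = 1$ and $\mathbf{v}_1 = \mathbf{u}M$. Otherwise, there exists a decomposition $\mathbf{u}M = \mathbf{v} + \mathbf{u}'$ such that $d(\mathbf{v}) = m$. Let $w_1, w_2, \ldots, w_m$ be the sequence of integers formed by the $w_q$ repeated $v_q$ times. By Lemma 1 applied to the sequence of integers $w_i$, there is a decomposition $\mathbf{v} = \mathbf{v}' + \mathbf{r}$ with $\mathbf{v}' \ne 0$ such that $W_{\mathbf{v}'} \equiv 0 \bmod m$. We have $\mathbf{u}M = \mathbf{v}' + \mathbf{w}'$ with $\mathbf{w}' = \mathbf{r} + \mathbf{u}'$. Since $d(\mathbf{w}') < d(\mathbf{u}M)$, we can apply the induction hypothesis to $\mathbf{w}'$, giving the desired result.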
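The induction in this proof translates into a simple loop: as long as the multiset has degree larger than $m$, take $m$ of its elements, extract from them a block of total weight divisible by $m$, and continue with the rest. The sketch below follows that scheme; the function names, the representation of multisets as dictionaries, and the sample weights are assumptions made for the illustration.

```python
def divisible_block(w):
    """Indices (p, r) such that w[p:r] is nonempty with sum divisible by len(w)."""
    m = len(w)
    seen = {0: 0}
    total = 0
    for r, x in enumerate(w, start=1):
        total = (total + x) % m
        if total in seen:
            return seen[total], r
        seen[total] = r
    raise AssertionError("unreachable by the pigeon-hole principle")


def admissible_split(multiset, w, m):
    """Split a multiset (dict vertex -> multiplicity) into parts of degree
    at most m, each part being a list of vertices with repetitions. Every
    part except possibly the last has total weight divisible by m."""
    elements = [q for q, mult in sorted(multiset.items()) for _ in range(mult)]
    parts = []
    while len(elements) > m:
        head = elements[:m]
        p, r = divisible_block([w[q] for q in head])
        parts.append(head[p:r])
        elements = head[:p] + head[r:] + elements[m:]
    if elements:
        parts.append(elements)
    return parts


weights = {1: 3, 2: 2, 3: 1}  # hypothetical positive vector w
parts = admissible_split({2: 3, 3: 2}, weights, 3)
```

With these sample weights and $m = 3$, the multiset made of three copies of vertex 2 and two copies of vertex 3 is split into $(2, 2, 2)$, of weight 6, and the remainder $(3, 3)$.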
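For an $S$-vector $\mathbf{W}$, we denote by $\lceil \mathbf{W}/m \rceil$ the $S$-vector $\mathbf{Z}$ such that for each $\mathbf{u}$ in $S$,
\[ Z_{\mathbf{u}} = \left\lceil \frac{W_{\mathbf{u}}}{m} \right\rceil. \]
Summing up the previous results, we obtain the following statement.
Proposition 12 Let $H$ be an admissible extension of $G$ with respect to $\mathbf{w}$ and $m$. Let $M$ (resp. $N$) be the adjacency matrix of $G$ (resp. $H$), let $U$ be the transfer matrix and let $\mathbf{W} = U\mathbf{w}$. If $\mathbf{w}$ is a positive $k$-approximate eigenvector of $M$, then $\lceil \mathbf{W}/m \rceil$ is a positive $k$-approximate eigenvector of $N$.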
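Proof. By Proposition 11, the vector $\mathbf{W}$ is a positive $k$-approximate eigenvector of $N$. Thus
\[ N\mathbf{W} \le k\mathbf{W}. \]
Let $\mathbf{u}$ be an element of $S$. We have $W_{\mathbf{v}} \equiv 0 \bmod m$ for all indices $\mathbf{v}$ such that $N_{\mathbf{u},\mathbf{v}} > 0$, except possibly for an index $\mathbf{v}_0$. The previous inequality implies that
\[ \sum_{\mathbf{v} \in S \setminus \{\mathbf{v}_0\}} N_{\mathbf{u},\mathbf{v}} \frac{W_{\mathbf{v}}}{m} + N_{\mathbf{u},\mathbf{v}_0} \frac{W_{\mathbf{v}_0}}{m} \le k \frac{W_{\mathbf{u}}}{m}. \]
Since $W_{\mathbf{v}}/m$ is a nonnegative integer for all $\mathbf{v} \ne \mathbf{v}_0$ such that $N_{\mathbf{u},\mathbf{v}} > 0$, we get
\[ \sum_{\mathbf{v} \in S \setminus \{\mathbf{v}_0\}} N_{\mathbf{u},\mathbf{v}} \frac{W_{\mathbf{v}}}{m} + N_{\mathbf{u},\mathbf{v}_0} \left\lceil \frac{W_{\mathbf{v}_0}}{m} \right\rceil \le k \left\lceil \frac{W_{\mathbf{u}}}{m} \right\rceil. \]
This proves that
\[ N \left\lceil \frac{\mathbf{W}}{m} \right\rceil \le k \left\lceil \frac{\mathbf{W}}{m} \right\rceil. \]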
3.5 Generating sequence of leaves
In what follows, we show how the multiset construction allows one to prove the main result of [10] concerning the generating sequences of regular trees. We begin with the following lemma, which is also used in the next section. We use the term leaf for a vertex of a graph without outgoing edges.
Lemma 2 Let $G$ be a graph on a set $Q$ of vertices. Let $i \in Q$ and $T \subset Q$. If $G$ admits a $k$-approximate eigenvector $\mathbf{w}$, there is a graph $G'$ and a set of vertices $I'$ of $G'$ such that
1. $G'$ admits the $k$-approximate eigenvector $\mathbf{w}'$ with all components equal to 1;
2. the triple $(G, i, \mathbf{w})$ is equivalent to the triple $(G', I', \mathbf{w}')$;
3. if $w_p = 1$ for all $p \in T$, there is a set of vertices $T'$ of $G'$ such that the triple $(G, i, T)$ is equivalent to the triple $(G', I', T')$. Moreover, if $T$ is the set of leaves of $G$, we can choose for $T'$ the set of leaves of $G'$.
We now state the main result of [10].
Theorem 6 Let $s = (s_n)_{n \ge 0}$ be a regular sequence of nonnegative integers and let $k$ be a positive integer such that $\sum_{n \ge 0} s_n k^{-n} \le 1$. Then there is a $k$-ary regular tree having $s$ as its generating sequence.
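Proof. Let us consider a regular sequence $s$ and an integer $k$ such that $\sum_{n \ge 0} s_n k^{-n} \le 1$. Since the result holds trivially for $s(z) = 1$, we may suppose that $s_0 = 0$. Let $(G, i, t)$ be a normalized representation of $s$ and let $\bar{G}$ be the closure of $G$ as defined at the beginning of Section 2.1. We denote by $M$ (resp. $\bar{M}$) the adjacency matrix of $G$ (resp. $\bar{G}$). Let $\bar{Q} = Q \setminus \{t\}$ be the set of vertices of $\bar{G}$. By Proposition 9, the matrix $\bar{M}$ admits a positive $k$-approximate eigenvector $\bar{\mathbf{w}}$. By definition, we have $\bar{M}\bar{\mathbf{w}} \le k\bar{\mathbf{w}}$.
Let $\mathbf{w}$ be the $Q$-vector defined by $w_q = \bar{w}_q$ for all $q \in \bar{Q}$ and $w_t = \bar{w}_i$. Then, since there is no edge going out of $t$ in $G$, $\mathbf{w}$ is a positive $k$-approximate eigenvector of $M$. Let $\mathbf{t}$ be the $Q$-vector which is the characteristic vector of the vertex $t$. Let $m = w_i$.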
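By Theorem 5 there exists an admissible extension $H$ of $G$ with respect to $\mathbf{w}$ and $m$. Let $U$ be the transfer matrix and let $\mathbf{W} = U\mathbf{w}$. Since $w_t \equiv 0 \bmod m$, we may choose $H$ with the following additional property: for all $\mathbf{u} \in S$, either $u_t = 0$ or $\mathbf{u} = \mathbf{t}$.
According to Proposition 10, the sequence $s$ is recognized by $(H, \mathbf{J}, \mathbf{X})$ where $\mathbf{J}$ is the characteristic row vector of $\mathbf{i}$ and $\mathbf{X}$ is the characteristic column vector of $\mathbf{t}$. This means that $s$ is recognized by the normalized representation consisting of the graph $H$, the initial vertex $\mathbf{i}$, that we identify with $i$, and the terminal vertex $\mathbf{t}$, that we identify with $t$.
Let $N$ be the adjacency matrix of $H$. By Proposition 12, the vector $\lceil \mathbf{W}/m \rceil$ is a positive $k$-approximate eigenvector of $N$. Remark that $\lceil \mathbf{W}/m \rceil_i = \lceil \mathbf{W}/m \rceil_t = 1$.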
We may now apply Lemma 2 to construct a triple $(H', I', T')$ equivalent to $(H, i, t)$. The set $T'$ is the set of leaves of $H'$. Since $\lceil \mathbf{W}/m \rceil_i = 1$, $I'$ is reduced to one vertex $i'$. Since $H'$ admits a $k$-approximate eigenvector with all components equal to one, the graph $H'$ is of outdegree at most $k$. Finally, $s$ is the generating sequence of the covering tree of $H'$ starting at $i'$. This tree is $k$-ary and regular.
Let us consider the above constructions in the particular case of equality in Kraft's inequality. In this case, the result is a complete $k$-ary tree. Indeed, by Proposition 8, the matrix $\bar{M}$ admits a positive integral eigenvector $\bar{\mathbf{w}}$ for the eigenvalue $k$. We have for all $p \in \bar{Q}$,
\[ \sum_{q \in \bar{Q}} \bar{M}_{p,q} \bar{w}_q = k \bar{w}_p. \]
As a consequence, for any $\mathbf{u} \ne \mathbf{t}$, we have
\[ \sum_{\mathbf{v} \in S} N_{\mathbf{u},\mathbf{v}} W_{\mathbf{v}} = k W_{\mathbf{u}}. \]
Then the graph constructed in Lemma 2 is of constant outdegree $k$. Thus the $k$-ary tree obtained is complete.
of Theorem 6. Let $n$ be the number of vertices of the graph $G$ giving a normalized representation of $s$. The size of the integer $m = w_i$ is exponential in $n$ (see Section 3.3). Thus the number of vertices of the graph $H$ is bounded by a double exponential in $n$. The final regular tree is the covering tree of a graph whose set of vertices has the same size in order of magnitude.
For example, let $s$ be the sequence defined by
\[ s(z) = \frac{z^2}{1 - z^2} + \frac{z^2}{1 - 5z^3}. \]
Since $s(1/2) = 1$, it satisfies the Kraft equality for $k = 2$. The sequence $s$ is recognized by $(G, i, t)$ where $G = (Q, E)$ is the graph given in Figure 8 with $Q = \{1, 2, 3, 4, 5, 6, 7\}$, $i = 1$, $t = 4$. The adjacency matrix of $G$ admits the 2-approximate eigenvector represented in Figure 8, where the coefficients of $w$ are shown in squares beside the vertices. Thus $m = 3$.
Figure 8: A normalized representation of $s$.
An admissible extension $H$ of $G$ with respect to $w$ and $m$ is given in Figure 9. In this figure, each multiset of $S$ is represented by a sequence of vertices with repetitions corresponding to the multiplicity. For example, the multiset $u = (0, 0, 1, 0, 0, 2, 0)$ is represented by $(3, 6, 6)$. The sequence $s$ is recognized by the normalized representation $(H, 1, 4)$, where the initial and final vertices are named as they appear in Figure 9. The coefficients of $W$ are shown in squares beside the vertices.
A regular binary tree $T$ having $s$ as generating sequence of leaves is given in Figure 10. In this figure, the nodes have been renumbered, with the children of a node with a given label represented only once. The leaves of the tree are indicated by black boxes. The tree itself is obtained from the graph of Figure 9 by application of the construction of Lemma 2. For example, the vertex $(2, 5)$, which has coefficient 6 in $W$, is split into two vertices named 2 and 3 in the tree.

Figure 9: An admissible extension $H$.
This example was suggested to us by Christophe Reutenauer [39]. To check directly that the length distribution is equal to $s(z)$, one may compute from the graph the following regular expression of $s(z)$ and check by an elementary computation (possibly with the help of a symbolic computation system) that it is equal to $s(z)$.
\[ s(z) = (z^6)^* \bigl(2z^2 + z^4 + 2z^5 + z^6 + (z^2 + 3z^5)(5z^3)^*\, 3z^3\bigr). \tag{1} \]
(Note for a reader unfamiliar with regular expressions: the first factor $(z^6)^*$ corresponds to the vertex labeled 1 at level 6 of the tree. The term $2z^2 + z^4 + 2z^5 + z^6$ corresponds to the leaves reached by a path which does not use a vertex labeled 5. The factor $(z^2 + 3z^5)(5z^3)^*$ corresponds to the paths from the root to a vertex labeled 5. Finally, the factor $3z^3$ corresponds to the direct paths from 5 to a leaf.)
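The elementary computation mentioned above can be carried out on truncated power series. The following is a minimal sketch, not the authors' method: it reads $x^*$ as $1/(1 - x)$, and the truncation order `N` as well as the helper names `mul`, `add`, `star` and `poly` are our own assumptions.

```python
N = 40  # truncation order, an arbitrary choice for this check


def poly(coeffs):
    """Truncated series built from a {degree: coefficient} map."""
    p = [0] * N
    for degree, c in coeffs.items():
        p[degree] = c
    return p


def add(a, b):
    """Sum of two truncated series."""
    return [x + y for x, y in zip(a, b)]


def mul(a, b):
    """Product of two series, truncated at order N."""
    c = [0] * N
    for i, x in enumerate(a):
        if x:
            for j, y in enumerate(b):
                if i + j >= N:
                    break
                c[i + j] += x * y
    return c


def star(a):
    """Series of a* = 1/(1 - a), assuming a has zero constant term."""
    res = [1] + [0] * (N - 1)
    for n in range(1, N):
        res[n] = sum(a[i] * res[n - i] for i in range(1, n + 1))
    return res


# s(z) = z^2/(1 - z^2) + z^2/(1 - 5 z^3)
s_def = add(mul(poly({2: 1}), star(poly({2: 1}))),
            mul(poly({2: 1}), star(poly({3: 5}))))

# Right-hand side of the regular expression (1)
inner = add(poly({2: 2, 4: 1, 5: 2, 6: 1}),
            mul(mul(poly({2: 1, 5: 3}), star(poly({3: 5}))), poly({3: 3})))
s_regex = mul(star(poly({6: 1})), inner)
```

Comparing `s_def` and `s_regex` coefficient by coefficient checks the identity up to order `N` only; it is a sanity check, not a proof.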
This example shows an interesting feature of this problem. In fact, from the point of view of regular expressions, the difficult operation in this problem is the sum. It would be a simple matter to build a rational tree for each term of the sum in the expression (1) (see the example of Figure 5). The difficulty would then be to merge these trees to obtain one corresponding to the sum.
Figure 10: A regular binary tree with length distribution $s$.
A curious consequence of Theorem 6 is the following property of regular sequences.

Corollary 1 Let $k \ge 2$ be an integer and let $u$ be a regular sequence such that $u(1/k) \le 1$ and $u(0) = 0$. Then there exist $k$ regular sequences $u_1, \ldots, u_k$ such that $u_i(1/k) \le 1$ and
\[ u(z) = \sum_{i=1}^{k} z\, u_i(z). \]
Proof. It is a simple consequence of Theorem 6. Indeed, if $X$ is a regular prefix code on the $k$-element alphabet $A$, then $X = \sum_{a \in A} a X_a$ where each $X_a$ is a regular prefix code on the alphabet $A$.

We do not know of a direct proof of this result.
3.6 Generating sequence of nodes

In this section, we consider the generating sequence of the set of all nodes in a tree instead of just the set of leaves. This is motivated by the fact that in search trees, the information can be carried either by the leaves or by all the nodes of the tree. We will see that the complete characterization is more complicated than the one for leaves.

Soittola (see [42] p. 104) has characterized the series which are the generating sequences of nodes in a regular tree. We characterize the ones that correspond to $k$-ary trees (Theorem 7). We also give a more direct construction in a particular case (Theorem 8).
Let $T$ be a tree. The generating sequence of nodes of the tree $T$ is the sequence $t = (t_n)_{n \ge 0}$, where $t_n$ is the number of nodes of $T$ at height $n$. The sequence $t$ satisfies $t_0 \le 1$ and, moreover, if $T$ is a $k$-ary tree, the condition
\[ t_n \le k\, t_{n-1} \]
for all $n \ge 1$. If $T$ is a regular tree, then $t$ is a regular sequence. We now completely characterize the regular sequences $t$ that are the generating sequences of nodes of a $k$-ary regular tree.
Theorem 7 Let $t = (t_n)_{n \ge 0}$ be a regular sequence and let $k$ be a positive integer. The sequence $(t_n)_{n \ge 0}$ is the generating sequence of nodes of a $k$-ary regular tree if and only if it satisfies the following conditions.
(i) the convergence radius of $t$ is strictly greater than $1/k$,
(ii) the sequence $s(z) = t(z)(kz - 1) + 1$ is regular.
Proof. Let us first show that the conditions are necessary. Let $\bar{T}$ be the complete $k$-ary tree obtained by adding $i$ new leaves to each node of $T$ that has $k - i$ children. Since $T$ is a regular tree, $\bar{T}$ is also regular.
Let $s$ be the generating sequence of leaves of $\bar{T}$. Since $\bar{T}$ is complete, $s(1/k) = 1$. Since $k\, t_n = s_{n+1} + t_{n+1}$ for all $n \ge 0$, we have
\[ 1 - s(z) = t(z)(1 - kz). \]
Since $s$ is a regular sequence, its radius of convergence is strictly larger than $1/k$ (see Section 3.3). Since the value of the derivative of $s$ at $z = 1/k$ is $k\, t(1/k)$, the same holds for $t$. This proves the necessity of the conditions.
Conversely, if $t$ satisfies the conditions of the theorem, the regular series $s(z) = t(z)(kz - 1) + 1$ satisfies $s(1/k) = 1$. Thus, by Theorem 6, $s$ is the generating sequence of leaves of a complete $k$-ary regular tree. The internal nodes of this tree form a $k$-ary regular tree whose generating sequence of nodes is $t$.
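The relation between $s$ and $t$ used in this proof can be illustrated on a finite tree. The sketch below is ours: the function names and the sample sequence `t = [1, 2, 2]` (a binary tree with a root, two children and two nodes at height 2) are hypothetical choices, not taken from the text.

```python
from fractions import Fraction


def leaves_from_nodes(t, k):
    """Coefficients of s(z) = t(z)(kz - 1) + 1 for a finite sequence t.

    s_0 = 1 - t_0 and s_n = k * t_{n-1} - t_n for n >= 1.
    """
    padded = list(t) + [0]
    s = [1 - padded[0]]
    for n in range(1, len(padded)):
        s.append(k * padded[n - 1] - padded[n])
    return s


def evaluate(seq, x):
    """Value at x of the polynomial with the given coefficients."""
    return sum(Fraction(c) * x ** n for n, c in enumerate(seq))


t = [1, 2, 2]
s = leaves_from_nodes(t, 2)
```

Here `s` counts, by height, the leaves that must be added to complete the tree, and `evaluate(s, Fraction(1, 2))` equals 1, as expected for a complete binary tree.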
The sequence $s$ defined by condition (ii) is rational as soon as $t$ is regular and therefore rational. Given a regular sequence $t$, condition (ii) is decidable in view of the theorem of Soittola (Theorem 1). Condition (ii) implies the nonnegativity of the coefficients of the series $s$ and thus the inequality $t_n \le k\, t_{n-1}$ for all $n \ge 1$. It also implies that $t_0 \le 1$.
We now show that there are regular sequences $t$ satisfying $t_n \le k\, t_{n-1}$ for all $n \ge 1$ and condition (i) of the theorem, and such that the sequence $s(z) = t(z)(kz - 1) + 1$ is not regular. The example is based on an example of a rational sequence with nonnegative coefficients which is not regular (see [18] page 95). Let
\[ r_n = b^{2n} \cos^2(n\theta) \]
with $\cos(\theta) = a/b$, where the integers $a, b$ are such that $b \ne 2a$ and $0 < a < b$. The sequence $r$ is rational, has nonnegative integer coefficients and is not regular. Its poles are $\frac{1}{b^2}$, $\frac{1}{b^2} e^{2i\theta}$ and $\frac{1}{b^2} e^{-2i\theta}$. We now define the sequence $t$ as follows:
\[ t_{2h} = k^h, \qquad t_{2h+1} = k^h + r_h. \]
We also assume that $b^2 < k$. By Soittola's theorem, the sequence $t$ is regular since it is a merge of rational sequences having a dominating root. The convergence radius of $t$ is $1/\sqrt{k} > 1/k$. Therefore the sequence $t$ satisfies the first condition of Theorem 7. Let $s$ be the sequence defined by $s(z) = t(z)(kz - 1) + 1$. If $h = 2p$ is even with $p \ge 1$,
\[ s_h = k\, t_{h-1} - t_h = k \cdot k^{p-1} + k\, r_{p-1} - k^p = k\, r_{p-1}. \]
Thus the sequence $s$ is not regular.

The above example does not work for small values of $k$ (the least value is $k = 10$). We do not know of similar examples for $2 \le k \le 9$.
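The smallest instance of this example, with $a = 1$, $b = 3$ and $k = 10$, can be tabulated numerically. The sketch below is only illustrative: it computes $c_n = b^n \cos(n\theta)$ through the recurrence $c_{n+1} = 2a\, c_n - b^2 c_{n-1}$, which follows from $\cos((n+1)\theta) = 2\cos\theta \cos(n\theta) - \cos((n-1)\theta)$; the function names are our own.

```python
def r_sequence(a, b, count):
    """First `count` terms of r_n = b^(2n) cos^2(n*theta), cos(theta) = a/b.

    Uses the integers c_n = b^n cos(n*theta), with c_0 = 1, c_1 = a and
    c_{n+1} = 2*a*c_n - b^2*c_{n-1}, so that r_n = c_n^2.
    """
    c = [1, a]
    while len(c) < count:
        c.append(2 * a * c[-1] - b * b * c[-2])
    return [x * x for x in c[:count]]


def t_sequence(k, r, count):
    """t_{2h} = k^h and t_{2h+1} = k^h + r_h."""
    return [k ** (n // 2) + (r[n // 2] if n % 2 else 0) for n in range(count)]


a, b, k = 1, 3, 10
r = r_sequence(a, b, 10)
t = t_sequence(k, r, 20)
# Coefficients of s(z) = t(z)(kz - 1) + 1
s = [1 - t[0]] + [k * t[n - 1] - t[n] for n in range(1, 20)]
```

On this range one can observe that $t_n \le k\, t_{n-1}$ holds and that the even-indexed terms of $s$ are $k\, r_{p-1}$.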
We finally describe a particular case of Theorem 7 in which one has a relatively simple method, based on the multiset construction, to build the regular tree with a given generating sequence of nodes. This avoids the use of Soittola's characterization, which leads to a method of higher complexity.

A primitive representation of a regular sequence $s$ is a representation $(G, i, t)$ such that the adjacency matrix of $G$ is primitive. The following result is proved in [8] with a different proof using the state-splitting method of symbolic dynamics. The proof given in [10] relies on a simpler construction.
Theorem 8 Let $t = (t_n)_{n \ge 0}$ be a regular sequence and let $k$ be a positive integer such that $t_0 = 1$, $t_n \le k\, t_{n-1}$ for all $n \ge 1$ and such that
(i) the convergence radius of $t$ is strictly greater than $1/k$,
(ii) $t$ has a primitive representation.
Then $(t_n)_{n \ge 0}$ is the generating sequence of nodes by height of a $k$-ary regular tree.

The proof of this theorem given in [10] uses the multiset construction. It relies on the following lemma.
Lemma 3 Let $M$ be a primitive matrix with spectral radius $\lambda$. Let $v$ be a non-null and nonnegative integral vector and let $k$ be an integer such that $\lambda < k$. Then there is a positive integer $n$ such that $M^n v$ is a positive $k$-approximate eigenvector of $M$.
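Proof. For a primitive matrix $M$ with spectral radius $\lambda$, it is known that the sequence $((M/\lambda)^n)_{n \ge 0}$ converges to $r \cdot l$ where $r$ is a positive right eigenvector and $l$ a positive left eigenvector of $M$ for the eigenvalue $\lambda$ with $l \cdot r = 1$ (see for example [30] p. 130). Thus $(M^n v / \lambda^n)_{n \ge 0}$ converges to $r \cdot l \cdot v$, which is equal to $\rho r$ where $\rho = l \cdot v$ is a positive real number, since $l$ is positive and $v$ is nonnegative and non-null. Since $Mr = \lambda r$ and $\lambda < k$, we get, for a large enough integer $n$,
\[ M \frac{M^n}{\lambda^n} v \le k \frac{M^n}{\lambda^n} v \]
or equivalently $M M^n v \le k M^n v$. If $n$ is large enough, we moreover have $M^n v > 0$ since $M$ is primitive.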
The proof of Theorem 8 uses a shift of indices of the sequence to obtain a new sequence to which a simple application of the multiset construction can be applied. We illustrate it on an example.

Figure 11: A primitive representation $G$ of $t$.

\[ i = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \quad \text{and} \quad t = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}. \]
The adjacency matrix $M$ of $G$ is the primitive matrix
\[ M = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}. \]
Its spectral radius is less than 2. The hypotheses of Theorem 8 are thus satisfied. We have
\[ M^2 t = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} \quad \text{and} \quad M^3 t = \begin{bmatrix} 3 \\ 2 \\ 2 \end{bmatrix}. \]
Since $M^3 t \le 2 M^2 t$, the vector $W = M^2 t$ is an approximate eigenvector of $M$ (the existence of such a vector is asserted by Lemma 3). Let $w = M^2 t$.
Applying Lemma 2, we obtain from $G$ the graph $G'$ represented on the left side of Figure 12. Moreover, $(G, i, w)$ is equivalent to $(G', I', w')$ where $I'$ is the set of initial vertices indicated in Figure 12 and $w'$ is the vector with all components equal to 1. The covering trees $T_{1,1}$ and $T_{1,2}$ of $G'$ starting at the vertices of $I'$ give, with the appropriate shift of indices, the binary regular tree $T$ represented on the right side of Figure 12 (the nodes of the tree have been renumbered).
4 Generating sequences of prefix codes

There is a close connection between trees and prefix codes or prefix-closed sets of words. We present below the translation of some of the notions and results seen before in terms of prefix codes.

4.1 Trees and prefix codes

Let $R$ be a set of words on the alphabet $A = \{0, 1, \ldots, k - 1\}$. The set $R$ is said to be prefix-closed if any prefix of an element of $R$ is also in $R$. The set $X$ of words of $R$ which are not a proper prefix of a word in $R$ is a prefix code, called the prefix code associated to $R$.
Figure 12: The graph $G'$ and the tree $T$.
When $R$ is prefix-closed, we can build a tree $T(R)$ as follows. The set of nodes is $R$, the root is the empty word $\epsilon$ and $T(a_1 a_2 \cdots a_n) = a_1 a_2 \cdots a_{n-1}$. The leaves of $T$ form a prefix code which is the prefix code associated to $R$. The generating sequence of $T$ is the generating sequence of $X$.

For example, let $R = \{\epsilon, 0, 1, 10, 11\}$. The tree $T(R)$ is represented in Figure 13. The associated prefix code is $X = \{0, 10, 11\}$.

Figure 13: The tree $T(X)$.
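The passage from a prefix-closed set to its associated prefix code is direct to compute. A minimal sketch follows; words are represented as Python strings, the empty string stands for the empty word, and the function names are our own assumptions.

```python
def is_prefix_closed(R):
    """True if every proper prefix of a word of R is also in R."""
    return all(w[:i] in R for w in R for i in range(len(w)))


def associated_prefix_code(R):
    """Words of R that are not a proper prefix of another word of R."""
    return {w for w in R
            if not any(v != w and v.startswith(w) for v in R)}


def length_distribution(X):
    """List u with u[n] = number of words of length n in X."""
    top = max((len(w) for w in X), default=0)
    return [sum(1 for w in X if len(w) == n) for n in range(top + 1)]


R = {"", "0", "1", "10", "11"}
X = associated_prefix_code(R)
```

For the set $R$ of the example, `X` is the set of leaves of $T(R)$ and `length_distribution(X)` gives its generating sequence.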
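Let $X$ be a prefix code on an alphabet with $k$ symbols. It is clear that its length distribution $u = (u_n)_{n \ge 1}$ satisfies
\[ \sum_{n \ge 1} u_n k^{-n} \le 1, \]
or equivalently $u(1/k) \le 1$. The number $u(1/k)$ can actually be interpreted as the probability that a long enough word has a prefix in $X$.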
There is also a connection with the notion of entropy. Actually, if $X$ is a prefix code, the entropy of $X^*$ is equal to $\log(1/\rho)$ where $\rho$ is the solution of the equation $u_X(\rho) = 1$. Thus Kraft's inequality expresses the fact that $h(X^*) \le \log k$.
Conversely, the Kraft-McMillan theorem states that for any such sequence $u = (u_n)_{n \ge 1}$, there exists a prefix code $X$ on a $k$-symbol alphabet such that $u = u_X$.

The equality case in Kraft's inequality corresponds to a particular class of prefix codes often called complete. A prefix code $X$ on the alphabet $A$ is complete if any word on $A$ either has a prefix in $X$ or is a prefix of a word of $X$.

Theorem 6 shows that the generating sequences of regular prefix codes are exactly the regular sequences satisfying Kraft's inequality.
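For a finite sequence, the existence statement can be made effective by the classical greedy assignment of codewords in order of increasing length. The sketch below illustrates that standard construction and is not taken from the text; the function names are ours, and digits are written as decimal characters, which assumes $k \le 10$.

```python
from fractions import Fraction


def to_base(value, k, length):
    """Word of the given length writing `value` in base k (assumes k <= 10)."""
    digits = []
    for _ in range(length):
        value, d = divmod(value, k)
        digits.append(str(d))
    return "".join(reversed(digits))


def kraft_prefix_code(u, k):
    """Prefix code with u[n-1] words of length n on the alphabet {0..k-1}.

    `value` is the index of the first word of the current length that has
    no prefix among the words already chosen.
    """
    if sum(Fraction(c, k ** n) for n, c in enumerate(u, start=1)) > 1:
        raise ValueError("Kraft's inequality fails")
    code, value = [], 0
    for n, count in enumerate(u, start=1):
        value *= k  # move from words of length n - 1 to words of length n
        for _ in range(count):
            code.append(to_base(value, k, n))
            value += 1
    return code
```

For instance, `kraft_prefix_code([1, 1, 2], 2)` yields a complete binary prefix code with one word of length 1, one of length 2 and two of length 3.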
4.2 Bifix codes

We investigate here the length distributions of a particular class of prefix codes, called bifix. Several other classes of prefix codes could give rise to a similar study (for a description of these classes, see [21]).

The definition of a suffix code is symmetric to the definition of a prefix code. It is a set of words $X$ such that no element of $X$ is a suffix of another one. The notion of a complete suffix code is also symmetric. A bifix code is a set $X$ of words which is both a prefix and a suffix code.

Any set of words of fixed length is obviously a bifix code, but there are more complicated examples.

Example 5 The set
\[ X = \{aaa, aaba, aabb, ab, baa, baba, babb, bba, bbb\} \]
is a complete prefix code pictured in Figure 14. It is also a complete suffix code, as one may check by reading its words backwards.
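The check suggested in Example 5 can be done mechanically. This is a minimal sketch with function names of our own choosing; completeness is tested through the equality case of Kraft's inequality, which is legitimate here because the code is finite.

```python
from fractions import Fraction


def is_prefix_code(X):
    """True if no word of X is a proper prefix of another word of X."""
    return not any(x != y and y.startswith(x) for x in X for y in X)


def is_suffix_code(X):
    """Symmetric notion, obtained by reading the words backwards."""
    return is_prefix_code({w[::-1] for w in X})


def is_bifix_code(X):
    return is_prefix_code(X) and is_suffix_code(X)


def kraft_sum(X, k):
    """Value of u_X(1/k) for a finite set X on k symbols."""
    return sum(Fraction(1, k ** len(w)) for w in X)


X = {"aaa", "aaba", "aabb", "ab", "baa", "baba", "babb", "bba", "bbb"}
```

With these definitions, `is_bifix_code(X)` holds and `kraft_sum(X, 2)` equals 1.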
Surprisingly, it is an open problem to characterize the length distributions of bifix codes. The following simple example shows that they are more constrained than those of prefix codes.

Figure 14: The bifix code $X$.
Example 6 The sequence $u(z) = z + 2z^2$ is not realizable as the length distribution of a bifix code on a binary alphabet although $u(1/2) = 1$. Indeed, one of the symbols has to be in $X$, say $a$. Then $bb$ is the only word of length 2 that can be added.

The following nice partial result is due to Ahlswede, Balkenhol and Khachatrian [2]. We state the result for a binary alphabet. It can be readily generalized to $k$ symbols, but it is then of less interest.
Theorem 9 For any integer sequence $u$ such that
\[ u(1/2) \le 1/2, \]
there is a bifix code $X$ such that $u = u_X$.
Proof. The proof is by induction. We suppose that we have already built a bifix code $X$ formed of words of length at most $n - 1$ with length distribution $(u_1, u_2, \ldots, u_{n-1})$. We have
\[ \sum_{i=1}^{n} u_i 2^{-i} \le 1/2, \]
that is, multiplying both sides by $2^{n+1}$,
\[ 2 \sum_{i=1}^{n} u_i 2^{n-i} \le 2^n. \]
Finally, we obtain
\[ u_n \le 2^n - 2 \sum_{i=1}^{n-1} u_i 2^{n-i}. \]
The expression on the right-hand side is at most equal to the number of elements of the set $A^n - XA^* - A^*X$. Thus, we can choose $u_n$ words of length $n$ which do not have a prefix or a suffix in $X$. This proves the result by induction.
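The induction in this proof is effective: at each length one picks words having neither a prefix nor a suffix among the words already chosen. The sketch below follows that idea on a finite sequence; the enumeration order, the function name and the sample input are our own choices.

```python
from itertools import product


def greedy_bifix_code(u, alphabet="ab"):
    """Bifix code with u[n-1] words of length n, built length by length.

    Raises ValueError when too few words of some length remain available,
    which the proof rules out on two letters when u(1/2) <= 1/2.
    """
    code = []
    for n, count in enumerate(u, start=1):
        candidates = ("".join(p) for p in product(alphabet, repeat=n))
        # keep the words with no prefix and no suffix among shorter codewords
        free = [w for w in candidates
                if not any(w.startswith(x) or w.endswith(x) for x in code)]
        if len(free) < count:
            raise ValueError(f"not enough words of length {n}")
        code.extend(free[:count])
    return code


code = greedy_bifix_code([0, 1, 2])
```

The sample sequence $u(z) = z^2 + 2z^3$ satisfies $u(1/2) = 1/2$, and the construction returns one word of length 2 and two words of length 3.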
The authors of [2] formulate the interesting conjecture that Theorem 9 is still true if the hypothesis $u(1/2) \le 1/2$ is replaced by $u(1/2) \le 3/4$.

There are known additional conditions imposed on length distributions of bifix codes. For example, one has the following result, originally due to Schützenberger (see [16]).
Theorem 10 If $X$ is a finite complete bifix code on $k$ symbols, then $u_X(1/k) = 1$ and $\frac{1}{k} u'_X(1/k)$ is an integer.

The number $\frac{1}{k} u'_X(1/k)$ can be interpreted as the average length of the words of $X$. Indeed
\[ z\, u'_X(z) = \sum_{x \in X} |x|\, z^{|x|}. \]
Example 7 For the bifix code of Example 5, we have
\[ u_X(z) = z^2 + 4z^3 + 4z^4 \]
and thus
\[ u'_X(z) = 2z + 12z^2 + 16z^3. \]
Hence $\frac{1}{2} u'_X(1/2) = 3$.

The conditions of Theorem 10 show directly that the sequence of Example 6 is not realizable. Indeed, it satisfies the first condition but not the second one. The conditions of Theorem 10 are not sufficient. Indeed, if $u(z) = z + 4z^3$ we have $u(1/2) = 1$ and $u'(1/2) = 4$ although it is clearly impossible that $u = u_X$ for a bifix code $X$.
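The two quantities of Theorem 10 are straightforward to evaluate exactly. A small sketch, with a function name and a dictionary encoding of $u$ that are our own conventions:

```python
from fractions import Fraction


def theorem10_values(u, k):
    """Return (u(1/k), (1/k) u'(1/k)) for u given as {length n: u_n}."""
    x = Fraction(1, k)
    value = sum(c * x ** n for n, c in u.items())
    mean = x * sum(n * c * x ** (n - 1) for n, c in u.items())
    return value, mean


example5 = theorem10_values({2: 1, 3: 4, 4: 4}, 2)   # z^2 + 4z^3 + 4z^4
example6 = theorem10_values({1: 1, 2: 2}, 2)         # z + 2z^2
last = theorem10_values({1: 1, 3: 4}, 2)             # z + 4z^3
```

The second component is an integer for the first and third sequences but not for the sequence of Example 6.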
Recently, Ye and Yeung [45] have made some progress on this problem. They are in particular able to prove that Theorem 9 still holds when $u(1/2) \le 5/8$.
ular odes
In this section, we present a number of results on interrelated objects which are connected with cyclic permutation of words. The link with enumerative combinatorics was developed in Lothaire's volume [31] and later in R. Stanley's book [44]. We begin with notions classical in symbolic dynamics (see [30] or [28] for a general reference; see [15] or [24] for the link with finite automata).
5.1 Subshifts of finite type

A subshift is a set of biinfinite words on a finite alphabet $A$ which avoids a given set $F$ of forbidden words. It is a topological space as a closed subset of the space $A^{\mathbb{Z}}$ of functions from $\mathbb{Z}$ into the set $A$. The full shift on $A$ is the set of all biinfinite words on $A$. It corresponds to the case $F = \emptyset$.

A sofic subshift is the set of biinfinite labels of paths in a finite automaton. A sofic subshift is called irreducible if the automaton can be chosen strongly connected. A subshift of finite type is the set of biinfinite words avoiding a finite set of finite words. Any subshift of finite type is sofic, but the converse is not true. The edge shift of a finite graph $G$ is the set $S_G$ of biinfinite paths in $G$ (viewed as biinfinite sequences of edges). It is a subshift of finite type.
The shift $\sigma$ is the function on a subshift $S$ which maps a point $x$ to the point $y = \sigma(x)$ whose $i$th coordinate is $y_i = x_{i+1}$.

A morphism from a subshift $S$ into a subshift $T$ is a function $f : S \to T$ which is continuous and invariant under the shift. A bijective morphism is called a conjugacy. Any subshift of finite type is conjugate to some edge shift.

The entropy $h(S)$ of a subshift $S$ is the entropy of the formal language formed by the finite blocks occurring in words of $S$. It can be shown that the entropy is a topological invariant, in the sense that two conjugate subshifts have the same entropy.

While the entropy is a measure of the number of forbidden words, it is also possible to study the number of minimal forbidden words. This gives rise to another invariant of subshifts [13], [14].
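An integer $p$ is a period of a point $x = (a_n)_{n \in \mathbb{Z}}$ if $a_{n+p} = a_n$ for all $n \in \mathbb{Z}$. Equivalently, $p$ is a period of $x$ if $\sigma^p(x) = x$. The zeta function of a subshift $S$ is the series
\[ \zeta(S) = \exp \sum_{n \ge 1} \frac{p_n}{n} z^n \]
where $p_n$ is the number of points with period $n$ in $S$. It is also a topological invariant, since a point of period $n$ is mapped by a conjugacy to a point of the same period.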
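The following result, due to Bowen and Lanford [19], is classical (see [30]).

Proposition 13 Let $G$ be a finite graph and let $M$ be the adjacency matrix of $G$. Then
\[ \zeta(S_G) = \det(I - Mz)^{-1}. \]
Proof. We first have for each $n \ge 1$
\[ \mathrm{Tr}(M^n) = p_n \]
since the coefficient $(i, j)$ of $M^n$ is the number of paths of length $n$ from $i$ to $j$. Thus
\[ \zeta(S_G) = \exp \sum_{n \ge 1} \frac{p_n}{n} z^n = \exp \sum_{n \ge 1} \frac{\mathrm{Tr}(M^n)}{n} z^n = \exp \mathrm{Tr}\bigl(\log (I - Mz)^{-1}\bigr) = \det (I - Mz)^{-1} \]
since, by the formula of Jacobi, $\exp \mathrm{Tr} = \det \exp$.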
Example 8 Let $S$ be the edge shift of the graph $G$ of Figure 15. We have
\[ M = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}. \]
Consequently
\[ \zeta(S) = \frac{1}{1 - z - z^3}. \]

Figure 15: A subshift of finite type
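Proposition 13 can be checked numerically on Example 8 by comparing truncated power series. This is a sketch under our own naming and with an arbitrary truncation order `N`; the exponential is expanded through the relation $n\, g_n = \sum_{j=1}^{n} p_j\, g_{n-j}$, obtained by differentiating $g = \exp \sum_{n \ge 1} p_n z^n / n$.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def periodic_point_counts(M, N):
    """[p_1, ..., p_N] with p_n = Tr(M^n)."""
    P, out = M, []
    for _ in range(N):
        out.append(sum(P[i][i] for i in range(len(M))))
        P = mat_mul(P, M)
    return out


def exp_series(p, N):
    """Coefficients g_0..g_N of exp(sum_{n>=1} p_n z^n / n)."""
    g = [Fraction(1)]
    for n in range(1, N + 1):
        g.append(sum(p[j - 1] * g[n - j] for j in range(1, n + 1)) / n)
    return g


M = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
N = 12  # truncation order, an arbitrary choice
p = periodic_point_counts(M, N)
zeta = exp_series(p, N)

# Coefficients of 1/(1 - z - z^3): c_n = c_{n-1} + c_{n-3}
c = [1, 1, 1]
while len(c) <= N:
    c.append(c[-1] + c[-3])
```

The lists `zeta` and `c` agree up to order `N`, which is consistent with $\zeta(S) = 1/(1 - z - z^3)$.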
Let $S$ be a subshift of finite type and let $p_n$ be the number of points with period $n$. Let $q_n$ be the number of points with least period $n$. Since $q_n$ is a multiple of $n$, we also denote $q_n = n\, l_n$. We then have the following formula expressing the zeta function as an infinite product using the integers $l_n$ as exponents:
\[ \zeta(S) = \prod_{n \ge 1} (1 - z^n)^{-l_n}, \]
as one may verify using $p_n = \sum_{d \mid n} d\, l_d$ and the definition of $\zeta(S)$.
A classical result, related to what follows, is the following statement, known as Krieger's embedding theorem.

Theorem 11 Let $S, T$ be two subshifts of finite type. There exists an injective morphism $f : S \to T$ with $f(S) \ne T$ if and only if
1. $h(S) < h(T)$
2. for each $n \ge 1$, $q_n(S) \le q_n(T)$ where $q_n(S)$ (resp. $q_n(T)$) is the number of points of $S$ (resp. $T$) of least period $n$.
The following result is the basis of many applications of symbolic dynamics to coding. It is due to Adler, Coppersmith and Hassner [1].

Theorem 12 If $S$ is an irreducible subshift of finite type such that $h(S) \ge \log k$, it is conjugate to a subshift of finite type $S_G$ where the graph $G$ has outdegree at least $k$.

The proof is based on a state-splitting algorithm using approximate eigenvectors and Lemma 1. This result is part of a number of constructions leading to sliding block codes used in magnetic recording (see [35], [11] or [30]). It gives at the same time the following result.

Theorem 13 If $S$ is a subshift of finite type such that $h(S) \le \log k$, then there is a graph $G$ of outdegree at most $k$ such that $S$ is conjugate to $S_G$.
Let $u$ be a regular sequence of integers such that $u(1/k) \le 1$. Let $G$ be a normalized graph recognizing $u$ (in the sense of Section 2.1). Let $\bar{G}$ be the graph obtained by merging the initial and terminal vertex. Then $h(S_{\bar{G}}) \le \log k$. We can apply Theorem 13 to obtain a graph $H$ with outdegree at most $k$ such that $S_{\bar{G}}$ and $S_H$ are conjugate. This gives the conclusion of Theorem 6 provided the initial-terminal vertex did not split in the construction. The following examples show both cases (for details, see [7] and [8]).
Example 9 Let $G$ be the graph of Figure 5. The splitting of vertex 2 gives a graph of outdegree 2. A normalization gives the automaton on the right.

Example 10 The sequence of the example given in Figure 6 is recognized by a graph $G$ such that $\bar{G}$ has three cycles of length 2. The solution as a binary tree has only two cycles of length 2 and thus could not be obtained by state-splitting.
5.2 Circular codes

A circular word, or necklace, is the equivalence class of a word under cyclic permutation. For a word $w$, we denote by $\bar{w}$ the circular word represented by $w$.

Let $X$ be a set of words and $w = x_1 x_2 \cdots x_n$ with $x_i \in X$. The set of cyclic permutations of the sequence $(x_1, x_2, \ldots, x_n)$ is called a factorization of the circular word $\bar{w}$.

A circular code is a set $X$ of words such that the factorization of circular words is unique.

Example 11 The set $X = \{a, aba\}$ is a circular code. Indeed, the position of the symbols $b$ uniquely determines the occurrences of $aba$.

Example 12 The set $X = \{ab, ba\}$ is not a circular code. Indeed, the circular word $\bar{w}$ for $w = abab$ has two factorizations, namely $(ab, ab)$ and $(ba, ba)$.

The following characterization is useful (see [16]).

Proposition 14 A set $X$ is a circular code if and only if it is a code and for all $u, v \in A^*$,
\[ uv, vu \in X^* \Rightarrow u, v \in X^*. \]
is not a circular code. Indeed, otherwise we would have $a, b \in X^*$, which is contradictory.
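The condition of Proposition 14 lends itself to a bounded search for counterexamples. The sketch below is only a partial test: it explores words $u, v$ up to a length `max_len` of our choosing, so a negative answer does not prove that a set is a circular code, and it does not check that $X$ is a code. The function names are ours.

```python
from itertools import product


def in_star(w, X):
    """True if w is a concatenation of words of X (the empty word included)."""
    ok = [True] + [False] * len(w)
    for i in range(1, len(w) + 1):
        ok[i] = any(ok[i - len(x)] and w[i - len(x):i] == x
                    for x in X if 0 < len(x) <= i)
    return ok[len(w)]


def violates_proposition_14(X, alphabet, max_len):
    """Return a pair (u, v) with uv, vu in X* but u or v not in X*,
    searching over |u|, |v| <= max_len; return None if none is found."""
    words = ["".join(p) for n in range(max_len + 1)
             for p in product(alphabet, repeat=n)]
    for u in words:
        for v in words:
            if (in_star(u + v, X) and in_star(v + u, X)
                    and not (in_star(u, X) and in_star(v, X))):
                return (u, v)
    return None
```

On $\{ab, ba\}$ the search returns the pair $(a, b)$, while on $\{a, aba\}$ no violating pair is found within the bound, in agreement with Examples 11 and 12.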
Let $X$ be a finite code. The flower automaton of $X$, denoted $\mathcal{A}_X$, is the following automaton. The set of its states is
\[ Q = \{(u, v) \in A^+ \times A^+ \mid uv \in X\} \cup \{(1, 1)\}. \]
The transitions are of the form $(u, av) \xrightarrow{a} (ua, v)$ or $(1, 1) \xrightarrow{a} (a, v)$ or $(u, a) \xrightarrow{a} (1, 1)$. The unique initial and final state is $(1, 1)$.

Example 14 The flower automaton of the circular code $\{a, aba\}$ is pictured in Figure 16.

Figure 16: The flower automaton of $\{a, aba\}$.

The following result is easy to prove.

Proposition 15 The flower automaton $\mathcal{A}_X$ recognizes $X^*$. The code $X$ is circular if and only if for each word $w$, there is at most one cycle with label $w$.
We now study the length distributions of circular codes. Let $X$ be a circular code and let $u = (u_n)_{n \ge 1}$ be its length distribution. For each $n \ge 1$, let $p_n$ be the number of words $w$ of length $n$ such that $\bar{w}$ has a factorization in words of $X$.

Proposition 16 The sequences $(p_n)_{n \ge 1}$ and $(u_n)_{n \ge 1}$ are related by
\[ \exp \sum_{n \ge 1} \frac{p_n}{n} z^n = \frac{1}{1 - u(z)}. \tag{2} \]
It is therefore possible to suppose that the sequence $(u_n)$ is finite, i.e. that the code $X$ is finite. Let $\mathcal{A}$ be the flower automaton of $X$. Let $S$ be the subshift of finite type associated with the graph of $\mathcal{A}$. Then $p_n$ is the number of elements of period $n$ in $S$. Indeed, each word $w$ such that $\bar{w}$ has a factorization is counted exactly once as the label of a cycle in $\mathcal{A}$. We also have
\[ \det(I - Mz) = 1 - u(z). \]
Thus, the result follows from Proposition 13.
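The argument can be followed on the code $X = \{a, aba\}$ by building the graph of its flower automaton and counting cycles. In this sketch (function names are ours, and the state $(1,1)$ is encoded as a pair of empty strings), the traces $\mathrm{Tr}(M^n)$ are compared with the numbers obtained from the recurrence $p_n = n u_n + \sum_{i=1}^{n-1} p_i u_{n-i}$, which results from taking the logarithmic derivative of both sides of (2).

```python
def flower_automaton(X):
    """Adjacency matrix of the graph of the flower automaton of X.

    States are the pairs (u, v) with uv in X and u, v nonempty, plus the
    state ("", "") standing for (1, 1), which receives index 0.
    """
    states = [("", "")]
    for x in sorted(X):
        states.extend((x[:i], x[i:]) for i in range(1, len(x)))
    index = {s: n for n, s in enumerate(states)}
    M = [[0] * len(states) for _ in states]
    for x in X:
        # one petal per word: (1,1) -> (x1, x2..xm) -> ... -> (1,1)
        path = ([("", "")] + [(x[:i], x[i:]) for i in range(1, len(x))]
                + [("", "")])
        for s, t in zip(path, path[1:]):
            M[index[s]][index[t]] += 1
    return M


def traces(M, N):
    """[Tr(M), Tr(M^2), ..., Tr(M^N)]."""
    size, P, out = len(M), M, []
    for _ in range(N):
        out.append(sum(P[i][i] for i in range(size)))
        P = [[sum(P[i][k] * M[k][j] for k in range(size))
              for j in range(size)] for i in range(size)]
    return out


def newton(u, N):
    """p_1..p_N from p_n = n*u_n + sum_{i<n} p_i * u_{n-i}; u is {n: u_n}."""
    p = []
    for n in range(1, N + 1):
        p.append(n * u.get(n, 0)
                 + sum(p[i - 1] * u.get(n - i, 0) for i in range(1, n)))
    return p


M = flower_automaton({"a", "aba"})
```

For this code the matrix `M` coincides with the matrix of Example 8, and `traces(M, N)` agrees with `newton({1: 1, 3: 1}, N)`.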
The explicit relation between the numbers $u_n$ and $p_n$ is the following. For each $i \ge 1$, let $u^{(i)} = (u^{(i)}_n)_{n \ge 1}$ be the length distribution of $X^i$. Equivalently, $u^{(i)}_n$ is the coefficient of degree $n$ of $u(z)^i$. Then for each $n \ge 1$
\[ p_n = \sum_{i=1}^{n} \frac{n}{i}\, u^{(i)}_n. \]
We also have for each $n \ge 1$
\[ p_n = n u_n + \sum_{i=1}^{n-1} p_i u_{n-i}. \tag{3} \]
This formula can be easily deduced from Formula (2) by taking the logarithmic derivative of each side of the formula. It shows directly that for any sequence $(u_n)_{n \ge 1}$ of nonnegative integers, the sequence $p_n$ defined by Formula (2) is formed of nonnegative integers.
Formula (3) is known as Newton's formula in the field of symmetric functions. Actually, the numbers $u_n$ can be considered, up to the sign, as elementary symmetric functions and the $p_n$ as the sums of powers (see [32]). The link between Witt vectors and symmetric functions was established in [43].

Let $p_n = \sum_{d \mid n} d\, l_d$. Then $l_n$ is the number of non-periodic circular words of length $n$ with a factorization. In terms of generating series, we have
\[ \exp \sum_{n \ge 1} \frac{p_n}{n} z^n = \prod_{n \ge 1} (1 - z^n)^{-l_n}. \tag{4} \]
Putting together Formulae (2) and (4), we obtain
\[ \frac{1}{1 - u(z)} = \prod_{n \ge 1} (1 - z^n)^{-l_n}. \tag{5} \]
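For any sequence $(u_n)_{n \ge 1}$ of nonnegative integers, the sequence $(l_n)_{n \ge 1}$ thus defined is formed of nonnegative integers. This can be proved either by a direct computation or by a combinatorial argument, since any sequence $u$ of nonnegative integers is the length distribution of a circular code on a large enough alphabet. We denote $l = \phi(u)$ and we say that $l$ is the $\phi$-transform of the sequence $u$.

We denote by $\varphi_n(k)$ the number of non-periodic circular words of length $n$ on $k$ symbols. The numbers $\varphi_n(k)$ are called the Witt numbers. It is clear that the sequence $(\varphi_n(k))_{n \ge 1}$ is the $\phi$-transform of the sequence $u$ with $u(z) = kz$, for which $p_n = k^n$. The corresponding particular case of Identity (5)
\[ 1 - kz = \prod_{n \ge 1} (1 - z^n)^{\varphi_n(k)} \]
is known as the cyclotomic identity.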
The following array displays a tabulation of the Witt numbers for small values of $n$ and $k$.

  n    $\varphi_n(2)$    $\varphi_n(3)$    $\varphi_n(4)$
  1         2                 3                 4
  2         1                 3                 6
  3         2                 8                20
  4         3                18                60
  5         6                48               204
  6         9               116               670
  7        18               312              2340
  8        30               810              8160
  9        56              2184             29120
 10        99              5880            104754
3
(4) =20 is famous beause of the geneti ode: there are
preisely20amino-aidsodedbywordsoflength3overa4-symbolalphabet
A,C,G,U.
For any sequence $a = (a_n)_{n \ge 1}$, let
\[ p_n = \sum_{d \mid n} d\, a_d^{n/d}. \]
The pair $(a, p)$ is called a Witt vector (see [29] or [36]). The numbers $p_n$ are the ghost components. In terms of generating series, one has
\[ \exp \sum_{n \ge 1} \frac{p_n}{n} z^n = \prod_{n \ge 1} (1 - a_n z^n)^{-1}. \]