PERRIN
Abstract. This paper presents a survey on length distributions of regular languages. The accent is on problems in coding theory and the relation with symbolic dynamics.
Keywords. Regular sequences, finite automata, prefix codes, bifix codes, symbolic dynamics, zeta functions.
1. Introduction. The notion of a length distribution for a formal language is a simple one: it is the generating series $u(z)=\sum_{n\ge 0}u_nz^n$ of the number of words of each length. This series carries important information concerning a formal language since it measures in a sense the size of the language. It is moreover appropriate in the case of coding. In fact, a length-preserving encoding defines a one-to-one correspondence between words. The two sets of words in such a correspondence will have the same length distribution.
It is a classical result that the length distribution of a formal language carries also some information concerning the structure of the language, in the sense that algebraic operations on series correspond to operations on formal languages. Thus, as we shall see below in more detail, length distributions which are rational series correspond to regular languages.
This correspondence between operations on series and on sets is the basis of the method of generating series in enumerative combinatorics. Numerous examples of applications can be found in the book of Graham, Knuth and Patashnik [23].
We present here a survey on length distributions of formal languages with emphasis on the problems related to coding and finite automata. We insist on the following general problem: given a family F of sets of words, characterize the length distributions of the elements of F. For example, the length distributions of prefix codes on k symbols are the sequences satisfying Kraft's inequality
$$\sum_{n\ge 0}u_nk^{-n}\le 1,$$
i.e. $u(1/k)\le 1$.
Our emphasis is on the property of regularity, which is the definability by a finite automaton. This places our work at the intersection between coding theory and automata theory. For example, one of the main results presented here is a finite-state version of Kraft-McMillan's theorem characterizing the length distributions of regular prefix codes.

Institut d'Électronique et d'Informatique Gaspard-Monge, Université de Marne la Vallée, 5, Boulevard Descartes, Champs-sur-Marne, 77454 Marne la Vallée Cedex 2, France. http://www-igm.univ-mlv.fr/

We also make connexions with the field of symbolic dynamics. This is natural since the basic notion of symbolic dynamics, namely the conjugacy of subshifts, is based on a one-to-one correspondence between paths in finite graphs, giving rise to an invariance of the length distributions.
Our paper is organized as follows. The first sections (Sections 2, 3) present the basic notions on automata and formal series used in the paper. In Section 4, we present the finite-state version of the Kraft-McMillan theorem mentioned above. The particular case of bifix codes is studied in Section 5. The last section (Section 6) presents several interconnected notions concerning subshifts of finite type and circular codes.
2. Length distributions. We consider the set $A^*$ of all words on a given alphabet $A$. A subset of $A^*$ is often called a formal language. For sets $X,Y\subset A^*$, we denote
$$X+Y=X\cup Y,$$
$$XY=\{xy\mid x\in X,\ y\in Y\},$$
$$X^*=\{x_1x_2\cdots x_n\mid x_i\in X,\ n\ge 0\}.$$
We say that the pair $(X,Y)$ is unambiguous if for each $z\in XY$ there is at most one pair $(x,y)\in X\times Y$ such that $z=xy$.
We say that a set of nonempty words $X$ is a code if for each $x\in X^*$ there is at most one sequence $(x_1,x_2,\ldots,x_n)$ with $x_i\in X$ such that $x=x_1x_2\cdots x_n$ (one also says that $X$ is uniquely decipherable). A particular case of a code is a prefix code. It is a set of words $X$ such that no element of $X$ is a prefix of another one. It is easy to see that such a set is either reduced to the empty word or does not contain the empty word and is then a code.
The length distribution of a set of words $X$ is the sequence $u_X=(u_n)_{n\ge 0}$ with
$$u_n=\mathrm{Card}(X\cap A^n).$$
We denote by $u_X$ the formal series
$$u_X(z)=\sum_{n\ge 0}u_nz^n,$$
which is the ordinary generating series of the sequence $u_X$.
For example, the length distribution of $X=A^*$ is $u(z)=\frac{1}{1-kz}$ where $k=\mathrm{Card}(A)$.
The entropy of a formal language $X$ is
$$h(X)=\log\frac{1}{\rho},$$
where $\rho$ is the radius of convergence of the series $u_X(z)$. It is well defined provided $X$ is infinite and thus $\rho$ is finite. If the alphabet $A$ has $k$ elements, we have $h(X)\le\log k$.
The following result relates the basic operations on sets with operations on series.
Proposition 2.1. The following properties hold for any subsets $X,Y$ of $A^*$.
(i) If $X\cap Y=\emptyset$, then $u_{X+Y}=u_X+u_Y$.
(ii) If the pair $(X,Y)$ is unambiguous, then $u_{XY}=u_Xu_Y$.
(iii) If $X$ is a code, then $u_{X^*}=1/(1-u_X)$.
Proof. The first two formulae are clear. If $X$ is a code, every word in $X^*$ has a unique decomposition as a product of words in $X$. This implies that
$$u_{X^n}=(u_X)^n$$
and thus,
$$u_{X^*}=1+u_X+\cdots+u_X^n+\cdots=1/(1-u_X).$$
Example 1. The set $X=\{b,ab\}$ is a prefix code. The series $u_{X^*}$ is
$$u_{X^*}(z)=\frac{1}{1-z-z^2}.$$
Let $(F_n)_{n\ge 0}$ be the sequence of Fibonacci numbers defined by $F_0=0$, $F_1=1$, and $F_{n+2}=F_{n+1}+F_n$. It follows from the recurrence relation that
$$\frac{z}{1-z-z^2}=\sum_{n\ge 0}F_nz^n.$$
Consequently, $u_{X^*}(z)=\sum_{n\ge 0}F_{n+1}z^n$. It can also be proved by a combinatorial argument that the number of words of length $n$ in $X^*$ is $F_{n+1}$.
There are several variants of the generating series considered above. One may first define
$$p_X(z)=\sum_{n\ge 0}\frac{u_n}{k^n}z^n,$$
where $k=\mathrm{Card}(A)$. The coefficient of $z^n$ in $p_X(z)$ is the probability for a word of length $n$ to be in the set $X$. The relation between $u_X$ and $p_X$ is simple since $p_X(z)=u_X(z/k)$. Another variant of the generating series is the exponential generating series of the sequence $(u_n)_{n\ge 0}$ defined as
$$e(z)=\sum_{n\ge 0}\frac{u_n}{n!}z^n.$$
We will also use the zeta function of a sequence $(u_n)_{n\ge 1}$ defined as
$$\zeta(z)=\exp\sum_{n\ge 1}\frac{u_n}{n}z^n.$$
3. Regular distributions. In this section, we describe the connection between the notions of a regular language and a rational series. We prove the classical result (Theorem 3.4) characterizing the regular sequences as the length distributions of regular languages. We mention finally the possible extension to more general classes of formal languages, such as the context-free languages. These results are well-known in the theory of automata and we include them here for the sake of the reader's convenience.
A word on the terminology used here. We use constantly the term regular where a richer terminology is often used. In particular, what we call here a regular sequence is, in Eilenberg's terminology, an N-rational sequence (see [20], [33] or [16]). A regular set is also called a rational or recognizable set.
3.1. Regular sequences. A sequence $u=(u_n)_{n\ge 0}$ of integers is regular if there exists a finite graph $G$ and two sets of vertices $I,T$ of $G$ such that for all $n\ge 0$,
$$u_n=\mathrm{Card}(P(n,I,T)),$$
where $P(n,I,T)$ is the set of paths of length $n$ from a vertex of $I$ to a vertex of $T$. The graph $G$ is one in which multiple edges are allowed (sometimes called a multigraph). We say that the graph $G$ recognizes the sequence $u$.
An equivalent definition of regular sequences is obtained by considering nonnegative matrices.
Proposition 3.1. A sequence $u=(u_n)_{n\ge 0}$ of integers is regular iff there exist a nonnegative matrix $M\in\mathbb{N}^{k\times k}$ and two vectors $l,c\in\mathbb{N}^k$ such that
$$u_n=lM^nc,$$
where $l$ is considered as a row vector and $c$ as a column vector.
Proof. Let $u$ be a regular sequence defined by a graph $G$ on the set $\{1,\ldots,k\}$ of vertices. We choose $M$ to be the adjacency matrix of $G$, i.e. for each pair $v,w$ of vertices, $M_{v,w}$ is the number of edges from $v$ to $w$. Let $l$ be the row vector defined by $l_v=1$ if $v\in I$ and $0$ otherwise. Let $c$ be the column vector defined by $c_v=1$ if $v\in T$ and $0$ otherwise. The number of paths of length $n$ from a vertex of $I$ to a vertex of $T$ is for each $n\ge 1$ equal to $lM^nc$.
Conversely, let $G$ be the graph with adjacency matrix $M$. Since the family of regular sequences is closed under addition, we may suppose that the vectors $l,c$ have $0,1$ coefficients. We can then consider $l,c$ as the characteristic vectors of sets $I,T$ of vertices. It is then obvious that the graph thus constructed recognizes $u$.
Example 2. Let $G$ be the graph of Figure 1. The number of paths of length $n$ from vertex $i=1$ to vertex $t=2$ is the Fibonacci number $F_n$.
Fig. 1. The Fibonacci graph.
Accordingly, let $M$ be the matrix
$$M=\begin{pmatrix}1&1\\1&0\end{pmatrix}.$$
The same sequence is defined by the equation
$$F_n=\begin{pmatrix}1&0\end{pmatrix}M^n\begin{pmatrix}0\\1\end{pmatrix}.$$
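The matrix form of Example 2 is easy to evaluate numerically. The sketch below, in plain Python with hypothetical helper names `mat_mul` and `path_counts`, computes the row vector times successive powers of M times the column vector, for the row vector (1, 0) and the column vector (0, 1) of the example.

```python
def mat_mul(a, b):
    """Product of two integer matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def path_counts(row, m, col, max_len):
    """Return [row * M^n * col for n = 0..max_len]."""
    current = [list(row)]                 # 1 x k matrix holding row * M^n
    column = [[x] for x in col]           # k x 1 matrix
    counts = []
    for _ in range(max_len + 1):
        counts.append(mat_mul(current, column)[0][0])
        current = mat_mul(current, m)     # advance to row * M^(n+1)
    return counts

M = [[1, 1],
     [1, 0]]
fib_counts = path_counts([1, 0], M, [0, 1], 10)
print(fib_counts)   # F_0, F_1, ..., F_10
```

Keeping only the running row vector avoids forming the full powers of M, which is enough when a single pair of initial and terminal sets is of interest.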
We say that a sequence $u$ of integers is rational if $u(z)=p(z)/q(z)$ for some polynomials $p(z),q(z)$ with integer coefficients. The following result is classical.
Theorem 3.1. Any regular sequence $u$ of nonnegative integers is rational.
Proof. Let $(l,M,c)$ be such that $u_n=lM^nc$. We have
$$u(z)=\sum_{n\ge 0}lM^nc\,z^n=l\Bigl(\sum_{n\ge 0}(Mz)^n\Bigr)c=l(I-Mz)^{-1}c.$$
The result follows since the coefficients of $(I-Mz)^{-1}$ are rational fractions.
Example 3. The generating function of the Fibonacci sequence is
$$F(z)=\frac{z}{1-z-z^2}.$$
The converse of Theorem 3.1 is not true. We have actually the following result, due to Jean Berstel (see [20] or [16]).
Theorem 3.2. For any regular sequence $u$, every pole of minimal modulus has the form $\rho\varepsilon$ where $\rho$ is the radius of convergence of $u$ and $\varepsilon^p=1$ for some integer $p\ge 1$. In particular, the radius of convergence is a pole.
The following example (from [20], Example 6.1, Chapter VIII) shows the existence of rational series with non-negative integer coefficients which are not regular.
Example 4. Let $0<\theta<\pi/2$ be such that $\cos\theta=a/c$ with $0<a<c$ and $c\ne 2a$. The sequence
$$u_n=c^{2n}\cos^2(n\theta)$$
is rational but not regular (poles: $c^{-2}$, $c^{-2}e^{2i\theta}$, $c^{-2}e^{-2i\theta}$).
A sequence $u$ is a merge of sequences $u^{(0)},\ldots,u^{(p-1)}$ if for $n\ge 0$, $0\le i<p$,
$$u_{pn+i}=u^{(i)}_n.$$
We say that a pole of a rational series is dominating if it is strictly less than the modulus of all other ones. The following result is due to Soittola (see [33]).
Theorem 3.3. A sequence of non-negative integers is regular iff it is a merge of rational sequences with a dominating pole.
Example 5. The sequence
$$1,1,2,1,4,2,8,3,16,5,\ldots$$
is the merge of the sequence of powers of 2 and the Fibonacci sequence.
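The merge operation is just an interleaving of the component sequences. A minimal Python sketch, using a hypothetical helper `merge` on finite prefixes of the sequences, reproduces the first terms of Example 5 (the Fibonacci component is taken starting from F_1, which is an assumption read off from the displayed terms).

```python
def merge(*sequences):
    """Interleave p finite sequences so that result[p*n + i] == sequences[i][n]."""
    p = len(sequences)
    length = min(len(s) for s in sequences)   # truncate to the shortest prefix
    return [sequences[i][n] for n in range(length) for i in range(p)]

powers_of_two = [2 ** n for n in range(5)]    # 1, 2, 4, 8, 16
fibonacci_terms = [1, 1, 2, 3, 5]             # F_1, ..., F_5
merged = merge(powers_of_two, fibonacci_terms)
print(merged)
```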
A third equivalent definition of regular sequences is possible. One can indeed show that a series $u(z)$ is regular iff it can be obtained by a finite number of operations of sum, product and star with
$$u^*(z)=\frac{1}{1-u(z)},$$
starting from polynomials with nonnegative integer coefficients. An expression of this form is usually called a regular expression.
Example 6. The sequence $(0,1,3,8,21,\ldots)$ formed of the Fibonacci numbers of even index is regular. Indeed we have
$$F_{2n}=lM^{2n}c$$
with the triple $(l,M,c)$ of Example 2. We have
$$M^2=\begin{pmatrix}2&1\\1&1\end{pmatrix},$$
and thus $F_{2n}$ is the number of paths of length $n$ from 1 to 2 in the graph of Figure 2. The series $s(z)=\sum_{n\ge 0}F_{2n}z^n$ can accordingly be written
$$s(z)=z\,(2z+z^2z^*)^*\,z^*=\frac{z}{(1-z)^2-z}.$$
Fig. 2. One every other Fibonacci number.
3.2. Finite automata. We present here a brief introduction to the concepts used in automata theory. For a general reference, see [31] or [20].
An automaton over the alphabet $A$ is composed of a set $Q$ of states, a set $E\subset Q\times A\times Q$ of edges or transitions and two sets $I,T\subset Q$ of initial and terminal states.
A path in the automaton $\mathcal{A}$ is a sequence
$$(p_1,a_1,p_2),(p_2,a_2,p_3),\ldots,(p_n,a_n,p_{n+1})$$
of consecutive edges. Its label is the word $x=a_1a_2\cdots a_n$. A path is successful if it starts in an initial state and ends in a terminal state. The set recognized by the automaton is the set of labels of its successful paths.
An automaton is deterministic if, for each state $p$ and each letter $a$, there is at most one edge which starts at $p$ and is labeled by $a$. The term right resolving is also used.
Fig. 3. Golden mean automaton.
Example 7. Let $\mathcal{A}$ be the automaton given in Figure 3 with 1 as unique initial and terminal state. It recognizes the set $X^*$ where $X$ is the prefix code $X=\{b,ab\}$.
A set of words $X$ over $A$ is regular if it can be recognized by a finite automaton.
It is a classical result that a set of words is regular iff it can be obtained by a finite number of operations union, product and star, starting from the finite sets.
The following result is also classical.
Proposition 3.2. Every regular set can be recognized by a finite deterministic automaton having a unique initial state.
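Proof. Let $\mathcal{A}=(Q,E,I,T)$ be a finite automaton over $A$ recognizing a set $X$. Let $\mathcal{B}=(R,F,\{I\},\mathcal{T})$ be the automaton defined as follows. Its states are the subsets
$$Q(u)=\{q\in Q\mid \text{there is a path labeled } u \text{ from a state of } I \text{ to } q\}$$
for all $u$ in $A^*$. Since $Q$ is finite, there is a finite number of subsets $Q(u)$. The edges of $\mathcal{B}$ are all triples
$$(Q(u),a,Q(ua)).$$
The set of terminal states is
$$\mathcal{T}=\{U\in R\mid U\cap T\ne\emptyset\}.$$
It is easy to verify that $\mathcal{B}$ is deterministic and recognizes $X$.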
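The construction in this proof is the usual subset construction. The Python sketch below is one possible rendering under stated assumptions: edges are given as a set of triples, only the nonempty subsets that are actually reachable are built (so the result is a partial deterministic automaton), and the names `determinize` and `accepts` as well as the sample automaton are hypothetical.

```python
def determinize(edges, initial, terminal, alphabet):
    """Accessible subset construction.

    edges is a set of triples (p, a, q). Returns (states, transitions, start,
    finals), where each state is a frozenset of states of the input automaton.
    """
    start = frozenset(initial)
    states, todo, transitions = {start}, [start], {}
    while todo:
        subset = todo.pop()
        for a in alphabet:
            target = frozenset(q for (p, b, q) in edges if p in subset and b == a)
            if not target:
                continue              # no edge labeled a: leave it undefined
            transitions[(subset, a)] = target
            if target not in states:
                states.add(target)
                todo.append(target)
    finals = {s for s in states if s & set(terminal)}
    return states, transitions, start, finals

def accepts(dfa, word):
    """Follow the unique path labeled `word`, if any, from the start state."""
    _, transitions, state, finals = dfa
    for a in word:
        state = transitions.get((state, a))
        if state is None:
            return False
    return state in finals

# Nondeterministic automaton recognizing the words on {a, b} that end with ab.
edges = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)}
dfa = determinize(edges, initial={0}, terminal={2}, alphabet="ab")
print(len(dfa[0]), accepts(dfa, "aab"), accepts(dfa, "ba"))
```

Since the result is deterministic, the number of its successful paths of length n equals the number of recognized words of length n, which is the property used in the proof of Theorem 3.4.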
Theorem 3.4. The length distributions of regular sets are the regular sequences.
Proof. Let $X$ be a regular set. By Proposition 3.2, it can be recognized by a deterministic automaton $\mathcal{A}$. Since $\mathcal{A}$ is deterministic, there is at most one path with given label, origin and end. Thus the number of paths of length $n$ from the initial state to a terminal state is equal to the number $u_n$ of words of $X$ of length $n$.
Conversely, let $u$ be a regular sequence enumerating the paths in a graph $G$ from $I$ to $T$. We consider the graph $G$ as an automaton with all edges with distinct labels. Let $X$ be the set of labels of paths from $I$ to $T$. The sequence $u$ is the length distribution of the set $X$.
Example 8. If $X=a^*b$, then
$$u_X(z)=\frac{z}{1-z}.$$
3.3. Beyond regular sequences. There are several natural classes of series beyond the rational ones. The algebraic series are those satisfying an algebraic equation. More generally, the hypergeometric series are those such that the quotient of two successive terms is given by a rational fraction (see [23]).
The class of algebraic series is linked with the class of context-free sets (see [21]). A typical example of a context-free set is the set of words on the binary alphabet $\{a,b\}$ having as many $a$'s as $b$'s. We compute below its length distribution, which is an algebraic series.
Example 9. The set of words on $A=\{a,b\}$ having an equal number of occurrences of $a$ and $b$ is a submonoid of $A^*$ generated by a prefix code $D$. Since any word of $D^*$ of length $2n$ is obtained by choosing $n$ positions among $2n$, we have
$$u_{D^*}(z)=\sum_{n\ge 0}\binom{2n}{n}z^{2n}.$$
By a simple application of the binomial formula, we obtain
$$u_{D^*}(z)=(1-4z^2)^{-\frac{1}{2}}.$$
This follows indeed, using the simple identity
$$\binom{-\frac{1}{2}}{n}=\frac{1}{(-4)^n}\binom{2n}{n}.$$
We have $u_D(z)=1-1/u_{D^*}(z)$ and thus
$$u_D(z)=1-\sqrt{1-4z^2}.$$
Thus $u_D(z)$ is an algebraic series, solution of the equation
$$f^2-2f+4z^2=0.$$
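The relation between the two series of Example 9 can be checked on the first coefficients. The Python sketch below counts by brute force the words of D (balanced words with no balanced proper nonempty prefix, which is our reading of "the prefix code generating the submonoid"), expands 1/(1 - u_D) as a truncated power series, and compares the result with the central binomial coefficients. The helper names are hypothetical.

```python
from itertools import product
from math import comb

def is_first_return(word):
    """True if `word` is balanced and no proper nonempty prefix is balanced."""
    height = 0
    for i, letter in enumerate(word):
        height += 1 if letter == "a" else -1
        if height == 0:
            return i == len(word) - 1
    return False   # empty or unbalanced word

def series_star(u, order):
    """Coefficients of 1/(1 - u(z)) up to z^order, assuming u[0] == 0."""
    star = [1] + [0] * order
    for n in range(1, order + 1):
        star[n] = sum(u[i] * star[n - i] for i in range(1, n + 1))
    return star

ORDER = 10
u_D = [sum(is_first_return(w) for w in product("ab", repeat=n))
       for n in range(ORDER + 1)]
u_D_star = series_star(u_D, ORDER)
central = [comb(n, n // 2) if n % 2 == 0 else 0 for n in range(ORDER + 1)]
print(u_D)        # coefficients of 1 - sqrt(1 - 4 z^2)
print(u_D_star)   # coefficients of (1 - 4 z^2)^(-1/2)
```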
4. A finite-state version of the Kraft-McMillan theorem. Let $X$ be a prefix code on an alphabet with $k$ symbols. It is classical that its length distribution $u=(u_n)_{n\ge 1}$ satisfies Kraft's inequality
$$\sum_{n\ge 1}u_nk^{-n}\le 1,$$
or equivalently $u(1/k)\le 1$. The number $u(1/k)$ can actually be interpreted as the probability that a long enough word has a prefix in $X$.
There is also a connexion with the notion of entropy. Actually, if $X$ is a prefix code, the entropy of $X^*$ is equal to $\log(1/\rho)$ where $\rho$ is the solution of the equation $u_X(\rho)=1$. Thus Kraft's inequality expresses the fact that $h(X^*)\le\log k$.
Conversely, Kraft-McMillan's theorem states that for any such sequence $u=(u_n)_{n\ge 1}$, there exists a prefix code $X$ on a $k$-symbol alphabet such that $u=u_X$.
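Let us briefly describe the proof. We suppose by induction to have already built a prefix code $X$ formed of words of length at most $n-1$ with length distribution $(u_1,u_2,\ldots,u_{n-1})$ on the alphabet $A_k=\{0,1,\ldots,k-1\}$. We have
$$\sum_{i=1}^{n}u_ik^{-i}\le 1,$$
and thus
$$\sum_{i=1}^{n}u_ik^{n-i}\le k^n.$$
Since exactly $\sum_{i=1}^{n-1}u_ik^{n-i}$ words of length $n$ have a prefix in $X$, this allows us to choose $u_n$ words on the alphabet $A_k$ of length $n$ without a prefix in $X$. For the sake of a complete description of the construction, one has to specify which words are chosen among the words of length $n$ which do not already have a prefix in $X$. A possible policy is to choose the earlier ones in the alphabetic order.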
The equality case in Kraft's inequality corresponds to a particular class of prefix codes often called complete. A prefix code $X$ on the alphabet $A$ is complete if any word on $A$ has either a prefix in $X$ or is a prefix of a word of $X$.
The notion of a prefix code is related to the notion of a tree. A prefix code on $k$ symbols corresponds to a $k$-ary tree. The length distribution of the prefix code is the enumerative sequence of the leaves of the tree. We call it the length distribution of the tree. Usually, the interest is focused on finite trees, as in the Huffman algorithm for example.
We are interested here in the case of infinite trees and, more especially, of regular trees arising from prefix codes which are regular, in the sense defined above. The notion of a regular tree can also be defined directly as an infinite tree with only a finite number of non-isomorphic subtrees.
By Theorem 3.4, if $X$ is regular, then the sequence $u_X$ is also regular. The following result shows that conversely the conjunction of the two conditions (of being regular and of satisfying Kraft's inequality) is sufficient to ensure the existence of a regular prefix code on a $k$-symbol alphabet.
Theorem 4.1. A sequence $u$ of integers is the length distribution of a regular prefix code on $k$ symbols iff
(i) it is regular.
(ii) it satisfies Kraft's inequality $u(1/k)\le 1$.
The essence of this result is a constructive method allowing one to build the regular prefix code $X$ given the sequence $u$.
Two simple methods come to mind at first glance. The first one is to apply directly the proof of Kraft's theorem. The following example shows that the result need not be a regular set, although the sequence $u$ is itself regular.
Example 10. Let $u(z)=z^2/(1-2z^2)$. Since $u(1/2)=1/2$, we may apply the Kraft construction to build a binary tree with length distribution $u$. The result is the set
$$X=\bigcup_{n\ge 0}01^n0\{0,1\}^n$$
which is not regular.
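The greedy construction from the proof of Kraft's theorem, with the policy of choosing the earliest words in alphabetic order, can be sketched in a few lines of Python. The function name `kraft_prefix_code` is hypothetical, the alphabet is taken to be the digits 0, ..., k-1 with k at most 10, and the sequence is given as a finite list with a leading 0 for length 0. Applied to the first terms of the sequence of Example 10, it returns the words of the stated set up to length 6.

```python
from itertools import product

def kraft_prefix_code(u, k):
    """Greedy Kraft construction: pick u[n] words of length n, earliest first.

    u is a list with u[0] == 0; words are strings over the digits 0..k-1.
    Raises ValueError when not enough words of some length remain.
    """
    code = []
    for n in range(1, len(u)):
        chosen = 0
        # product() enumerates the words of length n in alphabetic order
        for letters in product(range(k), repeat=n):
            if chosen == u[n]:
                break
            word = "".join(map(str, letters))
            if not any(word.startswith(x) for x in code):
                code.append(word)
                chosen += 1
        if chosen < u[n]:
            raise ValueError("Kraft's inequality is violated at length %d" % n)
    return code

# u(z) = z^2 / (1 - 2 z^2): one word of length 2, two of length 4, four of length 6
code = kraft_prefix_code([0, 0, 1, 0, 2, 0, 4], 2)
print(code)
```

The exhaustive scan over all words of length n is exponential and serves only to illustrate the policy on small cases.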
The second method takes into account the hypothesis that the sequence is regular. It will fail in its naive version but the solution is a refinement of this idea. Let $G$ be a graph such that $u_n$ is the number of paths of length $n$ from $I$ to $T$. We can normalize the graph $G$ to obtain a graph such that $I=\{i\}$, $T=\{t\}$ and that no edge goes out of $t$. We label each edge in such a way that edges with a common start have different labels. The set recognized by the automaton thus constructed is a prefix code with length distribution $u$.
The trouble is that the number of symbols used may well be larger than $k$, as shown by the following example.
Example 11. Let $u$ be the regular sequence given by the graph of Figure 4 on the left with $i=1$ and $t=4$. We have also $u(z)=3z^2/(1-z^2)$. Furthermore $u(1/2)=1$ and thus $u$ satisfies Kraft's equality. However there are four edges going out of vertex 2 and the method described above fails to build a binary prefix code. A solution on $A=\{a,b\}$ is the regular prefix code
$$X=(aa)^*(ab+ba+bb).$$
The corresponding automaton is given on Figure 4 on the right.
Fig. 4. Graphs recognizing $u(z)=3z^2/(1-z^2)$.
The proof of Theorem 4.1 consists in building a new graph with all vertices of outdegree at most $k$. It relies on a transformation called the multiset construction described in [8]. The proof uses the following combinatorial lemma also used in symbolic dynamics by Adler and Marcus [28], [2], and quoted in [4] as a nice variant of the pigeon-hole principle.
Lemma 4.1. Let $k_1,k_2,\ldots,k_n$ be positive integers. Then there is a nonempty subset $S\subset\{1,2,\ldots,n\}$ such that $\sum_{s\in S}k_s$ is divisible by $n$.
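One standard way to obtain such a subset, which is not necessarily the argument used in the cited references, is to look at the n + 1 prefix sums 0, k_1, k_1 + k_2, ... modulo n: two of them must coincide, and the block of indices between them has a sum divisible by n. A Python sketch with a hypothetical function name:

```python
def divisible_subset(ks):
    """Return 1-based indices of a nonempty block of ks whose sum is divisible
    by n = len(ks), found by comparing prefix sums modulo n."""
    n = len(ks)
    seen = {0: 0}      # residue of a prefix sum -> length of that prefix
    total = 0
    for j, value in enumerate(ks, start=1):
        total = (total + value) % n
        if total in seen:
            # prefixes of lengths seen[total] and j agree modulo n
            return list(range(seen[total] + 1, j + 1))
        seen[total] = j
    raise AssertionError("unreachable by the pigeon-hole principle")

ks = [3, 4, 5, 2, 6]
subset = divisible_subset(ks)
print(subset, sum(ks[i - 1] for i in subset))
```

The subset found this way always consists of consecutive indices, which is more than the lemma asks for.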
The graph obtained is shown in an example below.
Example 12. Let
$$u(z)=\frac{z^2}{1-z^2}+\frac{z^2}{1-5z^3}.\qquad(4.1)$$
We have $u(1/2)=1$. A regular binary tree with length distribution $u$ is given in Figure 5 (note that, by convention, a vertex labeled $v$ has its sons represented only once on the figure. Thus, for example the vertex labeled 1 on the right has the same sons as the root. The leaves of the tree are indicated by a black box).
To check that the length distribution is equal to $u$, one may compute from the graph the following regular expression of $u$ and check by an elementary computation (possibly with the help of a symbolic computation system) that it is equal to $u$.
$$u(z)=(z^6)^*\bigl(2z^2+z^4+2z^5+z^6+(z^2+3z^5)(5z^3)^*\,3z^3\bigr)$$
Fig. 5. Regular binary tree with length distribution $u$.
(note for a reader unfamiliar with regular expressions: the first factor $(z^6)^*$ corresponds to the vertex labeled 1 at level 6 of the tree. The term $2z^2+z^4+2z^5+z^6$ corresponds to the leaves reached by a path which does not use a vertex labeled 5. The factor $(z^2+3z^5)(5z^3)^*$ corresponds to the paths from the root to a vertex labeled 5. Finally, the factor $3z^3$ corresponds to the direct paths from 5 to a leaf.)
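The elementary computation mentioned in Example 12 can be delegated to a few lines of Python working on truncated power series with integer coefficients. The sketch assumes that the regular expression, as assembled from the factors listed in the note, is (z^6)^* (2z^2 + z^4 + 2z^5 + z^6 + (z^2 + 3z^5)(5z^3)^* 3z^3). The helper names `poly`, `add`, `mul`, `star` and the truncation order are illustrative choices.

```python
from fractions import Fraction

ORDER = 40   # compare the coefficients of z^0 .. z^ORDER

def poly(coeffs):
    """Truncated series from a dict {exponent: coefficient}."""
    s = [0] * (ORDER + 1)
    for exponent, coefficient in coeffs.items():
        s[exponent] = coefficient
    return s

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def mul(a, b):
    out = [0] * (ORDER + 1)
    for i, x in enumerate(a):
        if x:
            for j in range(ORDER + 1 - i):
                out[i + j] += x * b[j]
    return out

def star(u):
    """Coefficients of 1/(1 - u(z)), assuming u has no constant term."""
    s = [1] + [0] * ORDER
    for n in range(1, ORDER + 1):
        s[n] = sum(u[i] * s[n - i] for i in range(1, n + 1))
    return s

z2 = poly({2: 1})
# u(z) = z^2/(1 - z^2) + z^2/(1 - 5 z^3), Formula (4.1)
u_series = add(mul(z2, star(poly({2: 1}))), mul(z2, star(poly({3: 5}))))
# regular expression read from the tree
inner = add(poly({2: 2, 4: 1, 5: 2, 6: 1}),
            mul(mul(poly({2: 1, 5: 3}), star(poly({3: 5}))), poly({3: 3})))
tree_series = mul(star(poly({6: 1})), inner)

# exact value of u(1/2)
kraft_sum = Fraction(1, 4) / (1 - Fraction(1, 4)) + Fraction(1, 4) / (1 - Fraction(5, 8))
print(u_series == tree_series, kraft_sum)
```

Agreement of finitely many coefficients is only a sanity check; since both sides are rational series with denominators of small degree, clearing denominators by hand settles the identity.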
This example (suggested to us by Christophe Reutenauer) shows an interesting feature of this problem. In fact, from the point of view of regular expressions, the difficult operation in this problem is the sum. It would be a simple matter to build a rational tree for each term of the sum in the expression (4.1) (see Example 11). The difficulty would then be to merge these two trees to obtain one corresponding to the sum.
A curious consequence of Theorem 4.1 is the following property of regular sequences.
Corollary 4.1. Let $k\ge 2$ be an integer and let $u$ be a regular sequence such that $u(1/k)\le 1$ and $u(0)=0$. Then there exist $k$ regular sequences $u_1,\ldots,u_k$ such that $u_i(1/k)\le 1$ and
$$u(z)=\sum_{i=1}^{k}zu_i(z).$$
Proof. It is a simple consequence of Theorem 4.1. Indeed, if $X$ is a regular prefix code on the $k$ element alphabet $A$, then $X=\sum_{a\in A}aX_a$ where each $X_a$ is a regular prefix code on the alphabet $A$.
5. Bifix codes. We investigate here the length distributions of a particular class of prefix codes, called bifix. Several other classes of prefix codes could give rise to a similar study (for a description of these classes, see [19]).
The definition of a suffix code is symmetric to the definition of a prefix code. It is a set of words $X$ such that no element of $X$ is a suffix of another one. The notion of a complete suffix code is also symmetric. A bifix code is a set $X$ of words which is both a prefix and a suffix code.
Any set of words of fixed length is obviously a bifix code but there are more complicated examples.
Fig. 6. The bifix code $X$.
Example 13. The set
$$X=\{aaa,aaba,aabb,ab,baa,baba,babb,bba,bbb\}$$
is a complete prefix code pictured in Figure 6. It is also a complete suffix code, as one may check by reading its words backwards.
Surprisingly, it is an open problem to characterize the length distributions of bifix codes. The following simple example shows that they are more constrained than those of prefix codes.
Example 14. The sequence $u(z)=z+2z^2$ is not realizable as the length distribution of a bifix code on a binary alphabet although $u(1/2)=1$. Indeed, one of the symbols has to be in $X$, say $a$. Then $bb$ is the only word of length 2 that can be added.
The following nice partial result is due to Ahlswede, Balkenhol and Khachatrian [3]. We state the result for a binary alphabet. It can be generalized to larger alphabets.
Theorem 5.1. For any integer sequence $u$ such that
$$u(1/2)\le 1/2,$$
there is a bifix code $X$ such that $u=u_X$.
Proof. The proof is by induction. We suppose that we have already built a bifix code $X$ formed of words of length at most $n-1$ with length distribution $(u_1,u_2,\ldots,u_{n-1})$. We have
$$\sum_{i=1}^{n}u_i2^{-i}\le 1/2,$$
and thus
$$2\sum_{i=1}^{n}u_i2^{n-i}\le 2^n.$$
Finally, we obtain
$$u_n\le 2^n-2\sum_{i=1}^{n-1}u_i2^{n-i}.$$
The expression on the right hand side is at most equal to the number of elements of the set $A^n-XA^*-A^*X$. Thus, we can choose $u_n$ words of length $n$ which do not have a prefix or a suffix in $X$. This proves the result by induction.
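The induction in this proof translates directly into a greedy procedure. The following Python sketch is an illustration under stated assumptions: the names `greedy_bifix_code` and `is_bifix` are hypothetical, the candidates of each length are scanned in alphabetic order (the proof allows any choice), and the sequence is a finite list with a leading 0 for length 0.

```python
from itertools import product

def greedy_bifix_code(u, alphabet="ab"):
    """Pick u[n] words of length n having neither a prefix nor a suffix among
    the shorter words already chosen. Raises ValueError if too few remain."""
    code = []
    for n in range(1, len(u)):
        shorter = list(code)   # all words chosen so far have length < n
        candidates = [
            w for w in ("".join(t) for t in product(alphabet, repeat=n))
            if not any(w.startswith(x) or w.endswith(x) for x in shorter)
        ]
        if len(candidates) < u[n]:
            raise ValueError("cannot place %d words of length %d" % (u[n], n))
        code.extend(candidates[:u[n]])
    return code

def is_bifix(code):
    """True if no word is a proper prefix or suffix of another word."""
    return not any(
        x != y and (y.startswith(x) or y.endswith(x)) for x in code for y in code
    )

# u(z) = z^2 + z^3 + 2 z^4, so that u(1/2) = 1/4 + 1/8 + 2/16 = 1/2
bifix = greedy_bifix_code([0, 0, 1, 1, 2])
print(bifix, is_bifix(bifix))
```

With the sequence z + 2z^2 of Example 14 the same procedure stops with an error at length 2, in agreement with the example.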
The authors of [3] formulate the interesting conjecture that Theorem 5.1 is still true if the hypothesis $u(1/2)\le 1/2$ is replaced by $u(1/2)\le 3/4$.
There are known additional conditions imposed on length distributions of bifix codes. For example, one has the following result, originally due to Schützenberger (see [14]).
Theorem 5.2. If $X$ is a finite complete bifix code on $k$ symbols, then $u_X(1/k)=1$ and $\frac{1}{k}u'_X(1/k)$ is an integer.
The number $\frac{1}{k}u'_X(1/k)$ can be interpreted as the average length of the words of $X$. Indeed
$$zu'_X(z)=\sum_{x\in X}|x|z^{|x|}.$$
Example 15. For the bifix code of Example 13, we have
$$u_X(z)=z^2+4z^3+4z^4$$
and thus
$$u'_X(z)=2z+12z^2+16z^3.$$
Hence $\frac{1}{2}u'_X(1/2)=3$. The conditions of Theorem 5.2 show directly that the sequence of Example 14 is not realizable. Indeed, it satisfies the first condition but not the second one. The conditions of Theorem 5.2 are not sufficient. Indeed, if $u(z)=z+4z^3$ we have $u(1/2)=1$ and $u'(1/2)=4$ although it is clearly impossible that $u=u_X$ for a bifix code $X$.
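The quantities of Theorem 5.2 are easy to compute exactly for a finite code. The Python sketch below does so for the code of Example 13, using rational arithmetic; the variable and function names are illustrative, and the average length is computed as the sum of |x| k^{-|x|} over the words x of the code, which equals (1/k) u'(1/k) by the displayed identity.

```python
from fractions import Fraction

X = ["aaa", "aaba", "aabb", "ab", "baa", "baba", "babb", "bba", "bbb"]
k = 2

def is_prefix_code(code):
    """True if no word of the code is a proper prefix of another one."""
    return not any(x != y and y.startswith(x) for x in code for y in code)

def is_suffix_code(code):
    """A code is suffix iff the set of its reversed words is prefix."""
    return is_prefix_code([w[::-1] for w in code])

# u_X(1/k): the sum appearing in Kraft's inequality
kraft_sum = sum(Fraction(1, k ** len(w)) for w in X)
# (1/k) u'_X(1/k) = sum over x in X of |x| k^(-|x|)
average_length = sum(Fraction(len(w), k ** len(w)) for w in X)

print(is_prefix_code(X), is_suffix_code(X), kraft_sum, average_length)
```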
6. Zeta functions, subshifts of finite type and circular codes. In this section, we present a number of results on interrelated objects which are connected with cyclic permutation of words. We begin with notions classical in symbolic dynamics (see [25] or [24] for a general reference; see [13] or [22] for the link with finite automata).
6.1. Subshifts of finite type. A subshift is a set of biinfinite words on a finite alphabet $A$ which avoids a given set $F$ of forbidden words. It is a topological space as a closed subset of the space $A^{\mathbb{Z}}$ of functions from $\mathbb{Z}$ into the set $A$. The full shift on $A$ is the set of all biinfinite words on $A$. It corresponds to the case $F=\emptyset$.
A sofic subshift is the set of biinfinite labels of paths in a finite automaton. A sofic subshift is called irreducible if the automaton can be chosen strongly connected. A subshift of finite type is the set of biinfinite words avoiding a finite set of finite words. Any subshift of finite type is sofic but the converse is not true. The edge shift of a finite graph $G$ is the set $S_G$ of biinfinite paths in $G$ (viewed as biinfinite sequences of edges). It is a subshift of finite type.
The shift is the function $\sigma$ on a subshift $S$ which maps a point $x$ to the point $y=\sigma(x)$ whose $i$th coordinate is $y_i=x_{i+1}$.
A morphism from a subshift $S$ into a subshift $T$ is a function $f:S\to T$ which is continuous and invariant under the shift. A bijective morphism is called a conjugacy. Any subshift of finite type is conjugate to some edge shift.
The entropy $h(S)$ of a subshift $S$ is the entropy of the formal language formed by the finite blocks occurring in words of $S$. It can be shown that the entropy is a topological invariant, in the sense that two conjugate subshifts have the same entropy.
While the entropy is a measure of the number of forbidden words, it is possible to study the number of minimal forbidden words. It gives rise to another invariant of subshifts [11], [12].
An integer $p$ is a period of a point $x=(a_n)_{n\in\mathbb{Z}}$ if $a_{n+p}=a_n$ for all $n\in\mathbb{Z}$. Equivalently, $p$ is a period of $x$ if $\sigma^p(x)=x$. The zeta function of a subshift $S$ is defined as the series
$$\zeta(S)=\exp\sum_{n\ge 1}\frac{p_n}{n}z^n$$
where $p_n$ is the number of words with period $n$ in $S$. It is also a topological invariant, since a point of period $n$ is mapped by a conjugacy on a point of the same period.
The following result due to Bowen and Lanford [18] is classical (see [25]).
Proposition 6.1. Let $G$ be a finite graph and let $M$ be the adjacency matrix of $G$. Then
$$\zeta(S_G)=\det(I-Mz)^{-1}.$$
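Proof. We first have for each $n\ge 1$
$$\mathrm{Tr}(M^n)=p_n$$
since the coefficient $(i,j)$ of $M^n$ is the number of paths from $i$ to $j$. Thus
$$\zeta(S_G)=\exp\sum_{n\ge 1}\frac{p_n}{n}z^n=\exp\sum_{n\ge 1}\frac{\mathrm{Tr}(M^n)}{n}z^n=\exp\mathrm{Tr}\bigl(\log(I-Mz)^{-1}\bigr)=\det(I-Mz)^{-1}$$
since, by the formula of Jacobi, $\exp\mathrm{Tr}=\det\exp$.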
Example 16. Let $S$ be the edge shift of the graph $G$ of Figure 7. We have
$$M=\begin{pmatrix}1&1&0\\0&0&1\\1&0&0\end{pmatrix}.$$
Consequently
$$\zeta(S)=\frac{1}{1-z-z^3}.$$
Fig. 7. A subshift of finite type.
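Proposition 6.1 can be tested numerically on Example 16. Taking the logarithmic derivative of the zeta function written as 1/q(z) gives that the sum of the p_n z^n equals -z q'(z)/q(z), so the numbers p_n can be read either from the traces of the powers of M or from the denominator q(z) = 1 - z - z^3. The Python sketch below compares the two; the function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square integer matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def traces(m, count):
    """Return [Tr(M^1), ..., Tr(M^count)]."""
    size = len(m)
    power = [[int(i == j) for j in range(size)] for i in range(size)]
    result = []
    for _ in range(count):
        power = mat_mul(power, m)
        result.append(sum(power[i][i] for i in range(size)))
    return result

def periodic_counts(q, count):
    """p_1..p_count from sum p_n z^n = -z q'(z)/q(z), where zeta = 1/q, q[0] == 1."""
    numerator = [-n * q[n] if n < len(q) else 0 for n in range(count + 1)]
    p = [0] * (count + 1)
    for n in range(1, count + 1):
        # coefficient of z^n in p(z) q(z) must equal numerator[n]
        p[n] = numerator[n] - sum(
            q[i] * p[n - i] for i in range(1, min(n, len(q) - 1) + 1)
        )
    return p[1:]

M = [[1, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
from_traces = traces(M, 12)
from_zeta = periodic_counts([1, -1, 0, -1], 12)   # q(z) = 1 - z - z^3
print(from_traces)
```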
Let $S$ be a subshift of finite type and let $p_n$ be the number of points with period $n$. Let $q_n$ be the number of points with least period $n$. Since $q_n$ is a multiple of $n$, we also denote $q_n=nl_n$. We have then the formula expressing the zeta function as an infinite product using the integers $l_n$ as exponents.
$$\zeta(S)=\prod_{n\ge 1}(1-z^n)^{-l_n},$$
as one may verify using $p_n=\sum_{d|n}dl_d$ and the definition of $\zeta(S)$.
A classical result, related with what follows, is the following statement, known as Krieger's embedding theorem.
Theorem 6.1. Let $S,T$ be two subshifts of finite type. There exists an injective morphism $f:S\to T$ with $f(S)\ne T$ iff
1. $h(S)<h(T)$
2. for each $n\ge 1$, $q_n(S)\le q_n(T)$ where $q_n(S)$ (resp. $q_n(T)$) is the number of points of $S$ (resp. $T$) of least period $n$.
The following result is the basis of many applications of symbolic dynamics to coding. It is due to Adler, Coppersmith and Hassner [2].
Theorem 6.2. If $S$ is an irreducible subshift of finite type such that $h(S)\ge\log k$, it is conjugate to a subshift of finite type $S_G$ where the graph $G$ has outdegree at least $k$.
The proof is based on a state-splitting algorithm using approximate eigenvectors and Lemma 4.1. This result is part of a number of constructions leading to sliding block codes used in magnetic recording (see [29], [9] or [25]). It gives at the same time the following result.
Theorem 6.3. If $S$ is a subshift of finite type such that $h(S)\le\log k$, then there is a graph $G$ of outdegree at most $k$ such that $S$ is conjugate to $S_G$.
There is a connexion between this theorem and Theorem 4.1. Let indeed $u$ be a regular sequence of integers such that $u(1/k)\le 1$. Let $G$ be a normalized graph recognizing $u$ (in the sense of Section 4). Let $\bar G$ be the graph obtained by merging the initial and terminal vertex. Then $h(S_{\bar G})\le\log k$. We can apply Theorem 6.3 to obtain a graph $H$ with outdegree at most $k$ such that $S_{\bar G}$ and $S_H$ are conjugate. This gives the conclusion of Theorem 4.1 provided the initial-terminal vertex did not split in the construction. The following examples show both cases (for details, see [6] and [7]).
Example 17. Let $G$ be the graph of Figure 4. The splitting of vertex 2 gives a graph of outdegree 2. A normalization gives the automaton on the right.
Example 18. The sequence of Example 12 is recognized by a graph $G$ such that $\bar G$ has three cycles of length 2. The solution as a binary tree has only two cycles of length 2 and thus could not be obtained by state-splitting.
6.2. Circular codes. A circular word, or necklace, is the equivalence class of a word under cyclic permutation. For a word $w$, we denote by $\bar w$ the circular word represented by $w$.
Let $X$ be a set of words and $w=x_1x_2\cdots x_n$ with $x_i\in X$. The set of cyclic permutations of the sequence $(x_1,x_2,\ldots,x_n)$ is called a factorization of the circular word $\bar w$.
A circular code is a set $X$ of words such that the factorization of circular words is unique.
Example 19. The set $X=\{a,aba\}$ is a circular code. Indeed, the
Example 20. The set $X=\{ab,ba\}$ is not a circular code. Indeed, the circular word $\bar w$ for $w=abab$ has two factorizations, namely $(ab,ab)$ and $(ba,ba)$.
The following characterization is useful (see [14]).
Proposition 6.2. A set $X$ is a circular code if and only if it is a code and for all $u,v\in A^*$,
$$uv,vu\in X^*\Rightarrow u,v\in X^*.$$
Example 21. We obtain another way to prove that the set $X=\{ab,ba\}$ is not a circular code. Indeed, otherwise we would have $a,b\in X^*$ which is contradictory.
Let $X$ be a finite code. The flower automaton of $X$, denoted $\mathcal{A}_X$, is the following automaton. The set of its states is
$$Q=\{(u,v)\in A^+\times A^+\mid uv\in X\}\cup\{(1,1)\}.$$
The transitions are of the form $(u,av)\xrightarrow{a}(ua,v)$ or $(1,1)\xrightarrow{a}(a,v)$ or $(u,a)\xrightarrow{a}(1,1)$. The unique initial and final state is $(1,1)$.
Example 22. The flower automaton of the circular code $\{a,aba\}$ is pictured in Figure 8.
Fig. 8. The flower automaton of $\{a,aba\}$.
The following result is easy to prove.
Proposition 6.3. The flower automaton $\mathcal{A}_X$ recognizes $X^*$. The code $X$ is circular iff for each word $w$, there is at most one cycle with label $w$.
We now study the length distributions of circular codes. Let $X$ be a circular code and let $u(z)=(u_n)_{n\ge 1}$ be its length distribution. For each $n\ge 1$, let $p_n$ be the number of words $w$ of length $n$ such that $\bar w$ has a factorization in words of $X$.
Proposition 6.4. The sequences $(p_n)_{n\ge 1}$ and $(u_n)_{n\ge 1}$ are related by
$$\exp\sum_{n\ge 1}\frac{p_n}{n}z^n=\frac{1}{1-u(z)}.\qquad(6.1)$$
Proof. Each $(p_n)$ depends only on the first $n$ terms of the sequence $(u_n)$. It is therefore possible to suppose that the sequence $(u_n)$ is finite, i.e. that the code $X$ is finite. Let $\mathcal{A}$ be the flower automaton of $X$. Let $S$ be the subshift of finite type associated with the graph of $\mathcal{A}$. Then $p_n$ is the number of elements of period $n$ in $S$. Indeed, each word $w$ such that $\bar w$ has a factorization is counted exactly once as the label of a cycle in $\mathcal{A}$. We have also
$$\det(I-Mz)=1-u(z).$$
Thus, the result follows from Proposition 6.1.
The explicit relation between the numbers $u_n$ and $p_n$ is the following. For each $i\ge 1$, let $u^{(i)}=(u^{(i)}_n)_{n\ge 1}$ be the length distribution of $X^i$. Equivalently, $u^{(i)}_n$ is the coefficient of degree $n$ of $u(z)^i$. Then for each $n\ge 1$
$$p_n=\sum_{i=1}^{n}\frac{n}{i}u^{(i)}_n.$$
We also have for each $n\ge 1$
$$p_n=nu_n+\sum_{i=1}^{n-1}p_iu_{n-i}.\qquad(6.2)$$
This formula can be easily deduced from Formula (6.1) by taking the logarithmic derivative of each side of the formula. It shows directly that for any sequence $(u_n)_{n\ge 1}$ of nonnegative integers, the sequence $p_n$ defined by Formula (6.1) is formed of nonnegative integers.
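Formula (6.2) gives a direct way to compute the numbers p_n, and for a small circular code the result can be compared with an exhaustive count. The Python sketch below does this for the code {a, aba} of Example 19, whose length distribution is z + z^3. It rests on the reading that the circular word of w has a factorization exactly when some cyclic permutation of w lies in X*; the function names are hypothetical.

```python
from itertools import product

def newton_p(u, count):
    """p_1..p_count from Formula (6.2); u[n] is u_n and u[0] is unused."""
    def coeff(n):
        return u[n] if n < len(u) else 0
    p = [0] * (count + 1)
    for n in range(1, count + 1):
        p[n] = n * coeff(n) + sum(p[i] * coeff(n - i) for i in range(1, n))
    return p[1:]

def in_star(word, code):
    """Return True if `word` is a concatenation of words of `code`."""
    reachable = [True] + [False] * len(word)
    for i in range(1, len(word) + 1):
        reachable[i] = any(
            i >= len(x) and reachable[i - len(x)] and word[i - len(x):i] == x
            for x in code
        )
    return reachable[len(word)]

def brute_force_p(code, alphabet, count):
    """For n = 1..count, count the words of length n with a conjugate in X*."""
    result = []
    for n in range(1, count + 1):
        total = 0
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if any(in_star(w[i:] + w[:i], code) for i in range(n)):
                total += 1
        result.append(total)
    return result

p_formula = newton_p([0, 1, 0, 1], 8)            # u(z) = z + z^3
p_counted = brute_force_p({"a", "aba"}, "ab", 8)
print(p_formula)
```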
Formula (6.2) is known as Newton's formula in the field of symmetric functions. Actually, the numbers $u_n$ can be considered, up to the sign, as elementary symmetric functions and the $p_n$ as the sums of powers (see [26]). The link between Witt vectors and symmetric functions was established in [34].
Let $p_n=\sum_{d|n}dl_d$. Then $l_n$ is the number of non-periodic circular words of length $n$ with a factorization. In terms of generating series, we have
$$\exp\sum_{n\ge 1}\frac{p_n}{n}z^n=\prod_{n\ge 1}(1-z^n)^{-l_n}.\qquad(6.3)$$
Putting together Formulae (6.1) and (6.3), we obtain
$$\frac{1}{1-u(z)}=\prod_{n\ge 1}(1-z^n)^{-l_n}.\qquad(6.4)$$
For any sequence $(u_n)_{n\ge 1}$ of nonnegative integers, the sequence $l=(l_n)_{n\ge 1}$ thus defined is formed of nonnegative integers. This can be proved either by a direct computation or by a combinatorial argument since any sequence $u$ of nonnegative integers is the length distribution of a circular code on a large enough alphabet. We denote $l=\phi(u)$ and we say that $l$ is the $\phi$-transform of the sequence $u$.
We denote by $\varphi_n(k)$ the number of non-periodic circular words of length $n$ on $k$ symbols. The numbers $\varphi_n(k)$ are called the Witt numbers. It is clear that the sequence $(\varphi_n(k))_{n\ge 1}$ is the $\phi$-transform of the sequence $(k^n)_{n\ge 1}$.
The corresponding particular case of Identity (6.4)
$$1-kz=\prod_{n\ge 1}(1-z^n)^{\varphi_n(k)}$$
is known as the cyclotomic identity.
The following arrays display a tabulation of the Witt numbers for small values of $n$ and $k$.

   n   phi_n(2)   phi_n(3)   phi_n(4)
   1        2          3          4
   2        1          3          6
   3        2          8         20
   4        3         18         60
   5        6         48        204
   6        9        116        670
   7       18        312       2340
   8       30        810       8160
   9       56       2184      29120
  10       99       5880     104754
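The table can be recomputed from the relation between the numbers of points of period n and of least period n: on the full shift on k symbols there are k^n points of period n, and k^n is the sum over the divisors d of n of d times the Witt number of index d. Solving this relation for increasing n gives the short Python sketch below, in which the function name `witt_numbers` is an illustrative choice.

```python
def witt_numbers(k, count):
    """Return the Witt numbers of index 1..count for k symbols, using
    k^n = sum over the divisors d of n of d * phi_d(k)."""
    phi = [0] * (count + 1)
    for n in range(1, count + 1):
        # contribution of the proper divisors of n, already known
        shorter = sum(d * phi[d] for d in range(1, n) if n % d == 0)
        phi[n] = (k ** n - shorter) // n
    return phi[1:]

for k in (2, 3, 4):
    print(k, witt_numbers(k, 10))
```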
The value $\varphi_3(4)=20$ is famous because of the genetic code: there are precisely 20 amino-acids coded by words of length 3 over a 4-symbol alphabet A, C, G, U.
For any sequence $a=(a_n)_{n\ge 1}$, let
$$p_n=\sum_{d|n}d\,a_d^{n/d}.$$
The pair $(a,p)$ is called a Witt vector (see [30]). The numbers $p_n$ are the ghost components. In terms of generating series, one has
$$\exp\sum_{n\ge 1}\frac{p_n}{n}z^n=\prod_{n\ge 1}(1-a_nz^n)^{-1}.$$
Theorem 6.4. Let $u=(u_n)_{n\ge 1}$ be a sequence of nonnegative integers and let $l=(l_n)_{n\ge 1}$ be the $\phi$-transform of $u$. The sequence $(u_n)_{n\ge 1}$ is the length distribution of a circular code on $k$ symbols iff for all $n\ge 1$
$$l_n\le\varphi_n(k).$$
Several complements to Theorem 6.4 appear in [5]. In particular, the relation with Kraft's inequality is studied. The equality case in Kraft's inequality is characterized in terms of the sequence of inequalities above.
There is a connexion between Theorem 6.4 and Krieger's embedding theorem (Theorem 6.1), in the sense that Theorem 6.4 gives a simple proof of Theorem 6.1 in a particular case. Actually, let us consider the particular case of a subshift of finite type called a renewal system.
A renewal system $S$ is the edge shift of a graph $G$ made up of cycles sharing exactly one vertex. Such a graph is determined by the sequence $u = (u_i)_{1 \le i \le n}$ where $u_i$ is the number of loops with length $i$. Let $T_k$ be the full shift on $k$ symbols. Suppose that the pair formed by $S$ and $T_k$ satisfies the hypotheses of Krieger's theorem. The number $q_n(S)$ of points of least period $n$ is $n l_n$, where $l = (l_n)_{n \ge 1}$ is the transform of the sequence $u$, and $q_n(T_k) = n \varphi_n(k)$. Thus, the sequence $u$ satisfies the hypotheses of Theorem 6.4. Consequently, there is a circular code $X$ such that $u_X = u$. The flower automaton of $X$ defines an embedding of $S_G$ into the full shift on $k$ symbols. This gives an alternative proof of Krieger's theorem in this case.
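The counts of points of least period can be computed directly on a small renewal system. The following is a minimal sketch, assuming that the graph is stored as a dense adjacency matrix $M$ and that the number of points of period $n$ of an edge shift equals $\mathrm{tr}(M^n)$ (a standard fact, not stated in this passage); the names `renewal_graph`, `mat_mul` and `least_period_counts` are hypothetical helpers.

```python
def renewal_graph(u):
    """Adjacency matrix of a graph made of u[i-1] cycles of length i through vertex 0."""
    edges = []
    size = 1
    for length, count in enumerate(u, start=1):
        for _ in range(count):
            prev = 0
            for _ in range(length - 1):
                edges.append((prev, size))  # fresh intermediate vertex
                prev = size
                size += 1
            edges.append((prev, 0))  # close the cycle at the shared vertex
    M = [[0] * size for _ in range(size)]
    for s, t in edges:
        M[s][t] += 1
    return M


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def least_period_counts(M, n_max):
    """Return [q_1, ..., q_{n_max}], where q_n is the number of points of least period n."""
    size = len(M)
    power = [[int(i == j) for j in range(size)] for i in range(size)]
    q = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        power = mat_mul(power, M)
        periodic = sum(power[i][i] for i in range(size))  # tr(M^n)
        # Remove the points whose least period is a proper divisor of n.
        q[n] = periodic - sum(q[d] for d in range(1, n) if n % d == 0)
    return q[1:]


print(least_period_counts(renewal_graph([1, 1, 1]), 4))  # -> [1, 2, 6, 8]
print(least_period_counts([[2]], 4))                     # -> [2, 2, 6, 12]
```

For $u = (1, 1, 1)$ the first line gives $q_n(S)$, and the second gives $q_n(T_2) = n \varphi_n(2)$; here $q_n(S) \le q_n(T_2)$ holds for each $n \le 4$.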
It would be interesting to have a proof of Krieger's theorem along the same lines in the general case.
To close this section, we mention the following open problem: if the sequence $u$ is regular and satisfies the inequalities
$$l_n \le \varphi_n(k) \qquad (n \ge 1),$$
where $l$ is the transform of $u$ as in Theorem 6.4, does there exist a rational circular code $X$ on $k$ symbols such that $u = u_X$?
6.3. Zeta functions. Theorem 6.1 admits the following generalization due to Reutenauer [32].
Theorem 6.5. The zeta function of a sofic subshift is regular.
We have seen already (Theorem 6.1) that the zeta function of a subshift of finite type is a rational fraction, and indeed the inverse of a polynomial. The stronger statement that it is regular follows from the following formula, which allows one to compute $\det(I - Mz)$ when $M$ is the adjacency matrix of a graph $G$ on $n$ states. One has
$$\det(I - Mz) = (1 - v_1(z)) \cdots (1 - v_n(z)),$$
where $v_i(z)$ is the length distribution of the set of first returns to state $i$
The proof that the zeta function of a sofic subshift is rational is a result of Manning and Bowen [27], [17]. For an exposition, see [25] or [10]. A generalization appears in [15].
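The determinant formula of this section can be checked on small graphs with truncated power series. The sketch below depends on an assumption, since the sentence defining $v_i(z)$ is incomplete in the text: $v_i(z)$ is taken to count first returns to state $i$ along paths that use only states $i, i+1, \ldots, n$. With this reading the formula is a telescoping product of Schur complements. The coefficients of $\det(I - Mz)$ are obtained from Newton's identities, $m\, c_m = -\sum_{j=1}^{m} \mathrm{tr}(M^j)\, c_{m-j}$. All function names are hypothetical, and states are indexed from 0 in the code.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def det_coeffs(M, N):
    """Coefficients of det(I - Mz) up to z^N, via Newton's identities on tr(M^j)."""
    size = len(M)
    power = [[int(i == j) for j in range(size)] for i in range(size)]
    traces = [0]
    for _ in range(N):
        power = mat_mul(power, M)
        traces.append(sum(power[i][i] for i in range(size)))
    c = [1] + [0] * N
    for m in range(1, N + 1):
        c[m] = -sum(traces[j] * c[m - j] for j in range(1, m + 1)) // m
    return c


def first_return_series(M, i, N):
    """Coefficients up to z^N of first returns to state i through states > i only (assumption)."""
    later = range(i + 1, len(M))
    v = [0] * (N + 1)
    v[1] = M[i][i]
    x = {j: M[i][j] for j in later}  # paths of length 1 from i to a later state
    for m in range(2, N + 1):
        v[m] = sum(x[j] * M[j][i] for j in later)
        x = {k: sum(x[j] * M[j][k] for j in later) for k in later}
    return v


def product_of_factors(M, N):
    """Coefficients of prod_i (1 - v_i(z)) truncated at degree N."""
    prod = [1] + [0] * N
    for i in range(len(M)):
        v = first_return_series(M, i, N)
        factor = [1] + [-coef for coef in v[1:]]
        prod = [sum(prod[a] * factor[m - a] for a in range(m + 1)) for m in range(N + 1)]
    return prod


golden_mean = [[1, 1], [1, 0]]
print(det_coeffs(golden_mean, 5))          # -> [1, -1, -1, 0, 0, 0]
print(product_of_factors(golden_mean, 5))  # -> [1, -1, -1, 0, 0, 0]
```

Since each $v_i(z)$ has nonnegative integer coefficients, a factorization of this shape is what connects the determinant to length distributions of sets of words.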
7. Acknowledgments. The authors are grateful for the help received during the preparation of this paper. We are indebted to Julia Abrahams for the reference to the work of Ahlswede et al. and several other recent references concerning bifix codes (see [1]). The link between length distributions of circular codes and symmetric functions was disclosed to us by Jacques Désarménien and Jean-Yves Thibon. We also thank Véronique Bruyère for improving our work.
REFERENCES
[1] J. Abrahams, Code and parse trees for lossless source encoding, in Compression and Complexity of Sequences 1997, B. C. et al., ed., IEEE Computer Society, 1998, pp. 145-171.
[2] R. L. Adler, D. Coppersmith, and M. Hassner, Algorithms for sliding block codes, IEEE Trans. Inform. Theory, IT-29 (1983), pp. 5-22.
[3] R. Ahlswede, B. Balkenhol, and L. Khachatrian, Some properties of fix-free codes, Tech. Rep. 039, University Bielefeld, 1997.
[4] M. Aigner and G. M. Ziegler, Proofs from The Book, Springer-Verlag, 1998.
[5] F. Bassino, Generating functions of circular codes, Adv. in Appl. Math., 22 (1999), pp. 1-24.
[6] F. Bassino, M.-P. Béal, and D. Perrin, Enumerative sequences of leaves in rational trees, in ICALP'97, no. 1256 in Lecture Notes in Computer Science, Springer-Verlag, 1997, pp. 76-86.
[7] F. Bassino, M.-P. Béal, and D. Perrin, Enumerative sequences of leaves and nodes in rational trees, Theoret. Comput. Sci., (1999), pp. 41-60.
[8] F. Bassino, M.-P. Béal, and D. Perrin, A finite state version of the Kraft-McMillan theorem, SIAM J. Comput., (2000). To appear.
[9] M.-P. Béal, Codage Symbolique, Masson, 1993.
[10] M.-P. Béal, Puissance extérieure d'un automate déterministe, application au calcul de la fonction zêta d'un système sofique, RAIRO Inform. Théor. Appl., 29 (1995), pp. 85-103.
[11] M.-P. Béal, F. Mignosi, and A. Restivo, Minimal forbidden words and symbolic dynamics, in STACS'96, C. Puech and R. Reischuk, eds., vol. 1046 of Lecture Notes in Computer Science, Springer-Verlag, 1996, pp. 555-566.
[12] M.-P. Béal, F. Mignosi, A. Restivo, and M. Sciortino, Forbidden words in symbolic dynamics, Tech. Rep. 99-15, I.G.M., Université de Marne-la-Vallée, 1999. To appear in Adv. in Appl. Math.
[13] M.-P. Béal and D. Perrin, Symbolic dynamics and finite automata, in Handbook of Formal Languages, G. Rozenberg and A. Salomaa, eds., vol. 2, Springer-Verlag, 1997, ch. 10.
[14] J. Berstel and D. Perrin, Theory of Codes, Academic Press, 1985.
[15] J. Berstel and C. Reutenauer, Zeta functions of formal languages, Trans. Amer. Math. Soc., 321 (1990), pp. 533-546.
[16] J. Berstel and C. Reutenauer, Rational Series and their Languages, Springer-Verlag, 1998.
[17] R. Bowen, On Axiom A diffeomorphisms, in AMS-CBMS Reg. Conf., vol. 35, Providence, 1978.
[18] R. Bowen and O. E. Lanford, Zeta functions of restrictions of the shift transformation, in Proc. Symp. Pure Math. AMS, vol. 14, 1970, pp. 43-50.
[19] Bruyère [...] International Colloquium on Automata, Languages and Programming (ICALP'96), F. Meyer and B. Monien, eds., vol. 1099, Springer-Verlag, 1996, pp. 24-47.
[20] S. Eilenberg, Automata, Languages and Machines, vol. A, Academic Press, 1974.
[21] P. Flajolet, Analytic models and ambiguity of context-free languages, Theoret. Comput. Sci., 49 (1987), pp. 283-309.
[22] G. D. Forney, B. H. Marcus, N. T. Sindhushayana, and M. Trott, A multilingual dictionary: System theory, coding theory, symbolic dynamics and automata theory, in Proceedings of Symposia in Applied Mathematics, no. 50, 1995, pp. 109-138.
[23] R. L. Graham, D. Knuth, and O. Patashnik, Concrete Mathematics, Addison Wesley, 1988.
[24] B. P. Kitchens, Symbolic Dynamics, Springer-Verlag, 1997.
[25] D. A. Lind and B. H. Marcus, An Introduction to Symbolic Dynamics and Coding, Cambridge, 1995.
[26] I. G. Macdonald, Symmetric Functions and Hall Polynomials, Oxford University Press, 1995.
[27] A. Manning, Axiom A diffeomorphisms have rational zeta functions, Bull. London Math. Soc., 3 (1971), pp. 215-220.
[28] B. H. Marcus, Factors and extensions of full shifts, Monats. Math., 88 (1979), pp. 239-247.
[29] B. H. Marcus, R. M. Roth, and P. H. Siegel, Constrained systems and coding for recording channels, in Handbook of Coding Theory, V. S. Pless and W. C. Huffman, eds., vol. II, North Holland, 1998, ch. 20, pp. 1635-1764.
[30] N. Metropolis and G.-C. Rota, Witt vectors and the algebra of necklaces, Advances in Math., 50 (1983), pp. 95-125.
[31] D. Perrin, Finite automata, in Handbook of Theoretical Computer Science, J. van Leeuwen, ed., vol. B, Elsevier, 1990, ch. 1.
[32] C. Reutenauer, N-rationality of zeta functions, Adv. in Appl. Math., 29 (1997), pp. 1-17.
[33] A. Salomaa and M. Soittola, Automata Theoretic Properties of Formal Power Series, Springer-Verlag, 1978.
[34] T. Scharf and J.-Y. Thibon, On Witt vectors and symmetric functions, Algebra Colloq., 3 (1996), pp. 231-238.