
PERRIN

Abstract. This paper presents a survey on length distributions of regular languages. The accent is on problems in coding theory and the relation with symbolic dynamics.

Keywords. Regular sequences, finite automata, prefix codes, bifix codes, symbolic dynamics, zeta functions.

1. Introduction. The notion of a length distribution for a formal language is a simple one: it is the generating series $u(z) = \sum_{n \ge 0} u_n z^n$ of the number of words of each length. This series carries important information concerning a formal language, since it measures, in a sense, the size of the language. It is moreover appropriate in the case of coding. In fact, a length-preserving encoding defines a one-to-one correspondence between words, and the two sets of words in such a correspondence have the same length distribution.

It is a classical result that the length distribution of a formal language also carries some information concerning the structure of the language, in the sense that algebraic operations on series correspond to operations on formal languages. Thus, as we shall see below in more detail, length distributions which are rational series correspond to regular languages. This correspondence between operations on series and on sets is the basis of the method of generating series in enumerative combinatorics. Numerous examples of applications can be found in the book of Graham, Knuth and Patashnik [23].

We present here a survey on length distributions of formal languages, with emphasis on the problems related to coding and finite automata. We insist on the following general problem: given a family $\mathcal{F}$ of sets of words, characterize the length distributions of the elements of $\mathcal{F}$. For example, the length distributions of prefix codes on $k$ symbols are the sequences satisfying Kraft's inequality
$$\sum_{n \ge 0} u_n k^{-n} \le 1,$$
i.e. $u(1/k) \le 1$.

Our emphasis is on the property of regularity, which is definability by a finite automaton. This places our work at the intersection between coding theory and automata theory. For example, one of the main results presented here is a finite-state version of the Kraft-McMillan theorem characterizing the length distributions of regular prefix codes.

Institut d'Électronique et d'Informatique Gaspard-Monge, Université de Marne-la-Vallée, 5, Boulevard Descartes, Champs-sur-Marne, 77454 Marne-la-Vallée Cedex 2, France. http://www-igm.univ-mlv.fr/

We also make connexions with the field of symbolic dynamics. This is natural since the basic notion of symbolic dynamics, namely the conjugacy of subshifts, is based on a one-to-one correspondence between paths in finite graphs, giving rise to an invariance of the length distributions.

Our paper is organized as follows. The first sections (Sections 2 and 3) present the basic notions on automata and formal series used in the paper. In Section 4, we present the finite-state version of the Kraft-McMillan theorem mentioned above. The particular case of bifix codes is studied in Section 5. The last section (Section 6) presents several interconnected notions concerning subshifts of finite type and circular codes.

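2. Length distributions. We consider the set $A^*$ of all words on a given alphabet $A$. A subset of $A^*$ is often called a formal language. For sets $X, Y \subseteq A^*$, we denote
$$X + Y = X \cup Y, \qquad XY = \{xy \mid x \in X,\ y \in Y\}, \qquad X^* = \{x_1 x_2 \cdots x_n \mid x_i \in X,\ n \ge 0\}.$$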

We say that the pair $(X, Y)$ is unambiguous if for each $z \in XY$ there is at most one pair $(x, y) \in X \times Y$ such that $z = xy$.

We say that a set of nonempty words $X$ is a code if for each $x \in X^*$ there is at most one sequence $(x_1, x_2, \ldots, x_n)$ with $x_i \in X$ such that $x = x_1 x_2 \cdots x_n$ (one also says that $X$ is uniquely decipherable). A particular case of a code is a prefix code. It is a set of words $X$ such that no element of $X$ is a prefix of another one. It is easy to see that such a set is either reduced to the empty word, or does not contain the empty word and is then a code.

The length distribution of a set of words $X$ is the sequence $u_X = (u_n)_{n \ge 0}$ with
$$u_n = \mathrm{Card}(X \cap A^n).$$
We denote by $u_X$ the formal series
$$u_X(z) = \sum_{n \ge 0} u_n z^n,$$
which is the ordinary generating series of the sequence $u_X$. For example, the length distribution of $X = A^*$ is $u(z) = 1/(1 - kz)$, where $k = \mathrm{Card}(A)$.

The entropy of a formal language $X$ is
$$h(X) = \log(1/\rho),$$
where $\rho$ is the radius of convergence of the series $u_X(z)$. It is well defined provided $X$ is infinite, since $\rho$ is then finite. If the alphabet $A$ has $k$ elements, we have $h(X) \le \log k$.

The following result relates the basic operations on sets with operations on series.

Proposition 2.1. The following properties hold for any subsets $X, Y$ of $A^*$.
(i) If $X \cap Y = \emptyset$, then $u_{X+Y} = u_X + u_Y$.
(ii) If the pair $(X, Y)$ is unambiguous, then $u_{XY} = u_X u_Y$.
(iii) If $X$ is a code, then $u_{X^*} = 1/(1 - u_X)$.

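Proof. The first two formulae are clear. If $X$ is a code, every word in $X^*$ has a unique decomposition as a product of words in $X$. This implies that the sets $X^n$ are pairwise disjoint and that
$$u_{X^n} = (u_X)^n,$$
and thus
$$u_{X^*} = 1 + u_X + \cdots + u_X^n + \cdots = 1/(1 - u_X).$$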

Example 1. The set $X = \{b, ab\}$ is a prefix code. The series $u_{X^*}$ is
$$u_{X^*}(z) = \frac{1}{1 - z - z^2}.$$
Let $(F_n)_{n \ge 0}$ be the sequence of Fibonacci numbers defined by $F_0 = 0$, $F_1 = 1$ and $F_{n+2} = F_{n+1} + F_n$. It follows from the recurrence relation that
$$\frac{z}{1 - z - z^2} = \sum_{n \ge 0} F_n z^n.$$
Consequently, $u_{X^*}(z) = \sum_{n \ge 0} F_{n+1} z^n$. It can also be proved by a combinatorial argument that the number of words of length $n$ in $X^*$ is $F_{n+1}$.
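As a sanity check of Example 1, the following Python sketch counts the words of each length in $X^*$ by brute force and compares the counts with $F_{n+1}$. The helper names `in_star` and `fibonacci` are illustrative, not from the paper.

```python
from itertools import product

def in_star(word, code):
    """True if `word` is a concatenation of words of `code` (dynamic programming over prefixes)."""
    ok = [True] + [False] * len(word)   # ok[i]: the prefix word[:i] lies in X*
    for i in range(1, len(word) + 1):
        ok[i] = any(len(x) <= i and ok[i - len(x)] and word[i - len(x):i] == x for x in code)
    return ok[len(word)]

def fibonacci(n):
    a, b = 0, 1   # F_0, F_1
    for _ in range(n):
        a, b = b, a + b
    return a

X = {"b", "ab"}
counts = [sum(in_star("".join(w), X) for w in product("ab", repeat=n)) for n in range(10)]
print(counts)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```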

There are several variants of the generating series considered above. One may first define
$$p_X(z) = \sum_{n \ge 0} \frac{u_n}{k^n} z^n,$$
where $k = \mathrm{Card}(A)$. The coefficient of $z^n$ in $p_X(z)$ is the probability for a word of length $n$ to be in the set $X$. The relation between $u_X$ and $p_X$ is simple since $p_X(z) = u_X(z/k)$. Another variant of the generating series is the exponential generating series of the sequence $(u_n)_{n \ge 0}$, defined as
$$e(z) = \sum_{n \ge 0} \frac{u_n}{n!} z^n.$$

We will also use the zeta function of a sequence $(u_n)_{n \ge 1}$, defined as
$$\zeta(z) = \exp \sum_{n \ge 1} \frac{u_n}{n} z^n.$$

3. Regular distributions. In this section, we describe the connexion between the notions of a regular language and a rational series. We prove the classical result (Theorem 3.4) characterizing the regular sequences as the length distributions of regular languages. We finally mention the possible extension to more general classes of formal languages, such as the context-free languages. These results are well known in the theory of automata and we include them here for the sake of the reader's convenience.

A word on the terminology used here. We constantly use the term regular where a richer terminology is often used. In particular, what we call here a regular sequence is, in Eilenberg's terminology, an $\mathbb{N}$-rational sequence (see [20], [33] or [16]). A regular set is also called a rational or recognizable set.

3.1. Regular sequences. A sequence $u = (u_n)_{n \ge 0}$ of integers is regular if there exist a finite graph $G$ and two sets of vertices $I, T$ of $G$ such that for all $n \ge 0$,
$$u_n = \mathrm{Card}(P(n, I, T)),$$
where $P(n, I, T)$ is the set of paths of length $n$ from a vertex of $I$ to a vertex of $T$. The graph $G$ is one in which multiple edges are allowed (sometimes called a multigraph). We say that the graph $G$ recognizes the sequence $u$.

An equivalent definition of regular sequences is obtained by considering nonnegative matrices.

Proposition 3.1. A sequence $u = (u_n)_{n \ge 0}$ of integers is regular iff there exist a nonnegative matrix $M \in \mathbb{N}^{k \times k}$ and two vectors $l, \gamma \in \mathbb{N}^k$ such that
$$u_n = l M^n \gamma,$$
where $l$ is considered as a row vector and $\gamma$ as a column vector.

Proof. Let $u$ be a regular sequence defined by a graph $G$ on the set $\{1, \ldots, k\}$ of vertices. We choose $M$ to be the adjacency matrix of $G$, i.e. for each pair $v, w$ of vertices, $M_{v,w}$ is the number of edges from $v$ to $w$. Let $l$ be the row vector defined by $l_v = 1$ if $v \in I$ and $0$ otherwise. Let $\gamma$ be the column vector defined by $\gamma_v = 1$ if $v \in T$ and $0$ otherwise. The number of paths of length $n$ from a vertex of $I$ to a vertex of $T$ is, for each $n \ge 0$, equal to $l M^n \gamma$.

Conversely, let $G$ be the graph with adjacency matrix $M$. Since the family of regular sequences is closed under addition, we may suppose that the vectors $l, \gamma$ have $0,1$ coefficients. We can then consider $l, \gamma$ as the characteristic vectors of sets $I, T$ of vertices. It is then obvious that the graph thus constructed recognizes $u$.

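Example 2. Let $G$ be the graph of Figure 1. The number of paths of length $n$ from vertex $i = 1$ to vertex $t = 2$ is the Fibonacci number $F_n$.

Fig. 1. The Fibonacci graph.

Accordingly, let $M$ be the matrix
$$M = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}.$$
The same sequence is defined by the equation
$$F_n = \begin{pmatrix} 1 & 0 \end{pmatrix} M^n \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$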
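Proposition 3.1 can be made concrete with a few lines of Python: the sketch below computes $u_n = lM^n\gamma$ by repeated matrix-vector products, here for the triple of Example 2. The function names are our own, and plain lists stand in for matrices.

```python
def mat_vec(M, v):
    """Matrix-vector product for matrices given as lists of rows."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def regular_sequence(l, M, gamma, count):
    """First `count` terms u_n = l M^n gamma (l a row vector, gamma a column vector)."""
    terms, w = [], list(gamma)   # w holds M^n gamma
    for _ in range(count):
        terms.append(sum(a * b for a, b in zip(l, w)))
        w = mat_vec(M, w)
    return terms

M = [[1, 1],
     [1, 0]]                  # adjacency matrix of the Fibonacci graph (Figure 1)
l, gamma = [1, 0], [0, 1]     # initial vertex 1, terminal vertex 2
print(regular_sequence(l, M, gamma, 10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```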

We say that a sequence $u$ of integers is rational if $u(z) = p(z)/q(z)$ for some polynomials $p(z), q(z)$ with integer coefficients. The following result is classical.

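Theorem 3.1. Any regular sequence $u$ of nonnegative integers is rational.

Proof. Let $(l, M, \gamma)$ be such that $u_n = l M^n \gamma$. We have
$$u(z) = \sum_{n \ge 0} l M^n \gamma\, z^n = l\Big(\sum_{n \ge 0} (Mz)^n\Big)\gamma = l(I - Mz)^{-1}\gamma.$$
The result follows since the coefficients of $(I - Mz)^{-1}$ are rational fractions (by Cramer's rule, with the common denominator $\det(I - Mz)$).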
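The proof of Theorem 3.1 also identifies a possible denominator: by Cramer's rule the entries of $(I - Mz)^{-1}$ share the denominator $\det(I - Mz)$, and the Cayley-Hamilton theorem turns it into a linear recurrence for $u_n$. The sketch below computes $\det(I - Mz)$ with the Faddeev-LeVerrier algorithm (our choice of method, not the paper's) using exact rational arithmetic; `denominator` and `satisfies_recurrence` are hypothetical helper names.

```python
from fractions import Fraction

def mat_mul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def denominator(M):
    """Coefficients [q_0, ..., q_d] of q(z) = det(I - Mz), by the Faddeev-LeVerrier algorithm."""
    d = len(M)
    c = [Fraction(0)] * (d + 1)   # characteristic polynomial det(xI - M) = sum_j c[j] x^j
    c[d] = Fraction(1)
    N = [[Fraction(0)] * d for _ in range(d)]
    for k in range(1, d + 1):
        P = mat_mul(M, N)
        N = [[P[i][j] + (c[d - k + 1] if i == j else 0) for j in range(d)] for i in range(d)]
        MN = mat_mul(M, N)
        c[d - k] = -sum(MN[i][i] for i in range(d)) / k
    # det(I - Mz) = z^d det(z^{-1} I - M): reverse the coefficient list
    return [int(c[d - k]) for k in range(d + 1)]

def satisfies_recurrence(u, q):
    """True if q_0 u_m + q_1 u_{m-1} + ... + q_d u_{m-d} = 0 for every m >= d (Cayley-Hamilton)."""
    d = len(q) - 1
    return all(sum(q[k] * u[m - k] for k in range(d + 1)) == 0 for m in range(d, len(u)))

M = [[1, 1], [1, 0]]          # Fibonacci graph of Example 2
q = denominator(M)
print(q)                      # [1, -1, -1], i.e. q(z) = 1 - z - z^2
print(satisfies_recurrence([0, 1, 1, 2, 3, 5, 8, 13, 21], q))   # True
```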

Example 3. The generating function of the Fibonacci sequence is
$$F(z) = \frac{z}{1 - z - z^2}.$$

The converse of Theorem 3.1 is not true. We actually have the following result, due to Jean Berstel (see [20] or [16]).

Theorem 3.2. For any regular sequence $u$, there is an integer $p \ge 1$ such that every pole of minimal modulus of $u(z)$ is of the form $\rho\varepsilon$, where $\rho$ is the radius of convergence of $u$ and $\varepsilon^p = 1$. In particular, the radius of convergence is a pole.

The following example (from [20], Example 6.1, Chapter VIII) shows the existence of rational series with nonnegative integer coefficients which are not regular.

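Example 4. Let $0 < \theta < \pi/2$ be such that $\cos\theta = a/\rho$ with integers $0 < a < \rho$ and $\theta/\pi \notin \mathbb{Q}$. The sequence
$$u_n = \rho^{2n}\cos^2(n\theta)$$
is rational but not regular: its poles are $\rho^{-2}$, $\rho^{-2}e^{2i\theta}$ and $\rho^{-2}e^{-2i\theta}$, which all have the same modulus, whereas $e^{2i\theta}$ is not a root of unity, contrary to Theorem 3.2.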

A sequence $u$ is a merge of sequences $u^{(0)}, \ldots, u^{(p-1)}$ if for $n \ge 0$ and $0 \le i < p$,
$$u_{pn+i} = u^{(i)}_n.$$

We say that a pole of a rational series is dominating if its modulus is strictly less than the modulus of all the other ones. The following result is due to Soittola (see [33]).

Theorem 3.3. A sequence of nonnegative integers is regular iff it is a merge of rational sequences with a dominating pole.

Example 5. The sequence
$$1, 1, 2, 1, 4, 2, 8, 3, 16, 5, \ldots$$
is the merge of the sequence of powers of 2 and the Fibonacci sequence.
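The merge operation is simply an interleaving. A minimal Python sketch reproducing Example 5 (the function name `merge` is ours; since `zip` stops at the shortest input, all sequences are assumed to have the same length):

```python
def merge(*sequences):
    """Interleave u^(0), ..., u^(p-1) so that u_{pn+i} = u^(i)_n."""
    return [x for group in zip(*sequences) for x in group]

powers_of_two = [2 ** n for n in range(5)]
fib = [1, 1, 2, 3, 5]   # F_1, ..., F_5
print(merge(powers_of_two, fib))  # [1, 1, 2, 1, 4, 2, 8, 3, 16, 5]
```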

A third equivalent definition of regular sequences is possible. One can indeed show that a series $u(z)$ is regular iff it can be obtained by a finite number of operations of sum, product and star, with
$$u^*(z) = \frac{1}{1 - u(z)}$$
(the star being applied only to series with $u(0) = 0$), starting from polynomials with nonnegative integer coefficients. An expression of this form is usually called a regular expression.

Example 6. The sequence $(0, 1, 3, 8, 21, \ldots)$ formed of the Fibonacci numbers of even index is regular. Indeed, we have
$$F_{2n} = l M^{2n} \gamma$$
with the triple $(l, M, \gamma)$ of Example 2. We have
$$M^2 = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix},$$
and thus $F_{2n}$ is the number of paths of length $n$ from 1 to 2 in the graph of Figure 2. The series $s(z) = \sum_{n \ge 0} F_{2n} z^n$ can accordingly be written
$$s(z) = (2z + z^2 z^*)^*\, z\, z^* = \frac{z}{1 - 3z + z^2},$$
where $z^* = 1/(1 - z)$.

Fig. 2. Every other Fibonacci number.
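The regular expression for $s(z)$ can be checked numerically by implementing sum, product and star on truncated power series. This is a sketch under the assumption that keeping the first `N` coefficients suffices; since the star is only applied to series without constant term, every coefficient below degree `N` is exact. The helper names are ours.

```python
N = 12  # number of coefficients kept

def add(f, g):
    return [a + b for a, b in zip(f, g)]

def mul(f, g):
    h = [0] * N
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if i + j < N:
                h[i + j] += a * b
    return h

def star(f):
    """f* = 1/(1 - f) = 1 + f + f^2 + ..., truncated; requires f(0) = 0."""
    assert f[0] == 0
    result, power = [1] + [0] * (N - 1), [1] + [0] * (N - 1)
    for _ in range(1, N):
        power = mul(power, f)
        result = add(result, power)
    return result

def monomial(c, k):
    return [c if i == k else 0 for i in range(N)]

z = monomial(1, 1)
# s(z) = (2z + z^2 z*)* z z*  (paths from 1 to 2 in the graph of Figure 2)
s = mul(mul(star(add(monomial(2, 1), mul(monomial(1, 2), star(z)))), z), star(z))
print(s[:6])  # [0, 1, 3, 8, 21, 55]
```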

3.2. Finite automata. We present here a brief introduction to the concepts used in automata theory. For a general reference, see [31] or [20].

An automaton over the alphabet $A$ is composed of a set $Q$ of states, a set $E \subseteq Q \times A \times Q$ of edges or transitions, and two sets $I, T \subseteq Q$ of initial and terminal states.

A path in the automaton $\mathcal{A}$ is a sequence
$$(p_1, a_1, p_2), (p_2, a_2, p_3), \ldots, (p_n, a_n, p_{n+1})$$
of consecutive edges. Its label is the word $x = a_1 a_2 \cdots a_n$. A path is successful if it starts in an initial state and ends in a terminal state. The set recognized by the automaton is the set of labels of its successful paths.

An automaton is deterministic if, for each state $p$ and each letter $a$, there is at most one edge which starts at $p$ and is labeled by $a$. The term right resolving is also used.

Fig. 3. Golden mean automaton.

Example 7. Let $\mathcal{A}$ be the automaton given in Figure 3, with 1 as unique initial and terminal state. It recognizes the set $X^*$, where $X$ is the prefix code $X = \{b, ab\}$.

A set of words $X$ over $A$ is regular if it can be recognized by a finite automaton.

It is a classical result that a set of words is regular iff it can be obtained by a finite number of operations of union, product and star, starting from the finite sets.

The following result is also classical.

Proposition 3.2. Every regular set can be recognized by a finite deterministic automaton having a unique initial state.

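Proof. Let $\mathcal{A} = (Q, E, I, T)$ be a finite automaton over $A$ recognizing a set $X$. Let $\mathcal{B} = (R, F, \{I\}, \mathcal{T})$ be the automaton defined as follows. Its states are the subsets
$$Q(u) = \{q \in Q \mid \text{there is a path labeled } u \text{ from a state of } I \text{ to } q\}$$
for all $u$ in $A^*$. Since $Q$ is finite, there is a finite number of subsets $Q(u)$. The edges of $\mathcal{B}$ are all the triples
$$(Q(u), a, Q(ua)).$$
The set of terminal states is
$$\mathcal{T} = \{U \in R \mid U \cap T \ne \emptyset\}.$$
It is easy to verify that $\mathcal{B}$ is deterministic and recognizes $X$.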
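The subset construction in the proof of Proposition 3.2 translates directly into a breadth-first exploration of the reachable subsets $Q(u)$. In this Python sketch, automata are given as sets of triples; the empty subset is simply not created, which only removes a non-accepting sink state. The example automaton (words ending in $ab$) and the function names are illustrations of our own.

```python
from collections import deque

def determinize(edges, initial, terminal, alphabet):
    """Subset construction: the states are the sets Q(u) reachable from I by reading u.

    edges: set of triples (p, a, q); initial, terminal: sets of states.
    Returns (states, transitions, start, finals) of a deterministic automaton.
    """
    start = frozenset(initial)          # Q(1) = I, for the empty word 1
    states, transitions = {start}, {}
    queue = deque([start])
    while queue:
        U = queue.popleft()
        for a in alphabet:
            V = frozenset(q for (p, b, q) in edges if p in U and b == a)
            if not V:
                continue                # omit the empty subset (a dead sink state)
            transitions[(U, a)] = V
            if V not in states:
                states.add(V)
                queue.append(V)
    finals = {U for U in states if U & set(terminal)}
    return states, transitions, start, finals

def accepts(transitions, start, finals, word):
    U = start
    for a in word:
        U = transitions.get((U, a))
        if U is None:
            return False
    return U in finals

# Nondeterministic automaton for the words ending in "ab"
edges = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)}
states, trans, start, finals = determinize(edges, {0}, {2}, "ab")
print(len(states), accepts(trans, start, finals, "aab"))  # 3 True
```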

Theorem 3.4. The length distributions of regular sets are the regular sequences.

Proof. Let $X$ be a regular set. By Proposition 3.2, it can be recognized by a deterministic automaton $\mathcal{A}$ with a unique initial state. Since $\mathcal{A}$ is deterministic, there is at most one path with a given label, origin and end. Thus the number of paths of length $n$ from the initial state to a terminal state is equal to the number $u_n$ of words of $X$ of length $n$.

Conversely, let $u$ be a regular sequence enumerating the paths in a graph $G$ from $I$ to $T$. We consider the graph $G$ as an automaton with all edges having distinct labels. Let $X$ be the set of labels of paths from $I$ to $T$. The sequence $u$ is the length distribution of the set $X$.
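The first half of the proof of Theorem 3.4 is also an algorithm: in a deterministic automaton, counting words of length $n$ amounts to counting paths. The sketch below propagates path counts state by state. The encoding of the golden mean automaton of Figure 3 as a transition dictionary ($1 \xrightarrow{b} 1$, $1 \xrightarrow{a} 2$, $2 \xrightarrow{b} 1$) is our reading of the figure, and the function name is ours.

```python
def count_by_length(transitions, start, finals, alphabet, max_len):
    """u_n = number of accepted words of length n, obtained by counting paths (deterministic case)."""
    counts = []
    paths = {start: 1}   # number of words of the current length leading to each state
    for _ in range(max_len + 1):
        counts.append(sum(c for q, c in paths.items() if q in finals))
        nxt = {}
        for q, c in paths.items():
            for a in alphabet:
                r = transitions.get((q, a))
                if r is not None:
                    nxt[r] = nxt.get(r, 0) + c
        paths = nxt
    return counts

# Golden mean automaton of Figure 3 (state 1 initial and terminal)
golden = {(1, "b"): 1, (1, "a"): 2, (2, "b"): 1}
print(count_by_length(golden, 1, {1}, "ab", 9))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

As expected from Example 7, the counts are the Fibonacci numbers $F_{n+1}$.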

Example 8. If $X = a^* b$, then
$$u_X(z) = \frac{z}{1 - z}.$$

3.3. Beyond regular sequences. There are several natural classes of series beyond the rational ones. The algebraic series are those satisfying an algebraic equation. Another class is formed by the hypergeometric series, which are those such that the quotient of two successive coefficients is given by a rational fraction in $n$ (see [23]).

The class of algebraic series is linked with the class of context-free sets (see [21]). A typical example of a context-free set is the set of words on the binary alphabet $\{a, b\}$ having as many $a$'s as $b$'s. We compute below its length distribution, which is an algebraic series.

Example 9. The set of words on $A = \{a, b\}$ having an equal number of occurrences of $a$ and $b$ is a submonoid of $A^*$ generated by a prefix code $D$. Since any word of $D^*$ of length $2n$ is obtained by choosing $n$ positions among $2n$ (the positions of the letter $a$), we have
$$u_{D^*}(z) = \sum_{n \ge 0} \binom{2n}{n} z^{2n}.$$
By a simple application of the binomial formula, we obtain
$$u_{D^*}(z) = (1 - 4z^2)^{-1/2}.$$
This follows indeed, using the simple identity
$$\binom{-1/2}{n} = \frac{1}{(-4)^n}\binom{2n}{n}.$$
We have $u_D(z) = 1 - 1/u_{D^*}(z)$ and thus
$$u_D(z) = 1 - \sqrt{1 - 4z^2}.$$
Thus $u_D(z)$ is an algebraic series, solution of the equation
$$f^2 - 2f + 4z^2 = 0.$$
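The computations of Example 9 can be reproduced with exact power-series arithmetic. This sketch inverts $u_{D^*}$ to obtain $u_D = 1 - 1/u_{D^*}$, compares the result with a brute-force enumeration of $D$ (its elements are the minimal generators, i.e. the nonempty balanced words with no nonempty balanced proper prefix), and checks the algebraic equation. The truncation order `N` and the helper names are arbitrary choices of ours.

```python
from fractions import Fraction
from itertools import product
from math import comb

N = 12  # number of coefficients kept

def inverse(f):
    """Power-series inverse 1/f, truncated at degree N - 1; assumes f[0] != 0."""
    g = [Fraction(0)] * N
    g[0] = Fraction(1) / f[0]
    for n in range(1, N):
        g[n] = -sum(f[k] * g[n - k] for k in range(1, n + 1)) / f[0]
    return g

# u_{D*}: binom(2m, m) balanced words of length n = 2m
u_Dstar = [comb(n, n // 2) if n % 2 == 0 else 0 for n in range(N)]
inv = inverse(u_Dstar)
u_D = [int((1 if n == 0 else 0) - inv[n]) for n in range(N)]   # u_D = 1 - 1/u_{D*}
print(u_D)  # [0, 0, 2, 0, 2, 0, 4, 0, 10, 0, 28, 0]

def in_D(w):
    """Nonempty balanced word whose only balanced nonempty prefix is w itself."""
    h = 0
    for i, c in enumerate(w, start=1):
        h += 1 if c == "a" else -1
        if h == 0:
            return i == len(w)
    return False

brute = [sum(in_D("".join(t)) for t in product("ab", repeat=n)) for n in range(9)]
print(brute == u_D[:9])  # True
```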

4. A finite-state version of the Kraft-McMillan theorem. Let $X$ be a prefix code on an alphabet with $k$ symbols. It is classical that its length distribution $u = (u_n)_{n \ge 1}$ satisfies Kraft's inequality
$$\sum_{n \ge 1} u_n k^{-n} \le 1,$$
or equivalently $u(1/k) \le 1$. The number $u(1/k)$ can actually be interpreted as the probability that a long enough word has a prefix in $X$.

There is also a connexion with the notion of entropy. Actually, if $X$ is a prefix code, the entropy of $X^*$ is equal to $\log(1/\rho)$, where $\rho$ is the solution of the equation $u_X(\rho) = 1$. Thus Kraft's inequality expresses the fact that $h(X^*) \le \log k$.
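The entropy statement can be illustrated numerically: for a finite prefix code, $\rho$ is the root of $u_X(\rho) = 1$ in $(0, 1)$. The sketch below finds it by bisection for the code $X = \{b, ab\}$ of Example 1, where $\rho = (\sqrt{5} - 1)/2$. Bisection and the name `kraft_root` are our choices; the method assumes $u_X$ is a polynomial with nonnegative coefficients that crosses the value 1 inside the interval.

```python
from math import log

def kraft_root(u, lo=0.0, hi=1.0, steps=100):
    """Solve u_X(rho) = 1 by bisection; u lists the coefficients u_1, u_2, ... of a finite code.

    Assumes u_X is increasing on (lo, hi) and takes the value 1 there.
    """
    f = lambda z: sum(c * z ** n for n, c in enumerate(u, start=1))
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

rho = kraft_root([1, 1])          # X = {b, ab}: u_X(z) = z + z^2
print(rho, log(1 / rho), log(2))  # h(X*) = log(1/rho) <= log 2
```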

Conversely, the Kraft-McMillan theorem states that for any such sequence $u = (u_n)_{n \ge 1}$, there exists a prefix code $X$ on a $k$-symbol alphabet such that $u = u_X$.

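Let us briefly describe the proof. We suppose by induction to have already built a prefix code $X$ formed of words of length at most $n - 1$ with length distribution $(u_1, u_2, \ldots, u_{n-1})$ on the alphabet $A_k = \{0, 1, \ldots, k - 1\}$. We have
$$\sum_{i=1}^{n} u_i k^{-i} \le 1,$$
and thus
$$\sum_{i=1}^{n} u_i k^{n-i} \le k^n.$$
Since $k^{n-i}$ is the number of words of length $n$ having a given word of length $i$ as a prefix, this allows us to choose $u_n$ words of length $n$ on the alphabet $A_k$ without a prefix in $X$. For the sake of a complete description of the construction, one has to specify which words are chosen among the words of length $n$ which do not already have a prefix in $X$. A possible policy is to choose the earliest ones in the alphabetic order.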
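The inductive proof is effectively a greedy algorithm. A Python sketch, assuming a finite length distribution given as a list (`u[n-1]` $= u_n$) and an alphabet $\{0, \ldots, k-1\}$ with $k \le 10$ so that letters are single digits; it follows the alphabetic policy described above. The name `kraft_prefix_code` is ours.

```python
from itertools import product

def kraft_prefix_code(u, k):
    """Greedy Kraft construction: u[n-1] = number of words of length n (k <= 10 assumed).

    At each length n, take the first u_n words in alphabetic order on {0, ..., k-1}
    that have no prefix in the code built so far.
    """
    code = []
    for n, un in enumerate(u, start=1):
        chosen = 0
        for letters in product(range(k), repeat=n):   # alphabetic order
            if chosen == un:
                break
            w = "".join(map(str, letters))
            if not any(w.startswith(x) for x in code):
                code.append(w)
                chosen += 1
        if chosen < un:
            raise ValueError("Kraft's inequality fails")
    return code

# First terms of the distribution u(z) = z^2/(1 - 2z^2) of Example 10
print(kraft_prefix_code([0, 1, 0, 2, 0, 4], 2))
# ['00', '0100', '0101', '011000', '011001', '011010', '011011']
```

The output consists of the words $0\,1^n\,0\,\{0,1\}^n$ for $n = 0, 1, 2$, in agreement with Example 10.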

The equality case in Kraft's inequality corresponds to a particular class of prefix codes, often called complete. A prefix code $X$ on the alphabet $A$ is complete if any word on $A$ either has a prefix in $X$ or is a prefix of a word of $X$.

The notion of a prefix code is related to the notion of a tree. A prefix code on $k$ symbols corresponds to a $k$-ary tree. The length distribution of the prefix code is the enumerative sequence of the leaves of the tree. We call it the length distribution of the tree. Usually, the interest is focused on finite trees, as in Huffman's algorithm for example.

We are interested here in the case of infinite trees and, more especially, of regular trees arising from prefix codes which are regular, in the sense defined above. The notion of a regular tree can also be defined directly as an infinite tree with only a finite number of non-isomorphic subtrees.

By Theorem 3.4, if $X$ is regular, then the sequence $u_X$ is also regular. The following result shows that, conversely, the conjunction of the two conditions (being regular and satisfying Kraft's inequality) is sufficient to ensure the existence of a regular prefix code on a $k$-symbol alphabet.

Theorem 4.1. A sequence $u$ of integers is the length distribution of a regular prefix code on $k$ symbols iff
(i) it is regular,
(ii) it satisfies Kraft's inequality $u(1/k) \le 1$.

The essence of this result is a constructive method allowing one to build the regular prefix code $X$ given the sequence $u$.

Two simple methods come to mind at first glance. The first one is to apply directly the proof of Kraft's theorem. The following example shows that the result need not be a regular set, although the sequence $u$ is itself regular.

Example 10. Let $u(z) = z^2/(1 - 2z^2)$. Since $u(1/2) = 1/2$, we may apply the Kraft construction to build a binary tree with length distribution $u$. The result is the set
$$X = \bigcup_{n \ge 0} 0\,1^n\,0\,\{0, 1\}^n,$$
which is not regular.

The second method takes into account the hypothesis that the sequence is regular. It will fail in its naive version, but the solution is a refinement of this idea. Let $G$ be a graph such that $u_n$ is the number of paths of length $n$ from $I$ to $T$. We can normalize the graph $G$ to obtain a graph such that $I = \{i\}$, $T = \{t\}$ and no edge goes out of $t$. We label each edge in such a way that edges with a common start have different labels. The set recognized by the automaton thus constructed is a prefix code with length distribution $u$.

The trouble is that the number of symbols used may well be larger than $k$, as shown by the following example.

Example 11. Let $u$ be the regular sequence given by the graph of Figure 4 on the left, with $i = 1$ and $t = 4$. We also have $u(z) = 3z^2/(1 - z^2)$. Furthermore, $u(1/2) = 1$ and thus $u$ satisfies Kraft's equality. However, there are four edges going out of vertex 2, and the method described above fails to build a binary prefix code. A solution on $A = \{a, b\}$ is the regular prefix code
$$X = (aa)^*(ab + ba + bb).$$
The corresponding automaton is given in Figure 4 on the right.

Fig. 4. Graphs recognizing $u(z) = 3z^2/(1 - z^2)$.
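A quick check of Example 11: the sketch below lists the words of $X = (aa)^*(ab + ba + bb)$ up to a length bound, verifies the prefix property, and evaluates a partial Kraft sum, which approaches $u(1/2) = 1$. The bound 12 and the helper names are arbitrary choices of ours.

```python
def words_of_X(max_len):
    """Words of X = (aa)*(ab + ba + bb) of length at most max_len."""
    result = []
    for m in range(max_len // 2):
        for tail in ("ab", "ba", "bb"):
            w = "aa" * m + tail
            if len(w) <= max_len:
                result.append(w)
    return result

def is_prefix_code(words):
    return not any(x != y and y.startswith(x) for x in words for y in words)

X = words_of_X(12)
dist = [sum(len(w) == n for w in X) for n in range(1, 13)]
print(dist)                           # 3 words of each even length 2, 4, ..., 12
print(is_prefix_code(X))              # True
print(sum(2 ** -len(w) for w in X))   # partial Kraft sum 1 - 4**-6, tending to u(1/2) = 1
```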

The proof of Theorem 4.1 consists in building a new graph with all vertices of outdegree at most $k$. It relies on a transformation called the multiset construction, described in [8]. The proof uses the following combinatorial lemma, also used in symbolic dynamics by Adler and Marcus [28], [2], and quoted in [4] as a nice variant of the pigeonhole principle.

Lemma 4.1. Let $k_1, k_2, \ldots, k_n$ be positive integers. Then there is a nonempty subset $S \subseteq \{1, 2, \ldots, n\}$ such that $\sum_{s \in S} k_s$ is divisible by $n$.
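Lemma 4.1 has a short constructive proof by the pigeonhole principle applied to prefix sums (this standard argument is ours; the paper does not give a proof here). It even yields a set $S$ of consecutive indices. A Python sketch, with a function name of our own:

```python
def divisible_subset(ks):
    """Nonempty set S of 1-based indices with sum(k_s for s in S) divisible by n = len(ks).

    Pigeonhole on the prefix sums s_0 = 0, s_1, ..., s_n: these n + 1 values have only
    n residues mod n, so two agree, say s_i and s_j with i < j; then S = {i+1, ..., j}.
    """
    n = len(ks)
    seen = {0: 0}          # residue of a prefix sum -> length of that prefix
    total = 0
    for j, k in enumerate(ks, start=1):
        total += k
        r = total % n
        if r in seen:
            return set(range(seen[r] + 1, j + 1))
        seen[r] = j
    raise AssertionError("unreachable by the pigeonhole principle")

print(divisible_subset([3, 5, 2, 7]))  # {1, 2}, since 3 + 5 = 8 is divisible by 4
```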

The graph obtained is shown in the example below.

Example 12. Let
$$u(z) = \frac{z^2}{1 - z^2} + \frac{z^2}{1 - 5z^3}. \qquad (4.1)$$
We have $u(1/2) = 1$. A regular binary tree with length distribution $u$ is given in Figure 5 (note that, by convention, a vertex labeled $v$ has its sons represented only once on the figure; thus, for example, the vertex labeled 1 on the right has the same sons as the root; the leaves of the tree are indicated by a black box).

To check that the length distribution is equal to $u$, one may compute from the graph the following regular expression of $u$, and check by an elementary computation (possibly with the help of a symbolic computation system) that it is equal to $u$:
$$u(z) = (z^6)^*\big(2z^2 + z^4 + 2z^5 + z^6 + (z^2 + 3z^5)(5z^3)^*\, 3z^3\big).$$

Fig. 5. Regular binary tree with length distribution $u$.

(Note for a reader unfamiliar with regular expressions: the first factor $(z^6)^*$ corresponds to the vertex labeled 1 at level 6 of the tree. The term $2z^2 + z^4 + 2z^5 + z^6$ corresponds to the leaves reached by a path which does not use a vertex labeled 5. The factor $(z^2 + 3z^5)(5z^3)^*$ corresponds to the paths from the root to a vertex labeled 5. Finally, the factor $3z^3$ corresponds to the direct paths from 5 to a leaf.)
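The verification suggested above can be done without a symbolic system by comparing truncated power series. In this sketch, `mono`, `add`, `mul` and `star` are our own helpers on coefficient lists; equality of the first `N` coefficients is evidence rather than a proof (the identity itself follows by clearing denominators). The value $u(1/2) = 1$ is checked exactly with rational arithmetic.

```python
from fractions import Fraction

N = 30  # number of coefficients compared

def mono(c, k):
    """The monomial c z^k as a truncated coefficient list."""
    return [c if i == k else 0 for i in range(N)]

def add(*fs):
    return [sum(cs) for cs in zip(*fs)]

def mul(f, g):
    h = [0] * N
    for i, a in enumerate(f):
        if a:
            for j in range(N - i):
                h[i + j] += a * g[j]
    return h

def star(f):
    """f* = 1 + f + f^2 + ..., truncated at degree N - 1; needs f(0) = 0."""
    assert f[0] == 0
    result, power = mono(1, 0), mono(1, 0)
    for _ in range(N - 1):
        power = mul(power, f)
        result = add(result, power)
    return result

# The regular expression given above
regex = mul(star(mono(1, 6)),
            add(mono(2, 2), mono(1, 4), mono(2, 5), mono(1, 6),
                mul(mul(add(mono(1, 2), mono(3, 5)), star(mono(5, 3))), mono(3, 3))))
# Formula (4.1): z^2/(1 - z^2) + z^2/(1 - 5z^3)
target = add(mul(mono(1, 2), star(mono(1, 2))), mul(mono(1, 2), star(mono(5, 3))))
print(regex == target)   # True
print(target[:10])       # [0, 0, 2, 0, 1, 5, 1, 0, 26, 0]

half = Fraction(1, 2)
print(half**2 / (1 - half**2) + half**2 / (1 - 5 * half**3))   # 1, i.e. u(1/2) = 1
```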

This example (suggested to us by Christophe Reutenauer) shows an interesting feature of this problem. In fact, from the point of view of regular expressions, the difficult operation in this problem is the sum. It would be a simple matter to build a rational tree for each term of the sum in expression (4.1) (see Example 11). The difficulty would then be to merge these two trees to obtain one corresponding to the sum.

A curious consequence of Theorem 4.1 is the following property of regular sequences.

Corollary 4.1. Let $k \ge 2$ be an integer and let $u$ be a regular sequence such that $u(1/k) \le 1$ and $u(0) = 0$. Then there exist $k$ regular sequences $u_1, \ldots, u_k$ such that $u_i(1/k) \le 1$ and
$$u(z) = \sum_{i=1}^{k} z\, u_i(z).$$

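Proof. It is a simple consequence of Theorem 4.1. Indeed, if $X$ is a regular prefix code on the $k$-element alphabet $A$, then $X = \sum_{a \in A} a X_a$, where each $X_a = \{w \in A^* \mid aw \in X\}$ is a regular prefix code on the alphabet $A$. It suffices to apply this to a regular prefix code $X$ with $u_X = u$, given by Theorem 4.1.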

5. Bifix codes. We investigate here the length distributions of a particular class of prefix codes, called bifix. Several other classes of prefix codes could give rise to a similar study (for a description of these classes, see [19]).

The definition of a suffix code is symmetric to the definition of a prefix code. It is a set of words $X$ such that no element of $X$ is a suffix of another one. The notion of a complete suffix code is also symmetric. A bifix code is a set $X$ of words which is both a prefix and a suffix code.

Any set of words of fixed length is obviously a bifix code, but there are more complicated examples.

Fig. 6. The bifix code $X$.

Example 13. The set
$$X = \{aaa, aaba, aabb, ab, baa, baba, babb, bba, bbb\}$$
is a complete prefix code, pictured in Figure 6. It is also a complete suffix code, as one may check by reading its words backwards.
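Example 13 can be checked mechanically. The sketch tests the prefix and suffix properties by pairwise comparison and computes $\sum_{x \in X} 2^{-|x|}$ exactly; for a finite prefix code, completeness is equivalent to this sum being equal to 1. The helper names are ours.

```python
from fractions import Fraction

X = ["aaa", "aaba", "aabb", "ab", "baa", "baba", "babb", "bba", "bbb"]

def is_prefix_code(words):
    return not any(x != y and y.startswith(x) for x in words for y in words)

def is_suffix_code(words):
    return not any(x != y and y.endswith(x) for x in words for y in words)

kraft = sum(Fraction(1, 2 ** len(x)) for x in X)
print(is_prefix_code(X), is_suffix_code(X), kraft)  # True True 1
```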

Surprisingly, it is an open problem to characterize the length distributions of bifix codes. The following simple example shows that they are more constrained than those of prefix codes.

Example 14. The sequence $u(z) = z + 2z^2$ is not realizable as the length distribution of a bifix code on a binary alphabet, although $u(1/2) = 1$. Indeed, one of the symbols has to be in $X$, say $a$. Then $bb$ is the only word of length 2 that can be added.

The following nice partial result is due to Ahlswede, Balkenhol and Khachatrian [3]. We state the result for a binary alphabet. It can be […]

Theorem 5.1. For any integer sequence $u$ such that
$$u(1/2) \le 1/2,$$
there is a bifix code $X$ such that $u = u_X$.

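Proof. The proof is by induction. We suppose that we have already built a bifix code $X$ formed of words of length at most $n - 1$ with length distribution $(u_1, u_2, \ldots, u_{n-1})$. We have
$$\sum_{i=1}^{n} u_i 2^{-i} \le 1/2,$$
and thus
$$2\sum_{i=1}^{n} u_i 2^{n-i} \le 2^n.$$
Hence $2u_n \le 2^n - 2\sum_{i=1}^{n-1} u_i 2^{n-i}$, and finally we obtain, a fortiori,
$$u_n \le 2^n - 2\sum_{i=1}^{n-1} u_i 2^{n-i}.$$
The expression on the right-hand side is at most equal to the number of elements of the set $A^n \setminus (XA^* \cup A^*X)$, since at most $\sum_{i=1}^{n-1} u_i 2^{n-i}$ words of length $n$ have a prefix in $X$, and at most as many have a suffix in $X$. Thus, we can choose $u_n$ words of length $n$ which have neither a prefix nor a suffix in $X$. This proves the result by induction.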
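The proof of Theorem 5.1 is again a greedy construction. A Python sketch over $A = \{a, b\}$, assuming a finite list `u` with `u[n-1]` $= u_n$: at each length it takes the first $u_n$ words, in alphabetic order, having neither a prefix nor a suffix in the code built so far. The counting argument of the proof guarantees that enough words remain when $u(1/2) \le 1/2$; the function name is ours.

```python
from fractions import Fraction
from itertools import product

def greedy_bifix_code(u):
    """Greedy construction from the proof of Theorem 5.1 (binary alphabet, u[n-1] = u_n)."""
    assert sum(Fraction(c, 2 ** n) for n, c in enumerate(u, start=1)) <= Fraction(1, 2)
    code = []
    for n, un in enumerate(u, start=1):
        candidates = ("".join(t) for t in product("ab", repeat=n))   # alphabetic order
        free = [w for w in candidates
                if not any(w.startswith(x) or w.endswith(x) for x in code)]
        if len(free) < un:
            raise RuntimeError("not enough free words")   # excluded by the counting argument
        code.extend(free[:un])
    return code

print(greedy_bifix_code([0, 1, 1, 2]))  # ['aa', 'aba', 'abba', 'abbb']
```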

The authors of [3] formulate the interesting conjecture that Theorem 5.1 is still true if the hypothesis $u(1/2) \le 1/2$ is replaced by $u(1/2) \le 3/4$.

There are known additional conditions imposed on length distributions of bifix codes. For example, one has the following result, originally due to Schützenberger (see [14]).

Theorem 5.2. If $X$ is a finite complete bifix code on $k$ symbols, then $u_X(1/k) = 1$ and $\frac{1}{k}u_X'(1/k)$ is an integer.

The number $\frac{1}{k}u_X'(1/k)$ can be interpreted as the average length of the words of $X$, each word $x$ being given the probability $k^{-|x|}$. Indeed,
$$z\,u_X'(z) = \sum_{x \in X} |x|\, z^{|x|}.$$

Example 15. For the bifix code of Example 13, we have
$$u_X(z) = z^2 + 4z^3 + 4z^4$$
and thus
$$u_X'(z) = 2z + 12z^2 + 16z^3.$$
Hence $\frac{1}{2}u_X'(1/2) = 3$. The conditions of Theorem 5.2 show directly that the sequence of Example 14 is not realizable. Indeed, it satisfies the first condition but not the second one, since $\frac{1}{2}u'(1/2) = 3/2$. The conditions of Theorem 5.2 are not sufficient. Indeed, if $u(z) = z + 4z^3$, we have $u(1/2) = 1$ and $u'(1/2) = 4$, although it is clearly impossible that $u = u_X$ for a bifix code $X$ (if $a \in X$, only $bab$ and $bbb$ remain available as words of length 3).

6. Zeta functions, subshifts of finite type and circular codes. In this section, we present a number of results on interrelated objects which are connected with cyclic permutation of words. We begin with notions classical in symbolic dynamics (see [25] or [24] for a general reference; see [13] or [22] for the link with finite automata).

6.1. Subshifts of finite type. A subshift is a set of biinfinite words on a finite alphabet $A$ which avoid a given set $F$ of forbidden words. It is a topological space, as a closed subset of the space $A^{\mathbb{Z}}$ of functions from $\mathbb{Z}$ into the set $A$. The full shift on $A$ is the set of all biinfinite words on $A$. It corresponds to the case $F = \emptyset$.

A sofic subshift is the set of biinfinite labels of paths in a finite automaton. A sofic subshift is called irreducible if the automaton can be chosen strongly connected. A subshift of finite type is the set of biinfinite words avoiding a finite set of finite words. Any subshift of finite type is sofic, but the converse is not true. The edge shift of a finite graph $G$ is the set $S_G$ of biinfinite paths in $G$ (viewed as biinfinite sequences of edges). It is a subshift of finite type.

The shift is the function $\sigma$ on a subshift $S$ which maps a point $x$ to the point $y = \sigma(x)$ whose $i$th coordinate is $y_i = x_{i+1}$.

A morphism from a subshift $S$ into a subshift $T$ is a function $f : S \to T$ which is continuous and commutes with the shift. A bijective morphism is called a conjugacy. Any subshift of finite type is conjugate to some edge shift.

The entropy $h(S)$ of a subshift $S$ is the entropy of the formal language formed by the finite blocks occurring in words of $S$. It can be shown that the entropy is a topological invariant, in the sense that two conjugate subshifts have the same entropy.

While the entropy measures the number of blocks occurring in a subshift, it is also possible to study the number of minimal forbidden words. This gives rise to another invariant of subshifts [11], [12].

An integer $p$ is a period of a point $x = (a_n)_{n \in \mathbb{Z}}$ if $a_{n+p} = a_n$ for all $n \in \mathbb{Z}$. Equivalently, $p$ is a period of $x$ if $\sigma^p(x) = x$. The zeta function of a subshift $S$ is defined as the series
$$\zeta(S) = \exp \sum_{n \ge 1} \frac{p_n}{n} z^n,$$
where $p_n$ is the number of points with period $n$ in $S$. It is also a topological invariant, since a point of period $n$ is mapped by a conjugacy to a point of the same period.

The following result, due to Bowen and Lanford [18], is classical (see [25]).

Proposition 6.1. Let $G$ be a finite graph and let $M$ be the adjacency matrix of $G$. Then
$$\zeta(S_G) = \det(I - Mz)^{-1}.$$

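Proof. We first have, for each $n \ge 1$,
$$\mathrm{Tr}(M^n) = p_n,$$
since the coefficient $(i, j)$ of $M^n$ is the number of paths of length $n$ from $i$ to $j$, so that $\mathrm{Tr}(M^n)$ is the number of closed paths of length $n$ (with a distinguished origin), which correspond one-to-one to the points of period $n$ of $S_G$. Thus
$$\zeta(S_G) = \exp \sum_{n \ge 1} \frac{p_n}{n} z^n = \exp \sum_{n \ge 1} \frac{\mathrm{Tr}(M^n)}{n} z^n = \exp \mathrm{Tr}\big(\log (I - Mz)^{-1}\big) = \det(I - Mz)^{-1},$$
since, by the formula of Jacobi, $\exp \circ \mathrm{Tr} = \det \circ \exp$.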

Example 16. Let $S$ be the edge shift of the graph $G$ of Figure 7. We have
$$M = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$
Consequently,
$$\zeta(S) = \frac{1}{1 - z - z^3}.$$

Fig. 7. A subshift of finite type.
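For Example 16, the numbers $p_n = \mathrm{Tr}(M^n)$ from the proof of Proposition 6.1 can be computed directly and compared with those obtained from $\det(I - Mz) = 1 - z - z^3$ through Newton's formula (6.2) of Section 6.2. A Python sketch, with helper names of our own:

```python
def mat_mul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def periodic_points(M, count):
    """p_n = Tr(M^n) for n = 1..count: closed paths of length n in the graph."""
    P, result = M, []
    for _ in range(count):
        result.append(sum(P[i][i] for i in range(len(P))))
        P = mat_mul(P, M)
    return result

def newton(u, count):
    """p_n from det(I - Mz) = 1 - u(z) via p_n = n u_n + sum_{i<n} p_i u_{n-i} (u[n] = u_n)."""
    u = u + [0] * (count + 1 - len(u))
    p = [0] * (count + 1)
    for n in range(1, count + 1):
        p[n] = n * u[n] + sum(p[i] * u[n - i] for i in range(1, n))
    return p[1:]

M = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]   # graph of Figure 7 (Example 16)
print(periodic_points(M, 6))              # [1, 1, 4, 5, 6, 10]
print(newton([0, 1, 0, 1], 6))            # the same numbers, since det(I - Mz) = 1 - z - z^3
```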

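Let $S$ be a subshift of finite type and let $p_n$ be the number of points with period $n$. Let $q_n$ be the number of points with least period $n$. Since $q_n$ is a multiple of $n$ (the orbit of a point of least period $n$ has exactly $n$ elements), we also denote $q_n = n l_n$. We then have the following formula, expressing the zeta function as an infinite product using the integers $l_n$ as exponents:
$$\zeta(S) = \prod_{n \ge 1} (1 - z^n)^{-l_n},$$
as one may verify using $p_n = \sum_{d \mid n} d\, l_d$ and the definition of $\zeta(S)$.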
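The exponents $l_n$ can be recovered from the $p_n$ by solving $p_n = \sum_{d \mid n} d\,l_d$ for increasing $n$, and the product formula can then be expanded and compared with $\zeta(S)$. This sketch does so for Example 16, taking $p_1, \ldots, p_8$ as precomputed values of $\mathrm{Tr}(M^n)$; the truncation at degree 8 and the function names are our choices.

```python
def least_period_counts(p):
    """l_n from p_n = sum_{d | n} d l_d, solved for increasing n (p[0] = p_1)."""
    l = []
    for n in range(1, len(p) + 1):
        rest = p[n - 1] - sum(d * l[d - 1] for d in range(1, n) if n % d == 0)
        assert rest % n == 0
        l.append(rest // n)
    return l

def product_series(l, N):
    """Coefficients of prod_{n>=1} (1 - z^n)^(-l_n), truncated at degree N - 1."""
    f = [1] + [0] * (N - 1)
    for n, ln in enumerate(l, start=1):
        for _ in range(ln):               # multiply l_n times by 1/(1 - z^n)
            for m in range(n, N):
                f[m] += f[m - n]
    return f

p = [1, 1, 4, 5, 6, 10, 15, 21]        # p_n = Tr(M^n) for Example 16, n = 1..8
l = least_period_counts(p)
print(l)                                # [1, 0, 1, 1, 1, 1, 2, 2]
print(product_series(l, 9))             # coefficients of 1/(1 - z - z^3) up to z^8
```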

A classical result, related with what follows, is the following statement, known as Krieger's embedding theorem.

Theorem 6.1. Let $S, T$ be two subshifts of finite type, with $T$ irreducible and aperiodic. There exists an injective morphism $f : S \to T$ with $f(S) \ne T$ iff
1. $h(S) < h(T)$,
2. for each $n \ge 1$, $q_n(S) \le q_n(T)$, where $q_n(S)$ (resp. $q_n(T)$) is the number of points of $S$ (resp. $T$) of least period $n$.

The following result is the basis of many applications of symbolic dynamics to coding. It is due to Adler, Coppersmith and Hassner [2].

Theorem 6.2. If $S$ is an irreducible subshift of finite type such that $h(S) \ge \log k$, it is conjugate to a subshift of finite type $S_G$ where the graph $G$ has outdegree at least $k$.

The proof is based on a state-splitting algorithm using approximate eigenvectors and Lemma 4.1. This result is part of a number of constructions leading to sliding block codes used in magnetic recording (see [29], [9] or [25]). It gives at the same time the following result.

Theorem 6.3. If $S$ is a subshift of finite type such that $h(S) \le \log k$, then there is a graph $G$ of outdegree at most $k$ such that $S$ is conjugate to $S_G$.

There is a connexion between this theorem and Theorem 4.1. Indeed, let $u$ be a regular sequence of integers such that $u(1/k) \le 1$. Let $G$ be a normalized graph recognizing $u$ (in the sense of Section 4). Let $G'$ be the graph obtained by merging the initial and terminal vertices. Then $h(S_{G'}) \le \log k$. We can apply Theorem 6.3 to obtain a graph $H$ with outdegree at most $k$ such that $S_{G'}$ and $S_H$ are conjugate. This gives the conclusion of Theorem 4.1, provided the initial-terminal vertex did not split in the construction. The following examples show both cases (for details, see [6] and [7]).

Example 17. Let $G$ be the graph of Figure 4. The splitting of vertex 2 gives a graph of outdegree 2. A normalization gives the automaton on the right.

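Example 18. The sequence of Example 12 is recognized by a graph $G$ such that $G'$ has three cycles of length 2. The solution as a binary tree has only two cycles of length 2, and thus could not be obtained by state splitting (a conjugacy preserves the number of points of each period).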

6.2. Circular codes. A circular word, or necklace, is the equivalence class of a word under cyclic permutation. For a word $w$, we denote by $\tilde{w}$ the circular word represented by $w$.

Let $X$ be a set of words and $w = x_1 x_2 \cdots x_n$ with $x_i \in X$. The set of cyclic permutations of the sequence $(x_1, x_2, \ldots, x_n)$ is called a factorization of the circular word $\tilde{w}$.

A circular code is a set $X$ of words such that the factorization of circular words is unique.

Example 19. The set $X = \{a, aba\}$ is a circular code. Indeed, the […]

Example 20. The set $X = \{ab, ba\}$ is not a circular code. Indeed, the circular word $\tilde{w}$ for $w = abab$ has two factorizations, namely $(ab, ab)$ and $(ba, ba)$.

The following characterization is useful (see [14]).

Proposition 6.2. A set $X$ is a circular code if and only if it is a code and, for all $u, v \in A^*$,
$$uv, vu \in X^* \implies u, v \in X^*.$$

Example 21. We obtain another way to prove that the set $X = \{ab, ba\}$ is not a circular code. Indeed, otherwise, since $ab$ and $ba$ are in $X^*$, we would have $a, b \in X^*$, which is contradictory.
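Proposition 6.2 also suggests a bounded brute-force search for witnesses of non-circularity. The sketch below looks for nonempty $u, v$ with $uv, vu \in X^*$ but not both $u, v \in X^*$. It assumes $X$ is already known to be a code (unique decipherability is not tested), and finding nothing up to `max_len` does not prove circularity; the function names are ours.

```python
from itertools import product

def in_star(word, code):
    """True if `word` is a concatenation of words of `code`."""
    ok = [True] + [False] * len(word)
    for i in range(1, len(word) + 1):
        ok[i] = any(len(x) <= i and ok[i - len(x)] and word[i - len(x):i] == x for x in code)
    return ok[len(word)]

def circular_violation(code, max_len, alphabet="ab"):
    """Nonempty (u, v) with |uv| <= max_len, uv and vu in X*, but u or v not in X*; else None."""
    for n in range(2, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            for i in range(1, n):
                u, v = w[:i], w[i:]
                if (in_star(u + v, code) and in_star(v + u, code)
                        and not (in_star(u, code) and in_star(v, code))):
                    return u, v
    return None

print(circular_violation({"ab", "ba"}, 6))   # ('a', 'b'), as in Example 21
print(circular_violation({"a", "aba"}, 8))   # None, consistent with Example 19
```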

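Let $X$ be a finite code. The flower automaton of $X$, denoted $\mathcal{A}_X$, is the following automaton, where $1$ denotes the empty word. The set of its states is
$$Q = \{(u, v) \in A^+ \times A^+ \mid uv \in X\} \cup \{(1, 1)\}.$$
The transitions are of the form $(u, av) \xrightarrow{a} (ua, v)$, $(1, 1) \xrightarrow{a} (a, v)$ or $(u, a) \xrightarrow{a} (1, 1)$, together with the loops $(1, 1) \xrightarrow{a} (1, 1)$ for the letters $a \in X$. The unique initial and final state is $(1, 1)$.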

Example 22. The flower automaton of the circular code $\{a, aba\}$ is pictured in Figure 8.

Fig. 8. The flower automaton of $\{a, aba\}$.

The following result is easy to prove.

Proposition 6.3. The flower automaton $\mathcal{A}_X$ recognizes $X^*$. The code $X$ is circular iff for each word $w$ there is at most one cycle with label $w$.

We now study the length distributions of circular codes. Let $X$ be a circular code and let $u = (u_n)_{n \ge 1}$ be its length distribution. For each $n \ge 1$, let $p_n$ be the number of words $w$ of length $n$ such that $\tilde{w}$ has a factorization in words of $X$.

Proposition 6.4. The sequences $(p_n)_{n \ge 1}$ and $(u_n)_{n \ge 1}$ are related by
$$\exp \sum_{n \ge 1} \frac{p_n}{n} z^n = \frac{1}{1 - u(z)}. \qquad (6.1)$$

Proof. Each $p_n$ depends only on the first $n$ terms of the sequence $(u_n)$. It is therefore possible to suppose that the sequence $(u_n)$ is finite, i.e. that the code $X$ is finite. Let $\mathcal{A}$ be the flower automaton of $X$, and let $S$ be the subshift of finite type associated with the graph of $\mathcal{A}$. Then $p_n$ is the number of elements of period $n$ in $S$. Indeed, each word $w$ such that $\tilde{w}$ has a factorization is counted exactly once as the label of a cycle in $\mathcal{A}$. We also have, for the adjacency matrix $M$ of $\mathcal{A}$,
$$\det(I - Mz) = 1 - u(z).$$
Thus, the result follows from Proposition 6.1.

The explicit relation between the numbers $u_n$ and $p_n$ is the following. For each $i \ge 1$, let $u^{(i)} = (u^{(i)}_n)_{n \ge 1}$ be the length distribution of $X^i$. Equivalently, $u^{(i)}_n$ is the coefficient of degree $n$ of $u(z)^i$. Then for each $n \ge 1$,
$$p_n = \sum_{i=1}^{n} \frac{n}{i}\, u^{(i)}_n.$$

We also have, for each $n \ge 1$,
$$p_n = n u_n + \sum_{i=1}^{n-1} p_i u_{n-i}. \qquad (6.2)$$

This formula can be easily deduced from Formula (6.1) by taking the logarithmic derivative of each side of the formula. It shows directly that, for any sequence $(u_n)_{n \ge 1}$ of nonnegative integers, the sequence $(p_n)$ defined by Formula (6.1) is formed of nonnegative integers.

Formula (6.2) is known as Newton's formula in the field of symmetric functions. Actually, the numbers $u_n$ can be considered, up to the sign, as elementary symmetric functions, and the $p_n$ as the sums of powers (see [26]). The link between Witt vectors and symmetric functions was established in [34].

Let $p_n = \sum_{d \mid n} d\, l_d$. Then $l_n$ is the number of non-periodic circular words of length $n$ with a factorization. In terms of generating series, we have
$$\exp \sum_{n \ge 1} \frac{p_n}{n} z^n = \prod_{n \ge 1} (1 - z^n)^{-l_n}. \qquad (6.3)$$

Putting together Formulae (6.1) and (6.3), we obtain
$$\frac{1}{1 - u(z)} = \prod_{n \ge 1} (1 - z^n)^{-l_n}. \qquad (6.4)$$

For any sequence $(u_n)_{n \ge 1}$ of nonnegative integers, the sequence $l = (l_n)_{n \ge 1}$ defined by (6.4) is formed of nonnegative integers. This can be shown either by a direct computation or by a combinatorial argument, since any sequence $u$ of nonnegative integers is the length distribution of a circular code on a large enough alphabet. We say that $l$ is the transform of the sequence $u$.

We denote by $\varphi_n(k)$ the number of non-periodic circular words of length $n$ on $k$ symbols. The numbers $\varphi_n(k)$ are called the Witt numbers.

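It is clear that the sequence $(\varphi_n(k))_{n \ge 1}$ is the transform of the length distribution $u(z) = kz$ of the alphabet itself, for which $p_n = k^n$. The corresponding particular case of Identity (6.4),
$$1 - kz = \prod_{n \ge 1} (1 - z^n)^{\varphi_n(k)},$$
is known as the cyclotomic identity.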

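The following table displays the Witt numbers for small values of $n$ and $k$.

$$
\begin{array}{r|rrr}
n & \varphi_n(2) & \varphi_n(3) & \varphi_n(4) \\
\hline
1 & 2 & 3 & 4 \\
2 & 1 & 3 & 6 \\
3 & 2 & 8 & 20 \\
4 & 3 & 18 & 60 \\
5 & 6 & 48 & 204 \\
6 & 9 & 116 & 670 \\
7 & 18 & 312 & 2340 \\
8 & 30 & 810 & 8160 \\
9 & 56 & 2184 & 29120 \\
10 & 99 & 5880 & 104754
\end{array}
$$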
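The table can be reproduced with the classical necklace formula $\varphi_n(k) = \frac{1}{n}\sum_{d \mid n}\mu(d)\,k^{n/d}$, which is the Möbius inversion of $k^n = \sum_{d \mid n} d\,\varphi_d(k)$ (every word of length $n$ is a power of a unique primitive word, and the primitive words of length $d$ fall into $\varphi_d(k)$ conjugacy classes of size $d$). A minimal Python sketch; the function names are hypothetical.

```python
def mobius(n):
    """Möbius function mu(n), by trial division."""
    result, m, q = 1, n, 2
    while q * q <= m:
        if m % q == 0:
            m //= q
            if m % q == 0:
                return 0
            result = -result
        q += 1
    return -result if m > 1 else result


def witt(n, k):
    """Number phi_n(k) of non-periodic circular words of length n on k symbols."""
    return sum(mobius(d) * k ** (n // d) for d in range(1, n + 1) if n % d == 0) // n


for n in range(1, 11):
    print(n, witt(n, 2), witt(n, 3), witt(n, 4))  # one row of the table per line
```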

The value $\varphi_3(4) = 20$ is famous because of the genetic code: there are precisely 20 amino acids coded by words of length 3 over the 4-symbol alphabet A, C, G, U.

For any sequence $a = (a_n)_{n \ge 1}$, let

$$p_n = \sum_{d \mid n} d\, a_d^{n/d}.$$

The pair $(a, p)$ is called a Witt vector (see [30]). The numbers $p_n$ are the ghost components. In terms of generating series, one has

$$\exp \sum_{n \ge 1} \frac{p_n}{n} z^n = \prod_{n \ge 1} (1 - a_n z^n)^{-1}.$$
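The identity between the ghost components and the infinite product can be checked coefficientwise. Writing $F(z) = \prod_{n \ge 1}(1 - a_n z^n)^{-1} = \sum_{m \ge 0} f_m z^m$, the identity $\exp \sum_{n} \frac{p_n}{n} z^n = F(z)$ is equivalent, after taking logarithmic derivatives, to $m f_m = \sum_{n=1}^{m} p_n f_{m-n}$ for all $m \ge 1$. The following minimal Python sketch (hypothetical function names; `a[n - 1]` holds $a_n$) tests this relation in exact integer arithmetic.

```python
def ghost_components(a, N):
    """p_1..p_N with p_n = sum over d | n of d * a_d^(n/d); a[d - 1] holds a_d."""
    return [sum(d * a[d - 1] ** (n // d) for d in range(1, n + 1) if n % d == 0)
            for n in range(1, N + 1)]


def product_series(a, N):
    """Coefficients f_0..f_N of prod_{n=1}^{N} 1/(1 - a_n z^n), truncated at degree N."""
    f = [1] + [0] * N
    for n in range(1, N + 1):
        g = [0] * (N + 1)
        for j in range(N // n + 1):          # 1/(1 - a_n z^n) = sum_j a_n^j z^(n j)
            g[n * j] = a[n - 1] ** j
        f = [sum(f[i] * g[m - i] for i in range(m + 1)) for m in range(N + 1)]
    return f


def check_witt_identity(a, N):
    """True if m * f_m == sum_{n=1}^{m} p_n * f_{m-n} for m = 1..N."""
    p = ghost_components(a, N)
    f = product_series(a, N)
    return all(m * f[m] == sum(p[n - 1] * f[m - n] for n in range(1, m + 1))
               for m in range(1, N + 1))


print(ghost_components([1, 1, 1, 1, 1, 1], 6))   # sums of divisors: [1, 3, 4, 7, 6, 12]
print(check_witt_identity([3, 1, 4, 1, 5, 9], 6))  # True
```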

Theorem 6.4. Let $u = (u_n)_{n \ge 1}$ be a sequence of nonnegative integers and let $l = (l_n)_{n \ge 1}$ be the transform of $u$. The sequence $(u_n)_{n \ge 1}$ is the length distribution of a circular code on $k$ symbols if and only if, for all $n \ge 1$,

$$l_n \le \varphi_n(k).$$
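Theorem 6.4 suggests a simple finite test. The minimal Python sketch below (hypothetical names) checks the inequalities $l_n \le \varphi_n(k)$ for $n \le N$ only. Since the theorem requires them for every $n \ge 1$, a positive answer is only evidence; a negative answer does show that $u$ is not the length distribution of a circular code on $k$ symbols.

```python
def mobius(n):
    """Möbius function mu(n), by trial division."""
    result, m, q = 1, n, 2
    while q * q <= m:
        if m % q == 0:
            m //= q
            if m % q == 0:
                return 0
            result = -result
        q += 1
    return -result if m > 1 else result


def transform(u, N):
    """[l_1, ..., l_N]: p by Formula (6.2), then Möbius inversion of p_n = sum_{d|n} d l_d."""
    u = (list(u) + [0] * (N + 1))[: N + 1]
    p = [0] * (N + 1)
    for n in range(1, N + 1):
        p[n] = n * u[n] + sum(p[i] * u[n - i] for i in range(1, n))
    return [sum(mobius(n // d) * p[d] for d in range(1, n + 1) if n % d == 0) // n
            for n in range(1, N + 1)]


def witt(n, k):
    """phi_n(k) via the necklace formula."""
    return sum(mobius(d) * k ** (n // d) for d in range(1, n + 1) if n % d == 0) // n


def witt_bounds_hold(u, k, N):
    """Check l_n <= phi_n(k) for n = 1..N, where l is the transform of u."""
    l = transform(u, N)
    return all(l[n - 1] <= witt(n, k) for n in range(1, N + 1))


print(witt_bounds_hold([0, 1, 1], 2, 8))  # True: u(z) = z + z^2 passes for n <= 8
print(witt_bounds_hold([0, 0, 2], 2, 4))  # False: l_2 = 2 > phi_2(2) = 1
```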

Several complements to Theorem 6.4 appear in [5]. In particular, the relation with Kraft's inequality is studied. The equality case in Kraft's inequality is characterized in terms of the sequence of inequalities above.

There is a connection between Theorem 6.4 and Krieger's embedding theorem (Theorem 6.1), in the sense that Theorem 6.4 gives a simple proof of Theorem 6.1 in a particular case. Indeed, let us consider the particular case of a subshift of finite type called a renewal system.

A renewal system $S$ is the edge shift of a graph $G$ made up of cycles sharing exactly one vertex. Such a graph is determined by the sequence $u = (u_i)_{1 \le i \le n}$, where $u_i$ is the number of loops of length $i$. Let $T_k$ be the full shift on $k$ symbols. Suppose that the pair formed by $S$ and $T_k$ satisfies the hypotheses of Krieger's theorem. The number $q_n(S)$ of points of least period $n$ in $S$ is $n l_n$, where $l = (l_n)_{n \ge 1}$ is the transform of the sequence $u$, and $q_n(T_k) = n \varphi_n(k)$. Since the hypotheses of Krieger's theorem include $q_n(S) \le q_n(T_k)$ for all $n \ge 1$, we obtain $l_n \le \varphi_n(k)$ for all $n \ge 1$. Thus, the sequence $u$ satisfies the hypotheses of Theorem 6.4. Consequently, there is a circular code $X$ on $k$ symbols such that $u_X = u$.

The flower automaton of $X$ defines an embedding of $S$ into the full shift on $k$ symbols. This gives an alternative proof of Krieger's theorem in this case.
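The equality $q_n(S) = n l_n$ used above can be checked numerically. In this minimal Python sketch (hypothetical names), the renewal graph is built with a central vertex 0 and, for each loop of length $i$, $i - 1$ new vertices; $q_n$ is obtained by Möbius inversion of $\operatorname{tr}(M^n) = \sum_{d \mid n} q_d$, where $M$ is the adjacency matrix, and compared with $n l_n$.

```python
def mobius(n):
    """Möbius function mu(n), by trial division."""
    result, m, q = 1, n, 2
    while q * q <= m:
        if m % q == 0:
            m //= q
            if m % q == 0:
                return 0
            result = -result
        q += 1
    return -result if m > 1 else result


def transform(u, N):
    """[l_1, ..., l_N]: p by Formula (6.2), then Möbius inversion of p_n = sum_{d|n} d l_d."""
    u = (list(u) + [0] * (N + 1))[: N + 1]
    p = [0] * (N + 1)
    for n in range(1, N + 1):
        p[n] = n * u[n] + sum(p[i] * u[n - i] for i in range(1, n))
    return [sum(mobius(n // d) * p[d] for d in range(1, n + 1) if n % d == 0) // n
            for n in range(1, N + 1)]


def renewal_adjacency(u):
    """Adjacency matrix of a graph made of u[i] cycles of length i through vertex 0
    (each cycle of length i gets i - 1 new vertices)."""
    size, edges = 1, []
    for length in range(1, len(u)):
        for _ in range(u[length]):
            path = [0] + list(range(size, size + length - 1)) + [0]
            size += length - 1
            edges.extend(zip(path, path[1:]))
    M = [[0] * size for _ in range(size)]
    for s, t in edges:
        M[s][t] += 1
    return M


def least_period_counts(M, N):
    """q_1..q_N, by Möbius inversion of tr(M^n) = sum over d | n of q_d."""
    size = len(M)
    P = [[int(i == j) for j in range(size)] for i in range(size)]  # identity matrix
    traces = []
    for _ in range(N):
        P = [[sum(P[i][k] * M[k][j] for k in range(size)) for j in range(size)]
             for i in range(size)]
        traces.append(sum(P[i][i] for i in range(size)))
    return [sum(mobius(n // d) * traces[d - 1] for d in range(1, n + 1) if n % d == 0)
            for n in range(1, N + 1)]


u = [0, 2, 0, 1]    # two loops of length 1 and one loop of length 3
N = 8
q = least_period_counts(renewal_adjacency(u), N)
l = transform(u, N)
print(q == [n * ln for n, ln in enumerate(l, start=1)])  # True
```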

It would be interesting to have a proof of Krieger's theorem along the same lines in the general case.

To close this section, we mention the following open problem: if the sequence $u$ is regular and satisfies the inequalities

$$l_n \le \varphi_n(k) \quad (n \ge 1),$$

where $l$ is the transform of $u$, does there exist a rational circular code $X$ on $k$ symbols such that $u = u_X$?

6.3. Zeta functions. Theorem 6.1 admits the following generalization due to Reutenauer [32].

Theorem 6.5. The zeta function of a sofic subshift is regular.

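We have seen already (Theorem 6.1) that the zeta function of a subshift of finite type is a rational fraction, and indeed the inverse of a polynomial. The stronger statement that it is regular follows from the following formula, which computes $\det(I - Mz)$ when $M$ is the adjacency matrix of a graph $G$ with states $1, \ldots, n$. One has

$$\det(I - Mz) = (1 - v_1(z)) \cdots (1 - v_n(z)),$$

where $v_i(z)$ is the length distribution of the set of first returns to state $i$ in the subgraph of $G$ restricted to the states $i, i+1, \ldots, n$. Each $v_i(z)$ is the length distribution of a regular prefix code, so each factor $1/(1 - v_i(z))$ is a regular sequence, and so is their product $1/\det(I - Mz)$.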
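The product formula for $\det(I - Mz)$ can be checked numerically on a small graph. In this minimal Python sketch, states are numbered $0, \ldots, n-1$, and $v_i$ counts the first returns to state $i$ in the subgraph restricted to the states $i, \ldots, n-1$; the function names and the example matrix are hypothetical. The determinant is expanded with the Leibniz formula over all permutations, which is only practical for small matrices, and `N` must be at least 1.

```python
from itertools import permutations


def series_mul(a, b, N):
    """Product of two power series (coefficient lists of length N + 1), truncated at degree N."""
    c = [0] * (N + 1)
    for i in range(N + 1):
        for j in range(N + 1 - i):
            c[i + j] += a[i] * b[j]
    return c


def det_I_minus_Mz(M, N):
    """Coefficients up to degree N (N >= 1) of det(I - M z), by the Leibniz formula."""
    n = len(M)
    total = [0] * (N + 1)
    for perm in permutations(range(n)):
        inversions = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
        term = [1] + [0] * N
        for i in range(n):
            entry = [0] * (N + 1)              # entry (i, perm[i]) of I - M z
            entry[0] = 1 if perm[i] == i else 0
            entry[1] = -M[i][perm[i]]
            term = series_mul(term, entry, N)
        sign = -1 if inversions % 2 else 1
        total = [t + sign * x for t, x in zip(total, term)]
    return total


def first_returns(M, i, N):
    """v_i(z) up to degree N: closed paths at state i whose other states
    all lie in {i + 1, ..., n - 1}."""
    n = len(M)
    inner = range(i + 1, n)
    v = [0] * (N + 1)
    v[1] = M[i][i]                             # loops at i
    row = {j: M[i][j] for j in inner}          # paths of length 1 from i into `inner`
    for m in range(2, N + 1):
        v[m] = sum(row[k] * M[k][i] for k in inner)
        row = {j: sum(row[k] * M[k][j] for k in inner) for j in inner}
    return v


def product_formula(M, N):
    """Coefficients up to degree N of (1 - v_0(z)) ... (1 - v_{n-1}(z))."""
    result = [1] + [0] * N
    for i in range(len(M)):
        v = first_returns(M, i, N)
        result = series_mul(result, [1] + [-x for x in v[1:]], N)
    return result


M = [[1, 1, 0], [1, 0, 1], [1, 1, 0]]
print(det_I_minus_Mz(M, 6))    # [1, -1, -2, 0, 0, 0, 0]
print(product_formula(M, 6))   # [1, -1, -2, 0, 0, 0, 0]
```

For this example both sides equal $1 - z - 2z^2$.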

The proof that the zeta function of a sofic subshift is rational is a result of Manning and Bowen [27], [17]. For an exposition, see [25] or [10]. A generalization appears in [15].

7. Acknowledgments. The authors are grateful for the help received during the preparation of this paper. We are indebted to Julia Abrahams for pointing us to the work of Ahlswede et al. and to several other recent references concerning bifix codes (see [1]). The link between length distributions of circular codes and symmetric functions was pointed out to us by Jacques Désarménien and Jean-Yves Thibon. We also thank Véronique Bruyère for helping to improve our work.

REFERENCES

[1] J. Abrahams, Code and parse trees for lossless source encoding, in Compression and Complexity of Sequences 1997, B. C. et al., ed., IEEE Computer Society, 1998, pp. 145–171.

[2] R. L. Adler, D. Coppersmith, and M. Hassner, Algorithms for sliding block codes, IEEE Trans. Inform. Theory, IT-29 (1983), pp. 5–22.

[3] R. Ahlswede, B. Balkenhol, and L. Khachatrian, Some properties of fix-free codes, Tech. Rep. 039, University Bielefeld, 1997.

[4] M. Aigner and G. M. Ziegler, Proofs from The Book, Springer-Verlag, 1998.

[5] F. Bassino, Generating functions of circular codes, Adv. in Appl. Math., 22 (1999), pp. 1–24.

[6] F. Bassino, M.-P. Béal, and D. Perrin, Enumerative sequences of leaves in rational trees, in ICALP'97, no. 1256 in Lecture Notes in Computer Science, Springer-Verlag, 1997, pp. 76–86.

[7] F. Bassino, M.-P. Béal, and D. Perrin, Enumerative sequences of leaves and nodes in rational trees, Theoret. Comput. Sci., (1999), pp. 41–60.

[8] F. Bassino, M.-P. Béal, and D. Perrin, A finite state version of the Kraft–McMillan theorem, SIAM J. Comput., (2000). To appear.

[9] M.-P. Béal, Codage Symbolique, Masson, 1993.

[10] M.-P. Béal, Puissance extérieure d'un automate déterministe, application au calcul de la fonction zêta d'un système sofique, RAIRO Inform. Théor. Appl., 29 (1995), pp. 85–103.

[11] M.-P. Béal, F. Mignosi, and A. Restivo, Minimal forbidden words and symbolic dynamics, in STACS'96, C. Puech and R. Reischuk, eds., vol. 1046 of Lecture Notes in Computer Science, Springer-Verlag, 1996, pp. 555–566.

[12] M.-P. Béal, F. Mignosi, A. Restivo, and M. Sciortino, Forbidden words in symbolic dynamics, Tech. Rep. 99-15, I.G.M., Université de Marne-la-Vallée, 1999. To appear in Adv. in Appl. Math.

[13] M.-P. Béal and D. Perrin, Symbolic dynamics and finite automata, in Handbook of Formal Languages, G. Rozenberg and A. Salomaa, eds., vol. 2, Springer-Verlag, 1997, ch. 10.

[14] J. Berstel and D. Perrin, Theory of Codes, Academic Press, 1985.

[15] J. Berstel and C. Reutenauer, Zeta functions of formal languages, Trans. Amer. Math. Soc., 321 (1990), pp. 533–546.

[16] J. Berstel and C. Reutenauer, Rational Series and their Languages, Springer-Verlag, 1988.

[17] R. Bowen, On Axiom A diffeomorphisms, in AMS-CBMS Reg. Conf., vol. 35, Providence, 1978.

[18] R. Bowen and O. E. Lanford, Zeta functions of restrictions of the shift transformation, in Proc. Symp. Pure Math. AMS, vol. 14, 1970, pp. 43–50.

[19] Bruyère, …, in International Colloquium on Automata, Languages and Programming (ICALP'96), F. Meyer and B. Monien, eds., vol. 1099, Springer-Verlag, 1996, pp. 24–47.

[20] S. Eilenberg, Automata, Languages and Machines, vol. A, Academic Press, 1974.

[21] P. Flajolet, Analytic models and ambiguity of context-free languages, Theoret. Comput. Sci., 49 (1987), pp. 283–309.

[22] G. D. Forney, B. H. Marcus, N. T. Sindhushayana, and M. Trott, A multilingual dictionary: System theory, coding theory, symbolic dynamics and automata theory, in Proceedings of Symposia in Applied Mathematics, no. 50, 1995, pp. 109–138.

[23] R. L. Graham, D. Knuth, and O. Patashnik, Concrete Mathematics, Addison Wesley, 1988.

[24] B. P. Kitchens, Symbolic Dynamics, Springer-Verlag, 1997.

[25] D. A. Lind and B. H. Marcus, An Introduction to Symbolic Dynamics and Coding, Cambridge, 1995.

[26] I. G. Macdonald, Symmetric Functions and Hall Polynomials, Oxford University Press, 1995.

[27] A. Manning, Axiom A diffeomorphisms have rational zeta functions, Bull. London Math. Soc., 3 (1971), pp. 215–220.

[28] B. H. Marcus, Factors and extensions of full shifts, Monats. Math., 88 (1979), pp. 239–247.

[29] B. H. Marcus, R. M. Roth, and P. H. Siegel, Constrained systems and coding for recording channels, in Handbook of Coding Theory, V. S. Pless and W. C. Huffman, eds., vol. II, North Holland, 1998, ch. 20, pp. 1635–1764.

[30] N. Metropolis and G.-C. Rota, Witt vectors and the algebra of necklaces, Advances in Math., 50 (1983), pp. 95–125.

[31] D. Perrin, Finite automata, in Handbook of Theoretical Computer Science, J. van Leeuwen, ed., vol. B, Elsevier, 1990, ch. 1.

[32] C. Reutenauer, N-rationality of zeta functions, Adv. in Appl. Math., 29 (1997), pp. 1–17.

[33] A. Salomaa and M. Soittola, Automata-Theoretic Aspects of Formal Power Series, Springer-Verlag, 1978.

[34] T. Scharf and J.-Y. Thibon, On Witt vectors and symmetric functions, Algebra Colloq., 3 (1996), pp. 231–238.
