• Aucun résultat trouvé

Languages and scanners

N/A
N/A
Protected

Academic year: 2022

Partager "Languages and scanners"

Copied!
21
0
0

Texte intégral

(1)

Languages and scanners

Dani`ele Beauquier and Jean-Eric Pin

In this paper, we are interested in a special class of automata, called scanners. These machines can be considered as a model for computations that require only “local” information. Informally, a scanner is an automaton equipped with a finite memory and a “sliding” window of a fixed length. In a typical computation, the sliding window is moved from left to right on the input, so that the scanner can remember the factors of length smaller than or equal to the size of the window. In view of these factors, the scanner decides whether or not the input is accepted or rejected.

a1 a2 a3 a4 a5 · · · an

Finite Memory

Figure 0.1: A scanner.

Scanners have been used for a long time in language theory. Every- one knows the local languages which occur for instance in the theorem of Chomsky-Sch¨utzenberger on context-free languages. Roughly speaking, a local language is described by the factors of length 2 of its words. For in- stance, ifA={a, b, c, d}, the language c(ab)+d is the set of all words whose set of factors of length 2 is exactly{ca, ab, ba, bd}. The locally testable lan- guages generalize local languages : the membership of a given word in such a language is determined by the set of factors of a fixed lengthk (the order in which these factors occur and their frequency is not relevant) of the word, and by the prefixes and suffixes of length< kof the word. These conditions can be tested by a scanner. The locally testable languages are character-

LITP/IBP, Universit´e Paris VI et CNRS, Tour 55-56, 2 place Jussieu, 75251 Paris Cedex 05, FRANCE.e-mail: beauquier@litp.ibp.fr, pin@litp.ibp.fr

(2)

ized by a deep algebraic property of their syntactic semigroup, discovered independently by Brzozowski-Simon [2] and McNaughton [7].

There are several possible variations on this definition. First, one can drop the conditions about the prefixes and suffixes. Membership in this type of language, that we call strongly locally testable (SLT), is determined only by factors of a fixed length k. Thus, a language is SLT if and only if it is a finite boolean combination of languages of the formAwA, where wis a word. Surprisingly, this rather natural family of languages does not seem to have been considered previously in the literature. We show that this family is also decidable and characterized by another nice algebraic property. But this time, the syntactic semigroup does not suffice, and a property of the image of the language in its syntactic semigroup is needed.

A second natural extension is to take in account the number of occur- rences of the factors of the word. However, since we want to use finite automata to recognize our languages, we can only count factors up to a cer- tain threshold. Threshold counting is the favorite way of counting of small children : they can distinguish 0, 1, 2, . . . but after a certain numbern(the threshold), all numbers are ”big”. From a more mathematical point of view, two positive integers sand t are congruent threshold n ifs=t or if s≥n and t ≥n. This defines the threshold locally testable languages (TLT). A combination of two deep results of Straubing [11] and Th´erien and Weiss [13]

yields a syntactic characterization of these languages. In view of the results of the previous paragraph, it is reasonable to think that similar results hold if one drops the conditions about the prefixes and suffixes. However, we are not yet able to solve this problem.

The families of languages we have introduced are also deeply connected with the study of first order theory of successor, with a predicate for each letter, interpreted on finite words. Indeed Thomas [14] proved that the languages definable in this logic are exactly the TLT languages. Since we have an effective syntactic characterization of these languages, we derive the following decidability result : given a monadic second order formula ϕ of the theory of successor, it is decidable whetherϕ is equivalent to a first order formula. We also show that the languages definable by a boolean combination of existential formulas are exactly the SLTT languages.

Finally, scanners can also be used to define sets of infinite (or even bi- infinite) words. This is technically more difficult and will be the subject of a future paper. Most of the results of the present paper as well as those of this future paper have been presented, without proofs, at the ICALP in Stresa, 1989 [1].

(3)

1 Preliminaries

1.1 Words.

LetA be a finite alphabet. We denote byA the set of words over A, and by A+ the set of non empty words. Ifu is a word of length≥k, we denote byupk andusk, respectively, the prefix and suffix of lengthk ofu. Ifu and x are two words, we denote by u

x

the number of occurrences of the factor x inu. For instance abababa

aba

= 3, since aba occurs in three different places inabababa: abababa, abababa, abababa.

1.2 Finite semigroups.

Recall that a semigroup is a set S equipped with an associative multipli- cation. All semigroups considered in this paper are finite, except for free semigroups and free monoids. Therefore the word “semigroup” will mean

“finite semigroup” in the rest of the paper. An elementeof a semigroupS is idempotent ife2=e. The set of idempotents of a semigroupS is denoted by E(S). Every non-empty finite semigroup contains at least one idempotent.

IfS =E(S), S is an idempotent semigroup.

A monoid is a semigroup with an identity. Let S be a semigroup. We denote by S1 the monoid equal to S if S has an identity, and to S∪ {1}, where 1 is a new identity, otherwise.

Given two semigroups S and T, a semigroup morphism ϕ : S → T is a function from S into T such that, for every s, s0 ∈ S,(sϕ)(s0ϕ) = (ss0)ϕ.

Recall the definitions of the Green’s relationsR, LandD: sRtif and only if there exists u, v ∈ S1 such that su =t and tv = s, s L t if and only if there existsu, v ∈S1 such thatus=tand vt=s, sDt if and only if there existsu∈S1 such that sRu anduLt. We denote by ≤J the preorder on S defined by s≤J t if and only if there exists u, v ∈S1 such that usv=t.

One can show that sDt if and only if s ≤J t and t ≤J s. We shall need the following technical result.

Lemma 1.1 Let S and T be two finite semigroups, let π : S → T be a surjective morphism. Let t and t0 be two elements of T such that t R t0 (respectively t L t0, t D t0). Then there exist s, s0 ∈ S such that sπ = t, s0π=t0 and sRs0 (respectively sLs0,sDs0).

Proof. We give the proof forR only, but the proof for Land D is similar.

Lettandt0 be two elements ofT such that tRt0. Sinceπ is surjective, the set tπ−1 is non empty. Choose a minimal elements (for the preorder≤J) intπ−1. SincetRt0, there existu, v ∈T1such that t0=tuandt=t0v. Let x, y∈S1be such thatxπ=uandyπ=v. Then (sxy)π=tuv=t0v=tand sxy≤J sx≤J s. Thus, by the choice ofs, we havesxyDs. It follows, by a

(4)

standard argument ([9], proposition 1.4, p.47),sxyRs, whences0=sxRs.

This proves the lemma, sinces0π =tu=t0.

Let S be a finite semigroup. A local subsemigroup of S is a subsemi- group of S of the form eSe, where e ∈ E(S). A semigroup is said to be locally trivial, (respectively locally commutative, locally idempotent, locally a group, etc.) if all the local subsemigroups of S are trivial (respectively commutative, idempotent, groups, etc.). For instance, a semigroupS is lo- cally idempotent and commutative if, for each e∈E(S) and each s, t∈S, (ese)2 = (ese) and (ese)(ete) = (ete)(ese).

1.3 Semidirect products.

Let M be a monoid and let T be semigroup. We write the product in M additively to provide a more transparent notation , but it is not meant to suggest that M is commutative. A left action of T on M is a map (t, m) → tm from T ×M into M such that, for all m, m1, m2 ∈ M and t, t1, t2 ∈T, (t1t2)m=t1(t2m) and t(m1+m2) =tm1+tm2, t0 = 0 Given such a left action, the semidirect product of M andT (with respect to this action) is the semigroup M ∗T defined on the set M ×T by the product (m1, t1)(m2, t2) = (m1+t1m2, t1t2).

1.4 Counting semirings.

We define the following congruences onN. x ≡y threshold t(also denoted x≡t y) if and only if (x < tand x =y) or (x≥tand y ≥t). For instance the equivalence classes of ≡4 are {0}, {1}, {2}, {3}, {4,5,6,7, . . .}. The quotient semiring Nt = N/≡t is called a counting semiring. In particular, the boolean semiringB=N1 is a counting semiring.

2 Scanners and languages defined by factors of words.

2.1 Some equivalences and congruences.

Factors of words can be used in many different ways to define families of languages. We have selected four of them, which form the subject of this article. For everyk, t >0, let≡k,t be the equivalence of finite index defined on A+ by setting u ≡k,t v if and only if, for every word x of length ≤ k, u

x

t

v

x

. For instance, u≡k,1v if and only ifu and v have the same sets of factors of length k, and u ≡k,5 v if and only if u and v have the same factors of lengthk, but counted with multiplicity threshold 5.

(5)

Example 2.1 abababab≡2,3abababasinceababababcontains 4 (≡3 thresh- old 3) occurrences of ab and 3 (≡3 threshold 3) occurrences of ba, and no occurrences ofaa (respectivelybb).

We also define a congruence∼k,t of finite index onA+ by settingu∼k,t vif (a) uand v have the same prefixes of length < k,

(b) uand v have the same suffixes of length< k, (c) u≡k,t v.

These four equivalences define four classes of languages. A subset ofA+ is locallyk-testable if it is union of∼k,1-classes. It isthreshold locallyk-testable if it is union of ∼k,t-classes for some t. If one replaces in the previous def- initions the congruence∼k,t by the equivalence ≡k,t, one defines the corre- sponding notions ofstrongly locally k-testable andstrongly threshold locally k-testable languages.

A subset of A+ is locally testable (LT) if it is locallyk-testable for some k > 0. The notions of threshold locally testable, strongly locally testable, strongly threshold locally testable languages are defined similarly. The cor- responding abreviations are TLT, SLT, and STLT.

2.2 A combinatorial description.

One can also define these classes as boolean algebras. Set, for x∈A+, and r, t≥0,

L(x, r, t) ={u∈A+| u

x

t r}

Thus L(x, r, t) is the set of all words u containing r occurrences of the factor x, but counted threshold t. For instance, L(x,1,1) = AxA, and L(x,0,1) =A+\AxA.

Proposition 2.1 Let L be a subset ofA+.

(1) L is SLT if and only if it is a boolean combination of languages of the form L(x,1,1),

(2) L is STLT if and only if it is a boolean combination of languages of the formL(x, r, t),

(3) L is LT if and only if it is a boolean combination of languages of the form L(x,1,1), xA, or Ax,

(4) L is TLT if and only if it is a boolean combination of languages of the form L(x, r, t), xA, or Ax.

Proof. We only recall the proof of (1), which is standard. The proof of the other statements can be easily adapted. For each u∈A+, put

Ek(u) ={x∈A+ |xis a factor of length k of u} and Fk(u) =Ak\Ek(u)

(6)

Then the equivalence class of uwith respect to ≡k,1 is the set S(u) = \

x∈Ek(u)

AxA\ [

x∈Fk(u)

AxA

This shows that S(u) is a boolean combination of languages of the form AxA.

Conversely, let L be a finite boolean combination of languages of the formAxA. Let k be the maximal length of the wordsx occurring in such a boolean expression. It suffices to show that if |x| ≤ k, then AxA is a union of ≡k,1-classes. But this is clear, since if u ∈ AxA and u ≡k,1 v, thenu andv have the same factors of length |x|, whence v∈AxA.

The relations between the four classes is shown in the following diagram.

SLT

ST LT

T LT

LT

Example 2.2 LetA={a, b}. Then (ab)+ is locally testable since (ab)+= (aA∩Ab)\(AaaA ∪AbbA). We shall see in section 3 that (ab)+ is not strongly locally testable.

Example 2.3 Let A = {a, b}. Then the set aba is strongly threshold locally testable : it is the set of all words containing exactly one occurrence ofb. We shall see in section 3 that aba is not locally testable.

2.3 Scanners.

Our four classes of languages can also be defined in terms of a special class of finite automata, the scanners. The informal definition of a scanner has been given in the introduction. There are actually two types of scanners, depending on the use of the window : normally, a window of sizekis allowed to read only factors of lengthk. If the scanner is unbounded, we allow the window to be also moved beyond the first and the last letter of the word, so that the prefixes and suffixes of length< k can be read. For instance, if k= 3, and u=abbaaabab, different positions of the window are represented in the following diagrams :

a bbaaabab ab baaabab abb aaabab a bba aabab · · · abbaaaba b

(7)

We now give the formal definition. A scanner on a counting semiring K is a triple S = (A, k, F) where A is a finite set (the alphabet), k is a positive integer (the size of the window),F is a (finite) set of polynomials of KhAi of degreek, called the memory. Letu∈A+be a word. Letf :A+ →KhAi be the application defined by

uϕ= X

|x|=k

u x

x

Thusuϕis just the formal sum of all factors of lengthkofu, with multiplicity counted inK. For instance, ifA={a, b}, K=N3, and k= 3,

(abbabbaabababbabbab)ϕ= 4abb+ 4bba+ 5bab+baa+aab+ 2aba

= 3abb+ 3bba+ 3bab+baa+aab+ 2aba A wordu is accepted byS if and only if uf ∈F.

A scanner on the boolean semiring is aboolean scanner. In the case of an unbounded scanner, the window is allowed to read the prefixes and suffixes of length< k. To represent this information, one introduces two new functions π, σ : A+ → KhAi defined by uπ = P

t<kupt and uσ = P

t<kust. The memory is now a triple (P, F, S) of sets of polynomials of KhAi of degree

≤k. Intuitively, P codes the set of possible prefixes, F the set of possible factors, andS the set of possible suffixes. A wordu is accepted byS if and only if uπ∈P, uϕ∈F and uσ∈S.

Proposition 2.2 Let L be a subset ofA+.

(1) L is SLT if and only if it is accepted by a boolean scanner, (2) L is STLT if and only if it is accepted by a scanner,

(3) L is LT if and only if it is accepted by an unbounded boolean scanner, (4) L is TLT if and only if it is accepted by an unbounded scanner, Proof. Again, we only give the proof for (1), but the other proofs are similar. Let S = (A, k, F) be a boolean scanner recognizing a subset L of A+. Observe that u ≡k,1 v if and only if uf = vf. It follows that L is union of ≡k,1-classes. Conversely, if L is union of ≡k,1-classes for some k, put F = uf |u∈L. Then the boolean scanner S = (A, k, F) recognizes L.

3 Syntactic characterizations.

In this section, we give effective characterizations for three of the four fam- ilies of languages introduced above. In order to keep a uniform notation

(8)

for the subsequent statements, we shall denote by S(L) the syntactic semi- group of a recognizable language L of A+, by η : A+ → S(L) its syntactic morphism, and by P(L) =Lη the syntactic image of L. We need first to introduce some algebraic tools.

3.1 Varieties of semigroups.

A variety of (finite) semigroups is a class of semigroups closed under taking subsemigroups, morphic images (or quotients) and finite direct products.

Varieties of monoids are defined similarly. The following varieties will be used in this paper.

• J1, the variety of all idempotent and commutative monoids,

• Com, the variety of all commutative monoids,

• Acom, the variety of all commutative aperiodic monoids,

• LI, the variety of locally trivial semigroups,

• LIk, the variety of all semigroupsS that satisfy the equation x1x2· · ·xkxx1x2· · ·xk=x1x2· · ·xk

for allx, x1, x2, . . . , xk∈S.

• LJ1, the variety of locally idempotent and commutative semigroups, Given a variety of monoidsVand a variety of semigroupsW, we denote byV∗Wthe variety of semigroups generated by all the semidirect products of the formM∗T, whereM ∈V and T ∈W.

3.2 Varieties of languages.

LetVbe a variety of semigroups (monoids). One associates to each alphabet Athe setA+V (AV) of all languages ofA+(A) whose syntactic semigroup (monoid) belongs to V. V is called the variety of languages corresponding to V. A description of various varieties of languages can be found in the litterature [5, 6, 9].

Proposition 3.1 For each alphabet A,

(1) A+J1 is the boolean algebra generated by the languages of the form AaA where a∈A,

(2) A+LIk is the boolean algebra generated by the languages of the form Au, vA where u and v are words of length ≤k,

(3) A+LI is the boolean algebra generated by the languages of the form Au, vA where u, v ∈A+,

(9)

(4) AAcom is the boolean algebra generated by the languages of the form {u∈A | |u|a =r}, where a∈A andr ∈N.

For now, we need a description, due to Straubing [11], of the varieties of languages corresponding to V∗LI and toV∗LIk, whenV is a variety of monoids.

Let k be an integer, and let Bk=Ak. To avoid ambiguity, words ofBk will be represented by finite sequences (b1, b2,· · · , bn), where eachbi ∈Ak. We define a (sequential) function σk :A+→Bk by setting

k= 1 if |w| < k,

(wa)σk = (wσk,(wa)sk) if |w| ≥k and a∈A.

For example, (abbaab)σ3 = (abb, bba, baa, aab). Thusσkassociates to a word u the sequence of words appearing on a window of size k when u is read from left to right.

To each congruence α of finite index on Bk, associate the congruence (α, k) onA+ defined by u(α, k)v if and only if

(a) uand v have the same prefixes of length < k, (b) uand v have the same suffixes of length< k,

(c) uσkαvσk

Denote by V and Wk the varieties of languages corresponding toV and V∗LIk, respectively.

Theorem 3.2 [11]A language belongs toA+Wk−1if and only if it is a finite union of(α, k)-classes for some congruenceα on Bk such that Bk/α∈V.

We give an equivalent form of Theorem 3.2 in terms of boolean algebras.

Corollary 3.3 For every alphabet A, A+Wk−1 is the boolean algebra gen- erated by the languages of the form Au, vA (where u and v are words of length < k) or Xσ−1k , whereX ∈BkV.

Proof. In one direction, it suffices to show that each of the languages Au, vA andXσk−1 belong toA+Wk−1. First, if|u| < k, then, by proposition 3.1, S(Au), S(uA)∈LIk−1 ⊂V∗LIk−1. Furthermore, sinceσk is a sequential function, a general result ([5], Chap. 6) states that S(Xσ−1k ) divides a wreath product of the form M(X)◦S(σk), where S(σk) is the syntactic invariant of σk. Now M(X) ∈ V since X ∈ BkV, and S(σk) ∈ LIk−1. It follows that S(Xσk−1)∈V∗LIk−1.

Conversely, if L∈A+Wk−1, then by Theorem 2.5, L is union of (α, k)- classes for a certain congruence α such that Bk/α ∈ V. Now, it follows from the definition of (α, k) that the equivalence classes of (α, k) are boolean combinations of sets of the formAu,vA(whereuandvare words of length

(10)

< k) or Xσ−1k , whereX is an equivalence class forα. But sinceBk/α∈V, one hasX ∈BkV.

For the variety V∗LI, one has the following result.

Theorem 3.4 Let S be a semigroup. Then S ∈ V∗ LI if and only if S∈V∗LIk, wherek =|S|.

Applying these results withV=AcomandJ1 respectively, one obtains the following corollary.

Corollary 3.5

(1) L is locally testable if and only ifS(L) belongs toJ1∗LI,

(2) Lis threshold locally testable if and only ifS(L)belongs toAcom∗LI, Proof. We only prove (2), the proof of (1) beeing similar. LetLbe a subset ofA+. By Corollary 3.3 and Theorem 3.4,S(L)∈Acom∗LIif and only ifL is a boolean combination of languages of the formAu,vA orXσ−1k , where X ∈BkAcom. Furthermore, by Proposition 3.1, X ∈BkAcomif and only ifX is a boolean combination of languages of the form{u∈Bk | |u|x =r}, wherex∈Bk and r∈N. Now we have

{u∈A | |u|x =r}σk−1 ={u∈A| u

x

=r}=L(x, r, t) for every t > r.

Therefore S(L) ∈ Acom ∗LI if and only if L is a boolean combination of languages of the formAu, vA or L(x, r, t), that is, if and only if L is threshold locally testable.

Note that corollary 3.5 does not give an algorithm to decide whether a language is LT, TLT or PLT. Indeed, it is not clear at first sight whether one can decide if a given semigroup belongs to J1∗LIorAcom∗LI. But Brzozowski and Simon [2] and Mac Naughton [7] shaw independently that J1∗LI=LJ1. Therefore we have

Theorem 3.6 [2, 7, 11] L is locally testable if and only if S(L) belongs to LJ1.

The syntactic characterization of locally threshold testable languages is more involved and depends on a deep result of Th´erien and Weiss [13]. Given a semigroup S, form a graphG(S) as follows : the vertices of G(S) are the idempotents of S, and the edges from e to f are the elements of the form esf.

Theorem 3.7 [13]A semigroup S belongs to Acom∗LIif and only ifS is aperiodic and its graph satisfies the following condition (C): if p and r are edges from eto f, and if q is an edge from f to e, then pqr=rqp.

(11)

e f p, r

q

Figure 3.2: The conditionpqr=rqp.

Therefore, one obtains

Corollary 3.8 A language Lis threshold locally testable if and only ifS(L) is aperiodic and its graph satisfies condition (C).

Example 3.1 LetA={a, b}, and let L=aba. Then Lis recognized by the automaton

1 a

2 a

b

Figure 3.3: The minimal automaton ofaba. The transitions are given in the following table

1 2

a 1 2

b 2 −

bb − −

Therefore, S = S(L) is presented by the relations a = 1, b2 = 0. Thus S = {a, b,0}, where a = 1 is the identity, and E(S) = {1,0}. The local semigroups are 0S0 = {0}, and 1S1 = S. This last local semigroup is not idempotent, sinceb26=b. Therefore,Lis not locally testable. On the other hand,G(S) is the graph represented below, which satisfies condition (C).

1, b,0 1 0 0

0

0

Figure 3.4: The graph ofS.

ThereforeL is TLT (see example 2.2).

The three classes of languages we have considered so far were character- ized by an algebraic property of their syntactic semigroup. Such a property

(12)

do not suffice, however, to characterize the class of strongly locally testable languages. To overcome this difficulty, we need to consider not only the syntactic semigroup, but also the syntactic image of the language.

LetSbe a semigroup and letP be a subset ofS. We say thatP saturates theD-classes ofS if, for every D-classD ofS, D∩P 6=∅impliesD ⊂P. It is equivalent to say thats∈P andsDtimplyt∈P. The next proposition shows that this property is stable under quotients.

Proposition 3.9 Let S and T be two semigroups, let π : S → T be a surjective morphism. LetPS (respectively PT) be a subset of S (respectively T) such that PT =PSπ and PS =PTπ−1. Then PS saturates the D-classes of S if and only if PT saturates the D-classes of T.

Proof. Suppose thatPT saturates theD-classes ofT. Lets∈PS ands0∈S such that s D s0. Then sπ D s0π, sπ ∈ PT and therefore, s0π ∈PT. Thus s0∈PTπ−1=PS, and PS saturates the D-classes of S.

Conversely, suppose that PS saturates the D-classes of S. Let t, t0 ∈ T such that t∈PT and tDt0. By Lemma 1.1, there existss, t∈S such that sπ=t, s0π=t0 and sDs0. It follows thats∈PTπ−1 =PS, whences0 ∈PS

and s0π =t0 ∈PSπ=PT.

Theorem 3.10 A languageLis strongly locally testable if and only if S(L) is locally idempotent and commutative and P(L) saturates the D-classes of S(L).

Proof. To simplify notations, putS =S(L), P =P(L) and denote by ∼k

the congruence ∼k,1. If L is SLT, then L is also LT and thus S is locally idempotent and commutative by Theorem 3.6. Furthermore,Lis a boolean combination of languages of the formAxA. Letk be the maximal length of the words x occurring in this boolean combination. Then, if u ∈L and ifu and v have the same factors of length ≤k, then v ∈L. We claim that P saturates the R-classes of S. Let s ∈ P and let t ∈ S such that s R t.

Then there exist some elements x, y ∈S(L)1 such that sx =t and ty =s.

Let s0 ∈ A+, x0, y0 ∈A be words such that s0η = s, x0η = x and y0η =y (if x = 1 or y = 1, we take x0 = 1 or y0 = 1, respectively). Now the word s0(x0y0)k belongs to L, since (s0(x0y0)k)η = s(xy)k = s. Furthermore, the wordss0(x0y0)kand s0(x0y0)kx0 contain the same factors of length≤k. This is obvious ifx0 = 1. Suppose nowx0∈A+. Then every factor ofs0(x0y0)k is clearly a factor of s0(x0y0)kx0. Conversely, let t be a factor of length ≤k of s0(x0y0)kx0. Then eithertis a factor ofs0(x0y0)k, ortis a factor of (x0y0)k−1x0, since |(x0y0)k−1| ≥ k−1. But (x0y0)k−1x0 itself is a factor of s0(x0y0)k and thustis a factor ofs0(x0y0)k. It follows, by the remark above, thats0(x0y0)kx0 belongs to Land hence (s0(x0y0)kx0)η =s(xy)kx=sx=t∈P, proving the

(13)

claim. A dual argument would show that P saturates the L-classes, and henceP also saturates the D-classes.

Conversely, assume that S is locally idempotent and commutative and thatP saturates theD-classes ofS. Then,L is locally testable by theorem 3.6, and thus is union of ∼k-classes for some k. Put Sk = A+/∼k. Then there is a surjective morphismπk :Sk→S, and a subsetQof Sk such that

(a) L=Qπ−1k and Q=Lπk, (b) Q=P π−1 andP =Qπ.

Now, by Proposition 3.9, Q saturates the D-classes of Sk. To finish the proof, we need a result on theD-classes ofSk. Denote by Fk(u) the set of factors of lengthk of a word u.

Lemma 3.11 Letuandv be two words ofA+. ThenuπkDvπkif and only if either u=v or Fk(u) =Fk(v)6=∅ (this case implies that u and v are of length ≥k).

Proof. We first treate the case|u|< k (respectively|v|< k). IfuπkDvπk, then there exist four wordsx, y, s, t∈A such thatxuy∼kv and svt∼ku.

This impliessvt=u, whence|v|< k, and thus xuy=v, so that u=v.

Suppose now |u|,|v| ≥ k. IfuπkDvπk, there exist two wordsx, y∈A such that xuy ∼k v. In particular, every factor of length k of u is a factor ofv, and, by a dual argument,Fk(u) =Fk(v).

Conversely, assume that Fk(u) = Fk(v) and let p (respectively s) be the prefix (suffix) of length k of u. Then p (respectively s) occurs in v, so that v = v0pv1 =v2sv3. Put w = v0uv3. We claim that w ∼k v. Indeed v0p (respectively sv3) is a common prefix (suffix) of v and w. Next, since Fk(u) =Fk(v), each factor of lengthk ofvis a factor ofuand hence a factor ofw. Conversely, lettbe a factor of length k ofw. Thentis either a factor ofv0p, a factor ofu, or a factor of sv3. In each case, it is also a factor ofv, which proves the claim. Thereforevπk= (v0πk)(uπk)(v3πk), and, by a dual argument,uπkDvπk.

We now conclude the proof of Theorem 3.10. We start with the equality L=Qπk−1 and we distinguish two categories of elements inQ. Put

Q1 ={s∈Q| every word ofsπ−1k is of length≥k}and Q2 ={s∈Q| there exists a word ofsπ−1k of length< k}

Then, sinceQ=Q1∪Q2, we have

L=Q1πk−1∪ [

s∈Q2

k−1

and we shall prove separately that the languages Q1πk−1 and sπk−1, for s∈ Q2, are SLT.

(14)

Lets∈Q2. Thensπ−1k contains a worduof length< k, andsπk−1={u}, since u cannot be equivalent to another word. But

{u}=AuA\ [

a∈A

AuaA∪ [

a∈A

AauA

!

and thus sπ−1k is strongly locally testable.

Since Q saturates the D-classes of Sk, and since, by Lemma 3.11, an element ofQ2 cannot beD-equivalent with an element ofQ1, Q1 is a union ofD-classes. Furthermore, Lemma 3.11 shows that aD-classDcontained in Q1 is entirely characterized by a certain non empty setF of words of length k. More precisely Dπ−1k ={u∈A+ | Fk(u) =F}. It follows that Dπk−1 is strongly locally testable, since

{u∈A+|Fk(u) =F}= \

x∈F

AxA\[

x∈Ak\F AxA

Finally,Lis a finite union of SLT languages, and thus is also strongly locally testable.

Example 3.2 Let A = {a, b, c}, and let L = c(ab) ∪c(ab)a. Then L is recognized by the following automaton

1 c 2 3

a

b

Figure 3.5: An automaton recognizing L=c(ab)∪c(ab)a.

The transitions are given in the following table

1 2 3

a − 3 −

b − − 2

c 2 − −

ab − 2 − ba − − 3 ca 3 − −

Therefore,S(L) is presented by the relations cabc=c, a2 =b2=c2 =bc= ac= 0. TheD-class structure is represented in the following diagram, where the grey box is the image of L.

(15)

ab a b ba

c ca

0

Figure 3.6: The J-class structure.

ThusP(L) saturates theD-classes, andL is SLT. In fact,

L=AcA\(AaaA∪AacA∪AbbA∪AbcA∪AcbA∪AccA).

The next statement summarizes the results of this section.

Corollary 3.12 For a given recognizable subsetLofA+, the following prop- erties are effectively decidable :

(1) L is locally testable,

(2) L is threshold locally testable, (3) L is strongly locally testable.

In view of these results, it is natural to conjecture that one can also decide whether a given language is STLT, but we don’t have a proof of this fact.

4 Connections with logic.

The connections between formal languages and mathematical logic were first studied by B¨uchi [3]. To each word u∈A+ is associated a structure

Mu = ({1,2, . . . ,|u|}, S,(Ra)a∈A)

whereS denotes the successor relation on{1,2, . . . ,|u|} andRa is set of all isuch that thei-th letter of u is ana. For instance, ifA ={a, b} and u= abaab, thenRa={1,3,4}andRb={2,5}. The logical language appropriate to such models hasS and theRa’s as non logical symbols, and formulas are built in the standard way by using these non-logical symbols, variables, boolean connectives, equality and quantifiers. Note that we don’t use the symbol < in this logic. We shall denote byL1(A) and L2(A), respectively, the set of first order and monadic second order formulas of this logic. Given

(16)

a sentenceϕ, we denote by L(ϕ) the set of all words which satisfy ϕ, when ϕis interpreted according to the model described above. For instance, if

ϕ=∃x∃y∃z((y=Sx)∧(z=Sy)∧Rax∧Rby∧Rbz) thenL(ϕ) =AabbA.

The seminal result of B¨uchi can now be stated as follows

Theorem 4.1 [3]A subset of A+ is rational if and only if it can be defined by a L2(A)-sentence.

The first order theory was investigated by Thomas [14]. Thomas proved that a language is TLT if and only if it is definable by a L01(A)-sentence, whereL01(A) is the logical language obtained by completingL1(A) with the 0-ary symbols min, max, interpreted as the minimum 1 and the maximum

|u|on{1, . . . ,|u|}. Furthermore, Thomas proved that boolean combinations of existentialL01(A)-sentences were sufficient to define TLT-languages. Now, it is easy to define min and max in terms ofS. For instance, one can consider min as a new variable satisfying the formula∀x¬S(x, min). Therefore, one obtains

Theorem 4.2 A subset of A+ is threshold locally testable if and only if it is definable by aL1(A)-sentence.

However, the situation is slightly different if one considers only boolean combinations of existentialL1(A)-sentences, because rewriting min in terms ofS creates some alternations of quantifiers. More precisely, we have Theorem 4.3 A subset of A+ is strongly threshold locally testable if and only if it is definable by a boolean combination of existentialL1(A)-sentences.

Proof. IfLis a STLT language,Lis a boolean combination of languages of the formL(x, r, t). Therefore it suffices to show that each of these languages can be defined by a boolean combination of existential L1(A)-sentences.

The formal proof can easily be adapted from the following example, where A={a, b}. One hasL(ab,2,3) =L(ϕ), whereϕ=ϕ1∧ ¬ϕ2, and

ϕ1 =∃x1∃x2∃x3∃x4(¬(x1 =x3)∧S(x1, x2)∧S(x3, x4)

∧Rax1∧Rbx2∧Rax3∧Rbx4)

ϕ2 =∃x1∃x2∃x3∃x4∃x5∃x6(¬(x1 =x3)∧ ¬(x3=x5)∧ ¬(x1 =x5)

∧S(x1, x2)∧S(x3, x4)∧S(x5, x6)

∧Rax1∧Rbx2∧Rax3∧Rbx4∧Rax5∧Rbx6)

Conversely, it suffices to show that a language L defined by an existential L1(A)-sentence ϕ is SLTT. We use an argument of game theory, which

(17)

we borrow from [14]. For the conveniance of the reader, we briefly review the terminology of game theory needed to achieve the proof (see [10] for more details). Let u = u1· · ·ur and v = v1· · ·vs be two words, where u1, . . . , ur, v1, . . . , vs∈A. A position inu (respectively inv) is a an element of {1, . . . ,|u|} (respectively {1, . . . ,|v|}). For each m > 0, define a game Gm(u, v) between two players as follows. Player I plays first, and choosesm positionsi1, . . . , im of u. Then player II must choose m positionsj1, . . . , jm inv. We say that II wins the game if the following conditions are satisfied : (1) For 1≤r ≤m, the letteruir is equal to the lettervjr. (Intuitively, if I chooses an occurrence of a lettera, then II should choose an occurrence of the same letter.)

(2) For each r, s ≤ m, ir = is if and only if jr = js. (Intuitively, if I decides to choose twice - or more - the same position, then II should follow this choice. Conversely, II is not allowed to choose twice the same position if it was not the choice of I.)

(3) For each r, s≤m, ir =is+ 1 if and only if jr =js+ 1. (Intuitively, if I decides to choose two adjacent positions, then II should follow this choice. Similarly, if I chooses two non-adjacent positions, then II should follow this choice.)

Now, by the theory of Ehrenfeucht-Fra¨ıss´e, two words u and v satisfy the same existential formulas of quantifier rank≤mif and only if Player II has a winning strategy in the games Gm(u, v) and Gm(v, u).

The main argument of the proof is the following lemma.

Lemma 4.4 Letn= 2m+ 1 andt= (m−1)(2m+ 1) + 1. Ifu≡n,tv, then Player II has a winning strategy in the gamesGm(u, v) andGm(v, u).

Proof. By symmetry, it suffices to prove the result for the game Gm(u, v).

The strategy of II is easier to understand if one thinks that the choice ofik

(respectivelyjk) also determines the segment

Ik= [1, . . . ,|u|]∩[ik−2m−k, ik+ 2m−k]

(respectively Jk = [1, . . . ,|v|]∩[jk−2m−k, jk+ 2m−k]). The strategy of II consists to choose jk so that the following conditions are satisfied :

(a) u[Ik] =v[Jk],

(b) For every s < k, is ∈ Ik if and only if js ∈ Jk. In this case Ik is a subsegment ofIs and Jk is the corresponding subsegment of Js. We prove by induction onk thatjkcan be choosen so that conditions (a) and (b) are satisfied. For k = 1, condition (b) is empty, and condition (a) can be fulfilled becauseuandvhave the same factors of lengthnby hypothesis.

Assume thatjs has been choosen successfully for s < k. We now choose jk as follows.

(18)

First, assume there exists s < k such that is ∈ Ik, and let us take the smallests satisfying this condition. Then

is−2m−s ≤(ik+ 2m−k)−2m−s≤(ik+ 2m−k)−2.2m−k =ik−2m−k and, symmetrically, ik+ 2m−k ≤is+ 2m−s. Therefore Ik is a subsegment of Is and we can take for Jk the corresponding subsegment of Js. Since u[Is] = v[Js], we have u[Ik] = v[Jk]. Now if i0s ∈ Ik for some s0 such that s < s0 < k, thend(is, is0)≤d(is, ik)+d(ik, is0)≤2.2m−k≤2m−s0. Therefore Is0 is a subsegment ofIs , as illustrated in the following diagram, andJs0 is the corresponding subsegment ofJs0.

Is

Ik

Is0

is ik is0

A similar argument would show that, if js0 ∈ Jk, then is0 ∈ Ik. Thus conditions (a) and (b) are satisfied in this case.

Now suppose that is ∈ Ik for every s < k. We claim that there exists at least one occurrence of the factor x =u[Ik] in v, defining a segment Jk

such thatv[Jk] =x and js ∈Jk for every s < k. Indeed, assume that every segment Jk such that v[Jk] = x satisfies js ∈ Jk for some s < k, that is, jk ∈[js−2m−k, js+ 2m−k]. Then one can bound the number of occurrences ofx inv as follows:

v x

≤ X

0<s<k

(2m−k+1+ 1)) = (k−1)(2m−k+1+ 1)< t Now, since |x| ≤ n, we have by hypothesis u

x

v

x

threshold t, so that u

x

= v

x

. But each segment J inv such that v[J] =x and js ∈ J defines a segment I in u such that u[I] =x and is ∈I, since u[Is] = v[Js] by the induction hypothesis.

(19)

Is

I

Js

J u

v

Furthermore, this application is injective, that is, the situation repre- sented in the figure below can never occur.

Is Is0

I =I0

Js Js0

J J0

u

v

Indeed, we have seen above that if, for instance, s < s0, I = I0 implies that Is0 is a subsegment of Is. Therefore, Js0 must be a subsegment of Js and thusJ =J0. It follows that each occurrence ofxinvthat is a factor of somev[Js] is in one-to-one correspondence with an occurrence ofxinuthat is a factor ofu[Is]. In particular,u

x

>v

x

, a contradiction. This proves the claim, and conditions (a) and (b) can be satisfied.

Now, it is easy to verify that this choice ofj1, . . . , jmis a winning strategy for player II.

We now conclude the proof of theorem 4.3. Assume that Lis defined by an existential sentence ϕof quantifier rank m. Ifu≡n,tv, then Lemma 4.4 and the theorem of Ehrenfeucht-Fra¨ıss´e shows thatu satisfiesϕif and only ifv satisfies ϕ. This means thatL(ϕ) is a union of ≡n,t-classes, and hence L(ϕ) is strongly locally threshold testable.

(20)

5 Remarks.

There are a few extensions that were not considered in order to keep this pa- per to a reasonable size. The first possibility would be to introduce modulo counting. If one considers modulo counting only, the notions of “periodi- cally locally testable language” and “modular scanner” can be easily defined, and the syntactic characterization follows from the works of Straubing and Th´erien (the condition would be thatS(L) is locally a commutative group).

One can also give a logical interpretation if one allows the “modular” quan- tifiers considered by Straubing, Th´erien and Thomas [12]. One can also consider simultaneously modulo and threshold counting. The corresponding variety of semigroups would beCom∗LI, for which an effective description has been given by Th´erien and Weiss [13] (a semigroup belongs toCom∗LI if and only if the graph associated with the semigroup satisfies (C)). How- ever, no such decidability results is known for the corresponding “strong”

notions. In conclusion, if one removes the conditions on prefixes and suffixes, nothing is known, except in the boolean case.

The second possible extension is to consider infinite words, and this will be the subject of a future paper.

References

[1] D. Beauquier and J.E. Pin, 1989, Factors of words, inAutomata, Lan- guages and Programming, (G. Ausiello, M. Dezani-Ciancaglini and S. Ronchi Della Rocca, eds.), Lecture Notes in Comput. Sci., 372, Springer, 63–79.

[2] J.A. Brzozowski and I. Simon, 1973, Characterizations of locally testable languages, Discrete Math. 4, 243-271.

[3] J.R. B¨uchi, 1962, On a decision method in restricted second-order arith- metic, inProc. 1960 Int. Congr. for Logic, Methodology and Philosophy of Science, Stanford Univ. Press, Standford, 1–11.

[4] S. Eilenberg, 1974, Automata, languages and machines, Vol. A, Aca- demic Press, New York.

[5] S. Eilenberg, 1976, Automata, languages and machines, Vol. B, Aca- demic Press, New York.

[6] G. Lallement, 1979,Semigroups and combinatorial applications, Wiley, New York.

[7] R. McNaughton, 1974, Algebraic decision procedures for local testabil- ity, Math. Syst. Theor. 8, 60-76.

(21)

[8] D. Perrin and J.E. Pin, First order logic and star-free sets,J. Comput.

System Sci. 32, 1986, 393–406.

[9] J.-E. Pin, 1984 Vari´et´es de langages formels, Masson, Paris; English translation: 1986, Varieties of formal languages, Plenum, New-York.

[10] J.G. Rosenstein, 1982,Linear orderings, Academic Press, New York.

[11] H. Straubing, Finite semigroup varieties of the form V ∗D, J. Pure Applied Algebra 36(1985) 53–94.

[12] H. Straubing, D. Th´erien and W. Thomas, 1988, Regular Languages Defined with Generalized Quantifiers, in Proc. 15th ICALP, Springer Lecture Notes in Computer Science 317, 561–575.

[13] D. Th´erien and A. Weiss, 1985, Graph congruences and wreath prod- ucts,J. Pure Applied Algebra 35, 205–215.

[14] W. Thomas, 1982, Classifying regular events in symbolic logic,J. Com- put. Syst. Sci 25, 360–375.

Références

Documents relatifs

In this section, we will give some properties of these Green functions that we will need later. The proof of the following lemma is based on elementary potential theory on the

We have shown that in controlled Latvian, which is a highly synthetic CNL, where definite and indefinite articles are not used, the topic-focus articulation can be

In this particular example, the product number and the utilization of Northern Sámi and Finnish as particular local resources do not seem to us to afford international mobility

Adult Spanish second language (L2) learners of English and native speakers of English participated in an English perception task designed to investigate their ability to use

Now off-line one counter languages are a special case of off-line pda languages and hence can be accepted by deterministic multitape Turing machines in time « 4 [ 5 ] , We

Eine klare dichotomische Trennung beider Varietäten ist dennoch aus (sozio-)linguistischer Perspektive nicht nachvoll- ziehbar und empirisch nicht zu belegen, da beide Varietäten

In this paper, we present sentence production data from Persian, studying the effect of the length in interaction with factors that are related to conceptual accessibility. While

In order to produce this Atlas module, a large team of scholars has carried out research of various types: collection and processing of exist- ing geographic and demographic data