OLIVIER CARTON, DOMINIQUE PERRIN, AND JEAN-ÉRIC PIN
IRIF, CNRS and Université Paris-Diderot, Case 7014, 75205 Paris Cedex 13, France.
e-mail address: [email protected]
Laboratoire d’informatique Gaspard-Monge, Université de Marne-la-Vallée, 5, boulevard Descartes, Champs-sur-Marne, F-77454 Marne-la-Vallée Cedex 2.
e-mail address: [email protected]
IRIF, CNRS and Université Paris-Diderot, Case 7014, 75205 Paris Cedex 13, France.
e-mail address: [email protected]
ABSTRACT. Difference hierarchies were originally introduced by Hausdorff and they play an important role in descriptive set theory. In this survey paper, we study difference hierarchies of regular languages. The first sections describe standard techniques on difference hierarchies, mostly due to Hausdorff. We illustrate these techniques by giving decidability results on the difference hierarchies based on shuffle ideals, strongly cyclic regular languages and the polynomial closure of group languages.
Dedicated to the memory of Zoltán Ésik.
1. INTRODUCTION
Consider a set E and a set ℱ of subsets of E containing the empty set. The general pattern of a difference hierarchy is best explained by a picture. Figure 1, drawn in the style of Saturn’s rings, represents a decreasing sequence
X1 ⊇ X2 ⊇ X3 ⊇ X4 ⊇ X5
of elements of ℱ. The grey part of the picture corresponds to the set (X1 − X2) + (X3 − X4) + X5, a typical element of the fifth level of the difference hierarchy defined by ℱ. Similarly, the n-th level of the difference hierarchy defined by ℱ is obtained by considering decreasing nested sequences of sets of length n.
Received by the editors Thursday 5th October, 2017 13:23.
1998 ACM Subject Classification: Formal languages and automata theory, Regular languages, Algebraic language theory.
Key words and phrases: Difference hierarchy, regular language, ordered syntactic monoid.
The third author is partially funded from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 670624). The first and third authors are partially funded by the DeLTA project (ANR-16-CE40-0007).
LOGICAL METHODS
IN COMPUTER SCIENCE DOI:10.2168/LMCS-???
© O. Carton, D. Perrin, and J.-É. Pin, Creative Commons
[Figure 1: Five subsets of E, drawn as nested regions labelled X1, X2, X3, X4, X5.]
Difference hierarchies were originally introduced by Hausdorff [12, 13, 14]. They play an important role in descriptive set theory [28, Section 11] and also yield a hierarchy on complexity classes known as the Boolean hierarchy [15, Section 3], [30, Section 2], [3], [2, Section 3]. Difference hierarchies were also used in the study of ω-regular languages [4, 6, 8, 7, 9, 29].
The aim of this paper is to survey difference hierarchies of regular languages. Decidability questions for difference hierarchies over regular languages were studied in [10] and more recently by Glasser, Schmitz and Selivanov in [11]. The latter article is the reference paper on this topic and contains an extensive bibliography, to which we refer the interested reader. However, [11] focuses on forbidden patterns in automata, a perspective rather different from ours.
We first present some general results on difference hierarchies and their connection with closure operators. The results on approximation of Section 5, first presented in [5], lead in some cases to convenient algorithms to compute chain hierarchies.
Next we turn to algebraic methods. Indeed, many results on regular languages are obtained through an algebraic approach. Typically, combinatorial properties of regular languages (being star-free, piecewise testable, locally testable, etc.) translate directly to algebraic properties of the syntactic monoid of the language (see [18] for a survey). It is therefore natural to expect a similar algebraic approach when dealing with difference hierarchies. However, things are not that simple. First, one needs to work with ordered monoids, which are more appropriate for classes of regular languages not closed under complement. Secondly, although Theorem 7.2 yields a purely algebraic characterization of the difference hierarchy, it does not lead to decidability results, except for some special cases. Two such cases are presented at the end of the paper. The first one studies the difference hierarchy of the polynomial closure of a lattice of regular languages. The main result, Corollary 8.5, which appears to be new, states that the difference hierarchy induced by the polynomial closure of group languages is decidable. The second case, taken from [5], deals with cyclic and strongly cyclic regular languages.
Our paper is organised as follows. Prerequisites are presented in Section 2. Section 3 covers the results of Hausdorff on difference hierarchies and Section 4 is a brief summary on closure operators. The results on approximation form the core of Section 5. Decidability questions on regular languages are introduced in Section 6. Section 7 on chains is inspired by results of descriptive set theory. Two results that are not addressed in [11] are presented in Sections 8 and 9. The final Section 10 opens up some perspectives.
2. PREREQUISITES
In this section, we briefly recall the following notions: upper sets, ordered monoids, stamps and syntactic objects.
Let E be a preordered set. An upper set of E is a subset U of E such that the conditions x ∈ U and x ≤ y imply y ∈ U. An ordered monoid is a monoid M equipped with a partial order ≤ compatible with the product on M: for all x, y, z ∈ M, if x ≤ y then zx ≤ zy and xz ≤ yz.
A stamp is a surjective monoid morphism ϕ : A∗ → M from a finitely generated free monoid A∗ onto a finite monoid M. If M is an ordered monoid, ϕ is called an ordered stamp.
The restricted direct product of two stamps ϕ1 : A∗ → M1 and ϕ2 : A∗ → M2 is the stamp ϕ with domain A∗ defined by ϕ(a) = (ϕ1(a), ϕ2(a)). The image of ϕ is an [ordered] submonoid of the [ordered] monoid M1 × M2.
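[Diagram: the stamps ϕ1 : A∗ → M1, ϕ2 : A∗ → M2 and ϕ : A∗ → Im(ϕ) ⊆ M1 × M2, together with the maps π1 and π2 from Im(ϕ) to M1 and M2.]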
Stamps and ordered stamps are used to recognise languages. A language L of A∗ is recognised by a stamp ϕ : A∗ → M if there exists a subset P of M such that L = ϕ⁻¹(P). It is recognised by an ordered stamp ϕ : A∗ → M if there exists an upper set U of M such that L = ϕ⁻¹(U).
The syntactic preorder of a language was first introduced by Schützenberger in [26, p. 10]. Let L be a language of A∗. The syntactic preorder of L is the relation ≤_L defined on A∗ by u ≤_L v if and only if, for every x, y ∈ A∗,
xuy ∈ L =⇒ xvy ∈ L.
The associated equivalence relation ∼_L, defined by u ∼_L v if u ≤_L v and v ≤_L u, is the syntactic congruence of L and the quotient monoid M(L) = A∗/∼_L is the syntactic monoid of L. The natural morphism η : A∗ → A∗/∼_L is the syntactic stamp of L. The syntactic image of L is the set P = η(L).
The syntactic order ≤_P is defined on M(L) as follows: u ≤_P v if and only if for all x, y ∈ M(L),
xuy ∈ P =⇒ xvy ∈ P
The partial order ≤_P is stable and the resulting ordered monoid (M(L), ≤_P) is called the ordered syntactic monoid of L. Note that P is now an upper set of (M(L), ≤_P) and η becomes an ordered stamp, called the ordered syntactic stamp of L.
3. DIFFERENCE HIERARCHIES
Let E be a set. In this article, a lattice is simply a collection of subsets of E containing ∅ and E and closed under taking finite unions and finite intersections. A lattice closed under complement is a Boolean algebra. Throughout this paper, we adopt Hausdorff’s convention to denote union additively, set difference by a minus sign and intersection as a product. We also sometimes denote by L^c the complement of a subset L of a set E.
Let ℱ be a set of subsets of E containing the empty set. We set ℬ0(ℱ) = {∅} and, for each integer n ≥ 1, we let ℬn(ℱ) denote the class of all sets of the form
X = X1 − X2 + · · · ± Xn    (3.1)
where the sets Xi are in ℱ and satisfy X1 ⊇ X2 ⊇ X3 ⊇ · · · ⊇ Xn. By convention, the expression on the right-hand side of (3.1) should be evaluated from left to right, but given the conditions on the Xi’s, it can also be evaluated as
(X1 − X2) + (X3 − X4) + (X5 − X6) + · · ·    (3.2)
Since the empty set belongs to ℱ, one has ℬn(ℱ) ⊆ ℬn+1(ℱ) for all n ≥ 0 and the classes ℬn(ℱ) define a hierarchy within the Boolean closure of ℱ. Moreover, the following result, due to Hausdorff [13], holds:
Theorem 3.1. Let ℱ be a lattice of subsets of E. The union of the classes ℬn(ℱ) for n ≥ 0 is the Boolean closure of ℱ.
Proof. Let ℬ(ℱ) = ∪_{n≥1} ℬn(ℱ). By construction, every element of ℬn(ℱ) is a Boolean combination of members of ℱ and thus ℬ(ℱ) is contained in the Boolean closure of ℱ. Moreover ℬ1(ℱ) = ℱ and thus ℱ ⊆ ℬ(ℱ). It is therefore enough to prove that ℬ(ℱ) is closed under complement and finite intersection. If X = X1 − X2 + · · · ± Xn, one has
E − X = E − X1 + X2 − · · · ∓ Xn
and thus X ∈ ℬ(ℱ) implies E − X ∈ ℬ(ℱ). Thus ℬ(ℱ) is closed under complement.
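Let X = X1 − X2 + · · · ± Xn and Y = Y1 − Y2 + · · · ± Ym be two elements of ℬ(ℱ). Let
$$Z = Z_1 - Z_2 + \cdots \pm Z_{n+m-1}$$
with
$$Z_k = \sum_{\substack{i+j=k+1 \\ i \text{ and } j \text{ not both even}}} X_i Y_j$$
Therefore
$$\begin{aligned}
Z_1 &= X_1Y_1, \\
Z_2 &= X_1Y_2 + X_2Y_1, \\
Z_3 &= X_1Y_3 + X_3Y_1, \\
Z_4 &= X_1Y_4 + X_2Y_3 + X_3Y_2 + X_4Y_1, \\
&\;\;\vdots \\
Z_{n+m-1} &= \begin{cases} X_nY_m & \text{if } m \text{ and } n \text{ are not both even} \\ \emptyset & \text{otherwise} \end{cases}
\end{aligned}$$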
We claim that Z = XY. To prove the claim, consider, for each set X = X1 − X2 + · · · ± Xn associated with the decreasing sequence X1, . . . , Xn of subsets of E, the function µX defined on E by
µX(x) = max{i ≥ 1 | x ∈ Xi}
with the convention that µX(x) = 0 if x ∈ E − X1. Then x ∈ X if and only if µX(x) is odd. We now evaluate µZ(x) as a function of i = µX(x) and j = µY(x). We first observe that if k ≥ i + j, then x ∉ Z_k. Next, if i and j are not both even, then x ∈ XiYj and XiYj ⊆ Z_{i+j−1}, whence µZ(x) = i + j − 1. Finally, if i and j are both even, then x ∉ Z_{i+j−1} and thus µZ(x) is either equal to 0 or to i + j − 2. Summarizing the various cases, we observe that µX(x) and µY(x) are both odd if and only if µZ(x) is odd, which proves the claim. It follows that ℬ(ℱ) is closed under intersection.
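The product construction used in this proof can be checked mechanically on small finite sets. The following Python sketch is only an illustration: the helper names `alt_eval` and `product_chain` and the two sample chains are assumptions made for the example, not part of the original text.

```python
def alt_eval(chain):
    """Evaluate X1 - X2 + X3 - ... from left to right, as in (3.1)."""
    result = set()
    for index, x in enumerate(chain):
        # Terms X1, X3, ... (index 0, 2, ...) are added, the others removed.
        result = result | x if index % 2 == 0 else result - x
    return result


def product_chain(xs, ys):
    """Return the chain Z1, ..., Z_{n+m-1} built from the chains xs and ys."""
    n, m = len(xs), len(ys)
    zs = []
    for k in range(1, n + m):
        zk = set()
        for i in range(1, n + 1):
            j = k + 1 - i
            both_even = i % 2 == 0 and j % 2 == 0
            if 1 <= j <= m and not both_even:
                zk |= xs[i - 1] & ys[j - 1]
        zs.append(zk)
    return zs


# Hypothetical decreasing chains of subsets of E = {0, ..., 9}.
xs = [set(range(0, 8)), set(range(2, 8)), set(range(4, 8))]
ys = [set(range(1, 10)), {3, 4, 5, 6}]
zs = product_chain(xs, ys)
```

On this sample, evaluating `zs` with `alt_eval` gives the intersection of the sets represented by `xs` and `ys`, as the claim Z = XY predicts.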
An equivalent definition of ℬn(ℱ) was given by Hausdorff [14]. Let X △ Y denote the symmetric difference of two subsets X and Y of E.
Proposition 3.2. For every n > 0, ℬn(ℱ) = {X1 △ X2 △ · · · △ Xn | Xi ∈ ℱ}.
Proof. Indeed, if X = X1 − X2 + · · · ± Xn with X1 ⊇ X2 ⊇ X3 ⊇ · · · ⊇ Xn, then X = X1 △ X2 △ · · · △ Xn. In the opposite direction, if X = X1 △ X2 △ · · · △ Xn, then X = Z1 − Z2 + · · · ± Zn where
$$Z_k = \sum_{i_1, \ldots, i_k \text{ distinct}} X_{i_1} \cdots X_{i_k}.$$
4. CLOSURE OPERATORS
We review in this section the definition and the basic properties of closure operators.
Let E be a set. A map $X \to \overline{X}$ from P(E) to itself is a closure operator if it is extensive, idempotent and isotone, that is, if the following properties hold for all X, Y ⊆ E:
(1) $X \subseteq \overline{X}$ (extensive)
(2) $\overline{\overline{X}} = \overline{X}$ (idempotent)
(3) $X \subseteq Y$ implies $\overline{X} \subseteq \overline{Y}$ (isotone)
A set F ⊆ E is closed if $\overline{F} = F$. If F is closed, and if X ⊆ F, then $\overline{X} \subseteq \overline{F} = F$. It follows that $\overline{X}$ is the smallest closed set containing X. This justifies the terminology “closure”. Actually, closure operators can be characterised by their closed sets.
Proposition 4.1. A set of closed subsets for some closure operator on E is closed under (possibly infinite) intersection. Moreover, any set of subsets of E closed under (possibly infinite) intersection is the set of closed sets for some closure operator.
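Proof. Let $X \to \overline{X}$ be a closure operator and let $(F_i)_{i \in I}$ be a family of closed subsets of E. Since a closure is isotone, $\overline{\bigcap_{i \in I} F_i} \subseteq \overline{F_i} = F_i$ for each i ∈ I. It follows that $\overline{\bigcap_{i \in I} F_i} \subseteq \bigcap_{i \in I} F_i$ and thus $\bigcap_{i \in I} F_i$ is closed.
Given a set ℱ of subsets of E closed under intersection, denote by $\overline{X}$ the intersection of all elements of ℱ containing X. Then the map $X \to \overline{X}$ is a closure operator for which ℱ is the set of closed sets.
In particular, $\overline{X \cap Y} \subseteq \overline{X} \cap \overline{Y}$, but the inclusion may be strict.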
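The second half of Proposition 4.1 is easy to experiment with on a finite set. The sketch below is an illustration under stated assumptions: the function `closure_from_family`, the four-element set `E` and the sample family are all hypothetical. The whole set is treated as closed, since it is the empty intersection.

```python
from itertools import combinations


def closure_from_family(family, universe):
    """Closure operator whose closed sets are `family` (plus the universe)."""
    universe = frozenset(universe)
    closed_sets = [frozenset(member) for member in family]

    def closure(x):
        x = frozenset(x)
        result = universe  # the empty intersection is the whole set E
        for member in closed_sets:
            if x <= member:
                result &= member
        return result

    return closure


def all_subsets(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


# Hypothetical family, closed under intersection, on E = {0, 1, 2, 3}.
E = {0, 1, 2, 3}
family = [set(), {1}, {0, 1}, {1, 2}]
closure = closure_from_family(family, E)
subsets = all_subsets(E)

extensive = all(x <= closure(x) for x in subsets)
idempotent = all(closure(closure(x)) == closure(x) for x in subsets)
isotone = all(closure(x) <= closure(y)
              for x in subsets for y in subsets if x <= y)
```

With X = {0} and Y = {2}, the closure of X ∩ Y is empty while the intersection of the two closures is {1}, so the inclusion stated just before this example is strict here.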
Example 4.1. The trivial closure is the map defined by
$$\overline{X} = \begin{cases} \emptyset & \text{if } X = \emptyset \\ E & \text{otherwise} \end{cases}$$
For this closure, the only closed sets are the empty set and E.
Example 4.2. If E is a topological space, the closure in the topological sense is a closure operator.
Example 4.3. The convex hull is a closure operator. However, it is not induced by any topology, since the union of two convex sets is not necessarily convex.
The intersection of two closure operators $X \to \overline{X}^1$ and $X \to \overline{X}^2$ is the function $X \to \overline{X}^3$ defined by $\overline{X}^3 = \overline{X}^1 \cap \overline{X}^2$.
Proposition 4.2. The intersection of two closure operators is a closure operator.
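Proof. Let $X \to \overline{X}^3$ be the intersection of $X \to \overline{X}^1$ and $X \to \overline{X}^2$. First, since $X \subseteq \overline{X}^1$ and $X \subseteq \overline{X}^2$, one has $X \subseteq \overline{X}^3 = \overline{X}^1 \cap \overline{X}^2$. In particular, $\overline{X}^3 \subseteq \overline{\overline{X}^3}^3$. Secondly, since $\overline{X}^1 \cap \overline{X}^2 \subseteq \overline{X}^1$, one has $\overline{\overline{X}^1 \cap \overline{X}^2}^1 \subseteq \overline{\overline{X}^1}^1 = \overline{X}^1$. Similarly, $\overline{\overline{X}^1 \cap \overline{X}^2}^2 \subseteq \overline{X}^2$. It follows that
$$\overline{\overline{X}^3}^3 = \overline{\overline{X}^1 \cap \overline{X}^2}^1 \cap \overline{\overline{X}^1 \cap \overline{X}^2}^2 \subseteq \overline{X}^1 \cap \overline{X}^2 = \overline{X}^3$$
and hence $\overline{X}^3 = \overline{\overline{X}^3}^3$. Finally, if $X \subseteq Y$, then $\overline{X}^1 \subseteq \overline{Y}^1$ and $\overline{X}^2 \subseteq \overline{Y}^2$, and therefore $\overline{X}^3 \subseteq \overline{Y}^3$.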
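Proposition 4.2 can be illustrated on a small totally ordered set. In the following sketch, which is an assumption-laden toy example rather than anything taken from the text, the two operators are the "initial segment" and "final segment" closures on {0, ..., 4}, and their intersection is the interval hull.

```python
from itertools import combinations

N = 5
UNIVERSE = frozenset(range(N))


def down_closure(x):
    """Smallest initial segment {0, ..., max(x)} containing x."""
    return frozenset(range(max(x) + 1)) if x else frozenset()


def up_closure(x):
    """Smallest final segment {min(x), ..., N - 1} containing x."""
    return frozenset(range(min(x), N)) if x else frozenset()


def intersection(c1, c2):
    """Pointwise intersection of two closure operators."""
    return lambda x: c1(x) & c2(x)


def is_closure_operator(c):
    """Brute-force check of the three axioms on all subsets of UNIVERSE."""
    subsets = [frozenset(s) for r in range(N + 1)
               for s in combinations(range(N), r)]
    extensive = all(x <= c(x) for x in subsets)
    idempotent = all(c(c(x)) == c(x) for x in subsets)
    isotone = all(c(x) <= c(y) for x in subsets for y in subsets if x <= y)
    return extensive and idempotent and isotone


interval_hull = intersection(down_closure, up_closure)
```

The brute-force check confirms the three axioms for `interval_hull` on this sample; it is of course no substitute for the proof above.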
Let us conclude this section by giving a few examples of closure operators occurring in the theory of formal languages.
Example 4.4. Iteration. The map L → L∗ is a closure operator. Similarly, the map L → L+, where L+ denotes the subsemigroup generated by L, is a closure operator.
Example 4.5. Shuffle ideal. The shuffle product (or simply shuffle) of two languages L1 and L2 over A is the language
L1 ⧢ L2 = {w ∈ A∗ | w = u1v1 · · · unvn for some words u1, . . . , un, v1, . . . , vn of A∗ such that u1 · · · un ∈ L1 and v1 · · · vn ∈ L2}.
The shuffle product defines a commutative and associative operation over the set of languages over A. Given a language L, the language L ⧢ A∗ is called the shuffle ideal generated by L and it is easy to see that the map L → L ⧢ A∗ is a closure operator.
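By the definition above, a word w belongs to the shuffle ideal L ⧢ A∗ exactly when some word of L occurs in w as a scattered subword. A minimal Python sketch of this test, with hypothetical helper names and words represented as Python strings, could read as follows.

```python
def is_subword(u, w):
    """True if u is a scattered subword of w (letters of u occur in order)."""
    letters = iter(w)
    # `letter in letters` advances the iterator past the first match.
    return all(letter in letters for letter in u)


def shuffle(u, v):
    """All interleavings of the two words u and v."""
    if not u:
        return {v}
    if not v:
        return {u}
    return ({u[0] + w for w in shuffle(u[1:], v)}
            | {v[0] + w for w in shuffle(u, v[1:])})


def in_shuffle_ideal(w, language):
    """Membership of w in the shuffle ideal generated by a finite language."""
    return any(is_subword(u, w) for u in language)
```

For instance, `shuffle("ab", "c")` is the set of the three interleavings abc, acb and cab, and `in_shuffle_ideal("cabca", {"abc"})` holds because abc is a subword of cabca.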
This closure operator can be extended to infinite words in two ways: the finite and infinite shuffle ideals generated by an ω-language X are respectively:
X ⧢ A∗ = {y0x1y1 · · · xnynx | y0, . . . , yn ∈ A∗ and x1 · · · xnx ∈ X}
X ⧢ Aω = {y0x1y1x2 · · · | y0, y1, . . . ∈ A∗ and x1x2 · · · ∈ X}
The maps X → X ⧢ A∗ and X → X ⧢ Aω are both closure operators.
Example 4.6. Ultimate closure. The ultimate closure of a language X of infinite words is defined by:
Ult(X) = {ux | u ∈ A∗ and vx ∈ X for some v ∈ A∗}
The map X → Ult(X) is a closure operator.
5. APPROXIMATION
In this section, we consider a set ℱ of closed sets of E containing the empty set. It follows that the corresponding closure operator satisfies the condition $\overline{\emptyset} = \emptyset$. We first define the notion of an approximation of a set by a chain of closed sets. Then the existence of a best approximation will be established. In this section, L is a subset of E.
Definition 5.1. A chain F1 ⊇ F2 ⊇ · · · ⊇ Fn of closed sets is an n-approximation of L if the following inclusions hold for all k such that 2k + 1 ≤ n:
$$\begin{aligned}
F_1 - F_2 &\subseteq F_1 - F_2 + F_3 - F_4 \subseteq \cdots \subseteq F_1 - F_2 + \cdots + F_{2k-1} - F_{2k} \subseteq \cdots \\
&\subseteq L \subseteq \cdots \subseteq F_1 - F_2 + F_3 - \cdots + F_{2k+1} \subseteq \cdots \subseteq F_1 - F_2 + F_3 \subseteq F_1
\end{aligned}$$
There is a natural order among the n-approximations of a given set L. An n-approximation F1 ⊇ F2 ⊇ · · · ⊇ Fn of L is said to be better than an n-approximation F′1 ⊇ F′2 ⊇ · · · ⊇ F′n if, for all k such that 2k + 1 ≤ n,
$$F_1 - F_2 + F_3 - \cdots + F_{2k+1} \subseteq F'_1 - F'_2 + F'_3 - \cdots + F'_{2k+1}$$
and
$$F'_1 - F'_2 + \cdots + F'_{2k-1} - F'_{2k} \subseteq F_1 - F_2 + \cdots + F_{2k-1} - F_{2k}$$
We will need the following elementary lemma:
Lemma 5.1. Let X, Y and Z be subsets of E.
(1) The conditions X − Y ⊆ Z and X − Z ⊆ Y are equivalent.
(2) If Y ⊆ X and X − Y ⊆ Z, then X − Z = Y − Z.
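The lemma is stated without proof. For readers who want to see why it holds, one possible verification is sketched below; it relies only on the fact that $X - Y \subseteq Z$ means $X \subseteq Y \cup Z$.

```latex
\begin{align*}
\text{(1)}\quad & X - Y \subseteq Z \iff X \subseteq Y \cup Z \iff X - Z \subseteq Y. \\
\text{(2)}\quad & \text{By (1), } X - Z \subseteq Y, \text{ hence } X - Z \subseteq Y - Z. \\
& \text{Conversely, } Y \subseteq X \text{ gives } Y - Z \subseteq X - Z, \text{ so } X - Z = Y - Z.
\end{align*}
```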
The description of the best approximation of L requires the introduction of two auxiliary functions. For every subset X of E, set
$$f(X) = \overline{X - L} \quad\text{and}\quad g(X) = \overline{X \cap L} \qquad (5.1)$$
The key properties of these functions are formulated in the following lemma.
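Lemma 5.2. For every subset X of E, $X - f(X) \subseteq L$ and $X - g(X) \subseteq L^c$. Furthermore, the following properties hold:
(1) if $X \supseteq Y \supseteq L$, then $f(X) \supseteq f(Y)$ and $X - f(X) \subseteq Y - f(Y) \subseteq L$,
(2) if $X \supseteq Y \supseteq L^c$, then $g(X) \supseteq g(Y)$ and $X - g(X) \subseteq Y - g(Y) \subseteq L^c$.
Proof. Since $g(X) = \overline{X - L^c}$, the results concerning g can be deduced from those concerning f by taking the complement. Let us prove the statements involving f. The first part of the lemma follows from a simple computation: $X - f(X) = X - \overline{X - L} \subseteq X - (X - L) = X \cap L \subseteq L$.
Suppose now that $X \supseteq Y \supseteq L$. Then $X - L \supseteq Y - L$ and thus $f(X) \supseteq f(Y)$. Furthermore, $X - Y \subseteq X - L \subseteq \overline{X - L} = f(X)$. Therefore, by Lemma 5.1, $X - f(X) = Y - f(X) \subseteq Y - f(Y)$.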
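Lemma 5.3. Let F1 ⊇ F2 ⊇ · · · ⊇ Fn be an n-approximation of L and, for 1 ≤ k ≤ n, let $S_k = F_1 - F_2 + \cdots \pm F_k$. Then, for 1 ≤ k ≤ n,
$$\begin{cases} f(S_k) = f(F_k) & \text{if } k \text{ is odd} \\ g(S_k^c) = g(F_k) & \text{if } k \text{ is even} \end{cases} \qquad (5.2)$$
Proof. If k = 1, then S1 = F1 and the result is trivial. Suppose that k > 1. If k is odd, $S_{k-1} \subseteq L$ and thus $S_k - L = (S_{k-1} + F_k) - L = F_k - L$. It follows that $f(S_k) = f(F_k)$. If k is even, $L \subseteq S_{k-1}$ and thus $S_k^c \cap L = (S_{k-1}^c + F_k) \cap L = F_k \cap L$. Therefore $g(S_k^c) = g(F_k)$.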
Define a sequence $(L_n)_{n \geq 0}$ of subsets of E by L0 = E and, for all n ≥ 0,
$$L_{n+1} = \begin{cases} f(L_n) & \text{if } n \text{ is odd} \\ g(L_n) & \text{if } n \text{ is even} \end{cases} \qquad (5.3)$$
The next theorem expresses the fact that the sequence $(L_n)_{n \geq 0}$ is the best approximation of L as a Boolean combination of closed sets. In particular, if Ln = ∅ for some n > 0, then L ∈ ℬ_{n−1}(ℱ).
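Theorem 5.4. Let L be a subset of E. For every n > 0, the sequence $(L_k)_{1 \leq k \leq n}$ is the best n-approximation of L.
Proof. We first show that the sequence $(L_k)_{1 \leq k \leq n}$ is an n-approximation of L. First, every $L_k$ is closed by construction. We show that $L_{k+1} \subseteq L_k$ by induction on k. This is true for k = 0 since L0 = E. Now, if k is even, $L_{k+1} = \overline{L_k \cap L} \subseteq \overline{L_k} = L_k$ and if k is odd, $L_{k+1} = \overline{L_k - L} \subseteq \overline{L_k} = L_k$.
Set, for k > 0, $S_k = L_1 - L_2 + \cdots \pm L_k$. By Lemma 5.2, the relations $L_{2k-1} - L_{2k} = L_{2k-1} - f(L_{2k-1}) \subseteq L$ hold for every k > 0, and similarly, $L_{2k} - L_{2k+1} = L_{2k} - g(L_{2k}) \subseteq L^c$. It follows that $S_{2k} \subseteq L$. Furthermore $S_{2k+1}^c = (L_0 - L_1) + (L_2 - L_3) + \cdots + (L_{2k} - L_{2k+1}) \subseteq L^c$ and thus $L \subseteq S_{2k+1}$.
We now show that the sequence $(L_k)_{1 \leq k \leq n}$ is the best approximation of L. Let $(L'_k)_{1 \leq k \leq n}$ be another n-approximation of L. Set, for k > 0, $S'_k = L'_1 - L'_2 + \cdots \pm L'_k$. Then, by definition, $L \subseteq L'_1$ and thus
$$S_1 = L_1 = \overline{L} \subseteq \overline{L'_1} = L'_1 = S'_1.$$
Let k be an even number and suppose by induction that $S_{k-1} \subseteq S'_{k-1}$. Then, by definition of an approximation, $S'_k = S'_{k-1} - L'_k \subseteq L$, and thus by Lemma 5.1, $S'_{k-1} - L \subseteq L'_k$. It follows that $f(S'_{k-1}) = \overline{S'_{k-1} - L} \subseteq \overline{L'_k} = L'_k$. Now, since $S'_{k-1} \supseteq S_{k-1} \supseteq L$, Lemma 5.2 (1) shows that
$$S'_k = S'_{k-1} - L'_k \subseteq S'_{k-1} - f(S'_{k-1}) \subseteq S_{k-1} - f(S_{k-1}).$$
Now, by Lemma 5.3, $f(S_{k-1}) = f(L_{k-1}) = L_k$, whence $S'_k \subseteq S_{k-1} - L_k = S_k$. Similarly, $L \subseteq S'_{k+1} = S'_k + L'_{k+1}$ and thus $(S'_k)^c - L^c \subseteq L'_{k+1}$. It follows that
$$g((S'_k)^c) = \overline{(S'_k)^c - L^c} \subseteq \overline{L'_{k+1}} = L'_{k+1}.$$
Therefore, one gets by Lemma 5.1,
$$(S'_{k+1})^c = (S'_k)^c - L'_{k+1} \subseteq (S'_k)^c - g((S'_k)^c) \subseteq S_k^c - g(S_k^c).$$
Now, by Lemma 5.3, $g(S_k^c) = g(L_k) = L_{k+1}$. Thus $(S'_{k+1})^c \subseteq S_k^c - L_{k+1} = S_{k+1}^c$, whence $S_{k+1} \subseteq S'_{k+1}$.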
When ℱ is a set of subsets of E closed under arbitrary intersection, Theorem 5.4 provides a characterization of the classes ℬn(ℱ).
Corollary 5.5. Let L be a subset of E and let ℱ be a set of subsets of E closed under (possibly infinite) intersection and containing the empty set. Let $(L_k)_{1 \leq k \leq n}$ be the best n-approximation of L with respect to ℱ. Then L ∈ ℬ_{n−1}(ℱ) if and only if Ln = ∅ and in this case
$$L = L_1 - L_2 + \cdots \pm L_{n-1} \qquad (5.4)$$
Proof. If L ∈ ℬ_{n−1}(ℱ), then L = F1 − F2 + · · · ± F_{n−1} with F1, . . . , F_{n−1} ∈ ℱ. Let Fn = ∅. Then the sequence $(F_k)_{1 \leq k \leq n}$ is an n-approximation of L. Since $(L_k)_{1 \leq k \leq n}$ is the best n-approximation of L, one has L = L1 − L2 + · · · ± L_{n−1}. Thus, with the notation of Lemma 5.3,
$$\begin{cases} f(L_{n-1}) = f(L) = \emptyset & \text{if } n-1 \text{ is odd} \\ g(L_{n-1}) = g(L^c) = \emptyset & \text{if } n-1 \text{ is even} \end{cases} \qquad (5.5)$$
Therefore, Ln = ∅ by (5.3).
Conversely, suppose that Ln = ∅. If n = 2k, then
$$(L_1 - L_2) + \cdots + (L_{2k-1} - L_{2k}) \subseteq L \subseteq (L_1 - L_2) + \cdots + (L_{2k-3} - L_{2k-2}) + L_{2k-1}$$
If n = 2k + 1, then
$$(L_1 - L_2) + \cdots + (L_{2k-1} - L_{2k}) \subseteq L \subseteq (L_1 - L_2) + \cdots + (L_{2k-1} - L_{2k}) + L_{2k+1}$$
In both cases, one gets L = L1 − L2 + · · · ± L_{n−1} and thus L ∈ ℬ_{n−1}(ℱ).
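On a finite set E, the sequence defined by (5.3) can be computed directly. The sketch below is a toy illustration, not the authors' implementation: the function names, the six-element set `E` and the two sample closure operators are assumptions chosen for the example.

```python
def best_approximation(L, closure, universe):
    """Return the nonempty sets L1, L2, ... of (5.3), or None if the
    sequence never reaches the empty set."""
    L = frozenset(L)
    current = frozenset(universe)          # L_0 = E
    chain = []
    # The sequence is decreasing, and two consecutive steps without a strict
    # decrease mean it is stuck, so 2|E| + 2 steps suffice on a finite set.
    for n in range(2 * len(universe) + 2):
        # n even: g(X) = closure of X ∩ L; n odd: f(X) = closure of X − L.
        current = closure(current & L if n % 2 == 0 else current - L)
        if not current:
            return chain
        chain.append(current)
    return None


def alt_eval(chain):
    """Evaluate L1 - L2 + L3 - ... from left to right."""
    result = frozenset()
    for index, x in enumerate(chain):
        result = result | x if index % 2 == 0 else result - x
    return result


# Hypothetical example: E = {0, ..., 5}; the closed sets are the empty set
# and the final segments {i, ..., 5}, a family closed under intersection.
E = frozenset(range(6))


def final_segment(x):
    return frozenset(range(min(x), 6)) if x else frozenset()


def trivial(x):
    """The trivial closure of Example 4.1."""
    return E if x else frozenset()


chain = best_approximation({1, 2, 4}, final_segment, E)
```

Here the chain has four nonempty terms, so by Corollary 5.5 the set {1, 2, 4} sits at level 4 of this toy hierarchy and not lower. With the trivial closure, `best_approximation({0}, trivial, E)` returns `None`, reflecting that {0} is not a Boolean combination of ∅ and E.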
Let us illustrate this corollary by a concrete example.
Example 5.1. Let A = {a, b, c} and let ℒ be the lattice of shuffle ideals. If L is the language {1, a, b, c, ab, bc, abc}, a straightforward computation gives
L0 = A∗
L1 = g(L0) = A∗ ⧢ (L0 ∩ L) = A∗ ⧢ L = A∗
L2 = f(L1) = A∗ ⧢ (L1 − L) = A∗ ⧢ {aa, ac, ba, bb, ca, cb, cc}
L3 = g(L2) = A∗ ⧢ (L2 ∩ L) = A∗ ⧢ abc
L4 = f(L3) = A∗ ⧢ (L3 − L) = A∗ ⧢ {aabc, abac, abca, babc, abbc, abcb, cabc, acbc, abcc}
L5 = g(L4) = A∗ ⧢ (L4 ∩ L) = ∅
It follows that L = L1 − L2 + L3 − L4 and L ∈ ℬ4(ℒ), but L ∉ ℬ3(ℒ).
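The computation of this example can be reproduced for any finite language L by representing each shuffle ideal by its finite set of minimal generators. The following sketch rests on assumptions that should be stated clearly: all function names are hypothetical, L is assumed finite, and the function `f` uses the observation that a minimal word of an ideal minus L is either a generator of the ideal or a one-letter extension of a word of L.

```python
def is_subword(u, w):
    letters = iter(w)
    return all(letter in letters for letter in u)


def in_ideal(w, generators):
    return any(is_subword(u, w) for u in generators)


def minimal_words(words):
    """Keep only the words having no proper subword in the set."""
    words = set(words)
    return frozenset(w for w in words
                     if not any(v != w and is_subword(v, w) for v in words))


def g(generators, L):
    """Generators of the shuffle ideal generated by ideal(generators) ∩ L."""
    return minimal_words(w for w in L if in_ideal(w, generators))


def f(generators, L, alphabet):
    """Generators of the shuffle ideal generated by ideal(generators) − L."""
    # Candidates: the generators and all one-letter extensions of words of L.
    candidates = set(generators)
    for w in L:
        for i in range(len(w) + 1):
            for a in alphabet:
                candidates.add(w[:i] + a + w[i:])
    return minimal_words(w for w in candidates
                         if w not in L and in_ideal(w, generators))


def approximation(L, alphabet, max_steps=100):
    """Generator sets of L1, L2, ... until the empty ideal is reached."""
    L = set(L)
    generators = frozenset({""})   # L_0 = A*, generated by the empty word
    chain = []
    for n in range(max_steps):
        generators = g(generators, L) if n % 2 == 0 else f(generators, L, alphabet)
        if not generators:
            return chain
        chain.append(generators)
    raise RuntimeError("no empty ideal reached within max_steps")


# The language of Example 5.1, with the empty word written "".
chain = approximation({"", "a", "b", "c", "ab", "bc", "abc"}, "abc")
```

On the language of Example 5.1 this yields four generator sets, matching L1 to L4 above.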
It is also possible to use the approximation algorithm for a set ℒ of subsets of E closed under (possibly infinite) union and containing the set E. In this case, the set
ℒ^c = {L^c | L ∈ ℒ}
is closed under (possibly infinite) intersection and contains the empty set. Consequently, the approximation algorithm can be applied to ℒ^c but it describes the difference hierarchy ℬn(ℒ^c). To recover the difference hierarchy ℬn(ℒ), the following algorithm can be used. First compute the best ℒ^c-approximation of even length of L and the best ℒ^c-approximation of odd length of L^c, say
$$L = L_1^c - L_2^c + \cdots \pm L_n^c \qquad (5.6)$$
$$L^c = F_1^c - F_2^c + \cdots \pm F_m^c \qquad (5.7)$$
with n even, m odd, Li, Fi ∈ ℒ and Ln and Fm possibly empty to fill the parity requirements. Now L admits the following ℒ-decompositions, where L1 and F1 are possibly empty (and consequently deleted):
$$L = L_n - L_{n-1} + \cdots \pm L_1 \qquad (5.8)$$
$$L = F_m - F_{m-1} + \cdots \pm F_1 \qquad (5.9)$$
It remains to take the shorter of the two expressions to get the best ℒ-approximation of L.
6. DECIDABILITY QUESTIONS ON REGULAR LANGUAGES
Given a lattice of regular languages ℒ, four decidability questions arise:
Question 1. Is the membership problem for ℒ decidable?
Question 2. Is the membership problem for ℬ(ℒ) decidable?
Question 3. For a given positive integer n, is the membership problem for ℬn(ℒ) decidable?
Question 4. Is the hierarchy ℬn(ℒ) decidable?
More precisely, given a regular language L, Question 1 asks to decide whether L ∈ ℒ, Question 2 whether L ∈ ℬ(ℒ) and Question 3 whether L ∈ ℬn(ℒ). Question 4 asks whether one can effectively compute the smallest n such that L ∈ ℬn(ℒ), if it exists. Note that if Questions 2 and 3 are decidable, then so is Question 4. Indeed, given a language L, one first decides whether L belongs to ℬ(ℒ) by Question 2. If the answer is positive, this ensures that L belongs to ℬn(ℒ) for some n and Question 3 allows one to find the smallest such n.
If the lattice ℒ is finite, it is easy to solve the four questions in a positive way. In some cases, a simple application of Corollary 5.5 suffices to solve Question 3 immediately. One just needs to find the appropriate closure operator and to provide algorithms to compute the functions f(X) and g(X) defined by (5.1).
Example 6.1. Let ℒ be the lattice generated by the languages of the form B∗, where B ⊆ A. Then both ℒ and ℬ(ℒ) are finite. It is known that a regular language belongs to ℒ if and only if its ordered syntactic monoid is idempotent and commutative and satisfies the inequation 1 ≤ x for all x [20]. It belongs to ℬ(ℒ) if and only if its syntactic monoid is idempotent and commutative.
Finally, one can define a closure operator by setting $\overline{L} = B^*$, where B is the set of letters occurring in some word of L. For instance, let L = ({a, b, c}∗ − {b, c}∗) + ({a, b}∗ − a∗) + 1. This language belongs to ℬ(ℒ) and its minimal automaton is represented below:
[Figure: the minimal automaton of L, with states 1, 2, 3 and transitions labelled by the letters a, b, c.]
Applying the approximation algorithm of Section 5, one gets L0 = L1 = {a, b, c}∗, L2 = {b, c}∗, L3 = b∗ and L4 = ∅ and thus L = {a, b, c}∗ − {b, c}∗ + b∗ is the best 3-approximation of L.
If the lattice is infinite, our four questions usually become much harder, but can still be solved in some particular cases. Let us first present a powerful tool introduced in [5]: chains in ordered monoids.
7. CHAINS AND DIFFERENCE HIERARCHIES
Chains can be defined on any ordered set. We first give their definition, then establish a connection with difference hierarchies.
Definition 7.1. Let (E, ≤) be a partially ordered set and let X be a subset of E. A chain of E is a strictly increasing sequence
x0 < x1 < . . . < x_{m−1}
of elements of E. It is called an X-chain if x0 is in X and the xi’s are alternately elements of X and of its complement X^c. The integer m is called the length of the chain. We let m(X) denote the maximal length of an X-chain.
There is a subtle connection between chains and difference hierarchies of regular languages. Let M be a finite ordered monoid and let ϕ : A∗ → M be a surjective monoid morphism. Let
ℒ = {ϕ⁻¹(U) | U is an upper set of M}
By definition, every language of ℒ is recognised by the ordered monoid M.
Theorem 7.1. If there exists a subset P of M such that L = ϕ⁻¹(P) and m(P) ≤ n, then L belongs to ℬn(ℒ).
Before starting the proof, let us clarify a delicate point. The condition L = ϕ⁻¹(P) means that L is recognised by the monoid M. It does not mean that L is recognised by the ordered monoid M, a property which would require P to be an upper set.
Proof. For each s ∈ M, let m(P, s) be the maximal length of a P-chain ending with s. Let, for each k > 0,
U_k = {s ∈ M | m(P, s) ≥ k}
We claim that U_k is an upper set of M. Indeed, if s ∈ U_k, there exists a P-chain x0 < x1 < · · · < x_{r−1} = s of length r ≥ k. Let t be an element of M such that s ≤ t. If exactly one of s and t is in P, then x0 < x1 < · · · < x_{r−1} < t is a P-chain of length r + 1 ≥ k. Otherwise, x0 < x1 < · · · < x_{r−2} < t is a P-chain of length r ≥ k. Thus m(P, t) ≥ k, and t ∈ U_k, proving the claim.
We now show that
P = U1 − U2 + U3 − U4 · · · ± Un    (7.1)
First observe that s ∈ P if and only if m(P, s) is odd. Since m(P) ≤ n, one has m(P, s) ≤ n for every s ∈ M and thus U_{n+1} = ∅. Formula (7.1) follows, since for each r ≥ 0,
{s ∈ M | m(P, s) = r} = U_r − U_{r+1}.
Let, for 1 ≤ i ≤ n, Li = ϕ⁻¹(Ui). Since Ui is an upper set, each Li belongs to ℒ. Moreover, one gets from (7.1) the formula
L = L1 − L2 + L3 · · · ± Ln    (7.2)
which shows that L ∈ ℬn(ℒ).
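The quantities m(P, s) and the upper sets U_k used in this proof are easy to compute on a finite partially ordered set. The sketch below is only an illustration on a plain poset (no monoid structure is used); the function names and the sample poset, the divisors of 12 ordered by divisibility, are assumptions made for the example.

```python
def chain_lengths(elements, less, P):
    """m(P, s) for every s: maximal length of a P-chain ending with s."""
    # Sorting by number of strict predecessors gives a linear extension.
    order = sorted(elements, key=lambda s: sum(less(t, s) for t in elements))
    m = {}
    for s in order:
        best = 1 if s in P else 0
        for t in elements:
            # A P-chain ending with t extends to s when t < s and exactly
            # one of s and t lies in P.
            if less(t, s) and m[t] > 0 and (t in P) != (s in P):
                best = max(best, m[t] + 1)
        m[s] = best
    return m


def upper_sets(m, n):
    """The sets U_1, ..., U_n of the proof."""
    return [{s for s, length in m.items() if length >= k}
            for k in range(1, n + 1)]


def alt_eval(chain):
    result = set()
    for index, x in enumerate(chain):
        result = result | x if index % 2 == 0 else result - x
    return result


# Hypothetical poset: the divisors of 12 ordered by divisibility.
elements = [1, 2, 3, 4, 6, 12]


def strictly_divides(x, y):
    return x != y and y % x == 0


P = {2, 3, 12}
m = chain_lengths(elements, strictly_divides, P)
n = max(m.values())            # this is m(P)
U = upper_sets(m, n)
```

In this sample m(P) = 3, and evaluating U1 − U2 + U3 with `alt_eval` gives back P, as formula (7.1) states.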
We now establish a partial converse to Theorem 7.1. A lattice of regular languages is a set ℒ of regular languages of A∗ containing ∅ and A∗ and closed under finite union and finite intersection.
Theorem 7.2. Let ℒ be a lattice of regular languages. If a language L belongs to ℬn(ℒ), then there exist an ordered stamp η : A∗ → M and a subset P of M satisfying the following conditions:
(1) η is a restricted product of syntactic ordered stamps of members of ℒ,
(2) L = η⁻¹(P),
(3) m(P) ≤ n.
Proof. If L ∈ ℬn(ℒ), then
L = L1 − L2 + L3 · · · ± Ln
with L1 ⊇ L2 ⊇ · · · ⊇ Ln and Li ∈ ℒ. Let ηi : A∗ → (Mi, ≤i) be the syntactic morphism of Li and let Pi = ηi(Li). Then each Pi is an upper set of Mi and Li = ηi⁻¹(Pi). Let η : A∗ → M be the restricted product of the stamps ηi. Condition (1) is satisfied by construction.
Observe that if η(u) = (s1, . . . , sn) is an element of M, the condition s_{i+1} ∈ P_{i+1} is equivalent to u ∈ L_{i+1}, and since L_{i+1} is a subset of Li, this condition also implies u ∈ Li and si ∈ Pi. Consequently, for each element s = (s1, . . . , sn) of M, there exists a unique k ∈ {0, . . . , n} such that
s1 ∈ P1, . . . , sk ∈ Pk, s_{k+1} ∉ P_{k+1}, . . . , sn ∉ Pn
This unique k is called the cut of s. Setting
P = {s ∈ M | the cut of s is odd}
one gets
$$\eta^{-1}(P) = \bigcup_{k \text{ odd}} \bigl((L_1 \cap \cdots \cap L_k) - L_{k+1}\bigr) = \bigcup_{k \text{ odd}} (L_k - L_{k+1}) = L \qquad (7.3)$$
which proves (2).
Let now x0 < x1 < · · · < x_{m−1} be a P-chain. Let, for 0 ≤ i ≤ m − 1, $x_i = (s_{i,1}, \ldots, s_{i,n})$ and let $k_i$ be the cut of $x_i$. We claim that $k_{i+1} > k_i$. Indeed, since $x_i < x_{i+1}$, one has $s_{i,k_i} \leq s_{i+1,k_i}$ in $M_{k_i}$ and, since $P_{k_i}$ is an upper set, $s_{i,k_i} \in P_{k_i}$ implies $s_{i+1,k_i} \in P_{k_i}$, which proves that $k_{i+1} \geq k_i$. But since $x_i$ and $x_{i+1}$ are not simultaneously in P, their cuts must be different, which proves the claim. Since x0 ∈ P, the cut of x0 is odd, and in particular, non-zero. It follows that $0 < k_0 < k_1 < \cdots < k_{m-1}$ and since the cuts are numbers between 0 and n, m ≤ n, which proves (3).
It is tempting to try to improve Theorem 7.2 by taking for M the syntactic monoid of L and for η the syntactic morphism of L. However, Example 5.1 ruins this hope. Indeed, let L be the language of Example 5.1, that is, the set F = {1, a, b, c, ab, bc, abc} of factors of the word abc. Then the syntactic monoid of L can be defined as the set F ∪ {0} equipped with the product defined by
$$x \cdot y = \begin{cases} xy & \text{if } x, y \text{ and } xy \text{ are all in } F \\ 0 & \text{otherwise} \end{cases}$$
Now the syntactic image of L is equal to F. It follows that M − F = {0} and thus, whatever order is taken on M, the length of a chain is bounded by 3. Nevertheless, if ℒ is the lattice of shuffle ideals, then L does not belong to ℬ3(ℒ).
Therefore, if L is a regular language, the maximal length of an L-chain cannot in general be computed in the syntactic monoid of L. It follows that decidability questions on ℬn(ℒ), as presented in Section 6, cannot in general be solved just by inspecting the syntactic monoid. An exceptional case where the syntactic monoid suffices is presented in the next section.
8. THE DIFFERENCE HIERARCHY OF THE POLYNOMIAL CLOSURE OF A LATTICE
A language L of A∗ is a marked product of the languages L0, L1, . . . , Ln if
L = L0a1L1 · · · anLn
for some letters a1, . . . , an of A. Given a set ℒ of languages, the polynomial closure of ℒ is the set of languages that are finite unions of marked products of languages of ℒ. The polynomial closure of ℒ is denoted Pol ℒ and the Boolean closure of Pol ℒ is denoted ℬPol ℒ. Finally, let co-Pol ℒ denote the set of complements of languages in Pol ℒ. In this section, we are interested in the difference hierarchy induced by Pol ℒ. We consider several examples.
8.1. Shuffle ideals. If ℒ = {∅, A∗}, then Pol ℒ is exactly the set of shuffle ideals considered in Examples 4.5 and 6.1 and ℬPol ℒ is the class of piecewise testable languages. The following easy result was mentioned in [20].
Proposition 8.1. A language is a shuffle ideal if and only if its syntactic ordered monoid M satisfies the inequation 1 ≤ x for all x ∈ M.
The syntactic characterization of piecewise testable languages follows from a much deeper result of Simon [27].
Theorem 8.2. A language is piecewise testable if and only if its syntactic monoid is 𝒥-trivial.
Note that the closed sets of the closure operator X → X ⧢ A∗ of Example 4.5 are exactly the shuffle ideals. It follows that for the lattice ℒ of shuffle ideals, the four questions mentioned earlier have a positive answer. More precisely, the decidability of the membership problem for ℒ and for ℬ(ℒ) follows from Proposition 8.1 and Theorem 8.2, respectively. The decidability of Question 3 (and hence of Question 4) follows from the approximation algorithm. See Example 5.1.
8.2. Group languages. Recall that a group language is a language whose syntactic monoid is a group, or, equivalently, is recognized by a finite deterministic automaton in which each letter defines a permutation of the set of states. According to the definition of a polynomial closure, a polynomial of group languages is a finite union of languages of the form L0a1L1 · · · akLk where a1, . . . , ak are letters and L0, . . . , Lk are group languages.
Let dG be the metric on A∗ defined as follows:
rG(u, v) = min{|M| | M is a finite group that separates u and v}
dG(u, v) = 2^{−rG(u,v)}
It is also known that the closure of a regular language for dG is again regular and can be effectively computed. This result was actually proved in two steps: it was first reduced to a group-theoretic conjecture in [22] and this conjecture became a theorem in [25].
Let 𝒢 be the set of group languages on A∗ and let Pol 𝒢 be the polynomial closure of 𝒢. We also let co-Pol 𝒢 denote the set of complements of languages of Pol 𝒢. The following characterization of Pol 𝒢 was given in [17].
Theorem 8.3. Let L be a regular language and let M be its ordered syntactic monoid. The following conditions are equivalent:
(1) L ∈ Pol 𝒢,
(2) L is open in the pro-group topology on A∗,
(3) for all x ∈ M, 1 ≤ x^ω.
Theorem 8.3 shows that Pol 𝒢 is decidable. The corresponding result for ℬPol 𝒢 has a long history, related in detail in [19], where several other characterizations can be found.
Theorem 8.4. Let L be a regular language and let M be its syntactic monoid. The following conditions are equivalent:
(1) L ∈ ℬPol 𝒢,
(2) the submonoid generated by the idempotents of M is 𝒥-trivial,
(3) for all idempotents e, f of M, the condition efe = e implies ef = e = fe.
Theorem 8.4 shows that ℬPol 𝒢 is decidable. Now, Theorem 8.3 shows that a regular language belongs to co-Pol 𝒢 if and only if it is closed in the pro-group topology on A∗. It follows that co-Pol 𝒢 is closed under arbitrary intersections and the operation associating to a regular language of A∗ its closure in the pro-group topology is a closure operator. As we have seen, the closure of a regular language is regular and can be effectively computed. It follows that the algorithm described at the end of Section 5 can be applied to get our last corollary:
Corollary 8.5. The difference hierarchy ℬn(Pol 𝒢) is decidable.
9. CYCLIC AND STRONGLY CYCLIC REGULAR LANGUAGES
Cyclic and strongly cyclic regular languages are two classes of regular languages related to symbolic dynamics and first studied in [1]. It was shown in [5] that an appropriate notion of chains suffices to characterise the difference hierarchy based on the class of strongly cyclic regular languages. This contrasts with Section 7, in which the general results on chains did not lead to a full characterization of difference hierarchies.
Let 𝒜 = (Q, A, ·) be a finite (possibly incomplete) deterministic automaton. A word u stabilises a subset P of Q if P · u = P. Given a subset P of Q, let Stab(P) be the set of all words that stabilise P. The language Stab(𝒜) that stabilises 𝒜 is by definition the set of all words which stabilise at least one nonempty subset of Q.
Definition 9.1. A language is strongly cyclic if it stabilises some finite deterministic automaton.
Example 9.1. If 𝒜 is the automaton represented in Figure 2, then
Stab({1}) = (b + aa)∗, Stab({2}) = (ab∗a)∗, Stab({1, 2}) = a∗
and Stab(𝒜) = (b + aa)∗ + (ab∗a)∗ + a∗.
Figure 2: The automaton $\mathcal{A}$, with states 1 and 2 and transitions labelled $a$, $a$ and $b$.
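The values given in Example 9.1 can be checked mechanically on short words. The Python sketch below assumes that the transitions of the automaton are $1 \cdot a = 2$, $2 \cdot a = 1$ and $1 \cdot b = 1$, with $2 \cdot b$ undefined. This transition table is inferred from the stated values of $\mathrm{Stab}$ and is not read off the figure itself. All function names are hypothetical.

```python
import re
from itertools import product

# Assumed transitions: 1.a = 2, 2.a = 1, 1.b = 1, and 2.b is undefined.
DELTA = {(1, "a"): 2, (2, "a"): 1, (1, "b"): 1}
NONEMPTY_SUBSETS = [frozenset({1}), frozenset({2}), frozenset({1, 2})]

def act(subset, word):
    """Return P.u, dropping the states on which the word is undefined."""
    current = set(subset)
    for letter in word:
        current = {DELTA[(q, letter)] for q in current if (q, letter) in DELTA}
    return frozenset(current)

def stabilises(word, subset):
    """True if the word u satisfies P.u = P."""
    return act(subset, word) == frozenset(subset)

def in_stab(word):
    """True if the word stabilises at least one nonempty subset of Q."""
    return any(stabilises(word, P) for P in NONEMPTY_SUBSETS)

# Regular expression for Stab(A) as stated in Example 9.1.
STAB_A = re.compile(r"(b|aa)*|(ab*a)*|a*")

def agrees_with_example(max_length=6):
    """Compare the definition with the stated expression on all short words."""
    for n in range(max_length + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if in_stab(word) != (STAB_A.fullmatch(word) is not None):
                return False
    return True
```

A call to `agrees_with_example()` only tests words of bounded length, so it is a sanity check rather than a proof of the equalities in the example.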
One can show that the strongly cyclic languages of $A^*$ form a lattice of languages but are not closed under quotients. For instance, as shown in Example 9.1, the language $L = (b + aa)^* + (ab^*a)^* + a^*$ is strongly cyclic, but Corollary 9.5 will show that its quotient $b^{-1}L = (b + aa)^*$ is not strongly cyclic, since $aa \in (b + aa)^*$ but $a \notin (b + aa)^*$.
We will also need the following characterization [1, Proposition 7]:
Proposition 9.1. Let $\mathcal{A} = (Q, A, E)$ be a deterministic automaton. A word $u$ belongs to $\mathrm{Stab}(\mathcal{A})$ if and only if there is some state $q$ of $\mathcal{A}$ such that for every integer $n$, the transition $q \cdot u^n$ exists.
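Proposition 9.1 gives a membership test that inspects single states instead of subsets. In the sketch below, which reuses the transition table assumed for the automaton of Example 9.1, it is enough to check that $q \cdot u^n$ is defined for $1 \leq n \leq |Q|$: the states $q, q \cdot u, \ldots, q \cdot u^{|Q|}$ then contain a repetition, so the run never dies afterwards. The function names are hypothetical, and the subset-based test is included only to compare the two characterizations on short words.

```python
from itertools import product

# Assumed transitions: 1.a = 2, 2.a = 1, 1.b = 1, and 2.b is undefined.
DELTA = {(1, "a"): 2, (2, "a"): 1, (1, "b"): 1}
STATES = (1, 2)

def run(state, word):
    """Return q.u, or None if the path is undefined."""
    for letter in word:
        state = DELTA.get((state, letter))
        if state is None:
            return None
    return state

def survives_forever(state, word):
    """True if q.u^n is defined for every n (checked for n = 1, ..., |Q|)."""
    current = state
    for _ in range(len(STATES)):
        current = run(current, word)
        if current is None:
            return False
    return True

def in_stab_by_states(word):
    """Membership in Stab(A) through the criterion of Proposition 9.1."""
    return any(survives_forever(q, word) for q in STATES)

def in_stab_by_subsets(word):
    """Membership in Stab(A) through the definition (stabilised subsets)."""
    for subset in ({1}, {2}, {1, 2}):
        image = {run(q, word) for q in subset} - {None}
        if image == subset:
            return True
    return False

def short_words(max_length):
    """Yield all words over {a, b} of length at most max_length."""
    for n in range(max_length + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)
```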
Strongly cyclic languages admit the following syntactic characterization [1, Theorem 8]. As usual, $s^\omega$ denotes the idempotent power of $s$, which exists and is unique in any finite monoid.
Proposition 9.2. Let $L$ be a non-full regular language. The following conditions are equivalent:
(1) $L$ is strongly cyclic,
(2) there is a morphism $\varphi$ from $A^*$ onto a finite monoid $M$ with zero such that $L = \varphi^{-1}(\{s \in M \mid s^\omega \neq 0\})$,
(3) the syntactic monoid $M$ of $L$ has a zero and its syntactic image is the set of all elements $s \in M$ such that $s^\omega \neq 0$.
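Condition (2) can be illustrated on the automaton of Example 9.1 by letting each word act as a partial map on the states, the empty partial map playing the role of the zero. The Python sketch below is an assumption-laden illustration: the transition table is the one inferred from Example 9.1, partial maps are encoded as dictionaries, and all names are hypothetical. A word is accepted when the idempotent power of its partial map is not the empty map.

```python
# Assumed transitions: 1.a = 2, 2.a = 1, 1.b = 1, and 2.b is undefined.
DELTA = {(1, "a"): 2, (2, "a"): 1, (1, "b"): 1}
STATES = (1, 2)

def letter_map(letter):
    """Partial map on the states induced by a single letter."""
    return {q: DELTA[(q, letter)] for q in STATES if (q, letter) in DELTA}

def compose(f, g):
    """Partial map obtained by applying f first and g second."""
    return {q: g[f[q]] for q in f if f[q] in g}

def word_map(word):
    """Partial map on the states induced by a word."""
    result = {q: q for q in STATES}
    for letter in word:
        result = compose(result, letter_map(letter))
    return result

def omega(f):
    """Idempotent power of f, found by scanning f, f^2, f^3, ..."""
    power = f
    while compose(power, power) != power:
        power = compose(power, f)
    return power

def in_language(word):
    """True if the idempotent power of the word's partial map is nonempty."""
    return omega(word_map(word)) != {}
```

For instance, the word $ab$ induces the partial map sending 2 to 1 and undefined on 1, whose square is empty, so $ab$ is rejected, in agreement with Example 9.1.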
Proposition 9.2 leads to a simple syntactic characterization of strongly cyclic languages. Recall that a language $L$ of $A^*$ is nondense if there exists a word $u \in A^*$ such that $L \cap A^*uA^* = \emptyset$.
Proposition 9.3. Let $L$ be a regular language, let $M$ be its syntactic monoid and let $P$ be its syntactic image. Then $L$ is strongly cyclic if and only if it satisfies the following conditions, for all $u, x, v \in M$:
(S1) $u x^\omega v \in P$ implies $x^\omega \in P$,
(S2) $x^\omega \in P$ if and only if $x \in P$.
Furthermore, if these conditions are satisfied and if $L$ is not the full language, then $L$ is nondense.
Proof. Let $L$ be a strongly cyclic language, let $M$ be its syntactic monoid and let $P$ be its syntactic image. If $L$ is the full language, then the conditions (S1) and (S2) are trivially satisfied. If $L$ is not the full language, then Proposition 9.2 shows that $M$ has a zero and that $P = \{s \in M \mid s^\omega \neq 0\}$.
Observing that $x^\omega = (x^\omega)^\omega$, one gets
\[
x \in P \iff x^\omega \neq 0 \iff (x^\omega)^\omega \neq 0 \iff x^\omega \in P
\]
which proves (S2). Similarly, one gets
\[
u x^\omega v \in P \iff (u x^\omega v)^\omega \neq 0 \implies x^\omega \neq 0 \iff x \in P
\]
which, together with (S2), proves (S1).
Conversely, suppose that $L$ satisfies (S1) and (S2). If $L$ is full, then $L$ is strongly cyclic.
Otherwise, let $z \notin P$. Then $z^\omega \notin P$ by (S2) and $u z^\omega v \notin P$ for all $u, v \in M$ by (S1). This means that $z^\omega$ is a zero of $M$ and that $0 \notin P$. By Proposition 9.2, it remains to prove that $x \in P$ if and only if $x^\omega \neq 0$. First, if $x \in P$, then $x^\omega \in P$ by (S2) and since $0 \notin P$, one has $x^\omega \neq 0$. Conversely, if $x^\omega \neq 0$, then $u x^\omega v \in P$ for some $u, v \in M$, since $x^\omega$ is not equivalent to $0$ in the syntactic congruence of $P$. It follows that $x^\omega \in P$ by (S1) and $x \in P$ by (S2).
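Conditions (S1) and (S2) quantify over a finite monoid, so they can be tested by exhaustive search. The sketch below assumes a monoid given by a multiplication table `mul` (with `mul[s][t]` the product $st$) and a subset `P` of indices. The sample table is assumed to encode the syntactic monoid of $b^*$ over $\{a, b\}$, with $b$ mapped to the identity and $a$ to the zero. All names are hypothetical.

```python
from itertools import product

def omega(mul, s):
    """Idempotent power of s, found by scanning s, s^2, s^3, ..."""
    power = s
    while mul[power][power] != power:
        power = mul[power][s]
    return power

def satisfies_s1(mul, P):
    """(S1): u x^omega v in P implies x^omega in P."""
    elements = range(len(mul))
    for u, x, v in product(elements, repeat=3):
        x_omega = omega(mul, x)
        if mul[mul[u][x_omega]][v] in P and x_omega not in P:
            return False
    return True

def satisfies_s2(mul, P):
    """(S2): x^omega in P if and only if x in P."""
    return all((omega(mul, x) in P) == (x in P) for x in range(len(mul)))

# Assumed encoding of the two-element monoid {1, 0}.
IDENTITY, ZERO = 0, 1
U1 = [[IDENTITY, ZERO],
      [ZERO, ZERO]]
```

With `P = {IDENTITY}`, which corresponds to $b^*$, both conditions hold. With `P = {ZERO}`, which corresponds to $A^*aA^*$, Condition (S2) holds but (S1) fails for $u = 0$ and $x = 1$.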
We turn now to cyclic languages.
Definition 9.2. A subset of a monoid is said to be cyclic if it is closed under conjugation, power and root. Thus a subset $P$ of a monoid $M$ is cyclic if it satisfies the following conditions, for all $u, v \in M$ and $n > 0$:
(C1) $u^n \in P$ if and only if $u \in P$,
(C2) $uv \in P$ if and only if $vu \in P$.
This definition applies in particular to the case of a language of $A^*$.
Example 9.2. If $A = \{a, b\}$, the language $b^*$ and its complement $A^*aA^*$ are cyclic.
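For a subset of a finite monoid, Conditions (C1) and (C2) can be tested exhaustively, since each element has only finitely many distinct powers. The following sketch again assumes a multiplication table `mul` and a set `P` of indices. The first table is assumed to be the image of $A^*$ under the morphism sending $b$ to the identity and $a$ to the zero, so that `{IDENTITY}` and `{ZERO}` correspond to the two languages of Example 9.2. The second table is a hypothetical monoid $\{1, x, 0\}$ with $x^2 = 0$.

```python
def powers(mul, u):
    """Return the distinct powers u, u^2, u^3, ... of u."""
    seen = []
    power = u
    while power not in seen:
        seen.append(power)
        power = mul[power][u]
    return seen

def is_cyclic(mul, P):
    """Check (C1) and (C2) for the subset P of the monoid given by mul."""
    elements = range(len(mul))
    c1 = all((p in P) == (u in P) for u in elements for p in powers(mul, u))
    c2 = all((mul[u][v] in P) == (mul[v][u] in P)
             for u in elements for v in elements)
    return c1 and c2

# Assumed encoding of the two-element monoid {1, 0}.
IDENTITY, ZERO = 0, 1
U1 = [[IDENTITY, ZERO],
      [ZERO, ZERO]]

# Hypothetical monoid {1, x, 0} with x^2 = 0 (indices 0, 1, 2).
NILPOTENT = [[0, 1, 2],
             [1, 2, 2],
             [2, 2, 2]]
```

In the second table the subset $\{x\}$ is not cyclic, because $x$ belongs to it while $x^2 = 0$ does not.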
One can show that regular cyclic languages are closed under inverses of morphisms and under Boolean operations but not under quotients. For instance, the language $L = (abc)^+ + (bca)^+ + (cab)^+$ is cyclic, but its quotient $a^{-1}L = bc(abc)^*$ is not cyclic, since it contains $bc$ but not its conjugate $cb$. Thus regular cyclic languages do not form a variety of languages. However, they admit the following straightforward characterization in terms of monoids.
Proposition 9.4. Let $L$ be a regular language of $A^*$, let $\varphi$ be a surjective morphism from $A^*$ to a finite monoid $M$ recognising $L$ and let $P = \varphi(L)$. Then $L$ is cyclic if and only if $P$ is cyclic.
Corollary 9.5. Every strongly cyclic language is cyclic.
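Proof. Let $L$ be a strongly cyclic language, let $M$ be its syntactic monoid and let $P$ be its syntactic image. By Proposition 9.3, $P$ satisfies (S1) and (S2). Since $(u^n)^\omega = u^\omega$ for all $n > 0$, Condition (S2) implies (C1). It suffices now to prove that $P$ satisfies (C2).
The sequence of implications
\[
\begin{aligned}
xy \in P &\overset{(S2)}{\iff} (xy)^\omega \in P \iff (xy)^\omega (xy)^\omega \in P \iff (xy)^{\omega-1} xy (xy)^{\omega-1} xy \in P \\
&\iff \bigl((xy)^{\omega-1} x\bigr)(yx)^\omega y \in P \overset{(S1)}{\implies} (yx)^\omega \in P \overset{(S2)}{\iff} yx \in P
\end{aligned}
\]
shows that $xy \in P$ implies $yx \in P$ and the opposite implication follows by symmetry.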
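The only step of this chain that is not a direct application of (S1) or (S2) is the regrouping of the product. A short verification, assuming as usual that $\omega$ denotes an exponent $\omega \geq 1$ such that $s^\omega$ is idempotent for every $s \in M$, might read as follows:

```latex
\begin{align*}
y\,(xy)^{k}\,x &= (yx)^{k+1} \quad \text{for all } k \geq 0, \\
(xy)^{\omega-1}\, xy\, (xy)^{\omega-1}\, xy
  &= \bigl((xy)^{\omega-1} x\bigr)\,\bigl(y\,(xy)^{\omega-1}\,x\bigr)\, y
   = \bigl((xy)^{\omega-1} x\bigr)\,(yx)^{\omega}\, y .
\end{align*}
```

The first identity follows by induction on $k$, and the second is the first applied with $k = \omega - 1$.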
By Corollary 9.5, every strongly cyclic language is cyclic. In the opposite direction, for any regular cyclic language, there is a smallest strongly cyclic language containing it [5, Theorem 2].
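Proposition 9.6. Let $L$ be a regular cyclic language of $A^*$, let $\eta : A^* \to M$ be its syntactic stamp and let $P = \eta(L)$. Then $M$ has a zero and the language
\[
\overline{L} =
\begin{cases}
\eta^{-1}(\{s \mid s^\omega \neq 0\}) & \text{if } 0 \notin P, \\
A^* & \text{otherwise}
\end{cases}
\]
is the smallest strongly cyclic language containing $L$.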
Proof. If $0 \notin P$, then the language $\overline{L}$ is strongly cyclic by Proposition 9.2. Moreover, since $L$ is cyclic, $P$ is cyclic by Proposition 9.4. It follows that if $s \in P$, then $s^\omega \in P$ and in particular $s^\omega \neq 0$. Consequently, $\overline{L}$ contains $L$.
It remains to prove that $\overline{L}$ is the smallest strongly cyclic language containing $L$. Let $X$ be a strongly cyclic language containing $L$ and let $u$ be a word of $\overline{L}$. Let $\mathcal{A} = (Q, A, E)$ be a deterministic automaton such that $X = \mathrm{Stab}(\mathcal{A})$. Setting $s = \eta(u)$, one has $s^\omega \neq 0$ by definition of $\overline{L}$. Consequently, $s^n \neq 0$ for every integer $n$ and there are two words $x_n$ and $y_n$ such that $x_n u^n y_n$ belongs to $L$. By Proposition 9.1, there is a state $q_n$ of $\mathcal{A}$ such that the transition $q_n \cdot x_n u^n y_n$ is defined. The transition $(q_n \cdot x_n) \cdot u^n$ is thus defined for every $n$. Since $Q$ is finite, some state $q$ is equal to $q_n \cdot x_n$ for infinitely many $n$, so that $q \cdot u^n$ is defined for every $n$. By Proposition 9.1 again, the word $u$ belongs to $X$. Thus $\overline{L} \subseteq X$ as required.
Suppose now that $0 \in P$ and let $z$ be a word of $L$ such that $\eta(z) = 0$. Let $X$ be a strongly cyclic language containing $L$. If $X$ is not full, then $X$ is nondense by Proposition 9.3 and there exists a word $u \in A^*$ such that $A^*uA^* \cap X = \emptyset$. Since $X$ contains $L$, one also gets $A^*uA^* \cap L = \emptyset$ and in particular $zu \notin L$. But this yields a contradiction, since $\eta(zu) = \eta(z)\eta(u) = 0 \in P$ and thus $zu \in \eta^{-1}(P) = L$. Thus the only strongly cyclic language containing $L$ is $A^*$.
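At the level of the monoid, the construction of Proposition 9.6 amounts to computing the image of $\overline{L}$ from $P$. The sketch below assumes a multiplication table `mul` and a set `P` of indices. The sample table is assumed to encode the syntactic monoid $\{1, a, 0\}$ of $a^+$ over $\{a, b\}$, with $a^2 = a$ and the zero being the image of every word containing $b$. The function names are hypothetical.

```python
def omega(mul, s):
    """Idempotent power of s, found by scanning s, s^2, s^3, ..."""
    power = s
    while mul[power][power] != power:
        power = mul[power][s]
    return power

def find_zero(mul):
    """Return the zero of the monoid, or None if there is none."""
    elements = range(len(mul))
    for z in elements:
        if all(mul[z][s] == z and mul[s][z] == z for s in elements):
            return z
    return None

def closure_image(mul, P):
    """Image in the monoid of the smallest strongly cyclic language above P."""
    zero = find_zero(mul)
    if zero is None:
        raise ValueError("the monoid has no zero")
    elements = set(range(len(mul)))
    if zero in P:
        return elements
    return {s for s in elements if omega(mul, s) != zero}

# Assumed encoding: index 0 is the identity, 1 is a, 2 is the zero.
A_PLUS_MONOID = [[0, 1, 2],
                 [1, 1, 2],
                 [2, 2, 2]]
```

With `P = {1}`, the image of $a^+$, the function returns `{0, 1}`, which corresponds to $a^*$. With `P = {2}`, the image of $A^*bA^*$, it returns the whole monoid, which corresponds to $A^*$.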
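Given a finite monoid $M$, Green's relation $\leq_{\mathcal{J}}$, defined on $M$ by
\[
s \leq_{\mathcal{J}} t \text{ if and only if } s \in MtM, \text{ or equivalently, if there exist } u, v \in M \text{ such that } s = utv,
\]
is a preorder on $M$. The associated equivalence relation $\mathcal{J}$ is defined by
\[
s \mathrel{\mathcal{J}} t \text{ if } s \leq_{\mathcal{J}} t \text{ and } t \leq_{\mathcal{J}} s, \text{ or equivalently, if } MsM = MtM.
\]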
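On a finite monoid given by a multiplication table, the preorder $\leq_{\mathcal{J}}$ and the classes of $\mathcal{J}$ can be computed directly from the definition. The following sketch is a brute-force illustration under the same assumed table encoding as before, and the two sample tables are hypothetical.

```python
def j_below(mul, s, t):
    """True if s = u t v for some u, v, that is, s is J-below t."""
    elements = range(len(mul))
    return any(mul[mul[u][t]][v] == s for u in elements for v in elements)

def j_classes(mul):
    """Partition the monoid into classes of the equivalence relation J."""
    classes = []
    for s in range(len(mul)):
        for cls in classes:
            representative = next(iter(cls))
            if j_below(mul, s, representative) and j_below(mul, representative, s):
                cls.add(s)
                break
        else:
            classes.append({s})
    return classes

# Hypothetical monoid {1, a, 0} with a^2 = a (indices 0, 1, 2).
CHAIN = [[0, 1, 2],
         [1, 1, 2],
         [2, 2, 2]]

# Hypothetical two-element group (index 0 is the identity).
Z2 = [[0, 1],
      [1, 0]]
```

In the first table one has $0 \leq_{\mathcal{J}} a \leq_{\mathcal{J}} 1$ with three distinct classes, whereas in a group all elements are $\mathcal{J}$-equivalent.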
Corollary 9.7. Let $L$ be a regular cyclic language of $A^*$, let $\eta : A^* \to M$ be its syntactic stamp and let $P = \eta(L)$. Then $L$ is strongly cyclic if and only if for all idempotents $e, f$ of $M$, the conditions $e \in P$ and $e \leq_{\mathcal{J}} f$ imply $f \in P$.
Proof. Suppose that $L$ is strongly cyclic and let $e, f$ be two idempotents of $M$ such that $e \in P$ and $e \leq_{\mathcal{J}} f$. Let $u, v \in M$ be such that $e = ufv$. Since $f^\omega = f$, one gets $u f^\omega v \in P$ and thus $f \in P$ by Condition (S1) of Proposition 9.3.
In the opposite direction, suppose that for all idempotents $e, f$ of $M$, the conditions $e \in P$ and $e \leq_{\mathcal{J}} f$ imply $f \in P$. Since $L$ is cyclic, it satisfies (C1) and hence (S2). We claim that it also satisfies (S1). Indeed, $u x^\omega v \in P$ implies $(u x^\omega v)^\omega \in P$ by (S2). Furthermore, since $(u x^\omega v)^\omega \leq_{\mathcal{J}} x^\omega$, one also has $x^\omega \in P$, and finally $x \in P$ by (S2), which proves the claim.
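The criterion of Corollary 9.7 only involves idempotents and the preorder $\leq_{\mathcal{J}}$, so it can be tested as follows. The sketch assumes that `mul` is a multiplication table of the syntactic monoid of a cyclic language and that `P` is its syntactic image. The sample table is assumed to encode the monoid $\{1, a, 0\}$ with $a^2 = a$, in which $\{a\}$ is the image of $a^+$ and $\{1, a\}$ the image of $a^*$ over $\{a, b\}$. All names are hypothetical.

```python
def j_below(mul, s, t):
    """True if s = u t v for some u, v, that is, s is J-below t."""
    elements = range(len(mul))
    return any(mul[mul[u][t]][v] == s for u in elements for v in elements)

def upward_closed_on_idempotents(mul, P):
    """For idempotents e, f: e in P and e J-below f must imply f in P."""
    idem = [e for e in range(len(mul)) if mul[e][e] == e]
    for e in idem:
        if e not in P:
            continue
        for f in idem:
            if j_below(mul, e, f) and f not in P:
                return False
    return True

# Assumed encoding: index 0 is the identity, 1 is a, 2 is the zero.
A_PLUS_MONOID = [[0, 1, 2],
                 [1, 1, 2],
                 [2, 2, 2]]
```

On this table the test fails for `P = {1}`, because $a \leq_{\mathcal{J}} 1$ and the identity is not in $P$, and it succeeds for `P = {0, 1}`.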
The precise connection between cyclic and strongly cyclic languages was given in [1].
Theorem 9.8. A regular language is cyclic if and only if it is a Boolean combination of regular strongly cyclic languages.
Theorem 9.8 motivates a detailed study of the difference hierarchy of the class $S$ of strongly cyclic languages. This study relies on a careful analysis of the chains on the set of idempotents of a finite monoid, pre-ordered by the relation $\leq_{\mathcal{J}}$.
Definition 9.3. A $P$-chain of idempotents is a sequence $(e_0, e_1, \ldots, e_{m-1})$ of idempotents of $M$ such that
\[
e_0 \leq_{\mathcal{J}} e_1 \leq_{\mathcal{J}} \cdots \leq_{\mathcal{J}} e_{m-1},
\]
$e_0 \in P$ and, for $0 < i < m$, $e_i \in P$ if and only if $e_{i-1} \notin P$. The integer $m$ is the length of the $P$-chain of idempotents.
We let $\ell(M, P)$ denote the maximal length of a $P$-chain of idempotents of $M$. We consider in particular the case where $\varphi : A^* \to M$ is a stamp recognising a regular language $L$ of $A^*$ and $P = \varphi(L)$. The next theorem shows that in this case, $\ell(M, P)$ does not depend on the choice of the stamp recognising $L$, but only depends on $L$.
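For a finite monoid given by a multiplication table, $\ell(M, P)$ can be computed by a memoised search over the idempotents. The sketch below is one possible approach under stated assumptions, not an algorithm taken from the text. It first checks that $\mathcal{J}$-equivalent idempotents agree on membership in $P$, which holds when $P$ is cyclic. This guard is an addition of the sketch: without it, two equivalent idempotents on opposite sides of $P$ would yield chains of unbounded length. The tables and names are hypothetical.

```python
from functools import lru_cache

def j_below(mul, s, t):
    """True if s = u t v for some u, v, that is, s is J-below t."""
    elements = range(len(mul))
    return any(mul[mul[u][t]][v] == s for u in elements for v in elements)

def max_chain_length(mul, P):
    """Maximal length of a P-chain of idempotents (0 if there is none)."""
    idem = [e for e in range(len(mul)) if mul[e][e] == e]
    below = {(e, f): j_below(mul, e, f) for e in idem for f in idem}
    # Guard (an assumption of this sketch): J-equivalent idempotents must
    # agree on membership in P, otherwise chains can alternate forever.
    for e in idem:
        for f in idem:
            if below[e, f] and below[f, e] and (e in P) != (f in P):
                raise ValueError("P-chains of idempotents have unbounded length")

    @lru_cache(maxsize=None)
    def longest_from(e):
        """Length of the longest alternating chain whose first element is e."""
        successors = [f for f in idem if below[e, f] and (f in P) != (e in P)]
        return 1 + max((longest_from(f) for f in successors), default=0)

    return max((longest_from(e) for e in idem if e in P), default=0)

# Hypothetical monoid {1, a, 0} with a^2 = a (indices 0, 1, 2).
CHAIN = [[0, 1, 2],
         [1, 1, 2],
         [2, 2, 2]]

# Hypothetical monoid with an identity and two left zeros (indices 1, 2).
LEFT_ZEROS = [[0, 1, 2],
              [1, 1, 1],
              [2, 2, 2]]
```

On the first table, `P = {1}` gives the chain $(a, 1)$ of length 2, and `P = {0, 2}` gives the chain $(0, a, 1)$ of length 3.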
Theorem 9.9. Let $L$ be a regular language. Let $\varphi : A^* \to M$ and $\psi : A^* \to N$ be two stamps recognising $L$. If $P = \varphi(L)$ and $Q = \psi(L)$, then $\ell(M, P) = \ell(N, Q)$.
Proof. It is sufficient to prove the result when $\varphi$ is the syntactic stamp of $L$. Since the morphism $\psi$ is surjective, $M$ is a quotient of $N$ and there is a surjective morphism $\pi : N \to M$ such that $\pi \circ \psi = \varphi$. It follows that
\[
\pi(Q) = P \text{ and } \pi^{-1}(P) = Q. \tag{9.1}
\]
We show that to any $Q$-chain of idempotents in $N$, one can associate a $P$-chain of idempotents of the same length in $M$ and vice versa.
Let $(e_0, \ldots, e_{m-1})$ be a $Q$-chain of idempotents in $N$ and let $f_i = \pi(e_i)$ for $0 \leq i \leq m-1$. Since every monoid morphism preserves $\leq_{\mathcal{J}}$, the relations (9.1) show that $(f_0, \ldots, f_{m-1})$ is a $P$-chain of idempotents in $M$.
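Let now $(f_0, \ldots, f_{m-1})$ be a $P$-chain of idempotents in $M$. Since $f_{i-1} \leq_{\mathcal{J}} f_i$, there exist for $1 \leq i \leq m-1$ elements $u_i, v_i$ of $M$ such that $u_i f_i v_i = f_{i-1}$. Let us choose an idempotent $e_{m-1}$ such that $\pi(e_{m-1}) = f_{m-1}$ and some elements $s_i$ and $t_i$ of $N$ such that $\pi(s_i) = u_i$ and $\pi(t_i) = v_i$. We now define a sequence of idempotents $(e_0, \ldots, e_{m-1})$ of $N$ by setting
\[
e_{m-2} = (s_{m-1} e_{m-1} t_{m-1})^\omega, \quad e_{m-3} = (s_{m-2} e_{m-2} t_{m-2})^\omega, \quad \ldots, \quad e_0 = (s_1 e_1 t_1)^\omega
\]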