LITP, University Paris Vl and CNRS, Tour 55-65, 4 Place Jussieu, 75252 Paris Cedex 05, FRANCE. On the languages by £inite reversible automata (1) by J.E. Pin,

(1)

b y J.E. P i n ,

LITP, U n i v e r s i t y P a r i s Vl and CNRS, T o u r 55-65, 4 P l a c e J u s s i e u , 75252 P a r i s Cedex 05, FRANCE.

Abstract. A reversible automaton is a finite (possibly incomplete) automaton in which each letter induces a partial one-to-one m a p from the set of states into itself. W e give four non-trivial characterizations of the languages accepted by a reversible automaton equipped with a set of initial and final states and w e s h o w that one can effectively decide whether a given rational (or regular) language can be accepted by a reversible automaton. The first characterization gives a description of the subsets of the free group accepted by a reversible automaton that is s o m e w h a t reminiscent of Kleene's theorem. The second characterization is more combinatorial in nature. The decidability follows from the third - algebraic - characterization. The last and somewhat unexpected characterization is a topological description of our languages that solves an open problem about the finite-group topology of the free monoid.

I. Introduction.

The well-Known Kleene's theorem states that a language is rational if and only if it is accepted by a finite deterministic automaton. Since rational languages are closed under reversal, the rational languages are also exactly the languages accepted by finite codet, ermin~stic automcc~a (an automaton & is codeterministic if the reverse automaton is deterministic). The aim of this paper is to characterize the languages accepted by J'wver,.~"iMe (that is both deterministic and codeterministic) automata. Although this natural question requires only the very basic definitions of automata theory and could have been asked already

(i) ThL~ r e s e a m h w a s # u p p o r ~ e d b~ ~he Au~'trc[l~'an R e s e a r c h GranLe Scheme,

(2)

in the f i f t i e s , the answer we propose in this paper is intimately related to the more advanced research on recognizable languages and finite semigroups.

We f i r s t need to make precise the meaning of the expression "accepted by an automaton".

In the case of a deterministic automaton, one usually considers a set of final states and either a unique i n i t i a l state, or a set of initial states : in both cases, the languages accepted by these automata are the rational languages. There are two solutions to keep the symmetry for the definition of a reversible automaton. We may consider a unique i n i t i a l state and a unique final state: we call i t the restricted mode of acceptance. Such acceptors have been considered in a r t i f i c i a l intelligence in connection with the problem of inductively inferring general rules from examples [ I ] . They also have occurred in the study of the star-height problem [ 9 ] and are related to certain classes of biprefix codes [6]. The corresponding class of languages is not closed under union and the membership problem for this class is easy to solve. The second solution consists in considering a ~e~ of initial and final states. Now, the corresponding class of languages ~ is closed under (finite) union, but i t is no longer t r i v i a l to decide whether or not a given rational language belongs to ~ (for instance, the minimal automaton of a language of ~ is not reversible in general). We propose in this paper four different characterizations of the class ~.

Our f i r s t characterization relates the class ~ to a class of rational subsets of the free group. Indeed, an automaton is reversible i f and only i f each l e t t e r induces a partial one-to-one map from the set of states into itself. Therefore, if a reversible automaton accepts a language L of A*, i t also accepts in a natural way a subset K of the free group such that L = K n A * . Now the subsets of the free group accepted by a reversible automaton form the smallest class of subsets (of the free group) containing the singletons and closed under the three operations "union", "product by an element of the free group", and "subgroup generated by". These subsets are also the f i n i t e unions of cosets of f i n i t e l y generated subgroups of the free group.

Next~ we observe that i f a language L belongs to ~, then the idempotents of the syntactic monoid M(L) commute. This necessary condition (a) plays an important role in semigroup theory [2,3,I0,I 13. However, (a) is not sufficient, conversely, to ensure that L belongs to ~.

(3)

There are three different ways to strengthen condition (a) to obtain a characterization of ~. The first solution is to require that in L, "plus is equivalent to star", or more precisely, that,

(b) if ×u+y c L for some words x,u,y of A*, then xy E L.

The second solution is an algebraic translation of (b) : if P denotes the image of L in H(L), then~

(c) for every s,t in H(L)~ and for every idempotent e of H(L)t set E P implies st £ P.

This characterization is i m p o r t a n % because it shows that the membership problem for is decidable. Given a rational language L, one can effectively compute R(L) and P and verify (a) and (c). Therefore, one may decide whether or not L belongs to ~.

Our last characterization relates ~ to the group topology of the free monoid [5,12,14].

This topology is defined by a distance in which, roughly speaking, t w o w o r d s are close if they are not distinguishable by a group of small cardinality. W e show that a rational language L belongs to ~ if and only if H(L) satisfies (a) and L is c ~ o n e d in this topology. In fact, w e conjecture that a rational language L is closed if and only if the condition (c) holds.

The results of this paper show that this conjecture is true if the idempotents of R(L) commute.

W e refer the reader to [II ] for the consequences of this conjecture in semigroup theory.

Some of the proofs are omitted and will appear elsewhere.

2. Reversible automata.

Recall that a finite deterministic (incomplete) automaton is a triple a = ((~,A,.) where Q is a finite set of states, A is the alphabet and (q,a)---> q.a is a partial function from Q x A into Q. We denote by a r the reverse automaton of ~t. In general I~ r is a non-deterministic automaton. If it is deterministic~ we say that ~t is reuer~(ble (or ~ t ~ u e [6,14]). ~quivalently,

~t is reversible if every letter a E A induces an injective partial function from Q into itself.

W e say that a language L of A * is accepted in the restricted m o d e by ~ if there exists an initial state i and a final state f such that L - - { u E A * l i . u = f } . W e say that L

(4)

is accepted by a if there exists a set I of initial states and a set F of final states such that

L = { u E A * I there exists i E I and f E F such that i.u = f }.

Example.

The automaton represented in the following diagram is reversible and accepts the language { a, ac, bc }.

b

It is not difficult to see that a trim, reversible automaton with a unique initial star and a unique final state is necessarily minimal. The next proposition, discovered independently by various authors, characterizes the languages accepted by a reversible automaton in the restricted mode

Proposition

2,! A lc/nguctge L L.~ a, c c e p t e d b~ ct ,-euers(Me #u~omc/~on ~n t h e ,,'e~tric~ed m o d e ~f a n d ont~ i~ the m i n ~ m M automaton o ~ L is revers~Me a n d has a un~Iue

~ina! s~cde,

I t is much more d i f f i c u l t to characterize the class ~ of all the languages accepted by a reversible automaton. The following construction shows that ~ is closed under union.

Given two reversible automata ~ t = (QI,A,.~,It,Ft) and ~ = (Q2,A,.2,12,F2), we form the dL~jo(nt un(on a of a l and ~t~ as follows. ~= (Q,A,.,I,F) where Q is the disjoint union of

QI

and Q2, I is the disjoint union of I i and 12, F is the disjoint union of F: and F 2, and, for every l e t t e r a E A, q.a = q.ta if q E Gt I and q.a= q.2a i f q E (12. Then a is a reversible automaton that accepts L(~t t) UL(~2).

Since the minimal automaton of a singleton {u} is always reversible, i t follows that contains all the f i n i t e languages. Notice that the minimal automaton of a language of is not necessarily reversible. For instance, the minimal automaton of L= { a, ac, be} is

(5)

not reversible.

Let ~ be a reversible automaton. The action from A on (l can be extended t o an action from the free group F(A) on Q by setting, for every q E Q,

q . l = q

andf for every aEA and for every uEA*,

q.ua = (q.u).a i f q.u and (q.u).a are both defined (undefined otherwise), q.u~ = q' if q.u is defined and if q'.a = q.u (undefined otherwise).

Now, the subset of the free group accepted by ~ is the set

S(~t) = { u E F(A) I there e x i s t s q E G and q' E F such t h a t q.u = q' }

For instance~ i f k is the automaton of example I, then S ( ~ ) = { c } u < a ~ > { a c , b c } where <X>

denotes the subgroup of F(A) generated by a set X.

Since L ( ~ ) = S ~ ) n A * , i t suffices to describe the subsets of the free group accepted by a reversible automaton to obtain a f i r s t characterization of the class ~. This is the goal of the following theorem.

T h e o r e m 2.2 A sub.ee~ S o f t h e f r e e g r o u p F(A) is a c c e p t e d b~ a r e v e r s i b l e automaton i f a n d onl~ Ef S is a f i n i t e un~on o f lejrt cosets o f f i n i t e l ~ generated subgroups o f the free group.

(Proof omitted)

Here is another version of the same result, that is somewhat reminiscent of ](leene's theorem.

T h e o r e m 2.3 T h e s u b s e t s o / t h e ~ t e e g r o u p a c c e p t e d b~ a r e u e r s i b l e a u t o m a t o n f o r m the smallest class o f s u b s e ~ ~ such t h a t

(1) ~ E ~ and f o r euer~ gE F(A)~ { g } E ~p (2) iX" St., =q2E ~ ~hen S t US 2 E ~

(6)

(2) O r SE ~: ctnd gE F(A) %hen g.FE ~:, (4) Qr SE ~, then <E> E ~:,

3. An algebraic characterization of I~.

Let L be a rational language of A*. We denote by N(L) the syntactic monoid of L, by q : A * - - - > N ( L ) the syntactic morphism and we put P=Lq. Recall that an element e of a monoid t4 is

~dempo~en~

i f e = ee. The following proposition gives two important properties of the languages of ~.

Proposition 3.1 [,F L Le ~ccep~ed b~ cr /'et~er~b~e ~rutomnt, on ~hen, (u) ~he (de=po~enf~ oF ,if(L) c~mu#ep

Proof. Let $~= (G,.,A+ItF) be a reversible automaton accepting L. Then L is recognized by the transition monoid M of , t . Let e be an idempotent of H. Then for every qE Q, (q.e).e = q.e whenever q.e is defined. Since ~t is reversible, e is an injective partial function and thus q.e =q or is undefined. In other words, every idempotent is a subidentity on G. I t follows immediately that the idempotents commute in M. Now, the syntactic monoid M(L) divides M (see [4,7,123 for instance), and since the class of monoids with commuting idempotents is closed under division, the idempotents also commute in M(L).

Let ×,u,y E A * be words such that xu+y CL. Since M is finite, there exists an integer n>0 such that u n is idempotent in N, that is, induces a subidentity on Q. Now since xunyEL, there exists an i n i t i a l state q and a final state q' such that q.xuny = q'. Thus (q.x).u n is defined, and hence is equal to (q.x). Therefore q.xy=q.xuny=q ' whence xy is accepted by a and thus xy E L . o

Condition (b) of the previous proposition is equivalent to a more algebraic statement.

(7)

Proposition 3.2 F o r e v e ~ ~t~onaZ language I.~ the ~'oZfo~v~ng conditions a / 9 ~qu~vuZent:

(b) .~or eve]'p z,u,p E A ~ z u + ~ c L b~pEe~ zp E L~

Proof. Assume t h a t (b) is s a t i s f i e d and l e t s , e , t E H ( L ) w i t h e idempotent. Assume that set E P. Then, since 11 is surjective, there exist some words x,uty E A * such that x~1 = s, u~1 = e and y q = t . Now, for every n>0, (xunyh1=set EP. Thus x u + y c P ~ - l = L and hence xy EL by (b). I t follows that st = (xy)~l E LTI = P.

Conversely, assume that (c) holds, and l e t x,uty be words such that x u + y C L . Then there exists n>0 such that unq=e is an idempotent. Setting x q = s and y~1=t, we obtain set E Lq = P, and hence st E P by (c). Therefore xy E L, since (xy)q = st E P. o

We now turn to the converse of proposition 3.1, for which we need a more detailed study of monoids with commuting idempotents. Recall that an element x of a monoid H is

~'eguZ~r i f there e x i s t s an element y such that xyx=x and yxy=y. We start with an important combinatorial lemma, due to Ash [2].

Proposition 3.3 L e t M be u monoid with commuting ~dempotents, and tet T I : A * - - - > t / b e a m o n o ( d m o e p h i s m . T h e n t h e e e e z ~ s ~ an ~ntegee #> 0 s u c h t h u t e v e ~ w o r d ~vEA*

crdm~s cr jractor(zu~ion oJ" t h e ~'orm w = u z v i u t . . . v ~ w f t h u i , ... , u k - ~ E A +, ua,u k E A * , v t , ...,v k E A + and

(I) v i ~.,...,vk ~ as'e reguZar eZemen&~ o~" I~,

(2) [d" tti_ I d e n o t e . ~ t h e tcr.~t ( e t t e r o j r u~_ I c m d a i t h e jr~r.~ ( e t t e r o ~ u i , ( b i v i k q a n d (viui.~ ] a r e n o t r~uZ#z'.,

(3) k~: ... ukl ¢ N.

Recall the d e f i n i t i o n of Green's r e l a t i o n n . Let u and v be two elements of a monoid H. Then u n v i f and only i f there exist two elements x,yE• such t h a t u x = v and v y = x . An n-class i s regular i f i t contains a regular element (or, equivalently, an idempotent). The following proposition summarizes the properties of n-classes that are used in this paper.

(8)

Proposition 3.4 Le~ N be a =ono~d u~th commu~f~ng ~de=potenf.~. Then (1) ez:~/~ r e g u l a r ~ - c l a ~ R con~crLr~ a unique ~dempoten~ e, (2) ~ o r euer~ x E R , ex= x,

(3)./'or eue/~ u,~_~E tl, u ~ u ~ u~ and u s = u s U~pties u = u .

(Proof omitted.)

Theorem 3.5 A r a t i o n a l t a n g u a g e I. is a c c e p t e d b~ a reuers~ble a u t o m a t o n or a n d onl~

or ~t o a t ~ i e s conditions ~a) a n d (b) or, eclu~uatentt~, cond~t~on~ (a) a n d (c).

Proof. By propositions 3.1 and 3.2, it suffices to s h o w that if L satisfies (a) and (c), then L E ~ . Let r be the m a x i m u m size of an ~-class of M(L), and let N be the integer given by l e m m a 3.3. Let ~ be the set of all the reversible automata of the form ~B= (G,A,.,I,F) w h e r e G contains at most r(N+1) states and the language accepted by ~B is contained in L.

is a finite set, since there are only a finite number of automata with at most r(N+i) states. Let ~ be the disjoint union of all the automata of ;:. Then ~t is a reversible automaton such that L(~JCL. To prove that L ~ ) is actually equal to L, it suffices to exhibit, for every word w E L, a reversible automaton ~B of ~: such that w E L(~.

Put m = WTl and denote by P(m) the smallest subset of M(L) containing rn and satisfying condition (c) : for every s,e,t E M(L), with e idempotent, set E P(m) implies st E P(m). Now, since m EP, P(m) is contained in P and the language L(m) = P(m)q -I is contained in L.

~4e f i r s t assume t h a t m is a regular element of M(L). Then, by proposition 3.4, the

=-class R of m contains a unique idempotent e and we can state

Proposition 3.6 The language L(m) ~ accepted b~ the reuer.~ble automaton 3=(R~4,.,{e},Lw}), where, ./'or eue/~ xE R and e u e ~ aE A, z.a= z~nl) O r z ~ 1 ) E R (undej'ined otherwL~e).

(9)

Proof. Proposition 3.4 shows that ~B is reversible. Next, L(~)=S~I - i where S = { s E H (L) I es = m } and L(m) = P(m)q -I . Therefore, i t suffices to show that S = P(m). First m E S by proposition 3.4, and i f sftES for some s,tEH(L) and some idempotent f, then es ~esf St (es)f = (esf)f° It follows that es = esf by proposition 3.4, whence est = esft = m and st E S.

Thus S s a t i s f i e s condition (c) and P(m) is contained in S. Conversely, let sES. Then es= m, and hence les= m EP(m). Therefore, by condition (c), s= l.sE P(m) and thus S= P(m) as required.

We now turn to the general case. Let W=UmV:... v~u~ be a f a c t o r i z a t i o n of w given by lemma 3.3. Put, for ! ~ i ~g, Vi~l = mi, and l e t e i be the (unique) idempotent of the ~-class of m i . The previous proposition shows that the language L(mi) is accepted by the automaton

~B i = (R i,A,.,e l,m i).

W e consider also the minimal automaton ~B of the word u= umu i ...u~ defined as follows.

The set of states is the set of left factors of u and, for each letter a E A and for each left factor x of u, x . a = x a if xa is a left factor of u and is undefined otherwise. W e n o w

"sew" the automata ~B and ~Bi's together, according to the following diagram.

Now, Proposition 3.3 implies that the resulting automaton is reversible (the details are omitted), accepts the language K=uaL(ml)ui...uK_iL(mk)u ~ and contains at most r(N+l) states. W e claim that K is contained in L(m) (and thus in L). Indeed put, for 0 { i { ~ , si=uiq, so that m = samls i,.. mksk. Since l<cKYl~l-I/it suffices to show that i{q--sgP(ml)sl...P(mk)sk is contained in P(m). Let T be the set of all (tl, ... ,t k) of P(ml)x...xP(mk) such that smtlsl ...skt~ E P(m). Then T contains (ml, ... ,m~). Furthermore, if (tl, ... ,tk)E T and if ti: x ifiYi for s o m e idempotent f i, then (s at i ... s i- i x i)f i(Y is i ... s ~t k) E P(m),and hence, by condition (c) s e t 1 . . . s i _ i x i Y i s i . . . s ~ t t E P ( m ) , so that ( t : , . . . , t i _ l ~ x i Y i , t i + i , . . . , t k ) E T . Therefore T is equal to P(m i) × .-. x P(m ~} and this concludes the proof.

(10)

4. A topological characterization.

In this section, we give a topological description of the class ¢. Let us first recall the definition of the topology we are concerned with.

One can show that two distinct words u and v of A * can always be separated by a f i n i t e group in the following sense : there exists a finite group G and a monoid morphism

**T" A*--->**

G such that T(u) # ~,(v). Set, for every u,v E F(A)

r(u,v) = min { Card(G) I G is a finite group that separates u and v } and

d(u,v) = e -r(u'v)

with the usual conventions min ~=® and e -®= 0. Then d is a distance (in fact an ultrametric distance) which defines a topology on A*, called the

#r~n~te-group ~opologb, of

the free monoid.

This topology, introduced by Reutenauer [14,15], is an analogue for the free monoid to the topology of the free group introduced by N. Hall [53. I t is the coarsest topology such that every monoid morphism from A * into a discrete f i n i t e group is continuous. The free monoid A*, equipped with this topology, is a topological monoid. The interested reader is referred to [12,14] for a more detailed study of this topology. An example of a converging sequence is given by the following proposition.

Proposition

4.1

Ci4] Fo/'eve/'p word w E A ~ E.m wn'/=1,

The next proposition relates reversible automata to this topology.

Proposition 4.2 [ 1 4 ] £ver~V Zanguage accepted b~ a reuer~Me automaton L~ clc~ed ~n the

~'bT~te group topoZoc3~,

The converse is not true in general. For instance, the language a ' b * is closed but is not accepted by any reversible automaton. However, we have

(11)

Theorem 4.3 A r a t i o n a l language L is accepted b~ cr reversible automaton i f and onL~

(a) the idempotent.e commute ~n It(L) a n d (d) L Le closed ~n the fb~ite group topoZog~.

Proof. I f L is accepted by a reversible automaton, then L is closed by proposition 4.2 and the idempotents of N(L) commute by proposition 3.1 . Conversely, i f L s a t i s f i e s (d), then L s a t i s f i e s (b). Indeed, let xpu,y be words such that xu+y e L . Then, in particular, for every n >0, xun!y E L. Since L is closed, and since the multiplication is continuous, i t follows by proposition 4.1 that xy = lira y, un!y EL. Thus L satisfies (a) and (b) and the result follows

n+¢o from proposition 3. o

~Je have conjectured [ t I ] that a rational language is closed i f and only if i t satisfies condition (c) (or (b)). Theorem 4.3 shows that this conjecture is true i f the idempotents of the syntactic monoid of L commute.

5. Summary

Le~ u s summarize the previous results into a single statement

Theorem 5.i Let L be a rational language. The following conditior~ are equivalent:

(1) L ~ accepted b~ a reversible automaton,

~?) L = K n A + where K i~- a subset o f the f r e e g r o u p F(A) consisting o f a iYnite union o f l e f t cose~ o f f ~n(tel~, generated subgroulxe o f F(A),

(3) the ~dempotents o f It(L) commute and, f o r euer~ x+u~vEA*., z u + v c L implies z~, E L,

(4.> ~he idempotents o f M(L) commute and, f o r ever~ s,t,e E M(L) ~uch t h a t e i~

~demDotent, set E P [mpEe=~ Bt E Fo

(5) the ~dempotents of N(L) commute a n d L ~ cLcceed in the f~n~te-group toDolog,~.

(12)

Acknowledgement.

This wore is the result of many fruitful discussions with C.J. A s h and T.E. Hall during my visit to H o n a s h University (Australia) on the invitation of T.E. Hall. T h e o r e m 2.2 and condition (2) of theorem 5.1 were suggested by Ch. ReL~tenauer. I would like to thank also D. Beauquier, C. Choffrut (LITP, Paris) for numerous helpful conversations.

References.

[ I ] D. Angluin, Inference of reversible languages, Journal of the Association for Computing Machinery 29 (1982) 741-765.

[2] C.J. Ash, Finite semigroups with commuting idempotents, J. Austral. Hath. Soc. Series A, to appear.

[3] J.C. Birget, S.W. Hargolis and J. Rhodes, Finite semigroups whose idempotents commute or form a subsemigroup, Proceedings of the Chico Conference on Semigroups (1986), to appear.

[4] S. Eilenberg, Automata, Languages and Machines, Vol B, Academic Press, New-York (1976).

[5] M. Hall Jr, A topology for free groups and related groups, Ann. Hath. 52 (1950) 127-139.

[6 ] T.E. Hall, Biprefix codes, inverse semigroups and syntactic monoids of injective automata, Theoretical Computer Science.

[7] G. Lallement, Semigroups and Combinatorial Applications, Wiley, New-YorE (1979).

[8] H. Lothaire, Combinatorics on ~/ords, Encyclopedia of Mathematics 17, Addison Wesley, New-York (1983).

[9] R. McNaughtont The loop complexity of pure-group events. Inf. Control 11 (1967) 167-176.

[ I0 ] S.W. M argolis and J.E. Pin, Languages and inverse semigroups, i i th ICALP, Lecture Notes in Computer Science 199, Springer, Berlin (1985) 285-299.

[ I i ] S.W. Nargolis and J.E. Pin, Finite inverse semigroups, varieties and languages, Journal of Algebra, to appear.

[ ! 2 ] J.E. Pin, Finite group topology and p-adic topology for free monoids, 12th ICALP, Lecture Notes in Computer Science 194 (1985) 445-455.

[13] J.E. Pin, Varieties of formal languages, Masson, Paris (1984), North Oxford Academic, London and Plenum, New-York (1986).

(13)

[ i 4 ] Ch. Reutenauer, Une topologie du monoide fibre, Semigroup Forum i8, (1979), 33-49.

[15] Ch. Reutenauer, Sur mon article "une topologie du monoide fibre", Semigroup Forum 22, (t98t), 93-95.