39 DOI: 10.1051/ita:2005035
EQUALITY SETS FOR RECURSIVELY ENUMERABLE LANGUAGES
Vesa Halava
1, Tero Harju
1, Hendrik Jan Hoogeboom
2and Michel Latteux
3Abstract. We consider shifted equality sets of the form EG(a, g1, g2) ={w|g1(w) =ag2(w)}, whereg1 andg2 are nonerasing morphisms andais a letter. We are interested in the family consisting of the languagesh(EG(J)), wherehis a coding andEG(J) is a shifted equality set. We prove several closure properties for this family. More- over, we show that every recursively enumerable languageL⊆A∗is a projection of a shifted equality set, that is, L=πA(EG(a, g1, g2)) for some (nonerasing) morphismsg1andg2and a lettera, whereπAdeletes the letters not inA. Then we deduce that recursively enumerable star languages coincide with the projections of equality sets.
Mathematics Subject Classification.03D25, 68Q45.
1. Introduction
In formal language theory, languages are often determined by their generating grammars or accepting machines. It is also customary to say that languages gen- erated by grammars of certain form or accepted by automata of specific type form a language family. Here we shall study a language family defined by simple gener- alized equality sets of the formEG(J), whereJ = (a, g1, g2) is an instance of the shifted Post Correspondence Problemconsisting of a letteraand two morphismsg1 andg2. Then the setEG(J) consists of the wordswthat satisfyg1(w) =ag2(w).
Keywords and phrases.Morphism, equality set, shifted Post Correspondence Problem, closure properties, recursively enumerable sets.
1 Department of Mathematics and TUCS – Turku Centre for Computer Science, University of Turku, 20014 Turku, Finland;vesa.halava@utu.fi; harju@utu.fi
2 Department of Computer Science, Leiden University PO Box 9512, 2300 RA Leiden, The Netherlands;hoogeboom@liacs.nl
3 Universit´e des Sciences et Technologies de Lille, Bˆatiment M3, 59655 Villeneuve d’Ascq Cedex, France;latteux@lifl.fr
c EDP Sciences 2005
Article published by EDP Sciences and available at http://www.edpsciences.org/itaor http://dx.doi.org/10.1051/ita:2005035
Our motivation for these generalized equality sets comes partly from a result of [6], where it was proved that the family of regular valence languages is equal to the family of languages of the formh(EG(J)), wherehis a coding (i.e., a letter-to- letter morphism), and, moreover, in the instanceJ = (a, g1, g2) the morphismg2 is periodic. Here we shall consider general case where we do not assume g2 to be periodic. However, we do assume that both morphisms are nonerasing. We study the closure properties of this familyCEof languages. In particular, we show thatCEis closed under union, product, Kleene plus, intersection with regular sets.
Also, more surprisingly,CE is closed under nonerasing inverse morphisms.
In the last section, we consider the connection of the sifted equality sets to recursively enumerable languages. In particular, we show that every every recur- sively enumerable languageL⊆A∗is a projection of a shifted equality set, that is, L=πA(EG(a, g1, g2)) for some (nonerasing) morphismsg1 andg2and a lettera, whereπA deletes the letters not inA.
The results of Sections 2 and 3 have been proved in the authors’ conference paper [7]. The characterization results of Section 4 concerning presentation of recursively enumerable sets by shifted equality sets are new. The problem of presenting recursively enumerable sets using (general) equality sets was initiated by Salomaa [14], Culik II [1], and Engelfriet and Rozenberg [2,3]; see also [4,11,15,16].
2. Preliminaries
LetA be an alphabet, and denote byA∗ the monoid of all finite words under the operation of concatenation. Note that theempty word, denoted byε, is in the monoid A∗. The semigroup A∗\ {ε} generated by A is denoted by A+. For a subsetL⊆A∗, we denote byL+ the set of all words of the formw1w2. . . wn for wi ∈Lwithn≥1. ThenL∗=L+∪ {ε}.
For two wordsu, v ∈A∗, uis a prefix of v if there exists a word z ∈A∗ such thatv=uz. Ifv=uz, then we also writeu=vz−1 andz=u−1v.
In the following, letAandB be alphabets andg: A∗→B∗ a mapping. For a wordx∈B∗, we denote byg−1(x) ={w∈A∗|g(w) =x} the inverse image ofx under g. Then g−1(K) =∪x∈Kg−1(x) is the inverse image of K ⊆B∗ under g, andg(L) ={g(w)|w∈L}is theimageofL⊆A∗ underg. Also,g is amorphism ifg(uv) =g(u)g(v) for allu, v∈A∗. A morphismgis acoding, if it maps letters to letters, that is, ifg(A)⊆B. A morphismgis said to beperiodic, if there exists a wordw∈B∗such that g(A∗)⊆ {w}∗.
IfAandB are alphabets such that A⊆B, then the morphismπA:B∗→A∗, defined by
πA(a) =
a ifa∈A, ε ifa∈B\A, is theprojection ofB∗ ontoA∗.
In the following section, for a given alphabetA, the alphabet ¯A={¯a|a∈A}
is acopy ofA, ifA∩A¯=∅.
In the Post Correspondence Problem, PCP for short, we are given two mor- phismsg1, g2: A∗ → B∗ and it is asked whether or not there exists a nonempty wordw∈A+such thatg1(w) =g2(w). Here the pair (g1, g2) is aninstance of the PCP, and the wordwis called asolution. As a general reference to the problems and results concerning the Post Correspondence Problem, we give [8].
For an instanceI= (g1, g2) of the PCP, let
E(I) ={w∈A∗|g1(w) =g2(w)}
be itsequality set. It is easy to show that an equality setE=E(g1, g2) is always a monoid, that is, E =E∗. In fact, it is a free monoid, and thus the algebraic structure ofEis relatively simple, although the problem whether or notEis trivial is undecidable.
We shall now consider special instances of thegeneralized Post Correspondence Problem in order to have slightly more structured equality sets. In theshifted Post Correspondence Problem, orshifted PCP for short, we are given two morphisms g1, g2:A∗ →B∗ and a letter a∈B, and it is asked whether there exists a word w∈A∗ such that
g1(w) =ag2(w). (1)
The triple J = (a, g1, g2) is called an instance of the shifted PCP and a word w satisfying equation (1) is called a solution of J. It is clear that a solution w is always nonempty. We let
EG(J) =
w∈A+|g1(w) =ag2(w) be theshifted equality set ofJ.
We shall denote byCE the set of all languagesh(EG(J)), where his a coding, and the morphisms in the instancesJ of the shifted PCP are both nonerasing.
In [6]CEperis defined as the family of languagesh(EG(J)), wherehis a coding, and one of the morphisms in the instanceJ of the shifted PCP is assumed to be periodic. It was proved in [6] thatCEperis equal to the family of languages defined by the regular valence grammars (see [12]). It is easy to see that the morphisms in the instances could have been assumed to be nonerasing in order to get the same result. Therefore, the familyCE studied in this paper is a generalization ofCEper or, actually,CEper is a subfamily ofCE.
3. Closure properties of CE
The closure properties of the familyCEperfollow from the known closure prop- erties of regular valence languages. In this section, we study the closure properties of the more general familyCE under various operations.
Before we start our journey through the closure results, we make first some assumptions of the instances of the shifted PCP defining the languages at hand.
An instanceJ = (a, g1, g2) of the shifted PCP is said to befrontal, if the shift lettera appears only as the first letter in the images ofg1 and adoes not occur at all in the images ofg2.
Lemma 3.1. Let L=h(EG(J))for a instance J = (a, g1, g2) of the shifted PCP and a coding h. There exists a frontal instance J = (#, g1, g2) and a coding h such thatL=h(EG(J)).
Proof. Assumeg1, g2: A∗→B∗andh:A∗→C∗. Let # be a letter not inB. We shall construct a new instanceJ= (#, g1, g2), whereg1, g2: (A∪A)¯ ∗ →(B∪{#})∗ and ¯A is a copy of A, by setting for all x ∈ A g2(x) = g2(¯x) = g2(x), and g1(x) =g1(x) and
g1(¯x) =
g1(x), ifais not a prefix ofg1(x),
#w, ifg1(x) =aw.
Define a new codingh: (A∪A)¯ ∗→C∗ byh(x) =h(¯x) =h(x) for allx∈A. It
is now obvious thatL=h(EG(J)).
The next lemma shows that we may also assume that the instance (g1, g2) does not have any nontrivial solutions, that is,E(g1, g2) = {ε} for all instances J = (a, g1, g2) defining the language h(EG(J)). For this result we introduce two mappings which are used for desynchronizing a pair of morphisms. Letdbe a new letter. For a wordu=a1a2· · ·an, where eachai is a letter, define
d(u) =da1da2d· · ·dan and rd(u) =a1da2d· · ·dand.
In other words,d is a morphism that adds din front of every letter and rd is a morphism that addsd after every letter of a word. This is a standard technique in language theory, seee.g.[8].
Lemma 3.2. For every instance J of the shifted PCP and codingh, there exists a frontal instanceJ= (a, g1, g2)and a coding h such thath(EG(J)) =h(EG(J)) andE(g1, g2) ={ε}.
Proof. By Lemma 3.1, we can assume that J = (a, g1, g2) is a frontal instance of the shifted PCP. Let g1, g2:A∗ → B∗, and let h: A∗ → C∗. We define new morphismsg1, g2: (A∪A)¯ ∗→(B∪ {d})∗, whered /∈B is a new letter and ¯Ais a copy ofA, as follows. For allx∈A,
g2(x) =d(g2(x)) and g2(¯x) =d(g2(x))d, (2) g1(x) =g1(¯x) =
ad·rd(w), ifg1(x) =aw,
rd(g1(x)), ifais not a prefix ofg1(x). (3)
It is clear thatJ is a frontal instance. Note also that, since the imagesg2(¯x) start and end ind, the letters in ¯A can be used only as the last letter of a solution of J = (a, g1, g2). Since every image byg2begins with letterdand it is not a prefix of any image ofg1, we obtain thatE(g1, g2) ={ε}. On the other hand, (a, g1, g2) has a solutionw¯xif and only ifwxis a solution of (a, g1, g2). Therefore, we can defineh: (A∪A)¯ ∗→C∗ byh(x) =h(¯x) =h(x) for allx∈A. The claim of the lemma follows, since obviouslyh(EG(J)) =h(EG(J)).
We call an instance (a, g1, g2)reduced, if it is frontal andE(g1, g2) ={ε}. 3.1. Rational operations
Theorem 3.3. The family CE is closed under union and product of languages.
Proof. Let K, L∈ CE with K = h1(EG(J1)) and L = h2(EG(J2)), whereJ1 = (a1, g11, g12) and J2 = (a2, g21, g22) are reduced, and g11, g12: Σ∗ → B1∗ and g21, g22: Ω∗ →B∗2. Without restriction we can suppose that Ω∩Σ = ∅. (Oth- erwise we take a copy of the alphabet Ω that is disjoint from Σ.) We can also assume thatB1∩B2=∅. LetB =B1∪B2.
(1) For the closure under union, let # be a new letter. First replace every appearance of the shift lettersa1 anda2 in J1 and J2 by #. Define g1, g2: (Σ∪ Ω)∗→B∗as follows: for allx∈Σ∪Ω,
g1(x) =
g11(x), ifx∈Σ
g21(x), ifx∈Ω and g2(x) =
g12(x), ifx∈Σ g22(x), ifx∈Ω.
Define a codingh: (Σ∪Ω)∗→C∗ similarly:
h(x) =
h1(x), ifx∈Σ
h2(x), ifx∈Ω. (4)
Since Σ∩Ω = ∅, and the instances J1 and J2 are reduced (i.e., E(g11, g12) = {ε}=E(g21, g22)), it follows that the solutions inEG(J1) and EG(J2) cannot be combined or mixed. Thus, it is easy to see thath(EG(#, g1, g2)) =K∪L.
(2) For the closure under product, we assume that the length of the images of the morphisms are at least 2. (Actually, this is needed only forg11). This can be assumed, for example, by the construction in the proof of Lemma 3.2.
We shall prove that KL= {uv|u∈K, v∈L} is in CE. For this, we define g1, g2: (Σ∪Ω)∗→B∗ in the following way: for eachx∈Σ,
g1(x) =
a2(g11(x)), ifa1is not a prefix ofg11(x), a1ya2(w), ifg11(x) =a1yw (y∈B1),
and
g2(x) =ra2(g12(x)),
and for eachx ∈ Ω, g1(x) = g21(x) and g2(x) =g22(x). If we now define hby combiningh1 andh2 as in (4), we obtain thath(EG(a1, g1, g2)) =KL.
We shall now extend the above result by proving thatCEis closed under Kleene plus,i.e., if K ∈ CE, then alsoK+ ∈ CE. ClearlyCE is not closed under Kleene star, since the empty word does not belong to any language inCE.
Theorem 3.4. The family CE is closed under Kleene plus.
Proof. Let K = h(EG(a, g1, g2)), where g1, g2: A∗ → B∗ are nonerasing mor- phisms,h:A∗→C∗ is a coding and the instance (a, g1, g2) is frontal. Also, let ¯A be a copy of A, and define ¯g1,g¯2: (A∪A)¯ ∗ →B∗ in the following way: for each x∈A,
¯
g1(x) =g1(x) and ¯g2(x) =g2(x),
¯ g1(¯x) =
a(g1(x)), ifais not a prefix ofg1(x), a(w), ifg1(x) =aw,
¯
g2(¯x) =ra(g2(x)).
Extendhalso to ¯Aby settingh(¯x) =h(x) for allx∈A.
Now h(EG(a,g¯1,¯g2)) = K+, since ¯g1(w) = a¯g2(w) if and only if, w=x1· · ·xnxn+1, wherexi∈A¯+for 1≤i≤n,xn+1∈A+, ¯g1(xi)a=a¯g2(xi) for 1≤i≤nand ¯g1(xn+1) =a¯g2(xn+1). After removing the bars form the lettersxi (byh), we obtain words inEG(a, g1, g2).
3.2. Intersection with regular languages
We show now thatCE is closed under intersections with regular languages.
Theorem 3.5. The familyCEis closed under intersections with regular languages.
Proof. LetJ = (a, g1, g2) be an instance of the shifted PCP,g1, g2: Σ∗→B∗. Let L=h(EG(J)), whereh: Σ∗→C∗ is a coding.
We shall prove thath(EG(J))∩Ris inCEfor all regularR⊆B∗. We note first thath(EG(J))∩R=h(EG(J)∩h−1(R)), and therefore it is sufficient to show that, for all regular languagesR⊆Σ∗,h(EG(J)∩R) is inCE. Therefore, we shall give a construction for instancesJ of the shifted PCP such thatEG(J) =EG(J)∩R.
AssumeR⊆Σ∗ is regular, and letG= (N,Σ, P, S) be a right linear grammar generating R (see [13]). Let N = {A0, . . . , An−1}, whereS = A0, and assume without restriction, thatS does not appear on the right hand side of any produc- tion. We consider the setP of the productions as an alphabet.
Let # andd be new letters. We define g1, g2:P∗ →(B∪ {d,#})∗ as follows.
First assume that
g1(a) =a1a2. . . ak and g2(a) =b1b2. . . bm for the (generic) lettera. We define
g1(p) =
#dna1dna2dn. . . akdj , ifp= (A0→aAj) dn−ia1dna2dn. . . akdj , ifp= (Ai→aAj),
#dna1dna2dn. . . ak , ifp= (A0→a), dn−ia1dna2dn. . . ak , ifp= (Ai→a), and
g2(p) =dnb1dnb2. . . dnbm, ifp= (A→aX), whereX ∈N∪ {ε}.
As in [9],EG(J) =EG(J)∩Rfor the new instanceJ = (#, g1, g2). The claim
follows from this.
3.3. Morphisms
Next we shall present a construction for the closure under nonerasing mor- phisms. This construction is a bit more complicated than the previous ones.
Theorem 3.6. The familyCE is closed under taking images of nonerasing mor- phisms.
Proof. LetJ = (a, g1, g2) be an instance of the shifted PCP, whereg1, g2:A∗ → B∗. LetL=h(EG(J)), whereh:A∗→C∗is a coding. Assume thatf: C∗ →Σ∗ is a nonerasing morphism. We shall construct h, g1 and g2 such that f(L) = h(EG(J)) for the new instanceJ= (a, g1, g2).
First we show that we can restrict ourselves to cases where
min{|g1(x)|,|g2(x)|} ≥ |f(x)| for all x∈A. (5) Indeed, suppose the instanceJ does not satisfy (5). We construct a new instance J¯ = (#,g¯1,g¯2) and a coding ¯h such that ¯h(EG( ¯J) = h(EG(J)) and ¯g1 and ¯g2 do fulfill (5). Let c /∈ B be a new letter. Let k = maxx∈A{|f(x)|}. We define
¯
g1(x) =kc(g1(x)) and ¯g2(x) =kc(g2(x)) for allx∈A. We also need a new copyx of each letterxfor whichais a prefix ofg1(x). Ifg1(x) =aw, wherew∈B∗, then define ¯g1(x) = #kc(w). It now follows that ifu∈EG( ¯J), thenu=xv for some wordv∈A∗ andxv∈EG(J). Therefore, by defining ¯has follows
¯h(y) =
h(y), ify∈A, h(x), ify=x, we have ¯h(EG( ¯J) =h(EG(J)) as required.
Now assume that (5) holds inJ = (a, g1, g2) and forf. Let us consider the non- erasing morphismf h:A∗ →Σ∗. Note that also the composition f hsatisfies (5).
In order to prove the claim, it is clearly sufficient to consider the case, wherehis the identity mapping, that is,f =f h.
First we define for every image f(x), where x ∈ A, a new alphabet Ax = {bx|b∈Σ}. We consider the words
(b1b2. . . bm)x= (b1)x(b2)x. . .(bm)x, forf(x) =b1. . . bm.
Let c and d be new letters and let n =
x∈A|f(x)|. Assume that A = {x1, x2, . . . , xq}.
Partition the integers 1,2, . . . , n into q sets such that for the letter xi there corresponds a set, saySi={i1, i2, . . . , i|f(xi|}, of|f(xi)|integers.
Assume that f(xi) =b1. . . bm, g1(xi) = a1a2. . . a, and g2(xi) = a1a2. . . ak. We define new morphismsg1 andg2 as follows:
g1((b1)xi) =cndna1ci1,
g1((bj)xi) =cn−ij−1dnajcij forj= 2, . . . , m−1, g1((bm)xi) =cn−im−1dnamcndn. . . cndna,
and
g2((b1)xi) =cndna1cndi1,
g2((bj)xi) =dn−ij−1ajcndij forj= 2, . . . , m−1, g2((bm)xi) =cndn−im−1amcndn. . . cndnak.
Then
g1((b1. . . bm)xi) =cndna1cndna2. . . cndna, g2((b1. . . bm)xi) =cndna1cndna2. . . cndnak.
The beginning has to be still fixed. For the cases, where a1 = a, we need new letters (b1)xi, for which we define
g1((b1)xi) =aci1 andg2((b1)xi) =cndnajcndi1. Now our constructions for the morphismsg1 andg2 are completed.
Next we defineh, by settingh((bi)x) = bi andh((b1)x) =b1 for alli andx.
We obtain thath(EG(J)) =f(h(EG(J)), which proves the claim.
Next we shall prove that the familyCE is closed under inverse of nonerasing morphisms.
Theorem 3.7. The family CE is closed under nonerasing inverse morphisms.
Proof. Consider an instanceh(EG(J)), where J = (a, g1, g2) with gi:A∗ → B∗ andh:A∗→C∗ is a coding. We may assume thath(A) =C.
Moreover, letg: Σ∗→C∗ be a nonerasing morphism.
For eachx∈Σ, leth−1g(x) ={vx,1, vx,2, . . . , vx,kx}and let Σx= x(1), . . . , x(kx)
be a set of new letters for x. Denote Θ = ∪x∈ΣΣx, and define the morphisms g1, g2: Θ∗→B∗ and the codingt: Θ∗→Σ∗ by
gj
x(i)
=gj(vx,i) forj= 1,2, and t
x(i)
=x
for eachx(i)∈Θ.
Consider the instanceJ= (a, g1, g2).
Now, assume thatu=a1a2. . . an ∈ g−1h(EG(J)) (with ai ∈Σ). Then there exists a word w = w1w2. . . wn such that g1(w) = ag2(w) and ai ∈ g−1h(wi), that is,wi =vai,ri ∈h−1g(ai) for some ri, and so g1(w) =ag2(w) for the word w=a(r11)a(r22). . . a(rnn), for whicht(w) =u. Thereforeu∈t(EG(J)).
The converse inclusion t(EG(J)) ⊆ g−1h(EG(J)) is clear by the above con-
structions.
LetA andB be two alphabets. A mapping τ: A∗ →2B∗, where 2B∗ denotes the set of all subsets ofB∗, is asubstitution if for all u, v∈A∗
τ(uv) =τ(u)τ(v).
Note thatτ is actually a morphism fromA∗ to 2B∗.
A substitutionτis calledfiniteifτ(a) is a finite set for alla∈A, andnonerasing ifε /∈τ(a) for alla∈A.
Corollary 3.8. The family CE is closed under nonerasing finite substitutions.
Proof. SinceCEis closed under nonerasing morphisms, inverses of nonerasing mor- phisms, it is closed under nonerasing finite substitutions. Indeed, as is immediate, every finite substitution is a composition of an inverse of a coding and a nonerasing
morphism.
Note thatCEis almost atrio, see [5], but it seems that it is not closed underall inverse morphisms. It is also almost a bifaithful rational cone, see [10], but since the languages do not contain the empty word,CEis not closed under the bifaithful finite transductions.
4. Equality sets and recursively enumerable languages
The following result of Engelfriet and Rozenberg [3] gives a classical morphic representation of recursively enumerable languages; see also Salomaa [15] (see Th. 6.9, p. 111). Recall thatπA denotes the projection onto A∗.
Theorem 4.1. For every recursively enumerable language L ⊆ A∗, there are two morphismsh1, h2 and a regular languageR such thatL=πA(E(h1, h2)∩R).
A slight modification of its proof permits to strengthen this theorem:
Lemma 4.2. For every recursively enumerable language L ⊆ A∗, there are two nonerasing morphisms h1, h2 and a regular language R such that L = πA(E(h1, h2)∩R). Moreover, one can take R = KA∗K where K and K are regular languages defined on an alphabet B disjoint fromA.
Proof. Assume first that ε /∈ L. Let G = (N, A, P, S) be a type 0 grammar generatingL, where we can assume that the productions have no terminal letters on the right hand side,i.e., P ⊆N+×(N∪A)+. Let ¯A be a copy ofA that is disjoint from the other alphabets. Also, letV =N∪A¯and R=KA∗K with
K=S0(V∗P V∗)∗ and K=F#∗, whereS0,,F and # are new symbols.
Let us define the morphismsh1and h2 by
S0 p= (u, v) X ∈N a¯∈A¯ a∈A F #
h1 S0S v X a # # #
h2 S0 u X a a ##
Let us takeu∈πA(E(h1, h2)∩R). Then there exists a wordz∈E(h1, h2)∩R such thath1(z) =h2(z), andu=πA(z). Here
z=S0z1. . .znF#i,
wherez1, . . . , zn−1∈V∗P V∗,u=zn, andi≥0. Hence, for 1≤j≤n−1, h2(zj) =⇒Gh1(zj),
h1(S0z1. . .zj) =h2(S0z1. . .zj+1).
Therefore,h2(z1) =Sandh1(zj) =h2(zj+1) for 1≤j≤n−1. So we obtain that S=h2(z1) =⇒G h1(z1) =h2(z2) =⇒G. . .=⇒Gh1(zn−2)
=h2(zn−1) =⇒Gh1(zn−1) =h2(zn) =u and thereforeu∈L.
Conversely, ifu∈L, then we have a derivation
S=w1=⇒Gw2=⇒G. . .=⇒G wn =u
according to the grammarG. For each 1≤j ≤n−1, one can findzj ∈V∗P V∗ such thath1(zj) =wj+1 andh2(zj) =wj . Set then
z=S0z1. . .zn−1uF#i+1,
whereiis the length ofu. Thenz∈Rand one can easily check thath1(z) =h2(z).
Hence,u=πA(z)∈πA(E(h1, h2)∩R).
Finally, ifε∈L, setD=A∪ {d}, where dis a new symbol. Then Ld=πD(E(h1, h2)∩KD∗K) =πD(E(h1, h2)∩KA∗dK)
and hence L = πA(E(h1, h2) ∩KA∗dK). This completes the proof of the
lemma.
Note that the form of the regular languageR and the fact that the two mor- phisms are nonerasing are crucial for the proofs of the following lemmata. The proof of the following lemma uses the methods from [9].
Lemma 4.3. Let A andB be two disjoint alphabets and h1, h2: (A∪B)∗ →C∗ be two nonerasing morphisms. If K and K are two regular languages included in B+, then πA(E(h1, h2)∩KA∗K) = πA(EG(#, g1, g2)) for some nonerasing morphismsg1 andg2.
Proof. Let us take two nondeterministic finite automata M = (Q, B,∆, q0, F), M = (Q, B,∆, q0, F) such that L(M) =K and L(M) =K. The transitions are triples of the form (q, b, p), that is, ∆⊆Q×B×Qand ∆⊆Q×B×Q. Clearly, one can assume thatQ={q0, . . . , qn}andQ={q0, . . . , qn}withQ∩Q=∅, and that the automata have unique final statesF={qn}andF={qn}. Also, we can assume that there are no transitions (q, b, q0) and (q, b, q0) that enter the initial statesq0 and q0, and that there are no transitions (qf, b, q) and (qf, b, q) leaving from the final statesqf andqf.
First, we define three morphisms Θ,and ras follows.
Letθ: (A∪∆∪∆)∗→(A∪B)∗ be the morphism defined by θ(a) =afora∈A, andθ((p, b, q)) =b for (p, b, q)∈∆∪∆.
Also, let=d2n and r=rd2n, that is, , r:C∗ →(C∪ {d})∗, where dis a new symbol, and
(c) =d2nc, andr(c) =cd2n forc∈C.
The morphismg2: (A∪∆∪∆)∗→(C∪ {#, d})∗ becomes defined by g2=rh2θ.
It is immediate thatg2(z)∈(Cd2n)∗ for allz∈(A∪∆∪∆)∗. The shift letter # does not belong to any image ofg2. The notationwd−mmeansw(dm)−1, that is, dm is removed as a suffix of the wordw, and similarly d−mwis an abbreviation for (dm)−1w. The morphismg1: (A∪∆∪∆)∗→(C∪ {#, d})∗is defined by
g1((q0, b, qj)) = #rh1(b)d−2j,
g1((qi, b, qj)) =d2irh1(b)d−2j fori= 0, g1(a) =h1(a) fora∈A,
g1((q0, b, qn)) =h1(b)d2n,
g1((q0, b, qj)) =h1(b)d2j+1 forj =n, g1((qi, b, qn)) =d−(2i+1)h1(b)d2n fori= 0,
g1((qi, b, qj)) =d−(2i+1)h1(b)d2j+1 fori= 0 andj=n.
The morphismg1 decodes the behaviour of the combined automata that accepts the languageKA∗K in the sense thatg1(z)∈#(Cd2n)∗ if and only ifz =uvu for some wordsu∈∆∗,v∈A∗, and u∈∆∗such thatθ(u)∈K andθ(u)∈K. Therefore, we have
g1(z)∈#(Cd2n)∗ ⇐⇒ θ(z)∈KA∗K. (6) Finally, let π = πC be the projection π: (C∪ {#, d})∗ → C∗ that deletes the lettersdand #. Then we have
πg1=h1θ and πg2=h2θ. (7) Let v be a word in πA(EG(#, g1, g2)) and let z be such that v = πA(z) and g1(z) = #g2(z). Sinceg2(z)∈(Cd2n)∗, alsog1(z)∈(Cd2n)∗, and it follows by (6) thatθ(z)∈KA∗K. Consequently, by (7), we have
h1θ(z) =πg1(z) =π(#g2(z)) =πg2(z) =h2θ(z).
Hence,θ(z)∈E(h1, h2)∩KA∗K and alsov=πAθ(z)∈πA(E(h1, h2)∩KA∗K) as required.
Conversely, let v ∈ πA(E(h1, h2)∩KA∗K), say v = πA(kvk) with k ∈ K, k ∈ K and h1(kvk) = h2(kvk). Then there exists a word z = uvu with u∈ ∆+, u ∈ ∆+, θ(u) = k, θ(u) = k, θ(z) = kvk and g1(z) = #rh1θ(z) =
#rh2θ(z) = #g2(z). Therefore,v =πA(z)∈πA(EG(#, g1, g2)), which completes
the proof.