Equality sets for recursively enumerable languages

(1)

39 DOI: 10.1051/ita:2005035

EQUALITY SETS FOR RECURSIVELY ENUMERABLE LANGUAGES

Vesa Halava

¹

, Tero Harju

¹

, Hendrik Jan Hoogeboom

²

and Michel Latteux

³

Abstract. ^We ^consider ^shifted ^equality ^sets ^of ^the ^form E_G(a, g₁, g₂) ={w|g₁(w) =ag₂(w)}, whereg₁ andg₂ are nonerasing morphisms andais a letter. We are interested in the family consisting of the languagesh(E_G(J)), wherehis a coding andE_G(J) is a shifted equality set. We prove several closure properties for this family. More- over, we show that every recursively enumerable languageL⊆A^∗is a projection of a shifted equality set, that is, L=πA(EG(a, g1, g2)) for some (nonerasing) morphismsg₁andg₂and a lettera, whereπ_Adeletes the letters not inA. Then we deduce that recursively enumerable star languages coincide with the projections of equality sets.

Mathematics Subject Classification.03D25, 68Q45.

1. Introduction

In formal language theory, languages are often determined by their generating grammars or accepting machines. It is also customary to say that languages generated by grammars of certain form or accepted by automata of speciﬁc type form a language family. Here we shall study a language family deﬁned by simple generalized equality sets of the formE_G(J), whereJ = (a, g₁, g₂) is an instance of the shifted Post Correspondence Problemconsisting of a letteraand two morphismsg₁ andg₂. Then the setE_G(J) consists of the wordswthat satisfyg₁(w) =ag₂(w).

Keywords and phrases.Morphism, equality set, shifted Post Correspondence Problem, closure properties, recursively enumerable sets.

1 Department of Mathematics and TUCS – Turku Centre for Computer Science, University of Turku, 20014 Turku, Finland;vesa.halava@utu.fi; harju@utu.fi

2 Department of Computer Science, Leiden University PO Box 9512, 2300 RA Leiden, The Netherlands;hoogeboom@liacs.nl

3 Universit´e des Sciences et Technologies de Lille, Bˆatiment M3, 59655 Villeneuve d’Ascq Cedex, France;latteux@lifl.fr

c EDP Sciences 2005

Article published by EDP Sciences and available at http://www.edpsciences.org/itaor http://dx.doi.org/10.1051/ita:2005035

(2)

Our motivation for these generalized equality sets comes partly from a result of [6], where it was proved that the family of regular valence languages is equal to the family of languages of the formh(E_G(J)), wherehis a coding (i.e., a letter-to- letter morphism), and, moreover, in the instanceJ = (a, g₁, g₂) the morphismg₂ is periodic. Here we shall consider general case where we do not assume g₂ to be periodic. However, we do assume that both morphisms are nonerasing. We study the closure properties of this familyCEof languages. In particular, we show thatCEis closed under union, product, Kleene plus, intersection with regular sets.

Also, more surprisingly,CE is closed under nonerasing inverse morphisms.

In the last section, we consider the connection of the sifted equality sets to recursively enumerable languages. In particular, we show that every every recursively enumerable languageL⊆A^∗is a projection of a shifted equality set, that is, L=π_A(E_G(a, g₁, g₂)) for some (nonerasing) morphismsg₁ andg₂and a lettera, whereπ_A deletes the letters not inA.

The results of Sections 2 and 3 have been proved in the authors’ conference paper [7]. The characterization results of Section 4 concerning presentation of recursively enumerable sets by shifted equality sets are new. The problem of presenting recursively enumerable sets using (general) equality sets was initiated by Salomaa [14], Culik II [1], and Engelfriet and Rozenberg [2,3]; see also [4,11,15,16].

2. Preliminaries

LetA be an alphabet, and denote byA^∗ the monoid of all ﬁnite words under the operation of concatenation. Note that theempty word, denoted byε, is in the monoid A^∗. The semigroup A^∗\ {ε} generated by A is denoted by A⁺. For a subsetL⊆A^∗, we denote byL⁺ the set of all words of the formw₁w₂. . . w_n for w_i ∈Lwithn≥1. ThenL^∗=L⁺∪ {ε}.

For two wordsu, v ∈A^∗, uis a prefix of v if there exists a word z ∈A^∗ such thatv=uz. Ifv=uz, then we also writeu=vz⁻¹ andz=u⁻¹v.

In the following, letAandB be alphabets andg: A^∗→B^∗ a mapping. For a wordx∈B^∗, we denote byg⁻¹(x) ={w∈A^∗|g(w) =x} the inverse image ofx under g. Then g⁻¹(K) =∪x∈Kg⁻¹(x) is the inverse image of K ⊆B^∗ under g, andg(L) ={g(w)|w∈L}is theimageofL⊆A^∗ underg. Also,g is amorphism ifg(uv) =g(u)g(v) for allu, v∈A^∗. A morphismgis acoding, if it maps letters to letters, that is, ifg(A)⊆B. A morphismgis said to beperiodic, if there exists a wordw∈B^∗such that g(A^∗)⊆ {w}^∗.

IfAandB are alphabets such that A⊆B, then the morphismπ_A:B^∗→A^∗, deﬁned by

π_A(a) =

a ifa∈A, ε ifa∈B\A, is theprojection ofB^∗ ontoA^∗.

In the following section, for a given alphabetA, the alphabet ¯A={¯a|a∈A}

is acopy ofA, ifA∩A¯=∅.

(3)

In the Post Correspondence Problem, PCP for short, we are given two mor- phismsg₁, g₂: A^∗ → B^∗ and it is asked whether or not there exists a nonempty wordw∈A⁺such thatg₁(w) =g₂(w). Here the pair (g₁, g₂) is aninstance of the PCP, and the wordwis called asolution. As a general reference to the problems and results concerning the Post Correspondence Problem, we give [8].

For an instanceI= (g₁, g₂) of the PCP, let

E(I) ={w∈A^∗|g₁(w) =g₂(w)}

be itsequality set. It is easy to show that an equality setE=E(g₁, g₂) is always a monoid, that is, E =E^∗. In fact, it is a free monoid, and thus the algebraic structure ofEis relatively simple, although the problem whether or notEis trivial is undecidable.

We shall now consider special instances of thegeneralized Post Correspondence Problem in order to have slightly more structured equality sets. In theshifted Post Correspondence Problem, orshifted PCP for short, we are given two morphisms g₁, g₂:A^∗ →B^∗ and a letter a∈B, and it is asked whether there exists a word w∈A^∗ such that

g₁(w) =ag₂(w). (1)

The triple J = (a, g₁, g₂) is called an instance of the shifted PCP and a word w satisfying equation (1) is called a solution of J. It is clear that a solution w is always nonempty. We let

E_G(J) =

w∈A⁺|g₁(w) =ag₂(w) be theshifted equality set ofJ.

We shall denote byCE the set of all languagesh(E_G(J)), where his a coding, and the morphisms in the instancesJ of the shifted PCP are both nonerasing.

In [6]CEperis deﬁned as the family of languagesh(E_G(J)), wherehis a coding, and one of the morphisms in the instanceJ of the shifted PCP is assumed to be periodic. It was proved in [6] thatCEperis equal to the family of languages deﬁned by the regular valence grammars (see [12]). It is easy to see that the morphisms in the instances could have been assumed to be nonerasing in order to get the same result. Therefore, the familyCE studied in this paper is a generalization ofCE_per or, actually,CE_per is a subfamily ofCE.

3. Closure properties of CE

The closure properties of the familyCEperfollow from the known closure properties of regular valence languages. In this section, we study the closure properties of the more general familyCE under various operations.

(4)

Before we start our journey through the closure results, we make ﬁrst some assumptions of the instances of the shifted PCP deﬁning the languages at hand.

An instanceJ = (a, g₁, g₂) of the shifted PCP is said to befrontal, if the shift lettera appears only as the ﬁrst letter in the images ofg₁ and adoes not occur at all in the images ofg₂.

Lemma 3.1. Let L=h(E_G(J))for a instance J = (a, g₁, g₂) of the shifted PCP and a coding h. There exists a frontal instance J = (#, g₁, g₂) and a coding h such thatL=h(E_G(J)).

Proof. Assumeg₁, g₂: A^∗→B^∗andh:A^∗→C^∗. Let # be a letter not inB. We shall construct a new instanceJ= (#, g₁, g₂), whereg₁, g₂: (A∪A)¯ ^∗ →(B∪{#})^∗ and ¯A is a copy of A, by setting for all x ∈ A g₂(x) = g₂(¯x) = g₂(x), and g₁(x) =g₁(x) and

g₁(¯x) =

g₁(x), ifais not a preﬁx ofg₁(x),

#w, ifg₁(x) =aw.

Deﬁne a new codingh: (A∪A)¯ ^∗→C^∗ byh(x) =h(¯x) =h(x) for allx∈A. It

is now obvious thatL=h(E_G(J)).

The next lemma shows that we may also assume that the instance (g₁, g₂) does not have any nontrivial solutions, that is,E(g₁, g₂) = {ε} for all instances J = (a, g₁, g₂) deﬁning the language h(E_G(J)). For this result we introduce two mappings which are used for desynchronizing a pair of morphisms. Letdbe a new letter. For a wordu=a₁a₂· · ·a_n, where eacha_i is a letter, deﬁne

_d(u) =da₁da₂d· · ·da_n and r_d(u) =a₁da₂d· · ·da_nd.

In other words,_d is a morphism that adds din front of every letter and r_d is a morphism that addsd after every letter of a word. This is a standard technique in language theory, seee.g.[8].

Lemma 3.2. For every instance J of the shifted PCP and codingh, there exists a frontal instanceJ= (a, g₁, g₂)and a coding h such thath(E_G(J)) =h(E_G(J)) andE(g₁, g₂) ={ε}.

Proof. By Lemma 3.1, we can assume that J = (a, g₁, g₂) is a frontal instance of the shifted PCP. Let g₁, g₂:A^∗ → B^∗, and let h: A^∗ → C^∗. We deﬁne new morphismsg₁, g₂: (A∪A)¯ ^∗→(B∪ {d})^∗, whered /∈B is a new letter and ¯Ais a copy ofA, as follows. For allx∈A,

g₂(x) =_d(g₂(x)) and g₂(¯x) =_d(g₂(x))d, (2) g₁(x) =g₁(¯x) =

ad·r_d(w), ifg₁(x) =aw,

r_d(g₁(x)), ifais not a preﬁx ofg₁(x). (3)

(5)

It is clear thatJ is a frontal instance. Note also that, since the imagesg₂(¯x) start and end ind, the letters in ¯A can be used only as the last letter of a solution of J = (a, g₁, g₂). Since every image byg₂begins with letterdand it is not a preﬁx of any image ofg₁, we obtain thatE(g₁, g₂) ={ε}. On the other hand, (a, g₁, g₂) has a solutionw¯xif and only ifwxis a solution of (a, g₁, g₂). Therefore, we can deﬁneh: (A∪A)¯ ^∗→C^∗ byh(x) =h(¯x) =h(x) for allx∈A. The claim of the lemma follows, since obviouslyh(E_G(J)) =h(E_G(J)).

We call an instance (a, g₁, g₂)reduced, if it is frontal andE(g₁, g₂) ={ε}. 3.1. Rational operations

Theorem 3.3. The family CE is closed under union and product of languages.

Proof. Let K, L∈ CE with K = h₁(E_G(J₁)) and L = h₂(E_G(J₂)), whereJ₁ = (a₁, g₁₁, g₁₂) and J₂ = (a₂, g₂₁, g₂₂) are reduced, and g₁₁, g₁₂: Σ^∗ → B₁^∗ and g₂₁, g₂₂: Ω^∗ →B^∗₂. Without restriction we can suppose that Ω∩Σ = ∅. (Oth- erwise we take a copy of the alphabet Ω that is disjoint from Σ.) We can also assume thatB₁∩B₂=∅. LetB =B₁∪B₂.

(1) For the closure under union, let # be a new letter. First replace every appearance of the shift lettersa₁ anda₂ in J₁ and J₂ by #. Deﬁne g₁, g₂: (Σ∪ Ω)^∗→B^∗as follows: for allx∈Σ∪Ω,

g₁(x) =

g₁₁(x), ifx∈Σ

g₂₁(x), ifx∈Ω and g₂(x) =

g₁₂(x), ifx∈Σ g₂₂(x), ifx∈Ω.

Deﬁne a codingh: (Σ∪Ω)^∗→C^∗ similarly:

h(x) =

h₁(x), ifx∈Σ

h₂(x), ifx∈Ω. (4)

Since Σ∩Ω = ∅, and the instances J₁ and J₂ are reduced (i.e., E(g₁₁, g₁₂) = {ε}=E(g₂₁, g₂₂)), it follows that the solutions inE_G(J₁) and E_G(J₂) cannot be combined or mixed. Thus, it is easy to see thath(E_G(#, g₁, g₂)) =K∪L.

(2) For the closure under product, we assume that the length of the images of the morphisms are at least 2. (Actually, this is needed only forg₁₁). This can be assumed, for example, by the construction in the proof of Lemma 3.2.

We shall prove that KL= {uv|u∈K, v∈L} is in CE. For this, we deﬁne g₁, g₂: (Σ∪Ω)^∗→B^∗ in the following way: for eachx∈Σ,

g₁(x) =

_a₂(g₁₁(x)), ifa₁is not a preﬁx ofg₁₁(x), a₁y_a₂(w), ifg₁₁(x) =a₁yw (y∈B₁),

(6)

and

g₂(x) =r_a₂(g₁₂(x)),

and for eachx ∈ Ω, g₁(x) = g₂₁(x) and g₂(x) =g₂₂(x). If we now deﬁne hby combiningh₁ andh₂ as in (4), we obtain thath(E_G(a₁, g₁, g₂)) =KL.

We shall now extend the above result by proving thatCEis closed under Kleene plus,i.e., if K ∈ CE, then alsoK⁺ ∈ CE. ClearlyCE is not closed under Kleene star, since the empty word does not belong to any language inCE.

Theorem 3.4. The family CE is closed under Kleene plus.

Proof. Let K = h(E_G(a, g₁, g₂)), where g₁, g₂: A^∗ → B^∗ are nonerasing morphisms,h:A^∗→C^∗ is a coding and the instance (a, g₁, g₂) is frontal. Also, let ¯A be a copy of A, and deﬁne ¯g₁,g¯₂: (A∪A)¯ ^∗ →B^∗ in the following way: for each x∈A,

¯

g₁(x) =g₁(x) and ¯g₂(x) =g₂(x),

¯ g₁(¯x) =

_a(g₁(x)), ifais not a preﬁx ofg₁(x), _a(w), ifg₁(x) =aw,

¯

g₂(¯x) =r_a(g₂(x)).

Extendhalso to ¯Aby settingh(¯x) =h(x) for allx∈A.

Now h(E_G(a,g¯₁,¯g₂)) = K⁺, since ¯g₁(w) = a¯g₂(w) if and only if, w=x₁· · ·x_nx_n+1, wherex_i∈A¯⁺for 1≤i≤n,x_n+1∈A⁺, ¯g₁(x_i)a=a¯g₂(x_i) for 1≤i≤nand ¯g₁(x_n+1) =a¯g₂(x_n+1). After removing the bars form the lettersx_i (byh), we obtain words inE_G(a, g₁, g₂).

3.2. Intersection with regular languages

We show now thatCE is closed under intersections with regular languages.

Theorem 3.5. The familyCEis closed under intersections with regular languages.

Proof. LetJ = (a, g₁, g₂) be an instance of the shifted PCP,g₁, g₂: Σ^∗→B^∗. Let L=h(E_G(J)), whereh: Σ^∗→C^∗ is a coding.

We shall prove thath(E_G(J))∩Ris inCEfor all regularR⊆B^∗. We note ﬁrst thath(E_G(J))∩R=h(E_G(J)∩h⁻¹(R)), and therefore it is suﬃcient to show that, for all regular languagesR⊆Σ^∗,h(E_G(J)∩R) is inCE. Therefore, we shall give a construction for instancesJ of the shifted PCP such thatE_G(J) =E_G(J)∩R.

AssumeR⊆Σ^∗ is regular, and letG= (N,Σ, P, S) be a right linear grammar generating R (see [13]). Let N = {A0, . . . , A_n−1}, whereS = A₀, and assume without restriction, thatS does not appear on the right hand side of any produc- tion. We consider the setP of the productions as an alphabet.

(7)

Let # andd be new letters. We deﬁne g₁, g₂:P^∗ →(B∪ {d,#})^∗ as follows.

First assume that

g₁(a) =a₁a₂. . . a_k and g₂(a) =b₁b₂. . . b_m for the (generic) lettera. We deﬁne

g₁(p) =











#dⁿa₁dⁿa₂dⁿ. . . a_kd^j , ifp= (A₀→aA_j) dⁿ⁻ⁱa₁dⁿa₂dⁿ. . . a_kd^j , ifp= (A_i→aA_j),

#dⁿa₁dⁿa₂dⁿ. . . a_k , ifp= (A₀→a), dⁿ⁻ⁱa₁dⁿa₂dⁿ. . . a_k , ifp= (A_i→a), and

g₂(p) =dⁿb₁dⁿb₂. . . dⁿb_m, ifp= (A→aX), whereX ∈N∪ {ε}.

As in [9],E_G(J) =E_G(J)∩Rfor the new instanceJ = (#, g₁, g₂). The claim

follows from this.

3.3. Morphisms

Next we shall present a construction for the closure under nonerasing morphisms. This construction is a bit more complicated than the previous ones.

Theorem 3.6. The familyCE is closed under taking images of nonerasing mor- phisms.

Proof. LetJ = (a, g₁, g₂) be an instance of the shifted PCP, whereg₁, g₂:A^∗ → B^∗. LetL=h(E_G(J)), whereh:A^∗→C^∗is a coding. Assume thatf: C^∗ →Σ^∗ is a nonerasing morphism. We shall construct h, g₁ and g₂ such that f(L) = h(E_G(J)) for the new instanceJ= (a, g₁, g₂).

First we show that we can restrict ourselves to cases where

min{|g1(x)|,|g2(x)|} ≥ |f(x)| for all x∈A. (5) Indeed, suppose the instanceJ does not satisfy (5). We construct a new instance J¯ = (#,g¯₁,g¯₂) and a coding ¯h such that ¯h(E_G( ¯J) = h(E_G(J)) and ¯g₁ and ¯g₂ do fulﬁll (5). Let c /∈ B be a new letter. Let k = max_x∈A{|f(x)|}. We deﬁne

¯

g₁(x) =^k_c(g₁(x)) and ¯g₂(x) =^k_c(g₂(x)) for allx∈A. We also need a new copyx of each letterxfor whichais a prefix ofg₁(x). Ifg₁(x) =aw, wherew∈B^∗, then define ¯g₁(x) = #^k_c(w). It now follows that ifu∈E_G( ¯J), thenu=xv for some wordv∈A^∗ andxv∈E_G(J). Therefore, by defining ¯has follows

¯h(y) =

h(y), ify∈A, h(x), ify=x, we have ¯h(E_G( ¯J) =h(E_G(J)) as required.

(8)

Now assume that (5) holds inJ = (a, g₁, g₂) and forf. Let us consider the nonerasing morphismf h:A^∗ →Σ^∗. Note that also the composition f hsatisﬁes (5).

In order to prove the claim, it is clearly suﬃcient to consider the case, wherehis the identity mapping, that is,f =f h.

First we deﬁne for every image f(x), where x ∈ A, a new alphabet A_x = {b_x|b∈Σ}. We consider the words

(b₁b₂. . . b_m)_x= (b₁)_x(b₂)_x. . .(b_m)_x, forf(x) =b₁. . . b_m.

Let c and d be new letters and let n =

x∈A|f(x)|. Assume that A = {x₁, x₂, . . . , x_q}.

Partition the integers 1,2, . . . , n into q sets such that for the letter x_i there corresponds a set, sayS_i={i1, i₂, . . . , i_|f(x_i_|}, of|f(x_i)|integers.

Assume that f(x_i) =b₁. . . b_m, g₁(x_i) = a₁a₂. . . a, and g₂(x_i) = a₁a₂. . . a_k. We deﬁne new morphismsg₁ andg₂ as follows:

g₁((b₁)_x_i) =cⁿdⁿa₁cⁱ¹,

g₁((b_j)_x_i) =cⁿ⁻ⁱ^j−1dⁿa_jcⁱ^j forj= 2, . . . , m−1, g₁((b_m)_x_i) =cⁿ⁻ⁱ^m−1dⁿa_mcⁿdⁿ. . . cⁿdⁿa,

and

g₂((b₁)_x_i) =cⁿdⁿa₁cⁿdⁱ¹,

g₂((b_j)_x_i) =dⁿ⁻ⁱ^j−1a_jcⁿdⁱ^j forj= 2, . . . , m−1, g₂((b_m)_x_i) =cⁿdⁿ⁻ⁱ^m−1a_mcⁿdⁿ. . . cⁿdⁿa_k.

Then

g₁((b₁. . . b_m)_x_i) =cⁿdⁿa₁cⁿdⁿa₂. . . cⁿdⁿa, g₂((b₁. . . b_m)_x_i) =cⁿdⁿa₁cⁿdⁿa₂. . . cⁿdⁿa_k.

The beginning has to be still ﬁxed. For the cases, where a₁ = a, we need new letters (b₁)_x_i, for which we deﬁne

g₁((b₁)_x_i) =acⁱ¹ andg₂((b₁)_x_i) =cⁿdⁿa_jcⁿdⁱ¹. Now our constructions for the morphismsg₁ andg₂ are completed.

Next we deﬁneh, by settingh((b_i)_x) = b_i andh((b₁)_x) =b₁ for alli andx.

We obtain thath(E_G(J)) =f(h(E_G(J)), which proves the claim.

Next we shall prove that the familyCE is closed under inverse of nonerasing morphisms.

(9)

Theorem 3.7. The family CE is closed under nonerasing inverse morphisms.

Proof. Consider an instanceh(E_G(J)), where J = (a, g₁, g₂) with g_i:A^∗ → B^∗ andh:A^∗→C^∗ is a coding. We may assume thath(A) =C.

Moreover, letg: Σ^∗→C^∗ be a nonerasing morphism.

For eachx∈Σ, leth⁻¹g(x) ={vx,1, v_x,2, . . . , v_x,k_x}and let Σ_x= x⁽¹⁾, . . . , x^(k^x⁾

be a set of new letters for x. Denote Θ = ∪_x∈ΣΣ_x, and deﬁne the morphisms g₁, g₂: Θ^∗→B^∗ and the codingt: Θ^∗→Σ^∗ by

g_j

x⁽ⁱ⁾

=g_j(v_x,i) forj= 1,2, and t

x⁽ⁱ⁾

=x

for eachx⁽ⁱ⁾∈Θ.

Consider the instanceJ= (a, g₁, g₂).

Now, assume thatu=a₁a₂. . . a_n ∈ g⁻¹h(E_G(J)) (with a_i ∈Σ). Then there exists a word w = w₁w₂. . . w_n such that g₁(w) = ag₂(w) and a_i ∈ g⁻¹h(w_i), that is,w_i =v_a_i_,r_i ∈h⁻¹g(a_i) for some r_i, and so g₁(w) =ag₂(w) for the word w=a^(r₁¹⁾a^(r₂²⁾. . . a^(r_nⁿ⁾, for whicht(w) =u. Thereforeu∈t(E_G(J)).

The converse inclusion t(E_G(J)) ⊆ g⁻¹h(E_G(J)) is clear by the above con-

structions.

LetA andB be two alphabets. A mapping τ: A^∗ →2^B^∗, where 2^B^∗ denotes the set of all subsets ofB^∗, is asubstitution if for all u, v∈A^∗

τ(uv) =τ(u)τ(v).

Note thatτ is actually a morphism fromA^∗ to 2^B^∗.

A substitutionτis calledfiniteifτ(a) is a ﬁnite set for alla∈A, andnonerasing ifε /∈τ(a) for alla∈A.

Corollary 3.8. The family CE is closed under nonerasing finite substitutions.

Proof. SinceCEis closed under nonerasing morphisms, inverses of nonerasing morphisms, it is closed under nonerasing ﬁnite substitutions. Indeed, as is immediate, every ﬁnite substitution is a composition of an inverse of a coding and a nonerasing

morphism.

Note thatCEis almost atrio, see [5], but it seems that it is not closed underall inverse morphisms. It is also almost a bifaithful rational cone, see [10], but since the languages do not contain the empty word,CEis not closed under the bifaithful ﬁnite transductions.

(10)

4. Equality sets and recursively enumerable languages

The following result of Engelfriet and Rozenberg [3] gives a classical morphic representation of recursively enumerable languages; see also Salomaa [15] (see Th. 6.9, p. 111). Recall thatπ_A denotes the projection onto A^∗.

Theorem 4.1. For every recursively enumerable language L ⊆ A^∗, there are two morphismsh₁, h₂ and a regular languageR such thatL=π_A(E(h₁, h₂)∩R).

A slight modiﬁcation of its proof permits to strengthen this theorem:

Lemma 4.2. For every recursively enumerable language L ⊆ A^∗, there are two nonerasing morphisms h₁, h₂ and a regular language R such that L = π_A(E(h₁, h₂)∩R). Moreover, one can take R = KA^∗K where K and K are regular languages defined on an alphabet B disjoint fromA.

Proof. Assume ﬁrst that ε /∈ L. Let G = (N, A, P, S) be a type 0 grammar generatingL, where we can assume that the productions have no terminal letters on the right hand side,i.e., P ⊆N⁺×(N∪A)⁺. Let ¯A be a copy ofA that is disjoint from the other alphabets. Also, letV =N∪A¯and R=KA^∗K with

K=S₀(V^∗P V^∗)^∗ and K=F#^∗, whereS₀,,F and # are new symbols.

Let us deﬁne the morphismsh₁and h₂ by

S₀ p= (u, v) X ∈N a¯∈A¯ a∈A F #

h₁ S₀S v X a # # #

h₂ S₀ u X a a ##

Let us takeu∈π_A(E(h₁, h₂)∩R). Then there exists a wordz∈E(h₁, h₂)∩R such thath₁(z) =h₂(z), andu=π_A(z). Here

z=S₀z₁. . .z_nF#ⁱ,

wherez₁, . . . , z_n−1∈V^∗P V^∗,u=z_n, andi≥0. Hence, for 1≤j≤n−1, h₂(z_j) =⇒Gh₁(z_j),

h₁(S₀z1. . .zj) =h₂(S₀z1. . .zj+1).

Therefore,h₂(z₁) =Sandh₁(z_j) =h₂(z_j+1) for 1≤j≤n−1. So we obtain that S=h₂(z₁) =⇒G h₁(z₁) =h₂(z₂) =⇒G. . .=⇒Gh₁(z_n−2)

=h₂(z_n−1) =⇒Gh₁(z_n−1) =h₂(z_n) =u and thereforeu∈L.

(11)

Conversely, ifu∈L, then we have a derivation

S=w₁=⇒_Gw₂=⇒_G. . .=⇒_G w_n =u

according to the grammarG. For each 1≤j ≤n−1, one can ﬁndz_j ∈V^∗P V^∗ such thath₁(z_j) =w_j+1 andh₂(z_j) =w_j . Set then

z=S₀z₁. . .z_n−1uF#ⁱ⁺¹,

whereiis the length ofu. Thenz∈Rand one can easily check thath₁(z) =h₂(z).

Hence,u=π_A(z)∈π_A(E(h₁, h₂)∩R).

Finally, ifε∈L, setD=A∪ {d}, where dis a new symbol. Then Ld=π_D(E(h₁, h₂)∩KD^∗K) =π_D(E(h₁, h₂)∩KA^∗dK)

and hence L = π_A(E(h₁, h₂) ∩KA^∗dK). This completes the proof of the

lemma.

Note that the form of the regular languageR and the fact that the two morphisms are nonerasing are crucial for the proofs of the following lemmata. The proof of the following lemma uses the methods from [9].

Lemma 4.3. Let A andB be two disjoint alphabets and h₁, h₂: (A∪B)^∗ →C^∗ be two nonerasing morphisms. If K and K are two regular languages included in B⁺, then π_A(E(h₁, h₂)∩KA^∗K) = π_A(E_G(#, g₁, g₂)) for some nonerasing morphismsg₁ andg₂.

Proof. Let us take two nondeterministic finite automata M = (Q, B,∆, q₀, F), M = (Q, B,∆, q₀, F) such that L(M) =K and L(M) =K. The transitions are triples of the form (q, b, p), that is, ∆⊆Q×B×Qand ∆⊆Q×B×Q. Clearly, one can assume thatQ={q0, . . . , q_n}andQ={q₀, . . . , q_n}withQ∩Q=∅, and that the automata have unique final statesF={qn}andF={q_n}. Also, we can assume that there are no transitions (q, b, q₀) and (q, b, q₀) that enter the initial statesq₀ and q₀, and that there are no transitions (q_f, b, q) and (q_f, b, q) leaving from the final statesq_f andq_f.

First, we deﬁne three morphisms Θ,and ras follows.

Letθ: (A∪∆∪∆)^∗→(A∪B)^∗ be the morphism deﬁned by θ(a) =afora∈A, andθ((p, b, q)) =b for (p, b, q)∈∆∪∆.

Also, let=_d²ⁿ and r=r_d²ⁿ, that is, , r:C^∗ →(C∪ {d})^∗, where dis a new symbol, and

(c) =d²ⁿc, andr(c) =cd²ⁿ forc∈C.

(12)

The morphismg₂: (A∪∆∪∆)^∗→(C∪ {#, d})^∗ becomes deﬁned by g₂=rh₂θ.

It is immediate thatg₂(z)∈(Cd²ⁿ)^∗ for allz∈(A∪∆∪∆)^∗. The shift letter # does not belong to any image ofg₂. The notationwd^−mmeansw(d^m)⁻¹, that is, d^m is removed as a suﬃx of the wordw, and similarly d^−mwis an abbreviation for (d^m)⁻¹w. The morphismg₁: (A∪∆∪∆)^∗→(C∪ {#, d})^∗is deﬁned by

g₁((q₀, b, q_j)) = #rh₁(b)d^−2j,

g₁((q_i, b, q_j)) =d²ⁱrh₁(b)d^−2j fori= 0, g₁(a) =h₁(a) fora∈A,

g₁((q₀, b, q_n)) =h₁(b)d²ⁿ,

g₁((q₀, b, q_j)) =h₁(b)d^2j+1 forj =n, g₁((q_i, b, q_n)) =d^−(2i+1)h₁(b)d²ⁿ fori= 0,

g₁((q_i, b, q_j)) =d^−(2i+1)h₁(b)d^2j+1 fori= 0 andj=n.

The morphismg₁ decodes the behaviour of the combined automata that accepts the languageKA^∗K in the sense thatg₁(z)∈#(Cd²ⁿ)^∗ if and only ifz =uvu for some wordsu∈∆^∗,v∈A^∗, and u∈∆^∗such thatθ(u)∈K andθ(u)∈K. Therefore, we have

g₁(z)∈#(Cd²ⁿ)^∗ ⇐⇒ θ(z)∈KA^∗K. (6) Finally, let π = π_C be the projection π: (C∪ {#, d})^∗ → C^∗ that deletes the lettersdand #. Then we have

πg₁=h₁θ and πg₂=h₂θ. (7) Let v be a word in π_A(E_G(#, g₁, g₂)) and let z be such that v = π_A(z) and g₁(z) = #g₂(z). Sinceg₂(z)∈(Cd²ⁿ)^∗, alsog₁(z)∈(Cd²ⁿ)^∗, and it follows by (6) thatθ(z)∈KA^∗K. Consequently, by (7), we have

h₁θ(z) =πg₁(z) =π(#g₂(z)) =πg₂(z) =h₂θ(z).

Hence,θ(z)∈E(h₁, h₂)∩KA^∗K and alsov=π_Aθ(z)∈π_A(E(h₁, h₂)∩KA^∗K) as required.

Conversely, let v ∈ π_A(E(h₁, h₂)∩KA^∗K), say v = π_A(kvk) with k ∈ K, k ∈ K and h₁(kvk) = h₂(kvk). Then there exists a word z = uvu with u∈ ∆⁺, u ∈ ∆⁺, θ(u) = k, θ(u) = k, θ(z) = kvk and g₁(z) = #rh₁θ(z) =

#rh₂θ(z) = #g₂(z). Therefore,v =π_A(z)∈π_A(E_G(#, g₁, g₂)), which completes

the proof.