On ambiguity in DOS systems


RAIRO. INFORMATIQUE THÉORIQUE

ANDRZEJ EHRENFEUCHT, DAVID HAUSSLER, GRZEGORZ ROZENBERG

On ambiguity in DOS systems

RAIRO. Informatique théorique, tome 18, no 3 (1984), p. 279-295

<http://www.numdam.org/item?id=ITA_1984__18_3_279_0>

Access to the archives of the journal « RAIRO. Informatique théorique » implies agreement with the general conditions of use (http://www.numdam.org/conditions). Any commercial use or systematic printing constitutes a criminal offence. Any copy or printout of this file must contain the present copyright notice.

Article digitized as part of the programme Numérisation de documents anciens mathématiques

http://www.numdam.org/


R.A.I.R.O. Informatique théorique/Theoretical Informatics (vol. 18, n° 3, 1984, pp. 279-295)

ON AMBIGUITY IN DOS SYSTEMS (*)

by Andrzej EHRENFEUCHT (1), David HAUSSLER (2), and Grzegorz ROZENBERG (3)

Communicated by J. BERSTEL

Abstract. — A DOS system formalizes the notion of generative determinism in the framework of (somewhat modified) context free grammars. This paper investigates the concept of ambiguity in DOS systems; in particular, we concentrate on the decision problems related to ambiguity in DOS systems. We show that the following problems are undecidable for DOS systems.

1. Is there a word in a given regular language which is ambiguous in a given DOS system?

2. Are there two distinct words in a given DOS language (i.e. a language generated by a DOS system) which map to the same word under a given weak identity?

3. Is there a word in a given DOS language with two derivation trees which do not "share" any frontier nodes?

Actually we show that the above undecidability results hold for a strict subclass of the class of DOS systems (languages) defined by restricting ourselves to acyclic and propagating DOS systems.

Résumé (translated from the French). — A DOS system formalizes the notion of generative determinism in the framework of (somewhat modified) context free grammars. This article studies the concept of ambiguity in DOS systems; we are particularly interested in the decision problems related to ambiguity in DOS systems. We show that the following problems are undecidable for DOS systems.

1. Does there exist a word in a given regular language which is ambiguous in a given DOS system?

2. Do there exist two distinct words in a given DOS language (i.e. a language generated by a DOS system) which yield the same word under a given weak identity?

3. Does there exist a word in a given DOS language having two derivation trees which contain no common frontier node?

We show that these undecidability results in fact hold for a strict subclass of the class of DOS systems (languages) defined by restriction to acyclic and propagating DOS systems.

INTRODUCTION

DOS systems are defined in [3] to capture the notion of generative determinism in a class of sequential rewriting systems such as are found in context free grammars. A DOS system is given by a triple $G = \langle \Sigma, h, w \rangle$

(*) Received in October 1982, revised in March 1982.

(1) Department of Computer Science, University of Colorado, Boulder, Colorado 80302.

(2) Department of Mathematics and Computer Science, University of Denver, Denver, Colorado 80208.

(3) Institute of Applied Math. and Computer Science, University of Leiden, Leiden, The Netherlands.

R.A.I.R.O. Informatique théorique/Theoretical Informatics 0399-0540/84/03 279 17/$ 3.70/ © AFCET-Bordas


where $\Sigma$ is a finite alphabet, $h : \Sigma \to \Sigma^*$ and $w \in \Sigma^+$. As the notation suggests, DOS systems are intended to be a sequential analogue of D0L systems (see e.g. [9]). The language generated by $G$ consists of all words derivable from the "axiom" $w$ by successive applications of "productions" given by the function $h$.

There is no distinction between terminal and nonterminal letters, hence the study of DOS systems may also be considered a branch of the investigation into pure grammars (see e.g. [7, 10, 8]). Furthermore, addition of a terminal alphabet to DOS systems is unnecessary since it was shown in [3] that this does not increase the language generating power of DOS systems (except to allow the generation of the empty language).

This paper continues the research into DOS systems begun in [2], [3] and [4] by defining the concept of ambiguity in DOS systems and exploring some basic decision problems related to this notion. As in context free grammars, ambiguities arise in DOS systems when distinct derivation trees exist for the same word in the language. (Actually we consider derivation forests in DOS systems since the axiom for a DOS system is not necessarily a single letter word.) However, the study of ambiguity in DOS systems differs from the study of ambiguity in context free grammars in one major respect. Due to the generative determinism of a DOS system, all derivation forests within the system are subforests of a single infinite forest called the D-forest of the system. The D-forest of a DOS system is simply an infinite forest whose root nodes are a sequence of nodes labelled with the letters of the axiom of the system, and such that the successors of any node labelled $a$ are a sequence of nodes labelled with the letters of the word specified as the replacement for $a$.

The frontier of any derivation forest in a DOS system is a cut in the D-forest, where by a cut we mean a sequence of nodes in their left to right ordering from the D-forest such that there is exactly one node in the sequence for each infinite path from a root node of the D-forest. Conversely, each cut in the D-forest corresponds to the frontier of some derivation forest. Thus ambiguity in DOS systems is characterized (with some qualifications) by the existence of distinct but identically labelled cuts in the D-forest of the system.

This places the investigation of ambiguity in DOS systems into a combinatorial environment which is greatly simplified over that given for arbitrary context free grammars.

The paper is organized as follows:

In Section 1 we specify the basic notation used throughout the paper. We begin Section 2 of the paper by giving formal definitions of derivations and ambiguity in DOS systems, and inherent ambiguity of DOS systems. We then derive a simple lemma concerning ambiguity in APDOS systems. These are propagating DOS systems with no cycles of derivability among the letters of the alphabet (see [1]). Using APDOS systems, we give undecidability results for the following types of ambiguity problems.

1. Given a regular language $R$ and an (AP)DOS system $G$, is there a word $w \in R$ which is ambiguous in $G$? (Theorem 1.4).

2. Given a weak identity $\varphi$ and an (AP)DOS system $G$, are there two distinct words $w_1$ and $w_2$ derivable in $G$ such that $\varphi(w_1) = \varphi(w_2)$? (Theorem 1.6).

3. Is there a word derivable in a given (AP)DOS system by two derivation forests which do not "share" any frontier nodes? (Theorem 1.8).

The method used in obtaining the first two undecidability results is by reduction from Greibach's "two-way" version of the Post Correspondence Problem [5]. The third result is obtained by reduction from the emptiness of intersection problem for DOS languages, proven undecidable in [2].

SECTION 1: BASIC NOTATION

Throughout this paper $\Sigma$ denotes an arbitrary finite alphabet and $\Sigma^*$ denotes the free monoid with null word $\lambda$ generated by $\Sigma$. $\Sigma^+ = \Sigma^* - \{\lambda\}$. For $w \in \Sigma^*$, $|w|$ denotes the length of $w$ and $w^R$ denotes the reverse of $w$. $N$ denotes the set of natural numbers, including 0. For any set $S$, $P(S)$ denotes the set of all subsets of $S$ and $\mathrm{card}(S)$ denotes the cardinality of $S$. $\emptyset$ denotes the empty set.

A weak identity is a homomorphism $\varphi : \Sigma^* \to \Sigma^*$ where $\varphi(a) = a$ or $\varphi(a) = \lambda$ for any $a \in \Sigma$. For any alphabet $\Sigma$, $\overline{\Sigma}$ denotes the shadow alphabet for $\Sigma$, i.e. $\{\overline{a} : a \in \Sigma\}$. $\mathrm{iden} : (\Sigma \cup \overline{\Sigma})^* \to \Sigma^*$ is the unbarring homomorphism defined by $\mathrm{iden}(\overline{a}) = \mathrm{iden}(a) = a$ for any $a \in \Sigma$.

Given a DOS system $G = \langle \Sigma, h, w \rangle$, the D-forest of $G$ is an ordered forest with nodes labelled from $\Sigma \cup \{\lambda\}$ defined inductively as follows. The roots of the D-forest are a sequence of nodes labelled with the letters of $w$ from left to right. For any node labelled $a$ in the D-forest, where $a \in \Sigma$, its successors are a sequence of nodes labelled with the letters of $h(a)$ from left to right. If $h(a) = \lambda$ then the node has exactly one successor labelled $\lambda$, as does any node labelled $\lambda$.

A cut in the D-forest is a sequence $\tau$ of nodes in the D-forest such that on each infinite path starting from a root node there is exactly one node from $\tau$, and the order of the nodes in $\tau$ is their left to right order in the D-forest.

Given a function $h : \Sigma \to \Sigma^*$, $h_s : \Sigma^* \to P(\Sigma^*)$, the sequential extension of $h$, is defined by:

$h_s(\lambda) = \{\lambda\}$

and:

$h_s(a_1 \ldots a_k) = \{a_1 \ldots a_{i-1}\, h(a_i)\, a_{i+1} \ldots a_k : 1 \leq i \leq k\}$ for $a_1, \ldots, a_k \in \Sigma$, $k \geq 1$.

A DOS system is a triple $G = \langle \Sigma, h, w \rangle$ where $\Sigma$ is a finite, nonempty alphabet, $h : \Sigma \to \Sigma^*$, and $w \in \Sigma^+$. $h$ is called the underlying mapping of $G$ and $w$ is called the axiom of $G$. The language of $G$, denoted $L(G)$, is defined by $L(G) = \{x : x \in (h_s)^n(w) \text{ for some } n \geq 0\}$, where $(h_s)^0$ is the identity function.
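The definitions of the sequential extension and of $L(G)$ can be sketched in Python as follows. This is only an illustrative sketch: the function names, the representation of letters as single characters and of $h$ as a dict, and the convention that the empty word is mapped to itself are assumptions made here. The length-bounded search is sound only when $h$ is propagating, so that words never shrink.

```python
def sequential_extension(h, word):
    """h_s(word): every word obtained by rewriting exactly one letter of `word`."""
    if not word:
        return {word}  # assumed convention: the empty word is mapped to itself
    return {word[:i] + h[word[i]] + word[i + 1:] for i in range(len(word))}


def language_up_to(h, axiom, max_len):
    """All words of L(G) of length <= max_len, assuming h never erases a letter."""
    seen, frontier = {axiom}, [axiom]
    while frontier:
        u = frontier.pop()
        for v in sequential_extension(h, u):
            # Pruning by length is safe because a propagating h never shortens a word.
            if len(v) <= max_len and v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen
```

For instance, with `h = {"a": "ab", "b": "b", "c": "bc"}` and axiom `"ac"`, the call `language_up_to(h, "ac", 5)` collects the words of the form $ab^jc$ with at most five letters.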

A sequence of distinct letters $\langle a_1, \ldots, a_k \rangle$, $a_i \in \Sigma$ for $1 \leq i \leq k$, is a cycle in $h$ if $k > 1$, $h(a_i) = a_{i+1}$ for all $1 \leq i < k$ and $h(a_k) = a_1$. A DOS system is acyclic if $h$ contains no cycles and propagating if $h(a) \neq \lambda$ for all $a \in \Sigma$. We use the prefixes A and P to denote that a DOS system is acyclic or propagating respectively. A DOS, PDOS or APDOS language is any language generated by a DOS, PDOS or APDOS system, respectively.
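The two properties just defined are easy to test mechanically. The following is a minimal sketch, assuming $h$ is given as a dict from single-character letters to strings (the function names are hypothetical). Note that, by the definition above, an identity production $h(a) = a$ is not a cycle, since a cycle needs $k > 1$ distinct letters.

```python
def is_propagating(h):
    """True if no letter is rewritten to the empty word."""
    return all(len(image) > 0 for image in h.values())


def is_acyclic(h):
    """True if there is no cycle a1 -> a2 -> ... -> ak -> a1 of k > 1 distinct letters."""
    for start in h:
        seen = [start]
        current = start
        # Only productions whose right-hand side is a single letter can form a cycle.
        while len(h[current]) == 1:
            current = h[current]
            if current == start:
                if len(seen) > 1:
                    return False  # came back to `start` through other letters
                break             # h(start) == start: identity production, not a cycle
            if current in seen:
                break             # ran into a loop not containing `start`
            seen.append(current)
    return True
```

A loop that does not pass through `start` is found when the outer iteration reaches one of its own letters, so every cycle is eventually reported.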

Given a DOS system $G = \langle \Sigma, h, w \rangle$ and words $u, v \in \Sigma^*$, we say $u$ derives $v$ in one step, written $u \Rightarrow_G v$, if $v \in h_s(u)$. We say $u$ derives $v$ (in one or more steps), written $u \Rightarrow_G^* v$, if $v \in (h_s)^n(u)$ for some $n \geq 0$.

We adopt the concepts of a node, labelled ordered forest, successor and path from the theory of directed graphs. Formal definitions can be found in any standard graph theoretical text. For any labelled graph $G$, $l_G$ denotes the labelling function on $G$. The subscript $G$ will be omitted when the graph in question is clear from the context.

SECTION 2: RESULTS

The simplest approach to ambiguity is to define a DOS system to be ambiguous if its D-forest contains two distinct cuts for the same word.

However, this approach leads to trivial instances of ambiguity in systems in which the underlying mapping $h$ contains a cycle, or in which $h(a) = a$ for some letter $a$. In this case we will have a profusion of cuts for the same word simply because we can "rederive" certain words ad infinitum by iterating a cycle or "identity production" within a derivation. This situation can be avoided by restricting our attention to "minimal" cuts in the D-forest.

DEFINITION: Given a D-forest $F$ and two cuts $c_1$ and $c_2$ in $F$, $c_1 \leq c_2$ if and only if for every infinite path in $F$, the node of $c_1$ on this path is the same as the node of $c_2$ on this path, or the node of $c_1$ is closer to the root node of this path than the node of $c_2$. $c_1 < c_2$ if and only if $c_1 \leq c_2$ and $c_1 \neq c_2$. Given a word $w$, $c$ is a minimal cut for $w$ in $F$ if and only if $l(c) = w$ and there does not exist a cut $c_1$ such that $l(c_1) = w$ and $c_1 < c$.

DEFINITION: A word $w$ is ambiguous in a given DOS system $G$ if and only if there exist two distinct minimal cuts for $w$ in the D-forest of $G$. $G$ is ambiguous if and only if there exists a word $w \in L(G)$ which is ambiguous in $G$.

A DOS language $T$ is inherently ambiguous if and only if every DOS system $G$ such that $L(G) = T$ is ambiguous.

A simple example of an ambiguous DOS system is the system:

$G = \langle \{a, b, c\}, h, ac \rangle$ where $h(a) = ab$, $h(b) = b$ and $h(c) = bc$.

Here $L(G) = ab^*c$ and each word in $L(G) - \{ac\}$ is ambiguous in $G$. Notice that this language is not inherently ambiguous. We can generate it unambiguously by simply modifying $h$ so that $h(c) = c$ or $h(a) = a$ (but not both). On the other hand, it is clear that the DOS language $L = a^+$ is inherently ambiguous, since the only DOS system which generates $L$ is the system $G = \langle \{a\}, h, a \rangle$ where $h(a) = aa$, and the word $aaa$ is ambiguous in $G$.

To investigate ambiguity in DOS systems, we will need a simple method of describing specific cuts in a given D-forest and of determining whether or not they are minimal.

DEFINITION: Given a DOS system $G = \langle \Sigma, h, w \rangle$ and $x \in \Sigma^*$, a derivation for $x$ in $G$ is an ordered sequence:

$D = \langle w_1, \ldots, w_k \rangle$ where $k \geq 1$, $\mathrm{iden}(w_1) = w$, $w_k = x$ and for each $1 \leq i < k$ there exist $w_i', w_i'' \in \Sigma^*$ and $\overline{a} \in \overline{\Sigma}$ such that:

$w_i = w_i'\, \overline{a}\, w_i''$ and $\mathrm{iden}(w_{i+1}) = w_i'\, h(a)\, w_i''$.

We write $\mathrm{iden}(D)$ for the sequence $\langle \mathrm{iden}(w_1), \ldots, \mathrm{iden}(w_k) \rangle$. The cut for the $i$-th word in $D$, written $c(i, D)$, is defined inductively as follows.

1. $c(1, D)$ is the ordered sequence of root nodes in the D-forest for $G$.

2. For all $1 \leq i < k$, if the $n$-th letter of $w_i$ is the barred letter, then $c(i + 1, D)$ is obtained from $c(i, D)$ by replacing the $n$-th node of $c(i, D)$ with the ordered sequence of its successors in the D-forest of $G$.

The cut for the last word of $D$, $c(k, D)$, will be abbreviated as $c(D)$. We will say that $D$ is a derivation for the cut $c(D)$.
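As a small illustration of these definitions, consider again the example system $G = \langle \{a, b, c\}, h, ac \rangle$ with $h(a) = ab$, $h(b) = b$ and $h(c) = bc$ given earlier. The node names below are ad hoc notation introduced only for this illustration: $r_1, r_2$ stand for the two root nodes and $r.j$ for the $j$-th successor of node $r$. Two derivations for $abc$ and their cuts are then:

```latex
\begin{align*}
D_1 &= \langle \overline{a}c,\; abc \rangle, & c(1, D_1) &= (r_1,\, r_2), & c(D_1) &= (r_1.1,\; r_1.2,\; r_2),\\
D_2 &= \langle a\overline{c},\; abc \rangle, & c(1, D_2) &= (r_1,\, r_2), & c(D_2) &= (r_1,\; r_2.1,\; r_2.2).
\end{align*}
```

Both cuts are labelled $abc$, but they consist of different nodes of the D-forest.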


LEMMA 1.1: For any DOS system $G$:

(i) For any $w \in \Sigma^*$, $w \in L(G)$ if and only if there is a derivation of $w$ in $G$.

(ii) For any derivation $D$ in $G$, $c(D)$ is a cut in the D-forest of $G$ and conversely, for any cut $c$ in the D-forest of $G$ there is a derivation for $c$ in $G$.

Proof: This is obvious. □

In the present investigation, we will concentrate on the special subclass of DOS systems known as APDOS systems, more extensively studied in [3]. (A formal definition appears in the previous section.) Both of the examples of ambiguous systems given above were in fact APDOS systems. For this basic subclass of the DOS systems, we have a simple characterization of the derivations which produce minimal cuts in the D-forest.

DEFINITION: A derivation $\langle w_1, \ldots, w_k \rangle$ is elementary if and only if $\mathrm{iden}(w_i) \neq \mathrm{iden}(w_{i+1})$ for all $1 \leq i < k$.

LEMMA 1.2: Let $D$ be a derivation in an APDOS system $G$. Then $c(D)$ is minimal if and only if $D$ is elementary.

Proof: Let $G = \langle \Sigma, h, w \rangle$ and let $D = \langle w_1, \ldots, w_k \rangle$. If $D$ is not elementary then at some step in $D$, a letter $a \in \Sigma$ is replaced with itself, so $h(a) = a$. It is obvious that none of the cuts for any succeeding step of the derivation will be minimal since each will contain some successor of this node generated when $a$ is replaced by $a$ and each such successor is simply another node labelled $a$. In particular, $c(D)$ will not be minimal.

On the other hand, suppose that $c(D)$ is not minimal. Then there exists a cut $c < c(D)$ such that $l(c) = w_k$. In this case it is clear that we can find a derivation $D' = \langle u_1, \ldots, u_n, \ldots, u_m \rangle$ in $G$ such that:

$c(D') = c(D)$ and $c(n, D') = c$.

Since $G$ is propagating:

$|u_j| \geq |u_i|$ for all $n \leq i \leq j \leq m$.

Hence since $\mathrm{iden}(u_n) = u_m$:

$|u_i| = |u_j|$ for all $n \leq i, j \leq m$.

This implies that in each step of the derivation of $u_m$ from $\mathrm{iden}(u_n)$ in $D'$ a single letter replacement occurs. Hence we can find $x_i, y_i \in \Sigma^*$ and $a_i \in \Sigma$ for $n \leq i \leq m$ such that:

$u_n = x_n\, \overline{a_n}\, y_n$ and $\mathrm{iden}(u_i) = x_i\, a_i\, y_i$ for all $n \leq i \leq m$

and:

$|x_i| = |x_j|$ for all $n \leq i, j \leq m$.

Since $a_m = a_n$ and $a_i \Rightarrow_G^* a_{i+1}$ for all $n \leq i < m$, the fact that $G$ is acyclic implies that $a_i = a_j$ for all $n \leq i, j \leq m$. We conclude that there exists a letter $a \in \Sigma$ such that $c(D)$ contains a node labelled $a$ which is a proper descendant of another node labelled $a$. Thus any derivation for the cut $c(D)$ must involve some step where $a$ is replaced by $a$. Thus there exists $1 \leq i < k$ such that $\mathrm{iden}(w_i) = \mathrm{iden}(w_{i+1})$ and hence $D$ is not elementary. □
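Lemma 1.2 suggests a direct way to list the minimal cuts of an APDOS system up to a length bound: follow only elementary steps (never rewrite a letter $a$ with $h(a) = a$) and record which nodes end up on the frontier. The sketch below does this; the function names and the encoding of a node as its path of child indices from a root are assumptions made for illustration, and the search is only guaranteed to terminate for acyclic, propagating $h$.

```python
from collections import defaultdict


def minimal_cuts(h, axiom, max_len):
    """Map each word of length <= max_len to its cuts reachable by elementary derivations."""
    # A cut is a tuple of (address, label) pairs; an address is a tuple of child indices.
    start = tuple(((i,), a) for i, a in enumerate(axiom))
    seen, stack = {start}, [start]
    while stack:
        cut = stack.pop()
        for n, (addr, a) in enumerate(cut):
            if h[a] == a:
                continue  # identity production: the step would not be elementary
            children = tuple((addr + (j,), b) for j, b in enumerate(h[a]))
            nxt = cut[:n] + children + cut[n + 1:]
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    by_word = defaultdict(set)
    for cut in seen:
        word = "".join(label for _, label in cut)
        by_word[word].add(tuple(addr for addr, _ in cut))
    return by_word


def ambiguous_words(h, axiom, max_len):
    """Words of length <= max_len having at least two distinct minimal cuts."""
    cuts = minimal_cuts(h, axiom, max_len)
    return sorted(w for w, cs in cuts.items() if len(cs) > 1)
```

Applied to the two example systems given earlier, this reports every word of $ab^*c$ other than $ac$ (within the bound) as ambiguous, and reports $aaa$ as the shortest ambiguous word of the system with $h(a) = aa$.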

In the remainder of this paper, we present several undecidability results for problems concerned with ambiguity in APDOS systems. The first two results will utilize reductions to some form of the following problem, a variant of the "two way" Post Correspondence Problem which is defined and proven undecidable in [5].

DEFINITION: The $k$-substitution problem is defined as follows. Given $n, k \in N$, a finite alphabet $\Sigma$ and a substitution $S : \{1, \ldots, n\}^* \to P(\Sigma^+)$ such that $\mathrm{card}(S(i)) = k$ for all $1 \leq i \leq n$, does there exist a word $w \in \{1, \ldots, n\}^*$ such that $\mathrm{card}(S(w)) < k^{|w|}$? Such a word $w$ is called a solution to the given $k$-substitution problem.

PROPOSITION 1.3: The 2-substitution problem is undecidable.

Proof: The 2-substitution problem is obviously equivalent to the two-way correspondence problem defined in [5] if we restrict our attention to $w \in \{1, \ldots, n\}^+$. On the other hand, if $w = \lambda$, then $S(w) = \{\lambda\}$, and thus $\mathrm{card}(S(w)) = 1 = 2^{|w|}$. Hence $\lambda$ is never a solution to the 2-substitution problem and can safely be ignored. Thus the result follows from the undecidability of the two-way correspondence problem, established in [5]. □
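To make the definition concrete, the following sketch computes $S(w)$ and searches index words up to a given length for a solution. The names `substitute` and `find_solution` and the dict encoding of $S$ are hypothetical choices made here. Since the problem is undecidable, a bounded search of this kind can confirm that a solution exists but can never establish that none does.

```python
from itertools import product


def substitute(S, w):
    """S(w): all concatenations choosing one word from S(i) for each index i of w."""
    return {"".join(choice) for choice in product(*(S[i] for i in w))}


def find_solution(S, k, max_len):
    """Return an index word w with 1 <= |w| <= max_len and card(S(w)) < k**|w|, or None."""
    for length in range(1, max_len + 1):
        for w in product(sorted(S), repeat=length):
            if len(substitute(S, w)) < k ** length:
                return w
    return None
```

For example, with $S(1) = \{a, aa\}$ the word $w = 11$ is a solution, because $a \cdot aa = aa \cdot a$ and so $S(11)$ has only three elements instead of four.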

It is obvious that given a single word, or a finite set of words, we can decide if any of these words are ambiguous in a given APDOS system. However this is not the case if we are given an arbitrary regular set, as is shown in our next theorem.

THEOREM 1.4: It is undecidable whether or not there exists a word in a given regular set R which is ambiguous in a given APDOS System.

Proof: We will transform an instance of the 2-substitution problem into an equivalent instance of the above problem. Assume that $S : \{1, \ldots, n\}^* \to P(\Sigma^+)$ is an instance of the 2-substitution problem with:

$S(i) = \{x_i, y_i\}$ for all $1 \leq i \leq n$.

Let the alphabets:

$A = \{a_1, \ldots, a_n\}$, $B = \{b_1, \ldots, b_n\}$ and $N = \{1, \ldots, n\}$

be given such that $A$, $B$, $N$ and $\Sigma$ are pairwise disjoint. Let $\Delta = A \cup B \cup N \cup \Sigma$, and let $\alpha = a_1 \ldots a_n b_1 \ldots b_n$.

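Let $f, g : (\Sigma \cup N)^* \to \Delta^*$ be the homomorphisms defined by:

$f(a) = \alpha a$

and:

$g(a) = a \alpha$ for $a \in \Sigma \cup N$.

Let $h : \Delta \to \Delta^*$ be defined by:

$h(a_i) = a_i \ldots a_n b_1 \ldots b_n\, g(x_i)\, i\, a_1 \ldots a_i$

and:

$h(b_i) = b_i \ldots b_n\, g(y_i)\, i\, a_1 \ldots a_n b_1 \ldots b_i$ for all $1 \leq i \leq n$

and:

$h(c) = c$ for $c \in \Sigma \cup N$.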

Obviously $h$ is propagating and acyclic. Let $G$ be the APDOS system $\langle \Delta, h, \alpha \rangle$ and let $R$ be the regular set $(\alpha\Sigma)^* \alpha (N\alpha)^*$. We claim that there exists $x \in R \cap L(G)$ such that $x$ is ambiguous in $G$ if and only if there exists $w \in N^*$ such that $\mathrm{card}(S(w)) < 2^{|w|}$. When this claim is established, our result will follow directly, using Proposition 1.3.

Examining the system $G$ it is apparent that given a word $r = p \alpha q$, where $p, q \in \Delta^*$, any of the words in:

$\{\, p\, f(x_j)\, \alpha\, g(j)\, q,\;\; p\, f(y_j)\, \alpha\, g(j)\, q : 1 \leq j \leq n \,\}$

can be derived from $r$ in one step. To derive the word $p f(x_j) \alpha g(j) q$ we replace the $a_j$ in the substring $\alpha$ of $p \alpha q$ and to derive $p f(y_j) \alpha g(j) q$ we replace $b_j$.
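The one-step property just described can be checked mechanically. The sketch below assumes that $h(a_i)$ and $h(b_i)$ have the form forced by that property, namely $h(a_i) = a_i \ldots a_n b_1 \ldots b_n\, g(x_i)\, i\, a_1 \ldots a_i$ and $h(b_i) = b_i \ldots b_n\, g(y_i)\, i\, a_1 \ldots a_n b_1 \ldots b_i$. The token encoding is also an assumption made here: words are tuples, with `"a1"`, `"b1"` for letters of $A$ and $B$, `"n1"` for the letter 1 of $N$, and single characters for letters of $\Sigma$.

```python
def build_system(pairs):
    """pairs[i-1] = (x_i, y_i). Returns (h, alpha, f, g) over tuples of tokens."""
    n = len(pairs)
    A = [f"a{i}" for i in range(1, n + 1)]
    B = [f"b{i}" for i in range(1, n + 1)]
    alpha = tuple(A + B)

    def f(word):  # homomorphism with f(c) = alpha c
        return tuple(t for c in word for t in alpha + (c,))

    def g(word):  # homomorphism with g(c) = c alpha
        return tuple(t for c in word for t in (c,) + alpha)

    h = {}
    for i, (x, y) in enumerate(pairs, start=1):
        idx = (f"n{i}",)  # the letter i of the alphabet N
        h[f"a{i}"] = tuple(A[i - 1:] + B) + g(tuple(x)) + idx + tuple(A[:i])
        h[f"b{i}"] = tuple(B[i - 1:]) + g(tuple(y)) + idx + tuple(A + B[:i])
    return h, alpha, f, g


def rewrite_in_alpha(h, alpha, letter):
    """Rewrite the single occurrence of `letter` inside the word alpha."""
    pos = alpha.index(letter)
    return alpha[:pos] + h[letter] + alpha[pos + 1:]
```

Letters of $\Sigma \cup N$ are fixed by $h$ and therefore do not need entries in the dict for this check. Rewriting $a_j$ in $\alpha$ should give $f(x_j)\, \alpha\, g(j)$, and rewriting $b_j$ should give $f(y_j)\, \alpha\, g(j)$.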

Let $D = \langle w_1, \ldots, w_k \rangle$ be an elementary derivation in $G$ of a word $w_k \in R$. We claim that for each $i$, $1 \leq i < k$, there exist:

$w_i' \in (\alpha\Sigma)^*$, $w_i'' \in (N\alpha)^*$, $\alpha_i', \alpha_i'' \in (A \cup B)^*$ and $c_i \in A \cup B$

such that:

$\alpha = \alpha_i'\, c_i\, \alpha_i''$ and $w_i = w_i'\, \alpha_i'\, \overline{c_i}\, \alpha_i''\, w_i''$.

Since $\mathrm{iden}(w_1) = \alpha$, the claim is true for $i = 1$.

Assume that the claim holds for some $i < k - 1$. Hence $\mathrm{iden}(w_{i+1}) \in R$. Let us say that given any word $r \in R$, the "middle $\alpha$" of $r$ is that occurrence of $\alpha$ in $r$ such that:

$r = r_1\, \alpha\, r_2$ where $r_1 \in (\alpha\Sigma)^*$ and $r_2 \in (N\alpha)^*$.

Since $D$ is elementary, in any step of $D$ a letter from $A \cup B$ is rewritten. If any letter in any occurrence of $\alpha$ which is not the middle $\alpha$ of $\mathrm{iden}(w_{i+1})$ is rewritten, the result is a word which has a letter from $N$ occurring before a letter from $\Sigma$. It is impossible to derive a word in $R$ from such a word, hence it must be the case that the barred letter in $w_{i+1}$ occurs in the middle $\alpha$. Thus by induction on $i$, our claim holds for all $i$, $1 \leq i < k$.

It follows from the above observations that for each elementary derivation $D$ in $G$ of a word in $R$, there exist:

$z_1, \ldots, z_k \in \Sigma^*$, $i_1, \ldots, i_k \in N$

such that:

$\mathrm{iden}(D) = \langle \alpha,\; f(z_1)\, \alpha\, g(i_1),\; \ldots,\; f(z_1 \ldots z_k)\, \alpha\, g(i_k \ldots i_1) \rangle$

where:

$z_j \in S(i_j)$ for all $1 \leq j \leq k$.

Hence $L(G) \cap R = \{ f(x)\, \alpha\, g(w^R) : w \in N^*,\ x \in \Sigma^* \text{ and } x \in S(w) \}$ using Lemma 1.1.

Now, let us assume that we are given $w$ such that $\mathrm{card}(S(w)) < 2^{|w|}$. Hence there exist:

$k > 0$, $i_1, \ldots, i_k \in N$ and $z, u_1, \ldots, u_k, v_1, \ldots, v_k \in \Sigma^+$

such that:

$i_1 \ldots i_k = w$, $z = u_1 \ldots u_k = v_1 \ldots v_k$

where:

$\{u_j, v_j\} \subseteq S(i_j)$ for all $1 \leq j \leq k$ and $u_j \neq v_j$ for at least one $j$.

Consider the word $t = f(z)\, \alpha\, g(w^R) \in R$. We can find:

$\alpha_0, \ldots, \alpha_{k-1}, \alpha_0', \ldots, \alpha_{k-1}'$ where $\mathrm{iden}(\alpha_i) = \mathrm{iden}(\alpha_i') = \alpha$ for all $0 \leq i \leq k - 1$

and:

$D_1 = \langle \alpha_0,\; f(u_1)\, \alpha_1\, g(i_1),\; \ldots,\; f(u_1 \ldots u_{k-1})\, \alpha_{k-1}\, g(i_{k-1} \ldots i_1),\; t \rangle$

and:

$D_2 = \langle \alpha_0',\; f(v_1)\, \alpha_1'\, g(i_1),\; \ldots,\; f(v_1 \ldots v_{k-1})\, \alpha_{k-1}'\, g(i_{k-1} \ldots i_1),\; t \rangle$

are elementary derivations of $t$ in $G$. Let $j$ be the smallest integer such that $u_j \neq v_j$. Hence:

$c(i, D_1) = c(i, D_2)$ for $i \leq j$, but $c(j + 1, D_1) \neq c(j + 1, D_2)$.

Since all further replacements in either derivation occur in descendants of the middle $\alpha$'s introduced in the $j$-th step, it is apparent that:

$c(i, D_1) \neq c(i, D_2)$ for all $i > j$.

Hence $c(D_1) \neq c(D_2)$. Since both derivations are elementary, $t$ is ambiguous in $G$ by Lemma 1.2.

On the other hand, let us assume that there exists $t \in R \cap L(G)$ such that $t$ is ambiguous in $G$. Then by the above observations we can find $k > 0$, $u_1, \ldots, u_k, v_1, \ldots, v_k \in \Sigma^+$, $i_1, \ldots, i_k \in N$ and elementary derivations $D_1$ and $D_2$ as given above such that for some $i$, $0 \leq i \leq k - 1$, $\alpha_i \neq \alpha_i'$. This implies that $u_{i+1} \neq v_{i+1}$ since a different letter is replaced at this point in the derivation. Thus since:

$\{u_j, v_j\} \subseteq S(i_j)$ for all $1 \leq j \leq k$,

$i_1 \ldots i_k$ is a solution to the 2-substitution problem. This establishes our claim, and completes the proof. □

In [4] it was shown that by using DOS languages, along with the operations of intersection and weak identity, one obtains a hierarchy of classes of languages, at the bottom of which is the class of images of DOS languages under weak identity mappings and at the top of which is the class of recursively enumerable languages. Hence it is natural to consider the "ambiguities" created by the application of a weak identity to the words of a DOS system. We will show that for an arbitrary APDOS system $G$ and weak identity $\varphi$, it is undecidable in general whether or not there exist distinct words $w_1, w_2 \in L(G)$ such that $\varphi(w_1) = \varphi(w_2)$. To do this, we will need a slightly stronger result concerning the undecidability of the 2-substitution problem.

DEFINITION: An instance of the 2-substitution problem with $S : \{1, \ldots, n\}^* \to P(\Sigma^+)$ is extension-free if:

$S(w) \cap S(wx) = \emptyset$ and $S(w) \cap S(xw) = \emptyset$ for any $w \in \{1, \ldots, n\}^*$ and $x \in \{1, \ldots, n\}^+$.

LEMMA 1.5: The 2-substitution problem is undecidable in general for extension-free instances.

Proof: We need only verify that Greibach's encoding of the Post Correspondence Problem in the proof of the undecidability of the "two way" correspondence problem in [5] always yields an extension-free instance of the 2-substitution problem. This encoding, given as a 2-substitution problem, can be given as follows.

Let $P = \langle x_1, y_1 \rangle, \ldots, \langle x_n, y_n \rangle$ be an instance of the Post Correspondence Problem with words from $\Sigma^+$ and let "*", "#" and "&" be letters not appearing in $\Sigma$. Let $f, g : \Sigma^* \to (\Sigma \cup \{*\})^*$ be homomorphisms given by:

$f(a) = a*$ and $g(a) = *a$.

Let:

S: { l , . . be the 2-substitution given by:

and

S(i + l n ) = { * & ^(Xj)^#*, • & • / ( y * ) » * } for

Greibach's arguments show that the 2-substitution problem for $S$ has a non-null solution if and only if $P$ has a solution. Since there is never a null solution to the 2-substitution problem (see the proof of Proposition 1.3), we need only show that:

$S(w) \cap S(wx) = \emptyset$ and $S(w) \cap S(xw) = \emptyset$ for any $w \in \{1, \ldots, 4n\}^*$ and $x \in \{1, \ldots, 4n\}^+$ (irrespective of $P$).

Let us fix $P$ arbitrarily as above and assume that $S(w) \cap S(wx) \neq \emptyset$ for some $w, x$ as above. Let:

$w = i_1 \ldots i_k$ and $x = i_{k+1} \ldots i_l$

where $i_j \in \{1, \ldots, 4n\}$ for $1 \leq j \leq l$. Then there exist:

$z, u_{i_1}, \ldots, u_{i_k}, v_{i_1}, \ldots, v_{i_l}$ such that $z = u_{i_1} \ldots u_{i_k} = v_{i_1} \ldots v_{i_l}$

where:

$u_{i_j} \in S(i_j)$ for $1 \leq j \leq k$

and:

$v_{i_j} \in S(i_j)$ for $1 \leq j \leq l$.

We may assume that the length of $w$ is chosen as small as possible, and thus $k$ is as small as possible. Clearly, $w$ cannot be null, so $k > 0$. Since $k$ is minimal, $u_{i_1} \neq v_{i_1}$. By examining $S$, it is clear that this implies that $n + 1 \leq i_1 \leq 2n$ or $3n + 1 \leq i_1 \leq 4n$. In the latter case however, the "#" symbols must appear aligned in $u_{i_1}$ and $v_{i_1}$, forcing these words to be equal. Thus $n + 1 \leq i_1 \leq 2n$, which implies that $|u_{i_1}| - |v_{i_1}| \equiv 1 \bmod 2$. Hence it is not possible that $i_j \leq n$ for all $2 \leq j \leq l$, because both words in $S(i_j)$ have even length for any $i_j \leq n$. Hence there must either be a first occurrence of the symbol "#", or a second occurrence of the symbol "&" (counting from the left) in $z$. Let $t$ be the index in $u_{i_1} \ldots u_{i_k}$ where the first (leftmost) of these events occurs. If the first event is a second occurrence of "&" in $u_{i_t}$, then because the second "&"s must be aligned in $u_{i_1} \ldots u_{i_k}$ and $v_{i_1} \ldots v_{i_l}$, it is clear by examining $S$ that we must have $u_{i_1} \ldots u_{i_{t-1}} = v_{i_1} \ldots v_{i_{t-1}}$, contradicting the minimality of $k$. Similarly, if the first of the above events is an occurrence of "#" in $u_{i_t}$, then $u_{i_1} \ldots u_{i_t} = v_{i_1} \ldots v_{i_t}$, again contradicting our assumption. Thus $S(w) \cap S(wx) = \emptyset$. Assuming $S(w) \cap S(xw) \neq \emptyset$, the argument is similar, due to the symmetry in $S$. □

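THEOREM 1.6: Given an APDOS system $G$ and a weak identity $\varphi$, it is undecidable whether or not there exist distinct words $w_1, w_2 \in L(G)$ such that $\varphi(w_1) = \varphi(w_2)$.

Proof: We will transform an extension-free instance of the 2-substitution problem into an equivalent instance of the above problem. Let $n$, $S$, $A$, $B$, $N$, $\Sigma$, $\Delta$, $\alpha$, and $x_i, y_i$ for all $1 \leq i \leq n$ be as given in the proof of Theorem 1.4.

Let $h : \Delta \to \Delta^*$ be defined by:

$h(a_i) = a_i \ldots a_n b_1 \ldots b_n\, x_i\, \alpha\, i\, a_1 \ldots a_i$

and:

$h(b_i) = b_i \ldots b_n\, y_i\, \alpha\, i\, a_1 \ldots a_n b_1 \ldots b_i$ for $1 \leq i \leq n$

and:

$h(c) = c$ for $c \in \Sigma \cup N$.

Obviously $h$ is propagating and acyclic. Let $G$ be the APDOS system $\langle \Delta, h, \alpha \rangle$.

Let $\varphi : \Delta^* \to (\Sigma \cup N)^*$ be the weak identity defined by:

$\varphi(a) = \lambda$ if $a \in A \cup B$;

$\varphi(a) = a$ otherwise.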


We claim that there exist $w_1, w_2 \in L(G)$ such that $w_1 \neq w_2$ but $\varphi(w_1) = \varphi(w_2)$ if and only if there exists $w \in N^*$ such that $\mathrm{card}(S(w)) < 2^{|w|}$. When this claim is established, our result will follow directly, using Lemma 1.5.

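Let $Q$ be the alphabet $\{\, (_i,\ )_i,\ [_i,\ ]_i : 1 \leq i \leq n \,\}$ and let $T$ be the semi-Dyck language generated from the context free grammar $\langle Q \cup \{S\}, Q, p, S \rangle$ where:

$p = \{ S \to SS,\ S \to \lambda \} \cup \{ S \to (_i\, S\, )_i,\ S \to [_i\, S\, ]_i : 1 \leq i \leq n \}$.

Let $f, g : Q^* \to \Delta^*$ be the homomorphisms defined by:

$f((_i) = \alpha x_i$, $f([_i) = \alpha y_i$, $f()_i) = f(]_i) = \alpha i$ for $1 \leq i \leq n$

and

$g((_i) = x_i \alpha$, $g([_i) = y_i \alpha$, $g()_i) = g(]_i) = i \alpha$ for $1 \leq i \leq n$.

We will establish that $L(G) = f(T)\alpha$. First, observe that $\alpha = f(\lambda)\alpha \in L(G)$. Now, assume that $f(w_1 w_2)\alpha = f(w_1)\, \alpha\, g(w_2) \in L(G)$, where $w_1 w_2 \in T$. This implies that:

$P = \{\, f(w_1\, (_i\, )_i\, w_2)\, \alpha,\;\; f(w_1\, [_i\, ]_i\, w_2)\, \alpha : 1 \leq i \leq n \,\} \subseteq L(G)$.

Thus since $T$ is the semi-Dyck language over the alphabet $Q$, it is easily established by induction that $f(T)\alpha \subseteq L(G)$.

On the other hand, it is clear that $\alpha$, the axiom of $G$, is in $f(T)\alpha$. Now assume that $w \in L(G)$ is also an element of $f(T)\alpha$. For any $w'$ derived from $w$ in one step we can find $w_1, w_2 \in Q^*$ such that:

$w = f(w_1)\, \alpha\, g(w_2)$ and $w' \in P \cup \{w\}$, where $P$ is as above. Hence $L(G) \subseteq f(T)\alpha$. Thus $L(G) = f(T)\alpha$.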

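Now let us assume that there exists $w \in N^*$ such that $\mathrm{card}(S(w)) < 2^{|w|}$. Hence there exist:

$k > 0$, $i_1, \ldots, i_k \in N$ and $z, u_1, \ldots, u_k, v_1, \ldots, v_k \in \Sigma^+$

such that:

$w = i_1 \ldots i_k$, $z = u_1 \ldots u_k = v_1 \ldots v_k$

where:

$\{u_j, v_j\} \subseteq S(i_j)$ for all $1 \leq j \leq k$

and $u_j \neq v_j$ for at least one $1 \leq j \leq k$. Since $L(G) = f(T)\alpha$, the words:

$w_1 = \alpha u_1 \ldots \alpha u_k\, \alpha i_k \ldots \alpha i_1\, \alpha$

and:

$w_2 = \alpha v_1 \ldots \alpha v_k\, \alpha i_k \ldots \alpha i_1\, \alpha$,

each of which is the image under $f$ of a suitable word of $T$ followed by $\alpha$, are both in $L(G)$. Since $u_i \neq v_i$ for some $1 \leq i \leq k$, $w_1 \neq w_2$. However, since $u_1 \ldots u_k = v_1 \ldots v_k$, $\varphi(w_1) = \varphi(w_2)$. Thus the "if" part of our claim is established.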

Now assume that there exist $w_1$ and $w_2 \in L(G)$ such that $w_1 \neq w_2$ but $\varphi(w_1) = \varphi(w_2)$. Choose $w_1$ and $w_2$ such that the minimum of $|w_1|$ and $|w_2|$ is as small as possible. Since $L(G) = f(T)\alpha$, $w_1 \neq w_2$ and $\varphi(w_1) = \varphi(w_2)$, we can find:

$u_1, \ldots, u_k, v_q, \ldots, v_k \in \Sigma^+$, $i_1, \ldots, i_k, j_1, \ldots, j_m \in N$ and $p_1, p_2 \in \Delta^*$

such that:

$w_1 = p_1\, \alpha u_1 \ldots \alpha u_k\, \alpha i_k \ldots \alpha i_1\, \alpha j_1 \ldots \alpha j_m\, \alpha$

and:

$w_2 = p_2\, \alpha v_q \ldots \alpha v_k\, \alpha i_k \ldots \alpha i_1\, \alpha j_1 \ldots \alpha j_m\, \alpha$ (or vice versa)

where:

(i) for $i \in \{1, 2\}$ either $p_i = \lambda$ or $p_i$ ends with a letter from $N$ and:

(ii) $u_l \in S(i_l)$ for $1 \leq l \leq k$ and $v_l \in S(i_l)$ for $q \leq l \leq k$.

Since $\varphi(w_1) = \varphi(w_2)$, (i) implies that:

$u_1 \ldots u_k = v_q \ldots v_k$ and $\varphi(p_1) = \varphi(p_2)$.

Thus because $S$ is extension-free, we must have $q = 1$. If $u_i = v_i$ for all $1 \leq i \leq k$ then we must have $p_1 \neq p_2$. However, then since $L(G) = f(T)\alpha$:

$w_1' = p_1\, \alpha j_1 \ldots \alpha j_m\, \alpha$ and $w_2' = p_2\, \alpha j_1 \ldots \alpha j_m\, \alpha$

are both in $L(G)$. This would imply that $w_1'$ and $w_2'$ are words shorter than $w_1$ and $w_2$ with $w_1' \neq w_2'$ but $\varphi(w_1') = \varphi(w_2')$, contradicting our assumption of the minimality of the lesser of $|w_1|$ and $|w_2|$. Hence there exists $1 \leq j \leq k$ where $u_j \neq v_j$. Using (ii) with $q = 1$, this implies that $\mathrm{card}(S(i_1 \ldots i_k)) < 2^{|i_1 \ldots i_k|}$, which establishes the "only if" part of our claim, and completes the proof. □
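The "if" direction of this argument can be replayed on a small instance. The sketch below assumes the form of $h$ that the one-step behaviour of the construction forces: rewriting $a_i$ (respectively $b_i$) inside an occurrence of $\alpha$ turns that occurrence into $\alpha x_i \alpha\, i\, \alpha$ (respectively $\alpha y_i \alpha\, i\, \alpha$). The token encoding (`"a1"`, `"b1"`, `"n1"`, single characters for $\Sigma$) and all function names are illustrative assumptions.

```python
def make_system(pairs):
    """pairs[i-1] = (x_i, y_i). Returns (h, alpha) over tuples of tokens."""
    n = len(pairs)
    A = [f"a{i}" for i in range(1, n + 1)]
    B = [f"b{i}" for i in range(1, n + 1)]
    alpha = tuple(A + B)
    h = {}
    for i, (x, y) in enumerate(pairs, start=1):
        tail = alpha + (f"n{i}",)  # alpha followed by the letter i of N
        h[f"a{i}"] = tuple(A[i - 1:] + B) + tuple(x) + tail + tuple(A[:i])
        h[f"b{i}"] = tuple(B[i - 1:]) + tuple(y) + tail + tuple(A + B[:i])
    return h, alpha


def derive(pairs, choices):
    """Rewrite the innermost alpha once per choice (kind, i), with kind 'a' or 'b'."""
    h, alpha = make_system(pairs)
    n = len(pairs)
    word, mid = alpha, 0  # mid: start position of the alpha to be rewritten next
    for kind, i in choices:
        pos = mid + (i - 1 if kind == "a" else n + i - 1)
        word = word[:pos] + h[f"{kind}{i}"] + word[pos + 1:]
        chosen = pairs[i - 1][0 if kind == "a" else 1]
        mid += len(alpha) + len(chosen)  # skip "alpha chosen" to reach the new inner alpha
    return word


def phi(word):
    """Weak identity erasing the letters of A and B and fixing Sigma and N."""
    return tuple(t for t in word if not (len(t) > 1 and t[0] in "ab"))
```

With $S(1) = \{a, aa\}$, the choices $x_1 y_1$ and $y_1 x_1$ give two distinct words of $L(G)$ with the same image under $\varphi$. This instance is not extension-free, which does not matter here because only the "if" direction is being illustrated.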

In defining our notion of ambiguity in DOS systems, we have restricted our attention to words represented by distinct cuts in the D-forest of the system. A natural refinement of this notion of ambiguity is to insist that the cuts be completely distinct, in that they do not share the same node for any letter in the word.

DEFINITION: Given a DOS system $G$, a word $w \in L(G)$ is strongly ambiguous in $G$ if and only if there exist distinct minimal cuts $c_1$ and $c_2$ for $w$ such that for each letter of $w$, the node for this letter in $c_1$ is distinct from the node for this letter in $c_2$. A DOS system is strongly ambiguous if its language contains a strongly ambiguous word.

We show that strong ambiguity is an undecidable property of APDOS systems. In obtaining this result, we will use the undecidability of the emptiness of intersection problem for cofunctional DOS systems, demonstrated in [2].

DEFINITION: Two DOS systems:

$G_1 = \langle \Sigma_1, h_1, w_1 \rangle$ and $G_2 = \langle \Sigma_2, h_2, w_2 \rangle$

are cofunctional if and only if $\Sigma_1 = \Sigma_2$ and $h_1 = h_2$.

LEMMA 1.7: The emptiness of intersection problem is undecidable for cofunctional APDOS systems.

Proof: This follows directly from the reasoning in [2] since all the DOS systems used there are in fact APDOS systems. □

THEOREM 1.8: It is undecidable whether or not a given APDOS system is strongly ambiguous.

Proof: Let $G_1 = \langle \Sigma, h, w_1 \rangle$ and $G_2 = \langle \Sigma, h, w_2 \rangle$ be two cofunctional APDOS systems. Let $\Delta = \{a, b, c\}$ be a new alphabet such that $\Sigma \cap \Delta = \emptyset$.

Let $f : (\Sigma \cup \Delta) \to (\Sigma \cup \Delta)^*$ be defined by:

$f(b) = b$, $f(a) = a b w_1 b$, $f(c) = b w_2 b c$

and $f(d) = h(d)$ for any $d \in \Sigma$. Obviously $f$ is propagating and acyclic.

Let $G$ be the APDOS system $\langle \Sigma \cup \Delta, f, ac \rangle$.

We claim that there exists a word $w \in L(G)$ which is strongly ambiguous in $G$ if and only if $L(G_1) \cap L(G_2) \neq \emptyset$. Establishing this claim, our result follows directly from Lemma 1.7.
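The construction can be tried out on small cofunctional systems. The sketch below builds the mapping $f$ from $h$, $w_1$ and $w_2$, enumerates the cuts reachable by elementary steps up to a length bound (these are the minimal cuts in an APDOS system, by Lemma 1.2), and looks for two cuts of the same word that share no node position by position. All function names and the encoding of nodes as paths of child indices are assumptions made for illustration; the letters a, b, c are assumed not to occur in $h$.

```python
from itertools import combinations


def combined_mapping(h, w1, w2):
    """The mapping f of the construction, built on top of h."""
    f = dict(h)
    f.update({"a": "ab" + w1 + "b", "b": "b", "c": "b" + w2 + "bc"})
    return f


def cuts_by_word(f, axiom, max_len):
    """Map each word of length <= max_len to its cuts reachable by elementary steps."""
    start = tuple(((i,), x) for i, x in enumerate(axiom))
    seen, stack = {start}, [start]
    while stack:
        cut = stack.pop()
        for n, (addr, x) in enumerate(cut):
            if f[x] == x:
                continue  # skip identity productions: only elementary steps
            kids = tuple((addr + (j,), y) for j, y in enumerate(f[x]))
            nxt = cut[:n] + kids + cut[n + 1:]
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    table = {}
    for cut in seen:
        word = "".join(x for _, x in cut)
        table.setdefault(word, set()).add(tuple(addr for addr, _ in cut))
    return table


def strongly_ambiguous_words(f, axiom, max_len):
    """Words with two cuts whose nodes differ at every letter position."""
    return sorted(
        word
        for word, cuts in cuts_by_word(f, axiom, max_len).items()
        if any(all(p != q for p, q in zip(c1, c2)) for c1, c2 in combinations(cuts, 2))
    )
```

For $h(d) = ee$, $h(e) = e$, $w_1 = d$ and $w_2 = ee$, the languages $L(G_1) = \{d, ee\}$ and $L(G_2) = \{ee\}$ intersect, and the word $abeebc$ is found to be strongly ambiguous. For $h(d) = de$, $h(e) = e$, $w_1 = d$ and $w_2 = e$, the two languages are disjoint and the bounded search finds no strongly ambiguous word.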

First, suppose that there exists $w \in L(G_1) \cap L(G_2)$. Then we can find:

$w_1' \in \mathrm{iden}^{-1}(w_1)$, $w_2' \in \mathrm{iden}^{-1}(w_2)$, $D_1 = \langle \overline{a}c,\; a b w_1' b c,\; \ldots,\; abwbc \rangle$

and:

$D_2 = \langle a\overline{c},\; a b w_2' b c,\; \ldots,\; abwbc \rangle$

such that $D_1$ and $D_2$ are two distinct derivations of $abwbc$ in $G$ where $\overline{c}$ never appears in $D_1$ and $\overline{a}$ never appears in $D_2$. This implies that the set of nodes appearing in $c(D_1)$ is disjoint from the set of nodes appearing in $c(D_2)$, which establishes the "if" part of our claim.

On the other hand assume that there exists a word $w$ which is strongly ambiguous in $G$. It is readily verified that $L(G) \subseteq a (b \Sigma^* b)^* c$, thus we can find:

$k \geq 0$ and $u_1, \ldots, u_k \in \Sigma^*$ such that $w = a b u_1 b \ldots b u_k b c$.

In any derivation of $w$, the subwords of the form $b \Sigma^* b$ must be created by first rewriting the letter $a$ or the letter $c$ and then rewriting letters from $\Sigma$ zero or more times. Thus in any derivation of $w$, the total number of times the letter $a$ or the letter $c$ is rewritten is $k$. In deriving an instance of $w$ with any given minimal cut $c$, we may assume that in the derivation all replacements of $a$ appear before replacements of $c$ which in turn appear before all replacements of letters in $\Sigma$ and no replacements of $b$ with $b$ occur. Since $w$ is strongly ambiguous, we must have $k > 0$.

Let:

$D_1 = \langle \overline{a} v_1 c,\; \ldots,\; \overline{a} v_i c,\; a x_1 \overline{c},\; \ldots,\; a x_j \overline{c},\; a y_1 c,\; \ldots,\; a y_t c \rangle$

and:

$D_2 = \langle \overline{a} v_1' c,\; \ldots,\; \overline{a} v_l' c,\; a x_1' \overline{c},\; \ldots,\; a x_m' \overline{c},\; a y_1' c,\; \ldots,\; a y_n' c \rangle$

be two derivations of $w$ in the style described above, such that the first node of $c(D_1)$ is distinct from the first node of $c(D_2)$. Hence $i \neq l$. Without loss of generality, we may assume that $i < l \leq k$. This implies that in $D_1$, the substring $u_{i+1}$ is derived from the word $w_2$ using replacements given by $h$ and in $D_2$, $u_{i+1}$ is derived from $w_1$ using replacements given by $h$. Thus $L(G_1) \cap L(G_2) \neq \emptyset$. This establishes the "only if" part of our claim, and completes the proof. □

Since the class of APDOS systems is contained within the class of DOS systems, the following is an immediate corollary of Theorems 1.4, 1.6 and 1.8.

COROLLARY 1.9: The following problems are undecidable for an arbitrary DOS system $G$.

1. Is there a word in an arbitrary regular language which is ambiguous in $G$?

2. Are there two distinct words $w_1$ and $w_2$ in $L(G)$ such that $\varphi(w_1) = \varphi(w_2)$ for an arbitrary weak identity $\varphi$?

3. Is $G$ strongly ambiguous? □

In view of these results, it would be somewhat surprising if there were an algorithm to decide whether or not an arbitrary DOS system is ambiguous. However, we have not been able to show that no such algorithm exists, hence this remains the primary open problem in this area.

ACKNOWLEDGEMENTS

We would like to express our thanks to the referee for his many suggestions on improving this presentation, and especially for pointing out an error in our original proof of Theorem 1.6. The simple example of an inherently ambiguous DOS language we have given is also due to the referee. In addition, we gratefully acknowledge the support of NSF grant MCS 79-03838.

REFERENCES

1. A. EHRENFEUCHT, D. HAUSSLER, G. ROZENBERG and P. ZEIGER, On DOS Languages and DOS Mappings, in preparation.

2. A. EHRENFEUCHT and G. ROZENBERG, On the Emptiness of the Intersection of Two DOS Languages Problem, Information Processing Letters, Vol. 10, 1980, pp. 223-225.

3. A. EHRENFEUCHT and G. ROZENBERG, On Basic Properties of DOS Systems and Languages, Information and Control, Vol. 47, 1980, pp. 137-153.

4. A. EHRENFEUCHT and G. ROZENBERG, Representation Theorems Using DOS Languages, Acta Informatica, to appear.

5. S. GREIBACH, The Undecidability of the Ambiguity Problem for Minimal Linear Grammars, Information and Control, Vol. 6, 1963, pp. 119-125.

6. M. HARRISON, Introduction to Formal Language Theory, Addison-Wesley, 1978.

7. T. HARJU and M. PENTTONEN, Some Decidability Problems of Sentential Forms, International Journal of Computer Mathematics, Vol. 7, 1979, pp. 95-108.

8. H. A. MAURER, A. SALOMAA and D. WOOD, Pure Grammars, McMaster University, Computer Science Technical Report No. 79-CS-7, 1979.

9. G. ROZENBERG and A. SALOMAA, The Mathematical Theory of L Systems, Academic Press, New York-London, 1980.

10. A. SALOMAA, On Sentential Forms of Context Free Grammars, Acta Informatica, Vol. 2, 1973, pp. 40-49.
