Trimming Visibly Pushdown Automata

(1)

Trimming Visibly Pushdown Automata

Mathieu Caralp, Pierre-Alain Reynier, Jean-Marc Talbot

Laboratoire d’Informatique Fondamentale de Marseille UMR 7279, Aix-Marseille Universit´e & CNRS, France

Abstract

We study the problem of trimming visibly pushdown automata (VPA). We first describe a polynomial time procedure which, given a visibly pushdown automaton that accepts only well-nested words, returns an equivalent visibly pushdown automaton that is trimmed. We then show how this procedure can be lifted to the setting of arbitraryVPA.

Furthermore, we present a way of building, given aVPA, an equivalentVPAwhich is both deterministic and trimmed.

Last, our trimming procedures can be applied to weightedVPA.

Keywords: Visibly Pushdown Automata, Trimming

1. Introduction

Visibly pushdown automata (VPA) are a particular class of pushdown automata defined over an alphabet split into call, internal and return symbols [2, 3]¹. InVPA, the stack behavior is driven by the input word: when reading a call symbol, a symbol is pushed onto the stack, for a return symbol, the top symbol of the stack is popped, and for an internal symbol, the stack remains unchanged.VPAhave been applied in research areas such as software verification (VPAallow one to model function calls and returns, thus avoiding the study of data flows along invalid paths) and XML documents processing (VPAcan be used to model properties over words satisfying a matching property between opening and closing tags).

Languages defined by visibly pushdown automata enjoy many properties of regular languages such as (effective) closure by Boolean operations and these languages can always be defined by a deterministic visibly pushdown automaton. However,VPAdo not have a unique minimal form [1]. Instead of minimization, one may consider trimming as a way to deal with smaller automata. Trimming a finite state automaton amounts to removing useless states,i.e.

states that do not occur in some accepting computation of the automaton: every state of the automaton should be both reachable from an initial state, and co-reachable from a final state. This property is important from both a practical and a theoretical point of view. Indeed, most of the algorithmic operations performed on an automaton will only be relevant on the trimmed part of this automaton. Removing useless states may thus avoid the study of irrelevant paths in the automaton, and speed up the analysis. From a theoretical aspect, there are several results holding for automata provided they are trimmed. For instance, the boundedness of finite-state automata with multiplicities can be char- acterized by means of simple patterns for trimmed automata (see [14, 9]). Similarly, Choffrut introduced in [8] the twinning property to characterize sequentiality of (trimmed) finite-state transducers. This result was later extended to weighted finite-state automata in [6]. Both of these results have been extended to visibly pushdown automata and transducers in [7] and [10] respectively, requiring these objects to be trimmed.

While trimming finite state automata can be done easily in linear time by solving two reachability problems in the graph representing the automaton, the problem is much more involved forVPA(and for pushdown automata in general). Indeed, in this setting, the current state of a computation (called a configuration) is given by both a “control”

state and a stack content. A procedure has been presented in [12] for pushdown automata. It consists in computing, for

Email addresses:mathieu.caralp@lif.univ-mrs.fr(Mathieu Caralp),pierre-alain.reynier@lif.univ-mrs.fr (Pierre-Alain Reynier),jean-marc.talbot@lif.univ-mrs.fr(Jean-Marc Talbot)

1These automata were first introduced in [5] as “input-driven automata”.

(2)

each state, the regular language of stack contents that are both reachable and co-reachable, and using this information to constrain the behaviors of the pushdown automaton in order to trim it. This approach has however an exponential time complexity.

Contributions. In this work, we present a procedure for trimming visibly pushdown automata. The running time of this procedure is bounded by a polynomial in the size of the inputVPA. We first tackle the case ofVPArecognizing only so-calledwell-nestedwords,i.e. words which have no unmatched call or return symbols. This class ofVPAis called well-nestedVPA, and denoted bywnVPA. We actually present a construction for reducingwnVPA,i.eensuring that every run starting from an initial configuration can be completed into an accepting run. Symmetrically, we define a construction for co-reduction acting in a dual fashion. Combination of both constructions yields a trimming procedure.

In a second step, we address the general case. To do so, we present a construction which modifies aVPAin order to obtain awnVPA. This construction has to be reversible, in order to recover the original language, and to be compatible with the trimming procedure. In addition, we also design this construction in such a way that it allows to prove the following result: given aVPA, we can effectively build an equivalentVPAwhich is both deterministic and trimmed.

Moreover, when considering deterministic inputs, we prove that there is no trimming procedure running in polynomial time while preserving determinism. Finally, we show that our constructions can be applied to weightedVPA.

Organization of the paper. In Section 2 we introduce definitions. We address the case of well-nestedVPAin Section 3 and the general case in Section 4. We consider the issue of determinization in Section 5 and the extension to weighted VPAin Section 6. Last, a summary of our results is presented in Section 7.

Related models. VPAare tightly connected to several models:

Context-free grammars: it is well-known that pushdown automata are equivalent to context-free grammars. This observation yields the following procedure for trimming pushdown automata². One can first translate the automaton into an equivalent context-free grammar, then eliminate from this grammar variables generating the empty language or not reachable from the start symbol, and third convert the resulting grammar into the pushdown automaton performing its top-down analysis. This construction has a polynomial time complexity but, in this form, it does not apply toVPA.

Indeed, the resulting pushdown automaton may not satisfy the condition of visibility as the third step may not always produce rules respecting the constraints on push and pop operations associated with call and return symbols.

Tree automata: by the standard interpretation of XML documents as unranked trees,VPAcan be understood as acceptors of unranked tree languages. It is shown in [2] that they actually do recognize precisely the set of regular (ranked) tree languages, using the encoding of so-calledstack-trees, which is similar to the first-child next-sibling encoding. Trimming ranked tree automata is standard (and can be performed in linear time), and one can wonder whether this approach could yield a polynomial time trimming procedure for VPA. Actually, going through tree automata would not ease the construction of a trimmedVPA. Indeed, trimming the first-child next-sibling encoding of awnVPA, and then translating back the result into awnVPAyields an automaton which is reduced but not trimmed (this is intuitively due to the fact that the first-child next-sibling encoding realizes a left-to-right traversal of the tree).

Moreover, this construction does not ensure a bijection between accepting runs, a property that is useful when moving to weightedVPA.

Nested word automata [3]: this model is equivalent to that ofVPA. One could thus rephrase our constructions in this context, and obtain the same results.

2. Definitions 2.1. Preliminaries

Words and well-nested words. Astructured alphabetΣis a finite set partitioned into three disjoint setsΣ_c,Σ_rand Σ_ι, denoting respectively thecall,returnandinternalalphabets. We denote byΣ^˚the set of words overΣand byε the empty word.

The set ofwell-nestedwordsΣ^˚_wnis the smallest subset ofΣ^˚such thatεPΣ^˚_wnand for allaPΣ_ι,cPΣ_c,rPΣ_r and allu, vPΣ^˚_wn,auPΣ^˚_wnandcurvPΣ^˚_wn.

2We thank G´eraud S´enizergues for pointing us this construction.

(3)

Given a family of elementse1, e2, . . . , e_n, we denote byΠⁿ

i“1e_ithe concatenatione1e2. . . e_n. The length of a worduis denoted by|u|. For a tuplee“ pf1, . . . , f_lq, we define for alliP t1, . . . , luthe projectionπ_ipeqasf_i. We extend this notion of projection to sequences asπipśn

k“1ekq “śn

k“1πipekq

Visibly pushdown automata (VPA). Visibly pushdown automata are a restriction of pushdown automata in which the stack behavior is imposed by the input word. On a call symbol, theVPApushes a symbol onto the stack, on a return symbol, it must pop the top symbol of the stack, and on an internal symbol, the stack remains unchanged. The only exception is that some return symbols may operate on the empty stack.

Definition 2.1 (Visibly pushdown automata). A visibly pushdown automaton(VPA) on finite words overΣ is a tupleA“ pQ, I, F,Γ, δqwhereQis a finite set of states,I ĎQis the set of initial states,F ĎQis the set of final states,Γis a finite stack alphabet andδ“δ_cYδ_rYδ^K_r Yδ_ιis the (finite) transition relation withδ_c,δ_r,δ^K_r andδ_ιfour disjoint sets, withδ_cĎQˆΣ_cˆΓˆQ,δ_rĎQˆΣ_rˆΓˆQ,δ_r^KĎQˆΣ_rˆ tKu ˆQ, andδ_ι ĎQˆΣ_ιˆQ.

For a transitiont“ pq, α, γ, q¹qfromδ_c,δ_rorδ_r^Kort“ pq, α, q¹qinδ_ι, we denote bysourceptqandtargetptqthe statesqandq¹respectively, and byletterptqthe symbolα.

Astack is a word from Γ^˚ and we denote byKthe empty word on Γ. Aconfiguration κof aVPA is a pair pq, σq PQˆΓ^˚.

Definition 2.2. A run ofAon a wordw“α1. . . αlPΣ^˚over a finite (and possibly empty) sequence of transitions ptkq1ďkďlfrom a configurationpq, σqto a configurationpq¹, σ¹qis a finite sequence of symbols and configurations ρ“ pq0, σ0qΠ^l

k“1pαkpqk, σkqqsuch thatpq, σq “ pq0, σ0q,pq¹, σ¹q “ pql, σlq, and, for each1ďkďl, there exists γk PΓsuch that either:

• tk“ pq_k´1, αk, γk, qkq Pδcandσk“σ_k´1γk, or

• t_k“ pqk´1, α_k, γ_k, q_kq Pδ_randσ_k´1“σ_kγ_k, or

• t_k“ pqk´1, α_k,K, qkq Pδ_r^Kandσ_k´1“σ_k“ K, or

• t_k“ pqk´1, α_k, q_kq Pδ_ιandσ_k “σ_k´1.

We denote byRun_wpAqthe set of runs ofAover the wordwand byRunpAqthe set of runs ofA. Note that a run for the empty word is simply any configuration. We writeρ:pq, σqÝÑ pq^w ¹, σ¹qwhen there exists a runρoverwfrom pq, σqtopq¹, σ¹q. We may omit the superscriptwwhen irrelevant. For this runρ, we denote byfirstpρqthe configurationpq, σqand bylastpρqthe configurationpq¹, σ¹q. Given two runsρi “κⁱ0Π^ℓⁱ

k“1pαⁱ_kpκⁱ_kqqforiP t1,2u, we define the concatenationρ1ρ2of these runs, provided thatlastpρ1q “firstpρ2q, asρ1ρ2“κ¹0Π^ℓ¹

k“1pα¹_kκ¹_kqΠ^ℓ²

k“1pα²_kκ²_kq.

Initial and final configurations are configurations of the formpq,KqwithqPIandpq, σqwithqPF respectively.

A run isinitialized if it starts in an initial configuration and it is accepting if it is initialized and ends in a final configuration. We denote byARunwpAqthe set of accepting runs ofAover the wordw. A word is accepted byAiff there exists an accepting run ofAon it. The language ofA, denoted byLpAq, is the set of words accepted byA.

A configurationκisreachable from a configurationκ¹ if there exists a worduinΣ^˚such thatκ¹ ÝÑ^u κ; in this case we say also thatκ¹isco-reachable fromκ. We say that a configurationκ¹isreachableif there exists an initial configurationκ_isuch thatκ¹is reachable fromκ_iandco-reachableif there exists an final configurationκ_fsuch that κ¹is co-reachable fromκ_f.

Definition 2.3. AVPAA“ pQ, I, F,Γ, δqisdeterministicifIis a singleton and the following conditions hold:

• ifpp, c, γ, qq,pp, c, γ¹, q¹q Pδ_cthenγ“γ¹andq“q¹,

• ifpp, a, qq,pp, a, q¹q Pδιorpp, r,K, qq,pp, r,K, q¹q Pδ_r^Korpp, r, γ, qq,pp, r, γ, q¹q Pδrtthenq“q¹. A VPAAis said to bewell-nestedifLpAq ĎΣ^˚_wn. The class of well-nestedVPAis denoted bywnVPA.

Remark 2.1. IfAis well-nested then its final configurationspf, σq(withf a final state) are reachable only ifσ“ K.

(4)

2.2. (Co)-reduced and trimmed VPA

Definition 2.4. LetAbe aVPA. Let us consider the three following conditions : (1) every reachable configuration is co-reachable.

(2) every configuration co-reachable from some reachable and final configuration is reachable.

(3) for every stateq, there exists an accepting run going through a configurationpq, σq.

We say that the automatonAisreducedif it fulfills conditionsp1qandp3q, andco-reducedif it fulfills conditionsp2q andp3q. It istrimmedif it is both reduced and co-reduced. It isweakly reducedif it fulfills conditionp1qandweakly co-reducedif it fulfills conditionp2q.

κi κf

κ

ñ

κi κf

κ κ¹_i

Figure 1: Ifκis co-reachable from a final configuration κf,κf being reachable from an intial configurationκi, thenκis reachable from an initial configurationκ¹_i.

Observe in Figure 1 that conditionp2qlooks more compli- cated than the property stating that every co-reachable configuration is reachable. Indeed, unlike in finite state-automata, the presence of a stack requires one to focus on reachable final configurations, and not to consider arbitrary final configuration.

However, we will see that forwnVPA, this condition is equivalent to the simpler one.

Observe that condition p3q simply corresponds to the re- moval of states that are useless, which can easily be done in polynomial time:

Proposition 2.1. ([4]) LetA“ pQ, I, F,Γ, δqbe aVPA. For any stateqPQ, one can build in polynomial time from Atwo finite state automataBq andCq over the alphabetΓsuch thatLpBqq “ tσ PΓ^˚ | pq, σqis reachable inAu andLpCqq “ tσPΓ^˚| pq, σqis co-reachable inAu.

The property of being co-reduced can be rephrased when consideringwnVPA:

Proposition 2.2. LetA “ pQ, I, F,Γ, δqbe awnVPAsuch that for all statef P F there exists an accepting run leading to the configurationpf,Kq. ThenAisweakly co-reducedif and only if every configuration co-reachable from some configurationpf,Kqwithf PFis reachable.

PROOF. By Remark 2.1 and the assumption onAin the proposition, the set of reachable final configurations ofAis

tpf,Kq |f PFu. l

3. Trimming well-nestedVPA

In this section, given awnVPAA, we present the construction of awnVPA trimwnpAq, which recognizes the same language, and in addition is trimmed. First we define the reducedwnVPA reducepAqwhich is equivalent toA. Next we will presentcoreducepAqand lastly we will combine these two procedures in order to producetrimwnpAq.

3.1. Construction ofwreducepAqandreducepAq

For awnVPAA, we describe the construction of the weakly reducedVPA wreducepAq. TheVPA reducepAqis simply obtained by removing useless states ofwreducepAqusing Proposition 2.1. In fact, ifLpBqq XLpCqq “ H thenqcan be removed because there is no accepting run going through a configurationpq, σqinA.

Consider awnVPAA“ pQ, I, F,Γ, δq. We defineWNas the settpp, qq PQˆQ| Dpp,Kq Ñ pq,Kq PRunpAqu.

This set can be computed in quadratic time as the least one satisfying

• tpp, pq |pPQu ĎWN,

• ifpp, p¹q PWNandpp¹, p²q PWN, thenpp, p²q PWN

• ifpp, qq PWN, andDpq, i, q¹q Pδι, thenpp, q¹q PWN

(5)

• ifpp, qq PWNandDpp¹, c, γ, pq Pδ_c,pq, r, γ, q¹q Pδ_r, thenpp¹, q¹q PWN

Definition 3.1. Given awnVPAA “ pQ, I, F,Γ, δq, we define thewnVPA wreducepAq “ pQ¹, I¹, F¹,Γ¹, δ¹qwith Q¹“WN,I¹“WNX pIˆFq,F¹ “ tpf, fq |f PFu,Γ¹“ΓˆQ, andδ¹defined by its restrictions on call, return³ and internal symbols respectively (namelyδ_c¹,δ_r¹ andδ_ι¹):

• δ¹_c“

"

ppp, qq, c,pγ, qq,pp¹, q¹qq pp, qq,pp¹, q¹q PQ¹,pp, c, γ, p¹q Pδ_c

DrPΣ_r,DsPQ,pq¹, r, γ, sq Pδ_randps, qq PQ¹

*

• δ¹_r“ t ppq¹, q¹q, r,pγ, qq,pp, qqq | pp, qq PQ¹,pq¹, r, γ, pq Pδru

• δ¹_ι“ t ppp, qq, a,pp¹, qqq | pp, qq,pp¹, qq PQ¹,pp, a, p¹q Pδ_ιu

pp, σq

pp¹, σ.γq pq¹, σ.γq

ps, σq pq, σq c

w1

r

w2

Figure 2: Construction of call transitions.

Intuitively, the states (and the stack) ofwreducepAqextend those ofAwith an additional state ofA. This extra component is used bywreducepAq, when simulating a run of theVPAA, to store the target state that the run should reach to pop the symbol on top of the stack. The source states of return transitions are of the formpq, qqto ensure that the target state is reached when popping the top of the stack. To obtain a weakly reducedVPA, we require for the call transitions the existence of a matching return transition that allows one to reach the target state. This condition is depicted on Figure 2, and we give an example of the construction in Figure 3.

1 2 3

c{γ

r{γ c1{γ1

c2{γ² r{γ²

r{γ¹

1,1 2,2

3,3 2,3 c{γ,1

r{γ,1 r{γ²,3

r{γ1,2 c1{γ1,2

c2{γ²,3 r{γ1,3 c1{γ¹,3

Figure 3: On the left aVPAA, on the rightreducepAq. There exists an initialized run ofAovercc2which cannot be completed into an accepting run. This run is no longer present inreducepAq.

For the rest of this subsection, we assume fixed somewnVPAA“ pQ, I, F,Γ, δqand denoteAwredthewnVPA wreducepAq.

TheVPAAwredsatisfies that :

Lemma 3.1. For allwPΣ^˚_wnand allρ¹:ppp, qq,KqÝ^wÑ ppp¹, q¹q,Kq PRunpAwredq,we haveq“q¹.

PROOF. By induction on the structure ofw; we show the property forw“εfor the base case. The induction step is done overw“cw1rw2andw“aw1 withw1, w2 PΣ^˚_wn,c PΣ_c,r PΣ_randaP Σ_ι assuming the property on

w1, w2. l

We now relate accepting runs ofAand ofAwred: we define a mappingΦ_wredfromRunpAwredqtoRunpAq. For any runρ¹“ ppp0, q0q, σ0qΠ^k_i“₁pαipppi, qiq, σiqqofAwred, we letΦ_wredpρ¹qbe the runpp0, π1pσ0qqΠ^k_i“₁pαippi, π1pσiqqq.

Lemma 3.2. For allwPΣ^˚and all runsρPRunwpAwredq, we haveΦ_wredpρq PRunwpAq.

PROOF. By induction on the structure ofw. l

3As the language is well-nested, we do not consider return transitions on the empty stack.

(6)

The next two lemmas prove the injectivity and surjectivity ofΦ_wred.

Lemma 3.3. For allw P Σ^˚_wn, if there exists a runρ : pp,Kq ÝÑ pq,^w KqinRunpAq, then there exists a runρ¹ : ppp, qq,KqÝ^wÑ ppq, qq,Kq PRunpAwredqsuch thatΦ_wredpρ¹q “ρ.

PROOF. By induction on the structure ofwusing Lemma 3.2. l

Lemma 3.4. For allwPΣ^˚_wnand allρ¹1, ρ¹2PRunwpAwredq, ifΦ_wredpρ¹1q “Φ_wredpρ¹2qandlastpρ¹1q “lastpρ¹2q, then ρ¹1“ρ¹2.

PROOF. By induction on the structure ofwusing Lemma 3.2 and Lemma 3.1. l Theorem 3.1. LetAbe awnVPA, and letAwred“wreducepAqandAred“reducepAq.AwredandAredcan be built in polynomial time, and satisfy:

p1q for allwPΣ^˚_wn, there exist bijections betweenARunwpAwredqandARunwpAq, andARunwpAredqandARunwpAq and thus in particularLpAq “LpAwredq “LpAredq,

p2q Awredis weakly reduced, andAredis reduced.

PROOF. We prove that when restricted toARun_wpAwredq, the mappingΦ_wredis bijective.

• Φ_wredis surjective: it is easy to verify that applying Lemma 3.3 on an accepting run ofAyields an accepting run ofAwred, which is in its inverse image by the mappingΦ_wred.

• Φ_wredis injective: this is an immediate corollary of Lemma 3.4 as last states of accepting runs ofAwredare of the formpf, fq. Thus, if two accepting runsρ¹1, ρ¹2have the same image byΦ_wred, they must verifylastpρ¹1q “ lastpρ¹2q.

We now prove thatwreducepAqis weakly reduced that is, that any reachable configuration ofAwred is also co- reachable.

Letκ“ ppp, qq, σqbe a reachable configuration ofAwred. There exists a runρofAwredof the formppi, fq,Kq Ñκ starting in an initial configurationppi, fq,Kq. We show by induction on the size of the stackσthat we can reach a final configuration fromκ.

If|σ| “0, by Lemma 3.1, we obtainq“f. In particular, this impliesqPF. Sincepp, qq PWN, there is a run pp,Kq Ñ pq,KqinA, thus by Lemma 3.3 we have a runppp, qq,Kq Ñ ppq, qq,KqofAwred. This concludes this case as by the above observation (qPF), we havepq, qq PF¹.

We now assume for the induction that the property holds when|σ| ď n and we consider a stack σsuch that

|σ| “ n`1. Let denote by pγ, q¹qthe top symbol of σ, writeσ “ σ¹.pγ, q¹q, and consider the first position in the runρthat pushes this symbol onto the stack. We denote bycthe associated call. More precisely, there exists a unique decomposition ofρasρ : ppi, fq,Kq Ñ ppp¹, q¹q, σ¹q ÝÑ ppp^c ², q²q, σq Ñ ppp, qq, σqsuch that the run from ppp², q²q, σqtoppp, qq, σqis associated with a well-nested word. By Lemma 3.1, we obtainq² “q. Considering the call transition associated withc, and by definition ofδ_c¹, there exists a return transitionppq, qq, r,pγ, q¹q,ps, q¹qq Pδ¹_r for some letterrPΣ_r. In addition, aspp, qq PWN, there is a runpp,Kq Ñ pq,KqinA, and thus by Lemma 3.3 we have a runppp, qq,Kq Ñ ppq, qq,KqinAwred, and thus a runppp, qq, σq Ñ ppq, qq, σqinAwred.

As the top symbol ofσispγ, q¹q, the above return transition can be used to reach configurationpps, q¹q, σ¹qwhose

height isn. The result follows by induction hypothesis. l

An important feature of the constructionreduceis that it preserves the co-reduction:

Proposition 3.1. LetAbe awnVPA. IfAis co-reduced, thenreducepAqis co-reduced as well.

PROOF. We will prove that ifAis co-reduced, thenAwred“wreducepAqis co-reduced as well. SincereducepAqis obtained by removing all the useless states ofAwred, we can conclude thatreducepAqis also co-reduced.

Letκ“ ppp, qq, σqandκi “ pppi, qiq,Kqbe two configurations of Awredsuch that ppi, qiqis an initial state of Awredand there exists inAwredtwo runsρ:κÑ ppf, fq, σ¹qandκi Ñ ppf, fq, σ¹qwheref is a final state ofA. As

(7)

Awredis awnVPAand by Remark 2.1, we haveσ¹“ K. We show by induction on the size of the stackσthatκcan be reached from an initial configuration.

If|σ| “ 0, by Lemma 3.1, q “ f. By construction of the setWN,pp,Kq Ñ pf,Kqis a run ofA. SinceAis co-reduced, there exists a runpi,Kq Ñ pp,KqwithiPI. Thus by Lemmas 3.3 and 3.1ppi, fq,Kq Ñ ppp, fq,Kq Ñ ppf, fq,Kqis a run ofAwred.

We now assume for the induction that the property holds when|σ| ď n and we consider a stack σsuch that

|σ| “n`1. Let us consider the first position in the runρthat pops the top symbol from the stack. We denote byr the associated return. More precisely, there exists a unique decomposition ofρas follows:

ppp, qq, σqÝÑ ppq^w ², q²q, σqÝÑ ppp^r ¹, q¹q, σ¹qÝÑ ppf, f^w¹ q,Kq withwPΣ^˚_wnandσ“σ¹pγ, q¹qforγPΓ By Lemma 3.1,q²“q. By Lemma 3.4, the projection ofρis a run inA:

pp,σq¯ Ý^wÑ pq,σq¯ ÝÑ pp^r ¹,¯σ¹qÝÑ pf,^w¹ Kq PRunpAq whereσ¯“π1pσqandσ¯¹“π1pσ¹q

SinceAis co-reduced, the configurationpp,σq¯ is reachable from an initial configuration ofAby some runρ1. We decomposeρ1so as to exhibit the call transition that pushed the top symbol ofσ:¯

ρ1:pi,Kq Ñ pp²,σ¯¹qÝÝÑ ps,^c,γ σq¯ ÝÝÑ pp,^w² σq P¯ RunpAq wherecPΣ_candw²PΣ^˚_wn By Lemmas 3.3 and 3.1, we deduce:

pps, qq,KqÝÝÑ ppp, qq,^w² KqÝ^wÑ ppq, qq,Kq PRunpAwredq and thus

pps, qq, σqÝÝÑ ppp, qq, σq^w² Ý^wÑ ppq, qq, σq PRunpAwredq

Consider now the call transition oncused in the runρ1. It is of the formt “ pp², c, γ, sq. By the existence of the above run inAwred, we can deduce the existence of the transitionppp², q¹q, c,pγ, q¹q,ps, qqqinδ¹_c. We can thus extend the above run ofAwredas follows:

ppp², q¹q, σ¹qÝÑ pps, qq, σq^c ÝÝÑ ppp, qq, σq^w² Ý^wÑ ppq, qq, σqÝÝÑ ppf, fq,^rw¹ Kq PRunpAwredq

and the result follows by induction hypothesis. l

3.2. Construction ofcoreducepAqandwcoreducepAq

For awnVPAA, we now describe the construction of theVPA wcoreducepAq, which is weakly co-reduced. This construction can be seen as the application ofwreduceon the dual ofAwhich is obtained by swapping call and return symbols and transitions, swapping initial and final states and swapping target and source of transitions. TheVPA coreducepAqis then obtained similarly asreducepAqby removing useless states ofwcoreducepAq.

Definition 3.2. Given awnVPAA“ pQ, I, F,Γ, δq, we define thewnVPA wcoreducepAq “ pQ¹, I¹, F¹,Γ¹, δ¹qwith Q¹“WN,I¹ “ tpi, iq |iPIu,F¹“WNX pIˆFq,Γ¹ “ΓˆQ, andδ¹defined by its restrictions on call, return and internal symbols respectively (namelyδ_c¹,δ_r¹ andδ_ι¹):

• δ¹_c“ tppp, qq, c,pγ, pq,pq¹, q¹qq | pp, qq PQ¹,pq, c, γ, q¹q Pδcu

• δ¹_r“

"

ppp, qq, r,pγ, p¹q,pp¹, q¹qq pp, qq,pp¹, q¹q PQ¹,pq, r, γ, q¹q Pδr,

DcPΣ_c,DsPQ,ps, c, γ, pq Pδcandpp¹, sq PQ¹

*

• δ¹_ι“ tppp, qq, a,pp, q¹qq | pp, qq,pp, q¹q PQ¹,pq, a, q¹q Pδιu

(8)

As we did forwreduce, the states (and the stack) ofwreducepAqextend those ofAwith an additional state of A. Here this information is used to store the stateqreached when the current top symbol of the stack was pushed.

To obtain a weakly co-reducedVPAwe require for the return transitions the existence of a matching call transition that allows one to go through the source stateq. Similarly aswreduceandreduce, the constructionswcoreduceand coreducehave the following properties:

Theorem 3.2. LetAbe awnVPA, and letAwcor “wcoreducepAqandAcor “coreducepAq.Awcor andAcorcan be built in polynomial time, and satisfy:

p1q for allwPΣ^˚there exist bijections betweenARunwpAwcorqandARunwpAq, andARunwpAcorqandARunwpAq and thus in particularLpAq “LpAwcorq “LpAcorq,

p2q Awcoris weakly co-reduced, andAcoris co-reduced.

To prove Theorem 3.2, we proceed the same way as in the proof of Theorem 3.1. Thus we define a mappingΦ_wcor fromRunpAwcorqtoRunpAq. For any runρ¹ “ ppp0, q0q, σ0qΠ^k

i“1pαipppi, q_iq, σiqqofAwcor, we letΦ_wcorpρ¹qbe the runpq0, π1pσ0qqΠ^k

i“1pα_ipq_i, π1pσ_iqqq.

PROOF. Since wcoreduce can be seen as the dual of wreduce, we can prove that Φ_wcor is a bijection between ARunwpAwcorqandARunwpAqby following the lines of the proof of propertyp1qof Theorem 3.1.

Due to Proposition 2.2, forwnVPA, being co-reduced is the exact dual of being reduced. Therefore, the proof of propertyp2qcan be done by following the lines of the proof of propertyp2qof Theorem 3.1. l

The co-reduction construction preserves two importants properties:

Proposition 3.2. LetAbe a reducedVPA, thencoreducepAqis also reduced.

PROOF. The proof of this proposition follows the lines of that of Proposition 3.1. l Proposition 3.3. LetAbe a deterministicVPA, thencoreducepAqis also deterministic.

PROOF. The proof of this proposition is done by inspecting the transitions ofAwcor. l 3.3. Trimming awnVPA

We define the constructiontrimwn astrimwn “ coreduce˝reduce. Proposition 3.2 entails that the construction coreducepreserves the reduction, and thus by Theorems 3.1 and 3.2 we obtain the following result:

Theorem 3.3. LetAbe awnVPA, andAtrim “ trimwnpAq. Atrim is trimmed and can be built in polynomial time.

Furthermore, for allwPΣ^˚, there exists a bijection betweenARun_wpAqandARun_wpAtrimq, and thus in particular LpAq “LpAtrimq.

4. TrimmingVPA

We now present a construction which turns aVPAAinto awnVPAover a special alphabet. In a second step, we trim the resultingVPAusing the procedure described in Section 3 forwnVPA. Last, from the resultingwnVPAwe construct aVPAover the original alphabet recognizing the languageLpAqand which is still trimmed. It is not obvious to propose procedures for transforming aVPAintownVPA, and back, which are compatible with the notion of being trimmed.

(9)

4.1. FromVPAtownVPA...

Words fromΣ^˚differ from well-nested words as they admit unmatched returns and calls. We address these issues by extending the alphabetΣand by modyfing words fromΣ^˚: some new symbols are added and used to complete the unmatched calls by adding new return symbols and to replace unmatched returns by new internal symbols.

LetΣ“Σ_cZΣ_rZΣ_ιbe a structured alphabet. We introduce the structured alphabetΣ^ext“Σ^ext

c ZΣ^ext

r ZΣ^ext

ι

defined byΣ^ext

c “Σ_c,Σ^ext

r “Σ_rZ tru, and¯ Σ^ext

ι “Σ_ιZ tar|rPΣ_ru, wherer¯andtar|rPΣ_ruare fresh symbols.

We define inductively the mappingΩwhich transforms a word overΣinto a word overΣ^ext as follows, given aPΣ_ι,rPΣ_r,cPΣ_candnPN:

• Ωpǫ, nq “ǫ, Ωpaw, nq “aΩpw, nq,

• Ωprw, nq “rΩpw, n´1qifną0, Ωprw,0q “a_rΩpw,0q,

• Ωpcw, nq “

"

cw1rΩpw2, nq ifDw1PΣ^˚_wn,w2PΣ^˚andrPΣ_rsuch thatw“w1rw2, cΩpw, n`1q¯r otherwise.

For example, Ωprrccar,1q “ ra_rccarr. We denote by¯ extpwq the wordΩpw,0q. Intuitively, this mapping replaces every unmatched returnrby the internal symbola_rand adds a suffix of the formr¯^˚in order to match every unmatched call. We can prove by induction on the structure ofwthatextpwqis a well-nested word over the alphabet Σ^ext.

Given a wordwand a naturaln,Ωpw, nqis obtained fromwby replacing every unmatched returnrof height less thannby the internal symbola_rand adds a suffix of the formr¯^˚in order to match every unmatched call. We extend the functionsΩandextto languages in the obvious way.

We define also a variant ofΩwhich does not add the suffixr¯^˚, we call this mappingΩ_ι: Ω_ιpw, nq “w¹such thatΩpw, nq “w¹w²withw¹P pΣ^extzt¯ruq^˚andw²Pr¯^˚ We also defineextιpwqasΩ_ιpw,0q. We extend the functionΩ_ιto runs in the following way:

Ω_ιpκ0Π^l_k“₁pαkκ_kq, nq “κ0Π^l_k“₁pα¹_kκ_kq withΩ_ιpα1. . . α_l, nq “α¹1. . . α¹_l

LetAbe aVPA, we now present the constructionextendwhich turns aVPAoverΣinto awnVPAoverΣ^ext. Intuitively, the construction adds a suffix to theVPAwhich reads words fromr¯^` and replaces the returns on empty stack by internals, thus implementing the mappingext. To achieve this, theVPAmust have some specific properties, introduced in the next definition:

Definition 4.1. LetA “ pQ, I, F,Γ, δqbe aVPA. ThenAisstack-compliantif there exist partitions ofQasQ “ Q_KZQ_MandΓasΓ_KZΓ_Msuch that:

1. For all initialized run ofAleading to a configurationpp, σq, σ“ KiffpPQ_K, 2. For all initialized run ofAleading to a configurationpp, σq, σP tKu Y pΓ_K¨Γ^˚

Mq, 3. For alltPδr,sourceptq RF.

We can now present the constructionextend, which operates on stack-compliantVPA. We will present in Subsec- tion 4.3 the constructionscompliantthat allows to build stack-compliantVPA.

Definition 4.2. LetA“ pQ, I, F,Γ, δqbe a stack-compliantVPAover the alphabetΣwith partitionsQ“Q_KZQ_M and Γ “ Γ_KZΓ_M. We construct extendpAq “ pQ¹, I¹, F¹,Γ¹, δ¹qas thewnVPA over the alphabetΣ^ext where:

Q¹“QY tf¯_K,f¯_Mu,I¹“I,F¹“ pFXQ_Kq Y tf¯_Ku,Γ¹“Γ, andδ¹is given by:

• δ¹_c“δc and δ^K1_r “ H

• δ¹_r“δ_rY tpp,r, γ,¯ f¯_τq |τ P tK,Mu, pPFXQ_M, γ PΓ_τu Y tpf¯_M,r, γ,¯ f¯_τq |τ P tK,Mu, γ PΓ_τu

• δ¹_ι“διY tpp, ar, qq | pp, r,K, qq Pδ^K_r, pPQ_Ku

(10)

Intuitively, extendpAqhas two components: the first one, composed of the states of A, can read words over pΣ^extztruq¯ ^˚and replaces every unmatched return by internal symbols, whereas the second one, composed of the two added statestf¯_K,f¯_Mu, reads words of the formr¯^`. This intuition is formalized for an arbitraryVPAover the alphabet Σ^extin the next definition:

Definition 4.3. LetΣbe an alphabet andA“ pQ, I, F,Γ, δqbe aVPAover the alphabetΣ^ext. We define two subsets ofQas follows:

trappAq “ tqPQ| Dp,pp,¯r, γ, qq Pδru

borderpAq “ tpRtrappAq | DtPδsuch thatsourceptq “pandtargetptq PtrappAqu Elements ofborderpAqare calledborder statesofA.

Border states are the only ones that allow to reach a state intrappAqfrom a state not intrappAq. One can verify that in the case of theVPA extendpAq, we havetrappextendpAqq “ tf¯_M,f¯_KuandborderpextendpAqq “FXQ_M. We give an example of these properties in Figure 4.

1 2 3

4 5 6

c{γ c{γ

c{γ¹ r{γ r{γ¹ r{γ¹ r{K

c{γ¹ c{γ¹

1 2 3 f¯_M

f¯_K

4 5 6

c{γ c{γ

c{γ¹ r{γ r{γ¹ r{γ¹

¯r{γ¹

¯r{γ

ar ¯r{γ

c{γ¹ c{γ¹

r{γ¯ ¹

Figure 4: On the left, theVPAA“ pQ, I, F,Γ, δq, on the right thewnVPAAext “extendpAq. The dashed parts are the components ofA modified byextend.Ais stack-compliant forQ“Q_KZQ_MandΓ“Γ_KZΓ_MwhereQ_K“ t1,4u,Q_M“ t2,3,5,6u,Γ_K“ tγuandΓ_M“ tγ¹u. We will see later thatAis produced by the constructionscompliant. Moreover we haveborderpAextq “ t3uandtrappAextq “ tf¯_M,f¯_Ku.

LetA“ pQ, I, F,Γ, δqbe a stack-compliantVPAwith partitionsQ“Q_KZQ_MandΓ“Γ_KZΓ_M. We define Aext“extendpAqand will prove thatLpAextq “extpLpAqq.

Lemma 4.1. Letw“α1. . . α_k P Σ^˚ be a word andρ “κ0Π^ℓ

k“1pα_kpκ_kqq, withκ_ia configuration ofAfor alli, thenρis an initialized run ofAoverwif and only ifextιpρqis an initialized run ofAextoverextιpwq.

PROOF. By definition,extιpwqis obtained fromwby replacing every unmatched return by an internal symbol. The constructionextendimplements correctly this transformation thanks to pointp1qof definition of stack-compliance.l Lemma 4.2. Letρbe an initialized run ofAextover some wordwleading to configuration pp, σqsuch thatpis a border state ofAext, then there exists exactly one runρ¹over¯r^|σ|such thatρρ¹is an accepting run ofAext.

PROOF. By construction ofAext, the wordwdoes contain any occurrence of the letterr. Thus, by Lemma 4.1, there¯ exists a runρ1ofAsuch thatρ“ext_ιpρ1q. As observed above, we haveborderpAextq “ FXQ_M. Thus by point p2qof definition of stack-compliance,σPΓ_K¨Γ^˚

M, and by construction there exists a unique run ofAextwhich leads

fromptof¯_Koverr¯^|σ|and pops all the symbols of the stack. l

Theorem 4.1. Letw P Σ^˚ be a word, there exists a bijection betweenARun_wpAqandARunextpwqpAextq, and thus LpAextq “extpLpAqq.

PROOF. First suppose thatwis a word without unmatched calls, thenextιpwq “ extpwq. Letρbe an accepting run ofAoverwleading to configurationpp, σq. extιpwq “extpwqentailsσ“ Kand thus, by pointp1qof the definition of stack-compliance, we havepPFXQ_K. The result follows from Lemma 4.1.

Otherwise,wis a word with some unmatched calls. We define the setERunwasERunw“ tρPRunextιpwqpAextq | ρis initialized and ends in a border stateu.

As border states of Aext arise from accepting states of A and by Lemma 4.1, for all runρ P RunwpAq, ρ P ARunwpAqif and only ifextιpρq PERunw. The result follows from Lemma 4.2. l

(11)

4.2. ... And back.

We will define the converse of the functionextend, named retract, which performs the last step of the procedure. However, we have to identify the trap amongst the states of thewnVPAin order to remove it and recover the original language. This transformation is not always possible, and thus we introduce a sufficient condition, named retractability, allowing to apply theretractprocedure:

Definition 4.4. LetΣbe an alphabet andA “ pQ, I, F,Γ, δqbe aVPAover the alphabetΣ^ext. Let us consider the following conditions:

p1q There exists aVPABoverΣsuch thatLpAq “extpLpBqq, p2q We havetrappAq XI“ H,

p3q For all transitionstinδ,letterptq “r¯if and only iftargetptq PtrappAq, p4q For all transitionstinδsuch thatsourceptq PtrappAq,letterptq “r,¯

p5q For each initialized run ofA, which ends in a border state there exists exactly one runρ¹over¯r^`such thatρρ¹is an accepting run.

p6q For all transitionstPδ_rsuch thatsourceptq PborderpAq,letterptq “r.¯

We say thatAisretractableif it fulfills conditionsp1q, . . . ,p5qandstrongly retractableif it is retractable and fulfills conditionp6q.

Retractability properties warrant thatretractcan be applied. In the sequel we will prove that our different constructions preserve the retractability property and thus preserve the trap: for example the trap of thewnVPAproduced bywreduce˝extendis of the formtpf¯_K,f¯_Kq,pf¯_M,f¯_Mqu. Note that we need the strong retractability property because thereduceconstruction does not preserve the retractability. We first show that the constructionextendensures the strong retractability:

Theorem 4.2. LetAbe a stack-compliantVPA, thenextendpAqis strongly retractable.

PROOF. Propertyp1qis a consequence of Theorem 4.1. Propertiesp2q,p3q,p4qare consequences of the construction, propertyp5qis a consequence of Lemma 4.2 and propertyp6qis a consequence of pointp3qof definition of stack-

compliance. l

We now define the constructionretract:

Definition 4.5. LetA“ pQ, I, F,Γ, δqbe a retractablewnVPAover the alphabetΣ^ext, we define theVPA retractpAq “ pQ¹, I¹, F¹,Γ, δ¹q over the alphabet Σ with Q¹ “ QztrappAq, I¹ “ I, F¹ “ pFztrappAqq YborderpAq, and the set of transition rules δ¹ “ δ¹_c Zδ_r¹ Zδ_r^1K Zδ_ι¹ is defined by δ_c¹ “ δ_c, δ¹_r “ tt P δ_r | letterptq ‰ ¯ru, δ_r^1K“ tpp, r,K, qq | pp, a_r, qq Pδ_ιu, andδ_ι¹ “ ttPδ_ι |letterptq PΣ_ιu.

This construction works as follows: it replaces all the internal transitions overarby return transitions on empty stack over symbolr, and removes the return transitions overr. Note that the final states of¯ retractpAqare the final states ofAwhich are not intrappAqand the border states ofA.

Lemma 4.3. Letw“α1. . . αk PΣ^˚be a word andρ“κ0Π^ℓ

k“1pαkpκkqq, withκia configuration ofretractpAqfor alli, starting in configurationκ0“ pp, σq, thenρPRunwpretractpAqqif and only ifΩ_ιpρ,|σ|q PRunΩιpw,|σ|qpAq.

PROOF. First observe thatwPΣ^˚entails, by pointsp2qandp3qof the definition of retractability, that configurations κ_i “ ppi, σ_iqverify the propertyp_iRtrappAqfor alli. AsΩ_ιis applied onρwith|σ|as second argument, it replaces the unmatched returns ofwthat are taken inρwith an empty stack by internal symbols. The lemma is then proved by

an induction on the length ofw. l

(12)

Theorem 4.3. Let A be a retractableVPA on Σ^ext then for any word w P Σ^˚, there exists a bijection between ARun_wpretractpAqqandARunextpwqpAq, and thus in particularLpAq “extpLpretractpAqqq.

PROOF. LetwP Σ^˚be a word and distinguish two cases. Ifwis a word without unmatched calls, thenextpwq “ extιpwq. By Lemma 4.3 and pointp2qof the definition of retractability,extinduces a bijection between the set of initialized runs ofretractpAqoverwand the set of initialized runs ofAoverextpwq. In addition, an initialized runρ ofretractpAqoverwis accepting iffextpρqis accepting inA. Indeed, aswhas no unmatched calls, the configuration pp, σqreached byρverifiesσ“ K, and thuspRborderpAqby pointp5qof the definition of retractable. Thenpis a final state ofretractpAqiffpis a final state ofA. Otherwise,wis a word with some unmatched calls. We define the setRRun_wasRRun_w“ tρPRunextιpwqpAq |ρis initialized and ends in a border stateu.

As in the previous case, we can show by pointp2qof the definition of retractability and by Lemma 4.3 thatext_ι induces a bijection betweenARunwpretractpAqqandRRunw. Thus by pointp5qof the definition of retractable, we have a bijection betweenARunwpretractpAqqandARun_extpwqpAq.

Last, by point p1qof the definition of retractability, any word accepted by Ais of the form extpwq for some wPΣ^˚, and thus the previous bijections implyLpAq “extpLpretractpAqqq. l

Our constructionretractpreserves the trim property:

Proposition 4.1. LetAbe a retractableVPAonΣ^ext, then ifAis trimmed, so isretractpAq.

PROOF. LetAret“retractpAq. For the reduced part, let us consider an initialized runρ: pp,KqÝÑ pq, σq^w ofAret

overw. By Lemma 4.3 ρˆ “ extιpρqis a run of Aover wˆ “ extιpwq. Observe that ρˆis initialized. Since Ais reduced, there exists a runρˆ¹overw¹such thatρˆρˆ¹is an accepting run ofA. By construction we can decomposew¹ asw¹“w1w2withw1PΣ^˚andw2P¯r^˚andρˆ¹asρˆ1:pq, σqÝÝÑ pq^w¹ ¹, σ¹qoverw1andρˆ2:pq¹, σ¹qÝÝÑ pp^w² ¹,Kqover w2. Ifw2 “ε, thenσ¹ “ Kand by constructionq¹is a final state ofAret. By Lemma 4.3, there exists a runρ1such thatρˆ1 “Ωpρ1,|σ|qandρρ1is an accepting run ofAret. Ifw2 ‰ε, thenq¹ PborderAand soq¹is a final state of Aret. Again, by Lemma 4.3 there exists a runρ1such thatρˆ1“Ωpρ1,|σ|qandρρ1is an accepting run ofAret.

For the co-reduced part, let us consider two runsρ:pp, σqÝ^wÑ pq, σ¹qandρ¹:pi,KqÝÑ pq, σ^w¹ ¹qofAretwithqa final state andian initial state. We distinguish two cases:

• Ifσ¹ “ K, then by pointp5qof the definition of retractability, stateqcannot be inborderpAq. By definition of Aret, this implies thatqis a final state ofA. By Lemma 4.3,ρˆ“Ω_ιpρ,|σ|qis a run ofAoverwˆ “Ω_ιpw,|σ|q, then since Aas awnVPAand is co-reduced, there exists a runρˆ² ofAover a word wˆ² such thatρˆ²ρˆis an accepting run ofAoverwˆ²w. Thus by Lemma 4.3 there exists a runˆ ρ² ofAretsuch thatρˆ² “extιpρ²qover w²withwˆ²“extpw²qandρ²ρis an accepting run ofAretoverw²w.

• Ifσ¹ ‰ K, then by construction,qPborderpAq. By Lemma 4.3,ρˆ“Ω_ιpρ,|σ|qandρˆ¹ “extιpρ¹qare runs of AoverΩ_ιpw,|σ|qandextιpw¹qrespectively. Observe thatρˆ¹is initialized. Then by pointp5qof the definition of retractability there exists exactly one runρ¯overr¯^`such thatρˆ¹ρ¯is an accepting run ofAand soρˆρ¯is also a run ofA. SinceAis co-reduced there exists a runρˆ²ofAover a wordwˆ²such thatρˆ²ρˆρ¯is an accepting run of A. By Lemma 4.3, there exists a runρ²ofAretsuch thatρˆ²“extιpρ²qoverw²withwˆ² “extpw²q. Thus by

Lemma 4.3,ρ²ρis an accepting run ofAretoverw²w. l

We prove that the constructionsreducepreserves the strong retractability andcoreducepreserves the retractability:

Proposition 4.2. LetAbe a strongly retractableVPA, thenAred“reducepAqis strongly retractable.

PROOF. We letA“ pQ, I, F,Γ, δqandAred“ pQ¹, I¹, F¹,Γ¹, δ¹q. First note that by construction:

For allpp, qq PQ¹, ifpp, qq PtrappAredqthenpPtrappAq (1a) For allpp, qq PQ¹, ifpp, qq PborderpAredqthenpPborderpAq (1b) Equation (1a) is a consequence of the constructionreduce. Concerning Equation (1b), to prove thatpP borderpAq, the fact thatpR trappAqfollows from pointp3qof Definition 2.4 applied onpp, qq, and the fact that there exists a transition on¯rfromptotrappAqfollows from the constructionreduce.