Rational stochastic languages


HAL Id: hal-00019728

https://hal.archives-ouvertes.fr/hal-00019728

Preprint submitted on 27 Feb 2006



François Denis, Yann Esposito

To cite this version:

François Denis, Yann Esposito. Rational stochastic languages. 2006. ⟨hal-00019728⟩


ccsd-00019728, version 1 - 27 Feb 2006


François Denis and Yann Esposito

LIF-CMI, UMR 6166 39, rue F. Joliot Curie 13453 Marseille Cedex 13 FRANCE fdenis,[email protected]

Abstract. The goal of the present paper is to provide a systematic and comprehensive study of rational stochastic languages over a semiring $K \in \{\mathbb{Q}, \mathbb{Q}^+, \mathbb{R}, \mathbb{R}^+\}$. A rational stochastic language is a probability distribution over a free monoid $\Sigma^*$ which is rational over $K$, that is, which can be generated by a multiplicity automaton with parameters in $K$. We study the relations between the classes of rational stochastic languages $\mathcal{S}_K^{\mathrm{rat}}(\Sigma)$.

We define the notion of residual of a stochastic language and we use it to investigate properties of several subclasses of rational stochastic languages. Lastly, we study the representation of rational stochastic languages by means of multiplicity automata.

1 Introduction

In probabilistic grammatical inference, data often arise in the form of a finite sequence of words $w_1, \ldots, w_n$ over some predefined alphabet $\Sigma$. These words are assumed to be independently drawn according to a fixed but unknown probability distribution over $\Sigma^*$. Probability distributions over free monoids $\Sigma^*$ are called stochastic languages. A usual goal in grammatical inference is to infer an approximation of this distribution in some class of probabilistic models, such as probabilistic automata. A probabilistic automaton (PA) is composed of a structure, which is a finite automaton (NFA), and parameters associated with states and transitions, which represent the probability for a state to be initial or terminal, or the probability for a transition to be chosen. It can easily be shown that probabilistic automata have the same expressivity as Hidden Markov Models (HMM), which are heavily used in statistical inference [DDE05].

Given the structure $A$ of a probabilistic automaton and a sequence of words $S$, computing parameters for $A$ which maximize the likelihood of $S$ is NP-hard [AW92]. In practical cases however, algorithms based on the E.M. (Expectation-Maximization) method [DLR77] can be used to compute approximate values. On the other hand, inferring a probabilistic automaton (structure and parameters) from a sequence of words is a widely open field of research. Most results obtained so far only deal with restricted subclasses of PA, such as Probabilistic Deterministic Automata (PDA), i.e. probabilistic automata whose structure is deterministic (DFA), or Probabilistic Residual Automata (PRA), i.e. probabilistic automata whose structure is a residual finite state automaton (RFSA) [CO94,CO99,dlHT00,ELDD02,DE04].

In other respects, stochastic languages are particular cases of formal power series, and probabilistic automata are particular cases of multiplicity automata, notions which have been extensively studied in the field of formal language theory [SS78,BR84,Sak03]. Therefore, stochastic languages which can be generated by multiplicity automata are special cases of rational languages. We call them rational stochastic languages. The goal of the present paper is to provide a systematic and comprehensive study of rational stochastic languages so as to bring out properties that could be useful for grammatical inference. Indeed, considering the objects to infer as special cases of rational languages makes it possible to use the powerful theoretical tools that have been developed in that field and hence to answer many questions that naturally arise when working with them. Is it possible to decide within polynomial time whether two probabilistic automata generate the same stochastic language? Does allowing negative coefficients in probabilistic automata extend the class of generated stochastic languages? Can a rational stochastic language which takes all its values in $\mathbb{Q}$ always be generated by a multiplicity automaton with coefficients in $\mathbb{Q}$? And so forth. Also, studying rational stochastic languages for themselves, considered as objects of language theory, helps to bring out notions and properties which are important from a grammatical inference perspective: for example, we show that the notion of residual language (or derivative), so important for grammatical inference [DLT02,DLT04], has a natural counterpart for stochastic languages [DE03], which can be used to express many properties of classes of stochastic languages.

Formal power series take their values in a semiring $K$: let us denote by $K\langle\langle\Sigma\rangle\rangle$ the set of all formal power series. Here, we only consider the semirings $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{Q}^+$ and $\mathbb{R}^+$. For any such semiring $K$, we define the set $\mathcal{S}_K^{\mathrm{rat}}(\Sigma)$ of rational stochastic languages as the set of stochastic languages over $\Sigma$ which are rational languages over $K$. For any two distinct semirings $K$ and $K'$, the corresponding sets of rational stochastic languages are distinct. We show that $\mathbb{R}$ is a Fatou extension of $\mathbb{Q}$ for stochastic languages, which means that any rational stochastic language over $\mathbb{R}$ which takes its values in $\mathbb{Q}$ is also rational over $\mathbb{Q}$. However, $\mathbb{R}^+$ is not a Fatou extension of $\mathbb{Q}^+$ for stochastic languages: there exists a rational stochastic language over $\mathbb{R}^+$ which takes its values in $\mathbb{Q}^+$ and which is not rational over $\mathbb{Q}^+$.

For any stochastic language $p$ over $\Sigma$ and any word $u$ such that $p(u\Sigma^*) \neq 0$, let us define the residual language $u^{-1}p$ of $p$ with respect to $u$ by $u^{-1}p(w) = p(uw)/p(u\Sigma^*)$: residual languages clearly are stochastic languages. We show that the residual languages of a rational stochastic language $p$ over $K$ are also rational over $K$. The residual subsemimodule $[Res(p)]$ of $K\langle\langle\Sigma\rangle\rangle$ spanned by the residual languages of any stochastic language $p$ may be used to express the rationality of $p$: $p$ is rational iff $[Res(p)]$ is included in a finitely generated subsemimodule of $K\langle\langle\Sigma\rangle\rangle$. But when $K$ is positive, i.e. $K = \mathbb{Q}^+$ or $K = \mathbb{R}^+$, it may happen that $[Res(p)]$ itself is not finitely generated.

We study the properties of two subclasses of $\mathcal{S}_K^{\mathrm{rat}}(\Sigma)$: the set $\mathcal{S}_K^{\mathrm{fingen}}(\Sigma)$ composed of rational stochastic languages over $K$ whose residual subsemimodule is finitely generated, and the set $\mathcal{S}_K^{\mathrm{fin}}(\Sigma)$ composed of rational stochastic languages over $K$ which have finitely many residual languages. We show that for each of these two classes, $\mathbb{R}^+$ is a Fatou extension of $\mathbb{Q}^+$: any stochastic language of $\mathcal{S}_{\mathbb{R}^+}^{\mathrm{fingen}}(\Sigma)$ (resp. of $\mathcal{S}_{\mathbb{R}^+}^{\mathrm{fin}}(\Sigma)$) which takes its values in $\mathbb{Q}^+$ is an element of $\mathcal{S}_{\mathbb{Q}^+}^{\mathrm{fingen}}(\Sigma)$ (resp. of $\mathcal{S}_{\mathbb{Q}^+}^{\mathrm{fin}}(\Sigma)$). We also show that for any element $p$ of $\mathcal{S}_K^{\mathrm{fingen}}(\Sigma)$, there exists a unique minimal subset of residual languages of $p$ which generates $[Res(p)]$.

Then, we study the representation of rational stochastic languages by means of multiplicity automata. We first show that the set of multiplicity automata with parameters in $\mathbb{Q}$ which generate stochastic languages is not recursive. Moreover, it contains no recursively enumerable subset capable of generating the whole set of rational stochastic languages over $\mathbb{Q}$. A stochastic language $p$ is a formal series which has two properties: (i) $p(w) \geq 0$ for any word $w$, (ii) $\sum_w p(w) = 1$. We show that the undecidability comes from the first requirement, since the second one can be decided within polynomial time. We show that the set of stochastic languages which can be generated by probabilistic automata with parameters in $\mathbb{Q}^+$ (resp. $\mathbb{R}^+$) exactly coincides with $\mathcal{S}_{\mathbb{Q}^+}^{\mathrm{rat}}(\Sigma)$ (resp. $\mathcal{S}_{\mathbb{R}^+}^{\mathrm{rat}}(\Sigma)$). A probabilistic automaton $A$ is called a Probabilistic Residual Automaton (PRA) if the stochastic languages associated with its states are residual languages of the stochastic language $p_A$ generated by $A$. We show that the set of stochastic languages that can be generated by probabilistic residual automata with parameters in $\mathbb{Q}^+$ (resp. $\mathbb{R}^+$) exactly coincides with $\mathcal{S}_{\mathbb{Q}^+}^{\mathrm{fingen}}(\Sigma)$ (resp. $\mathcal{S}_{\mathbb{R}^+}^{\mathrm{fingen}}(\Sigma)$). We do not know whether the class of PRA is decidable. However, we describe two decidable subclasses of PRA capable of generating $\mathcal{S}_K^{\mathrm{fingen}}(\Sigma)$ when $K = \mathbb{Q}^+$ or $K = \mathbb{R}^+$: the class of $K$-reduced PRA and the class of prefixial PRA. The first one provides minimal representations in the class of PRA, but we show that the membership problem is PSPACE-complete. The second one produces more cumbersome representations, but the membership problem is polynomial. Finally, we show that the set of stochastic languages that can be generated by probabilistic deterministic automata with parameters in $\mathbb{Q}^+$ (resp. $\mathbb{R}^+$) exactly coincides with $\mathcal{S}_{\mathbb{Q}^+}^{\mathrm{fin}}(\Sigma)$, which is also equal to $\mathcal{S}_{\mathbb{Q}}^{\mathrm{fin}}(\Sigma)$ (resp. $\mathcal{S}_{\mathbb{R}^+}^{\mathrm{fin}}(\Sigma)$, which is also equal to $\mathcal{S}_{\mathbb{R}}^{\mathrm{fin}}(\Sigma)$).

We recall some properties of rational series, stochastic languages and multiplicity automata in Section 2. We define and study rational stochastic languages in Section 3. The relations between the classes of rational stochastic languages are studied in Subsection 3.1. Properties of the residual languages of rational stochastic languages are studied in Subsection 3.2. A characterisation of rational stochastic languages in terms of stable subsemimodules is given in Subsection 3.3. The classes $\mathcal{S}_K^{\mathrm{fingen}}(\Sigma)$ and $\mathcal{S}_K^{\mathrm{fin}}(\Sigma)$ are defined and studied in Subsection 3.4. The representation of rational stochastic languages by means of multiplicity automata is given in Section 4.

2 Preliminaries

2.1 Rational series

In this section, we recall some definitions and results on rational series. For more information, we invite the reader to consult [SS78,BR84,Sak03].

Let $\Sigma$ be a finite alphabet, and $\Sigma^*$ be the set of words on $\Sigma$. The empty word is denoted by $\varepsilon$ and the length of a word $u$ is denoted by $|u|$. The number of occurrences of the letter $x$ in the word $w$ is denoted by $|w|_x$. For any integer $k$, we denote by $\Sigma^k$ the set $\{u \in \Sigma^* \mid |u| = k\}$ and by $\Sigma^{\leq k}$ the set $\{u \in \Sigma^* \mid |u| \leq k\}$. We denote by $<$ the length-lexicographic order on $\Sigma^*$. For any word $u \in \Sigma^*$ and any language $L \subseteq \Sigma^*$, let $uL = \{uv \in \Sigma^* \mid v \in L\}$ and $u^{-1}L = \{v \in \Sigma^* \mid uv \in L\}$. A subset $P$ of $\Sigma^*$ is prefixial if for any $u, v \in \Sigma^*$, $uv \in P \Rightarrow u \in P$.

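A semiring is a set $K$ with two binary operations $+$ and $\cdot$ and two constant elements 0 and 1 such that

1. $\langle K, +, 0\rangle$ is a commutative monoid,
2. $\langle K, \cdot, 1\rangle$ is a monoid,
3. the distributive laws $a \cdot (b + c) = a \cdot b + a \cdot c$ and $(a + b) \cdot c = a \cdot c + b \cdot c$ hold,
4. $0 \cdot a = a \cdot 0 = 0$ for every $a$.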

A semiring is positive if the sum of two elements different from 0 is different from 0.

The semirings we consider here are the field of rational numbers $\mathbb{Q}$, the field of real numbers $\mathbb{R}$, and $\mathbb{Q}^+$ and $\mathbb{R}^+$, respectively the non-negative elements of $\mathbb{Q}$ and $\mathbb{R}$; $\mathbb{Q}^+$ and $\mathbb{R}^+$ are positive semirings.

Let $\Sigma$ be a finite alphabet and $K$ a semiring. A formal power series is a mapping $r$ of $\Sigma^*$ into $K$. The values $r(w)$ where $w \in \Sigma^*$ are referred to as the coefficients of the series, and $r$ is written as a formal sum $r = \sum_{w \in \Sigma^*} r(w)w$. The set of all formal power series is denoted by $K\langle\langle\Sigma\rangle\rangle$. Given $r$, the subset of $\Sigma^*$ defined by $\{w \mid r(w) \neq 0\}$ is the support of $r$ and is denoted by $supp(r)$. A polynomial is a series whose support is finite. The subset of $K\langle\langle\Sigma\rangle\rangle$ consisting of all polynomials is denoted by $K\langle\Sigma\rangle$.

We denote by 0 the series all of whose coefficients equal 0. We denote by 1 the series whose coefficient for $\varepsilon$ equals 1, the remaining coefficients being equal to 0. The sum of two series $r$ and $r'$ in $K\langle\langle\Sigma\rangle\rangle$ is defined by
\[ r + r' = \sum_{w \in \Sigma^*} (r(w) + r'(w))\,w. \]
The multiplication of a series $r$ by a scalar $a \in K$ is defined by
\[ ar = \sum_{w \in \Sigma^*} a \cdot r(w)\,w. \]
The Cauchy product of two series $r$ and $r'$ is defined by
\[ rr' = \sum_{w \in \Sigma^*} \Bigl( \sum_{w_1 w_2 = w} r(w_1) \cdot r'(w_2) \Bigr) w. \]
These operations furnish $K\langle\langle\Sigma\rangle\rangle$ with the structure of a semiring with $K\langle\Sigma\rangle$ as a subsemiring. The Hadamard product of two series $r$ and $r'$ is defined by
\[ r \odot r' = \sum_{w \in \Sigma^*} r(w)\,r'(w)\,w. \]
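As an illustration of these operations, here is a minimal Python sketch for polynomials, i.e. series with finite support. The dictionary encoding (word to coefficient) and the function names are assumptions made for this example only; coefficients are taken in $\mathbb{Q}$ through `fractions.Fraction`.

```python
from collections import defaultdict
from fractions import Fraction

def series_sum(r, s):
    """Coefficient-wise sum r + s of two finite-support series."""
    out = defaultdict(Fraction)          # Fraction() is 0
    for series in (r, s):
        for w, c in series.items():
            out[w] += c
    return dict(out)

def cauchy_product(r, s):
    """(rs)(w) = sum of r(w1) * s(w2) over all factorisations w = w1 w2."""
    out = defaultdict(Fraction)
    for u, a in r.items():
        for v, b in s.items():
            out[u + v] += a * b
    return dict(out)

def hadamard_product(r, s):
    """(r ⊙ s)(w) = r(w) * s(w); only common support words can be non-zero."""
    return {w: r[w] * s[w] for w in r.keys() & s.keys()}

# Two polynomials over the alphabet {a, b}; "" stands for the empty word.
r = {"": Fraction(1), "a": Fraction(2)}
s = {"a": Fraction(3), "b": Fraction(1)}
```

With these inputs, `cauchy_product(r, s)` has support {a, b, aa, ab}, whereas `hadamard_product(r, s)` is supported on the single word a.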

A series $r$ is quasiregular if $r(\varepsilon) = 0$. Quasiregular series have the property that for every $w \in \Sigma^*$, there exist finitely many integers $i$ such that $r^i(w) \neq 0$, where the exponent $i$ of $r^i$ refers to the Cauchy product. Let $r$ be a quasiregular series; $r^*$ (resp. $r^+$) is defined by $r^*(w) = \sum_{i \geq 0} r^i(w)$ (resp. $r^+(w) = \sum_{i \geq 1} r^i(w)$).

A subsemiring $R$ of $K\langle\langle\Sigma\rangle\rangle$ is rationally closed if $r^+ \in R$ for every quasiregular element $r$ of $R$. The family $K^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle$ of $K$-rational series over $\Sigma$ is the smallest rationally closed subset of $K\langle\langle\Sigma\rangle\rangle$ which contains all polynomials. When $K$ is commutative, the Hadamard product of two rational series is a rational series.

Let $K$ be a semiring and let $m, n$ be two integers. Let us denote by $K^{m \times n}$ the set of $m \times n$ matrices whose elements belong to $K$ and by $I_m$ the matrix whose diagonal elements are equal to 1 and whose other elements are all null. Note that $K^{m \times m}$ forms a semiring.

A series $r$ is recognizable if there exist a multiplicative homomorphism $\mu : \Sigma^* \to K^{n \times n}$, $n \geq 1$, and two matrices $\lambda \in K^{1 \times n}$, $\gamma \in K^{n \times 1}$ such that for every $w \in \Sigma^*$, $r(w) = \lambda\mu(w)\gamma$. The tuple $(\lambda, \mu, \gamma)$ is called an $n$-dimensional linear representation of $r$. A linear representation of $r$ is said to be reduced if its dimension is minimal.

Let us denote by $K^{\mathrm{rec}}\langle\langle\Sigma\rangle\rangle$ the set of all recognizable series.
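To make the definition concrete, the following Python sketch evaluates $r(w) = \lambda\mu(w)\gamma$ for a linear representation given by plain lists. The example representation is not taken from the paper: it is a standard two-dimensional representation, chosen here as an assumption, of the series $r(w) = |w|_a$ over $\mathbb{Q}$.

```python
from fractions import Fraction

def evaluate(lam, mu, gamma, word):
    """Return lam * mu(word) * gamma, where mu maps each letter to an n x n matrix."""
    row = list(lam)
    for x in word:
        m = mu[x]
        # Row vector times matrix: mu(word) is the product of the letter matrices.
        row = [sum(row[i] * m[i][j] for i in range(len(row))) for j in range(len(m[0]))]
    return sum(a * b for a, b in zip(row, gamma))

one, zero = Fraction(1), Fraction(0)
lam = [one, zero]                        # lambda, a 1 x 2 row
mu = {
    "a": [[one, one], [zero, one]],      # mu(a)^k = [[1, k], [0, 1]]
    "b": [[one, zero], [zero, one]],     # mu(b) is the identity
}
gamma = [zero, one]                      # gamma, a 2 x 1 column
```

For instance `evaluate(lam, mu, gamma, "abaab")` counts the three occurrences of the letter a.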

Theorem 1. [Sch61] The families $K^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle$ and $K^{\mathrm{rec}}\langle\langle\Sigma\rangle\rangle$ coincide.

Let $K$ be a semiring. Then a commutative monoid $V$ is called a $K$-semimodule if there is an operation $\cdot$ from $K \times V$ into $V$ such that for any $a, b \in K$ and $v, w \in V$,

1. $(ab) \cdot v = a \cdot (b \cdot v)$,
2. $(a + b) \cdot v = a \cdot v + b \cdot v$ and $a \cdot (v + w) = a \cdot v + a \cdot w$,
3. $1 \cdot v = v$ and $0 \cdot v = 0$.

If $S$ is a subset of a $K$-semimodule $V$, the subsemimodule $[S]$ generated by $S$ is the smallest of all subsemimodules of $V$ containing $S$. It can be proved that $[S] = \{a_1 s_1 + \ldots + a_n s_n \mid n \in \mathbb{N}^*, a_i \in K, s_i \in S\}$.

Let us consider the semimodule $K^{\Sigma^*}$ of all functions $F : \Sigma^* \to K$. For any word $u$ of $\Sigma^*$ and any function $F$ of $K^{\Sigma^*}$, we define a new function $\dot{u}F$ by $\dot{u}F(v) = F(uv)$ for any word $v$. The operator transforming $F$ into $\dot{u}F$ is linear: for any $F, G \in K^{\Sigma^*}$ and $a \in K$, $\dot{u}(a \cdot F) = a \cdot \dot{u}F$ and $\dot{u}(F + G) = \dot{u}F + \dot{u}G$. A subset $B$ of $K^{\Sigma^*}$ is called stable if the conditions $u \in \Sigma^*$ and $F \in B$ imply that $\dot{u}F \in B$.

Theorem 2. [Fli74,Jac75] Suppose that $K$ is a commutative semiring and $r$ belongs to $K\langle\langle\Sigma\rangle\rangle$. Then the following three conditions are equivalent:

1. $r$ belongs to $K^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle$;
2. the subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ generated by $\{\dot{u}r \mid u \in \Sigma^*\}$ is contained in a finitely generated stable subsemimodule of $K^{\Sigma^*}$;
3. $r$ belongs to a finitely generated stable subsemimodule of $K^{\Sigma^*}$.

When $K$ is not a field, it may happen that a series $r$ belongs to a finitely generated stable subsemimodule of $K\langle\langle\Sigma\rangle\rangle$, and hence is a rational series, while the stable subsemimodule generated by $\{\dot{u}r \mid u \in \Sigma^*\}$ is not finitely generated. Example 1 illustrates this situation.

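Two linear representations $(\lambda, \mu, \gamma)$ and $(\lambda', \mu', \gamma')$ of a rational series $r$ are similar if there exists an invertible matrix $m \in K^{n \times n}$ such that $\lambda' = \lambda m$, $\mu'(w) = m^{-1}\mu(w)m$ for any word $w$, and $\gamma' = m^{-1}\gamma$.

Theorem 3. [Sch61,Fli74] Assume that $K$ is a commutative field. Then any two reduced linear representations $(\lambda, \mu, \gamma)$ and $(\lambda', \mu', \gamma')$ of a rational series $r$ are similar. The dimension of any reduced linear representation of $r$ is also the dimension of the vector subspace generated by $\{\dot{u}r \mid u \in \Sigma^*\}$.

Let $K$ be a subsemiring of $K'$. $K'$ is said to be a Fatou extension of $K$ if every rational series over $K'$ with coefficients in $K$ is a rational series over $K$. It has been shown in [Fli74] that when $K$ and $K'$ are commutative fields, $K'$ is a Fatou extension of $K$. Therefore, $\mathbb{R}$ is a Fatou extension of $\mathbb{Q}$: any rational series over $\mathbb{R}$ which only takes rational values is a rational series over $\mathbb{Q}$, i.e. $\mathbb{R}^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle \cap \mathbb{Q}\langle\langle\Sigma\rangle\rangle = \mathbb{Q}^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle$. It has also been proved that $\mathbb{R}^+$ is not a Fatou extension of $\mathbb{Q}^+$: $(\mathbb{Q}^+)^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle \subsetneq (\mathbb{R}^+)^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$.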

2.2 Stochastic languages

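A stochastic language is a formal series $p$ which takes its values in $\mathbb{R}^+$ and such that $\sum_{w \in \Sigma^*} p(w) = 1$. For any stochastic language $p$ and any language $L \subseteq \Sigma^*$, the sum $\sum_{w \in L} p(w)$ is defined without ambiguity. So, let us denote $\sum_{w \in L} p(w)$ by $p(L)$. The set of all stochastic languages over $\Sigma$ is denoted by $\mathcal{S}(\Sigma)$. For any stochastic language $p$ and any word $u$ such that $p(u\Sigma^*) \neq 0$, we define the stochastic language $u^{-1}p$ by
\[ u^{-1}p(w) = \frac{p(uw)}{p(u\Sigma^*)}. \]
$u^{-1}p$ is called the residual language of $p$ with respect to $u$. Let us denote by $res(p)$ the set $\{u \in \Sigma^* \mid \sum_{w \in \Sigma^*} p(uw) \neq 0\}$ and by $Res(p)$ the set $\{u^{-1}p \mid u \in res(p)\}$. For any $K \in \{\mathbb{R}, \mathbb{R}^+, \mathbb{Q}, \mathbb{Q}^+\}$, define $\mathcal{S}_K^{\mathrm{rat}}(\Sigma) = K^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle \cap \mathcal{S}(\Sigma)$, the set of rational stochastic languages over $K$. Let $S = \{s_1, \ldots, s_n\}$ be a finite subset of $\mathcal{S}(\Sigma)$. The convex hull of $S$ in $K\langle\langle\Sigma\rangle\rangle$ is defined by
\[ conv_K(S) = \{s \in K\langle\langle\Sigma\rangle\rangle \mid s = \alpha_1 \cdot s_1 + \ldots + \alpha_n \cdot s_n \text{ where each } \alpha_i \in K,\ \alpha_i \geq 0 \text{ and } \alpha_1 + \ldots + \alpha_n = 1\}. \]
Clearly, any element of $conv_K(S)$ is a stochastic language.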
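The residual construction can be tried on a stochastic language with finite support. The sketch below is only an illustration: the dictionary encoding and the example distribution `p` are assumptions, not objects from the paper.

```python
from fractions import Fraction

def prefix_mass(p, u):
    """p(u Sigma^*): total probability of the words that start with u."""
    return sum(c for w, c in p.items() if w.startswith(u))

def residual(p, u):
    """The residual language u^{-1} p, defined only when p(u Sigma^*) != 0."""
    mass = prefix_mass(p, u)
    if mass == 0:
        raise ValueError("u does not belong to res(p)")
    return {w[len(u):]: c / mass for w, c in p.items() if w.startswith(u)}

# A finite-support stochastic language over {a, b}; "" is the empty word.
p = {"": Fraction(1, 4), "a": Fraction(1, 4), "ab": Fraction(1, 2)}
```

Here $p(a\Sigma^*) = 3/4$, so `residual(p, "a")` gives weight 1/3 to the empty word and 2/3 to b, and its coefficients again sum to 1.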

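Example 1. Let $\Sigma = \{a\}$, and let $p_1$, $p_2$ and $p$ be the rational stochastic languages over $\mathbb{R}^+$ defined on $\Sigma^*$ by
\[ p_1(a^n) = 2^{-(n+1)}, \quad p_2(a^n) = 3 \cdot 2^{-(2n+2)} \quad \text{and} \quad p = (p_1 + p_2)/2. \]
Check that
\[ \dot{a}^n p_1 = \frac{p_1}{2^n}, \quad \dot{a}^n p_2 = \frac{p_2}{2^{2n}} \quad \text{and} \quad \dot{a}^n p = \frac{2^n p_1 + p_2}{2^{2n+1}} \]
and
\[ (a^n)^{-1}p_1 = p_1, \quad (a^n)^{-1}p_2 = p_2 \quad \text{and} \quad (a^n)^{-1}p = \frac{2^n p_1 + p_2}{2^n + 1}. \]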

Let $V$ be the vector subspace of $\mathbb{R}\langle\langle\Sigma\rangle\rangle$ generated by $p_1$ and $p_2$: $V$ is represented in Figure 1. The subsemimodule of $\mathbb{R}^+\langle\langle\Sigma\rangle\rangle$ generated by $p_1$ and $p_2$ corresponds to the closed half-cone $C$ delimited by the half-lines $[Op_1)$ and $[Op_2)$. The line $(p_1 p_2)$ is composed of the rational series $r$ in $V$ which satisfy $\sum_{w \in \Sigma^*} r(w) = 1$. Let $q = \alpha p_1 + (1 - \alpha)p_2$. The constraint $q(a^n) \geq 0$ is equivalent to the inequality
\[ (2^{n+1} - 3)\alpha + 3 \geq 0. \]
The series $q$ such that $q(a^n) \geq 0$ for any integer $n$ must satisfy $0 \leq \alpha \leq 3$.

[Figure 1: the plane $V$ with origin $O$, the points $p_1$, $p$, $p_2$, $p_3$, $\dot{a}p$, $\dot{a}^2p$, $a^{-1}p$ and $(a^2)^{-1}p$, the line $\mathcal{S}(\Sigma) \cap V$ and the region $V_p$.]

Fig. 1. The stable subsemimodule of $\mathbb{R}^+\langle\langle\Sigma\rangle\rangle$ generated by $p$ is equal to $V_p$: it does not contain the half-line $]Op_1)$ and it is not finitely generated.

Let $p_3 = 3p_1 - 2p_2$. The stochastic languages in $V$ are the points of the line $(p_2 p_3)$ which lie between $p_2$ and $p_3$.
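The inequality on $\alpha$ used in this example follows directly from the definitions of $p_1$ and $p_2$. The short derivation below spells out the step; it only uses quantities defined in Example 1.

```latex
\begin{align*}
q(a^n) &= \alpha\, 2^{-(n+1)} + 3(1-\alpha)\, 2^{-(2n+2)} \\
       &= 2^{-(2n+2)} \bigl( 2^{n+1}\alpha + 3 - 3\alpha \bigr) \\
       &= 2^{-(2n+2)} \bigl( (2^{n+1}-3)\alpha + 3 \bigr).
\end{align*}
% For n = 0 the bracket is 3 - \alpha, which gives \alpha \le 3.
% For n \ge 1 we have 2^{n+1} - 3 > 0, so the constraint reads
% \alpha \ge -3/(2^{n+1}-3); the bound tends to 0 from below as n grows,
% so all constraints hold together only if \alpha \ge 0.
```

The extreme values $\alpha = 0$ and $\alpha = 3$ give $p_2$ and $p_3 = 3p_1 - 2p_2$ respectively.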

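Let $V_p$ be the subsemimodule of $\mathbb{R}^+\langle\langle\Sigma\rangle\rangle$ generated by $\{\dot{u}p \mid u \in \Sigma^*\}$. Check that $V_p = \{t(\alpha p_1 + (1 - \alpha)p_2) \mid 1/2 \leq \alpha < 1, t \in \mathbb{R}^+\}$ and that $V_p$ is not finitely generated.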
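The identities of Example 1 can be checked with exact rational arithmetic. The sketch below is a numerical sanity check, not a proof; the function names are hypothetical, and the closed form used for $p(a^n\Sigma^*)$ comes from summing the two geometric tails $p_1(a^n\Sigma^*) = 2^{-n}$ and $p_2(a^n\Sigma^*) = 4^{-n}$.

```python
from fractions import Fraction

def p1(n):
    return Fraction(1, 2 ** (n + 1))

def p2(n):
    return Fraction(3, 2 ** (2 * n + 2))

def p(n):
    return (p1(n) + p2(n)) / 2

def tail_p(n):
    """p(a^n Sigma^*), from the geometric tails 2^-n (for p1) and 4^-n (for p2)."""
    return (Fraction(1, 2 ** n) + Fraction(1, 4 ** n)) / 2

def residual_from_definition(n, m):
    """((a^n)^{-1} p)(a^m) = p(a^(n+m)) / p(a^n Sigma^*)."""
    return p(n + m) / tail_p(n)

def residual_from_formula(n, m):
    """Value at a^m of (2^n p1 + p2) / (2^n + 1)."""
    return (2 ** n * p1(m) + p2(m)) / (2 ** n + 1)

def alpha(n):
    """Weight of p1 in (a^n)^{-1} p = alpha p1 + (1 - alpha) p2."""
    return Fraction(2 ** n, 2 ** n + 1)
```

The weights `alpha(n)` start at 1/2 and increase towards 1 without reaching it, which is the range of $\alpha$ appearing in the description of $V_p$.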

2.3 Automata

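A non-deterministic finite automaton (NFA) is a tuple $\langle \Sigma, Q, Q_I, Q_T, \delta \rangle$ where $Q$ is a finite set of states, $Q_I \subseteq Q$ is the set of initial states, $Q_T \subseteq Q$ is the set of final states, and $\delta$ is the transition function defined from $Q \times \Sigma$ to $2^Q$. Let $\delta$ also denote the extended transition function defined from $2^Q \times \Sigma^*$ to $2^Q$ by $\delta(q, \varepsilon) = \{q\}$, $\delta(q, wx) = \bigcup_{q' \in \delta(q, w)} \delta(q', x)$ and $\delta(R, w) = \bigcup_{q \in R} \delta(q, w)$ for any $q \in Q$, $R \subseteq Q$, $x \in \Sigma$ and $w \in \Sigma^*$. An NFA is deterministic (DFA) if $Q_I$ contains only one element $q_0$ and if $\forall q \in Q, \forall x \in \Sigma, |\delta(q, x)| \leq 1$.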

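Let $K$ be a semiring. A $K$-multiplicity automaton (MA) is a 5-tuple $\langle \Sigma, Q, \varphi, \iota, \tau \rangle$ where $Q$ is a finite set of states, $\varphi : Q \times \Sigma \times Q \to K$ is the transition function, $\iota : Q \to K$ is the initialization function and $\tau : Q \to K$ is the termination function. Let $Q_I = \{q \in Q \mid \iota(q) \neq 0\}$ be the set of initial states and $Q_T = \{q \in Q \mid \tau(q) \neq 0\}$ be the set of terminal states. The support of an MA $\langle \Sigma, Q, \varphi, \iota, \tau \rangle$ is the NFA $\langle \Sigma, Q, Q_I, Q_T, \delta \rangle$ where $\delta(q, x) = \{q' \in Q \mid \varphi(q, x, q') \neq 0\}$. We extend the transition function $\varphi$ to $Q \times \Sigma^* \times Q$ by $\varphi(q, wx, r) = \sum_{s \in Q} \varphi(q, w, s)\varphi(s, x, r)$ and $\varphi(q, \varepsilon, r) = 1$ if $q = r$ and $0$ otherwise, for any $q, r \in Q$, $x \in \Sigma$ and $w \in \Sigma^*$. For any finite subset $L \subset \Sigma^*$ and any $R \subseteq Q$, define $\varphi(q, L, R) = \sum_{w \in L, r \in R} \varphi(q, w, r)$.

For any MA $A = \langle \Sigma, Q, \varphi, \iota, \tau \rangle$, we define the series $r_A$ by
\[ r_A(w) = \sum_{q, r \in Q} \iota(q)\varphi(q, w, r)\tau(r). \]
For any $q \in Q$, we define the series $r_{A,q}$ by $r_{A,q}(w) = \sum_{r \in Q} \varphi(q, w, r)\tau(r)$.

If the semiring $K$ is positive, it can be shown that the support of the series $r_A$ defined by a $K$-multiplicity automaton is equal to the language defined by the support of $A$. In particular, $supp(r_A)$ is a regular language. This property is false in general when $K$ is not positive.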

Two MA $A$ and $A'$ are equivalent if they define the same series, i.e. if $r_A = r_{A'}$. Let $A = \langle \Sigma, Q, \varphi, \iota, \tau \rangle$ be a $K$-MA and let $q \in Q$. Suppose that there exist coefficients $\alpha_{q'} \in K$ for $q' \in Q' = Q \setminus \{q\}$ such that $r_{A,q} = \sum_{q' \in Q'} \alpha_{q'} r_{A,q'}$. Let $A' = \langle \Sigma, Q', \varphi', \iota', \tau' \rangle$ where

– $\varphi'(r, x, s) = \varphi(r, x, s) + \alpha_s \varphi(r, x, q)$ for any $r, s \in Q'$ and $x \in \Sigma$,
– $\iota'(r) = \iota(r) + \alpha_r \iota(q)$ for any $r \in Q'$,
– $\tau'(r) = \tau(r)$ for any $r \in Q'$.

The multiplicity automaton $A'$ is called a $K$-reduction of $A$. A multiplicity automaton $A$ is called $K$-reduced if it has no $K$-reduction.
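The construction of a $K$-reduction can be sketched in Python as follows. The encoding of an MA as a tuple `(states, phi, iota, tau)` with dictionaries, the helper names, and the three-state example automaton are all assumptions made for illustration; in the example, states 1 and 2 generate the same series, so state 2 can be removed with $\alpha_1 = 1$.

```python
from fractions import Fraction

ZERO = Fraction(0)

def state_series(ma, q, word):
    """r_{A,q}(word): total weight of the paths reading `word` from q, times tau."""
    states, phi, iota, tau = ma
    weights = {q: Fraction(1)}
    for x in word:
        nxt = {}
        for s, ws in weights.items():
            for t in states:
                c = phi.get((s, x, t), ZERO)
                if c != 0:
                    nxt[t] = nxt.get(t, ZERO) + ws * c
        weights = nxt
    return sum((ws * tau.get(s, ZERO) for s, ws in weights.items()), ZERO)

def series(ma, word):
    """r_A(word) = sum over q of iota(q) * r_{A,q}(word)."""
    states, phi, iota, tau = ma
    return sum((iota.get(q, ZERO) * state_series(ma, q, word) for q in states), ZERO)

def k_reduction(ma, q, alpha):
    """Remove state q, assuming r_{A,q} = sum of alpha[s] * r_{A,s} over s != q."""
    states, phi, iota, tau = ma
    rest = [s for s in states if s != q]
    letters = {x for (_, x, _) in phi}
    phi2 = {}
    for r in rest:
        for x in letters:
            for s in rest:
                c = phi.get((r, x, s), ZERO) + alpha.get(s, ZERO) * phi.get((r, x, q), ZERO)
                if c != 0:
                    phi2[(r, x, s)] = c
    iota2 = {r: iota.get(r, ZERO) + alpha.get(r, ZERO) * iota.get(q, ZERO) for r in rest}
    tau2 = {r: tau.get(r, ZERO) for r in rest}
    return (rest, phi2, iota2, tau2)

half, quarter = Fraction(1, 2), Fraction(1, 4)
A = ([0, 1, 2],
     {(0, "a", 1): quarter, (0, "a", 2): quarter, (1, "a", 1): half, (2, "a", 2): half},
     {0: Fraction(1)},
     {0: half, 1: half, 2: half})
A_red = k_reduction(A, 2, {1: Fraction(1)})   # r_{A,2} = 1 * r_{A,1}
```

Both `A` and `A_red` assign weight $2^{-(n+1)}$ to $a^n$. The function does not verify the hypothesis on `alpha`; it only applies the three formulas of the definition.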

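Proposition 1. Let $A = \langle \Sigma, Q, \varphi, \iota, \tau \rangle$ be a $K$-MA and let $A' = \langle \Sigma, Q', \varphi', \iota', \tau' \rangle$ be a $K$-reduction of $A$. Then, for any state $q' \in Q'$, $r_{A',q'} = r_{A,q'}$. As a consequence, $r_{A'} = r_A$.

Proof. Let $Q' = Q \setminus \{q\}$ and let $\alpha_{q'} \in K$ for any $q' \in Q'$ be such that $r_{A,q} = \sum_{q' \in Q'} \alpha_{q'} r_{A,q'}$. For any state $r \in Q'$, we have
\[ r_{A',r}(\varepsilon) = \tau'(r) = \tau(r) = r_{A,r}(\varepsilon). \]
Now, assume that for any word $w$ of length $\leq k$ and any state $r \in Q'$ we have $r_{A',r}(w) = r_{A,r}(w)$. Let $x$ be a letter. We have:
\begin{align*}
r_{A',r}(xw) &= \sum_{s \in Q'} \varphi'(r, x, s)\, r_{A',s}(w) = \sum_{s \in Q'} \bigl(\varphi(r, x, s) + \alpha_s \varphi(r, x, q)\bigr)\, r_{A,s}(w) \\
&= \sum_{s \in Q'} \varphi(r, x, s)\, r_{A,s}(w) + \varphi(r, x, q) \sum_{s \in Q'} \alpha_s r_{A,s}(w) \\
&= \sum_{s \in Q'} \varphi(r, x, s)\, r_{A,s}(w) + \varphi(r, x, q)\, r_{A,q}(w) \\
&= \sum_{s \in Q} \varphi(r, x, s)\, r_{A,s}(w) = r_{A,r}(xw).
\end{align*}
Hence, $r_{A',r} = r_{A,r}$ for any $r$ of $Q'$. Moreover,
\begin{align*}
r_{A'} &= \sum_{s \in Q'} \iota'(s)\, r_{A,s} = \sum_{s \in Q'} \bigl(\iota(s) + \alpha_s \iota(q)\bigr)\, r_{A,s} \\
&= \sum_{s \in Q'} \iota(s)\, r_{A,s} + \iota(q) \sum_{s \in Q'} \alpha_s r_{A,s} = \sum_{s \in Q} \iota(s)\, r_{A,s} = r_A.
\end{align*}
⊓⊔

A state $q \in Q$ is accessible (resp. co-accessible) if there exist $q_0 \in Q_I$ (resp. $q_t \in Q_T$) and $u \in \Sigma^*$ such that $\varphi(q_0, u, q) \neq 0$ (resp. $\varphi(q, u, q_t) \neq 0$). An MA is trimmed if all its states are accessible and co-accessible. Given an MA $A$, a trimmed MA equivalent to $A$ can efficiently be computed from $A$.

From now on, we only consider trimmed MA.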

We shall consider several subclasses of multiplicity automata, defined as follows:

A semi-Probabilistic Automaton (semi-PA) is an MA $\langle \Sigma, Q, \varphi, \iota, \tau \rangle$ such that $\iota$, $\varphi$ and $\tau$ take their values in $[0, 1]$, such that $\sum_{q \in Q} \iota(q) \leq 1$ and, for any state $q$, $\tau(q) + \varphi(q, \Sigma, Q) \leq 1$. Semi-PA generate rational series over $\mathbb{R}^+$.

A Probabilistic Automaton (PA) is a trimmed semi-PA $\langle \Sigma, Q, \varphi, \iota, \tau \rangle$ such that $\sum_{q \in Q} \iota(q) = 1$ and, for any state $q$, $\tau(q) + \varphi(q, \Sigma, Q) = 1$. Probabilistic automata generate stochastic languages.

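Proposition 2. Let $A = \langle \Sigma, Q, \varphi, \iota, \tau \rangle$ be a $K$-semi-PA (resp. a $K$-PA). For $q \in Q$, $\sum_{w \in \Sigma^*} r_{A,q}(w) \leq 1$ (resp. $\sum_{w \in \Sigma^*} r_{A,q}(w) = 1$). As a consequence, $\sum_{w \in \Sigma^*} r_A(w) \leq 1$ (resp. $\sum_{w \in \Sigma^*} r_A(w) = 1$).

Proof. For any integer $k$ and any $q \in Q$, we have
\begin{align*}
\sum_{|w| \leq k+1} r_{A,q}(w) + \varphi(q, \Sigma^{k+2}, Q)
&= \sum_{|w| \leq k} r_{A,q}(w) + \sum_{r \in Q} \varphi(q, \Sigma^{k+1}, r)\tau(r) + \sum_{r \in Q} \varphi(q, \Sigma^{k+1}, r)\varphi(r, \Sigma, Q) \\
&= \sum_{|w| \leq k} r_{A,q}(w) + \sum_{r \in Q} \varphi(q, \Sigma^{k+1}, r)\bigl[\tau(r) + \varphi(r, \Sigma, Q)\bigr].
\end{align*}
From this relation, it is easy to infer by induction on $k$ that
\[ \sum_{|w| \leq k} r_{A,q}(w) + \sum_{r \in Q} \varphi(q, \Sigma^{k+1}, r) \leq 1 \quad (\text{resp. } = 1) \]
when $A$ is a semi-PA (resp. a PA).

A first consequence is that
\[ \sum_{w \in \Sigma^*} r_{A,q}(w) \leq 1 \quad \text{and} \quad \sum_{w \in \Sigma^*} r_A(w) = \sum_{w \in \Sigma^*} \sum_{q \in Q} \iota(q)\, r_{A,q}(w) \leq 1. \]
Let $n = |Q|$. Since $A$ is trimmed, there exists a word $u \in \Sigma^{\leq n-1}$ such that $r_{A,q}(u) > 0$. Therefore, there exists $\alpha < 1$ such that $\varphi(q, \Sigma^n, Q) < \alpha$. It can easily be shown, by induction on the integer $k$, that $\varphi(q, \Sigma^{kn}, Q) < \alpha^k$.

Now, when $A$ is a PA, we have
\[ \sum_{w \in \Sigma^*} r_{A,q}(w) \geq \sum_{|w| < kn} r_{A,q}(w) = 1 - \varphi(q, \Sigma^{kn}, Q) > 1 - \alpha^k \]
for any integer $k$. Therefore,
\[ \sum_{w \in \Sigma^*} r_{A,q}(w) = 1. \]
Finally,
\[ \sum_{w \in \Sigma^*} r_A(w) = \sum_{w \in \Sigma^*} \sum_{q \in Q} \iota(q)\, r_{A,q}(w) = \sum_{q \in Q} \iota(q) = 1. \]
⊓⊔

It can easily be deduced from Proposition 2 that an $\mathbb{R}^+$-reduction of a PA is still a PA (the property is false in general for a semi-PA).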
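The identity at the heart of this proof, namely that the mass of the words of length at most $k$ plus the weight of all paths of length $k+1$ equals 1 for a PA, can be observed numerically. The sketch below uses the automaton $A$ of Fig. 2 ($\iota(q_0) = 1$, $\tau(q_1) = 1$, $\varphi(q_0, a, q_1) = \varphi(q_0, b, q_0) = 0.5$); the encoding and the function names are assumptions, and `is_pa` only checks the two normalisation conditions, not trimness or the range $[0, 1]$.

```python
from fractions import Fraction

ZERO = Fraction(0)

def is_pa(states, letters, phi, iota, tau):
    """Check sum(iota) = 1 and tau(q) + phi(q, Sigma, Q) = 1 for every state q."""
    if sum((iota.get(q, ZERO) for q in states), ZERO) != 1:
        return False
    return all(
        tau.get(q, ZERO)
        + sum((phi.get((q, x, r), ZERO) for x in letters for r in states), ZERO) == 1
        for q in states)

def mass_and_flow(states, letters, phi, iota, tau, k):
    """Return (sum of r_A(w) over |w| <= k, total weight of the paths of length k+1)."""
    weights = {q: iota.get(q, ZERO) for q in states}   # weights after reading 0 letters
    mass = ZERO
    for _ in range(k + 1):
        # Add the mass of the words of the current length, then read one more letter.
        mass += sum((weights[q] * tau.get(q, ZERO) for q in states), ZERO)
        weights = {r: sum((weights[q] * phi.get((q, x, r), ZERO)
                           for q in states for x in letters), ZERO)
                   for r in states}
    return mass, sum(weights.values(), ZERO)

half = Fraction(1, 2)
states, letters = ["q0", "q1"], ["a", "b"]
phi = {("q0", "a", "q1"): half, ("q0", "b", "q0"): half}
iota = {"q0": Fraction(1)}
tau = {"q1": Fraction(1)}
```

For this automaton the remaining flow after $k + 1$ letters is $2^{-k}$, so the partial masses tend to 1 geometrically, in line with the bound $\alpha^k$ used in the proof.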

A Probabilistic Residual Automaton (PRA) is a PA $\langle \Sigma, Q, \varphi, \iota, \tau \rangle$ such that for any $q \in Q$, there exists a word $u$ such that $r_{A,q} = u^{-1}r_A$. Check that an $\mathbb{R}^+$-reduction of a PRA is still a PRA, since the series associated with the states remain unchanged within a reduction.

A Probabilistic Deterministic Automaton (PDA) is a PA whose support is deterministic. Check that a PDA is a PRA. Therefore, an $\mathbb{R}^+$-reduction of a PDA is a PRA, but since reduction introduces non-determinism, it is no longer a PDA.

For any class $C$ of $K$-multiplicity automata, let us denote by $\mathcal{S}_K^C(\Sigma)$ the class of all stochastic languages which are recognized by an element of $C$.

[Figure 2: four automata A, B, C and D, each with two states $q_0$ and $q_1$. Transition labels: A: a, 0.5; b, 0.5. B: a, 0.4; b, 0.4; b, 0.2. C: a, 0.4; b, 0.4; b, 0.2; b, 0.5. D: a, 0.4; b, 0.4; b, 0.2; a, 0.2; b, 0.3.]

Fig. 2. Let us make the notation precise on automaton $A$: $q_0$ is the unique initial state and $\iota(q_0) = 1$, $q_1$ is the unique terminal state and $\tau(q_1) = 1$, $\varphi(q_0, a, q_1) = 0.5$, $\varphi(q_0, b, q_0) = 0.5$, and all other transitions satisfy $\varphi(q, x, q') = 0$. $A$ is a PDA; $B$ is a PRA since $r_{B,q_0} = r_B$ and $r_{B,q_1} = a^{-1}r_B$; $C$ is also a PRA since $r_{C,q_0} = (ab)^{-1}r_C$ and $r_{C,q_1} = a^{-1}r_C$; it can easily be shown that $D$ is not a PRA.

2.4 Equivalent representations of rational series

Stable finitely generated subsemimodules, linear representations and multiplicity automata provide us with several representations of rational series. The following classical claims show that they are equivalent: in particular, a series $r$ over $K$ is rational iff there exists a $K$-multiplicity automaton $A$ such that $r = r_A$. Moreover, any one of these representations can efficiently be derived from any other one.

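Claim 1. Let $M$ be a stable subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ generated by $r_1, \ldots, r_n$ and containing the series $r$. Let $\alpha_i$ and $\alpha^x_{i,j}$ be coefficients in $K$ defined for any letter $x$ and any $1 \leq i, j \leq n$ such that
\[ r = \sum_{i=1}^n \alpha_i r_i \quad \text{and} \quad \dot{x}r_i = \sum_{j=1}^n \alpha^x_{i,j} r_j. \]
Let $(\lambda, \mu, \gamma)$ be defined by $\lambda[1, i] = \alpha_i$, $\mu(x)[i, j] = \alpha^x_{i,j}$ and $\gamma[i, 1] = r_i(\varepsilon)$ for any $1 \leq i, j \leq n$ and any $x \in \Sigma$. Then $(\lambda, \mu, \gamma)$ is a linear representation of $r$.

Claim 2. Let $(\lambda, \mu, \gamma)$ be an $n$-dimensional linear representation of $r$ and let $A = \langle \Sigma, Q, \varphi, \iota, \tau \rangle$ be the MA defined by $Q = \{1, \ldots, n\}$, $\iota(i) = \lambda[1, i]$, $\tau(i) = \gamma[i, 1]$ and $\varphi(i, x, j) = \mu(x)[i, j]$. Then $r = r_A$.

Claim 3. Let $A = \langle \Sigma, Q, \varphi, \iota, \tau \rangle$ be an MA and let $M$ be the subsemimodule generated by $\{r_{A,q} \mid q \in Q\}$. Then $M$ is a stable subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ which contains $r_A$.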

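The proofs of these claims are classical. We give them for the sake of completeness.

Proof (Claim 1). Let us prove by induction on the length of the word $w$ that $\mu(w)\gamma = (r_1(w), \ldots, r_n(w))^t$ for any word $w$. By definition, $\mu(\varepsilon)\gamma = \gamma = (r_1(\varepsilon), \ldots, r_n(\varepsilon))^t$. Suppose that the relation is proved for all words of length $\leq k$ and let $w \in \Sigma^k$ and $x \in \Sigma$. Then
\begin{align*}
\mu(xw)\gamma &= \mu(x)\mu(w)\gamma \\
&= \mu(x)(r_1(w), \ldots, r_n(w))^t \quad \text{by induction hypothesis} \\
&= \Bigl( \sum_{j=1}^n \alpha^x_{1,j} r_j(w), \ldots, \sum_{j=1}^n \alpha^x_{n,j} r_j(w) \Bigr)^t \\
&= (\dot{x}r_1(w), \ldots, \dot{x}r_n(w))^t \\
&= (r_1(xw), \ldots, r_n(xw))^t.
\end{align*}
Now, for any word $w$,
\[ \lambda\mu(w)\gamma = \lambda(r_1(w), \ldots, r_n(w))^t = \sum_{i=1}^n \alpha_i r_i(w) = r(w). \]
⊓⊔

Proof (Claim 2). An immediate induction on $|w|$ shows that $\varphi(i, w, j) = \mu(w)[i, j]$. Hence, for any word $w$, we have
\[ r_A(w) = \sum_{i,j=1}^n \iota(i)\varphi(i, w, j)\tau(j) = \sum_{i,j=1}^n \lambda[1, i]\,\mu(w)[i, j]\,\gamma[j, 1] = \lambda\mu(w)\gamma. \]
⊓⊔

Proof (Claim 3). First note that $r_A = \sum_{q \in Q} \iota(q)\, r_{A,q}$ and therefore, $r_A \in M$. Next, for any letter $x$, any word $w$ and any state $q \in Q$,
\[ \dot{x}r_{A,q}(w) = r_{A,q}(xw) = \sum_{q' \in Q} \varphi(q, x, q')\, r_{A,q'}(w) \]
and therefore,
\[ \dot{x}r_{A,q} = \sum_{q' \in Q} \varphi(q, x, q')\, r_{A,q'}. \]
Hence $M$ is a stable subsemimodule of $K\langle\langle\Sigma\rangle\rangle$. ⊓⊔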

These equivalent characterizations make it possible to transfer definitions from one representation mode to another: check that an $n$-dimensional linear representation of a rational series over $K$ is reduced if and only if the corresponding multiplicity automaton is $K$-reduced. Also, results obtained using one representation can immediately be transferred to the other ones.

2.5 Computing equivalence and reduction of MA

Deciding whether two NFA are equivalent is a PSPACE-complete problem. However, deciding whether two MA are equivalent can be achieved within polynomial time.

Proposition 3. It is decidable within polynomial time whether two MAs over $\mathbb{R}$ are equivalent.

Proof. Let $A$ and $A'$ be two MA and let $(\lambda, \mu, \gamma)$ (resp. $(\lambda', \mu', \gamma')$) be an $n$-dimensional (resp. $n'$-dimensional) linear representation of the rational series $r_A$ (resp. $r_{A'}$). For any word $w$, let $\theta(w) = (\mu(w)\gamma, \mu'(w)\gamma')$. Let $E$ be the vector subspace of $\mathbb{R}^{n+n'}$ spanned by $\{\theta(w) \mid w \in \Sigma^*\}$ and let $T$ be the linear mapping from $\mathbb{R}^{n+n'}$ to $\mathbb{R}$ defined by $T(u, u') = \lambda u - \lambda' u'$ for any $u \in \mathbb{R}^n$ and $u' \in \mathbb{R}^{n'}$. The series $r_A$ and $r_{A'}$ are equal, i.e. $A$ and $A'$ are equivalent, iff $\forall (u, u') \in E$, $T(u, u') = 0$, a property which can be checked within polynomial time. ⊓⊔

The following algorithm decides the equivalence of two MA:

    Input: A, A′ MA
    B = {ε}, S = {x | x ∈ Σ}
    while S ≠ ∅ do
        let v be the smallest element in S and let S = S \ {v}
        if θ(v) does not belong to the subspace spanned by θ(B) then
            B = B ∪ {v} and S = S ∪ {xv | x ∈ Σ}
        end if
    end while
    while B ≠ ∅ do
        let v ∈ B and let B = B \ {v}
        if T(θ(v)) ≠ 0 then
            output no; exit
        end if
    end while
    output yes.

The first part of the algorithm computes a basis of $E$; the second part checks whether $T(E) = \{0\}$. Words are extended on the left because $\theta(xv) = (\mu(x)\mu(v)\gamma, \mu'(x)\mu'(v)\gamma')$ depends linearly on $\theta(v)$: if $\theta(v)$ is spanned by $\theta(B)$, then $\theta(xv)$ is spanned by $\theta(xB)$.

Note that when $A$ and $A'$ are not equivalent, the previous algorithm provides a word $u$ such that $r_A(u) \neq r_{A'}(u)$ and whose length is $\leq |Q| + |Q'|$.
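The equivalence test can be sketched in Python with exact rational arithmetic. This is an illustrative implementation under stated assumptions, not the authors' code: linear representations are given as `(lam, mu, gamma)` with plain lists, words are extended on the left so that $\theta(xv)$ is obtained from $\theta(v)$ by applying $\mu(x)$ and $\mu'(x)$, the two loops of the algorithm are merged, and the helper names are hypothetical.

```python
from fractions import Fraction

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def reduce_against(basis, v):
    """Eliminate v against an echelon basis of (pivot, vector) pairs."""
    v = list(v)
    for pivot, b in basis:
        if v[pivot] != 0:
            c = v[pivot] / b[pivot]
            v = [vi - c * bi for vi, bi in zip(v, b)]
    return v

def equivalent(rep1, rep2, alphabet):
    """Decide whether two linear representations define the same series."""
    (l1, m1, g1), (l2, m2, g2) = rep1, rep2
    n = len(g1)

    def step(x, theta):          # theta(xv) from theta(v)
        return mat_vec(m1[x], theta[:n]) + mat_vec(m2[x], theta[n:])

    def T(theta):                # lam * u - lam' * u'
        return (sum(a * b for a, b in zip(l1, theta[:n]))
                - sum(a * b for a, b in zip(l2, theta[n:])))

    basis = []
    queue = [list(g1) + list(g2)]          # theta(empty word)
    while queue:
        theta = queue.pop(0)               # FIFO: shorter words first
        residue = reduce_against(basis, theta)
        if any(residue):                   # theta is independent of the basis
            if T(theta) != 0:
                return False
            pivot = next(i for i, c in enumerate(residue) if c != 0)
            basis.append((pivot, residue))
            queue.extend(step(x, theta) for x in alphabet)
    return True

half = Fraction(1, 2)
rep_p1 = ([Fraction(1)], {"a": [[half]]}, [half])                   # a^n -> 2^-(n+1)
rep_p1_bis = ([half, half], {"a": [[half, 0], [0, half]]}, [half, half])
rep_p2 = ([Fraction(1)], {"a": [[Fraction(1, 4)]]}, [Fraction(3, 4)])  # a^n -> 3 * 4^-(n+1)
```

At most $n + n'$ vectors enter the basis and each of them adds $|\Sigma|$ candidates to the queue, which is where the polynomial bound comes from.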

Proposition 4. Let $A_0, A_1, \ldots, A_n$ be MAs over $\mathbb{R}$. It is decidable within polynomial time whether there exist $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ such that $r_{A_0} = \sum_{i=1}^n \alpha_i r_{A_i}$. More precisely, all such tuples of parameters $(\alpha_1, \ldots, \alpha_n)$ are solutions of a linear system computable within polynomial time.

Proof. Consider the following algorithm.

    Let Eq = { r_{A_0}(ε) = Σ_{i=1..n} x_i r_{A_i}(ε) }
    # Eq is a set of independent equations on variables x_1, ..., x_n.
    While Eq has a solution (α_1, ..., α_n) such that r_{A_0} ≠ Σ_{i=1..n} α_i r_{A_i}
        Let u be a word such that r_{A_0}(u) ≠ Σ_{i=1..n} α_i r_{A_i}(u)
        Eq = Eq ∪ { r_{A_0}(u) = Σ_{i=1..n} x_i r_{A_i}(u) }
    Output: Eq

From Proposition 3, if $r_{A_0} \neq \sum_{i=1}^n \alpha_i r_{A_i}$, a word $u$ such that $r_{A_0}(u) \neq \sum_{i=1}^n \alpha_i r_{A_i}(u)$ and whose length is $\leq \sum_{i=0}^n |Q_i|$ can be found within polynomial time (where $|Q_i|$ is the number of states of $A_i$). The algorithm ends since Eq has at most $n + 1$ elements.

It is clear that $(\alpha_1, \ldots, \alpha_n)$ is a solution of Eq iff $r_{A_0} = \sum_{i=1}^n \alpha_i r_{A_i}$. ⊓⊔

A similar result holds when we ask for positive coefficients.

Proposition 5. Let $A_0, A_1, \ldots, A_n$ be MAs over $\mathbb{R}$. It is decidable within polynomial time whether there exist $\alpha_1, \ldots, \alpha_n \in \mathbb{R}^+$ such that $r_{A_0} = \sum_{i=1}^n \alpha_i r_{A_i}$.

Proof. Add the constraints $x_1 \geq 0, \ldots, x_n \geq 0$ to the system Eq in the previous algorithm. A polynomial linear programming algorithm will then find a solution of Eq or decide that Eq has no solution. ⊓⊔

As a consequence of these propositions, it can efficiently be decided whether an MA $A$ is $K$-reduced.

Proposition 6. Let $A = \langle \Sigma, Q, \varphi, \iota, \tau \rangle$ be a $K$-MA. It is decidable within polynomial time whether $A$ is $K$-reduced; if $A$ is not $K$-reduced, a $K$-reduction can be computed within polynomial time.

Proof. For any $q \in Q$, check whether there exist coefficients $\alpha_{q'} \in K$ for $q' \in Q' = Q \setminus \{q\}$ such that $r_{A,q} = \sum_{q' \in Q'} \alpha_{q'} r_{A,q'}$. If so, use these coefficients to compute a $K$-reduction of $A$. ⊓⊔

3 Rational stochastic languages

The objects we study are rational stochastic languages, i.e. stochastic languages which are also rational series. A rational stochastic language can always be generated by a multiplicity automaton. But depending on the set $K$ of numbers used for the parameters, we obtain different sets $\mathcal{S}_K^{\mathrm{rat}}(\Sigma)$ of rational stochastic languages. In the following, we suppose that $K \in \{\mathbb{R}, \mathbb{R}^+, \mathbb{Q}, \mathbb{Q}^+\}$. First, we study the relations between all these classes of rational stochastic languages and next, we give a characterization of $\mathcal{S}_K^{\mathrm{rat}}(\Sigma)$ in terms of stable subsemimodules of $\mathcal{S}(\Sigma)$.

3.1 Relations between classes of rational stochastic languages

Let us begin with the simplest inclusions.

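Proposition 7.
\[ \mathcal{S}_{\mathbb{Q}^+}^{\mathrm{rat}}(\Sigma) \subseteq \mathcal{S}_{\mathbb{Q}}^{\mathrm{rat}}(\Sigma) \subsetneq \mathcal{S}_{\mathbb{R}}^{\mathrm{rat}}(\Sigma) \quad \text{and} \quad \mathcal{S}_{\mathbb{Q}^+}^{\mathrm{rat}}(\Sigma) \subsetneq \mathcal{S}_{\mathbb{R}^+}^{\mathrm{rat}}(\Sigma) \subseteq \mathcal{S}_{\mathbb{R}}^{\mathrm{rat}}(\Sigma). \]
Moreover,
\[ \mathcal{S}_{\mathbb{R}^+}^{\mathrm{rat}}(\Sigma) \setminus \mathbb{Q}\langle\langle\Sigma\rangle\rangle \neq \emptyset. \]

Proof. Let $K_1$ be a subsemiring of $K_2$. We have $K_1^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle \subseteq K_2^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle$ and hence, $\mathcal{S}_{K_1}^{\mathrm{rat}}(\Sigma) \subseteq \mathcal{S}_{K_2}^{\mathrm{rat}}(\Sigma)$.

Now, let $r$ be the rational series defined on $\Sigma = \{a\}$ by $r(\varepsilon) = \sqrt{2}/2$, $r(a) = 1 - \sqrt{2}/2$ and $r(a^n) = 0$ for any $n \geq 2$. Clearly, $r \in \mathcal{S}_{\mathbb{R}^+}^{\mathrm{rat}}(\Sigma) \setminus \mathbb{Q}\langle\langle\Sigma\rangle\rangle$, which implies that $\mathcal{S}_{\mathbb{Q}}^{\mathrm{rat}}(\Sigma) \subsetneq \mathcal{S}_{\mathbb{R}}^{\mathrm{rat}}(\Sigma)$ and $\mathcal{S}_{\mathbb{Q}^+}^{\mathrm{rat}}(\Sigma) \subsetneq \mathcal{S}_{\mathbb{R}^+}^{\mathrm{rat}}(\Sigma)$. ⊓⊔

A rational stochastic language over $\mathbb{R}$ which only takes rational values is a rational stochastic language over $\mathbb{Q}$.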

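Proposition 8.
\[ \mathcal{S}_{\mathbb{R}}^{\mathrm{rat}}(\Sigma) \cap \mathbb{Q}\langle\langle\Sigma\rangle\rangle = \mathcal{S}_{\mathbb{Q}}^{\mathrm{rat}}(\Sigma). \]

Proof. Recall that $\mathbb{R}$ is a Fatou extension of $\mathbb{Q}$: any rational series over $\mathbb{R}$ which only takes rational values is a rational series over $\mathbb{Q}$, i.e.
\[ \mathbb{R}^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle \cap \mathbb{Q}\langle\langle\Sigma\rangle\rangle = \mathbb{Q}^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle. \]
As a consequence,
\begin{align*}
\mathcal{S}_{\mathbb{R}}^{\mathrm{rat}}(\Sigma) \cap \mathbb{Q}\langle\langle\Sigma\rangle\rangle &= \mathcal{S}(\Sigma) \cap \mathbb{R}^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle \cap \mathbb{Q}\langle\langle\Sigma\rangle\rangle \\
&= \mathcal{S}(\Sigma) \cap \mathbb{Q}^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle \\
&= \mathcal{S}_{\mathbb{Q}}^{\mathrm{rat}}(\Sigma).
\end{align*}
⊓⊔

It has also been proved that $\mathbb{R}^+$ is not a Fatou extension of $\mathbb{Q}^+$: $(\mathbb{Q}^+)^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle \subsetneq (\mathbb{R}^+)^{\mathrm{rat}}\langle\langle\Sigma\rangle\rangle \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$. We prove below that this result can be extended to stochastic languages: there exists a rational stochastic language over $\mathbb{R}^+$ which takes only rational values and which is not a rational stochastic language over $\mathbb{Q}^+$.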

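Proposition 9. $\mathcal{S}^{rat}_{\mathbb{Q}^+}(\Sigma) \subsetneq \mathcal{S}^{rat}_{\mathbb{R}^+}(\Sigma) \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$.

Proof. We use an element of $\left((\mathbb{R}^+)^{rat}\langle\langle\Sigma\rangle\rangle \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle\right) \setminus (\mathbb{Q}^+)^{rat}\langle\langle\Sigma\rangle\rangle$ described in [BR84] to prove the proposition.

Consider the multiplicity automaton $A = \langle \Sigma, Q, \varphi, \iota, \tau \rangle$ where $\Sigma = \{a, b\}$, $Q = \{q_0, q_1\}$, $\iota(q_0) = \iota(q_1) = 1$, $\varphi(q_0, a, q_0) = \alpha^2$, $\varphi(q_0, b, q_0) = \alpha^{-2}$, $\varphi(q_1, a, q_1) = \alpha^{-2}$, $\varphi(q_1, b, q_1) = \alpha^2$ where $\alpha = (\sqrt{5} + 1)/2$, $\varphi(q_i, x, q_j) = 0$ for any $x \in \Sigma$ when $i \neq j$, and $\tau(q_0) = \tau(q_1) = 1$ (see Figure 3).

Let $r_A$ be the rational series generated by $A$. Let $w \in \Sigma^*$. We have $r_A(w) = \alpha^{2n} + \alpha^{-2n}$ where $n = |w|_a - |w|_b$. Check that for any integer $n$, $\alpha^{2n} + \alpha^{-2n} \in \mathbb{N}$.

Hence, $r_A \in (\mathbb{R}^+)^{rat}\langle\langle\Sigma\rangle\rangle \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$. It is shown in [BR84] that $r_A \notin (\mathbb{Q}^+)^{rat}\langle\langle\Sigma\rangle\rangle$. Now let $A' = \langle \Sigma, Q, \varphi', \iota', \tau' \rangle$ where, for any states $q$ and $q'$ and any letter $x$, $\iota'(q) = 1/2$, $\varphi'(q, x, q') = \varphi(q, x, q')/4$ and $\tau'(q_0) = \tau'(q_1) = 1/4$. Check that $\alpha^2 + \alpha^{-2} = 3$. Then, $A'$ is a probabilistic automaton. Let $p$ be the stochastic language generated by $A'$. We have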

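\[ p(w) = \frac{1}{2^{2|w|+3}} \left( \alpha^{2n} + \alpha^{-2n} \right) \quad\text{where } n = |w|_a - |w|_b \]

and hence

\[ p \in \mathcal{S}^{rat}_{\mathbb{R}^+}(\Sigma) \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle. \]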

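Let $s$ be the series defined by $s(w) = 2^{2|w|+3}$. Clearly, $s \in (\mathbb{Q}^+)^{rat}\langle\langle\Sigma\rangle\rangle$ and $r_A = s \odot p$ (Hadamard product). Recall that when $K$ is commutative, the Hadamard product of two rational series is a rational series. Therefore, $r_A \notin (\mathbb{Q}^+)^{rat}\langle\langle\Sigma\rangle\rangle$ implies $p \notin (\mathbb{Q}^+)^{rat}\langle\langle\Sigma\rangle\rangle$ and hence $p \notin \mathcal{S}^{rat}_{\mathbb{Q}^+}(\Sigma)$. ⊓⊔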
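The three facts that the proof asks the reader to check can be verified with exact arithmetic. The Python sketch below is only an illustration and is not part of the original argument; the helper names `c`, `p` and `mass_of_length` are hypothetical. It relies on the observation that $\alpha^2$ and $\alpha^{-2}$ are the two roots of $x^2 - 3x + 1 = 0$, so that $c_m = \alpha^{2m} + \alpha^{-2m}$ satisfies $c_0 = 2$, $c_1 = 3$ and $c_{m+1} = 3c_m - c_{m-1}$.

```python
from fractions import Fraction
from itertools import product


def c(m):
    """Return alpha^(2m) + alpha^(-2m) as an exact integer.

    Uses c(0) = 2, c(1) = 3 and c(m+1) = 3*c(m) - c(m-1); the value only
    depends on |m| because the expression is symmetric in m.
    """
    m = abs(m)
    prev, cur = 2, 3
    if m == 0:
        return prev
    for _ in range(m - 1):
        prev, cur = cur, 3 * cur - prev
    return cur


def p(w):
    """Closed form of p(w), with n = |w|_a - |w|_b."""
    n = w.count("a") - w.count("b")
    return Fraction(c(n), 2 ** (2 * len(w) + 3))


def mass_of_length(k):
    """Total weight that p gives to the words of length k over {a, b}."""
    return sum(p("".join(letters)) for letters in product("ab", repeat=k))


# The weights of the words of length k add up to (1/4) * (3/4)^k,
# so the partial sums tend to 1 as the length bound grows.
partial = sum(mass_of_length(k) for k in range(8))
print(partial, float(partial))
```

Since the recurrence has integer coefficients and integer initial values, every $c_m$ is an integer, which is the claim $\alpha^{2n} + \alpha^{-2n} \in \mathbb{N}$.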

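[Figure 3: the automata $A$, $A'$ and $A''$. $A''$ has two states $\varepsilon$ and $a$, initial weight $1$ on $\varepsilon$, terminal weight $1/4$ on both states, and transitions $\varepsilon \to \varepsilon$: $(b, 3/4)$; $\varepsilon \to a$: $(a, 3/8)$, $(b, -3/8)$; $a \to \varepsilon$: $(a, -1/6)$, $(b, 1/6)$; $a \to a$: $(a, 3/4)$.]

Fig. 3. $A'$ generates a rational stochastic language $p_{A'}$ which takes all its values in $\mathbb{Q}$. However, $p_{A'}$ is not a rational stochastic language over $\mathbb{Q}^+$. $A''$ is a multiplicity automaton over $\mathbb{Q}$ which generates $p_{A'}$.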

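Remark that since $p$ is a rational stochastic language which takes all its values in $\mathbb{Q}$, $p$ is a rational stochastic language over $\mathbb{Q}$, by Proposition 8. Let $p_0 = p_{A',q_0}$ and $p_1 = p_{A',q_1}$ be the stochastic languages generated from the states $q_0$ and $q_1$ of the automaton $A'$. It can easily be shown that

\[ p = \frac{1}{2} p_0 + \frac{1}{2} p_1 \quad\text{and}\quad a^{-1}p = \frac{\alpha^2}{3} p_0 + \frac{\alpha^{-2}}{3} p_1. \]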

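These relations make it possible to build an automaton which generates $p$ and whose two states correspond to $p$ and $a^{-1}p$. Check that

\[ \dot{a}p = \frac{3}{8}\, a^{-1}p, \quad \dot{b}p = \frac{3}{4}\, p - \frac{3}{8}\, a^{-1}p, \quad \dot{a}\,a^{-1}p = -\frac{1}{6}\, p + \frac{3}{4}\, a^{-1}p \quad\text{and}\quad \dot{b}\,a^{-1}p = \frac{1}{6}\, p. \]

These relations can be used to prove that the automaton $A''$ in Fig. 3 generates $p$.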
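As a sanity check, the series generated by $A''$ can be compared with the closed form of $p$ on all short words. The Python sketch below is illustrative only: the encoding of $A''$ as two matrices `M["a"]`, `M["b"]` and a vector `TAU` is an assumption read off the transition labels of Fig. 3 (state order: $\varepsilon$, then $a$), and the function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Assumed encoding of A'': row i of M[x] lists the weights of the
# x-transitions leaving state i (state 0 is epsilon, state 1 is a).
M = {
    "a": [[F(0), F(3, 8)], [F(-1, 6), F(3, 4)]],
    "b": [[F(3, 4), F(-3, 8)], [F(1, 6), F(0)]],
}
TAU = [F(1, 4), F(1, 4)]  # terminal weights of the two states


def automaton_value(w):
    """Weight of w in A'' from state epsilon, using v(xw) = M[x] v(w)."""
    v = TAU
    for x in reversed(w):
        m = M[x]
        v = [m[0][0] * v[0] + m[0][1] * v[1],
             m[1][0] * v[0] + m[1][1] * v[1]]
    return v[0]


def closed_form(w):
    """p(w) = (alpha^(2n) + alpha^(-2n)) / 2^(2|w|+3), n = |w|_a - |w|_b."""
    n = abs(w.count("a") - w.count("b"))
    prev, cur = 2, 3  # alpha^0 + alpha^0 and alpha^2 + alpha^-2
    for _ in range(n - 1):
        prev, cur = cur, 3 * cur - prev
    numerator = prev if n == 0 else cur
    return F(numerator, 2 ** (2 * len(w) + 3))


words = ["".join(t) for k in range(7) for t in product("ab", repeat=k)]
agree = all(automaton_value(w) == closed_form(w) for w in words)
print(agree)
```

Agreement on finitely many words is of course not a proof; it only guards against sign or coefficient slips when transcribing the relations.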

Now, we prove that there exists a rational stochastic language over $\mathbb{Q}$ which is not rational over $\mathbb{R}^+$. In particular, it cannot be generated by a probabilistic automaton.

Proposition 10. $\mathcal{S}^{rat}_{\mathbb{Q}}(\Sigma) \setminus \mathcal{S}^{rat}_{\mathbb{R}^+}(\Sigma) \neq \emptyset$.

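Proof. Let $\Sigma = \{a, b\}$ and let $r$ and $s$ be the series defined, for any $w \in \Sigma^*$, by $r(w) = |w|_a$ and $s(w) = |w|_b$. They are rational over $\mathbb{Q}$ since they belong to a stable finitely generated subsemimodule of $\mathbb{Q}\langle\langle\Sigma\rangle\rangle$. Indeed,

\[ \dot{a}r = r + 1, \quad \dot{b}r = r, \quad \dot{a}s = s \quad\text{and}\quad \dot{b}s = s + 1. \]

Hence, the series $r - s$ and $(r - s)^2$, where the exponent refers to the Hadamard product, are also rational over $\mathbb{Q}$. For any $n \in \mathbb{N}$, let $\sigma_n = \sum_{w \in \Sigma^n} (r - s)^2(w) \le n^2 \cdot 2^n$. Check that

\[ \sigma_n = n 2^n \quad\text{and}\quad \sigma = \sum_{n \ge 0} \frac{\sigma_n}{2^{2n}} = 2. \]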

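Now, let $t$ be the series defined by

\[ t(w) = \frac{(r - s)^2(w)}{\sigma \cdot 2^{2|w|}}. \]

$t$ is a rational stochastic language over $\mathbb{Q}$. Its support is the set $\mathrm{supp}(t) = \{w \in \Sigma^* \mid |w|_a \neq |w|_b\}$, which is known not to be rational. If $t$ were rational over $\mathbb{R}^+$, its support would be rational. Therefore, $t \in \mathcal{S}^{rat}_{\mathbb{Q}}(\Sigma) \setminus \mathcal{S}^{rat}_{\mathbb{R}^+}(\Sigma)$. ⊓⊔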
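The two identities left to the reader, $\sigma_n = n 2^n$ and $\sigma = 2$, can be checked numerically. The following Python sketch is an illustration under the definitions of the proof; the function names are hypothetical. It computes $\sigma_n$ both by enumerating words and by grouping words according to their number of occurrences of $a$, and it evaluates partial sums of the series defining $\sigma$.

```python
from fractions import Fraction
from itertools import product
from math import comb


def sigma_brute(n):
    """Sum of (|w|_a - |w|_b)^2 over all words w of length n over {a, b}."""
    return sum((w.count("a") - w.count("b")) ** 2
               for w in product("ab", repeat=n))


def sigma_binomial(n):
    """Same sum, grouping the words by their number k of letters a."""
    return sum(comb(n, k) * (2 * k - n) ** 2 for k in range(n + 1))


def partial_sigma(N):
    """Partial sum of sigma_n / 2^(2n) for n = 0..N, using sigma_n = n 2^n."""
    return sum(Fraction(n * 2 ** n, 4 ** n) for n in range(N + 1))


def t(w):
    """Value of the stochastic language t on w, taking sigma = 2."""
    d = w.count("a") - w.count("b")
    return Fraction(d * d, 2 * 4 ** len(w))


for n in range(8):
    print(n, sigma_brute(n), n * 2 ** n)
print(float(partial_sigma(40)))
```

The partial sums equal $2 - (N + 2)/2^N$, which makes the limit $\sigma = 2$ visible; the words with $t(w) = 0$ are exactly those with as many $a$ as $b$.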

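All these results are summarized in Figure 4.

[Figure 4: diagram of the nested regions $\mathbb{R}^+\langle\langle\Sigma\rangle\rangle$, $\mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$, $\mathcal{S}(\Sigma)$, $\mathcal{S}(\Sigma) \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$, $\mathcal{S}^{rat}_{\mathbb{R}}(\Sigma)$, $\mathcal{S}^{rat}_{\mathbb{R}^+}(\Sigma)$, $\mathcal{S}^{rat}_{\mathbb{R}^+}(\Sigma) \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$, $\mathcal{S}^{rat}_{\mathbb{Q}^+}(\Sigma)$ and $\mathcal{S}^{rat}_{\mathbb{Q}}(\Sigma) = \mathcal{S}^{rat}_{\mathbb{R}}(\Sigma) \cap \mathbb{Q}^+\langle\langle\Sigma\rangle\rangle$.]

Fig. 4. Inclusion relations between classes of rational stochastic languages.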

3.2 Residual languages of rational stochastic languages

Recall that given a stochastic language $p \in \mathcal{S}(\Sigma)$ and a word $u \in res(p)$, i.e. such that $p(u\Sigma^*) \neq 0$, the residual language of $p$ with respect to $u$ is the stochastic language defined by

\[ u^{-1}p(w) = \frac{p(uw)}{p(u\Sigma^*)}. \]

When $p$ takes its values in $\mathbb{Q}^+$, it is not true in general that $u^{-1}p$ also takes its values in $\mathbb{Q}^+$.

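Consider two sequences $(\alpha_n)_{n \in \mathbb{N}}$ and $(\beta_n)_{n \in \mathbb{N}}$ of elements of $\mathbb{Q}^+$ such that $\alpha_0 \neq 0$, $\sum_n \alpha_n = \sqrt{2}/2$ and $\sum_n \beta_n = 4/5 - \sqrt{2}/2$. Now, consider the series $r \in \mathbb{Q}^+\langle\langle\{a, b\}\rangle\rangle$ defined by $r(\varepsilon) = 1/5$, $r(a^n) = \alpha_{n-1}$ and $r(b^n) = \beta_{n-1}$ for $n \ge 1$, and $r(w) = 0$ otherwise. It is easy to check that $r$ is a stochastic language which takes its values in $\mathbb{Q}^+$ and that $a^{-1}r(\varepsilon) = \sqrt{2}\,\alpha_0$. Therefore, $a^{-1}r \notin \mathbb{Q}\langle\langle\Sigma\rangle\rangle$.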
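For the reader's convenience, the two verifications behind this example can be written out as follows. This is only a worked computation from the definitions of $r$, $(\alpha_n)$ and $(\beta_n)$ given above; it adds no new assumption beyond $\alpha_0 \neq 0$, which is needed for the last value to be irrational.

```latex
\begin{align*}
\sum_{w \in \Sigma^*} r(w)
  &= r(\varepsilon) + \sum_{n \ge 1} r(a^n) + \sum_{n \ge 1} r(b^n)
   = \frac{1}{5} + \frac{\sqrt{2}}{2} + \left( \frac{4}{5} - \frac{\sqrt{2}}{2} \right) = 1, \\
r(a\Sigma^*)
  &= \sum_{n \ge 1} r(a^n) = \sum_{n \ge 0} \alpha_n = \frac{\sqrt{2}}{2}, \\
a^{-1}r(\varepsilon)
  &= \frac{r(a)}{r(a\Sigma^*)} = \frac{\alpha_0}{\sqrt{2}/2} = \sqrt{2}\,\alpha_0 .
\end{align*}
```

Since $\alpha_0$ is a nonzero rational number, $\sqrt{2}\,\alpha_0$ is irrational.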

We prove below that when $p$ is a rational stochastic language over $K$, all its residual languages are also rational over $K$. Moreover, the set $Res(p) = \{u^{-1}p \mid u \in res(p)\}$ generates the same subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ as the set $\{\dot{u}p \mid u \in \Sigma^*\}$.

We first need two technical lemmas from linear algebra to prove this result.

Lemma 1. Let $f : \mathbb{Q}^n \to \mathbb{Q}^n$ be a linear mapping and let $t \in \mathbb{Q}^n$ be such that $\sum_{k \ge 0} f^k t$ converges to $u$. Then $u \in \mathbb{Q}^n$.

Proof. Let $F$ be the vector subspace of $\mathbb{Q}^n$ generated by $\{f^k t \mid k \in \mathbb{N}\}$. There exists an integer $d$ such that $f^0 t = t, \ldots, f^{d-1} t$ is a basis of $F$. As the sum $\sum_{k \ge 0} f^k t$ converges, $f^k t$ converges to $0$ when $k$ tends to infinity. Therefore, for any $v \in F$, $f^k v$ also converges to $0$ when $k$ tends to infinity. Let $v \in F$ be such that $fv = v$. We also have $f^k v = v$ for any integer $k$ and hence $v = 0$. Let $g : F \to F$ be defined by $g(v) = v - fv$. The linear mapping $g$ is one-to-one and, for any $v \in F$ and any integer $k$,

\[ v + fv + \ldots + f^k v = g^{-1}(1 - f^{k+1})(v). \]

Therefore, $u = g^{-1} t$ and $u \in \mathbb{Q}^n$. ⊓⊔
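The content of Lemma 1 is that the limit is obtained by solving the rational linear system $(1 - f)u = t$, so no irrational number can appear. The Python sketch below illustrates this on one small instance; the matrix `f`, the vector `t` and all helper names are hypothetical choices made for the example, not data from the paper.

```python
from fractions import Fraction as F


def mat_vec(m, v):
    """Product of a square matrix (list of rows) with a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def solve_2x2(m, rhs):
    """Solve m x = rhs exactly by Cramer's rule (m is 2x2, invertible)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [
        (rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det,
        (m[0][0] * rhs[1] - rhs[0] * m[1][0]) / det,
    ]


# Example data (assumed): the powers of f tend to 0, so the sum converges.
f = [[F(1, 2), F(1, 4)], [F(0), F(1, 3)]]
t = [F(1), F(1)]

# g = 1 - f, and u = g^{-1} t is the limit of t + f t + f^2 t + ...
g = [[1 - f[0][0], -f[0][1]], [-f[1][0], 1 - f[1][1]]]
u = solve_2x2(g, t)


def partial_sum(k):
    """t + f t + ... + f^k t, computed exactly."""
    total, term = [F(0), F(0)], t
    for _ in range(k + 1):
        total = [a + b for a, b in zip(total, term)]
        term = mat_vec(f, term)
    return total


print(u, [float(x) for x in partial_sum(30)])
```

The identity of the proof can be read as $u - (t + ft + \ldots + f^k t) = f^{k+1} u$, which is what makes the remainder vanish in the limit.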

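We use Lemma 1 to show that if $\{r_1, \ldots, r_n\}$ generates a stable subsemimodule of $\mathbb{Q}\langle\langle\Sigma\rangle\rangle$ and if each sum $\sum_{w \in \Sigma^*} r_i(w)$ converges to $\sigma_i$, then each $\sigma_i \in \mathbb{Q}$.

Lemma 2. Let $M$ be a stable subsemimodule of $\mathbb{Q}\langle\langle\Sigma\rangle\rangle$ generated by $\{r_1, \ldots, r_n\}$ and let $\sigma_i^k = \sum_{w \in \Sigma^k} r_i(w)$ for any $1 \le i \le n$ and any integer $k$. Suppose that for any $1 \le i \le n$, the sum $\sum_{k \ge 0} \sigma_i^k$ converges to $\sigma_i$. Then $\sigma_i \in \mathbb{Q}$ for any $1 \le i \le n$.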

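Proof. Let $t = (r_1(\varepsilon), \ldots, r_n(\varepsilon))^t$. As $M$ is stable, there exist $\alpha^x_{i,j} \in \mathbb{Q}$ for any $1 \le i, j \le n$ and any $x \in \Sigma$ such that $\dot{x} r_i = \sum_{j=1}^n \alpha^x_{i,j} \cdot r_j$. Let $B \in \mathbb{Q}^{n \times n}$ be defined by $B[i, j] = \sum_{x \in \Sigma} \alpha^x_{i,j}$. Let us prove by induction on $k$ that for any integer $k$, we have $(\sigma_1^k, \ldots, \sigma_n^k)^t = B^k t$. The property is true for $k = 0$ since, for any integer $i$, $\sigma_i^0 = r_i(\varepsilon)$.

Now,

\begin{align*}
\sigma_i^{k+1}
&= \sum_{w \in \Sigma^k,\, x \in \Sigma} r_i(xw) \\
&= \sum_{w \in \Sigma^k,\, x \in \Sigma} \dot{x} r_i(w) \\
&= \sum_{w \in \Sigma^k,\, x \in \Sigma,\, j \in \{1, \ldots, n\}} \alpha^x_{i,j} \cdot r_j(w) \\
&= \sum_{j \in \{1, \ldots, n\}} \left( \sum_{x \in \Sigma} \alpha^x_{i,j} \right) \cdot \sum_{w \in \Sigma^k} r_j(w) \\
&= \sum_{j \in \{1, \ldots, n\}} B[i, j]\, \sigma_j^k \\
&= \sum_{j \in \{1, \ldots, n\}} B[i, j]\, (B^k t)[j] \quad\text{by induction hypothesis} \\
&= (B^{k+1} t)[i].
\end{align*}

Therefore, $\sum_{k \ge 0} B^k t$ converges to $(\sigma_1, \ldots, \sigma_n)^t$. From Lemma 1, $\sigma_i \in \mathbb{Q}$ for any $1 \le i \le n$. ⊓⊔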
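The induction $(\sigma_1^k, \ldots, \sigma_n^k)^t = B^k t$ can be observed on a concrete stable family. The Python sketch below is an illustration only: it takes $r_1 = p$ and $r_2 = a^{-1}p$ from Section 3.1, with coefficients `ALPHA` encoded from the transition labels of automaton $A''$ in Fig. 3 (an assumption of this sketch), and compares a brute-force enumeration of words with the matrix powers. All names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# ALPHA[x][i][j] is the coefficient of r_j in x-dot r_i (assumed encoding).
ALPHA = {
    "a": [[F(0), F(3, 8)], [F(-1, 6), F(3, 4)]],
    "b": [[F(3, 4), F(-3, 8)], [F(1, 6), F(0)]],
}
T = [F(1, 4), F(1, 4)]  # (r_1(eps), r_2(eps))


def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def values(w):
    """Vector (r_1(w), r_2(w)), using r_i(xw) = sum_j ALPHA[x][i][j] r_j(w)."""
    v = T
    for x in reversed(w):
        v = mat_vec(ALPHA[x], v)
    return v


def sigma_brute(k):
    """(sigma_1^k, sigma_2^k) obtained by enumerating all words of length k."""
    total = [F(0), F(0)]
    for w in product("ab", repeat=k):
        total = [s + r for s, r in zip(total, values(w))]
    return total


# B[i][j] is the sum over the letters x of ALPHA[x][i][j].
B = [[ALPHA["a"][i][j] + ALPHA["b"][i][j] for j in range(2)] for i in range(2)]


def sigma_matrix(k):
    """B^k t, computed by k successive matrix-vector products."""
    v = T
    for _ in range(k):
        v = mat_vec(B, v)
    return v


for k in range(5):
    print(k, sigma_brute(k), sigma_matrix(k))
```

In this instance $B$ is $3/4$ times the identity, so $\sigma_i^k = (1/4)(3/4)^k$ and both sums converge to the rational value $1$, as expected for stochastic languages.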

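Lemma 3. Let $p \in \mathcal{S}^{rat}_K(\Sigma)$. For any word $u \in res(p)$, $\sum_{w \in \Sigma^*} p(uw) \in K$. Moreover, the set $Res(p)$ generates the same subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ as the set $\{\dot{u}p \mid u \in \Sigma^*\}$.

Proof. Let $p \in \mathcal{S}^{rat}_K(\Sigma)$. For any word $u$, $\sum_{w \in \Sigma^*} p(uw) \in \mathbb{R}^+$ since $p$ is a stochastic language. Suppose now that $K = \mathbb{Q}$ or $K = \mathbb{Q}^+$. The set $\{\dot{u}p \mid u \in \Sigma^*\}$ generates a finite-dimensional vector subspace $P$ of $\mathbb{Q}\langle\langle\Sigma\rangle\rangle$. Let $\{\dot{u}_1 p, \ldots, \dot{u}_n p\}$ be a finite subset of $\{\dot{u}p \mid u \in \Sigma^*\}$ which generates $P$. Let $\sigma_i = \sum_{w \in \Sigma^*} \dot{u}_i p(w)$ for any $i = 1, \ldots, n$.

From Lemma 2, each $\sigma_i \in \mathbb{Q}$. Now, for any $u \in \Sigma^*$, there exist $\alpha_1, \ldots, \alpha_n \in \mathbb{Q}$ such that $\dot{u}p = \sum_{i=1}^n \alpha_i \dot{u}_i p$. Therefore, $\sum_{w \in \Sigma^*} p(uw) = \sum_{i=1}^n \alpha_i \sigma_i \in \mathbb{Q}^+$.

So, for any $K$ and any $u \in res(p)$, there exists an invertible element $\alpha_u$ of $K$ such that $\dot{u}p = \alpha_u \cdot u^{-1}p$. Consequently, the set $Res(p)$ generates the same subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ as the set $\{\dot{u}p \mid u \in \Sigma^*\}$. ⊓⊔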
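To make the first claim of Lemma 3 concrete, the quantity $p(u\Sigma^*) = \sum_{w} p(uw)$ can be computed exactly for the language $p$ of Section 3.1, and the residual $u^{-1}p$ then follows by division. The Python sketch below is illustrative only and rests on two assumptions: the encoding `M`, `TAU` of automaton $A''$ read off Fig. 3, and the fact that both states of $A''$ generate series of total sum $1$ (they are the stochastic languages $p$ and $a^{-1}p$). Function names are hypothetical.

```python
from fractions import Fraction as F

# Assumed encoding of A'' (state 0 generates p, state 1 generates a^{-1}p).
M = {
    "a": [[F(0), F(3, 8)], [F(-1, 6), F(3, 4)]],
    "b": [[F(3, 4), F(-3, 8)], [F(1, 6), F(0)]],
}
TAU = [F(1, 4), F(1, 4)]  # values of the two series on the empty word
ONES = [F(1), F(1)]       # total sums of the two series


def apply_word(u, v):
    """Return M[u_1] ... M[u_k] v for the word u = u_1 ... u_k."""
    for x in reversed(u):
        m = M[x]
        v = [m[0][0] * v[0] + m[0][1] * v[1],
             m[1][0] * v[0] + m[1][1] * v[1]]
    return v


def p(w):
    """p(w), read from the first state of A''."""
    return apply_word(w, TAU)[0]


def prefix_mass(u):
    """p(u Sigma^*), the sum of p(uw) over all words w; a rational number."""
    return apply_word(u, ONES)[0]


def residual(u, w):
    """u^{-1}p(w) = p(uw) / p(u Sigma^*), defined when u is in res(p)."""
    mass = prefix_mass(u)
    if mass == 0:
        raise ValueError("u is not in res(p)")
    return p(u + w) / mass


print(prefix_mass("a"), prefix_mass("aa"), residual("a", "a"))
```

Here $\dot{u}p = p(u\Sigma^*) \cdot u^{-1}p$ with $p(u\Sigma^*)$ rational and nonzero, which is the invertible element $\alpha_u$ of the proof.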

For any stochastic language $p$ over $K$, let us denote by $[Res(p)]$ the subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ generated by $Res(p)$ and let us call it the residual subsemimodule of $p$.

Note that $[Res(p)]$ is stable.

Proposition 11. Let $p \in \mathcal{S}^{rat}_K(\Sigma)$. For any word $u \in res(p)$, $u^{-1}p \in \mathcal{S}^{rat}_K(\Sigma)$.

Proof. From Lemma 3, the residual stochastic languages $u^{-1}p$ belong to the same stable subsemimodules of $K\langle\langle\Sigma\rangle\rangle$ as $p$. Therefore, they are rational over $K$. ⊓⊔

3.3 Characterization of $\mathcal{S}^{rat}_K(\Sigma)$ in terms of stable subsemimodules

We show in this section that a series $p$ over $K$ is a rational stochastic language if and only if there exists a finite subset $S$ of $\mathcal{S}(\Sigma)$ which generates a stable subsemimodule of $K\langle\langle\Sigma\rangle\rangle$ and such that $p \in conv_K(S)$.

The "if" part is easy to prove.