Duality and equational theory of regular languages

(1)

Duality and equational theory of regular languages

Mai Gehrke¹ Serge Grigorieff² Jean-´Eric Pin²

1Radboud Universiteit

2LIAFA, CNRS and University Paris 7

October 2008, Valencia

(2)

Summary

(1) Related results

(2) Equational theory of regular languages (3) The profinite world

(4) Examples (5) Duality

(3)

Part I

Equational theory of regular languages

(4)

A virtuous circle

First order logic Star-free

languages

x^ω = x^ω+1

Aperiodic monoids

Decidability Sch¨utzenberger

Mc Naughton

(5)

Another virtuous circle

BΣ1

formulas Piecewise testable

languages

x^ω =x^ω+1 (xy)^ω = (yx)^ω J-trivial

monoids

Decidability I. Simon

Thomas

(6)

A brief historical perspective

There have been several attempts to find an appropriate framework to state these results

• Eilenberg (1976) + Reiterman (1982)

• Pin (1995) + Pin-Weil (1996)

• Pippenger (1997)

• Pol´ak (2001)

• Esik (2002), Straubing (2002) + Kunc (2003)

(7)

Eilenberg-Reiterman theorem

A variety of languages is a class of regular languages closed under Boolean operations, quotients and inverse of morphisms between free monoids.

Varieties of languages

Profinite identities Varieties of

finite monoids

Decidability

Eilenberg Reiterman

In good cases

(8)

Closure properties of classes of regular languages

• Eilenberg’s varieties: Boolean operations, quotients, inverse of morphisms

• Pin: intersection, union, quotients, inverse of morphisms

• Pippenger: Boolean operations, inverse of morphisms

• Pol´ak: intersection, quotients, inverse of morphisms

• Esik, Straubing: Boolean operations, quotients, inverse of length preserving morphisms

(9)

Part II

Lattices of languages

(10)

Lattices of languages

Let A be a finite alphabet. A lattice of languages is a set of regular languages of A^∗ containing ∅ and A^∗ and closed under finite intersection and finite union.

Let u and v be words of A^∗. A language L of A^∗ satisfies the equation u →v if

u ∈ L ⇒v ∈ L

Let E be a set of equations of the form u →v.

Then the languages of A^∗ satisfying the equations of E form a lattice of languages.

(11)

Equational description of finite lattices

Proposition

A finite set of languages of A^∗ is a lattice of

languages iff it can be defined by a set of equations of the form u →v with u, v ∈ A^∗.

Therefore, there is an equational theory for finite lattices of languages. What about infinite lattices?

One needs the profinite world...

(12)

Part III

The profinite world

Citation (M. Stone)

A cardinal principle of modern mathematical research may be stated as a maxim: One must always topologize.

(13)

Separating words

A monoid M separates two words u and v of A^∗ if there exists a monoid morphism ϕ : A^∗ →M such that ϕ(u) 6= ϕ(v).

For instance, the morphism which maps each word onto its length modulo 2 is a morphism from {a, b}^∗ onto Z/2Z which separates abaaba and abaabab.

Let M = {(^{1 0}_{0 1}),(^{1 0}_{1 0}),(^{0 1}_{0 1})} and let

ϕ : {a, b}^∗ → M defined by ϕ(a) = (^{1 0}_{1 0}) and ϕ(b) = (^{0 1}_{0 1}). Then, for all u, ϕ separates ua and ub since ϕ(ua) = ϕ(a) and ϕ(ub) = ϕ(b).

(14)

The profinite metric

Let u and v be two words. Put r(u, v) = min

|M| M is a finite monoid

that separates u and v}

d(u, v) = 2^−r(u,v)

Intuitively, two words are close for d if one needs a large monoid to separate them.

Then d is an ultrametric^∗, for which the product of words is uniformly continuous.

(∗) d(x, z) 6 max {d(x, y), d(y, z)}

(15)

The free profinite monoid

The completion of the metric space (A^∗, d) is the free profinite monoid on A and is denoted by Ac^∗. Its elements are called profinite words. Further, if A is finite, Ac^∗ is compact.

Any morphism ϕ : A^∗ →M, where M is a

(discrete) finite monoid extends in a unique way to a uniformly continuous morphism ϕˆ: Ac^∗ → M. Further, a profinite word u is completely determined by its images ϕ(u), whereˆ ϕ runs over the class of morphisms from A^∗ onto a finite monoid.

(16)

A nonfinite profinite word

For each u ∈ A^∗, the sequence u^n! is a Cauchy sequence and hence converges in Ac^∗ to a limit, denoted by u^ω. If ϕ is a morphism from A^∗ onto a finite monoid, ϕ(u^ω) is the unique idempotent x^ω of the semigroup generated by x = ϕ(u).

1 x x² x³

. . .

x^i+p = xⁱ

xⁱ⁺¹ xⁱ⁺²

x^i+p−1 x^ω

(17)

Part IV

Equational theory

(18)

Profinite equations

Let (u, v) be a pair of profinite words of Ac^∗. We say that a regular language L of A^∗ satisfies the

profinite equation u → v if

u ∈ L ⇒v ∈ L

Let η : A^∗ →M be the syntactic morphism of L.

Then L satisfies the profinite equation u → v iff ˆ

η(u) ∈ η(L) ⇒ηˆ(v) ∈ η(L)

(19)

Main result

Given a set E of equations of the form u →v (where u and v are profinite words), the set of all regular languages of A^∗ satisfying all the equations of E is called the set of languages defined by E.

Theorem

A set of regular languages of A^∗ is a lattice of languages iff it can be defined by a set of equations of the form u →v, where u, v ∈ Ac^∗.

(20)

Equations of the form u 6 v

Let us say that a regular language satisfies the equation u 6v if, for all x, y ∈ Ac^∗, it satisfies the equation xvy →xuy.

Proposition

Let L be a regular language of A^∗, let (M,6_L) be its syntactic ordered monoid and let η : A^∗ →M be its syntactic morphism. Then L satisfies the

equation u 6v iff η(u)ˆ 6_L η(v)ˆ .

(21)

Quotienting algebras of languages

A lattice of languages is a quotienting algebra of languages if it is closed under the quotienting operations L → u⁻¹L and L → Lu⁻¹, for each word u ∈ A^∗.

Theorem

A set of regular languages of A^∗ is a quotienting algebra of languages iff it can be defined by a set of equations of the form u 6v, where u, v ∈ Ac^∗.

(22)

Boolean algebras

Let us write

u ↔v for u → v and v → u, u = v for u 6 v and v 6 u.

Theorem

(1) A set of regular languages of A^∗ is a Boolean algebra iff it can be defined by a set of

equations of the form u ↔ v.

(2) A set of regular languages of A^∗ is a Boolean quotienting algebra iff it can be defined by a set of equations of the form u = v.

(23)

Interpreting equations (Notation: P = η (L))

Closed under Interpretation

∪,∩ u →v η(u)ˆ ∈ P ⇒η(v) ∈ P

quotient u 6 v xvy →xuy

complement (L^c) u ↔v u →v and v → u quotient and L^c u = v xuy ↔xvy

ϕ⁻¹ Variables = words

ϕ⁻¹ length pres. Variables = letters

ϕ⁻¹ length Variables =

multiplying words of the same length

(24)

Languages with zero

A language with zero is a language whose syntactic monoid has a zero. The class of regular languages with zero is closed under Boolean operations and quotients, but not under inverse of morphisms.

Therefore, this class is not a variety, but it can be defined by a set of equations of the form u = v.

Proposition

A regular language has a zero iff it satisfies the equation xρA = ρA = ρAx for all x ∈ A^∗.

(25)

The profinite word ρ

_A

Let us fix a total order on the alphabet A. Let u₀, u₁, . . . be the ordered sequence of all words of A^∗ in the induced shortlex order.

1, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, . . . Reilly and Zhang (see also Almeida-Volkov) proved that the sequence (vn)n>0 defined by

v0 = u0, vn+1 = (vnun+1vn)^(n+1)!

converges to a profinite word ρA, which is

idempotent and belongs the minimal ideal of Ac^∗.

(26)

Nondense languages

A language L of A^∗ is dense if, for each word u ∈ A^∗, L∩A^∗uA^∗ 6= ∅.

The languages non-dense or full are closed under finite union, finite intersection and quotients. They are not closed under inverse of morphisms, even for length preserving morphisms.

Theorem

A language of A^∗ is non-dense or full iff it satisfies the equations x 6 0 for all x ∈ A^∗.

(27)

Slender or full languages

A regular language is slender iff it is a finite union of languages of the form xu^∗y, where x, u, y ∈ A^∗. Regular slender or full languages form a quotienting lattice of languages, but are not closed under

inverse of morphisms.

Theorem

Suppose that |A| > 2. A regular language of A^∗ is slender or full iff it satisfies the equations x 60 for all x ∈ A^∗ and the equation x^ωuy^ω = 0 for each x, y ∈ A⁺, u ∈ A^∗ such that i(uy) 6= i(x).

(28)

Automata recognizing slender languages

Theorem

A regular language is slender iff its minimal

deterministic automaton does not contai any pair of connected cycles.

u

x y

Two connected cycles, where x, y ∈ A⁺ and u ∈ A^∗.

(29)

Sparse languages

A regular language is sparse iff it is a finite union of languages of the form u₀v^∗₁u₁· · ·v^∗_nun, where u₀, v₁, . . . , vn, un are words. Regular sparse or full

languages form a quotienting lattice of languages, but are not closed under inverse of morphisms.

Theorem

Suppose that |A| > 2. A regular language of A^∗ is sparse or full iff it satisfies the equations x 6 0 for all x ∈ A^∗ and the equations (x^ωy^ω)^ω = 0 for each x, y ∈ A⁺ such that i(x) 6= i(y).

(30)

Boolean closures

Theorem

Suppose that |A| > 2. A regular language of A^∗ is slender or coslender iff its syntactic monoid has a zero and satisfies the equations x^ωuy^ω = 0 for each x, y ∈ A⁺, u ∈ A^∗ such that i(uy) 6= i(x).

Theorem

Suppose that |A| > 2. A regular language of A^∗ is sparse or cosparse iff its syntactic monoid has a zero and satisfies the equations (x^ωy^ω)^ω = 0 for each x, y ∈ A⁺ such that i(x) 6= i(y).

(31)

Duality in a nutshell

The dual space of a distributive lattice is the set of its prime filters.

Elements ←→ Prime filters

Boolean algebras ←→ Topological spaces

Distributive lattices ←→ Ordered topologial spaces Sublattices ←→ Quotient spaces

n-ary operations ←→ (n+ 1)-ary relations

(32)

The dual space of Reg(A

^∗

)

Almeida (1989, implicitely) and Pippenger (1997, explicitely) proved:

Theorem

The dual space of the lattice of regular languages is the space of profinite words. Furthermore, the canonical embedding is given by the topologial closure: e(L) = L.

(33)

Some formulas

The maps L 7→ L and K 7→ K ∩A^∗ are inverse isomorphisms between the Boolean algebras Reg(A^∗) and Clopen(Ac^∗). In particular, for all L, L1, L2 ∈ Reg(A^∗):

(1) L^c = (L)^c,

(2) L1 ∪L2 = L1 ∪L2, (3) L1 ∩L2 = L1 ∩L2,

(4) for all x, y ∈ A^∗, then x⁻¹Ly⁻¹ = x⁻¹Ly⁻¹. (5) If ϕ : A^∗ →B^∗ is a morphism and

L ∈ Reg(B^∗), then ϕˆ⁻¹(L) = ϕ⁻¹(L).

(34)

Prime filters of Reg(A

^∗

)

• Given a prime filter p of Reg(A^∗), there is a unique profinite word u such that, for every

morphism from A^∗ onto a finite monoid M, ϕ(u)ˆ is the unique element m ∈ M such that ϕ⁻¹(m) ∈ p.

• If u is a profinite word, the set

pu = { L ∈ Reg(A^∗) | ϕ⁻¹( ˆϕ(u)) ⊆L for some morphism ϕ from A^∗ onto a finite monoid } is a prime filter of Reg(A^∗).

(35)

A duality result

The right and left residuals of L by K are:

K\L = {u ∈ A^∗ |Ku ⊆ L}

L/K = {u ∈ A^∗ |uK ⊆ L}

Theorem

The product on profinite words is the dual of the residuation operations on regular languages.

The identity (H\L)/K = H\(L/K) in Reg(A^∗) is equivalent to stating that the product is associative.

(36)

Back to the proof of the main result

Theorem

A set of regular languages of A^∗ is a lattice of languages iff it can be defined by a set of equations of the form u →v, where u, v ∈ Ac^∗.

The proof is an instantiation of the duality between sublattices of Reg(A^∗) and preorders on its dual space Ac^∗.

Let L be a lattice of languages. The preorder

determining the quotient in the dual space is exactly the equational theory of L.

(37)

Summary

• We proved that every lattice of regular

languages is given by an equational theory, a result that subsumes Eilenberg’s variety theorem and its extensions.

• In particular, any class of regular languages defined by a fragment of logic closed under conjunctions and disjunctions (first order, monadic second order, temporal, etc.) admits an equational description.

(38)

The golden circle

Logical fragments

Lattices of languages

Profinite identities

Decidability

Stone-Priestley Duality

Hopefully

(39)

Potential extensions

• One could further extend this result to classes of regular languages only closed under finite intersection by using the syntactic semiring introduced by Pol´ak.

• Our result could also be adapted to languages of infinite words, words over ordinals or linear orders, and even perhaps to tree languages.

Our second main result reveals a strong link between relational semantics for modal logic and the algebraic theory of automata. Further duality results are to be expected. . .