Graph structure and monadic second-order logic. A language-theoretic approach.

(1)

HAL Id: hal-00646514

https://hal.archives-ouvertes.fr/hal-00646514

Submitted on 14 Jun 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

language-theoretic approach.

Bruno Courcelle, Joost Engelfriet

To cite this version:

Bruno Courcelle, Joost Engelfriet. Graph structure and monadic second-order logic. A language-

theoretic approach.. Cambridge University Press, pp.728, 2012, Encyclopedia of Mathematics and its

applications, Vol. 138, 978-0-521-89833-1. �hal-00646514�

(2)

Bruno Courcelle and Joost Engelfriet

Graph Structure and Monadic Second-Order Logic, a Language

Theoretic Approach

April 2011

to be published by

Cambridge University Press

(3)

(4)

3 To Dominique and Louco

(5)

(6)

Introduction

This book contributes to several fields of Fundamental Computer Science. It extends to finite graphs several central concepts and results of Formal Language Theory and it establishes their relationship to results about Fixed-Parameter Tractability. These developments and results have applications in Structural Graph Theory. They make an essential use of logic for expressing graph prob- lems in a formal way and for specifying graph classes and graph transformations.

We need some short historical accounts in order to describe these contributions.

Formal Language Theory

This theory has been developed with different motivations. Linguistics and compilation have been among the first ones, around 1960. In view of the appli- cations to these fields, different types of grammars, automata and transducers have been defined to specify formal languages, i.e., sets of words, and trans- formations of words called transductions, in finitary ways. The formalization of the semantics of sequential and parallel programming languages, that uses respectively program schemes and traces

¹

, the modelling of biological develop- ment and yet other applications have motivated the study of new objects, in particular of sets of terms.

²

These objects and their specifying devices have since been investigated from a mathematical point of view, independently of immediate applications. However, all these investigations have been guided by three main types of questions: comparison of descriptive power, closure prop- erties (with effective constructions in case of positive answers) and decidability problems.

A context-free grammar generates words, hence specifies a formal language.

However, each generated word has a derivation tree that represents its structure relative to the considered grammar. Such a tree, which can also be viewed as a term, is usually the support of further computation, typically a translation

1Traces are equivalence classes of words for congruences generated by commutations of letters; see the book [*DiekRoz]. For program schemes, see [*Cou90a]. The list of references is divided into two parts. The first part lists books, book chapters and survey articles: the * in, e.g., [*DiekRoz] indicates a reference of this kind. The second part lists research articles and dissertations.

2In Semantics, one is also interested ininfinitewords, traces and terms. In this book these will not be considered.

1

(13)

into a word of another language (this is the case in linguistics and in compila- tion). Hence, even for its initial applications, Formal Language Theory has had to deal with trees as well as with words. In Semantics, terms are even more important than words. Thus, sets of terms, usually called tree languages

³

, and transductions of terms, called tree transductions, have become central notions in Formal Language Theory.

Together with context-free grammars, finite (also called finite-state) au- tomata are among the basic notions of Language Theory, in particular for their applications to lexical analysis and pattern matching. They have also been used early (around 1960) for building algorithms to check the validity of certain logical formulas, especially those of Monadic Second-Order Logic, in certain re- lational structures. On the other hand, Monadic Second-Order Logic can be used to specify and to classify sets of words and terms.

⁴

There are deep re- lationships between monadic second-order formulas and finite automata that recognize words and terms (see [*Tho97a]). The fundamental result is that ev- ery language that is specified by a sentence of Monadic Second-Order Logic (expressing a property of words) can be recognized by a finite automaton, and vice-versa. Moreover, the finite automaton can be constructed effectively from the sentence. This means that Monadic Second-Order Logic can be viewed as a high-level specification language that can be compiled into “machine code”: a finite automaton that recognizes the words that satisfy the specification. The same result holds for terms, with respect to finite automata on trees. As a con- sequence of this fundamental relationship, Monadic Second-Order Logic is now one of the basic tools used in Formal Language Theory and its applications, in addition to context-free grammars, finite automata and finite transducers (which are finite automata with output).

The extension of the basic concepts of Formal Language Theory to graphs is a natural step because graphs generalize trees. However, graphs have already been present from the beginnings in several of its fields. In compilation, one uses attribute grammars that are context-free grammars equipped with seman- tic rules ([AhoLSU, Cre]). These rules associate graphs (called dependency graphs ) with derivation trees. An attribute grammar is actually the paradig- matic example of a context-free graph grammar (based on hyperedge replacement rewriting rules, [DreKH]). In the semantics of parallelism, traces are canon- ically represented by graphs, and an important concern is to specify them by finite automata ([DiekRoz]).

One starting point of the research presented in this book has been the devel- opment of a robust theory of context-free graph grammars, of recognizability of graphs (to be short, an algebraic formulation of finite automata) and of graph transductions. In order to use the theory of context-free grammars and rec- ognizability in arbitrary algebras initiated by Mezei and Wright in [MezWri], we choose appropriate (and natural) operations on graphs. Thus, graphs be-

3In addition to being words, terms have canonical representations as labelled, rooted and ordered trees. They are thus called “trees” but this terminology is inadequate.

4This logical language and the related one calledµ-calculus ([*ArnNiw]) are also convenient for expressing properties of programs.

(14)

CONTENTS 3 come the value of terms that are built with these (infinitely many) operations.

Roughly speaking, a context-free graph grammar is a finite set of rules of the form A

₀

→ f (A

₁

, . . . , A

_n

), n ≥ 0, where each A

_i

is a nonterminal of the gram- mar and f is one of the chosen graph operations. The rule means that if the graphs G

1

, . . . , G

n

are generated by respectively A

1

, . . . , A

n

, then A

0

can gener- ate the graph f (G

1

, . . . , G

n

). Such grammars have useful applications to Graph Theory: they can be used to describe many graph classes in uniform ways and to prove by inductive arguments certain properties of their graphs. Still roughly speaking, a set of graphs is recognizable if there is a finite automaton that rec- ognizes all the terms that evaluate to a graph in the set. Thus, the automaton does not work directly on the given graph, but rather on any term that repre- sents that graph. In a similar way one can define graph transductions through the use of tree transducers. Note that, to describe a set of graphs or a graph transduction in a finitary way, one can necessarily use only finitely many graph operations. As we will see, that is a rather severe, but natural restriction.

Our main goal will be to show that the fundamental use of Monadic Second- Order Logic as a high-level specification language carries over to graphs, not only for the specification of recognizable sets of graphs, but also for context- free sets of graphs and for certain types of graph transductions. This gives a new dimension to the above-mentioned fundamental result for words and terms, because the properties of graphs that can be specified in Monadic Second-Order Logic are more varied and useful than those of words and terms.

We will specify a set of graphs by a monadic second-order sentence, and a graph transduction by a tuple of monadic second-order formulas that define an

“interpretation” of the output graph in the input graph. From such a specifica- tion we will show how one can construct a finite automaton on terms, or a tree transducer in the second case, that is related to the specification as explained above. Note that the logic “acts” directly on the graphs, whereas the automata and transducers work on the terms that denote these graphs. Thus, Monadic Second-Order Logic can be viewed as playing the role of “finite automata on graphs” and “finite transducers of graphs” in our Formal Language Theory for Graphs.

Graph algorithms

The above-mentioned developments have important applications for the con-

struction of polynomial-time algorithms on graphs. In his 16th NP-completeness

column, published in 1985 [John], Johnson reviews a number of NP-complete

graph problems that become polynomial-time solvable if their inputs are re-

stricted to particular classes of graphs such as those of trees, of series-parallel

graphs, of planar graphs to name a few. For many of these classes, in particular

for trees, almost trees (with parameter k), partial k-trees, series-parallel graphs,

outerplanar graphs and cographs, the efficient algorithms take advantage of cer-

tain hierarchical structures of the input graphs. Because of these structures,

(15)

these graphs are somehow close to trees

⁵

. The notion of a partial k-tree has emerged as a powerful one subsuming many other types of “tree-like graphs”.

(The cographs have a canonical hierarchical structure but they are not included in the class of partial k-trees for any fixed k.) Many articles have produced polynomial-time algorithms for NP-complete problems restricted to partial k- trees. In 1994, Hedetniemi has compiled a list of 238 references [*Hed] on partial k-trees and algorithms concerning them. The notion of a partial k-tree has also been used with a different terminology (tree-width, tree-decomposition ) by Robertson and Seymour in their study of the structure of graph classes that exclude fixed graphs as minors. They formulate this notion in terms of particu- lar decompositions of graphs, called tree-decompositions, that are at the basis of the construction of polynomial-time algorithms. Each tree-decomposition has a width, and a graph is a partial k-tree if and only if it has tree-width at most k, which means that it has a tree-decomposition of width at most k.

The recent theory of Fixed-Parameter Tractability (the founding book by Downey and Fellows [*DowFel] was published in 1999) now gives a conceptual framework to most of these results. The notion of a fixed-parameter tractable al- gorithm specifies how the multiplicative constant factor of the time-complexity of a polynomial-time algorithm depends on certain parts of the data. It happens that for most of the graph algorithms based on tree-decompositions, the expo- nent of the polynomial is 1: these algorithms are linear-time in the size of the input graphs, with multiplicative “constant” factors that depend exponentially (or more) on the widths of the input tree-decompositions.

The explanation for this fact is one of the main goals of this book. We will show that, for a certain natural choice of graph operations, tree-decompositions correspond to terms, and tree-decompositions of width at most k correspond to terms that are built from a finite subset of those operations. A general algorithmic result that encompasses many of the above-mentioned results, fol- lows from the fundamental relationship between monadic second-order logic and finite automata discussed before: if the considered problem is specified by a monadic second-order sentence (and this is the case for many NP-complete graph problems not using numerical values in their inputs), then a finite au- tomaton on the terms that encode the tree-decompositions of width at most k can be constructed (for each k) to give the answer to the considered question (for example, Is the given graph 3-colorable? ) where the input graph is given by a tree-decomposition (or a term encoding it). The linearity result follows because finite automata can be implemented so as to work in linear time (and because a tree-decomposition of a graph can be found in linear time).

We will extend the case of tree-width bounded graphs (already discussed in [*DowFel]) to another type of graph decompositions, based on another natural choice of graph operations. This leads to the notion of clique-width of a graph.

Clique-width is more powerful than tree-width in the sense that every set of

5These classes can actually be generated by certain context-free graph grammars and the corresponding hierarchical structures of the generated graphs are represented by their derivation trees. There is thus a close relationship between the algorithmic issues and the extensions of language theoretic concepts discussed above.

(16)

CONTENTS 5 graphs of bounded tree-width has bounded clique-width but not vice-versa, an example being the set of cographs. On the other hand, in the above general result, the monadic second-order sentences must be restricted to use quantifi- cations on sets of vertices (instead of both vertices and edges), so less graph problems can be specified. The algorithms are cubic-time instead of linear-time because, for these graph operations, cubic time is needed to find a term for a given graph.

The theory that will be exposed in the nine chapters of this book has arisen from the confluence of the two main research directions presented above. The remainder of this introduction will present in a more detailed way, but still informally, the main concepts and results.

The role of logic

We will study and compare finitary descriptions of sets of finite graphs by using concepts from Logic, Universal Algebra and Formal Language Theory. We first explain the role of Logic. A graph

⁶

can be considered as a logical structure (also called relational structure) whose domain (also called its universe ) consists of the vertices, and that is equipped with a binary relation that represents adja- cency. Graph properties can thus be expressed by logical formulas of different languages and classified accordingly.

First-order formulas are rather weak in this respect because they can only express local properties such as that a graph has maximum degree or diameter bounded by a fixed integer. Most properties of interest in Graph Theory can be expressed by second-order formulas : these formulas can use quantifications on relations of arbitrary arity. Unfortunately, little can be obtained from the expression of a graph property in second-order logic. Our favourite logical lan- guage will be its restriction called Monadic Second-Order Logic. Its formulas are the second-order formulas that only use quantifications on unary relations, i.e., on sets. They can express many useful graph properties like connectivity, p-colorability (for fixed p) and minor inclusion, whence planarity. Such proper- ties are said to be monadic second-order expressible, and the corresponding sets of graphs are monadic second-order definable.

These logical expressions have interesting algorithmic consequences as ex- plained above, but only for graphs that are somehow “tree-like” (because 3- colorability is NP-complete and expressible by a monadic second-order sen- tence). Monadic second-order sentences are also used in Formal Language The- ory to specify languages, i.e., sets of words or terms. The fundamental result establishes that monadic second-order sentences and finite automata have the same descriptive power. But monadic second-order formulas are even more im- portant for specifying sets of graphs than for specifying languages because there is no convenient notion of graph automaton. They replace finite automata, not only for specifying sets of graphs, but also for specifying graph transforma-

6In order to simplify the discussion, we only discuss simple graphs, i.e., graphs without parallel edges.

(17)

tions. Such transformations, called monadic second-order transductions, gen- eralize the transductions of terms and words defined by finite automata with output called finite transducers.

⁷

Independently of these language theoretic applications, monadic second-order transductions are technically useful for con- structing monadic second-order formulas because the inverse image of a monadic second-order definable set of relational structures under a monadic second-order transduction is monadic second-order definable.

However, monadic second-order logic alone yields no interesting results. In order to be useful for the construction of algorithms, the expression of a graph property by a monadic second-order sentence must be coupled with constraints on the graphs of interest such as having bounded tree-width or bounded clique- width. The language theoretical issues to be discussed below will also combine monadic second-order sentences and the very same constraints. Hence, we will study certain hierarchical graph decompositions, like tree-decompositions, that fit with monadic second-order logic.

Graph algebras

Graph decompositions will be formalized algebraically by terms written with ap- propriate graph operations. Hence, we will use concepts from Universal Algebra in addition to ones from Logic.

For treating graphs as algebraic objects, i.e., as elements of appropriate alge- bras (words and traces are elements of monoids), we will define graph operations that generalize the concatenation of words. We will consider two natural ways to

“concatenate” two graphs. One way is to “glue” them together, by identifying some of their vertices. The other way is to “bridge” them (or rather, “bridge the gap between them”), by adding edges between their vertices. Clearly, to obtain single valued operations, we have to specify which vertices must be “glued” or

“bridged”. By means of labels attached to vertices, we will specify that vertices with the same label must be identified, or that edges must be created between all vertices with certain labels. Hence, we will define “concatenation” opera- tions on labelled graphs. To allow the flexible use of vertex labels, we also define (unary) operations that modify these labels. Terms written with these opera- tions evaluate to finite (labelled) graphs. The value G of a term t = f (t

1

, t

2

) is a certain combination, specified by f , of the values of its subterms t

1

and t

2

. These values are, roughly speaking, subgraphs of G (only “roughly” because the labels of the vertices of the graphs defined by t

1

and t

2

may differ from their labels in the resulting graph G). The same holds for all subterms of t, hence, t represents a hierarchical decomposition of G.

Based on the idea of “gluing” graphs (and using the numbers 1, . . . , k + 1 as labels), we will define, for each k, a finite set of graph operations, F

_[k+1]^HR

, that generates exactly the graphs of tree-width at most k. Hence, these oper- ations formalize algebraically an existing combinatorial notion. They yield a graph algebra (that generalizes the monoid of words) having countably many

7In particular, therational transductionsthat are transductions of words defined either by finite(-state) transducers or, algebraically, in terms of homomorphisms and regular languages.

(18)

CONTENTS 7 operations. We will call it the HR algebra for reasons explained below. An- other countable family of graph operations, also indexed by positive integers and based on the idea of “bridging” graphs, will yield a different graph algebra, called the VR algebra, and a graph complexity measure called clique-width. By definition, a graph has clique-width at most k if it belongs to the analogous finite set of graph operations F

_[k]^VR

. As observed before, clique-width is more powerful than tree-width in the sense that every set of graphs of bounded tree-width has bounded clique-width but not vice-versa. Many definitions and results will be similar for these two graph algebras. We will explain below why both algebras are interesting.

The introduction of graph operations is essential for our project of extending to graphs the basic concepts of Formal Language Theory in a clean way. We will use for that the algebraic notions of an equational set and of a recognizable set.

An equational set is a component of the least solution of an equation system written with set union and the operations of the considered algebra. Equation systems formalize context-free grammars that generate elements of the algebra:

if such a context-free grammar has, e.g., three rules A → f (B, C), A → g(A) and A → a for the nonterminal A, where B and C are two other nonterminals, f and g are operations of the algebra and a is a constant of the algebra, then the corresponding equation system has the equation A = f (B, C) ∪ g(A) ∪ {a}

(where A, B and C now stand for sets of elements of the algebra). The context- free languages are actually the equational sets of the monoids of words over their terminal alphabets (due to the least fixed-point characterization of context-free grammars of [GinRic] and [ChoSch]). A recognizable set is a set saturated by a congruence having finitely many classes. The regular languages are thus the recognizable sets of the monoids of words. When all elements of the algebra can be denoted by a term (which is the case for the HR and VR algebra), a set is recognizable if and only if there exists a finite automaton on terms that recognizes all the terms that evaluate to an element of the set.

The chart of Figure 1 shows some relationships between the above defined notions. An arrow means: “used for a definition or a construction”.

Two graph algebras

Since we will define two graph algebras, we will obtain two types of equational sets, called the HR- and the VR-equational sets. For each k, the set of graphs of tree-width at most k is HR-equational (because it is generated by the finite set of operations F

_[k+1]^HR

), and similarly, the set of graphs of clique-width at most k is VR-equational. There are also two types of recognizable sets of graphs, the HR- and the VR-recognizable sets. Every HR-equational set is VR-equational and every VR-recognizable set is HR-recognizable, but not vice-versa. The class of HR-equational sets is incomparable with the class of HR-recognizable sets, and similarly for the VR algebra.

These facts show some important differences with the case of words. For

words, we have a unique algebraic structure based on a single operation, and

the class of recognizable sets (the regular languages) is properly included in that

(19)

Figure 1: The main notions.

of equational sets (the context-free languages). But graphs are intrinsically more complicated than words: this explains why we need countably many operations and not just one. We will explain next why we have two algebras and two (robust) classes of equational sets that both generalize the class of context-free languages.

The two graph algebras have been defined initially in such a way that their equational sets coincide with existing context-free sets of graphs: the HR- equational sets are actually (but not by definition) those generated by certain context-free graph grammars based on a rewriting mechanism called Hyperedge Replacement (that uses “gluing” of graphs) and we call the corresponding alge- bra the HR algebra to refer to this fact; the other algebra, called the VR algebra, has been designed similarly so that its equational sets are those generated by the context-free graph grammars based on Vertex Replacement (that uses “bridg- ing” of graphs); see [DreKH] and [EngRoz] respectively for these two types of graph grammars.

Many properties of the equational and recognizable sets of graphs of both

kinds are just particular instances of those of the equational and recognizable

sets in arbitrary algebras. By using this algebraic approach, we generalize the

context-free languages without having to define a graph rewriting mechanism

and check that such rewriting is actually context-free (the general notion of

context-free rewriting is defined in [Cou87]). Similarly, we generalize the regular

languages without having to define any notion of graph automaton and to look

for closure properties of the class of sets of graphs that are recognized by such

automata.

(20)

CONTENTS 9 Monadic Second-Order Logic and the VR graph algebra

We first discuss the equational sets and the recognizable sets of the VR algebra, and their relationships with monadic second-order logic.

Two main results of this book are the Recognizability Theorem and the Equationality Theorem. They relate two ways of handling graphs: the “logical way” by which graphs are characterized in terms of what they are made of and contain (vertices, edges, paths, minors, subgraphs with particular properties) and the “algebraic way” by which sets of graphs are characterized more globally by means of equation systems and congruences. In the latter approach, graphs are treated as elements of algebras and related with other elements that are not necessarily among their subgraphs.

The Recognizability Theorem says that if a set of graphs is monadic second- order definable, then it is VR-recognizable. The Equationality Theorem says that a set of graphs is VR-equational if and only if it is the image of the set of finite trees under a monadic second-order transduction

⁸

. Here are some conse- quences of these two results.

The Recognizability Theorem entails that if a graph G is defined by a term t written with operations of the VR algebra belonging to any fixed finite set F , then one can check in time O(|t|), whether or not G satisfies a fixed monadic second-order property. This fact, based on a compilation of monadic second- order formulas into finite automata over F , is one of the keys

⁹

to the construction of fixed-parameter tractable algorithms for the verification of monadic second- order properties of graphs of bounded clique-width (whence also of graphs of bounded tree-width since bounded tree-width implies bounded clique-width).

Another consequence is the Filtering Theorem saying that the graphs of a VR- equational set that satisfy a fixed monadic second-order property (for example planarity) form a VR-equational set. This is based on a Filtering Theorem that holds in all algebras and says that the intersection of an equational set and a recognizable one is equational (generalizing the corresponding fact for context-free and regular languages). Since the emptiness of an equational set is decidable, we get as another corollary that the monadic second-order satisfia- bility problem is decidable for each VR-equational set L. This means that one can decide whether or not a given monadic second-order sentence is satisfied by some graph in L.

The Equationality Theorem entails that the class of VR-equational sets of graphs is preserved under monadic second-order transductions, because the class of monadic second-order transductions is closed under composition. This corol- lary strengthens the Filtering Theorem. It is similar to the fact that the image of a context-free language under a rational transduction is context-free.

8This means, informally, that it is the set of graphs “defined inside finite trees” by a fixed finite tuple of monadic second-order formulas. These transductions are based on, and extend, the model-theoretical notion of “interpretation”.

9The other one is a polynomial-time algorithm that finds a term evaluating to a given graphGif there exists one.

(21)

Monadic second-order logic and the HR graph algebra

The Recognizability and the Equationality Theorems have versions relative to the HR algebra. For describing them, we must go back to the initial definition of monadic second-order formulas (MS formulas in the sequel) interpreted in graphs: they only use quantifications on vertices and sets of vertices. This is due to the chosen representation of a graph by a relational structure whose domain is its set of vertices. However, we can also express logically the properties of a graph G via its incidence graph Inc (G). The vertices of this (bipartite) graph are the vertices and the edges of G, and its adjacency relation links a vertex and the edges incident with it. Thus, monadic second-order formulas to be interpreted in Inc(G) (MS

₂

formulas in the sequel) can also use quantifications on edges and sets of edges. A graph property is MS

₂

-expressible if it is expressible by an MS

₂

formula, and the corresponding set of graphs is MS

₂

-definable. The notation MS

₂

refers to this extension of the initially defined language (referred to by MS in the sequel). It is strictly more expressive. For example, the existence of a perfect matching is MS

2

-expressible but not MS-expressible. However, MS

2

formulas are not more expressive than MS formulas for properties of words, of trees and of certain types of graphs such as planar graphs and, for each k, of graphs of degree at most k. These facts show the existence of deep links between structural graph properties (such as planarity) and the expressive power of MS

2

versus MS sentences.

The Recognizability Theorem for the HR algebra says that every MS

2

- definable set of graphs is HR-recognizable, and the Equationality Theorem says that a set of graphs is HR-equational if and only if the set of its incidence graphs is the image of the set of finite trees under a monadic second-order transduction.

We obtain an algorithmic consequence similar to the one we have discussed for MS-expressible problems and the VR algebra: if a graph is defined by a term t over F

_[k]^HR

for some fixed k, then one can check in time O(|t|) whether or not it satisfies a fixed MS

₂

property. Since there exists a polynomial-time algorithm that decomposes appropriately the input graphs, we obtain, for each MS

₂

prop- erty, a fixed-parameter tractable verification algorithm, tree-width being the parameter. The algorithm for MS properties applies to larger classes of graphs, because bounded tree-width implies bounded clique-width, but to less proper- ties than this one, because not every MS

2

-expressible property is MS-expressible.

The notions of tree-width and clique-width are thus both useful, for solving dif- ferent problems. We also have a Filtering Theorem for the HR-equational sets and MS

2

-expressible properties, whence the decidability of the MS

2

-satisfiability problem for each HR-equational set. The Equationality Theorem for the HR algebra entails that the class of HR-equational sets of graphs is preserved under the monadic second-order transductions that transform incidence graphs.

A graph is uniformly k-sparse if its number of edges is at most k times its

number of vertices, and the same holds for all its subgraphs. Another main

result of this book is the Sparseness Theorem: MS

₂

formulas are not more

expressive than MS formulas for properties of uniformly k-sparse graphs, for

each fixed k. The above-mentioned types of graphs are uniformly k-sparse for

(22)

CONTENTS 11 some k.

Logical and Language Theoretical issues

Tree-width and clique-width are closely related with the decidability of monadic second-order satisfiability problems for particular sets of graphs. The satisfia- bility problem of MS

2

sentences for the set of graphs of tree-width at most some fixed k is decidable (because it is decidable for each HR-equational set), and the same holds for MS sentences and the set of graphs of clique-width at most k. Some converse results also hold: bounded tree-width is a necessary (but not sufficient) condition for a set of graphs to have a decidable MS

2

-satisfiability problem, and a similar result holds for clique-width and MS sentences. Their proofs use monadic second-order transductions and deep results of Graph The- ory.

The Recognizability and the Equationality Theorems contribute to estab- lishing the foundations of a sound and robust extension of the theory of formal languages to the description of sets of finite graphs. In this extension, monadic second-order logic plays a major role. From the above informal (and simplified) statements, this extension may seem to be straightforward. However, graphs are intrinsically more complex than words and terms, and some results do not extend as one could expect or hope. We give two examples. First, the set of all graphs is not equational in any of the two graph algebras, whereas the set of all words over a finite alphabet is (almost trivially) context-free. Second, there are uncountably many VR- and HR-recognizable sets of graphs, and this fact prevents any exact characterization of these sets in terms of graph automata or logical formulas. Such a characterization would generalize nicely the classical characterization of the recognizable (i.e., the regular) languages in terms of fi- nite automata and monadic second-order sentences, but it cannot exist. These examples are related to the fact that the sets of operations of the two graph algebras are infinite, and that this infiniteness is somehow unavoidable

¹⁰

. Graph structure

Graph structure is a flexible concept covering many different cases. Graph de- compositions form an important type of structuring. We have already discussed those that yield the notions of tree-width and clique-width in connection with algorithmic applications. There exist other types of graph decomposition that are useful for algorithmic purposes or for proving results. Examples are the mod- ular decomposition defined by Gallai [Gal], the decomposition in 3-connected components defined by Tutte [Tut] and the clique-sum decomposition used by Robertson and Seymour [Gro, RobSey03]. The existence of an embedding in a fixed surface, or of a homomorphism into a fixed graph (a proper vertex coloring with p colors of a loop-free graph can be defined as a homomorphism

10One can generate all finite graphs by a finite number of graph operations, but the Rec- ognizability Theorem fails for the corresponding algebra. So this algebra is useless for our purposes.

(23)

of this graph into the complete graph K

_p

) is a type of structure. Finally, the non-existence in a graph of particular induced subgraphs, minors or vertex- minors is also an important type of structural property. (See [Die] for minors and [Oum05] for vertex-minors). There exist nontrivial relations between these different notions. Here are some examples: the graphs without a fixed pla- nar graph as a minor have tree-width bounded by a value computable from this graph and those embeddable in a fixed surface are characterized by finitely many excluded minors [Die]; forbidding certain induced subgraphs implies bounded clique-width [BraDLM, BraELL].

Monadic second-order sentences can express many such structural proper- ties. The expression of p-colorability (for fixed p) is immediate. It is easy to construct a monadic second-order sentence expressing that a given graph has no minor or no induced subgraph isomorphic to a fixed graph. Hence the sets of planar graphs and of graphs of tree-width at most k (for all k) are MS-definable because each of them is characterized by finitely many excluded minors. A set of graphs defined by finitely many excluded induced subgraphs (this is the case of cographs) or by an infinite but MS-definable set of minimal excluded induced subgraphs is also MS-definable. The latter observation applies to comparabil- ity graphs ([Cou06a, Gal]) and to perfect graphs ([*ChuRST, ChuRST]). Their definitions are not directly expressible by monadic second-order sentences, and finding the minimal excluded induced subgraphs requires difficult proofs.

In many situations concerning graph structure, we need more than a yes or no answer. For example, that a graph does not contain K

5

or K

3,3

as a minor implies that it is planar, but this negative fact, when it is valid, does not help to find a planar embedding. In other words, we are not only interested in checking that a given graph “has some structure”, e.g., has a planar embedding or a tree- decomposition of width bounded by a fixed integer, but we are also interested in having a monadic second-order transduction that constructs from the given graph some planar embedding or some tree-decomposition. Such transductions may be difficult to construct. Some constructions are given in [Cou96a, Cou99, Cou00, Cou06b, Cou08a] and challenging questions remain open in this area.

To conclude with this aspect, we can state that many constructions of monadic second-order formulas and transductions use in an essential way re- sults of Graph Theory, and even very deep ones in some cases. Conversely, the methods developed in this book bring new results in Graph Theory apart from algorithmic applications. For example, the infinite set of minimal excluded induced subgraphs that characterizes the comparability graphs has a certain regularity that we can formalize by observing that this set is VR-equational.

Such applications deserve further study.

The main contributions of this book

Let us summarize the main ideas and results to be developed in this book. We define two graph algebras called the HR algebra and the VR algebra, from which we get two classes of equational and two classes of recognizable sets of graphs.

The terms of these algebras denote graphs and formalize certain hierarchical

(24)

CONTENTS 13 decompositions from which we get the graph complexity measures called tree- width and clique-width.

Monadic second-order logic in its two variants denoted by MS and MS

2

can be used to express formally graph properties and thus to specify sets of graphs.

The Recognizability Theorem says that every MS-definable set of graphs is recognizable in the VR algebra and that every MS

2

-definable set is recognizable in the HR algebra. We obtain from it fixed-parameter tractable algorithms for checking MS and MS

2

properties with, respectively, clique-width and tree- width as parameters. It entails that the corresponding monadic second-order satisfiability problems are decidable for the equational sets of the two algebras.

The Sparseness Theorem says that MS and MS

2

logic have the same power for defining sets of uniformly k-sparse graphs.

Graph transformations called monadic second-order transductions can be specified by MS or by MS

₂

formulas. The Equationality Theorem says that they generate from the set of trees, respectively, the equational sets of the VR and of the HR algebra. This shows the robustness of this theory that combines algebraic and logical notions. Its main definitions and results are actually for- mulated for relational structures which generalize graphs and incidence graphs, but several problems are open regarding this extension.

We will only consider finite graphs and finite relational structures. Another book would be necessary to cover the rich existing theory of countable structures that is important in Program Semantics.

Summary

The letters GT, UA, LT, L and A indicate that a chapter deals mainly with Graph Theory, Universal Algebra, (Formal) Language Theory, Logic and Algo- rithmic applications respectively.

Chapter 1 is an overview which presents the main definitions and results in an informal way, with the help of examples.

Chapter 2 (GT, UA) defines two families of graph operations and the as- sociated graph complexity measures of tree-width and clique-width. The two corresponding graph algebras, called the HR algebra and the VR algebra, are first defined as single sorted algebras and, later on, they are “refined” into many-sorted algebras.

Chapter 3 (UA, LT) defines and studies the equational and recognizable sets of many-sorted algebras in general. Its main result is the (algebraic version of the) Filtering Theorem.

Chapter 4 (GT, LT) applies the definitions and results of Chapter 3 to the graph algebras defined in Chapter 2 and establishes results which do not follow only from the general algebraic definitions.

Chapter 5 (L, UA, GT) introduces monadic second-order logic and devel- ops tools for expressing graph properties by monadic second-order formulas.

Definitions and proofs are given for relational structures. In particular, the

Recognizability Theorem is proved for a many-sorted algebra of finite relational

(25)

structures. The particular cases of this theorem for the HR and the VR graph algebras follow as immediate corollaries.

Chapter 6 (L, LT, A) is devoted to algorithmic applications. It reviews the parsing algorithms that construct the necessary expressions of the input graphs by terms over the operations of the HR and of the VR algebra. It develops in detail the “compilation” of monadic second-order formulas into finite automata intended to run on the terms resulting from the parsing step. This construction is hopefully more usable than the one of Chapter 5. It yields alternative proofs of weaker versions of the Recognizability Theorem.

Chapter 7 (L, LT, GT) defines monadic second-order transductions and es- tablishes their main properties: closure under composition and preservation of monadic second-order definability under inverse monadic second-order trans- ductions: we call this latter result the Backwards Translation Theorem. The Equationality Theorem characterizes the VR- and the HR-equational sets as the images of the set of trees (equivalently of terms over any rich enough functional signature) under monadic second-order transductions of appropriate types. Four types of transductions come from the two possible representations of a graph by a relational structure for the input and the output. The Equationality The- orems characterize bounded clique-width and bounded tree-width in a way that does not depend on the graph operations chosen in Chapter 2. Hence, VR- and HR-equationality as well as the properties of having bounded clique-width and tree-width are robust in the sense that they are stable under the monadic second-order transductions that are respectively specified by MS and by MS

2

formulas.

Chapter 8 (L, LT) shows that the classical automata-theoretic characteriza- tion (recalled in Chapter 5) of the monadic second-order definable sets of terms (hence, also of words) extends to monadic second-order transductions. More precisely, these transductions are characterized in terms of two-way finite-state transducers on words, and of tree-walking transducers on terms, where “tree”

refers to the representation of terms by labelled ordered trees. Characterizations of the VR-equational (equivalently of the HR-equational) languages of words and terms are obtained. Every (functional) monadic second-order transduction of graphs of bounded tree-width can be realized by a tree-walking transducer on the level of terms (of the HR algebra), i.e., it can be realized by parsing the input graph, applying the tree transducer to the resulting input term, and evaluating the output term of the transducer to produce the output graph. The same result holds for clique-width and the VR algebra. This can be viewed as a generalization of the Recognizability Theorem to graph transductions.

Chapter 9 (L, GT, UA, A) extends to finite relational structures the defi-

nitions and results of Chapters 2 to 7. It contains in particular an extension

to relational structures of the Equationality Theorem. Although many results

extend easily from graphs to relational structures, some seemingly difficult ques-

tions remain open. Additionally, this chapter proves the Sparseness Theorem

which establishes that, for expressing properties of uniformly k-sparse graphs

(26)

CONTENTS 15 (or relational structures

¹¹

) by monadic second-order formulas, quantifications over sets of edges (or sets of tuples, respectively) bring no additional power:

every MS

₂

formula can be translated into an equivalent MS formula.

Chapter 10 reviews some open problems and some results not presented in the previous chapters.

The bibliography is organized in two parts: the first part (with reference labels starting with *) lists books, book chapters and survey articles. The second one lists research articles and dissertations.

All necessary definitions will be given, but the reader is expected to be famil- iar with the basic notions of Logic (mainly first-order logic), of Universal Algebra (algebras, congruences), of Formal Language Theory (context-free grammars, fi- nite automata), and of Graph Theory (basic notions). Chapters 2 to 9 present detailed proofs of results that have been published in articles. It was not an easy task to elaborate consistent definitions and notations for many different notions from various fields, namely Language Theory, Universal Algebra, Logic and Graph Theory. By giving precise definitions and carefully written proofs, our first aim is to give a robust foundation to the field described in this intro- duction. Our second aim is that these definitions and proofs can be adapted to related notions, implemented and improved by researchers without too much effort.

Acknowledgements

We thank M. Nivat for writing a preface. The L

^A

TEX typing of the first drafts of most chapters has been done by graduate and doctoral students of Bordeaux 1 University: M. Kant´ e, R. Chen, R. Synave, R. Li and S. Abbas. We thank them warmly for their care and patience. We also thank A. Arnold, A. Blumensath, S. Djelloul, I. Durand and S. Oum who have read and commented some chapters.

Without being a member of the “Institut Universitaire de France” (IUF), B. Courcelle could not have started the project of writing this book. He thanks M. Nivat and W. Thomas who presented his application to IUF, and all those who supported it by writing recommendation letters. He dedicates his work to the memory of Philippe Flajolet (1948-2011), a forty year friend with whom he began his career of researcher at INRIA in 1973.

11Relational structures are used to prove the case of the theorem that concerns graphs.

A relational structure is uniformly k-sparse if its number of tuples is at mostk times the cardinality of its domain, and the same holds for all its substructures.

(27)

(28)

Chapter 1

Overview

This chapter presents the main definitions and results of this book and their significance, with the help of a few basic examples. It is written so as to be readable independently of the others. Definitions are sometimes given infor- mally, with simplified notation, and most proofs are omitted. All definitions will be repeated with the necessary technical details in the subsequent chapters.

In Section 1.1, we present the notion of equational set of an algebra by using as examples a context-free language, the set of cographs and the set of series-parallel graphs. We also introduce our algebraic definition of derivation trees.

In Section 1.2, we introduce the notion of recognizability in a concrete way, in terms of properties that can be proved or refuted, for every element of the considered algebra, by an induction on any term that defines this element. We formulate a concrete version of the Filtering Theorem saying that the intersec- tion of an equational set and a recognizable one is equational. It follows that one can decide if a property belonging to a finite inductive set of properties is valid for every element of a given equational set. We explain the relationship between recognizability and finite automata on terms.

In Section 1.3, we show with several key examples how monadic second-order sentences can express graph properties. We recall the fundamental equivalence of monadic second-order sentences and finite automata for words and terms.

In Section 1.4, we introduce two graph algebras. They are called the VR and the HR algebra because their equational sets are those that are generated by the context-free Vertex Replacement and Hyperedge Replacement graph grammars respectively. The cographs and the series-parallel graphs are respectively our current examples of a VR- and an HR-equational set. We state (a weak version of) the Recognizability Theorem which says, in short, that monadic second- order definability implies recognizability. From it we obtain a logical version of the Filtering Theorem where the recognizable sets are defined by monadic second-order sentences.

In Section 1.5, we review the basic definitions of Fixed-Parameter Tractabil- ity and we state the algorithmic consequences of the (weak) Recognizability

17

(29)

Theorem. This theorem has actually two versions, relative to the two graph algebras defined in Section 1.4, and yields two Fixed-Parameter Tractability Theorems.

In Section 1.6, we describe the consequences of the Recognizability and Fil- tering Theorems for the problem of deciding whether a given monadic second- order sentence is satisfied by some graph of tree-width at most a given k or, more generally, by some graph of an equational set.

In Section 1.7, we introduce the notion of a monadic second-order trans- duction by means of examples that have some graph theoretic content, and we state the Equationality Theorem for the VR algebra. It gives a characterization of the VR-equational sets, and in particular of the sets of graphs of bounded clique-width, that is formulated in purely logical terms.

In Section 1.8, we consider monadic second-order formulas interpreted in incidence graphs (as opposed to in graphs “directly”). These formulas can use edge set quantifications. We compare the corresponding four types of monadic second-order transduction and we state the Equationality Theorem for the HR algebra: it is based on monadic second-order transductions that transform in- cidence graphs.

In Section 1.9, we define relational structures and we extend to them (eas- ily) some results relative to graphs represented by their incidence graphs. We introduce betweenness and cyclic ordering as examples of combinatorial notions that are based on linear orderings but are defined in a natural way as ternary relations.

1.1 Context-free grammars

By starting from the standard notion of a context-free grammar, we introduce the notion of an equational set and we define two equational sets of graphs. We define the equational sets of a (one-sorted) algebra and the corresponding sets of derivation trees.

1.1.1 Context-free word grammars

By using context-free grammars, one can specify certain formal languages, namely the context-free languages, in a finitary way. Context-free grammars are usually defined as rewriting systems satisfying particular properties, conveyed by the term “context-free” and axiomatized in [Cou87]. However, the Least Fixed-Point Characterization of context-free languages due to Ginsburg and Rice [GinRic]

and to Chomsky and Sch¨ utzenberger [ChoSch] is formulated in terms of systems

of recursive equations written with the operations of union and concatenation

over languages. This algebraic view has been developed by Mezei and Wright

[MezWri] and has many advantages. First, it is more synthetic in that it deals

with languages rather than with words produced individually by derivation se-

quences. Second, it puts the study of context-free languages in the more general

(30)

1.1. CONTEXT-FREE GRAMMARS 19 framework of recursive definitions handled as least solutions of systems of equa- tions, and last but not least, it is applicable to any algebra. This latter aspect is especially important for the extension to graphs.

We recall how context-free languages can be characterized as the components of the least solutions of certain systems of equations in languages. A context- free grammar G is a finite set of rewriting rules defined with two alphabets, a terminal alphabet A and a nonterminal alphabet N. For every S in N , the context-free language over A generated by G from S is denoted by L(G, S).

Example 1.1 We consider for example the context-free grammar G with ter- minal alphabet A = {a, b, c}, nonterminal alphabet N = {S, T} and rules named respectively p, q, . . . , w (where ε denotes the empty word):

p : S → aST q : S → SS r : S → a s : T → bT ST u : T → a v : T → c w : T → ε

It defines two languages L(G, S) and L(G, T ) over A, i.e., sets of words in A

^∗

. These languages satisfy the equations of the following system Σ

G

:

Σ

_G

( K = aKL ∪ KK ∪ {a}

L = bLKL ∪ {a, c, ε}

with K = L(G, S) and L = L(G, T ). The pair (L(G, S), L(G, T )) is thus a so- lution of Σ

_G

. However, it is not the only one. The pair of languages (A

^∗

, bA

^∗

∪ {a, c, ε}) is another solution as one checks easily

¹

. The Least Fixed-Point Char- acterization of context-free languages establishes that the pair (L(G, S), L(G, T )) is the least solution of Σ

G

for component-wise inclusion.

1.1.2 Cographs

We give two examples of similar definitions of sets of graphs. We first consider as ground set the set G

^u

of undirected simple graphs

²

. Two isomorphic graphs are considered as the same object. We will use ⊕ to denote the disjoint union of two graphs G and H . This means that G ⊕ H is the union of G and of a copy of H disjoint with G (hence G ⊕ G 6= G). We will also use the complete join, G ⊗ H , defined as G ⊕ H augmented with undirected edges linking every vertex

1SinceL(G, S)⊆A⁺=AA^∗, the pair (A^∗, bA^∗∪ {a, c, ε}) is a solution of ΣGthat differs from (L(G, S), L(G, T)).

2In this book, all graphs are finite. A graph issimpleif it has no twoparallel edges, i.e., no two edges with the same ends, and the same directions in the case of directed graphs. Parallel edges are also calledmultipleedges. An edge with equal ends is aloop. The superscript “u”

inG^urefers to undirected graphs.

(31)

Figure 1.1: A cograph.

of G and every vertex of H . We let 1 denote any graph with one vertex and no edges. Note that both ⊕ and ⊗ are commutative and associative operations.

The set of cographs C can be defined as the least set of graphs satisfying the equation

C = (C ⊕ C) ∪ (C ⊗ C) ∪ {1}. (1.1) This set (it is a proper subset of G

^u

) has actually alternative characterizations (see Section 1.3.1 below). From this equation, one can derive definitions of cer- tain subsets of C. Consider for example the following system of two equations:

( C

0

= (C

0

⊕ C

0

) ∪ (C

1

⊕ C

1

) ∪ (C

0

⊗ C

0

) ∪ (C

1

⊗ C

1

)

C

1

= (C

0

⊕ C

1

) ∪ (C

0

⊗ C

1

) ∪ {1}. (1.2) Its least solution in P(G

^u

) × P(G

^u

) is the pair of sets (C

₀

, C

₁

) where C

₀

(resp.

C

₁

) is the set of cographs having an even (resp. an odd) number of vertices.

³

We will give general and effective methods for deriving from an equation or a system of equations that defines a set L, an equation or a system of equations defining {x ∈ L | P (x)} where P is a property of the objects under consideration. This is possible if P has an appropriate “inductive behaviour” relative to the operations with which the given equation or system of equations is written.

From the definition of cographs as elements of the least subset C of G

^u

satisfying (1.1), it follows that each of them is denoted by a term, more formally, is the value of a term in an algebra of graphs. Examples of terms denoting cographs are

1, 1 ⊕ 1, (1 ⊕ 1) ⊗ 1, (1 ⊕ 1) ⊗ (1 ⊕ 1).

The cograph of Figure 1.1 is the value of the term t = (1⊗1 ⊗1) ⊗(1⊕ (1⊗1)).

Since ⊗ is associative, we have written t by omitting some parentheses as usual, for readability. These terms belong to the set T ({⊕, ⊗, 1}) of all terms written with the constant 1 and the two binary operations ⊕ and ⊗. Equa- tion (1.1) can also be solved with ground set T ({⊕, ⊗, 1}). For this interpreta- tion of (1.1) the unknown C denotes subsets of T ({⊕, ⊗, 1}). Clearly, the set

3We denote byP(X) the powerset of a setX, i.e., its set of subsets.

Graph structure and monadic second-order logic. A language-theoretic approach.

HAL Id: hal-00646514

https://hal.archives-ouvertes.fr/hal-00646514

Submitted on 14 Jun 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

language-theoretic approach.

Bruno Courcelle, Joost Engelfriet

To cite this version:

Bruno Courcelle, Joost Engelfriet. Graph structure and monadic second-order logic. A language-

theoretic approach.. Cambridge University Press, pp.728, 2012, Encyclopedia of Mathematics and its

applications, Vol. 138, 978-0-521-89833-1. �hal-00646514�

Bruno Courcelle and Joost Engelfriet

Graph Structure and Monadic Second-Order Logic, a Language

Theoretic Approach

April 2011

to be published by

Cambridge University Press

3

To Dominique and Louco

Contents

Introduction 1

1 Overview 17

1.1 Context-free grammars . . . . 18

1.1.1 Context-free word grammars . . . . 18

1.1.2 Cographs . . . . 19

1.1.3 Series-parallel graphs . . . . 21

1.1.4 The general setting . . . . 22

1.1.5 Derivation trees . . . . 26

1.2 Inductive sets of properties and recognizability . . . . 30

1.2.1 Properties of the words of a context-free language . . . . 30

1.2.2 Some properties of series-parallel graphs . . . . 32

1.2.3 Inductive sets of properties . . . . 34

1.2.4 Recognizability . . . . 37

1.2.5 From inductive sets to automata . . . . 38

1.3 Monadic second-order logic . . . . 41

1.3.1 Monadic second-order graph properties . . . . 41

1.3.2 Monadic second-order logic and recognizability . . . . 45

1.4 Two graph algebras . . . . 46

1.4.1 The algebra of simple graphs with ports . . . . 47

1.4.2 The algebra of graphs with sources . . . . 49

1.4.3 A weak Recognizability Theorem . . . . 51

1.5 Fixed-parameter tractability . . . . 53

1.6 Decidability of monadic second-order logic . . . . 56

1.7 Graph transductions . . . . 58

1.7.1 Examples of monadic second-order transductions . . . . . 59

1.7.2 The main properties of monadic second-order transductions 65 1.7.3 The Equationality Theorem . . . . 67

1.8 Monadic second-order logic with edge set quantifications . . . . . 68

1.8.1 Expressing graph properties with edge set quantifications 68 1.8.2 Monadic second-order transductions over incidence graphs 70 1.9 Relational structures . . . . 74

1.9.1 Relational signatures and structures . . . . 74

1.9.2 Betweenness and cyclic ordering . . . . 76

5

1.9.3 Relational databases . . . . 78

1.10 References . . . . 78

2 Graph algebras and widths of graphs 81 2.1 Algebras and terms . . . . 82

2.2 Graphs . . . . 88

2.3 The HR algebra of graphs with sources . . . 100

2.3.1 The HR graph operations . . . 100

2.3.2 Construction of the s-graph defined by a term . . . 111

2.3.3 Algebraic properties and derived operations defined by contexts . . . 117

2.4 Tree-decompositions . . . 122

2.4.1 Tree- and path-decompositions . . . 123

2.4.2 Some properties of tree-decompositions . . . 128

2.4.3 Transformations of tree- and path-decompositions . . . . 130

2.4.4 Tree-decompositions and chordal graphs . . . 134

2.4.5 A syntax for tree-decompositions . . . 137

2.5 The VR algebra of simple graphs with ports . . . 146

2.5.1 The VR graph operations . . . 146

2.5.2 Construction of the p-graph defined by a term . . . 151

2.5.3 Algebraic properties and derived operations defined by contexts . . . 155

2.5.4 Properties of clique-width . . . 159

2.5.5 Comparisons between tree-width and clique-width . . . . 165

2.5.6 Variations on F

. . . 171

2.6 Many-sorted graph algebras . . . 178

2.6.1 Many-sorted algebras . . . 178

2.6.2 The many-sorted HR algebra . . . 182

2.6.3 The many-sorted VR algebra . . . 186

2.7 References . . . 187

3 Equational and recognizable sets in many-sorted algebras 189 3.1 The equational sets of an algebra . . . 190