Saturation, Definability, and Separation for XPath on Data Trees
Sergio Abriola1, María Emilia Descotte1, and Santiago Figueira1,2
1 University of Buenos Aires, Argentina
2 CONICET, Argentina
Abstract. We study the expressive power of some fragments of XPath equipped with (in)equality tests over data trees.
Our main results are the definability theorems, which give necessary and sufficient conditions under which a class of data trees can be defined by a node expression or set of node expressions, and our separation theorems, which give sufficient conditions under which two disjoint classes of data trees can be separated by a class of data trees definable in XPath.
Keywords: XPath · data tree · bisimulation · definability · first-order logic · ultraproduct · saturation · separation.
1 Introduction
The abstraction of an XML document is a data tree, i.e. a tree in which every node carries a tag or label (such as LastName) from a finite domain and a data value (such as Smith) from an infinite domain. XPath is the most widely used query language for XML documents; it is an open standard and constitutes a World Wide Web Consortium (W3C) Recommendation [3]. XPath= has syntactic operators to navigate the tree using the ‘child’, ‘parent’, ‘sibling’, etc. accessibility relations, and can make tests on intermediate nodes. It can express properties of the underlying tree structure of the XML document, such as “the root of the tree has a child labeled a and a child labeled b”, and it can express conditions on the actual data contained in the attributes, such as “the root of the tree has two children with the same tag a but different data values”.
First, we provide notions of saturation and ultraproducts that are adequate for XPath=, and show that bisimulation coincides with logical equivalence over saturated data trees. Using these tools, we show definability theorems, giving necessary and sufficient conditions under which a class of data trees can be defined by a node expression or set of node expressions of XPath=. Finally, we give separation results, providing sufficient conditions under which two disjoint classes of data trees can be separated by a class of data trees definable in XPath=. While in this work we only show results for the fragment of XPath that navigates solely via the ‘child’ accessibility relation, similar results hold for the vertical fragment, which has both the ‘child’ and ‘parent’ navigational operators.
The results on definability of this paper appeared originally in [1].
2 Preliminaries
Data trees. We say that T is a data tree if it is a tree from Trees(A×D), where A is a finite set of labels and D is an infinite set of data values. The data value of a node x is denoted data(x), and its label is denoted label(x). The set of nodes of a data tree T is also denoted T.
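As an illustration only, a finite data tree can be sketched as a small recursive structure. The class name `Node`, the helper `nodes`, and the sample tree are hypothetical and not part of the formal development; labels are modelled as strings and data values as arbitrary Python objects.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """A node of a data tree: a label from A, a data value from D, and children."""
    label: str
    data: object
    children: List["Node"] = field(default_factory=list)


def nodes(t: Node) -> Iterator[Node]:
    """Enumerate the nodes of the tree rooted at t in preorder."""
    yield t
    for child in t.children:
        yield from nodes(child)


# Hypothetical example: a root with two children tagged LastName.
tree = Node("Person", 0, [Node("LastName", "Smith"), Node("LastName", "Jones")])
labels = [n.label for n in nodes(tree)]
values = [n.data for n in nodes(tree)]
```

Here `labels` and `values` play the role of label(x) and data(x) read off along a preorder traversal of the node set.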
Downward XPath with data tests. We consider a fragment of XPath that corresponds to the navigational part of XPath 1.0 with data equality and inequality.
XPath= is a two-sorted language, with path expressions (that we write α, β, γ) and node expressions (that we write ϕ, ψ, η). Downward XPath, denoted XPath↓=, is defined by mutual recursion as follows:
α, β ::= o | [ϕ] | αβ | α ∪ β        o ∈ {ε, ↓}
ϕ, ψ ::= a | ¬ϕ | ϕ ∧ ψ | ϕ ∨ ψ | ⟨α⟩ | ⟨α = β⟩ | ⟨α ≠ β⟩        a ∈ A
Node expressions represent properties of nodes. They are evaluated at nodes and, intuitively, ⟨α = β⟩ is true at x if there are two paths starting at x, one satisfying α and the other satisfying β, which end in nodes with equal data values. Path expressions, on the other hand, represent properties of paths. They are evaluated at pairs of nodes. For instance, ↓ is true at (x, y) if y is a child of x, and [ϕ] is true at (x, y) if x = y and x satisfies ϕ.
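The informal semantics above can be sketched as a small evaluator over finite data trees. This is a minimal sketch under assumptions: expressions are encoded as tagged tuples (a hypothetical encoding), and the clauses for ε, concatenation, union and ⟨α⟩ follow the usual reading (identity, relational composition, union, and existence of a path), which the text does not spell out. The example encodes the two properties from the introduction: “the root has a child labeled a and a child labeled b” and “the root has two children with the same tag a but different data values”.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    data: object
    children: List["Node"] = field(default_factory=list)


def reach(alpha, x):
    """Return the nodes y such that the path expression alpha is true at (x, y)."""
    op = alpha[0]
    if op == "eps":      # epsilon: stay at x (assumed reading)
        return [x]
    if op == "down":     # child step
        return list(x.children)
    if op == "test":     # [phi]: stay at x, provided x satisfies phi
        return [x] if holds(alpha[1], x) else []
    if op == "cat":      # alpha beta: follow alpha, then beta (assumed reading)
        return [z for y in reach(alpha[1], x) for z in reach(alpha[2], y)]
    if op == "union":    # alpha ∪ beta
        return reach(alpha[1], x) + reach(alpha[2], x)
    raise ValueError(f"unknown path expression: {op}")


def holds(phi, x):
    """Return True if the node expression phi is true at node x."""
    op = phi[0]
    if op == "label":
        return x.label == phi[1]
    if op == "not":
        return not holds(phi[1], x)
    if op == "and":
        return holds(phi[1], x) and holds(phi[2], x)
    if op == "or":
        return holds(phi[1], x) or holds(phi[2], x)
    if op == "path":     # <alpha>: some alpha-path starts at x (assumed reading)
        return bool(reach(phi[1], x))
    if op in ("eq", "neq"):  # <alpha = beta> and <alpha ≠ beta>
        want_equal = op == "eq"
        return any((y.data == z.data) == want_equal
                   for y in reach(phi[1], x) for z in reach(phi[2], x))
    raise ValueError(f"unknown node expression: {op}")


# Hypothetical tree: a root with children a(1), a(2), b(3).
root = Node("r", 0, [Node("a", 1), Node("a", 2), Node("b", 3)])
child_a = ("cat", ("down",), ("test", ("label", "a")))   # ↓[a]
child_b = ("cat", ("down",), ("test", ("label", "b")))   # ↓[b]
has_a_and_b = ("and", ("path", child_a), ("path", child_b))
two_a_diff = ("neq", child_a, child_a)                   # <↓[a] ≠ ↓[a]>
```

Note that ⟨↓[a] = ↓[a]⟩ holds as soon as there is one child labeled a, since both paths may end in the same node, whereas ⟨↓[a] ≠ ↓[a]⟩ needs two a-children with different data values.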
Let T and T′ be data trees, and let u ∈ T, u′ ∈ T′. We say that T, u and T′, u′ are logically equivalent for XPath↓= if no node expression of XPath↓= can distinguish node u from u′, that is, if u and u′ satisfy exactly the same node expressions.
Bisimulations. Notions of bisimulation provide a way to determine whether two pointed data trees can be distinguished by a series of moves in XPath. We do not reproduce them here, but it is worth mentioning that they are forms of back-and-forth conditions over two data trees.
The main previous result in the literature establishing the connection between bisimulation and equivalence is the following:
Theorem 1. [4] If T, u is bisimilar to T′, u′, then they are logically equivalent. If T and T′ are finitely branching, the converse implication also holds.
3 Saturation and quasi-ultraproducts
We introduce a notion of saturation for the downward fragment of XPath, and show that the reverse implication of Theorem 1 is true over saturated data trees.
Saturation is the key ingredient in proving the definability theorems, although its use lies hidden in the proofs.
Saturation. Let ⟨Σ₁, …, Σₙ⟩ and ⟨Γ₁, …, Γₘ⟩ be tuples of sets of XPath↓=-formulas. Given a data tree T and u ∈ T, we say that ⟨Σ₁, …, Σₙ⟩ and ⟨Γ₁, …, Γₘ⟩ are =↓ₙ,ₘ-satisfiable [resp. ≠↓ₙ,ₘ-satisfiable] at T, u if there exist v₀ → v₁ → ⋯ → vₙ ∈ T and w₀ → w₁ → ⋯ → wₘ ∈ T such that u = v₀ = w₀ and
1. for all i ∈ {1, …, n}, T, vᵢ ⊨ Σᵢ;
2. for all j ∈ {1, …, m}, T, wⱼ ⊨ Γⱼ; and
3. data(vₙ) = data(wₘ) [resp. data(vₙ) ≠ data(wₘ)].
We say that ⟨Σ₁, …, Σₙ⟩ and ⟨Γ₁, …, Γₘ⟩ are =↓ₙ,ₘ-finitely satisfiable [resp. ≠↓ₙ,ₘ-finitely satisfiable] at T, u if for every finite Σᵢ′ ⊆ Σᵢ and finite Γⱼ′ ⊆ Γⱼ, we have that ⟨Σ₁′, …, Σₙ′⟩ and ⟨Γ₁′, …, Γₘ′⟩ are =↓ₙ,ₘ-satisfiable [resp. ≠↓ₙ,ₘ-satisfiable] at T, u.
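For finite trees and finite sets of formulas, the satisfiability condition above can be checked by brute force. The sketch below is illustrative only: formulas are stood in for by Python predicates on nodes (an assumption, in place of real XPath↓= formulas), and the names `Node`, `chain_ends`, `satisfiable` and `has_label` are hypothetical. It covers only satisfiability at a node, not finite satisfiability over infinite sets of formulas.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Sequence


@dataclass
class Node:
    label: str
    data: object
    children: List["Node"] = field(default_factory=list)


Formula = Callable[[Node], bool]  # stand-in for an XPath↓= formula


def chain_ends(u: Node, sets: Sequence[Sequence[Formula]]) -> Iterator[Node]:
    """Yield the last node v_n of every chain u = v_0 -> v_1 -> ... -> v_n
    in which each v_i (i >= 1) satisfies all formulas of sets[i - 1]."""
    if not sets:
        yield u
        return
    for child in u.children:
        if all(f(child) for f in sets[0]):
            yield from chain_ends(child, sets[1:])


def satisfiable(u: Node, sigmas, gammas, equal: bool = True) -> bool:
    """Check conditions 1-3: two chains from u witnessing the tuples sigmas and
    gammas whose last nodes have equal (equal=True) or different data values."""
    ends_v = list(chain_ends(u, sigmas))
    ends_w = list(chain_ends(u, gammas))
    return any((v.data == w.data) == equal for v in ends_v for w in ends_w)


def has_label(a: str) -> Formula:
    return lambda x: x.label == a


# Hypothetical tree: r -> a(1) -> c(5) and r -> b(2) -> c(5).
u = Node("r", 0, [Node("a", 1, [Node("c", 5)]), Node("b", 2, [Node("c", 5)])])
sigmas = [[has_label("a")], [has_label("c")]]
gammas = [[has_label("b")], [has_label("c")]]
eq_sat = satisfiable(u, sigmas, gammas, equal=True)
neq_sat = satisfiable(u, sigmas, gammas, equal=False)
```

In this example the only witnesses end in the two c-nodes, which share the data value 5, so the tuples are satisfiable for = but not for ≠ at u.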
Definition 2. We say that a data tree T is ↓-saturated if for every n, m ∈ ℕ, every pair of tuples ⟨Σ₁, …, Σₙ⟩ and ⟨Γ₁, …, Γₘ⟩ of sets of XPath↓=-formulas, every u ∈ T, and ⋆ ∈ {=, ≠}, the following is true:
if ⟨Σ₁, …, Σₙ⟩ and ⟨Γ₁, …, Γₘ⟩ are ⋆↓ₙ,ₘ-finitely satisfiable at T, u, then ⟨Σ₁, …, Σₙ⟩ and ⟨Γ₁, …, Γₘ⟩ are ⋆↓ₙ,ₘ-satisfiable at T, u.
Proposition 3. For ↓-saturated data trees, bisimulation coincides with logical equivalence.
Quasi-ultraproducts. We introduce the notion of quasi-ultraproduct, a variant of the usual notion from first-order model theory, which is needed for the definability theorems. Some of our results for quasi-ultraproducts make use of the fundamental theorem of ultraproducts (see e.g. [2, Thm. 4.1.9]).
Definition 4. Suppose (Tᵢ, uᵢ)_{i∈I} is a family of pointed data trees, U is an ultrafilter over I, T* is the ultraproduct of (Tᵢ, uᵢ)_{i∈I}, and u* is the ultralimit of (uᵢ)_{i∈I}. The ↓-quasi ultraproduct of (Tᵢ, uᵢ)_{i∈I} modulo U is the pointed data tree (T*|u*, u*), where T*|u* denotes the subtree of T* induced by all the descendants of u*. As a particular case one has the notion of ↓-quasi ultrapower.
4 Definability
Definability theorems address the question of which properties of models can be defined via formulas of the logic. If K is a class of pointed data trees, we denote its complement by K̄.
Theorem 5. Let K be a class of pointed data trees. Then K is definable by a set of XPath↓=-formulas iff K is closed under ↓-bisimulations and ↓-quasi ultraproducts, and K̄ is closed under ↓-quasi ultrapowers.
Theorem 6. Let K be a class of pointed data trees. Then K is definable by an XPath↓=-formula iff both K and K̄ are closed under ↓-bisimulations and ↓-quasi ultraproducts.
The notion of ℓ-bisimulation is a restricted version of ↓-bisimulation. It has been shown to coincide with the notion of ℓ-equivalence, which informally means indistinguishability by XPath↓= formulas that cannot “see” beyond ℓ ‘child’-steps from the current point of evaluation.
Theorem 7. Let K be a class of pointed data trees. Then K is definable by a formula of XPath↓= iff K is closed under ℓ-bisimulations for XPath↓= for some ℓ.
5 Separation
Separation theorems provide conditions under which two disjoint classes of pointed models can be separated by a class definable in the logic.
Theorem 8. Let K₁ and K₂ be two disjoint classes of pointed data trees such that K₁ is closed under ↓-bisimulations and ↓-quasi ultraproducts and K₂ is closed under ↓-bisimulations and ↓-quasi ultrapowers. Then there exists a third class K which is definable by a set of XPath↓=-formulas, contains K₁ and is disjoint from K₂.
Theorem 9. Let K₁ and K₂ be two disjoint classes of pointed data trees closed under ↓-bisimulations and ↓-quasi ultraproducts. Then there exists a third class K which is definable by an XPath↓=-formula, contains K₁ and is disjoint from K₂.
References
1. Sergio Abriola, María Emilia Descotte, and Santiago Figueira. Definability for downward and vertical XPath on data trees. In Logic, Language, Information, and Computation - 21st International Workshop, WoLLIC 2014, Valparaíso, Chile, September 1-4, 2014. Proceedings, pages 20–35, 2014.
2. C.C. Chang and H.J. Keisler. Model theory. Studies in logic and the foundations of mathematics. North-Holland, 1990.
3. J. Clark and S. DeRose. XML path language (XPath). Website, 1999. W3C Recommendation. http://www.w3.org/TR/xpath.
4. D. Figueira, S. Figueira, and C. Areces. Basic model theory of XPath on data trees. In ICDT, pages 50–60, 2014.