• Aucun résultat trouvé

Saturation, Definability, and Separation for XPath on Data Trees

N/A
N/A
Protected

Academic year: 2022

Partager "Saturation, Definability, and Separation for XPath on Data Trees"

Copied!
4
0
0

Texte intégral

(1)

Saturation, Definability, and Separation for XPath on Data Trees

Sergio Abriola1, Mar´ıa Emilia Descotte1, and Santiago Figueira1,2

1 University of Buenos Aires, Argentina

2 CONICET, Argentina

Abstract. We study the expressive power of some fragments of XPath equipped with (in)equality tests over data trees.

Our main results are the definability theorems, which give necessary and sufficient conditions under which a class of data trees can be defined by a node expression or set of node expressions, and our separation theorems, which give sufficient conditions under which two disjoint classes of data trees can be separated by a class of data trees definable in XPath.

Keywords: XPath· data tree · bisimulation · definability · first-order logic · ultraproduct·saturation· separation.

1 Introduction

The abstraction of an XML document is a data tree, i.e. a tree whose every node contains a tag or label (such as LastName) from a finite domain, and a data value (such asSmith) from an infinite domain. XPath is the most widely used query language for XML documents; it is an open standard and consti- tutes a World Wide Web Consortium (W3C) Recommendation [3]. XPath=has syntactic operators to navigate the tree using the ‘child’, ‘parent’, ‘sibling’, etc.

accessibility relations, and can make tests on intermediate nodes. It can express properties of the underlying tree structure of the XML document, such as “the root of the tree has a child labeledaand a child labeledb”, and it can express conditions on the actual data contained in the attributes, such as “the root of the tree has two children with same tagabut different data value”.

First, we provide notions of saturation and ultraproducts that are adequate for XPath=, and show that bisimulation coincides with logical equivalence over saturated data trees. Using these tools, we show definability theorems, giving necessary and sufficient conditions under which a class of data trees can be defined by a node expression or set of node expressions of XPath=. Finally we give separation results, providing sufficient conditions under which two disjoint classes of data trees can be separated by a class of data trees definable in XPath=. While on this work we will only show results for the fragment of XPath that can only navigate via the ‘child’ accessibility relation, similar results hold for the vertical fragment having both the ‘child’ and ‘parent’ navigational operators.

The results on definability of this paper appeared originally in [1].

(2)

2 Preliminaries

Data trees.We say thatT is adata treeif it is a tree fromTrees(A×D), where Ais a finite set of labelsandDis an infinite set of data values. The data of a nodexis denoteddata(x), and its label islabel(x). The set of nodes of a data treeT is denotedT.

Downward XPath with data tests.We consider a fragment of XPath that corre- sponds to the navigational part of XPath 1.0 with data equality and inequality.

XPath=is a two-sorted language, withpath expressions(that we writeα, β, γ) and node expressions (that we write ϕ, ψ, η). The downward XPath, no- tated XPath= is defined by mutual recursion as follows:

α, β ::= o | [ϕ] | αβ | α∪β o∈ {ε,↓}

ϕ, ψ ::= a | ¬ϕ | ϕ∧ψ | ϕ∨ψ | hαi | hα=βi | hα6=βi a∈A

Node expressions represent properties on nodes. They are evaluated in nodes, and, intuitively, hα =βiis true at x if there are two paths starting in x, one satisfying the property α, and the other satisfying the property β, which end in nodes with equal data value. On the other hand, path expressions represent properties on paths. They are evaluated in pairs of nodes. For instance↓ is true at (x, y) ify is a child ofx, and [ϕ] is true atx, yifx=y andxsatisfiesϕ.

LetT andT0 be data trees, and let u∈T, u0 ∈T0. We say that T, u and T0, u0 arelogically equivalent for XPath=if no XPath=can distinguish node ufromu0.

Bisimulations.Notions of bisimulation present a way to determine whether two pointed data trees can be distinguished by a series of moves in XPath. We do not reproduce them here, but it is worth mentioning that they are forms of back-and-forth conditions over two data trees.

The main previous result in the literature establishing the connection between bisimulation and equivalence is the following:

Theorem 1. [4] IfT, uis bisimilar to T0, u0, then they are logically equivalent.

If T andT0 are finitely branching, the other implication also holds.

3 Saturation and quasi-ultraproducts

We introduce a notion of saturation for the downward fragment of XPath, and show that the reverse implication of Theorem 1 is true over saturated data trees.

Saturation is the key ingredient to show the Definability theorems, but their use lays hidden in the proof.

Saturation. Let hΣ1, . . . , Σni and hΓ1, . . . , Γmi be tuples of sets of XPath=- formulas. Given a data treeT andu∈T, we say thathΣ1, . . . , ΣniandhΓ1, . . . , Γmi are =n,m-satisfiable [resp.6=n,m-satisfiable] at T, uif there exist v0→v1

· · · →vn∈T andw0→w1→ · · · →wm∈T such thatu=v0=w0and

(3)

1. for alli∈ {1, . . . , n},T, vi|=Σi; 2. for allj∈ {1, . . . , m},T, wj|=Γj; and

3. data(vn) =data(wm) [resp.data(vn)6=data(wm)].

We say thathΣ1, . . . , ΣniandhΓ1, . . . , Γmiare =n,m-finitely satisfiable[resp.

6=n,m-finitely satisfiable] atT, uif for every finiteΣi0⊆Σi and finiteΓj0⊆Γj, we have that hΣ10, . . . , Σn0i and hΓ10, . . . , Γm0 i are =n,m-satisfiable [resp. 6=n,m- satisfiable] atT, u.

Definition 2. We say that a data treeT is↓-saturated if for every n, m∈N, every pair of tuples hΣ1, . . . , ΣniandhΓ1, . . . , Γmiof sets of XPath=-formulas, every u∈T, and?∈ {=,6=}, the following is true:

ifhΣ1, . . . , ΣniandhΓ1, . . . , Γmiare?n,m-finitely satisfiable atT, uthen hΣ1, . . . , ΣniandhΓ1, . . . , Γmi are?n,m-satisfiable atT, u.

Proposition 3. For ↓-saturated data trees, bisimulation coincides with logical equivalence.

Quasi-ultraproducts We introduce the notion of quasi-ultraproduct, a variant of the usual notion of first-order model theory, which will be needed for the definability theorems. Some of our results for quasi-ultraproducts make use of the fundamental theorem of ultraproducts (see e.g. [2, Thm. 4.1.9]).

Definition 4. Suppose(Ti, ui)i∈I is a family of pointed data trees,U is an ul- trafilter over I,T is the ultraproduct of (Ti, ui)i∈I, and u is the ultralimit of (ui)i∈I. The ↓-quasi ultraproductof (Ti, ui)i∈I modulo U is the pointed data tree(T|u, u), whereT|u denotes the subtree ofT induced by al the descen- dants of u. As a particular case one has the notion of ↓-quasi ultrapower.

4 Definability

Definability theorems address the question of which properties of models can be defined via formulas of the logic. IfKis a class of pointed data trees, we denote its complement byK.

Theorem 5. Let K be a class of pointed data trees. Then K is definable by a set of XPath=-formulas iffK is closed under↓-bisimulations and ↓-quasi ultra- products, and K is closed under↓-quasi ultrapowers.

Theorem 6. Let K be a class of pointed data trees. Then K is definable by an XPath=-formula iff bothK andK are closed under↓-bisimulations and↓-quasi ultraproducts.

The notion of`-bisimulation is a restricted version of↓-bisimulations. It has been shown to coincide with the notion of`-equivalence, which informally means indistinguishable by XPath= formulas that cannot “see” beyond` ‘child’-steps from the current point of evaluation.

Theorem 7. Let K be a class of pointed data trees. Then K is definable by a formula of XPath= iffK is closed by`-bisimulations for XPath= for some`.

(4)

5 Separation

Separation theorem provide conditions under which two disjoint classes of pointed models can be separated by a class definable in the logic.

Theorem 8. Let K1 and K2 be two disjoint classes of pointed data trees such that K1 is closed under ↓-bisimulations and ↓-quasi ultraproducts and K2 is closed under ↓-bisimulations and ↓-quasi ultrapowers. Then there exists a third class K which is definable by a set of XPath=-formulas, contains K1 and is disjoint fromK2.

Theorem 9. LetK1 andK2 be two disjoint classes of pointed data trees closed under ↓-bisimulations and ↓-quasi ultraproducts. Then there exists a third class K which is definable by an XPath=-formula, contains K1 and is disjoint from K2.

References

1. Sergio Abriola, Mar´ıa Emilia Descotte, and Santiago Figueira. Definability for down- ward and vertical Xpath on data trees. InLogic, Language, Information, and Com- putation - 21st International Workshop, WoLLIC 2014, Valpara´ıso, Chile, Septem- ber 1-4, 2014. Proceedings, pages 20–35, 2014.

2. C.C. Chang and H.J. Keisler. Model theory. Studies in logic and the foundations of mathematics. North-Holland, 1990.

3. J. Clark and S. DeRose. XML path language (XPath). Website, 1999. W3C Recommendation.http://www.w3.org/TR/xpath.

4. D. Figueira, S. Figueira, and C. Areces. Basic model theory of XPath on data trees.

InICDT, pages 50–60, 2014.

Références

Documents relatifs

Therefore, besides their specific vulnerability to summer embolism, species having large conduits will have an additional sensitivity to winter cavitation: under the same water

In this paper, we prove that for any language L of finitely branch- ing finite and infinite trees, the following properties are equivalent: (1) L is definable by an existential

When Syt1 oligomerization is compromised with a targeted mutation (F349A), a large majority (85%) of these SUVs (Syt1 349 -vSUV) are mobile and spontaneous fuse similar to vSUVs,

In this section we present numerical results to illustrate the theoretical results derived in the previous sections and show that well chosen regularization operators and

We consider two types, first closure of the domain under simple set-theoretic operations, particularly under finite unions, second, whether the algebraic choice functions (or

Our main technical contribution therefore lies in the identification of a fragment of first-order logic that (1) supports reasoning about mutable tree data structures; (2)

Market volume is also necessary to be evaluated by national market regulators and central banks in order to have current information on global tendency of their

Second order sufficient optimality conditions for some state-constrained control problems of semilinear elliptic equations.. A new approach to