
On the Expressiveness of Probabilistic XML Models




Serge Abiteboul, Benny Kimelfeld, Yehoshua Sagiv, Pierre Senellart

Probabilistic XML

Framework: unordered data trees

Details: no attributes, no mixed content, ...

[Figure: a data tree with root A and nodes B, C, D is not equal (≠) to the tree that additionally contains a second B node (multiset semantics).]

Sample space: set of all such data trees.

Probabilistic XML database: (succinct) representation of a discrete probability distribution over this sample space (= a set of possible worlds).


Goal

A generic framework for probabilistic XML

Previously proposed models: concrete instances of this framework

Comparison of the expressiveness of various models

Update capabilities in various models

Efficiency issues

Outline

1 Introduction

2 P-documents

The P-document Model

Types of Distributional Nodes

Link with Previously Studied Models

3 Efficient Translations between Models

4 Update Capabilities

5 Conclusion

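The P-document Model

[Figure: example p-document. The root A has an ind child; the ind node has two children, a mux node (probability 0.8) and D (probability 0.6); the mux node has children B (probability 0.3) and C (probability 0.7).]

Tree with ordinary (circles) and distributional (rectangles) nodes

Distributional nodes specify how their children can be randomly selected

Several kinds of distributional nodes (see later on)

Possible-world semantics: every possible selection of children of distributional nodes, with associated probability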

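Possible-world semantics

[Figure: the six possible worlds of the example p-document, each a tree rooted at A.]

A alone: $p_1 = 0.08$

A with child D: $p_2 = 0.12$

A with child B: $p_3 = 0.096$

A with children B and D: $p_4 = 0.144$

A with child C: $p_5 = 0.224$

A with children C and D: $p_6 = 0.336$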
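The six probabilities above can be reproduced by brute-force enumeration. The following is a minimal sketch, not the authors' implementation: the tuple encoding of nodes and the helper names `combine`, `optional`, and `forests` are assumptions made for illustration, and only ind and mux nodes are handled. The example document is the one from the figure (A over an ind node with a mux node at 0.8 and D at 0.6, the mux node choosing B at 0.3 or C at 0.7).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical encoding (not from the slides):
#   ordinary node:       ("node", label, [children])
#   distributional node: ("ind", [(prob, child), ...]) or ("mux", [(prob, child), ...])
# A possible world is a nested tuple (label, sorted tuple of child worlds).

def combine(dists):
    """Distribution of the multiset union of independently chosen forests."""
    out = defaultdict(Fraction)
    for combo in product(*(d.items() for d in dists)):
        forest, prob = (), Fraction(1)
        for f, p in combo:
            forest += f
            prob *= p
        out[tuple(sorted(forest))] += prob
    return dict(out)

def optional(prob, dist):
    """Keep a forest drawn from dist with probability prob, else the empty forest."""
    out = defaultdict(Fraction)
    for f, q in dist.items():
        out[f] += prob * q
    if prob < 1:
        out[()] += 1 - prob
    return dict(out)

def forests(node):
    """Map each forest that `node` can produce to its probability."""
    kind = node[0]
    if kind == "node":
        _, label, children = node
        sub = combine([forests(c) for c in children])
        return {((label, f),): p for f, p in sub.items()}
    if kind == "ind":
        # Each child is kept or dropped independently of the others.
        return combine([optional(p, forests(c)) for p, c in node[1]])
    if kind == "mux":
        # At most one child is kept; the leftover mass selects no child.
        out = defaultdict(Fraction)
        for p, c in node[1]:
            for f, q in forests(c).items():
                out[f] += p * q
        rest = 1 - sum(p for p, _ in node[1])
        if rest > 0:
            out[()] += rest
        return dict(out)
    raise ValueError(f"unknown node kind: {kind}")

F = Fraction
leaf = lambda label: ("node", label, [])
doc = ("node", "A", [
    ("ind", [
        (F("0.8"), ("mux", [(F("0.3"), leaf("B")), (F("0.7"), leaf("C"))])),
        (F("0.6"), leaf("D")),
    ]),
])
result = forests(doc)
for world, prob in sorted(result.items(), key=lambda kv: kv[1]):
    print(float(prob), world)
```

Exact rational arithmetic (`Fraction`) is used so that worlds reached along different paths merge without rounding noise. The enumeration is exponential in general and is meant only to make the semantics concrete.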

det

[Figure: a det node with children $B_1, \ldots, B_n$.]

All children are always chosen

Not really a distributional node, but sometimes useful in hierarchies

Independent

[Figure: an ind node with children $B_1, \ldots, B_n$, labeled with probabilities $p_1, \ldots, p_n$.]

Children are randomly chosen independently of one another

The probability of choosing each child is given

mux

[Figure: a mux node with children $B_1, \ldots, B_n$, labeled with probabilities $p_1, \ldots, p_n$.]

At most one child is randomly chosen, with probability $p_i$

$\sum_{i=1}^{n} p_i \le 1$

Exponential

[Figure: an exp node with children $B_1, \ldots, B_n$.]

Subset    Probability
$W_1$     $p_1$
...       ...
$W_k$     $p_k$

The probability of choosing a given subset $W_j$ of the set of children is given, with $1 \le k \le 2^n$

$\sum_{j=1}^{k} p_j = 1$
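To relate the exp table to the simpler node types, note that a single mux node can be written as an exp table whose subsets are the singletons, plus the empty subset carrying the leftover probability. The sketch below is only an illustration of that observation; the function name `mux_to_exp` and the use of child indices as subset members are assumptions, not notation from the slides.

```python
from fractions import Fraction

def mux_to_exp(probs):
    """Subset table of an exp node equivalent to a mux node.

    probs[i] is the probability of child i of the mux node (hypothetical
    encoding). Returns a list of (subset of child indices, probability).
    """
    total = sum(probs)
    if total > 1:
        raise ValueError("mux probabilities must sum to at most 1")
    table = [(frozenset([i]), p) for i, p in enumerate(probs)]
    if total < 1:
        # Leftover mass: no child is chosen.
        table.append((frozenset(), 1 - total))
    return table

table = mux_to_exp([Fraction(3, 10), Fraction(1, 2)])
for subset, prob in table:
    print(sorted(subset), prob)
```

The table has at most n + 1 rows for n children, so this direction stays linear in size, whereas a general exp table may list up to $2^n$ subsets.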

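cie

[Figure: a cie node whose children are annotated with conditions $c_1, \ldots, c_n$, next to a table of events $e_1, \ldots, e_k$ with probabilities $p_1, \ldots, p_k$.]

Each $c_i$ is a conjunction of the random events $e_j$ or their negations $\neg e_j$ (e.g., $e_1 \wedge \neg e_2 \wedge e_3$)

Each $e_j$ is independent of the other ones

The probability of each $e_j$ is given

The $e_j$ can be shared across multiple distributional nodes

Only distributional node expressing long-distance dependency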

Reminiscent of [Imieliński and Lipski,
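The effect of sharing events across cie nodes can be checked numerically. This is a brute-force sketch under assumed names: a conjunction is encoded as a dict from event name to required truth value, and `prob_all` is a hypothetical helper that sums the weights of all valuations satisfying every given conjunction. The event probabilities are made up for the example.

```python
from fractions import Fraction
from itertools import product

def holds(conj, valuation):
    """True if every literal of the conjunction agrees with the valuation."""
    return all(valuation[e] == wanted for e, wanted in conj.items())

def prob_all(conjs, event_probs):
    """Probability that all conjunctions hold, for independent events."""
    events = sorted(event_probs)
    total = Fraction(0)
    for bits in product([True, False], repeat=len(events)):
        valuation = dict(zip(events, bits))
        weight = Fraction(1)
        for e in events:
            weight *= event_probs[e] if valuation[e] else 1 - event_probs[e]
        if all(holds(c, valuation) for c in conjs):
            total += weight
    return total

# Illustrative event probabilities (assumed, not from the slides).
event_probs = {"e1": Fraction(1, 2), "e2": Fraction(1, 4), "e3": Fraction(1, 3)}

c1 = {"e1": True, "e2": False, "e3": True}   # e1 AND NOT e2 AND e3
cx = {"e1": True}                            # condition at one cie node
cy = {"e1": True}                            # same event reused at another node
cz = {"e1": False}                           # negation at a third node

print(prob_all([c1], event_probs))
print(prob_all([cx, cy], event_probs))
print(prob_all([cx, cz], event_probs))
```

Two children guarded by the same event appear together with probability P(e1), not P(e1) squared, and a child guarded by the negation never co-occurs with them. This is the long-distance dependency that ind, mux, and exp nodes cannot express directly.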

Families of P-documents

Definition

$\mathrm{PrXML}^{\{type_1, type_2, \ldots\}}$: family of p-documents obtained with the distributional nodes $type_1, type_2, \ldots$

$\mathrm{PrXML}^{\{type_1, type_2, \ldots\}}_{|\not h}$: no hierarchy of distributional nodes is allowed (i.e., the child of a distributional node is an ordinary node)

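Link with Previously Studied Models

[Nierman and Jagadish, 2002]: $\mathrm{PrXML}^{\{\mathrm{mux},\mathrm{det}\}}$

[van Keulen et al., 2005]: subset of $\mathrm{PrXML}^{\{\mathrm{mux},\mathrm{det}\}}$

[Abiteboul and Senellart, 2006, Senellart and Abiteboul, 2007]: $\mathrm{PrXML}^{\{\mathrm{ind}\}}_{|\not h}$ and $\mathrm{PrXML}^{\{\mathrm{cie}\}}_{|\not h}$

[Hung et al., 2003]: subset of $\mathrm{PrXML}^{\{\mathrm{exp}\}}_{|\not h}$ (graphs restricted to trees)

[Hung et al., 2007]: subset of $\mathrm{PrXML}^{\{\mathrm{exp}\}}_{|\not h}$ (intervals restricted to points)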

3 Efficient Translations between Models

V-translations

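Definition

$\mathcal{F}$ is v-translatable to $\mathcal{F}'$ if, for each $\tilde{P} \in \mathcal{F}$, there is a $\tilde{P}' \in \mathcal{F}'$ such that the possible-world semantics of $\tilde{P}$ and $\tilde{P}'$ are isomorphic.

If, additionally, $\tilde{P}'$ can be obtained from $\tilde{P}$ in polynomial time, $\mathcal{F}$ is efficiently v-translatable to $\mathcal{F}'$.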

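Main Results

All families of p-documents are translatable to $\mathrm{PrXML}^{\{\mathrm{mux},\mathrm{det}\}}$

$\mathrm{PrXML}^{\{\mathrm{mux},\mathrm{ind}\}}$ is efficiently translatable to $\mathrm{PrXML}^{\{\mathrm{exp}\}}$ and to $\mathrm{PrXML}^{\{\mathrm{cie}\}}$

$\mathrm{PrXML}^{\{\mathrm{exp}\}}$ is not efficiently translatable to $\mathrm{PrXML}^{\{\mathrm{exp}\}}_{|\not h}$

$\mathrm{PrXML}^{\{\mathrm{cie}\}}$ is efficiently translatable to $\mathrm{PrXML}^{\{\mathrm{cie}\}}_{|\not h}$

$\mathrm{PrXML}^{\{\mathrm{cie}\}}$ is not efficiently translatable to $\mathrm{PrXML}^{\{\mathrm{ind},\mathrm{mux},\mathrm{exp}\}}$

$\mathrm{PrXML}^{\{\mathrm{exp}\}}$ to $\mathrm{PrXML}^{\{\mathrm{cie}\}}$: open problem, but $\mathrm{PrXML}^{\{\mathrm{exp}\}}$ with bounded height or bounded degree is efficiently translatable to $\mathrm{PrXML}^{\{\mathrm{cie}\}}$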
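As an illustration of an efficient translation into cie, here is one possible construction for a single mux node. It is a sketch and not necessarily the construction used by the authors. Child i receives the condition $\neg e_1 \wedge \ldots \wedge \neg e_{i-1} \wedge e_i$ over fresh events, with $P(e_i) = p_i / (1 - \sum_{j<i} p_j)$, so that the conditions are mutually exclusive and child i is selected with probability exactly $p_i$. The names `mux_to_cie` and `conj_prob` are hypothetical.

```python
from fractions import Fraction

def mux_to_cie(probs):
    """Translate the child probabilities of one mux node into cie form.

    Returns (event_probs, conditions): fresh independent events e1..en with
    their probabilities, and for each child a conjunction encoded as a dict
    from event name to required truth value (hypothetical encoding).
    """
    if sum(probs) > 1:
        raise ValueError("mux probabilities must sum to at most 1")
    event_probs, conditions = {}, []
    remaining = Fraction(1)          # mass not yet assigned to earlier children
    for i, p in enumerate(probs, start=1):
        name = f"e{i}"
        event_probs[name] = p / remaining if remaining else Fraction(0)
        cond = {f"e{j}": False for j in range(1, i)}   # earlier events all false
        cond[name] = True
        conditions.append(cond)
        remaining -= p
    return event_probs, conditions

def conj_prob(cond, event_probs):
    """Probability of a conjunction of literals over independent events."""
    result = Fraction(1)
    for e, wanted in cond.items():
        result *= event_probs[e] if wanted else 1 - event_probs[e]
    return result

probs = [Fraction(3, 10), Fraction(1, 2), Fraction(1, 10)]
event_probs, conditions = mux_to_cie(probs)
for p, cond in zip(probs, conditions):
    print(p, conj_prob(cond, event_probs), cond)
```

The total size of the conditions is quadratic in the number of children, which is still polynomial. The reverse direction is the hard one: the slides state that cie is not efficiently translatable to the families built from ind, mux, and exp.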

4 Update Capabilities

V-updates

Elementary insertions and deletions

Locator query indicating where to apply the update (cf. XPath for XUpdate, XQuery for XQuery Update)

The update itself can be probabilistic: "Insert this subtree at each node matched by this query with probability p."
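At the level of possible worlds, a probabilistic insertion can be sketched as follows. Two simplifying assumptions are made here and are not taken from the slides: the locator query is reduced to "every node with a given label", and the probability p is read as a single coin flip for the whole update (applied to all matches or to none). Worlds are nested tuples `(label, sorted tuple of children)`; the function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def insert_at(tree, target_label, subtree):
    """Return a copy of tree with subtree added under every node labeled target_label."""
    label, children = tree
    new_children = [insert_at(c, target_label, subtree) for c in children]
    if label == target_label:
        new_children.append(subtree)
    return (label, tuple(sorted(new_children)))

def probabilistic_insert(worlds, target_label, subtree, p):
    """Apply the insertion with probability p to a distribution over worlds."""
    out = defaultdict(Fraction)
    for tree, q in worlds.items():
        out[insert_at(tree, target_label, subtree)] += q * p   # update applied
        out[tree] += q * (1 - p)                               # update not applied
    return dict(out)

# Two worlds: A alone, and A with a child B (children kept in sorted order).
worlds = {
    ("A", ()): Fraction(1, 4),
    ("A", (("B", ()),)): Fraction(3, 4),
}
updated = probabilistic_insert(worlds, "B", ("E", ()), Fraction(1, 2))
for world, prob in updated.items():
    print(prob, world)
```

Working on the explicit set of worlds is exponential in general. The results in this section concern when the same effect can be obtained directly on the p-document in polynomial time.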

Main Results

$\mathrm{PrXML}^{\{\mathrm{mux},\mathrm{det}\}}$ (and any family v-translatable to it): closed under updates for any class of queries

$\mathrm{PrXML}^{\{\mathrm{cie}\}}$, $\mathrm{PrXML}^{\{\mathrm{mux}\}}$, $\mathrm{PrXML}^{\{\mathrm{exp}\}}$, etc.: tractably closed under updates defined by single-path queries such that the matched node is at the end of the path

Without cie nodes: not tractably closed under insertions defined by single-path queries

$\mathrm{PrXML}^{\{\mathrm{cie}\}}$: tractably closed under insertions defined by tree-pattern queries with joins

5 Conclusion


A Word About Queries

It was shown [Kimelfeld and Sagiv, 2007, Kimelfeld et al., 2008] that:

Tree-pattern projection queries can be processed efficiently in $\mathrm{PrXML}^{\{\mathrm{ind},\mathrm{mux}\}}$.

Tree-pattern projection queries are #P-complete in $\mathrm{PrXML}^{\{\mathrm{cie}\}}$.

In summary:

$\mathrm{PrXML}^{\{\mathrm{cie}\}}$ is more succinct.

Simple updates remain tractable in $\mathrm{PrXML}^{\{\mathrm{cie}\}}$.

... but (projection) queries are intractable in $\mathrm{PrXML}^{\{\mathrm{cie}\}}$.

Trade-off between queries and updates, or between queries and expressibility of complex dependencies.


Perspectives

Alternative approach: external constraints [Cohen et al., 2008]

Multiset → set semantics

Equivalence of p-documents

Validation against a DTD


References

Serge Abiteboul and Pierre Senellart. Querying and updating probabilistic information in XML. In Proc. EDBT, Munich, Germany, March 2006.

Sara Cohen, Benny Kimelfeld, and Yehoshua Sagiv. Incorporating constraints in probabilistic XML. In Proc. PODS, Vancouver, Canada, June 2008.

Edward Hung, Lise Getoor, and V. S. Subrahmanian. PXML: A probabilistic semistructured data model and algebra. In Proc. ICDE, Bangalore, India, March 2003.

Edward Hung, Lise Getoor, and V. S. Subrahmanian. Probabilistic interval XML. ACM Transactions on Computational Logic, 8(4), 2007.

Tomasz Imieliński and Witold Lipski. Incomplete information in

XML. In Proc. VLDB, Vienna, Austria, September 2007.

Benny Kimelfeld, Yuri Kosharovski, and Yehoshua Sagiv. Query efficiency in probabilistic XML models. In Proc. SIGMOD, Vancouver, Canada, June 2008.

Andrew Nierman and H. V. Jagadish. ProTDB: Probabilistic data in XML. In Proc. VLDB, Hong Kong, China, August 2002.

Pierre Senellart and Serge Abiteboul. On the complexity of managing probabilistic XML data. In Proc. PODS, pages 283–292, Beijing, China, June 2007.

Maurice van Keulen, Ander de Keijzer, and Wouter Alink. A
