(1)

Updating Probabilistic XML

Evgeny Kharlamov, Werner Nutt, Pierre Senellart

Free University of Bozen-Bolzano Télécom ParisTech

INRIA Saclay – Île-de-France

Updates in XML, Lausanne, March 2010

(2)

Outline

1. Probabilistic data

2. Problem of updates

3. Updating discrete PXML

4. Updating continuous PXML

(3)

Applications of Probabilistic Data

Approximate query processing: ranking, linkage

Information extraction: approximate search for entities (e.g. names) in text

Sensor data: imprecise or missing readings

...

(9)

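Probabilistic Database

Probabilistic DB: a set of possible worlds, each with a probability (here P = 0.3, P = 0.2 and P = 0.5).

A query Q is evaluated in every world. The probability of an answer is the total probability of the worlds that return it.

Answer: (a, 0.8)

Representation of Prob DB: a compact encoding of the possible worlds. Evaluating Q directly on the representation must give the same answer, (a, 0.8).

[Figure: three possible worlds with probabilities 0.3, 0.2 and 0.5; Q returns a in worlds of total probability 0.8; Q applied to the representation also returns (a, 0.8).]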
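The possible-worlds reading of a query answer can be sketched in a few lines. This is only an illustration: the contents of the three worlds and the toy query `q` are assumptions, chosen so that the worlds returning `a` have total probability 0.8 as on the slide.

```python
from fractions import Fraction

# Hypothetical possible worlds: (probability, set of facts in that world).
# Only the three probabilities come from the slide.
WORLDS = [
    (Fraction(3, 10), {"a", "b"}),
    (Fraction(2, 10), {"b"}),
    (Fraction(5, 10), {"a"}),
]

def answer_probabilities(worlds, query):
    """Evaluate `query` in every world and add up the probabilities per answer."""
    result = {}
    for prob, facts in worlds:
        for answer in query(facts):
            result[answer] = result.get(answer, Fraction(0)) + prob
    return result

def q(facts):
    """Toy query (an assumption): return the fact 'a' if the world contains it."""
    return {"a"} & facts

print(answer_probabilities(WORLDS, q))  # maps 'a' to 4/5
```

Enumerating worlds like this is exponential in general, which is why evaluating queries on the representation matters.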

(11)

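PXML with Events and Distributional Nodes

[Kimelfeld&al:2007] [Senellart&al:2007]

[Figure: an example p-document. Under root there are a project (name Tones, grant EU, dur. 3) and team elements with ids KRDB and DBWeb; each team has a prof with a name (Diego, Bogdan) and a bonus. One bonus has the value 2; the other is chosen by a MUX node among 3, 4 and 5 with probabilities .6, .1 and .3. Two edges are annotated with the event literals ¬c and c. One build of the slide also shows more general event formulas on edges: (e ∨ u) ∧ ¬w and a ∨ ¬u.]

c: event “current”, Pr(c) = .4

MUX: mutually exclusive options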

(19)

Semantics: a world d

A world d is obtained by fixing the value of every event and the choice made at every MUX node:

c = true (current data)

MUX: 4

Pr(d) = 0.4 × 0.1 = 0.04

[Figure: the example p-document with the parts kept in world d highlighted.]
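The computation of Pr(d) can be sketched as a product over independent choices. The probabilities are those of the slide; the dictionary layout and the function name `world_probability` are hypothetical.

```python
from math import prod

# Values from the slide: Pr(c) = .4 and MUX options 3, 4, 5 with
# probabilities .6, .1, .3. The data layout itself is an assumption.
EVENT_PROB = {"c": 0.4}
MUX_PROB = {3: 0.6, 4: 0.1, 5: 0.3}

def world_probability(event_values, mux_choice):
    """Probability of the world fixed by the event values and the MUX choice."""
    event_factors = [
        EVENT_PROB[name] if value else 1 - EVENT_PROB[name]
        for name, value in event_values.items()
    ]
    return prod(event_factors) * MUX_PROB[mux_choice]

pr_d = world_probability({"c": True}, mux_choice=4)
print(round(pr_d, 10))  # 0.04
```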

(21)

Discrete Probabilistic XML Documents

A probabilistic XML document D represents (exponentially) many documents d, each with probability Pr(d).

This is achieved by:

Event formulas on edges, over Boolean random variables. They capture long-distance correlations.

Distributional nodes: Mux, Det. They capture local (hierarchical) dependencies and are a special case of event formulas.
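To see why the number of represented documents grows exponentially, one can enumerate all combinations of event values and MUX choices. The sketch below uses a hypothetical toy p-document with three independent events and one MUX node; only Pr(c) and the MUX probabilities are taken from the slides, and distinct choices may of course yield the same document.

```python
from itertools import product

# Hypothetical toy p-document: independent Boolean events plus one MUX node.
EVENTS = {"c": 0.4, "e": 0.5, "u": 0.5}
MUX = {3: 0.6, 4: 0.1, 5: 0.3}

def enumerate_worlds(events, mux):
    """Yield (event assignment, MUX choice, probability) for every combination."""
    names = sorted(events)
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        p_events = 1.0
        for name, value in assignment.items():
            p_events *= events[name] if value else 1 - events[name]
        for choice, p_choice in mux.items():
            yield assignment, choice, p_events * p_choice

worlds = list(enumerate_worlds(EVENTS, MUX))
print(len(worlds))  # 2**3 assignments times 3 MUX options = 24
```

The probabilities of all combinations add up to 1, so they form a probability space.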

(22)

Outline

1. Probabilistic data

2. Problem of updates

3. Updating discrete PXML

4. Updating continuous PXML

(24)

Update Operations

We want to insert (delete) data in PXML.

We want to do it conditionally.

For every professor, insert a bonus of 5 only if her team is in some EU project.

For every professor, insert a bonus of X for all EU projects with a duration of X years that her team is involved in.

(30)

Structure of Updates

Update operation (q, n, t), written qn,t:

q - condition query (will be defined formally later)

n - locator of the update

t - the actual new data (tree) to be inserted

Inspired by two update languages for XML:

XUpdate, based on XPath

XQuery Update Facility, based on XQuery

For every professor, insert a bonus of 5 only if her team is in some EU project.

For every professor, insert a bonus of X for all EU projects with a duration of X years that her team is involved in.

(34)

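Types of Updates

a. (Restricted) Single-Path updates - (R)SP

b. Tree-Pattern updates - TP

c. Tree-Pattern updates with Joins - TPJ

[Figure: one example update pattern per type, built from the labels root, project, EU, team, prof, name, bonus and id. One pattern inserts bonus 3 under a prof; the pattern with joins uses a variable X for the names of the profs of teams KRDB and DBWeb and inserts bonus 6.]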

(35)

Semantics of Insertions

Only-if semantics:

For every professor, insert a bonus of 5 only if her team is in some EU project.

Inserts at most one bonus per professor.

For-all semantics:

For every professor, insert a bonus of X for all EU projects with a duration of X years that her team is involved in.

Inserts possibly many bonuses per professor.

(40)

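Semantics of Updates for XML Documents

Only-if semantics:

For every match of n, if there is a match of q, then insert t under n.

For-all semantics:

For every match of n, for all k matches of q, insert t under n k times.

[Figure: an update pattern over root, project, EU, team, prof, name and bonus, with a variable Y; successive builds highlight the locator n, the condition q and the inserted tree t.]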
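The difference between the two semantics can be sketched on an ordinary (non-probabilistic) XML document. Everything here is a simplification under stated assumptions: the document layout, the use of attributes, and the helper names `count_q_matches`, `insert` and `bonuses` are hypothetical; the locator n is hard-coded as "every prof of a team" and the condition q as "an EU project lists the team's id". The inserted tree is the same in both modes, so the variable X of the for-all example is not modelled.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical document: projects list the ids of the teams involved.
DOC = """
<root>
  <project grant="EU"><team>KRDB</team></project>
  <project grant="EU"><team>KRDB</team><team>DBWeb</team></project>
  <project grant="national"><team>Other</team></project>
  <team id="KRDB"><prof name="Diego"/></team>
  <team id="DBWeb"><prof name="Bogdan"/></team>
  <team id="Other"><prof name="Nobody"/></team>
</root>
"""

def count_q_matches(root, team_id):
    """Number of matches of q: EU projects that involve the given team."""
    return sum(
        1
        for project in root.findall("project[@grant='EU']")
        for team in project.findall("team")
        if team.text == team_id
    )

def insert(root, t, semantics):
    """Insert copies of tree t under every prof (the matches of n)."""
    for team in root.findall("team"):
        k = count_q_matches(root, team.get("id"))
        # only-if: at most one copy; for-all: one copy per match of q.
        copies = min(k, 1) if semantics == "only-if" else k
        for prof in team.findall("prof"):
            for _ in range(copies):
                prof.append(copy.deepcopy(t))
    return root

def bonuses(root, name):
    """Number of bonus children of the prof with the given name."""
    return len(root.findall(f"team/prof[@name='{name}']/bonus"))

bonus = ET.Element("bonus")
bonus.text = "5"
only_if = insert(ET.fromstring(DOC), bonus, "only-if")
for_all = insert(ET.fromstring(DOC), bonus, "for-all")
print(bonuses(only_if, "Diego"), bonuses(for_all, "Diego"))  # 1 2
```

Diego's team appears in two EU projects, so the for-all semantics inserts two bonuses where the only-if semantics inserts one.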

(44)

Deletions

Deletion operation: (q, n)

Fire a professor if her team is in an EU project.

For every match of n, if there is a match of q, then delete n and all its descendants.

There is only one semantics for deletions, which is similar to the only-if semantics.

[Figure: a deletion pattern over root, project, EU, team and prof; successive builds highlight n and q.]
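A deletion on an ordinary XML document can be sketched in the same spirit. The document layout and the helper names `in_eu_project` and `delete_profs` are assumptions made for the illustration; the locator n ("every prof") and the condition q ("the team is in an EU project") are hard-coded.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: projects list the ids of the teams involved.
DOC = """
<root>
  <project grant="EU"><team>KRDB</team><team>DBWeb</team></project>
  <project grant="national"><team>Other</team></project>
  <team id="KRDB"><prof name="Diego"><bonus>2</bonus></prof></team>
  <team id="DBWeb"><prof name="Bogdan"><bonus>4</bonus></prof></team>
  <team id="Other"><prof name="Nobody"/></team>
</root>
"""

def in_eu_project(root, team_id):
    """True if there is a match of q: some EU project involves the team."""
    return any(
        team.text == team_id
        for project in root.findall("project[@grant='EU']")
        for team in project.findall("team")
    )

def delete_profs(root):
    """Delete every prof (match of n) whose team satisfies q."""
    for team in root.findall("team"):
        if in_eu_project(root, team.get("id")):
            for prof in team.findall("prof"):
                team.remove(prof)  # removes prof together with its descendants
    return root

root = delete_profs(ET.fromstring(DOC))
remaining = [p.get("name") for p in root.findall("team/prof")]
print(remaining)  # ['Nobody']
```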

(48)

Updating PXML Documents

[Figure: a diagram relating representations and their semantics. The p-document D (the running example, with its MUX node over 3, 4, 5 with probabilities .6, .1, .3) has as semantics a probability space of documents d, shown with P = 0.3, P = 0.2 and P = 0.5. Applying the update qn,t to every document of this space yields the updated probability space of documents, shown with d1 (P = 0.7) and d2 (P = 0.3). A second p-document, D1, has this updated space as its semantics; it should be obtained from D by applying qn,t on the representation.]
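The semantic side of the diagram, updating every world and collecting the results, can be sketched as follows. The worlds, the toy operation `delete_bonus` and the function names are hypothetical; the sketch only illustrates that worlds which become identical after the update have their probabilities added.

```python
from fractions import Fraction

# Hypothetical probability space: each world is a frozenset of facts.
SPACE = {
    frozenset({"prof", "eu_project"}): Fraction(3, 10),
    frozenset({"prof", "bonus"}): Fraction(2, 10),
    frozenset({"prof"}): Fraction(5, 10),
}

def delete_bonus(world):
    """Toy update (an assumption): remove the fact 'bonus' if present."""
    return world - {"bonus"}

def update_space(space, op):
    """Apply op to every world; merge worlds that become identical."""
    updated = {}
    for world, prob in space.items():
        new_world = op(world)
        updated[new_world] = updated.get(new_world, Fraction(0)) + prob
    return updated

updated = update_space(SPACE, delete_bonus)
for world, prob in updated.items():
    print(sorted(world), prob)
```

The hard part studied in the talk is the other side of the diagram: obtaining a p-document for the updated space without enumerating the worlds.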

(49)

Problems to Investigate

We want to study the computation of representations of updates.

Given a p-document D and an update operation qn,t:

Is it possible to compute a p-document that represents the update of D?

How hard is the computation?

(50)

Outline

1. Probabilistic data

2. Problem of updates

3. Updating discrete PXML

4. Updating continuous PXML

(51)

Querying PXML with Tree-Pattern Queries

[Kimelfeld&al:2007], [Senellart&al:2007]

[Table: data complexity of query evaluation. Rows: TP and TPJ queries. Columns: Distr. nodes, Event conjunct., Event formulas. Entries shown on the slide: P and #P-complete, some marked with *.]

#P functions: counting counterparts of NP problems.

E.g., counting the satisfying assignments of propositional CNF formulas.

Believed to be harder than NP.
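The canonical #P problem mentioned above, counting the satisfying assignments of a CNF formula, can be illustrated with a brute-force counter. The clause encoding (signed integers for literals) and the name `count_sat` are choices made for this sketch; the loop is exponential in the number of variables.

```python
from itertools import product

def count_sat(cnf, num_vars):
    """Count assignments of variables 1..num_vars that satisfy the CNF.

    A clause is a list of literals; literal k > 0 stands for variable k
    and literal -k for its negation.
    """
    count = 0
    for values in product([False, True], repeat=num_vars):
        if all(
            any(values[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in cnf
        ):
            count += 1
    return count

# (x1 or x2) and (not x1 or x3)
cnf = [[1, 2], [-1, 3]]
print(count_sat(cnf, 3))  # 4
```

Deciding whether the count is nonzero is the NP problem SAT; returning the count itself is the #P counterpart.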

(54)

Only-if Insertions: Data Complexity

[Table: data complexity of only-if insertions. Rows: RSP, SP, TP and TPJ updates. Columns: Distr. nodes, Event conjunct., Event formulas. Entries shown on the slide: Linear, P, #P-hard and ?, some marked with *.]

* only for queries without descendant edges

The same table holds for deletions.

(62)

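Updating PXML: Example

Data: [Figure: the example p-document, with root, a project (name Tones, grant EU, dur. 3), the teams KRDB and DBWeb, their profs Diego and Bogdan, edges annotated with ¬c and c, and a MUX node over 3, 4, 5 with probabilities .6, .1, .3.]

Query: [Figure: an update pattern over root, project, EU, team, prof, name and bonus, with a variable Y.]

Only-if semantics:

For every match of n, if there is a match of q, then insert t under n.

In this case only-if and for-all semantics coincide.

[Figure, final builds: a node bonus 5 annotated with ¬c is added to the data.]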
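One way to read the final build is that the inserted node bonus 5 exists exactly in the worlds where its annotation ¬c holds. Under that reading, and assuming independent events and an annotation that is a conjunction of literals over distinct events, the probability that the inserted node is present can be sketched as below. The data layout and function names are hypothetical; only Pr(c) = .4 and the annotation ¬c come from the slides.

```python
# Event probabilities; Pr(c) = .4 is from the slides.
EVENT_PROB = {"c": 0.4}

def literal_probability(literal):
    """Probability of a literal such as 'c' or '¬c' over independent events."""
    if literal.startswith("¬"):
        return 1 - EVENT_PROB[literal[1:]]
    return EVENT_PROB[literal]

def conjunction_probability(literals):
    """Probability of a conjunction of literals over distinct events.

    Assumes no event occurs both positively and negatively.
    """
    p = 1.0
    for literal in set(literals):
        p *= literal_probability(literal)
    return p

# Hypothetical encoding of the inserted node and its annotation.
inserted = {"label": "bonus", "value": 5, "condition": ["¬c"]}
presence = conjunction_probability(inserted["condition"])
print(presence)
```

With event formulas, an insertion of this kind only has to attach a formula to the new edge, which is consistent with the low complexity reported for that model.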

(63)

For-all Insertions: Data Complexity

[Table: data complexity of for-all insertions. Rows: RSP, SP, TP and TPJ updates. Columns: Distributional nodes (mux,det and exp), Event conj, Event formulas. Entries shown on the slide: L/P (also written Linear/P), P, not in PTIME, not in PSPACE and #P-hard, some marked with *.]

Linear/P: linear for queries without descendant edges, polynomial otherwise.

The computation is not in PSPACE, from [Abiteboul&al.:2009].

(64)

Outline

1. Probabilistic data

2. Problem of updates

3. Updating discrete PXML

4. Updating continuous PXML

(65)

Continuous PXML

P-documents with continuous distributions stored on the leaves.

Semantics defined in terms of continuous sets of XML documents.

N(30, 4) - Normal distribution

[Figure: root with a sensor (id i25) that has two measurements mes; each has a time t (1 and 2) and a value vl distributed as N(30,4).]

(66)

Problems with Updates

Insert an alerter “increases” for a sensor only if the second measurement is greater than the first one.

The probability of the insertion (event) is 1/2.

The update is not representable with event formulas and distributions on leaves: we need correlations between distributions.

[Figure: the sensor document, with id i25 and two measurements at times 1 and 2, each with a value distributed as N(30,4).]
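The value 1/2 can be checked by simulation. The sketch assumes that the two measurements are independent and reads the second parameter of N(30,4) as a standard deviation; both are assumptions, and for two independent, identically distributed continuous values the result is 1/2 whatever the spread.

```python
import random

def estimate_insertion_probability(samples=100_000, mean=30.0, sd=4.0, seed=0):
    """Monte Carlo estimate of Pr(second measurement > first measurement)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        first = rng.gauss(mean, sd)
        second = rng.gauss(mean, sd)
        if second > first:
            hits += 1
    return hits / samples

estimate = estimate_insertion_probability()
print(estimate)
```

The difficulty on the slide is not computing this number but representing the result: whether the alerter is present depends on both leaf values at once, so independent distributions on the leaves cannot express it.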

(67)

Conclusion

Comprehensive picture of the complexity of updates:

Discrete PXML models with distributional nodes and event formulas

RSP, SP, TP and TPJ update operations

Polynomial algorithm for SP update operations without descendant edges

Results can be generalized to other PXML models and probabilistic updates

Continuous PXML: problems are highlighted

(68)

Thank you

(69)

References

[Kimelfeld&al:2007] - Benny Kimelfeld, Yehoshua Sagiv: Matching Twigs in Probabilistic XML. VLDB 2007: 27-38

[Senellart&al:2007] - Pierre Senellart, Serge Abiteboul: On the complexity of managing probabilistic XML data. PODS 2007: 283-292
