
Querying and Updating Probabilistic Information in XML

Serge Abiteboul, Pierre Senellart

GEMO Seminar

March 10th, 2006


Introduction The Hidden Web

The Hidden Web

Definition (Hidden Web)

The set of webpages (which may or may not be dynamically generated) not accessible from the hyperlinked structure of the World Wide Web.

Size estimate (2001): 500 times larger than the surface Web.

How to understand it and benefit from its content?

Abiteboul, Senellart (INRIA Futurs) Querying and Updating Proba. Info. in XML GEMO, 2006/03/10 2 / 27

Introduction Semantic Interpretation of the Hidden Web

Semantic Interpretation of the Hidden Web

[Figure: layered processing pipeline, built up step by step. Stage labels: discovery, wrappers, analysis, indexing, querying. Box labels: World Wide Web; HTML form, WSDL, UDDI; WSDL; Analyzed Web Services; Web Services Index; Queries and Results.]

Introduction A Probabilistic XML Warehouse

A Probabilistic XML Warehouse

[Figure: a Probabilistic XML Warehouse with two interfaces. Update interface (insertions, deletions): Module 1, Module 2 and Module 3 submit update transactions + confidence. Query interface (tree-pattern with join): a query goes in, results + confidence come out.]

Framework

Outline

1 Introduction

2 Framework
    Data Trees
    Queries
    Updates

3 Possible Worlds Model

4 Simple Probabilistic Model

5 Fuzzy Tree Model

6 Conclusion


Framework Data Trees

Data Trees

Finite, unordered trees.

No attribute nodes, no mixed content.

“Multiset” children

[Figure: example data tree. Root A; second level B, B, E, D; below them val, val’, C, F; leaves val", val".]
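The data model above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Node` class and the `same_tree` helper are hypothetical names, and equality up to child order is one way to capture "unordered, multiset children".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A data-tree node: a label plus an unordered multiset of children."""
    label: str
    children: list = field(default_factory=list)

    def canonical(self):
        """Order-independent form: the label and a sorted tuple of child forms."""
        return (self.label, tuple(sorted(c.canonical() for c in self.children)))


def same_tree(a, b):
    """Two data trees are equal when they agree up to child order."""
    return a.canonical() == b.canonical()


# Child order is irrelevant, but duplicates count (multiset, not set).
t1 = Node("A", [Node("B"), Node("B"), Node("E")])
t2 = Node("A", [Node("E"), Node("B"), Node("B")])
t3 = Node("A", [Node("B"), Node("E")])
print(same_tree(t1, t2))  # True
print(same_tree(t1, t3))  # False
```

Sorting the canonical forms of the children keeps repeated children distinct while ignoring their order.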


Framework Queries

Tree-Pattern With Join Queries

Queries: Tree-Pattern With Join (TPWJ)

Join: by value

Actually: any positive query

[Figure: example TPWJ query pattern; node labels A, B, C, D and val, with a "+" annotation.]


Framework Updates

Update Transactions

Set of elementary operations:

Insertions of subtrees

Deletions of subtrees

TPWJ query + mapping, stating where to perform operations.

Probabilistic update: update + confidence


Possible Worlds Model

Outline

1 Introduction

2 Framework

3 Possible Worlds Model
    Model
    Queries
    Updates

4 Simple Probabilistic Model

5 Fuzzy Tree Model

6 Conclusion

Possible Worlds Model Model

PW Model

Set of tree/probability pairs, one for each possible world.

Example PW set (figure), each world written as a nested tree:

A(C), P = 0.06

A(C(D)), P = 0.14

A(B, C), P = 0.24

A(B, C(D)), P = 0.56


Possible Worlds Model Queries

Queries on PW sets

Definition

If $T = \{(t_i, p_i)\}$, the result of query $Q$ over the PW set $T$ is the normalization of $\{(t, p_i) \mid t \in Q(t_i)\}$.

Normalization: grouping of duplicates (summing probabilities)

$(t, p) \in Q(T)$ means “the probability that $t$ matches $T$ is $p$”.
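The definition above can be sketched in Python. This is an illustration under assumptions: trees are encoded as nested tuples `(label, children)`, a query is any function from a world to a set of answer trees, and `normalize`, `query_pw` and `has_child_b` are hypothetical names. The four worlds are those of the PW Model slide.

```python
from collections import defaultdict


def normalize(pairs):
    """Group identical trees and sum their probabilities."""
    grouped = defaultdict(float)
    for tree, p in pairs:
        grouped[tree] += p
    return dict(grouped)


def query_pw(query, pw_set):
    """Evaluate `query` in every world; each answer inherits that world's probability."""
    return normalize((answer, p) for world, p in pw_set for answer in query(world))


# Trees as nested tuples (label, children); the four worlds of the PW Model slide.
A_C = ("A", (("C", ()),))
A_CD = ("A", (("C", (("D", ()),)),))
A_BC = ("A", (("B", ()), ("C", ())))
A_BCD = ("A", (("B", ()), ("C", (("D", ()),))))
pw_set = [(A_C, 0.06), (A_CD, 0.14), (A_BC, 0.24), (A_BCD, 0.56)]


def has_child_b(world):
    """Toy query (an assumption): answer A(B) if the root has a B child, else nothing."""
    _label, children = world
    if any(child[0] == "B" for child in children):
        return {("A", (("B", ()),))}
    return set()


result = query_pw(has_child_b, pw_set)
print(result)
```

The answer A(B) is produced by the two worlds that contain B, so normalization gives it a probability of about 0.24 + 0.56 = 0.8.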


Possible Worlds Model Updates

Updates on PW sets

Definition

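The result of an update $\tau$ with confidence $c$ on a PW set $T$, where $Q$ is the query of the update, is the normalization of:

$$\{(t, p) \in T \mid t \text{ is not selected by } Q\} \;\cup\; \{(\tau(t), p \cdot c) \mid (t, p) \in T,\ t \text{ is selected by } Q\} \;\cup\; \{(t, p \cdot (1 - c)) \mid (t, p) \in T,\ t \text{ is selected by } Q\}$$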
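A minimal Python sketch of this update rule follows. It assumes trees encoded as nested tuples `(label, children)`; `update_pw`, `has_b` and `delete_b` are hypothetical names, and the two-world PW set with probabilities 0.3 and 0.7 is made up for illustration.

```python
from collections import defaultdict


def update_pw(pw_set, selects, tau, c):
    """Apply update `tau` with confidence `c` to every world selected by `selects`."""
    out = defaultdict(float)
    for tree, p in pw_set:
        if selects(tree):
            out[tau(tree)] += p * c        # world where the update is applied
            out[tree] += p * (1 - c)       # world where the update is not applied
        else:
            out[tree] += p                 # unselected world stays untouched
    # Summing into a dict performs the normalization; drop zero-probability worlds.
    return {tree: p for tree, p in out.items() if p > 0}


A_C = ("A", (("C", ()),))
A_BC = ("A", (("B", ()), ("C", ())))
pw_set = [(A_C, 0.3), (A_BC, 0.7)]   # illustrative probabilities, not from the slides


def has_b(tree):
    """Selection: the root has a child labelled B."""
    return any(child[0] == "B" for child in tree[1])


def delete_b(tree):
    """Update: delete every B child of the root."""
    label, children = tree
    return (label, tuple(ch for ch in children if ch[0] != "B"))


updated = update_pw(pw_set, has_b, delete_b, 0.9)
print(updated)
```

With confidence 0.9, the world A(B, C) keeps probability of about 0.07, and A(C) collects its original 0.3 plus 0.63 from the updated world, so the total still sums to 1.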


Simple Probabilistic Model

Outline

1 Introduction

2 Framework

3 Possible Worlds Model

4 Simple Probabilistic Model
    Model and Possible Worlds Semantics
    Queries
    Incompleteness

5 Fuzzy Tree Model

Simple Probabilistic Model Model and Possible Worlds Semantics

SP Trees

Data tree with probability assigned to each node.

[Figure: example SP tree. Root A with children B (probability 0.8) and C; C has a child D (probability 0.7).]

Simple Probabilistic Model Model and Possible Worlds Semantics

PW Semantics of SP Trees

Assigning probability p to a node means: given that its parent is in the tree, the node is in the tree with probability p.

Possible worlds of the example SP tree (figure), each written as a nested tree:

A(C), P = 0.06

A(C(D)), P = 0.14

A(B, C), P = 0.24

A(B, C(D)), P = 0.56
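The four probabilities can be recomputed with a short Python sketch. It assumes an SP tree encoded as `(label, probability, children)`, unannotated nodes (A and C) taken as certain, and a hypothetical `worlds` function; it illustrates the semantics and is not the authors' code.

```python
from itertools import product


def worlds(node):
    """Yield (tree, probability) pairs for a node that is known to be present."""
    label, _prob, children = node
    # Each child is either absent (1 - p) or present (p) with one of its own worlds.
    options = []
    for child in children:
        p = child[1]
        child_options = [(None, 1 - p)] if p < 1 else []
        child_options += [(sub, p * q) for sub, q in worlds(child)]
        options.append(child_options)
    for combo in product(*options):
        kept = tuple(sub for sub, _ in combo if sub is not None)
        prob = 1.0
        for _, q in combo:
            prob *= q
        # With duplicate children, identical trees would still need to be grouped.
        yield (label, kept), prob


# SP tree of the example: B has probability 0.8, D has 0.7; A and C are certain.
sp_tree = ("A", 1.0, [("B", 0.8, []), ("C", 1.0, [("D", 0.7, [])])])
for tree, prob in worlds(sp_tree):
    print(tree, round(prob, 2))
```

For instance, the world A(C) requires both B and D to be absent, which gives (1 − 0.8) · (1 − 0.7) = 0.06.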


Simple Probabilistic Model Queries

Queries on SP Trees

Definition

Queries on SP trees:

Query on underlying tree.

Probabilities: multiplication of probabilities of nodes of the mapping.

Theorem

$Q(\llbracket T \rrbracket) = Q(T)$, where $\llbracket T \rrbracket$ denotes the PW semantics of the SP tree $T$.


Simple Probabilistic Model Incompleteness

Incompleteness of SP Trees

Theorem

The SP tree model is incomplete.

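Counterexample (figure), a PW set with three worlds, each written as a nested tree:

A(C), P = 0.06

A(C(D)), P = 0.70

A(B, C), P = 0.24

B and D never occur together here, whereas an SP tree assigns node probabilities independently.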

Theorem

SP trees are not closed under updates.


Fuzzy Tree Model

Outline

1 Introduction

2 Framework

3 Possible Worlds Model

4 Simple Probabilistic Model

5 Fuzzy Tree Model
    Model and Possible Worlds Semantics
    Queries
    Updates
    Implementation

6 Conclusion

Fuzzy Tree Model Model and Possible Worlds Semantics

Fuzzy Trees

Data tree with event conditions (conjunction of probabilistic events or negations of probabilistic events) assigned to each node.

[Figure: example fuzzy tree. Root A with children B (condition w1, ¬w2) and C; C has a child D (condition w2).]

Event    Proba.
w1       0.8
w2       0.7

Fuzzy Tree Model Model and Possible Worlds Semantics

PW Semantics of Fuzzy Trees

Assigning condition C to a node means: the node is in the tree if C is true and its parent is in the tree.

Possible worlds of the example fuzzy tree (figure), each written as a nested tree:

A(C), P = 0.06

A(C(D)), P = 0.70

A(B, C), P = 0.24

Theorem

The fuzzy tree model is as expressive as the PW model.
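The possible worlds of the example fuzzy tree can be recomputed by enumerating truth assignments. The sketch below assumes independent events, a fuzzy tree encoded as `(label, condition, children)` with conditions as dictionaries from event name to required truth value, and hypothetical function names.

```python
from collections import defaultdict
from itertools import product


def tree_under(node, valuation):
    """Return the tree kept under a truth assignment, or None if the node is dropped."""
    label, condition, children = node
    if any(valuation[event] != wanted for event, wanted in condition.items()):
        return None
    kept = [tree_under(child, valuation) for child in children]
    return (label, tuple(sorted(k for k in kept if k is not None)))


def possible_worlds(root, event_probs):
    """Sum the probability of every truth assignment that yields the same tree."""
    events = sorted(event_probs)
    result = defaultdict(float)
    for values in product([True, False], repeat=len(events)):
        valuation = dict(zip(events, values))
        weight = 1.0
        for event in events:
            p = event_probs[event]
            weight *= p if valuation[event] else 1 - p
        result[tree_under(root, valuation)] += weight
    return dict(result)


# Example fuzzy tree: B requires w1 and not w2; D requires w2.
fuzzy = ("A", {}, [("B", {"w1": True, "w2": False}, []),
                   ("C", {}, [("D", {"w2": True}, [])])])
pw = possible_worlds(fuzzy, {"w1": 0.8, "w2": 0.7})
for tree, prob in pw.items():
    print(tree, round(prob, 2))
```

Because B and D depend on the same event w2 with opposite signs, no world contains both, which is the dependency the SP tree model could not express.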


Fuzzy Tree Model Queries

Queries on Fuzzy Trees

Definition

Queries on fuzzy trees:

Query on underlying tree.

Probabilities: probability of the conjunction of the conditions of nodes of the mapping.

Theorem

$Q(\llbracket T \rrbracket) = Q(T)$, where $\llbracket T \rrbracket$ denotes the PW semantics of the fuzzy tree $T$.
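The probability of a conjunction of conditions can be sketched as follows, assuming independent events and conditions encoded as dictionaries from event name to required truth value. The function name `conjunction_probability` is hypothetical.

```python
def conjunction_probability(conditions, event_probs):
    """Probability that all the given conditions hold together."""
    merged = {}
    for condition in conditions:
        for event, wanted in condition.items():
            if merged.setdefault(event, wanted) != wanted:
                return 0.0  # the same event is required both true and false
    prob = 1.0
    for event, wanted in merged.items():
        p = event_probs[event]
        prob *= p if wanted else 1 - p
    return prob


event_probs = {"w1": 0.8, "w2": 0.7}
cond_b = {"w1": True, "w2": False}   # condition on B in the example fuzzy tree
cond_d = {"w2": True}                # condition on D in the example fuzzy tree

print(conjunction_probability([cond_b], event_probs))          # about 0.24
print(conjunction_probability([cond_b, cond_d], event_probs))  # 0.0
print(conjunction_probability([cond_d, cond_d], event_probs))  # 0.7, not 0.49
```

Unlike the plain multiplication used for SP trees, an event shared by several nodes is counted once, and contradictory conditions give probability 0.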


Fuzzy Tree Model Updates

Updates on Fuzzy Trees

Insertions: no problem. The conditions required for the query to match are added to the inserted nodes.

Deletions: possible, but more problematic. They may cause exponential growth of the fuzzy tree in case of complex dependencies.

Theorem

If $(\tau, c)$ is a probabilistic update transaction and $T$ a fuzzy tree: $\llbracket (\tau, c)(T) \rrbracket = (\tau, c)(\llbracket T \rrbracket)$


Fuzzy Tree Model Updates

Example: Deduplication

Deduplication of the two B nodes with confidence 0.9.

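Before (figure): root A with two children, B (condition w1) and B (condition w2).

Event    Proba.
w1       0.8
w2       0.7

After (figure): root A with five children:

B (condition w1, ¬w3)

B (condition w1, w3, ¬w2)

B (condition w2, ¬w3)

B (condition w2, w3, ¬w1)

B (condition w1, w2, w3)

Event    Proba.
w1       0.8
w2       0.7
w3       0.9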
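The resulting conditions can be checked by enumeration. The sketch below assumes independent events and one reading of the example: when w3 holds, whichever B nodes are present collapse into a single B, and when w3 does not hold, the original tree is unchanged. All names are illustrative.

```python
from itertools import product

# Conditions on the five B nodes after the update (True = event, False = negation).
after = [
    {"w1": True, "w3": False},
    {"w1": True, "w3": True, "w2": False},
    {"w2": True, "w3": False},
    {"w2": True, "w3": True, "w1": False},
    {"w1": True, "w2": True, "w3": True},
]
probs = {"w1": 0.8, "w2": 0.7, "w3": 0.9}


def holds(condition, valuation):
    """A condition holds when every listed event has the required truth value."""
    return all(valuation[e] == wanted for e, wanted in condition.items())


distribution = {0: 0.0, 1: 0.0, 2: 0.0}   # probability of seeing 0, 1 or 2 B nodes
for w1, w2, w3 in product([True, False], repeat=3):
    valuation = {"w1": w1, "w2": w2, "w3": w3}
    count = sum(holds(c, valuation) for c in after)
    # Assumed reading: with w3, present B nodes merge into one; without w3, nothing changes.
    expected = min(1, w1 + w2) if w3 else w1 + w2
    assert count == expected
    weight = 1.0
    for event, value in valuation.items():
        weight *= probs[event] if value else 1 - probs[event]
    distribution[count] += weight

print({k: round(v, 3) for k, v in distribution.items()})
```

Two B nodes survive only when w1 and w2 hold and w3 does not, that is with probability 0.8 · 0.7 · 0.1 = 0.056.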


Fuzzy Tree Model Implementation

Implementation

Java-based

File system storage (will look at an XML DB next)

Query evaluation: Qizx/open XQuery engine

Updates expressed in XUpdate

cf. Demo


Conclusion

Outline

1 Introduction

2 Framework

3 Possible Worlds Model

4 Simple Probabilistic Model

5 Fuzzy Tree Model

6 Conclusion
    Summary
    Perspectives

Conclusion Summary

Summary

A model for representing probabilistic information in a semi-structured database.

Special focus on updating, for managing a probabilistic warehouse.

More expressive than, and as concise as, the simple probabilistic (SP) model; more concise than, and as expressive as, the naïve possible worlds (PW) model.


Conclusion Perspectives

Perspectives

Implementation finalization.

Full complexity analysis.

More expressive queries (negative joins?), while keeping efficiency.

Optimizations, simplifications of a fuzzy tree.

Validation of fuzzy trees.
