• Aucun résultat trouvé

Querying and Updating Probabilistic Information in XML

N/A
N/A
Protected

Academic year: 2022

Partager "Querying and Updating Probabilistic Information in XML"

Copied!
66
0
0

Texte intégral

(1)

Querying and Updating Probabilistic Information in XML

Serge Abiteboul Pierre Senellart

ICCL Summer School 2006 July 3rd, 2006

(2)

Introduction Understanding the Hidden Web

The Hidden Web

Definition (Hidden Web)

The set of webpages (which may or may not be dynamically

generated) not accessible from thehyperlinked structureof the World Wide Web.

Size estimate (2001): 500 times larger than thesurface Web.

How to understand it and benefit from its content?

(3)

Introduction Understanding the Hidden Web

The Hidden Web

Definition (Hidden Web)

The set of webpages (which may or may not be dynamically

generated) not accessible from thehyperlinked structureof the World Wide Web.

Size estimate (2001): 500 times larger than thesurface Web.

How to understand it and benefit from its content?

(4)

Introduction Understanding the Hidden Web

The Hidden Web

Definition (Hidden Web)

The set of webpages (which may or may not be dynamically

generated) not accessible from thehyperlinked structureof the World Wide Web.

Size estimate (2001): 500 times larger than thesurface Web.

How to understand it and benefit from its content?

(5)

Introduction Understanding the Hidden Web

Semantic Interpretation of the Hidden Web

World Wide Web

(6)

Introduction Understanding the Hidden Web

Semantic Interpretation of the Hidden Web

World Wide Web

HTML form WSDL discovery UDDI

World Wide Web

(7)

Introduction Understanding the Hidden Web

Semantic Interpretation of the Hidden Web

World Wide Web

HTML form WSDL discovery UDDI

World Wide Web

WSDL wrappers HTML form

WSDL discovery UDDI

World Wide Web

(8)

Introduction Understanding the Hidden Web

Semantic Interpretation of the Hidden Web

World Wide Web

HTML form WSDL discovery UDDI

World Wide Web

WSDL wrappers HTML form

WSDL discovery UDDI

World Wide Web

analysis

Analyzed Web Services WSDL wrappers HTML form

WSDL discovery UDDI

World Wide Web

(9)

Introduction Understanding the Hidden Web

Semantic Interpretation of the Hidden Web

World Wide Web

HTML form WSDL discovery UDDI

World Wide Web

WSDL wrappers HTML form

WSDL discovery UDDI

World Wide Web

analysis

Analyzed Web Services WSDL wrappers HTML form

WSDL discovery UDDI

World Wide Web

Web Services Index indexing

Web Services Index indexing

analysis

Analyzed Web Services WSDL wrappers HTML form

WSDL discovery UDDI

World Wide Web

(10)

Introduction Understanding the Hidden Web

Semantic Interpretation of the Hidden Web

World Wide Web

HTML form WSDL discovery UDDI

World Wide Web

WSDL wrappers HTML form

WSDL discovery UDDI

World Wide Web

analysis

Analyzed Web Services WSDL wrappers HTML form

WSDL discovery UDDI

World Wide Web

Web Services Index indexing

Web Services Index indexing

analysis

Analyzed Web Services WSDL wrappers HTML form

WSDL discovery UDDI

World Wide Web

Results Queries querying

Web Services Index indexing

Web Services Index indexing

analysis

Analyzed Web Services WSDL wrappers HTML form

WSDL discovery UDDI

World Wide Web

(11)

Introduction Imprecise data

Imprecise data

Many tasks generateimprecisedata, with someconfidencevalue:

Information Extraction

Natural Language Processing Data Cleaning

Schema Matching . . .

Need for a way to manage this imprecision, to work with it throughout an entire complex process.

(12)

Introduction Imprecise data

Imprecise data

Many tasks generateimprecisedata, with someconfidencevalue:

Information Extraction

Natural Language Processing Data Cleaning

Schema Matching . . .

Need for a way to manage this imprecision, to work with it throughout an entire complex process.

(13)

Introduction Imprecise data

Imprecise data

Many tasks generateimprecisedata, with someconfidencevalue:

Information Extraction

Natural Language Processing

Data Cleaning Schema Matching . . .

Need for a way to manage this imprecision, to work with it throughout an entire complex process.

(14)

Introduction Imprecise data

Imprecise data

Many tasks generateimprecisedata, with someconfidencevalue:

Information Extraction

Natural Language Processing Data Cleaning

Schema Matching . . .

Need for a way to manage this imprecision, to work with it throughout an entire complex process.

(15)

Introduction Imprecise data

Imprecise data

Many tasks generateimprecisedata, with someconfidencevalue:

Information Extraction

Natural Language Processing Data Cleaning

Schema Matching

. . .

Need for a way to manage this imprecision, to work with it throughout an entire complex process.

(16)

Introduction Imprecise data

Imprecise data

Many tasks generateimprecisedata, with someconfidencevalue:

Information Extraction

Natural Language Processing Data Cleaning

Schema Matching . . .

Need for a way to manage this imprecision, to work with it throughout an entire complex process.

(17)

Introduction Imprecise data

Imprecise data

Many tasks generateimprecisedata, with someconfidencevalue:

Information Extraction

Natural Language Processing Data Cleaning

Schema Matching . . .

Need for a way to manage this imprecision, to work with it throughout an entire complex process.

(18)

Introduction A Probabilistic XML Warehouse

A Probabilistic XML Warehouse

Query interface (tree-pattern with

join) Update interface

(insertions, deletions)

Probabilistic XML Warehouse

Module 1transactionUpdate

}

Module 2 Module 3 + confidence

+ confidence

Query Results

(19)

Framework

Outline

1 Introduction

2 Framework Data Trees Queries Updates

3 Possible Worlds Model

4 Simple Probabilistic Model

5 Fuzzy Tree Model

6 Conclusion

(20)

Framework Data Trees

Data Trees

Finite,unordered, trees.

No distinction betweenattributeand element nodes. Nomixedcontent.

A

B B E D

foo foo C F

b a r

n e e

(21)

Framework Data Trees

Data Trees

Finite,unordered, trees.

No distinction betweenattributeand element nodes.

Nomixedcontent.

A

B B E D

foo foo C F

b a r

n e e

(22)

Framework Data Trees

Data Trees

Finite,unordered, trees.

No distinction betweenattributeand element nodes.

Nomixedcontent.

A

B B E D

foo foo C F

b a r

n e e

(23)

Framework Queries

Tree-Pattern With Join Queries

Queries: Tree-Pattern With Join(TPWJ) (standard subset of XQuery)

Join: by value

Result: minimal subtree containing all the nodes mapped by the query

A

B C

+

D

val

(24)

Framework Queries

Tree-Pattern With Join Queries

Queries: Tree-Pattern With Join(TPWJ) (standard subset of XQuery)

Join: by value

Result: minimal subtree containing all the nodes mapped by the query

A

B C

+

D

val

(25)

Framework Queries

Tree-Pattern With Join Queries

Queries: Tree-Pattern With Join(TPWJ) (standard subset of XQuery)

Join: by value

Result: minimal subtree containing all the nodes mapped by the query

A

B C

+

D

val

(26)

Framework Updates

Update Transactions

Setof elementary operations:

Insertionsof subtrees Deletionsof subtrees

Update Transaction: TPWJ query + mapping, statingwhereto perform operations.

Probabilistic update: update +confidence

(27)

Framework Updates

Update Transactions

Setof elementary operations:

Insertionsof subtrees

Deletionsof subtrees

Update Transaction: TPWJ query + mapping, statingwhereto perform operations.

Probabilistic update: update +confidence

(28)

Framework Updates

Update Transactions

Setof elementary operations:

Insertionsof subtrees Deletionsof subtrees

Update Transaction: TPWJ query + mapping, statingwhereto perform operations.

Probabilistic update: update +confidence

(29)

Framework Updates

Update Transactions

Setof elementary operations:

Insertionsof subtrees Deletionsof subtrees

Update Transaction: TPWJ query + mapping, statingwhereto perform operations.

Probabilistic update: update +confidence

(30)

Framework Updates

Update Transactions

Setof elementary operations:

Insertionsof subtrees Deletionsof subtrees

Update Transaction: TPWJ query + mapping, statingwhereto perform operations.

Probabilistic update: update +confidence

(31)

Possible Worlds Model

Outline

1 Introduction

2 Framework

3 Possible Worlds Model Model

Semantic Foundation

4 Simple Probabilistic Model

5 Fuzzy Tree Model

6 Conclusion

(32)

Possible Worlds Model Model

Possible Worlds Model

Semantic foundationfor probabilistic data: possible worlds model.

Set oftree/probability pairs, one for eachpossible world.

A

C

A

C

D

A

B C

A

B C

D

P =0.06 P =0.14 P =0.24 P =0.56

(33)

Possible Worlds Model Model

Possible Worlds Model

Semantic foundationfor probabilistic data: possible worlds model.

Set oftree/probability pairs, one for eachpossible world.

A

C

A

C

D

A

B C

A

B C

D

P =0.06 P =0.14 P =0.24 P =0.56

(34)

Possible Worlds Model Semantic Foundation

Queries, Updates: Semantic Foundation

Definition

IfT ={(ti,pi)}, the result of queryQover the Possible Worlds setT is the normalization of{(t,pi)|t ∈Q(ti)}

Definition

The result of an updateτ with queryQand confidencec on a Possible Worlds setT is the normalization of:

{(t,p)∈T |tis not selected by Q} S {(τ(t),p·c)|t is selected by Q} S {(t,p·(1−c))|t is selected by Q}

(35)

Possible Worlds Model Semantic Foundation

Queries, Updates: Semantic Foundation

Definition

IfT ={(ti,pi)}, the result of queryQover the Possible Worlds setT is the normalization of{(t,pi)|t ∈Q(ti)}

Definition

The result of an updateτ with queryQand confidencec on a Possible Worlds setT is the normalization of:

{(t,p)∈T |tis not selected by Q} S {(τ(t),p·c)|t is selected by Q} S {(t,p·(1−c))|t is selected by Q}

(36)

Simple Probabilistic Model

Outline

1 Introduction

2 Framework

3 Possible Worlds Model

4 Simple Probabilistic Model

Model and Possible Worlds Semantics Incompleteness

5 Fuzzy Tree Model

6 Conclusion

(37)

Simple Probabilistic Model Model and Possible Worlds Semantics

SP Trees

Data tree withprobability assigned to each node.

A

B 0 . 8

C

D 0 . 7

(38)

Simple Probabilistic Model Model and Possible Worlds Semantics

PW Semantics of SP Trees

A node is assigned the probabilityp: means that the probability the node isin the treeifits parent is in the treeisp.

A

C

A

C

D

A

B C

A

B C

D

P =0.06 P =0.14 P =0.24 P =0.56

(39)

Simple Probabilistic Model Incompleteness

Incompleteness of SP Trees

Theorem

The SP tree model is incomplete.

A

C

A

C

D

A

B C

P=0.06 P =0.70 P =0.24

Theorem

SP trees are not closed under updates.

(40)

Simple Probabilistic Model Incompleteness

Incompleteness of SP Trees

Theorem

The SP tree model is incomplete.

A

C

A

C

D

A

B C

P=0.06 P =0.70 P =0.24 Theorem

SP trees are not closed under updates.

(41)

Fuzzy Tree Model

Outline

1 Introduction

2 Framework

3 Possible Worlds Model

4 Simple Probabilistic Model

5 Fuzzy Tree Model

Model and Possible Worlds Semantics Queries

Updates

Implementation

6 Conclusion

(42)

Fuzzy Tree Model Model and Possible Worlds Semantics

Fuzzy Trees

Data tree withevent conditions(conjunction of probabilistic events or negations of probabilistic events)assigned to each node.

A

B w 1 , ¬ w 2

C

D w 2

Event Proba.

w1 0.8

w2 0.7

semantics

−→

A

C

A

C

D

A

B C

P=0.06 P=0.70 P=0.24

Theorem

The Fuzzy Tree model is as expressive as the Possible Worlds model.

(43)

Fuzzy Tree Model Model and Possible Worlds Semantics

Fuzzy Trees

Data tree withevent conditions(conjunction of probabilistic events or negations of probabilistic events)assigned to each node.

A

B w 1 , ¬ w 2

C

D w 2

Event Proba.

w1 0.8

w2 0.7

semantics

−→

A

C

A

C

D

A

B C

P=0.06 P=0.70 P=0.24

Theorem

The Fuzzy Tree model is as expressive as the Possible Worlds model.

(44)

Fuzzy Tree Model Model and Possible Worlds Semantics

Fuzzy Trees

Data tree withevent conditions(conjunction of probabilistic events or negations of probabilistic events)assigned to each node.

A

B w 1 , ¬ w 2

C

D w 2

Event Proba.

w1 0.8

w2 0.7

semantics

−→

A

C

A

C

D

A

B C

P=0.06 P=0.70 P=0.24

Theorem

The Fuzzy Tree model is as expressive as the Possible Worlds model.

(45)

Fuzzy Tree Model Queries

Queries on Fuzzy Trees

Definition

Queries on fuzzy trees:

Queryon underlying tree.

Probabilities: probability of the conjunction of the conditions of nodes of the mapping.

Theorem

Fuzzy Tree Fuzzy Tree

Possible Worlds Possible Worlds

query

semantics

query

semantics

(46)

Fuzzy Tree Model Queries

Queries on Fuzzy Trees

Definition

Queries on fuzzy trees:

Queryon underlying tree.

Probabilities: probability of the conjunction of the conditions of nodes of the mapping.

Theorem

Fuzzy Tree Fuzzy Tree

Possible Worlds Possible Worlds

query

semantics

query

semantics

(47)

Fuzzy Tree Model Updates

Updates on Fuzzy Trees

Insertions: no problem. Conditions required for the query to match added to inserted nodes.

Deletions: ok, but more problematic. May yield an exponential growth of the fuzzy tree in case of complex dependencies.

Theorem

Fuzzy Tree Fuzzy Tree

Possible Worlds Possible Worlds

update

semantics

update

semantics

(48)

Fuzzy Tree Model Updates

Updates on Fuzzy Trees

Insertions: no problem. Conditions required for the query to match added to inserted nodes.

Deletions: ok, but more problematic. May yield an exponential growth of the fuzzy tree in case of complex dependencies.

Theorem

Fuzzy Tree Fuzzy Tree

Possible Worlds Possible Worlds

update

semantics

update

semantics

(49)

Fuzzy Tree Model Updates

Updates on Fuzzy Trees

Insertions: no problem. Conditions required for the query to match added to inserted nodes.

Deletions: ok, but more problematic. May yield an exponential growth of the fuzzy tree in case of complex dependencies.

Theorem

Fuzzy Tree Fuzzy Tree

Possible Worlds Possible Worlds

update

semantics

update

semantics

(50)

Fuzzy Tree Model Updates

Example: Conditional Replacement

Replacement ofCbyDifBis present, with confidence 0.9.

A

B w 1

C w 2

Event Proba.

w1 0.8

w2 0.7

A

B

w1

C

¬w1,w2

C

w1,w2,¬w3

D

w1,w2,w3

Event Proba.

w1 0.8

w2 0.7

w3 0.9

(51)

Fuzzy Tree Model Updates

Example: Conditional Replacement

Replacement ofCbyDifBis present, with confidence 0.9.

A

B w 1

C w 2

Event Proba.

w1 0.8

w2 0.7

A

B

w1

C

¬w1,w2

C

w1,w2,¬w3

D

w1,w2,w3

Event Proba.

w1 0.8

w2 0.7

w3 0.9

(52)

Fuzzy Tree Model Implementation

Implementation

Java-based

File system storage (will look at an XML DB next) Query evaluation: Qizx/open XQuery engine Updates expressed in XUpdate

Available freely at

http://pierre.senellart.com/software/fuzzyxml/ cf demo

http://pierre.senellart.com/software/fuzzyxml/demo/

(53)

Fuzzy Tree Model Implementation

Implementation

Java-based

File system storage (will look at an XML DB next)

Query evaluation: Qizx/open XQuery engine Updates expressed in XUpdate

Available freely at

http://pierre.senellart.com/software/fuzzyxml/ cf demo

http://pierre.senellart.com/software/fuzzyxml/demo/

(54)

Fuzzy Tree Model Implementation

Implementation

Java-based

File system storage (will look at an XML DB next) Query evaluation: Qizx/open XQuery engine

Updates expressed in XUpdate Available freely at

http://pierre.senellart.com/software/fuzzyxml/ cf demo

http://pierre.senellart.com/software/fuzzyxml/demo/

(55)

Fuzzy Tree Model Implementation

Implementation

Java-based

File system storage (will look at an XML DB next) Query evaluation: Qizx/open XQuery engine Updates expressed in XUpdate

Available freely at

http://pierre.senellart.com/software/fuzzyxml/ cf demo

http://pierre.senellart.com/software/fuzzyxml/demo/

(56)

Fuzzy Tree Model Implementation

Implementation

Java-based

File system storage (will look at an XML DB next) Query evaluation: Qizx/open XQuery engine Updates expressed in XUpdate

Available freely at

http://pierre.senellart.com/software/fuzzyxml/

cf demo

http://pierre.senellart.com/software/fuzzyxml/demo/

(57)

Fuzzy Tree Model Implementation

Implementation

Java-based

File system storage (will look at an XML DB next) Query evaluation: Qizx/open XQuery engine Updates expressed in XUpdate

Available freely at

http://pierre.senellart.com/software/fuzzyxml/

cf demo

http://pierre.senellart.com/software/fuzzyxml/demo/

(58)

Conclusion

Outline

1 Introduction

2 Framework

3 Possible Worlds Model

4 Simple Probabilistic Model

5 Fuzzy Tree Model

6 Conclusion Summary Perspectives

(59)

Conclusion Summary

Summary

A model for representingprobabilisticinformation for semi-structureddata.

Soundandcompletesupport for an important subset of XQuery. Soundandcompletesupport for XUpdate-based transactions with inserts and deletes.

An implementation based on compilation to XQuery/XUpdate.

(60)

Conclusion Summary

Summary

A model for representingprobabilisticinformation for semi-structureddata.

Soundandcompletesupport for an important subset of XQuery.

Soundandcompletesupport for XUpdate-based transactions with inserts and deletes.

An implementation based on compilation to XQuery/XUpdate.

(61)

Conclusion Summary

Summary

A model for representingprobabilisticinformation for semi-structureddata.

Soundandcompletesupport for an important subset of XQuery.

Soundandcompletesupport for XUpdate-based transactions with inserts and deletes.

An implementation based on compilation to XQuery/XUpdate.

(62)

Conclusion Summary

Summary

A model for representingprobabilisticinformation for semi-structureddata.

Soundandcompletesupport for an important subset of XQuery.

Soundandcompletesupport for XUpdate-based transactions with inserts and deletes.

An implementation based on compilation to XQuery/XUpdate.

(63)

Conclusion Perspectives

Perspectives

Complexity analysis: query, update, simplification.

Queryoptimization. Fuzzy datasimplification.

Extensions: negation, some limited order.

(64)

Conclusion Perspectives

Perspectives

Complexity analysis: query, update, simplification.

Queryoptimization.

Fuzzy datasimplification.

Extensions: negation, some limited order.

(65)

Conclusion Perspectives

Perspectives

Complexity analysis: query, update, simplification.

Queryoptimization.

Fuzzy datasimplification.

Extensions: negation, some limited order.

(66)

Conclusion Perspectives

Perspectives

Complexity analysis: query, update, simplification.

Queryoptimization.

Fuzzy datasimplification.

Extensions: negation, some limited order.

Références

Documents relatifs

Les schémas de métadonnées sont maintenus dans la mémoire du médiateur et classés par source, espace de noms, collection et chemin pour un accès rapide lors du traitement

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of

The correct data coordination is the crucial in peer-to-peer databases, it is necessary for correct processing of queries and other database requests.. Our work is focused on updates

Running time of the different algorithms on a given query of. the

Updating probabilistic XML documents highlights the following issue in mod- els different from ProTDB and PrXML { fie } : the model may lack the property of being a

Our system, ProApproX, has the characteristic of not relying on a single algorithm to evaluate the probability of a lineage formula but of deciding on the algorithm to be used based

 Processes the lineage formula to get the probability of the query (and of each matching

The demonstration presents the graphical user interface of ProApproX 2.0, that allows a user to input an XPath query and approximation parameters, and lists query results with