• Aucun résultat trouvé

Querying and Updating Probabilistic Information in XML

N/A
N/A
Protected

Academic year: 2022

Partager "Querying and Updating Probabilistic Information in XML"

Copied!
52
0
0

Texte intégral

(1)

Querying and Updating Probabilistic Information in XML

Serge Abiteboul Pierre Senellart

EDBT 2006 March 28th, 2006

(2)

Introduction Imprecise data

Imprecise data

Many tasks generateimprecisedata, with someconfidencevalue:

Information Extraction

Natural Language Processing Data Cleaning

Schema Matching . . .

Need for a way to manage this imprecision, to work with it throughout an entire complex process.

(3)

Introduction Imprecise data

Imprecise data

Many tasks generateimprecisedata, with someconfidencevalue:

Information Extraction

Natural Language Processing Data Cleaning

Schema Matching . . .

Need for a way to manage this imprecision, to work with it throughout an entire complex process.

(4)

Introduction Imprecise data

Imprecise data

Many tasks generateimprecisedata, with someconfidencevalue:

Information Extraction

Natural Language Processing

Data Cleaning Schema Matching . . .

Need for a way to manage this imprecision, to work with it throughout an entire complex process.

(5)

Introduction Imprecise data

Imprecise data

Many tasks generateimprecisedata, with someconfidencevalue:

Information Extraction

Natural Language Processing Data Cleaning

Schema Matching . . .

Need for a way to manage this imprecision, to work with it throughout an entire complex process.

(6)

Introduction Imprecise data

Imprecise data

Many tasks generateimprecisedata, with someconfidencevalue:

Information Extraction

Natural Language Processing Data Cleaning

Schema Matching

. . .

Need for a way to manage this imprecision, to work with it throughout an entire complex process.

(7)

Introduction Imprecise data

Imprecise data

Many tasks generateimprecisedata, with someconfidencevalue:

Information Extraction

Natural Language Processing Data Cleaning

Schema Matching . . .

Need for a way to manage this imprecision, to work with it throughout an entire complex process.

(8)

Introduction Imprecise data

Imprecise data

Many tasks generateimprecisedata, with someconfidencevalue:

Information Extraction

Natural Language Processing Data Cleaning

Schema Matching . . .

Need for a way to manage this imprecision, to work with it throughout an entire complex process.

(9)

Introduction A Probabilistic XML Warehouse

A Probabilistic XML Warehouse

Query interface (tree-pattern with

join) Update interface

(insertions, deletions)

Probabilistic XML Warehouse

Module 1transactionUpdate

}

Module 2 Module 3 + confidence

+ confidence

Query Results

(10)

Framework

Outline

1 Introduction

2 Framework Data Trees Queries Updates

3 Possible Worlds Model

4 Fuzzy Tree Model

(11)

Framework Data Trees

Data Trees

Finite,unordered, trees.

No distinction betweenattributeand element nodes. Nomixedcontent.

A

B B E D

foo foo C F

b a r

n e e

(12)

Framework Data Trees

Data Trees

Finite,unordered, trees.

No distinction betweenattributeand element nodes.

Nomixedcontent.

A

B B E D

foo foo C F n e e

(13)

Framework Data Trees

Data Trees

Finite,unordered, trees.

No distinction betweenattributeand element nodes.

Nomixedcontent.

A

B B E D

foo foo C F

b a r

n e e

(14)

Framework Queries

Tree-Pattern With Join Queries

Queries: Tree-Pattern With Join(TPWJ) (standard subset of XQuery)

Join: by value

Result: minimal subtree containing all the nodes mapped by the query

A

B C

+

D

(15)

Framework Queries

Tree-Pattern With Join Queries

Queries: Tree-Pattern With Join(TPWJ) (standard subset of XQuery)

Join: by value

Result: minimal subtree containing all the nodes mapped by the query

A

B C

+

D

val

(16)

Framework Queries

Tree-Pattern With Join Queries

Queries: Tree-Pattern With Join(TPWJ) (standard subset of XQuery)

Join: by value

Result: minimal subtree containing all the nodes mapped by the query

A

B C

+

D

(17)

Framework Updates

Update Transactions

Setof elementary operations:

Insertionsof subtrees Deletionsof subtrees

Update Transaction: TPWJ query + mapping, statingwhereto perform operations.

Probabilistic update: update +confidence

(18)

Framework Updates

Update Transactions

Setof elementary operations:

Insertionsof subtrees

Deletionsof subtrees

Update Transaction: TPWJ query + mapping, statingwhereto perform operations.

Probabilistic update: update +confidence

(19)

Framework Updates

Update Transactions

Setof elementary operations:

Insertionsof subtrees Deletionsof subtrees

Update Transaction: TPWJ query + mapping, statingwhereto perform operations.

Probabilistic update: update +confidence

(20)

Framework Updates

Update Transactions

Setof elementary operations:

Insertionsof subtrees Deletionsof subtrees

Update Transaction: TPWJ query + mapping, statingwhereto perform operations.

Probabilistic update: update +confidence

(21)

Framework Updates

Update Transactions

Setof elementary operations:

Insertionsof subtrees Deletionsof subtrees

Update Transaction: TPWJ query + mapping, statingwhereto perform operations.

Probabilistic update: update +confidence

(22)

Possible Worlds Model

Outline

1 Introduction

2 Framework

3 Possible Worlds Model Model

Semantic Foundation

4 Fuzzy Tree Model

(23)

Possible Worlds Model Model

Possible Worlds Model

Semantic foundationfor probabilistic data: possible worlds model.

Set oftree/probability pairs, one for eachpossible world.

A

C

A

C

D

A

B C

A

B C

D

P =0.06 P =0.14 P =0.24 P =0.56

(24)

Possible Worlds Model Model

Possible Worlds Model

Semantic foundationfor probabilistic data: possible worlds model.

Set oftree/probability pairs, one for eachpossible world.

A

C

A

C

D

A

B C

A

B C

D

P =0.06 P =0.14 P =0.24 P =0.56

(25)

Possible Worlds Model Semantic Foundation

Queries, Updates: Semantic Foundation

Definition

IfT ={(ti,pi)}, the result of queryQover the Possible Worlds setT is the normalization of{(t,pi)|t ∈Q(ti)}

Definition

The result of an updatet with confidencec on a Possible Worlds setT is the normalization of:

{(t,p)∈T |tis not selected by Q} S {(τ(t),p·c)|t is selected by Q} S {(t,p·(1−c))|t is selected by Q}

(26)

Possible Worlds Model Semantic Foundation

Queries, Updates: Semantic Foundation

Definition

IfT ={(ti,pi)}, the result of queryQover the Possible Worlds setT is the normalization of{(t,pi)|t ∈Q(ti)}

Definition

The result of an updatet with confidencec on a Possible Worlds setT is the normalization of:

{(t,p)∈T |tis not selected by Q} S {(τ(t),p·c)|t is selected by Q} S {(t,p·(1−c))|t is selected by Q}

(27)

Fuzzy Tree Model

Outline

1 Introduction

2 Framework

3 Possible Worlds Model

4 Fuzzy Tree Model

Model and Possible Worlds Semantics Queries

Updates

Implementation

5 Conclusion

(28)

Fuzzy Tree Model Model and Possible Worlds Semantics

Fuzzy Trees

Data tree withevent conditions(conjunction of probabilistic events or negations of probabilistic events)assigned to each node.

A

B w 1 , ¬ w 2

C

D w 2

Event Proba.

w1 0.8

w2 0.7

semantics

−→

A

C

A

C

D

A

B C

P=0.06 P=0.70 P=0.24

Theorem

The fuzzy tree model is as expressive as the Possible Worlds model.

(29)

Fuzzy Tree Model Model and Possible Worlds Semantics

Fuzzy Trees

Data tree withevent conditions(conjunction of probabilistic events or negations of probabilistic events)assigned to each node.

A

B w 1 , ¬ w 2

C

D w 2

Event Proba.

w1 0.8

w2 0.7

semantics

−→

A

C

A

C

D

A

B C

P=0.06 P=0.70 P=0.24

Theorem

The fuzzy tree model is as expressive as the Possible Worlds model.

(30)

Fuzzy Tree Model Model and Possible Worlds Semantics

Fuzzy Trees

Data tree withevent conditions(conjunction of probabilistic events or negations of probabilistic events)assigned to each node.

A

B w 1 , ¬ w 2

C

D w 2

Event Proba.

w1 0.8

w2 0.7

semantics

−→

A

C

A

C

D

A

B C

P=0.06 P=0.70 P=0.24

(31)

Fuzzy Tree Model Queries

Queries on Fuzzy Trees

Definition

Queries on fuzzy trees:

Queryon underlying tree.

Probabilities: probability of the conjunction of the conditions of nodes of the mapping.

Theorem

Fuzzy Tree Fuzzy Tree

Possible Worlds Possible Worlds

query

query

semantics semantics

(32)

Fuzzy Tree Model Queries

Queries on Fuzzy Trees

Definition

Queries on fuzzy trees:

Queryon underlying tree.

Probabilities: probability of the conjunction of the conditions of nodes of the mapping.

Theorem

Fuzzy Tree query Fuzzy Tree

semantics semantics

(33)

Fuzzy Tree Model Updates

Updates on Fuzzy Trees

Insertions: no problem. Conditions required for the query to match added to inserted nodes.

Deletions: ok, but more problematic. May yield an exponential growth of the fuzzy tree in case of complex dependencies.

Theorem

Fuzzy Tree Fuzzy Tree

Possible Worlds Possible Worlds

update

update

semantics semantics

(34)

Fuzzy Tree Model Updates

Updates on Fuzzy Trees

Insertions: no problem. Conditions required for the query to match added to inserted nodes.

Deletions: ok, but more problematic. May yield an exponential growth of the fuzzy tree in case of complex dependencies.

Theorem

Fuzzy Tree Fuzzy Tree

Possible Worlds Possible Worlds

update

update

semantics semantics

(35)

Fuzzy Tree Model Updates

Updates on Fuzzy Trees

Insertions: no problem. Conditions required for the query to match added to inserted nodes.

Deletions: ok, but more problematic. May yield an exponential growth of the fuzzy tree in case of complex dependencies.

Theorem

Fuzzy Tree Fuzzy Tree

Possible Worlds Possible Worlds

update

update

semantics semantics

(36)

Fuzzy Tree Model Updates

Example: Conditional Replacement

Replacement ofCbyDifBis present, with confidence 0.9.

A

B w 1

C w 2

Event Proba.

w1 0.8

w2 0.7

A

B

w1

C

¬w1,w2

C

w1,w2,¬w3

D

w1,w2,w3

Event Proba.

w1 0.8

w2 0.7

w3 0.9

(37)

Fuzzy Tree Model Updates

Example: Conditional Replacement

Replacement ofCbyDifBis present, with confidence 0.9.

A

B w 1

C w 2

Event Proba.

w1 0.8

w2 0.7

A

B

w1

C

¬w1,w2

C

w1,w2,¬w3

D

w1,w2,w3

Event Proba.

w1 0.8

w2 0.7

w3 0.9

(38)

Fuzzy Tree Model Implementation

Implementation

Java-based

File system storage (will look at an XML DB next) Query evaluation: Qizx/open XQuery engine Updates expressed in XUpdate

Available freely at

http://pierre.senellart.com/software/fuzzyxml/ cf demo

http://pierre.senellart.com/software/fuzzyxml/demo/

(39)

Fuzzy Tree Model Implementation

Implementation

Java-based

File system storage (will look at an XML DB next)

Query evaluation: Qizx/open XQuery engine Updates expressed in XUpdate

Available freely at

http://pierre.senellart.com/software/fuzzyxml/ cf demo

http://pierre.senellart.com/software/fuzzyxml/demo/

(40)

Fuzzy Tree Model Implementation

Implementation

Java-based

File system storage (will look at an XML DB next) Query evaluation: Qizx/open XQuery engine

Updates expressed in XUpdate Available freely at

http://pierre.senellart.com/software/fuzzyxml/ cf demo

http://pierre.senellart.com/software/fuzzyxml/demo/

(41)

Fuzzy Tree Model Implementation

Implementation

Java-based

File system storage (will look at an XML DB next) Query evaluation: Qizx/open XQuery engine Updates expressed in XUpdate

Available freely at

http://pierre.senellart.com/software/fuzzyxml/ cf demo

http://pierre.senellart.com/software/fuzzyxml/demo/

(42)

Fuzzy Tree Model Implementation

Implementation

Java-based

File system storage (will look at an XML DB next) Query evaluation: Qizx/open XQuery engine Updates expressed in XUpdate

Available freely at

http://pierre.senellart.com/software/fuzzyxml/

cf demo

http://pierre.senellart.com/software/fuzzyxml/demo/

(43)

Fuzzy Tree Model Implementation

Implementation

Java-based

File system storage (will look at an XML DB next) Query evaluation: Qizx/open XQuery engine Updates expressed in XUpdate

Available freely at

http://pierre.senellart.com/software/fuzzyxml/

cf demo

http://pierre.senellart.com/software/fuzzyxml/demo/

(44)

Conclusion

Outline

1 Introduction

2 Framework

3 Possible Worlds Model

4 Fuzzy Tree Model

5 Conclusion Summary

(45)

Conclusion Summary

Summary

A model for representingprobabilisticinformation for semi-structureddata.

Soundandcompletesupport for an important subset of XQuery. Soundandcompletesupport for XUpdate-based transactions with inserts and deletes.

An implementation based on compilation to XQuery/XUpdate.

(46)

Conclusion Summary

Summary

A model for representingprobabilisticinformation for semi-structureddata.

Soundandcompletesupport for an important subset of XQuery.

Soundandcompletesupport for XUpdate-based transactions with inserts and deletes.

An implementation based on compilation to XQuery/XUpdate.

(47)

Conclusion Summary

Summary

A model for representingprobabilisticinformation for semi-structureddata.

Soundandcompletesupport for an important subset of XQuery.

Soundandcompletesupport for XUpdate-based transactions with inserts and deletes.

An implementation based on compilation to XQuery/XUpdate.

(48)

Conclusion Summary

Summary

A model for representingprobabilisticinformation for semi-structureddata.

Soundandcompletesupport for an important subset of XQuery.

Soundandcompletesupport for XUpdate-based transactions with inserts and deletes.

An implementation based on compilation to XQuery/XUpdate.

(49)

Conclusion Perspectives

Perspectives

Complexity analysis: query, update, simplification.

Queryoptimization. Fuzzy datasimplification.

Extensions: negation, some limited order.

(50)

Conclusion Perspectives

Perspectives

Complexity analysis: query, update, simplification.

Queryoptimization.

Fuzzy datasimplification.

Extensions: negation, some limited order.

(51)

Conclusion Perspectives

Perspectives

Complexity analysis: query, update, simplification.

Queryoptimization.

Fuzzy datasimplification.

Extensions: negation, some limited order.

(52)

Conclusion Perspectives

Perspectives

Complexity analysis: query, update, simplification.

Queryoptimization.

Fuzzy datasimplification.

Extensions: negation, some limited order.

Références

Documents relatifs

The correct data coordination is the crucial in peer-to-peer databases, it is necessary for correct processing of queries and other database requests.. Our work is focused on updates

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of

Les schémas de métadonnées sont maintenus dans la mémoire du médiateur et classés par source, espace de noms, collection et chemin pour un accès rapide lors du traitement

Running time of the different algorithms on a given query of. the

Updating probabilistic XML documents highlights the following issue in mod- els different from ProTDB and PrXML { fie } : the model may lack the property of being a

Our system, ProApproX, has the characteristic of not relying on a single algorithm to evaluate the probability of a lineage formula but of deciding on the algorithm to be used based

 Processes the lineage formula to get the probability of the query (and of each matching

The demonstration presents the graphical user interface of ProApproX 2.0, that allows a user to input an XPath query and approximation parameters, and lists query results with