Querying and Updating Probabilistic Information in XML
Serge Abiteboul Pierre Senellart
GEMO Seminar
March 10th, 2006
The Hidden Web
Definition (Hidden Web)
The set of webpages (which may or may not be dynamically
generated) not accessible from the hyperlinked structure of the World Wide Web.
Size estimate (2001): 500 times larger than the surface Web.
How to understand it and benefit from its content?
Abiteboul, Senellart (INRIA Futurs) Querying and Updating Proba. Info. in XML GEMO, 2006/03/10 2 / 27
Semantic Interpretation of the Hidden Web
[Figure: processing pipeline, built up step by step. Stages in processing order, with their labels: World Wide Web; discovery (HTML form, WSDL, UDDI); wrappers (WSDL, HTML form); analysis (Analyzed Web Services); indexing (Web Services Index); querying (Queries, Results).]
A Probabilistic XML Warehouse
[Figure: warehouse architecture. Modules 1, 2 and 3 submit update transactions with confidence through the update interface (insertions, deletions) of the probabilistic XML warehouse; the query interface (tree-pattern with join) takes queries and returns results with confidence.]
Framework
Outline
1 Introduction
2 Framework: Data Trees, Queries, Updates
3 Possible Worlds Model
4 Simple Probabilistic Model
5 Fuzzy Tree Model
6 Conclusion
Data Trees
Finite, unordered, trees.
No attribute nodes, no mixed content.
“Multiset” children
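[Figure: example data tree. Root A has children B, B, E and D; nodes C and F sit one level below; the leaves carry the values val, val' and val" (the last one twice).]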
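The three properties above can be illustrated with a minimal Python sketch. The `Node` class and `same_tree` helper are hypothetical names introduced here, not part of the talk: children are compared as a multiset, so sibling order is ignored while duplicates still count.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical data-tree node: a label, an optional leaf value, children."""
    label: str
    value: Optional[str] = None  # only leaves carry a value (no mixed content)
    children: List["Node"] = field(default_factory=list)

    def canonical(self) -> tuple:
        """Order-independent, hashable form; duplicate children are counted."""
        kids = Counter(child.canonical() for child in self.children)
        return (self.label, self.value, frozenset(kids.items()))


def same_tree(a: Node, b: Node) -> bool:
    """Two data trees are equal when they agree up to sibling order."""
    return a.canonical() == b.canonical()


t1 = Node("A", children=[Node("B", "val"), Node("B", "val'")])
t2 = Node("A", children=[Node("B", "val'"), Node("B", "val")])
print(same_tree(t1, t2))  # sibling order does not matter
```

Using a `Counter` rather than a set is what keeps two identical `B` children distinct from a single one.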
Tree-Pattern With Join Queries
Queries: Tree-Pattern With Join (TPWJ)
Join: by value
Actually: any positive query
[Figure: example TPWJ query pattern over nodes A, B, C and D, with a "+" edge label and a value val.]
Update Transactions
Set of elementary operations:
Insertions of subtrees
Deletions of subtrees
TPWJ query + mapping, stating where to perform operations.
Probabilistic update: update + confidence
Possible Worlds Model
Outline
1 Introduction
2 Framework
3 Possible Worlds Model: Model, Queries, Updates
4 Simple Probabilistic Model
5 Fuzzy Tree Model
6 Conclusion
PW Model
Set of tree/probability pairs, one for each possible world.
Example PW set (X(Y, Z) denotes node X with children Y and Z):
A(C)          P = 0.06
A(C(D))       P = 0.14
A(B, C)       P = 0.24
A(B, C(D))    P = 0.56
Queries on PW sets
Definition
If $T = \{(t_i, p_i)\}$, the result of query $Q$ over the PW set $T$ is the normalization of $\{(t, p_i) \mid t \in Q(t_i)\}$.
Normalization: grouping of duplicates (summing probabilities).
$(t, p) \in Q(T)$ means “the probability that t matches T is p”.
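The definition can be sketched in a few lines of Python. This is only an illustration under assumptions: trees are stood in for by strings, and `toy_query` is a hypothetical query that returns the subtree A(C(D)) whenever the world contains a D node.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

World = Tuple[Hashable, float]  # (tree, probability)


def normalize(pairs: Iterable[World]) -> Dict[Hashable, float]:
    """Group duplicate trees and sum their probabilities."""
    total: Dict[Hashable, float] = defaultdict(float)
    for tree, p in pairs:
        total[tree] += p
    return dict(total)


def query_pw(query: Callable[[Hashable], Iterable[Hashable]],
             worlds: List[World]) -> Dict[Hashable, float]:
    """Q(T): each answer t in Q(t_i) inherits p_i, then duplicates are merged."""
    return normalize((t, p_i) for t_i, p_i in worlds for t in set(query(t_i)))


# The four-world example PW set, with trees written as strings.
PW = [("A(C)", 0.06), ("A(C(D))", 0.14), ("A(B,C)", 0.24), ("A(B,C(D))", 0.56)]


def toy_query(tree: str) -> List[str]:
    """Hypothetical stand-in for a TPWJ query: match worlds that contain D."""
    return ["A(C(D))"] if "D" in tree else []


result = query_pw(toy_query, PW)
print(result)  # one answer whose probability is 0.14 + 0.56, up to rounding
```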
Updates on PW sets
Definition
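The result of an update $\tau$ with confidence $c$ on a PW set $T$, where $Q$ is the query of $\tau$, is the normalization of:
$\{(t, p) \in T \mid t \text{ is not selected by } Q\} \cup \{(\tau(t), p \cdot c) \mid (t, p) \in T,\ t \text{ is selected by } Q\} \cup \{(t, p \cdot (1 - c)) \mid (t, p) \in T,\ t \text{ is selected by } Q\}$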
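A minimal Python sketch of this definition follows. Trees are again stood in for by strings, and the `selects` and `tau` functions are hypothetical: the toy update deletes the D node with confidence 0.5.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

World = Tuple[str, float]  # (tree, probability)


def normalize(pairs: List[World]) -> Dict[str, float]:
    """Group duplicate trees and sum their probabilities."""
    total: Dict[str, float] = defaultdict(float)
    for tree, p in pairs:
        total[tree] += p
    return dict(total)


def update_pw(selects: Callable[[str], bool], tau: Callable[[str], str],
              c: float, worlds: List[World]) -> Dict[str, float]:
    """Apply update tau with confidence c to every world its query selects."""
    out: List[World] = []
    for t, p in worlds:
        if selects(t):
            out.append((tau(t), p * c))        # the update happened
            out.append((t, p * (1 - c)))       # the update did not happen
        else:
            out.append((t, p))                 # world untouched
    return normalize(out)


PW = [("A(C)", 0.06), ("A(C(D))", 0.14), ("A(B,C)", 0.24), ("A(B,C(D))", 0.56)]

updated = update_pw(selects=lambda t: "D" in t,
                    tau=lambda t: t.replace("(D)", ""),  # toy deletion of D
                    c=0.5, worlds=PW)
print(updated)
```

Each selected world is split in two, so without normalization the number of worlds can double with every update; the probabilities still sum to 1.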
Simple Probabilistic Model
Outline
1 Introduction
2 Framework
3 Possible Worlds Model
4 Simple Probabilistic Model: Model and Possible Worlds Semantics, Queries, Incompleteness
5 Fuzzy Tree Model
SP Trees
Data tree with probability assigned to each node.
Example SP tree (probabilities in brackets): A(B [0.8], C(D [0.7]))
PW Semantics of SP Trees
A node assigned probability p is in the tree with probability p, provided its parent is in the tree.
Possible worlds of the example SP tree:
A(C)          P = 0.06
A(C(D))       P = 0.14
A(B, C)       P = 0.24
A(B, C(D))    P = 0.56
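The four worlds can be recomputed with a short Python sketch. The tuple encoding `(label, probability, children)` is a hypothetical choice made here, unlabeled nodes are taken to have probability 1, and node choices are assumed independent given their parents, which is consistent with the numbers on the slide.

```python
from collections import defaultdict
from itertools import product

# Hypothetical encoding: (label, probability given the parent, children).
SP = ("A", 1.0, [("B", 0.8, []), ("C", 1.0, [("D", 0.7, [])])])


def worlds(node):
    """Worlds of the subtree rooted at `node`, assuming the node is present."""
    label, _, children = node
    options = []
    for child in children:
        p = child[1]
        opts = []
        if p < 1:
            opts.append((None, 1 - p))                  # child absent
        if p > 0:
            opts += [(t, p * q) for t, q in worlds(child)]  # child present
        options.append(opts)
    result = defaultdict(float)
    for combo in product(*options):
        kids = tuple(sorted((t for t, _ in combo if t is not None), key=repr))
        prob = 1.0
        for _, q in combo:
            prob *= q
        result[(label, kids)] += prob
    return list(result.items())


for tree, prob in sorted(worlds(SP), key=lambda w: w[1]):
    print(tree, round(prob, 2))
```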
Queries on SP Trees
Definition
Queries on SP trees:
Query on underlying tree.
Probabilities: multiplication of probabilities of nodes of the mapping.
Theorem
$Q(\llbracket T \rrbracket) = Q(T)$, where $\llbracket T \rrbracket$ denotes the PW semantics of the SP tree $T$.
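As a worked instance on the example SP tree, assume a hypothetical query whose mapping selects the nodes A, C and D, and take the unlabeled nodes A and C to have probability 1. The product over the mapped nodes then agrees with the sum over the possible worlds that contain D:

```latex
\[
P_{\mathrm{SP}} = 1 \times 1 \times 0.7 = 0.7,
\qquad
P_{\mathrm{PW}} = 0.14 + 0.56 = 0.70 .
\]
```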
Incompleteness of SP Trees
Theorem
The SP tree model is incomplete.
Example PW set, in which B and D never occur together:
A(C)          P = 0.06
A(C(D))       P = 0.70
A(B, C)       P = 0.24
Theorem
SP trees are not closed under updates.
Fuzzy Tree Model
Outline
1 Introduction
2 Framework
3 Possible Worlds Model
4 Simple Probabilistic Model
5 Fuzzy Tree Model: Model and Possible Worlds Semantics, Queries, Updates, Implementation
6 Conclusion
Fuzzy Trees
Data tree with event conditions (conjunction of probabilistic events or negations of probabilistic events) assigned to each node.
Example fuzzy tree (conditions in brackets): A(B [w1, ¬w2], C(D [w2]))
Event   Proba.
w1      0.8
w2      0.7
PW Semantics of Fuzzy Trees
A node assigned condition C is in the tree exactly when C is true and its parent is in the tree.
Possible worlds of the example fuzzy tree:
A(C)          P = 0.06
A(C(D))       P = 0.70
A(B, C)       P = 0.24
Theorem
The fuzzy tree model is as expressive as the PW model.
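The PW semantics of the example fuzzy tree can be recomputed by enumerating truth assignments of the events. This is a sketch under assumptions: the tuple encoding is hypothetical, the root is taken to be unconditioned, and the events are assumed independent, which matches the probabilities on the slide.

```python
from collections import defaultdict
from itertools import product

EVENTS = {"w1": 0.8, "w2": 0.7}

# Hypothetical encoding: (label, condition, children);
# a condition maps an event name to the truth value it requires.
FUZZY = ("A", {}, [("B", {"w1": True, "w2": False}, []),
                   ("C", {}, [("D", {"w2": True}, [])])])


def project(node, valuation):
    """Tree kept under one truth assignment, or None if the condition fails."""
    label, cond, children = node
    if any(valuation[e] != wanted for e, wanted in cond.items()):
        return None
    kids = [project(child, valuation) for child in children]
    return (label, tuple(sorted((k for k in kids if k is not None), key=repr)))


def possible_worlds(tree, events):
    """Sum the probability of every assignment that yields the same tree."""
    names = sorted(events)
    result = defaultdict(float)
    for bits in product([True, False], repeat=len(names)):
        valuation = dict(zip(names, bits))
        prob = 1.0
        for name in names:
            prob *= events[name] if valuation[name] else 1 - events[name]
        result[project(tree, valuation)] += prob
    return dict(result)


for tree, prob in possible_worlds(FUZZY, EVENTS).items():
    print(tree, round(prob, 2))
```

The enumeration is exponential in the number of events, so it is only meant to check small examples.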
Queries on Fuzzy Trees
Definition
Queries on fuzzy trees:
Query on underlying tree.
Probabilities: probability of the conjunction of the conditions of nodes of the mapping.
Theorem
$Q(\llbracket T \rrbracket) = Q(T)$
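The probability of a conjunction of node conditions can be sketched as follows, assuming independent events (an assumption that matches the example numbers). The function name and the dictionary encoding of conditions are hypothetical.

```python
from typing import Dict, Iterable

Condition = Dict[str, bool]  # event name -> required truth value


def conjunction_probability(conditions: Iterable[Condition],
                            events: Dict[str, float]) -> float:
    """Probability that all given conditions hold at once."""
    merged: Condition = {}
    for cond in conditions:
        for event, wanted in cond.items():
            if merged.setdefault(event, wanted) != wanted:
                return 0.0  # both w and not-w are required: contradiction
    prob = 1.0
    for event, wanted in merged.items():
        # Each event is counted once, however many nodes mention it.
        prob *= events[event] if wanted else 1 - events[event]
    return prob


EVENTS = {"w1": 0.8, "w2": 0.7}
B_COND = {"w1": True, "w2": False}
D_COND = {"w2": True}

print(conjunction_probability([{}, B_COND], EVENTS))          # nodes A, B
print(conjunction_probability([{}, B_COND, D_COND], EVENTS))  # B and D together
```

Unlike the SP tree rule, this is not a plain product over nodes: an event shared by several mapped nodes contributes a single factor, and contradictory conditions give probability 0.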
Updates on Fuzzy Trees
Insertions: straightforward. The conditions required for the query to match are added to the inserted nodes.
Deletions: possible, but more problematic. They may cause exponential growth of the fuzzy tree when dependencies are complex.
Theorem
If $(\tau, c)$ is a probabilistic update transaction and $T$ a fuzzy tree:
$\llbracket (\tau, c)(T) \rrbracket = (\tau, c)(\llbracket T \rrbracket)$
Example: Deduplication
Deduplication of the two B nodes with confidence 0.9.
Before: A(B [w1], B [w2])
Event   Proba.
w1      0.8
w2      0.7
After: A(B [w1, ¬w3], B [w1, w3, ¬w2], B [w2, ¬w3], B [w2, w3, ¬w1], B [w1, w2, w3])
Event   Proba.
w1      0.8
w2      0.7
w3      0.9
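The effect of the deduplication can be checked numerically. The sketch below assumes independent events and reads w3 as the event "the deduplication applies"; it computes how many B nodes are present, and with what probability, before and after the update. The list encoding of conditions is hypothetical.

```python
from collections import defaultdict
from itertools import product

EVENTS = {"w1": 0.8, "w2": 0.7, "w3": 0.9}

# Conditions of the B nodes, as event -> required truth value.
BEFORE = [{"w1": True}, {"w2": True}]
AFTER = [
    {"w1": True, "w3": False},
    {"w1": True, "w3": True, "w2": False},
    {"w2": True, "w3": False},
    {"w2": True, "w3": True, "w1": False},
    {"w1": True, "w2": True, "w3": True},
]


def holds(cond, valuation):
    return all(valuation[e] == wanted for e, wanted in cond.items())


def count_distribution(conditions, events):
    """Map 'number of B nodes present' to its total probability."""
    names = sorted(events)
    dist = defaultdict(float)
    for bits in product([True, False], repeat=len(names)):
        valuation = dict(zip(names, bits))
        prob = 1.0
        for name in names:
            prob *= events[name] if valuation[name] else 1 - events[name]
        dist[sum(holds(c, valuation) for c in conditions)] += prob
    return dict(dist)


print(count_distribution(BEFORE, EVENTS))
print(count_distribution(AFTER, EVENTS))
```

Two B nodes survive only when w1 and w2 hold and w3 does not, so the probability of seeing a duplicate drops from 0.56 to 0.056, while the probability of seeing no B at all stays at 0.06.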
Implementation
Java-based
File system storage (will look at an XML DB next)
Query evaluation: Qizx/open XQuery engine
Updates expressed in XUpdate
cf Demo
Conclusion
Outline
1 Introduction
2 Framework
3 Possible Worlds Model
4 Simple Probabilistic Model
5 Fuzzy Tree Model
6 Conclusion: Summary, Perspectives
Summary
A model for representing probabilistic information in a semi-structured database.
Special focus on updating, for managing a probabilistic warehouse.
More expressive than, and as concise as, the simple probabilistic model; more concise than, and as expressive as, the naïve possible worlds model.
Perspectives
Implementation finalization.
Full complexity analysis.
More expressive queries (negative joins?), while keeping efficiency.
Optimizations, simplifications of a fuzzy tree.
Validation of fuzzy trees.