The Possibility Problem for Probabilistic XML
Antoine Amarilli
Télécom ParisTech, Paris, France
June 5, 2014
Probabilistic XML
We are unsure about the exact contents of an XML document.
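[Figure: an example probabilistic XML tree. The root directory has two person children. The first person has an address (42 Foo St.) and a name chosen by a mux node between Jean Doe and Jane Doe (probabilities 0.8 and 0.2). The second person has a name (John Doe), an address (42 Foo St.), and an ind node keeping a mobile (1234) with probability 0.6.]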
Introduction Known results Unordered documents Unambiguous labels Conclusion
Local formalisms: possible worlds semantics
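ind: r has an ind child with children a (probability α) and b (probability β), each kept independently
⇒ Four possible worlds:
r alone: (1−α)(1−β)
r with child a: α(1−β)
r with child b: (1−α)β
r with children a and b: αβ
mux: r has a mux child with children a (probability α) and b (probability β), at most one of which is kept
⇒ Three possible worlds:
r alone: 1−α−β
r with child a: α
r with child b: β
det: r has a det child with children a and b, both always kept
⇒ One possible world:
r with children a and b: 1
Caution: we impose α < 1, β < 1 in ind.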
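The possible worlds semantics of ind, mux and det can be checked on small inputs by brute-force enumeration. The following is a minimal sketch, not taken from the talk: the tuple encoding of nodes and the helper names (combine, optional, forests) are assumptions made for illustration, documents are treated as unordered, and the values chosen for α and β are arbitrary.

```python
from collections import defaultdict


def combine(dists):
    """Distribution of the union of independently drawn forests."""
    out = {(): 1.0}
    for dist in dists:
        nxt = defaultdict(float)
        for f1, p1 in out.items():
            for f2, p2 in dist.items():
                nxt[tuple(sorted(f1 + f2))] += p1 * p2
        out = dict(nxt)
    return out


def optional(p, dist):
    """Keep a forest drawn from dist with probability p, else keep nothing."""
    out = defaultdict(float)
    out[()] += 1 - p
    for forest, q in dist.items():
        out[forest] += p * q
    return dict(out)


def forests(node):
    """Map each possible forest (sorted tuple of trees) to its probability."""
    kind, kids = node
    if kind == "ind":  # kids: list of (probability, child), kept independently
        return combine([optional(p, forests(k)) for p, k in kids])
    if kind == "mux":  # kids: list of (probability, child), at most one kept
        out = defaultdict(float)
        out[()] += 1 - sum(p for p, _ in kids)
        for p, k in kids:
            for forest, q in forests(k).items():
                out[forest] += p * q
        return dict(out)
    if kind == "det":  # kids: list of children, all kept
        return combine([forests(k) for k in kids])
    # Ordinary node: kind is its label; a tree is (label, forest of children).
    children = combine([forests(k) for k in kids])
    return {((kind, f),): p for f, p in children.items()}


a, b = ("a", []), ("b", [])
alpha, beta = 0.5, 0.25  # arbitrary example values
for kind in ("ind", "mux"):
    doc = ("r", [(kind, [(alpha, a), (beta, b)])])
    for (tree,), p in sorted(forests(doc).items()):
        print(kind, tree, p)
print("det", forests(("r", [("det", [a, b])])))
```

With these values, the ind document yields four worlds and the mux document three, and the probabilities follow the formulas in α and β given on the slide.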
Event formalisms
Probability distribution on events (example: x with probability 0.7, y with probability 0.4)
Draw events independently
Edges annotated with formulae on the events
Edges with false formulae are removed
[Figure: a root r with children a and b, whose edges are annotated with the formulae x and ¬x∧y.]
⇒ mie: multivalued events (see later)
⇒ cie: conjunctions of Boolean events
⇒ fie: formulae of Boolean events
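The event semantics can be illustrated by enumerating all valuations of the events. This is a minimal sketch under assumptions: the extracted slide does not make clear which edge carries which formula, so the assignment below (a annotated with x, b annotated with ¬x∧y) is a guess, and the names EVENTS, EDGES and world_distribution are hypothetical. Only a flat document (root r with annotated children) is handled.

```python
from collections import defaultdict
from itertools import product

# Event probabilities from the slide.
EVENTS = {"x": 0.7, "y": 0.4}

# ASSUMPTION: which child carries which formula is not recoverable from the slide.
EDGES = {
    "a": lambda v: v["x"],
    "b": lambda v: (not v["x"]) and v["y"],
}


def world_distribution(events, edges):
    """Map each set of kept children of r to its probability."""
    names = list(events)
    dist = defaultdict(float)
    for bits in product([True, False], repeat=len(names)):
        valuation = dict(zip(names, bits))
        # Events are drawn independently.
        p = 1.0
        for name in names:
            p *= events[name] if valuation[name] else 1 - events[name]
        # Edges whose formula is false are removed.
        kept = frozenset(c for c, formula in edges.items() if formula(valuation))
        dist[kept] += p
    return dict(dist)


for kept, p in world_distribution(EVENTS, EDGES).items():
    print(sorted(kept), round(p, 4))
```

Under this assumed assignment, a and b never appear together, since their formulae are mutually exclusive: this is a correlation that independent local choices on a and b cannot express.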
Possibility problem (Poss)
Given:
a probabilistic document D
a deterministic document W
Is W a possible world of D?
If yes, with which probability?
Diverse probabilistic formalisms, ordered and unordered
Like query evaluation but:
⇒ Need inequality: “don’t collapse nodes”
⇒ Need negation: “no additional things”
⇒ Query depends on input W
⇒ Specific bounds for this Poss problem?
Table of contents
1 Introduction
2 Known results
3 Unordered documents
4 Unambiguous labels
5 Conclusion
In NP, in FP^#P
Guess a valuation of the events
Guess a match of W in D
Check that the match is realized by the valuation
⇒ Likewise, probability computation is in FP^#P
⇒ Of course Poss is NP-hard for fie
Tractable for ordered local documents
Local choices and ordered documents
Possibility decision and computation are in PTIME
Intuitively:
match each possible subsequence of siblings
dynamic programming algorithm for the match at each level
⇒ Implied by deterministic tree automata on probabilistic XML:
Cohen, Kimelfeld, and Sagiv 2009
⇒ Assumption of order is crucial
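The intuition of matching subsequences of siblings level by level can be sketched as a dynamic program. This is a simplified illustration and not the tree automata construction of Cohen, Kimelfeld, and Sagiv: it assumes a restricted model in which every child edge of D is kept independently with a given probability (a single level of ind), and the encoding and the name match_prob are hypothetical.

```python
def match_prob(d, w):
    """Probability that the ordered document d yields exactly the ordered tree w.

    d is (label, [(keep_probability, child), ...]); w is (label, [child, ...]).
    """
    d_label, d_edges = d
    w_label, w_kids = w
    if d_label != w_label:
        return 0.0
    n, m = len(d_edges), len(w_kids)
    # table[i][j]: probability that D-children i.. produce exactly W-children j..
    table = [[0.0] * (m + 1) for _ in range(n + 1)]
    table[n][m] = 1.0
    for i in range(n - 1, -1, -1):
        p, child = d_edges[i]
        for j in range(m, -1, -1):
            # Either child i is dropped...
            drop = (1 - p) * table[i + 1][j]
            # ...or it is kept and must produce the next child of W.
            keep = 0.0
            if j < m:
                keep = p * match_prob(child, w_kids[j]) * table[i + 1][j + 1]
            table[i][j] = drop + keep
    return table[0][0]


a, b = ("a", []), ("b", [])
D = ("r", [(0.5, a), (0.5, b), (0.5, a)])
print(match_prob(D, ("r", [a])))      # either of the two a children may survive
print(match_prob(D, ("r", [a, b])))   # only the first a can precede b
```

Because siblings are ordered, a child of D can only be matched to the next unmatched child of W, which is what keeps the table small. Without order, the same child of W could be produced by many children of D, and this left-to-right scan no longer applies.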
Computation is #P-hard for ind or mux
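[Figure: a bipartite graph on vertices 1, 2, 3 and 1, 2, 3, next to the documents D and W built from it. In D, the root r has three ⊤ children, with children {a3, a2}, {a3} and {a2, a1} respectively, each kept with probability 1/2. In W, the root r has three ⊤ children, with single children a3, a2 and a1.]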
Decision is in PTIME for ind or mux
Compute bottom-up if a node has the empty possible world
Check dynamically between all nodes of D and W
⇒ Build bipartite graph based on child compatibility
⇒ Add dummy nodes for deletions of nodes that can be deleted
⇒ Check in PTIME if graph has a perfect matching
[Figure: an example ind document D with probabilities 1/2, a candidate world W, and the bipartite graph G built between their nodes (X1, X2, X3, X4, c), with a del dummy node.]
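The matching-based decision procedure can be sketched as follows. This is a minimal illustration under simplifying assumptions: each child edge of D carries a probability in (0, 1] and is deletable exactly when that probability is below 1, so there are no nested ind or mux nodes and the bottom-up "empty possible world" step is not needed. The perfect matching is found with a plain augmenting-path search (Kuhn's algorithm), chosen here for brevity; the names can_match and has_perfect_matching are hypothetical.

```python
def has_perfect_matching(adj, n):
    """Kuhn's augmenting-path algorithm on a bipartite graph with n + n vertices."""
    match_right = [-1] * n

    def try_augment(i, seen):
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                if match_right[j] == -1 or try_augment(match_right[j], seen):
                    match_right[j] = i
                    return True
        return False

    return all(try_augment(i, set()) for i in range(n))


def can_match(d, w):
    """Decide whether the unordered tree w is a possible world of d.

    d is (label, [(probability, child), ...]); w is (label, [child, ...]).
    """
    d_label, d_edges = d
    w_label, w_kids = w
    if d_label != w_label:
        return False
    n, m = len(d_edges), len(w_kids)
    if n < m:
        return False  # W has more children than D can produce

    # Right side: the m children of W, followed by n - m dummy "del" nodes.
    def compatible(i, j):
        p, child = d_edges[i]
        if j < m:
            return can_match(child, w_kids[j])
        return p < 1  # only deletable children may be matched to a dummy

    adj = [[j for j in range(n) if compatible(i, j)] for i in range(n)]
    return has_perfect_matching(adj, n)


leaf = lambda label: (label, [])
D = ("r", [(0.5, leaf("a")), (0.5, leaf("b")), (0.5, leaf("b")), (0.5, leaf("c"))])
print(can_match(D, ("r", [leaf("b"), leaf("a")])))
print(can_match(D, ("r", [leaf("b"), leaf("b"), leaf("b")])))
```

Each pair of a D node and a W node is compared once, and each comparison solves one bipartite matching problem, so the whole procedure stays polynomial.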
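Decision is NP-hard for any two of ind, mux, det
With det, reduction from exact cover: $S = \{S_i\}$, $S_i = \{s_{ij}\}$
Is there $T \subseteq S$ such that $\bigcup T = \bigcup S$ with no dupes?
Example: S = {{a,b}, {a,c}, {b}}
[Figure: in D, the root r has three det nodes, each kept with probability 1/2; in W, the root r has children c, b, a.]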
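The exact cover reduction can be illustrated by brute force on the example instance. This is a sketch under an assumed reading of the figure: each set of S becomes a det group under r that is kept or dropped as a whole, and W is r with one child per element of the union of S. The function name w_is_possible is hypothetical, and the search is exponential; it only illustrates the correspondence, not an efficient algorithm.

```python
from collections import Counter
from itertools import combinations


def w_is_possible(sets):
    """Brute-force Poss check for the (assumed) document built from `sets`.

    Keeping a subfamily of det groups yields the multiset of their elements
    as children of r; W has each element of the union exactly once.
    """
    target = Counter(set().union(*sets))
    for k in range(len(sets) + 1):
        for chosen in combinations(sets, k):
            produced = Counter(x for s in chosen for x in s)
            if produced == target:
                return True  # the chosen groups form an exact cover
    return False


S = [{"a", "b"}, {"a", "c"}, {"b"}]
print(w_is_possible(S))  # {a, c} and {b} cover a, b, c without duplicates
```

Choosing {a,b} together with {a,c} would produce a twice, which is why W (with a single a) rules that choice out: duplicates in the cover show up as extra nodes in the world.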
Decision is NP-hard for any two of ind, mux, det (cont’d)
With ind and mux, reduction from SAT: F = (a∨b∨¬c)∧(a∨c)∧(¬a)
[Figure: in D, the root r has nested mux and ind nodes with probabilities 1/2 over nodes labeled c1, c2, c3; in W, the root r has children c3, c2, c1.]
Unambiguity
D is unambiguous if node labels are unique
Possible refinements (unique among siblings, etc.)
⇒ There is at most one way to match W!
All local models tractable (can impose order)
⇒ Can we have correlations?
Still NP-hard for cie
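$F = \bigwedge_i \bigvee_j \pm x_{ij}$ in CNF
Equivalently: $\bigwedge_i \neg \bigwedge_j \mp x_{ij}$
[Figure: in D, the root r has children a1, ..., an, where the edge to ai is annotated with $\bigwedge_j \mp x_{ij}$; W is the root r alone.]
⇒ W is a possible world of D iff F is satisfiable
⇒ Decision for Poss is NP-hard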
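The correspondence between satisfiability and possibility of the bare root can be checked by brute force on a small formula. This is a minimal sketch: the encoding of literals as (variable, polarity) pairs and the function names are assumptions, every event is assumed to have a probability strictly between 0 and 1 so that all valuations are possible, and the example formula is the one used for the SAT reduction earlier in the talk.

```python
from itertools import product


def edge_annotations(cnf):
    """One cie annotation per clause: the conjunction of its negated literals."""
    return [[(var, not pol) for var, pol in clause] for clause in cnf]


def valuations(cnf):
    variables = sorted({var for clause in cnf for var, _ in clause})
    for bits in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, bits))


def bare_root_possible(cnf):
    """Is W = r alone a possible world of the document built from cnf?"""
    annotations = edge_annotations(cnf)
    for v in valuations(cnf):
        # Child a_i is kept iff every literal of its conjunction holds.
        kept = [all(v[var] == pol for var, pol in conj) for conj in annotations]
        if not any(kept):
            return True
    return False


def satisfiable(cnf):
    return any(
        all(any(v[var] == pol for var, pol in clause) for clause in cnf)
        for v in valuations(cnf)
    )


# F = (a ∨ b ∨ ¬c) ∧ (a ∨ c) ∧ (¬a)
F = [[("a", True), ("b", True), ("c", False)], [("a", True), ("c", True)], [("a", False)]]
print(bare_root_possible(F), satisfiable(F))
```

A child a_i survives exactly when clause i is falsified, so all children disappear exactly when every clause is satisfied. Labels a1, ..., an are distinct, so the document is unambiguous and the hardness comes from the correlations alone.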
The mie class
Var Val Prob
x 1 0.6
x 2 0.2
x 3 0.1
x 4 0.1
y 1 0.5
y 2 0.5
mie: Multivalued independent events
No conjunctions allowed
Captures mux
Doesn’t capture det or ind hierarchies
Intractable if ambiguous
⇒ If non-ambiguous, do we have tractability?
mie tractable on non-ambiguous documents
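Same event table as on the previous slide.
[Figure: an unambiguous document D with nodes r, a, b, c, d, e, f, g, whose edges carry the conditions y=2, x=2, y=1, y=2 and x=1, and a candidate world W with nodes r, b, a, d.]
Matching W in D yields the constraints x≠2, x≠1, y=2, y≠1
⇒ x∈{3,4}, y∈{2}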
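The constraint collection for mie on unambiguous documents can be sketched as follows. The event table is the one from the slide, but the exact shape of the slide's tree is not recoverable from the extracted text, so the layout of D below is a hypothetical one chosen to produce the same constraints. The encoding (label mapped to parent and condition) and the function names are also assumptions.

```python
from math import prod

# mie distribution from the slide: variable -> {value: probability}
DIST = {"x": {1: 0.6, 2: 0.2, 3: 0.1, 4: 0.1}, "y": {1: 0.5, 2: 0.5}}

# HYPOTHETICAL layout of D: label -> (parent, condition or None).
D = {
    "r": (None, None),
    "b": ("r", None),
    "a": ("b", ("y", 2)),
    "d": ("a", None),
    "e": ("b", ("y", 1)),
    "f": ("b", ("x", 2)),
    "c": ("r", ("x", 1)),
    "g": ("c", ("y", 2)),
}
# Candidate world W: label -> parent.
W = {"r": None, "b": "r", "a": "b", "d": "a"}


def allowed_values(D, W, dist):
    """Values each variable may take for W to be the world of D, or None."""
    # Labels are unique, so a W node can only match the D node with its label.
    if any(label not in D or D[label][0] != parent for label, parent in W.items()):
        return None
    allowed = {var: set(values) for var, values in dist.items()}
    for label, (parent, cond) in D.items():
        if parent is None:
            if label not in W:
                return None  # the root must be kept
            continue
        if parent not in W:
            continue  # the parent is removed, so this node imposes nothing
        if label in W:
            if cond is not None:  # kept node: its condition must hold
                allowed[cond[0]] &= {cond[1]}
        elif cond is None:
            return None  # an unconditional child cannot disappear
        else:  # removed node: its condition must fail
            allowed[cond[0]].discard(cond[1])
    return allowed


def poss_probability(D, W, dist):
    allowed = allowed_values(D, W, dist)
    if allowed is None:
        return 0.0
    # Variables are independent: multiply the mass of each allowed set.
    return prod(sum(dist[v][val] for val in allowed[v]) for v in dist)


print(allowed_values(D, W, DIST))
print(poss_probability(D, W, DIST))
```

Since there are no conjunctions, each edge constrains a single variable, so the constraints reduce to one set of allowed values per variable and the probability is a product of per-variable sums.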
Conclusion
Ordered local models are tractable
Unordered local models are tractable
⇒ For decision only, and
⇒ With only mux or only ind
mie is tractable on unambiguous documents
Other cases are hard
⇒ Height does not matter
⇒ Probabilities do not matter
⇒ Can we refine mie, unambiguity, mux–ind interaction?
⇒ What if D is partially ordered?
Thanks for your attention!
References
Cohen, Sara, Benny Kimelfeld, and Yehoshua Sagiv (2009). “Running tree automata on probabilistic XML”. In: Proc. PODS. ACM, pp. 227–236.