• Aucun résultat trouvé

Tractable Query Answering Under Probabilistic Constraints

N/A
N/A
Protected

Academic year: 2022

Partager "Tractable Query Answering Under Probabilistic Constraints"

Copied!
45
0
0

Texte intégral

(1)

Tractable Query Answering Under Probabilistic Constraints

Antoine Amarilli1, Pierre Bourhis2, Pierre Senellart1,3

1Télécom ParisTech

2CNRS-LIFL

3National University of Singapore

September 4th, 2014

(2)

Tractable Query Evaluation On Probabilistic Instances

Antoine Amarilli1, Pierre Bourhis2, Pierre Senellart1,3

1Télécom ParisTech

2CNRS-LIFL

3National University of Singapore

September 4th, 2014

(3)

Background Ideas Results Consequences

Instances and queries

Given a relational instance(= set of facts, hypergraph) I={R(a,b),R(b,c),S(c)}

Given a conjunctive query(CQ) (existentially quantified) q:∃xy R(x,y)∧S(y)

Query evaluation(model checking) of q onI

Data complexity: qis fixed

(4)

Background Ideas Results Consequences

Instances and queries

Given a relational instance(= set of facts, hypergraph) I={R(a,b),R(b,c),S(c)}

Given a conjunctive query(CQ) (existentially quantified) q:∃xy R(x,y)∧S(y)

Query evaluation(model checking) of q onI

Data complexity: qis fixed

(5)

Background Ideas Results Consequences

Instances and queries

Given a relational instance(= set of facts, hypergraph) I={R(a,b),R(b,c),S(c)}

Given a conjunctive query(CQ) (existentially quantified) q:∃xyR(x,y)∧S(y)

Query evaluation(model checking) of q onI

Data complexity: qis fixed

(6)

Background Ideas Results Consequences

Instances and queries

Given a relational instance(= set of facts, hypergraph) I={R(a,b),R(b,c),S(c)}

Given a conjunctive query(CQ) (existentially quantified) q:∃xy R(x,y)∧S(y)

Query evaluation(model checking) of q onI

Data complexity: qis fixed

(7)

Background Ideas Results Consequences

Instances and queries

Given a relational instance(= set of facts, hypergraph) I={R(a,b),R(b,c),S(c)}

Given a conjunctive query(CQ) (existentially quantified) q:∃xy R(x,y)∧S(y)

Query evaluation(model checking) of q onI

Data complexity: q is fixed

(8)

Background Ideas Results Consequences

Uncertain and probabilistic instances

Set of uncertain events

eflight CDG VIE flight AF1756 takes place ebus Vienna Bratislava buses are running

Annotate instance factswith formulaeon the events IsIn(AA, Paris) ¬eflight

IsIn(AA, Vienna) eflight∧ ¬ebus IsIn(AA, Bratislava) eflight∧ebus

Semantics: aset of instances (possible worlds). Add aprobability distribution on each event

each event hasprobability0<p<1 of being true all events are assumed to beindependent

Semantics: aprobability distribution on instances.

Query evaluation: determine theprobability of q onbI.

(9)

Background Ideas Results Consequences

Uncertain and probabilistic instances

Set of uncertain events

eflight CDG VIE flight AF1756 takes place ebus Vienna Bratislava buses are running Annotate instance factswith formulaeon the events

IsIn(AA, Paris) ¬eflight IsIn(AA, Vienna) eflight∧ ¬ebus IsIn(AA, Bratislava) eflight∧ebus

Semantics: aset of instances (possible worlds). Add aprobability distribution on each event

each event hasprobability0<p<1 of being true all events are assumed to beindependent

Semantics: aprobability distribution on instances.

Query evaluation: determine theprobability of q onbI.

(10)

Background Ideas Results Consequences

Uncertain and probabilistic instances

Set of uncertain events

eflight CDG VIE flight AF1756 takes place ebus Vienna Bratislava buses are running Annotate instance factswith formulaeon the events

IsIn(AA, Paris) ¬eflight IsIn(AA, Vienna) eflight∧ ¬ebus IsIn(AA, Bratislava) eflight∧ebus

Semantics: aset of instances (possible worlds).

Add aprobability distribution on each event each event hasprobability0<p<1 of being true all events are assumed to beindependent

Semantics: aprobability distribution on instances.

Query evaluation: determine theprobability of q onbI.

(11)

Background Ideas Results Consequences

Uncertain and probabilistic instances

Set of uncertain events

eflight CDG VIE flight AF1756 takes place ebus Vienna Bratislava buses are running Annotate instance factswith formulaeon the events

IsIn(AA, Paris) ¬eflight IsIn(AA, Vienna) eflight∧ ¬ebus IsIn(AA, Bratislava) eflight∧ebus

Semantics: aset of instances (possible worlds).

Add aprobability distribution on each event each event hasprobability0<p<1 of being true all events are assumed to beindependent

Semantics: aprobability distribution on instances.

Query evaluation: determine theprobability of q onbI.

(12)

Background Ideas Results Consequences

Uncertain and probabilistic instances

Set of uncertain events

eflight CDG VIE flight AF1756 takes place ebus Vienna Bratislava buses are running Annotate instance factswith formulaeon the events

IsIn(AA, Paris) ¬eflight IsIn(AA, Vienna) eflight∧ ¬ebus IsIn(AA, Bratislava) eflight∧ebus

Semantics: aset of instances (possible worlds).

Add aprobability distribution on each event each event hasprobability0<p<1 of being true all events are assumed to beindependent

Semantics: aprobability distribution on instances.

Query evaluation: determine theprobability of q onbI.

(13)

Background Ideas Results Consequences

Uncertain and probabilistic instances

Set of uncertain events

eflight CDG VIE flight AF1756 takes place ebus Vienna Bratislava buses are running Annotate instance factswith formulaeon the events

IsIn(AA, Paris) ¬eflight IsIn(AA, Vienna) eflight∧ ¬ebus IsIn(AA, Bratislava) eflight∧ebus

Semantics: aset of instances (possible worlds).

Add aprobability distribution on each event each event hasprobability0<p<1 of being true all events are assumed to beindependent

Semantics: aprobability distribution on instances.

Query evaluation: determine theprobability of q onbI.

(14)

Background Ideas Results Consequences

Hardness and tractability

Witharbitrary annotations

Query evaluation is#P-hardeven with a single fact (Immediate reduction from #SAT)

Withsimple annotations (one unique event per tuple)

Query evaluation is#P-hardon arbitrary instances (Use the instance to do the reduction)

Existing work:

Fix asimpleannotation scheme

Showdichotomybetween #P-hard and PTIME queries Our approach:

Find arestrictionon the instance and annotations

Show thatmany queriesare tractable in this case

(15)

Background Ideas Results Consequences

Hardness and tractability

Witharbitrary annotations

Query evaluation is#P-hardeven with a single fact (Immediate reduction from #SAT)

Withsimple annotations (one unique event per tuple)

Query evaluation is#P-hardon arbitrary instances (Use the instance to do the reduction)

Existing work:

Fix asimpleannotation scheme

Showdichotomybetween #P-hard and PTIME queries

Our approach:

Find arestrictionon the instance and annotations

Show thatmany queriesare tractable in this case

(16)

Background Ideas Results Consequences

Hardness and tractability

Witharbitrary annotations

Query evaluation is#P-hardeven with a single fact (Immediate reduction from #SAT)

Withsimple annotations (one unique event per tuple)

Query evaluation is#P-hardon arbitrary instances (Use the instance to do the reduction)

Existing work:

Fix asimpleannotation scheme

Showdichotomybetween #P-hard and PTIME queries Our approach:

Find arestrictionon the instance and annotations

Show thatmany queriesare tractable in this case

(17)

Background Ideas Results Consequences

Bounded treewidth

An idea from instances without probabilities...

If an instance has low treewidth then it is almost a tree Assume that the instance treewidth isconstant...

Linear time data complexity

(18)

Background Ideas Results Consequences

Bounded treewidth

An idea from instances without probabilities...

If an instance has low treewidth then it is almost a tree Assume that the instance treewidth isconstant...

instance I

R(a, b) R(b, c) S(c)

Linear time data complexity

(19)

Background Ideas Results Consequences

Bounded treewidth

An idea from instances without probabilities...

If an instance has low treewidth then it is almost a tree Assume that the instance treewidth isconstant...

instance I

R(a, b) R(b, c) S(c) tree encoding TI tree decomposition

O(|I|) for fixed width instance I

R(a, b) R(b, c) S(c)

Linear time data complexity

(20)

Background Ideas Results Consequences

Bounded treewidth

An idea from instances without probabilities...

If an instance has low treewidth then it is almost a tree Assume that the instance treewidth isconstant...

instance I

R(a, b) R(b, c) S(c) tree encoding TI tree decomposition

O(|I|) for fixed width instance I

R(a, b) R(b, c) S(c)

query q

∃xy R(x, y) ∧ S(y)

tree encoding TI tree decomposition

O(|I|) for fixed width

Linear time data complexity

(21)

Background Ideas Results Consequences

Bounded treewidth

An idea from instances without probabilities...

If an instance has low treewidth then it is almost a tree Assume that the instance treewidth isconstant...

instance I

R(a, b) R(b, c) S(c) tree encoding TI tree decomposition

O(|I|) for fixed width instance I

R(a, b) R(b, c) S(c)

query q

∃xy R(x, y) ∧ S(y)

tree encoding TI tree decomposition

O(|I|) for fixed width

deterministic tree automaton Aq

rewriting O(1) data complexity

query q

∃xy R(x, y) ∧ S(y)

Linear time data complexity

(22)

Background Ideas Results Consequences

Bounded treewidth

An idea from instances without probabilities...

If an instance has low treewidth then it is almost a tree Assume that the instance treewidth isconstant...

instance I

R(a, b) R(b, c) S(c) tree encoding TI tree decomposition

O(|I|) for fixed width instance I

R(a, b) R(b, c) S(c)

query q

∃xy R(x, y) ∧ S(y)

tree encoding TI tree decomposition

O(|I|) for fixed width

deterministic tree automaton Aq

rewriting O(1) data complexity

query q

∃xy R(x, y) ∧ S(y) deterministic tree automaton Aq

rewriting O(1) data complexity

evaluation linear time

query answer

Linear time data complexity

(23)

Background Ideas Results Consequences

Bounded treewidth

An idea from instances without probabilities...

If an instance has low treewidth then it is almost a tree Assume that the instance treewidth isconstant...

instance I

R(a, b) R(b, c) S(c) tree encoding TI tree decomposition

O(|I|) for fixed width instance I

R(a, b) R(b, c) S(c)

query q

∃xy R(x, y) ∧ S(y)

tree encoding TI tree decomposition

O(|I|) for fixed width

deterministic tree automaton Aq

rewriting O(1) data complexity

query q

∃xy R(x, y) ∧ S(y) deterministic tree automaton Aq

rewriting O(1) data complexity

evaluation linear time

query answer evaluation

linear time

query answer

Linear time data complexity

(24)

Background Ideas Results Consequences

Bounded treewidth

An idea from instances without probabilities...

If an instance has low treewidth then it is almost a tree Assume that the instance treewidth isconstant...

instance I

R(a, b) R(b, c) S(c) tree encoding TI tree decomposition

O(|I|) for fixed width instance I

R(a, b) R(b, c) S(c)

query q

∃xy R(x, y) ∧ S(y)

tree encoding TI tree decomposition

O(|I|) for fixed width

deterministic tree automaton Aq

rewriting O(1) data complexity

query q

∃xy R(x, y) ∧ S(y) deterministic tree automaton Aq

rewriting O(1) data complexity

evaluation linear time

query answer evaluation

linear time

query answer

Linear time data complexity

(25)

Background Ideas Results Consequences

Tractable inference

An idea from probabilities without instances...

Represent a propositional formulaF as a Boolean circuit Assume the circuit has constant treewidth

Probability of F can be computed in linear time (usingjunction tree algorithm for Bayesian networks) (assuming constant-time arithmetic operations)

(26)

Background Ideas Results Consequences

cc-tables

Boolean circuit for the annotations

1/2 1/2

1/2

R(a, b) R(b, c)

R(c, d) Circuitmust have low treewidth Instance must have low treewidth

Needsimultaneous decomposition

(27)

Background Ideas Results Consequences

cc-tables

Boolean circuit for the annotations 1/2

1/2 1/2

R(a, b) R(b, c) R(c, d)

Circuitmust have low treewidth Instance must have low treewidth

Needsimultaneous decomposition

(28)

Background Ideas Results Consequences

cc-tables

Boolean circuit for the annotations 1/2

1/2 1/2

R(a, b) R(b, c)

R(c, d) Circuitmust have low treewidth Instance must have low treewidth

Needsimultaneous decomposition

(29)

Background Ideas Results Consequences

cc-tables

Boolean circuit for the annotations 1/2

1/2 1/2

R(a, b) R(b, c)

R(c, d) Circuitmust have low treewidth Instance must have low treewidth

Needsimultaneous decomposition

1/2 R(a, b) 1/2

R(b, c)

1/2

R(c, d)

(30)

Background Ideas Results Consequences

Main result

instance I

1/2 1/2 1/2

R(a, b) R(b, c) R(c, d)

(31)

Background Ideas Results Consequences

Main result

instance I

1/2 1/2 1/2

R(a, b) R(b, c) R(c, d)

instance I

1/2 1/2 1/2

R(a, b) R(b, c) R(c, d)

tree encoding TI

tree decomposition O(|I|) for fixed width

1/2 R(a, b) 1/2

R(b, c)

1/2

R(c, d)

(32)

Background Ideas Results Consequences

Main result

instance I

1/2 1/2 1/2

R(a, b) R(b, c) R(c, d)

instance I

1/2 1/2 1/2

R(a, b) R(b, c) R(c, d)

tree encoding TI

tree decomposition O(|I|) for fixed width

1/2 R(a, b) 1/2

R(b, c)

1/2

R(c, d)

tree encoding TI

tree decomposition O(|I|) for fixed width

1/2 R(a, b) 1/2

R(b, c)

1/2

R(c, d)

query q

xy R(x, y) S(y)

(33)

Background Ideas Results Consequences

Main result

instance I

1/2 1/2 1/2

R(a, b) R(b, c) R(c, d)

instance I

1/2 1/2 1/2

R(a, b) R(b, c) R(c, d)

tree encoding TI

tree decomposition O(|I|) for fixed width

1/2 R(a, b) 1/2

R(b, c)

1/2

R(c, d)

tree encoding TI

tree decomposition O(|I|) for fixed width

1/2 R(a, b) 1/2

R(b, c)

1/2

R(c, d)

query q

xy R(x, y) S(y)

query q

xy R(x, y) S(y)

deterministic tree automaton Aq rewriting

O(1) data complexity

(34)

Background Ideas Results Consequences

Main result

instance I

1/2 1/2 1/2

R(a, b) R(b, c) R(c, d)

instance I

1/2 1/2 1/2

R(a, b) R(b, c) R(c, d)

tree encoding TI

tree decomposition O(|I|) for fixed width

1/2 R(a, b) 1/2

R(b, c)

1/2

R(c, d)

tree encoding TI

tree decomposition O(|I|) for fixed width

1/2 R(a, b) 1/2

R(b, c)

1/2

R(c, d)

query q

xy R(x, y) S(y)

query q

xy R(x, y) S(y)

deterministic tree automaton Aq rewriting

O(1) data complexity

deterministic tree automaton Aq rewriting

O(1) data complexity

instrumentation linear time

bounded treewidth circuit C

1/2

1/2

1/2

A A

A

(35)

Background Ideas Results Consequences

Main result

instance I

1/2 1/2 1/2

R(a, b) R(b, c) R(c, d)

instance I

1/2 1/2 1/2

R(a, b) R(b, c) R(c, d)

tree encoding TI

tree decomposition O(|I|) for fixed width

1/2 R(a, b) 1/2

R(b, c)

1/2

R(c, d)

tree encoding TI

tree decomposition O(|I|) for fixed width

1/2 R(a, b) 1/2

R(b, c)

1/2

R(c, d)

query q

xy R(x, y) S(y)

query q

xy R(x, y) S(y)

deterministic tree automaton Aq rewriting

O(1) data complexity

deterministic tree automaton Aq rewriting

O(1) data complexity

instrumentation linear time

bounded treewidth circuit C

1/2

1/2

1/2

A A

A

instrumentation linear time

bounded treewidth circuit C

1/2

1/2

1/2

A A

A

probability p probabilistic inference O(|C|) for fixed width

0.42

(36)

Background Ideas Results Consequences

Main result

instance I

1/2 1/2 1/2

R(a, b) R(b, c) R(c, d)

instance I

1/2 1/2 1/2

R(a, b) R(b, c) R(c, d)

tree encoding TI

tree decomposition O(|I|) for fixed width

1/2 R(a, b) 1/2

R(b, c)

1/2

R(c, d)

tree encoding TI

tree decomposition O(|I|) for fixed width

1/2 R(a, b) 1/2

R(b, c)

1/2

R(c, d)

query q

xy R(x, y) S(y)

query q

xy R(x, y) S(y)

deterministic tree automaton Aq rewriting

O(1) data complexity

deterministic tree automaton Aq rewriting

O(1) data complexity

instrumentation linear time

bounded treewidth circuit C

1/2

1/2

1/2

A A

A

instrumentation linear time

bounded treewidth circuit C

1/2

1/2

1/2

A A

A

probability p probabilistic inference O(|C|) for fixed width

0.42

probability p probabilistic inference O(|C|) for fixed width

0.42

(37)

Background Ideas Results Consequences

Consequences

For queries representable as deterministic automata...

CQs

Monadic second-order

Guarded second-order

... on various probabilistic models...

Tuple-independent tables

Block-independent disjoint tables

pc-tables (presented before)

Probabilistic XML

... assuming bounded treewidth(for reasonable definitions) ...

... probability of fixed q can be computed in O(|bI|)!

(38)

Background Ideas Results Consequences

Consequences

For queries representable as deterministic automata...

CQs

Monadic second-order

Guarded second-order

... on various probabilistic models...

Tuple-independent tables

Block-independent disjoint tables

pc-tables (presented before)

Probabilistic XML

... assuming bounded treewidth(for reasonable definitions) ...

... probability of fixed q can be computed in O(|bI|)!

(39)

Background Ideas Results Consequences

Consequences

For queries representable as deterministic automata...

CQs

Monadic second-order

Guarded second-order

... on various probabilistic models...

Tuple-independent tables

Block-independent disjoint tables

pc-tables (presented before)

Probabilistic XML

... assuming bounded treewidth(for reasonable definitions) ...

... probability of fixed q can be computed in O(|bI|)!

(40)

Background Ideas Results Consequences

Consequences

For queries representable as deterministic automata...

CQs

Monadic second-order

Guarded second-order

... on various probabilistic models...

Tuple-independent tables

Block-independent disjoint tables

pc-tables (presented before)

Probabilistic XML

... assuming bounded treewidth(for reasonable definitions) ...

... probability of fixed q can be computed in O(|bI|)!

(41)

Background Ideas Results Consequences

Conclusion

We can combinethe following techniques:

Computingtree decompositions

Encoding problems toautomataon tree encodings of instances Evaluatingprobabilitieson bounded-treewidth circuits

Applications:

Tractable probabilistic query evaluation inpractice? Reasoning underuncertain rules

(hence the bait-and-switch on the title...) Questions:

Othersemiringsthan Boolean AND/OR? Other tasks thanprobabilistic inference? 5

votes

0

answers

48

views

What are bounded-treewidth circuits good for?

circuit-complexity pr.probability treewidth arithmetic-circuits

modified aug 28 at 13:05a3nm1,432 http://cstheory.stackexchange.com/q/25624

Thanks for your attention!

(42)

Background Ideas Results Consequences

Conclusion

We can combinethe following techniques:

Computingtree decompositions

Encoding problems toautomataon tree encodings of instances Evaluatingprobabilitieson bounded-treewidth circuits

Applications:

Tractable probabilistic query evaluation inpractice?

Reasoning underuncertain rules

(hence the bait-and-switch on the title...)

Questions:

Othersemiringsthan Boolean AND/OR? Other tasks thanprobabilistic inference? 5

votes

0

answers

48

views

What are bounded-treewidth circuits good for?

circuit-complexity pr.probability treewidth arithmetic-circuits

modified aug 28 at 13:05a3nm1,432 http://cstheory.stackexchange.com/q/25624

Thanks for your attention!

(43)

Background Ideas Results Consequences

Conclusion

We can combinethe following techniques:

Computingtree decompositions

Encoding problems toautomataon tree encodings of instances Evaluatingprobabilitieson bounded-treewidth circuits

Applications:

Tractable probabilistic query evaluation inpractice?

Reasoning underuncertain rules

(hence the bait-and-switch on the title...) Questions:

Othersemiringsthan Boolean AND/OR?

Other tasks thanprobabilistic inference?

5

votes

0

answers

48

views

What are bounded-treewidth circuits good for?

circuit-complexity pr.probability treewidth arithmetic-circuits

modified aug 28 at 13:05a3nm1,432 http://cstheory.stackexchange.com/q/25624

Thanks for your attention!

(44)

Background Ideas Results Consequences

Conclusion

We can combinethe following techniques:

Computingtree decompositions

Encoding problems toautomataon tree encodings of instances Evaluatingprobabilitieson bounded-treewidth circuits

Applications:

Tractable probabilistic query evaluation inpractice?

Reasoning underuncertain rules

(hence the bait-and-switch on the title...) Questions:

Othersemiringsthan Boolean AND/OR?

Other tasks thanprobabilistic inference?

5

votes

0

answers

48

views

What are bounded-treewidth circuits good for?

circuit-complexity pr.probability treewidth arithmetic-circuits

modified aug 28 at 13:05a3nm1,432 http://cstheory.stackexchange.com/q/25624

Thanks for your attention!

(45)

Background Ideas Results Consequences

Conclusion

We can combinethe following techniques:

Computingtree decompositions

Encoding problems toautomataon tree encodings of instances Evaluatingprobabilitieson bounded-treewidth circuits

Applications:

Tractable probabilistic query evaluation inpractice?

Reasoning underuncertain rules

(hence the bait-and-switch on the title...) Questions:

Othersemiringsthan Boolean AND/OR?

Other tasks thanprobabilistic inference?

5

votes

0

answers

48

views

What are bounded-treewidth circuits good for?

circuit-complexity pr.probability treewidth arithmetic-circuits

modified aug 28 at 13:05a3nm1,432 http://cstheory.stackexchange.com/q/25624

Thanks for your attention!

Références

Documents relatifs

A novel Ontology Alignment Evaluation Initiative (OAEI) track, Ontology Alignment for Query Answering (OA4QA), introduced in the 2014 evaluation cam- paign, aims at bridging this gap

We study the data complexity of conjunctive query answering under these new semantics, and show a general tractability result for all known first-order rewritable ontology languages.

However, source data are generally not under the control of the data integration process, thus integrated data may violate global integrity constraints even in presence

An important result of these works is that the main variants of DL-Lite are not only decidable, but that answering (unions of) conjunctive queries for these logics is in LOGSPACE ,

While in the general case the complexity is the same as for WGTGDs, interestingly, when reasoning with a fixed set of dependencies (which is the usual setting in data ex- change and

In this talk, we focus on rule-based ontol- ogy languages, and we demonstrate the tight connection between classical and consistent query answering.. More precisely, we focus on

In this work, which is a short version of [10], we performed an in-depth complexity analysis of the problem of consistent query answering under the main decidable classes of

For example, the function S ⊥ that always returns an empty set of finite po- sitions, WChS(S ⊥ ) is the class of sticky programs, because stickiness must hold no matter what