• Aucun résultat trouvé

Leveraging the structure of uncertain data

N/A
N/A
Protected

Academic year: 2022

Partager "Leveraging the structure of uncertain data"

Copied!
159
0
0

Texte intégral

(1)

Leveraging the structure of uncertain data

Antoine Amarilli May 26, 2017

(2)

Example application: Subway routing

(3)

Example application: Subway routing

(4)

Example application: Subway routing

(5)

Example application: Subway routing

(6)

Example application: Subway routing

(7)

Database theory and query evaluation

13 6

23 14

Database

• (Hyper)graph

• Collection of ground facts

G(aa1, ab2), G(ab2, ac3), S(aa1, m4), S(ab2, rB), ...

(8)

Database theory and query evaluation

13 6

23 14

Query

?

+

Database

• (Hyper)graph

• Collection of ground facts

G(aa ab G(ab ac

• Regular path

∀X(rm ∈ X ∧ ∀xy (Metro|RER)*

• Logic formula

|(Bus|Tram)*

(9)

Database theory and query evaluation

13 6

23 14

Query

?

+

Database

• (Hyper)graph

• Collection of ground facts

G(aa1, ab2), G(ab2, ac3), S(aa1, m4), S(ab2, rB), ...

• Regular path

∀X(rm ∈ X ∧ ∀xy

i Result

• TRUE/FALSE

↔ Model checking (Metro|RER)*

• Logic formula

(x ∈ X ∧ G(x, y) → y ∈ X)) → gn ∈ X

|(Bus|Tram)*

(10)

Probabilistic query evaluation

13 6

23 14

(11)

Probabilistic query evaluation

(12)

Probabilistic query evaluation

(13)

Probabilistic query evaluation

(14)

Probabilistic query evaluation

(15)

Probabilistic query evaluation

(16)

Probabilistic query evaluation

(17)

Probabilistic query evaluation

13 6

23 14

(18)

Probabilistic query evaluation

13 6

23 95% 14

5% 60

98%

2% 60

(19)

Probabilistic query evaluation

• (Hyper)graph

• Collection of ground facts

Probabilistic database

+ independent probabilities

13 6

23 95%14

5% 60

98%

2% 60

(20)

Probabilistic query evaluation

Query

?

+

• (Hyper)graph

• Collection of ground facts

• Regular path

∀X(rm ∈ X ∧ ∀xy (Metro|RER)*

• Logic formula

|(Bus|Tram)*

Probabilistic database

+ independent

13 6

23 95%14

5% 60

98%

2% 60

(21)

Probabilistic query evaluation

Query

?

+

• (Hyper)graph

• Collection of ground facts

• Regular path

∀X(rm ∈ X ∧ ∀xy (Metro|RER)*

• Logic formula

(x ∈ X ∧ G(x, y) → y ∈ X)) → gn ∈ X

|(Bus|Tram)*

Probabilistic database

+ independent probabilities

Probabilistic i

Result

• Probability according to the input distribution

13 6

23 95%14

5% 60

98%

2% 60

proba to be on time: 98%

(22)

Computational complexity

Computing paths on a large graph:

#P-hardcomputational complexity in thedatabase

(23)

Computational complexity

50%

90%

42%

37%

90%

83%

78%

72%

Computing paths on a largeprobabilisticgraph:

???

#P-hardcomputational complexity in thedatabase

(24)

Computational complexity

50%

90%

42%

37%

90%

83%

78%

72%

Computing paths on a largeprobabilisticgraph:

#P-hardcomputational complexity in thedatabase

(25)

Computational complexity

50%

90%

42%

37%

90%

83%

78%

72%

Computing paths on a largeprobabilisticgraph:

Exponentialnumber of possibilities

(26)

Idea: use the structure of data

→ Shortest path:very easyon alarge tree

(27)

Idea: use the structure of data

→ Shortest path:very easyon alarge tree

(28)

Idea: use the structure of data

→ Shortest path:very easyon alarge tree

(29)

Idea: use the structure of data

→ Shortest path:very easyon alarge tree

(30)

Idea: use the structure of data

→ Shortest path:very easyon alarge tree

(31)

Idea: use the structure of data

→ Shortest path:very easyon alarge tree

(32)

Idea: use the structure of data

(33)

Leveraging the structure of uncertain data

Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?

In this talk:

Existing results onnon-probabilistic data:

Tree automata, to evaluate queries on trees

Treewidth, formalizes the notion of being “close to a tree”

Courcelle’s theorem

Introducenew toolsandresults:

Provenance circuitsof tree automata on uncertain trees

• Applications toprobabilistic query evaluation

Other applications:Counting, enumeration, provenance...

(34)

Leveraging the structure of uncertain data

Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?

In this talk:

Existing results onnon-probabilistic data:

Tree automata, to evaluate queries on trees

Treewidth, formalizes the notion of being “close to a tree”

Courcelle’s theorem

Introducenew toolsandresults:

Provenance circuitsof tree automata on uncertain trees

• Applications toprobabilistic query evaluation

Other applications:Counting, enumeration, provenance...

(35)

Leveraging the structure of uncertain data

Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?

In this talk:

Existing results onnon-probabilistic data:

Tree automata, to evaluate queries on trees

Treewidth, formalizes the notion of being “close to a tree”

Courcelle’s theorem

Introducenew toolsandresults:

Provenance circuitsof tree automata on uncertain trees

• Applications toprobabilistic query evaluation

Other applications:Counting, enumeration, provenance...

(36)

Leveraging the structure of uncertain data

Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?

In this talk:

Existing results onnon-probabilistic data:

Tree automata, to evaluate queries on trees

Treewidth, formalizes the notion of being “close to a tree”

Courcelle’s theorem

Introducenew toolsandresults:

Provenance circuitsof tree automata on uncertain trees

• Applications toprobabilistic query evaluation

Other applications:Counting, enumeration, provenance...

(37)

Leveraging the structure of uncertain data

Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?

In this talk:

Existing results onnon-probabilistic data:

Tree automata, to evaluate queries on trees

Treewidth, formalizes the notion of being “close to a tree”

Courcelle’s theorem

Introducenew toolsandresults:

Provenance circuitsof tree automata on uncertain trees

• Applications toprobabilistic query evaluation

Other applications:Counting, enumeration, provenance...

(38)

Leveraging the structure of uncertain data

Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?

In this talk:

Existing results onnon-probabilistic data:

Tree automata, to evaluate queries on trees

Treewidth, formalizes the notion of being “close to a tree”

Courcelle’s theorem

Introducenew toolsandresults:

Provenance circuitsof tree automata on uncertain trees

• Applications toprobabilistic query evaluation

Other applications:Counting, enumeration, provenance...

(39)

Leveraging the structure of uncertain data

Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?

In this talk:

Existing results onnon-probabilistic data:

Tree automata, to evaluate queries on trees

Treewidth, formalizes the notion of being “close to a tree”

Courcelle’s theorem

Introducenew toolsandresults:

Provenance circuitsof tree automata on uncertain trees

• Applications toprobabilistic query evaluation

Other applications:Counting, enumeration, provenance...

(40)

Leveraging the structure of uncertain data

Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?

In this talk:

Existing results onnon-probabilistic data:

Tree automata, to evaluate queries on trees

Treewidth, formalizes the notion of being “close to a tree”

Courcelle’s theorem

Introducenew toolsandresults:

Provenance circuitsof tree automata on uncertain trees

• Applications toprobabilistic query evaluation

Other applications:Counting, enumeration, provenance...

(41)

Leveraging the structure of uncertain data

Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?

In this talk:

Existing results onnon-probabilistic data:

Tree automata, to evaluate queries on trees

Treewidth, formalizes the notion of being “close to a tree”

Courcelle’s theorem

Introducenew toolsandresults:

Provenance circuitsof tree automata on uncertain trees

• Applications toprobabilistic query evaluation

Other applications:Counting, enumeration, provenance...

(42)

Table of contents

Introduction Existing tools

Provenance circuits and probabilistic query evaluation Other applications

(43)

Non-probabilistic query evaluation on trees

Database: atreeT where nodes have a color from an alphabet

?

QueryQ: asentencein monadic second-order logic (MSO)

P (x)means “xis blue”

xymeans “xis the parent ofy

“Is there both a pink and a blue node?”

∃x y P (x)∧P (y)

i

Result: TRUE/FALSE indicating if the treeT satisfies the queryQ

Computational complexityas a function ofT (the queryQisfixed)

(44)

Non-probabilistic query evaluation on trees

Database: atreeT where nodes have a color from an alphabet

?

QueryQ: asentencein monadic

second-order logic (MSO)

P (x)means “xis blue”

xymeans “xis the parent ofy

“Is there both a pink and a blue node?”

∃x y P (x)∧P (y)

i

Result: TRUE/FALSE indicating if the treeT satisfies the queryQ

Computational complexityas a function ofT (the queryQisfixed)

(45)

Non-probabilistic query evaluation on trees

Database: atreeT where nodes have a color from an alphabet

?

QueryQ: asentencein monadic

second-order logic (MSO)

P (x)means “xis blue”

xymeans “xis the parent ofy

“Is there both a pink and a blue node?”

∃x y P (x)∧P (y)

i

Result: TRUE/FALSE indicating if the treeT satisfies the queryQ

Computational complexityas a function ofT (the queryQisfixed)

(46)

Non-probabilistic query evaluation on trees

Database: atreeT where nodes have a color from an alphabet

?

QueryQ: asentencein monadic

second-order logic (MSO)

P (x)means “xis blue”

xymeans “xis the parent ofy

“Is there both a pink and a blue node?”

∃x y P (x)∧P (y)

i

Result: TRUE/FALSE indicating if the treeT satisfies the queryQ

Computational complexityas a function ofT (the queryQisfixed)

(47)

Tree automata Tree alphabet:

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states:{>}

Initial function: ⊥ P B

Transitions(examples): P

P

> B P

Theorem [Thatcher and Wright, 1968]

MSOandtree automatahave the sameexpressive poweron trees Corollary

Non-probabilistic query evaluation of MSO on trees is inlinear time.

(48)

Tree automata

Tree alphabet:

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states:{>}

Initial function: ⊥ P B

Transitions(examples): P

P

> B P

Theorem [Thatcher and Wright, 1968]

MSOandtree automatahave the sameexpressive poweron trees Corollary

Non-probabilistic query evaluation of MSO on trees is inlinear time.

(49)

Tree automata

Tree alphabet:

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states:{>}

Initial function: ⊥ P B

Transitions(examples): P

P

> B P

Theorem [Thatcher and Wright, 1968]

MSOandtree automatahave the sameexpressive poweron trees Corollary

Non-probabilistic query evaluation of MSO on trees is inlinear time.

(50)

Tree automata

Tree alphabet:

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states:{>}

Initial function: ⊥ P B

Transitions(examples): P

P

> B P

Theorem [Thatcher and Wright, 1968]

MSOandtree automatahave the sameexpressive poweron trees Corollary

Non-probabilistic query evaluation of MSO on trees is inlinear time.

(51)

Tree automata

Tree alphabet:

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states:{>}

Initial function: ⊥ P B

Transitions(examples): P

P

> B P

Theorem [Thatcher and Wright, 1968]

MSOandtree automatahave the sameexpressive poweron trees Corollary

Non-probabilistic query evaluation of MSO on trees is inlinear time.

(52)

Tree automata Tree alphabet:

B

P

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states:{>}

Initial function: ⊥ P B

Transitions(examples): P

P

> B P

Theorem [Thatcher and Wright, 1968]

MSOandtree automatahave the sameexpressive poweron trees Corollary

Non-probabilistic query evaluation of MSO on trees is inlinear time.

(53)

Tree automata Tree alphabet:

B

P

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states:{>}

Initial function: ⊥ P B

Transitions(examples):

P P

>

B P

Theorem [Thatcher and Wright, 1968]

MSOandtree automatahave the sameexpressive poweron trees Corollary

Non-probabilistic query evaluation of MSO on trees is inlinear time.

(54)

Tree automata Tree alphabet:

B

B

P

P

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states:{>}

Initial function: ⊥ P B

Transitions(examples):

P P

>

B P

Theorem [Thatcher and Wright, 1968]

MSOandtree automatahave the sameexpressive poweron trees Corollary

Non-probabilistic query evaluation of MSO on trees is inlinear time.

(55)

Tree automata Tree alphabet:

>

B

B

P

P

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states:{>}

Initial function: ⊥ P B

Transitions(examples):

P P

>

B P

Theorem [Thatcher and Wright, 1968]

MSOandtree automatahave the sameexpressive poweron trees Corollary

Non-probabilistic query evaluation of MSO on trees is inlinear time.

(56)

Tree automata Tree alphabet:

>

B

B

P

P

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states:{>}

Initial function: ⊥ P B

Transitions(examples):

P P

>

B P

Theorem [Thatcher and Wright, 1968]

MSOandtree automatahave the sameexpressive poweron trees

Corollary

Non-probabilistic query evaluation of MSO on trees is inlinear time.

(57)

Tree automata Tree alphabet:

>

B

B

P

P

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states:{>}

Initial function: ⊥ P B

Transitions(examples):

P P

>

B P

Theorem [Thatcher and Wright, 1968]

MSOandtree automatahave the sameexpressive poweron trees Corollary

(58)

Non-probabilistic query evaluation on trees

Database: atreeT where nodes have a color from an alphabet

?

QueryQ: asentencein monadic

second-order logic (MSO)

P (x)means “xis blue”

xymeans “xis the parent ofy

“Is there both a pink and a blue node?”

∃x y P (x)∧P (y)

i

Result: TRUE/FALSE indicating ifTsatisfies the queryQ

Computational complexityas a function of thetreeT (the queryQisfixed)

(59)

Non-probabilistic query evaluation on treelike data

Database: atreelike databaseT

???

?

QueryQ: asentencein monadic

second-order logic (MSO)

P (x)means “xis blue”

xymeans “xis the parent ofy

(Metro|RER)

|(Bus|Tram)

i

Result: TRUE/FALSE indicating ifTsatisfies the queryQ

Computational complexityas a function of thetreeT (the queryQisfixed)

(60)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliquesand(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(61)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliquesand(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(62)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliquesand(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(63)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliquesand(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(64)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliquesand(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(65)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliquesand(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(66)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliquesand(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(67)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliquesand(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(68)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliquesand(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(69)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliquesand(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(70)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliquesand(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(71)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliquesand(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(72)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliquesand(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(73)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliquesand(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(74)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliques and(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(75)

Treewidth

Treewidth by example:

Treeshave treewidth1

Cycleshave treewidth2

k-cliques and(k1)-gridshave treewidthk−1

→ Treelike: thetreewidthis bounded by aconstant

(76)

Courcelle’s theorem

(RER|metro)*

|(bus|tram)*

MSO query Treelike data

Theorem [Courcelle, 1990]

For any fixed Boolean MSO queryQandk∈N, given a databaseDof treewidthk,

we can compute inlinear timeinDwhetherDsatisfiesQ

(77)

Courcelle’s theorem

Tree automaton (RER|metro)*

|(bus|tram)*

MSO query Treelike data

Theorem [Courcelle, 1990]

For any fixed Boolean MSO queryQandk∈N, given a databaseDof treewidthk,

we can compute inlinear timeinDwhetherDsatisfiesQ

(78)

Courcelle’s theorem

Tree automaton Tree encoding

(RER|metro)*

|(bus|tram)*

MSO query Treelike data

linear

Theorem [Courcelle, 1990]

For any fixed Boolean MSO queryQandk∈N, given a databaseDof treewidthk,

we can compute inlinear timeinDwhetherDsatisfiesQ

(79)

Courcelle’s theorem

Tree automaton Tree encoding

(RER|metro)*

|(bus|tram)*

MSO query Treelike data

Query answer

TRUE

linear linear

Theorem [Courcelle, 1990]

For any fixed Boolean MSO queryQandk∈N, given a databaseDof treewidthk,

we can compute inlinear timeinDwhetherDsatisfiesQ

(80)

Courcelle’s theorem

Tree automaton Tree encoding

(RER|metro)*

|(bus|tram)*

MSO query Treelike data

Query answer

TRUE

linear linear

Theorem [Courcelle, 1990]

For any fixed Boolean MSO queryQandk∈N, given a databaseDof treewidthk,

(81)

Table of contents

Introduction Existing tools

Provenance circuits and probabilistic query evaluation Other applications

(82)

Probabilistic query evaluation on treelike data

DatabaseDwithtreewidthk

for some constantk

Probabilityof each fact ofD to be actually present in the data (independently from other facts)

?

QueryQ: asentencein monadic

second-order logic (MSO)

(Metro|RER)

|(Bus|Tram)

i

Result: Probabilitythat the databaseDsatisfies queryQ

Computational complexityas a function of thedatabaseD (the queryQisfixed)

(83)

Probabilistic query evaluation on treelike data

DatabaseDwithtreewidthk

for some constantk

Probabilityof each fact ofD to be actually present in the data (independently from other facts)

?

QueryQ: asentencein monadic

second-order logic (MSO)

(Metro|RER)

|(Bus|Tram)

i

Result: Probabilitythat the databaseDsatisfies queryQ

Computational complexityas a function of thedatabaseD (the queryQisfixed)

(84)

Probabilistic query evaluation on treelike data

DatabaseDwithtreewidthk

for some constantk

Probabilityof each fact ofD to be actually present in the data (independently from other facts)

?

QueryQ: asentencein monadic

second-order logic (MSO)

(Metro|RER)

|(Bus|Tram)

i

Result: Probabilitythat the databaseDsatisfies queryQ

Computational complexityas a function of thedatabaseD (the queryQisfixed)

(85)

Probabilistic query evaluation on treelike data

DatabaseDwithtreewidthk

for some constantk

Probabilityof each fact ofD to be actually present in the data (independently from other facts)

?

QueryQ: asentencein monadic

second-order logic (MSO)

(Metro|RER)

|(Bus|Tram)

i

Result: Probabilitythat the databaseDsatisfies queryQ

Computational complexityas a function of thedatabaseD (the queryQisfixed)

(86)

Roadmap

Tree automaton Tree encoding

(RER|metro)*

|(bus|tram)*

MSO query Treelike data

linear

(87)

Roadmap

Tree automaton Tree encoding

(RER|metro)*

|(bus|tram)*

MSO query Treelike data

linear linear

Provenance circuit

(88)

Roadmap

Tree automaton Uncertain tree encoding

(RER|metro)*

|(bus|tram)*

MSO query Probabilistic treelike data

linear linear

Provenance circuit

Probability

95%

linear

+probabilities

(89)

Uncertain trees

1

5

6 7 2

3 4

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{2,3,77→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

(90)

Uncertain trees

1

5

6 7 2

3 4

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels

Valuation:{2,3,77→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

(91)

Uncertain trees

1

5

6 7 2

3 4

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{2,3,77→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

(92)

Uncertain trees

1

5

6 7 2

3 4

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{27→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

(93)

Uncertain trees

1

5

6 7 2

3 4

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{2,77→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

(94)

Uncertain trees

1

5

6 7 2

3 4

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{2,77→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

(95)

Uncertain trees

1

5

6 7 2

3 4

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{2,3,77→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

The tree automatonAaccepts

(96)

Uncertain trees

1

5

6 7 2

3 4

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{27→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

The tree automatonArejects

(97)

Uncertain trees

1

5

6 7 2

3 4

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{2,77→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

The tree automatonAaccepts

(98)

Boolean circuit

¬

x

y

Directed acyclic graph ofgates

Outputgate:

Variablegates: x

Internalgates: ¬

Valuation: function from variables to{0,1} Example:ν ={x7→0, y7→1}... mapped to1

(99)

Boolean circuit

¬

x

y

Directed acyclic graph ofgates

Outputgate:

Variablegates: x

Internalgates: ¬

Valuation: function from variables to{0,1} Example:ν ={x7→0, y7→1}... mapped to1

(100)

Boolean circuit

¬

x

y

Directed acyclic graph ofgates

Outputgate:

Variablegates: x

Internalgates: ¬

Valuation: function from variables to{0,1} Example:ν ={x7→0, y7→1}... mapped to1

(101)

Boolean circuit

¬

x

y

Directed acyclic graph ofgates

Outputgate:

Variablegates: x

Internalgates: ¬

Valuation: function from variables to{0,1} Example:ν ={x7→0, y7→1}... mapped to1

(102)

Boolean circuit

¬

x

y

Directed acyclic graph ofgates

Outputgate:

Variablegates: x

Internalgates: ¬

Valuation: function from variables to{0,1}

Example:ν ={x7→0, y7→1}...

mapped to1

(103)

Boolean circuit

¬

x

y

Directed acyclic graph ofgates

Outputgate:

Variablegates: x

Internalgates: ¬

Valuation: function from variables to{0,1}

Example:ν ={x7→0, y7→1}...

mapped to1

(104)

Boolean circuit

¬

x

y

Directed acyclic graph ofgates

Outputgate:

Variablegates: x

Internalgates: ¬

Valuation: function from variables to{0,1}

Example:ν ={x7→0, y7→1}...

mapped to1

(105)

Boolean circuit

¬

x

y

Directed acyclic graph ofgates

Outputgate:

Variablegates: x

Internalgates: ¬

Valuation: function from variables to{0,1}

Example:ν ={x7→0, y7→1}... mapped to1

(106)

Provenance circuit 1

5 6 7 2

4 3

Query:Is there both a pink and a blue node?

Provenance circuit:

∨ 7

2 3

Formally:

Tree automatonA, uncertain treeT, circuitC

Variable gatesofC: nodes ofT

Condition:Letνbe a valuation ofT, thenν(C)iffAacceptsν(T)

(107)

Provenance circuit 1

5 6 7 2

4 3

Query:Is there both a pink and a blue node?

Provenance circuit:

∨ 7

2 3

Formally:

Tree automatonA, uncertain treeT, circuitC

Variable gatesofC: nodes ofT

Condition:Letνbe a valuation ofT, thenν(C)iffAacceptsν(T)

(108)

Provenance circuit 1

5 6 7 2

4 3

Query:Is there both a pink and a blue node?

Provenance circuit:

∨ 7

2 3

Formally:

Tree automatonA, uncertain treeT, circuitC

Variable gatesofC: nodes ofT

Condition:Letνbe a valuation ofT, thenν(C)iffAacceptsν(T)

(109)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

> P

P P

n

n

∨ ∨ ∨ ∨

⊥ B P >

∨ ∨ ∨ ∨

⊥ B P >

∨ ∨ ∨ ∨

⊥ B P >

¬ ∧

(110)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

>

P

P P

n

n

∨ ∨ ∨ ∨

⊥ B P >

∨ ∨ ∨ ∨

⊥ B P >

∨ ∨ ∨ ∨

⊥ B P >

¬ ∧

(111)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

>

P

P P

n

n

∨ ∨ ∨ ∨

⊥ B P >

∨ ∨ ∨ ∨

⊥ B P >

∨ ∨ ∨ ∨

⊥ B P >

¬ ∧

(112)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

>

P

P P

n

n

∨ ∨ ∨ ∨

⊥ B P >

∨ ∨ ∨ ∨

⊥ B P >

∨ ∨ ∨ ∨

⊥ B P >

¬ ∧

(113)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

>

P

P P

n

n

∨ ∨ ∨ ∨

⊥ B P >

∨ ∨ ∨ ∨

⊥ B P >

∨ ∨ ∨ ∨

⊥ B P >

¬ ∧

(114)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

>

P

P P

n

n

∨ ∨ ∨ ∨

⊥ B P >

∨ ∨ ∨ ∨

⊥ B P >

∨ ∨ ∨ ∨

⊥ B P >

¬ ∧

(115)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

>

P

P P

n

n

∨ ∨ ∨ ∨

⊥ B P >

∨ ∨ ∨ ∨

⊥ B P >

∨ ∨ ∨ ∨

⊥ B P >

¬ ∧

(116)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

>

P

P P

n

n

∨ ∨ ∨ ∨

⊥ B P >

¬ ∧

(117)

Probabilistic query evaluation

Tree automaton Uncertain tree encoding

(RER|metro)*

|(bus|tram)*

MSO query Probabilistic treelike data

linear linear

Provenance circuit

Probability

95%

linear

+probabilities

(118)

Details of the approach

Probabilistic treelike data

Each fact can disappear with some probability

→ How to computeefficientlythe probability of the circuit?

(119)

Details of the approach

Uncertain tree encoding Probabilistic

treelike data

Each fact can disappear with some probability

Each node label can disappear with the probability of the coded fact

→ How to computeefficientlythe probability of the circuit?

(120)

Details of the approach

Uncertain tree encoding Probabilistic

treelike data

Each fact can disappear with some probability

Each node label can disappear with the probability of the coded fact

Each variable can be true with the probability of the coded fact

Provenance circuit +probabilities

→ How to computeefficientlythe probability of the circuit?

(121)

Details of the approach

Uncertain tree encoding Probabilistic

treelike data

Each fact can disappear with some probability

Each node label can disappear with the probability of the coded fact

Each variable can be true with the probability of the coded fact

Probability

95%

Probability that the circuit evaluates to true

Provenance circuit +probabilities

→ How to computeefficientlythe probability of the circuit?

(122)

Details of the approach

Uncertain tree encoding Probabilistic

treelike data

Each fact can disappear with some probability

Each node label can disappear with the probability of the coded fact

Each variable can be true with the probability of the coded fact

Probability

95%

Probability that the circuit evaluates to true

Provenance circuit +probabilities

→ How to computeefficientlythe probability of the circuit?

(123)

Computing the circuit probability The circuit is ad-DNNF...

... so probability computation iseasy!

¬ gates only have variablesas inputs

¬ g

g0

P(g) :=1−P(g0)

gates always have mutually exclusiveinputs

g

g01 g02

P(g) :=P(g01) +P(g02)

gates are all on independentinputs

g

g01 g02

P(g) :=P(g01P(g02)

Theorem [Amarilli et al., 2015]

For anyfixedBoolean MSO queryQandk∈N,

given a databaseDoftreewidthkwithindependent probabilities, we can compute inlinear timethe probability thatDsatisfiesQ

(124)

Computing the circuit probability The circuit is ad-DNNF...

... so probability computation iseasy!

¬ gates only have variablesas inputs

¬ g

g0

P(g) :=1−P(g0)

gates always have mutually exclusiveinputs

g

g01 g02

P(g) :=P(g01) +P(g02)

gates are all on independentinputs

g

g01 g02

P(g) :=P(g01P(g02)

Theorem [Amarilli et al., 2015]

For anyfixedBoolean MSO queryQandk∈N,

given a databaseDoftreewidthkwithindependent probabilities, we can compute inlinear timethe probability thatDsatisfiesQ

(125)

Computing the circuit probability The circuit is ad-DNNF...

... so probability computation iseasy!

¬ gates only have variablesas inputs

¬ g

g0

P(g) :=1−P(g0)

gates always have mutually exclusiveinputs

g

g01 g02

P(g) :=P(g01) +P(g02)

gates are all on independentinputs

g

g01 g02

P(g) :=P(g01P(g02)

Theorem [Amarilli et al., 2015]

For anyfixedBoolean MSO queryQandk∈N,

given a databaseDoftreewidthkwithindependent probabilities, we can compute inlinear timethe probability thatDsatisfiesQ

(126)

Computing the circuit probability The circuit is ad-DNNF...

... so probability computation iseasy!

¬ gates only have variablesas inputs

¬ g

g0

P(g) :=1−P(g0)

gates always have mutually exclusiveinputs

g

g01 g02

P(g) :=P(g01) +P(g02)

gates are all on independentinputs

g

g01 g02

P(g) :=P(g01P(g02)

Theorem [Amarilli et al., 2015]

For anyfixedBoolean MSO queryQandk∈N,

given a databaseDoftreewidthkwithindependent probabilities, we can compute inlinear timethe probability thatDsatisfiesQ

(127)

Computing the circuit probability

The circuit is ad-DNNF... ... so probability computation iseasy!

¬ gates only have variablesas inputs

¬ g

g0

P(g) :=1−P(g0)

gates always have mutually exclusiveinputs

g

g01 g02

P(g) :=P(g01) +P(g02)

gates are all on independentinputs

g

g01 g02

P(g) :=P(g01P(g02)

Theorem [Amarilli et al., 2015]

For anyfixedBoolean MSO queryQandk∈N,

given a databaseDoftreewidthkwithindependent probabilities, we can compute inlinear timethe probability thatDsatisfiesQ

(128)

Computing the circuit probability

The circuit is ad-DNNF... ... so probability computation iseasy!

¬ gates only have variablesas inputs

¬ g

g0

P(g) :=1−P(g0)

gates always have mutually exclusiveinputs

g

g01 g02

P(g) :=P(g01) +P(g02)

gates are all on independentinputs

g

g01 g02

P(g) :=P(g01P(g02)

Theorem [Amarilli et al., 2015]

For anyfixedBoolean MSO queryQandk∈N,

given a databaseDoftreewidthkwithindependent probabilities, we can compute inlinear timethe probability thatDsatisfiesQ

Références

Documents relatifs

In addition to merely comparing our implementations of IndIR, CombIR, and CombQuadIR among each other, we also compare these implementations with existing data structures that can

The task of data discovery and extraction is to discover new data sources those are unknown in advance for the query processing according to the data inference configuration.. The

Given that the parser-components of the types of the schema tree are configured correctly to recognize the relevant text parts, we can use the proposed algorithm to create the

If collection data contains tuples with variable dimension length we can save from 55 % (only by using block which support variable dimension length) up to 85 % (for Fi-

Relational databases manage data, represented here as a labeled graph WorksAtA. Antoine Télécom Paris Antoine

• PQE(Q) is #P-hard for any Q closed under homomorphisms unless it is equivalent to a safe UCQ. • PQE(Q) is #P-hard for any self-join-free

Does query evaluation on probabilistic data have lower complexity when the structure of the data is close to a tree.. • Existing tools for

Does query evaluation on probabilistic data have lower complexity when the structure of the data is close to a tree.. • Existing results on