Uncertain Data Management Probabilistic Query Evaluation

Antoine Amarilli (Télécom ParisTech), Silviu Maniu (LRI)

December 5th, 2017

Table of contents

• Probabilistic query evaluation

• Naive evaluation

• Extensional evaluation

• Intensional query evaluation

• Conclusion

Reminder: TID

Remember tuple-independent databases (TID):

U
date  prof  probability
04    S     0.8
04    A     0.2

Remember that they stand for a probabilistic database, here with four possible worlds:

tuples of U in the world    probability
(04, S), (04, A)            0.8 × 0.2
(04, A)                     (1−0.8) × 0.2
(04, S)                     0.8 × (1−0.2)
(none)                      (1−0.8) × (1−0.2)
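The expansion of a TID instance into its possible worlds can be sketched as follows. This is a minimal illustration, not code from the lecture: the dictionary encoding of U and the helper name `possible_worlds` are assumptions, and exact fractions are used so that the probabilities are not blurred by floating-point rounding.

```python
from fractions import Fraction
from itertools import product

# Assumed encoding of the TID instance U: each tuple maps to its probability.
U = {("04", "S"): Fraction(8, 10), ("04", "A"): Fraction(2, 10)}

def possible_worlds(tid):
    """Yield (world, probability) for every subset of the tuples of a TID instance."""
    tuples = list(tid)
    for keep in product([True, False], repeat=len(tuples)):
        world = frozenset(t for t, k in zip(tuples, keep) if k)
        prob = Fraction(1)
        for t, k in zip(tuples, keep):
            # Independence: multiply p for kept tuples and 1 - p for dropped ones.
            prob *= tid[t] if k else 1 - tid[t]
        yield world, prob

worlds = dict(possible_worlds(U))
for world, prob in worlds.items():
    print(sorted(world), prob)
```

The four probabilities sum to 1, as they must for a probability distribution over worlds.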

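Reminder: Relational algebra on probabilistic databases

Union of two probabilistic relations U and V, where π gives the probability of each possible world:

U1 = {(04, S., C017), (11, S., C47)}                      π(U1) = 0.8
U2 = {(11, A., C017)}                                     π(U2) = 0.2

∪

V1 = {} (empty)                                           π(V1) = 0.9
V2 = {(11, A., C017)}                                     π(V2) = 0.1

=

W1 = {(04, S., C017), (11, S., C47)}                      π(W1) = 0.8 × 0.9
W2 = {(04, S., C017), (11, S., C47), (11, A., C017)}      π(W2) = 0.8 × 0.1
W3 = {(11, A., C017)}                                     π(W3) = 0.2 × 0.9 + 0.2 × 0.1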
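As a hedged sketch of this computation, a probabilistic relation can be stored as a map from possible worlds to probabilities, and the union taken world by world. The encoding and the function name `union` are illustrative assumptions; the sketch also assumes that U and V are independent of each other.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed encoding: a probabilistic relation is a dict {possible world: probability}.
U = {
    frozenset({("04", "S.", "C017"), ("11", "S.", "C47")}): Fraction(8, 10),
    frozenset({("11", "A.", "C017")}): Fraction(2, 10),
}
V = {
    frozenset(): Fraction(9, 10),
    frozenset({("11", "A.", "C017")}): Fraction(1, 10),
}

def union(left, right):
    """Union of two independent probabilistic relations, world by world."""
    result = defaultdict(Fraction)
    for lw, lp in left.items():
        for rw, rp in right.items():
            # Identical result worlds are merged by adding their probabilities.
            result[lw | rw] += lp * rp
    return dict(result)

W = union(U, V)
for world, prob in W.items():
    print(sorted(world), prob)
```

The world containing only (11, A., C017) arises from two combinations, which is why its probability is a sum of two products.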

Reminder: strong representation system

• TID are not a strong representation system

• The result of a relational algebra query on a TID database is generally not representable as a TID database

→ Often, we don’t want the entire result

→ We just want to know the probability of each output tuple

Probabilistic query evaluation

• Inputs:

  • a database D of TID instances

  • a relational algebra query Q

  • a result tuple t

• Output: what is the probability that t is in Q(D)?

→ What is the marginal probability of obtaining t as a result?

Probabilistic query evaluation example

TID instance U:
date  prof  probability
04    S     0.8
04    A     0.2

Query Q: π_prof(U)

Tuple t: S

→ The marginal probability is 0.8

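Marginal probabilities vs TID representations

Here’s another example:

TID instance U′:
date  prof  probability
04    S     1
04    A     1

TID instance V:
date  probability
04    0.5

Query Q: π_prof(U′ ⋈ V)

Tuples t: S and A

→ The marginal probability of S is 0.5

→ The marginal probability of A is also 0.5

→ Caution: It does not mean that the result is the following TID instance!

prof  probability
S     0.5
A     0.5

In the actual result, S and A are either both present or both absent, whereas a TID instance would make them independent.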
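The difference between the marginals and the full distribution can be checked numerically. The sketch below is an illustration under assumed names (`result_worlds`, `prob`); it writes out the two possible worlds of the query result by hand rather than deriving them from a query engine.

```python
from fractions import Fraction

# V holds the single tuple (date 04) with probability 1/2; U' is certain.
p_v = Fraction(1, 2)

# Possible worlds of the result of π_prof(U' ⋈ V): V's tuple is present or absent.
result_worlds = {
    frozenset({"S", "A"}): p_v,   # tuple of V present: both S and A are output
    frozenset(): 1 - p_v,         # tuple of V absent: empty output
}

def prob(event, worlds):
    """Total probability of the worlds satisfying the predicate `event`."""
    return sum((p for w, p in worlds.items() if event(w)), Fraction(0))

marginal_S = prob(lambda w: "S" in w, result_worlds)
marginal_A = prob(lambda w: "A" in w, result_worlds)
joint_true = prob(lambda w: {"S", "A"} <= w, result_worlds)

# A TID instance with the same marginals would treat S and A as independent.
joint_if_tid = marginal_S * marginal_A
print(marginal_S, marginal_A, joint_true, joint_if_tid)  # 1/2 1/2 1/2 1/4
```

Both representations agree on the marginals (1/2 each) but disagree on the probability that S and A appear together (1/2 versus 1/4).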

Motivation for probabilistic query evaluation

• Answers the intuitive question “what is the probability of this?”

• Often more interesting than the correlations between worlds

→ How to compute these probabilities?

Naive probabilistic query evaluation

• Compute the probabilistic instance represented by the input

  • Finite number of possible worlds

• Run the query over each possible world

  • Check if the result tuple is in the output

• Sum the probabilities of all worlds that contain the output tuple
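The three steps above can be sketched in a few lines. This is an illustrative version for a single TID relation only; the function name `naive_pqe` and the representation of a query as a Python function from a world to a set of answers are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

def naive_pqe(tid, query, t):
    """Marginal probability that t is in query(world), summed over all worlds."""
    tuples = list(tid)
    total = Fraction(0)
    for keep in product([True, False], repeat=len(tuples)):  # 2^n possible worlds
        world = {tup for tup, k in zip(tuples, keep) if k}
        prob = Fraction(1)
        for tup, k in zip(tuples, keep):
            prob *= tid[tup] if k else 1 - tid[tup]
        if t in query(world):
            total += prob
    return total

def project_prof(world):
    """The query π_prof over a relation with columns (date, prof)."""
    return {prof for (_date, prof) in world}

# Running example: U(date, prof), Q = π_prof(U), t = S.
U = {("04", "S"): Fraction(8, 10), ("04", "A"): Fraction(2, 10)}
answer = naive_pqe(U, project_prof, "S")
print(answer)  # 4/5
```

The loop visits every subset of the input tuples, so the running time doubles with each additional tuple.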

Naive probabilistic query evaluation example

TID instance U:
date  prof  probability
04    S     0.8
04    A     0.2

Query Q: π_prof(U)

Tuple t: S

Probabilistic relation Q(U):

tuples of Q(U) in the world    probability
S, A                           0.8 × 0.2
A                              (1−0.8) × 0.2
S                              0.8 × (1−0.2)
(none)                         (1−0.8) × (1−0.2)

Total probability that t is in Q(U): 0.8 × 0.2 + 0.8 × (1−0.2) = 0.8

Naive evaluation advantages and drawbacks

• Naive evaluation is always possible

• However, it takes exponential time in general

  • Even if the query output has few possible worlds!

  • Feasible if the input has few possible worlds (few tuples)

• Probabilistic query evaluation is computationally intractable, so it is unlikely that we can beat naive evaluation in general

  • More efficient methods for special cases

Extensional evaluation idea

• Sometimes we can compute the probabilities at each step:

U
date  prof  probability
04    S     0.8
04    A     0.2

V
student  probability
1        0.4
2        0.6

U × V
date  prof  student  probability
04    S     1        0.8 × 0.4
04    S     2        0.8 × 0.6
04    A     1        0.2 × 0.4
04    A     2        0.2 × 0.6
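A minimal sketch of this step, assuming each TID relation is stored as a dictionary from tuples to probabilities (the encoding and the name `independent_product` are illustrative, not from the slides):

```python
from fractions import Fraction

U = {("04", "S"): Fraction(8, 10), ("04", "A"): Fraction(2, 10)}
V = {(1,): Fraction(4, 10), (2,): Fraction(6, 10)}

def independent_product(left, right):
    """Cartesian product; each output probability is the product of its inputs'."""
    return {lt + rt: lp * rp for lt, lp in left.items() for rt, rp in right.items()}

UxV = independent_product(U, V)
for row, p in UxV.items():
    print(row, p)
```

The multiplication is justified here only because U and V are different relations, so their tuples are independent.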

Query independence

• We say that queries Q and Q′ are syntactically independent if no relation is used in both Q and Q′

  Example: Q = R ⋈ S and Q′ = π_a(T × U)

  Intuition: the tuples in Q and Q′ are independent

• Independent join: if Q and Q′ are syntactically independent then we can compute Q ⋈ Q′ and Q × Q′: multiply the probabilities

U
date  pr  probability
04    S   0.8
04    A   0.2

V
pr  st  probability
A   1   0.4
S   2   0.6

U ⋈ V
date  pr  st  probability
04    S   2   0.8 × 0.6
04    A   1   0.2 × 0.4
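The independent join can be sketched as a natural join that multiplies probabilities. The row encoding (a dictionary from attribute names to values, paired with a probability) and the name `independent_join` are assumptions for illustration; the sketch does not itself check syntactic independence, which remains the caller's responsibility.

```python
from fractions import Fraction

# Rows are dicts from attribute name to value, paired with a probability.
U = [({"date": "04", "pr": "S"}, Fraction(8, 10)),
     ({"date": "04", "pr": "A"}, Fraction(2, 10))]
V = [({"pr": "A", "st": 1}, Fraction(4, 10)),
     ({"pr": "S", "st": 2}, Fraction(6, 10))]

def independent_join(left, right):
    """Natural join of syntactically independent inputs: multiply probabilities."""
    out = []
    for lrow, lp in left:
        for rrow, rp in right:
            shared = lrow.keys() & rrow.keys()
            if all(lrow[a] == rrow[a] for a in shared):
                out.append(({**lrow, **rrow}, lp * rp))
    return out

UV = independent_join(U, V)
for row, p in UV:
    print(row, p)
```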

More query independence

• Independent union: for syntactically independent Q and Q′ we can compute Q ∪ Q′ using the rule for independent OR

U
date  prof  probability
04    S     0.8
04    A     0.2

V
date  prof  probability
04    A     0.4
11    A     0.2

U ∪ V
date  prof  probability
04    S     0.8
04    A     1 − (1−0.2) × (1−0.4)
11    A     0.2
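A short sketch of the independent union, using the rule that two independent events with probabilities p and q have a disjunction of probability 1 − (1−p)(1−q). The dictionary encoding and the name `independent_union` are assumptions for the example.

```python
from fractions import Fraction

U = {("04", "S"): Fraction(8, 10), ("04", "A"): Fraction(2, 10)}
V = {("04", "A"): Fraction(4, 10), ("11", "A"): Fraction(2, 10)}

def independent_union(left, right):
    """Union of syntactically independent inputs, using independent OR."""
    out = {}
    for t in left.keys() | right.keys():
        # A tuple missing from one input has probability 0 on that side.
        p, q = left.get(t, Fraction(0)), right.get(t, Fraction(0))
        out[t] = 1 - (1 - p) * (1 - q)
    return out

UuV = independent_union(U, V)
for row, p in sorted(UuV.items()):
    print(row, p)
```

A tuple that occurs in only one input keeps its probability, since 1 − (1−p)(1−0) = p.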

Selection

Selection can just be applied in the straightforward way:

U
date  prof  probability
04    S     0.8
04    A     0.2

σ_prof=“S”(U)
date  prof  probability
04    S     0.8
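For completeness, selection can be sketched as a plain filter that leaves probabilities untouched. The name `select` and the predicate-as-function encoding are illustrative assumptions.

```python
from fractions import Fraction

U = {("04", "S"): Fraction(8, 10), ("04", "A"): Fraction(2, 10)}

def select(relation, predicate):
    """Keep the tuples satisfying the predicate; probabilities are unchanged."""
    return {t: p for t, p in relation.items() if predicate(t)}

# Columns are (date, prof), so t[1] is the prof attribute.
selected = select(U, lambda t: t[1] == "S")
print(selected)
```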

Independent projection

• Self-join-free conjunctive query: a join (⋈) of projections (π) that does not use the same relation name twice

  Example: Q = R ⋈ S ⋈ π_a(T)

• A separator is an attribute that occurs in all tables of the join

  Example: if R(a,b), S(a), T(a,c) then a is a separator of Q

• If Q is a self-join-free conjunctive query and a is a separator then π_{−a} (projecting away the attribute a) can be computed using independent OR

Independent projection example

U
date  prof  probability
04    S     0.8
04    A     0.2
11    S     0.4
11    A     0.6

π_date(U)
date  probability
04    1 − (1−0.8) × (1−0.2)
11    1 − (1−0.4) × (1−0.6)
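A sketch of the independent projection on this table, assuming positional columns and a dictionary encoding (the name `independent_project` and the `keep` parameter are illustrative). For each output tuple, it tracks the probability that none of the input tuples in its group is present.

```python
from fractions import Fraction

U = {("04", "S"): Fraction(8, 10), ("04", "A"): Fraction(2, 10),
     ("11", "S"): Fraction(4, 10), ("11", "A"): Fraction(6, 10)}

def independent_project(relation, keep):
    """Project onto the column positions in `keep`, combining each group by independent OR."""
    none_present = {}
    for t, p in relation.items():
        key = tuple(t[i] for i in keep)
        # Probability that no tuple of this group is present so far.
        none_present[key] = none_present.get(key, Fraction(1)) * (1 - p)
    return {key: 1 - q for key, q in none_present.items()}

by_date = independent_project(U, keep=[0])
for row, p in by_date.items():
    print(row, p)
```

The result is only correct when the tuples within each group are independent, which holds here because they are distinct tuples of one TID relation.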

Another independent projection example

Consider the two tables:

U
date  prof  probability
04    S     1/2

V
prof  cause    probability
S     illness  1/2
S     bahamas  1/2

• Query: Q(U,V) = π_{date,prof}(U ⋈ V)

• Can be rewritten as: U ⋈ π_{−cause}(V)

π_{−cause}(V)
prof  probability
S     3/4

U ⋈ π_{−cause}(V)
date  prof  probability
04    S     3/8

The choice of plan matters!

• Query: Q(U,V) = π_{date,prof}(U ⋈ V)

• Can be rewritten as π_{−cause}(U ⋈ V) instead of U ⋈ π_{−cause}(V)

U
date  prof  probability
04    S     1/2

V
prof  cause    probability
S     illness  1/2
S     bahamas  1/2

U ⋈ V
date  prof  cause    probability
04    S     illness  1/4
04    S     bahamas  1/4

π_{−cause}(U ⋈ V)
date  prof  probability
04    S     7/16??

→ The last projection is not independent, so incorrect result! Both tuples of U ⋈ V depend on the same tuple of U, and the correct probability is 3/8.
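The two plans can be compared against a brute-force enumeration of the possible worlds. The sketch below is an illustration with assumed names (`independent_or`, `safe`, `unsafe`, `truth`); it hard-codes the three input tuples of this example rather than implementing general plans.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)
U = {("04", "S"): half}
V = {("S", "illness"): half, ("S", "bahamas"): half}

def independent_or(probs):
    """1 - prod(1 - p): only correct when the events are independent."""
    none = Fraction(1)
    for p in probs:
        none *= 1 - p
    return 1 - none

# Safe plan U ⋈ π_{-cause}(V): the two tuples of V are independent.
safe = U[("04", "S")] * independent_or(V.values())

# Unsafe plan π_{-cause}(U ⋈ V): both joined tuples reuse the same tuple of U.
joined = [U[("04", "S")] * p for p in V.values()]
unsafe = independent_or(joined)

# Ground truth by enumerating the 2^3 possible worlds of the three input tuples.
truth = Fraction(0)
for u, v1, v2 in product([True, False], repeat=3):
    if u and (v1 or v2):
        truth += half ** 3  # every world has probability 1/8 here
print(safe, unsafe, truth)  # 3/8 7/16 3/8
```

Only the safe plan agrees with the enumeration; the unsafe plan overestimates because it counts the tuple of U as if it were two independent events.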

Safe plans

• A safe plan for a query Q is a way to implement Q using the extensional operators:

  • It must use them correctly, e.g., respecting independence

  • It must be equivalent to the desired query Q

→ With a safe plan, we can compute the marginal probability of all query results

Do all queries have a safe plan?

• Relations R(a), S(a,b), T(b)

• Query Q = π_{−a}(π_{−b}(R ⋈ S ⋈ T))

• Does Q have a safe plan?

  • If we do the joins first then no projection is independent

  • If we write Q as π_{−a}(R ⋈ π_{−b}(S ⋈ T)) then the projection is not safe

  • Same problem for π_{−b}(π_{−a}(R ⋈ S) ⋈ T)

→ In fact Q is intractable and it has no safe plan
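To see one of these plans fail concretely, the sketch below evaluates π_{−a}(R ⋈ π_{−b}(S ⋈ T)) with independent OR at both projections and compares it with a brute-force enumeration. The instance is a hypothetical one chosen for illustration (two values for a, two for b, all tuples of S certain), and the variable names are assumptions. It shows a wrong answer on one instance; it does not demonstrate the intractability claim itself.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)
R = {"a1": half, "a2": half}
T = {"b1": half, "b2": half}
S = {(a, b): Fraction(1) for a in R for b in T}  # hypothetical: certain tuples

def independent_or(probs):
    """1 - prod(1 - p): only correct when the events are independent."""
    none = Fraction(1)
    for p in probs:
        none *= 1 - p
    return 1 - none

# Plan π_{-a}(R ⋈ π_{-b}(S ⋈ T)), applying independent OR at both projections.
inner = {a: independent_or(S[a, b] * T[b] for b in T) for a in R}
plan = independent_or(R[a] * inner[a] for a in R)

# Ground truth: enumerate the possible worlds of R and T (S is certain).
truth = Fraction(0)
names = list(R) + list(T)
for keep in product([True, False], repeat=len(names)):
    present = {n for n, k in zip(names, keep) if k}
    prob = Fraction(1)
    for n, k in zip(names, keep):
        p = R.get(n, T.get(n))
        prob *= p if k else 1 - p
    if any(a in present and b in present for (a, b) in S):
        truth += prob
print(plan, truth)  # 39/64 9/16
```

The outer projection treats the tuples for a1 and a2 as independent, although both depend on the same tuples of T, which is why the plan returns 39/64 instead of 9/16.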

Extensional query evaluation summary

• Extensional query evaluation:

  • Express the query as a safe plan with the extensional operators

  • Compute the query results and their probabilities via the plan

  • The probabilities are correct because the plan is safe

• Summary of safe operators:

  • Product and join of syntactically independent queries

    → Product of independent probabilities

  • Union of syntactically independent queries

    → Independent OR of the probabilities

  • Projecting away a separator attribute

    → Independent OR because the tuples in each group are independent

  • Applying selection in the straightforward way

  • Also other rules: negation, inclusion-exclusion, etc.

→ Not all queries have safe plans
