• Aucun résultat trouvé

Top-k Queries over Uncertain Scores

N/A
N/A
Protected

Academic year: 2022

Partager "Top-k Queries over Uncertain Scores"

Copied!
57
0
0

Texte intégral

(1)

Top-k Queries over Uncertain Scores

Top-k Queries over Uncertain Scores

Qing Liu, Debabrota Basu, Talel Abdessalem, St´ephane Bressan

(2)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Modern recommendation systems leverage some forms of collaborative user (crowd) sourced collection of information.

I Crowdsourcing Platforms

I easily announce their needs to the crowd / get access to the information they need

I choose the highest quality / most competitively priced

I Examples: TripAdvisor

I collaborative user or crowdsourced collection of information, e.g., user generated ratings and reviews, to recommend travel plans and hotels, vacation rentals and restaurants.

(3)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Modern recommendation systems leverage some forms of collaborative user (crowd) sourced collection of information.

I Crowdsourcing Platforms

I easily announce their needs to the crowd / get access to the information they need

I choose the highest quality / most competitively priced

I Examples: TripAdvisor

I collaborative user or crowdsourced collection of information, e.g., user generated ratings and reviews, to recommend travel plans and hotels, vacation rentals and restaurants.

(4)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Modern recommendation systems leverage some forms of collaborative user (crowd) sourced collection of information.

I Crowdsourcing Platforms

I easily announce their needs to the crowd / get access to the information they need

I choose the highest quality / most competitively priced

I Examples: TripAdvisor

I collaborative user or crowdsourced collection of information, e.g., user generated ratings and reviews, to recommend travel plans and hotels, vacation rentals and restaurants.

(5)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Modern recommendation systems leverage some forms of collaborative user (crowd) sourced collection of information.

I Crowdsourcing Platforms

I easily announce their needs to the crowd / get access to the information they need

I choose the highest quality / most competitively priced

I Examples: TripAdvisor

I collaborative user or crowdsourced collection of information, e.g., user generated ratings and reviews, to recommend travel plans and hotels, vacation rentals and restaurants.

(6)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Crowdsourcing and Collaborative Economy:

I communities or crowds rent, share, sell products or services

(7)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Crowdsourcing and Collaborative Economy:

I communities or crowds rent, share, sell products or services

(8)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Crowdsourcing and Collaborative Economy:

I communities or crowds rent, share, sell products or services

(9)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Crowdsourcing and Collaborative Economy:

I communities or crowds rent, share, sell products or services

(10)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Independent collection of information → uncertainty and diversity.

I Objects (services, vacation rentals and restaurants...) have uncertain scores (quality, price...).

(11)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Independent collection of information → uncertainty and diversity.

I Objects (services, vacation rentals and restaurants...) have uncertain scores (quality, price...).

(12)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Independent collection of information → uncertainty and diversity.

I Objects (services, vacation rentals and restaurants...) have uncertain scores (quality, price...).

(13)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Independent collection of information → uncertainty and diversity.

I Objects (services, vacation rentals and restaurants...) have uncertain scores (quality, price...).

(14)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Independent collection of information → uncertainty and diversity.

I Objects (services, vacation rentals and restaurants...) have uncertain scores (quality, price...).

(15)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Ranking is one of the building blocks of recommendation.

I A top-k query returns the sequence of the k objects with the highest scores, given a database of objects ranked by their scores for the feature of interest.

I Price of the apartments.

I With uncertain scores, a top-k query can only return an uncertain result.

(16)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Ranking is one of the building blocks of recommendation.

I A top-k query returns the sequence of the k objects with the highest scores, given a database of objects ranked by their scores for the feature of interest.

I Price of the apartments.

20000 3000 4000 5000 6000

0.2 0.4 0.6 0.8 1 1.2 1.4x 10−3

I With uncertain scores, a top-k query can only return an uncertain result.

(17)

Top-k Queries over Uncertain Scores Introduction

Introduction

I Ranking is one of the building blocks of recommendation.

I A top-k query returns the sequence of the k objects with the highest scores, given a database of objects ranked by their scores for the feature of interest.

I Price of the apartments.

20000 3000 4000 5000 6000

0.2 0.4 0.6 0.8 1 1.2 1.4x 10−3

I With uncertain scores, a top-k query can only return an uncertain result.

(18)

Top-k Queries over Uncertain Scores Related Work

Related Work

I Soliman, Hyas and Ben-David [Soliman and Ilyas, 2009] study top-k queries over objects with uncertain scores given as probability distributions.

I In this paper, we consider probabilistic top-k queries under the top-k semantics as in [Soliman and Ilyas, 2009].

(19)

Top-k Queries over Uncertain Scores Related Work

Related Work

I Soliman, Hyas and Ben-David [Soliman and Ilyas, 2009] study top-k queries over objects with uncertain scores given as probability distributions.

I In this paper, we consider probabilistic top-k queries under the top-k semantics as in [Soliman and Ilyas, 2009].

(20)

Top-k Queries over Uncertain Scores Problem Definition

Problem Definition

I O: a set ofn objects;

I s(oi): the score of an object oi ∈ O;

I Xi: a random variable, equals tos(oi);

I fi: bounded continuous probability density function of Xi;

I π(k)= [o1,· · · , ok]: sequence of k objects inO;

I P r(π(k)): probability ofπ(k) be the top-ksequence; P r(π(k)) =

Z

−∞

Z x1

−∞

· · · Z xk

−∞

f1(x1)· · ·fn(xn) dxn· · ·dx1

(1)

I (Objective:) Probabilistic top-k sequence: the π(k) that maximizes P r(π(k)).

(21)

Top-k Queries over Uncertain Scores Problem Definition

Problem Definition

I O: a set ofn objects;

I s(oi): the score of an objectoi ∈ O;

I Xi: a random variable, equals tos(oi);

I fi: bounded continuous probability density function of Xi;

I π(k)= [o1,· · · , ok]: sequence of k objects inO;

I P r(π(k)): probability ofπ(k) be the top-ksequence; P r(π(k)) =

Z

−∞

Z x1

−∞

· · · Z xk

−∞

f1(x1)· · ·fn(xn) dxn· · ·dx1

(1)

I (Objective:) Probabilistic top-k sequence: the π(k) that maximizes P r(π(k)).

(22)

Top-k Queries over Uncertain Scores Problem Definition

Problem Definition

I O: a set ofn objects;

I s(oi): the score of an objectoi ∈ O;

I Xi: a random variable, equals tos(oi);

I fi: bounded continuous probability density function of Xi;

I π(k)= [o1,· · · , ok]: sequence of k objects inO;

I P r(π(k)): probability ofπ(k) be the top-ksequence; P r(π(k)) =

Z

−∞

Z x1

−∞

· · · Z xk

−∞

f1(x1)· · ·fn(xn) dxn· · ·dx1

(1)

I (Objective:) Probabilistic top-k sequence: the π(k) that maximizes P r(π(k)).

(23)

Top-k Queries over Uncertain Scores Problem Definition

Problem Definition

I O: a set ofn objects;

I s(oi): the score of an objectoi ∈ O;

I Xi: a random variable, equals tos(oi);

I fi: bounded continuous probability density function of Xi;

I π(k)= [o1,· · · , ok]: sequence of k objects inO;

I P r(π(k)): probability ofπ(k) be the top-ksequence; P r(π(k)) =

Z

−∞

Z x1

−∞

· · · Z xk

−∞

f1(x1)· · ·fn(xn) dxn· · ·dx1

(1)

I (Objective:) Probabilistic top-k sequence: the π(k) that maximizes P r(π(k)).

(24)

Top-k Queries over Uncertain Scores Problem Definition

Problem Definition

I O: a set ofn objects;

I s(oi): the score of an objectoi ∈ O;

I Xi: a random variable, equals tos(oi);

I fi: bounded continuous probability density function of Xi;

I π(k)= [o1,· · · , ok]: sequence of kobjects inO;

I P r(π(k)): probability ofπ(k) be the top-ksequence; P r(π(k)) =

Z

−∞

Z x1

−∞

· · · Z xk

−∞

f1(x1)· · ·fn(xn) dxn· · ·dx1

(1)

I (Objective:) Probabilistic top-k sequence: the π(k) that maximizes P r(π(k)).

(25)

Top-k Queries over Uncertain Scores Problem Definition

Problem Definition

I O: a set ofn objects;

I s(oi): the score of an objectoi ∈ O;

I Xi: a random variable, equals tos(oi);

I fi: bounded continuous probability density function of Xi;

I π(k)= [o1,· · · , ok]: sequence of kobjects inO;

I P r(π(k)): probability ofπ(k) be the top-ksequence;

P r(π(k)) = Z

−∞

Z x1

−∞

· · · Z xk

−∞

f1(x1)· · ·fn(xn) dxn· · ·dx1 (1)

I (Objective:) Probabilistic top-k sequence: the π(k) that maximizes P r(π(k)).

(26)

Top-k Queries over Uncertain Scores Problem Definition

Problem Definition

I O: a set ofn objects;

I s(oi): the score of an objectoi ∈ O;

I Xi: a random variable, equals tos(oi);

I fi: bounded continuous probability density function of Xi;

I π(k)= [o1,· · · , ok]: sequence of kobjects inO;

I P r(π(k)): probability ofπ(k) be the top-ksequence;

P r(π(k)) = Z

−∞

Z x1

−∞

· · · Z xk

−∞

f1(x1)· · ·fn(xn) dxn· · ·dx1

(1)

I (Objective:) Probabilistic top-k sequence: the π(k) that maximizes P r(π(k)).

(27)

Top-k Queries over Uncertain Scores Solutions

Solutions

I Naive: calculateP r(π(k)) for every possible sequenceπ(k) and returning the π(k) with the highestP r(π(k)).

I n!

(n−k)! possible sequences to examine.

I Branch-and-Bound[Soliman et al., 2010]: Prune some π(k)s.

I Worst case: (n−k)!n! possible sequences to examine.

I Soliman’s Algorithm [Soliman et al., 2010]: searches the space of candidate probabilistic top-k sequences using a Markov chain Monte Carlo algorithm.

I In this paper, we explore the variants of Markov chain Monte Carlo algorithms.

(28)

Top-k Queries over Uncertain Scores Solutions

Solutions

I Naive: calculateP r(π(k)) for every possible sequenceπ(k) and returning the π(k) with the highestP r(π(k)).

I n!

(n−k)! possible sequences to examine.

I Branch-and-Bound[Soliman et al., 2010]: Prune some π(k)s.

I Worst case: (n−k)!n! possible sequences to examine.

I Soliman’s Algorithm [Soliman et al., 2010]: searches the space of candidate probabilistic top-k sequences using a Markov chain Monte Carlo algorithm.

I In this paper, we explore the variants of Markov chain Monte Carlo algorithms.

(29)

Top-k Queries over Uncertain Scores Solutions

Solutions

I Naive: calculateP r(π(k)) for every possible sequenceπ(k) and returning the π(k) with the highestP r(π(k)).

I n!

(n−k)! possible sequences to examine.

I Branch-and-Bound[Soliman et al., 2010]: Prune some π(k)s.

I Worst case: (n−k)!n! possible sequences to examine.

I Soliman’s Algorithm [Soliman et al., 2010]: searches the space of candidate probabilistic top-k sequences using a Markov chain Monte Carlo algorithm.

I In this paper, we explore the variants of Markov chain Monte Carlo algorithms.

(30)

Top-k Queries over Uncertain Scores Solutions

Solutions

I Naive: calculateP r(π(k)) for every possible sequenceπ(k) and returning the π(k) with the highestP r(π(k)).

I n!

(n−k)! possible sequences to examine.

I Branch-and-Bound[Soliman et al., 2010]: Prune some π(k)s.

I Worst case: (n−k)!n! possible sequences to examine.

I Soliman’s Algorithm [Soliman et al., 2010]: searches the space of candidate probabilistic top-k sequences using a Markov chain Monte Carlo algorithm.

I In this paper, we explore the variants of Markov chain Monte Carlo algorithms.

(31)

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

I Soliman’s Algorithm

I Initial state: a rank over thenobjects

I Candidate State:

𝑜1

𝑜2

𝑜3

𝑜4

𝑜5

𝑜6

𝑜7

Pr(𝑜2< 𝑜3) 𝑜1

𝑜3

𝑜2

𝑜4

𝑜5

𝑜6

𝑜7

Pr(𝑜2< 𝑜4) Top-𝑘

𝑜1

𝑜2

𝑜3

𝑜4

𝑜5

𝑜6

𝑜7

Pr(𝑜6> 𝑜5) 𝑜1

𝑜2

𝑜3

𝑜4

𝑜6

𝑜5

𝑜7

Pr(𝑜6> 𝑜4)

I Acceptance Probability: α= min(P r(π

(k)

t+1)·P r(πtt+1) P r(πt(k))·P r(πt+1t),1)

(32)

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

I Soliman’s Algorithm

I Initial state: a rank over thenobjects

I Candidate State:

𝑜1

𝑜2

𝑜3

𝑜4

𝑜5

𝑜6

𝑜7

Pr(𝑜2< 𝑜3) 𝑜1

𝑜3

𝑜2

𝑜4

𝑜5

𝑜6

𝑜7

Pr(𝑜2< 𝑜4) Top-𝑘

𝑜1

𝑜2

𝑜3

𝑜4

𝑜5

𝑜6

𝑜7

Pr(𝑜6> 𝑜5) 𝑜1

𝑜2

𝑜3

𝑜4

𝑜6

𝑜5

𝑜7

Pr(𝑜6> 𝑜4)

I Acceptance Probability: α= min(P r(π

(k)

t+1)·P r(πtt+1) P r(πt(k))·P r(πt+1t),1)

(33)

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

I Soliman’s Algorithm

I Initial state: a rank over thenobjects

I Candidate State:

𝑜1

𝑜2

𝑜3

𝑜4

𝑜5

𝑜6

𝑜7

Pr(𝑜2< 𝑜3) 𝑜1

𝑜3

𝑜2

𝑜4

𝑜5

𝑜6

𝑜7

Pr(𝑜2< 𝑜4) Top-𝑘

𝑜1

𝑜2

𝑜3

𝑜4

𝑜5

𝑜6

𝑜7

Pr(𝑜6> 𝑜5) 𝑜1

𝑜2

𝑜3

𝑜4

𝑜6

𝑜5

𝑜7

Pr(𝑜6> 𝑜4)

I Acceptance Probability: α= min(P r(π

(k)

t+1)·P r(πtt+1) P r(πt(k))·P r(πt+1t),1)

(34)

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

I Swap andSwapEXP Algorithm

I Initial state: a rank over thenobjects

I Candidate State:

𝑜1

𝑜2

𝑜3

𝑜4

𝑜5

𝑜6

𝑜7 Top-𝑘

𝑜1

𝑜5

𝑜3

𝑜4

𝑜2

𝑜6

𝑜7 I Acceptance Probability:

Swap: α= min(P r(π

(k) t+1kn1

P r(πt(k)kn1 =P r(π

(k) t+1) P r(πt(k)),1) SwapEXP:

α= min(P r(πc

(k) t+1)

P r(πc t(k)) = exp(β(P r(π(k)t+1)P r(πt(k)))),1) (P r(πc (k)) =Cβ−1exp(βP r(π(k))))

I SwapEXP is more likely to reject the “worse” candidate state.

(35)

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

I Swap andSwapEXP Algorithm

I Initial state: a rank over thenobjects

I Candidate State:

𝑜1

𝑜2

𝑜3

𝑜4

𝑜5

𝑜6

𝑜7 Top-𝑘

𝑜1

𝑜5

𝑜3

𝑜4

𝑜2

𝑜6

𝑜7

I Acceptance Probability: Swap: α= min(P r(π

(k) t+1kn1

P r(πt(k)kn1 =P r(π

(k) t+1) P r(πt(k)),1) SwapEXP:

α= min(P r(πc

(k) t+1)

P r(πc t(k)) = exp(β(P r(π(k)t+1)P r(πt(k)))),1) (P r(πc (k)) =Cβ−1exp(βP r(π(k))))

I SwapEXP is more likely to reject the “worse” candidate state.

(36)

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

I Swap andSwapEXP Algorithm

I Initial state: a rank over thenobjects

I Candidate State:

𝑜1

𝑜2

𝑜3

𝑜4

𝑜5

𝑜6

𝑜7 Top-𝑘

𝑜1

𝑜5

𝑜3

𝑜4

𝑜2

𝑜6

𝑜7 I Acceptance Probability:

Swap: α= min(P r(π

(k) t+1kn1

P r(πt(k)kn1 =P r(π

(k) t+1) P r(πt(k)),1) SwapEXP:

α= min(P r(πc

(k) t+1)

P r(πc t(k)) = exp(β(P r(π(k)t+1)P r(πt(k)))),1) (P r(πc (k)) =Cβ−1exp(βP r(π(k))))

I SwapEXP is more likely to reject the “worse” candidate state.

(37)

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

I Swap andSwapEXP Algorithm

I Initial state: a rank over thenobjects

I Candidate State:

𝑜1

𝑜2

𝑜3

𝑜4

𝑜5

𝑜6

𝑜7 Top-𝑘

𝑜1

𝑜5

𝑜3

𝑜4

𝑜2

𝑜6

𝑜7 I Acceptance Probability:

Swap: α= min(P r(π

(k) t+1kn1

P r(πt(k)kn1 =P r(π

(k) t+1) P r(πt(k)),1) SwapEXP:

α= min(P r(πc

(k) t+1)

P r(πc t(k)) = exp(β(P r(π(k)t+1)P r(πt(k)))),1) (P r(πc (k)) =Cβ−1exp(βP r(π(k))))

I SwapEXP is more likely to reject the “worse” candidate state.

(38)

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

I ReSampleand ReSampleEXPAlgorithm

I Initial state: a rank over thenobjects

I Candidate State:

𝑜1: 9 𝑜2: 8 𝑜3: 6 𝑜4: 5 𝑜5: 4 𝑜6: 3 𝑜7: 2

Top-𝑘

𝑜1: 9 𝑜2: 8 𝑜5: 7 𝑜3: 6 𝑜4: 5 𝑜6: 3 𝑜7: 2 I Acceptance Probability:

ReSample: α= min(P r(π

(k)

t+1)·P r(πtt+1) P r(π(k)t )·P r(πt+1t),1) ReSampleEXP:

α= min(P r(πP r(πtt+1)

t+1t)·exp(β(P r(πt+1(k))P r(π(k)t ))),1).

(39)

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

I ReSampleand ReSampleEXPAlgorithm

I Initial state: a rank over thenobjects

I Candidate State:

𝑜1: 9 𝑜2: 8 𝑜3: 6 𝑜4: 5 𝑜5: 4 𝑜6: 3 𝑜7: 2

Top-𝑘

𝑜1: 9 𝑜2: 8 𝑜5: 7 𝑜3: 6 𝑜4: 5 𝑜6: 3 𝑜7: 2

I Acceptance Probability: ReSample: α= min(P r(π

(k)

t+1)·P r(πtt+1) P r(π(k)t )·P r(πt+1t),1) ReSampleEXP:

α= min(P r(πP r(πtt+1)

t+1t)·exp(β(P r(πt+1(k))P r(π(k)t ))),1).

(40)

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

I ReSampleand ReSampleEXPAlgorithm

I Initial state: a rank over thenobjects

I Candidate State:

𝑜1: 9 𝑜2: 8 𝑜3: 6 𝑜4: 5 𝑜5: 4 𝑜6: 3 𝑜7: 2

Top-𝑘

𝑜1: 9 𝑜2: 8 𝑜5: 7 𝑜3: 6 𝑜4: 5 𝑜6: 3 𝑜7: 2 I Acceptance Probability:

ReSample: α= min(P r(π

(k)

t+1)·P r(πtt+1) P r(π(k)t )·P r(πt+1t),1) ReSampleEXP:

α= min(P r(πP r(πtt+1)

t+1t)·exp(β(P r(πt+1(k))P r(π(k)t ))),1).

(41)

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

I ReSampleAllAlgorithm

I Initial state: a rank over thenobjects

I Candidate State:

𝑜1: 9 𝑜2: 8 𝑜3: 6 𝑜4: 5 𝑜5: 4 𝑜6: 3 𝑜7: 2

Top-𝑘

𝑜3: 10 𝑜2: 9 𝑜5: 8 𝑜1: 6 𝑜7: 4 𝑜6: 3 𝑜4: 2 I Acceptance Probability: ReSample: α= 1

(42)

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

I ReSampleAllAlgorithm

I Initial state: a rank over thenobjects

I Candidate State:

𝑜1: 9 𝑜2: 8 𝑜3: 6 𝑜4: 5 𝑜5: 4 𝑜6: 3 𝑜7: 2

Top-𝑘

𝑜3: 10 𝑜2: 9 𝑜5: 8 𝑜1: 6 𝑜7: 4 𝑜6: 3 𝑜4: 2

I Acceptance Probability: ReSample: α= 1

(43)

Top-k Queries over Uncertain Scores Solutions

Markov chain Monte Carlo Algorithms

I ReSampleAllAlgorithm

I Initial state: a rank over thenobjects

I Candidate State:

𝑜1: 9 𝑜2: 8 𝑜3: 6 𝑜4: 5 𝑜5: 4 𝑜6: 3 𝑜7: 2

Top-𝑘

𝑜3: 10 𝑜2: 9 𝑜5: 8 𝑜1: 6 𝑜7: 4 𝑜6: 3 𝑜4: 2 I Acceptance Probability: ReSample: α= 1

(44)

Top-k Queries over Uncertain Scores Performance Evaluation

Performance Evaluation

I Datasets: synthetic datasets

Table: Distributions

Setting 1 Setting 2 Setting 3

median score G(0.5,0.05) G(0.5,0.2) U[0,1]

width G(0.5,0.05) G(0.5,0.2) U[0,1]

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

I default: uniform score distributions, median score ofoi: li+u2 i, width: uili

I Metrics

I Probability of the Probabilistic top-k sequence (higher better)

I Convergence of the Markov chains (Gelman-Rubin Convergence Diagnostic)

I Efficiency (Complexity and runtime)

(45)

Top-k Queries over Uncertain Scores Performance Evaluation

Performance Evaluation

I Datasets: synthetic datasets

Table: Distributions

Setting 1 Setting 2 Setting 3

median score G(0.5,0.05) G(0.5,0.2) U[0,1]

width G(0.5,0.05) G(0.5,0.2) U[0,1]

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

I default: uniform score distributions, median score ofoi: li+u2 i, width: uili

I Metrics

I Probability of the Probabilistic top-k sequence (higher better)

I Convergence of the Markov chains (Gelman-Rubin Convergence Diagnostic)

I Efficiency (Complexity and runtime)

(46)

Top-k Queries over Uncertain Scores Performance Evaluation

Performance Evaluation

I Datasets: synthetic datasets

Table: Distributions

Setting 1 Setting 2 Setting 3

median score G(0.5,0.05) G(0.5,0.2) U[0,1]

width G(0.5,0.05) G(0.5,0.2) U[0,1]

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

I default: uniform score distributions, median score ofoi: li+u2 i, width: uili

I Metrics

I Probability of the Probabilistic top-k sequence (higher better)

I Convergence of the Markov chains (Gelman-Rubin Convergence Diagnostic)

I Efficiency (Complexity and runtime)

(47)

Top-k Queries over Uncertain Scores Performance Evaluation

Performance Evaluation

I Datasets: synthetic datasets

Table: Distributions

Setting 1 Setting 2 Setting 3

median score G(0.5,0.05) G(0.5,0.2) U[0,1]

width G(0.5,0.05) G(0.5,0.2) U[0,1]

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

I default: uniform score distributions, median score ofoi: li+u2 i, width: uili

I Metrics

I Probability of the Probabilistic top-k sequence (higher better)

I Convergence of the Markov chains (Gelman-Rubin Convergence Diagnostic)

I Efficiency (Complexity and runtime)

(48)

Top-k Queries over Uncertain Scores Performance Evaluation

Performance Evaluation

I Datasets: synthetic datasets

Table: Distributions

Setting 1 Setting 2 Setting 3

median score G(0.5,0.05) G(0.5,0.2) U[0,1]

width G(0.5,0.05) G(0.5,0.2) U[0,1]

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

I default: uniform score distributions, median score ofoi: li+u2 i, width: uili

I Metrics

I Probability of the Probabilistic top-k sequence (higher better)

I Convergence of the Markov chains (Gelman-Rubin Convergence Diagnostic)

I Efficiency (Complexity and runtime)

(49)

Top-k Queries over Uncertain Scores Performance Evaluation

Performance Evaluation

I Datasets: synthetic datasets

Table: Distributions

Setting 1 Setting 2 Setting 3

median score G(0.5,0.05) G(0.5,0.2) U[0,1]

width G(0.5,0.05) G(0.5,0.2) U[0,1]

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

00.10.20.30.40.50.60.70.80.91

0 1 2 3 4 5 6 7 8

I default: uniform score distributions, median score ofoi: li+u2 i, width: uili

I Metrics

I Probability of the Probabilistic top-k sequence (higher better)

I Convergence of the Markov chains (Gelman-Rubin Convergence Diagnostic)

I Efficiency (Complexity and runtime)

(50)

Top-k Queries over Uncertain Scores Performance Evaluation

Effectiveness of Six Algorithms

Effectiveness of Six Algorithms (Probability)

0 5 10

x 104 10−8

10−7 10−6 10−5

Chain Length

Probability

Soliman Swap SwapEXP ReSample ReSampleEXP ReSampleAll

(a) Dataset5

0 5 10

x 104 10−8

10−7 10−6 10−5

Chain Length

Probability

Soliman Swap SwapEXP ReSample ReSampleEXP ReSampleAll

(b) Dataset21

1 2 3 4 5 6

0.5 1 1.5 2 2.5

(51)

Top-k Queries over Uncertain Scores Performance Evaluation

Convergence of Six Algorithms

Convergence of the Markov Chains

0 5 10

x 104 0

2 4 6 8 10

Chain Length

Gelman−Rubin Diagnostic

Soliman Swap SwapEXP ReSample ReSampleEXP ReSampleAll

(e) Dataset5

0 5 10

x 104 0

2 4 6 8 10

Chain Length

Gelman−Rubin Diagnostic

Soliman Swap SwapEXP ReSample ReSampleEXP ReSampleAll

(f) Dataset21

0 0.2 0.4 0.6 0.8 1

0 1 2 3 4 5 6

(g) Dataset5 0 0.2 0.4 0.6 0.8 1

0 0.5 1 1.5 2 2.5

(h) Dataset21

(52)

Top-k Queries over Uncertain Scores Performance Evaluation

Efficiency

Efficiency

Table: Worst Case Time Complexity of Generating Next State

Soliman Swap(EXP) ReSample(EXP) ReSampleAll

Time Complexity O(nk) O(1) O(n) O(nlogk)

Table: Runtime Per Step of the Algorithms (seconds)

Soliman Swap SwapEXP ReSample ReSampleEXP ReSampleAll

Runtime Per Step 0.0058 1.9128 0.1163 0.0523 0.0071 0.9056

(53)

Top-k Queries over Uncertain Scores Conclusion

Conclusion

I We explore the design space for Metropolis-Hastings Markov chain Monte Carlo algorithms.

I We verify through extensive experiments that the proposed algorithms are more effective than the state of the art approach.

I ReSampleAllis the best, since it samples directly from the target distribution instead of depending on “local”

information.

(54)

Top-k Queries over Uncertain Scores Conclusion

Conclusion

I We explore the design space for Metropolis-Hastings Markov chain Monte Carlo algorithms.

I We verify through extensive experiments that the proposed algorithms are more effective than the state of the art approach.

I ReSampleAllis the best, since it samples directly from the target distribution instead of depending on “local”

information.

(55)

Top-k Queries over Uncertain Scores Conclusion

Conclusion

I We explore the design space for Metropolis-Hastings Markov chain Monte Carlo algorithms.

I We verify through extensive experiments that the proposed algorithms are more effective than the state of the art approach.

I ReSampleAllis the best, since it samples directly from the target distribution instead of depending on “local”

information.

(56)

Top-k Queries over Uncertain Scores Q/A

Thank you! Questions?

Top-k Queries Uncertain Scores

MCMC

[email protected]

(57)

Top-k Queries over Uncertain Scores References

References I

Soliman, M. A. and Ilyas, I. F. (2009).

Ranking with uncertain scores.

InICDE, pages 317–328.

Soliman, M. A., Ilyas, I. F., and Ben-David, S. (2010).

Supporting ranking queries on uncertain and incomplete data.

The VLDB Journal, 19(4):477–501.

Références

Documents relatifs

www.lutinbazar.fr www.lutinbazar.frwww.lutinbazar.fr.. Mes

Le tableau 1 compare le gain entre les résultats obtenus avec la représentation supervisée par rapport à la représentation native pour PAM et l’algorithme modifié des k-moyennes

For searching the best k object according to user preferences, we designed a client application, which is able to obtain objects with their attribute values from remote

Now that all basic concept are defined, we can proceed to illustrate how Top-k DBPSB extends DBPSB to generate top-k SPARQL queries from the Auxiliary queries of DBPSB. Figure 1

Following well known principles of benchmarking [10], we formulate the re- search question of this work as: is it possible to set up a benchmark for top-k SPARQL queries,

How- ever, the existing data structures of these languages have been designed for static data processing and their correct use with evolving data is cumbersome – top-k query

The figure 7 (respectively figure 8) shows that for each of the three deletion strategies (LBD, SIZE and CV SIDS), even by removing less than fifty percent of the learned clauses

Schoenberg [S1, S2] established that this sequence has a continuous and strictly increasing asymptotic distribution function (basic properties of distribution functions can be found