Top-k Queries over Uncertain Scores
Top-k Queries over Uncertain Scores
Qing Liu, Debabrota Basu, Talel Abdessalem, St´ephane Bressan
Top-k Queries over Uncertain Scores Introduction
Introduction
I Modern recommendation systems leverage some forms of collaborative user (crowd) sourced collection of information.
I Crowdsourcing Platforms
I easily announce their needs to the crowd / get access to the information they need
I choose the highest quality / most competitively priced
I Examples: TripAdvisor
I collaborative user or crowdsourced collection of information, e.g., user generated ratings and reviews, to recommend travel plans and hotels, vacation rentals and restaurants.
Top-k Queries over Uncertain Scores Introduction
Introduction
I Modern recommendation systems leverage some forms of collaborative user (crowd) sourced collection of information.
I Crowdsourcing Platforms
I easily announce their needs to the crowd / get access to the information they need
I choose the highest quality / most competitively priced
I Examples: TripAdvisor
I collaborative user or crowdsourced collection of information, e.g., user generated ratings and reviews, to recommend travel plans and hotels, vacation rentals and restaurants.
Top-k Queries over Uncertain Scores Introduction
Introduction
I Modern recommendation systems leverage some forms of collaborative user (crowd) sourced collection of information.
I Crowdsourcing Platforms
I easily announce their needs to the crowd / get access to the information they need
I choose the highest quality / most competitively priced
I Examples: TripAdvisor
I collaborative user or crowdsourced collection of information, e.g., user generated ratings and reviews, to recommend travel plans and hotels, vacation rentals and restaurants.
Top-k Queries over Uncertain Scores Introduction
Introduction
I Modern recommendation systems leverage some forms of collaborative user (crowd) sourced collection of information.
I Crowdsourcing Platforms
I easily announce their needs to the crowd / get access to the information they need
I choose the highest quality / most competitively priced
I Examples: TripAdvisor
I collaborative user or crowdsourced collection of information, e.g., user generated ratings and reviews, to recommend travel plans and hotels, vacation rentals and restaurants.
Top-k Queries over Uncertain Scores Introduction
Introduction
I Crowdsourcing and Collaborative Economy:
I communities or crowds rent, share, sell products or services
Top-k Queries over Uncertain Scores Introduction
Introduction
I Crowdsourcing and Collaborative Economy:
I communities or crowds rent, share, sell products or services
Top-k Queries over Uncertain Scores Introduction
Introduction
I Crowdsourcing and Collaborative Economy:
I communities or crowds rent, share, sell products or services
Top-k Queries over Uncertain Scores Introduction
Introduction
I Crowdsourcing and Collaborative Economy:
I communities or crowds rent, share, sell products or services
Top-k Queries over Uncertain Scores Introduction
Introduction
I Independent collection of information → uncertainty and diversity.
I Objects (services, vacation rentals and restaurants...) have uncertain scores (quality, price...).
Top-k Queries over Uncertain Scores Introduction
Introduction
I Independent collection of information → uncertainty and diversity.
I Objects (services, vacation rentals and restaurants...) have uncertain scores (quality, price...).
Top-k Queries over Uncertain Scores Introduction
Introduction
I Independent collection of information → uncertainty and diversity.
I Objects (services, vacation rentals and restaurants...) have uncertain scores (quality, price...).
Top-k Queries over Uncertain Scores Introduction
Introduction
I Independent collection of information → uncertainty and diversity.
I Objects (services, vacation rentals and restaurants...) have uncertain scores (quality, price...).
Top-k Queries over Uncertain Scores Introduction
Introduction
I Independent collection of information → uncertainty and diversity.
I Objects (services, vacation rentals and restaurants...) have uncertain scores (quality, price...).
Top-k Queries over Uncertain Scores Introduction
Introduction
I Ranking is one of the building blocks of recommendation.
I A top-k query returns the sequence of the k objects with the highest scores, given a database of objects ranked by their scores for the feature of interest.
I Price of the apartments.
I With uncertain scores, a top-k query can only return an uncertain result.
Top-k Queries over Uncertain Scores Introduction
Introduction
I Ranking is one of the building blocks of recommendation.
I A top-k query returns the sequence of the k objects with the highest scores, given a database of objects ranked by their scores for the feature of interest.
I Price of the apartments.
20000 3000 4000 5000 6000
0.2 0.4 0.6 0.8 1 1.2 1.4x 10−3
I With uncertain scores, a top-k query can only return an uncertain result.
Top-k Queries over Uncertain Scores Introduction
Introduction
I Ranking is one of the building blocks of recommendation.
I A top-k query returns the sequence of the k objects with the highest scores, given a database of objects ranked by their scores for the feature of interest.
I Price of the apartments.
20000 3000 4000 5000 6000
0.2 0.4 0.6 0.8 1 1.2 1.4x 10−3
I With uncertain scores, a top-k query can only return an uncertain result.
Top-k Queries over Uncertain Scores Related Work
Related Work
I Soliman, Hyas and Ben-David [Soliman and Ilyas, 2009] study top-k queries over objects with uncertain scores given as probability distributions.
I In this paper, we consider probabilistic top-k queries under the top-k semantics as in [Soliman and Ilyas, 2009].
Top-k Queries over Uncertain Scores Related Work
Related Work
I Soliman, Hyas and Ben-David [Soliman and Ilyas, 2009] study top-k queries over objects with uncertain scores given as probability distributions.
I In this paper, we consider probabilistic top-k queries under the top-k semantics as in [Soliman and Ilyas, 2009].
Top-k Queries over Uncertain Scores Problem Definition
Problem Definition
I O: a set ofn objects;
I s(oi): the score of an object oi ∈ O;
I Xi: a random variable, equals tos(oi);
I fi: bounded continuous probability density function of Xi;
I π(k)= [o1,· · · , ok]: sequence of k objects inO;
I P r(π(k)): probability ofπ(k) be the top-ksequence; P r(π(k)) =
Z ∞
−∞
Z x1
−∞
· · · Z xk
−∞
f1(x1)· · ·fn(xn) dxn· · ·dx1
(1)
I (Objective:) Probabilistic top-k sequence: the π(k) that maximizes P r(π(k)).
Top-k Queries over Uncertain Scores Problem Definition
Problem Definition
I O: a set ofn objects;
I s(oi): the score of an objectoi ∈ O;
I Xi: a random variable, equals tos(oi);
I fi: bounded continuous probability density function of Xi;
I π(k)= [o1,· · · , ok]: sequence of k objects inO;
I P r(π(k)): probability ofπ(k) be the top-ksequence; P r(π(k)) =
Z ∞
−∞
Z x1
−∞
· · · Z xk
−∞
f1(x1)· · ·fn(xn) dxn· · ·dx1
(1)
I (Objective:) Probabilistic top-k sequence: the π(k) that maximizes P r(π(k)).
Top-k Queries over Uncertain Scores Problem Definition
Problem Definition
I O: a set ofn objects;
I s(oi): the score of an objectoi ∈ O;
I Xi: a random variable, equals tos(oi);
I fi: bounded continuous probability density function of Xi;
I π(k)= [o1,· · · , ok]: sequence of k objects inO;
I P r(π(k)): probability ofπ(k) be the top-ksequence; P r(π(k)) =
Z ∞
−∞
Z x1
−∞
· · · Z xk
−∞
f1(x1)· · ·fn(xn) dxn· · ·dx1
(1)
I (Objective:) Probabilistic top-k sequence: the π(k) that maximizes P r(π(k)).
Top-k Queries over Uncertain Scores Problem Definition
Problem Definition
I O: a set ofn objects;
I s(oi): the score of an objectoi ∈ O;
I Xi: a random variable, equals tos(oi);
I fi: bounded continuous probability density function of Xi;
I π(k)= [o1,· · · , ok]: sequence of k objects inO;
I P r(π(k)): probability ofπ(k) be the top-ksequence; P r(π(k)) =
Z ∞
−∞
Z x1
−∞
· · · Z xk
−∞
f1(x1)· · ·fn(xn) dxn· · ·dx1
(1)
I (Objective:) Probabilistic top-k sequence: the π(k) that maximizes P r(π(k)).
Top-k Queries over Uncertain Scores Problem Definition
Problem Definition
I O: a set ofn objects;
I s(oi): the score of an objectoi ∈ O;
I Xi: a random variable, equals tos(oi);
I fi: bounded continuous probability density function of Xi;
I π(k)= [o1,· · · , ok]: sequence of kobjects inO;
I P r(π(k)): probability ofπ(k) be the top-ksequence; P r(π(k)) =
Z ∞
−∞
Z x1
−∞
· · · Z xk
−∞
f1(x1)· · ·fn(xn) dxn· · ·dx1
(1)
I (Objective:) Probabilistic top-k sequence: the π(k) that maximizes P r(π(k)).
Top-k Queries over Uncertain Scores Problem Definition
Problem Definition
I O: a set ofn objects;
I s(oi): the score of an objectoi ∈ O;
I Xi: a random variable, equals tos(oi);
I fi: bounded continuous probability density function of Xi;
I π(k)= [o1,· · · , ok]: sequence of kobjects inO;
I P r(π(k)): probability ofπ(k) be the top-ksequence;
P r(π(k)) = Z ∞
−∞
Z x1
−∞
· · · Z xk
−∞
f1(x1)· · ·fn(xn) dxn· · ·dx1 (1)
I (Objective:) Probabilistic top-k sequence: the π(k) that maximizes P r(π(k)).
Top-k Queries over Uncertain Scores Problem Definition
Problem Definition
I O: a set ofn objects;
I s(oi): the score of an objectoi ∈ O;
I Xi: a random variable, equals tos(oi);
I fi: bounded continuous probability density function of Xi;
I π(k)= [o1,· · · , ok]: sequence of kobjects inO;
I P r(π(k)): probability ofπ(k) be the top-ksequence;
P r(π(k)) = Z ∞
−∞
Z x1
−∞
· · · Z xk
−∞
f1(x1)· · ·fn(xn) dxn· · ·dx1
(1)
I (Objective:) Probabilistic top-k sequence: the π(k) that maximizes P r(π(k)).
Top-k Queries over Uncertain Scores Solutions
Solutions
I Naive: calculateP r(π(k)) for every possible sequenceπ(k) and returning the π(k) with the highestP r(π(k)).
I n!
(n−k)! possible sequences to examine.
I Branch-and-Bound[Soliman et al., 2010]: Prune some π(k)s.
I Worst case: (n−k)!n! possible sequences to examine.
I Soliman’s Algorithm [Soliman et al., 2010]: searches the space of candidate probabilistic top-k sequences using a Markov chain Monte Carlo algorithm.
I In this paper, we explore the variants of Markov chain Monte Carlo algorithms.
Top-k Queries over Uncertain Scores Solutions
Solutions
I Naive: calculateP r(π(k)) for every possible sequenceπ(k) and returning the π(k) with the highestP r(π(k)).
I n!
(n−k)! possible sequences to examine.
I Branch-and-Bound[Soliman et al., 2010]: Prune some π(k)s.
I Worst case: (n−k)!n! possible sequences to examine.
I Soliman’s Algorithm [Soliman et al., 2010]: searches the space of candidate probabilistic top-k sequences using a Markov chain Monte Carlo algorithm.
I In this paper, we explore the variants of Markov chain Monte Carlo algorithms.
Top-k Queries over Uncertain Scores Solutions
Solutions
I Naive: calculateP r(π(k)) for every possible sequenceπ(k) and returning the π(k) with the highestP r(π(k)).
I n!
(n−k)! possible sequences to examine.
I Branch-and-Bound[Soliman et al., 2010]: Prune some π(k)s.
I Worst case: (n−k)!n! possible sequences to examine.
I Soliman’s Algorithm [Soliman et al., 2010]: searches the space of candidate probabilistic top-k sequences using a Markov chain Monte Carlo algorithm.
I In this paper, we explore the variants of Markov chain Monte Carlo algorithms.
Top-k Queries over Uncertain Scores Solutions
Solutions
I Naive: calculateP r(π(k)) for every possible sequenceπ(k) and returning the π(k) with the highestP r(π(k)).
I n!
(n−k)! possible sequences to examine.
I Branch-and-Bound[Soliman et al., 2010]: Prune some π(k)s.
I Worst case: (n−k)!n! possible sequences to examine.
I Soliman’s Algorithm [Soliman et al., 2010]: searches the space of candidate probabilistic top-k sequences using a Markov chain Monte Carlo algorithm.
I In this paper, we explore the variants of Markov chain Monte Carlo algorithms.
Top-k Queries over Uncertain Scores Solutions
Markov chain Monte Carlo Algorithms
I Soliman’s Algorithm
I Initial state: a rank over thenobjects
I Candidate State:
𝑜1
𝑜2
𝑜3
𝑜4
𝑜5
𝑜6
𝑜7
Pr(𝑜2< 𝑜3) 𝑜1
𝑜3
𝑜2
𝑜4
𝑜5
𝑜6
𝑜7
Pr(𝑜2< 𝑜4) Top-𝑘
𝑜1
𝑜2
𝑜3
𝑜4
𝑜5
𝑜6
𝑜7
Pr(𝑜6> 𝑜5) 𝑜1
𝑜2
𝑜3
𝑜4
𝑜6
𝑜5
𝑜7
Pr(𝑜6> 𝑜4)
I Acceptance Probability: α= min(P r(π
(k)
t+1)·P r(πt|πt+1) P r(πt(k))·P r(πt+1|πt),1)
Top-k Queries over Uncertain Scores Solutions
Markov chain Monte Carlo Algorithms
I Soliman’s Algorithm
I Initial state: a rank over thenobjects
I Candidate State:
𝑜1
𝑜2
𝑜3
𝑜4
𝑜5
𝑜6
𝑜7
Pr(𝑜2< 𝑜3) 𝑜1
𝑜3
𝑜2
𝑜4
𝑜5
𝑜6
𝑜7
Pr(𝑜2< 𝑜4) Top-𝑘
𝑜1
𝑜2
𝑜3
𝑜4
𝑜5
𝑜6
𝑜7
Pr(𝑜6> 𝑜5) 𝑜1
𝑜2
𝑜3
𝑜4
𝑜6
𝑜5
𝑜7
Pr(𝑜6> 𝑜4)
I Acceptance Probability: α= min(P r(π
(k)
t+1)·P r(πt|πt+1) P r(πt(k))·P r(πt+1|πt),1)
Top-k Queries over Uncertain Scores Solutions
Markov chain Monte Carlo Algorithms
I Soliman’s Algorithm
I Initial state: a rank over thenobjects
I Candidate State:
𝑜1
𝑜2
𝑜3
𝑜4
𝑜5
𝑜6
𝑜7
Pr(𝑜2< 𝑜3) 𝑜1
𝑜3
𝑜2
𝑜4
𝑜5
𝑜6
𝑜7
Pr(𝑜2< 𝑜4) Top-𝑘
𝑜1
𝑜2
𝑜3
𝑜4
𝑜5
𝑜6
𝑜7
Pr(𝑜6> 𝑜5) 𝑜1
𝑜2
𝑜3
𝑜4
𝑜6
𝑜5
𝑜7
Pr(𝑜6> 𝑜4)
I Acceptance Probability: α= min(P r(π
(k)
t+1)·P r(πt|πt+1) P r(πt(k))·P r(πt+1|πt),1)
Top-k Queries over Uncertain Scores Solutions
Markov chain Monte Carlo Algorithms
I Swap andSwapEXP Algorithm
I Initial state: a rank over thenobjects
I Candidate State:
𝑜1
𝑜2
𝑜3
𝑜4
𝑜5
𝑜6
𝑜7 Top-𝑘
𝑜1
𝑜5
𝑜3
𝑜4
𝑜2
𝑜6
𝑜7 I Acceptance Probability:
Swap: α= min(P r(π
(k) t+1)·kn1
P r(πt(k))·kn1 =P r(π
(k) t+1) P r(πt(k)),1) SwapEXP:
α= min(P r(πc
(k) t+1)
P r(πc t(k)) = exp(β(P r(π(k)t+1)−P r(πt(k)))),1) (P r(πc (k)) =Cβ−1exp(βP r(π(k))))
I SwapEXP is more likely to reject the “worse” candidate state.
Top-k Queries over Uncertain Scores Solutions
Markov chain Monte Carlo Algorithms
I Swap andSwapEXP Algorithm
I Initial state: a rank over thenobjects
I Candidate State:
𝑜1
𝑜2
𝑜3
𝑜4
𝑜5
𝑜6
𝑜7 Top-𝑘
𝑜1
𝑜5
𝑜3
𝑜4
𝑜2
𝑜6
𝑜7
I Acceptance Probability: Swap: α= min(P r(π
(k) t+1)·kn1
P r(πt(k))·kn1 =P r(π
(k) t+1) P r(πt(k)),1) SwapEXP:
α= min(P r(πc
(k) t+1)
P r(πc t(k)) = exp(β(P r(π(k)t+1)−P r(πt(k)))),1) (P r(πc (k)) =Cβ−1exp(βP r(π(k))))
I SwapEXP is more likely to reject the “worse” candidate state.
Top-k Queries over Uncertain Scores Solutions
Markov chain Monte Carlo Algorithms
I Swap andSwapEXP Algorithm
I Initial state: a rank over thenobjects
I Candidate State:
𝑜1
𝑜2
𝑜3
𝑜4
𝑜5
𝑜6
𝑜7 Top-𝑘
𝑜1
𝑜5
𝑜3
𝑜4
𝑜2
𝑜6
𝑜7 I Acceptance Probability:
Swap: α= min(P r(π
(k) t+1)·kn1
P r(πt(k))·kn1 =P r(π
(k) t+1) P r(πt(k)),1) SwapEXP:
α= min(P r(πc
(k) t+1)
P r(πc t(k)) = exp(β(P r(π(k)t+1)−P r(πt(k)))),1) (P r(πc (k)) =Cβ−1exp(βP r(π(k))))
I SwapEXP is more likely to reject the “worse” candidate state.
Top-k Queries over Uncertain Scores Solutions
Markov chain Monte Carlo Algorithms
I Swap andSwapEXP Algorithm
I Initial state: a rank over thenobjects
I Candidate State:
𝑜1
𝑜2
𝑜3
𝑜4
𝑜5
𝑜6
𝑜7 Top-𝑘
𝑜1
𝑜5
𝑜3
𝑜4
𝑜2
𝑜6
𝑜7 I Acceptance Probability:
Swap: α= min(P r(π
(k) t+1)·kn1
P r(πt(k))·kn1 =P r(π
(k) t+1) P r(πt(k)),1) SwapEXP:
α= min(P r(πc
(k) t+1)
P r(πc t(k)) = exp(β(P r(π(k)t+1)−P r(πt(k)))),1) (P r(πc (k)) =Cβ−1exp(βP r(π(k))))
I SwapEXP is more likely to reject the “worse” candidate state.
Top-k Queries over Uncertain Scores Solutions
Markov chain Monte Carlo Algorithms
I ReSampleand ReSampleEXPAlgorithm
I Initial state: a rank over thenobjects
I Candidate State:
𝑜1: 9 𝑜2: 8 𝑜3: 6 𝑜4: 5 𝑜5: 4 𝑜6: 3 𝑜7: 2
Top-𝑘
𝑜1: 9 𝑜2: 8 𝑜5: 7 𝑜3: 6 𝑜4: 5 𝑜6: 3 𝑜7: 2 I Acceptance Probability:
ReSample: α= min(P r(π
(k)
t+1)·P r(πt|πt+1) P r(π(k)t )·P r(πt+1|πt),1) ReSampleEXP:
α= min(P r(πP r(πt|πt+1)
t+1|πt)·exp(β(P r(πt+1(k))−P r(π(k)t ))),1).
Top-k Queries over Uncertain Scores Solutions
Markov chain Monte Carlo Algorithms
I ReSampleand ReSampleEXPAlgorithm
I Initial state: a rank over thenobjects
I Candidate State:
𝑜1: 9 𝑜2: 8 𝑜3: 6 𝑜4: 5 𝑜5: 4 𝑜6: 3 𝑜7: 2
Top-𝑘
𝑜1: 9 𝑜2: 8 𝑜5: 7 𝑜3: 6 𝑜4: 5 𝑜6: 3 𝑜7: 2
I Acceptance Probability: ReSample: α= min(P r(π
(k)
t+1)·P r(πt|πt+1) P r(π(k)t )·P r(πt+1|πt),1) ReSampleEXP:
α= min(P r(πP r(πt|πt+1)
t+1|πt)·exp(β(P r(πt+1(k))−P r(π(k)t ))),1).
Top-k Queries over Uncertain Scores Solutions
Markov chain Monte Carlo Algorithms
I ReSampleand ReSampleEXPAlgorithm
I Initial state: a rank over thenobjects
I Candidate State:
𝑜1: 9 𝑜2: 8 𝑜3: 6 𝑜4: 5 𝑜5: 4 𝑜6: 3 𝑜7: 2
Top-𝑘
𝑜1: 9 𝑜2: 8 𝑜5: 7 𝑜3: 6 𝑜4: 5 𝑜6: 3 𝑜7: 2 I Acceptance Probability:
ReSample: α= min(P r(π
(k)
t+1)·P r(πt|πt+1) P r(π(k)t )·P r(πt+1|πt),1) ReSampleEXP:
α= min(P r(πP r(πt|πt+1)
t+1|πt)·exp(β(P r(πt+1(k))−P r(π(k)t ))),1).
Top-k Queries over Uncertain Scores Solutions
Markov chain Monte Carlo Algorithms
I ReSampleAllAlgorithm
I Initial state: a rank over thenobjects
I Candidate State:
𝑜1: 9 𝑜2: 8 𝑜3: 6 𝑜4: 5 𝑜5: 4 𝑜6: 3 𝑜7: 2
Top-𝑘
𝑜3: 10 𝑜2: 9 𝑜5: 8 𝑜1: 6 𝑜7: 4 𝑜6: 3 𝑜4: 2 I Acceptance Probability: ReSample: α= 1
Top-k Queries over Uncertain Scores Solutions
Markov chain Monte Carlo Algorithms
I ReSampleAllAlgorithm
I Initial state: a rank over thenobjects
I Candidate State:
𝑜1: 9 𝑜2: 8 𝑜3: 6 𝑜4: 5 𝑜5: 4 𝑜6: 3 𝑜7: 2
Top-𝑘
𝑜3: 10 𝑜2: 9 𝑜5: 8 𝑜1: 6 𝑜7: 4 𝑜6: 3 𝑜4: 2
I Acceptance Probability: ReSample: α= 1
Top-k Queries over Uncertain Scores Solutions
Markov chain Monte Carlo Algorithms
I ReSampleAllAlgorithm
I Initial state: a rank over thenobjects
I Candidate State:
𝑜1: 9 𝑜2: 8 𝑜3: 6 𝑜4: 5 𝑜5: 4 𝑜6: 3 𝑜7: 2
Top-𝑘
𝑜3: 10 𝑜2: 9 𝑜5: 8 𝑜1: 6 𝑜7: 4 𝑜6: 3 𝑜4: 2 I Acceptance Probability: ReSample: α= 1
Top-k Queries over Uncertain Scores Performance Evaluation
Performance Evaluation
I Datasets: synthetic datasets
Table: Distributions
Setting 1 Setting 2 Setting 3
median score G(0.5,0.05) G(0.5,0.2) U[0,1]
width G(0.5,0.05) G(0.5,0.2) U[0,1]
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
I default: uniform score distributions, median score ofoi: li+u2 i, width: ui−li
I Metrics
I Probability of the Probabilistic top-k sequence (higher→ better)
I Convergence of the Markov chains (Gelman-Rubin Convergence Diagnostic)
I Efficiency (Complexity and runtime)
Top-k Queries over Uncertain Scores Performance Evaluation
Performance Evaluation
I Datasets: synthetic datasets
Table: Distributions
Setting 1 Setting 2 Setting 3
median score G(0.5,0.05) G(0.5,0.2) U[0,1]
width G(0.5,0.05) G(0.5,0.2) U[0,1]
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
I default: uniform score distributions, median score ofoi: li+u2 i, width: ui−li
I Metrics
I Probability of the Probabilistic top-k sequence (higher→ better)
I Convergence of the Markov chains (Gelman-Rubin Convergence Diagnostic)
I Efficiency (Complexity and runtime)
Top-k Queries over Uncertain Scores Performance Evaluation
Performance Evaluation
I Datasets: synthetic datasets
Table: Distributions
Setting 1 Setting 2 Setting 3
median score G(0.5,0.05) G(0.5,0.2) U[0,1]
width G(0.5,0.05) G(0.5,0.2) U[0,1]
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
I default: uniform score distributions, median score ofoi: li+u2 i, width: ui−li
I Metrics
I Probability of the Probabilistic top-k sequence (higher→ better)
I Convergence of the Markov chains (Gelman-Rubin Convergence Diagnostic)
I Efficiency (Complexity and runtime)
Top-k Queries over Uncertain Scores Performance Evaluation
Performance Evaluation
I Datasets: synthetic datasets
Table: Distributions
Setting 1 Setting 2 Setting 3
median score G(0.5,0.05) G(0.5,0.2) U[0,1]
width G(0.5,0.05) G(0.5,0.2) U[0,1]
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
I default: uniform score distributions, median score ofoi: li+u2 i, width: ui−li
I Metrics
I Probability of the Probabilistic top-k sequence (higher→ better)
I Convergence of the Markov chains (Gelman-Rubin Convergence Diagnostic)
I Efficiency (Complexity and runtime)
Top-k Queries over Uncertain Scores Performance Evaluation
Performance Evaluation
I Datasets: synthetic datasets
Table: Distributions
Setting 1 Setting 2 Setting 3
median score G(0.5,0.05) G(0.5,0.2) U[0,1]
width G(0.5,0.05) G(0.5,0.2) U[0,1]
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
I default: uniform score distributions, median score ofoi: li+u2 i, width: ui−li
I Metrics
I Probability of the Probabilistic top-k sequence (higher→ better)
I Convergence of the Markov chains (Gelman-Rubin Convergence Diagnostic)
I Efficiency (Complexity and runtime)
Top-k Queries over Uncertain Scores Performance Evaluation
Performance Evaluation
I Datasets: synthetic datasets
Table: Distributions
Setting 1 Setting 2 Setting 3
median score G(0.5,0.05) G(0.5,0.2) U[0,1]
width G(0.5,0.05) G(0.5,0.2) U[0,1]
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
00.10.20.30.40.50.60.70.80.91
0 1 2 3 4 5 6 7 8
I default: uniform score distributions, median score ofoi: li+u2 i, width: ui−li
I Metrics
I Probability of the Probabilistic top-k sequence (higher→ better)
I Convergence of the Markov chains (Gelman-Rubin Convergence Diagnostic)
I Efficiency (Complexity and runtime)
Top-k Queries over Uncertain Scores Performance Evaluation
Effectiveness of Six Algorithms
Effectiveness of Six Algorithms (Probability)
0 5 10
x 104 10−8
10−7 10−6 10−5
Chain Length
Probability
Soliman Swap SwapEXP ReSample ReSampleEXP ReSampleAll
(a) Dataset5
0 5 10
x 104 10−8
10−7 10−6 10−5
Chain Length
Probability
Soliman Swap SwapEXP ReSample ReSampleEXP ReSampleAll
(b) Dataset21
1 2 3 4 5 6
0.5 1 1.5 2 2.5
Top-k Queries over Uncertain Scores Performance Evaluation
Convergence of Six Algorithms
Convergence of the Markov Chains
0 5 10
x 104 0
2 4 6 8 10
Chain Length
Gelman−Rubin Diagnostic
Soliman Swap SwapEXP ReSample ReSampleEXP ReSampleAll
(e) Dataset5
0 5 10
x 104 0
2 4 6 8 10
Chain Length
Gelman−Rubin Diagnostic
Soliman Swap SwapEXP ReSample ReSampleEXP ReSampleAll
(f) Dataset21
0 0.2 0.4 0.6 0.8 1
0 1 2 3 4 5 6
(g) Dataset5 0 0.2 0.4 0.6 0.8 1
0 0.5 1 1.5 2 2.5
(h) Dataset21
Top-k Queries over Uncertain Scores Performance Evaluation
Efficiency
Efficiency
Table: Worst Case Time Complexity of Generating Next State
Soliman Swap(EXP) ReSample(EXP) ReSampleAll
Time Complexity O(nk) O(1) O(n) O(nlogk)
Table: Runtime Per Step of the Algorithms (seconds)
Soliman Swap SwapEXP ReSample ReSampleEXP ReSampleAll
Runtime Per Step 0.0058 1.9128 0.1163 0.0523 0.0071 0.9056
Top-k Queries over Uncertain Scores Conclusion
Conclusion
I We explore the design space for Metropolis-Hastings Markov chain Monte Carlo algorithms.
I We verify through extensive experiments that the proposed algorithms are more effective than the state of the art approach.
I ReSampleAllis the best, since it samples directly from the target distribution instead of depending on “local”
information.
Top-k Queries over Uncertain Scores Conclusion
Conclusion
I We explore the design space for Metropolis-Hastings Markov chain Monte Carlo algorithms.
I We verify through extensive experiments that the proposed algorithms are more effective than the state of the art approach.
I ReSampleAllis the best, since it samples directly from the target distribution instead of depending on “local”
information.
Top-k Queries over Uncertain Scores Conclusion
Conclusion
I We explore the design space for Metropolis-Hastings Markov chain Monte Carlo algorithms.
I We verify through extensive experiments that the proposed algorithms are more effective than the state of the art approach.
I ReSampleAllis the best, since it samples directly from the target distribution instead of depending on “local”
information.
Top-k Queries over Uncertain Scores Q/A
Thank you! Questions?
Top-k Queries Uncertain Scores
MCMC
[email protected]
Top-k Queries over Uncertain Scores References
References I
Soliman, M. A. and Ilyas, I. F. (2009).
Ranking with uncertain scores.
InICDE, pages 317–328.
Soliman, M. A., Ilyas, I. F., and Ben-David, S. (2010).
Supporting ranking queries on uncertain and incomplete data.
The VLDB Journal, 19(4):477–501.