• Aucun résultat trouvé

Get a Sample for a Discount

N/A
N/A
Protected

Academic year: 2022

Partager "Get a Sample for a Discount"

Copied!
60
0
0

Texte intégral

(1)

Get a Sample for a Discount

Get a Sample for a Discount

Sampling-Based XML Data Pricing

Ruiming Tang, Antoine Amarilli, Pierre Senellart, St´ephane Bressan

(2)

Get a Sample for a Discount Introduction

Introduction

I In most data pricing literature

(e.g., [Koutris et al., 2012a, Koutris et al., 2012b,

Koutris et al., 2013, Li and Miklau, 2012]), data prices are prescribed and not negotiable, and give access to the best data quality that the provider can achieve.

I We propose a pricing framework, in which data quality can be traded for discounted prices. We allow a data consumer to propose her own price for the requested data.

I If the proposed price is the full price of the requested data, the requested data is returned to the data consumer.

I If the proposed price is less than the full price, a lower quality version of the requested data is returned.

I What are the dimensions to access data quality?

2 / 25

(3)

Get a Sample for a Discount Introduction

Introduction

I In most data pricing literature

(e.g., [Koutris et al., 2012a, Koutris et al., 2012b,

Koutris et al., 2013, Li and Miklau, 2012]), data prices are prescribed and not negotiable, and give access to the best data quality that the provider can achieve.

I We propose a pricing framework, in which data quality can be traded for discounted prices. We allow a data consumer to propose her own price for the requested data.

I If the proposed price is the full price of the requested data, the requested data is returned to the data consumer.

I If the proposed price is less than the full price, a lower quality version of the requested data is returned.

I What are the dimensions to access data quality?

(4)

Get a Sample for a Discount Introduction

Introduction

I In most data pricing literature

(e.g., [Koutris et al., 2012a, Koutris et al., 2012b,

Koutris et al., 2013, Li and Miklau, 2012]), data prices are prescribed and not negotiable, and give access to the best data quality that the provider can achieve.

I We propose a pricing framework, in which data quality can be traded for discounted prices. We allow a data consumer to propose her own price for the requested data.

I If the proposed price is the full price of the requested data, the requested data is returned to the data consumer.

I If the proposed price is less than the full price, a lower quality version of the requested data is returned.

I What are the dimensions to access data quality?

2 / 25

(5)

Get a Sample for a Discount Introduction

Introduction

I In most data pricing literature

(e.g., [Koutris et al., 2012a, Koutris et al., 2012b,

Koutris et al., 2013, Li and Miklau, 2012]), data prices are prescribed and not negotiable, and give access to the best data quality that the provider can achieve.

I We propose a pricing framework, in which data quality can be traded for discounted prices. We allow a data consumer to propose her own price for the requested data.

I If the proposed price is the full price of the requested data, the requested data is returned to the data consumer.

I If the proposed price is less than the full price, a lower quality version of the requested data is returned.

I What are the dimensions to access data quality?

(6)

Get a Sample for a Discount Introduction

Introduction

I Data quality dimensions (defined in

[Pipino et al., 2002, Wang and Strong, 1996]):

I intrinsic quality (believability, objectivity, accuracy, reputation)

I contextual quality (value-added, relevancy, timeliness, ease of operation, appropriate amount of data, completeness)

I representational quality (interpretability, ease of understanding, concise representation, consistent representation)

I accessibility quality (accessibility, security)

I In our previous work [Tang et al., 2013], we proposed a pricing framework for relational data, in whichaccuracy can be traded for discounted prices.

I In this paper, we propose a pricing framework for XMLdata, in which completeness can be traded for discounted prices.

3 / 25

(7)

Get a Sample for a Discount Introduction

Introduction

I Data quality dimensions (defined in

[Pipino et al., 2002, Wang and Strong, 1996]):

I intrinsic quality (believability, objectivity, accuracy, reputation)

I contextual quality (value-added, relevancy, timeliness, ease of operation, appropriate amount of data, completeness)

I representational quality (interpretability, ease of understanding, concise representation, consistent representation)

I accessibility quality (accessibility, security)

I In our previous work [Tang et al., 2013], we proposed a pricing framework for relational data, in whichaccuracy can be traded for discounted prices.

I In this paper, we propose a pricing framework for XMLdata, in which completeness can be traded for discounted prices.

(8)

Get a Sample for a Discount Introduction

Introduction

I Data quality dimensions (defined in

[Pipino et al., 2002, Wang and Strong, 1996]):

I intrinsic quality (believability, objectivity, accuracy, reputation)

I contextual quality (value-added, relevancy, timeliness, ease of operation, appropriate amount of data, completeness)

I representational quality (interpretability, ease of understanding, concise representation, consistent representation)

I accessibility quality (accessibility, security)

I In our previous work [Tang et al., 2013], we proposed a pricing framework for relational data, in whichaccuracy can be traded for discounted prices.

I In this paper, we propose a pricing framework for XMLdata, in which completeness can be traded for discounted prices.

3 / 25

(9)

Get a Sample for a Discount Introduction

Introduction

I Data quality dimensions (defined in

[Pipino et al., 2002, Wang and Strong, 1996]):

I intrinsic quality (believability, objectivity, accuracy, reputation)

I contextual quality (value-added, relevancy, timeliness, ease of operation, appropriate amount of data, completeness)

I representational quality (interpretability, ease of understanding, concise representation, consistent representation)

I accessibility quality (accessibility, security)

I In our previous work [Tang et al., 2013], we proposed a pricing framework for relational data, in whichaccuracy can be traded for discounted prices.

I In this paper, we propose a pricing framework for XMLdata, in which completenesscan be traded for discounted prices.

(10)

Get a Sample for a Discount Framework Description

Our Framework

I Three main actors in data markets:

I Data provider: she has an XML document and sets a price to this document. She also assigns aweightto each node.

I Data consumer: she proposes a price for the document. The proposed price may be lower than the price of the document because

I she has a limited budget

I she wants to explore the document before the full purchase

I Data market owner:

I she negotiates with the data provider apricing functionto decide thecompletenessof a sample that should be returned, according to a proposed price

I she samples a rooted subtree of the requested document with the decided completenessuniformly at random.

4 / 25

(11)

Get a Sample for a Discount Framework Description

Our Framework

I Three main actors in data markets:

I Data provider: she has an XML document and sets a price to this document. She also assigns aweightto each node.

I Data consumer: she proposes a price for the document. The proposed price may be lower than the price of the document because

I she has a limited budget

I she wants to explore the document before the full purchase

I Data market owner:

I she negotiates with the data provider apricing functionto decide thecompletenessof a sample that should be returned, according to a proposed price

I she samples a rooted subtree of the requested document with the decided completenessuniformly at random.

(12)

Get a Sample for a Discount Framework Description

Our Framework

I Three main actors in data markets:

I Data provider: she has an XML document and sets a price to this document. She also assigns aweightto each node.

I Data consumer: she proposes a price for the document. The proposed price may be lower than the price of the document because

I she has a limited budget

I she wants to explore the document before the full purchase

I Data market owner:

I she negotiates with the data provider apricing functionto decide thecompletenessof a sample that should be returned, according to a proposed price

I she samples a rooted subtree of the requested document with the decided completenessuniformly at random.

4 / 25

(13)

Get a Sample for a Discount Framework Description

Our Framework

I Three main actors in data markets:

I Data provider: she has an XML document and sets a price to this document. She also assigns aweightto each node.

I Data consumer: she proposes a price for the document. The proposed price may be lower than the price of the document because

I she has a limited budget

I she wants to explore the document before the full purchase

I Data market owner:

I she negotiates with the data provider apricing functionto decide thecompletenessof a sample that should be returned, according to a proposed price

I she samples a rooted subtree of the requested document with the decided completenessuniformly at random.

(14)

Get a Sample for a Discount Framework Description

Weight and Completeness of a rooted subtree

I A rooted subtree t0 of a tree tis (1) a subtree of tand (2) root(t0) =root(t).

I The weight of a tree(weight(t)) is intuitively, the sum weight of all the nodes.

I Completenessof t0 with respect to a treet(t0 is a rooted subtree of t) isct(t0) = weight(tweight(t)0).

5 / 25

(15)

Get a Sample for a Discount Framework Description

Weight and Completeness of a rooted subtree

I A rooted subtree t0 of a tree tis (1) a subtree of tand (2) root(t0) =root(t).

I The weight of a tree(weight(t)) is intuitively, the sum weight of all the nodes.

I Completenessof t0 with respect to a treet(t0 is a rooted subtree of t) isct(t0) = weight(tweight(t)0).

(16)

Get a Sample for a Discount Framework Description

Weight and Completeness of a rooted subtree

I A rooted subtree t0 of a tree tis (1) a subtree of tand (2) root(t0) =root(t).

I The weight of a tree(weight(t)) is intuitively, the sum weight of all the nodes.

I Completenessof t0 with respect to a treet(t0 is a rooted subtree of t) is ct(t0) = weight(tweight(t)0).

5 / 25

(17)

Get a Sample for a Discount Framework Description

Pricing Function

I The pricing functionof a tree tis a function

φt: [0,1]→Q+. Its input is the completeness of a rooted subtree t0 and it returns the price of t0, as a non-negative rational.

I Non-decreasing. The more complete a rooted subtree is, the more expensive it should be, i.e., c1 ≥c2 ⇒φt(c1)≥φt(c2).

I Arbitrage-free. Buying a rooted subtree of completeness c1+c2 should not be more expensive than buying two subtrees with respective completeness c1 and c2, i.e., φt(c1) +φt(c2)≥φt(c1+c2). This property is useful when considering repeated requests.

(18)

Get a Sample for a Discount Framework Description

Pricing Function

I The pricing functionof a tree tis a function

φt: [0,1]→Q+. Its input is the completeness of a rooted subtree t0 and it returns the price of t0, as a non-negative rational.

I Non-decreasing. The more complete a rooted subtree is, the more expensive it should be, i.e., c1 ≥c2 ⇒φt(c1)≥φt(c2).

I Arbitrage-free. Buying a rooted subtree of completeness c1+c2 should not be more expensive than buying two subtrees with respective completeness c1 and c2, i.e., φt(c1) +φt(c2)≥φt(c1+c2). This property is useful when considering repeated requests.

6 / 25

(19)

Get a Sample for a Discount Framework Description

Pricing Function

I The pricing functionof a tree tis a function

φt: [0,1]→Q+. Its input is the completeness of a rooted subtree t0 and it returns the price of t0, as a non-negative rational.

I Non-decreasing. The more complete a rooted subtree is, the more expensive it should be, i.e., c1 ≥c2 ⇒φt(c1)≥φt(c2).

I Arbitrage-free. Buying a rooted subtree of completeness c1+c2 should not be more expensive than buying two subtrees with respective completeness c1 and c2, i.e., φt(c1) +φt(c2)≥φt(c1+c2). This property is useful when considering repeated requests.

(20)

Get a Sample for a Discount Framework Description

Pricing Function

I Minimum and maximum bound. We should have

φt(0) =prmin andφt(1) =prt, whereprmin is the minimum cost that a data consumer has to pay using the data market andprt is the price of the whole tree t.

I All these properties can be satisfied, for instance, by functions of the form φt(c) = (prt−prmin)cp+prmin wherep≤1.

7 / 25

(21)

Get a Sample for a Discount Framework Description

Pricing Function

I Minimum and maximum bound. We should have

φt(0) =prmin andφt(1) =prt, whereprmin is the minimum cost that a data consumer has to pay using the data market andprt is the price of the whole tree t.

I All these properties can be satisfied, for instance, by functions of the form φt(c) = (prt−prmin)cp+prmin wherep≤1.

(22)

Get a Sample for a Discount Framework Description

Sampling Problem

I Given a proposed price pr0, once a completeness value c∈φ−1t (pr0) is chosen, the weight of the returned rooted subtree is fixed as c×weight(t).

I We consider the problem of uniform sampling a rooted subtree with prescribed weight (instead with prescribed completeness). The problem of sampling a rooted subtree, given a tree tand a weight k, is to sample a rooted subtree t0 of t, such that weight(t0) =k,uniformly at random.

I Why “uniformly at random”? To be fair to the data consumer, there should be an equal chance to explore every possible part of the XML document that is worth the proposed price.

8 / 25

(23)

Get a Sample for a Discount Framework Description

Sampling Problem

I Given a proposed price pr0, once a completeness value c∈φ−1t (pr0) is chosen, the weight of the returned rooted subtree is fixed as c×weight(t).

I We consider the problem of uniform sampling a rooted subtree with prescribed weight (instead with prescribed completeness). The problem of sampling a rooted subtree, given a tree tand a weight k, is to sample a rooted subtree t0 of t, such that weight(t0) =k,uniformly at random.

I Why “uniformly at random”? To be fair to the data consumer, there should be an equal chance to explore every possible part of the XML document that is worth the proposed price.

(24)

Get a Sample for a Discount Framework Description

Sampling Problem

I Given a proposed price pr0, once a completeness value c∈φ−1t (pr0) is chosen, the weight of the returned rooted subtree is fixed as c×weight(t).

I We consider the problem of uniform sampling a rooted subtree with prescribed weight (instead with prescribed completeness). The problem of sampling a rooted subtree, given a tree tand a weight k, is to sample a rooted subtree t0 of t, such that weight(t0) =k,uniformly at random.

I Why “uniformly at random”? To be fair to the data consumer, there should be an equal chance to explore every possible part of the XML document that is worth the proposed price.

8 / 25

(25)

Get a Sample for a Discount Tractability of the Sampling Problem

Tractability of the Sampling Problem

I Given a tree tand a weightx, it is NP-hard to sample a rooted subtree of tof weight x uniformly at random.

I Tractable cases:

I Unweighted Sampling: w(n) = 1for alln.

I 0/1-weights Sampling.: w(n)∈ {0,1} for alln.

(26)

Get a Sample for a Discount Tractability of the Sampling Problem

Tractability of the Sampling Problem

I Given a tree tand a weightx, it is NP-hard to sample a rooted subtree of tof weight x uniformly at random.

I Tractable cases:

I Unweighted Sampling: w(n) = 1for alln.

I 0/1-weights Sampling.: w(n)∈ {0,1} for alln.

9 / 25

(27)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Binary Trees

Unweighted Sampling for Binary Trees

I We first present a sampling algorithm for binary trees and extend the algorithm to any unranked trees.

I First phase: Subtree Counting. We start by computing a matrix Dsuch that, for every node ni of the input tree tand any value 0≤k≤size(t),Di[k]is the number of subtrees of sizek rooted at nodeni.

(28)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Binary Trees

Unweighted Sampling for Binary Trees

I We first present a sampling algorithm for binary trees and extend the algorithm to any unranked trees.

I First phase: Subtree Counting. We start by computing a matrix Dsuch that, for every node ni of the input tree tand any value 0≤k≤size(t),Di[k]is the number of subtrees of sizek rooted at nodeni.

10 / 25

(29)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Binary Trees

Unweighted Sampling for Binary Trees

I n0

n1

n2 n3

n5

n4

n0

n1

n2 n3

n5

n4

n6

n0

n1

n3 n4

n6

n5

n2

I For leaf nodes, there is one rooted subtree of 

size 1 and one rooted subtree of size 0, 

therefore  1,1

1,2,1 therefore  1,1,2,1

1,2,3,3,1 therefore  1,1,2,3,3,1

1,2,3,5,6,4,1 therefore  1,1,2,3,5,6,4,1

I

1,1,2,1 size=0, 

empty tree

size=0 size=0 size=1

size=0 size=1 size=1

size=3

size=1 size=0 size=1

size=2

(30)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Binary Trees

Unweighted Sampling for Binary Trees

I n0

n1

n2 n3

n5

n4

n0

n1

n2 n3

n5

n4

n6

n0

n1

n3 n4

n6

n5

n2

I For leaf nodes, there is one rooted subtree of 

size 1 and one rooted subtree of size 0, 

therefore  1,1

1,2,1 therefore  1,1,2,1

1,2,3,3,1 therefore  1,1,2,3,3,1

1,2,3,5,6,4,1 therefore  1,1,2,3,5,6,4,1

I

1,1,2,1 size=0, 

empty tree

size=0 size=0 size=1

size=0 size=1 size=1

size=3

size=1 size=0 size=1

size=2

11 / 25

(31)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Binary Trees

Unweighted Sampling for Binary Trees

I n0

n1

n2 n3

n5

n4

n0

n1

n2 n3

n5

n4

n6

n0

n1

n3 n4

n6

n5

n2

I For leaf nodes, there is one rooted subtree of 

size 1 and one rooted subtree of size 0, 

therefore  1,1

1,2,1 therefore  1,1,2,1

1,2,3,3,1 therefore  1,1,2,3,3,1

1,2,3,5,6,4,1 therefore  1,1,2,3,5,6,4,1

I

size=0,  empty tree

size=0 size=0 size=1

size=0 size=1 size=1

size=3

size=1 size=0 size=1

size=2

(32)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Binary Trees

Unweighted Sampling for Binary Trees

I Second phase: Uniform Sampling. We sample a rooted subtree from t in a recursive top-down manner, based on the matrix Dcomputed.

I The basic idea is that to sample a rooted subtree at node ni, we decide on the size of the subtrees rooted at each child node, biased by the number of outcomes as counted inD, and then sample rooted subtrees of the desired sizes recursively.

12 / 25

(33)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Binary Trees

Unweighted Sampling for Binary Trees

I

n0

n1

n2 n3

n5 n4

n0

n1

n2 n3

n5 n4

n6

n0

n1

n3 n4

n6

n5

n2

I

1,1 1,1,2,1 , 1,1,2,3,3,1

1,1,2,3,5,6,4,1

Assume we want to sample a rooted  subtree of size 3. From  , we know  that there are 3 such rooted subtrees.

must be included in the sample. We have to sample 2  rooted subtrees at  , respectively whose sizes sum up to 2.

Only 2 possibilities: 

(1) a rooted subtree at  of size 1 and a rooted subtree at  of size 1. There is  1 1 1such case. (2) a rooted subtree at  of size 0 and a rooted subtree at  of size 2. There are  0 2 2such case.

We sample the first case with probability  and the second case with probability  .

Assume we sample the second case. Then we need to sample a rooted  submtree at  of size 0 and sample a rooted subtree at  of size 2. We  recursively operates the above process.

I

1,1 1,1,2,1 , 1,1,2,3,3,1

1,1,2,3,5,6,4,1

Assume we want to sample a rooted subtree of size 3. From  , we know that there are 3 such rooted subtrees.

must be included in the sample. We have to sample 2  rooted subtrees at  , respectively whose sizes sum up to 2.

Only 2 possibilities: 

(1) a rooted subtree at  of size 1 and a rooted subtree at  of size 1. There is  1 1 1such case. (2) a rooted subtree at  of size 0 and a rooted subtree at  of size 2. There are  0 2 2such case.

We sample the first case with probability  and the second case with probability .

Assume we sample the second case. Then we need to sample a rooted  submtree at  of size 0 and sample a rooted subtree at  of size 2. We  recursively operates the above process.

I

size=1 …… size=1

size=0 …… size=2

1 1 1 0 2 2

I

size=1 …… size=1

size=0 …… size=2

1 1 1 0 2 2

(34)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Binary Trees

Unweighted Sampling for Binary Trees

I

n0

n1

n2 n3

n5 n4

n0

n1

n2 n3

n5 n4

n6

n0

n1

n3 n4

n6

n5

n2

I

1,1 1,1,2,1 , 1,1,2,3,3,1

1,1,2,3,5,6,4,1

Assume we want to sample a rooted  subtree of size 3. From  , we know  that there are 3 such rooted subtrees.

must be included in the sample. We have to sample 2  rooted subtrees at  , respectively whose sizes sum up to 2.

Only 2 possibilities: 

(1) a rooted subtree at  of size 1 and a rooted subtree at  of size 1. There is  1 1 1such case.

(2) a rooted subtree at  of size 0 and a rooted subtree at  of size 2. There are  0 2 2such case.

We sample the first case with probability  and the second case with probability  .

Assume we sample the second case. Then we need to sample a rooted  submtree at  of size 0 and sample a rooted subtree at  of size 2. We  recursively operates the above process.

I

1,1 1,1,2,1 , 1,1,2,3,3,1

1,1,2,3,5,6,4,1

Assume we want to sample a rooted subtree of size 3. From  , we know that there are 3 such rooted subtrees.

must be included in the sample. We have to sample 2  rooted subtrees at  , respectively whose sizes sum up to 2.

Only 2 possibilities: 

(1) a rooted subtree at  of size 1 and a rooted subtree at  of size 1. There is  1 1 1such case. (2) a rooted subtree at  of size 0 and a rooted subtree at  of size 2. There are  0 2 2such case.

We sample the first case with probability  and the second case with probability .

Assume we sample the second case. Then we need to sample a rooted  submtree at  of size 0 and sample a rooted subtree at  of size 2. We  recursively operates the above process.

I

size=1 …… size=1

size=0 …… size=2

1 1 1 0 2 2

I

size=1 …… size=1

size=0 …… size=2

1 1 1 0 2 2

13 / 25

(35)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Binary Trees

Unweighted Sampling for Binary Trees

I

n0

n1

n2 n3

n5 n4

n0

n1

n2 n3

n5 n4

n6

n0

n1

n3 n4

n6

n5

n2

I

1,1 1,1,2,1 , 1,1,2,3,3,1

1,1,2,3,5,6,4,1

Assume we want to sample a rooted  subtree of size 3. From  , we know  that there are 3 such rooted subtrees.

must be included in the sample. We have to sample 2  rooted subtrees at  , respectively whose sizes sum up to 2.

Only 2 possibilities: 

(1) a rooted subtree at  of size 1 and a rooted subtree at  of size 1. There is  1 1 1such case.

(2) a rooted subtree at  of size 0 and a rooted subtree at  of size 2. There are  0 2 2such case.

We sample the first case with probability  and the second case with probability  . Assume we sample the second case. Then we need to sample a rooted  I

1,1 1,1,2,1 , 1,1,2,3,3,1

1,1,2,3,5,6,4,1

Assume we want to sample a rooted subtree of size 3. From  , we know that there are 3 such rooted subtrees.

must be included in the sample. We have to sample 2  rooted subtrees at  , respectively whose sizes sum up to 2.

Only 2 possibilities: 

(1) a rooted subtree at  of size 1 and a rooted subtree at  of size 1. There is  1 1 1such case.

(2) a rooted subtree at  of size 0 and a rooted subtree at  of size 2. There are  0 2 2such case.

We sample the first case with probability  and the second case with probability .

I

size=1 …… size=1

size=0 …… size=2

1 1 1 0 2 2

I

size=1 …… size=1

size=0 …… size=2

1 1 1 0 2 2

13 / 25

(36)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Binary Trees

Unweighted Sampling for Binary Trees

I

n0

n1

n2 n3

n5 n4

n0

n1

n2 n3

n5 n4

n6

n0

n1

n3 n4

n6

n5

n2

I

1,1 1,1,2,1 , 1,1,2,3,3,1

1,1,2,3,5,6,4,1

Assume we want to sample a rooted  subtree of size 3. From  , we know  that there are 3 such rooted subtrees.

must be included in the sample. We have to sample 2  rooted subtrees at  , respectively whose sizes sum up to 2.

Only 2 possibilities: 

(1) a rooted subtree at  of size 1 and a rooted subtree at  of size 1. There is  1 1 1such case.

(2) a rooted subtree at  of size 0 and a rooted subtree at  of size 2. There are  0 2 2such case.

We sample the first case with probability  and the second case with probability  .

Assume we sample the second case. Then we need to sample a rooted  submtree at  of size 0 and sample a rooted subtree at  of size 2. We  recursively operates the above process.

I

1,1 1,1,2,1 , 1,1,2,3,3,1

1,1,2,3,5,6,4,1

Assume we want to sample a rooted subtree of size 3. From  , we know that there are 3 such rooted subtrees.

must be included in the sample. We have to sample 2  rooted subtrees at  , respectively whose sizes sum up to 2.

Only 2 possibilities: 

(1) a rooted subtree at  of size 1 and a rooted subtree at  of size 1. There is  1 1 1such case.

(2) a rooted subtree at  of size 0 and a rooted subtree at  of size 2. There are  0 2 2such case.

We sample the first case with probability  and the second case with probability .

Assume we sample the second case. Then we need to sample a rooted  submtree at  of size 0 and sample a rooted subtree at  of size 2. We  recursively operates the above process.

I

size=1 ……

size=1

size=0 ……

size=2

1 1 1 0 2 2

I

size=1 …… size=1

size=0 …… size=2

1 1 1 0 2 2

13 / 25

(37)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Binary Trees

Unweighted Sampling for Binary Trees

I

n0

n1

n2 n3

n5 n4

n0

n1

n2 n3

n5 n4

n6

n0

n1

n3 n4

n6

n5

n2

I

1,1 1,1,2,1 , 1,1,2,3,3,1

1,1,2,3,5,6,4,1

Assume we want to sample a rooted  subtree of size 3. From  , we know  that there are 3 such rooted subtrees.

must be included in the sample. We have to sample 2  rooted subtrees at  , respectively whose sizes sum up to 2.

Only 2 possibilities: 

(1) a rooted subtree at  of size 1 and a rooted subtree at  of size 1. There is  1 1 1such case.

(2) a rooted subtree at  of size 0 and a rooted subtree at  of size 2. There are  0 2 2such case.

We sample the first case with probability  and the second case with probability  . Assume we sample the second case. Then we need to sample a rooted  I

1,1 1,1,2,1 , 1,1,2,3,3,1

1,1,2,3,5,6,4,1

Assume we want to sample a rooted subtree of size 3. From  , we know that there are 3 such rooted subtrees.

must be included in the sample. We have to sample 2  rooted subtrees at  , respectively whose sizes sum up to 2.

Only 2 possibilities: 

(1) a rooted subtree at  of size 1 and a rooted subtree at  of size 1. There is  1 1 1such case.

(2) a rooted subtree at  of size 0 and a rooted subtree at  of size 2. There are  0 2 2such case.

We sample the first case with probability  and the second case with probability .

I

size=1 …… size=0 ……

I

size=1 ……

size=1

size=0 ……

size=2

1 1 1 0 2 2

13 / 25

(38)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Binary Trees

Unweighted Sampling for Binary Trees

I Complexity of Phase 1 (the size of tis n) is O(n3).

I EachDi has size at mostn. Therefore computing the convolution sum of two suchDi’s is in O(n2).

I We need to compute at mostnconvolution sums. Therefore the overall complexity isO(n3).

I Complexity of Phase 2 (the size of tis n) is O(n2).

I At each nodeni, the number of possibilities to consider is O(n), since ni has at most two children.

I There arennodes, hence the overall complexity isO(n2).

14 / 25

(39)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Binary Trees

Unweighted Sampling for Binary Trees

I Complexity of Phase 1 (the size of tis n) is O(n3).

I EachDi has size at mostn. Therefore computing the convolution sum of two suchDi’s is in O(n2).

I We need to compute at mostnconvolution sums. Therefore the overall complexity isO(n3).

I Complexity of Phase 2 (the size of tis n) is O(n2).

I At each nodeni, the number of possibilities to consider is O(n), since ni has at most two children.

I There arennodes, hence the overall complexity isO(n2).

(40)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Unranked Trees

Unweighted Sampling for Unranked Trees

I The sampling algorithm for binary trees can be adapted for unranked trees, thanks to the associativity of the convolution sum.

15 / 25

(41)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Unranked Trees

Unweighted Sampling for Unranked Trees

I Based on this idea, we introduce “dummy node” to represent a set of (more than two) nodes. An unranked tree can be transformed to a binary tree by adding such “dummy nodes”.

I The change in the Phase 1 and Phase 2: when we are reaching a “dummy node”, this node has weight (namely size) 0.

I For instance,n6 is a “dummy node”.

I

n0

n1

n2 n3

n5

n4

n0

n1

n2 n3

n5

n4

n0

n1

n2 n3

n5

n4

n6

n0

n1

n3 n4

n6

n5

n2

(42)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Unranked Trees

Unweighted Sampling for Unranked Trees

I Based on this idea, we introduce “dummy node” to represent a set of (more than two) nodes. An unranked tree can be transformed to a binary tree by adding such “dummy nodes”.

I The change in the Phase 1 and Phase 2: when we are reaching a “dummy node”, this node has weight (namely size) 0.

I For instance,n6 is a “dummy node”.

I

n0

n1

n2 n3

n5

n4

n0

n1

n2 n3

n5

n4

n0

n1

n2 n3

n5

n4

n6

n0

n1

n3 n4

n6

n5

n2

16 / 25

(43)

Get a Sample for a Discount Algorithms for Tractable Sampling

Unweighted Sampling for Unranked Trees

Unweighted Sampling for Unranked Trees

I Based on this idea, we introduce “dummy node” to represent a set of (more than two) nodes. An unranked tree can be transformed to a binary tree by adding such “dummy nodes”.

I The change in the Phase 1 and Phase 2: when we are reaching a “dummy node”, this node has weight (namely size) 0.

I For instance,n6 is a “dummy node”.

I

n0

n1

n2 n3

n0

n1

n2 n3

n5

n4

n0

n1

n2 n3

n6

n0

n1

n3 n4

n2

(44)

Get a Sample for a Discount Algorithms for Tractable Sampling

0/1-weights Sampling

0/1-weights Sampling

I 0/1-weights sampling problem can be resolved by adapting the sampling algorithm of unweighted sampling for unranked trees, since nodes of 0 weight are similar to dummy nodes.

(Details refer to the paper)

I We have presented the algorithms of sampling problem for a single request. Next, we are going to discuss the case of multiple requests from a data consumer.

17 / 25

(45)

Get a Sample for a Discount Algorithms for Tractable Sampling

0/1-weights Sampling

0/1-weights Sampling

I 0/1-weights sampling problem can be resolved by adapting the sampling algorithm of unweighted sampling for unranked trees, since nodes of 0 weight are similar to dummy nodes.

(Details refer to the paper)

I We have presented the algorithms of sampling problem for a single request. Next, we are going to discuss the case of multiple requests from a data consumer.

(46)

Get a Sample for a Discount Repeated Requests

Repeated Requests

I Motivation: after having bought incomplete data, the data consumer may realize that she needs additional data, in which case she would like to obtain more incomplete data that is not redundant with what she already has.

I Sampling problem: A rooted subtree, including nodes that she already has and a set of new nodes whose sum weight is k, is sampled uniformly at random.

I This problem is NP-hard.

I The unweighted version of this problem is tractable. We can set the weights of nodes that a data consumer already has to 0. Then this sampling problem is similar to 0/1-weights sampling problem with slight adaption. (Details refer to the paper)

18 / 25

(47)

Get a Sample for a Discount Repeated Requests

Repeated Requests

I Motivation: after having bought incomplete data, the data consumer may realize that she needs additional data, in which case she would like to obtain more incomplete data that is not redundant with what she already has.

I Sampling problem: A rooted subtree, including nodes that she already has and a set of new nodes whose sum weight is k, is sampled uniformly at random.

I This problem is NP-hard.

I The unweighted version of this problem is tractable. We can set the weights of nodes that a data consumer already has to 0. Then this sampling problem is similar to 0/1-weights sampling problem with slight adaption. (Details refer to the paper)

(48)

Get a Sample for a Discount Repeated Requests

Repeated Requests

I Motivation: after having bought incomplete data, the data consumer may realize that she needs additional data, in which case she would like to obtain more incomplete data that is not redundant with what she already has.

I Sampling problem: A rooted subtree, including nodes that she already has and a set of new nodes whose sum weight is k, is sampled uniformly at random.

I This problem is NP-hard.

I The unweighted version of this problem is tractable. We can set the weights of nodes that a data consumer already has to 0. Then this sampling problem is similar to 0/1-weights sampling problem with slight adaption. (Details refer to the paper)

18 / 25

(49)

Get a Sample for a Discount Repeated Requests

Repeated Requests

I Motivation: after having bought incomplete data, the data consumer may realize that she needs additional data, in which case she would like to obtain more incomplete data that is not redundant with what she already has.

I Sampling problem: A rooted subtree, including nodes that she already has and a set of new nodes whose sum weight is k, is sampled uniformly at random.

I This problem is NP-hard.

I The unweighted version of this problem is tractable. We can set the weights of nodes that a data consumer already has to 0.

Then this sampling problem is similar to 0/1-weights sampling problem with slight adaption. (Details refer to the paper)

(50)

Get a Sample for a Discount Conclusion and Future Work

Conclusion and Future Work

I We proposed a framework for a data market in which data quality can be traded for a discount.

I We studied the case of XML documents with completeness as the quality dimension.

I The data consumer proposes a price but may get only a sample if the proposed price is lower than that of the entire document.

I A sample is a rooted subtree of prescribed weight, as determined by the proposed price, sampled uniformly at random.

19 / 25

(51)

Get a Sample for a Discount Conclusion and Future Work

Conclusion and Future Work

I We proposed a framework for a data market in which data quality can be traded for a discount.

I We studied the case of XML documents with completeness as the quality dimension.

I The data consumer proposes a price but may get only a sample if the proposed price is lower than that of the entire document.

I A sample is a rooted subtree of prescribed weight, as determined by the proposed price, sampled uniformly at random.

(52)

Get a Sample for a Discount Conclusion and Future Work

Conclusion and Future Work

I We proposed a framework for a data market in which data quality can be traded for a discount.

I We studied the case of XML documents with completeness as the quality dimension.

I The data consumer proposes a price but may get only a sample if the proposed price is lower than that of the entire document.

I A sample is a rooted subtree of prescribed weight, as determined by the proposed price, sampled uniformly at random.

19 / 25

(53)

Get a Sample for a Discount Conclusion and Future Work

Conclusion and Future Work

I We proposed a framework for a data market in which data quality can be traded for a discount.

I We studied the case of XML documents with completeness as the quality dimension.

I The data consumer proposes a price but may get only a sample if the proposed price is lower than that of the entire document.

I A sample is a rooted subtree of prescribed weight, as determined by the proposed price, sampled uniformly at random.

(54)

Get a Sample for a Discount Conclusion and Future Work

Conclusion and Future Work

I We proved that if nodes in the XML document have arbitrary non-negative weights, the sampling problem is intractable.

I We identified tractable cases, namely the unweighted sampling problem and 0/1-weights sampling problem, for which we devised P-TIME algorithms. We also considered repeated requests and provided P-TIME solutions to the unweighted cases.

20 / 25

(55)

Get a Sample for a Discount Conclusion and Future Work

Conclusion and Future Work

I We proved that if nodes in the XML document have arbitrary non-negative weights, the sampling problem is intractable.

I We identified tractable cases, namely the unweighted sampling problem and 0/1-weights sampling problem, for which we devised P-TIME algorithms. We also considered repeated requests and provided P-TIME solutions to the unweighted cases.

Références

Documents relatifs

This work presents a precise quantitative, finite sample analysis of the double descent phenomenon in the estimation of linear and non-linear models.. We make use of a

Indeed, the estimate of this probability results from Chebychev inequality and variance estimates in which the process is started in n.. What the symbol precisely means is

However, the number and duration of drifts changed over the lifetime of the bees and the season, and parasitism had an effect on drifters, with Nosema -infected bees performing more

Our asymptotic tests perform better than all competitors besides Wolf: in the latter case, we have greater Type II error for one neural dataset, lower Type II error on the Health

, θ k mixture indicator variate is drawn from the multinomial distribution with k probabilities equal to the mixture weights.. Then, given the drawn mixture indicator value, k say,

Cooling is achieved by producing good conductive contact between the aluminium sample holders and the copper cooling bars.. The apparatus, and some preliminary

If in the bourgeois ideology the female body was constrained to represent reproductive value, indeed, functioning as a kind of money (that other value in flux), once freed of

Exercice 2.2. You toss a coin four times. On the …rst toss, you win 10 dollars if the result is “tails” and lose 10 dollars if the result is “heads”. On the subsequent tosses,