• Aucun résultat trouvé

Crowd Mining

N/A
N/A
Protected

Academic year: 2022

Partager "Crowd Mining"

Copied!
1
0
0

Texte intégral

(1)

Crowd Mining

Yael Amsterdamer Yael Grossman Tova Milo Pierre Senellart

The model

We learn association rules of the form a,b  c,d

• E.g. , “heartburn”  “baking soda”, “lemon”

The answers contain

Rule support – frequency of a,b,c,d

Rule confidence – frequency of c,d given a,b

Items (for an open question)

Significant rules – average user support and confidence exceed fixed thresholds

 Users are sampled uniformly at random

Component framework

Data mining for the crowd?

 The discovery of data patterns in databases is done by data mining.

 Not suitable for our case

• People do not remember enough details!

For example, it is unrealistic to expect folk healers to remember

comprehensive details of all the cases they have treated in the past.

 They are far more likely to remember short summaries for personally prominent patterns

Introduction

Crowd data sourcing collects data from the crowd, often by asking questions

 We want to learn about new domains from the crowd

• E.g., traditional (folk) medicine in some region

• Or the leisure habits of hi-tech workers

 Data is not recorded anywhere

 The contents of the domain are unknown

• Discover what is interesting in this domain What should we ask the crowd?

“I treat patients with a cold every week”

Our approach

• Use personal summaries to learn about general trends

• Treat individual answers as samples

• Combine two types of questions o Open questions

o Closed questions

• Easier for users to answer

• Help digging deeper into their memories

We propose a formal model, a generic crowd mining framework, effective implementation for framework components and an

experimental study

“Which symptoms do you usually encounter?’’

“Do you use baking soda and lemon to relieve a heartburn?”

A hierarchy of black-box

components whose

implementation may be adapted to the settings

Experiments

• 3 new benchmark datasets (with known ground truth):

o Synthetic

o Retail (market basket analysis) o Wikipedia editing records

• A system CrowdMiner and two baseline alternatives

o Random o Greedy

• Varying the parameters, such as the mixture of open and closed questions, prior knowledge etc.

Error Estimations

 Not all the users can be asked about every rule

 We want to estimate the probability of making an error given the current knowledge

• We learn a distribution of the answer support and confidence

Significance estimation – by the position of >0.5 of the distribution mass

Error probability – for the true mean to be on the other side of the thresholds

 The next question is the one expected to minimize the overall error

Synthetic Retail Wikipedia

Retail Retail Synthetic

Références

Documents relatifs

In the third year, I will concentrate on experimental work referring to trace comparison and I will deal with operational support, agile workflow management, or other

The contribution of my thesis to the process discovery area consists in a novel tool that allows the construction of a data structure (called log-tree), that can be used both as

To obtain a proper transition from the current grid to successor, we introduce two types of vector fields, i.e., those corresponding to grids in the decomposition, which we call

For each evolution scenario ev i→i+1 of the co-evolution history, we first compute the differences diff(M src,i ,M src,i+1 ) and diff(M tgt,i ,M tgt,i+1 ). Subsequently, we count

The flexibility of this model lies in its first point: every choice of spontaneous velocity can be made and so every existing model for predicting crowd motion can be integrated

The existence and uniqueness of the proposed discrete velocity model solution has been demonstrated thanks

She is a member of the VLDB Endowment and the ICDT executive board and is an editor of TODS, the VLDB Journal, and the Logical Methods in Computer Sci- ence Journal. She has

Our WellBeing portal takes crowd involvement to the next level: using our crowd mining techniques we can identify where knowledge is lacking, and proactively engage portal users