• Aucun résultat trouvé

Contrˆole de version incertain dans l’´edition collaborative ouverte de documents arborescents

N/A
N/A
Protected

Academic year: 2022

Partager "Contrˆole de version incertain dans l’´edition collaborative ouverte de documents arborescents"

Copied!
27
0
0

Texte intégral

(1)

Contrˆ ole de version incertain dans l’´ edition collaborative ouverte de documents arborescents

M. Lamine BA, Talel Abdessalem & Pierre Senellart

BDA’13 – 22-25 October 2013 (Nantes, France)

http://dbweb.enst.fr/

(2)

Motivations Versioning on the Web is Uncertain

Versioning on the Web is Uncertain (I)

Large-scale, open and collaborative editing platforms, e.g., wikipedia

Unreliable contributors, Novice vs. Experts, etc.

Malicious edits, Vandalism acts,Contradictions

(3)

Motivations Versioning on the Web is Uncertain

Versioning on the Web is Uncertain (II)

Version control is used in large-scale web collaborative editing platforms in order to integratecontributions from different sources and to support fixing errors with the possibility to query previousdata versions

No notion of more relevant versions or contributions which will fit to user-preferences, but just the concept of last validrevisions

Deterministicversion control models [Kerstin et al.(2009), Al Koc et al.(2011)] in the literature

(4)

Motivations Versioning on the Web is Uncertain

Versioning on the Web is Uncertain (III)

v0 (a0, c0) v1 (a1, c1) v2 (a2, c2) v4 (a1, c4)

v3 (a3, c3) derivation derivation

(5)

Motivations Versioning on the Web is Uncertain

Versioning on the Web is Uncertain (III)

v0 (a0, c0) v1 (a1, c1) v2 (a2, c2) v4 (a1, c4)

Q1:v2 |¬a1

Q2:v2 |¬c1

Q3:all vi|Pr(vi)6=0

v3 (a3, c3) derivation derivation

(6)

Motivations Versioning on the Web is Uncertain

Versioning on the Web is Uncertain (III)

v0 (a0, c0) v1 (a1, c1) v2 (a2, c2) v4 (a1, c4)

Q1:v2 |¬a1

Q2:v2 |¬c1

Q3:all vi|Pr(vi)6=0

v3 (a3, c3) derivation derivation

☛ Uncertain version control in Open Collaborative Contexts

(7)

Motivations Versioning on the Web is Uncertain

Versioning on the Web is Uncertain (IV)

d2) Person

Name

“Obama”

Origin

“Tanzania”

d4) Person

Name

“Obama

Origin

“Tanzania”

Function

“US. President”

d1) Person

Name

“Obama”

d3) Person

Name

“B. Obama”

Origin

“Kenya”

e2

e3

e4

(8)

Motivations Versioning on the Web is Uncertain

Versioning on the Web is Uncertain (IV)

d2) Person

Name

“Obama”

Origin

“Tanzania”

d4) Person

Name

“Obama

Origin

“Tanzania”

Function

“US. President”

d1) Person

Name

“Obama”

d3) Person

Name

“B. Obama”

Origin

“Kenya”

d5) Person

Name

“B. Obama”

Origin

“Kenya”

Function

“US. President”

F={e1,e3,e4}

e2

e3

e4

e4

(9)

Motivations Versioning on the Web is Uncertain

Versioning on the Web is Uncertain (IV)

d2) Person

Name

“Obama”

Origin

“Tanzania”

d4) Person

Name

“Obama

Origin

“Tanzania”

Function

“US. President”

d1) Person

Name

“Obama”

d3) Person

Name

“B. Obama”

Origin

“Kenya”

d5) Person

Name

“B. Obama”

Origin

“Kenya”

Function

“US. President”

F={e1,e3,e4}

e2

e3

e4

e4

c

P) Person

Name

e1

“Obama”

¬e3

“B. Obama”

e3

Origin e2∨e3

“Tanzania”

¬e3

“Kenya”

e3

Function e4

“US. President”

Mapping Probabilistic Encoding

(10)

Uncertain Tree-Structured Data Model

Plan

Uncertain Tree-Structured Data Model Uncertain Multi-Version XML Document Conclusion and Further work

(11)

Uncertain Tree-Structured Data Model

Uncertain Tree-Structured Data

Probabilistic XML [Kimelfeld & Senellart.(2013)]

c P) r

p1 e1∨e2

t1 p2

¬e2

t2

d1) r

p1

t1 p2

t2 v1= (e1∧ ¬e2)

d2) r

p1

t1 v2= (¬e1∧e2) v

2= (e1∧e2)

d3) r

p2

t2

v3= (¬e1∧ ¬e2)

Pr(e1) = 0.2; Pr(e2) = 0.8 Pr(d1) = Pr(e1)×Pr(¬e2)

Pr(d2) = (Pr(¬e1)×Pr(e2)) + (Pr(e1)×Pr(e2)) Pr(d3) = Pr(¬e1)×Pr(¬e2)

(12)

Uncertain Tree-Structured Data Model

Uncertain Tree-Structured Data

Probabilistic XML [Kimelfeld & Senellart.(2013)]

c P) r

p1 e1∨e2

t1 p2

¬e2

t2

d1) r

p1

t1 p2

t2 v1= (e1∧ ¬e2)

d2) r

p1

t1 v2= (¬e1∧e2) v

2= (e1∧e2)

d3) r

p2

t2

v3= (¬e1∧ ¬e2)

Pr(e1) = 0.2; Pr(e2) = 0.8 Pr(d1) = 0.04

Pr(d2) = 0.80 Pr(d3) = 0.16

(13)

Uncertain Tree-Structured Data Model

Uncertain Tree-Structured Data

Probabilistic XML [Kimelfeld & Senellart.(2013)]

c P) r

p1 e1∨e2

t1 p2

¬e2

t2

d1) r

p1

t1 p2

t2 v1= (e1∧ ¬e2)

d2) r

p1

t1 v2= (¬e1∧e2) v

2= (e1∧e2)

d3) r

p2

t2

v3= (¬e1∧ ¬e2)

➤ Pc=<T,C(E),fie,Pr>

➤ [[Pc]] =<D,Pr>; Σ{Pr(d) |d ∈D}= 1

(14)

Uncertain Multi-Version XML Document

Plan

Uncertain Tree-Structured Data Model Uncertain Multi-Version XML Document

Uncertain Version Control Model Probabilistic XML Encoding

Updating Uncertain Multi-version XML documents Performance Analysis

Conclusion and Further work

(15)

Uncertain Multi-Version XML Document Uncertain Version Control Model

Uncertain Multi-Version XML Document

Uncertain Version Control Model

Probability space (PSV) over Uncertain versions of XML documents

Randomderivation graph (DG) over the document versions produced Intuition: states in DG are complex variablesei’s calledversion control events based on simpler variables b1. . .bm managing uncertainty in data

<G,Ω> defines a multi-version XML document with uncertain data

G is DG over a set of versioning eventsV ∪e0 withV ={e1. . .en}

Ω : 2V\{e0}→D a mapping computing the PSV according to sets of valid events

(16)

Uncertain Multi-Version XML Document Uncertain Version Control Model

Uncertain Multi-Version XML Document

Possible versions and Probabilities

Possible versions

Version control events come with edit scripts updating content

∀i,∀F⊆2V\{ei}, the possible version Ω({ei} ∪F) = [Ω(F)]i Probability of possible versions

Assume a prior probability distribution over simple variables b1. . .bm

The probability of a given possible version Ω(F) is the probability of W

FV,Ω(F)F

(17)

Uncertain Multi-Version XML Document Probabilistic XML Encoding

Uncertain Multi-Version XML Document

Probabilistic XML Encoding

Compact representation of all possible versions (PSV) in Ω mapping Intuition: Represent PSV usingpropositional formulasof simple variables

b1. . .bm attached to nodes in a global treeT containing all provided data

A probabilistic XML encoding of <G,Ω>is a couple <G,Pc>with

Pc=<T,C(V),fie,Pr>is a probabilistic XML document

Thm1: JPcKdefines the same probability distribution overD as Ω, i.e.,

<G,JPcK>=<G,Ω>

(18)

Uncertain Multi-Version XML Document Updating Uncertain Multi-version XML documents

Uncertain Multi-Version XML Document

Updating Uncertain Versions (I)

Uncertaineditions in ∆ over nodes of a possible tree version

An update is an uncertain version control event defined based on a triple

<ei,ej,∆> (ei ∈G andej 6∈G) updOP(<ei,ej, ∆>)[<G,Ω>]

− G :=G ∪({ej}, {(ei, ej)})

− Extension of Ω to a Ω mapping with for each F ∈2(V\{e0})∪{ej}(F) = [Ω(F\ {ej})] ifej ∈F and Ω(F) = Ω(F) otherwise

(19)

Uncertain Multi-Version XML Document Updating Uncertain Multi-version XML documents

Uncertain Multi-Version XML Document

Updating Uncertain Versions (II)

updPrXML(<ei,ej,∆>)[<Pc,Ω>]

− G :=G ∪({ej}, {(ei, ej)})

− For all ins(x, i)in ∆

Set(x,fie(x)(ej))ifx already inPc

Insertx inPcandSet(x,(ej))otherwise

− For all del(x)in ∆

Set(x,fie(x)∧ ¬(ej))

Thm2: updOP(<ei,ej,∆>)updPrXML(<ei,ej,∆>) Thm3: updPrXMLruns inO(1) whilePcgrows linearly according to |∆|

(20)

Uncertain Multi-Version XML Document Performance Analysis

Uncertain Multi-version XML documents

Performance Analysis (Metrics, datasets and baseline)

Estimation of the commit time and checkout cost of the model

Baseline Systems

Versioning tools SubVersion and Git

Use of their Java implementations based on the APIs SvnKit and JGit

Real Datasets

History of commits over two large file systems (shared tree-structured data)

Linux kernel development

Cassandra project

Implementation of our system (PrXML) in Java

Measures are obtained with all accesses in RAM Disk

(21)

Uncertain Multi-Version XML Document Performance Analysis

Evaluation of the model

Performance Analysis (Commit time)

0 50 100 150 200 250 300 101

102 103 104

Commit (Linux kernel)

Committime(ms)

Subversion Git PrXML

0 200 400 600

101 102 103 104

Commit (Cassandra project)

Committime(ms)

Subversion Git PrXML

(22)

Uncertain Multi-Version XML Document Performance Analysis

Evaluation of the model

Performance Analysis (Checkout time)

0 50 100 150 200 250 300 100

200 300 400

Revision (Linux kernel)

Checkouttime(ms)

Subversion Git PrXML

0 200 400 600

100 200 300 400

Revision (Cassandra project)

Checkouttime(ms)

Subversion Git PrXML

(23)

Conclusion and Further work

Plan

Uncertain Tree-Structured Data Model Uncertain Multi-Version XML Document Conclusion and Further work

(24)

Conclusion and Further work

Conclusion and Further work (i)

Design of a probabilistic version control approach for uncertain tree-structured documents

Both logical description and an efficient probabilistic compact XML encoding

Set-up of the most used version control operation, i.e., update and with an efficient mapping algorithms

Both theoretical and practical complexity analysis of the proposed model

Extension of our model in [Ba et al.(DChanges, 2013)] with the merging operationover uncertain versions

(25)

Conclusion and Further work

Conclusion and Further work (ii)

Support of more complex versioning operations such as copying, renaming etc.

Study the impact of introducing some constraints over the order of nodes in uncertain version control

(26)

Conclusion and Further work

Merci!

(27)

Conclusion and Further work

References

Kerstin Altmanninger, Martina Seidl, and Manuel Wimmer,A survey on model versioning approaches, IJWIS5(2009).

M. Lamine Ba, Talel Abdessalem, and Pierre Senellart,Towards a version control model with uncertain data, PIKM, 2011.

,Merging uncertain multi-version XML documents, Proc. DChanges (Florence, Italy), sep 2013.

,Uncertain version control in open collaborative editing of tree-structured documents, Proc. DocEng (Florence, Italy), sep 2013.

Evgeny Kharlamov, Werner Nutt, and Pierre Senellart,Updating Probabilistic XML, Updates in XML, 2010.

Benny Kimelfeld and Pierre Senellart,Probabilistic XML: Models and complexity, Advances in Probabilistic Databases for Uncertain Information Management (Zongmin Ma and Li Yan, eds.), Springer-Verlag, 2013.

Al Koc and Abdullah Uz Tansel,A survey of version control systems, ICEME, 2011.

Références

Documents relatifs

In this framework relationships among users and items and their corresponding likelihood will be encoded in a uncertain graph that can then be used to infer the probability of

It enables concurrent editing of a single OWL file and also features notes on RUs, a change tracking log for RUs (such as class edits), a discussion thread and an instant

La déforestation permet d’installer des plantations d'huile de palme décime les habitats des orangs-outans et les force à rentrer en contact plus souvent avec les Hommes.. Privés

This solution is based on the assumption that the prototype of a virtual community (section 4.5.1) describes the best this virtual community. This method can therefore be used with

L EMMA 4.1. Given that, mergePrXML first updates G as specified in Section 4.2.1. The idea is that each possible merge outcome, which occurs when e ′ is valuated to true regardless

Including links in TWIST are used to re- alized documents structures, and refer- ence links are used to realize references between documents or parts of docu- ments. 5.2.4

Partant de l’étude d’un corpus de dialogues d’assistance à la recherche de documents (RD), nous définissons un modèle de la tâche de recherche collaborative de docu- ments dans

une source d’inspiration. Le contexte particulier des observatoires que nous étudions, qui est lié à une « entreprise collective » territoriale, est cependant