• Aucun résultat trouvé

Merging Uncertain Multi-Version XML Documents

N/A
N/A
Protected

Academic year: 2022

Partager "Merging Uncertain Multi-Version XML Documents"

Copied!
33
0
0

Texte intégral

(1)

Merging Uncertain Multi-Version XML Documents

M. Lamine BA, Talel Abdessalem & Pierre Senellart

ACM DocEng 2013 - 1st International Workshop on Document Changes (Florence, Italy)

September 10th, 2013

(2)

Merging feature: a need in open environments

Merging feature: a need in open environments

I Merging documents related to the same topic or sharing a large common part, e.g., Wikipedia articles

(3)

Merging feature: a need in open environments

I Merging documents related to the same topic or sharing a large common part, e.g., Wikipedia articles

I Recommend the outcome of the merging of contributions of the most trustworthy contributors

☛ Proposal of a merging operation over multi-version tree-structured documents with uncertain data

(4)

Usual Merging Process over Documents

Usual Merging Process over Documents

(5)

Usual Merging Process over Documents

(i) Change Detection (diff algorithm): {u1, u2} and{u3, u4}

u1:Update A to A0 u2:Insert C

u3:Remove A u4:Insert D

(6)

Usual Merging Process over Documents

Usual Merging Process over Documents

(i) Change Detection (diff algorithm): {u1, u2} and{u3, u4}

(ii) Three merge scenarios: {A’, B, C, D},{B, C, D} and{A, B, C, D}

u1:Update A to A0 u2:Insert C

u3:Remove A u4:Insert D

merge outcome

(7)

State-of-the-art XML Merging algorithms

I Diff algorithms, for instance [Lindholm et al., 2006], for documents having tree-like structure

I Two-way merging [Suzuki, 2002], [Ma et al., 2010] vs. Three-way merging [Lindholm, 2004], [Abdessalem et al., 2011]

I All deterministic approaches require human input in the presence of uncertainties, e.g. conflicts handling, for the merge outcome

I Probabilistic merging, proposed in [Ma et al., 2010], does not retain enough information for retrieving back individual merged versions

(8)

State-of-the-art XML Merging algorithms

Outline

(9)

Uncertain Tree-Structured Data

Probabilistic XML [Kimelfeld & Senellart.(2013)]

I Unordered, unranked, and labeledXML treeswith annotated edges

annotations are propositional formulasof random Boolean variables

c P) r

p1 e1e2

t1 p2

¬e2

t2 PrXMLfiep-Document d1) r

p1

t1 p2

t2 F11={e1}

d2) r

p1

t1 F21={e2} F22={e1,e2}

Pr(e1) = 0.2 Pr(e2) = 0.8

Pr(d1) =Pr(e1)×Pr(¬e2)

Pr(d2) = (Pr(¬e1)×Pr(e2)) + (Pr(e1)×Pr(e2))

Possible worlds and their probabilities

(10)

Uncertain Tree-structured Multi-version Documents

Uncertain Tree-Structured Data

Probabilistic XML [Kimelfeld & Senellart.(2013)]

I Unordered, unranked, and labeledXML treeswith annotated edges

annotations are propositional formulasof random Boolean variables

c P) r

p1 e1e2

t1 p2

¬e2

t2 PrXMLfiep-Document d1) r

p1

t1 p2

t2 F11={e1}

d2) r

p1

t1 F21={e2} F22={e1,e2}

Pr(e1) = 0.2 Pr(e2) = 0.8 Pr(d1) = 0.04 Pr(d2) = 0.80

Possible worlds and their probabilities

(11)

Uncertain Tree-Structured Data

Probabilistic XML [Kimelfeld & Senellart.(2013)]

I Unordered, unranked, and labeledXML treeswith annotated edges

annotations are propositional formulasof random Boolean variables

c P) r

p1 e1e2

t1 p2

¬e2

t2 PrXMLfiep-Document d1) r

p1

t1 p2

t2 F11={e1}

d2) r

p1

t1 F21={e2} F22={e1,e2}

Pr(e1) = 0.2 Pr(e2) = 0.8 Pr(d1) = 0.04 Pr(d2) = 0.80

Possible worlds and their probabilities

Enumerating all possible worlds and their probabilities

Enable also to model uncertain updates on (uncertain) nodes [Kharlamov et al.(2010)]

Integrate such a representation in a typical version control process

(12)

Uncertain Tree-structured Multi-version Documents

Uncertain Multi-Version XML Document

Uncertain Version Control Model

Defines two equivalent views over any uncertain multi-version XML tree

I set V of random variablese0,e1. . .en modeling the tree states

I infinite set D of all (unordered) XML trees including the versions

(G,Ω): Logical View

DAGG built on variables inV

Mapping Ω : 2V\{e0}D which computes the possible versions according to sets of valid events

(G,Pc): Probabilistic XML Encoding

Similar DAGG of random variables inV

Probabilistic XML treePcwhich defines the same probability distribution as Ω mapping

(13)

Uncertain Multi-Version XML Document

Uncertain Version Control Model (Example)

d2) person

name

“Obama”

origin

“Tanzania”

F2={e1, e2}

d4) person

name

“Obama

origin

“Tanzania”

function

“US. President”

F4={e1,e2, e4} d1) person

name

“Obama”

F1={e1}

d3) person

name

“B. Obama”

origin

“Kenya”

F3={e1, e3}

e2

e3

e4

G)

e2 e4 e0 e1

e3

(14)

Uncertain Tree-structured Multi-version Documents

Uncertain Multi-Version XML Document

Uncertain Version Control Model (Example)

d2) person

name

“Obama”

origin

“Tanzania”

F2={e1, e2}

d4) person

name

“Obama

origin

“Tanzania”

function

“US. President”

F4={e1,e2, e4} d1) person

name

“Obama”

F1={e1}

d3) person

name

“B. Obama”

origin

“Kenya”

F3={e1, e3}

d5) person

name

“B. Obama”

origin

“Kenya”

function

“US. President”

F5={e1, e3,e4}

e2

e3

e4

e4

G)

e2 e4 e0 e1

e3

(15)

Uncertain Multi-Version XML Document

Uncertain Version Control Model (Example)

d2) person

name

“Obama”

origin

“Tanzania”

F2={e1, e2}

d4) person

name

“Obama

origin

“Tanzania”

function

“US. President”

F4={e1,e2, e4} d1) person

name

“Obama”

F1={e1}

d3) person

name

“B. Obama”

origin

“Kenya”

F3={e1, e3}

d5) person

name

“B. Obama”

origin

“Kenya”

function

“US. President”

F5={e1, e3,e4}

e2

e3

e4

e4

G)

e2 e4 e0 e1

e3

c

P) person

name

e1

“Obama”

¬e3

“B. Obama”

e3

origin e2e3

“Tanzania”

¬e3

“Kenya”

e3

function e4

“US. President”

Encode uncertain changes over nodes with formulas on edges

(16)

Uncertain XML Merging Operation Edit Detection

Uncertain XML Merging Operation

Edit Detection

I Three-way procedurediff3() detecting node insertions and deletions based on diff2() sub-routines

diff3(T1,T2,Ta) =diff2(Ta,T1)∪diff2(Ta,T2)

I diff3() output is an edit script consisting of equivalent, conflicting and independent edits

(17)

Uncertain XML Merging Operation

Edit Detection

T1) r

s2

p2

t2 Ta) r

s1 p1

t1

T2) r

s1

p1

t1 p’1

t’1 u2:Delete the subtree s1

u4:Insert the subtree s2

u3:Insert the subtree p01 at the node s1

diff2(Ta,T1) ={u2,u4} diff2(Ta,T2) ={u3}

diff3(T1,T2,Ta) ={u2,u4,u3}

u2and u3are conflicting edits u4is an independent edit s1 is a conflicting node

(18)

Uncertain XML Merging Operation Edit Detection

Uncertain XML Merging Operation

Edit Detection

I Three-way procedurediff3() detecting node insertions and deletions based on diff2() sub-routines

diff3(T1,T2,Ta) =diff2(Ta,T1)∪diff2(Ta,T2)

I diff3() output is an edit script consisting of equivalent, conflicting and independent edits

IC is considered as the restriction ofdiff3() to the set of conflicting edits (over conflicting nodes) for the merging

(19)

Uncertain XML Merging Operation

Formal Definition(I)

Given the triple (e1,e2,e0), an uncertain merge operation is formalized as MRGe

1,e2,e0

I eventse1 ande2 identify the two (uncertain) versions to be merged

I e0 (a merge event) is a new event assessing the amount of uncertainty in the merge

An uncertain merging operation MRGe

1,e2,e0 onTmv maps in a logic sense to the formula below

MRGe

1,e2,e0(Tmv) := (G ∪({e0},{(e1,e0),(e2,e0)}), Ω0).

(20)

Uncertain XML Merging Operation Formal Definition

Uncertain XML Merging Operation

Formal Definition(II)

Let Ae

1={e |e ∈G, e ≺e1},Ae

2 ={e |e ∈G, e ≺e2}, As=Ae

1 ∩Ae

2

For all subset F ∈2V∪{e0}, Ω0 is computed based on Ω mapping as follows

I ife06∈F: Ω0(F) := Ω(F);

I if{e1,e2,e0} ⊆F: Ω0(F) := Ω(F\ {e0});

I if{e1,e0} ⊆Fe26∈F: Ω0(F) := [Ω((F \ {e0})\(Ae2\As))]2−∆C; I if{e2,e0} ⊆Fe16∈F: Ω0(F) := [Ω((F \ {e0})\(Ae

1\As))]1−∆C; I if{e1,e2} ∩F =∅ ∧e0F:

0(F) := [Ω((F\ {e0})\((Ae

1\As)(Ae

2\As)))]3−∆C;

th

(21)

Uncertain XML Merging Operation

Formal Definition(II)

T1) r

s2

p2

t2 {e1,e2,e4} Ta) r

s1

p1

t1 {e1}

T2) r

s1

p1

t1 p’1

t’1 {e1,e3} u2:Delete the subtree s1

u4:Insert the subtree s2

u3:Insert the subtree p01 at the node s1

(22)

Uncertain XML Merging Operation Formal Definition

Uncertain XML Merging Operation

Formal Definition(II)

T1) r

s2

p2

t2 {e1,e2,e4} Ta) r

s1

p1

t1 {e1}

MRGe3,e4,e0

r s2 p2 t2 {e0} T2) r

s1

p1

t1 p’1

t’1 {e1,e3} u2:Delete the subtree s1

u4:Insert the subtree s2

u3:Insert the subtree p01 at the node s1

(evente’)

(23)

Uncertain XML Merging Operation

Formal Definition(II)

T1) r

s2

p2

t2 {e1,e2,e4} Ta) r

s1

p1

t1 {e1}

MRGe3,e4,e0

r s2 p2 t2 {e0}

r

s2

p2

t2 F={e1,e2,e4,e0}

0(F) = [T1]{∅}

T2) r

s1

p1

t1 p’1

t’1 {e1 e3} u2:Delete the subtree s1

u4:Insert the subtree s2

u3:Insert the subtree p01 at the node s1

(evente’) Propagateu2

(24)

Uncertain XML Merging Operation Formal Definition

Uncertain XML Merging Operation

Formal Definition(II)

T1) r

s2

p2

t2 {e1,e2,e4} Ta) r

s1

p1

t1 {e1}

MRGe3,e4,e0

r s2 p2 t2 {e0}

r

s1

p1

t1 p’1

t’1 s2

p2

t2 F={e1,e3,e0} 0(F) = [T2]{u4} T2) r

s1

p1

t1 p’1

t’1 {e1 e3} u2:Delete the subtree s1

u4:Insert the subtree s2

u3:Insert the subtree p01 at the node s1

(evente’) Propagateu3

(25)

Uncertain XML Merging Operation

Formal Definition(II)

T1) r

s2

p2

t2 {e1,e2,e4} Ta) r

s1

p1

t1 {e1}

MRGe3,e4,e0

r s2 p2 t2 {e0}

r

s1

p1

t1 s2

p2

t2 F={e1,e0} 0(F) = [Ta]{u4} T2) r

s1

p1

t1 p’1

t’1 {e1 e3} u2:Delete the subtree s1

u4:Insert the subtree s2

u3:Insert the subtree p01 at the node s1

(evente’) Rejectu2andu3

(26)

Merging over Probabilistic XML Conflicting and Non-conflicting Nodes

Merging Probabilistic XML (MergePrXML)

Conflicting and Non-conflicting Nodes

I MergePrXML distinguishes conflicting nodes to non-conflicting ones Under the probabilistic XML EncodingTcmv, a givenx in Pcis a conflicted node with respect to MRGe

1,e2,e0 when its lineagefie(x) is such that –

1. fie(x)|=νs;

2. fie(x)6|=ν1(orfie(x)6|=ν2) and;

3. ∃yPc,desc(x, y): fie(y)6|=νs andfie(y)|=ν2(orfie(y)|=ν1)

I fie(x) =V

z∈Pc,zx(fie(z)) and νs1, ν2 are valuations over As, Ae

1,Ae

2 respectively

I MergePrXML implements MRGe

1,e2,e0 as an update operation inPc that only modifies formulas of non-conflicting nodes of the merge

(27)

Merging Probabilistic XML (MergePrXML)

Merge Algorithm (I)

Input: (G,Pc), e1,e2,e0

Output: Merging Uncertain XML Versions in Tcmv G :=G ∪({e0},{(e1,e0),(e2,e0)});

foreach non-conflicted node x inPc\Pc|C

{e1, e2} do replace(fie(x), e1,(e1∨e0));

replace(fie(x), e2,(e2∨e0));

return(G,Pc)

✉ MergePrXML performs the merge in time proportional to the size of the formulas of nodes impacted by the updates in merged branches

(28)

Merging over Probabilistic XML Merge Algorithm

Merging Probabilistic XML (MergePrXML)

Merge Algorithm (II)

c

P) r

s1

e1∧ ¬e2

p1

t1

p’1

e3

t’1

s2

e4

p2

t2

(29)

Merging Probabilistic XML (MergePrXML)

Merge Algorithm (II)

c

P) r

s1

e1∧ ¬e2

p1

t1

p’1

e3

t’1

s2

e4

p2

t2

MRGe

3,e4,e0

(30)

Merging over Probabilistic XML Merge Algorithm

Merging Probabilistic XML (MergePrXML)

Merge Algorithm (II)

c

P) r

s1

e1∧ ¬e2

p1

t1

p’1

e3

t’1

s2

e4

p2

t2

c

P0) r

s1

e1∧ ¬e2

p1

t1

p’1

e3

t’1

s2

e4e0

p2

t2

MRGe

3,e4,e0

(31)

Conclusion and Further Works

I Merging operation over tree-structured multi-version documents with uncertain data

I implementation of the common deterministic merge scenarios

I modelling of the amount of uncertainty in the merged versions

I Efficient merging algorithm over Probabilistic XML encoding

(32)

Conclusion and Further Works

References I

Abdessalem, T., Ba, M. L., and Senellart, P. (2011).

A probabilistic XML merging tool.

InProc. EDBT.

Ba, M. L., Abdessalem, T., and Senellart, P. (2011).

Towards a version control model with uncertain data.

InPIKM.

Ba, M. L., Abdessalem, T., and Senellart, P. (2013).

Uncertain version control in open collaborative editing of tree-structured documents.

InProc. DocEng.

Kharlamov, E., Nutt, W., and Senellart, P. (2010).

Updating Probabilistic XML.

InProc. Updates in XML.

Kimelfeld, B. and Senellart, P. (2013).

Probabilistic XML: Models and Complexity.

InAdvances in Probabilistic Databases for Uncertain Information Management. Springer-Verlag.

Lindholm, T. (2004).

A three-way merge for XML documents.

InProc. DocEng.

Lindholm, T., Kangasharju, J., and Tarkoma, S. (2006).

Fast and simple XML tree differencing by sequence alignment.

InProc. DocEng.

(33)

References II

Ma, J., Liu, W., Hunter, A., and Zhang, W. (2010).

An XML based framework for merging incomplete and inconsistent statistical information from clinical trials.

In Ma, Z. and Yan, L., editors,Software Computing in XML Data Management. Springer-Verlag.

Suzuki, N. (2002).

A Structural Merging Algorithm for XML Documents.

InProc. ICWI.

Références

Documents relatifs

Since the tree merging operations we use for obtaining a probabilis- tic document can be seen as a succession of updates, we focus here on global distributional nodes, and, more

Talel Abdessalem Mouhamadou Lamine BA Pierre Senellart T´ el´ ecom ParisTech Universit´ e Cheikh Anta DIOP T´ el´ ecom ParisTech. Paris, France Dakar, Senegal

A dierent union record is created for each dierent type of document included in the cluster; thus, a particular cluster may have union records for a journal article, a

In this section we investigate the compatibility of the rationalization-driven merging operators introduced in the previous section, i.e., the expansion-based, revision-based,

In order to avoid the latter case, we are interested in defining consensus opera- tors, i.e., merging operators such that the merged base C that is generated satisfies the

In order to fill the gap, we investigate possi- ble translations in a belief merging framework of some egalitarian properties and concepts coming from social choice theory, such

Contrastingly, the compositional merging operators defined in this paper are based on several merging operators and offer better merging

This calls for new disjunctive merging operators satisfying as many (IC) postulates as possible, and more generally, performing better than formula-based operators with respect to