Merging Uncertain Multi-Version XML Documents
M. Lamine BA, Talel Abdessalem & Pierre Senellart
ACM DocEng 2013 - 1st International Workshop on Document Changes (Florence, Italy)
September 10th, 2013
Merging feature: a need in open environments
Merging feature: a need in open environments
I Merging documents related to the same topic or sharing a large common part, e.g., Wikipedia articles
Merging feature: a need in open environments
I Merging documents related to the same topic or sharing a large common part, e.g., Wikipedia articles
I Recommend the outcome of the merging of contributions of the most trustworthy contributors
☛ Proposal of a merging operation over multi-version tree-structured documents with uncertain data
Usual Merging Process over Documents
Usual Merging Process over Documents
Usual Merging Process over Documents
(i) Change Detection (diff algorithm): {u1, u2} and{u3, u4}
u1:Update A to A0 u2:Insert C
u3:Remove A u4:Insert D
Usual Merging Process over Documents
Usual Merging Process over Documents
(i) Change Detection (diff algorithm): {u1, u2} and{u3, u4}
(ii) Three merge scenarios: {A’, B, C, D},{B, C, D} and{A, B, C, D}
u1:Update A to A0 u2:Insert C
u3:Remove A u4:Insert D
merge outcome
State-of-the-art XML Merging algorithms
I Diff algorithms, for instance [Lindholm et al., 2006], for documents having tree-like structure
I Two-way merging [Suzuki, 2002], [Ma et al., 2010] vs. Three-way merging [Lindholm, 2004], [Abdessalem et al., 2011]
I All deterministic approaches require human input in the presence of uncertainties, e.g. conflicts handling, for the merge outcome
I Probabilistic merging, proposed in [Ma et al., 2010], does not retain enough information for retrieving back individual merged versions
State-of-the-art XML Merging algorithms
Outline
Uncertain Tree-Structured Data
Probabilistic XML [Kimelfeld & Senellart.(2013)]
I Unordered, unranked, and labeledXML treeswith annotated edges
− annotations are propositional formulasof random Boolean variables
c P) r
p1 e1∨e2
t1 p2
¬e2
t2 PrXMLfiep-Document d1) r
p1
t1 p2
t2 F11={e1}
d2) r
p1
t1 F21={e2} F22={e1,e2}
Pr(e1) = 0.2 Pr(e2) = 0.8
Pr(d1) =Pr(e1)×Pr(¬e2)
Pr(d2) = (Pr(¬e1)×Pr(e2)) + (Pr(e1)×Pr(e2))
Possible worlds and their probabilities
Uncertain Tree-structured Multi-version Documents
Uncertain Tree-Structured Data
Probabilistic XML [Kimelfeld & Senellart.(2013)]
I Unordered, unranked, and labeledXML treeswith annotated edges
− annotations are propositional formulasof random Boolean variables
c P) r
p1 e1∨e2
t1 p2
¬e2
t2 PrXMLfiep-Document d1) r
p1
t1 p2
t2 F11={e1}
d2) r
p1
t1 F21={e2} F22={e1,e2}
Pr(e1) = 0.2 Pr(e2) = 0.8 Pr(d1) = 0.04 Pr(d2) = 0.80
Possible worlds and their probabilities
Uncertain Tree-Structured Data
Probabilistic XML [Kimelfeld & Senellart.(2013)]
I Unordered, unranked, and labeledXML treeswith annotated edges
− annotations are propositional formulasof random Boolean variables
c P) r
p1 e1∨e2
t1 p2
¬e2
t2 PrXMLfiep-Document d1) r
p1
t1 p2
t2 F11={e1}
d2) r
p1
t1 F21={e2} F22={e1,e2}
Pr(e1) = 0.2 Pr(e2) = 0.8 Pr(d1) = 0.04 Pr(d2) = 0.80
Possible worlds and their probabilities
− Enumerating all possible worlds and their probabilities
− Enable also to model uncertain updates on (uncertain) nodes [Kharlamov et al.(2010)]
☞ Integrate such a representation in a typical version control process
Uncertain Tree-structured Multi-version Documents
Uncertain Multi-Version XML Document
Uncertain Version Control Model
Defines two equivalent views over any uncertain multi-version XML tree
I set V of random variablese0,e1. . .en modeling the tree states
I infinite set D of all (unordered) XML trees including the versions
(G,Ω): Logical View
• DAGG built on variables inV
• Mapping Ω : 2V\{e0}→D which computes the possible versions according to sets of valid events
(G,Pc): Probabilistic XML Encoding
• Similar DAGG of random variables inV
• Probabilistic XML treePcwhich defines the same probability distribution as Ω mapping
Uncertain Multi-Version XML Document
Uncertain Version Control Model (Example)
d2) person
name
“Obama”
origin
“Tanzania”
F2={e1, e2}
d4) person
name
“Obama
origin
“Tanzania”
function
“US. President”
F4={e1,e2, e4} d1) person
name
“Obama”
F1={e1}
d3) person
name
“B. Obama”
origin
“Kenya”
F3={e1, e3}
e2
e3
e4
G)
e2 e4 e0 e1
e3
Uncertain Tree-structured Multi-version Documents
Uncertain Multi-Version XML Document
Uncertain Version Control Model (Example)
d2) person
name
“Obama”
origin
“Tanzania”
F2={e1, e2}
d4) person
name
“Obama
origin
“Tanzania”
function
“US. President”
F4={e1,e2, e4} d1) person
name
“Obama”
F1={e1}
d3) person
name
“B. Obama”
origin
“Kenya”
F3={e1, e3}
d5) person
name
“B. Obama”
origin
“Kenya”
function
“US. President”
F5={e1, e3,e4}
e2
e3
e4
e4
G)
e2 e4 e0 e1
e3
Uncertain Multi-Version XML Document
Uncertain Version Control Model (Example)
d2) person
name
“Obama”
origin
“Tanzania”
F2={e1, e2}
d4) person
name
“Obama
origin
“Tanzania”
function
“US. President”
F4={e1,e2, e4} d1) person
name
“Obama”
F1={e1}
d3) person
name
“B. Obama”
origin
“Kenya”
F3={e1, e3}
d5) person
name
“B. Obama”
origin
“Kenya”
function
“US. President”
F5={e1, e3,e4}
e2
e3
e4
e4
G)
e2 e4 e0 e1
e3
c
P) person
name
e1
“Obama”
¬e3
“B. Obama”
e3
origin e2∨e3
“Tanzania”
¬e3
“Kenya”
e3
function e4
“US. President”
Encode uncertain changes over nodes with formulas on edges
Uncertain XML Merging Operation Edit Detection
Uncertain XML Merging Operation
Edit Detection
I Three-way procedurediff3() detecting node insertions and deletions based on diff2() sub-routines
diff3(T1,T2,Ta) =diff2(Ta,T1)∪diff2(Ta,T2)
I diff3() output is an edit script consisting of equivalent, conflicting and independent edits
Uncertain XML Merging Operation
Edit Detection
T1) r
s2
p2
t2 Ta) r
s1 p1
t1
T2) r
s1
p1
t1 p’1
t’1 u2:Delete the subtree s1
u4:Insert the subtree s2
u3:Insert the subtree p01 at the node s1
diff2(Ta,T1) ={u2,u4} diff2(Ta,T2) ={u3}
diff3(T1,T2,Ta) ={u2,u4,u3}
u2and u3are conflicting edits u4is an independent edit s1 is a conflicting node
Uncertain XML Merging Operation Edit Detection
Uncertain XML Merging Operation
Edit Detection
I Three-way procedurediff3() detecting node insertions and deletions based on diff2() sub-routines
diff3(T1,T2,Ta) =diff2(Ta,T1)∪diff2(Ta,T2)
I diff3() output is an edit script consisting of equivalent, conflicting and independent edits
I ∆C is considered as the restriction ofdiff3() to the set of conflicting edits (over conflicting nodes) for the merging
Uncertain XML Merging Operation
Formal Definition(I)
Given the triple (e1,e2,e0), an uncertain merge operation is formalized as MRGe
1,e2,e0
I eventse1 ande2 identify the two (uncertain) versions to be merged
I e0 (a merge event) is a new event assessing the amount of uncertainty in the merge
An uncertain merging operation MRGe
1,e2,e0 onTmv maps in a logic sense to the formula below
MRGe
1,e2,e0(Tmv) := (G ∪({e0},{(e1,e0),(e2,e0)}), Ω0).
Uncertain XML Merging Operation Formal Definition
Uncertain XML Merging Operation
Formal Definition(II)
Let Ae
1={e |e ∈G, e ≺e1},Ae
2 ={e |e ∈G, e ≺e2}, As=Ae
1 ∩Ae
2
For all subset F ∈2V∪{e0}, Ω0 is computed based on Ω mapping as follows
I ife06∈F: Ω0(F) := Ω(F);
I if{e1,e2,e0} ⊆F: Ω0(F) := Ω(F\ {e0});
I if{e1,e0} ⊆F∧e26∈F: Ω0(F) := [Ω((F \ {e0})\(Ae2\As))]∆2−∆C; I if{e2,e0} ⊆F∧e16∈F: Ω0(F) := [Ω((F \ {e0})\(Ae
1\As))]∆1−∆C; I if{e1,e2} ∩F =∅ ∧e0∈F:
Ω0(F) := [Ω((F\ {e0})\((Ae
1\As)∪(Ae
2\As)))]∆3−∆C;
th
Uncertain XML Merging Operation
Formal Definition(II)
T1) r
s2
p2
t2 {e1,e2,e4} Ta) r
s1
p1
t1 {e1}
T2) r
s1
p1
t1 p’1
t’1 {e1,e3} u2:Delete the subtree s1
u4:Insert the subtree s2
u3:Insert the subtree p01 at the node s1
Uncertain XML Merging Operation Formal Definition
Uncertain XML Merging Operation
Formal Definition(II)
T1) r
s2
p2
t2 {e1,e2,e4} Ta) r
s1
p1
t1 {e1}
MRGe3,e4,e0
r s2 p2 t2 {e0} T2) r
s1
p1
t1 p’1
t’1 {e1,e3} u2:Delete the subtree s1
u4:Insert the subtree s2
u3:Insert the subtree p01 at the node s1
(evente’)
Uncertain XML Merging Operation
Formal Definition(II)
T1) r
s2
p2
t2 {e1,e2,e4} Ta) r
s1
p1
t1 {e1}
MRGe3,e4,e0
r s2 p2 t2 {e0}
r
s2
p2
t2 F={e1,e2,e4,e0}
Ω0(F) = [T1]{∅}
T2) r
s1
p1
t1 p’1
t’1 {e1 e3} u2:Delete the subtree s1
u4:Insert the subtree s2
u3:Insert the subtree p01 at the node s1
(evente’) Propagateu2
Uncertain XML Merging Operation Formal Definition
Uncertain XML Merging Operation
Formal Definition(II)
T1) r
s2
p2
t2 {e1,e2,e4} Ta) r
s1
p1
t1 {e1}
MRGe3,e4,e0
r s2 p2 t2 {e0}
r
s1
p1
t1 p’1
t’1 s2
p2
t2 F={e1,e3,e0} Ω0(F) = [T2]{u4} T2) r
s1
p1
t1 p’1
t’1 {e1 e3} u2:Delete the subtree s1
u4:Insert the subtree s2
u3:Insert the subtree p01 at the node s1
(evente’) Propagateu3
Uncertain XML Merging Operation
Formal Definition(II)
T1) r
s2
p2
t2 {e1,e2,e4} Ta) r
s1
p1
t1 {e1}
MRGe3,e4,e0
r s2 p2 t2 {e0}
r
s1
p1
t1 s2
p2
t2 F={e1,e0} Ω0(F) = [Ta]{u4} T2) r
s1
p1
t1 p’1
t’1 {e1 e3} u2:Delete the subtree s1
u4:Insert the subtree s2
u3:Insert the subtree p01 at the node s1
(evente’) Rejectu2andu3
Merging over Probabilistic XML Conflicting and Non-conflicting Nodes
Merging Probabilistic XML (MergePrXML)
Conflicting and Non-conflicting Nodes
I MergePrXML distinguishes conflicting nodes to non-conflicting ones Under the probabilistic XML EncodingTcmv, a givenx in Pcis a conflicted node with respect to MRGe
1,e2,e0 when its lineagefie↑(x) is such that –
1. fie↑(x)|=νs;
2. fie↑(x)6|=ν1(orfie↑(x)6|=ν2) and;
3. ∃y∈Pc,desc(x, y): fie↑(y)6|=νs andfie↑(y)|=ν2(orfie↑(y)|=ν1)
I fie↑(x) =V
z∈Pc,zx(fie(z)) and νs,ν1, ν2 are valuations over As, Ae
1,Ae
2 respectively
I MergePrXML implements MRGe
1,e2,e0 as an update operation inPc that only modifies formulas of non-conflicting nodes of the merge
Merging Probabilistic XML (MergePrXML)
Merge Algorithm (I)
Input: (G,Pc), e1,e2,e0
Output: Merging Uncertain XML Versions in Tcmv G :=G ∪({e0},{(e1,e0),(e2,e0)});
foreach non-conflicted node x inPc\Pc|C
{e1, e2} do replace(fie(x), e1,(e1∨e0));
replace(fie(x), e2,(e2∨e0));
return(G,Pc)
✉ MergePrXML performs the merge in time proportional to the size of the formulas of nodes impacted by the updates in merged branches
Merging over Probabilistic XML Merge Algorithm
Merging Probabilistic XML (MergePrXML)
Merge Algorithm (II)
c
P) r
s1
e1∧ ¬e2
p1
t1
p’1
e3
t’1
s2
e4
p2
t2
Merging Probabilistic XML (MergePrXML)
Merge Algorithm (II)
c
P) r
s1
e1∧ ¬e2
p1
t1
p’1
e3
t’1
s2
e4
p2
t2
MRGe
3,e4,e0
Merging over Probabilistic XML Merge Algorithm
Merging Probabilistic XML (MergePrXML)
Merge Algorithm (II)
c
P) r
s1
e1∧ ¬e2
p1
t1
p’1
e3
t’1
s2
e4
p2
t2
c
P0) r
s1
e1∧ ¬e2
p1
t1
p’1
e3
t’1
s2
e4∨e0
p2
t2
MRGe
3,e4,e0
Conclusion and Further Works
I Merging operation over tree-structured multi-version documents with uncertain data
I implementation of the common deterministic merge scenarios
I modelling of the amount of uncertainty in the merged versions
I Efficient merging algorithm over Probabilistic XML encoding
Conclusion and Further Works
References I
Abdessalem, T., Ba, M. L., and Senellart, P. (2011).
A probabilistic XML merging tool.
InProc. EDBT.
Ba, M. L., Abdessalem, T., and Senellart, P. (2011).
Towards a version control model with uncertain data.
InPIKM.
Ba, M. L., Abdessalem, T., and Senellart, P. (2013).
Uncertain version control in open collaborative editing of tree-structured documents.
InProc. DocEng.
Kharlamov, E., Nutt, W., and Senellart, P. (2010).
Updating Probabilistic XML.
InProc. Updates in XML.
Kimelfeld, B. and Senellart, P. (2013).
Probabilistic XML: Models and Complexity.
InAdvances in Probabilistic Databases for Uncertain Information Management. Springer-Verlag.
Lindholm, T. (2004).
A three-way merge for XML documents.
InProc. DocEng.
Lindholm, T., Kangasharju, J., and Tarkoma, S. (2006).
Fast and simple XML tree differencing by sequence alignment.
InProc. DocEng.
References II
Ma, J., Liu, W., Hunter, A., and Zhang, W. (2010).
An XML based framework for merging incomplete and inconsistent statistical information from clinical trials.
In Ma, Z. and Yan, L., editors,Software Computing in XML Data Management. Springer-Verlag.
Suzuki, N. (2002).
A Structural Merging Algorithm for XML Documents.
InProc. ICWI.