Inverting Schema Mappings
(presenting works by Fagin, Kolaitis, Popa, Tan)
Dmitri Akatov, Pierre Senellart
Oxford Computing Laboratory
Data Exchange
Outline
1 Data Exchange
   Schema Mappings
   Dependencies
   Solutions of Schema Mappings
2 Composing Schema Mappings
3 Inverses
4 Quasi-Inverses
5 Conclusion
Data Exchange: Schema Mappings
Motivation
Data exchange is the problem of transforming a database (the source instance) into another database (the target instance) under a possibly different schema, while satisfying a set of conditions (the dependencies) relating the source and target instances.
The target instance should reflect the information in the source instance as accurately as possible.
The target instance should be as general as possible, introducing no more information than the source contains.
The “oldest” problem in database theory, and still an active research area.
This talk concentrates on relational databases.
Definitions and Example
Definition
A schema mapping is a triple
M = (S, T, Σ)
where S is the source schema, T is the target schema (S and T have disjoint sets of relation symbols) and Σ is a set of formulae, in some logical formalism over ⟨S, T⟩, that defines the schema mapping M.
Example
Let S = ⟨EDL⟩, T = ⟨ED, DL⟩. Let
Σ = {∀x,y,z (EDL(x,y,z) → ED(x,y) ∧ DL(x,y))}.
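To make the example concrete, here is a minimal Python sketch that materialises a target instance from a source instance under the single full tgd above. The tuples and the function name `apply_sigma` are invented for illustration; relations are modelled simply as sets of tuples.

```python
# Hypothetical source instance over S = <EDL>; the tuples are invented.
edl = {("alice", "sales", "paris"), ("bob", "it", "oxford")}

def apply_sigma(edl_tuples):
    """Materialise ED and DL from EDL according to the single full tgd in Σ."""
    ed = {(x, y) for (x, y, _z) in edl_tuples}
    dl = {(x, y) for (x, y, _z) in edl_tuples}  # DL(x,y), exactly as Σ is written
    return {"ED": ed, "DL": dl}

target = apply_sigma(edl)
```

Because the tgd is full (no existential quantifier), no labelled nulls are needed and the target is determined by the source.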
Data Exchange: Dependencies
Source-Target Conditions vs Target Conditions
We are interested in Σ being a finite set of tuple-generating dependencies (tgds) and equality-generating dependencies (egds), in particular:
Source-to-target (s-t) tgds: FO formulae of the form
∀x (φ(x) → ∃y ψ(x,y))
where φ is a conjunction of atoms over S and ψ is a conjunction of atoms over T.
Target tgds: FO formulae of the same form where both φ and ψ are conjunctions of atoms over T.
Target egds: FO formulae of the form ∀x (φ(x) → xi = xj) where φ is a conjunction of atoms over T and xi, xj are among the variables in x.
Full tgds: s-t tgds without ∃.
Notation: universal quantifiers are usually omitted; existential quantifiers are always written out.
Global-as-View and Local-as-View Contexts
Some special forms of tgds come up particularly in data integration:
Local-as-view (LAV) mappings are schema mappings in which Σ is a set of s-t tgds, in each of which φ (the left-hand side of the tgd) is a single atom.
Global-as-view (GAV) mappings are schema mappings in which Σ is a set of s-t tgds, in each of which ψ (the right-hand side of the tgd) is a single atom.
LAV mappings have some nice invertibility properties.
Data Exchange: Solutions of Schema Mappings
Definition
Definition
Given a schema mapping M = (S, T, Σ) and a source instance I of S, a solution for I is an instance J of T such that ⟨I, J⟩ ⊨ Σ. Sol(I) denotes the set of all solutions for I.
We are interested in the most general solution:
Definition
A universal solution J for I is a solution such that for every solution J′ there exists a homomorphism h: J → J′.
Theorem
If Σ is a finite set of s-t tgds, then every source instance has a universal solution.
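The homomorphism condition in the definition of a universal solution can be checked by brute force on small instances. The sketch below is an illustration under assumptions that are not in the slides: facts are `(relation, tuple)` pairs, labelled nulls are strings starting with `_`, and a homomorphism must keep constants fixed while mapping nulls to arbitrary values.

```python
from itertools import product

def is_null(v):
    """Convention assumed here: labelled nulls are strings starting with '_'."""
    return isinstance(v, str) and v.startswith("_")

def find_homomorphism(J, J2):
    """Return a map from the nulls of J to values of J2 that sends every
    fact of J to a fact of J2, or None if no such map exists."""
    nulls = sorted({v for (_, args) in J for v in args if is_null(v)})
    values = sorted({v for (_, args) in J2 for v in args})
    for choice in product(values, repeat=len(nulls)):
        h = dict(zip(nulls, choice))
        image = {(r, tuple(h.get(v, v) for v in args)) for (r, args) in J}
        if image <= J2:
            return h
    return None

general = {("R", ("a", "_N1"))}   # contains a labelled null
specific = {("R", ("a", "b"))}    # only constants
forward = find_homomorphism(general, specific)
backward = find_homomorphism(specific, general)
```

Here `general` maps into `specific` but not the other way round, which is the sense in which an instance with nulls is "more general". The search is exponential in the number of nulls, so this is only suitable for toy instances.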
Computing Universal Solutions (Outline)
To compute a universal solution we chase ⟨I, ∅⟩ with Σ. Writing K for the current instance, each chase step does the following:
Select a dependency in Σ (a tgd or an egd).
Find a homomorphism from φ(x) to K.
For a tgd: extend the homomorphism by assigning a fresh labelled null to each variable in y, and add the image of ψ to K.
For an egd: assume xi, xj have distinct images in K. If both images are constants, the chase ends in failure. Otherwise we replace a labelled null by a constant, or one labelled null by the other.
Repeat until no dependency in Σ can be applied.
Theorem
For tgds and egds, if ⟨I, J⟩ is the result of a successful finite chase, then J is a universal solution. If there exists a failing finite chase, then there is no solution.
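The chase steps for tgds can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of the cited papers: it handles s-t tgds only (no egds, no target tgds), so a single pass over Σ suffices. The representation (atoms as `(relation, variables)` pairs, nulls named `_N1`, `_N2`, ...) and the `Emp`/`Mgr` example are assumptions made for the sketch.

```python
from itertools import count

def matches(atoms, facts, h=None):
    """Yield every variable assignment extending h that maps all atoms into facts."""
    h = h or {}
    if not atoms:
        yield dict(h)
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for (frel, fargs) in facts:
        if frel != rel or len(fargs) != len(args):
            continue
        h2 = dict(h)
        if all(h2.setdefault(a, v) == v for a, v in zip(args, fargs)):
            yield from matches(rest, facts, h2)

def chase(source, tgds):
    """Chase <source, ∅> with s-t tgds and return the resulting target instance."""
    target, fresh = set(), count(1)
    for body, head in tgds:
        for h in matches(body, source):
            # The step applies only if the head is not already satisfied.
            if any(True for _ in matches(head, target, h)):
                continue
            for _, args in head:
                for a in args:
                    if a not in h:               # existential variable
                        h[a] = f"_N{next(fresh)}"  # fresh labelled null
            target |= {(rel, tuple(h[a] for a in args)) for rel, args in head}
    return target

# Hypothetical tgd: Emp(x) -> ∃y Mgr(x, y)
tgds = [([("Emp", ("x",))], [("Mgr", ("x", "y"))])]
result = chase({("Emp", ("alice",))}, tgds)
```

With several source tuples the numbering of the nulls depends on iteration order, but the results are isomorphic.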
Canonical Universal Solutions
Definition
We call such a J (if it exists) a canonical universal solution. If it is unique (up to isomorphism), we call it the canonical universal solution.
However...
Example
Canonical solutions are not unique in general: let S = ⟨P, Q⟩, T = ⟨R⟩.
Let Σ = {P(x) → R(x), Q(x) → ∃Y R(Y)} and I = {P(a), Q(a)}.
The result of a chase is either {R(a)} or {R(Y), R(a)}.
Canonical solutions may also fail to exist, when there is no finite chase (cyclic dependencies).
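The order dependence in this example can be replayed directly. The sketch below hard-codes the two tgds of Σ and the instance I = {P(a), Q(a)}; the names `copy` and `invent` and the null name `_Y` are labels chosen for illustration.

```python
def chase_in_order(order):
    """Chase I = {P(a), Q(a)}, applying the two tgds of Σ in the given order."""
    source = {("P", "a"), ("Q", "a")}
    target = set()
    for name in order:
        if name == "copy":        # P(x) -> R(x)
            for rel, x in sorted(source):
                if rel == "P":
                    target.add(("R", x))
        elif name == "invent":    # Q(x) -> ∃Y R(Y)
            for rel, _x in sorted(source):
                # The step applies only if no R-fact witnesses ∃Y R(Y) yet.
                if rel == "Q" and not any(r == "R" for r, _ in target):
                    target.add(("R", "_Y"))
    return target

copy_first = chase_in_order(["copy", "invent"])
invent_first = chase_in_order(["invent", "copy"])
```

Applying `copy` first makes the second tgd already satisfied, giving {R(a)}; applying `invent` first introduces the null before R(a) is added, giving {R(Y), R(a)}.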
Weakly acyclic tgds and polynomial-length chase
We want to know when computing canonical universal solutions is feasible.
Using only full tgds ensures that the chase is always finite and always has the same result. However, full tgds are too restrictive in practice.
We define special sets of tgds, called weakly acyclic sets of tgds, which strictly include sets of full tgds as well as acyclic sets of inclusion dependencies.
A set of tgds is weakly acyclic if no cycle in its dependency graph goes through a special edge (an edge representing an existentially quantified variable).
Theorem
Let Σ be the union of a weakly acyclic set of tgds and a set of egds. Then there exists a polynomial in the size of K that bounds the length of every chase of K with Σ.
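A weak-acyclicity test can be sketched as follows. The slide does not spell out the dependency graph, so the construction here follows the usual definition as I understand it: nodes are positions (relation, index); a universally quantified variable occurring in both body and head contributes a regular edge from each of its body positions to each of its head positions, and a special edge from each of its body positions to every position of an existential variable in the head. The representation of tgds is an assumption of the sketch.

```python
def weakly_acyclic(tgds):
    """tgds: list of (body, head); atoms are (relation, tuple_of_variables)."""
    edges = set()  # (source_position, target_position, is_special)
    for body, head in tgds:
        body_vars = {v for _, args in body for v in args}
        head_vars = {v for _, args in head for v in args}
        for rel, args in body:
            for i, x in enumerate(args):
                if x not in head_vars:
                    continue  # x is not propagated to the head
                for hrel, hargs in head:
                    for j, y in enumerate(hargs):
                        if y == x:
                            edges.add(((rel, i), (hrel, j), False))
                        elif y not in body_vars:  # y is existential
                            edges.add(((rel, i), (hrel, j), True))

    def reachable(start, goal):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(d for s, d, _ in edges if s == node)
        return False

    # A special edge s -> d lies on a cycle iff d can reach s.
    return not any(special and reachable(d, s) for s, d, special in edges)

cyclic = [([("E", ("x", "y"))], [("E", ("y", "z"))])]   # E(x,y) -> ∃z E(y,z)
full = [([("E", ("x", "y"))], [("E", ("y", "x"))])]     # E(x,y) -> E(y,x)
```

The first set keeps creating new nulls in the second position of E, which shows up as a special self-loop; the second is full, so it has no special edges at all.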
Composing Schema Mappings
Outline
1 Data Exchange
2 Composing Schema Mappings
   The Composition Operator
   Language for Expressing Composition
   Second-Order tgds and Data Exchange
3 Inverses
4 Quasi-Inverses
5 Conclusion
Composing Schema Mappings: The Composition Operator
Motivation
Natural operators on schema mappings:
Composition
Inverse
...
⇒ High-level management of schema mappings.
Need for well-defined semantics.
In practice, useful for describing and maintaining successive evolutions of a schema.
[Figure: schemas S1, T1, S′1, T′1 with mappings M1, M2, M′1, the inverse M′1⁻¹ and the composition M′1⁻¹ ∘ M1 ∘ M2.]
Definition
Unambiguous and query-independent semantics for the composition of schema mappings.
Based on the natural composition of binary relations.
Definition
⟨I, K⟩ is an instance of M1 ∘ M2 ⇐⇒ there exists J such that:
⟨I, J⟩ is an instance of M1,
⟨J, K⟩ is an instance of M2.
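Read as binary relations on instances, this is ordinary relational composition. The toy sketch below treats a mapping as a finite set of pairs of instance labels; real schema mappings relate infinitely many instances, so the finite sets and the labels `I1`, `J1`, `K1`, ... are purely illustrative.

```python
def compose(m1, m2):
    """Relational composition: {(I, K) | there is J with (I, J) in m1 and (J, K) in m2}."""
    return {(i, k) for (i, j) in m1 for (j2, k) in m2 if j == j2}

# Hypothetical finite fragments of two mappings.
m1 = {("I1", "J1"), ("I1", "J2"), ("I2", "J3")}
m2 = {("J2", "K1"), ("J3", "K2")}
composed = compose(m1, m2)
```

Note that the intermediate instance J is existentially quantified away: `("I1", "K1")` is in the result because of `J2`, even though `J1` leads nowhere.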
Example
Example
M12 = (S1, S2, Σ12), M23 = (S2, S3, Σ23).
S1 = {Takes(·,·)}
S2 = {Takes′(·,·), Student(·,·)}
S3 = {Enrollment(·,·)}
Σ12 = {∀n ∀c (Takes(n,c) → Takes′(n,c)),
∀n ∀c (Takes(n,c) → ∃s Student(n,s))}
Σ23 = {∀n ∀s ∀c (Student(n,s) ∧ Takes′(n,c) → Enrollment(s,c))}
M12 ∘ M23 = (S1, S3, Σ) with:
Σ = {∀n ∃s ∀c (Takes(n,c) → Enrollment(s,c))}
Composing Schema Mappings: Language for Expressing Composition
A Positive Result...
Proposition
If M1 is a mapping expressed as a finite set of full source-to-target tgds and M2 is a mapping expressed as a finite set of source-to-target tgds, then M1 ∘ M2 can be expressed as a mapping with source-to-target tgds.
Example
(A(x,y) → B(x)) ∘ (B(x) → ∃z C(x,z)) = (A(x,y) → ∃z C(x,z))
Remark
If the tgds of M2 are full, so are the tgds of M1 ∘ M2.
... and a Negative Result
Proposition
There exist two mappings M1 and M2, each defined by a finite set of source-to-target tgds, such that M1 ∘ M2 cannot be expressed in first-order logic (even with least fixed point).
Proof.
Composition query problem: given I and J, is ⟨I, J⟩ an instance of the composed schema mapping?
Reduction from 3-colorability
⇒ NP-complete problem.
⇒ Q.E.D. (descriptive complexity theory)
Remark
Fagin’s theorem implies that this can be defined by an existential second-order formula. Is there a more precise characterization of the language?
A Natural Language for Composition
Definition
A second-order tgd is of the form:
∃f1 ... fm ((∀x1 (φ1 → ψ1)) ∧ ... ∧ (∀xn (φn → ψn)))
where
the fi are function symbols,
the φi are conjunctions of source relation atoms and equalities,
the ψi are conjunctions of target relation atoms
(additional safety conditions omitted).
Example
∀n ∃s ∀c (Takes(n,c) → Enrollment(s,c))
≡
∃f (∀n ∀c (Takes(n,c) → Enrollment(f(n),c)))
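On finite instances, satisfaction of the example formula can be checked by searching for a witness function f. The sketch below assumes relations are sets of pairs and represents f as a dictionary; the function name and the sample data are invented. Since the clauses for different n are independent, it is enough to find, for each n, one identifier under which all of n's courses are enrolled.

```python
def so_tgd_witness(takes, enrollment):
    """Search for f with Takes(n,c) -> Enrollment(f(n),c) for all n, c.
    Returns f as a dict, or None if no such function exists."""
    ids = {s for s, _ in enrollment}
    f = {}
    for n in {n for n, _ in takes}:
        courses = {c for m, c in takes if m == n}
        candidates = [s for s in sorted(ids)
                      if all((s, c) in enrollment for c in courses)]
        if not candidates:
            return None
        f[n] = candidates[0]
    return f

takes = {("ann", "db"), ("ann", "logic")}
same_id = so_tgd_witness(takes, {(1, "db"), (1, "logic")})
split_ids = so_tgd_witness(takes, {(1, "db"), (2, "logic")})
```

The second call fails because the identifier would have to depend on the course as well as on the name, which is exactly what the quantifier order ∀n ∃s ∀c (equivalently, f(n) rather than f(n,c)) rules out.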
Composition Theorem
Theorem
Second-order tgds are closed under composition.
Constructive proof.
All features of the language (disjunction, equalities, second-order quantifiers) are required.
Remark
Any finite set of s-t tgds can be represented as a single second-order tgd.
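One way to see the remark is Skolemization: each existential variable is replaced by a function term over the universally quantified variables of its tgd, and the resulting implications are conjoined under one block of ∃f quantifiers. The sketch below is an illustration of that idea under an assumed representation (atoms as `(relation, variables)` pairs, terms as strings, function names of the form `f<k>_<var>`); it is not taken from the slides.

```python
def skolemize(tgds):
    """Turn a list of s-t tgds into (function_symbols, clauses) of one SO tgd."""
    functions, clauses = [], []
    for k, (body, head) in enumerate(tgds):
        xs = sorted({v for _, args in body for v in args})  # universal variables
        sub = {}
        for _, args in head:
            for v in args:
                if v not in xs and v not in sub:   # existential variable
                    f = f"f{k}_{v}"
                    functions.append(f)
                    sub[v] = f"{f}({','.join(xs)})"
        new_head = [(rel, tuple(sub.get(v, v) for v in args)) for rel, args in head]
        clauses.append((body, new_head))
    return functions, clauses

# Σ12 from the Takes/Student example.
sigma12 = [
    ([("Takes", ("n", "c"))], [("Takes'", ("n", "c"))]),
    ([("Takes", ("n", "c"))], [("Student", ("n", "s"))]),
]
functions, clauses = skolemize(sigma12)
```

The Skolem term here depends on all universal variables of the tgd, so `s` becomes a function of both `c` and `n`.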
Composing Schema Mappings Second-Order tgds and Data Exchange
Properties of the Composition Operator
Two important properties for using second-order tgds in data exchange:
Polynomial chase of schema mappings defined as a second-order tgd.
PTIME computation of certain answers to unions of conjunctive queries.
Remark
Contrast with the fact that computing certain answers with arbitrary first-order mappings is undecidable.
Second-order tgds:
Powerful enough for including normal tgds and being closed under composition.
Restricted enough to get a PTIME algorithm for answering queries.
Inverses
Outline
1 Data Exchange
2 Composing Schema Mappings
3 Inverses
Motivation and Definition
Conditions
Computing Inverses
4 Quasi-Inverses
5 Conclusion
Inverses Motivation and Definition
Semantics of Inverses
Example
Let M12 be EDL(x,y,z) → ED(x,y) ∧ DL(y,z).
Let M21 be ED(x,y) ∧ DL(y,z) → EDL(x,y,z).
Under what conditions do we obtain the original EDL relation?
Define Γ as EDL(x,y,z′) ∧ EDL(x′,y,z) → EDL(x,y,z).
We want M21 to be the inverse of M12 for precisely those instances I which satisfy Γ.
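The role of Γ can be checked on small instances. The sketch below is a simplification: it computes only the minimal target instance forced by each (full) tgd rather than the whole solution set, and the function names and sample tuples are hypothetical.

```python
# Sketch: round trip EDL -> (ED, DL) -> EDL on finite relations (sets of tuples).

def m12(edl):
    """Minimal target instance forced by EDL(x,y,z) → ED(x,y) ∧ DL(y,z)."""
    ed = {(x, y) for (x, y, _z) in edl}
    dl = {(y, z) for (_x, y, z) in edl}
    return ed, dl

def m21(ed, dl):
    """Minimal EDL instance forced by ED(x,y) ∧ DL(y,z) → EDL(x,y,z)."""
    return {(x, y, z) for (x, y) in ed for (y2, z) in dl if y == y2}

def satisfies_gamma(edl):
    """Check Γ: EDL(x,y,z′) ∧ EDL(x′,y,z) → EDL(x,y,z)."""
    return all((x, y, z) in edl
               for (x, y, _z1) in edl
               for (_x1, y2, z) in edl if y == y2)

def round_trip(edl):
    return m21(*m12(edl))

good = {("ann", "sales", "paris"), ("bob", "sales", "paris")}
bad = {("ann", "sales", "paris"), ("bob", "sales", "rome")}
```

On `good`, which satisfies Γ, the round trip returns the original relation. On `bad`, which violates Γ, the round trip adds tuples such as `("ann", "sales", "rome")`.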
Inverses Motivation and Definition
Obvious Solution fails
Let M12 be defined by Σ12.
Let S12 = {⟨I, J⟩ : ⟨I, J⟩ ⊨ Σ12}.
Let S21 = {⟨J, I⟩ : ⟨I, J⟩ ∈ S12}.
Can we define the inverse as the mapping associated with the set S21?
No: if ⟨I, J⟩ ⊨ Σ12, then for all I′ ⊆ I and J′ ⊇ J, we also have ⟨I′, J′⟩ ⊨ Σ12. The corresponding property (smaller source, larger target) does not hold for the pairs in S21.
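A small check makes the failure concrete. The sketch assumes Σ12 is the single tgd EDL(x,y,z) → ED(x,y) ∧ DL(y,z) from the earlier example; the function name and the sample tuples are hypothetical.

```python
# Sketch: satisfaction of Σ12 = { EDL(x,y,z) → ED(x,y) ∧ DL(y,z) }.

def satisfies_sigma12(edl, ed, dl):
    """True if the pair ⟨I, J⟩ with I = edl and J = (ed, dl) satisfies Σ12."""
    return all((x, y) in ed and (y, z) in dl for (x, y, z) in edl)

I = {("ann", "sales", "paris")}
ED = {("ann", "sales")}
DL = {("sales", "paris")}

# Closure for S12: shrink the source, grow the target.
I_small = set()
ED_big = ED | {("bob", "hr")}

# Reverse direction: in S21 the source is J = (ED, DL) and the target is I.
# Growing the target I should preserve membership, but it does not.
I_big = I | {("bob", "hr", "rome")}

in_s12 = satisfies_sigma12(I, ED, DL)
closed_s12 = satisfies_sigma12(I_small, ED_big, DL)
in_s21_after_growing_target = satisfies_sigma12(I_big, ED, DL)
```

Here ⟨J, I⟩ belongs to S21, but ⟨J, I_big⟩ does not, although only the target was enlarged.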
Inverses Motivation and Definition
Definition
Definition
We say two schema mappings are equivalent on I, if they have the same solutions for I.
Definition
Let MId = (S1, Ŝ1, ΣId) be the identity mapping. Let M12 = (S1, S2, Σ12) and M21 = (S2, Ŝ1, Σ21) be schema mappings. Let σ be the composition formula Σ12 ∘ Σ21 and let M11 = (S1, Ŝ1, σ). Let I be an instance of S1. Then M21 is an inverse of M12 for I if M11 and MId are equivalent on I.
Inverses Motivation and Definition
Local and Global Inverses
We call such an inverse a local inverse.
If S is a class of instances such that M21 is an inverse of M12 for each I in S, then we call it an S-inverse.
If S is the class of all instances, we call it a global inverse.
Inverses Conditions
Unique Solutions Property
We would like to know when global inverses exist.
If M21 is a global inverse of M12, and I1 and I2 are distinct source instances, then the solutions of I1 and I2 are different under M12.
Unique solutions property. A mapping M12 has it if, whenever I1 and I2 are distinct source instances, their solution sets are distinct.
This is necessary for global inverses to exist.
For LAV schema mappings it is also sufficient.
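As an illustration, the mapping EDL(x,y,z) → ED(x,y) ∧ DL(y,z) lacks the unique solutions property. The sketch relies on one assumption worth stating: for tgds without existential quantifiers, the solutions of I are exactly the target instances containing the minimal instance forced by I, so two sources have the same solution set whenever they force the same minimal instance. The names and tuples are hypothetical.

```python
# Sketch: two distinct source instances with the same solutions under
#   EDL(x,y,z) → ED(x,y) ∧ DL(y,z).

def m12(edl):
    """Minimal target instance (ED, DL) forced by the source relation EDL."""
    ed = {(x, y) for (x, y, _z) in edl}
    dl = {(y, z) for (_x, y, z) in edl}
    return ed, dl

i1 = {("ann", "sales", "paris"), ("bob", "sales", "rome")}
i2 = i1 | {("ann", "sales", "rome"), ("bob", "sales", "paris")}

distinct_sources = i1 != i2
same_minimal_solution = m12(i1) == m12(i2)
```

Since `i1` and `i2` differ but force the same ED and DL relations, their solution sets coincide, so this mapping cannot have a global inverse.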