• Aucun résultat trouvé

GDE: General Data Exchange with Schema and Data Level Mappings

N/A
N/A
Protected

Academic year: 2022

Partager "GDE: General Data Exchange with Schema and Data Level Mappings"

Copied!
3
0
0

Texte intégral

(1)

GDE: General Data Exchange with Schema and Data Level Mappings

Rana Awada and Iluju Kiringa

University of Ottawa, SITE

rawad049@uottawa.ca kiringa@site.uottawa.ca

1 Introduction

Data exchange (DE) [5, 3] and data coordination [1, 2, 6] are two important set- tings that were introduced previously in the literature to resolve the problem of integrating information that resides in different sources. A DE setting moves data residing in independent applications, which refer to the same object using the same name, and accesses it through a new target schema. However, a data coordination setting allows the access of data residing in independent sources and that possibly belong to different sets of vocabularies, without necessarily exchanging it and while maintaining autonomy.

Although a data coordination setting provides users with an amalgamated view of related information, this solution is not enough for applications that require a view of related information using a unified set of vocabularies for periodic reporting and decision making. We introduce a general data exchange (GDE) setting that extends DE settings to allow collaboration at the instance level, using a mapping table M, that specifies for each constant value in the source, the set of related (or corresponding) constant values in the target.1

We show in this paper that a GDE setting can be formalized using the knowl- edge exchange framework introduced in [4]. It allows us to store a target knowl- edge base (KB) which consists of a subset of the explicit data exchanged that is necessary to infer the full set of exchanged information using a setΣt of FO sentences. We identify in our work the class of “best” KBs to materialize and we define the set of certain answers.

2 Preliminaries

A (DE)setting[5, 3] is a tupleS= (S,T, Σst), where Sis a source schema,T is a target schema,SandTdo not have predicate symbols in common, andΣst

consists of a set of source-to-target tuple-generatingdependencies (st-tgds) that establish the relationship between source and target schemas. A st-tgd is a FO- sentence of the form:∀¯x∀¯y(φ(¯x,y)¯ → ∃¯z ψ(¯x,z)),¯ where φ(¯x,y) and¯ ψ(¯x,z) are¯ conjunctions of relational atoms overSandTrespectively. LetConstandVarbe infinite and disjoint sets of constants and nulls, respectively. We consider in our

1 We consider in this work a particular interpretation of related data in a mapping table; that is, a source element is always uniquely identified by at least one target element.

(2)

work “complete” source instances I of S, where it holds that dom(I) ⊆Const and do not contain missing data in the form of nulls. However, a target instance J ofT, is allowed to contain null values, and it holds thatdom(J)⊆Const∪Var;

A knowledge base [4] over a schemaRis a pair (K, Σ), whereKis an instance ofR(the explicit data) andΣ is a set of logical sentences overR(the implicit data). The set ofmodelsof (K, Σ), denoted byMod(K, Σ), is defined as the set of instances ofRthat contain the explicit data inKand the implicit data inΣ;

that is,Mod(K, Σ) corresponds to the set{K0|K0 is an instance ofR,K⊆K0 and K0 |=Σ }. From now on, KR0 denotes the restriction of instanceK to a subsetR0 of its schemaR.

Mapping tables [6] are mechanisms that establish how values from different domains correspond. In its simplest form, given two domains D1 and D2, not necessarily disjoint, a mapping table over (D1,D2) is a subset of D1×D2. Let ConstS andConstT be the sets of source and target constants respectively. We consider in our work mapping tables with the following property: for each value a∈ConstS∩dom(M), there exists at least a single target valuea0 ∈ConstT∩ dom(M) such that M(a, a0) holds, and there does not exist a source value b ∈ ConstS∩dom(M), where b is different than a and M(b, a0) holds. We say a0 uniquely identifiesainM. We defineCas the set of values indom(M)∩ConstT that uniquely identify source values mapped inM.

3 GDE a Knowledge Exchange System

A GDE setting S = (S,T,M, Σst) extends a DE setting with (1) a binary relation symbol M that appears neither in S nor in T, and that is called a source-to-target mapping; and (2)Σst that consists of a set ofmapping st-tgds, which are FO sentences of the form: ∀¯x∀¯y∀¯z(φ(¯x,y)¯ ∧µ(¯x,z)¯ → ∃w ψ(¯¯ z,w)),¯ where (a)φ(¯x,y) and¯ ψ(¯z,w) are conjunctions of relation symbols over¯ SandT respectively, and (b)µ(¯x,z) is a conjunction of relation symbols that only use¯ the st-mapping relation symbolM. We denote st-mapping tables byM.

In a GDE setting, source KBs are of the form ((I∪ {M}), Σs =∅), which correspond to data in the source instance I and the st-mapping table M. On the other hand, the target KBs are of the form ((J ∪ {M}), Σt) whereΣt is a set of FO sentences, of type f ulltgds (which are tgds that do not use existen- tial quantication). We formalize the notion of a (universal) GDE KB-solution, extending the notion of knowledge exchange (universal) solution in [4] to allow coordinating the source and target information provided byM, as follows:

1. J is aGDE KB-solution forI and M under S, if for everyK∈Mod((J∪ {M}), Σt) there is K0 ∈ Mod((I∪ {M}), Σs = ∅)) such that the following hold: (a)KM0 ⊆KM, and (b) ((KS0 ∪KM0 ), KTst.

2. Also,Jis auniversal GDE KB-solution(UGDE) forIandM underS, ifJis a GDE KB-solution, and for everyK0∈Mod((I∪{M}), Σs=∅) there isK∈ Mod((J∪{M}, Σt) such that (a)KM ⊆KM0 , and (b) ((KS0∪KM0 ), KTst. Intuitively, in a GDE settingS, C is the sole set of target values that can capture correctly the set of source values exchanged to a target instance. There-

(3)

fore, intuitively a GDE KB-solution J in S has a domain dom(J)⊆ C∪Var.

We defineΣtas the following set of full tgds over a schemaT∪ {M,Related}, whereRelatedis a fresh binary table:

1. For eachT ∈T∪ {M}of aritynand 1≤i≤n:

∀x1· · · ∀xn(T(x1, . . . , xi, . . . , xn)→Related(xi, xi)).

2. ∀x∀y∀z(M(z, x)∧ M(z, y)∧C(x)→Related(x, y)).

3. For eachT ∈Tof arityn:

∀x1, y1· · · ∀xn, yn(T(x1, . . . , xn)∧Vn

i=1Related(xi, yi)→T(y1, . . . , yn)).

In a GDE setting, we define “best” solutions formally following [4] as: Let S be a GDE setting, I be a source instance,M an st-mapping table, and J a UGDE solution forI andM underS. Then J is aminimalUGDE solution, if (1) there is no proper subsetJ0 ofJ such thatJ0 is a UGDE solution forI and M underS, and (2) there is no UGDE solution J0 such thatdom(J0)∩ConstT is properly contained in dom(J) ∩ ConstT. Also, given a fixed GDE setting, generating UGDE solutions and minimal UGDE solutions is inLogspace.

4 Query Answering

We adapt the notion of a certain answer in the usual DE setting to the GDE setting. Formally, letSbe a GDE setting,Ia source instance,M an st-mapping table, andQa conjunctive query overT. The set of certain answers ofQoverI andM and underS, denotedcertainS((I∪ {M}), Q), corresponds to the set of tuples of constants that belong to the evaluation ofQ overKT, for each GDE KB-solutionJ for I andM andK ∈Mod((J∪ {M}), Σt). Finally, generating certainS((I∪ {M}), Q) is in Logspace.

5 Future Work

An interesting extension for this work would be defining a GDE setting with a target that contains egds and tgds constraints. Also, investigating GDE in a peer-to-peer setting might add interesting challenges to the problem.

References

1. Lawrence, M., Pottinger, R., Staub-French, S.: Data Coordination: Supporting Con- tingent Updates. In: VLDB (2011)

2. Philip A. Bernstein , Fausto Giunchiglia , Anastasios Kementsietsidis , John My- lopoulos , Luciano Serafini , llya Zaihrayeu : Data Management for Peer-to-Peer Computing: A Vision. pp. 89–94 (2002)

3. Arenas, M., Barcel´o, P. , Libkin, L.,Murlak, F.: Relational and XML Data Exchange.

Morgan & Claypool Publishers, (2010).

4. Arenas, M. , Perez, J. ,Reutter, J.: Data exchange beyond complete data. In : PODS, pp. 83–94 (2011)

5. Fagin,R., Kolaitis, P. G. , Miller,R. J. , Popa, L.: Data exchange: semantics and query answering. In: Theoretical Computer Science, pp. 89–124 (2005)

31(4), pp. 761–791 (1984).

6. Kementsietsidis, A., Arenas, M., Miller, R. J. : Mapping data in peer-to-peer sys- tems: Semantics and algorithmic issues. In: SIGMOD, pp. 325–336 (2003).

Références

Documents relatifs

For such mappings and purely navigational queries based on RPQs and some extensions, query answering is decidable, with coNP data complexity [8,12]; to achieve tractability,

Considering that Facebook exports a JSON data dump, and that Twitter similarly will import 23 a JSON data dump - the process of transforma- tions will involve: (i) uplift

We study the relational to RDF data exchange problem, where the tar- get constraints are specified using Shape Expression schema (ShEx). We investi- gate two fundamental problems:

From a technical point of view, these service providers can be also seen as data providers and data consumers at the same time, e.g., they receive data as a data consumer from some

In this paper we develop an intuitionistic relevance-logic based semantics that allows us to extend the polynomial time “naive evaluation” tech- nique to Data Exchange

We present in this paper several services based on GXS that already run on top of an existing F2F network: distributed fo- rums where users can exchange ideas in a threaded hierarchy

The purpose of the Variability Exchange Language is to support the standardization of these interfaces by a common exchange format that defines, which information is exchanged between

In this paper, we investigate the problem of data exchange in a heterogeneous setting, where the source is a relational database, the target is a graph database, and the schema