Knowledge Base Exchange

(1)

Marcelo Arenas¹, Elena Botoeva², and Diego Calvanese²

1 Dept. of Computer Science, PUC Chile marenas@ing.puc.cl

2 KRDB Research Centre, Free Univ. of Bozen-Bolzano, Italy lastname@inf.unibz.it

Abstract. In this paper, we study the problem of exchanging knowledge between two knowledge bases (KBs) connected through mappings, with a special interest in exchanging implicit knowledge, not only data like in the traditional database exchange setting. As representation formalism we use Description Logics (DL) that exhibit a reasonable tradeoff between expressive power and complexity of reasoning. Thus, we assume that the source and target KBs are given as a DL TBox+ABox, while the mappings have the form of DL TBox assertions. We study the problem of translating the knowledge in the source KB according to these mappings. We define a general framework of KB exchange, and specify the problems of computing and representing different kinds of solutions, i.e., target KBs with specified properties, given a source KB and a mapping. We then develop first results and techniques and study the complexity of KB exchange for the case ofDL-LiteRDFS, a DL that corresponds to the FOL fragment of RDFS.

1 Introduction

In data exchange, data structured under one schema (called source schema) must be restructured and translated into an instance of a different schema (called target schema), and the way in which this restructuring should occur is specified by means of a mapping from the source schema to the target schema [5]. Such a problem has been studied extensively in recent years, under various choices for the languages used to specify the source and target schema, and the mappings [2]. While incomplete information in this setting is introduced by the mapping layer (see also [6]), one fundamental assumption in the works on data exchange is that the source is a (completely specified) database.

In this paper, we go beyond this setting, and consider data exchange in the case where implicit knowledge is present in the source, by which new data may be inferred.

We follow the line of the work in [1], where a general framework for data exchange is proposed, in which the source data may be incompletely specified, and thus (possibly infinitely) many source instances are implicitly represented. The framework in [1]

in based on the general notion ofrepresentation system, as a mechanism to represent multiple instances of a data schema, and considers the problem of incomplete data ex- changes under mappings constituted by a set of tuple generating dependencies (tgds).

We refine that framework to the case where as a representation system we use De- scription Logics (DL) knowledge bases (KBs) constituted by a TBox and an ABox, and where mappings are sets of DL inclusions. While in the traditional data exchange

(2)

setting, given a source instance and a mapping specification,(universal) solutionsare target instances derived from the source instance and the mapping, in our case solutions are target DL KBs, derived from the source KB and the mapping. Besides a notion of (universal) solution based on the correspondence between models of source and target KBs, we introduce also the weaker notion of(universal) CQ-solution, based on the correspondence between answers to conjunctive queries over source and target KBs.

In our setting where we exchange DL KBs, in order to minimize the exchange (and hence transfer and materialization) of explicit (i.e., ABox) information, we are inter- ested in computing universal (CQ-)solutions that contain as much implicit knowledge as possible. This leads us to define a new problem, calledrepresentability, whose goal is to compute from a source TBox and a mapping, a target TBox that leads to a universal (CQ-)solution when it is combined with a suitable ABox computed from the source ABox, independently of the actual source ABox.

We then develop first results and techniques for KB exchange and for the representability problem in the case of KBs expressed inDL-LiteRDFS, a DL that corresponds to the FOL fragment of RDFS.DL-LiteRDFSis a fragment ofDL-Lite_R[4] that does not allow for existential quantification (i.e., concepts of the form∃R) in the right-hand side of concept inclusions, nor for disjointness assertions.

The paper is organized as follows. In Section 2 we give preliminary notions on DLs and conjunctive queries (CQs). In Section 3 we define our framework of KB exchange.

In Section 4 we present the techniques for deciding the defined reasoning tasks. Finally, in Section 5 we draw some conclusions and outline issues for future work.

2 Preliminaries

We introduce here the necessary notions about the description logic (DL) that we use in this article, and about conjunctive queries, which we adopt as our query formalism.

2.1 DL-Lite_RKnowledge Bases

The DLs of theDL-Litefamily [4] of light-weight description logics are characterized by the fact that reasoning can be done in polynomial time, and that data complexity of reasoning and conjunctive query answering is in AC⁰. We adopt hereDL-LiteR, and present now its syntax and semantics.

LetN_C,N_R,N_a be sets of concept, role, and individual names, respectively, and assume thatA ∈ N_C andP ∈ N_R. InDL-LiteR,B andC are used to denote basic and complex concept descriptions, respectively, andRandQare used to denote basic and complex roles, respectively. These concept and roles constructs are defined by the following grammar:

R ::= P | P⁻ Q ::= R | ¬R

B ::= A | ∃R C ::= B | ¬B

In the following, for a basic roleR, we useR⁻ to denoteP⁻ whenR = P, andP whenR=P⁻.

(3)

ADL-LiteR TBox is a finite set of concept inclusionsB vC and role inclusions RvQ. ADL-LiteRABox is a finite set of membership assertions of the formA(a)and P(a, b), wherea, b∈N_a. ADL-LiteRKBKis a pairhT,Ai, whereT is aDL-LiteR

TBox and A is a DL-LiteR ABox. As usual, TBoxes represent implicit knowledge, while ABoxes represent the data.

In the following, we will also make use of a restricted form ofDL-Lite_R TBoxes.

ADL-Lite_R TBox is said to bedefiniteif there are only atomic concepts and atomic roles on the right-hand side of its inclusions. In other words, a definite TBox may not mention in its right-hand side a concept of the form∃R, and may not contain concept or role disjointness assertions. We callDL-LiteRDFSthe fragment ofDL-LiteRobtained by considering only definiteDL-LiteR TBoxes. Intuitively,DL-LiteRDFS corresponds to the fragment of RDFS [3] that is embeddable in FOL (and hence in DLs).

The semantics of DL-LiteR is defined in the standard way. We just remark that we use MOD(K)to denote the set of all models of KBK. From now on, we assume that interpretations satisfy the standard name assumption, that is, we assume given a fixed infinite setUof individual names, and we assume that for every interpretation I =h∆Î,·Îi, it holds that∆Î ⊆ UandaÎ = afor everya∈ ∆Î. Notice that this implies the Unique Name Assumption (UNA), i.e., different individuals are interpreted as different domain elements.

AsignatureΣis a set of concept and role names. An interpretationI=h∆^I,·^Iiis said to be an interpretation ofΣif it is defined exactly on the concept and role names in Σ. Given a KBK, thesignatureΣ(K)ofKis the alphabet of concept and role names occurring inK, and K is said to bedefined over (or simply, over) a signature Σ if Σ(K)⊆Σ(and likewise for a TBoxT, an ABoxA, a concept inclusionsB vC, a role inclusionsRvQ, and membership assertionsA(a)andR(a, b)).

2.2 Conjunctive Queries and Certain Answers

A conjunctive query (CQ) over a signature Σ is a first-order formula of the form q(x) = ∃y.conj(x,y), where x,yare tuples of variables and conj(x,y)is a con- junction of atoms of the form:(1) A(t), with Aa concept name inΣ andt either a constant fromUor a variable fromxory, or(2)P(t1, t2), withP a role name inΣ andti (i = 1,2) either a constant from Uor a variable fromxor y. In a conjunctive queryq(x) =∃y.conj(x,y),xis the tuple of free variables ofq(x). Aunion of conjunctive queries (UCQ)is a formula of the form:q(x) = Wn

i=1∃yi.conj_i(x,yi), where eachconji(x,yi)is as before. A queryq(either a CQ or a UCQ) is said to be a query over a KBKifqis a query over a signatureΣandΣ⊆Σ(K).

Let q be a CQ ∃y1· · · ∃y`.conj(x1, . . . , xk, y1, . . . , y`) over a signature Σ and I = h∆Î,·Îi an interpretation ofΣ. Then the answer of qover I, denoted by qÎ, is defined as the set of tuples(a₁, . . . , a_k)of elements from∆Î for which there ex- ist a tuple (b₁, . . . , b_`) of elements from ∆Î such that I satisfies every conjunct in conj(a₁, . . . , a_k, b₁, . . . , b_`). Moreover, given a UCQ q = Wn

i=1q_i, the answer of q over an interpretationI, denoted byq^I, is defined asSn

i=1q^I_i. Finally, given a queryq (either a CQ or a UCQ) over a KBK, the answer toqoverK, denoted bycert(q,K), is defined ascert(q,K) =T

I∈MOD(K)q^I. Each tuple incert(q,K)is called acertain answerforqoverK.

(4)

Certain answers inDL-LiteRcan be characterized through the notion of chase. We call achasea (possibly infinite) set of assertions of the formA(t),P(t, t⁰), wheret,t⁰ are either individuals, orlabeled nullsinterpreted as not necessarily distinct domain elements. ForDL-LiteRKBs, we employ the notion of oblivious chase as defined in [4].

For such a KB hT,Ai, the chase ofA w.r.t. T, denotedchase_T(A), is a chase obtained fromAby adding facts implied by inclusions inT, and introducing labeled nulls whenever required by an inclusion with∃Rin the right-hand side (see [4] for details).

3 Exchanging Knowledge Bases

In this section, we introduce the knowledge exchange framework used in the paper.

The starting point to define this framework is the notion of mapping, which has been shown to be of fundamental importance in the context of data exchange [5]. Formally, aDL-LiteR-mapping(or justmapping) is a tupleM = (Σ₁, Σ₂,T12), whereΣ₁,Σ₂ are disjoint signatures andT12 is aDL-LiteRTBox whose inclusions are of the form:

(1)C₁ vC₂, whereC₁,C₂ are complex concepts overΣ₁ andΣ₂, respectively, and (2)Q₁vQ₂, whereQ₁andQ₂are complex roles overΣ₁andΣ₂, respectively.

LetM = (Σ₁, Σ₂,T₁₂)be a mapping. Intuitively, mapping Mspecifies how a knowledge base over the vocabularyΣ1 should be translated into a knowledge base over the vocabularyΣ2. This intuition is formalized in terms of the notion ofsolution, which is defined as follows. Given an interpretation I1 of Σ1 and an interpretation I2 of Σ2, pair (I1,I2)satisfies TBox T12, denoted by (I1,I2) |= T12, if for each concept inclusionC1vC2∈ T12, it holds thatC₁Î¹⊆C₂Î², and for each role inclusion Q1 vQ2∈ T12, it holds thatQÎ₁¹ ⊆QÎ₂². Moreover, given an interpretationIofΣ1, SATM(I)is defined as the set of interpretationsJ ofΣ2such that(I,J)|=T12, and given a setX of interpretations ofΣ1, SATM(X)is defined as:

SATM(X) =S

I∈XSATM(I).

Then the notion of solution under a mapping is defined as follows, by considering this notion of satisfaction and the knowledge exchange framework proposed in [1].

Definition 1. LetM= (Σ1, Σ2,T12)be a mapping,K1a KB overΣ1, andK2a KB overΣ2. ThenK2is said to be a solution forK1underMif:

MOD(K2)⊆SATM(MOD(K1)).

That is,K2 is a solution forK1underMif for every modelI2ofK2, there exists a modelI1ofK1such that(I1,I2)|=T12.

LetM= (Σ₁, Σ₂,T₁₂). A KBK₁overΣ₁can have an infinite number of solutions underM. Thus, it is natural to ask what is agoodsolution for this knowledge base. Next we introduce the notion of universal solution, which is a simple extension of the concept of solution introduced in Definition 1, and is based on the notion of universal solution introduced in [1].

Definition 2. LetM= (Σ₁, Σ₂,T12)be a mapping,K1a KB overΣ₁, andK2a KB overΣ₂. ThenK2is said to be a universal solution forK1underMif:

MOD(K₂) =SATM(MOD(K₁)).

(5)

In the preceding definition, KBK₂is considered to be a good solution for KBK₁under mapping Mas the models ofK₂ exactly correspond to the valid translations of the models ofK₁according toM. We illustrate Definitions 1 and 2 in an example.

Example 1. LetM = (Σ₁, Σ₂,T₁₂), where Σ₁ = {A₁, B₁},Σ₂ = {A₂, B₂}, and T₁₂ = {A₁ v A₂, B₁ v B₂}. Furthermore, assume that K₁ = hT₁,A₁i, where T1 = {B1 vA1}andA1 ={B1(a)}. Then the following knowledge bases overΣ2

are solutions forK1underM:

K2=hT2,A2i, where T2=∅, A2={B2(a), A2(a)}

K⁰₂=hT₂⁰,A⁰₂i, where T₂⁰={B2vA2}, A⁰₂={B2(a)}

Moreover,K2is a universal solution forK1underM, whileK⁰₂is not. In fact, ifI2is an interpretation ofΣ₂such that:∆Î² ={a, b},AÎ₂² ={a}, andB₂Î² ={a, b}, then we have thatI₂ 6∈ MOD(K₂⁰)sinceI₂does not satisfy inclusionB₂ vA₂, butI₂ ∈ SATM(MOD(K₁))since(I₁,I₂) |= T₁₂forI₁ ∈ MOD(K₁)defined as∆Î¹ ={a}, AÎ₁¹ ={a}andB₁Î¹ ={a}.

In the previous example,K⁰₂is not a universal solution forK₁underMsince inclusion B₂ v A₂cannot be deduced from the information inK₁andM. Or, more formally, K⁰₂is not a universal solution asB2vA2is not implied byhT1∪ T12,A1i. However, K⁰₂can also be considered as a solution ofK1 that is desirable to materialize, as the implicit knowledge inK2(i.e., TBoxT2) represents the implicit knowledge inK1(i.e., TBoxT1), given the way that conceptsA1andB1have to be translated according to mappingM. In fact, solutionK⁰₂is as good as solutionK2in terms of the information that can be extracted from them by using some specific queries, but with the advantage thatK⁰₂represents knowledge in a more compact way. In what follows, we introduce a new class of good solutions that captures this intuition with respect to the widely used fragment of CQs.

Definition 3. LetM = (Σ1, Σ2,T12)be a mapping,K1 = hT1,A1ia KB overΣ1, andK2a KB overΣ2. ThenK2is said to be a CQ-solution forK1underMif for every CQqoverΣ₂, we have thatcert(q,hT1∪ T12,A1i)⊆cert(q,K2).

Moreover,K2is said to be a universal CQ-solution forK1underMif for every CQ qoverΣ₂, we have thatcert(q,hT1∪ T12,A1i) =cert(q,K2).

Notably, we have in Example 1 that both KBK2and KBK⁰₂are universal CQ-solutions for KBK1under mappingM.

A natural question at this point is what is the relationship between the notions of solution presented in this section. The following proposition, which is straightforward to prove, shows that the notion of (universal) CQ-solution is weaker than the notion of (universal) solution.

Proposition 1. LetM = (Σ₁, Σ₂,T₁₂)be a mapping, K₁ a KB overΣ₁, andK₂a KB overΣ2. IfK2is a (universal) solution forK1underM, thenK2is a (universal) CQ-solution forK1underM.

In this paper, we study several fundamental problems related to the notions of solution presented here. These problems are formally introduced below.

(6)

3.1 Knowledge Base Exchange: Reasoning Tasks

In the data exchange scenario [5], as well as in the knowledge exchange scenario [1], the problem of materializing solutions has been identified as the fundamental problem to solve. Given a classU of mappings (for example, the class ofDL-Lite_R-mappings) and a DLL(for example,DL-Lite_R), the problem of computing solutions is defined as follows forUandL:

PROBLEM : COMPSOL(U,L)

INPUT : A mappingM ∈ Uand anL-knowledge baseK1overΣ1

TO DO : Compute anL-knowledge baseK2overΣ2such thatK2is a solution forK1underM

Given a class U of mappings and a DL L, the computation problems COMPUNIVSOL(U,L), COMPCQSOL(U,L), and COMPUNIVCQSOL(U,L)are defined exactly as above, but considering universal solutions, CQ-solutions, and universal CQ-solutions instead of solutions, respectively.

In the rest of this paper, we study these problems, and some other fundamental problems associated to them, for a restriction of the class of mappings introduced in this section. More precisely, a mappingM=hΣ₁, Σ₂,T₁₂iis said to bedefiniteifT₁₂ is aDL-LiteRDFSTBox (recall the definition of definite TBoxes in Section 2.1). We use Udefto denote the class of definite mappings. Then, as our first result, we obtained that the chase can be used to compute universal solutions for definite mappings and DL- LiteRDFSTBoxes. In what follows, letchase_T,Σ(A)denote the projection ofchase_T(A) on the signatureΣ.

Proposition 2. LetM = (Σ1, Σ2,T12)be a definite mapping andK1 = hT1,A1ia DL-LiteRDFS KB overΣ1. Then h∅,chase_T₁₂,Σ₂(chase_T₁(A1))iis a universal solution forK1underM.

Thus, given that for definite mappings andDL-LiteRDFS TBoxes, the setschase_T₁(A1) andchase_T₁₂,Σ₂(chase_T₁(A1))are always finite and can be computed in polynomial time [4], we obtain as a corollary of Propositions 1 and 2 that the problems of computing solutions defined above can be solved in polynomial time for definite mappings andDL- LiteRDFSTBoxes.

Corollary 1. COMPSOL(Udef,DL-LiteRDFS), COMPUNIVSOL(Udef,DL-LiteRDFS), COMPCQSOL(Udef,DL-LiteRDFS), and COMPUNIVCQSOL(Udef,DL-LiteRDFS) can all be solved in polynomial time.

Unfortunately, the solutions obtained by using Proposition 2 are of little interest to us because the generated target ABoxes can be very large. Hence, we turn our attention to the case of non-empty target TBoxes and, more specifically, to the problem of computing universal CQ-solutions that include as much implicit knowledge as possible. This gives rise to the followingseparation properties.

Definition 4. LetM= (Σ₁, Σ₂,T₁₂)be a mapping andT₁a TBox overΣ₁.

– T₁ is representableinM if there exists a TBoxT₂ over Σ₂ such that for every ABoxA1overΣ1, it holds thathT2,chase_T₁₂,Σ₂(A1)iis a universal CQ-solution forhT1,A1iunderM.T2is called arepresentationofT1inM.

(7)

– T₁isweakly representableinMif there exists a mappingM^? = (Σ₁, Σ₂,T₁₂^?) such thatT₁₂⊆ T₁₂^?,T₁∪ T₁₂|=T₁₂^? andT₁is representable inM^?.

The separation problems are interesting to us because a positive answer would mean that we can construct the TBox of a solution by considering only the input TBox and the mapping, independently of the input ABox. On the other hand, the ABox of the solution can be found simply by translating the input ABox according to the rules in the mapping (and the input TBox). We illustrate the notions just defined in the following example.

Example 2. LetΣ₁ = {A₁, B₁},Σ₂ = {A₂, B₂}, and T₁ = {B₁ v A₁}. IfM = (Σ₁, Σ₂,T₁₂)withT₁₂={A₁vA₂, B₁vB₂}, then we have thatT₂={B₂ vA₂} is a representation ofT₁inM.

On the other hand, ifM⁰ = (Σ1, Σ2,T₁₂⁰ )withT₁₂⁰ ={A1 vA2}, then we have thatT1is not representable inM⁰: letA⁰₁ ={B1(a)}, thenchase_T⁰

12,Σ₂(A⁰₁) = ∅and for no TBox T₂⁰,hT₂⁰,chase_T⁰

12,Σ₂(A⁰₁)iis a universal CQ-solution tohT1,A⁰₁iunder M⁰. However, if we considerT₁₂^? =T₁₂⁰ ∪ {B1 vA₂}, we conclude thatT1is weakly representable in M⁰ since T₁₂⁰ ⊆ T₁₂^?, T1∪ T₁₂⁰ |= T₁₂^? andT1 is representable in M^?= (Σ₁, Σ₂,T₁₂^?)(in fact,∅is a representation ofT₁inM^?). Note, thatT₁₂⁰ ⊂ T₁₂. Now, ifM⁰⁰ = (Σ1, Σ2,T₁₂⁰⁰)withT₁₂⁰⁰ = {A1 v A2, B1 v B2, C1 v B2}, then again we have thatT1 is not representable inM⁰⁰: let A⁰⁰₁ = {B1(a), C1(c)}, thenchase_T⁰⁰

12,Σ₂(A⁰⁰₁) = {B2(a), B2(c)}. The query q() ←A2(a)evaluates to true inhT1∪ T₁₂⁰⁰,A⁰⁰₁i, hence in order for a TBoxT₂⁰⁰to be a representation ofT1inM⁰⁰, it must contain the inclusionB₂ v A₂. On the other hand, it implies that the query q⁰()←A₂(c)evaluates to true inhT₂⁰⁰,chase_T₁₂⁰⁰,Σ₂(A⁰⁰₁)i, whereas it evaluates to false inhT1∪T₁₂⁰⁰,A⁰⁰₁i. However, again if we considerT₁₂^??=T₁₂⁰⁰∪{B1vA₂}, we conclude thatT₁is weakly representable inM⁰⁰.

4 Techniques for Deciding Representability

We address now the problem of deciding (weak) representability of a TBox in a mapping, for the case where TBoxes are expressed inDL-LiteRDFSand mappings are definite.

We start by showing some properties that characterize solutions in terms of chases.

In the following for two chasesC1 andC2, ahomomorphism fromC1 to C2 is a mappinghfrom the individuals and labeled nulls inC1to those inC2such that(1)his the identity on the individuals,(2)ifA(t)∈ C₁thenA(h(t))∈ C₂, and(3)ifP(t, t⁰)∈ C₁thenP(h(t), h(t⁰))∈ C₂. We writeC₁→ C₂if there is a homomorphism fromC₁to C2, andC1C2ifC1→ C2andC2→ C1.

From now on, we assumeΣ1andΣ2as given, and for a mappingM= (Σ1, Σ2,T12), we use simplyMto denote alsoT12. Then we have the following characterizations of solutions in terms of chases, which are similar to the characterizations in [1].

Proposition 3. LetMbe a definite mapping,K1 = hT1,A1ia DL-LiteRDFS KB over Σ₁, andK2 = hT2,A2ia DL-LiteRDFS KB overΣ₂. If chaseM,Σ2(chase_T₁(A1)) → chaseT2(A₂),thenK₂is a CQ-solution forK₁underM.

(8)

Corollary 2. LetMbe a definite mapping,K₁ =hT₁,A₁ia DL-LiteRDFSKB overΣ₁, andK₂=hT₂,A₂ia DL-LiteRDFSKB overΣ₂. ThenK₂is a universal CQ-solution for K₁underMif chaseM,Σ2(chase_T₁(A₁))chaseT2(A₂).

4.1 Checking Representability of a TBox in a Mapping

We address now therepresentabilityproblem, which is to decide, given a TBoxT1over Σ1and a mappingM, whetherT1is representable inM(cf. Definition 4).

We start by considering the decision problem associated with representability:

Given a mappingM, a TBoxT₁overΣ₁, and a TBoxT₂overΣ₂, check whetherT₂ is a representation ofT₁inM, i.e., for each ABoxA₁overΣ₁,hT₂,chaseM,Σ2(A₁)i is a universal CQ-solution forhT₁,A₁iunderM.

For definite mappings andDL-LiteRDFSTBoxes, the decision problem associated with representability can be solved in two steps:

1. Check whether for each ABoxA1overΣ1,hT2,chaseM,Σ2(A1)iis a CQ-solution forhT1,A1iunderM.

2. Check whether for each ABoxA1overΣ₁and for each CQqoverΣ₂, we have thatcert(q,hT₂,chaseM,Σ2(A₁)i)⊆cert(q,hT₁∪ M,A₁i).

If both checks succeed, thenT₂is a representation ofT₁inM, otherwise it is not. We develop now techniques to perform these two tests.

Step 1:Checking whether for each ABoxA1overΣ1,hT2,chaseM,Σ₂(A1)iis a CQ- solution forhT1,A1iunderM.

We introduce now some notions that help us to characterize when a mapping is able to

“translate” the inclusions inT1to the target TBox.

We usepn(X)to denoteAifX isA, andP ifXis∃P,∃P⁻,P, orP⁻. ForX, Y DL-LiteRDFSexpressions, we say thatX iscompatible withY if(i)pn(X) =pn(Y), and(ii)ifXis∃R, thenY is∃R,R, orR⁻. For aDL-LiteRDFSinclusionα=X1vX2, we uselhs(α)to denoteX1, andrhs(α)to denoteX2. We say thatαisleft-compatible (resp.,right-compatible) withX, ifXis compatible withlhs(α)(resp.,rhs(α)).

LetMbe a definite mapping,α = N1 v M1 aDL-LiteRDFS inclusion over Σ1, and µ ∈ M left-compatible with M₁. Then the translation set of α and µ inM, denoted M(α, µ,M), is the set ofDL-LiteRDFS inclusions over Σ₂ such that, if there

Table 1.Definition ofM(α, µ,M).Ai,Eiare atomic concepts,Ri,Siare basic roles.

α µ ν β

A1vE1 E1vE2 A1vA2 A2vE2

∃R1vE1 E1vE2 ∃R1vA2 A2vE2

R1vR2 ∃R2vE2

R1vS1 S1vS2 R1vR2 R2vS2

∃S1vE2 ∃R1vA2 A2vE2

R1vR2 ∃R2vE2

∃S1−vE2 ∃R1−vA2 A2vE2

R1vR2 ∃R2−

vE2

(9)

exists an inclusionν inMleft-compatible withN₁, thenβ ∈M(α, µ,M), whereβis uniquely defined byα,µ,ν as shown in Table 1. When the mappingMis clear from the context, we writeM(α, µ). We say thatαisredundantw.r.t.µ=M₁⁰ vM₂(inM) ifM2vM2∈M(α, µ). We explain these definitions in an example.

Example 3. Letα=S1vR1andµ=∃R1vA2.

∃S₁⁻

∃S1

S1

∃R⁻₁

∃R1

R1

α

∃S⁻₂

∃S2

S2

E2

A2

ν ν⁰

µ

ν⁰⁰ β

β⁰

Letν = S₁ v S₂andν⁰ = ∃S1 v E₂be inM. Then{∃S2 v A₂, E₂ v A₂} ⊆ M(α, µ,M). Ifν⁰⁰=∃S1vA₂∈ M, thenαis redundant w.r.t.µ.

If none ofν,ν⁰andν⁰⁰is inM, thenM(α, µ,M)is empty.

With these notions in place, we can characterize CQ-solutions in the context of the representability problem.

Proposition 4. LetMbe a definite mapping,T1a DL-LiteRDFSTBox overΣ1, andT2a DL-LiteRDFS TBox overΣ2. Then for each ABoxA1, the KBhT2,chase_M,Σ₂(A1)iis a CQ-solution forhT1,A1iunderMif and only if for each inclusionα, s.t.T1|=α, and for each inclusionµ∈ Mleft-compatible withrhs(α), there existsβ ∈M(α, µ)such thatT2|=β.

Step 2:Checking whether for each ABoxA1overΣ1, and for each CQqoverΣ2, we have thatcert(q,hT2,chase_M,Σ₂(A1)i)⊆cert(q,hT1∪ M,A1i).

Recall the definition of the translation set. Now, letβ = N₂ v M₂ be aDL-LiteRDFS

inclusion overΣ₂, andν ∈ Mright-compatible withN₂. Then thereverse-translation setofβ andν inM, denotedM⁻(β, ν,M), is the set ofDL-LiteRDFS inclusions over Σ1such that, if there exists an inclusionµinMright-compatible withM2, thenα∈ M⁻(β, ν,M), whereαis uniquely defined byβ,ν, andµas shown in Table 1.

Proposition 5. LetMbe a definite mapping,T1a DL-LiteRDFS TBox overΣ1, andT2

a DL-LiteRDFSTBox overΣ2. Then for each ABoxA1overΣ1and for each CQqover Σ2,cert(q,hT2,chaseM,Σ₂(A1)i) ⊆ cert(q,hT1∪ M,A1i)if and only if for each inclusionβ, s.t.T2|=β, and for each inclusionν ∈ Mright-compatible withlhs(β), there existsα∈M⁻(β, ν), s.t.T1|=α.

With the techniques for Steps 1 and 2 at hand, we are ready to characterize repre- sentations.

(10)

Corollary 3. LetMbe a definite mapping,T₁a DL-LiteRDFS TBox overΣ₁, andT₂a DL-LiteRDFSTBox overΣ₂. Then for each ABoxA₁overΣ₁,hT₂,chaseM,Σ2(A₁)iis a universal CQ-solution forhT₁,A₁iunderMiff

– for each inclusionα, s.t.T1 |=α, and for each inclusionµ ∈ Mleft-compatible withrhs(α), there existsβ∈M(α, µ)s.t.T2|=β, and

– for each inclusionβ, s.t.T2 |=β, and for each inclusionν ∈ Mright-compatible withlhs(β), there existsα∈M⁻(β, ν), s.t.T1|=α.

Now, we can check whether a givenT1is representable in a givenM.

Theorem 1. LetMbe a definite mapping andT1a DL-LiteRDFS TBox overΣ1. Then we can check whetherT1is representable inMin polynomial time.

4.2 Checking Weak Representability

We can easily solve the weak representability problem forDL-LiteRDFSKBs even if the mappings are arbitrary tgds, i.e., assertions of the formq₁→q₂, mapping a CQq₁over Σ₁to a CQq₂overΣ₂of the same arity asq₁. We call such a mapping a tgd-mapping.

LetT₁be aDL-LiteRDFS TBox andMa tgd-mapping. We can enrichMby compiling knowledge fromT₁into it:

– Letq1 →q2 be a tgd inM, withq1a CQ overΣ1 andq2a CQ overΣ2. Let the UCQQ1={q₁¹, . . . , q₁^k}be the perfect reformulation ofq1w.r.t.T1(see, e.g., [4]).

Then we extendMwith a tgdq₁ⁱ →q2for eachq₁ⁱ ∈Q1.

– We perform this for each tgd inM. The resulting mapping is denoted withM^∗. Proposition 6. Let M be a tgd-mapping,T1 a DL-LiteRDFS TBox overΣ1, andM^∗ the mapping constructed as described above. Then for each ABoxA1overΣ1, the KB h∅,chase_M^∗,Σ₂(A1)iis a universal CQ-solution forhT1,A1iunderM.

From this we immediately get:

Theorem 2. LetMbe a definite mapping andT1a DL-LiteRDFSTBox overΣ₁. ThenT1

is weakly representable inM.

Thus, when the source KB is expressed inDL-LiteRDFSit is always possible to sepa- rate data by enriching mappings, even if mappings are tgds. WhenMis a set of tgds, the size ofM^∗might be exponential in the size ofM. IfMis aDL-LiteRDFSmapping, the size ofM^∗is always polynomial in the size ofM.

5 Conclusions

We have specialized the framework for KB exchange proposed in [1] to the case of DLs, i.e., the source and target KBs are given as DL KBs and the mappings have the form of DL TBox assertions. Moreover, we have defined a new reasoning problem: representability of a TBox in a mapping, whose goal is to compute from a source TBox and a mapping, a target TBox that leads to a universal (CQ-)solution when it is combined with a suitable ABox computed from the source ABox, independently of the

(11)

actual source ABox. A variation of this problem, called weak representability, allows for modification of the mapping, so that the source TBox becomes representable.

Then, we have developed first results and techniques for KB exchange and for the representability problem in the case of mappings and KBs expressed in DL-LiteRDFS

(such mappings are called definite).DL-LiteRDFS is a fragment ofDL-Lite_R that does not allow for existential quantification (i.e., concepts of the form∃R) in the right-hand side of concept inclusions, nor for disjointness assertions. It implies, first, that the chase is always finite inDL-LiteRDFS, and, second, that KBs are always consistent. We have shown the following results for definite mappings and DL-LiteRDFS KBs:(i)the problems of computing (universal) (CQ-)solutions can be solved in polynomial time;(ii)the problem of representability of a TBox in a mapping is decidable in polynomial time;

(iii)everyDL-LiteRDFSTBox is weakly representable in a definite mapping.

The main direction for future work is to extend the results to the case of fullDL- LiteR. This brings up the problem of labelled nulls in the chase, which becomes infinite in general. Moreover, due to the possible presence of disjointness assertions in TBoxes, consistency of the source and target KBs has to be checked.

Acknowledgments

The authors are grateful to Vladislav Ryzhikov and Evgeny Sherkhonov for helpful discussions, and to the anonymous referees for many helpful comments. The authors were supported by Marie Curie action “International Research Staff Exchange Scheme (IRSES)”, FP7-PEOPLE-2009-IRSES, under grant number 24761. M. Arenas was also supported by Fondecyt grant 1090565. E. Botoeva and D. Calvanese were also supported by the EU under the ICT Collaborative Project ACSI (Artifact-Centric Service Interoperation), grant agreement number FP7-257593, and under the large-scale inte- grating project (IP) OntoRule (ONTOlogies meet Business RULEs), grant agreement number FP7-231875.

References

1. M. Arenas, J. P´erez, and J. L. Reutter. Data exchange beyond complete data. In Proc.

of the 30th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS 2011), pages 83–94, 2011.

2. B. Barcel`o. Logical foundations of relational data exchange.SIGMOD Record, 38(1):49–58, 2009.

3. D. Brickley and R. V. Guha. RDF vocabulary description language 1.0: RDF Schema. W3C Recommendation, World Wide Web Consortium, Feb. 2004. Available athttp://www.

w3.org/TR/rdf-schema/.

4. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: TheDL-Litefamily. J. of Automated Reasoning, 39(3):385–429, 2007.

5. R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data exchange: Semantics and query answering.Theoretical Computer Science, 336(1):89–124, 2005.

6. L. Libkin and C. Sirangelo. Data exchange and schema mappings in open and closed worlds.

J. of Computer and System Sciences, 77(3):542–571, 2011.