Rewriting SELECT SPARQL queries from 1:n complex correspondences

(1)

OATAO is an open access repository that collects the work of Toulouse

researchers and makes it freely available over the web where possible

Any correspondence concerning this service should be sent

to the repository administrator:

tech-oatao@listes-diff.inp-toulouse.fr

This is an author’s version published in:

https://oatao.univ-toulouse.fr/22203

To cite this version:

Thiéblin, Elodie and Amarger, Fabien and Haemmerlé, Ollivier and

Hernandez, Nathalie and Trojahn, Cassia Rewriting SELECT

SPARQL queries from 1:n complex correspondences. (2016) In:

11th International Workshop on Ontology Matching, collocated with

the 15th International Semantic Web Conference (OM 2016), 18

October 2016 - 18 October 2016 (Kobe, Japan).

Open Archive Toulouse Archive Ouverte

Official URL

(2)

Rewriting SELECT SPARQL queries from 1:n

complex correspondences

Élodie Thiéblin, Fabien Amarger, Ollivier Haemmerlé, Nathalie Hernandez, Cassia Trojahn

IRIT & Université de Toulouse 2 Jean Jaurès, Toulouse, France elodie.thieblin@gmail.com,{fabien.amarger, ollivier.haemmerle,

nathalie.hernandez,cassia.trojahn}@irit.fr

Abstract. This paper presents a mechanism for rewriting SPARQL queries based on complex ontology correspondences. While the useful-ness of simple correspondences, involving single entities from both source and target ontologies, has long been recognized, query rewriting requires more expressive links between ontology entities expressing the true rela-tionships between them. Here, complex correspondences, in the format 1:n, between overlapping ontologies are exploited for rewriting SELECT SPARQL queries, so that they can be expressed over different RDF data sets in the Linked Open Data. Our approach has been evaluated using two data sets, one from the agriculture domain and another based on a reduced set involving the ontologies from the OAEI Conference track.

1 Introduction

A SPARQL query is intrinsically related to the ontological model that describes the RDF source. To federate knowledge from different sources described by var-ious ontologies, a SPARQL query must be adapted to each of the knowledge bases. In order to use the Linked Open Data potential at its best, it is impor-tant to bridge the gap of semantic heterogeneity between knowledge bases. On-tology matching [5] is a solution for finding correspondences (i.e., an alignment) between two ontologies. There are two types of correspondences : simple corre-spondences and complex correcorre-spondences. A simple correspondence matches an element from the first ontology to its semantically related ontological element in the second ontology. Nevertheless, simple correspondences cannot cover every case of use because of the model differences between ontological sources. Com-plex correspondences palliate the lack of expressiveness of simple alignments. They extend simple correspondences to correspondences between complex con-structions of ontological entities of the two ontologies.

Simple correspondences can be easily used to transform SPARQL queries. The usual approach (integrated in the Alignment API [3]) is to replace the IRI of an ontological entity in the initial query by the IRI of its corresponding entity. This approach considers that the correspondence stands for an equivalence rela-tion. However by using simple correspondences, not all SPARQL queries can be transformed. In this paper, an approach that exploits complex correspondences

(3)

for SPARQL query transformation is proposed. Even though there are only a few systems able to automatically generate them, manually drawing correspondences used in the proposed mechanism is likely a less fastidious task than manually rewriting every SPARQL query for each new RDF-triple store.

Our approach is based on a set of rules for rewriting a subset of SELECT SPARQL queries from complex correspondences involving an equivalence rela-tion between ontology entities. These correspondences are expressed in EDOAL, a language proposed for representing complex correspondences. The approach has been validated on two data sets. The first one was built to meet the needs of agriculture experts willing to find cross knowledge about agronomic taxons between DBpedia and a dedicated knowledge base. The second data set was in-spired from a subset of queries from the OAEI oa4qa1 task data set and could be further developed in order to enrich this track.

The rest of the paper is structured as follows. §2 introduces ontology matching and the EDOAL syntax. §3 discusses related work and §4 presents the rewriting rules on which our approach is based. §5 discusses the validation of the approach and §6 concludes the paper and presents perspectives for future work.

2 Ontology matching

Matching two ontologies is the process of generating an alignment A between two ontologies O and O0. A is directional, denoted AO→O0 :

Definition 1 (Alignment). An alignment AO→O0 is a set of correspondences

AO→O0 = {c₁, c₂, ..., c_n}, where each c_i is a triple he_O, e0

O, ri :

– if the correspondence ci is simple, both eO and eO0 stand for one and only

one entity (i.e., a class or a property) (1:1);

– if the correspondence is complex, at least one of eO or eO0 involves one or

more entities in a logical formulation. The correspondence is therefore (1:n), (m:1) or (m:n), where eO refers to a subset of elements ∈ O, and eO0 refers

to a subset of elements ∈ O0. The elements of eO, resp. eO0 form a logical

construction using the constructors of a formal language (First-Order Logic or Description Logics);

– r is a relation, e.g., equivalence (≡), more general(w), more specific (v), holding between eO and eO0;

The alignment AO→O0 is said complex if it contains at least one complex

correspondence. A correspondence ci, can also be noted eO r eO0, as for the

complex correspondences in the following.

Chairman ≡ Demo_Chair t OC_Chair t P C_Chair t Session_Chair t T utorial_Chair t W orkshop_Chair (1) Accepted_P aper ≡ P aper u ∃hasDecision.Acceptance (2) 1

(4)

Example 1 expresses a complex correspondence between entities from Cmt2

and Ekaw3 _{ontologies. It states that the concept Chairman of Cmt is}

equiv-alent to the union of the concepts Demo_Chair, OC_Chair, P C_Chair, Session_Chair, T utorial_Chair and W orkshop_Chair of Ekaw. Example 2 expresses a complex correspondence, where the concept Accepted_P aper of Ekaw is equivalent to the concept P aper of Cmt for which the domain of the ob-ject property hasDecision is restricted to an individual of the type Acceptance. For representing complex correspondences, the EDOAL language has been proposed [6, 3]. It is fit to express simple and complex matching of cardinality (1:1), (1:n), (n:1) and (n:m). The entities eO and eO0 are represented by

expres-sions (class, relation or property expresexpres-sions) that can be an ID (or IRI), a con-struction or a restriction. For a detailed description of EDOAL syntax the reader can refer to [6]. We illustrate this syntax with the following examples of 1:n com-plex correspondences given above. We use the prefixes ekaw:<http://ekaw#>, cmt:<http://cmt#>. Example 1 presents a class expression involving a class construction built with a union of concepts, while Example 2 expresses a class expression involving an attribute domain restriction.

Example 1 <entity1> <edoal:Class rdf:about="&cmt;Chairman"/> </entity1> <entity2> <edoal:Class> <edoal:or rdf:parseType="Collection"> <edoal:Class rdf:about="&ekaw;Demo_Chair"/> <edoal:Class rdf:about="&ekaw;OC_Chair"/> <edoal:Class rdf:about="&ekaw;PC_Chair"/> <edoal:Class rdf:about="&ekaw;Session_Chair"/> <edoal:Class rdf:about="&ekaw;Tutorial_Chair"/> <edoal:Class rdf:about="&ekaw;Workshop_Chair"/> </edoal:or> </entity2> <measure rdf:datatype="&xsd;float">1.0</measure> <relation>Equivalence</relation> Example 2 <entity1> <edoal:Class rdf:about="&ekaw;Accepted_Paper"/> </entity1> <entity2> <edoal:Class> <edoal:and rdf:parseType="Collection"> <edoal:Class rdf:about="&cmt;Paper"/> <edoal:AttributeDomainRestriction> <edoal:onAttribute> <edoal:Relation rdf:about="&cmt;hasDecision"/> </edoal:onAttribute> <edoal:class> <edoal:Class rdf:about="&cmt;Acceptance"/> </edoal:class> </edoal:AttributeDomainRestriction> </edoal:and> </edoal:Class> </entity2> <measure rdf:datatype="&xsd;float">1.0</measure> <relation>Equivalence</relation>

Additional examples of correspondences expressed in EDOAL are presented in examples 3 and 4. Example 3 shows a correspondence where the relation writtenBy of Ekaw is equivalent to a relation expression constructed with the inverse of the relation writeP aper in Cmt. Example 4, involving the ConfOf4

ontology (prefix confOf:<http://confOf#>), gives an example of a class ex-pression constructed with an attribute value restriction stating that in Ekaw an Early-Registered_P articipant is a P articipant for which the value of the data property earlyRegistration is equal to true in ConfOf.

2_{http://oaei.ontologymatching.org/2015/conference/data/cmt.owl} 3

http://oaei.ontologymatching.org/2015/conference/data/ekaw.owl 4

(5)

Example 3 <entity1> <edoal:Relation rdf:about="&ekaw;writtenBy"/> </entity1> <entity2> <edoal:Relation> <edoal:inverse> <edoal:Relation rdf:about="&cmt;writePaper"/> </edoal:inverse> </edoal:Relation> </entity2> <measure rdf:datatype="&xsd;float">1.0</measure> <relation>Equivalence</relation> Example 4 <entity1> <edoal:Class rdf:about="&ekaw;Early-Registered_Participant"/> </entity1> <entity2> <edoal:Class> <edoal:and rdf:parseType="Collection"> <edoal:Class rdf:about="&confOf;Participant"/> <edoal:AttributeValueRestriction> <edoal:onAttribute> <edoal:Property rdf:about="&confOf;earlyRegistration"/> <edoal:onAttribute> <edoal:comparator rdf:resource="&edoal;equals"/> <edoal:value>

<edoal:Literal edoal:type="xsd;boolean" edoal:string="true"/> </edoal:value> </edoal:AttributeValueRestriction> </edoal:and> </edoal:Class> </entity2> <measure rdf:datatype="&xsd;float">1.0</measure> <relation>Equivalence</relation>

3 Related Work

A naive approach for rewriting SPARQL queries consists in replacing the IRI of an entity of the initial query by the corresponding IRI in the alignment, using simple correspondences. This approach is integrated in the Alignment API [3]. However, it does not take into account the specific kind of relation expressed in the correspondence (e.g., generalisation or specialization). The approach in Euzenat et al. [4] aims at writing CONSTRUCT SPARQL queries from complex alignments. A new knowledge base expressed with the source ontology vocabu-lary is populated with the instances of the target knowledge base. A rewriting approach not limited to queries of type CONSTRUCT and that takes advantage of complex (1:n) alignments has been proposed by Correndo et al. [1]. It applies a declarative formalism for expressing alignments between RDF graphs. In [2], a subset of EDOAL expressions are transformed into a set of rewriting rules. The expressions involving the restrictions on concepts and properties and the restrictions on property occurrences and values are not featured in the rewriting rules. Makris et al. [9, 8] present the SPARQL-RW rewriting framework that ap-plies a set of predefined rules for (complex) correspondences. They define a set of correspondence types on which the rewriting process is based (i.e., Class Ex-pression, Object Property ExEx-pression, Datatype Property, and Individual ). Zheng et al. [14] propose a rewriting algorithm that serves the purpose of context (i.e, units of measure) interchange for interoperability. Finally, Gillet et al. [7] pro-pose an approach for rewriting query patterns that describe query families, using complex alignments. In this paper, we propose a set of rules for automatically rewriting SPARQL queries based on complex alignments. Differently from [4], our approach rewrites SPARQL queries instead of writing them from a com-plex alignment. Unlike [2], the proposed mechanism can handle restrictions on concepts and relations. In comparison with the [9] approach, EDOAL is an al-ternative to represent alignments in a more expressive (thus complete) way. For

(6)

instance, we propose occurrence and property datatype restrictions translation rules. However, our approach is limited to (1:n) complex alignments and does not handle initial SPARQL queries containing filters, unions, or other SPARQL options. The approach is based on the assumption that the queries to be trans-formed aim at retrieving new instances to meet a certain need. This is why only T box elements are taken into account. [14] focuses on context correspondences while our approach intends to translate all T box elements of a query. Finally, the proposal of [7] rewrites query patterns while we are interested in rewriting SPARQL queries, in a different level of abstraction. Although our approach relies on complex correspondences, their generation is out of the scope of this paper. The reader can refer to [13, 11] on the generation of complex correspondences based on patterns, linguistic approaches [12], or query mining [10].

4 SPARQL queries reformulation approach

Our approach focuses on the reformulation of a subset of SELECT SPARQL queries. We consider initial queries, which are to be rewritten, of the type:

QO = SELECT DIST IN CT ? ( V ar + |0∗0 ) W HERE { TQO }

where V ar corresponds to the set of variables used as projection attributes and TQO stands for the query pattern made of triples expressed using the source

ontology O. A triple t of TQO is composed of a subject s, a predicate p and an

object o. ∀t ∈ TQO, t = hs, p, oi. We only consider triples where s is a variable.

The purpose of our approach is to produce the set TQO0 that contains the

triples expressed according to entities of the ontology O0, from TQO, by using the

complex alignment AO→O0. The approach is limited to complex correspondences

(1:n) establishing an equivalence relation between entities of same nature. Such correspondences involve EDOAL expressions, as follows :

We also make the assumption that the alignment is complete and covers all the correspondences required to transform the entities of TQO. We define rules

that take the set of triples TQO as input and generate a SPARQL query. Three

types of triples in TQO are considered : Class Object Triples, Predicate Triples

and Other Triples.

Algorithm 1 depicts the SPARQL query rewriting process. The rewrite-ClassObject and rewritePredicate functions apply the rules described in the following sections. These functions are recursive and can call each other. If a triple is not a Class Object Triples or a Predicate Triples, it means that its sub-ject s is a variable, its predicate is an obsub-ject property or a data property for which no correspondence is needed and its object is either a literal or a variable. This kind of triple does not need any transformation and is directly added to the final query. An example of such triple is ?s rdfs:label "a literal".

(7)

Algorithm 2 Rewriting mechanism process

new_query ← " "

for all triple t = hs, p, oi in query do if t is a Class Object Triple then

new_query ← new_query + rewriteClassObject(s,p, oO0)

else if t is a Predicate Triple then

new_query ← new_query + rewritePredicate(s,pO0, o)

else

new_query ← new_query + t end if

end for

return new_query

4.1 Class Object Triples

Class object triples, denoted T_QClass_O , are structured as

∀t ∈ TClass QO , t = hs, p, oOi , where          s is a variable p is rdf:type oO is a ClassID ∃ < oO, oO0, ≡> ∈ A_O→O0

A class triple is identified if its object oO is a ClassID and if there is a

corre-spondence linking oO to a class expression oO0 in A_O→O0. In the transformation

of a class triple t, its subject s and its predicate p remain the same. Only its ob-ject oO is transformed according to its equivalent element oO0 in the alignment.

The transformation rules of the rewriteClassObject function depend on the nature of the expression oO0, as follows:

1. ClassID : The expression oO0 is aClassID. The transformed triple return by

the function is: s p IRI(oO0).

2. ClassConstruction: oO0 is a class construction between two or more class

expressions denoted by: e1

O0, e2_O0, en_O0. The transformation rule depends on

the construction operator.

(a) AND: transforming an intersection consists in rewriting each triplet hav-ing as subject s, as predicate p and as object a distinct ei

O0 expression:

rewriteClassObject(s, p, e1

O0) + rewriteClassObject(s, p, e2_O0) +

... + rewriteClassObject(s, p, en_O0)

(b) OR: transforming a union consists in using the SPARQL keyword “UNION” between the rewriting of each triplet having as subject s, as predicate p and as object a distinct ei_O0:5

{ + rewriteClassObject(s, p, e1_O0) + } UNION { +

rewriteClassOb-ject(s, p, e2_O0) + } +...+ UNION { + rewriteClassObject(s, p, en_O0) +}

Table 1 shows an example of query rewriting based on the complex cor-respondence of Example 1, involving a class construction with OR. 5

(8)

Query for Cmt Transformed query for Ekaw SELECT ?z WHERE { ?z rdf:type cmt:Chairman. } SELECT ?z WHERE { {?z rdf:type ekaw:Demo_Chair. } UNION {?z rdf:type ekaw:OC_Chair. } UNION {?z rdf:type ekaw:PC_Chair. } UNION {?z rdf:type ekaw:Session_Chair. } UNION {?z rdf:type ekaw:Tutorial_Chair. } UNION {?z rdf:type ekaw:Workshop_Chair. }} Table 1. Transformation of a class triple based on the correspondence of Example 1 between a classID and a class construction using the OR rule.

(c) NOT: finding the negation of a class expression consists in finding the set of triples hs, p, vi, where v is an intermediate variable, and from which the triples hs, p, e1

O0i are removed :

s p v . MINUS { + rewriteClassObject(s, p, e1 O0) + }

3. ClassRestriction : Restriction on class expressions takes into account relation or property expressions noted relation(oO0) or property(o_O0). The

transfor-mation of the class triple depends on the nature of the restriction:

(a) TypeRestriction: this restriction applies to a property expression stated in oO0 that limits the datatype of the property to a given type. The

trans-formation rule consists in using an intermediate variable v that becomes the object of a new triple (that will keep on being rewritten according to the nature of property(oO0)). The type restriction is applied to v with

the use of a SPARQL FILTER and the datatype(v) function:

rewritePredicate(s, property(oO0), v) + FILTER (datatype(v) = type)

(b) DomainRestriction: this restriction limits the range of a relation expres-sion stated in oO0 to a class expression range(o_O0) also stated in o_O0.

The rewritePredicate function is called with the relation relation(oO0)

between the subject s and an intermediate variable v. The rewrite-ClassObject function is called to assert that v is an instance of the range(oO0) class expression : rewritePredicate(s, relation(o_O0), v) +

rewriteClassObject(v, rdf:type, range(oO0))

Table 2 presents a query transformation example based on the corre-spondence of Example 2 involving this kind of restriction.

Query for Ekaw Transformed query for Cmt SELECT ?z WHERE { ?z rdf:type ekaw:Accepted_Paper. } SELECT ?z WHERE { ?z rdf:type cmt:Paper. ?z cmt:hasDecision ?var_temp. ?var_temp rdf:type cmt:Acceptance. } Table 2. Transformation of a class triple based on the correspondence on Example 2 between a classID and a class expression using the DomainRestriction rule.

(c) ValueRestriction: this restriction applies to a relation or property ex-pression. The rewritePredicate function is called between the subject s, the relation(oO0) or property(o_O0) and an intermediate variable v.

(9)

To restrain the values that can be taken by v, a SPARQL “FILTER” is used to compare v to a value given in the class expression. In the ac-tual implementation, the stated value value can only be a literal or an instance. The comparator cp used in the SPARQL FILTER is one of the comparators provided by the EDOAL syntax : “=”, “>” and “<”.

rewritePredicate(s, relation/property(oO0), v) + FILTER (v cp value)

where cp ∈ {=, <, >}. For a “=” comparator, the resulting query is not optimal in terms of performance. The rewriting rule exception could be : rewritePredicate(s, relation/property(oO0), value) instead of using an

intermediate variable and a FILTER that applies to it. Table 3 presents a transformation example based on the correspondence of Example 4.

Query for Ekaw Transformed query for ConfOf SELECT ?z WHERE { ?z rdf:type ekaw:Early-Registered_Participant. } SELECT ?z WHERE { ?z rdf:type confOf:Participant. ?z confOf:earlyRegistration ?var_temp. FILTER(?var_temp="true"ˆˆ xsd:boolean).} Table 3. Transformation of a triple using the correspondence of Example 4 between a class ID and a class expression using the ValueRestriction rule.

(d) AttributeOccurrenceRestriction: this restriction restrains the number of occurrences of a relation or a property expression. In order to count this number of occurrences, a SPARQL SELECT is imbricated to link the subject s to the count countv of an intermediate variable v. The

value of countv is calculated thanks to the SPARQL COUNT function.

The graph pattern in the imbricated SELECT is represented by the call of rewritePredicate(s, relation/property(oO0), v). After the imbricated

SELECT, a FILTER limits the value of countv to the restriction value

valrest with the comparator cp (both stated in the class restriction).

{{SELECT s (COUNT(v) AS countv) WHERE

{ + rewritePredicate(s, relation/property(oO0), v) + }

GROUP BY s.}

FILTER (comptev cp valrest)} , where cp ∈ {=, <, >}

Here, the resulting query could be optimized for a relation or a prop-erty occurring at least once (count > 0) . Instead of having an imbricated

SELECT, the rewriting rule could be: rewritePredicate(s, relation/property(oO0), v)

with v a temporary variable. 4.2 Predicate Triples

Predicate triples, denoted by T_QP redicate_O have the following structure :

∀t ∈ TP redicate QO t = hs, pO, oi , where          s is a variable pO = a RelationId or PropertyId

o is a variable, an instance or a literal ∃ < pO, pO0, ≡>∈ A_O→O0

(10)

In predicate triples, pO is either a RelationId or a PropertyId and pO0 is

respectively a relation expression or a property expression. A relation triple is transformed according to the nature of the expression pO0.

1. RelationId or PropertyId : the following triple is added to TQO0: s IRI(pO0) o.

2. RelationConstruction or PropertyConstruction: pO0 is a construction

be-tween relation or property expressions designated by p1_O0, p2_O0, pn_O0. The

transformation of the relation triple depends on the operator of the con-struction.

(a) AND: this construction can be between two or more expressions (relation expressions resp. property expressions). The rewritePredicate function is called as follows:

rewritePredicate(s, p1

O0, o) + rewritePredicate(s, p2_O0, o) + ...

+ rewritePredicate(s, pn O0, o)

(b) OR: this construction can be between two or more expressions (relation expressions resp. property expressions). A SPARQL UNION links the calls to rewritePredicate:

{ + rewritePredicate(s, p1_O0, o) +} UNION { + rewritePredicate(s, p2_O0, o)

+ } + ... + UNION { + rewritePredicate(s, pn_O0, o) + }

(c) NOT: the negation of a relation is the subset of all relations minus this relation. To represent all relations an intermediate variable v is intro-duced. The negation will be done using a SPARQL MINUS:

s v o . MINUS { + rewritePredicate(s, p1

O0, o) + }

(d) COMPOSE: a relation composition is a relation chain. Intermediate vari-ables v1, v2, etc. are introduced to complete the chain between the

sub-ject s and the obsub-ject o. If the relation expression pO0 is a

RelationExpres-sion, all the expressions of the chain will be relation expressions. If pO0

is a PropertyExpression, all the expressions of the chain will be relation expressions except the last one that will be a property expression. We assume that a composition imbrication or the use of a negation inside a composition is a modeling problem in the alignment itself.

rewritePredicate(s, p1_O0, v1) + rewritePredicate(v1, p2O0, v2)

+ ... + rewritePredicate(vn−1, pnO0, o)

(e) INVERSE : this construction only applies to a RelationExpression. In-verting a relation consists in switching its subject and its object in a triple. rewritePredicate(o, p1

O0, s) Table 4 gives an example of a triple

transformation based on the correspondence of Example 3.

(f) REFLEXIVE : this construction only applies to a RelationExpression. This operator is used to specify that a relation links its subject s to itself.

rewritePredicate (s, p1_O0, s)

(g) SYMMETRIC : this construction only applies to a RelationExpression. This operator is used to specify that a relation is used both ways : it is the intersection of a relation and its inverse.

rewritePredicate(s, p1_O0, o) + rewritePredicate(o, p1_O0, s)

3. RelationDomainRestriction or PropertyDomainRestriction: these restrictions limit the domain of a relation or property to a class expression domain(p1

O0)

(11)

rewriteClassObject(s, rdf:type, domain(pO0))

4. RelationCoDomainRestriction: this restriction restrains the range of a Rela-tionExpression to a class expression range(pO0).

rewriteClassObject(o, rdf:type, range(pO0))

5. PropertyTypeRestriction: this restriction limits the datatype of a property to a given type type in the property expressionO0. A SPARQL FILTER with

the datatype(o) function is used. FILTER (datatype(o) = type)

Query for Ekaw Transformed query for Cmt SELECT ?z WHERE {

?paper :writtenBy ?author. }

SELECT ?z WHERE{

?author cmt:writePaper ?paper. } Table 4. Transformed of a triple using the correspondence of Example 3 between a relation ID and a relation construction with the INVERSE constructor.

5 Validation

As far as we know, there is no available data set consisting of two knowledge bases, a complex and complete alignment between two ontologies and corre-sponding SPARQL queries for both bases. In the context of the OAEI oa4qa6_{, a}

data set involving simple alignments is available. In order to fill this gap, we have manually created two data sets, following the principle of the oa4qa task. The validation of our mechanism checks that the translated query retrieves the same results as the reference query. Although these data sets only contain a small number of queries, it serves as a basis for a first validation of our approach.

Knowledge bases and SPARQL queries. The first data set was built during a project aiming at collecting knowledge about plant taxonomy. To meet this need, the knowledge bases Agronomic Taxon7 and DBpedia have been con-sidered. The task consists of retrieving answers to the following queries:

– qa1: which are the taxa of type species ?

– qa2: which are the taxa having for higher taxonomic rank a family taxon ? – qa3: which are the taxa of taxonomical rank kingdom ?

– qa4: which are the taxa of taxonomical rank order ? – qa5: which are the taxa of taxonomical rank genus ?

In order to build this data set, reference SPARQL queries corresponding to the natural language description above have been written manually for each knowl-edge base. The same approach was followed to construct the second data set. It aims at interrogating a subset of the OAEI 2015 ontologies about conference or-ganization8. Three ontologies of this data set were considered (Cmt, ConfOf and Ekaw). We have defined the SPARQL queries answering the following queries :

– qb1: which are the reviewers of accepted papers ? (Ekaw to Cmt) 6_{http://oaei.ontologymatching.org/2015/}

7

http://ontology.irstea.fr/AgronomicTaxon 8

(12)

– qb2: which are authors of long submissions ? (Ekaw to Cmt)

– qb3: which are the chairmen who have submitted a paper ? (Cmt to Ekaw) – qc1: which are the early registered participants who authored a submitted

paper ? (Ekaw to ConfOf)

– qc2: which are the late registered participants who wrote a poster ?(Ekaw to ConfOf)

The three ontologies were populated with instances meeting these needs. Si-multaneously, the queries were transformed into SPARQL queries specifically written for each of the knowledge bases.

Complex alignments. 10 complex correspondences (and 1 simple) have been manually produced between Agronomic Taxon and DBpedia. 8 simple and 6 complex correspondences have been manually produced between the three ontologies of the Conference data set. The alignments are available online9_.

Discussion. Our validation is based on the manual comparison of the set of results returned from the automatically rewritten query with respect to the results of the reference query. Even though the reference query and the rewrit-ten one differ in terms of syntax, they retrieve the same set of instances. For example, Table 5 shows the queries considered for the need qc1 described above. The initial SPARQL query for Ekaw was transformed by using the complex cor-respondences of Example 4 and the simple correspondence ekaw : authorOf ≡ conf Of : writes. As stated above, although the generated query is not syntac-tically equivalent to the reference query for ConfOf, they retrieve the same set of instances. The whole set of rewritten queries is available online10.

Initial query for Ekaw (a) Reference query for ConfOf Generated query for ConfOf

SELECT ?person WHERE { ?person :authorOf ?paper. ?paper a :Paper.

?person rdf:type

:Early-Registered_Participant.}

SELECT ?person WHERE{ ?person

:earlyRegistration true. ?person :writes ?papier. ?papier a :Paper.}

SELECT ?person WHERE { ?person :writes ?paper. ?paper a :Paper.

?person rdf:type :Participant. ?person :earlyRegistration ?v_temp0.

FILTER(?v_temp0 = "true"ˆˆxsd:boolean).} Table 5. Transformation of an initial query in comparison to its reference.

6 Conclusion and perspectives

In this paper, we have presented an approach to rewrite SELECT SPARQL queries formulated for a particular ontology to interrogate a knowledge base based on a second ontology using (1:n) complex correspondences. The proposed approach has been validated on two data sets manually created. There are many improvements to make to this mechanism. Indeed, the approach is limited to

9

https://www.irit.fr/recherches/MELODI/telechargements/alignements.zip 10

(13)

formatted queries composed of triples whose subject is a variable. Instance align-ments are not considered yet. We do not consider as well (n:m) correspondences. Proposals on complex graph pattern recognition in SPARQL queries would be interesting to take into account in order to address that matter. Another point is that we do not distinguish the kind of relation of a correspondence (subsump-tion and equivalence). Moreover, some EDOAL syntax of concepts have not been implemented, such as functions on literal (string concatenation, arithmetic op-erations, etc. that could be used in particular for value restrictions). Finally, property value restrictions is another EDOAL expression that was not imple-mented because it is more likely to be found in (n:m) correspondences. We plan to address all these points in future work.

References

1. Correndo, G., Salvadores, M., Millard, I., Glaser, H., Shadbolt, N.: SPARQL Query Rewriting for Implementing Data Integration over Linked Data. In: 1st Interna-tional Workshop on Data Semantics (DataSem 2010) (March 2010)

2. Correndo, G., Shadbolt, N.: Translating expressive ontology mappings into rewrit-ing rules to implement query rewritrewrit-ing. In: 6th Workshop on Ontology Matchrewrit-ing (2011)

3. David, J., Euzenat, J., Scharffe, F., Trojahn, C.: The Alignment API 4.0. Semantic Web 2(1), 3–10 (2011)

4. Euzenat, J., Polleres, A., Scharffe, F.: Processing ontology alignments with sparql. In: International Conference on Complex, Intelligent and Software Intensive Sys-tems. pp. 913–917 (2008)

5. Euzenat, J., Shvaiko, P.: Ontology Matching. Springer-Verlag, Berlin, Heidelberg (2007)

6. Euzenat, J., Scharffe, F., Zimmermann, A.: Expressive alignment language and implementation. Tech. rep., INRIA (2007), http://hal.inria.fr/hal-00822892/ 7. Gillet, P., Trojahn, C., Haemmerlé, O., Pradel, C.: Complex correspondences for

query patterns rewriting. In: 8th Workshop on Ontology Matching (2013) 8. Makris, K., Bikakis, N., Gioldasis, N., Christodoulakis, S.: SPARQL-RW:

transpar-ent query access over mapped RDF data sources. In: 15th International Conference on Extending Database Technology. pp. 610–613. ACM (2012)

9. Makris, K., Gioldasis, N., Bikakis, N., Christodoulakis, S.: Ontology Mapping and SPARQL Rewriting for Querying Federated RDF Data Sources. In: OTM Confed-erated International Conferences. pp. 1108–1117 (2010)

10. Qin, H., Dou, D., LePendu, P.: Discovering executable semantic mappings between ontologies. In: OTM International Conference. pp. 832–849 (2007)

11. Ritze, D., Meilicke, C., Sváb-Zamazal, O., Stuckenschmidt, H.: A pattern-based ontology matching approach for detecting complex correspondences. In: 4th Work-shop on Ontology Matching (2009)

12. Ritze, D., Völker, J., Meilicke, C., Sváb-Zamazal, O.: Linguistic analysis for com-plex ontology matching. In: 5th Workshop on Ontology Matching (2010)

13. Scharffe, F., Fensel, D.: Correspondence patterns for ontology alignment. In: Knowledge Engineering: Practice and Patterns, pp. 83–92. Springer (2008) 14. Zheng, X., Madnick, S.E., Li, X.: SPARQL Query Mediation over RDF Data

Sources with Disparate Contexts. In: WWW Workshop on Linked Data on the Web (2012)