A Graph-Based Approach for Classifying OWL 2 QL Ontologies


Domenico Lembo, Valerio Santarelli, and Domenico Fabio Savo

Dipartimento di Ing. Informatica, Automatica e Gestionale “Antonio Ruberti”
Sapienza Università di Roma
Via Ariosto 25, I-00186 Roma, Italy
{lembo,santarelli,savo}@dis.uniroma1.it

Abstract. Ontology classification is the reasoning service that computes all subsumption relationships inferred in an ontology between concept, role, and attribute names in the ontology signature. OWL 2 QL is a tractable profile of OWL 2 for which ontology classification is polynomial in the size of the ontology TBox. However, to date, no efficient methods and implementations specifically tailored to OWL 2 QL ontologies have been developed. In this paper, we provide a new algorithm for ontology classification in OWL 2 QL, which is based on the idea of encoding the ontology TBox into a directed graph and reducing core reasoning to the computation of the transitive closure of the graph. We have implemented the algorithm in the QuOnto reasoner and extensively evaluated it over very large ontologies. Our experiments show that QuOnto outperforms various popular reasoners in the classification of OWL 2 QL ontologies.

1 Introduction

Ontology classification is the problem of computing all subsumption relationships inferred in an ontology between predicate names in the ontology signature, i.e., named concepts (a.k.a. classes), roles (a.k.a. object-properties), and attributes (a.k.a. data-properties). It is considered a core service for ontology reasoning, which can be exploited for various tasks, at both design-time and run-time, ranging from ontology navigation and visualization to query answering.

Devising efficient ontology classification methods and implementations is a challenging issue, since classification is in general a costly operation. Most popular reasoners for Description Logic (DL) ontologies, i.e., OWL ontologies, such as Pellet [23], Racer [11], FaCT++ [24], and HermiT [9], offer highly optimized classification services for expressive DLs. Various experimental studies show that such reasoners have reached very good performance through the years. However, they are still not able to efficiently classify very large ontologies, such as the full versions of GALEN [22] or of the FMA ontology [10].

Whereas the above tools use algorithms based on model construction through tableau (or hyper-tableau [9]), the CB reasoner [14] for the Horn-SHIQ DL is a consequence-driven reasoner. The use of this technique allows CB to obtain an impressive gain on very large ontologies, such as full GALEN. However, the current implementation of the CB reasoner is rather specific to particular fragments of Horn-SHIQ (and incomplete for the general case) [14]. For example, it does not allow for classification of properties.

This paper is an extended abstract of [18].

Other recently developed tools, such as Snorocket [17], ELK [15], and JCEL [20], are specifically tailored to intensional reasoning over logics of the EL family, and show excellent performance in the classification of ontologies specified in such languages, which are the logical underpinning of OWL 2 EL, one of the tractable profiles of OWL 2 [21].

Instead, to the best of our knowledge, ontology classification in the other OWL 2 profiles has so far received little attention. In particular, classification in OWL 2 RL has been investigated only in [16], whereas, to date, no techniques have been developed that are specifically tailored to intensional reasoning in OWL 2 QL, the “data oriented” profile of OWL 2, nor for any logic of the DL-Lite family [7]¹, which constitutes the logical underpinning of OWL 2 QL. Our aim is then to contribute to filling this gap for OWL 2 QL, encouraged also by the fact that this language, like all logics of the DL-Lite family, allows for tractable intensional reasoning, and in particular for PTime ontology classification, as immediately follows from the results in [7].

In this paper, we thus provide a new method for ontology classification in the OWL 2 QL profile. In our technique, we encode the ontology terminology (TBox) into a graph, and compute the transitive closure of the graph to then obtain the ontology classification. The analogy between simple inference rules in DLs and graph reachability is indeed very natural: consider, for example, an ontology containing the subsumptions A₁ ⊑ A₂ and A₂ ⊑ A₃, where A₁, A₂, and A₃ are class names in the ontology signature. We can then associate to this ontology a graph having three nodes labeled with A₁, A₂, and A₃, respectively, an edge from A₁ to A₂ and an edge from A₂ to A₃. It is straightforward to see that A₃ is reachable from A₁, and therefore an edge from A₁ to A₃ is contained in the transitive closure of the graph. This corresponds to the inferred subsumption A₁ ⊑ A₃. On the other hand, things soon become much more complicated when complex (OWL) axioms come into play.
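The three-node example can be replayed with a short Python sketch. It is only an illustration under our own assumptions: the function name transitive_closure and the encoding of each subsumption as a (source, target) pair of strings are hypothetical and are not taken from QuOnto, and the naive fixpoint loop is chosen for readability rather than efficiency.

```python
def transitive_closure(edges):
    """Return all (s, t) pairs such that t is reachable from s via one or more edges."""
    closure = set(edges)
    changed = True
    while changed:  # repeat until no new pair can be derived
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# A1 ⊑ A2 and A2 ⊑ A3, encoded as graph edges.
edges = {("A1", "A2"), ("A2", "A3")}
closure = transitive_closure(edges)
print(("A1", "A3") in closure)  # the inferred subsumption A1 ⊑ A3
```

The pair ("A1", "A3") appears in the closure although it is not among the asserted edges, mirroring the inferred subsumption A₁ ⊑ A₃.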

In this respect, we will show that for an OWL 2 QL ontology it is possible to easily construct a graph whose closure constitutes the major sub-task in ontology classification, because it allows us to obtain all subsumptions inferred by the “positive knowledge” specified by the TBox. We will show that the computed classification misses only “trivial” subsumptions inferred by unsatisfiable predicates, i.e., named classes (resp. properties) that have an empty interpretation in every model of the ontology, and that are therefore subsumed by every class (resp. property) in the ontology signature. We therefore provide an algorithm that, exploiting the transitive closure of the graph, computes all unsatisfiable predicates, thus allowing us to obtain a complete ontology classification. We notice that the presence of unsatisfiable predicates in an ontology is mainly due to errors in the design. However, it is not rare to find such predicates, especially in very large ontologies or in ontologies that are still “under construction”. In particular, we could find unsatisfiable concepts even in some benchmark ontologies we used in our experiments (cf. Section 4). Of course, already debugged ontologies might not present such predicates [13,12]. In this case, one can avoid executing our algorithm for computing unsatisfiable predicates.

1 Not to be confused with the set of DLs studied in [2], which form the DL-Lite_bool family.

We have implemented our technique in a new module of QuOnto [1], the reasoner at the base of the Mastro [6,8] system, and have carried out extensive experimentation, focusing in particular on very large ontologies. We have considered a number of well-known ontologies, often used as benchmarks for ontology classification, and have suitably approximated in OWL 2 QL those that are out of this language.

QuOnto showed better performance, in some cases corresponding to enormous gains, with respect to tableau-based reasoners (in particular, Pellet, FaCT++, and HermiT). We also obtained comparable or better results with respect to the CB reasoner for almost all ontologies considered but, differently from the CB reasoner, we were always able to compute a complete classification. We finally compared QuOnto with ELK, one of the best-performing reasoners for EL, for those approximated ontologies that turned out to be both in OWL 2 QL and OWL 2 EL, obtaining similar performance in almost all cases.

We conclude by noticing that, even though we refer here to OWL 2 QL, our algorithms and implementations can be easily adapted to deal with all logics of the DL-Lite family mentioned in [7], excluding those allowing for the use of conjunction in the left-hand side of inclusion assertions or the use of n-ary relations instead of binary roles.

The rest of the paper is organized as follows. In Section 2, we provide some preliminaries. In Section 3, we describe our technique for ontology classification in OWL 2 QL. In Section 4, we describe our experimentation, and finally, in Section 5, we conclude the paper.

2 Preliminaries

In this section, we present some basic notions on DL ontologies, the formal underpinning of the OWL 2 language, and on OWL 2 QL. We also recall some notions of graph theory needed later on.

Description Logic Ontologies. We consider a signature Σ, partitioned into two disjoint signatures, namely, Σ_P, containing symbols for predicates, i.e., atomic concepts, atomic roles, atomic attributes, and value-domains, and Σ_C, containing symbols for individual (object and value) constants. Complex concept, role, and attribute expressions are constructed starting from predicates of Σ_P by applying suitable constructs, which vary in different DL languages. Given a DL language L, an L-TBox (or simply a TBox, when L is clear) over Σ contains universally quantified first-order (FOL) assertions, i.e., axioms specifying general properties of concepts, roles, and attributes. Again, different DLs allow for different axioms. An L-ABox (or simply an ABox, when L is clear) is a set of assertions on individual constants, which specify extensional knowledge. An L-ontology O is constituted by both an L-TBox T and an L-ABox A, denoted as O = ⟨T, A⟩.

The semantics of a DL ontology O is given in terms of FOL interpretations (cf. [4]). We denote with Mod(O) the set of models of O, i.e., the set of FOL interpretations that satisfy all TBox axioms and ABox assertions in O, where the definition of satisfaction depends on the DL language in which O is specified.

An ontology O is satisfiable if Mod(O) ≠ ∅. A FOL sentence φ is entailed by an ontology O, denoted O ⊨ φ, if φ is satisfied by every model in Mod(O). All the above notions naturally apply to a TBox T alone.

Traditional intensional reasoning tasks with respect to a given TBox are verification of subsumption and satisfiability of concepts, roles, and attributes [4].

More precisely, a concept C₁ is subsumed in T by a concept C₂, written T ⊨ C₁ ⊑ C₂, if, in every model I of T, the interpretation of C₁, denoted C₁^I, is contained in the interpretation of C₂, denoted C₂^I, i.e., C₁^I ⊆ C₂^I for every I ∈ Mod(T). Furthermore, a concept C in T is unsatisfiable, which we write as T ⊨ C ⊑ ¬C, if the interpretation of C is empty in every model of T, i.e., C^I = ∅ for every I ∈ Mod(T). Analogous definitions hold for roles and attributes.

Strictly related to the previous reasoning tasks is the classification inference service, which we focus on in this paper. Given a signature Σ_P and a TBox T over Σ_P, such a service allows one to determine subsumption relationships in T between concepts, roles, and attributes in Σ_P. Therefore, classification allows one to structure the terminology of T in the form of a subsumption hierarchy that provides useful information on the connection between different terms, and can be used to speed up other inference services. Here we define it more formally.

Definition 1. Let T be a satisfiable L-TBox over Σ_P. We define the T-classification of Σ_P (or simply T-classification when Σ_P is clear from the context) as the set of inclusion assertions defined as follows:

Let S₁ and S₂ be either two concepts, two roles, or two attributes in Σ_P. If T ⊨ S₁ ⊑ S₂, then S₁ ⊑ S₂ belongs to the T-classification of Σ_P.

The OWL 2 QL Language. The OWL 2 QL language is based on DL-Lite_R, a DL of the DL-Lite family [7]. Differently from DL-Lite_R, however, besides object properties (i.e., roles), OWL 2 QL also allows for the use of data properties (i.e., attributes), as well as some further constructs, such as (ir-)reflexivity on properties. For the sake of presentation, we prefer not to consider here attributes, nor (ir-)reflexivity constraints. This choice does not actually correspond to a real simplification, since in the algorithms proposed in this paper we can treat both attributes and roles essentially in the same way, and our techniques can be applied to full OWL 2 QL ontologies with minimal adaptations. Therefore, in the following, we provide a simplified, German-style syntax for OWL 2 QL, which actually corresponds to that of DL-Lite_R, whereas we refer the reader to [21] for the complete, OWL functional-style syntax of this language².

2 Notice that (a)symmetric roles allowed in OWL 2 QL, even though not explicitly mentioned, can be easily expressed in the syntax that we consider.


Expressions in OWL 2 QL are formed according to the following syntax:

    B ⟶ A | ∃Q            Q ⟶ P | P⁻
    C ⟶ B | ¬B | ∃Q.A     R ⟶ Q | ¬Q

where: A and P are symbols in Σ_P denoting respectively an atomic concept and an atomic role; P⁻ denotes the inverse of P; ∃Q, also called unqualified existential role, denotes the set of objects related to some object by the role Q; the concept ∃Q.A, or qualified existential role, denotes the qualified domain of Q with respect to A, i.e., the set of objects that Q relates to some instance of A. In the following, we call B a basic concept, and Q a basic role.

An OWL 2 QL TBox T is a finite set of axioms of the form B ⊑ C and Q ⊑ R, where the former denote subsumptions between concepts, and the latter subsumptions between roles. We call positive inclusions axioms of the form B₁ ⊑ B₂, B₁ ⊑ ∃Q.A, and Q₁ ⊑ Q₂, and negative inclusions axioms of the form B₁ ⊑ ¬B₂ and Q₁ ⊑ ¬Q₂.

The semantics of OWL 2 QL ontologies and TBoxes is given in the standard way [21,4].

As for OWL 2 QL ABoxes, we do not present them here, since we concentrate on intensional reasoning, and refer the interested reader to [21].

Graph Theory Notions. In this paper we use the term digraph to refer to a directed graph. We assume that a digraph G is a pair (N, E), where N is a set of elements called nodes, and E is a set of ordered pairs (s, t) of nodes in N, called arcs, where s is called the source of the arc, and t the target of the arc.

The transitive closure G^∗ = (N, E^∗) of a digraph G = (N, E) is a digraph such that there is an arc in E^∗ having a node s as source and a node t as target if and only if there is a path from s to t in G [5]. Let G = (N, E) be a digraph, and let n be a node in N. We denote with predecessors(n, G) the set of nodes p_n in N such that there exists in E an arc (p_n, n).

3 T-classification in OWL 2 QL

In this section we describe our approach to computing, given a signature Σ_P and an OWL 2 QL TBox T over Σ_P, the T-classification of Σ_P.

In OWL 2 QL, a subsumption relation between two concepts or roles in Σ_P is entailed by a TBox T if and only if (i) T contains such subsumption; (ii) T contains a set of positive inclusion assertions that together entail the subsumption; or (iii), trivially, the subsumed concept or role is unsatisfiable in T. The above observation is formalized as follows.

Theorem 1. Let T be an OWL 2 QL TBox containing only positive inclusions, and let S₁ and S₂ be two atomic concepts or two atomic roles. S₁ ⊑ S₂ is entailed by T if and only if at least one of the following conditions holds:

1. a set P of positive inclusions exists in T, such that P ⊨ S₁ ⊑ S₂;

2. T ⊨ S₁ ⊑ ¬S₁.


Given an OWL 2 QL TBox T over a signature Σ_P, we use Φ_T and Ω_T to denote two sets of positive inclusions of the form S₁ ⊑ S₂, with S₁, S₂ ∈ Σ_P, such that Φ_T contains only positive inclusions for which statement 1 holds, and Ω_T contains only positive inclusions for which statement 2 holds. It is easy to see that Φ_T and Ω_T need not be disjoint. From Definition 1 and Theorem 1 it follows that the T-classification coincides with the union of the sets Φ_T and Ω_T.

In the following, we describe our approach to the computation of the T-classification by first computing the set Φ_T, and then computing the set Ω_T.

Computation of Φ_T. Given an OWL 2 QL TBox T, in order to compute Φ_T, we encode the set of positive inclusions in T into a digraph G_T and compute the transitive closure of G_T in such a way that each subsumption S₁ ⊑ S₂ in Φ_T corresponds to an arc (S₁, S₂) in such transitive closure, and vice versa. The following constructive definition describes how to obtain the digraph representation of the TBox suited to our aims.

Definition 2. Let T be an OWL 2 QL TBox over a signature Σ_P. We call the digraph representation of T the digraph G_T = (N, E) built as follows:

1. for each atomic concept A in Σ_P, N contains the node A;

2. for each atomic role P in Σ_P, N contains the nodes P, P⁻, ∃P, ∃P⁻;

3. for each concept inclusion B₁ ⊑ B₂ ∈ T, E contains the arc (B₁, B₂);

4. for each role inclusion Q₁ ⊑ Q₂ ∈ T, E contains the arcs (Q₁, Q₂), (Q₁⁻, Q₂⁻), (∃Q₁, ∃Q₂), and (∃Q₁⁻, ∃Q₂⁻);

5. for each concept inclusion B₁ ⊑ ∃Q.A ∈ T, E contains the arc (B₁, ∃Q).

The idea is that each node in the digraph G_T represents a basic concept or a basic role, and each arc models a positive inclusion, i.e., a subsumption, contained in T, where the source node of the arc represents the left-hand side of the subsumption and the target node of the arc represents the right-hand side of the subsumption. Observe that for each role inclusion assertion P₁ ⊑ P₂ in the TBox T, we also represent as nodes and arcs in the digraph G_T the entailed positive inclusions P₁⁻ ⊑ P₂⁻, ∃P₁ ⊑ ∃P₂, and ∃P₁⁻ ⊑ ∃P₂⁻.
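The construction of Definition 2 can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the QuOnto implementation: the classes Role and Exists, the helper inv, the function build_digraph, and the example TBox are all hypothetical names introduced here. Atomic concepts are encoded as plain strings, and each positive inclusion as a (left-hand side, right-hand side) pair.

```python
from typing import NamedTuple, Optional


class Role(NamedTuple):
    """A basic role: an atomic role P, or its inverse P⁻ when inverse is True."""
    name: str
    inverse: bool = False


class Exists(NamedTuple):
    """∃Q when filler is None, ∃Q.A when filler is the atomic concept A."""
    role: Role
    filler: Optional[str] = None


def inv(q: Role) -> Role:
    """Return the inverse of a basic role."""
    return Role(q.name, not q.inverse)


def build_digraph(concepts, roles, positive_inclusions):
    """Build (N, E) following rules 1-5 of Definition 2."""
    nodes, arcs = set(), set()
    nodes.update(concepts)                                   # rule 1
    for p in roles:                                          # rule 2
        r = Role(p)
        nodes.update({r, inv(r), Exists(r), Exists(inv(r))})
    for lhs, rhs in positive_inclusions:
        if isinstance(lhs, Role):                            # rule 4: Q1 ⊑ Q2
            arcs.update({(lhs, rhs), (inv(lhs), inv(rhs)),
                         (Exists(lhs), Exists(rhs)),
                         (Exists(inv(lhs)), Exists(inv(rhs)))})
        elif isinstance(rhs, Exists) and rhs.filler is not None:
            arcs.add((lhs, Exists(rhs.role)))                # rule 5: B1 ⊑ ∃Q.A
        else:
            arcs.add((lhs, rhs))                             # rule 3: B1 ⊑ B2
    return nodes, arcs


# Hypothetical TBox: Student ⊑ Person, teaches ⊑ worksWith, Professor ⊑ ∃teaches.Course
tbox = [("Student", "Person"),
        (Role("teaches"), Role("worksWith")),
        ("Professor", Exists(Role("teaches"), "Course"))]
nodes, arcs = build_digraph({"Student", "Person", "Professor", "Course"},
                            {"teaches", "worksWith"}, tbox)
```

In this example the single role inclusion contributes four arcs, and the qualified existential on the right-hand side of the third axiom is recorded only through the arc from Professor to ∃teaches, as prescribed by rule 5.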

Let T be an OWL 2 QL TBox and let G_T = (N, E) be its digraph representation. We denote with G_T^∗ = (N, E^∗) the transitive closure of G_T. Note that by definition of digraph transitive closure, for each node n ∈ N there exists in E^∗ an arc (n, n). Moreover, in what follows, we denote with α(E^∗) the set of arcs (S₁, S₂) ∈ E^∗ such that both terms S₁ and S₂ denote in T either two atomic concepts or two atomic roles. Then, the following property holds.

Theorem 2. Let T be an OWL 2 QL TBox and let G_T = (N, E) be its digraph representation. Let S₁ and S₂ be two atomic concepts or two atomic roles. An inclusion assertion S₁ ⊑ S₂ belongs to Φ_T if and only if there exists in α(E^∗) an arc (S₁, S₂).

We can then easily construct an algorithm, called ComputeΦ, that, taking as input an OWL 2 QL TBox T, first builds the digraph G_T = (N, E) according to Definition 2, then computes its transitive closure, and finally returns the set Φ_T, which contains an inclusion assertion S₁ ⊑ S₂ for each arc (S₁, S₂) ∈ α(E^∗).

    Algorithm: computeUnsat
    Input: an OWL 2 QL TBox T
    Output: a set of concept and role expressions

    Emp ← ∅;
    foreach negative inclusion S₁ ⊑ ¬S₂ ∈ T do                /* step 1 */
        foreach n₁ ∈ predecessors(S₁, G_T^∗) do
            foreach n₂ ∈ predecessors(S₂, G_T^∗) do
                if n₁ = n₂
                    then Emp ← Emp ∪ {n₁};
                if (n₁ = ∃Q⁻ and n₂ = A) or (n₂ = ∃Q⁻ and n₁ = A)
                    then Emp ← Emp ∪ {∃Q.A};
    Emp′ ← ∅;
    while Emp ≠ Emp′ do                                        /* step 2 */
        Emp′ ← Emp;
        foreach S ∈ Emp′ do
            foreach n ∈ predecessors(S, G_T^∗) do
                Emp ← Emp ∪ {n};
                if n = P or n = P⁻ or n = ∃P or n = ∃P⁻
                    then Emp ← Emp ∪ {P, P⁻, ∃P, ∃P⁻};
                if there exists B ⊑ ∃Q.n ∈ T
                    then Emp ← Emp ∪ {∃Q.n};
    return Emp.

Fig. 1: The algorithm computeUnsat(T)

According to Theorem 2, ComputeΦ is sound and complete with respect to the problem of computing Φ_T for any OWL 2 QL TBox T containing only positive inclusions.
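A possible shape for ComputeΦ, once the digraph is available, is sketched below in Python. The sketch rests on our own assumptions: the names closure_arcs and compute_phi, and the encoding of nodes as strings such as "∃P-", are hypothetical, and the closure is obtained by a plain breadth-first visit from every node, in the spirit of the strategy mentioned in Section 4 rather than as a copy of the actual implementation.

```python
from collections import defaultdict, deque


def closure_arcs(nodes, arcs):
    """Arcs of G*: (s, t) whenever t is reachable from s; every node reaches itself."""
    successors = defaultdict(set)
    for s, t in arcs:
        successors[s].add(t)
    result = set()
    for start in nodes:
        seen, queue = {start}, deque([start])
        while queue:  # breadth-first visit from start
            for nxt in successors[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        result.update((start, t) for t in seen)
    return result


def compute_phi(nodes, arcs, atomic_concepts, atomic_roles):
    """Keep the closure arcs between two atomic concepts or two atomic roles, i.e. α(E*)."""
    return {(s, t) for s, t in closure_arcs(nodes, arcs)
            if {s, t}.issubset(atomic_concepts) or {s, t}.issubset(atomic_roles)}


# Hypothetical TBox: A ⊑ ∃P, P ⊑ R, ∃R ⊑ B, already encoded as a digraph.
concepts, roles = {"A", "B"}, {"P", "R"}
nodes = concepts | {pre + r + suf for r in roles for pre in ("", "∃") for suf in ("", "-")}
arcs = {("A", "∃P"), ("P", "R"), ("P-", "R-"), ("∃P", "∃R"), ("∃P-", "∃R-"), ("∃R", "B")}
phi = compute_phi(nodes, arcs, concepts, roles)
```

Here the subsumption A ⊑ B is obtained through the path A, ∃P, ∃R, B, even though none of the intermediate nodes is an atomic predicate; the filter α then discards every arc that involves an existential or an inverse role.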

Computation of Ω_T. We first observe that, according to Definition 2, no node corresponding to a qualified existential role is created in the TBox digraph representation. This kind of node is indeed not useful for computing Φ_T. However, if one aims to identify every cause of unsatisfiability, the creation of nodes corresponding to qualified existential roles is needed. This is due to the fact that a TBox may entail that a qualified existential role ∃P.A is unsatisfiable, even in case of satisfiability of ∃P. Specifically, this may occur in two cases: (i) the TBox T entails the assertion ∃P⁻ ⊑ ¬A, or (ii) the TBox T entails A ⊑ ¬A. Clearly, in both cases the concept ∃P.A is unsatisfiable. We therefore modify here Definition 2 by substituting Rule 5 with the following one:

5^∗. for each concept inclusion B₁ ⊑ ∃Q.A ∈ T, N contains the node ∃Q.A, and E contains the arcs (B₁, ∃Q.A) and (∃Q.A, ∃Q);

From now on, we adopt the digraph representation built according to Definition 2, where rule 5^∗ replaces rule 5. Given a TBox T over a signature Σ_P, the algorithm computeUnsat given in Figure 1 returns all unsatisfiable concepts and roles in Σ_P, by exploiting the transitive closure of the digraph representation of T.


Before describing the algorithm, we recall that, given a digraph G = (N, E) and a node n ∈ N, the set predecessors(n, G^∗) contains all those nodes n′ in N such that G^∗ contains the arc (n′, n), which means that there exists a path from n′ to n in G. Also, it can be shown that G_T^∗ in fact allows us to obtain all subsumptions between satisfiable basic concepts or roles, in the sense that the TBox T entails one such subsumption S₁ ⊑ S₂ if and only if there is an arc (S₁, S₂) in E^∗. Then, the two steps that compose the algorithm proceed as follows:

Step 1 Let S be either a concept expression or a role expression. We have that for each S^i ∈ predecessors(S, G_T^∗) the TBox T entails S^i ⊑ S. Hence, given a negative inclusion assertion S₁ ⊑ ¬S₂, for each S₁^i ∈ predecessors(S₁, G_T^∗) and for each S₂^j ∈ predecessors(S₂, G_T^∗), T ⊨ S₁^i ⊑ ¬S₂^j. Therefore, for each negative inclusion S₁ ⊑ ¬S₂ ∈ T, the algorithm computes the sets predecessors(S₁, G_T^∗) and predecessors(S₂, G_T^∗) and is able to: (i) recognize as unsatisfiable all those concepts and roles whose corresponding nodes occur in both predecessors(S₁, G_T^∗) and predecessors(S₂, G_T^∗), and (ii) identify those unsatisfiable qualified existential roles ∃Q.A whose inverse existential role node ∃Q⁻ occurs in predecessors(S₁, G_T^∗) (resp. predecessors(S₂, G_T^∗)) and whose concept node A occurs in predecessors(S₂, G_T^∗) (resp. predecessors(S₁, G_T^∗)), which indeed implies ∃Q⁻ ⊑ ¬A and therefore unsatisfiability of ∃Q.A.

Step 2 Further unsatisfiable concepts and roles are identified by the algorithm through a cycle in which: (i) if a concept or role S is in Emp, then all the expressions corresponding to the nodes in predecessors(S, G_T^∗) are in Emp. This captures propagation of unsatisfiability through chains of positive inclusions; (ii) if at least one of the expressions P, P⁻, ∃P, ∃P⁻ is in Emp, then all four expressions are in Emp; (iii) for each expression ∃Q.A in N, if A ∈ Emp, then ∃Q.A ∈ Emp. We notice that the algorithm stops cycling when no new expressions of the form ∃Q or ∃Q.A are added to Emp (indeed, in this case only a single further iteration may be needed).

It is easy to see that, since the set N of nodes of the digraph representation of the TBox T is finite, computeUnsat(T) terminates, and that the number of executions of the while cycle is less than or equal to |N|.
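The two steps of computeUnsat can be rendered as the following Python sketch. It is an illustration under our own assumptions, not the QuOnto code: Role, Exists, inv, closure_predecessors, base_role, and compute_unsat are hypothetical names, atomic concepts are plain strings, a negative inclusion S₁ ⊑ ¬S₂ is the pair (S₁, S₂), and the digraph is assumed to be built with rule 5^∗. As a further assumption, step 1 records a qualified existential ∃Q.A only when it occurs as a node of the digraph.

```python
from collections import defaultdict, deque
from typing import NamedTuple, Optional


class Role(NamedTuple):
    name: str
    inverse: bool = False          # True encodes the inverse role P⁻


class Exists(NamedTuple):
    role: Role
    filler: Optional[str] = None   # None encodes ∃Q, otherwise ∃Q.A


def inv(q):
    return Role(q.name, not q.inverse)


def closure_predecessors(nodes, arcs):
    """Map each node n to predecessors(n, G*), computed by a backward breadth-first visit."""
    direct = defaultdict(set)
    for s, t in arcs:
        direct[t].add(s)
    result = {}
    for n in nodes:
        seen, queue = {n}, deque([n])
        while queue:
            for p in direct[queue.popleft()]:
                if p not in seen:
                    seen.add(p)
                    queue.append(p)
        result[n] = seen
    return result


def base_role(n):
    """Return the basic role behind P, P⁻, ∃P or ∃P⁻, and None for any other node."""
    if isinstance(n, Role):
        return n
    if isinstance(n, Exists) and n.filler is None:
        return n.role
    return None


def compute_unsat(nodes, arcs, negative_inclusions):
    preds = closure_predecessors(nodes, arcs)
    qualified = {n for n in nodes if isinstance(n, Exists) and n.filler is not None}
    emp = set()
    for s1, s2 in negative_inclusions:                      # step 1
        for n1 in preds.get(s1, {s1}):
            for n2 in preds.get(s2, {s2}):
                if n1 == n2:
                    emp.add(n1)
                for a, b in ((n1, n2), (n2, n1)):           # a = ∃Q⁻ and b = A
                    if isinstance(a, Exists) and a.filler is None and isinstance(b, str):
                        candidate = Exists(inv(a.role), b)  # the concept ∃Q.A
                        if candidate in qualified:
                            emp.add(candidate)
    previous = set()
    while emp != previous:                                  # step 2
        previous = set(emp)
        for s in previous:
            for n in preds.get(s, {s}):
                emp.add(n)
                q = base_role(n)
                if q is not None:
                    emp.update({q, inv(q), Exists(q), Exists(inv(q))})
                if isinstance(n, str):
                    emp.update(e for e in qualified if e.filler == n)
    return emp


# Hypothetical TBox: Student ⊑ Person, Professor ⊑ ∃teaches.Course, ∃teaches⁻ ⊑ ¬Course
teaches = Role("teaches")
qualified_node = Exists(teaches, "Course")
nodes = {"Student", "Person", "Professor", "Course",
         teaches, inv(teaches), Exists(teaches), Exists(inv(teaches)), qualified_node}
arcs = {("Student", "Person"), ("Professor", qualified_node),
        (qualified_node, Exists(teaches))}
unsat = compute_unsat(nodes, arcs, [(Exists(inv(teaches)), "Course")])
```

In this example, step 1 detects that ∃teaches.Course is unsatisfiable, because ∃teaches⁻ and Course are declared disjoint, and step 2 propagates unsatisfiability backwards to Professor, while ∃teaches itself stays satisfiable.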

The following theorem shows that algorithm computeUnsat can be used for computing the set containing all the unsatisfiable concepts and roles in T.

Theorem 3. Let T be an OWL 2 QL TBox and let S be either an atomic concept or an atomic role in Σ_P. T ⊨ S ⊑ ¬S if and only if S ∈ computeUnsat(T).

We call ComputeΩ the algorithm that, taking T as input, returns Ω_T by making use of computeUnsat.

The following theorem, which is a direct consequence of Theorem 2 and of Theorem 3, states that our technique is sound and complete with respect to the problem of classifying an OWL 2 QL TBox.

Theorem 4. Let T be an OWL 2 QL TBox and let S₁ and S₂ be two atomic predicates. T ⊨ S₁ ⊑ S₂ if and only if S₁ ⊑ S₂ ∈ ComputeΦ(T) ∪ ComputeΩ(T).
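How the two sets of Theorem 4 combine can be sketched in Python as follows. The functions compute_omega and classify are hypothetical names, and the sketch assumes that Φ_T and the set of unsatisfiable predicates have already been computed; following Theorem 1, an unsatisfiable atomic concept (resp. role) is taken to be subsumed by every atomic concept (resp. role) in the signature.

```python
def compute_omega(unsat, atomic_concepts, atomic_roles):
    """Trivial inclusions S1 ⊑ S2 where S1 is an unsatisfiable atomic predicate."""
    omega = set()
    for kind in (atomic_concepts, atomic_roles):
        for s1 in unsat & kind:          # unsatisfiable predicates of this kind
            omega.update((s1, s2) for s2 in kind)
    return omega


def classify(phi, unsat, atomic_concepts, atomic_roles):
    """The T-classification as the union of Φ_T and Ω_T."""
    return phi | compute_omega(unsat, atomic_concepts, atomic_roles)


# Hypothetical input: Φ_T holds the reflexive arcs plus Student ⊑ Person,
# and Professor has been found unsatisfiable.
concepts = {"Student", "Person", "Professor", "Course"}
roles = {"teaches"}
phi = {(s, s) for s in concepts | roles} | {("Student", "Person")}
unsat = {"Professor"}
classification = classify(phi, unsat, concepts, roles)
```

The inclusion Professor ⊑ Professor belongs to both sets in this example, which illustrates why Φ_T and Ω_T can overlap.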


Ontology        Concepts  Roles  Attributes  Original DL fragment  Original axioms  OWL 2 QL axioms  Negative inclusions
Mouse           2753      1      0           ALE                   3463             3463             0
Transportation  445       89     4           ALCH(D)               931              931              317
DOLCE           209       313    4           SHOIN(D)              1736             1991             45
AEO             760       47     16          SHIN(D)               3449             3432             1957
Gene            26225     4      0           SH                    42655            42655            3
EL-Galen        23136     950    0           ELH                   46457            48026            0
Galen           23141     950    0           ALEHIF+               47407            49926            0
FMA 1.4         6488      165    0           ALCOIF                18612            18663            0
FMA 2.0         41648     148    20          ALCOIF(D)             123610           118181           0
FMA 3.2.1       84454     132    67          ALCOIF(D)             88204            84987            0
FMA-OBO         75139     2      0           ALE                   119558           119558           0

Table 1: The Original axioms and OWL 2 QL axioms fields indicate, respectively, the total number of axioms in the original version of the ontology and in the OWL 2 QL-approximated version. The Negative inclusions field reports the number of negative inclusions in the OWL 2 QL-approximated version.

4 Implementation and Evaluation

By exploiting the results presented in Section 3, we have developed a Java-based OWL 2 QL classification module for the QuOnto reasoner [1,6,8].

This module computes the classification of an OWL 2 QL TBox T by adopting the technique described in Section 3. In this implementation the computation of the transitive closure of the digraph G_T is based on a breadth-first search through G_T. In the implementation we have considered all aspects of OWL 2 QL which were ignored in the theoretical discussion presented in the previous sections (see Section 2).

We have performed comparative experiments, where QuOnto was tested against several popular ontology reasoners. Specifically, during our tests we compared QuOnto with the FaCT++ [24], HermiT [9], and Pellet [23] OWL reasoners, with the CB [14] Horn-SHIQ reasoner, and with the ELK [15] reasoner for those ontologies that are also in OWL 2 EL.

The ontology suite used during testing includes twenty OWL ontologies, assembled from the TONES Ontology Repository³ and from other independent sources. The six reasoners exhibited negligible differences in performance for the majority of the smaller tested ontologies, so we will only discuss the ontologies which offered interesting results, meaning those on which reasoning times are significantly different for at least a subset of the reasoners.

These ontologies include: the Mouse ontology; the Transportation ontology⁴; the Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE) [19]; the Athletic Events Ontology (AEO)⁵; the Gene Ontology (GO) [3]; two versions of the GALEN ontology [22]; and four versions of the Foundational Model of Anatomy Ontology (FMA) [10].

Because QuOnto is an OWL 2 QL reasoner, each benchmark ontology not in OWL 2 QL was preprocessed prior to classification in order to fit OWL 2 QL expressivity. Therefore, every OWL expression which cannot be expressed by OWL 2 QL axioms was approximated from the ontology specifications. This approximation follows this procedure: each axiom in the ontology is fed to an external reasoner, specifically HermiT, and every OWL 2 QL-compliant axiom that is implied by that axiom, between the ontology symbols that appear in it, is added to the OWL 2 QL-approximated ontology. For instance, the OWL assertion EquivalentClasses(ObjectUnionOf(:Male :Female) :Person) is approximated by the two assertions SubClassOf(:Male :Person) and SubClassOf(:Female :Person). Note that, as is the case in this example, the OWL 2 QL-approximated ontology may contain a greater number of axioms than the original ontology. Table 1 shows that the Mouse, Transportation, Gene, and FMA-OBO ontologies are in OWL 2 QL, and thus do not need approximation, while AEO and FMA 1.4 are subject to minimal changes by the approximation procedure.

3 http://owl.cs.manchester.ac.uk/repository/

4 http://www.daml.org/ontologies/409

5 http://www.boemie.org/deliverable d 3 5

Ontology        QuOnto  FaCT++         HermiT         Pellet   CB     ELK
Mouse           0.156   0.282          0.296          0.179    0.159  0.246
Transportation  0.150   0.045          0.163          0.151    0.195  0.343
DOLCE           1.327   0.245          25.619         1.696    1.358  -
AEO             0.650   0.743          0.920          0.647    0.605  -
Gene            1.255   1.400          3.810          2.803    1.918  1.419
EL-Galen        2.788   109.835        7.966          50.770   2.446  1.205
Galen           4.600   145.485        34.608         timeout  2.505  -
FMA 1.4         0.688   timeout        93.781         timeout  1.243  -
FMA 2.0         4.111   out of memory  out of memory  timeout  7.142  -
FMA 3.2.1       4.146   4.576          11.518         24.117   4.976  -
FMA-OBO         4.827   timeout        50.842         16.852   7.433  4.078

Table 2: Classification times (in seconds) of benchmark OWL 2 QL ontologies by QuOnto and other tested reasoners.

During the tests for each reasoner, classification was performed on the OWL 2 QL-compliant versions of the ontologies resulting from the above described preprocessing. Metrics about the ontologies are reported in Table 1.

All tests were performed on a DELL Latitude E6320 notebook with an Intel Core i7-2640M 2.8 GHz CPU and 4 GB of RAM, running the Microsoft Windows 7 Premium operating system, and Java 1.6 with 2 GB of heap space. The classification timeout was set at one hour, and classification was aborted if the maximum available memory was exhausted. All figures reported in Table 2 are in seconds and, because classification results are subject to minor fluctuations, particularly when dealing with large ontologies, are the average of 3 classifications of the respective ontologies with each reasoner. The following versions of the OWL reasoners were tested: FaCT++ v.1.5.3, HermiT v.1.3.6, Pellet v.2.3.0, CB v.12, and ELK v.0.3.2.

In our test configuration, the classifications of the FMA 2.0 ontology by the HermiT and FaCT++ reasoners terminate due to an out-of-memory error. In [9], classification of this ontology by the HermiT reasoner is performed successfully, but classification time far exceeds the one registered by QuOnto.

The results of the experiments are summarized in Table 2. These results confirm that the performance offered by QuOnto compares favorably to other reasoners for almost all tested ontologies. Classification for even the largest of the tested ontologies, i.e., the FMA-OBO and FMA 3.2.1 ontologies, is performed in under 5 seconds, and memory space issues were never experienced during our tests with QuOnto. For some test cases, the gap in performance between QuOnto and other reasoners is sizeable: for instance, classification by Pellet of the Galen and FMA (1.4 and 2.0) ontologies and by FaCT++ of the FMA (1.4 and OBO) ontologies exceeds the predetermined timeout limit of one hour.

Detailed analysis of the results provided in Table 2 shows that only the CB and ELK reasoners consistently display performance comparable to that of QuOnto, which is fastest for all ontologies which feature only positive inclusions, with the exception of the EL-Galen, Galen, and FMA-OBO ontologies. The CB reasoner, which is the best-performing reasoner for the Galen ontology, does not however always perform complete classification. For instance, it does not compute property hierarchies. The ELK reasoner instead is slower than QuOnto for three out of the five ontologies also in OWL 2 EL, while showing markedly better performance for EL-Galen.

Furthermore, if, as is usually the case, an ontology does not present unsatisfiable predicates, the computation of such predicates through the exploration of all negative inclusions can be avoided. This is the case for ontologies such as DOLCE and AEO, for which computation of the set Φ_T of positive inclusion assertions resulting from the transitive closure of G_T is performed respectively in 0.347 and 0.384 seconds, fastest among tested reasoners. Instead, for ontologies such as Pizza and Transportation, which feature respectively 2 and 62 unsatisfiable atomic concepts, the identification of all such predicates is unavoidable, and the resulting set of trivial inclusion assertions must be added to Ω_T.

5 Conclusions

The research presented in this paper can be extended in various directions. First of all, in the implementation of our technique we have adopted a naive algorithm for computing the digraph transitive closure. We are currently experimenting with more sophisticated and efficient techniques for this task. We are also working to optimize the procedure through which we identify unsatisfiable predicates.

Finally, we are working to extend our technique to compute all inclusions that are inferred by the TBox (which, in OWL 2 QL, are a finite number). In this respect, we notice that through G_T^∗ it is already possible to obtain the classification of all basic concepts, basic roles, and attributes, and not only that of predicates in the signature, and that, with slight modifications of computeUnsat, we can actually obtain the set of all negative inclusions inferred by an OWL 2 QL TBox. The remaining challenge is to devise an efficient mechanism to obtain all inferred positive inclusions involving qualified existential roles and attribute domains.

Acknowledgments. This research has been partially supported by the EU under FP7 project Optique (grant n. FP7-318338), and by the EU under FP7-ICT project ACSI (grant n. 257593).


References

1. A. Acciarri, D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, M. Palmieri, and R. Rosati. QuOnto: Querying Ontologies. In M. Veloso and S. Kambhampati, editors, Proc. of AAAI 2005, pages 1670–1671. AAAI Press/The MIT Press, 2005.

2. A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. The DL-Lite family and relations. J. of Artificial Intelligence Research, 36:1–69, 2009.

3. M. Ashburner, C. Ball, J. Blake, D. Botstein, H. Butler, J. Cherry, A. Davis, K. Dolinski, S. Dwight, J. Eppig, et al. Gene Ontology: tool for the unification of biology. Nature genetics, 25(1):25, 2000.

4. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2nd edition, 2007.

5. J. Bang-Jensen and G. Z. Gutin. Digraphs: Theory, Algorithms and Applications. Springer, 2nd edition, 2008.

6. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, M. Rodriguez-Muro, R. Rosati, M. Ruzzi, and D. F. Savo. The MASTRO system for ontology-based data access. Semantic Web J., 2(1):43–53, 2011.

7. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007.

8. G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, R. Rosati, M. Ruzzi, and D. F. Savo. MASTRO: A reasoner for effective ontology-based data access. In Proc. of ORE-2012, volume 858 of CEUR, ceur-ws.org, 2012.

9. B. Glimm, I. Horrocks, B. Motik, R. Shearer, and G. Stoilos. A novel approach to ontology classification. J. of Web Semantics, 14:84–101, 2012.

10. C. Golbreich, S. Zhang, and O. Bodenreider. The foundational model of anatomy in OWL: Experience and perspectives. J. of Web Semantics, 4(3):181–195, 2006.

11. V. Haarslev and R. Möller. RACER system description. In R. Goré, A. Leitsch, and T. Nipkow, editors, Proc. of IJCAR 2001, volume 2083 of LNCS, pages 701–706. Springer, 2001.

12. Q. Ji, P. Haase, G. Qi, P. Hitzler, and S. Stadtmüller. RaDON - repair and diagnosis in ontology networks. In L. Aroyo, P. Traverso, F. Ciravegna, P. Cimiano, T. Heath, E. Hyvönen, R. Mizoguchi, E. Oren, M. Sabou, and E. P. B. Simperl, editors, Proc. of ESWC 2009, volume 5554 of LNCS, pages 863–867. Springer, 2009.

13. A. Kalyanpur, B. Parsia, E. Sirin, and J. A. Hendler. Debugging unsatisfiable classes in OWL ontologies. J. of Web Semantics, 3(4):268–293, 2005.

14. Y. Kazakov. Consequence-driven reasoning for Horn SHIQ ontologies. In C. Boutilier, editor, Proc. of IJCAI 2009, pages 2040–2045. AAAI Press, 2009.

15. Y. Kazakov, M. Krötzsch, and F. Simancik. Concurrent classification of EL ontologies. In L. Aroyo, C. Welty, H. Alani, J. Taylor, A. Bernstein, L. Kagal, N. F. Noy, and E. Blomqvist, editors, Proc. of ISWC 2011, volume 7031 of LNCS, pages 305–320. Springer, 2011.

16. M. Krötzsch. The not-so-easy task of computing class subsumptions in OWL RL. In P. Cudré-Mauroux, J. Heflin, E. Sirin, T. Tudorache, J. Euzenat, M. Hauswirth, J. X. Parreira, J. Hendler, G. Schreiber, A. Bernstein, and E. Blomqvist, editors, Proc. of ISWC 2012, volume 7649 of LNCS, pages 279–294. Springer, 2012.

17. M. Lawley and C. Bousquet. Fast classification in Protégé: Snorocket as an OWL 2 EL reasoner. In T. Meyer, M. Orgun, and K. Taylor, editors, Proc. of AOW 2010, volume 122 of CRPIT, pages 45–50. ACS, 2010.


18. D. Lembo, V. Santarelli, and D. F. Savo. Graph-based Ontology Classification in OWL 2 QL. In Proc. of ESWC 2013, 2013. (to appear).

19. C. Masolo, S. Borgo, A. Gangemi, N. Guarino, A. Oltramari, and L. Schneider. The WonderWeb library of foundational ontologies and the DOLCE ontology. Technical Report D17, WonderWeb, 2002.

20. J. Mendez, A. Ecke, and A. Turhan. Implementing completion-based inferences for the EL-family. In Proc. of DL 2011, volume 745 of CEUR, ceur-ws.org, 2011.

21. B. Motik, B. Cuenca Grau, I. Horrocks, Z. Wu, A. Fokoue, and C. Lutz. OWL 2 Web Ontology Language – Profiles (2nd edition). W3C Recommendation, World Wide Web Consortium, Dec. 2012. Available at http://www.w3.org/TR/owl2-profiles/.

22. J. Rogers and A. Rector. The GALEN ontology. Medical Informatics Europe (MIE 96), pages 174–178, 1996.

23. E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz. Pellet: A practical OWL-DL reasoner. J. of Web Semantics, 5(2):51–53, 2007.

24. D. Tsarkov and I. Horrocks. FaCT++ description logic reasoner: System description. In U. Furbach and N. Shankar, editors, Proc. of IJCAR 2006, volume 4130 of LNCS, pages 292–297. Springer, 2006.