Role-depth Bounded Least Common Subsumer in Prob-EL with Nominals

(1)

Role-depth bounded Least Common Subsumer in Prob-EL with Nominals

Andreas Ecke^1?, Rafael Pe˜naloza^1,2??, and Anni-Yasmin Turhan^{1? ? ?}

1 Institute for Theoretical Computer Science, Technische Universit¨at Dresden

2 Center for Advancing Electronics Dresden {ecke,penaloza,turhan}@tcs.inf.tu-dresden.de

Abstract. Completion-based algorithms can be employed for computing the least common subsumer of two concepts up to a given role-depth, in extensions of the lightweight DLEL. This approach has been applied also to the probabilistic DL Prob-EL, which is variant ofELwith subjective probabilities. In this paper we extend the completion-based lcs- computation algorithm to nominals, yielding a procedure for the DL Prob-ELO⁰¹c .

1 Introduction

The least common subsumer (lcs) is a reasoning service that generalizes a set of input concept descriptions into a single concept description that subsumes all the input concepts and is the least one w.r.t. subsumption. This inference has turned out to be quite useful for a number of applications, like the definition of similarity measures for concept descriptions [5, 7], the bottom-up construction of knowledge bases [4], information retrieval, and more (see [6, 16]).

In particular, for large biomedical ontologies the lcs can be used effectively to aid construction and maintenance. Many of these biomedical ontologies, notably SNOMED CT [12]and the Gene Ontology [1], are written in the EL-family of lightweight description logics.

An interesting extension of EL that still admits tractable reasoning is the use of nominals. Nominals basically allow to characterize a concept in terms of specific individuals. Nominals are also admitted in the OWL 2 EL profile of OWL [14] and thus are interesting to practical applications. For the EL-family of DLs there exist completion-based classification algorithms [3] that compute all the subsumption relations between all named concepts and nominals in an ontology in polynomial time. Kazakov et al. showed in [11] that the original completion algorithm is indeed incomplete forELO and introduced a complete

?Supported by DFG Graduiertenkolleg 1763 (QuantLA).

?? Partially supported by DFG within the Cluster of Excellence ‘cfAED’

? ? ?

Partially supported by the German Research Foundation (DFG) in the Collaborative Research Center 912 “Highly Adaptive Energy-Efficient Computing”.

(2)

consequence-based classification algorithm for this DL. This algorithm is a variant of the completion-based algorithm, with the additional benefit of exhibiting a pay-as-you-go behavior, and thus allows for efficient implementation.

Another extension ofEL is Prob-EL [13], which allows the modeling of uncertain knowledge by introducing probabilistic constructors. Prob-ELuses subjective (or Type-2 [10]) probabilities, which correspond to degrees of belief and are interpreted using a multiple-world semantics. For example, in Prob-ELone can express that obese people are likely to have high pressure, without requiring every obese person to be hypertense, using the GCI

ObesevP_≥0.9∃hasCondition.HighPressure.

A completion algorithm for classifying TBoxes in the sublanguage Prob-EL⁰¹_c of Prob-ELwas described in [13].

For generalEL-TBoxes with cyclic concept definitions, the lcs may not exist, as it may require an infinite nesting of existential restrictions to be expressed.

Therefore, an approximation has been introduced in [16], that limits the maximal nesting of quantifiers of the resulting concept descriptions. These approximations, called role-depth bounded lcs (k-lcs), can be computed forELwith role inclusions by using completion sets produced by the completion-based classification algorithm [8].

In this paper, we give a classification algorithm for the classic DLELOintro- duced in [11] in terms of the completion algorithm in order to extend thek-lcs to this DL. Furthermore, we extend the completion algorithm for (a moderate form of) subjective probabilities to nominals, which results in a classification algorithm for the DL Prob-ELO⁰¹_c . In this direction, we correct a small error from the algorithm in [13] that caused it to be incomplete. From this algorithm, we develop an algorithm that computes the k-lcs for Prob-ELO⁰¹_c . We show that for both lcs approximations, the resulting concept description subsumes all input concepts, and is the least one w.r.t. subsumption. Thus, if the exact lcs exists for some role-depth boundn, then the k-lcs is exact fork≥n. Recently, necessary and sufficient conditions for the existence of the lcs w.r.t. generalEL- TBoxes have been devised [17]. By the use of these conditions the k for which the role-depth bound lcs and the lcs coincide can be determined, if the lcs exists.

The paper is organized as follows. First, we introduce some basic notions.

In Section 3 we give a completion algorithm for EL with nominals and extend the k-lcs algorithm to this DL. The completion-based classification algorithm for Prob-ELO⁰¹_c is devised in Section 4, along with the algorithm for thek-lcs for this probabilistic DL. We end with conclusions and remarks on future work.

The main proofs of Section 4 have been deferred to the appendix.

2 Preliminaries

ELO-concept descriptions are built from mutually disjoint sets NC of concept names, NRofrole names andNI ofindividual names using the syntax rule:

C, D::=> |A| {a} |CuD| ∃r.C,

(3)

Table 1.Concept constructors and TBox axioms forELO.

Syntax Semantics

named concept A (A∈NC) A^I⊆∆^I

top concept > ∆^I

nominal {a} (a∈NI) {a}^I={a^I}

conjunction CuD (CuD)Î =CÎ∩DÎ

existential restriction ∃r.C (r∈NR) (∃r.C)Î ={d∈∆Î| ∃e.(d, e)∈rÎ∧e∈CÎ}

GCI CvD C^I ⊆D^I

whereA∈NC,r∈NRanda∈NI. As usual the semantics ofELO-concepts are defined by means of interpretations. The semantics of named concepts and roles are extended to concept descriptions as shown in the upper part of Table 1.

An ELO-TBox consists of a finite set of general concept inclusion axioms (GCIs) of the form C vD. We use C ≡D as an abbreviation forC vD and D vC. WithSig(T) we denote the signature of a TBoxT, i.e., the set of all concept, role, and individual names that occur inT. An interpretation is amodel for a TBox if it satisfies all its GCIs, as shown in the lower part of Table 1.

The central inference discussed in this paper is theleast common subsumer (lcs) of two concept descriptions, i.e., to compute a concept that subsumes both and is the least one w.r.t. subsumption. Since the lcs does not need to exist for generalEL-TBoxes [2], we follow the idea from [16] and compute an approximation of the lcs that limits the role depth (rd(C)), i.e., the maximal nesting of quantifiers of the resulting conceptC:

Definition 1 (Role-depth bounded lcs). Let L be a DL, T be an L-TBox and k∈IN. The role-depth bounded least common subsumerof twoL-concept descriptions C1, C2 w.r.t.T (written:k-lcs_T(C1, C2)) is theL-concept descrip- tionD s.t.:

1. rd(D)≤k,

2. C1vT D andC2vT D, and

3. for eachL-concept description E holds: C1v_T E andC2v_T E and rd(E)≤k, impliesDv_T E.

For the DLs considered in this paper, thek-lcs is unique up to equivalence for a givenk; thus we speak ofthe k-lcs.

3 The k-lcs in ELO

The algorithms to compute the role-depth bounded lcs are based on completion- based classification algorithms for the corresponding DL. ForELO, a consequence- based classification algorithm is given by Kazakov et al. in [11], building upon the incomplete completion algorithm developed in [3]. The completion algorithm presented next adapts the ideas of the complete algorithm.

(4)

3.1 Completion Algorithm for Classification of ELO-TBoxes

The completion algorithms works on normalized TBoxes. We define forELOthe set ofbasic concepts for a TBoxT as

BCT = (Sig(T)∩(NC∪NI))∪ {>}.

AnELO-TBoxT is innormal form if every GCI contained inT is of one of the forms:

AvB, A1uA2vB, Av ∃r.B, or∃r.AvB,

withA, A₁, A₂, B∈BC_T. AllELO-TBoxes can be transformed into normal form in linear time by applying a set of normalization rules similar to those given in [3].Before describing the completion algorithm in detail, we introduce the reachability relation R, which plays a fundamental role in the correct treatment of nominals [3, 11].

Definition 2 ( R).LetT be anELO-TBox in normal form,G∈N_Ca concept name, and D ∈BC_T.G _RD iff there exist roles r₁, . . . , r_n and basic concepts A₀, . . . , A_n, B₀, . . . , B_n ∈ BC_T, n ≥ 0 such that A_i vT B_i for all 0 ≤i ≤n, B_i−1v ∃r_i.A_i∈ T for all1≤i≤n,A₀ is eitherGor a nominal, andB_n =D.

Informally, the concept name D is reachable from G if there is a chain of existential restrictions leading to D that starts either with G or with a nominal.

This implies that, forG RD, if the interpretation ofGis not empty, then the interpretation ofDcannot be empty either. This, in turn can cause equivalence of concepts, e.g.|A^I|>0 andAv {b}impliesA≡ {b}.

The basic idea of the completion algorithm forEL (without nominals) is to store all basic concepts that subsume a conceptA∈(Sig(T)∩N_C)∪ {>}in its subsumer setS(A) and and all basic conceptsB for which∃r.B subsumesAin the subsumer setS(A, r). These completion sets are then extended using a set of completion rules. However, with nominals the algorithm needs to keep track of completion sets of the form S^G(A) and S^G(A, r) for every G ∈ (Sig(T)∩ N_C)∪ {>}, since the non-emptiness of an interpretation of a concept G may imply additional subsumption relationships for A. The completion set S^G(A) for A ∈ BC_T therefore stores all basic concepts that subsume A under the assumption that G is not empty. Similarly S^G(A, r) stores all concepts B for which ∃r.B subsumes Aunder the same assumption. For every G∈(Sig(T)∩ NC)∪ {>}, every basic concept A and every role name r, the completion sets are initialized as S^G(A) = {A,>} and S^G(A, r) = ∅. The completion sets are then extended by applying the completion rules adapted from [11] and shown in Figure 1 exhaustively.

It can be shown that the algorithm terminates in polynomial time, and is sound and complete for classifying the TBoxT [11]. In particular, if no rules are applicable the completion sets have the following properties.

Proposition 1 ([11]). Let T be an ELO-TBox in normal form to which the completion rules have been applied exhaustively, C, D∈BC_T,r∈Sig(T)∩N_R, and G = C if C ∈ NC and G = > otherwise. Then, the following properties hold:

(5)

OR1 IfA1∈S^G(A),A1vB∈ T andB6∈S^G(A), thenS^G(A) :=S^G(A)∪ {B}

OR2 IfA1, A2 ∈S^G(A),A1uA2vB∈ T andB6∈S^G(A), thenS^G(A) :=S^G(A)∪ {B}

OR3 IfA1∈S^G(A),A1v ∃r.B∈ T andB6∈S^G(A, r), thenS^G(A, r) :=S^G(A, r)∪ {B}

OR4 IfB∈S^G(A, r),B1∈S^G(B),∃r.B1 vC∈ T andC6∈S^G(A), thenS^G(A) :=S^G(A)∪ {C}

OR5 If{a} ∈S^G(A1)∩S^G(A2),G RA2, andA2∈/S^G(A1), thenS^G(A1) :=S^G(A1)∪ {A2}

Fig. 1.Completion rules forELO

Cv_T D iff D∈S^G(C), and

Cv_T ∃r.D iff there existsE∈BC_T such thatE∈S^G(C, r)andD∈S^G(E).

We now show how to use these completion sets for computing the role-depth bounded lcs forELO-concept description w.r.t. a generalELO-TBox.

3.2 Computing the Role-depth Bounded ELO-lcs

In order to compute the role-depth bounded lcs of twoELO-conceptsC andD, we follow an idea very similar to the one presented forELR-concepts in [8], where we compute the cross product of the tree unravelings of the completion graph for C andD up to the role-depthk. Clearly, in the presence of nominals, the right completion sets need to be chosen such that they preserve the non-emptiness of the interpretation of concepts derived by _R.

An algorithm that computes the role-depth boundedELO-lcs using completion sets is shown in Figure 2. In the first step, the algorithm introduces two new concept namesAandB as abbreviations for the conceptsCandD, and the augmented TBox is normalized. The completion sets are then initialized and the completion rules from Figure 1 are applied exhaustively, yielding the saturated completion sets. In the recursive procedurek-lcs-rfor conceptsAandB, we first obtain all the basic concepts that subsume bothAand B from the setsS^A(A) andS^B(B). For every role namer, the algorithm then recursively computes the (k−1)-lcs of the conceptsA⁰andB⁰in the subsumer setsS^A(A, r) andS^B(B, r), i.e. for whichAvT ∃r.A⁰ andBvT ∃r.B⁰. The resulting concepts are conjoined as existential restrictions to the resultingk-lcs.

The algorithm only introduces concept and role names that occur in the original TBox T. Therefore those names introduced by the normalization are not used in the concept description for the k-lcs and an extra denormalization step as in [16, 8] is not necessary.

Notice, that for every pair (A⁰, B⁰) ofr-successors ofA andB it holds that A RA⁰andB RB⁰. Intuitively, we are assuming that the interpretation of both AandBis not empty. This in turn causes the interpretation of∃r.A⁰ and∃r.B⁰ to be not empty, either. Thus, it suffices to consider the completion setsS^A(. . .)

(6)

Procedurek-lcs(C, D,T, k)

Input:C, D:ELO-concept descriptions;T:ELO-TBox;k∈IN Output:role-depth boundedELO-lcs ofC, Dw.r.t.T andk

1: T⁰:=normalize(T ∪ {A≡C, B≡D}) 2: ST :=apply-completion-rules(T⁰)

3: return k-lcs-r(A, B,ST, k, A, B,Sig(T)) Procedurek-lcs-r(X, Y,ST, k, A, B,Sig(T))

Input:A, B: concept names,X, Y: basic concepts withA RX, B RY;k∈IN;

ST: set of saturated completion sets;Sig(T): signature ofT Output:role-depth boundedELO-lcs ofX, Y w.r.t.T andk

1: CN :=S^A(X)∩S^B(Y)∩BCT

2: if k= 0then 3: return l

P∈CN

P 4: else

5: return l

P∈CN

Pu l

r∈Sig(T)∩N_R

_l

E∈S^A(X,r), F∈S^B(Y,r)

∃r.k-lcs-r E, F,ST, k−1, A, B,Sig(T)

Fig. 2.Computation algorithm for role-depth boundedELO-lcs.

andS^B(. . .), without the need to additionally computeS^A⁰(. . .) andS^B⁰(. . .), or the completion setsS^E(. . .) for any other basic conceptEencountered during the recursive computation of thek-lcs. This allows for a goal-oriented optimization in cases where there is no need to classify the full TBox.

4 The k-lcs in Prob-ELO

⁰¹_c

So far, we have discussed only classic DLs that can be used to represent and reason with certain knowledge. However, it is not uncommon to encounter sit- uations where a degree of uncertainty is unavoidable. This is often the case in the medical and biological domains, where knowledge is obtained through clini- cal testing, and there might exist hidden, or not completely understood, factors affecting the outcome. For instance, we would like to be able to express that obese people arelikelyto have high pressure, without asserting that every obese personmust have high pressure.

The probabilistic logic Prob-ELwas introduced by [13] as an extension ofEL that allows to express uncertain knowledge through probabilistic concepts and roles. Here, we extend these ideas to cover also nominals. Formally, Prob-ELO concepts extend classicalELOconcepts with the constructors

P.qC and ∃P.qr.C,

where r∈NR, .∈ {>, <,≥,≤,=}, andq∈[0,1]. Intuitively, a concept of the formP.qCdenotes the class of all objects that belong toCwith a probability.q.

(7)

For the above example, we can use the conceptP_≥0.9∃hasCondition.HighPressure to represent the class of all individuals that are likely to have high pressure.

The semantics of this logic generalizes the semantics of classicalELOby considering a set of possible worlds, corresponding to a formalization of subjective (or Type 2) probabilities [10]. Formally, the semantics of Prob-ELOare based on probabilistic interpretations of the form I = (∆Î, W,(Iw)_w∈W, µ), where∆Î is a (non-empty) domain,W is a non-empty set of (possible)worlds,µis a discrete probability distribution over W, and for every w ∈ W, Iw is a classical ELO interpretation with domain ∆Î. Additionally, for every a∈ NI and every two worldsw, w⁰∈W, it holds thataÎ^w=aÎ^w⁰.

From a probabilistic interpretation, we can compute the probability that a given element of the domain d∈ ∆^I belongs to the interpretation of a named conceptA, and respectively, the probability that a pair of individuals are related via a roleras follows:

pÎ_d(A) :=µ({w∈W |d∈AÎ^w}), pÎ_d,e(r) :=µ({w∈W |(d, e)∈rÎ^w}).

The functionsIwandp^I_d are extended to general concepts through the following mutual recursion.

>Î^w = ∆Î, (∃r.C)Î^w= {d∈∆Î| ∃e∈CÎ^w.(d, e)∈rÎ^w}, (CuD)Î^w = CÎ^w∩DÎ^w, (P_.qC)Î^w= {d∈∆Î|pÎ_d(C). q},

({o})Î^w = {oÎ^w}, (∃P.qr.C)Î^w= {d∈∆Î| ∃e∈CÎ^w.pÎ_d,e(r). q}, pÎ_d(C) =µ({w∈W |d∈CÎ^w}).

We say that the probabilistic interpretation I = (∆Î, W,(Iw)w∈W, µ) satisfies the GCICvD if for every worldw∈W it holds thatCÎ^w⊆DÎ^w.I is amodel of the TBoxT ifI satisfies all GCIs inT. A conceptCis subsumed by concept D w.r.t. the TBoxT (Cv_T D), if every modelI ofT satisfiesCvD.

Unfortunately, the probabilistic constructors increase the complexity of reasoning, and deciding subsumption becomes intractable. In fact, as shown in [9], the problem isExpTime-complete, even if only one constructor of the formP.q

withq∈(0,1) is allowed. Moreover, the problem becomesPSpace-hard if probabilistic existential restrictions of the form∃P>0ror ∃P=1r are used. To regain tractability Prob-ELwas restricted in [13] to probabilistic concepts of the form P>0CorP=1Cand no probabilistic existential restrictions or roles, yielding the DL Prob-EL⁰¹_c . The extension of this DL by nominals is Prob-ELO⁰¹_c , which is the DL we consider for the remainder of the paper.

4.1 Completion Algorithm for Prob-ELO⁰¹_c

The basic idea of the completion algorithm for Prob-ELO⁰¹_c is the same as in the crisp case: to construct a canonical model of the given TBox. In order to intro-

(8)

duce it, we need to extend the notion of basic concepts to the new constructors BC_T = (Sig(T)∩(N_C∪N_I))∪ {>} ∪ {P_>0A, P₌₁A|A∈Sig(T)∩N_C}.

Since probabilistic interpretations contain a set of worlds, the completion algorithm has to work on sets of completion sets: one for each world and, since Prob-ELO⁰¹_c also contains nominals, for each basic concept. LetP₀^T denote the set of all probabilistic concepts of the form P>0A appearing in T. The completion algorithm uses a set of worlds V := {0,1, ε} ∪ P₀^T; this way, for each GCI B vP>0A ∈ T the world v =P>0A serves as witness for this subsumption. Therefore, the Prob-ELO⁰¹_c completion sets are now of the formS_∗^G(X, v) or S^G_∗(X, r, v) where X is a concept name, >, or a nominal, v ∈ V, G is a concept name or >, ∗ ∈ {0, ε} and ris a role name. The completion sets con- tainbasic concepts fromBC_T. The normal form for Prob-ELO⁰¹_c -TBoxes is the same as the normal form as for ELO-TBoxes, i.e. each axiom is of the form C v D, C₁uC₂ vD, C v ∃r.A, or ∃r.AvD, with C, C₁, C₂, D ∈BC_T and A∈(Sig(T)∩N_C)∪ {>}.

The reachability relation for Prob-ELO⁰¹_c -concepts extends the one for ELO in distinguishing between concept names X and probabilistic concepts P>0X. For example, non-emptiness of Gdoes not imply non-emptiness ofP>0X, even ifG _RX, e.g. in worlds with probability 0. Similarly, non-emptiness ofGdoes not imply non-emptiness ofX forG _RP_>0X. Therefore we introduce two kinds of reachability relation,G ⁰_RX forG _RX andG ^ε_RX forG _RP_>0X: Definition 3 (Reachability of Prob-ELO⁰¹_c -concept descriptions). LetT be a Prob-ELO⁰¹_c -TBox in normal form,G∈NC∪ {>}, andD∈NC∪NI.

G⁰_RD iff there exist r₁, . . . , r_n ∈ N_R and basic concepts A₀, . . . , A_n where A_i∈S₀^G(A_i−1, r_i,0)for all1≤i≤n, such thatA₀is eitherGor a nominal andA_n=D.

G ^εRD iff G ⁰RX, P>0Y ∈ S₀^G(X,0) and there are r1, . . . , rn ∈ NR and A0, . . . , An ∈ NC with Ai ∈ S_ε^G(Ai−1, ri, ε) for all 1 ≤ i ≤ n such that A0=Y andAn=D.

When it is clear from the context which relation ⁰R or ^εR we are using, we will sometimes denote it simply as R.

Additionally, nominals interact with the set of possible worlds in a different way than normal concepts. In particular, the conceptsP_>0{a} andP₌₁{a} are indeed equivalent to{a}, since{a}is interpreted as the singleton domain element a^Iin each world. This implies that alsoX vP>0{a},X vP=1{a}andXv {a}

are equivalent. In other words, whenever {a} ∈ S_∗^G(X, v), then{a} must be in S_∗^G(X, w) for allw∈V.

The completions setsS_∗^G(X, v) andS_∗^G(X, r, v) are initialized as follows:

– S₀^G(X,0) ={>, X} andS₀^G(X, v) ={>}for allv∈V \ {0}, – S_ε^G(X, ε) ={>, X}andS_ε^G(X, v) ={>}for allv∈V \ {ε}, – S₀^G(X, r, v) =S_ε^G(X, r, v) =∅for allv∈V.

(9)

PR1 IfC⁰∈S^G∗(X, v) andC⁰vD∈ T, thenS∗^G(X, v) :=S∗^G(X, v)∪ {D}

PR2 IfC1, C2∈S∗^G(X, v) andC1uC2vD∈ T, thenS^G∗(X, v) :=S∗^G(X, v)∪ {D}

PR3 IfC⁰∈S^G∗(X, v) andC⁰v ∃r.D∈ T, thenS∗^G(X, r, v) :=S∗^G(X, r, v)∪ {D}

PR4 IfD∈S^G∗(X, r, v),D⁰∈S_γ(v)^G (D, γ(v)) and∃r.D⁰vE∈ T, thenS^G∗(X, v) :=S∗^G(X, v)∪ {E}

PR5 If{a} ∈S∗^G₁(X,∗1)∩S∗^G₂(D,∗2) andG^∗²RD, thenS^G∗₁(X,∗1) :=S∗^G₁(X,∗1)∪ {P∗₂D}

PR6 IfP>0A∈S∗^G(X, v), thenS∗^G(X, P>0A) :=S^G∗(X, P>0A)∪ {A}

PR7 IfP=1A∈S^G∗(X,0), thenS^G∗(X,1) :=S∗^G(X,1)∪ {A}

PR8 IfP=1A∈S^G∗(X, v),v6= 0, thenS∗^G(X, v) :=S∗^G(X, v)∪ {A}

PR9 IfA∈S∗^G(X, v) andv6= 0,P>0A∈ P0^T, thenS∗^G(X, v⁰) :=S∗^G(X, v⁰)∪ {P>0A}

PR10IfA∈S∗^G(X,1) andP=1Aoccurs inT, thenS∗^G(X, v) :=S^G∗(X, v)∪ {P=1A}

PR11If{a} ∈S∗^G(X, v) thenS∗^G(X, v⁰) :=S∗^G(X, v⁰)∪ {{a}}

Fig. 3.Completion rules for Prob-ELO⁰¹_c

These completion sets are then extended by applying the completion rules from Figure 3 exhaustively. The function γ : V → {0, ε} used in rule PR4 is defined byγ(0) = 0, andγ(v) =εfor allv∈V \ {0}. The completion rules can be divided into two groups. The rules PR1 to PR5 form the first group and are basically the same as the rules OR1 to OR5 forELO—they are used to compute all subsumption relationships between concepts inside each world. The next five rulesPR6 to PR11 handle probabilistic concepts and therefore propagate derived facts between the different worlds. For example, whenever we haveP>0Ain the subsumer set ofB, then rulePR6 will pushAinto the subsumer set of Bfor the world v = P>0A, i.e., the world v is a witness of the subsumption. Similarly, wheneverP=1Ais in the subsumer set ofB for some worldv, then rulePR7and

PR10will pushP₌₁Ainto the subsumer sets ofB for all other worldswand rule

PR8 will finally put A into the subsumer set of B for each world with non-zero probability (i.e. all worlds except world 0). Rule PR11 distributes nominals in subsumer sets between the worlds inV as explained earlier.

This set of completion rules extends the rules given in [13] for Prob-EL⁰¹_c in two ways. First, by the rules for nominals (written as completion rules). Second, by rulePR7, which is actually necessary to achieve completeness of the completion algorithm for Prob-EL⁰¹_c (without nominals). To see this, consider the following TBox:

Tex ={AvP=1B, BvC, P=1CvD}

Clearly, we haveAv_T_ex D, however, without rulePR7, the completion algorithm is stuck withP₌₁B∈S₀(A,0) and will never deriveB∈S₀(A,1),C∈S₀(A,1), P=1C∈S₀(A,0) and finallyD∈S₀(A,0).

Theorem 1. The completion algorithm for Prob-ELO⁰¹_c is sound and complete.

(10)

A detailed proof is given in the appendix (see Lemmas 1 and 2). Also note that the completion algorithm for Prob-ELO⁰¹_c still runs in polynomial time, since

|BC_T|,|Sig(T)∩N_C|, |Sig(T)∩N_I|, |V|, are all linear in the size of|T |=n.

4.2 Computing the Role-depth Bounded Prob-ELO⁰¹_c -lcs

The approach for computing the role-depth bounded Prob-ELO⁰¹_c -lcs is similar to the classical case, where we first introduce new concept names for the input concepts, normalize the TBox, apply the completion rules, then we intersect the direct subsumers stored in the completion sets and add the cross-product of the existential restrictions of both concepts. However, in the presence of probabilistic concepts, we need to compute also the probabilistic direct subsumers and probabilistic existential restrictions. Therefore, this algorithm computes the cross-product of the existential restrictions three times: for the unconditional concepts, for those concepts with probability 1, and for concepts with non-zero probability. In contrast, we need to compute the intersection of the basic concepts only once, since wheneverX ∈S₀^G(A, v) with v 6= 0, then by completion rule PR9 we have alsoP>0X ∈ S₀^G(A,0) and similarly, whenever X ∈S₀^G(A,1) then we also have P=1X ∈S₀^G(A,0) by completion rulePR10. The algorithm to compute the role-depth bounded lcs in Prob-ELO⁰¹_c is described in Figure 4.

Theorem 2. LetT be a Prob-ELO⁰¹_c -TBox,C andD be Prob-ELO⁰¹_c -concepts andkbe a natural number. Thenk-lcs(C, D,T, k)is the role-depth bounded least common subsumer ofC andD w.r.t. T and the role-depthk.

Correctness of this algorithm for computing the role-depth bounded Prob-ELO⁰¹_c - lcs follows from soundness and completeness of the completion rules which are used to generate the underlying completion sets. The full proof can be found in the appendix. As in the crisp case, the resultingk-lcs can have a size exponential inkif computed forninput concepts, but it is still polynomial in the size of the input TBoxT.

5 Conclusions

In this paper we have studied extensions of the light-weight description logic EL, that include nominals and are capable of handling uncertainty. For the DL Prob-ELO⁰¹_c , we have introduced a completion algorithm that generalizes the previously known algorithm for Prob-EL⁰¹_c , with correct rules for handling nominals, while still retaining the polynomial time complexity of classification.

Second, we described how the completion sets saturated by the completion algorithm can be combined to compute (approximations of) the lcs of two concepts in DLs with nominals—both for the crisp case inELOand the probabilistic case in Prob-ELO⁰¹_c . In cases where the exact lcs exists, the algorithms compute the exact lcs for a sufficiently largek.

The extension ofEL⁺ by nominals, covers (most of) the OWL 2 EL profile, thus combining the algorithms for computing the role-depth bounded lcs inEL⁺

(11)

Procedurek-lcs(C, D,T, k)

Input:C, D: Prob-ELO⁰¹c -concept descriptions;T: Prob-ELO⁰¹c -TBox;k∈IN Output:k-lcs(C, D): role-depth bounded Prob-ELO⁰¹c -lcs ofC, Dw.r.t.T andk

1: T⁰:=normalize(T ∪ {A≡C, B≡D}) 2: ST :=apply-completion-rules(T⁰) 3: L:=k-lcs-r(A, B,ST, k, A, B) 4: return L

Procedurek-lcs-r(X, Y,S, k, A, B)

Input:A, B: concept names,X, Y: basic concepts withA RX, B RY; S: set of saturated completion sets;k: natural number

Output:k-lcs(A, B): role-depth bounded Prob-ELO⁰¹c -lcs ofX, Y w.r.t.T andk 1: CN :=d

E∈S^A 0(X,0)∩S^B

0(Y,0)∩BC_T E 2: if k= 0then

3: return CN 4: else

5: return CNu l

r∈Sig(T)∩N_R

l

(E,F)∈S^A

0(X,r,0)×S^B 0(Y,r,0)

∃r.k-lcs-r(E, F,S, k−1, A, B)u

l

(E,F)∈S^A

0(X,r,1)×S^B 0(Y,r,1)

P=1

∃r.k-lcs-r(E, F,S, k−1, A, B) u

l

(E,F)∈PR^A(X,r)×PR^B(Y,r)

P>0

∃r.k-lcs-r(E, F,S, k−1, A, B) ,

wherePR^G(X, r) =S

v∈V\{0}S0^G(X, r, v)

Fig. 4.Computation algorithm for role-depth bounded Prob-ELO⁰¹c -lcs.

[8] and inELOpresented here allows to compute generalizations in this profile.

Similarly, as in case of the lcs, an approximation of the most specific concept (msc) can be computed based on a completion algorithm, see [15]. For DLs with nominals, the completion algorithm given in this paper can be directly used for this, since an ABox can always be absorbed into the TBox in a preprocessing step using these nominals. However, since the msc in the presence of nominals is trivial (msc(a) =a), another target DL should be considered in order to yield an informative version of the msc.

Besides the msc, there exist several other non-standard inferences that have been studied for classical DLs and would be of interest in the context of subjective probabilities. One of them isaxiom pinpointing, i.e. the task of discovering the precise axioms from a knowledge base that are responsible for a consequence to follow. The use of probabilities introduces a new challenge as seemingly in- nocuous axioms may interact to produce unexpected (and possibly unwanted) consequences. A further study of this problem will be a matter of future work.

(12)

References

1. M. Ashburner. Gene ontology: Tool for the unification of biology.Nature Genetics, 25:25–29, 2000.

2. F. Baader. Terminological cycles in a description logic with existential restrictions.

In G. Gottlob and T. Walsh, editors,Proc. of the 18th Int. Joint Conf. on Artificial Intelligence (IJCAI-03), pages 319–324. Morgan Kaufmann, 2003.

3. F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope. In Proc. of the 19th Int. Joint Conf. on Artificial Intelligence (IJCAI-05), Edinburgh, UK, 2005.

Morgan-Kaufmann Publishers.

4. F. Baader, R. K¨usters, and R. Molitor. Computing least common subsumers in description logics with existential restrictions. In T. Dean, editor,Proc. of the 16th Int. Joint Conf. on Artificial Intelligence (IJCAI-99), pages 96–101, Stockholm, Sweden, 1999. Morgan Kaufmann, Los Altos.

5. A. Borgida, T. Walsh, and H. Hirsh. Towards measuring similarity in description logics. InProc. of the 2005 Description Logic Workshop (DL 2005), volume 147 of CEUR Workshop Proceedings, 2005.

6. S. Brandt and A.-Y. Turhan. Using non-standard inferences in description logics

— what does it buy me? In G. G¨orz, V. Haarslev, C. Lutz, and R. M¨oller, editors, Proc. of the 2001 Applications of Description Logic Workshop (ADL 2001), num- ber 44 in CEUR Workshop, Vienna, Austria, September 2001. RWTH Aachen. See http://CEUR-WS.org/Vol-44/.

7. C. d’Amato, N. Fanizzi, and F. Esposito. A semantic similarity measure for expres- sive description logics. InProc. of Convegno Italiano di Logica Computazionale, CILC05, 2005.

8. A. Ecke and A.-Y. Turhan. Role-depth bounded least common subsumers forEL⁺ andELI. In Y. Kazakov, D. Lembo, and F. Wolter, editors,Proc. of the 2012 De- scription Logic Workshop (DL 2012), volume 846 ofCEUR Workshop Proceedings.

CEUR-WS.org, 2012.

9. V. Guti´errez-Basulto, J. C. Jung, C. Lutz, and L. Schr¨oder. A closer look at the probabilistic description logic prob-EL. InProceedings of Twenty-Fifth Conference on Artificial Intelligence (AAAI-11), 2011.

10. J. Y. Halpern. An analysis of first-order logics of probability.Artificial Intelligence, 46:311–350, 1990.

11. Y. Kazakov, M. Kr¨otzsch, and F. Simanˇc´ık. Practical reasoning with nominals in theELfamily of description logics. In G. Brewka, T. Eiter, and S. A. McIlraith, editors,Proc. of the 12th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR-12), pages 264–274. AAAI Press, 2012.

12. K. M. Kudla and M. C. Rallins. Snomed: A controlled vocabulary for computer- based patient records. Journal of the American Health Information Management Association, 69:40–44, 1998.

13. C. Lutz and L. Schr¨oder. Probabilistic description logics for subjective uncertainty.

In F. Lin, U. Sattler, and M. Truszczynski, editors,Proc. of the 12th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR-10). AAAI Press, 2010.

14. B. Motik, B. Cuenca Grau, I. Horrocks, Z. Wu, A. Fokoue, and C. Lutz. OWL 2 web ontology language profiles. W3C Recommendation, 27 October 2009. http:

//www.w3.org/TR/2009/REC-owl2-profiles-20091027/.

15. R. Pe˜naloza and A.-Y. Turhan. Towards approximative most specific concepts by completion forEL⁰¹with subjective probabilities. In T. Lukasiewicz, R. Pe˜naloza,

(13)

and A.-Y. Turhan, editors, Proceedings of the First International Workshop on Uncertainty in Description Logics (UniDL’10), 2010.

16. R. Pe˜naloza and A.-Y. Turhan. A practical approach for computing generalization inferences inEL. In M. Grobelnik and E. Simperl, editors,Proc. of the 8th European Semantic Web Conf. (ESWC’11), Lecture Notes in Computer Science. Springer, 2011.

17. B. Zarrieß and A.-Y. Turhan. Most specific generalizations w.r.t. general EL- TBoxes. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI’13), Beijing, China, 2013. AAAI Press. To appear.

(14)

A Omitted Proofs

A.1 Correctness of the Classification Algorithm for Prob-ELO⁰¹_c Theorem 1. The completion algorithm for Prob-ELO⁰¹_c is sound and complete.

We prove each of the claims in the following. Please note that the consequence- driven algorithms forELOfrom [11] and also the completion algorithm presented in Section 3 compute subsumption relationshipsunder the assumption that certain concepts have a non-empty interpretation. To indicate the assumption that, sayGis not empty, when considering the subsumption relationship between C andD, we writeG:CvT D.

Lemma 1. The completion algorithm is sound, i.e.

C∈S_∗^G(X, v)impliesG:P_∗Xv_T P_vC (1) C∈S_∗^G(X, r, v)impliesG:P_∗Xv_T Pv∃r.C (2) Proof. We show this by induction on the number of rule applications. It is easy to see that the initial subsumer sets satisfy (1) and (2). Also, after each rule application (1) and (2) will still be satisfied. We will only show (1) for the new completion rules for Prob-ELO⁰¹_c , which are not in the original completion algorithm in [13]. For property (2), note that none of these new rules changes the subsumer setsS_∗^G(X, r, v).

PR5 If {a} ∈ S^G_∗₁(X,∗1)∩S_∗^G₂(D,∗2), then by induction hypothesis we have G:P_∗₁Xv_T {a} andG:P_∗₂Dv_T {a}. Additionally, by definition of R, G^∗²RD implies that if G is not empty, then P∗₂D must be not empty as well. Thus we haveG:P∗₂D≡T {a} and thereforeG:P∗₁XvP∗₂D. This means, that the addition ofP∗₂D toS_∗^G₁(X,∗1) still satisfies (1).

PR7 If P₌₁A ∈ S_∗^G(X,0), then by induction hypothesis G : P_∗X vT P₌₁A.

Thus the implication A ∈ S_∗^G(X,1) ⇒ G : P_∗X vT P₌₁A is obviously correct, and we can addAto S_∗^G(x,1).

PR11 If{a} ∈ S_∗^G(X, v), then by induction hypothesisG:P_∗X v_T P_v{a}, i.e.

for all modelsI= (∆^I, W,(Iw)_w∈W, µ) ofT and all worldsw∈W we have:

ifGis not empty, then (P_∗X)Î,w ⊆(Pv{a})Î,w. Together with (P>0{a})Î,w={d| ∃v∈W :µ(v)>0∧d∈ {a}Î,v}

={d| ∃v∈W :µ(v)>0∧d=aÎ}={aÎ}={a}Î,w and similarly (P₌₁{a})Î,w = {a}Î,w, this yields: if G is not empty, then (P_∗X)Î,w ⊆ {a}Î,w ={aÎ} = {a}Î,v⁰ for all v⁰ ∈W. Hence it holds that G:P_∗X vT P_v⁰{a}. This means, that the addition of{a}to S_∗^G(X, v⁰) still satisfies (1).

Lemma 2. The completion algorithm is complete, i.e. for a normalized TBox T,G∈N_C,B∈BCT, andr∈N_R we have

Gv_T B implies B∈S₀^G(G,0)

GvT ∃r.Bimplies ∃A withA∈S^G₀(G, r,0)andB ∈S₀^G(A,0)

(15)

Proof. We assume thatB 6∈S^G₀(G,0) (resp. there is noAwithA∈S^G₀(G, r,0) and B ∈S₀^G(A,0)) and construct a modelI_G of T which shows that G6v_T B (resp. G6v_T ∃r.B). To construct this model, we need the classes of equivalent nominals: [a] = {b ∈ Sig(T)∩NI | {a} ∈ S₀^G({b},0)}. The domain of the interpretation will contain all nominals (modulo equivalence) and for each world w ∈ V all concepts that are not subsumed by a nominal and can be reached fromGor a nominal using the relation R.

LetIG = (∆^I^G, W,(IG,w)w∈W, µ) be the following interpretation:

∆^I^G :={[a]|a∈Sig(T)∩NI} ∪

{(A, v)∈Sig(T)∩N_C×V |G^γ(v)_RA,{a} 6∈S_γ(v)^G (A, γ(v))}

W :=V µ(0) := 0 µ(w) := 1

|W\ {0}| for allw∈W\ {0}

a^I^G= [a] for alla∈Sig(T)∩N_I

To interpret concept and role names, we also need a bijectionπ_v(w) :W →W for eachv∈W\ {0} withπ_v(v) =εandπ_v(0) = 0. Moreover,π₀is the identity mapping onW. Then:

A^I^G^,w={[a]|A∈S₀^G({a}, w)}

∪ {(B, v)∈∆^I^G|A∈S_γ(v)^G (B, πv(w))}

rÎ^G^,w={([a],[b])∈∆Î^G×∆Î^G| ∃A:A∈S₀^G({a}, r, w)∧ {b} ∈S^G_γ(w)(A, γ(w))}

∪ {([a],(A, w))∈∆^I^G×∆^I^G |A∈S₀^G({a}, r, w)}

∪ {((B, v),[b])∈∆^I^G×∆^I^G | ∃A:A∈S_γ(v)^G (B, r, πv(w))

∧ {b} ∈S_γ(w)^G (A, γ(w))}

∪ {((B, v),(A, w))∈∆^I^G×∆^I^G|A∈S_γ(v)^G (B, r, πv(w))}

Before proving thatIG is indeed a model of T, we generalize the definition of A^I^G^,w to probabilistic concepts:

X ∈S_γ(v)^G (B, πv(w)) iff (B, v)∈XÎ^G^,w forX ∈BC_T,(B, v)∈∆Î^G (3) X ∈S₀^G({a},0) iff [a]∈XÎ^G^,w forX ∈BC_T,[a]∈∆Î^G (4) To show this, we make a case distinction according to the kind of concept X is. First notice that in (3)X cannot be a nominal, since otherwiseB would be subsumed by one and hence not be in the domain∆Î^G as we assumed.

– IfX =>, then (3) and (4) are true by definition of >^I^G^,w =∆^I^G and the fact that>is in each subsumer set.

– IfX =A∈NC, then (3) and (4) are true by definition ofA^I^G^,w.