A Richer Policy Language for GDPR Compliance

(1)

Piero A. Bonatti¹, Iliana M. Petrova², and Luigi Sauro¹

1 Universit`a degli studi di Napoli Federico II

2 CeRICT

Abstract. We illustrate an OWL2 fragment properly tailored for the specifica- tion of both companies’ data protection policies and data subjects’ consent to data processing. Assessing a company’s policy for compliance with data subjects’

consent is then reduced to a subsumption checking. We provide a complete subsumption algorithm which combines tableaux-based and structural-subsumption techniques and show that, under some conditions, it performs in polynomial time.

1 Introduction

The European General Data Protection Regulation³ (GDPR) places tight restrictions on the processing of personal data and establishes substantial administrative fines to discourage infringements. Since collecting and processing personal data is a paramount source of innovation and revenue,data controllers(i.e. the personal and legal entities that process personal data) are interested in maximizing personal data usage within the limits posed by the GDPR.

The European H2020 project SPECIAL⁴ is aimed at supporting controllers with methodological and technological means to comply with the GDPR requirements ef- ficiently and safely. Compliance checking plays a crucial role in this picture: first, the data usage policies of the industrial partners must be compared with a (partial) formalization of the GDPR itself. Second, in the absence of a legitimate legal basis⁵, personal data cannot be stored or processed, even temporarily, without the explicit consent of the data subjects. As the number of data subjects (and their policies) can be as large as the number of customers of a major communication service provider, some of the project’s use cases consist in checking storage permissions against a stream of incom- ing data points, at the rate of hundreds of thousands per minute. Thus, a crucial tasks is the development of scalable reasoning procedures for compliance checking.

Here, we focus onPL, that is, the SPECIAL’s approach to the representation of data usage activities, consent to data processing, and selected parts of the GDPR.PLis a DL

3http://data.consilium.europa.eu/doc/document/

ST-5419-2016-INIT/en/pdf

4https://www.specialprivacy.eu/

5Such legal bases include public interest, the vital interests of the data subject, signed contracts, and the legitimate interests of the data controller, just to name a few (cf. GDPR Art. 6). How- ever, the regulation constrains the applicability of these legal bases with a number of provisos and caveats as the data minimization principle introduced in Art. 5, and the limitations to the legitimate interests of the controller rooted in Art. 6.1(f).

(2)

fragment specifically designed to balance expressiveness requirements to encode data usage policies and scalability of reasoning. A first formalization has been presented in [5]. In this paper, we describe an extention,PL⁺, that overcomes some limitations of expressiveness that arose from a further in-depth analysis of the GDPR on one side, and the policy specifications of the industrial partners involved in the project on the other.

More precisely,PL⁺enhancesPLin the following aspects:

– it extendsPLby including role domains in the ontology language and role chain agreements(R₁◦ · · · ◦R_n=S₁◦ · · · ◦S_m)in the query language;

– it relies on a new subsumption checking algorithm that combines tableaux-based and structural subsumption reasoning techniques. This algorithm is notably simpler than the one presented in [5] and hence we expect it to perform faster.

Thereafter, we focus on the technical aspects ofPL⁺referring to [5] for further moti- vations and examples.

2 Preliminaries

PL⁺is tailored to formalize policies regulating the usage of personal data. It can ex- press (i) the policies ofdata controllers(i.e. the organizations that collect and process personal data), (ii) the selective consent to data usage issued by data subjects, and (iii) parts of the GDPR related to consent management, user rights, and data transfer.

The aspects of data usage that have legal relevance are clearly indicated in several articles of the GDPR. In particular, the main properties of ausage policythat need to be encoded and archived include:

– reasons for data processing (purpose);

– which data categories are involved;

– what kind of processing is applied to the data;

– which third parties data are distributed to (recipients);

– countries in which the data will be stored (location);

– time limits for data erasure (duration).

Here we focus only on the technical aspects needed in this work and refer the reader to [3] for further details. ThePL⁺language is built from countably infinite and mutually disjoint sets of concept names (NC), role names (NR) and concrete feature names (NF).

AninterpretationI is a structureI = (∆Î,·Î)where∆Î is a nonempty set, and the interpretation function·Î, defined over the signature Σ = NC∪NR ∪NF, is s.t. (i) AÎ ⊆∆Î ifA∈NC; (ii)RÎ ⊆∆Î×∆Î ifR∈NR; (iii)fÎ ⊆∆Î×Niff ∈NF, whereNis the set of natural numbers. A pointed interpretation is a pairI, dwhereIis an interpretation anddis an element of the domain∆Î.

APL⁺knowledge baseKBis a finite set of axioms of the formfunc(f),func(R), dom(R, A),ran(R, A),disj(A, B), andAvB, whereAandBare concept names,R is a role name, andf denotes a concrete feature. An interpretationIsatisfiesan axiom αifIsatisfies the corresponding semantic condition in Figure 1.

Figure 2 shows the syntax and semantics ofPL⁺ operators which allow to de- fine query concepts. We say that a concept issimpleif no disjunction occurs in it. By applying the logical equivalences C1 u(C2tC3) ≡ (C1uC2)t(C1 uC3)and

(3)

Name Syntax Semantics

GCI AvB A^I ⊆B^I

disj disj(A, B) AÎ∩BÎ =∅ role functionality func(R) RÎis a partial function concr. feature functionality func(f) fÎis a partial function

domain dom(R, A)RÎ⊆AÎ×∆Î

range ran(R, A) RÎ⊆∆Î×AÎ

Fig. 1.Syntax and semantics ofPL⁺axioms.

∃R.(C1tC2) ≡ ∃R.C1t ∃R.C2 any conceptC can be reduced in a normal form C1t · · · tCnwhere eachCiis simple (clearly, the normal form can be exponentially longer thanC). A pointed interpretation satisfies a conceptC, formally I, d |= C, iff d ∈ CÎ. Then, I satisfies C in case I, d |= C for some d ∈ ∆Î. As usual, KB |=CvDmeans thatCÎ ⊆DÎ, for all modelsIofKB.

Example 1. A company – call it BeFit – sells a wearable fitness appliance and wants to process biometric data (stored in the EU) for sending health-related advice to a customer. Furthermore, BeFit wants to share the customer’s location data with a group of enterprises (located in the USA) engaged in a joint economic activity. According to Art. 47 of the GDPR, the transfer of personal data outside of EU is allowed towards recipients that enforce, so-called, binding corporate rules. Location data are kept for a minimum of one year but no longer than 5; biometric data are kept for an unspecified amount of time. In order to do all this legally, BeFit needs consent from its customers.

The consent BeFit acquires from a customer may look as follows:

(∃purpose.F itnessRecommendationu ∃data.BiometricDatau

∃processing.Analyticsu ∃recipient.BeF itu ∃store.∃location.EU) t

(∃purpose.M onetizeu ∃data.LocationDatau ∃processing.T ransf eru

∃recipient.U SEnterpriseGroupu ∃store.(∃location.U SAuduration: [1,5])u (controller◦ applyBindingCorporateRulesT o=recipient)u

∃controller.BeF it),

where integers represent years. Moreover,KBcan include the following axioms:

HeartRatevBiometricData ComputeAvgvAnalytics f unc(recipient)

Given the knowledge base, the first disjunct of the above consent allows BeFit to compute the average heart rate of the data subject in order to send her fitness rec- ommendations. The purpose could be further detailed by saying how the data subject is contacted, e.g. “recommendation via SMS”. Then, in the customer’s consent

(4)

Name Syntax Semantics

bottom ⊥ ⊥^I =∅

conjunction CuD (CuD)Î =CÎ∩DÎ

disjunction CtD (CtD)Î =CÎ∪DÎ

existential restriction

∃R.C {d∈∆Î| ∃(d, e)∈RÎ:e∈CÎ}

concrete constraint

f: [l, u] {d∈∆^I| ∃w∈Ns.t.(d, w)∈f^Iandl≤ w≤u)

existential chain restriction

(R1◦ · · · ◦Rn=S1◦ · · · ◦Sm){d ∈ ∆Î | ∃d¯s.t.(d,d)¯ ∈ (R1◦ · · · ◦ Rn)Î∩(S1◦ · · · ◦Sm)Î}

Fig. 2.Syntax and semantics ofPL⁺(query) concepts.

∃purpose.F itnessRecommendationshould be replaced with something like

∃purpose.(F itnessRecommendationu ∃contact.SMS).

Note also that, sincerecipientis a functional role, the chain agreement in the second disjunct imposes to the recipient the application of the controller’s binding corporate

rules. ut

An AboxAis a finite set of assertions of the formC(x)andR(x, y), whereRis a role name,C is a simple query concept, andxandy belong to a countably infinite set of variable namesVR. In particular, we assume thatVRcontains a distinguished variablex0. An assignmentπ : VR→ ∆Î interprets all variables inVRas elements of an interpretation I. LetVR_A ⊆VRbe the set of variable names occurring inA, we say thatπandπ⁰ agree inAifπ(x) = π⁰(x), for allx ∈VR_A. Then, a pairI, π satisfies an AboxA(formally,I, π|=A) iff (i)π(x)∈CÎ, for allC(x)∈ A, and (ii) (π(x), π(y))∈RÎ, for allR(x, y)∈ A.

Finally, we say thatC(resp.A) andDareinterval safeiff for all constraintsf : [l, u]

occurring inC (resp. for allf : [l, u]s.t.f : [l, u](x) ∈ A, for some variablex) and f : [l⁰, u⁰]occurring inD, either[l, u]⊆[l⁰, u⁰], or[l, u]∩[l⁰, u⁰] =∅.

3 The subsumption algorithm

Checking whetherKB |= C vD is performed in three phases. The first phase pre- processes the conceptC. First of all, for eachf : [l, u]inC, letx1< x2<· · ·< xrbe the integers that occur as interval endpoints in sub-conceptf : [l⁰, u⁰]ofDand belong to[l, u]. Letx0=`andxr+1=uand replacef[l, u]with the equivalent concept

r

G

i=0

f: [xi, xi]tf: [xi+ 1, xi+1−1]

tf: [xr+1, xr+1]. (1) Doing so, we ensure thatCandDare interval safe. Then,Cis translated in the corresponding normal formC1t · · · tCn. As a consequence, a knowledge baseKBentails

(5)

Rule→u if 1. C1uC2(x)∈ A and 2. {C1(x), C2(x)} 6⊆ A, then set A⁰← A ∪ {C1(x), C2(x)}

Rule→v if 1. A(x)∈ A and 2. AvB∈KB, and 3. B(x)6∈ A then set A⁰← A ∪ {B(x)}

Rule→∃ if 1. ∃R.C(x)∈ A and 2. for noy{R(x, y), C(y)} ⊆ A

then create a new variableynewand setA⁰← A ∪ {R(x, ynew), C(ynew)}

Rule→dom if 1. R(x, y)∈ A and 2. dom(R, A)∈KB, and 3. A(x)6∈ A, then set A⁰← A ∪ {A(x)}

Rule→ran if 1. R(x, y)∈ A and 2. ran(R, A)∈KB, and 3. A(y)6∈ A then set A⁰← A ∪ {A(y)}

Rule→func(R) if 1. {R(x, y), R(x, z)} ⊆ A and 2. func(R)∈KB then set A⁰← A[y/z]

Rule→func(f) if 1. {f: [l, u](x), f: [v, w](x)} ⊆ A and 2. func(f)∈KB then set A⁰← A ∪ {f: [max(l, v),min(u, w)]}

Rule→= if 1. (R1◦ · · · ◦Rn=S1◦ · · · ◦Sm)(x)∈ A, and 2. there does not existy1, . . . , yn−1, z1, . . . , zm−1,y¯s.t.

{R1(x, y1), S1(x, z1)} ⊆ A,

Ri(yi−1, yi)∈ A, fori= 1, . . . , n−1, Sj(zj−1, zj)∈ A, forj= 1, . . . , m−1, {Rn(yn−1,y), S¯ m(zm−1,y} ⊆ A¯

then create new variablesy^new₁ , . . . , y^newn−1, z₁^new, . . . , zm−1^new,y¯^new and set A⁰← A ∪ {R1(x, y^new1 ), . . . , Rn(y^newn−1,y¯^new)}

∪ {S1(x, z^new1 ), . . . , Sm(zm−1^new,y¯^new)}

Rule→disj if 1. {A(x), B(x)} ⊆ A and 2. disj(A, B)∈KB then set A⁰← A ∪ {⊥(x)}

Rule→f_⊥ if 1. f: [l, u](x)∈ A and 2. u < l then set A⁰← A ∪ {⊥(x)}

Fig. 3.PL⁺tableaux rules CvDiffKB |=CivD, for alli= 1, . . . , n⁶.

After the preprocessing phase, the subsumption problem is reduced to checking whether each simple conceptCi is subsumed byD. For a simple conceptCi, during the second phase, an AboxA0is first initialised with aC_i(x₀)and then expanded by exhaustively applying the tableaux rules in Figure 3. The subsumption vacuously holds if the resulting AboxAcontains assertions of the form⊥(x)for somex. In contrary case, the third phase checks whetherx0 ∈ hhDii_A, wherehhDii_Ais the set of variable names contained inVR_Ainductively defined as follows:

6Note that, differently from the subsumption algorithm presented in [5],Dis not normalized.

This may significantly impact on performance since the normalization process might exponentially inflateD.

(6)

hhAiiA={x∈VRA|A(x)∈ A}

hhD1uD2iiA={x∈VRA|x∈ hhD1iiA∩ hhD2iiA} hhD1tD2iiA={x∈VRA|x∈ hhD1iiA∪ hhD2iiA}

hh∃R.DiiA={x∈VRA| ∃y∈VRAs.t.R(x, y)∈ A andy∈ hhDiiA}

hhR1◦ · · · ◦Rn=S1◦ · · · ◦SmiiA={x∈VRA| ∃(y, z)∈VR²As.t.(y, z)∈ hhRn, SmiiA

andx∈ hhR1◦ · · · ◦Rn−1, yiiA

andx∈ hhS1◦ · · · ◦Sm−1, ziiA}

hhR, SiiA={(y, z)∈VR²A| ∃w∈VRAs.t.R(y, w)∈ A andS(z, w)∈ A}

hh·, yiiA={y}

hhRˆ1◦ · · · ◦Rˆk, yiiA={x∈VRA| ∃w∈VRAs.t.Rˆ1(x, w)∈ A andw∈ hhRˆ2◦ · · · ◦Rˆk, yiiA}

hhf: [a, b]iiA={x∈VRA| ∃a⁰, b⁰∈Ns.t.f: [a⁰, b⁰](x)∈ A anda≤a⁰≤b⁰≤b}

Note that hhR, SiiA, hhRˆ1 ◦ · · · ◦Rˆk, yiiA, and hh·, yiiA are only used to support the computation of the set hhR1◦ · · · ◦Rn = S1◦ · · · ◦SmiiAof variable names in VR_A that satisfies a role chain agreement. Intuitively,hhRn, SmiiAcomputes the set of pairs (y, z)that are connected to a single variable through the role namesRnandSm. Then, hhR1◦ · · · ◦Rn−1, yiiA(resp.hhS1◦ · · · ◦Sm−1, ziiA) recursively computes the set of variables connected toy(resp. toz) through a role chainR1◦· · ·◦Rn−1(resp.S1◦· · ·◦Sm−1).

Thereafter,A → A⁰ means thatA⁰results from the application of a rule in Figure 3 toA. As usual→^∗ denotes the reflexive and transitive closure of the→relation.A is complete (w.r.t.KB) if no rule is applicable and is clash-free if it does not contain assertions of the form⊥(x). Then, letCbe a simple concept andKBaPL⁺knowledge base, we say thatAis the tableaux ofCw.r.t.KBifAis complete and{C(x0)}→ A.^∗ Lemma 1. LetAandA⁰be two Aboxes overVRs.t.A → A⁰. For all modelsIofKB andd∈∆^I, we have thatI, d|=AiffI, d|=A⁰.

Proof. First, considerA →_func(R) A⁰and assume thatI, d|=A. This means that for some assignmentπ, withπ(x0) =d,I, π|=A. Since the rule→_func(R)is applicable to A, it must be the case thatI, π|=R(x, y1)andI, π|=R(x, y2), for somex, y1, y2∈ VR_Aand a roleRs.t.func(R)∈K. Futhermore,I |=KBby hypothesis, soπ(y1) = π(y2) and hence I, π |= A⁰. Vice versa, assume that I, d |= A⁰ and let π be the corresponding assignment. Sincey2does not occur inA⁰,πis free to mapy2arbitrarily.

Then, in order to satisfyA, it is sufficient to assignπ(y1)toy2. Since in both directions the assignment ofx0has not changed, we have thatI, d|=AiffI, d|=A⁰.

Regarding the other rules, note thatA ⊆ A⁰and henceI, d|=A⁰impliesI, d|=A by monotonicity. For the opposite direction, assume thatI, π|=A, whereπ(x₀) =d.

CaseA →u A⁰. Since→uis applicable toA,(C₁uC₂)(x)must occur inA, for some simple conceptsC₁, C₂and a variablex∈VRA.I, π|=C₁(x)andI, π|=C₂(x)

(7)

directly follow from the assumptionI, π|=A. Consequently,I, π|=A⁰.

CaseA →_vA⁰. Since→_vis applicable toA, for some atomic conceptsA, Band a variablex∈VR_A,A(x)∈ AandAvB ∈KB. Then, fromI, π|=AandI |=KB, it follows thatI, π|=B(x)and henceI, π|=A⁰.

CaseA →_∃ A⁰. From the precondition of the rule →_∃, we have that ∃R.C(x) occurs inA. Moreover, from the assumptionI, d |=A, it holdsdx ∈ (∃R.C)Î, with π(x) =dx, which means that there exists an individuald⁰s.t.d⁰ ∈CÎ and(dx, d⁰)∈ RÎ. Sincey^newdoes not occur inA, we can assignπ(y^new) =d⁰and obtainI, π|=A⁰. CaseA →domA⁰. From the precondition of→dom,R(x, y)∈ Aanddom(R, A)∈ KB. So, anyI, πsatisfyingAandKBmust also satisfyA(x)and henceA⁰.

CaseA →ranA⁰. Similar to the previous case.

CaseA →func(f)A⁰. From the precondition of the→func(f)rule, we have thatI, π satisfyf : [l, u](x)andf : [v, w](x)where, according toKB,f is a function from∆Î toN. Then, givend_x =π(x), bothl≤fÎ(d_x)≤uandv≤fÎ(d_x)≤whold which clearly means thatmax(l, v)≤fÎ(d_x)≤min(u, w). Consequently,I, π|=A⁰.

CaseA →₌ A⁰. By assumptionI, π |= Aand (R₁ ◦ · · · ◦R_n = S₁ ◦ · · · ◦ S_m)(x) ∈ A. Letd_x be the individual associated to the variablexby π. There exist two pathsd_x, d⁰₁, . . . , d⁰_n−1,d¯andd_x, d⁰⁰₁, . . . , d⁰⁰_m−1,d¯s.t. (i) (d_x, d⁰₁) ∈ RÎ₁ and (dx, d⁰⁰₁) ∈ S₁Î; (ii)(d⁰_i−1, d⁰_i) ∈ RÎ_i, withi = 2, . . . , n−1; (iii)(d⁰⁰_j−1, d⁰⁰_j) ∈ S_jÎ, withj= 2, . . . , m−1; (iv)(d⁰_n−1,d)¯ ∈RÎ_nand(d⁰⁰_m−1,d)¯ ∈S_mÎ. Since the variables y₁^new, . . . , y_n−1^new, z₁^new, . . . , z_m−1^new,y¯^neware fresh, we can extendπby mapping them to d⁰₁, . . . , d⁰_n−1, d⁰⁰₁, . . . , d⁰⁰_m−1,d, respectively.¯ I, π|=A⁰follows by construction.

CaseA →disjA⁰. According to the precondition of→disj,{A(x), B(x)} ⊆ Aand disj(A, B) ∈ KB. Then, this case is vacuously satisfied since there does not exist a modelIofKB and an assignmentπthat satisfyA.

CaseA →f⊥ A⁰.Vacuously satisfied, otherwise we havel≤f^I(d)≤uandu < l.

Finally, note that in all of the previous casesπhas been extended by mapping only fresh variables. The assignment ofx0therefore never changes, henceI, d|=A⁰. ut LetAbe a complete and clash-free Abox, a canonical model ofAis an interpreta- tionJ_A= (∆^J^A,·^J^A)satisfying the following conditions:

1. ∆^J^A =VRA;

2. A^J^A ={x∈VR_A|A(x)∈ A};

3. R^J^A ={(x, y)∈VR_A×VR_A|R(x, y)∈ A};

4. iff is non-functional (i.e.func(f)6∈KB), then

(a) f^J^A ⊆ {(x, h)∈VR_A×N| ∃l, u∈Ns.t.f : [l, u](x)∈ Aandl≤h≤u};

(b) for eachf : [l, u](x)∈ A, there exists at least onel≤h≤us.t.(x, h)∈f^J^A; 5. iff is functional, then(x, h)∈f^J^A iff

(a) h= max{l|f : [l, u](x)∈ A}orh= min{u|f : [l, u](x)∈ A};

(b) for eachh⁰6=h,(x, h⁰)6∈f^J^A.

Note that all canonical models share the same domain as well as the same interpretation of atomic concepts and roles. Concrete features involve multiple canonical models, in particular condition 4 requires a canonical model to satisfy all assertionsf : [l, u](x)for a not functionalf occurring inA. Condition 5 associates to a variablexthe maximum lower (resp. minimum upper) bound of the intervals relative toxand a functionalf.

(8)

Lemmas 2 and 3 show that a canonical modelJ_AofAsatisfies bothAandKB. Lemma 2. LetAbe a complete and clash-free Abox,JAa canonical model ofAand πthe identity assignment s.t.π(x) =x, for allxoccurring inA. Then,JA, π|=A.

Proof. Clearly,JAandπ satisfy all assertionsA(x)andR(x, y)occurring in Aby construction. Moreover, givenf : [l, u](x) ∈ Acorresponding to non-functional feature, by construction there exists a valuel ≤h≤us.t.(x, h)∈f_A^J (see condition 4).

As a consequence, alsof : [l, u](x)∈ Ais satisfied.

Letfbe a functional concrete feature and assume by contradiction thatJA, π6|=f : [l, u](x), withf : [l, u](x) ∈ A. By construction,f_A^J associates toxa single valueh which is eithermax{l |f : [l, u](x)∈ A}ormin{u|f : [l, u](x)∈ A}. Consider the first case and letf : [ˆl,u](x)ˆ be an assertion inAs.t.ˆl= max{l |f : [l, u](x)∈ A}.

From JA 6|= f : [l, u](x) and the fact thatl ≤ ˆl it follows thatˆl > u. Moreover, sinceAis complete, by applying→_func(f)tof : [l, u](x)andf : [ˆl,u](x)ˆ we obtain thatf : [max(l,ˆl),min(u,u)](x)ˆ is also inA. Then, we have thatmax(l,ˆl) = ˆland min(u,u)ˆ ≤u <ˆl. Consequently, by application of rule→f⊥we would have that⊥(x) occurs inAagainst the hypothesis thatAis clash-free. Symmetrically, the assumption thatf_A^J associates toxthe minimum upper bound once again results in⊥(x)∈ A. ut Lemma 3. LetAbe clash-free and complete w.r.t.KB, then a canonical modelJAof Ais also a model ofKB.

Proof. The proof is by reduction ad absurdum. AssumeJAdoes not satisfyAvB ∈ KB. Then by construction, for some variablex∈ VR_A,A(x) ∈ AandB(x) 6∈ A.

However, in this case→vwould be applicable toA. Analogously, the assumptions that J_A does not satisfydom(R) v A ∈ KB,ran(R) v A ∈ KB andfunc(R) ∈ KB all lead to a contradiction of the hypothesis thatAis complete. Finally, for eachf s.t.

func(f)∈KB,f_A^J is functional by construction. ut The following lemmas prove that, under the interval safety assumption, the semantic consequences of a knowledge base and a complete Abox can be computed byhh·ii_A. Lemma 4. LetAbe clash-free and complete w.r.t.KB andDa concept s.t.DandA are interval safe. Then,KB,A |=D(x)iffJ_A, x|=D, for all canonical modelsJ_A. Proof. For the the left-to-right direction, we already know that each canonical model J_AsatisfiesKBandJ_A, π|=A, whereπis the identity assignment. Then,KB,A |= D(x)in particular impliesJ_A, π|=D(x)and, sinceπ(x) =x, we have thatJ_A, x|= D. The right-to-left direction can be proved by induction on the structure ofD.

CaseD = A. By definition,J_A, x |= Aiffx ∈ A^J^A iffA(x) ∈ A. Moreover, given a modelI ofKB and an assignmentπ, the facts thatI, π |=AandA(x) ∈ A imply by definition thatI, π|=A(x). Hence,KB,A |=A(x).

CaseD=f : [l, u], wheref is not functional. We show that ifJ_A, x|=f : [l, u], for all canonical modelsJ_A, thenAcontains an assertionf : [l⁰, u⁰](x)withl⁰≥land u⁰ ≤u. This clearly ensures thatKB,A |= f : [l, u](x). Letf : [l1, u1](x), . . . , f : [ln, un](x)be the assertions inArelative tof andx, and assume by contradiction that

(9)

for alli = 1, . . . , n [l_i, u_i] 6⊆ [l, u]. Since D andA are interval safe, we have that [l_i, u_i]∩[l, u] = ∅. Then, by condition 4a, it follows thatJ_Athat does not satisfies f : [l, u], a contradiction.

Case D = f : [l, u], where f is functional. By construction, that all canonical models of A satisfy f : [l, u] means that both ˆl = maxf:[l,u](x)∈Al anduˆ = minf:[l,u](x)∈Aulie in the interval [l, u]. Then, on the one hand, beingAclash-free ensures thatˆl≤ˆuand hence[ˆl,u]ˆ is an interval contained in[l, u]. On the other hand, sinceAis complete,f : [ˆl,u](x)ˆ occurs inAand henceKB,A |=f : [l, u](x).

CasesD =D₁uD₂, D =D₁tD₂, D =∃R.D⁰ andD = (R₁◦ · · · ◦R_n = S₁◦ · · · ◦S_m)follow directly from the definitions and the induction hypothesis. ut Lemma 5. LetAbe clash-free and complete w.r.t.KB andDa concept s.t.DandA are interval safe. Then,x∈ hhDii_AiffJ_A, x|=D, for all canonical modelsJ_A. Proof. The proof is by induction on the structure ofD.

CaseD=A. In this casehhDii_A=A^J_A, for all canonical modelsJ_A.

CaseD=D1uD2. We have thatJ_A, x|=D1uD2, for allJ_AiffJ_A, x|=D1, for allJ_A, andJ_A, x|=D2, for allJ_A. By induction, this holds iffx∈ hhD1ii_A∩ hhD2ii_A and hencex∈ hhD1uD2ii_A.

Analogously, the cases whenD=D1tD2, D=∃R.D⁰andD= (R1◦· · ·◦Rn= S1◦ · · · ◦Sm)follow directly from the definitions applying the induction hypothesis.

CaseD =f : [l, u]. For the left-to-right direction,x ∈ hhf : [l, u]iiAmeans that there exists some l⁰, u⁰ s.t. l ≤ l⁰ ≤ u⁰ ≤ uandf : [l⁰, u⁰](x) ∈ A. However, this ensures thatKB,A |=f : [l, u](x)and by Lemma 4 it follows thatJA, x|=f : [l, u], for all canonical models ofA. For the opposite direction, assume that all canonical models satisfyf : [l, u]inx. As we know from Lemma 4, whetherf is functional or not, this ensures that there exists an assertionf : [l⁰, u⁰](x)inAwith[l⁰, u⁰] ⊆[l, u].

By construction follows thatx∈ hhf : [l, u]ii_A. ut

The following theorem shows that the combined subsumption checking algorithm is correct and complete.

Theorem 1. LetKB be aPL⁺knowledge base,Ca simple concept andDa concept s.t.CandDare interval safe. Then,KB |=CvDiff the tableauxAofCcontains a clash orx0∈ hhDiiA.

Proof. For the left-to-right direction, assume by contraposition thatAis clash-free and x06∈ hhDiiA. By Lemma 5, we have thatJA, x0 6|=D. Then, by Lemmas 3 and 1,JA

is a model ofKBandJA, x₀|=C. It follows thatKB 6|=CvD.

For the opposite direction, ifA contains a clash, by Lemma 1 no model ofKB satisfiesC. Hence,KB |= C v Dvacuously holds. Finally, assume thatAis clash- free andx₀ ∈ hhDii_A. By Lemmas 5 and 3 we haveKB,A |= D. LetI be a model of KB s.t. I, d |= C. Clearly, by assigningx₀ tod,I, d |= C(x₀)holds and hence I, d |=A(Lemma 1). Then,I is a model ofKBand, sinceI, d |=A, there exists an assignmentπs.t.I, π|=A. SinceKB,A |=Dit follows thatI, d|=D ut Finally, we show that tableaux-based expansion in the second phase together with the structural subsumption in the third phase can be performed in polynomial time.

(10)

Theorem 2. LetKB be aPL⁺knowledge base,Ca simple concept andDa concept s.t.CandDare interval safe. Subsumption checkingKB |=CvDis PTIME.

Proof. With respect to the tableaux construction, we show that the resulting AboxAis polynomial in the size ofKBandC. First of all, the number of variable names occurring inAdepends on the application of→∃and→=rules. Since rule→∃generates a single variable name, rule→₌generatesn+m−1variables at a time and the total number of their applications is bounded by|C|, it follows thatVR_Ais proportional to|C|.

BeingVRA ∝ |C|, it is straightforward to see that rule→u can be applied at most|C|times whereas the number of possible applications of rules→v,→dom,→ran,

→func(R), and→disjis bounded by|C| · |KB|. Finally, the applications of→func(f)and

→_f_⊥ are at most|C|³· |KB|. Then, the size of the resulting tableaux is polynomial in the size ofKB andC and hence the second phase of the subsumption algorithm is performed in polynomial time.

Then, checkingx₀ ∈ hhDiiincrementally updates the extension of the subconcepts ofDoverA. Consequently, the third phase can be performed inO(|A| · |D|). ut As said above, in caseC is not simple, the preprocessing reduces it in a normal formC₁t · · · tC_n that can be exponentially longer thanC. This in particular may occur ifCandDare not interval safe, due to introduction of disjunctions in the interval normalization (1), thus making the subsumption checking (see also Theorem 7 in [5]).

However, the subsumption algorithm can be optimized in several ways in order to improve performance. A first optimization concerns the interval normalization. Clearly, the number of resulting interval splits depend on the set of distinct end-points occurring inD. We can reduce this set by using the following syntactic replacements inD:

f[l1, u1]tf[l2, u2] f[min{l1, l2},max{u1, u2}] if[l1, u1]∪[l2, u2]is an interval (2) f[l1, u1]uf[l2, u2] f[¯l,u]¯ if¯l≤u¯andfis functional. (3) where¯l= max{l1, l2}andu¯= min{u1, u2}.

Standard optimizations can also be used to speed up the tablaux expansion. For instance, rules →_∃ and→_func(R) can be merged as follows: given ∃R.C(x), if R is not functional orxhas noR-successor, then→_∃is applied as usual. Otherwise, Cis added to the (single)R-successor ofx. The rule→_func(f)can as well be optimized. Let f : [l1, u1], . . . , f : [ln, un]be the interval constrains occuring inC that are assigned to a variablex,¯lthe maximum lower bound of{l1, . . . , ln}andu¯the minimum upper bound of{u1, . . . , un}. It is easy to see that eitheru <¯ ¯l, that isCis inconsistent w.r.t.

KB, or[¯l,u]¯ is an interval that by construction is contained in all[li, ui]. Therefore, we can replacef : [l1, u1](x), . . . , f : [ln, un](x)inAwith a single assertionf : [¯l,u](x).¯

4 Discussion

We have introduced the description logicPL⁺which allows to formalize the data usage policies adopted by controllers as well as the consent to data processing granted by data subjects. Checking whether the controllers’ policies comply with the available consent boils down to subsumption checking betweenPL⁺concepts.

(11)

The frequency of compliance checks can be high, soPL⁺has been designed to ad- dress scalability requirements by making the language as simple as possible. As a consequence, the ontology and the query languages are notably different from each other.

In particular, the ontology language largely intersects some fragments of the EL and DL-lite families of DLs [2, 4, 1]. Conversely, the query language supports disjunctions, role chain agreements, and non-convex interval constraints in order to model limitations on data storage duration. These features placePL⁺outside the space of Horn DLs [10, 11] – and in particular the tractable profiles of OWL2.

We also presented a subsumption algorithm which, in broad terms, reminds the structural subsumption algorithm for the description logic underlying CLASSIC [9].

However, CLASSIC supports neither concept unions (t) nor qualified existential restrictions (∃R.C). On the other hand, CLASSIC additionally supports number restrictions and qualified universal restrictions (that strictly generalize the range restrictions ofPL⁺), which make it incomparable toPL⁺.

To the best of our knowledge all previous encodings of policies in KR languages [14, 12, 6], focus on trust management and access control, rather than data usage control. As a consequence, those languages lack the terms for expressing privacy-related and usage-related concepts. A crucial drawback is that the main reasoning task in those frameworks is permission checking; policy comparison (which is central to our work) is not considered. Both Rei and Protune [12, 6] support logic program rules. However, if rules are recursive the policy comparison results undecidable, otherwise, it isNP-hard.

Similar considerations apply to KAoS [14], a framework based on a DL that, in general, is not tractable, and supports role-value maps that in general make reasoning undecidable (see [3], Ch. 5). The authors do not discuss how to avoid this issue. In comparison, checking compliance betweenPL⁺policies is tractable if there is a constant bound on the number of disjuncts occurring in a normalized concept.

As future work, we are planning to implement the presented algorithm and to as- sess its scalability experimentally against real data usage policies provided by SPE- CIAL’s industrial partners. Moreover, we are planning to study three further extensions of PLmotivated by SPECIAL’s use cases. Two of them are needed by a novel, dy- namic method for collecting consent. This method is based on fine-grained authorizations, where the data categories involved consist of single data points (e.g. a specific location). This can be achieved by introducing nominals in PL. Moreover, the data subject should be able to deny consent for some of such specific authorizations, which amounts to introducing some form of negation in the policy language. The third extension is needed to support so-calledsticky policies[13], that constitute a sort of a license that applies to the data released to third parties. The recipient is allowed to use the data according to the sticky policy. In order to apply the same sticky policy transitively to arbitrary sequences of data transfers, a construct similar to fixpoints is needed (see [8, 7] for an extension of description logics with fixpoints).

Acknowledgments

This research is funded by the European Unions Horizon 2020 research and innovation programme under grant agreement N. 731601.

(12)

References

1. A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. The dl-lite family and relations.J. Artif. Intell. Res., 36:1–69, 2009.

2. F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope. InIJCAI-05, pages 364–369.

Professional Book Center, 2005.

3. F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors.

The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2003.

4. F. Baader, C. Lutz, and S. Brandt. Pushing the EL envelope further. In K. Clark and P. F.

Patel-Schneider, editors,Proceedings of the Fourth OWLED Workshop on OWL: Experiences and Directions, Washington, DC, USA, 1-2 April 2008., volume 496 ofCEUR Workshop Proceedings. CEUR-WS.org, 2008.

5. P. A. Bonatti. Fast compliance checking in an OWL2 fragment. In J. Lang, editor,Proceed- ings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden., pages 1746–1752. ijcai.org, 2018.

6. P. A. Bonatti, J. L. D. Coi, D. Olmedilla, and L. Sauro. A rule-based trust negotiation system.

IEEE Trans. Knowl. Data Eng., 22(11):1507–1520, 2010.

7. P. A. Bonatti, C. Lutz, A. Murano, and M. Y. Vardi. The complexity of enriched mu-calculi.

Logical Methods in Computer Science, 4(3), 2008.

8. P. A. Bonatti and A. Peron. On the undecidability of logics with converse, nominals, recur- sion and counting.Artif. Intell., 158(1):75–96, 2004.

9. A. Borgida and P. F. Patel-Schneider. A semantics and complete algorithm for subsumption in the CLASSIC description logic.J. Artif. Intell. Res., 1:277–308, 1994.

10. D. Carral, C. Feier, B. C. Grau, P. Hitzler, and I. Horrocks.EL-ifying ontologies. In S. Demri, D. Kapur, and C. Weidenbach, editors,Automated Reasoning - 7th International Joint Con- ference, IJCAR 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Aus- tria, July 19-22, 2014. Proceedings, volume 8562 ofLecture Notes in Computer Science, pages 464–479. Springer, 2014.

11. D. Carral, C. Feier, B. C. Grau, P. Hitzler, and I. Horrocks. Pushing the boundaries of tractable ontology reasoning. In P. Mika, T. Tudorache, A. Bernstein, C. Welty, C. A.

Knoblock, D. Vrandecic, P. T. Groth, N. F. Noy, K. Janowicz, and C. A. Goble, editors, The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part II, volume 8797 ofLecture Notes in Computer Science, pages 148–163. Springer, 2014.

12. L. Kagal, T. W. Finin, and A. Joshi. A policy language for a pervasive computing environ- ment. In4th IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY), pages 63–, Lake Como, Italy, June 2003. IEEE Computer Society.

13. S. Pearson and M. C. Mont. Sticky policies: An approach for managing privacy across multiple parties. IEEE Computer, 44(9):60–68, 2011.

14. A. Uszok, J. M. Bradshaw, R. Jeffers, N. Suri, P. J. Hayes, M. R. Breedy, L. Bunch, M. John- son, S. Kulkarni, and J. Lott. KAoS policy and domain services: Towards a description-logic approach to policy representation, deconfliction, and enforcement. In4th IEEE International Workshop on Policies for Distributed Systems and Networks (POLICY), pages 93–96, Lake Como, Italy, June 2003. IEEE Computer Society.