
Planning and Change in Graph Structured Data under Description Logics Constraints

Shqiponja Ahmetaj¹, Diego Calvanese², Magdalena Ortiz¹, and Mantas Šimkus¹

¹ Institute of Information Systems, Vienna University of Technology, Austria

² KRDB Research Centre for Knowledge and Data, Free University of Bozen-Bolzano, Italy

1 Introduction

The complex structure and increasing size of information that has to be managed in today’s applications calls for flexible mechanisms for storing such information, making it easily and efficiently accessible, and facilitating its change and evolution over time.

The paradigm of graph structured data (GSD) [8] has recently gained popularity as an alternative to traditional relational DBs: it provides more flexibility and can thus overcome the limitations of a rigid structure imposed a priori on the data. Indeed, unlike relational data, GSD do not require a schema to be fixed a priori.

This flexibility makes them well suited for many emerging application areas such as managing Web data, information integration, persistent storage in object-oriented software development, or management of scientific data. Concrete examples of models for GSD are RDFS [4], object-oriented data models, and XML.

Here, we build on recent work that advocates the use of Description Logics (DLs) for managing the change in GSD [7] that results from agents or users executing actions. We consider GSD, understood in a broad sense, as information represented by means of a node- and edge-labeled graph, in which the labels convey semantic information.

We identify GSD with the finite structures over which DLs are interpreted, and use DL knowledge bases as descriptions of constraints and properties of the data. We express actions using a specially tailored action language in which actions are finite sequences of (possibly conditional) insertions and deletions performed on the extensions of labels.

In this setting, the static verification problem, which consists in deciding whether the execution of a given action will preserve some given integrity constraints on any possible GSD, has been studied in [7]. Here, we discuss further problems that can be considered as variants of planning, such as deciding whether there is a sequence of actions that leads a given structure into a state where some property (either desired or not) holds, or deciding whether a given sequence of actions leads every structure into a state where some property necessarily holds. We develop algorithms for variations of these problems and characterize their computational complexity. We consider both the case of a known initial state and that of an arbitrary one. We show that existence of a plan (of unbounded length) is undecidable even for lightweight DLs and simple forms of actions. Motivated by this, we study planning for plans of bounded length, and provide tight complexity bounds for the considered variants of the problem. An extended version of this work, with proofs of the technical results and a more detailed discussion, can be found in [1].

This research has been partially supported by FWF projects T515 and P25518, by WWTF project ICT12-015, by EU IP Project Optique FP7-318338, and by the Wolfgang Pauli Institute.

2 Description Logics for Graph Structured Data

We now introduce the DL ALCHOIQbr, which we use to describe constraints on GSD.

In DLs, the domain of interest is modeled using individuals (denoting objects), concepts (denoting sets of objects), and roles (denoting binary relations between objects). We assume countably infinite sets $N_R$ of role names, $N_C$ of concept names, $N_I$ of individual names, and $N_V$ of variables. Roles are defined inductively: (i) if $p \in N_R$, then $p$ and $p^-$ (the inverse of $p$) are roles; (ii) if $\{t, t'\} \subseteq N_I \cup N_V$, then $\{(t, t')\}$ is a role; (iii) if $r_1, r_2$ are roles, then $r_1 \cup r_2$ and $r_1 \setminus r_2$ are roles; and (iv) if $r$ is a role and $C$ is a concept, then $r|_C$ is a role. Concepts are defined inductively as well: (i) each $A \in N_C$ is a concept; (ii) if $t \in N_I \cup N_V$, then $\{t\}$ is a concept (called nominal); (iii) if $C_1, C_2$ are concepts, then $C_1 \sqcap C_2$, $C_1 \sqcup C_2$, and $\neg C_1$ are concepts; (iv) if $r$ is a role, $C$ is a concept, and $n$ is a non-negative integer, then $\exists r.C$, $\forall r.C$, $\leq n\, r.C$, and $\geq n\, r.C$ are concepts.

A concept (resp., role) inclusion has the form $\alpha_1 \sqsubseteq \alpha_2$, where $\alpha_1, \alpha_2$ are concepts (resp., roles). A concept (resp., role) assertion has the form $t : C$ (resp., $(t, t') : r$), where $\{t, t'\} \subseteq N_I \cup N_V$, $C$ is a concept, and $r$ is a role. Concepts, roles, inclusions, and assertions without variables are called ordinary. We define (ALCHOIQbr-)formulae inductively: inclusions and assertions are formulae, and if $\mathcal{K}_1, \mathcal{K}_2$ are formulae, so are $\mathcal{K}_1 \wedge \mathcal{K}_2$, $\mathcal{K}_1 \vee \mathcal{K}_2$, and $\neg \mathcal{K}_1$. A knowledge base (KB) is a formula with no variables.

Notice that ALCHOIQbr extends the expressive DL ALCHOIQ [3] with Boolean combinations of axioms, a constructor for a singleton role, union, difference and restrictions of roles, and variables as place-holders for individuals. We also consider here the lightweight DL DL-Lite [6], which closely matches the expressive power of traditional conceptual modeling formalisms, such as UML class diagrams and ER schemas.

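As usual in DLs, the semantics is given in terms of interpretations. An interpretation is a pair $\mathcal{I} = \langle \Delta^{\mathcal{I}}, \cdot^{\mathcal{I}} \rangle$, where $\Delta^{\mathcal{I}} \neq \emptyset$ is the domain, $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$ for each $A \in N_C$, $p^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$ for each $p \in N_R$, and $o^{\mathcal{I}} \in \Delta^{\mathcal{I}}$ for each $o \in N_I$. We make the unique name assumption (UNA), i.e., distinct individuals are interpreted as distinct objects. For ordinary roles $\{(o_1, o_2)\}$, we let $\{(o_1, o_2)\}^{\mathcal{I}} = \{(o_1^{\mathcal{I}}, o_2^{\mathcal{I}})\}$, and for ordinary roles $r|_C$, we let $(r|_C)^{\mathcal{I}} = \{(e_1, e_2) \mid (e_1, e_2) \in r^{\mathcal{I}} \text{ and } e_2 \in C^{\mathcal{I}}\}$. The function $\cdot^{\mathcal{I}}$ is extended to the remaining ordinary concepts and roles in the usual way [3].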

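An interpretation $\mathcal{I}$ satisfies an ordinary inclusion $\alpha_1 \sqsubseteq \alpha_2$, denoted $\mathcal{I} \models \alpha_1 \sqsubseteq \alpha_2$, if $\alpha_1^{\mathcal{I}} \subseteq \alpha_2^{\mathcal{I}}$, and an ordinary assertion $\beta = o : C$ (resp., $\beta = (o_1, o_2) : r$), denoted $\mathcal{I} \models \beta$, if $o^{\mathcal{I}} \in C^{\mathcal{I}}$ (resp., $(o_1^{\mathcal{I}}, o_2^{\mathcal{I}}) \in r^{\mathcal{I}}$). Satisfaction is extended to knowledge bases as follows: (i) $\mathcal{I} \models \mathcal{K}_1 \wedge \mathcal{K}_2$ if $\mathcal{I} \models \mathcal{K}_1$ and $\mathcal{I} \models \mathcal{K}_2$; (ii) $\mathcal{I} \models \mathcal{K}_1 \vee \mathcal{K}_2$ if $\mathcal{I} \models \mathcal{K}_1$ or $\mathcal{I} \models \mathcal{K}_2$; (iii) $\mathcal{I} \models \neg \mathcal{K}$ if $\mathcal{I} \not\models \mathcal{K}$. If $\mathcal{I} \models \mathcal{K}$, then $\mathcal{I}$ is a model of $\mathcal{K}$. The finite satisfiability problem is to decide, given a KB $\mathcal{K}$, whether there exists a model $\mathcal{I}$ of $\mathcal{K}$ with $\Delta^{\mathcal{I}}$ finite. The finite satisfiability problem for ALCHOIQbr KBs has the same computational complexity as for the standard ALCHOIQ.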

We are interested in the problem of effectively managing GSDs satisfying the constraints expressed in a DL KB $\mathcal{K}$. Hence, we must assume that such data are of finite size, i.e., they correspond naturally to finite interpretations that satisfy the constraints in $\mathcal{K}$. In other words, we consider configurations of the GSD that are finite models of $\mathcal{K}$.

Many of the reasoning problems we study here will be reduced to finite satisfiability, which is NEXPTIME-complete for ALCHOIQbr [7]. Contrast this with DL-Lite, for which (finite) satisfiability is NLOGSPACE-complete [2].

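Example 1. The following interpretation $\mathcal{I}_1$ represents (part of) the project database of a research institute. There are two active projects and three employees working in them.

$\mathsf{Prj}^{\mathcal{I}_1} = \{p_1, p_2\}$, $\mathsf{ActivePrj}^{\mathcal{I}_1} = \{p_1, p_2\}$, $\mathsf{FinishedPrj}^{\mathcal{I}_1} = \{\}$, $\mathsf{Empl}^{\mathcal{I}_1} = \{e_1, e_3, e_7\}$, $\mathsf{worksFor}^{\mathcal{I}_1} = \{(e_1, p_1), (e_3, p_1), (e_7, p_2)\}$.

We assume constants $p_i$ with $p_i^{\mathcal{I}_1} = p_i$ for projects, and analogously constants $e_i$ for employees. The KB $\mathcal{K}_1$ expresses constraints on this project database: all projects are active or finished, the domain of worksFor consists of employees, and its range of projects.

$(\mathsf{Prj} \sqsubseteq \mathsf{ActivePrj} \sqcup \mathsf{FinishedPrj}) \wedge (\exists \mathsf{worksFor}.\top \sqsubseteq \mathsf{Empl}) \wedge (\exists \mathsf{worksFor}^-.\top \sqsubseteq \mathsf{Prj})$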
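To make the satisfaction relation concrete, the following minimal Python sketch encodes the interpretation of Example 1 as plain sets and checks the three inclusions of the KB as subset tests. The dictionary layout and the helper names (`domain_of`, `range_of`, `satisfies_K1`) are assumptions made for this illustration only.

```python
# Finite interpretation I1 from Example 1, encoded as plain Python sets.
# The dictionary layout and helper names are illustrative assumptions.
I1 = {
    "Prj": {"p1", "p2"},
    "ActivePrj": {"p1", "p2"},
    "FinishedPrj": set(),
    "Empl": {"e1", "e3", "e7"},
    "worksFor": {("e1", "p1"), ("e3", "p1"), ("e7", "p2")},
}


def domain_of(role):
    """Elements with an outgoing edge, i.e. the extension of (exists r.Top)."""
    return {x for (x, _) in role}


def range_of(role):
    """Elements with an incoming edge, i.e. the extension of (exists r^-.Top)."""
    return {y for (_, y) in role}


def satisfies_K1(interp):
    """Check each concept inclusion of K1 as a subset test on extensions."""
    projects_classified = interp["Prj"].issubset(
        interp["ActivePrj"] | interp["FinishedPrj"]
    )
    domain_ok = domain_of(interp["worksFor"]).issubset(interp["Empl"])
    range_ok = range_of(interp["worksFor"]).issubset(interp["Prj"])
    return projects_classified and domain_ok and range_ok


print(satisfies_K1(I1))
```

Removing `e3` from `Empl` while keeping the pair `("e3", "p1")` in `worksFor` would violate the domain constraint, so the check would then fail.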

3 Updating Graph Structured Data

For manipulating GSD, we use the following language. The basic actions allow one to insert or delete individuals from extensions of concepts, and pairs of individuals from extensions of roles. The candidates for additions and deletions can be chosen by means of complex concepts and roles. The language also allows for composition of actions and conditional action execution.

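Definition 1 (Action language). Basic actions $\beta$ and (complex) actions $\alpha$ are built according to the following grammar, where $A$ is a concept name, $C$ is an arbitrary concept, $p$ is a role name, $r$ is an arbitrary role, and $\mathcal{K}$ is an arbitrary ALCHOIQbr-formula. The special symbol $\varepsilon$ denotes the empty action:

$$\beta \longrightarrow (A \oplus C) \mid (A \ominus C) \mid (p \oplus r) \mid (p \ominus r)$$

$$\alpha \longrightarrow \beta \cdot \alpha \mid \mathcal{K} ? \alpha ; \alpha \mid \varepsilon$$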

A (complex) action $\alpha$ is called simple if (i) no (concept or role) inclusions occur in $\alpha$, and (ii) all concepts of $\alpha$ are Boolean combinations of concept names, nominals, and concepts of the form $\exists r.\top$.

A substitution is a function $\sigma$ from $N_V$ to $N_I$. For a formula, an action, or an action sequence $\Gamma$, we use $\sigma(\Gamma)$ to denote the result of replacing in $\Gamma$ every occurrence of a variable $x$ by the individual $\sigma(x)$. An action $\alpha$ is ground if it has no variables. An action $\alpha'$ is called a ground instance of an action $\alpha$ if $\alpha' = \sigma(\alpha)$ for some substitution $\sigma$.

Intuitively, an application of an action $(A \oplus C)$ on an interpretation $\mathcal{I}$ stands for the addition of the content of $C^{\mathcal{I}}$ to $A^{\mathcal{I}}$. In turn, removing $C^{\mathcal{I}}$ from $A^{\mathcal{I}}$ can be done by applying $(A \ominus C)$ on $\mathcal{I}$. The two operations can also be performed on extensions of roles. Composition stands for successive action execution, and a conditional action $\mathcal{K} ? \alpha_1 ; \alpha_2$ says that $\alpha_1$ is executed if the interpretation is a model of $\mathcal{K}$, and $\alpha_2$ is executed otherwise. We now formally define the semantics of actions.

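Definition 2. Assume an interpretation $\mathcal{I}$ and let $E$ be a concept or role name. If $E$ is a concept, let $W \subseteq \Delta^{\mathcal{I}}$, and if $E$ is a role, let $W \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$. Then let $\mathcal{I} \oplus_E W$ (resp., $\mathcal{I} \ominus_E W$) denote the interpretation $\mathcal{I}'$ such that (i) $\Delta^{\mathcal{I}'} = \Delta^{\mathcal{I}}$, (ii) $E^{\mathcal{I}'} = E^{\mathcal{I}} \cup W$ (resp., $E^{\mathcal{I}'} = E^{\mathcal{I}} \setminus W$), and (iii) $E_1^{\mathcal{I}'} = E_1^{\mathcal{I}}$ for all symbols $E_1 \neq E$. For a ground action $\alpha$, we define a mapping $S_\alpha$ from interpretations to interpretations:

$$S_{(A \oplus C) \cdot \alpha}(\mathcal{I}) = S_\alpha(\mathcal{I} \oplus_A C^{\mathcal{I}}) \qquad S_{(p \oplus r) \cdot \alpha}(\mathcal{I}) = S_\alpha(\mathcal{I} \oplus_p r^{\mathcal{I}})$$

$$S_{(A \ominus C) \cdot \alpha}(\mathcal{I}) = S_\alpha(\mathcal{I} \ominus_A C^{\mathcal{I}}) \qquad S_{(p \ominus r) \cdot \alpha}(\mathcal{I}) = S_\alpha(\mathcal{I} \ominus_p r^{\mathcal{I}})$$

$$S_\varepsilon(\mathcal{I}) = \mathcal{I} \qquad S_{\mathcal{K} ? \alpha_1 ; \alpha_2}(\mathcal{I}) = \begin{cases} S_{\alpha_1}(\mathcal{I}), & \text{if } \mathcal{I} \models \mathcal{K}, \\ S_{\alpha_2}(\mathcal{I}), & \text{if } \mathcal{I} \not\models \mathcal{K}. \end{cases}$$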
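The mapping of Definition 2 can be read as a small interpreter over finite interpretations. The sketch below is one possible encoding, not the paper's own: interpretations are dictionaries from symbol names to sets, concepts, roles and formulas are Python functions evaluated on the current interpretation, and a ground action is a list of steps. All names (`update`, `run`, the step tags) are hypothetical.

```python
# Interpretations are dicts from symbol names to sets; concepts, roles and
# formulas are Python functions evaluated on the current interpretation.
# This encoding is an illustrative assumption, not the paper's notation.

def update(interp, name, w, insert):
    """Return I (+)_E W (insert=True) or I (-)_E W (insert=False).

    Only the extension of `name` changes; the input dict is left untouched.
    """
    new = {symbol: set(ext) for symbol, ext in interp.items()}
    old = new.get(name, set())
    new[name] = old | w if insert else old - w
    return new


def run(action, interp):
    """Compute S_alpha(I) for a ground action given as a list of steps.

    Steps are ("ins", E, expr), ("del", E, expr), or
    ("if", formula, then_action, else_action). The empty list is epsilon.
    """
    for step in action:
        if step[0] == "ins":
            _, name, expr = step
            interp = update(interp, name, expr(interp), True)
        elif step[0] == "del":
            _, name, expr = step
            interp = update(interp, name, expr(interp), False)
        elif step[0] == "if":
            # In the grammar a conditional is the last step of an action,
            # so the loop normally ends after the chosen branch has run.
            _, formula, then_action, else_action = step
            branch = then_action if formula(interp) else else_action
            interp = run(branch, interp)
        else:
            raise ValueError("unknown step: %r" % (step,))
    return interp


# Usage: add the content of B to A, then remove the (updated) A from B.
I = {"A": {"a"}, "B": {"b"}}
alpha = [("ins", "A", lambda i: i["B"]), ("del", "B", lambda i: i["A"])]
result = run(alpha, I)
print(result)
```

Each step evaluates its concept or role on the interpretation produced by the previous step, which mirrors the way $S_{(A \oplus C) \cdot \alpha}$ passes the updated interpretation on to $S_\alpha$.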

Note that we have not defined the semantics of actions with variables. In our approach, all variables of an action are seen as parameters whose values are given before execution by a substitution with actual individuals, i.e., by grounding.

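The static verification problem amounts to checking, given a KB $\mathcal{K}$ and an action $\alpha$, that for every finite model $\mathcal{I}$ of $\mathcal{K}$ and every ground instance $\alpha'$ of $\alpha$, $S_{\alpha'}(\mathcal{I}) \models \mathcal{K}$, i.e., the execution of $\alpha$ preserves the satisfaction of the constraints expressed by $\mathcal{K}$.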

Theorem 1 ([7]). The static verification problem is coNEXPTIME-complete for input KBs in ALCHOIQbr, and coNP-complete for DL-Lite input KBs and simple actions.

The proof of this theorem relies on a regression technique that incorporates into a given $\mathcal{K}$ the effects of an action $\alpha$, and reduces reasoning about its effects on any structure satisfying $\mathcal{K}$ to reasoning about a single KB. In particular, static verification is reduced to finite KB unsatisfiability. This regression technique is also the main tool for most upper bounds in the next section, but we must omit proofs due to lack of space.

4 Planning

When data evolves, there may be desirable states that we want to ensure, or undesirable states that we want to avoid. For example, a finished project should never be made active again. We tackle such issues next, through some planning problems. We use DLs to describe states of KBs, which may act as goals or preconditions. A plan is a sequence of actions from a given set, whose execution leads from the current state to a state that satisfies a given goal. To support unbounded introduction of fresh values in the data, we allow for the domain to be expanded with a finite set of domain elements.

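Definition 3. Let $\mathcal{I} = \langle \Delta^{\mathcal{I}}, \cdot^{\mathcal{I}} \rangle$ be a finite interpretation, Act a finite set of actions, and $\mathcal{K}$ a KB (the goal KB). A finite sequence $P = \langle \alpha_1, \ldots, \alpha_n \rangle$ of ground instances of actions from Act is called a plan for $\mathcal{K}$ from $\mathcal{I}$ (of length $n$) if there exists a finite set $\Delta$ with $\Delta^{\mathcal{I}} \cap \Delta = \emptyset$ such that $S_{\alpha_1 \cdots \alpha_n}(\mathcal{I}') \models \mathcal{K}$, where $\mathcal{I}' = \langle \Delta^{\mathcal{I}} \cup \Delta, \cdot^{\mathcal{I}} \rangle$.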

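Example 2. Recall $\mathcal{I}_1$ and $\mathcal{K}_1$ from Example 1. The following goal KB $\mathcal{K}_g$ requires that $p_1$ is not an active project, and that $e_1$ is an employee. Consider the following actions $\alpha_1$ and $\alpha_2$. Action $\alpha_1$ moves $p_1$ from the active to the finished projects, and removes the employees working only for $p_1$ from the corresponding tables. Action $\alpha_2$ transfers $e_1$ from project $p_1$ to project $p_2$ (only if the necessary preliminary checks are successful).

$\mathcal{K}_g = \neg(p_1 : \mathsf{ActivePrj}) \wedge e_1 : \mathsf{Empl}$

$\alpha_1 = \mathsf{ActivePrj} \ominus \{p_1\} \cdot \mathsf{FinishedPrj} \oplus \{p_1\} \cdot \mathsf{worksFor} \ominus \mathsf{worksFor}|_{\{p_1\}} \cdot \mathsf{Empl} \ominus \neg\exists \mathsf{worksFor}.\mathsf{Prj}$

$\alpha_2 = (p_2 : \mathsf{Prj} \wedge (e_1, p_1) : \mathsf{worksFor}) \,?\, \mathsf{worksFor} \ominus \{(e_1, p_1)\} \cdot \mathsf{worksFor} \oplus \{(e_1, p_2)\} \,;\, \varepsilon$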

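The sequence $\langle \alpha_2, \alpha_1 \rangle$ is a plan for $\mathcal{K}_g$ from $\mathcal{I}_1$, and the interpretation $S_{\alpha_2 \cdot \alpha_1}(\mathcal{I}_1)$ that reflects the resulting status of the data looks as follows. Note that $S_{\alpha_2 \cdot \alpha_1}(\mathcal{I}_1) \models \mathcal{K}_1 \wedge \mathcal{K}_g$.

$\mathsf{Prj}^{S_{\alpha_2 \cdot \alpha_1}(\mathcal{I}_1)} = \{p_1, p_2\}$, $\mathsf{ActivePrj}^{S_{\alpha_2 \cdot \alpha_1}(\mathcal{I}_1)} = \{p_2\}$, $\mathsf{FinishedPrj}^{S_{\alpha_2 \cdot \alpha_1}(\mathcal{I}_1)} = \{p_1\}$, $\mathsf{Empl}^{S_{\alpha_2 \cdot \alpha_1}(\mathcal{I}_1)} = \{e_1, e_7\}$, $\mathsf{worksFor}^{S_{\alpha_2 \cdot \alpha_1}(\mathcal{I}_1)} = \{(e_1, p_2), (e_7, p_2)\}$.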
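The resulting state of Example 2 can be retraced by hand with ordinary set operations. The following sketch executes the transfer action first and the project-closing action second, one basic step at a time; the variable names and the explicit `domain` set are choices made for this illustration.

```python
# State I1 from Example 1 (plain sets; the variable names are illustrative).
domain = {"p1", "p2", "e1", "e3", "e7"}
prj = {"p1", "p2"}
active = {"p1", "p2"}
finished = set()
empl = {"e1", "e3", "e7"}
works_for = {("e1", "p1"), ("e3", "p1"), ("e7", "p2")}

# alpha_2: the condition is evaluated on the state before any update.
if "p2" in prj and ("e1", "p1") in works_for:
    works_for = works_for - {("e1", "p1")}
    works_for = works_for | {("e1", "p2")}

# alpha_1, step by step; each step sees the result of the previous one.
active = active - {"p1"}                       # ActivePrj (-) {p1}
finished = finished | {"p1"}                   # FinishedPrj (+) {p1}
restricted = {(e, p) for (e, p) in works_for if p in {"p1"}}
works_for = works_for - restricted             # drop edges into p1
has_project = {e for (e, p) in works_for if p in prj}
empl = empl - (domain - has_project)           # drop elements with no project

# Goal K_g: p1 is not active and e1 is an employee.
goal_holds = "p1" not in active and "e1" in empl
print(active, finished, empl, works_for, goal_holds)
```

Running the two actions in the opposite order would remove `e1` from `Empl`, since `e1` would have no project left once the edges into `p1` are deleted, and the goal would then fail.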

We consider the following planning problems:

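(P1) Given a set Act of actions, a finite interpretation $\mathcal{I}$, and a goal KB $\mathcal{K}$, does there exist a plan for $\mathcal{K}$ from $\mathcal{I}$?

(P2) Given a set Act of actions and a pair $\mathcal{K}_{pre}, \mathcal{K}$ of formulae, does there exist a substitution $\sigma$ and a plan for $\sigma(\mathcal{K})$ from some finite $\mathcal{I}$ with $\mathcal{I} \models \sigma(\mathcal{K}_{pre})$?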

(P1) is the classic plan existence problem, formulated in the setting of GSD. (P2) also aims at deciding plan existence, but rather than the full actual state of the data, we have as input a precondition KB, and we are interested in deciding the existence of a plan from some of its models. To see the relevance of (P2), consider the complementary problem: a 'no' instance of (P2) means that, from every relevant initial state, (undesired) goals cannot be reached. For instance, $\mathcal{K}_{pre} = \mathcal{K}_{ic} \wedge x : \mathsf{FinishedPrj}$ and $\mathcal{K} = x : \mathsf{ActivePrj}$ may be used to check whether, starting with GSD that satisfies the integrity constraints and contains some finished project $p$, it is possible to make $p$ an active project again.

Unfortunately, these problems are undecidable in general.

Theorem 2. (P1) and (P2) are undecidable, already for DL-Lite KBs and simple actions.

To regain decidability, we define 'bounded' versions of these problems. (P1) becomes decidable if the size of $\Delta$ in Definition 3 is bounded. (P2) remains undecidable even for $\Delta = \emptyset$, but it becomes decidable if we place a bound on the length of plans. In the following, we assume numbers are coded in unary.

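(P1b) Given a set Act of actions, a finite interpretation $\mathcal{I}$, a goal KB $\mathcal{K}$, and a positive integer $k$, does there exist a plan for $\mathcal{K}$ from $\mathcal{I}$ where $|\Delta| \leq k$?

(P2b) Given a set of actions Act, a pair $\mathcal{K}_{pre}, \mathcal{K}$ of formulae, and a positive integer $k$, does there exist a substitution $\sigma$ and a plan of length at most $k$ for $\sigma(\mathcal{K})$ from some finite interpretation $\mathcal{I}$ with $\mathcal{I} \models \sigma(\mathcal{K}_{pre})$?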

The problem (P1b) can be solved in polynomial space for ALCHOIQbr, and a matching lower bound holds even for settings more restricted than DL-Lite (the goal is a basic assertion $a : C$ and only simple actions with no complex DL expressions are allowed). Note that planning in our setting is not harder than deciding plan existence in standard automated planning formalisms such as propositional STRIPS [5].

Theorem 3. The problem (P1b) is PSPACE-complete.
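Once the domain is fixed, there are only finitely many interpretations over it, so plan existence can in principle be decided by exhaustive search over reachable states. The sketch below shows this with a plain breadth-first search. It is a naive illustration that stores every visited state, so it is not the polynomial-space procedure behind Theorem 3; the function names and the encoding of ground actions as Python functions are assumptions.

```python
from collections import deque


def freeze(interp):
    """Hashable snapshot of an interpretation (dict: symbol name to set)."""
    return frozenset((name, frozenset(ext)) for name, ext in interp.items())


def find_plan(start, actions, goal):
    """Breadth-first search for a sequence of action names reaching `goal`.

    start:   dict mapping symbol names to sets, over an already fixed domain
    actions: dict mapping an action name to a function on interpretations
    goal:    function from an interpretation to True or False
    Returns a shortest list of action names, or None if the goal is
    unreachable from `start`.
    """
    queue = deque([(start, [])])
    seen = {freeze(start)}
    while queue:
        interp, plan = queue.popleft()
        if goal(interp):
            return plan
        for name, apply_action in actions.items():
            successor = apply_action(interp)
            key = freeze(successor)
            if key not in seen:  # visit each state at most once
                seen.add(key)
                queue.append((successor, plan + [name]))
    return None


# Usage with a toy ground action that moves p1 to the finished projects.
def finish_p1(interp):
    return {
        "ActivePrj": interp["ActivePrj"] - {"p1"},
        "FinishedPrj": interp["FinishedPrj"] | {"p1"},
    }


start = {"ActivePrj": {"p1"}, "FinishedPrj": set()}
plan = find_plan(start, {"finish_p1": finish_p1},
                 lambda i: "p1" in i["FinishedPrj"])
print(plan)
```

For (P1b), a caller would run such a search on the input interpretation extended with at most $k$ fresh domain elements, with the ground instances of the actions in Act as the available moves.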

Now we establish the complexity of (P2b), both in the general setting where $\mathcal{K}_{pre}$ and $\mathcal{K}$ are in ALCHOIQbr, and for the restricted case of DL-Lite.

Theorem 4. The problem (P2b) is NEXPTIME-complete. It is NP-complete if $\mathcal{K}_{pre}, \mathcal{K}$ are expressed in DL-Lite and all actions in Act are simple.

Now we consider problems that are related to ensuring that plans always achieve a goal $\mathcal{K}$, given a possibly incomplete description $\mathcal{K}_{pre}$ of the initial data. They are variants of so-called conformant planning, which deals with incomplete information.

The first such problem is to ‘certify’ that a candidate plan is always a plan for the goal.

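(C) Given a sequence $P$ of actions and formulae $\mathcal{K}_{pre}, \mathcal{K}$, is $\sigma(P)$ a plan for $\sigma(\mathcal{K})$ from every finite interpretation $\mathcal{I}$ with $\mathcal{I} \models \sigma(\mathcal{K}_{pre})$, for every substitution $\sigma$?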

Finally, we are interested in deciding the existence of a plan that always achieves the goal, for every possible state satisfying the precondition. Solving this problem corresponds to the automated synthesis of a program for reaching a certain condition.

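(S) Given a set Act of actions and formulae $\mathcal{K}_{pre}, \mathcal{K}$, does there exist a sequence $P$ of actions from Act such that $\sigma(P)$ is a plan for $\sigma(\mathcal{K})$ from every finite $\mathcal{I}$ with $\mathcal{I} \models \sigma(\mathcal{K}_{pre})$, for every substitution $\sigma$?

(Sb) Given a set Act of actions, formulae $\mathcal{K}_{pre}, \mathcal{K}$, and a positive integer $k$, does there exist a sequence $P$ of actions from Act such that $\sigma(P)$ is a plan for $\sigma(\mathcal{K})$ of length at most $k$, from every finite $\mathcal{I}$ with $\mathcal{I} \models \sigma(\mathcal{K}_{pre})$, for every substitution $\sigma$?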

Theorem 5. (i) Problem (S) is undecidable, already for DL-Lite KBs and simple actions. (ii) Problems (C) and (Sb) are coNEXPTIME-complete. (iii) If $\mathcal{K}_{pre}, \mathcal{K}$ are expressed in DL-Lite and all actions in Act are simple, then (C) is in coNP and (Sb) is in $\mathrm{NP}^{\mathrm{NP}}$.

The upper bounds in the third item are tight for an extension of DL-Lite that allows regression to be used for capturing action effects. We omit details due to lack of space.

5 Conclusions

We believe this work provides powerful tools for analyzing the effects of executing complex actions on graph structured data, possibly in the presence of integrity constraints expressed in DLs. The considered problems are intractable even for restricted fragments of DL-Lite and restricted forms of actions. However, these are worst-case bounds, and we believe that practicable algorithms can still be obtained. In future work, we want to verify this and identify meaningful restrictions to regain tractability.

References

1. S. Ahmetaj, D. Calvanese, M. Ortiz, and M. Šimkus. Managing change in graph-structured data using description logics. In Proc. of AAAI, 2014. Long version with proofs available at http://arxiv.org/abs/1404.4274.

2. A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. The DL-Lite family and relations. JAIR, 36:1–69, 2009.

3. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. CUP, 2003.

4. D. Brickley and R. V. Guha. RDF vocabulary description language 1.0: RDF Schema. W3C Recommendation, W3C, Feb. 2004. http://www.w3.org/TR/rdf-schema/.

5. T. Bylander. The computational complexity of propositional STRIPS planning. AIJ, 69:165–204, 1994.

6. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. JAR, 39(3):385–429, 2007.

7. D. Calvanese, M. Ortiz, and M. Šimkus. Evolving graph databases under description logic constraints. In Proc. of DL, volume 1014 of CEUR, ceur-ws.org, pages 120–131, 2013.

8. S. Sakr and E. Pardede, editors. Graph Data Management: Techniques and Applications. IGI Global, 2011.
