Backward Planning in the Logics of Communication and Change

(1)

Backward Planning in the Logics of Communication and Change

^?

Pere Pardo and Mehrnoosh Sadrzadeh

1 Institut d’Investigaci´o en Intel·lig`encia Artificial (IIIA - CSIC)

2 Dept. Computer Science, Oxford University

Abstract. In this contribution we study how to adapt Backward Plan search to the Logics of Communication and Change (LCC). These are dynamic epistemic logics with common knowledge modeling the way in which announcements, sensing and world-changing actions modify the beliefs of agents or the world itself. The proposedLCCplanning system greatly expands the social complexity of scenarios involving cognitive agents that can be solved. For example, goals or plans may consist of a certain distribution of beliefs and ignorance among agents.

Our results include: soundness and completeness of backward plan search (under BFS), both for deterministic planning and strong non-deterministic planning.

1 Introduction

Practical rationality or decision-making is a key component of autonomous agents, like humans, and correspondingly has been studied at large. Research on this topic has been conducted from several fields: game theory, planning, decision theory, etc. each focus- ing on a different aspect: strategic decision-making, propositional means-ends analysis, and uncertainty, respectively. Moreover, some of these models of rationality also provide a foundation for the notion of agreement (game-theoretic approaches to negotia- tion, multi-agent planning, etc). Indeed, the process of reaching an agreement seems to be a particular case of (multi-agent) practical decision-making.

While these models of practical rationality are well-understood, they were (under- standably) designed with a considerably low level of expressivity at the object language.

For instance, game-theory abstracts from propositional representations of states, actions and goals. Planning algorithms take them into account, but not fully in the object language, which makes it difficult to extend these algorithms to reason about others’ beliefs and actions. All this contrasts with the area of logic, where logics with increasing expressivity have been characterized; though abilities or motivational attitudes have only more recently been dealt with, in BDI, logics of strategic ability, etc. In the present contribution we mainly focus on combining dynamic epistemic logic and plan search [9], [5].

Specially relevant to the topic of cognitive agents are the notions of belief, action, goal, norm, etc. The first two elements are the target of dynamic epistemic logicsDEL, a recent family of logics for reasoning about agents’ communications and observations. We focus on the so-called Logics of Communication and Change LCC [10],

?AT2012, 15-16 October 2012, Dubrovnik, Croatia. Copyright held by the author(s).

(2)

which generalize many previously knownDELlogics with physical actions and common knowledge. In particularLCClogics include a rich variety of epistemic actions from the literature onDEL, including several types of communicative actions (truthful or lying, public or private announcements). This is done with the help of action models, that assign each action an accessibility relation describing how the action appears to other agents, and hence what they learn after its execution.

These logics may offer a semantic understanding of the notion of agreement between agents, since reaching an agreement requires these agents to have common knowledge about it. The same can be said about the epistemic aspects involved in the (intentional) process of reaching such an agreement. To this end, we address planning problems expressible inLCC. This is done by introducing goals, which can be an arbitrary formula in the language ofLCC. Planning domains consist as usual of an initial state formula, available actions and a goal formula, and the task consists in finding some structure of actions that make the goal formula true when executed in the initial state.

Instead of a semantic [3],[11],[7] or a proof-theoretic approach [1], we suggest to reduce plan search into search in the space of proofs. With more detail, search methods operate in the space ofLCCformulas that are about plans (their success, and executability). As in those works, anLCCplan is built while having in mind what agents can learn during its execution.

In the present contribution, we extend previous results on Backward Planning by describing simpler algorithms for deterministic and strong non-deterministic planning.

These are shown to be sound and complete: if the algorithm terminates, it outputs a solution plan; and if a successful plan exists, then the algorithm will find such a solution.

Motivating example.Our aim, then, is to endowLCCreasoning agents with planning capacities for these logics, so they can achieve their goals in scenarios where other agents have similar cognitive and acting abilities. In particular,LCCplanning seems necessary for an agent whose goals consist in (or depend on) a certain distribution of knowledge, ignorance and false beliefs among agents. To illustrate the kind of rational behavior anLCCplanner can exhibit, consider the following example.

Example 1. Agentajust bet some prize agentbthat the next coin toss will land heads (h);aknows she can sense and even flip the coin withoutbever suspecting it. Given a sensing action that tellsawhetherhholds or not, a successful plan seems to be: toss the coin; if sense thath, then announce it tob; otherwise flip the coin and announceh.

Structure of the paper.The paper is structured as follows: after a review of the related work in Section 2, we briefly present the dynamic epistemic logics inLCCin Section 3. Then in Section 4, we present an algorithm for deterministic planning inLCCbased on Breadth First Search (BFS) and show it is sound and complete. The rest of the paper is devoted to non-deterministic planning: Section 5 we present an extension ofLCC with action composition and choice. Finally, in Section 6 we propose an algorithm for (strong) non-deterministic and show its soundness and completeness. The paper concludes with a summary and some topics for future work.

(3)

2 Related Work

Several logics address the related problem of plan existence (rather than plan search) for similarly expressive languages. For example, arbitrary public announcement logics APAL [4], and logics for strategic ability. These logics contain modal operatorshCifor agents or coalitions, with formulashCiϕexpressing: there is a plan or strategy allowing agents inCto enforceϕ.

More in line with the present aims, among logics for action guidance we find the family ofBDIand related logics for intention (or some other motivational attitudes).

While these logics usually allow for considerable expressivity w.r.t. intention (and their interaction with beliefs), they are not completely understood, and in fact planning methods have been suggested forBDIagent architectures. In particular, Bolander et al. [3]

suggest the use ofLCCplanning to this end for (a limited fragment of)BDI. In this work [3], the authors study incremental forward planning based on the semantics of update models. Byincrementalwe mean that the plan is built in a step-wise fashion: plans are being refined (with further actions) until a solution is found. The framework of [3]

is shown to be sound and complete under Breadth First Search, and hence the proposed planning system forLCCbecomes semi-decidable. Here we study the opposite direction of search, i.e. backward planning (from the goals to the initial state). The reason is that in forward planning, relevant actions to be considered are those that arecurrently executableactions to be the notion of relevant action, rather thancurrent goal enforcing actions, as in backward planning. This makes a difference inLCCsince many actions may exist (in the action model) which are everywhere executable, so forward planning would typically face the state explosion problem. Along this line, Aucher [1] (and related papers) introduce regression methods for the fragment ofLCCwithout common knowledge. The present work differs from the latter two by making plan search incremental (as in [3], [11]), but allowing for backward search. The proposed algorithms easily adapt to search for optimal plans in backward planning.

3 Preliminaries: The Logics of Communication and Change

We briefly recall the basics of LCC logics [10]. Since these are built by adding an action modelUon top of (an epistemic reading of) propositional dynamic logicPDL, we recall the latter first. The syntax ofPDL, denotedLPDL, is as follows:

ϕ::=p| ¬ϕ|ϕ₁∧ϕ₂|[π]ϕ π::=a|?ϕ|π₁;π₂|π₁∪π₂|π^∗

As usual, the symbols⊥,∨,↔andhπiare defined from the above as abbreviations.

In the epistemic reading ofPDL, denoted asE·PDL, the set of atomic programsa is just the set of agentsAg, and we read the program[a]asagentabelieves (or knows) that.³With thePDLprogram constructors (composition “;”, choice “∪” and the Kleene star “(·)^∗”, we can model:

3E·PDL, andLCC, are rather abstract about the particular properties of the accessibilityRa

relation of agenta, that distinguish belief from knowledge.

(4)

nested beliefs [a;b] agentabelieves agentbbelieves that group belief [B], or[a∪b] agents inB(={a, b})believe that common knowledge [B^∗], or[(a∪b)^∗] agents inBhave common knowledge that

AnE·PDLmodelM = (W,hRai_a∈Ag, V)does, as usual, contain a set of worlds W⁰, a relationRafor each agenta, and an evaluationV :Var→ P(W)).

AnLCClogic will add to anE·PDLlanguage a set of modalities [U,e]for each pointed action model U,ewith distinguished (actual) action e. These new operators [U,e]readafter each execution of actioneit is the case that. An action model is a tuple U= (E,R,pre,post)containing

– E={e0, . . . ,en−1}, a set of actions

– R:Ag→(E×E), a map assigning a relationR_ato each agenta∈Ag – pre:E→ LPDL, a map assigning a preconditionpre(e)to each actione

– post:E×Var → LPDL, a map assigning a post-conditionpost(e)(p), orp^post(e), to eache∈Eandp∈Var

The semantics ofLCCconsists in computingM, w|= [U,e]pin terms of the product update ofM, wandU,e. Their product update is (again) anE·PDLpointed model M ◦U,(w,e), whereM◦U= (W⁰,hR⁰_aia∈Ag, V⁰)is as follows. The setW⁰ consists of those worlds(w,e)such thatM, w|=pre(e), so executingewill lead to the corresponding state(w,e). The relation(w,e)R⁰_a(v,f)holds iff bothwR_avandeR_afhold.

Finally,V⁰(p) = {(w,e) ∈W⁰ | M, w |=post(e)(p)}, i.e. the truth-value ofpafter executingedepends on that ofpost(e)(p)before the execution.

Here, though, we will follow [14] and restrict postconditions to post(e)(p) ∈ {p,>,⊥}. This makes the truth-value ofpaftereto be either: that beforee, or simply true or false irregardless of its truth-value beforee. (This framework is shown in [14] to be as expressive as the general case above, in case choice of actions is considered. See Section 6.) ThePDLsemantics[·]forE·PDL-formulas can be extended to a semantics forLCCas follows:

[[U,e]ϕ]^M ={w∈W | ifM, w|=pre(e) then (w,e)∈[ϕ]^M◦U}.

In [10], the authors define program transformersT_ij^U(π)provide a mapping between E·PDLprograms. Given any combination of ontic or epistemic actions (e.g. public and private announcements) the transformers provide a complete set of reduction axioms, reducingLCCtoE·PDL. These axioms push theU,e-modalities inside the formula, and then the basic case[U,e]pis turned into anE·PDLformula. See [10] for the details on theT^U(π)mapping.

Definition 1. A calculus for LCC is given by the axioms and rules for PDL plus [U,e]> ↔ >and the following

[U,e]p↔(pre(e)→post(e)(p)) [U,e]¬ϕ↔(pre(e)→ ¬[U,e]ϕ) [U,e](ϕ1∧ϕ2)↔([U,e]ϕ1∧[U,e]ϕ2) [U,ei][π]ϕ↔Vn−1

j=0[T_ij^U(π)][U,ej]ϕ and necessitation rules for each action model modality: if`ϕthen`[U,e]ϕ.

The completeness for this calculus is shown by reducingLCCtoE·PDL. The translation simultaneously defined for formulas and programs, resp.tandr, is

(5)

t([U,e]p) =t(pre(e))→p^post(e) t([U,e_i][π]ϕ) =Vj=n−1

j=0 [T_ij^U(r(π))]t([U,e_j]ϕ) t([U,e]¬ϕ) =t(pre(e))→ ¬t(ϕ) t([U,e][U,f]ϕ) =t([U,e]t([U,f]ϕ))

ris trivial for all programs, andtis also trivial for the remaining types of formula. The translation assigns to eachLLCCformula an equivalent formula inL_E·PDL.

4 Backward deterministic LCC planning.

In [8], we studiedLCCplanning with the help of generalized frame axioms forLCC (actually, theorems in this logic) used to study the regression of an open goal under an action. Here we propose a simpler approach, based on the above reduction ofLCCinto E·PDL; and moreover they allow for better plan search methods. We will simply use the expressions of the form

t([U,e]ϕ∧ hU,ei>)→[U,e]ϕ∧ hU,ei>

These give rise to a goal-regression function: a current goalϕis mapped intot([U,e]ϕ∧

hU,ei>); the latter is the new goal we need to address for an execution ofeto end up in aϕ-world.

Definition 2. Given someLCClogic for an action modelU, aplanning domainis a triple M = (ϕ_T, A, ϕ_G), whereϕ_T, ϕ_G are consistentE·PDL formulas describing, resp., the initial and goal states; andA ⊆Eis the subset of a actions available to the agent.

AsolutiontoMis a sequenceπ= (f1, . . . ,fm)of actions inA, such that

|=ϕT →[U,f1]. . .[U,fm]ϕG and |=ϕT → hU,f1i. . .hU,fmi>

Definition 3. Given some planning domain M = (ϕT, A, ϕG), the (initial) empty plan is the pair π_∅ = (∅, ϕG) and if π = (π, ϕ_goals(π)) is a plan, then π(e) = (π^∩hei, ϕgoals(π(e))), defined by the goal ϕgoals(π(e)) = t([U,e]ϕ_goals(π)∧ hU,ei>), is also a plan.A planπ is aleaf iff ϕgoals(π(e)) is inconsistent, or |= ϕgoals(π(e)) → ϕgoals(π).

Leafs are plans not worth considering, either because unexecutable or non- contributing to the ultimate goalsϕ_G. Also note that actionse-as defined above- are deterministic, in the sense that|= [U,e]ϕ∨ψ↔[U,e]ϕ∨[U,e]ψ. Thus,deterministic planningsimply consists in “restricting” actionse∈Eto our current action models, or simplyLCCplanning. Later we extendLCCwith composition⊗and choice∪to study the non-deterministic case. A BFS plan search algorithm for deterministic planning is the following.

Theorem 1. BFS is sound and complete forLCCbackward planning: the outputπof the algorithm in Fig. 1 is a solution for(ϕT, A, ϕG), and is such a solution exists, then the algorithm terminates with such an output.

(6)

1.LETPlans=hπ∅iandϕ=ϕG.

2.SELECTthe first elementπofPlans.DELETEπfromPlans.

3.Ifπis a leafREPEATstep 2. If|=T →goals(π), thenOUTPUTπ.

OtherwiseSETPlans←Plans^∩hπ(e)|e∈Ai.REPEATstep 2.

Fig. 1.BFS algorithm for deterministic planning with input(ϕT, A, ϕG).

4.1 Optimal planning.

We may consider as well a cost functioncost :E →R⁺ assigning toethe cost of an execution ofe.⁴This gives a measure of the total cost of a plancost(π) =Σ_e∈πcost(e).

Recall search for optimal plans (e.g. A*) can be defined from BFS in Fig. 1 by adding an extra step in step 3: after updatingPlans, this set is re-ordered into a sequence of plans with increasing cost. That is,πis re-ordered beforeπ⁰ inPlansif and only if cost(π) ≤cost(π⁰). Since by definitioncost(e) >0, and the setEis finite, there can be no infinite sequence of actions with decreasing cost. This implies (using [9] Thm 1) that A* is complete and admissible for deterministic backward planning inLCCusing the (trivial) heuristic estimationh(π(e)) =cost(π) +cost(e).

Corollary 1. A* is sound, and complete forLCCbackward deterministic planning. It is also admissible: the output of A* is an optimal solutionπfor(ϕT, A, ϕG), if some solution exists.

Fig. 2.(Left) Announcing¬pas insider in Example 2. (Right) Announcing¬pdisguised as a fortune-teller.

4For example, the energy cost of voicing an announcement thatϕinto a crowd,ϕ!^aB, can be defined in terms of|B|and the length ofϕ.

(7)

Example 2. Suppose you are a high-ranked employeea^?inPhil. Incand want to sell some sensible information, ¬p = Phil. Inc isnotdoing well, to a close acquaintance a₁, an important investor inPhil. Inc. It is commonly ignored that¬p, except for you anda0, also an important investor inPhil. Inc. Knowing¬p,a0 wants to sell stocks beforea1does, but she prefers to wait for the price to increase a bit more. As long as [a0]ha1ipholds, you can sell the info, so your actionseshould preserve this formula while causing[a1]¬p. To make things more difficult, say agentsa0, a1are sitting next to each other.

Figure 2 (Left) models your truthful announcement of¬p, presenting yourselfas insidera^?; that is,¬p!^a_a^?₁. Intuitively, this would not preserve[a0]ha1ip; and indeed, available actions would either destroy the old goal or would not be successful about the new goal.

Figure 2 (Right) models¬p!^aa^?1^/f: your truthful announcement that¬pwhen dis- guised as a fortune-teller. You expecta1will identify you asa^?, whilea0will think it common belief that you are a fortune-teller (and that no one can tell a fortune-teller is right or not). The new goal raised by this refinement is already true in anyϕT state. So this action already constitutes a solution to the problem.

5 LCC with composition and choice

In this section we introduce non-deterministic actions into planning models, and study the corresponding logicLCC_⊗∪. This will be used in Section 6 to find an algorithm for non-deterministic planning in this logic. First, we expand anyLCClogic with action composition⊗, and later we add choice∪. These operations model the following executions:

– compositione⊗fmodels an execution ofefollowed by an execution off, and – choicee∪f, models non-deterministic actions: each execution ofe∪f either in-

stantiates as an execution ofeor as an execution off.

We introduce first composition⊗in the action modelsUand define the logicLCC⊗, which can be reduced toLCC. Then, we will introduce choice∪and its semantics, the latter be based on the semantics ofLCC_⊗.

5.1 LCC⊗: extendingLCCwith composition of actions.

On the one hand, the introduction of composition⊗proceeds as usual. We will simply consider that the action modelUis now closed under finite composition⊗. To do so, we must define first the composition of an action model by itselfU⊗U, also denoted U²= (E²,pre²,post²,R²).

Definition 4. LetU = (E,R,pre,post)be an action model. We defineU⊗U = (E× E,R◦R,pre,post)as follows:

pre(e⊗e⁰) =pre(e)∧[U,e]pre(e⁰) post(e⊗e⁰)(p) =

(post(e)(p) ifpost⁰(e⁰)(p) =p post⁰(e⁰)(p) ifpost⁰(e⁰)(p)∈ {>,⊥}

(8)

Lemma 1. We have the following isomorphism

M ◦(U⊗U)∼= (M ◦U)◦U.

Definition 5. The axioms and rules corresponding toLCC_⊗ are those ofLCCplus:

axiomKand Necessitation fore⊗fand the reduction axiom:

[U,e⊗f]ϕ↔[U,e][U,f]ϕ

The LCCcalculus extends to LCC_⊗ using this reduction axiom. Moreover, the previous Lemma 1 extends to the general updateU^⊗with compositionse⊗ · · · ⊗fof at mostnactions inE, for any fixed finite numbern. (This is done by taking disjoint unions of update models withn⁰ compositions, for eachn⁰ ≤n.) The extensions ofLCCfor these compositional action models (defined by the corresponding axioms) will also be denotedLCC⊗. Having such a boundnon composition means in practice that we will not know a priori whichLCC_⊗is actually needed to solve the planning problem. Only after the plan search algorithm terminates, we can identify this logic. A similar remark applies to the logical extensionsLCC_⊗∪studied next.

5.2 LCC_⊗∪: extendingLCCwith composition and choice.

Now we will introduce the choice for actionse∪f, denoting an indeterminate choice betweeneandf. Note if the actione∪f is available (inA), then in generale,f will be inEbut not inA. As suggested in [10], we adopt the (multi-pointed) semantics and axioms to model choice in an extension ofLCC. (Note in the previous semantics we cannot extendpost(·)(p)to the choice of ontic actions e∪f if these disagree about post(·)(p).)

Definition 6. Given an epistemic modelM and an action modelU, letWd ⊆W and Ed={f1, . . . ,fk} ⊆E. ThenM, WdandU,Edare multi-pointed models. We define

M, W_d|=ϕ iff M, w|=ϕ for eachw∈W_d M, w|= [U,E_d]ϕ iff V

f∈E_dM, w|= [U,f]ϕ

iff M, w|=pre(f)impliesM◦U,(w,f)|=ϕ, for eachf∈E_d Choice will be indistinctly represented as followsEd,{e, . . . ,f}ore∪. . .∪f.

In summary, the semantics of update with a multi-pointed action modelU,e∪fcan be defined by the multi-pointed epistemic modelM◦U,{(w,e),(w,f)}. The semantics of [U,e∪f]from Definition 6 then becomesM, w|= [U,{e,f}]ϕiff

M, w|=pre(e)⇒M ◦U,(w,e)|=ϕandM, w|=pre(f)⇒M◦U,(w,f)|=ϕ So in caseM, w |= pre(e)∧pre(f), the two RHS conditions reduce to satisfaction in a multi-pointed epistemic model:M ◦U,{(w,e),(w,e⁰)} |=ϕ. In fact, for simplicity here we only consider the casepre(e) = pre(f), for any (available) non-deterministic actione∪f.

Also note thate∪fis not an action in the setEof the action modelUorU^⊗. The syntax ofLCC_⊗∪extends that ofLCC_⊗with the further type of formulas[U,E_d]ϕ, for eachEd⊆E(whose elements have identical preconditions). In [10], the next reduction axiom was suggested for∪:

(9)

[U,Ed]ϕ↔V

e∈Ed[U,e]ϕ

Definition 7. The axioms and rules corresponding toLCC_⊗∪are those ofLCC_⊗plus:

axiomKand Necessitation for[U,e∪. . .∪f], and the previous reduction axiom, i.e.:

[U,e∪. . .∪f]ϕ↔[U,e]ϕ∧. . .∧[U,f]ϕ

Fact 1 The usualLCCreduction axioms for[U,e]are derivable for[U,e∪e⁰], except for[U,e∪f]¬ϕ↔(pre(e∪f)→ ¬[U,e∪f]ϕ).

Proposition 1. The axioms and rules from Def. 7 are sound and complete forLCC_⊗∪.

6 Non-Deterministic Planning

Now we turn into strong planning for domains with non-deterministic actions, i.e. planning involving actions with (truly) disjunctive effects

|= [U,f0∪f1]p∨q, but with 6|= [U,f0∪f1]p and 6|= [U,f0∪f1]q (For this, letpost(f₀)(p) =post(f₁)(q) =>, andpost(f₀)(q) =qandpost(f₁)(p) = p). A strong solution for a given planning domain is a plan such that all of its executions in the initial state lead to a goal state. Thus, ignoring preconditions, the above action f0∪f1 is a strong solution to(ϕT,{f0∪f1}, ϕG), ifϕG = p∨q; and it is a weak solution ifϕG = p.) Consider the action oftossing a coin, whose effect isheads or tails, denotedh∨ ¬h. We represent it as a choice between two actions:tossh,toss_¬h, with postconditions

post(toss_h)(h) =>, andpost(toss_¬h)(h) =⊥

Note bothtosshandtoss_¬hare deterministic actions inE, though neither of them will be inA. The reason is that they are not individually available to the agent: onlytoss(h)∪

toss(¬h)models this agent’s abilities and hence only the latter can be used during plan search. On the other hand, e∪f is an element ofA but not ofE. For simplicity, we only define the choice between two actions f₀ ∪f₁. The following definitions easily generalize to the choice of finitely many actionse∪ · · · ∪f.

First we must redefine solution for the non-deterministic case; the reasons for this are the generalization of available actionsAto the non-deterministic case, and the the limited reduction axioms that characterizeU,e∪fmodalities.

Definition 8. Given an action modelUinLCC, letAbe a set of actionseinEand actions of the formf₀∪f₁, wheref₀,f₁∈E. We define a setA^⊗∪ofA-modalitiesin the extended language ofLCC_⊗∪as follows:

(A) (E∩A)^<ω⊆A^⊗∪

(∪)ife,e⁰∈A^⊗∪, thene∪e⁰∈A^⊗∪

(⊗L)ife∈A^⊗∪ande⁰ ∈(E∩A)^<ω, thene⁰⊗e∈A^⊗∪

(⊗R)ife∈A^⊗∪ande⁰ ∈(E∩A)^<ω, thene⊗e⁰ ∈A^⊗∪

(10)

Ifedoes not make use of(⊗R)we say it is insearch form. Ifedoes not make use of (⊗L)or(⊗R)we say it is inbranching form.

See Figure 3 for an illustration of these and other modalities. Note a modalityU,ein branching form essentially consists in a conjunction ofLCC_⊗modalities. The output of a plan search algorithm forLCC_⊗∪will be anA-modalityU,ein tree form. But let us define first when anA-modality is a solution for a planning domainM.

Definition 9. Let theLCC_⊗∪ logic of some action modelU(= U^⊗)be given, plus a set of available actionsA. LetM= (ϕ_T, A, ϕ_G)be some planning domain definable in this logic. We say an actione∈A^⊗∪is asolutionforMiff

(1)|=ϕT →[U,e]ϕG and (2) |= ^

−

→e a path inΠ

ϕT → hU,−→ei>

The only difference with Definition 2 lies in executability: we check that every possible instantiation−→e of edoes indeed lead to some state (and so, by (1), the actual instantiation will lead to a goal state).

For (strong) non-deterministic planning, the idea is to reduce the search for a non- deterministic outputΠ to a tree of deterministic solutionsΠ = ({π, . . .},≺). With more detail, the planning domainM = (ϕ_T, A, ϕ_G)will split into a tree of planning domainsM^ξ, each to with its own deterministic solutionπ^ξ. For this, we need to enrich the notationπ^ξ = (π^ξ, ϕ_goals(πξ))into a 4-tuple as follows

plan= (action seq.,open goals,initial state,original goals),

π^ξ = ( π^ξ, ϕ_goals(πξ), ϕ_Tξ, ϕ_Gξ )

The tree structure of these deterministic plans is given by the successor relation≺. A planπ^ξand any of its successorsπ^ξ⁰ are related by some formula they share:π^ξ≺π^ξ⁰ means that the initial stateϕ_Tξ0 inM^ξ⁰ is part of the goalsϕ_GξforM^ξ. Besides the tree structure for these componentsπ^ξ, these and their planning domainsM^ξwill be totally ordered by some lexicographic ordering<_lexon the labelsξ. This ordering will capture the order in which planning domainsπ^ξare to be solved.

See Figure 3 for an illustration of the lexicographic ordering of labelsξ, namely 0<_lex1<_lex2<_lex21<_lex22<_lex221<_lex222.

Another modification w.r.t.LCCdeterministic planning concerns the Terminating Condition for plans introduced after refining with some non-deterministic actione∪f.

For a plan of the formπ^ξ1addressing the case wheree∪finstantiates asf, we replace the condition

|=ϕ_Tξ1 →ϕ_goals(πξ1) by |= [U,f]ϕ_goals(πξ1)

The reason is that we need to capture the effects off, for which a single formula might not suffice. (Consider, for example, all the consequences of common knowledge after a public announcement). We will denote this kind of initial states as “[U,f1](·)”. If this kind of Terminating Condition is satisfied byπ^ξ1, the resulting plan solution forM^ξ1 will be defined asπ^ξ1(f1).

(11)

Fig. 3.(a) An output of non-deterministic planning BFSnd; (b)-(d) three equivalent representations of (a) as modalitiesU,e: in (b) compact, (c) search and (d) branching form. Dotted, parallel lines (e.g. withinπ²², π²²¹) represent two instances of the same actions (independently found by each corresponding BFS execution).

Definition 10. Given some action modelU, letM= (ϕT, A, ϕG). We define aplan set Π = ({π^ξ, π^ξ⁰, . . .},≺)

where ≺ is a successor relation inducing a (strict) partial order on a set of deterministic plans. Theempty plan setis defined asΠ_∅ = ({π⁰},∅), whereπ⁰ = π_∅ = (∅, ϕG, ϕT, ϕG)is the empty plan forM.

IfΠ = ({. . . , π^ξ, . . .},≺)is a plan set, letπ^ξ = (π^ξ, ϕ_goals(πξ), ϕ_Tξ, ϕ_Gξ)be the plan inΠ not solvingM^ξ, and whose labelξis<lex-minimal among the plans with this property. Lete∈A∩Eandf=f₀∪f₁∈A. The refinements of the plan setΠare:

Π(e) = ( {. . . , π^ξ(e) , . . .},≺ ) Π(f) = ( {. . . , π^ξ, π^ξ1, π^ξ2, . . .},≺⁺).

under the convention that the refinement π^ξ(e) will be again denoted as π^ξ. While π^ξ(e)is as in Def. 3 (but now a 4-tuple), the triple of new plansπ^ξ, π^ξ1, π^ξ2 and the new partial order≺⁺are defined as follows.

plan actions open goals initial state original goals

π^ξ π^ξ(f0) t([U,f0]ϕ_goals(πξ)∧ hU,f0i>) [U,f0]ϕ_goals(πξ)∧ hU,f0i> ϕ_Gξ

π^ξ1 ∅ ϕ_Gξ “[U,f1](·)” ϕ_Gξ

π^ξ2 ∅ ϕ_goals(πξ(f₀))∧ϕ_goals(πξ1(f₁)) ϕ_Tξ as open goals

≺⁺=≺ ∪{hπ^ξ2, π^ξi,hπ^ξ2, π^ξ1i}.

Example 3. Let the outputΠ for some(ϕT, A, ϕG)andAbe as in Figure 3. The left- most figure represents the output of the BFSndalgorithm, built incrementally accord- ing to the lexicographic ordering:Π =hπ^∅, π¹, π², π²¹, π²², π²²¹, π²²²i. Arrows also

(12)

indicate the direction of backward search, with horizontal arrows representing the introduction of a non-deterministic action.

See Figure 4 for a description of the BFSndalgorithm for non-deterministic planning. If a planΠ is the output of BFSnd, then we also call its≺-minimal element, say π^ξ2, asπ^root.

1.LETPlans=hΠ∅i.

2.SELECTthe first elementΠofPlans.DELETEΠfromPlans.

3.SELECTthe<lex-minimal elementπ^ξofΠnot solvingM^ξ. Ifπ^ξis a leafREPEATstep 2.

IfTerminating Condition holds forϕ_goals(πξ), andξis<lex-maximal inΠ, thenOUTPUTΠ.

OtherwiseSETPlans←Plans^∩hΠ(e)|e∈Ai.REPEATstep 2.

Fig. 4.BFSndalgorithm for non-deterministic planning with input(ϕT, A, ϕG).

First we show how to translate a BFS_ndoutput (plan set) forMinto a modalityU,e^? withe^? ∈ A^⊗∪, and such thatU,e^? is in search form. Then we show soundness of BFSnd, i.e. thatU,e^?to be a solution forM.

Definition 11. Given an outputΠ for a planning domain(ϕ_T, A, ϕ_G), we define the modalitye^?induced fromΠ as follows. First, for eachπ^ξ = (f_m, . . . ,f₁)∈Π, define f^ξ=f1⊗ · · · ⊗fmandf^ξ =skipifπ^ξ =π_∅. Then, define

e^ξ=

(f^ξ ifξis ≺-maximal f^ξ⊗(e^ξ⁰∪e^ξ⁰⁰) ifξ≺ξ⁰, ξ⁰⁰ Finally, we definee^?=e^root.

Theorem 2. For any output for Π for a givenM = (ϕT, A, ϕG), the inducedU,e^? modality defines a solution forM:

(1)|=ϕ_T →[U,e^?]ϕ_G and (2) |= ^

−

→e a path ine^?

ϕ_T → hU,−→ei>

Now we proceed to show that BFSndis also complete for non-deterministic planning inLCC_⊗∪.

Lemma 2. For a given action modelUand a set of available actionsA,

each modalityU,einA^⊗∪is logically equivalent to a modalityU,e^?inA^⊗∪search form.

Using this Lemma, we may simply assume next that the solution, to be found by BFSnd, is in search form.

(13)

Theorem 3. Given an LCClogic of some action model U, a setA, and a planning domainMexpressible in the extended logicLCC_⊗∪. LetU,ebe a modality, withe∈A in search form. IfU,eis a solution forM, then the algorithmBFS_ndforMterminates with an outputΠ.

Let us note that from Theorems 2 and 3 we can conclude that backward planning for LCC⊗∪ logics is at least semi-decidable. For the case of (incremental) forward planning, this result is shown in [3]. To illustrate non-determinsitic planning, let us finally solve Example 1.

Fig. 5.Plan search in Example 1. A single planπ⁰ in the plan set (Left), is refined by a non- deterministic action (Center), splittingπ⁰intoπ⁰, π¹, π². These plans are latter solved, as shown in (Right).

Example 4. Recall Example 1, where the planner agentamust show heads, denoted h, to win the prize. Among the actionsAof agenta, we find: getting the coin; tossing the coin; flipping the coin to heads (with precondition thataknows that¬h); a set of announcements, includingjustified truthful announcementsϕ!!^a_b(the latter require that the announcing agent knows the truth ofϕ); and secret observations, for headsh?aand tails¬h?a. (Their secrecy consists inbmistaking either action byskip_a.) In 5(Left), the BFSnd algorithm searches first for a single planπ⁰. This particular π⁰ demands ϕ_goals(π0)=h, and the only action possibly enforcinghis to toss the coin, which also can lead to¬h. Refining with this action is shown in (Center). Finally, (Right) shows the output of BFSndthat solves this problem.

Before concluding, let us remark that the previous A^∗search for optimal planning easily extends to the non-deterministic case. Among the several possible options to

(14)

define the total cost of a non-deterministic plan, consider e.g. the pessimistic and opti- mistic estimations based on

cost(e∪e⁰) = max{cost(e),cost(e⁰)} and cost(e∪e⁰) = min{cost(e),cost(e⁰)}

Thus, the cost of a plan is themaxormincost of its paths. For each of these cost aggregation functionsmax,min, one can redefine the BFSndalgorithm so that instead of stopping when the first outputΠis found (with costk), it continues search until all plansΠ⁰∈Planshave at least some path whose cost is≥k. Hence, optimal planning can be extended to strong non-deterministic planning inLCC_⊗∪.

Conclusions and Future Work

We presented backward planning algorithms for a planner-reasoner agent enabling this agent to find deterministic or (non-deterministic) strong plans in multi-agent scenarios.

The logics considered here are dynamic epistemic logics with ontic actions, thus allowing the plans to contain communicative actions, sensing or the usual fact-changing actions. We hope the proposed work might serve for practical reasoning in communicative agents, and in particular provide a logical foundation for the modeling (and computing) of agreements among motivated agents. As for future work, several direc- tions seem interesting: like the introduction of dynamic operators for belief revision, and at a more applied level, the study of heuristic criteria for optimal plans.

Acknowledgements:This work has been funded by projects AT (CSD 2007-022); Lo- MoReVI (FFI2008-03126-E/FILO FP006); ARINF (TIN2009-14704-C03-03); and the GenCat 2009-SGR-1434 and EPSRC grant EP/J002607/1.

References

1. G. Aucher. DEL-sequents for progression,Journal of Applied Non-Classical Logics21(3-4):

289–321 (2011)

2. A. Baltag, L. Moss and S. Solecki. The logic of public announcements, common knowledge and private suspicions, Proc. of 7th Theoretical Aspects of Representation Knowledge TARK’98pp. 43–56 (1998)

3. T. Bolander and M. Birkegaard Andersen. Epistemic planning for single- and multi-agent systemsJournal of Applied Non-Classical Logics, in press (2011)

4. T. French and H. van Ditmarsch. Undecidability for arbitrary public announcement logic, Advances in Modal Logic AiML 2008, F. Wolter, H. Wansing, M. de Rijke and Zakharyaschev (eds.), Vol. 7, pp. 23–42, CSLI Stanford (2008)

5. M. Ghallab, D. Nau and P. Traverso.Automated Planning: Theory and Practice, Morgan Kaufmann, San Francisco, USA (2004)

6. B. Kooi. Expressivity and completeness for public update logics via reduction axioms,Jour- nal of Applied Non-Classical Logics, 17(2): 231–253 (2007)

7. B. L¨owe, E. Pacuit and A. Witzel. Planning based on dynamic epistemic logic.Tech. Report (2010)

8. P. Pardo and M. Sadrzadeh. Planning in the Logics of Communication and Change,Proc.

of Autonomous Agents and Multiagent Systems AAMAS 2012, V. Conitzer, M. Winikoff, L.

Padgham and W. van der Hoek (eds.) Valencia, Spain (2012)

(15)

9. J. PearlHeuristics: Intelligent Search Strategies for Computer Problem SolvingAddison- Wesley, 1984.

10. J. van Benthem, J. van Eijck and B. Kooi. Logics of Communication and Change,Informa- tion and Computation204:1620–1662 (2006)

11. W. van der Hoek and M. Wooldridge. Tractable Multiagent Planning for Epistemic Goals Proc. of Autonomous Agents and Multiagent Systems AAMAS 2002, pp. 1167–1174 (2002) 12. H. van Ditmarsch, W. van der Hoek and B. Kooi.Dynamic Epistemic Logic, Springer (2008) 13. H. van Ditmarsch, J. van Eijck, F Sietsma and Y. Wang. On the logic of Lying,Games, Actions and Social SoftwareJ. van Eijck and R. Verbrugge (eds.) LNAI-FoLLI, Springer (in press)

14. H. van Ditmarsch, B. Kooi. Semantic results for ontic and epistemic changeProceedings of Logic and the Foundations of Game and Decision Theory LOFT 7G. Bonanno, W. van der Hoek and M. Wooldridge (eds.) pp. 87–117 (2008)