Towards a Unified View of AI Planning and Reactive Synthesis


HAL Id: hal-02342156

https://hal.archives-ouvertes.fr/hal-02342156

Submitted on 31 Oct 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Towards a Unified View of AI Planning and Reactive Synthesis

Alberto Camacho, Meghyn Bienvenu, Sheila A. McIlraith

To cite this version:

Alberto Camacho, Meghyn Bienvenu, Sheila A. McIlraith. Towards a Unified View of AI Planning and Reactive Synthesis. Proceedings of the Twenty-Ninth International Conference on Automated Planning and Scheduling (ICAPS), Jul 2019, Berkeley, United States. ⟨hal-02342156⟩

(2)

HAL Id: hal-02342156

https://hal.archives-ouvertes.fr/hal-02342156

Submitted on 31 Oct 2019

HAL

is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire

HAL, est

destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Towards a Unified View of AI Planning and Reactive Synthesis

Alberto Camacho, Meghyn Bienvenu, Sheila Mcilraith

To cite this version:

Alberto Camacho, Meghyn Bienvenu, Sheila Mcilraith. Towards a Unified View of AI Planning and

Reactive Synthesis. Proceedings of the Twenty-Ninth International Conference on Automated Plan-

ning and Scheduling (ICAPS), Jul 2019, Berkeley, United States. �hal-02342156�

(3)

Towards a Unified View of AI Planning and Reactive Synthesis

Alberto Camacho
Department of Computer Science, University of Toronto
Vector Institute, Toronto, Canada
acamacho@cs.toronto.edu

Meghyn Bienvenu
CNRS, University of Bordeaux
Bordeaux, France
meghyn.bienvenu@labri.fr

Sheila A. McIlraith
Department of Computer Science, University of Toronto
Vector Institute, Toronto, Canada
sheila@cs.toronto.edu

Abstract

Automated planning and reactive synthesis are well-established techniques for sequential decision making. In this paper we examine a collection of AI planning problems with temporally extended goals, specified in Linear Temporal Logic (LTL). We characterize these so-called LTL planning problems as two-player games and thereby establish their correspondence to reactive synthesis problems. This unifying view furthers our understanding of the relationship between plan and program synthesis, establishing complexity results for LTL planning tasks. Building on this correspondence, we identify restricted fragments of LTL for which plan synthesis can be realized more efficiently.

1 Introduction

In this work we explore and expose the relationship between AI automated planning and reactive synthesis, a class of program synthesis problems. The two seemingly distinct sequential decision making frameworks share a common past in the seminal work on program and plan synthesis via theorem proving (e.g., (Green 1969; Nilsson 1969; Waldinger and Lee 1969; Fikes and Nilsson 1971)) but diverged as planning research focused on efficient algorithms for classical planning. With growing interest in non-classical planning problems that include non-deterministic actions and temporally extended goals, we reunite these frameworks through the lens of two-player games. This unified view advances our understanding of the relationship between plan and program synthesis, providing us with complexity results for Linear Temporal Logic (LTL) planning and leading to the identification of fragments of LTL for which plan synthesis is more efficient.

Program synthesis from logical specification follows from the classical synthesis problem originally proposed by Church (1957). Reactive synthesis is a class of program synthesis problems concerned with synthesizing a reactive module that responds to the environment in accordance with a temporal logic specification. Our concern in this paper is with so-called LTL synthesis – reactive synthesis of specifications expressed in LTL. The problem is commonly characterized as a two-player game and has been shown to be 2EXP-complete (Pnueli and Rosner 1989). More recently, LTL synthesis has been extended to specifications in LTL interpreted over finite traces (aka LTLf) (e.g., (De Giacomo and Vardi 2013; 2015; Zhu et al. 2017; Camacho et al. 2018a; 2018b)), and the model has been extended to handle environment assumptions and quality measures (Camacho, Bienvenu, and McIlraith 2018).

The popularity of LTL is not restricted to program synthesis. Over the past 20 years there has been growing interest in planning with temporally extended goals and preferences specified in LTL. Initial work focused on planning with deterministic actions and LTL goals (and preferences) interpreted over finite traces (e.g., (Bacchus and Kabanza 2000; Doherty and Kvarnström 2001; Baier and McIlraith 2006a; 2006b; Triantafillou, Baier, and McIlraith 2015; Torres and Baier 2015)) and subsequently infinite traces (e.g., (Albarghouthi, Baier, and McIlraith 2009; Patrizi et al. 2011)). Other work focused on LTL preferences (e.g., (Edelkamp 2006; Baier, Bacchus, and McIlraith 2009; Coles and Coles 2011; Bienvenu, Fritz, and McIlraith 2011)), some of it in the PDDL3.0 (Planning Domain Definition Language) fragment of LTL (Gerevini et al. 2009). More recent work has examined LTL interpreted over both finite and infinite traces in fully-observable non-deterministic (FOND) planning settings (e.g., (De Giacomo, Patrizi, and Sardiña 2010; Patrizi, Lipovetzky, and Geffner 2013; Camacho et al. 2017; De Giacomo and Rubin 2018; D'Ippolito, Rodríguez, and Sardiña 2018)). We refer to this broad collection of planning problems loosely as LTL Planning Problems.

We are seeing increasing commonalities between the recent work on LTL synthesis and the work on LTL planning, particularly in the FOND setting. Both exploit automata representations of LTL for synthesizing solutions. Further, the non-determinism in the FOND setting is closely related to the non-determinism exhibited by the environment in synthesis settings. Finally, we note the growing interest in using planning algorithms for the realization of LTLf and LTL synthesis (Camacho et al. 2018a; 2018c; 2018d). The correspondence between planning and synthesis has been previously noted (e.g., (D'Ippolito, Rodríguez, and Sardiña 2018; Camacho et al. 2018b)), and even hybrid models have been proposed (e.g., (De Giacomo et al. 2016)). Opportunities abound from the cross-fertilization between these research areas, which motivated us to develop a unified view of LTL planning and reactive synthesis.

In Section 2 we review LTL and reactive LTL synthesis, and in Section 3 we outline a collection of LTL planning problems in a manner that is amenable to their formulation in terms of structured games. Next, we introduce game structures, providing a characterization of LTL synthesis in terms of structured games, re-establishing complexity results for LTL synthesis in this context, and highlighting the equivalence of winning strategies for these games and solutions to the corresponding LTL synthesis problems. In Section 5, we provide a clear bidirectional mapping between structured games and LTL FOND and show that LTLf FOND can be reduced to LTL FOND and games. Our reductions allow us to obtain complexity results for LTL/LTLf planning that replicate those for games, utilizing notions of combined complexity as well as domain complexity and goal complexity.

While the domain complexity of LTL and LTLf FOND planning is the same as for FOND with reachability goals, the overall complexity jumps to double-exponential time as a result of the goal. In Section 6, we identify restricted fragments of LTL and LTLf for which plan synthesis has the same worst-case complexity as FOND with reachability goals but which are richer in terms of their expressiveness. We first explore GR(1), a syntactically restricted fragment of LTL for which synthesis can be done in single-exponential time. Then we move to the finite case, and revisit the syntax of PDDL3.0 in the context of LTLf FOND planning. In both cases, we show that these compelling and expressive fragments of LTL can be employed without realizing the double-exponential complexity of LTL FOND.

The main contributions of this paper are: (i) a unified view of LTL synthesis and various forms of LTL/LTLf plan synthesis in terms of game structures; (ii) complexity results for various LTL/LTLf plan synthesis tasks that decouple the goal and domain complexity, thereby exposing the barriers to efficient algorithms for LTL/LTLf planning and synthesis; (iii) a structure-preserving correspondence between LTL synthesis and LTL/LTLf planning in FOND settings; (iv) identification of fragments of LTL and LTLf for which plan synthesis can be realized more efficiently, including that FOND planning for PDDL3.0 goals is in the same complexity class as FOND planning with final-state goals.

Our theoretical results open the door to combining the benefits of structured planning models and reactive synthesis to specify and perform controller synthesis. Similarly, existing techniques from one field (e.g., LTL synthesis for GR(1) objectives) may be leveraged for the other (e.g., FOND planning with GR(1) goals, for which no dedicated algorithm exists as of yet).

2 Preliminaries

2.1 LTL and Automata

Linear Temporal Logic (LTL) is a propositional modal logic commonly used to express temporally extended properties of state trajectories (Pnueli 1977). The syntax of LTL is defined over a set of propositional variables p ∈ AP, and includes the standard logical connectives and basic temporal operators next (◯ϕ) and until (ϕ1 U ϕ2):

ϕ ::= p | ⊤ | ⊥ | ¬ϕ | ϕ1 ∨ ϕ2 | ϕ1 ∧ ϕ2 | ◯ϕ | ϕ1 U ϕ2

Other temporal operators are defined in terms of these basic operators, including eventually (♦ϕ := ⊤ U ϕ), always (□ϕ := ¬♦¬ϕ), release (ϕ1 R ϕ2 := ¬(¬ϕ1 U ¬ϕ2)), and weak until (ϕ1 W ϕ2 := □ϕ1 ∨ (ϕ1 U ϕ2)). We denote by |ϕ| the size of ϕ, i.e., its total number of symbols.
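To make the rewriting of derived operators concrete, here is a small illustrative sketch (our own, not from the paper) that desugars ♦, □, R, and W into the core grammar, representing formulas as nested tuples:

```python
# Formulas as nested tuples: atoms and constants are strings ("p", "true"),
# operators are ("not", f), ("or", f, g), ("and", f, g), ("X", f),
# ("U", f, g).  Derived operators "F" (eventually), "G" (always),
# "R" (release), "W" (weak until) are rewritten into the core grammar.

def desugar(f):
    """Rewrite eventually/always/release/weak-until into the core syntax."""
    if isinstance(f, str):                  # atomic proposition or constant
        return f
    op, *args = f
    args = [desugar(a) for a in args]
    if op == "F":                           # F f == true U f
        return ("U", "true", args[0])
    if op == "G":                           # G f == not(true U not f)
        return ("not", ("U", "true", ("not", args[0])))
    if op == "R":                           # f R g == not(not f U not g)
        return ("not", ("U", ("not", args[0]), ("not", args[1])))
    if op == "W":                           # f W g == G f or (f U g)
        return ("or", desugar(("G", args[0])), ("U", args[0], args[1]))
    return (op, *args)

print(desugar(("F", "p")))                  # ('U', 'true', 'p')
```

The same representation is reused in the evaluator sketch further below.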

LTL formulae are evaluated over infinite traces, i.e., infinite sequences s1 s2 ⋯ where each si ⊆ AP defines a propositional valuation. Intuitively, formula ◯ϕ states that ϕ holds in the next timestep, while ϕ1 U ϕ2 states that ϕ1 needs to hold until ϕ2 holds. Formally, we say that an infinite trace π = s1 s2 ⋯ satisfies an LTL formula ϕ if π,1 |= ϕ, where π,i |= ϕ (i ≥ 1) is defined recursively as follows:

• π,i |= p iff p ∈ AP and p ∈ si,
• π,i |= ¬ϕ iff it is not the case that π,i |= ϕ,
• π,i |= (ϕ1 ∧ ϕ2) iff π,i |= ϕ1 and π,i |= ϕ2,
• π,i |= ◯ϕ iff π,i+1 |= ϕ,
• π,i |= ϕ1 U ϕ2 iff there exists j ≥ i such that π,j |= ϕ2 and for every k ∈ {i, …, j−1}, π,k |= ϕ1.

We write π |= ϕ when π satisfies ϕ and call π a model of ϕ.

Every LTL formula can be transformed into an equivalent automaton that accepts all and only the models of the formula. Property 1 formulates this result for deterministic parity word automata (DPW) (Piterman 2007). A DPW is a tuple A = ⟨Σ, Q, q0, δ, α⟩, where Σ is a finite alphabet, Q is a finite set of states, q0 ∈ Q is the initial state, δ : Q × Σ → Q is the transition function, and α : Q → ℕ is the labeling function. The number of different values (also called colours) that α can take is called the index of the automaton. A run of A on an infinite word w = s0 s1 ⋯ ∈ Σ^ω is a sequence q0 q1 ⋯ ∈ Q^ω where qi+1 = δ(qi, si) for each i ≥ 0. We say that a run ρ = q0 q1 ⋯ is accepting when the highest value that appears infinitely often in α(q0) α(q1) ⋯ is even.

Finally, the language L(A) of A is the set of words that have an accepting run on A. LTL formulae can be transformed into DPW over the alphabet Σ := 2^AP in double-exponential time.

Property 1. Given an LTL formula ϕ, a DPW Aϕ that accepts all and only the models of ϕ can be constructed in worst-case double-exponential time in |ϕ|, with the index of Aϕ being worst-case single-exponential in |ϕ|.
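The parity acceptance condition can be checked mechanically on ultimately periodic words u·v^ω, since the run of a deterministic automaton on such a word eventually cycles. The following is an illustrative sketch (our own encoding; names like `delta` and `colour` are our choices, not the paper's):

```python
# Minimal DPW sketch.  delta: dict (state, letter) -> state;
# colour: dict state -> int.  Parity acceptance: a run is accepting iff
# the highest colour visited infinitely often is even.  On u v^w the run
# eventually cycles, so we simulate until a (state, loop-position) pair
# repeats and inspect the colours on that cycle.

def accepts(q0, delta, colour, prefix, loop):
    q = q0
    for a in prefix:                     # read the finite prefix u
        q = delta[(q, a)]
    seen, trace = {}, []                 # configurations and visited states
    pos, step = 0, 0
    while (q, pos) not in seen:          # simulate v^w until the run cycles
        seen[(q, pos)] = step
        q = delta[(q, loop[pos])]
        trace.append(q)
        pos = (pos + 1) % len(loop)
        step += 1
    cycle = trace[seen[(q, pos)]:]       # states repeated infinitely often
    return max(colour[s] for s in cycle) % 2 == 0

# Example DPW for "infinitely many a's" over {a, b}:
# reading a gives colour 2 (even = good); reading b gives colour 1.
delta = {("qa", "a"): "qa", ("qa", "b"): "qb",
         ("qb", "a"): "qa", ("qb", "b"): "qb"}
colour = {"qa": 2, "qb": 1}
print(accepts("qa", delta, colour, [], ["a", "b"]))  # True: a recurs forever
print(accepts("qa", delta, colour, ["a"], ["b"]))    # False: eventually only b's
```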

In this work, we also consider LTLf, a variant of LTL in which formulae are interpreted over finite traces π = s1 ⋯ sn. LTLf inherits the syntax of LTL. A macro final := ¬◯⊤ is commonly used to indicate the end of the trace. It is important to notice that in LTLf, ¬◯ϕ ≢ ◯¬ϕ; instead, we have ¬◯ϕ ≡ final ∨ ◯¬ϕ (here ≡ denotes equivalence under LTLf semantics, i.e., being satisfied by the same finite traces). LTLf formulae can be transformed into finite-word automata (cf. (Baier and McIlraith 2006b)).
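As an illustration of the finite-trace semantics (our own sketch, reusing the nested-tuple representation; not a standard library), a direct recursive evaluator:

```python
# Evaluate an LTLf formula over a finite trace (list of sets of atoms).
# Formulas are nested tuples: atoms are strings; operators are
# ("not", f), ("and", f, g), ("or", f, g), ("X", f), ("F", f), ("G", f),
# ("U", f, g).  Note the LTLf-specific behaviour of "X": a (strong) next
# is false at the last position, since there is no next state.

def holds(f, trace, i=0):
    if isinstance(f, str):
        return f in trace[i]
    op = f[0]
    if op == "not":
        return not holds(f[1], trace, i)
    if op == "and":
        return holds(f[1], trace, i) and holds(f[2], trace, i)
    if op == "or":
        return holds(f[1], trace, i) or holds(f[2], trace, i)
    if op == "X":                  # strong next: fails at the end of the trace
        return i + 1 < len(trace) and holds(f[1], trace, i + 1)
    if op == "F":
        return any(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "G":
        return all(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "U":
        return any(holds(f[2], trace, j)
                   and all(holds(f[1], trace, k) for k in range(i, j))
                   for j in range(i, len(trace)))

trace = [{"p"}, {"p"}, {"q"}]
print(holds(("U", "p", "q"), trace))    # True: p holds until q does
print(holds(("X", "p"), trace, 2))      # False: no next state at the last position
```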

2.2 Reactive LTL Synthesis

Reactive synthesis is the problem of automatically constructing a reactive module – a system that reacts to the environment with the objective of realizing a given logical specification (Church 1957). In this paper, we are concerned with LTL synthesis, a form of reactive synthesis where the specification is expressed in LTL (Pnueli and Rosner 1989).

Definition 1. An LTL specification is a tuple ⟨X, Y, ϕ⟩, where X and Y are disjoint finite sets of variables, and ϕ is an LTL formula over X ∪ Y.

A strategy for an LTL specification ⟨X, Y, ϕ⟩ is a function σ : (2^X)+ → 2^Y that maps finite sequences of subsets of X into subsets of Y. For an infinite sequence X = X1 X2 ⋯ ∈ (2^X)^ω, the play induced by strategy σ is the infinite sequence ρσ,X = (X1 ∪ σ(X1))(X2 ∪ σ(X1 X2)) ⋯. A play ρ is winning if ρ |= ϕ, and a strategy is winning when ρσ,X |= ϕ for all X ∈ (2^X)^ω. Realizability is the problem of deciding whether an LTL specification has a winning strategy, and synthesis is the problem of computing one. Both problems can be solved in double-exponential time (Pnueli and Rosner 1989):
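The induced play is easy to compute for any finite prefix of environment moves. A small illustrative sketch (our own encoding of strategies as Python functions over move histories):

```python
# Sketch: build a finite prefix of the play induced by a strategy.
# A strategy maps a history of environment moves (a tuple of frozensets
# over X) to a subset of Y, mirroring sigma : (2^X)+ -> 2^Y.

def play_prefix(strategy, env_moves):
    """Return the induced play prefix (X1 u s(X1))(X2 u s(X1 X2))..."""
    play, history = [], ()
    for X in env_moves:
        history = history + (frozenset(X),)
        play.append(frozenset(X) | frozenset(strategy(history)))
    return play

# Toy strategy for X = {"x"}, Y = {"y"}: output {"y"} iff "x" was just input.
copy_x = lambda hist: {"y"} if "x" in hist[-1] else set()

prefix = play_prefix(copy_x, [{"x"}, set(), {"x"}])
print(prefix[0] == {"x", "y"}, prefix[1] == set())   # True True
```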

Theorem 1. LTL realizability is 2EXP-complete.

3 Plan Synthesis with LTL Goals

Automated planning provides a framework for sequential decision-making in which the agent can interact with the environment according to a prescribed domain theory, and where the objective is to generate a plan whose execution is guaranteed to satisfy a prescribed goal. Planning problems differ in the assumptions they make about the form of the initial state (complete, partial), the transition system (deterministic, non-deterministic, stochastic), the nature of the goal condition, whether the state is (partially) observable during plan execution, the form of the solution or plan (sequence of actions, conditional plan, policy), and whether the execution of that plan is finite or infinite.

Our concern in this paper is with planning problems whose transition systems are non-deterministic and whose goals are temporally extended and, as such, take the form of constraints on valid executions of the plan. We specify such temporally extended goals in LTL. Plan executions may be finite or infinite. Unless otherwise noted, initial states are complete and plan execution is fully observable. In the rest of this section we review this class of so-called LTL (resp. LTLf) Planning Problems using notation that will facilitate subsequent characterization in terms of two-player games.

Planning Domain. Following (Ghallab, Nau, and Traverso 2016), we represent planning domains compactly using a symbolic representation of states, actions, and transitions:

Definition 2. A planning domain is a tuple ⟨F, A, δ, Poss⟩, where F is a finite set of fluent symbols, A is a finite set of action symbols, Poss ⊆ 2^F × A, and δ : 2^F × A → 2^(2^F). The size of the domain is |F|.

The fluents in F denote state properties; we employ a propositional STRIPS representation to parsimoniously represent states as subsets s ⊆ F; the set of states is S := 2^F. We use the term search space to refer to the set S. An action a is applicable in state s if (s, a) ∈ Poss. The function δ describes state transitions, with s′ ∈ δ(s, a) indicating that state s′ is a possible result of applying action a in state s.

A strategy for a planning domain D = ⟨F, A, δ, Poss⟩ is a function σ : (2^F)+ → A ∪ {end} that maps finite state histories to actions, or to end otherwise.¹ Here end is a special symbol, not contained in D, that denotes the termination of a plan. We require strategies to be well defined, that is, for all n ≥ 0, action σ(s0 ⋯ sn) is applicable in sn. A strategy is history-independent if σ(s0 ⋯ sn) = σ(sn) for each s0 ⋯ sn ∈ (2^F)+. In this case, we call σ a policy. An execution of σ from s0 ∈ S is a sequence of state-action pairs π = (s0, a0)(s1, a1) ⋯ such that, for each n < |π|: (i) an = σ(s0 ⋯ sn); (ii) if n < |π| − 1, then an ∈ A and sn+1 ∈ δ(sn, an); and (iii) if n = |π| − 1, then an = end. The execution terminates when an = end. Terminating executions are finite sequences π = (s0, a0) ⋯ (sn, an) where n is the first and only index i such that ai = end. Non-terminating executions are infinite, and we write |π| = ∞.
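To make executions concrete, here is an illustrative sketch of sampling one execution of a (history-independent) policy in a non-deterministic domain. The encoding is our own: states are frozensets of fluents, and `delta` maps a state-action pair to the list of possible successors:

```python
import random

# Sketch: sample one execution of a policy in a non-deterministic
# planning domain.  The environment resolves the non-determinism by
# choosing one successor from delta[(s, a)] at each step.

def execute(policy, delta, s0, max_steps=100):
    """Return a sampled execution [(s0, a0), (s1, a1), ...] from s0."""
    execution, s = [], s0
    for _ in range(max_steps):
        a = policy(s)
        execution.append((s, a))
        if a == "end":                       # plan termination
            return execution
        s = random.choice(delta[(s, a)])     # environment picks the outcome
    return execution                         # non-terminating (truncated)

# Toy domain: action "toss" may or may not achieve the goal fluent g.
s_init, s_goal = frozenset(), frozenset({"g"})
delta = {(s_init, "toss"): [s_init, s_goal]}
policy = lambda s: "end" if "g" in s else "toss"

run = execute(policy, delta, s_init)
print(run[-1][1])  # almost surely "end"; "toss" only if truncated at max_steps
```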

LTL FOND Planning. An LTL fully-observable non-deterministic (FOND) planning problem is a tuple ⟨D, ϕI, ϕG⟩, where D = ⟨F, A, δ, Poss⟩ is a planning domain, ϕI is a propositional formula over F that describes a unique state called the initial state (typically given as a complete conjunction of literals), and ϕG is an LTL formula over F ∪ A. A strategy σ is a solution to an LTL FOND planning problem ⟨D, ϕI, ϕG⟩ if all executions of σ that commence in s0 |= ϕI are infinite and satisfy ϕG. In other words, the objective is to define a non-terminating process that guarantees achievement of the given temporally extended goal, whose satisfaction may depend on the entire execution, rather than just the final state.

LTLf FOND Planning. We can define LTLf FOND planning problems in exactly the same manner, except that a strategy σ is deemed a solution if all executions of σ that commence in s0 |= ϕI terminate and satisfy ϕG (note that we use LTLf semantics to evaluate ϕG over the induced finite trace). Thus, the aim is to generate terminating strategies that guarantee satisfaction of the goal.

FOND Planning. As a special case of LTLf FOND planning, we have (plain) FOND planning, in which the goal takes the form ϕG = ♦(θg ∧ final), with θg a propositional formula over fluents F (typically, a conjunction of fluents). Solutions to FOND planning are policies (rather than arbitrary strategies), but we observe that this restriction does not affect the existence of solutions (cf. Proposition 1).

Proposition 1. Any strategy that solves an LTLf FOND planning problem P with goal ϕG = ♦(θg ∧ final) can be transformed into a policy that is a solution to P.

Planning with Partially Specified Initial States. A benefit of describing the initial state by a formula ϕI is that it allows us to partially specify the initial state. We extend the definitions of LTL/LTLf FOND to allow for ϕI to describe a family of states, i.e., those that satisfy ϕI. Solutions to an LTL/LTLf FOND problem ⟨D, ϕI, ϕG⟩ with partially specified initial state ϕI are defined in the natural way. A strategy σ is a solution to ⟨D, ϕI, ϕG⟩ if σ is a solution to every LTL/LTLf FOND problem ⟨D, ϕ′I, ϕG⟩ where ϕ′I is a complete formula that describes a unique initial state and ϕ′I |= ϕI. Informally, σ solves all problems whose initial state satisfies ϕI.

¹ History-dependent strategies are often represented compactly in the form of finite-state controllers (cf. (Patrizi, Lipovetzky, and Geffner 2013; Camacho et al. 2017)).

Deterministic LTL/LTLf Planning. The preceding problems can be restricted by only considering actions with deterministic effects, i.e., requiring that |δ(s, a)| = 1 for each pair (s, a) ∈ S × A. We refer to this class of problems as deterministic LTL/LTLf planning problems.

Complexity of Planning. When speaking about the complexity of planning, we focus on the decision problem of plan existence. There are different possible complexity measures depending on which parts of the problem are considered as fixed or varying. In addition to the standard combined complexity measure (in terms of the size of the whole problem), we will consider domain complexity (in terms of the size of the domain, with the goal formula treated as fixed) and goal complexity (in terms of the size of the goal formula, with the domain treated as fixed).

4 Game Structures

In the previous section, we presented a variety of LTL (resp. LTLf) planning problems. In this section we review and elaborate on the notion of game structures, which we employ to expose and highlight correspondences between these planning problems and LTL synthesis.

Piterman, Pnueli, and Sa'ar (2006) studied the problem of computing winning strategies for so-called game structures (see Definition 3). Intuitively, a game structure is similar to an LTL specification. The difference is that the moves of the players (system and environment) in a game structure are assumed to be constrained by the rules of the game. Solutions are system strategies that guarantee achievement of a winning condition ϕ, under the assumption that both players act as mandated by the rules described by θe, θs, ψe, and ψs.

Definition 3. A game structure is a tuple G = ⟨X, Y, θe, θs, ψe, ψs, ϕ⟩, where:

• X is a finite set of environment variables.
• Y is a finite set of system variables, disjoint from X.
• θe is a propositional formula over X.
• θs is a propositional formula over X ∪ Y.
• ψe is a propositional formula over X ∪ Y ∪ ◯X.
• ψs is a propositional formula over X ∪ Y ∪ ◯X ∪ ◯Y.
• ϕ is an LTL formula over X ∪ Y (the winning condition),

with ◯X = {◯x | x ∈ X} and ◯Y = {◯y | y ∈ Y}.

The environment player moves first in each turn, and inputs a subset of the X variables. In response, the system player outputs a subset of the Y variables. The moves Xn ⊆ X and Yn ⊆ Y in the n-th turn constitute a game configuration, or state, sn = Xn ∪ Yn. We shall use the term search space to refer to the set of all such states and observe that its size is singly exponential in |X| + |Y|. The structure of the game constrains the players' moves. The LTL formula ψe relates the values of Xn, Yn, and Xn+1 and states how the environment is allowed to respond to a state sn. Similarly, the LTL formula ψs relates the values of Xn, Yn, Xn+1, and Yn+1, and states how the system can react to the environment given the previous state. Formulas θe and θs have similar purposes, serving to constrain the players' initial moves.

A play of the game G is a maximal (possibly infinite) sequence of states, ρ = s0 s1 ⋯, such that: (i) s0 is an initial state, i.e., s0 |= θe ∧ θs; and (ii) for each n > 0, sn is a successor of sn−1, i.e., (sn−1, sn) |= ψe ∧ ψs. The system player aims to generate a play that satisfies the winning condition ϕ. Formally, a play ρ is winning for the system if either (i) ρ = s0 ⋯ sn is finite and there is no sn+1 such that (sn, sn+1) |= ψe (i.e., the environment has no legal move); or (ii) ρ is infinite and ρ |= ϕ.

Game strategies are defined as for LTL synthesis. Namely, a strategy for the system player is a function σ : (2^X)+ → 2^Y. A play ρ = (X0 ∪ Y0)(X1 ∪ Y1) ⋯ is σ-compliant if Yn = σ(X0 ⋯ Xn) for each n < |ρ|. We say σ is a winning strategy for the system player if all σ-compliant plays are winning. Winner determination is to decide whether such a winning strategy exists, and by solving the game, we mean the task of producing such a strategy.
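The legality conditions on plays can be checked directly. An illustrative sketch (our own encoding, with the rule formulas given as Python predicates rather than propositional formulas):

```python
# Sketch: check that a finite sequence of states is a legal play of a
# game structure whose rules are given as predicates theta_e(s),
# theta_s(s), psi_e(s, s_next), psi_s(s, s_next); states are sets of
# variables drawn from X u Y.

def is_legal_play(states, theta_e, theta_s, psi_e, psi_s):
    if not states:
        return False
    if not (theta_e(states[0]) and theta_s(states[0])):  # legal initial state
        return False
    return all(psi_e(s, t) and psi_s(s, t)               # legal successors
               for s, t in zip(states, states[1:]))

# Toy game over X = {"x"}, Y = {"y"}: the environment may always move,
# and the system must echo the environment's previous move of "x".
theta_e = lambda s: True
theta_s = lambda s: "y" not in s
psi_e   = lambda s, t: True
psi_s   = lambda s, t: ("y" in t) == ("x" in s)

print(is_legal_play([set(), {"x"}, {"x", "y"}],
                    theta_e, theta_s, psi_e, psi_s))   # True
print(is_legal_play([set(), {"y"}],
                    theta_e, theta_s, psi_e, psi_s))   # False: "y" without prior "x"
```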

Game Structures and LTL Synthesis. The LTL synthesis and structured game problems are interreducible: structured games can be reduced to LTL synthesis where all the structure is embedded into the LTL objective (Theorem 2); conversely, any LTL specification can be converted into an equivalent game with no structural constraints (Proposition 2). We should emphasize that Theorem 2 was originally formulated for syntactically restricted formulae (in the GR(1) fragment, discussed in Section 6.1), but the result naturally extends to arbitrary LTL formulae. The Temporal Logic Synthesis Format (TLSF) is a high-level description format for LTL synthesis (Jacobs, Klein, and Schirmer 2016). TLSF specifications are described in terms of the components of a structured game. TLSF can be interpreted under two different semantics, the so-called strict or alternatively standard semantics. The reduction in Theorem 2 corresponds to the strict semantics.

Theorem 2 (adapted² and extended from (Klein and Pnueli 2010; Bloem et al. 2012)). The system wins the game G = ⟨X, Y, θe, θs, ψe, ψs, ϕ⟩ iff ⟨X, Y, ϕG⟩ is realizable, where:

ϕG := (θe → θs) ∧ (θe → (ψs W ¬ψe)) ∧ ((θe ∧ □ψe) → ϕ)

Moreover, the transformation preserves winning strategies.
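The objective of Theorem 2 is assembled purely syntactically from the game's components. A small illustrative sketch (our own; the formula strings use a common ASCII LTL syntax with `->`, `&`, `!`, `W`, and `G`, and the component formulas are placeholders):

```python
# Sketch: assemble the strict-semantics LTL objective of Theorem 2 as a
# formula string from the components of a structured game.

def strict_objective(theta_e, theta_s, psi_e, psi_s, phi):
    return (f"(({theta_e}) -> ({theta_s})) "
            f"& (({theta_e}) -> (({psi_s}) W !({psi_e}))) "
            f"& ((({theta_e}) & G({psi_e})) -> ({phi}))")

print(strict_objective("x", "y", "pe", "ps", "GF x"))
# ((x) -> (y)) & ((x) -> ((ps) W !(pe))) & (((x) & G(pe)) -> (GF x))
```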

We now show how LTL synthesis corresponds to a special case of structured games:

Proposition 2. A strategy is winning for the LTL synthesis problem ⟨X, Y, ϕ⟩ iff it is a winning strategy for the corresponding structured game ⟨X, Y, ⊤, ⊤, ⊤, ⊤, ϕ⟩.

Solving Structured Games. As the reduction from Theorem 2 preserves winning strategies, it provides a way to solve games via reduction to LTL synthesis and to derive the following complexity results:

Theorem 3. Structured games can be solved in double-exponential time, and the winner determination problem for structured games is 2EXP-complete.

² The formula in (Bloem et al. 2012) makes use of future and past temporal operators. Here, we use an equivalent formula with only future temporal operators (cf. (Jacobs, Klein, and Schirmer 2016)).


In what follows, we present an alternative way to solve games that exploits their structure and allows us to obtain more fine-grained complexity results in terms of the size of the search space and winning condition (Theorem 4). These results extend similar results for games in a restricted fragment of LTL, which we will revisit in Section 6.1.

The classical approach to LTL realizability reduces the specification to a parity game, and constructs a strategy via fixpoint computation (e.g., (Pnueli and Rosner 1989)). This approach can be straightforwardly adapted to solve structured games G = ⟨X, Y, θe, θs, ψe, ψs, ϕ⟩. We revisit here the main steps. The interesting thing to notice is that the procedure decouples the roles of the winning condition and the search space within the game complexity.

1. Transform ϕ into a DPW Aϕ.
2. Construct a parity game G′ = Aϕ ∥ G.
3. Compute a winning strategy for G′.

The first step computes a DPW Aϕ, in worst-case double-exponential time in |ϕ|, that recognizes the models of ϕ. The index d of Aϕ (i.e., its number of colours) is, at most, exponential in |ϕ|. In the second step, a product parity game G′ = Aϕ ∥ G is constructed that integrates the dynamics of Aϕ and G in parallel. In the context of this paper, a parity game may be viewed as a structured game with a parity winning condition, in place of an LTL formula, to evaluate which infinite plays are winning. Parity games are played on a finite graph whose nodes represent game states. The dynamics of the game alternates between system and environment nodes. The nodes associated with the turn of the environment player are labeled with tuples (q, X, Y), where q is a DPW state, X ⊆ X, and Y ⊆ Y. Intuitively, q keeps track of the run of Aϕ on a partial play ρ = s0 s1 ⋯ sn, and X and Y are the components of state sn. Similarly, the nodes associated with the turn of the system player are labeled with tuples (q, X, Y, X′) that keep track of the run of Aϕ on ρ followed by the environment's move X′. The colour of a node is the colour of the DPW state in the node's label, and the index of G′ is the highest colour among all its nodes. The edges of the game graph connect nodes in the natural way whenever both players have been following the rules of the structured game. Edges (q, X, Y) −X′→ (q, X, Y, X′) reflect an environment move X′, and edges (q, X, Y, X′) −Y′→ (q′, X′, Y′) reflect a system move Y′, with q′ = δ(q, X′ ∪ Y′). Additional edges are included to handle illegal moves of the structured game G. Environment moves that violate ψe are captured by edges that point to a dummy node q⊤. More precisely, (q, X, Y) −X′→ q⊤ whenever (X ∪ Y)(X′) ⊭ ψe. Similarly, edges to a dummy node q⊥ handle system moves that violate ψs, i.e., (q, X, Y, X′) −Y′→ q⊥ whenever (X ∪ Y)(X′ ∪ Y′) ⊭ ψs. We assign colour 0 to q⊤, and colour 1 to q⊥. Both q⊤ and q⊥ are sink nodes, meaning that their outgoing edges are all self-loops.

The root node of the game is labeled with (q0, ∅, ∅), with q0 being the initial state of the DPW. The root node has no play history associated with it. Its outgoing edges correspond to environment moves X and transition to nodes with labels of the form (q0, ∅, ∅, X) whenever X |= θe, and to q⊤ otherwise. Similarly, nodes with labels of the form (q0, ∅, ∅, X) transition to nodes with labels of the form (q0, X, Y) whenever X ∪ Y |= θs, and to q⊥ otherwise. This construction preserves winner determination, and a winning strategy for G can be extracted from a winning strategy for G′.

In the third step, a winning strategy for G′ can be computed in O(e · n^d), where e is the number of edges, n is the number of nodes, and d is the index of G′ (Zielonka 1998). In the parity game G′ described above, the number of nodes – as well as the number of edges – is polynomial in |Aϕ| · 2^O(|X|+|Y|), and G′ has the same index as Aϕ, which is at most exponential in |ϕ| (w.l.o.g., we assume Aϕ has index greater than zero). The following result follows:
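For concreteness, here is a compact, unoptimized sketch of Zielonka's (1998) recursive algorithm for solving max-parity games (our own encoding: node owners, colours, and successor lists as Python dicts; player 0 wins a play iff the highest colour seen infinitely often is even):

```python
# Zielonka's recursive algorithm for parity games (illustrative sketch).

def attractor(target, p, S, owner, succ):
    """Least A containing target s.t. player p can force reaching A within S."""
    A, changed = set(target), True
    while changed:
        changed = False
        for v in S - A:
            succs = [w for w in succ[v] if w in S]
            if (owner[v] == p and any(w in A for w in succs)) or \
               (owner[v] != p and all(w in A for w in succs)):
                A.add(v)
                changed = True
    return A

def zielonka(S, owner, colour, succ):
    """Return (W0, W1): the nodes from which player 0 (resp. 1) wins."""
    if not S:
        return set(), set()
    d = max(colour[v] for v in S)
    p = d % 2                                    # player favoured by colour d
    top = {v for v in S if colour[v] == d}
    A = attractor(top, p, S, owner, succ)
    W = zielonka(S - A, owner, colour, succ)
    if not W[1 - p]:                             # opponent wins nowhere else
        return (S, set()) if p == 0 else (set(), S)
    B = attractor(W[1 - p], 1 - p, S, owner, succ)
    W2 = zielonka(S - B, owner, colour, succ)
    Wp, Wo = W2[p], W2[1 - p] | B
    return (Wp, Wo) if p == 0 else (Wo, Wp)

# Toy game: "a" (owner 0, colour 1) and "b" (owner 1, colour 2), edges
# a->b, b->a.  Every infinite play sees colour 2 infinitely often (even),
# so player 0 wins from both nodes.
W0, W1 = zielonka({"a", "b"}, {"a": 0, "b": 1},
                  {"a": 1, "b": 2}, {"a": ["b"], "b": ["a"]})
print(sorted(W0), sorted(W1))  # ['a', 'b'] []
```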

Theorem 4. Structured games can be solved in double-exponential time in the size of the winning condition, and in polynomial time in the size of the search space.

5 Plan Synthesis and Structured Games

The correspondence between planning and game structures has been noted in the past. De Giacomo et al. (2010) constructed a reduction of FOND planning with final-state goals into game structures. Here, we extend those results to LTL FOND and LTLf FOND planning.

We provide a clear bidirectional mapping between structured games and LTL FOND planning (Sections 5.1 and 5.2), and then show that LTLf FOND planning can be reduced to LTL FOND and games (Section 5.3). Our reductions allow us to obtain complexity results for LTL/LTLf planning that replicate those for games.

5.1 Structured Games as LTL FOND Planning

We begin by presenting a reduction of structured games to LTL FOND planning. Intuitively, we model the moves of the system by means of planning actions, and the responses of the environment by means of non-deterministic action effects.

Let G = ⟨X, Y, θe, θs, ψe, ψs, ϕ⟩ be a game structure. We assume for now that θe is satisfiable and defines a complete truth assignment to the variables X, returning later to the general case. We construct an LTL FOND planning problem PG = ⟨D, ϕI, ϕG⟩ with domain D = ⟨F, A, δ, Poss⟩ and

F := X ∪ Y ∪ X′ ∪ {xinit, xend, x′end, aend}
A := {A′ | A ⊆ Y} ∪ {a′end}
ϕI := xinit ∧ ¬xend ∧ ¬x′end ∧ ¬aend ∧ θ′e
ϕG := ♦xend ∨ ◯ϕ

Note that we use V′ := {v′ | v ∈ V} to denote a set of primed variables (and similarly, use θ′e for θe with all variables now primed). Planning states keep track of the last game state.

Intuitively, primed variables serve to indicate the moves performed by each player in the current turn, whereas unprimed variables indicate the last game state.

The set Poss ⊆ 2^F × A consists of all pairs (X1 ∪ A1 ∪ X′2, A′2) such that X1 ⊆ X ∪ {xinit, xend}, X2 ⊆ X ∪ {xend}, A1, A2 ⊆ Y ∪ {aend}, and

• X1 |= xinit ∧ ¬xend and (X2 ∪ A2) |= θs, or
• X1, X2 ⊆ X, A1, A2 ⊆ Y, and (X1 ∪ A1)(X2 ∪ A2) |= ψs, or
• X2 |= xend and A2 = {aend}.


The initial state ϕI simulates the initial move of the environment, determined by the (unique, by assumption) interpretation of the X variables that makes θe true. We now define how δ simulates the moves of the environment in response to the moves of the system. Intuitively, the non-determinism in δ allows the environment to make a legal move, or otherwise transition to {xend}. Formally, if X1, X2 ⊆ X and A1, A2 ⊆ Y, then δ is defined by:

δ(X1 ∪ A1 ∪ X′2, A′2) := {X2 ∪ A2 ∪ X′3 | X3 = {xend}} ∪ {X2 ∪ A2 ∪ X′3 | X3 ⊆ X and (X2 ∪ A2)(X3) |= ψe}

We simulate that the environment makes an illegal move by assigning X3 = {xend}, and force the dynamics to enter a loop in which only action aend can be applied: δ(X1 ∪ A1 ∪ {x′end}, {aend}) := {{xend, aend, x′end}}. Those simulated executions satisfy ♦xend. Note that executions that satisfy ◯ϕ are in correspondence with plays that satisfy the winning condition ϕ. The one-timestep offset in ◯ϕ is needed because ϕ is evaluated over unprimed variables, and those are delayed by one timestep in the construction of the problem.
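The shape of this non-deterministic successor set can be illustrated with a small sketch (our own encoding: environment rules ψe as a Python predicate, the sink as a fluent string "xend"):

```python
from itertools import chain, combinations

# Sketch of the non-deterministic effects in the game-to-planning
# reduction: from the current turn (env move X2, sys move A2), the
# environment either makes a legal move X3 (one satisfying psi_e) or the
# execution is routed to the sink fluent "xend".

def powerset(xs):
    xs = list(xs)
    return [set(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

def successors(X_vars, X2, A2, psi_e):
    moves = [{"xend"}]                           # illegal-move sink
    moves += [X3 for X3 in powerset(X_vars) if psi_e(X2 | A2, X3)]
    return moves

# Toy: X = {"x"}; the environment may set "x" only if the system played "a".
psi_e = lambda state, X3: "x" not in X3 or "a" in state
print(successors({"x"}, set(), {"a"}, psi_e))    # [{'xend'}, set(), {'x'}]
```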

In the general case, with generic θe, our reduction can be extended with a dummy action, only applicable in a new dummy initial state, that transitions non-deterministically into one of the states that satisfy xinit ∧ ¬xend ∧ ¬x′end ∧ ¬aend ∧ θ′e, or into xend. The new goal becomes ϕG ≔ ♦xend ∨ ○○ϕ, with the additional next operator being introduced to reflect the extra initial timestep. The reduction is polynomial, as it transforms a game G into an LTL FOND problem PG with a polynomial increase in the size of the search space. Winning strategies for G can be extracted directly from solutions to the reduced LTL FOND problem.
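To make the shape of the reduction concrete, the following sketch enumerates the fluents and actions of PG for a small game structure. The helper names (`build_planning_signature`, `powerset`) and the string-based encoding of primed variables are illustrative choices, not part of the construction in the text:

```python
from itertools import combinations

def powerset(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def build_planning_signature(X, Y):
    """Fluents and actions of the LTL FOND problem P_G built from a game
    structure over environment variables X and system variables Y.
    A primed copy is encoded by appending a quote to the variable name."""
    primed = {x + "'" for x in X}
    fluents = set(X) | set(Y) | primed | {"xinit", "xend", "xend'", "aend"}
    # One planning action per system move A subseteq Y (its primed copy),
    # plus the dedicated terminating action aend'.
    actions = {frozenset(a + "'" for a in A) for A in powerset(Y)}
    actions |= {frozenset({"aend'"})}
    return fluents, actions

F, A = build_planning_signature({"x1", "x2"}, {"y1"})
# |F| = 2 + 1 + 2 + 4 = 9 fluents; |A| = 2^|Y| + 1 = 3 actions
```

The polynomial blow-up claimed in the text is visible here: the fluent set grows linearly in |X| + |Y|, while the action set is the set of system moves, which the game already had.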

Theorem 5. A structured game with winning condition ϕ can be reduced to an LTL FOND planning problem with goal ϕG ≔ ♦xend ∨ ○○ϕ, with a polynomial increase in the size of the search space.

5.2 LTL FOND Planning as Structured Games

To complete the bidirectional mapping, we provide a reduction from LTL FOND planning into structured games.

For an LTL FOND planning problem P = ⟨D, ϕI, ϕG⟩ with domain D = ⟨F, A, δ, Poss⟩, we construct a game GP = ⟨X, Y, θe, θs, ψe, ψs, ϕG⟩, where X ≔ F, Y ≔ A, and:

θe ≔ ϕI
θs ≔ ϕoneA ∧ ⋁_{(s,a) ∈ Poss} (τ(s) ∧ a)
ψe ≔ ⋁_{s ∈ 2^F} ⋁_{a ∈ A} ⋁_{s′ ∈ δ(s,a)} (τ(s) ∧ a ∧ ○τ(s′))
ψs ≔ ○(ϕoneA ∧ ⋁_{(s,a) ∈ Poss} (τ(s) ∧ a))

Consistent with our notation for planning, we use s for subsets of F and A for subsets of A. We let τ(s) ≔ ⋀_{f ∈ s} f ∧ ⋀_{f ∈ F∖s} ¬f and set ϕoneA ≔ ⋁_{a ∈ A} a ∧ ⋀_{a,a′ ∈ A, a′ ≠ a} ¬(a ∧ a′).

Note that in ψe and ψs, we use ○ in front of arbitrary propositional formulas, but by pushing ○ down to the fluents and actions, we obtain formulae of the required form.

The components of GP mimic the dynamics of P, according to the semantics discussed in Section 4. In the game GP, the environment controls the fluents (X ≔ F), and the system controls the actions (Y ≔ A). The formulas ψe and ψs reproduce the state transition model in D. Formula ψe checks whether state s′ in a sequence (s ∪ A)(s′ ∪ A′) of two consecutive game states is a legal move of the environment player, relative to s and A – assuming that previous moves of each player respected the dynamics of the game. This occurs when s′ ∈ δ(s, a). Formula ψs checks whether A′ ⊆ A is a legal move of the system player, relative to s, A, and s′ – assuming that previous moves of each player respected the dynamics of the game. This occurs when (i) A′ contains a single action, and (ii) action a ∈ A′ is applicable in s′. Finally, formulas θe and θs are assertions over X and X ∪ Y, respectively, and check that the initial moves of the environment and system players are legal – that is, the environment sets the initial state to ϕI, and the system responds with an action that is applicable in the initial state. The auxiliary formula ϕoneA ensures that exactly one action symbol holds in each game position, making it possible to translate winning game strategies into solutions. Note that our reduction does not require ϕI to define a unique state.
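The legality checks performed by ψe and ψs can be phrased operationally. The sketch below, with δ represented as a dictionary and Poss as a set (a representation chosen here for illustration, not taken from the paper), mirrors the two conditions:

```python
def legal_env_move(delta, s, a, s_next):
    """psi_e check: the next state s_next must be one of the possible
    outcomes of applying action a in state s."""
    return s_next in delta.get((s, a), set())

def legal_sys_move(poss, s_next, a_next):
    """psi_s check (one step unrolled): the single action chosen for the
    next turn must be applicable in the environment's chosen next state."""
    return (s_next, a_next) in poss

# Tiny domain: one fluent f, one action "toggle" with a non-deterministic
# effect that may set or clear f.
delta = {(frozenset(), "toggle"): {frozenset({"f"}), frozenset()}}
poss = {(frozenset(), "toggle"), (frozenset({"f"}), "toggle")}

assert legal_env_move(delta, frozenset(), "toggle", frozenset({"f"}))
assert not legal_env_move(delta, frozenset(), "toggle", frozenset({"g"}))
assert legal_sys_move(poss, frozenset({"f"}), "toggle")
```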

Theorem 6. Plan existence for an LTL FOND planning problem P can be reduced to winner determination of the game structure GP and, by extension, to LTL realizability of

ϕP ≔ (θe → θs) ∧ (θe → □(ψs W ¬ψe)) ∧ ((□ψe ∧ θe) → ϕG)

Furthermore, the reduction is such that solutions can be directly extracted from winning strategies.

Encoding STRIPS Models When we represent planning problems with propositional STRIPS-like models, the formulas ψe and ψs can be written more compactly, as follows:

ψ_a^eff ≔ a → ⋁_{e ∈ Effa} ( ⋀_{f ∈ Adde} ○f ∧ ⋀_{f ∈ Dele} ○¬f ∧ ⋀_{f ∈ F∖(Adde ∪ Dele)} (f ↔ ○f) )
ψe ≔ ⋀_{a ∈ A} ψ_a^eff
θs ≔ ϕoneA ∧ ⋀_{a ∈ A} ( ¬a ∨ ⋀_{p ∈ Prea} p )
ψs ≔ ○θs

Observe that ψ_a^eff captures both the positive and negative effects of non-deterministic action a, as well as frame assumptions – that the truth value of every other fluent, not affected by the action a, stays the same. In particular, the formulae ⋀_{f ∈ F∖(Adde ∪ Dele)} (f ↔ ○f) capture the frame axioms for effect e ∈ Effa. We can write these as successor state axioms to realize a more parsimonious solution to the frame problem (Reiter 2001).
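The compact STRIPS encoding can be mechanized. The following sketch generates ψ_a^eff as a formula string for a non-deterministic action, including the frame axioms; the textual syntax (`X` for next, `!` for negation, `<->` for iff) and the helper name are illustrative choices, not from the paper:

```python
def effect_formula(action, effects, fluents):
    """Build psi_a^eff (as a string) for a non-deterministic STRIPS action:
    a -> OR over effects of (adds, deletes, frame axioms).
    `effects` is a list of (add_set, del_set) pairs."""
    disjuncts = []
    for add, dele in effects:
        parts = [f"X {f}" for f in sorted(add)]
        parts += [f"X !{f}" for f in sorted(dele)]
        # Frame axioms: fluents untouched by this effect keep their value.
        parts += [f"({f} <-> X {f})" for f in sorted(fluents - add - dele)]
        disjuncts.append("(" + " & ".join(parts) + ")")
    return f"{action} -> (" + " | ".join(disjuncts) + ")"

f = effect_formula("flip", [({"on"}, set()), (set(), {"on"})], {"on", "lit"})
# 'lit' is untouched by either effect, so each disjunct carries its frame axiom.
```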

Complexity Analysis Our reduction of LTL FOND planning into structured games conveniently decouples the domain from the goal formula. This allows us to characterize the complexity of LTL FOND planning in terms of these two components and obtain results analogous to those for game structures (cf. Theorem 4). The same complexity results have been independently obtained very recently (available on arXiv, (Aminof et al. 2018)) with a different approach.

Theorem 7. Plan existence in LTL FOND planning is EXP-complete in domain complexity, and 2EXP-complete for goal and combined complexity.

Proof sketch. Membership follows from the reduction to games in Theorem 6 and the results in Theorem 4. Hardness is obtained by reducing LTLf FOND into LTL FOND (see Lemma 1), and then applying the hardness results for LTLf FOND from (De Giacomo and Rubin 2018).


Remark 1. The results in Theorems 6 and 7 also apply to LTL FOND planning with partially specified initial states.

5.3 LTLf FOND Planning as Structured Games

It is also possible to reduce LTLf FOND planning to structured games. We show this by reducing LTLf FOND to LTL FOND, from which Theorem 6 can be applied. The reduction of an LTLf FOND problem P = ⟨D, ϕI, ϕfin⟩ with domain D = ⟨F, A, δ, Poss⟩ is an LTL FOND problem ⟨D′, ϕ′I, ϕinf⟩ whose domain, D′ ≔ ⟨F′, A′, δ′, Poss′⟩, has symbols F′ ≔ F ∪ {alive} and A′ ≔ A ∪ {aend}. The dynamics of D′ simulate plan executions in D. Symbol aend indicates that the simulated plan execution has terminated, and symbol alive indicates the opposite. Formally:

Poss′ ≔ {(s ∪ {alive}, a) | (s, a) ∈ Poss} ∪ {(s, aend) | s ∈ 2^F′}

δ′(s, a) ≔ {s′ ∪ {alive} | s′ ∈ δ(s ∖ {alive}, a)}   if a ≠ aend and s ⊨ alive
δ′(s, a) ≔ {∅}   otherwise

The new initial state is given by ϕ′I ≔ ϕI ∧ alive. Finally, the LTL goal formula ϕinf ≔ tr(ϕfin) ∧ ♦¬alive is obtained by applying the following polynomial transformation to ϕfin:

tr(p) = p
tr(¬ϕ) = ¬tr(ϕ)
tr(ϕ1 ∧ ϕ2) = tr(ϕ1) ∧ tr(ϕ2)
tr(○ϕ) = ○(alive ∧ tr(ϕ))
tr(ϕ1 U ϕ2) = tr(ϕ1) U (alive ∧ tr(ϕ2))

Intuitively, an infinite trace π satisfies ϕinf iff alive is false at some point, and the finite prefix π′ in which alive is true simulates a finite trace that satisfies ϕfin. The preceding transformation is based on the transformations found in (De Giacomo and Vardi 2013; Camacho, Bienvenu, and McIlraith 2018), but fixes some minor mistakes and makes some simplifications by leveraging the dynamics induced by aend. The transformation preserves plan existence (Lemma 1). Moreover, solutions σfin to the LTLf FOND planning problem can be easily defined from solutions σinf to the LTL FOND planning reduction as σfin(s0 · · · sn) ≔ σinf(s′0 · · · s′n), where primed states are augmented with predicate alive, and execution terminates (end) whenever σinf outputs aend.

Lemma 1. Plan existence for an LTLf FOND planning problem ⟨D, ϕI, ϕfin⟩ can be polynomially reduced to plan existence for an LTL FOND planning problem ⟨D′, ϕ′I, ϕinf⟩.
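The transformation tr can be implemented as a direct recursion over the formula syntax tree. A minimal sketch, assuming formulas are represented as nested tuples (a representation chosen here for illustration):

```python
def tr(phi):
    """LTLf-to-LTL translation from Section 5.3, over formulas given as
    nested tuples: ('atom', p), ('not', f), ('and', f, g), ('next', f),
    ('until', f, g). 'alive' marks positions of the simulated finite trace."""
    op = phi[0]
    if op == 'atom':
        return phi
    if op == 'not':
        return ('not', tr(phi[1]))
    if op == 'and':
        return ('and', tr(phi[1]), tr(phi[2]))
    if op == 'next':
        # next phi  becomes  next (alive & tr(phi))
        return ('next', ('and', ('atom', 'alive'), tr(phi[1])))
    if op == 'until':
        # phi1 U phi2  becomes  tr(phi1) U (alive & tr(phi2))
        return ('until', tr(phi[1]), ('and', ('atom', 'alive'), tr(phi[2])))
    raise ValueError(f"unknown operator: {op}")

assert tr(('next', ('atom', 'p'))) == \
    ('next', ('and', ('atom', 'alive'), ('atom', 'p')))
```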

Combining Lemma 1 and Theorem 6 yields:

Theorem 8. Plan existence of an LTLf FOND planning problem P can be reduced to winner determination of a game structure and, by extension, to LTL realizability.

Encoding STRIPS Models With a propositional STRIPS-like representation of the planning problems, the translation of a domain D with symbols F, A produces a new domain, D′, with symbols F′ ≔ F ∪ {alive} and A′ ≔ A ∪ {aend}. Action aend has preconditions Preaend ≔ {alive} and a deterministic effect, Effaend = {⟨∅, F′⟩}, that deletes all fluents. Actions in A are updated to incorporate symbol alive in their preconditions.

Remark 2. The results in Lemma 1 and Theorem 8 apply to LTLf FOND planning with partially specified initial states.

Complexity Analysis Complexity results for LTLf FOND planning (with unique initial state) were recently published by De Giacomo and Rubin (2018). Membership can be easily deduced from our reductions. We extend their result to LTLf FOND planning with partially specified initial states. Interestingly, the complexity of LTLf FOND replicates that of LTL FOND. The same phenomenon occurs in LTL and LTLf synthesis, both being 2EXP-complete problems (Pnueli and Rosner 1989; De Giacomo and Vardi 2015).

Theorem 9. Plan existence for LTLf FOND with partially specified initial states is EXP-complete in domain complexity, and 2EXP-complete in goal and combined complexity.

Proof sketch. Hardness results follow from (De Giacomo and Rubin 2018). Membership follows from our reductions (cf. Lemma 1, Theorem 7, and Remarks 1 and 2).

6 More Efficient LTL/LTLf Plan Synthesis

In the previous sections, we formalized the connection between LTL synthesis, game structures, and various forms of planning. By doing so, we derived complexity results for planning that mimic those in LTL synthesis and games. While the domain complexity of LTL and LTLf FOND planning is the same as for FOND with reachability goals, the overall complexity jumps to double-exponential time.

The aim of this section is to identify restricted fragments of LTL and LTLf for which plan synthesis has the same worst-case complexity as FOND with reachability goals but which are richer in terms of expressiveness. We first explore GR(1), a syntactically restricted fragment of LTL for which synthesis can be done in single-exponential time. Then we move to the finite case, and revisit the syntax of PDDL3.0 in the context of LTLf FOND planning.

6.1 GR(1) Synthesis and Game Structures

Generalized Reactivity (1) (GR(1) for short) is a fragment of LTL that has been studied in the context of synthesis and games for its computational advantages. Inconsistent use of terminology in the literature has created some confusion regarding what the GR(1) fragment is, how GR(1) synthesis differs from GR(1) games, and the complexity of these tasks. We provide a brief historical overview, with the aim of clarifying each of these concepts as we understand them.

The GR(1) fragment of LTL was originally introduced by Piterman, Pnueli, and Sa'ar (2006), and comprises the fragment of LTL formulae of the syntactic form

⋀_{i ∈ I} □♦αi → ⋀_{j ∈ J} □♦βj

where αi and βj are propositional formulae. GR(1) synthesis can be done with exponentially lower worst-case complexity than general LTL synthesis (Theorem 10).

Theorem 10 (Piterman et al. 2006). Synthesis of specifications ⟨X, Y, ϕ⟩ with GR(1) formula ϕ can be done in time exponential in the number of variables in ϕ.

Piterman et al. studied the problem of computing winning strategies for so-called GR(1) games, that is, game structures with GR(1) winning conditions. They provided a first reduction from GR(1) games into standard assume-guarantee LTL synthesis. That reduction contained a fundamental flaw that produced falsely unrealizable specifications. The flaw was detected and corrected by Klein and Pnueli (2010), and subsequently included in (Bloem et al. 2012) in the form given in Theorem 2, namely:

(θe → θs) ∧ (θe → □(ψs W ¬ψe)) ∧ ((□ψe ∧ θe) → ϕ)

Similar to GR(1) synthesis, GR(1) games can be solved with exponentially lower hardness bounds than for general LTL winning conditions. Perhaps better known than Theorem 11 is the tractability result for GR(1) games in Corollary 1. It is worth noting that the polynomial time bounds are only valid when the size of the game search space is fixed.

Theorem 11 (Bloem et al. 2012). GR(1) games with winning condition ϕ can be solved in time polynomial in the size of the search space, and exponential in the number of variables in ϕ.

Corollary 1. For a fixed search space, GR(1) games can be solved in polynomial time.

Despite the fact that GR(1) is a syntactically restricted fragment of LTL, many non-GR(1) properties can be captured by GR(1) games by adding extra control variables. Maoz and Ringert (2015) gave a method to reduce LTL realizability to winner determination for a GR(1) game. The approach works for any LTL formula that can be transformed into a deterministic Büchi word automaton (DBW). This is the case for 54 out of the 56 industrial patterns listed in (Dwyer, Avrunin, and Corbett 1999), highlighting the strong appeal of the GR(1) fragment for practical applications. The construction first takes an LTL description of the pattern and transforms it into a DBW. In a second step, an LTL formula is constructed with extra controllable variables – one per automaton state – that capture the Büchi acceptance condition. The syntactic form of the formula makes it possible to apply existing methods that reduce LTL realizability to winner determination of a GR(1) game.

6.2 Planning with GR(1) Goals

The advantageous computational properties of the GR(1) fragment for game structures also manifest in LTL FOND planning (Theorem 13). This is because there exists a bidirectional mapping between game structures and LTL FOND planning when the goal and winning condition is a GR(1) formula: GR(1) games can be reduced to LTL FOND planning with GR(1) goals and, conversely, LTL FOND planning with GR(1) goals can be reduced to GR(1) games.

Theorem 12. GR(1) games can be polynomially reduced to LTL FOND planning with GR(1) goals, and vice versa.

Proof sketch. For the first direction, we employ the reduction from Theorem 5. Observe that ϕG ≔ ♦xend ∨ ○○ϕ is equivalent to ϕ′G ≔ ♦xend ∨ ϕ when ϕ = ⋀_{i ∈ I} □♦αi → ⋀_{j ∈ J} □♦βj is a GR(1) formula. Further observe that, for the constructed planning problem, an execution satisfies ♦xend iff it satisfies ¬□♦¬xend. This is because xend cannot be made false once it has been made true. By means of syntactic manipulation, we can further show that an execution satisfies ϕG iff it satisfies the GR(1) formula (□♦¬xend ∧ ⋀_{i ∈ I} □♦αi) → ⋀_{j ∈ J} □♦βj. For the second statement, we can reuse the reduction underlying Theorem 6, as it does not modify goal formulae.

Theorem 13. Plan existence for LTL FOND planning problems with GR(1) goals is EXP-complete in domain and combined complexity, and in PTIME w.r.t. goal complexity (i.e., when the domain is fixed).

Proof sketch. Membership results are obtained by combining Theorems 11 and 12 and Corollary 1. The EXP lower bound follows from the EXP-hardness of FOND and the fact that reachability goals can be encoded using GR(1) formulae.

The results in Theorem 13 can be extended to the finite case: plan synthesis for FOND problems with LTLf goals in GR(1) syntactic form can be done in exponential time. Unfortunately, this result is not very interesting because LTLf FOND planning with GR(1) goals is equivalent to FOND planning with final-state goals.

Property 2. The GR(1) syntactic fragment of LTLf is equally expressive as the syntactic fragment of formulas of the form ♦(θ ∧ final) for some propositional formula θ.

Proof. Satisfaction of the GR(1) LTLf formula ⋀_{i ∈ I} □♦αi → ⋀_{j ∈ J} □♦βj is equivalent to satisfaction of the LTLf formula ♦(θ ∧ final) with θ = ⋀_{i ∈ I} αi → ⋀_{j ∈ J} βj. Conversely, satisfaction of the LTLf formula ♦(θ ∧ final) is equivalent to satisfaction of the GR(1) LTLf formula □♦⊤ → □♦θ.
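Property 2 can be sanity-checked by brute force: over finite traces, □♦α holds iff α holds in the final state, so the GR(1) implication collapses to a final-state test. A small script (helper names chosen here for illustration) verifying the equivalence exhaustively on short traces:

```python
from itertools import product

def always_eventually(trace, var):
    """Finite-trace semantics of 'always eventually var': from every
    position, var holds at some later-or-equal position."""
    return all(any(s[var] for s in trace[i:]) for i in range(len(trace)))

def gr1_ltlf(trace, alphas, betas):
    """GR(1) LTLf formula: conjunction of always-eventually assumptions
    implies conjunction of always-eventually guarantees."""
    return (not all(always_eventually(trace, a) for a in alphas)
            or all(always_eventually(trace, b) for b in betas))

def at_end(trace, alphas, betas):
    """Final-state test with theta = (AND alphas) -> (AND betas)."""
    final = trace[-1]
    return (not all(final[a] for a in alphas)) or all(final[b] for b in betas)

# Exhaustively check that the two semantics coincide on all traces of
# length <= 3 over variables a, b.
states = [dict(zip("ab", bits)) for bits in product([False, True], repeat=2)]
for n in (1, 2, 3):
    for trace in map(list, product(states, repeat=n)):
        assert gr1_ltlf(trace, ["a"], ["b"]) == at_end(trace, ["a"], ["b"])
```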

6.3 Request-Response Planning

We propose here the class of request-response formulas: a syntactic class that appears to be a reasonable counterpart to GR(1) in the finite case, as it captures similar patterns. Request-response formulas take the form □(α → ♦β), for propositional formulas α and β. Request-response LTL formulas can be transformed into GR(1) LTL formulas that capture the same patterns, possibly with the introduction of two extra variables using Maoz and Ringert (2015)'s technique. Therefore, LTL games, synthesis, and planning for request-response LTL formulas can be polynomially reduced to their GR(1) counterparts. In contrast to GR(1), request-response LTLf formulas are more expressive than final-state goals when interpreted over finite traces.

We omit the proof of Theorem 14, as it can be obtained from the more general Theorem 15.

Theorem 14. Plan existence for LTLf FOND with request-response goal formulas is EXP-complete for domain and combined complexity, and in PTIME for goal complexity.
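The extra expressiveness over finite traces is easy to witness: a request-response goal genuinely depends on the whole trace, not just the final state. A minimal illustration (the evaluator is ours, not from the paper):

```python
def request_response(trace, req, resp):
    """Finite-trace semantics of 'always (req -> eventually resp)':
    every request must be answered at the same or a later position."""
    return all((not s[req]) or any(t[resp] for t in trace[i:])
               for i, s in enumerate(trace))

# Two traces with identical final states but different truth values:
# the unanswered request earlier in t2 cannot be detected at the end,
# so no final-state goal can express this property.
t1 = [{"p": True, "q": True}, {"p": False, "q": False}]
t2 = [{"p": True, "q": False}, {"p": False, "q": False}]
assert request_response(t1, "p", "q")
assert not request_response(t2, "p", "q")
```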

6.4 Planning with PDDL3.0 Goals

The Planning Domain Definition Language (PDDL) is a globally accepted language used to describe planning problems. Different variants of PDDL exist that allow for, e.g., durative actions and conditional effects. PDDL3.0 (Gerevini et al. 2009), which we abbreviate to PDDL3, extends the language of PDDL with the modal temporal operators listed in Table 1, to be interpreted over finite traces. We include equivalent LTLf formulae, as shown in (Gerevini et al. 2009; De Giacomo, De Masellis, and Montali 2014). We provide corrected versions for the operators hold-during and hold-after, and simplified formulae for sometime-before and at-most-once.


modal operator | equivalent LTLf | size of DFA
(at-end θ) | ♦(θ ∧ final) | 2
(always θ) | □θ | 3
(sometime θ) | ♦θ | 3
(sometime-after θ1 θ2) | □(θ1 → ♦θ2) | 3
(sometime-before θ1 θ2) | θ2 R ¬θ1 | 4
(at-most-once θ) | (¬θ) W (θ W □(¬θ)) | 5
(within n θ) | ⋁_{0≤i≤n} ○^i θ | n + 3
(always-within n θ1 θ2) | □(θ1 → ⋁_{0≤i≤n} ○^i θ2) | n + 3
* (hold-during n1 n2 θ) | (⋁_{0≤i≤n1} ○^i final) ∨ ⋀_{n1<i≤n2} ○^i θ | n1 + n2 + 2
(hold-after n θ) | ○^{n+1} ♦θ | 2n + 5
* (hold-after n θ) | (⋁_{0≤i≤n+1} ○^i final) ∨ ○^{n+1} ♦θ | 2n + 5

Table 1: PDDL3 modal temporal operators, equivalent LTLf formulae, and size of the corresponding minimal DFA. Nested ○ operators are abbreviated with a superindex. The weak-next operator ○w is defined as ○w ϕ ≔ final ∨ ○ϕ. We provide corrected LTLf formulae for the operators tagged with *.

The syntax of PDDL3 does not allow nesting of modal operators, and is limited to conjunctions of modal operators and literals. Despite its restricted syntax, the PDDL3 temporal operators subsume GR(1) and request-response formulas. In this paper, all PDDL3 formulae apply PDDL3 modal operators to propositional formulae, but note that PDDL3 also allows for a lifted syntax.
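Several of the Table 1 translations can be mechanized as simple string rewritings. The sketch below covers a few operators; the textual LTLf syntax (`F` for eventually, `G` for always, `X` for next, `final` marking the last state) and the function name are illustrative choices:

```python
def pddl3_to_ltlf(op, *args):
    """String renderings of a few PDDL3 operators from Table 1."""
    if op == "at-end":
        (theta,) = args
        return f"F ({theta} & final)"
    if op == "always":
        (theta,) = args
        return f"G {theta}"
    if op == "sometime":
        (theta,) = args
        return f"F {theta}"
    if op == "sometime-after":
        t1, t2 = args
        return f"G ({t1} -> F {t2})"
    if op == "within":
        # Disjunction of 0..n nested next operators.
        n, theta = args
        return " | ".join("X " * i + f"{theta}" for i in range(n + 1))
    raise ValueError(op)

assert pddl3_to_ltlf("within", 2, "p") == "p | X p | X X p"
```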

Relation with Request-Response formulas Request-response formulas □(p → ♦q) correspond to the PDDL3 operator (sometime-after p q).

Relation with GR(1) formulas When interpreted over finite traces, GR(1) formulas ⋀_i □♦αi → ⋀_j □♦βj are equivalent to the PDDL3 operator (at-end θ), with θ = ⋀_i αi → ⋀_j βj.

Relation with FOND planning FOND planning with final-state goals is a special case of FOND planning with PDDL3 goals of the form (at-end θ). Solutions in the form of a policy can be constructed from strategies (cf. Proposition 1).

Reduction to Structured Games FOND planning with PDDL3 goals can be reduced to LTLf FOND by rewriting the PDDL3 operators in LTLf, as indicated in Table 1.

Complexity Analysis The following theorem clarifies the complexity of FOND planning with PDDL3 goals, which to the best of our knowledge has not yet been considered in the literature. The formulation is a bit more involved than earlier results due to the use of numeric values in some of the PDDL3 temporal operators.

Theorem 15. Assuming a unary encoding of numbers in PDDL3 modal operators, plan existence for FOND planning with PDDL3 goals is EXP-complete for domain and goal complexity, and the goal complexity is in PTIME provided that there is a bound on the number of conjuncts in the goal that use numeric temporal operators.

Proof sketch. FOND planning with PDDL3 goals can be reduced to LTLf FOND (cf. Table 1). LTLf FOND planning can be reduced to solving a DFA game³ following a similar technique as in Section 5.2. The game DFA is the product of (i) a DFA encoding the transition system, which is of exponential size in the domain, and (ii) multiple DFAs, one per PDDL3 operator in the goal formula, each of polynomial size (see the third column of Table 1) assuming a unary representation of numeric values. The product DFA is thus of exponential size, and if the domain is fixed, the restriction on the number of numeric conjuncts suffices to obtain a polynomial-size DFA. We conclude by recalling that DFA games can be solved in polynomial time in the size of the DFA (see, e.g., (Camacho, Bienvenu, and McIlraith 2018)).
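The final step of the proof — solving a DFA game in polynomial time — boils down to a backward reachability fixpoint: a state is winning if some system move guarantees, for every environment reply, a transition into the region already known to be winning. A minimal sketch (the turn-based encoding and names are illustrative, not the paper's construction):

```python
def solve_dfa_game(states, accepting, sys_moves, env_moves, delta):
    """Winning region of the system in a DFA (reachability) game,
    computed as a least fixpoint over controllable predecessors.
    delta maps (state, system move, environment move) to a next state."""
    win = set(accepting)
    changed = True
    while changed:
        changed = False
        for q in states:
            if q in win:
                continue
            if any(all(delta[(q, y, e)] in win for e in env_moves)
                   for y in sys_moves):
                win.add(q)
                changed = True
    return win

# q0 --(y1, any e)--> q1 (accepting); playing y2 lets the environment
# escape to the losing sink q2.
delta = {("q0", "y1", "e0"): "q1", ("q0", "y1", "e1"): "q1",
         ("q0", "y2", "e0"): "q1", ("q0", "y2", "e1"): "q2",
         ("q1", "y1", "e0"): "q1", ("q1", "y1", "e1"): "q1",
         ("q1", "y2", "e0"): "q1", ("q1", "y2", "e1"): "q1",
         ("q2", "y1", "e0"): "q2", ("q2", "y1", "e1"): "q2",
         ("q2", "y2", "e0"): "q2", ("q2", "y2", "e1"): "q2"}
win = solve_dfa_game({"q0", "q1", "q2"}, {"q1"},
                     {"y1", "y2"}, {"e0", "e1"}, delta)
assert win == {"q0", "q1"}
```

Each iteration adds at least one state or terminates, so the loop runs at most |states| times, which gives the polynomial bound recalled above.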

7 Summary and Discussion

Synthesizing strategies for software agents is a central problem in artificial intelligence. In this work, we made a step towards reconciling two different perspectives that address this problem: AI planning and LTL synthesis.

Our work contributes principled, bidirectional mappings between so-called LTL FOND planning, two-player games, and reactive LTL synthesis. By means of these mappings, we were able to revisit well-known complexity results in LTL and LTLf FOND planning, and derive new ones.

A further contribution was the identification of restricted forms of LTL and LTLf FOND planning for which plan synthesis can be realized more efficiently than in the general case. For LTL FOND planning, we derived results for GR(1) goals that are analogous to those for so-called GR(1) games. For the finite-trace setting, we provided novel complexity results for LTLf FOND planning with PDDL3.0 goals and observed that, over finite traces, its expressivity subsumes other fragments including GR(1). These results show that LTL FOND and LTLf FOND plan synthesis can be performed with the same computational complexity as traditional FOND planning with final-state goals, but utilizing a richer class of temporally extended goal formulae.

Acknowledgements

We gratefully acknowledge funding from the Natural Sciences and Engineering Research Council of Canada (NSERC) and Microsoft Research.

References

Albarghouthi, A.; Baier, J. A.; and McIlraith, S. A. 2009. On the use of planning technology for verification. In VVPS@ICAPS.

Aminof, B.; De Giacomo, G.; Murano, A.; and Rubin, S. 2018. Planning and synthesis under assumptions. CoRR abs/1807.06777.

Bacchus, F., and Kabanza, F. 2000. Using temporal logics to express search control knowledge for planning. Artificial Intelligence 116(1-2):123–191.

Baier, J., and McIlraith, S. 2006a. Planning with temporally extended goals using heuristic search. In ICAPS, 342–345.

³ In the context of this paper, DFA games can be seen as structured games where the winning condition is given by the acceptance of a DFA.
