
Automated Synthesis of Low-Rank Stochastic Dynamical Systems Using the Tensor-Train Decomposition

by

John Irvin P. Alora

B.S. in Electrical Engineering, United States Air Force Academy (2014)

Submitted to the Department of Aeronautics and Astronautics

in partial fulfillment of the requirements for the degree of

Master of Science in Aeronautics and Astronautics

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2016

© 2016 John Irvin P. Alora. All rights reserved.

The author hereby grants to MIT and the Charles Stark Draper Laboratory, Inc. permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Author . . . .

Department of Aeronautics and Astronautics

May 19, 2016

Certified by . . . .

Sertac Karaman

Associate Professor of Aeronautics and Astronautics

Thesis Supervisor

Certified by . . . .

Nathan Lowry

Senior Member of Technical Staff

Draper Fellow Supervisor

Accepted by . . . .

Paulo C. Lozano

Associate Professor of Aeronautics and Astronautics

Chair, Graduate Program Committee


Automated Synthesis of Low-Rank Stochastic Dynamical

Systems Using the Tensor-Train Decomposition

by

John Irvin P. Alora

Submitted to the Department of Aeronautics and Astronautics on May 19, 2016, in partial fulfillment of the

requirements for the degree of

Master of Science in Aeronautics and Astronautics

Abstract

Cyber-physical systems are increasingly becoming integrated in various fields such as medicine, finance, robotics, and energy. In these systems and their applications, safety and correctness of operation is of primary concern, sparking a large amount of interest in the development of ways to verify system behavior. The tight coupling of physical constraints and computation that typically characterizes cyber-physical systems makes them extremely complex, resulting in unexpected failure modes. Furthermore, disturbances in the environment and uncertainties in the physical model require these systems to be robust. These are difficult constraints, requiring cyber-physical systems to be able to reason about their behavior and respond to events in real time. Thus, the goal of automated synthesis is to construct a controller that provably implements a range of behaviors given by a specification of how the system should operate.

Unfortunately, many approaches to automated synthesis are ad hoc and are limited to simple systems that admit specific structure (e.g., linear, affine systems). Not only that, but they are also designed without taking uncertainty into account. In order to tackle more general problems, several computational frameworks that allow for more general dynamics and uncertainty have been investigated.

Furthermore, all of the existing computational algorithms suffer from the curse of dimensionality: the run time scales exponentially with increasing dimensionality of the state space. As a result, existing algorithms apply to systems with only a few degrees of freedom. In this thesis, we consider a stochastic optimal control problem with a special class of linear temporal logic specifications and propose a novel algorithm based on the tensor-train decomposition. We prove that the run time of the proposed algorithm scales linearly with the dimensionality of the state space and polynomially with the rank of the optimal cost-to-go function.

Thesis Supervisor: Sertac Karaman


Acknowledgments

First and foremost, I want to thank my family and godparents for their unconditional love and support. As Peter is to the Catholic church, my family is to me, the rock on which I build my life and future. I cannot thank them enough for the personal sacrifices they have made to give me an opportunity to pursue my dreams. Without them, the fruits of my work and labor would be for naught. I thank God for allowing me to have them in my life.

Many thanks to my academic advisors for their patience and guidance throughout the entirety of this work. Especially, to my advisor, Professor Sertac Karaman for his insight and creativity; your guidance has profoundly shaped my academic interests and has helped me to develop the mathematical rigor required to solve the really hard problems. A special thanks to my Draper advisor, Dr. Nathan Lowry for his advisorship and for taking interest in my personal development as an engineer and Air Force officer. Your extra instruction and homework have served as useful breaks from the monotony of classes and thesis work. Lastly, I would like to thank Dr. Alex Gorodetsky for his contributions. Without his guidance and brilliance, this work would not have been possible.

A heartfelt thanks to all of my friends and colleagues who have made my experience in Boston and at MIT an unforgettable one. To all of my friends in the FAST group, especially David Miculescu and Fabian Reither, thank you all for helping make the lab a creative and exciting place. I sincerely value all of your friendship. To my roommates Juddy and Zach, thank you guys for making the trap house a special sanctuary from my work, reserved for debauchery and good times. To all my friends in Boston, especially Joe Cordes, Justin Bentubo, and Matt McDermott, here's to the nights we can't remember but will never forget. Thank you guys for making my time in Boston a really good time. Lastly, to a special redhead: your constant encouragement was significant in getting this thesis turned in on time. It has been an amazing two months, and I have cherished every moment of it.


Although our intellect always longs for clarity and certainty, our nature often finds uncertainty fascinating.


Contents

1 Introduction 13

1.1 Motivation . . . 13

1.2 Literature Review and Analysis . . . 15

1.3 Thesis Contributions and Outline . . . 19

2 Background 21

2.1 Model Checking . . . 21

2.2 Stochastic Optimal Control . . . 25

2.3 Tensor Decomposition . . . 33

2.4 Summary . . . 36

3 Problem Definition 37

3.1 Stochastic Optimal Control with sc-LTL Specifications . . . 37

4 Proposed Algorithm 41

4.1 Consistent discretization of stochastic optimal control problems . . . 41

4.2 Stochastic Shortest Path Problem . . . 45

4.3 Proposed Algorithm . . . 47

5 Analysis 51

6 Experiments 59

6.1 Three Dimensional Dubin's Car . . . 59

6.2 Six Dimensional Dubin’s Plane . . . 63


6.4 Summary of Results . . . 73

7 Conclusions 75

A Experiments 77


List of Figures

1-1 Air traffic over Airspace of the United States (source: NASA) . . . 14

1-2 Autonomous Indoor Robotic Aircraft (source: MIT) . . . 15

2-1 Three-Dimensional Tensor, 𝒳 ∈ R𝐼,𝐽,𝐾 . . . 34

2-2 Fibers of a three-dimensional tensor 𝒳: (a) first-dimension (column) fibers x:𝑗𝑘, (b) second-dimension (row) fibers x𝑖:𝑘, and (c) third-dimension (tube) fibers x𝑖𝑗: [32] . . . 35

3-1 Product space as a set of continuous layers, corresponding to the discrete modes in 𝒜 . . . 39

6-1 DFA for eventually visiting three regions, characterized by the formula 𝜙 = ♦𝐴1 ∧ ♦𝐴2 ∧ ♦𝐴3 . . . 60

6-2 Comparison of a simulated sample trajectory 𝑥[𝜔] using the interpolated control policy found from tensor-based value iteration and exact value iteration for a three-dimensional Dubin's Car. The parameters used for the tensor-based approach were ℎ = 0.125 and 𝜖 = 10⁻² . . . 61

6-3 Norm of the Dubin's Car cost function at ℎ = 0.125 for varying levels of 𝜖 as a function of iteration number. The norm from exact VI is provided by the thick black line as reference . . . 61

6-4 Rank of the cost function after 100 iterations computed by tensor-based VI for different numbers of automaton states and various tensor errors 𝜖 at ℎ = 0.125 . . . 62


6-5 Cost functions obtained from exact VI at various initial states and modes of 𝒜𝜙. The exact cost functions are taken at (a) 𝑥0 = (𝑥, 𝑦, 0) and 𝑞 = {}, (b) 𝑥0 = (𝑥, 𝑦, 0.77) and 𝑞 = 𝑎1, and (c) 𝑥0 = (𝑥, 𝑦, 1.77) and 𝑞 = {𝑎1, 𝑎3} . . . 62

6-6 Cost functions obtained from tensor-based VI at various initial states and modes of 𝒜𝜙. The approximate cost functions are taken at (a) 𝑥0 = (𝑥, 𝑦, 0) and 𝑞 = {}, (b) 𝑥0 = (𝑥, 𝑦, 0.77) and 𝑞 = 𝑎1, and (c) 𝑥0 = (𝑥, 𝑦, 1.77) and 𝑞 = {𝑎1, 𝑎3} . . . 63

6-7 Simulated sample trajectory 𝑥[𝜔], starting at x¹₀ = (1, 1, 1, 0, 0, 0), x²₀ = (0, 3, 3, 0, 0, 1), x³₀ = (9, 7, 6, 𝜋, 0, 1), under the feedback policy found from tensor-based value iteration with 𝑛 = 32, 𝑚 = 4, and 𝜖 = 10⁻² . . . 64

6-8 Fraction of states evaluated for the Dubin's Plane problem by tensor-based VI for various discretizations ℎ at 𝜖 = 10⁻² . . . 66

6-9 Norm of the Agile Aircraft cost function at ℎ = 0.0625 for 𝜖 = 10⁻² as a function of iteration number . . . 66

6-10 Simulated sample trajectory 𝑥[𝜔], starting at x¹₀ = (1, 1, 4.7, 0, 0, 2), x²₀ = (9, 9, 4.5, 𝜋, 0, 1), under the feedback policy found from tensor-based value iteration with 𝑛 = 32, 𝑚 = 4, and 𝜖 = 10⁻² . . . 67

6-11 Fraction of states evaluated for the Agile Plane problem by tensor-based VI (Mission 1) for various discretizations ℎ at 𝜖 = 10⁻² . . . 68

6-12 DFA for visiting 𝐴3 before subsequently visiting either area 𝐴1 or 𝐴2, characterized by the formula 𝜙2 = 𝐴2 ∧ ○(♦𝐴1 ∨ ♦𝐴2) . . . 69

6-13 Simulated trajectory of 𝑥[𝜔] for initial states (a) x¹₀ = (1, 3.4, 5.5, 𝜋/12, 0, 2), x²₀ = (1.2, 4, 5.5, 3𝜋/2, 0, 2) and (b) x¹₀ = (1, 7, 5.5, 3𝜋/2, 0, 2), x²₀ = (1.7, 8, 5.5, 3𝜋/2, 0, 2) . . . 71

6-14 Cost functions obtained from tensor-based VI at various initial states and modes of 𝒜𝜙2. The exact cost functions are taken at (a) 𝑥0 = (𝑥, 𝑦, 5, 5, 𝜋/12.0, 0.0, 2.0) and 𝑞 = {}, (b) 𝑥0 = (𝑥, 𝑦, 4.551, 0.132, 0.189, 2.138)


6-15 Cost functions obtained from tensor-based VI at various initial states and modes of 𝒜𝜙. The exact cost functions are taken at (a) 𝑥0 = (𝑥, 𝑦, 5, 5, 3𝜋/2, 0.0, 2.0) and 𝑞 = {}, (b) 𝑥0 = (𝑥, 𝑦, 4.92, 5.09, −0.368, 2.34) and 𝑞 = 𝑎3 . . . 72

A-1 Norm of the Dubin's Plane cost function for various discretization levels with 𝜖 = 10⁻² . . . 77


List of Tables

6.1 Convergence Times of Proposed Algorithm and Naïve VI for all Experiments (in seconds) with 𝜖 = 10⁻² for Dubin's Car, Dubin's Plane, and Agile Aircraft (Mission 1) cases. Experiment with Agile Aircraft (Mission 2) utilized 𝜖 = 10⁻³ . . . 73


Chapter 1

Introduction

1.1 Motivation

Cyber-physical systems are of primary interest due to their potentially significant impact in various applications, including highway traffic management, air traffic management, networked power grids, and autonomous vehicles [39]. In general, cyber-physical systems refer to a new generation of engineered systems that require tight integration of computation, communication, and control technologies to achieve stability, reliability, performance, efficiency, and robustness in many application domains. Specifically, cyber-physical systems in highway traffic management play a role in managing speed and spacing between vehicles, subject to changes in the roads and in the weather. The ultimate goal of these systems is to ensure safety by maintaining near-zero collisions. Similarly, in air traffic management, cyber-physical systems play an important role in managing and allocating the flight trajectories of thousands of aircraft. In order to do this safely, the system must be robust to changes in weather, aircraft emergencies, and deviation of aircraft from their specified flight path. In networked power grids, cyber-physical systems are expected to handle power outages by allocating the load of distribution to other power plants in the grid in order to minimize cost and damage. Lastly, cyber-physical systems within autonomous vehicles and robotics allow for a tighter coupling of motion and task planning, a significant topic of research.


Figure 1-1: Air traffic over Airspace of the United States (source: NASA)

As evident in the application examples above, cyber-physical systems must be able to reason about the correctness of their behavior in order to be robust to changes in their environment. Furthermore, the uncertainty in the physical models requires that the dynamical equations modeling the physical process take into account random noise. This and the increasingly tight integration of computation, communication, and control technologies often create non-trivial failure modes. Thus, the operation of cyber-physical systems in practical applications necessitates the development of formal methods to specify, design, and verify system behavior.

Many sources of uncertainty affect the behavior of cyber-physical systems. Due to the consequences of unexpected behavior in these systems, it is important to concisely and unambiguously specify the desired system behavior. Once the behavior is specified, the goal is to automatically synthesize a controller that provably implements this behavior. This problem is known as automated synthesis. Automated synthesis has attracted a tremendous amount of attention in recent years, leading to the development of algorithms that allow for systematic and provably-correct control design. In particular, this line of work has had a tremendous impact in motion and task planning problems, where the automatic construction of robot control policies subject to high-level task specifications is of primary interest. However, most of the existing algorithms suffer from the curse of dimensionality, i.e., run time scales exponentially with the dimensionality of the state space. As a result, these algorithms apply to systems with only a few degrees of freedom.


Figure 1-2: Autonomous Indoor Robotic Aircraft (source: MIT)

To address this challenge, this thesis develops a computational framework for automated controller synthesis that applies to general, high-dimensional, stochastic, nonlinear dynamic systems. This framework draws on ideas from optimization, model checking, and tensor decomposition in order to automate the design of controllers that guarantee correct behavior of large cyber-physical systems such as robots and autonomous vehicles.

1.2 Literature Review and Analysis

Correct-by-design automated construction of control systems has attracted a tremendous amount of attention in recent years, leading to the development of algorithms that allow for systematic and provably-correct control design. In particular, this line of work has had a tremendous impact in motion and task planning problems, where the automatic construction of robot control policies subject to high-level task specifications is of primary interest. These problems require reasoning on both discrete task specifications and continuous robot motions, which poses significant computational challenges. The survey paper [34] highlights the need for automatic synthesis in fully autonomous vehicles and details the corresponding challenges. Specifically, these vehicles must be able to navigate partially known urban environments with static and dynamic obstacles, obey traffic laws, and operate in different vehicle modes (i.e., on- and off-road driving). The need for automated synthesis in robotic applications is further highlighted in the survey done in [48], which reviews existing techniques with specific application to task and motion planning. The works highlighted in this survey generalize the classical motion planning problem by constraining the system to meet specified high-level tasks.


A wide variety of sampling-based algorithms [1, 25, 29, 46] have been proposed in the literature to tackle the motion planning problem. They have been shown to be very effective in solving high-dimensional problems in robotic navigation and manipulation [26, 27, 50]. In order to perform task and motion planning, a variety of automated synthesis techniques deal with tight coupling of discrete task planning with sampling-based motion planning. Recent work has focused on the application of similar controller synthesis methods in robot motion planning problems where the tasks are specified using the Linear Temporal Logic (LTL) language [28, 45, 47, 54, 55, 57]. In the motion planning context, a transition graph (or tree) that abstracts the system dynamics is generated incrementally using a sampling-based algorithm and is then augmented with the finite state automaton (which encodes the LTL specifications). The resulting product transition graph then represents possible poses of the robotic system and has the additional property that the sequence of pose transitions eventually satisfies the specification.

A different set of techniques is highlighted in the survey work of [5], which tackles the problem of automated synthesis of provably correct robot control laws using a hierarchical, three-level approach. The first level, called the specification level, is an obstacle-free configuration space of the robot partitioned into cells. A path in the partitioned configuration space that traverses the cells from the cell containing an initial point to the cell containing the final point is said to satisfy the specification. At the second level, called the execution level, a path that satisfies the specification is chosen based on an optimality criterion. Finally, at the third level, known as the implementation level, a controller is generated to implement a trajectory that follows the chosen path. Literature in this direction typically focuses on two types of approaches: a top-down approach where the environment drives the discretization and a bottom-up approach where discretization is carried out at the control and dynamical level.

These two approaches have their pros and cons. For example, the environment-driven approach is favored when the specification is given in terms of regions of interest in the environment. While it is simpler to design algorithms in this framework, it


is limited to systems with simple dynamics and a priori known environments [17, 31]. The seminal papers by Tabuada and Pappas [53] as well as Kloetzer and Belta [30] analyze linear systems and synthesize controllers by constructing a discrete abstraction of the state space that captures all essential properties that can be represented in the specification language. Techniques applying the control-driven approach are favored in unknown environments and when the system dynamics are more complex (e.g., coping with robot mechanical constraints), but they typically require ad hoc designs. Even then, the control-driven approach in the literature is only applied to low-dimensional and simple nonlinear dynamics [12, 13]. Typical solution methodologies in this framework require a linear approximation of the system to compute a controller that implements the desired discrete behavior specified by the specification [6, 16, 33].

The methodologies described above pertain only to deterministic systems, where perfect modeling and full knowledge of states are assumed. As such, these methods are limited to deterministic behavior and perfect knowledge of the environment. The seminal paper by Kress-Gazit, Fainekos, and Pappas [33] was one of the first to investigate automated synthesis in uncertain environments. Their approach considers an efficient fragment of LTL called Generalized Reactivity(1) (GR(1)) [44]. Given a dynamical robot model and a temporal logic formula modeling all possible ways an environment can behave (admissible environments), the goal is to compute a controller that satisfies the system specifications within all possible admissible environments. The approach naturally takes on a game-theoretic formulation between the robot and the environment and enables explicit modeling of many possible scenarios. The downside to this approach is that all possible scenarios are assumed to be explicitly modeled, and thus it cannot handle unexpected behaviors that are not and/or cannot be modeled.

Recent work in probabilistic automated synthesis has begun to consider the problem of uncertainty. The hallmark of these methods is the Markov Decision Process (MDP), which is a mathematical framework for modeling decision making where outcomes are random and under the control of a decision maker. More precisely, it is a


discrete-time stochastic control process whereby at each step the process is in some state 𝑠 and may choose an action 𝑎, which causes 𝑠 to transition randomly to another state 𝑠′ with probability 𝑃(𝑠′ | 𝑠, 𝑎) while incurring a cost 𝑔(𝑠, 𝑎, 𝑠′). State transitions in an MDP satisfy the Markov property, which means that given 𝑠 and 𝑎, the next state is conditionally independent of all previous states and actions. We provide a more rigorous definition of MDPs in Section 2.2.2.

The works of [14, 15, 18, 37, 56] consider controller synthesis for stochastic systems. The works [14, 15, 56] construct a coarse MDP abstraction of the system where the transition probabilities between regions are generated through simulation. While [14, 15] consider the problem of synthesizing controllers for persistent tasks in dynamic environments, the work in [37] is interested in maximizing the probability of satisfying specifications on properties of regions in the environment. The work of [56] assumes that the transition probabilities in the MDP are not exactly known, but belong to an uncertainty set. The goal is then to synthesize a controller that maximizes the worst-case probability of satisfying the specification. The works of [14, 15] and [56] couple the high-level specifications with low-level behavior of the system by building a product MDP between the automaton representing the specification and the MDP abstraction of the system. Also, a recent paper [18] looks at a timed variation of LTL called Metric Temporal Logic (MTL), and constructs a product MDP between an approximating MDP of the original system and the finite automaton of the MTL specification. Given this product MDP, an optimal policy with respect to maximizing the probability of satisfying the specification is computed. A nice feature of the framework in [18] is that the behavior of the system is proven to converge to the continuous-time stochastic product process as the discretization becomes finer.

The tight interplay between the physical/modeling constraints and computation inherent in the approaches to automated synthesis detailed above results in complex systems. This creates a computational challenge that requires reasoning over a vast hybrid discrete/continuous space that captures these constraints. For example, in motion planning, these constraints stem from complex geometries, motion dynamics, collision avoidance, and goal specification. Unfortunately, for


general dynamical systems, as the abstraction of the system model becomes finer and the high-level specifications become more complex, automated synthesis hits a computational barrier. This barrier is known as the state explosion problem [34]. Hence, existing algorithms are either restricted to simple systems, e.g., linear dynamical systems, or they are intractable, e.g., the running time scales exponentially with increasing dimensionality of the state space. In order to handle more complex dynamics, computational methods typically require discretization of the state space into a regular grid, rendering the number of discrete states exponential in the dimensionality of the state space. This is known as the curse of dimensionality and is an inherent problem in almost all task and motion planning problems that involve complex system dynamics. One of the few works that addresses this problem in automated synthesis is [24], which proposes an efficient framework for synthesizing controllers subject to high-level specifications using recent progress in tensor decompositions. The idea is to solve a series of constrained reachability problems and then compose the solutions. Unfortunately, the framework only applies to a subclass of continuous-time stochastic dynamical systems whose associated partial differential equations are linear.

1.3 Thesis Contributions and Outline

The main contribution of this thesis is an automated synthesis algorithm for general continuous-time stochastic dynamical systems. The algorithm proposed is based on concepts and methods from various fields including optimal control, model checking, and tensor decomposition. In particular, the proposed algorithm first builds a product space that captures the high-level specifications and low-level dynamics of the system. It then computes a control policy that minimizes a given cost function and avoids the curse of dimensionality by carrying out the computation in a compressed manner using the tensor-train decomposition.

Theoretically, we show that computing the optimal solution involves solving a reachability problem in the product space and that the resulting solution is consistent with the original continuous-time solution. We analytically bound the error of the


proposed algorithm relative to the exact solution and show convergence of the solution as the error tolerances of the algorithm approach zero. Most importantly, we show that the computational complexity of the algorithm scales linearly in dimensionality, polynomially in the rank of the cost function, linearly in the size of the corresponding automaton of the specification, and linearly in the number of states along each dimension, hence overcoming the curse of dimensionality.

The remaining chapters of this thesis, with major contributions highlighted, proceed as follows:

∙ Chapter 2 provides a technical background to ground this research in the fields of model checking, stochastic optimal control, and tensor decomposition.

∙ Chapter 3 is devoted to a formal problem definition. First, we describe the stochastic control systems and a subclass of temporal logics called co-safe Linear Temporal Logic. Subsequently, we formulate the automated synthesis problem as a stochastic optimal control problem subject to the specifications.

∙ Chapter 4 details our methodology for solving the problem defined in Chapter 3. The proposed algorithm is presented in this chapter.

∙ Chapter 5 provides analytical guarantees on the characteristics and optimality of the solution. We prove that our algorithm is efficient and scales linearly in dimensionality and polynomially with the rank of the cost function.

∙ Chapter 6 showcases the algorithm on several robotic motion planning problems, including a three-dimensional Dubin's car, a six-dimensional modified Dubin's plane, and a compelling six-dimensional physical model of an agile robotic plane. On top of this, we include illustrative examples to characterize our algorithm's performance under various conditions. We also compare the performance of our algorithm to existing exact numerical methods.

∙ Chapter 7 summarizes the contributions of this thesis and proposes ways forward towards a more general framework.


Chapter 2

Background

This chapter presents a review of model checking, stochastic optimal control, and tensor decompositions in order to ground this research in its relevant fields. Existing ideas are examined and synthesized to provide a technical basis for the contributions of this thesis.

First, some preliminaries. The set of integers and that of reals are denoted by Z and R, respectively. Their non-negative counterparts are denoted by Z+ and R+.

A probability space is denoted by (Ω, ℱ, P), where Ω is the sample space, ℱ is a 𝜎-algebra, and P is a probability measure. The expectation operator is denoted by E[·].

2.1 Model Checking

The complexity of reactive systems such as cyber-physical systems has resulted in more time being invested in the verification process than in construction. The role of formal verification is to reduce and ease verification efforts while increasing coverage by establishing system correctness through mathematical rigor. As a result, formal verification amounts to an exhaustive exploration of all possible behaviors. In other words, a system that is correct by formal verification implies that all of its possible behaviors have been explored. Model checking is a form of formal verification whereby mathematical models describe the possible system behavior in a precise and


unambiguous way. The goal of model checking is to check a desired behavioral property over a given system through exhaustive enumeration of all states reachable by the system and the behaviors that traverse them.

2.1.1 Linear Temporal Logic

Verification of correctness in reactive systems depends on the executions of the system and upon reasoning about the different moments at which events occur. Linear Temporal Logic (LTL) extends the usual propositional logic with temporal operators and provides a mathematical framework for expressing specifications precisely and non-ambiguously [2]. Hence, LTL provides an intuitive but rigorous framework for expressing properties that relate events in system execution. Specifically, formulas in LTL express discrete, linear-time properties: at each moment in time there is a single successor moment. It is also time abstract, allowing one to specify the relative order of events, but not necessarily the precise timing of events. In the automated synthesis literature, LTL is used to capture the richness of the specifications of interest in cyber-physical systems, especially in robotics. It is a natural mathematical framework that enables controls engineers to automate the controller synthesis problem in a systematic way.

Given a set of atomic propositions 𝐴𝑃, an LTL formula 𝜙 is formed by combining propositions with logical operators (and ∧, or ∨, not ¬) and temporal operators (next ○, until 𝒰, eventually ♦, always □).

Let us denote by 2^𝐴𝑃 the power set of 𝐴𝑃, i.e., the set of all subsets of 𝐴𝑃.

Definition 1. The syntax of Linear Temporal Logic (LTL) formulas over a finite set of propositions is defined inductively in the Backus-Naur form as

$$\phi ::= p \mid \phi \vee \phi \mid \neg\phi \mid \bigcirc \phi \mid \phi_1\,\mathcal{U}\,\phi_2,$$

where 𝑝 ∈ 𝐴𝑃 is an atomic proposition.


The language of an LTL formula is the set of all infinite sequences (i.e. infinite words) (𝜋1, 𝜋2, . . . ), where 𝜋𝑖 ∈ 2𝐴𝑃, that satisfy the semantics given below.

Definition 2. Let 𝑤 = (𝜋1, 𝜋2, . . . ) be an infinite sequence of sets of atomic propositions, i.e., 𝜋𝑖 ∈ 2^𝐴𝑃. The semantics of LTL formulas is defined inductively as follows:

$$\begin{aligned}
w &\models p && \text{if } p \in \pi_1 \\
w &\models \neg\phi && \text{if } w \not\models \phi \\
w &\models \phi_1 \wedge \phi_2 && \text{if } w \models \phi_1 \text{ and } w \models \phi_2 \\
w &\models \bigcirc\phi && \text{if } (\pi_2, \pi_3, \ldots) \models \phi \\
w &\models \phi_1\,\mathcal{U}\,\phi_2 && \text{if } \exists\, i \geq 1 \text{ such that } (\pi_i, \pi_{i+1}, \ldots) \models \phi_2 \text{ and } (\pi_j, \pi_{j+1}, \ldots) \models \phi_1 \text{ for all } 1 \leq j < i
\end{aligned}$$

The other operators can be expressed through equivalences based on the syntax and semantics above. Specifically, ⊥ ≡ 𝑝 ∧ ¬𝑝, ⊤ ≡ ¬⊥, 𝜙1 ∨ 𝜙2 ≡ ¬(¬𝜙1 ∧ ¬𝜙2), ♦𝜙1 ≡ ⊤ 𝒰 𝜙1, and □𝜙1 ≡ ¬♦¬𝜙1.

Properties such as safety and liveness are expressible in LTL. Specifically, safety properties (□𝜙) express that something bad never happens, while liveness properties (♦𝜙) express that something good eventually happens. The richness of the LTL language allows for the construction of various specifications including coverage, sequencing, partial ordering, conditions, and avoidance and persistency [48].

The compositional nature of LTL specifications allows for the construction of complex specifications based on simpler tasks. For example, consider a networked power grid tasked with supplying power to a major city. This task can employ propositions that represent power levels of individual plants, failure in one or several plants, and their status (operational or non-operational). Avoidance and consistency ensure uniform load distribution among the plants, for example, by preventing any one plant from ever going beyond a specified power level. Conditional constructs can be employed to even the load distribution among plants if one or several of them fail. Coverage and partial ordering constructs automate the verification of plant status


throughout the day by self-scheduling maintenance checks in no particular order or in some particular order, respectively.

The next section details a finite variant of LTL, whereby the always (□) operator is no longer permitted.

2.1.2 Co-Safe LTL

In this research, syntactically co-safe linear temporal logic (sc-LTL) [38] is used to specify desired system behavior over finite executions. More specifically, sc-LTL formulas are satisfied by finite sequences of discrete states, rather than the infinite sequences that satisfy general LTL formulas. Since many cyber-physical system behaviors are, in general, finite in nature, using sc-LTL formulas does not significantly limit the general applicability of our approach.

An sc-LTL formula is composed from the boolean operators ¬ (negation), ∨ (disjunction), and ∧ (conjunction) and the temporal operators ○ (next), 𝒰 (until), and ♦ (eventually). An sc-LTL formula is written in positive normal form, i.e., negations are only allowed in front of atomic propositions. Notice that in this form there is no longer an equivalent representation for the always (□) operator.

Definition 3. The syntax of syntactically co-safe LTL (sc-LTL) formulas over a finite set of propositions is defined inductively in the Backus-Naur form as

$$\phi ::= p \mid \neg p \mid \phi_1 \wedge \phi_2 \mid \phi_1 \vee \phi_2 \mid \bigcirc \phi \mid \phi_1\,\mathcal{U}\,\phi_2,$$

where 𝑝 is an atomic proposition, i.e., 𝑝 ∈ 𝐴𝑃 .

The sc-LTL language differs from classical propositional logic through its temporal operators, namely 'until' (𝒰), 'next' (○), and 'eventually' (♦). The formulas 𝜙1 𝒰 𝜙2 and ○𝜙 state that 𝜙1 is true until 𝜙2 becomes true and that 𝜙 becomes true at the next time step, respectively. The language of an sc-LTL formula is the set of all finite sequences (𝜋1, 𝜋2, . . . , 𝜋𝑘), where 𝜋𝑖 ∈ 2^𝐴𝑃, that satisfy the semantics detailed in Definition 2.

Formulas of the sc-LTL language can be represented by deterministic finite automata. Most system behaviors of interest are finite, and we take advantage of the fact that finite behavior is easier to model, especially under uncertainty, in order to develop our framework. Let us formalize this connection.

Definition 4. A Deterministic Finite Automaton (DFA) is a 5-tuple 𝒜 = (𝑄, Σ, 𝛿, 𝑞0, 𝐹), where 𝑄 is a finite set of states, Σ is the input alphabet, 𝛿 : 𝑄 × Σ → 𝑄 is the transition function, 𝑞0 ∈ 𝑄 is the initial state, and 𝐹 ⊆ 𝑄 is the set of accepting states.

A word over an alphabet Σ is a sequence 𝑤 = (𝜋0, 𝜋1, . . . , 𝜋𝑘−1) such that 𝜋𝑖 ∈ Σ for all 𝑖 ∈ {0, 1, . . . , 𝑘 − 1}. For each word 𝑤 = (𝜋0, 𝜋1, . . . , 𝜋𝑘−1) over the input alphabet Σ, the corresponding run on the automaton 𝒜 = (𝑄, Σ, 𝛿, 𝑞0, 𝐹) is a sequence 𝜎 = (𝑞0, 𝑞1, . . . , 𝑞𝑘) of states, i.e., 𝑞𝑖 ∈ 𝑄 for all 𝑖 ∈ {0, 1, . . . , 𝑘}, such that (i) 𝑞0 is the initial state and (ii) 𝑞𝑖+1 = 𝛿(𝑞𝑖, 𝜋𝑖) for all 𝑖 ∈ {0, 1, . . . , 𝑘 − 1}. A run 𝜎 = (𝑞0, 𝑞1, . . . , 𝑞𝑘) is said to be an accepting run of the automaton 𝒜 if it ends in an accepting state, i.e., 𝑞𝑘 ∈ 𝐹. The set of all words that correspond to accepting runs is called the language generated by the automaton 𝒜.

For any sc-LTL formula 𝜙 defined on the set 𝐴𝑃 of atomic propositions, there exists a deterministic finite automaton 𝒜𝜙 with input alphabet Σ = 2^𝐴𝑃 such that the language generated by 𝒜𝜙 is precisely the language generated by 𝜙.
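To make Definition 4 and its connection to sc-LTL concrete, the following sketch encodes a small DFA for the formula ♦𝐴1 ∧ ♦𝐴2 in Python. The encoding (states as the subset of regions visited so far) and the helper names are illustrative choices, not constructions taken from the thesis.

```python
# Minimal sketch (assumed encoding): a DFA accepting finite words that
# satisfy  phi = <>A1 & <>A2  over the alphabet 2^{A1, A2}.
from itertools import combinations

AP = ("A1", "A2")

def powerset(props):
    """All subsets of the atomic propositions (the input alphabet 2^AP)."""
    return [frozenset(c) for r in range(len(props) + 1)
            for c in combinations(props, r)]

SIGMA = powerset(AP)          # input alphabet
Q = powerset(AP)              # states: which regions have been visited so far
q0 = frozenset()              # initial state: nothing visited yet
F = {frozenset(AP)}           # accepting: both A1 and A2 have been seen

def delta(q, pi):
    """Transition function: accumulate the propositions observed so far."""
    return q | pi

def accepts(word):
    """Run the DFA on a finite word (a sequence of label sets)."""
    q = q0
    for pi in word:
        q = delta(q, frozenset(pi))
    return q in F

# Example: the word ({}, {A1}, {}, {A2}) satisfies <>A1 & <>A2.
print(accepts([set(), {"A1"}, set(), {"A2"}]))   # True
print(accepts([set(), {"A1"}]))                  # False
```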

2.2 Stochastic Optimal Control

The stochastic optimal control problem is embedded in a myriad of problems whereby a system can be described by a dynamical model that is imperfect and subject to uncertainty. The goal is to compute control policies for the system that minimize some expected cost (or maximize some expected reward). Let us formalize this problem. Let 𝑑, 𝑑𝑤, 𝑑𝑢 ∈ Z+. Consider continuous-time, stochastic systems written in the form of a Stochastic Differential Equation (SDE):

$$dx(t) = b(x(t), u(t))\,dt + F(x(t), u(t))\,dw(t), \tag{2.1}$$


where 𝑏 : 𝑋 × 𝑈 → R^𝑑 denotes the drift vector, 𝐹 : 𝑋 × 𝑈 → R^{𝑑×𝑑𝑤} denotes the diffusion matrix, and 𝑋 ⊂ R^𝑑 and 𝑈 ⊂ R^{𝑑𝑢} are compact sets with smooth boundaries and non-empty interiors. The functions 𝑏 and 𝐹 are assumed to be measurable, continuous, and bounded. The stochastic process {𝑤(𝑡) : 𝑡 ≥ 0} is the 𝑑𝑤-dimensional Brownian motion defined on the probability space (Ω, ℱ, P).

The evolution of the state is an 𝑋-valued stochastic process {𝑥(𝑡) : 𝑡 ≥ 0}. We denote a realization of this state process for a given sample path 𝜔 by 𝑥[𝜔], which itself is a mapping from time into 𝑋, i.e., 𝑥[𝜔] : R+ → 𝑋. The stochastic optimal control problem is then defined as follows:

$$\begin{aligned}
\text{minimize} \quad & J(z) = \mathbb{E}\left[ \int_0^T g(x(t), u(t))\,dt + h(x(T)) \;\Big|\; x(0) = z \right] \\
\text{subject to} \quad & \text{Equation (2.1)},
\end{aligned}$$

where 𝑔 : 𝑋 × 𝑈 → R+ and ℎ : 𝑋 → R are the stage cost function and terminal cost function, respectively. Also, the first exit time 𝑇 := inf{𝑡 : 𝑥(𝑡) ∉ int(𝑋)} is the first time the trajectory of the system in Equation 2.1 hits the boundary of 𝑋.
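As a concrete illustration of the objective above, the sketch below estimates the expected cost-to-go 𝐽(𝑧) for a toy system of the form (2.1) by simulating sample paths with an Euler–Maruyama scheme and accumulating the running cost until the first exit time. The drift, diffusion, costs, domain, and policy are placeholder assumptions chosen only for illustration.

```python
# Hedged sketch: Monte Carlo estimate of J(z) for a toy 2-D system of the
# form dx = b(x,u) dt + F(x,u) dw, using Euler-Maruyama and a fixed policy.
# All model choices below are illustrative assumptions.
import numpy as np

def b(x, u):            # drift vector (placeholder single-integrator model)
    return u

def Fdiff(x, u):        # diffusion matrix (placeholder, constant noise)
    return 0.1 * np.eye(2)

def g(x, u):            # stage cost (quadratic, illustrative)
    return float(x @ x + 0.1 * u @ u)

def h(x):               # terminal cost (illustrative)
    return 10.0 * float(x @ x)

def policy(x):          # a fixed feedback policy mu(x), illustrative
    return -0.5 * x

def sample_cost(z, dt=1e-2, t_max=10.0, rng=np.random.default_rng(0)):
    """Simulate one path from x(0)=z until it leaves the box X=[-1,1]^2."""
    x, cost, t = np.array(z, float), 0.0, 0.0
    while t < t_max and np.all(np.abs(x) < 1.0):     # stay in int(X)
        u = policy(x)
        dw = rng.normal(scale=np.sqrt(dt), size=2)   # Brownian increment
        cost += g(x, u) * dt
        x = x + b(x, u) * dt + Fdiff(x, u) @ dw
        t += dt
    return cost + h(x)

# Crude Monte Carlo estimate of the expected cost-to-go from z = (0.5, 0.5).
costs = [sample_cost([0.5, 0.5]) for _ in range(200)]
print("estimated J(z):", np.mean(costs))
```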

The stochastic optimal control problem is commonly found in robotic motion control problems, whereby a robot is subject to environmental disturbances (e.g., wind, currents) and uncertainty in the physical model that describes it. Other sources of application include, for example, finance, where the expected profit of a stock portfolio is maximized, or supply chains, where the expected operating costs are minimized. In general, analytical solutions are typically only available for systems with linear dynamics and Gaussian noise [36]. However, many applications such as robot motion control require a framework that can handle general dynamics and non-Gaussian noise. Hence, computational numerical methods have been proposed during the past several decades in order to tackle the stochastic optimal control problem.

The two numerical approaches for solving the stochastic optimal control problem are known as the indirect and direct methods. Indirect methods convert the stochastic optimal control problem to a boundary-value problem. An optimal solution is found by deriving optimality conditions, leading to a boundary-value problem for a special


class of partial differential equations called the stochastic Hamilton-Jacobi-Bellman (HJB) equation. Numerical methods for solving the stochastic HJB equation have been proposed in [4, 40]. The direct method, on the other hand, transcribes the infinite-dimensional optimization problem into a finite-dimensional problem. The state space of the original continuous-time system is discretized in time and space, and the system is approximated by a sequence of discrete Markov Decision Processes (MDPs). The sequence of MDPs is then solved to obtain an approximation of the optimal policy for the original problem. Numerical methods such as dynamic programming and linear programming have been proposed [7, 11, 35] for solving this problem. This research focuses on the direct method as it is capable of handling a wide variety of dynamics and cost functions, and is thus more general. Many methods that involve solving the HJB equations require specific structure either in the dynamics or in the cost function [23], limiting them to a specific subclass of problems.

2.2.1 Dynamic Programming

Dynamic programming is an optimization technique that is widely used as a systematic tool for solving optimization problems in fields like robotics, operations research, finance, and computer science. The idea behind dynamic programming is to solve a complex problem by breaking it down into simpler subproblems, solving the subproblems once, and then storing their solutions in memory. The next time the same subproblem comes up, one simply looks up its solution instead of recomputing it, thus saving computation time. Dynamic programming leverages the Principle of Optimality, which states that the optimal solution of a subproblem is part of the optimal solution to the whole problem.

We formalize this intuition and define the dynamic programming algorithm by considering a discrete-time variant of Equation 2.1. Denote the discrete-time system trajectory by 𝑥0, . . . , 𝑥𝑁 ∈ R^𝑛, where 𝑥𝑘 is the state at time 𝑡𝑘. Similarly, we denote the inputs to the system by 𝑢0, . . . , 𝑢𝑁−1 ∈ R^𝑚 and the disturbances of the system by 𝑤0, . . . , 𝑤𝑁−1 ∈ R^𝑟. The system evolves by the state transition function 𝑓𝑘 : R^𝑛 × R^𝑚 × R^𝑟 → R^𝑛 at each time step. Furthermore, the inputs are constrained to lie in some set such that 𝑢𝑘 ∈ 𝑈𝑘(𝑥𝑘). Then, the stochastic optimal control problem with horizon length 𝑁 is

$$\begin{aligned}
\text{minimize} \quad & \mathbb{E}_{w_k}\left[ \sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) + g_N(x_N) \right] \\
\text{subject to} \quad & x_{k+1} = f_k(x_k, u_k, w_k), \quad k = 0, \ldots, N-1, \\
& u_k \in U_k(x_k), \quad k = 0, \ldots, N-1,
\end{aligned}$$

where 𝑔𝑁 is the terminal cost and the expectation is taken with respect to the distribution of 𝑤𝑘. The principle of optimality is stated as follows.

Definition 5 (Principle of Optimality). Let $\pi^* = \{\mu_0^*, \mu_1^*, \ldots, \mu_{N-1}^*\}$ be an optimal policy for the whole problem, and assume that under $\pi^*$ some state $x_i$ occurs at stage $i$ with positive probability. Consider the $i$-th subproblem where we start from state $x_i$ and minimize the cost-to-go from time $i$ to time $N$:

$$\mathbb{E}_{w_k}\left[ g_N(x_N) + \sum_{k=i}^{N-1} g_k(x_k, \mu_k(x_k), w_k) \right].$$

Then the truncated policy $\{\mu_i^*, \mu_{i+1}^*, \ldots, \mu_{N-1}^*\}$ is optimal for the subproblem.

We now present the dynamic programming algorithm and prove its optimality via the principle of optimality.

Proposition 1. For every initial state $x_0 \in \mathbb{R}^n$ and admissible final state $x_N \in \mathbb{R}^n$, the optimal cost $J^*(x_0)$ of the stochastic optimal control problem is equal to $J_0(x_0)$, given by the last step of the following algorithm, which proceeds backwards in time from $N-1$ to $0$:

$$J_N(x_N) = g_N(x_N),$$
$$J_k(x_k) = \min_{u_k \in U_k(x_k)} \; \mathbb{E}_{w_k}\left[ g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k)) \right], \quad k = 0, \ldots, N-1.$$


Also, if $u_k^* = \mu_k^*(x_k)$ attains the minimum in $J_k$ for all $x_k$ and $k$, then the policy $\pi^* = \{\mu_0^*, \ldots, \mu_{N-1}^*\}$ is optimal.

Proof. (sketch) For any admissible policy $\pi = \{\mu_0, \ldots, \mu_{N-1}\}$ and $k = 0, \ldots, N-1$, denote the sub-policy $\pi^k = \{\mu_k, \ldots, \mu_{N-1}\}$, and let $J_k^*$ be the optimal cost for the $(N-k)$-stage problem. We prove by induction that $J_k^*$ is equal to the $J_k$ generated by the DP algorithm in Proposition 1 for all $k$, starting from $J_N^*(x_N) = J_N(x_N) = g_N(x_N)$ for all $x_N$.

Let us assume that $J_{k+1}^*(x_{k+1}) = J_{k+1}(x_{k+1})$ for some $k$ and for all $x_{k+1}$. Since $\pi^k = (\mu_k, \pi^{k+1})$, then for all $x_k$ we have

$$\begin{aligned}
J_k^*(x_k) &= \min_{\mu_k,\, \pi^{k+1}} \; \mathbb{E}_{w_k, \ldots, w_{N-1}}\left[ g_k(x_k, \mu_k(x_k), w_k) + g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \right] \\
&= \min_{\mu_k} \; \mathbb{E}_{w_k}\left[ g_k(x_k, \mu_k(x_k), w_k) + \min_{\pi^{k+1}} \; \mathbb{E}_{w_{k+1}, \ldots, w_{N-1}}\left[ g_N(x_N) + \sum_{i=k+1}^{N-1} g_i(x_i, \mu_i(x_i), w_i) \right] \right] \\
&= \min_{\mu_k} \; \mathbb{E}_{w_k}\left[ g_k(x_k, \mu_k(x_k), w_k) + J_{k+1}^*(f_k(x_k, \mu_k(x_k), w_k)) \right] \\
&= \min_{\mu_k} \; \mathbb{E}_{w_k}\left[ g_k(x_k, \mu_k(x_k), w_k) + J_{k+1}(f_k(x_k, \mu_k(x_k), w_k)) \right] \\
&= \min_{u_k \in U_k(x_k)} \; \mathbb{E}_{w_k}\left[ g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k)) \right] \\
&= J_k(x_k),
\end{aligned}$$

where the first equality is from the definition of $J_k^*$. In the second equality, we make use of the principle of optimality (the tail portion of an optimal policy is optimal for the tail subproblem) in order to move the minimization over $\pi^{k+1}$ inside the expectation. The third equality results from the definition of $J_{k+1}^*$, and the fourth from our induction hypothesis that $J_{k+1}^* = J_{k+1}$. In the fifth equality, we use the fact that for any function $J$ and policy $\mu$,

$$\min_{\mu \in M} J(x, \mu(x)) = \min_{u \in U(x)} J(x, u),$$

where $M$ is the set of all functions $\mu(x)$ such that $\mu(x) \in U(x)$ for all $x$. Thus, the sixth equality follows and the proof is finished.


An optimal policy is then obtained by choosing the minimizing control at each state $x_k$:

$$\mu_k(x_k) \in \operatorname*{argmin}_{u_k \in U_k(x_k)} \; \mathbb{E}_{w_k}\left[ g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k)) \right], \quad k = 0, \ldots, N-1.$$
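The backward recursion of Proposition 1 translates directly into a tabular computation. The sketch below applies it to a small one-dimensional problem on a grid; the dynamics, costs, and disturbance distribution are assumed for illustration and are not the systems studied later in the thesis.

```python
# Hedged sketch of the finite-horizon DP recursion of Proposition 1 on a
# small tabular problem: 1-D grid state, finite control set, and a finite
# disturbance set with known probabilities. All model choices are assumed.
import numpy as np

states = np.linspace(-2.0, 2.0, 41)          # discretized state grid
controls = np.array([-1.0, 0.0, 1.0])        # admissible inputs U_k
noise, p_noise = np.array([-0.1, 0.0, 0.1]), np.array([0.25, 0.5, 0.25])
N, dt = 20, 0.1

def f(x, u, w):                              # state transition f_k
    return np.clip(x + dt * u + w, states[0], states[-1])

def g(x, u):                                 # stage cost g_k
    return dt * (x**2 + 0.1 * u**2)

def nearest(x):                              # index of the nearest grid point
    return int(np.abs(states - x).argmin())

J = states**2                                # terminal cost g_N
policy = np.zeros((N, len(states)), dtype=int)
for k in range(N - 1, -1, -1):               # backward in time
    J_new = np.empty_like(J)
    for i, x in enumerate(states):
        # expected cost-to-go for each control, averaged over the noise
        q = [sum(p * (g(x, u) + J[nearest(f(x, u, w))])
                 for w, p in zip(noise, p_noise)) for u in controls]
        policy[k, i] = int(np.argmin(q))
        J_new[i] = min(q)
    J = J_new

print("J_0 at x = 1.0:", J[nearest(1.0)])
```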

In the next section we describe the Markov Decision Process, a framework in which dynamic programming has a natural application.

2.2.2 Markov Decision Process

The direct method solves the stochastic optimal control problem via a state space discretization of Equation 2.1, transforming the original continuous-time, continuous-space problem into a discrete-time, discrete-space problem in the form of a Markov Decision Process (MDP). In summary, assume a system occupies one of a finite set of discrete states 𝑥 ∈ 𝑄. At each state, the system has a set of actions 𝑢 ∈ 𝐴 that can be taken. When an action is chosen, the system transitions from 𝑥 to 𝑥′ ∈ 𝑄 with probability P(𝑥′ | 𝑥, 𝑢) while incurring a cost 𝐺(𝑥, 𝑢). More formally, an MDP is defined as follows.

Definition 6. A (discrete) Markov Decision Process (MDP) is a 5-tuple (𝑄, 𝐴, 𝑃, 𝐺, 𝐻), where 𝑄 is a discrete set of states, 𝐴 is a discrete set of actions, 𝑞0 ∈ 𝑄 is an initial state, 𝑃 : 𝑄 × 𝐴 × 𝑄 → [0, 1] is a transition probability function, 𝐺 : 𝑄 × 𝐴 → R+ is the stage cost, and 𝐻 : 𝑄 → R+ is the terminal cost.

When the process is in state 𝑥 ∈ 𝑄 and action 𝑢 is applied, the probability of transitioning to a next state 𝑥′ ∈ 𝑄 is given by 𝑃(𝑥′ | 𝑥, 𝑢), with $\sum_{x' \in Q} P(x' \mid x, u) = 1$. A policy is a mapping 𝜇 : 𝑄 → 𝐴 that associates an action with each state. If the action 𝜇(𝑥) is applied whenever the process reaches state 𝑥, let {𝜉𝑖 : 𝑖 ∈ N} denote the resulting random sequence of states. The cost associated with {𝜉𝑖 : 𝑖 ∈ N} is defined as

$$\sum_{i=0}^{N} G(\xi_i, \mu(\xi_i)),$$


where 𝜉𝑖 ∈ 𝑄 for all 𝑖 and 𝑁 is the time at which the process terminates. Furthermore, an optimal policy 𝜇* is one that attains the minimum of the cost defined above.

Note that in this framework the state and action sets are finite, thus the transition function is no longer governed by the smooth dynamics 𝑓𝑘(𝑥𝑘, 𝑢𝑘) in Section 2.2.1.

Assuming a stationary system (i.e., the system equation, the cost per stage, and the random disturbance statistics do not change from one stage to the next), a common formulation for solving the stochastic optimal control problem on MDPs is to solve an infinite-horizon DP problem (a slightly different version of the problem in Section 2.2.1). Adopting the notation in Definition 6, the cost at state 𝑥 ∈ 𝑄 for a policy 𝜇 is given as

$$J(x) = \lim_{N \to \infty} \mathbb{E}\left[ \sum_{k=0}^{N} \alpha^k G(\xi_k, \mu(\xi_k)) \right],$$

where 𝛼 ∈ (0, 1] is a discount factor and the initial state of the process is 𝜉0 = 𝑥. By the definition of 𝐽*, following some optimal policy 𝜇* results in the total minimized cost

$$J^*(x) = \min_{u} \; G(x, u) + \lim_{N \to \infty} \mathbb{E}\left[ \sum_{k=1}^{N} \alpha^k G(\xi_k, \mu^*(\xi_k)) \right].$$

Let 𝜉1 = 𝑥′. Given the definition of 𝐽(𝑥′), realizing that 𝐽(𝑥′, 𝜇*(𝑥′)) = 𝐽*(𝑥′), and then using the transition probabilities 𝑃 given in Definition 6 to represent the transition between 𝑥 and 𝑥′, the cost at state 𝑥 is

$$J^*(x) = \min_{u} \left[ G(x, u) + \alpha \sum_{x'} P(x' \mid x, u)\, J^*(x') \right]. \tag{2.2}$$

Equation 2.2 is known as the (optimal) Bellman equation, giving rise to standard algorithms such as Value Iteration (VI) and Policy Iteration for solving discrete-state, infinite-horizon stochastic optimal control problems.

The two common types of infinite-horizon problems are the discounted (𝛼 < 1) and undiscounted (𝛼 = 1) problems. The undiscounted problem, also known as the Stochastic Shortest Path (SSP) problem, is common in the minimum expected time formulation. Value iteration solves the transcribed stochastic optimal control problem by iterating Bellman's equation (2.3) for all states in the MDP. In more rigorous terms, denote by 𝑇 the Bellman operator such that for an arbitrary cost function 𝐽,

$$(TJ_k)(x) = J_{k+1}(x) = \min_{u} \left[ G(x, u) + \sum_{x'} P(x' \mid x, u)\, J_k(x') \right], \quad \forall x \in Q. \tag{2.3}$$

The idea is that each successive approximation 𝐽0, . . . , 𝐽𝑘 improves on the last, and as 𝑘 → ∞, Equation 2.3 eventually converges to the optimal solution (2.2). Assuming bounded and strictly positive stage cost, this convergence is guaranteed in the discounted case. In the undiscounted case, however, convergence is only guaranteed if there exists at least one policy that is guaranteed to drive the system to a terminal, absorbing region. Another technique, known as policy iteration, alternately improves the policy and the value function. While this method generally converges more rapidly in practice, both require a significant amount of computational time to converge, especially for systems with large dimensionality.
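A tabular realization of the Bellman operator in Equation (2.3) is sketched below for a tiny discounted MDP. The transition probabilities and stage costs are made-up placeholders; the loop applies 𝑇 repeatedly until the sup-norm change between successive cost functions falls below a tolerance.

```python
# Hedged sketch: value iteration on a toy 3-state, 2-action MDP.
# P[a, x, x'] and G[x, a] below are illustrative numbers, not from the thesis.
import numpy as np

n_states, n_actions, alpha = 3, 2, 0.95
P = np.array([  # P[a, x, x'] : transition probabilities for each action
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
])
G = np.array([[2.0, 1.0], [1.0, 2.0], [0.5, 0.0]])   # stage cost G[x, a]

def bellman(J):
    """Apply the Bellman operator T to a cost vector J (cf. Equation 2.3)."""
    Q = G + alpha * np.einsum("axy,y->xa", P, J)     # Q[x, a]
    return Q.min(axis=1), Q.argmin(axis=1)

J = np.zeros(n_states)
for it in range(10_000):
    J_next, mu = bellman(J)
    if np.max(np.abs(J_next - J)) < 1e-8:            # sup-norm convergence test
        break
    J = J_next

print("iterations:", it, "J*:", J_next, "policy:", mu)
```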

Value and policy iteration are both exponential-time algorithms with respect to the dimensionality of the problem. In order to mitigate the curse of dimensionality, the common approach is to parametrize the cost function with a sparse set of basis functions, leading to Approximate Dynamic Programming (ADP) [7, 49]. While this is good for computation, the solutions are not guaranteed to be optimal and are, in practice, almost always suboptimal. Furthermore, convergence for many of these algorithms is not guaranteed.

The algorithm proposed in this research can be considered as an ADP algorithm, but with the added property that we can approximate the cost function to an arbitrary 𝜖 error (which is tunable). In order to do this, the cost function is represented as a tensor instead of by a set of basis functions. We can then utilize recent advancements in the field of tensor decompositions to combat the curse of dimensionality.


2.3 Tensor Decomposition

Tensors are multidimensional arrays that often arise when representing multidimensional functions on tensor product grids, and in the context of this thesis, we interpret the discretized cost-to-go function as a tensor. To combat the curse of dimensionality, we seek to exploit low-rank structure in the cost function. We utilize tensor decompositions since they are compressed representations of multidimensional arrays, allowing us to represent an array with far fewer elements than its total size. The complexity of algorithms dealing with these representations is dominated by the rank of the tensor and often remains linear in the number of dimensions.

Separation of variables plays an important role in tensor representation and is closely tied to tensor rank. Specifically, tensor decompositions represent a multidimensional function by sums of separable functions. For example, the representation of a multidimensional function 𝑓 : R^𝑑 → R that has rank 𝑅 in the canonical decomposition format is

$$f(x_1, \ldots, x_d) = \sum_{r=1}^{R} f_r^{(1)}(x_1)\, f_r^{(2)}(x_2) \cdots f_r^{(d)}(x_d),$$

where the $f_r^{(k)}$ are one-dimensional functions of the 𝑘-th variable.

Although the canonical decomposition maintains linear complexity with dimensionality, determining the canonical rank 𝑅 is an NP-hard problem [52], and finding a best rank-𝑅 approximation of 𝑓 is ill-posed [22]. Another representation, known as the Tucker decomposition, allows for easy computation of a low-rank approximation, but has exponential complexity in dimensionality. The tensor-train (TT) decomposition [41], on the other hand, maintains the advantages of both representations, and the algorithms for its computation have strong guarantees. Hence, we leverage the TT decomposition and present algorithms in the context of this tensor representation.

Consider a multidimensional array with each dimension 𝑖 discretized into the set 𝒳𝑖 = {𝑥𝑖[1], . . . , 𝑥𝑖[𝑛𝑖]} of 𝑛𝑖 ∈ Z+ points. Now let F : 𝒳1 × · · · × 𝒳𝑑 → R be a tensor whose elements are function evaluations of the function 𝑓 on the tensor product grid, i.e., F[𝑖1, . . . , 𝑖𝑑] = 𝑓(𝑥1[𝑖1], . . . , 𝑥𝑑[𝑖𝑑]).

Figure 2-1: Three-Dimensional Tensor, 𝒳 ∈ R𝐼,𝐽,𝐾

The elements of a tensor in TT format can be computed through the summations

$$\mathbf{F}[i_1, i_2, \ldots, i_d] = \sum_{\alpha_0=1}^{r_0} \sum_{\alpha_1=1}^{r_1} \cdots \sum_{\alpha_d=1}^{r_d} \mathbf{F}_1[\alpha_0, i_1, \alpha_1]\, \mathbf{F}_2[\alpha_1, i_2, \alpha_2] \cdots \mathbf{F}_d[\alpha_{d-1}, i_d, \alpha_d],$$

where the three-dimensional arrays $\mathbf{F}_k \in \mathbb{R}^{r_{k-1} \times n_k \times r_k}$ are called the TT-cores and the $r_k$ are called the TT-ranks, with $r_0 = r_d = 1$. Thus, storing a tensor in TT format only requires storing each of its cores; assuming constant ranks, this yields an 𝒪(𝑑𝑛𝑟²) storage cost. The TT-ranks are bounded above by the ranks of the unfolding matrices of the tensor, i.e., $r_k$ is guaranteed to be no higher than the rank of the unfolding

matrix $\mathbf{F}^{k}[i_1, \ldots, i_k;\, i_{k+1}, \ldots, i_d]$. The proof of this statement is constructive [41] and provides an algorithm for decomposing a tensor into its TT representation through a sequence of singular value decompositions (SVDs). The algorithm allows for the computation of an approximation $\tilde{\mathbf{F}}$ to $\mathbf{F}$ with an accuracy 𝜖 such that $\lVert \tilde{\mathbf{F}} - \mathbf{F} \rVert_F \leq \epsilon \lVert \mathbf{F} \rVert_F$. It can be shown that this computation can be performed in 𝒪(𝑑𝑛𝑟²) operations if we let $n_k = n$ and $r_k = r$ [41]. The problem with the SVD-based algorithm in [41] is that it requires evaluating all of the elements of F, making it computationally impractical for large-scale tensors with many dimensions. Thus, we will use the interpolation


Figure 2-2: Fibers of a three-dimensional tensor 𝒳: (a) first-dimension (column) fibers x:𝑗𝑘, (b) second-dimension (row) fibers x𝑖:𝑘, and (c) third-dimension (tube) fibers x𝑖𝑗: [32]

algorithm [42], based on replacing the SVD with the CUR/skeleton decomposition, to restrict the number of elements of F that are evaluated. In two dimensions, this algorithm can be realized as seeking the skeleton decomposition of a matrix 𝐴, which can be written as

$$A = A[:, C]\, A[I, C]^{\dagger}\, A[I, :],$$

where 𝐼 with |𝐼| ≥ 𝑟 is a set of rows and 𝐶 with |𝐶| ≥ 𝑟 is a set of columns of 𝐴. Notice that this decomposition only requires access to certain rows and columns of the matrix 𝐴, and the higher-dimensional analogue similarly only requires access to certain fibers of the tensor.
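The two-dimensional skeleton formula above is easy to verify numerically. The sketch below builds an exactly rank-𝑟 matrix, selects 𝑟 rows and 𝑟 columns (at random here, rather than by Maxvol as in the thesis), and reconstructs the full matrix from those fibers only.

```python
# Hedged sketch: skeleton (CUR-type) reconstruction A ~ A[:, C] A[I, C]^+ A[I, :]
# for an exactly rank-r matrix. Row/column selection is random here for
# simplicity; the thesis relies on the Maxvol algorithm [19] instead.
import numpy as np

rng = np.random.default_rng(1)
n, r = 200, 4
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-r matrix

I = rng.choice(n, size=r, replace=False)      # selected row indices
C = rng.choice(n, size=r, replace=False)      # selected column indices

A_skel = A[:, C] @ np.linalg.pinv(A[np.ix_(I, C)]) @ A[I, :]

print("relative error:", np.linalg.norm(A - A_skel) / np.linalg.norm(A))
```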

This interpolation algorithm, called TT-cross, achieves 𝜖-level accuracy and uses the Maxvol algorithm [19] to choose the fibers to evaluate: fibers are selected such that the cross matrix 𝐴[𝐼, 𝐶] of each unfolding matrix is a sufficiently good cross matrix among those that are nonsingular, i.e., such that 𝐴[𝐼, 𝐶] has (approximately) maximum volume.

The algorithm requires specification of an upper bound on each 𝑟𝑘 which, if set too low, results in errors in the approximation; if set too high, the computational effort increases. We utilize a rank-adaptive version of the algorithm, which can be found in [51].
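For completeness, a bare-bones TT-SVD in the spirit of [41] is sketched below: the tensor of function evaluations is repeatedly reshaped and truncated with an SVD, yielding the cores 𝐅𝑘. Because it touches every entry of F, it is only practical for small examples (unlike TT-cross); the target function, grid, and truncation rule are assumptions made for illustration.

```python
# Hedged sketch of TT-SVD (after [41]) for a small 4-D tensor of evaluations
# of an illustrative low-TT-rank function on a uniform grid. Ranks are chosen
# by a singular-value threshold; all specific choices are assumptions.
import numpy as np

d, n = 4, 11
grid = np.linspace(0.0, 1.0, n)
X = np.meshgrid(*([grid] * d), indexing="ij")
F = np.sin(X[0] + X[1]) + np.exp(-X[2]) * X[3]      # a low-TT-rank function

def tt_svd(tensor, eps=1e-10):
    """Decompose a full tensor into TT cores via sequential truncated SVDs."""
    shape = tensor.shape
    cores, r_prev = [], 1
    mat = tensor.reshape(shape[0], -1)
    for k in range(len(shape) - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))      # truncated TT-rank r_k
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        mat = (s[:r, None] * Vt[:r]).reshape(r * shape[k + 1], -1)
        r_prev = r
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

def tt_entry(cores, idx):
    """Evaluate F[i1, ..., id] from its TT cores by a chain of small products."""
    v = np.ones((1, 1))
    for core, i in zip(cores, idx):
        v = v @ core[:, i, :]
    return float(v[0, 0])

cores = tt_svd(F)
print("TT ranks:", [c.shape[2] for c in cores[:-1]])
print("exact:", F[3, 5, 2, 7], "  TT:", tt_entry(cores, (3, 5, 2, 7)))
```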

2.4 Summary

In this chapter, we established the basis of our approach on concepts from several fields, including model checking, stochastic optimal control, and tensor decompositions. The idea is that we can specify the (finite) desired behavior of our system using a finite variant of LTL called sc-LTL. Next, we can approximate our continuous-time, continuous-space system using a discrete framework called a Markov Decision Process, allowing us to utilize standard dynamic programming algorithms to find optimal control policies that minimize some cost function and solve the stochastic optimal control problem. Unfortunately, these algorithms run into the curse of dimensionality. This leads us to consider compressed representations of the cost function, in the form of tensor decompositions, allowing us to carry out computations in linear time with respect to dimensionality.

In the following chapters, we detail the automated synthesis problem and present a computationally efficient algorithm for solving it. The framework, at its core, involves taking the product of the approximating MDP of the system and the automaton of the sc-LTL specification in order to obtain a product MDP that preserves the system dynamics while limiting the system behavior to the set of behaviors described by the specification. We then formulate the problem as a stochastic optimal control problem, whereby we minimize the cost of reaching a terminal set of states. By representing the cost function in a compressed manner, we can efficiently carry out value iteration in linear time with respect to dimensionality and in polynomial time with respect to the rank of the cost function.


Chapter 3

Problem Definition

In this chapter we provide a formal problem definition of the stochastic optimal control problem with sc-LTL specifications. In summary, the basis of our approach relies on constructing a product system based on the approximating MDP of Equation 2.1 and the automaton of an sc-LTL specification 𝜙.

3.1 Stochastic Optimal Control with sc-LTL Specifications

We use atomic propositions to describe various properties of the states of the system. Recall that 𝐴𝑃 is a finite set of atomic propositions. A labeling function is a mapping 𝐿 : 𝑋 → 2^𝐴𝑃 that maps each state to the atomic propositions that hold for that state. When an atomic proposition 𝑝 holds at state 𝑥, i.e., 𝑝 ∈ 𝐿(𝑥), the atomic proposition 𝑝 is said to be True at state 𝑥; otherwise, 𝑝 is said to be False at 𝑥. Lastly, let us denote by [[·]] : 2^𝐴𝑃 → 2^𝑋 the operator which maps a set of propositions to a subset of states in the continuous state space.

We are interested in reasoning about the trajectories of a stochastic system using the sc-LTL language. For this purpose, below, we define a product system of a stochastic dynamical system (see Equation (2.1)) with state space 𝑋 and a finite automaton 𝒜 (see Definition 4) with states 𝑄. Let us define the set of all product


states as 𝑆 = 𝑋 × 𝑄.

3.1.1 The product system

For a given 𝑥[𝜔], let (𝑡0, 𝑡1, 𝑡2, . . . , 𝑡𝑘) denote the increasing sequence of time instances

for which 𝐿(𝑥[𝜔](𝑡𝑖)) ̸= 𝐿(𝑥[𝜔](𝑡𝑖 − 𝜖)) for all small enough 𝜖 > 0. That is, 𝑡𝑖 is the

𝑖th time instance that the trajectory 𝑥[𝜔](𝑡) is about to cross into a region where a different set of atomic propositions hold. Define 𝑡0 := 0, and define 𝜋𝑖 = 𝐿(𝑥[𝜔](𝑡𝑖))

for all 𝑖 ∈ {0, 1, . . . , 𝑘 − 1}. The sequence 𝑤𝑥[𝜔]= (𝜋0, 𝜋1, . . . , 𝜋𝑘−1) is called the word

generated by the state trajectory 𝑥[𝜔]. Let 𝜎𝑥[𝜔] = (𝑞0, 𝑞1, . . . , 𝑞𝑘) denote the run

generated by 𝑤𝑥[𝜔] on automaton 𝒜. Finally, the state process of the product system

is denoted by {𝑠(𝑡) : 𝑡 ≥ 0}, where 𝑠(𝑡) ∈ 𝑆, and defined as follows: For any given sample path 𝜔 ∈ Ω,

𝑠[𝜔](𝑡) = (𝑥[𝜔](𝑡), 𝑞𝑖),

for all 𝑡 ∈ [𝑡𝑖−1, 𝑡𝑖) and all 𝑖 ∈ {1, 2, . . . , 𝑘}. Intuitively, the product structure can

be visualized as layers of continuous spaces, where each continuous space is a copy of the state space of the dynamical system represented by Equation (2.1) [54]. A transition between two layers, say between those representing 𝑞 and 𝑞′, respectively, occurs when the sample trajectory 𝑥[𝜔] hits some set [[𝑝]] ⊂ 𝑋 at some time 𝑡 such that 𝑞′ = 𝛿(𝑞, 𝐿(𝑥[𝜔](𝑡))). Figure 3-1 illustrates this layered structure for the original continuous problem.

In a nutshell, the state process {𝑠(𝑡) : 𝑡 ≥ 0} of the product system encompasses the state process of the continuous-time continuous-state stochastic dynamic system along with the state evolution of the discrete automaton. Notice that the states of the automaton evolve according to the atomic propositions that are satisfied along the trajectories of the stochastic dynamic system. For any given sc-LTL formula, we can use the corresponding automaton 𝒜𝜙 = (𝑄, Σ, 𝛿, 𝑞0, 𝐹 ) in the product system,

and be able to understand whether the trajectories of the continuous system satisfy the formula 𝜙.
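A minimal sketch of this mechanism is given below; the automaton, its transition table, and the sampled word are hypothetical stand-ins, used only to show how the word generated by a labeled trajectory induces a run on 𝒜.

```python
# How a sampled trajectory drives the automaton in the product system.
# The automaton below (roughly "reach goal while avoiding obstacle") and the
# word are illustrative placeholders, not objects defined in this thesis.
delta = {                                        # delta(q, set of props) -> q'
    ("q0", frozenset()):             "q0",
    ("q0", frozenset({"goal"})):     "qF",
    ("q0", frozenset({"obstacle"})): "trap",
}
accepting = {"qF"}

def run(word, q0="q0"):
    # word: (pi_0, pi_1, ...), the successive label sets seen along x[omega]
    # at the crossing times t_0, t_1, ....  Returns the induced run (q_0, ..., q_k).
    qs = [q0]
    for pi in word:
        qs.append(delta.get((qs[-1], frozenset(pi)), qs[-1]))  # stay put if no edge listed
    return qs

word = [set(), set(), {"goal"}]                  # labels at the crossing times
print(run(word))                                 # ['q0', 'q0', 'q0', 'qF']
print(run(word)[-1] in accepting)                # True: this sample path satisfies phi
```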


Figure 3-1: Product Space as a set of continuous layers, corresponding to the discrete modes in 𝒜

3.1.2 Feedback control policies

A control policy is a mapping 𝜇 : 𝑆 → 𝑈 that assigns a control input to each product state. The product process under the influence of policy 𝜇, denoted by {𝑠𝜇(𝑡) : 𝑡 ≥ 0},

is obtained by setting the input 𝑢(𝑡) = 𝜇(𝑠𝜇(𝑡)) for all 𝑡 ∈ R+ in Equation (2.1).

The first entry time for policy 𝜇 is defined as the first time that the product process hits the boundary of ℱ, i.e.,

𝑇𝜇 := inf{𝑡 : 𝑠𝜇(𝑡) ∈ 𝜕ℱ × 𝐹}.

The expected cost-to-go function under policy 𝜇 is a mapping 𝐽𝜇 : 𝑆 → R:

𝐽𝜇(𝑠0) = E[ ∫_0^{𝑇𝜇} 𝑔(𝑠𝜇(𝑡)) 𝑑𝑡 + ℎ(𝑠𝜇(𝑇𝜇)) | 𝑠𝜇(0) = 𝑠0 ],     (3.1)

where 𝑔 : 𝑆 → R+ and ℎ : 𝑆 → R are the stage cost function and terminal cost

function, respectively. The optimal cost-to-go function maps each 𝑠 ∈ 𝑆 to the minimum cost-to-go at 𝑠 over the set of all proper policies, i.e.,

𝐽*(𝑠0) = inf_𝜇 𝐽𝜇(𝑠0), for all 𝑠0 ∈ 𝑆.

An optimal policy 𝜇* is one that achieves the optimal cost-to-go function, i.e., 𝐽𝜇*(𝑠) = 𝐽*(𝑠) for all 𝑠 ∈ 𝑆.
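Operationally, Equation (3.1) can be estimated by simulating the closed-loop process until the first entry time and averaging the accumulated cost. The sketch below does this for a hypothetical one-dimensional system; the drift, noise level, policy, and terminal set are placeholders and do not come from the benchmarks studied later.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, sigma = 0.01, 0.2

def mu(x):                    # placeholder feedback policy
    return -1.0 if x > 0 else 1.0

def step(x, u):               # Euler-Maruyama step of dx = u dt + sigma dW (stand-in for Eq. 2.1)
    return x + u * dt + sigma * np.sqrt(dt) * rng.standard_normal()

def cost_to_go(x0, g=lambda s: 1.0, h=lambda s: 0.0, n_samples=500, t_max=50.0):
    # Monte Carlo estimate of Eq. (3.1): E[ int_0^T g dt + h(s(T)) | s(0) = x0 ],
    # where T is the first time |x| <= 0.05 (a stand-in for the terminal set).
    totals = []
    for _ in range(n_samples):
        x, t, J = x0, 0.0, 0.0
        while abs(x) > 0.05 and t < t_max:
            J += g(x) * dt
            x = step(x, mu(x))
            t += dt
        totals.append(J + h(x))
    return np.mean(totals)

print(cost_to_go(1.0))        # roughly the expected time to reach the terminal set
```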


We are interested in the following problem.

Problem 1. Given the following:

∙ (𝑋, 𝑈, 𝑏, 𝐹 ): a continuous-time continuous-space stochastic dynamical system;
∙ 𝐴𝑃 : a set of atomic propositions;

∙ 𝐿: a labeling function that maps each state 𝑧 ∈ 𝑋 to the set of all atomic propositions that hold at 𝑧;

∙ 𝜙: an sc-LTL formula over 𝐴𝑃 ;

∙ (𝑔, ℎ): a pair of stage cost and terminal cost functions;

compute an optimal policy 𝜇*.


Chapter 4

Proposed Algorithm

In this chapter, we introduce a framework for approximating the continuous-time, continuous-space problem, formulate the discrete approximation as a stochastic shortest path problem, and then give a formal description of our algorithm.

4.1 Consistent discretization of stochastic optimal control problems

First, we discuss numerical methods that are used to solve continuous-time, continuous-space stochastic optimal control problems. These numerical methods discretize Equation 2.1 to obtain a discrete-time, discrete-space MDP, which can be solved using standard techniques, such as the value or policy iteration algorithms.

Now, we focus on the continuous-time, continuous-space stochastic optimal control problem and introduce a consistent discretization method from the literature. Recall that a continuous-time, continuous-space stochastic dynamical system is described by the tuple (𝑋, 𝑈, 𝑏, 𝐹 ); see Section 2.2. This dynamical system can be approximated arbitrarily well by a sequence of discrete MDPs, which we describe next.

Recall Definition 6 of an MDP in Section 2.2. Let ℎ𝑙 > 0 be a sequence of real numbers such that lim𝑙→∞ ℎ𝑙 = 0, and let 𝑋𝑙 ⊂ 𝑋 be a finite set of states. Consider a sequence of MDPs 𝑀𝑙 = (𝑋𝑙, 𝑈, 𝑃𝑙, 𝐹𝑙, 𝐺𝑙, 𝐻𝑙), indexed by 𝑙 ∈ ℕ, where 𝑋𝑙 is a set of states, 𝑈 is a set of

control actions, 𝑃𝑙 : 𝑋𝑙 × 𝑈 × 𝑋𝑙 → [0, 1] is the transition probability function,

𝐹𝑙 ⊆ 𝑋𝑙 is a set of terminal states, 𝐺𝑙 is the stage cost, and 𝐻𝑙 is the terminal cost.

Let {𝜉^𝑙_𝑖 : 𝑖 ∈ ℕ} denote the evolution of the state of MDP 𝑙. We define a sequence of holding times as a sequence of functions {∆𝑡^𝑙 : 𝑙 ∈ ℕ}, where ∆𝑡^𝑙 : 𝑋𝑙 × 𝑈 → ℝ_+ for all 𝑙 ∈ ℕ. The sequence of holding times allows us to generate a continuous-time state evolution from the discrete-time state evolution of the MDPs. More precisely, given the state evolution of a sequence of MDPs, i.e., {𝜉^𝑙_𝑖 : 𝑖 ∈ ℕ}, and a sequence of holding times, i.e., {∆𝑡^𝑙 : 𝑙 ∈ ℕ}, we construct a sequence of continuous-time trajectories, denoted by {𝜉^𝑙 : 𝑙 ∈ ℕ}, as follows:

𝜉^𝑙(𝜏) = 𝜉^𝑙_𝑛, for all 𝜏 ∈ [𝑡^𝑙_𝑛, 𝑡^𝑙_𝑛 + ∆𝑡^𝑙(𝜉^𝑙_𝑛, 𝑢(𝑡^𝑙_𝑛))),

where 𝑡^𝑙_𝑛 = ∑_{𝑖=1}^{𝑛−1} ∆𝑡^𝑙(𝜉^𝑙_𝑖, 𝑢(𝑡^𝑙_𝑖)). Notice that, for each 𝑙 ∈ ℕ, we have 𝜉^𝑙(𝑡) ∈ 𝑋𝑙, and {𝜉^𝑙(𝑡) : 𝑡 ≥ 0} is a stochastic process in its own right.
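The construction of 𝜉^𝑙 from a discrete sample path and its holding times is simply a piecewise-constant interpolation; a minimal sketch (with a hypothetical sample path and holding times) is:

```python
import numpy as np

def interpolate(xi, dts):
    # xi:  discrete MDP states xi^l_1, ..., xi^l_N along one sample path
    # dts: holding times Delta t^l(xi^l_i, u(t^l_i)) for each step
    # Returns tau -> xi^l(tau), piecewise constant on [t^l_n, t^l_n + dt_n).
    t = np.concatenate([[0.0], np.cumsum(dts)])        # switching times t^l_n
    def xi_l(tau):
        n = np.searchsorted(t, tau, side="right") - 1   # interval containing tau
        return xi[min(n, len(xi) - 1)]                  # last state persists afterwards
    return xi_l

# Hypothetical sample path with state-dependent holding times.
path = np.array([0.0, 0.1, 0.25, 0.4])
hold = np.array([0.05, 0.05, 0.1])
traj = interpolate(path, hold)
print(traj(0.0), traj(0.07), traj(0.19))    # 0.0  0.1  0.25
```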

Let {𝑢(𝑡) : 𝑡 ≥ 0} be an input process and 𝑥0 ∈ 𝑋 be an initial state. Let

{𝑥(𝑡) : 𝑡 ≥ 0} denote the resulting state process for the continuous-time stochastic dynamic system in Equation 2.1. Let {𝜉𝑙(𝑡) : 𝑡 ≥ 0} denote the stochastic process

obtained by applying inputs 𝑢(𝜏) at time 𝜏 ∈ [𝑡^𝑙_𝑛, 𝑡^𝑙_𝑛 + ∆𝑡^𝑙(𝜉^𝑙_𝑛, 𝑢(𝑡^𝑙_𝑛))).

A classical result by Kushner and coworkers [35] guarantees the consistency of discretizations if a set of conditions, often called the local consistency conditions, is satisfied.

Definition 7 (Local Consistency Conditions). A sequence of MDPs 𝑀𝑙 = (𝑋𝑙, 𝑈, 𝑃𝑙, 𝐹𝑙, 𝐺𝑙, 𝐻𝑙) and a sequence of holding times {∆𝑡^𝑙 : 𝑙 ∈ ℕ}, both indexed by 𝑙 ∈ ℕ, are said to be locally consistent with the continuous-time stochastic dynamics of Equation 2.1 if the following are satisfied:

(i) lim_{𝑙→∞} sup_{𝑖∈ℕ, 𝜔∈Ω} ‖𝜉^𝑙_{𝑖+1} − 𝜉^𝑙_𝑖‖ = 0;

(ii) for all 𝑧 ∈ 𝑋𝑙 and 𝑢 ∈ 𝑈:

    lim_{𝑙→∞} ∆𝑡^𝑙(𝑧, 𝑢) = 0,

    lim_{𝑙→∞} E_{𝑃𝑙}[𝜉^𝑙_{𝑖+1} − 𝜉^𝑙_𝑖 | 𝜉^𝑙_𝑖 = 𝑧, 𝑢^𝑙_𝑖 = 𝑢] / ∆𝑡^𝑙(𝑧, 𝑢) = 𝑏(𝑧, 𝑢),

    lim_{𝑙→∞} Cov_{𝑃𝑙}[𝜉^𝑙_{𝑖+1} − 𝜉^𝑙_𝑖 | 𝜉^𝑙_𝑖 = 𝑧, 𝑢^𝑙_𝑖 = 𝑢] / ∆𝑡^𝑙(𝑧, 𝑢) = 𝐹(𝑧, 𝑢)𝐹(𝑧, 𝑢)^𝑇.

Roughly speaking, a discrete MDP satisfies Definition 7 if the expectation and covariance of the difference between random states in a sample path converge to the original drift and covariance matrix of the system in Equation 2.1, respectively.
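As a sanity check on condition (ii), one can compute the one-step mean and variance of a candidate chain and compare them against 𝑏(𝑧, 𝑢) and 𝐹𝐹^𝑇. The sketch below does this for a hypothetical scalar system and a simple Euler-style two-point chain, which is locally consistent by construction; the drift and noise level are placeholders.

```python
import numpy as np

def euler_chain(z, u, b, sigma, dt):
    # A simple locally consistent family: from z, jump to
    #   z + b(z,u)*dt + sigma*sqrt(dt)  or  z + b(z,u)*dt - sigma*sqrt(dt),
    # each with probability 1/2, with holding time dt.
    succ = [z + b(z, u) * dt + sigma * np.sqrt(dt),
            z + b(z, u) * dt - sigma * np.sqrt(dt)]
    return succ, [0.5, 0.5], dt

def moments(z, u, b, sigma, dt):
    succ, p, dt = euler_chain(z, u, b, sigma, dt)
    inc = np.array(succ) - z                 # one-step increments xi_{i+1} - xi_i
    mean = np.dot(p, inc)
    var = np.dot(p, inc**2) - mean**2
    return mean / dt, var / dt               # should approach b(z,u) and sigma^2

b = lambda z, u: -z + u                      # hypothetical scalar drift
for dt in [0.1, 0.01, 0.001]:
    print(dt, moments(1.0, 0.2, b, 0.3, dt))  # -> (-0.8, 0.09), with jump sizes shrinking
```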

A sequence of MDPs and holding times is said to be consistent if it satisfies the local consistency conditions. Once a consistent discretization is obtained, we solve the discrete MDPs by minimizing:

𝐽^𝑙(𝑧) = min_𝜇 E[ ∑_{𝑖=1}^{𝑁𝑙} 𝐺(𝜉^𝑙_𝑖) + 𝐻(𝜉^𝑙_{𝑁𝑙}) ],

where 𝑁𝑙 is the first time the state hits 𝐹𝑙, i.e., 𝑁𝑙 = inf{𝑖 : 𝜉^𝑙_𝑖 ∈ 𝐹𝑙}.
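This is a standard stochastic shortest path problem, so once 𝑃𝑙, 𝐺𝑙, and 𝐻𝑙 are tabulated it can be solved by value iteration. The sketch below runs value iteration on a small hypothetical MDP (a one-dimensional controlled random walk); the model is a placeholder, not one of the systems studied later.

```python
import numpy as np

def value_iteration(P, G, H, terminal, n_iter=500):
    # P[a]: |X| x |X| transition matrix for action a; G[a]: stage cost of a;
    # H: terminal cost; terminal: boolean mask of F_l. Bellman backup:
    #   J(z) = min_a [ G(z,a) + sum_z' P(z'|z,a) J(z') ],  with J(z) = H(z) on F_l.
    J = np.where(terminal, H, 0.0)
    for _ in range(n_iter):
        Q = G + P @ J                                  # shape (n_actions, n_states)
        J_new = np.where(terminal, H, Q.min(axis=0))
        if np.max(np.abs(J_new - J)) < 1e-9:
            break
        J = J_new
    return J, Q.argmin(axis=0)                         # cost-to-go and greedy policy

# Hypothetical 1-D random walk on 21 grid points; the goal is the rightmost state.
n, steps = 21, [-1, +1]
P = np.zeros((2, n, n)); G = np.full((2, n), 1.0); H = np.zeros(n)
terminal = np.zeros(n, dtype=bool); terminal[-1] = True
for a, step in enumerate(steps):
    for z in range(n):
        P[a, z, np.clip(z + step, 0, n - 1)] += 0.8    # intended move
        P[a, z, np.clip(z - step, 0, n - 1)] += 0.2    # slip in the other direction
J, pi = value_iteration(P, G, H, terminal)
print(J[0], pi[:5])   # expected number of steps from the left end; policy says "move right"
```

The same backup is what the proposed algorithm performs, except that 𝐽 is stored and updated in a compressed tensor-train format rather than as a dense table.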

If the conditions above are met, then Theorem 1 below guarantees that these solutions converge to the solution of Problem 1.

Theorem 1 (Kushner et al. [35]). Suppose the sequence of MDPs 𝑀𝑙 and the sequence of holding times {∆𝑡^𝑙 : 𝑙 ∈ ℕ}, both indexed by 𝑙 ∈ ℕ, are locally consistent with the continuous-time stochastic dynamics of Equation 2.1. Then, the stochastic process {𝜉^𝑙(𝑡) : 𝑡 ≥ 0} converges to the state process {𝑥(𝑡) : 𝑡 ≥ 0} as 𝑙 tends to infinity. Furthermore, the optimal cost-to-go function 𝐽^𝑙 for the MDPs 𝑀𝑙 converges to the optimal cost-to-go function 𝐽* of the continuous-time stochastic optimal control problem, i.e.,

lim_{𝑙→∞} ‖𝐽^𝑙 − 𝐽*‖ = 0.

As an example, one valid discretization can be based on an upwind scheme. Let 𝑒1, . . . , 𝑒𝑑 be the unit vectors in ℝ^𝑑 and assume 𝐹 is a diagonal matrix.
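A sketch of one such upwind construction, in the style of Kushner and Dupuis [35], is given below for diagonal 𝐹; the normalization and the example dynamics are illustrative assumptions and may differ in detail from the discretization used in this thesis.

```python
import numpy as np

def upwind_mdp(z, u, b, sigma, h):
    # Upwind transitions for dx = b(x,u) dt + F dW with F = diag(sigma):
    # from grid point z, move to z +/- h*e_j with
    #   p(z +/- h e_j) = (sigma_j^2/2 + h * b_j^{+/-}) / Q,   Q = sum_j (sigma_j^2 + h|b_j|),
    # and holding time Delta t(z,u) = h^2 / Q. This satisfies Definition 7 as h -> 0.
    drift = b(z, u)                                    # shape (d,)
    Q = np.sum(sigma**2 + h * np.abs(drift))           # normalizing constant
    dt = h**2 / Q                                      # holding time
    succ, prob = [], []
    for j in range(len(z)):
        e = np.zeros_like(z); e[j] = h
        succ += [z + e, z - e]
        prob += [(sigma[j]**2 / 2 + h * max(drift[j], 0.0)) / Q,
                 (sigma[j]**2 / 2 + h * max(-drift[j], 0.0)) / Q]
    return succ, np.array(prob), dt

# Hypothetical 2-D single integrator with control u and per-axis noise levels sigma.
b = lambda z, u: u
succ, prob, dt = upwind_mdp(np.zeros(2), np.array([0.5, -0.25]), b,
                            np.array([0.3, 0.3]), h=0.05)
print(prob, prob.sum(), dt)    # transition probabilities sum to 1
```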

