HAL Id: hal-03283075

https://hal.archives-ouvertes.fr/hal-03283075v2

Submitted on 29 Jan 2022


Optimistic Planning Algorithms For State-Constrained Optimal Control Problems

Olivier Bokanowski, Nidhal Gammoudi, Hasnaa Zidani

To cite this version:

Olivier Bokanowski, Nidhal Gammoudi, Hasnaa Zidani. Optimistic Planning Algorithms For State-Constrained Optimal Control Problems. Computers & Mathematics with Applications, Elsevier, 2022, 109 (1), pp. 158-179. ⟨hal-03283075v2⟩


Optimistic Planning Algorithms For State-Constrained Optimal Control Problems

Olivier Bokanowski

Université de Paris, LJLL, F-75013 Paris, France and Sorbonne Université, CNRS, LJLL, F-75005 Paris, France

Nidhal Gammoudi

Unité de Mathématiques Appliquées (UMA), Ensta Paris, 828 Bd des Maréchaux, 91762 Palaiseau, France

Hasnaa Zidani

Normandie Univ, INSA Rouen Normandie, Laboratoire de Mathématiques Appliquées (LMI), 76000 Rouen, France

Email addresses: olivier.bokanowski@math.univ-paris-diderot.fr (Olivier Bokanowski), Nidhal.Gammoudi@ensta-paris.fr (Nidhal Gammoudi), Hasnaa.Zidani@insa-rouen.fr (Hasnaa Zidani)

Abstract

In this work, we study optimistic planning methods to solve some state-constrained optimal control problems in finite horizon. While classical methods for calculating the value function are generally based on a discretization in the state space, optimistic planning algorithms have the advantage of using adaptive discretization in the control space. These approaches are therefore very suitable for control problems where the dimension of the control variable is low, and they make it possible to deal with problems where the dimension of the state space can be very high. Our algorithms also have the advantage of providing, for given computing resources, the best control strategy whose performance is as close as possible to optimality and whose corresponding trajectory complies with the state constraints up to a given accuracy.

Keywords: Optimistic planning algorithm, optimal control problems, state constraints, piecewise control policies

2010 MSC: 49M30, 49M25, 65K05

1. Introduction

This paper concerns a numerical approach for solving some state-constrained non-linear optimal control problems. Let $T > 0$ be a fixed final horizon, and let $A$ be a given non-empty compact subset of a metric space. Consider two closed subsets $\mathcal{K}$ and $\mathcal{C}$ of $\mathbb{R}^d$ ($d \ge 1$). For $x \in \mathbb{R}^d$, the control problem is the following:

$$\vartheta(x) := \inf\Big\{\int_0^T \ell(y_x^\alpha(s),\alpha(s))\,ds + \Phi(y_x^\alpha(T)) \ \Big|\ \alpha(s)\in A,\ y_x^\alpha(s)\in\mathcal{K} \text{ for } s\in[0,T], \text{ and } y_x^\alpha(T)\in\mathcal{C}\Big\}, \qquad (1)$$

where the state $y_x^\alpha(\cdot)$ and the control $\alpha(\cdot)$ are related by the following non-linear differential equation:

$$\dot{y}(s) = f(y(s),\alpha(s)) \quad \text{a.e. } s\in(0,T), \qquad y(0) = x. \qquad (2)$$

In this problem, the functions $f : \mathbb{R}^d\times A\to\mathbb{R}^d$, $\ell : \mathbb{R}^d\times A\to\mathbb{R}$ and $\Phi : \mathbb{R}^d\to\mathbb{R}$ are Lipschitz continuous (the precise assumptions are given in Section 2). Several mathematical and numerical contributions have been devoted to the analysis and approximation of the solutions of (1). Most of the classical optimisation-based methods can only guarantee the computation of local solutions, but there are also some mathematical methods that aim at computing global solutions. Among these methods, the dynamic programming approach gives a mathematical characterization of the value function $\vartheta$ and provides the optimal control law in feedback form, which is interesting for many engineering applications. However, the numerical cost of this approach restricts its application to problems where the state space is of small dimension.

The approach that we investigate in the present work is based on optimistic planning (OP) algorithms [8, 9].

These algorithms use a discretization in the space of controls (and do not require any discretization of the state space). This approach is very interesting especially for applications where the control dimension $q$ is much lower than the state dimension $d$. On a given discretization of the time interval, our approach seeks to identify the best control strategy to apply on each time sub-interval. The OP methods perform the optimal control search by branch and bound on the control set, always refining the region with the best lower bound on the optimal value (this is what justifies the label "optimistic"). An interesting feature of these algorithms is the close relationship between computational resources and quasi-optimality, which exploits some ideas from reinforcement learning [18]. Indeed, for given computational resources, the OP approaches provide a sub-optimal strategy whose cost is as close as possible to the optimal value (the measure of "closeness to the optimum" will be specified in Section 3).

Many OP variants have been proposed in the literature (see [7, 13, 14, 15] and the references therein), with mainly heuristic rules for the refinement and without convergence analysis. In [8, 9], some OP methods are proposed for continuous actions. These methods use selection rules for adaptive refinement of the control. A rigorous analysis of the convergence and of the complexity of the methods is also presented in [8], within the framework of an infinite-horizon problem with a discount factor and without state constraints.

Our main contribution is to extend some optimistic planning algorithms to solve finite-horizon control problems in the presence of state constraints. We first use some ideas introduced in [2, 4] to reformulate the original control problem as an auxiliary problem without state constraints. In this auxiliary problem, the set of controls remains the same; however, the dimension of the state space is increased by one component. We adapt two algorithms introduced in [9, 8] (OPC, "Optimistic Planning with Continuous actions", and SOP (or SOPC), "Simultaneous OPC") to solve the auxiliary control problem and then to get an approximation of the optimal control strategy for the original state-constrained problem. We show that, in this framework, we can obtain convergence results similar to those established in [9]. Besides, we improve the analysis of the complexity of these algorithms and provide simplified proof arguments for this analysis. Furthermore, we propose an algorithm which combines the optimistic planning approach with ideas from MPC (Model Predictive Control). This algorithm gives numerical results similar to those of SOP but reduces the computational time significantly. We illustrate the relevance of the OP algorithms on several non-linear optimal control problems (in one of these examples, the dimension of the state is 10^3). Furthermore, as the OP algorithm applies to discrete-time optimal control problems, we give here an estimate of the error between the continuous-time optimal control problem and its discrete-time approximation. We show that, for a large class of optimal control problems, this error is of order $O(1/N)$, where $N$ is the number of time steps (see Appendix B for a precise statement).

Notations. Throughout this paper, $\mathbb{R}$ denotes the set of real numbers and $\|\cdot\|$ denotes the Euclidean norm on $\mathbb{R}^r$ (for any $r \ge 1$). For any set $S \subseteq \mathbb{R}^N$, $\partial S$ denotes its boundary. The distance function to $S$ is $\mathrm{dist}(x,S) := \inf\{\|x-y\| : y\in S\}$. For any $a_1,\dots,a_q\in\mathbb{R}$, the notation $\bigvee_{i=1}^{q} a_i$ stands for $\max(a_1,\dots,a_q)$. For any finite set $U$, $|U|$ is the cardinality of $U$. Finally, the abbreviation "w.r.t." stands for "with respect to", and "a.e." means "almost everywhere".

2. State-constrained control problems.

Let $T > 0$ be a fixed final time, and let $A$ be a given non-empty compact subset of $\mathbb{R}^q$ ($q \ge 1$). Consider the controlled system:

$$\dot{y}(s) = f(y(s),\alpha(s)) \quad \text{a.e. } s\in(0,T), \qquad y(0) = x, \qquad (3)$$

where $\alpha(\cdot)$ is an admissible control in

$$\mathcal{A}_{ad} := \big\{\alpha : (0,T)\to\mathbb{R}^q \text{ measurable},\ \alpha(t)\in A \text{ a.e.}\big\},$$

and the dynamics $f : \mathbb{R}^d\times A\to\mathbb{R}^d$ is a Lipschitz continuous function satisfying:

(A1) There exist $L_{f,x} > 0$ and $L_{f,a} > 0$ such that, for any $x,y\in\mathbb{R}^d$ and all $a,a'\in A$:

$$\|f(x,a) - f(y,a')\| \le L_{f,x}\|x-y\| + L_{f,a}\|a-a'\|.$$

Under assumption (A1), for any $\alpha\in\mathcal{A}_{ad}$ and any $x\in\mathbb{R}^d$, there exists a unique absolutely continuous trajectory $y = y_x^\alpha$ satisfying (3). Let $\mathcal{K}$ and $\mathcal{C}$ be two non-empty closed subsets of $\mathbb{R}^d$. A trajectory $y_x^\alpha$ is said to be feasible (on the time interval $(0,T)$) if

$$y_x^\alpha(s)\in\mathcal{K} \ \text{ for all } s\in(0,T), \quad\text{and}\quad y_x^\alpha(T)\in\mathcal{C}. \qquad (4)$$

Now, consider two Lipschitz continuous functions $\ell : \mathbb{R}^d\times A\to\mathbb{R}$ and $\Phi : \mathbb{R}^d\to\mathbb{R}$ satisfying:

(A2) There exist $L_{\ell,x}\ge 0$ and $L_{\ell,a}\ge 0$ such that for any $x,y\in\mathbb{R}^d$ and any $a,a'\in A$:

$$|\ell(x,a) - \ell(y,a')| \le L_{\ell,x}\|x-y\| + L_{\ell,a}\|a-a'\|.$$

(A3) There exists $L_\Phi\ge 0$ such that for any $x,y\in\mathbb{R}^d$: $|\Phi(x)-\Phi(y)|\le L_\Phi\|x-y\|$.

The state-constrained control problem is formulated as follows:

$$\vartheta(x) := \inf\Big\{\int_0^T \ell(y_x^\alpha(s),\alpha(s))\,ds + \Phi(y_x^\alpha(T)) \ \Big|\ \alpha\in\mathcal{A}_{ad},\ y_x^\alpha(s)\in\mathcal{K}\ \forall s\in[0,T], \text{ and } y_x^\alpha(T)\in\mathcal{C}\Big\}, \qquad (5)$$

with the convention $\inf\emptyset = +\infty$. In general, when $\mathcal{K}\ne\mathbb{R}^d$ or $\mathcal{C}\ne\mathbb{R}^d$, it may happen that the set of feasible trajectories is empty, in which case the function $\vartheta$ takes an infinite value. Indeed, unless some controllability assumptions are satisfied, there may be no control input $\alpha$ that can act on the system and force it to comply with the state constraints [21, 12]. So, in general, it is not an easy task to know a priori whether the value $\vartheta(x)$ is finite or not. The approach that we use hereafter is based on an idea introduced in [2]. It provides the value $\vartheta(x)$ and an optimal trajectory associated to (5) (if any). When the value $\vartheta(x)$ is infinite (i.e., when the set of feasible trajectories is empty), our approach computes trajectories which minimize the cost and remain as close as possible to the constraint sets.

Let us consider two Lipschitz continuous functions $g$ and $\Psi$ (with Lipschitz constants $L_g$ and $L_\Psi$) verifying the following property:

$$\forall y\in\mathbb{R}^d,\qquad g(y)\le 0 \iff y\in\mathcal{K} \qquad\text{and}\qquad \Psi(y)\le 0 \iff y\in\mathcal{C}. \qquad (6)$$

Since $\mathcal{K}$ and $\mathcal{C}$ are closed sets, the existence of Lipschitz functions satisfying (6) is guaranteed. For instance,

$g$ and $\Psi$ can be chosen, respectively, as the signed distance to $\mathcal{K}$ and the signed distance to $\mathcal{C}$:

$$d_{\mathcal{K}}(x) := \begin{cases} \operatorname{dist}(x,\partial\mathcal{K}) & \text{if } x\notin\mathcal{K},\\ -\operatorname{dist}(x,\partial\mathcal{K}) & \text{if } x\in\mathcal{K},\end{cases} \qquad\text{and}\qquad d_{\mathcal{C}}(x) := \begin{cases} \operatorname{dist}(x,\partial\mathcal{C}) & \text{if } x\notin\mathcal{C},\\ -\operatorname{dist}(x,\partial\mathcal{C}) & \text{if } x\in\mathcal{C}.\end{cases}$$

The auxiliary control problem is defined, for $(x,z)\in\mathbb{R}^d\times\mathbb{R}$, as follows (with $a\vee b\vee c := \max(a,b,c)$):

$$w(x,z) := \inf_{\alpha\in\mathcal{A}_{ad}} \Big\{\Big[\int_0^T \ell(y_x^\alpha(s),\alpha(s))\,ds + \Phi(y_x^\alpha(T)) - z\Big] \vee \max_{s\in[0,T]} g(y_x^\alpha(s)) \vee \Psi(y_x^\alpha(T))\Big\}. \qquad (7)$$

Let us emphasize that the auxiliary control problem is free of state constraints. The cost function in (7) is called the improvement function. Similar auxiliary problems have been introduced and analyzed in the context of finite-dimensional optimization problems in [17, 19] (see also [3]). For continuous-time control problems, the properties of the auxiliary problem have been established in [2, 4]. Among other things, it can be proved that a couple $(\bar y,\bar u)$ is an optimal solution of (5) if and only if it is also an optimal solution of (7) with $z = \vartheta(x)$. Moreover, the value $\vartheta(x)$ can be recovered from $w(x,\cdot)$ (see [2, 4] for a precise statement).

Throughout the paper, we will assume that assumptions (A1)-(A3) are satisfied.

3. Discrete time setting

First, for $N\ge 1$, consider a uniform partition of $[0,T]$ with $N+1$ time steps: $t_k = k\Delta t$, $k = 0,\dots,N$, where $\Delta t = T/N$ is the step size. We introduce a discrete dynamics defined as follows (by the Heun scheme):

$$F(x,a) := x + \frac{\Delta t}{2}\Big(f(x,a) + f\big(x + \Delta t\, f(x,a),\ a\big)\Big). \qquad (8)$$

For a sequence of actions $a = (a_k)_{0\le k\le N-1}\in A^N$, we consider the discrete-time dynamical system:

$$y_0 = x, \qquad y_{k+1} = F(y_k,a_k), \quad k = 0,\dots,N-1. \qquad (9)$$

For every $x\in\mathbb{R}^d$ and every sequence $a = (a_k)_k\in A^N$, equation (9) admits a unique solution, denoted $y^{x,a} = (y_k^{x,a})_{k=0}^N$. This trajectory is said to be feasible if

$$y_k^{x,a}\in\mathcal{K} \ \text{ for all } k = 0,\dots,N, \quad\text{and}\quad y_N^{x,a}\in\mathcal{C}.$$

Consider also an instantaneous cost function $\rho$ that approximates the integral of $\ell$ over an interval $[t_k,t_{k+1}]$, for $k = 0,\dots,N-1$, by a trapezoidal quadrature rule, as follows:

$$\rho(x,a) := \frac{\Delta t}{2}\Big(\ell(x,a) + \ell(F(x,a),a)\Big).$$
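The following lines give a minimal Python sketch of the discrete dynamics (8) and of the cost $\rho$; the callables `f` and `ell` (for $\ell$) are user-supplied, and the function names are ours:

```python
def heun_step(x, a, f, dt):
    """Discrete dynamics F of (8): one Heun (RK2) step of size dt."""
    fx = f(x, a)
    return x + 0.5 * dt * (fx + f(x + dt * fx, a))

def running_cost(x, a, f, ell, dt):
    """Instantaneous cost rho: trapezoidal approximation of the integral of ell."""
    return 0.5 * dt * (ell(x, a) + ell(heun_step(x, a, f, dt), a))
```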

One can easily check that, under assumptions (A1) and (A2), the functions $F$ and $\rho$ are Lipschitz continuous:

$$\|F(x,a) - F(x',a')\| \le L_{F,x}\|x-x'\| + L_{F,a}\|a-a'\|,$$
$$|\rho(x,a) - \rho(x',a')| \le L_{\rho,x}\|x-x'\| + L_{\rho,a}\|a-a'\|,$$

for every $x,x'\in\mathbb{R}^d$ and $a,a'\in A$, where

$$L_{F,x} := 1 + \Delta t\, L_{f,x}\Big(1 + \frac{\Delta t}{2} L_{f,x}\Big), \qquad L_{F,a} := \Delta t\, L_{f,a}\Big(1 + \frac{\Delta t}{2} L_{f,x}\Big), \qquad (10a)$$

$$L_{\rho,x} := \frac{\Delta t}{2}\, L_{\ell,x}\big(1 + L_{F,x}\big), \qquad L_{\rho,a} := \Delta t\,\Big(L_{\ell,a} + \frac{1}{2} L_{\ell,x} L_{F,a}\Big). \qquad (10b)$$

Note that other approximations of the dynamics and of the integral term would lead to estimates similar to (10). The approximations presented in this section are the ones that will be used in the numerical simulations of Section 5. Now, we define the state-constrained optimal control problem as follows:

$$V(x) := \inf_{a=(a_k)_k\in A^N}\Big\{\sum_{k=0}^{N-1}\rho(y_k^{x,a},a_k) + \Phi(y_N^{x,a}) \ \Big|\ y_k^{x,a}\in\mathcal{K}\ \forall k = 0,\dots,N, \text{ and } y_N^{x,a}\in\mathcal{C}\Big\}. \qquad (11)$$

Then, for the discrete auxiliary control problem, we introduce the cost functional $J$ defined by

$$J(x,z,a) := \Big[\sum_{k=0}^{N-1}\rho(y_k^{x,a},a_k) + \Phi(y_N^{x,a}) - z\Big] \vee \max_{0\le k\le N} g(y_k^{x,a}) \vee \Psi(y_N^{x,a}) \qquad (12)$$

(for $(x,z,a=(a_k))\in\mathbb{R}^d\times\mathbb{R}\times A^N$), and the corresponding value is defined as follows, for $(x,z)\in\mathbb{R}^d\times\mathbb{R}$:

$$W(x,z) := \inf_{a=(a_k)_k\in A^N} J(x,z,a). \qquad (13)$$
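As a sketch, the functional (12) can be evaluated by a simple rollout of (9); this builds on the helpers `heun_step` and `running_cost` above, and all names are illustrative:

```python
def evaluate_J(x, z, controls, f, ell, Phi, g, Psi, dt):
    """Auxiliary cost J(x, z, a) of (12) for a control sequence a = (a_0,...,a_{N-1})."""
    y = x
    cost = 0.0
    g_max = g(y)                       # running maximum of g along the trajectory
    for a in controls:                 # rollout of the discrete dynamics (9)
        cost += running_cost(y, a, f, ell, dt)
        y = heun_step(y, a, f, dt)
        g_max = max(g_max, g(y))
    return max(cost + Phi(y) - z, g_max, Psi(y))
```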

It is worth mentioning that $W(x,z)$ converges to $w(x,z)$ as $N\to\infty$ (i.e., $\Delta t = T/N\to 0$). For Bolza-type control problems (i.e., when there are no constraint functions $g$ and $\Psi$), an error estimate of order $O(1/\sqrt{N})$ has been established in [11]. This bound is of order $O(1/N)$ under a semi-concavity assumption on the functions $f$, $\ell$ and $\Phi$ (but still for the Bolza problem, so without the functions $g$ and $\Psi$), see [20]. In our case, the auxiliary control problem (13) cannot be recast into the framework of [11, 20]. Besides, extending the arguments developed in [20] to our setting would require very restrictive assumptions on the functions $g$ and $\Psi$ (which would induce restrictive assumptions on the set of state constraints). In Appendix B, we will prove that the error between $w(x,z)$ and $W(x,z)$ is of order $O(1/N)$ under a convexity assumption (see (A4) in Appendix B) and without any additional requirement on $\Phi$, $g$ and $\Psi$. Furthermore, note that by using arguments similar to those of [4], one can also show that the sequence of discrete-time optimal trajectories of (13) (for $N\in\mathbb{N}$) provides convergent approximations of optimal trajectories of the continuous problem (7).

From now on, we will focus on algorithms for solving the discrete auxiliary control problem (13) and the original constrained problem (11). By using arguments similar to those of [4], one can prove the following result.

Proposition 3.1. We have:

$$V(x) = \inf\{z \mid W(x,z)\le 0\}. \qquad (14)$$

Furthermore, a pair $(\bar y,\bar a)$ is optimal for problem (11) if and only if $(\bar y,\bar a)$ is optimal for problem (13) with $z = V(x)$.

Remark 3.2. A classical approach to take the constraints in (11) into account consists in approximating $V(x)$ by a penalized problem as follows:

$$V_\varepsilon(x) := \inf_{a=(a_k)_k\in A^N} J_\varepsilon(x,a), \quad\text{with}\quad J_\varepsilon(x,a) := \sum_{k=0}^{N-1}\rho(y_k^{x,a},a_k) + \Phi(y_N^{x,a}) + \frac{1}{\varepsilon}\Big(\sum_{k=0}^{N} g(y_k^{x,a})^+ + \Psi(y_N^{x,a})^+\Big).$$

The advantage of the penalized problem is that it does not increase the dimension of the state variable. However, the Lipschitz constant of the penalized criterion $J_\varepsilon$ involves a term in $1/\varepsilon$ which tends to infinity when the penalization parameter $\varepsilon$ goes to 0. As will be shown in the paper, the Lipschitz constants are used in the elaboration of the optimistic algorithms and in their complexity analysis. Let us emphasize that the auxiliary problem involves a cost function whose Lipschitz constant does not explode, which makes it more suitable than the classical penalization approach for using optimistic methods while taking the state constraints into account.

Now, let us state a preliminary result that will be useful in the rest of the paper.

Proposition 3.3. For any $(x,z)\in\mathbb{R}^d\times\mathbb{R}$ and $a = (a_k)_k$, $\bar a = (\bar a_k)_k\in A^N$, the following estimate is satisfied:

$$|J(x,z,a) - J(x,z,\bar a)| \le \Big[\sum_{k=0}^{N-1}\beta_k\|a_k-\bar a_k\|\Big] \vee \Big[\sum_{k=0}^{N-1}\gamma_k\|a_k-\bar a_k\|\Big], \qquad (15)$$

where $\beta_k := L_{\rho,x}L_{F,a}\,\dfrac{L_{F,x}^{N-k-1}-1}{L_{F,x}-1} + L_{\rho,a} + L_\Phi L_{F,a} L_{F,x}^{N-k-1}$, and $\gamma_k := \big(L_g\vee L_\Psi\big)\, L_{F,a}\, L_{F,x}^{N-k-1}$.

A quite similar estimate of the difference of performances is established in [9, 8] for the case of an infinite-horizon sum with a positive discount factor and under a boundedness assumption on the instantaneous reward. In our case, the cost functional $J$ is defined, in finite horizon, as a maximum of several terms and without discount factor.

Proof. Denote by $y = (y_k)_k$ the discrete trajectory associated with the input $a$, and by $\bar y = (\bar y_k)_k$ the discrete trajectory associated with $\bar a$. Both trajectories start from the initial position $x$. By Lipschitz continuity of the dynamics $F$, straightforward calculations give:

$$\|y_k - \bar y_k\| \le L_{F,a}\sum_{j=0}^{k-1} L_{F,x}^{k-1-j}\|a_j-\bar a_j\|, \qquad 0\le k\le N. \qquad (16)$$

Now, by using the Lipschitz regularity of the cost functions, we obtain:

$$|J(x,z,a) - J(x,z,\bar a)| \le \Big[\sum_{k=0}^{N-1} L_{\rho,x}\|y_k-\bar y_k\| + \sum_{k=0}^{N-1} L_{\rho,a}\|a_k-\bar a_k\| + L_\Phi\|y_N-\bar y_N\|\Big] \vee \Big[L_g\max_{0\le k\le N}\|y_k-\bar y_k\|\Big] \vee \Big[L_\Psi\|y_N-\bar y_N\|\Big]. \qquad (17)$$

From (16), we get:

$$\sum_{k=0}^{N-1}\|y_k-\bar y_k\| \le L_{F,a}\sum_{k=0}^{N-1}\sum_{j=0}^{k-1} L_{F,x}^{k-1-j}\|a_j-\bar a_j\| \le L_{F,a}\sum_{k=0}^{N-1}\frac{L_{F,x}^{N-k-1}-1}{L_{F,x}-1}\,\|a_k-\bar a_k\|,$$

and for any $k = 0,\dots,N-1$, we also have $\|y_k-\bar y_k\| \le L_{F,a}\sum_{j=0}^{N-1} L_{F,x}^{N-j-1}\|a_j-\bar a_j\|$. Therefore,

$$\Big[L_g\max_{0\le k\le N}\|y_k-\bar y_k\|\Big] \vee \Big[L_\Psi\|y_N-\bar y_N\|\Big] \le \big(L_g\vee L_\Psi\big)\, L_{F,a}\sum_{k=0}^{N-1} L_{F,x}^{N-k-1}\|a_k-\bar a_k\|.$$

Combining the above inequalities gives the desired result (15).

Corollary 3.4 (Upper bound). There exists a constant $C > 0$, independent of $N$, such that for any $(x,z)\in\mathbb{R}^d\times\mathbb{R}$ and $a = (a_k)_k$, $\bar a = (\bar a_k)_k\in A^N$, we have:

$$|J(x,z,a) - J(x,z,\bar a)| \le C\,\Delta t\sum_{k=0}^{N-1}\Big(\frac{1}{L_{F,x}}\Big)^{k}\|a_k-\bar a_k\|. \qquad (18)$$

Proof. From (10), $L_{F,x} = 1 + \Delta t\,L$ with $L := L_{f,x}\big(1 + \frac{\Delta t}{2}L_{f,x}\big) > 0$. Thus, for every $0\le k\le N-1$:

$$\frac{L_{F,x}^{N-k-1}-1}{L_{F,x}-1} \le \frac{L_{F,x}^{N-1}}{L_{F,x}-1}\,(L_{F,x})^{-k} \le \frac{e^{N\Delta t L}}{\Delta t\, L}\,(L_{F,x})^{-k}.$$

Using the definitions of $L_{\rho,x}$ and $L_{F,a}$ (see (10)), we obtain:

$$L_{\rho,x} L_{F,a}\,\frac{L_{F,x}^{N-k-1}-1}{L_{F,x}-1} \le C_{1,1}\,\Delta t\,\Big(\frac{1}{L_{F,x}}\Big)^{k},$$

where $C_{1,1} := \frac{e^{TL}}{2L}\, L_{\ell,x}(1+L_{F,x})\, L_{f,a}\big(1+\frac{T}{2}L_{f,x}\big)$ is a constant independent of $N$. Again from (10), we get:

$$L_{\rho,a} = \Big(\frac{1}{L_{F,x}}\Big)^{k} L_{\rho,a}\,(L_{F,x})^{k} \le \Big(\frac{1}{L_{F,x}}\Big)^{k} L_{\rho,a}\, e^{k\Delta t L} \le C_{1,2}\,\Delta t\,\Big(\frac{1}{L_{F,x}}\Big)^{k},$$

where $C_{1,2} := e^{TL}\big(L_{\ell,a}+\frac{1}{2}L_{\ell,x}L_{F,a}\big)$ is a constant independent of $N$. With a similar argument, we also obtain:

$$L_\Phi L_{F,a} L_{F,x}^{N-k-1} \le C_{1,3}\,\Delta t\,\Big(\frac{1}{L_{F,x}}\Big)^{k} \quad\text{with}\quad C_{1,3} := L_\Phi\, e^{TL} L_{f,a}\big(1+\tfrac{T}{2}L_{f,x}\big).$$

By setting $C_1 := C_{1,1}+C_{1,2}+C_{1,3}$, we conclude that $\beta_k \le C_1\,\Delta t\,\big(\frac{1}{L_{F,x}}\big)^{k}$. Similar arguments lead to an upper bound on $\gamma_k$:

$$\gamma_k \le C_2\,\Delta t\,\Big(\frac{1}{L_{F,x}}\Big)^{k}, \quad\text{where}\quad C_2 := \big(L_g\vee L_\Psi\big)\, e^{TL} L_{f,a}\big(1+\tfrac{T}{2}L_{f,x}\big).$$

In conclusion, we obtain the claim with $C := C_1\vee C_2$.

4. Optimistic planning

For the sake of simplicity and without loss of generality, we suppose that the control is of dimension $q = 1$, and we denote by $D$ its maximal diameter ($\forall a,a'\in A$, $\|a-a'\|\le D$). Let us emphasize that the approach can be generalized to control variables of higher dimension.

4.1. Optimistic planning approach

Planning algorithms are based on the principles of optimistic optimisation (see [10]). In order to minimize the objective function $J$ over the space $A^N$, we iteratively refine the search space into smaller subsets. A search space, called a node and denoted $A_i$ with $i\in\mathbb{N}$, is a Cartesian product of sub-intervals of $A$, i.e. $A_i := A_{i,0}\times A_{i,1}\times\dots\times A_{i,N-1}\subseteq A^N$, where $A_{i,k}$ represents the control interval at time step $k$, for $k = 0,\dots,N-1$. The collection of nodes is organized into a tree $\Upsilon$ that is constructed progressively by expanding the tree nodes. Expanding a node $A_i$, with $i\in\mathbb{N}$, consists in choosing an interval $A_{i,k}$, for some $k = 0,\dots,N-1$, and splitting it uniformly into $M$ sub-intervals, where $M > 1$ is a parameter of the algorithm.

Figure 1: Illustrative example of refinement of $A^N$ with $M = 3$ after splitting 3 nodes $A_0$, $A_1$ and $A_6$ (tree depths $p = 0,\dots,3$).

In Figure 1, we represent a simple example of a tree construction. At the beginning of the process, the tree contains only its root $A_0 := [0,1]^N$, from which we generate $M = 3$ children nodes $A_1 = [0,\frac13]\times A^{N-1}$, $A_2 = [\frac13,\frac23]\times A^{N-1}$ and $A_3 = [\frac23,1]\times A^{N-1}$, after splitting its first interval. Then, suppose that $A_1$ is the second node to be refined, by splitting its second interval. Hence, we get $A_4 = [0,\frac13]\times[0,\frac13]\times A^{N-2}$, $A_5 = [0,\frac13]\times[\frac13,\frac23]\times A^{N-2}$ and $A_6 = [0,\frac13]\times[\frac23,1]\times A^{N-2}$. Finally, we choose $A_6$ and we split its first interval $[0,\frac13]$ to generate $A_7 = [0,\frac19]\times[\frac23,1]\times A^{N-2}$, $A_8 = [\frac19,\frac29]\times[\frac23,1]\times A^{N-2}$ and $A_9 = [\frac29,\frac13]\times[\frac23,1]\times A^{N-2}$. The order of expanded nodes and the intervals that have to be split will be made precise later. For now, we introduce some useful notations related to the tree $\Upsilon$:

• We associate with any node $A_i\in\Upsilon$ a sample sequence of controls $a_i := (a_{i,k})_{k=0}^{N-1}\in A_i$ such that $a_{i,k}$ is the midpoint of the interval $A_{i,k}$, for any $k = 0,\dots,N-1$.

• To any node $A_i\in\Upsilon$ corresponds a split function $s_i(\cdot)$ such that $s_i(k)$ indicates the number of splits needed to obtain the interval $A_{i,k}$, for $k = 0,\dots,N-1$. For example, in Figure 1, $s_0(\cdot)\equiv 0$ for the root $A_0$; as for $A_7$, we have $s_7(0) = 2$, $s_7(1) = 1$ and $s_7(k) = 0$ for $k\ge 2$.

• Denote by $d_{i,k}$, for $k = 0,\dots,N-1$, the diameter of the interval $A_{i,k}$ of some node $A_i\in\Upsilon$. In particular, $d_{i,k} = \dfrac{D}{M^{s_i(k)}}$.

• The depth of a node $A_i$, denoted $p_i$, is the total number of splits effectuated to obtain this node:

$$p_i := \sum_{k=0}^{N-1} s_i(k). \qquad (19)$$

By $\mathrm{Depth}(\Upsilon)$, we designate the maximum depth in the tree $\Upsilon$.

• A node $A_i$ is a tree leaf if it has not been expanded. The set of the tree leaves is denoted by $\Lambda$.

• Finally, we denote by $\Lambda_p$ the set of leaves of $\Upsilon$ at depth $p\in\mathbb{N}$: $\Lambda_p := \{A_i\in\Upsilon \text{ s.t. } p_i = p\}$.

Remark 4.1. By selecting controls at the interval centers and by taking $M$ odd, we guarantee that after expanding a node $A_i$ we generate at least one node $A_j$ with $J(x,z,a_j)\le J(x,z,a_i)$. Indeed, the middle child $A_j$ carries the same control sequence as $A_i$, hence $J(x,z,a_j) = J(x,z,a_i)$.
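A minimal Python sketch of a tree node and of the $M$-ary splitting illustrated in Figure 1 could look as follows (class and method names are illustrative choices of ours). Note that, with $M$ odd, the middle child produced by `split` keeps the parent's midpoint control sequence, as in Remark 4.1:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Node:
    """A node A_i = A_{i,0} x ... x A_{i,N-1}: one control interval per time step."""
    intervals: List[Tuple[float, float]]

    def controls(self):
        """Sample control sequence a_i: the midpoint of each interval."""
        return [0.5 * (lo + hi) for lo, hi in self.intervals]

    def diameters(self):
        """Diameters d_{i,k} of the intervals."""
        return [hi - lo for lo, hi in self.intervals]

    def split(self, k, M):
        """Expand along time step k: partition A_{i,k} uniformly into M children."""
        lo, hi = self.intervals[k]
        h = (hi - lo) / M
        children = []
        for j in range(M):
            ivs = list(self.intervals)
            ivs[k] = (lo + j * h, lo + (j + 1) * h)
            children.append(Node(ivs))
        return children
```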

Proposition 4.2. By the tree construction, there exists at least one leaf node $A_i\in\Lambda$ containing an optimal control sequence and verifying:

$$J(x,z,a_i) - \sigma_i \le W(x,z) \le J(x,z,a_i), \qquad (20)$$

where $a_i$ is the sample control sequence in $A_i$ and

$$\sigma_i := \frac{1}{2}\,\Big[\sum_{k=0}^{N-1}\beta_k d_{i,k}\Big] \vee \Big[\sum_{k=0}^{N-1}\gamma_k d_{i,k}\Big], \qquad (21)$$

with $\beta_k$ and $\gamma_k$ as given in Proposition 3.3.

Proof. Since the set of leaves covers, at any level, the entire space $A^N$, there exists at least one leaf node $A_i\in\Lambda$ that contains an optimal control sequence $a_i^* = (a_{i,k}^*)_k\in A_i$. From Proposition 3.3, and since $W(x,z) = J(x,z,a_i^*)$, we have:

$$|J(x,z,a_i) - W(x,z)| \le \Big[\sum_{k=0}^{N-1}\beta_k\|a_{i,k}-a_{i,k}^*\|\Big] \vee \Big[\sum_{k=0}^{N-1}\gamma_k\|a_{i,k}-a_{i,k}^*\|\Big].$$

On the other hand, since $a_i = (a_{i,k})_k$ is chosen at the center of the intervals of $A_i$, we obtain:

$$\forall k = 0,\dots,N-1, \quad \|a_{i,k}-a_{i,k}^*\|\le\frac{d_{i,k}}{2}, \quad\text{and hence}\quad |J(x,z,a_i) - W(x,z)|\le\sigma_i.$$

In the optimistic planning algorithms, at each iteration, one or several optimistic nodes are chosen, and each is split into $M$ children ($M > 1$ is a fixed parameter of the algorithm). The choice of the optimistic nodes is based on a criterion that we explain hereafter.

To expand a node $A_i$, we choose an interval from $A_{i,0}\times A_{i,1}\times\dots\times A_{i,N-1}$ and we partition it uniformly into $M$ sub-intervals. If we choose to split the interval $A_{i,k}$, for some $k = 0,\dots,N-1$, then we generate $M$ nodes with an error term $\sigma_i^+(k)$ defined by:

$$\sigma_i^+(k) := \Big[\sum_{j=0,\,j\ne k}^{N-1}\beta_j d_{i,j} + \beta_k\frac{d_{i,k}}{M}\Big] \vee \Big[\sum_{j=0,\,j\ne k}^{N-1}\gamma_j d_{i,j} + \gamma_k\frac{d_{i,k}}{M}\Big].$$

Henceforth, in order to minimize the error $\sigma_i^+(k)$, the best choice of the interval to split, $\kappa_i$, is given by:

$$\kappa_i \in \operatorname*{argmin}_{0\le k\le N-1}\ \sigma_i^+(k). \qquad (22)$$
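In code, $\sigma_i$ of (21) and the selection rule (22) read as follows (a sketch building on the `Node` class above; `beta` and `gamma` are the precomputed weights of Proposition 3.3):

```python
def sigma(node, beta, gamma):
    """Error bound sigma_i of (21) for a node."""
    d = node.diameters()
    return 0.5 * max(sum(b * dk for b, dk in zip(beta, d)),
                     sum(c * dk for c, dk in zip(gamma, d)))

def best_split_index(node, beta, gamma, M):
    """kappa_i of (22): the time step whose M-way split minimizes sigma_i^+(k)."""
    d = node.diameters()
    def sigma_plus(k):
        dk = list(d)
        dk[k] = d[k] / M               # diameter of the split interval in the children
        return max(sum(b * x for b, x in zip(beta, dk)),
                   sum(c * x for c, x in zip(gamma, dk)))
    return min(range(len(d)), key=sigma_plus)
```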

The following result gives an upper bound on the error term $\sigma_i$ of any node $A_i\in\Upsilon$. This result will be used later in the convergence analysis of the OP and SOP algorithms. The proof is given in Appendix A.

Theorem 4.3. Assume that $M > L_{F,x} > 1$ and let $N\ge 2$. Let $\tau := \big\lceil \frac{\log M}{\log(L_{F,x})}\big\rceil$. Consider a node $A_i$ at some depth $p_i = p$. For $p$ large enough, the error $\sigma_i$ (defined in (21)) is bounded as follows:

$$\sigma_i \le \delta_p := c_1(N)\,\Delta t\, M^{-\frac{p}{N}}, \qquad (23)$$

where $c_1(N) := C\,\dfrac{1}{1 - L_{F,x}^{-1}M^{1/\tau}}\,M^{q(N)}$, $q(N) := 2 - \dfrac{(N-1)(\tau-2)}{2\tau(\tau-1)}$, and $C\ge 0$ is independent of $N$ and $p$.

Notice that, by definition, we have $\tau\ge 2$ and $L_{F,x}^{-1}M^{1/\tau}\le 1$. Hence (23) is meaningful only when the strict inequality $L_{F,x}^{-1}M^{1/\tau} < 1$ holds. Moreover, $q(N)\le 2$, hence $c_1(N)$ is bounded independently of $N$.

We introduce some additional definitions that will be useful for the presentation of the algorithms and for their analysis. For a given depth $p\in\mathbb{N}$, we define the set of nodes that will be expanded by the optimistic planning algorithms:

$$\Upsilon_p^* := \{A_i\in\Upsilon \text{ at depth } p \mid J(x,z,a_i) - \delta_p \le W(x,z)\},$$

where $\delta_p$ is defined as in (23). The set containing all expanded nodes, $\Upsilon^* := \bigcup_{q\ge 0}\Upsilon_q^*$, is in general smaller than the whole tree $\Upsilon$.

Definition 4.4. We define the asymptotic branching factor $m$ as the smallest real $m\in[1,M]$ such that

$$\exists R\ge 1,\ \forall p\ge 0, \quad |\Upsilon_p^*|\le R\,m^p. \qquad (24)$$

Remark 4.5. The asymptotic branching factor $m$ lies in $[1,M]$ because at any depth $p$ there is at least one node in $\Upsilon_p^*$ (the one containing the optimal solution) and at most $M^p$. The factor $m$ is a measure of the complexity of the optimistic algorithm. Indeed, the nodes that will be expanded are contained in $\Upsilon_p^*$, whose size is bounded by $R\,m^p$. A problem with low resolution complexity corresponds to a small value of $m$. The complexity constant $R$ (in front of the main growth factor $m^p$) will be used in the convergence analysis of the algorithms later on.

4.2. Optimistic Planning (OP) Algorithm

In this section, we present the rules for refining the search for an optimal control strategy. In the first algorithm, at each iteration, the node $A_i$ minimizing the lower bound $J(x,z,a_i) - \sigma_i$ is selected and split into $M$ children. More precisely, we identify the interval $A_{i,\kappa_i}$ whose partition into $M$ sub-intervals produces the lowest error $\sigma_i^+(\kappa_i)$.

Algorithm 1: Optimistic Planning (OP)

Require: the number of intervals $N$, the split factor $M$, the maximal number of expanded nodes $I_{max}$.
1: Initialize $\Upsilon$ with the root $A_0 := A^N$ and $n = 0$ ($n :=$ number of expanded nodes).
2: while $n < I_{max}$ do
3:   Select an optimistic node to expand: $A_i\in\operatorname{argmin}_{A_j\in\Lambda}\big(J(x,z,a_j) - \sigma_j\big)$.
4:   Select $\kappa_i$, defined in (22), the interval to split for the node $A_i$.
5:   Update $\Upsilon$ by expanding $A_i$ along $\kappa_i$ and adding its $M$ children.
6:   Update $n = n + 1$.
7: end while
8: return the control sequence $a_i = (a_{i,k})_k\in A^N$ of the node $A_i\in\operatorname{argmin}_{A_j\in\Lambda} J(x,z,a_j)$, and $J(x,z) := J(x,z,a_i)$.

In this algorithm, the number of expanded nodes equals the number of iterations, since at each iteration only one node is expanded. The number $I_{max}$ represents the maximum available computational resource.
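A compact Python sketch of Algorithm 1, reusing the helpers above, might look as follows. It assumes the control set has been rescaled to $A = [0,1]$, and `J` is a closure evaluating $J(x,z,\cdot)$ on a midpoint control sequence (e.g. via `evaluate_J`):

```python
def optimistic_planning(N, M, I_max, beta, gamma, J):
    """Sketch of Algorithm 1 (OP): expand I_max nodes, return the best leaf."""
    leaves = [Node([(0.0, 1.0)] * N)]          # the root A_0 = A^N
    for _ in range(I_max):
        # optimistic selection: leaf with the smallest lower bound J(a_i) - sigma_i
        i = min(range(len(leaves)),
                key=lambda j: J(leaves[j].controls()) - sigma(leaves[j], beta, gamma))
        node = leaves.pop(i)                   # expanding removes the leaf...
        k = best_split_index(node, beta, gamma, M)
        leaves.extend(node.split(k, M))        # ...and adds its M children
    best = min(leaves, key=lambda nd: J(nd.controls()))
    return best.controls(), J(best.controls())
```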

Lemma 4.6 (see [8, Proposition 3]). The OP algorithm expands only nodes satisfying $J(x,z,a_i) - \sigma_i \le W(x,z)$ (thus only nodes in $\Upsilon^*$). Furthermore, the returned value $J(x,z,a_i)$ satisfies $0\le J(x,z,a_i) - W(x,z)\le\sigma_{min}$, where $\sigma_{min}$ is the smallest computed value $\sigma_i$ among all the expanded nodes.

From Theorem 4.3 and Lemma 4.6, we derive the following result, whose proof is given in Appendix A.

Theorem 4.7. Assume that $M > L_{F,x} > 1$ and let $N\ge 2$. Let $a_i$ and $J(x,z,a_i)$ be the output of the OP algorithm, and let $n\ge 1$ be the corresponding number of expanded nodes. For $n$ large enough, the following error bound holds:

$$0 \le J(x,z,a_i) - W(x,z) \le \mathcal{E}_{OP}(N,n) := \begin{cases} c_2(N)\, n^{-\frac{\log(M)}{N\log(m)}} & \text{if } m > 1,\\[4pt] c_2(N)\, M^{-\frac{n}{NR}} & \text{if } m = 1,\end{cases} \qquad (25)$$

where $c_2(N)$ is a constant which, like $c_1(N)$, is bounded independently of $N$. From Theorem 4.7 we deduce that the error between the optimal cost computed by the OP algorithm and the (global) optimal value converges to zero as $n\to\infty$ (for a fixed number of time steps $N$).

Remark 4.8. The bound $\mathcal{E}_{OP}$ in Theorem 4.7 slightly improves the error estimates given in [8]. Indeed, in [8, Theorem 6], when $m > 1$, the error is bounded by $O(\gamma^{C\sqrt{\log(n)}})$ for some $\gamma\in(0,1)$ (discount factor) and some constant $C > 0$, while in Theorem 4.7, $\mathcal{E}_{OP}$ is of order $O(\beta^{\log(n)})$ for some constant $\beta\in(0,1)$, which is therefore asymptotically lower.

4.3. Simultaneous Optimistic Planning (SOP) Algorithm

The SOP algorithm expands, at each iteration, several nodes which are supposed to be optimistic.

Algorithm 2: Simultaneous Optimistic Planning (SOP)

Require: the number of intervals $N$, the split factor $M$, a maximal number of expanded nodes $I_{max}$, and a maximal depth function $p_{max}(\cdot)\ge 1$.
1: Initialize $\Upsilon$ with the root $A_0 := A^N$ and set $n = 0$ ($n :=$ number of expanded nodes).
2: while $n < I_{max}$ do
3:   $p = \min_{A_i\in\Lambda} p_i$: the minimal depth among the tree leaves of $\Upsilon$.
4:   while $p\le p_{max}(n)$ do
5:     Select an optimistic node at depth $p$: $A_i\in\operatorname{argmin}_{A_j\in\Lambda_p} J(x,z,a_j)$.
6:     Select $\kappa_i$, as defined in (22), the interval to split for the node $A_i$.
7:     Update $\Upsilon$ by expanding $A_i$ along $\kappa_i$ and adding its $M$ children at depth $p+1$.
8:     Update $p\leftarrow p+1$ and $n\leftarrow n+1$.
9:   end while
10: end while
11: return the control sequence $a := (a_{i,k})_k\in A^N$ of the node $A_i = \operatorname{argmin}_{A_j\in\Lambda} J(x,z,a_j)$, and $J(x,z) := J(x,z,a)$.

In Algorithm 2, $p_{max}(n)$ denotes a maximal depth that the tree should not exceed at iteration $n$. As in Algorithm 1, $I_{max}$ represents the maximum available computational resource.
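For illustration, a Python sketch of Algorithm 2 in the same vein as the OP sketch (same assumptions and helper functions; leaves are grouped by depth, and the sketch always expands at least the shallowest leaf per sweep to guarantee progress):

```python
def simultaneous_optimistic_planning(N, M, I_max, beta, gamma, J, p_max):
    """Sketch of Algorithm 2 (SOP): one sweep expands the leaf with the
    smallest J at each depth, from the shallowest leaf up to p_max(n)."""
    leaves = {0: [Node([(0.0, 1.0)] * N)]}     # tree leaves, grouped by depth
    n = 0
    while n < I_max:
        p = min(d for d, nodes in leaves.items() if nodes)   # minimal leaf depth
        top = max(p, p_max(n))                 # expand at least one node per sweep
        while p <= top and n < I_max:
            nodes = leaves.get(p)
            if nodes:                          # optimistic leaf at depth p
                i = min(range(len(nodes)), key=lambda j: J(nodes[j].controls()))
                node = nodes.pop(i)
                k = best_split_index(node, beta, gamma, M)
                leaves.setdefault(p + 1, []).extend(node.split(k, M))
                n += 1
            p += 1
    pool = [nd for nodes in leaves.values() for nd in nodes]
    best = min(pool, key=lambda nd: J(nd.controls()))
    return best.controls(), J(best.controls())
```

Here `p_max` is a callable, e.g. `p_max = lambda n: max(1, int(n ** 0.5))`, in the spirit of Remark 4.11 below.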

The following result gives a lower bound on the depth of the optimal node generated by the SOP algorithm. It is a direct consequence of [18, Lemma 4.1]. For the convenience of the reader, a sketch of the proof is given in Appendix A.

Lemma 4.9. Let $p_{max}(n)\ge 1$. Let $p(n)\ge 1$ be the smallest integer $p\ge 1$ such that:

$$p_{max}(n)\sum_{q=0}^{p} R\,m^q \ge n, \qquad (26)$$

and let $p_n := \min\{p(n)-1,\ p_{max}(n)\}$. Then the SOP algorithm, after $n$ iterations, has expanded an optimal node at depth $p_n$, and the solution obtained has an error bounded by $\delta_{p_n}$ (recall that $\delta_p$ is defined in (23)).

The SOP algorithm gives a sub-optimal solution with the upper bound estimate given in the following theorem, whose proof is postponed to Appendix A.

Theorem 4.10. Consider the SOP algorithm with $p_{max}(n) := n^\eta$ for some given $\eta\in\,]0,1[$. Let $n$ be a given number of expanded nodes in the SOP algorithm and let $J(x,z)$ be the corresponding returned value. The following upper bound holds as $n\to\infty$:

$$0 \le J(x,z) - W(x,z) \le \mathcal{E}_{SOP}(N,n) := \begin{cases} c_3(N)\,\Delta t\, n^{-\frac{(1-\eta)\log(M)}{N\log(m)}} & \text{if } m > 1 \text{ and } \eta\in\,]0,1[,\\[4pt] c_3(N)\,\Delta t\, M^{-\frac{n^{1-\eta}}{R}} & \text{if } m = 1 \text{ and } \eta\in[\tfrac12,1[,\end{cases}$$

with $c_3(N) := c_1(N)\,M^{\frac{1}{N}\left(\frac{\log(R)}{\log(m)}+2\right)}$ if $m > 1$ and $c_3(N) := c_1(N)\,M^{\frac{2}{N}}$ if $m = 1$. In particular, the error of the SOP algorithm converges to zero as $n\to\infty$ (for a fixed number of time steps $N$).

Remark 4.11. The SOP estimate $\mathcal{E}_{SOP}$ given in Theorem 4.10 improves the estimate presented in [9, Theorem 11]. Firstly, one can choose $p_{max}(n) := \sqrt{n}$ for all possible values of the branching factor $m\in[1,M]$ (which is not the case in [9, Theorem 11]). Secondly, in the case when $m > 1$, we obtain an error of $O(\beta^{\log(n)})$ for some constant $\beta\in(0,1)$. This error is therefore asymptotically lower than the error of [9, Theorem 11], which is of order $O(\beta'^{\sqrt{\log(n)}})$ for some $\beta'\in(0,1)$. In the case $m = 1$, our error can be written $O(\beta^{n^{1/2}})$ for some $\beta\in(0,1)$, while in [9, Theorem 11] the bound is of order $O(\beta'^{n^{1/6}})$ for some $\beta'\in(0,1)$ (and $p_{max} = n^{1/3}$).

4.4. Simultaneous Optimistic Planning with Multiple Steps (SOPMS) Algorithm

We start by introducing Algorithm 3 (SOP-tree-update), a generic elementary algorithm that describes how a given tree $\Upsilon_0$ should be updated in a way similar to SOP.

The SOPMS algorithm uses the elementary algorithm SOP-tree-update in order to optimize the objective function from an initial state $x\in\mathbb{R}^d$ and an auxiliary variable $z\in\mathbb{R}$. In this algorithm, we need the cost function $J_k$ starting from a time step $k$, defined for any $k = 0,\dots,N-1$ as follows:

$$J_k(y,z,a) := \Big[\sum_{i=k}^{N-1}\rho(y_i^a,a_i) + \Phi(y_N^a) - z\Big] \vee \max_{k\le i\le N} g(y_i^a) \vee \Psi(y_N^a),$$

where $y\in\mathbb{R}^d$, $z\in\mathbb{R}$, $a := (a_i)_i\in A^{N-k}$ and $(y_i^a)_i$ is the discrete trajectory starting from $y$ and associated with the control sequence $(a_i)_i$.

Algorithm 3: SOP-tree-update($\Upsilon_0$, $I$, $p_{max}(\cdot)$, $J_k(y,z,\cdot)$)

Require: a tree $\Upsilon_0$, a maximal number of expanded nodes $I$, and a maximal depth function $p_{max}(\cdot)$.
1: Initialize $n = 0$ ($n :=$ number of expanded nodes).
2: while $n < I$ do
3:   $p = \min_{A_i\in\Lambda} p_i$: the minimal depth among the tree leaves.
4:   while $p\le p_{max}(n)$ do
5:     Select an optimistic node at depth $p$: $A_i\in\operatorname{argmin}_{A_j\in\Lambda_p} J_k(y,z,a_j)$.
6:     Select $\kappa_i$, defined in (22), the interval to split for the node $A_i$.
7:     Update $\Upsilon_0$ by expanding $A_i$ along $\kappa_i$ and adding its $M$ children at depth $p+1$.
8:     Update $p\leftarrow p+1$ and $n\leftarrow n+1$.
9:   end while
10: end while
11: return the control sequence $a_i$ of the node $A_i = \operatorname{argmin}_{A_j\in\Lambda} J_k(y,z,a_j)$, and $J_k := J_k(y,z,a_i)$.

Algorithm 4: Simultaneous Optimistic Planning with Multiple Steps (SOPMS)

Require: the number of intervals $N$, the split factor $M$, a maximal number of expanded nodes $I_{max}$, a local initial number of expanded nodes $I_{eval,0}$, a maximal depth function $p_{max}(\cdot)$, and a tolerance $\varepsilon > 0$.
1: Initialize $k\leftarrow 0$, $y_0 = x$ and $n\leftarrow 0$ ($n :=$ total number of expanded nodes).
2: while $k\le N-1$ do
3:   Initialize $\Upsilon_k$ with controls $a^k := (a_i^k)_{k\le i\le N-1}\in A^{N-k}$, root $A_0 := \prod_{i=k}^{N-1} A_i$, and $W_k := J_k(y_k,z,a^k)$ (see Remark 4.12 for the definition of $A_i$).
4:   $\ell := 0$
5:   loop
6:     $I_{eval} := 2^\ell I_{eval,0}$
7:     $(a, W_k^{temp})\leftarrow$ SOP-tree-update($\Upsilon_k$, $I_{eval}$, $p_{max}(\cdot)$, $J_k(y_k,z,\cdot)$): expand nodes of $\Upsilon_k$.
8:     Update $n$ ($n\leftarrow n + I_{eval}$).
9:     if $\big|\frac{W_k^{temp}-W_k}{W_k}\big|\le\varepsilon$ and $\ell\ge 1$ then
10:      Accept the first component $a_k\in A$ of the control input $a$, set $y_{k+1} = F(y_k,a_k)$.
11:      $k\leftarrow k+1$, and go to 2.
12:    end if
13:    $W_k\leftarrow W_k^{temp}$
14:    $\ell\leftarrow\ell+1$
15:    if $n\ge I_{max}$ then
16:      Accept the control sequence $(a_k,a_{k+1},\dots,a_{N-1})\in A^{N-k}$ and go to 20.
17:    end if
18:  end loop
19: end while
20: return the control sequence $(a_k)_k\in A^N$, the value $J(x,z) = J(x,z,a)$, and the budget $n$.

Remark 4.12 (Initialization step). In the initialization step of the SOPMS algorithm (line 3), when $k = 0$, the initial control interval sequence is $A_0 := [0,1]^N$. Then, for $k\ge 1$, the initial control $(a_j^k)_{j\ge k}$ is taken as the control computed at iteration $k-1$ ($a_j^k := a_j$ for $j\ge k$), and we choose the corresponding initial intervals as the intervals centered at $a_i$, of maximal possible diameter, contained in $[0,1]$ (that is, $A_0 := \prod_{i=k}^{N-1} A_i$, with $A_i := [a_i - r_i,\ a_i + r_i]$ and $r_i := \min(a_i,\ 1-a_i)$).

At each time step $k = 0,\dots,N-1$, the SOPMS algorithm optimizes the cost functional $J_k$ over control sequences in a subset of $A^{N-k}$. When the cost functional cannot be improved any more (line 9), up to a relative threshold $\varepsilon$, the optimization procedure is stopped for the current time step, the first control component (as well as the first position) is accepted, and we move forward to the next time step. The remaining control sequence is used as a starting point for the next iteration; a sketch of this receding-horizon outer loop is given below.
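In the following minimal Python sketch, `solve_tail` stands for SOP-tree-update (it returns a tail control sequence over steps $k,\dots,N-1$ together with the value $W_k$) and `step` for the discrete dynamics $F$; both interfaces are assumptions of ours, not the paper's code:

```python
def sopms(x, z, N, I_eval0, I_max, eps, solve_tail, step):
    """Sketch of the SOPMS outer loop (Algorithm 4)."""
    y, controls, n, warm = x, [], 0, None
    for k in range(N):
        W_prev, l = None, 0
        while True:
            budget = (2 ** l) * I_eval0        # doubling local budget I_eval
            warm, W = solve_tail(y, z, k, warm, budget)
            n += budget
            if l >= 1 and abs(W - W_prev) <= eps * abs(W_prev):
                break                          # J_k no longer improves: accept a_k
            if n >= I_max:                     # global budget exhausted:
                return controls + list(warm), n
            W_prev, l = W, l + 1
        controls.append(warm[0])               # apply the first control component
        y = step(y, warm[0])                   # move one step forward
        warm = warm[1:]                        # warm start for time step k + 1
    return controls, n
```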

We do not have a convergence proof for the SOPMS algorithm. However, because of the tolerance parameter $\varepsilon$ used to stop the successive computations, we numerically observe a reduced number of expanded nodes $n$ (compared to the SOP algorithm), and hence a reduced complexity of the overall algorithm, leading to good precision up to a very small residual error and for a reduced budget (see Section 5.1). Notice that by choosing $I_{eval,0} = I_{max}$, SOPMS becomes equivalent to SOP and provides the same error estimate.

4.5. Resolution procedure for a state-constrained problem

We describe here the procedure to get an approximation of the optimal value $V(x)$ and an approximate optimal trajectory for the state-constrained problem (11) starting from an initial state $x$. Let us first remark that the function $z\mapsto W(x,z)$ is non-increasing. By using the properties of the functions $F$, $\rho$, $\Phi$, $g$ and $\Psi$, one can get two bounds $Z_{min}$ and $Z_{max}$ such that $V(x) = \inf\{z\in[Z_{min},Z_{max}] \mid W(x,z)\le 0\}$.

The computation of an approximation of $V(x)$ is performed by a dichotomy algorithm combined with an optimistic method for evaluating $W(x,z)$ up to a prescribed precision $\varepsilon > 0$. First, notice that the value $Z_{min}$ can always be chosen such that $W(x,Z_{min}) > \varepsilon$. We assume that $Z_{max}$ can also be chosen such that $W(x,Z_{max})\le\varepsilon$ (if this is not the case, it means that there is no admissible trajectory that satisfies the constraints and $V(x) = +\infty$).

Algorithm 5: Dichotomy algorithm combined with the OP (resp. SOP/SOPMS) method

Require: a threshold $\varepsilon > 0$, and $Z_{min} < Z_{max}$ such that $W(x,Z_{min}) > \varepsilon$, $W(x,Z_{max})\le\varepsilon$.
1: $b_\ell\leftarrow Z_{min}$, $b_r\leftarrow Z_{max}$.
2: while $b_r - b_\ell > \varepsilon$ do
3:   $c\leftarrow(b_\ell+b_r)/2$; compute a strategy $a_c$ by OP (or SOP/SOPMS) such that $J(x,c,a_c) - W(x,c)\le\varepsilon$ (see Remark 4.13);
4:   if $J(x,c,a_c) > \varepsilon$, set $b_\ell := c$;
5:   if $J(x,c,a_c)\le\varepsilon$, set $b_r := c$;
6: end while
7: return $z_\varepsilon = b_r$, the value $J(x,z_\varepsilon,a_\varepsilon)$ along with its corresponding optimal control sequence $a_\varepsilon := (a_k)_k$ and optimal trajectory $y_\varepsilon := (y_k)_k$.

Remark 4.13. In Algorithm 5 (line 3), we need to compute $J(x,c,a_c)$ up to a precision $\varepsilon$. To fulfill this requirement, a sufficiently large computational budget should be allowed in the optimistic algorithm in order to obtain a control strategy $a_c$ such that the error $J(x,c,a_c) - W(x,c)$ is lower than $\varepsilon$. In practice, this budget can be large and difficult to evaluate from the theoretical estimates obtained in Theorems 4.7 and 4.10. In the sequel, we will assume that the computational budget is large enough, and we study the asymptotic behavior of the method when the threshold parameter $\varepsilon$ converges to zero.
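Under that assumption, the bisection itself is straightforward; below is a Python sketch, where `solve_W(x, c)` stands for one run of the OP (or SOP/SOPMS) method returning a strategy $a_c$ together with the value $J(x,c,a_c)$ (an interface we assume for the sketch):

```python
def dichotomy(x, z_min, z_max, eps, solve_W):
    """Sketch of Algorithm 5: bisection on z combined with an optimistic solver."""
    b_l, b_r = z_min, z_max                # W(x, b_l) > eps, W(x, b_r) <= eps
    a_best = None
    while b_r - b_l > eps:
        c = 0.5 * (b_l + b_r)
        a_c, J_c = solve_W(x, c)           # J(x, c, a_c), accurate up to eps
        if J_c > eps:
            b_l = c                        # value above c: V(x) lies to the right
        else:
            b_r, a_best = c, a_c           # constraints met up to eps: shrink right end
    return b_r, a_best                     # z_eps and its control strategy
```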

Let us first introduce the following problem with relaxed constraints:

$$V^\varepsilon(x) := \inf_{a\in A^N}\Big\{\sum_{k=0}^{N-1}\rho(y_k^{x,a},a_k) + \Phi(y_N^{x,a}) \ \Big|\ \max\Big(\Psi(y_N^{x,a}),\ \max_{0\le k\le N} g(y_k^{x,a})\Big)\le\varepsilon\Big\}. \qquad (27)$$

Lemma 4.14. Assume that the dichotomy algorithm is combined with OP (resp. SOP). Assume that the computational budget is large enough so that $\mathcal{E}_{OP}(N,n)\le\varepsilon$ (resp. $\mathcal{E}_{SOP}(N,n)\le\varepsilon$). Then the output $z_\varepsilon$ of the dichotomy algorithm satisfies:

$$-\varepsilon < W(x,z_\varepsilon)\le\varepsilon, \qquad V^\varepsilon(x)\le z_\varepsilon+\varepsilon, \qquad\text{and}\qquad V(x)\ge z_\varepsilon-\varepsilon.$$

Proof. Assume that the dichotomy algorithm is combined with the OP method and the calculations are performed with $\mathcal{E}_{OP}(N,n)\le\varepsilon$. The output $z_\varepsilon = b_r$ of the algorithm satisfies $W(x,z_\varepsilon)\le J(x,z_\varepsilon,a_\varepsilon)\le\varepsilon$. Moreover, at the last iteration of the algorithm, we have $J(x,b_\ell,a_{b_\ell}) > \varepsilon$, which leads to $W(x,b_\ell)\ge J(x,b_\ell,a_{b_\ell}) - \mathcal{E}_{OP} > 0$. Since $W(x,\cdot)$ is 1-Lipschitz, we obtain that $W(x,z_\varepsilon)\ge W(x,b_\ell) - |z_\varepsilon - b_\ell| > -\varepsilon$.

Now, $W(x,z_\varepsilon)\le\varepsilon$ implies that:

$$\sum_{k=0}^{N-1}\rho(y_k,a_k) + \Phi(y_N) - z_\varepsilon\le\varepsilon \qquad\text{and}\qquad \max\Big(\Psi(y_N),\ \max_{0\le k\le N} g(y_k)\Big)\le\varepsilon.$$

Hence the trajectory $y_\varepsilon$ is feasible for problem $V^\varepsilon(x)$, and $V^\varepsilon(x)\le z_\varepsilon+\varepsilon$. Finally, since $W(x,b_\ell) > 0$ and since the map $z\mapsto W(x,z)$ is non-increasing, we deduce that $b_\ell < V(x)$. This inequality, along with the fact that $b_r - b_\ell\le\varepsilon$, leads to $V(x) > b_\ell\ge b_r - \varepsilon = z_\varepsilon - \varepsilon$.

In what follows, $\mathcal{O}(x)$ denotes the set of optimal solutions associated with the exact value $V(x)$. We also define the distance to the set $\mathcal{O}(x)$ by: $d(y,\mathcal{O}(x)) := \inf\{\max_{0\le k\le N}\|y_k-\bar y_k\| : \bar y\in\mathcal{O}(x)\}$.

Theorem 4.15. Let $x\in\mathbb{R}^d$ be such that $\mathcal{O}(x)\ne\emptyset$. Let $(y_\varepsilon,a_\varepsilon,z_\varepsilon)$ be the outcome of Algorithm 5 for $\varepsilon > 0$. Then,

(i) $\lim_{\varepsilon\to 0} d(y_\varepsilon,\mathcal{O}(x)) = 0$;

(ii) $\lim_{\varepsilon\to 0} z_\varepsilon = \lim_{\varepsilon\to 0} V^\varepsilon(x) = V(x)$.

Proof. (i) Assume that $\limsup_{\varepsilon\to 0} d(y_\varepsilon,\mathcal{O}(x)) > 0$. Then, by compactness of $A$ and (A1), there exist a subsequence $(\varepsilon_n)$, $\bar z\in\mathbb{R}$ and a trajectory $\bar y$ such that $z_{\varepsilon_n}$ converges to $\bar z$, $y_{\varepsilon_n}$ converges to $\bar y$, and $d(\bar y,\mathcal{O}(x)) = \lim_n d(y_{\varepsilon_n},\mathcal{O}(x)) > 0$. However, from the inequality $W(x,z_{\varepsilon_n})\le\varepsilon_n$, we get $\max\big(\Psi(y_N^{\varepsilon_n}),\ \max_{0\le k\le N} g(y_k^{\varepsilon_n})\big)\le\varepsilon_n$. Therefore, by passing to the limit we obtain $\max\big(\Psi(\bar y_N),\ \max_{0\le k\le N} g(\bar y_k)\big)\le 0$, which proves that $\bar y$ is an admissible trajectory. Moreover, $W(x,\bar z) = \lim_n W(x,z_{\varepsilon_n})\le 0$. Then $V(x)\le\bar z$. On the other hand, we have $V(x)\ge\lim_n(z_{\varepsilon_n}-\varepsilon_n) = \bar z$. Therefore, $V(x) = \bar z$ and $\bar y\in\mathcal{O}(x)$, which contradicts the fact that $d(\bar y,\mathcal{O}(x)) > 0$. We conclude that claim (i) holds.

(ii) From the previous lemma, we have $V^\varepsilon(x)-\varepsilon\le z_\varepsilon\le V(x)+\varepsilon$. By classical compactness arguments in $A^N$, we conclude the desired result.

5. Numerical simulations

Computations for Examples 1, 2 and 4 were realized with C++ (GNU compiler 6.5.0) on a 2.2 GHz Intel i7 64-bit computer (with 16 GB RAM). Computations for Example 3 were realized using C++ (GNU compiler) on a 2.4 GHz Intel Xeon E5-2695 64-bit computer (with 128 GB RAM).

5.1. Example 1: a 2D example without state constraints

We first consider a simple example with no state constraints. Here, the dynamics $f$ is defined by:

$$f((x_1,x_2),a) = \begin{pmatrix} -x_2 \\ a \end{pmatrix}, \qquad\text{for } (x_1,x_2)\in\mathbb{R}^2,\ a\in A := [-1,1].$$

The time horizon is set to $T = 1$, and the distributed and final cost functions are defined by $\ell(x,a) = 0$ and $\Phi(x) = \|x\|$. In this context, the structure of the exact solutions of the continuous and discrete problems are known¹. Table 1 gives some reference values for the exact value $\vartheta(x)$ of the time-continuous problem (second column) for three initial positions $x = (1,0)$, $x = (1.5,0)$ and $x = (2,0)$. The values $V(x)$ of the corresponding discretized problem are also reported, for different values of $N$, in the remaining columns of Table 1.

x          ϑ(x)        V(x), N=10   V(x), N=20   V(x), N=40
(2.0, 0)   1.638580    1.639542     1.639149     1.638626
(1.5, 0)   1.159678    1.160946     1.160262     1.159701
(1.0, 0)   0.687630    0.689017     0.688202     0.687676

Table 1: Some reference values (rounded to 6 decimal places) for the continuous and discretized problems.

Figure 2: (Example 1) Trajectories obtained by the SOP algorithm for $N = 20$, $I_{max} = 10^4$, from different initial positions.

We use the optimistic algorithms to compute an approximation $J(x)$ of the optimal value $V(x)$ for different values of $N$. In these simulations, the parameters used are $I_{eval,0} := 10$ and $\varepsilon := 10^{-5}$ for SOPMS, and the function $p_{max}(n) := 5\sqrt{n}$ for SOP. This choice of $p_{max}(n)$ comes from the analysis of Remark 4.11. Note that for the SOPMS algorithm, the number of expanded nodes $n$ may differ from the budget $I_{max}$ (while for OP and SOP we always have $n = I_{max}$). The optimal trajectories of the discretized problem, with $N = 20$, are displayed in Figure 2 and compared to the solutions computed by SOP with $I_{max} = 10^4$. From this figure, one can see that the computed solutions are very close to the exact ones. Besides, Figure 3 shows the error $|J(x) - V(x)|$ with respect to the number of expanded nodes $n$ (left graphics) and with respect to the CPU time (right graphics), for the algorithms OP, SOP and SOPMS, in the case $N = 20$. Figure 3 confirms that the SOP algorithm is much more efficient than OP in terms of accuracy and CPU time. We

¹ For the continuous problem, with an initial position $x_0 = (x_{01},0)$, the optimal control law satisfies: $a(t) = 1$ on $[0,t^*[$, and $a(t) = -1$ on $]t^*,T]$, where the switching time $t = t^*$ is a root of the equation $\big((2t-T)-t\big)\big(\tfrac12(T-2t)^2 - t^2 + x_{01}\big) + (2t-T) = 0$. The optimal control sequence $(a_k)_k$ of the discrete problem has the following structure: $a_k = 1$ for $k < k^*$,
