
Consistent Approximations for the Optimal Control of

Constrained Switched Systems---Part 1: A Conceptual Algorithm

The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.

Citation

Vasudevan, Ramanarayan, Humberto Gonzalez, Ruzena Bajcsy, and S. Shankar Sastry. "Consistent Approximations for the Optimal Control of Constrained Switched Systems---Part 1: A Conceptual Algorithm." SIAM J. Control Optim. 51, no. 6 (January 2013): 4463–4483. © 2013, Society for Industrial and Applied Mathematics

As Published

http://dx.doi.org/10.1137/120901490

Publisher

Society for Industrial and Applied Mathematics

Version

Final published version

Citable link

http://hdl.handle.net/1721.1/86125

Terms of Use

Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.


CONSISTENT APPROXIMATIONS FOR THE OPTIMAL CONTROL OF CONSTRAINED SWITCHED SYSTEMS—PART 1:

A CONCEPTUAL ALGORITHM

RAMANARAYAN VASUDEVAN, HUMBERTO GONZALEZ, RUZENA BAJCSY§, AND S. SHANKAR SASTRY§

Abstract. Switched systems, or systems whose control parameters include a continuous-valued input and a discrete-valued input which corresponds to the mode of the system that is active at a particular instance in time, have been shown to be highly effective in modeling a variety of physical phenomena. Unfortunately, the construction of an optimal control algorithm for such systems has proved difficult, since it demands some form of optimal mode scheduling. In a pair of papers, we construct a first order optimization algorithm to address this problem. Our approach, which we prove in this paper converges to local minimizers of the constrained optimal control problem, first relaxes the discrete-valued input, performs traditional optimal control, and then projects the constructed relaxed discrete-valued input back to a pure discrete-valued input by employing an extension of the classical chattering lemma that we formalize. In the second part of this pair of papers, we describe how this conceptual algorithm can be recast as an implementable algorithm that, by recursive application, constructs a sequence of points converging to local minimizers of the optimal control problem for switched systems.

Key words. optimal control, switched systems, chattering lemma, consistent approximations

AMS subject classifications. Primary, 49J21, 49J30, 49J52, 49N25, 93C30; Secondary, 49M25, 90C11, 90C30

DOI. 10.1137/120901490

1. Introduction. Let $Q$ and $\mathcal{J}$ be finite sets and $U \subset \mathbb{R}^m$ a compact, convex, nonempty set. Let $f \colon [0,1] \times \mathbb{R}^n \times U \times Q \to \mathbb{R}^n$ be a vector field over $\mathbb{R}^n$ that defines a switched system, i.e., for each $i \in Q$, the map $(t, x, u) \mapsto f(t, x, u, i)$ is a classical controlled vector field. Let $\{h_j\}_{j \in \mathcal{J}}$ be a collection of functions, $h_j \colon \mathbb{R}^n \to \mathbb{R}$ for each $j \in \mathcal{J}$, and let $h_0 \colon \mathbb{R}^n \to \mathbb{R}$. Given an initial condition $x_0 \in \mathbb{R}^n$, we are interested in solving the following optimal control problem:

(1.1)  $\inf\big\{ h_0\big(x(1)\big) \mid \dot{x}(t) = f\big(t, x(t), u(t), d(t)\big),\ x(0) = x_0,\ h_j\big(x(t)\big) \le 0,$
       $\phantom{\inf\big\{} x(t) \in \mathbb{R}^n,\ u(t) \in U,\ d(t) \in Q, \text{ for a.e. } t \in [0,1],\ \forall j \in \mathcal{J} \big\}.$

The optimization problem defined in (1.1) is numerically challenging because the function $d \colon [0,1] \to Q$ maps to a finite set, rendering the immediate application of derivative-based algorithms impossible. This paper, which is the first in a two-part series, presents a conceptual algorithm, Algorithm 4.1, to solve the problem described in (1.1) and a theorem, Theorem 5.16, proving the convergence of the algorithm to points satisfying a necessary condition for optimality.

Received by the editors December 6, 2012; accepted for publication (in revised form) September 3, 2013; published electronically December 11, 2013. This material is based upon work supported by the National Science Foundation under award ECCS-0931437.
http://www.siam.org/journals/sicon/51-6/90149.html
CSAIL, Massachusetts Institute of Technology, Cambridge, MA 02139 (ramv@csail.mit.edu). ESE, Washington University in St. Louis, St. Louis, MO 63130 (hgonzale@ese.wustl.edu). EECS, University of California at Berkeley, Berkeley, CA 94720 (bajcsy@eecs.berkeley.edu, sastry@eecs.berkeley.edu).

1.1. Related work. Switched systems, a particular class of hybrid systems, have been used in a variety of modeling applications including automobiles and locomotives employing different gears [15, 26], manufacturing systems [8], and situations where a control module switches its attention among a number of subsystems [18, 33]. Given their utility, there has been considerable interest in devising algorithms to perform optimal control for such systems. Since the discrete mode of operation of switched systems is completely controlled, the determination of an optimal control for such systems is challenging due to the combinatorial nature of calculating an optimal discrete mode schedule. The theoretical underpinnings for the optimal control of switched systems have been considered in various forms including a sufficient condition for optimality via quasi-variational inequalities by Branicky, Borkar, and Mitter [7] and extensions of the classical maximum principle for hybrid systems by Piccoli [21], Riedinger, Iung, and Kratz [25], and Sussmann [31].

Implementable algorithms for the optimal control of switched systems have been pursued in numerous forms. Xu and Antsaklis [36] proposed one of the first such algorithms, wherein a bilevel optimization scheme that fixes the mode schedule iterates between a low-level algorithm that optimizes the continuous input within each visited mode and a high-level algorithm that optimizes the time spent within each visited mode. The Gauss pseudospectral optimal control software developed by Rao et al. [24] applies a variant of this approach. Shaikh and Caines [28] modified this bilevel scheme by designing a low-level algorithm that simultaneously optimizes the continuous input and the time spent within each mode for a fixed mode schedule by employing the maximum principle and a high-level algorithm that modifies the mode schedule by using the Hamming distance to compare nearby mode schedules. Xu and Antsaklis [37] and Shaikh and Caines [29] each extended their approach to systems with state-dependent switching. Alamir and Attia [1] proposed a scheme based on the maximum principle. Considering the algorithm constructed in this paper, the most relevant approach is the one described by Bengea and DeCarlo [4], which relaxes the discrete-valued input into a continuous-valued input over which it can apply the maximum principle to perform optimal control and then uses the chattering lemma [5] to approximate the optimal relaxation with another discrete-valued input. However, no constructive method to generate such an approximation is included. Importantly, the numerical implementation of optimal control algorithms that rely on the maximum principle for nonlinear systems, including switched systems, is fundamentally restricted due to their reliance on approximating needle variations with arbitrary precision, as explained in [20].

Several authors [9, 16] have focused on the optimization of autonomous switched systems (i.e., switched systems without a continuous input) by fixing a mode sequence and optimizing the amount of time spent within each mode using implementable vector space variations. Axelsson et al. [3] extended this approach by employing a variant of the bilevel optimization strategy proposed in [28]. In the low level they optimized the amount of time spent within each mode using a first order algorithm, and in the high level they modified the mode sequence by employing a single-mode insertion technique. There have been two major extensions to this result. First, Wardi and Egerstedt [35] extended the approach by constructing an algorithm that allowed for the simultaneous insertion of several modes, each for a nonzero amount of time. This reformulation transformed the bilevel optimization strategy into a single step optimization algorithm but required exact computation in order to prove convergence. Wardi, Egerstedt, and Twu [34] addressed this deficiency by constructing an adaptive-precision algorithm that balanced the requirement for precision with limitations on computation. Second, Gonzalez et al. [12, 13] extended the bilevel optimization approach to constrained nonautonomous switched systems. Though these insertion techniques avoid the computational expense of considering all possible mode schedules during optimization, they restrict the possible modifications of the existing mode schedule, which may terminate the execution in undesired stationary points.

1.2. Our contribution and organization. In this paper, we devise and implement a conceptual first order algorithm for the optimal control of constrained nonlinear switched systems. In sections 2 and 3, we introduce the notation and assumptions used throughout the paper and formalize the optimization problem we solve. Our approach, which is formulated in section 4, computes a step by treating the discrete-valued input as a continuous-valued one and then uses a gradient descent technique on this continuous-valued input. Using an extension of the chattering lemma which we develop, a discrete-valued approximation to the output of the gradient descent algorithm is constructed. In section 5, we prove that the sequence of points generated by recursive application of our algorithm asymptotically satisfies a necessary condition for optimality. The second part of this two-paper series [32] describes an implementable version of our conceptual algorithm and includes several benchmarking examples to illustrate our approach.

2. Notation. We present a summary of the notation used throughout the paper.

$\|\cdot\|_p$ : standard vector space $p$-norm
$\|\cdot\|_{i,p}$ : matrix induced $p$-norm
$\|\cdot\|_{L^p}$ : functional $L^p$-norm
$Q = \{1, \dots, q\}$ : set of discrete mode indices (section 1)
$\mathcal{J} = \{1, \dots, N_c\}$ : set of constraint function indices (section 1)
$U \subset \mathbb{R}^m$ : compact, convex, nonempty set (section 1)
$\Sigma^q_r$ : $q$-simplex in $\mathbb{R}^q$ (equation (3.5))
$\Sigma^q_p = \{e_i\}_{i \in Q}$ : corners of the $q$-simplex in $\mathbb{R}^q$ (equation (3.6))
$\mathcal{D}_r$ : relaxed discrete input space (section 3.2)
$\mathcal{D}_p$ : pure discrete input space (section 3.2)
$\mathcal{U}$ : continuous input space (section 3.2)
$(\mathcal{X}, \|\cdot\|_{\mathcal{X}})$ : general optimization space and its norm (section 3.2)
$\mathcal{X}_r$ : relaxed optimization space (section 3.2)
$\mathcal{X}_p$ : pure optimization space (section 3.2)
$\mathcal{N}(\xi, \varepsilon)$ : $\varepsilon$-ball around $\xi$ in the $\mathcal{X}$-norm (Definition 4.1)
$\mathcal{N}_w(\xi, \varepsilon)$ : $\varepsilon$-ball around $\xi$ in the weak topology (Definition 4.3)
$d_i$ or $[d]_i$ : $i$th coordinate of $d \colon [0,1] \to \mathbb{R}^q$
$f \colon [0,1] \times \mathbb{R}^n \times U \times Q \to \mathbb{R}^n$ : switched system (section 1)
$V \colon L^2([0,1], \mathbb{R}^n) \to \mathbb{R}$ : total variation operator (equation (3.3))
$h_0 \colon \mathbb{R}^n \to \mathbb{R}$ : terminal cost function (section 3.3)
$\{h_j \colon \mathbb{R}^n \to \mathbb{R}\}_{j \in \mathcal{J}}$ : collection of constraint functions (section 3.3)
$x^{(\xi)} \colon [0,1] \to \mathbb{R}^n$ : trajectory of the system given $\xi$ (equation (3.8))
$\phi_t \colon \mathcal{X}_r \to \mathbb{R}^n$ : flow of the system at $t \in [0,1]$ (equation (3.9))
$J \colon \mathcal{X}_r \to \mathbb{R}$ : cost function (equation (3.10))
$\Psi \colon \mathcal{X}_r \to \mathbb{R}$ : constraint function (equation (3.11))
$\psi_{j,t} \colon \mathcal{X}_r \to \mathbb{R}$ : component constraint functions (equation (3.12))
$P_p$ : optimal control problem (equation (3.13))
$P_r$ : relaxed optimal control problem (equation (4.1))
$D$ : derivative operator (equation (4.2))
$\theta \colon \mathcal{X}_p \to (-\infty, 0]$ : optimality function (equation (4.3))
$g \colon \mathcal{X}_p \to \mathcal{X}_r$ : descent direction function (equation (4.3))
$\zeta \colon \mathcal{X}_p \times \mathcal{X}_r \to \mathbb{R}$ : optimality function objective (equation (4.4))
$\mu \colon \mathcal{X}_p \to \mathbb{N}$ : step size function (equation (4.5))
$\mathbf{1}_A \colon [0,1] \to \mathbb{R}$ : indicator function of $A \subset [0,1]$, $\mathbf{1} = \mathbf{1}_{[0,1]}$ (section 4.4)
$\mathcal{F}_N \colon \mathcal{D}_r \to \mathcal{D}_r$ : Haar wavelet operator (equation (4.7))
$\mathcal{P}_N \colon \mathcal{D}_r \to \mathcal{D}_p$ : pulse-width modulation operator (equation (4.8))
$\rho_N \colon \mathcal{X}_r \to \mathcal{X}_p$ : projection operator (equation (4.9))
$\nu \colon \mathcal{X}_p \to \mathbb{N}$ : frequency modulation function (equation (4.10))
$\Gamma \colon \mathcal{X}_p \to \mathcal{X}_p$ : algorithm recursion function (equation (4.11))
$\Phi^{(\xi)} \colon [0,1] \times [0,1] \to \mathbb{R}^{n \times n}$ : state transition matrix given $\xi$ (equation (5.2))

3. Preliminaries. We begin by formulating the problem we solve in this paper.

3.1. Norms and functional spaces. We focus on the optimization of functions with finite $L^2$-norm and bounded variation. Given $n \in \mathbb{N}$, $f \colon [0,1] \to \mathbb{R}^n$ belongs to $L^2([0,1], \mathbb{R}^n)$ if

(3.1)  $\|f\|_{L^2} = \Big( \int_0^1 \|f(t)\|_2^2 \, dt \Big)^{1/2} < \infty.$

Similarly, $f \colon [0,1] \to \mathbb{R}^n$ belongs to $L^\infty([0,1], \mathbb{R}^n)$ if

(3.2)  $\|f\|_{L^\infty} = \inf\big\{ \alpha \ge 0 \mid \|f(x)\|_2 \le \alpha \text{ for almost every } x \in [0,1] \big\} < \infty.$

Both definitions are formulated with respect to the Lebesgue measure on $[0,1]$. In order to define the space of functions of bounded variation, we first define the total variation of a function. Given $\mathcal{P}$, the set of all finite partitions of $[0,1]$, we define the total variation of $f \colon [0,1] \to \mathbb{R}^n$ by

(3.3)  $V(f) = \sup\Big\{ \sum_{j=0}^{m-1} \|f(t_{j+1}) - f(t_j)\|_1 \;\Big|\; \{t_k\}_{k=0}^m \in \mathcal{P},\ m \in \mathbb{N} \Big\}.$

Note that the total variation of $f$ is a seminorm, i.e., it satisfies the triangle inequality but does not separate points. $f$ is of bounded variation if $V(f) < \infty$, and we define $BV([0,1], \mathbb{R}^n)$ to be the set of all functions of bounded variation from $[0,1]$ to $\mathbb{R}^n$.

The following result, which is an extension of Propositions X.1.3 and X.1.4 in [17] to vector-valued functions, is critical to our analysis of functions of bounded variation.

Proposition 3.1. If $f \in BV([0,1], \mathbb{R}^n)$ and $v \colon [0,1] \to \mathbb{R}$ is bounded and differentiable almost everywhere, then

(3.4)  $\Big\| \int_0^1 f(t)\, \dot{v}(t)\, dt \Big\| \le \|v\|_{L^\infty}\, V(f).$
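On the partition induced by a uniform sample grid, the supremand in (3.3) becomes a finite sum of 1-norm increments; any such sum lower-bounds $V(f)$, and for a piecewise-constant input sampled finely enough it recovers it exactly. A minimal numerical sketch (the helper name, grid, and example input are illustrative, not from the paper):

```python
import numpy as np

def total_variation(samples: np.ndarray) -> float:
    """Discrete total variation of a sampled f: [0, 1] -> R^n.

    Approximates V(f) in (3.3) using the partition induced by the sample
    grid: the sum of ||f(t_{j+1}) - f(t_j)||_1 over consecutive samples.
    """
    diffs = np.diff(samples, axis=0)   # f(t_{j+1}) - f(t_j) for each j
    return float(np.abs(diffs).sum())  # 1-norm increments, summed

# A pure discrete input switching mode 1 -> 2 -> 1: each switch moves both
# simplex coordinates by 1, so V(d) = 2 switches * 2 = 4.
t = np.linspace(0.0, 1.0, 101)
d = np.stack([(t < 0.3) | (t >= 0.7), (t >= 0.3) & (t < 0.7)], axis=1).astype(float)
print(total_variation(d))  # 4.0
```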

3.2. Optimization spaces. To formalize the optimal control problem, we define three spaces: the pure discrete input space, $\mathcal{D}_p$, the relaxed discrete input space, $\mathcal{D}_r$, and the continuous input space, $\mathcal{U}$. Let the $q$-simplex, $\Sigma^q_r$, be defined as

(3.5)  $\Sigma^q_r = \Big\{ (d^1, \dots, d^q) \in [0,1]^q \;\Big|\; \sum_{i=1}^q d^i = 1 \Big\},$

and let the corners of the $q$-simplex, $\Sigma^q_p$, be defined as

(3.6)  $\Sigma^q_p = \Big\{ (d^1, \dots, d^q) \in \{0,1\}^q \;\Big|\; \sum_{i=1}^q d^i = 1 \Big\}.$

Note that $\Sigma^q_p \subset \Sigma^q_r$. Since there are exactly $q$ elements in $\Sigma^q_p$, we write $\Sigma^q_p = \{e_1, \dots, e_q\}$. We employ the elements of $\Sigma^q_p$ to index the vector fields defined in section 1, i.e., for each $i \in Q$ we write $f(\cdot, \cdot, \cdot, e_i)$ for $f(\cdot, \cdot, \cdot, i)$. Also note that each $d \in \Sigma^q_r$ satisfies $d = \sum_{i=1}^q d_i e_i$.

Using this notation, we define the pure discrete input space $\mathcal{D}_p = L^2([0,1], \Sigma^q_p) \cap BV([0,1], \Sigma^q_p)$, the relaxed discrete input space $\mathcal{D}_r = L^2([0,1], \Sigma^q_r) \cap BV([0,1], \Sigma^q_r)$, and the continuous input space $\mathcal{U} = L^2([0,1], U) \cap BV([0,1], U)$. Let the general optimization space be $\mathcal{X} = L^2([0,1], \mathbb{R}^m) \times L^2([0,1], \mathbb{R}^q)$ endowed with the norm $\|\xi\|_{\mathcal{X}} = \|u\|_{L^2} + \|d\|_{L^2}$ for each $\xi = (u, d) \in \mathcal{X}$. We combine $\mathcal{U}$ and $\mathcal{D}_p$ to define the pure optimization space, $\mathcal{X}_p = \mathcal{U} \times \mathcal{D}_p$, and we endow it with the same norm as $\mathcal{X}$. Similarly, we combine $\mathcal{U}$ and $\mathcal{D}_r$ to define the relaxed optimization space, $\mathcal{X}_r = \mathcal{U} \times \mathcal{D}_r$, also endowed with the $\mathcal{X}$-norm. Note that $\mathcal{X}_p \subset \mathcal{X}_r \subset \mathcal{X}$. We emphasize that our input spaces are subsets of $L^2$ to motivate the appropriateness of the $\mathcal{X}$-norm.

3.3. Trajectories, cost, constraint, and the optimal control problem. Given $\xi = (u, d) \in \mathcal{X}_r$, for convenience throughout the paper we let

(3.7)  $f\big(t, x(t), u(t), d(t)\big) = \sum_{i=1}^q d_i(t)\, f\big(t, x(t), u(t), e_i\big).$

Note that for points belonging to the pure optimization space, the bounded variation assumption restricts the number of switches between modes to be finite.
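At a corner $e_i$ of the simplex the sum in (3.7) reduces to the single mode $i$, while interior points blend the modes. A minimal sketch with two invented scalar modes (the dynamics and function names are illustrative, not from the paper):

```python
def f_mode(t, x, u, i):
    """Two illustrative scalar modes (not from the paper)."""
    return (-x + u) if i == 0 else (2.0 * x)

def f_relaxed(t, x, u, d):
    """Relaxed vector field (3.7): convex combination of the mode fields."""
    assert abs(sum(d) - 1.0) < 1e-12 and all(di >= 0 for di in d)
    return sum(d[i] * f_mode(t, x, u, i) for i in range(len(d)))

# At a corner of the simplex, (3.7) recovers the corresponding pure mode.
print(f_relaxed(0.0, 1.0, 0.5, (1.0, 0.0)))  # f_mode(..., 0) = -0.5
print(f_relaxed(0.0, 1.0, 0.5, (0.5, 0.5)))  # equal blend of -0.5 and 2.0: 0.75
```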

We make the following assumptions about the dynamics.

Assumption 3.2. For each $i \in Q$ and $t \in [0,1]$, the map $(x, u) \mapsto f(t, x, u, e_i)$ is differentiable. Also, for each $i \in Q$ the vector field and its partial derivatives are Lipschitz continuous, i.e., there exists $L > 0$ such that given $t_1, t_2 \in [0,1]$, $x_1, x_2 \in \mathbb{R}^n$, and $u_1, u_2 \in U$,

(1) $\|f(t_1, x_1, u_1, e_i) - f(t_2, x_2, u_2, e_i)\|_2 \le L\big(|t_1 - t_2| + \|x_1 - x_2\|_2 + \|u_1 - u_2\|_2\big)$,
(2) $\big\|\frac{\partial f}{\partial x}(t_1, x_1, u_1, e_i) - \frac{\partial f}{\partial x}(t_2, x_2, u_2, e_i)\big\|_{i,2} \le L\big(|t_1 - t_2| + \|x_1 - x_2\|_2 + \|u_1 - u_2\|_2\big)$,
(3) $\big\|\frac{\partial f}{\partial u}(t_1, x_1, u_1, e_i) - \frac{\partial f}{\partial u}(t_2, x_2, u_2, e_i)\big\|_{i,2} \le L\big(|t_1 - t_2| + \|x_1 - x_2\|_2 + \|u_1 - u_2\|_2\big)$.

Given $x_0 \in \mathbb{R}^n$, a trajectory of the system corresponding to $\xi \in \mathcal{X}_r$ is the solution to

(3.8)  $\dot{x}(t) = f\big(t, x(t), u(t), d(t)\big) \text{ for a.e. } t \in [0,1], \quad x(0) = x_0,$

and we denote it by $x^{(\xi)} \colon [0,1] \to \mathbb{R}^n$ to emphasize its dependence on $\xi$. We also define the flow of the system, $\phi_t \colon \mathcal{X}_r \to \mathbb{R}^n$ for each $t \in [0,1]$, as

(3.9)  $\phi_t(\xi) = x^{(\xi)}(t).$

The next result is a straightforward extension of the classical existence and uniqueness theorem for nonlinear differential equations (see Lemma 5.6.3 and Proposition 5.6.5 in [22]).

Theorem 3.3. For each $\xi = (u, d) \in \mathcal{X}$, differential equation (3.8) has a unique solution. Moreover, there exists $C > 0$ such that for each $\xi \in \mathcal{X}_r$ and $t \in [0,1]$, $\|x^{(\xi)}(t)\|_2 \le C$.
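A forward-Euler sketch of the trajectory in (3.8) for an illustrative two-mode scalar system; the dynamics, step count, and helper names are invented for the example, and Euler integration is only a crude numerical stand-in for the existence result above:

```python
import numpy as np

def simulate(f, u, d, x0, steps=1000):
    """Forward-Euler approximation of the trajectory x^(xi) in (3.8).

    f(t, x, u_t, i) is mode i's vector field; d(t) returns a point on the
    simplex, so the right-hand side is the convex combination in (3.7).
    """
    dt, x, traj = 1.0 / steps, float(x0), [float(x0)]
    for k in range(steps):
        t = k * dt
        dvec = d(t)
        x += dt * sum(dvec[i] * f(t, x, u(t), i) for i in range(len(dvec)))
        traj.append(x)
    return np.array(traj)

# Illustrative modes: dx = -x in mode 0, dx = +x in mode 1, no continuous input.
f = lambda t, x, u_t, i: (-x, x)[i]
# Pure input: mode 1 on [0, 1/2), mode 0 on [1/2, 1].
traj = simulate(f, u=lambda t: 0.0,
                d=lambda t: (0.0, 1.0) if t < 0.5 else (1.0, 0.0), x0=1.0)
print(traj[-1])  # growth then symmetric decay: approximately e^{1/2} e^{-1/2} = 1
```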

To define the cost function, we assume that we are given a terminal cost function, $h_0 \colon \mathbb{R}^n \to \mathbb{R}$. The cost function, $J \colon \mathcal{X}_r \to \mathbb{R}$, for the optimal control problem is then defined as

(3.10)  $J(\xi) = h_0\big(x^{(\xi)}(1)\big).$

If the problem formulation includes a running cost, then one can extend the existing state vector by introducing a new state and modifying the cost function to evaluate this new state at the final time, as shown in section 4.1.2 in [22]. Employing this modification, observe that each mode of the switched system can have a different running cost associated with it.

Given $\xi \in \mathcal{X}_r$, the state $x^{(\xi)}$ is said to satisfy the constraint if $h_j\big(x^{(\xi)}(t)\big) \le 0$ for each $t \in [0,1]$ and each $j \in \mathcal{J}$. We compactly describe all the constraints by defining the constraint function $\Psi \colon \mathcal{X}_r \to \mathbb{R}$ by

(3.11)  $\Psi(\xi) = \max\big\{ h_j\big(x^{(\xi)}(t)\big) \mid t \in [0,1],\ j \in \mathcal{J} \big\},$

since $h_j\big(x^{(\xi)}(t)\big) \le 0$ for each $t$ and $j$ if and only if $\Psi(\xi) \le 0$. To ensure clarity in the ensuing analysis, it is useful to sometimes emphasize the dependence of $h_j\big(x^{(\xi)}(t)\big)$ on $\xi$. Therefore, we define component constraint functions $\psi_{j,t} \colon \mathcal{X}_r \to \mathbb{R}$ for each $t \in [0,1]$ and $j \in \mathcal{J}$ as

(3.12)  $\psi_{j,t}(\xi) = h_j\big(\phi_t(\xi)\big).$

With these definitions, we can state the optimal control problem, which is a restatement of the problem defined in (1.1):

(3.13)  $\inf\big\{ J(\xi) \mid \Psi(\xi) \le 0,\ \xi \in \mathcal{X}_p \big\}. \qquad (P_p)$

We introduce the following assumption to ensure the well-posedness of $P_p$.

Assumption 3.4. The functions $h_0$ and $\{h_j\}_{j \in \mathcal{J}}$, and their derivatives, are Lipschitz continuous, i.e., there exists $L > 0$ such that given $x_1, x_2 \in \mathbb{R}^n$,

(1) $|h_0(x_1) - h_0(x_2)| \le L\,\|x_1 - x_2\|_2$,
(2) $\big\|\frac{\partial h_0}{\partial x}(x_1) - \frac{\partial h_0}{\partial x}(x_2)\big\|_2 \le L\,\|x_1 - x_2\|_2$,
(3) $|h_j(x_1) - h_j(x_2)| \le L\,\|x_1 - x_2\|_2$ for each $j \in \mathcal{J}$,
(4) $\big\|\frac{\partial h_j}{\partial x}(x_1) - \frac{\partial h_j}{\partial x}(x_2)\big\|_2 \le L\,\|x_1 - x_2\|_2$ for each $j \in \mathcal{J}$.
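On a sampled trajectory, (3.11) is simply a maximum over the constraint functions and the sample times; a sketch (the box constraints and the sampled trajectories below are illustrative, not from the paper):

```python
import numpy as np

def psi(traj, constraints):
    """Constraint function (3.11) evaluated on a sampled trajectory: the
    worst value of any h_j at any sample time. Feasible iff the result <= 0."""
    return max(h(x) for h in constraints for x in traj)

# Illustrative constraints: keep the scalar state inside [-1, 1], encoded as
# h_1(x) = x - 1 <= 0 and h_2(x) = -x - 1 <= 0.
constraints = [lambda x: x - 1.0, lambda x: -x - 1.0]
feasible = np.linspace(-0.5, 0.5, 50)    # stand-in trajectory staying inside
infeasible = np.linspace(0.0, 1.2, 50)   # stand-in trajectory leaving the box
print(psi(feasible, constraints))    # -0.5, i.e. feasible with margin 0.5
print(psi(infeasible, constraints))  # positive (about 0.2), i.e. infeasible
```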

4. Optimization algorithm. In this section, we describe our optimization algorithm to solve $P_p$, which proceeds as follows: first, we treat a given pure discrete input as a relaxed discrete input by allowing it to belong to $\mathcal{D}_r$; second, we perform optimal control over the relaxed optimization space; and finally, we project the computed relaxed input into a pure input. We begin with a brief digression to explain why such a roundabout construction is required in order to devise a first order numerical optimal control scheme to solve $P_p$ defined in (3.13).

4.1. Directional derivatives. To appreciate why the construction of a numerical scheme to find the local minima of $P_p$ defined in (3.13) is difficult, suppose that the optimization in the problem took place over the relaxed optimization space rather than the pure optimization space. The relaxed optimal control problem is then defined as

(4.1)  $\inf\big\{ J(\xi) \mid \Psi(\xi) \le 0,\ \xi \in \mathcal{X}_r \big\}. \qquad (P_r)$

The local minimizers of this problem are then defined as follows.

Definition 4.1. Let an $\varepsilon$-ball in the $\mathcal{X}$-norm around $\xi$ be defined as $\mathcal{N}(\xi, \varepsilon) = \big\{ \bar\xi \in \mathcal{X}_r \mid \|\xi - \bar\xi\|_{\mathcal{X}} < \varepsilon \big\}$. A point $\xi \in \mathcal{X}_r$ is a local minimizer of $P_r$ if $\Psi(\xi) \le 0$ and there exists $\varepsilon > 0$ such that $J(\hat\xi) \ge J(\xi)$ for each $\hat\xi \in \mathcal{N}(\xi, \varepsilon) \cap \big\{ \bar\xi \in \mathcal{X}_r \mid \Psi(\bar\xi) \le 0 \big\}$.

To concretize how to formulate an algorithm to find minimizers of $P_r$, we introduce some additional notation. Given $\xi \in \mathcal{X}_r$ and any function $G \colon \mathcal{X}_r \to \mathbb{R}^n$, the directional derivative of $G$ at $\xi$, denoted $DG(\xi; \cdot) \colon \mathcal{X} \to \mathbb{R}^n$, is computed as

(4.2)  $DG(\xi; \xi') = \lim_{\lambda \downarrow 0} \frac{G(\xi + \lambda\xi') - G(\xi)}{\lambda}$

when the limit exists. Using the directional derivative $DJ(\xi; \xi')$, the first order approximation of the cost at $\xi$ in the $\xi' \in \mathcal{X}$ direction is $J(\xi + \lambda\xi') \approx J(\xi) + \lambda\, DJ(\xi; \xi')$ for $\lambda$ sufficiently small. Similarly, using the directional derivative $D\psi_{j,t}(\xi; \xi')$ one can construct a first order approximation of $\psi_{j,t}$ at $\xi$ in the $\xi' \in \mathcal{X}$ direction.

It follows from the first order approximation of the cost that if $DJ(\xi; \xi')$ is negative, then it is possible to decrease the cost by moving in the $\xi'$ direction. In an identical fashion, one can argue that if $\xi$ is infeasible and $D\psi_{j,t}(\xi; \xi')$ is negative, then it is possible to reduce the infeasibility of $\psi_{j,t}$ by moving in the $\xi'$ direction. Employing these observations, one can utilize the directional derivatives of the cost and component constraints to construct both a descent direction and a necessary condition for optimality for $P_r$. Unfortunately, trying to exploit the same strategy for $P_p$ is untenable, since it is unclear how to define directional derivatives for $\mathcal{D}_p$, which is not a vector space. However, as we describe next, in an appropriate topology over $\mathcal{X}_p$ there exists a way to exploit the directional derivatives of the cost and component constraints in $\mathcal{X}_r$ while reasoning about the optimality of points in $\mathcal{X}_p$ with respect to $P_p$.
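The one-sided limit in (4.2) can be probed with a difference quotient; a sketch on a finite-dimensional stand-in for the cost (the quadratic `G` and the helper name are illustrative, not the paper's flow-based cost):

```python
def directional_derivative(G, xi, dxi, lam=1e-6):
    """One-sided difference quotient approximating DG(xi; dxi) from (4.2)."""
    return (G([x + lam * dx for x, dx in zip(xi, dxi)]) - G(xi)) / lam

# Illustrative cost on a two-coordinate stand-in for the input:
# G(xi) = xi_0^2 + xi_1, so DG(xi; dxi) = 2*xi_0*dxi_0 + dxi_1.
G = lambda xi: xi[0] ** 2 + xi[1]
d = directional_derivative(G, [1.0, 0.0], [-1.0, 0.0])
print(d)  # close to the analytic value 2*1*(-1) = -2: a descent direction
```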

4.2. The weak topology on the optimization space and local minimizers. To motivate the type of required relationship, we begin by describing the chattering lemma.

Theorem 4.2 (Theorem 4.3 in [5]¹). For each $\xi_r \in \mathcal{X}_r$ and $\varepsilon > 0$ there exists a $\xi_p \in \mathcal{X}_p$ such that for each $t \in [0,1]$, $\|\phi_t(\xi_r) - \phi_t(\xi_p)\|_2 \le \varepsilon$.

Theorem 4.2 says that any trajectory produced by an element of $\mathcal{X}_r$ can be approximated arbitrarily well by a point in $\mathcal{X}_p$. Unfortunately, the relaxed and pure points prescribed as in Theorem 4.2 need not be near one another in the metric induced by the $\mathcal{X}$-norm. However, these points can be made arbitrarily close in a different topology.

Definition 4.3. The weak topology on $\mathcal{X}_r$ induced by differential equation (3.8) is the smallest topology on $\mathcal{X}_r$ such that the map $\xi \mapsto x^{(\xi)}$ is continuous with respect to the $L^2$-norm. Moreover, an $\varepsilon$-ball in the weak topology around $\xi$ is defined as $\mathcal{N}_w(\xi, \varepsilon) = \big\{ \bar\xi \in \mathcal{X}_r \mid \|x^{(\xi)} - x^{(\bar\xi)}\|_{L^2} < \varepsilon \big\}$.

More details about the weak topology of a space can be found in section 3.8 in [27]. To understand the relationship between $\mathcal{N}$ and $\mathcal{N}_w$, observe that $\phi_t$ is Lipschitz continuous for all $t \in [0,1]$, say, with constant $L > 0$, as is proved in Lemma 5.1. It follows that for any $\xi \in \mathcal{X}_r$ and $\varepsilon > 0$, if $\bar\xi \in \mathcal{N}(\xi, \varepsilon)$, then $\bar\xi \in \mathcal{N}_w(\xi, L\varepsilon)$. However,

¹The theorem as proved in [5] is not immediately applicable to switched systems, but an extension proved in Theorem 1 in [4] makes it applicable. A particular version of this existence theorem can also be found in Lemma 1 in [30].

the converse is not true in general. Since the weak topology, in contrast to the $\mathcal{X}$-norm induced topology, places points that generate nearby trajectories next to one another, and using the fact that $\mathcal{X}_p \subset \mathcal{X}_r$, we can define the concept of local minimizer for $P_p$.

Definition 4.4. We say that $\xi \in \mathcal{X}_p$ is a local minimizer of $P_p$ if $\Psi(\xi) \le 0$ and there exists $\varepsilon > 0$ such that $J(\hat\xi) \ge J(\xi)$ for each $\hat\xi \in \mathcal{N}_w(\xi, \varepsilon) \cap \big\{ \bar\xi \in \mathcal{X}_p \mid \Psi(\bar\xi) \le 0 \big\}$.

With this definition of local minimizer, we can exploit Theorem 4.2, as an existence result, along with the notion of directional derivative over the relaxed optimization space to construct a necessary condition for optimality for $P_p$.

4.3. An optimality condition. Motivated by the approach in [22], we define an optimality function, $\theta \colon \mathcal{X}_p \to (-\infty, 0]$, which determines whether a given point is a local minimizer of $P_p$, and a corresponding descent direction, $g \colon \mathcal{X}_p \to \mathcal{X}_r$:

(4.3)  $\theta(\xi) = \min_{\xi' \in \mathcal{X}_r} \zeta(\xi, \xi') + V(\xi' - \xi), \qquad g(\xi) = \arg\min_{\xi' \in \mathcal{X}_r} \zeta(\xi, \xi') + V(\xi' - \xi),$

where

(4.4)  $\zeta(\xi, \xi') = \begin{cases} \max\Big\{ \max\limits_{(j,t) \in \mathcal{J} \times [0,1]} D\psi_{j,t}(\xi; \xi' - \xi) + \gamma\Psi(\xi),\ DJ(\xi; \xi' - \xi) \Big\} + \|\xi' - \xi\|_{\mathcal{X}} & \text{if } \Psi(\xi) \le 0, \\ \max\Big\{ \max\limits_{(j,t) \in \mathcal{J} \times [0,1]} D\psi_{j,t}(\xi; \xi' - \xi),\ DJ(\xi; \xi' - \xi) - \Psi(\xi) \Big\} + \|\xi' - \xi\|_{\mathcal{X}} & \text{if } \Psi(\xi) > 0, \end{cases}$

with $\gamma > 0$ a design parameter, and where, without loss of generality, for each $\xi = (u, d) \in \mathcal{X}_r$, $V(\xi) = V(u) + V(d)$. The well-posedness of $\theta(\xi)$ and $g(\xi)$ for each $\xi \in \mathcal{X}_p$ follows from Theorem 5.5. Note that $\theta(\xi) \le 0$ for each $\xi \in \mathcal{X}_p$, as we show in Theorem 5.6. Also note that the directional derivatives in (4.4) consider directions $\xi' - \xi$ in order to ensure that they belong to $\mathcal{X}_r$. Our definition of $\zeta$ does not include the total variation term since it does not satisfy the same properties as the directional derivatives or the $\mathcal{X}$-norm, such as Lipschitz continuity (as shown in Lemma 5.4). In fact, the total variation term is added in order to make sure that $g(\xi) - \xi$ is of bounded variation.

To understand how the optimality function behaves, consider several cases. First, if $\theta(\xi) < 0$ and $\Psi(\xi) = 0$, then there exists a $\xi' \in \mathcal{X}_r$ such that both $DJ(\xi; \xi' - \xi)$ and $D\psi_{j,t}(\xi; \xi' - \xi)$ are negative for all $j \in \mathcal{J}$ and $t \in [0,1]$; thus, using the first order approximation argument in section 4.1 and Theorem 4.2, $\xi$ is not a local minimizer of $P_p$. Second, if $\theta(\xi) < 0$ and $\Psi(\xi) < 0$, an identical argument shows that $\xi$ is not a local minimizer of $P_p$. Finally, if $\theta(\xi) < 0$ and $\Psi(\xi) > 0$, then there exists a $\xi' \in \mathcal{X}_r$ such that $D\psi_{j,t}(\xi; \xi' - \xi)$ is negative for all $j \in \mathcal{J}$ and $t \in [0,1]$. It is clear that $\xi$ is not a local minimizer of $P_p$ since $\Psi(\xi) > 0$, but it also follows by the first order approximation argument in section 4.1 and Theorem 4.2 that it is possible to locally reduce the infeasibility of $\xi$. In this case, the addition of the $DJ$ term in $\zeta$ is a heuristic to ensure that the reduction in infeasibility does not come at the price of an undue increase in the cost.

These observations are formalized in Theorem 5.6, where we prove that if $\xi$ is a local minimizer of $P_p$ as in Definition 4.4, then $\theta(\xi) = 0$, i.e., $\theta(\xi) = 0$ is a necessary condition for the optimality of $\xi$. Recall that in first order algorithms for finite-dimensional optimization the lack of negative directional derivatives at a point is a necessary condition for optimality of a point (for example, see Corollary 1.1.3 in [22]). Intuitively, $\theta$ is the infinite-dimensional analogue for $P_p$ of the directional derivative for finite-dimensional optimization. Hence, if $\theta(\xi) = 0$ we say that $\xi$ satisfies the optimality condition.

4.4. Choosing a step size and projecting the relaxed discrete input. Theorem 4.2 is unable to describe how to exploit the descent direction, $g(\xi)$, since its proof provides no means to construct a pure input that approximates the behavior of a relaxed input while controlling the quality of the approximation. We extend Theorem 4.2 by devising a scheme that allows for the development of a numerical optimal control algorithm for $P_p$ that first performs optimal control over the relaxed optimization space and then projects the computed relaxed control into a pure control. The argument that minimizes $\zeta$ is a "direction" along which to move the inputs in order to reduce the cost in the relaxed optimization space, but first we require an algorithm to choose a step size. We employ a line search algorithm similar to the traditional Armijo algorithm used in finite-dimensional optimization in order to choose a step size [2]. Let $\alpha, \beta \in (0, 1)$; then the step size for $\xi$ is chosen as $\beta^{\mu(\xi)}$, where the step size function, $\mu \colon \mathcal{X}_p \to \mathbb{N}$, is defined as

(4.5)  $\mu(\xi) = \begin{cases} \min\big\{ k \in \mathbb{N} \mid J\big(\xi + \beta^k(g(\xi) - \xi)\big) - J(\xi) \le \alpha\beta^k\theta(\xi),\ \Psi\big(\xi + \beta^k(g(\xi) - \xi)\big) \le \alpha\beta^k\theta(\xi) \big\} & \text{if } \Psi(\xi) \le 0, \\ \min\big\{ k \in \mathbb{N} \mid \Psi\big(\xi + \beta^k(g(\xi) - \xi)\big) - \Psi(\xi) \le \alpha\beta^k\theta(\xi) \big\} & \text{if } \Psi(\xi) > 0. \end{cases}$
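The cost condition in the feasible branch of (4.5) is classical Armijo backtracking with sufficient-decrease threshold $\alpha\beta^k\theta(\xi)$. A minimal sketch of that condition alone, on a scalar stand-in problem (the helper `armijo_step` and the toy objective are illustrative, not the paper's infinite-dimensional setting, and the constraint test of (4.5) is omitted):

```python
def armijo_step(J, xi, direction, theta, alpha=0.5, beta=0.7, max_k=100):
    """Backtracking rule patterned on the cost test in (4.5): the smallest
    k with J(xi + beta^k * direction) - J(xi) <= alpha * beta^k * theta,
    where theta < 0 plays the role of the optimality function value."""
    J0 = J(xi)
    for k in range(max_k):
        step = beta ** k
        if J(xi + step * direction) - J0 <= alpha * step * theta:
            return k
    raise RuntimeError("no acceptable step found (is theta < 0?)")

# Illustrative scalar problem: J(x) = x^2 at x = 1, descent direction -1,
# with theta = -2 standing in for theta(xi). With alpha = 0.6 the full step
# is rejected and one backtracking halving (well, beta-ing) is needed.
k = armijo_step(lambda x: x * x, 1.0, -1.0, theta=-2.0, alpha=0.6)
print(k, 0.7 ** k)  # accepted exponent and the resulting step size
```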

Lemma 5.13 proves that if $\theta(\xi) < 0$ for some $\xi \in \mathcal{X}_p$, then $\mu(\xi) < \infty$, which means that we can construct a point $\xi + \beta^{\mu(\xi)}(g(\xi) - \xi) \in \mathcal{X}_r$ that produces a reduction in the cost (if $\xi$ is feasible) or a reduction in the infeasibility (if $\xi$ is infeasible).

Now, we define a projection that takes this constructed point in $\mathcal{X}_r$ to a point belonging to $\mathcal{X}_p$ while controlling the quality of approximation in two steps. First, let

$\lambda \colon \mathbb{R} \to \mathbb{R}$ be the Haar wavelet (section 7.2.2 in [19]):

(4.6)  $\lambda(t) = \begin{cases} 1 & \text{if } t \in \big[0, \tfrac{1}{2}\big), \\ -1 & \text{if } t \in \big[\tfrac{1}{2}, 1\big), \\ 0 & \text{otherwise.} \end{cases}$

Let $\mathbf{1}_A \colon [0,1] \to \mathbb{R}$ be the indicator function of $A \subset [0,1]$, and let $\mathbf{1} = \mathbf{1}_{[0,1]}$. Given $k \in \mathbb{N}$ and $j \in \{0, \dots, 2^k - 1\}$, let $b_{k,j} \colon [0,1] \to \mathbb{R}$ be defined by $b_{k,j}(t) = \lambda(2^k t - j)$. The Haar wavelet operator, $\mathcal{F}_N \colon \mathcal{D}_r \to \mathcal{D}_r$, computing the $N$th partial sum Haar wavelet approximation, is defined as

(4.7)  $\big[\mathcal{F}_N(d)\big]_i(t) = \langle d_i, \mathbf{1} \rangle\, \mathbf{1}(t) + \sum_{k=0}^{N} \sum_{j=0}^{2^k - 1} \langle d_i, b_{k,j} \rangle\, \frac{b_{k,j}(t)}{\|b_{k,j}\|_{L^2}^2}.$

The inner product here is the traditional inner product in $L^2$. Without loss of generality we apply $\mathcal{F}_N$ to continuous inputs in $\mathcal{U}$, noting that the only change is the domain of the operator. Lemma 5.7 proves that for each $N \in \mathbb{N}$ and $d \in \mathcal{D}_r$, $\mathcal{F}_N(d) \in \mathcal{D}_r$.
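The partial sum in (4.7) can be probed numerically on a uniform grid: the span of $\mathbf{1}$ and the $b_{k,j}$ through level $N$ is exactly the piecewise-constant functions on dyadic intervals of width $2^{-(N+1)}$, so the output is the local average of $d$ on those intervals, and in particular is piecewise constant. A sketch (the grid discretization and function names are illustrative stand-ins for $L^2$):

```python
import numpy as np

def haar_partial_sum(d, N, grid=1024):
    """N-th partial-sum Haar approximation, as in (4.7), of d: [0,1] -> R,
    evaluated at grid midpoints; inner products become grid averages."""
    t = (np.arange(grid) + 0.5) / grid
    samples = np.vectorize(d)(t)
    out = np.full(grid, samples.mean())  # <d, 1> * 1(t) term
    for k in range(N + 1):
        for j in range(2 ** k):
            s = 2 ** k * t - j  # argument of the mother wavelet lambda
            b = np.where((0 <= s) & (s < 0.5), 1.0,
                         np.where((0.5 <= s) & (s < 1.0), -1.0, 0.0))
            # <d, b_{k,j}> b_{k,j}(t) / ||b_{k,j}||^2, via grid averages
            out += (samples * b).mean() / (b * b).mean() * b
    return t, out

t, approx = haar_partial_sum(lambda ti: ti, N=2)
print(approx[0])  # average of t over the first dyadic bin [0, 1/8): 1/16
```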

We can project the output of $\mathcal{F}_N(d)$, which is piecewise constant, to a pure discrete input by employing the pulse-width modulation operator, $\mathcal{P}_N \colon \mathcal{D}_r \to \mathcal{D}_p$, which computes the pulse-width modulation of its argument with frequency $2^{-N}$:

(4.8)  $\big[\mathcal{P}_N(d)\big]_i(t) = \begin{cases} 1 & \text{if } t \in \big[S_{N,k,i-1}, S_{N,k,i}\big),\ k \in \{0, 1, \dots, 2^N - 1\}, \\ 0 & \text{otherwise,} \end{cases}$

where $S_{N,k,i} = 2^{-N}\Big( k + \sum_{j=1}^{i} d_j\big(\tfrac{k}{2^N}\big) \Big)$. Lemma 5.7 also proves that for each $N \in \mathbb{N}$ and $d \in \mathcal{D}_r$, $\mathcal{P}_N\big(\mathcal{F}_N(d)\big) \in \mathcal{D}_p$, as desired. Given $N \in \mathbb{N}$, we compose the two operators and define the projection operator, $\rho_N \colon \mathcal{X}_r \to \mathcal{X}_p$, as

(4.9)  $\rho_N(u, d) = \Big( \mathcal{F}_N(u),\ \mathcal{P}_N\big(\mathcal{F}_N(d)\big) \Big).$
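The modulation rule (4.8) splits each dyadic interval $[k/2^N, (k+1)/2^N)$ into sub-intervals whose lengths are proportional to the coordinates of $d$ sampled at the interval's left endpoint, exactly as in the definition of $S_{N,k,i}$. A sketch returning the active mode index at a query time (the evaluation-on-a-grid setup and helper name `pwm` are illustrative):

```python
import numpy as np

def pwm(d, N, t):
    """Pulse-width modulation (4.8) of a relaxed discrete input d, which maps
    [0,1] to the simplex: returns the pure mode index active at time t."""
    k = min(int(t * 2 ** N), 2 ** N - 1)  # dyadic interval containing t
    weights = d(k / 2 ** N)               # d sampled at the interval's start
    s = k / 2 ** N                        # running switch time S_{N,k,i}
    for i, w in enumerate(weights):
        s += w / 2 ** N
        if t < s:
            return i
    return len(weights) - 1

# Constant relaxed input (0.25, 0.75): within each dyadic interval, mode 0
# should occupy the first quarter and mode 1 the remainder.
d = lambda t: (0.25, 0.75)
modes = [pwm(d, 2, t) for t in np.linspace(0.0, 0.249, 100)]
print(modes.count(0) / len(modes))  # duty cycle of mode 0, close to 0.25
```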

As shown in Theorem 5.10, this projection allows us to extend Theorem 4.2 by providing a rate of convergence for the approximation error between $\xi_r \in \mathcal{X}_r$ and $\rho_N(\xi_r) \in \mathcal{X}_p$. In our algorithm, we control this approximation error by choosing the frequency at which we perform the pulse-width modulation with a rule inspired by the Armijo algorithm. Let $\bar\alpha \in (0, \infty)$, $\bar\beta \in \big(\tfrac{1}{2}, 1\big)$, and $\omega \in (0, 1)$; then we modulate a point $\xi$ with frequency $2^{-\nu(\xi)}$, where the frequency modulation function, $\nu \colon \mathcal{X}_p \to \mathbb{N}$, is defined as

(4.10)  $\nu(\xi) = \begin{cases} \min\big\{ k \in \mathbb{N} \mid \xi' = \xi + \beta^{\mu(\xi)}(g(\xi) - \xi),\ J\big(\rho_k(\xi')\big) - J(\xi) \le \big(\alpha\beta^{\mu(\xi)} - \bar\alpha\bar\beta^k\big)\theta(\xi), \\ \phantom{\min\big\{ k \in \mathbb{N} \mid{}} \Psi\big(\rho_k(\xi')\big) \le 0,\ \bar\alpha\bar\beta^k \le (1 - \omega)\alpha\beta^{\mu(\xi)} \big\} & \text{if } \Psi(\xi) \le 0, \\ \min\big\{ k \in \mathbb{N} \mid \xi' = \xi + \beta^{\mu(\xi)}(g(\xi) - \xi),\ \Psi\big(\rho_k(\xi')\big) - \Psi(\xi) \le \big(\alpha\beta^{\mu(\xi)} - \bar\alpha\bar\beta^k\big)\theta(\xi), \\ \phantom{\min\big\{ k \in \mathbb{N} \mid{}} \bar\alpha\bar\beta^k \le (1 - \omega)\alpha\beta^{\mu(\xi)} \big\} & \text{if } \Psi(\xi) > 0. \end{cases}$

Lemma 5.14 proves that if $\theta(\xi) < 0$ for some $\xi \in \mathcal{X}_p$, then $\nu(\xi) < \infty$, which implies that we can construct a new point $\rho_{\nu(\xi)}\big(\xi + \beta^{\mu(\xi)}(g(\xi) - \xi)\big) \in \mathcal{X}_p$ that produces a reduction in the cost (if $\xi$ is feasible) or a reduction in the infeasibility (if $\xi$ is infeasible).

4.5. Switched system optimal control algorithm. Algorithm 4.1 describes our numerical method to solve $P_p$. For analysis purposes, we define the algorithm recursion function, $\Gamma \colon \mathcal{X}_p \to \mathcal{X}_p$, by

(4.11)  $\Gamma(\xi) = \rho_{\nu(\xi)}\big( \xi + \beta^{\mu(\xi)}(g(\xi) - \xi) \big).$

We say $\{\xi_j\}_{j \in \mathbb{N}}$ is a sequence generated by Algorithm 4.1 if $\xi_{j+1} = \Gamma(\xi_j)$ for each $j \in \mathbb{N}$ such that $\theta(\xi_j) \neq 0$, and $\xi_{j+1} = \xi_j$ otherwise. We prove several important properties about the sequence generated by Algorithm 4.1. First, Lemma 5.15 proves that if there exists $i_0 \in \mathbb{N}$ such that $\Psi(\xi_{i_0}) \le 0$, then $\Psi(\xi_i) \le 0$ for each $i \ge i_0$. Second, Theorem 5.16 proves that $\lim_{j \to \infty} \theta(\xi_j) = 0$, i.e., Algorithm 4.1 converges to a point that satisfies the optimality condition.

Algorithm 4.1. Conceptual optimization algorithm for $P_p$.
Require: $\xi_0 \in \mathcal{X}_p$, $\alpha, \beta, \omega \in (0, 1)$, $\bar\alpha, \gamma \in (0, \infty)$, $\bar\beta \in \big(\tfrac{1}{2}, 1\big)$.
1: Set $j = 0$.
2: loop
3:   Compute $\theta(\xi_j)$, $g(\xi_j)$, $\mu(\xi_j)$, and $\nu(\xi_j)$.
4:   if $\theta(\xi_j) = 0$ then
5:     return $\xi_j$.
6:   end if
7:   Set $\xi_{j+1} = \rho_{\nu(\xi_j)}\big( \xi_j + \beta^{\mu(\xi_j)}(g(\xi_j) - \xi_j) \big)$.
8:   Replace $j$ by $j + 1$.
9: end loop

5. Algorithm analysis. In this section, we derive the various components of Algorithm 4.1 and prove that Algorithm 4.1 converges to a point that satisfies our optimality condition. Our argument proceeds as follows: first, we construct the components of the optimality function and prove that these components satisfy various properties that ensure the well-posedness of the optimality function; second, we prove that we can control the quality of approximation between the trajectories generated by a relaxed discrete input and its projection by $\rho_N$ as a function of $N$; and finally, we prove the convergence of our algorithm.

5.1. Derivation of algorithm terms. We begin by noting the Lipschitz continuity of the solution to differential equation (3.8) with respect to ξ. The argument is a simple extension of Lemma 5.6.7 in [22], and therefore we omit the details.

Lemma 5.1. There exists a constant L > 0 such that for each ξ_1, ξ_2 ∈ X_r and t ∈ [0, 1], ‖φ_t(ξ_1) − φ_t(ξ_2)‖_2 ≤ L‖ξ_1 − ξ_2‖_X.

Next, we formally derive the components of the optimality function. This result could be proved using an argument similar to Theorem 4.1 in [6], but we include the proof for the sake of thoroughness and in order to draw comparisons with its corresponding argument, Lemma 4.4, in the second of this pair of papers.

Lemma 5.2. Let ξ = (u, d) ∈ X_r and ξ′ = (u′, d′) ∈ X. Then the directional derivative of φ_t, as defined in (4.2), is given by

(5.1)
$$D\phi_t(\xi;\xi') = \int_0^t \Phi^{(\xi)}(t,\tau)\left[\frac{\partial f}{\partial u}\big(\tau,\phi_\tau(\xi),u(\tau),d(\tau)\big)\,u'(\tau) + \sum_{i=1}^q d_i'(\tau)\, f\big(\tau,\phi_\tau(\xi),u(\tau),e_i\big)\right] d\tau,$$

where Φ^{(ξ)}(t, τ) is the unique solution of the following matrix differential equation:

(5.2)
$$\frac{\partial \Phi}{\partial t}(t,\tau) = \frac{\partial f}{\partial x}\big(t,\phi_t(\xi),u(t),d(t)\big)\,\Phi(t,\tau) \quad \forall t \in [0,1], \qquad \Phi(\tau,\tau) = I.$$

Proof. For notational convenience, let x^{(λ)} = x^{(ξ+λξ′)}, u^{(λ)} = u + λu′, and d^{(λ)} = d + λd′. Then, if we define Δx^{(λ)} = x^{(λ)} − x^{(ξ)}, and using the mean value theorem,

(5.3)
$$\begin{aligned} \Delta x^{(\lambda)}(t) ={}& \int_0^t \lambda \sum_{i=1}^q d_i'(\tau)\, f\big(\tau, x^{(\lambda)}(\tau), u^{(\lambda)}(\tau), e_i\big)\, d\tau \\ &+ \int_0^t \frac{\partial f}{\partial x}\big(\tau, x^{(\xi)}(\tau) + \nu_x(\tau)\Delta x^{(\lambda)}(\tau), u^{(\lambda)}(\tau), d(\tau)\big)\, \Delta x^{(\lambda)}(\tau)\, d\tau \\ &+ \int_0^t \lambda\, \frac{\partial f}{\partial u}\big(\tau, x^{(\xi)}(\tau), u(\tau) + \nu_u(\tau)\lambda u'(\tau), d(\tau)\big)\, u'(\tau)\, d\tau, \end{aligned}$$

where ν_u, ν_x: [0, t] → [0, 1]. Let z(t) be the unique solution of the following differential equation:

(5.4)
$$\begin{aligned} \dot z(\tau) ={}& \frac{\partial f}{\partial x}\big(\tau, x^{(\xi)}(\tau), u(\tau), d(\tau)\big)\, z(\tau) + \frac{\partial f}{\partial u}\big(\tau, x^{(\xi)}(\tau), u(\tau), d(\tau)\big)\, u'(\tau) \\ &+ \sum_{i=1}^q d_i'(\tau)\, f\big(\tau, x^{(\xi)}(\tau), u(\tau), e_i\big), \quad \tau \in [0, t], \qquad z(0) = 0. \end{aligned}$$

We want to show that $\lim_{\lambda \downarrow 0} \big\| \tfrac{1}{\lambda}\Delta x^{(\lambda)}(t) - z(t) \big\|_2 = 0$. To prove this, consider the inequalities that follow from condition (2) in Assumption 3.2,

(5.5)
$$\begin{aligned} \bigg\| \int_0^t \bigg[ \frac{\partial f}{\partial x}\big(\tau, x^{(\xi)}(\tau), u(\tau), d(\tau)\big)\, z(\tau) - \frac{\partial f}{\partial x}\big(\tau, x^{(\xi)}(\tau) &+ \nu_x(\tau)\Delta x^{(\lambda)}(\tau), u^{(\lambda)}(\tau), d(\tau)\big)\, \frac{\Delta x^{(\lambda)}(\tau)}{\lambda} \bigg] d\tau \bigg\|_2 \\ \le L \int_0^t \Big\| z(\tau) - \frac{\Delta x^{(\lambda)}(\tau)}{\lambda} \Big\|_2\, d\tau &+ L \int_0^t \big( \|\Delta x^{(\lambda)}(\tau)\|_2 + \lambda \|u'(\tau)\|_2 \big)\, \|z(\tau)\|_2\, d\tau, \end{aligned}$$

from condition (3) in Assumption 3.2,

(5.6)
$$\bigg\| \int_0^t \bigg[ \frac{\partial f}{\partial u}\big(\tau, x^{(\xi)}(\tau), u(\tau), d(\tau)\big) - \frac{\partial f}{\partial u}\big(\tau, x^{(\xi)}(\tau), u(\tau) + \nu_u(\tau)\lambda u'(\tau), d(\tau)\big) \bigg] u'(\tau)\, d\tau \bigg\|_2 \le L \int_0^t \lambda\, \nu_u(\tau)\, \|u'(\tau)\|_2\, \|u'(\tau)\|_2\, d\tau \le L \int_0^t \lambda \|u'(\tau)\|_2^2\, d\tau,$$

and from condition (1) in Assumption 3.2,

(5.7)
$$\bigg\| \int_0^t \sum_{i=1}^q d_i'(\tau) \Big[ f\big(\tau, x^{(\xi)}(\tau), u(\tau), e_i\big) - f\big(\tau, x^{(\lambda)}(\tau), u^{(\lambda)}(\tau), e_i\big) \Big]\, d\tau \bigg\|_2 \le L \int_0^t \sum_{i=1}^q \big|d_i'(\tau)\big| \big( \|\Delta x^{(\lambda)}(\tau)\|_2 + \lambda \|u'(\tau)\|_2 \big)\, d\tau.$$

Then, using the Bellman–Gronwall inequality,

(5.8)
$$\Big\| \frac{\Delta x^{(\lambda)}(t)}{\lambda} - z(t) \Big\|_2 \le e^{Lt} L \int_0^t \bigg[ \big( \|\Delta x^{(\lambda)}(\tau)\|_2 + \lambda \|u'(\tau)\|_2 \big) \|z(\tau)\|_2 + \lambda \|u'(\tau)\|_2^2 + \sum_{i=1}^q \big|d_i'(\tau)\big| \big( \|\Delta x^{(\lambda)}(\tau)\|_2 + \lambda \|u'(\tau)\|_2 \big) \bigg]\, d\tau,$$

but note that every term in the integral above is bounded, and Δx^{(λ)}(τ) → 0 for each τ ∈ [0, t] since x^{(λ)} → x^{(ξ)} uniformly, as follows from Lemma 5.1. Thus, by the dominated convergence theorem and by noting that Dφ_t(ξ; ξ′) is exactly the solution of differential equation (5.4), we get

(5.9)
$$\lim_{\lambda \downarrow 0} \Big\| \frac{\Delta x^{(\lambda)}(t)}{\lambda} - z(t) \Big\|_2 = \lim_{\lambda \downarrow 0} \frac{1}{\lambda} \big\| x^{(\xi + \lambda \xi')}(t) - x^{(\xi)}(t) - D\phi_t(\xi; \lambda \xi') \big\|_2 = 0.$$

The result of the lemma then follows.


We now construct the directional derivative of the cost and component constraint functions, which follows from the chain rule and Lemma 5.2.

Lemma 5.3. Let ξ ∈ X_r, ξ′ ∈ X, j ∈ J, and t ∈ [0, 1]. The directional derivatives of the cost J and the component constraint ψ_{j,t} in the ξ′ direction are

(5.10)
$$DJ(\xi;\xi') = \frac{\partial h_0}{\partial x}\big(\phi_1(\xi)\big)\, D\phi_1(\xi;\xi') \quad \text{and} \quad D\psi_{j,t}(\xi;\xi') = \frac{\partial h_j}{\partial x}\big(\phi_t(\xi)\big)\, D\phi_t(\xi;\xi').$$

Lemma 5.4. There exists an L > 0 such that for each ξ_1, ξ_2 ∈ X_r, ξ′ ∈ X, and t ∈ [0, 1],
(1) ‖Dφ_t(ξ_1; ξ′) − Dφ_t(ξ_2; ξ′)‖_2 ≤ L‖ξ_1 − ξ_2‖_X ‖ξ′‖_X,
(2) |DJ(ξ_1; ξ′) − DJ(ξ_2; ξ′)| ≤ L‖ξ_1 − ξ_2‖_X ‖ξ′‖_X,
(3) |Dψ_{j,t}(ξ_1; ξ′) − Dψ_{j,t}(ξ_2; ξ′)| ≤ L‖ξ_1 − ξ_2‖_X ‖ξ′‖_X, and
(4) |ζ(ξ_1, ξ′) − ζ(ξ_2, ξ′)| ≤ L‖ξ_1 − ξ_2‖_X.

Proof. Condition (1) follows after proving the boundedness of the vector fields, the Lipschitz continuity of the vector fields and their derivatives, and the boundedness and Lipschitz continuity of the state transition matrix. Conditions (2) and (3) follow immediately from condition (1).

To prove condition (4), first notice that for {x_i}_{i=1}^k, {y_i}_{i=1}^k ⊂ R,

(5.11)
$$\Big| \max_{i \in \{1,\ldots,k\}} x_i - \max_{i \in \{1,\ldots,k\}} y_i \Big| \le \max_{i \in \{1,\ldots,k\}} |x_i - y_i|.$$

Let Ψ⁺(ξ) = max{0, Ψ(ξ)} and Ψ⁻(ξ) = max{0, −Ψ(ξ)}. Then,

(5.12)
$$\begin{aligned} \big| \zeta(\xi_1,\xi') - \zeta(\xi_2,\xi') \big| \le{}& \max\Big\{ \big| DJ(\xi_1;\xi'-\xi_1) - DJ(\xi_2;\xi'-\xi_2) \big| + \big| \Psi^+(\xi_2) - \Psi^+(\xi_1) \big|, \\ &\quad \max_{j \in \mathcal{J},\, t \in [0,1]} \big| D\psi_{j,t}(\xi_1;\xi'-\xi_1) - D\psi_{j,t}(\xi_2;\xi'-\xi_2) \big| + \gamma \big| \Psi^-(\xi_2) - \Psi^-(\xi_1) \big| \Big\} \\ &+ \big|\, \|\xi'-\xi_1\|_{\mathcal{X}} - \|\xi'-\xi_2\|_{\mathcal{X}} \big|. \end{aligned}$$

Now, note that |‖ξ′ − ξ_1‖_X − ‖ξ′ − ξ_2‖_X| ≤ ‖ξ_1 − ξ_2‖_X. Also,

(5.13)
$$\big| DJ(\xi_1;\xi'-\xi_1) - DJ(\xi_2;\xi'-\xi_2) \big| \le \big| DJ(\xi_1;\xi'-\xi_1) - DJ(\xi_2;\xi'-\xi_1) \big| + \Big| \frac{\partial h_0}{\partial x}\big(\phi_1(\xi_2)\big)\, D\phi_1(\xi_2;\xi_2-\xi_1) \Big| \le L\, \|\xi_1 - \xi_2\|_{\mathcal{X}},$$

where we used the linearity of DJ(ξ; ·), condition (2), the fact that ξ′ and ξ_1 are bounded, the Cauchy–Schwarz inequality, the fact that the derivative of h_0 is also bounded, and since there exists a C > 0 such that ‖Dφ_1(ξ_2; ξ_2 − ξ_1)‖_2 ≤ C‖ξ_2 − ξ_1‖_X for all ξ_1, ξ_2 ∈ X_r. The last result follows from Theorem 3.3 and the boundedness of the state transition matrix. By an identical argument, |Dψ_{j,t}(ξ_1; ξ′ − ξ_1) − Dψ_{j,t}(ξ_2; ξ′ − ξ_2)| ≤ L‖ξ_1 − ξ_2‖_X. Finally, the result follows since Ψ⁺(ξ) and Ψ⁻(ξ) are Lipschitz continuous.

The following theorem shows that g is a well-defined function.


Theorem 5.5. For each ξ ∈ X_p, the map ξ′ → ζ(ξ, ξ′) + V(ξ′ − ξ), with domain X_r, is strictly convex and has a unique minimizer.

Proof. The map is strictly convex since it is the sum of the maximum of linear functions of ξ′ − ξ (which is convex), the total variation of ξ′ − ξ (which is convex), and the X-norm of ξ′ − ξ (which is strictly convex).

Next, let θ̄(ξ) = min{ ζ(ξ, ξ′) + V(ξ′ − ξ) | ξ′ ∈ L²([0,1], U) × L²([0,1], Σ_r) }. Note that |ζ(ξ, ξ′)| → ∞ as ‖ξ′‖_{L²} → ∞. Then, by Proposition II.1.2 in [10], there exists a unique minimizer ξ∗ of θ̄(ξ). But ξ∗ ∈ X_r, since θ̄(ξ) and ζ(ξ, ξ∗) are bounded. The result follows after noting that X_r ⊂ L²([0,1], U) × L²([0,1], Σ_r); thus ξ∗ is also the unique minimizer of θ(ξ).

Finally, we prove that θ encodes a necessary condition for optimality.

Theorem 5.6. The function θ is nonpositive valued. Moreover, if ξ ∈ X_p is a local minimizer of P_p as in Definition 4.4, then θ(ξ) = 0.

Proof. Notice that ζ(ξ, ξ) = 0 and θ(ξ) ≤ ζ(ξ, ξ), which proves the first part.

To prove the second part, we begin by making several observations. Given ξ′ ∈ X_r and λ ∈ [0, 1], using the mean value theorem and condition (2) in Lemma 5.4, we have that there exists L > 0 such that

(5.14)
$$J\big(\xi + \lambda(\xi'-\xi)\big) - J(\xi) \le \lambda\, DJ\big(\xi;\xi'-\xi\big) + L\lambda^2 \|\xi'-\xi\|_{\mathcal{X}}^2.$$

Similarly, if A(ξ) = arg max{ h_j(φ_t(ξ)) | (j, t) ∈ J × [0, 1] }, then there exists a pair (j, t) ∈ A(ξ + λ(ξ′ − ξ)) such that, using condition (3) in Lemma 5.4,

(5.15)
$$\Psi\big(\xi + \lambda(\xi'-\xi)\big) - \Psi(\xi) \le \lambda\, D\psi_{j,t}\big(\xi;\xi'-\xi\big) + L\lambda^2 \|\xi'-\xi\|_{\mathcal{X}}^2.$$

Also, by condition (1) in Assumption 3.4,

(5.16)
$$\Psi\big(\xi + \lambda(\xi'-\xi)\big) - \Psi(\xi) \le L \max_{t \in [0,1]} \big\| \phi_t\big(\xi + \lambda(\xi'-\xi)\big) - \phi_t(\xi) \big\|_2.$$

We proceed by contradiction, i.e., we assume that θ(ξ) < 0 and show that for each ε > 0 there exists ξ̂ ∈ N_w(ξ, ε) ∩ {ξ̄ ∈ X_p | Ψ(ξ̄) ≤ 0} such that J(ξ̂) < J(ξ). Note that since θ(ξ) < 0, g(ξ) ≠ ξ, and also, as a result of Theorem 4.2, for each ξ + λ(g(ξ) − ξ) ∈ X_r and ε > 0 there exists a ξ_λ ∈ X_p such that ‖x^{(ξ_λ)} − x^{(ξ+λ(g(ξ)−ξ))}‖_{L²} < ε. Now, letting ε = −λθ(ξ)/(2L) > 0, using Lemma 5.1, and adding and subtracting x^{(ξ+λ(g(ξ)−ξ))},

(5.17)
$$\big\| x^{(\xi_\lambda)} - x^{(\xi)} \big\|_{L^2} \le \left( \frac{-\theta(\xi)}{2L} + L\, \|g(\xi) - \xi\|_{\mathcal{X}} \right) \lambda.$$

Also, by (5.14), after adding and subtracting J(ξ + λ(g(ξ) − ξ)),

(5.18)
$$J(\xi_\lambda) - J(\xi) \le \frac{\theta(\xi)\lambda}{2} + 4A^2 L \lambda^2,$$

where A = max{‖u‖_2 + 1 | u ∈ U} and we used the fact that ‖g(ξ) − ξ‖²_X ≤ 4A² and DJ(ξ; g(ξ) − ξ) ≤ θ(ξ). Hence for each λ ∈ (0, −θ(ξ)/(8A²L)), J(ξ_λ) − J(ξ) < 0. Using a similar argument, and after adding and subtracting Ψ(ξ + λ(g(ξ) − ξ)), we have

(5.19)
$$\Psi(\xi_\lambda) \le L \max_{t \in [0,1]} \big\| \phi_t(\xi_\lambda) - \phi_t\big(\xi + \lambda(g(\xi)-\xi)\big) \big\|_2 + \Psi(\xi) + \big(\theta(\xi) - \gamma\Psi(\xi)\big)\lambda + 4A^2 L \lambda^2 \le \frac{\theta(\xi)\lambda}{2} + 4A^2 L \lambda^2 + (1 - \gamma\lambda)\Psi(\xi),$$

where we used the fact that Dψ_{j,t}(ξ; g(ξ) − ξ) ≤ θ(ξ) − γΨ(ξ) for each j and t. Hence for each λ ∈ (0, min{−θ(ξ)/(8A²L), 1/γ}), Ψ(ξ_λ) ≤ (1 − γλ)Ψ(ξ) ≤ 0.

Summarizing, suppose ξ ∈ X_p is a local minimizer of P_p and θ(ξ) < 0. For each ε > 0, by choosing any

(5.20)
$$\lambda \in \left( 0,\ \min\left\{ \frac{-\theta(\xi)}{8A^2L},\ \frac{1}{\gamma},\ \frac{2L\varepsilon}{2L^2\|g(\xi)-\xi\|_{\mathcal{X}} - \theta(\xi)} \right\} \right),$$

we can construct a ξ_λ ∈ X_p such that ξ_λ ∈ N_w(ξ, ε), J(ξ_λ) < J(ξ), and Ψ(ξ_λ) ≤ 0. Therefore, ξ is not a local minimizer of P_p, which is a contradiction.

5.2. Approximating relaxed inputs. In this subsection, we prove that ρ_N takes elements from the relaxed optimization space to the pure optimization space with a known bound for the quality of approximation.

Lemma 5.7. For each N ∈ N and d ∈ D_r, F_N(d) ∈ D_r and P_N(F_N(d)) ∈ D_p.

Proof. Note first that [F_N(d)]_i(t) ∈ [0, 1] for each t ∈ [0, 1] due to the result in section 3.3 in [14]. Next, observe that Σ_{i=1}^q [F_N(d)]_i(t) = 1 for each t ∈ [0, 1] since the wavelet approximation is linear and

(5.21)
$$\sum_{i=1}^q [F_N(d)]_i = \sum_{i=1}^q \left( \langle d_i, 1\rangle + \sum_{k=0}^N \sum_{j=0}^{2^k-1} \langle d_i, b_{k,j}\rangle\, \frac{b_{k,j}}{\|b_{k,j}\|_{L^2}^2} \right) = \langle 1, 1\rangle + \sum_{k=0}^N \sum_{j=0}^{2^k-1} \langle 1, b_{k,j}\rangle\, \frac{b_{k,j}}{\|b_{k,j}\|_{L^2}^2} = 1,$$

where the last equality holds since ⟨1, b_{k,j}⟩ = 0 for each k, j.

Next, observe that [P_N(F_N(d))]_i(t) ∈ {0, 1} for each t ∈ [0, 1] due to the definition of P_N. Also note that Σ_{i=1}^q [P_N(F_N(d))]_i(t) = 1 for each t ∈ [0, 1]. This follows because of the piecewise continuity of F_N(d) over intervals of size 2^{−N} and because Σ_{i=1}^q [F_N(d)]_i(t) = 1, which implies that {[S_{N,k,i−1}, S_{N,k,i})}_{(i,k)∈{1,…,q}×{0,…,2^N−1}} forms a partition of [0, 1]. Our desired result follows.
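The first claim of Lemma 5.7 can be checked numerically. The sketch below is an illustrative discrete version (the grid size and function name are our own choices): it uses the standard identity that the partial Haar sum up to a fixed level equals averaging over the corresponding dyadic partition, here with cells of width 2^{−N} matching the piecewise constancy of F_N(d) described above, and confirms that the approximation of a relaxed input stays in the simplex.

```python
import numpy as np

# Illustrative discrete F_N: average each row of d over dyadic cells of
# width 2^-N (equivalent to a truncated Haar expansion on this grid).
def haar_truncate(d, N):
    """d: (q, M) samples of a relaxed discrete input on [0, 1], with M
    divisible by 2^N. Returns a (q, M) array, piecewise constant on cells."""
    q, M = d.shape
    cells = 2 ** N
    width = M // cells
    out = np.empty_like(d, dtype=float)
    for c in range(cells):
        block = d[:, c * width:(c + 1) * width]
        # replace the block by its cell average (preserves the simplex)
        out[:, c * width:(c + 1) * width] = block.mean(axis=1, keepdims=True)
    return out

# A relaxed input with two modes: d_1(t) + d_2(t) = 1 pointwise.
t = np.linspace(0.0, 1.0, 64, endpoint=False)
d = np.vstack([0.5 + 0.4 * np.sin(2 * np.pi * t),
               0.5 - 0.4 * np.sin(2 * np.pi * t)])
dN = haar_truncate(d, N=3)
```

Since averaging is a convex combination, each [F_N(d)]_i stays in [0, 1] and the rows of dN still sum to one pointwise, exactly as the lemma asserts.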

The next two lemmas lay the foundations of our extension to the chattering lemma and rely upon a bounded variation assumption in order to construct a rate of approximation, which is critical when choosing a value for the parameter β̄ in order to ensure that if θ(ξ) < 0, then ν(ξ) < ∞. This is proved in Lemma 5.14 and guarantees that step 7 of Algorithm 4.1 is well-defined.

Lemma 5.8. Let f ∈ L²([0, 1], R) ∩ BV([0, 1], R); then

(5.22)
$$\|f - F_N(f)\|_{L^2} \le \frac{1}{2} \left( \frac{1}{\sqrt{2}} \right)^{N} \mathcal{V}(f).$$


Proof. Since L² is a Hilbert space and the collection {b_{k,j}}_{k,j} is a basis, then

(5.23)
$$f = \langle f, 1\rangle + \sum_{k=0}^{\infty} \sum_{j=0}^{2^k-1} \langle f, b_{k,j}\rangle\, \frac{b_{k,j}}{\|b_{k,j}\|_{L^2}^2}.$$

Note that ‖b_{k,j}‖²_{L²} = 2^{−k} and that |∫₀ᵗ b_{k,j}(s) ds| ≤ 2^{−k−1} for each t ∈ [0, 1]. Thus, by Proposition 3.1, |⟨f, b_{k,j}⟩| ≤ 2^{−k−1} V(f·1_{[j2^{−k},(j+1)2^{−k}]}). Finally, Parseval's identity for Hilbert spaces (Theorem 5.27 in [11]) implies that

(5.24)
$$\|f - F_N(f)\|_{L^2}^2 \le \sum_{k=N+1}^{\infty} 2^{-k-2} \sum_{j=0}^{2^k-1} \Big( \mathcal{V}\big(f\, 1_{[j2^{-k},(j+1)2^{-k}]}\big) \Big)^2 \le 2^{-N-2}\, \mathcal{V}(f)^2,$$

as desired, where the last inequality follows by the definition in (3.3).
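The rate in Lemma 5.8 can be observed empirically. The following sketch is illustrative rather than taken from the paper: it again realizes the level-N Haar partial sum as dyadic-cell averaging on a uniform sampling grid, and verifies the bound ‖f − F_N(f)‖_{L²} ≤ (1/2)(1/√2)^N V(f) for f(t) = t, whose total variation is 1; the discrete L² norm, grid, and function name are our own choices.

```python
import numpy as np

def haar_truncate_1d(f, N):
    # Level-N Haar partial sum realized as averaging over dyadic cells of
    # width 2^-N (illustrative discrete version on a uniform grid).
    M = f.size
    cells = 2 ** N
    width = M // cells
    out = np.empty_like(f)
    for c in range(cells):
        out[c * width:(c + 1) * width] = f[c * width:(c + 1) * width].mean()
    return out

M = 1 << 12
t = (np.arange(M) + 0.5) / M      # midpoint samples of [0, 1]
f = t.copy()                      # f(t) = t, total variation V(f) = 1
for N in range(1, 6):
    # discrete L2 error versus the bound of Lemma 5.8
    err = np.sqrt(np.mean((f - haar_truncate_1d(f, N)) ** 2))
    bound = 0.5 * (1.0 / np.sqrt(2.0)) ** N * 1.0
    assert err <= bound
```

For this f the averaging error actually decays like 2^{−N}, comfortably inside the (1/√2)^N worst-case rate the lemma guarantees for general BV functions.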

Lemma 5.9. There exists K > 0 such that for each d ∈ D_r and f ∈ L²([0, 1], R^q) ∩ BV([0, 1], R^q),

(5.25)
$$\big\langle d - P_N\big(F_N(d)\big), f \big\rangle \le K \left( \left(\frac{1}{\sqrt{2}}\right)^{N} \|f\|_{L^2}\, \mathcal{V}(d) + \left(\frac{1}{2}\right)^{N} \mathcal{V}(f) \right).$$

Proof. To simplify our notation, let p_{i,k} = [F_N(d)]_i(k/2^N), S_{i,k} = Σ_{j=1}^i p_{j,k}, and A_{i,k} = [(k + S_{i−1,k})/2^N, (k + S_{i,k})/2^N). Note that |∫_{k/2^N}^t (p_{i,k} − 1_{A_{i,k}}(s)) ds| ≤ p_{i,k}/2^N for each t ∈ [0, 1]. Thus, by Proposition 3.1, the definition in (3.3), and Hölder's inequality,

(5.26)
$$\big\langle F_N(d) - P_N\big(F_N(d)\big), f \big\rangle = \sum_{k=0}^{2^N-1} \sum_{i=1}^q \int_{k 2^{-N}}^{(k+1) 2^{-N}} \big( p_{i,k} - 1_{A_{i,k}}(t) \big)\, f_i(t)\, dt \le \frac{1}{2^N}\, \mathcal{V}(f).$$

The final result follows from the inequality above, Lemma 5.8, and the Cauchy–Schwarz inequality.
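The construction in the proof above, allotting each mode i the subinterval A_{i,k} of length p_{i,k}·2^{−N} inside each dyadic cell, can be sketched on a sampling grid as follows. This is an illustrative discretization: the rounding of interval endpoints to grid indices is an artifact of sampling, not part of the paper's definition of P_N, and the function name is our own.

```python
import numpy as np

def project_pure(dN, N):
    """Pulse-width-modulation sketch of P_N. dN: (q, M) relaxed input that
    is constant on dyadic cells of width 2^-N (M divisible by 2^N).
    Returns a {0, 1}-valued input with the same per-cell duty cycles."""
    q, M = dN.shape
    cells = 2 ** N
    width = M // cells
    pure = np.zeros_like(dN)
    for c in range(cells):
        p = dN[:, c * width]                  # duty cycles p_{i,k} on this cell
        # cumulative lengths S_{0,k} = 0, S_{i,k} = p_1 + ... + p_i,
        # scaled to grid indices within the cell
        edges = np.concatenate(([0.0], np.cumsum(p))) * width
        edges = np.rint(edges).astype(int)
        for i in range(q):
            # mode i is active on A_{i,k} = [S_{i-1,k}, S_{i,k}) of the cell
            pure[i, c * width + edges[i]: c * width + edges[i + 1]] = 1.0
    return pure

dN = np.full((2, 32), 0.5)   # duty cycle 1/2 for both modes, N = 2
pure = project_pure(dN, N=2)
```

Because the subintervals A_{i,k} partition each cell, the output is pure (exactly one mode active at every sample) while each cell's average recovers the relaxed duty cycles, which is the mechanism behind the ⟨F_N(d) − P_N(F_N(d)), f⟩ bound in (5.26).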

The following theorem is an extension of the chattering lemma for functions of bounded variation. As explained earlier, this theorem is key to our result since it provides a rate of convergence for the approximation of trajectories.

Theorem 5.10. There exists K > 0 such that for each ξ ∈ X_r and t ∈ [0, 1],

(5.27)
$$\big\| \phi_t\big(\rho_N(\xi)\big) - \phi_t(\xi) \big\|_2 \le K \left( \frac{1}{\sqrt{2}} \right)^{N} \big( \mathcal{V}(\xi) + 1 \big).$$

Proof. Let ξ = (u, d). To simplify our notation, let us denote u_N = F_N(u) and d_N = P_N(F_N(d)); thus ρ_N(ξ) = (u_N, d_N). First note that since f is Lipschitz continuous,

(5.28)
$$\big\| x^{(u_N,d_N)}(t) - x^{(u,d_N)}(t) \big\|_2 \le L \int_0^1 \Big( \big\| x^{(u_N,d_N)}(s) - x^{(u,d_N)}(s) \big\|_2 + \big\| u_N(s) - u(s) \big\|_2 \Big)\, ds \le \frac{L e^L}{2} \left( \frac{1}{\sqrt{2}} \right)^{N} \mathcal{V}(u),$$

where the last inequality follows by Bellman–Gronwall's inequality together with Lemma 5.8. On the other hand,

(5.29)
$$\big\| x^{(u,d_N)}(t) - x^{(u,d)}(t) \big\|_2 \le e^L\, \bigg\| \int_0^1 \sum_{i=1}^q \big( [d_N]_i(s) - d_i(s) \big)\, f\big(s, x^{(u,d)}(s), u(s), e_i\big)\, ds \bigg\|_2,$$

where again we used the Lipschitz continuity of f and Bellman–Gronwall's inequality. Let v_{k,i}(t) = [f(t, x^{(u,d)}(t), u(t), e_i)]_k and v_k = (v_{k,1}, …, v_{k,q}); then v_k is of bounded variation. Indeed, x^{(ξ)} is of bounded variation (by Corollary 3.33 and Lemma 3.34 in [11]); thus there exists C > 0 such that V(x^{(ξ)}) ≤ C and ‖v_{k,i}‖_{L²} ≤ C, and, using the Lipschitz continuity of f, we get that for each i ∈ Q, V(v_{k,i}) ≤ L(1 + C + V(u)). Hence, Lemma 5.9 implies that there exists K > 0 such that for each k ∈ {1, …, n},

(5.30)
$$\big\langle d - d_N, v_k \big\rangle \le K \left( \left( \frac{1}{\sqrt{2}} \right)^{N} C\, \mathcal{V}(d) + \left( \frac{1}{2} \right)^{N} \big( 1 + C + \mathcal{V}(u) \big) \right).$$

The result follows easily from these three inequalities.
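The convergence asserted by Theorem 5.10 is easy to observe numerically. The toy switched system below is our own illustrative example, not one from the paper: it compares the forward-Euler trajectory of ẋ = d₁(t)(−x) + d₂(t)·1 under the constant relaxed input d = (1/2, 1/2) against the trajectory under its level-N pulse-width-modulated projection, and the gap shrinks as N grows.

```python
import numpy as np

def simulate(d, M):
    # forward-Euler integration on [0, 1] with M steps, x(0) = 0,
    # vector field xdot = d1 * (-x) + d2 * 1 (illustrative dynamics)
    x = 0.0
    h = 1.0 / M
    xs = np.empty(M)
    for m in range(M):
        x += h * (d[0, m] * (-x) + d[1, m] * 1.0)
        xs[m] = x
    return xs

M = 1 << 12
relaxed = np.full((2, M), 0.5)            # relaxed input d = (1/2, 1/2)
x_rel = simulate(relaxed, M)

errs = []
for N in (2, 5):
    width = M // (1 << N)
    pure = np.zeros((2, M))
    for c in range(1 << N):
        half = width // 2                 # duty cycle 1/2 in each dyadic cell
        pure[0, c * width: c * width + half] = 1.0        # mode 1 first
        pure[1, c * width + half: (c + 1) * width] = 1.0  # then mode 2
    errs.append(float(np.max(np.abs(simulate(pure, M) - x_rel))))
```

The maximum trajectory deviation recorded in `errs` decreases as the PWM level N increases, which is the qualitative content of the theorem (the theorem additionally quantifies the rate as (1/√2)^N times the variation of the input).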

5.3. Convergence of the algorithm. To prove the convergence of our algorithm, we employ a technique similar to the one prescribed in section 1.2 in [22]. Summarizing the technique, one can think of an algorithm as a discrete-time dynamical system, whose desired stable equilibria are characterized by the stationary points of its optimality function, i.e., points ξ ∈ X_p where θ(ξ) = 0, since we know from Theorem 5.6 that all local minimizers are stationary. Before applying this line of reasoning to our algorithm, we present a simplified version of this argument for a general unconstrained optimization problem in the interest of clarity.

Definition 5.11 (Definition 2.1 in [3]²). Let S be a metric space, and consider the problem of minimizing the cost function J: S → R. A function Γ: S → S has the uniform sufficient descent property with respect to an optimality function θ: S → (−∞, 0] if for each C > 0 there exists a δ_C > 0 such that for every x ∈ S with θ(x) < −C, J(Γ(x)) − J(x) ≤ −δ_C.

A sequence of points generated by an algorithm satisfying the uniform sufficient descent property can be shown to approach the zeros of the optimality function.

Theorem 5.12 (Proposition 2.1 in [3]). Let S be a metric space and J: S → R be a lower bounded function. Suppose that Γ: S → S satisfies the uniform sufficient descent property with respect to θ: S → (−∞, 0]. If {x_j}_{j∈N} is constructed such that

(5.31)
$$x_{j+1} = \begin{cases} \Gamma(x_j) & \text{if } \theta(x_j) < 0, \\ x_j & \text{if } \theta(x_j) = 0, \end{cases}$$

then lim_{j→∞} θ(x_j) = 0.

Proof. Suppose that lim inf_{j→∞} θ(x_j) = −2ε < 0. Then there exists a subsequence {x_{j_k}}_{k∈N} such that θ(x_{j_k}) < −ε for each k ∈ N. Definition 5.11 implies that there exists δ_ε such that for each k ∈ N, J(x_{j_k+1}) − J(x_{j_k}) ≤ −δ_ε. But this is a contradiction, since J(x_{j+1}) ≤ J(x_j) for each j ∈ N, and thus J(x_j) → −∞ as j → ∞, even though J is lower bounded.

²Versions of this result can also be found in [23, 35].
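A minimal concrete instance of this framework (entirely hypothetical, for illustration): for J(x) = |x|, the map Γ(x) = x/2 with optimality function θ(x) = −|x| has the uniform sufficient descent property, since J(Γ(x)) − J(x) = −|x|/2 ≤ −C/2 whenever θ(x) < −C, so the generated sequence drives θ to 0. (Here the iterates also happen to converge, which the theorem does not require in general.)

```python
# Toy instance of Definition 5.11 / Theorem 5.12 (not from the paper):
# J(x) = |x|, Gamma(x) = x / 2, theta(x) = -|x|.
def theta(x):
    return -abs(x)

def gamma(x):
    return x / 2.0

x = 5.0
for _ in range(60):
    if theta(x) == 0.0:     # stationary point reached
        break
    x = gamma(x)            # sufficient descent step, per (5.31)
```

After the loop, θ(x) is numerically indistinguishable from 0, matching the conclusion lim_{j→∞} θ(x_j) = 0.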


Note that Theorem 5.12 does not assume the existence of accumulation points of the sequence {x_j}_{j∈N}. This is critical in infinite-dimensional optimization problems, where the level sets of a cost function may not be compact. Our proof of convergence of the sequence of points generated by Algorithm 4.1 does not make explicit use of Theorem 5.12, since ρ_N and the constraints require special treatment, but the main argument of the proof follows from a similar principle.

We begin by showing that μ and ν are bounded when θ is negative.

Lemma 5.13. Let α, β ∈ (0, 1). For each δ > 0 there exists M*_δ < ∞ such that if θ(ξ) ≤ −δ for ξ ∈ X_p, then μ(ξ) ≤ M*_δ.

Proof. Assume that Ψ(ξ) ≤ 0. Recall that DJ(ξ; g(ξ) − ξ) ≤ θ(ξ); then, using (5.14),

(5.32)
$$J\big(\xi + \beta^k(g(\xi)-\xi)\big) - J(\xi) - \alpha\beta^k\theta(\xi) \le (1-\alpha)\beta^k\theta(\xi) + 4A^2 L \beta^{2k} \le -(1-\alpha)\delta\beta^k + 4A^2 L \beta^{2k},$$

where A = max{‖u‖_2 + 1 | u ∈ U}. Hence, for each k ∈ N such that β^k ≤ (1−α)δ/(4A²L) we have that J(ξ + β^k(g(ξ) − ξ)) − J(ξ) ≤ αβ^kθ(ξ). Similarly, since Dψ_{j,t}(ξ; g(ξ) − ξ) ≤ θ(ξ) − γΨ(ξ) for each (j, t) ∈ J × [0, 1], using (5.15),

(5.33)
$$\Psi\big(\xi + \beta^k(g(\xi)-\xi)\big) - \Psi(\xi) + \beta^k\big(\gamma\Psi(\xi) - \alpha\theta(\xi)\big) \le -(1-\alpha)\delta\beta^k + 4A^2 L \beta^{2k};$$

hence, for each k ∈ N such that β^k ≤ min{(1−α)δ/(4A²L), 1/γ}, Ψ(ξ + β^k(g(ξ) − ξ)) − αβ^kθ(ξ) − (1 − β^kγ)Ψ(ξ) ≤ 0. Now, if Ψ(ξ) > 0,

(5.34)
$$\Psi\big(\xi + \beta^k(g(\xi)-\xi)\big) - \Psi(\xi) - \alpha\beta^k\theta(\xi) \le -(1-\alpha)\delta\beta^k + 4A^2 L \beta^{2k}.$$

Hence, for each k ∈ N such that β^k ≤ (1−α)δ/(4A²L), Ψ(ξ + β^k(g(ξ) − ξ)) − Ψ(ξ) ≤ αβ^kθ(ξ).

Finally, let M*_δ = 1 + max{ log_β((1−α)δ/(4A²L)), log_β(1/γ) }; then we get that μ(ξ) ≤ M*_δ as desired.

The following result relies on the rate of convergence result proved in Theorem 5.10, which follows from Lemmas 5.8 and 5.9 and from the assumption that the relaxed optimization space is of bounded variation.

Lemma 5.14. Let α, β, ω ∈ (0, 1), ᾱ ∈ (0, ∞), β̄ ∈ (1/√2, 1), and ξ ∈ X_p. If θ(ξ) < 0, then ν(ξ) < ∞.

Proof. To simplify our notation, let us denote M = μ(ξ) and ξ′ = ξ + β^M(g(ξ) − ξ). Theorem 5.10 implies that there exists K > 0 such that

(5.35)
$$J\big(\rho_N(\xi')\big) - J(\xi') \le K\, L\, 2^{-N/2} \big( \mathcal{V}(\xi') + 1 \big),$$

where L is the Lipschitz constant of h_0. Let A(ξ) = arg max{ h_j(φ_t(ξ)) | (j, t) ∈ J × [0, 1] }; then, given (j, t) ∈ A(ρ_N(ξ′)),

(5.36)
$$\Psi\big(\rho_N(\xi')\big) - \Psi(\xi') \le \psi_{j,t}\big(\rho_N(\xi')\big) - \psi_{j,t}(\xi') \le K\, L\, 2^{-N/2} \big( \mathcal{V}(\xi') + 1 \big).$$

Hence, there exists N_0 ∈ N such that K L 2^{−N/2}(V(ξ′) + 1) ≤ −ᾱβ̄^N θ(ξ) for each N ≥ N_0. Also, there exists N_1 ≥ N_0 such that for each N ≥ N_1, ᾱβ̄^N ≤ (1 − ω)αβ^M.

Now suppose that Ψ(ξ) ≤ 0; then for each N ≥ N_1 the following inequalities hold:

(5.37)
$$J\big(\rho_N(\xi')\big) - J(\xi) = J\big(\rho_N(\xi')\big) - J(\xi') + J(\xi') - J(\xi) \le \big( \alpha\beta^M - \bar\alpha\bar\beta^{\,N} \big)\theta(\xi),$$

(5.38)
$$\Psi\big(\rho_N(\xi')\big) = \Psi\big(\rho_N(\xi')\big) - \Psi(\xi') + \Psi(\xi') \le \big( \alpha\beta^M - \bar\alpha\bar\beta^{\,N} \big)\theta(\xi) \le 0.$$

Similarly, if Ψ(ξ) > 0, then, using the same argument as above, we have that

(5.39)
$$\Psi\big(\rho_N(\xi')\big) - \Psi(\xi) \le \big( \alpha\beta^M - \bar\alpha\bar\beta^{\,N} \big)\theta(\xi).$$

Therefore, from (5.37), (5.38), and (5.39), it follows that ν(ξ) ≤ N_1 as desired.

The following lemma proves that once Algorithm 4.1 finds a feasible point, every point generated afterward is also feasible. We omit the proof since it follows directly from the definition of ν.

Lemma 5.15. Let {ξ_i}_{i∈N} be a sequence generated by Algorithm 4.1. If there exists i_0 ∈ N such that Ψ(ξ_{i_0}) ≤ 0, then Ψ(ξ_i) ≤ 0 for each i ≥ i_0.

Employing these preceding results, we can prove that every sequence produced by Algorithm 4.1 asymptotically satisfies our optimality condition.

Theorem 5.16. Let {ξ_i}_{i∈N} be a sequence generated by Algorithm 4.1. Then lim_{i→∞} θ(ξ_i) = 0.

Proof. If the sequence produced by Algorithm 4.1 is finite, then the theorem is trivially satisfied, so we assume that the sequence is infinite. Suppose the theorem is not true; then lim inf_{i→∞} θ(ξ_i) = −2δ < 0, and therefore there exists k_0 ∈ N and a subsequence {ξ_{i_k}}_{k∈N} such that θ(ξ_{i_k}) ≤ −δ for each k ≥ k_0. Also, recall that ν(ξ) was chosen such that, given μ(ξ), αβ^{μ(ξ)} − ᾱβ̄^{ν(ξ)} ≥ ωαβ^{μ(ξ)}.

From Lemma 5.13 we know that there exists M*_δ, which depends on δ, such that β^{μ(ξ)} ≥ β^{M*_δ}. Suppose that the subsequence {ξ_{i_k}}_{k∈N} is eventually feasible; then, by Lemma 5.15, the sequence is always feasible. Thus,

(5.40)
$$J\big(\Gamma(\xi_{i_k})\big) - J(\xi_{i_k}) \le \big( \alpha\beta^{\mu(\xi_{i_k})} - \bar\alpha\bar\beta^{\,\nu(\xi_{i_k})} \big)\theta(\xi_{i_k}) \le -\omega\alpha\beta^{\mu(\xi_{i_k})}\delta \le -\omega\alpha\beta^{M^*_\delta}\delta.$$

This inequality, together with the fact that J(ξ_{i+1}) ≤ J(ξ_i) for each i ∈ N, implies that lim inf_{k→∞} J(ξ_{i_k}) = −∞, but this is a contradiction since J is lower bounded, which follows from Theorem 3.3.

The case when the sequence is never feasible is analogous after noting that, since the subsequence is infeasible, Ψ(ξ_{i_k}) > 0 for each k ∈ N, establishing a similar contradiction.

As described in the discussion that follows Theorem 5.12, this result does not prove the existence of accumulation points to the sequence generated by Algorithm 4.1, but only shows that the generated sequence asymptotically satisfies a necessary con-dition for optimality.

Acknowledgments. We are grateful to Maryam Kamgarpour and Claire Tomlin, who helped write our original papers on switched system optimal control. We are also grateful to Magnus Egerstedt and Yorai Wardi, whose comments on our original papers were critical to our realizing the deficiencies of the sufficient descent argument. The chattering lemma argument owes its origin to a series of discussions with Ray DeCarlo. The contents of this article reflect a longstanding collaboration with Elijah Polak.
