
$$\dot v = Df(y(y_0))v + B\,DF(y(y_0))v + \delta_v, \qquad v(0) = v_0, \tag{C.16}$$

with $v_0 \in \mathbb{R}^n$ and $\delta_v \in L^2(I;\mathbb{R}^n)$, admits a solution $v \in W$ and $\|v\|_W \le C\big(\|\delta_v\|_{L^2(0,\infty;\mathbb{R}^n)} + |v_0|\big)$. Here $y(y_0) \in W$ denotes the unique solution to (3.2).

Proof. For the linearized state equation with linearization at the origin, the optimal feedback law is provided by $-\frac{1}{\beta}B^\top\Pi$, with $\Pi$ given in (C.2). In particular the linear closed-loop system is exponentially stable, and for some $c > 0$ we have

$$\Big(\big(A - \tfrac{1}{\beta}BB^\top\Pi\big)y,\ y\Big)_{\mathbb{R}^n} \le -c\,|y|^2_{\mathbb{R}^n} \quad \text{for all } y \in \mathbb{R}^n. \tag{C.17}$$

For this $c > 0$ we determine $\rho > 0$ such that

$$\big|D\big(f(y(y_0)(t)) + BF(y(y_0)(t))\big) - D\big(f(0) + BF(0)\big)\big| \le \frac{c}{2} \tag{C.18}$$

for all $t \ge 0$ and $y_0 \in B_\rho$. This choice is possible due to the a priori estimate (3.5). We next use the relationship between the Riccati operator and the second derivative of the value function:

$$D\big(f(0) + BF(0)\big) = D\Big(f(0) - \tfrac{1}{\beta}BB^\top\nabla V(0)\Big) = A - \tfrac{1}{\beta}BB^\top\Pi.$$

Finally we turn to estimating the asymptotic behavior of the solutions to (C.16). Taking the inner product of (C.16) with $v(t)$ we obtain by (C.17) and (C.18), for $y_0 \in B_\rho$,

$$\frac12 \frac{d}{dt}|v(t)|^2 \le \Big(\big(Df(y(y_0)(t)) + B\,DF(y(y_0)(t))\big)v(t),\ v(t)\Big) + (\delta_v(t), v(t)) \le -\frac{c}{2}|v(t)|^2 + (\delta_v(t), v(t)).$$

Hence, by Young's inequality, $\frac12\frac{d}{dt}|v(t)|^2 \le -\frac{c}{4}|v(t)|^2 + \frac{1}{c}|\delta_v(t)|^2$. This implies that

$$|v(t)|^2 \le e^{-\frac{c}{2}t}|v_0|^2 + \frac{2}{c}\int_0^t e^{-\frac{c}{2}(t-s)}|\delta_v(s)|^2\,ds$$

for all $t \ge 0$, all $y_0 \in B_\rho$, and all $v_0 \in \mathbb{R}^n$, $\delta_v \in L^2(I;\mathbb{R}^n)$, and hence $\|v\|^2_{L^2(I;\mathbb{R}^n)} \le \frac{2}{c}\big(|v_0|^2 + \frac{2}{c}\|\delta_v\|^2_{L^2(I;\mathbb{R}^n)}\big)$.

From here the desired estimate follows.
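To make the mechanism of this proof concrete, the following minimal numerical sketch computes $\Pi$ from the algebraic Riccati equation, forms the closed-loop matrix appearing in (C.17), and integrates a perturbed linearized system as in (C.16). The matrices $A$, $B$, the weight $\beta$, and the perturbation $\delta_v$ are illustrative choices, not data from the paper.

```python
# Sketch only: A, B, beta and delta_v are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_continuous_are
from scipy.integrate import solve_ivp

beta = 1.0
A = np.array([[0.5, 1.0],
              [0.0, 0.3]])                     # unstable drift; (A, B) stabilizable
B = np.array([[0.0], [1.0]])
Pi = solve_continuous_are(A, B, np.eye(2), beta * np.eye(1))
A_cl = A - (1.0 / beta) * B @ B.T @ Pi         # closed-loop generator from (C.17)

print("closed-loop eigenvalues:", np.linalg.eigvals(A_cl))  # negative real parts

# Perturbed linearized dynamics v' = A_cl v + delta_v, cf. (C.16) at y = 0.
delta_v = lambda t: 0.1 * np.exp(-t) * np.ones(2)            # an L^2 perturbation
sol = solve_ivp(lambda t, v: A_cl @ v + delta_v(t), (0.0, 30.0), [1.0, -1.0])
print("|v(30)| =", np.linalg.norm(sol.y[:, -1]))             # decays towards 0
```

The printed state norm illustrates the Gronwall-type decay: the $L^2$ perturbation is absorbed and $|v(t)|$ tends to zero, as the estimate before this sketch predicts.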

Appendix D. Universal approximation property

In this last section we give the technical proof of Proposition 5.4.

Proof of Proposition 5.4. In many respects we can profit from Theorem 1 of [24] and its proof, and from [31], adapted to our purposes. Here, however, we only require a mild regularity assumption for $\psi$, and $w$ and $b$ are restricted to subsets rather than allowed to vary in all of $\mathbb{R}^n$ and $\mathbb{R}$. These situations are also commented on in the cited references, but not treated in detail.

Step 1. We recall that the set of polynomials in $n$ variables is a dense linear subspace of $C^1(\mathbb{R}^n,\mathbb{R})$, i.e., for every $\varphi \in C^1(\mathbb{R}^n,\mathbb{R})$ and every compact set $K \subset \mathbb{R}^n$ there exists a sequence $p_n$ of polynomials in $(x_1, \dots, x_n)$ such that $\lim_{n\to\infty}\|\varphi - p_n\|_{C^1(K,\mathbb{R})} = 0$; see page 169 of [37].
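As a one-dimensional illustration of this density (the choice of $\varphi$ and of Chebyshev interpolants as the approximating polynomials is ours, not part of the proof), the following sketch shows the $C^1([-1,1])$ error tending to zero:

```python
# Sketch only: phi and the interpolation scheme are illustrative assumptions.
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

phi = lambda x: np.exp(np.sin(2 * x))
dphi = lambda x: 2 * np.cos(2 * x) * phi(x)    # derivative of phi

x = np.linspace(-1.0, 1.0, 2001)
for deg in (5, 10, 20, 40):
    p = Chebyshev.interpolate(phi, deg)        # polynomial approximant p_n
    err = max(np.abs(phi(x) - p(x)).max(),     # sup-norm of phi - p_n ...
              np.abs(dphi(x) - p.deriv()(x)).max())  # ... and of its derivative
    print(deg, err)                            # C^1(K) error -> 0
```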

Step 2. We introduce

$$\mathcal{M}(\mathcal{W}) = \mathrm{span}\{\varphi(w\cdot \bullet)\ :\ \varphi \in C^1(\mathbb{R},\mathbb{R}),\ w \in \mathcal{W}\} \subset C^1(\mathbb{R}^n,\mathbb{R}).$$

Following [25] we introduce the set of homogeneous polynomials of degree $k$,

$$H^n_k = \Big\{\, \sum_{|m|_1 = k} c_m s^m \ :\ c_m \in \mathbb{R} \,\Big\},$$

as well as the set of all homogeneous polynomials in $n$ variables,

$$H^n = \bigcup_{k=0}^{\infty} H^n_k,$$

where the usual multi-index notation is used with $m = (m_1, \dots, m_n) \in \mathbb{N}^n$, $|m|_1 = \sum_{i=1}^n m_i$, and $s^m = s_1^{m_1}\cdots s_n^{m_n}$. In ([25], proof of Thm. 2.1) it is verified that $H^n_k = \mathrm{span}\{(d\cdot x)^k : d \in \mathcal{W}\} \subset \mathcal{M}(\mathcal{W})$ for all $k$.

Here it is used that $\mathcal{W}$ contains an open set, and thus there does not exist a nontrivial homogeneous polynomial vanishing on it. Hence $\mathcal{M}(\mathcal{W})$ contains $H^n$ and therefore all polynomials. By Step 1 we conclude that $\mathcal{M}(\mathcal{W})$ is dense in $C^1(\mathbb{R}^n,\mathbb{R})$ in the $C^1$-norm.
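The identity $H^n_k = \mathrm{span}\{(d\cdot x)^k : d \in \mathcal{W}\}$ can be checked numerically in a small case. In this sketch the dimension $n = 2$, the degree $k = 2$, and the sampled directions $d$ (drawn from a small ball standing in for the open set $\mathcal{W}$) are our illustrative choices:

```python
# Sketch only: the directions d are illustrative samples from an open set.
import numpy as np

rng = np.random.default_rng(1)
ds = 1.0 + 0.1 * rng.standard_normal((3, 2))   # three directions near (1, 1)

# Coefficients of (d1 x1 + d2 x2)^2 in the monomial basis (x1^2, x1 x2, x2^2).
M = np.array([[d[0] ** 2, 2 * d[0] * d[1], d[1] ** 2] for d in ds])
print("rank =", np.linalg.matrix_rank(M))      # rank 3 => the powers span H_2^2
```

Since $H^2_2$ is 3-dimensional, full rank of $M$ confirms that three generic ridge powers already span it.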

Step 3. Let $\chi$ be in $C_0^\infty(\mathbb{R},\mathbb{R})$, the space of $C^\infty(\mathbb{R},\mathbb{R})$-functions with compact support. Since $\psi$ is not a polynomial, $\chi$ can be chosen such that $\psi * \chi = \int \psi(\cdot - y)\chi(y)\,dy$ is not a polynomial either. This follows from Steps 6 and 7 of the proof of Theorem 1 in [24]. Let us set $\tilde\psi = \psi * \chi$. Then $\tilde\psi \in C^\infty(\mathbb{R},\mathbb{R})$.

We verify in this step that $\tilde\Sigma_1 = \mathrm{span}\{\tilde\psi(\lambda x + b) : \lambda \in \Lambda\setminus\{0\},\ b \in B\}$ is dense in $C^1(\mathbb{R},\mathbb{R})$. Here $\Lambda$ is an open neighborhood of the origin with the property that $\lambda \in \Lambda$ implies $\lambda w \in \mathcal{W}$ for all $w \in \mathcal{W}$, a property which will be used in Step 6 below.

Note that

$$d_h(x) := \frac{1}{2h}\big(\tilde\psi((\lambda+h)x + b) - \tilde\psi((\lambda-h)x + b)\big) \in \tilde\Sigma_1$$

for every $\lambda \in \Lambda$, $b \in B$, and $|h|$ sufficiently small with $h \ne 0$. Moreover

$$\frac{d}{dx}\, d_h(x) = \frac{1}{2h}\big((\lambda+h)\,\tilde\psi'((\lambda+h)x+b) - (\lambda-h)\,\tilde\psi'((\lambda-h)x+b)\big),$$

and thus $\lim_{h\to 0} d_h(x) = \frac{d}{d\lambda}\tilde\psi(\lambda x + b)$ in the $C^1$-norm on compact subsets of $\mathbb{R}$. We have $\frac{d}{d\lambda}\tilde\psi(\lambda x + b) \in \mathrm{cl}(\tilde\Sigma_1)$, where $\mathrm{cl}(\tilde\Sigma_1)$ denotes the closure of $\tilde\Sigma_1$ with respect to the $C^1$-topology. Note moreover that $\frac{d}{dx}\frac{d}{d\lambda}\tilde\psi(\lambda x + b) = \tilde\psi'(\lambda x + b) + \lambda x\,\tilde\psi''(\lambda x + b)$. In a similar way we argue that

$$\Big(\frac{d}{d\lambda}\Big)^{(k)}\tilde\psi(\lambda x + b) \in \mathrm{cl}(\tilde\Sigma_1)$$

for all $k = 1, 2, \dots$, $\lambda \in \Lambda$ and $b \in B$. We calculate further $\big(\frac{d}{d\lambda}\big)^{(k)}\tilde\psi(\lambda x + b) = x^k\,\tilde\psi^{(k)}(\lambda x + b)$, and $\frac{d}{dx}\big(\frac{d}{d\lambda}\big)^{(k)}\tilde\psi(\lambda x + b) = k\,x^{k-1}\,\tilde\psi^{(k)}(\lambda x + b) + x^k\,\lambda\,\tilde\psi^{(k+1)}(\lambda x + b)$ for $k = 2, \dots$. In particular we have

$$x^k\,\tilde\psi^{(k)}(b) \in \mathrm{cl}(\tilde\Sigma_1) \quad \text{for all } k = 1, 2, \dots,\ b \in B, \tag{D.1}$$

a property which will be used in Step 5 below.

Now we utilize that $\tilde\psi$ is not a polynomial, and hence there exists $b_0 \in B$ such that $\tilde\psi^{(k)}(b_0) \ne 0$ for all $k$; see [9, 13] and ([31], p. 156). Consequently we find that

$$x^k\,\tilde\psi^{(k)}(b_0) = \Big(\frac{d}{d\lambda}\Big)^{(k)}\tilde\psi(\lambda x + b)\Big|_{\lambda=0,\,b=b_0} \in \mathrm{cl}(\hat{\tilde\Sigma}_1)$$

for all $k = 0, 1, \dots$, where we have set $\hat{\tilde\Sigma}_1 = \mathrm{span}\{\tilde\psi(\lambda x + b) : \lambda \in \Lambda,\ b \in B\}$, which differs from $\tilde\Sigma_1$ only with respect to the element $\lambda = 0$. Thus $\mathrm{cl}(\hat{\tilde\Sigma}_1)$ contains all polynomials. Since these are dense in $C^1(\mathbb{R},\mathbb{R})$, we have that $\hat{\tilde\Sigma}_1$ is dense in $C^1(\mathbb{R},\mathbb{R})$ as well. Thus for each $g \in C^1(\mathbb{R},\mathbb{R})$ and each compact set $K \subset \mathbb{R}$ we have the following property: for all $\varepsilon > 0$ there exist $m \in \mathbb{N}$, $\lambda_i \in \Lambda$, and $b_i \in B$ such that $\|g - \sum_{i=1}^m \tilde\psi(\lambda_i \cdot + b_i)\|_{C^1(K,\mathbb{R})} \le \varepsilon$.

Since $\tilde\psi \in C^\infty(\mathbb{R},\mathbb{R})$, and in particular $\tilde\psi \in C^1(\mathbb{R},\mathbb{R})$, it follows that $\tilde\Sigma_1$ is dense in $C^1(\mathbb{R},\mathbb{R})$. In fact, if in the above expansion there are terms with $\lambda_i = 0$, they can be replaced by nontrivial, sufficiently small $\lambda_i$ such that $\|g - \sum_{i=1}^m \tilde\psi(\lambda_i \cdot + b_i)\|_{C^1(K,\mathbb{R})} \le 2\varepsilon$.

As a final note to this step, we point out that if $\psi \in C^\infty$, then the regularisation by convolution is not necessary, the last estimate holds with $\tilde\psi$ replaced by $\psi$, and we can directly continue the proof at Step 6.
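The convergence of the difference quotients $d_h$ used above is easy to observe numerically. In the spirit of this final note we take a $C^\infty$ activation so that no mollification is needed; $\psi = \tanh$ and the values of $\lambda$ and $b$ are illustrative choices:

```python
# Sketch only: psi, lambda and b are illustrative assumptions.
import numpy as np

psi = np.tanh
dpsi = lambda s: 1.0 - np.tanh(s) ** 2          # psi'
lam, b = 0.3, 0.7
x = np.linspace(-2.0, 2.0, 1001)

target = x * dpsi(lam * x + b)                  # (d/dlambda) psi(lambda x + b)
for h in (1e-1, 1e-2, 1e-3):
    d_h = (psi((lam + h) * x + b) - psi((lam - h) * x + b)) / (2 * h)
    print(h, np.abs(d_h - target).max())        # sup error -> 0 as h -> 0
```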

Step 4. Let us choose $\alpha \in \big(0, \frac{\bar b - b}{2}\big)$ and define $B_{-\alpha} = (b + \alpha,\ \bar b - \alpha)$. Then $B_{-\alpha}$ is a nontrivial interval contained in $B^0$. Further we choose a sequence of mollifiers $\chi_n \in C_0^\infty(\mathbb{R},\mathbb{R})$ with the properties that

– $\psi * \chi_n \to \psi$ in $L^p(K,\mathbb{R})$ for some $p \in [2,\infty)$ and every compact set $K \subset \mathbb{R}$, and
– the support of $\chi_n$ is contained in $(-\alpha, \alpha)$.

We show in this step that $\psi * \chi_n \in \mathrm{cl}(\mathrm{span}\{\psi(\cdot + b) : b \in B^0\})$ for each $n$. For this purpose we verify that for each $n \in \mathbb{N}$, each compact set $K \subset \mathbb{R}$, and each $\varepsilon > 0$, there exist $m \in \mathbb{N}$, $\{b_i\}_{i=1}^m \subset B^0$, and $\{\mu_i\}_{i=1}^m \subset \mathbb{R}$ such that

$$\Big\|\psi * \chi_n - \sum_{i=1}^m \mu_i\,\psi(\cdot - b_i)\Big\|_{W^{1,\infty}(K;\mathbb{R})} = \Big\|\int_{-\alpha}^{\alpha} \psi(\cdot - \xi)\,\chi_n(\xi)\,d\xi - \sum_{i=1}^m \mu_i\,\psi(\cdot - b_i)\Big\|_{W^{1,\infty}(K;\mathbb{R})} \le 3\varepsilon. \tag{D.2}$$

We set

$$b_i = -\alpha + \frac{2i\alpha}{m} \ \text{ for } i = 0, \dots, m, \qquad \Delta_i = [b_{i-1}, b_i], \quad \mu_i = |\Delta_i|\,\chi_n(b_i) \ \text{ for } i = 1, \dots, m, \tag{D.3}$$

where $|\Delta_i| = \frac{2\alpha}{m}$ denotes the length of $\Delta_i$, and choose $\delta = \delta(\varepsilon)$ such that

$$10\,\delta\,\|\psi'\|_{L^\infty(K_\alpha;\mathbb{R})}\,\|\chi_n\|_{C(\mathbb{R},\mathbb{R})} \le \varepsilon, \tag{D.4}$$

where $K_\alpha = \{s = s_1 + s_2 : s_1 \in K,\ s_2 \in (-\alpha, \alpha)\}$. By assumption there exist $r(\delta) \in \mathbb{N}$ intervals $\{I_j\}_{j=1}^{r(\delta)}$ with $\mu(U) \le \delta$, such that $\psi$ is uniformly continuously differentiable on $K_\alpha \setminus U$. Here $\mu(U)$ denotes the Lebesgue measure of $U = \bigcup_{j=1}^{r(\delta)} I_j$. Next we choose $m$ sufficiently large such that

$$\frac{\alpha\, r(\delta)}{m} < \delta, \tag{D.5}$$

and, for $\xi_1$ and $\xi_2 \in K_\alpha$ with $|\xi_1 - \xi_2| \le \frac{2\alpha}{m}$,

$$|\chi_n(\xi_1) - \chi_n(\xi_2)| \le \frac{\varepsilon}{2\alpha\,\|\psi\|_{W^{1,\infty}(K_\alpha;\mathbb{R})}}, \tag{D.6}$$

$$|\psi(\xi_1) - \psi(\xi_2)| \le \frac{\varepsilon}{2\alpha\,\|\chi_n\|_{C(\mathbb{R},\mathbb{R})}}. \tag{D.7}$$

To estimate (D.2) we first use the triangle inequality, comparing the integral over each cell $\Delta_i$ with the corresponding term $\mu_i\,\psi(\cdot - b_i)$ of the Riemann sum; the cells meeting the exceptional set $U$ are collected in a term $I$. To estimate $I$ we proceed as follows. For a.e. $x \in K$ we consider the set of cells characterized by the indices $i \in \mathcal{I}$ if and only if $(x - \Delta_i)\cap U \ne \emptyset$. We then find that the measure of these cells satisfies $\mu\big(\bigcup_{i\in\mathcal{I}}\Delta_i\big) \le \delta + \frac{4\alpha\, r(\delta)}{m} \le 5\delta$, where we used (D.5). Together with (D.4), (D.6) and (D.7) this yields (D.2).
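Numerically, the Riemann sums of (D.3) indeed approximate the convolution uniformly. In the following sketch the activation $\psi$ (a ReLU, which is continuous and piecewise $C^1$) and the bump function $\chi$ are illustrative stand-ins for the objects of the proof:

```python
# Sketch only: psi, chi and the grids are illustrative assumptions.
import numpy as np

alpha = 0.5
psi = lambda s: np.maximum(s, 0.0)              # ReLU: continuous, piecewise C^1

def chi(s):
    # Smooth bump supported in (-alpha, alpha), standing in for chi_n.
    out = np.zeros_like(s)
    inside = np.abs(s) < alpha
    out[inside] = np.exp(-1.0 / (alpha ** 2 - s[inside] ** 2))
    return out

x = np.linspace(-2.0, 2.0, 401)                 # compact set K
xi = np.linspace(-alpha, alpha, 4001)           # fine grid for a reference value
conv = (psi(x[:, None] - xi) * chi(xi)).sum(axis=1) * (xi[1] - xi[0])

for m in (10, 100, 1000):
    b = -alpha + 2 * alpha * np.arange(1, m + 1) / m   # nodes b_i as in (D.3)
    mu = (2 * alpha / m) * chi(b)                      # mu_i = |Delta_i| chi(b_i)
    approx = (mu * psi(x[:, None] - b)).sum(axis=1)    # sum_i mu_i psi(x - b_i)
    print(m, np.abs(conv - approx).max())              # sup-error on K decreases
```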

Step 5. Using (D.1) and (D.2) we obtain the following inclusions, which hold for each $n$:

$$\begin{aligned} x^k(\psi*\chi_n)^{(k)}(b) &\in \mathrm{cl}\big(\mathrm{span}\{(\psi*\chi_n)(\lambda\cdot + b) : \lambda \in \Lambda\setminus\{0\},\ b \in B_{-\alpha}\}\big) \\ &\subset \mathrm{cl}\big(\mathrm{span}\{\psi(\lambda\cdot + b) : \lambda \in \Lambda\setminus\{0\},\ b \in B^0\}\big) \\ &\subset \mathrm{cl}\big(\mathrm{span}\{\psi(\lambda\cdot + b) : \lambda \in \Lambda\setminus\{0\},\ b \in B\}\big), \end{aligned} \tag{D.10}$$

where the first relation holds for each $b \in B_{-\alpha}$ and each $k = 0, 1, \dots$. The last inclusion in (D.10) is obvious. The second inclusion is a consequence of Step 4. For $k = 1, 2, \dots$ the first relation follows from (D.1) in Step 3 with $B$ replaced by $B_{-\alpha}$ and $\tilde\psi = \psi * \chi_n$. Note that in Step 3 the requirement that $\tilde\psi$ is not a polynomial is only used after (D.1), and hence (D.1) is applicable for $\tilde\psi = \psi * \chi_n$. For $k = 0$ the first relation can be achieved by an appropriate choice of small $\lambda \ne 0$.

If $\Sigma_1 = \mathrm{span}\{\psi(\lambda\cdot + b) : \lambda \in \Lambda,\ b \in B\}$ were not dense in $C^1(\mathbb{R},\mathbb{R})$, then $x^{k_0}$ would not lie in $\bar\Sigma_1$ for some $k_0$. Since $\bar\Sigma_1$ is a linear space, (D.10) then forces $(\psi*\chi_n)^{(k_0)}(b) = 0$ for all $b \in B_{-\alpha}$ and all $n$, so that each $\psi*\chi_n$ coincides on $B_{-\alpha}$ with a polynomial of degree less than $k_0$; passing to the limit, $\psi$ itself would coincide with a polynomial there, contradicting our assumption. Hence $\Sigma_1$ is dense in $C^1(\mathbb{R},\mathbb{R})$.

Step 6. We set $\hat S_i = \{x \in \mathbb{R}^n \mid w_i\cdot x \in S_i\}$. Since $w_i \ne 0$ it follows that $\mu(\hat S_i) = 0$; see e.g. [32]. We therefore have, using (D.13), the analogous estimate off the null set $\bigcup_i \hat S_i$. This estimate together with (D.11) implies the asserted uniform bound, and the proof is complete.
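To close, the statement being proved can also be observed numerically: with a smooth non-polynomial activation, networks $\sum_i c_i\,\psi(w_i x + b_i)$ whose weights and biases are confined to subsets approximate a target together with its derivative. Everything in the following sketch (the activation, the target $g$, and the random-feature least-squares fit) is our illustrative choice, not the construction of the proof:

```python
# Sketch only: psi, g, the weight ranges and the fitting scheme are assumptions.
import numpy as np

rng = np.random.default_rng(2)
m = 60
w = rng.uniform(0.5, 2.0, m)            # inner weights restricted to a subset
b = rng.uniform(-2.0, 2.0, m)           # biases restricted to a subset
psi, dpsi = np.tanh, lambda s: 1 - np.tanh(s) ** 2

g = lambda x: np.sin(3 * x)             # target function
dg = lambda x: 3 * np.cos(3 * x)        # its derivative

x = np.linspace(-1.0, 1.0, 400)         # compact set K
Phi = psi(np.outer(x, w) + b)           # network features psi(w_i x + b_i)
dPhi = dpsi(np.outer(x, w) + b) * w     # their x-derivatives
# Stack both so the fit controls the C^1 error, not just the C^0 error.
Asys = np.vstack([Phi, dPhi])
rhs = np.concatenate([g(x), dg(x)])
c, *_ = np.linalg.lstsq(Asys, rhs, rcond=None)

print("C^0 error:", np.abs(Phi @ c - g(x)).max())
print("C^1 error:", np.abs(dPhi @ c - dg(x)).max())
```

Fitting function and derivative values jointly mirrors the $C^1$-density asserted by Proposition 5.4, rather than the mere uniform approximation of classical universal approximation results.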

References

[1] R. Bellman, Adaptive Control Processes: A Guided Tour (A RAND Corporation Research Study). Princeton University Press, Princeton, N.J. (1961).

[2] D. Bertsekas, Multiagent rollout algorithms and reinforcement learning (2019).

[3] D. Bertsekas, Reinforcement Learning and Optimal Control. Athena Scientific (2019).

[4] T. Breiten, K. Kunisch and L. Pfeiffer, Infinite-horizon bilinear optimal control problems: sensitivity analysis and polynomial feedback laws. SIAM J. Control Optim. 56 (2018) 3184–3214.

[5] T. Breiten, K. Kunisch and L. Pfeiffer, Numerical study of polynomial feedback laws for a bilinear control problem. Math. Control Relat. Fields 8 (2018) 557–582.

[6] T. Breiten, K. Kunisch and L. Pfeiffer, Feedback stabilization of the two-dimensional Navier-Stokes equations by value function approximation. Tech. rep., University of Graz (2019). Preprint https://arxiv.org/abs/1902.00394.

[7] E. Casas and K. Kunisch, Stabilization by sparse controls for a class of semilinear parabolic equations. SIAM J. Control Optim. 55 (2017) 512–532.

[8] Y.T. Chow, W. Li, S. Osher and W. Yin, Algorithm for Hamilton-Jacobi equations in density space via a generalized Hopf formula (2018).

[9] E. Corominas and F. Sunyer Balaguer, Conditions for an infinitely differentiable function to be a polynomial. Revista Mat. Hisp.-Amer. 14 (1954) 26–43.

[10] R. Curtain and H. Zwart, An Introduction to Infinite-Dimensional Linear Systems Theory. Springer-Verlag (2005).

[11] J. Diestel and J.J. Uhl, Jr., Vector Measures. With a foreword by B.J. Pettis. Mathematical Surveys, No. 15. American Mathematical Society, Providence, R.I. (1977).

[12] S. Dolgov, D. Kalise and K. Kunisch, Tensor decomposition for high-dimensional Hamilton-Jacobi-Bellman equations. To appear in: SIAM J. Sci. Comput. (2019).

[13] W.F. Donoghue, Jr., Distributions and Fourier Transforms. Vol. 32 of Pure and Applied Mathematics. Academic Press, New York (1969).

[14] R.E. Edwards, Functional Analysis. Theory and Applications. Holt, Rinehart and Winston, New York (1965).

[15] M. Falcone and R. Ferretti, Semi-Lagrangian Approximation Schemes for Linear and Hamilton-Jacobi Equations. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (2014).

[16] W.H. Fleming and H.M. Soner, Controlled Markov Processes and Viscosity Solutions. Vol. 25 of Stochastic Modelling and Applied Probability. Springer, New York, second ed. (2006).

[17] J. Garcke and A. Kröner, Suboptimal feedback control of PDEs by solving HJB equations on adaptive sparse grids. J. Sci. Comput. 70 (2017) 1–28.

[18] S. Garreis and M. Ulbrich, Constrained optimization with low-rank tensors and applications to parametric problems with PDEs. SIAM J. Sci. Comput. 39 (2017) A25–A54.

[19] K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition. Preprint arXiv:1512.03385 (2015).

[20] K. Hornik, Multilayer feedforward networks are universal approximators. Neural Netw. 2 (1989) 359–366.

[21] D. Kalise and K. Kunisch, Polynomial approximation of high-dimensional Hamilton-Jacobi-Bellman equations and applications to feedback control of semilinear parabolic PDEs. SIAM J. Sci. Comput. 40 (2018) A629–A652.

[22] D. Kalise, K. Kunisch and Z. Rao, eds., Hamilton-Jacobi-Bellman Equations. Vol. 21 of Radon Series on Computational and Applied Mathematics. De Gruyter, Berlin (2018).

[23] D.P. Kouri, M. Heinkenschloss, D. Ridzal and B.G. van Bloemen Waanders, A trust-region algorithm with adaptive stochastic collocation for PDE optimization under uncertainty. SIAM J. Sci. Comput. 35 (2013) A1847–A1879.

[24] M. Leshno, V.Y. Lin, A. Pinkus and S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks 6 (1993) 861–867.

[25] V.Y. Lin and A. Pinkus, Fundamentality of ridge functions. J. Approx. Theory 75 (1993) 295–311.

[26] J. Lions and E. Magenes, Non-homogeneous Boundary Value Problems and Applications. Vol. I/II. Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen. Springer-Verlag, Berlin (1972).

[27] P.L. Lions and J.-C. Rochet, Hopf formula and multitime Hamilton-Jacobi equations. Proc. Am. Math. Soc. 96 (1986) 79–84.

[28] T. Nakamura-Zimmerer, Q. Gong and W. Kang, Adaptive deep learning for high-dimensional Hamilton-Jacobi-Bellman equations (2019).

[29] T. Osa, J. Pajarinen, G. Neumann, J.A. Bagnell, P. Abbeel and J. Peters, An algorithmic perspective on imitation learning. Found. Trends Robotics 7 (2018) 1–179.

[30] J. Peters and S. Schaal, Reinforcement learning of motor skills with policy gradients. Neural Networks 21 (2008) 682–697.

[31] A. Pinkus, Approximation theory of the MLP model in neural networks. Acta Numerica 8 (1999) 143–195.

[32] S.P. Ponomarëv, Submersions and pre-images of sets of measure zero. Sibirsk. Mat. Zh. 28 (1987) 199–210.

[33] B. Recht, A tour of reinforcement learning: the view from continuous control. Annu. Rev. Control Robotics Auton. Syst. 2 (2018) 253–279.

[34] H.L. Royden, Real Analysis. The Macmillan Co., New York; Collier-Macmillan Ltd., London (1963).

[35] R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, second ed. (2018).

[36] L. Thevenet, J.-M. Buchot and J.-P. Raymond, Nonlinear feedback stabilization of a two-dimensional Burgers equation. ESAIM: COCV 16 (2010) 929–955.

[37] F. Trèves, Topological Vector Spaces, Distributions and Kernels. Academic Press, New York-London (1967).

[38] K. Vamvoudakis, F. Lewis and S.S. Ge, Neural networks in feedback control systems. In: Mechanical Engineers' Handbook: Instrumentation, Systems, Controls, and MEMS. Wiley (2015).

[39] A. van der Schaft, L2-Gain and Passivity Techniques in Nonlinear Control. Vol. 218 of Lecture Notes in Control and Information Sciences. Springer-Verlag London, Ltd., London (1996).

[40] E. Weinan and B. Yu, The deep Ritz method: a deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6 (2018) 1–12.
