ANNALES DE L'I. H. P., SECTION C
I. CAPUZZO DOLCETTA
M. FALCONE
Discrete dynamic programming and viscosity solutions of the Bellman equation
Annales de l'I. H. P., section C, tome S6 (1989), p. 161-183
<http://www.numdam.org/item?id=AIHPC_1989__S6__161_0>
© Gauthier-Villars, 1989, all rights reserved.
Access to the archives of the journal "Annales de l'I. H. P., section C" (http://www.elsevier.com/locate/anihpc) implies agreement with the general conditions of use (http://www.numdam.org/conditions). Any commercial use or systematic printing constitutes a criminal offence. Any copy or printout of this file must contain the present copyright notice.
Article digitized as part of the programme Numérisation de documents anciens mathématiques
http://www.numdam.org/
DISCRETE DYNAMIC PROGRAMMING AND VISCOSITY SOLUTIONS OF THE BELLMAN EQUATION
I. CAPUZZO DOLCETTA & M. FALCONE
Dipartimento di Matematica, Università di Roma "La Sapienza", P. Aldo Moro, 2, 00185 Roma, Italy
Abstract. This paper presents a technique for approximating the viscosity solution of the Bellman equation in deterministic control problems. This technique, based on discrete dynamic programming, leads to monotonically converging schemes and allows one to prove a priori error estimates. Several computational algorithms leading to monotone convergence are reviewed and compared.

Key words. Bellman equation, dynamic programming, approximation schemes, viscosity solutions.

Introduction
The notion of viscosity solution of Hamilton-Jacobi equations, recently introduced by Crandall and Lions [21] and developed by various authors (see Crandall-Ishii-Lions [19], Lions [40] and references therein), has proved to be an important tool in several applications, particularly in the dynamic programming approach to deterministic (see Lions [38], [39], Capuzzo Dolcetta-Evans [13], Bardi [1], Barles [4], Barles-Perthame [5], [6], Lions-Souganidis [42], Soner [50]) and stochastic (see Fleming [26] and references therein) optimal control problems.
The aim of this paper is to discuss some aspects of the theory of viscosity solutions related to approximation and computational methods in the framework of deterministic control theory. More precisely, our purpose here is to point out, on a model problem, how the solution of the classical discrete time dynamic programming functional equation (see Bellman [7]) can be regarded as a uniform approximation of the viscosity solution of the corresponding Bellman partial differential equation. We will show how this PDE approach allows one to establish general results on the rate of convergence of the approximate solutions. These results provide a wider theoretical basis for classical and more recent computational techniques for the value function and optimal feedbacks. The last section is entirely devoted to reviewing, in this framework, several methods leading to monotone convergence (namely, successive approximations, iteration in policy space and finite difference approximations).
Let us mention finally that most of these methods have also been applied to stochastic control problems. We refer the interested reader to Bellman-Dreyfus [8], Howard [29], and to the more recent works by Kushner [32], Kushner-Kleinman [33], [34], Lions-Mercier [41], Menaldi [46], Quadrat [49], Bensoussan-Runggaldier [10].
1. The infinite horizon problem: discrete Bellman equation and synthesis.
Let us suppose that we observe the state $x(t) \in \mathbb{R}^n$ of some deterministic system evolving in time and that we are able to affect its evolution by acting on some available controls. A standard model for the evolution can be expressed in terms of a system of controlled ordinary differential equations:

$$\frac{dx(t)}{dt} = b(x(t), a(t)), \quad t > 0, \qquad x(0) = x, \tag{1.1}$$

where the control $a$ is a measurable function of $t$ taking values in a given compact subset $A$. The cost functional which measures the performance of the control $a$ is taken of the form

$$J(x,a) = \int_0^{+\infty} f(x(t), a(t))\, e^{-\lambda t}\, dt,$$

where $f$ is a given function (the running cost), which we shall always assume to be bounded on $\mathbb{R}^n \times A$, and $\lambda$ is a positive number (the discount factor).

The discounted infinite horizon problem is to determine, for any initial position $x \in \mathbb{R}^n$, the value

$$v(x) = \inf_{a(\cdot) \in \mathcal{A}} J(x,a), \qquad \mathcal{A} = \{\, a : [0,+\infty) \to A,\ a \text{ measurable} \,\}, \tag{P}$$

and to identify a control $a^*$ (depending on $x$) for which the infimum in (P) is attained, provided such a control exists (see Fleming-Rishel [27], Lee-Markus [37] as general references for optimal control problems and Carlson-Haurie [18] for the treatment of several examples of infinite horizon problems arising in the applications).
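In this section we describe how discrete time dynamic programming methods (see Bellman [7], Bensoussan [9], Bertsekas-Shreve [11]) apply to problem (P). For this purpose, let $h$ be a fixed positive number and assume that the evolution (1.1) is observed only at a sequence of instants $t_j = jh$, $j = 0, 1, 2, \dots$. Assume as well that the dynamics $b$ and the running cost $f$ remain constant in any time interval $[t_j, t_{j+1})$, namely,

$$b(x(t),a(t)) = b(x_j,a_j), \qquad f(x(t),a(t)) = f(x_j,a_j), \qquad t \in [t_j, t_{j+1}), \tag{1.3}$$

where

$$a_j = a(t_j) \tag{1.4}$$

and $x_j = x(t_j)$ is given by the recursion

$$x_{j+1} = x_j + h\, b(x_j,a_j), \quad j = 0, 1, 2, \dots, \qquad x_0 = x. \tag{1.5}$$

The total cost associated with the initial position $x$ and the control $a$ is given, in this discretized model, by the series

$$J_h(x,a) = h \sum_{j=0}^{+\infty} \beta^j f(x_j,a_j),$$

with $\beta = 1 - \lambda h$. Let us now introduce the approximate value function $v_h$ by setting

$$v_h(x) = \inf_{a \in \mathcal{A}_h} J_h(x,a), \tag{P$_h$}$$

where $\mathcal{A}_h$ is the subset of $\mathcal{A}$ consisting of controls which are constant on each interval $[t_j, t_{j+1})$.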
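The discretized model can be made concrete with a few lines of code. The sketch below evaluates a truncated version of the discounted series $h \sum_j \beta^j f(x_j,a_j)$ along the Euler recursion $x_{j+1} = x_j + h\,b(x_j,a_j)$ for a given finite sequence of control values. The function name `discrete_cost` and the scalar data $b(x,a) = a - x$, $f(x,a) = x^2$ are assumptions chosen only for illustration; they do not come from the paper.

```python
def discrete_cost(x, controls, b, f, lam, h):
    """Truncated discretized cost h * sum_j beta^j f(x_j, a_j), with beta = 1 - lam*h.

    `controls` is a finite list a_0, a_1, ..., so the infinite series is cut
    after len(controls) terms (an approximation of the full series).
    """
    beta = 1.0 - lam * h
    total, weight = 0.0, 1.0
    for a in controls:
        total += weight * f(x, a)   # running cost at (x_j, a_j)
        x = x + h * b(x, a)         # recursion x_{j+1} = x_j + h b(x_j, a_j)
        weight *= beta              # discount weight beta^j
    return h * total


# Hypothetical scalar data (illustration only): b(x, a) = a - x, f(x, a) = x^2.
cost = discrete_cost(0.5, [0.0] * 400,
                     b=lambda x, a: a - x,
                     f=lambda x, a: x * x,
                     lam=1.0, h=0.05)
```

With a constant running cost $f \equiv 1$ the series sums to $h/(1-\beta) = 1/\lambda$, which gives a quick sanity check of the discount weights.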
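The statement below comprises the well-known Bellman Dynamic Programming Principle for the problem under consideration:

Proposition 1.1 The following identity holds for all $x = x_0$ and $p = 1, 2, \dots$:

$$v_h(x) = \inf_{a \in \mathcal{A}_h} \Big\{ h \sum_{j=0}^{p-1} \beta^j f(x_j,a_j) + \beta^p v_h(x_p) \Big\}. \tag{DPP$_h$}$$

Proof. Let $x = x_0$ be an arbitrary but fixed vector in $\mathbb{R}^n$. By (P$_h$), for any $\varepsilon > 0$ there exists $a^\varepsilon$ such that

$$v_h(x) + \varepsilon \ge h \sum_{j=0}^{+\infty} \beta^j f(x_j, a^\varepsilon_j).$$

Now it is easy to check that for any $p \ge 1$

$$h \sum_{j=0}^{+\infty} \beta^j f(x_j, a^\varepsilon_j) = h \sum_{j=0}^{p-1} \beta^j f(x_j, a^\varepsilon_j) + \beta^p\, h \sum_{j=0}^{+\infty} \beta^j f(x_{p+j}, a^\varepsilon_{p+j}).$$

The second term in the right hand member is actually $\beta^p$ times the discretized cost of the trajectory starting from $x_p$, so it is not smaller than $\beta^p v_h(x_p)$. Hence,

$$v_h(x) + \varepsilon \ge h \sum_{j=0}^{p-1} \beta^j f(x_j, a^\varepsilon_j) + \beta^p v_h(x_p).$$

The reverse inequality is proved in a similar way.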
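The (DPP$_h$) yields the Bellman functional equation characterizing the value function $v_h$.

Proposition 1.2 Let us assume $|f(x,a)| \le M$ for all $(x,a) \in \mathbb{R}^n \times A$. Then the function $v_h$ defined by (P$_h$) is the unique bounded solution of

$$u(x) + \sup_{a \in A} \big[ -\beta\, u(x + h\, b(x,a)) - h f(x,a) \big] = 0, \quad x \in \mathbb{R}^n, \tag{B$_h$}$$

for $h \in\, ]0, 1/\lambda[$.

Proof. The boundedness of $v_h$ is an easy consequence of the assumption on $f$. To check that $v_h$ satisfies (B$_h$), take $p = 1$ in (DPP$_h$). This gives

$$v_h(x) \le \beta\, v_h(x + h\, b(x,a)) + h f(x,a) \quad \text{for all } a \in A.$$

To prove the converse, take $p = 1$ in (DPP$_h$) and observe that for any $\varepsilon > 0$ there exists $a^\varepsilon$ such that

$$v_h(x) + \varepsilon \ge \beta\, v_h(x + h\, b(x,a^\varepsilon_0)) + h f(x,a^\varepsilon_0)$$

and therefore

$$v_h(x) + \sup_{a \in A} \big[ -\beta\, v_h(x + h\, b(x,a)) - h f(x,a) \big] \ge -\varepsilon.$$

Finally, let $u_1$, $u_2$ be two bounded solutions of (B$_h$). It is immediate to deduce from (B$_h$) that

$$\sup_{x} |u_1(x) - u_2(x)| \le \beta \sup_{x} |u_1(x) - u_2(x)|$$

and the uniqueness is proved.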
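On this characterization of the approximate value function $v_h$ relies an algorithm for the synthesis of optimal feedback controls for the discrete time problem (P$_h$). In order to describe it, let us assume that $v_h$ is lower semicontinuous (this is the case if, for example, $b$ is continuous and $f$ is lower semicontinuous). Then, for any fixed $x \in \mathbb{R}^n$ the supremum in (B$_h$) is attained at some $a_h = a_h(x) \in A$. Define next

$$x_{j+1} = x_j + h\, b(x_j, a_h(x_j)), \quad j = 0, 1, 2, \dots, \qquad x_0 = x,$$

and set

$$a^*_h(t) = a_h(x_j), \quad t \in [t_j, t_{j+1}). \tag{1.8}$$

Therefore we have:

Proposition 1.3 Let $h \in\, ]0, 1/\lambda[$. If $v_h$ is lower semicontinuous and $A$ is compact, then the piecewise constant control defined by (1.8) is optimal for (P$_h$), that is

$$v_h(x) = h \sum_{j=0}^{+\infty} \beta^j f(x_j, a_h(x_j)). \tag{1.9}$$

Proof. By the very definition of the mapping $x \mapsto a_h(x)$, from equation (B$_h$) it follows that for any $p$

$$v_h(x_p) = \beta\, v_h(x_{p+1}) + h f(x_p, a_h(x_p)).$$

This yields by addition (after multiplying the identity of index $j$ by $\beta^j$, $j = 0, \dots, p$)

$$v_h(x) = h \sum_{j=0}^{p} \beta^j f(x_j, a_h(x_j)) + \beta^{p+1} v_h(x_{p+1})$$

and (1.9) follows by letting $p \to +\infty$.

2. Limiting behaviour as the time step vanishes.
In this section we shall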
investigate the behaviour of $v_h$ and $a^*_h$ as the time step $h$ tends to zero. It is natural to expect that $v_h$ and $a^*_h$ should converge, respectively, to the value function $v$ and to some optimal control (if it exists) for problem (P).
We report hereafter on some recent convergence results and error estimates. A crucial role in establishing these results is played by two key facts in the theory of viscosity solutions of Hamilton-Jacobi-Bellman equations. First, the observation due to P.L. Lions (see [38]) that the value $v$ of problem (P), even when only continuous, satisfies nonetheless the corresponding Bellman equation

$$\lambda u(x) + \sup_{a \in A} \big[ -b(x,a) \cdot Du(x) - f(x,a) \big] = 0, \quad x \in \mathbb{R}^n, \tag{B}$$

in the viscosity sense. Second, the uniqueness result of M.G. Crandall-P.L. Lions [21] for bounded uniformly continuous viscosity solutions of (B).

Let us recall for the reader's convenience that a viscosity solution of (B) is a bounded uniformly continuous function $u$ such that for each $\varphi \in C^1(\mathbb{R}^n)$ the following holds:

(i) if $u - \varphi$ attains a local maximum at $x_0$ then
$$\lambda u(x_0) + \sup_{a \in A} \big[ -b(x_0,a) \cdot D\varphi(x_0) - f(x_0,a) \big] \le 0;$$
(ii) if $u - \varphi$ attains a local minimum at $x_0$ then
$$\lambda u(x_0) + \sup_{a \in A} \big[ -b(x_0,a) \cdot D\varphi(x_0) - f(x_0,a) \big] \ge 0.$$

Let us also point out that the use of this weak notion of solution of (B) allows one to pass to the limit as $h \to 0$ in the nonlinear equations (B$_h$) under rather general conditions on $b$ and $f$. Let us state and sketch the proof of two results in this direction (see Capuzzo Dolcetta [12], Capuzzo Dolcetta-Ishii [14] and Loreti [43] for details and extensions).
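Theorem 2.1 Let us assume that $b$ and $f$ are continuous on $\mathbb{R}^n \times A$ and that for some constants $L$, $M$ the following holds

$$|b(x,a) - b(x',a)| \le L|x - x'|, \qquad |b(x,a)| \le M,$$
$$|f(x,a) - f(x',a)| \le L|x - x'|, \qquad |f(x,a)| \le M,$$

for all $x, x' \in \mathbb{R}^n$, $a \in A$. Then $v_h \to v$ locally uniformly in $\mathbb{R}^n$ as $h \to 0^+$, and the estimate (2.4) […] holds for some constant $C > 0$.

Theorem 2.2 Under the assumptions of Theorem 2.1 and the semiconcavity assumptions (2.5), (2.6) […], the following estimate of the rate of convergence holds

$$\sup_{x \in \mathbb{R}^n} |v(x) - v_h(x)| \le C h \tag{2.8}$$

for some constant $C > 0$.

Sketch of the proof of Theorem 2.1. The first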
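step is to establish the uniform estimates: […] for $h \in\, ]0, 1/\lambda[$. These are simple consequences of the fact that $v_h$ satisfies (B$_h$). Then, by the Ascoli-Arzelà theorem, there exists a Lipschitz continuous function $u$ such that $v_h \to u$ locally uniformly in $\mathbb{R}^n$ as $h \to 0^+$ (at least for a subsequence).

The next step is to show that $u$ is a viscosity solution of (B). To do this, let $\varphi \in C^1(\mathbb{R}^n)$, $x_0$ a local strict maximum point for $u - \varphi$, $S$ a closed ball centered at $x_0$, $x_0^h$ a maximum point for $v_h - \varphi$ over $S$. Then (B$_h$) yields, for $h$ small enough,

$$0 \ge \sup_{a \in A} \big[ (1 - \beta)\, v_h(x_0^h) - \beta \big( \varphi(x_0^h + h\, b(x_0^h,a)) - \varphi(x_0^h) \big) - h f(x_0^h,a) \big].$$

Since $\varphi \in C^1(\mathbb{R}^n)$, it follows that

$$0 \ge \sup_{a \in A} \big[ \lambda v_h(x_0^h) - \beta\, b(x_0^h,a) \cdot D\varphi\big(x_0^h + \theta h\, b(x_0^h,a)\big) - f(x_0^h,a) \big]$$

for some $\theta = \theta(h,a) \in [0,1]$. Hence, by uniform convergence,

$$\lambda u(x_0) + \sup_{a \in A} \big[ -b(x_0,a) \cdot D\varphi(x_0) - f(x_0,a) \big] \le 0,$$

that is $u$ is a viscosity subsolution of (B). A similar argument shows that $u$ is also a viscosity supersolution of (B) and, by the uniqueness theorem of Crandall-Lions [21], $u \equiv v$.

In order to prove (2.4), observe that by definition, […]. A simple computation shows that for any $a$ […] where $\theta =$ […]. Since $\theta - 1$ is of order $h$ as $h \to 0^+$, (2.4) follows.

Sketch of the proof of Theorem 2.2. Let us consider the function $\Psi$ defined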
by […] where […] is an auxiliary function forcing $\Psi$ to attain its maximum at some point $(x_0, y_0) \in \mathbb{R}^{2n}$. It is not difficult to show that $|x_0 - y_0| \le C\varepsilon^2$. Since $v$ is the viscosity solution of (B), then […] for some $a^* \in A$. The next step is to show that $v_h$ satisfies the semiconcavity condition […] for some $C > 0$. To prove this observe that $v_h$ satisfies, by its very definition, […]. A rather technical computation based on (2.5), (2.6) shows (see [14] for details) that […] where […]. Since […] with $\nu = \log(1 + Lh)/Lh$, and $\lambda > 2L$, the integral in (2.11) can be estimated by $CM(1 + M/\nu L)|z|^2$. This implies (2.10).

As a consequence of the uniform semiconcavity (2.10) of $v_h$ one has (see [14] for technical details) […] for all $x \in \mathbb{R}^n$. The choice $x =$ […] in the above yields […]. From this and equation (B$_h$) evaluated at $x = x_0$ it follows that […]. This inequality, combined with (2.9), yields […]. Choosing $\varepsilon = h^{1/2}$, from the above inequality it follows that […] and (2.8) is proved.
Remark 2.1. Under the sole assumptions of Theorem 2.1 it is possible to show that

$$\sup_{x \in \mathbb{R}^n} |v(x) - v_h(x)| \le C h^{1/2};$$

for a proof we refer to [14] and also to the papers by Crandall-Lions [20] and Souganidis [51], where similar estimates are established for general Hamilton-Jacobi equations of the form […], including for example the Isaacs equation in differential games theory. The above Theorem 2.2 exploits, through the semiconcavity assumptions (2.5), (2.6), the special structure of the Bellman equation (B), namely the convexity of the mapping $p \mapsto \sup_{a \in A} [-b(x,a) \cdot p - f(x,a)]$, in order to obtain the sharper, and in fact optimal (see [14]), estimate (2.8).
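A rough numerical illustration of how the error between the discrete and the continuous value behaves as $h$ shrinks can be obtained on a case where both are known in closed form. The sketch below is an assumption made for illustration only, not an example from the paper: it takes an uncontrolled scalar system ($A$ reduced to a single point) with $b(x) = -x$ and $f(x) = x$ along trajectories starting in $[0,1]$, for which $v(x) = x/(\lambda + 1)$ and the discrete series sums to $x/(\lambda + 1 - \lambda h)$, so the error is proportional to $h$.

```python
def v_exact(x, lam):
    # v(x) = integral over t >= 0 of x e^{-t} e^{-lam t} dt = x / (lam + 1)
    return x / (lam + 1.0)


def v_discrete(x, lam, h, tol=1e-16, max_terms=1_000_000):
    """Sum h * sum_j beta^j x_j with x_{j+1} = x_j - h x_j, beta = 1 - lam*h."""
    beta = 1.0 - lam * h
    total, weight = 0.0, 1.0
    for _ in range(max_terms):
        term = weight * x
        total += term
        if abs(term) < tol:      # remaining terms are negligible
            break
        x = x + h * (-x)         # Euler step for b(x) = -x
        weight *= beta
    return h * total


lam = 1.0
errors = {h: abs(v_discrete(1.0, lam, h) - v_exact(1.0, lam))
          for h in (0.1, 0.05, 0.025)}
# Halving h roughly halves the error in this toy case.
ratios = (errors[0.1] / errors[0.05], errors[0.05] / errors[0.025])
```

This only illustrates first-order behaviour on one smooth, uncontrolled case; it says nothing about the constants or the hypotheses needed in general.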
Theorems 2.1 and 2.2 can be regarded as an extension of earlier results by J. Cullum [22], K. Malanowski [44], M.M. Hrustalev [30] obtained via the Pontryagin maximum principle under rather restrictive convexity assumptions. The asymptotic behaviour of the discrete value function as the time step vanishes is studied in [16], [17] for other control problems such as the stopping time and the optimal switching problem, without use of the notion of viscosity solution. More recently the notion of discontinuous viscosity solutions (see [5]) has been applied to prove the convergence of a discretization scheme for the minimum time problem by Bardi-Falcone [2].
As a consequence of Theorems 2.1, 2.2 and Proposition 1.3, the piecewise constant controls $a^*_h$ (see §1) yield the value of problem (P) with any prescribed accuracy. Their limiting behaviour as $h \to 0^+$ is easily understood in the framework of relaxed controls in the sense of L.C. Young [53] (see also Lee-Markus [37], Warga [52] as general references on this subject). Namely, $a^*_h$ converge weakly to an optimal control for the relaxed version of (P). The result relies on the fact that the Hamiltonian is left unchanged by relaxation (see Capuzzo Dolcetta-Ishii [14] for details and also Mascolo-Migliaccio [45] for other relaxation results in optimal control).
3. Discretization in the state variable and computational methods.
In order to reduce equation (B$_h$) to a finite dimensional problem we need a discretization in the state variable $x$. To be able to build a grid in the state space, let us assume the existence of an open bounded subset $\Omega$ of $\mathbb{R}^n$ which is invariant for the dynamics (1.1). A triangulation of $\Omega$ into a finite number $P$ of simplices $S_j$ can be constructed (Falcone [23]) so that $\Omega_k = \bigcup_j S_j$ is invariant with respect to the discretized trajectories, i.e.

$$x + h\, b(x,a) \in \Omega_k \quad \text{for all } x \in \Omega_k,\ a \in A, \tag{3.1}$$

where $k = \max_j \operatorname{diam}(S_j)$. We denote by $x_i$, $i = 1, \dots, N$, a generic node of that grid and we replace (B$_h$) by the following system of $N$ equations

$$u(x_i) + \sup_{a \in A} \big[ -\beta\, u(x_i + h\, b(x_i,a)) - h f(x_i,a) \big] = 0, \quad i = 1, \dots, N, \tag{B$_h^k$}$$

which corresponds to the above outlined discretization in the space variable. We shall briefly report in the sequel on some computational techniques available to solve this system of equations.
Remark 3.1. Notice that, due to the fact that we are dealing with the infinite horizon problem, the discretization in time (1.3), (1.4) and (1.5) is not sufficient in itself to compute an approximate solution of (B). In fact, even if in principle the value of $u(x)$ can be obtained by (B$_h$) from the previous knowledge of $u$ at all points $x + h\, b(x,a)$, i.e. the points which follow $x$ in the discretization scheme (1.3), the classical backward scheme of the dynamic programming computational procedure (Larson [35]) cannot be applied since time varies in $[0, +\infty[$. That scheme is only suitable for finite horizon problems where a terminal condition is given.
Remark 3.2. Other numerical methods which can be applied directly to (B) do not lead to the system (B$_h^k$) (see also Remark 3.5). Let us just mention the finite element approach used by Gonzales-Rofman [28] to solve the Bellman equation related to a stopping time problem with continuous and impulsive controls. The procedure generates a monotone non-decreasing sequence and a convergence result is obtained by means of a discrete maximum principle (see also Menaldi-Rofman [47] where the convergence to the viscosity solution is proved).
Successive approximation (Bellman [7], Bellman-Dreyfus [8]).
Perhaps the most classical technique to compute a solution of (B$_h^k$) is to apply the Picard method of successive approximations. Starting from any assigned initial guess $u^0$, one can compute $u^0(x_i + h\, b(x_i,a))$ by a simple interpolation on the nodes of the grid and then define the following recursive sequence

$$u^{n+1}(x_i) = [T_h(u^n)]_i = \min_{a \in A} \big[ \beta\, u^n(x_i + h\, b(x_i,a)) + h f(x_i,a) \big], \quad i = 1, \dots, N.$$

Since $T_h$ is a contractive map (see Theorem 3.1 below), $u^n$ will converge to a limit as $n$ tends to $+\infty$.
In spite of its simplicity this technique has several computational disadvantages which are not offset by the fact that convergence is assured for any initial choice $u^0$. In order to illustrate this point, assume that there are $M$ admissible controls. Since the point $x_i + h\, b(x_i,a)$ could be anywhere in $\Omega_k$, the computation of $u^{n+1}$ would require keeping the $N$ values of $u^n$ at the nodes in the central memory and comparing $M$ different values to find the value of $u^{n+1}$ at each node: this means that $N \times M$ comparisons are needed at each iteration. However, using state increment dynamic programming (see Larson [35]) it is possible to reduce considerably the huge demand for memory allocation. Larson's technique consists essentially in the introduction of a compatibility condition between the step in time $h$ and the spatial step $k$. By defining $M_b = \max |b(x_i,a)|$, $h_1 = k/M_b$ and choosing $h_2$ such that the invariance condition (3.1) is verified, the choice $h = \min(h_1, h_2)$ guarantees that the points $x_i + h\, b(x_i,a)$ belong to some simplex having $x_i$ among its nodes. In that way only the values of $u^n$ at neighbouring nodes are needed to compute $u^{n+1}$ at $x_i$.
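The compatibility condition between $h$ and $k$ can be checked numerically on a uniform one-dimensional grid. The sketch below is only an illustration under assumed data: the function name `larson_time_step`, the dynamics $b(x,a) = a - x$ on $[-1,1]$ with $A = \{-1,0,1\}$, and the value $h_2 = 1$ (for these particular dynamics the interval is invariant for any $h \le 1$) are all hypothetical. It assumes $b$ does not vanish identically on the grid.

```python
def larson_time_step(nodes, controls, b, k, h2):
    """Return h = min(h1, h2), with h1 = k / M_b and M_b = max |b(x_i, a)|."""
    m_b = max(abs(b(x, a)) for x in nodes for a in controls)
    h1 = k / m_b
    return min(h1, h2)


# Hypothetical data (illustration only).
k = 0.1
nodes = [-1.0 + k * i for i in range(21)]
controls = (-1.0, 0.0, 1.0)
b = lambda x, a: a - x

h = larson_time_step(nodes, controls, b, k, h2=1.0)
# Largest displacement |h b(x_i, a)|: it does not exceed the space step k,
# so x_i + h b(x_i, a) falls in a cell adjacent to x_i.
max_jump = max(abs(h * b(x, a)) for x in nodes for a in controls)
```

In this toy case $M_b = 2$, so $h_1 = 0.05$ is the binding constraint.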
The second disadvantage is that the velocity of convergence of this algorithm becomes very poor when a great accuracy is requested, due to the fact that the contraction coefficient of $T_h$ is $\beta = 1 - \lambda h$, which is close to 1 for $h$ close to 0. We shall see in the section devoted to finite difference schemes an acceleration technique which permits one to overcome this difficulty. This acceleration technique is essentially based on the monotonicity of $T_h$, which guarantees a monotone convergence of $u^n$ for an appropriate choice of $u^0$.
Approximation in policy space ([8], Howard [29]).
A different computational procedure leading to monotone convergence is the Bellman-Howard approximation in policy space. The idea here is to fix an initial guess for the policy rather than for the value function (as in the successive approximation procedure). Assuming $a^0$ to be such an initial guess, we can compute the related return function by solving the equations

$$u(x_i) = \beta\, u(x_i + h\, b(x_i, a^0_i)) + h f(x_i, a^0_i), \quad i = 1, \dots, N. \tag{3.4}$$

This can be done starting from an assigned $u^{0,0}$ and generating the recursive sequence

$$u^{0,n+1}(x_i) = \beta\, u^{0,n}(x_i + h\, b(x_i, a^0_i)) + h f(x_i, a^0_i), \tag{3.5}$$

which, under appropriate assumptions, will converge:

$$u^{0,n} \to u^0 \quad \text{as } n \to +\infty. \tag{3.6}$$

Then one looks for a new policy $a^1$ such that

$$\beta\, u^0(x_i + h\, b(x_i, a^1_i)) + h f(x_i, a^1_i) = \min_{a \in A} \big[ \beta\, u^0(x_i + h\, b(x_i,a)) + h f(x_i,a) \big] \tag{3.7}$$

and replaces $a^0$ with $a^1$ in (3.4) to compute a new return function $u^1$ by means of recursion. Notice that $u^1 \le u^0$. Continuing in this way, we obtain a monotone decreasing sequence $u^n$ and the procedure stops when $a^{n+1} = a^n$. Since for any choice of the policy we obtain a return function which is greater than or equal to the value function $u$, we have $u \le u^{n+1} \le u^n$, so the approximation in policy space yields monotone convergence.

This procedure appears to have more appealing properties from the computational point of view. In fact, Kalaba [31] has shown that it is equivalent to the Newton-Kantorovich iteration procedure applied to the functional equation of dynamic programming and Puterman-Brumelle [48] have given sufficient conditions for the rate of convergence to be either superlinear or quadratic.
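The alternation between computing a return function for a fixed policy and improving the policy can be sketched as follows. The block is a minimal illustration under assumptions that are not in the paper: the one-dimensional data $b(x,a) = a - x$, $f(x,a) = x^2$, $A = \{-1,0,1\}$, $\lambda = 1$, the helper names, and the small threshold `eps` used to keep the current control on near-ties (a common safeguard against cycling caused by rounding).

```python
import math

# Hypothetical 1D test problem (illustration only).
LAM, H, N = 1.0, 0.1, 21
BETA = 1.0 - LAM * H
CONTROLS = (-1.0, 0.0, 1.0)
K = 2.0 / (N - 1)
NODES = [-1.0 + K * i for i in range(N)]


def interp(u, x):
    """Piecewise linear interpolation of the nodal values u at x in [-1, 1]."""
    i = max(0, min(N - 2, int(math.floor((x + 1.0) / K))))
    w = min(1.0, max(0.0, (x - NODES[i]) / K))
    return (1.0 - w) * u[i] + w * u[i + 1]


def q_value(u, x, a):
    """beta * u(x + h b(x, a)) + h f(x, a) for b = a - x, f = x^2."""
    return BETA * interp(u, x + H * (a - x)) + H * x * x


def evaluate_policy(policy, u, tol=1e-13, max_iter=100_000):
    """Inner recursion for a fixed policy: u_i <- q_value(u, x_i, policy_i)."""
    for _ in range(max_iter):
        u_next = [q_value(u, x, a) for x, a in zip(NODES, policy)]
        gap = max(abs(p - q) for p, q in zip(u_next, u))
        u = u_next
        if gap < tol:
            break
    return u


def policy_iteration(policy, max_outer=100, eps=1e-10):
    u = evaluate_policy(policy, [0.0] * N)
    history = [u]
    for _ in range(max_outer):
        new_policy = []
        for x, a_old in zip(NODES, policy):
            best = min(CONTROLS, key=lambda a: q_value(u, x, a))
            # Switch control only for a significant improvement.
            improved = q_value(u, x, best) < q_value(u, x, a_old) - eps
            new_policy.append(best if improved else a_old)
        if new_policy == policy:      # policy is stable: stop
            break
        policy = new_policy
        u = evaluate_policy(policy, u)
        history.append(u)
    return u, policy, history


u_pi, policy_pi, history = policy_iteration([1.0] * N)
```

The list `history` stores the successive return functions, so their monotone decrease can be inspected directly.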
Remark 3.3. Assume that there are $M$ admissible controls. Even if this procedure does not require $M$ comparisons to compute $u^{j,n}(x_i)$ in (3.5), they are required in (3.7). Thus the cost of this procedure in terms of computational requirements is similar to that of the method of successive approximation and depends mainly on the velocity of convergence in (3.6). Larson's technique can also be applied to this scheme.

Finite difference approximation.
We look for a solution of (3.3) in the following space of piecewise affine functions,

$$W = \{\, w \in C(\overline{\Omega}_k) : Dw \text{ is constant in } S_j,\ j = 1, \dots, P \,\}.$$

The following theorem holds (see Falcone [23]).

Theorem 3.1 Under the same assumptions of Theorem 2.1, for any $h \in\, ]0, 1/\lambda[$ satisfying (3.1) there exists a unique solution $v_h^k$ of (3.3) in $W$ and the following estimate holds: (3.9) […]

Sketch of the proof. Due to the invariance assumption (3.1) any point $x_i + h\, b(x_i,a)$ belongs to $\Omega_k$ and can be written as a convex combination of the nodes, with coefficients $\lambda_{ij}(a)$. Since we look for a solution in $W$, equation (3.3) is equivalent to the following fixed point problem in $\mathbb{R}^N$

$$U = T_h(U), \tag{3.10}$$

where $[T_h(U)]_i = \min_{a \in A} [\beta A(a) U + h F(a)]_i$ and $A(a) = [\lambda_{ij}(a)]$ is a nonnegative $N \times N$ matrix such that the sum of the elements on each row is equal to 1, $U$ and $F$ are vectors of $\mathbb{R}^N$. By the properties of $A$ it follows that

$$\| T_h(U) - T_h(V) \|_\infty \le \beta\, \| U - V \|_\infty,$$

therefore $T_h$ is a contracting operator in $\mathbb{R}^N$ and (3.10) has a unique solution $U^*$. Estimate (3.9) can be derived combining the estimates on $\sup |v_h(x) - v(x)|$ obtained in the preceding section with the Lipschitz continuity of $v_h$ and the fact that $U^* = (v_h^k(x_1), \dots, v_h^k(x_N))$.
Even if the preceding theorem establishes the convergence of the recursive sequence $U^{n+1} = T_h(U^n)$ starting from any initial guess $U^0$, special choices of $U^0$ are more appropriate for numerical purposes in order to obtain monotone convergence and to accelerate convergence. The acceleration technique is crucial to overcome the obstacle constituted by the fact that convergence is very slow whenever a great accuracy is requested, as we explained in the section devoted to successive approximation.

Let us consider the set of subsolutions for the fixed point problem (3.10), i.e. the set $\mathcal{U} = \{ U : U \le T_h(U) \}$. Since $U \le V$ implies $T_h(U) \le T_h(V)$, the recursive sequence $U^n$ is monotone increasing and remains in $\mathcal{U}$ whenever $U^0$ is taken in $\mathcal{U}$. This set is closed and convex, by the definition of $T_h$, and $U^*$ is its maximal element since it is the limit of any (monotone) sequence starting in $\mathcal{U}$. We can then construct the following accelerated algorithm:
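Step 1: take $U^0$ in the set of subsolutions $\mathcal{U}$ and compute $U^{1,0} = T_h(U^0)$;
Step 2: compute $U^1 = U^0 + \bar{\alpha}\,(U^{1,0} - U^0)$, where
$$\bar{\alpha} = \max \{\, \alpha \in \mathbb{R}_+ : U^0 + \alpha\,(U^{1,0} - U^0) \in \mathcal{U} \,\}; \tag{3.12}$$
Step 3: replace $U^0$ with $U^1$ and go back to Step 1.

The convergence of this algorithm to $U^*$ is guaranteed by the convergence and monotonicity properties of $U^n$, since $U^n \le T_h(U^n) \le U^{n+1} \le U^*$.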
The first interesting feature of this acceleration method is that its velocity of convergence is not strictly related to the magnitude of $\beta$, since $T_h$ is used only to find a direction of displacement. In fact several numerical tests have shown that a decrease of $h$ does not always imply an increase in the number of iterations (for a detailed discussion of some numerical tests as well as for the practical implementation of the algorithm see Falcone [24] and [25]). The second appealing property is that, once a direction of displacement has been computed, the algorithm proceeds in Step 2 by a one-dimensional constrained optimization problem to recover the next approximation. Finally, it is quite interesting to compare the actual acceleration parameter $\bar{\alpha}_n$ with 1, which is the parameter value corresponding to standard successive approximation, in order to have an empirical estimate of the velocity of convergence. Numerical tests have shown that $\bar{\alpha}_n \approx h^{-1}$, i.e. $\bar{\alpha}_n \approx 100$ when $h = 10^{-2}$.
Remark 3.4. Assume that there are $M$ admissible controls. The cost for finding a direction of displacement is exactly the same as in successive approximation, and the search for $U^1$ at Step 2 would require a certain (a priori unknown) number of additional comparisons to check that the constraint in (3.12) is verified. Nonetheless, the total amount of operations needed in order to find a fixed point with the above acceleration technique is much lower since the total number of iterations is considerably reduced, e.g. from order $10^3$ to order 10 when $h = 10^{-2}$. The application of Larson's technique to this scheme yields a particular structure of the matrix $A(a)$, namely it becomes a tridiagonal matrix for scalar control problems.
Remark 3.5. Other finite difference schemes have been considered to solve the Bellman equation (B). In particular Crandall-Lions [20] have studied some finite difference schemes which can be regarded as an adaptation of classical schemes for conservation laws, such as the Lax-Friedrichs scheme. It is interesting to point out here that they obtain an estimate of order 1/2 and monotone convergence as well.

We conclude this section with some final comments and remarks related to the schemes that we have presented here. It is worthwhile to notice that these schemes can be adapted to treat other deterministic control problems such as finite horizon and optimal stopping control problems (in fact, successive approximation and approximation in policy space were originally developed for those problems, see [7], [8], [35]). This can be done by adding some control, say $a'$, to $A$, and writing those problems as an infinite horizon problem by an appropriate definition of $f(x,a')$, $b(x,a')$.
The second, and perhaps more important, remark is that all these schemes provide the approximate value function and, without any additional computation, the related approximate feedback law at least at all nodes of the grid. It can also be proved (see Falcone [23]) that such a feedback law is "quasi optimal" for the original continuous problem (P).
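Reading a feedback law off the nodal values amounts to recording, at each node, a control that attains the minimum in the discrete equation. The sketch below shows this on assumed one-dimensional data that are not from the paper ($b(x,a) = a - x$, $f(x,a) = x^2$, $A = \{-1,0,1\}$, $\lambda = 1$); the names `feedback` and `law` are hypothetical, and ties are broken by taking the first minimizer.

```python
import math

# Hypothetical 1D test problem (illustration only).
LAM, H, N = 1.0, 0.1, 21
BETA = 1.0 - LAM * H
CONTROLS = (-1.0, 0.0, 1.0)
K = 2.0 / (N - 1)
NODES = [-1.0 + K * i for i in range(N)]


def interp(u, x):
    i = max(0, min(N - 2, int(math.floor((x + 1.0) / K))))
    w = min(1.0, max(0.0, (x - NODES[i]) / K))
    return (1.0 - w) * u[i] + w * u[i + 1]


def q_value(u, x, a):
    """beta * u(x + h b(x, a)) + h f(x, a) for b = a - x, f = x^2."""
    return BETA * interp(u, x + H * (a - x)) + H * x * x


def T(u):
    return [min(q_value(u, x, a) for a in CONTROLS) for x in NODES]


def feedback(u):
    """Control attaining the minimum at each node (first minimizer on ties)."""
    return [min(CONTROLS, key=lambda a: q_value(u, x, a)) for x in NODES]


# Approximate the nodal values by plain successive approximation, then read
# the feedback law off the same minimization used in each sweep.
u = [0.0] * N
for _ in range(500):
    u = T(u)
law = feedback(u)
```

In this toy problem the law pushes the state towards the origin from both ends of the interval, which is what the running cost $x^2$ suggests.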
Due to the monotonicity, the acceleration method can be applied also to the approximation in policy space. In this case the fixed point will be the minimal element in the set of supersolutions and the sequence $U^n$ will be monotone decreasing. Moreover, due to the local quadratic convergence of this approximation, it should be quite interesting from a numerical point of view to combine this procedure with the finite difference approximation scheme. In fact, one can first look for a rough approximation of the optimal policy by applying that scheme and then use it as initial guess in (3.4).
References
[1] M. Bardi, A boundary value problem for the minimum time function, to appear in SIAM J. Control and Optimization.
[2] M. Bardi, M. Falcone, An approximation scheme for the minimum time function, preprint, July 1988.
[3] G. Barles, Inéquations quasi-variationnelles du premier ordre et équations d'Hamilton-Jacobi, C. R. Acad. Sc. Paris, t. 296, 1983.
[4] G. Barles, Deterministic impulse control problems, SIAM J. Control and Optimization, vol. 23, 3, 1985.
[5] G. Barles, B. Perthame, Discontinuous solutions of deterministic optimal stopping time problems, to appear in Math. Methods and Num. Anal.
[6] G. Barles, B. Perthame, Exit time problems in optimal control and vanishing viscosity method, preprint.
[7] R. Bellman, Dynamic programming, Princeton University Press, 1957.
[8] R. Bellman, S. Dreyfus, Applied Dynamic Programming, Princeton University Press, 1962.
[9] A. Bensoussan, Contrôle stochastique discret, Cahiers du Ceremade.
[10] A. Bensoussan, W. Runggaldier, An approximation method for stochastic control problems with partial observation of the state - A method for constructing ε-optimal controls, Acta Applicandae Mathematicae, vol. 10, 2, 1987.
[11] D.P. Bertsekas, S.E. Shreve, Stochastic optimal control: the discrete time case, Academic Press, New York, 1978.
[12] I. Capuzzo Dolcetta, On a discrete approximation of the Hamilton-Jacobi equation of dynamic programming, Appl. Math. and Optim. 10, 1983.
[13] I. Capuzzo Dolcetta, L.C. Evans, Optimal switching for ordinary differential equations, SIAM J. Control and Optimization, vol. 22, 1984.
[14] I. Capuzzo Dolcetta, H. Ishii, Approximate solutions of the Bellman equation of deterministic control theory, Appl. Math. and Optim. 11, 1984.
[15] I. Capuzzo Dolcetta, P.L. Lions, Hamilton-Jacobi equations and state-constrained problems, to appear in Transactions of the AMS.
[16] I. Capuzzo Dolcetta, M. Matzeu, On the dynamic programming inequalities associated with the optimal stopping problem in discrete and continuous time, Numer. Funct. Anal. Optim., 3 (4), 1981.
[17] I. Capuzzo Dolcetta, M. Matzeu, J.L. Menaldi, On a system of first order quasi-variational inequalities connected with the optimal switching problem, System and Control Letters, vol. 3, no. 2, 1983.
[18] D.A. Carlson, A. Haurie, Infinite horizon optimal control, Lecture Notes in Economics and Mathematical Systems, 290, Springer Verlag, 1987.
[19] M.G. Crandall, H. Ishii, P.L. Lions, Uniqueness of viscosity solutions of Hamilton-Jacobi equations revisited, J. Math. Soc. Japan, vol. 39, 4, 1987.
[20] M.G. Crandall, P.L. Lions, Two approximations of solutions of Hamilton-Jacobi equations, Mathematics of Computation, vol. 43, n. 167, 1984.
[21] M.G. Crandall, P.L. Lions, Viscosity solutions of Hamilton-Jacobi equations, Trans. A.M.S. 277, 1983.
[22] J. Cullum, An explicit procedure for discretizing continuous optimal control problems, J. Optimization Theory and Applications, 8 (1), 1971.
[23] M. Falcone, A numerical approach to the infinite horizon problem of deterministic control theory, Appl. Math. Optim. 15, 1987.
[24] M. Falcone, Numerical solution of deterministic continuous control problems, Proceedings International Symposium on Numerical Analysis, Madrid, 1985.
[25] M. Falcone, Approximate feedback controls for deterministic control problems, in preparation.
[26] W.H. Fleming, Controlled Markov processes and viscosity solutions of nonlinear evolution equations, Lecture Notes, Scuola Normale Superiore Pisa, to appear.
[27] W.H. Fleming, R. Rishel, Deterministic and stochastic optimal control, Springer Verlag, Berlin, 1975.
[28] R. Gonzales, E. Rofman, On deterministic control problems: an approximation procedure for the optimal cost, part I and II, SIAM J. Control and Optimization, vol. 23, n. 2, 1985.
[29] R.A. Howard, Dynamic programming and Markov processes, Wiley, New York, 1960.
[30] M.M. Hrustalev, Necessary and sufficient optimality conditions in the form of Bellman's equation, Soviet Math. Dokl., 19 (5), 1978.
[31] R. Kalaba, On nonlinear differential equations, the maximum operation and monotone convergence, J. of Math. and Mech., vol. 8, n. 4, 1959.
[32] H.J. Kushner, Probability methods for approximation in stochastic control and for elliptic equations, Academic Press, New York, 1977.
[33] H.J. Kushner, A.J. Kleinman, Accelerated procedures for the solution of discrete Markov control problems, IEEE Trans. Automatic Control, AC-16, 1971.
[34] H.J. Kushner, A.J. Kleinman, Mathematical programming and the control of Markov chains, Int. J. Control, 13, 1971.
[35] R.E. Larson, State increment dynamic programming, American Elsevier, New York, 1967.
[36] R.E. Larson, Korsak, A dynamic programming successive approximation technique with convergence proofs, parts I and II, Automatica, 1969.
[37] E.B. Lee, L. Markus, Foundations of optimal control, J. Wiley, New York, 1967.
[38] P.L. Lions, Generalized solutions of Hamilton-Jacobi equations, Pitman, London, 1982.
[39] P.L. Lions, Optimal control and viscosity solutions, in I. Capuzzo Dolcetta - W.H. Fleming - T. Zolezzi (Eds.), Recent Mathematical Methods in Dynamic Programming, Lecture Notes in Mathematics 1119, Springer Verlag, Berlin, 1985.
[40] P.L. Lions, Recent progress on first order Hamilton-Jacobi equations, in Directions in Partial Differential Equations, Academic Press, 1987.
[41] P.L. Lions, B. Mercier, Approximation numérique des équations de Hamilton-Jacobi-Bellman, R.A.I.R.O. Analyse numérique, vol. 14, n. 4, 1980.
[42] P.L. Lions, P.E. Souganidis, Differential games, optimal control and directional derivatives of viscosity solutions of Bellman-Isaacs equations, to appear in SIAM J. Control and Optimization.
[43] P. Loreti, Approssimazione di soluzioni viscosità dell'equazione di Bellman, Boll. U.M.I. (6), 5-B, 1986.
[44] K. Malanowski, On convergence of finite difference approximations to optimal control problems for systems with control appearing linearly, Archivum Automatyki i Telemechaniki, 24, Zeszyt 2, 1979.
[45] E. Mascolo, L. Migliaccio, Relaxation methods in optimal control, to appear in J. Optim. Th. App.
[46] J.L. Menaldi, Probabilistic view of estimates for finite difference methods, to appear in Mathematicae Notae.
[47] J.L. Menaldi, E. Rofman, An algorithm to compute the viscosity solution of the Hamilton-Jacobi-Bellman equation, in C.I. Byrnes, A. Lindquist (eds.), Theory and Applications of Nonlinear Control Systems, North-Holland, 1986.
[48] M.L. Puterman, S.L. Brumelle, On the convergence of policy iteration in stationary dynamic programming, Math. of Operation Research, vol. 4, n. 1, 1979.
[49] J.P. Quadrat, Existence de solution et algorithme de résolutions numériques de problèmes stochastiques dégénérées ou non, SIAM J. Control and Optimization, vol. 18, 1980.
[50] M.H. Soner, Optimal control with state space constraint, SIAM J. Control and Optimization, vol. 24, 3, 1986.
[51] P.E. Souganidis, Approximation schemes for viscosity solutions of Hamilton-Jacobi equations, Journal of Differential Equations, 57, 1985.
[52] J. Warga, Optimal control of differential and functional equations, Academic