• Aucun résultat trouvé

5 Optimization algorithms

N/A
N/A
Protected

Academic year: 2022

Partager "5 Optimization algorithms"

Copied!
6
0
0

Texte intégral

(1)

5 Optimization algorithms

We present in this chapter some algorithms which allow us to find or approximate the solution of optimization problems.

5.1 Unconstrained optimization

We consider the following minimization problem: find u∈V such thatJ(u) = infv∈VJ(v).

The goal is to find a minimizing sequence (uk) ∈ V (i.e. such that J(uk) →J(u)). Then J(uk)−J(uk+1) should be as large as possible. For anyw∈V,

J(uk+w) =J(uk) + (J0(uk), w) +kwkε(w).

|(J0(uk), w)| ≤ kJ0(uk)k.kwk, with equality ifw=λJ0(uk), λ∈R. Then, based on the first order approximation of J (by neglecting the second and higher order derivatives), J(uk)− J(uk+w) is the largest forwopposite to the gradient:

uk+1=uk−ρkJ0(uk),

withρk ∈R+.

Gradient algorithms consist of following the line of greatest slope, given by the gradient of the cost function.

5.1.1 Gradient algorithm with fixed step

LetV be a Hilbert space. The fixed step gradient algorithm is:

uk+1=uk−ρJ0(uk).

Theorem 5.1 Let J : V →R be a Gˆateaux-differentiable and α-convex function. If J0 is Lipschitz continuous(i.e. ∃M such that kJ0(v1)−J0(v2)k ≤Mkv1−v2k, ∀v1, v2∈V), and if 0 < ρ < 2α

M2, then the fixed step gradient algorithm converges to the minimum: uk →u, and the convergence is geometric (kuk+1−uk ≤γkuk−uk).

Proof

uis given byJ0(u) = 0. Letwk =uk−u. Thenwk+1=wk−ρ(J0(uk)−J0(u)). Then kwk+1k2=kwkk2−2ρhuk−u, J0(uk)−J0(u)i+ρ2kJ0(uk)−J0(u)k2

≤ kwkk2−2ραkuk−uk22M2kuk−uk22kwkk2

withγ2= 1−2ρα+ρ2M2= 1−ρ(2α−ρM2)<1 (quadratic function ofρ, maximum value

= 1 forρ= 0 or 2α/M2). Thenkwkk ≤γkkw0k.

Example of a quadratic functional: J(v) = 12hAv, vi − hb, vi uk+1=uk−ρ(Auk−b).

(2)

5.1.2 Gradient algorithm with optimal step ( J(uk−ρkJ0(uk)) = inf

ρ>0J(uk−ρJ0(uk)), uk+1=uk−ρkJ0(uk)

This is an improvement of the fixed step algorithm, with a one-dimensional minimization problem on the step size.

Theorem 5.2 IfJ is Gˆateaux-differentiable, α-convex, and if J0 is Lipschitz continuous on every bounded subset ofV, then the optimal step gradient algorithm converges.

Proof

i) Assume that J0(uk) 6= 0 (otherwise uk is the minimum and the algorithm has con- verged). Letf(ρ) =J(uk−ρJ0(uk)). Thenf0(ρ) =−hJ0(uk−ρJ0(uk)), J0(uk)i.

2−ρ1)(f02)−f0(ρ1)) =−(ρ2−ρ1)hJ0(uk−ρ2J0(uk)−J0(uk−ρ1J0(uk)), J0(uk)i

≥α(ρ2−ρ1)2kJ0(uk)k2.

Then f(ρ) is α-convex, and has then a unique minimum, defined byf0k) = 0. And then hJ0(uk+1), J0(uk)i = 0 (the successive descent directions are orthogonal). Then hJ0(uk+1), uk+1−uki= 0, and thenJ(uk)−J(uk+1)≥ α2kuk−uk+1k2.

ii) (J(uk)) is a decreasing sequence, bounded below (byJ(u)), and then it is a convergent sequence. ThenJ(uk)−J(uk+1)→0. Then kuk−uk+1k →0.

iii) kJ0(uk)k2=hJ0(uk), J0(uk)−J0(uk+1)i ≤ kJ0(uk)k kJ0(uk)−J0(uk+1)k. ThenkJ0(uk)k ≤ kJ0(uk)−J0(uk+1)k.

iv) As (J(uk)) is a decreasing sequence, (uk) is bounded. Otherwise, ifkukk →+∞, then J(uk)→+∞asJ(uk)≥J(u0) +hJ0(u0), uk−u0i+α2kuk−u0k2.

Moreover,J0 is Lipschitz continuous. Using iii), the Lipschitz continuity on a bounded subset, and ii), J0(uk)→0.

v) αkuk −uk2 ≤ hJ0(uk)−J0(u), uk −ui ≤ kJ0(uk)k kuk −uk. Then kuk −uk ≤

1

αkJ0(uk)k →0.

Note that the last inequality of the proof gives an estimation of the error.

Application to quadratic functionals:

J(v) =1

2hAv, vi − hb, vi,

where A is a symmetric positive definite matrix, andb ∈ Rn. Then J0(v) = Av−b. The optimal step ρk is characterized byhJ0(uk), J0(uk+1)i= 0:

hAuk−b, A(uk−ρk(Auk−b))−bi= 0 Letwk =Auk−b. Thenρk = kwkk2

hAwk, wki. An iteration of the optimal step gradient algorithm consists in computing wkk and thenuk+1.

It can be difficult to find the optimalρk. An easy approximation consists in approximating f(ρ) = J(uk−ρJ0(uk)) by a parabola ˜f(ρ) (quadratic function), uniquely determined by f˜(0) := f(0) = J(uk), ˜f0(0) := f0(0) = −kJ0(uk)k2, and a third value (e.g. f˜(ρk−1) :=

f(ρk−1)).

(3)

5.1.3 Conjugate gradient

The main issue of the optimal step (or fixed step) gradient algorithm is that it usually does not converge in a finite number of iterations.

LetJ be a quadratic functional defined onRn: J(v) =1

2hAv, vi − hb, vi, whereA is symmetric definite positive.

Conjugate gradient algorithm: Assume thatu1, . . . ,ukhave been computed. We assume that J0(ui) 6= 0, ∀i ≤ k, otherwise the algorithm has converged. Then let uk+1 be the minimum ofJ on the affine subset containinguk and spanned by (J0(ui))0≤i≤k:

J(uk+1) = inf

α∈Rk+1J uk+

k

X

i=0

αiJ0(ui)

! ,

whereα= (α0, . . . , αk). Note thatJ has a unique minimum on this subset.

Remark: The conjugate gradient (inf overα∈Rk+1) is better than the optimal step gradi- ent (inf over (0, . . . ,0, αk)).

Remark: The successive gradientsJ0(ui) are mutually orthogonal.

Proof

The derivative is equal to zero at the optimum: hJ0(uk+1), J0(ui)i= 0,∀0≤i≤k.

Corollary 5.1 The gradients (J0(ui))0≤i≤k are linearly independent.

Corollary 5.2 The conjugate gradient algorithm converges in at mostniterations.

Definition 5.1 Two non-zero vectorspi andpj areconjugate with respect to A (A being a symmetric positive definite matrix) ifhpi, Apji= 0.

Proposition 5.1 Conjugate vectors are linearly independent.

Proof

Assume that there exist (λ1, . . . , λk) such that

k

X

i=0

λipi= 0. Then 0 =hA(Pk

i=0λipi), pji= λjhApj, pji. Aspj6= 0 andA is symmetric positive definite,λj= 0.

Letdk=uk+1−uk=

k

X

i=0

δkiJ0(ui) be the descent direction at iterationk.

Proposition 5.2 The descent directions (di)are mutually conjugate.

Proof

J0(uk+1) = J0(uk +dk) = A(uk +dk)−b = J0(uk) +Adk. Then for 0 ≤ i < j ≤ k, 0 =hJ0(uj+1), J0(ui)i=hJ0(uj), J0(ui)i+hAdj, J0(ui)i=hAdj, J0(ui)i.

ThenhAdj, dii=hAdj,Pi

l=0δilJ0(ul)i= 0,∀0≤i < j≤k.

(4)

Computation of the direction dk using Gram-Schmidt: the first direction is d0 = J0(u0). Then, as the direction lives in the subspace spanned by (J0(ui)), the second direction can be set to

d1=J0(u1) +β0d0.

We then use the conjugation constraint to findβ0: 0 =hd1, Ad0i=hJ0(u1), Ad0i+β0hd0, Ad0i and then

β0=−hJ0(u1), Ad0i hd0, Ad0i . Using a similar process, at iterationk,

dk+1=J0(uk+1) +

k

X

i=0

βidi

as the successive directions span the same vector subspace as the successive derivatives. Then, the conjugation constraint gives: 0 = hdk+1, Adki=hJ0(uk+1, Adki+βkhdk, Adki+ 0, and thenβk=−hJ0(uhdk+1),Adki

k,Adki .

Moreover, for i < k, 0 = hdk+1, Adii = hJ0(uk+1), Adii+βihdi, Adii. But J0(ui+1) = Aui+1−b=Aui−b+A(ui+1−ui) =J0(ui)−Aρidi.

Then hJ0(uk+1), Adii= ρ1

ihJ0(uk+1), J0(ui+1−J0(ui)i= 0 as the gradients are orthogonal.

Thenβi= 0 fori < k. Finally,

dk+1=J0(uk+1) +βkdk, with βk =−hJ0(uk+1), Adki hdk, Adki . Computation of the step size ρk:

The iterate is defined byuk+1=ukkdk. As the derivative is equal to zero at the optimum (with respect toρ),hJ0(uk+1), dki= 0. Then, asJ0(uk+1) =J0(ukkdk) =J0(uk) +Aρkdk, the step size is given by

ρk=−hJ0(uk), dki hAdk, dki . Algorithm:

• Choose any u0; setd0=J0(u0);

• uk+1=uk−ρkdk, withρk= hJ0(uk), dki hAdk, dki ;

• dk+1=J0(uk+1) +βkdk, with βk =−hJ0(uk+1), Adki hdk, Adki .

This is one of the best method for solving linear systems Ax = b, where A is symmetric positive definite. Note that this algorithm can be extended to non quadratic functionals.

5.2 Constrained optimization

5.2.1 Gradient algorithm with projection We consider the following optimization problem:

v∈Kinf J(v)

whereKis a closed convex subset ofV (a Hilbert space), andJis Gˆateaux-differentiable and α-convex. The minimumuofJ overKis characterized by Euler’s inequality: hJ0(u), v−ui ≥

(5)

0,∀v∈K. Then forρ >0,hu−u+ρJ0(u), v−ui ≥0 andh(u−ρJ0(u))−u, v−ui ≤0. As K is convex, then

u=P rojK(u−ρJ0(u)).

The gradient algorithm with projection is the following:

uk+1=P rojK(uk−ρJ0(uk)), ρ >0.

Note that ifK=V, this algorithm is exactly the fixed step gradient algorithm.

Theorem 5.3 If K is a closed non-empty convex subset of V, if J : V → R is Gˆateaux- differentiable and α-convex, ifJ0 is Lipschitz continuous onV (letM be the Lipschitz con- stant), and if

0< a≤ρ≤b < 2α M2, then the gradient algorithm with projection converges, and

kuk−uk ≤βkku0−uk with β <1.

Proof

The projection onK is Lipschitz continuous, with a Lipschitz constant of 1. Thenkuk+1− uk ≤ k(uk −u)−ρ(J0(uk)−J0(u))k. Then kuk+1−uk2 ≤ k(uk −u)k22k(J0(uk)− J0(u))k2−2ρhuk−u, J0(uk)−J0(u)i. Using the Lipschitz continuity ofJ0andα-convexity ofJ, kuk+1−uk2≤ k(uk−u)k22M2k(uk−u)k2−2ραk(uk−u)k2= (1−2ρα+ρ2M2)k(uk−u)k2. f(ρ) := (1−2ρα+ρ2M2) isα-convex, and reaches its maximum value 1 forρ= 0 andρ=M2. Then for 0< a≤ρ≤b < M2,f(ρ)≤β2<1.

5.2.2 Identification of a saddle-point: Uzawa’s algorithm

As the projection operator is not explicity known in general, the projection at each itera- tion can be very difficult. The idea is then to consider the Lagrangian associated to the constrainted minimization problem, and to identify the saddle-points of the Lagrangian.

We consider the convex minimization problem:

inf

F(v)≤0J(v),

where J :V →R is convex, and F : V →Rm is convex. We assume that the hypotheses of Kuhn-Tucker theorem are satisfied. Let L(v, q) =J(v) +hq, F(v)i. Then, by definition, (u, p) is a saddle-point if

∀q∈Rm+, L(u, q)≤ L(u, p)≤ L(v, p), ∀v∈V.

We deduce thathp−q, F(u)i ≥0 for allq∈Rm+. This is equivalent tohp−q, p−(p+ρF(u))i ≤ 0, withρ >0. Then

p=P rojRm

+(p+ρF(u)), ∀ρ >0.

Uzawa’s algorithm is then the following:

• one choosesp0∈Rm+;

• pn being known, computeun solution of the (unconstrained) optimization problem L(un, pn) = inf

v∈VL(v, pn), ∀v∈V;

(6)

• compute the next Lagrange multiplier:

pn+1=P rojRm

+(pnnF(un)), ρn>0.

Note that the minimization problem is now unconstrained, and the Lagrange multiplier is given by a projection onRm+, which is straightforward.

Theorem 5.4 Under the previous hypotheses, and if 0 < a ≤ ρn ≤ b < M2, then the algorithm converges: un →uinV.

Note that the convergence of (pn) is not ensured.

Proof

un anduare characterized by the following inequalities (see proposition 2.15):

hJ0(un), v−uni+hpn, F(v)−F(un)i ≥0, ∀v∈V, hJ0(u), v−ui+hpn, F(v)−F(u)i ≥0, ∀v∈V.

If we denote byrn=pn−p, by choosing respectivelyv=uandv=un,

−hJ0(un)−J0(u), un−ui − hrn, F(un)−F(u)i ≥0.

hrn, F(vn)−F(u)i ≤ −hJ0(un)−J0(u), un−ui ≤ −αkun−uk2

and krn+1k ≤ krnn(F(un)−F(u))k (by Lipschitz-continuity of the projection). Then krn+1k2≤ krnk2+ 2ρnhrn, F(un)−F(u)i+ρ2nkF(un)−F(u)k2≤ krnk2−2ρnαkun−uk2+ ρ2nM2kun−uk2.

As 0 < a ≤ ρn ≤ b < M2 ⇔ 2αρn−ρ2nM2 ≥ β > 0, krn+1k2 ≤ krnk2−βkun−uk2. Then the sequence (krnk) is decreasing, and bounded below, and then it converges. As

0≤βkun−uk2≤ krnk2− krn+1k2,kun−uk →0.

Remark: Uzawa’s algorithm has a dual interpretation: it is exactly the gradient algorithm with fixed step and projection applied to the dual problem.

Références

Documents relatifs

The present paper focuses on both points by proposing a single association matrix, the dual pignistic matrix which gathers all the association information (targets ⇒ tracks and tracks

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des

Few apoptotic cells were observed after PBS (Fig. 2A) or PF-tafluprost (Fig. 2B) incubations (Fig. The number of apoptotic cells increased after the BAC-containing antiglaucoma

There are some primary problems or difficulties raised by this approach. First, the system should have some way to decide where to start reasoning process that

(eds.) (1998).- Le stratotype historique de l'Aptien inférieur (Bédoulien) dans la région de Cassis-La Bédoule (S.E. France).- Géologie. Méditerranéenne,

Holotype E3b : Provenant du niveau 11,0, un petit spécimen a été choisi comme holotype et coloré ; il est caractérisé par une collerette aborale très développée mais la face

Diurnal variations of upper and middle tropospheric humid- ity have been examined in conjunction with diurnal varia- tions of precipitation intensity and high clouds (i.e.,