• Aucun résultat trouvé

The optimal decay for the solution of the Monotone Inclusion associated to FISTA for b<=3 is 2b/3

N/A
N/A
Protected

Academic year: 2021

Partager "The optimal decay for the solution of the Monotone Inclusion associated to FISTA for b<=3 is 2b/3"

Copied!
9
0
0

Texte intégral

(1)

HAL Id: hal-01565933

https://hal.archives-ouvertes.fr/hal-01565933

Preprint submitted on 20 Jul 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de

The optimal decay for the solution of the Monotone Inclusion associated to FISTA for b

Jf Aujol, Charles Dossal

To cite this version:

Jf Aujol, Charles Dossal. The optimal decay for the solution of the Monotone Inclusion associated to FISTA for b

(2)

The optimal decay for the solution of the Monotone Inclusion associated to FISTA for

b 6 3 is

2b3

JF Aujol and Ch. Dossal IMB, Univ. Bordeaux

July 2017

Abstract

It was recently proved that the decay of the solution of the ODE associ- ated to the Nesterov Fast Gradient Algorithm with a parameterb63 was 0( 1

t2b3 ). In this note we prove that this decay is achieved for the solution of the associated monotone inclusion for a specific function.

1 Setting

We study the following inclusion

¨

x(t) + b

tx(t) +˙ ∂F(x(t))30 (1)

for t > t0 > 0, where F is the absolute value defined by F(x) = |x| and b (0,3). This monotone inclusion for a general convex function F is associated to the optimization Algorithm FISTA of Beck et al. [4] and the Fast Gradient Descend Method of Nesterov [6] as it was remarked by Su et al in [7].

It was proved in [2] and [3] that for

¨ x(t) + b

tx(t) +˙ ∇F(x(t)) = 0 (2)

Jean-Francois.Aujol@math.u-bordeaux.fr

Charles.Dossal@math.u-bordeaux.fr

(3)

whereF is a differentiable convex function and b(0,3) that if x is a minimizer of F,

F(x(t))F(x) =O 1

t2b3

(3) In this note, the main result is the following: we prove that this decay is actually optimal in the sense that it is achieved for (1) with F(x) =|x|. More precisely Proposition 1. Let b 63, if x(t0)6= 0, for any solution x of (1)

1. it exists K1 such that for any t>t0,

|x(t)|6 K1

t2b3 , (4)

2. it exists K2 >0 such that for any T > 0, there exists t > T such that

|x(t)|> K2

t2b3 (5)

Notice that for the case of the ODE (2), it was proved in [3] that the rate of (3) is asymptotically optimal by considering F(x) = |x|(1+β) with β > 0. The ideas presented in this note are an adaptation to the differential inclusion case of the results presented in [3]. In the first part, preliminaries results are given and in a second part the Propostion 1 is proved.

2 Preliminaries

We consider the inclusion

¨

x(t) + b

tx(t) +˙ ∂F(x(t))30 (6)

with F(x) =|x|, and with initial condition x(t0) =x0, ˙x(t0) =v0. Notice that F admits a unique minimizer x = 0.

LetI = [t0,+∞). In [1], it is proven that there exists a weak solution xof (1), with x in W2,∞(I;R)C1,λ(K;R), where λ (0,1) and K is a compact subset of I, such that x(t0) =x0 and ˙x(t0) =v0. In particular, ˙x is continuous and ¨x is bounded.

We recall that ifF is a convex one homogeneous function, then we haveF(x) = zx for any z in∂F(x).

(4)

We also give the following lemma that will prove useful. IfF is a proper lower semi continuous convex function, then there exists z(t) in ∂F(x(t)) such that for almost every t (see Lemma 3.3 in [5]) for more details).

d

dtF(x(t)) =z(t) ˙x(t) (7)

Moreover, t 7→ F(x(t)) is an absolutely continuous function. We recall here that an absolutely continuous function is differentiable almost everywhere. Notice that in particular, if the derivative of an absolutely continuous function is non positive, then this function is non decreasing. See Lemma 4.1 in [1] for more details. These properties will be used intensively in the rest of this note.

Notations IfλR and x a solution of (1) we denote : u(t) :=t|x(t)|+ 1

2t|λ(x(t)) +tx(t)|˙ 2 (8) and

v(t) := 1

2t|x(t)|2 (9)

The analysis is based on Lyapunov functions. Following Su, Boyd and Cand`es [7], for any (λ, ξ)R2 we define

Eλ,ξ(t) = t2|x(t)|+ 1

2|λ(x(t)) +tx(t)|˙ 2+ ξ

2|x(t)|2. (10) One can observe that this energy can be defined using the functions uand v

E(t) =tu(t) +ξtv(t). (11)

As usual, we need to compute the derivative Eλ,ξ0 . We will need two different expressions of the derivative.

Lemma 1.

Eλ,ξ0 (t) =(2λ)t|x(t)|+t(λ+ 1b)|x(t)|˙ 2 (12) + (λ(λ+ 1b) +ξ) ˙x(t)x(t) (13) for almost every t.

Proof. We differentiate and we use the recalls of the preliminary section to get the following computations that hold for almost everyt :

Eλ,ξ0 (t) =2t|x(t)|+t2z(t) ˙x(t) (14)

+λx(t) +˙ tx(t) + ˙¨ x(t)λ(x(t)) +tx(t) +˙ ξx(t)x(t)˙ (15)

(5)

with z(t)∂F(x(t)). We now make use of the differential inclusion:

Eλ,ξ0 (t) =2t|x(t)|+t2z(t) ˙x(t) (16)

+λx(t) +˙ t

−z b tx(t)˙

+ ˙x(t)λ(x(t)) +tx(t) +˙ ξx(t)x(t)˙ (17)

Eλ,ξ0 (t) =2t|x(t)|+t2z(t) ˙x(t) (18)

+ (λ+ 1b) ˙x(t)tzλ(x(t)) +tx(t) +˙ ξx(t)x(t)˙ (19) Eλ,ξ0 (t) =2t|x(t)|+x(t)(λ+ 1b) ˙x(t) (20)

λx(t)z+ (λ+ 1b)t|x(t)|˙ 2+ξx(t)x(t)˙ (21)

Eλ,ξ0 (t) =(2λ)t|x(t)|+t(λ+ 1b)|x(t)|˙ 2 (22) + (λ(λ+ 1b) +ξ) ˙x(t)x(t) (23) (24)

Lemma 2.

Eλ,ξ0 (t) =(2λ)t|x(t)|+ (ξλ(λ+ 1b)) ˙x(t)x(t) (25)

λ2+ 1b)

t |x(t)|2 (26)

+ λ+ 1b

t |λ(x(t)) +tx(t)|˙ 2 (27) (28) Proof. We start from the result of Lemma 1. Observing that

1

t|λ(x(t)) +tx(t)|˙ 2 =t|x(t)|˙ 2+ 2λx(t)x(t) +˙ λ2

t |x(t)|2 (29) we can write for almost every t

Eλ,ξ0 (t) =(2λ)t|x(t)|+ (λ(λ+ 1b) +ξ) ˙x(t)x(t) (30)

2λ(λ+ 1b) ˙x(t)x(t)+ 1b)λ2

t |x(t)|2 (31) + (λ+ 1b)1

t|λ(x(t)) +tx(t)|˙ 2 (32) (33)

(6)

Notations Now, we chooseλ= 2b3 and ξ=λ(λ+ 1b) and we denoteEλ,ξ=E. For these parameters one can observe that λ+ 1b= 13b >0. We can define

c:= 2 2b

3 (34)

Notice that c >0. We introduce the new Lyapunov energy:

H(t) :=t−cE(t). (35)

Lemma 3. We have for almost every t:

E0(t) = cu(t)dv(t). (36)

where d >0 is defined as:

d:=

2b 3

2 1 b

3

(37) Proof.

E0(t) =(2 2b

3)t|x(t)|+ (1 b 3)1

t|2b

3 (x(t)) +tx(t)|˙ 2 (38)

2b

3 2

(1 b 3)1

t|x(t)|2 (39)

Using

u(t) = t|x(t)|+ 1 2t|2b

3(x(t)x) +tx(t)|˙ 2, (40) the definition of v(t),cand d, we get the result.

Proposition 2. If b(0,3), if λ= 2b3 and ξ= 2b9(3b), then for almost every t it holds

H0(t) = −2ξt−cv(t)60. (41)

and H is an absolutely continuous function.

We can observe that forb= 3, we have,c= 0, ξ= 0, E =HandH is constant.

Proof. H is an absolutely continuous function as a sum of absolutely continuous functions.

We have for almost every t

H0(t) =t−c−1(tE0(t)cE(t)). (42) Using Lemma 3, we deduce that for almost every t:

H0(t) = −2ξt−cv(t)60. (43)

(7)

Lemma 4. If x(t0)>0 then limt→+∞H(t) = ` >0.

Proof. The functionHis non-negative and from Proposition 2,His non-increasing, thus H(t) converges to a limit ` >0.

Moreover for all t >t0 we have

t−cE(t) =H(t)6H(t0) (44)

which implies that

|x(t)|6 H(t0)

t2−c = H(t0)

t2b3 (45)

We recall that c:= 2 2b3 and then c[0,2) when b (0,3]. From (43)

H0(t) =−ξt−c−1|x(t)|2 =−ξtc−5t2−c|x(t)|t2−c|x(t)|. (46) and from the definition of H we have H(t)>t2−c|x(t)| which implies that

H0(t)>−ξtc−5t2−c|x(t)|H(t) (47) Unsing the decay of H we have H(t0)>t2−c|x(t)| and thus

H0(t)>−ξtc−5H(t0)H(t). (48) If we denote ϕthe function defined from [t0,+∞] to R by

ϕ(t) := ξH(t0)

c4tc−4 (49)

we have

ϕ0(t) =ξH(t0)tc−5 (50)

and definingG by

G(t) :=H(t)eϕ(t) (51)

which is an absolutely continuous function as a product of an absolutely continuous function by a continuous and bounded function, we have

G0(t) :=eϕ(t)(H0(t) +ϕ0(t)H(t)) (52) which is non negative from (48).

It follows that G is a non-decreasing function and thus that for all t >t0

H(t)>H(t0)eϕ(t0)−ϕ(t)>H(t0)eϕ(t0)>0 (53) which ends the proof of the lemma.

(8)

3 Proof of Proposition 1

The first point of the Proposition was proved in (45). To prove the second point of Proposition 1 we observe that

E(t) = t2|x(t)|+1

2|λx(t) +tx(t)|˙ 2+ ξ

2|x(t)|2. (54) LetK1 =H(t0)eϕ(t0), from (53), ∀t>t0, E(t)>K1t2−2b3. We can notice here that 22b3 >0. There are some cases where the conclusion holds directly.

1. If for t2 > T, 12|λx(t2) +tx(t˙ 2)|2 + ξ2|x(t2)|2 6 K21t2−

2b 3

0 we have t22|x(t2)| >

K1t2−

2b 3

2 K21t2−

2b 3

0 > K21t2−

2b 3

2 and thus we can conclude.

2. If there exists t > T such that ˙x(t) = 0, using the fact that E(t) = t2|x(t)|+

λ2

2 |x(t)|2 and the fact that limt→∞ λ22|x(t)|2 = 0, we can conclude using the previous point.

3. If there existst > T such thatx(t) = 0, since limu→∞|x(u)|= 0, there exists t1 >t such that ˙x(t1) = 0 and we can use the previous point.

We now suppose that x(T) > 0 and the sign of ˙x is constant on [T,+∞). Since limt→∞x(t) = 0 we deduce that ∀t(T,+∞), ˙x(t)<0.

For any t1 > T we have

x(t1)x(T) = Z t1

u=T

˙

x(t)dt (55)

Since x(t1) converges to 0, we deduce that for any ε0 > 0, there exists t2 > T such that |t2x(t˙ 2)| 6 ε0. Hence for any ε > 0, there exists t2 > T such that

1

2|λx(t2) +tx(t˙ 2)|2+ ξ2|x(t2)|2 6ε and we can conclude since we are back to case 1. This concludes the proof of the Proposition.

References

[1] V. Apidopoulos, J-F. Aujol, and C. Dossal. On a second order differential inclusion modeling the FISTA algorithm. Hal Preprint, (01517708), 2017.

[2] H. Attouch, Z. Chbani, and H. Riahi. Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α 3”. Archiv Preprint, June 2017.

(9)

[3] J-F. Aujol and C. Dossal. Optimal rate of convergence of an ode associated to the fast gradient descent schemes for b >0. Hal Preprint, June 2017.

[4] Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algo- rithm for linear inverse problems. SIAM journal on imaging sciences, 2(1):183–

202, 2009.

[5] Haim Brezis.Op´erateurs maximaux monotones et semi-groupes de contractions dans les espaces de Hilbert, volume 5. Elsevier, 1973.

[6] Yurii Nesterov. A method of solving a convex programming problem with convergence rate o (1/k2). In Soviet Mathematics Doklady, volume 27, pages 372–376, 1983.

[7] Weijie Su, Stephen Boyd, and Emmanuel J Candes. A differential equation for modeling nesterov’s accelerated gradient method: theory and insights. Journal of Machine Learning Research, 17(153):1–43, 2016.

Références

Documents relatifs

If the gateway is authorized to send multiple successive Notify commands, it will proceed as follows. When the gateway exits the &#34;notification state&#34;, it resets the

These packages, when used in conjunction with the packages currently defined in RFC 2705 (Media Gateway Control Protocol Version 1.0) [1], allow an MGCP Call Agent to

When naively applying the scheme depicted in Fig. 2, it became apparent that it is critical to determine the exact value of τ ALD , because the slightest error on the length of the

In the oxygen pulse, typical combustion reaction products are observed. Horizontal slices of spectra during the oxygen plasma pulse are shown in Fig. We can clearly

La transición a la democracia que se intentó llevar en Egipto quedó frustrada en el momento que el brazo militar no quiso despojarse de sus privilegios para entregarlos al

L’objectiu principal és introduir l’art i el procés de creació artística de forma permanent dins l’escola, mitjançant la construcció d’un estudi artístic

también disfruto cuando una parte de la clase opina algo mientras la otra opina lo contrario y debatimos mientras llegamos a una conclusión acerca del poema.”; “he

Zelda se dedicó realmente a realizar todas estas actividades, por tanto, mientras que en la novela que ella misma escribió vemos el deseo de Alabama de dedicarse y triunfar, por