Convergence Rates with Inexact Non-expansive Operators


HAL Id: hal-01658852

https://hal.archives-ouvertes.fr/hal-01658852

Submitted on 7 Dec 2017


Jingwei Liang, Jalal M. Fadili, Gabriel Peyré

To cite this version:

Jingwei Liang, Jalal M. Fadili, Gabriel Peyré. Convergence Rates with Inexact Non-expansive Operators. Mathematical Programming, Series A, Springer, 2016. hal-01658852


Abstract In this paper, we present a convergence rate analysis for the inexact Krasnosel’skiĭ–Mann iteration built from non-expansive operators. The presented results include two main parts: we first establish the global pointwise and ergodic iteration-complexity bounds; then, under a metric sub-regularity assumption, we establish local linear convergence of the distance of the iterates to the set of fixed points. The obtained results can be applied to analyze the convergence rate of various monotone operator splitting methods in the literature, including Forward–Backward splitting, Generalized Forward–Backward, Douglas–Rachford splitting, the alternating direction method of multipliers (ADMM) and Primal–Dual splitting methods. For these methods, we also develop easily verifiable termination criteria for finding an approximate solution, which can be seen as a generalization of the termination criterion for the classical gradient descent method. We finally develop a parallel analysis for the non-stationary Krasnosel’skiĭ–Mann iteration.

Keywords Krasnosel’skiĭ–Mann iteration · Monotone inclusion · Non-expansive operator · Convergence rates · Asymptotic regularity · Convex optimization

Mathematics Subject Classification (2000) 47H05 · 47H09 · 90C25

1 Introduction

1.1 Monotone inclusion and operator splitting methods

In various fields of science and engineering, many problems can be cast as solving a structured monotone inclusion problem. A prototypical example that has attracted a wave of interest recently, see e.g. [65,19,58], takes the form

\[ \text{Find } x \in \mathcal{H} \text{ such that } 0 \in Bx + \sum_{i=1}^{n} L_i^* \circ A_i \circ L_i x, \tag{1} \]

where $\mathcal{H}$ is a real Hilbert space, $B$ is cocoercive, $A_i$ is a set-valued maximal monotone operator acting on a real Hilbert space $\mathcal{G}_i$, and $L_i : \mathcal{H} \to \mathcal{G}_i$ is a bounded linear operator. Even more complex forms (e.g. with parallel sums (24)) will be discussed in detail in Section 5.

Since the first operator splitting method was developed in the 1970s for solving structured monotone inclusion problems, the class of splitting methods has been regularly enriched with increasingly sophisticated algorithms as the structure of the problems to handle has become more complex. Splitting methods are iterative algorithms which evaluate (possibly approximately) the individual operators, their resolvents and the linear operators, all separately at various points in the course of the iteration, but never the resolvents of sums nor of compositions with a linear operator. Popular splitting algorithms to solve special instances of (1) include the Forward–Backward splitting method (FBS) [46,54], the Douglas–Rachford splitting method (DRS) [26,43], the Peaceman–Rachford splitting method (PRS) [55], and the Alternating Direction Method of Multipliers (ADMM) [28,29,30,31]. Other splitting methods were designed to solve (1) or even more complex forms (e.g. with parallel sums), see for instance [15,64,61,12,14,58,19,22,65].

Jingwei Liang, Jalal Fadili
Normandie Université-ENSICAEN, GREYC UMR 6072
E-mail: {Jingwei.Liang, Jalal.Fadili}@ensicaen.fr

Gabriel Peyré
CNRS and CEREMADE, Université Paris-Dauphine
E-mail: [email protected]


Many operator splitting methods (including FBS, DRS, ADMM, and many others) can be cast as the Krasnosel’skiĭ–Mann fixed-point iteration [35,44], possibly in its inexact form to handle errors,

\[ z_{k+1} = z_k + \lambda_k (T z_k + \varepsilon_k - z_k), \tag{2} \]

where $T : \mathcal{H} \to \mathcal{H}$ is a non-expansive operator (Definition 1), $\mathcal{H}$ is a real Hilbert space with norm $\|\cdot\|$, $(\lambda_k)_{k\in\mathbb{N}} \subset [0,1]$, and $\varepsilon_k$ is the error of approximating $T z_k$. The sequence $(z_k)_{k\in\mathbb{N}}$ is built in such a way that it converges (in general weakly [59]) to some fixed point $z^\star$ of $T$, and the latter can be easily related to a solution $x^\star$ of the original problem, e.g. (1).
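As a minimal sketch of iteration (2), consider a toy setting in $\mathbb{R}^2$ that is not taken from the paper. The function name `km_iterate`, the choice of $T$ as a rotation by 90 degrees (an isometry, hence non-expansive, whose only fixed point is the origin) and the summable error model are all illustrative assumptions.

```python
import math

def km_iterate(T, z0, lambdas, errors):
    """Run z_{k+1} = z_k + lambda_k * (T(z_k) + eps_k - z_k); return all iterates."""
    z = list(z0)
    history = [tuple(z)]
    for lam, eps in zip(lambdas, errors):
        Tz = T(z)
        z = [zi + lam * (ti + ei - zi) for zi, ti, ei in zip(z, Tz, eps)]
        history.append(tuple(z))
    return history

def rotate90(z):
    """Hypothetical non-expansive operator: rotation by 90 degrees, fix T = {0}."""
    return [-z[1], z[0]]

n_iter = 200
lambdas = [0.5] * n_iter
# Assumed error model: ||eps_k|| decays like 1/(k+1)^3, so (k+1)*||eps_k|| is summable.
errors = [(1.0 / (k + 1) ** 3, 0.0) for k in range(n_iter)]

exact = km_iterate(rotate90, (1.0, 1.0), lambdas, [(0.0, 0.0)] * n_iter)
inexact = km_iterate(rotate90, (1.0, 1.0), lambdas, errors)

# Distance of the last iterate to the unique fixed point 0.
dist_exact = math.hypot(*exact[-1])
dist_inexact = math.hypot(*inexact[-1])
```

In this toy example both runs approach the fixed point; the inexact run is slowed down by the injected errors but still converges because the errors are summable.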

1.2 Contributions

We consider the inexact Krasnosel’skiĭ–Mann iteration (2), and assume that the set of fixed points is non-empty. It is known that a crucial step in proving the convergence of the iterates is to show that $\|z_k - T z_k\| \to 0$, a property known as asymptotic regularity of $T$ [53]. In this paper, we show that even with errors, the global pointwise and ergodic iteration-complexity bounds, i.e. rates of asymptotic regularity, are respectively $O(1/\sqrt{k})$ and $O(1/k)$.

Then under a metric sub-regularity assumption [25], we establish a local linear convergence for the iteration in terms of the distance of the iterates to the set of fixed points. Of course, when the fixed point is unique, the sequence itself converges linearly to it.

Our result can be applied to analyze the convergence behaviour of the iterates generated by various monotone operator splitting methods. As stated above, one crucial property of these methods is that they have the equivalent fixed-point formulation (2)¹. Such methods include FBS, the Generalized Forward–Backward (GFB) [58], DRS, ADMM, and several Primal–Dual splitting (PDS) methods [65,16]. In particular, for the GFB method developed by two of the co-authors, which addresses the case where the $L_i$'s in (1) are equal to the identity, we demonstrate that $O(1/\epsilon)$ iterations are needed to find a pair $((u_i)_i, g)$ satisfying the termination criterion $\|g + B(\sum_i \omega_i u_i)\|^2 \le \epsilon$, where $g \in \sum_i A_i(u_i)$. This termination criterion can be viewed as a generalization of the classical one based on the norm of the gradient for the gradient descent method [51]. The iteration-complexity improves to $O(1/\sqrt{\epsilon})$ in the ergodic sense for the same termination criterion. A similar interpretation is also provided for DRS/ADMM and PDS considered in Section 5.

We finally study the convergence and rates of the non-stationary version of the Krasnosel’skiĭ–Mann iteration. By absorbing the non-stationarity appropriately into an additional error term, we show that the iteration-complexity bounds developed above remain valid under reasonable conditions. The obtained result is illustrated on the GFB method.

1.3 Related work

1.3.1 Global iteration-complexity bounds

Relation with [3,21] Suppose that $T : C \to C$, where $C$ is a convex compact subset of $\mathcal{H}$, and $\varepsilon_k \equiv 0$ (i.e. exact case). In [3], the authors conjectured that

\[ \|z_k - T z_k\| \le \frac{\mathrm{diam}(C)}{\sqrt{\pi \sum_{j=1}^{k} \lambda_j (1 - \lambda_j)}}, \]

where $\mathrm{diam}(C)$ is the diameter of $C$, and proved it for $\lambda_j \equiv \lambda$. This conjecture was settled in [21, Theorem 1]². Our work differs from these in several aspects. For instance, we consider the inexact case without any assumption on boundedness of $C$. We also establish both the pointwise and ergodic convergence rates, as well as local linear convergence under metric sub-regularity.

However, the extension of the result of [21] to the inexact iteration seems quite intricate. The main reason is that their method of proof relies on the recursive bound in [21, Corollary 3], and exploits some properties of some special functions and an identity for Catalan numbers. These recursions are unfortunately not stable to errors.

Relation with [33] The authors in [33] studied the exact version of DRS to solve (1) with $B = 0$, $n = 2$ and $L_i = \mathrm{Id}$, $i = 1, 2$. They additionally assumed that $A_2$ is single-valued. Let

\[ e_k = z_k - J_{\gamma A_1} (\mathrm{Id} - \gamma A_2) z_k, \tag{3} \]

where $J_{\gamma A_1} = (\mathrm{Id} + \gamma A_1)^{-1}$ is the resolvent of $\gamma A_1$. Relying on firm non-expansiveness of the resolvent [6], [33] have shown that $\|e_k\| = O(1/\sqrt{k})$.

In fact, it is easy to show that (3) is equivalent to

\[ e_k = z_k - \tfrac{1}{2} \big( (2 J_{\gamma A_1} - \mathrm{Id})(2 J_{\gamma A_2} - \mathrm{Id}) + \mathrm{Id} \big) z_k, \]

where single-valuedness of $A_2$ is not needed whatsoever. The operator $\tfrac{1}{2}\big((2 J_{\gamma A_1} - \mathrm{Id})(2 J_{\gamma A_2} - \mathrm{Id}) + \mathrm{Id}\big)$ is firmly non-expansive, and thus fits in our framework. Our results in Section 3 go much beyond this by considering a more general iterative scheme with an operator that is only non-expansive and may be evaluated approximately.

¹ In fact, in many cases the fixed-point operator $T$ is even $\alpha$-averaged, see Definition 1.

² The authors consider the case where $\mathcal{H}$ is any normed space.

Relation with HPE Based on the enlargement of maximal monotone operators, a hybrid proximal extragradient method (HPE) is introduced in [62] to solve monotone inclusion problems of the form

\[ \text{Find } x \in \mathcal{H} \text{ such that } 0 \in Ax. \]

The HPE framework encompasses some splitting algorithms in the literature [48]. The convergence of HPE is established in [62], and in [13] for its inexact version. The pointwise and ergodic iteration-complexities of the exact HPE, for an error criterion similar to the one in our work, are established in [48]. Some of the splitting methods we consider in Section 5 are also covered by HPE, hence our iteration-complexity bounds coincide with those of HPE, but only for the ergodic case. For the pointwise case, our bound is uniform and theirs is not (see further discussion in Remark 3).

1.3.2 Local linear convergence

Relation with [38,40] In [38], local linear convergence of the distance to the set of zeros of a maximal monotone operator using the exact proximal point algorithm (PPA [45,60]) is established by assuming metric sub-regularity of the operator. Local convergence rate analysis of PPA under a higher-order extension of metric sub-regularity, namely metric $q$-sub-regularity with $q \in ]0,1]$, is conducted in [40]. In our work, metric sub-regularity is assumed on $\mathrm{Id} - T$ with $T$ being the fixed-point operator, i.e. the resolvent, rather than on the maximal monotone operator as in the case of PPA. The relation between metric sub-regularity of these operators is intricate in general and is beyond the scope of this paper, though we provide an instructive discussion for a simple case at the end of Section 4. Note also that the work of [38,40] considers PPA only in its classical form, i.e. without errors or relaxation.

Relation with [8] While the first version of this paper was submitted, we became aware of the recent work of [8]. These authors considered (2) with $\lambda_k \equiv \lambda \in [0,1[$ and $\varepsilon_k \equiv 0$. In [8, Lemma 3.8], they established local linear convergence under bounded linear regularity of $T$ (see [8, Definition 2.1]). The latter coincides exactly with our metric sub-regularity assumption on $\mathrm{Id} - T$ (see (12)). Our result is however more general as it covers the inexact case and $\lambda_k$ is allowed to be iteration-dependent.

Local linear rate for feasibility problems In finite dimension, based on strong regularity, [39] proves local linear convergence of the Method of Alternating Projections (MAP) in the non-convex setting, where the sets are closed and one of them is suitably regular. The linear rate is associated with a modulus of regularity. This is refined later in [7]. In [34], the authors develop local linear convergence results for MAP and DRS to solve non-convex feasibility problems, see also [56] for an even more general setting. Their analysis relies on a local version of firm non-expansiveness together with a coercivity condition. It turns out that this coercivity condition holds for mappings $T$ for which the fixed points are isolated and $\mathrm{Id} - T$ is metrically regular. The linear rate they establish, however, imposes a bound on the metric regularity modulus.

Other local linear rates with DRS For the case of a sphere intersecting a line, or more generally a proper affine subset, typically in $\mathbb{R}^2$, [9] establishes local linear convergence of DRS. Local linear convergence of (the relaxed) DRS to solve the affine constrained $\ell_1$-minimization problem (basis pursuit) is shown in [23]. For this particular instance, their regularity assumptions can be related to our metric sub-regularity assumption involving the DRS fixed-point operator. Given the level of detail this relation requires, we defer it (in an even more general setting) to a forthcoming paper, hence we do not discuss this further.


1.4 Paper organization

The organization of the paper is as follows. In Section 2 we introduce our main notations and recall some necessary material on monotone operator theory. Global iteration-complexity bounds and the local convergence rate of the inexact Krasnosel’skiĭ–Mann iteration are established in Section 3 and Section 4 respectively. In Section 5, illustrative examples of existing monotone operator splitting methods to which our iteration-complexity result can be applied are described. Extension to the non-stationary case is discussed in Section 6.

2 Preliminaries

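Throughout the paper, $\mathbb{N}$ is the set of non-negative integers and $\mathcal{H}$ is a real Hilbert space with scalar product $\langle \cdot, \cdot \rangle$ and norm $\|\cdot\|$. $\mathrm{Id}$ denotes the identity operator on $\mathcal{H}$. The sub-differential of a proper function $f : \mathcal{H} \to ]-\infty, +\infty]$ is the set-valued operator

\[ \partial f : \mathcal{H} \to 2^{\mathcal{H}}, \quad x \mapsto \big\{ g \in \mathcal{H} \mid (\forall y \in \mathcal{H}),\ \langle y - x, g \rangle + f(x) \le f(y) \big\}. \]

$\Gamma_0(\mathcal{H})$ denotes the class of proper, lower semi-continuous, convex functions from $\mathcal{H}$ to $]-\infty, +\infty]$. If $f \in \Gamma_0(\mathcal{H})$, then $\mathrm{prox}_f$ denotes the Moreau proximity operator [50], and the Moreau envelope of index $\delta \in ]0, +\infty[$ of $f$ is the function

\[ {}^{\delta}f : x \mapsto \min_{y \in \mathcal{H}} f(y) + \frac{1}{2\delta} \|y - x\|^2, \]

and its gradient is $\delta^{-1}$-Lipschitz continuous [49].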
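To make the proximity operator and the Moreau envelope concrete, the following sketch treats the one-dimensional function $f = |\cdot|$, an example chosen here for illustration and not taken from the paper. It relies on two standard facts: the proximity operator of $\delta|\cdot|$ is soft-thresholding, and the gradient of the envelope is $(x - \mathrm{prox}_{\delta f}(x))/\delta$. The function names are hypothetical.

```python
def prox_abs(x, delta):
    """Proximity operator of delta*|.| on the real line (soft-thresholding)."""
    if x > delta:
        return x - delta
    if x < -delta:
        return x + delta
    return 0.0

def moreau_env_abs(x, delta):
    """Moreau envelope of |.| with index delta, evaluated at the minimizer prox(x)."""
    y = prox_abs(x, delta)
    return abs(y) + (y - x) ** 2 / (2.0 * delta)

def grad_moreau_env_abs(x, delta):
    """Gradient of the envelope: (x - prox(x)) / delta."""
    return (x - prox_abs(x, delta)) / delta
```

For this particular $f$ the envelope is the Huber function: quadratic on $[-\delta, \delta]$ and affine outside, so its gradient is visibly $\delta^{-1}$-Lipschitz.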

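Let $A : \mathcal{H} \to 2^{\mathcal{H}}$ be a set-valued operator. The domain of $A$ is $\mathrm{dom}\,A = \{x \in \mathcal{H} \mid Ax \neq \emptyset\}$, the range of $A$ is $\mathrm{ran}\,A = \{y \in \mathcal{H} \mid \exists x \in \mathcal{H} : y \in Ax\}$, the graph of $A$ is the set $\mathrm{gra}\,A = \{(x, y) \in \mathcal{H}^2 \mid y \in Ax\}$, the inverse of $A$ is the operator whose graph is $\mathrm{gra}\,A^{-1} = \{(y, x) \in \mathcal{H}^2 \mid x \in A^{-1} y\}$, and its set of zeros is $\mathrm{zer}\,A = \{x \in \mathcal{H} \mid 0 \in Ax\} = A^{-1}(0)$.

The resolvent of $A$ is the operator $J_A = (\mathrm{Id} + A)^{-1}$, and the reflection operator associated to $J_A$ is $R_A = 2 J_A - \mathrm{Id}$.

We denote by $\ell_+^1$ the set of summable sequences in $[0, +\infty[$, and define the index set $\llbracket 1, n \rrbracket = \{1, 2, \dots, n\}$.

A sequence $(x_k)_{k\in\mathbb{N}}$ is said to converge $Q$-linearly to $\tilde{x}$ if there exists a constant $r \in ]0,1[$ such that $\frac{\|x_{k+1} - \tilde{x}\|}{\|x_k - \tilde{x}\|} \le r$, and $(x_k)_{k\in\mathbb{N}}$ is said to converge $R$-linearly to $\tilde{x}$ if $\|x_k - \tilde{x}\| \le \sigma_k$ and $(\sigma_k)_{k\in\mathbb{N}}$ converges $Q$-linearly to $0$.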

2.1 Non-expansive operators and quasi-Fejér monotone sequences

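Definition 1 (Non-expansive operator) An operator $T : \mathcal{H} \to \mathcal{H}$ is non-expansive if

\[ \forall x, y \in \mathcal{H}, \quad \|Tx - Ty\| \le \|x - y\|. \]

For any $\alpha \in ]0,1[$, $T$ is $\alpha$-averaged if there exists a non-expansive operator $R$ such that $T = \alpha R + (1 - \alpha)\mathrm{Id}$.

We denote by $\mathcal{A}(\alpha)$ the class of $\alpha$-averaged operators on $\mathcal{H}$; in particular $\mathcal{A}(\tfrac{1}{2})$ is the class of firmly non-expansive operators, whose detailed properties can be found in [6, Proposition 4.2].

Lemma 1 Let $T : \mathcal{H} \to \mathcal{H}$ be an $\alpha$-averaged operator, then $\frac{1}{2\alpha}(\mathrm{Id} - T) \in \mathcal{A}(\tfrac{1}{2})$.

Proof By definition, there exists a $1$-Lipschitz continuous operator $R$ such that $T = \alpha R + (1 - \alpha)\mathrm{Id}$, hence

\[ \frac{1}{2\alpha}(\mathrm{Id} - T) = \frac{1}{2\alpha}\big(\mathrm{Id} - (\alpha R + (1 - \alpha)\mathrm{Id})\big) = \tfrac{1}{2}\big(\mathrm{Id} + (-R)\big) \in \mathcal{A}(\tfrac{1}{2}). \]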

We also collect below several useful equivalent characterizations of the firmly non-expansive operators.

Lemma 2 The following statements are equivalent:

(i) $T$ is firmly non-expansive;

(ii) $2T - \mathrm{Id}$ is non-expansive;

(iii) $\forall x, y \in \mathcal{H}$, $\|Tx - Ty\|^2 \le \langle Tx - Ty, x - y \rangle$;

(iv) $T$ is the resolvent of a maximal monotone operator $A$, i.e. $T = J_A$.

Proof For (i)-(iii), see [6, Proposition 4.2]; for (i) $\Leftrightarrow$ (iv), see [47].

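The next lemma shows that the class $\mathcal{A}(\alpha)$ is closed under relaxation, convex combination and composition.

Lemma 3 Let $(T_i)_{i \in \llbracket 1, n \rrbracket}$ be a finite family of non-expansive operators from $\mathcal{H}$ to $\mathcal{H}$, $(\omega_i)_i \in ]0,1]^n$ such that $\sum_i \omega_i = 1$, and let $(\alpha_i)_i \in ]0,1]^n$ be such that, for every $i \in \llbracket 1, n \rrbracket$, $T_i \in \mathcal{A}(\alpha_i)$. Then,

(i) $(\forall i \in \llbracket 1, n \rrbracket)(\forall \lambda_i \in ]0, \frac{1}{\alpha_i}[)$, $\mathrm{Id} + \lambda_i (T_i - \mathrm{Id}) \in \mathcal{A}(\lambda_i \alpha_i)$;

(ii) $T_1 \cdots T_n \in \mathcal{A}(\alpha)$, with $\alpha = \frac{n}{n - 1 + 1/(\max_{i \in \llbracket 1, n \rrbracket} \alpha_i)}$;

(iii) Let $\alpha = \max_i \{\alpha_i\}$, then $\sum_i \omega_i T_i$ is $\alpha$-averaged.

Proof See [6, Proposition 4.28, 4.30, 4.32].

Remark 1 For the composite operator $T_1 \cdots T_n$, a sharper bound on $\alpha$ can be obtained for the case $n = 2$ with $\alpha = \frac{\alpha_1 + \alpha_2 - 2\alpha_1\alpha_2}{1 - \alpha_1\alpha_2} \in ]0,1[$ ([52, Theorem 3]).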

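Definition 2 (Quasi–Fejér monotone sequences) Let $\Omega$ be a non-empty closed and convex subset of $\mathcal{H}$. A sequence $(x_k)_{k\in\mathbb{N}}$ is quasi–Fejér monotone with respect to $\Omega$ if there exists a sequence $(\delta_k)_{k\in\mathbb{N}} \in \ell_+^1$ such that

\[ \forall x \in \Omega,\ \forall k \in \mathbb{N}, \quad \|x_{k+1} - x\| \le \|x_k - x\| + \delta_k. \]

2.2 Monotone operators

Definition 3 (Monotone operator) A set-valued operator $A : \mathcal{H} \to 2^{\mathcal{H}}$ is monotone if

\[ \big(\forall (x, u) \in \mathrm{gra}\,A\big)\big(\forall (y, v) \in \mathrm{gra}\,A\big), \quad \langle x - y, u - v \rangle \ge 0; \]

it is moreover maximally monotone if $\mathrm{gra}\,A$ is not strictly contained in the graph of any other monotone operator.

Definition 4 (Cocoercive operator) An operator $B : \mathcal{H} \to \mathcal{H}$ is called $\beta$-cocoercive for some $\beta \in ]0, +\infty[$ if $\beta B \in \mathcal{A}(\tfrac{1}{2})$, i.e.,

\[ (\forall x, y \in \mathcal{H}), \quad \beta \|Bx - By\|^2 \le \langle Bx - By, x - y \rangle. \]

Observe that $\beta$-cocoercivity implies $\frac{1}{\beta}$-Lipschitz continuity.

Lemma 4 Let $f : \mathcal{H} \to ]-\infty, +\infty[$ be a convex differentiable function, with $\frac{1}{\beta}$-Lipschitz continuous gradient, $\beta \in ]0, +\infty[$. Then

(i) $\beta \nabla f \in \mathcal{A}(\tfrac{1}{2})$, i.e. is firmly non-expansive;

(ii) $\mathrm{Id} - \gamma \nabla f \in \mathcal{A}(\frac{\gamma}{2\beta})$ for $\gamma \in ]0, 2\beta[$.

Proof (i) See [4, Baillon–Haddad theorem]; (ii) See [6, Proposition 4.33].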

2.3 Product Space

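Let $(\omega_i)_i \in ]0, +\infty[^n$, and consider $\boldsymbol{\mathcal{H}} = \mathcal{H}^n$ endowed with the scalar product and norm defined as

\[ \forall \boldsymbol{x}, \boldsymbol{y} \in \boldsymbol{\mathcal{H}}, \quad \langle\langle \boldsymbol{x}, \boldsymbol{y} \rangle\rangle = \sum_{i=1}^{n} \omega_i \langle x_i, y_i \rangle, \qquad \|\boldsymbol{x}\| = \sqrt{\sum_{i=1}^{n} \omega_i \|x_i\|^2}. \]

Define $\mathbf{Id}$ as the identity operator on $\boldsymbol{\mathcal{H}}$.

Let $\mathcal{S} = \{\boldsymbol{x} = (x_i)_i \in \boldsymbol{\mathcal{H}} \mid x_1 = \cdots = x_n\}$ and its orthogonal complement $\mathcal{S}^{\perp} = \{\boldsymbol{x} = (x_i)_i \in \boldsymbol{\mathcal{H}} \mid \sum_{i=1}^{n} \omega_i x_i = 0\} \subset \boldsymbol{\mathcal{H}}$. We also define the canonical isometry $\boldsymbol{C} : \mathcal{H} \to \mathcal{S}$, $x \mapsto (x, \cdots, x)$. We have

\[ \forall \boldsymbol{z} \in \boldsymbol{\mathcal{H}}, \quad P_{\mathcal{S}}(\boldsymbol{z}) = \boldsymbol{C}\Big(\sum_{i=1}^{n} \omega_i z_i\Big). \]

Clearly $P_{\mathcal{S}}$ is self-adjoint, and its reflection operator is $R_{\mathcal{S}} = 2P_{\mathcal{S}} - \mathbf{Id}$.

Let $\boldsymbol{\gamma} = (\gamma_i)_i \in ]0, +\infty[^n$. For arbitrary maximal monotone operators $A_i$, $i \in \llbracket 1, n \rrbracket$, on $\mathcal{H}$, define $\boldsymbol{\gamma A} : \boldsymbol{\mathcal{H}} \to 2^{\boldsymbol{\mathcal{H}}}$, $\boldsymbol{x} = (x_i)_i \mapsto \times_{i=1}^{n} \gamma_i A_i x_i$, i.e. its graph is

\[ \mathrm{gra}\,\boldsymbol{\gamma A} = \times_{i=1}^{n} \mathrm{gra}\,\gamma_i A_i = \big\{ (\boldsymbol{x}, \boldsymbol{u}) \in \boldsymbol{\mathcal{H}}^2 \mid \boldsymbol{x} = (x_i)_i,\ \boldsymbol{u} = (u_i)_i,\ u_i \in \gamma_i A_i x_i \big\}. \]

For a single-valued maximal monotone operator $B$, denote $\boldsymbol{B} : \boldsymbol{\mathcal{H}} \to \boldsymbol{\mathcal{H}}$, $\boldsymbol{x} = (x_i)_i \mapsto (B x_i)_i$. It is immediate to check that both $\boldsymbol{\gamma A}$ and $\boldsymbol{B}$ are maximal monotone. We also define the operators $\boldsymbol{B}_{\mathcal{S}} = \boldsymbol{B} P_{\mathcal{S}}$, $J_{\boldsymbol{\gamma A}} = (J_{\gamma_i A_i})_i$ and $R_{\boldsymbol{\gamma A}} = 2 J_{\boldsymbol{\gamma A}} - \mathbf{Id}$.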


3 Global iteration-complexity bounds

In this section, we present the global iteration-complexity bounds of the inexact Krasnosel’skiĭ–Mann iteration.

Definition 5 (Inexact Krasnosel’skiĭ–Mann iteration) Let $T : \mathcal{H} \to \mathcal{H}$ be a non-expansive operator such that the set of fixed points $\mathrm{fix}\,T = \{z \in \mathcal{H} : z = Tz\}$ is non-empty. Let $\lambda_k \in ]0,1]$, and denote $T_{\lambda_k} = \lambda_k T + (1 - \lambda_k)\mathrm{Id}$. Then the inexact Krasnosel’skiĭ–Mann iteration of $T$ is given by

\[ z_{k+1} = z_k + \lambda_k (T z_k + \varepsilon_k - z_k) = T_{\lambda_k} z_k + \lambda_k \varepsilon_k, \tag{4} \]

where $\varepsilon_k$ is the error of approximating $T z_k$. The error of the iteration is defined as

\[ e_k = (\mathrm{Id} - T) z_k = (z_k - z_{k+1})/\lambda_k + \varepsilon_k. \tag{5} \]

Define the following two important notions

\[ T' = \mathrm{Id} - T \quad \text{and} \quad \tau_k = \lambda_k (1 - \lambda_k); \tag{6} \]

clearly, $e_k = T' z_k$. We start by collecting some useful properties that characterize the above fixed-point iteration.

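Proposition 1 The following statements hold,

(i) $T_{\lambda_k} \in \mathcal{A}(\lambda_k)$ for $\lambda_k \in ]0,1[$, and if $T \in \mathcal{A}(\alpha)$, then $T_{\lambda_k} \in \mathcal{A}(\lambda_k \alpha)$;

(ii) For any $z^\star$, $z^\star \in \mathrm{fix}\,T \iff z^\star \in \mathrm{fix}\,T_{\lambda_k} \iff z^\star \in \mathrm{zer}\,T'$;

(iii) If $(\tau_k)_{k\in\mathbb{N}} \notin \ell_+^1$ and $(\lambda_k \|\varepsilon_k\|)_{k\in\mathbb{N}} \in \ell_+^1$, then,

(a) $(e_k)_{k\in\mathbb{N}}$ converges strongly to $0$;

(b) $(z_k)_{k\in\mathbb{N}}$ is quasi–Fejér monotone with respect to $\mathrm{fix}\,T$, and converges weakly to a point $z^\star \in \mathrm{fix}\,T$.

Proof (i) Combine Definition 1 and (i) of Lemma 3; (ii) Straightforward; (iii) See [18, Lemma 5.1].

Let us now turn to the properties of $e_k$. Denote $\underline{\tau} = \inf_{k\in\mathbb{N}} \tau_k$, $\overline{\tau} = \sup_{k\in\mathbb{N}} \tau_k$, and $\nu_1 = 2 \sup_{k\in\mathbb{N}} \|T_{\lambda_k} z_k - z^\star\| + \sup_{k\in\mathbb{N}} \lambda_k \|\varepsilon_k\|$, $\nu_2 = 2 \sup_{k\in\mathbb{N}} \|e_k - e_{k+1}\|$.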

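Lemma 5 For the error term $e_k$, the following inequality holds

\[ \frac{1}{2\lambda_k} \|e_k - e_{k+1}\|^2 \le \langle e_k - \varepsilon_k, e_k - e_{k+1} \rangle. \]

Proof By Lemma 1, $\tfrac{1}{2}T' \in \mathcal{A}(\tfrac{1}{2})$. It then follows from Lemma 2(iii) that $\forall p, q \in \mathcal{H}$,

\[ \|\tfrac{1}{2}T'(p) - \tfrac{1}{2}T'(q)\|^2 \le \langle p - q, \tfrac{1}{2}T'(p) - \tfrac{1}{2}T'(q) \rangle. \]

Applying this bound with $p = z_k$ and $q = z_{k+1}$, and using the definition of $e_k$ together with $z_k - z_{k+1} = \lambda_k (e_k - \varepsilon_k)$ from (5), yields the desired result.

Corollary 1 If $T$ is $\alpha$-averaged, the inequality of Lemma 5 becomes

\[ \frac{1}{2\alpha\lambda_k} \|e_k - e_{k+1}\|^2 \le \langle e_k - \varepsilon_k, e_k - e_{k+1} \rangle. \]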

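Lemma 6 For $z^\star \in \mathrm{fix}\,T$, $\lambda_k \in ]0,1]$, we have

\[ \|z_{k+1} - z^\star\|^2 \le \|z_k - z^\star\|^2 - \tau_k \|e_k\|^2 + \nu_1 \lambda_k \|\varepsilon_k\|. \]

Proof By virtue of [6, Corollary 2.14] we get

\[ \begin{aligned} \|z_{k+1} - z^\star\|^2 &= \|T_{\lambda_k} z_k - z^\star + \lambda_k \varepsilon_k\|^2 \\ &\le \|(1 - \lambda_k)(z_k - z^\star) + \lambda_k (T z_k - T z^\star)\|^2 + \nu_1 \lambda_k \|\varepsilon_k\| \\ &= (1 - \lambda_k)\|z_k - z^\star\|^2 + \lambda_k \|T z_k - z^\star\|^2 - \tau_k \|z_k - T z_k\|^2 + \nu_1 \lambda_k \|\varepsilon_k\| \\ &\le \|z_k - z^\star\|^2 - \tau_k \|e_k\|^2 + \nu_1 \lambda_k \|\varepsilon_k\|. \end{aligned} \]

This lemma indicates that $(z_k)_{k\in\mathbb{N}}$ is quasi–Fejér monotone with respect to $\mathrm{fix}\,T$ as stated in Proposition 1.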
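The inequality of Lemma 6 can be checked numerically in the exact case ($\varepsilon_k = 0$). The sketch below is an illustration under assumed data: $T$ is the projection onto the closed unit ball of $\mathbb{R}^2$, the fixed point `z_star`, the starting point and the relaxation parameter are arbitrary choices.

```python
import math

def proj_ball(z):
    """Projection onto the closed unit ball of R^2 (firmly non-expansive)."""
    n = math.hypot(z[0], z[1])
    return list(z) if n <= 1.0 else [z[0] / n, z[1] / n]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

z_star = [0.6, 0.0]      # any point of the ball is a fixed point of the projector
z = [3.0, -4.0]          # assumed starting point
lam = 0.7
tau = lam * (1.0 - lam)  # tau_k = lambda_k * (1 - lambda_k)
gaps = []
for k in range(20):
    Tz = proj_ball(z)
    e_sq = sq_dist(z, Tz)                                  # ||e_k||^2
    z_next = [zi + lam * (ti - zi) for zi, ti in zip(z, Tz)]
    # Slack of Lemma 6 with eps_k = 0; it should be non-negative.
    gaps.append(sq_dist(z, z_star) - tau * e_sq - sq_dist(z_next, z_star))
    z = z_next
```

Every entry of `gaps` is non-negative up to rounding, which is exactly the Fejér-type decrease stated by the lemma when no error is present.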

Corollary 2 If $T$ is $\alpha$-averaged, the inequality of Lemma 6 holds with $\lambda_k \in ]0, 1/\alpha]$ and $\tau_k = \lambda_k (\frac{1}{\alpha} - \lambda_k)$.


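Lemma 7 For $\lambda_k \in ]0,1]$, $(e_k)_{k\in\mathbb{N}}$ obeys $\|e_{k+1}\|^2 - \nu_2 \|\varepsilon_k\| \le \|e_k\|^2$.

Proof We have

\[ \begin{aligned} \|e_{k+1}\|^2 &= \|e_{k+1} - e_k + e_k\|^2 = \|e_k\|^2 - 2\langle e_k, e_k - e_{k+1} \rangle + \|e_k - e_{k+1}\|^2 \\ &\le \|e_k\|^2 - 2(1 - \lambda_k)\langle e_k - \varepsilon_k, e_k - e_{k+1} \rangle - 2\langle \varepsilon_k, e_k - e_{k+1} \rangle \\ &\le \|e_k\|^2 - \frac{1 - \lambda_k}{\lambda_k}\|e_k - e_{k+1}\|^2 + \nu_2 \|\varepsilon_k\| \le \|e_k\|^2 + \nu_2 \|\varepsilon_k\|, \end{aligned} \]

where Lemma 5 was used twice, in the second and third lines.

We are now in position to state the main results of this section. Denote $d_0 = d(z_0, \mathrm{fix}\,T) = \inf_{z \in \mathrm{fix}\,T} \|z_0 - z\|$.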

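Theorem 1 (Pointwise iteration-complexity bound) For the inexact fixed-point iteration (4), if there holds

\[ 0 < \inf_{k\in\mathbb{N}} \lambda_k \le \sup_{k\in\mathbb{N}} \lambda_k < 1 \quad \text{and} \quad \big((k + 1)\|\varepsilon_k\|\big)_{k\in\mathbb{N}} \in \ell_+^1, \tag{7} \]

then, denoting $C_1 = \nu_1 \sum_{j\in\mathbb{N}} \lambda_j \|\varepsilon_j\| + \nu_2 \overline{\tau} \sum_{\ell\in\mathbb{N}} (\ell + 1)\|\varepsilon_\ell\| < +\infty$, we have

\[ \|e_k\| \le \sqrt{\frac{d_0^2 + C_1}{\underline{\tau}(k + 1)}}. \tag{8} \]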

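Proof Condition (7) implies $\underline{\tau} > 0$, $(\tau_k)_{k\in\mathbb{N}} \notin \ell_+^1$ and $(\lambda_k \|\varepsilon_k\|)_{k\in\mathbb{N}} \in \ell_+^1$. Therefore, $(z_k)_{k\in\mathbb{N}}$ is quasi-Fejér monotone with respect to $\mathrm{fix}\,T$ ((iii) of Proposition 1). Thus, $(\|e_k\|)_{k\in\mathbb{N}}$ and $(\|z_k - z^\star\|)_{k\in\mathbb{N}}$ are bounded for any $z^\star \in \mathrm{fix}\,T$. Hence $\nu_1$, $\nu_2$ and $C_1$ are bounded constants. Choose $z^\star$ such that $d_0 = \|z_0 - z^\star\|$. From Lemma 6, $\forall k \in \mathbb{N}$,

\[ \tau_k \|e_k\|^2 \le \|z_k - z^\star\|^2 - \|z_{k+1} - z^\star\|^2 + \nu_1 \lambda_k \|\varepsilon_k\|. \]

Summing up from $j = 0$ to $k$,

\[ \sum_{j=0}^{k} \tau_j \|e_j\|^2 \le \|z_0 - z^\star\|^2 - \|z_{k+1} - z^\star\|^2 + \nu_1 \sum_{j=0}^{k} \lambda_j \|\varepsilon_j\|. \tag{9} \]

From Lemma 7, we have, $\forall j \le k$,

\[ \|e_k\|^2 - \nu_2 \sum_{\ell=j}^{k-1} \|\varepsilon_\ell\| \le \|e_j\|^2. \]

Substituting this back into (9) yields

\[ \begin{aligned} \Big(\sum_{j=0}^{k} \tau_j\Big) \|e_k\|^2 &\le \sum_{j=0}^{k} \tau_j \|e_j\|^2 + \nu_2 \sum_{j=0}^{k} \tau_j \sum_{\ell=j}^{k-1} \|\varepsilon_\ell\| \\ &\le d_0^2 + \nu_1 \sum_{j=0}^{k} \lambda_j \|\varepsilon_j\| + \nu_2 \sum_{j=0}^{k} \tau_j \sum_{\ell=j}^{k-1} \|\varepsilon_\ell\|. \end{aligned} \tag{10} \]

Finally, since $(k + 1)\underline{\tau} \le \sum_{j=0}^{k} \tau_j$, we get

\[ (k + 1)\underline{\tau} \|e_k\|^2 \le d_0^2 + \nu_1 \sum_{j=0}^{k} \lambda_j \|\varepsilon_j\| + \nu_2 \overline{\tau} \sum_{\ell=0}^{k-1} (\ell + 1)\|\varepsilon_\ell\|, \]

which leads to the desired result (8).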

Remark 2

(i) If $T$ is $\alpha$-averaged, then condition (7) of Theorem 1 on $\lambda_k$ changes to $0 < \inf_{k\in\mathbb{N}} \lambda_k \le \sup_{k\in\mathbb{N}} \lambda_k < \frac{1}{\alpha}$.

(ii) Since finding $z^\star \in \mathrm{fix}\,T$ is equivalent to finding a zero of $T'$ ((ii) of Proposition 1), Theorem 1 tells us that $O(1/\epsilon)$ iterations are needed for (4) to reach an $\epsilon$-accurate solution in terms of the error criterion $\|T' z_k\|^2 \le \epsilon$.

(iii) For the case of first-order methods for solving smooth optimization problems, i.e. the gradient descent where $T'$ is just the gradient, the obtained pointwise bound is the best-known complexity bound [51].

(iv) If the fixed-point iteration (4) is exact, then the sequence $(\|e_k\|)_{k\in\mathbb{N}}$ is non-increasing (Lemma 7), hence we get $\|e_k\| \le d_0 / \sqrt{\sum_{j=0}^{k} \tau_j}$, which recovers the result of [21, Proposition 11]. Note that we provide a sharper monotonicity property compared to them.
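The exact-case bound of Remark 2(iv), together with the monotonicity of $\|e_k\|$, can be observed on a small example. The sketch below is an illustration only: the operator $T = P_U P_V$ (alternating projections onto two lines of $\mathbb{R}^2$, for which $\mathrm{fix}\,T = \{0\}$), the angle, the starting point and the alternating relaxation parameters are assumptions made here.

```python
import math

theta = 0.3
v = (math.cos(theta), math.sin(theta))

def T(z):
    """Hypothetical T = P_U P_V: project onto span(v), then onto the x-axis."""
    s = z[0] * v[0] + z[1] * v[1]
    return (s * v[0], 0.0)

z = (2.0, 1.0)
d0 = math.hypot(*z)                                   # fix T = {0}, so d_0 = ||z_0||
lambdas = [0.3 + 0.4 * (k % 2) for k in range(50)]    # alternates between 0.3 and 0.7

tau_sum = 0.0
residuals, bounds = [], []
for lam in lambdas:
    Tz = T(z)
    e = (z[0] - Tz[0], z[1] - Tz[1])                  # e_k = (Id - T) z_k
    tau_sum += lam * (1.0 - lam)                      # running sum of tau_j
    residuals.append(math.hypot(*e))
    bounds.append(d0 / math.sqrt(tau_sum))            # d_0 / sqrt(sum_{j<=k} tau_j)
    z = (z[0] + lam * (Tz[0] - z[0]), z[1] + lam * (Tz[1] - z[1]))
```

Comparing `residuals` with `bounds` entry by entry shows the residual staying below the bound, and the residual sequence itself never increases.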

We now turn to the ergodic iteration-complexity bound of (4). For this, let us define $\Lambda_k = \sum_{j=0}^{k} \lambda_j$ and $\bar{e}_k = \frac{1}{\Lambda_k} \sum_{j=0}^{k} \lambda_j e_j$.


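Theorem 2 (Ergodic iteration-complexity bound) Suppose that $C_2 = \sum_{k\in\mathbb{N}} \lambda_k \|\varepsilon_k\| < +\infty$. Then,

\[ \|\bar{e}_k\| \le \frac{2(d_0 + C_2)}{\Lambda_k}. \]

In particular, if $\inf_{k\in\mathbb{N}} \lambda_k > 0$, then $\|\bar{e}_k\| = O(1/k)$.

Proof Again, let $z^\star \in \mathrm{fix}\,T$ be such that $d_0 = \|z_0 - z^\star\|$. Since $T_{\lambda_k}$ is non-expansive, we have

\[ \begin{aligned} \|z_{k+1} - z^\star\| &= \|T_{\lambda_k} z_k - T_{\lambda_k} z^\star + \lambda_k \varepsilon_k\| \le \|z_k - z^\star\| + \lambda_k \|\varepsilon_k\| \\ &\le \|z_{k-1} - z^\star\| + \sum_{j=k-1}^{k} \lambda_j \|\varepsilon_j\| \le \|z_0 - z^\star\| + \sum_{j=0}^{k} \lambda_j \|\varepsilon_j\|. \end{aligned} \]

This together with the definition of $\bar{e}_k$ yields

\[ \begin{aligned} \|\bar{e}_k\| &= \Big\|\frac{1}{\Lambda_k} \sum_{j=0}^{k} \lambda_j e_j\Big\| = \frac{1}{\Lambda_k} \Big\|\sum_{j=0}^{k} (z_j - z_{j+1}) + \sum_{j=0}^{k} \lambda_j \varepsilon_j\Big\| \\ &\le \frac{1}{\Lambda_k} \Big(\|z_0 - z^\star\| + \|z_{k+1} - z^\star\| + \sum_{j=0}^{k} \lambda_j \|\varepsilon_j\|\Big) \le \frac{2(d_0 + C_2)}{\Lambda_k}. \end{aligned} \]

Again, this result holds when $T$ is $\alpha$-averaged, where now $\lambda_k$ is allowed to vary in $]0, 1/\alpha]$.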

Remark 3

(i) As in the pointwise case, Theorem 2 holds when $T$ is $\alpha$-averaged, where now $\lambda_k$ is allowed to vary in $]0, 1/\alpha]$.

(ii) When $T$ is firmly non-expansive, i.e. the resolvent of a maximal monotone operator ((iv) of Lemma 2), an $O(1/k)$ ergodic convergence rate is also established in [48] with summable enlargement errors. For the methods which can also be cast in the HPE framework, our result coincides with the one in [48]. Note that in that work, they handled the non-stationary case (i.e. the parameter of the resolvent varies); see also our extension to the non-stationary case in Section 6. For the case without errors, we also recover the bound in [2].
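The ergodic bound of Theorem 2 can also be observed numerically in the exact case ($C_2 = 0$). The following sketch is illustrative: the rotation operator, the starting point and the constant relaxation parameter are assumptions made here. It also checks the telescoping identity $\sum_j \lambda_j e_j = z_0 - z_{k+1}$ used in the proof.

```python
import math

def T(z):
    """Hypothetical non-expansive operator: rotation by 90 degrees, fix T = {0}."""
    return (-z[1], z[0])

z0 = (1.0, 2.0)
d0 = math.hypot(*z0)              # distance from z_0 to fix T = {0}
lam = 0.5
z = z0
Lambda = 0.0                      # Lambda_k = sum of lambda_j
acc = [0.0, 0.0]                  # running sum of lambda_j * e_j
ergodic_norms, bounds = [], []
for k in range(100):
    Tz = T(z)
    e = (z[0] - Tz[0], z[1] - Tz[1])
    acc[0] += lam * e[0]
    acc[1] += lam * e[1]
    Lambda += lam
    ergodic_norms.append(math.hypot(acc[0], acc[1]) / Lambda)
    bounds.append(2.0 * d0 / Lambda)      # Theorem 2 with C_2 = 0
    z = (z[0] + lam * (Tz[0] - z[0]), z[1] + lam * (Tz[1] - z[1]))

# In the exact case the weighted sum telescopes to z_0 - z_{k+1}.
telescoped = math.hypot(z0[0] - z[0], z0[1] - z[1])
```

Since $\Lambda_k$ grows linearly when $\lambda_k$ is constant, the entries of `bounds` decay like $1/k$, and `ergodic_norms` stays below them.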

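From Theorems 1 and 2, it is immediate to get the convergence rate bounds on the sequence $(\|z_k - z_{k+1}\|)_{k\in\mathbb{N}}$ in the exact case. To lighten the notation, let $v_k = z_k - z_{k+1}$ and $\bar{v}_k = \frac{1}{k+1} \sum_{j=0}^{k} v_j$.

Corollary 3 Assume that $\varepsilon_k = 0$ for all $k \in \mathbb{N}$.

(i) If $0 < \inf_{k\in\mathbb{N}} \lambda_k \le \sup_{k\in\mathbb{N}} \lambda_k < 1$, then $\|v_k\| \le \frac{d_0}{\sqrt{\underline{\tau}(k+1)}}$;

(ii) If $\underline{\lambda} = \inf_{k\in\mathbb{N}} \lambda_k > 0$, then $\|\bar{v}_k\| \le \frac{2 d_0}{k+1}$.

Proof

(i) By definition $v_k = \lambda_k e_k$, then from (8) we have

\[ \|v_k\| = \|\lambda_k e_k\| \le \lambda_k \frac{d_0}{\sqrt{\underline{\tau}(k+1)}} \le \frac{d_0}{\sqrt{\underline{\tau}(k+1)}}. \]

(ii) A direct result of Theorem 2 by replacing $\Lambda_k$ with $k + 1$.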

4 Local linear convergence rate

In the literature, for many splitting algorithms applied to a range of optimization problems, the following typical convergence profile has been observed in practice. Globally the algorithm converges sub-linearly, and after a sufficiently large number of iterations, the algorithm enters a new regime where a local linear convergence takes over. This has been for instance observed (and sometimes proved) for DRS or FBS when solving sparsity-enforcing minimization problems, see e.g. [23,41].

In this section, we study the rationale underlying this local linear convergence behaviour. Our analysis relies on a metric sub-regularity assumption on $T' = \mathrm{Id} - T$ (Definition 5).

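Definition 6 (Metric sub-regularity [25]) A set-valued mapping $F : \mathcal{H} \to 2^{\mathcal{H}}$ is called metrically sub-regular at $\tilde{z}$ for $\tilde{u} \in F(\tilde{z})$ if there exists $\kappa \ge 0$ along with a neighborhood $\mathcal{Z}$ of $\tilde{z}$ such that

\[ d(z, F^{-1}\tilde{u}) \le \kappa\, d(\tilde{u}, Fz), \quad \forall z \in \mathcal{Z}. \tag{11} \]

The infimum of $\kappa$ such that (11) holds is the modulus of metric sub-regularity, denoted by $\mathrm{subreg}(F; \tilde{z}|\tilde{u})$. The absence of metric sub-regularity is signalled by $\mathrm{subreg}(F; \tilde{z}|\tilde{u}) = +\infty$.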


Metric sub-regularity implies that, for any $z \in \mathcal{Z}$, $d(\tilde{u}, Fz)$ is bounded below. The metric (sub)regularity of multifunctions plays a crucial role in modern variational analysis and optimization. These properties are a key to study the stability of solutions of generalized equations, see the dedicated monograph [25].

Let us specialize this notion to $T'$ and $\tilde{u} = 0$. Since $T'$ is single-valued and $\mathrm{zer}\,T' = \mathrm{fix}\,T$, from (11), metric sub-regularity of $T'$ at some $z^\star \in \mathrm{fix}\,T$ for $0$ is equivalent to

\[ d(z, \mathrm{fix}\,T) \le \kappa \|T' z\|, \quad \forall z \in \mathcal{Z}. \tag{12} \]

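There are several concrete examples of operators $T$ where $T'$ fulfills (12).

Example 1 (Projector) Suppose that $T = P_{\mathcal{C}}$, for $\mathcal{C}$ a non-empty closed convex subset of $\mathcal{H}$. Then $\mathrm{fix}\,T = \mathcal{C}$ and $d(z, \mathcal{C}) = \|z - P_{\mathcal{C}} z\| = \|T' z\|$. Thus $T'$ is metrically sub-regular at any $z^\star \in \mathcal{C}$ for $0$ with $\mathcal{Z} = \mathcal{H}$ and modulus $1$.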
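As a small illustration of Example 1, the sketch below takes $\mathcal{C}$ to be the box $[0,1]^n$, a choice made here only because its projector has a closed form. The residual $\|z - P_{\mathcal{C}} z\|$ is then the distance to $\mathcal{C}$, so (12) holds with $\kappa = 1$. Function names are hypothetical.

```python
import math

def proj_box(z, lo=0.0, hi=1.0):
    """Projection onto the box C = [lo, hi]^n, a non-empty closed convex set."""
    return [min(max(zi, lo), hi) for zi in z]

def residual(z):
    """||T'z|| = ||z - P_C z||, which equals d(z, C) for a projector."""
    return math.sqrt(sum((zi - pi) ** 2 for zi, pi in zip(z, proj_box(z))))
```

For instance, the point $(2, 3)$ projects onto the corner $(1, 1)$, so its residual is $\sqrt{5}$, while any point inside the box has zero residual.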

Using the relation between metric sub-regularity of $T'$ and bounded linear regularity of $T$ as defined in [8] (see our discussion in Section 1.3.2), other examples can be deduced for instance from [8, Example 2.3 and 2.5]. Two other instructive examples, one on DRS with two subspaces and the second on gradient descent, will be discussed at the end of the section.

Another interesting situation is when $T$ is firmly non-expansive, so that $T = J_F$ for some maximal monotone operator $F$ ((iv) of Lemma 2), in which case (4) is the relaxed inexact PPA. Let $z^\star \in \mathrm{fix}\,T = \mathrm{zer}\,F$ and suppose that $0 \in \mathrm{zer}\,F$. If $F$ is metrically sub-regular at $z^\star$ for $0$ with modulus $\gamma$, then

\[ d(z, \mathrm{zer}\,F) = d(z, \mathrm{fix}\,T) \le \gamma\, d(0, Fz) = \gamma\, d(0, T^{-1}z - z) = \gamma \inf_{v \in T^{-1}z} \|v - z\|, \quad \forall z \in \mathcal{Z}. \]

Thus for all $w$ such that $z = Tw \in \mathcal{Z}$, applying the previous inequality and using the triangle inequality, we get

\[ d(Tw, \mathrm{fix}\,T) \le \gamma \|T' w\| \quad \text{and} \quad d(w, \mathrm{fix}\,T) \le (1 + \gamma)\|T' w\|. \]

Clearly, this is closely related, though not equivalent, to metric sub-regularity of $T'$.

Metric sub-regularity implies that (12) gives an estimate for how far a point $z$ is from the fixed-point set of $T$ in terms of the residual $\|z - Tz\|$. This is the rationale behind using such a regularity assumption on the operator $T'$ to quantify the convergence rate on $d(z_k, \mathrm{fix}\,T)$. Thus, starting from $z_0 \in \mathcal{H}$, and by virtue of Theorem 1, one can recover an $O(1/\sqrt{k})$ rate on $d(z_k, \mathrm{fix}\,T)$. In fact, we can do even better as shown in the following theorem. We use the shorthand notation $d_k = d(z_k, \mathrm{fix}\,T)$.

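Theorem 3 (Local convergence rate) Let $z^\star \in \mathrm{fix}\,T$, assume $T'$ is metrically sub-regular at $z^\star$ with neighborhood $\mathcal{Z}$ of $z^\star$, let $\kappa > \mathrm{subreg}(T'; z^\star|0)$, $\lambda_k \in ]0,1]$. Given a ball $\mathbb{B}_a(z^\star) \subset \mathcal{Z}$, $a \ge 0$, suppose $C_2 = \sum_{k\in\mathbb{N}} \lambda_k \|\varepsilon_k\|$ is small enough such that

\[ \mathbb{B}_{(a + C_2)}(z^\star) \subseteq \mathcal{Z}. \]

Then for any starting point $z_0 \in \mathbb{B}_a(z^\star)$, we have for all $k \in \mathbb{N}$, $d_{k+1}^2 \le \zeta_k d_k^2 + c_k$, where

\[ \zeta_k = \begin{cases} 1 - \frac{\tau_k}{\kappa^2}, & \text{if } \tau_k/\kappa^2 \in ]0,1], \\ \frac{\kappa^2}{\kappa^2 + \tau_k}, & \text{otherwise} \end{cases} \in [0,1[, \tag{13} \]

and $c_k = \nu_1 \lambda_k \|\varepsilon_k\|$. Moreover,

(i) $d_k$ converges to $0$ if $(\tau_k)_{k\in\mathbb{N}} \notin \ell_+^1$.

(ii) Let $\chi_k = \prod_{j=0}^{k} \zeta_j$; if $\chi = \limsup_{k\to+\infty} \sqrt[k]{\chi_k} < 1$, then $(d_k^2)_{k\in\mathbb{N}} \in \ell_+^1$. When $\varepsilon_k = 0$, then $\lim_{k\to+\infty} \sqrt[k]{d_k} < 1$, which is $R$-linear convergence.

(iii) If $0 < \inf_{k\in\mathbb{N}} \lambda_k \le \sup_{k\in\mathbb{N}} \lambda_k < 1$, then there exists $\zeta \in (0,1)$ such that

\[ d_{k+1}^2 \le \zeta^k \Big( d_0^2 + \sum_{j=0}^{k} \zeta^{-j+1} c_j \Big). \]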

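Proof Let constants $b > a > 0$ be such that $T'$ is metrically sub-regular at $z^\star$ with modulus $\kappa$. Make the radius $a$ smaller if necessary so that $\mathbb{B}_a(z^\star) \subset \mathbb{B}_b(z^\star) \subseteq \mathcal{Z}$ and

\[ a + C_2 \le b. \]

Pick $z_0 \in \mathbb{B}_a(z^\star)$. If $z_0 = z^\star$, then take $z_k = z^\star$ for all $k \in \mathbb{N}$ and there is nothing more to prove. If not, then it follows from Lemma 6 that for any $\tilde{z} \in \mathrm{fix}\,T$,

\[ \|z_{k+1} - \tilde{z}\| \le \|z_k - \tilde{z}\| + \lambda_k \|\varepsilon_k\| \le \cdots \le \|z_0 - \tilde{z}\| + \sum_{j=0}^{k} \lambda_j \|\varepsilon_j\| \le a + C_2 \le b, \]

which implies that starting from any point $z_0 \in \mathbb{B}_a(z^\star)$, $z_k \in \mathbb{B}_b(z^\star)$ holds for all $k \in \mathbb{N}$. Now for all $k \in \mathbb{N}$, let $\tilde{z} \in \mathrm{fix}\,T$ be such that $d_k = \|z_k - \tilde{z}\|$ and $c_k = \nu_1 \lambda_k \|\varepsilon_k\|$; then by virtue of the metric sub-regularity of $T'$ and Lemma 6, we have

\[ d_{k+1}^2 \le \|z_{k+1} - \tilde{z}\|^2 \le \|z_k - \tilde{z}\|^2 - \tau_k \|T' z_k - T' \tilde{z}\|^2 + c_k \le d_k^2 - \frac{\tau_k}{\kappa^2} d_k^2 + c_k \tag{14} \]

\[ \le d_k^2 - \frac{\tau_k}{\kappa^2}(d_{k+1}^2 - c_k) + c_k = d_k^2 - \frac{\tau_k}{\kappa^2} d_{k+1}^2 + \frac{1}{\zeta_k} c_k. \tag{15} \]

If $\tau_k/\kappa^2 \in ]0,1[$, then from (14) we have $d_{k+1}^2 \le (1 - \frac{\tau_k}{\kappa^2}) d_k^2 + c_k$, or if $1 \le \tau_k/\kappa^2$, (15) produces $d_{k+1}^2 \le \frac{\kappa^2}{\kappa^2 + \tau_k} d_k^2 + c_k$; $\lambda_k \in ]0,1]$ ensures $\kappa^2/(\kappa^2 + \tau_k) \in ]0,1]$. Therefore, we have

\[ \zeta_k = \begin{cases} 1 - \frac{\tau_k}{\kappa^2}, & \text{if } \tau_k/\kappa^2 \in ]0,1[, \\ \frac{\kappa^2}{\kappa^2 + \tau_k}, & \text{if } 1 \le \tau_k/\kappa^2 \end{cases} \in ]0,1]. \]

Furthermore,

\[ d_{k+1}^2 \le \zeta_k d_k^2 + c_k \le \cdots \le \chi_k d_0^2 + \sum_{j=0}^{k} \varphi_{k-j} c_j \le \chi_k d_0^2 + \sum_{j=0}^{k} c_j, \tag{16} \]

where $\chi_k = \prod_{j=0}^{k} \zeta_j$ and $\varphi_{k-j} = \prod_{\ell=j+1}^{k} \zeta_\ell$.

(i) From (16) we have $d_{k+1}^2 \le d_k^2 + c_k$, hence $d_k^2 \to d \ge 0$ ([57, Lemma 2.2.2]). If $(\tau_k)_{k\in\mathbb{N}} \notin \ell_+^1$, then $\|e_k\| \to 0$ (Theorem 1), and by metric sub-regularity we have $d_k \le \kappa \|e_k\|$, therefore $d = 0$;

(ii) If $\chi = \limsup_{k\to+\infty} \sqrt[k]{\chi_k} < 1$, then $\lim_{k\to+\infty} \chi_k = 0$ and $(\chi_k)_{k\in\mathbb{N}}, (\varphi_k)_{k\in\mathbb{N}} \in \ell_+^1$. Since also $(c_k)_{k\in\mathbb{N}} \in \ell_+^1$, the convolution $\big(\sum_{j=0}^{k} \varphi_{k-j} c_j\big)_{k\in\mathbb{N}} \in \ell_+^1$; as a result, $\big(\chi_k d_0^2 + \sum_{j=0}^{k} \varphi_{k-j} c_j\big)_{k\in\mathbb{N}} \in \ell_+^1$ and so is $(d_k^2)_{k\in\mathbb{N}}$. If $\varepsilon_k = 0$, then from (16) we have $\lim_{k\to+\infty} \sqrt[k]{d_k} \le \limsup_{k\to+\infty} \sqrt[k]{\chi_k} < 1$.

(iii) If $0 < \inf_{k\in\mathbb{N}} \lambda_k \le \sup_{k\in\mathbb{N}} \lambda_k < 1$, then there exists $\zeta = \sup_{k\in\mathbb{N}} \kappa^2/(\kappa^2 + \tau_k) < 1$, which concludes the result.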

Remark 4

(i) When the fixed point is a singleton, Theorem 3 holds replacing $d_k$ by $\|z_k - z^\star\|$.

(ii) For simplicity, suppose the iteration is exact, and let $z^\star \in \mathrm{fix}\,T$ be such that $d_k = \|z_k - z^\star\|$. Then we have

\[ \|e_k\|^2 = \|z_k - z^\star + T z^\star - T z_k\|^2 \le 4 d_k^2, \]

which means that locally, $\|e_k\|$ also converges linearly to $0$ given that $(\tau_k)_{k\in\mathbb{N}} \notin \ell_+^1$.

(iii) As far as the claim in (iii) of Theorem 3 is concerned, if $\exists \xi \in ]0,1]$ such that $c_k = O(\xi^k)$, then

(a) If $\xi < \zeta$, then $d_{k+1}^2 = O(\zeta^k)$;

(b) If $\xi = \zeta$, then $d_{k+1}^2 = O(k\zeta^k) = o\big((\zeta + \delta)^k\big)$, $\delta > 0$;

(c) If $\xi > \zeta$, then $d_{k+1}^2 = o(\xi^k)$.

Theorem 3 extends readily to the $\alpha$-averaged case.

Corollary 4 If $T$ is $\alpha$-averaged, then Theorem 3 holds substituting $\lambda_k \alpha$ for $\lambda_k$ and $\kappa\alpha$ for $\kappa$.

Remark 5

(i) When $\lambda_k \equiv \lambda$ and $\varepsilon_k \equiv 0$, our second rate estimate in (13) encompasses that of [8, Lemma 3.8].

(ii) Equivalent characterizations of metric sub-regularity can be given, for instance in terms of derivative criteria. In particular, as $T'$ is single-valued, metric sub-regularity of $T'$ holds if $T'$ is differentiable on a neighbourhood of $z^\star$ with non-singular derivatives at $z$ around $z^\star$, and the operator norms of their inverses are uniformly bounded [24, Theorem 1.2]. Computing the metric regularity modulus $\kappa$ is however far from obvious in general, even for the differentiable case.


Examples

DRS with two subspaces in $\mathbb{R}^2$ Let $U$ and $V$ be two subspaces in $\mathbb{R}^2$ forming an angle of $\theta \in ]0, \pi/2]$. Consider the problem of finding $U \cap V = \{0\}$ using DRS. It is well known that the DRS fixed-point operator $T_{\mathrm{DRS}}$ is firmly non-expansive. Moreover, following [8, Example 2.3 and 2.5], it can be shown that $T'_{\mathrm{DRS}} = \mathrm{Id} - T_{\mathrm{DRS}}$ is metrically sub-regular with modulus $1/\sin\theta$. It then follows from (13) that the rate estimate is

$$\zeta_k = 1 - (2-\lambda_k)\lambda_k\sin^2\theta = (1-\lambda_k)^2 + (2-\lambda_k)\lambda_k\cos^2\theta \in ]0,1[, \quad \text{for } \lambda_k \in ]0,2[.$$

This is exactly the optimal rate estimate provided in [42,5].

Observe that, as remarked in [42], the best rate $\zeta_k \equiv \cos^2\theta$ is obtained for $\lambda_k \equiv 1$ (no relaxation).
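
The rate estimate above can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $U = \mathrm{span}\{(1,0)\}$ and $V = \mathrm{span}\{(\cos\theta, \sin\theta)\}$, runs the exact iteration $z_{k+1} = z_k + \lambda(T_{\mathrm{DRS}} z_k - z_k)$ with a constant relaxation $\lambda$, and uses $d_k = \|z_k\|$ since the fixed-point set is $\{0\}$. The function name `drs_rate_check` and the starting point are hypothetical choices, not taken from the paper.

```python
import math

def drs_rate_check(theta, lam, iters=20):
    """Relaxed DRS on two lines in R^2; returns (worst observed ratio, predicted zeta)."""
    c, s = math.cos(theta), math.sin(theta)

    def proj_u(x):
        # Projection onto U = span{(1, 0)} (assumed choice of U).
        return (x[0], 0.0)

    def proj_v(x):
        # Projection onto V = span{(cos theta, sin theta)} (assumed choice of V).
        t = x[0] * c + x[1] * s
        return (t * c, t * s)

    def reflect(proj, x):
        # Reflection R = 2P - Id.
        p = proj(x)
        return (2.0 * p[0] - x[0], 2.0 * p[1] - x[1])

    def t_drs(x):
        # T_DRS = (R_U R_V + Id) / 2.
        r = reflect(proj_u, reflect(proj_v, x))
        return (0.5 * (r[0] + x[0]), 0.5 * (r[1] + x[1]))

    z = (1.0, 2.0)  # arbitrary nonzero starting point
    ratios = []
    for _ in range(iters):
        tz = t_drs(z)
        z_new = (z[0] + lam * (tz[0] - z[0]), z[1] + lam * (tz[1] - z[1]))
        # Ratio d_{k+1}^2 / d_k^2, with d_k = ||z_k|| because fix T = {0}.
        ratios.append((z_new[0] ** 2 + z_new[1] ** 2) / (z[0] ** 2 + z[1] ** 2))
        z = z_new

    predicted = 1.0 - (2.0 - lam) * lam * s ** 2
    return max(ratios), predicted

if __name__ == "__main__":
    for lam in (0.5, 1.0, 1.5):
        print(lam, drs_rate_check(math.pi / 3, lam))
```

In this planar setting the relaxed operator is a scaled rotation, so the observed ratio should agree with $\zeta$ at every iteration, not only asymptotically. Choosing $\theta = \pi/2$ with $\lambda = 1$ sends the iterate to $0$ in one step and should be avoided in this sketch, since the ratio is then undefined.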

Gradient descent We now consider the example of minimizing a convex function $f: \mathbb{R}^n \to \mathbb{R}$ which is locally $C^2$ and strongly convex on $\Omega \subset \mathbb{R}^n$ and has a Lipschitz gradient. Clearly $f$ has a unique minimizer and, without loss of generality, we denote it as $0$. Define $\delta_m, \delta_M$ as the local strong convexity and Lipschitz moduli of $f$ respectively; then $\frac{1}{\delta_m}$ is equivalent to metric sub-regularity of $\nabla f$ at $0$ for $0$ [1, Theorem 3.5].

For simplicity, consider the non-relaxed gradient descent method [57] for minimizing $f$ with constant step-size, i.e.

$$x_{k+1} = x_k - \gamma\nabla f(x_k),$$

where $\gamma \in ]0, \frac{2}{\delta_M}[$. This can be cast in our above framework by setting $T = \mathrm{Id} - \gamma\nabla f$, which is $\frac{\gamma\delta_M}{2}$-averaged (Lemma 4). Hence $T' \overset{\mathrm{def}}{=} \mathrm{Id} - T = \gamma\nabla f$, which is continuously differentiable on any neighbourhood $Z \subset \Omega$. For any $x \in Z$, the Jacobian of $T'$ is just $\gamma\nabla^2 f(x)$, where $\nabla^2 f(x)$ is the Hessian of $f$ at $x$, which is non-singular and its inverse is uniformly bounded by $\frac{1}{\delta_m}$ from local strong convexity. Thus, by virtue of [25, Theorem 4B.1], $T'$ is metrically regular, hence sub-regular, and the metric regularity modulus $\kappa$ is precisely $\frac{1}{\gamma\delta_m}$. In fact we could have anticipated this directly from the local strong monotonicity of $\nabla f$. Specializing the rate of Theorem 3, we get

$$\zeta = 1 - \frac{t(2-t)}{\mathrm{cnd}^2} \in [0,1[,$$

where we set $t = \gamma\delta_M \in ]0,2[$, and $\mathrm{cnd} = \delta_M/\delta_m$ can be seen as the condition number of the Hessian of $f$ along $\Omega$. It is obvious that the smallest $\zeta$³ is attained for $t = 1$, i.e. $\gamma = 1/\delta_M$.
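
As a sanity check, this expression can be recovered from Corollary 4 under two assumptions about statements that lie outside this section: that $\tau_k = \lambda_k(1-\lambda_k)$, and that the second rate estimate in (13) has the form $\zeta_k = 1 - \tau_k/\kappa^2$ (both are consistent with the DRS example above). With $\alpha = \gamma\delta_M/2 = t/2$, $\lambda_k \equiv 1$ and $\kappa = 1/(\gamma\delta_m)$, substituting $\lambda_k\alpha$ for $\lambda_k$ and $\kappa\alpha$ for $\kappa$ gives

$$\tau = \frac{t}{2}\Big(1 - \frac{t}{2}\Big) = \frac{t(2-t)}{4}, \qquad \kappa\alpha = \frac{1}{\gamma\delta_m}\cdot\frac{\gamma\delta_M}{2} = \frac{\mathrm{cnd}}{2}, \qquad \zeta = 1 - \frac{t(2-t)/4}{\mathrm{cnd}^2/4} = 1 - \frac{t(2-t)}{\mathrm{cnd}^2}.$$

Since $t(2-t) = 1 - (1-t)^2$ is maximal at $t = 1$, the estimate is smallest there, with value $1 - 1/\mathrm{cnd}^2$.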

The observed and theoretical convergence profiles of $\|e_k\| = \|x_k - x_{k+1}\|$ are illustrated in Figure 1, where gradient descent was run with $\gamma \in \{\frac{1}{2\delta_M}, \frac{1}{\delta_M}\}$. As predicted by our result, the convergence profile exhibits two regimes: a global sub-linear one, and then a local linear one.

[Figure: $\|e_k\|$ versus iteration $k$. Legend: global $1/\sqrt{k}$ bound; local (theoretical) and practical (observed) profiles for $\gamma = \frac{1}{2\delta_M}$ and for $\gamma = \frac{1}{\delta_M}$.]

Fig. 1 Global and local convergence profiles of gradient descent to minimize $f$ with $(\delta_m, \delta_M) = (0.8, 1)$. For $\gamma = \frac{1}{2\delta_M}$, the observed and theoretical rates are respectively $0.60$ and $0.72$. For $\gamma = \frac{1}{\delta_M}$, they are $0.20$ and $0.60$ respectively.
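
The rates quoted in the caption can be reproduced with a short script. This is a sketch under assumptions, not the experiment of the paper: the theoretical rate for $\|e_k\|$ is taken to be $\sqrt{\zeta}$ (because $\zeta$ bounds the contraction of $d_k^2$), and the objective is a hypothetical quadratic whose Hessian is $\mathrm{diag}(0.8, 1)$, which only exhibits the local linear regime. The function names are illustrative.

```python
import math

def theoretical_rate(gamma, delta_m, delta_M):
    """sqrt(zeta) with zeta = 1 - t(2 - t) / cnd^2, t = gamma * delta_M."""
    t = gamma * delta_M
    cnd = delta_M / delta_m
    zeta = 1.0 - t * (2.0 - t) / cnd ** 2
    return math.sqrt(zeta)

def observed_rate(gamma, hess_diag, iters=100):
    """Run x_{k+1} = x_k - gamma * H x_k for f(x) = 0.5 * x^T H x, H diagonal
    (assumed toy objective), and return the last ratio ||e_{k+1}|| / ||e_k||."""
    x = [1.0] * len(hess_diag)
    prev_err = None
    ratio = None
    for _ in range(iters):
        x_new = [xi - gamma * h * xi for xi, h in zip(x, hess_diag)]
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_new)))
        if prev_err is not None and prev_err > 0.0:
            ratio = err / prev_err
        prev_err = err
        x = x_new
    return ratio

if __name__ == "__main__":
    delta_m, delta_M = 0.8, 1.0
    for gamma in (1.0 / (2.0 * delta_M), 1.0 / delta_M):
        print(gamma,
              round(observed_rate(gamma, (delta_m, delta_M)), 2),
              round(theoretical_rate(gamma, delta_m, delta_M), 2))
```

For $(\delta_m, \delta_M) = (0.8, 1)$ the formula gives $\sqrt{0.52} \approx 0.72$ and $\sqrt{0.36} = 0.60$, and on this quadratic the observed ratios are $0.60$ and $0.20$, in agreement with the values reported in Fig. 1.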

3 One may observe that $\gamma$ can be modified locally, once the iterates enter the appropriate neighbourhood, to get the usual optimal linear rate of gradient descent with a strongly convex objective [57]. But this is not our aim here.


5 Applications

In this section, we apply the obtained results to conduct a quantitative convergence analysis of a class of monotone operator splitting methods in the literature, and mainly focus on the global iteration-complexity bounds. As stated in the introduction, we will rely on the fact that all the considered iterative schemes (GFB/FBS, DRS/ADMM and PDS) can be cast as a Krasnosel’skiĭ–Mann iteration. Furthermore, based on the structure of the corresponding monotone inclusion problem, we also derive specific error criteria which can serve as termination tests.

5.1 Generalized Forward–Backward splitting

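The GFB algorithm [58] addresses the monotone inclusion (1) when $L_i = \mathrm{Id}$, $i \in \llbracket 1, n \rrbracket$, and $B$ is $\beta$-cocoercive. By lifting the problem into a product space, GFB achieves full splitting by applying the $A_i$'s implicitly and $B$ explicitly.

Let $(\omega_i)_i \in ]0,1[^n$ be such that $\sum_{i=1}^n \omega_i = 1$, $\gamma \in ]0, 2\beta[$, $\lambda_k \in ]0, \frac{4\beta-\gamma}{2\beta}[$, and let $(b_k)_{k\in\mathbb{N}}$ and $(a_{i,k})_{k\in\mathbb{N}}$ be error terms. The GFB iteration reads

$$\begin{cases} \text{For } i = 1, \dots, n: \\ \quad z_{i,k+1} = z_{i,k} + \lambda_k\Big( J_{\frac{\gamma}{\omega_i}A_i}\big(2x_k - z_{i,k} - \gamma Bx_k + b_k\big) + a_{i,k} - x_k \Big), \\ x_{k+1} = \sum_{i=1}^n \omega_i z_{i,k+1}. \end{cases} \qquad (17)$$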

GFB recovers FBS for $n = 1$, and when $B = 0$, GFB recovers DRS in the product space. In the literature, the convergence properties of FBS and DRS have been extensively studied, see [6] and references therein. In this section, we mainly focus on the GFB method.
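
To make iteration (17) concrete, here is a minimal sketch of the exact version ($b_k = 0$, $a_{i,k} = 0$) on a toy problem chosen purely for illustration: minimize $\frac{1}{2}\|x - y\|^2 + \mu\|x\|_1 + \iota_{\{x \ge 0\}}(x)$, so that $n = 2$, $B = \nabla\big(\frac{1}{2}\|\cdot - y\|^2\big)$ is $1$-cocoercive, and the resolvents are soft-thresholding and projection onto the non-negative orthant. The problem, the weights $\omega_i = 1/2$ and all function names are assumptions of this sketch, not part of the paper.

```python
def soft_threshold(v, thr):
    """Resolvent of thr * d||.||_1, applied coordinate-wise."""
    return [max(abs(vi) - thr, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]

def project_nonneg(v):
    """Resolvent of the normal cone of {x >= 0} (independent of the step size)."""
    return [max(vi, 0.0) for vi in v]

def gfb(y, mu=1.0, gamma=1.0, lam=1.0, iters=200):
    """Exact GFB iteration (17) for the assumed toy problem, with beta = 1.
    Requires gamma in ]0, 2[ and lam in ]0, (4 - gamma) / 2[."""
    omegas = [0.5, 0.5]
    # J_{(gamma / omega_i) A_i} for A_1 = mu * d||.||_1 and A_2 = N_{x >= 0}.
    resolvents = [
        lambda v: soft_threshold(v, mu * gamma / omegas[0]),
        project_nonneg,
    ]

    def B(x):
        # Gradient of 0.5 * ||x - y||^2.
        return [xi - yi for xi, yi in zip(x, y)]

    dim = len(y)
    zs = [[0.0] * dim for _ in omegas]
    x = [0.0] * dim
    for _ in range(iters):
        Bx = B(x)
        for i, resolvent in enumerate(resolvents):
            arg = [2.0 * x[j] - zs[i][j] - gamma * Bx[j] for j in range(dim)]
            u = resolvent(arg)
            zs[i] = [zs[i][j] + lam * (u[j] - x[j]) for j in range(dim)]
        # x_{k+1} = sum_i omega_i z_{i,k+1}
        x = [sum(w * z[j] for w, z in zip(omegas, zs)) for j in range(dim)]
    return x

if __name__ == "__main__":
    print(gfb([3.0, -1.0, 0.5]))
```

For this toy problem the minimizer has the closed form $\max(y - \mu, 0)$ coordinate-wise, which gives a simple way to check the output of the sketch.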

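Recall the notation introduced in Section 2.3. Denote $u_{i,k+1} = J_{\frac{\gamma}{\omega_i}A_i}(2x_k - z_{i,k} - \gamma Bx_k)$, $i \in \llbracket 1, n \rrbracket$; then (17) equivalently reads

$$z_{i,k+1} = z_{i,k} + \lambda_k(u_{i,k+1} + \varepsilon_{i,k} - x_k),$$

where $\varepsilon_{i,k} = \big(J_{\frac{\gamma}{\omega_i}A_i}(2x_k - z_{i,k} - \gamma Bx_k + b_k) + a_{i,k}\big) - u_{i,k+1}$. Let $\varepsilon_{1,k} = C(b_k)$, $\varepsilon_{2,k} = (a_{i,k})_i$, and

$$T_{1,\gamma} = \tfrac{1}{2}\big(R_{\gamma A} R_{S} + \mathrm{Id}\big), \qquad T_{2,\gamma} = \mathrm{Id} - \gamma B_{S}.$$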

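Using Remark 1, the composed operator $T_\gamma = T_{1,\gamma} \circ T_{2,\gamma}$ is $\frac{2\beta}{4\beta-\gamma}$-averaged. Moreover, it was shown in [58] that the fixed-point iteration of GFB reads

$$\begin{aligned} z_{k+1} &= z_k + \lambda_k\big(T_{1,\gamma}(T_{2,\gamma} z_k + \varepsilon_{1,k}) + \varepsilon_{2,k} - z_k\big) \\ &= z_k + \lambda_k(T_\gamma z_k + \varepsilon_k - z_k), \end{aligned} \qquad (18)$$

where $\varepsilon_k = \big(T_{1,\gamma}(T_{2,\gamma} z_k + \varepsilon_{1,k}) + \varepsilon_{2,k}\big) - T_\gamma z_k$.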

Obviously, (18) is exactly in the form of (4). Therefore, the GFB iterates converge weakly [58, Theorem 4.1], and obey the iteration-complexity bounds in Theorems 1 and 2. Moreover, we can establish certain convergence rates for the structured monotone inclusion (1).

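Let $g_{k+1} = \frac{1}{\gamma}x_k - Bx_k - \frac{1}{\gamma}\sum_i \omega_i u_{i,k+1}$. Define $\bar u_{i,k+1} = \frac{1}{k+1}\sum_{j=0}^{k} u_{i,j+1}$, $i \in \llbracket 1, n \rrbracket$, $\bar x_k = \frac{1}{k+1}\sum_{j=0}^{k} x_j$ and $\bar g_{k+1} = \frac{1}{\gamma}\bar x_k - B\bar x_k - \frac{1}{\gamma}\sum_i \omega_i \bar u_{i,k+1}$.

Proposition 2 We have $g_{k+1} \in \sum_i A_i u_{i,k+1}$. Moreover,

(i) If $0 < \inf_{k\in\mathbb{N}} \lambda_k \le \sup_{k\in\mathbb{N}} \lambda_k < \frac{4\beta-\gamma}{2\beta}$, $\big((k+1)\|b_k\|\big)_{k\in\mathbb{N}} \in \ell^1_+$ and, $\forall i \in \llbracket 1, n \rrbracket$, $\big((k+1)\|a_{i,k}\|\big)_{k\in\mathbb{N}} \in \ell^1_+$, then

$$d\Big(0, \sum_i A_i u_{i,k+1} + B\big(\textstyle\sum_i \omega_i u_{i,k+1}\big)\Big) \le \frac{1}{\gamma}\sqrt{\frac{d_0^2 + C_1}{\tau(k+1)}}.$$

(ii) If $\lambda = \inf_{k\in\mathbb{N}} \lambda_k > 0$, $(\lambda_k\|b_k\|)_{k\in\mathbb{N}} \in \ell^1_+$ and $(\lambda_k\|a_{i,k}\|)_{k\in\mathbb{N}} \in \ell^1_+$, $\forall i \in \llbracket 1, n \rrbracket$, then

$$\Big\|\bar g_{k+1} + B\big(\textstyle\sum_i \omega_i \bar u_{i,k+1}\big)\Big\| \le \frac{2(d_0 + C_2)}{\gamma\lambda(k+1)}.$$

$C_1, C_2$ are the constants in Theorems 1 and 2 respectively.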
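
Since $g_{k+1} \in \sum_i A_i u_{i,k+1}$, the norm of $g_{k+1} + B(\sum_i \omega_i u_{i,k+1})$ is a computable upper bound on the distance appearing in (i), which is what makes it usable as a termination test. The sketch below shows one way such a residual could be evaluated from quantities already available in a GFB iteration. The function name `gfb_residual_norm`, the list-based vector representation and the callable `B` are assumptions of this illustration.

```python
import math

def gfb_residual_norm(x, us, omegas, gamma, B):
    """Return || g + B(u_bar) || with
       u_bar = sum_i omega_i * u_i  and  g = x / gamma - B(x) - u_bar / gamma.
    x      : current point x_k (list of floats)
    us     : list of the points u_{i,k+1}, one list of floats per operator A_i
    omegas : weights omega_i summing to 1
    B      : callable implementing the cocoercive operator (hypothetical interface)
    """
    dim = len(x)
    u_bar = [sum(w * u[j] for w, u in zip(omegas, us)) for j in range(dim)]
    Bx, Bu = B(x), B(u_bar)
    residual = [(x[j] - u_bar[j]) / gamma - Bx[j] + Bu[j] for j in range(dim)]
    return math.sqrt(sum(r * r for r in residual))

if __name__ == "__main__":
    y = [0.0]
    B = lambda v: [vi - yi for vi, yi in zip(v, y)]  # gradient of 0.5 * ||v - y||^2
    print(gfb_residual_norm([1.0], [[0.0], [1.0]], [0.5, 0.5], 0.5, B))
```

A stopping rule would then compare this value with a user-chosen tolerance. The residual vanishes when all $u_{i,k+1}$ coincide with $x_k$, that is, when the iteration has reached consensus at a solution.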