HAL Id: hal-01658852
https://hal.archives-ouvertes.fr/hal-01658852
Submitted on 7 Dec 2017
HAL
is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire
HAL, estdestinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Convergence Rates with Inexact Non-expansive Operators
Jingwei Liang, Jalal M. Fadili, Gabriel Peyré
To cite this version:
Jingwei Liang, Jalal M. Fadili, Gabriel Peyré. Convergence Rates with Inexact Non-expansive Oper-
ators. Mathematical Programming, Series A, Springer, 2016. �hal-01658852�
Noname manuscript No.
(will be inserted by the editor)
Convergence Rates with Inexact Non-expansive Operators
Jingwei Liang · Jalal Fadili · Gabriel Peyré
Received: date / Accepted: date
Abstract In this paper, we present a convergence rate analysis for the inexact Krasnosel’ski˘ı–Mann iteration built from non-expansive operators. The presented results include two main parts: we first establish the global point- wise and ergodic iteration-complexity bounds; then, under a metric sub-regularity assumption, we establish a local linear convergence for the distance of the iterates to the set of fixed points. The obtained results can be applied to analyze the convergence rate of various monotone operator splitting methods in the literature, including the Forward–Backward splitting, the Generalized Forward–Backward, the Douglas–Rachford splitting, alternating di- rection method of multipliers (ADMM) and Primal–Dual splitting methods. For these methods, we also develop easily verifiable termination criteria for finding an approximate solution, which can be seen as a generalization of the termination criterion for the classical gradient descent method. We finally develop a parallel analysis for the non-stationary Krasnosel’ski˘ı–Mann iteration.
Keywords Krasnosel’ski˘ı–Mann iteration·Monotone inclusion·Non-expansive operator·Convergence rates· Asymptotic regularity·Convex optimization
Mathematics Subject Classification (2000) 47H05·47H09·90C25
1 Introduction
1.1 Monotone inclusion and operator splitting methods
In various fields of science and engineering, many problems can be cast as solving a structured monotone inclusion problem. A prototypical example that has attracted a wave of interest recently, see e.g. [65,19,58], takes the form
Findx∈ H such that 0∈Bx+Pn
i=1L∗i ◦Ai◦Lix, (1)
whereHis a real Hilbert space,Bis cocoercive,Aiis a set-valued maximal monotone operator acting on a real Hilbert spaceGi, andLi:H → Giis a bounded linear operator. Even more complex forms (e.g. with parallel sums (24)) will be discussed in detail in Section5.
Since the first operator splitting method developed in the 70’s for solving structured monotone inclusion prob- lems, the class of splitting methods has been regularly enriched with increasingly sophisticated algorithms as the structure of problems to handle becomes more complex. Splitting methods are iterative algorithms which evaluate (possibly approximately) the individual operators, their resolvents, the linear operators, all separately at various points in the course of iteration, but never the resolvents of sums nor of composition by a linear operator. Popular splitting algorithms to solve special instances of (1) include the Forward–Backward splitting method (FBS) [46, 54], the Douglas–Rachford splitting method (DRS) [26,43] and Peaceman–Rachford splitting method (PRS) [55], Alternating Direction method of Multipliers (ADMM) [28,29,30,31]. Other splitting methods were designed to solve (1) or even more complex forms (e.g. with parallel sums), see for instance [15,64,61,12,14,58,19,22,65].
Jingwei Liang, Jalal Fadili
Normandie Université-ENSICAEN, GREYC UMR 6072 E-mail: {Jingwei.Liang, Jalal.Fadili}@ensicaen.fr Gabriel Peyré
CNRS and CEREMADE, Université Paris-Dauphine E-mail: [email protected]
For the operator splitting methods, many of them (including FBS, DRS, ADMM, and many others) can be cast as the Krasnosel’ski˘ı–Mann fixed-point iteration [35,44], possibly in its inexact form to handle errors,
zk+1=zk+λk(T zk+εk−zk), (2) whereT : H → H is a non-expansive operator (Definition1), andHis a real Hilbert space with norm|| · ||.
(λk)k∈N ∈[0,1], andεk is the error of approximatingT zk. The sequencezkis built in way that it converges (in general weakly [59]) to some fixed pointz?ofT, and the latter can be easily related to a solutionx?of the original problem, e.g. (1).
1.2 Contributions
We consider the inexact Krasnosel’ski˘ı–Mann iteration (2), and assume that the set of fixed points is non-empty. It is known that a crucial step in proving the convergence of the iterates is to show that||zk−T zk|| →0, a property known as asymptotic regularity ofT [53]. In this paper, we show that even with errors, the global pointwise and ergodic iteration-complexity bounds, i.e. rates of asymptotic regularity, are respectivelyO(1/√
k)andO(1/k).
Then under a metric sub-regularity assumption [25], we establish a local linear convergence for the iteration in terms of the distance of the iterates to the set of fixed points. Of course, when the fixed point is unique, the sequence itself converges linearly to it.
Our result can be applied to analyze the convergence behaviour of the iterates generated by various monotone operator splitting methods. As stated above, one crucial property of these methods is that they have the equivalent fixed-point formulation (2)1. Such methods include the FBS, the Generalized Forward–Backward (GFB) [58], DRS, ADMM, and several Primal–Dual splitting (PDS) methods [65,16]. In particular, for the GFB method developed by two of the co-authors, which addresses the case whenLi’s in (1) equal to identity, we demonstrate thatO(1/) iterations are needed to find a pair((ui)i, g)with the termination criterion||g+B(Piωiui)||2 ≤ , whereg ∈ P
iAi(ui). This termination criterion can be viewed as a generalization of the classical one based on the norm of the gradient for the gradient descent method [51]. The iteration-complexity improves toO(1/√
)in ergodic sense for the same termination criterion. Similar interpretation is also provided for the DRS/ADMM and PDS considered in Section5.
We finally study the convergence and rates of the non-stationary version of the Krasnosel’ski˘ı–Mann iteration.
By absorbing the non-stationarity appropriately into an additional error term, we show that the iteration-complexity bounds developed above remain valid under reasonable conditions. The obtained result is illustrated on the GFB method.
1.3 Related work
1.3.1 Global iteration-complexity bounds
Relation with [3,21] Suppose thatT : C → C, whereCis a convexcompactsubset ofH, andεk ≡0(i.e. exact case). In [3], the authors conjectured that||zk−T zk|| ≤diam(C)/q
πPkj=1λj(1−λj)wherediam(C)is the diameter ofC, and proved it forλj ≡λ. This conjecture was settled in [21, Theorem 1]2. Our work differs from these in several aspects. For instance, we consider the inexact case without any assumption on boundedness of C. We also establish both the pointwise and ergodic convergence rates, as well as local linear convergence under metric sub-regularity.
However, the extension of the result of [21] to the inexact iteration seems quite intricate. The main reason is that their method of proof relies on the recursive bound in [21, Corollary 3], and exploits some properties of some special functions and an identity for Catalan numbers. These recursions are unfortunately not stable to errors.
Relation with [33] The authors in [33] studied the exact version of DRS to solve (1), i.e.B = 0, n = 2and Li= Id, i= 1,2. They also supposed the extra assumption thatA2is single-valued. Let
ek =zk−JγA1(Id−γA2)zk, (3)
1 In fact, in many cases the fixed-point operatorTis evenα-averaged, see Definition1.
2 The authors consider the case whereHis any normed space.
whereJγA1 = (Id +γA1)−1is the resolvent ofA1. Relying on firm non-expansiveness of the resolvent [6], [33]
have shown that||ek||=O(1/√ k).
In fact, it is easy to show that (3) is equivalent to
ek =zk−12 (2JγA1−Id)(2JγA2−Id) + Id zk,
where single-valuedness ofA2is not needed whatsoever. The operator12((2JγA1−Id)(2JγA2−Id) + Id)is firmly non-expansive, and thus fits in our framework. Our results in Section3go much beyond this by considering a more general iterative scheme with an operator that is only non-expansive and may be evaluated approximately.
Relation with HPE Based on the enlargement of maximal monotone operators, in [62], a hybrid proximal extra- gradient method (HPE) is introduced to solve monotone inclusion problems of the form
Findx∈ H such that 0∈Ax.
The HPE framework encompasses some splitting algorithms in the literature [48]. The convergence of HPE is established in [62] and in [13] for its inexact version. The pointwise and ergodic iteration-complexities of the exact HPE on a similar error criterion as in our work are established in [48]. Some of the splitting methods we consider in Section5are also covered by HPE, hence our iteration-complexity bounds coincide with those of HPE, but only for the ergodic case. While for the pointwise case, our bound is uniform and theirs is not (see further discussion in Remark3).
1.3.2 Local linear convergence
Relation with [38,40] In [38], local linear convergence of the distance to the set of zeros of a maximal monotone operator using the exact proximal point algorithm (PPA [45,60]) is established by assuming metric sub-regularity of the operator. Local convergence rate analysis of PPA under a higher-order extension of metric sub-regularity, namely metricq-sub-regularityq ∈]0,1], is conducted in [40]. In our work, metric sub-regularity is assumed on Id−TwithT being the fixed-point operator, i.e. the resolvent, rather than the maximal monotone operator in the case of PPA. Relation between metric sub-regularity of these operators is intricate in general and is beyond the scope of this paper, though we provide an instructive discussion for a simple case at the end of Section4. Note also that the work of [38,40] considers PPA only in its classical form, i.e. without errors nor relaxation.
Relation with [8] While the first version of this paper was submitted, we became aware of the recent work of [8].
These authors considered (2) withλk ≡λ∈[0,1[andεk ≡0. In [8, Lemma 3.8], they established local linear convergence under bounded linear regularity ofT (see [8, Definition 2.1]). The latter coincides exactly with our metric sub-regularity assumption onId−T (see (12)). Our result is however more general as it covers the inexact case andλk is allowed to be iteration-dependent.
Local linear rate for feasibility problems In finite dimension, based on strong regularity, [39] proves a local linear convergence of the Method of Alternating Projections (MAP) in the non-convex setting, where the sets are closed and one of which is suitably regular. The linear rate is associated with a modulus of regularity. This is refined later in [7]. In [34], the authors develop a local linear convergence results for the MAP and DRS to solve non-convex feasibility problems, see also [56] for an even more general setting. Their analysis relies on a local version of firm non-expansiveness together with a coercivity condition. It turns out that this coercivity condition holds for mapping T for which the fixed points are isolated andId−Tis metrically regular. The linear rate they establish, however, imposes a bound on the metric regularity modulus.
Other local linear rates with DRS For the case of a sphere intersecting a line, or more generally a proper affine subset, typically inR2, [9] establishes local linear convergence of DRS. Local linear convergence of (the relaxed) DRS to solve the affine constraint`1-minimization problem (basis pursuit) is shown in [23]. For this particular instance, their regularity assumptions can be related to our metric sub-regularity assumption involving the DRS fixed-point operator. Given the level of details this relation requires, we defer it (in an even more general setting) to a forthcoming paper, hence we do not discuss this further.
1.4 Paper organization
The organization of the paper is as follows. In Section2we introduce our main notations and recall some necessary material on monotone operator theory. Global iteration-complexity bounds and local convergence rate of the inexact Krasnosel’ski˘ı–Mann iteration are established in Section3and Section4 respectively. In Section5, illustrative examples of existing monotone operator splitting methods to which our iteration-complexity result can be applied are described. Extension to the non-stationary case is discussed in Section6.
2 Preliminaries
Throughout the paper,Nis the set of non-negative integers andHis a real Hilbert space with scalar producth·,·i, norm|| · ||.Iddenotes the identity operator onH. The sub-differential of a proper functionf :H →]− ∞,+∞]
is the set-valued operator,
∂f :H →2H, x7→
g∈ H|(∀y∈ H), hy−x, gi+f(x)≤f(y) .
Γ0(H)denotes the class of proper, lower semi-continuous, convex functions fromHto]− ∞,+∞]. Iff ∈Γ0(H), thenproxf denotes the Moreau proximity operator [50], and the Moreau envelope of indexδ∈]0,+∞[offis the function,
δf :x7→min
y∈Hf(y) + 1
2δ||y−x||2, and its gradient isδ−1-Lipschitz continuous [49].
LetA:H →2Hbe a set-valued operator. The domain ofAisdomA={x∈ H|Ax6=∅}, the range ofAis ranA={y∈ H|∃x∈ H:y∈Ax}, the graph ofAis the setgraA={(x, y)∈ H2|y∈Ax}, the inverse ofAis the operator whose graph isgraA−1={(y, x)∈ H2|x∈A−1y}, and its zeros set iszerA={x∈ H|0∈Ax}= A−1(0).
The resolvent ofAis the operatorJA = (Id +A)−1, and the reflection operator associated toJAisRA = 2JA−Id.
We denote`1+the set of summable sequences in[0,+∞[, and define the index setJ1, nK={1,2,· · ·, n}.
A sequence(xk)k∈Nis said to convergeQ-linearly tox˜if there exists a constantr∈]0,1[such that||x||xk+1−˜x||
k−˜x|| ≤ r, and(xk)k∈Nis said to convergeR-linearly tox˜if||xk−x|| ≤˜ σkand(σk)k∈NconvergesQ-linearly to0.
2.1 Non-expansive operators and quasi-Fejér monotone sequences
Definition 1 (Non-expansive operator)An operatorT :H → His non-expansive if
∀x, y∈ H, ||T x−T y|| ≤ ||x−y||.
For anyα∈]0,1[,T isα-averaged if there exists a non-expansive operatorRsuch thatT =αR+ (1−α)Id.
We denoteA(α)the class ofα-averaged operators onH, in particularA(12)is the class of firmly non-expansive operators, whose detailed property can be found in [6, Proposition 4.2].
Lemma 1 LetT :H → Hbe anα-averaged operator, then2α1 (Id−T)∈ A(12).
Proof By definition, there exists a1-Lipschitz continuous operatorRsuch thatT =αR+ (1−α)Id, hence
1
2α(Id−T) =2α1 Id−(αR+ (1−α)Id)
= 12 Id + (−R)
∈ A(12).
We also collect below several useful equivalent characterizations of the firmly non-expansive operators.
Lemma 2 The following statements are equivalent:
(i) T is firmly non-expansive;
(ii) 2T−Idis non-expansive;
(iii) ∀x, y∈ H, ||T x−T y||2≤ hT x−T y, x−yi;
(iv) T is the resolvent of a maximal monotone operatorA, i.e.T =JA.
Proof For (i)-(iii), see [6, Proposition 4.2]; For (i)⇔(iv), see [47].
The next lemma shows that the classA(α)is closed under relaxation, convex combination and composition.
Lemma 3 Let(Ti)i∈J1,n
K be a finite family of non-expansive operators fromHto H,(ωi)i ∈]0,1]n such that P
iωi= 1, and let(αi)i∈]0,1]nsuch that, for everyi∈J1, nK, Ti∈ A(αi). Then, (i) (∀i∈J1, nK) (∀λi ∈]0,α1
i[), Id +λi(Ti−Id)∈ A(λiαi);
(ii) T1· · ·Tn∈ A(α), withα=n−1+1/(maxn
i∈J1,nKαi). (iii) Letα= maxi{αi}, thenP
iωiTiisα-averaged.
Proof See [6, Proposition 4.28, 4.30, 4.32].
Remark 1 For the composite operatorT1· · ·Tn, a sharper bound ofαcan be obtained for the casen = 2with α=α1+α1−α2−2α1α2
1α2 ∈]0,1[([52, Theorem 3]).
Definition 2 (Quasi–Fejér monotone sequences)LetΩbe a non-empty closed and convex subset ofH. A se- quence(xk)k∈Nisquasi–Fejér monotonewith respect toΩif there exists a sequence(δk)k∈N∈`1+such that
∀x∈Ω, k∈N, ||xk+1−x|| ≤ ||xk−x||+δk. 2.2 Monotone operators
Definition 3 (Monotone operator)A set-valued operatorA:H →2His monotone if
∀(x, u)∈graA
∀(y, v)∈graA
, hx−y, u−vi ≥0,
it is moreover maximally monotone ifgraAis not strictly contained in the graph of any other monotone operator.
Definition 4 (Cocoercive operator)An operatorB : H → His calledβ-cocoercive for someβ ∈]0,+∞[if βB∈ A(12), i.e.,
(∀x, y∈ H), β||Bx−By||2≤ hBx−By, x−yi.
Observe thatβ-cocoercivity implies1β-Lipschitz continuity.
Lemma 4 Letf : H →]− ∞,+∞[be a convex differentiable function, with β1-Lipschitz continuous gradient, β∈]0,+∞[. Then
(i) β∇f ∈ A(12), i.e. is firmly non-expansive;
(ii) Id−γ∇f ∈ A(2βγ )forγ∈]0,2β[;
Proof (i) See [4, Baillon–Haddad theorem]; (ii) See [6, Proposition 4.33].
2.3 Product Space
Let(ωi)i∈]0,+∞]n, considerH=Hnendowed with the scalar product and norm defined as
∀x,y∈H, hhx,yii=Pn
i=1ωihxi, yii, ||x||= q
Pn
i=1ωi||xi||2. DefineIdthe identity operator onH.
Let S = {x = (xi)i ∈ H|x1 = · · · = xn} and its orthogonal complement S⊥ =
x = (xi)i ∈ H|Pn
i=1ωixi = 0 ⊂ H. We also define the canonical isometryC : H → S, x 7→ (x,· · ·, x). We have
∀z∈H,
PS(z) =C Pn i=1ωizi
. ClearlyPSis self-adjoint, and its reflection operator isRS = 2PS−Id.
Letγ= (γi)i∈]0,+∞[n. For arbitrary maximal monotone operatorsAi, i∈J1, nKonH, defineγA:H→2H,x= (xi)i7→ ×ni=1γiAixi, i.e. its graph is
graγA=
×
ni=1graγiAi=(x,u)∈H2|x= (xi)i, u= (ui)i, ui∈γiAixi .
For a single-valued maximal monotone operatorB, denoteB :H→H, x= (xi)i 7→(Bxi)i. It is immediate to check that bothγAandBare maximal monotone. We also define the operatorsBS =BPS,JγA= (JγiAi)i
andRγA= 2JγA−Id.
3 Global iteration-complexity bounds
In this section, we present the global iteration-complexity bounds of the inexact Krasnosel’ski˘ı–Mann iteration.
Definition 5 (Inexact Krasnosel’ski˘ı–Mann iteration)LetT :H → Hbe a non-expansive operator such that the set of fixed pointsfixT ={z∈ H:z=T z}is non-empty. Letλk ∈]0,1], and denoteTλk=λkT+(1−λk)Id.
Then the inexact Krasnosel’ski˘ı–Mann iteration ofT is given by
zk+1=zk+λk(T zk+εk−zk) =Tλkzk+λkεk, (4) whereεkis the error of approximatingT zk. The error of the iteration is defined as
ek = (Id−T)zk= (zk−zk+1)/λk+εk. (5) Define the following two important notions
T0 = Id−T and τk=λk(1−λk), (6) clearly,ek =T0zk. We start by collecting some useful properties that characterize the above fixed-point iteration.
Proposition 1 The following statements hold,
(i) Tλk∈ A(λk)forλk ∈]0,1[, and ifT ∈ A(α), thenTλk∈ A(λkα);
(ii) For anyz?∈fixT,z?∈fixT ⇐⇒z?∈fixTλk⇐⇒z?∈zerT0; (iii) If(τk)k∈N∈/ `1+and(λk||εk||)k∈N∈`1+, then,
(a) (ek)k∈Nconverges strongly to0;
(b) (zk)k∈Nis quasi–Fejér monotone with respect tofixT, and converges weakly to a pointz?∈fixT. Proof (i) Combine Definition1and (i) of Lemma3; (ii) Straightforward; (iii) See [18, Lemma 5.1].
Let’s now turn to the properties ofek. Denoteτ = infk∈Nτk,τ = supk∈Nτk, andν1 = 2 supk∈N||Tλkzk− z?||+ supk∈Nλk||εk||,ν2= 2 supk∈N||ek−ek+1||.
Lemma 5 For the error termek, the following inequality holds
1
2λk||ek−ek+1||2≤ hek−εk, ek−ek+1i.
Proof By Lemma1,12T0 ∈ A(12). It then follows from Lemma2(iii) that∀p, q∈ H,
||12T0(p)−12T0(q)||2≤ hp−q,12T0(p)−12T0(q)i.
Applying this bound withp=zkandq=zk+1, and using the definition ofekyields the desired result.
Corollary 1 IfTisα-averaged, the inequality of Lemma5becomes
1
2αλk||ek−ek+1||2≤ hek−εk, ek−ek+1i.
Lemma 6 Forz?∈fixT,λk∈]0,1], we have
||zk+1−z?||2≤ ||zk−z?||2−τk||ek||2+ν1λk||εk||.
Proof By virtue of [6, Corollary 2.14] we get
||zk+1−z?||2=||Tλkzk−z?+λkεk||2
≤ ||(1−λk)(zk−z?) +λk(T zk−T z?)||2+ν1λk||εk||
= (1−λk)||zk−z?||2+λk||T zk−z?||2−τk||zk−T zk||2+ν1λk||εk||
≤ ||zk−z?||2−τk||ek||2+ν1λk||εk||.
This lemma indicates that(zk)k∈Nis quasi–Fejér monotone with respect tofixTas stated in Proposition1.
Corollary 2 IfTisα-averaged, the inequality of Lemma6holds withλk ∈]0,1/α]andτk=λk(α1 −λk).
Lemma 7 Forλk ∈]0,1],(ek)k∈Nobeys||ek+1||2−ν2||εk|| ≤ ||ek||2. Proof We have
||ek+1||2=||ek+1−ek+ek||2=||ek||2−2hek, ek−ek+1i+||ek−ek+1||2
≤ ||ek||2−2(1−λk)hek−εk, ek−ek+1i −2hεk, ek−ek+1i
≤ ||ek||2−1−λλ k
k ||ek−ek+1||2+ν2||εk|| ≤ ||ek||2+ν2||εk||
where Lemma5was used twice in the second and third lines.
We are now in position to the main results of this section. Denoted0=d(z0,fixT) = infz∈fixT||z0−z||.
Theorem 1 (Pointwise iteration-complexity bound)For the inexact fixed-point iteration(4), if there holds 0< inf
k∈Nλk≤sup
k∈N
λk<1 and (k+ 1)||εk||
k∈N∈`1+, (7)
then, denotingC1=ν1P
j∈Nλj||εj||+ν2τP
`∈N(`+ 1)||ε`||<+∞, we have
||ek|| ≤
rd20+C1
τ(k+ 1). (8)
Proof Condition (7) impliesτ >0,(τk)k∈N∈/ `1+and(λk||εk||)k∈N∈`1+. Therefore,zkis quasi-Fejér monotone with respect tofixT ((iii) of Proposition1). Thus,(||ek||)k∈Nand(||zk−z?||)k∈Nare bounded for anyz?∈fixT. Henceν1, ν2andC1are bounded constants. Choosez?such thatd0=||z0−z?||. From Lemma6,∀k∈N,
τk||ek||2≤ ||zk−z?||2− ||zk+1−z?||2+ν1λk||εk||.
Summing up fromj= 0tok, Pk
j=0τj||ej||2≤ ||z0−z?||2− ||zk+1−z?||2+ν1Pk
j=0λj||εj||. (9)
From Lemma7, we have∀j≤k,
||ek||2−ν2Pk−1
`=j ||ε`|| ≤ ||ej||2. Substituting this back into (9) yields,
Pk j=0τj
||ek||2≤Pk
j=0τj||ej||2+ν2Pk
j=0τjPk−1
`=j ||ε`||
≤d20+ν1Pk
j=0λj||εj||+ν2Pk
j=0τjPk−1
`=j ||ε`||. (10)
Finally, since(k+ 1)τ ≤Pk
j=0τj, we get,
(k+ 1)τ||ek||2≤d20+ν1Pk
j=0λj||εj||+ν2τPk−1
`=0 (`+ 1)||ε`||, which leads to the desired result (8).
Remark 2
(i) IfTisα-averaged, then condition (7) of Theorem1onλkchanges to0<infk∈Nλk≤supk∈Nλk< α1. (ii) Since findingz? ∈fixT is equivalent to finding a zero ofT0((ii) of Proposition1), Theorem1tells us that
O(1/)iterations are needed for (4) to reach an-accurate in terms of the error criterion||T0zk||2≤. (iii) For the case of first-order methods for solving smooth optimization problems, i.e. the gradient descent where
T0is just the gradient, the obtained pointwise bound is the best-known complexity bound [51].
(iv) If the fixed-point iteration (4) is exact, then the sequence(||ek||)k∈Nis non-increasing (Lemma7), hence we get||ek|| ≤d0/
q Pk
j=0τj, which recovers the result of [21, Proposition 11]. Note that we provide a sharper monotonicity property compared to them.
We now turn to the ergodic iteration-complexity bound of (4). For this, let’s defineΛk =Pk
j=0λj ande¯k =
1 Λk
Pk j=0λjej.
Theorem 2 (Ergodic iteration-complexity bound)Suppose thatC2=P
k∈Nλk||εk||<+∞. Then,
||¯ek|| ≤ 2(d0+C2)
Λk
. In particular, ifinfk∈Nλk>0, then||¯ek||=O(1/k).
Proof Again, letz?∈fixT such thatd0=||z0−z?||. SinceTλkis non-expansive, we have
||zk+1−z?||=||Tλkzk−Tλkz?+λkεk|| ≤ ||zk−z?||+λk||εk||
≤ ||zk−1−z?||+Pk
j=k−1λj||εj|| ≤ ||z0−z?||+Pk
j=0λj||εj||.
This together with the definition ofe¯kyields
||¯ek||=||Λ1
k
Pk
j=0λjej||= Λ1
k||Pkj=0(zj−zj+1) +Pkj=0λjεj||
≤ Λ1
k ||z0−z?||+||zk+1−z?||+Pk
j=0λj||εj||
≤2(d0Λ+C2)
k .
Again, this result holds whenT isα-averaged, where nowλkis allowed to vary in]0,1/α].
Remark 3
(i) As in the pointwise case, Theorem2holds whenTisα-averaged, where nowλkis allowed to vary in]0,1/α].
(ii) When T is firmly non-expansive, i.e. the resolvent of a maximal monotone operator ((iv) of Lemma2), anO(1/k)ergodic convergence rate is also established in [48] with summable enlargement errors. For the methods which can also be cast in the HPE framework, our result coincides with the one in [48]. Note that in that work, they handled the non-stationary case (i.e. the parameter of the resolvent varies); see also our extension to the non-stationary case in Section6. For the case without errors, we recover also the bound in [2].
From Theorem1and2, it is immediate to get the convergence rate bounds on the sequence(||zk−zk+1||)k∈N
in the exact case. To lighten the notation, letvk=zk−zk+1andv¯k =k+11 Pk j=0vj. Corollary 3 Assume thatεk= 0for allk∈N.
(i) If0<infk∈Nλk≤supk∈Nλk<1, then||vk|| ≤ √ d0
τ(k+1); (ii) Ifλ= infk∈Nλk>0, then||¯vk|| ≤ k+12d0.
Proof
(i) By definitionvk =λkek, then from (8) we have
||vk||=||λkek|| ≤λk√ d0
τ(k+1) ≤ √ d0
τ(k+1). (ii) A direct result of Theorem2by replacingΛkwithk+ 1.
4 Local linear convergence rate
In the literature, for many splitting algorithms applied to a range of optimization problems, the following typi- cal convergence profile has been observed in practice. Globally the algorithm converges sub-linearly, and after a sufficiently large number of iterations, the algorithm enters a new regime where a local linear convergence takes over. This has been for instance observed (and sometimes proved) for DRS or FBS when solving sparsity-enforcing minimization problems, see e.g. [23,41].
In this section, we study the rationale underlying this local linear convergence behaviour. Our analysis relies on a metric sub-regularity assumption ofT0= Id−T (Definition5).
Definition 6 (Metric sub-regularity [25])A set-valued mappingF :H →2His called metrically sub-regular at
˜
zforu˜∈F(˜z)if there existsκ≥0along with neighborhoodZof˜zsuch that
d(z, F−1u)˜ ≤κ d(˜u, F z), ∀z∈ Z. (11) The infimum ofκsuch that (11) holds is the modulus of metric sub-regularity, denoted bysubreg(F; ˜z|˜u). The absence of metric regularity is signalled bysubreg(F; ˜z|˜u) = +∞.
Metric sub-regularity implies that, for any z ∈ Z,d(˜u, F z)is bounded below. The metric (sub)regularity of multifunctions plays a crucial role in modern variational analysis and optimization. These properties are a key to study the stability of solutions of generalized equations, see the dedicated monograph [25].
Let’s specialize this notion toT0 andu˜ = 0. SinceT0is single-valued andzerT0 = fixT, from (11), metric sub-regularity ofT0at somez?∈fixTfor0is equivalent to
d(z,fixT)≤κ||T0z||, ∀z∈ Z. (12)
There are several concrete examples of operatorsT whereT0fulfills (12).
Example 1 (Projector)Suppose thatT =PC, forCa non-empty closed convex subset ofH. ThenfixT =Cand d(z,C) =||z−PCz||=||T0z||. ThusT0is metrically sub-regular at anyz?∈ Cfor0withZ=Hand modulus1.
Using the relation between metric sub-regularity ofT0and bounded linear regularity ofT as defined in [8] (see our discussion in Section1.3.2), other examples can be deduced for instance from [8, Example 2.3 and 2.5]. Two other instructive examples, one on DRS with two subspaces and the second on gradient descent will be discussed at the end of the section.
Another interesting situation is whenTis firmly non-expansive, so thatT =JF for some maximal monotone operatorF ((iv) of Lemma2), in which case (4) is the relaxed inexact PPA. Letz? ∈fixT = zerF and suppose that0∈zerF. IfFis metrically sub-regular atz?for0with modulusγ, then
d(z,zerF) =d(z,fixT)≤γd(0, F z) =γd(0, T−1z−z) =γ inf
v∈T−1z||v−z||, ∀z∈ Z.
Thus for allwsuch thatz=T w∈ Z, applying the previous inequality and using the triangle inequality, we get d(T w,fixT)≤γ||T0w|| and d(w,fixT)≤(1 +γ)||T0w||.
Clearly, this is closely related, though not equivalent, to metric sub-regularity ofT0.
Metric sub-regularity implies that (12) gives an estimate for how far a pointzis from being the fixed-point set of Tin terms of the residual||z−T z||. This is the rationale behind using such a regularity assumption on the operator T0to quantify the convergence rate ond(zk,fixT). Thus, starting fromz0∈ H, and by virtue of Theorem1, one can recover aO(1/√
k)rate ond(zk,fixT). In fact, we can do even better as shown in the following theorem. We use the shorthand notationdk =d(zk,fixT).
Theorem 3 (Local convergence rate)Letz?∈fixT, assumeT0is metrically sub-regular atz?with neighborhood Zofz?, letκ >subreg(T0;z?|0),λk ∈]0,1]. Given a ballBa(z?)⊂ Z, a≥0, supposeC2=P
k∈Nλk||εk||is small enough such that
B(a+C2)(z?)⊆ Z.
Then for any starting pointz0∈Ba(z?), we have for allk∈N, d2k+1≤ζkd2k+ck, where ζk=
(1−κτk2, ifτk/κ2∈]0,1]
κ2
κ2+τk, otherwise ∈[0,1[, (13) andck=ν1λk||εk||. Moreover,
(i) dkconverges to0if(τk)k∈N∈/`1+. (ii) Letχk = Qk
j=0ζj, ifχ = lim sup
k→+∞
√k
χk < 1, then(d2k)k∈N ∈ `1+. Whenεk = 0, then lim
k→+∞
√k
dk < 1, which isR-linear convergence.
(iii) If0<infk∈Nλk≤supk∈Nλk<1, then there existsζ∈(0,1)such that d2k+1≤ζk d20+Pk
j=0ζ−j+1cj .
Proof Let constantsb > a > 0such thatT0 is metrically sub-regular atz?with modulusκ. Make the radiusa smaller if necessary so thatBa(z?)⊂Bb(z?)⊆ Zand
a+C2≤b.
Pickz0∈Ba(z?). Ifz0=z?then, takezk =z?for allk∈Nand there is nothing more to prove. If not, then from Lemma6that for anyz˜∈fixT,
||zk+1−z|| ≤ ||z˜ k−z||˜ +λk||εk|| ≤ · · · ≤ ||z0−z||˜ +Pk
j=0λj||εj|| ≤a+C2≤b,
which implies that starting from any pointz0 ∈ Ba(z?),zk ∈Bb(z?)holds for allk∈ N. Now for∀k ∈N, let
˜
z ∈fixT be such thatdk =||zk−z||˜ andck =ν1λk||εk||, then by virtue of the metric sub-regularity ofT0and Lemma6, we have
d2k+1≤ ||zk+1−z||˜ 2≤ ||zk−z||˜ 2−τk||T0zk−T0˜z||2+ck≤d2k−κτk2d2k+ck (14)
≤d2k−τκk2(d2k+1−ck) +ck =d2k−τκk2d2k+1+ζ1
kck. (15)
Ifτk/κ2 ∈]0,1[, then from (14) we haved2k+1 ≤ (1− κτk2)d2k +ck, or if 1 ≤ τk/κ2, (15) producesd2k+1 ≤
κ2
κ2+τkd2k+ck,λk ∈]0,1]ensuresκ2/(κ2+τk)∈]0,1]. Therefore, we have ζk=
(1−τκk2, ifτk/κ2∈]0,1[
κ2
κ2+τk, if1≤τk/κ2 ∈]0,1].
Furthermore,
d2k+1≤ζkd2k+ck≤ · · · ≤χkd20+Pk
j=0φk−jcj ≤χkd20+Pk
j=0cj, (16)
whereχk=Qk
j=0ζjandφk−j=Qk
`=j+1ζ`.
(i) From (16) we haved2k+1 ≤ d2k+ck, then thed2k → d ≥ 0([57, Lemma 2.2.2]). If(τk)k∈N ∈/ `1+, then
||ek|| →0(Theorem1), and by metric sub-regularity we havedk≤κ||ek||, therefored= 0;
(ii) Ifχ= lim sup
k→+∞
√k
χk <1, then lim
k→+∞χk= 0and(χk)k∈N, (φk)k∈N∈`1+. Since also(ck)k∈N∈`1+, hence the convolution Pkj=0φk−jcj
k∈N∈`1+, as a result, χkd20+Pkj=0φk−jcj
k∈N∈`1+and so is(d2k)k∈N. Ifεk= 0, then from (16) we have lim
k→+∞
√k
dk≤lim sup
k→+∞
√k
χk<1.
(iii) If0 <infk∈Nλk ≤supk∈Nλk <1, then there existsζ = supk∈Nκ2/(κ2+τk)<1which concludes the result.
Remark 4
(i) When the fixed point is a singleton, Theorem3holds replacingdkby||zk−z?||.
(ii) For simplicity, suppose the iteration is exact, and letz?∈fixTsuch thatdk =||zk−z?||. Then we have
||ek||2=||zk−z?+T z?−T zk||2≤4d2k, which means locally,||ek||also converges linearly to0given that(τk)k∈N∈/ `1+.
(iii) As far as the claim in (iii) of Theorem3is concerned, if∃ξ∈]0,1]such thatck =O(ξk), then (a) Ifξ < ζ, thend2k+1=O(ζk);
(b) Ifξ=ζ, thend2k+1=O(kζk) =o (ζ+δ)k , δ >0;
(c) Ifξ > ζ, thend2k+1=o(ξk).
Theorem3extends readily to theα-averaged case.
Corollary 4 IfTisα-averaged, then Theorem3holds substitutingλkαforλkandκαforκ.
Remark 5
(i) Whenλk ≡λandεk ≡0, our second rate estimate in (13) encompasses that of [8, Lemma 3.8].
(ii) Equivalent characterizations of metric sub-regularity can be given, for instance in terms of derivative criteria.
In particular, asT0is single-valued, metric sub-regularity ofT0holds ifT0is differentiable on a neighbour- hood ofz?with non-singular derivatives atzaroundz?, and the operator norms of their inverses are uniformly bounded [24, Theorem 1.2]. Computing the metric regularity modulusκis however far from obvious in gen- eral even for the differentiable case.
Examples
DRS with two subspaces inR2 LetU andV two subspaces inR2 forming an angle ofθ ∈]0, π/2]. Consider the problem of findingU ∩ V = {0}using DRS. It is well-known the DRS fixed point operatorTDRS is firmly non-expansive. Moreover, following [8, Example 2.3 and 2.5], it can be shownTDRS0 = Id−TDRSis metrically sub-regular with modulus1/sinθ. It then follows from (13) that the rate estimate isζk= 1−(2−λk)λksin2θ= (1−λk)2+ (2−λk)λkcos2θ∈]0,1[, forλk∈]0,2[. This is exactly the optimal rate estimate provided in [42,5].
Observe that as remarked in [42], the best rateζk ≡cos2θis obtained forλk ≡1(no relaxation).
Gradient descent We now consider the example of minimizing a convex functionf :Rn→Rwhich is locallyC2 and strongly convex onΩ⊂Rnand has Lipschitz gradient. Clearlyf has a unique minimizer, and without losing generality, we denote it as0. Defineδm, δMthe local strong convexity and Lipschitz modulus off respectively, thenδ1
m is equivalent to metric sub-regularity of∇f at0for0[1, Theorem 3.5].
For simplicity, consider the non-relaxed gradient descent method [57] for minimizingfwith constant step-size, i.e.
xk+1=xk−γ∇f(xk), whereγ ∈]0,δ2
M[. This can be cast in our above framework by setting T = Id−γ∇f which is γδ2M-averaged (Lemma4). HenceT0 = Iddef −T =γ∇f which is continuously differentiable on any neighbourhoodZ ⊂Ω. For anyx∈ Z, the Jacobian ofT0is justγ∇2f(x), where∇2f(x)is the Hessian off atx, which is non-singular and its inverse is uniformly bounded by δ1
m from local strong convexity. Thus, by virtue of [25, Theorem 4B.1],T0is metrically regular, hence sub-regular, and the metric regularity modulusκis precisely γδ1
m. In fact we could have anticipated this directly from the local strong monotonicity of∇f. Specializing the rate of Theorem3, we get
ζ= 1−t(2−t)
cnd2 ∈[0,1[,
where we sett=γδM∈]0,2[, andcnd =δM/δmcan be seen as the condition number of the Hessian off along Ω. It is obvious that the smallestζ3is attained fort= 1, i.e.γ= 1/δM.
The observed and theoretical convergence profiles of||ek|| = ||xk −xk+1||are illustrated in Figure1where gradient descent was run withγ ∈ {2δ1
M,δ1
M}. As predicted by our result, the convergence profile exhibits two regimes, a global sub-linear one, and then a local linear one.
k kek k
100 101 102
10-10 10-6 10-2 102
global 1=p k practical localp
1;.= 2/1
M
practical.=2/1
M
localp 1;.= /1
M
practical.=/1
M
120 140 160
Fig. 1 Global and local convergence profiles of gradient descent to minimizefwith(δm, δM) = (0.8,1). Forγ= 2δ1
M, the observed and theoretical rates are respectively0.60and0.72. Forγ=δ1
M, they are0.20and0.60respectively.
3 One may observe thatγcan be modified locally, once the iterates enter the appropriate neighbourhood, to get the usual optimal linear rate of gradient descent with a strongly convex objective [57]. But this is not our aim here.
5 Applications
In this section, we apply the obtained results to conduct quantitative convergence analysis of a class of monotone operator splitting methods in the literature, and mainly focus on the global iteration-complexity bounds. As stated in the introduction, we will rely on the fact that all the considered iterative schemes (GFB/FBS, DRS/ADMM and PDS) can be cast as Krasnosel’ski˘ı–Mann iteration. Furthermore, based on the structure of the corresponding monotone inclusion problem, we also derive specific error criteria which can serve as termination tests.
5.1 Generalized Forward–Backward splitting
The GFB algorithm [58] addresses the monotone inclusion (1) whenLi= Id, i∈J1, nK, andBisβ-cocoercive. By lifting the problem into product space, GFB achieves full splitting by applying theAi’s implicitly andBexplicitly.
Let(ωi)i∈]0,1[nsuch thatPn
i=1ωi= 1,γ∈]0,2β[,λk ∈]0,4β−γ2β [,(bk)k∈Nand(ai,k)k∈Nare error terms. The GFB iteration reads
Fori= 1,· · ·, n jzi,k+1=zi,k+λk Jγ
ωiAi(2xk−zi,k−γBxk+bk) +ai,k−xk , xk+1=Pn
i=1ωizi,k+1.
(17)
GFB recovers FBS forn= 1, and whenB= 0, GFB recovers DRS in theproduct space. In the literature, the convergence properties of the FBS and DRS have been extensively studied, see [6] and references therein. In this section, we mainly focus on the GFB method.
Recall the notation introduced in Section2.3. Denoteui,k+1=Jγ
ωiAi(2xk−zi,k−γBxk), i∈J1, nK, then (17) equivalently reads
zi,k+1=zi,k+λk(ui,k+1+εi,k−xk), whereεi,k = Jγ
ωiAi(2xk−zi,k−γBxk+bk) +ai,k
−ui,k+1. Letε1,k=C(bk),ε2,k= (ai,k)i, andT1,γ=
1
2(RγARS+Id), T2,γ=Id−γBS.
Using Remark1, the composed operatorTγ =T1,γ◦T2,γis 4β−γ2β -averaged. Moreover, it was shown in [58]
that the fixed-point iteration of GFB reads,
zk+1 =zk+λk T1,γ(T2,γzk+ε1,k) +ε2,k−zk
=zk+λk(Tγzk+εk−zk), (18) whereεk = T1,γ(T2,γzk+ε1,k) +ε2,k
−Tγzk.
Obviously, (18) is exactly in the form of (4). Therefore, the GFB iterates converge weakly [58, Theorem 4.1], and obey the iteration-complexity bounds in Theorem1and2. Moreover, we can establish certain convergence rates for the structured monotone inclusion (1).
Letgk+1= 1γxk−Bxk−1γPiωiui,k+1. Defineu¯i,k+1= k+11 Pk
j=0ui,j+1, i∈J1, nK,x¯k= k+11 Pk j=0xj
andg¯k+1= 1γx¯k−Bx¯k−1γ Piωiu¯i,k+1. Proposition 2 We havegk+1∈P
iAiui,k+1. Moreover, (i) If0 <infk∈Nλk ≤supk∈Nλk < 4β−γ2β , (k+ 1)||bk||
k∈N∈`1+and∀i∈J1, nK, (k+ 1)||ai,k||
k∈N∈`1+, then
d 0,PiAiui,k+1+B(Piωiui,k+1)
≤ 1 γ
r
d20+C1
τ(k+ 1) . (ii) Ifλ= infk∈Nλk >0,(λk||bk||)k∈N∈`1+and(λk||ai,k||)k∈N∈`1+,∀i∈J1, nK, then
||¯gk+1+B(Piωiu¯i,k+1)|| ≤ 2(d0+C2)
γλ(k+ 1). C1, C2are the constants in Theorem1and2respectively.