Iteration-Complexity of a Generalized Forward Backward Splitting Algorithm
Jingwei Liang, Jalal M. Fadili, Gabriel Peyré
To cite this version:
Jingwei Liang, Jalal M. Fadili, Gabriel Peyré. Iteration-Complexity of a Generalized Forward Backward Splitting Algorithm. ICASSP'14, May 2014, Florence, Italy. 5 pp. <hal-01073917>
HAL Id: hal-01073917
https://hal.archives-ouvertes.fr/hal-01073917
Submitted on 27 Oct 2014
ITERATION-COMPLEXITY OF A GENERALIZED FORWARD BACKWARD SPLITTING ALGORITHM
Jingwei Liang¹, Jalal M. Fadili¹ and Gabriel Peyré²
¹ GREYC, CNRS-ENSICAEN
² CEREMADE, CNRS-Paris-Dauphine
ABSTRACT
In this paper, we analyze the iteration-complexity of the Generalized Forward–Backward (GFB) splitting algorithm, as proposed in [2], for minimizing a large class of composite objectives $f + \sum_{i=1}^n h_i$ on a Hilbert space, where $f$ has a Lipschitz-continuous gradient and the $h_i$'s are simple (i.e. their proximity operators are easy to compute). We derive iteration-complexity bounds (pointwise and ergodic) for the inexact version of GFB to obtain an approximate solution based on an easily verifiable termination criterion. Along the way, we prove complexity bounds for relaxed and inexact fixed point iterations built from compositions of nonexpansive averaged operators. These results apply more generally to GFB when used to find a zero of a sum of $n > 0$ maximal monotone operators and a co-coercive operator on a Hilbert space. The theoretical findings are exemplified with experiments on video processing.
Index Terms— Convex optimization, Proximal splitting, Convergence rates, Inverse problems.
1. INTRODUCTION

1.1. Problem statement
Many structured convex optimization problems in science and engineering, including signal/image processing and machine learning, can be cast as solving
$$\min_{x \in \mathcal{H}} J(x) := f(x) + \sum_{i=1}^n h_i(x), \qquad (1)$$
where $f \in \Gamma_0(\mathcal{H})$ has a $\beta^{-1}$-Lipschitz continuous gradient, $h_i \in \Gamma_0(\mathcal{H})$ is simple, and $\Gamma_0(\mathcal{H})$ is the class of lower semi-continuous, proper, convex functions from a Hilbert space $\mathcal{H}$ to $]-\infty, +\infty]$. Some instances of (1) in signal, image and video processing are considered in Section 3 as illustrative examples.
Assume that $\operatorname{Argmin} J \neq \emptyset$ and that the qualification condition
$$(0, \ldots, 0) \in \operatorname{sri}\big\{(x - y_1, \ldots, x - y_n) \mid x \in \mathcal{H},\ \forall i,\ y_i \in \operatorname{dom} h_i\big\}$$
This work has been supported by the ERC project SIGMA-Vision and l'Institut Universitaire de France. We would like to thank Yuchao Tang for pointing us to [1].
holds, where sri is the strong relative interior, see [3]. Then, minimizing $J$ in (1) is equivalent to
$$\text{Find } x \in \operatorname{zer}(\partial J) := \big\{x \in \mathcal{H} \mid 0 \in \nabla f(x) + \sum_{i=1}^n \partial h_i(x)\big\}. \qquad (2)$$
Although we only focus on optimization problems (1) in the sequel, our results apply more generally to monotone inclusion problems of the form
$$\text{Find } x \in \operatorname{zer}\big(B + \sum_{i=1}^n A_i\big) := \big\{x \in \mathcal{H} \mid 0 \in Bx + \sum_{i=1}^n A_i x\big\}, \qquad (3)$$
where $B : \mathcal{H} \to \mathcal{H}$ is $\beta$-cocoercive, and each $A_i : \mathcal{H} \to 2^{\mathcal{H}}$ is a maximal monotone set-valued map.
In this paper, we will establish iteration-complexity bounds of the inexact GFB algorithm [2] for solving (2), whose steps are summarized in Algorithm 1. There, $\varepsilon_1^k$ and $\varepsilon_{2,i}^k$ are the errors incurred when computing $\nabla f(\cdot)$ and $\operatorname{prox}_{\frac{\gamma}{\omega_i} h_i}(\cdot)$.
Algorithm 1: Inexact GFB algorithm for solving (2).
Input: $(z_i)_{i \in \{1,\ldots,n\}}$, $(\omega_i)_{i \in \{1,\ldots,n\}}$ with $\sum_{i=1}^n \omega_i = 1$, $\gamma \in ]0, 2\beta[$, $\alpha = \frac{2\beta}{4\beta - \gamma}$ and $\lambda_k \in ]0, \frac{1}{\alpha}[$.
$k = 0$, $x^0 = \sum_{i=1}^n \omega_i z_i^0$;
repeat
    for $i = 1, \ldots, n$ do
        $v_i^{k+1} = \operatorname{prox}_{\frac{\gamma}{\omega_i} h_i}\big(2x^k - z_i^k - \gamma \nabla f(x^k) + \varepsilon_1^k\big) + \varepsilon_{2,i}^k$;
        $z_i^{k+1} = z_i^k + \lambda_k (v_i^{k+1} - x^k)$;
    $x^{k+1} = \sum_{i=1}^n \omega_i z_i^{k+1}$;
    $k = k + 1$;
until convergence;
return $x^k$;
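The iteration above, in the exact case ($\varepsilon_1^k = \varepsilon_{2,i}^k = 0$), can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation; the function names and the interface are ours:

```python
import numpy as np

def gfb(grad_f, proxes, omega, beta, z0, gamma=None, lam=1.0, n_iter=200):
    """Sketch of Algorithm 1 in the exact case (all error terms set to zero).

    grad_f : gradient of the smooth term f (assumed beta-cocoercive).
    proxes : list of n maps, proxes[i](y, t) computing prox_{t h_i}(y).
    omega  : positive weights summing to 1.
    z0     : list of n starting points z_i^0.
    """
    n = len(proxes)
    if gamma is None:
        gamma = beta                       # any gamma in ]0, 2*beta[ is valid
    alpha = 2 * beta / (4 * beta - gamma)  # averaging parameter of Prop. 2.1
    assert 0 < lam < 1 / alpha             # relaxation must lie in ]0, 1/alpha[
    z = [np.asarray(zi, dtype=float).copy() for zi in z0]
    x = sum(w * zi for w, zi in zip(omega, z))
    for _ in range(n_iter):
        g = grad_f(x)
        for i in range(n):
            # v_i^{k+1} = prox_{(gamma/omega_i) h_i}(2 x^k - z_i^k - gamma grad f(x^k))
            v = proxes[i](2 * x - z[i] - gamma * g, gamma / omega[i])
            z[i] = z[i] + lam * (v - x)    # z_i^{k+1} = z_i^k + lam (v_i^{k+1} - x^k)
        x = sum(w * zi for w, zi in zip(omega, z))
    return x
```

For instance, minimizing $\frac{1}{2}(x-3)^2 + |x|$ (i.e. $n = 2$ with $h_1 = |\cdot|$, whose prox is soft-thresholding, and $h_2 = 0$, whose prox is the identity) converges to the soft-thresholded solution $x = 2$.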
When $n = 1$, GFB recovers the Forward–Backward splitting algorithm [4], and when $\nabla f = 0$, GFB specializes to the Douglas–Rachford algorithm on the product space [5].
There has been a recent wave of interest in splitting algorithms to solve monotone inclusions taking the form of (2) or (3), or even more general ones. In particular, several primal-dual splitting schemes were designed, such as those in [6, 7] or [8] in the context of convex optimization. See also [9, 10] for convergence rates analysis. The authors in [11, 12] analyze the iteration-complexity of the hybrid proximal extragradient (HPE) method proposed by Solodov and Svaiter. It can be shown that GFB can be cast in the HPE framework, but only in the exact and unrelaxed (i.e. $\lambda_k = 1$) case.
1.2. Contributions
In this paper, we establish pointwise and ergodic iteration-complexity bounds for sequences generated by inexact and relaxed fixed point iterations in which the fixed point operator is $\alpha$-averaged. This generalizes the result of [13] to the inexact case, and that of [14], who only considered the exact Douglas–Rachford method. We then apply these results to derive iteration-complexity bounds for the GFB algorithm applied to solve (1). This allows us to show that $O(1/\epsilon)$ iterations are needed to find a pair $((u_i)_i, g)$ satisfying the termination criterion $\|g + \nabla f(\sum_i \omega_i u_i)\|^2 \leq \epsilon$, where $g \in \sum_i \partial h_i(u_i)$. This termination criterion can be viewed as a generalization of the classical one based on the norm of the gradient for the gradient descent method. The iteration-complexity improves to $O(1/\sqrt{\epsilon})$ in the ergodic sense for the same termination criterion.
2. ITERATION-COMPLEXITY BOUNDS

2.1. Preliminaries
The class of $\alpha$-averaged nonexpansive operators, $\alpha \in ]0, 1[$, is denoted $\mathcal{A}(\alpha) = \{T : T = \operatorname{Id} + \alpha(R - \operatorname{Id})\}$ for some nonexpansive operator $R$. For obvious space limitations, we recall in Section 4 only the properties of these operators that are essential to our exposition. The reader may refer to, e.g., [3] for a comprehensive account.
Let $(\omega_i)_{i \in \{1,\ldots,n\}} \in ]0, 1]^n$ be such that $\sum_i \omega_i = 1$. Consider the product space $\boldsymbol{\mathcal{H}} := \mathcal{H}^n$ endowed with the scalar product
$$\forall x = (x_i)_i,\ y = (y_i)_i \in \boldsymbol{\mathcal{H}}, \qquad \langle\langle x, y \rangle\rangle = \sum_i \omega_i \langle x_i, y_i \rangle,$$
and the corresponding norm $|||\cdot|||$. Define the non-empty subspace $\mathcal{S} \subset \boldsymbol{\mathcal{H}}$, $\mathcal{S} := \{x = (x_i)_i \in \boldsymbol{\mathcal{H}} \mid x_1 = \ldots = x_n\}$, and its orthogonal complement $\mathcal{S}^\perp \subset \boldsymbol{\mathcal{H}}$, $\mathcal{S}^\perp := \{x = (x_i)_i \in \boldsymbol{\mathcal{H}} \mid \sum_{i=1}^n \omega_i x_i = 0\}$. Denote by $\mathbf{Id}$ the identity operator on $\boldsymbol{\mathcal{H}}$, and by $C$ the canonical isometry $C : \mathcal{H} \to \mathcal{S},\ x \mapsto (x, \ldots, x)$.
2.2. Inexact relaxed fixed point equation of GFB
Denote $P_{\mathcal{S}} : \boldsymbol{\mathcal{H}} \to \boldsymbol{\mathcal{H}},\ z \mapsto C(\sum_i \omega_i z_i)$, $R_{\mathcal{S}} = 2P_{\mathcal{S}} - \mathbf{Id}$, $\mathbf{B} : \boldsymbol{\mathcal{H}} \to \boldsymbol{\mathcal{H}},\ x = (x_i)_i \mapsto (\nabla f(x_i))_i$, $J_{\gamma \cdot A} = (\operatorname{prox}_{\frac{\gamma}{\omega_i} h_i})_i$, and $R_{\gamma \cdot A} = 2J_{\gamma \cdot A} - \mathbf{Id}$.
Let $T_{1,\gamma} = \frac{1}{2}[R_{\gamma \cdot A} R_{\mathcal{S}} + \mathbf{Id}]$ and $T_{2,\gamma} = \mathbf{Id} - \gamma \mathbf{B} P_{\mathcal{S}}$.
We can now define the inexact version of GFB.
Proposition 2.1. (i) The composed operator $T_{1,\gamma} \circ T_{2,\gamma}$ is $\alpha$-averaged with $\alpha = \frac{2\beta}{4\beta - \gamma}$;
(ii) The inexact GFB is equivalent to the relaxed fixed point iteration
$$z^{k+1} = z^k + \lambda_k \big(T_{1,\gamma}(T_{2,\gamma} z^k + \varepsilon_1^k) + \varepsilon_2^k - z^k\big), \qquad (4)$$
and $(z^k)_{k \in \mathbb{N}}$ is quasi-Fejér monotone with respect to $\operatorname{Fix}(T_{1,\gamma} \circ T_{2,\gamma}) \neq \emptyset$.
Proof. (i) This is a consequence of [2, Proposition 4.12-13] and [1, Theorem 3]. (ii) See [2, Theorem 4.17] and [15, Theorem 3.1], since $\operatorname{Argmin} J \neq \emptyset$.
To further lighten the notation, let $T = T_{1,\gamma} \circ T_{2,\gamma}$. Then (4) can be rewritten as
$$z^{k+1} = T_k z^k + \lambda_k \varepsilon^k,$$
where $T_k = \lambda_k T + (1 - \lambda_k)\mathbf{Id} \in \mathcal{A}(\alpha \lambda_k)$ and $\varepsilon^k = T_{1,\gamma}(T_{2,\gamma} z^k + \varepsilon_1^k) + \varepsilon_2^k - T z^k$. We now define the residual term that will be used as a termination criterion for (4), i.e.
$$e^k = (\mathbf{Id} - T) z^k = (z^k - z^{k+1})/\lambda_k + \varepsilon^k. \qquad (5)$$
2.3. Iteration-complexity bounds of (4)
We are now in position to establish our main results on pointwise and ergodic iteration-complexity bounds for the inexact relaxed fixed point iteration (4). The proofs are deferred to Section 4. Define $\tau_k = \lambda_k(\frac{1}{\alpha} - \lambda_k)$, $\underline{\tau} = \inf_{k \in \mathbb{N}} \tau_k$ and $\overline{\tau} = \sup_{k \in \mathbb{N}} \tau_k$. Let $d_0 = |||z^0 - z^\star|||$ be the distance from $z^0$ to $z^\star \in \operatorname{Fix} T$, $\nu_1 = 2 \sup_{k \in \mathbb{N}} |||T_k z^k - z^\star||| + \sup_{k \in \mathbb{N}} \lambda_k |||\varepsilon^k|||$ and $\nu_2 = 2 \sup_{k \in \mathbb{N}} |||e^k - e^{k+1}|||$. Let $\ell^1_+$ denote the set of summable sequences in $[0, +\infty[$.
Theorem 2.2 (Pointwise iteration-complexity bound of (4)).
(i) If
$$\lambda_k \in ]0, 1/\alpha[, \quad (\tau_k)_{k \in \mathbb{N}} \notin \ell^1_+ \quad \text{and} \quad (\lambda_k |||\varepsilon^k|||)_{k \in \mathbb{N}} \in \ell^1_+, \qquad (6)$$
then the sequence $(e^k)_{k \in \mathbb{N}}$ converges strongly to $0$, and $(z^k)_{k \in \mathbb{N}}$ converges weakly to a point $z^\star \in \operatorname{Fix}(T)$.
(ii) If
$$0 < \inf_{k \in \mathbb{N}} \lambda_k \leq \sup_{k \in \mathbb{N}} \lambda_k < \frac{1}{\alpha} \quad \text{and} \quad \big((k+1)|||\varepsilon^k|||\big)_{k \in \mathbb{N}} \in \ell^1_+, \qquad (7)$$
then $C_1 = \nu_1 \sum_{j \in \mathbb{N}} \lambda_j |||\varepsilon^j||| + \nu_2 \overline{\tau} \sum_{\ell \in \mathbb{N}} (\ell + 1) |||\varepsilon^\ell||| < +\infty$, and
$$|||e^k||| \leq \sqrt{\frac{d_0^2 + C_1}{\underline{\tau}\,(k+1)}}; \qquad (8)$$
(iii) If $\frac{1}{2\alpha} \leq \lambda_k \leq \sup_{k \in \mathbb{N}} \lambda_k < \frac{1}{\alpha}$ is non-decreasing, then
$$|||e^k||| \leq \sqrt{\frac{d_0^2 + C_2}{\tau_k(k+1)}}, \qquad (9)$$
where $C_2 = \nu_1 \sum_{j \in \mathbb{N}} \lambda_j |||\varepsilon^j||| + \nu_2 \tau_0 \sum_{\ell \in \mathbb{N}} (\ell + 1) |||\varepsilon^\ell||| < +\infty$.
In a nutshell, after $k \geq O\big((d_0^2 + C_2)/\epsilon\big)$ iterations, (4) achieves the termination criterion $|||e^k|||^2 \leq \epsilon$.
Denote now $\Lambda_k = \sum_{j=0}^k \lambda_j$, and define $\bar{e}^k = \frac{1}{\Lambda_k} \sum_{j=0}^k \lambda_j e^j$. We have the following theorem.
Theorem 2.3 (Ergodic iteration-complexity bound of (4)). If $\lambda_k \in ]0, 1[$ and $C_3 = \sum_{j=0}^{+\infty} \lambda_j |||\varepsilon^j||| < +\infty$, then
$$|||\bar{e}^k||| \leq 2\big(d_0 + C_3\big)/\Lambda_k.$$
If $\inf_k \lambda_k > 0$, then we get the iteration-complexity $O(1/\sqrt{\epsilon})$ in the ergodic sense for (4).
2.4. Iteration-complexity bounds of (2)
We now turn to the complexity bounds of the GFB applied to solve (2) (or equivalently (1)).
With the quantities used in Algorithm 1, denote $u^{k+1} = (u_i^{k+1})_i = \big(\operatorname{prox}_{\frac{\gamma}{\omega_i} h_i}(2x^k - z_i^k - \gamma \nabla f(x^k))\big)_i$ and $\varepsilon_i^k = v_i^{k+1} - u_i^{k+1}$; then $e_i^k = x^k - u_i^{k+1} = (z_i^k - z_i^{k+1})/\lambda_k + \varepsilon_i^k$, and $g^k = \frac{1}{\gamma} x^k - \nabla f(x^k) - \frac{1}{\gamma} \sum_i \omega_i u_i^{k+1}$. To save space, we only consider the case where $\lambda_k \in [\frac{1}{2\alpha}, \frac{1}{\alpha}[$ is non-decreasing.
Theorem 2.4 (Pointwise iteration-complexity bound of (2)). We have $g^k \in \sum_i \partial h_i(u_i^{k+1})$. Moreover, under the assumptions of Theorem 2.2,
$$\big\|g^k + \nabla f\big(\sum_i \omega_i u_i^{k+1}\big)\big\| \leq \frac{1}{\gamma} \sqrt{\frac{d_0^2 + C_2}{\tau_k(k+1)}}.$$
Let now $\bar{u}_i^k = \frac{1}{\Lambda_k} \sum_{j=0}^k \lambda_j u_i^{j+1}$, $\bar{x}^k = \frac{1}{\Lambda_k} \sum_{j=0}^k \lambda_j x^j$ and $\bar{g}^k = \frac{1}{\gamma} \bar{x}^k - \nabla f(\bar{x}^k) - \frac{1}{\gamma} \sum_i \omega_i \bar{u}_i^k$. We get the following.
Theorem 2.5 (Ergodic iteration-complexity bound of (2)). Under the assumptions of Theorem 2.3, we have
$$\big\|\bar{g}^k + \nabla f\big(\sum_i \omega_i \bar{u}_i^k\big)\big\| \leq 2(d_0 + C_3)/(\gamma \Lambda_k).$$
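The termination criterion of Theorems 2.4-2.5 is cheap to evaluate inside the loop, since $u^{k+1}$ and $\nabla f(x^k)$ are already computed there. A minimal sketch in NumPy of GFB stopped by this criterion (exact case, $\lambda_k \equiv 1$; the function names and the interface are ours, not from [2]):

```python
import numpy as np

def gfb_until(grad_f, proxes, omega, beta, z0, gamma=None,
              eps=1e-12, max_iter=5000):
    """Run exact GFB, stopping once ||g^k + grad_f(sum_i w_i u_i^{k+1})||^2 <= eps,
    where g^k = x^k/gamma - grad_f(x^k) - (sum_i w_i u_i^{k+1})/gamma.
    Returns the weighted average sum_i w_i u_i^{k+1} and the iteration count."""
    n = len(proxes)
    if gamma is None:
        gamma = beta
    z = [np.asarray(zi, dtype=float).copy() for zi in z0]
    x = sum(w * zi for w, zi in zip(omega, z))
    for k in range(max_iter):
        g = grad_f(x)
        u = [proxes[i](2 * x - z[i] - gamma * g, gamma / omega[i])
             for i in range(n)]
        ubar = sum(w * ui for w, ui in zip(omega, u))
        # termination criterion of Section 2.4
        if np.sum((x / gamma - g - ubar / gamma + grad_f(ubar)) ** 2) <= eps:
            return ubar, k
        for i in range(n):
            z[i] = z[i] + (u[i] - x)       # lambda_k = 1
        x = sum(w * zi for w, zi in zip(omega, z))
    return ubar, max_iter
```

For example, minimizing $\frac{1}{2}\|x - a\|^2 + \|x\|_1 + \iota_{P_+}(x)$ in $\mathbb{R}^2$ with $a = (3, -3)$ has the unique solution $(2, 0)$, which the sketch recovers up to the chosen tolerance.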
3. NUMERICAL EXPERIMENTS
As an illustrative example, in this section we consider the principal component pursuit (PCP) problem, and apply it to decompose a video sequence into its background and foreground components. The rationale behind this is that since the background is virtually the same in all frames, if the latter are stacked as columns of a matrix, that matrix is likely to be low-rank (even of rank 1 for a perfectly constant background). On the other hand, moving objects appear occasionally on each frame and occupy only a small fraction of it, so the corresponding component would be sparse.
Assume that a real matrix $M$ can be written as
$$M = X_{L,0} + X_{S,0} + N,$$
where $X_{L,0}$ is low-rank, $X_{S,0}$ is sparse and $N$ is a perturbation matrix that accounts for model imperfection. The PCP proposed in [16] attempts to provably recover $(X_{L,0}, X_{S,0})$, to a good approximation, by solving a convex optimization problem. Here, toward an application to video decomposition, we also add a non-negativity constraint to the low-rank component, which leads to the convex problem
$$\min_{X_L, X_S} \frac{1}{2}\|M - X_L - X_S\|_F^2 + \mu_1 \|X_S\|_1 + \mu_2 \|X_L\|_* + \iota_{P_+}(X_L), \qquad (10)$$
where $\|\cdot\|_F$ is the Frobenius norm, $\|\cdot\|_*$ stands for the nuclear norm, and $\iota_{P_+}$ is the indicator function of the nonnegative orthant.
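All three nonsmooth terms in (10) have closed-form proximity operators, which is what makes GFB applicable here. A NumPy sketch (function names are ours):

```python
import numpy as np

def prox_l1(Y, t):
    """Entrywise soft-thresholding: prox of t*||.||_1."""
    return np.sign(Y) * np.maximum(np.abs(Y) - t, 0.0)

def prox_nuclear(Y, t):
    """Singular-value soft-thresholding: prox of t*||.||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

def proj_nonneg(Y, t=None):
    """Projection onto the nonnegative orthant (prox of its indicator)."""
    return np.maximum(Y, 0.0)
```

For instance, soft-thresholding a rank-1 matrix with singular value 5 at level 2 rescales it to singular value 3.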
One can observe that for fixed $X_L$, the minimizer of (10) over $X_S$ is $X_S^\star = \operatorname{prox}_{\mu_1 \|\cdot\|_1}(M - X_L)$. Thus, (10) is equivalent to
$$\min_{X_L} {}^1(\mu_1 \|\cdot\|_1)(M - X_L) + \mu_2 \|X_L\|_* + \iota_{P_+}(X_L), \qquad (11)$$
where ${}^1(\mu_1 \|\cdot\|_1)(M - X_L) = \min_Z \frac{1}{2}\|M - X_L - Z\|_F^2 + \mu_1 \|Z\|_1$ is the Moreau envelope of $\mu_1 \|\cdot\|_1$ of index 1. Since the Moreau envelope is differentiable with a 1-Lipschitz continuous gradient [17], (11) is a special instance of (1) and can be solved using Algorithm 1. Fig. 2 shows the recovered components for a video example. Fig. 1 displays the observed pointwise and ergodic rates and those predicted by Theorems 2.4 and 2.5.
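The Moreau envelope gradient has the classical closed form $\nabla({}^1 F)(Y) = Y - \operatorname{prox}_F(Y)$; for $F = \mu_1\|\cdot\|_1$ this is just an entrywise clipping. A short sketch of the smooth part of (11) under this formula (illustrative, treating $M$ and $X_L$ as plain arrays):

```python
import numpy as np

def grad_moreau_l1(Y, mu):
    """Gradient of the Moreau envelope (index 1) of mu*||.||_1 at Y:
    Y - prox_{mu ||.||_1}(Y), i.e. clip(Y, -mu, mu). It is 1-Lipschitz."""
    return Y - np.sign(Y) * np.maximum(np.abs(Y) - mu, 0.0)

def grad_f_pcp(X_L, M, mu1):
    """Gradient of f(X_L) = {}^1(mu1*||.||_1)(M - X_L), by the chain rule."""
    return -grad_moreau_l1(M - X_L, mu1)
```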
Fig. 1. Observed rates and theoretical bounds for the GFB applied to the PCP problem.
Fig. 2. Left: original frames of a video (436 frames of 384 × 288 pixels). Middle and right: recovered background (low-rank) and foreground (sparse) components.
4. PROOFS

4.1. Preparatory lemmata
Lemma 4.1. Let $T \in \mathcal{A}(\alpha)$. Then $\frac{1}{2\alpha}(\operatorname{Id} - T)$ is firmly nonexpansive, i.e. $\frac{1}{2\alpha}(\operatorname{Id} - T) \in \mathcal{A}(\frac{1}{2})$.
Proof. From the definition, it is straightforward to see that $\frac{1}{2\alpha}(\operatorname{Id} - T) = \frac{1}{2}(\operatorname{Id} + (-R))$.
Lemma 4.2. If the operator $B$ is $\beta$-cocoercive, $\beta > 0$, then $\operatorname{Id} - \gamma B \in \mathcal{A}(\frac{\gamma}{2\beta})$ for $\gamma \in ]0, 2\beta[$.
Proof. This is the Baillon–Haddad theorem, see e.g. [3, Proposition 4.33].
Lemma 4.3. For $e^k$, the following inequality holds:
$$\frac{1}{2\alpha\lambda_k} |||e^k - e^{k+1}|||^2 \leq \langle\langle e^k - \varepsilon^k, e^k - e^{k+1} \rangle\rangle.$$
Proof. Let $E : \boldsymbol{\mathcal{H}} \to \boldsymbol{\mathcal{H}},\ z \mapsto (\mathbf{Id} - T)z$. Then $\frac{1}{2\alpha} E = \frac{1}{2}(\mathbf{Id} + (-R)) \in \mathcal{A}(\frac{1}{2})$ (Lemma 4.1), and thus, $\forall p, q \in \boldsymbol{\mathcal{H}}$,
$$\big|\big|\big|\tfrac{1}{2\alpha} E(p) - \tfrac{1}{2\alpha} E(q)\big|\big|\big|^2 \leq \big\langle\big\langle p - q,\ \tfrac{1}{2\alpha} E(p) - \tfrac{1}{2\alpha} E(q) \big\rangle\big\rangle;$$
substituting $z^k$ and $z^{k+1}$ for $p$ and $q$ yields the result.
Recall $\nu_1 = 2 \sup_{k \in \mathbb{N}} |||T_k z^k - z^\star||| + \sup_{k \in \mathbb{N}} \lambda_k |||\varepsilon^k|||$.
Lemma 4.4. For $z^\star \in \operatorname{Fix}(T)$ and $\lambda_k \in ]0, 1/\alpha[$, we have
$$|||z^{k+1} - z^\star|||^2 \leq |||z^k - z^\star|||^2 - \lambda_k\big(\tfrac{1}{\alpha} - \lambda_k\big)|||e^k|||^2 + \nu_1 \lambda_k |||\varepsilon^k|||.$$
Proof. Proposition 2.1(i) implies that $|||z^k - T z^k|||^2 = \alpha^2 |||z^k - R z^k|||^2$ for some nonexpansive operator $R$. Therefore, using [3, Corollary 2.14] we get
$$|||z^{k+1} - z^\star|||^2 \leq \big(|||T_k z^k - z^\star||| + \lambda_k |||\varepsilon^k|||\big)^2$$
$$\leq (1 - \alpha\lambda_k)|||z^k - z^\star|||^2 + \alpha\lambda_k |||R z^k - z^\star|||^2 - \alpha\lambda_k(1 - \alpha\lambda_k)|||z^k - R z^k|||^2 + \nu_1 \lambda_k |||\varepsilon^k|||$$
$$\leq |||z^k - z^\star|||^2 - \lambda_k\big(\tfrac{1}{\alpha} - \lambda_k\big)|||e^k|||^2 + \nu_1 \lambda_k |||\varepsilon^k|||.$$
Recall $\nu_2 = 2 \sup_{k \in \mathbb{N}} |||e^k - e^{k+1}|||$.
Lemma 4.5. For $\lambda_k \in ]0, 1/\alpha[$, the sequence $(e^k)_{k \in \mathbb{N}}$ obeys
$$|||e^{k+1}|||^2 - \nu_2 |||\varepsilon^k||| \leq |||e^k|||^2.$$
Proof. Using Lemma 4.3, we get
$$|||e^{k+1}|||^2 = |||e^k|||^2 - 2\langle\langle e^k, e^k - e^{k+1} \rangle\rangle + |||e^k - e^{k+1}|||^2$$
$$\leq |||e^k|||^2 - \frac{1 - \alpha\lambda_k}{\alpha\lambda_k} |||e^k - e^{k+1}|||^2 + \nu_2 |||\varepsilon^k|||$$
$$\leq |||e^k|||^2 + \nu_2 |||\varepsilon^k|||.$$
4.2. Proofs of main results

Proof of Theorem 2.2.
(i) This is an adaptation of [15, Lemma 5.1].
(ii) (i) implies that $|||e^k|||$ and $|||z^k - z^\star|||$ are bounded since $(\lambda_k |||\varepsilon^k|||)_{k \in \mathbb{N}} \in \ell^1_+$; therefore $\nu_1$ and $\nu_2$ are bounded constants. Then from Lemma 4.5, $\forall j \leq k$,
$$|||e^k|||^2 - \nu_2 \sum_{\ell=j}^{k-1} |||\varepsilon^\ell||| \leq |||e^j|||^2.$$
Inserting this in Lemma 4.4, and summing up over $j$, we get
$$\sum_{j=0}^k \tau_j |||e^k|||^2 \leq d_0^2 + \nu_1 \sum_{j=0}^k \lambda_j |||\varepsilon^j||| + \nu_2 \sum_{j=0}^k \tau_j \sum_{\ell=j}^{k-1} |||\varepsilon^\ell|||, \qquad (12)$$
whence we obtain
$$(k+1)\,\underline{\tau}\, |||e^k|||^2 \leq d_0^2 + \nu_1 \sum_{j=0}^k \lambda_j |||\varepsilon^j||| + \nu_2 \overline{\tau} \sum_{\ell=0}^{k-1} (\ell+1) |||\varepsilon^\ell|||.$$
Assumption (7) then yields the bound (8).
(iii) That $\frac{1}{2\alpha} \leq \lambda_k \leq \sup_{k \in \mathbb{N}} \lambda_k < \frac{1}{\alpha}$ is non-decreasing implies that $\tau_k$ is non-increasing. Hence from (12), we get (9).
Proof of Theorem 2.3. Nonexpansiveness of $T_k$ implies
$$|||z^{k+1} - z^\star||| \leq |||z^0 - z^\star||| + \sum_{j=0}^k \lambda_j |||\varepsilon^j|||.$$
Combining this with the definition of $\bar{e}^k$, we arrive at
$$|||\bar{e}^k||| \leq \frac{1}{\Lambda_k}\Big(|||z^0 - z^{k+1}||| + \sum_{j=0}^k \lambda_j |||\varepsilon^j|||\Big) \leq 2(d_0 + C_3)/\Lambda_k.$$
Proof of Theorem 2.4. From the update formula of $u_i^{k+1}$, we get
$$\frac{1}{\gamma} x^k - \nabla f(x^k) - \frac{1}{\gamma} \sum_i \omega_i u_i^{k+1} + \nabla f\big(\sum_i \omega_i u_i^{k+1}\big) \in \sum_i \partial h_i(u_i^{k+1}) + \nabla f\big(\sum_i \omega_i u_i^{k+1}\big).$$
Since $\operatorname{Id} - \gamma \nabla f \in \mathcal{A}(\frac{\gamma}{2\beta})$ (Lemma 4.2) is in particular nonexpansive, using Theorem 2.2(iii) we arrive at
$$\big\|g^k + \nabla f\big(\sum_i \omega_i u_i^{k+1}\big)\big\| \leq \frac{1}{\gamma}\big\|x^k - \sum_i \omega_i u_i^{k+1}\big\| \leq \frac{1}{\gamma}|||x^k - u^{k+1}||| = \frac{1}{\gamma}|||e^k||| \leq \frac{1}{\gamma}\sqrt{\frac{d_0^2 + C_2}{\tau_k(k+1)}}.$$
Proof of Theorem 2.5. Owing to Theorem 2.4, we get
$$\big\|\bar{g}^k + \nabla f\big(\sum_i \omega_i \bar{u}_i^k\big)\big\| \leq \frac{1}{\gamma}\big\|\bar{x}^k - \sum_i \omega_i \bar{u}_i^k\big\| \leq \frac{1}{\gamma}|||\bar{x}^k - \bar{u}^k||| \leq \frac{1}{\gamma}|||\bar{e}^k||| \leq 2(d_0 + C_3)/(\gamma \Lambda_k).$$
5. REFERENCES
[1] N. Ogura and I. Yamada, "Non-strictly convex minimization over the fixed point set of an asymptotically shrinking nonexpansive mapping," Numerical Functional Analysis and Optimization, vol. 23, no. 1-2, pp. 113-137, 2002.
[2] H. Raguet, J. Fadili, and G. Peyré, "Generalized forward-backward splitting," SIAM Journal on Imaging Sciences, vol. 6, no. 3, pp. 1199-1226, 2013.
[3] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, 2011.
[4] P. L. Combettes and V. R. Wajs, "Signal recovery by proximal forward-backward splitting," Multiscale Modeling & Simulation, vol. 4, no. 4, pp. 1168-1200, 2005.
[5] P. L. Lions and B. Mercier, "Splitting algorithms for the sum of two nonlinear operators," SIAM Journal on Numerical Analysis, vol. 16, no. 6, pp. 964-979, 1979.
[6] P. L. Combettes and J.-C. Pesquet, "Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators," Set-Valued and Variational Analysis, vol. 20, no. 2, pp. 307-330, 2012.
[7] B. C. Vũ, "A splitting algorithm for dual monotone inclusions involving cocoercive operators," Advances in Computational Mathematics, vol. 38, pp. 667-681, 2013.
[8] L. Condat, "A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms," Journal of Optimization Theory and Applications, vol. 158, pp. 460-479, 2013.
[9] R. I. Bot and C. Hendrich, "Convergence analysis for a primal-dual monotone + skew splitting algorithm with applications to total variation minimization," arXiv preprint arXiv:1211.1706, 2012.
[10] R. I. Bot, E. R. Csetnek, and A. Heinrich, "On the convergence rate improvement of a primal-dual splitting algorithm for solving monotone inclusion problems," Tech. Rep. arXiv:1303.2875v1, 2013.
[11] R. D. C. Monteiro and B. F. Svaiter, "On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean," SIAM Journal on Optimization, vol. 20, no. 6, pp. 2755-2787, 2010.
[12] R. D. C. Monteiro and B. F. Svaiter, "Complexity of variants of Tseng's modified F-B splitting and Korpelevich's methods for hemivariational inequalities with applications to saddle-point and convex optimization problems," SIAM Journal on Optimization, vol. 21, no. 4, pp. 1688-1720, 2011.
[13] R. Cominetti, J. A. Soto, and J. Vaisman, "On the rate of convergence of Krasnoselski-Mann iterations and their connection with sums of Bernoullis," arXiv preprint arXiv:1206.4195, 2012.
[14] B. He and X. Yuan, "On convergence rate of the Douglas-Rachford operator splitting method," Mathematical Programming, under revision, 2011.
[15] P. L. Combettes, "Solving monotone inclusions via compositions of nonexpansive averaged operators," Optimization, vol. 53, no. 5-6, pp. 475-504, 2004.
[16] E. J. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?," Journal of the ACM, vol. 58, no. 3, p. 11, 2011.
[17] J. J. Moreau, "Décomposition orthogonale d'un espace hilbertien selon deux cônes mutuellement polaires," C. R. Acad. Sci. Paris, vol. 255, pp. 238-240, 1962.