

3.4.2 Linear Rate of Convergence Analysis for GP

We assume that SRIP(2s, √2) holds. We will consider two versions of algorithm (GP). The first one is with a constant stepsize, where we assume that

T_k = T̄ ∈ (μ_{2s}², 2ν_{2s}²),

where μ_{2s}, ν_{2s} are as in the definition of SRIP. An evident drawback of the fixed stepsize setting is the requirement that at least μ_{2s} should be known. To avoid the need for knowing this parameter, we also introduce a variant of the method with a backtracking stepsize rule that does not require the knowledge of μ_{2s} for computational implementation; see Remark 3.10. This backtracking procedure requires that SRIP should hold with a parameter α which is smaller than √2 (but, on the other hand, can be arbitrarily close to √2).

Gradient Projection with backtracking:

Input: φ ∈ S, s > 0, x⁰ ∈ C_s arbitrary, η > 1 – backtracking parameter, T₀ ∈ (0, μ_{2s}²) – initial stepsize.

Step k (k ≥ 0):

(a) Compute x^{k+1} ∈ P_{C_s}(x^k − (1/T_k)∇f(x^k)).
(b) If ‖A(x^{k+1} − x^k)‖ > √T_k ‖x^{k+1} − x^k‖, set T_k ← ηT_k and go back to (a).
(c) Set T_{k+1} ← T_k.
(d) Set k ← k + 1.
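The following is a minimal numerical sketch of steps (a)–(d), specialized (as an assumption, not something fixed by the text) to the compressed sensing setting of Example 3.2, where E = Rⁿ, V = Rᵐ, φ(x) = ‖x‖₀ and P_{C_s} keeps the s entries of largest magnitude. The names hard_threshold and gp_backtracking are illustrative only.

import numpy as np

def hard_threshold(z, s):
    # Projection onto C_s for phi(x) = ||x||_0: keep the s largest-magnitude entries of z.
    x = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-s:]
    x[keep] = z[keep]
    return x

def gp_backtracking(A, b, s, T0, eta=1.1, max_iter=500, tol=1e-12):
    # Gradient projection with backtracking for f(x) = 0.5*||Ax - b||^2 over C_s.
    # T0 is assumed to lie in (0, mu_{2s}^2); see Remark 3.10.
    x = np.zeros(A.shape[1])          # x^0 = 0 belongs to C_s
    T = T0
    for _ in range(max_iter):
        grad = A.T @ (A @ x - b)      # gradient of f at x^k
        while True:                   # steps (a)-(b)
            x_new = hard_threshold(x - grad / T, s)
            d = x_new - x
            if np.linalg.norm(A @ d) <= np.sqrt(T) * np.linalg.norm(d) + 1e-15:
                break                 # stepsize test passed
            T *= eta                  # enlarge T and recompute x^{k+1}
        x = x_new                     # steps (c)-(d): T is carried over to the next iteration
        if 0.5 * np.linalg.norm(A @ x - b) ** 2 < tol:
            break
    return x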

Remark 3.10. It is very easy to find a T₀ ∈ (0, μ_{2s}²) without actually knowing μ_{2s}. For example, by taking an arbitrary v ∈ C_{2s}, we get that ‖A(v)‖/‖v‖ ≤ μ_{2s}, so we can pick T₀ ∈ (0, ‖A(v)‖²/‖v‖²).
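A possible implementation of the initialization suggested in Remark 3.10, again for the l₀ setting: pick an arbitrary 2s-sparse vector v, evaluate ‖Av‖²/‖v‖² (which is at most μ_{2s}²), and take T₀ strictly below it; the safety factor 0.9 below is an arbitrary illustrative choice.

import numpy as np

def initial_stepsize(A, s, rng=None):
    # Pick T0 in (0, mu_{2s}^2) without knowing mu_{2s} (Remark 3.10):
    # for any v in C_{2s}, ||A v||^2 / ||v||^2 <= mu_{2s}^2.
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    v = np.zeros(n)
    support = rng.choice(n, size=min(2 * s, n), replace=False)
    v[support] = rng.standard_normal(support.size)
    return 0.9 * np.linalg.norm(A @ v) ** 2 / np.linalg.norm(v) ** 2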

From the definition of the backtracking procedure, we first establish the following useful fact on the inverse stepsize T_k.

Proposition 3.11. For all k ≥ 0,

T_k ≤ ημ_{2s}².  (3.9)

Proof. This is proved by induction on k. For k = 0 the claim is valid by the choice of T₀. Suppose that the claim is true for k; we will prove it for k + 1. If no backtracking steps were done in step (b), then T_{k+1} = T_k and the claim is correct by the induction assumption. Otherwise, if backtracking steps were performed during step (b), then, in particular, γ = T_{k+1}/η satisfies ‖A(x^{k+2} − x^{k+1})‖ > √γ ‖x^{k+2} − x^{k+1}‖, which together with the fact that x^{k+2} − x^{k+1} ∈ C_{2s} and the SRIP assumption implies that γ ≤ μ_{2s}², and hence T_{k+1} ≤ ημ_{2s}².

We are now ready to prove our main result, which shows that if SRIP(2s, √2/η) is satisfied, then the function values of the sequence generated by the above algorithm converge linearly to zero. For example, if η = 1.1, then SRIP(2s, 1.285...) is required to hold true instead of SRIP(2s, 1.414...).

Theorem 3.12. Consider the GP method with either a constant stepsize T_k = T̄ ∈ (μ_{2s}², 2ν_{2s}²) or with a backtracking stepsize rule with parameter η, and suppose that SRIP(2s, √2/ξ) is satisfied, where ξ = 1 for the constant stepsize setting and ξ = η > 1 for the backtracking scenario. Then

f(x^{k+1}) ≤ (ρ − 1) f(x^k),  ∀ k ≥ 0,

with ρ < 2 given by

ρ = T̄/ν_{2s}²   (constant stepsize),
ρ = ημ_{2s}²/ν_{2s}²   (backtracking).

As a consequence,

f(x^{k+1}) ≤ (ρ − 1)^k f(x⁰)  for every k ≥ 0,

and f(x^k) → 0 as k → ∞.
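Before turning to the proof, a small illustration with hypothetical values (not taken from the text): suppose ν_{2s} = 1, μ_{2s} = 1.1 and η = 1.1. Then SRIP(2s, √2/η) holds, since μ_{2s}/ν_{2s} = 1.1 < √2/1.1 ≈ 1.285, and the backtracking rate satisfies ρ − 1 = ημ_{2s}²/ν_{2s}² − 1 = 1.331 − 1 = 0.331; with a constant stepsize, any T̄ ∈ (1.21, 2), say T̄ = 1.5, gives ρ − 1 = 0.5. In both cases every iteration reduces f by at least a fixed factor (about 3 and 2, respectively).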

Proof. Let

q_k(x, x^k) := f(x^k) + ⟨x − x^k, ∇f(x^k)⟩ + (T_k/2)‖x − x^k‖².  (3.10)

Then the GP method can be equivalently rewritten as

x^{k+1} ∈ argmin { q_k(x, x^k) : x ∈ C_s },

and hence, in particular, for a solution x̄ of (NCF) it holds that

q_k(x^{k+1}, x^k) ≤ q_k(x̄, x^k).  (3.11)

Now, since f(x) = (1/2)‖A(x) − b‖², it follows that

f(x^{k+1}) = f(x^k) + ⟨x^{k+1} − x^k, ∇f(x^k)⟩ + (1/2)‖A(x^{k+1} − x^k)‖²
           ≤ f(x^k) + ⟨x^{k+1} − x^k, ∇f(x^k)⟩ + (T_k/2)‖x^{k+1} − x^k‖²,

where the last inequality follows from the fact that x^k − x^{k+1} ∈ C_{2s} (by the subadditivity and symmetry of the function φ ∈ S) and from the fact that the definition of the stepsize (in the constant or backtracking settings) implies that ‖A(x^{k+1} − x^k)‖ ≤ √T_k ‖x^{k+1} − x^k‖. Therefore, we have shown that f(x^{k+1}) ≤ q_k(x^{k+1}, x^k), so that

f(x^{k+1}) ≤ q_k(x^{k+1}, x^k) ≤ q_k(x̄, x^k),  (3.12)

where the second inequality is (3.11). On the other hand,

q_k(x̄, x^k) = f(x^k) + ⟨x̄ − x^k, ∇f(x^k)⟩ + (T_k/2)‖x̄ − x^k‖²
            ≤ f(x^k) + ⟨x̄ − x^k, ∇f(x^k)⟩ + (T_k/(2ν_{2s}²))‖A(x̄ − x^k)‖²
            = f(x^k) + ⟨x̄ − x^k, ∇f(x^k)⟩ + (T_k/(2ν_{2s}²))‖b − A(x^k)‖²
            = (1 + T_k/ν_{2s}²) f(x^k) + ⟨x̄ − x^k, ∇f(x^k)⟩
            = (1 + T_k/ν_{2s}²) f(x^k) − 2 f(x^k)
            = (T_k/ν_{2s}² − 1) f(x^k),

where the inequality uses x̄ − x^k ∈ C_{2s} together with the SRIP assumption, the next equality uses A(x̄) = b, and the identity ⟨x̄ − x^k, ∇f(x^k)⟩ = ⟨A(x̄) − A(x^k), A(x^k) − b⟩ = −‖A(x^k) − b‖² = −2 f(x^k) is used in the last steps. Combining this with (3.12), and with (3.9) in the backtracking setting, implies the result.

Corollary 3.13. Suppose that SRIP(2s, √2) holds true. Then the sequence {x^k} generated by GP converges to the unique optimal solution of (NC), and hence of (NCF).

Proof. Let {x^k} be the sequence generated by the GP method with a constant stepsize T̄ ∈ (μ_{2s}², 2ν_{2s}²), and let x̄ be a solution of (NCF). Then by Theorem 3.12 we have that

f(x^k) ≤ (ρ − 1)^{k−1} f(x⁰).

On the other hand,

f(x^k) = (1/2)‖A(x^k) − b‖² = (1/2)‖A(x^k) − A(x̄)‖² ≥ (ν_{2s}²/2)‖x^k − x̄‖²,

where the inequality uses x^k − x̄ ∈ C_{2s} and the SRIP assumption. Therefore x^k → x̄, and since x̄ was chosen as an arbitrary solution, its uniqueness follows.
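In particular, combining the two displays gives the explicit bound ‖x^k − x̄‖² ≤ (2/ν_{2s}²) f(x^k) ≤ (2/ν_{2s}²)(ρ − 1)^{k−1} f(x⁰), so the iterates themselves converge to x̄ R-linearly, with contraction factor √(ρ − 1) per iteration in the norm.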

The next corollary follows immediately from Theorem 3.12 and bounds the number of iterations required to obtain an ε-optimal solution of (NC).

Corollary 3.14. Consider the setting of Theorem 3.12. Then for every

k ≥ 1 + (log(1/ε) + C)/D,

the (GP) algorithm produces an x^k such that

‖A(x^k) − b‖² ≤ ε,

where C := log(2 f(x⁰)) and D := log(1/(ρ − 1)).
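A small sketch evaluating the bound of Corollary 3.14; the values of ρ, f(x⁰) and ε in the usage line are illustrative only and are not taken from the text.

import math

def iteration_bound(eps, f_x0, rho):
    # Smallest integer k with k >= 1 + (log(1/eps) + C)/D, where
    # C = log(2 f(x^0)) and D = log(1/(rho - 1)); requires 1 < rho < 2.
    C = math.log(2.0 * f_x0)
    D = math.log(1.0 / (rho - 1.0))
    return math.ceil(1.0 + (math.log(1.0 / eps) + C) / D)

print(iteration_bound(eps=1e-8, f_x0=10.0, rho=1.331))  # illustrative values only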

Remark 3.15. Recently, an algorithm called the iterative M-sparse algorithm was analyzed in [5] for solving the l₀ problem (E = Rⁿ, V = Rᵐ, φ(x) = ‖x‖₀). This method is in fact nothing else but the gradient projection algorithm with a constant stepsize fixed and equal to 1. It was proved in [5] that if the matrix A is normalized so that ‖A‖₂ < 1, then this algorithm converges to a local minimum of (NC).
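For concreteness, a minimal sketch of that iteration (gradient projection with the constant stepsize T_k = 1, i.e., iterative hard thresholding) in the l₀ setting; the function name and the fixed iteration count are illustrative assumptions.

import numpy as np

def iht(A, b, s, num_iter=500):
    # Iterative hard thresholding: gradient projection with constant stepsize T_k = 1 (cf. [5]).
    x = np.zeros(A.shape[1])
    for _ in range(num_iter):
        z = x - A.T @ (A @ x - b)          # gradient step with unit stepsize
        x = np.zeros_like(z)
        keep = np.argsort(np.abs(z))[-s:]  # keep the s largest-magnitude entries
        x[keep] = z[keep]
    return x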

Our approach has focused on using SRIP to solve (NCF) directly via a simple gradient projection method. On the other hand, RIP was used to determine conditions that warrant recovery of solutions of nonconvex optimization problems such as (CS) and (AR) by solving their convex relaxations (ConvCS) and (ConvAR), respectively; that is, one still needs to apply a convex minimization scheme to the relaxed problem in order to achieve the same goal. While a direct comparison of these results is not fully transparent (e.g., in terms of complexity, the parameters involved, etc.), it is nevertheless worthwhile to make the following remarks.

Remark 3.16. In the (CS) case, if RIP(2s, δ_{2s}) is satisfied with δ_{2s} < √2 − 1, then this implies (by Lemma 3.9) that SRIP(2s, α) holds true with

α = √[(1 + (√2 − 1)) / (1 − (√2 − 1))] = √[1/(√2 − 1)] = 1.5538...,

which is less restrictive than the assumption α = √2 used in Theorem 3.12. On the other hand, note that the latter condition does not imply the condition on RIP of Theorem 3.6(ii).

Remark 3.17. In the (AR) case, we can be more precise. The condition in Theorem 3.7(ii) requires that RIP should hold with δ_{5s} < 0.1. This condition is worse than the assumption of Theorem 3.12. Indeed, by Lemma 3.9 it implies that SRIP(5s, α) is satisfied with α = √(1.1/0.9) = 1.105..., which is more restrictive than the condition SRIP(2s, √2); see Lemma 3.8.

We end by showing another interesting consequence of Theorem 3.12 which is particularly relevant to sparse recovery problems. Let us focus again on the setting of problem (CS) in Example 3.2, that is, E = Rⁿ, V = Rᵐ and φ(x) = ‖x‖₀. The support of a vector x ∈ Rⁿ is defined to be the set of indices of its nonzero components:

supp(x) = { i ∈ {1, 2, ..., n} : x_i ≠ 0 }.

Our final result shows stabilization of the support in the sense that, from a certain iteration of (GP) onward, the support of x^k contains the support of the unique solution of (NCF).

Corollary 3.18. Consider the setting of Theorem 3.12 and let E = Rⁿ, V = Rᵐ and φ(x) = ‖x‖₀. Let x̄ be the unique solution of (NCF). Then there exists k̄ such that for every k ≥ k̄ the inclusion

supp(x^k) ⊇ supp(x̄)

holds true.

Proof. For every set S ⊆ {1, 2, ..., n}, let us define

f_S = min { (1/2)‖Ax − b‖² : x_i = 0, i ∉ S }.  (3.13)

If |S| ≤ s and supp(x̄) ⊈ S, then f_S > 0, since otherwise, if f_S = 0, the optimal solution of (3.13) would be a solution of (NCF) with at most s nonzero components and hence, by the uniqueness of x̄, would equal x̄, in contradiction to supp(x̄) ⊈ S. Let us now define the number

g = min_S { f_S : |S| ≤ s, S ⊆ {1, 2, ..., n}, supp(x̄) ⊈ S },  (3.14)

which is positive, being the minimum of finitely many positive numbers. Now, since f(x^k) → 0, it follows that there exists k̄ such that

f(x^k) < g  (3.15)

for all k ≥ k̄. Let k ≥ k̄ and let us assume in contradiction that supp(x^k) ⊉ supp(x̄). Then, since x^k ∈ C_s and hence |supp(x^k)| ≤ s,

f(x^k) ≥ f_{supp(x^k)} ≥ g,

where the first inequality holds because x^k is feasible for (3.13) with S = supp(x^k), and the last inequality follows from the definition of g. This is a contradiction to (3.15).
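A small sketch of how the support inclusion of Corollary 3.18 could be monitored numerically along a run of (GP); the helper names are illustrative, and the list of iterates xs is assumed to have been collected separately.

import numpy as np

def supp(x, tol=0.0):
    # Support of x: indices i with |x_i| > tol (tol = 0 reproduces the definition above).
    return set(np.flatnonzero(np.abs(x) > tol))

def first_stabilization_index(xs, x_bar):
    # Smallest k_bar such that supp(x_bar) is contained in supp(x^k) for all k >= k_bar.
    contained = [supp(x_bar) <= supp(x) for x in xs]
    for k in range(len(xs)):
        if all(contained[k:]):
            return k
    return None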

Appendix

We briefly summarize some of the first results providing sufficient conditions warranting recovery of sparse vectors for the compressed sensing l₀-minimization problem via the convex l₁-norm problem (ConvCS). These were obtained via the concept of mutual coherence of a matrix; see [6] for more details and references.

Definition 3.19. [11] Let A = [a₁, ..., aₙ] ∈ R^{m×n} with m ≤ n and with normalized columns, ‖aᵢ‖ = 1 for all i = 1, ..., n. Then the mutual coherence M(A) of the matrix A is defined by

M(A) := max_{i ≠ j} |⟨aᵢ, aⱼ⟩| = max_{i ≠ j} |(AᵀA)_{ij}|.

Clearly, 0 ≤ M(A) ≤ 1. Furthermore, it has been shown that

M(A) ≥ √[(n − m)/(m(n − 1))].

Note that the mutual coherence of a matrix is generally easy to compute even for large matrices.
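Indeed, with unit-norm columns M(A) is just the largest off-diagonal entry of |AᵀA|. The sketch below (function name illustrative) renormalizes the columns, computes M(A), and also evaluates the lower bound √((n − m)/(m(n − 1))) quoted above for a random Gaussian matrix.

import numpy as np

def mutual_coherence(A):
    # M(A) = max_{i != j} |<a_i, a_j>| after normalizing the columns of A.
    A = A / np.linalg.norm(A, axis=0)
    G = np.abs(A.T @ A)
    np.fill_diagonal(G, 0.0)
    return G.max()

m, n = 64, 256
A = np.random.default_rng(0).standard_normal((m, n))
print(mutual_coherence(A))                 # coherence of a random Gaussian matrix
print(np.sqrt((n - m) / (m * (n - 1))))    # lower bound quoted above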

Using the mutual coherence of a matrix given in Definition 3.19, the following sufficient condition relating (CS) to its convex relaxation (ConvCS) was proven in [15].

Theorem 3.20. [15] Consider problem (CS) with A(x) ≡ Ax. If a solution x ∈ Rⁿ of problem (CS) satisfies

‖x‖₀ < (1/2)(1 + 1/M(A)),

then it is unique and coincides with the optimal solution of the convex problem (ConvCS).

We note that for a special class of matrices which are the concatenation of two orthogonal square matrices U, V, i.e., with A := [U, V], the above result has been improved in [16] by requiring the weaker condition

‖x‖₀ < (√2 − 1/2)/M(A).

Further results in the same spirit have been derived for noisy compressed sensing; see, e.g., [8].

Finally, we note that the mutual coherence M(A) given in Definition 3.19 is closely related to RIP, as shown in the following result.

Lemma 3.21. Let A = [a₁, ..., aₙ] ∈ R^{m×n} with m ≤ n and with normalized columns, ‖aᵢ‖ = 1 for all i = 1, ..., n. Then the matrix A with mutual coherence M(A) satisfies RIP(s, δ_s) with δ_s = (s − 1)M(A).

Proof. This follows immediately from the definition of RIP and the Gershgorin circle theorem (see, e.g., [19]): for any index set S with |S| = s, the Gram matrix A_Sᵀ A_S has unit diagonal and off-diagonal entries bounded in absolute value by M(A), so all of its eigenvalues lie in the interval [1 − (s − 1)M(A), 1 + (s − 1)M(A)].

Acknowledgements We thank two anonymous referees for their useful comments and suggestions. This research was partially supported by the Israel Science Foundation under ISF Grant 489-06.

References

1. Bauschke, H.H., Borwein, J.M.: On projection algorithms for solving convex feasibility problems. SIAM Review 38, 367–426 (1996)
2. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sciences 2, 183–202 (2009)
3. Beck, A., Teboulle, M.: Gradient-based algorithms with applications to signal recovery problems. In: D. Palomar, Y. Eldar (eds.) Convex Optimization in Signal Processing and Communications, 42–88. Cambridge University Press (2010)
4. Bertsekas, D.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont, MA (1999)
5. Blumensath, T., Davies, M.E.: Iterative hard thresholding for sparse approximations. The Journal of Fourier Analysis and Applications 14, 629–654 (2008)
6. Bruckstein, A.M., Donoho, D.L., Elad, M.: From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review 51, 34–81 (2009)
7. Cadzow, J.A.: Signal enhancement – a composite property mapping algorithm. IEEE Trans. Acoustics, Speech, Signal Process. 36, 49–62 (1988)
8. Candès, E.J., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory 52, 489–509 (2006)
9. Candès, E.J.: The restricted isometry property and its implications for compressed sensing. Comptes Rendus de l'Académie des Sciences, Paris, Série I 346, 589–592 (2008)
10. Chartrand, R.: Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process. Letters 14, 707–710 (2007)
11. Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by basis pursuit. SIAM Review 43, 129–159 (2001)
12. Combettes, P.L.: The foundations of set theoretic estimation. Proc. IEEE 81, 182–208 (1993)
13. Combettes, P.L., Trussell, H.J.: Method of successive projections for finding a common point of sets in metric spaces. J. Optim. Theory Appl. 67, 487–507 (1990)
14. Donoho, D.L.: Compressed sensing. IEEE Transactions on Information Theory 52, 1289–1306 (2006)
15. Donoho, D.L., Huo, X.: Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inform. Theory 47, 2845–2862 (2001)
16. Elad, M., Bruckstein, A.M.: A generalized uncertainty principle and sparse representation in pairs of bases. IEEE Trans. Inform. Theory 48, 2558–2567 (2002)
17. Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Mathematics and its Applications, vol. 375. Kluwer Academic Publishers Group, Dordrecht (1996)
18. Fazel, M., Hindi, H., Boyd, S.: Rank minimization and applications in system theory. In: American Control Conference, 3272–3278 (2004)
19. Golub, G., Van Loan, C.: Matrix Computations, 3rd edn. Johns Hopkins University Press (1996)
20. Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Computing 24, 227–234 (1995)
21. Recht, B., Fazel, M., Parrilo, P.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review 52, 471–501 (2010)
22. Santosa, F., Symes, W.W.: Linear inversion of band-limited reflection seismograms. SIAM J. Sci. Statist. Comput. 7, 1307–1330 (1986)
23. Taylor, H.L., Banks, S.C., McCoy, J.F.: Deconvolution with the l₁ norm. Geophysics 44, 39–52 (1979)
