Sparse Regularization: Convergence Of Iterative Jumping Thresholding Algorithm

Jinshan Zeng, Shaobo Lin, and Zongben Xu

Abstract—In recent studies on sparse modeling, non-convex penalties have received considerable attention due to their superiority over convex counterparts in inducing sparsity. In this paper, we study the convergence of a non-convex iterative thresholding algorithm for solving a class of sparse regularized optimization problems whose thresholding functions are discontinuous with jump discontinuities. Therefore, we call the algorithm the iterative jumping thresholding (IJT) algorithm. The finite support and sign convergence of IJT algorithm is first verified by taking advantage of such jump discontinuities. Together with the introduced restricted Kurdyka–Łojasiewicz (rKL) property, the global convergence¹ of the entire sequence can then be proved. Furthermore, we show that IJT algorithm converges to a strictly local minimizer at an eventual linear rate² under some additional conditions. Moreover, we derive an a posteriori computable error estimate, which can be used to design an efficient termination rule. It should be pointed out that the ℓq quasi-norm (0 < q < 1) is an important subclass of the non-convex penalties studied in this paper. In particular, when applied to ℓq regularization, IJT algorithm can converge to a local minimizer with an eventual linear rate under certain concentration conditions.

We also apply the proposed algorithm to sparse signal recovery and synthetic aperture radar imaging problems. The experimental results demonstrate the effectiveness of the proposed algorithm.

Index Terms—Sparse regularization, non-convex optimization, iterative thresholding algorithm, ℓq regularization (0 < q < 1), Kurdyka–Łojasiewicz inequality.

I. INTRODUCTION

THE sparse regularized optimization problems emerging in many areas of scientific research and engineering practice have attracted considerable attention in recent years. Typical applications include regression [37], visual coding [32], signal processing [20], compressed sensing [10], [23], and microwave imaging [40]. These problems can be intuitively modeled as the following ℓ0 quasi-norm regularized optimization problem

$$\min_{x\in\mathbb{R}^N}\ \{F(x) + \lambda\|x\|_0\}, \qquad (1)$$

Manuscript received July 27, 2015; revised March 14, 2016 and May 26, 2016; accepted July 17, 2016. Date of publication July 28, 2016; date of current version August 08, 2016. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Ami Wiesel. The work of J. Zeng was supported in part by the National Science Foundation (NSF) under Grant 11501440. The work of S. Lin was supported in part by the NSF under Grants 61502342 and 11401462. (Corresponding author: Shaobo Lin.)

J. Zeng is with the College of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, China (e-mail: jsh.zeng@gmail.com).

S. Lin is with the College of Mathematics and Information Science, Wenzhou University, Wenzhou 325035, China (e-mail: sblin1983@gmail.com).

Z. Xu is with the School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China (e-mail: zbxu@mail.xjtu.edu.cn).

Color versions of one or more of the figures in this letter are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSP.2016.2595499

¹The global convergence in this paper is defined in the sense that the entire sequence converges regardless of the initial point.

²It is also known as the asymptotic or local linear rate in other papers.

where F : R^N → [0,∞) is a proper lower semi-continuous function, ‖x‖_0, commonly called the ℓ0 quasi-norm, denotes the number of nonzero components of x, and λ > 0 is a regularization parameter. Some efficient algorithms, including the iterative hard thresholding algorithm [3], [29], were developed to solve (1).

Besides the ℓ0 regularized optimization problem, a more general class of problems is considered in both practice and theory, that is,

$$\min_{x\in\mathbb{R}^N}\ \{F(x) + \lambda\Phi(x)\}, \qquad (2)$$

where Φ(x) is a certain separable, continuous penalty with Φ(x) = Σ_{i=1}^{N} φ(|x_i|) and x = (x_1, ···, x_N)^T. One of the most important cases is the ℓ1-norm with Φ(x) = ‖x‖_1 = Σ_{i=1}^{N} |x_i|. The ℓ1-norm is convex, and thus the corresponding ℓ1-norm regularized optimization problem can be efficiently solved. Nevertheless, the ℓ1-norm may not induce adequate sparsity when applied to certain applications [13]. Alternatively, many non-convex penalties were proposed as relaxations of the ℓ0 quasi-norm. Some typical non-convex examples are the ℓq quasi-norm (0 < q < 1) [13], [14], [39], the smoothly clipped absolute deviation (SCAD) [24], and the log-sum penalty [11]. Compared with the ℓ1-norm, the non-convex penalties can usually induce better sparsity, while the corresponding non-convex regularized optimization problems are generally more difficult to solve.

There are mainly four classes of algorithms for solving the non-convex regularized optimization problem (2). The first one is the half-quadratic (HQ) algorithm [26], [27]. HQ algorithms can be efficient when both subproblems are easy to solve (particularly, when both subproblems have closed-form solutions). The second class is the iterative reweighted algorithms, including iterative reweighted least squares (IRLS) minimization ([15], [21], [28], [30]) and iterative reweighted ℓ1-minimization (IRL1) [11]. The basic idea of the iterative reweighted algorithms is to obtain an approximate sparse solution via solving a sequence of weighted least squares (or ℓ1-minimization) problems. The third class is the difference of convex functions algorithm (DC programming) [25]. The DC programming method first converts the original problem into the difference of two convex problems (called the primal and dual problems, respectively), and then iteratively optimizes these two problems. The last class is the iterative thresholding algorithm, which fits the framework of the forward-backward splitting (FBS) algorithm [2] and the generalized gradient projection method [7] when applied to a separable non-convex penalty. Some typical iterative thresholding algorithms include the iterative hard [3], soft [22] and half [39] thresholding algorithms.


Compared to other types of non-convex algorithms such as the HQ, IRLS, IRL1 and DC programming algorithms, the iterative thresholding algorithm is easy to implement and has almost the least computational complexity for large-scale problems (see [40] for instance).

Although the effectiveness of the iterative thresholding algorithms for non-convex regularized optimization problems has been verified in many applications, the convergence of most of these algorithms, except for the iterative hard [29] and half [41] thresholding algorithms, has not been thoroughly investigated. Basically, three questions should be answered: when, where, and how fast does the algorithm converge?

A. Main Contribution

In this paper, we give the convergence analysis for the iterative jumping thresholding algorithm (called IJT algorithm henceforth) for solving a certain class of non-convex regularized optimization problems. The main contributions can be summarized as follows:

a) We prove that the supports and signs of any sequence generated by IJT algorithm converge within finitely many iterations.

b) Under the further assumption that there exists a limit point at which the objective function satisfies the so-called restricted Kurdyka–Łojasiewicz (rKL) property (see Definition 2), the whole sequence converges to this point (see Theorem 1).

c) Under certain second-order conditions, we demonstrate that IJT algorithm converges to a strictly local minimizer at an eventual linear rate (see Theorems 2 and 3).

d) When applied to the ℓq (0 < q < 1) regularization, IJT algorithm converges to a local minimizer at an eventual linear rate as long as the matrix satisfies a certain concentration property (see Theorem 4).

B. Notations and Organization

We denote by R, N and C the sets of real numbers, natural numbers and complex numbers, respectively. For any vector x ∈ R^N, x_i is its ith component, and for a given index set I ⊂ I_N ≜ {1, . . . , N}, x_I represents its subvector containing all the components restricted to I. I^c represents the complementary set of I, i.e., I^c = I_N \ I. ‖x‖_2 represents the Euclidean norm of a vector x. Supp(x) is the support of x, i.e., Supp(x) = {i : |x_i| > 0, i = 1, . . . , N}. For any matrix A ∈ R^{N×N}, σ_i(A) and σ_min(A) (λ_i(A) and λ_min(A)) denote the ith and the minimal singular values (eigenvalues) of A, respectively. Similar to the vector case, for a given index set I, A_I represents the submatrix of A containing all the columns restricted to I. For any z ∈ R, sign(z) denotes its sign function, i.e.,

$$\mathrm{sign}(z) = \begin{cases} 1, & \text{for } z > 0 \\ 0, & \text{for } z = 0 \\ -1, & \text{for } z < 0. \end{cases}$$

The remainder of this paper is organized as follows. In Section II, we give the problem settings and then introduce IJT algorithm with some basic properties. In Section III, we give the convergence analysis of IJT algorithm. In Section IV, we apply the developed theoretical analysis to the ℓq (0 < q < 1) regularization. In Section V, we give some related works and comparisons. In Section VI, we present some applications to show the effectiveness of the proposed algorithm. We conclude this paper in Section VII. The proofs are presented in the Appendix.

II. IJT ALGORITHM

A. Problem Settings

We make several assumptions on the concerned problem

$$\min_{x\in\mathbb{R}^N}\ \{T_\lambda(x) = F(x) + \lambda\Phi(x)\}, \qquad (3)$$

where Φ(x) is separable with Φ(x) = Σ_{i=1}^{N} φ(|x_i|).

Assumption 1: F : R^N → [0,∞) is continuously differentiable with Lipschitz continuous gradient, i.e., it holds that

$$\|\nabla F(u) - \nabla F(v)\|_2 \le L\|u - v\|_2, \quad \forall u, v \in \mathbb{R}^N,$$

where L > 0 is the Lipschitz constant.

Note that Assumption 1 is a general assumption on F. For example, the least squares and logistic loss functions used in machine learning are two typical cases.
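As a simple worked instance (ours, not taken from the text), consider the least squares loss; a direct computation gives its gradient Lipschitz constant:

$$F(x) = \tfrac{1}{2}\|Ax - y\|_2^2, \quad \nabla F(x) = A^T(Ax - y), \quad \|\nabla F(u) - \nabla F(v)\|_2 = \|A^TA(u - v)\|_2 \le \|A\|_2^2\,\|u - v\|_2,$$

so Assumption 1 holds with L = ‖A‖_2^2 = λ_max(A^T A), which is consistent with the step-size condition 0 < μ < 1/‖A‖_2^2 used for ℓq regularization in Section IV.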

Assumption 2: φ : [0,∞) → [0,∞) is continuous and satisfies the following assumptions:

a) φ is non-decreasing with φ(0) = 0 and φ(z) → ∞ as z → ∞.

b) For each b > 0, there exists an a > 0 such that φ(z) ≥ az² for z ∈ [0, b].

c) φ is differentiable on (0,∞), and its first derivative φ′ is strictly convex with φ′(z) → ∞ as z → 0 and lim_{z→∞} φ′(z)/z = 0.

d) φ has a continuous second derivative φ″ on (0,∞).

Most of the above assumptions were considered in [7]. It can be observed that Assumption 2(a) ensures the coercivity of φ, and thus the existence of a minimizer. Assumption 2(b) guarantees the weak sequential lower semi-continuity of φ in ℓ2, and Assumption 2(c) is assumed in order to induce sparsity.

In practice, there are many non-convex functions satisfying Assumption 2. Two of the most typical subclasses are φ(z) = z^q and φ(z) = log(1 + z^q) with q ∈ (0,1), as shown in Fig. 1.

B. IJT Algorithm

In order to describe IJT algorithm, we need to define the following proximity operator of Φ,

$$\mathrm{Prox}_{\mu,\lambda\Phi}(x) = \arg\min_{u\in\mathbb{R}^N}\left\{\frac{\|x - u\|_2^2}{2\mu} + \lambda\Phi(u)\right\}, \qquad (4)$$

where μ > 0 is a parameter. Since Φ is separable, computing Prox_{μ,λΦ} can be reduced to solving a one-dimensional minimization problem, that is,

$$\mathrm{prox}_{\mu,\lambda\phi}(z) = \arg\min_{v\in\mathbb{R}}\left\{\frac{|z - v|^2}{2\mu} + \lambda\phi(|v|)\right\}. \qquad (5)$$

Therefore,

$$\mathrm{Prox}_{\mu,\lambda\Phi}(x) = (\mathrm{prox}_{\mu,\lambda\phi}(x_1), \cdots, \mathrm{prox}_{\mu,\lambda\phi}(x_N))^T. \qquad (6)$$

We list some useful results on prox_{μ,λφ} obtained in [7].


Fig. 1. Typical φ satisfying Assumption 2 and the corresponding thresholding functions. We plot φ(|z|) = |z|^{1/2}, |z|^{2/3}, log(1 + |z|^{1/3}), and their corresponding thresholding functions. For comparison, we also plot two well-known cases, i.e., the ℓ0-norm with φ(|z|) = 1_{|z|>0} (the indicator function of |z| > 0) and the ℓ1-norm with φ(|z|) = |z|, together with their corresponding thresholding functions. (a) Typical penalty functions. (b) Thresholding functions.

Lemma 1 ([7, Lemmas 3.2 and 3.3]): Assume that φ satisfies Assumption 2. Then

a) for each μ > 0, the function ρ_μ : z → z + λμφ′(z) is well defined on R_+;

b) the function ψ : z → 2(φ(z) − zφ′(z))/z² is strictly decreasing and one-to-one from (0,∞) to (0,∞);

c) for any z > 0, φ″(z) is negative and monotonically increasing;

d) prox_{μ,λφ} is well defined and can be specified as

$$\mathrm{prox}_{\mu,\lambda\phi}(z) = \begin{cases} \mathrm{sign}(z)\,\rho_\mu^{-1}(|z|), & \text{for } |z| \ge \tau_\mu \\ 0, & \text{for } |z| \le \tau_\mu \end{cases}, \qquad (7)$$

for any z ∈ R, with

$$\tau_\mu = \rho_\mu(\eta_\mu) \quad\text{and}\quad \eta_\mu = \psi^{-1}\big((\lambda\mu)^{-1}\big). \qquad (8)$$

Moreover, the range of prox_{μ,λφ} is {0} ∪ [η_μ, +∞).

It can be observed that the proximity operator is discontinuous with a jump discontinuity, which is one of the most significant features of the class of non-convex penalties studied in this paper. Henceforth, we call prox_{μ,λφ} the jumping thresholding function. Moreover, it can be easily checked that the proximity operator is not nonexpansive in general. (Some specific proximity operators are shown in Fig. 1(b).)
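As an illustration of this jump behavior (a sketch of ours, not the authors' code), the one-dimensional problem (5) can be minimized by brute force on a grid; with φ(z) = z^{1/2}, the output stays at zero up to a threshold and then jumps to a value of magnitude at least η_μ. The grid parameters below are illustrative assumptions.

```python
import numpy as np

def jumping_threshold(z, mu, lam, phi, grid_size=200001):
    """Numerically evaluate the scalar proximity operator (5),
    prox_{mu,lam*phi}(z) = argmin_v |z - v|^2/(2*mu) + lam*phi(|v|),
    by brute-force minimization over a fine grid (illustration only)."""
    v_max = abs(z) + 1.0                      # the minimizer satisfies |v| <= |z|
    v = np.linspace(-v_max, v_max, grid_size)
    obj = (z - v) ** 2 / (2.0 * mu) + lam * phi(np.abs(v))
    return v[np.argmin(obj)]

if __name__ == "__main__":
    mu, lam = 1.0, 1.0
    phi = lambda t: np.sqrt(t)                # phi(z) = z^{1/2}, satisfies Assumption 2
    zs = np.linspace(0.0, 3.0, 301)
    vs = np.array([jumping_threshold(z, mu, lam, phi) for z in zs])
    # The output is 0 below the threshold tau_mu and jumps to a value
    # no smaller than eta_mu just above it (the jump discontinuity).
    first_nonzero = np.argmax(vs > 0)
    print("approximate jump location (tau_mu):", zs[first_nonzero])
    print("value just after the jump (>= eta_mu):", vs[first_nonzero])
```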

Formally, the iterative form of IJT algorithm can be expressed as follows:

$$x^{n+1} \in \mathrm{Prox}_{\mu,\lambda\Phi}\big(x^n - \mu\nabla F(x^n)\big), \qquad (9)$$

where μ > 0 is a step size parameter. For simplicity, we define

$$G_{\mu,\lambda\Phi}(x) = \mathrm{Prox}_{\mu,\lambda\Phi}\big(x - \mu\nabla F(x)\big), \quad x \in \mathbb{R}^N,$$

and its fixed point set F_μ ≜ {x : x = G_{μ,λΦ}(x)}.

C. Some Basic Properties of IJT Algorithm

Property 1: Let x* be a fixed point of G_{μ,λΦ} and {x^n} be a sequence generated by IJT algorithm. Then it holds that

a) for any i ∈ Supp(x*), |x*_i| ≥ η_μ and [∇F(x*)]_i + λ sign(x*_i)φ′(|x*_i|) = 0; and for any i ∈ Supp(x*)^c, |x*_i| = 0 and |[∇F(x*)]_i| ≤ τ_μ/μ;

b) for any i ∈ Supp(x^{n+1}), |x^{n+1}_i| ≥ η_μ and x^{n+1}_i + λμ sign(x^{n+1}_i)φ′(|x^{n+1}_i|) = x^n_i − μ[∇F(x^n)]_i; and for any i ∈ Supp(x^{n+1})^c, |x^{n+1}_i| = 0 and |x^n_i − μ[∇F(x^n)]_i| ≤ τ_μ, for n ∈ N,

where [∇F(x*)]_i and [∇F(x^{n+1})]_i represent the ith components of ∇F(x*) and ∇F(x^{n+1}), respectively.

This property can be easily derived from the definition of the proximity operator and Lemma 1(d). Actually, Property 1(a) is a certain type of optimality condition for problem (3). We call x a stationary point of (3) if x satisfies Property 1(a), and we denote by Ω_μ the stationary point set for a given μ. Then, according to Property 1(a), it holds that F_μ ⊂ Ω_μ.

Property 2: Let {x^n} be a sequence generated by IJT algorithm with a bounded initialization. Assume that 0 < μ < 1/L. Then it holds that

a) T_λ(x^{n+1}) ≤ T_λ(x^n) − (1/2)(1/μ − L)‖x^{n+1} − x^n‖_2^2, and there exists a constant T*_λ such that lim_{n→∞} T_λ(x^n) → T*_λ;

b) ‖x^{n+1} − x^n‖_2 → 0 as n → ∞;

c) each accumulation point of {x^n} is a fixed point of G_{μ,λΦ};

d) if {x^n} possesses an isolated accumulation point, then the whole sequence converges to some x* ∈ F_μ.

This property follows from [7, Propositions 2.1, 2.3 and Corollary 2.1] with μ_n ≡ μ. Property 2(a) is commonly called the sufficient decrease property, which is a basic property desired for a descent method. Let X be the accumulation point set of {x^n}; then by Property 2(c), X ⊂ F_μ, and further by Property 1(a), X ⊂ Ω_μ.

Property 3: Suppose that 0 < μ < 1/L. Then each global minimizer of T_λ is a fixed point of G_{μ,λΦ}. Let M be the set of global minimizers; then M ⊂ F_μ.

Property 3 is a corollary of [7, Proposition 2.2] with a uniform step size. From Properties 2 and 3, the following relations hold:

X ⊂ F_μ, M ⊂ F_μ and F_μ ⊂ Ω_μ.

III. CONVERGENCE ANALYSIS

In this section, we answer the basic questions concerning IJT algorithm raised in the introduction, i.e., when, where and how fast does the algorithm converge?

A. rKL Property

The Kurdyka–Łojasiewicz (KL) property has been widely used to prove the convergence of non-convex algorithms (see [2] for instance).

Definition 1 (KL property): A function f : R^N → R ∪ {+∞} is said to have the KL property at x* ∈ dom(∂f) if there exist η ∈ (0, +∞], a neighborhood U of x* and a continuous concave function ϕ : [0, η) → R_+ such that:

i) ϕ(0) = 0 and ϕ is C^1 on (0, η);

ii) for all s ∈ (0, η), ϕ′(s) > 0;

iii) for all x in U ∩ {x : f(x*) < f(x) < f(x*) + η}, the KL inequality holds:

$$\varphi'\big(f(x) - f(x^*)\big)\,\mathrm{dist}\big(0, \partial f(x)\big) \ge 1. \qquad (10)$$

(4)

Proper lower semi-continuous functions which satisfy the KL inequality at each point of dom(∂f) are called KL functions.

The KL property of f at some point x* means that "f is amenable to sharpness at x*" [6], and the KL inequality (10) is equivalent to

$$\mathrm{dist}\big(0, \partial\big(\varphi(f(x) - f(x^*))\big)\big) \ge 1, \qquad (11)$$

for all x ∈ U ∩ {x : f(x*) < f(x) < f(x*) + η} (simply use the "one-sided" chain rule [35, Theorem 10.6]). KL functions include real analytic functions, semialgebraic functions and locally strongly convex functions (more information can be found in Sec. 2.2 of [38] and the references therein). However, according to [4] (Sec. 1, page 1), some simple functions, such as f(x) = exp(−1/x²), ∀x ∈ R, are not KL functions, and in the later proof of Proposition 1 (see Appendix B), a class of simple functions are shown to be not KL functions.

Motivated by this, in this paper we introduce another related but weaker property, called the rKL property. Before formally stating the definition of the rKL property, we define a projection mapping associated with an index set I ⊂ I_N,

$$P_I : \mathbb{R}^N \to \mathbb{R}^{|I|}, \quad P_I x = x_I, \quad \forall x \in \mathbb{R}^N.$$

We also denote by P_I^T the transpose of P_I,

$$P_I^T : \mathbb{R}^{|I|} \to \mathbb{R}^N, \quad (P_I^T z)_I = z \ \text{ and }\ (P_I^T z)_{I^c} = 0, \quad \forall z \in \mathbb{R}^{|I|},$$

where |I| is the cardinality of I and I^c = I_N \ I.

Definition 2 (rKL property): A function f : R^N → R ∪ {+∞} is said to have the rKL property at x* ∈ dom(∂f) if g : R^{|I|} → R ∪ {+∞}, g(z) = f(P_I^T z), satisfies the KL property at z* = x*_I with I = Supp(x*).

From Definition 2, the rKL property only requires that the subgradient of f with respect to the nonzero variables becomes sharp after a certain concave transform, while the KL property requires such a property for all variables around the point. In the following, we give a sufficient condition for the rKL property.

Lemma 2: Given an index set I ⊂ I_N, consider the function g(z) = f(P_I^T z). Assume that z* is a stationary point of g (i.e., ∇g(z*) = 0), and that g is twice continuously differentiable in a neighborhood of z*, i.e., in B(z*, ε_0) for some ε_0 > 0. Moreover, if ∇²g(z*) is nonsingular, then f satisfies the rKL property at P_I^T z*. Actually, it holds that

$$|g(z) - g(z^*)| \le C\|\nabla g(z)\|_2^2, \quad \forall z \in B(z^*, \varepsilon),$$

for some 0 < ε < ε_0 and a positive constant C > 0.

The proof of this lemma is shown in Appendix A. Then we present a proposition to show that rKL property is an extension of KL property.

Proposition 1 (rKL is a generalization of KL): If f satisfies the KL property at x*, then f satisfies the rKL property at x*, but not vice versa.

The proof of this proposition is presented in Appendix B. According to the proof procedure of Proposition 1, the conditions listed in Lemma 2 are essential for the rKL property, in the sense that there exists a function satisfying the conditions of Lemma 2 but not the KL property.

B. Convergence of Entire Sequence

Lemma 3 (Finite Support Convergence): Let {x^n} be a sequence generated by IJT algorithm and I_n = Supp(x^n). Assume that 0 < μ < 1/L. Then there exist a positive integer n*, an index set I and a sign vector S such that when n > n*, the following hold:

a) I_n = I and Supp(x*) = I, ∀x* ∈ X;

b) sign(x^n) = S and sign(x*) = S, ∀x* ∈ X.

The proof of this lemma is presented in Appendix C. According to Lemma 3, the support and sign freeze after finitely many iterations. Furthermore, by Lemma 3, we can claim that {x^n} converges to x* if the new sequence {x^{i+n*}}_{i∈N} converges to x*, which is also equivalent to the convergence of the sequence {z^{i+n*}}_{i∈N},

$$z^{i+n^*} \to z^* \ \text{ as } i \to \infty, \qquad (12)$$

with z^{i+n*} = P_I x^{i+n*} and z* = P_I x*. Let

$$\hat z^{n} = z^{n+n^*}; \qquad (13)$$

then {ẑ^n} has the same convergence behavior as {x^n}.

For any ε > 0, we define a one-dimensional real set

$$\mathbb{R}_\varepsilon \triangleq \mathbb{R} \setminus [-\varepsilon, \varepsilon].$$

In particular, let R_0 = R \ {0}. We let

$$\mathcal{Z} \triangleq P_I\mathcal{X} = \{P_I x : x \in \mathcal{X}\};$$

then Z is the accumulation point set of the sequence {ẑ^n}. We define T : R^{|I|}_{η_μ/2} → R and f : R^{|I|}_{η_μ/2} → R as

$$T(z) = T_\lambda(P_I^T z) \quad\text{and}\quad f(z) = F(P_I^T z), \quad \forall z \in \mathbb{R}^{|I|}_{\eta_\mu/2}. \qquad (14)$$

For any z* ∈ Z, it can be observed from Property 1(a) that z* ∈ R^{|I|}_{η_μ} and that z* is a stationary point of T. Moreover, we define a series of mappings φ_{1,m} : R^m_0 → R^m and φ_{2,m} : R^m_0 → R^{m×m} as follows:

$$\varphi_{1,m}(z) = (\mathrm{sign}(z_1)\phi'(|z_1|), \cdots, \mathrm{sign}(z_m)\phi'(|z_m|))^T,$$
$$\varphi_{2,m}(z) = \mathrm{diag}(\phi''(|z_1|), \cdots, \phi''(|z_m|)), \quad m = 1, \ldots, N, \qquad (15)$$

where diag(z) represents the diagonal matrix generated by z.

For brevity, we denote φ_{1,m} and φ_{2,m} as φ_1 and φ_2, respectively, when m is fixed and there is no confusion.

By Properties 1 and 2, we can easily verify that {ẑ^n} satisfies the following properties.

Lemma 4: {ẑ^n} satisfies the following:

a) (Sufficient decrease condition). For each n ∈ N,

$$T(\hat z^{n+1}) \le T(\hat z^{n}) - \frac{1}{2}\left(\frac{1}{\mu} - L\right)\|\hat z^{n+1} - \hat z^{n}\|_2^2.$$

b) (Relative error condition). For each n ∈ N,

$$\|\nabla T(\hat z^{n+1})\|_2 \le \left(\frac{1}{\mu} + L\right)\|\hat z^{n+1} - \hat z^{n}\|_2.$$

c) (Continuity condition). There exist a subsequence {ẑ^{n_j}}_{j∈N} and z* such that

$$\hat z^{n_j} \to z^* \quad\text{and}\quad T(\hat z^{n_j}) \to T(z^*), \quad\text{as } j \to \infty.$$

(5)

Lemma 4(a) and (c) are obvious by Property 2, the specific form of T in (14) and the construction of {ẑ^n} in (13). Lemma 4(b) holds mainly due to Property 1(b) and Assumptions 1–2. Specifically, by Property 1(b), it can be easily checked that

$$\hat z^{n+1} + \lambda\mu\,\varphi_1(\hat z^{n+1}) = \hat z^{n} - \mu\nabla f(\hat z^{n}),$$

which implies

$$\mu\big(\nabla f(\hat z^{n+1}) + \lambda\varphi_1(\hat z^{n+1})\big) = (\hat z^{n} - \hat z^{n+1}) + \mu\big(\nabla f(\hat z^{n+1}) - \nabla f(\hat z^{n})\big).$$

Thus,

$$\|\nabla T(\hat z^{n+1})\|_2 = \frac{1}{\mu}\big\|(\hat z^{n} - \hat z^{n+1}) + \mu\big(\nabla f(\hat z^{n+1}) - \nabla f(\hat z^{n})\big)\big\|_2.$$

By Assumption 1, ∇F is Lipschitz continuous, so

$$\|\nabla f(\hat z^{n+1}) - \nabla f(\hat z^{n})\|_2 \le \|\nabla F(P_I^T\hat z^{n+1}) - \nabla F(P_I^T\hat z^{n})\|_2 \le L\|P_I^T\hat z^{n+1} - P_I^T\hat z^{n}\|_2 = L\|\hat z^{n+1} - \hat z^{n}\|_2.$$

Therefore, ‖∇T(ẑ^{n+1})‖_2 ≤ (1/μ + L)‖ẑ^{n+1} − ẑ^n‖_2.

From Lemma 4, if T further has the KL property at the limit point z*, then, according to Theorem 2.9 in [2], {ẑ^n} converges to z*. Noting the construction of {ẑ^n}, we can obtain the following convergence result.

Theorem 1 (Global Convergence): Assume that F and φ satisfy Assumptions 1 and 2, respectively. Let {x^n} be a sequence generated by IJT algorithm. Suppose that 0 < μ < 1/L. Then {x^n} converges subsequentially to a set X. If, further, there is a limit point x* ∈ X at which T_λ satisfies the rKL property, then the whole sequence converges to x*.

Together with Lemma 2, the following corollary holds.

Corollary 1: Under Assumptions 1 and 2, suppose that 0 < μ < 1/L and that there exists a limit point x* of {x^n} such that F is twice continuously differentiable at x* and ∇²T(P_I x*) is nonsingular; then {x^n} converges to x*.

Remark 1: A similar condition is also used to guarantee the convergence of the steepest descent method in [34, Theorem 2, pp. 266]. Obviously, if z* is a strictly local minimizer (or maximizer), or a strict saddle point of T, then the nonsingularity of ∇²T(z*) holds naturally. Therefore, if T is locally strictly convex or concave, then Corollary 1 holds.

C. Convergence to a Strictly Local Minimizer

As shown in Corollary 1, if ∇²T(P_I x*) is nonsingular at some limit point x*, then the sequence generated by IJT algorithm converges to x*. In this subsection, we will justify that x* is also a strictly local minimizer of the optimization problem if ∇²T(P_I x*) is positive definite.

Theorem 2 (Convergence to a Strictly Local Minimizer): Under the assumptions of Corollary 1, if further ∇²T(P_I x*) is positive definite, then x* is a strictly local minimizer of T_λ.

The proof of this theorem is rather intuitive. By Property 1(a) we have

$$[\nabla F(x^*)]_I + \lambda\varphi_1(x^*_I) = 0. \qquad (16)$$

This, together with the condition of the theorem,

$$\nabla^2 T(P_I x^*) = \nabla^2_{II}F(x^*) + \lambda\varphi_2(x^*_I) \succ 0,$$

implies that the second-order optimality conditions hold at x* = (x*_I, 0), where ∇²_{II}F(x*) denotes the submatrix of the Hessian ∇²F(x*) restricted to the index set I. For a sufficiently small vector h, we denote x_h = (x*_I + h_I, 0). It then follows that

$$F(x_h) + \lambda\sum_{i\in I}\phi(|x^*_i + h_i|) \ge F(x^*) + \lambda\sum_{i\in I}\phi(|x^*_i|). \qquad (17)$$

Furthermore, by Assumption 2(c), it obviously holds that

$$\phi(t) > \big(\|[\nabla F(x^*)]_{I^c}\|_\infty + 2\big)\,t/\lambda,$$

for sufficiently small t > 0. By this fact and the differentiability of F, for sufficiently small h, there hold

$$F(x^* + h) - F(x_h) + \lambda\sum_{i\in I^c}\phi(|h_i|) = h_{I^c}^T[\nabla F(x^*)]_{I^c} + \lambda\sum_{i\in I^c}\phi(|h_i|) + o(\|h_{I^c}\|) \ge \sum_{i\in I^c}\big(\|[\nabla F(x^*)]_{I^c}\|_\infty - |[\nabla F(x^*)]_i| + 1\big)|h_i| \ge 0. \qquad (18)$$

Summing up the two inequalities (17)–(18), one has that for all sufficiently small h,

$$T_\lambda(x^* + h) - T_\lambda(x^*) \ge 0, \qquad (19)$$

and hence x* is a local minimizer. Moreover, we can observe that when h ≠ 0, at least one of the two inequalities (17) and (18) holds strictly, which implies that x* is a strictly local minimizer.

D. Eventual Linear Convergence Rate

In order to derive the convergence rate of IJT algorithm, we first give some observations on ∇F and φ in the neighborhood of x*. For any 0 < ε < η_μ, we define a neighborhood of x* as follows:

$$\mathcal{N}(x^*, \varepsilon) = \{x \in \mathbb{R}^N : \|x_I - x^*_I\|_2 < \varepsilon,\ x_{I^c} = 0\}.$$

If F is twice continuously differentiable at x* and also λ_min(∇²_{II}F(x*)) > 0, then for any x ∈ N(x*, ε), there exist two sufficiently small positive constants c_F and c_φ (both c_F and c_φ depending on ε with c_F → 0 and c_φ → 0 as ε → 0) such that

$$\big\langle [\nabla F(x)]_I - [\nabla F(x^*)]_I,\ x_I - x^*_I \big\rangle \ge \big(\lambda_{\min}(\nabla^2_{II}F(x^*)) - c_F\big)\|x_I - x^*_I\|_2^2, \qquad (20)$$

$$\big\langle \varphi_1(x_I) - \varphi_1(x^*_I),\ x_I - x^*_I \big\rangle \ge \big(\phi''(e) - c_\phi\big)\|x_I - x^*_I\|_2^2, \qquad (21)$$

where (21) holds because φ′ is strictly convex on (0,∞), and thus φ″ is nondecreasing on (0,∞); consequently, min_{i∈I} φ″(|x*_i|) = φ″(min_{i∈I}|x*_i|) = φ″(e), with e = min_{i∈I}|x*_i|. With the observations (20) and (21), we obtain the following theorem.

Theorem 3 (Eventual Linear Rate): Under the conditions of Corollary 1, if the following conditions also hold:

a) λ_min(∇²_{II}F(x*)) > 0;

b) 0 < λ < −λ_min(∇²_{II}F(x*))/φ″(e);

c) either 0 < μ < min{ 2(λ_min(∇²_{II}F(x*)) + 2λφ″(e)) / (L² − (λφ″(e))²), 1/L }, or, for any sufficiently small 0 < ε < η_μ, the third derivative φ‴ is well defined, bounded and nonzero on the set ∪_{i∈I} B(x*_i, ε), where B(x*_i, ε) := (x*_i − ε, x*_i + ε),

where e = min_{i∈I}|x*_i|, then there exist a positive integer n_0 and a constant ρ ∈ (0,1) such that when n > n_0,

$$\|x^{n+1} - x^*\|_2 \le \rho\|x^{n} - x^*\|_2, \quad\text{and}\quad \|x^{n+1} - x^*\|_2 \le \frac{\rho}{1-\rho}\|x^{n+1} - x^{n}\|_2.$$

The proof of Theorem 3 is presented in Appendix D. As shown by this theorem, if we can fortunately obtain a good initial point, then IJT algorithm may converge fast, with a linear rate. On the other hand, Theorem 3 also provides an a posteriori computable error estimate for the algorithm, which can be used to design an efficient termination rule for IJT algorithm. It can be observed that the conditions of Theorem 3 are slightly stricter than those of Theorem 2, and thus x* is also a strictly local minimizer under the conditions of Theorem 3.
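As a small illustration (ours, not from the paper), the second bound in Theorem 3 can be turned into a termination test once an estimate of the contraction factor ρ is available; rho_hat and tol below are user-chosen assumptions.

```python
import numpy as np

def should_stop(x_new, x_old, rho_hat, tol):
    """Stop when the a posteriori bound of Theorem 3,
    ||x^{n+1} - x*||_2 <= rho/(1-rho) * ||x^{n+1} - x^n||_2,
    drops below the tolerance tol. rho_hat in (0,1) is a
    user-supplied estimate of the contraction factor rho."""
    bound = rho_hat / (1.0 - rho_hat) * np.linalg.norm(x_new - x_old)
    return bound <= tol
```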

IV. APPLICATION TO ℓq (0 < q < 1) REGULARIZATION

The ℓq (0 < q < 1) regularization is formulated as follows:

$$\min_{x\in\mathbb{R}^N}\left\{T_\lambda(x) = \frac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_q^q\right\}, \qquad (22)$$

where A ∈ R^{M×N} (commonly, M < N), y ∈ R^M, and ‖x‖_q^q = Σ_{i=1}^{N} |x_i|^q. The proximity operator prox_{μ,λ|·|^q} can be expressed as (see [7])

$$\mathrm{prox}_{\mu,\lambda|\cdot|^q}(z) = \begin{cases} \big(\cdot + \lambda\mu q\,\mathrm{sign}(\cdot)|\cdot|^{q-1}\big)^{-1}(z), & |z| \ge \tau_{\mu,q} \\ 0, & |z| \le \tau_{\mu,q} \end{cases} \qquad (23)$$

for any z ∈ R, where

$$\tau_{\mu,q} = \frac{2-q}{2-2q}\big(2\lambda\mu(1-q)\big)^{\frac{1}{2-q}}, \qquad (24)$$

$$\eta_{\mu,q} = \big(2\lambda\mu(1-q)\big)^{\frac{1}{2-q}}, \qquad (25)$$

and the range of prox_{μ,λ|·|^q} is {0} ∪ [η_{μ,q}, ∞). Specifically, for some special q (say, q = 1/2, 2/3), the corresponding proximity operators can be expressed analytically [39], [12].

According to [2] (see Example 5.4, page 122), the function T_λ(x) = (1/2)‖Ax − y‖_2^2 + λ‖x‖_q^q is a KL function, and it obviously satisfies the rKL property at any limit point. Then we can obtain the following corollary directly.

Corollary 2: Let {x^n} be a sequence generated by IJT algorithm for ℓq regularization with q ∈ (0,1). Assume that 0 < μ < 1/‖A‖_2^2. Then {x^n} converges to a stationary point of the ℓq regularization.

Moreover, it is easy to check that φ(z) = z^q satisfies the second part of condition (c) in Theorem 3. Therefore, the eventual linear convergence rate of IJT algorithm for ℓq regularization can be stated as follows.

Corollary 3: Under the conditions of Corollary 2, if the following conditions also hold:

a) λ_min(A_I^T A_I) > 0,

b) 0 < λ < λ_min(A_I^T A_I) e^{2−q} / (q(1−q)),

where I = Supp(x*) and e = min_{i∈I}|x*_i|, then IJT algorithm converges to a strictly local minimizer x* with an eventual linear rate.

It can be observed that the minimal nonzero entry e of x* is used in condition (b) of this corollary. A theoretical lower bound on e was estimated by Chen et al. [17]. In the following, we derive other sufficient conditions through the observation that the threshold value (25) is generally a tighter lower bound on e than that studied in [17]. Specifically, by (25), it holds that

$$e \ge \eta_{\mu,q} = \big(2\lambda\mu(1-q)\big)^{\frac{1}{2-q}}. \qquad (26)$$

Then, if λ_min(A_I^T A_I)/‖A‖_2^2 > q/2 and q/(2λ_min(A_I^T A_I)) < μ < 1/‖A‖_2^2, the conditions in Corollary 3 hold naturally.

Theorem 4: Under the conditions of Corollary 2, if the following conditions also hold:

a) λ_min(A_I^T A_I)/‖A‖_2^2 > q/2,

b) q/(2λ_min(A_I^T A_I)) < μ < 1/‖A‖_2^2,

then IJT algorithm converges to a strictly local minimizer x* with an eventual linear rate.

Theorem 4 shows that if the matrix A satisfies a certain concentration property and the step size μ is chosen appropriately, then IJT algorithm converges to a local minimizer with an eventual linear rate. Note that condition (a) in Theorem 4 naturally implies q/(2λ_min(A_I^T A_I)) < 1/‖A‖_2^2. Thus, condition (b) of Theorem 4 is a natural and reachable condition and, furthermore, whenever this condition is satisfied, the sequence {x^n} is indeed convergent by Corollary 2. This shows that only condition (a) is essential in Theorem 4. We notice that condition (a) is a concentration condition on the eigenvalues of the submatrix A_I^T A_I; in particular, it implies

$$\lambda_{\min}(A_I^T A_I) > q\,\lambda_{\max}(A_I^T A_I)/2,$$

or equivalently,

$$\mathrm{Cond}(A_I^T A_I) := \frac{\lambda_{\max}(A_I^T A_I)}{\lambda_{\min}(A_I^T A_I)} < \frac{2}{q}, \qquad (27)$$

where Cond(A_I^T A_I) is the condition number of A_I^T A_I. Thus, (27) shows that the submatrix A_I^T A_I is well-conditioned, with condition number lower than 2/q.

In recent years, a property called the restricted isometry property (RIP) of a matrix A was introduced to characterize the concentration degree of the eigenvalues of its submatrices with k columns [9]. A matrix A is said to satisfy the k-order RIP (denoted by δ_k-RIP) if there exists a δ_k ∈ (0,1) such that

$$(1-\delta_k)\|x\|_2^2 \le \|Ax\|_2^2 \le (1+\delta_k)\|x\|_2^2, \quad \forall\, \|x\|_0 \le k. \qquad (28)$$

In other words, the RIP ensures that all submatrices of A with k columns are close to an isometry, and therefore distance-preserving. Let K = ‖x*‖_0. It can be seen from (28) that if A possesses the δ_K-RIP with δ_K < (2−q)/(2+q), then

$$\mathrm{Cond}(A_I^T A_I) \le \frac{1+\delta_K}{1-\delta_K} < \frac{2}{q}.$$
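As a quick numerical check (an illustrative sketch of ours, not part of the paper), condition (a) of Theorem 4 and the implied condition-number bound (27) can be verified directly for a given matrix and support set.

```python
import numpy as np

def check_concentration(A, support, q):
    """Check condition (a) of Theorem 4, lambda_min(A_I^T A_I)/||A||_2^2 > q/2,
    and the implied condition-number bound (27), Cond(A_I^T A_I) < 2/q."""
    AI = A[:, list(support)]
    eigs = np.linalg.eigvalsh(AI.T @ AI)          # ascending eigenvalues
    lam_min, lam_max = eigs[0], eigs[-1]
    spec = np.linalg.norm(A, 2) ** 2              # ||A||_2^2
    cond_a = lam_min / spec > q / 2.0
    cond_27 = (lam_max / lam_min) < 2.0 / q
    return cond_a, cond_27
```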
