
5.3 Algorithms: Subgradient Descent

The central algorithm we explore in this note is simple (sub)gradient descent on the dual problem (5.13):

Algorithm 5.1 (Subgradient descent). Given $y^0 \in \mathbb{R}^m$, for $\nu = 0, 1, 2, \dots$ generate the sequence $\{y^\nu\}_{\nu=0}^{\infty}$ via
$$y^{\nu+1} = y^\nu + \lambda_\nu d^\nu, \qquad \text{where } d^\nu \in -\partial\left(\phi_{\varepsilon,L}\left(A^T y^\nu\right) - b^T y^\nu\right),$$
and $\lambda_\nu$ is an appropriate step length parameter.

For $\varepsilon > 0$, $\phi_{\varepsilon,L}$ is continuously differentiable on its domain, and the algorithm amounts to the method of steepest descent.
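To make the iteration concrete, the following Python sketch implements the bare loop of Algorithm 5.1 under the update rule stated above. The oracle `dual_subgradient` (returning an element of $\partial(\phi_{\varepsilon,L}(A^T y) - b^T y)$) and the step rule `step_length` are hypothetical placeholders supplied by the caller, since the algorithm statement fixes neither.

```python
import numpy as np

def subgradient_descent(A, b, dual_subgradient, step_length,
                        y0=None, max_iter=500, tol=1e-8):
    """Sketch of Algorithm 5.1: y^{nu+1} = y^nu + lambda_nu * d^nu,
    where -d^nu is a subgradient of the dual objective at y^nu."""
    m = A.shape[0]
    y = np.zeros(m) if y0 is None else y0.astype(float)
    for nu in range(max_iter):
        # d^nu is the *negative* of a subgradient of phi_{eps,L}(A^T y) - b^T y
        d = -dual_subgradient(A, b, y)
        if np.linalg.norm(d) <= tol:       # this subgradient vanishes: stop
            break
        y = y + step_length(nu, y, d) * d  # lambda_nu is chosen by the caller
    return y
```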

5.3.1 Nonsmooth Case: ε = 0

In this section, we present and analyze a subgradient descent method with exact line search and variants thereof suitable for solving the dual problem above for the case ε=0, that is, we do not smooth the problem.

Using the notation of indicator functions, we have
$$\phi_{0,L}(x) = \iota_{R_L}(x) \equiv \begin{cases} 0 & \text{for } x \in R_L \\ +\infty & \text{otherwise.} \end{cases}$$

The specific instance of (5.13) that we address is
$$\min_{y \in \mathbb{R}^m} \; \iota_{R_L}(A^T y) - y^T b. \tag{5.14}$$
Since the set $R_L$ is a rectangle, nonsmooth calculus yields the following simple expression for the subdifferential of the dual objective:
$$\partial\left(\iota_{R_L}(A^T y^\nu) - b^T y^\nu\right) = A\, N_{R_L}(A^T y^\nu) - b. \tag{5.15}$$

Here, we have used the fact that the subdifferential of the indicator function of the box $R_L$ at the point $x$, denoted $\partial\iota_{R_L}(x)$, is equivalent to the normal cone mapping of $R_L$ at the point $x$:
$$\partial\iota_{R_L}(x) = N_{R_L}(x) \equiv \begin{cases} \{\, w \in \mathbb{R}^n \mid (z - x)^T w \le 0 \text{ for all } z \in R_L \,\} & \text{if } x \in R_L \\ \emptyset & \text{if } x \notin R_L. \end{cases}$$
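As an illustration of (5.15) and the normal cone formula above, here is a minimal sketch (Python with NumPy; the function name is ours) that returns one element of the subdifferential of the dual objective at a feasible $y$ by picking a particular vector from $N_{R_L}(A^T y)$.

```python
import numpy as np

def dual_subgradient_element(A, b, y, L, tol=1e-12):
    """One element of  A N_{R_L}(A^T y) - b  from (5.15).

    The normal cone to the box R_L at x = A^T y consists of vectors w with
    w_j >= 0 where x_j = L_j, w_j <= 0 where x_j = -L_j, and w_j = 0 on
    inactive coordinates; here we simply take w_j = +/-1 on the active set."""
    x = A.T @ y
    if np.any(np.abs(x) > L + tol):
        raise ValueError("A^T y lies outside R_L: the subdifferential is empty")
    w = np.zeros(A.shape[1])
    upper = np.abs(x - L) <= tol   # coordinates active at the upper face
    lower = np.abs(x + L) <= tol   # coordinates active at the lower face
    w[upper] = 1.0
    w[lower] = -1.0
    return A @ w - b
```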

Remark 5.2. It is important to note that we assume that we can perform exact arithmetic. This assumption is necessary due to the composition of the normal cone mapping of $R_L$ with $A^T$: while we can determine the exact evaluation of the normal cone for a given $A^T y^\nu$, we cannot guarantee exact evaluation of the matrix–vector product and, since the normal cone mapping is not Lipschitz continuous on $R_L$, this can lead to large computational errors.

Problem (5.14) is a linear programming problem. The algorithm we analyze below solves a parametric version of problem (5.14), where the parameter $L$ changes dynamically at each iteration. To see how the parameter might be changed from one iteration to the next, we look to a trivial extension of the primal problem:

$$\underset{(x,L)\,\in\,\mathbb{R}^n\times\mathbb{R}^n_+}{\text{minimize}} \;\; \sum_{j=1}^n L_j |x_j| \quad \text{subject to } Ax = b. \tag{5.16}$$

It is clear that $L = 0$ together with any feasible $x$ is an optimal solution to problem (5.16), and that the (global) optimal value is 0. However, this is not the only solution. Indeed, the sparsest solution $\bar{x}$ to $Ax = b$ together with a weight $\bar{L}$ satisfying $\bar{L}_j = 0$ only for those elements $j$ on the support of $\bar{x}$ is also a solution. The algorithm we study below finds a weight compatible with the sparsest element $\bar{x}$. A more satisfying reformulation would yield a weight that is in some sense optimal for the sparsest element $\bar{x}$, but this is beyond the scope of this work.
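A tiny numerical illustration of these two kinds of solutions of (5.16), on hypothetical data: with a 2-sparse $\bar{x}$, both the trivial weight $L = 0$ and a weight vanishing exactly on the support of $\bar{x}$ attain the optimal value 0.

```python
import numpy as np

# Toy data (hypothetical): a 3x5 system with a 2-sparse solution xbar.
A = np.array([[1., 0., 2., 0., 1.],
              [0., 1., 0., 3., 1.],
              [1., 1., 0., 0., 2.]])
xbar = np.array([2., 0., 0., 0., 1.])        # sparse solution, support {0, 4}
b = A @ xbar                                 # so (xbar, L) is feasible for (5.16)

def weighted_l1(L, x):
    """Objective of (5.16): sum_j L_j |x_j|."""
    return np.sum(L * np.abs(x))

L_trivial = np.zeros(5)                      # L = 0: any feasible x is optimal
L_support = np.array([0., 1., 1., 1., 0.])   # L_j = 0 exactly on supp(xbar)

print(weighted_l1(L_trivial, xbar))          # 0.0
print(weighted_l1(L_support, xbar))          # 0.0 as well
```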

5.3.2 Dynamically Rescaled Descent with Exact Line Search

There are three unresolved issues in our discussion to this point, namely the choice of elements from the subdifferential, the choice of the step length, and the adjustment of the weights $L_j$. Our strategy is given in Algorithm 5.4 below. In the description of the algorithm, we use some geometric notions that we introduce first. It will be convenient to define the set $C$ by
$$C \equiv \{\, y \in \mathbb{R}^m \mid A^T y \in R_L \,\}.$$
This set is polyhedral, being the preimage of the box $R_L$ under the linear mapping $A^T$.

Lemma 5.3 (Normal cone projection). Let $A$ be full rank and denote the normal cone to $C \equiv \{y \in \mathbb{R}^m \mid A^T y \in R_L\}$ at $y \in C$ by $N_C(y)$. Then
$$P_{N_C(y)} b = A\bar{w} \tag{5.17}$$
for
$$\bar{w} = \operatorname{argmin}\left\{\|Aw - b\|^2 \,\middle|\, w \in N_{R_L}(A^T y)\right\}. \tag{5.18}$$

Proof. If $A$ is full rank, then all points $y \in C$ satisfy the constraint qualification that $A$ is injective on $N_{R_L}(A^T y)$, that is, the only vector $w \in N_{R_L}(A^T y)$ for which $Aw = 0$ is $w = 0$. Then, by convex or nonsmooth analysis (see, e.g., [14, Theorem 6.14]) the set $C$ is regular and
$$N_C(y) = A\, N_{R_L}(A^T y) = \left\{\, u = Aw \,\middle|\, w \in N_{R_L}(A^T y) \,\right\}.$$
By the definition of the projection,
$$P_{N_C(y)} b \equiv \operatorname{argmin}\left\{\|u - b\|^2 \,\middle|\, u \in N_C(y)\right\},$$
hence
$$P_{N_C(y)} b = \operatorname{argmin}\left\{\|u - b\|^2 \,\middle|\, u \in A\, N_{R_L}(A^T y)\right\} = A\operatorname{argmin}\left\{\|Aw - b\|^2 \,\middle|\, w \in N_{R_L}(A^T y)\right\} = A\bar{w}.$$
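Lemma 5.3 turns the projection onto $N_C(y)$ into a sign-constrained least squares problem over the coordinates active in the box. The sketch below (our naming; it uses `scipy.optimize.nnls`) solves (5.18) by flipping the sign of the columns active at the lower face so that the constraint becomes plain nonnegativity; coordinates with $L_j = 0$ are treated as upper-active only, a simplification of the sketch rather than part of the lemma.

```python
import numpy as np
from scipy.optimize import nnls

def project_onto_normal_cone(A, b, y, L, tol=1e-12):
    """Compute P_{N_C(y)} b = A wbar, with wbar solving (5.18).

    N_{R_L}(A^T y) = {w : w_j >= 0 if (A^T y)_j =  L_j,
                          w_j <= 0 if (A^T y)_j = -L_j,
                          w_j  = 0 otherwise},
    so (5.18) is a nonnegative least squares problem in the active
    coordinates after flipping the sign of the lower-face columns."""
    x = A.T @ y
    upper = np.abs(x - L) <= tol
    lower = np.abs(x + L) <= tol
    active = np.where(upper | lower)[0]
    wbar = np.zeros(A.shape[1])
    if active.size == 0:                       # normal cone is {0}
        return A @ wbar, wbar
    signs = np.where(upper[active], 1.0, -1.0)
    # minimize ||A[:, active] diag(signs) v - b||^2 over v >= 0
    v, _ = nnls(A[:, active] * signs, b)
    wbar[active] = signs * v
    return A @ wbar, wbar
```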

Algorithm 5.4 (Dynamically rescaled descent with exact line search).

Initialization: Set $\nu = 0$, $\tau > 0$, $L^0 = (\|a_1\|, \|a_2\|, \dots, \|a_n\|)$, where $a_j$ is the $j$th column of $A$, $y^0 = 0$ and the direction $d^0 = b$.

Main iteration: While $\|d^\nu\| > \tau$ do

(Exact line search.) Calculate the step length $\lambda_\nu > 0$ according to
$$\lambda_\nu \equiv \operatorname*{argmin}_{\lambda > 0}\left\{\iota_{R_{L^\nu}}\!\left(A^T(y^\nu + \lambda d^\nu)\right) - b^T(y^\nu + \lambda d^\nu)\right\}. \tag{5.19}$$
Set $\tilde{y} = y^\nu + \lambda_\nu d^\nu$.

(Subgradient selection and preliminary rescaling.) Define
$$J_{\nu+1} = \{\, j \mid |a_j^T \tilde{y}| = L^\nu_j \,\}, \tag{5.20}$$
$$S(L,J,\gamma) = \left(s_1(L,J,\gamma), s_2(L,J,\gamma), \dots, s_n(L,J,\gamma)\right), \quad \text{where } s_j(L,J,\gamma) = \begin{cases} \gamma L_j & \text{for all } j \in J \\ L_j & \text{else,} \end{cases} \tag{5.21}$$
and
$$C(L,J,\gamma) = \{\, y \in \mathbb{R}^m \mid A^T y \in R_{S(L,J,\gamma)} \,\}, \quad \text{where } R_{S(L,J,\gamma)} \equiv [-s_1(L,J,\gamma), s_1(L,J,\gamma)] \times \cdots \times [-s_n(L,J,\gamma), s_n(L,J,\gamma)]. \tag{5.22}$$
Choose $\gamma \ge 0$ small enough that $P_{N_{C(L^\nu, J_{\nu+1}, \gamma)}(\tilde{y})} b \in \operatorname{ri}\left(N_{C(L^\nu, J_{\nu+1}, \gamma)}(\tilde{y})\right)$. Compute the direction
$$d^{\nu+1} \equiv b - P_{N_{C(L^\nu, J_{\nu+1}, \gamma)}(\tilde{y})} b. \tag{5.23}$$

(Rescaling.) Let $J^+_{\nu+1} \equiv \{\, j \mid a_j^T d^{\nu+1} > 0 \,\}$, $J^-_{\nu+1} \equiv \{\, j \mid a_j^T d^{\nu+1} < 0 \,\}$, and define
$$I_{\nu+1}(\gamma) \equiv \operatorname*{argmin}_{j \in J^+_{\nu+1} \cup J^-_{\nu+1}}\left\{ \frac{L^\nu_j - \gamma\, a_j^T \tilde{y}}{a_j^T d^{\nu+1}} \;\middle|\; j \in J^+_{\nu+1}; \quad \frac{-L^\nu_j - \gamma\, a_j^T \tilde{y}}{a_j^T d^{\nu+1}} \;\middle|\; j \in J^-_{\nu+1} \right\}. \tag{5.24}$$
Choose $\gamma_{\nu+1} \in [0, \gamma]$ to satisfy
$$I_{\nu+1}(\gamma_{\nu+1}) \subseteq I_{\nu+1}(0). \tag{5.25}$$
Set
$$L^{\nu+1}_j = \begin{cases} \gamma_{\nu+1} L^\nu_j & \text{for } j \in J_{\nu+1} \\ L^\nu_j & \text{else} \end{cases} \tag{5.26}$$
and $y^{\nu+1} = \gamma_{\nu+1}\tilde{y}$. Increment $\nu = \nu + 1$.

End do.
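The bookkeeping in steps (5.20) and (5.24)–(5.26) reduces to a few ratio tests and index updates. The sketch below (Python; the names and the packaging into a single helper are ours, and the direction $d^{\nu+1}$ is assumed to have been computed already, for instance by solving (5.18) as in the earlier sketch) carries out one pass of the rescaling step for a candidate value of $\gamma$.

```python
import numpy as np

def rescale_step(A, ytilde, L, d_next, gamma, tol=1e-12):
    """One pass of the rescaling bookkeeping in Algorithm 5.4.

    ytilde : point reached by the exact line search,
    L      : current weights L^nu,
    d_next : new search direction d^{nu+1} from (5.23),
    gamma  : candidate value of gamma_{nu+1}."""
    x = A.T @ ytilde
    J = np.abs(np.abs(x) - L) <= tol             # active set J_{nu+1}, cf. (5.20)

    slopes = A.T @ d_next
    Jplus, Jminus = slopes > tol, slopes < -tol  # J^+_{nu+1} and J^-_{nu+1}

    def ratios(g):
        # step-to-boundary ratios appearing in (5.24), for the scaled point g * ytilde
        r = np.full(len(L), np.inf)
        r[Jplus] = (L[Jplus] - g * x[Jplus]) / slopes[Jplus]
        r[Jminus] = (-L[Jminus] - g * x[Jminus]) / slopes[Jminus]
        return r

    # indices entering the comparison (5.25): I_{nu+1}(gamma) vs. I_{nu+1}(0)
    I_gamma, I_zero = np.argmin(ratios(gamma)), np.argmin(ratios(0.0))

    L_next = np.where(J, gamma * L, L)           # (5.26): shrink weights on J_{nu+1}
    y_next = gamma * ytilde                      # keeps the faces in J_{nu+1} active
    return L_next, y_next, J, I_gamma, I_zero
```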

We begin with some observations. The next proposition shows that the directions chosen by (5.23) with $P_{N_{C(L^\nu, J_{\nu+1}, \gamma)}(\tilde{y})} b \in \operatorname{ri}\left(N_{C(L^\nu, J_{\nu+1}, \gamma)}(\tilde{y})\right)$ are subgradient descent directions that are not only feasible, but orthogonal to the active constraints. We use orthogonality of the search directions to the active constraints to guarantee finite termination of the algorithm.

Proposition 5.5 (Feasible directions). Let $C \equiv \{y \in \mathbb{R}^m \mid A^T y \in R_L\}$, $\bar{y} \in C$, and define the direction $\bar{d} \equiv b - P_{N_C(\bar{y})} b$. Then
$$-\bar{d} \in \partial\left(\iota_{R_L}(A^T \bar{y}) - b^T \bar{y}\right)$$
and there exists a $\bar{\lambda} > 0$ such that $\bar{y} + \lambda\bar{d} \in C$ for all $\lambda \in [0, \bar{\lambda}]$. Moreover, if $P_{N_C(\bar{y})} b \in \operatorname{ri}(N_C(\bar{y}))$, then the direction $\bar{d}$ is orthogonal to the $j$th column of $A$ for all $j$ such that $|a_j^T \bar{y}| = L_j$.

Proof. The inclusion $-\bar{d} \in \partial\left(\iota_{R_L}(A^T \bar{y}) - b^T \bar{y}\right)$ follows immediately from (5.15). The feasibility of this direction follows from Lemma 5.3 and the polyhedrality of $C$, since the polar to the normal cone to $C$ at a point $y \in C$ is equivalent to the tangent cone, which consists only of feasible directions to $C$ at $y$, a feasible direction being a direction $d$ for which $y + \lambda d \in C$ for all $\lambda > 0$ sufficiently small.

Indeed, let $a_j$ denote the $j$th column of the matrix $A$ and recall the definition of the contingent cone to $C$ at $y \in C$:
$$K_C(y) \equiv \{\, w \in \mathbb{R}^m \mid \text{for all } \nu,\ y + \lambda_\nu w^\nu \in C \text{ for some } w^\nu \to w,\ \lambda_\nu \searrow 0 \,\}.$$
Since $C$ is convex, the contingent cone and the tangent cone are equivalent [2, Corollary 6.3.7], and since $C$ is polyhedral the tangent cone can be written as
$$T_C(y) \equiv \{\, w \in \mathbb{R}^m \mid \text{for all } \nu,\ y + \lambda_\nu w \in C \text{ for some } \lambda_\nu \searrow 0 \,\},$$
that is, the tangent cone consists entirely of feasible directions. Now the tangent and normal cones to $C$ are convex and polar to each other [14, Corollary 6.30], so, by Lemma 5.3, what remains to be shown is that $b - P_{N_C(y)} b$ lies in the polar to the normal cone to $C$. This follows since $N_C(y)$ is nonempty, closed, and convex. Hence for all $w \in N_C(y)$ and for any $b$,
$$w^T\left(b - P_{N_C(y)} b\right) \le 0,$$
that is, $b - P_{N_C(y)} b$ is in the polar to the normal cone.

To see the final statement of the proposition, denote by $\bar{J}$ the set $\{\, j = 1, 2, \dots, n \mid |a_j^T \bar{y}| = L_j \,\}$. If the projection lies in the relative interior of $N_C(\bar{y})$, then the projection onto $N_C(\bar{y})$ is equivalent to the projection onto the subspace containing $N_C(\bar{y})$:
$$P_{N_C(\bar{y})} b = P_{D(\bar{y})} b, \quad \text{where } D(\bar{y}) \equiv \{\, Aw \mid w \in \mathbb{R}^n \text{ with } w_j = 0 \text{ for } j \notin \bar{J} \,\}.$$
Thus, $a_j^T\left(b - P_{N_C(\bar{y})} b\right) = a_j^T\left(b - P_{D(\bar{y})} b\right) = 0$ for $j \in \bar{J}$, as claimed.

Remark 5.6 (Detection of orthogonality of feasible directions). The interiority condition $P_{N_C(\bar{y})} b \in \operatorname{ri}(N_C(\bar{y}))$ guaranteeing orthogonality of the directions can easily be checked. Let $\bar{w} \equiv \operatorname{argmin}\{\|Aw - b\|^2 \mid w \in N_{R_L}(A^T \bar{y})\}$. By Lemma 5.3, $P_{N_C(\bar{y})} b = A\bar{w}$. Then $A\bar{w}$, and hence $P_{N_C(\bar{y})} b$, lies in $\operatorname{ri}(N_C(\bar{y}))$ if and only if $\bar{w}_j \ne 0$ for all $j$ such that $|a_j^T \bar{y}| = L_j$.
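In code, the test of Remark 5.6 is a one-line check once $\bar{w}$ from (5.18) is available; a minimal sketch (our naming) follows.

```python
import numpy as np

def projection_in_relative_interior(A, y, L, wbar, tol=1e-12):
    """Remark 5.6: A wbar = P_{N_C(y)} b lies in ri(N_C(y)) iff
    wbar_j != 0 for every active index j, i.e. every j with |a_j^T y| = L_j."""
    active = np.abs(np.abs(A.T @ y) - L) <= tol
    return bool(np.all(np.abs(wbar[active]) > tol))
```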

Calculation of the direction in (5.23) of Algorithm 5.4 is suggested by Lemma 5.3, where it is shown that the projection is the image under $A$ of the solution to a least squares problem over a cone. Also, by Proposition 5.5, the direction is the negative of a subgradient of the objective in (5.14) with the box $R_{S(L^\nu, J_{\nu+1}, \gamma)}$, that is,
$$-d^{\nu+1} \equiv -b + P_{N_{C(L^\nu, J_{\nu+1}, \gamma)}(\tilde{y})} b \in \partial\left(\iota_{R_{S(L^\nu, J_{\nu+1}, \gamma)}}(A^T \tilde{y}) - b^T \tilde{y}\right).$$

The description as a projection onto the normal cone of a polyhedron is perhaps less helpful than the explicit formulation of Lemma 5.3 for suggesting how this can be computed, but it provides greater geometrical insight. Moreover, the projection provides an elegant criterion for maintaining orthogonality of the search directions with the active constraints.

The exact line search step has an explicit formulation given in the next proposition.

The exact line search step length $\bar{\lambda}$ given by (5.19) has the explicit representation
$$\bar{\lambda} = \min_j\left\{ \frac{L^\nu_j - a_j^T y^\nu}{a_j^T d^\nu} \;\middle|\; a_j^T d^\nu > 0; \quad \frac{-L^\nu_j - a_j^T y^\nu}{a_j^T d^\nu} \;\middle|\; a_j^T d^\nu < 0 \right\}. \tag{5.27}$$

Proof. Application of nonsmooth calculus provides a generalization of the fact, from the optimization of smooth objectives, that the exact line search step extends to the tangent of a level set of the objective, from which we can extract (5.27). However, it is perhaps easiest to see the explicit formulation by direct inspection: the indicator function $\iota_{R_{L^\nu}}$ is zero at all points in $R_{L^\nu}$, so the step length is the largest $\lambda$ such that $A^T(y^\nu + \lambda d^\nu) \in R_{L^\nu}$, which is precisely the minimum in (5.27).
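The ratio test is straightforward to implement; the sketch below (our naming, assuming the current iterate is feasible) returns the largest step that keeps $A^T(y + \lambda d)$ inside the box $R_L$, as described in the proof.

```python
import numpy as np

def exact_line_search(A, y, d, L, tol=1e-12):
    """Largest lambda >= 0 with A^T (y + lambda d) in R_L, cf. (5.19) and (5.27)."""
    x, slopes = A.T @ y, A.T @ d
    lam = np.inf
    for j in range(len(L)):
        if slopes[j] > tol:                   # moving toward the upper face  L_j
            lam = min(lam, (L[j] - x[j]) / slopes[j])
        elif slopes[j] < -tol:                # moving toward the lower face -L_j
            lam = min(lam, (-L[j] - x[j]) / slopes[j])
    return lam
```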

We show in this section that for sufficiently sparse solutions $\bar{x}$ to $Ax = b$, the steepest subgradient descent algorithm with exact line search (Algorithm 5.4) recovers $\bar{x}$ exactly. Before we continue, however, we must specify precisely what is meant by “sufficiently sparse.”