• Aucun résultat trouvé

Homework 5: Proximal methods, monotone operators, incremental methods

N/A
N/A
Protected

Academic year: 2022

Partager "Homework 5: Proximal methods, monotone operators, incremental methods"

Copied!
3
0
0

Texte intégral

(1)

10-801: Advanced Optimization and Randomized Methods

Homework 5: Proximal methods, monotone operators, incremental methods

(April 9, 2014)

Instructor: Suvrit Sra Due:April 21, 2014

1. Consider the convex optimization problem

min f(x) x∈ X,

whereXis closed and convex, whilef :X →Ris Lipschitz continuous and convex. Suppose further that we have a functionDφ:X × X →R

Dφ(x, y) =φ(x)−φ(y)− h∇φ(y), x−yi,

whereφis strongly convex with parameterµand continuously differentiable onX (as a resultDφ(x, y)≥

µ

2kx−yk22).

(a) Show thatDφ(x, y)is strongly convex as a function ofxby proving that Dφ(x, y)≥Dφ(z, y) +h∇zD(z, y), x−zi+µ

2kx−zk22. (b) Consider the general iterative algorithm

xk+1= argmin

x∈X

{hx, gki+ 1 αk

Dφ(x, xk)}, gk ∈∂f(xk). (5.1)

• Write down the optimality conditions for (5.1)

• Use these optimality conditions to write the above update explicitly in terms of∇φand∇φ. Hint:You’ll need the fact:φis strongly convex onX, soφis finite everywhere and differentiable, with

∇φ≡(∇φ+NX)−1; then consideru∈ ∇φ(x) +NX(x), whereNX is the normal cone.

(c) Show why the projected subgradient iteration

xk+1=PX(xk−αkgk), k= 0,1, . . . ,

is actually a special case of iteration (5.1).

2. The Douglas-Rachford iteration for minimizingf(x) +g(x)is given by xk= proxg(zk)

vk= proxf(2xk−zk) zk+1=zkk(vk−xk)

Show that forγk = 1, we can rewrite the above iteration usingaveraged reflectionsas

zk+1= [1

2(RfRg+I)](zk), where the reflection operators areRf:= 2 proxf−I, andRg:= 2 proxg−I. 3. Consider the following separable convex optimization problem

x∈minRn

F(x) :=Xm

i=1fi(x),

where eachfi:Rn→R∪ {+∞}(e.g.,fi(x) =δC(x)for some closed convex setC).

(a) Derive a Douglas-Rachford (DR) iteration to optimizeF(x)using the “product space trick”. Justify all your steps.

5-1

(2)

HOMEWORK5 Proximal methods, monotone operators, incremental methods April 9, 2014

(b) WriteF(x) =f1(x) +Pm

i=2fi(x). Introduce variablesx2, . . . , xm=x1. Now, obtain the (Lagrange) dual problem in terms of the conjugate functionsfi. Show how to solve this dual problem using DR.

(c) Compare (in words) the two formulations in (a) and (b) above. Are there situations where you would prefer one over the other?

4. LetA1, . . . , AT be matrices inRm×n, and lety1, . . . , yT ∈R. Consider the trace-norm regularized optimization problem

min

X∈Rm×n

XT

j=1(yj−tr(XTAj))2+λkXktr, where thetrace normiskXktr:=P

iσi(X)(sum of singular values).

(a) Derive a closed-form solution for the proximity operator of the trace-norm 1

2kX−Yk2F+λkXktr,

Hint 1:Ifr:Rn →Ris symmetric convex andabsolute(r(x) =r(|x1|,|x2|, . . . ,|xn|)), andσ:Rm×n →Rn+is the singular value map, then the conjugate of the compositionr◦σ, i.e.,(r◦σ)is (no surprise)r◦σ. (b) Present pseudo-code for solving this problem via proximal-gradients. Comment on how to select the

step-size parameter.

5. LetXbe a closed and bounded convex set. Letf be strongly convex with parameterµ. Assume we run the stochastic gradient method

xk+1=PX(xk−αkgk),

wheregk is astochastic subgradient, i.e.,E[gk[k−1]]∈∂f(xk), that hasfinite variance, i.e.,E[kgkk2]≤σ2. In this exercise, we’ll study a small modification to the simple convergence analysis from Lecture 19. In particular, we’ll show that a weighted average of the iteratesxk demonstratesO(1/k)convergence rate.

(a) Prove the following inequality (we essentially proved it in class already):

E[kxk+1−xk2]≤Ekxk−xk22kE[kgkk2]−2αk[f(xk)−f(x) +µ2kxk−xk2].

(b) Show from this inequality it follows that

E[f(xk)]−f(x)≤ αkσ2

2 +α−1k −µ

2 E[kxk−xk2]−1

kE[kxk+1−xk2]. (5.2) (c) Show that choosing stepsizeαk= µ(k+1)2 , implies that

Ef(¯xk)−f≤ 2σ2 µ(k+ 1), wherex¯k :=k(k+1)2 Pk

t=1txt.

(d) Show howx¯kcan be efficiently updated from iterationk→k+ 1.

6. Supposef is a convex function on a set C. An alternative definition of strong convexity off onC with coefficientµ >0is

f(αx+ (1−α)y) +µ2α(1−α)kx−yk22≤αf(x) + (1−α)f(y).

Supposefis a continuously differentiable function on int(C). Show that the following two are equivalent:

(a) f is strongly convex with strong convexity coefficientµ (b) h∇f(x)− ∇f(y), x−yi ≥µkx−yk22, ∀x, y∈int(C).

Hint:For part (b), it might help to definez(α) =αx+ (1−α)yand invoke the integral representation

f(z(α)) =f(x) + Z 1

0

h∇f(x+t(z(α)−x)), z(α)−xidt

5-2

(3)

HOMEWORK5 Proximal methods, monotone operators, incremental methods April 9, 2014

7. Iff is not convex, we can still define a prox-operator, which is now a set-valued map:

proxλf ≡y7→Argmin

x∈Rn

1

2kx−yk22+λf(x), λ >0.

Obtain the prox-maps for the following functions (a) f(x) =kxk0, i.e., the`0-“norm”.

(b) f(x) =kxk1/2:= (P

i|xi|1/2)2

(c) Does there exist a nonconvexf for which the prox-map is a singleton (forn >1)?

8. Consider the convex optimization problem

min f(x) +h(Ax), (5.3)

wherefandhare closed convex functions, andAhas full column rank. Assume that∂(f+h◦A) =∂f+∂(h◦A) (assume a similar qualification on the dual if needed).

(a) Write the Fenchel dual of this problem

(b) Show why running Douglas-Rachford on the dual yields the Alternating Direction Method of Multipliers (ADMM) for solving (5.3). (Hint:You may need to use the full DR method, not just its averaged reflections incarnation).

5-3

Références

Documents relatifs