
In this section, we test Algorithms 1 and 2 on BPDN (65) and QCQP (66) to show their numerical performance. The two algorithms are named LALM and BLALM, respectively. For both algorithms, we choose the parameter $\eta$ by backtracking. More precisely, at each iteration $k$, for LALM we start from $\eta^k = \eta^{k-1}$ and multiply it by 1.5 if (17) fails, and for BLALM we initialize $\eta_{i_k}^k = \eta_{i_k}^{k-1}$ and multiply it by 1.5 if (64) does not hold. For both tests, we run the compared methods for $10^5$ epochs, where one epoch is equivalent to $n$ block updates. Optimal solutions to both tested problems are computed by CVX [8] with high precision.
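To make the rule concrete, below is a minimal sketch of this backtracking scheme in Python. The predicate `condition_holds`, which stands in for checking (17) or (64) at the current iterate, and the function name are illustrative assumptions, not part of the paper.

```python
def backtrack_eta(eta_prev, condition_holds, factor=1.5):
    """Backtracking on the parameter eta, as described above: start from the
    value used at the previous iteration and enlarge it by `factor` until the
    descent condition (e.g., (17) or (64)) is satisfied.

    condition_holds(eta) -- user-supplied predicate evaluating the inequality
    at the current iterate for a candidate eta.
    """
    eta = eta_prev
    while not condition_holds(eta):
        eta *= factor
    return eta
```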

6.1 Basis pursuit denoising

In this test, we show the convergence speed of LALM and BLALM on solving BPDN (65). For simplicity, we set $D = I$. The matrix $A \in \mathbb{R}^{50\times 100}$ is randomly generated according to the standard Gaussian distribution, and the underlying sparse signal $x^o$ has 5 nonzero components following the standard Gaussian distribution. Then we let $b = Ax^o + 0.1\xi$, where $\xi$ is a unit Gaussian noise vector. For BLALM, we evenly partition the variable $x$ into 10 blocks. The parameter $\beta$ is simply set to 1 for both methods, and $\rho_z = \beta$ is set for LALM and $\rho_z = 10\beta$ for BLALM. Note that for the latter, the value of $\rho_z$ is larger than that given by the theorem, and the algorithm still works well. This may indicate that our analysis is not tight.
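For reference, a minimal sketch of generating such a synthetic instance, assuming NumPy; the seed and variable names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)                 # seed chosen for illustration only

m, n, s = 50, 100, 5                           # A is 50 x 100, 5 nonzeros in x^o
A = rng.standard_normal((m, n))                # standard Gaussian sensing matrix
x_o = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x_o[support] = rng.standard_normal(s)          # sparse signal with Gaussian entries
xi = rng.standard_normal(m)                    # unit Gaussian noise vector
b = A @ x_o + 0.1 * xi                         # b = A x^o + 0.1 xi

blocks = np.array_split(np.arange(n), 10)      # even partition into 10 blocks (BLALM)
```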


Figure 1: Convergence behaviors of Algorithm 1 (named LALM) and Algorithm 2 (named BLALM) on the BPDN problem (65). Left: distance of the objective value to the optimal value $|f_0(x) - f_0(x^*)|$; Right: constraint residual $\sum_{j=1}^m [f_j(x)]_+$. The "ergodic" curves are measured by the averaged iterate $\bar{x}^k$ and the "nonergodic" ones by the actual iterate $x^k$; a reference line of the $1/k$ rate is also shown. The missing part on each constraint violation curve corresponds to zero residual.

Figure 1 plots the objective values and constraint residuals produced by both algorithms, where the curves labeled "ergodic" are obtained by using the averaged iterates $\bar{x}^k$ and the "nonergodic" ones by the actual iterates $x^k$. The missing part on each constraint violation curve corresponds to zero residual. Since LALM and BLALM have similar per-epoch complexity, their comparison in terms of running time is similar to that in Figure 1. From the figure, we see that BLALM is better than LALM in terms of both ergodic and nonergodic iterates. The ergodic convergence speed of both methods is precisely of the order $1/k$ and matches our theorems. However, the nonergodic convergence is significantly faster, especially as the iterate approaches optimality. This is possibly because the iterate enters a region where the algorithm has linear convergence, as indicated by the analysis in Section 3.2. For this reason, we use the actual iterate in the next test.
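As an aside, the two quantities plotted in Figure 1 can be tracked cheaply during a run. The sketch below assumes a plain arithmetic average for the ergodic iterate (the paper's averaging weights may differ) and uses NumPy; both function names are illustrative.

```python
import numpy as np

def update_running_average(x_bar, x_new, k):
    """Ergodic iterate x_bar^k = (1/k) * sum_{t=1}^k x^t, maintained as a
    running average without storing past iterates."""
    return x_bar + (x_new - x_bar) / k

def constraint_residual(fvals):
    """Constraint residual sum_j [f_j(x)]_+ from the values f_j(x), j in [m]."""
    return np.sum(np.maximum(fvals, 0.0))
```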

6.2 Quadratically constrained quadratic programming

In this subsection, we test LALM and BLALM on the QCQP problem (66) and compare them to the recently proposed first-order primal-dual type method by Yu & Neely [21]. They consider a smooth constrained convex program in the form of (1) without an explicit linear equality constraint. Their method, which we name PD-YN, iteratively performs the updates:

$$x^{k+1} = P_X\left(x^k - \frac{1}{\eta}\Big(\nabla f_0(x^k) + \sum_{j=1}^m \big(\lambda_j^k + f_j(x^k)\big)\nabla f_j(x^k)\Big)\right), \qquad (69a)$$

$$\lambda_j^{k+1} = \max\big(-f_j(x^k),\ \lambda_j^k + f_j(x^k)\big), \quad \forall j \in [m], \qquad (69b)$$


Figure 2: Convergence behaviors of Algorithm 1 (named LALM), Algorithm 2 (named BLALM), and the primal-dual type method (named PD-YN) in [21] on the QCQP problem (66). Left: distance of the objective value to the optimal value $|f_0(x) - f_0(x^*)|$; Right: constraint residual $\sum_{j=1}^m [f_j(x)]_+$; a reference line of the $1/k$ rate is also shown. The missing part on each constraint violation curve corresponds to zero residual.

where $\lambda_j^0 = \max(0, -f_j(x^0))$ for all $j \in [m]$, and $\eta$ is the step size. In the test, we also choose $\eta$ adaptively by backtracking such that
$$\phi(x^{k+1}, z^k) \le \phi(x^k, z^k) + \big\langle \nabla_x \phi(x^k, z^k),\, x^{k+1} - x^k\big\rangle + \frac{\eta_k}{2}\|x^{k+1} - x^k\|^2,$$
where $\phi(x, z) = f_0(x) + \sum_{j=1}^m z_j f_j(x)$ and $z_j^k = \lambda_j^k + f_j(x^k)$. Although [21] does not show the convergence of PD-YN with the above adaptive $\eta_k$, we observe better performance than with a fixed $\eta$.
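A minimal sketch of one PD-YN iteration (69a)-(69b) with this adaptive backtracking is given below, assuming NumPy; the function and argument names (`pdyn_step`, `proj_X`, `f_list`, `grad_list`) are illustrative and not taken from the paper or from [21]. The multipliers would be initialized as $\lambda_j^0 = \max(0, -f_j(x^0))$ before the first call.

```python
import numpy as np

def pdyn_step(x, lam, eta, f_list, grad_list, proj_X, inc=1.5):
    """One PD-YN iteration (69a)-(69b) with the backtracking rule above.

    f_list[0], grad_list[0]   -- objective f_0 and its gradient
    f_list[1:], grad_list[1:] -- constraint functions f_j and their gradients
    proj_X                    -- projection onto the feasible set X
    """
    fvals = np.array([f(x) for f in f_list[1:]])
    z = lam + fvals                           # z_j^k = lambda_j^k + f_j(x^k)
    grad = grad_list[0](x) + sum(zj * g(x) for zj, g in zip(z, grad_list[1:]))
    phi = lambda y: f_list[0](y) + z @ np.array([f(y) for f in f_list[1:]])
    phi_x = phi(x)
    while True:                               # backtracking: enlarge eta until the
        x_new = proj_X(x - grad / eta)        # descent condition on phi holds
        d = x_new - x
        if phi(x_new) <= phi_x + grad @ d + 0.5 * eta * (d @ d):
            break
        eta *= inc
    lam_new = np.maximum(-fvals, lam + fvals) # (69b), evaluated at x^k
    return x_new, lam_new, eta
```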

The problem size is set to $m = 10$ and $p = 2000$. We randomly generate SPD matrices $Q_j$, $j = 0, 1, \ldots, m$. The matrix $A$ is set to zero, i.e., there is no linear equality constraint. The vectors $c_j$ are generated according to the standard Gaussian distribution, and $d_0 = 0$ while each $d_j$ is a negative number for $j \in [m]$. Also, we set $l_i = -10$ and $u_i = 10$ for each $i \in [p]$. Hence, the zero vector is an interior point of $X$ and makes every inequality hold strictly, namely, the Slater condition holds. We set $\beta = 0.1$ for both LALM and BLALM, and for the latter, we evenly partition the variable into 200 blocks. The parameter $\rho_z$ is set to $\beta$ and $200\beta$ for the two algorithms, respectively.
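A minimal sketch of generating such an instance is below, assuming NumPy. The construction of the SPD matrices and the quadratic form $f_j(x) = \tfrac{1}{2}x^\top Q_j x + c_j^\top x + d_j$ assumed for (66) are illustrative choices consistent with the description above, not necessarily the exact ones used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 10, 2000                                    # problem size as in the text

def random_spd(n):
    """One possible way to draw a random symmetric positive definite matrix."""
    M = rng.standard_normal((n, n)) / np.sqrt(n)
    return M @ M.T + 0.1 * np.eye(n)

Q = [random_spd(p) for _ in range(m + 1)]          # Q_0, Q_1, ..., Q_m
c = [rng.standard_normal(p) for _ in range(m + 1)] # standard Gaussian c_j
d = np.concatenate(([0.0], -rng.uniform(0.5, 1.5, m)))  # d_0 = 0, d_j < 0
l, u = -10.0 * np.ones(p), 10.0 * np.ones(p)       # box X = {x : l <= x <= u}

# Assumed form of (66): f_j(x) = 0.5 x'Q_j x + c_j'x + d_j
f = [lambda x, j=j: 0.5 * x @ Q[j] @ x + c[j] @ x + d[j] for j in range(m + 1)]
# f_j(0) = d_j < 0 for j in [m], so x = 0 is strictly feasible (Slater holds)
```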

Figure 2 plots the results of the three compared algorithms. Both proposed methods perform significantly better than PD-YN, and BLALM is the best among the three. In addition, we notice that LALM and BLALM converge linearly when the iterate approaches optimality, as indicated by Theorem 3.5.

7 Conclusions

We have presented a first-order method for solving composite convex programming with both equality and smooth nonlinear inequality constraints. The method is derived from proximal linearization of the classic augmented Lagrangian function. Its global iterate convergence and global sublinear and local linear convergence results have been established. For problems with coordinate-friendly structure, we have also proposed a first-order randomized block update method and shown its global sublinear convergence in expectation. In addition, we have implemented the two methods for solving the basis pursuit denoising problem and convex quadratically constrained quadratic programming. Global sublinear and local linear convergence are both observed in the numerical experiments.

References

[1] A. Ben-Tal and M. Zibulevsky. Penalty/barrier multiplier methods for convex programming problems. SIAM Journal on Optimization, 7(2):347–366, 1997.

[2] D. P. Bertsekas. Nonlinear programming. Athena Scientific, Belmont, 1999.

[3] D. P. Bertsekas. Constrained optimization and Lagrange multiplier methods. Academic Press, 2014.

[4] P. L. Combettes and J.-C. Pesquet. Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping. SIAM Journal on Optimization, 25(2):1221–1248, 2015.

[5] Y. Cui, X. Li, D. Sun, and K.-C. Toh. On the convergence properties of a majorized alternating direction method of multipliers for linearly constrained convex optimization problems with coupled objective functions. Journal of Optimization Theory and Applications, 169(3):1013–1041, 2016.

[6] X. Gao, Y. Xu, and S. Zhang. Randomized primal-dual proximal block coordinate updates. arXiv preprint arXiv:1605.05969, 2016.

[7] X. Gao and S.-Z. Zhang. First-order algorithms for convex optimization with nonseparable objective and coupled constraints. Journal of the Operations Research Society of China, 5(2):131–159, 2017.

[8] M. Grant, S. Boyd, and Y. Ye. CVX: Matlab software for disciplined convex programming, 2008.

[9] M. Hong, T.-H. Chang, X. Wang, M. Razaviyayn, S. Ma, and Z.-Q. Luo. A block successive upper bound minimization method of multipliers for linearly constrained convex optimization. arXiv preprint arXiv:1401.7079, 2014.

[10] G. Lan and R. D. C. Monteiro. Iteration-complexity of first-order augmented Lagrangian methods for convex programming. Mathematical Programming, 155(1-2):511–547, 2016.

[11] Y.-F. Liu, X. Liu, and S. Ma. On the non-ergodic convergence rate of an inexact augmented Lagrangian framework for composite convex programming. arXiv preprint arXiv:1603.05738, 2016.

[12] V. Nedelcu, I. Necoara, and Q. Tran-Dinh. Computational complexity of inexact gradient augmented Lagrangian methods: application to constrained MPC. SIAM Journal on Control and Optimization, 52(5):3109–3134, 2014.

[13] A. Nedić and A. Ozdaglar. Subgradient methods for saddle-point problems. Journal of Optimization Theory and Applications, 142(1):205–228, 2009.

[14] Y. Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1):125–161, 2013.

[15] E. Pee and J. O. Royset. On solving large-scale finite minimax problems using exponential smoothing. Journal of Optimization Theory and Applications, 148(2):390–421, 2011.

[16] Z. Peng, T. Wu, Y. Xu, M. Yan, and W. Yin. Coordinate-friendly structures, algorithms and applications. Annals of Mathematical Sciences and Applications, 1(1):57–119, 2016.

[17] Y. Xu. Accelerated first-order primal-dual proximal methods for linearly constrained composite convex programming. SIAM Journal on Optimization, 27(3):1459–1484, 2017.

[18] Y. Xu. Asynchronous parallel primal-dual block update methods. arXiv preprint arXiv:1705.06391, 2017.

[19] Y. Xu. Global convergence rates of augmented Lagrangian methods for constrained convex programming. Preprint, 2017.

[20] Y. Xu and S. Zhang. Accelerated primal-dual proximal block coordinate updating methods for constrained convex optimization. arXiv preprint arXiv:1702.05423, 2017.

[21] H. Yu and M. J. Neely. A primal-dual type algorithm with the O(1/t) convergence rate for large scale constrained convex programs. In Decision and Control (CDC), 2016 IEEE 55th Conference on, pages 1900–1905. IEEE, 2016.
