SMAI-JCM
SMAI Journal of Computational Mathematics

Block-wise Alternating Direction Method of Multipliers for Multiple-block Convex Programming and Beyond

Bingsheng He & Xiaoming Yuan, Volume 1 (2015), p. 145-174.

<http://smai-jcm.cedram.org/item?id=SMAI-JCM_2015__1__145_0>

© Société de Mathématiques Appliquées et Industrielles, 2015. Some rights reserved.

Article published online by cedram, the Centre de diffusion des revues académiques de mathématiques, http://www.cedram.org/

Vol. 1, 145-174 (2015)

Block-wise Alternating Direction Method of Multipliers for Multiple-block Convex Programming and Beyond

Bingsheng He (1), Xiaoming Yuan (2)

(1) Department of Mathematics, South University of Science and Technology of China, and Department of Mathematics, Nanjing University, China.
E-mail address: hebma@nju.edu.cn

(2) Department of Mathematics, Hong Kong Baptist University, Hong Kong.
E-mail address: xmyuan@hkbu.edu.hk

Abstract. The alternating direction method of multipliers (ADMM) is a benchmark for solving a linearly constrained convex minimization model with a two-block separable objective function; and it has been shown that its direct extension to a multiple-block case, where the objective function is the sum of more than two functions, is not necessarily convergent. For the multiple-block case, a natural idea is to artificially group the objective functions and the corresponding variables into two groups and then apply the original ADMM directly; the block-wise ADMM is so named because each of the resulting ADMM subproblems may involve more than one function in its objective. Such a subproblem of the block-wise ADMM may not be easy, as it may require minimizing more than one function with coupled variables simultaneously. We discuss how to further decompose the block-wise ADMM's subproblems and obtain easier subproblems, so that the properties of each function in the objective can be individually and thus effectively used, while the convergence can still be ensured. The generalized ADMM and the strictly contractive Peaceman-Rachford splitting method, two schemes closely related to the ADMM, will also be extended to block-wise versions to tackle multiple-block convex programming cases. We present the convergence analysis, including both the global convergence and the worst-case convergence rate measured by the iteration complexity, for these three block-wise splitting schemes in a unified framework.

Math. classification. 90C25, 90C06, 65K05.

Keywords. Convex programming, Operator splitting methods, Alternating direction method of multipliers, Proximal point algorithm, Douglas-Rachford splitting method, Peaceman-Rachford splitting method, Convergence rate, Iteration complexity.

1. Introduction

We consider a separable convex minimization problem with linear constraints, whose objective function is the sum of more than one function without coupled variables:

\[
\min\Big\{\sum_{i=1}^{m}\theta_i(x_i)\;\Big|\;\sum_{i=1}^{m}A_i x_i = b,\ x_i\in X_i,\ i=1,\cdots,m\Big\}, \qquad (1.1)
\]
where $\theta_i:\mathbb{R}^{n_i}\to\mathbb{R}$ $(i=1,\cdots,m)$ are convex (not necessarily smooth) closed functions; $A_i\in\mathbb{R}^{l\times n_i}$, $b\in\mathbb{R}^l$, and $X_i\subseteq\mathbb{R}^{n_i}$ $(i=1,\cdots,m)$ are convex sets. The solution set of (1.1) is assumed to be nonempty throughout our discussions in this paper. We also assume that the matrices $A_i^TA_i$ $(i=1,\ldots,m)$ are all nonsingular.

Bingsheng He was supported by the NSFC grant: 11471156.

Xiaoming Yuan was supported by a General Research Fund from Hong Kong Research Grants Council.


Let the augmented Lagrangian function of (1.1) be
\[
\mathcal{L}^m_\beta(x_1,x_2,\cdots,x_m,\lambda) = \sum_{i=1}^{m}\theta_i(x_i) - \lambda^T\Big(\sum_{i=1}^{m}A_i x_i - b\Big) + \frac{\beta}{2}\Big\|\sum_{i=1}^{m}A_i x_i - b\Big\|^2, \qquad (1.2)
\]
with $\lambda\in\mathbb{R}^l$ the Lagrange multiplier and $\beta>0$ a penalty parameter. For the special case of (1.1) with $m=2$, the alternating direction method of multipliers (ADMM) in [14] reads as

\[
\left\{\begin{aligned}
x_1^{k+1} &= \arg\min\big\{\mathcal{L}^2_\beta(x_1,x_2^k,\lambda^k)\;\big|\;x_1\in X_1\big\},\\
x_2^{k+1} &= \arg\min\big\{\mathcal{L}^2_\beta(x_1^{k+1},x_2,\lambda^k)\;\big|\;x_2\in X_2\big\},\\
\lambda^{k+1} &= \lambda^k - \beta\big(A_1x_1^{k+1} + A_2x_2^{k+1} - b\big).
\end{aligned}\right. \qquad (1.3)
\]
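To fix ideas, here is a minimal NumPy sketch of the iteration (1.3) for the toy choice $\theta_1(x_1)=\frac{1}{2}\|x_1-c_1\|^2$, $\theta_2(x_2)=\frac{1}{2}\|x_2-c_2\|^2$ with $X_1=\mathbb{R}^{n_1}$, $X_2=\mathbb{R}^{n_2}$, so that both subproblems reduce to small linear systems; the function name, the toy objective, and all parameter values are illustrative assumptions and not part of the paper.

```python
import numpy as np

def admm_two_block(A1, A2, b, c1, c2, beta=1.0, iters=300):
    """Sketch of scheme (1.3) for theta_i(x_i) = 0.5*||x_i - c_i||^2 and X_i = R^{n_i};
    each subproblem is then an unconstrained strongly convex quadratic."""
    x1, x2 = np.zeros(A1.shape[1]), np.zeros(A2.shape[1])
    lam = np.zeros(b.shape[0])
    for _ in range(iters):
        # x1-step: minimize L_beta(x1, x2^k, lambda^k) over x1
        x1 = np.linalg.solve(np.eye(A1.shape[1]) + beta * A1.T @ A1,
                             c1 + A1.T @ lam - beta * A1.T @ (A2 @ x2 - b))
        # x2-step: minimize L_beta(x1^{k+1}, x2, lambda^k) over x2
        x2 = np.linalg.solve(np.eye(A2.shape[1]) + beta * A2.T @ A2,
                             c2 + A2.T @ lam - beta * A2.T @ (A1 @ x1 - b))
        # multiplier update
        lam = lam - beta * (A1 @ x1 + A2 @ x2 - b)
    return x1, x2, lam

# usage on a random feasible instance
rng = np.random.default_rng(1)
A1, A2 = rng.standard_normal((6, 3)), rng.standard_normal((6, 4))
b = A1 @ rng.standard_normal(3) + A2 @ rng.standard_normal(4)
x1, x2, lam = admm_two_block(A1, A2, b, np.ones(3), np.ones(4))
print(np.linalg.norm(A1 @ x1 + A2 @ x2 - b))  # constraint residual, expected to be small
```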

Recently, the ADMM has found many efficient applications in a broad spectrum of fields such as machine learning, statistical learning, computer vision, wireless networks, and so on. We refer the reader to [1, 7, 12] for some recent review papers on the ADMM. For the multiple-block case of (1.1) with $m\ge 3$, the direct extension of the ADMM reads as

\[
\left\{\begin{aligned}
x_1^{k+1} &= \arg\min\big\{\mathcal{L}^m_\beta(x_1,x_2^k,\cdots,x_m^k,\lambda^k)\;\big|\;x_1\in X_1\big\},\\
&\;\;\vdots\\
x_i^{k+1} &= \arg\min\big\{\mathcal{L}^m_\beta(x_1^{k+1},\cdots,x_{i-1}^{k+1},x_i,x_{i+1}^k,\cdots,x_m^k,\lambda^k)\;\big|\;x_i\in X_i\big\},\\
&\;\;\vdots\\
x_m^{k+1} &= \arg\min\big\{\mathcal{L}^m_\beta(x_1^{k+1},\cdots,x_{m-1}^{k+1},x_m,\lambda^k)\;\big|\;x_m\in X_m\big\},\\
\lambda^{k+1} &= \lambda^k - \beta\Big(\sum_{i=1}^{m}A_ix_i^{k+1} - b\Big).
\end{aligned}\right. \qquad (1.4)
\]

The direct extension of ADMM (1.4) indeed works empirically for some applications, as shown in, e.g., [32, 34]. However, it was shown in [3] that the scheme (1.4) is not necessarily convergent. In the literature, some surrogates of (1.4) have been well studied. For example, the schemes in [22, 21] suggest correcting the output of (1.4) appropriately to ensure the convergence; the scheme in [23] slightly changes the order of updating the Lagrange multiplier and twists some of the subproblems appropriately; and the scheme in [26] suggests attaching a shrinking factor to the Lagrange multiplier updating step in (1.4).

Given the wide applicability of the ADMM (1.3) for a two-block convex minimization model and the convergence difficulty of its direct extension (1.4) for a multiple-block counterpart, in this paper we mainly answer the question of how to use the original ADMM scheme (1.3) directly for the multiple-block model (1.1) and how to design an implementable algorithm with provable convergence that can individually take advantage of the properties of the functions in the objective of (1.1). Let us first elaborate on the difficulty of applying the original ADMM (1.3) to the model (1.1) with $m\ge 3$. Conceptually, we can group the $m$ functions in the objective of (1.1), and accordingly all the variables, into two groups; the original ADMM scheme (1.3) then becomes applicable in a block-wise form.

That is, we can rewrite the model (1.1) as
\[
\min\Big\{\sum_{i=1}^{m_1}\theta_i(x_i) + \sum_{j=1}^{m_2}\phi_j(y_j)\;\Big|\;\sum_{i=1}^{m_1}A_ix_i + \sum_{j=1}^{m_2}B_jy_j = b,\ x_i\in X_i,\ y_j\in Y_j\Big\}, \qquad (1.5)
\]

where $m_1\ge 1$, $m_2\ge 1$ and $m = m_1 + m_2$. Note that for the second group, we relabel $(\theta_i,x_i,A_i,X_i)$ for $i=m_1+1,\cdots,m$ in (1.1) as $(\phi_j,y_j,B_j,Y_j)$ with $j=1,2,\cdots,m_2$ in (1.5), respectively. This relabeling will significantly simplify our notation in the coming analysis and help us expose our idea more clearly. Furthermore, with the notation
\[
\mathbf{x} = \begin{pmatrix} x_1\\ \vdots\\ x_{m_1}\end{pmatrix},\quad
\mathbf{y} = \begin{pmatrix} y_1\\ \vdots\\ y_{m_2}\end{pmatrix},\quad
\mathcal{A} = (A_1,\ldots,A_{m_1}),\quad
\mathcal{B} = (B_1,\ldots,B_{m_2}), \qquad (1.6)
\]
and
\[
\vartheta(\mathbf{x}) = \sum_{i=1}^{m_1}\theta_i(x_i),\quad
\varphi(\mathbf{y}) = \sum_{j=1}^{m_2}\phi_j(y_j),\quad
\mathcal{X} = \prod_{i=1}^{m_1}X_i,\quad
\mathcal{Y} = \prod_{j=1}^{m_2}Y_j, \qquad (1.7)
\]
the reformulation (1.5) of the model (1.1) can be written in the block-wise form

\[
\min\big\{\vartheta(\mathbf{x}) + \varphi(\mathbf{y})\;\big|\;\mathcal{A}\mathbf{x} + \mathcal{B}\mathbf{y} = b,\ \mathbf{x}\in\mathcal{X},\ \mathbf{y}\in\mathcal{Y}\big\}. \qquad (1.8)
\]
The Lagrange function of (1.8) is
\[
L^2(\mathbf{x},\mathbf{y},\lambda) = \vartheta(\mathbf{x}) + \varphi(\mathbf{y}) - \lambda^T(\mathcal{A}\mathbf{x} + \mathcal{B}\mathbf{y} - b), \qquad (1.9)
\]
which is defined on
\[
\Omega = \mathcal{X}\times\mathcal{Y}\times\mathbb{R}^l = X_1\times\cdots\times X_{m_1}\times Y_1\times\cdots\times Y_{m_2}\times\mathbb{R}^l. \qquad (1.10)
\]
The augmented Lagrangian function can be denoted by

\[
\begin{aligned}
\mathcal{L}^2_\beta(\mathbf{x},\mathbf{y},\lambda) &= L^2(\mathbf{x},\mathbf{y},\lambda) + \frac{\beta}{2}\|\mathcal{A}\mathbf{x} + \mathcal{B}\mathbf{y} - b\|^2\\
&= \sum_{i=1}^{m_1}\theta_i(x_i) + \sum_{j=1}^{m_2}\phi_j(y_j) - \lambda^T\Big(\sum_{i=1}^{m_1}A_ix_i + \sum_{j=1}^{m_2}B_jy_j - b\Big) + \frac{\beta}{2}\Big\|\sum_{i=1}^{m_1}A_ix_i + \sum_{j=1}^{m_2}B_jy_j - b\Big\|^2. \qquad (1.11)
\end{aligned}
\]
Then, conceptually, the original ADMM (1.3) is implementable for the block-wise reformulation (1.8) of the model (1.1). The resulting scheme, called a block-wise ADMM hereafter, reads as

\[
\left\{\begin{aligned}
\mathbf{x}^{k+1} &= \arg\min\big\{\mathcal{L}^2_\beta(\mathbf{x},\mathbf{y}^k,\lambda^k)\;\big|\;\mathbf{x}\in\mathcal{X}\big\},\\
\mathbf{y}^{k+1} &= \arg\min\big\{\mathcal{L}^2_\beta(\mathbf{x}^{k+1},\mathbf{y},\lambda^k)\;\big|\;\mathbf{y}\in\mathcal{Y}\big\},\\
\lambda^{k+1} &= \lambda^k - \beta\big(\mathcal{A}\mathbf{x}^{k+1} + \mathcal{B}\mathbf{y}^{k+1} - b\big).
\end{aligned}\right. \qquad (1.12)
\]
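For instance, in the smallest multiple-block case $m=3$ (the setting of the counter example mentioned below), grouping with $m_1=1$ and $m_2=2$ gives
\[
\mathbf{x} = x_1,\quad \mathbf{y} = \begin{pmatrix} y_1\\ y_2\end{pmatrix} = \begin{pmatrix} x_2\\ x_3\end{pmatrix},\quad
\mathcal{A} = A_1,\quad \mathcal{B} = (A_2,A_3),\quad
\vartheta(\mathbf{x}) = \theta_1(x_1),\quad \varphi(\mathbf{y}) = \theta_2(x_2) + \theta_3(x_3),
\]
so the x-subproblem of (1.12) involves only $\theta_1$, while the y-subproblem couples $\theta_2$ and $\theta_3$ through the quadratic term in (1.11).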

For a generic case of (1.1) with $m\ge 3$, however, solving the x- and y-subproblems in (1.12) is usually hard or even infeasible for many concrete applications of the abstract model (1.1). This is mainly because such a subproblem is in a block-wise form and may require minimizing more than one function with variables coupled by the quadratic term in (1.11). Recall that for the ADMM (1.3) and its variants, all the functions in the objective of (1.1) are treated individually in their decomposed subproblems, and thus these functions' properties can be effectively exploited in the algorithmic implementation. One representative case arising often in sparse and/or low-rank optimization models is that when a function $\theta_i$ is simple in the sense that its proximal operator can be evaluated explicitly, the corresponding subproblem or its linearized version is easy enough to have a closed-form solution, meaning that the subproblem is completely exempted from any inner iteration. This feature is indeed the main reason for the efficiency of ADMM-related schemes in a broad spectrum of applications. Therefore, although the scheme (1.12) represents a direct application of the original ADMM to the block-wise reformulation (1.8) and it makes theoretical sense, it is necessary to discuss how to solve its x- and y-subproblems efficiently. With the just-mentioned desire of taking advantage of the properties of the $\theta_i$'s individually, we can consider further decomposing the x- and y-subproblems in (1.12) into $m_1$ and $m_2$ smaller subproblems, respectively (applying the decomposition to the quadratic term in (1.11) which couples the variables), so that each of them only has to deal with one $\theta_i(x_i)$ in its objective. That is, for solving the multiple-block model (1.1) with $m\ge 3$, we can consider the following splitting version of the block-wise ADMM (1.12):

\[
\left\{\begin{aligned}
x_1^{k+1} &= \arg\min\big\{\mathcal{L}^2_\beta(x_1,x_2^k,\cdots,x_{m_1}^k,\mathbf{y}^k,\lambda^k)\;\big|\;x_1\in X_1\big\},\\
&\;\;\vdots\\
x_i^{k+1} &= \arg\min\big\{\mathcal{L}^2_\beta(x_1^k,\cdots,x_{i-1}^k,x_i,x_{i+1}^k,\cdots,x_{m_1}^k,\mathbf{y}^k,\lambda^k)\;\big|\;x_i\in X_i\big\},\\
&\;\;\vdots\\
x_{m_1}^{k+1} &= \arg\min\big\{\mathcal{L}^2_\beta(x_1^k,x_2^k,\cdots,x_{m_1-1}^k,x_{m_1},\mathbf{y}^k,\lambda^k)\;\big|\;x_{m_1}\in X_{m_1}\big\},\\
y_1^{k+1} &= \arg\min\big\{\mathcal{L}^2_\beta(\mathbf{x}^{k+1},y_1,y_2^k,\cdots,y_{m_2}^k,\lambda^k)\;\big|\;y_1\in Y_1\big\},\\
&\;\;\vdots\\
y_j^{k+1} &= \arg\min\big\{\mathcal{L}^2_\beta(\mathbf{x}^{k+1},y_1^k,\cdots,y_{j-1}^k,y_j,y_{j+1}^k,\cdots,y_{m_2}^k,\lambda^k)\;\big|\;y_j\in Y_j\big\},\\
&\;\;\vdots\\
y_{m_2}^{k+1} &= \arg\min\big\{\mathcal{L}^2_\beta(\mathbf{x}^{k+1},y_1^k,y_2^k,\cdots,y_{m_2-1}^k,y_{m_2},\lambda^k)\;\big|\;y_{m_2}\in Y_{m_2}\big\},\\
\lambda^{k+1} &= \lambda^k - \beta\big(\mathcal{A}\mathbf{x}^{k+1} + \mathcal{B}\mathbf{y}^{k+1} - b\big).
\end{aligned}\right. \qquad (1.13)
\]

Note that in (1.13), internally we consider the parallel (i.e., Jacobian) decomposition for the x- and y-subproblems arising in (1.12). See Remark 3.5 for the discussion where the alternating (i.e., Gauss-Seidel) decomposition is implemented internally for these subproblems. With this parallel decomposition, it is possible to solve the smaller subproblems decomposed from the x- and y-subproblems simultaneously on parallel computing infrastructures, which is of particular interest for scenarios where large-dimensional data is considered and distributed computation is requested. Thus, the scheme (1.13) is a mixture of alternating decomposition outside (i.e., the block-wise variables $\mathbf{x}$ and $\mathbf{y}$ are updated alternatingly) and parallel decomposition inside (i.e., the individual variables in both the x- and y-subproblems are updated in parallel). It is clear that the scheme (1.13) enjoys the same good feature as the ADMM-related schemes, because each of its subproblems could be very easy if the functions $\theta_i$ are simple enough (e.g., their proximal operators can be evaluated explicitly). Despite this nice feature, the scheme (1.13) is, however, not necessarily convergent; see a counter example in the appendix.

In fact, the divergence of (1.13) can be intuitively understood: the decomposed $(x_1,x_2,\cdots,x_{m_1})$-subproblems (resp., $(y_1,y_2,\cdots,y_{m_2})$-subproblems) in (1.13) represent a more implementable but inexact version of the block-wise x-subproblem (resp., y-subproblem) in (1.12). This approximation, however, is not accurate enough, because the block-wise x-subproblem (resp., y-subproblem) in (1.12) is decomposed $m_1$ (resp., $m_2$) times in (1.13). Therefore, the convergence of the block-wise ADMM (1.12), which is indeed guaranteed under the condition that both of its block-wise subproblems are solved exactly, or inexactly but with a certain requirement on the inexactness (see, e.g., [18, 30]), does not necessarily hold for its splitting version (1.13). The counter example in the appendix is indeed a case of the model (1.1) with $m=3$; thus we have $m_1=1$ and $m_2=2$ when implementing the splitting version (1.13), meaning that the x-subproblem in (1.13) is the same as that in (1.12), while only the y-subproblem in (1.12) is approximated by two further decomposed subproblems in (1.13). For this case with the least extent of approximation, the scheme (1.13) is already shown not to be necessarily convergent by this counter example. This fact clearly indicates the failure of convergence of (1.13) for a generic case where both $m_1\ge 2$ and $m_2\ge 2$. How to derive a convergence-guaranteed algorithm based on the scheme (1.13) is thus the emphasis of this paper.

We will present an algorithm based on the scheme (1.13) in Section 3, preceded by some preliminaries summarized in Section 2 for further analysis. Then, in Section 4, we prove the convergence of the new scheme and establish its worst-case convergence rate measured by the iteration complexity in both the ergodic and a nonergodic senses. In Sections 5 and 6, we apply a similar analysis to the generalized ADMM in [6] and the strictly contractive Peaceman-Rachford splitting method in [20], respectively, and obtain their block-wise versions which are suitable for the model (1.1). The convergence analysis, including the convergence rate analysis, for these three block-wise splitting methods can be presented in a unified framework. Finally, we make some conclusions in Section 7.

2. Preliminaries

In this section, we summarize some known results in the literature for further analysis.

2.1. A variational inequality characterization

We denote by $\mathbf{x} = (x_1,\ldots,x_{m_1})$ and $\mathbf{y} = (y_1,\ldots,y_{m_2})$. Let $(\mathbf{x}^*,\mathbf{y}^*,\lambda^*)$ be a saddle point of the Lagrange function (1.9). Then, for any $\lambda\in\mathbb{R}^l$, $\mathbf{x}\in\mathcal{X}$, $\mathbf{y}\in\mathcal{Y}$, we have
\[
L^2(\mathbf{x}^*,\mathbf{y}^*,\lambda) \le L^2(\mathbf{x}^*,\mathbf{y}^*,\lambda^*) \le L^2(\mathbf{x},\mathbf{y},\lambda^*).
\]
Indeed, finding a saddle point of $L^2(\mathbf{x},\mathbf{y},\lambda)$ can be expressed as the following variational inequalities: finding $(x_1^*,\ldots,x_{m_1}^*,y_1^*,\ldots,y_{m_2}^*,\lambda^*)\in\Omega$, where $\Omega$ is defined in (1.10), such that
\[
\left\{\begin{aligned}
&x_1^*\in X_1, && \theta_1(x_1)-\theta_1(x_1^*) + (x_1-x_1^*)^T(-A_1^T\lambda^*)\ge 0, && \forall\, x_1\in X_1,\\
&\;\;\vdots\\
&x_{m_1}^*\in X_{m_1}, && \theta_{m_1}(x_{m_1})-\theta_{m_1}(x_{m_1}^*) + (x_{m_1}-x_{m_1}^*)^T(-A_{m_1}^T\lambda^*)\ge 0, && \forall\, x_{m_1}\in X_{m_1},\\
&y_1^*\in Y_1, && \phi_1(y_1)-\phi_1(y_1^*) + (y_1-y_1^*)^T(-B_1^T\lambda^*)\ge 0, && \forall\, y_1\in Y_1,\\
&\;\;\vdots\\
&y_{m_2}^*\in Y_{m_2}, && \phi_{m_2}(y_{m_2})-\phi_{m_2}(y_{m_2}^*) + (y_{m_2}-y_{m_2}^*)^T(-B_{m_2}^T\lambda^*)\ge 0, && \forall\, y_{m_2}\in Y_{m_2},\\
&\lambda^*\in\mathbb{R}^l, && (\lambda-\lambda^*)^T\Big(\sum_{i=1}^{m_1}A_ix_i^* + \sum_{j=1}^{m_2}B_jy_j^* - b\Big)\ge 0, && \forall\, \lambda\in\mathbb{R}^l.
\end{aligned}\right. \qquad (2.1)
\]

The variational inequalities in (2.1) can be rewritten in the compact form
\[
\mathrm{VI}(\Omega,F,\theta):\qquad w^*\in\Omega,\quad f(u) - f(u^*) + (w-w^*)^T F(w^*)\ge 0,\quad \forall\, w\in\Omega, \qquad (2.2a)
\]
with the definitions
\[
f(u) = \sum_{i=1}^{m_1}\theta_i(x_i) + \sum_{j=1}^{m_2}\phi_j(y_j), \qquad (2.2b)
\]
\[
u = \begin{pmatrix} x_1\\ \vdots\\ x_{m_1}\\ y_1\\ \vdots\\ y_{m_2}\end{pmatrix},\qquad
w = \begin{pmatrix} x_1\\ \vdots\\ x_{m_1}\\ y_1\\ \vdots\\ y_{m_2}\\ \lambda\end{pmatrix},\qquad
F(w) = \begin{pmatrix} -A_1^T\lambda\\ \vdots\\ -A_{m_1}^T\lambda\\ -B_1^T\lambda\\ \vdots\\ -B_{m_2}^T\lambda\\ \sum_{i=1}^{m_1}A_ix_i + \sum_{j=1}^{m_2}B_jy_j - b\end{pmatrix}, \qquad (2.2c)
\]
and $\Omega$ is given in (1.10) (it can also be expressed as $\Omega = \mathcal{X}\times\mathcal{Y}\times\mathbb{R}^l$). We also denote by $\Omega^*$ the set of all saddle points of $\mathcal{L}^2_\beta(\mathbf{x},\mathbf{y},\lambda)$.

Recall the notation defined in (1.6)-(1.7). For the variational inequality (2.2), we further have
\[
f(u) = \vartheta(\mathbf{x}) + \varphi(\mathbf{y}), \qquad (2.3a)
\]
where
\[
u = \begin{pmatrix} \mathbf{x}\\ \mathbf{y}\end{pmatrix},\qquad
w = \begin{pmatrix} \mathbf{x}\\ \mathbf{y}\\ \lambda\end{pmatrix},\qquad
F(w) = \begin{pmatrix} -\mathcal{A}^T\lambda\\ -\mathcal{B}^T\lambda\\ \mathcal{A}\mathbf{x} + \mathcal{B}\mathbf{y} - b\end{pmatrix}. \qquad (2.3b)
\]

This variational inequality indeed characterizes the first-order optimality condition of the block-wise reformulation (1.5) of the model (1.1). We need this variational inequality characterization for the upcoming theoretical analysis.

2.2. Some properties of the matrices A and B

For the matrices $\mathcal{A}$ and $\mathcal{B}$ defined in (1.6), we have some properties which are useful for our analysis. We summarize them in the following lemma; its proof is omitted as it is trivial.

Lemma 2.1. For the matrices $\mathcal{A}$ and $\mathcal{B}$ defined in (1.6), we have the following conclusions:
\[
m_1\cdot\mathrm{diag}(\mathcal{A}^T\mathcal{A}) \succeq \mathcal{A}^T\mathcal{A}
\quad\text{and}\quad
m_2\cdot\mathrm{diag}(\mathcal{B}^T\mathcal{B}) \succeq \mathcal{B}^T\mathcal{B}, \qquad (2.4)
\]
where $\mathrm{diag}(\mathcal{A}^T\mathcal{A})$ and $\mathrm{diag}(\mathcal{B}^T\mathcal{B})$ are defined by
\[
\mathrm{diag}(\mathcal{A}^T\mathcal{A}) := \begin{pmatrix} A_1^TA_1 & & \\ & \ddots & \\ & & A_{m_1}^TA_{m_1}\end{pmatrix}
\quad\text{and}\quad
\mathrm{diag}(\mathcal{B}^T\mathcal{B}) := \begin{pmatrix} B_1^TB_1 & & \\ & \ddots & \\ & & B_{m_2}^TB_{m_2}\end{pmatrix},
\]
respectively.
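In fact, (2.4) is a direct consequence of the Cauchy-Schwarz inequality: for any $\mathbf{x} = (x_1,\ldots,x_{m_1})$,
\[
\mathbf{x}^T\mathcal{A}^T\mathcal{A}\,\mathbf{x} = \Big\|\sum_{i=1}^{m_1}A_ix_i\Big\|^2 \le m_1\sum_{i=1}^{m_1}\|A_ix_i\|^2 = m_1\,\mathbf{x}^T\mathrm{diag}(\mathcal{A}^T\mathcal{A})\,\mathbf{x},
\]
and the same argument with $m_2$ gives the second relation for $\mathcal{B}$.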

Lemma 2.2. Let $\tau_i > m_i - 1$ for $i=1,2$. Then we have
\[
D_{\mathcal{A}} := (\tau_1+1)\beta\,\mathrm{diag}(\mathcal{A}^T\mathcal{A}) - \beta\mathcal{A}^T\mathcal{A} \succ 0 \qquad (2.5)
\]
and
\[
D_{\mathcal{B}} := (\tau_2+1)\beta\,\mathrm{diag}(\mathcal{B}^T\mathcal{B}) - \beta\mathcal{B}^T\mathcal{B} \succ 0, \qquad (2.6)
\]
where $\mathrm{diag}(\mathcal{A}^T\mathcal{A})$ and $\mathrm{diag}(\mathcal{B}^T\mathcal{B})$ are defined in Lemma 2.1.

Proof. We only need to show (2.5). It follows from (2.4) that
\[
D_{\mathcal{A}} = (\tau_1+1)\beta\,\mathrm{diag}(\mathcal{A}^T\mathcal{A}) - \beta\mathcal{A}^T\mathcal{A} \succeq \beta(\tau_1+1-m_1)\,\mathrm{diag}(\mathcal{A}^T\mathcal{A}).
\]
The positive definiteness of $D_{\mathcal{A}}$ follows directly from $\tau_1+1-m_1>0$ and $\mathrm{diag}(\mathcal{A}^T\mathcal{A})\succ 0$.
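Relations (2.4) and (2.5) can also be checked numerically. The sketch below uses assumed random data (the dimensions, block sizes and the choice $\tau_1 = m_1 - 1 + 10^{-3}$ are arbitrary), with blocks $A_i$ of full column rank as assumed after (1.1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Random instance: m1 blocks A_i with full column rank (tall Gaussian matrices).
m1, l = 4, 30
n = [3, 5, 2, 4]                                  # column sizes n_i (arbitrary)
A_blocks = [rng.standard_normal((l, ni)) for ni in n]
A = np.hstack(A_blocks)                           # A = (A_1, ..., A_{m1}) as in (1.6)

AtA = A.T @ A
diag_AtA = np.zeros_like(AtA)
offset = 0
for Ai in A_blocks:                               # block-diagonal part diag(A^T A)
    ni = Ai.shape[1]
    diag_AtA[offset:offset + ni, offset:offset + ni] = Ai.T @ Ai
    offset += ni

beta, tau1 = 1.0, m1 - 1 + 1e-3                   # any tau_1 > m_1 - 1, as in Lemma 2.2

# (2.4): m1 * diag(A^T A) - A^T A should be positive semidefinite.
print(np.linalg.eigvalsh(m1 * diag_AtA - AtA).min() >= -1e-10)

# (2.5): D_A = (tau1 + 1) * beta * diag(A^T A) - beta * A^T A should be positive definite.
DA = (tau1 + 1) * beta * diag_AtA - beta * AtA
print(np.linalg.eigvalsh(DA).min() > 0)
```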

3. A splitting version of the block-wise ADMM (1.12) for (1.1)

In this section, we present a splitting version of the block-wise ADMM (1.12) for the model (1.1) and give some remarks.

Algorithm 1: A splitting version of the block-wise ADMM for (1.1)

Initialization: Specify a grouping strategy for (1.1) and determine the integers $m_1$ and $m_2$. Choose the constants $\tau_1 > m_1 - 1$, $\tau_2 > m_2 - 1$ and $\beta > 0$. For a given iterate $w^k = (\mathbf{x}^k,\mathbf{y}^k,\lambda^k) = (x_1^k,\ldots,x_{m_1}^k,y_1^k,\ldots,y_{m_2}^k,\lambda^k)$, the new iterate $w^{k+1}$ is generated by the following steps.

\[
\left\{\begin{aligned}
x_i^{k+1} &= \arg\min_{x_i\in X_i}\Big\{\mathcal{L}^2_\beta\big(x_1^k,\ldots,x_{i-1}^k,x_i,x_{i+1}^k,\ldots,x_{m_1}^k,\mathbf{y}^k,\lambda^k\big) + \frac{\tau_1\beta}{2}\|A_i(x_i - x_i^k)\|^2\Big\}, \quad i=1,\cdots,m_1,\\
y_j^{k+1} &= \arg\min_{y_j\in Y_j}\Big\{\mathcal{L}^2_\beta\big(\mathbf{x}^{k+1},y_1^k,\ldots,y_{j-1}^k,y_j,y_{j+1}^k,\ldots,y_{m_2}^k,\lambda^k\big) + \frac{\tau_2\beta}{2}\|B_j(y_j - y_j^k)\|^2\Big\}, \quad j=1,\cdots,m_2,\\
\lambda^{k+1} &= \lambda^k - \beta\big(\mathcal{A}\mathbf{x}^{k+1} + \mathcal{B}\mathbf{y}^{k+1} - b\big).
\end{aligned}\right. \qquad (3.1)
\]
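For concreteness, the following NumPy sketch implements the scheme (3.1) for the toy choice $\theta_i(x_i)=\frac{1}{2}\|x_i-c_i\|^2$, $\phi_j(y_j)=\frac{1}{2}\|y_j-d_j\|^2$ with $X_i=\mathbb{R}^{n_i}$ and $Y_j=\mathbb{R}^{n_j}$, so that every proximally regularized subproblem is solved in closed form via a small linear system; the function name, the toy objective, and the specific values $\tau_i = m_i - 1 + 0.1$, $\beta = 1$ are illustrative assumptions and not part of the paper.

```python
import numpy as np

def blockwise_admm(A_blocks, B_blocks, c, d, b, beta=1.0, iters=500):
    """Sketch of scheme (3.1) for theta_i(x_i) = 0.5*||x_i - c_i||^2,
    phi_j(y_j) = 0.5*||y_j - d_j||^2, with unconstrained blocks; every
    regularized subproblem then has a closed-form solution."""
    m1, m2 = len(A_blocks), len(B_blocks)
    tau1, tau2 = m1 - 1 + 0.1, m2 - 1 + 0.1          # tau_i > m_i - 1 (Algorithm 1)
    x = [np.zeros(A.shape[1]) for A in A_blocks]
    y = [np.zeros(B.shape[1]) for B in B_blocks]
    lam = np.zeros(b.shape[0])

    Ax = lambda xs: sum(A @ xi for A, xi in zip(A_blocks, xs))
    By = lambda ys: sum(B @ yj for B, yj in zip(B_blocks, ys))

    for _ in range(iters):
        # x-group: Jacobian (parallel) updates, other blocks frozen at iterate k
        resid_k = Ax(x) + By(y) - b
        x_new = []
        for A, xi, ci in zip(A_blocks, x, c):
            r = resid_k - A @ xi                      # residual excluding block i
            H = np.eye(A.shape[1]) + beta * (1 + tau1) * (A.T @ A)
            g = ci + A.T @ lam - beta * A.T @ r + tau1 * beta * (A.T @ (A @ xi))
            x_new.append(np.linalg.solve(H, g))
        x = x_new

        # y-group: Jacobian updates using the new x-group (alternating outside)
        resid_x = Ax(x) + By(y) - b
        y_new = []
        for B, yj, dj in zip(B_blocks, y, d):
            r = resid_x - B @ yj
            H = np.eye(B.shape[1]) + beta * (1 + tau2) * (B.T @ B)
            g = dj + B.T @ lam - beta * B.T @ r + tau2 * beta * (B.T @ (B @ yj))
            y_new.append(np.linalg.solve(H, g))
        y = y_new

        # multiplier update
        lam = lam - beta * (Ax(x) + By(y) - b)
    return x, y, lam
```

Each regularized subproblem only touches one block $A_i$ or $B_j$, and within the x-group (and within the y-group) the updates do not depend on each other, so they could be carried out in parallel, which is exactly the point of the internal Jacobian decomposition.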


Remark 3.1. As analyzed in the introduction, the scheme (1.13) is not necessarily convergent because its decomposed subproblems might not approximate the x- and y-subproblems of the block-wise ADMM (1.12) accurately enough. Compared to (1.13), the splitting version (3.1) proximally regularizes the decomposed subproblems in (1.13), yet the proximally regularized subproblems are of the same difficulty as their counterparts in (1.13). In fact, these added proximal terms play the role of controlling the proximity of the solutions of the decomposed subproblems to the solutions of the block-wise subproblems in (1.12), and the extent of this control is determined by the proximal coefficients $\tau_1$ and $\tau_2$. This is the intrinsic mechanism by which the convergence of the scheme (3.1) can be guaranteed by adding proximal terms to its subproblems. The requirement $\tau_i > m_i - 1$ for $i=1,2$ is the specific mathematical condition on how tightly the mentioned proximity should be controlled in order to guarantee convergence. More specifically, we require this condition to guarantee the positive definiteness of the matrix $G$ defined in (4.13) and thus ensure the strict contraction of the sequence generated by the scheme (3.1) with respect to the solution set.

Remark 3.2. Algebraically, the scheme (3.1) can be derived from the proximal version of the block-wise ADMM (1.12) with appropriate choices of proximal terms; see, e.g., [18] for an abstract discussion in the variational inequality context. But it should be mentioned that it is significantly different from the so-called linearized ADMM in the literature (e.g., [8, 35, 36]), which aims at linearizing the quadratic terms in the ADMM subproblems with a sufficiently large proximal parameter (e.g., greater than $\beta\cdot\|A_i^TA_i\|$ if the $x_i$-subproblem in (3.1) is considered), thus yielding easier linearized subproblems. In other words, the proximal parameters in the current linearized ADMM literature depend on the matrices involved in the corresponding quadratic terms, and they may need to be very large to ensure convergence if these matrices happen to be ill-conditioned; see, e.g., [16]. In (3.1), however, the proximal parameters $\tau_1$ and $\tau_2$ are just constants independent of the matrices. Indeed, they only rely on the user-assigned numbers of blocks for regrouping the model (1.1), and could be small. More importantly, the idea of using proximal terms to regularize the subproblems in the block-wise ADMM (1.12) and obtain (3.1) is precisely to decompose each block-wise ADMM subproblem into smaller and simpler ones when the block-wise reformulation (1.5) of (1.1) is considered. We could further discuss how to linearize the subproblems in (3.1) and obtain even easier subproblems, as mentioned at the end of Section 7; but our emphasis is the scheme (3.1) itself, and we do not discuss more elaborate versions of it.

Remark 3.3. Notice that if we use $\mathbf{x}^k$, instead of $\mathbf{x}^{k+1}$, in the $y_j$-subproblems of (1.13), we obtain a full Jacobian decomposition of the augmented Lagrangian method applied to the model (1.1); its divergence was shown in [17] even for the case of $m=2$. For that case, adding appropriate proximal terms to regularize the resulting subproblems can also guarantee the convergence; see [17] for a detailed analysis. The scheme (3.1) differs from the scheme in [17] in that the block-wise variables $\mathbf{x}$ and $\mathbf{y}$ are updated alternatingly, i.e., it is $\mathbf{x}^{k+1}$, not $\mathbf{x}^k$, that is used in the $y_j$-subproblems of (3.1). This indeed represents a more accurate approximation of the augmented Lagrangian method when it is applied to the model (1.1). This difference also brings a significant difference in the requirement on the proximal coefficients $\tau_1$ and $\tau_2$: the conditions $\tau_1 > m_1 - 1$ and $\tau_2 > m_2 - 1$ in (3.1) are much less restrictive than the condition in [17], where the proximal coefficient is required to be greater than $m-1$. For example, if we consider a case of (1.1) with $m=20$, then the scheme in [17] requires the proximal coefficient to be greater than 19, whereas if we choose $m_1 = m_2 = 10$ to implement the scheme (3.1), then $\tau_1$ and $\tau_2$ are only required to be greater than 9. This is an important difference, because a larger proximal coefficient makes the proximal term dominate the objective function; the proximity is then controlled in a more conservative way, which is more likely to result in slower convergence.

Remark 3.4. Note that we only discuss grouping the variables and functions of (1.1) "blindly" as in (1.5). For a particular application of (1.1), based on some known information or features, we may group the functions and variables more smartly and even discuss an optimal grouping strategy; but we would emphasize that how to better group the variables and functions for a particular application really depends on the particular structures and features of the application itself. In this paper, we just provide the methodology and theoretical analysis that guarantee the convergence for the most general setting in the form of (1.5).

Remark 3.5. As mentioned, the splitting scheme (3.1) is obtained by implementing the parallel (i.e., Jacobian) decomposition of the block-wise x- and y-subproblems in (1.13). If we instead consider the alternating (i.e., Gauss-Seidel) decomposition of the same subproblems in (1.13), then it is easy to see that the direct extension of ADMM (1.4) is recovered. Similarly to what we will analyze, we can also consider how to ensure the convergence of the scheme (1.4) by adding proximal regularization terms to its subproblems. For succinctness, we omit the details of this analysis.

4. Convergence analysis

In this section, we analyze the convergence of the splitting version (3.1) of the block-wise ADMM (1.12). We will prove the global convergence and establish the worst-case convergence rate measured by the iteration complexity in both the ergodic and a nonergodic senses. First of all, we rewrite the iterative scheme (3.1) in a form that is more favorable for our analysis.

4.1. A reformulation of (3.1)

In our analysis, we need the auxiliary variables

\[
\tilde{\mathbf{x}}^k = \mathbf{x}^{k+1}, \qquad \tilde{\mathbf{y}}^k = \mathbf{y}^{k+1}, \qquad (4.1)
\]
and
\[
\tilde{\lambda}^k = \lambda^k - \beta\big(\mathcal{A}\mathbf{x}^{k+1} + \mathcal{B}\mathbf{y}^k - b\big). \qquad (4.2)
\]

Recall the notation $w = (\mathbf{x},\mathbf{y},\lambda) = (x_1,\cdots,x_{m_1},y_1,\cdots,y_{m_2},\lambda)$ with superscripts $k$ and $k+1$; and further define $\tilde{w} = (\tilde{\mathbf{x}},\tilde{\mathbf{y}},\tilde{\lambda}) = (\tilde{x}_1,\cdots,\tilde{x}_{m_1},\tilde{y}_1,\cdots,\tilde{y}_{m_2},\tilde{\lambda})$ with superscripts. Then, we can artificially rewrite the iterative scheme (3.1) as the following prediction-correction framework.

Prediction.

\[
\tilde{x}_i^k = \arg\min_{x_i\in X_i}\Big\{\mathcal{L}^2_\beta\big(x_1^k,\ldots,x_{i-1}^k,x_i,x_{i+1}^k,\ldots,x_{m_1}^k,\mathbf{y}^k,\lambda^k\big) + \frac{\tau_1\beta}{2}\|A_i(x_i - x_i^k)\|^2\Big\}, \qquad (4.3a)
\]
\[
\tilde{y}_j^k = \arg\min_{y_j\in Y_j}\Big\{\mathcal{L}^2_\beta\big(\tilde{\mathbf{x}}^k,y_1^k,\ldots,y_{j-1}^k,y_j,y_{j+1}^k,\ldots,y_{m_2}^k,\lambda^k\big) + \frac{\tau_2\beta}{2}\|B_j(y_j - y_j^k)\|^2\Big\}, \qquad (4.3b)
\]
\[
\tilde{\lambda}^k = \lambda^k - \beta\big(\mathcal{A}\tilde{\mathbf{x}}^k + \mathcal{B}\mathbf{y}^k - b\big). \qquad (4.3c)
\]

Correction.

\[
w^{k+1} = w^k - M(w^k - \tilde{w}^k), \qquad (4.4a)
\]
where
\[
M = \begin{pmatrix} I & 0 & 0\\ 0 & I & 0\\ 0 & -\beta\mathcal{B} & I \end{pmatrix}, \qquad (4.4b)
\]
and $\tilde{w}^k$ is the predictor generated by (4.3).
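To see that (4.3)-(4.4) is indeed just a rewriting of (3.1), note that the first two block rows of $M$ give $\mathbf{x}^{k+1} = \tilde{\mathbf{x}}^k$ and $\mathbf{y}^{k+1} = \tilde{\mathbf{y}}^k$, in accordance with (4.1), while the third row of (4.4a), combined with (4.3c), recovers the multiplier update in (3.1):
\[
\lambda^{k+1} = \lambda^k + \beta\mathcal{B}(\mathbf{y}^k - \tilde{\mathbf{y}}^k) - (\lambda^k - \tilde{\lambda}^k)
= \tilde{\lambda}^k + \beta\mathcal{B}(\mathbf{y}^k - \tilde{\mathbf{y}}^k)
= \lambda^k - \beta\big(\mathcal{A}\tilde{\mathbf{x}}^k + \mathcal{B}\tilde{\mathbf{y}}^k - b\big)
= \lambda^k - \beta\big(\mathcal{A}\mathbf{x}^{k+1} + \mathcal{B}\mathbf{y}^{k+1} - b\big).
\]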

The iterative scheme (4.3)-(4.4) can thus be understood as a prediction-correction method. We would emphasize that it is the scheme (3.1) that will be implemented in practice; the reformulation (4.3)-(4.4) is considered in our analysis for two reasons. First, our upcoming convergence analysis is essentially based on showing that the sequence generated by (3.1) is strictly contractive with respect to the solution set of (1.1), and it turns out that the progress of proximity to the solution set at each iteration can be measured by the quantity $\|w^k - \tilde{w}^k\|_G^2$, where $G$ is defined in (4.13); see the inequality (4.20) in Theorem 4.4. Second, the analysis we will present in Sections 5 and 6 for two other methods can also be written as prediction-correction frameworks, and using this kind of reformulation allows us to present the analysis for these three methods in a unified framework.

4.2. Global convergence

Indeed, to prove the global convergence of the scheme (3.1), we mainly need to prove two conclusions.

We summarize them in the following two theorems respectively.

Theorem 4.1. Let $\tilde{w}^k$ be generated by (4.3) from the given vector $w^k$. Then we have
\[
\tilde{w}^k\in\Omega,\quad f(u) - f(\tilde{u}^k) + (w - \tilde{w}^k)^T F(\tilde{w}^k) \ge (w - \tilde{w}^k)^T Q(w^k - \tilde{w}^k),\quad \forall\, w\in\Omega, \qquad (4.5)
\]
where $Q$ is defined by
\[
Q = \begin{pmatrix}
(\tau_1+1)\beta\,\mathrm{diag}(\mathcal{A}^T\mathcal{A}) - \beta\mathcal{A}^T\mathcal{A} & 0 & 0\\
0 & (\tau_2+1)\beta\,\mathrm{diag}(\mathcal{B}^T\mathcal{B}) & 0\\
0 & -\mathcal{B} & \frac{1}{\beta}I
\end{pmatrix}. \qquad (4.6)
\]

Proof. First, the $x_i$-subproblem in (4.3a) can be written as
\[
\begin{aligned}
\tilde{x}_i^k &= \arg\min_{x_i\in X_i}\Big\{\mathcal{L}^2_\beta\big(x_1^k,\ldots,x_{i-1}^k,x_i,x_{i+1}^k,\ldots,x_{m_1}^k,\mathbf{y}^k,\lambda^k\big) + \frac{\tau_1\beta}{2}\|A_i(x_i - x_i^k)\|^2\Big\}\\
&\overset{(1.11)}{=} \arg\min_{x_i\in X_i}\Big\{\theta_i(x_i) - (\lambda^k)^T A_i x_i + \frac{\beta}{2}\big\|A_i(x_i - x_i^k) + (\mathcal{A}\mathbf{x}^k + \mathcal{B}\mathbf{y}^k - b)\big\|^2 + \frac{\tau_1\beta}{2}\|A_i(x_i - x_i^k)\|^2\Big\},
\end{aligned}
\]
in which some constant terms are ignored in the objective function. The first-order optimality condition of the above convex minimization problem can be written as
\[
\tilde{x}_i^k\in X_i,\quad \theta_i(x_i) - \theta_i(\tilde{x}_i^k) + (x_i - \tilde{x}_i^k)^T\Big\{-A_i^T\lambda^k + \beta A_i^T\big[A_i(\tilde{x}_i^k - x_i^k) + (\mathcal{A}\mathbf{x}^k + \mathcal{B}\mathbf{y}^k - b)\big] + \tau_1\beta A_i^TA_i(\tilde{x}_i^k - x_i^k)\Big\} \ge 0,\quad \forall\, x_i\in X_i. \qquad (4.7)
\]
Then, it follows from (4.3c) that
\[
\lambda^k = \tilde{\lambda}^k + \beta\big(\mathcal{A}\tilde{\mathbf{x}}^k + \mathcal{B}\mathbf{y}^k - b\big).
\]
Hence, based on (4.7), we have
\[
\tilde{x}_i^k\in X_i,\quad \theta_i(x_i) - \theta_i(\tilde{x}_i^k) + (x_i - \tilde{x}_i^k)^T\Big\{-A_i^T\big[\tilde{\lambda}^k + \beta(\mathcal{A}\tilde{\mathbf{x}}^k + \mathcal{B}\mathbf{y}^k - b)\big] + \beta A_i^T\big[A_i(\tilde{x}_i^k - x_i^k) + (\mathcal{A}\mathbf{x}^k + \mathcal{B}\mathbf{y}^k - b)\big] + \tau_1\beta A_i^TA_i(\tilde{x}_i^k - x_i^k)\Big\} \ge 0,\quad \forall\, x_i\in X_i.
\]
Consequently, we have
\[
\tilde{x}_i^k\in X_i,\quad \theta_i(x_i) - \theta_i(\tilde{x}_i^k) + (x_i - \tilde{x}_i^k)^T\Big\{-A_i^T\tilde{\lambda}^k + \beta A_i^T\big[A_i(\tilde{x}_i^k - x_i^k) - \mathcal{A}(\tilde{\mathbf{x}}^k - \mathbf{x}^k)\big] + \tau_1\beta A_i^TA_i(\tilde{x}_i^k - x_i^k)\Big\} \ge 0,\quad \forall\, x_i\in X_i,
\]
and it can be written as
\[
\tilde{x}_i^k\in X_i,\quad \theta_i(x_i) - \theta_i(\tilde{x}_i^k) + (x_i - \tilde{x}_i^k)^T\Big\{-A_i^T\tilde{\lambda}^k - \beta A_i^T\mathcal{A}(\tilde{\mathbf{x}}^k - \mathbf{x}^k) + (\tau_1+1)\beta A_i^TA_i(\tilde{x}_i^k - x_i^k)\Big\} \ge 0,\quad \forall\, x_i\in X_i.
\]
