The augmented Lagrangian method with full Jacobian decomposition and logarithmic-quadratic proximal regularization for multiple-block separable convex programming


SMAI-JCM

SMAI Journal of

Computational Mathematics


Min Li & Xiaoming Yuan Volume 4 (2018), p. 81-120.

<http://smai-jcm.cedram.org/item?id=SMAI-JCM_2018__4__81_0>

Article published online as part of the Centre de diffusion des revues académiques de mathématiques (cedram), http://www.cedram.org/


Min Li¹ Xiaoming Yuan²

¹ School of Management and Engineering, Nanjing University, China. E-mail address: [email protected]

² Department of Mathematics, The University of Hong Kong, Hong Kong, China. E-mail address: [email protected]

Abstract. We consider a separable convex minimization model whose variables are coupled by linear constraints and subject to positive orthant constraints, and whose objective function is the sum of m functions without coupled variables. It is well recognized that when the augmented Lagrangian method (ALM) is applied to concrete applications, the subproblem arising at each iteration should be decomposed to generate solvable subproblems. When the Gauss-Seidel decomposition is implemented, this idea has inspired the alternating direction method of multipliers (for m = 2) and its variants (for m ≥ 3). When the Jacobian decomposition is considered, it has been shown that the ALM with Jacobian decomposition in its subproblem is not necessarily convergent even when m = 2, and it was suggested to regularize the decomposed subproblems with quadratic proximal terms to ensure convergence. In this paper, we focus on the multiple-block case with m ≥ 3. We consider implementing the full Jacobian decomposition to the ALM's subproblems and using logarithmic-quadratic proximal (LQP) terms to regularize the decomposed subproblems. The resulting subproblems are all unconstrained minimization problems because the positive orthant constraints are all inactive, and they are fully eligible for parallel computation. Accordingly, the ALM with full Jacobian decomposition and LQP regularization is proposed. We also consider its inexact version, which allows the subproblems to be solved inexactly. For both the exact and inexact versions, we comprehensively discuss their convergence, including global convergence, worst-case convergence rates measured by the iteration complexity in both the ergodic and nonergodic senses, and linear convergence rates under additional assumptions. Some preliminary numerical results are reported to demonstrate the efficiency of the ALM with full Jacobian decomposition and LQP regularization.

2010 Mathematics Subject Classification. 90C25, 90C33, 65K05.

Keywords. convex programming, splitting methods, augmented Lagrangian method, logarithmic-quadratic proximal, parallel computation, convergence rate.

1. Introduction

We consider the following separable convex minimization problem, whose variables are coupled by linear constraints and subject to positive orthant constraints, and whose objective function is the sum of more than one function without coupled variables:

$$\min\Big\{\sum_{i=1}^{m}\theta_i(x_i)\;\Big|\;\sum_{i=1}^{m}A_ix_i=b,\;\; x_i\in\mathbb{R}^{n_i}_{+},\;\; i=1,\ldots,m\Big\}, \tag{1.1}$$

where $\theta_i:\mathbb{R}^{n_i}\to\mathbb{R}$ ($i=1,\ldots,m$) are convex but not necessarily smooth functions, $A_i\in\mathbb{R}^{l\times n_i}$ and $b\in\mathbb{R}^{l}$. The solution set of (1.1) is assumed to be nonempty throughout our discussions.

The first author was supported by the National Natural Science Foundation of China under grant 11001053 and Program for New Century Excellent Talents in University under grant NCET-12-0111.

The second author was supported by the General Research Fund from Hong Kong Research Grants Council: 12313516.


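Let $\lambda\in\mathbb{R}^{l}$ be the Lagrange multiplier associated with the linear equality constraints in (1.1), and let the Lagrangian function of (1.1) be

$$L(x_1,\ldots,x_m,\lambda) := \sum_{i=1}^{m}\theta_i(x_i)-\lambda^T\Big(\sum_{i=1}^{m}A_ix_i-b\Big), \tag{1.2}$$

defined on $\Omega := \mathbb{R}^{n_1}_{+}\times\cdots\times\mathbb{R}^{n_m}_{+}\times\mathbb{R}^{l}$. Then, the augmented Lagrangian function of (1.1) is

$$L_\beta(x_1,\ldots,x_m,\lambda) := \sum_{i=1}^{m}\theta_i(x_i)-\lambda^T\Big(\sum_{i=1}^{m}A_ix_i-b\Big)+\frac{\beta}{2}\Big\|\sum_{i=1}^{m}A_ix_i-b\Big\|^2, \tag{1.3}$$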

where $\beta>0$ is a penalty parameter. If we treat the model (1.1) as a whole and directly apply the augmented Lagrangian method (ALM) in [21, 41], then the resulting scheme is

$$\begin{cases}
(x_1^{k+1},\ldots,x_m^{k+1}) := \arg\min\big\{L_\beta(x_1,\ldots,x_m,\lambda^k)\;\big|\;x_i\in\mathbb{R}^{n_i}_{+},\; i=1,\ldots,m\big\},\\[2pt]
\lambda^{k+1} := \lambda^k-\beta\big(\sum_{i=1}^{m}A_ix_i^{k+1}-b\big).
\end{cases} \tag{1.4}$$

In general, the minimization subproblem in (1.4) is hard because it requires minimizing $m$ functions with variables coupled by the quadratic term in (1.3). This difficulty has inspired a series of splitting methods whose common idea is decomposing the subproblem in (1.4) and thus generating easier subproblems. For example, for the special case of (1.1) with $m=2$, if the minimization subproblem in (1.4) is decomposed in Gauss-Seidel order, the scheme is

$$\begin{cases}
x_1^{k+1} := \arg\min\big\{L_\beta(x_1,x_2^k,\lambda^k)\;\big|\;x_1\in\mathbb{R}^{n_1}_{+}\big\},\\[2pt]
x_2^{k+1} := \arg\min\big\{L_\beta(x_1^{k+1},x_2,\lambda^k)\;\big|\;x_2\in\mathbb{R}^{n_2}_{+}\big\},\\[2pt]
\lambda^{k+1} := \lambda^k-\beta\big(A_1x_1^{k+1}+A_2x_2^{k+1}-b\big).
\end{cases} \tag{1.5}$$

This is the so-called alternating direction method of multipliers (ADMM) in [17], and it has found many efficient applications in a broad spectrum of domains such as image processing, statistical learning, computer vision and network optimization. We refer to [5, 13, 16] for some review papers on the ADMM. If we directly extend the scheme (1.5) to the generic case of (1.1) with $m\ge3$, then the resulting direct extension of ADMM reads as

$$\begin{cases}
x_1^{k+1} := \arg\min\big\{L_\beta(x_1,x_2^k,\ldots,x_m^k,\lambda^k)\;\big|\;x_1\in\mathbb{R}^{n_1}_{+}\big\},\\
\qquad\cdots\\
x_i^{k+1} := \arg\min\big\{L_\beta(x_1^{k+1},\ldots,x_{i-1}^{k+1},x_i,x_{i+1}^k,\ldots,x_m^k,\lambda^k)\;\big|\;x_i\in\mathbb{R}^{n_i}_{+}\big\},\\
\qquad\cdots\\
x_m^{k+1} := \arg\min\big\{L_\beta(x_1^{k+1},\ldots,x_{m-1}^{k+1},x_m,\lambda^k)\;\big|\;x_m\in\mathbb{R}^{n_m}_{+}\big\},\\[2pt]
\lambda^{k+1} := \lambda^k-\beta\big(\sum_{i=1}^{m}A_ix_i^{k+1}-b\big).
\end{cases} \tag{1.6}$$

The direct extension of ADMM scheme (1.6) indeed works empirically for some applications, as shown in, e.g., [40, 43]. However, it was shown in [6] that the scheme (1.6) is not necessarily convergent. The convergence rate of ADMM and its extension are analysed in [9, 28, 30, 33, 32].


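On the other hand, if we apply the Jacobian decomposition to the ALM subproblem in (1.4), the resulting scheme reads as

$$\begin{cases}
x_1^{k+1} := \arg\min\big\{L_\beta(x_1,x_2^k,\ldots,x_m^k,\lambda^k)\;\big|\;x_1\in\mathbb{R}^{n_1}_{+}\big\},\\
\qquad\cdots\\
x_i^{k+1} := \arg\min\big\{L_\beta(x_1^{k},\ldots,x_{i-1}^{k},x_i,x_{i+1}^k,\ldots,x_m^k,\lambda^k)\;\big|\;x_i\in\mathbb{R}^{n_i}_{+}\big\},\\
\qquad\cdots\\
x_m^{k+1} := \arg\min\big\{L_\beta(x_1^{k},\ldots,x_{m-1}^{k},x_m,\lambda^k)\;\big|\;x_m\in\mathbb{R}^{n_m}_{+}\big\},\\[2pt]
\lambda^{k+1} := \lambda^k-\beta\big(\sum_{i=1}^{m}A_ix_i^{k+1}-b\big).
\end{cases} \tag{1.7}$$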

The $x_i$-subproblems in (1.7), which usually dominate the computation time at each iteration, can be solved in parallel, and they can be implemented in a distributed-computing system. This feature is of particular interest for big-data scenarios and for circumstances where parallel computing infrastructures are available. Note that the subproblems in (1.7) are of the same level of difficulty as those in (1.6): each of them requires minimizing one $\theta_i$ in the original objective of (1.1) plus a quadratic term, subject to the positive orthant constraint $\mathbb{R}^{n_i}_{+}$. The scheme (1.7), however, is not necessarily convergent even when $m=2$, as shown in [22]. In the literature, it was suggested to correct the output of (1.7) by some correction steps to ensure convergence; some prediction-correction methods based on the Jacobian decomposition of ALM (1.7) were thus presented, see, e.g., [20, 22].

Note that these prediction-correction methods usually converge fast for some applications arising in image processing and other areas. But their correction steps need the solutions of the $x_i$-subproblems in (1.7) as input, and thus they offer a lower degree of parallelism. In [25, 10], it was proved that convergence is ensured if the subproblems in (1.7) are regularized by quadratic proximal terms with sufficiently large proximal coefficients. For example, it was shown in [25] that the following scheme

$$\begin{cases}
x_1^{k+1} := \arg\min\big\{L_\beta(x_1,x_2^k,\ldots,x_m^k,\lambda^k)+\frac{s\beta}{2}\|A_1(x_1-x_1^k)\|^2\;\big|\;x_1\in\mathbb{R}^{n_1}_{+}\big\},\\
\qquad\cdots\\
x_i^{k+1} := \arg\min\big\{L_\beta(x_1^{k},\ldots,x_{i-1}^{k},x_i,x_{i+1}^k,\ldots,x_m^k,\lambda^k)+\frac{s\beta}{2}\|A_i(x_i-x_i^k)\|^2\;\big|\;x_i\in\mathbb{R}^{n_i}_{+}\big\},\\
\qquad\cdots\\
x_m^{k+1} := \arg\min\big\{L_\beta(x_1^{k},\ldots,x_{m-1}^{k},x_m,\lambda^k)+\frac{s\beta}{2}\|A_m(x_m-x_m^k)\|^2\;\big|\;x_m\in\mathbb{R}^{n_m}_{+}\big\},\\[2pt]
\lambda^{k+1} := \lambda^k-\beta\big(\sum_{i=1}^{m}A_ix_i^{k+1}-b\big)
\end{cases} \tag{1.8}$$

is convergent as long as the proximal coefficient satisfies $s\ge m-1$. A more general analysis can be found in [10]. The scheme (1.8) requires no correction step because of the regularization by the quadratic terms $\frac{s\beta}{2}\|A_i(x_i-x_i^k)\|^2$ for $i=1,\ldots,m$, and the eligibility for parallel computation is retained. The iteration-complexity of a Jacobi-type non-Euclidean proximal ADMM for solving multi-block linearly constrained nonconvex programs is established in [35].
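To make the structure of the proximal Jacobian scheme (1.8) concrete, the following minimal sketch runs it on a toy instance that is not taken from the paper: scalar blocks with $A_i = 1$ and $\theta_i(x_i) = \frac12(x_i - t_i)^2$, so that each constrained subproblem is a one-dimensional convex quadratic whose minimizer over $x_i \ge 0$ is obtained by clipping. The function name `prox_jacobian_alm`, the data `t`, `b` and the starting point are all illustrative assumptions.

```python
def prox_jacobian_alm(t, b, beta=1.0, s=2.0, iters=200):
    """Sketch of scheme (1.8) for theta_i(x) = 0.5*(x - t_i)**2 and A_i = 1.

    Toy setting (an assumption for illustration): every block is a scalar,
    so the coupling constraint is sum_i x_i = b with x_i >= 0.
    """
    m = len(t)
    x = [1.0] * m          # arbitrary nonnegative starting point
    lam = 0.0
    for _ in range(iters):
        total = sum(x)
        new_x = []
        for i in range(m):
            # Jacobian decomposition: only OLD iterates of the other blocks are used,
            # so the m updates below are independent and could run in parallel.
            others = total - x[i]
            # Stationarity of the i-th subproblem without the sign constraint:
            # (u - t_i) - lam + beta*(u + others - b) + s*beta*(u - x_i) = 0
            u = (t[i] + lam - beta * (others - b) + s * beta * x[i]) / (1.0 + beta + s * beta)
            # A 1-D strictly convex problem over u >= 0 is solved by clipping.
            new_x.append(max(0.0, u))
        x = new_x
        lam = lam - beta * (sum(x) - b)   # multiplier update of (1.8)
    return x, lam


# Three blocks, so s = 2 satisfies s >= m - 1.
x, lam = prox_jacobian_alm([2.0, 3.0, 4.0], 6.0)
```

For this instance the minimizer of $\sum_i \frac12(x_i-t_i)^2$ subject to $\sum_i x_i = 6$ is $x^* = (1, 2, 3)$ with multiplier $\lambda^* = -1$, which gives a reference point for the iterates.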

Note that the $x_i$-subproblems in (1.8) are constrained minimization problems subject to the positive orthants. To further simplify these subproblems, we can apply the logarithmic-quadratic proximal terms, which first appeared in [3], to the subproblems in (1.7). The key point is that the LQP regularization automatically excludes the points on the boundaries of the constraints in the feasible regions; thus the positive orthant constraints in (1.7) all become inactive and the decomposed subproblems in (1.7) with the LQP regularization are all unconstrained. In the literature, the research on the combination of the LQP regularization with ALM-based splitting methods focuses only on the special case of (1.1) with $m=2$ and mainly on the Gauss-Seidel decomposition. For instance, the combination of the LQP regularization with the ADMM scheme (1.5) in [46, 1] is in the variational inequality context. We also refer to [29] and [31] for the combination of the LQP with the generalized ADMM proposed in [12] and the strictly contractive Peaceman-Rachford splitting method proposed in [23], respectively. Some other interesting applications of LQP can be found in, e.g., [2]. Finally, we refer to [31, 44, 7] for the convergence rate analysis of the mentioned methods.

Our first purpose is to propose the scheme of ALM with Jacobian decomposition and LQP regularization for the multiple-block convex minimization model (1.1) with $m\ge3$; see (3.2) for details.

Both the exact and inexact versions will be proposed. The inexact version allows the decomposed subproblems to be solved inexactly subject to a certain inexactness criterion. To the best of our knowledge, this is the first work combining the LQP regularization with the Jacobian decomposition of the ALM (1.7) for the generic case of (1.1) with $m\ge3$. Note that using the LQP regularization, instead of the quadratic proximal terms, is particularly useful when the functions $\theta_i$ are generic and the subproblems in (1.7) are not simple enough to have closed-form solutions, so that the constrained subproblems in (1.7) need to be solved iteratively by a certain algorithm. In other words, the new scheme mainly differs from (1.7) in that only unconstrained subproblems need to be solved.

The second purpose of this paper is comprehensively analyzing the convergence for both the exact and inexact versions of the new scheme. More specifically, we discuss their convergence, including the global convergence, the worst-case convergence rate measured by the iteration-complexity in both the ergodic and nonergodic senses, and the linear convergence rates under additional assumptions.

The rest of this paper is organized as follows. In Section 2, we summarize some useful results and introduce some notation for further analysis. Then, we present the exact version of the ALM with full Jacobian decomposition and LQP regularization in Section 3, followed by some remarks. The convergence of the exact version of this new scheme is proved in Section 4. Then, we establish its convergence rate in Section 5. In Section 6, we present the inexact version of the new scheme, and analyze its convergence in Section 7. In Section 8, we report some preliminary numerical results to show the efficiency of the new scheme. Finally, we draw some conclusions in Section 9.

2. Preliminaries

We first summarize some useful preliminaries known in the literature and introduce some notations to be used in the analysis. Some simple conclusions are also proved in this section.

2.1. The Logarithmic-quadratic Proximal Regularization

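We first review the LQP regularization. More details are provided in [3]. Let us define

$$\phi(c) := \begin{cases}\frac{1}{2}(c-1)^2+\mu(c-\log c-1) & \text{if } c>0,\\ +\infty & \text{otherwise},\end{cases} \tag{2.1}$$

for a given scalar $\mu\in(0,1)$. Associated with $\phi$, for any $z\in\mathbb{R}^{N}_{++}$, we define

$$d(z',z) := \begin{cases}\displaystyle\sum_{j=1}^{N}\Big[\frac{1}{2}(z'_j-z_j)^2+\mu\Big(z_j^2\log\frac{z_j}{z'_j}+z'_jz_j-z_j^2\Big)\Big] & \text{if } z'\in\mathbb{R}^{N}_{++},\\ +\infty & \text{otherwise},\end{cases} \tag{2.2}$$

and

$$\Phi'(z,z') := \big(z_1\phi'(z'_1/z_1),\ldots,z_N\phi'(z'_N/z_N)\big)^T\quad\forall z,z'\in\mathbb{R}^{N}_{++}, \tag{2.3}$$

where

$$\phi'(z'_j/z_j)=z'_j/z_j-1+\mu(1-z_j/z'_j),\quad j=1,\ldots,N. \tag{2.4}$$

For any $z',z\in\mathbb{R}^{N}_{++}$, we have $d(z',z)\ge\|z'-z\|^2/2$, and $d(z',z)=0$ if and only if $z'=z$. Moreover, the function $d(\cdot,\cdot)$ defined in (2.2) can be rewritten as

$$d(z',z)=\sum_{j=1}^{N}z_j^2\,\phi(z'_j/z_j)\quad\forall z',z\in\mathbb{R}^{N}_{++},$$

and then we have

$$\Phi'(z,z')=\nabla_{z'}d(z',z)=(z'-z)+\mu\big[z-Z^2(z')^{-1}\big],$$

where $Z := \mathrm{diag}(z_1,z_2,\ldots,z_N)\in\mathbb{R}^{N\times N}$ and $(z')^{-1}\in\mathbb{R}^{N}$ is the vector whose $j$-th element is $1/z'_j$.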
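The LQP quantities above are simple to evaluate componentwise. The sketch below is an illustration only; the function names `lqp_distance` and `lqp_gradient` are hypothetical and not from the paper. It implements the distance in (2.2) and its gradient with respect to the first argument, i.e. the mapping in (2.3), for vectors stored as plain Python lists.

```python
import math


def lqp_distance(z_new, z, mu):
    """LQP distance (2.2): first argument z_new, reference point z > 0 componentwise."""
    if any(v <= 0.0 for v in z_new):
        return math.inf  # the distance is +infinity outside the open positive orthant
    return sum(
        0.5 * (zn - zo) ** 2 + mu * (zo * zo * math.log(zo / zn) + zn * zo - zo * zo)
        for zn, zo in zip(z_new, z)
    )


def lqp_gradient(z_new, z, mu):
    """Gradient of lqp_distance with respect to z_new, matching (2.3)-(2.4)."""
    return [(zn - zo) + mu * (zo - zo * zo / zn) for zn, zo in zip(z_new, z)]


z_ref = [1.0, 2.0, 0.5]
z_try = [0.7, 2.5, 0.1]
dist = lqp_distance(z_try, z_ref, mu=0.3)
grad = lqp_gradient(z_try, z_ref, mu=0.3)
```

The term `zo * zo / zn` in the gradient grows without bound as a component of `z_new` approaches zero, which is the mechanism that keeps minimizers of LQP-regularized subproblems strictly inside the positive orthant.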

The following lemma was proved in [46] and it was inspired by Proposition 1 in [3]. We need this lemma to analyze the convergence for the new algorithms.

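Lemma 2.1. Let $P := \mathrm{diag}(p_1,\ldots,p_N)\in\mathbb{R}^{N\times N}$ be a positive definite diagonal matrix, $q(z)\in\mathbb{R}^{N}$ be a monotone mapping of $z$ with respect to $\mathbb{R}^{N}_{+}$, and $\vartheta:\mathbb{R}^{N}\to\mathbb{R}$. Let $\mu\in(0,1)$ be a constant. For given $z,\bar z\in\mathbb{R}^{N}_{++}$, we define $\bar Z := \mathrm{diag}(\bar z_1,\ldots,\bar z_N)$, $z^{-1} := (1/z_1,\ldots,1/z_N)^T$ and

$$\Phi'(\bar z,z)=(z-\bar z)+\mu\big(\bar z-\bar Z^2z^{-1}\big).$$

Then, the variational inequality

$$\vartheta(z')-\vartheta(z)+(z'-z)^T\big[q(z)+P\,\Phi'(\bar z,z)\big]\ge0\quad\forall z'\in\mathbb{R}^{N}_{+}, \tag{2.5}$$

has the unique positive solution $z$. In addition, for this positive solution $z\in\mathbb{R}^{N}_{++}$ and any $z'\in\mathbb{R}^{N}_{+}$, we have

$$\vartheta(z)-\vartheta(z')+(z-z')^T\big[q(z)+(1+\mu)P(z-\bar z)\big]\le\mu\|\bar z-z\|_P^2, \tag{2.6}$$

where $\|z\|_P^2 := z^TPz$.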

2.2. Variational Reformulation of (1.1)

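In our analysis, we need a variational reformulation of the convex minimization model (1.1). More specifically, let $(x_1^*,\ldots,x_m^*,\lambda^*)$ be a saddle point of the Lagrangian function (1.2). Then, for any $(x_1,\ldots,x_m,\lambda)\in\mathbb{R}^{n_1}_{+}\times\cdots\times\mathbb{R}^{n_m}_{+}\times\mathbb{R}^{l}$, we have the inequalities

$$L(x_1^*,\ldots,x_m^*,\lambda)\le L(x_1^*,\ldots,x_m^*,\lambda^*)\le L(x_1,\ldots,x_m,\lambda^*). \tag{2.7}$$

Setting $(x_1,\ldots,x_{i-1},x_i,x_{i+1},\ldots,x_m,\lambda^*)=(x_1^*,\ldots,x_{i-1}^*,x_i,x_{i+1}^*,\ldots,x_m^*,\lambda^*)$ in the second inequality of (2.7) for $i=1,\ldots,m$, we get

$$x_i^*\in\mathbb{R}^{n_i}_{+},\quad\theta_i(x_i)-\theta_i(x_i^*)+(x_i-x_i^*)^T(-A_i^T\lambda^*)\ge0\quad\forall x_i\in\mathbb{R}^{n_i}_{+},\; i=1,\ldots,m.$$

On the other hand, the first inequality in (2.7) means

$$\lambda^*\in\mathbb{R}^{l},\quad(\lambda-\lambda^*)^T\Big(\sum_{i=1}^{m}A_ix_i^*-b\Big)\ge0\quad\forall\lambda\in\mathbb{R}^{l}.$$

Recall that $\Omega=\mathbb{R}^{n_1}_{+}\times\cdots\times\mathbb{R}^{n_m}_{+}\times\mathbb{R}^{l}$. Thus, finding a saddle point of $L(x_1,\ldots,x_m,\lambda)$ is equivalent to finding a vector $w^*=(x_1^*,\ldots,x_m^*,\lambda^*)\in\Omega$ such that

$$\begin{cases}
\theta_1(x_1)-\theta_1(x_1^*)+(x_1-x_1^*)^T(-A_1^T\lambda^*)\ge0\quad\forall x_1\in\mathbb{R}^{n_1}_{+},\\
\qquad\cdots\\
\theta_i(x_i)-\theta_i(x_i^*)+(x_i-x_i^*)^T(-A_i^T\lambda^*)\ge0\quad\forall x_i\in\mathbb{R}^{n_i}_{+},\\
\qquad\cdots\\
\theta_m(x_m)-\theta_m(x_m^*)+(x_m-x_m^*)^T(-A_m^T\lambda^*)\ge0\quad\forall x_m\in\mathbb{R}^{n_m}_{+},\\[2pt]
(\lambda-\lambda^*)^T\big(\sum_{i=1}^{m}A_ix_i^*-b\big)\ge0\quad\forall\lambda\in\mathbb{R}^{l}.
\end{cases} \tag{2.8}$$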

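We can rewrite (2.8) in a compact way: solving (1.1) is equivalent to finding $w^*=(x_1^*,\ldots,x_m^*,\lambda^*)\in\Omega := \mathbb{R}^{n_1}_{+}\times\cdots\times\mathbb{R}^{n_m}_{+}\times\mathbb{R}^{l}$ such that

$$\mathrm{VI}(\Omega,F,\theta):\quad\theta(x)-\theta(x^*)+(w-w^*)^TF(w^*)\ge0\quad\forall w\in\Omega, \tag{2.9a}$$

where

$$x=\begin{pmatrix}x_1\\ \vdots\\ x_m\end{pmatrix},\qquad\theta(x)=\sum_{i=1}^{m}\theta_i(x_i), \tag{2.9b}$$

$$w=\begin{pmatrix}x_1\\ \vdots\\ x_m\\ \lambda\end{pmatrix}\qquad\text{and}\qquad F(w)=\begin{pmatrix}-A_1^T\lambda\\ \vdots\\ -A_m^T\lambda\\ \sum_{i=1}^{m}A_ix_i-b\end{pmatrix}. \tag{2.9c}$$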

Because the mapping F(w) defined in (2.9c) is affine with a skew-symmetric matrix, it is monotone.
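The monotonicity claim can be checked numerically: for an affine map with a skew-symmetric matrix, the quantity $(w_1-w_2)^T(F(w_1)-F(w_2))$ vanishes identically. The following sketch is illustrative only; the helper names `vi_operator` and `monotonicity_gap` and the small data set are assumptions, not part of the paper. It evaluates the map in (2.9c) for blocks stored as nested lists and computes that quantity.

```python
def vi_operator(x_blocks, lam, A_blocks, b):
    """Evaluate F(w) from (2.9c) as one flat list.

    x_blocks[i] is the vector x_i, A_blocks[i] is the l-by-n_i matrix A_i
    given as a list of rows, lam is the multiplier, b the right-hand side.
    """
    n_rows = len(b)
    out = []
    residual = [-b_r for b_r in b]
    for A, x in zip(A_blocks, x_blocks):
        # Block -A_i^T lam.
        out.extend(-sum(A[r][c] * lam[r] for r in range(n_rows)) for c in range(len(x)))
        # Accumulate A_i x_i into the residual sum_i A_i x_i - b.
        for r in range(n_rows):
            residual[r] += sum(A[r][c] * x[c] for c in range(len(x)))
    return out + residual


def monotonicity_gap(w1, w2, A_blocks, b):
    """Return (w1 - w2)^T (F(w1) - F(w2)), where each w is a pair (x_blocks, lam)."""
    flat1 = [v for blk in w1[0] for v in blk] + list(w1[1])
    flat2 = [v for blk in w2[0] for v in blk] + list(w2[1])
    f1 = vi_operator(w1[0], w1[1], A_blocks, b)
    f2 = vi_operator(w2[0], w2[1], A_blocks, b)
    return sum((p - q) * (g - h) for p, q, g, h in zip(flat1, flat2, f1, f2))


# Two blocks (n_1 = 2, n_2 = 1) and l = 2 constraints; the data are arbitrary.
A_blocks = [[[1.0, 2.0], [0.0, 1.0]], [[3.0], [-1.0]]]
b = [1.0, 2.0]
w1 = ([[1.0, 2.0], [3.0]], [0.5, -1.0])
w2 = ([[0.0, 1.0], [2.5]], [2.0, 4.0])
gap = monotonicity_gap(w1, w2, A_blocks, b)
```

A zero gap is stronger than what monotonicity requires (a nonnegative gap); it reflects that the $x$-part and the $\lambda$-part of the map cancel each other exactly.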

We denote by $\Omega^*$ the solution set of $\mathrm{VI}(\Omega,F,\theta)$; it is nonempty under the nonemptiness assumption on the solution set of (1.1).

Then, we recall the characterization of the solution set Ω^∗ whose proof can be found in [14, 26]:

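$$\Omega^* := \bigcap_{w\in\Omega}\big\{\tilde w\in\Omega\;\big|\;\theta(x)-\theta(\tilde x)+(w-\tilde w)^TF(w)\ge0\big\}. \tag{2.10}$$

With the characterization (2.10) and following Definition 1 in [38], we define an $\varepsilon$-approximation solution of $\mathrm{VI}(\Omega,F,\theta)$ as follows.

Definition 2.2. The vector $\tilde w\in\Omega$ is called an $\varepsilon$-approximation solution of $\mathrm{VI}(\Omega,F,\theta)$ if it satisfies

$$\sup_{w\in\mathcal{B}_\Omega(\tilde w)}\big\{\theta(\tilde x)-\theta(x)+(\tilde w-w)^TF(w)\big\}\le\varepsilon, \tag{2.11}$$

where

$$\mathcal{B}_\Omega(\tilde w) := \big\{w\in\Omega\;\big|\;\|w-\tilde w\|\le1\big\}.$$

Based on this definition, if after $t$ iterations of an algorithm we can find $\tilde w\in\Omega$ such that

$$\theta(\tilde x)-\theta(x)+(\tilde w-w)^TF(w)\le\varepsilon\quad\forall w\in\mathcal{B}_\Omega(\tilde w),$$

with $\varepsilon=O(1/t)$, then we say this algorithm has a worst-case $O(1/t)$ convergence rate measured by the iteration complexity. See, e.g., [36, 37] for more details.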

The following lemma is useful for establishing a worst-case $o(1/t)$ convergence rate in Sections 5.3 and 7.4. It is similar to Lemma 1.2 in [10].

Lemma 2.3. If a sequence $\{a_t\}\subseteq\mathbb{R}$ obeys: (1) $a_t\ge0$; (2) $\sum_{t=0}^{\infty}a_t < +\infty$; (3) $a_t\le a_{t-1}+\sigma_{t-1}$ for any integer $t\ge1$, where the sequence $\{\sigma_t\}$ satisfies $\sum_{t=1}^{\infty}t\sigma_t < +\infty$ with $\sigma_t\ge0$ for any integer $t\ge0$, then we have $a_t=o(1/t)$.

Proof. Since $a_t\le a_{t-1}+\sigma_{t-1}$, we get

$$a_t\le a_k+\sum_{j=k}^{t-1}\sigma_j\quad\forall k\le t-1.$$

By assumptions (1)-(3) in this lemma, we have

$$0\le\frac{t}{2}\cdot a_t\le\sum_{k=\lfloor t/2\rfloor+1}^{t}a_k+\sum_{k=\lfloor t/2\rfloor+1}^{t-1}\,\sum_{j=k}^{t-1}\sigma_j\le\sum_{k=\lfloor t/2\rfloor+1}^{t}a_k+\sum_{k=\lfloor t/2\rfloor+1}^{t-1}k\sigma_k\to0,\quad\text{as } t\to\infty.$$

Therefore, we get $a_t=o(1/t)$. The proof is complete.


2.3. Some Notations

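With the given positive scalars $\beta$ and $\mu\in(0,1)$, we define the scalars

$$r_i>\frac{(m-1)\beta}{1-\mu}\,\lambda_{\max}(A_i^TA_i)\quad\forall i=1,\ldots,m,$$

where $A_i$ is the coefficient matrix given in the model (1.1). We also define the matrices $G$, $H$, $M$, $N_x$ and $N$ as follows:

$$G := \begin{pmatrix}(1+\mu)r_1I_{n_1}&\cdots&-\beta A_1^TA_m&0\\ \vdots&\ddots&\vdots&\vdots\\ -\beta A_m^TA_1&\cdots&(1+\mu)r_mI_{n_m}&0\\ 0&\cdots&0&\frac{1}{\beta}I_l\end{pmatrix}, \tag{2.12}$$

$$H := \begin{pmatrix}(1+\mu)r_1I_{n_1}&\cdots&-\beta A_1^TA_m&0\\ \vdots&\ddots&\vdots&\vdots\\ -\beta A_m^TA_1&\cdots&(1+\mu)r_mI_{n_m}&0\\ 0&\cdots&0&\frac{1}{\gamma\beta}I_l\end{pmatrix}, \tag{2.13}$$

$$M := \mathrm{diag}(I_{n_1},\ldots,I_{n_m},\gamma I_l),\qquad N_x := \mu\cdot\mathrm{diag}(r_1I_{n_1},\ldots,r_mI_{n_m}) \tag{2.14}$$

and

$$N := \mathrm{diag}(N_x,0), \tag{2.15}$$

where $\gamma\in(0,2)$.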

Below we prove three assertions regarding the matrices just defined. These assertions make it possible to present our convergence analysis for the new algorithms compactly with alleviated notation.

Lemma 2.4. Let $\beta>0$, $\mu\in(0,1)$, $\gamma\in(0,2)$ and $r_i>(m-1)\beta\lambda_{\max}(A_i^TA_i)/(1-\mu)$, $i=1,\ldots,m$. The matrices $G$, $H$, $M$ and $N$ defined respectively in (2.12)-(2.15) have the following relationships:

$$HM=G,\qquad\tilde H := G^T+G-M^THM-2N\succ0\qquad\text{and}\qquad H\succ0. \tag{2.16}$$

Proof. Using the definitions of the matrices $H$, $M$ and $G$, by a simple manipulation, we obtain

$$HM=\begin{pmatrix}(1+\mu)r_1I_{n_1}&\cdots&-\beta A_1^TA_m&0\\ \vdots&\ddots&\vdots&\vdots\\ -\beta A_m^TA_1&\cdots&(1+\mu)r_mI_{n_m}&0\\ 0&\cdots&0&\frac{1}{\gamma\beta}I_l\end{pmatrix}\begin{pmatrix}I_{n_1}&\cdots&0&0\\ \vdots&\ddots&\vdots&\vdots\\ 0&\cdots&I_{n_m}&0\\ 0&\cdots&0&\gamma I_l\end{pmatrix}=\begin{pmatrix}(1+\mu)r_1I_{n_1}&\cdots&-\beta A_1^TA_m&0\\ \vdots&\ddots&\vdots&\vdots\\ -\beta A_m^TA_1&\cdots&(1+\mu)r_mI_{n_m}&0\\ 0&\cdots&0&\frac{1}{\beta}I_l\end{pmatrix}=G.$$

The first assertion $HM=G$ is proved.

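Consequently, we get

$$M^THM=M^TG=\begin{pmatrix}I_{n_1}&\cdots&0&0\\ \vdots&\ddots&\vdots&\vdots\\ 0&\cdots&I_{n_m}&0\\ 0&\cdots&0&\gamma I_l\end{pmatrix}\begin{pmatrix}(1+\mu)r_1I_{n_1}&\cdots&-\beta A_1^TA_m&0\\ \vdots&\ddots&\vdots&\vdots\\ -\beta A_m^TA_1&\cdots&(1+\mu)r_mI_{n_m}&0\\ 0&\cdots&0&\frac{1}{\beta}I_l\end{pmatrix}=\begin{pmatrix}(1+\mu)r_1I_{n_1}&\cdots&-\beta A_1^TA_m&0\\ \vdots&\ddots&\vdots&\vdots\\ -\beta A_m^TA_1&\cdots&(1+\mu)r_mI_{n_m}&0\\ 0&\cdots&0&\frac{\gamma}{\beta}I_l\end{pmatrix}.$$

Using (2.12)-(2.15) and the above equation, we have

$$\begin{aligned}
\tilde H&=G^T+G-M^THM-2N\\
&=\begin{pmatrix}(1-\mu)r_1I_{n_1}&\cdots&-\beta A_1^TA_m&0\\ \vdots&\ddots&\vdots&\vdots\\ -\beta A_m^TA_1&\cdots&(1-\mu)r_mI_{n_m}&0\\ 0&\cdots&0&\frac{2-\gamma}{\beta}I_l\end{pmatrix}\\
&=\begin{pmatrix}(m-1)\beta A_1^TA_1&\cdots&-\beta A_1^TA_m&0\\ \vdots&\ddots&\vdots&\vdots\\ -\beta A_m^TA_1&\cdots&(m-1)\beta A_m^TA_m&0\\ 0&\cdots&0&0\end{pmatrix}
+\begin{pmatrix}(1-\mu)r_1I_{n_1}-(m-1)\beta A_1^TA_1&\cdots&0&0\\ \vdots&\ddots&\vdots&\vdots\\ 0&\cdots&(1-\mu)r_mI_{n_m}-(m-1)\beta A_m^TA_m&0\\ 0&\cdots&0&\frac{2-\gamma}{\beta}I_l\end{pmatrix}\\
&=P^T\begin{pmatrix}(m-1)I_l&\cdots&-I_l&0\\ \vdots&\ddots&\vdots&\vdots\\ -I_l&\cdots&(m-1)I_l&0\\ 0&\cdots&0&0\end{pmatrix}P
+\begin{pmatrix}(1-\mu)r_1I_{n_1}-(m-1)\beta A_1^TA_1&\cdots&0&0\\ \vdots&\ddots&\vdots&\vdots\\ 0&\cdots&(1-\mu)r_mI_{n_m}-(m-1)\beta A_m^TA_m&0\\ 0&\cdots&0&\frac{2-\gamma}{\beta}I_l\end{pmatrix},
\end{aligned} \tag{2.17}$$

with

$$P=\mathrm{diag}\big(\sqrt{\beta}A_1,\ldots,\sqrt{\beta}A_m,I_l\big).$$

Therefore, the matrix $\tilde H$ is positive definite if $\beta>0$, $\mu\in(0,1)$, $\gamma\in(0,2)$ and

$$r_i>\frac{(m-1)\beta}{1-\mu}\,\lambda_{\max}(A_i^TA_i)\quad\forall i=1,\ldots,m.$$

The second assertion is proved.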

Similarly to (2.17), the matrix $H$ can be written as

$$H=P^T\begin{pmatrix}(m-1)I_l&\cdots&-I_l&0\\ \vdots&\ddots&\vdots&\vdots\\ -I_l&\cdots&(m-1)I_l&0\\ 0&\cdots&0&0\end{pmatrix}P
+\begin{pmatrix}(1+\mu)r_1I_{n_1}-(m-1)\beta A_1^TA_1&\cdots&0&0\\ \vdots&\ddots&\vdots&\vdots\\ 0&\cdots&(1+\mu)r_mI_{n_m}-(m-1)\beta A_m^TA_m&0\\ 0&\cdots&0&\frac{1}{\gamma\beta}I_l\end{pmatrix}.$$

Therefore the matrix $H$ is also positive definite. The proof is complete.
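The three assertions of Lemma 2.4 can be sanity-checked numerically on a small instance. The sketch below is an illustration under simplifying assumptions that are not in the paper: every block is scalar ($n_i = l = 1$, so $A_i$ is a number `a[i]` and $\lambda_{\max}(A_i^TA_i) = a_i^2$), and positive definiteness is tested with a hand-written Cholesky factorization. All function names are hypothetical.

```python
def matmul(X, Y):
    """Plain dense matrix product for lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def is_positive_definite(S):
    """Cholesky-based test; S is assumed to be symmetric."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    return False  # a non-positive pivot means S is not positive definite
                L[i][i] = acc ** 0.5
            else:
                L[i][j] = acc / L[j][j]
    return True


def build_matrices(a, beta, mu, gamma, r):
    """G, H, M, N of (2.12)-(2.15) when each A_i is the scalar a[i]."""
    m = len(a)
    G = [[0.0] * (m + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(m):
            G[i][j] = (1.0 + mu) * r[i] if i == j else -beta * a[i] * a[j]
    H = [row[:] for row in G]
    G[m][m] = 1.0 / beta
    H[m][m] = 1.0 / (gamma * beta)
    M = [[0.0] * (m + 1) for _ in range(m + 1)]
    N = [[0.0] * (m + 1) for _ in range(m + 1)]
    for i in range(m):
        M[i][i] = 1.0
        N[i][i] = mu * r[i]
    M[m][m] = gamma
    return G, H, M, N


def h_tilde(G, H, M, N):
    """Form G^T + G - M^T H M - 2N entrywise."""
    Gt = transpose(G)
    MtHM = matmul(transpose(M), matmul(H, M))
    n = len(G)
    return [[Gt[i][j] + G[i][j] - MtHM[i][j] - 2.0 * N[i][j] for j in range(n)]
            for i in range(n)]


a, beta, mu, gamma = [1.0, 2.0, -0.5], 1.0, 0.5, 1.5
# Choose r_i 10% above the bound (m-1)*beta*a_i^2/(1-mu) required by the lemma.
r = [1.1 * (len(a) - 1) * beta * ai * ai / (1.0 - mu) for ai in a]
G, H, M, N = build_matrices(a, beta, mu, gamma, r)
```

With these matrices, comparing `matmul(H, M)` against `G` and calling `is_positive_definite` on `h_tilde(G, H, M, N)` and on `H` reproduces the three relationships in (2.16) for this particular instance; it is a check on one example, not a substitute for the proof.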

3. The ALM with Full Jacobian Decomposition and LQP Regularization — Exact Version

Now, we present the exact version of the ALM with full Jacobian decomposition and LQP regularization for solving the model (1.1). Some remarks are also made. Based on our previous introduction and motivation, the ALM with full Jacobian decomposition and LQP regularization can be summarized as follows:

$$\begin{cases}
x_1^{k+1} := \arg\min\big\{L_\beta(x_1,x_2^k,\ldots,x_m^k,\lambda^k)+r_1d(x_1,x_1^k)\;\big|\;x_1\in\mathbb{R}^{n_1}_{+}\big\},\\
\qquad\cdots\\
x_i^{k+1} := \arg\min\big\{L_\beta(x_1^{k},\ldots,x_{i-1}^{k},x_i,x_{i+1}^k,\ldots,x_m^k,\lambda^k)+r_id(x_i,x_i^k)\;\big|\;x_i\in\mathbb{R}^{n_i}_{+}\big\},\\
\qquad\cdots\\
x_m^{k+1} := \arg\min\big\{L_\beta(x_1^{k},\ldots,x_{m-1}^{k},x_m,\lambda^k)+r_md(x_m,x_m^k)\;\big|\;x_m\in\mathbb{R}^{n_m}_{+}\big\},\\[2pt]
\lambda^{k+1} := \lambda^k-\gamma\beta\big(\sum_{j=1}^{m}A_jx_j^{k+1}-b\big),
\end{cases} \tag{3.1}$$

where the LQP regularizer $d(\cdot,\cdot)$ is defined in (2.2), $\beta>0$, $\mu\in(0,1)$, $\gamma\in(0,2)$ and

$$r_i>\frac{(m-1)\beta}{1-\mu}\,\lambda_{\max}(A_i^TA_i)\quad\forall i=1,\ldots,m.$$

Recall the analysis in [3]. The LQP regularization terms $r_id(x_i,x_i^k)$ force the solution of the $x_i$-subproblem in (3.1) to stay strictly in the interior of $\mathbb{R}^{n_i}_{+}$. Hence, the constraints $x_i\in\mathbb{R}^{n_i}_{+}$ are not active and the iterative scheme (3.1) can be further specified as

$$\begin{cases}
x_1^{k+1} := \arg\min\big\{L_\beta(x_1,x_2^k,\ldots,x_m^k,\lambda^k)+r_1d(x_1,x_1^k)\big\},\\
\qquad\cdots\\
x_i^{k+1} := \arg\min\big\{L_\beta(x_1^{k},\ldots,x_{i-1}^{k},x_i,x_{i+1}^k,\ldots,x_m^k,\lambda^k)+r_id(x_i,x_i^k)\big\},\\
\qquad\cdots\\
x_m^{k+1} := \arg\min\big\{L_\beta(x_1^{k},\ldots,x_{m-1}^{k},x_m,\lambda^k)+r_md(x_m,x_m^k)\big\},\\[2pt]
\lambda^{k+1} := \lambda^k-\gamma\beta\big(\sum_{j=1}^{m}A_jx_j^{k+1}-b\big),
\end{cases} \tag{3.2}$$

in which only $m$ unconstrained minimization subproblems are involved.
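As a hedged illustration of how the unconstrained subproblems in (3.2) can look in practice, the sketch below treats a toy instance that is not taken from the paper: scalar blocks with $A_i = 1$ and $\theta_i(x_i) = \frac12(x_i - t_i)^2$. In this special case, multiplying the first-order optimality condition of the $i$-th subproblem by $x_i > 0$ gives a scalar quadratic equation with exactly one positive root, so every iterate stays strictly positive without any projection. The function name `lqp_jacobian_alm`, the data and the parameter values are assumptions chosen for illustration.

```python
import math


def lqp_jacobian_alm(t, b, beta=1.0, mu=0.1, gamma=1.0, iters=300):
    """Sketch of scheme (3.2) for theta_i(x) = 0.5*(x - t_i)**2 and A_i = 1."""
    m = len(t)
    # r_i must exceed (m-1)*beta*lambda_max(A_i^T A_i)/(1-mu); here lambda_max = 1.
    r = 1.1 * (m - 1) * beta / (1.0 - mu)
    x = [1.0] * m          # strictly positive starting point, as Step 0 requires
    lam = 0.0
    smallest = min(x)      # track the smallest component ever produced
    for _ in range(iters):
        total = sum(x)
        new_x = []
        for i in range(m):
            others = total - x[i]   # old iterates of the other blocks (Jacobian order)
            # Optimality condition of the i-th subproblem, multiplied by u > 0:
            #   qa*u**2 + qb*u - qc = 0  with qa > 0 and qc > 0.
            qa = 1.0 + beta + r
            qb = -t[i] - lam + beta * (others - b) - r * (1.0 - mu) * x[i]
            qc = r * mu * x[i] * x[i]
            disc = math.sqrt(qb * qb + 4.0 * qa * qc)
            # Unique positive root, written in a cancellation-free form.
            root = (-qb + disc) / (2.0 * qa) if qb <= 0.0 else 2.0 * qc / (qb + disc)
            new_x.append(root)
        x = new_x
        smallest = min(smallest, min(x))
        lam -= gamma * beta * (sum(x) - b)   # multiplier update with step gamma*beta
    return x, lam, smallest


x, lam, smallest = lqp_jacobian_alm([2.0, 3.0, 4.0], 6.0)
```

For this instance the solution of the model is $x^* = (1, 2, 3)$ with $\lambda^* = -1$. The closed-form root is specific to this quadratic toy objective; for generic $\theta_i$ the subproblems would be handed to an unconstrained solver instead.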

Algorithm 1.

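Step 0. Let $\varepsilon>0$, $\beta>0$, $\mu\in(0,1)$, $\gamma\in(0,2)$ and $r_i>(m-1)\beta\lambda_{\max}(A_i^TA_i)/(1-\mu)$, $i=1,\ldots,m$.

Choose $(x_1^0,\ldots,x_m^0,\lambda^0)\in\mathbb{R}^{n_1}_{++}\times\cdots\times\mathbb{R}^{n_m}_{++}\times\mathbb{R}^{l}$. Set $k := 0$.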