Sparse-BSOS: a bounded degree SOS hierarchy for large scale polynomial optimization with sparsity

HAL Id: hal-01341931 (https://hal.archives-ouvertes.fr/hal-01341931v3), submitted on 26 May 2017


To cite this version:

Tillmann Weisser, Jean-Bernard Lasserre, Kim-Chuan Toh. Sparse-BSOS: a bounded degree SOS hierarchy for large scale polynomial optimization with sparsity. Mathematical Programming Computation, Springer, 2018, 10, pp. 1–32. ⟨hal-01341931v3⟩


Sparse-BSOS: a bounded degree SOS hierarchy for large scale polynomial optimization with sparsity

Tillmann Weisser∗†, Jean B. Lasserre‡ and Kim-Chuan Toh§

May 26, 2017

Abstract

We provide a sparse version of the bounded degree SOS hierarchy BSOS [6] for polynomial optimization problems. It permits one to treat large scale problems which satisfy a structured sparsity pattern. When the sparsity pattern satisfies the running intersection property, this Sparse-BSOS hierarchy of semidefinite programs (with semidefinite constraints of fixed size) converges to the global optimum of the original problem. Moreover, for the class of SOS-convex problems, finite convergence takes place at the first step of the hierarchy, just as in the dense version.

Keywords: Global Optimization, Semidefinite Programming, Sparsity, Large Scale Problems, Convex Relaxations, Positivity Certificates

MSC: 90C26, 90C22

1 Introduction

We consider the polynomial optimization problem:

$$f^\ast := \min_x \{\, f(x) : x \in K \,\} \tag{P}$$

where $f \in \mathbb{R}[x]$ is a polynomial and $K \subset \mathbb{R}^n$ is the basic semi-algebraic set

$$K := \{\, x \in \mathbb{R}^n : g_j(x) \ge 0,\ j = 1,\dots,m \,\}, \tag{1}$$

for some polynomials $g_j \in \mathbb{R}[x]$, $j = 1,\dots,m$. In [6] Lasserre et al. have provided BSOS, a new hierarchy of semidefinite programs $(Q^k_d)$ indexed by $d \in \mathbb{N}$ and parametrized by $k \in \mathbb{N}$ (fixed), whose associated (monotone non-decreasing) sequence of optimal values $(\rho^k_d)_{d\in\mathbb{N}}$ converges to $f^\ast$ as $d \to \infty$, i.e., $\rho^k_d \uparrow f^\ast$ as $d \to \infty$.

One distinguishing feature of the BSOS hierarchy (when compared to the LP-hierarchy defined in [8]) is finite convergence for an important class of convex problems. That is, when $f, -g_j$ are SOS-convex polynomials of degree bounded by $2k$, then the first semidefinite relaxation of

∗The work of the first author is partially supported by a PGMO grant from Fondation Mathématique Jacques Hadamard and a grant from the ERC council for the Taming project (ERC-Advanced Grant #666981 TAMING).

†LAAS-CNRS, University of Toulouse, LAAS, 7 Avenue du Colonel Roche, 31031 Toulouse cedex 4, France (tweisser@laas.fr).

‡LAAS-CNRS and Institute of Mathematics, University of Toulouse, LAAS, 7 Avenue du Colonel Roche, 31031 Toulouse cedex 4, France (lasserre@laas.fr).

§Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076 (mattohkc@nus.edu.sg).


the hierarchy $(Q^k_d)_{d\in\mathbb{N}}$ is exact, i.e., $\rho^k_1 = f^\ast$. (In contrast, the LP-hierarchy cannot converge in finitely many steps for such convex problems.)

The BSOS hierarchy also has two other important distinguishing features, now when compared to the standard SOS hierarchy defined in [7, 11, 12] (let us call it PUT, as it is based on Putinar's Positivstellensatz).

• For each semidefinite relaxation $Q^k_d$, $d \in \mathbb{N}$, the size of the semidefinite constraint is $O(n^k)$, hence fixed and controlled by the parameter $k$ (fixed and chosen by the user). With $k = 0$ one retrieves the LP-hierarchy based on a positivity certificate due to Stengle; see [6] and [8] for more details.

• For the important class of quadratic/quadratically constrained problems, the first relaxation of the BSOS hierarchy is at least as good as the first relaxation of the PUT hierarchy.

In this paper we introduce a sparse version of the BSOS hierarchy to help solve large scale polynomial optimization problems that exhibit some structured sparsity pattern. This hierarchy has exactly the same advantages as BSOS, now when compared with the sparse version of the standard SOS hierarchy introduced in Waki et al. [24].

Motivation

In general the standard SOS hierarchy PUT [7] is considered very efficient (and hard to beat) when it can be implemented. For instance, PUT's first relaxation solves at optimality several small size instances of the Optimal Power Flow problem (OPF), an important problem in the management of energy networks (essentially a quadratic/quadratically constrained optimization problem); see e.g. [16]. But even for relatively small size OPFs, PUT's second relaxation cannot be implemented because the size of some matrices required to be positive semidefinite (psd) is too large for state-of-the-art semidefinite solvers.

This was an important motivation for introducing the BSOS hierarchy [6]: that is, to provide an alternative to PUT, especially in cases where PUT's second relaxation cannot be implemented. In particular, in all relaxations of the BSOS hierarchy (with parameter $k = 1$) the size of the matrix required to be psd is only $O(n)$ (in contrast to $O(n^2)$ for PUT's second relaxation). Thus for quadratic/quadratically constrained problems where PUT's second relaxation cannot be implemented, the second and even higher order relaxations of BSOS can be implemented and provide better lower bounds.¹ When they are strictly less than the optimal value, these lower bounds might still be useful as they can be exploited in some other procedure.

The motivation for introducing the sparse version of BSOS is exactly the same as for BSOS, but now for large scale polynomial optimization problems that exhibit some sparsity pattern.

For such problems the sparse version of the SOS hierarchy introduced in Waki et al. [24] (which we call "Sparse-PUT" in the sequel) is in general very efficient, in particular when its second relaxation can be implemented. So there is a need to provide an alternative to the latter, especially in cases where its second relaxation cannot be implemented, for instance for some large scale OPF problems (where indeed the second relaxation cannot be implemented, at least in its initial form), e.g. as described in [15].

¹Recently Marandi et al. [14] have shown that BSOS provides better bounds than the first relaxation of PUT for the Pooling Problem, another instance of quadratic/quadratically constrained problems. However their experiments also show that BSOS (i) does not always solve these difficult problems to optimality at a low relaxation order "d" and (ii) does not scale well.


Contribution

Even though in the BSOS hierarchy [6] the size $O(n^k)$ of the semidefinite constraint of $Q^k_d$ is fixed for all $d$ and permits one to handle problems (P) of larger size than with the standard SOS-hierarchy, its application is still limited to problems of relatively modest size (say medium size problems).

Our contribution is to provide a sparse version "Sparse-BSOS" of the BSOS hierarchy which permits one to handle large size problems (P) that satisfy some (structured) sparsity pattern. The Sparse-BSOS hierarchy is to its dense version BSOS what the Sparse-PUT hierarchy is to the standard SOS-hierarchy PUT. Again as in the dense case, a distinguishing feature of the Sparse-BSOS hierarchy (and in contrast to Sparse-PUT) is that the size of the resulting semidefinite constraints is fixed in advance at the user's convenience and does not depend on the rank in the hierarchy. However, such an extension is not straightforward because, in contrast to Putinar's SOS-based certificate [18] (where the $g_j$'s appear separately), the positivity certificate used in the dense BSOS algorithm [6] potentially mixes all polynomials $g_j$ that define $K$; that is, if $f$ is positive on $K$ then

$$f = \sigma + \sum_{\alpha,\beta \in (\mathbb{N}_0)^m} c_{\alpha\beta} \prod_{j=1}^{m} g_j^{\alpha_j}\,(1 - g_j)^{\beta_j}, \tag{2}$$

for some SOS polynomial $\sigma$ and positive scalar weights $c_{\alpha\beta}$. Therefore in principle the sparsity as defined in [24] may be destroyed in $\sigma$ and in the products $\prod_j g_j^{\alpha_j}(1-g_j)^{\beta_j}$. In fact, one contribution of this paper is to provide a specialized sparse version of (2). In particular, we prove that if the sparsity pattern satisfies the so-called Running Intersection Property (RIP) then the Sparse-BSOS hierarchy also converges to the global optimum $f^\ast$ of (P). A sufficient rank-condition also permits one to detect finite convergence. Last but not least, we also prove that the Sparse-BSOS hierarchy preserves two distinguishing features of the dense BSOS hierarchy.

Namely:

• Finite convergence at the first step of the hierarchy for the class of SOS-convex problems.

(Recall that the standard LP hierarchy cannot converge in finitely many steps for such problems [8, 12].)

• For quadratic/quadratically constrained problems (with a sparsity pattern), the first relaxation of the Sparse-BSOS hierarchy is at least as good as that of Sparse-PUT. Hence when Sparse-PUT's second relaxation cannot be implemented, the Sparse-BSOS hierarchy (with parameter $k = 1$) will provide better lower bounds.

This also suggests that the Sparse-BSOS relaxations could be useful in Branch & Bound algorithms for solving large scale Mixed Integer Non-Linear Programs (MINLP). Indeed for such algorithms the quality of the lower bounds computed at each node of the search tree is crucial for the overall computational efficiency.

The sparsity pattern. Roughly speaking, we say that problem (P) satisfies a structured sparsity pattern if the set $I_0 := \{1,\dots,n\}$ of all (indices of) variables is a union $\bigcup_{\ell=1}^{p} I_\ell$ of smaller blocks of (indices of) variables $I_\ell$, such that each monomial of the objective function only involves variables from one of the blocks. In addition, each polynomial $g_j$ in the definition (1) of the feasible set is also a polynomial only in the variables of one of the blocks. Of course the blocks $I_1,\dots,I_p$ may overlap, i.e., variables may appear in several blocks. Together with the maximum degree appearing in the data of (P), the number and size of the blocks $I_1,\dots,I_p$, as well as the size of their overlaps, are the characteristics of the sparsity pattern which have the strongest influence on the performance of our algorithm. A small example is given below.
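As a small illustration (our example, not taken from the paper), consider $n = 4$ variables with blocks

$$I_1 = \{1,2\}, \qquad I_2 = \{2,3\}, \qquad I_3 = \{3,4\},$$

and the problem data

$$f = x_1 x_2 + x_2^2 x_3 + x_3 x_4, \qquad g_1 = 1 - x_1^2 - x_2^2, \quad g_2 = 1 - x_2^2 - x_3^2, \quad g_3 = 1 - x_3^2 - x_4^2.$$

Every monomial of $f$ and every constraint lives in a single block, and the Running Intersection Property of Assumption 1 below holds: $I_2 \cap I_1 = \{2\} \subseteq I_1$ and $I_3 \cap (I_1 \cup I_2) = \{3\} \subseteq I_2$.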


Computational experiments. We have tested the Sparse-BSOS hierarchy against both its dense version BSOS and Sparse-PUT [24]. To compare Sparse-BSOS and BSOS we consider problems of small and medium size (≤ 100 variables) and different sparsity patterns. We also show that Sparse-BSOS can solve sparse large scale problems with up to 3000 variables, which is by far out of the scope of the dense version, in reasonable time on a standard laptop.²

To compare Sparse-BSOS and Sparse-PUT we have considered several problems from the literature in non-linear optimization and random medium size non-linear problems. For several such problems the first or second relaxation of the Sparse-PUT hierarchy cannot be solved, whereas the first Sparse-BSOS relaxations $d = 1, 2, \dots$ with a small parameter $k = 1$ or $2$ can.³ Therefore in such cases a good approximation (or even the optimal value) can be obtained by the Sparse-BSOS hierarchy. In our numerical experiments the problems were chosen in such a way that only the first relaxation of Sparse-PUT can be implemented, and indeed for such cases the Sparse-BSOS relaxations provide better lower bounds (and sometimes even the optimal value).

2 Preliminaries

2.1 Notation and definitions

Let $\mathbb{R}[x]$ be the ring of polynomials in the variables $x = (x_1,\dots,x_n)$. Denote by $\mathbb{R}[x]_d \subset \mathbb{R}[x]$ the vector space of polynomials of degree at most $d$, which has dimension $s(d) := \binom{n+d}{d}$, with e.g. the usual canonical basis $(x^\gamma)_{\gamma\in\mathbb{N}^n_d}$ of monomials, where $\mathbb{N}^n_d := \{\gamma \in (\mathbb{N}_0)^n : |\gamma| \le d\}$, $\mathbb{N}_0$ is the set of natural numbers including 0, and $|\gamma| := \sum_{i=1}^{n} \gamma_i$. Also, denote by $\Sigma[x] \subset \mathbb{R}[x]$ (resp. $\Sigma[x]_d \subset \mathbb{R}[x]_{2d}$) the cone of sums of squares (SOS) polynomials (resp. SOS polynomials of degree at most $2d$). If $f \in \mathbb{R}[x]_d$, we write $f(x) = \sum_{\gamma\in\mathbb{N}^n_d} f_\gamma x^\gamma$ in the canonical basis and denote by $\mathbf{f} = (f_\gamma)_\gamma \in \mathbb{R}^{s(d)}$ its vector of coefficients. Finally, let $\mathcal{S}^n$ denote the space of $n \times n$ real symmetric matrices, with inner product $\langle A, B\rangle = \operatorname{trace} AB$. We use the notation $A \succeq 0$ to denote that $A$ is positive semidefinite (psd). With $g_0 := 1$, the quadratic module $Q(g_1,\dots,g_m) \subset \mathbb{R}[x]$ generated by polynomials $g_1,\dots,g_m$ is defined by

$$Q(g_1,\dots,g_m) := \left\{ \sum_{j=0}^{m} \sigma_j g_j \;:\; \sigma_j \in \Sigma[x] \right\}.$$

With a real sequence $y = (y_\gamma)_{\gamma\in\mathbb{N}^n_d}$, one may associate the linear functional $L_y : \mathbb{R}[x]_d \to \mathbb{R}$ defined by

$$f = \sum_\gamma f_\gamma x^\gamma \;\longmapsto\; L_y(f) := \sum_\gamma f_\gamma y_\gamma,$$

which is called the Riesz functional. If $d = 2a$, denote by $M_a(y)$ the moment matrix associated with $y$. It is a real symmetric matrix with rows and columns indexed in the basis of monomials $(x^\gamma)_{\gamma\in\mathbb{N}^n_a}$, and with entries

$$M_a(y)(\alpha,\beta) := L_y(x^{\alpha+\beta}) = y_{\alpha+\beta}, \qquad \alpha,\beta \in \mathbb{N}^n_a.$$

If $y = (y_\gamma)_{\gamma\in\mathbb{N}^n}$ is the sequence of moments of some Borel measure $\mu$ on $\mathbb{R}^n$ then $M_a(y) \succeq 0$ for all $a \in \mathbb{N}$. However the converse is not true in general, and it is related to the well-known fact

²The numerical experiments were run on a standard laptop from the year 2015. For a more detailed description see page 13.

³If in the description of (P) some degree "s" is strictly larger than 2, then for Sparse-PUT's first relaxation the size of the largest positive semidefinite variable is already at least $O(\kappa^{\lceil s/2\rceil})$, where $\kappa$ is the size of the largest clique in some graph associated with the problem; if $\kappa$ is not small this can be prohibitive. In contrast, for all the Sparse-BSOS relaxations (with parameter $k = 1$), this largest size is only $O(\kappa)$.


that there are positive polynomials that are not sums of squares. For more details the interested reader is referred to e.g. [11, Chapter 3].
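As a small illustration (ours, not from the paper): for $n = 2$ and $a = 1$, a sequence $y = (y_{00}, y_{10}, y_{01}, y_{20}, y_{11}, y_{02})$ gives the moment matrix

$$M_1(y) = \begin{pmatrix} y_{00} & y_{10} & y_{01} \\ y_{10} & y_{20} & y_{11} \\ y_{01} & y_{11} & y_{02} \end{pmatrix},$$

with rows and columns indexed by the monomials $1, x_1, x_2$.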

A polynomial $f \in \mathbb{R}[x]$ is said to be SOS-convex if its Hessian matrix $x \mapsto \nabla^2 f(x)$ is an SOS matrix-polynomial, that is, $\nabla^2 f = L L^T$ for some real matrix polynomial $L \in \mathbb{R}[x]^{n\times a}$ (for some integer $a$). In particular, for SOS-convex polynomials and sequences $y$ with positive semidefinite moment matrix $M_a(y) \succeq 0$, a Jensen-type inequality is valid:

Lemma 1. Let $f \in \mathbb{R}[x]_{2a}$ be SOS-convex and let $y = (y_\gamma)_{\gamma\in\mathbb{N}^n_{2a}}$ be such that $y_0 = 1$ and $M_a(y) \succeq 0$. Then

$$L_y(f) \ \ge\ f(L_y(x)), \qquad \text{with } L_y(x) := (L_y(x_1),\dots,L_y(x_n)).$$

For a proof see Theorem 13.21, p. 209 in [12].
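To see the mechanism at work on a toy case (our illustration, not from the paper), take $n = 1$, $a = 1$ and the SOS-convex polynomial $f = x^2$. Then

$$M_1(y) = \begin{pmatrix} 1 & y_1 \\ y_1 & y_2 \end{pmatrix} \succeq 0 \;\Longrightarrow\; y_2 \ge y_1^2, \qquad \text{i.e.}\quad L_y(f) = y_2 \ \ge\ y_1^2 = f(L_y(x)).$$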

2.2 A sparsity pattern

Given $I \subset \{1,\dots,n\}$, denote by $\mathbb{R}[x; I]$ the ring of polynomials in the variables $\{x_i : i \in I\}$, which we understand as a subring of $\mathbb{R}[x]$. Hence, a polynomial $g \in \mathbb{R}[x; I]$ canonically induces two polynomial functions $g : \mathbb{R}^{\#I} \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}$, where $\#I$ is the number of elements of $I$.

Assumption 1 (Sparsity pattern). There exist $p \in \mathbb{N}$ and subsets $I_\ell \subseteq \{1,\dots,n\}$ and $J_\ell \subseteq \{1,\dots,m\}$ for all $\ell \in \{1,\dots,p\}$, such that:

• $f = \sum_{\ell=1}^{p} f^\ell$, for some $f^1,\dots,f^p$ such that $f^\ell \in \mathbb{R}[x; I_\ell]$, $\ell \in \{1,\dots,p\}$;

• $g_j \in \mathbb{R}[x; I_\ell]$ for all $j \in J_\ell$ and $\ell \in \{1,\dots,p\}$;

• $\bigcup_{\ell=1}^{p} I_\ell = \{1,\dots,n\}$;

• $\bigcup_{\ell=1}^{p} J_\ell = \{1,\dots,m\}$;

• for all $\ell = 1,\dots,p-1$ there is an $s \le \ell$ such that $I_{\ell+1} \cap \bigcup_{r=1}^{\ell} I_r \subseteq I_s$ (Running Intersection Property).

On the RIP. The RIP can be understood in the following framework. Suppose that we are given probability measures $(\mu_\ell)_{\ell=1,\dots,p}$, where each $\mu_\ell$ only sees the variables $\{x_i : i \in I_\ell\}$, and the $\mu_\ell$'s are consistent in the sense that if $I_{\ell,k} := I_\ell \cap I_k \ne \emptyset$ then $\mu_{\ell,k} = \mu_{k,\ell}$, where $\mu_{\ell,k}$ (resp. $\mu_{k,\ell}$) is the marginal of $\mu_\ell$ (resp. $\mu_k$) with respect to the variables $\{x_i : i \in I_{\ell,k}\}$. If the RIP holds, then from the $\mu_\ell$'s one is able to build up a probability measure $\phi$ that sees all variables $x_1,\dots,x_n$, and such that for every $\ell = 1,\dots,p$ the marginal $\phi_\ell$ of $\phi$ with respect to the variables $\{x_i : i \in I_\ell\}$ is exactly $\mu_\ell$. Put differently, the "local" information $(\mu_\ell)$ is part of a "consistent and global" information $\phi$, without knowing $\phi$. In this setup the RIP condition appears naturally when one tries to build up $\phi$ from the $\mu_\ell$'s; see e.g. the proof in Lasserre [10].

Note that if the sets $I_1,\dots,I_p$ satisfy all requirements of Assumption 1 except the Running Intersection Property (RIP), it is always possible to enlarge the $I_\ell$ until the RIP is satisfied. In the worst case however $I_\ell = \{1,\dots,n\}$, which is not interesting. In [24] no assumption on the sparsity is made. Instead, Waki et al. provide an algorithm to create a sparsity pattern satisfying Assumption 1 from any POP. They consider a graph with nodes $\{1,\dots,n\}$ which has an edge $(i,j)$ if the objective function has a monomial involving $x_i$ and $x_j$, or there is a constraint involving $x_i$ and $x_j$. If this graph is chordal, the sets $I_\ell$ correspond to the maximal cliques of this graph (and hence we sometimes call the $I_\ell$ "cliques"). If the graph is not chordal, it is replaced by its chordal extension; a sketch of this construction is given below.


Finally and importantly, even if the RIP does not hold, the Sparse-PUT and Sparse-BSOS hierarchies can still be implemented and provide valid lower bounds on the global optimum $f^\ast$. However, convergence of the lower bounds to $f^\ast$ is no longer guaranteed.

2.3 A preliminary result

For the uninitiated reader we start this section by citing two important Positivstellensätze. For polynomials $g_1,\dots,g_m \in \mathbb{R}[x]$ and $\alpha,\beta \in (\mathbb{N}_0)^m$, let $h_{\alpha\beta} \in \mathbb{R}[x]$ be the polynomial

$$x \;\mapsto\; h_{\alpha\beta}(x) := \prod_{j=1}^{m} g_j(x)^{\alpha_j}\,(1 - g_j(x))^{\beta_j}, \qquad x \in \mathbb{R}^n.$$

Lemma 2 (Krivine/Stengle/Vasilescu/Handelman Positivstellensatz [5, 19, 23, 1]). Let $f, g_1,\dots,g_m \in \mathbb{R}[x]$ and let $K = \{x \in \mathbb{R}^n : 0 \le g_j(x) \le 1,\ j = 1,\dots,m\}$ be compact. If the polynomials $1$ and $(g_j)_{j=1,\dots,m}$ generate $\mathbb{R}[x]$ as an $\mathbb{R}$-algebra and $f$ is (strictly) positive on $K$, then there exist finitely many positive weights $c_{\alpha\beta}$ such that

$$f = \sum_{\alpha,\beta \in (\mathbb{N}_0)^m} c_{\alpha\beta}\, h_{\alpha\beta}.$$

We next provide a sparse version of this Positivstellensatz which, to the best of our knowledge, is new. Before that, we recall the sparse version of Putinar's Positivstellensatz [18] proved in [10].

Lemma 3 (Sparse Putinar Positivstellensatz [4]). Let $f, g_1,\dots,g_m \in \mathbb{R}[x]$ satisfy Assumption 1, and let the polynomials $1 - M_\ell^{-1}\sum_{i\in I_\ell} x_i^2$, for some positive numbers $M_\ell$, be among the $g_j$. If $f$ is (strictly) positive on $K = \{x \in \mathbb{R}^n : g_j(x) \ge 0,\ j = 1,\dots,m\}$, then $f = \sum_{\ell=1}^{p} f^\ell$ for some polynomials $f^\ell \in \mathbb{R}[x; I_\ell]$, $\ell = 1,\dots,p$, and

$$f^\ell \;\in\; \left\{ \sigma_0 + \sum_{j\in J_\ell} \sigma_j g_j \;:\; \sigma_0, \sigma_j \in \Sigma[x; I_\ell] \right\}.$$

From now on, we assume that $K$ is described by polynomials $g_1,\dots,g_m$ such that

$$K = \{x \in \mathbb{R}^n : 0 \le g_j(x) \le 1,\ j = 1,\dots,m\}. \tag{3}$$

Note that this is not a restriction when $K$ defined in (P) is compact, as the constraint polynomials can be scaled by a positive factor without adding or losing information.

Let Assumption 1 hold, define $n_\ell := |I_\ell|$, $m_\ell := |J_\ell|$, and let $K_\ell \subset \mathbb{R}^{n_\ell}$, $\ell = 1,\dots,p$, be the sets

$$K_\ell := \{x \in \mathbb{R}^{n_\ell} : 0 \le g_j(x) \le 1,\ j \in J_\ell\}, \qquad \ell = 1,\dots,p. \tag{4}$$

Define $\pi_\ell : \mathbb{R}^n \to \mathbb{R}^{n_\ell}$, $x \mapsto (x_i)_{i\in I_\ell}$. Then

$$K = \{x \in \mathbb{R}^n : \pi_\ell(x) \in K_\ell \text{ for all } \ell = 1,\dots,p\}. \tag{5}$$

Assumption 2. (a) The sets $K_\ell$ defined in (4) are compact, and the polynomials $1$ and $(g_j)_{j\in J_\ell}$ generate $\mathbb{R}[x; I_\ell]$ as an $\mathbb{R}$-algebra for each $\ell = 1,\dots,p$.

(b) For each $\ell$ there exist $M_\ell > 0$ and $j \in J_\ell$ such that $g_j = 1 - M_\ell^{-1}\sum_{i\in I_\ell} x_i^2$.

Note that if Assumption 2(a) holds, then Assumption 2(b) can be satisfied easily. Indeed, since $K_\ell$ is assumed to be compact, the polynomial $x \mapsto \sum_{i\in I_\ell} x_i^2$ attains its maximum $M_\ell$ on $K_\ell$. Defining $x \mapsto g_\ell^+(x) := 1 - M_\ell^{-1}\sum_{i\in I_\ell} x_i^2$ and adding the redundant constraint $0 \le g_\ell^+(x) \le 1$ to the description of $K_\ell$ for each $\ell = 1,\dots,p$, Assumption 2(b) is satisfied.

Let $N^\ell := \{(\alpha,\beta) \in (\mathbb{N}_0)^{2m} : \operatorname{supp}(\alpha) \cup \operatorname{supp}(\beta) \subseteq J_\ell\}$, where $\operatorname{supp}(\alpha) := \{j \in \{1,\dots,m\} : \alpha_j \ne 0\}$.


Theorem 1 (Sparse Krivine-Stengle Positivstellensatz). Let $f, g_1,\dots,g_m \in \mathbb{R}[x]$ satisfy Assumptions 1 and 2(a). If $f$ is (strictly) positive on $K$, then $f = \sum_{\ell=1}^{p} f^\ell$ for some polynomials $f^\ell \in \mathbb{R}[x; I_\ell]$, $\ell = 1,\dots,p$, and there exist finitely many positive weights $c^\ell_{\alpha\beta}$ such that

$$f^\ell = \sum_{(\alpha,\beta)\in N^\ell} c^\ell_{\alpha\beta}\, h_{\alpha\beta}, \qquad \ell = 1,\dots,p. \tag{6}$$

Note that for each $\ell$ the representation (6) only involves $g_j \in \mathbb{R}[x; I_\ell]$, since the corresponding exponents in the definition of $h_{\alpha\beta}$ are $0$ if $j \notin J_\ell$.

Proof. As $f$ is positive on $K$ there exists $\varepsilon > 0$ such that $f - \varepsilon > 0$ on $K$. As remarked just after Assumption 2, we can add a redundant constraint $0 \le g_\ell^+(x) \le 1$, with $g_\ell^+ \in \mathbb{R}[x; I_\ell]$, to the description of each of the $K_\ell$ so as to satisfy Assumption 2(b). Hence we can apply Lemma 3 to obtain the representation

$$f - \varepsilon = \sum_{\ell=1}^{p} \underbrace{\Big( \sigma^\ell_0 + \sigma^\ell_+ g_\ell^+ + \sum_{j\in J_\ell} \sigma^\ell_j g_j \Big)}_{\in\, \mathbb{R}[x;\,I_\ell] \text{ and } \ge 0 \text{ on } K_\ell},$$

for some SOS polynomials $\sigma^\ell_j, \sigma^\ell_+$. Next, let

$$f^\ell := \varepsilon/p + \sigma^\ell_0 + \sigma^\ell_+ g_\ell^+ + \sum_{j\in J_\ell} \sigma^\ell_j g_j, \qquad \ell = 1,\dots,p.$$

Notice that $f = \sum_{\ell=1}^{p} f^\ell$ and each $f^\ell \in \mathbb{R}[x; I_\ell]$ is strictly positive on $K_\ell$, $\ell = 1,\dots,p$. Removing the redundant constraints $0 \le g_\ell^+ \le 1$ from the description of $K_\ell$ again does not change the fact that $f^\ell$ is strictly positive on $K_\ell$. Furthermore Assumption 2(a) still holds. Hence we can apply Lemma 2 to each of the $f^\ell$, which yields (6) for each $\ell = 1,\dots,p$.

3 Main result

3.1 The Sparse Bounded-Degree SOS-hierarchy (Sparse-BSOS)

Consider problem (P) and let Assumptions 1 and 2 hold. For $d \in \mathbb{N}$ let $N^\ell_d := \{(\alpha,\beta) \in N^\ell : \sum_j (\alpha_j + \beta_j) \le d\}$, which is of size $\binom{2m_\ell + d}{d}$. Let $k \in \mathbb{N}$ be fixed and define

$$d_{\max} := \max\{\deg(f),\ 2k,\ \max_j \deg(g_j)\}.$$

We consider the family of optimization problems indexed by $d \in \mathbb{N}$:

$$q^k_d := \sup_{t,\,\lambda^1,\dots,\lambda^p,\,f^1,\dots,f^p} \left\{ t \;:\; \begin{aligned} &f^\ell - \sum_{(\alpha,\beta)\in N^\ell_d} \lambda^\ell_{\alpha\beta}\, h_{\alpha\beta} \in \Sigma[x; I_\ell]_k, \quad \ell = 1,\dots,p,\\ &f - t = \sum_{\ell=1}^{p} f^\ell, \quad \lambda^\ell \in \mathbb{R}^{|N^\ell_d|}_+, \quad t \in \mathbb{R}, \quad f^\ell \in \mathbb{R}[x; I_\ell]_{d_{\max}} \end{aligned} \right\}, \tag{7}$$

where the scalars $\lambda^\ell_{\alpha\beta}$ are the entries of the vector $\lambda^\ell \in \mathbb{R}^{|N^\ell_d|}_+$.


Observe that when $k$ is fixed, then for each $d \in \mathbb{N}$, computing $q^k_d$ in (7) reduces to solving a semidefinite program, and $q^k_{d+1} \ge q^k_d$ for all $d \in \mathbb{N}$. Therefore (7) defines a hierarchy of semidefinite programs whose associated sequence of optimal values is monotone non-decreasing. Let us call it the Sparse-BSOS hierarchy, as it is the sparse version of the BSOS hierarchy [6].

For practical implementation of (7) there are at least two possibilities, depending on how the polynomial identities in (7) are treated in the resulting semidefinite program. To state that two polynomials $p, q \in \mathbb{R}[x]_d$ are identical one can either:

- equate their coefficients (e.g. in the monomial basis, i.e., $p_\gamma = q_\gamma$ for all $\gamma \in \mathbb{N}^n_d$), or

- equate their values (i.e. an implementation by sampling) on $\binom{n+d}{d}$ generic points (e.g. randomly generated in the box $[-1,1]^n$).

Both strategies have drawbacks. To equate coefficients one has to take powers of the polynomials $g_j$ and $(1-g_j)$, which leads to ill-conditioning of the coefficients of the polynomials $h_{\alpha\beta}$ (in the monomial basis), as some of them are multiplied by binomial coefficients which become large quickly when the relaxation order $d$ increases. On the other hand, when equating values the resulting linear system may become ill-conditioned because (depending on the points of evaluation and the $g_j$) the constraints may be nearly linearly dependent. The authors of [6] chose point evaluation for the implementation of BSOS because the SDP solver SDPT3⁴ is able to exploit the structure of the SDP generated in that way, and hence problems with psd variables of larger size can be solved. However, this feature cannot be used in our case, and so in Sparse-BSOS we have implemented the equality constraints of (7) by comparing coefficients. Indeed, equating coefficients is reasonable in the present context because we expect the number of variables $n_\ell$ in each block to be rather small. The drawback of this choice is that the resulting relaxations with high order $d$ can become time consuming (and even ill-conditioned, as explained above). A toy comparison of the two strategies is sketched below.

Define $\mathcal{I}^\ell_d := \{\gamma \in \mathbb{N}^n_d : \operatorname{supp}(\gamma) \subseteq I_\ell\}$, $\Gamma_d := \{\gamma : \gamma \in \mathcal{I}^\ell_d \text{ for some } \ell \in \{1,\dots,p\}\}$, and let $\mathbf{0}$ be the all-zero vector of size $n$. Then for $k$ fixed and for each $d$, we get

$$q^k_d = \sup_{\substack{t,\,Q^1,\dots,Q^p,\\ \lambda^1,\dots,\lambda^p,\,f^1,\dots,f^p}} \left\{ t \ \text{ s.t. } \begin{aligned} &\sum_{\ell :\, \gamma\in\mathcal{I}^\ell_{d_{\max}}} f^\ell_\gamma = f_\gamma, \quad \forall \gamma \in \Gamma_{d_{\max}}\setminus\{\mathbf 0\}, \qquad \sum_{\ell=1}^{p} f^\ell_{\mathbf 0} = f_{\mathbf 0} - t,\\ &f^\ell_\gamma - \sum_{(\alpha,\beta)\in N^\ell_d} \lambda^\ell_{\alpha\beta}\,(h_{\alpha\beta})_\gamma - \langle Q^\ell, v^\ell_k (v^\ell_k)^T\rangle_\gamma = 0, \quad \forall \gamma \in \mathcal{I}^\ell_{d_{\max}},\ \ell = 1,\dots,p,\\ &Q^\ell \in \mathcal{S}^{s(\ell,k)}_+, \quad \lambda^\ell \in \mathbb{R}^{|N^\ell_d|}_+, \quad \mathbf{f}^\ell \in \mathbb{R}^{s(\ell,d_{\max})},\ \ell = 1,\dots,p, \quad t \in \mathbb{R} \end{aligned} \right\}, \tag{8}$$

where $s(\ell,k) := \binom{n_\ell + k}{k}$, and $v^\ell_k$ is the vector of the canonical (monomial) basis of the vector space $\mathbb{R}[x; I_\ell]_k$. Here we use the convention that the coefficient $q_\gamma$ of a polynomial $q$ is $0$ if $|\gamma| > \deg(q)$. For a matrix polynomial $q = (q_{ij})_{1\le i,j\le s} \in \mathbb{R}[x]^{s\times s}$, the coefficient $q_\gamma$ is the matrix $((q_{ij})_\gamma)_{1\le i,j\le s} \in \mathbb{R}^{s\times s}$. Note that the semidefinite matrix variables have fixed size $s(\ell,k)$, independent of $d \in \mathbb{N}$. This is a crucial feature for the computational efficiency of the approach.

The dual of the semidefinite program (8) reads:

$$\tilde q^k_d := \inf_{\theta^1,\dots,\theta^p,\,y} \left\{ L_y(f) \ \text{ s.t. } \begin{aligned} &y_\gamma = \theta^\ell_\gamma, \quad \forall \ell : \gamma \in \mathcal{I}^\ell_{d_{\max}};\ \forall \gamma \in \Gamma_{d_{\max}}, \qquad y_{\mathbf 0} = 1,\\ &M_k(\theta^\ell) \succeq 0, \quad L_{\theta^\ell}(h_{\alpha\beta}) \ge 0, \quad (\alpha,\beta) \in N^\ell_d;\ \ell = 1,\dots,p,\\ &\theta^\ell \in \mathbb{R}^{s(\ell,d_{\max})},\ \ell = 1,\dots,p, \quad y \in \mathbb{R}^{|\Gamma_{d_{\max}}|} \end{aligned} \right\}. \tag{9}$$

⁴In both [6] and this paper, SDPT3 [21, 22] is used as the SDP solver.


By standard weak duality of convex optimization, $\tilde q^k_d \ge q^k_d$ for all $d \in \mathbb{N}$. Moreover, (9) is a relaxation of (P). Indeed, if $\hat x$ is feasible in (P), define $\theta^\ell_\gamma := \hat x^\gamma$ for all $\gamma \in \mathcal{I}^\ell_{d_{\max}}$ and all $\ell \in \{1,\dots,p\}$, and define $y$ according to the constraints in (9). Then $y_{\mathbf 0} = \hat x^{\mathbf 0} = 1$. Let $v^\ell_k$ be the vector of the canonical (monomial) basis of the vector space $\mathbb{R}[x; I_\ell]_k$, understood as a function $\mathbb{R}^n \to \mathbb{R}^{s(\ell,k)}$. Then $M_k(\theta^\ell) = v^\ell_k(\hat x)\,(v^\ell_k(\hat x))^T \succeq 0$. Finally, regarding the Riesz functionals, $L_{\theta^\ell}(h_{\alpha\beta}) = h_{\alpha\beta}(\hat x) \ge 0$ for all $(\alpha,\beta) \in N^\ell_d$, $\ell = 1,\dots,p$, and $L_y(f) = f(\hat x)$. Thus to every feasible point $\hat x$ of (P) corresponds a feasible point of (9) with the same value $f(\hat x)$, and therefore $f^\ast \ge \tilde q^k_d \ge q^k_d$ for all $d \in \mathbb{N}$. In fact we have an even more precise and interesting result:

Theorem 2 ([9]). Consider problem (P) and let Assumptions 1 and 2 hold. Let $k \in \mathbb{N}$ be fixed. Then the sequence $(q^k_d)_{d\in\mathbb{N}}$ defined in (7) is monotone non-decreasing and $q^k_d \uparrow f^\ast$ as $d \to \infty$.

Proof. Monotonicity of the sequence $(q^k_d)_{d\in\mathbb{N}}$ follows from its definition. Let $\varepsilon > 0$ be fixed, arbitrary. Then the polynomial $f - f^\ast + \varepsilon$ is positive on $K$. By Theorem 1 there exist some polynomials $f^1,\dots,f^p$ such that (6) holds, i.e.,

$$f - \underbrace{(f^\ast - \varepsilon)}_{t} \;=\; \sum_{\ell=1}^{p} f^\ell \qquad\text{with}\qquad f^\ell = \sum_{(\alpha,\beta)\in N^\ell} c^\ell_{\alpha\beta}\, h_{\alpha\beta}, \quad \ell = 1,\dots,p,$$

for finitely many positive weights $c^\ell_{\alpha\beta}$. Hence $(f^\ast - \varepsilon,\, f^\ell,\, c^\ell_{\alpha\beta})$ is a feasible solution of (7) as soon as $d$ is sufficiently large, and therefore $q^k_d \ge f^\ast - \varepsilon$. Combining this with $q^k_d \le f^\ast$ and noting that $\varepsilon > 0$ was arbitrary yields the desired result: $q^k_d \uparrow f^\ast$ as $d \to \infty$.

Note that we optimize over all possible representations of $f$ of the form $f = \sum_{\ell=1}^{p} f^\ell$ with $f^\ell \in \mathbb{R}[x; I_\ell]$. By Assumption 1 such a representation exists; however, it need not be unique.

We next show that a distinguishing feature of the dense BSOS hierarchy [6] is also valid for its sparse version Sparse-BSOS.

Theorem 3. Assume the feasible set $K$ in problem (P) is non-empty and let Assumptions 1 and 2 hold. Let $k \in \mathbb{N}$ be fixed and assume that for every $\ell = 1,\dots,p$, the polynomials $f^\ell$ and $-g_j$, $j \in J_\ell$, are all SOS-convex polynomials of degree at most $2k$. (If $k > 1$ we assume, with no loss of generality, that for each $\ell = 1,\dots,p$ and some sufficiently large $\kappa > 0$, the redundant (SOS-convex) constraints $1 - \kappa^{-1}\sum_{i\in I_\ell} x_i^{2k} \ge 0$, $\ell = 1,\dots,p$, are present in the description (3) of $K$.)

Then the semidefinite program (9) has an optimal solution $((\theta^{*\ell})_\ell,\, y^*)$ such that $f^\ast = \tilde q^k_1 = L_{y^*}(f)$, and $x^* := (L_{y^*}(x_1),\dots,L_{y^*}(x_n)) \in K$ is an optimal solution of (P). Hence finite convergence takes place at the first step of the hierarchy.

Proof. Let $d = 1$ (so that $d_{\max} = 2k$) and consider the semidefinite program (9). Note that $\theta^\ell = (\theta^\ell_\gamma)_{\gamma\in\mathcal{I}^\ell_{2k}}$. Recall that by Assumption 2, for every $\ell = 1,\dots,p$, there exists $j \in J_\ell$ such that $g_j(x) = 1 - M_\ell^{-1}\sum_{i\in I_\ell} x_i^2$. In addition, if $k > 1$ then there also exists $r$ such that $g_r(x) = 1 - \kappa^{-1}\sum_{i\in I_\ell} x_i^{2k}$. Hence, feasibility in (9) (with an appropriate choice of $(\alpha,\beta) \in N^\ell_d$) implies that $L_{\theta^\ell}(g_j) \ge 0$ and $L_{\theta^\ell}(g_r) \ge 0$, which in turn imply

$$L_{\theta^\ell}(x_i^2) \le M_\ell\,\theta^\ell_{\mathbf 0}\ (= M_\ell) \quad\text{and}\quad L_{\theta^\ell}(x_i^{2k}) \le \kappa\,\theta^\ell_{\mathbf 0}\ (= \kappa) \ \ (\text{if } k > 1), \qquad \forall i \in I_\ell,\ \forall \ell = 1,\dots,p,$$

where we have used that $\theta^\ell_{\mathbf 0} = y_{\mathbf 0} = 1$ for all $\ell = 1,\dots,p$. Combining this with $M_k(\theta^\ell) \succeq 0$ and invoking Proposition 2.38 in [12] yields that $|\theta^\ell_\gamma| \le \max[M_\ell, \kappa, 1]$ for every $|\gamma| \le 2k$ and $\ell = 1,\dots,p$. Consequently, the set of feasible solutions $(\theta^\ell, y)$ of (9) is bounded, hence compact. This implies that (9) has an optimal solution $(\theta^{*\ell}, y^*)$. Notice that among the constraints $L_{\theta^{*\ell}}(h_{\alpha\beta}) \ge 0$ are the constraints $L_{\theta^{*\ell}}(g_j) \ge 0$ for all $j \in J_\ell$. As the $f^\ell$ and $-g_j$ are SOS-convex, invoking Lemma 1 yields

$$f^\ell(x^\ell) \le L_{\theta^{*\ell}}(f^\ell) \quad\text{and}\quad 0 \le L_{\theta^{*\ell}}(g_j) \le g_j(x^\ell), \qquad \forall j \in J_\ell,\ \ell = 1,\dots,p,$$

where $x^\ell := (L_{\theta^{*\ell}}(x_i))_{i\in I_\ell} \in K_\ell$, $\ell = 1,\dots,p$. In addition, the constraints $y_\gamma = \theta^{*\ell}_\gamma$, for all $\ell$ such that $\gamma \in \mathcal{I}^\ell_{d_{\max}}$, imply that $(x^\ell)_i = (x^{\ell'})_i$ whenever $i \in I_\ell \cap I_{\ell'}$. Therefore, defining $x^*_i := (x^\ell)_i$ whenever $i \in I_\ell$, one obtains $g_j(x^*) \ge 0$ for all $j$, i.e., $x^* \in K$. Finally,

$$f^\ast \ \ge\ \tilde q^k_1 = L_{y^*}(f) = \sum_{\ell=1}^{p} L_{\theta^{*\ell}}(f^\ell) \ \ge\ \sum_{\ell=1}^{p} f^\ell(x^\ell) = f(x^*),$$

which shows that $x^* \in K$ is an optimal solution of (P). Hence $f^\ast = f(x^*) = \tilde q^k_1$.

3.2 Sufficient condition for finite convergence

By looking at the dual (9) of the semidefinite program (8), one obtains a sufficient condition for finite convergence. Choose $\omega \in \mathbb{N}$ minimal such that $2\omega \ge \max\{\deg(f), \deg(g_1),\dots,\deg(g_m)\}$. We have the following lemma.

Lemma 4. Let∗1, . . . ,θ∗p,y) ∈ Rs(1,dmax)× · · · ×Rs(p,dmax)×Rs be an optimal solution of (9). If rankMω∗ℓ) = 1 for every = 1, . . . , p, then q˜kd =f and x = (yγ)|γ|=1 is an optimal solution of problem (P).

Proof. If rankMω∗ℓ) = 1, then (θγ∗ℓ)|γ|≤2ω, is the vector of moments (up to order 2ω) of the Dirac measure δx at the point x := (θ∗ℓγ )|γ|=1 ∈Rn. Note that from the constraints yγ = θγ in (9)

xγ1 =xγ2, ∀ℓ1, ℓ2 :γ ∈ I11 ∩ I12.

Hence we can define xγ :=xγ for all γ such that |γ|= 1 and independent of the specific choice of such that γ ∈ I. For the same reasonx = (yγ)|γ|=1. Consequently, for allq ∈R[x;I]

q(x) =q(x) = Z

q δx =Lθ∗ℓ(q) = Ly(q)

LetjJ. The constraintsLθ∗ℓ(hαβ)≥0 imply in particularLθ∗ℓ(gj)≥0. Since deg(gj)≤2ω, 0 ≤ Lθ∗ℓ(gj) = gj(x) = gj(x), and so as jJ was arbitrary, xK. Finally, and again because deg(f)≤2ω,

fq˜kd = Ly(f) = f(x),

from which we may conclude that xK is an optimal solution of problem (P).

4 Computational issues

4.1 Comparing coefficients

As already outlined earlier, we have implemented the polynomial equality constraints in (7) by comparison of coefficients. The resulting constraints in the SDP are sparse and can be treated efficiently by the SDP solver. A crucial issue for the implementation of the Sparse-BSOS relaxations is how to equate the coefficients. The bottleneck for such an implementation is that one has to gather all occurrences of the same monomials.


As in [6], we use the following data format for representing a polynomial $f$ in $n$ variables:

F(i,1:n+1) = [γᵀ, f_γ],

stating that $f_\gamma$ is the $i$th coefficient of $f$, corresponding to the monomial $x^\gamma$. Adding two polynomials is done by concatenating their representations. Hence, equating the coefficients of $x^\gamma$ basically amounts to finding all indices $i$ of a polynomial F such that F(i,1:n) = γᵀ.

Matlab provides the function ismember(A,B,'rows') to find a row A in a matrix B. This however is too slow for our purpose. Instead of using this function, we reduce the problem to finding all equal entries of a vector, which can be handled much more efficiently. To that end we multiply F(:,1:n) by a random vector. Generically this results in a vector whose entries are different if and only if the corresponding rows of F(:,1:n) are different.
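A minimal sketch of this trick (our illustration; the matrix F below is hypothetical data in the format just described):

```matlab
% Random-hashing trick: group equal monomials without ismember(...,'rows').
n = 3;
F = [1 0 0  2.0;    %  2*x1
     0 1 1 -1.0;    % -x2*x3
     1 0 0  0.5];   %  0.5*x1 (same monomial as row 1)

r = rand(n,1);                  % random vector
h = F(:,1:n) * r;               % one scalar per row; generically equal
                                % iff the exponent rows are equal
[~, ia, idx] = unique(h);       % group rows with equal hash value
G = [F(ia,1:n), accumarray(idx, F(:,n+1))];   % one row per monomial,
                                % with coefficients of duplicates summed
disp(G)                         % rows [1 0 0 2.5] and [0 1 1 -1.0],
                                % in an order depending on the hash
```

Identical exponent rows produce bitwise-identical hash values, so no tolerance is needed; distinct rows collide only on a measure-zero set of random vectors r.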

4.2 Reducing problem size

By looking at (8) more closely one may reduce the number of free variables and the number of constraints. It is likely that there are some indices $i \in \{1,\dots,n\}$ that only appear in one of the $I_\ell$, say $i \in I_{\ell_i}$. Hence, for all $\gamma \in \bigcup_{j=1}^{p} \mathcal{I}^j_{d_{\max}}$ such that $\gamma_i \ne 0$, the corresponding equality constraint in (8) reduces to $f^{\ell_i}_\gamma = f_\gamma$. Consequently, there is a number of variables that are or can be fixed from the beginning. We do this in our implementation. However, in order to be able to certify optimality by Lemma 4, one needs to trace back these substitutions to recover the moment sequences $y$ from the solution of the dual problem. Removing these fixed variables occasionally leads to equality constraints $0 = 0$ in the SDP. We remove those constraints for better conditioning.

5 Numerical experiments

In this section we provide some examples to illustrate the performance of our approach and to compare it with others. It is natural to compare Sparse-BSOS with its dense counterpart BSOS [6] whenever possible. We also compare Sparse-BSOS with the sparse standard SOS-hierarchy [24] (Sparse-PUT). We use the following notation to refer both to the different SDP relaxations of problem (P) and to the particular implementations used in our experiments. Note that for the sparse versions we suppose that Assumption 1 holds.

• BSOS: The implementation of the dense version of BSOS as described in [6]:

$$\sup_{t,\,Q,\,\lambda} \left\{ t \ \text{ s.t. } \begin{aligned} &f(x(\tau)) - t - \sum_{(\alpha,\beta)\in \mathbb{N}^{2m}_d} \lambda_{\alpha\beta}\, h_{\alpha\beta}(x(\tau)) - \langle Q, v_k(x(\tau))\,(v_k(x(\tau)))^T\rangle = 0, \quad \tau = 1,\dots,|\mathbb{N}^n_{d_{\max}}|,\\ &Q \in \mathcal{S}^{|\mathbb{N}^n_k|}_+, \quad \lambda \in \mathbb{R}^{|\mathbb{N}^{2m}_d|}_+, \quad t \in \mathbb{R} \end{aligned} \right\},$$

where $d$ is the relaxation order, $k$ is a parameter fixed in advance to control the size of the psd variable $Q$, $v_k$ is the vector of the canonical (monomial) basis of the vector space $\mathbb{R}[x]_k$ understood as a function $\mathbb{R}^n \to \mathbb{R}^{|\mathbb{N}^n_k|}$, and $\{x(\tau) : \tau = 1,\dots,|\mathbb{N}^n_{d_{\max}}|\}$ is a set of generic points in $[-1,1]^n$. We emphasize that BSOS uses sampling to implement the equality constraints in the above problem (see the discussion in §3.1). We use the same implementation as in the numerical experiments of [6]. As explained in [6, Section 3], some of the constraints in the SDP stated above might be nearly redundant, i.e. they might be "almost" linearly dependent on others. Such constraints are removed before handing the problem over to the SDP solver. A close look at the code also reveals that the maximum
