Sparse-BSOS: a bounded degree SOS hierarchy for large scale polynomial optimization with sparsity

HAL Id: hal-01341931 (https://hal.archives-ouvertes.fr/hal-01341931v3), submitted on 26 May 2017


To cite this version:

Tillmann Weisser, Jean-Bernard Lasserre, Kim-Chuan Toh. Sparse-BSOS: a bounded degree SOS hierarchy for large scale polynomial optimization with sparsity. Mathematical Programming Computation, Springer, 2018, 10, pp. 1–32. ⟨hal-01341931v3⟩


Sparse-BSOS: a bounded degree SOS hierarchy for large scale polynomial optimization with sparsity

Tillmann Weisser∗†, Jean B. Lasserre‡ and Kim-Chuan Toh§

May 26, 2017

Abstract

We provide a sparse version of the bounded degree SOS hierarchy BSOS [6] for polynomial optimization problems. It permits one to treat large scale problems which satisfy a structured sparsity pattern. When the sparsity pattern satisfies the running intersection property, this Sparse-BSOS hierarchy of semidefinite programs (with semidefinite constraints of fixed size) converges to the global optimum of the original problem. Moreover, for the class of SOS-convex problems, finite convergence takes place at the first step of the hierarchy, just as in the dense version.

Keywords: Global Optimization, Semidefinite Programming, Sparsity, Large Scale Problems, Convex Relaxations, Positivity Certificates

MSC: 90C26, 90C22

1 Introduction

We consider the polynomial optimization problem:

$$f^\ast := \min_x \{\, f(x) : x \in K \,\} \tag{P}$$

where $f \in \mathbb{R}[x]$ is a polynomial and $K \subset \mathbb{R}^n$ is the basic semi-algebraic set

$$K := \{\, x \in \mathbb{R}^n : g_j(x) \ge 0,\ j = 1,\dots,m \,\}, \tag{1}$$

for some polynomials $g_j \in \mathbb{R}[x]$, $j = 1,\dots,m$. In [6] Lasserre et al. have provided BSOS, a new hierarchy of semidefinite programs $(Q^k_d)$ indexed by $d \in \mathbb{N}$ and parametrized by $k \in \mathbb{N}$ (fixed), whose associated (monotone non-decreasing) sequence of optimal values $(\rho^k_d)_{d\in\mathbb{N}}$ converges to $f^\ast$ as $d \to \infty$, i.e., $\rho^k_d \uparrow f^\ast$ as $d \to \infty$.

One distinguishing feature of the BSOS hierarchy (when compared to the LP-hierarchy defined in [8]) is finite convergence for an important class of convex problems. That is, when $f, -g_j$ are SOS-convex polynomials of degree bounded by $2k$, then the first semidefinite relaxation of

∗The work of the first author is partially supported by a PGMO grant from Fondation Mathématique Jacques Hadamard and a grant from the ERC council for the Taming project (ERC-Advanced Grant #666981 TAMING).

†LAAS-CNRS, University of Toulouse, LAAS, 7 Avenue du Colonel Roche, 31031 Toulouse cedex 4, France (tweisser@laas.fr).

‡LAAS-CNRS and Institute of Mathematics, University of Toulouse, LAAS, 7 Avenue du Colonel Roche, 31031 Toulouse cedex 4, France (lasserre@laas.fr).

§Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076 (mattohkc@nus.edu.sg).


the hierarchy $(Q^k_d)_{d\in\mathbb{N}}$ is exact, i.e., $\rho^k_1 = f^\ast$. (In contrast, the LP-hierarchy cannot converge in finitely many steps for such convex problems.)

The BSOS hierarchy also has two other important distinguishing features, now when compared to the standard SOS hierarchy defined in [7, 11, 12] (let us call it PUT, as it is based on Putinar's Positivstellensatz).

• For each semidefinite relaxation $Q^k_d$, $d \in \mathbb{N}$, the size of the semidefinite constraint is $O(n^k)$, hence fixed and controlled by the parameter $k$ (fixed and chosen by the user). With $k = 0$ one retrieves the LP-hierarchy based on a positivity certificate due to Stengle; see [6] and [8] for more details.

• For the important class of quadratic/quadratically constrained problems, the first relaxation of the BSOS hierarchy is at least as good as the first relaxation of the PUT hierarchy.

In this paper we introduce a sparse version of the BSOS hierarchy to help solve large scale polynomial optimization problems that exhibit some structured sparsity pattern. This hierarchy has exactly the same advantages as BSOS, now when compared with the sparse version of the standard SOS hierarchy introduced in Waki et al. [24].

Motivation

In general the standard SOS hierarchy PUT [7] is considered very efficient (and hard to beat) when it can be implemented. For instance, PUT's first relaxation solves at optimality several small size instances of the Optimal Power Flow problem (OPF), an important problem in the management of energy networks (essentially a quadratic/quadratically constrained optimization problem); see e.g. [16]. But even for relatively small size OPFs, PUT's second relaxation cannot be implemented because the size of some matrices required to be positive semidefinite (psd) is too large for state-of-the-art semidefinite solvers.

This was an important motivation for introducing the BSOS hierarchy [6]: that is, to provide an alternative to PUT, especially in cases where PUT's second relaxation cannot be implemented. In particular, in all relaxations of the BSOS hierarchy (with parameter $k = 1$) the size of the matrix required to be psd is only $O(n)$ (in contrast to $O(n^2)$ for PUT's second relaxation). Thus for quadratic/quadratically constrained problems where PUT's second relaxation cannot be implemented, the second and even higher order relaxations of BSOS can be implemented and provide better lower bounds.¹ When they are strictly less than the optimal value, these lower bounds might still be useful as they can be exploited in some other procedure.

The motivation for introducing the sparse version of BSOS is exactly the same as for BSOS, but now for large scale polynomial optimization problems that exhibit some sparsity pattern.

For such problems the sparse version of the SOS hierarchy introduced in Waki et al. [24] (which we call "Sparse-PUT" in the sequel) is in general very efficient, in particular when its second relaxation can be implemented. So there is a need to provide an alternative to the latter, especially in cases where its second relaxation cannot be implemented, for instance for some large scale OPF problems (where indeed the second relaxation cannot be implemented, at least in its initial form), e.g. as described in [15].

¹Recently Marandi et al. [14] have shown that BSOS provides better bounds than the first relaxation of PUT for the Pooling Problem, another instance of quadratic/quadratically constrained problems. However their experiments also show that BSOS (i) does not always solve these difficult problems to optimality at a low relaxation order "d" and (ii) does not scale well.


Contribution

Even though in the BSOS hierarchy [6] the size $O(n^k)$ of the semidefinite constraint of $Q^k_d$ is fixed for all $d$ and permits one to handle problems (P) of larger size than with the standard SOS-hierarchy, its application is still limited to problems of relatively modest size (say medium size problems).

Our contribution is to provide a sparse version "Sparse-BSOS" of the BSOS hierarchy which permits one to handle large size problems (P) that satisfy some (structured) sparsity pattern. The Sparse-BSOS hierarchy is to its dense version BSOS what the Sparse-PUT hierarchy is to the standard SOS-hierarchy PUT. Again as in the dense case, a distinguishing feature of the Sparse-BSOS hierarchy (and in contrast to Sparse-PUT) is that the size of the resulting semidefinite constraints is fixed in advance at the user's convenience and does not depend on the rank in the hierarchy. However, such an extension is not straightforward because, in contrast to Putinar's SOS-based certificate [18] (where the $g_j$'s appear separately), the positivity certificate used in the dense BSOS algorithm [6] potentially mixes all polynomials $g_j$ that define $K$; that is, if $f$ is positive on $K$ then

$$f = \sigma + \sum_{\alpha,\beta \in (\mathbb{N}_0)^m} c_{\alpha\beta} \prod_{j=1}^{m} g_j^{\alpha_j}\,(1 - g_j)^{\beta_j}, \tag{2}$$

for some SOS polynomial $\sigma$ and positive scalar weights $c_{\alpha\beta}$. Therefore in principle the sparsity as defined in [24] may be destroyed in $\sigma$ and in the products $\prod_j g_j^{\alpha_j}(1-g_j)^{\beta_j}$. In fact, one contribution of this paper is to provide a specialized sparse version of (2). In particular, we prove that if the sparsity pattern satisfies the so-called Running Intersection Property (RIP) then the Sparse-BSOS hierarchy also converges to the global optimum $f^\ast$ of (P). A sufficient rank-condition also permits one to detect finite convergence. Last but not least, we also prove that the Sparse-BSOS hierarchy preserves two distinguishing features of the dense BSOS hierarchy.

Namely:

• Finite convergence at the first step of the hierarchy for the class of SOS-convex problems.

(Recall that the standard LP hierarchy cannot converge in finitely many steps for such problems [8, 12].)

• For quadratic/quadratically constrained problems (with a sparsity pattern), the first relaxation of the Sparse-BSOS hierarchy is at least as good as that of Sparse-PUT. Hence when Sparse-PUT's second relaxation cannot be implemented, the Sparse-BSOS hierarchy (with parameter $k = 1$) will provide better lower bounds.

This also suggests that the Sparse-BSOS relaxations could be useful in Branch & Bound algorithms for solving large scale Mixed Integer Non-Linear Programs (MINLP). Indeed for such algorithms the quality of the lower bounds computed at each node of the search tree is crucial for the overall computational efficiency.

The sparsity pattern. Roughly speaking, we say that problem (P) satisfies a structured sparsity pattern if the set $I_0 := \{1,\dots,n\}$ of all (indices of) variables is a union $\bigcup_{\ell=1}^{p} I_\ell$ of smaller blocks of (indices of) variables $I_\ell$, such that each monomial of the objective function only involves variables from one of the blocks. In addition, each polynomial $g_j$ in the definition (1) of the feasible set is also a polynomial only in the variables of one of the blocks. Of course the blocks $I_1,\dots,I_p$ may overlap, i.e., variables may appear in several blocks. Together with the maximum degree appearing in the data of (P), the number and size of the blocks $I_1,\dots,I_p$, as well as the size of their overlaps, are the characteristics of the sparsity pattern which have the strongest influence on the performance of our algorithm. A small example is given below.
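As a small illustration (our example, not taken from the paper), consider $n = 4$ variables with blocks

$$I_1 = \{1,2\}, \qquad I_2 = \{2,3\}, \qquad I_3 = \{3,4\},$$

and the problem data

$$f = x_1 x_2 + x_2^2 x_3 + x_3 x_4, \qquad g_1 = 1 - x_1^2 - x_2^2, \quad g_2 = 1 - x_2^2 - x_3^2, \quad g_3 = 1 - x_3^2 - x_4^2.$$

Every monomial of $f$ and every constraint lives in a single block, and the Running Intersection Property of Assumption 1 below holds: $I_2 \cap I_1 = \{2\} \subseteq I_1$ and $I_3 \cap (I_1 \cup I_2) = \{3\} \subseteq I_2$.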


Computational experiments. We have tested the Sparse-BSOS hierarchy against both its dense version BSOS and Sparse-PUT [24]. To compare Sparse-BSOS and BSOS we consider problems of small and medium size (≤ 100 variables) and different sparsity patterns. We also show that Sparse-BSOS can solve sparse large scale problems with up to 3000 variables, which is by far out of the scope of the dense version, in reasonable time on a standard laptop.²

To compare Sparse-BSOS and Sparse-PUT we have considered several problems from the literature in non-linear optimization and random medium size non-linear problems. For several such problems the first or second relaxation of the Sparse-PUT hierarchy cannot be solved, whereas the first Sparse-BSOS relaxations $d = 1, 2, \dots$ with a small parameter $k = 1$ or $2$ can.³ Therefore in such cases a good approximation (or even the optimal value) can be obtained by the Sparse-BSOS hierarchy. In our numerical experiments the problems were chosen in such a way that only the first relaxation of Sparse-PUT can be implemented, and indeed for such cases the Sparse-BSOS relaxations provide better lower bounds (and sometimes even the optimal value).

2 Preliminaries

2.1 Notation and definitions

Let $\mathbb{R}[x]$ be the ring of polynomials in the variables $x = (x_1,\dots,x_n)$. Denote by $\mathbb{R}[x]_d \subset \mathbb{R}[x]$ the vector space of polynomials of degree at most $d$, which has dimension $s(d) := \binom{n+d}{d}$, with e.g. the usual canonical basis $(x^\gamma)_{\gamma\in\mathbb{N}^n_d}$ of monomials, where $\mathbb{N}^n_d := \{\gamma \in (\mathbb{N}_0)^n : |\gamma| \le d\}$, $\mathbb{N}_0$ is the set of natural numbers including 0, and $|\gamma| := \sum_{i=1}^{n} \gamma_i$. Also, denote by $\Sigma[x] \subset \mathbb{R}[x]$ (resp. $\Sigma[x]_d \subset \mathbb{R}[x]_{2d}$) the cone of sums of squares (SOS) polynomials (resp. SOS polynomials of degree at most $2d$). If $f \in \mathbb{R}[x]_d$, we write $f(x) = \sum_{\gamma\in\mathbb{N}^n_d} f_\gamma x^\gamma$ in the canonical basis and denote by $\mathbf{f} = (f_\gamma)_\gamma \in \mathbb{R}^{s(d)}$ its vector of coefficients. Finally, let $\mathcal{S}^n$ denote the space of $n \times n$ real symmetric matrices, with inner product $\langle A, B\rangle = \operatorname{trace} AB$. We use the notation $A \succeq 0$ to denote that $A$ is positive semidefinite (psd). With $g_0 := 1$, the quadratic module $Q(g_1,\dots,g_m) \subset \mathbb{R}[x]$ generated by polynomials $g_1,\dots,g_m$ is defined by

$$Q(g_1,\dots,g_m) := \left\{ \sum_{j=0}^{m} \sigma_j g_j \;:\; \sigma_j \in \Sigma[x] \right\}.$$

With a real sequence $y = (y_\gamma)_{\gamma\in\mathbb{N}^n_d}$, one may associate the linear functional $L_y : \mathbb{R}[x]_d \to \mathbb{R}$ defined by

$$f = \sum_\gamma f_\gamma x^\gamma \;\longmapsto\; L_y(f) := \sum_\gamma f_\gamma y_\gamma,$$

which is called the Riesz functional. If $d = 2a$, denote by $M_a(y)$ the moment matrix associated with $y$. It is a real symmetric matrix with rows and columns indexed in the basis of monomials $(x^\gamma)_{\gamma\in\mathbb{N}^n_a}$, and with entries

$$M_a(y)(\alpha,\beta) := L_y(x^{\alpha+\beta}) = y_{\alpha+\beta}, \qquad \alpha,\beta \in \mathbb{N}^n_a.$$

If $y = (y_\gamma)_{\gamma\in\mathbb{N}^n}$ is the sequence of moments of some Borel measure $\mu$ on $\mathbb{R}^n$ then $M_a(y) \succeq 0$ for all $a \in \mathbb{N}$. However the converse is not true in general, and it is related to the well-known fact

²The numerical experiments were run on a standard laptop from the year 2015. For a more detailed description see page 13.

³If in the description of (P) some degree "s" is strictly larger than 2, then for Sparse-PUT's first relaxation the size of the largest positive semidefinite variable is already at least $O(\kappa^{\lceil s/2\rceil})$, where $\kappa$ is the size of the largest clique in some graph associated with the problem; if $\kappa$ is not small this can be prohibitive. In contrast, for all the Sparse-BSOS relaxations (with parameter $k = 1$), this largest size is only $O(\kappa)$.


that there are positive polynomials that are not sums of squares. For more details the interested reader is referred to e.g. [11, Chapter 3].
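As a small illustration (ours, not from the paper): for $n = 2$ and $a = 1$, a sequence $y = (y_{00}, y_{10}, y_{01}, y_{20}, y_{11}, y_{02})$ gives the moment matrix

$$M_1(y) = \begin{pmatrix} y_{00} & y_{10} & y_{01} \\ y_{10} & y_{20} & y_{11} \\ y_{01} & y_{11} & y_{02} \end{pmatrix},$$

with rows and columns indexed by the monomials $1, x_1, x_2$.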

A polynomial $f \in \mathbb{R}[x]$ is said to be SOS-convex if its Hessian matrix $x \mapsto \nabla^2 f(x)$ is an SOS matrix-polynomial, that is, $\nabla^2 f = L L^T$ for some real matrix polynomial $L \in \mathbb{R}[x]^{n\times a}$ (for some integer $a$). In particular, for SOS-convex polynomials and sequences $y$ with positive semidefinite moment matrix $M_a(y) \succeq 0$, a Jensen-type inequality is valid:

Lemma 1. Let $f \in \mathbb{R}[x]_{2a}$ be SOS-convex and let $y = (y_\gamma)_{\gamma\in\mathbb{N}^n_{2a}}$ be such that $y_0 = 1$ and $M_a(y) \succeq 0$. Then

$$L_y(f) \ \ge\ f(L_y(x)), \qquad \text{with } L_y(x) := (L_y(x_1),\dots,L_y(x_n)).$$

For a proof see Theorem 13.21, p. 209 in [12].
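To see the mechanism at work on a toy case (our illustration, not from the paper), take $n = 1$, $a = 1$ and the SOS-convex polynomial $f = x^2$. Then

$$M_1(y) = \begin{pmatrix} 1 & y_1 \\ y_1 & y_2 \end{pmatrix} \succeq 0 \;\Longrightarrow\; y_2 \ge y_1^2, \qquad \text{i.e.}\quad L_y(f) = y_2 \ \ge\ y_1^2 = f(L_y(x)).$$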

2.2 A sparsity pattern

Given $I \subset \{1,\dots,n\}$, denote by $\mathbb{R}[x; I]$ the ring of polynomials in the variables $\{x_i : i \in I\}$, which we understand as a subring of $\mathbb{R}[x]$. Hence, a polynomial $g \in \mathbb{R}[x; I]$ canonically induces two polynomial functions $g : \mathbb{R}^{\#I} \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}$, where $\#I$ is the number of elements of $I$.

Assumption 1 (Sparsity pattern). There exist $p \in \mathbb{N}$ and subsets $I_\ell \subseteq \{1,\dots,n\}$ and $J_\ell \subseteq \{1,\dots,m\}$ for all $\ell \in \{1,\dots,p\}$, such that:

• $f = \sum_{\ell=1}^{p} f^\ell$, for some $f^1,\dots,f^p$ such that $f^\ell \in \mathbb{R}[x; I_\ell]$, $\ell \in \{1,\dots,p\}$;

• $g_j \in \mathbb{R}[x; I_\ell]$ for all $j \in J_\ell$ and $\ell \in \{1,\dots,p\}$;

• $\bigcup_{\ell=1}^{p} I_\ell = \{1,\dots,n\}$;

• $\bigcup_{\ell=1}^{p} J_\ell = \{1,\dots,m\}$;

• for all $\ell = 1,\dots,p-1$ there is an $s \le \ell$ such that $I_{\ell+1} \cap \bigcup_{r=1}^{\ell} I_r \subseteq I_s$ (Running Intersection Property).

On the RIP. The RIP can be understood in the following framework. Suppose that we are given probability measures $(\mu_\ell)_{\ell=1,\dots,p}$, where each $\mu_\ell$ only sees the variables $\{x_i : i \in I_\ell\}$, and the $\mu_\ell$'s are consistent in the sense that if $I_{\ell,k} := I_\ell \cap I_k \ne \emptyset$ then $\mu_{\ell,k} = \mu_{k,\ell}$, where $\mu_{\ell,k}$ (resp. $\mu_{k,\ell}$) is the marginal of $\mu_\ell$ (resp. $\mu_k$) with respect to the variables $\{x_i : i \in I_{\ell,k}\}$. If the RIP holds, then from the $\mu_\ell$'s one is able to build up a probability measure $\phi$ that sees all variables $x_1,\dots,x_n$, and such that for every $\ell = 1,\dots,p$ the marginal $\phi_\ell$ of $\phi$ with respect to the variables $\{x_i : i \in I_\ell\}$ is exactly $\mu_\ell$. Put differently, the "local" information $(\mu_\ell)$ is part of a "consistent and global" information $\phi$, without knowing $\phi$. In this setup the RIP condition appears naturally when one tries to build up $\phi$ from the $\mu_\ell$'s; see e.g. the proof in Lasserre [10].

Note that if the sets $I_1,\dots,I_p$ satisfy all requirements of Assumption 1 except the Running Intersection Property (RIP), it is always possible to enlarge the $I_\ell$ until the RIP is satisfied. In the worst case however $I_\ell = \{1,\dots,n\}$, which is not interesting. In [24] no assumption on the sparsity is made. Instead, Waki et al. provide an algorithm to create a sparsity pattern satisfying Assumption 1 from any POP. They consider a graph with nodes $\{1,\dots,n\}$ which has an edge $(i,j)$ if the objective function has a monomial involving $x_i$ and $x_j$, or there is a constraint involving $x_i$ and $x_j$. If this graph is chordal, the sets $I_\ell$ correspond to the maximal cliques of this graph (and hence we sometimes call the $I_\ell$ "cliques"). If the graph is not chordal, it is replaced by its chordal extension; a sketch of this construction is given below.


Finally and importantly, even if the RIP does not hold, the Sparse-PUT and Sparse-BSOS hierarchies can still be implemented and provide valid lower bounds on the global optimum $f^\ast$. However, convergence of the lower bounds to $f^\ast$ is no longer guaranteed.

2.3 A preliminary result

For the uninitiated reader we start this section by citing two important Positivstellensätze. For polynomials $g_1,\dots,g_m \in \mathbb{R}[x]$ and $\alpha,\beta \in (\mathbb{N}_0)^m$, let $h_{\alpha\beta} \in \mathbb{R}[x]$ be the polynomial

$$x \;\mapsto\; h_{\alpha\beta}(x) := \prod_{j=1}^{m} g_j(x)^{\alpha_j}\,(1 - g_j(x))^{\beta_j}, \qquad x \in \mathbb{R}^n.$$

Lemma 2 (Krivine/Stengle/Vasilescu/Handelman Positivstellensatz [5, 19, 23, 1]). Let $f, g_1,\dots,g_m \in \mathbb{R}[x]$ and let $K = \{x \in \mathbb{R}^n : 0 \le g_j(x) \le 1,\ j = 1,\dots,m\}$ be compact. If the polynomials $1$ and $(g_j)_{j=1,\dots,m}$ generate $\mathbb{R}[x]$ as an $\mathbb{R}$-algebra and $f$ is (strictly) positive on $K$, then there exist finitely many positive weights $c_{\alpha\beta}$ such that

$$f = \sum_{\alpha,\beta \in (\mathbb{N}_0)^m} c_{\alpha\beta}\, h_{\alpha\beta}.$$

We next provide a sparse version of this Positivstellensatz which, to the best of our knowledge, is new. Before that, we recall the sparse version of Putinar's Positivstellensatz [18] proved in [10].

Lemma 3 (Sparse Putinar Positivstellensatz [4]). Let $f, g_1,\dots,g_m \in \mathbb{R}[x]$ satisfy Assumption 1, and let the polynomials $1 - M_\ell^{-1}\sum_{i\in I_\ell} x_i^2$, for some positive numbers $M_\ell$, be among the $g_j$. If $f$ is (strictly) positive on $K = \{x \in \mathbb{R}^n : g_j(x) \ge 0,\ j = 1,\dots,m\}$, then $f = \sum_{\ell=1}^{p} f^\ell$ for some polynomials $f^\ell \in \mathbb{R}[x; I_\ell]$, $\ell = 1,\dots,p$, and

$$f^\ell \;\in\; \left\{ \sigma_0 + \sum_{j\in J_\ell} \sigma_j g_j \;:\; \sigma_0, \sigma_j \in \Sigma[x; I_\ell] \right\}.$$

From now on, we assume that $K$ is described by polynomials $g_1,\dots,g_m$ such that

$$K = \{x \in \mathbb{R}^n : 0 \le g_j(x) \le 1,\ j = 1,\dots,m\}. \tag{3}$$

Note that this is not a restriction when $K$ defined in (P) is compact, as the constraint polynomials can be scaled by a positive factor without adding or losing information.

Let Assumption 1 hold, define $n_\ell := |I_\ell|$, $m_\ell := |J_\ell|$, and let $K_\ell \subset \mathbb{R}^{n_\ell}$, $\ell = 1,\dots,p$, be the sets

$$K_\ell := \{x \in \mathbb{R}^{n_\ell} : 0 \le g_j(x) \le 1,\ j \in J_\ell\}, \qquad \ell = 1,\dots,p. \tag{4}$$

Define $\pi_\ell : \mathbb{R}^n \to \mathbb{R}^{n_\ell}$, $x \mapsto (x_i)_{i\in I_\ell}$. Then

$$K = \{x \in \mathbb{R}^n : \pi_\ell(x) \in K_\ell \text{ for all } \ell = 1,\dots,p\}. \tag{5}$$

Assumption 2. (a) The sets $K_\ell$ defined in (4) are compact, and the polynomials $1$ and $(g_j)_{j\in J_\ell}$ generate $\mathbb{R}[x; I_\ell]$ as an $\mathbb{R}$-algebra for each $\ell = 1,\dots,p$.

(b) For each $\ell$ there exist $M_\ell > 0$ and $j \in J_\ell$ such that $g_j = 1 - M_\ell^{-1}\sum_{i\in I_\ell} x_i^2$.

Note that if Assumption 2(a) holds, then Assumption 2(b) can be satisfied easily. Indeed, since $K_\ell$ is assumed to be compact, the polynomial $x \mapsto \sum_{i\in I_\ell} x_i^2$ attains its maximum $M_\ell$ on $K_\ell$. Defining $x \mapsto g_\ell^+(x) := 1 - M_\ell^{-1}\sum_{i\in I_\ell} x_i^2$ and adding the redundant constraint $0 \le g_\ell^+(x) \le 1$ to the description of $K_\ell$ for each $\ell = 1,\dots,p$, Assumption 2(b) is satisfied.

Let $N^\ell := \{(\alpha,\beta) \in (\mathbb{N}_0)^{2m} : \operatorname{supp}(\alpha) \cup \operatorname{supp}(\beta) \subseteq J_\ell\}$, where $\operatorname{supp}(\alpha) := \{j \in \{1,\dots,m\} : \alpha_j \ne 0\}$.


Theorem 1 (Sparse Krivine-Stengle Positivstellensatz). Let $f, g_1,\dots,g_m \in \mathbb{R}[x]$ satisfy Assumptions 1 and 2(a). If $f$ is (strictly) positive on $K$, then $f = \sum_{\ell=1}^{p} f^\ell$ for some polynomials $f^\ell \in \mathbb{R}[x; I_\ell]$, $\ell = 1,\dots,p$, and there exist finitely many positive weights $c^\ell_{\alpha\beta}$ such that

$$f^\ell = \sum_{(\alpha,\beta)\in N^\ell} c^\ell_{\alpha\beta}\, h_{\alpha\beta}, \qquad \ell = 1,\dots,p. \tag{6}$$

Note that for each $\ell$ the representation (6) only involves $g_j \in \mathbb{R}[x; I_\ell]$, since the corresponding exponents in the definition of $h_{\alpha\beta}$ are $0$ if $j \notin J_\ell$.

Proof. As $f$ is positive on $K$ there exists $\varepsilon > 0$ such that $f - \varepsilon > 0$ on $K$. As remarked just after Assumption 2, we can add a redundant constraint $0 \le g_\ell^+(x) \le 1$, with $g_\ell^+ \in \mathbb{R}[x; I_\ell]$, to the description of each of the $K_\ell$ so as to satisfy Assumption 2(b). Hence we can apply Lemma 3 to obtain the representation

$$f - \varepsilon = \sum_{\ell=1}^{p} \underbrace{\Big( \sigma^\ell_0 + \sigma^\ell_+ g_\ell^+ + \sum_{j\in J_\ell} \sigma^\ell_j g_j \Big)}_{\in\, \mathbb{R}[x;\,I_\ell] \text{ and } \ge 0 \text{ on } K_\ell},$$

for some SOS polynomials $\sigma^\ell_j, \sigma^\ell_+$. Next, let

$$f^\ell := \varepsilon/p + \sigma^\ell_0 + \sigma^\ell_+ g_\ell^+ + \sum_{j\in J_\ell} \sigma^\ell_j g_j, \qquad \ell = 1,\dots,p.$$

Notice that $f = \sum_{\ell=1}^{p} f^\ell$ and each $f^\ell \in \mathbb{R}[x; I_\ell]$ is strictly positive on $K_\ell$, $\ell = 1,\dots,p$. Removing the redundant constraints $0 \le g_\ell^+ \le 1$ from the description of $K_\ell$ again does not change the fact that $f^\ell$ is strictly positive on $K_\ell$. Furthermore Assumption 2(a) still holds. Hence we can apply Lemma 2 to each of the $f^\ell$, which yields (6) for each $\ell = 1,\dots,p$.

3 Main result

3.1 The Sparse Bounded-Degree SOS-hierarchy (Sparse-BSOS)

Consider problem (P) and let Assumptions 1 and 2 hold. For $d \in \mathbb{N}$ let $N^\ell_d := \{(\alpha,\beta) \in N^\ell : \sum_j (\alpha_j + \beta_j) \le d\}$, which is of size $\binom{2m_\ell + d}{d}$. Let $k \in \mathbb{N}$ be fixed and define

$$d_{\max} := \max\{\deg(f),\ 2k,\ \max_j \deg(g_j)\}.$$

We consider the family of optimization problems indexed by $d \in \mathbb{N}$:

$$q^k_d := \sup_{t,\,\lambda^1,\dots,\lambda^p,\,f^1,\dots,f^p} \left\{ t \;:\; \begin{aligned} &f^\ell - \sum_{(\alpha,\beta)\in N^\ell_d} \lambda^\ell_{\alpha\beta}\, h_{\alpha\beta} \in \Sigma[x; I_\ell]_k, \quad \ell = 1,\dots,p,\\ &f - t = \sum_{\ell=1}^{p} f^\ell, \quad \lambda^\ell \in \mathbb{R}^{|N^\ell_d|}_+, \quad t \in \mathbb{R}, \quad f^\ell \in \mathbb{R}[x; I_\ell]_{d_{\max}} \end{aligned} \right\}, \tag{7}$$

where the scalars $\lambda^\ell_{\alpha\beta}$ are the entries of the vector $\lambda^\ell \in \mathbb{R}^{|N^\ell_d|}_+$.


Observe that when $k$ is fixed, then for each $d \in \mathbb{N}$, computing $q^k_d$ in (7) reduces to solving a semidefinite program, and $q^k_{d+1} \ge q^k_d$ for all $d \in \mathbb{N}$. Therefore (7) defines a hierarchy of semidefinite programs whose associated sequence of optimal values is monotone non-decreasing. Let us call it the Sparse-BSOS hierarchy, as it is the sparse version of the BSOS hierarchy [6].

For practical implementation of (7) there are at least two possibilities, depending on how the polynomial identities in (7) are treated in the resulting semidefinite program. To state that two polynomials $p, q \in \mathbb{R}[x]_d$ are identical one can either:

- equate their coefficients (e.g. in the monomial basis, i.e., $p_\gamma = q_\gamma$ for all $\gamma \in \mathbb{N}^n_d$), or

- equate their values (i.e. an implementation by sampling) on $\binom{n+d}{d}$ generic points (e.g. randomly generated in the box $[-1,1]^n$).

Both strategies have drawbacks. To equate coefficients one has to take powers of the polynomials $g_j$ and $(1-g_j)$, which leads to ill-conditioning of the coefficients of the polynomials $h_{\alpha\beta}$ (in the monomial basis), as some of them are multiplied by binomial coefficients which become large quickly when the relaxation order $d$ increases. On the other hand, when equating values the resulting linear system may become ill-conditioned because (depending on the points of evaluation and the $g_j$) the constraints may be nearly linearly dependent. The authors of [6] chose point evaluation for the implementation of BSOS because the SDP solver SDPT3⁴ is able to exploit the structure of the SDP generated in that way, and hence problems with psd variables of larger size can be solved. However, this feature cannot be used in our case, and so in Sparse-BSOS we have implemented the equality constraints of (7) by comparing coefficients. Indeed, equating coefficients is reasonable in the present context because we expect the number of variables $n_\ell$ in each block to be rather small. The drawback of this choice is that the resulting relaxations with high order $d$ can become time consuming (and even ill-conditioned, as explained above). A toy comparison of the two strategies is sketched below.

Define $\mathcal{I}^\ell_d := \{\gamma \in \mathbb{N}^n_d : \operatorname{supp}(\gamma) \subseteq I_\ell\}$, $\Gamma_d := \{\gamma : \gamma \in \mathcal{I}^\ell_d \text{ for some } \ell \in \{1,\dots,p\}\}$, and let $\mathbf{0}$ be the all-zero vector of size $n$. Then for $k$ fixed and for each $d$, we get

$$q^k_d = \sup_{\substack{t,\,Q^1,\dots,Q^p,\\ \lambda^1,\dots,\lambda^p,\,f^1,\dots,f^p}} \left\{ t \ \text{ s.t. } \begin{aligned} &\sum_{\ell :\, \gamma\in\mathcal{I}^\ell_{d_{\max}}} f^\ell_\gamma = f_\gamma, \quad \forall \gamma \in \Gamma_{d_{\max}}\setminus\{\mathbf 0\}, \qquad \sum_{\ell=1}^{p} f^\ell_{\mathbf 0} = f_{\mathbf 0} - t,\\ &f^\ell_\gamma - \sum_{(\alpha,\beta)\in N^\ell_d} \lambda^\ell_{\alpha\beta}\,(h_{\alpha\beta})_\gamma - \langle Q^\ell, v^\ell_k (v^\ell_k)^T\rangle_\gamma = 0, \quad \forall \gamma \in \mathcal{I}^\ell_{d_{\max}},\ \ell = 1,\dots,p,\\ &Q^\ell \in \mathcal{S}^{s(\ell,k)}_+, \quad \lambda^\ell \in \mathbb{R}^{|N^\ell_d|}_+, \quad \mathbf{f}^\ell \in \mathbb{R}^{s(\ell,d_{\max})},\ \ell = 1,\dots,p, \quad t \in \mathbb{R} \end{aligned} \right\}, \tag{8}$$

where $s(\ell,k) := \binom{n_\ell + k}{k}$, and $v^\ell_k$ is the vector of the canonical (monomial) basis of the vector space $\mathbb{R}[x; I_\ell]_k$. Here we use the convention that the coefficient $q_\gamma$ of a polynomial $q$ is $0$ if $|\gamma| > \deg(q)$. For a matrix polynomial $q = (q_{ij})_{1\le i,j\le s} \in \mathbb{R}[x]^{s\times s}$, the coefficient $q_\gamma$ is the matrix $((q_{ij})_\gamma)_{1\le i,j\le s} \in \mathbb{R}^{s\times s}$. Note that the semidefinite matrix variables have fixed size $s(\ell,k)$, independent of $d \in \mathbb{N}$. This is a crucial feature for the computational efficiency of the approach.

The dual of the semidefinite program (8) reads:

$$\tilde q^k_d := \inf_{\theta^1,\dots,\theta^p,\,y} \left\{ L_y(f) \ \text{ s.t. } \begin{aligned} &y_\gamma = \theta^\ell_\gamma, \quad \forall \ell : \gamma \in \mathcal{I}^\ell_{d_{\max}};\ \forall \gamma \in \Gamma_{d_{\max}}, \qquad y_{\mathbf 0} = 1,\\ &M_k(\theta^\ell) \succeq 0, \quad L_{\theta^\ell}(h_{\alpha\beta}) \ge 0, \quad (\alpha,\beta) \in N^\ell_d;\ \ell = 1,\dots,p,\\ &\theta^\ell \in \mathbb{R}^{s(\ell,d_{\max})},\ \ell = 1,\dots,p, \quad y \in \mathbb{R}^{|\Gamma_{d_{\max}}|} \end{aligned} \right\}. \tag{9}$$

⁴In both [6] and this paper, SDPT3 [21, 22] is used as the SDP solver.


By standard weak duality of convex optimization, $\tilde q^k_d \ge q^k_d$ for all $d \in \mathbb{N}$. Moreover, (9) is a relaxation of (P). Indeed, if $\hat x$ is feasible in (P), define $\theta^\ell_\gamma := \hat x^\gamma$ for all $\gamma \in \mathcal{I}^\ell_{d_{\max}}$ and all $\ell \in \{1,\dots,p\}$, and define $y$ according to the constraints in (9). Then $y_{\mathbf 0} = \hat x^{\mathbf 0} = 1$. Let $v^\ell_k$ be the vector of the canonical (monomial) basis of the vector space $\mathbb{R}[x; I_\ell]_k$, understood as a function $\mathbb{R}^n \to \mathbb{R}^{s(\ell,k)}$. Then $M_k(\theta^\ell) = v^\ell_k(\hat x)\,(v^\ell_k(\hat x))^T \succeq 0$. Finally, regarding the Riesz functionals, $L_{\theta^\ell}(h_{\alpha\beta}) = h_{\alpha\beta}(\hat x) \ge 0$ for all $(\alpha,\beta) \in N^\ell_d$, $\ell = 1,\dots,p$, and $L_y(f) = f(\hat x)$. Thus to every feasible point $\hat x$ of (P) corresponds a feasible point of (9) with the same value $f(\hat x)$, and therefore $f^\ast \ge \tilde q^k_d \ge q^k_d$ for all $d \in \mathbb{N}$. In fact we have an even more precise and interesting result:

Theorem 2 ([9]). Consider problem (P) and let Assumptions 1 and 2 hold. Let $k \in \mathbb{N}$ be fixed. Then the sequence $(q^k_d)_{d\in\mathbb{N}}$ defined in (7) is monotone non-decreasing and $q^k_d \uparrow f^\ast$ as $d \to \infty$.

Proof. Monotonicity of the sequence $(q^k_d)_{d\in\mathbb{N}}$ follows from its definition. Let $\varepsilon > 0$ be fixed, arbitrary. Then the polynomial $f - f^\ast + \varepsilon$ is positive on $K$. By Theorem 1 there exist some polynomials $f^1,\dots,f^p$ such that (6) holds, i.e.,

$$f - \underbrace{(f^\ast - \varepsilon)}_{t} \;=\; \sum_{\ell=1}^{p} f^\ell \qquad\text{with}\qquad f^\ell = \sum_{(\alpha,\beta)\in N^\ell} c^\ell_{\alpha\beta}\, h_{\alpha\beta}, \quad \ell = 1,\dots,p,$$

for finitely many positive weights $c^\ell_{\alpha\beta}$. Hence $(f^\ast - \varepsilon,\, f^\ell,\, c^\ell_{\alpha\beta})$ is a feasible solution of (7) as soon as $d$ is sufficiently large, and therefore $q^k_d \ge f^\ast - \varepsilon$. Combining this with $q^k_d \le f^\ast$ and noting that $\varepsilon > 0$ was arbitrary yields the desired result: $q^k_d \uparrow f^\ast$ as $d \to \infty$.

Note that we optimize over all possible representations of $f$ of the form $f = \sum_{\ell=1}^{p} f^\ell$ with $f^\ell \in \mathbb{R}[x; I_\ell]$. By Assumption 1 such a representation exists; however, it need not be unique.

We next show that a distinguishing feature of the dense BSOS hierarchy [6] is also valid for its sparse version Sparse-BSOS.

Theorem 3. Assume the feasible set $K$ in problem (P) is non-empty and let Assumptions 1 and 2 hold. Let $k \in \mathbb{N}$ be fixed and assume that for every $\ell = 1,\dots,p$, the polynomials $f^\ell$ and $-g_j$, $j \in J_\ell$, are all SOS-convex polynomials of degree at most $2k$. (If $k > 1$ we assume, with no loss of generality, that for each $\ell = 1,\dots,p$ and some sufficiently large $\kappa > 0$, the redundant (SOS-convex) constraints $1 - \kappa^{-1}\sum_{i\in I_\ell} x_i^{2k} \ge 0$, $\ell = 1,\dots,p$, are present in the description (3) of $K$.)

Then the semidefinite program (9) has an optimal solution $((\theta^{*\ell})_\ell,\, y^*)$ such that $f^\ast = \tilde q^k_1 = L_{y^*}(f)$, and $x^* := (L_{y^*}(x_1),\dots,L_{y^*}(x_n)) \in K$ is an optimal solution of (P). Hence finite convergence takes place at the first step of the hierarchy.

Proof. Let $d = 1$ (so that $d_{\max} = 2k$) and consider the semidefinite program (9). Note that $\theta^\ell = (\theta^\ell_\gamma)_{\gamma\in\mathcal{I}^\ell_{2k}}$. Recall that by Assumption 2, for every $\ell = 1,\dots,p$, there exists $j \in J_\ell$ such that $g_j(x) = 1 - M_\ell^{-1}\sum_{i\in I_\ell} x_i^2$. In addition, if $k > 1$ then there also exists $r$ such that $g_r(x) = 1 - \kappa^{-1}\sum_{i\in I_\ell} x_i^{2k}$. Hence, feasibility in (9) (with an appropriate choice of $(\alpha,\beta) \in N^\ell_d$) implies that $L_{\theta^\ell}(g_j) \ge 0$ and $L_{\theta^\ell}(g_r) \ge 0$, which in turn imply

$$L_{\theta^\ell}(x_i^2) \le M_\ell\,\theta^\ell_{\mathbf 0}\ (= M_\ell) \quad\text{and}\quad L_{\theta^\ell}(x_i^{2k}) \le \kappa\,\theta^\ell_{\mathbf 0}\ (= \kappa) \ \ (\text{if } k > 1), \qquad \forall i \in I_\ell,\ \forall \ell = 1,\dots,p,$$

where we have used that $\theta^\ell_{\mathbf 0} = y_{\mathbf 0} = 1$ for all $\ell = 1,\dots,p$. Combining this with $M_k(\theta^\ell) \succeq 0$ and invoking Proposition 2.38 in [12] yields that $|\theta^\ell_\gamma| \le \max[M_\ell, \kappa, 1]$ for every $|\gamma| \le 2k$ and $\ell = 1,\dots,p$. Consequently, the set of feasible solutions $(\theta^\ell, y)$ of (9) is bounded, hence compact. This implies that (9) has an optimal solution $(\theta^{*\ell}, y^*)$. Notice that among the constraints $L_{\theta^{*\ell}}(h_{\alpha\beta}) \ge 0$ are the constraints $L_{\theta^{*\ell}}(g_j) \ge 0$ for all $j \in J_\ell$. As the $f^\ell$ and $-g_j$ are SOS-convex, invoking Lemma 1 yields

$$f^\ell(x^\ell) \le L_{\theta^{*\ell}}(f^\ell) \quad\text{and}\quad 0 \le L_{\theta^{*\ell}}(g_j) \le g_j(x^\ell), \qquad \forall j \in J_\ell,\ \ell = 1,\dots,p,$$

where $x^\ell := (L_{\theta^{*\ell}}(x_i))_{i\in I_\ell} \in K_\ell$, $\ell = 1,\dots,p$. In addition, the constraints $y_\gamma = \theta^{*\ell}_\gamma$, for all $\ell$ such that $\gamma \in \mathcal{I}^\ell_{d_{\max}}$, imply that $(x^\ell)_i = (x^{\ell'})_i$ whenever $i \in I_\ell \cap I_{\ell'}$. Therefore, defining $x^*_i := (x^\ell)_i$ whenever $i \in I_\ell$, one obtains $g_j(x^*) \ge 0$ for all $j$, i.e., $x^* \in K$. Finally,

$$f^\ast \ \ge\ \tilde q^k_1 = L_{y^*}(f) = \sum_{\ell=1}^{p} L_{\theta^{*\ell}}(f^\ell) \ \ge\ \sum_{\ell=1}^{p} f^\ell(x^\ell) = f(x^*),$$

which shows that $x^* \in K$ is an optimal solution of (P). Hence $f^\ast = f(x^*) = \tilde q^k_1$.

3.2 Sufficient condition for finite convergence

By looking at the dual (9) of the semidefinite program (8), one obtains a sufficient condition for finite convergence. Choose $\omega \in \mathbb{N}$ minimal such that $2\omega \ge \max\{\deg(f), \deg(g_1),\dots,\deg(g_m)\}$. We have the following lemma.

Lemma 4. Let∗1, . . . ,θ∗p,y) ∈ Rs(1,dmax)× · · · ×Rs(p,dmax)×Rs be an optimal solution of (9). If rankMω∗ℓ) = 1 for every = 1, . . . , p, then q˜kd =f and x = (yγ)|γ|=1 is an optimal solution of problem (P).

Proof. If rankMω∗ℓ) = 1, then (θγ∗ℓ)|γ|≤2ω, is the vector of moments (up to order 2ω) of the Dirac measure δx at the point x := (θ∗ℓγ )|γ|=1 ∈Rn. Note that from the constraints yγ = θγ in (9)

xγ1 =xγ2, ∀ℓ1, ℓ2 :γ ∈ I11 ∩ I12.

Hence we can define xγ :=xγ for all γ such that |γ|= 1 and independent of the specific choice of such that γ ∈ I. For the same reasonx = (yγ)|γ|=1. Consequently, for allq ∈R[x;I]

q(x) =q(x) = Z

q δx =Lθ∗ℓ(q) = Ly(q)

LetjJ. The constraintsLθ∗ℓ(hαβ)≥0 imply in particularLθ∗ℓ(gj)≥0. Since deg(gj)≤2ω, 0 ≤ Lθ∗ℓ(gj) = gj(x) = gj(x), and so as jJ was arbitrary, xK. Finally, and again because deg(f)≤2ω,

fq˜kd = Ly(f) = f(x),

from which we may conclude that xK is an optimal solution of problem (P).

4 Computational issues

4.1 Comparing coefficients

As already outlined earlier, we have implemented the polynomial equality constraints in (7) by comparison of coefficients. The resulting constraints in the SDP are sparse and can be treated efficiently by the SDP solver. A crucial issue for the implementation of the Sparse-BSOS relaxations is how to equate the coefficients. The bottleneck for such an implementation is that one has to gather all occurrences of the same monomials.


As in [6], we use the following data format for representing a polynomial $f$ in $n$ variables:

F(i,1:n+1) = [γᵀ, f_γ],

stating that $f_\gamma$ is the $i$th coefficient of $f$, corresponding to the monomial $x^\gamma$. Adding two polynomials is done by concatenating their representations. Hence, equating the coefficients of $x^\gamma$ basically amounts to finding all indices $i$ of a polynomial F such that F(i,1:n) = γᵀ.

Matlab provides the function ismember(A,B,'rows') to find a row A in a matrix B. This however is too slow for our purpose. Instead of using this function, we reduce the problem to finding all equal entries of a vector, which can be handled much more efficiently. To that end we multiply F(:,1:n) by a random vector. Generically this results in a vector whose entries are different if and only if the corresponding rows of F(:,1:n) are different.
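A minimal sketch of this trick (our illustration; the matrix F below is hypothetical data in the format just described):

```matlab
% Random-hashing trick: group equal monomials without ismember(...,'rows').
n = 3;
F = [1 0 0  2.0;    %  2*x1
     0 1 1 -1.0;    % -x2*x3
     1 0 0  0.5];   %  0.5*x1 (same monomial as row 1)

r = rand(n,1);                  % random vector
h = F(:,1:n) * r;               % one scalar per row; generically equal
                                % iff the exponent rows are equal
[~, ia, idx] = unique(h);       % group rows with equal hash value
G = [F(ia,1:n), accumarray(idx, F(:,n+1))];   % one row per monomial,
                                % with coefficients of duplicates summed
disp(G)                         % rows [1 0 0 2.5] and [0 1 1 -1.0],
                                % in an order depending on the hash
```

Identical exponent rows produce bitwise-identical hash values, so no tolerance is needed; distinct rows collide only on a measure-zero set of random vectors r.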

4.2 Reducing problem size

By looking at (8) more closely one may reduce the number of free variables and the number of constraints. It is likely that there are some indices $i \in \{1,\dots,n\}$ that only appear in one of the $I_\ell$, say $i \in I_{\ell_i}$. Hence, for all $\gamma \in \bigcup_{j=1}^{p} \mathcal{I}^j_{d_{\max}}$ such that $\gamma_i \ne 0$, the corresponding equality constraint in (8) reduces to $f^{\ell_i}_\gamma = f_\gamma$. Consequently, there is a number of variables that are or can be fixed from the beginning. We do this in our implementation. However, in order to be able to certify optimality by Lemma 4, one needs to trace back these substitutions to recover the moment sequences $y$ from the solution of the dual problem. Removing these fixed variables occasionally leads to equality constraints $0 = 0$ in the SDP. We remove those constraints for better conditioning.

5 Numerical experiments

In this section we provide some examples to illustrate the performance of our approach and to compare it with others. It is natural to compare Sparse-BSOS with its dense counterpart BSOS [6] whenever possible. We also compare Sparse-BSOS with the sparse standard SOS-hierarchy [24] (Sparse-PUT). We use the following notation to refer both to the different SDP relaxations of problem (P) and to the particular implementations used in our experiments. Note that for the sparse versions we suppose that Assumption 1 holds.

• BSOS: The implementation of the dense version of BSOS as described in [6]:

$$\sup_{t,\,Q,\,\lambda} \left\{ t \ \text{ s.t. } \begin{aligned} &f(x(\tau)) - t - \sum_{(\alpha,\beta)\in \mathbb{N}^{2m}_d} \lambda_{\alpha\beta}\, h_{\alpha\beta}(x(\tau)) - \langle Q, v_k(x(\tau))\,(v_k(x(\tau)))^T\rangle = 0, \quad \tau = 1,\dots,|\mathbb{N}^n_{d_{\max}}|,\\ &Q \in \mathcal{S}^{|\mathbb{N}^n_k|}_+, \quad \lambda \in \mathbb{R}^{|\mathbb{N}^{2m}_d|}_+, \quad t \in \mathbb{R} \end{aligned} \right\},$$

where $d$ is the relaxation order, $k$ is a parameter fixed in advance to control the size of the psd variable $Q$, $v_k$ is the vector of the canonical (monomial) basis of the vector space $\mathbb{R}[x]_k$ understood as a function $\mathbb{R}^n \to \mathbb{R}^{|\mathbb{N}^n_k|}$, and $\{x(\tau) : \tau = 1,\dots,|\mathbb{N}^n_{d_{\max}}|\}$ is a set of generic points in $[-1,1]^n$. We emphasize that BSOS uses sampling to implement the equality constraints in the above problem (see the discussion in §3.1). We use the same implementation as in the numerical experiments of [6]. As explained in [6, Section 3], some of the constraints in the SDP stated above might be nearly redundant, i.e. they might be "almost" linearly dependent on others. Such constraints are removed before handing the problem over to the SDP solver. A close look at the code also reveals that the maximum
