

Generation of polynomial inequalities as invariants

Yannick Zakowski

To cite this version:

Yannick Zakowski. Generation of polynomial inequalities as invariants. Symbolic Computation [cs.SC]. 2013. dumas-00854834


Master research Internship

Internship report

Generation of polynomial inequalities as invariants

Author: Yannick Zakowski
Supervisor: David Cachera
Celtique team

Abstract

Embedded software in critical systems raises a need for software analysis, especially for guaranteeing safety properties. In the late seventies, Cousot and Cousot introduced a general framework, called abstract interpretation, dedicated to the conception of particular analyses: static analyses. Among the program properties of interest, discovering algebraic relationships between variables allows for proving the absence of run-time errors. While the inference of linear relationships is efficiently solved, computing polynomial equalities as invariants is still a challenge, a fortiori its inequality counterpart. After a brief overview of existing techniques for invariant inference, this report investigates the viability of a constraint-based method dedicated to the inference of polynomial inequalities as program invariants.


Contents

1 Introduction
2 Overview of invariant inference
  2.1 The static analysis framework
  2.2 Linear invariants
  2.3 Polynomial equalities as invariants
  2.4 Polynomial inequalities as invariants
3 A constraint-based approach through successive relaxations
  3.1 Relational semantics and loop invariant
  3.2 A bridge to computability: the parametric abstraction
  3.3 Lagrangian relaxation
4 Invariant generation as solutions of a set of constraints
  4.1 A targeted framework: semidefinite programming
  4.2 Polynomial inequalities
  4.3 Yalmip
  4.4 Bilinear parametrization
  4.5 Necessity of a heuristic
  4.6 Guiding the solver
  4.7 A generic ellipsoid abstract domain for linear time invariant systems
  4.8 An attempt at generalization
5 Related work
  5.1 Strengthening
  5.2 Program analysis as SAT constraints solving
  5.3 Fixpoints of abstract semantics as solutions of mathematical programs
  5.4 Optimal abstraction on real-valued programs


1 Introduction

During the last decades, embedded software took place in countless industries. Consequently, bugs grew potentially costly, or even life-threatening: flight controllers, automotive brakes or even nuclear safety systems constitute critical environments where software has gained a prominent role. Hence, the craving for software reliability grew alongside.

Software reliability can be ensured by different methods. The presumably oldest and currently most used in industry is testing. Nevertheless, such a method can do no more than increase our confidence in the tested software, while we would like to have access to a formal, mathematical proof of correctness. Unfortunately, this problem has been proven undecidable. We therefore weaken our expectations: we seek an analysis which exclusively accepts safe programs, even if this implies that some safe programs are rejected. In the late seventies, Cousot and Cousot laid the foundations of the domain by introducing a general framework called Abstract Interpretation [14]. This framework formalizes the notion of property and its over-approximation, by defining the adequate mathematical structures. Not only does this framework permit one to deeply understand the problem of static analysis, it also cuts up the problem of developing a static analysis into several recurrent subproblems. The community of software static analysis can therefore develop algorithms and ideas reusable in numerous analyses.

We focus in this work on particular properties, namely numerical properties between variables. Non-relational as well as linear properties are quite well handled by existing analyses. On the other hand, the non-linear case remains difficult when it comes to inference.

In this report, we focus on program invariants expressed as polynomial inequalities. While no satisfying solutions exist at the moment, a few trails of extensions to polynomial inequalities as invariants have been suggested in works restricted to lower degrees or to properties easier than invariants. We therefore investigated those works, and more specifically a constraint-based approach by Cousot [13]: invariance is characterized by constraints, which we progressively relax into weaker ones until reaching a system we are able to solve efficiently.

While this process was originally developed for an easier task, ranking function inference, its extension to invariant inference has been suggested by Cousot. In this report, our contributions on this subject are the following: we investigated the viability of the extension, both through theoretical insights and experiments; we identified the main challenges as the necessity of getting rid of bilinearity in the constraints and the task of guiding the solver toward semantic relevance; we suggested trails toward the resolution of these issues.

The remainder of this report is organized as follows. A first section is dedicated to background in invariant inference. We then focus more specifically on the work we investigated, by exposing the successive relaxations allowing us to abstract the semantics as a set of constraints. Section 4 presents our analysis of the problem, coupled with some background in mathematical programming, as well as the experimental limitations we encountered. Finally, we present a few interesting related works in Section 5, before concluding.

2 Overview of invariant inference

2.1 The static analysis framework

The purpose of a static analysis is to compute a program property without executing the program. In order to facilitate this work, Cousot and Cousot introduced in 1977 a mathematical framework [14], partially systematizing the conception of analyses by identifying recurrent subproblems, thus allowing generic solutions. We informally introduce this framework throughout this section.

2.1.1 Programming model

The analyses we consider act on programs. We therefore need a model to formally represent a generic language. The model used by authors varies slightly from one paper to another, hence we introduce for this section a model general enough to easily present several works here. A more specific model dedicated to our main object of study will be presented in the next section.

A program is modeled as a graph in which nodes represent program points (or Locations). Assignments are represented by an edge labeled with the assignment, while conditionals require two different edges starting from the current node, labeled respectively with the test and its negation. Similarly, a loop induces a back-edge in the graph.

2.1.2 An over-approximated world

We consider a program involving n variables, taking their values among the reals; we define a state as a vector of R^n, representing the memory at a given time during execution. For convenience, we will write boldface x for vectors of reals or variables, while writing plain x for single variables or reals.

We would like to extract some information about this program, without executing it. More specifically, we want to safely establish numerical properties over variables which are invariant at a given program point: they hold through any execution.

In order to do so, we would ideally like to compute, given the set of possible initial memory states, the set of possible states at a given program point: this is known as the collecting semantics. On our programming model, defining the collecting semantics consists in following the flow defined by the graph, and iteratively collecting information from each instruction in order to build a function from Locations to P(R^n). A first challenge arises from conditionals: they appear as two branches in the graph, which reach a common program point. We therefore need to be able to collect the information grabbed from each branch. Furthermore, we have to collect the inductive information induced by a loop: this is solved by computing a fixpoint, i.e. a set of states stable for the semantics of the loop body.

Figure 1: A property P, its approximation P_approx and some erroneous sets of states Φ_i. The analysis correctly claims the program to be safe with respect to Φ_1 and unsafe with respect to Φ_2, but erroneously qualifies it as unsafe with respect to Φ_3.

Unfortunately, the collecting semantics is known not to be computable. More generally, if we refer to a property as a subset K ⊆ R^n, static analysis is haunted by Rice's theorem: no non-trivial property on program semantics is computable. We therefore seek a weaker analysis through over-approximation. The main idea is to restrict our computations to a fixed subset of properties, referred to as a domain. Since the program semantics, as a subset of R^n, may not directly belong to this domain, we look for a domain element which over-approximates this semantics. The smaller this approximation, the more precise the information we have. The interest of such an analysis is that it may be computable. But as illustrated on Figure 1, it eventually raises false positives. Note however that it cannot qualify as safe a program which is not.

Example 1 (The interval domain). One might be satisfied with only the extreme bounds of the possible values of variables: indeed, this may be precise enough information to prove the absence of memory overflows or out-of-bounds array indices. Such information can be gathered through the interval domain: the analysis computes, for each program point i and variable x, an interval on R which gives an over-approximation of all the possible values of x at i.

2.1.3 A world with an adequate behavior: the lattice structure

We informally suggested in the previous subsection to work on a subset of properties, hoping that they are actually computable. Indeed, we need a way to qualify such a subspace. As stated above, we face two intuitive problems when computing our semantics:

• in presence of a conditional, our graph presents a branching which joins farther on. Hence we must have a process which, given two elements of our domain, computes one domain element that subsumes them;
• when dealing with a loop, the collecting semantics follows a loop in the graph. We must therefore be able to gather all the information by computing a fixpoint.

The adequate mathematical structure is that of a complete lattice.

Definition 1 (Complete lattice). A complete lattice is given by (A, ⊑, ⊔) where A is a set supplied with:

• a partial order relation ⊑;
• a least upper bound ⊔: ∀S ⊆ A, ⊔S ∈ A is such that:
  – ∀x ∈ S, x ⊑ ⊔S;
  – ∀x, (∀y ∈ S, y ⊑ x) ⇒ ⊔S ⊑ x.

Complete lattices grant any monotonic function F with a least fixpoint, defined as ⊔_{n∈N} F^n(⊥), where ⊥ is the least element. Hence, we have an algorithm, known as the Kleene iteration (whose termination however is not guaranteed), to compute a fixpoint: starting from ⊥, iterate F until stabilization.

Example 2 (The intervals lattice). The specific subset of properties defined by products of intervals on R actually possesses the desired structure (sketched in code below):

• [a, b] ⊑ [c, d] iff c ≤ a ∧ d ≥ b;
• ⊔{[a_i, b_i]} = [inf a_i, sup b_i].
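As a minimal illustration, the two operations of Example 2 can be written down directly in Python (the encoding below is ours, not part of the report):

    # Minimal Python encoding of the interval lattice of Example 2.
    def leq(a, b):
        # [a0, a1] is below [b0, b1] iff b0 <= a0 and a1 <= b1
        return b[0] <= a[0] and a[1] <= b[1]

    def join(intervals):
        # least upper bound: [inf of lower bounds, sup of upper bounds]
        return (min(i[0] for i in intervals), max(i[1] for i in intervals))

    print(leq((0, 1), (0, 2)))        # True: [0, 1] is subsumed by [0, 2]
    print(join([(0, 1), (3, 5)]))     # (0, 5)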


2.1.4 An abstract world for convenience and efficiency

Theoretically, computing directly on a restricted set of subsets of properties as a domain would be possible. However, it happens to be much more practical to work on an abstraction of this domain, in order to gain convenience as well as mathematical properties. Let us highlight the idea through an example.

Example 3. We consider the interval domain. A domain element is therefore the infinite set of reals contained in an interval. Quite obviously, we cannot represent an element by enumerating every single point in it: we need a compact finite representation. Trivially here, the abstract domain is made of pairs of extended reals in R̄, representing the bounds of the interval. Note that both domains are in bijection: we gained a better representation, but no additional mathematical properties.

Another abstraction of intervals would be the sign domain A = {∅, +, −, ±}, which carries information about the sign of variables. An abstract element would be deduced from the interval considered, while the concretization of a ∈ A would respectively be the intervals ∅, [0, +∞], [−∞, 0], [−∞, +∞]. The domains are not in bijection: computations will be easier on the sign domain, but we carry less information than on the interval domain.

Note that the notion of abstraction is relative: an interval abstracts a set of reals it contains, but is abstracted by the sign of its elements.

As illustrated through the previous example, it may be much more convenient to move to abstract domains, where two major purposes can be attained. First, the identification of the mathematical properties of the domain, on which the analysis is likely to be founded. Second, properties are infinite sets, hence a compact way to represent them is required: abstract domains fill this need.

The abstract domain can either be in bijection with the concrete one, therefore only serving a representation purpose, or be less precise, enhancing our computational capabilities. Moreover, we need a way to transport our information from the concrete domain to the abstract one, where computations occur. Then, we need to transport the information back in order to obtain the property over-approximation. The correspondence being usually not bijective, we want to lose as little information as possible when moving from one world to the other: this intuition is formalized by the notion of Galois connection.

Definition 2 (Galois connection). Let (L1, ⊑1, ⊔1) and (L2, ⊑2, ⊔2) be two complete lattices. A pair of functions α ∈ L1 → L2 and γ ∈ L2 → L1 is a Galois connection if:

    ∀x ∈ L1, ∀y ∈ L2, α(x) ⊑2 y ⟺ x ⊑1 γ(y)

α is referred to as the abstraction function, while γ is the concretization.
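To make Definition 2 concrete, here is a small Python sketch (our own encoding, using the sign domain of Example 3) that checks the Galois condition on a few sample intervals:

    # Galois connection between intervals and the sign domain of Example 3.
    INF = float('inf')

    def alpha(iv):                     # abstraction: interval -> sign
        lo, hi = iv
        if lo > hi: return 'empty'
        if lo >= 0: return '+'
        if hi <= 0: return '-'
        return 'top'

    def gamma(sign):                   # concretization: sign -> interval
        return {'empty': (1.0, -1.0), '+': (0.0, INF),
                '-': (-INF, 0.0), 'top': (-INF, INF)}[sign]

    def leq_itv(a, b):                 # interval order; empty encoded by lo > hi
        return a[0] > a[1] or (b[0] <= a[0] and a[1] <= b[1])

    SIGN_LEQ = {('empty', s) for s in ('empty', '+', '-', 'top')} | \
               {('+', '+'), ('-', '-'), ('+', 'top'), ('-', 'top'), ('top', 'top')}

    # Defining property: alpha(x) below y  iff  x below gamma(y)
    for iv in [(0.0, 3.0), (-2.0, -1.0), (-1.0, 1.0), (1.0, -1.0)]:
        for s in ('empty', '+', '-', 'top'):
            assert ((alpha(iv), s) in SIGN_LEQ) == leq_itv(iv, gamma(s))
    print("Galois condition holds on the samples")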

2.1.5 The widening operator

As stated above, loops require fixpoint computations. The lattice structure ensures the existence of a least fixpoint, equal to ⊔_{n∈N} F^n(⊥). But is it always computable, and if so, within a tractable complexity? Intuitively, we may grow too slowly in our lattice and want to accelerate the iteration by performing some kind of jumps in the lattice.

Such an intuition can be formalized by an operator such that iterative application of this operator always results in a chain of finite length. However, despite granting us a computable fixpoint, it does not always result in the smallest one. Widening operators are therefore a significant source of imprecision.
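As a hedged sketch of how a widening accelerates Kleene iteration on intervals (the toy program, the encoding and the classical widening below are our own choices):

    # Interval widening and accelerated Kleene iteration.
    INF = float('inf')

    def join(a, b):
        return (min(a[0], b[0]), max(a[1], b[1]))

    def widen(a, b):
        # Keep a bound only if it is already stable; otherwise jump to infinity.
        return (a[0] if b[0] >= a[0] else -INF,
                a[1] if b[1] <= a[1] else INF)

    # Analysed loop: x := 0; while x < 100 do x := x + 1
    x = (0.0, 0.0)
    while True:
        body = (x[0] + 1, min(x[1], 99.0) + 1)   # abstract effect of one iteration under the guard
        new = widen(x, join(x, body))
        if new == x:
            break
        x = new
    print(x)   # (0.0, inf): stable in two steps, less precise than the ideal [0, 100]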

The widening operator was the last missing tool. Thanks to this whole framework, designing an abstract analysis amounts to defining an abstract lattice, building a Galois connection between the abstract and the concrete lattices, and designing the abstract semantics. On our programming model, the latter is therefore composed of operators on the abstract domain for annotations, both assignments and tests, as well as a process to compute fixpoints, possibly using a widening operator. In the remainder of this section, we present several analyses fitting in this framework, starting with analyses dedicated to linear invariant inference.

2.2 Linear invariants

Inferring polynomial invariants is a difficult task; intuition as well as techniques have therefore first been developed on an easier problem: linear invariants. We present through this section two analyses dedicated to this purpose: first the seminal work of Cousot and Halbwachs [15], and then a more recent one, by Sankaranarayanan [40].

2.2.1 The polyhedral domain

Cousot and Halbwachs designed an analysis in order to generate linear relations between variables which hold at a given program point for any execution [15].

Let x = (x1, . . . , xn) be the variables of the program, and consider a linear inequality; it has the following form:

    Σ_{i∈{1,...,n}} a_i x_i ≤ b

Such an inequation defines a half-space of R^n.


Figure 2: A convex polyhedron, defined both geometrically and algebraically (figure extracted from Jobin's thesis [22]).

Figure 3: (a) Approximation of a set by a polyhedron. (b) The meet and join of two polyhedra.

A conjunction of such inequalities can be seen as an intersection of half-spaces, that is, geometrically, as a convex polyhedron, as seen on Figure 2. We therefore want to approximate a property, the collecting semantics of our program, by a convex polyhedron which subsumes that property, as illustrated on Figure 3a. Once more, the smaller the polyhedron, the more precise the information we have.

As explained in Section 2, we need a way to represent properties, polyhedra here, in a compact manner. Figure 3a exposes a first solution: a polyhedron can be defined as a set of linear constraints. More precisely, we represent a polyhedron as a matrix A and a vector b, encoding the linear constraints by Ax ≤ b. A polyhedron can also be interpreted geometrically, being generated by vertices and vectors. Each representation is efficient for specific tasks, and both are generally used together in static analysis tools.
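A minimal sketch of the constraint representation (A, b) and of the trivial meet, written with numpy for illustration (the example constraints are ours); the join would additionally require a convex-hull computation on the generator representation:

    import numpy as np

    # A polyhedron in constraint form {x | A x <= b}: here x + y <= 4, x >= 0, y >= 0.
    A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
    b = np.array([4.0, 0.0, 0.0])

    def meet(A1, b1, A2, b2):
        # Intersection of two polyhedra: simply stack the constraints.
        return np.vstack([A1, A2]), np.concatenate([b1, b2])

    # Abstract effect of the linear test "x <= 1": add one constraint row.
    A2, b2 = meet(A, b, np.array([[1.0, 0.0]]), np.array([1.0]))
    print(A2); print(b2)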

The set of convex polyhedra over R^n can be proved to have a lattice structure, and therefore acts as our abstract domain. Convex polyhedra being stable under intersection, the definition of the meet operator is trivial. The join is a bit trickier, since we have to over-approximate the set-theoretic union by its convex hull, as illustrated on Figure 3b.


This context being set, we initially have no constraints on programs, and therefore start from R^n as the initial polyhedron. We then collect our information thanks to an abstract correspondence for both the assignment and the boolean test. For instance, a linear boolean test adds a constraint to the polyhedron.

The lattice of polyhedra is of infinite height, hence termination of fixpoint computations is not guaranteed: we need a widening operator. The accuracy of the analysis highly depends on its definition, and this question has led to much research. Cousot and Halbwachs's initial proposition [15] was to keep only the constraints of one polyhedron which are verified by every generator of the other one. While being safe, this operator gets rid of a lot of information. Hence, the question has been deeply studied and has led to numerous enhancements, among which we can cite Bagnara's work [3].

2.2.2 The template-based polyhedral domain

The expressiveness of polyhedra makes them useful for capturing linear relationships between variables. Nevertheless, the worst-case complexity of polyhedral computations is exponential. Numerous weaker domains have therefore been designed, looking for a compromise between completeness and computational efficiency. One of particular interest is the template-based polyhedral domain, introduced by Sankaranarayanan [40].

Intuitively, the complexity of the general polyhedral domain lies in the infinity of linear directions to choose from in order to improve the approximation. We therefore restrict ourselves to a finite subset of directions, fixed at each program point.

An abstract element was previously defined by a convex polyhedron, which can be defined as an m × n matrix A and a vector b, the polyhedron being the set of x such that Ax ≤ b. Restricting the possible directions corresponds to fixing one matrix A for each program point, an abstract element consequently being simply a vector b. This leads to the following Galois connection:

    {sets of linear constraints}  ⇄  {b such that Ax ≤ b}    (abstraction α, concretization γ)

This illustrates the relative nature of abstraction: the concrete domain here, the sets of linear constraints, is equivalent to that of convex polyhedra, which was the abstract domain in Cousot's analysis.

Despite maintaining a rich expressiveness with respect to linear invariants, operations on such a domain are worst-case polynomial. However, the use of this domain raises a related problem: the choice, at each program point, of the corresponding template. Sankaranarayanan suggests using heuristics for this purpose.
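The abstraction into a template domain can be sketched as one linear program per template direction: for a concrete set given by linear constraints, the tightest bound vector b is obtained by maximizing each row of the fixed matrix A over that set. The following Python sketch uses scipy's linprog; the template and the concrete set are illustrative choices of ours:

    import numpy as np
    from scipy.optimize import linprog

    # Fixed template directions A (one per abstract dimension); an abstract element is just b.
    A_template = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])     # x, y, x + y

    # Concrete set given by linear constraints C x <= d: the triangle x >= 0, y >= 0, x + 2y <= 4.
    C = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 2.0]])
    d = np.array([0.0, 0.0, 4.0])

    # Abstraction: the tightest b maximizes each template direction over the concrete set.
    b = []
    for row in A_template:
        res = linprog(-row, A_ub=C, b_ub=d, bounds=[(None, None)] * 2, method='highs')
        b.append(-res.fun)
    print(np.round(b, 3))   # [4. 2. 4.]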


2.3 Polynomial equalities as invariants

Inferring linear invariants is an efficiently solved task, and scaling up to industrial programs has been achieved (for instance, Airbus A380 flight control was proved run-time error free in 2008 with a static analysis tool; see http://www.astree.ens.fr/). Nonetheless, it does not capture as much information as desired, which makes the task of designing validated programs a challenge. Hence, we would like to generate polynomial invariants of any degree.

While what we are ideally looking for, and actually study in the next section, are polynomial inequalities as invariants, i.e. relations of the form {x | p(x) ≥ 0} where p is an n-variate polynomial, an easier problem consists in seeking invariants as polynomial equalities. Note that one such equality is equivalent to a pair of the former: {p(x) ≥ 0} and {−p(x) ≥ 0}.

We motivate in this section the use of the polynomial ideal domain as an abstract domain for such a problem, before briefly presenting analyses generating polynomial equalities as invariants.

2.3.1 The domain of ideals

Let S = {p1 = 0, . . . , pm = 0} be a set of polynomial equalities. Such a set possesses strong structural properties: ∀p ∈ S, ∀q ∈ R[x1, . . . , xn], p·q = 0 and ∀p, q ∈ S, p + q = 0.

Such properties of stability for a set define the mathematical notion of ideal.

Definition 3 (Polynomial ideal). A set I ⊆ R[x1, . . . , xn] is an ideal if:

• 0 ∈ I;
• ∀p1, p2 ∈ I, p1 + p2 ∈ I;
• ∀p ∈ I, ∀q ∈ R[x1, . . . , xn], p·q ∈ I.

We write ⟨S⟩ for the ideal generated by a set of polynomials S.

Hence, if we are able to infer some polynomial invariants, any polynomial in the ideal they generate is of interest. Moreover, the set of ideals has a lattice structure, which we have seen to be a necessity for defining an analysis: I ⊓ J = I ∩ J and I ⊔ J = ⟨I ∪ J⟩.

Our third concern is having a compact representation of our ideals. The following theorem states that ideals can always be finitely represented, even if they contain infinitely many elements.

Theorem 1 (Hilbert). For any ideal I, there exists a finite S ⊆ I such that ⟨S⟩ = I.



Nevertheless, ideal intersections and fixpoint computations require solving the membership problem: in order to detect stabilization, we have to be able to state whether a polynomial p is part of an ideal. While trivial for univariate polynomials, this problem is harder for multivariate ones, as they do not constitute a Euclidean ring. We therefore need a substitute for Euclidean division: Gröbner bases. A Gröbner basis is a particular set of generators of an ideal which grants us a process to solve the membership problem. Nevertheless, Gröbner basis computation is known to be doubly exponential with respect to the number of variables.
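For illustration, the membership test provided by Gröbner bases can be reproduced with sympy (the generators below are a toy example of ours, not taken from the report): the remainder of the multivariate division of p by a Gröbner basis of the ideal is zero exactly when p belongs to the ideal.

    import sympy as sp

    x, y = sp.symbols('x y')
    gens = [x**2 - y, x*y - x]                 # toy generators of an ideal
    G = sp.groebner(gens, x, y, order='lex')

    # Membership test: p is in <gens> iff its remainder modulo the Groebner basis is zero.
    p = x**3 - x                               # equals x*(x**2 - y) + (x*y - x)
    _, r = sp.reduced(p, list(G), x, y, order='lex')
    print(r == 0)                              # True: p belongs to the ideal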

However, ideals appear as the natural choice of abstract domain when aiming at polynomial equality inference. In the remainder of this section, we present the state of the art in this specific sub-domain.

2.3.2 Analyses for polynomial equality inference

While recent, the domain of polynomial equality inference is quite well covered. Notably, Rodríguez-Carbonell and Kapur introduced an analysis [37] that is complete (an analysis is said to be complete for a class I of invariants and a class P of programs if, executed on a program p ∈ P, any invariant i ∈ I of p is found) for a sub-class of programs whose assignments, in particular, have to be invertible in a certain sense. The analysis works on the domain of ideals, but is restricted in practice by the use of Gröbner bases, preventing it from scaling well with the number of variables.

Müller-Olm and Seidl proposed a complete backward analysis [32] whose only restriction is that guards have to be polynomial disequalities. While lacking experiments and complexity results, their analysis is precious for highlighting the interest of backward analysis in order to facilitate abstract assignment computations, as well as for its constraint-based aspect, allowing the systematic derivation of an invariant inference algorithm from an invariance testing algorithm.

Sankaranarayanan et al. proposed in 2004 an analysis dedicated to the inference of polynomial equalities as invariants [39]. They do not impose any constraint on programs, but their analysis in turn is not complete. Their major contribution to the domain is the notion of weakening the consecution constraint, allowing completeness to be traded for a reduction of the computational cost.

Cachera et al. proposed another analysis [11], using the domain of ideals and extending the works of [32] and [39]. No hypotheses are made on the program structure, but they only compute inductive loop invariants, similarly to Sankaranarayanan's approach. Managing to get rid of any use of Gröbner bases, the technique has shown good scalability.



2.4 Polynomial inequalities as invariants

While much progress remains to be made, polynomial equality invariant inference seems to have its main methods commonly accepted. On the other hand, polynomial inequality inference techniques have so far not shown good scalability properties.

In this section, we first describe an approach taking its inspiration from the equality case by looking for an adequate structure similar to the ideal structure; we then study an extension of the template-based domain introduced in Section 2; finally, we analyse the flaws of this method and motivate the in-depth study of a third solution.

2.4.1 A similar approach to the equality case: cones of polynomials

The stable properties of polynomial equalities have led to the structure of polynomial ideals. Bagnara et al. [4] suggested a similar reasoning in the inequality case.

Given a finite set {p_i ≥ 0} of polynomial inequalities, one can note that, ∀λ, λ_i ∈ R^+, λ + Σ_i λ_i p_i ≥ 0. The set of those polynomials is the so-called polynomial cone generated by {p_i ≥ 0}. However, as this structure only contains linear combinations, it does not perfectly entail all consequent invariants: consider the cone generated by {x_1, x_1 x_2}; the invariant x_2 ≥ 0 is obvious, but not entailed. We therefore need to close the cone with respect to the product operator. Note that ideals also respect this closure property. The analysis on this domain is then implemented on top of the polyhedral domain by considering each variable as an independent variable.

Implementation has shown that the analysis is able to infer non-trivial quadratic invariants. However, the practical results remain quite modest and show no direct hope of scaling up.

2.4.2 Extending the template-based domain

A drastically different approach has been proposed by Adjé et al. [1], extending the template-based domain described in Section 2. A template is now given by a finite set of polynomials, allowing more complex sets to be constructed, non-convex ones in particular.

Given a template P = {p_i}_{i∈I}, a concrete element is a set of points delimited by polynomial inequalities, named a P-sub-level set. Conversely, an abstract element is a function which associates to each p_i a real number v_i. We therefore have the following Galois connection between P-sub-level sets and functions from P to R̄:

    α : X ↦ (p ↦ sup_{x∈X} p(x))
    γ : v ↦ {x ∈ R^n | p(x) ≤ v(p), ∀p ∈ P}

While being suitable for any degree, this domain leaves us with the task of safely approximating an optimum of polynomials under polynomial constraints. This problem is known to be NP-hard, leading Adjé et al. to restrict their analysis to degree two, hence performing the so-called Shor's relaxation.

2.4.3 The problem of template inference

Investigating the extension of the method by Adjé et al. to higher degrees could have been an interesting direction. Indeed, while NP-hard, the constrained polynomial optimization problem appears to be deeply related to polynomial invariants: intuitively, no matter which method we follow, we are ultimately trying to bound a set as precisely as possible by a polynomial hypersurface.

Nevertheless, their framework is quite rigid in the sense that the templates used in the analysis have to be manually provided. This knowledge therefore needs to be built on human expertise, a situation we would like to avoid as much as possible.

In order to illustrate this flaw, let us introduce a classical approach in stability theory: the concept of Lyapunov function. Indeed, in the domain of stability theory, invariants are specifically of great interest in order to prove that a given system is stable, i.e. does not diverge. Such a result is traditionally obtained by exhibiting a Lyapunov function:

Definition 4 (Lyapunov function). Given a system defined by the evolution law x_{k+1} = f(x_k), a Lyapunov function is a function V : R^n → R such that:

    V(0) = 0  ∧  ∀x ∈ R^n \ {0}, V(x) > 0  ∧  lim_{||x||→∞} V(x) = +∞
    ∀x ∈ R^n, V(f(x)) − V(x) ≤ 0

Intuitively, such a function, unbounded in all directions, decreases at each step of evolution of the system, therefore proving the stability of the system.
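In the classical special case of a linear system and a quadratic candidate V(x) = x^T P x, the two conditions above become linear matrix inequalities in P (P ≻ 0 and M^T P M − P ≺ 0), which foreshadows the semidefinite programming framework of Section 4.1. Below is a hedged sketch with cvxpy, on a stable system of our own choosing:

    import numpy as np
    import cvxpy as cp

    # Discrete-time linear system x_{k+1} = M x_k (a contracting rotation, chosen by us).
    M = 0.9 * np.array([[np.cos(0.5), -np.sin(0.5)],
                        [np.sin(0.5),  np.cos(0.5)]])

    # Quadratic candidate V(x) = x^T P x: the two conditions become LMIs in P.
    P = cp.Variable((2, 2), symmetric=True)
    eps = 1e-6
    constraints = [P >> eps * np.eye(2),                 # V positive definite
                   M.T @ P @ M - P << -eps * np.eye(2)]  # V decreases along the dynamics
    cp.Problem(cp.Minimize(0), constraints).solve()
    print(np.round(P.value, 3))   # any feasible P certifies stability; {x | x^T P x <= c} are invariant ellipsoids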

In the context of Adjé et al.'s method, exhibiting a Lyapunov function provides a suitable template for proving stability of the system. But at this point, the problem has moved to the question of how to infer Lyapunov functions. Cousot proposed a method answering this question in the specific case of ranking function inference [13]. Once again, an extension of this method to invariant inference and higher degrees raises the question of polynomial optimization. We therefore present Cousot's method before investigating the theoretical and practical viability of its extension.

3 A constraint-based approach through successive relaxations

We have chosen to start from a method mostly synthesized, with a different objective, by Cousot [13]. In opposition to most previous approaches, whose starting point is a chosen domain, the idea here is to pass a quite general characterization of the semantics through successive "sieves", hopefully reaching tractable time and space complexities.

Cousot developed this method specifically aiming at inferring ranking functions. Roughly speaking, a ranking function is a function from the values of program variables into a well-founded set, N for instance, which is strictly (and fast enough) decreasing at each iteration of the loop, hence proving termination of the given program. Cousot points out at the end of his paper the possibility of an extension of this method to invariant inference. However, this alteration raises additional difficulties, regarding which Cousot seemed quite optimistic, while providing very little actual insight into the problem. Our work has mainly been to explore this idea, trying to validate or invalidate it.

The remainder of this section consists of Cousot's method, mainly through the exposition of the successive relaxations. This is all well-known literature, which we tried to enrich with discussions about the loss of expressivity. The in-depth analysis of the resulting system of constraints and our related contributions will be dealt with in Section 4.

3.1 Relational semantics and loop invariant

3.1.1 Programming model

We consider in the remainder of this report a programming model more specific than the one introduced in Section 2.1.1. We consider a simple imperative language, allowing assignments, tests and loops. We suppose our program to have n variables, taking their values in R; a state is therefore an element of R^n. We define the preconditions of a loop as an over-approximation of the possible states before entering the loop. Preconditions are defined as algebraic relationships over the variables of the program.

More specifically, we are interested in inferring, given a loop and its preconditions, algebraic relationships holding for any state reached by the execution of the loop starting from a state satisfying the precondition. Programs


considered therefore have the following shape:

    x ← P    (Preconditions)
    while B do C

where ← abusively denotes here a nondeterministic assignment, B is a boolean test, and C a subprogram, potentially containing tests, loops and assignments itself. Assignments are assumed to be polynomial (this apparent restriction has to be mitigated: despite the numerous calls to logarithms or trigonometric functions that the Java Math library allows, at a low enough level any computation is polynomial): any assignment has the form x_i = p(x), where p is a polynomial.

3.1.2 Relational semantics

Considering loops, we abstract the semantics of the program in a standard way, as a polynomial relationship between the values of the variables before the execution of the loop body and the ones after the execution. Having restrained our assignments to polynomial functions, we can therefore write, by composition of polynomials, the semantics of the loop body as a conjunction of m polynomial relationships σ̄_1, . . . , σ̄_m describing the effect of the loop on each variable. Introducing fresh copies of the variables, representing their values after the loop by x and before the loop by x_0, we obtain:

    ⋀_{k=1..m} σ̄_k(x_0, x).

In presence of a conditional, we have one such relationship for each branch. When the semantics of a loop body is defined only by equality relationships, it can be equivalently expressed as an application σ such that, if x represents the values of the variables entering the loop, σ(x) = (σ_i(x)), with σ_i ∈ R[x]. This method avoids the introduction of fresh variables, but will induce a rise in the considered degree by substitution inside the invariant, as described in further detail in Section 3.1.3. However, we stick in this report to the latter description of the semantics, illustrated on Figure 4.

3.1.3 Characterization of loop invariants

Based on such a description of the semantics, being the least fixpoint of the canonical extension of σ over sets, starting from the set of preconditions, a subclass of invariants can be characterized: loop invariants. A loop invariant is an algebraic relationship which is stable under execution of the loop body. More formally, P ≥ 0 being a polynomial relationship characterizing the preconditions, let I ≥ 0 be a polynomial relationship among variables; I ≥ 0 is a loop invariant of the program if and only if:

    P(x) ≥ 0 ⇒ I(x) ≥ 0
    (I(x_0) ≥ 0 ∧ ⋀_{k=1..m} σ̄_k(x_0, x)) ⇒ I(x) ≥ 0

    i ← 2
    j ← 0
    while ?? do
      if ?? then
        i ← i + 4
      else
        i ← i + 2
        j ← j + 1
      end if
    end while

    x = (i, j)
    σ^then(x) = (i + 4, j)
    σ^else(x) = (i + 2, j + 1)

Figure 4: An elementary program and the associated relational semantics.

Equivalently, when the semantics is expressed as polynomial equalities:

    P(x) ≥ 0 ⇒ I(x) ≥ 0            (1)
    I(x) ≥ 0 ⇒ I(σ(x)) ≥ 0         (2)

The definition is highly intuitive: we state first that any possible initial state, i.e. satisfying the precondition, also satisfies the invariant, and second that the invariant is stable by the semantics of the loop body.
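As a sanity check of conditions (1) and (2) on the program of Figure 4, a candidate such as I(i, j) = i − 2j − 2 ≥ 0 (our choice, not discussed in the report) can be verified symbolically; here with sympy, where consecution holds simply because I(σ(x)) − I(x) is a nonnegative constant on each branch:

    import sympy as sp

    i, j = sp.symbols('i j', real=True)
    inv = i - 2*j - 2                        # candidate invariant I(i, j) = i - 2j - 2 >= 0

    # Relational semantics of the two branches of the Figure 4 loop body.
    branches = [{i: i + 4, j: j}, {i: i + 2, j: j + 1}]

    # Condition (1): the initial state i = 2, j = 0 satisfies I >= 0.
    assert inv.subs({i: 2, j: 0}) >= 0

    # Condition (2): I(sigma(x)) - I(x) is a nonnegative constant on each branch,
    # so I(x) >= 0 implies I(sigma(x)) >= 0 unconditionally.
    for sigma in branches:
        delta = sp.expand(inv.subs(sigma) - inv)
        print(delta)                         # 4 and 0
        assert delta.is_nonnegative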

The distinction between invariants and the specific subclass of loop invariants must be understood. In the case of an infinite loop, while true do . . . , we have:

    [I(x) ⇒ I(σ(x))] ∧ [P(x) ⇒ I(x)]      (loop invariant)
    [P(x) ⇒ ∀n, I(σ^n(x))]                 (generic invariant)

Let us consider the example depicted on Figure 5 to illustrate the loss of expressivity induced by the notion of loop invariant. The program doubles the norm of the variables if this norm is greater than or equal to 1, and halves it otherwise. The precondition being the open unit ball, the concrete semantics is exactly this open ball, which therefore is an invariant, and even a loop invariant. However, open sets are quite hard to grasp, reacting badly to meet and join operations. The closed unit ball would therefore be an interesting over-approximation, and hence an invariant. Nevertheless, the closed unit ball is not a loop invariant: the border is not stable by the semantics of the body. More precisely, this program admits no other loop invariant than the concrete collecting semantics, i.e. the open unit ball.


    Precondition: x² + y² < 1
    while true do
      if x² + y² ≥ 1 then
        x ← 2·x
        y ← 2·y
      else
        x ← x/2
        y ← y/2
      end if
    end while

Figure 5: The closed unit ball is a trivial invariant, but not a loop invariant.

3.2 A bridge to computability: the parametric abstraction

In its current form of characterizing general inductive invariants, our problem is not even computable. Indeed, we implicitly quantify existentially over predicates: we state that there exists a predicate I such that the precondition and the consecution are satisfied. Hence, we have a second-order formula, leading to undecidability.

In order to get rid of such a complexity, we perform a parametric abstraction, as named by Cousot. Intuitively, we fix a priori the shape of the invariants by parameterizing it. For instance, we can restrict ourselves to conics by parameterizing our invariant under the form I_{a,b,c} = ax² + by² + c.

The characterization of a parameterized invariant I_a therefore becomes:

    ∃a ∈ R^p :
        ∀x ∈ R^n : P(x) ≥ 0 ⇒ I_a(x) ≥ 0
        ∀x ∈ R^n : I_a(x) ≥ 0 ⇒ I_a(σ(x)) ≥ 0

The loss of expressiveness here is quite easy to grasp: we restrict our inference to invariants of the chosen shape, conics in the example above. However, the quantification is no longer over a predicate, but over the parameters, hence relying on first-order quantification.

Note that at first sight, this process can present similarities with the notion of template, described previously through the works of Sankaranarayanan and Adjé. However, parametric abstraction is far more general: while the idea behind a template is to fix a specific polynomial shape and optimize how close to the semantics it can be pushed by tweaking exclusively the constant, parametric abstraction theoretically allows one to restrict the search to ellipses, for instance, and look for the ideally shaped one.

In spite of the fact that decidability is reached, this restriction is still too loose, calling for a new “sieve”.


3.3 Lagrangian relaxation

With this first relaxation to parametric invariants, the inference indeed becomes computable: the Tarski-Seidenberg decision procedure for the first-order theory of real closed fields by quantifier elimination can be used, for instance. However, even the cylindrical algebraic decomposition method by Collins [12], the best method available, is still doubly exponential in the number of quantifier blocks. There is still a need for further relaxation.

3.3.1 Definition

The Lagrangian relaxation states that a sufficient condition for a relationship of the form

    ∀x ∈ R^n : (⋀_{k=1..m} σ_k(x) ≥ 0) ⇒ σ_0(x) ≥ 0        (3)

to hold is given by the following condition:

    ∃λ_k ≥ 0 : ∀x ∈ R^n, σ_0(x) − Σ_{k=1..m} λ_k σ_k(x) ≥ 0.        (4)

Indeed, assume ∀k ∈ {1, . . . , m}, σ_k(x) ≥ 0; then Σ_{k=1..m} λ_k σ_k(x) ≥ 0, so (4) gives σ_0(x) ≥ Σ_{k=1..m} λ_k σ_k(x) ≥ 0, and (4) ⇒ (3) is straightforward.

However, the converse implication does not hold in general. Nevertheless, if all the σi are assumed to be linear, the Lagrangian relaxation is complete.

Additionally, the relaxation known as Yakubovich's S-procedure [46, 23], frequently used in system theory to derive stability results, happens to be a particular case of Lagrangian relaxation which is complete while admitting one quadratic constraint among the linear ones.

3.3.2 A link between Lagrangian relaxation and duality theory

The Lagrangian relaxation as expressed here takes its origin in the theory of the Lagrangian dual in optimization, described in further detail in Section 4.1.1. However, we can notice without further knowledge that (3) is equivalent to p* ≥ 0, where p* is the optimal value of the following optimization problem:

    minimize   σ_0(x)
    subject to ⋀_{k=1..m} σ_k(x) ≥ 0

Hence, defining L(λ) = inf_x σ_0(x) − Σ_{k=1..m} λ_k σ_k(x), and the dual problem:

    maximize   L(λ)
    subject to λ ≥ 0,


any feasible point of the dual problem provides a lower bound on the initial one, hence establishing the sufficient condition. Furthermore, the condition is an equivalence if and only if the duality gap, the difference between the optimal values of both problems, is tight (i.e. equal to zero). This additional condition is notably achieved in the linear case, but not in the general polynomial context.

3.3.3 Invariant characterization

Back to invariant inference, relaxing the parametric invariance conditions provides the following characterization of invariants:

    ∃a, ∃λ_P ≥ 0, ∀x, I_a(x) − λ_P P(x) ≥ 0
    ∃λ ≥ 0, ∀x, I_a(σ(x)) − λ I_a(x) ≥ 0.

At this point, the constraints are first-order quantifications over inequality constraints. While this is far from guaranteeing tractability, it constitutes a more reasonable ground than the one we started from. In the rest of the report, we investigate both theoretically and experimentally the task of extracting relevant invariants through the use of optimization solvers.

4 Invariant generation as solutions of a set of constraints

The successive relaxations described in the previous section allow for syntactically generating a decidable set of constraints, encoding the so-called precondition and consecution conditions. Any solution of these constraints is an invariant for the considered program. Two questions naturally arise. After those successive relaxations reducing the time complexity, is the derived system efficiently solvable? Furthermore, from an algebraic perspective, the solutions of a system of constraints are not distinguishable. This is not true in our case: we look for semantically meaningful solutions, i.e. sets of parameters defining relevant invariants. For instance, if two circles centered at the same point are both invariants of the considered program, the one with the smallest radius is of greater interest. The second question therefore is: how to guide the solver toward solutions which are relevant to us? While both problems are closely related, this explains why we cannot restrict ourselves to constraint solving and have to consider the more general setting of constrained optimization.

We try through this section to investigate and reduce the time complexity of the constraint resolution, before considering the problem of guiding the solver to a semantically relevant solution. Finally, we propose ideas on yet unsolved problems, open for further work. However, we start by giving some notions of optimization that will be useful for our purpose.

4.1 A targeted framework: semidefinite programming

The question of solving or relaxing the current system we are dealing with cannot be answered without some insight into the topic of optimization, also known as mathematical programming. In its most general form, this framework is defined as the optimization of a function f_0 : R^n → R, under inequality constraints over functions with the same domain and codomain:

    minimize   f_0(x)
    subject to f_i(x) ≤ b_i,  i = 1, . . . , m

where no hypotheses are made on the objective function f_0 nor on the constraint functions f_i. Note that equalities can be encoded as a pair of opposite inequalities. Naturally, this framework is extremely general and, in general, not computable. For the sake of tractability, special subclasses have therefore been heavily studied.

4.1.1 Linear programming

The most elementary but still interesting optimization problem consists in enforcing all objective and constraint functions to be affine. Such a problem is called a linear program, usually written as:

    minimize   c^T x + d
    subject to Ax ≥ b
               x ≥ 0,

where A ∈ R^{m×n}, b ∈ R^m, x ∈ R^n, and ≥ denotes the pointwise inequality over vectors.

The feasible set of a linear program, i.e. the set of x satisfying the constraints, is a polyhedron, hence naturally a convex set. Solving the problem is quite straightforward: the solution is the farthest possible x following direction −c, as illustrated on Figure 6. Dantzig's simplex method, for instance, permits solving such problems efficiently in practice.
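A tiny worked instance of the form above, solved with scipy (the problem data is ours); note that scipy's linprog expects "≤" constraints, so Ax ≥ b is passed as −Ax ≤ −b:

    import numpy as np
    from scipy.optimize import linprog

    # minimize c^T x  subject to  A x >= b, x >= 0
    c = np.array([2.0, 3.0])
    A = np.array([[1.0, 1.0], [1.0, 3.0]])
    b = np.array([2.0, 3.0])

    res = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None), (0, None)], method='highs')
    print(res.x, res.fun)   # [1.5 0.5] 4.5: the optimum sits on a vertex of the feasible polyhedron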

In addition, the linear case benefits from an invaluable property, namely the duality theorem, that helps in designing optimization solvers. The idea behind duality consists in associating to a so-called primal problem of n variables and m constraints a closely related problem of m variables and n constraints. In the linear case, we therefore define the dual as:


Figure 6: Geometric interpretation of a linear program (from [8]). Dashed lines are objective function’s level curves. x∗ is optimal, as far as possible along direction −c.

    Primal:                      Dual:
    minimize   c^T x             maximize   y^T b
    subject to Ax ≥ b            subject to y^T A ≤ c^T
               x ≥ 0                        y ≥ 0

The duality theorem states the existing link between both primal and dual problems [16].

Theorem 2. If both the primal and the dual of a linear program are feasible, then the two optima coincide.

This very strong result relies on Farkas’ lemma, stating when a linear system is satisfiable.

Lemma 1 (Farkas' lemma). The linear system

    Ax ≥ b
    x ≥ 0

is unsatisfiable if and only if, using a positive linear combination of the inequalities, it is possible to derive −1 ≥ 0, i.e. there exist λ_1, . . . , λ_m ∈ R^+ such that

    Σ_{i=1..m} λ_i a_i ≤ 0   and   Σ_{i=1..m} λ_i b_i > 0,

where the a_i are the rows of A.

The duality theorem therefore offers an alternative problem which can be exploited in order to solve the original one more efficiently. Linear programming is an ideal case, which in practice we can solve exactly in a very efficient way. We will now see how most of the ideas behind this first framework extend to the more general case of convex programming.


4.1.2 Convex programming

Particular attention has been given to a subclass of mathematical programs for about one century, namely convex programming [8]. This attention has even been growing in the last decades with the discovery of efficient resolution methods. In a convex optimization problem, the functions f_0, . . . , f_m are supposed to be convex. Obviously, linear programs are special instances of this framework. While there is in general no analytical formula for the solution of convex optimization problems, convexity is a key to efficiency. Let us give a few insights on this interest of convexity.

First, any locally optimal point of a convex problem is also globally optimal, which highly facilitates the process of optimization. Indeed, an optimization algorithm typically consists in picking an initial point and iterating the following steps, until a stopping criterion is satisfied (sketched in code below):

• determine a descent direction ∆x;
• choose a step size t;
• update x ← x + t∆x.
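The three steps above can be turned into a few lines of Python; this is a generic first-order sketch (fixed step size, our own toy objective), not the exact methods discussed next:

    import numpy as np

    def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            dx = -grad(x)                    # descent direction
            if np.linalg.norm(dx) < tol:     # stopping criterion
                return x
            x = x + step * dx                # update x <- x + t * dx
        return x

    # Convex toy objective f0(x, y) = (x - 1)^2 + 2 y^2, gradient (2(x - 1), 4y); optimum (1, 0).
    print(gradient_descent(lambda v: np.array([2 * (v[0] - 1), 4 * v[1]]), [5.0, -3.0]))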

Hence, algorithms are likely to fall and get stuck in local optima, a problem which does not concern convex optimization.

What's more, convexity provides many optimality criteria, which are valuable for determining a relevant descent direction. One illustrative example can be stated if f_0 is additionally assumed to be differentiable: x is optimal if and only if x is feasible and

    ∇f_0(x)^T (y − x) ≥ 0  for any feasible y,

where ∇ denotes the gradient operator. This condition generalizes the idea already present in the linear case: going as far as possible along direction −c. Whereas the gradient was constant in the linear case, the condition here more generally states that the opposite of the gradient evaluated at the optimal point defines a supporting hyperplane (i.e. tangent and such that the set is included on one side of the hyperplane) to the feasible set, as illustrated on Figure 7.

Such sufficient conditions provide ways to determine the descent direction. The gradient descent method thus suggests descending in the direction opposite to the current gradient, as illustrated on Figure 8. The more evolved steepest descent technique takes the direction which minimizes the gradient, intuitively where the slope is the highest, as illustrated on Figure 9. Finally, Newton's method uses a second-order Taylor approximation, assuming f_0 is twice continuously differentiable, to compute a more relevant descent direction based on the Hessian of f_0.

Figure 7: Geometric interpretation of the optimality condition (from [8]). Dashed lines are the objective function's level curves. x* is optimal, as far as possible in the direction −∇f_0.

Figure 8: Iterates of the gradient method (from [8]). Dashed lines are the objective function's level curves.

Figure 9: Iterates of the steepest descent method (from [8]). Dashed lines are the objective function's level curves.

These methods are both quite intuitive and efficient, but they cannot be applied to general convex programs. The best compromise is Newton's method, which performs extremely well in practice on problems constrained exclusively by equalities. This method is therefore the basis of the so-called interior-point method for convex optimization. Intuitively, the idea is that an inequality constraint can be replaced by adding to the objective function a function which is null where the constraint is satisfied, and evaluates to infinity otherwise. Unfortunately, such a function would deprive us of the continuity we need. We therefore rely on a logarithmic barrier to approximate this indicator function with arbitrary precision, as illustrated on Figure 10. The interior-point method has proven to be of great efficiency in practice.

Figure 10: Comparison between the dashed indicator function and several solid logarithmic barriers (from [8]).

Convex programming is thus the cornerstone of optimization: the framework is expressive enough to formulate many problems, and most of the ideas developed for more complex problems extend ideas developed in the convex case.

4.1.3 Semidefinite programming

The framework of convex programming admits a significant extension which can be efficiently handled by an extension of the interior-point method: semidefinite programming (actually, inequalities can be generalized with respect to any proper cone, i.e. convex, closed, solid and pointed; semidefinite programming corresponds to the cone of positive semidefinite k × k matrices). Let us first briefly recall a few well-known algebraic results.

Proposition 1. For a real symmetric n × n matrix A, the following properties are equivalent:

1. x^T A x ≥ 0 for all x ∈ R^n;
2. all eigenvalues of A are ≥ 0;
3. A = U^T U for some n × n matrix U.

We say a square matrix A is positive semidefinite if A is real symmetric and the equivalent conditions above hold. A linear matrix inequality (l.m.i.) is then defined as:

    F_0 + Σ_{i=1..m} x_i F_i ≽ 0,

where, over matrices, A ≽ B means that A − B is positive semidefinite.

A semidefinite program is defined as the optimization of a linear function subject to an l.m.i. A survey of the theory and applications is provided by Vandenberghe and Boyd [45]. The popularity of this framework has grown constantly due to three circumstances: the number of domains from which semidefinite constraints naturally arise; the expressiveness of the framework, most convex programs admitting a semidefinite reformulation; and the quite recent discovery that semidefinite programs are actually not much harder to solve than linear programs.

Indeed, while no practical simplex method is able to handle those constraints, Nesterov and Nemirovsky managed to extend the interior-point method in the late eighties [34]. From a computer science perspective, semidefinite programming notably gained renown thanks to Goemans and Williamson [17], who achieved a breakthrough on the maximum cut problem by relaxing it in the semidefinite framework.

Getting back to invariant inference, we note that semidefinite programming is the framework in which Adjé et al. [1] have chosen to fit. In order to do so, they restricted themselves to degree two, allowing them to perform the so-called Shor's relaxation.

In the remainder of the section, we try to analyze our specific problem in the light of the foregoing, and study ways to fit into the semidefinite framework, as well as possible alternatives.

4.2 Polynomial inequalities

While semidefinite programming appears to be the ideal framework, we do not fit straight into it. A first reason is that we deal with constraints as multivariate polynomial inequalities, which are highly non-convex and cannot be expressed as an l.m.i. Actually, satisfiability of polynomial inequality constraints is known to be an NP-hard problem [33].

Unlike Adjé et al., we do not want to put restrictions on the degrees of the invariants we infer, and therefore need to relax our polynomial inequality constraints. One well-known solution to this issue is the sum of squares relaxation. In the remainder of this section, we first study the relaxation in itself before looking at how the sum of squares constraint reduces to semidefinite programming.


4.2.1 Relaxation as sum of squares

We consider the relation between two properties of a polynomial p:

• being nonnegative, i.e. ∀x, p(x) ≥ 0;
• being a sum of squares (s.o.s.), i.e. ∃p_1, . . . , p_m ∈ R[x], p = Σ_{i=1..m} p_i².

More precisely, being a s.o.s. polynomial obviously implies nonnegativity, and we wonder whether the converse implication holds. The study of this connection between the nonnegativity of a polynomial and the existence of a way to write it as a sum of squares has a quite long history [29]. This problem goes back to Hilbert, first in a paper of 1888, then through the closely related question of writing a polynomial as a sum of squares of rational functions, stated as his famous 17th problem.

Actually, Hilbert already knew that the equivalence holds in the univariate case. It is even true if we restrict ourselves either to degree two, or to degree four and two variables. But in any other case, and especially as soon as we have two variables and do not want to put restrictive hypotheses on degrees, there exist nonnegative polynomials which admit no formulation as a s.o.s. polynomial. The first constructive example was exhibited by Motzkin in 1967 [31]:

    s(x, y) = 1 − 3x²y² + x²y⁴ + x⁴y²

Relaxing the positivity constraint by a s.o.s. constraint therefore strictly simplifies our problem. However, estimating the loss of expressivity is as tricky as it is crucial. A few results are known: on one hand, Blekherman derived bounds for the volumes of the bases of both the cone of nonnegative polynomials and that of s.o.s. ones, showing that in a sense, "there are significantly more nonnegative polynomials than sums of squares" [6]; on the other hand, Berg et al. established a denseness result [5], in the sense of a specific norm, of the cone of s.o.s. polynomials in the space of nonnegative ones. From our perspective, those results appear highly technical, making it especially difficult to relate them to a notion of semantic expressiveness. Two remarks can nevertheless be made: from a far more pragmatic point of view, experiments tend to show that many polynomials arising in practice in various domains do accept a s.o.s. decomposition, allowing actual experimental use of the relaxation; from the perspective of longer-term work, we tried to develop (and believe much more would be required) expertise in the broad domains of real algebraic geometry and optimization, in order to identify the relevant techniques as well as their precise drawbacks.

The dual issue to expressiveness raised by the s.o.s. relaxation is the one of time complexity. Fortunately, this problem has found a far more satisfactory answer.


4.2.2 Reformulation as a semidefinite program

Hence, s.o.s. constraints relaxes the positivity constraints while maintaining an experimentally interesting, despite hard to predict, amount of expressive-ness. Moreover, the time complexity of s.o.s. testing recently encountered a satisfactory solution through several independant work, notably by Lasserre and by Parrilo. The main result, enforcing high interest in this relaxation, is the reformulation of the s.o.s. constraints in the semidefinite framework. The method developed by Parrilo [36, 35] is built upon the computation of a s.o.s. decomposition thanks to the Gram matrix method. Indeed, given a polynomial p of degree 2d, we write it in the basis of all monomials of degree less or equal to d:

\[
Z = \Big( \prod_j x_j^{r_{i,j}} \Big)_{\sum_j r_{i,j} \leq d}, \qquad p = Z^{t} P Z,
\]

where P is a constant matrix. For instance, considering the bivariate polynomial
\[
F(x, y) = 2x^4 + 2x^3 y - x^2 y^2 + 5y^4 =
\begin{pmatrix} x^2 \\ y^2 \\ xy \end{pmatrix}^{T}
\begin{pmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{pmatrix}
\begin{pmatrix} x^2 \\ y^2 \\ xy \end{pmatrix}.
\]

Such a decomposition is not unique, but p is a s.o.s. polynomial if and only if there exists one such P which is positive semidefinite. The existence of a s.o.s. decomposition of a polynomial in n variables and of degree 2d can therefore be checked by solving a semidefinite program of size \binom{n+d}{d} × \binom{n+d}{d}. Upon this result, Parrilo developed a procedure allowing the search for bounded degree solutions to the more general problem consisting of solving a set of polynomial inequalities. Furthermore, while this process checks for the emptiness of a set of constraints, it can be extended to compute a global lower bound of a polynomial p by maximizing a parameter γ such that p(x) − γ is a s.o.s. polynomial, hopefully returning a precise underapproximation. This procedure has been implemented, coupled with numerous optimizations, especially linked to sparsity, by Parrilo in the SOSTOOLS free Matlab toolbox.
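To make the Gram matrix method concrete on the bivariate example above, identifying coefficients on both sides yields p_{11} = 2, p_{22} = 5, 2p_{13} = 2, 2p_{23} = 0 and p_{33} + 2p_{12} = −1; one admissible positive semidefinite choice is p_{12} = −3, p_{33} = 5, which corresponds to the decomposition F(x, y) = ½(2x^2 − 3y^2 + xy)^2 + ½(y^2 + 3xy)^2. In YALMIP/SOSTOOLS style, the corresponding s.o.s. test and the lower bound computation mentioned above could be sketched as follows (a hypothetical session, assuming a default SDP solver is installed):

% Check whether F is a sum of squares: Q returns a Gram matrix, v the monomial basis
sdpvar x y
F = 2*x^4 + 2*x^3*y - x^2*y^2 + 5*y^4;
[sol, v, Q] = solvesos([sos(F)]);
% Global lower bound: maximize gam such that F - gam remains a s.o.s. polynomial
sdpvar gam
sol = solvesos([sos(F - gam)], -gam, [], gam);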

Major breakthroughs in the domain have independently been developed by Lasserre, forging a link with the dual theory of moments. He built both a process constructing a family of s.o.s. polynomials converging to a given polynomial [26], and a sequence of semidefinite problems whose sequence of solutions converges to the solution of the initial problem bearing a nonnegativity constraint [24, 25] (see [41] for a more didactic survey of these results). Unfortunately, no implementations are available, and results on the speed of convergence are still lacking at this time, outside of specific cases. Nevertheless, we believe that further study of these works might lead to enhancements.

4.3 Yalmip

Aiming at some experimental validation of this constraint-based method for invariant inference, we had to experiment with various solvers on examples. To this end, the Matlab optimization application programming interface YALMIP, developed and maintained by Löfberg, has been of invaluable help. It offers a simple common language to describe mathematical programs of any nature, includes transparent speed optimization processes, and interfaces numerous solvers dedicated to each specific class of problem. In particular, SOSTOOLS is even directly plugged into it [28].
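As an illustration of this common language, the following minimal session, given here only as a sketch and assuming that an SDP solver such as SeDuMi is installed, declares a symmetric matrix variable, imposes two l.m.i. constraints and hands the problem to the chosen solver:

P = sdpvar(2,2);                         % symmetric 2x2 matrix variable
A = [0 -1; 1 0];                         % rotation by pi/2
C = [P >= eye(2), A'*P*A - P <= 0];      % two l.m.i. constraints
optimize(C, trace(P), sdpsettings('solver','sedumi'));
value(P)                                 % numerical value of a feasible P

The optimize command was called solvesdp in older releases; the very same declarative syntax, with sos constraints and solvesos, is used in the experiments below.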

4.4 Bilinear parametrization

So far, we saw that ideally, the semidefinite framework is the suitable one. We therefore aim at relaxing our generic polynomial inequality constraints into s.o.s. ones, as those have been discovered to be reformulable as semidefinite programs during the last decade. However, troubles remain. As stated earlier, semidefinite programming deals with l.m.i. Nevertheless, the Lagrangian relaxation reformulates the consecution condition as

\[ I_a(\sigma(x)) - \lambda \cdot I_a(x), \]

where I_a(x) is parametrized. Due to the Lagrangian multiplier λ, we therefore have a product between two parameters, moving us from the well-behaved world of linear matrix inequalities to the much wider one of bilinear matrix inequalities (b.m.i.).
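To make this bilinearity explicit, consider, purely as an illustration, the linear template and the first loop branch used in the experiments of Section 4.4.2 below, namely I_a(i, j) = a·i + b·j + c and σ(i, j) = (i + 4, j). The relaxed consecution polynomial then reads

\[
I_a(\sigma(i,j)) - \lambda\, I_a(i,j) \;=\; (a - \lambda a)\, i + (b - \lambda b)\, j + (c - \lambda c) + 4a,
\]

in which the products λa, λb and λc couple the Lagrangian multiplier with the template parameters: the constraint is linear in (a, b, c) for a fixed λ, and linear in λ for fixed (a, b, c), but not jointly linear.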

4.4.1 An insight into the b.m.i. problem

A b.m.i. is of the form:

\[
F(x, y) \;=\; F_0 + \sum_{i=1}^{m} x_i F_i + \sum_{j=1}^{n} y_j G_j + \sum_{i=1}^{m} \sum_{j=1}^{n} x_i y_j H_{ij} \;\prec\; 0 .
\]

The bilinear term gets rid of the convexity that was inherent to l.m.i., allowing such a constraint to describe nonconvex sets and therefore suggesting the much greater complexity of this problem. Another way to get an insight into this complexity lies in the expressivity of this framework: any general polynomial inequality can be written as a b.m.i. [44]. It is therefore an NP-hard problem [7].
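As a minimal illustration of this loss of convexity (an example added here only for intuition, not taken from the cited works), take m = n = 1 with scalar data F_0 = 1, F_1 = G_1 = 0 and H_{11} = −1: the b.m.i. reduces to

\[ 1 - x y \;<\; 0, \]

whose feasible set {(x, y) : xy > 1} is the union of two connected components, one with x, y > 0 and one with x, y < 0, hence nonconvex.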


However, recent progress has been made toward the practical solving of b.m.i. problems. We can in particular cite the works of Quoc Tran Dinh, Michiels and Diehl [42, 43], which appear promising, but which we could not test due to the lack of an available implementation. Further developments of these works should nevertheless be followed.

The second recent breakthrough is due to Henrion, Kočvara and Stingl [20, 21]. They outline the fact that a bilinear problem can still contain hidden convexity properties and therefore be “simple” to solve, or can be numerically and efficiently solved through relaxations, as in the case of the s.o.s. decomposition by Parrilo, or the dual theory of moments by Lasserre. Coupled with the observation that ill-behaved instances of convex problems can be hard to solve, they suggest that the convex/non-convex opposition is too simple, and propose an algorithm for non-linear programming. To give the general idea, the method is based on the use of a penalty/barrier function, similar to the logarithmic one exploited in interior-point methods. However, the technical details are at this point out of reach for us. Note finally that this algorithm cannot claim global optimization, but only a local one. Nevertheless, this is by no means disqualifying, since a non optimal invariant may still bring us relevant information.

From our point of view, the huge benefit of this work is the existence of an implementation, as a Matlab solver called PENbmi. PENbmi appears to be the only bilinear solver currently available, and has therefore been the one we used for our experiments.

We stated that we did not manage to go into the technical details of the method. This actually constitutes a real practical difficulty, since the algorithm can require a high degree of expertise to be exploited at its best. Indeed, reformulating the problem or choosing an appropriate initial point can be crucial for its efficiency.

4.4.2 Experiments

As stated above, even using the s.o.s. relaxation, our problem remains part of the b.m.i. framework, due to the product between the Lagrangian multipliers and the parameters of the invariant. Coupled with the asymmetry of relevance between solutions, this peculiarity makes the treatment of invariant inference inherently harder than that of ranking function inference as treated by Cousot [13]. Cousot directly underlined this point in his paper but seemed reasonably optimistic about the efficiency of PENbmi, exhibiting two examples. We therefore started experimenting with this solver.

Let us first consider the following Yalmip program (omitting variable declarations and printing commands for clarity) from [13]. This program encodes the search for linear invariants, of the form a·i + b·j + c ≥ 0, for the program described on Figure 4, p. 16:


(1) F = [a*2+c>=0];
(2) F = F + [sos(a*(i+4)+b*j+c-m*(a*i+b*j+c)), m>=0];
(3) F = F + [sos(a*(i+2)+b*(j+1)+c-l*(a*i+b*j+c)), l>=0];
(4) sol = solvesos(F, [], sdpsettings('solver','penbmi'), [a;b;c;m;l]);

Each line translates to:

(1) P(x) ⇒ I(x), simplified as I(2, 0) ≥ 0

(2) ∃m ≥ 0, I(σ(x)) − m·I(x) is a s.o.s. (first branch)

(3) ∃l ≥ 0, I(σ(x)) − l·I(x) is a s.o.s. (second branch)

(4) Resolution using the PENbmi solver

Depending on the version of PENbmi used, we either get the same invariant as Cousot did:

\[ 2.14678 \cdot 10^{-12}\, i - 3.12793 \cdot 10^{-10}\, j + 0.486712 \;\geq\; 0, \]

or a totally different one:

\[ 1.59925 \cdot 10^{-4}\, i + 4.85516 \cdot 10^{-4}\, j + 2.22665 \cdot 10^{5} \;\geq\; 0 . \]

This is not a surprise, as it simply highlights the lack of control we have at this point over the resulting invariant, only its correctness being guaranteed. However, a more disturbing point should be noted: what relevance can we give to a coefficient of magnitude 10^{-12} returned by a numerical tool? In particular, according to the solvesos documentation, terms smaller than 10^{-6} are discarded as being most likely due to numerical inaccuracy. It is therefore unclear whether the invariant exhibited by Cousot is not simply the trivial relationship 0.497812 ≥ 0.

Anyway, the situation gets even worse on other examples. Reproducing the second example proposed by Cousot [13, page 21], we apparently get the exact same invariant as he did, but along with the following message from PENbmi:

+2.11831e-05*x -2.11831e-05*q*y -2.11831e-05*r >= 0
Stopped by iterations counter. Result may be wrong.
PENBMI failed.

The provided invariant is therefore totally irrelevant. And this does not constitute an isolated case. Consider the following elementary example, also taken from [13], but treated for invariant inference instead of ranking function inference:

j ← 2; k ← 0
while ?? do
  if ?? then
    j ← j + 4
  else
    j ← j + 2
    k ← k + 1
  end if
end while

I = a*j+b*k+c;
F = [2*a+c>=0];
F = [F, sos(a*(j+4)+b*k+c-l1*I), l1>=0];
F = [F, sos(a*(j+2)+b*(k+1)+c-l2*I), l2>=0];
sol = solvesos(F, [], sdpsettings('solver','penbmi'), param);

Under this form, the solver successfully returns an invariant, but a semantically irrelevant one. However, thinking of guiding the solver, one first simple idea could be to normalize the result by enforcing c = 1, either as an additional constraint or directly in the parametric abstraction. However, this simple change, intuitively more likely to simplify the problem by suppressing a variable, now raises a PENbmi error:

Stopped by linesearch failure. Result may be wrong. PENBMI failed.

Multiplying the experiments, trying to deal with examples from [1] or setting up elementary examples, both of the previously described errors arose in the majority of the cases, in an unpredictable way. As a conclusion, PENbmi has shown itself to be, without the extensive expertise that would allow one to cleverly reformulate the problem, unable to deal even with problems as small as two variables and five parameters, with degree 2.

The conclusion of this first stream of experiments is quite unexpectedly one-sided: as things stand, bilinear solvers do not handle our problem under its current form. It is not excluded that, considering how restricted the bilinearity of our problem is, a clever reformulation could allow the use of PENbmi. However, this would require far more expertise in the specific field of optimization under b.m.i. constraints than we have. We therefore decided to look for a heuristic that aims at getting rid of bilinearity.

4.5 Necessity of a heuristic

Experiments tend to show that tools dealing with bilinear matrix inequalities cannot currently handle our issues. Nevertheless, the bilinearity we are dealing with is quite restricted, and only present in the consecution condition:

\[ I_a(\sigma(x)) - \lambda \cdot I_a(x) \]

Fixing the value of λ, coupled with the s.o.s. relaxation, would therefore be sufficient to get down to semidefinite programming. Hence we look for a heuristic aiming at finding an interesting value for this Lagrangian multiplier. The problem nevertheless needs to be stated more precisely. Indeed, to each positive λ is associated a set of admissible a, hence of inferable invariants. We would therefore like to be able to approximate this set for a given λ. Note that despite the restricted bilinearity we are dealing with, a precise answer to this question seems highly unlikely to be found, PENbmi appearing to be quite unable to deal with our constraints.

In order to develop some intuition about it, we first focus on degree two and study some examples. Visualization is easier in this restricted case, since these invariants define conics. As a first illustration, let us consider a parametric invariant I_{a,b} = ax^2 + by^2 − 1, and a loop body executing a simple rotation of π/2. The program and the associated consecution constraint are given on Figure 11.

while true do
  x ← −y
  y ← x
end while

Consecution: a y^2 + b x^2 − 1 − λ(a x^2 + b y^2 − 1) is a s.o.s.

Figure 11: A simple rotation and the induced consecution.
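This example also shows concretely how fixing λ removes the bilinearity: once λ is chosen, the consecution constraint above is linear in (a, b), and the whole problem can be handed to a plain semidefinite solver. A hypothetical YALMIP encoding of this λ-fixing heuristic on the rotation example could look as follows (a sketch only; the solver name and the choice λ = 1 are assumptions made for the sake of illustration):

lambda = 1;                         % value supplied by the heuristic
sdpvar x y a b
I     = a*x^2 + b*y^2 - 1;          % parametric candidate invariant
Inext = a*y^2 + b*x^2 - 1;          % invariant evaluated after the loop body
F = [sos(Inext - lambda*I)];        % consecution, now linear in (a,b)
sol = solvesos(F, [], sdpsettings('solver','sedumi'), [a;b]);

With λ = 1, the constraint amounts to (a − b)(y^2 − x^2) being a s.o.s., which forces a = b and thus recovers the circles mentioned below.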

Note that the quarter of the plane in which the parameters (a, b) lie defines the type of conic in the variables (x, y) we obtain as an invariant of the program: an ellipse if a > 0, b > 0, the whole plane if a < 0, b < 0, and a hyperbola otherwise. Hyperbolae are not invariant under rotation, thus cannot be invariant for our program. Neither can ellipses, except for the particular case of circles, defined by the constraint a = b. The plane is naturally always invariant, but gives us no information on the semantics of our program.

This last remark highlights a hidden difficulty of our problem. At first sight, we could assume that finding a λ whose associated feasible set is non-empty would be sufficient in order to get relevant information about the semantics of the program. In particular, one could try to exploit the optimization description of the Lagrangian relaxation, as described in Section 3.3.2, trying to enforce Slater's conditions⁹ for instance. However, this example shows how insufficient this idea is¹⁰.

Indeed, elementary calculations lead to the following result about the feasibility of the consecution constraint, as summarized on Figure 12:

• if λ ∈ [0, 1[, a ≤ b/λ ∧ a ≥ λb, hence only invariants describing the whole plane are available, for a decreasing set of admissible values of a and b.

⁹ Given a convex programming problem, Slater's condition states that if a strictly feasible point exists (all constraints are strictly satisfied at this point), then the duality gap is tight.

¹⁰ Additionally, note that the way we encode the constraints is also relevant. For instance, assuming our precondition is the unit square, encoding it as four linear inequalities or as two quadratic inequalities does not lead to the same set of solutions.

