
MERLE, Olivier du, VIAL, Jean-Philippe. Proximal ACCPM, a Cutting Plane Method for Column Generation and Lagrangian Relaxation: Application to the P-Median Problem. 2002.

Available at: http://archive-ouverte.unige.ch/unige:5824

Disclaimer: layout of this document may differ from the published version.


Proximal ACCPM, a cutting plane method for column generation and Lagrangian relaxation: application to the p-median problem

O. du Merle, J.-Ph. Vial

November 18, 2002

Abstract

Proximal ACCPM is a variant of the analytic center cutting plane method, in which a proximal term is added to the barrier function that defines the center. The present paper gives a detailed presentation of the method and of its implementation. Proximal ACCPM is used to solve the Lagrangian relaxation of the p-median problem on two sets of problem instances. Problems of the same collection are tentatively solved with the classical column generation scheme.

Keywords Lagrangian relaxation, column generation, cutting plane method, analytic center, proximal method.

Acknowledgments The work was supported by the Fonds National Suisse de la Recherche Scientifique, grant # 12-57093.99. Part of the work was completed while one of the authors was visiting INRIA, Grenoble. The development of the new version of ACCPM is a collaborative work: F. Babonneau, C. Beltran, O. Epelly, Y. Hachez, O. Péton and C. Tadonki are acknowledged for their contribution. We also wish to thank C. Lemaréchal for stimulating discussions.

1 Introduction

Column generation and Lagrangian relaxation are widespread techniques to compute useful bounds for integer programming problems. The standard solution method consists in formulating the so-called restricted master as a linear programming problem, preferably solved by a Simplex-based method to take advantage of efficient warm start techniques. This approach is known to have severe limits. Not only does the restricted master program become hard to solve because of massive degeneracy, but the dual variables also fail to yield efficient columns.

Air France, Operations Research Department, 1 avenue du Maréchal Devaux, 91550 Paray-Vieille-Poste, France

HEC, Université de Genève, 40 Bd du Pont d'Arve, CH-1211 Geneva, Switzerland.


The way the master produces dual variables must be revisited, and the traditional use of the Simplex algorithm to compute them must be questioned.

Alternative methods have been proposed in the literature, e.g., [19, 11, 20], which enjoy better convergence properties from the theoretical and practical points of view. The method we propose in this paper is a new variant of the analytic center cutting plane method. The new feature is the addition of a proximal term to the logarithmic barrier, hence the name Proximal ACCPM. The method can be viewed as a hybrid between the classical ACCPM and the Bundle method.

A short clarification is in order. Column generation is usually associated with Dantzig-Wolfe decomposition [5], while Lagrangian relaxation is linked with subgradient optimization.

The two methods are considered to be different. However, the two approaches rely on the same oracle. In the column generation environment, the oracle outputs a column that enriches the description of the restricted master; in Lagrangian relaxation, the oracle produces a cutting plane that refines a polyhedral relaxation of the Lagrangian dual. Optimizing the restricted master, or taking a subgradient step, are just two different methods of handling the oracle output. They belong to the realm of convex nondifferentiable optimization. The fundamental issue in this field is to design algorithms that take best advantage of the information output by the oracle.

In this paper, we adopt the point of view of cutting planes, even though the numerical method we adopt to generate the query point is a primal Newton method that is best described in the context of columns and restricted master. This is just a matter of presentation. In that framework, the successive cutting planes build, in the epigraph space, a polyhedral set that approximates the epigraph of the Lagrangian dual function. We name it the localization set, as it contains the set of optimal solutions. The counterpart of Dantzig-Wolfe decomposition is the Cheney-Goldstein-Kelley [4, 15] method, which consists in choosing the minimizing point in the localization set. It is easy to see that this point is an optimal dual solution for the Dantzig-Wolfe restricted master. This strategy has some serious drawbacks. First, it requires that the localization set be inf-compact, i.e., have bounded level sets, something that is usually achieved at the cost of artificial bounds on the variables. Second and foremost, it selects extreme points (in the case of a Simplex-based solver) or points on facets (in the case of an interior point solver), which are usually far too distant and result in non-informative responses from the oracle. Lastly, the computation of this optimal point turns out to be surprisingly difficult on some problems.

One of the most efficient substitutes for the above algorithm is the Bundle method [19]. Very loosely, it could be interpreted as Kelley's cutting plane method in which a proximal term is added to the objective. The proximal term avoids the generation of zig-zagging and chaotic query points. The Bundle method has the additional advantage that convergence can be enforced even with a limited-size bundle of cutting planes. The analytic center cutting plane method [11], ACCPM, is just another scheme for nondifferentiable optimization. Its main feature is that the query point it selects is the analytic center of the localization set1. The analytic center is a much more stable point than the Kelley point: ACCPM can be viewed as an alternative way of regularizing the query points.

Despite good results in application problems (see the survey [13]), ACCPM has some drawbacks.

To be implemented, ACCPM needs, like the classical Simplex-based approach, a compact set, a feature that is obtained by adding artificial box constraints. This requirement is not as damaging as in the classical approach, as the sides of the box are not all approached at once, and if the query point gets too close to a side, the box can be moved away. This strategy was successfully implemented in the previous version of ACCPM. The other difficulty arises from the use of an interior point method to generate the query point. It is known that the efficiency of these methods depends very much on the sparsity pattern of the constraint matrix. In the column generation framework, the matrix of columns is not always sparse, and almost surely, the product of that matrix by its transpose (an object that is crucial in the computation of the Newton direction) is dense. On problems in large dimensions, the interior point computations can kill the method.

1 Actually, the intersection of the localization set with an upper bound constraint on the epigraph variable.

These considerations have motivated us to design a variant of ACCPM that incorporates two important features of the Bundle method. First, we have added a proximal term to the barrier function on the localization set. Second, we have implemented a scheme to reduce the number of columns to be taken into account in the effective definition of the localization set. The idea of inserting a quadratic term in the function that defines the analytic center is not new. It appeared first in [22] and later in [25] in a homogeneous framework. The quadratic term was instrumental in the convergence proof and the complexity estimates; it also seemed useful in the implementation of the homogeneous version, in particular in the solving of variational inequalities [26]. In these two cases, the proximal center remained fixed. In Proximal ACCPM, the proximal center is updated to the best response of the oracle. An update of the proximal center corresponds to a serious step in the Bundle method.

There is a theoretical argument [3] which asserts that the number of necessary cutting planes is bounded, a property that makes the analytic center method polynomial. Unfortunately, the bound is too large to be practical. One has to rely on heuristics to limit the number of cuts and make the method efficient. Common sense suggests that distant cuts are unlikely to play any role in the definition of the localization set near the optimum, and could be discarded. The problem is what is meant by "distant". To remain consistent with the logic of an interior point method, the distance should be evaluated in the local metric of the inverse of the Hessian of the barrier (with the added proximal term). Unfortunately, the computation of distances in that metric requires the solving of a linear system for each cut, an extra operation that is too costly to be efficient. On the other hand, the Euclidean distance of a point to a cut is proportional to the slack variable of the constraint. It is given at no computational cost, but eliminating cuts (or cutting planes) on that basis may sometimes lead to disasters. In Proximal ACCPM, we apply a heuristic rule that aggregates columns that are far away in the Euclidean metric into one column (a surrogate constraint in the dual). By linking the intensity of the elimination with the pattern of serious steps, the heuristic rule has been deemed safe on very diverse problems.

To assess the value of the new method, we apply it to the computation of effective lower bounds for the p-median problem, a classical problem in combinatorial optimization. The choice of this specific problem was dictated by our experience of failures to solve the Lagrangian relaxation with the standard scheme. As is the case for many related problems with covering or partitioning constraints, the master program is degenerate, which contributes to the inefficiency of the classical approach. The numerical results confirm this common view.

The classical approach could only solve the smaller instances in our test-bed, that is, problems of dimension less than 300. We have not attempted to improve the classical approach, for instance by using linear regularizing terms as in [9]. We believe that our results are quite typical and reveal an inherent difficulty of approaches that query the oracle at the optimal point of the master program. By comparison, Proximal ACCPM behaves much better, making it possible to solve in a reasonable time problems of dimension 2500 or higher.

The paper is organized as follows. In section 2, we give a general presentation of polyhedral relaxation methods for convex optimization. In section 3, we define proximal analytic centers where the method queries the oracle. Section 4 proposes a primal Newton method to compute the proximal analytic centers. In section 5 we recall the definition of the p-median problem and introduce the dual problem in the Lagrangian relaxation. The last section deals with numerical experiments.

2 Polyhedral relaxation methods for convex optimization

The general problem we address is the minimization of a nonsmooth convex function on a convex set, with the assumption that both the function and the set are respectively revealed by a first order oracle and by a separation oracle. The canonical problem of interest is

\[
\min\,\{ f(y) = f_1(y) + f_2(y) \mid y \in Y_1 \cap Y_2 \}, \tag{1}
\]
where $f_1$ and $f_2$ are convex functions on $\mathbb{R}^n$ and $Y_i \subset \mathbb{R}^n$, $i = 1, 2$, are convex sets. The function $f_1$ and the set $Y_1$ are revealed by a first order oracle. The function $f_2$ and the set $Y_2$ are given explicitly. In our study on the Lagrangian relaxation of the p-median problem, $f_2(y) = b^T y$ is linear and the set $Y_2$ is made of simple linear constraints, e.g., nonnegativity constraints.

By oracle, we mean a black box procedure that returns the following information at a query point $y$.

Feasibility cut: $y \notin Y_1$; the vector $(a, \gamma) \in \mathbb{R}^n \times \mathbb{R}$ is a separating hyperplane:
\[
a^T(y' - y) + \gamma \le 0, \quad \text{for all } y' \in Y_1, \text{ with } \gamma \ge 0.
\]

Optimality cut: $y \in Y_1$; the vector $(a, \gamma) \in \mathbb{R}^n \times \mathbb{R}$, with $a \in \partial f_1(y)$ and $\gamma = f_1(y)$, is a supporting hyperplane:
\[
a^T(y' - y) + \gamma \le f_1(y'), \quad \text{for all } y'.
\]
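To make the oracle contract concrete, here is a minimal Python sketch (ours, not the paper's code) of a first order oracle for a piecewise linear $f_1(y) = \max_k (g_k^T y + h_k)$, with $Y_1 = \mathbb{R}^n$ so that only optimality cuts arise; the names `G`, `h` and the test instance are illustrative.

```python
import numpy as np

def oracle(G, h, y):
    """First order oracle for f1(y) = max_k (G[k] @ y + h[k]).

    Returns an optimality cut (a, gamma): a is a subgradient of f1 at y
    and gamma = f1(y), so a @ (yp - y) + gamma <= f1(yp) for all yp.
    """
    values = G @ y + h
    k = int(np.argmax(values))        # an active linear piece at y
    return G[k], float(values[k])     # (a, gamma)

rng = np.random.default_rng(1)
G = rng.standard_normal((5, 3))       # 5 linear pieces in R^3
h = rng.standard_normal(5)
y = rng.standard_normal(3)
a, gamma = oracle(G, h, y)
# check the supporting hyperplane property at random test points
for _ in range(100):
    yp = rng.standard_normal(3)
    assert a @ (yp - y) + gamma <= np.max(G @ yp + h) + 1e-12
```

Any cutting plane scheme discussed below only interacts with $f_1$ through such pairs $(a, \gamma)$.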

In many applications, in particular in Lagrangian relaxation, the function $f_1$ is the positively weighted sum of $p$ nonsmooth functions
\[
f_1(y) = \sum_{i=1}^{p} \pi_i f_1^i(y). \tag{2}
\]
Using this property, problem (1) can be written in the format
\[
\begin{array}{rl}
\min & b^T y + \pi^T z \\
\text{s.t.} & f_1^j(y) - z_j \le 0, \quad j = 1, \dots, p, \\
& y \in Y_1 \cap Y_2.
\end{array} \tag{3}
\]


The new problem is an embedding of the original problem in an $\mathbb{R}^n \times \mathbb{R}^p$ dimensional space.

Polyhedral relaxation schemes are based on approximating the nonsmooth components of (1) by increasingly refined polyhedral sets. We intersect the set with $Y_2$ and we add the constraint that the objective achieves at least the best recorded value. If the best value is achieved at $\bar y$, the constraint takes the form
\[
b^T y + \pi^T z \le \bar\theta = b^T \bar y + f_1(\bar y),
\]
where $\pi^T z$ is a surrogate for the nonsmooth objective. The set we described is named the localization set. It is formally written
\[
\mathcal{L}_{\bar\theta} = \{ (y, z) \mid A^T y - E^T z \le c,\ b^T y + \pi^T z \le \bar\theta,\ y \in Y_2 \}. \tag{4}
\]
The matrix $A$ is made of the subgradients of $f_1$; it has $n$ rows (the dimension of $y$) and $m$ columns. In that definition $E$ is a boolean matrix that is constructed as follows: a 1 in row $i$ and column $j$ indicates that the $j$-th column of $A$ is a subgradient of the $i$-th component of $f_1$ in (2). A zero indicates that the cut is not associated with that component. If column $j$ is made exclusively of zeros, the $j$-th column of $A$ is associated with a feasibility cut. In this way, the rows of $E$ are pairwise orthogonal, a feature that may be exploited in the interior point scheme to be discussed in a later section.

Since $Y_2$ is made of linear constraints, there is no loss of generality if we insert them as feasibility constraints in the expression $A^T y - E^T z \le c$. Therefore, we shall no longer mention $Y_2$ explicitly in the rest of the paper.

The basic step, or outer iteration, of a cutting plane method can be summarized as follows.

1. Select a query point in the set of localization.

2. Send the query point to the oracle and retrieve the information.

3. Update the lower and upper bounds and the set of localization.

4. Test termination.

Solution methods differ

• in the selection of the query point,

• in the updating of the localization set,

• and in the termination criterion.

The termination test is usually based on the relative gap. To compute the gap one needs upper and lower bounds for the objective function. The upper bound is given by the oracle: it is the best objective function value obtained at previously generated query points. The computation of a valid lower bound is more involved. The best value would be the minimum of the objective $b^T y + \pi^T z$ over the localization set. This minimum may not exist, in particular when the number of linear pieces in the polyhedral approximation of the set is smaller than the dimension of the space. Even when the minimum exists, it may not be a by-product of the method that computes the query point. One may not want to afford the extra computing time to get that minimum. Nevertheless, it so happens that many methods still produce useful information in the form of dual variables associated with the localization set. We discuss here a general principle to exploit this fact.

We are interested in finding a lower bound for the function $f(y) = f_1(y) + b^T y$. The nonsmooth function $f_1$ is estimated by the polyhedral function $\pi^T z$ with $z$ satisfying $A^T y - E^T z \le c$. A lower bound is given by
\[
\theta = \min\{ b^T y + \pi^T z \mid A^T y - E^T z \le c \}.
\]
Since we may not want to solve this linear programming problem, we may be satisfied with a bound produced by a primal feasible solution. The dual (called primal here) is
\[
\begin{array}{rll}
\max_x & -c^T x & \text{(5a)} \\
\text{s.t.} & Ax = -b, & \text{(5b)} \\
& Ex = \pi, & \text{(5c)} \\
& x \ge 0. & \text{(5d)}
\end{array}
\]
If a point $x^c$ is feasible to (5), then the value at $x^c$ of the objective (5a) is a lower bound for $\theta$, that is
\[
f(y) = b^T y + f_1(y) \ge \theta \ge -c^T x^c.
\]

In the analytic center cutting plane method to be presented in the next section, it turns out that primal solutions (in the sense of problem (5)) satisfying (5b) approximately are a by-product of the computation of the analytic center. Let us explore what can be done with this approximation. More specifically, suppose
\[
b + Ax^c = r \approx 0, \qquad \pi - Ex^c = 0.
\]
Note that it is assumed that the solution satisfies the second equation exactly. Due to the special structure of the matrix $E$, this equation can always be achieved by an appropriate scaling of $x$.

Consider the perturbed problem
\[
\min\{ (b - r)^T y + \pi^T z \mid A^T y - E^T z \le c \}
\]
whose dual is
\[
\max_x \{ -c^T x \mid Ax = r - b,\ Ex = \pi,\ x \ge 0 \}. \tag{6}
\]
By construction $x^c$ is feasible to (6). By duality, we have the bound
\[
f(y^*) - r^T y^* \ge -c^T x^c,
\]
where $y^*$ is an optimal solution of the original problem. Alternatively,
\[
f(y^*) \ge -c^T x^c + r^T y^c + r^T(y^* - y^c) \ge -c^T x^c + r^T y^c - \|r\|\,\delta. \tag{7}
\]
The last inequality follows from Cauchy-Schwarz, and $\delta \ge \|y^* - y^c\|$ is an upper bound on the distance of the current point $y^c$ to the optimal set. The value $\delta$ has to be set by the user in accordance with the problem to be solved. We shall discuss this issue in the section dealing with implementation.


3 Proximal generalized analytic centers

We associate with the localization set a standard (weighted) logarithmic barrier
\[
F(s) = \sum_{i=0}^{m} F_i(s_i) = -\sum_{i=0}^{m} w_i \log s_i,
\]
with $s > 0$ defined by
\[
\begin{aligned}
s_0 &= \bar\theta - (\langle f_2'(\bar y), y \rangle + \pi^T z), \\
s_i &= c_i - (A^T y - E^T z)_i, \quad i \in I = \{1, \dots, m\}.
\end{aligned}
\]
The barrier function is augmented with a proximal term to yield the augmented barrier
\[
G(y, z) = \tfrac12 (y - \bar y)^T Q (y - \bar y) + F(s), \tag{8}
\]
where $Q$ is a positive definite matrix and $\bar y \in \mathbb{R}^n$ a point named the proximal center.

In the interior point literature, problems in which the nonnegativity constraints apply to the slacks of linear inequalities are considered to be in the dual format. Barrier functions of the type $G(y, z)$ are also referred to as dual potentials. We shall use this terminology, despite the confusing fact that our dual potential is applied to our primal problem (3).

The proximal generalized analytic center polyhedral method defines the query point for the nonsmooth oracle as the $y$ component of the solution of
\[
\begin{array}{rl}
\min & G(y, z) = \tfrac12 (y - \bar y)^T Q (y - \bar y) - \sum_{i=0}^{m} w_i \log s_i \\
\text{s.t.} & s_0 + b^T y + \pi^T z = \bar\theta, \\
& s_i + (A^T y - E^T z)_i = c_i, \quad i \in I = \{1, \dots, m\}.
\end{array} \tag{9}
\]

Since $G$ is a self-concordant function, it achieves a minimum value if it is bounded from below.

To prove boundedness, we construct the dual problem and show that it is feasible. To this end we introduce the Lagrangian associated with (9):
\[
L(y, z, s, x) = G(y, z) + x_0 (b^T y + \pi^T z + s_0 - \bar\theta) + x^T (A^T y - E^T z + s - c).
\]

By duality,
\[
\min_{y,z,s}\, \max_x\, L(y, z, s, x) = \max_x\, \min_{y,z,s}\, L(y, z, s, x).
\]
The first order optimality conditions for the inner minimization problem in the right-hand side are
\[
\begin{aligned}
Q(y - \bar y) + b x_0 + A x &= 0, \\
\pi x_0 - E x &= 0, \\
x_i s_i &= w_i, \quad i = 0, \dots, m.
\end{aligned}
\]

Using these equations, we can eliminate $y$ and $z$ in the expression of the Lagrangian to get the potential
\[
H(x_0, x) = \tfrac12 (b x_0 + A x)^T Q^{-1} (b x_0 + A x) + (c - A^T \bar y)^T x + (\bar\theta - b^T \bar y) x_0 - \sum_{i=0}^{m} w_i \log x_i - \sum_{i=0}^{m} w_i (1 - \log w_i) \tag{10}
\]
and the primal problem
\[
\min\{ H(x_0, x) \mid -\pi x_0 + E x = 0 \}. \tag{11}
\]
The potential in (11) is an augmented barrier studied in [25].

It is worth pointing out that the constraints in (11) are easy to satisfy. Recall that $E$ is a special $p \times m$ matrix, of the kind associated with generalized upper bounds in linear programming. To make things precise, let $J_k$, $k = 1, \dots, p$, be the set of indices such that $(E)_{k,i} = 1$, $i \in J_k$. All other entries of $E$ are zeros, and the sets $J_k$ are nonempty and disjoint.

If $Ex = x_0 \pi$, then
\[
\sum_{i \in J_k} x_i = x_0 \pi_k, \quad k = 1, \dots, p.
\]
Given an arbitrary $x > 0$, let $\hat x$ be defined by
\[
\hat x_0 = x_0, \tag{12}
\]
\[
\hat x_i = \frac{x_0 \pi_k}{\sum_{i \in J_k} x_i}\, x_i, \quad i \in J_k,\ k = 1, \dots, p. \tag{13}
\]
By construction $E \hat x = x_0 \pi$. In the derivation of the primal method to be exposed in the next section, we shall assume that the primal constraints $Ex = x_0 \pi$ are always satisfied. This remark also proves that (11) is feasible; so (8) is bounded below. The analytic center is well-defined.

4 Newton’s method: primal version

To work with simpler formulas in problem (11), we make the following substitutions:
\[
(b, A) \to A, \quad (-\pi, E) \to E, \quad \begin{pmatrix} \bar\theta \\ c \end{pmatrix} \to c \quad \text{and} \quad \begin{pmatrix} x_0 \\ x \end{pmatrix} \to x. \tag{14}
\]

The first order optimality conditions of
\[
\min\{ H(x) \mid Ex = 0 \}
\]
are
\[
H'(x) + E^T z = 0, \tag{15}
\]
\[
Ex = 0, \tag{16}
\]
where $z \in \mathbb{R}^p$ is some appropriate vector. We assume the second equation is always satisfied, since it is always possible to scale $x$ by (12) and (13). Assume now $\hat r = -(H'(x) + E^T z) \ne 0$ at the current $x$ and $z$. The Newton direction associated with this system solves
\[
\begin{aligned}
H''(x)\, dx + E^T dz &= \hat r, \\
E\, dx &= 0.
\end{aligned}
\]

The solution is
\[
dz = \left( E [H''(x)]^{-1} E^T \right)^{-1} E [H''(x)]^{-1} \hat r, \tag{17}
\]
\[
dx = [H''(x)]^{-1} (\hat r - E^T dz). \tag{18}
\]


The theory of self-concordant functions tells us that the norm of the gradient of $H$ in the metric of the projected2 Hessian is the right proximity measure to the analytic center. This norm can be equivalently written as the scalar product of the gradient $H'(x)$ with the Newton step $dx$. More precisely, when $H'(x)^T dx \le \eta = \frac{3 - \sqrt{5}}{2}$, the full Newton step ($\alpha = 1$) is feasible and the convergence becomes quadratic.

The step along the search direction $(dx, dz)$ can be determined by solving the one-dimensional search
\[
\alpha_{\text{opt}} = \arg\min_\alpha H(x + \alpha\, dx). \tag{19}
\]

The Newton method can be summarized as

• Select an initial point (x0, z0).

• Basic iteration

1. Compute the Newton step (dx, dz) by (18) and (17).

2. If H0(x)Tdx≤η, terminate.

3. Perform linesearch (19) and update (x, z).
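The two linear solves (17)-(18) at the heart of the basic iteration can be sketched in Python as follows (a generic illustration with a stand-in positive definite $H''(x)$; the names are ours):

```python
import numpy as np

def newton_direction(Hxx, E, rhat):
    """Solve H''(x) dx + E^T dz = rhat, E dx = 0, via (17)-(18)."""
    Hinv_ET = np.linalg.solve(Hxx, E.T)
    Hinv_r = np.linalg.solve(Hxx, rhat)
    dz = np.linalg.solve(E @ Hinv_ET, E @ Hinv_r)    # equation (17)
    dx = Hinv_r - Hinv_ET @ dz                       # equation (18)
    return dx, dz

rng = np.random.default_rng(0)
m, p = 8, 3
B = rng.standard_normal((m, m))
Hxx = B @ B.T + m * np.eye(m)    # stand-in for the positive definite H''(x)
E = rng.standard_normal((p, m))  # full row rank with probability one
rhat = rng.standard_normal(m)
dx, dz = newton_direction(Hxx, E, rhat)
assert np.allclose(E @ dx, 0)                    # direction stays in Ex = 0
assert np.allclose(Hxx @ dx + E.T @ dz, rhat)    # Newton system satisfied
```

In a real implementation one would of course factorize $H''(x)$ once and reuse the factors for both solves, rather than calling a dense solver twice.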

4.1 Computation of the Newton direction

We recall that with our notation
\[
H(x) = \tfrac12 x^T A^T Q^{-1} A x + (c - A^T \bar y)^T x - \sum_{i=0}^{m} w_i \log x_i - \sum_{i=0}^{m} w_i (1 - \log w_i).
\]
Thus the first and second derivatives of $H$ are
\[
\begin{aligned}
H'(x) &= A^T Q^{-1} A x + c - A^T \bar y - w x^{-1}, \\
H''(x) &= A^T Q^{-1} A + W X^{-2}.
\end{aligned}
\]

For practical computation, two cases are in order, depending on the relative sizes of $m$ and $n$. (Recall that $A$ is an $n \times m$ matrix and $m$ is the number of cuts.)

The first case occurs when $m < n$. The first component $A^T Q^{-1} A$ in $H''(x)$ is fixed. It is computed at the first iteration only. Of course, $H''(x)$ changes as $x$ changes from one iteration to the next, making the system different at each iteration. Yet, the fact that the first component does not change allows significant savings in computing time. In a cutting plane scheme, the matrix $A$ is augmented with a matrix $B$ of new cuts. The update of $A^T Q^{-1} A$ is still relatively simple since
\[
(A, B)^T Q^{-1} (A, B) = \begin{pmatrix} A^T Q^{-1} A & A^T Q^{-1} B \\ B^T Q^{-1} A & B^T Q^{-1} B \end{pmatrix}.
\]

The second case applies when $m \ge n$. To control the dimension of the linear system to be solved, we use the Sherman-Morrison-Woodbury rank correction formula
\[
[A^T Q^{-1} A + W X^{-2}]^{-1} = W^{-1} X^2 - W^{-1} X^2 A^T \left( Q + A W^{-1} X^2 A^T \right)^{-1} A W^{-1} X^2.
\]

2 The projection subspace is $Ex = 0$.


The main tasks are to form the matrix $Q + A W^{-1} X^2 A^T$ and to solve linear systems in $u$ of the form
\[
(Q + A W^{-1} X^2 A^T)\, u = d.
\]
The product matrix $A W^{-1} X^2 A^T$ changes at each iteration, as is expected with an interior point method. The important fact is that the real dimension of the systems to be solved is $n \times n$.
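As a sanity check, the rank correction identity above can be verified numerically on a small random instance (our illustration, not the paper's code; $D = W^{-1}X^2$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 9                               # the m >= n case
A = rng.standard_normal((n, m))
Q = np.diag(rng.uniform(1.0, 2.0, n))     # positive definite Q
w = rng.uniform(0.5, 1.5, m)              # barrier weights w_i
x = rng.uniform(0.5, 1.5, m)              # current iterate x > 0
D = np.diag(x**2 / w)                     # W^{-1} X^2

# left-hand side: direct m x m inverse of A^T Q^{-1} A + W X^{-2}
left = np.linalg.inv(A.T @ np.linalg.inv(Q) @ A + np.diag(w / x**2))
# right-hand side: Sherman-Morrison-Woodbury, only an n x n inverse
right = D - D @ A.T @ np.linalg.inv(Q + A @ D @ A.T) @ A @ D
assert np.allclose(left, right)
```

Only the $n \times n$ matrix $Q + A W^{-1} X^2 A^T$ needs to be factorized on the right-hand side, which is the whole point when $m \gg n$.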

4.2 Retrieving the analytic center in the y-space

Let $x$ be an approximate analytic center computed by the Newton scheme. At $x$, the first optimality condition is close to being satisfied:
\[
A^T Q^{-1} A x + c - A^T \bar y - w x^{-1} + E^T z \approx 0.
\]
Let us introduce the variables
\[
y = \bar y - Q^{-1} A x, \qquad s = w x^{-1} > 0.
\]
Let $z$ be the multiplier in (15). We check that the first optimality condition (15) becomes
\[
A^T y - E^T z + s \approx c,
\]
where $z$ is the dual variable in the computation of the Newton direction. Since $s > 0$, $(y, z)$ belongs to the localization set if the above equation is satisfied accurately enough.

4.3 Lower bound

We are interested in generating a lower bound using the information generated in the computation of the Newton direction. Let $(y^c, z^c, x_0^c, x^c)$ be the output of the computation of the generalized analytic center. To this end, we expand the compact notation (14) back into its original constituents.

Let
\[
\tilde x = \frac{x^c}{x_0^c} = \frac{w (s^c)^{-1}}{w_0 (s_0^c)^{-1}} > 0.
\]
We have
\[
\pi - E \tilde x = 0.
\]
Close to the analytic center, the first order optimality condition
\[
Q(y^c - \bar y) + x_0^c b + A x^c = 0
\]
is almost satisfied. Letting $\tau = Q(y^c - \bar y) + x_0^c b + A x^c \approx 0$, we may use the bound (7) with
\[
r = \frac{1}{x_0^c} \left( \tau - Q(y^c - \bar y) \right).
\]


5 Lagrangian relaxation of the p-median problem

5.1 Problem definition

Given $n$ points, one wishes to assign each one of them to exactly one out of $p$ medians. The medians have to be selected among the $n$ points. There is a non-negative cost of assigning a point to a median. The objective is to minimize the total cost of the assignment. The problem is formulated as follows:
\[
\begin{array}{rll}
\min & \sum_{i,j} c_{ij} x_{ij} & \text{(20a)} \\
& \sum_{i} x_{ij} = 1, \quad j = 1, \dots, n, & \text{(20b)} \\
& \sum_{i} y_i = p, & \text{(20c)} \\
& 0 \le x_{ij} \le y_i, \quad j = 1, \dots, n,\ i = 1, \dots, n, & \text{(20d)} \\
& y \in \{0, 1\}^n. & \text{(20e)}
\end{array}
\]
The boolean variable $y_i$ indicates whether the point $i$ is a median. It is easy to see that the continuous variable $x_{ij}$ actually takes the values 0 or 1 only. A value 1 indicates that point $j$ is assigned to median $i$.

Assumption 1 $c_{ij} \ge 0$ and $c_{ii} = 0$.

One can relax (20b) and (20c) to greater-than-or-equal and less-than-or-equal inequalities, respectively.

5.2 Lagrangian relaxation

Upon relaxing (20b) and (20c), one obtains the Lagrangian
\[
L(x, y; u, v) = \sum_{i,j} c_{ij} x_{ij} - \sum_{j} u_j \Big( \sum_{i} x_{ij} - 1 \Big) + v \Big( \sum_{i} y_i - p \Big),
\]
and the domain of $x$ and $y$
\[
F = \prod_{i=1}^{n} F_i = \prod_{i=1}^{n} \{ (x_{i\cdot}, y_i) \mid 0 \le x_{ij} \le y_i,\ j = 1, \dots, n,\ y_i \in \{0, 1\} \}.
\]

The Lagrangian dual problem is
\[
z_{WL} = \max_{u,\ v \ge 0} L(u, v), \tag{21}
\]
where
\[
L(u, v) = \min_{(x, y) \in F} L(x, y; u, v). \tag{22}
\]

Denote
\[
M(x, y; u, v) = \sum_{i=1}^{n} M_i(x, y; u, v) = \sum_{i=1}^{n} \Big( \sum_{j=1}^{n} (c_{ij} - u_j) x_{ij} + v y_i \Big),
\]
and let
\[
M(u, v) = \min_{(x, y) \in F} M(x, y; u, v) = \sum_{i=1}^{n} \Big( \min_{(x_{i\cdot},\, y_i) \in F_i} M_i(x, y; u, v) \Big). \tag{23}
\]
Thus
\[
L(u, v) = M(u, v) + \sum_{j=1}^{n} u_j - p v.
\]

Finally, we can formulate (21) as
\[
\max_{u,\ v \ge 0,\ z} \Big\{ \sum_{j=1}^{n} u_j - p v + z \;\Big|\; -M(u, v) + z \le 0 \Big\}. \tag{24}
\]
The variable $z$ can be viewed as the vertical coordinate in the space of the epigraph of $-M(u, v)$.

The function $M(u, v)$ is the point-wise minimum of a set of linear functions. It is thus a concave piecewise linear function. The function value $M(u, v)$ and an element of the anti-subgradient set are obtained by solving (23). This can be done by direct inspection. Indeed, let $(\bar u, \bar v)$ be fixed. If $y_i = 0$, then set $x_{ij} = 0$ for all $j = 1, \dots, n$. Otherwise choose $x_{ij}$ as follows:
\[
x_{ij} = \begin{cases} 1 & \text{if } c_{ij} - \bar u_j < 0, \\ 0 & \text{otherwise.} \end{cases} \tag{25}
\]
Thus, the minimum in (23) is achieved by setting
\[
y_i = \begin{cases} 1 & \text{if } -\sum_j (c_{ij} - \bar u_j)^- + \bar v < 0, \\ 0 & \text{otherwise,} \end{cases} \tag{26}
\]
where $a^- = \max(0, -a)$ denotes the negative part. From this relation one directly deduces the subgradient of $L$.
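The oracle (25)-(26), together with $L(u,v) = M(u,v) + \sum_j u_j - pv$, amounts to a few vectorized operations. The following Python sketch is ours (the paper's oracle is a C mex file); it returns the dual function value and a subgradient, and the instance used in the weak duality check below is illustrative.

```python
import numpy as np
from itertools import combinations

def pmedian_oracle(C, p, u, v):
    """Evaluate L(u, v) for the p-median Lagrangian dual and a subgradient.

    Per (25)-(26): open median i iff sum_j min(0, c_ij - u_j) + v < 0.
    """
    red = np.minimum(C - u, 0.0)       # (c_ij - u_j) kept where negative
    Mi = red.sum(axis=1) + v           # M_i if y_i = 1
    y = Mi < 0                         # rule (26)
    x = (C - u < 0) & y[:, None]       # rule (25)
    L = Mi[y].sum() + u.sum() - p * v  # M(u, v) + sum_j u_j - p v
    g_u = 1.0 - x.sum(axis=0)          # dL/du_j at the minimizer
    g_v = y.sum() - p                  # dL/dv at the minimizer
    return L, g_u, g_v

# weak duality check on a tiny instance: L(u, v) never exceeds the
# optimal p-median value, computed here by enumeration
rng = np.random.default_rng(3)
pts = rng.uniform(0, 1, (6, 2))
C = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
p = 2
opt = min(C[list(S)].min(axis=0).sum() for S in combinations(range(6), p))
for _ in range(50):
    L, g_u, g_v = pmedian_oracle(C, p, rng.uniform(0, 1, 6), rng.uniform(0, 1))
    assert L <= opt + 1e-9
```

One oracle call costs $O(n^2)$, which is why the overall bottleneck lies in the master problem rather than in the oracle.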

5.3 Linear relaxation

Let us consider now the linear relaxation of the p-median problem:
\[
\begin{array}{rl}
z_L = \min & \sum_{i,j} c_{ij} x_{ij} \\
\text{s.t.} & \sum_{i} x_{ij} = 1, \quad j = 1, \dots, n, \\
& \sum_{i} y_i = p, \\
& 0 \le x_{ij} \le y_i, \quad j = 1, \dots, n,\ i = 1, \dots, n, \\
& 0 \le y_i \le 1.
\end{array} \tag{27}
\]

Theorem 1 The Lagrangian relaxation problem (22) and the linear relaxation problem (27) achieve the same optimal value, i.e., $z_L = z_{WL}$.

Proof: Let us consider the Lagrangian relaxation of the linear relaxation (27). The definition of the Lagrangian dual $\tilde L(u, v)$ is the same as (22), up to the definition of the set $\tilde F$ that is now relaxed to
\[
\tilde F = \prod_{i=1}^{n} \tilde F_i = \prod_{i=1}^{n} \{ (x_{i\cdot}, y_i) \mid 0 \le x_{ij} \le y_i \le 1,\ j = 1, \dots, n \}.
\]
To compute $\tilde L(u, v)$ we have to solve the same problem as (23), with $\tilde F$ instead of $F$. It is easy to check that (25)-(26) is still an optimal solution. This shows that $\tilde L(u, v) = L(u, v)$ and proves the theorem.

The Lagrangian relaxation discussed so far is not the only conceivable one. For instance, one may want to relax (20b) alone and keep (20c) in the constraint set. This relaxation is potentially stronger (see [17]). Actually, this is not so. This alternative relaxation can be shown to be equivalent to the linear relaxation. The proof is similar to the proof of Theorem 1; we briefly sketch it. In the so-called strong relaxation, the feasible set of the oracle is $(x, y) \in F \cap \{ y \mid \sum_{i=1}^{n} y_i \le p \}$. It is easy to show that the extreme points of that set do not contain non-integer $y$'s. As a result, the constraint $y \in \{0,1\}^n$ in $F$ can be relaxed to $y \in [0,1]^n$, as it is in the Lagrangian relaxation of the linear relaxation (27). The proof ends as in Theorem 1.

6 Numerical results

The proximal analytic center cutting plane method based on the primal Newton scheme has been programmed in Matlab 6.5. We have applied it to solve p-median problems of varied configurations. The data for a p-median instance are essentially the cost matrix of assigning an item to a median. Since we are interested in testing the method, we tried to select "difficult" problems, in particular problems with a lot of symmetries in the cost matrix. To this effect, we considered in a first test set cost matrices made of the Euclidean distances3 between regularly spaced points on a square. The second test set is made of plausibly more realistic problems with cost matrices equal to the distances in problems from the TSP library. The numerical experiments have been carried out on a laptop with a Pentium III 800 MHz processor and 256 MB of core memory.

The goals of the numerical experimentation are the following:

1. Check the ability of the Proximal ACCPM to solve the Lagrangian relaxation of p- median problems, in particular when the number of medians varies and when the size of the problem increases.

2. Compare the performance of Proximal ACCPM with the standard column generation scheme, in which the query point is the 'optimal' point in the localization set, that is, the optimal solution of a linear programming problem. The restricted master is solved either with an interior point algorithm or with the Simplex method. The latter allows efficient warm starts.

6.1 Implementation issues

Let us review our choice of parameters, and give a brief description of the heuristics used at various stages.

Oracle To save time and focus on the performance of Proximal ACCPM, we programmed the oracle in C and made it a callable mex file. The cost matrix is kept in core memory.

3 In our experiments, the Euclidean distance is not rounded to the nearest integer.


No special care has been taken to save memory space, for instance by storing only the upper triangular part of this symmetric matrix.

Starting point For each index $i$, we assign to $u_i$ the distance from $i$ to its closest neighbor, namely $u_i = \min_{j \ne i} c_{ij}$. The variable $v$, which is attached to the constraint on the number of medians, is also known to be positive and is given the value 1.

Warm start In the space of the $x$ variables, the new cuts generated by the oracle correspond to new columns and new variables. The warm start issue consists of finding efficient values for the new variables to initiate the Newton method. The general framework for efficient warm starts has been set in [21] and studied in detail in [12]. In this implementation we used a heuristic rule based on the following idea. At the analytic center, the complementarity condition $x_i s_i = w_i$ holds. When new cuts are introduced, the $y$ analytic center remains feasible for the old cuts, but infeasible for the new ones. For the old cuts, we consider that the associated $x$ variables are relevant starting values. For the new cuts, we argue that their associated constraint is likely to be as binding at the new analytic center as the most binding old constraints are at the current analytic center. Since the most binding constraints are those with the smallest slacks, and thus the largest $x$, we assign to each new column a new variable with initial value equal to this largest $x$ value. To enhance the geometric significance of the heuristic, we divide the slack by the norm of the cut to make it the Euclidean distance of the current point to the cut.
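A sketch of this heuristic in Python (our illustration; `norms_old` holds the Euclidean norms of the old cut gradients, used to turn slacks into distances):

```python
import numpy as np

def warm_start(x_old, s_old, norms_old, n_new):
    """Initial values for the variables attached to n_new new cuts.

    Each new variable gets the x value of the most binding old cut,
    where "most binding" means smallest slack divided by cut norm,
    i.e. smallest Euclidean distance of the current point to the cut.
    """
    dist = s_old / norms_old
    x_init = x_old[np.argmin(dist)]     # the largest relevant x value
    return np.concatenate([x_old, np.full(n_new, x_init)])

x0 = warm_start(np.array([0.2, 1.4, 0.6]),
                np.array([3.0, 0.1, 1.0]),
                np.array([1.0, 1.0, 2.0]), n_new=2)
```

Old variables are kept unchanged, in line with the remark that they remain relevant starting values for the old cuts.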

Centering condition We use the standard proximity measure for self-concordant functions, namely, the norm of the gradient4 in the metric of the inverse of the local Hessian. We use 1 as the proximity criterion, a very large value that does not guarantee that the point is in the area of quadratic convergence, but which turns out to be quite efficient in practice.
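This proximity test can be sketched for a generic smooth barrier: compute the Newton decrement, the norm of the gradient in the inverse-Hessian metric, and stop centering once it falls below 1. The projection onto the linear equality constraint mentioned in the footnote is omitted here, so this is an illustrative simplification.

```python
import numpy as np

def newton_decrement(grad, hess):
    """lambda(y) = sqrt(g^T H^{-1} g): the norm of the gradient in the
    metric of the inverse of the local Hessian."""
    return float(np.sqrt(grad @ np.linalg.solve(hess, grad)))

def is_centered(grad, hess, tol=1.0):
    # tol = 1 is the (deliberately loose) criterion used in the paper.
    return newton_decrement(grad, hess) < tol
```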

Proximal center The initial proximal center is the starting point. The proximal center is updated to the query point whenever the oracle returns an objective function value that improves upon the best upper bound5.

Proximal coefficient The Q matrix is the diagonal unit matrix, except for the last component, which is set to 10^{-2}. The reason is as follows: this last term is associated with the constraint on the number of medians, a constraint that outweighs the others. At each outer iteration, the matrix Q is multiplied by a factor ρ that is dynamically updated. The rule is that the coefficient is divided by a certain factor if the oracle produces a solution that improves upon the current upper bound (a serious step), and multiplied by that same factor in the opposite case.
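The update of ρ can be sketched as follows. The paper only states the divide-on-serious-step / multiply-otherwise rule; the factor and the bounds below are placeholders (Section 6.2 mentions an upper bound of 1000, lowered to 1 in one experiment).

```python
def update_rho(rho, serious_step, factor=2.0, rho_min=1e-6, rho_max=1000.0):
    """Divide rho by `factor` after a serious step (the oracle improved
    the best bound), multiply by the same factor otherwise; clamp."""
    rho = rho / factor if serious_step else rho * factor
    return min(max(rho, rho_min), rho_max)
```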

Weight on the objective The constraint of the localization set that bounds the objective has a special status: it points downward in the direction of the epigraph variable, while all other cuts6 are supporting hyperplanes that point upward. When the number of cuts increases, their total weight dominates in (9): the analytic center of the localization set tends to the analytic center of the slice of the localization set passing through the proximal center. A reasonable strategy consists in assigning to the objective cut a weight equal to the total number of generated cuts. This strategy was implemented successfully in an early version of ACCPM. It was not deemed necessary to enforce it in the base experiments reported here; however, it was necessary to use this strategy to solve the most difficult instances of our test-bed.

4 Actually, the norm of the projected gradient, to account for the linear equality constraint.

5 In the parlance of the bundle method, this corresponds to a so-called serious step.

6 We do not consider feasibility cuts in this discussion, since they are absent from the basic formulation of the Lagrangian relaxation of the p-median problem we work with.

Lower bound Recall that the canonical problem (1) is a minimization problem, but we apply the method to the maximization problem (21). In this paragraph we speak of lower bounds with respect to (1), thus implicitly of upper bounds for (21). Lower bounds are computed from (7). In this formula, all terms are well specified, but δ ≥ ‖y_c − ȳ‖ is left to the appreciation of the user. The quality of the bound in (7) strongly depends on the residual r, which we want to be as small as possible. Since the proximal term is an important component of it, we activate the computation of the lower bound when ‖Q(ȳ − y_c)‖ < 10^{-3}. Then we set δ = 10 ‖ȳ − y_c‖. This strategy has been tested to our satisfaction on problems other than those arising from the Lagrangian relaxation of the p-median problem.
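The bound formula (7) itself is not reproduced in this section, so the sketch below only implements the activation logic: it decides when the proximal part of the residual is small enough to make the bound worth computing, and which δ to use (names and the `None` convention are ours).

```python
import numpy as np

def delta_for_bound(Q, y_bar, y_c, threshold=1e-3):
    """Return delta = 10 * ||y_bar - y_c|| when the proximal part of the
    residual is small enough for the bound of (7) to be worth computing,
    otherwise None (skip the bound at this iteration)."""
    d = y_bar - y_c
    if np.linalg.norm(Q @ d) >= threshold:
        return None
    return 10.0 * np.linalg.norm(d)
```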

Convergence criterion We uniformly chose a relative duality gap less than 10^{-6}. This criterion is probably too demanding for practical applications in combinatorial optimization. Since the purpose of our work is to study the behavior of the new solution method for solving the continuous optimization problem of maximizing the Lagrangian dual, we feel it appropriate to impose a more severe convergence criterion.

Column aggregation Common sense asserts that 'old' cuts, or possibly 'distant' cuts, are not useful in the definition of good query points. Experience shows that it is not easy to derive a safe rule to eliminate cuts; reasonable implementations may even lead to disasters. We have adopted the following rule, which seems to work well on a large number of problems. We consider the Euclidean distance7 of a cut to the current analytic center and divide it by the weight8 of the cut. The inverse ξ of that quantity is close to the primal variable value times the norm of the cut. We compute the median of the ξ's and select the values that are less than a certain fraction of the median (in practice we take 0.5 for this fraction). The selected cuts are then aggregated by convex combination. The coefficients of the convex combination are equal to the primal variable values, scaled so that they sum to 1. Finally, the new surrogate cut is given a weight equal to the sum of the weights of the aggregated cuts. This aggregation/elimination process is repeated at iterations that are multiples of a certain factor, chosen to be a given fraction of the dimension of the space.
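Under our reading, this rule can be sketched in numpy as follows; the 0.5 fraction is the one quoted in the text, while the variable names and the exact form of ξ are assumptions.

```python
import numpy as np

def aggregate_cuts(A, b, x, slacks, weights, fraction=0.5):
    """Select cuts whose xi = weight * ||a|| / slack falls below
    `fraction` times the median xi, and replace them by one surrogate
    cut: a convex combination with coefficients proportional to the
    primal values x, carrying the sum of the aggregated weights."""
    norms = np.linalg.norm(A, axis=1)
    xi = weights * norms / slacks            # ~ primal value * cut norm
    keep = xi >= fraction * np.median(xi)
    drop = ~keep
    if drop.sum() < 2:                       # nothing worth aggregating
        return A, b, weights
    lam = x[drop] / x[drop].sum()            # convex combination weights
    a_new = lam @ A[drop]
    b_new = lam @ b[drop]
    A2 = np.vstack([A[keep], a_new])
    b2 = np.append(b[keep], b_new)
    w2 = np.append(weights[keep], weights[drop].sum())
    return A2, b2, w2
```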

Disaggregation of the additive Lagrangian dual We have pointed out that the objective function in (21) is additive, and we know from formulation (3) that this property can be exploited to insert richer information into the localization set at each call of the oracle. This approach is liable to decrease the number of calls to the oracle, and possibly to speed up the process. It turns out that for our class of problems of interest, this strategy does not pay off in terms of CPU time: Proximal ACCPM behaves better in the aggregate mode, where the oracle returns only one global subgradient rather than a subgradient for each component of the objective. With the classical column generation scheme (Kelley's method in the parlance of cutting planes, or Dantzig-Wolfe decomposition in the parlance of column generation), the disaggregation approach does improve performance.

7 That is, the slack divided by the norm of the constraint coefficient vector.

8 Roughly speaking, the weight represents the number of times a cut is repeated. Since a repeated cut is more repellent, the slack at the analytic center is larger than it would be if the cut were not repeated. This is why we divide the distance by the weight.


PbID    Best objective   Relative   Outer   Cuts     Inner    CPU       % CPU
        value            gap        iter.   at end   iter.    total     in oracle

GRID problems
 100    1.198559E+02     9.67E-07     57      27      152       1.03     8.7
 144    2.101109E+02     9.66E-07    134      34      348       2.66    10.1
 196    3.326351E+02     5.09E-07     86      34      220       1.80    16.7
 289    5.978656E+02     9.73E-07    101      47      231       3.20    17.8
 400    9.722806E+02     5.84E-07     84      49      190       2.90    26.9
 529    1.481605E+03     7.51E-07    164      62      398      11.66    23.5
 676    2.139522E+03     9.90E-07    193      74      440      21.28    23.3
 841    2.970793E+03     9.40E-07    232      98      561      34.94    26.0
1024    3.990278E+03     8.14E-07    234     112      553      48.46    28.1
1225    5.223107E+03     7.77E-07    338     127      822      93.55    29.5
1521    7.221080E+03     9.61E-07    348     146      902     132.51    32.9
1849    9.685930E+03     9.70E-07    417     193     1042     241.80    31.7
2025    1.109664E+04     9.84E-07    382     162      961     229.83    36.5
2304    1.347456E+04     9.80E-07    448     237     1111     370.66    34.3
2500    1.522810E+04     9.14E-07    440     248     1095     428.53    34.3
totals                              3658    1650     9026    1624.81

TSP problems
 120    2.156019E+03     8.48E-07    102      44      215       3.68     3.3
 159    7.814660E+04     9.67E-08     86      35      184       1.75     9.7
 225    8.096143E+03     6.32E-10     54      29      136       1.13    20.3
 442    1.660452E+05     1.12E-09     82      37      207       3.33    29.8
 574    1.415617E+05     4.55E-07    157      42      339      11.04    28.1
 724    1.812206E+05     4.58E-07    207      40      449      19.71    31.6
1002    1.263326E+06     1.91E-09     80      39      189       9.85    44.0
1060    1.252142E+06     8.59E-09    112      50      257      16.38    42.4
1173*   3.222846E+05     8.66E-07   1375     959     3906    3015.04     3.5
1332    2.023925E+04     3.40E-07    133      32      299      25.95    50.2
1432    8.502265E+05     3.02E-09    119      58      276      26.34    50.3
1748*   2.990320E+06     7.69E-07    817     226     1763     531.05    26.0
1817*   4.552057E+05     9.49E-07   1070     802     2303    1861.42    10.2
2103    6.889857E+05     9.96E-07    316     106      701     156.72    47.7
2152    5.438122E+05     1.04E-07    196      82      430      98.59    50.5
2319    1.395892E+06     8.38E-07    369      93      775     237.37    45.7
3038    1.213082E+06     2.72E-08    127      59      292     102.79    61.5
totals                              5402    2733    12721    6122.14

* Problems solved with the option of an increasing weight on the objective.

Table 1: Performance on problems with 10 medians

6.2 Numerical investigations with Proximal ACCPM

We first report detailed results on the GRID and TSP problems with 10 medians. We activate the column aggregation/elimination procedure every ⌈n/100⌉ outer iterations, where n is the number of variables in the Lagrangian dual. Table 1 displays the relevant information.

The three problems marked with an asterisk are notably more difficult. To solve them, we dynamically increased the weight on the objective cut to match at any time the total weight of all the other cuts.

We have reported in a column the number of cuts that are present at termination. As the problem size increases and the number of cuts grows, the dominant operation in the inner iterations tends to be the solution of p × p linear systems, where p is the number of cuts: the complexity of this operation grows with the cube of p. Therefore, elimination in the early outer iterations is not as effective as elimination at later stages. It follows that the number of cuts present at termination is a mediocre indicator of the real computational effort. Yet, it is interesting to know the number of linear pieces that are necessary to detect (near) optimality.

From the tables we may draw two interesting conclusions. First, the average number of inner iterations per outer iteration is surprisingly stable across all problems: it is close to 2.2 for the TSP collection and 2.4 for the GRID collection. Even though our warm-start procedure is purely heuristic and does not use the theory developed in [], it appears to be quite efficient.

The second noticeable fact is the magnitude of the oracle's share of the CPU time. It is rather large, despite the fact that the oracle is simple and written in C, while the rest of the method is complex and relies on Matlab to handle the large amount of matrix manipulation.

By and large, the results on the GRID problems are more stable: the number of outer iterations and the total computation time grow quite regularly with the problem size. With the TSP problems the situation is a bit more chaotic: in general those problems are rather easier to solve, but some instances are definitely more difficult.

The results in Table 1 correspond to a certain column elimination/aggregation rate. We have performed a more comprehensive study of the influence of the frequency of column aggregation/elimination. Since a full report of these experiments would be cumbersome, we focus on two factors: the total CPU time and the total number of outer iterations (calls to the oracle). The summary is reported in Table 2. All problems in this series have 10 medians. We have eliminated from the summary in Table 2 the three outliers TSP1173, TSP1748 and TSP1817, because their figures are not in the same range; however, the effect of column aggregation on them is entirely comparable.

Our last investigation pertains to the impact of the number of medians. We chose two problems of average size and let the number of medians take different values. The results are reported in Table 3. The difficulty seems to increase with the number of medians.

We made a few more experiments to assess the impact of a dynamic weight on the objective cut. We said that this strategy made it possible to solve the three difficult problems in the TSP collection. To see whether this strategy always pays off, we applied it to two problem instances, the 'easy' TSP1060 and the 'difficult' TSP1173, and we let the number of medians vary. In this experiment we bounded the maximum value of the proximal parameter by 1 instead of 1000. The results are reported in Table 4. Setting the weight dynamically does not hurt the solving of TSP1060 much, but it does not help either.

In branch and bound applications, the Lagrangian relaxation is mostly used to provide bounds to prune the nodes of the enumeration tree. The quality of the bound should at least match the integrality gap, but there is no clear reason why one should achieve a relative gap of the order we have used to test the efficiency of our method. Thus, it is relevant to study the performance of the algorithm when the precision on the relative gap is relaxed to 10^{-3}. The results are reported in Table 5. The total computing time is divided by 10 on the GRID problems and by 15 on the TSP collection. By comparing objective function values, it is easy to check that the relative gap reported in Table 5 is about twice the true relative gap. This shows that our choice of δ in (7) produces a safe lower bound estimate.

PbID     Computing time                               Outer iterations
         n       n/10    n/20    n/100   n/200        n     n/10   n/20   n/100  n/200

GRID problems
 100     0.71    0.81    0.73    1.03    0.61         42    52     50     57     42
 144     1.03    1.24    1.19    2.66    1.07         55    64     70     134    55
 196     1.64    1.43    1.68    1.80    1.61         69    70     82     86     69
 289     3.24    2.44    2.88    3.20    3.38         88    85     97     101    118
 400     4.38    3.93    3.40    2.90    3.60         85    83     85     84     105
 529     9.92    9.00    7.55    11.66   9.81         117   119    119    164    151
 676     25.11   21.1    20.02   21.28   69.3         166   174    177    193    500
 841     31.91   26.24   27.28   34.94   30.46        165   164    181    232    202
1024     68.96   56.49   45.24   48.46   56.77        218   221    212    234    301
1225     127.84  101.77  99.19   93.55   87.74        271   282    290    338    313
1521     196.28  169.24  139.79  132.51  153.77       301   313    311    348    391
1849     390.55  304.20  265.24  241.80  241.65       382   382    398    417    421
2025     308.2   268.39  242.92  229.83  229.72       327   337    349    382    403
2304     690.87  554.63  461.87  370.66  388.57       450   448    458    448    471
2500     601.37  489.12  464.79  428.53  444.97       404   405    429    440    441
total    2462    2010    1784    1625    1723         3140  3199   3308   3658   3983

TSP problems
 120     2.42    2.87    2.56    3.68    3.68         102   113    115    102    102
 159     2.21    2.11    1.80    1.75    2.22         84    87     94     86     84
 225     1.04    1.24    1.00    1.13    1.09         48    48     48     54     50
 442     3.57    4.50    3.35    3.33    2.71         77    75     78     82     70
 574     14.99   20.37   13.97   11.04   11.08        159   159    162    157    160
 724     25.5    39.34   22.33   19.71   20.42        199   188    190    207    211
1002     11.04   11.18   10.44   9.85    10.53        77    77     77     80     87
1060     20.26   20.31   18.01   16.38   17.13        106   106    108    112    120
1332     33.95   33.99   31.47   25.95   23.54        126   126    126    133    125
1432     35.28   34.71   29.57   26.34   28.17        120   120    116    119    127
2103     214.13  247.87  188.25  156.72  131.54       290   293    294    316    282
2152     144.69  147.33  112.4   98.59   97.46        207   207    193    196    198
2319     346.02  399.13  306.16  237.37  253.6        354   355    369    369    426
3038     112.53  111.07  112.44  102.79  105.2        123   123    123    127    135
total    968     1076    854     715     708          2072  2077   2093   2140   2177

Table 2: Influence of column elimination/aggregation. Problems with 10 medians.
