The face simplex method in the cutting plane framework
Cesar Beltran
April 11, 2003
Abstract
In order to maximize a piecewise affine concave function one can basically use the simplex method or an interior point method. As an alternative, we propose the face simplex method. The vertex-to-vertex scheme of the simplex method is replaced by a more general face-to-face scheme in the face simplex method. To improve the current iterate, in the face simplex method, one computes the steepest ascent on the current face of the objective function graph and then an exact linesearch determines the next iterate. This new procedure can be used in the cutting plane framework as a substitute for the simplex method. As a preliminary numerical test, this new version of the cutting plane method is compared with three other methods: subgradient, Kelley cutting plane and ACCPM.
Key words. Nonsmooth optimization, cutting plane methods, Lagrangian relaxation, subgradient method, steepest ascent method.
1 Introduction
In smooth unconstrained optimization the gradient method and the Newton method represent the two basic optimization philosophies. If we wish to maximize the smooth function $f_s$, as is well known, in the gradient method one chooses the steepest direction and picks the best point along this direction. At each iteration the gradient method uses only a slice of the graph of $f_s$ as the information source to compute the step length. On the other hand, the Newton method first constructs $g_s$, a second order model of $f_s$ based at the current iterate, and then moves to the maximizer of $g_s$. Therefore, the difference between the two methods lies in the level of information about $f_s$ used at each iteration. As expected, low level of information methods such as the gradient method produce optimization algorithms with a low computational burden but may suffer from slow convergence. With high level of information methods, such as the Newton method, the situation is symmetrical: a high rate of convergence but a high computational burden. The choice between a low and a high level of information method is problem dependent.

* cesar.beltran@hec.unige.ch, Logilab, HEC, University of Geneva, Switzerland. This work has been partially supported by the Fonds National Suisse de la Recherche Scientifique, subsidy FNS 1214-057093.99, and by the Spanish government, MCYT subsidy dpi2002-03330.
In nonsmooth unconstrained optimization we can also distinguish between low and high level of information methods. In the low level of information family we find the subgradient method and the steepest ascent method, the nonsmooth version of the gradient method. In the high level of information family we find the cutting plane method. If we wish to maximize the nonsmooth function $f_{ns}$, the cutting plane method constructs and updates at each iteration $g_{ns}$, a model of $f_{ns}$ based on first order information (cutting planes). Different types of cutting plane methods arise depending on how the method determines the next iterate based on $g_{ns}$. In Kelley's cutting plane method [Kel60, WG59] the next iterate is taken as the maximizer of $g_{ns}$. Somehow, we could say that Kelley's cutting plane method is the nonsmooth version of the Newton method. A more sophisticated and effective version of the cutting plane method is the proximal bundle method [HUL96, Kiw88, Fra02], in which a quadratic penalty term is appended to the linear objective of the cutting plane method in order to stabilize its performance. The other improved version of the cutting plane method is the analytic center cutting plane method (ACCPM) [GV99a, GV99c, GV99b], in which the analytic center of the localization set (the hypograph of $g_{ns}$ plus a hyperplane to bound the hypograph) gives the next iterate. In practice, cutting plane methods have a good convergence behavior, with the exception of Kelley's version. However, for large scale problems they may be computationally demanding due to the high amount of information managed at each iteration.

When applied to the maximization of the function $f_{ns}$, the subgradient method [Sho69, Pol69, Sho98], at each iteration, tries to improve the current iterate (in terms of the distance to the optimal set) by following the direction given by a subgradient of $f_{ns}$. If in theory the subgradient method has guaranteed convergence to the optimum, in practice it has no clear stopping criterion which ensures optimality. The other drawback of the subgradient method is that the subgradient direction may not be an ascending direction. The steepest ascent method [HUL96] overcomes this handicap of the subgradient method by computing the minimum norm subgradient, which indeed is an ascending direction. Computing the minimum norm subgradient may not be easy (it depends on the knowledge of $\partial f_{ns}$), so the steepest ascent method has so far not been a very successful method. Nevertheless, some successful direction following heuristic methods [Erl78, BH02] or problem dependent ascent methods [CC90] have been proposed.

The aim of this paper is to introduce the face simplex method and to use it in the cutting plane framework. The objective of the face simplex method is to maximize a (concave) piecewise affine function. Therefore, a direct use of the face simplex method is to maximize the piecewise affine function $g_{ns}$ which arises in the cutting plane framework. The philosophy of the face simplex method is to keep as much as possible the computational lightness of the direction following methods while using the first order information utilized by the cutting plane methods. Employed as a direction following method, the face simplex method first computes the steepest ascent on the current face of the graph of $g_{ns}$ (a concave piecewise affine function) and second, the next iterate is computed by an exact linesearch. Employed in the cutting plane framework, the face simplex method replaces the simplex method as the tool to maximize $g_{ns}$. However, at each iteration of the cutting plane method, the face simplex method is truncated before reaching a Kelley point (a point which maximizes $g_{ns}$). Apparently the points generated in this truncated way remain in a region centered around the optimal set, as happens with the bundle method and with ACCPM.

Figure 1: The pyramid function graph.
In the following example we introduce the face simplex method from an intuitive point of view. The algebra of the face simplex method is developed in section 3, together with its stopping criteria. In section 4, the face simplex method is proposed as a substitute for the simplex method within the cutting plane framework. Finally, section 5 presents the first computational experiences with the face simplex method in the cutting plane framework and its performance when compared to the subgradient method, Kelley's cutting plane method and ACCPM.
Given $x \in \mathbb{R}^n$ and $z \in \mathbb{R}$, to lighten notation we will write $(x, z)$ instead of $\begin{pmatrix} x \\ z \end{pmatrix} \in \mathbb{R}^{n+1}$.

2 Example
To introduce the face simplex method we use $q(x)$, a pyramid shaped function. Let us define the following vectors:
\[ v_1 = (-1, -1)', \quad v_2 = (1, -1)', \quad v_3 = (1, 1)', \quad v_4 = (-1, 1)', \]
the affine functions
\[ q_1(x) = v_1'x = -x_1 - x_2, \quad q_2(x) = v_2'x = x_1 - x_2, \quad q_3(x) = v_3'x = x_1 + x_2, \quad q_4(x) = v_4'x = -x_1 + x_2, \]
and the index set $I = \{1, 2, 3, 4\}$. The pyramid function is defined as $q : \mathbb{R}^2 \to \mathbb{R}$ with $q(x) = \min_{i \in I}\{q_i(x)\}$, whose graph is depicted in Figure 1. The pyramid problem is then
\[ \max_{x \in \mathbb{R}^2} q(x). \tag{1} \]
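For concreteness, the pyramid function is a one-liner in code. A minimal Python sketch (the names V_PIECES, q and subgradient are ours, not from the paper):

    import numpy as np

    # Rows are the subgradients v_i of the four affine pieces q_i(x) = v_i'x.
    V_PIECES = np.array([[-1.0, -1.0],   # q_1(x) = -x1 - x2
                         [ 1.0, -1.0],   # q_2(x) =  x1 - x2
                         [ 1.0,  1.0],   # q_3(x) =  x1 + x2
                         [-1.0,  1.0]])  # q_4(x) = -x1 + x2

    def q(x):
        """Pyramid function q(x) = min_i q_i(x)."""
        return np.min(V_PIECES @ x)

    def subgradient(x):
        """One subgradient of q at x: the gradient of a piece active at x."""
        return V_PIECES[np.argmin(V_PIECES @ x)]

    print(q(np.array([0.0, 0.0])))   # 0.0: the optimal value, at the apex
    print(q(np.array([0.5, 0.2])))   # -0.7: piece q_1 is active here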
Different versions of the subgradient method ($x_{k+1} = x_k + \lambda_k s_k / \|s_k\|$ with $s_k \in \partial q(x_k)$) arise depending on the chosen step length. Thus the basic subgradient method takes $\lambda_k = c/(a + bk)$ [Sho69, Sho98]. One way to improve the very slow convergence of the basic subgradient method is to use the Polyak subgradient method [Pol69, Ber99], which takes $\lambda_k = \gamma_k(\hat{q}_k - q(x_k))/\|s_k\|$, where $\hat{q}_k$ is an estimate of the optimum of $q(x)$ and $\gamma_k \in \,]0, 2]$. In the pyramid problem, by taking $\gamma_k = 1/k$, $\hat{q}_k$ defined as in section 5.2 and with 50 iterations, we obtain a suboptimal solution: $q_{50} = -3.2 \cdot 10^{-3}$ (it is not difficult to see that the optimum is $q^* = 0$). The path followed by the Polyak subgradient iterates is depicted in Figure 2. Note that the Polyak subgradient method improves the basic subgradient method by incorporating dual information ($(\hat{q}_k - q(x_k))/\|s_k\|$) to compute more accurate step lengths. However, the Polyak subgradient method requires an estimate of the optimum to compute $\hat{q}_k$, which in general may not be easy to tune.

Figure 2: The Polyak subgradient path for the pyramid problem.
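As an illustration of the iteration just described, here is a minimal sketch of the Polyak subgradient method on the pyramid problem. Since the optimum $q^* = 0$ is known here, we simply set $\hat{q}_k = 0$ instead of estimating it as in section 5.2:

    import numpy as np

    V = np.array([[-1., -1.], [1., -1.], [1., 1.], [-1., 1.]])
    q = lambda x: np.min(V @ x)              # pyramid function
    sg = lambda x: V[np.argmin(V @ x)]       # a subgradient at x

    x = np.array([0.3, 0.1])                 # arbitrary starting point
    q_hat = 0.0                              # optimum estimate (exact here)
    for k in range(1, 51):
        s = sg(x)
        lam = (1.0 / k) * (q_hat - q(x)) / np.linalg.norm(s)  # Polyak step
        x = x + lam * s / np.linalg.norm(s)                   # subgradient step
    print(q(x))   # small negative value: slow convergence toward q* = 0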
Figure 3: The face simplex method improves the subgradient method.
To end this example, we solve the pyramid problem by the face simplex method. The subgradient method works in the $x$-space ($\mathbb{R}^2$ for the pyramid problem), whereas the face simplex method works in the $(x, z)$-space ($\mathbb{R}^3$ for the pyramid problem), where $z = q(x)$.

Let us consider $G$, the graph of $q(x)$, whose shape is a pyramid, and let $p_1 = (x_1, z_1) \in G$ (see Figure 3). From $p_1$, by following the steepest ascent on the current face of $G$ we obtain $p_2$, the best point of $G$ in this direction. Now the face associated to $p_2$ is an edge, therefore the steepest ascent on the current face follows this edge. Finally, the best point in the current ascent direction is $p_3$, the best point of the pyramid. If $p_3 = (x_3, z_3)$, then $x_3$ maximizes $q(x)$ and the optimum is $z_3 = q(x_3)$.

For the pyramid problem, we reach the optimum by using only two iterations of the face simplex method. The difference with the Polyak subgradient method is that we do not use estimated information about $q(x)$, but have exact knowledge of it. The difference with the cutting plane method is that now, at each iteration, we do not use all the available information about $q(x)$; we only use restricted information (an exact linesearch).
3 The face simplex method
The aim of the face simplex method is to maximize a concave piecewise affine function defined by a finite family of known affine functions. To be more precise, first we define the index set $I = \{1, \ldots, m\}$, the family of affine functions $q_i : \mathbb{R}^n \to \mathbb{R}$ with $q_i(x) = s_i'x - b_i$ for all $i \in I$, and the piecewise affine function $q(x) = \min_{i \in I}\{q_i(x)\}$. Then the problem to be solved is
\[ \max_{x \in \mathbb{R}^n} q(x) \tag{2} \]
whose optimum is assumed finite.
A general method to solve this problem is the steepest ascent method ([HUL96], Vol. 1, Chapter VIII). Given that a subgradient at a nonoptimal point $x_k$ may not be an ascent direction, the steepest ascent method ensures an ascending direction by taking the steepest ascent $d_k$, which can be computed as
\[ d_k = \arg\min\{\tfrac{1}{2}\|s\|^2 : s \in \partial q(x_k)\}. \tag{3} \]
Then an exact linesearch along $d_k$ gives the step length $\lambda_k$, which defines the next point as $x_{k+1} = x_k + \lambda_k d_k$. Unfortunately the steepest ascent method with exact linesearch may fail to converge ([HUL96], Vol. 1, VIII.2.2). To ensure convergence of the steepest ascent method, in the case of a piecewise affine function, one can use the 'step-to-nearest-edge' linesearch ([HUL96], Vol. 1, Theorem VIII.3.4.8). A second handicap of the steepest ascent method is its numerical instability, which may produce a large number of very small steps. In [NBR00] these difficulties are overcome by considering an outer approximation to the differential $\partial q(x_k)$.
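For reference, when $\partial q(x_k)$ is the convex hull of the active subgradients $s_i$, $i \in I_k$ (as it is for a piecewise affine function), problem (3) is a small quadratic program over the hull weights. A sketch using scipy's general-purpose SLSQP solver; this is just one possible way to pose the problem, not the procedure of [HUL96] or [NBR00]:

    import numpy as np
    from scipy.optimize import minimize

    def min_norm_subgradient(S):
        """Minimum norm element of conv{s_1, ..., s_m}, the columns of S.

        Solves min ||S @ lam||^2 s.t. lam >= 0, sum(lam) = 1, which is
        problem (3) when the subdifferential is the hull of the active s_i.
        """
        m = S.shape[1]
        res = minimize(lambda lam: np.sum((S @ lam) ** 2),
                       x0=np.full(m, 1.0 / m),
                       bounds=[(0.0, 1.0)] * m,
                       constraints=[{'type': 'eq',
                                     'fun': lambda lam: lam.sum() - 1.0}],
                       method='SLSQP')
        return S @ res.x

    # At the apex of the pyramid all four s_i are active; the minimum norm
    # subgradient is 0, which correctly signals optimality (d_k = 0 in (3)):
    S = np.array([[-1.0, 1.0, 1.0, -1.0],
                  [-1.0, -1.0, 1.0, 1.0]])
    print(min_norm_subgradient(S))   # approximately [0, 0]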
As an alternative to the steepest ascent method, we propose the face simplex method.
To give a first sketch of this method, we consider the graph of $q(x)$,
\[ G = \{(x, z) : x \in \mathbb{R}^n, \; z = q(x)\} \subset \mathbb{R}^{n+1}, \]
a piecewise affine graph formed by the faces $F_i$ ($i = 1, \ldots, n_F$), which may range from dimension 0 (vertices) and dimension 1 (edges) up to dimension $n$ (facets), i.e. $G = \bigcup_{i=1}^{n_F} F_i$. We also define $F_k$ as the face of minimal dimension that contains $p_k = (x_k, z_k)$, $A_k$ as the smallest affine space containing $F_k$ and $V_k$ as the vector space parallel to $A_k$. $p_k$ is said to be an optimal point of $G$ if and only if $x_k$ is an optimizer of $q(x)$.

The face simplex method ensures an ascending direction at $p_k$ by taking the face steepest ascent, that is, the steepest ascent on $F_k$, or equivalently, the direction in $V_k$ with the steepest slope. It is not difficult to see that the face steepest ascent and the steepest ascent are different concepts. To see that, one can imagine that the current iterate $p_k$ is on an edge of $G$, i.e. the dimension of $F_k$ is one. In such a case, we have only one choice for the face steepest ascent: it must be parallel to the one dimensional vector space $V_k$. On the other hand, the steepest ascent may be a different direction from the direction given by $V_k$, as can be seen in the example given in [HUL96], Vol. 1, VIII.2.2. Furthermore, in the steepest ascent method one has to solve problem (3), which amounts to solving a quadratic programming problem if $\partial q(x_k)$ is polyhedral, as is the case for a piecewise affine function. In contrast, with the face simplex method one can develop an explicit and easy to compute formula to solve the associated quadratic programming problem (see section 3.1). This can be done by computing the search direction on the graph of the objective function.

Note that if the current point $p_k$ is a nonoptimal vertex of $G$, then $F_k$ has dimension zero and it makes no sense to compute the face steepest ascent on it. In this case, the steepest adjacent edge to $p_k$ will define the search direction for the face simplex method (vertex steepest ascent).

Face simplex algorithm (sketch)
Step 1 [Initialization.] Set $k = 1$ and initialize $p_1 = (x_1, z_1) \in G$.

Step 2 [Stopping criteria.] If $p_k$ fulfills any of the stopping criteria, stop.

Step 3 [Direction: case 1.] If $\dim(V_k) > 0$ then compute $d_k$ as the steepest direction within $V_k$.

Step 4 [Direction: case 2.] If $\dim(V_k) = 0$ then compute $d_k$ as the direction parallel to the steepest adjacent edge to $p_k$.

Step 5 [Step length.] Compute $\lambda_k$ by exact linesearch along $d_k$.

Step 6 [Updating.] Compute the next iterate $p_{k+1} = (x_{k+1}, z_{k+1})$, where $x_{k+1} = x_k + \lambda_k d_k$ and $z_{k+1} = q(x_{k+1})$. Set $k \leftarrow k + 1$ and go back to step 2.
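The exact linesearch of step 5 is itself cheap for a piecewise affine objective: $t \mapsto q(x_k + t\,d_k) = \min_i(a_i + t\,c_i)$ is a one-dimensional concave piecewise affine function, so its maximizer over $t \ge 0$ is either $t = 0$ or a crossing point of two affine pieces. A possible sketch (our own helper, assuming the full list of pairs $(s_i, b_i)$ is available and the maximum is finite):

    import numpy as np

    def exact_linesearch(S, b, x, d):
        """Maximize t -> q(x + t d) over t >= 0, q(y) = min_i (S[:,i]'y - b_i).

        phi(t) = min_i (a_i + t c_i) is concave piecewise affine, so its
        (assumed finite) maximum is attained at t = 0 or where two pieces cross.
        """
        a = S.T @ x - b          # piece values at t = 0
        c = S.T @ d              # piece slopes along d
        candidates = [0.0]
        m = len(a)
        for i in range(m):
            for j in range(i + 1, m):
                if c[i] != c[j]:
                    t = (a[j] - a[i]) / (c[i] - c[j])  # crossing of pieces i, j
                    if t > 0:
                        candidates.append(t)
        return max(candidates, key=lambda t: np.min(a + t * c))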
In the following sections we develop the algebra associated to the above face simplex method. We distinguish two cases:
3.1 First case: search direction on a face of positive dimension
In this section we wish to compute the face steepest ascent at a point of $G$ which is neither an optimal point nor a vertex. As usual, if $q(x) = \min_{i \in I}\{q_i(x)\}$, then the hyperplane associated to the function $q_i(x) = s_i'x - b_i$ is said to be active at $p_k = (x_k, z_k)$ if $q_i(x_k) = q(x_k)$. In this case, $\partial q(x_k) \ni s_i$ is said to be active at $x_k$. The face $F_k$ associated to $p_k$ is defined by the intersection of all the active hyperplanes at $p_k$. If $I_k$ is the index-set of active hyperplanes at $p_k$, i.e. $I_k = \{i : q_i(x_k) = q(x_k)\}$, then the smallest affine subspace that contains $F_k$ is $A_k = \{p = (x, z) \in \mathbb{R}^{n+1} : z = q_i(x), \; i \in I_k\}$. Since
\[ z = q_i(x) \iff z = s_i'x - b_i \iff b_i = (s_i', -1)\begin{pmatrix} x \\ z \end{pmatrix} \iff b_i = (s_i', -1)\,p, \]
the associated vector space parallel to $A_k$ is $V_k = \{v \in \mathbb{R}^{n+1} : (s_i', -1)v = 0, \; i \in I_k\}$.
Computing the face steepest ascent on $F_k$ is equivalent to computing the steepest direction within $V_k$. For any $v = (v_1, \ldots, v_{n+1})$ defining the steepest direction in $V_k$, the nonoptimality of $p_k$ ensures that $v_{n+1} \neq 0$. Consequently, we can use $w = (1/v_{n+1})v =: (w_1, \ldots, w_n, 1)$ to define the steepest direction in $V_k$. That is, in order to find the steepest direction within $V_k$, we can fix $v_{n+1} = 1$ and restrict our search to the affine subspace $W_k = \{w : (w, 1) \in V_k\} = \{w \in \mathbb{R}^n : s_i'w = 1, \; i \in I_k\}$.

The slope of $(w, 1) \in V_k$ is $1/\|w\|$. Thus looking for the vector of maximum slope in $V_k$ (the steepest ascent) is equivalent to looking for $w \in W_k$ of minimum norm. Therefore, the problem of finding the face steepest ascent $d_k$ on $F_k$ turns out to be equivalent to the problem
\[ w_k = \arg\min\{\tfrac{1}{2}\|w\|^2 : w \in W_k\} = \arg\min\{\tfrac{1}{2}\|w\|^2 : s_i'w = 1, \; i \in I_k\}. \tag{4} \]
If we define the matrix $S = (s_1, \ldots, s_m)$, where $m = |I_k|$ (without loss of generality we can assume that the active hyperplanes correspond to the first $m$ hyperplanes), and the vector $e' = (1, \ldots, 1) \in \mathbb{R}^m$, problem (4) can be recast as $w_k = \arg\min\{\tfrac{1}{2}\|w\|^2 : S'w = e\}$, whose first order optimality conditions are given by the linear system in $(w, \mu)$:
\[ w + S\mu = 0, \qquad S'w - e = 0. \tag{5} \]
From the first equation in (5), $w = -S\mu$, which substituted in the second equation gives $\mu_k = -(S'S)^{-1}e$. We can write $w_k = S(S'S)^{-1}e$.
So far we have assumed that $S'S$ has complete rank and therefore $(S'S)^{-1}$ exists. Suppose now that $S'S$ has incomplete rank. If $W_k \neq \emptyset$, problem (4) is well-posed and the optimality conditions (5) must have a solution independently of the rank of $S'S$. A solution of (5) can be computed as $\mu_k = -(S'S)^{+}e$, $w_k = S(S'S)^{+}e$, where $(S'S)^{+}$ denotes the pseudo-inverse matrix of $S'S$. (The pseudo-inverse of the $m \times n$ matrix $A$, denoted by $A^{+}$, is the unique $n \times m$ matrix such that $x = A^{+}b$ is the vector of minimum Euclidean length that minimizes $\|b - Ax\|^2$; see [GMW95], Sec. 2.2.5.6.) The above discussion is summarized in the following proposition.

Proposition 3.1 Let $p_k$ be a non optimal point of $G$, $F_k$ its associated face, $S = (s_1, \ldots, s_m)$ the matrix of active subgradients at $p_k$ and $e' = (1, \ldots, 1)$. If $F_k$ is not a vertex of $G$, then the face steepest ascent on $F_k$ is given by $(w_k, 1)$, where $w_k = S(S'S)^{-1}e$ if $S'S$ has full rank and $w_k = S(S'S)^{+}e$ otherwise.
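Proposition 3.1 translates directly into code; a minimal numpy sketch (using pinv, which coincides with $(S'S)^{-1}$ when $S'S$ has full rank, so both cases of the proposition are covered at once):

    import numpy as np

    def face_steepest_ascent(S):
        """Face steepest ascent w_k of Proposition 3.1.

        S is the n x m matrix of active subgradients (columns s_i).
        The ascent direction on the graph G is then (w_k, 1).
        """
        e = np.ones(S.shape[1])
        return S @ np.linalg.pinv(S.T @ S) @ e

    # Pyramid example: on the face where only q_1 is active, S = s_1 and
    # the formula reduces to w = s_1 / ||s_1||^2:
    print(face_steepest_ascent(np.array([[-1.0], [-1.0]])))   # [-0.5 -0.5]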
3.2 Second case: search direction at a non optimal vertex
In the second case we compute the vertex steepest ascent at a non optimal vertex $p_k$. To do this, we adapt the algebra of the simplex method to our case. Any point $(x, z) \in G$ satisfies
\[ z \le q_1(x), \quad z \le q_2(x), \quad \ldots, \quad z \le q_m(x). \tag{6} \]
Furthermore, considering that $q_i(x) = s_i'x - b_i$, there must exist a vector of slacks $y \ge 0$ such that
\[ b_1 = s_1'x - z - y_1, \quad b_2 = s_2'x - z - y_2, \quad \ldots, \quad b_m = s_m'x - z - y_m. \tag{7} \]
Equivalently,
\[ \begin{pmatrix} s_1' & -1 & -1 & & \\ s_2' & -1 & & \ddots & \\ \vdots & \vdots & & & \\ s_m' & -1 & & & -1 \end{pmatrix} \begin{pmatrix} x \\ z \\ y \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}. \tag{8} \]
For convenience we express the matrix of the above linear system as
\[ A = \begin{pmatrix} V & -I_{n+1} & 0 \\ W & 0 & -I_l \end{pmatrix} \tag{9} \]
where
\[ V = \begin{pmatrix} s_1' & -1 \\ \vdots & \vdots \\ s_{n+1}' & -1 \end{pmatrix}, \qquad W = \begin{pmatrix} s_{n+2}' & -1 \\ \vdots & \vdots \\ s_m' & -1 \end{pmatrix}, \tag{10} \]
$I_n$ is the $n \times n$ identity matrix and $l = m - (n + 1)$.
As in the simplex method, we partition the matrix $A$ into $B$ and $N$, i.e. $A = (B \; N)$ with
\[ B = \begin{pmatrix} V & 0 \\ W & -I_l \end{pmatrix} \quad \text{and} \quad N = \begin{pmatrix} -I_{n+1} \\ 0 \end{pmatrix}. \tag{11} \]
Next we present a simple result to be used later.
Lemma 3.1 If $V$ has full rank $n + 1$ then:

a) $B$ also has full rank $m$.

b) \[ B^{-1} = \begin{pmatrix} V^{-1} & 0 \\ WV^{-1} & -I_l \end{pmatrix}. \tag{12} \]

c) \[ B^{-1}N = -\begin{pmatrix} V^{-1} \\ WV^{-1} \end{pmatrix}. \tag{13} \]

Proof. a) Given the triangular block structure of $B$ we have that $|\det(B)| = |\det(V)| \cdot |\det(-I_l)| = |\det(V)|$. b) and c) have a direct proof. $\Box$
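The block formulas of Lemma 3.1 are easy to check numerically; a small sketch with arbitrary random data (the dimensions $n = 3$, $m = 7$ are our choice):

    import numpy as np

    n, m = 3, 7
    l = m - (n + 1)
    rng = np.random.default_rng(0)
    S = rng.standard_normal((n, m))                        # subgradients s_i as columns
    V = np.hstack([S[:, :n + 1].T, -np.ones((n + 1, 1))])  # rows (s_i', -1), i = 1..n+1
    W = np.hstack([S[:, n + 1:].T, -np.ones((l, 1))])      # rows (s_i', -1), i = n+2..m

    B = np.block([[V, np.zeros((n + 1, l))], [W, -np.eye(l)]])
    N = np.vstack([-np.eye(n + 1), np.zeros((l, n + 1))])

    Vinv = np.linalg.inv(V)
    Binv = np.block([[Vinv, np.zeros((n + 1, l))], [W @ Vinv, -np.eye(l)]])
    print(np.allclose(B @ Binv, np.eye(m)))                     # b): True
    print(np.allclose(Binv @ N, -np.vstack([Vinv, W @ Vinv])))  # c): True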
If the current point $(x, z)$ is a vertex of $G$ then the number of active hyperplanes ($y_i = 0$) is at least $n + 1$. We take $n + 1$ of them, say the first $n + 1$, to construct $V$ and the other $l$ hyperplanes to construct $W$. Even if some of the hyperplanes in $W$ are active, from now on we will use the term active only for hyperplanes in $V$ and the term non active for hyperplanes in $W$. In what follows we use the notation $v(i)$ to denote component $i$ of vector $v$.
To continue our analysis we express the linear system (8) as
\[ (B \; N) \begin{pmatrix} x \\ z \\ y_B \\ y_N \end{pmatrix} = b \tag{14} \]
where $y_B \ge 0$ are the slacks corresponding to the non active hyperplanes at the current point (non active slacks) and $y_N = 0$ are the slacks corresponding to the active hyperplanes (active slacks). From (14),
\[ B \begin{pmatrix} x \\ z \\ y_B \end{pmatrix} + N y_N = b \;\Longrightarrow\; \begin{pmatrix} x \\ z \\ y_B \end{pmatrix} = B^{-1}b - B^{-1}N y_N. \tag{15} \]
Now we define $c_B$ as the $(n+1)$-th canonical vector of $\mathbb{R}^m$, i.e. $c_B(i) = 0$ for $i \neq n+1$ and $c_B(n+1) = 1$. We also define $c_N$ as $0_{n+1}$, the zero vector of $\mathbb{R}^{n+1}$, and finally we define the cost vector $c' = (c_B', c_N')$. With this cost vector, $z$ can be expressed as
\[ z = c' \begin{pmatrix} x \\ z \\ y_B \\ y_N \end{pmatrix} = c_B' \begin{pmatrix} x \\ z \\ y_B \end{pmatrix} \tag{16} \]
since $c_N = 0_{n+1}$.

From (15) and (16) we obtain $z$ as a function of $y_N$:
\[ z(y_N) = c_B'B^{-1}b - c_B'B^{-1}N y_N. \tag{17} \]
When moving from the current point $p_k = (x_k, z_k)$ (a vertex) to $p_{k+1} = (x_{k+1}, z_{k+1})$, at least one active hyperplane, associated to some $q_{i_0}(x)$, will become non active, i.e. $y_N(i_0)$ will increase from 0 to a strictly positive value. The variation of $z$ caused by this variation of $y_N$ is studied in the following proposition.
Proposition 3.2 Let $p_k = (x_k, z_k)$ be a vertex of $G$, the graph of $q(x)$, let $V$ be as in (10) and let $y_N$ be the active slacks at $p_k$. Then
\[ z(y_N) = z_k + \sum_{j=1}^{n+1} V^{-1}_{n+1,\,j}\, y_N(j). \tag{18} \]
Proof. From (17), $z(y_N) = c_B'B^{-1}b - c_B'B^{-1}N y_N$. Now, considering that $z_k = z(0) = c_B'B^{-1}b$ and Lemma 3.1 c), we can write
\[ z(y_N) = z_k - c_B'B^{-1}N y_N = z_k + (0_n', 1, 0_l') \begin{pmatrix} V^{-1} \\ WV^{-1} \end{pmatrix} y_N = z_k + V^{-1}_{n+1,\,\cdot}\; y_N = z_k + \sum_{j=1}^{n+1} V^{-1}_{n+1,\,j}\, y_N(j). \qquad \Box \]
Next, with the elements of this section, we derive the search direction designed to improve the current point (a vertex). To do so, let us define $dz_j$ as $\partial z(y_N)/\partial y_N(j)$ evaluated at $y_N(j) = 0$, the partial derivative of $z(y_N)$ with respect to $y_N(j)$ at $y_N(j) = 0$ ($j = 1, \ldots, n+1$). From Proposition 3.2 we have that $dz_j = V^{-1}_{n+1,\,j}$ ($j = 1, \ldots, n+1$) and, considering that $y_N \ge 0$, it is clear that if $dz \le 0$ we cannot improve the current value of $z_k$ and therefore the current point $p_k$ (a vertex) is optimal. Otherwise, there must exist $dz_j > 0$. Defining $J_1 = \{j : dz_j > 0\} \neq \emptyset$, we can improve $z_k$ by increasing any $y_N(j_0)$, $j_0 \in J_1$.

When, for a particular $j_0 \in J_1$, $y_N(j_0)$ is increased from 0 to a strictly positive value, $y_B$ must remain nonnegative. $y_B$ has two components: $y_{B1} = 0$ and $y_{B2} > 0$. When leaving the current point we must ensure that $y_{B1}$ remains nonnegative. From (15),
\[ y_{B1}(y_N) = B^{-1}_{B1}b - (B^{-1}N)_{B1}\, y_N, \]
where the notation $B^{-1}_{B1}$ denotes the rows of $B^{-1}$ corresponding to $y_{B1}$. Given that $y_{B1}(0) = B^{-1}_{B1}b$ and by hypothesis $y_{B1}(0) = 0$, it turns out that $B^{-1}_{B1}b = 0$ and then
\[ y_{B1}(y_N) = 0 - (B^{-1}N)_{B1}\, y_N. \]
Finally, by applying Lemma 3.1 c) we obtain
\[ y_{B1}(y_N) = (WV^{-1})_{B1}\, y_N. \]
Let $Dy_{B1} = (WV^{-1})_{B1}$, formed by the column vectors $Dy_{B1}(j)$, $j = 1, \ldots, n+1$. Given that $y_N(j_0) \ge 0$, to ensure $y_{B1} \ge 0$ we can only use nonnegative columns of $Dy_{B1}$. The index-set of such columns is $J_2 = \{j : Dy_{B1}(j) \ge 0\}$.

The idea to derive the search direction is to increase only the active slack $y_N(j_0)$ which improves the function $z(y_N)$ the most, while restricting our choice of $j_0$ to the nonnegative columns of $Dy_{B1}$; that is, $j_0$ must be in $J_1 \cap J_2$. To be more precise, let $j_0$ be such that
\[ dz_{j_0} = \max_{j \in J_1 \cap J_2} \{dz_j\} \]
and rewrite (15) as
\[ \begin{pmatrix} x(y_N) \\ z(y_N) \\ y_B(y_N) \end{pmatrix} = B^{-1}b - B^{-1}N y_N = \begin{pmatrix} x_k \\ z_k \\ y_k \end{pmatrix} + \begin{pmatrix} V^{-1} \\ WV^{-1} \end{pmatrix} y_N. \]
Taking into account that the new vector $y_N \ge 0$ is equal to $(0, \ldots, y_N(j_0), \ldots, 0)$, it follows that
\[ \begin{pmatrix} x(y_N) \\ z(y_N) \end{pmatrix} = \begin{pmatrix} x_k \\ z_k \end{pmatrix} + V^{-1}_{\cdot\, j_0}\, y_N(j_0), \]
from where
\[ x(y_N) = x_k + y_N(j_0) \begin{pmatrix} V^{-1}_{1\, j_0} \\ \vdots \\ V^{-1}_{n\, j_0} \end{pmatrix} = x_k + \lambda_k d_k. \]
Note that so far we have only computed the search direction
\[ (d_k)' = (V^{-1}_{1\, j_0}, \ldots, V^{-1}_{n\, j_0}). \]
The step length $\lambda_k = y_N(j_0)$ will be calculated by exact linesearch along $d_k$. We summarize the above discussion in the next proposition.
Proposition 3.3 Let $p_k = (x_k, z_k)$ be a non optimal vertex of $G$, the graph of $q(x)$. Then $d_k$, the vertex steepest ascent from $p_k$, can be computed as
\[ d_k = \begin{pmatrix} V^{-1}_{1\, j_0} \\ \vdots \\ V^{-1}_{n\, j_0} \end{pmatrix}, \quad \text{where} \quad V = \begin{pmatrix} s_1' & -1 \\ \vdots & \vdots \\ s_{n+1}' & -1 \end{pmatrix}, \]
$j_0$ is such that $dz_{j_0} = \max_{j \in J_1 \cap J_2}\{dz_j\}$, $dz_j = V^{-1}_{n+1,\,j}$, $J_1 = \{j : dz_j > 0\}$, $J_2 = \{j : Dy_{B1}(j) \ge 0\}$, $Dy_{B1} = (WV^{-1})_{B1}$ and $y_{B1}$ are the null non active slacks.

Proof. At the vertex $p_k$ we have $n + 1$ active hyperplanes with the associated null slacks $y_N = 0$. Above, in the current section, we have seen that we leave $p_k$ by increasing $y_N(j_0)$ while the other $n$ active hyperplanes remain active; that is, following $d_k$ we move along the edge of $G$ associated to $y_N(j_0)$. We have also seen that we choose $j_0$ such that $dz_{j_0} = \max_{j \in J_1 \cap J_2}\{dz_j\}$, where $dz_j = \partial z(0)/\partial y_N(j)$, i.e. we choose the edge with the steepest slope. $\Box$
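In code, Proposition 3.3 is a handful of linear algebra lines; a sketch (the argument active_W_rows, our own bookkeeping device, lists the rows of $W$ whose slacks are null, i.e. $y_{B1}$):

    import numpy as np

    def vertex_steepest_ascent(V, W, active_W_rows):
        """Vertex steepest ascent d_k of Proposition 3.3.

        V: (n+1) x (n+1) matrix with rows (s_i', -1), i = 1..n+1
        W: l x (n+1) matrix with rows (s_i', -1), i = n+2..m
        active_W_rows: indices of rows of W with null slacks (y_B1 = 0)
        Returns d_k, or None when J1 (or J1 intersected with J2) is empty.
        """
        n = V.shape[0] - 1
        Vinv = np.linalg.inv(V)
        dz = Vinv[n, :]                       # dz_j = V^{-1}_{n+1, j}
        Dy_B1 = (W @ Vinv)[active_W_rows]     # rows associated to y_B1
        ok = (dz > 0) & np.all(Dy_B1 >= 0, axis=0)   # J1 and J2 together
        if not ok.any():
            return None                       # in particular J1 empty: optimal
        j0 = np.argmax(np.where(ok, dz, -np.inf))
        return Vinv[:n, j0]                   # (V^{-1}_{1 j0}, ..., V^{-1}_{n j0})'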
In this section we have seen how the computation of the vertex steepest ascent has been inspired by the algebra of the simplex method. However, an important difference with the simplex method must be pointed out. In the simplex method, iterates move from vertex to vertex of the graph $G$. In the face simplex method, iterates move from one face of $G$ to another, since once the search direction has been computed, the next iterate is determined by exact linesearch. For example, assuming that the simplex method and the face simplex method are at the same vertex, and also assuming that the two methods compute the same improving direction (the vertex steepest ascent), the simplex method stops at the next vertex, whereas the face simplex method stops at the best encountered point on $G$ along the improving direction.

3.3 Stopping criteria
The face simplex method uses three different criteria that guarantee optimality of the current point $p_k = (x_k, z_k)$.

Null subgradient stopping criterion: A null subgradient ensures optimality; that is, if $\partial q(x_k) \ni s_k = 0$ then $x_k$ is optimal ([HUL00], Theorem D.2.2.1).

Vertex stopping criterion: The following stopping criterion is tested whenever the current point $p_k$ is a vertex.
Proposition 3.4 Let $p_k = (x_k, z_k)$ be a vertex of $G$ and let $V$ be as in (10). If $V^{-1}_{n+1,\,j} \le 0$, $j = 1, \ldots, n+1$, then $x_k$ is an optimal point of $q(x)$.

Proof. This result is clear from Proposition 3.2, where we saw that
\[ z(y_N) = z_k + \sum_{j=1}^{n+1} V^{-1}_{n+1,\,j}\, y_N(j), \]
that is, $z(y_N) \le z_k$ for all $y_N \ge 0$, considering that by hypothesis $V^{-1}_{n+1,\,j} \le 0$, $j = 1, \ldots, n+1$. $\Box$
Deficient rank stopping criterion: The following stopping criterion is applied to any point $p_k = (x_k, z_k)$ of $G$.
Proposition 3.5 Let $\{s_1, \ldots, s_m\}$ be the set of active subgradients, which define the active hyperplanes at the current point $p_k$, let
\[ S = \begin{pmatrix} s_1 & \cdots & s_m \end{pmatrix}, \qquad S_e = \begin{pmatrix} s_1 & \cdots & s_m \\ -1 & \cdots & -1 \end{pmatrix} \]
and let $V_k = \{v \in \mathbb{R}^{n+1} : (s_i', -1)v = 0, \; i = 1, \ldots, m\} = \{v \in \mathbb{R}^{n+1} : S_e'v = 0\}$ be the vector space associated to $x_k$. If $\mathrm{rank}(S) < \mathrm{rank}(S_e)$, then $x_k$ maximizes $q(x)$.

Proof. Let us suppose that $\mathrm{rank}(S_e) = m$. First let us see that $e_z \perp V_k$, where $e_z' = (0_n', 1)$. By hypothesis $\mathrm{rank}(S) < m$, so there exists $\hat{v} \neq 0$ such that $S\hat{v} = 0$, and then
\[ S_e \hat{v} = \begin{pmatrix} s_1 & \cdots & s_m \\ -1 & \cdots & -1 \end{pmatrix} \hat{v} = \begin{pmatrix} 0 \\ -\sum_{i=1}^{m} \hat{v}(i) \end{pmatrix} = K e_z, \quad \text{where } K = -\sum_{i=1}^{m} \hat{v}(i), \]
so that $S_e v = e_z$ with $v = \hat{v}/K$. Note that $K$ cannot be 0; otherwise $S_e\hat{v} = 0$ with $\hat{v} \neq 0$, which contradicts the fact that $S_e$ has full rank.

If $w \in V_k$, by definition of $V_k$ we have $S_e'w = 0$. Then $e_z'w = v'S_e'w = v'\,0 = 0$ with $v \neq 0$, so $e_z \perp V_k$.

Considering that problem (2) is equivalent to the linear programming problem that maximizes $z = q(x)$ on the graph of $q(x)$, $e_z \perp V_k$ implies that $p_k$ is optimal.

If $\mathrm{rank}(S_e) = p < m$ we can repeat the proof by defining $S$ and $S_e$ from $p$ linearly independent columns of $S_e$. $\Box$
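Both matrix-based stopping tests are cheap rank or sign checks; a sketch (ranks computed numerically with matrix_rank, so a tolerance is implicit):

    import numpy as np

    def deficient_rank_test(S):
        """Proposition 3.5: p_k is optimal if rank(S) < rank(S_e)."""
        Se = np.vstack([S, -np.ones((1, S.shape[1]))])   # append the row of -1's
        return np.linalg.matrix_rank(S) < np.linalg.matrix_rank(Se)

    def vertex_test(V):
        """Proposition 3.4: a vertex is optimal if V^{-1}_{n+1,j} <= 0 for all j."""
        return bool(np.all(np.linalg.inv(V)[-1, :] <= 0))

    # At the apex of the pyramid all four subgradients are active and the
    # deficient rank test fires (rank(S) = 2 < 3 = rank(S_e)):
    S = np.array([[-1.0, 1.0, 1.0, -1.0],
                  [-1.0, -1.0, 1.0, 1.0]])
    print(deficient_rank_test(S))   # True: the apex maximizes q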
3.4 The face simplex algorithm
Section 3 can be summarized in the following algorithm.
Face simplex algorithm
Step 1 [Initialization.] Set $k = 1$ and initialize $p_1 \in G$.

Step 2 [Null subgradient stopping criterion.] If $p_k$ fulfills the null subgradient stopping criterion, $p_k$ is optimal. Stop.

Step 3 [Deficient rank stopping criterion.] Form the matrix of active subgradients at $p_k$, $S = (s_1, \ldots, s_m)$, and
\[ S_e = \begin{pmatrix} s_1 & \cdots & s_m \\ -1 & \cdots & -1 \end{pmatrix}. \]
If $\mathrm{rank}(S) < \mathrm{rank}(S_e)$, $p_k$ is optimal. Stop.

Step 4 [Direction: case 1.] If $\mathrm{rank}(S_e) < n + 1$ then compute the search direction $d_k = S(S'S)^{-1}e$ if $S$ has full rank and $d_k = S(S'S)^{+}e$ otherwise. Go to step 7.

Step 5 [Vertex stopping criterion.] If $\mathrm{rank}(S_e) \ge n + 1$, define $V$ as in (10), compute $dz_j = V^{-1}_{n+1,\,j}$, $j = 1, \ldots, n+1$, and define $J_1 = \{j : dz_j > 0\}$. If $J_1$ is empty, then $p_k$ is optimal. Stop.

Step 6 [Direction: case 2.] Compute the search direction $d_k$: define $W$ as in (10) and, as in section 3.2, let $Dy_{B1} = (WV^{-1})_{B1}$ be the rows of $WV^{-1}$ associated to the null non active slacks ($y_{B1} = 0$). $Dy_{B1}$ is formed by the column vectors $Dy_{B1}(j)$, $j = 1, \ldots, n+1$. Define $J_2 = \{j : Dy_{B1}(j) \ge 0\}$ and compute $j_0$ such that $dz_{j_0} = \max_{j \in J_1 \cap J_2}\{dz_j\}$. Then
\[ d_k = \begin{pmatrix} V^{-1}_{1\, j_0} \\ \vdots \\ V^{-1}_{n\, j_0} \end{pmatrix}. \]

Step 7 [Step length.] Compute $\lambda_k$ by exact linesearch along $d_k$.

Step 8 [Updating.] Compute the next iterate $p_{k+1} = (x_{k+1}, z_{k+1})$, where $x_{k+1} = x_k + \lambda_k d_k$ and $z_{k+1} = q(x_{k+1})$. Set $k \leftarrow k + 1$ and go back to step 2.
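Putting the pieces together, the whole algorithm fits in a compact sketch. The following Python transcription is ours and glosses over ties and degeneracy (a tolerance decides activity), but it reproduces the pyramid example, reaching the apex after two improving steps:

    import numpy as np

    def face_simplex(S, b, x0, tol=1e-9, max_iter=100):
        """Sketch of the face simplex algorithm for max_x min_i (S[:,i]'x - b_i).

        S: n x m matrix of subgradients (columns), b: offsets, x0: start point.
        Illustrative only: ties, degeneracy and unbounded problems are
        handled crudely via the tolerance tol.
        """
        n, m = S.shape
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            vals = S.T @ x - b
            qx = vals.min()
            act = np.where(vals <= qx + tol)[0]           # active pieces I_k
            Sa = S[:, act]
            Se = np.vstack([Sa, -np.ones((1, len(act)))])
            if np.linalg.matrix_rank(Sa) < np.linalg.matrix_rank(Se):
                return x, qx                              # deficient rank: optimal
            if np.linalg.matrix_rank(Se) < n + 1:         # on a face (Prop. 3.1)
                d = Sa @ np.linalg.pinv(Sa.T @ Sa) @ np.ones(len(act))
            else:                                         # at a vertex (Prop. 3.3)
                rows = act[:n + 1]                        # n+1 active rows build V
                Vmat = np.hstack([S[:, rows].T, -np.ones((n + 1, 1))])
                Vinv = np.linalg.inv(Vmat)
                dz = Vinv[n, :]
                rest = np.setdiff1d(np.arange(m), rows)
                Wmat = np.hstack([S[:, rest].T, -np.ones((len(rest), 1))])
                nullW = vals[rest] <= qx + tol            # rows with null slacks
                Dy = (Wmat @ Vinv)[nullW]
                ok = (dz > 0) & np.all(Dy >= -tol, axis=0)
                if not ok.any():
                    return x, qx                          # vertex test: optimal
                j0 = np.argmax(np.where(ok, dz, -np.inf))
                d = Vinv[:n, j0]
            c = S.T @ d                                   # exact linesearch along d
            ts = [0.0]
            for i in range(m):
                for j in range(i + 1, m):
                    if abs(c[i] - c[j]) > tol:
                        t = (vals[j] - vals[i]) / (c[i] - c[j])
                        if t > tol:
                            ts.append(t)
            step = max(ts, key=lambda t: np.min(vals + t * c))
            if step <= tol:
                return x, qx                              # no progress
            x = x + step * d
        return x, np.min(S.T @ x - b)

    # Pyramid problem: the method reaches the apex (0, 0), where q* = 0.
    S = np.array([[-1.0, 1.0, 1.0, -1.0],
                  [-1.0, -1.0, 1.0, 1.0]])
    print(face_simplex(S, np.zeros(4), np.array([0.5, 0.2])))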
Note that in step 4, by using the next lemma, we can check rank($S$) to know whether $(S'S)^{-1}$ exists.

Lemma 3.2 Let $S$ be an $n \times m$ matrix. $S$ has full rank if and only if $S'S$ has full rank.

Proof. First, let us see that if $S$ has full rank, then $S'S$ also has full rank. Let us consider $v \neq 0$. Then, since $S$ has full rank, $Sv \neq 0$. On the other hand, if $S'Sv = 0$ then, defining $w = Sv$, we would have $S'w = 0$ with $w \neq 0$, which contradicts the fact that $S'$ has full rank ($\mathrm{rank}(S) = \mathrm{rank}(S')$).

Second, let us see that if $S'S$ has full rank, then $S$ also has full rank. $Sv = 0$ implies $S'Sv = 0$. Considering that $S'S$ has full rank, $v$ must be equal to 0, which proves that $S$ has full rank. $\Box$