
2 Defining the Direction


2.1 Descent and Steepest-Descent Directions

The next definition is motivated by our care to decrease $f$ at each iteration: we want

$$\exists\, t > 0 \text{ such that } f(x_k + td) < f(x_k). \tag{2.1.1}$$

Definition 2.1.1 A descent direction issued from $x$ for the continuously differentiable $f$ is a $d \in \mathbb{R}^n$ such that (recall our notation $s = \nabla f$)

$$\langle s(x), d \rangle < 0. \tag{2.1.2}$$

□

Clearly, if $f$ is a fixed given function, there may exist directions which satisfy the natural property (2.1.1) but not (2.1.2): think of $f(x) := -\|x\|^2$ at $x = 0$, where every $d \neq 0$ satisfies (2.1.1) and no $d$ satisfies (2.1.2). Definition 2.1.1 then appears somewhat artificial. However, one should remember Remark 1.2.3 and the rules of the game for numerical optimization: if $f$ is an arbitrary function compatible with the known information $f(x)$ and $s(x)$ at a given fixed $x$, then (2.1.2) is the only chance to obtain (2.1.1):

Proposition 2.1.2 Let the triple $\{x_k, f_k, s_k\}$ be given in $\mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^n$ and consider the set of functions

$$\Phi_k := \{ f \text{ differentiable at } x_k \,:\, f(x_k) = f_k,\ \nabla f(x_k) = s_k \}\,.$$

Then $d \in \mathbb{R}^n$ satisfies (2.1.1) for any $f \in \Phi_k$ if and only if $\langle s_k, d \rangle < 0$.

PROOF. [if] Take $d$ with $\langle s_k, d \rangle < 0$ and $f$ arbitrary in $\Phi_k$; then

$$f(x_k + td) = f(x_k) + t \langle s_k, d \rangle + o(t)$$

and it suffices to take $t > 0$ small enough to obtain (2.1.1).

[only if] Take $d$ with $\langle s_k, d \rangle \geq 0$ and $\tilde{f} \in \Phi_k$ defined by $\tilde{f}(\cdot) := f_k + \langle s_k, \cdot - x_k \rangle$; then there holds

$$\forall\, t \geq 0 \quad \tilde{f}(x_k + td) = \tilde{f}(x_k) + t \langle s_k, d \rangle \geq \tilde{f}(x_k)\,,$$

so this $d$ cannot satisfy (2.1.1) for all $f \in \Phi_k$. □

Thus, Definition 2.1.1 does appear as the relevant concept for a descent direction.

It implies more than (2.1.1), namely

$$\exists\, \bar{t} > 0 \text{ such that } f(x + td) < f(x) \text{ for all } t \in \,]0, \bar{t}\,]\,,$$

and this makes sense since, in optimization algorithms, the move from $x_k$ to $x_{k+1}$ (and hence $t_k$) is usually quite small (remember Fig. 1.3.1).
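To make this concrete, here is a minimal numerical sketch (our own illustration in Python with NumPy; the quadratic $f$ is an arbitrary choice, not taken from the text): a direction satisfying (2.1.2) does yield descent for all small $t > 0$.

import numpy as np

def f(x):                            # an arbitrary smooth test function
    return x[0]**2 + 3.0 * x[1]**2

def grad_f(x):                       # its gradient s = grad f
    return np.array([2.0 * x[0], 6.0 * x[1]])

x = np.array([1.0, 1.0])
d = np.array([-1.0, 0.0])            # a candidate direction
slope = grad_f(x) @ d                # directional derivative <s(x), d>
assert slope < 0                     # so d is a descent direction, (2.1.2)
for t in [1e-1, 1e-2, 1e-3]:         # and f decreases for small steps, (2.1.1)
    assert f(x + t * d) < f(x)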

A descent direction in the sense of Definition 2.1.1 is one along which not only does $f$ decrease, but it does so at a non-negligible rate, i.e. the decrease in $f$ is proportional to the move from $x$. This rate of decrease, precisely, is the number $\langle s(x), d \rangle$, the directional derivative of $f$ at $x$ in the direction $d$ (see Remark 1.4.1.4). It is the derivative at $0$ of the univariate function $t \mapsto f(x + td)$ and it measures the above-mentioned progress that is made locally when moving away from $x$ in the direction $d$. It is then a natural idea to choose $d$ so as to make this number as negative as possible, a concept which we now make precise:

Definition 2.1.3 Let $|||\cdot|||$ be a norm on $\mathbb{R}^n$. A normalized steepest-descent direction of $f$ at $x$, associated with $|||\cdot|||$, is a solution of the problem

$$\min\,\{ \langle s(x), d \rangle \,:\, |||d||| = 1 \}\,. \tag{2.1.3}$$

A non-normalized steepest-descent direction is a $d \neq 0$ such that $|||d|||^{-1} d$ is a normalized steepest-descent direction. □

Problem (2.1.3) does have optimal solutions because the (continuous) function $\langle s(x), \cdot \rangle$ attains its minimum on the (compact) boundary of the unit ball; it may have several solutions (see §2.2 below). To characterize these solutions, the results of Chap. VII are needed. For our present purpose, however, it suffices to display them graphically, which is done in Fig. 2.1.1: for given $K \in \mathbb{R}$, the locus of those $d$ having $\langle s(x), d \rangle = K$ is an affine hyperplane $D_K$ orthogonal to $s(x)$; the optimal solutions are obtained for $K$ as small as possible, i.e. when $D_K$ is as far as possible in the direction of $-s(x)$, yet touching the unit ball.

Fig. 2.1.1. Homothety in the steepest-descent problem
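Problem (2.1.3) can also be explored numerically. The sketch below (our own illustration; the helper name and the sampling scheme are assumptions, not the book's method) approximates a normalized steepest-descent direction for an arbitrary norm by sampling the boundary of its unit ball; for the Euclidean norm the result approaches $-s(x)/\|s(x)\|$.

import numpy as np

def steepest_descent_direction(s, norm, n_samples=100_000, seed=0):
    # Brute-force approximation of (2.1.3): minimize <s, d> over the
    # directions d with norm(d) = 1, obtained by normalizing random samples.
    rng = np.random.default_rng(seed)
    best_d, best_val = None, np.inf
    for _ in range(n_samples):
        d = rng.standard_normal(s.shape)
        d /= norm(d)                 # project onto the unit sphere of the norm
        val = s @ d
        if val < best_val:
            best_d, best_val = d, val
    return best_d

s = np.array([2.0, -1.0])            # a gradient s(x), chosen arbitrarily
d = steepest_descent_direction(s, np.linalg.norm)
print(d, -s / np.linalg.norm(s))     # the two should nearly coincide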

Remark 2.1.4 Figure 2.1.1 displays the need for a normalization in Definition 2.1.3: without its constraint, problem (2.1.3) would have no solution (or, rather, a solution "at infinity", with a directional derivative "equal to $-\infty$") and the concept would not make sense. A norm $|||\cdot|||$, which does not have to be the Euclidean norm $\|\cdot\| = \langle \cdot, \cdot \rangle^{1/2}$, must be specified when speaking of a steepest-descent direction.

This also entails the artificial introduction of the number 1 in (2.1.3). It should be noted, however, that the particular value "1" is irrelevant insofar as a steepest-descent direction is of interest regardless of its length. Looking at Fig. 2.1.1 with different glasses, we observe that collinear solutions are obtained if, $K$ being kept fixed, say $K = -1$, the radius of the unit ball is changed so as to become as small as possible, yet touching $D_{-1}$. In other words, (2.1.3) and

$$\min\,\{ |||d||| \,:\, \langle s(x), d \rangle = -1 \} \tag{2.1.4}$$

have collinear solutions. This property is due to the homothety in Fig. 2.1.1: the functions $\langle s(x), \cdot \rangle$ and $|||\cdot|||$ are positively homogeneous of degree 1. This remark explains the important property that replacing "1" by $K > 0$ in (2.1.3) or (2.1.4) would just multiply the set of optimal solutions by $K$. Within a descent algorithm, this multiplication would be cancelled out by the line-search and, finally, the only important definition for our purpose is that of non-normalized (steepest-descent) directions. □
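As a concrete special case (a standard computation, added here for illustration): with the Euclidean norm $|||\cdot||| = \|\cdot\|$ and $s = s(x) \neq 0$, the Cauchy-Schwarz inequality shows that (2.1.3) is uniquely solved by $d = -s/\|s\|$, while (2.1.4) is uniquely solved by $d = -s/\|s\|^2$. The two are indeed collinear, differing only by the positive factor $\|s\|$.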

The choice of the norm in (2.1.3) or (2.1.4) is of fundamental importance for practical efficiency, and we will divide our study into two parts, according to this choice. Afterwards, we will study the conjugate-gradient method, which is based on a different principle.

2.2 First-Order Methods

The first possibility for the norm in (2.1.3) is a choice a priori, independent of $f$. Classically, there are two such choices: the $\ell_1$ norm and the Euclidean norm.

(a) One Coordinate at a Time The $\ell_1$ norm is

$$|||d||| = \|d\|_1 := \sum_{i=1}^{n} |d^i| \tag{2.2.1}$$

(here and in what follows, $\mathbb{R}^n$ is assumed to have a basis, in which $z^i$ is the $i$-th coordinate of a vector $z$, and the natural dot-product is used). Figure 2.2.1 particularizes Fig. 2.1.1 to this norm.


Fig. 2.2.1. An $\ell_1$-steepest-descent direction

It clearly indicates the following characterization of an optimal $d_k$ (which will be confirmed in Chap. VII): let $i_k$ be an index such that

$$|s^{i_k}(x_k)| \geq |s^{i}(x_k)| \quad \text{for } i = 1, \ldots, n$$

(note that there may be several such $i_k$; just choose one); then the $n$ numbers

$$d_k^i = \begin{cases} 0 & \text{if } i \neq i_k \\ -\,\dfrac{s^{i_k}(x_k)}{|s^{i_k}(x_k)|} & \text{if } i = i_k \end{cases} \tag{2.2.2}$$

make up an optimal direction. In other words: among the solutions of (2.1.3) with the $\ell_1$ norm (2.2.1), there is one of the vectors of the basis of $\mathbb{R}^n$ (neglecting its sign, chosen so as to obtain a descent direction), namely one corresponding to a maximal coordinate of the gradient.
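In code, this characterization is a one-liner. The following is a minimal NumPy sketch of (2.2.2) (the helper name is ours; ties among maximal coordinates are broken arbitrarily by argmax, which matches "just choose one" above):

import numpy as np

def l1_steepest_descent_direction(s):
    # Sketch of (2.2.2): the direction is minus a basis vector, at a
    # coordinate where |s^i| is maximal; argmax picks one such index i_k.
    i_k = int(np.argmax(np.abs(s)))
    d = np.zeros_like(s)
    d[i_k] = -np.sign(s[i_k])        # equals -s^{i_k}/|s^{i_k}| when s^{i_k} != 0
    return d

s = np.array([0.5, -3.0, 1.0])       # a gradient s(x_k), chosen arbitrarily
print(l1_steepest_descent_direction(s))   # -> [ 0.  1.  0.]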

Remark 2.2.1 Under these conditions, $x_{k+1}$ is obtained from $x_k$ by changing only one coordinate, namely one which locally changes $f$ most. The resulting scheme has an interpretation in terms of a traditional method, the method of Gauss-Seidel, which we briefly describe now.

To solve the linear system

$$Qx + b = 0\,, \tag{2.2.3}$$

this method consists of choosing at iteration $k$ one of the equations, say the $i_k$-th, and of solving this equation with respect to the single variable $x^{i_k}$, the other "unknowns" $x^j$ being set to the (known) coordinates of the current $x_k$. In other words, the whole vector $x_{k+1}$ is just $x_k$, except for its $i_k$-th coordinate, which is set to the value $\alpha \in \mathbb{R}$ solving the $i_k$-th equation in (2.2.3).
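A minimal NumPy sketch of one such update (our own illustration: the $2 \times 2$ test system and the cyclic choice of $i_k$ are assumptions made for the example; the scheme described above would instead pick $i_k$ from a maximal gradient coordinate):

import numpy as np

def gauss_seidel_step(Q, b, x, i_k):
    # Solve the i_k-th equation of Qx + b = 0 for the single unknown
    # x^{i_k}, all other coordinates kept at their current values.
    x_new = x.copy()
    x_new[i_k] = 0.0                       # exclude the diagonal term from the sum
    x_new[i_k] = -(b[i_k] + Q[i_k] @ x_new) / Q[i_k, i_k]
    return x_new

Q = np.array([[4.0, 1.0],                  # symmetric positive definite,
              [1.0, 3.0]])                 # so the cyclic sweep converges
b = np.array([-1.0, -2.0])
x = np.zeros(2)
for k in range(20):
    x = gauss_seidel_step(Q, b, x, k % 2)  # cycle through the equations
print(x, Q @ x + b)                        # residual should be near 0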
