
fₖ(y) ≥ fₖ(x) + sₖ(y − x)   for all y ∈ ℝ

(a technical point is that, since the limit f is finite at x by assumption, fₖ(x) is necessarily also finite for k large enough).

Counter-examples to the converse inclusion in (4.3.2) are known even in classical differential calculus, for instance x ↦ fₖ(x) = √(x² + 1/k): when k → +∞, fₖ converges (even uniformly) to |x|, and

Dfₖ(0) = 0 → 0 ∈ [−1, +1] = ∂(|·|)(0).
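The mechanism is elementary: Dfₖ(x) = x/√(x² + 1/k), so Dfₖ(0) = 0 for every k, and the limit of the derivatives at 0 is the single point 0; it recovers only one element of ∂(|·|)(0) = [−1, +1], the subdifferential of the limit function.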

We conclude this section with a simple example: taking f to be a convex function and C a closed interval included in int dom f, consider the minimization problem

inf {f(x) : x ∈ C} .   (4.3.3)

With the help of the indicator function of Example 3.2.4, it can be transformed into the obviously equivalent problem

inf {g(x) : x ∈ ℝ},  where g = f + I_C,

in which the constraint is hidden; formally, it suffices to study unconstrained problems. Furthermore, (4.1.8) tells us that this in turn is equivalent to finding x such that 0 ∈ ∂g(x), which can be further expressed as

−∞ ≤ D₋g(x) ≤ 0 ≤ D₊g(x) ≤ +∞,

or also, in terms of the directional derivative:

g′(x, d) ≥ 0  for all d, or just for d = ±1.

Existence of such a solution is linked to the behaviour of g(x) when |x| → ∞, see §2.3.

We just mention a result emerging from the continuity properties of the half-derivatives: if there exist x₁ and x₂ with x₁ ≤ x₂ and D₊g(x₁) ≤ 0, D₋g(x₂) ≥ 0, then there exists a solution in [x₁, x₂].

Our assumption C ⊂ int dom f enables the use of Proposition 4.3.1: the subdifferential ∂I_C(x) is clearly empty for x ∉ C, {0} for x ∈ int C, and ]−∞, 0] (resp. [0, +∞[) for x at the left (resp. right) endpoint of C. It is then easy to characterize an optimal solution: x solves (4.3.3) if and only if it satisfies one of the three properties below (an illustration follows the list):

- either x ∈ int C and 0 ∈ ∂f(x);

- or x is the left endpoint of C and D₊f(x) ≥ 0;

- or x is the right endpoint of C and D₋f(x) ≤ 0.
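As a simple illustration of these three cases (the data are chosen here only for concreteness), take f(x) = (x − 2)² and C = [0, 1]. Then 0 ∉ ∂f(x) = {2(x − 2)} for x ∈ int C, and at the left endpoint D₊f(0) = −4 < 0; but at the right endpoint D₋f(1) = −2 ≤ 0, so x = 1 solves (4.3.3). Equivalently, with g = f + I_C, ∂g(1) = {−2} + [0, +∞[ = [−2, +∞[ contains 0.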

5 Second-Order Differentiation

First-order differentiation of a convex function f results in the increasing derivatives D₋f(·) and D₊f(·), or, in a condensed way, in the increasing multifunction ∂f.

In view of Lebesgue's differentiation theorem (§A.6), a convex function is therefore "twice differentiable almost everywhere", giving way to some sort of second derivatives. The behaviour of such second derivatives, however, is much less pleasant than that of first derivatives. In a word, anything can happen: they can oscillate, or approach infinity anywhere; their only certain property is nonnegativity.

5.1 The Second Derivative of a Convex Function

First of all, we specify what we mean by "twice differentiability" for a convex function.

Definition 5.1.1 Let f ∈ Conv ℝ. We say that the multifunction ∂f is differentiable at x ∈ int dom f when

(i) ∂f(x) is a singleton {Df(x)} (which is thus the usual derivative of f at x), and

(ii) there is a real number D₂f(x) such that

lim_{h→0} [∂f(x + h) − Df(x)] / h = {D₂f(x)} ,   (5.1.1)

i.e.: ∀ε > 0, ∃δ > 0 such that

|h| ≤ δ and s ∈ ∂f(x + h)  implies  |s − Df(x) − D₂f(x)h| ≤ ε|h| .   (5.1.2)
□
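As a sanity check on this definition (with an example chosen here only for illustration), take the smooth convex function f(x) = x⁴ at x = 1: ∂f(1 + h) = {4(1 + h)³} = {4 + 12h + o(h)}, so ∂f is differentiable at 1 with Df(1) = 4 and D₂f(1) = 12, which is just the classical second derivative.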

Putting s on each endpoint of ∂f(x + h) in (5.1.2), one sees that differentiability of ∂f implies the usual differentiability of D₋f and D₊f at x. Conversely, it is not too difficult to see via Theorem 4.2.1 that differentiability of D₋f implies differentiability of ∂f at x, and of D₊f as well. In a word: differentiability at x of the multifunction ∂f, or of D₋f, or of D₊f, are three equivalent properties.

Note that this differentiability does not force ∂f to be single-valued in a neighborhood of x: indeed, the ∂f of Example 4.1.9 is differentiable at 0, with D₂f(0) = 1. Geometrically, the multifunction ∂f is differentiable when it is as displayed in Fig. 5.1.1: all the possible curves h ↦ s(h) ∈ ∂f(x + h) have the same tangent, of equation s(h) = Df(x) + D₂f(x)h.

[Fig. 5.1.1. Allowed values for a differentiable multifunction (axes: h and ∂f(x + h); the value Df(x) is marked at h = 0).]

In real analysis, a function f has a second derivative ℓ at x if

[Df(x + h) − Df(x)] / h  has a limit ℓ for h → 0;

this means that Df has a first-order development near x:

Df(x + h) = Df(x) + ℓh + o(|h|) .   (5.1.3)

Then f itself has a second-order development near x:

f(x + h) = f(x) + Df(x)h + ½ℓh² + o(h²) .   (5.1.4)

Conversely, it is generally not true that a second-order development of f implies the existence of a second derivative. In the convex case, however, equivalence is obtained if the differentiability definition (5.1.1) is used as a substitute for (5.1.3):

Theorem 5.1.2 Let f ∈ Conv ℝ and x ∈ int dom f. Then the two statements below are equivalent:

(i) ∂f is differentiable at x in the sense of (5.1.1);

(ii) f has a second-order development (5.1.4) at x with ℓ = D₂f(x).

PROOF. [(i) ⇒ (ii)] Given ε > 0, take |h| so small that, for all |u| ≤ |h|,

−ε|u| ≤ s(x + u) − Df(x) − D₂f(x)u ≤ ε|u| .

Integrate from 0 to h to obtain with (4.2.6):

|f(x + h) − f(x) − Df(x)h − ½D₂f(x)h²| ≤ ½εh² .

[(ii) ⇒ (i)] Fix θ arbitrarily in ]0, 1[; develop f(x + h) and f(x + θh) according to (5.1.4) and obtain by subtraction

f(x + h) − f(x + θh) = (1 − θ)Df(x)h + ½ℓ(1 − θ²)h² + o(h²) .

From the mean-value theorem 4.2.4, there is c between x + θh and x + h, and s ∈ ∂f(c), such that

s = [f(x + h) − f(x + θh)] / [(1 − θ)h] .

Applying the definition (5.1.4) to f(x + h) and f(x + θh), we therefore get

s = Df(x) + ½ℓ(1 + θ)h + o(h) .

Now apply the monotonicity property (4.2.1): assuming for example h > 0,

∂f(x + θh) ≤ s ≤ ∂f(x + h)   (5.1.5)

so that we obtain

[∂f(x + θh) − Df(x)] / (θh) ≤ [s − Df(x)] / (θh) = ½ℓ (1 + θ)/θ + o(h)/h ,

[∂f(x + h) − Df(x)] / h ≥ [s − Df(x)] / h = ½ℓ(1 + θ) + o(h)/h .

If h < 0, inequalities are reversed in (5.1.5) but the division by h reproduces the same last two inequalities.

Finally, let h → 0 (θ is still fixed):

lim sup_{h→0} [∂f(x + h) − Df(x)] / h ≤ ½ℓ (1 + θ)/θ ,

lim inf_{h→0} [∂f(x + h) − Df(x)] / h ≥ ½ℓ(1 + θ) .

These inequalities are valid for all θ ∈ ]0, 1[, hence we have really proved that the lim sup and the lim inf are both equal to ℓ. □

Note that the equivalence with the usual second derivative still does not hold: Example 4.1.9 is differentiable in the sense of Theorem 5.1.2 but not in the sense of (5.1.3), since Df(x + h) does not even exist in the neighborhood of 0. On the other hand, the property (5.1.1) appears as a suitable adaptation of (5.1.3) to the case of a "set-valued derivative"; therefore we agree to postulate Definition 5.1.1 as the second differentiability of a convex function. It is clear, for example from (4.1.7), that D₂f is a nonnegative number whenever it exists. Lebesgue's differentiation theorem now says:

Theorem 5.1.3 A function f ∈ Conv ℝ is twice differentiable almost everywhere on the interior of its domain. □

Unfortunately, this kind of second differentiability result does not help much in terms of f. Consider for example a piecewise affine function:

f(x) := max{sⱼx − rⱼ : j = 1, ..., m} .

It has first and second derivatives except at a finite number of points (those where two different affine pieces meet, see Theorem 4.3.2). Its second derivative is 0 wherever it exists, yet f differs substantially from being affine.
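For instance (the data are chosen here only for illustration), take m = 2 with s₁ = −1, r₁ = 0, s₂ = 2, r₂ = 1, i.e. f(x) = max{−x, 2x − 1}: the two pieces meet at x = 1/3, the second derivative exists and equals 0 at every x ≠ 1/3, yet f is far from affine, its slope jumping from −1 to 2.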

Remark 5.1.4 The derivative Df of f ∈ Conv ℝ is locally integrable on the interior I of its domain; as such, it can be seen as a distribution on I: why not consider its differentiation in the sense of distributions, then? A second derivative of f would be obtained, which would be a nonnegative Radon measure; for example, the second derivative of |·| would be (twice) the Dirac measure at 0; the piecewise affine f above would be reconstructed with the sole help of this second derivative.

However, this approach is blind to sets of zero measure; as such, it does not help much in optimization, where one is definitely interested in a designated point (the optimum): for this purpose, a pointwise differentiation is in order. □

5.2 One-Sided Second Derivatives

In Definition 5.1.1, existence of the usual first derivative is required at x, so as to control the difference quotient (5.1.1). However, we can get rid of this limitation; in fact, if h ↓ 0, say, Theorem 4.2.1(iii) tells us that [∂f(x + h) − D₊f(x)] / h is the appropriate difference quotient, and the situation is symmetric for h ↑ 0.

The way is open to "half-second derivatives". From now on, it is convenient to switch to the directional notation of Remark 4.1.4: for given x ∈ int dom f, we fix d ≠ 0 and we set h = td, t > 0. We make appropriate substitutions in (5.1.1), (5.1.3) and (5.1.4) to obtain respectively

lim_{t↓0} [∂f(x + td)d − f′(x, d)] / t ,   (5.2.1)

lim_{t↓0} [s(t)d − f′(x, d)] / t  for any s(t) ∈ [D₋f(x + td), D₊f(x + td)] ,   (5.2.2)

lim_{t↓0} [f(x + td) − f(x) − t f′(x, d)] / (½ t²) .   (5.2.3)

As before, the definitions (5.2.1) and (5.2.2) are just equivalent: if one of the limits exists, the other exists as well and is the same; this is the so-called point of view of Dini. As for (5.2.3) (the point of view of de la Vallée Poussin), equivalence also holds:

Theorem 5.2.1 If one of the limits in (5.2.1)–(5.2.3) exists and is denoted by f″(x, d) (≥ 0), then the other limits exist as well and are equal to f″(x, d).

PROOF. Just reproduce the proof of Theorem 5.1.2, without bothering with the sign of h. □

To illustrate what has been gained in passing from §5.1 to §5.2, take Example 4.1.9 and modify φ by setting φ(u) = 0 for u ≤ 0. Then the new f has the two "half-second derivatives"

f″(0, −1) = 0  and  f″(0, 1) = 1 .
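Since Example 4.1.9 is not reproduced here, a self-contained function exhibiting the same kind of behaviour is f(x) = (max{0, x})², convex and differentiable, with Df(x) = 2 max{0, x}: the classical second derivative does not exist at 0, but f′(0, d) = 0 for d = ±1 and the quotient (5.2.3) gives f″(0, 1) = lim_{t↓0} t² / (½t²) = 2 and f″(0, −1) = 0.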

Still, the limits in (5.2.1)–(5.2.2) may fail to exist, for two possible reasons: the difference quotients may go to +∞, as in f(x + td) = t^{3/2}, or they may have several cluster points. Take again Example 4.1.9: ∂f(0 + t) is squeezed between the curves s = t and s = t/(1 + t), which are tangent to each other at 0. If φ is modified so that this second curve becomes s = t/2, say, then the set of cluster points of the difference quotient (5.2.1) blows up to the segment [1/2, 1].
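For the first phenomenon, the computation is immediate: if f(x + td) = t^{3/2} for small t > 0 (so that f(x) = 0 and f′(x, d) = 0), the quotient in (5.2.3) is t^{3/2} / (½t²) = 2t^{−1/2} → +∞ as t ↓ 0.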

Remark 5.2.2 (Interpretation of Second Difference Quotients) For fixed x and d, consider the family of parabolas, indexed by c ≥ 0, with equations

t′ ↦ p_c(t′) := ½ c t′² + s₀ t′ + f(x) ,  with s₀ = f′(x, d).   (5.2.4)

They are constructed in such a way that p_c(0) = f(x) and p_c′(0) = f′(x, d). Now, fix t > 0 and compute c so as to fit either the slope-value p_c′(t) = s(t)d or the function-value p_c(t) = f(x + td). In the first case, c is given by the difference quotient in (5.2.2) and in the second case by the difference quotient in (5.2.3). Both difference quotients thus appear as an estimate of the "curvature" of f at x in the direction d. □
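For the record, and writing the slope along the direction as s(t)d as in (5.2.2) above, the two fits of Remark 5.2.2 solve explicitly as follows: p_c′(t) = ct + s₀ = s(t)d gives c = [s(t)d − f′(x, d)]/t, while ½ct² + s₀t + f(x) = f(x + td) gives c = [f(x + td) − f(x) − t f′(x, d)]/(½t²), i.e. exactly the quotients in (5.2.2) and (5.2.3).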

5.3 How to Recognize a Convex Function

Given a function defined on an interval I, the question is now: can we decide whether it is convex on I or not? The answer depends on how much information is available:

about the function itself, about its first derivatives (possibly one-sided), or about its second derivatives (or some sort of generalization). We review here the main criteria that are useful in optimization.

(a) Using the Function Itself  Many criteria exist, relying on the definition of f and nothing more. Some of them are rather involved, and most of them are of little relevance in the context of optimization. The most useful attitude is generally to view f as being constructed from other functions known to be convex, via operations such as those of §2.1, and others to be seen in Chap. IV.

At this stage, the criterion 1.1.4 of increasing slopes should not be forgotten: f is convex if and only if the function

Δ₁f(x, x′) := [f(x) − f(x′)] / (x − x′)   (5.3.1)

defined for pairs of different points in I, is increasing in each of its arguments. Note, however, that Δ₁f is a symmetric function of its two variables; hence it suffices that Δ₁f(x, ·) be increasing for each x.
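For instance, with f(x) = x² one finds Δ₁f(x, x′) = (x² − x′²)/(x − x′) = x + x′, which is indeed increasing in each of its arguments; the criterion thus confirms the convexity of x².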

As seen in §3.1, convexity of f on I = [a, b] implies its upper semi-continuity at a and b. Conversely, if f is convex on int I, and upper semi-continuous (relative to I) on the boundary of I, then f is convex on I: just pass to the limit in (1.1.1). We will therefore content ourselves with checking the convexity of a given function on an open interval. Then, checking convexity on the closure of that interval will reduce to a study of continuity, usually much easier.

(b) Using the First Derivative  Passing to the limit in Δ₁f of (5.3.1), one obtains the following result:

Theorem 5.3.1 Let f be continuous on an open interval I and possess an increasing right-derivative, or an increasing left-derivative, on I. Then f is convex on I.

PROOF. Assume that f has an increasing right-derivative D₊f. For x, x′ in I with x < x′ and u ∈ ]x, x′[, there holds

[f(u) − f(x)] / (u − x) ≤ sup_{t ∈ ]x, u[} D₊f(t) ≤ inf_{t ∈ ]u, x′[} D₊f(t) ≤ [f(x′) − f(u)] / (x′ − u)

(the first and last inequalities come from mean-value theorems, in inequality form, for continuous functions admitting right-derivatives). Then (1.1.1) is obtained via a multiplication by x′ − x > 0, knowing that u = αx + (1 − α)x′ for some α ∈ ]0, 1[.

The proof for D₋f is just the same. □

Corollary 5.3.2 Assume that f is differentiable, with an increasing derivative, on an open interval I. Then f is convex on I. □
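As an illustration of the extra generality of Theorem 5.3.1 over Corollary 5.3.2 (the example is chosen here for that purpose), take f(x) = eˣ + |x|: it is continuous, not differentiable at 0, but its right-derivative, equal to eˣ for x < 0 and to eˣ + 1 for x ≥ 0, is increasing on ℝ; hence f is convex by Theorem 5.3.1.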

(c) Using the Second Derivative  To begin with, an immediate consequence of Corollary 5.3.2 is the following well-known criterion, by far the most useful of all, even though second differentiability is required:

Theorem 5.3.3 Assume that f is twice differentiable on an open interval I, and its second derivative is nonnegative on I. Then f is convex on I. □

To illustrate a combination of Theorems 5.3.1 and 5.3.3, assume for example that f is "piecewise C² with increasing slopes", namely: there is a subdivision x₀ = a < x₁ < ... < xₖ = b of I = ]a, b[ such that:

- f is continuous on I,
- f is of class C² and D₂f ≥ 0 on each subinterval ]xᵢ₋₁, xᵢ[, i = 1, ..., k,
- f has one-sided derivatives at x₁, ..., xₖ₋₁ satisfying D₋f(xᵢ) ≤ D₊f(xᵢ) for i = 1, ..., k−1.

Then f is convex on I.
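A concrete instance (with data chosen here only for illustration): f(x) = x² + |x| on I = ]−1, 1[, with the subdivision −1 < 0 < 1. Then f is continuous, of class C² with D₂f = 2 ≥ 0 on ]−1, 0[ and on ]0, 1[, and D₋f(0) = −1 ≤ 1 = D₊f(0); the criterion above therefore gives the convexity of f on I.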

In the absence of second differentiability, some sort of substitute is required to determine convexity. Translating to second order the criterion using the (symmetric) function Δ₁f of (5.3.1), we obtain: f is convex if and only if Δ₂f is nonnegative on I × I × I, where

Δ₂f(x, x′, x″) := [2/(x′ − x″)] [ (f(x) − f(x′))/(x − x′) − (f(x) − f(x″))/(x − x″) ]

is defined for all triples of different points x, x′, x″ in I. Note that Δ₂f is symmetric in its three variables.

Letting x′ and x″ tend to x in Δ₂f, just as was done with Δ₁f, one can get an analogue to Theorem 5.3.1. One must be careful when letting x′ and x″ converge, however: consider

f(x) := min { ½x² + x, ½x² − x } .   (5.3.2)

Its half-second derivatives (5.2.1) are constantly 1, but it is not convex: when passing to the limit with x′ and x″, account must be taken of both sides of x. The "Schwarz second derivative", for example, does the job by taking x − x′ = x″ − x:

Δ̂₂f(x) := lim sup_{t→0} [f(x − t) − 2f(x) + f(x + t)] / t² .   (5.3.3)

We obtain the second derivative of f at x if there is one; the counter-example (5.3.2) has Δ̂₂f(0) = −∞ (indeed f(−t) − 2f(0) + f(t) = t² − 2|t|, so the quotient equals 1 − 2/|t| → −∞), and is thus eliminated. When f is convex,

Δ̂₂f(x) ≥ 0  for all x ∈ I.   (5.3.4)

This condition turns out to be sufficient if combined with the continuity of f:

Theorem 5.3.4 Assume that f is continuous on the open interval I and that (5.3.4) holds. Then f is convex on I.

PROOF. Take a and b in I with a < b, α ∈ ]0, 1[ and set x := αa + (1 − α)b. We have to prove the "mean-value inequality"

f(x) ≤ f(a) + [ (f(b) − f(a)) / (b − a) ] (x − a).   (5.3.5)

We take

g(x) := f(x) − f(a) − [ (f(b) − f(a)) / (b − a) ] (x − a) ,

and we prove g ≤ 0 on ]a, b[. We have g(a) = g(b) = 0 and, since f and g differ by an affine function, Δ̂₂g = Δ̂₂f.

Suppose first

Δ̂₂g(x) = Δ̂₂f(x) > 0  for all x ∈ ]a, b[.   (5.3.6)

We claim that g is then nonpositive on ]a, b[: if such were not the case, the continuous g would assume its maximal value at some x* ∈ ]a, b[, and the relation

g(x* − t) − 2g(x*) + g(x* + t) ≤ 0  for all t small enough

would contradict (5.3.6). Thus (5.3.5) is proved.

Now define fₖ(x) := f(x) + (1/k)x². If (5.3.4) holds, Δ̂₂fₖ is positive on ]a, b[ and, from the first part of the proof, fₖ is convex. Its pointwise limit f is therefore convex (Proposition 2.2.1). □
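For instance, Theorem 5.3.4 confirms the convexity of |·| without any differentiability assumption: for x ≠ 0, the function is affine near x and Δ̂₂|·|(x) = 0, while at x = 0 the quotient is [|−t| − 0 + |t|]/t² = 2/|t| → +∞; thus (5.3.4) holds everywhere on ℝ.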

Remark 5.3.5 In relation to Remark 5.2.2, observe that the difference quotient in (5.3.3) represents one more "curvature" estimate. Let s₀ be free in (5.2.4) and force p_c to coincide with f at x, x − t, x + t: we again obtain c = Δ₂f(x, x − t, x + t). □
