
First Steps into the Theory of Conjugate Functions

From the document Wissenschaften 305 A Series (pages 51-62)

On several occasions, we have encountered the conjugate function of $f$, defined by

$$\mathbb{R} \ni s \mapsto f^*(s) := \sup\,\{sx - f(x) : x \in \operatorname{dom} f\}. \tag{6.0.1}$$

Because $sx$ is a finite number, we can let $x$ run through the whole of $\mathbb{R}$, and of course this does not change the supremum: instead of (6.0.1), we may as well write the simpler form

$$f^*(s) = \sup_{x \in \mathbb{R}}\,[sx - f(x)]. \tag{6.0.2}$$

Remark 6.0.1 Some comments are in order with respect to Remark 1.3.3. What is actually computed in (6.0.2) is

$$\inf_{x}\,[f(x) - sx], \tag{6.0.3}$$

a number which is certainly not $+\infty$. As a result, its opposite $f^*(s)$ is in our space of interest $\mathbb{R} \cup \{+\infty\}$. Furthermore, this opposite will be seen to behave as a convex function of $s$ (already here, remember Proposition 2.1.2).

Indeed, one should realize that (6.0.1), (6.0.2) - or (6.0.3) - actually means

$$f^*(s) = \sup\,\{sx - r : (x, r) \in \operatorname{epi} f\}, \tag{6.0.4}$$


Fig. 6.0.1. Constructing a conjugate function

and this last writing has two advantages: first, it suppresses the "$-f(x)$" operation; and more importantly, it interprets the conjugacy operation as the supremum of a linear function [$\ell_s(x, r) := sx - r$] over a closed convex set of $\mathbb{R}^2$. We will return later (Chapters IV and X) to this aspect; considering that (6.0.4) is rather heavy, the versions (6.0.1) or (6.0.2) are generally preferable, and will be generally preferred. □

We retain from (6.0.4) the geometrical interpretation displayed in Fig. 6.0.1: for given $s$ and $r$, consider the affine function $a_{s,r}$ defined by

$$\mathbb{R} \ni x \mapsto a_{s,r}(x) = sx - r$$

and the corresponding line $\operatorname{gr} a_{s,r}$ in $\mathbb{R}^2$. Due to the geometry of an epigraph, there are two kinds of $r$ for given $s$: those, large enough, such that $a_{s,r} \le f$; and those so small that $a_{s,r}(x) > f(x)$ for some $x$. The particular $r = f^*(s)$ is their common bound, obtained when the line $\operatorname{gr} a_{s,r}$ "leans" on $\operatorname{epi} f$, or supports $\operatorname{epi} f$. For the particular value $s = 0$, we obtain

$$-f^*(0) = \inf\,\{f(x) : x \in \mathbb{R}\}. \tag{6.0.5}$$

Figure 6.0.1 displays the set for which $f^*$ is finite; and this set depends exclusively on the behaviour of $f$ at infinity, which therefore plays an important role for the determination of $\operatorname{dom} f^*$ (remember §2.3). On the other hand, let $x_0 \in \operatorname{dom} f$ and choose $s \in \partial f(x_0)$; then the corresponding "optimal" line supports $\operatorname{gr} f$ at $(x_0, f(x_0))$, so that $f^*(s) = sx_0 - f(x_0)$ for such an $s$.

Examples 6.0.2 For each $f \in \operatorname{Conv}\mathbb{R}$ considered below, we give the corresponding conjugate function $f^*$. Draw the graph of $f^*$ in each case.

- $f(x) = |x|$: then $f^*$ is the indicator function $I_{[-1,+1]}$; more simply, $f(x) = sx$ gives $f^* = I_{\{s\}}$.
- $f(x) = (1/p)|x|^p$, with $p > 1$: then $f^*(s) = (1/q)|s|^q$, with $1/p + 1/q = 1$. In particular, $f^* = f$ if $p = 2$.
- $f(x) = x \log x$ if $x > 0$, $+\infty$ if not: then $f^*(s) = \exp(s - 1)$ for all $s \in \mathbb{R}$.
- $f(x) = -\sqrt{1 - x^2}$ if $|x| \le 1$, $+\infty$ if not (the ball-pen function of Fig. 2.1.1): then $f^*(s) = \sqrt{1 + s^2}$. □
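The closed forms of Examples 6.0.2 can be sanity-checked numerically. The sketch below is only an illustration, not part of the original text: it approximates the supremum in (6.0.2) by a maximum over a finite grid, and the helper `conjugate`, the grid bounds, the choice $p = 3$ and the test slope $s = 0.7$ are all assumptions made for the example.

```python
import math

def conjugate(f, s, xs):
    """Approximate f*(s) = sup_x [s*x - f(x)] by a maximum over the grid xs."""
    return max(s * x - f(x) for x in xs if f(x) < math.inf)

# Grid on [-10, 10] with step 0.001 (an arbitrary choice for this sketch).
xs = [i / 1000.0 for i in range(-10000, 10001)]

def power(x, p=3.0):
    return abs(x) ** p / p

def entropy(x):
    return x * math.log(x) if x > 0 else math.inf

def ball_pen(x):
    return -math.sqrt(1.0 - x * x) if abs(x) <= 1 else math.inf

s = 0.7
q = 1.5  # conjugate exponent of p = 3, since 1/3 + 1/1.5 = 1

# Each entry pairs the grid approximation with the closed form from the text.
results = {
    "power": (conjugate(power, s, xs), abs(s) ** q / q),
    "entropy": (conjugate(entropy, s, xs), math.exp(s - 1.0)),
    "ball_pen": (conjugate(ball_pen, s, xs), math.sqrt(1.0 + s * s)),
}
for name, (numeric, closed_form) in results.items():
    print(name, numeric, closed_form)
```

A grid maximum can only underestimate the true supremum, so agreement up to the grid resolution is the most that should be expected from such a check.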

It is important to realize that the argument $s$, which $f^*$ depends on, is a slope, i.e. strictly speaking an element of the dual of $\mathbb{R}$. When taking again the conjugate of $f^*$, one goes back to the primal and the result is the biconjugate function of $f$:

$$f^{**}(x) := (f^*)^*(x) = \sup\,\{sx - f^*(s) : s \in \operatorname{dom} f^*\}.$$

For illustration, compute the biconjugates in the examples above.
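As one possible way to carry out this exercise numerically, the sketch below takes the closed form $f^*(s) = \exp(s - 1)$ of the entropy example and conjugates it once more on a grid of slopes. The function names and the slope range $[-20, 5]$ are assumptions of this illustration; the range is chosen so that the maximizing slope $1 + \log x$ falls inside it for the sample points used.

```python
import math

def conj_entropy(s):
    """Closed form from Examples 6.0.2: conjugate of x*log(x)."""
    return math.exp(s - 1.0)

def biconjugate(f_star, x, ss):
    """Approximate f**(x) = sup_s [s*x - f*(s)] by a maximum over the grid ss."""
    return max(s * x - f_star(s) for s in ss)

# Slopes in [-20, 5] with step 0.001 (hypothetical bounds for this sketch).
ss = [i / 1000.0 for i in range(-20000, 5001)]

for x in (0.5, 1.0, 2.0):
    # The biconjugate should reproduce x*log(x), since that function is closed and convex.
    print(x, biconjugate(conj_entropy, x, ss), x * math.log(x))
```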

The transformation $f \mapsto f^*$ is (the one-dimensional version of) the so-called Fenchel correspondence, and is closely related to the Legendre transform. In view of its importance for a deep understanding of the properties of a convex function, we are going to explore step by step some basic results about it.

6.1 Basic Properties of the Conjugate

First of all, the very definition (6.0.1) directly implies the relation

$$sx \le f(x) + f^*(s) \quad \text{for all } x \in \operatorname{dom} f \text{ and all } s \in \operatorname{dom} f^*, \tag{6.1.1}$$

called the Young-Fenchel inequality (which, incidentally, holds for all $s$ and $x$!).

Proposition 6.1.1 Let $f \in \operatorname{Conv}\mathbb{R}$. Then

- the conjugate of $f$ is a closed convex function ($f^* \in \operatorname{Conv}\mathbb{R}$),
- the biconjugate of $f$ is its closure ($f^{**} = \operatorname{cl} f$).

PROOF. The function $f^*$ takes its values in $\mathbb{R} \cup \{+\infty\}$ by construction. Its domain is nonempty, see Remark 4.1.7. Then, its convexity and closedness result from Proposition 3.3.2.

Now, use the form (6.0.4) to define $f^{**}$:

$$f^{**}(x) = \sup_{s,r}\,\{sx - r : r \ge f^*(s)\}. \tag{6.1.2}$$

By definition of $f^*$, to say $r \ge f^*(s)$ is to say that, for all $y \in \operatorname{dom} f$, $r \ge sy - f(y)$, i.e. $sy - r \le f(y)$.

In other words, (6.1.2) can be written

$$f^{**}(x) = \sup_{s,r}\,\{sx - r : sy - r \le f(y) \text{ for all } y \in \operatorname{dom} f\},$$

in which we recognize the expression (3.2.4) of $\operatorname{cl} f$. □

When conjugating a function $f$, one considers the set of all affine functions minorizing it. As mentioned in Remark 3.2.6, this is also the set of all affine functions minorizing $\operatorname{cl} f$. It follows that $f$ and $\operatorname{cl} f$ have the same conjugate: from now on, we may assume that the convex $f$ is closed, this will be good enough. Then the relation $f^{**} = f$, established in Proposition 6.1.1, shows that the Legendre-Fenchel transformation is an involution in $\operatorname{Conv}\mathbb{R}$. This is confirmed by the next result, in which we have also an involution between $s$ and $x$ via the solution-set of (6.0.1).

Proposition 6.1.2 Let $f \in \operatorname{Conv}\mathbb{R}$. Then

$$sx = f(x) + f^*(s) \quad\text{if and only if}\quad x \in \operatorname{dom} f \text{ and } s \in \partial f(x); \tag{6.1.3}$$

$$s \in \partial f(x) \quad\text{if and only if}\quad x \in \partial f^*(s). \tag{6.1.4}$$

PROOF. We have that

$$-f^*(s) = \inf\,\{f(x) - sx : x \in \mathbb{R}\}.$$

The function $g_s : x \mapsto f(x) - sx$, which is in $\operatorname{Conv}\mathbb{R}$, achieves its infimum at $x$ if and only if $0 \in \partial g_s(x)$ - see (4.1.8) - i.e.

$$-f^*(s) = f(x) - sx \quad\text{if and only if}\quad s \in \partial f(x).$$

This implies $s \in \operatorname{dom} f^*$ and can be written as (6.1.3). Applying this same result to $f^*$ (which is closed), we obtain

$$x \in \partial f^*(s) \quad\text{if and only if}\quad sx = f^*(s) + f^{**}(x),$$

which is again (6.1.3) since $f^{**} = f$. □

What (6.1.3) says is that the pairs $(x, s) \in \mathbb{R}^2$ for which the inequality of Young-Fenchel (6.1.1) holds as an equality form exactly the graph of $\partial f$. In view of (6.1.4), the mapping $\partial f^*$ is obtained by inverting the mapping $\partial f$, i.e. reflecting its graph across the line of equation $s = x$: see Fig. 6.1.1, and remember the increasing property (4.2.1).

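Fig. 6.1.1. The symmetry between $\partial f$ and $\partial f^*$ (the graphs of $\partial f$ and $\partial f^*$ are mirror images across the line $s = x$)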

Remark 6.1.3 The above inversion property suggests a way of computing a conjugate which may be useful: "differentiate" $f$ to obtain $\partial f$; then invert the result and integrate it to obtain $f^*$ up to a constant. As an exercise, compute graphically the conjugate of $\tfrac12 x^2 + |x|$. □
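One way to check a candidate answer to this exercise is sketched below. It assumes the following hand computation, which is not in the text: $\partial f(x) = x + \operatorname{sign}(x)$ for $x \ne 0$ and $\partial f(0) = [-1, 1]$; inverting gives $\partial f^*(s) = \{0\}$ for $|s| \le 1$ and $s - \operatorname{sign}(s)$ otherwise; integrating, with the constant fixed by $f^*(0) = -\inf f = 0$ from (6.0.5), suggests $f^*(s) = \tfrac12 \max(|s| - 1, 0)^2$. The grid and the function names are illustrative choices.

```python
def f(x):
    return 0.5 * x * x + abs(x)

def f_star_candidate(s):
    """Candidate obtained by inverting the slope map and integrating,
    with the constant fixed by f*(0) = -inf f = 0 (assumed derivation)."""
    excess = max(abs(s) - 1.0, 0.0)
    return 0.5 * excess * excess

def conjugate(func, s, xs):
    """Grid approximation of sup_x [s*x - func(x)]."""
    return max(s * x - func(x) for x in xs)

xs = [i / 1000.0 for i in range(-10000, 10001)]  # grid on [-10, 10]

for s in (-3.0, -0.5, 0.0, 1.0, 2.5):
    print(s, conjugate(f, s, xs), f_star_candidate(s))
```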

6.2 Differentiation of the Conjugate

The question we address in this section is: what differentiability properties can be expected for $f^*$, perhaps requiring from $f$ something more than mere convexity?

Let $s_0 \in \operatorname{int}\operatorname{dom} f^*$ and consider the statement: $f^*$ is differentiable at $s_0$. According to Proposition 6.1.2, it means

there is a unique solution to the "equation" (in $x$) $\partial f(x) \ni s_0$, (6.2.1)

which in turn relies on the key property

$\partial f$ is "strictly increasing" on its domain, (6.2.2)

in the sense that $\partial f(x_1) < \partial f(x_2)$ whenever $x_1 < x_2$. As is easily checked, this last property is equivalent to

$f$ is strictly convex. (6.2.3)

Thus, we have:

Proposition 6.2.1 Let $f$ be strictly convex. Then $f^*$ is differentiable on the interior of its domain and, for all $s \in \operatorname{int}\operatorname{dom} f^*$,

$$Df^*(s) = x(s),$$

where $x(s)$ is the unique solution of

$$s \in \partial f(x), \quad\text{or}\quad sx - f(x) = f^*(s), \quad\text{or}\quad \min_x\,[f(x) - sx].$$

□

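The converse to Proposition 6.2.1 is false: $f^*$ may be differentiable on the interior of its domain while $f$ is not strictly convex. A counter-example is

$$f_{(1)}(x) := \begin{cases} \tfrac12 x^2 & \text{if } |x| \le 1, \\ |x| - \tfrac12 & \text{if } |x| \ge 1, \end{cases} \tag{6.2.4}$$

for which easy computations give

$$f_{(1)}^*(s) = \tfrac12 s^2 + I_{[-1,+1]}(s) = \begin{cases} \tfrac12 s^2 & \text{if } |s| \le 1, \\ +\infty & \text{if } |s| > 1. \end{cases}$$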
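The behaviour of this counter-example at the boundary of $\operatorname{dom} f_{(1)}^*$ can be made concrete with a few evaluations. The following sketch is illustrative only: the helper names, the sample points and the grid are assumptions, and a finite grid can suggest but not prove uniqueness of a maximizer.

```python
def huber(x):
    """The function f_(1) of (6.2.4)."""
    return 0.5 * x * x if abs(x) <= 1 else abs(x) - 0.5

def gap(s, x):
    """The quantity s*x - f(x), whose supremum over x is f*(s)."""
    return s * x - huber(x)

# s0 = 1 lies on the boundary of dom f* = [-1, 1]:
# every x >= 1 attains the supremum f*(1) = 1/2.
boundary_values = [gap(1.0, x) for x in (1.0, 2.0, 5.0, 50.0)]

# s = 0.5 lies in the interior: on the grid, the supremum is attained at x = 0.5 only.
xs = [i / 100.0 for i in range(-1000, 1001)]
best = max(gap(0.5, x) for x in xs)
maximizers = [x for x in xs if gap(0.5, x) == best]

# s = 1.1 lies outside dom f*: the values grow without bound as x increases.
outside_values = [gap(1.1, x) for x in (10.0, 100.0, 1000.0)]

print(boundary_values, maximizers, outside_values)
```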

The only explanation is that (6.2.1) (assumed to hold for all $s_0 \in \operatorname{int}\operatorname{dom} f^*$) does not imply (6.2.2). More precisely, two different $x_1$ and $x_2$ are allowed to give a nonempty intersection $\partial f(x_1) \cap \partial f(x_2) \ni s_0$, provided that this $s_0$ falls on the boundary of $\operatorname{dom} f^*$. Some additional assumption is necessary to rule this case out; among other things, the following result illustrates further the involutional character of the Legendre-Fenchel transformation.

Proposition 6.2.2 Let $f : \mathbb{R} \to \mathbb{R}$ be strictly convex, differentiable, and 1-coercive ($f(x)/|x| \to +\infty$ for $|x| \to \infty$). Then

(i) $f^*$ enjoys the same properties

and, for all $s \in \mathbb{R}$,

(ii) there is a unique solution to the equation $Df(x) = s$,
(iii) $f^*(s) = s\,(Df)^{-1}(s) - f\big((Df)^{-1}(s)\big)$.

PROOF. We claim first that the 1-coercivity assumption on $f$ (which, according to (2.3.3), is equivalent to $f'_\infty(1) = f'_\infty(-1) = +\infty$) amounts to saying that

$$\lim_{x \to +\infty} Df(x) = -\lim_{x \to -\infty} Df(x) = +\infty.$$

In fact, for $x > 0$, (4.1.7) gives

$$Df(x) \ge \frac{f(x) - f(0)}{x}.$$

When $x \to +\infty$, the right-hand side goes to $f'_\infty(1)$, so $f'_\infty(1) = +\infty$ implies $Df(x) \to +\infty$. To prove the converse, let $x \to +\infty$ in the inequalities

$$Df(x) \le f(x + 1) - f(x) \le f'_\infty(1),$$

which come from the property of increasing slopes. The same proof works for $x \to -\infty$ and establishes our claim.

Remembering the equivalence between (6.2.2) and (6.2.3), we therefore see that $Df$ is a bijection from $\mathbb{R}$ onto $\mathbb{R}$. Its inverse $(Df)^{-1} = Df^*$ is a bijection as well and the whole result follows. □

Example 6.2.3 The function $f(x) = \operatorname{ch} x$ satisfies the assumptions of Proposition 6.2.2: $Df(x) = \operatorname{sh} x$, hence the inverse $Df^*(s) = (\operatorname{sh})^{-1}(s)$. We readily obtain

$$f^*(s) = s\,(\operatorname{sh})^{-1}(s) - \sqrt{1 + s^2},$$

which is an illustration of (iii). Among other things, the 1-coercivity of the above function is implied by (i), but could not be seen at first glance. □
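Formula (iii) for this example can be compared with a brute-force evaluation of the supremum (6.0.2). The sketch below is only an illustration: it uses Python's `math.asinh` for $(\operatorname{sh})^{-1}$, and the grid, the test slopes and the finite-difference step are assumed values.

```python
import math

def f_star(s):
    """Closed form of Example 6.2.3: s*asinh(s) - sqrt(1 + s^2)."""
    return s * math.asinh(s) - math.sqrt(1.0 + s * s)

xs = [i / 1000.0 for i in range(-10000, 10001)]  # grid on [-10, 10]

def conjugate_on_grid(s):
    """Grid approximation of sup_x [s*x - ch(x)]."""
    return max(s * x - math.cosh(x) for x in xs)

h = 1e-5  # step of the central difference used to estimate Df*(s)
for s in (-2.0, 0.0, 0.5, 3.0):
    slope = (f_star(s + h) - f_star(s - h)) / (2.0 * h)
    # Expect conjugate_on_grid(s) ~ f_star(s) and slope ~ asinh(s) = (Df)^{-1}(s).
    print(s, conjugate_on_grid(s), f_star(s), slope, math.asinh(s))
```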

Consider now the problem of differentiating f* twice, which is (not unexpectedly) more complex. To get an idea of what can be expected and what is hopeless, we suggest meditating on the following examples.

Examples 6.2.4

(a) $f_1 = |\cdot|$ is $C^\infty$ in a neighborhood of an arbitrary $x_0 > 0$. Nevertheless, $f_1^* = I_{[-1,+1]}$ is not even finite in a neighborhood of $s_0 = Df_1(x_0)$ ($= 1$ for all $x_0 > 0$).

(b) The previous function was not differentiable everywhere, but consider

$$f_2(x) = \begin{cases} 0 & \text{if } |x| \le 1, \\ \tfrac12 (|x| - 1)^2 & \text{otherwise.} \end{cases}$$

Then, $f_2^*(s) = \tfrac12 s^2 + |s|$ is still not differentiable (at $s = 0$).

(c) The following function is convex, 1-coercive and twice differentiable everywhere:

The deep reason for all these oddities is that $f^*$ is a global concept, as it takes into account a priori the behaviour of $f$ on its whole domain; as a result, the smoothness of $f^*$ is a tricky matter. We just mention two results: a local one, and a global one which echoes Proposition 6.2.1 via the inverse function theorem.

Proposition 6.2.5 Assume that $f \in \operatorname{Conv}\mathbb{R}$ is twice differentiable at $x_0$ (in the sense of Definition 5.1.1) with $D_2 f(x_0) > 0$. Then $f^*$ is likewise twice differentiable at $s_0 = Df(x_0)$ and

$$D_2 f^*(s_0) = \frac{1}{D_2 f(x_0)}.$$

PROOF. First of all, we claim that $f^*$ is differentiable at $s_0$, with derivative $x_0$. In fact, $x_0 \in \partial f^*(s_0)$ because of (6.1.4). If the convex set $\partial f^*(s_0)$ contains another $x_0 + d$, [...] $D_2 f > 0$ throughout. Then $f^*$ enjoys the same properties, but only on the image-set $Df(\operatorname{int}\operatorname{dom} f)$; see Example 6.2.4(e).
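The reciprocal relation of Proposition 6.2.5 can be illustrated on the pair of Example 6.2.3, where $D_2 f(x_0) = \operatorname{ch} x_0$. The sketch below estimates $D_2 f^*(s_0)$ with a central second difference; the point $x_0 = 0.8$, the step $h$ and the helper names are assumptions of this illustration, and a finite difference only approximates the second derivative.

```python
import math

def f_star(s):
    """Conjugate of ch from Example 6.2.3."""
    return s * math.asinh(s) - math.sqrt(1.0 + s * s)

def second_difference(g, t, h=1e-3):
    """Central second-difference estimate of the second derivative of g at t."""
    return (g(t + h) - 2.0 * g(t) + g(t - h)) / (h * h)

x0 = 0.8                      # arbitrary sample point
s0 = math.sinh(x0)            # s0 = Df(x0)
d2_f = math.cosh(x0)          # D_2 f(x0) for f = ch
d2_f_star = second_difference(f_star, s0)

# The product should be close to 1 if D_2 f*(s0) = 1 / D_2 f(x0).
print(d2_f * d2_f_star)
```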

A one-sided version of Proposition 6.2.5 can also be stated just as in Theorem 5.2.1. We rather give the global version below, obtained via the $C^1$ parametrization of Proposition 6.2.1: $Df^* = (Df)^{-1}$.

6.3 Calculus Rules with Conjugacy

In §2.1, we have introduced some operations preserving convexity, whose effect on the subdifferentials has been seen in §4.3. Here, we briefly review their effect on the conjugate function.

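Proposition 6.3.1 Let $f_1$ and $f_2$ be two (closed) convex functions, minorized by a common affine function. Then, denoting by $f_1 \mathbin{\square} f_2$ their infimal convolution,

$$(f_1 \mathbin{\square} f_2)^* = f_1^* + f_2^*. \tag{6.3.1}$$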

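PROOF. The proof illustrates some properties of extremization (see in particular §A.1.2). For $s \in \mathbb{R}$,

$$\begin{aligned}
(f_1 \mathbin{\square} f_2)^*(s) &= \sup_x\,\Big\{sx - \inf_{x_1 + x_2 = x}\,[f_1(x_1) + f_2(x_2)]\Big\} \\
&= \sup_x\,\sup_{x_1 + x_2 = x}\,[s(x_1 + x_2) - f_1(x_1) - f_2(x_2)] \\
&= \sup_{x_1, x_2}\,[s(x_1 + x_2) - f_1(x_1) - f_2(x_2)] \\
&= \sup_{x_1}\,[sx_1 - f_1(x_1)] + \sup_{x_2}\,[sx_2 - f_2(x_2)]
\end{aligned}$$

and we recognize $f_1^*(s) + f_2^*(s)$ in this last expression. □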
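Relation (6.3.1) can be observed numerically on a simple pair. The sketch below takes $f_1(x) = \tfrac12 x^2$ and $f_2(x) = |x|$ (an illustrative choice, whose infimal convolution is the function (6.2.4)), and replaces every infimum and supremum by a minimum or maximum over one common grid. The grid bounds and the slope $s = 0.5$ are assumptions; on a bounded grid the comparison is only meaningful for slopes where both conjugates are finite.

```python
def f1(x):
    return 0.5 * x * x

def f2(x):
    return abs(x)

# Common grid on [-5, 5] with step 0.02 (arbitrary choice for this sketch).
grid = [i / 50.0 for i in range(-250, 251)]

def inf_conv(x):
    """Grid approximation of inf over x1 of f1(x1) + f2(x - x1)."""
    return min(f1(x1) + f2(x - x1) for x1 in grid)

def conjugate(f, s):
    """Grid approximation of sup_x [s*x - f(x)]."""
    return max(s * x - f(x) for x in grid)

s = 0.5
lhs = conjugate(inf_conv, s)                 # conjugate of the infimal convolution at s
rhs = conjugate(f1, s) + conjugate(f2, s)    # f1*(s) + f2*(s)
print(lhs, rhs)
```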

The dual version of this result is that, if $f_1$ and $f_2$ are two closed convex functions finite at some common point, then their sum and the infimal convolution $\mathbin{\square}$ of their conjugates are related by

$$(f_1 + f_2)^* = f_1^* \mathbin{\square} f_2^*. \tag{6.3.2}$$

The way to prove it is to observe that the two functions $f_1^*$ and $f_2^*$ satisfy the assumptions of Proposition 6.3.1, and their conjugates are $f_1$ and $f_2$ respectively; hence

$$(f_1^* \mathbin{\square} f_2^*)^* = f_1 + f_2.$$

Taking the conjugate of both sides and knowing that an infimal convolution is closed (see Remark 3.3.4) gives directly (6.3.2). In several dimensions, however, an inf-convolution is no longer closed, so technical difficulties can be anticipated to establish (6.3.2).

The value at $s = 0$ of the function (6.3.2) gives an interesting relation: in view of (6.0.5), and since the infimal convolution of $f_1^*$ and $f_2^*$ at $0$ is $\inf_s\,[f_1^*(s) + f_2^*(-s)]$, we have

$$\inf_{x \in \mathbb{R}}\,[f_1(x) + f_2(x)] = -(f_1 + f_2)^*(0) = -\inf_{s \in \mathbb{R}}\,[f_1^*(s) + f_2^*(-s)],$$

which is known as (the univariate version of) Fenchel's duality theorem - but once again, beware that it does not extend readily to several variables.
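A small numerical instance of this duality relation is sketched below, under assumptions chosen for the illustration: $f_1(x) = \tfrac12 (x - 1)^2$, whose conjugate is $f_1^*(s) = \tfrac12 s^2 + s$, and $f_2(x) = |x|$, whose conjugate is the indicator of $[-1, +1]$, so that the dual infimum only runs over $|s| \le 1$. Grids and names are hypothetical.

```python
def f1(x):
    return 0.5 * (x - 1.0) ** 2

def f2(x):
    return abs(x)

def f1_star(s):
    """Conjugate of f1, computed by hand: sup_x [s*x - (x-1)^2/2] = s^2/2 + s."""
    return 0.5 * s * s + s

xs = [i / 1000.0 for i in range(-5000, 5001)]   # primal grid on [-5, 5]
ss = [i / 1000.0 for i in range(-1000, 1001)]   # slopes with f2*(-s) = 0, i.e. |s| <= 1

primal = min(f1(x) + f2(x) for x in xs)         # inf of f1 + f2
dual = -min(f1_star(s) for s in ss)             # -inf of f1*(s) + f2*(-s)

print(primal, dual)
```

Both values come out as 0.5 here: the primal minimum is attained at $x = 0$ and the dual one at $s = -1$.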

Formulae (6.3.1) and (6.3.2) show that the addition of functions and their infimal convolution are operations dual to each other. The sup-operation is more complex: it is dual to an operation that we have not seen yet, namely that of taking the closed convex hull of a nonconvex function. Indeed, convexity of $f$ is by no means necessary to define its conjugate (6.0.1): the result is "meaningful" as soon as we have:

(i) $f$ is not identically $+\infty$ (otherwise $f^*$ would be - identically! - $-\infty$);
(ii) $f$ is minorized by some affine function (otherwise $f^*$ would be identically $+\infty$).

Now, to $f$ satisfying these properties, we can associate the family of affine functions $s \mapsto sx - f(x)$, indexed by $x \in \mathbb{R}$: Proposition 3.3.2 tells us that their supremum $f^*$ is a closed convex function of $s$.

In a word, the conjugacy operation can perfectly well be applied to any function $f$ satisfying the conditions (i) and (ii) above, "and nothing more". Looking again at the proof of Proposition 6.1.1, we see that the biconjugate of $f$ is then the pointwise supremum of all the affine functions minorizing $f$. The epigraph of $f^{**}$ appears as the closed convex hull of $\operatorname{epi} f$, as indicated by Fig. 6.3.1. In view of this remark, a more suggestive notation can be used:

$$f^{**} = \operatorname{cl}\operatorname{co} f = \overline{\operatorname{co}}\, f. \tag{6.3.3}$$

This last function appears as the "close-convexification" of $f$, i.e. the largest closed and convex function minorizing $f$; naturally, $\overline{\operatorname{co}}\, f \le f$!

Fig. 6.3.1. Taking a closed convex hull (left: $f(x) = (x^2 - 1)^2$; right: a kinky function)
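For the function $(x^2 - 1)^2$ of the figure, the closed convex hull can be approximated by conjugating twice on grids. The sketch below is a rough illustration with assumed grid bounds and names; it is expected to return values near $0$ on $[-1, 1]$, where the hull flattens the bump, and values near $f(x)$ for $|x| \ge 1$.

```python
def f(x):
    return (x * x - 1.0) ** 2

xs = [i / 100.0 for i in range(-300, 301)]   # primal grid on [-3, 3]
ss = [i / 10.0 for i in range(-200, 201)]    # slope grid on [-20, 20]

# First conjugation: f*(s) on the slope grid.
f_star = [max(s * x - f(x) for x in xs) for s in ss]

def biconjugate(x):
    """Second conjugation: grid approximation of sup_s [s*x - f*(s)]."""
    return max(s * x - fs for s, fs in zip(ss, f_star))

for x in (0.0, 0.5, 1.0, 1.5):
    print(x, f(x), biconjugate(x))
```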

The extension thus introduced for the conjugacy is used in our next results.

Proposition 6.3.2 Let $\{f_j\}_{j \in J}$ be a collection of functions not identically $+\infty$, and all minorized by some common affine function. Then the function $f := \inf_{j \in J} f_j$ satisfies (i) and (ii), and its conjugate is

$$(\inf_j f_j)^* = \sup_j\,(f_j^*). \tag{6.3.4}$$

PROOF. That $f$ satisfies (i) and (ii) is clear. Then (6.3.4) is proved as (6.3.1), via the same properties of extremization. □
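Formula (6.3.4) can be tried on two shifted parabolas, an illustrative choice that is not in the text: $f_\pm(x) = \tfrac12 (x \mp 1)^2$, with conjugates $\tfrac12 s^2 \pm s$, so that the supremum of the conjugates is $\tfrac12 s^2 + |s|$. Their pointwise infimum is nonconvex, which is exactly the situation the proposition covers. Grid and names are assumptions.

```python
def f_minus(x):
    return 0.5 * (x + 1.0) ** 2

def f_plus(x):
    return 0.5 * (x - 1.0) ** 2

def lower_envelope(x):
    """Pointwise infimum of the two parabolas (a nonconvex function)."""
    return min(f_minus(x), f_plus(x))

xs = [i / 100.0 for i in range(-1000, 1001)]  # grid on [-10, 10]

def conjugate(f, s):
    """Grid approximation of sup_x [s*x - f(x)]."""
    return max(s * x - f(x) for x in xs)

rows = []
for s in (-1.5, 0.0, 0.5, 1.5):
    lhs = conjugate(lower_envelope, s)                        # (inf_j f_j)*(s)
    rhs = max(conjugate(f_minus, s), conjugate(f_plus, s))    # sup_j f_j*(s)
    rows.append((s, lhs, rhs, 0.5 * s * s + abs(s)))
    print(rows[-1])
```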

Corollary 6.3.3 Let $\{g_j\}_{j \in J}$ be a collection of functions in $\operatorname{Conv}\mathbb{R}$, and suppose that there is some $x_0$ such that $\sup_{j \in J} g_j(x_0) < +\infty$. Then

$$(\sup_j g_j)^* = \overline{\operatorname{co}}\,(\inf_j g_j^*).$$

PROOF. Proposition 6.3.2 applied to $f_j = g_j^*$ gives

$$(\inf_j g_j^*)^* = \sup_j g_j^{**} = \sup_j g_j.$$

The result follows from (6.3.3), by taking the conjugate of each side. □

Example 6.3.4 Given two arbitrary functions $\varphi$ and $c$ from some arbitrary set $Y$ to $\mathbb{R}$, consider the (closed and convex) function

$$\mathbb{R} \ni x \mapsto g(x) := \sup\,\{x\,c(y) - \varphi(y) : y \in Y\} \tag{6.3.5}$$

which we assume $< +\infty$ for some $x_0 \in \mathbb{R}$. With the help of the notation

$$g_y(x) := x\,c(y) - \varphi(y) \quad\text{for all } y \in Y \text{ and } x \in \mathbb{R},$$

we can apply Corollary 6.3.3 to compute $g^*$. We directly obtain

$$g^*(s) = \overline{\operatorname{co}}\,\big[\inf_{y \in Y} g_y^*(s)\big], \tag{6.3.6}$$

where the conjugate of each $g_y$ is easy to compute:

$$g_y^*(s) = \sup_x\,[(s - c(y))\,x + \varphi(y)] = \begin{cases} \varphi(y) & \text{if } s = c(y), \\ +\infty & \text{otherwise.} \end{cases}$$

This calculation is of interest in optimization: consider the (abstract) minimization problem with one constraint

$$\inf\,\varphi(y), \quad c(y) = s, \quad y \in Y. \tag{6.3.7}$$

Here, the right-hand side of the constraint is parametrized by $s \in \mathbb{R}$. The optimal value is a function of the parameter, say $P(s)$, usually called the value-function, or also primal, perturbation, or marginal function. Clearly enough, this function can be written

$$P(s) = \inf\,\{g_y^*(s) : y \in Y\}.$$

Observe that $P$ has no special structure since we have made no assumption on $Y$, $\varphi$, $c$ - other than $g \not\equiv +\infty$ in (6.3.5). Nevertheless, what (6.3.6) tells us is that the closed convex hull of $P$ is the conjugate of $g$ in (6.3.5):

$$g^* = \overline{\operatorname{co}}\, P.$$

In particular, if $P$ happens to be closed and convex, we obtain from (6.0.5): $-\inf g = g^*(0) = P(0)$. With notation closer to that of (6.3.7), this means

$$\sup_{x \in \mathbb{R}}\,\inf_{y \in Y}\,[\varphi(y) - x\,c(y)] = \inf\,\{\varphi(y) : c(y) = 0\}.$$

□
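The role of the closed convex hull in this example can be seen on a toy instance with a three-element set $Y$. Everything in the sketch below (the data `c`, `phi_gap`, `phi_no_gap`, the grid of multipliers `xs`, the function names) is a hypothetical illustration. In the first data set $P(0) = 1$ while the convex hull of $P$ vanishes at $0$, so the two sides of the last relation differ; in the second, $P$ coincides with its hull at $0$ and they agree.

```python
import math

def primal_value(c, phi, s=0.0):
    """P(s) = inf { phi(y) : c(y) = s } over a finite set Y (inf of empty set = +inf)."""
    feasible = [p for cy, p in zip(c, phi) if cy == s]
    return min(feasible) if feasible else math.inf

def dual_value(c, phi, xs):
    """Grid approximation of sup_x inf_y [phi(y) - x*c(y)], i.e. of -inf g."""
    return max(min(p - x * cy for cy, p in zip(c, phi)) for x in xs)

xs = [i / 100.0 for i in range(-500, 501)]   # multipliers on [-5, 5]
c = [-1.0, 0.0, 1.0]                         # constraint values c(y), y in Y

phi_gap = [0.0, 1.0, 0.0]      # P(0) = 1, but P(-1) = P(1) = 0 pull the hull down to 0
phi_no_gap = [1.0, 0.0, 1.0]   # P(0) = 0 is already on the hull

print(primal_value(c, phi_gap), dual_value(c, phi_gap, xs))
print(primal_value(c, phi_no_gap), dual_value(c, phi_no_gap, xs))
```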

The closed convex hull of a function is an important object for optimization, even though it is not easily computable. A reason is that minimizing $f$ or minimizing $\overline{\operatorname{co}}\, f$ are "equivalent" problems in the sense that:

$$\bar{x} \text{ minimizes } f \iff [\,\bar{x} \text{ minimizes } \overline{\operatorname{co}}\, f \text{ and } \overline{\operatorname{co}}\, f(\bar{x}) = f(\bar{x})\,].$$

Even more can be said:

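Theorem 6.3.5 Let $f : \mathbb{R} \to \mathbb{R}$ be a differentiable function with derivative $Df$. Then $\bar{x}$ minimizes $f$ on $\mathbb{R}$ if and only if

$$Df(\bar{x}) = 0 \quad\text{and}\quad \overline{\operatorname{co}}\, f(\bar{x}) = f(\bar{x}).$$

In such a case, $\overline{\operatorname{co}}\, f$ is differentiable and minimal at $\bar{x}$.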

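PROOF. The condition $Df(\bar{x}) = 0$ is known to be necessary for $\bar{x}$ to minimize the differentiable function $f$. Furthermore, the (constant) affine function defined by $\ell(x) \equiv f(\bar{x})$ minorizes $f$ - hence $\ell \le \overline{\operatorname{co}}\, f$ - and coincides with $f$ at $\bar{x}$ - hence $\ell(\bar{x}) = \overline{\operatorname{co}}\, f(\bar{x})$.

Conversely, let $x$ satisfy $Df(x) = 0$ and $\overline{\operatorname{co}}\, f(x) = f(x)$. Since $\overline{\operatorname{co}}\, f \le f$, we have

$$\frac{\overline{\operatorname{co}}\, f(x + h) - \overline{\operatorname{co}}\, f(x)}{h} \le \frac{f(x + h) - f(x)}{h} \quad\text{for all } h > 0.$$

Letting $h \downarrow 0$, we obtain

$$D_+ \overline{\operatorname{co}}\, f(x) \le Df(x) = 0.$$

Taking $h < 0$, we show likewise that

$$D_- \overline{\operatorname{co}}\, f(x) \ge Df(x) = 0.$$

On the other hand, the convex $\overline{\operatorname{co}}\, f$ satisfies $D_- \overline{\operatorname{co}}\, f \le D_+ \overline{\operatorname{co}}\, f$: we conclude that $D\,\overline{\operatorname{co}}\, f(x) = 0$, $\overline{\operatorname{co}}\, f$ has a 0-derivative at $x$, is therefore minimal at $x$, and $f$ as well. □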

Thus, what is needed for a stationary point $x$ of $f$ to be a minimum is just to satisfy $\overline{\operatorname{co}}\, f(x) = f(x)$. The examples of Fig. 6.3.1 help understanding this last property: the function $(x^2 - 1)^2$ has the minima $\pm 1$, and $0$ is left out. It is interesting to note that the condition $Df(x) = 0$ is purely local and makes no reference whatsoever to minimality of $x$, rather than maximality, say. In fact, suppose $f$ has only one-sided derivatives; if the stationarity condition "$Df(x) = 0$" is replaced by the apparently natural "$D_- f(x) \le 0 \le D_+ f(x)$", then Theorem 6.3.5 breaks down: see the right part of Fig. 6.3.1. By contrast, the condition "$\overline{\operatorname{co}}\, f(x) = f(x)$" has global character.
