First- and Second-Order Differentiation - Wissenschaften 305 A Series

Let $C \subset \mathbb{R}^n$ be nonempty and convex. For a function $f$ defined on $C$ ($f(x) < +\infty$ for all $x \in C$), we study here the following questions:

- When $f$ is convex and differentiable on $C$, what can be said about the gradient $\nabla f$?

- When $f$ is differentiable on $C$, can we characterize its convexity in terms of $\nabla f$?

- When $f$ is convex on $C$, what can be said about its first and second differentiability?

We start with the first two questions.

4.1 Differentiable Convex Functions

First we assume that $f$ is differentiable on $C$. Given $x_0 \in C$, the sentence "$f$ is differentiable at $x_0$" is meaningful only if $f$ is at least defined in a neighborhood of $x_0$. Then, it is normal to assume that $C$ is contained in an open set $\Omega$ on which $f$ is differentiable.

Theorem 4.1.1 Let $f$ be a function differentiable on an open set $\Omega \subset \mathbb{R}^n$, and let $C$ be a convex subset of $\Omega$. Then

(i) $f$ is convex on $C$ if and only if

$$f(x) \geq f(x_0) + \langle \nabla f(x_0), x - x_0 \rangle \quad \text{for all } (x_0, x) \in C \times C; \tag{4.1.1}$$

(ii) $f$ is strictly convex on $C$ if and only if strict inequality holds in (4.1.1) whenever $x \neq x_0$;

(iii) $f$ is strongly convex with modulus $c$ on $C$ if and only if, for all $(x_0, x) \in C \times C$,

$$f(x) \geq f(x_0) + \langle \nabla f(x_0), x - x_0 \rangle + \tfrac{1}{2} c \|x - x_0\|^2. \tag{4.1.2}$$

PROOF. [(i)] Let $f$ be convex on $C$: for arbitrary $(x_0, x) \in C \times C$ and $\alpha \in \,]0,1[$, we have from the definition (1.1.1) of convexity

$$f(\alpha x + (1-\alpha) x_0) - f(x_0) \leq \alpha [f(x) - f(x_0)].$$

184 IV Convex Functions of Several Variables

Divide by $\alpha$ and let $\alpha \downarrow 0$: observing that $\alpha x + (1-\alpha) x_0 = x_0 + \alpha(x - x_0)$, the left-hand side tends to $\langle \nabla f(x_0), x - x_0 \rangle$ and (4.1.1) is established.

Conversely, take $x_1$ and $x_2$ in $C$, $\alpha \in \,]0,1[$ and define $x_0 := \alpha x_1 + (1-\alpha) x_2 \in C$. By assumption,

$$f(x_i) \geq f(x_0) + \langle \nabla f(x_0), x_i - x_0 \rangle \quad \text{for } i = 1, 2 \tag{4.1.3}$$

and we obtain by convex combination

$$\alpha f(x_1) + (1-\alpha) f(x_2) \geq f(x_0) + \langle \nabla f(x_0), \alpha x_1 + (1-\alpha) x_2 - x_0 \rangle$$

which, after simplification, is just the relation of definition (1.1.1).

[(ii)] If $f$ is strictly convex, we have for $x_0 \neq x$ in $C$ and $\alpha \in \,]0,1[$,

$$f(x_0 + \alpha(x - x_0)) - f(x_0) < \alpha [f(x) - f(x_0)];$$

but $f$ is in particular convex and we can use (i):

$$\langle \nabla f(x_0), \alpha(x - x_0) \rangle \leq f(x_0 + \alpha(x - x_0)) - f(x_0),$$

so the required strict inequality follows.

For the converse, proceed as for (i), starting from strict inequalities in (4.1.3).

[(iii)] Using Proposition 1.1.2, just apply (i) to the function $f - \tfrac{1}{2} c \|\cdot\|^2$, which is of course differentiable. $\square$

Thus, a differentiable function is convex when its graph lies above its tangent hyperplanes: for each $x_0$, $f$ is minorized by its affine approximation $x \mapsto f(x_0) + \langle \nabla f(x_0), x - x_0 \rangle$ (which coincides with $f$ at $x_0$). It is strictly convex when the coincidence set is reduced to the singleton $(x_0, f(x_0))$. Finally, $f$ is strongly convex when it is minorized by the quadratic convex function

$$x \mapsto f(x_0) + \langle \nabla f(x_0), x - x_0 \rangle + \tfrac{1}{2} c \|x - x_0\|^2,$$

whose gradient at $x_0$ is also $\nabla f(x_0)$. These tangency properties are illustrated on Fig. 4.1.1.

[Figure; labels: $x_0$, "slope $\nabla f(x_0)$", "curvature $c$"]

Fig. 4.1.1. Affine and quadratic minorizations
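The minorization inequalities (4.1.1) and (4.1.2) can be probed numerically. The following is a minimal sketch, not a proof: it samples random pairs $(x_0, x)$ and reports the smallest value of $f(x) - f(x_0) - \langle \nabla f(x_0), x - x_0 \rangle - \tfrac{1}{2} c \|x - x_0\|^2$. The test functions (log-sum-exp and a diagonal quadratic), the sampling box $[-3, 3]^2$ and all function names are assumptions chosen for illustration; none of them come from the text.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gap(f, grad, x0, x, c=0.0):
    """f(x) - f(x0) - <grad f(x0), x - x0> - (c/2)||x - x0||^2."""
    d = [a - b for a, b in zip(x, x0)]
    return f(x) - f(x0) - dot(grad(x0), d) - 0.5 * c * dot(d, d)

def smallest_gap(f, grad, c=0.0, trials=2000, seed=0):
    """Smallest gap over random pairs (x0, x) in the box [-3, 3]^2."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x0 = [rng.uniform(-3.0, 3.0) for _ in range(2)]
        x = [rng.uniform(-3.0, 3.0) for _ in range(2)]
        worst = min(worst, gap(f, grad, x0, x, c))
    return worst

# Illustrative test functions (assumptions, not taken from the text).
def lse(x):
    """Log-sum-exp: convex, so (4.1.1) should hold with c = 0."""
    return math.log(math.exp(x[0]) + math.exp(x[1]))

def lse_grad(x):
    e0, e1 = math.exp(x[0]), math.exp(x[1])
    return [e0 / (e0 + e1), e1 / (e0 + e1)]

def quad(x):
    """x1^2 + 2 x2^2: quad - ||x||^2 is convex, so the modulus c = 2 is valid."""
    return x[0] ** 2 + 2.0 * x[1] ** 2

def quad_grad(x):
    return [2.0 * x[0], 4.0 * x[1]]

if __name__ == "__main__":
    print("log-sum-exp, c = 0:", smallest_gap(lse, lse_grad))
    print("quadratic,   c = 2:", smallest_gap(quad, quad_grad, c=2.0))
    print("quadratic,   c = 5:", smallest_gap(quad, quad_grad, c=5.0))
```

Up to rounding, the first two minima should be nonnegative, while the third should be negative because 5 is too large a modulus for this quadratic. A sampled check of this kind can only refute convexity, never establish it.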

Remark 4.1.2 Inequality (4.1.1) is fundamental. In case of convexity, the remainder term $r$ in

$$f(x) = f(x_0) + \langle \nabla f(x_0), x - x_0 \rangle + r(x_0, x)$$

must be well-behaved; for example, it is nonnegative for all $x$ and $x_0$; also, $r(x_0, \cdot)$ is convex.

Both $f$- and $\nabla f$-values appear in the relations dealt with in Theorem 4.1.1; we now proceed to give additional relations, involving $\nabla f$ only. We have seen in Chap. I that a differentiable function is convex if and only if its derivative is monotone increasing (on the interval where the function is studied). Here, we need a generalization of the wording "monotone increasing" to our multidimensional situation. There are several possibilities; one is particularly well-suited to convexity:

Definition 4.1.3 Let $C \subset \mathbb{R}^n$ be convex. The mapping $F : C \to \mathbb{R}^n$ is said to be monotone [resp. strictly monotone, resp. strongly monotone with modulus $c > 0$] on $C$ when, for all $x$ and $x'$ in $C$,

$$\langle F(x) - F(x'), x - x' \rangle \geq 0$$

[resp. $\langle F(x) - F(x'), x - x' \rangle > 0$ whenever $x \neq x'$,

resp. $\langle F(x) - F(x'), x - x' \rangle \geq c \|x - x'\|^2$]. $\square$

In the univariate case, the present monotonicity thus corresponds to $F$ being increasing. When particularized to a gradient mapping $F = \nabla f$, our definition characterizes the convexity of the underlying potential function $f$:

Theorem 4.1.4 Let $f$ be a function differentiable on an open set $\Omega \subset \mathbb{R}^n$, and let $C$ be a convex subset of $\Omega$. Then, $f$ is convex [resp. strictly convex, resp. strongly convex with modulus $c$] on $C$ if and only if its gradient $\nabla f$ is monotone [resp. strictly monotone, resp. strongly monotone with modulus $c$] on $C$.

PROOF. We combine the "convex $\Leftrightarrow$ monotone" and "strongly convex $\Leftrightarrow$ strongly monotone" cases by accepting the value $c = 0$ in the relevant relations such as (4.1.2).

Thus, let $f$ be [strongly] convex on $C$: in view of Theorem 4.1.1, we can write for arbitrary $x_0$ and $x$ in $C$:

$$f(x) \geq f(x_0) + \langle \nabla f(x_0), x - x_0 \rangle + \tfrac{1}{2} c \|x - x_0\|^2,$$

$$f(x_0) \geq f(x) + \langle \nabla f(x), x_0 - x \rangle + \tfrac{1}{2} c \|x_0 - x\|^2,$$

and mere addition shows that $\nabla f$ is [strongly] monotone.

Conversely, let $(x_0, x_1)$ be a pair of elements in $C$. Consider the univariate function $t \mapsto \varphi(t) := f(x_t)$, where $x_t := x_0 + t(x_1 - x_0)$; for $t$ in an open interval containing $[0,1]$, $x_t \in \Omega$ and $\varphi$ is well-defined and differentiable; its derivative at $t$ is $\varphi'(t) = \langle \nabla f(x_t), x_1 - x_0 \rangle$. Thus, we have for all $0 \leq t' < t \leq 1$

$$\varphi'(t) - \varphi'(t') = \langle \nabla f(x_t) - \nabla f(x_{t'}), x_1 - x_0 \rangle = \frac{1}{t - t'} \langle \nabla f(x_t) - \nabla f(x_{t'}), x_t - x_{t'} \rangle \tag{4.1.4}$$

and the monotonicity relation for $\nabla f$ shows that $\varphi'$ is increasing; $\varphi$ is therefore convex (Corollary I.5.3.2).

For strong convexity, set $t' = 0$ in (4.1.4) and use the strong monotonicity relation for $\nabla f$:

$$\varphi'(t) - \varphi'(0) \geq \frac{1}{t} c \|x_t - x_0\|^2 = t c \|x_1 - x_0\|^2. \tag{4.1.5}$$


Because the differentiable convex function $\varphi$ is the integral of its derivative, we can write

$$\varphi(1) - \varphi(0) - \varphi'(0) = \int_0^1 [\varphi'(t) - \varphi'(0)] \, dt \geq \tfrac{1}{2} c \|x_1 - x_0\|^2$$

which, by definition of $\varphi$, is just (4.1.2) (the coefficient 1/2 is $\int_0^1 t \, dt$!).

The same technique proves the "strictly monotone $\Leftrightarrow$ strictly convex" case; then, (4.1.5) becomes a strict inequality (with $c = 0$) and remains so after integration. $\square$
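As an illustration of Theorem 4.1.4 and of the function $\varphi$ used in its proof, here is a small sketch under stated assumptions. It takes $f(x) = \sqrt{1 + \|x\|^2}$, a smooth strictly convex function chosen only as an example, samples the quantity $\langle \nabla f(x) - \nabla f(y), x - y \rangle$ of Definition 4.1.3, and evaluates $\varphi'(t) = \langle \nabla f(x_t), x_1 - x_0 \rangle$ on a grid. The function names and the sampling box are hypothetical.

```python
import math
import random

def grad_f(x):
    """Gradient of f(x) = sqrt(1 + ||x||^2), an illustrative strictly convex function."""
    s = math.sqrt(1.0 + sum(a * a for a in x))
    return [a / s for a in x]

def monotonicity_product(F, x, y):
    """<F(x) - F(y), x - y>, the quantity appearing in Definition 4.1.3."""
    Fx, Fy = F(x), F(y)
    return sum((a - b) * (u - v) for a, b, u, v in zip(Fx, Fy, x, y))

def smallest_product(F, dim=3, trials=2000, seed=0):
    """Smallest monotonicity product over random pairs in the box [-3, 3]^dim."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        x = [rng.uniform(-3.0, 3.0) for _ in range(dim)]
        y = [rng.uniform(-3.0, 3.0) for _ in range(dim)]
        worst = min(worst, monotonicity_product(F, x, y))
    return worst

def phi_prime(F, x0, x1, t):
    """phi'(t) = <F(x_t), x1 - x0> with x_t = x0 + t (x1 - x0)."""
    xt = [a + t * (b - a) for a, b in zip(x0, x1)]
    return sum(g * (b - a) for g, a, b in zip(F(xt), x0, x1))

if __name__ == "__main__":
    print("smallest <grad f(x) - grad f(y), x - y>:", smallest_product(grad_f))
    slopes = [phi_prime(grad_f, [-1.0, 2.0], [3.0, -1.0], k / 10) for k in range(11)]
    print("phi' on a grid of [0, 1]:", [round(s, 4) for s in slopes])
```

For this choice of $f$, the sampled products should all be nonnegative and the values of $\varphi'$ should increase with $t$, which is the mechanism the proof exploits.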

The attention of the reader is drawn to the coefficient $c$ (and not $\tfrac{1}{2} c$) in Definition 4.1.3 of strong monotonicity. Actually, a sensible rule is: "Use 1/2 when dealing with a square"; here, the scalar product $\langle \Delta F, \Delta x \rangle$ is homogeneous to a square. Alternatively, remember in Proposition 1.1.2 that the gradient of $\tfrac{1}{2} c \|\cdot\|^2$ at $x$ is $cx$.

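We mention the following example: let

$$f(x) := \tfrac{1}{2} \langle Ax, x \rangle + \langle b, x \rangle$$

be a quadratic convex function ($A$ is symmetric), and let $\lambda_n \geq 0$ be the smallest eigenvalue of $A$.

Observe that $\nabla f(x) = Ax + b$ and that

$$\langle Ax - Ax', x - x' \rangle = \langle A(x - x'), x - x' \rangle \geq \lambda_n \|x - x'\|^2.$$

Thus $\nabla f$ is monotone [strongly with modulus $\lambda_n$]. The [strong] convexity of $f$, in the sense of (1.1.2), has been already alluded to in §1.3(d); but (4.1.2) is easier to establish here: simply write

$$f(x) - f(x_0) - \langle \nabla f(x_0), x - x_0 \rangle = \tfrac{1}{2} \langle Ax, x \rangle - \tfrac{1}{2} \langle Ax_0, x_0 \rangle - \langle Ax_0, x - x_0 \rangle$$

$$= \tfrac{1}{2} \langle A(x - x_0), x - x_0 \rangle \geq \tfrac{1}{2} \lambda_n \|x - x_0\|^2.$$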

Note that for this particular class of convex functions, strong and strict convexity are equivalent to each other, and to the positive definiteness of A.
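The quadratic example can be checked on a concrete instance. The sketch below assumes a hypothetical symmetric $2 \times 2$ matrix $A$ and vector $b$ (not taken from the text), computes the smallest eigenvalue of $A$ from the closed form valid for symmetric $2 \times 2$ matrices, and records the smallest slack observed in the strong monotonicity bound and in (4.1.2) over random pairs.

```python
import math
import random

# Hypothetical data chosen for illustration only.
A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric, positive definite
b = [1.0, -1.0]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def matvec(M, x):
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]

def f(x):
    """f(x) = 1/2 <Ax, x> + <b, x>."""
    return 0.5 * dot(matvec(A, x), x) + dot(b, x)

def grad_f(x):
    """grad f(x) = Ax + b."""
    Ax = matvec(A, x)
    return [Ax[0] + b[0], Ax[1] + b[1]]

def smallest_eigenvalue(M):
    """Closed form for a symmetric 2x2 matrix: trace/2 minus the half-gap."""
    mean = 0.5 * (M[0][0] + M[1][1])
    half_diff = 0.5 * (M[0][0] - M[1][1])
    return mean - math.sqrt(half_diff ** 2 + M[0][1] ** 2)

def smallest_slacks(trials=2000, seed=0):
    """Smallest slack in the monotonicity bound and in (4.1.2), with c = lambda_n."""
    lam = smallest_eigenvalue(A)
    rng = random.Random(seed)
    slack_mono = slack_conv = float("inf")
    for _ in range(trials):
        x0 = [rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)]
        x = [rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)]
        d = [x[0] - x0[0], x[1] - x0[1]]
        gx, g0 = grad_f(x), grad_f(x0)
        mono = dot([gx[0] - g0[0], gx[1] - g0[1]], d) - lam * dot(d, d)
        conv = f(x) - f(x0) - dot(g0, d) - 0.5 * lam * dot(d, d)
        slack_mono = min(slack_mono, mono)
        slack_conv = min(slack_conv, conv)
    return slack_mono, slack_conv

if __name__ == "__main__":
    print("smallest eigenvalue:", smallest_eigenvalue(A))
    print("smallest slacks (monotonicity, convexity):", smallest_slacks())
```

Both slacks should be nonnegative up to rounding; they approach zero when $x - x_0$ is nearly aligned with an eigenvector associated with the smallest eigenvalue.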

Remark 4.1.5 Do not infer from Theorem 4.1.4 the statement "a monotone mapping is the gradient of a convex function", which is wrong. For this to hold, the mapping in question must first be a gradient, an issue that we do not study here. We just mention the following property: if $\Omega$ is convex and $F : \Omega \to \mathbb{R}^n$ is differentiable, then $F$ is a gradient if and only if its Jacobian operator is symmetric (in 2 or 3 dimensions, $\operatorname{curl} F = 0$). $\square$
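A concrete counterexample makes the warning of Remark 4.1.5 tangible. The sketch below uses the hypothetical linear mapping $F(x) = (x_1 + x_2, \; x_2 - x_1)$, chosen for illustration and not taken from the text. A direct computation gives $\langle F(x) - F(y), x - y \rangle = \|x - y\|^2$, so $F$ is strongly monotone, yet its Jacobian is not symmetric, so by the criterion quoted in the remark it cannot be a gradient. The code estimates the Jacobian by central differences and evaluates the scalar curl.

```python
import random

def F(x):
    """Hypothetical mapping: the identity plus a skew-symmetric linear part."""
    return [x[0] + x[1], x[1] - x[0]]

def monotonicity_product(x, y):
    """<F(x) - F(y), x - y>."""
    Fx, Fy = F(x), F(y)
    return (Fx[0] - Fy[0]) * (x[0] - y[0]) + (Fx[1] - Fy[1]) * (x[1] - y[1])

def jacobian(x, h=1e-6):
    """Central-difference estimate of J[i][j] = dF_i / dx_j."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(xp), F(xm)
        for i in range(2):
            J[i][j] = (Fp[i] - Fm[i]) / (2.0 * h)
    return J

def curl(x):
    """Scalar curl in two dimensions: dF_2/dx_1 - dF_1/dx_2."""
    J = jacobian(x)
    return J[1][0] - J[0][1]

if __name__ == "__main__":
    rng = random.Random(0)
    pairs = [([rng.uniform(-3, 3), rng.uniform(-3, 3)],
              [rng.uniform(-3, 3), rng.uniform(-3, 3)]) for _ in range(1000)]
    print("smallest product:", min(monotonicity_product(x, y) for x, y in pairs))
    print("curl at (0.3, -0.7):", curl([0.3, -0.7]))
```

The products stay nonnegative while the curl is nonzero, so monotonicity alone does not make a mapping the gradient of a convex function.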

Example 4.1.6 Let $C \subset \mathbb{R}^n$ be nonempty closed convex. We have already seen in Example 2.1.4 that the function

$$\mathbb{R}^n \ni x \mapsto \varphi_C(x) := \tfrac{1}{2} \left[ \|x\|^2 - d_C^2(x) \right]$$

is convex and finite everywhere. It would be so for arbitrary $C$, but the convexity of $C$ here implies the differentiability of $\varphi_C$, with gradient $\nabla \varphi_C = p_C$ (the projection operator on $C$). To differentiate the only delicate term $d_C^2$, consider

$$\Delta := d_C^2(x + h) - d_C^2(x).$$
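The identity $\nabla \varphi_C = p_C$ stated in Example 4.1.6 can be sanity-checked numerically on a simple set. The sketch below assumes $C$ is the unit box $[0,1]^n$, a closed convex set picked for illustration because its projection is a coordinatewise clamp, and compares a central-difference gradient of $\varphi_C$ with the projection. The function names and the step size are hypothetical choices.

```python
def project_box(x, lo=0.0, hi=1.0):
    """Projection p_C onto the box C = [lo, hi]^n (closed and convex)."""
    return [min(max(a, lo), hi) for a in x]

def dist_sq(x):
    """Squared Euclidean distance d_C^2(x) from x to the box."""
    p = project_box(x)
    return sum((a - q) ** 2 for a, q in zip(x, p))

def phi_C(x):
    """phi_C(x) = 1/2 [ ||x||^2 - d_C^2(x) ]."""
    return 0.5 * (sum(a * a for a in x) - dist_sq(x))

def numerical_gradient(g, x, h=1e-6):
    """Central-difference estimate of the gradient of g at x."""
    grad = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        grad.append((g(xp) - g(xm)) / (2.0 * h))
    return grad

if __name__ == "__main__":
    for x in ([2.0, -0.5], [0.3, 0.7], [-1.0, 4.0]):
        print(x, numerical_gradient(phi_C, x), project_box(x))
```

At each sample point the finite-difference gradient should agree with the projection up to the discretization error. This only illustrates the claim for one particular set and does not replace the differentiation argument.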
