Globally convergent algorithms for finding zeros of duplomonotone mappings


Francisco J. Aragón Artacho^∗, Ronan M.T. Fleming^†

September 1, 2014

Systems Biochemistry Group, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Campus Belval, 4362 Esch-sur-Alzette, Luxembourg.

Abstract

We introduce a new class of mappings, called duplomonotone, which is strictly broader than the class of monotone mappings. We study some of the main properties of duplomonotone functions and provide various examples, including nonlinear duplomonotone functions arising from the study of systems of biochemical reactions. Finally, we present three variations of a derivative-free line search algorithm for finding zeros of systems of duplomonotone equations, and we prove their linear convergence to a zero of the function.

Keywords: Generalized monotonicity, duplomonotone mapping, monotone mapping, global convergence, line search method, derivative-free algorithm, biochemical reactions.

1 Introduction

Monotone mappings have been extensively studied in the literature; see, for instance, [6, Chapter 12] or the recent monograph [1]. In many practical problems, though, the monotonicity assumption turns out to be too strong. Consequently, several generalized notions of monotonicity have been introduced and thoroughly studied by various authors in order to relax it while keeping some of the useful properties of monotone mappings; see [2,4] and the references therein.

In mathematical models of biochemical reaction networks [3], the problem arises of finding a zero of functions that are typically not monotone (see Example 2.14). These functions seem to have a generalized monotonicity property that has not yet appeared in the literature but can be exploited to find a zero of such functions. In this paper we introduce this new class of generalized monotone mappings, which we call duplomonotone, and present a rather simple derivative-free line search algorithm that can be used to find a zero of a duplomonotone function.

The paper is organized as follows: in Section 2 we introduce duplomonotone mappings, analyze their basic properties and provide various illustrative examples; in Section 3 we present three variations of a derivative-free line search algorithm for finding a zero of a duplomonotone function, and we prove their linear convergence under strong duplomonotonicity plus a Lipschitz-type assumption on the points of the lower level set defined by the initial point.

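Throughout, $\|\cdot\|$ denotes the Euclidean norm, while the usual inner product is denoted by $\langle\cdot,\cdot\rangle$. We say that $F$ is a set-valued mapping from $\mathbb{R}^m$ to $\mathbb{R}^n$, denoted by $F:\mathbb{R}^m\rightrightarrows\mathbb{R}^n$, if for every $x\in\mathbb{R}^m$, $F(x)$ is a subset of $\mathbb{R}^n$. The gradient of a differentiable function $f:\mathbb{R}^m\to\mathbb{R}^n$ at some point $x\in\mathbb{R}^m$ is denoted by $\nabla f(x)\in\mathbb{R}^{m\times n}$.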

∗E-mail: francisco.aragon@ua.es

†E-mail: ronan.mt.fleming@gmail.com


2 Duplomonotonicity

Recall that a function $f:\mathbb{R}^m\to\mathbb{R}^m$ is said to be monotone when

$$\langle f(x)-f(y),\,x-y\rangle\ge 0\quad\text{for all } x,y\in\mathbb{R}^m,$$

and strictly monotone if this inequality is strict whenever $x\neq y$. Further, $f$ is called strongly monotone for some $\sigma>0$ when

$$\langle f(x)-f(y),\,x-y\rangle\ge\sigma\|x-y\|^2\quad\text{for all } x,y\in\mathbb{R}^m.$$

We introduce next a new property that is implied by monotonicity.

Definition 2.1. A function $f:\mathbb{R}^m\to\mathbb{R}^m$ is called duplomonotone with constant $\bar\tau>0$ if

$$\langle f(x)-f(x-\tau f(x)),\,f(x)\rangle\ge 0\quad\text{whenever } x\in\mathbb{R}^m,\ 0\le\tau\le\bar\tau, \tag{1}$$

and strictly duplomonotone if this inequality is strict whenever $f(x)\neq 0$. The function $f$ is said to be strongly duplomonotone for some $\sigma>0$ with constant $\bar\tau>0$ if

$$\langle f(x)-f(x-\tau f(x)),\,f(x)\rangle\ge\sigma\tau\|f(x)\|^2\quad\text{whenever } x\in\mathbb{R}^m,\ 0\le\tau\le\bar\tau. \tag{2}$$

The modulus of strong duplomonotonicity is the supremum of the constants $\sigma$ for which (2) holds.

Remark 2.2. Letting $\sigma$ be zero in (2) will allow us to handle both duplomonotonicity and strong duplomonotonicity at the same time. Hence, we refer to this as $f$ being strongly duplomonotone with $\sigma\ge 0$.

Obviously, every (strongly) monotone function is (strongly) duplomonotone. In the next simple example we show that the converse is not true in general: the class of duplomonotone functions is strictly broader than the class of monotone functions. Thus, we have:

$$\text{monotonicity}\ \Longrightarrow\ \text{duplomonotonicity},\qquad \text{duplomonotonicity}\ \not\Longrightarrow\ \text{monotonicity}.$$

Example 2.3. Given a matrix $A\in\mathbb{R}^{m\times m}$, consider the linear function $f(x):=Ax$. Recall that the symmetric part of $A$ is the matrix $A_s:=\frac12(A+A^T)$. The mapping $f$ is monotone if and only if $A_s$ is positive semidefinite (see e.g. [6, Example 12.2]). On the other hand, $f$ is duplomonotone if and only if there is some $\bar\tau>0$ such that, for any $x\in\mathbb{R}^m$, one has

$$0\le\langle f(x)-f(x-\tau f(x)),\,f(x)\rangle=\tau\,x^TA^TA^2x\quad\text{whenever } 0\le\tau\le\bar\tau;$$

that is, $f$ is duplomonotone if and only if $(A^TA^2)_s$ is positive semidefinite. Furthermore, $f$ is strongly duplomonotone for $\sigma>0$ if and only if for any $x\in\mathbb{R}^m$ and any positive $\tau$, one has

$$\begin{aligned}0\le\langle f(x)-f(x-\tau f(x)),\,f(x)\rangle-\sigma\tau\|f(x)\|^2&=\tau\,x^TA^TA^2x-\sigma\tau\,x^TA^TAx\\&=\tau\,x^TA^T(A-\sigma I)Ax,\end{aligned}$$

where $I$ denotes the identity mapping. Therefore, $f$ is strongly duplomonotone for $\sigma>0$ if and only if $(A^T(A-\sigma I)A)_s=A^T(A_s-\sigma I)A$ is positive semidefinite.

If $A$ is symmetric, then $(A^TA^2)_s=A^3$, whose eigenvalues have the same sign as those of $A$. Thus, for $A$ symmetric, the function $f$ is duplomonotone if and only if $f$ is monotone. However, this may not be the case if $A$ is asymmetric. As a simple example, if we take

$$A:=\begin{pmatrix}2&0\\2&0\end{pmatrix}, \tag{3}$$

then

$$A_s=\begin{pmatrix}2&1\\1&0\end{pmatrix},$$

which is not positive semidefinite, while

$$(A^TA^2)_s=\begin{pmatrix}16&0\\0&0\end{pmatrix}$$

is positive semidefinite. Thus, $f((x,y)^T)=(2x,2x)^T$ is duplomonotone but is not monotone.

Moreover, it is not difficult to check that $f$ is not even quasimonotone¹. In fact, $f$ is strongly duplomonotone with modulus $\sigma=2$. Indeed,

$$A^T(A-\sigma I)A=\begin{pmatrix}8(2-\sigma)&0\\0&0\end{pmatrix},$$

which is positive semidefinite if and only if $\sigma\le 2$.
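The matrix computations of Example 2.3 are easy to reproduce numerically. The following is a minimal sketch in plain Python, assuming the 2×2 matrix in (3); the helper names (`matmul`, `sym_part`, `is_psd_2x2`, `strong_matrix`) are illustrative and do not come from the paper. It tests positive semidefiniteness of a symmetric 2×2 matrix through the sign of its trace and determinant.

```python
# Sketch: check the claims of Example 2.3 for A = [[2, 0], [2, 0]].
# All helper names are illustrative assumptions, not notation from the paper.

def matmul(X, Y):
    """Product of two small dense matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def sym_part(X):
    """Symmetric part (X + X^T) / 2 of a square matrix."""
    Xt = transpose(X)
    n = len(X)
    return [[0.5 * (X[i][j] + Xt[i][j]) for j in range(n)] for i in range(n)]

def is_psd_2x2(S, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its trace and determinant are >= 0."""
    trace = S[0][0] + S[1][1]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return trace >= -tol and det >= -tol

A = [[2.0, 0.0], [2.0, 0.0]]
At = transpose(A)

A_s = sym_part(A)                      # governs monotonicity of f(x) = Ax
AtA2 = matmul(At, matmul(A, A))        # A^T A^2, governs duplomonotonicity
is_monotone = is_psd_2x2(A_s)
is_duplomonotone = is_psd_2x2(sym_part(AtA2))

def strong_matrix(sigma):
    """Symmetric part of A^T (A - sigma I) A."""
    shifted = [[A[i][j] - (sigma if i == j else 0.0) for j in range(2)]
               for i in range(2)]
    return sym_part(matmul(At, matmul(shifted, A)))

print(is_monotone, is_duplomonotone)
print(is_psd_2x2(strong_matrix(2.0)), is_psd_2x2(strong_matrix(2.5)))
```

The trace/determinant test is only valid for symmetric 2×2 matrices; for larger matrices one would compute eigenvalues instead.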

A strictly monotone function has at most one zero. This is not the case for duplomonotone functions: even under strong duplomonotonicity we can see that the function $f(x)=Ax$ with $A$ given by (3) has a zero at $(0,y)^T$ for every $y\in\mathbb{R}$. In fact, the zero function is strongly duplomonotone for any $\sigma>0$.

We have shown a function in Example 2.3 that is duplomonotone but not quasimonotone. It is interesting to note that there are also functions that are quasimonotone but not duplomonotone, e.g. $f(x)=-|x|$ for $x\in\mathbb{R}$.

Example 2.4. Given a matrix $A\in\mathbb{R}^{m\times m}$ and a vector $b\in\mathbb{R}^m$, consider the affine function $f(x):=Ax+b$. By [6, Example 12.2], $f$ is monotone if and only if $A_s$ is positive semidefinite. On the other hand, $f$ is duplomonotone if and only if

$$(Ax+b)^TA(Ax+b)\ge 0\quad\text{for all } x\in\mathbb{R}^m;$$

that is, $f$ is duplomonotone if and only if $A_s$ is positive semidefinite on the range of $f$. For example, one can check that for $A$ given in (3) and any $b=(b_1,b_2)^T\in\mathbb{R}^2$, the function $f$ is duplomonotone if and only if $b_1=b_2$.

Next we present a direct characterization of duplomonotonicity in terms of the Euclidean norm.

Proposition 2.5. A function $f:\mathbb{R}^m\to\mathbb{R}^m$ is strongly duplomonotone for $\sigma\ge 0$ if and only if there is some constant $\bar\tau>0$ such that for all $x\in\mathbb{R}^m$ and all $0\le\tau\le\bar\tau$ one has

$$\|f(x-\tau f(x))\|^2\le(1-2\sigma\tau)\|f(x)\|^2+\|f(x-\tau f(x))-f(x)\|^2. \tag{4}$$

Proof. For any $x\in\mathbb{R}^m$ and any $\tau>0$, we have

$$\begin{aligned}\|f(x-\tau f(x))\|^2&=\|(f(x-\tau f(x))-f(x))+f(x)\|^2\\&=\|f(x-\tau f(x))-f(x)\|^2+\|f(x)\|^2+2\langle f(x-\tau f(x))-f(x),\,f(x)\rangle.\end{aligned}$$

The stated equivalence then follows from the definition of strong duplomonotonicity of $f$.

The following example shows the importance of considering the constant $\bar\tau$ in the definition of duplomonotonicity: there are functions for which (1) does not hold for all $\tau>0$. One could also define a weaker notion of duplomonotonicity where the constant $\bar\tau$ in (1) depends on each point $x$. Nevertheless, this property might be too weak to guarantee the convergence of the line search algorithms in Section 3, as we need to ensure that the step size is bounded away from zero.

¹A function $f:\mathbb{R}^m\to\mathbb{R}^m$ is quasimonotone if the following implication holds: $\langle f(x),\,y-x\rangle>0\Rightarrow\langle f(y),\,y-x\rangle\ge 0$, for every $x,y\in\mathbb{R}^m$. Monotonicity implies quasimonotonicity.


Example 2.6. Let $f:\mathbb{R}^2\to\mathbb{R}^2$ be given by $f(x):=(x_1x_2^2,\,x_2)^T$ for $x:=(x_1,x_2)^T\in\mathbb{R}^2$. It is easy to check that $f$ is not monotone: if we take $x:=(-3,0)^T$ and $y:=(-1,1)^T$, we have

$$\langle f(x)-f(y),\,x-y\rangle=-1.$$

On the other hand, after some algebraic manipulation, one can show that for all $x:=(x_1,x_2)^T\in\mathbb{R}^2$, one has

$$\langle f(x)-f(x-\tau f(x)),\,f(x)\rangle=\left((\tau-1)^2x_1^2x_2^4+(2-\tau)x_1^2x_2^2+1\right)\tau x_2^2,$$

which is nonnegative for all $\tau\in[0,2]$. Thus, $f$ is duplomonotone with constant $\bar\tau=2$. If $\tau>2$, the expression above can be negative for some $x\in\mathbb{R}^2$. Indeed, choose any $\varepsilon>0$ and let $z:=\left(z_1,\sqrt{\varepsilon/2}/(\varepsilon+1)\right)$ for some $z_1\in\mathbb{R}$. Then,

$$\langle f(z)-f(z-(2+\varepsilon)f(z)),\,f(z)\rangle=\frac{-(\varepsilon^4+2\varepsilon^3)z_1^2+4\varepsilon^4+16\varepsilon^3+20\varepsilon^2+8\varepsilon}{8(\varepsilon+1)^4},$$

which is negative for $z_1^2$ sufficiently big.
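The closed-form expression in Example 2.6 can be compared against a direct evaluation of the left-hand side of (1). The sketch below is an illustration only: the function names `duplo_gap` and `closed_form`, the sample grid, and the choice $\varepsilon=0.5$, $z_1=10$ are assumptions made for the check.

```python
import math

def f(x):
    """The map of Example 2.6: f(x1, x2) = (x1 * x2^2, x2)."""
    x1, x2 = x
    return (x1 * x2 ** 2, x2)

def duplo_gap(x, tau):
    """Direct evaluation of <f(x) - f(x - tau f(x)), f(x)>."""
    fx = f(x)
    y = (x[0] - tau * fx[0], x[1] - tau * fx[1])
    fy = f(y)
    return (fx[0] - fy[0]) * fx[0] + (fx[1] - fy[1]) * fx[1]

def closed_form(x, tau):
    """Closed-form expression stated in Example 2.6."""
    x1, x2 = x
    bracket = (tau - 1) ** 2 * x1 ** 2 * x2 ** 4 + (2 - tau) * x1 ** 2 * x2 ** 2 + 1
    return bracket * tau * x2 ** 2

# Illustrative sample points; tau ranges over [0, 2], where the gap is >= 0.
grid = [(x1, x2) for x1 in (-3.0, -1.0, 0.5, 2.0) for x2 in (-2.0, -0.5, 0.0, 1.5)]
taus = (0.0, 0.5, 1.0, 1.5, 2.0)
max_error = max(abs(duplo_gap(x, t) - closed_form(x, t)) for x in grid for t in taus)
min_gap = min(duplo_gap(x, t) for x in grid for t in taus)

# Beyond tau = 2: the point z from the example with eps = 0.5 and z1 = 10.
eps = 0.5
z = (10.0, math.sqrt(eps / 2) / (eps + 1))
gap_beyond = duplo_gap(z, 2 + eps)

print(max_error, min_gap, gap_beyond)
```

A finite grid cannot prove nonnegativity on all of $\mathbb{R}^2$; it only guards against algebra slips in the closed form.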

The next result shows that if a function is both Lipschitz continuous and strongly duplomonotone for $\sigma>0$, then $\sigma$ is bounded above by the Lipschitz constant.

Proposition 2.7. If a function $f:\mathbb{R}^m\to\mathbb{R}^m$ is Lipschitz continuous with constant $\ell>0$ and strongly duplomonotone for $\sigma>0$, with $f\not\equiv 0$, then $\sigma\le\ell$.

Proof. Because of the Lipschitz continuity, we have

$$\|f(x-\tau f(x))-f(x)\|\le\ell\tau\|f(x)\|\quad\text{for all } x\in\mathbb{R}^m,\ \tau>0.$$

Let $\bar\tau>0$ be the strong duplomonotonicity constant in (2), and pick any $z\in\mathbb{R}^m$ such that $f(z)\neq 0$. Then

$$\sigma\bar\tau\|f(z)\|^2\le\langle f(z)-f(z-\bar\tau f(z)),\,f(z)\rangle\le\|f(z-\bar\tau f(z))-f(z)\|\,\|f(z)\|\le\ell\bar\tau\|f(z)\|^2,$$

whence $\sigma\le\ell$.

In the following result we show a direct consequence of duplomonotonicity for differentiable functions.

Proposition 2.8. Let $f:\mathbb{R}^m\to\mathbb{R}^m$ be differentiable. The following assertions hold.

(i) If $f$ is duplomonotone, then

$$f(x)^T\nabla f(x)f(x)\ge 0\quad\text{for all } x\in\mathbb{R}^m. \tag{5}$$

(ii) If $f$ is strongly duplomonotone for $\sigma>0$, then

$$f(x)^T\nabla f(x)f(x)\ge\sigma\|f(x)\|^2\quad\text{for all } x\in\mathbb{R}^m. \tag{6}$$

Proof. Assume that $f$ satisfies (2) with $\sigma\ge 0$ and $\bar\tau>0$. Fix $x\in\mathbb{R}^m$ and choose an arbitrary $\tau\in(0,\bar\tau]$. Dividing (2) by $\tau$ we get

$$-\left\langle\frac{f(x-\tau f(x))-f(x)}{\tau},\,f(x)\right\rangle\ge\sigma\|f(x)\|^2,$$

and taking the limit as $\tau\searrow 0$, we obtain $f(x)^T\nabla f(x)f(x)\ge\sigma\|f(x)\|^2$.

Remark 2.9. (i) In general, strict duplomonotonicity does not imply that equality in (5) is only achieved when $f(x)=0$, in the same way that strict monotonicity does not imply positive definiteness of $\nabla f(x)$.

(ii) Observe that both assertions also hold under the weaker notion of duplomonotonicity where the constant $\bar\tau$ depends on each $x\in\mathbb{R}^m$.


For differentiable functions in one dimension, the notions of (strong) duplomonotonicity and (strong) monotonicity agree. In fact, we will prove in Proposition 2.11 that the concepts of monotonicity and duplomonotonicity coincide for continuous functions in one dimension². This is not the case if the function is not continuous, as we show in Example 2.12.

Corollary 2.10. Let $f:\mathbb{R}\to\mathbb{R}$ be differentiable. Then $f$ is (strongly) monotone if and only if $f$ is (strongly) duplomonotone.

Proof. This is just a consequence of Proposition 2.8 and the fact that $f$ is (strongly) monotone with constant $\sigma\ge 0$ if and only if $f'(x)\ge\sigma$.

Proposition 2.11. Let $f:\mathbb{R}\to\mathbb{R}$ be continuous. Then $f$ is monotone if and only if $f$ is duplomonotone.

Proof. Suppose that $f$ is duplomonotone with constant $\bar\tau>0$. If there is some $z\in\mathbb{R}$ such that $f(z)>0$, then we claim that there is an open interval containing $z$ such that $f$ is both nondecreasing and positive on it. Indeed, by continuity of $f$, there is some $\delta_0>0$ such that $f(x)>f(z)/2>0$ for all $x\in(z-\delta_0,z+\delta_0)$. Set $\delta:=\min\{\delta_0,\bar\tau f(z)/4\}$. Choose any $x,y\in(z-\delta,z+\delta)$ with $x>y$, and set $\tau:=\frac{x-y}{f(x)}\in(0,\bar\tau)$. Then, $x-\tau f(x)=y$. From the duplomonotonicity of $f$, we deduce

$$0\le f(x)-f(x-\tau f(x))=f(x)-f(y).$$

Hence, $f$ is nondecreasing and positive on $(z-\delta,z+\delta)$, as claimed.

Observe now that $f$ has to be positive and nondecreasing on $(z-\delta,+\infty)$, again by continuity of $f$. Therefore, if we set $a:=\inf\{x\in\mathbb{R}\mid f(x)>0\}\in\mathbb{R}\cup\{-\infty,+\infty\}$, it follows that $\{x\in\mathbb{R}\mid f(x)>0\}=(a,+\infty)$ and $f$ is nondecreasing on $(a,+\infty)$. Using the same argument, we deduce that $\{x\in\mathbb{R}\mid f(x)<0\}=(-\infty,b)$ with $b\in\mathbb{R}\cup\{-\infty,+\infty\}$ and $f$ is nondecreasing on $(-\infty,b)$. Thus, $f$ is monotone.

Example 2.12. Consider the function $f:\mathbb{R}\to\mathbb{R}$ defined for $x\in\mathbb{R}$ by

$$f(x):=\begin{cases}0,&\text{if } x\in\mathbb{Q};\\1,&\text{if } x\notin\mathbb{Q}.\end{cases}$$

The function $f$ is not monotone (not even locally):

$$(f(\pi)-f(4))(\pi-4)=\pi-4<0.$$

On the other hand, $f$ is duplomonotone: for any $x\in\mathbb{Q}$ the duplomonotonicity condition (1) trivially holds since $f(x)=0$, while for any $x\notin\mathbb{Q}$ and any $\tau>0$ we have

$$(f(x)-f(x-\tau f(x)))f(x)=1-f(x-\tau)\ge 0.$$

Furthermore, one can easily check that this function is not strongly duplomonotone. A slight modification of this example yields a function that is strongly duplomonotone, but still not monotone: let $g:\mathbb{R}\to\mathbb{R}$ be defined for $x\in\mathbb{R}$ by

$$g(x):=\begin{cases}0,&\text{if } x\in\mathbb{Q};\\x,&\text{if } x\notin\mathbb{Q}.\end{cases}$$

Again, the function $g$ is not monotone (not even locally), since

$$(g(\pi)-g(4))(\pi-4)=\pi(\pi-4)<0.$$

²This result and the proof included here are due to the referee of this paper, who noticed that the Dirichlet function in Example 2.12 is not monotone because it is not continuous.


In this case, $g$ is strongly duplomonotone for $\sigma=1$ with constant $\bar\tau=1$: for any $x\notin\mathbb{Q}$ and any $\tau\in[0,1]$ we have

$$\begin{aligned}(g(x)-g(x-\tau g(x)))g(x)-\tau g(x)^2&=(x-g((1-\tau)x))x-\tau x^2\\&=\begin{cases}(1-\tau)x^2,&\text{if } (1-\tau)x\in\mathbb{Q};\\0,&\text{if } (1-\tau)x\notin\mathbb{Q}\end{cases}\\&\ge 0.\end{aligned}$$

Therefore, without differentiability, the concepts of monotonicity and duplomonotonicity may be quite different, even in one dimension.

In the next proposition we introduce a property that implies duplomonotonicity, but is still weaker than monotonicity (see Example 2.14). This property has a characterization for differentiable functions analogous to the positive semidefiniteness of the Jacobian for monotone functions; see e.g. [6, Proposition 12.3].

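Proposition 2.13. Let $f:\mathbb{R}^m\to\mathbb{R}^m$ be differentiable. Then, for any $\sigma\ge 0$, the following two properties are equivalent:

(i) $\langle f(x-\tau_1f(x))-f(x-\tau_2f(x)),\,f(x)\rangle\ge\sigma(\tau_2-\tau_1)\|f(x)\|^2$ for all $x\in\mathbb{R}^m$, $0\le\tau_1\le\tau_2\le\bar\tau$;

(ii) $f(x)^T\nabla f(x-\tau f(x))f(x)\ge\sigma\|f(x)\|^2$ for all $x\in\mathbb{R}^m$, $\tau\in[0,\bar\tau]$.

Proof. Assume that (i) holds. Choose any $x\in\mathbb{R}^m$ and any $\tau\in[0,\bar\tau)$. For all $t\in(0,\bar\tau-\tau]$ one has

$$-\langle f(x-(t+\tau)f(x))-f(x-\tau f(x)),\,f(x)\rangle\ge\sigma t\|f(x)\|^2.$$

Thus, dividing by $t$ and taking the limit as $t\searrow 0$,

$$\sigma\|f(x)\|^2\le-\left\langle\lim_{t\searrow 0}\frac{f(x-(t+\tau)f(x))-f(x-\tau f(x))}{t},\,f(x)\right\rangle=\langle\nabla f(x-\tau f(x))^Tf(x),\,f(x)\rangle,$$

so (ii) follows.

Conversely, assume that (ii) holds. Pick any $x\in\mathbb{R}^m$ and any $0\le\tau_1\le\tau_2\le\bar\tau$. Consider the function

$$h(\lambda):=\langle f(x-(\lambda\tau_1+(1-\lambda)\tau_2)f(x))-f(x-\tau_2f(x))-\sigma\lambda(\tau_2-\tau_1)f(x),\,f(x)\rangle$$

for $\lambda\in\mathbb{R}$. Then, by (ii),

$$h'(\lambda)=\langle\nabla f(x-(\lambda\tau_1+(1-\lambda)\tau_2)f(x))^T(\tau_2-\tau_1)f(x)-\sigma(\tau_2-\tau_1)f(x),\,f(x)\rangle\ge 0$$

for all $\lambda\in[0,1]$, whence

$$0=h(0)\le h(1)=\langle f(x-\tau_1f(x))-f(x-\tau_2f(x))-\sigma(\tau_2-\tau_1)f(x),\,f(x)\rangle,$$

which implies (i).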

Our motivation to characterize duplomonotone mappings arose from mathematical modeling of networks of (bio)chemical reactions, an increasingly prominent application of mathematical and numerical optimization. The next example introduces a very simple (bio)chemical reaction network, involving three molecules and three reactions, where each row of $x$ corresponds to the logarithmic abundance of some molecule and each row of $-f(x)$ corresponds to the rate of change of abundance per unit time.

Example 2.14. Consider the function $f:\mathbb{R}^3\to\mathbb{R}^3$ defined for $x\in\mathbb{R}^3$ by $f(x):=([F,R]-[R,F])\exp([F,R]^Tx)$, where $\exp(\cdot)$ denotes the component-wise exponential,

$$F:=\begin{pmatrix}0&0&1\\1&0&0\\0&1&0\end{pmatrix},\qquad R:=\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix},$$

and $[\cdot,\cdot]$ is the horizontal concatenation operator. That is, for $x:=(x_1,x_2,x_3)^T\in\mathbb{R}^3$ we have

$$f(x)=\begin{pmatrix}2e^{x_1}-e^{x_2}-e^{x_3}\\-e^{x_1}+2e^{x_2}-e^{x_3}\\-e^{x_1}-e^{x_2}+2e^{x_3}\end{pmatrix}.$$

The function $f$ is not monotone because $\nabla f(x)$ is not positive semidefinite for all $x\in\mathbb{R}^3$. For instance, if $z:=(0,0,\log(2))^T$ and $w:=(3,3,2)^T$, we have

$$w^T\nabla f(z)w=-2.$$

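Nevertheless, the function $f$ is duplomonotone because, in fact, it satisfies Proposition 2.13(ii) with $\sigma=0$. Indeed, if we define

$$\varphi(x,\tau):=\langle f(x)-f(x-\tau f(x)),\,f(x)\rangle,$$

we have

$$\frac{\partial\varphi}{\partial\tau}(x,\tau)=\langle\nabla f(x-\tau f(x))^Tf(x),\,f(x)\rangle. \tag{7}$$

After some algebraic manipulation, we obtain

$$\begin{aligned}\varphi(x,\tau)={}&3e^{x_1+\tau(-2e^{x_1}+e^{x_2}+e^{x_3})}(-2e^{x_1}+e^{x_2}+e^{x_3})\\&+3e^{x_2+\tau(e^{x_1}-2e^{x_2}+e^{x_3})}(e^{x_1}-2e^{x_2}+e^{x_3})\\&+3e^{x_3+\tau(e^{x_1}+e^{x_2}-2e^{x_3})}(e^{x_1}+e^{x_2}-2e^{x_3})\\&+(-2e^{x_1}+e^{x_2}+e^{x_3})^2+(e^{x_1}-2e^{x_2}+e^{x_3})^2+(e^{x_1}+e^{x_2}-2e^{x_3})^2.\end{aligned}$$

Thus,

$$\begin{aligned}\frac{\partial\varphi}{\partial\tau}(x,\tau)={}&3e^{x_1+\tau(-2e^{x_1}+e^{x_2}+e^{x_3})}(-2e^{x_1}+e^{x_2}+e^{x_3})^2\\&+3e^{x_2+\tau(e^{x_1}-2e^{x_2}+e^{x_3})}(e^{x_1}-2e^{x_2}+e^{x_3})^2\\&+3e^{x_3+\tau(e^{x_1}+e^{x_2}-2e^{x_3})}(e^{x_1}+e^{x_2}-2e^{x_3})^2\\\ge{}&0,\end{aligned}$$

and because of (7), we have that Proposition 2.13(ii) holds for all $\tau>0$.

In fact, the function $f$ is strictly duplomonotone because $\frac{\partial\varphi}{\partial\tau}(x,\tau)>0$ for all $x\notin\Omega$, where

$$\Omega:=\{x\in\mathbb{R}^3\mid f(x)=0\}=\{(x_1,x_2,x_3)^T\in\mathbb{R}^3\mid x_1=x_2=x_3\}.$$

Hence, $\varphi(x,\tau)>\varphi(x,0)=0$ for all $x\notin\Omega$ and all $\tau>0$; that is, $f$ is strictly duplomonotone.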
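The statements of Example 2.14 lend themselves to a quick numerical check. The following sketch assumes the explicit component form of $f$ given above, written as $f(x)=M\exp(x)$ with the 3×3 matrix `M` below; the names `M`, `phi`, `quad_form_jacobian` and the sample point are illustrative assumptions. It evaluates $w^TJ(z)w$ for the Jacobian $J(z)=M\,\mathrm{diag}(e^{z})$ and samples $\varphi(x,\tau)$ at a point outside $\Omega$.

```python
import math

# f(x) = M exp(x), matching the explicit components in Example 2.14.
M = [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]

def f(x):
    e = [math.exp(t) for t in x]
    return [sum(M[i][j] * e[j] for j in range(3)) for i in range(3)]

def phi(x, tau):
    """phi(x, tau) = <f(x) - f(x - tau f(x)), f(x)>."""
    fx = f(x)
    fy = f([x[i] - tau * fx[i] for i in range(3)])
    return sum((fx[i] - fy[i]) * fx[i] for i in range(3))

def quad_form_jacobian(z, w):
    """w^T J(z) w, where J(z) = M diag(exp(z)) is the Jacobian of f at z."""
    e = [math.exp(t) for t in z]
    return sum(w[i] * M[i][j] * e[j] * w[j] for i in range(3) for j in range(3))

# Non-monotonicity witness from the example.
q = quad_form_jacobian([0.0, 0.0, math.log(2.0)], [3.0, 3.0, 2.0])

# phi should vanish at tau = 0 and increase with tau away from Omega.
x = [0.3, -0.2, 0.1]                      # illustrative point with f(x) != 0
samples = [phi(x, tau) for tau in (0.0, 0.1, 0.5, 1.0, 5.0)]

print(q)
print(samples)
print(f([1.0, 1.0, 1.0]))                 # a point of Omega
```

Since a quadratic form only sees the symmetric part of a matrix, the value of `q` is the same whether $\nabla f$ denotes the Jacobian or its transpose.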

The sum of two monotone operators is clearly monotone. Further, if a mapping $F$ is monotone, one can easily show that for all $\alpha>0$ the mapping $F+\alpha I$ is strongly monotone. Do these properties also hold for duplomonotone functions? The answer is negative in general. As we show in the next example, duplomonotonicity can be destroyed by the addition of a monotone linear function of arbitrarily small slope.

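Example 2.15. Consider the matrix

$$A:=\begin{pmatrix}0&1\\0&0\end{pmatrix}.$$

By Example 2.3, the function $f(x):=Ax$ is duplomonotone, since $A^TA^2=0_{2\times 2}$. Choose any $\alpha>0$ and consider the function $g(x):=Bx$, with $B:=A+\alpha I$. Then,

$$(B^TB^2)_s=\begin{pmatrix}\alpha^3&\frac32\alpha^2\\\frac32\alpha^2&\alpha^3+2\alpha\end{pmatrix}.$$

The eigenvalues of $(B^TB^2)_s$ are $\alpha^3+\alpha\pm\frac12\alpha\sqrt{9\alpha^2+4}$. If $\alpha\in(0,1/2)$, we have that $\alpha^3+\alpha-\frac12\alpha\sqrt{9\alpha^2+4}<0$. Therefore, the function $g=f+\alpha I$ is not duplomonotone for any $\alpha\in(0,1/2)$.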
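The eigenvalue formula in Example 2.15 can be cross-checked by building $(B^TB^2)_s$ from $B$ and using the closed-form eigenvalues of a symmetric 2×2 matrix. This is a minimal sketch; the function names and the sampled values of $\alpha$ are assumptions chosen for illustration.

```python
import math

def matmul2(X, Y):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sym_BtB2(alpha):
    """Symmetric part of B^T B^2 for B = A + alpha I, A = [[0, 1], [0, 0]]."""
    B = [[alpha, 1.0], [0.0, alpha]]
    Bt = [[alpha, 0.0], [1.0, alpha]]
    P = matmul2(Bt, matmul2(B, B))
    off = 0.5 * (P[0][1] + P[1][0])
    return [[P[0][0], off], [off, P[1][1]]]

def eig_sym_2x2(S):
    """Eigenvalues (smallest, largest) of a symmetric 2x2 matrix."""
    mean = 0.5 * (S[0][0] + S[1][1])
    radius = math.sqrt((0.5 * (S[0][0] - S[1][1])) ** 2 + S[0][1] ** 2)
    return mean - radius, mean + radius

def smallest_eig_formula(alpha):
    """Smallest eigenvalue as stated in Example 2.15."""
    return alpha ** 3 + alpha - 0.5 * alpha * math.sqrt(9 * alpha ** 2 + 4)

for alpha in (0.1, 0.25, 0.4, 0.5, 1.0):
    lo, hi = eig_sym_2x2(sym_BtB2(alpha))
    print(alpha, lo, smallest_eig_formula(alpha))
```

The smallest eigenvalue changes sign at $\alpha=1/2$, which is where the determinant $\alpha^6-\alpha^4/4$ of $(B^TB^2)_s$ vanishes.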


A direct consequence of Proposition 2.8 is that $-f(x)$ is a descent direction for $\|f(\cdot)\|^2$ at any point $x\in\mathbb{R}^m$ when $f$ is duplomonotone. This property inspires the derivative-free algorithms in Section 3 for finding zeros of the function $f$.

Corollary 2.16. Let $f:\mathbb{R}^m\to\mathbb{R}^m$ be differentiable and strongly duplomonotone for $\sigma>0$. Then, for all $x\in\mathbb{R}^m$, either $f(x)=0$ or the vector $-f(x)$ provides a descent direction for the merit function $\|f(\cdot)\|^2$ at the point $x$.

Proof. Observe that, for any $x\in\mathbb{R}^m$, we have $\nabla\left(\|f(\cdot)\|^2\right)(x)=2\nabla f(x)f(x)$. Thus, inequality (6) implies that

$$\left\langle\nabla\left(\|f(\cdot)\|^2\right)(x),\,-f(x)\right\rangle=-2\langle\nabla f(x)f(x),\,f(x)\rangle\le-2\sigma\|f(x)\|^2. \tag{8}$$

The assertion follows.

It is straightforward to extend the definition of duplomonotonicity to set-valued mappings.

Definition 2.17. A set-valued mapping $F:\mathbb{R}^m\rightrightarrows\mathbb{R}^m$ is called duplomonotone with constant $\bar\tau>0$ if for all $x\in\mathbb{R}^m$ and all $\tau\in[0,\bar\tau]$ one has

$$\langle y_0-y_1,\,y_0\rangle\ge 0\quad\text{whenever } y_0\in F(x),\ y_1\in F(x-\tau y_0).$$

The mapping $F$ is said to be strongly duplomonotone for some $\sigma>0$ with constant $\bar\tau>0$ if for all $x\in\mathbb{R}^m$ and all $\tau\in[0,\bar\tau]$ one has

$$\langle y_0-y_1,\,y_0\rangle\ge\sigma\tau\|y_0\|^2\quad\text{whenever } y_0\in F(x),\ y_1\in F(x-\tau y_0).$$

One can easily extend the characterization of duplomonotonicity given in Proposition 2.5 to set-valued mappings.

Proposition 2.18. A set-valued mapping $F:\mathbb{R}^m\rightrightarrows\mathbb{R}^m$ is strongly duplomonotone for $\sigma\ge 0$ if and only if there is some $\bar\tau>0$ such that for all $x\in\mathbb{R}^m$ and all $\tau\in[0,\bar\tau]$ one has

$$\|y_1\|^2\le(1-2\sigma\tau)\|y_0\|^2+\|y_1-y_0\|^2\quad\text{whenever } y_0\in F(x),\ y_1\in F(x-\tau y_0).$$

We will not explore duplomonotone set-valued mappings any further here, as it is beyond the scope of the present paper.

3 Derivative-free algorithms for systems of duplomonotone equations

In this section we consider the problem of finding solutions of systems of nonlinear equations

$$f(x)=0, \tag{9}$$

where $f:\mathbb{R}^m\to\mathbb{R}^m$ is strongly duplomonotone for $\sigma>0$. Corollary 2.16 drives us to consider the following derivative-free line search algorithm for finding zeros of $f$.

Algorithm 1: Backtracking (with $\sigma>0$ known)

Fix a precision $\varepsilon>0$. Choose any $x_0\in\mathbb{R}^m$, $0<\alpha<2\sigma$, $0<\beta<1$, and set $k:=0$.
while $\|f(x_k)\|>\varepsilon$ do
    Compute $\lambda_k:=\frac{1}{\alpha}\beta^{p_k}$, where $p_k$ is the smallest positive integer such that
        $\|f(x_k-\lambda_kf(x_k))\|^2\le(1-\alpha\lambda_k)\|f(x_k)\|^2$.    (10)
    $x_{k+1}:=x_k-\lambda_kf(x_k)$;
    $k:=k+1$;
end
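A direct transcription of Algorithm 1 into Python might look as follows. This is a sketch under stated assumptions rather than a reference implementation: the function name `backtracking_zero`, the iteration caps `max_iter` and `max_backtracks`, and the test problem (the linear map of Example 2.3, for which $\sigma=2$) are choices made here for illustration.

```python
import math

def norm_sq(v):
    """Squared Euclidean norm of a vector stored as a list."""
    return sum(t * t for t in v)

def backtracking_zero(f, x0, alpha, beta, eps=1e-8, max_iter=10_000, max_backtracks=200):
    """Sketch of Algorithm 1; requires 0 < alpha < 2*sigma and 0 < beta < 1.

    The caps max_iter and max_backtracks are safeguards added here; they are
    not part of the algorithm as stated.
    """
    x = list(x0)
    fx = f(x)
    for k in range(max_iter):
        if math.sqrt(norm_sq(fx)) <= eps:
            return x, k
        p = 1                                   # smallest positive integer first
        while True:
            lam = beta ** p / alpha             # lambda_k = (1/alpha) * beta^p
            x_new = [xi - lam * fi for xi, fi in zip(x, fx)]
            f_new = f(x_new)
            if norm_sq(f_new) <= (1 - alpha * lam) * norm_sq(fx):   # test (10)
                break
            p += 1
            if p > max_backtracks:
                raise RuntimeError("step size test (10) was never satisfied")
        x, fx = x_new, f_new
    return x, max_iter

# Illustrative run on f(x) = Ax with A from (3), where sigma = 2.
def linear_f(x):
    return [2.0 * x[0], 2.0 * x[0]]

x_star, iterations = backtracking_zero(linear_f, [1.0, 1.0], alpha=1.0, beta=0.5)
print(x_star, iterations)
```

Only evaluations of $f$ are used; no Jacobian is formed at any point, which is the derivative-free feature of the method.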


Observe that, when $f$ is differentiable, the step size acceptance criterion (10) is implied by the usual Armijo rule for the function $\|f(\cdot)\|^2$ and the direction $d_k:=-f(x_k)$. Indeed, given some constant $c\in(0,1)$, the Armijo rule for $\|f(\cdot)\|^2$ will search for a step size $\lambda_k$ satisfying

$$\begin{aligned}\|f(x_k+\lambda_kd_k)\|^2&\le\|f(x_k)\|^2+2c\lambda_kd_k^T\nabla f(x_k)f(x_k)\\&=\|f(x_k)\|^2-2c\lambda_kf(x_k)^T\nabla f(x_k)f(x_k).\end{aligned}$$

Proposition 2.8(ii) gives us

$$\|f(x_k)\|^2-2c\lambda_kf(x_k)^T\nabla f(x_k)f(x_k)\le(1-2\sigma c\lambda_k)\|f(x_k)\|^2.$$

Taking $\alpha:=2\sigma c$, we get $0<\alpha<2\sigma$, and (10) follows.

The steepest descent algorithm could be applied to find solutions to nonlinear equations of type (9) whenever the function $f$ has a computable Jacobian. The main advantage of Algorithm 1 relative to the steepest descent method is that no derivative information is needed. On the other hand, note that one cannot assure in general that the steepest descent method will converge to a zero of the function $f$, but to a critical point of $\|f(\cdot)\|^2$ (for more details, see e.g. [5, Chapter 11]).

This is not a concern under strong duplomonotonicity for $\sigma>0$: in this case, any critical point of $\|f(\cdot)\|^2$ will be a zero of $f$. Indeed, otherwise one would have $\nabla\left(\|f(\cdot)\|^2\right)(\tilde x)=0$ and $f(\tilde x)\neq 0$ for some $\tilde x\in\mathbb{R}^m$. Then

$$0=\nabla\left(\|f(\cdot)\|^2\right)(\tilde x)=2\nabla f(\tilde x)f(\tilde x),$$

whence, by Proposition 2.8(ii),

$$0=f(\tilde x)^T\nabla f(\tilde x)f(\tilde x)\ge\sigma\|f(\tilde x)\|^2>0,$$

which is a contradiction.

If $f$ is Lipschitz continuous with a known constant $\ell>0$ and is also strongly duplomonotone for $\sigma>0$ with constant $\bar\tau>0$, then, as a direct consequence of the characterization in Proposition 2.5, we get

$$\|f(x-\tau f(x))\|^2\le(1-2\sigma\tau+\ell^2\tau^2)\|f(x)\|^2, \tag{11}$$

for all $x\in\mathbb{R}^m$ and all $0\le\tau\le\bar\tau$. The right-hand side of (11) attains its minimum (with respect to $\tau\in[0,\bar\tau]$) at $\tau^\star:=\min\{\sigma/\ell^2,\bar\tau\}$. Thus, if $\sigma/\ell^2\le\bar\tau$, we have

$$\|f(x-\tau^\star f(x))\|^2\le\left(1-\frac{\sigma^2}{\ell^2}\right)\|f(x)\|^2. \tag{12}$$

This leads us to consider the following variation of Algorithm 1, where the step size is chosen constant.

Algorithm 2: Constant step length $\min\{\sigma/\ell^2,\bar\tau\}$

Fix a precision $\varepsilon>0$. Choose any $x_0\in\mathbb{R}^m$, and set $k:=0$ and $\lambda:=\min\{\sigma/\ell^2,\bar\tau\}$.
while $\|f(x_k)\|>\varepsilon$ do
    $x_{k+1}:=x_k-\lambda f(x_k)$;
    $k:=k+1$;
end
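Algorithm 2 is short enough to sketch in a few lines of Python. The version below is an illustration under assumptions: the name `constant_step_zero`, the `max_iter` cap, and the recorded list of residual norms are additions made here, and the test problem is the linear map of Example 2.3 with $\sigma=2$, $\ell=\sqrt 8$ (its spectral norm) and an arbitrary $\bar\tau=1$.

```python
import math

def norm(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(t * t for t in v))

def constant_step_zero(f, x0, sigma, lip, tau_bar, eps=1e-8, max_iter=100_000):
    """Sketch of Algorithm 2 with lambda = min(sigma / lip^2, tau_bar).

    Returns the final iterate, the step size, and the residual norms
    ||f(x_k)||; max_iter is a safeguard not present in the algorithm as stated.
    """
    lam = min(sigma / lip ** 2, tau_bar)
    x = list(x0)
    fx = f(x)
    residuals = [norm(fx)]
    for _ in range(max_iter):
        if residuals[-1] <= eps:
            break
        x = [xi - lam * fi for xi, fi in zip(x, fx)]
        fx = f(x)
        residuals.append(norm(fx))
    return x, lam, residuals

# Illustrative run on f(x) = Ax with A from (3).
def linear_f(x):
    return [2.0 * x[0], 2.0 * x[0]]

sigma, lip, tau_bar = 2.0, math.sqrt(8.0), 1.0
x_star, lam, residuals = constant_step_zero(linear_f, [1.0, 1.0], sigma, lip, tau_bar)

# Contraction factor implied by (11) for this step size.
bound = math.sqrt(1 - 2 * sigma * lam + lip ** 2 * lam ** 2)
print(lam, bound, len(residuals) - 1, residuals[-1])
```

On this example the observed residual ratio is 1/2 per iteration, below the factor $\sqrt{1-\sigma^2/\ell^2}\approx 0.707$ guaranteed by (12), so the bound is not tight here.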

As a direct consequence of (12) we have that Algorithm 2 is (globally) linearly convergent to a zero of $f$; moreover, the Lipschitz assumption can be relaxed as follows.

Theorem 3.1. Let $f:\mathbb{R}^m\to\mathbb{R}^m$ be strongly duplomonotone for $\sigma>0$ with constant $\bar\tau>0$. Let $x_0\in\mathbb{R}^m$ be an initial point, and assume there exists some constant $\ell>0$ such that

$$\|f(x-\tau f(x))-f(x)\|\le\ell\tau\|f(x)\|\quad\text{for all } x\in L(x_0)\text{ and all } 0\le\tau\le\bar\tau, \tag{13}$$

where $L(x_0)$ is the lower level set defined by

$$L(x_0):=\{x\in\mathbb{R}^m:\|f(x)\|\le\|f(x_0)\|\}. \tag{14}$$

Set $\lambda:=\min\{\sigma/\ell^2,\bar\tau\}$. Then the iteration $x_{k+1}:=x_k-\lambda f(x_k)$ satisfies

$$\|f(x_{k+1})\|\le\sqrt{1-2\sigma\lambda+\ell^2\lambda^2}\,\|f(x_k)\|;$$

whence, $f(x_k)$ is linearly convergent to zero. Thus, if $f$ is continuous, any accumulation point of the sequence $x_k$ is a zero of $f$.

Proof. This follows from the argument above.

Even when $f$ is known to be Lipschitz continuous, its Lipschitz constant might not be easy to compute. The next result shows that in this case Algorithm 1 can be used, and the step size $\lambda_k$ can always be found by a backtracking technique where $\lambda_k$ is bounded away from zero and the algorithm is linearly convergent. We denote by $\lceil\cdot\rceil$ the ceiling function, i.e., the smallest integer greater than or equal to a given number.

Theorem 3.2. Let $f:\mathbb{R}^m\to\mathbb{R}^m$ be strongly duplomonotone for $\sigma>0$ with constant $\bar\tau>0$. Let $x_0\in\mathbb{R}^m$ be an initial point, and assume that there is a positive constant $\ell$ such that (13) holds. Then, for all $0<\alpha<2\sigma$ and all $0<\beta<1$, Algorithm 1 generates a sequence $x_k$ such that $f(x_k)$ is linearly convergent to zero with rate $\sqrt{1-\beta^p}$, where

$$p:=\left\lceil\frac{1}{\log\beta}\left(\log\alpha+\min\{\log(\bar\tau),\,\log(2\sigma-\alpha)-2\log\ell\}\right)\right\rceil. \tag{15}$$

Thus, if $f$ is continuous, any accumulation point of the sequence $x_k$ is a zero of $f$.

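Proof. Let $x\in L(x_0)$. We will prove that the step size $(1/\alpha)\beta^p$ with $p$ as in (15) always satisfies (10), i.e., that we have

$$\|f(x-(1/\alpha)\beta^pf(x))\|^2\le(1-\beta^p)\|f(x)\|^2. \tag{16}$$

Proposition 2.5 gives us

$$\|f(x-\tau f(x))\|^2\le(1-2\sigma\tau)\|f(x)\|^2+\|f(x-\tau f(x))-f(x)\|^2, \tag{17}$$

for all $\tau\in[0,\bar\tau]$. Take $p$ as in (15), that is,

$$p:=\left\lceil\max\left\{\frac{\log(\alpha\bar\tau)}{\log\beta},\,\frac{\log\alpha+\log(2\sigma-\alpha)-2\log\ell}{\log\beta}\right\}\right\rceil.$$

Then (13) holds for all $0<\tau\le(1/\alpha)\beta^p$. This, together with (17), implies that

$$\|f(x-\tau f(x))\|^2\le(1-2\sigma\tau+\ell^2\tau^2)\|f(x)\|^2\quad\text{whenever } 0<\tau\le\frac{1}{\alpha}\beta^p.$$

Moreover, we have that $1-2\sigma\tau+\ell^2\tau^2\le 1-\alpha\tau$ if and only if $\tau\le(2\sigma-\alpha)/\ell^2$. The definition of $p$ implies that $(1/\alpha)\beta^p\le(2\sigma-\alpha)/\ell^2$; hence,

$$\|f(x-\tau f(x))\|^2\le(1-\alpha\tau)\|f(x)\|^2\quad\text{whenever } 0<\tau\le\frac{1}{\alpha}\beta^p,$$

which implies (16). Therefore, given a point $x_k$ generated by Algorithm 1, the integer $p_k$ can always be found and it satisfies $p_k\le p$. Thus, $\lambda_k=(1/\alpha)\beta^{p_k}\ge(1/\alpha)\beta^p$, and we have

$$\|f(x_{k+1})\|^2=\|f(x_k-\lambda_kf(x_k))\|^2\le(1-\alpha\lambda_k)\|f(x_k)\|^2\le(1-\beta^p)\|f(x_k)\|^2,$$

which in particular yields $x_{k+1}\in L(x_0)$, and the claims in the statement follow.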
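To make the rate in Theorem 3.2 concrete, formula (15) can be evaluated for given constants. The sketch below is illustrative: the function name `backtracking_rate` and the sample constants ($\sigma=2$, $\ell=\sqrt 8$, $\bar\tau=1$, $\alpha=1$, $\beta=1/2$, matching the linear map of Example 2.3) are assumptions, not values taken from the paper.

```python
import math

def backtracking_rate(alpha, beta, sigma, lip, tau_bar):
    """Evaluate p from (15) and the linear rate sqrt(1 - beta^p).

    Assumes 0 < alpha < 2*sigma, 0 < beta < 1, lip > 0 and tau_bar > 0.
    """
    inner = math.log(alpha) + min(math.log(tau_bar),
                                  math.log(2 * sigma - alpha) - 2 * math.log(lip))
    p = math.ceil(inner / math.log(beta))
    return p, math.sqrt(1 - beta ** p)

p, rate = backtracking_rate(alpha=1.0, beta=0.5, sigma=2.0,
                            lip=math.sqrt(8.0), tau_bar=1.0)
guaranteed_step = 0.5 ** p / 1.0      # lower bound (1/alpha) * beta^p on lambda_k
print(p, rate, guaranteed_step)
```

With these constants the guaranteed step $(1/\alpha)\beta^p$ satisfies both $(1/\alpha)\beta^p\le\bar\tau$ and $(1/\alpha)\beta^p\le(2\sigma-\alpha)/\ell^2$, the two conditions used in the proof.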


Remark 3.3. Notice that: (i) the constant $\ell$ in (13) does not need to be known in order to use Algorithm 1, but is involved in the rate of convergence, and (ii) Lipschitz continuity of the function $f$ on $L(x_0)$ implies (13).

Even if $f$ is known (or conjectured) to be both Lipschitz continuous and strongly duplomonotone for $\sigma>0$, in practical situations the values of both the Lipschitz constant and $\sigma$ might be unknown or difficult to compute. The following modification of Algorithm 1 permits finding an adequate step size by a double backtracking technique, where an additional backtracking is performed in order to find an appropriate value of the parameter $\alpha$ in (10) such that $\alpha<2\sigma$.

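Algorithm 3: Double backtracking (with $\sigma>0$ unknown)

Fix a precision $\varepsilon>0$, and a minimum and a maximum step size $0<\lambda_{\min}<\lambda_{\max}$. Choose any $x_0\in\mathbb{R}^m$, $0<\alpha<\lambda_{\max}^{-1}$, $0<\beta<1$, and set $k:=0$.
while $\|f(x_k)\|>\varepsilon$ do
    $\lambda_k:=\lambda_{\max}$;
    while $\|f(x_k-\lambda_kf(x_k))\|^2>(1-\alpha\lambda_k)\|f(x_k)\|^2$ do
        $\lambda_k:=\beta\lambda_k$;
        if $\lambda_k<\lambda_{\min}$ then
            $\alpha:=\alpha\beta$; $\lambda_k:=\lambda_{\max}$;
        end
    end
    $x_{k+1}:=x_k-\lambda_kf(x_k)$;
    $k:=k+1$;
end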
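The double backtracking loop of Algorithm 3 can be sketched in Python as below. The function name `double_backtracking_zero`, the safeguards `max_iter` and `max_inner`, and the two test problems are assumptions introduced for illustration. The scalar problem $f(x)=0.1x$ has $\sigma=0.1$, so the initial $\alpha=0.5$ violates $\alpha<2\sigma$ and has to be reduced by the algorithm.

```python
import math

def norm_sq(v):
    """Squared Euclidean norm of a vector stored as a list."""
    return sum(t * t for t in v)

def double_backtracking_zero(f, x0, alpha, beta, lam_min, lam_max,
                             eps=1e-8, max_iter=10_000, max_inner=10_000):
    """Sketch of Algorithm 3; returns the iterate, final alpha and iteration count.

    max_iter and max_inner are safeguards added here against the infinite
    loop discussed in Remark 3.5(i); they are not part of the algorithm.
    """
    assert 0 < lam_min < lam_max and 0 < alpha < 1 / lam_max and 0 < beta < 1
    x = list(x0)
    fx = f(x)
    for k in range(max_iter):
        if math.sqrt(norm_sq(fx)) <= eps:
            return x, alpha, k
        lam = lam_max
        for _ in range(max_inner):
            x_new = [xi - lam * fi for xi, fi in zip(x, fx)]
            f_new = f(x_new)
            if norm_sq(f_new) <= (1 - alpha * lam) * norm_sq(fx):
                break                       # step accepted
            lam *= beta                     # first backtracking: shrink the step
            if lam < lam_min:
                alpha *= beta               # second backtracking: shrink alpha
                lam = lam_max
        else:
            raise RuntimeError("no acceptable step size found")
        x, fx = x_new, f_new
    return x, alpha, max_iter

# Illustrative run: f(x) = 0.1 x, where alpha must drop below 2 * sigma = 0.2.
def slow_f(x):
    return [0.1 * x[0]]

x_star, alpha_final, iterations = double_backtracking_zero(
    slow_f, [1.0], alpha=0.5, beta=0.5, lam_min=0.01, lam_max=1.0)
print(x_star, alpha_final, iterations)
```

Note that the reduced value of $\alpha$ is kept for all later iterations, as in the statement of the algorithm, so the search for an admissible $\alpha$ is paid only once.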

Theorem 3.4. Let $f:\mathbb{R}^m\to\mathbb{R}^m$ be strongly duplomonotone for $\sigma>0$ with constant $\bar\tau>0$. Let $x_0\in\mathbb{R}^m$ be an initial point, and assume that there exists some positive constant $\ell$ such that (13) holds. Then, for all positive constants $\lambda_{\min}$ and $\lambda_{\max}$ such that there exists some integer $q$ with $\lambda_{\min}\le\beta^q\lambda_{\max}<\min\{2\sigma/\ell^2,\bar\tau\}$, Algorithm 3 generates a sequence $x_k$ such that $f(x_k)$ is linearly convergent to zero with rate $\sqrt{1-\alpha\beta^{p+q}\lambda_{\max}}$, where

$$p:=\left\lceil\frac{\log(2\sigma-\ell^2\beta^q\lambda_{\max})-\log(\alpha)}{\log(\beta)}\right\rceil. \tag{18}$$

Thus, if $f$ is continuous, any accumulation point of the sequence $x_k$ is a zero of $f$.

Proof. Denote by $\alpha_0$ the initial value of $\alpha$ in Algorithm 3. Proposition 2.5 together with (13) gives us

$$\|f(x-\tau f(x))\|^2\le(1-2\sigma\tau+\ell^2\tau^2)\|f(x)\|^2\quad\text{whenever } x\in L(x_0),\ 0<\tau\le\bar\tau.$$

Further, we have that $1-2\sigma\tau+\ell^2\tau^2\le 1-\alpha_0\beta^p\tau$ with $0<\tau\le\bar\tau$ if and only if $0<\tau\le\min\{(2\sigma-\alpha_0\beta^p)/\ell^2,\bar\tau\}$. By assumption, there exists some positive integer $q$ such that $\lambda_{\min}\le\beta^q\lambda_{\max}<\min\{2\sigma/\ell^2,\bar\tau\}$. By the definition of $p$ in (18), we have $\beta^q\lambda_{\max}\le(2\sigma-\alpha_0\beta^p)/\ell^2$. Hence,

$$\lambda_{\min}\le\beta^q\lambda_{\max}\le\min\left\{\frac{2\sigma-\alpha_0\beta^p}{\ell^2},\,\bar\tau\right\}.$$

Thus, for all $x\in L(x_0)$, we have

$$\|f(x-(\beta^q\lambda_{\max})f(x))\|^2\le(1-\alpha_0\beta^{p+q}\lambda_{\max})\|f(x)\|^2. \tag{19}$$

Finally, observe that there is some positive integer $s$ such that $\beta^s\lambda_{\max}<\lambda_{\min}$. Therefore, given $x_k$, a new point $x_{k+1}$ is guaranteed to be found in a finite number of steps of Algorithm 3, because the double backtracking loop can only be executed a maximum of $sp+q$ times (after a maximum of $sp$ iterations the value of $\alpha$ will be equal to $\alpha_0\beta^p$, after which a maximum of $q$ iterations will be enough to find an appropriate step size $\lambda_k$). Thus, we have $\alpha\lambda_k\ge\alpha_0\beta^{p+q}\lambda_{\max}$. Consequently, by the acceptance criterion of the step size in Algorithm 3, we have

$$\|f(x_{k+1})\|^2\le(1-\alpha\lambda_k)\|f(x_k)\|^2\le(1-\alpha_0\beta^{p+q}\lambda_{\max})\|f(x_k)\|^2,$$

and the claims follow.

Remark 3.5. (i) The condition $\lambda_{\min}\le\beta^q\lambda_{\max}<\min\{2\sigma/\ell^2,\bar\tau\}$ in Theorem 3.4 is needed to avoid the possibility of an infinite loop in an iteration of the algorithm. Nevertheless, we believe this condition should not be too difficult to guarantee in practice, as it basically requires that $\lambda_{\min}$ is not too big and $\beta$ is not too small.

(ii) Certainly, the constant $\beta$ used for updating $\alpha$ can be chosen different from the constant $\beta$ used for updating $\lambda_k$, and Theorem 3.4 would remain valid with slight changes. Nonetheless, we have decided to use the same constant to ease the notation and the analysis.

(iii) In Algorithm 3, the constant $\alpha$ is required to be smaller than $\lambda_{\max}^{-1}$ to avoid unnecessary iterations (otherwise, the initial step $\lambda_k=\lambda_{\max}$ would always be too big, since $1-\alpha\lambda_k$ would be negative).

Acknowledgements

We are indebted to the referee for his/her valuable comments and suggestions, especially for contributing to the paper with Proposition 2.11. We would also like to thank Professor Michael A. Saunders for his careful reading of the manuscript and useful suggestions. This work was supported by the National Research Fund, Luxembourg, co-funded under the Marie Curie Actions of the European Commission (FP7-COFUND), and by the U.S. Department of Energy, Offices of Advanced Scientific Computing Research and the Biological and Environmental Research as part of the Scientific Discovery Through Advanced Computing program, grant #DE-SC0010429.

References

[1] Heinz H. Bauschke and Patrick L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, April 2011.

[2] Alberto Cambini and Laura Martein. Generalized Convexity and Optimization: Theory and Applications. Springer, October 2008.

[3] R. M. T. Fleming and I. Thiele. Mass conserved elementary kinetics is sufficient for the existence of a non-equilibrium steady state concentration. Journal of Theoretical Biology, 314:173–181, 2012.

[4] Nicolas Hadjisavvas, Sándor Komlósi, and Siegfried S. Schaible. Handbook of Generalized Convexity and Generalized Monotonicity. Springer, 2005.

[5] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer, New York, second edition, 2006.

[6] R. Tyrrell Rockafellar and Roger J.-B. Wets. Variational Analysis, volume 317 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1998.