Differentiable McCormick relaxations

The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.

Citation: Khan, Kamil A., Harry A. J. Watson, and Paul I. Barton. "Differentiable McCormick Relaxations." Journal of Global Optimization 67, no. 4 (May 27, 2016): 687–729.

As Published: http://dx.doi.org/10.1007/s10898-016-0440-6

Publisher: Springer US

Version: Author's final manuscript

Citable link: http://hdl.handle.net/1721.1/107681

Terms of Use: Creative Commons Attribution-Noncommercial-Share Alike


Differentiable McCormick relaxations

Kamil A. Khan · Harry A. J. Watson ·

Paul I. Barton

Received: date / Accepted: date

Abstract McCormick's classical relaxation technique constructs closed-form convex and concave relaxations of compositions of simple intrinsic functions. These relaxations have several properties which make them useful for lower bounding problems in global optimization: they can be evaluated automatically, accurately, and computationally inexpensively, and they converge rapidly to the relaxed function as the underlying domain is reduced in size. They may also be adapted to yield relaxations of certain implicit functions and differential equation solutions. However, McCormick's relaxations may be nonsmooth, and this nonsmoothness can create theoretical and computational obstacles when relaxations are to be deployed. This article presents a continuously differentiable variant of McCormick's original relaxations in the multivariate McCormick framework of Tsoukalas and Mitsos. Gradients of the new differentiable relaxations may be computed efficiently using the standard forward or reverse modes of automatic differentiation. Extensions to differentiable relaxations of implicit functions and solutions of parametric ordinary differential equations are discussed. A C++ implementation based on the library MC++ is described and applied to a case study in nonsmooth nonconvex optimization.

Keywords Nonconvex optimization · Convex underestimators · McCormick relaxations · Implicit functions

Mathematics Subject Classification (2000) 49M20 · 90C26 · 65G40 · 26B25

This material was supported by Novartis Pharmaceuticals as part of the Novartis-MIT Center for Continuous Manufacturing, was also supported by Statoil, and was based in part on work supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357.

K.A. Khan

Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA. E-mail: kkhan@anl.gov

H.A.J. Watson

Process Systems Engineering Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA. E-mail: hwatson@mit.edu

P.I. Barton

Process Systems Engineering Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA. Tel.: +1 617 253 6526, Fax: +1 617 258 5042


1 Introduction

Branch-and-bound methods for deterministic global optimization [27] require the ability to evaluate a lower bound on a nonconvex function on particular classes of subdomains. This bounding information may be generated using a relaxation scheme by McCormick [38], which evaluates convex underestimators of a nonconvex objective function on interval subdomains. McCormick's relaxation method assumes that the objective function can be expressed as a finite composition of known intrinsic functions, but assumes no other global knowledge of the function a priori. Subgradients may be computed for the McCormick relaxations using dedicated variants [40, 6] of automatic differentiation [22, 43]. Using this information, a lower bound on a nonconvex objective function on an interval may be supplied by minimizing the corresponding convex McCormick underestimator using a local optimization solver. Other methods for global optimization, such as nonconvex outer approximation [28] and nonconvex generalized Benders decomposition [32], also require the construction and minimization of convex underestimators.

McCormick's relaxation method has several useful properties. Firstly, accurate evaluation of a convex underestimator and a corresponding subgradient is computationally inexpensive and automatable; the C++ library MC++ [13, 40] and the Fortran library amodMC [6] use operator overloading to compute these quantities for well-defined user-supplied compositions of the basic arithmetic operations and functions such as sin/cos and exp/log.
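The operator-overloading idea can be made concrete with a minimal sketch. The class below is purely illustrative and is not the MC++ or amodMC interface; all names (McRelax, variable, and the members lo, hi, cv, cc) are hypothetical. Each object carries interval bounds together with convex and concave relaxation values at a fixed point, and the overloaded operators propagate them through a composition.

// Minimal sketch of McCormick-style propagation by operator overloading.
// Illustrative toy only, not the MC++ API; class and member names are hypothetical.
#include <algorithm>
#include <cmath>
#include <iostream>

struct McRelax {
    double lo, hi;   // interval bounds on the factor
    double cv, cc;   // convex / concave relaxation values at the current point
};

// Independent variable on [lo, hi], evaluated at the point val.
McRelax variable(double lo, double hi, double val) {
    return McRelax{lo, hi, val, val};
}

McRelax operator+(const McRelax& a, const McRelax& b) {
    return McRelax{a.lo + b.lo, a.hi + b.hi, a.cv + b.cv, a.cc + b.cc};
}

McRelax exp(const McRelax& a) {
    // exp is convex and increasing: compose with the convex relaxation value,
    // and use the secant on [lo, hi] for the concave overestimator.
    double secant_at_cc =
        std::exp(a.lo) + (std::exp(a.hi) - std::exp(a.lo)) * (a.cc - a.lo) / (a.hi - a.lo);
    return McRelax{std::exp(a.lo), std::exp(a.hi), std::exp(a.cv), secant_at_cc};
}

int main() {
    // Relax g(x, y) = exp(x) + y on [0, 2] x [-1, 1] at the point (1.0, 0.5).
    McRelax x = variable(0.0, 2.0, 1.0);
    McRelax y = variable(-1.0, 1.0, 0.5);
    McRelax g = exp(x) + y;
    std::cout << "bounds:  [" << g.lo << ", " << g.hi << "]\n";
    std::cout << "cv, cc:  " << g.cv << ", " << g.cc << "\n";   // straddle exp(1) + 0.5
}

A full implementation would also propagate subgradients (or, in this article's setting, gradients) alongside the relaxation values.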

Secondly, as the width of the interval on which a McCormick relaxation is constructed is reduced to zero, the relaxation approaches the objective function sufficiently rapidly [10] to mitigate a phenomenon called the cluster effect [16, 67], in which a branch-and-bound method will branch many times on intervals that either contain or are near a global minimum. Since McCormick relaxations are constructed in closed form, they have been used to generate relaxations for implicit functions [68, 60, 55] and for solutions of parametric ordinary differential equations [57, 56]. However, as the following example shows, McCormick's relaxations can be nondifferentiable.

Example 1 Let a function mid : R^3 → R map to the median of its three scalar arguments, consider the smooth composite function g : R → R : z ↦ exp(z^3), and set z* := −1 + √3. As shown in [40, Example 2.1], the function g^C : [−1, 1] → R for which

g^C : z ↦ exp(mid(z^3 + 3z^2 − 3, z^3 − 3z^2 + 3, −1)) = { exp(−1),            if z ≤ z*,
                                                          exp(z^3 + 3z^2 − 3), if z > z*,

can be generated from g according to McCormick's rule [40, Section 3] for constructing convex relaxations of a composite function. (In this application of McCormick's rule, αBB relaxations [3] of the inner function z ↦ z^3 have been employed.) Indeed, g^C is convex on [−1, 1], and g^C(z) ≤ g(z) for each z ∈ [−1, 1]. However, even though g^C satisfies McCormick's proposed sufficient condition for differentiability of a convex relaxation [38, p. 151], it is in fact nondifferentiable at z*.
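The kink at z* can be seen numerically. The following sketch (an illustration, not part of the cited implementation) evaluates the closed-form relaxation from Example 1 and one-sided difference quotients on either side of z* = −1 + √3; the left slope is essentially zero while the right slope is not, so no single derivative exists there.

// One-sided difference quotients of the relaxation in Example 1 near z* = -1 + sqrt(3).
#include <algorithm>
#include <cmath>
#include <iostream>

double mid(double a, double b, double c) {
    // median of three scalars
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

double gC(double z) {   // closed-form relaxation from Example 1
    double inner_cv = z * z * z + 3.0 * z * z - 3.0;
    double inner_cc = z * z * z - 3.0 * z * z + 3.0;
    return std::exp(mid(inner_cv, inner_cc, -1.0));
}

int main() {
    double zstar = -1.0 + std::sqrt(3.0);
    double h = 1e-6;
    double left  = (gC(zstar) - gC(zstar - h)) / h;   // ~ 0
    double right = (gC(zstar + h) - gC(zstar)) / h;   // ~ exp(-1) * (3 z*^2 + 6 z*) > 0
    std::cout << "left slope:  " << left  << "\n";
    std::cout << "right slope: " << right << "\n";
}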

Several factors can introduce failure of differentiability of McCormick relaxations. As illustrated by the above example, the median function used in defining McCormick's composition rule is itself nondifferentiable. Secondly, any nondifferentiability in supplied relaxations of intrinsic functions can propagate to yield nondifferentiability in constructed relaxations of composite functions; this is the case both in McCormick's original rule for handling bivariate products [38, 40], and in an improved product rule by Tsoukalas and Mitsos [63]. Thirdly, [54, Example 2.4.1] shows that if McCormick relaxations are weaker than constant bounds evaluated using interval arithmetic [41, 44], then this bounding information may be incorporated via nonsmooth "max" and "min" functions.

A relaxation scheme exhibiting continuous differentiability would be desirable for a number of reasons. In general, minimization of nondifferentiable convex objective functions requires dedicated numerical methods for nondifferentiable problems such as bundle methods [30, 31, 26, 59], which lack the strong convergence rate results of their smooth counterparts. On the other hand, continuously differentiable convex relaxations may be minimized using standard gradient-based algorithms [46, 45, 12] for local optimization, which generally exhibit at least Q-linear convergence. Gradients of composite differentiable relaxations may be evaluated readily using standard automatic differentiation techniques [22, 43], circumventing the need for dedicated theory and methods for subgradient propagation [40, 6, 63, 48]. A method constructing closed-form differentiable relaxations could be deployed in methods for relaxing implicit functions [68, 60, 55] and solutions of parametric ordinary differential equations [57, 56], to make these relaxations differentiable and to remove theoretical obstacles concerning subgradient evaluation.

Thus, the goal of this article is to present a variant of McCormick's relaxation scheme that produces continuously differentiable relaxations — even if the original function is nonsmooth — while retaining the various theoretical and computational benefits of McCormick's original method. To achieve this, conditions are established under which the general multivariate McCormick relaxations of Tsoukalas and Mitsos [63] are differentiable, circumventing the obstacles listed above. Relaxation rules satisfying both these conditions and the rapid convergence properties of [10, 42] are then presented for several intrinsic functions, including univariate functions such as "exp" and "log", the absolute-value function, bivariate sums and products, and the bivariate "max" and "min" functions. Using these rules, closed-form continuously differentiable relaxations may be constructed and evaluated for finite compositions of the considered intrinsic functions; moreover, these relaxations are shown to be twice-continuously differentiable in many cases. Further rules are provided for evaluating gradients of the differentiable relaxations analytically, for use in automatic differentiation methods. Computation of these relaxations and gradients was implemented in C++ by modifying the header library MC++ [13]; this implementation is applied to several examples for illustration. Extensions of the developed relaxations to implicit functions and solutions of differential equations are also considered.

Several established approaches also construct useful differentiable convex and concave relaxations of functions. Interval arithmetic [41, 44] provides constant bounds that are inexpensive to evaluate, but are generally weaker than the McCormick relaxations, and lack the rapid convergence property [10] required to avoid clustering [67, 16] in methods for global optimization. Nevertheless, the classical McCormick relaxations [38, 40] and the relaxations in this article employ interval arithmetic to glean information about the global behavior of the considered function.

The αBB relaxation scheme [3] also produces closed-form relaxations, and shares several of the benefits of McCormick's method outlined above. The αBB relaxations of twice-continuously differentiable functions are themselves twice-continuously differentiable, and also converge rapidly to the relaxed function as the considered domain is reduced in size. The αBB scheme is particularly well-suited to sums of simple terms; evaluation of αBB relaxations of more complicated compositions typically requires bounding the largest eigenvalue of the considered function's Hessian on the considered domain. Moreover, αBB relaxations cannot generally be applied to nonsmooth functions; in cases where this is possible, the relaxations themselves will also be nonsmooth. The relaxations developed in this article are compared to the αBB relaxations in an example in Section 7.3.

The Auxiliary Variable Method (AVM) of Tawarmalani and Sahinidis [62, 61] is used in the state-of-the-art nonconvex solver BARON [51, 50], and employs McCormick relaxations in a different manner. Rather than constructing closed-form relaxations for factorable functions that are compositions of simple intrinsic functions, the AVM instead relaxes nonconvex nonlinear programs (NLPs) directly, replacing them with convex programs that provide lower-bounding information. Roughly, provided that the objective function and the constraints in an NLP are factorable, the AVM constructs a relaxed optimization problem with auxiliary variables and constraints that correspond to relaxations of each intrinsic function in the original NLP's constraints and objective function. The resulting convex program has differentiable constraints, and is at least as tight a relaxation of the original NLP as a convex program formed by replacing each objective function and each constraint by a corresponding closed-form McCormick relaxation [63, 62]. The nonconvex solvers ANTIGONE [39], COUENNE [7], SCIP [1, 2], and LINDO Global [34] also employ approaches for constructing differentiable relaxations of NLPs that are similar in some respects to the AVM. When applied to a nonconvex optimization problem, the closed-form differentiable McCormick approach of this article may have an advantage when the AVM involves a large number of auxiliary variables, yielding large subproblems at each node, and when formulation of a nonsmooth nonconvex problem for solution by BARON requires appending many additional variables and constraints. As shown in Section 7, the differentiable McCormick relaxations developed in this article were embedded in simple branch-and-bound solvers, which were found in turn to perform comparably to BARON in a case study problem in nonconvex optimization. Moreover, since the AVM does not produce closed-form differentiable relaxations for factorable functions, it cannot be used directly to produce differentiable relaxations for implicit functions or solutions of parametric ordinary differential equations.

We note, briefly, that the relaxations presented in this article are compatible with the generalized McCormick framework described by Scott et al. [54, 68, 52, 58]; in the language of [68], the relaxations developed in this article are valid relaxation functions. For simplicity, we do not pursue this connection further. We also note that twice-continuously differentiable relaxations may be computed reliably for the functions considered in this article, using a method in the first author's Ph.D. thesis [29] that employs smoothing constructions analogous to those in [8, 19, 17, 47]. Again, this approach is not pursued further; the relaxations presented in the current article are generally tighter and are simpler to implement.

This article is structured as follows. Section 2 summarizes key established results concerning McCormick's classical relaxations [38, 40], a generalized relaxation scheme by Tsoukalas and Mitsos [63], and an appropriate notion of differentiability on closed sets [69]. Section 3 presents methods for propagating differentiable relaxations through compositions involving known univariate intrinsic functions, including nonsmooth intrinsic functions such as the absolute-value function. Section 4 discusses propagating differentiable relaxations through compositions of multivariate intrinsic functions, and shows that any such rules cannot be straightforward generalizations of their univariate counterparts. Rules are presented for handling addition, multiplication, and the nonsmooth bivariate "max" and "min" functions. Section 5 shows that the generated differentiable relaxations converge rapidly to the relaxed function as the underlying domain is reduced in size. Section 6 demonstrates the utility of differentiable McCormick relaxations in constructing differentiable relaxations for implicit functions and for solutions of parametric ordinary differential equations. Section 7 describes a C++ implementation of the described differentiable relaxation scheme, developed by modifying the C++ header library MC++ [13]. Examples of relaxations of simple functions are presented for illustration, and compared against the αBB relaxations [3]. A case study in nonconvex optimization is presented, in which the performance of BARON [51, 50] is compared against the approach of this article.

2 Mathematical background

This section presents the mathematical background underlying the results and methods in this article. Section 2.1 describes McCormick's relaxations [38, 40] and the multivariate framework of Tsoukalas and Mitsos [63], and Section 2.2 describes notions of differentiability [69] that apply to functions defined on compact boxes.

The following notational conventions are used in this article. Any vector space R^n is equipped with the standard Euclidean norm ‖·‖ and inner product ⟨·, ·⟩. The ith unit coordinate vector in R^n is denoted as e^(i); the dimension n of e^(i) will be clear from the context. The ith component of a vector d ∈ R^n is denoted as d_i := ⟨d, e^(i)⟩. Given vectors x, y ∈ R^n, inequalities such as x ≤ y or x < y are to be interpreted componentwise. The interior, closure, and convex hull of a set S ⊂ R^n are denoted as int(S), cl(S), and conv(S), respectively; the boundary of S is then bd(S) := cl(S) \ int(S).

An interval is a nonempty compact connected subset of R. The set of all intervals is denoted as IR. Intervals are denoted with boldface lowercase letters (e.g. x), with associated bounds denoted as x^L := inf x and x^U := sup x. Observe that x^L ≤ x^U and that x = [x^L, x^U]. Further details of interval analysis are presented in [44, 41, 4]. While "[a, b]" refers to the interval {ξ ∈ R : a ≤ ξ ≤ b}, "(a, b)" instead refers exclusively to a vector formed by concatenating a and b, and never to an open set {ξ ∈ R : a < ξ < b}.

2.1 McCormick’s relaxations

McCormick’s relaxation scheme [38, 40, 63] computes convex underestimators and concave overestimators for factorable functions, which are well-defined finite compositions of simple known intrinsic functions. The discussions of these underestimators and overestimators in this article assume knowledge of established properties of convex sets and convex functions, as presented in [48, 25, 11].

Definition 1 Consider a set X ⊂ R^n, a function h : X → R, and a convex subset C ⊂ X. A function h^cv : C → R is a convex relaxation of h on C if h^cv is convex and h^cv(x) ≤ h(x) for each x ∈ C. A function h^cc : C → R is a concave relaxation of h on C if h^cc is concave and h^cc(x) ≥ h(x) for each x ∈ C.

The convex envelope of h on C is the unique convex relaxation of h on C that dominates all other convex relaxations of h on C. The concave envelope of h on C is the unique concave relaxation of h on C that is dominated by all other concave relaxations of h on C.

Tsoukalas and Mitsos [63] provide a general scheme for generating convex and concave relaxations of finite compositions of known intrinsic functions, by recursive application of the following rule, which generalizes earlier rules by McCormick [38, 40]. The continuously differentiable relaxations developed in this article are constructed using this rule.

Theorem 1 (Theorem 2 in [63]) Consider nonempty compact convex sets Z ⊂ R^n and X_i ⊂ R for each i ∈ {1, . . . , m}, and define X := X_1 × · · · × X_m. Consider functions φ : X → R and f_i : Z → X_i for each i ∈ {1, . . . , m}, and suppose that the following are available:

– a continuous convex relaxation f_i^cv : Z → X_i of f_i on Z for each i ∈ {1, . . . , m},
– a continuous concave relaxation f_i^cc : Z → X_i of f_i on Z for each i ∈ {1, . . . , m},
– a continuous convex relaxation φ^cv of φ on X, and
– a continuous concave relaxation φ^cc of φ on X.

Then, the following mapping is a continuous convex relaxation of the composite function g : Z → R : z ↦ φ(f_1(z), . . . , f_m(z)) on Z:

g^cv : Z → R : z ↦ min{ φ^cv(ξ) : ξ ∈ R^m, f_i^cv(z) ≤ ξ_i ≤ f_i^cc(z), ∀i ∈ {1, . . . , m} },   (1)

and the following mapping is a continuous concave relaxation of g on Z:

g^cc : Z → R : z ↦ max{ φ^cc(ξ) : ξ ∈ R^m, f_i^cv(z) ≤ ξ_i ≤ f_i^cc(z), ∀i ∈ {1, . . . , m} }.   (2)

Weierstrass's Theorem guarantees that the optimization problems defining g^cv and g^cc have solutions for each z ∈ Z. Observe that the optimization problem (1) is a convex program, and the problem (2) is an analogous concave maximization problem on a convex set. Thus, since both problems are bound-constrained for each z ∈ Z, the well-known Karush-Kuhn-Tucker (KKT) conditions (discussed in [5], for example) are necessary and sufficient for any solution of (1) or (2).

Theorem 1 is intended to be applied with φ chosen from a library of known intrinsic functions, for which the optimization problems (1) and (2) are readily solved. These problems are solved trivially, for example, if φ is affine. In addition, Tsoukalas and Mitsos provide closed-form expressions [63, Sections 5 and 6] for the relaxations g^cv and g^cc when φ is either the bilinear product function (x, y) ∈ R^2 ↦ xy or the bivariate "max" or "min" functions, with φ^cv and φ^cc chosen to be the convex and concave envelopes of φ in each case. These relaxations are generally tighter, respectively, than McCormick's original treatment of the product [38, 40] and the treatment of "max" and "min" based on the identities:

max{x, y} ≡ ½(x + y + |x − y|)   and   min{x, y} ≡ ½(x + y − |x − y|).

Observe that tighter relaxations g^cv and g^cc may be obtained if the supplied sets X_i are chosen to be smaller while still satisfying the demands of the theorem. In practice, appropriate sets X_i are computed using natural interval extensions [41, 44].

When m = 1 in Theorem 1, φ : X_1 → R is a univariate function. In this case, the relaxations provided by the theorem reduce to McCormick's classical rule [38] for handling univariate intrinsic functions. As the following result from [63] shows, the optimization problems (1) and (2) may then be solved analytically without any additional assumptions on the provided functions and relaxations. Here, the function mid : R^3 → R returns the median of its three scalar arguments.

Corollary 1 (Proposition 1 in [63]) Suppose that the conditions of Theorem 1 hold with m = 1, and set f := f_1. In this case, X is an interval x ∈ IR. Let ξ*_min be a minimum of φ^cv on x, and let ξ*_max be a maximum of φ^cc on x. Then, the relaxations g^cv and g^cc in Theorem 1 are described equivalently as follows:

g^cv : Z → R : z ↦ φ^cv(mid(f^cv(z), f^cc(z), ξ*_min)),
g^cc : Z → R : z ↦ φ^cc(mid(f^cv(z), f^cc(z), ξ*_max)).

In practice, using the above corollary typically requires closed-form expressions for both ξ*_min and ξ*_max, which can be obtained for many choices of φ. These are straightforward to obtain when φ^cv and φ^cc are monotone; if φ^cv is increasing, for example, then ξ*_min may be set to x^L.
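For a univariate φ with known ξ*_min and ξ*_max, Corollary 1 reduces relaxation of a composition to a median and two evaluations. The sketch below applies it with φ = exp (so φ^cv = exp, φ^cc the secant through the interval endpoints, ξ*_min = x^L, and ξ*_max = x^U); the inner-function data are placeholders.

// Corollary 1 for phi = exp on an interval [xL, xU]:
//   g^cv(z) = phi^cv(mid(f^cv(z), f^cc(z), xi_min)),  xi_min = xL  (exp is increasing),
//   g^cc(z) = phi^cc(mid(f^cv(z), f^cc(z), xi_max)),  xi_max = xU.
#include <algorithm>
#include <cmath>
#include <iostream>

double mid(double a, double b, double c) {
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

struct UnivariateRelax { double cv, cc; };

UnivariateRelax relax_exp(double fcv, double fcc, double xL, double xU) {
    auto phi_cc = [&](double xi) {   // secant overestimator of exp on [xL, xU]
        return std::exp(xL) + (std::exp(xU) - std::exp(xL)) * (xi - xL) / (xU - xL);
    };
    double gcv = std::exp(mid(fcv, fcc, xL));
    double gcc = phi_cc(mid(fcv, fcc, xU));
    return {gcv, gcc};
}

int main() {
    // Inner factor f with enclosure [xL, xU] = [-1, 2] and relaxation values at some z.
    UnivariateRelax g = relax_exp(-0.2, 1.1, -1.0, 2.0);
    std::cout << g.cv << " <= exp(f(z)) <= " << g.cc << "\n";
}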

2.2 Differentiability on open and closed sets

Since McCormick relaxations are constructed on compact domains, describing differentiability of these relaxations requires some care at the boundaries of these domains. This article considers differentiability in the sense of Whitney [69]. As described in this section, Whitney differentiability coincides with classical (Fréchet) differentiability on the interior of a function's domain, satisfies the classical chain rule even at the domain's boundary, and provides meaningful subgradient information for convex functions and concave functions.

Given an open set X ⊂ R^n, a function f : X → R^m is (Fréchet) differentiable at x ∈ X if there exists a matrix A ∈ R^{m×n} for which

0 = lim_{h→0} ‖f(x + h) − (f(x) + A h)‖ / ‖h‖.

In this case, the above equation defines A uniquely, and A is the unique derivative Df(x) of f at x. If x is partitioned, for example, into x ≡ (ξ, ζ), then partial derivatives (∂f/∂ξ)(x) and (∂f/∂ζ)(x) are defined as appropriate submatrices of Df(x). If m = 1, in which case f is scalar-valued, then the gradient of f at x is the column vector ∇f(x) := (Df(x))^T ∈ R^n.

Given an open set X ⊂ R^n, a function f : X → R^m is continuously differentiable (C^1) at x ∈ X if it is differentiable on some neighborhood N ⊂ X of x and the derivative mapping y ↦ Df(y) is continuous at x. If m = 1, in which case f is scalar-valued, then f is twice-continuously differentiable (C^2) on X if f is C^1 on X and there exists a continuous Hessian mapping x ↦ ∇²f(x) ∈ R^{n×n} for which

0 = lim_{h→0} |f(x + h) − (f(x) + ⟨∇f(x), h⟩ + ½⟨h, ∇²f(x) h⟩)| / ‖h‖²,   ∀x ∈ X.

A vector-valued function f is C^2 if each of its component functions is C^2. Higher orders of continuous differentiability are defined analogously.

By specializing a classical result by Whitney [69], differentiability on closed sets such as intervals can be defined in a manner that is consistent with the classical chain rule of differentiation, as follows.

Definition 2 (adapted from [69]) Given a closed set B ⊂ R^n and some ν ∈ N, a function f : B → R^m is Whitney-C^ν on B if there exist an open set X ⊂ R^n and a Whitney extension f̂ : X → R^m such that B ⊂ X, f̂ ≡ f on B, and f̂ is C^ν (in the classical sense) on X. Given any point x in the boundary of B, define a derivative Df(x) := Df̂(x). If m = 1, in which case f is scalar-valued, then define a gradient ∇f(x) := Df(x)^T.

Remark 1 When x lies in the boundary of B, it is possible that Df(x) is not uniquely specified by the above definition, since f̂ might not be specified uniquely. For example, if B comprises a single point {x_0} ⊂ R^n, then Df(x_0) may be chosen to be any element of R^{m×n}, since f̂ may be chosen to be any C^ν function for which f̂(x_0) = f(x_0).

The Whitney Extension Theorem [69, Theorem I] provides an equivalent characterization of Whitney-C^ν functions. Despite the possible nonuniqueness implied by the previous remark, the following propositions show that the classical chain rule continues to hold for derivatives of compositions of Whitney-C^1 functions. Both propositions are immediate corollaries of the Whitney Extension Theorem.

Proposition 1 Consider a closed set B ⊂ R^n and a Whitney-C^1 function f : B → R^m. If there exists any sequence {x^(k)}_{k∈N} → x in B \ {x}, then any derivative Df(x) satisfies

0 = lim_{h→0, (x+h)∈B} ‖f(x + h) − (f(x) + Df(x) h)‖ / ‖h‖.

Proposition 2 Consider nonempty sets B ⊂ R^n and Y ⊂ R^m such that B is either closed, open, or both, and such that Y is either closed, open, or both. For any fixed ν ∈ N, given functions g : B → Y and f : Y → R^p that are C^ν in either the classical sense or the Whitney sense, consider the composite function h ≡ f ∘ g : B → R^p. If B is open, then h is C^ν on B in the classical sense; if B is closed, then h is Whitney-C^ν on B.

Moreover, for each x ∈ B, Dh(x) = Df(g(x)) Dg(x). (If B is closed and x lies in the boundary of B, then this construction of Dh(x) satisfies both Definition 2 and Proposition 1 for some Whitney extension ĥ of h.)

Corollary 2 Given a closed convex set B ⊂ R^n and a convex Whitney-C^1 function f : B → R, for each x ∈ B, ∇f(x) is a subgradient of f at x, in that

f(y) ≥ f(x) + ⟨∇f(x), y − x⟩,   ∀y ∈ B.

The following proposition shows that local Whitney extensions imply the existence of a single global Whitney extension.

Proposition 3 Consider a compact convex set C ⊂ R^n with nonempty interior, and a function f : C → R^m. Suppose that, for each x ∈ C, there exists a neighborhood N_x ⊂ R^n of x and a C^1 function φ_x : N_x → R^m for which φ_x ≡ f on N_x ∩ C. Then f is Whitney-C^1 on C.

Proof See Appendix A.1. ⊓⊔

3 Differentiable relaxations of univariate intrinsic functions

This section shows that, in Corollary 1, the generated relaxations of g ≡ φ ∘ f are Whitney-C^1 under minimal assumptions on the supplied relaxations of f and the univariate intrinsic function φ. If φ is Whitney-C^1 on its domain, then its convex and concave envelopes are shown to satisfy these assumptions; if φ is nonsmooth, then more care must be taken. Closed-form expressions are provided for the gradients of the generated relaxations. Under certain additional assumptions, the generated relaxations of g are shown to be Whitney-C^2; these additional assumptions are readily satisfied for many typical choices of the univariate intrinsic function φ.

3.1 Establishing differentiability

This section shows that the relaxations described by Corollary 1 for univariate composition are in fact Whitney-C^1 when the following assumption is satisfied. When a stronger version of this assumption is satisfied, the relaxations described by Corollary 1 are Whitney-C^2.

Assumption 1 Consider the conditions of Corollary 1, and suppose that the supplied relaxations f^cv, f^cc, φ^cv, and φ^cc are each Whitney-C^1 on their respective domains.

Observe that Assumption 1 does not demand any differentiability properties of f or φ. Nevertheless, the following proposition shows that the convex and concave envelopes of any univariate Whitney-C^1 function are themselves Whitney-C^1. In this case, convex and concave envelopes satisfy the above assumption's requirements concerning φ^cv and φ^cc.

Proposition 4 Consider an interval x ∈ IR, and a Whitney-C^1 function f : x → R. The convex envelope f^cv : x → R of f on x is also Whitney-C^1 on x, as is the concave envelope f^cc : x → R of f on x.

Proof See Appendix A.2. ⊓⊔

Theorem 2 Suppose that Assumption 1 holds. The relaxations g^cv and g^cc described by Corollary 1 are Whitney-C^1 on Z, with gradients:

∇g^cv(z) = max{0, ∇φ^cv(f^cv(z))} ∇f^cv(z) + min{0, ∇φ^cv(f^cc(z))} ∇f^cc(z), and
∇g^cc(z) = min{0, ∇φ^cc(f^cv(z))} ∇f^cv(z) + max{0, ∇φ^cc(f^cc(z))} ∇f^cc(z).

Proof The required results concerning g^cv will be demonstrated; a similar argument yields the results concerning g^cc. Since φ^cv is Whitney-C^1, and since any C^1 function is locally Lipschitz continuous, φ^cv is Lipschitz continuous on x. Thus, proceeding as in the proof of Proposition 4, the following function is a C^1 Whitney extension of φ^cv on x:

ψ : R → R : ξ ↦
  φ^cv(x^L) + (Dφ^cv(x^L))(ξ − x^L),  if ξ < x^L,
  φ^cv(ξ),                            if ξ ∈ x,
  φ^cv(x^U) + (Dφ^cv(x^U))(ξ − x^U),  if x^U < ξ.

Since φ^cv is convex, ∇φ^cv is increasing on x. Thus, ∇ψ is increasing on R, and so ψ is convex. Define a set H := {(ℓ, u) ∈ R^2 : ℓ ≤ u}, and a function

γ : H → R : (ℓ, u) ↦ min{ψ(ξ) : ℓ ≤ ξ ≤ u}.

Observing that g^cv(z) = γ(f^cv(z), f^cc(z)) for each z ∈ Z, it suffices to show that γ is Whitney-C^1 on the set K := {(ℓ, u) ∈ R^2 : ℓ ∈ x, u ∈ x, ℓ ≤ u}. Now, if ∇ψ(x^L) > 0, then ψ is increasing on R; thus, γ(ℓ, u) = ψ(ℓ) for each (ℓ, u) ∈ K. This shows that γ is Whitney-C^1 on K, as required. A similar argument shows that γ is Whitney-C^1 on K if ∇ψ(x^U) < 0.

It only remains to consider the case in which ∇ψ(x^L) ≤ 0 ≤ ∇ψ(x^U). By the intermediate-value theorem, there exists η* ∈ x with ∇ψ(η*) = 0. Since ψ is convex, η* is a minimum of ψ on R; since η* ∈ x, this implies that ψ(η*) = ψ(ξ*_min). Thus, ξ*_min is a minimum of ψ on R, in which case ∇ψ(ξ*_min) = 0. Consider the functions:

ψ_I : R → R : a ↦ ψ(max{a, ξ*_min})   and   ψ_D : R → R : b ↦ ψ(min{b, ξ*_min}).

Since ∇ψ(ξ*_min) = 0, both ψ_I and ψ_D are C^1, as is the function:

γ̃ : R^2 → R : (a, b) ↦ ψ_I(a) + ψ_D(b) − ψ(ξ*_min).

Moreover, γ(ℓ, u) = γ̃(ℓ, u) for each (ℓ, u) ∈ K. Thus, γ is Whitney-C^1 on K, as required. The gradient of g^cv may then be evaluated using the chain rule for Whitney-C^1 functions. ⊓⊔
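The gradient formula of Theorem 2 can be checked against finite differences. The sketch below does so for the composition g(z) = (f(z))^2 with f(z) = z^3 on Z = [−1, 1], using the αBB relaxations of z^3 from Example 1 as f^cv and f^cc and φ^cv(ξ) = ξ^2 on an enclosure containing 0, so that ξ*_min = 0; the example is chosen so that all three branches of the formula are exercised.

// Numerical check of the gradient formula in Theorem 2 for g(z) = (f(z))^2 with
// f(z) = z^3 on Z = [-1, 1], using the alphaBB relaxations of Example 1 as f^cv, f^cc
// and phi^cv(xi) = xi^2 on an enclosure containing 0 (so xi*_min = 0).
#include <algorithm>
#include <cmath>
#include <iostream>

double mid(double a, double b, double c) {
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

double fcv(double z)  { return z * z * z + 3.0 * z * z - 3.0; }
double fcc(double z)  { return z * z * z - 3.0 * z * z + 3.0; }
double dfcv(double z) { return 3.0 * z * z + 6.0 * z; }
double dfcc(double z) { return 3.0 * z * z - 6.0 * z; }

double gcv(double z)  { return std::pow(mid(fcv(z), fcc(z), 0.0), 2); }   // Corollary 1

double grad_gcv(double z) {   // Theorem 2 with d(phi^cv)/dxi = 2 xi
    return std::max(0.0, 2.0 * fcv(z)) * dfcv(z) + std::min(0.0, 2.0 * fcc(z)) * dfcc(z);
}

int main() {
    double h = 1e-6;
    for (double z : {-0.95, -0.5, 0.9}) {
        double fd = (gcv(z + h) - gcv(z - h)) / (2.0 * h);
        std::cout << "z = " << z << ":  formula " << grad_gcv(z)
                  << ",  central difference " << fd << "\n";
    }
}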

Observe that the gradients of g^cv and g^cc provided by Theorem 2 do not necessarily coincide with the corresponding subgradients provided by [40, Theorem 3.2]. As the following two corollaries show, the differentiability results of Theorem 2 can be improved when Assumption 1 is strengthened.

Corollary 3 Suppose that the conditions of Corollary 1 hold, and suppose that each of the following conditions is satisfied:

– f^cv, f^cc, φ^cv, and φ^cc are each Whitney-C^2 on their respective domains,
– if ∇φ^cv(ξ*_min) = 0, then ∇²φ^cv(ξ*_min) = 0, and
– if ∇φ^cc(ξ*_max) = 0, then ∇²φ^cc(ξ*_max) = 0.

Then, the relaxations g^cv and g^cc described by Corollary 1 are Whitney-C^2 on Z.

Proof The required result concerning g^cv will be demonstrated; the result concerning g^cc is analogous. The proof of Theorem 2 is still applicable under the assumptions of this corollary; to extend that proof to demonstrate this corollary, it suffices to show that the function γ is Whitney-C^2 on K. If either ∇ψ(x^L) > 0 or ∇ψ(x^U) < 0, then, as in the proof of Theorem 2, γ coincides on K with either (ℓ, u) ↦ ψ(ℓ) or (ℓ, u) ↦ ψ(u), and is therefore Whitney-C^2. If ∇ψ(x^L) ≤ 0 ≤ ∇ψ(x^U), then observe that, under the assumptions of this corollary, the functions ψ_I and ψ_D are C^2 on R. Thus, the function γ̃ is C^2 on R^2, which implies that γ is Whitney-C^2 on K. ⊓⊔

Corollary 4 Suppose that the conditions of Corollary 1 hold, and suppose that, for some ν ∈ N, each of the following conditions is satisfied:

– f^cv, f^cc, φ^cv, and φ^cc are Whitney-C^ν on their respective domains,
– φ^cv is either increasing on x or decreasing on x, and
– φ^cc is either increasing on x or decreasing on x.

Then, the relaxations g^cv and g^cc described by Corollary 1 are Whitney-C^ν on Z.

Proof This corollary is a special case of Theorem 3 below. ⊓⊔

When the requirements of Corollary 4 are met, the expressions for ∇g^cv and ∇g^cc provided by Theorem 2 become simpler. For example, if φ^cv is increasing, then Theorem 2 implies that, for each z ∈ Z,

∇g^cv(z) = ∇φ^cv(f^cv(z)) ∇f^cv(z).

Table 1 presents relaxations φ^cv and φ^cc for various choices of φ : x → R; these relaxations are readily verified to satisfy the demands of either Corollary 3 or Corollary 4 with ν := 2. The remainder of this section presents similarly appropriate relaxations for the following choices of φ:

Table 1 Relaxations φ^cv, φ^cc of various functions φ : x → R on various intervals x ∈ IR. These relaxations satisfy the demands of either Corollary 3 or Corollary 4 with ν = 2. Here, R_+ := {ξ ∈ R : 0 < ξ} and R_− := {ξ ∈ R : ξ < 0}.

B     φ(ξ) for ξ ∈ x ⊂ B       φ^cv(ξ)                                                            φ^cc(ξ)
R     aξ + b for a, b ∈ R      aξ + b                                                             aξ + b
R     exp ξ                    exp ξ                                                              exp x^L + (exp x^U − exp x^L)(ξ − x^L)/(x^U − x^L)
R_+   ln ξ                     ln x^L + (ln x^U − ln x^L)(ξ − x^L)/(x^U − x^L)                    ln ξ
R     ξ^{2k+2} for k ∈ N       ξ^{2k+2}                                                           (x^L)^{2k+2} + ((x^U)^{2k+2} − (x^L)^{2k+2})(ξ − x^L)/(x^U − x^L)
R_+   √ξ                       √(x^L) + (√(x^U) − √(x^L))(ξ − x^L)/(x^U − x^L)                    √ξ
R_+   1/ξ^k for k ∈ N          1/ξ^k                                                              1/(x^L)^k + (1/(x^U)^k − 1/(x^L)^k)(ξ − x^L)/(x^U − x^L)
R_−   1/ξ^{2k} for k ∈ N       1/ξ^{2k}                                                           1/(x^L)^{2k} + (1/(x^U)^{2k} − 1/(x^L)^{2k})(ξ − x^L)/(x^U − x^L)
R_−   1/ξ^{2k−1} for k ∈ N     1/(x^L)^{2k−1} + (1/(x^U)^{2k−1} − 1/(x^L)^{2k−1})(ξ − x^L)/(x^U − x^L)    1/ξ^{2k−1}

– an odd power ξ ↦ ξ^{2k+1} for k ∈ N,
– the absolute-value function ξ ↦ |ξ|,
– the mapping ξ ↦ max{ξ, a} for some a ∈ R, and
– the mapping ξ ↦ min{ξ, a} for some a ∈ R.

If these choices of φ, φ^cv, and φ^cc are employed in Corollary 1, then Theorem 2 provides gradients of the obtained relaxations g^cv and g^cc of the composite function g ≡ φ ∘ f.
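Every non-affine row of Table 1 pairs the function itself on the side where it is already convex or concave with the secant through the interval endpoints on the other side. A minimal sketch for the logarithm row (assuming 0 < x^L < x^U):

// Table 1, logarithm row: on x = [xL, xU] with 0 < xL < xU,
//   phi^cv(xi) = secant of ln through (xL, ln xL) and (xU, ln xU)   (ln is concave),
//   phi^cc(xi) = ln(xi).
#include <cmath>
#include <iostream>

double log_cv(double xi, double xL, double xU) {
    return std::log(xL) + (std::log(xU) - std::log(xL)) * (xi - xL) / (xU - xL);
}
double log_cc(double xi) { return std::log(xi); }

int main() {
    double xL = 0.5, xU = 4.0;
    for (double xi : {0.5, 1.0, 2.0, 4.0})
        std::cout << "xi = " << xi << ":  " << log_cv(xi, xL, xU)
                  << " <= ln(xi) = " << log_cc(xi) << "\n";
}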

3.2 Relaxing squares and odd powers

This section considers the cases in which the functionφ in Corollary 1 is the power function

ξ7→ξawith a

∈ {2}∪{2k+1 : k ∈ N}. In each case, Proposition 4 implies that the demands

of Assumption 1 onφCandφCare satisfied when these functions are chosen to be the convex and concave envelopes ofφ, respectively. Thus, when a= 2, the demands of Theorem 2 on

φCandφC

are satisfied whenφC:=φ and

φC

: x→ R :ξ7→ x2+ (x + x)(ξ

− x). (3)

When a= 2k + 1 for k ∈ N, the demands of Theorem 2 onφCand φC are satisfied when

these functions are chosen to be the the odd-power envelopes described in [33].

However, in each of the cases above, the choices ofφC and φC are not necessarily Whitney-C2, and may fail to satisfy the assumptions of Corollaries 3 or 4. The following

propositions provide alternative relaxations ofφ that are readily verified to satisfy the as-sumptions of one of these corollaries.

Proposition 5 Suppose that the conditions of Corollary 1 hold, and that φ is the mapping x ↦ x². The following choices of φ^cv and φ^cc satisfy the demands of Corollary 3:

φ^cv : x → R : ξ ↦
  ξ²,         if 0 ≤ x^L or x^U ≤ 0,
  ξ³ / x^U,   if x^L < 0 < x^U and 0 ≤ ξ,
  ξ³ / x^L,   if x^L < 0 < x^U and ξ < 0,

and φ^cc defined as in (3), with ξ*_min := mid(x^L, x^U, 0) and ξ*_max ∈ argmax_{ζ ∈ {x^L, x^U}} ζ².

Proposition 6 Suppose that the conditions of Corollary 1 hold, and that φ is the mapping x ↦ x^{2k+1} for some k ∈ N. The following choices of φ^cv and φ^cc satisfy the demands of Corollary 4 when ν ≤ 2k + 1:

φ^cv : x → R : ξ ↦
  (x^L)^{2k+1} + ((x^U)^{2k+1} − (x^L)^{2k+1})(ξ − x^L)/(x^U − x^L),   if x^U ≤ 0,
  (x^L)^{2k+1} (x^U − ξ)/(x^U − x^L) + (max{0, ξ})^{2k+1},             if x^L < 0 < x^U,
  ξ^{2k+1},                                                            if 0 ≤ x^L,

φ^cc : x → R : ξ ↦
  ξ^{2k+1},                                                            if x^U ≤ 0,
  (x^U)^{2k+1} (ξ − x^L)/(x^U − x^L) + (min{0, ξ})^{2k+1},             if x^L < 0 < x^U,
  (x^L)^{2k+1} + ((x^U)^{2k+1} − (x^L)^{2k+1})(ξ − x^L)/(x^U − x^L),   if 0 ≤ x^L,

with ξ*_min := x^L and ξ*_max := x^U.
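The relaxation of ξ ↦ ξ² in Proposition 5 can be examined numerically. The sketch below evaluates the piecewise-cubic underestimator on an interval with x^L < 0 < x^U, and checks both that it underestimates ξ² and that its first and second derivatives vanish at 0, which is what the second condition of Corollary 3 asks for at ξ*_min = 0; the interval is illustrative.

// Check of the convex underestimator of phi(xi) = xi^2 from Proposition 5 on [xL, xU]
// with xL < 0 < xU: phi^cv(xi) = xi^3/xU for xi >= 0 and xi^3/xL for xi < 0.
#include <cmath>
#include <iostream>

double sq_cv(double xi, double xL, double xU) {
    return (xi >= 0.0) ? xi * xi * xi / xU : xi * xi * xi / xL;
}

int main() {
    double xL = -1.0, xU = 2.0;
    bool under = true;
    for (double xi = xL; xi <= xU; xi += 0.01)
        under = under && (sq_cv(xi, xL, xU) <= xi * xi + 1e-12);
    double h = 1e-4;
    double d1 = (sq_cv(h, xL, xU) - sq_cv(-h, xL, xU)) / (2.0 * h);
    double d2 = (sq_cv(h, xL, xU) - 2.0 * sq_cv(0.0, xL, xU) + sq_cv(-h, xL, xU)) / (h * h);
    std::cout << std::boolalpha << "underestimates xi^2 on [xL, xU]: " << under << "\n";
    std::cout << "first and second derivative at 0: " << d1 << ", " << d2 << "\n";
}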

3.3 Relaxing the absolute-value function and variants

This section considers the cases in which the function φ in Corollary 1 is one of the nonsmooth univariate intrinsic functions ξ ↦ |ξ|, ξ ↦ max{ξ, a}, or ξ ↦ min{ξ, a}. In each case, relaxations φ^cv and φ^cc are provided that are readily verified to satisfy the requirements of Theorem 2 and Corollary 3. Since these choices of φ are nonsmooth, the assumptions of Proposition 4 are not necessarily met; the convex and concave envelopes of φ are thus less useful when obtaining Whitney-C^1 relaxations.

Proposition 7 Suppose that the conditions of Corollary 1 hold, and that φ is the absolute-value mapping x ↦ |x|. Choose any μ ≥ 1. The following choices of φ^cv and φ^cc satisfy the demands of Assumption 1:

φ^cv : x → R : ξ ↦
  |ξ|,                     if 0 ≤ x^L or x^U ≤ 0,
  x^U (ξ / x^U)^{μ+1},     if x^L < 0 < x^U and 0 ≤ ξ,
  −x^L (ξ / x^L)^{μ+1},    if x^L < 0 < x^U and ξ < 0,

φ^cc : x → R : ξ ↦ |x^L| + (|x^U| − |x^L|)(ξ − x^L)/(x^U − x^L),

with ξ*_min := mid(x^L, x^U, 0) and ξ*_max ∈ argmax_{ζ ∈ {x^L, x^U}} |ζ|. Moreover, if μ ≥ 2, then φ^cv and φ^cc satisfy the demands of Corollary 3.

Proposition 8 Suppose that the conditions of Corollary 1 hold, and that φ is the univariate "max" function x ↦ max{x, a} for some fixed a ∈ R. Choose any μ ≥ 1. The following choices of φ^cv and φ^cc satisfy the demands of Corollary 4 when ν ≤ μ:

φ^cv : x → R : ξ ↦
  a,                                                   if x^U ≤ a,
  ξ,                                                   if a ≤ x^L,
  a + (x^U − a)(max{0, (ξ − a)/(x^U − a)})^{μ+1},      otherwise,

φ^cc : x → R : ξ ↦ max{a, x^L} + (max{a, x^U} − max{a, x^L})(ξ − x^L)/(x^U − x^L),

with ξ*_min := x^L and ξ*_max := x^U.

The univariate "min" function may then be handled using the identity min{x, a} ≡ −max{−x, −a}.
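A similar check applies to the smoothed relaxation of ξ ↦ max{ξ, a} in Proposition 8. The sketch below verifies underestimation on a sample interval with x^L < a < x^U and shows that the one-sided slopes at the former kink ξ = a now agree (both are essentially zero for μ ≥ 1); the data are illustrative.

// Convex underestimator of phi(xi) = max{xi, a} from Proposition 8 on [xL, xU] with
// xL < a < xU:  phi^cv(xi) = a + (xU - a) * (max{0, (xi - a)/(xU - a)})^(mu+1).
#include <algorithm>
#include <cmath>
#include <iostream>

double max_cv(double xi, double a, double xU, double mu) {
    double t = std::max(0.0, (xi - a) / (xU - a));
    return a + (xU - a) * std::pow(t, mu + 1.0);
}

int main() {
    double a = 0.3, xL = -1.0, xU = 2.0, mu = 2.0;
    bool under = true;
    for (double xi = xL; xi <= xU; xi += 0.01)
        under = under && (max_cv(xi, a, xU, mu) <= std::max(xi, a) + 1e-12);
    double h = 1e-5;   // one-sided slopes at the former kink xi = a
    double left  = (max_cv(a, a, xU, mu) - max_cv(a - h, a, xU, mu)) / h;
    double right = (max_cv(a + h, a, xU, mu) - max_cv(a, a, xU, mu)) / h;
    std::cout << std::boolalpha << "underestimates max{xi, a}: " << under << "\n";
    std::cout << "slopes at xi = a: " << left << ", " << right << "\n";
}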

4 Differentiable relaxations of multivariate intrinsic functions

This section presents conditions under which the multivariate McCormick relaxations provided by Theorem 1 are Whitney-C^1. Satisfaction of these conditions is illustrated in the special cases of bivariate addition, multiplication, "max", and "min".

In general, the approach of the previous section is not transferable to this multivariate case. While Proposition 4 shows that the convex and concave envelopes of a univariate Whitney-C^1 function on an interval are themselves Whitney-C^1, this is not necessarily the case for multivariate Whitney-C^1 functions. For example, the convex and concave envelopes of the bivariate product mapping (x, y) ↦ xy on a box X ⊂ IR² are both nonsmooth whenever X has nonempty interior. Thus, although Mitsos et al. recommend setting the relaxations φ^cv and φ^cc in Theorem 1 to be convex and concave envelopes of φ on X where possible [63, 42], obtaining Whitney-C^1 relaxations may require employing weaker relaxations of φ instead.

4.1 Establishing differentiability

Theorem 3 Suppose that the conditions of Theorem 1 are satisfied, and that, for some ν ∈ N, the supplied relaxations f_i^cv, f_i^cc, φ^cv, and φ^cc are Whitney-C^ν on their respective domains. Suppose that, for each i ∈ {1, . . . , m}, the partial derivative (∂φ^cv/∂x_i)(ξ) is either nonnegative for all ξ ∈ X or nonpositive for all ξ ∈ X. Then, the convex underestimator g^cv is Whitney-C^ν on Z. Similarly, if, for each i ∈ {1, . . . , m}, the partial derivative (∂φ^cc/∂x_i)(ξ) is either nonnegative for all ξ ∈ X or nonpositive for all ξ ∈ X, then the concave overestimator g^cc is Whitney-C^ν on Z.

Proof The claim regarding g^cv will be demonstrated; a similar argument yields the claim regarding g^cc. The optimization problem (1) can be solved by inspection to yield:

g^cv : z ↦ φ^cv(f_1*(z), . . . , f_m*(z)),

where, for each i ∈ {1, . . . , m}, the mapping f_i* : Z → X_i is defined as follows. If ∂φ^cv/∂x_i is nonnegative on X, then set f_i* := f_i^cv; otherwise, if ∂φ^cv/∂x_i is nonpositive on X, then set f_i* := f_i^cc. The composite mapping z ↦ φ^cv(f_1*(z), . . . , f_m*(z)) is then Whitney-C^ν on Z by the chain rule for Whitney-C^ν functions (Proposition 2). ⊓⊔

Observe that the requirements of Theorem 3 concerning φ^cv and φ^cc are satisfied in the special case where φ is affine and φ^cv := φ^cc := φ. As the following section will show, the supplied convex relaxation φ^cv for the product mapping φ : (x, y) ↦ xy may be chosen to satisfy the requirements of Theorem 3 and [42, Theorem 6] simultaneously, when X is contained in either the positive or negative quadrant of R².

The following theorem demands less stringent requirements of the supplied relaxations of φ and f_i than the previous theorem, but can only guarantee continuous differentiability of the obtained relaxations of g.

Theorem 4 Suppose that the conditions of Theorem 1 are satisfied with m = 2, and that X_1 and X_2 have nonempty interior. In addition, suppose that the supplied relaxations f_i^cv, f_i^cc, φ^cv, and φ^cc are each Whitney-C^1 on their respective domains, and that there exist Whitney extensions ψ^cv and ψ^cc of φ^cv and φ^cc that satisfy all of the following conditions:

1. ψ^cv is convex and ψ^cc is concave on some open convex superset Y of X,
2. there exists a nonzero vector d ∈ R² for which either:
   – ⟨∇ψ^cv(ξ), d⟩ > 0 for each ξ ∈ Y, or
   – Y = R² and ⟨∇ψ^cv(ξ), d⟩ = 0 for each ξ ∈ R²,
3. there exists a nonzero vector d ∈ R² for which either:
   – ⟨∇ψ^cc(ξ), d⟩ > 0 for each ξ ∈ Y, or
   – Y = R² and ⟨∇ψ^cc(ξ), d⟩ = 0 for each ξ ∈ R².

Then, the relaxations g^cv and g^cc are both Whitney-C^1 on Z.

Proof See Appendix A.3. ⊓⊔

Observe that, unlike [9, Proposition 3.3.3], this theorem does not require the optimization problems (1) or (2) to satisfy second-order sufficient optimality conditions, and does not require these problems to have unique solutions. As the following sections will show, when φ is the mapping (x, y) ∈ R² ↦ xy on any compact subdomain X, the supplied relaxations φ^cv and φ^cc can be chosen to satisfy the requirements of Theorem 4.

4.2 Relaxing sums and products

Theorem 5 Suppose that the conditions of Theorem 1 hold with m = 2, and that φ is the sum mapping (x_1, x_2) ↦ x_1 + x_2. Suppose that, for some ν ∈ N, the supplied relaxations f_i^cv and f_i^cc are Whitney-C^ν on their respective domains. Then, the following mapping is a Whitney-C^ν convex relaxation of g ≡ f_1 + f_2 on Z:

g_+^cv : Z → R : z ↦ f_1^cv(z) + f_2^cv(z),

and the following mapping is a Whitney-C^ν concave relaxation of g on Z:

g_+^cc : Z → R : z ↦ f_1^cc(z) + f_2^cc(z).

Gradients of the mappings g_+^cv and g_+^cc are as follows, for each z ∈ Z:

∇g_+^cv(z) = ∇f_1^cv(z) + ∇f_2^cv(z),   and   ∇g_+^cc(z) = ∇f_1^cc(z) + ∇f_2^cc(z).

Proof The required results can be obtained directly, but also follow immediately from Theorems 1 and 3 when φ^cv and φ^cc are each chosen to be φ. ⊓⊔

The relaxations of sums in Theorem 5 and of univariate intrinsic functions in Section 3 may be combined to relax products using the identities:

xy ≡ ¼((x + y)² − (x − y)²) ≡ ½((x + y)² − x² − y²).

Though these approaches would lead to valid Whitney-C² relaxations with second-order pointwise convergence, they could introduce unnecessary domain violations into a propagated natural interval extension. Moreover, the corresponding relaxations may not be tight in general, due to the repeated appearance of each variable [44]. A dedicated treatment of products can lead to much tighter relaxations.

Note that multiplication by a constant was already considered as a univariate intrinsic function in Table 1. To handle products of two variables, our suggested Whitney-C^1 relaxations of the bilinear product mapping (x, y) ↦ xy involve the following intermediate functions.

Definition 3 Define IR_prop := {x ∈ IR : x^L < x^U}. For any μ ≥ 1, define the following scalar-valued mappings; in each case, x, y ∈ R, ζ, η ∈ IR_prop, and α, β ∈ IR.

δ_{×,A} : (x, y, ζ, η) ↦ (ζ^U − ζ^L)(η^U − η^L) | (y − η^L)/(η^U − η^L) − (ζ^U − x)/(ζ^U − ζ^L) |^{μ+1},

δ_{×,B} : (x, y, ζ, η) ↦ (ζ^U − ζ^L)(η^U − η^L) (max{0, (y − η^L)/(η^U − η^L) − (ζ^U − x)/(ζ^U − ζ^L)})^{μ+1},

ψ_{×,A} : (x, y, ζ, η) ↦ ½ ( x(η^L + η^U) + y(ζ^L + ζ^U) − (ζ^L η^L + ζ^U η^U) + δ_{×,A}(x, y, ζ, η) ),

ψ_{×,B} : (x, y, ζ, η) ↦ x η^L + y ζ^L − ζ^L η^L + δ_{×,B}(x, y, ζ, η),

σ_μ : x ↦ x^{1/μ} if 0 ≤ x, and −|x|^{1/μ} if x < 0,

x* : (y, ζ, η) ↦ ζ^U + (ζ^U − ζ^L) ( (η^L − y)/(η^U − η^L) + σ_μ( −(η^L + η^U)/((μ + 1)(η^U − η^L)) ) ),

y* : (x, ζ, η) ↦ η^L + (η^U − η^L) ( (ζ^U − x)/(ζ^U − ζ^L) + σ_μ( −(ζ^L + ζ^U)/((μ + 1)(ζ^U − ζ^L)) ) ),

g_{×,A}^cv : (α, β, ζ, η) ↦ min{ ψ_{×,A}(mid(α^L, α^U, x*(β^L, ζ, η)), β^L, ζ, η),
                                 ψ_{×,A}(mid(α^L, α^U, x*(β^U, ζ, η)), β^U, ζ, η),
                                 ψ_{×,A}(α^L, mid(β^L, β^U, y*(α^L, ζ, η)), ζ, η),
                                 ψ_{×,A}(α^U, mid(β^L, β^U, y*(α^U, ζ, η)), ζ, η) }.

Theorem 6 Suppose that the conditions of Theorem 1 hold with m = 2, that φ is the bilinear product mapping (ξ_1, ξ_2) ↦ ξ_1 ξ_2, and that X_1 and X_2 are nondegenerate intervals x_1, x_2 ∈ IR_prop, respectively. Suppose that, for some ν ∈ N, the supplied relaxations f_i^cv and f_i^cc are Whitney-C^ν on their respective domains, and that the value of μ employed in Definition 3 satisfies μ ≥ ν. For each i ∈ {1, 2} and z ∈ Z, let f_i^C(z) denote the interval [f_i^cv(z), f_i^cc(z)] ∈ IR. For any interval q ∈ IR, let −q denote the interval [−q^U, −q^L] ∈ IR.

Then, the following mapping is a Whitney-C^1 convex relaxation of g ≡ f_1 f_2 on Z:

g_×^cv : Z → R : z ↦
  ψ_{×,B}( f_1^cv(z), f_2^cv(z), x_1, x_2 ),          if both 0 ≤ x_1^L and 0 ≤ x_2^L,
  ψ_{×,B}( −f_1^cc(z), −f_2^cc(z), −x_1, −x_2 ),      if both x_1^U ≤ 0 and x_2^U ≤ 0,
  g_{×,A}^cv( f_1^C(z), f_2^C(z), x_1, x_2 ),         otherwise,

and the following mapping is a Whitney-C^1 concave relaxation of g on Z:

g_×^cc : Z → R : z ↦
  −ψ_{×,B}( −f_1^cc(z), f_2^cv(z), −x_1, x_2 ),       if both x_1^U ≤ 0 and 0 ≤ x_2^L,
  −ψ_{×,B}( f_1^cv(z), −f_2^cc(z), x_1, −x_2 ),       if both 0 ≤ x_1^L and x_2^U ≤ 0,
  −g_{×,A}^cv( −f_1^C(z), f_2^C(z), −x_1, x_2 ),      otherwise.

Gradients of g_×^cv and g_×^cc are provided in Proposition 15 in Appendix B. If each of the following conditions is also satisfied:

– μ ≥ ν ≥ 2,
– f_1^cv, f_1^cc, f_2^cv, and f_2^cc are Whitney-C^ν on their respective domains,
– either 0 ≤ x_1^L or x_1^U ≤ 0, and
– either 0 ≤ x_2^L or x_2^U ≤ 0,

then g_×^cv and g_×^cc are Whitney-C^ν on Z.

Proof It suffices to demonstrate only the claims concerning g_×^cv, for the following reason. Observe that −f_1^cc is a convex relaxation of −f_1 on Z, and that −f_1^cv is a concave relaxation of −f_1 on Z. Thus, if the claims concerning g_×^cv are demonstrated, then, since f_1 and f_2 were assigned arbitrarily, comparison of the definitions of g_×^cv and g_×^cc shows that −g_×^cc is a Whitney-C^ν convex relaxation of the product z ↦ (−f_1(z)) f_2(z) on Z. Since g(z) ≡ −((−f_1(z)) f_2(z)), g_×^cc is then the required Whitney-C^ν concave relaxation of g on Z.

To demonstrate the claims concerning g_×^cv, first consider the case in which both 0 ≤ x_1^L and 0 ≤ x_2^L. In this case, it will be shown that the requirements of Theorem 3 concerning g^cv are met with the substitutions g^cv := g_×^cv and φ^cv : (ξ_1, ξ_2) ↦ ψ_{×,B}(ξ_1, ξ_2, x_1, x_2). Since the mapping ξ ↦ (max{0, ξ})^{μ+1} is increasing, convex, and C^μ on R for any μ ≥ 1, the mapping (x, y) ↦ δ_{×,B}(x, y, x_1, x_2) is convex and C^μ on R², and has nonnegative partial derivatives ∂δ_{×,B}/∂x and ∂δ_{×,B}/∂y. Thus, φ^cv is convex and C^μ on R², and has nonnegative partial derivatives ∂φ^cv/∂ξ_1 and ∂φ^cv/∂ξ_2. Since the mapping z ↦ (max{0, z})^{μ+1} is dominated on {z ∈ R : z ≤ 1} by z ↦ max{0, z}, φ^cv is dominated on X by the convex envelope of φ on X, as required.

The case in which both x_1^U ≤ 0 and x_2^U ≤ 0 is then handled by noting that the previous case applies to the reformulation of g as the "positive-orthant" product z ↦ (−f_1(z))(−f_2(z)) on Z.

Lastly, consider the case in which g^cv is set to z ↦ g_{×,A}^cv(f_1^C(z), f_2^C(z), x_1, x_2). In this case, define φ^cv : (ξ_1, ξ_2) ∈ R² ↦ ψ_{×,A}(ξ_1, ξ_2, x_1, x_2). Define d := (x_1^U − x_1^L, −(x_2^U − x_2^L)) ∈ R², and observe that, for each ξ ∈ R²,

⟨∇φ^cv(ξ), d⟩ = ½(x_2^L + x_2^U) d_1 + ½(x_1^L + x_1^U) d_2 + 0 = x_1^U x_2^L − x_1^L x_2^U.   (4)

Thus, to show that some case in Theorem 4 applies with either d := d or d := −d, it suffices to show that φ^cv is a Whitney-C^1 convex relaxation of φ on X, and that, for any intervals ζ_1, ζ_2 ∈ IR_prop with ζ_1 ⊂ x_1 and ζ_2 ⊂ x_2,

g_{×,A}^cv(ζ_1, ζ_2, x_1, x_2) = min{ φ^cv(ξ) : ζ_i^L ≤ ξ_i ≤ ζ_i^U, ∀i ∈ {1, 2} }.   (5)

Since the mapping z ∈ R ↦ |z|^{μ+1} is convex and C^1, it follows that φ^cv is both convex and C^1 on R². Moreover, since the mapping z ↦ |z|^{μ+1} is nonnegative and is dominated on [−1, 1] by z ↦ |z|, φ^cv is dominated on X by the convex envelope of φ. To demonstrate (5), choose any intervals ζ_1, ζ_2 ∈ IR_prop with ζ_1 ⊂ x_1 and ζ_2 ⊂ x_2. Due to (4), the proofs of Lemma 3 and Lemma 4 show that there exists a solution of the convex program

min{ φ^cv(ξ) : ζ_i^L ≤ ξ_i ≤ ζ_i^U, ∀i ∈ {1, 2} }   (6)

in one of the four edges of the box B := {(ξ_1, ξ_2) ∈ R² : ξ_1 ∈ ζ_1, ξ_2 ∈ ζ_2}. Moreover, since this convex program is linearly constrained, the Karush-Kuhn-Tucker (KKT) optimality conditions are necessary and sufficient for any solution of (6). Now, observe that, for any η ∈ R,

(∂φ^cv/∂ξ_1)(x*(η, x_1, x_2), η) = 0   and   (∂φ^cv/∂ξ_2)(η, y*(η, x_1, x_2)) = 0.

It follows immediately that any KKT point of (6) in the boundary of B must coincide with one of the four ψ_{×,A} terms in the definition of g_{×,A}^cv. Since the KKT conditions are necessary and sufficient for optimality in (6), (5) is thereby demonstrated.

Next, it will be shown that g_×^cv is Whitney-C^ν when the conditions stated at the end of the theorem are satisfied. The cases in which either both 0 ≤ x_1^L and 0 ≤ x_2^L or both x_1^U ≤ 0 and x_2^U ≤ 0 have already been covered above. Consider the case in which 0 ≤ x_1^L and x_2^U ≤ 0. In this case, it is readily verified that, for any intervals ζ_1, ζ_2 ∈ IR_prop with ζ_1 ⊂ x_1 and ζ_2 ⊂ x_2,

g_{×,A}^cv(ζ_1, ζ_2, x_1, x_2) = ψ_{×,A}(ζ_1^U, ζ_2^L, x_1, x_2).

This equation and (5) imply that, for each z ∈ Z, g_×^cv(z) = ψ_{×,A}(f_1^cc(z), f_2^cv(z), x_1, x_2). The chain rule for Whitney-C^ν functions then shows that g_×^cv is Whitney-C^ν. The case in which x_1^U ≤ 0 and 0 ≤ x_2^L is handled analogously. ⊓⊔
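In the positive-orthant case of Theorem 6, only ψ_{×,B} from Definition 3 is needed. The sketch below implements that formula and checks, on a sample box in the positive quadrant, that it lies below both the product xy and the nonsmooth McCormick convex envelope (the maximum of the two supporting planes); the box and the smoothing exponent are illustrative.

// psi_{x,B} from Definition 3 on a box [zL1, zU1] x [zL2, zU2] in the positive quadrant:
// a smooth convex underestimator of x*y lying below the nonsmooth McCormick envelope.
#include <algorithm>
#include <cmath>
#include <iostream>

double psi_xB(double x, double y, double zL1, double zU1, double zL2, double zU2, double mu) {
    double t = (y - zL2) / (zU2 - zL2) - (zU1 - x) / (zU1 - zL1);
    double delta = (zU1 - zL1) * (zU2 - zL2) * std::pow(std::max(0.0, t), mu + 1.0);
    return x * zL2 + y * zL1 - zL1 * zL2 + delta;
}

double mccormick_cv(double x, double y, double zL1, double zU1, double zL2, double zU2) {
    return std::max(x * zL2 + y * zL1 - zL1 * zL2, x * zU2 + y * zU1 - zU1 * zU2);
}

int main() {
    double zL1 = 0.5, zU1 = 2.0, zL2 = 1.0, zU2 = 3.0, mu = 2.0;
    bool ok = true;
    for (double x = zL1; x <= zU1; x += 0.05)
        for (double y = zL2; y <= zU2; y += 0.05) {
            double r = psi_xB(x, y, zL1, zU1, zL2, zU2, mu);
            ok = ok && (r <= mccormick_cv(x, y, zL1, zU1, zL2, zU2) + 1e-10)
                    && (r <= x * y + 1e-10);
        }
    std::cout << std::boolalpha << "below envelope and below xy on the box: " << ok << "\n";
}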

4.3 Relaxing the bivariate “max” and “min” functions

As was the case with bivariate products, the established treatment of addition and univariate intrinsic functions can be combined to obtain Whitney-C² relaxations of the bivariate "max" and "min" functions, since

max{x, y} ≡ ½(x + y + |x − y|) ≡ x + max{0, y − x} ≡ y + max{0, x − y}.

Observe that if the identity max{x, y} ≡ x + max{0, y − x} is applied, then the resulting Whitney-C^1 convex relaxation is guaranteed to dominate the mapping (x, y) ↦ x. This property will be exploited in Section 6.1 when handling implicit functions.

When this property is not required, however, tighter Whitney-C^1 relaxations can be obtained by treating bivariate "max" and "min" as multivariate intrinsic functions. Our suggested way to do this involves the following intermediate functions.

Definition 4 For any μ ≥ 1, define the following scalar-valued mappings; in each case, x, y ∈ R and ζ, η ∈ IR_prop.

ψ_max^cv : (x, y, ζ, η) ↦
  x,                                                     if η^U ≤ ζ^L,
  y,                                                     if ζ^U ≤ η^L,
  x + (η^U − ζ^L)(max{0, (y − x)/(η^U − ζ^L)})^{μ+1},    if η^L ≤ ζ^L < η^U,
  y + (ζ^U − η^L)(max{0, (x − y)/(ζ^U − η^L)})^{μ+1},    if ζ^L < η^L < ζ^U,

θ : (x, y, ζ, η) ↦ (max{ζ^L, η^U} + max{ζ^U, η^L} − max{ζ^L, η^L} − max{ζ^U, η^U})
                    × (max{0, (ζ^U − x)/(ζ^U − ζ^L) − (y − η^L)/(η^U − η^L)})^{μ+1},

ψ_max^cc : (x, y, ζ, η) ↦
  x,                                                     if η^U ≤ ζ^L,
  y,                                                     if ζ^U ≤ η^L,
  max{ζ^U, η^U} − (max{ζ^U, η^L} − max{ζ^L, η^L})(ζ^U − x)/(ζ^U − ζ^L)
                 − (max{ζ^L, η^U} − max{ζ^L, η^L})(η^U − y)/(η^U − η^L) + θ(x, y, ζ, η),   otherwise.

The following theorem provides Whitney-C1relaxations for the bivariate “max”

func-tion. Using these relaxations, similar relaxations may be constructed for bivariate “min” via the identity min{x, y} ≡ −max{−x, −y}.

Theorem 7 Suppose that the conditions of Theorem 1 hold with m= 2, thatφis the bivari-ate “max” mapping(ξ1,ξ2) 7→ max{ξ1,ξ2}, and that X1and X2are nondegenerate intervals x1, x2∈ IRprop, respectively. Suppose that, for someν∈ N, the supplied relaxations fCi and

fCi are Whitney-Cν on their respective domains, and that the value ofµemployed in

Defi-nition 4 satisfiesµ≥ν. Then, the following mapping gC

max: Z→ R is a Whitney-C 1convex

(20)

relaxation of g≡ max{ f1, f2} on Z: gCmax: z7→                    ψmax( fC 1(z), f C 2(z), x1, x2), if x2≤ x1or x1≤ x2, ψmax midfC 1(z), f C 1(z), fC2(z) − (x2− x1)(µ+ 1)− 1 µ, fC 2(z), x1, x2  , if x2≤ x1< x2, ψmax fC1(z), midfC2(z), fC2(z), fC1(z) − (x1− x2)(µ+ 1)− 1 µ, x 1, x2  , if x1< x2< x1,

and the following mapping is a Whitney-Cν concave relaxation of g on Z: gCmax: Z→ R : z 7→ψmax( f

C 1(z), f

C

2(z), x1, x2).

Gradients of gCmaxand gCmaxare provided in Proposition 16 in Appendix B.

Proof First, the claims concerning g_max^cv will be addressed; define a mapping φ^cv : R² → R : (ξ_1, ξ_2) ↦ ψ_max^cv(ξ_1, ξ_2, x_1, x_2). First, consider the case in which either x_2^U ≤ x_1^L or x_1^U ≤ x_2^L. In this case, φ is affine on X and φ^cv ≡ φ; the requirements of Theorem 3 concerning g^cv are then satisfied with g^cv := g_max^cv. Next, consider the case in which x_2^L ≤ x_1^L < x_2^U; an analogous argument applies to the case in which x_1^L < x_2^L < x_1^U. Observe that the mapping ξ ∈ R ↦ (max{0, ξ})^{μ+1} is C^1 and convex; it follows immediately that φ^cv is C^1 and convex on R². Moreover, observe that ⟨∇φ^cv(ξ), e^(1) + e^(2)⟩ = 1 for each ξ ∈ R², and that φ dominates φ^cv on X; Theorem 4 implies that the mapping g^cv : z ∈ Z ↦ min{φ^cv(ξ) : f^cv(z) ≤ ξ ≤ f^cc(z)} is a Whitney-C^1 convex relaxation of g on Z. Now, observe that, for each ξ ∈ R²,

⟨∇φ^cv(ξ), e^(2)⟩ = (μ + 1)(max{0, (ξ_2 − ξ_1)/(x_2^U − x_1^L)})^μ ≥ 0;

the mean value theorem then implies that g^cv(z) = min{φ^cv(ξ_1, f_2^cv(z)) : f_1^cv(z) ≤ ξ_1 ≤ f_1^cc(z)} for each z ∈ Z. Since the mapping ξ_1 ∈ R ↦ φ^cv(ξ_1, f_2^cv(z)) is convex and C^1 for each fixed z ∈ Z, Corollary 1 shows that g^cv ≡ g_max^cv on Z, which completes this case.

Next, the claims concerning g_max^cc will be addressed. It will be shown that the conditions of Theorem 3 concerning g^cc are satisfied with the substitutions g^cc := g_max^cc and φ^cc : (ξ_1, ξ_2) ↦ ψ_max^cc(ξ_1, ξ_2, x_1, x_2). Observe again that the mapping ξ ↦ (max{0, ξ})^{μ+1} is increasing, nonnegative, convex, and C^ν, and that the coefficient

max{ζ^L, η^U} + max{ζ^U, η^L} − max{ζ^L, η^L} − max{ζ^U, η^U}

is nonpositive for any ζ, η ∈ IR_prop. It follows that the mapping (ξ_1, ξ_2) ↦ θ(ξ_1, ξ_2, x_1, x_2) is concave and C^ν, and that the mappings (x, y) ↦ (∂θ/∂x)(x, y, ζ, η) and (x, y) ↦ (∂θ/∂y)(x, y, ζ, η) are nonnegative on R². Thus, φ^cc is concave and C^ν on R², and the partial derivatives ∂φ^cc/∂ξ_1 and ∂φ^cc/∂ξ_2 are nonnegative on R². To show that φ^cc dominates φ on X, observe that, whenever x ∈ ζ and y ∈ η,

0 ≤ (max{0, (ζ^U − x)/(ζ^U − ζ^L) − (y − η^L)/(η^U − η^L)})^{μ+1} ≤ max{0, (ζ^U − x)/(ζ^U − ζ^L) − (y − η^L)/(η^U − η^L)} ≤ 1.

It follows that ψ_max^cc(·, ·, x_1, x_2) dominates φ on X, so that the requirements of Theorem 3 concerning g^cc are satisfied. ⊓⊔
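The overlapping case of the convex underestimator ψ_max^cv in Definition 4 admits the same kind of numerical check. The sketch below verifies, on a sample pair of overlapping intervals, that the smoothed expression stays below max{x, y}; the data and exponent are illustrative.

// Smoothed convex underestimator of max{x, y} from Definition 4, for the overlapping
// case x2L <= x1L < x2U:  psi(x, y) = x + c * (max{0, (y - x)/c})^(mu+1), c = x2U - x1L.
#include <algorithm>
#include <cmath>
#include <iostream>

double psi_max_cv(double x, double y, double x1L, double x2U, double mu) {
    double c = x2U - x1L;
    return x + c * std::pow(std::max(0.0, (y - x) / c), mu + 1.0);
}

int main() {
    double x1L = -1.0, x1U = 2.0, x2L = -2.0, x2U = 1.5, mu = 2.0;   // x2L <= x1L < x2U
    bool under = true;
    for (double x = x1L; x <= x1U; x += 0.05)
        for (double y = x2L; y <= x2U; y += 0.05)
            under = under && (psi_max_cv(x, y, x1L, x2U, mu) <= std::max(x, y) + 1e-10);
    std::cout << std::boolalpha << "underestimates max{x, y} on the box: " << under << "\n";
}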

5 Convergence order

Intuitively, tighter convex and concave relaxations provide more useful bounding information to a numerical method that employs relaxations. Moreover, to avoid clustering [16] in branch-and-bound methods for global optimization, computed relaxations must converge to the relaxed function sufficiently rapidly as the underlying domain is shrunk [10, 67, 65, 42]. This section shows that the Whitney-C^1 relaxations provided in this article satisfy this requirement, and converge to the relaxed function as rapidly as McCormick's original relaxations. To show this, we employ the notion of pointwise convergence order developed by Bompadre and Mitsos [10], and extended subsequently to the multivariate McCormick relaxations of Theorem 1 [42]. Combined with [42, Theorem 6], this section shows that if the methods of this article are used to relax a finite composition of Whitney-C^1 intrinsic functions, then the obtained relaxations exhibit second-order pointwise convergence. If any employed intrinsic function is nonsmooth, then only first-order pointwise convergence is guaranteed; nevertheless, under certain nonsingularity assumptions, [65, Section 2.3] shows that first-order pointwise convergence suffices in the nonsmooth case to avoid clustering.

In this section, the sets Z and X in Theorem 1 will be varied. Where appropriate, dependence of relaxations on these sets will be denoted explicitly; for example, φ^cv(ξ) in Theorem 1 may instead be denoted as φ^cv(ξ; X). Notions of convergence of relaxations employ the following definition of the diameter of a set.

Definition 5 The diameter of a set S ⊂ R^n is wid S := sup{‖x − y‖ : x, y ∈ S}.

5.1 Pointwise convergence of differentiable relaxations of intrinsic functions

This section shows that the Whitney-C^1 relaxations φ^cv(· ; X) and φ^cc(· ; X) suggested for particular intrinsic functions φ in Sections 3 and 4 satisfy the requirements of [42, Theorem 6] concerning the pointwise convergence of supplied relaxations of an outer function F ≡ φ. In each of these cases, if φ itself is Whitney-C^1, then these requirements are satisfied with γ_F := 2; otherwise, they are satisfied with γ_F := 1.

Noting that the relaxations of sums suggested by Theorem 5 coincide with the classical McCormick treatment of addition, pointwise convergence of these relaxations has already been treated in [10, Theorem 3].

Proposition 9 Suppose that φ is any univariate intrinsic function considered in either Table 1 or Section 3.2, and consider an interval y ∈ IR. If φ was chosen from Table 1, suppose additionally that y ⊂ B. For any interval x ⊂ y, let φ^cv(· ; x) and φ^cc(· ; x) denote the particular convex and concave relaxations for φ on x suggested in Table 1 and Section 3.2. There exists a constant τ ≥ 0 (dependent on y) for which, for each interval x ⊂ y and each ξ ∈ x,

φ(ξ) − φ^cv(ξ; x) ≤ τ (wid x)²,   and   φ^cc(ξ; x) − φ(ξ) ≤ τ (wid x)².

Proof An appropriate constant τ > 0 may be computed directly for each considered choice of φ, y, and x. ⊓⊔
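Second-order pointwise convergence can also be observed numerically. For the exponential row of Table 1, the worst-case gap between exp and its secant overestimator on an interval of width w behaves like a constant times w²; the sketch below prints gap/w² over a sequence of halved widths, and the printed ratio levels off near exp(m)/8 for intervals centered at m.

// Pointwise convergence of the secant overestimator of exp (cf. Proposition 9):
// the worst-case gap on an interval of width w scales like w^2.
#include <algorithm>
#include <cmath>
#include <iostream>

double max_gap_exp_secant(double m, double w) {
    double xL = m - 0.5 * w, xU = m + 0.5 * w;
    double gap = 0.0;
    for (int i = 0; i <= 1000; ++i) {
        double xi = xL + (xU - xL) * i / 1000.0;
        double secant = std::exp(xL) + (std::exp(xU) - std::exp(xL)) * (xi - xL) / (xU - xL);
        gap = std::max(gap, secant - std::exp(xi));
    }
    return gap;
}

int main() {
    double m = 1.0;
    for (double w = 1.0; w >= 0.125; w *= 0.5)
        std::cout << "width " << w << ":  gap / width^2 = "
                  << max_gap_exp_secant(m, w) / (w * w) << "\n";
}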

Proposition 10 Suppose that φ is any univariate intrinsic function considered in Section 3.3. For any interval x ∈ IR, let φ^cv(· ; x) and φ^cc(· ; x) denote the particular convex and concave relaxations for φ on x suggested in Section 3.3. Then, for each x ∈ IR and each ξ ∈ x,

φ(ξ) − φ^cv(ξ; x) ≤ wid x,   and   φ^cc(ξ; x) − φ(ξ) ≤ wid x.

Table 1: Relaxations φ^cv, φ^cc of various functions φ : x → R on various intervals x ∈ IR.
Fig. 1: The function f : (x, y) ↦ y(x² − 1) and its convex relaxations on [−4, 4]²: (a) the function f, (b) the multivariate McCormick relaxation of f, (c) a Whitney-C^1 McCormick relaxation of f, and (d) the αBB relaxation of f.
Fig. 2: The function g described in (8) and its convex relaxations on [−2, 2]²: (a) the function g, (b) the classical McCormick relaxation of g, and (c) a Whitney-C^2 relaxation of g.
Fig. 4: Flowsheet for the LNG process in Example 5 (from [66]).
