• Aucun résultat trouvé

slides

N/A
N/A
Protected

Academic year: 2022

Partager "slides"

Copied!
66
0
0

Texte intégral

(1)

Submodular Functions:

from Discrete to Continuous Domains

Francis Bach

INRIA - Ecole Normale Sup´erieure

É C O L E N O R M A L E S U P É R I E U R E

ETH Z¨ urich - April 2016

(2)

Submodular functions

From discrete to continuous domains Summary

• Which functions can be minimized in polynomial time?

– Beyond convex functions

• Submodular functions

– Not convex, ... but “equivalent” to convex functions – Usually defined on {0, 1}n

– Extension to continuous domains

• Preprint available on ArXiv, second version (Bach, 2015)

(3)

Submodularity (almost) everywhere Clustering

• Semi-supervised clustering

• Submodular function minimization

(4)

Submodularity (almost) everywhere Graph cuts and image segmentation

• Submodular function minimization

(5)

Submodularity (almost) everywhere Sensor placement

• Each sensor covers a certain area (Krause and Guestrin, 2005) – Goal: maximize coverage

• Submodular function maximization

• Extension to experimental design (Seeger, 2009)

(6)

Submodularity (almost) everywhere Image denoising

• Total variation denoising (Chambolle, 2005)

• Submodular convex optimization problem

(7)

Submodularity (almost) everywhere Combinatorial optimization problems

• Set V = {1, . . . , n}

• Power set 2V = set of all subsets, of cardinality 2n

• Minimization/maximization of a set-function F : 2V → R.

A⊂Vmin F (A) = min

A∈2V F(A)

(8)

Submodularity (almost) everywhere Combinatorial optimization problems

• Set V = {1, . . . , n}

• Power set 2V = set of all subsets, of cardinality 2n

• Minimization/maximization of a set-function F : 2V → R.

A⊂Vmin F (A) = min

A∈2V F(A)

• Reformulation as (pseudo) Boolean function

x∈{0,1}min n H(x) with H : {0, 1}n → R

and ∀A ⊂ V, H(1A) = F(A)

(0, 1, 1)~{2, 3}

(0, 1, 0)~{2}

(1, 0, 1)~{1, 3} (1, 1, 1)~{1, 2, 3}

(1, 1, 0)~{1, 2}

(0, 0, 1)~{3}

(0, 0, 0)~{ } (1, 0, 0)~{1}

(9)

Outline

1. Submodular set-functions – Definitions, examples

– Links with convexity through Lov´asz extension – Minimization by convex optimization

2. From discrete to continuous domains – Nonpositive second-order derivatives – Invariances and examples

– Extensions on product measures through optimal transport

3. Minimization of continuous submodular functions – Subgradient descent

– Frank-Wolfe optimization

(10)

Submodular functions - References

• Reference book based on combinatorial optimization – Submodular Functions and Optimization (Fujishige, 2005)

• Tutorial monograph based on convex optimization (Bach, 2013)

Learning with submodular functions: a convex optimization perspective

(11)

Submodular functions Definitions

• Definition: H : {0, 1}n → R is submodular if and only if

∀x, y ∈ {0, 1}n, H(x) + H(y) > H(max{x, y}) + H(min{x, y}) – NB: equality for modular functions (linear functions of x)

– Always assume H(0) = 0

(12)

Submodular functions Definitions

• Definition: H : {0, 1}n → R is submodular if and only if

∀x, y ∈ {0, 1}n, H(x) + H(y) > H(max{x, y}) + H(min{x, y}) – NB: equality for modular functions (linear functions of x)

– Always assume H(0) = 0

• Equivalent definition: (with ei ∈ Rn i-th canonical basis vector)

∀i ∈ {1, . . . , n}, x 7→ H(x + ei) − H(x) is non-increasing – “Concave property”: Diminishing returns

(13)

Submodular functions - Examples

(see, e.g., Fujishige, 2005; Bach, 2013)

• Concave functions of the cardinality

• Cuts

• Entropies

– Joint entropy of (Xk)xk=1, from n random variables X1, . . . , Xn

• Functions of eigenvalues of sub-matrices

• Network flows

• Rank functions of matroids

(14)

Examples of submodular functions Cardinality-based functions

• Modular function: H(x) = wx for w ∈ Rn

– Cardinality example: If w = 1n, then H(x) = 1n x

• If g is a concave function, then H : x 7→ g(1n x) is submodular – Diminishing return property

g( x 1

n

)

x 1

T n

T

(15)

Examples of submodular functions Covers

S S3 S1

S2

7 S6

S5 S4

S 8

• Let W be any “base” set, and for each k ∈ V , a set Sk ⊂ W

• Set cover defined as H(x) =

Sx

k=1 Sk

(16)

Examples of submodular functions Cuts

• Given a (un)directed graph, with vertex set V = {1, . . . , n} and edge set E ⊂ V × V

– H(x) is the total number of edges going from {x = 1} to {x = 0}.

A={x=1}

• Generalization with d : {1, . . . , n} × {1, . . . , n} → R+

H(x) = X

j,k

d(k, j)(xk − xj)+

(17)

Choquet integral (Choquet, 1954) - Lov´ asz extension

• Subsets may be identified with elements of {0, 1}n

• Given any function H and µ ∈ Rn such that µj1 > · · · > µjn, define:

h(µ) =

Xn

k=1

µjk[H(ej1 + · · · + ejk) − H(ej1 + · · · + ejk−1)]

µ3

µ2

µ1

µ321

µ231

MMM MMM

MMM MMM

(18)

Choquet integral (Choquet, 1954) - Lov´ asz extension

• Subsets may be identified with elements of {0, 1}n

• Given any function H and µ ∈ Rn such that µj1 > · · · > µjn, define:

h(µ) =

Xn

k=1

µjk[H(ej1 + · · · + ejk) − H(ej1 + · · · + ejk−1)]

µ3

µ2

µ1

µ321

µ231

MMM MMM

MMM MMM

(19)

Choquet integral (Choquet, 1954) - Lov´ asz extension

• Subsets may be identified with elements of {0, 1}n

• Given any function H and µ ∈ Rn such that µj1 > · · · > µjn, define:

h(µ) =

Xn

k=1

µjk[H(ej1 + · · · + ejk) − H(ej1 + · · · + ejk−1)]

• For H(x) = wx, then h(µ) = wµ

• For cuts, h(µ) = P

k,j∈V d(k, j)|µk − µj| is the total variation

(20)

Choquet integral (Choquet, 1954) - Lov´ asz extension

• Subsets may be identified with elements of {0, 1}n

• Given any function H and µ ∈ Rn such that µj1 > · · · > µjn, define:

h(µ) =

Xn

k=1

µjk[H(ej1 + · · · + ejk) − H(ej1 + · · · + ejk−1)]

• For H(x) = wx, then h(µ) = wµ

• For cuts, h(µ) = P

k,j∈V d(k, j)|µk − µj| is the total variation

• For any set-function H (even not submodular)

– h is piecewise-linear and positively homogeneous

– If x ∈ {0, 1}n, h(x) = H(x) ⇒ extension from {0, 1}n to [0, 1]n

(21)

Submodular set-functions

Links with convexity (Lov´ asz, 1982)

1. H is submodular if and only if h is convex

2. If H is submodular, then

x∈{0,1}min n H(x) = min

µ∈{0,1}n h(µ) = min

µ∈[0,1]n h(µ)

3. If H is submodular, then a subgradient of h at any µ may be computed by the “greedy algorithm”

– Order the components of µ ∈ Rn as µj1 > · · · > µjn

– Define wjk = H(ej1 + · · · + ejk) − H(ej1 + · · · + ejk−1) for all k – Moreover h(µ) = wµ

(22)

Submodular set-functions

Links with convexity (Lov´ asz, 1982)

1. H is submodular if and only if h is convex

2. If H is submodular, then

x∈{0,1}min n H(x) = min

µ∈{0,1}n h(µ) = min

µ∈[0,1]n h(µ)

3. If H is submodular, then a subgradient of h at any µ may be computed by the “greedy algorithm”

• Consequences

– Submodular function minimization may be done in polynomial time – Ellipsoid algorithm in O(n5) (Gr¨otschel et al., 1981)

(23)

Exact submodular function minimization Combinatorial algorithms

• Algorithms based on min

µ∈[0,1]n h(µ) and its dual problem

• Output the subset A and a dual certificate of optimality

• Best algorithms have polynomial complexity (Schrijver, 2000; Iwata et al., 2001; Orlin, 2009)

– Typically O(n6) or more

• Not practical for large problems...

(24)

Submodular function minimization Through convex optimization

• Convex non-smooth optimization problem

x∈{0,1}min n H(x) = min

µ∈{0,1}n h(µ) = min

µ∈[0,1]n h(µ)

• Important properties of h for convex optimization – Polyhedral function

– Known subgradients obtained from greedy algorithm

• Generic algorithms (blind to submodular structure) – Some with complexity bounds, some without

– Subgradient, Frank-Wolfe, simplex, cutting-plane (ACCPM) – See Bach (2013) for details

(25)

Outline

1. Submodular set-functions – Definitions, examples

– Links with convexity through Lov´asz extension – Minimization by convex optimization

2. From discrete to continuous domains – Nonpositive second-order derivatives – Invariances and examples

– Extensions on product measures through optimal transport

3. Minimization of continuous submodular functions – Subgradient descent

– Frank-Wolfe optimization

(26)

From discrete to continuous domains

• Main insight: {0, 1} is totally ordered!

(27)

From discrete to continuous domains

• Main insight: {0, 1} is totally ordered!

• Extension to {0, . . . , k − 1}: H : {0, . . . , k − 1}n → R

∀x, y, H(x) + H(y) > H(min{x, y}) + H(max{x, y}) – Equivalent definition: with (ei)i∈{1,...,n} canonical basis of Rn

∀x, i 6= j, H(x + ei) + H(x + ej) > H(x) + H(x + ei + ej) – See Lorentz (1953); Topkis (1978)

(28)

From discrete to continuous domains

• Main insight: {0, 1} is totally ordered!

• Extension to {0, . . . , k − 1}: H : {0, . . . , k − 1}n → R

∀x, y, H(x) + H(y) > H(min{x, y}) + H(max{x, y}) – Equivalent definition: with (ei)i∈{1,...,n} canonical basis of Rn

∀x, i 6= j, H(x + ei) + H(x + ej) > H(x) + H(x + ei + ej) – See Lorentz (1953); Topkis (1978)

• Taylor expansion:

– H(x + ei) + H(x + ej) ≈ 2H(x) + ∂H∂x

i + ∂x∂H

j + 122H

∂x2i + 122H

∂x2j

– H(x) +H(x+ei +ej) = 2H(x) + ∂H∂x

i + ∂x∂H

j + 12∂x2H2

i

+ 12∂x2H2

j

+∂x2H

i∂xj

(29)

From discrete to continuous domains

• Main insight: {0, 1} is totally ordered!

• Extension to {0, . . . , k − 1}: H : {0, . . . , k − 1}n → R

∀x, y, H(x) + H(y) > H(min{x, y}) + H(max{x, y}) – Equivalent definition: with (ei)i∈{1,...,n} canonical basis of Rn

∀x, i 6= j, H(x + ei) + H(x + ej) > H(x) + H(x + ei + ej) – See Lorentz (1953); Topkis (1978)

• Generalization to all totally ordered sets: Xi ⊂ R intervals + H twice differentiable: ∀x ∈

Yn i=1

Xi, ∂2H

∂xi∂xj(x) 6 0

(30)

A “new” class of continuous functions

• Assume each Xi ⊂ R is a compact interval, and (for simplicity) H twice differentiable:

Submodularity : ∀x ∈

Yn i=1

Xi, ∂2H

∂xi∂xj(x) 6 0

• Invariance by

– individual increasing smooth change of variables H(ϕ1(x1), . . . , ϕn(xn)) – adding arbitrary (smooth) separable functions Pn

i=1 vi(xi)

(31)

A “new” class of continuous functions

• Assume each Xi ⊂ R is a compact interval, and (for simplicity) H twice differentiable:

Submodularity : ∀x ∈

Yn i=1

Xi, ∂2H

∂xi∂xj(x) 6 0

• Invariance by

– individual increasing smooth change of variables H(ϕ1(x1), . . . , ϕn(xn)) – adding arbitrary (smooth) separable functions Pn

i=1 vi(xi)

• Examples

– Quadratic functions with Hessians with non-negative off-diagonal entries (Kim and Kojima, 2003)

– ϕ(xi − xj), ϕ convex; ϕ(x1 + · · · + xn), ϕ concave; log det, etc...

– Monotone of order two (Carlier, 2003), Spence-Mirrlees condition (Milgrom and Shannon, 1994)

(32)

A “new” class of continuous functions

x1

x 2

−1 −0.5 0 0.5 1

−1

−0.5

0

0.5

1 −2

−1.5

−1

−0.5 0

• Level sets of the submodular function (x1, x2) 7→ 207 (x1 − x2)2 − e−4(x123)235e−4(x1+23)2 −e−4(x223)2 −e−4(x2+23)2, with several local minima, local maxima and saddle points

(33)

Extensions to the space of product measures

• Set-function: Xi = {0, 1}

– [0, 1] ≈ set of probability distributions on {0, 1}: µi = P(Xi = 1) – Lov´asz extension: for µ ∈ [0, 1]n such that µj1 > · · · > µjn

h(µ) =

Xn

k=1

µjk[H(ej1 + · · · + ejk) − H(ej1 + · · · + ejk−1})]

= (1 − µj1)H(0) +

n−1X

k=1

jk − µjk+1)H(ej1 + · · · + ejk) + µjnH(1n)

= E

H 1µ1>t, . . . ,1µn>t

for t uniform in [0, 1]

h If t ∈ (µjk+1, µjk), then µj1 > · · · > µjk > t > µjk+1 > · · · > µjn i

(34)

Extensions to the space of product measures

• Set-function: Xi = {0, 1}

– [0, 1] ≈ set of probability distributions on {0, 1}: µi = P(Xi = 1) – Lov´asz extension: for µ ∈ [0, 1]n such that µj1 > · · · > µjn

h(µ) =

Xn

k=1

µjk[H(ej1 + · · · + ejk) − H(ej1 + · · · + ejk−1})]

= (1 − µj1)H(0) +

n−1X

k=1

jk − µjk+1)H(ej1 + · · · + ejk) + µjnH(1n)

= E

H 1µ1>t, . . . ,1µn>t

for t uniform in [0, 1]

• Lov´asz extension = relaxation on product measures – Continuous variable µ = (µ1, . . . , µn) ∈ Qn

i=1[0, 1]

– t 7→ 1µi>t is the inverse cumulative distribution function of µi

(35)

Extensions to the space of product measures

View 1: thresholding cumulative distrib. functions

• Given a probability distribution µi ∈ P(Xi)

– (reversed) cumulative distribution function Fµi : Xi → [0, 1] as Fµi(xi) = µi {yi ∈ Xi, yi > xi}

= µi [xi, +∞)

∈ [0, 1]

– and its “inverse”: Fµ−1i (t) = sup{xi ∈ Xi, Fµi(xi) > t} ∈ Xi

xi Fµi(xi)

1

0

(36)

Extensions to the space of product measures

View 1: thresholding cumulative distrib. functions

• Given a probability distribution µi ∈ P(Xi)

– (reversed) cumulative distribution function Fµi : Xi → [0, 1] as Fµi(xi) = µi {yi ∈ Xi, yi > xi}

= µi [xi, +∞)

∈ [0, 1]

– and its “inverse”: Fµ−1i (t) = sup{xi ∈ Xi, Fµi(xi) > t} ∈ Xi

• “Continuous” extension

∀µ ∈

Yn i=1

P(Xi), h(µ1, . . . , µn) =

Z 1 0

H

Fµ−11 (t), . . . , Fµ−1n (t) dt – For finite sets, can be computed by sorting all values of Fµi(xi) – Equal to the Lov´asz extension for set-functions

(37)

Extensions to the space of product measures

View 1: thresholding cumulative distrib. functions

x1 Fµ1(x1)

1

0 x2

Fµ2(x2) 1

0 x3

Fµ3(x3) 1

0 t

Fµ11(t) Fµ21(t) Fµ31(t)

• “Continuous” extension

∀µ ∈

Yn i=1

P(Xi), h(µ1, . . . , µn) =

Z 1 0

H

Fµ−11 (t), . . . , Fµ−1n (t) dt – For finite sets, can be computed by sorting all values of Fµi(xi) – Equal to the Lov´asz extension for set-functions

(38)

Extensions to the space of product measures

View 1: thresholding cumulative distrib. functions

x1 Fµ1(x1)

1

0 x2

Fµ2(x2) 1

0 x3

Fµ3(x3) 1

0 t

Fµ11(t) Fµ21(t) Fµ31(t)

• “Continuous” extension

∀µ ∈

Yn i=1

P(Xi), h(µ1, . . . , µn) =

Z 1 0

H

Fµ−11 (t), . . . , Fµ−1n (t) dt – For finite sets, can be computed by sorting all values of Fµi(xi) – Equal to H(x1, . . . , xn) when µi = δxi for all i

(39)

Extensions to the space of product measures View 2: convex closure

• Given any function H on X = Qn

i=1 Xi

– Known value H(x) for any “extreme points” of product measures (i.e., all Diracs δx at any x ∈ X)

– Convex closure ˜h = largest convex lower bound

– Minimizing H and its convex closure ˜h is equivalent

(1,1)

1

µ

2

(0,0)

(1,0)

(0,1)

µ

(40)

Extensions to the space of product measures View 2: convex closure

• Given any function H on X = Qn

i=1 Xi

– Known value H(x) for any “extreme points” of product measures (i.e., all Diracs δx at any x ∈ X)

– Convex closure ˜h = largest convex lower bound

– Minimizing H and its convex closure ˜h is equivalent

• Need to compute the bi-conjugate of

a : µ 7→ H(x) if µ = δx for some x ∈ X, and + ∞ otherwise

(41)

Computation of the convex envelope

• Need to compute the bi-conjugate of

a : µ 7→ H(x) if µ = δx for some x ∈ X, and + ∞ otherwise

• Step 1: compute a(w) = supµhµ, wi − a(µ) for w ∈ Qn

i=1 RXi

a(w) = sup

x∈X

Xn i=1

wi(xi) − H(x) = sup

γP(X)

X

x∈X

γ(x) Xn

i=1

wi(xi) − H(x)

= sup

γ∈P(X)

Xn

i=1

X

xiXi

wi(xii(xi) − X

x∈X

γ(x)H(x)

– with γi(xi) = X

xj,j6=i

γ(x1, . . . , xn) the i-th marginal of γ

(42)

Computation of the convex envelope

• Step 1: a(w) = sup

γ∈P(X)

Xn

i=1

X

xiXi

wi(xii(xi) − X

x∈X

γ(x)H(x)

• Step 2: compute a∗∗(µ) = supwhw, µi − a(w) for µ ∈ Qn

i=1 P(Xi) a∗∗(µ) = sup

w hw, µi − sup

γ∈P(X)

Xn

i=1

X

xiXi

wi(xii(xi) − X

x∈X

γ(x)H(x)

= inf

γ∈P(X) sup

w

Xn i=1

X

xiXi

wi(xi) µi(xi) − γi(xi)

+ X

xX

γ(x)H(x)

• Thus a∗∗(µ) = inf

γP(X)

Z

X

H(x)dγ(x) such that ∀i, γi(xi) = µi(xi)

(43)

Extensions to the space of product measures View 2: convex closure

• Given any function H on X = Qn

i=1 Xi

– Known value H(x) for any “extreme points” of product measures (i.e., all Diracs δx at any x ∈ X)

– Convex closure ˜h = largest convex lower bound

– Minimizing H and its convex closure ˜h is equivalent

• “Closed-form” formulation: ˜h(µ1, . . . , µn) = inf

γ∈P(X)

Z

X

H(x)dγ(x), – with respect to all prob. measures γ on X such that γi(xi) = µi(xi) – Multi-marginal optimal transport

(44)

Optimal transport: from Monge to Kantorovich

• Monge formulation (“La th´eorie des d´eblais et des remblais”, 1781) – Transforming a measure µ1 to µ2 that (a) preserves local mass and

(b) minimize transportation cost R

X1 c(x1, T (x1))dµ1(x1) d´eblais

remblais

x1

x2 T

µ1

µ2 c(x1, x2) = |x1 − x2|

– Optimal transport map T may not always exists – Discrete case: earth’s mover distance

(45)

Optimal transport: from Monge to Kantorovich

• Monge formulation (“La th´eorie des d´eblais et des remblais”, 1781) – Transforming a measure µ1 to µ2 that (a) preserves local mass and

(b) minimize transportation cost R

X1 c(x1, T (x1))dµ1(x1) d´eblais

remblais

x1

x2 T

µ1

µ2 c(x1, x2) = |x1 − x2|

– Optimal transport map T may not always exists – Discrete case: earth’s mover distance

• Kantorovich formulation (1942)

– Convex relaxation on space of probability measures γ ∈ P(X1×X2) – Prescribed marginals γ1 = µ1 and γ2 = µ2

– Minimum cost R

X1×X2 c(x1, x2)dγ(x1, x2)

(46)

Optimal transport: from two to multiple marginals

• Kantorovich formulation (1942)

– Convex relaxation on space of probability measures γ ∈ P(X1×X2) – Prescribed marginals γ1 = µ1 and γ2 = µ2

– Minimum cost R

X1×X2 c(x1, x2)dγ(x1, x2)

• Properties

– Monge formulation with distribution of (x1, T(x1))

– Wasserstein distance between measures with c(x1, x2) = |x1−x2|p – Relationship with copulas

– See Villani (2008); Santambrogio (2015)

(47)

Optimal transport: from two to multiple marginals

• Kantorovich formulation (1942)

– Convex relaxation on space of probability measures γ ∈ P(X1×X2) – Prescribed marginals γ1 = µ1 and γ2 = µ2

– Minimum cost R

X1×X2 c(x1, x2)dγ(x1, x2)

• Properties

– Monge formulation with distribution of (x1, T(x1))

– Wasserstein distance between measures with c(x1, x2) = |x1−x2|p – Relationship with copulas

– See Villani (2008); Santambrogio (2015)

• Extension to multiple marginals – Minimize R

X H(x)dγ(x) with respect to all prob. measures γ on X such that γi(xi) = µi(xi) for all i ∈ {1, . . . , n}

(48)

Extensions to the space of product measures Combining the two views

• View 1: thresholding cumulative distribution functions + closed form computation for any H, always an extension

− not convex

• View 2: convex closure

+ convex for any H, allows minimization of H

− not computable, may not be an extension

(49)

Extensions to the space of product measures Combining the two views

• View 1: thresholding cumulative distribution functions + closed form computation for any H, always an extension

− not convex

• View 2: convex closure

+ convex for any H, allows minimization of H

− not computable, may not be an extension

• Submodularity

– The two views are equivalent

– Direct proof through optimal transport

– All results from submodular set-functions go through

(50)

Kantorovich optimal transport in one dimension

• Theorem (Carlier, 2003): If H is submodular, then

γ∈infP(X)

Z

X

H(x)dγ(x) such that ∀i, γi = µi

is equal to

Z 1 0

H

Fµ−11 (t), . . . , Fµ−1n (t) dt

(51)

Kantorovich optimal transport in one dimension

• Theorem (Carlier, 2003): If H is submodular, then

γ∈infP(X)

Z

X

H(x)dγ(x) such that ∀i, γi = µi

is equal to

Z 1 0

H

Fµ−11 (t), . . . , Fµ−1n (t) dt

• Proof/intuition for n = 2 for the Monge problem (a) Assume for simplicity atomless measures

(b) The following increasing map is natural Fµ−12 ◦ Fµ1 : X1 → X2 (c) This is the only increasing map

(d) Transport maps always increasing when H submodular

– If x1 < x1 mapped to x2 > x2, then exchanging x2 and x2 would increase cost by H(x1, x2)+H(x1, x2)−H(x1, x2)−H(x1, x2)60

(52)

Duality - Subgradients of extension

• General duality

h(µ) = sup

w

Xn i=1

X

xiXi

wi(xii(xi) − sup

x∈X

Xn

i=1

wi(xi) − H(x)

• Subgradients from “greedy algorithm”

– Sort all values of Fµi(xi) for i ∈ {1, . . . , n} and xi ∈ Xi – Get a subgradient w by taking differences of values of H – See Bach (2015) for more details

• Extensions of various submodular polytopes

(53)

Submodular functions

Links with convexity (Bach, 2015)

1. H is submodular if and only if h is convex

2. If H is submodular, then

x∈Qminn

i=1 Xi H(x) = min

µ∈Qn

i=1 P(Xi) h(µ)

3. If H is submodular, then a subgradient of h at any µ may be computed by a “greedy algorithm”

(54)

Submodular functions

Links with convexity (Bach, 2015)

1. H is submodular if and only if h is convex

2. If H is submodular, then

x∈Qminn

i=1 Xi H(x) = min

µ∈Qn

i=1 P(Xi) h(µ)

3. If H is submodular, then a subgradient of h at any µ may be computed by a “greedy algorithm”

– Submodular functions may be minimized in polynomial time with similar algorithms than for the binary case

– NB: existing reduction to submodular set-functions defined on a ring family (Schrijver, 2000)

(55)

Outline

1. Submodular set-functions – Definitions, examples

– Links with convexity through Lov´asz extension – Minimization by convex optimization

2. From discrete to continuous domains – Nonpositive second-order derivatives – Invariances and examples

– Extensions on product measures through optimal transport

3. Minimization of continuous submodular functions – Subgradient descent

– Frank-Wolfe optimization

(56)

Minimization of submodular functions Projected subgradient descent

• For simplicity: discretizing all sets Xi, i = 1, . . . , n to k elements

• Assume Lispschitz-continuity: ∀x, ei, |H(x + ei) − H(x)| 6 B – Fact: subgradients of h bounded by B in ℓ-norm

• Projected subgradient descent – Convergence rate of O(nkB/√

t) after t iterations – Cost of each iteration O(nk log(nk))

– Reasonable scaling with respect to discretization

Oen3 ε3

for continuous domains

(57)

Minimization of submodular functions Frank-Wolfe / conditional gradient

• Submodular set-functions: Xi = {0, 1} – (C) : minµ∈[0,1]n h(µ) non-smooth convex

– Solve instead (S) : minµ∈Rn h(µ) + 12kµk2 (strongly convex)

– Fact: level sets of (S) obtained from minimizers of H(x) + λx1n

(58)

Minimization of submodular functions Frank-Wolfe / conditional gradient

• Submodular set-functions: Xi = {0, 1} – (C) : minµ∈[0,1]n h(µ) non-smooth convex

– Solve instead (S) : minµ∈Rn h(µ) + 12kµk2 (strongly convex)

– Fact: level sets of (S) obtained from minimizers of H(x) + λx1n

• Extension to all submodular functions – (C) : minµ∈Qn

i=1 P(Xi) h(µ) – Solve instead (S) : minµ∈Qn

i=1 P(Xi) h(µ) + Pn

i=1 ϕii)

– ϕ(µi) defined through optimal transport with a submodular cost ci(xi, t) between µi and the uniform distribution on [0, 1]

– ϕ(µi) can be strongly convex

– Level sets of (S) obtained from minimizers of H(x)+Pn

i=1 ci(xi, t)

(59)

Empirical simulations (online code)

• Signal processing example: H : [−1, 1]n → R with α < 1 H(x) = 1

2

Xn i=1

(xi − zi)2 + λ

Xn i=1

|xi|α + µ

n−1X

i=1

(xi − xi+1)2

−1 −0.5 0 0.5 1

−1

−0.5 0 0.5 1

x

signal

noisy noiseless

−1 −0.5 0 0.5 1

−1

−0.5 0 0.5 1

x

signal

denoised noiseless

(60)

Empirical simulations (online code)

• Signal processing example: H : [−1, 1]n → R with α < 1 H(x) = 1

2

Xn i=1

(xi − zi)2 + λ

Xn i=1

|xi|α + µ

n−1X

i=1

(xi − xi+1)2

100 200 300 400

−8

−6

−4

−2 0 2 4

number of iterations

(certified) gaps

subgradient Frank−Wolfe Pair−wise FW

• Pair-wise Frank-Wolfe (Lacoste-Julien and Jaggi, 2015)

(61)

Empirical simulations (online code)

• Signal processing example: H : [−1, 1]n → R with α < 1 H(x) = 1

2

Xn i=1

(xi − zi)2 + λ

Xn i=1

|xi|α + µ

n−1X

i=1

(xi − xi+1)2

100 200 300 400

−8

−6

−4

−2 0 2 4

number of iterations

(certified) gaps

subgradient Frank−Wolfe Pair−wise FW

• Pair-wise Frank-Wolfe (Lacoste-Julien and Jaggi, 2015)

(62)

Conclusion

• Submodular function and convex optimization – From discrete to continuous domains

– Extensions to product measures

– Direct link with one-dimensional multi-marginal optimal transport

(63)

Conclusion

• Submodular function and convex optimization – From discrete to continuous domains

– Extensions to product measures

– Direct link with one-dimensional multi-marginal optimal transport

• On-going work and extensions

– Optimal transport beyond submodular functions – Beyond discretization

– Beyond minimization

– Sums of submodular functions and convex functions

– Sums of simple submodular functions (Jegelka et al., 2013)

– Mean-field inference in log-supermodular models (Djolonga and Krause, 2015)

(64)

References

F. Bach. Learning with Submodular Functions: A Convex Optimization Perspective. Technical Report 00645271, HAL, 2013.

F. Bach. Submodular functions: from discrete to continous domains. Technical Report 1511.00394-v2, HAL, 2015.

G. Carlier. On a class of multidimensional optimal transportation problems. Journal of Convex Analysis, 10(2):517–530, 2003.

A. Chambolle. Total variation minimization and a class of binary MRF models. In Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 136–152. Springer, 2005.

G. Choquet. Theory of capacities. Ann. Inst. Fourier, 5:131–295, 1954.

J. Djolonga and A. Krause. Scalable variational inference in log-supermodular models. In International Conference on Machine Learning (ICML), 2015.

S. Fujishige. Submodular Functions and Optimization. Elsevier, 2005.

M. Gr¨otschel, L. Lov´asz, and A. Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169–197, 1981.

S. Iwata, L. Fleischer, and S. Fujishige. A combinatorial strongly polynomial algorithm for minimizing submodular functions. Journal of the ACM, 48(4):761–777, 2001.

S. Jegelka, F. Bach, and S. Sra. Reflection methods for user-friendly submodular optimization. In Advances in Neural Information Processing Systems (NIPS), 2013.

(65)

Sunyoung Kim and Masakazu Kojima. Exact solutions of some nonconvex quadratic optimization problems via sdp and socp relaxations. Computational Optimization and Applications, 26(2):

143–154, 2003.

A. Krause and C. Guestrin. Near-optimal nonmyopic value of information in graphical models. In Proc.

UAI, 2005.

S. Lacoste-Julien and M. Jaggi. On the global linear convergence of frank-wolfe optimization variants.

In Advances in Neural Information Processing Systems (NIPS), 2015.

G. G. Lorentz. An inequality for rearrangements. American Mathematical Monthly, 60(3):176–179, 1953.

L. Lov´asz. Submodular functions and convexity. Mathematical programming: The state of the art, Bonn, pages 235–257, 1982.

P. Milgrom and C. Shannon. Monotone comparative statics. Econometrica: Journal of the Econometric Society, pages 157–180, 1994.

J.B. Orlin. A faster strongly polynomial time algorithm for submodular function minimization.

Mathematical Programming, 118(2):237–251, 2009.

F. Santambrogio. Optimal Transport for Applied Mathematicians. Springer, 2015.

A. Schrijver. A combinatorial algorithm minimizing submodular functions in strongly polynomial time.

Journal of Combinatorial Theory, Series B, 80(2):346–355, 2000.

M. Seeger. On the submodularity of linear experimental design, 2009. http://lapmal.epfl.ch/

papers/subm_lindesign.pdf.

D. M. Topkis. Minimizing a submodular function on a lattice. Operations research, 26(2):305–321,

(66)

1978.

C. Villani. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.

A. Weller. Bethe and related pairwise entropy approximations. In Uncertainty in Artificial Intelligence (UAI), 2015.

Références

Documents relatifs

In this section, we present optimization methods for minimizing convex objective functions regular- ized by the Lov´asz extension of a submodular function.. These lead to

Lascoux, Symmetric functions and combinatorial operators on polynomials, CBMS Regional Conference Series in Mathematics 99, American Math.. Lenart, Lagrange inversion and

Next, we give an online algorithm that achieves an approximation guarantee (depending on the search space) for the problem of maximizing non-monotone continuous DR-submodular

Series Solutions of Differential Equations 8.1 Introduction: The Taylor Polynomial Approximation 422 8.2 Power Series and Analytic Functions 427 8.3 Power Series Solutions to

important role by stating that a combinatorial problem can be solved in polynomial time if and only if there exists a polynomial time algorithm capable of solving the

Minimizing submodular functions on continuous domains has been recently shown to be equivalent to a convex optimization problem on a space of uni-dimensional measures [8], and

(1) In Section 3, we cast the problem of minimizing decomposable submodular functions as an or- thogonal projection problem and show how existing optimization techniques may be

Can positive Boolean functions be dualized in total polynomial time, that is, in time polynomial in the combined size of the input and of the output?.. Combinatorial models