Belief Propagation for Min-Cost Network Flow: Convergence & Correctness

The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.

Citation

David Gamarnik, Devavrat Shah, and Yehua Wei. 2010. Belief propagation for min-cost network flow: convergence & correctness. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '10). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 279-292. Copyright © 2010, Society for Industrial and Applied Mathematics.

As Published

http://dl.acm.org/citation.cfm?id=1873625

Publisher

Society for Industrial and Applied Mathematics

Version

Final published version

Citable link

http://hdl.handle.net/1721.1/73521

Terms of Use

Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.


Belief Propagation for Min-cost Network Flow: Convergence & Correctness

David Gamarnik

Devavrat Shah

Yehua Wei

Abstract

We formulate a Belief Propagation (BP) algorithm in the context of the capacitated minimum-cost network flow problem (MCF). Unlike most of the instances of BP studied in the past, the messages of BP in the context of this problem are piecewise-linear functions. We prove that BP converges to the optimal solution in pseudo-polynomial time, provided that the optimal solution is unique and the problem input is integral. Moreover, we present a simple modification of the BP algorithm which gives a fully polynomial-time randomized approximation scheme (FPRAS) for MCF. This is the first instance where BP is proved to have fully-polynomial running time.

1 Introduction

The Markov random field or graphical model provides a succinct representation for capturing the dependency structure between a collection of random variables. In recent years, the need for large scale statistical inference has made graphical models the representation of choice in many statistical applications. There are two key inference problems for a graphical model of interest. The first problem is the computation of the marginal distribution of a random variable. This problem is (computationally) equivalent to the computation of the partition function in statistical physics or counting the number of combinatorial objects (e.g., independent sets) in computer science. The second problem is that of finding the mode of the distribution, i.e., the assignment with maximum likelihood (ML). For a constrained optimization problem, when the constraints are modeled through a graphical model and probability is proportional to the cost of the assignment, an ML assignment is the same as an optimal solution. Both of these questions, in general, are computationally hard.

Belief Propagation (BP) is an "umbrella" heuristic designed for these two problems. Its version for the first problem is known as the "sum-product algorithm" and for the second problem as the "max-product algorithm".* Both versions of the BP algorithm are iterative, simple and message-passing in nature. In this paper, our interest lies in the correctness and convergence properties of the max-product version of BP when applied to minimum-cost network flow problems (MCF), an important class of linear (or more generally convex) optimization problems. In the rest of the paper, we will use the term BP to refer to the max-product version unless specified otherwise.

*Operations Research Center and Sloan School of Management, MIT, Cambridge, MA, 02139, e-mail: gamarnik@mit.edu; EECS, MIT, Cambridge, MA, 02139, e-mail: devavrat@mit.edu; Operations Research Center, MIT, Cambridge, MA, 02139, e-mail: y4wei@MIT.EDU

The BP algorithm is essentially an approximation of dynamic programming, assuming that the underlying graphical model is a tree [8], [22], [16]. Specifically, assuming that the graphical model is a tree, one can obtain a natural parallel iterative version of dynamic programming in which variable nodes pass messages between each other along edges of the graphical model. Somewhat surprisingly, this seemingly naive BP heuristic has become quite popular in practice [2], [9], [11], [17]. In our opinion, there are two primary reasons for the popularity of BP. First, like dynamic programming, it is generically applicable, easy to understand and implementation-friendly due to its iterative, simple and message-passing nature. Second, in many practical scenarios, the performance of BP is surprisingly good [21], [22]. On one hand, for an optimist, this unexpected success of BP provides a hope for its being a genuinely much more powerful algorithm than what we know thus far (e.g., better than primal-dual methods). On the other hand, a skeptic would demand a systematic understanding of the limitations (and strengths) of BP, in order to caution a practitioner in using it. Thus, irrespective of the perspective of an algorithmic theorist, rigorous understanding of BP is very important.

Prior work. Despite these compelling reasons, only recently have we witnessed an explosion of research on theoretical understanding of the performance of the BP algorithm in the context of various combinatorial optimization problems, both tractable and intractable (NP-hard) versions. To begin with, Bayati, Shah and Sharma [6] considered the performance of BP for the problem of finding a maximum weight matching in a bipartite graph. They established that BP converges in pseudo-polynomial time to the optimal solution when the optimal solution is unique [6]. Bayati et al. [4] as well as Sanghavi et al. [18] generalized this result

Figure 1: An example of a factor graph

by establishing correctness and convergence of the BP algorithm for the b-matching problem when the linear programming (LP) relaxation corresponding to the node constraints has a unique integral optimal solution. Note that the LP relaxation corresponding to the node constraints is not tight in general, as inclusion of the odd-cycle elimination constraints [20] is essential. Furthermore, [4] and [18] established that BP does not converge if this LP relaxation has a non-integral solution. Thus, for a b-matching problem BP finds an optimal answer when the LP relaxation can find an optimal solution. In the context of the maximum weight independent set problem, a one-sided relation between LP relaxation and BP is established [19]: if BP converges then it is correct and the LP relaxation is tight. We also take note of the work on BP for the Steiner tree problem [5], special instances of quadratic/convex optimization problems [13], [12], Gaussian graphical models [10], and the relation between primal and message-passing for LP decoding [3].

2 BP for Linear Programming Problem

Suppose we are given an LP problem in the standard form:
\[
\begin{aligned}
\min\quad & c^T x \qquad (\text{LP}) \\
\text{s.t.}\quad & Ax = b, \quad x \ge 0, \quad x \in \mathbb{R}^n,
\end{aligned}
\]
where A is a real m × n matrix, b is a real vector of dimension m, and c is a real vector of dimension n. We define its factor graph, F_LP, to be a bipartite graph with factor nodes v_1, v_2, ..., v_m and variable nodes e_1, e_2, ..., e_n, such that e_j and v_i are adjacent if and only if a_ij ≠ 0. For example, the graph shown in Figure 1 is the factor graph of the LP problem:
\[
\begin{aligned}
\min\quad & 4x_1 + x_2 - 3x_4 + x_5 + 2x_6 - x_7 \\
\text{s.t.}\quad & x_1 + 9x_3 - x_4 + 5x_7 = 10 \\
& -2x_1 + x_2 + x_3 - 3x_4 + x_5 - x_6 + x_7 = 3 \\
& x_1 + x_6 = 2 \\
& x \ge 0, \quad x \in \mathbb{R}^7.
\end{aligned}
\]

In F_LP, for every 1 ≤ i ≤ m, let E_i = {e_j | a_ij ≠ 0}, and let V_j = {v_i | a_ij ≠ 0}. For each factor node v_i, define the factor function ψ_i : R^{|E_i|} → R̄ (R̄ = R ∪ {∞}) as
\[
\psi_i(z) = \begin{cases} 0 & \text{if } \sum_{j \in E_i} a_{ij} z_j = b_i, \\ \infty & \text{otherwise.} \end{cases}
\]
Essentially, ψ_i represents the i-th equality constraint in LP. If we let x_{E_i} be the subvector of x with indices in E_i, then ψ_i(x_{E_i}) < ∞ if and only if x does not violate the i-th equality constraint in LP. Also, for any variable node e_j, define the variable function φ_j : R → R̄ as
\[
\phi_j(z) = \begin{cases} c_j z & \text{if } z \ge 0, \\ \infty & \text{otherwise.} \end{cases}
\]

Notice that problem LP is equivalent to min_{x ∈ R^n} Σ_{i=1}^m ψ_i(x_{E_i}) + Σ_{j=1}^n φ_j(x_j), which is defined on F_LP. As BP is defined on graphical models, we can now formulate the BP algorithm as an iterative message-passing algorithm on F_LP. It is formally presented below as Algorithm 1. The idea of the algorithm is that at the t-th iteration, every factor vertex v_i sends a message function m^t_{v_i→e_j}(z) to each one of its neighbors e_j, where m^t_{v_i→e_j}(z) is v_i's estimate of the cost if x_j takes value z, based on the messages v_i received at time t from its neighbors. Additionally, every variable vertex e_j sends a message function m^t_{e_j→v_i}(z) to each one of its neighbors v_i, where m^t_{e_j→v_i}(z) is e_j's estimate of the cost if x_j takes value z, based on the messages e_j received at time t−1 from all of its neighbors except v_i. After N iterations, the algorithm calculates the "belief" at variable node e_j using the messages sent to vertex e_j at time N, for each j.

In Algorithm 1, one basic question is the computation procedure for message functions of the form m^t_{e_j→v_i} and m^t_{v_i→e_j}. We claim that every message function is a convex piecewise-linear function, which allows us to encode it in terms of a finite vector describing the break points and slopes of its linear pieces. In subsection 3.1, we describe in detail how m^t_{v_i→e_j}(z) can be computed. In general, max-product behaves badly when solving an LP instance without a unique optimal solution [6], [4]. Yet, even with the assumption that an LP problem has a unique optimal solution, in general x̂^N in Algorithm 1 does not converge to the unique optimal solution as N grows large.


Algorithm 1 BP for LP

1: Initialize t = 0, and for each factor node v_i, create the factor-to-variable messages m^0_{v_i→e_j}(z) = 0 for all z ∈ R and all e_j ∈ E_i.
2: for t = 1, 2, ..., N do
3:   For all pairs e_j, v_i with v_i ∈ V_j, set
\[
m^t_{e_j \to v_i}(z) = \phi_j(z) + \sum_{v_{i'} \in V_j \setminus v_i} m^{t-1}_{v_{i'} \to e_j}(z), \qquad \forall z \in \mathbb{R}.
\]
4:   For all pairs e_j, v_i with e_j ∈ E_i, set
\[
m^t_{v_i \to e_j}(z) = \min_{\bar z \in \mathbb{R}^{|E_i|},\ \bar z_j = z} \Bigg\{ \psi_i(\bar z) + \sum_{e_{j'} \in E_i \setminus e_j} m^t_{e_{j'} \to v_i}(\bar z_{j'}) \Bigg\}, \qquad \forall z \in \mathbb{R}.
\]
5:   t := t + 1
6: end for
7: Set the "belief" function as b^N_j(z) = φ_j(z) + Σ_{v_i ∈ V_j} m^N_{v_i→e_j}(z), ∀ 1 ≤ j ≤ n.
8: Calculate the "belief" by finding x̂^N_j = arg min_z b^N_j(z) for each j.
9: Return x̂^N.

One such instance (the LP relaxation of an instance of the maximum-weight independent set problem), described earlier in [19], is:
\[
\begin{aligned}
\min\quad & \sum_{i=1}^{3} 2x_i - \sum_{j=1}^{3} 3y_j \\
\text{s.t.}\quad & x_i + y_j + z_{ij} = 1, \qquad \forall\, 1 \le i \le 3,\ 1 \le j \le 3, \\
& x, y, z \ge 0.
\end{aligned}
\]

3 BP Algorithm for Min-Cost Network Flow Problem

We start off this section by defining the capacitated min-cost network flow problem (MCF). Given a directed graph G = (V, E), let V, E denote the set of vertices and arcs respectively, and let |V| = n, |E| = m. For any vertex v ∈ V, let E_v be the set of arcs incident to v, and for any e ∈ E_v, let Δ(v, e) = 1 if e is an out-arc of v (i.e., arc e = (v, w) for some w ∈ V), and Δ(v, e) = −1 if e is an in-arc of v (i.e., arc e = (w, v) for some w ∈ V). MCF on G is formulated as follows [7], [1]:
\[
\begin{aligned}
\min\quad & \sum_{e \in E} c_e x_e \qquad (\text{MCF}) \\
\text{s.t.}\quad & \sum_{e \in E_v} \Delta(v, e)\, x_e = b_v, \quad \forall v \in V \quad \text{(demand constraints)} \\
& 0 \le x_e \le u_e, \quad \forall e \in E \quad \text{(flow constraints)}
\end{aligned}
\]
where c_e ≥ 0, u_e ≥ 0, c_e ∈ R, u_e ∈ R̄ for each e ∈ E, and b_v ∈ R for each v ∈ V. As MCF is a special class of LP, we can view each arc e ∈ E as a variable vertex and each v ∈ V as a factor vertex, and define functions ψ_v, φ_e, ∀v ∈ V, ∀e ∈ E:
\[
\psi_v(z) = \begin{cases} 0 & \text{if } \sum_{e \in E_v} \Delta(v, e)\, z_e = b_v, \\ \infty & \text{otherwise,} \end{cases}
\qquad
\phi_e(z) = \begin{cases} c_e z & \text{if } 0 \le z \le u_e, \\ \infty & \text{otherwise.} \end{cases}
\]

Algorithm 2 BP for Network Flow

1: Initialize t = 0. For each e ∈ E with e = (v, w), initialize the messages m^0_{e→v}(z) = 0, ∀z ∈ R, and m^0_{e→w}(z) = 0, ∀z ∈ R.
2: for t = 1, 2, 3, ..., N do
3:   For every e ∈ E with e = (v, w), update the messages:
\[
m^t_{e \to v}(z) = \phi_e(z) + \min_{\bar z \in \mathbb{R}^{|E_w|},\ \bar z_e = z} \Bigg\{ \psi_w(\bar z) + \sum_{\tilde e \in E_w \setminus e} m^{t-1}_{\tilde e \to w}(\bar z_{\tilde e}) \Bigg\}, \qquad \forall z \in \mathbb{R},
\]
\[
m^t_{e \to w}(z) = \phi_e(z) + \min_{\bar z \in \mathbb{R}^{|E_v|},\ \bar z_e = z} \Bigg\{ \psi_v(\bar z) + \sum_{\tilde e \in E_v \setminus e} m^{t-1}_{\tilde e \to v}(\bar z_{\tilde e}) \Bigg\}, \qquad \forall z \in \mathbb{R}.
\]
4:   t := t + 1
5: end for
6: For every e ∈ E with e = (v, w), set the "belief" function
\[
n^N_e(z) = m^N_{e \to v}(z) + m^N_{e \to w}(z) - \phi_e(z).
\]
7: Calculate the "belief" by finding x̂^N_e = arg min_z n^N_e(z) for each e ∈ E.
8: Return x̂^N, which is the guess of the optimal solution of MCF.


Clearly, solving MCF is equivalent to solving min_{x ∈ R^{|E|}} {Σ_{v∈V} ψ_v(x_{E_v}) + Σ_{e∈E} φ_e(x_e)}. We can then apply the BP algorithm for LP (Algorithm 1) to MCF. Because of the special structure of MCF, each variable node is adjacent to exactly two factor nodes. This allows us to skip the step of updating the messages m^t_{v→e}, and to present a simplified version of the BP algorithm for MCF, which we refer to as Algorithm 2.

To understand Algorithm 2 at a high level, each arc can be thought of as an agent, which is trying to figure out its own flow while meeting the conservation constraints at its endpoints. Each link maintains an estimate of its “local cost” as a function of its flow (thus this estimate is a function, not a single number). At each time step an arc updates its function as follows: the cost of assigning x units of flow to link e is the cost of pushing x units of flow through e plus the minimum-cost way of assigning flow to neighboring edges (w.r.t. the functions they computed last time step) to restore flow conservation at the endpoints of e.
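To make this concrete, the following is a minimal, brute-force sketch of Algorithm 2 for a tiny MCF instance with integral data. Message functions are stored simply as arrays over the integer flow values 0, 1, ..., u_e instead of as piecewise-linear functions, and the inner minimization enumerates all neighboring flows, so it is exponential in the vertex degree and purely illustrative; the function name bp_mcf and the dictionary-based graph encoding are our own choices, not the paper's.

```python
import itertools

def bp_mcf(nodes, arcs, b, N):
    """Brute-force sketch of Algorithm 2 on a tiny integral MCF instance.

    nodes: iterable of vertices; arcs: dict {(v, w): (cost, capacity)} with
    integer capacities; b: dict {v: demand}; N: number of iterations.
    Messages m^t_{e->v}(z) are stored as lists indexed by z = 0..u_e.  Flows
    are restricted to [0, u_e] from the start, whereas the paper initialises
    m^0 = 0 on all of R; for illustration purposes this is immaterial.
    """
    E = list(arcs)
    incident = {v: [e for e in E if v in e] for v in nodes}

    def delta(v, e):                  # Delta(v, e): +1 for an out-arc of v, -1 for an in-arc
        return 1 if e[0] == v else -1

    INF = float("inf")
    phi = {e: [arcs[e][0] * z for z in range(arcs[e][1] + 1)] for e in E}
    m = {(e, v): [0.0] * (arcs[e][1] + 1) for e in E for v in e}

    for _ in range(N):
        new_m = {}
        for e in E:
            for v in e:               # the message to v uses the *other* endpoint w of e
                w = e[1] if v == e[0] else e[0]
                others = [f for f in incident[w] if f != e]
                msg = []
                for z in range(arcs[e][1] + 1):
                    best = INF
                    for zs in itertools.product(*(range(arcs[f][1] + 1) for f in others)):
                        balance = delta(w, e) * z + sum(delta(w, f) * zf
                                                        for f, zf in zip(others, zs))
                        if balance == b[w]:          # psi_w: flow conservation at w
                            best = min(best, sum(m[(f, w)][zf]
                                                 for f, zf in zip(others, zs)))
                    msg.append(phi[e][z] + best)
                new_m[(e, v)] = msg
        m = new_m

    xhat = {}
    for e in E:                       # beliefs n^N_e and their minimisers
        v, w = e
        belief = [m[(e, v)][z] + m[(e, w)][z] - phi[e][z]
                  for z in range(arcs[e][1] + 1)]
        xhat[e] = min(range(len(belief)), key=belief.__getitem__)
    return xhat
```

For example, on the single-arc instance bp_mcf({1, 2}, {(1, 2): (3, 2)}, {1: 1, 2: -1}, 3), the returned belief minimizer is {(1, 2): 1}, the unique feasible (hence optimal) flow.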

Before formally stating the theorem on convergence of BP for MCF, we first give the definition of a residual network [1]. Define G(x), the residual network of G with respect to a flow x, as follows: G(x) has the same vertex set as G, and for every e = (v, w) ∈ E, if x_e < u_e then e is an arc in G(x) with cost c^x_e = c_e; also, if x_e > 0 then there is an arc e' = (w, v) in G(x) with cost c^x_{e'} = −c_e. Now let
\[
\delta(x) = \min_{C \in \mathcal{C}} \Big\{ c^x(C) = \sum_{e \in C} c^x_e \Big\},
\]
where C is the set of directed cycles in G(x). Note that if x* is the unique optimal solution of MCF on the directed graph G, then δ(x*) > 0 in G(x*).
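As an illustration of these definitions, here is a small sketch (with function names of our own choosing) that builds G(x) and evaluates δ(x) by a Bellman-Ford computation; it assumes G(x) contains no negative-cost directed cycle, which holds in particular at an optimal flow x.

```python
def residual_network(arcs, x):
    """Residual network G(x).  arcs: dict {(v, w): (cost, capacity)};
    x: dict {(v, w): flow}.  Returns a list of residual arcs (tail, head, cost)."""
    res = []
    for (v, w), (c, u) in arcs.items():
        if x[(v, w)] < u:
            res.append((v, w, c))        # forward residual arc, cost c_e
        if x[(v, w)] > 0:
            res.append((w, v, -c))       # backward residual arc, cost -c_e
    return res

def min_cycle_cost(arcs, x):
    """delta(x): minimum cost of a directed cycle in G(x), computed as the
    minimum over residual arcs (u0, v0) of cost(u0, v0) + shortest-path(v0 -> u0).
    Uses Bellman-Ford, so it assumes no negative-cost cycle exists in G(x)."""
    res = residual_network(arcs, x)
    nodes = {v for a in res for v in a[:2]}
    best = float("inf")
    for (u0, v0, c0) in res:
        dist = {v: float("inf") for v in nodes}
        dist[v0] = 0.0
        for _ in range(len(nodes) - 1):              # Bellman-Ford relaxations
            for (a, b2, c) in res:
                if dist[a] + c < dist[b2]:
                    dist[b2] = dist[a] + c
        best = min(best, c0 + dist[u0])
    return best
```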

Theorem 3.1. Suppose MCF has a unique optimal solution x*. Define L to be the maximum cost of a simple directed path in G(x*). Then for any N ≥ (L/(2δ(x*)) + 1)n, we have x̂^N = x*.

Thus, by Theorem 3.1, the BP algorithm finds the unique optimal solution of MCF in at most (L/(2δ(x*)) + 1)n iterations.

3.1 Computing/Encoding Message Functions

First, we formally define a convex piecewise-linear function:

Definition 3.1. A function f is a convex piecewise-linear function if for some finite set of reals a_0 < a_1 < ... < a_n (allowing a_0 = −∞ and a_n = ∞), we have
\[
f(z) = \begin{cases} c_1(z - a_1) + f(a_1) & \text{if } z \in [a_0, a_1], \\ c_{i+1}(z - a_i) + f(a_i) & \text{if } z \in (a_i, a_{i+1}],\ 1 \le i \le n-1, \\ \infty & \text{otherwise,} \end{cases}
\]
where c_1 < c_2 < ... < c_n and f(a_1) ∈ R. We define a_0, a_1, ..., a_n as the vertices of f, p(f) = n (p(f) denotes the number of linear pieces in f), and c_i(z − a_{i−1}) + f(a_{i−1}) for z ∈ [a_{i−1}, a_i] the i-th linear piece of f.

Clearly, if f is a convex piecewise-linear function, then we can store all the "information" about f in a finite vector of size O(p(f)).
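For concreteness, one possible realization of this encoding, together with an evaluator, is sketched below; the function name and argument layout are ours, not the paper's.

```python
import math

def eval_pwl(a, c, f_a1, z):
    """Evaluate the convex piecewise-linear f of Definition 3.1, encoded as a
    finite vector: vertices a = [a_0, ..., a_n] (a_0 may be -inf, a_n may be
    +inf), the increasing slopes c = [c_1, ..., c_n] of its n pieces, and the
    value f(a_1)."""
    if z < a[0] or z > a[-1]:
        return math.inf                    # outside the domain of f
    if z <= a[1]:                          # first piece, on [a_0, a_1], slope c_1
        return f_a1 + c[0] * (z - a[1])
    val, i = f_a1, 1
    while z > a[i + 1]:                    # walk right until a_i < z <= a_{i+1}
        val += c[i] * (a[i + 1] - a[i])    # accumulate f(a_{i+1}) from f(a_i)
        i += 1
    return val + c[i] * (z - a[i])         # piece i+1 has slope c_{i+1} = c[i]
```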

Property 3.1. Let f_1(x), f_2(x) be two convex piecewise-linear functions. Then f_1(ax + b) and c f_1(x) + d f_2(x) are also convex piecewise-linear functions, for any real numbers a, b, c and d with c ≥ 0, d ≥ 0.

Definition 3.2. Let S = {f_1, f_2, ..., f_k} be a set of convex piecewise-linear functions, and let Ψ_t : R^k → R̄ be
\[
\Psi_t(x) = \begin{cases} 0 & \text{if } \sum_{i=1}^k x_i = t, \\ \infty & \text{otherwise.} \end{cases}
\]
We say I_S(t) = min_{x ∈ R^k} {Ψ_t(x) + Σ_{i=1}^k f_i(x_i)} is an interpolation of S.

Figure 2: Functions f_1 and f_2

Figure 3: Interpolation of f_1 and f_2

Lemma 3.1. Let f_1, f_2 be two convex piecewise-linear functions, and suppose z*_1, z*_2 are vertices of f_1, f_2 with z*_1 = arg min f_1(z) and z*_2 = arg min f_2(z). Let S = {f_1, f_2}. Then I_S(t) (the interpolation of S) is a convex piecewise-linear function, and it can be computed in O(p(f_1) + p(f_2)) operations.

Proof. We prove this lemma by describing a procedure to construct I_S(t). The idea behind the construction is to "stitch" together the linear pieces of f_1 and f_2. Specifically, let g(t) be the function which is defined only at z*_1 + z*_2, with g(z*_1 + z*_2) = f_1(z*_1) + f_2(z*_2), and let L_1 = z*_1, L_2 = z*_2, U_1 = z*_1, U_2 = z*_2. We will construct g iteratively and eventually have g(t) = I_S(t). The construction is described as follows: at each iteration, let X_1 (and X_2) be the linear piece of f_1 (and f_2) at the left side of L_1 (and L_2). Choose the linear piece with the larger slope from {X_1, X_2}, and "stitch" this piece onto the left side of the left endpoint of g. If X_i is chosen, update L_i to the vertex at the left end of X_i. As an example, consider f_1 and f_2 in Figure 2; then z*_1 = 1 and z*_2 = 0 are vertices of f_1 and f_2 such that z*_1 = arg min f_1(z), z*_2 = arg min f_2(z). Note that the linear piece X_1 in the procedure is labeled as P1 on the graph, while X_2 does not exist (since there is no linear piece of f_2 on the left side of z*_2). Hence, we "stitch" P1 to the left side of g, and update L_1 to 0.

Similarly, let Y_1 (Y_2) be the linear piece of f_1 (f_2) at the right side of U_1 (U_2); choose the linear piece with the smaller slope and "stitch" this piece onto the right side of the right endpoint of g. If Y_i is chosen, update U_i to the vertex at the right end of Y_i. Again, we use f_1 and f_2 in Figure 2 as an illustration. The linear piece Y_1 in the procedure is labeled as P2, while Y_2 is labeled as P3. As P2 has a lower slope than P3, we "stitch" P2 to the right side of g and update U_1 to 2.

Repeat this procedure until both L_1 (and L_2) and U_1 (and U_2) are the leftmost (and rightmost) endpoints of f_1 (and f_2), or both endpoints of g are infinity. See Figure 2 and Figure 3 for an illustration of the interpolation of two functions.

Note that the total number of iterations is bounded by O(p(f_1) + p(f_2)), and each iteration takes at most a constant number of operations. By construction, it is clear that g is a convex piecewise-linear function. Also, g(z*_1 + z*_2) = f_1(z*_1) + f_2(z*_2), and by the way we constructed g, we must have g(t) ≤ Ψ_t(x) + f_1(x_1) + f_2(x_2) for any t ∈ R and any x ∈ R^2. Therefore, we have g = I_S, and we are done.

Remark. In the case where either f_1 or f_2 does not have a global minimum, the interpolation can still be obtained in O(p(f_1) + p(f_2)) operations, but it involves a tedious analysis of different cases. As the reader will notice, the implementation of Algorithm 2 does not require this case and hence we do not discuss it here.
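For functions with finite domains, the stitching procedure of Lemma 3.1 amounts to merging the two lists of linear pieces in order of slope. A minimal sketch, with a breakpoint-list encoding of our own choosing (each function is given as a sorted list of (z, f(z)) breakpoints over a finite domain):

```python
def inf_convolve(bp1, bp2):
    """Interpolation I_S(t) = min_{x1+x2=t} f1(x1) + f2(x2) for two convex
    piecewise-linear functions f1, f2 given as sorted breakpoint lists
    [(z, f(z)), ...] with finite domains.  The result's segments are the
    segments of f1 and f2 merged in order of increasing slope (Lemma 3.1)."""
    def segments(bp):
        return [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(bp, bp[1:])]
    segs = sorted(segments(bp1) + segments(bp2), key=lambda s: s[1] / s[0])
    x, y = bp1[0][0] + bp2[0][0], bp1[0][1] + bp2[0][1]   # left endpoints add up
    out = [(x, y)]
    for dx, dy in segs:                                   # stitch pieces left to right
        x, y = x + dx, y + dy
        out.append((x, y))
    return out
```

For instance, inf_convolve([(0, 0), (1, 2)], [(0, 0), (2, 2)]) returns [(0, 0), (2, 2), (3, 4)]: the cheaper unit-slope pieces of f_2 are used first, then the slope-2 piece of f_1.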

Theorem 3.2. Given a set S consisting of convex piecewise-linear functions, I_S(t) is a convex piecewise-linear function. Moreover, suppose |S| = k and P = Σ_{f∈S} p(f). Then I_S(t) can be computed in O(P · log k) operations.

Proof. For the sake of simplicity, assume k is divisible by 2. Let S_1 = {f_1, f_2}, S_2 = {f_3, f_4}, ..., S_{k/2} = {f_{k−1}, f_k}, and S' = {I_{S_1}, I_{S_2}, ..., I_{S_{k/2}}}. Then one can observe that I_S = I_{S'}, by the definition of I_S. By Lemma 3.1, each function in S' is a convex piecewise-linear function, and S' can be computed in O(P) operations. Consider changing S to S' as a procedure for decreasing the number of convex piecewise-linear functions. This procedure reduces the number by a factor of 2 each time, and consumes O(P) operations. Hence, it takes O(log k) such procedures to reduce the set S to a single convex piecewise-linear function, and computing I_S(t) therefore takes O(P · log k) operations.
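The pairing argument in the proof translates directly into code; the sketch below reuses inf_convolve from the sketch after Lemma 3.1 and is only meant to mirror the O(P log k) pairing scheme, not to be an optimized implementation.

```python
def interpolate_set(fns):
    """I_S for a set S of convex piecewise-linear functions (each encoded as a
    breakpoint list), obtained by repeatedly pairing and merging functions as
    in the proof of Theorem 3.2.  Requires inf_convolve defined above."""
    while len(fns) > 1:
        paired = [inf_convolve(fns[i], fns[i + 1]) for i in range(0, len(fns) - 1, 2)]
        if len(fns) % 2:                 # carry an unpaired function to the next round
            paired.append(fns[-1])
        fns = paired                     # each round halves the number of functions
    return fns[0]
```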

Definition 3.3. Let S = {f_1, f_2, ..., f_k} be a set of convex piecewise-linear functions, let a ∈ R^k, and let Ψ_t : R^k → R̄ be
\[
\Psi_t(x) = \begin{cases} 0 & \text{if } \sum_{i=1}^k a_i x_i = t, \\ \infty & \text{otherwise.} \end{cases}
\]
We say I^a_S(t) = min_{x ∈ R^k} {Ψ_t(x) + Σ_{i=1}^k f_i(x_i)} is a scaled interpolation of S.

Corollary 3.1. Given any set of convex piecewise-linear functions S, I^a_S(t) is a convex piecewise-linear function. Suppose |S| = k and P = Σ_{f∈S} p(f); then the corners and slopes of I^a_S(t) can be found in O(P · log k) operations.

Proof. Let S = {f_1(x), f_2(x), ..., f_k(x)}, and let S' = {f'_1(x) = f_1(a_1 x), f'_2(x) = f_2(a_2 x), ..., f'_k(x) = f_k(a_k x)}. Then we have I^a_S(t) = I_{S'}(t), and the result follows immediately from Theorem 3.2.

Corollary 3.2. For any nonnegative integer t and any e ∈ E with e = (v, w) (or e = (w, v)), the message function m^t_{e→v} at the t-th iteration of Algorithm 2 is a convex piecewise-linear function.

Proof. We show this by induction on t. For t = 0, m^t_{e→v} is a convex piecewise-linear function by definition. For t > 0, recall that
\[
m^t_{e \to v}(z) = \phi_e(z) + \min_{\bar z \in \mathbb{R}^{|E_w|},\ \bar z_e = z} \Bigg\{ \psi_w(\bar z) + \sum_{\tilde e \in E_w \setminus e} m^{t-1}_{\tilde e \to w}(\bar z_{\tilde e}) \Bigg\}, \qquad z \in \mathbb{R}.
\]

By the induction hypothesis, every m^{t−1}_{ẽ→w} is a convex piecewise-linear function. Let S = {m^{t−1}_{ẽ→w}(·) | ẽ ∈ E_w \ e}, and let a ∈ R^{E_w \ e} with a_ẽ = Δ(w, ẽ) for every ẽ ∈ E_w \ e. Then
\[
g(z) = \min_{\bar z \in \mathbb{R}^{|E_w|},\ \bar z_e = z} \Bigg\{ \psi_w(\bar z) + \sum_{\tilde e \in E_w \setminus e} m^{t-1}_{\tilde e \to w}(\bar z_{\tilde e}) \Bigg\}
\]
is equal to I^a_S(cz + d) for some real numbers c and d (i.e., a shifted scaled interpolation of S). By Corollary 3.1, g(z) is a convex piecewise-linear function, and φ_e is a convex piecewise-linear function. Therefore, m^t_{e→v} = g + φ_e is a convex piecewise-linear function.


Corollary 3.3. Suppose the cost vector c in MCF is integral. Then at iteration t, for any defined message function m^t_{e→v}(z), let {s_1, s_2, ..., s_k} be the slopes of m^t_{e→v}(z); we have −t·c_max ≤ s_1 ≤ ... ≤ s_k ≤ t·c_max, and s_i is integral for each 1 ≤ i ≤ k.

Proof. We prove the corollary using induction. For t = 0, the statement is clearly true. For t ≥ 1, recall that
\[
m^t_{e \to v}(z) - \phi_e(z) = \min_{\bar z \in \mathbb{R}^{|E_w|},\ \bar z_e = z} \Bigg\{ \psi_w(\bar z) + \sum_{\tilde e \in E_w \setminus e} m^{t-1}_{\tilde e \to w}(\bar z_{\tilde e}) \Bigg\}
\]
is a (possibly shifted) scaled interpolation of message functions of the form m^{t−1}_{ẽ→w}. Since Δ(v, e) = ±1, the absolute values of the slopes of the linear pieces of m^t_{e→v} − φ_e are the same as the absolute values of the slopes of the linear pieces of the message functions m^{t−1}_{ẽ→w} used for the interpolation. Therefore, by the induction hypothesis, the absolute values of the slopes of m^t_{e→v} − φ_e are integral and bounded by (t−1)c_max. This implies that the absolute values of the slopes of m^t_{e→v} are integral and bounded by t·c_max.

Corollary 3.4. Suppose the vectors b and u are integral in MCF. Then at iteration t, for any message function m^t_{e→v}, the vertices of m^t_{e→v} are also integral.

Proof. Again, we use induction. For t = 0, the statement is clearly true. For t ≥ 1, m^t_{e→v} − φ_e is a piecewise-linear function obtained from a shifted scaled interpolation of the functions m^{t−1}_{ẽ→w}. Since Δ(v, e) = ±1 and b is integral, the shift of the interpolation is integral, and as Δ(v, e) = ±1, the scaling factor is ±1. As all vertices of any function m^{t−1}_{ẽ→w} are integral, we deduce that all vertices of m^t_{e→v} − φ_e must also be integral. Since u is integral, all vertices of φ_e are integral, and therefore all vertices of m^t_{e→v} are integral.

Corollary 3.1 and Corollary 3.2 show that at each iteration, every message function can be encoded in terms of a finite vector describing the corners and slopes of its linear pieces, and that this encoding can be computed in a finite number of operations. Using a similar argument, we can see that the message functions in Algorithm 1 are also convex piecewise-linear functions, and thus can also be encoded in terms of a finite vector describing the corners and slopes of their linear pieces. We note that the fact that the message functions m^t_{e→v} are convex piecewise-linear functions can also be shown by sensitivity analysis of LP (see Chapter 5 of [7]).

3.2 BP on Special Min-cost Flow Problems

Although we have shown in Section 3.1 that each message function can be computed as a convex piecewise-linear function, those functions tend to become complicated to compute and store as the number of iterations grows. When an instance of MCF has b, u integral, then m^t_{e→v} (for some e = (u, v), t ∈ N+) is a piecewise-linear function with at most u_e linear pieces. Thus, if u_e is bounded by some constant for all e, while b and u are integral, then the message functions at each iteration are much simpler. Next, we present a subclass of MCF, which we denote MCF′ and which can be solved quickly using BP compared to the more general MCF. Given a directed graph G = (V, E), this subclass of problems is defined as follows:
\[
\begin{aligned}
\min\quad & \sum_{e \in E} c_e x_e \qquad (\text{MCF}') \\
\text{s.t.}\quad & \sum_{e \in E_v} \Delta(v, e)\, x_e = b_v, \quad \forall v \in V \\
& \sum_{e \in \delta(v)} x_e \le \tilde u_v, \quad \forall v \in V, \qquad \delta(v) = \{(u, v) \in E\} \\
& 0 \le x_e \le u_e, \quad \forall e \in E,
\end{aligned}
\]
where c, u, and ũ are all integral, and all of their entries are bounded by a constant K.

To see that MCF′ is indeed an MCF, observe that each v ∈ V can be split into two vertices v_in and v_out, where v_in is incident to all in-arcs of v with b_{v_in} = 0, and v_out is incident to all out-arcs of v with b_{v_out} = b_v; we also create an arc from v_in to v_out, set the capacity of that arc to be ũ_v, and its cost to be 0. Let the new graph be G′; then the MCF on G′ is equivalent to MCF′ (a sketch of this reduction appears at the end of this subsection). Although we can use Algorithm 2 to solve the MCF on the new graph, we would like to define functions ψ_v, φ_e, ∀v ∈ V, ∀e ∈ E, directly on G:
\[
\psi_v(z) = \begin{cases} 0 & \text{if } \sum_{e \in E_v} \Delta(v, e)\, z_e = b_v \text{ and } \sum_{e \in \delta(v)} z_e \le \tilde u_v, \\ \infty & \text{otherwise,} \end{cases}
\qquad
\phi_e(z) = \begin{cases} c_e z & \text{if } 0 \le z \le u_e, \\ \infty & \text{otherwise,} \end{cases}
\]
and apply Algorithm 2 with these ψ and φ. When we update the message functions m^t_{e→v} for all e ∈ E_w, the inequality Σ_{e ∈ δ(w)} x_e ≤ ũ_w allows us to only look at the ũ_w smallest linear pieces of the message functions m^{t−1}_{ẽ→w}, for all but a constant number of e ∈ E_w. This allows us to further trim the running time of BP for MCF′, and we will present this more formally in Section 6.1.

4 Convergence of BP for MCF

Before proving Theorem 3.1, we define computation trees, and then establish the connection between computation trees and BP. Using this connection, we proceed to prove the main result of our paper.

Figure 4: Computation tree of G rooted at e_3 = (1, 3)

4.1 Computation Tree and BP

Let T^N_e, the N-computation tree corresponding to variable x_e, be defined as follows:

1. Let V(T^N_e) be the vertex set of T^N_e and E(T^N_e) be the arc set of T^N_e. Let Γ(·) be the map which maps each vertex v' ∈ V(T^N_e) to a vertex v ∈ V, and Γ preserves the arcs (i.e., for each e' = (v'_α, v'_β) ∈ E(T^N_e), Γ(e') = (Γ(v'_α), Γ(v'_β)) ∈ E).

2. Divide V(T^N_e) into N + 1 levels. On the 0-th level we have a "root" arc r, with Γ(r) = e, and a vertex v' is on the t-th level of T^N_e if v' is t arcs apart from either node of r.

3. For any v' ∈ V(T^N_e), let E_{v'} be the set of all arcs incident to v'. If v' is on the t-th level of T^N_e and t < N, then Γ(E_{v'}) = E_{Γ(v')} (i.e., Γ preserves the neighborhood of v').

4. Every vertex v' that is on the N-th level of T^N_e is incident to exactly one arc in T^N_e.

In other literature, T^N_e is known as the "unwrapped tree" of G rooted at e. Figure 4 gives an example of a computation tree. We note that the definition of computation trees introduced here is slightly different from the definition in other papers [4], [6], [18], although the analysis and insight for computation trees is very similar.
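For illustration, the following sketch constructs T^N_e by breadth-first unrolling; the paper only defines the tree, so the routine below, its name, and its output format are our own choices.

```python
import itertools

def computation_tree(arcs, root, N):
    """Build the N-computation tree T_e^N (Section 4.1) rooted at arc
    root = (v, w) of the directed graph with arc list `arcs`.  Returns the
    tree's arcs as (tail_copy, head_copy, original_arc), where each vertex
    copy is a pair (original_vertex, unique_id)."""
    incident = {}
    for (a, b) in arcs:
        incident.setdefault(a, []).append((a, b))
        incident.setdefault(b, []).append((a, b))
    counter = itertools.count()
    fresh = lambda v: (v, next(counter))

    v, w = root
    rv, rw = fresh(v), fresh(w)
    tree = [(rv, rw, root)]                    # the root arc r, with Gamma(r) = root
    frontier = [(rv, root), (rw, root)]        # level-0 copies and the arc used to reach them
    for _level in range(1, N + 1):
        nxt = []
        for copy, via in frontier:
            orig = copy[0]
            for arc in incident[orig]:
                if arc == via:                 # the arc back towards the root already exists
                    continue
                other = arc[1] if arc[0] == orig else arc[0]
                child = fresh(other)
                # orient the copied arc the same way as its original in G
                tree.append((copy, child, arc) if arc[0] == orig else (child, copy, arc))
                nxt.append((child, arc))
        frontier = nxt                         # level-N copies are leaves: not expanded
    return tree
```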

Let V_o(T^N_e) be the set of all vertices which are not on the N-th level of T^N_e. Consider the problem
\[
\begin{aligned}
\min\quad & \sum_{f \in E(T^N_e)} c_{\Gamma(f)}\, x_f \qquad (\text{MCF}^N_e) \\
\text{s.t.}\quad & \sum_{f \in E_{v'}} \Delta(v', f)\, x_f = b_{\Gamma(v')}, \quad \forall v' \in V_o(T^N_e) \\
& 0 \le x_f \le u_{\Gamma(f)}, \quad \forall f \in E(T^N_e).
\end{aligned}
\]
Loosely speaking, MCF^N_e is almost an MCF on T^N_e. There is a flow constraint for every arc f ∈ E(T^N_e), and a demand/supply constraint for every node v' ∈ V_o(T^N_e). Now, we state the lemma which exhibits the connection between BP and the computation trees.

Lemma 4.1. MCF^N_e has an optimal solution y* satisfying y*_r = x̂^N_e (r is the root arc of T^N_e, Γ(r) = e).

This result is not too surprising, as BP is exact on trees. A thorough analysis can show that the message functions in BP are essentially the implementation of an algorithm using the "bottom up" (aka dynamic programming) approach, starting from the bottom level of T^t_e. A more formal, combinatorial proof of this lemma also exists, which is a rather technical argument using induction. Proofs for similar statements can be found in [4] and [18].

4.2 Proof of the Main Theorem

Proof. [Proof of Theorem 3.1] Suppose there exist e_0 ∈ E and N ≥ (L/(2δ(x*)) + 1)n such that x̂^N_{e_0} ≠ x*_{e_0}. By Lemma 4.1, we can find an optimal solution y* for MCF^N_{e_0} with y*_r = x̂^N_{e_0}. Now, without loss of generality, we assume y*_r > x*_{e_0}. Let r = (v_α, v_β); because y* is feasible for MCF^N_{e_0} and x* is feasible for MCF, we have
\[
b_{\Gamma(v_\alpha)} = \sum_{e \in E_{v_\alpha}} \Delta(v_\alpha, e)\, y^*_e = y^*_r + \sum_{e \in E_{v_\alpha} \setminus r} \Delta(v_\alpha, e)\, y^*_e,
\]
\[
b_{\Gamma(v_\alpha)} = \sum_{e \in E_{v_\alpha}} \Delta(v_\alpha, e)\, x^*_{\Gamma(e)} = x^*_{e_0} + \sum_{e \in E_{v_\alpha} \setminus r} \Delta(v_\alpha, e)\, x^*_{\Gamma(e)}.
\]
Then we can find an arc e'_1 incident to v_α, e'_1 ≠ r, such that y*_{e'_1} > x*_{Γ(e'_1)} if e'_1 has the same orientation as r, and y*_{e'_1} < x*_{Γ(e'_1)} if e'_1 has the opposite orientation as r. Similarly, we can find some arc e'_{−1} incident to v_β satisfying the same condition. Let v_{α_1}, v_{α_{−1}} be the vertices of e'_1, e'_{−1} which are one level below v_α, v_β on T^N_{e_0}; then we can find arcs e'_2, e'_{−2} satisfying the same condition. If we continue this all the way down to the leaf nodes of T^N_{e_0}, we can find a set of arcs {e'_{−N}, e'_{−N+1}, ..., e'_{−1}, e'_1, ..., e'_N} such that

y*_{e'_i} > x*_{Γ(e'_i)} ⟺ e'_i has the same orientation as r,
y*_{e'_i} < x*_{Γ(e'_i)} ⟺ e'_i has the opposite orientation as r.

Let X = {e'_{−N}, e'_{−N+1}, ..., e'_{−1}, e'_0 = r, e'_1, ..., e'_N}. For any e' = (v_p, v_q) ∈ X, let Aug(e') = (v_p, v_q) if y*_{e'} > x*_{Γ(e')}, and Aug(e') = (v_q, v_p) if y*_{e'} < x*_{Γ(e')}. Then each Γ(Aug(e')) is an arc of the residual graph G(x*), and W = (Aug(e'_{−N}), Aug(e'_{−N+1}), ..., Aug(e'_0), ..., Aug(e'_N)) is a directed path on T^N_{e_0}. We call W the augmenting path of y* with respect to x*. Also, Γ(W) is a directed walk on G(x*). Now we can decompose Γ(W) into a simple directed path P(W) and a set of simple directed cycles Cyc(W). Because each simple directed cycle/path on G(x*) can have at most n arcs, and W has 2N + 1 arcs, where N ≥ (L/(2δ(x*)) + 1)n, we have that |Cyc(W)| > L/δ(x*). Then we have
\[
c^*(W) = \sum_{C \in \mathrm{Cyc}(W)} c^*(C) + c^*(P(W)) \;\ge\; \sum_{C \in \mathrm{Cyc}(W)} c^*(C) - L \;>\; \frac{L}{\delta(x^*)} \cdot \delta(x^*) - L = 0.
\]

Let Forw = {e' | e' ∈ X, y*_{e'} > x*_{Γ(e')}} and Back = {e' | e' ∈ X, y*_{e'} < x*_{Γ(e')}}. Since both Forw and Back are finite, we can find λ > 0 such that y*_{e'} − λ ≥ x*_{Γ(e')} for all e' ∈ Forw and y*_{e'} + λ ≤ x*_{Γ(e')} for all e' ∈ Back. Let ỹ ∈ R^{|E(T^N_{e_0})|} be such that
\[
\tilde y_{e'} = \begin{cases} y^*_{e'} - \lambda & \text{if } e' \in Forw, \\ y^*_{e'} + \lambda & \text{if } e' \in Back, \\ y^*_{e'} & \text{otherwise.} \end{cases}
\]
We can think of ỹ as obtained from y* by pushing λ units of flow along W. Since for each e' ∈ Forw, y*_{e'} − λ ≥ x*_{Γ(e')} ≥ 0, and for each e' ∈ Back, y*_{e'} + λ ≤ x*_{Γ(e')} ≤ u_{Γ(e')}, ỹ satisfies all the flow constraints. Also, because Forw = {e' ∈ X | e' has the same orientation as r} and Back = {e' ∈ X | e' has the opposite orientation as r}, we have that for any v ∈ V_o(T^N_{e_0}), Σ_{e∈E_v} Δ(v, e) ỹ_e = Σ_{e∈E_v} Δ(v, e) y*_e = b_{Γ(v)}, which implies that ỹ satisfies all the demand/supply constraints. Therefore, ỹ is a feasible solution for MCF^N_{e_0}. But
\[
\sum_{e \in E(T^N_{e_0})} c_{\Gamma(e)}\, y^*_e \;-\; \sum_{e \in E(T^N_{e_0})} c_{\Gamma(e)}\, \tilde y_e \;=\; \sum_{e \in Forw} c_{\Gamma(e)}\, \lambda \;-\; \sum_{e \in Back} c_{\Gamma(e)}\, \lambda \;=\; c^*(W)\, \lambda \;>\; 0.
\]
This contradicts the optimality of y*, and completes the proof of Theorem 3.1.

Corollary 4.1. Given an MCF with integral data c, b and u, let c_max = max_{e∈E}{c_e}, and let Algorithm 2 run for N = n^2 c_max + n iterations. For each e ∈ E, let z*_e = arg min_z n^N_e(z). Then n^N_e(z*_e − 1) − n·c_max > n^N_e(z*_e) and n^N_e(z*_e + 1) − n·c_max > n^N_e(z*_e) for every e ∈ E if and only if MCF has a unique optimal solution.

Proof. The proof of this corollary uses the same idea of finding an augmenting path on the computation tree as in Theorem 3.1.

(⇒): Suppose MCF has a unique optimal solution x*. Let y be the optimal solution of MCF^N_e when the variable at the root r, y_r, is fixed to z*_e − 1. Then we can find an augmenting path W of y with respect to x* of length at least 2n^2 c_max, which can be decomposed into at least 2n·c_max disjoint cycles and one path. Each cycle has a cost of at most −δ(x*), which is at most −1 as MCF has integral data and a unique optimal solution. Hence, once we push 1 unit of flow of y along W, we decrease the cost of y by more than n·c_max. Hence n^N_e(z*_e − 1) − n·c_max > n^N_e(z*_e). Similarly, we can show that n^N_e(z*_e + 1) − n·c_max > n^N_e(z*_e).

(⇐): Suppose MCF does not have a unique optimal solution. Let x* be an optimal solution of MCF, and let y be the optimal solution of MCF^N_e. Again, we can find an augmenting path W of y with respect to x* of length at least 2n^2 c_max, and we can decompose W into at least 2n·c_max disjoint cycles and one path P. The cost of P is at most (n − 1)c_max, while each cycle has a non-positive cost. Hence, when we push 1 unit of flow of y along W, we increase the cost of y by at most (n − 1)c_max < n·c_max. Therefore, depending on the orientation of r on W, we either have n^N_e(z*_e − 1) − n·c_max ≤ n^N_e(z*_e) or n^N_e(z*_e + 1) − n·c_max ≤ n^N_e(z*_e).

Corollary 4.1 shows that BP can be used to detect the existence of a unique optimal solution for MCF.
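A sketch of how this test could be implemented, given the final beliefs as arrays over integer flow values (the function name and input format are ours); out-of-range flow values are treated as having infinite belief, so the corresponding condition holds vacuously at the boundary:

```python
def unique_optimum_certificate(beliefs, n, c_max):
    """Uniqueness test of Corollary 4.1.  beliefs[e] is the final belief n^N_e
    stored as a list over integer flow values 0..u_e, computed after
    N = n**2 * c_max + n iterations of Algorithm 2.  Returns True iff the
    corollary's condition holds at every arc, certifying a unique optimum."""
    for vals in beliefs.values():
        z = min(range(len(vals)), key=vals.__getitem__)       # z*_e = argmin n^N_e
        left_ok = (z == 0) or (vals[z - 1] - n * c_max > vals[z])
        right_ok = (z == len(vals) - 1) or (vals[z + 1] - n * c_max > vals[z])
        if not (left_ok and right_ok):
            return False
    return True
```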

5 Extension of BP on Convex-Cost Network Flow Problems

In this section, we discuss the extension of Theorem 3.1 to the convex-cost network flow problem (or convex flow problem). The convex flow problem is defined on a graph G = (V, E) as follows:
\[
\begin{aligned}
\min\quad & \sum_{e \in E} c_e(x_e) \qquad (\text{CP}) \\
\text{s.t.}\quad & \sum_{e \in E_v} \Delta(v, e)\, x_e = b_v, \quad \forall v \in V \\
& 0 \le x_e \le u_e, \quad \forall e \in E,
\end{aligned}
\]
where each c_e is a convex piecewise-linear function. Notice that if we define ψ exactly as we did for MCF, and for each e ∈ E define
\[
\phi_e(z) = \begin{cases} c_e(z) & \text{if } 0 \le z \le u_e, \\ \infty & \text{otherwise,} \end{cases}
\]
then we can apply Algorithm 2 on the graph G with functions ψ and φ, and hence obtain BP for the convex flow problem.


Now, we give the definition of a residual graph for the convex-cost flow problem. Suppose x is a feasible solution for (CP); let G(x) be the residual graph for G and x, defined as follows: for every e = (v_α, v_β) ∈ E, if x_e < u_e, then e is also an arc in G(x) with cost
\[
c^x_e = \lim_{t \to 0^+} \frac{c_e(x_e + t) - c_e(x_e)}{t}
\]
(the value of the slope of c_e at the right side of x_e); if x_e > 0, then there is an arc e' = (v_β, v_α) in G(x) with cost
\[
c^x_{e'} = \lim_{t \to 0^+} \frac{c_e(x_e - t) - c_e(x_e)}{t}
\]
(the negative of the value of the slope of c_e at the left side of x_e). Finally, let
\[
\delta(x) = \min_{C \in \mathcal{C}} \Big\{ \sum_{e \in C} c^x_e \Big\},
\]
where C is the set of directed cycles in G(x).

Theorem 5.1. Suppose x* is the unique optimal solution of CP. Let L be the maximum cost of a simple directed path in G(x*); by uniqueness of the optimal solution, δ(x*) > 0. Then, for any N ≥ (L/(2δ(x*)) + 1)m, we have x̂^N = x*. Namely, the BP algorithm converges to the optimal solution in at most (L/(2δ(x*)) + 1)m iterations.

The proof of Theorem 5.1 is almost identical to the proof of Theorem 3.1. Theorem 5.1 is an illustration of the power of BP: not only can it solve MCF, but possibly many other variants of MCF as well.

6 Running Time of BP on MCF

In the next two sections, we will always assume that the vectors c, u and b in problem MCF are integral. This assumption is not restrictive in practice, as the MCF instances solved in practice usually have rational data [1]. We define c_max = max_{e∈E}{c_e}, and provide an upper bound for the running time of BP on MCF in terms of m (the number of arcs in G), n (the number of nodes in G), and c_max.

Lemma 6.1. Suppose MCF has integral data. In Algorithm 2, the number of operations needed to update all the messages at iteration t is O(nm log n · t·c_max).

Proof. At iteration t, fix a valid message function m^t_{e→v}; it is updated as
\[
m^t_{e \to v}(z) = \phi_e(z) + \min_{\bar z \in \mathbb{R}^{|E_w|},\ \bar z_e = z} \Bigg\{ \psi_w(\bar z) + \sum_{\tilde e \in E_w \setminus e} m^{t-1}_{\tilde e \to w}(\bar z_{\tilde e}) \Bigg\}.
\]
As MCF has integral data, by Corollary 3.3 any message of the form m^{t−1}_{ẽ→w} has integral slopes, and the absolute values of the slopes are bounded by (t−1)c_max. This implies that for each m^{t−1}_{ẽ→w}, the number of linear pieces of m^{t−1}_{ẽ→w} is bounded by 2(t−1)c_max.

Figure 5: A directed graph with vertices v_1, v_2, v_3 and arcs e_1, e_2, e_3

As the size of E_w is at most n, by Corollary 3.1, g(z) = min_{z̄ ∈ R^{|E_w|}, z̄_e = z}{ψ_w(z̄) + Σ_{ẽ∈E_w\e} m^{t−1}_{ẽ→w}(z̄_ẽ)} can be calculated in O(log n · n·t·c_max) operations. As m^t_{e→v}(z) = g(z) + φ_e(z), m^t_{e→v} can also be calculated in O(log n · n·t·c_max) operations. Since at each iteration we update 2m message functions, the total number of computations at iteration t can be bounded by O(nm log n · t·c_max).

Theorem 6.1. Given that MCF has a unique optimal solution x* and integral data, the BP algorithm finds the unique optimal solution of MCF in O(n^5 m log n · c_max^3) operations.

Proof. Because c is integral, the value δ(x*) in Theorem 3.1 is also integral, and hence δ(x*) ≥ 1. Recall that L is the maximum cost of a simple directed path in G(x*). Since a simple directed path has at most n − 1 arcs, L is bounded by n·c_max. Thus, by Theorem 3.1, the BP algorithm (Algorithm 2) converges to the optimal solution of MCF after O(n^2 c_max) iterations. Combining this with Lemma 6.1, we have that BP converges to the optimal solution in O(n^2 c_max · nm log n · n^2 c_max · c_max) = O(n^5 m log n · c_max^3) operations.

We would like to point out that for an MCF with a unique optimal solution, Algorithm 2 can take an exponential number of iterations to converge. Consider the MCF on the directed graph G shown in Figure 5. Take a large positive integer D, and set c_{e_1} = c_{e_2} = D, c_{e_3} = 2D − 1, b_{v_1} = 1, b_{v_2} = 0 and b_{v_3} = −1. It can be checked that x̂^N_1 alternates between 1 and −1 when 2N + 1 < 2D/3. This means that the BP algorithm takes at least Ω(D) iterations to converge. Since the input size of a large D is just log(D), we have that Algorithm 2 for MCF does not converge to the unique optimal solution in polynomial time.

6.1 Runtime of BP on MCF′

Here we analyze the running time of BP on MCF′, which was defined in Section 3.2. We show that BP performs much better on this special class of MCF. Specifically, we state the following theorem:

Theorem 6.2. If MCF′ has a unique optimal solution, Algorithm 2 for MCF′ finds the unique optimal solution in O(n^4) operations.


First, we state the convergence result of BP for MCF′, which is reminiscent of Theorem 3.1.

Theorem 6.3. Suppose MCF′ has a unique optimal solution x*. Define L to be the maximum cost of a simple directed path in G(x*). Then Algorithm 2 for MCF′ converges to the optimal solution in at most (L/(2δ(x*)) + 1)n iterations.

The proof of this theorem is exactly the same as the proof of Theorem 3.1, and is hence omitted. Now, we apply this result to compute the running time of Algorithm 2 for MCF′.

Proof. [Proof of Theorem 6.2] For simplicity of the proof, we assume that every linear piece in a message function has unit length. This assumption is not restrictive, as each linear piece in general has integral vertices (from Corollary 3.4), so we can always break a bigger linear piece into many unit-length linear pieces with the same slope. At iteration t ≥ 1, recall that m^t_{e→v}(z) is defined as
\[
\phi_e(z) + \min_{\bar z \in \mathbb{R}^{|E_w|},\ \bar z_e = z} \Bigg\{ \psi_w(\bar z) + \sum_{\tilde e \in E_w \setminus e} m^{t-1}_{\tilde e \to w}(\bar z_{\tilde e}) \Bigg\}.
\]
We claim that each m^t_{e→v} can be constructed using the following information:

(1) the ũ_w smallest linear pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ δ(w) \ e}, where δ(w) denotes all in-arcs of w;

(2) the (ũ_w − b_w) smallest linear pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ γ(w) \ e}, where γ(w) denotes all out-arcs of w;

(3) the value of Σ_{ẽ∈E_w} m^{t−1}_{ẽ→w}(0).

Observe that once all three pieces of information are given, for any integer 0 ≤ z ≤ u_e, one can find m^t_{e→v}(z) in a constant number of operations. As u_e is also bounded by a constant, m^t_{e→v} can be computed in constant time when information (1), (2) and (3) is given.

Now, since |E_w| is bounded by n, obtaining each one of the pieces of information (1), (2) or (3) for a specific message m^t_{e→v} takes O(n) operations. When we update all the messages in the set {m^t_{e→v} | e ∈ E_w}, note that for every e ∈ E_w except for a constant number of them, information (1) for updating m^t_{e→v} is equivalent to the ũ_w smallest pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ δ(w)}, and information (2) for updating m^t_{e→v} is equivalent to the (ũ_w − b_w) smallest linear pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ γ(w)}. In fact, the exceptions are precisely those e such that m^{t−1}_{e→w} contains at least one of the ũ_w smallest linear pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ δ(w)} or one of the (ũ_w − b_w) smallest linear pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ γ(w)}. Moreover, every message in the set {m^t_{e→v} | e ∈ E_w} has information (3) equal to exactly Σ_{ẽ∈E_w} m^{t−1}_{ẽ→w}(0). Hence, to update all messages in the set {m^t_{e→v} | e ∈ E_w}, we can obtain information (1), (2), (3) for all the messages in {m^t_{e→v} | e ∈ E_w} in O(n) operations, and hence can compute all messages in {m^t_{e→v} | e ∈ E_w} in O(n) operations.

Therefore, updating all the message functions at every iteration takes O(n^2) operations. By Theorem 6.3, since L is bounded by nK and δ(x*) ≥ 1, the iteration count (L/(2δ(x*)) + 1)n is bounded by O(n^2). This concludes that Algorithm 2 finds the unique optimal solution in O(n^4) operations.

Note that both the shortest-path problem and the bipartite matching problem belong to the class MCF′, with ũ, b, u all bounded by 2.

7 FPRAS Implementation

In this section, we provide a fully polynomial-time randomized approximation scheme (FPRAS) for MCF using BP as a subroutine. First, we describe the main idea of our approximation scheme.

As Theorem 6.1 indicated, in order to come up with an efficient approximation scheme using BP (Algorithm 2), we need to get around the following difficulties:

1. The convergence of BP requires MCF to have a unique optimal solution.

2. The running time of BP is polynomial in m, n and c_max.

For an instance of MCF, our strategy for finding an efficient (1 + ε) approximation scheme is to find a modified cost vector c̄ (let MCF(c̄) be the problem with the modified cost vector) which satisfies the following properties:

1. MCF(c̄) has a unique optimal solution.

2. c̄_max is polynomial in m, n and 1/ε.

3. The optimal solution of MCF(c̄) is a "near optimal" solution of MCF. The term "near optimal" is rather fuzzy, and we will address this question later in the section.

First, in order to find a modified MCF with a unique optimal solution, we state a result which is a variant of the Isolation Lemma introduced in [15].

7.1 Variation of Isolation Lemma

Theorem 7.1. Let MCF(c̄) be an instance of the min-cost flow problem with underlying graph G = (V, E), demand vector b, and capacity vector u, whose cost vector c̄ is generated as follows: for each e ∈ E, c̄_e is chosen independently and uniformly over N_e, where N_e is a discrete set of 4m positive numbers (m = |E|). Then the probability that MCF(c̄) has a unique optimal solution is at least 1/2.

Proof. Fix an arc e_1 ∈ E, and fix c̄_e for all e ∈ E \ e_1. Suppose that there exists α ≥ 0 such that when c̄_{e_1} = α, MCF(c̄) has optimal solutions x*, x** with x*_{e_1} = 0 and x**_{e_1} > 0. Then, if c̄_{e_1} > α, for any feasible solution x of MCF(c̄) with x_{e_1} > 0, we have
\[
\sum_{e \in E} \bar c_e x^*_e = \sum_{e \in E,\, e \ne e_1} \bar c_e x^*_e \;\le\; \sum_{e \in E,\, e \ne e_1} \bar c_e x_e + x_{e_1} \alpha \;<\; \sum_{e \in E} \bar c_e x_e.
\]
And if c̄_{e_1} < α, for any feasible solution x of MCF(c̄) with x_{e_1} = 0, we have
\[
\sum_{e \in E} \bar c_e x^{**}_e \;<\; \sum_{e \in E,\, e \ne e_1} \bar c_e x^{**}_e + \alpha x^{**}_{e_1} \;\le\; \sum_{e \in E,\, e \ne e_1} \bar c_e x_e + \alpha x_{e_1} \;=\; \sum_{e \in E} \bar c_e x_e.
\]
This implies that there exists at most one value α such that if c̄_{e_1} = α then MCF(c̄) has optimal solutions x*, x** with x*_{e_1} = 0 and x**_{e_1} > 0. Similarly, we can deduce that there exists at most one value β such that if c̄_{e_1} = β, then MCF(c̄) has optimal solutions x*, x** with x*_{e_1} < u_{e_1} and x**_{e_1} = u_{e_1}.

Let OS denote the set of all optimal solutions of MCF(c̄), and let D(e) be the condition that either x_e = 0 for all x ∈ OS, or 0 < x_e < u_e for all x ∈ OS, or x_e = u_e for all x ∈ OS. Since c̄_{e_1} takes one of 4m possible values, each chosen with equal probability, we conclude that the probability that D(e_1) is satisfied is at least (4m − 2)/(4m) = (2m − 1)/(2m). By the union bound, the probability that D(e) is satisfied for all e ∈ E is at least 1 − Σ_{e∈E} 1/(2m) = 1/2. Now, we state the following lemma:

Lemma 7.1. If condition D(e) is satisfied for every e ∈ E, then MCF(c̄) has a unique optimal solution.

Then, by Lemma 7.1, we conclude that the probability that MCF(c̄) has a unique optimal solution is at least 1/2.

Proof. [Proof of Lemma 7.1] Suppose x* and x** are two distinct optimal solutions of MCF(c̄). Let d = x** − x*; then x* + λd is an optimal solution of MCF(c̄) iff 0 ≤ (x* + λd)_e ≤ u_e for all e ∈ E. As c̄_e > 0 for every e ∈ E and c̄^T d = c̄^T x** − c̄^T x* = 0, there exists some e such that d_e < 0. Let λ* = sup{λ | x* + λd is an optimal solution of MCF(c̄)}. Since d_e < 0, λ* is bounded; and since x* + d = x**, λ* > 0. By the maximality of λ*, there must exist some e' such that either (x* + λ*d)_{e'} = 0 or (x* + λ*d)_{e'} = u_{e'}. Since λ* > 0, x*_{e'} ≠ (x* + λ*d)_{e'}, and this contradicts the assumption that D(e') is satisfied. Thus, MCF(c̄) must have a unique optimal solution.

We note that Theorem 7.1 can be easily modified for an LP in standard form.

Corollary 7.1. Let LP be a linear program in standard form with constraints Ax = b, x ≥ 0, where A is an m × n matrix and b ∈ R^m, and let its cost vector c̄ be generated as follows: for each index j, c̄_j is chosen independently and uniformly over N_j, where N_j is a discrete set of 2n elements. Then the probability that this LP has a unique optimal solution is at least 1/2.

7.2 Finding the Correct Modified Cost Vector c̄

Next, we describe how to randomly generate a c̄ with the desired properties stated at the beginning of this section. Let X : E → {1, 2, ..., 4m} be a random function where, for each e ∈ E, X(e) is chosen independently and uniformly over the range. Let t = ε·c_max/(4mn), and generate c̄ as follows: for each e ∈ E, let c̄_e = 4m·c_e/t + X(e). Then, for any c̄ generated at random, c̄_max is polynomial in m, n and 1/ε, and by Theorem 7.1, the probability that MCF(c̄) has a unique optimal solution is at least 1/2.
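A sketch of this random perturbation (used again as step 1 of Algorithm 3 below); the function name and signature are our own, and it assumes c_max > 0:

```python
import random

def perturb_costs(costs, n, eps):
    """Cost perturbation of Section 7.2: with m = |E| and
    t = eps * c_max / (4*m*n), each arc cost c_e is replaced by
    4*m*c_e/t + X(e), where X(e) is uniform on {1, ..., 4m}.
    costs: dict {arc: c_e} with c_max > 0; n: number of vertices."""
    m = len(costs)
    t = eps * max(costs.values()) / (4 * m * n)
    return {e: 4 * m * c / t + random.randint(1, 4 * m) for e, c in costs.items()}
```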

Now, we introduce the algorithm APRXMT(MCF, ε), which works as follows: select a random c̄ and try to solve MCF(c̄) using BP. If BP discovers that MCF(c̄) has no unique optimal solution, then we start over by selecting another c̄ at random; otherwise, we return the unique optimal solution found by BP. Formally, we present APRXMT(MCF, ε) as Algorithm 3.

Corollary 7.2. The expected number of operations for APRXMT(MCF, ε) to terminate is O(n^8 m^7 log n / ε^3).

Proof. By Theorem 7.1, when we call APRXMT(MCF, ε), the expected number of instances MCF(c̄) that BP tries to solve is bounded by 2. For each selection of c̄, we run Algorithm 2 for 2·c̄_max·n^2 iterations. As c̄_max = O(m^2 n/ε), by Lemma 6.1 the expected number of operations for APRXMT(MCF, ε) to terminate is O(c̄_max^3 · n^5 m log n) = O(n^8 m^7 log n / ε^3).

Now let c̄ be a randomly chosen cost vector such that MCF(c̄) has a unique optimal solution x^2. The next thing we want to show is that x^2 is a "near optimal" solution of MCF. To accomplish this, let e' = arg max_{e∈E}{c_e} and consider the problem MCF_{e'} obtained from MCF by fixing the flow on e' to x^2_{e'}; it is displayed after Algorithm 3 below.

Algorithm 3 APRXMT(MCF, ε)

1: Let t = ε·c_max/(4mn). For each e ∈ E, assign c̄_e = 4m·c_e/t + p_e, where p_e is an integer chosen independently and uniformly at random from {1, 2, ..., 4m}.
2: Let MCF(c̄) be the problem with the modified cost vector c̄.
3: Run Algorithm 2 on MCF(c̄) for N = 2·c̄_max·n^2 iterations.
4: Use Corollary 4.1 to determine if MCF(c̄) has a unique optimal solution.
5: if MCF(c̄) does not have a unique optimal solution then
6:   Restart the procedure APRXMT(MCF, ε).
7: else
8:   Terminate and return x^2 = x̂^N, where x̂^N is the "belief" vector found in Algorithm 2.
9: end if

The problem with the flow on e' fixed is
\[
\begin{aligned}
\min\quad & \sum_{e \in E} c_e x_e \qquad (\text{MCF}_{e'}) \\
\text{s.t.}\quad & \sum_{e \in E_v} \Delta(v, e)\, x_e = b_v, \quad \forall v \in V \quad \text{(demand constraints)} \\
& x_{e'} = x^2_{e'} \\
& 0 \le x_e \le u_e, \quad \forall e \in E \quad \text{(flow constraints)}.
\end{aligned}
\]
Suppose x^3 is an optimal solution of MCF_{e'}, and x^1 is an optimal solution of MCF. Then:

Lemma 7.2. c^T x^3 − c^T x^1 ≤ |x^2_{e'} − x^1_{e'}| · n·t.

Proof. Consider d = x^2 − x^1. We call a vector k ∈ {−1, 0, 1}^{|E|} a synchronous cycle vector of d if for any e ∈ E, k_e = 1 only if d_e > 0, k_e = −1 only if d_e < 0, and the set {(i, j) | k_{ij} = 1 or k_{ji} = −1} forms exactly one directed cycle in G. Since d is an integral circulation (i.e., d sends 0 net units of flow into every vertex v ∈ V), we can decompose d so that Σ_{k∈K} k = d for some set K of synchronous cycle vectors.

Now, let K' ⊆ K be the subset of K containing all vectors k ∈ K such that k_{e'} ≠ 0. For any k ∈ K', observe that x^2 − k is a feasible solution for MCF(c̄). Because x^2 is the optimal solution of MCF(c̄), we have c̄^T k ≤ 0. Moreover, since c̄_e = 4m·c_e/t + p_e with 1 ≤ p_e ≤ 4m, we have |c̄_e − 4m·c_e/t| ≤ 4m for every e ∈ E, and therefore
\[
\frac{4m}{t}\, c^T k \;\le\; \frac{4m}{t}\, c^T k - \bar c^T k \;\le\; \sum_{e \in E} \Big| \frac{4m\, c_e}{t} - \bar c_e \Big|\, |k_e| \;\le\; 4mn,
\]
which gives c^T k ≤ n·t.

Clearly, x^1 + Σ_{k∈K'} k satisfies the demand/supply constraints of MCF, and by construction (x^1 + Σ_{k∈K'} k)_{e'} = x^2_{e'}. Since min{x^1_e, x^2_e} ≤ x^1_e + Σ_{k∈K'} k_e ≤ max{x^1_e, x^2_e} for all e ∈ E, x^1 + Σ_{k∈K'} k is a feasible solution for MCF_{e'}. Since x^3 is the optimal solution of MCF_{e'},
\[
c^T x^3 \;\le\; c^T x^1 + \sum_{k \in K'} c^T k \;\le\; c^T x^1 + |K'|\, n\, t.
\]
Since |K'| = |x^2_{e'} − x^1_{e'}|, we have c^T x^3 − c^T x^1 ≤ |x^2_{e'} − x^1_{e'}|·n·t.

Corollary 7.3. c^T x^3 ≤ (1 + ε/(2m))·c^T x^1, for any ε ≤ 2m.

Proof. By Lemma 7.2,
\[
\frac{c^T x^3 - c^T x^1}{c^T x^3} \;\le\; \frac{|x^2_{e'} - x^1_{e'}|\, n\, t}{c^T x^3} \;\le\; \frac{|x^2_{e'} - x^1_{e'}|\, n\, t}{|x^2_{e'} - x^1_{e'}|\, c_{e'}} \;=\; \frac{\epsilon}{4m}, \qquad \text{as } t = \frac{\epsilon\, c_{e'}}{4mn}.
\]
Thus,
\[
\frac{c^T x^3 - c^T x^1}{c^T x^3} \le \frac{\epsilon}{4m}
\;\Longrightarrow\;
c^T x^3 \le \frac{1}{1 - \frac{\epsilon}{4m}}\, c^T x^1
\;\Longrightarrow\;
c^T x^3 \le \Big(1 + \frac{\epsilon}{2m}\Big)\, c^T x^1,
\]
where the last inequality holds because (1 − ε/(4m))(1 + ε/(2m)) = 1 + ε/(4m) − ε²/(8m²) ≥ 1, as ε/(2m) ≤ 1.

7.3 A (1 + ε) Approximation Scheme

Loosely speaking, Corollary 7.3 shows that x^2 at arc e' is "near optimal", since fixing the flow at arc e' to x^2_{e'} still allows a feasible solution of MCF whose cost is close to optimal. This leads us to the approximation algorithm AS(MCF, ε) (see Algorithm 4); at each iteration, it uses APRXMT (Algorithm 3) and iteratively fixes the flow at the arc with the largest cost.

