Belief Propagation for Min-Cost Network Flow: Convergence & Correctness

The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.

Citation

David Gamarnik, Devavrat Shah, and Yehua Wei. 2010. Belief propagation for min-cost network flow: convergence & correctness. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '10). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 279-292. Copyright © 2010, Society for Industrial and Applied Mathematics.

As Published

http://dl.acm.org/citation.cfm?id=1873625

Publisher

Society for Industrial and Applied Mathematics

Version

Final published version

Citable link

http://hdl.handle.net/1721.1/73521

Terms of Use

Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.


Belief Propagation for Min-cost Network Flow: Convergence & Correctness

David Gamarnik

Devavrat Shah

Yehua Wei

Abstract

We formulate a Belief Propagation (BP) algorithm in the context of the capacitated minimum-cost network flow problem (MCF). Unlike most of the instances of BP studied in the past, the messages of BP in the context of this problem are piecewise-linear functions. We prove that BP converges to the optimal solution in pseudo-polynomial time, provided that the optimal solution is unique and the problem input is integral. Moreover, we present a simple modification of the BP algorithm which gives a fully polynomial-time randomized approximation scheme (FPRAS) for MCF. This is the first instance where BP is proved to have fully-polynomial running time.

1 Introduction

The Markov random field or graphical model provides a succinct representation for capturing the dependency structure between a collection of random variables. In recent years, the need for large scale statistical inference has made graphical models the representation of choice in many statistical applications. There are two key inference problems for a graphical model of interest. The first problem is the computation of the marginal distribution of a random variable. This problem is (computationally) equivalent to the computation of the partition function in statistical physics or counting the number of combinatorial objects (e.g., independent sets) in computer science. The second problem is that of finding the mode of the distribution, i.e., the assignment with maximum likelihood (ML). For a constrained optimization problem, when the constraints are modeled through a graphical model and probability is proportional to the cost of the assignment, an ML assignment is the same as an optimal solution. Both of these questions, in general, are computationally hard.

Belief Propagation (BP) is an "umbrella" heuristic designed for these two problems. Its version for the first problem is known as the "sum-product algorithm" and for the second problem as the "max-product algorithm".* Both versions of the BP algorithm are iterative, simple and message-passing in nature. In this paper, our interest lies in the correctness and convergence properties of the max-product version of BP when applied to minimum-cost network flow problems (MCF), an important class of linear (or more generally convex) optimization problems. In the rest of the paper, we will use the term BP to refer to the max-product version unless specified otherwise.

*Operations Research Center and Sloan School of Management, MIT, Cambridge, MA, 02139, e-mail: gamarnik@mit.edu; EECS, MIT, Cambridge, MA, 02139, e-mail: devavrat@mit.edu; Operations Research Center, MIT, Cambridge, MA, 02139, e-mail: y4wei@MIT.EDU

The BP algorithm is essentially an approximation of dynamic programming, assuming that the underlying graphical model is a tree [8], [22], [16]. Specifically, assuming that the graphical model is a tree, one can obtain a natural parallel iterative version of dynamic programming in which variable nodes pass messages between each other along edges of the graphical model. Somewhat surprisingly, this seemingly naive BP heuristic has become quite popular in practice [2], [9], [11], [17]. In our opinion, there are two primary reasons for the popularity of BP. First, like dynamic programming, it is generically applicable, easy to understand and implementation-friendly due to its iterative, simple and message-passing nature. Second, in many practical scenarios, the performance of BP is surprisingly good [21], [22]. On one hand, for an optimist, this unexpected success of BP provides a hope for its being a genuinely much more powerful algorithm than what we know thus far (e.g., better than primal-dual methods). On the other hand, a skeptic would demand a systematic understanding of the limitations (and strengths) of BP, in order to caution a practitioner in using it. Thus, irrespective of the perspective of an algorithmic theorist, rigorous understanding of BP is very important.

Prior work. Despite these compelling reasons, only recently have we witnessed an explosion of research on theoretical understanding of the performance of the BP algorithm in the context of various combinatorial optimization problems, both tractable and intractable (NP-hard) versions. To begin with, Bayati, Shah and Sharma [6] considered the performance of BP for the problem of finding a maximum weight matching in a bipartite graph. They established that BP converges in pseudo-polynomial time to the optimal solution when the optimal solution is unique [6]. Bayati et al. [4] as well as Sanghavi et al. [18] generalized this result

Figure 1: An example of a factor graph

by establishing correctness and convergence of the BP algorithm for the b-matching problem when the linear programming (LP) relaxation corresponding to the node constraints has a unique integral optimal solution. Note that the LP relaxation corresponding to the node constraints is not tight in general, as inclusion of the odd-cycle elimination constraints [20] is essential. Furthermore, [4] and [18] established that BP does not converge if this LP relaxation has a non-integral solution. Thus, for a b-matching problem BP finds an optimal answer when the LP relaxation can find an optimal solution. In the context of the maximum weight independent set problem, a one-sided relation between LP relaxation and BP is established [19]: if BP converges then it is correct and the LP relaxation is tight. We also take note of the work on BP for the Steiner tree problem [5], special instances of quadratic/convex optimization problems [13], [12], Gaussian graphical models [10], and the relation between primal and message-passing for LP decoding [3].

2 BP for Linear Programming Problem

Suppose we are given an LP problem in the standard form:
\[
\begin{aligned}
\min\quad & c^T x \qquad (\text{LP}) \\
\text{s.t.}\quad & Ax = b, \quad x \ge 0, \quad x \in \mathbb{R}^n,
\end{aligned}
\]
where A is a real m × n matrix, b is a real vector of dimension m, and c is a real vector of dimension n. We define its factor graph, F_LP, to be a bipartite graph with factor nodes v_1, v_2, ..., v_m and variable nodes e_1, e_2, ..., e_n, such that e_j and v_i are adjacent if and only if a_ij ≠ 0. For example, the graph shown in Figure 1 is the factor graph of the LP problem:
\[
\begin{aligned}
\min\quad & 4x_1 + x_2 - 3x_4 + x_5 + 2x_6 - x_7 \\
\text{s.t.}\quad & x_1 + 9x_3 - x_4 + 5x_7 = 10 \\
& -2x_1 + x_2 + x_3 - 3x_4 + x_5 - x_6 + x_7 = 3 \\
& x_1 + x_6 = 2 \\
& x \ge 0, \quad x \in \mathbb{R}^7.
\end{aligned}
\]

In F_LP, for every 1 ≤ i ≤ m, let E_i = {e_j | a_ij ≠ 0}, and let V_j = {v_i | a_ij ≠ 0}. For each factor node v_i, define the factor function ψ_i : R^{|E_i|} → R̄ (R̄ = R ∪ {∞}) as
\[
\psi_i(z) = \begin{cases} 0 & \text{if } \sum_{j \in E_i} a_{ij} z_j = b_i, \\ \infty & \text{otherwise.} \end{cases}
\]
Essentially, ψ_i represents the i-th equality constraint in LP. If we let x_{E_i} be the subvector of x with indices in E_i, then ψ_i(x_{E_i}) < ∞ if and only if x does not violate the i-th equality constraint in LP. Also, for any variable node e_j, define the variable function φ_j : R → R̄ as
\[
\phi_j(z) = \begin{cases} c_j z & \text{if } z \ge 0, \\ \infty & \text{otherwise.} \end{cases}
\]

Notice that problem LP is equivalent to min_{x ∈ R^n} Σ_{i=1}^m ψ_i(x_{E_i}) + Σ_{j=1}^n φ_j(x_j), which is defined on F_LP. As BP is defined on graphical models, we can now formulate the BP algorithm as an iterative message-passing algorithm on F_LP. It is formally presented below as Algorithm 1. The idea of the algorithm is that at the t-th iteration, every factor vertex v_i sends a message function m^t_{v_i→e_j}(z) to each one of its neighbors e_j, where m^t_{v_i→e_j}(z) is v_i's estimate of the cost if x_j takes value z, based on the messages v_i received at time t from its neighbors. Additionally, every variable vertex e_j sends a message function m^t_{e_j→v_i}(z) to each one of its neighbors v_i, where m^t_{e_j→v_i}(z) is e_j's estimate of the cost if x_j takes value z, based on the messages e_j received at time t−1 from all of its neighbors except v_i. After N iterations, the algorithm calculates the "belief" at variable node e_j using the messages sent to vertex e_j at time N, for each j.

In Algorithm 1, one basic question is the computation procedure for message functions of the form m^t_{e_j→v_i} and m^t_{v_i→e_j}. We claim that every message function is a convex piecewise-linear function, which allows us to encode it in terms of a finite vector describing the break points and slopes of its linear pieces. In subsection 3.1, we describe in detail how m^t_{v_i→e_j}(z) can be computed. In general, max-product behaves badly when solving an LP instance without a unique optimal solution [6], [4]. Yet, even with the assumption that an LP problem has a unique optimal solution, in general x̂^N in Algorithm 1 does not converge to the unique optimal solution as N grows large.


Algorithm 1 BP for LP

1: Initialize t = 0, and for each factor node v_i, create the factor-to-variable messages m^0_{v_i→e_j}(z) = 0 for all z ∈ R and all e_j ∈ E_i.
2: for t = 1, 2, ..., N do
3:   For all pairs e_j, v_i with v_i ∈ V_j, set
\[
m^t_{e_j \to v_i}(z) = \phi_j(z) + \sum_{v_{i'} \in V_j \setminus v_i} m^{t-1}_{v_{i'} \to e_j}(z), \qquad \forall z \in \mathbb{R}.
\]
4:   For all pairs e_j, v_i with e_j ∈ E_i, set
\[
m^t_{v_i \to e_j}(z) = \min_{\bar z \in \mathbb{R}^{|E_i|},\ \bar z_j = z} \Bigg\{ \psi_i(\bar z) + \sum_{e_{j'} \in E_i \setminus e_j} m^t_{e_{j'} \to v_i}(\bar z_{j'}) \Bigg\}, \qquad \forall z \in \mathbb{R}.
\]
5:   t := t + 1
6: end for
7: Set the "belief" function as b^N_j(z) = φ_j(z) + Σ_{v_i ∈ V_j} m^N_{v_i→e_j}(z), ∀ 1 ≤ j ≤ n.
8: Calculate the "belief" by finding x̂^N_j = arg min_z b^N_j(z) for each j.
9: Return x̂^N.

One such instance (the LP relaxation of an instance of the maximum-weight independent set problem), described earlier in [19], is:
\[
\begin{aligned}
\min\quad & \sum_{i=1}^{3} 2x_i - \sum_{j=1}^{3} 3y_j \\
\text{s.t.}\quad & x_i + y_j + z_{ij} = 1, \qquad \forall\, 1 \le i \le 3,\ 1 \le j \le 3, \\
& x, y, z \ge 0.
\end{aligned}
\]

3 BP Algorithm for Min-Cost Network Flow Problem

We start off this section by defining the capacitated min-cost network flow problem (MCF). Given a directed graph G = (V, E), let V, E denote the set of vertices and arcs respectively, and let |V| = n, |E| = m. For any vertex v ∈ V, let E_v be the set of arcs incident to v, and for any e ∈ E_v, let Δ(v, e) = 1 if e is an out-arc of v (i.e., arc e = (v, w) for some w ∈ V), and Δ(v, e) = −1 if e is an in-arc of v (i.e., arc e = (w, v) for some w ∈ V). MCF on G is formulated as follows [7], [1]:
\[
\begin{aligned}
\min\quad & \sum_{e \in E} c_e x_e \qquad (\text{MCF}) \\
\text{s.t.}\quad & \sum_{e \in E_v} \Delta(v, e)\, x_e = b_v, \quad \forall v \in V \quad \text{(demand constraints)} \\
& 0 \le x_e \le u_e, \quad \forall e \in E \quad \text{(flow constraints)}
\end{aligned}
\]
where c_e ≥ 0, u_e ≥ 0, c_e ∈ R, u_e ∈ R̄ for each e ∈ E, and b_v ∈ R for each v ∈ V. As MCF is a special class of LP, we can view each arc e ∈ E as a variable vertex and each v ∈ V as a factor vertex, and define functions ψ_v, φ_e, ∀v ∈ V, ∀e ∈ E:
\[
\psi_v(z) = \begin{cases} 0 & \text{if } \sum_{e \in E_v} \Delta(v, e)\, z_e = b_v, \\ \infty & \text{otherwise,} \end{cases}
\qquad
\phi_e(z) = \begin{cases} c_e z & \text{if } 0 \le z \le u_e, \\ \infty & \text{otherwise.} \end{cases}
\]

Algorithm 2 BP for Network Flow

1: Initialize t = 0. For each e ∈ E with e = (v, w), initialize the messages m^0_{e→v}(z) = 0, ∀z ∈ R, and m^0_{e→w}(z) = 0, ∀z ∈ R.
2: for t = 1, 2, 3, ..., N do
3:   For every e ∈ E with e = (v, w), update the messages:
\[
m^t_{e \to v}(z) = \phi_e(z) + \min_{\bar z \in \mathbb{R}^{|E_w|},\ \bar z_e = z} \Bigg\{ \psi_w(\bar z) + \sum_{\tilde e \in E_w \setminus e} m^{t-1}_{\tilde e \to w}(\bar z_{\tilde e}) \Bigg\}, \qquad \forall z \in \mathbb{R},
\]
\[
m^t_{e \to w}(z) = \phi_e(z) + \min_{\bar z \in \mathbb{R}^{|E_v|},\ \bar z_e = z} \Bigg\{ \psi_v(\bar z) + \sum_{\tilde e \in E_v \setminus e} m^{t-1}_{\tilde e \to v}(\bar z_{\tilde e}) \Bigg\}, \qquad \forall z \in \mathbb{R}.
\]
4:   t := t + 1
5: end for
6: For every e ∈ E with e = (v, w), set the "belief" function
\[
n^N_e(z) = m^N_{e \to v}(z) + m^N_{e \to w}(z) - \phi_e(z).
\]
7: Calculate the "belief" by finding x̂^N_e = arg min_z n^N_e(z) for each e ∈ E.
8: Return x̂^N, which is the guess of the optimal solution of MCF.


Clearly, solving MCF is equivalent to solving min_{x ∈ R^{|E|}} {Σ_{v∈V} ψ_v(x_{E_v}) + Σ_{e∈E} φ_e(x_e)}. We can then apply the BP algorithm for LP (Algorithm 1) to MCF. Because of the special structure of MCF, each variable node is adjacent to exactly two factor nodes. This allows us to skip the step of updating the messages m^t_{v→e}, and to present a simplified version of the BP algorithm for MCF, which we refer to as Algorithm 2.

To understand Algorithm 2 at a high level, each arc can be thought of as an agent, which is trying to figure out its own flow while meeting the conservation constraints at its endpoints. Each link maintains an estimate of its “local cost” as a function of its flow (thus this estimate is a function, not a single number). At each time step an arc updates its function as follows: the cost of assigning x units of flow to link e is the cost of pushing x units of flow through e plus the minimum-cost way of assigning flow to neighboring edges (w.r.t. the functions they computed last time step) to restore flow conservation at the endpoints of e.
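To make this concrete, the following is a minimal, brute-force sketch of Algorithm 2 for a tiny MCF instance with integral data. Message functions are stored simply as arrays over the integer flow values 0, 1, ..., u_e instead of as piecewise-linear functions, and the inner minimization enumerates all neighboring flows, so it is exponential in the vertex degree and purely illustrative; the function name bp_mcf and the dictionary-based graph encoding are our own choices, not the paper's.

```python
import itertools

def bp_mcf(nodes, arcs, b, N):
    """Brute-force sketch of Algorithm 2 on a tiny integral MCF instance.

    nodes: iterable of vertices; arcs: dict {(v, w): (cost, capacity)} with
    integer capacities; b: dict {v: demand}; N: number of iterations.
    Messages m^t_{e->v}(z) are stored as lists indexed by z = 0..u_e.  Flows
    are restricted to [0, u_e] from the start, whereas the paper initialises
    m^0 = 0 on all of R; for illustration purposes this is immaterial.
    """
    E = list(arcs)
    incident = {v: [e for e in E if v in e] for v in nodes}

    def delta(v, e):                  # Delta(v, e): +1 for an out-arc of v, -1 for an in-arc
        return 1 if e[0] == v else -1

    INF = float("inf")
    phi = {e: [arcs[e][0] * z for z in range(arcs[e][1] + 1)] for e in E}
    m = {(e, v): [0.0] * (arcs[e][1] + 1) for e in E for v in e}

    for _ in range(N):
        new_m = {}
        for e in E:
            for v in e:               # the message to v uses the *other* endpoint w of e
                w = e[1] if v == e[0] else e[0]
                others = [f for f in incident[w] if f != e]
                msg = []
                for z in range(arcs[e][1] + 1):
                    best = INF
                    for zs in itertools.product(*(range(arcs[f][1] + 1) for f in others)):
                        balance = delta(w, e) * z + sum(delta(w, f) * zf
                                                        for f, zf in zip(others, zs))
                        if balance == b[w]:          # psi_w: flow conservation at w
                            best = min(best, sum(m[(f, w)][zf]
                                                 for f, zf in zip(others, zs)))
                    msg.append(phi[e][z] + best)
                new_m[(e, v)] = msg
        m = new_m

    xhat = {}
    for e in E:                       # beliefs n^N_e and their minimisers
        v, w = e
        belief = [m[(e, v)][z] + m[(e, w)][z] - phi[e][z]
                  for z in range(arcs[e][1] + 1)]
        xhat[e] = min(range(len(belief)), key=belief.__getitem__)
    return xhat
```

For example, on the single-arc instance bp_mcf({1, 2}, {(1, 2): (3, 2)}, {1: 1, 2: -1}, 3), the returned belief minimizer is {(1, 2): 1}, the unique feasible (hence optimal) flow.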

Before formally stating the theorem on convergence of BP for MCF, we first give the definition of a residual network [1]. Define G(x), the residual network of G with respect to a flow x, as follows: G(x) has the same vertex set as G, and for every e = (v, w) ∈ E, if x_e < u_e then e is an arc in G(x) with cost c^x_e = c_e; also, if x_e > 0 then there is an arc e' = (w, v) in G(x) with cost c^x_{e'} = −c_e. Now let
\[
\delta(x) = \min_{C \in \mathcal{C}} \Big\{ c^x(C) = \sum_{e \in C} c^x_e \Big\},
\]
where C is the set of directed cycles in G(x). Note that if x* is the unique optimal solution of MCF on the directed graph G, then δ(x*) > 0 in G(x*).
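As an illustration of these definitions, here is a small sketch (with function names of our own choosing) that builds G(x) and evaluates δ(x) by a Bellman-Ford computation; it assumes G(x) contains no negative-cost directed cycle, which holds in particular at an optimal flow x.

```python
def residual_network(arcs, x):
    """Residual network G(x).  arcs: dict {(v, w): (cost, capacity)};
    x: dict {(v, w): flow}.  Returns a list of residual arcs (tail, head, cost)."""
    res = []
    for (v, w), (c, u) in arcs.items():
        if x[(v, w)] < u:
            res.append((v, w, c))        # forward residual arc, cost c_e
        if x[(v, w)] > 0:
            res.append((w, v, -c))       # backward residual arc, cost -c_e
    return res

def min_cycle_cost(arcs, x):
    """delta(x): minimum cost of a directed cycle in G(x), computed as the
    minimum over residual arcs (u0, v0) of cost(u0, v0) + shortest-path(v0 -> u0).
    Uses Bellman-Ford, so it assumes no negative-cost cycle exists in G(x)."""
    res = residual_network(arcs, x)
    nodes = {v for a in res for v in a[:2]}
    best = float("inf")
    for (u0, v0, c0) in res:
        dist = {v: float("inf") for v in nodes}
        dist[v0] = 0.0
        for _ in range(len(nodes) - 1):              # Bellman-Ford relaxations
            for (a, b2, c) in res:
                if dist[a] + c < dist[b2]:
                    dist[b2] = dist[a] + c
        best = min(best, c0 + dist[u0])
    return best
```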

Theorem 3.1. Suppose MCF has a unique optimal solution x*. Define L to be the maximum cost of a simple directed path in G(x*). Then for any N ≥ (L/(2δ(x*)) + 1)n, we have x̂^N = x*.

Thus, by Theorem 3.1, the BP algorithm finds the unique optimal solution of MCF in at most (L/(2δ(x*)) + 1)n iterations.

3.1 Computing/Encoding Message Functions

First, we formally define a convex piecewise-linear function:

Definition 3.1. A function f is a convex piecewise-linear function if for some finite set of reals a_0 < a_1 < ... < a_n (allowing a_0 = −∞ and a_n = ∞), we have
\[
f(z) = \begin{cases} c_1(z - a_1) + f(a_1) & \text{if } z \in [a_0, a_1], \\ c_{i+1}(z - a_i) + f(a_i) & \text{if } z \in (a_i, a_{i+1}],\ 1 \le i \le n-1, \\ \infty & \text{otherwise,} \end{cases}
\]
where c_1 < c_2 < ... < c_n and f(a_1) ∈ R. We define a_0, a_1, ..., a_n as the vertices of f, p(f) = n (p(f) denotes the number of linear pieces in f), and c_i(z − a_{i−1}) + f(a_{i−1}) for z ∈ [a_{i−1}, a_i] the i-th linear piece of f.

Clearly, if f is a convex piecewise-linear function, then we can store all the "information" about f in a finite vector of size O(p(f)).
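For concreteness, one possible realization of this encoding, together with an evaluator, is sketched below; the function name and argument layout are ours, not the paper's.

```python
import math

def eval_pwl(a, c, f_a1, z):
    """Evaluate the convex piecewise-linear f of Definition 3.1, encoded as a
    finite vector: vertices a = [a_0, ..., a_n] (a_0 may be -inf, a_n may be
    +inf), the increasing slopes c = [c_1, ..., c_n] of its n pieces, and the
    value f(a_1)."""
    if z < a[0] or z > a[-1]:
        return math.inf                    # outside the domain of f
    if z <= a[1]:                          # first piece, on [a_0, a_1], slope c_1
        return f_a1 + c[0] * (z - a[1])
    val, i = f_a1, 1
    while z > a[i + 1]:                    # walk right until a_i < z <= a_{i+1}
        val += c[i] * (a[i + 1] - a[i])    # accumulate f(a_{i+1}) from f(a_i)
        i += 1
    return val + c[i] * (z - a[i])         # piece i+1 has slope c_{i+1} = c[i]
```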

Property 3.1. Let f_1(x), f_2(x) be two convex piecewise-linear functions. Then f_1(ax + b) and c f_1(x) + d f_2(x) are also convex piecewise-linear functions, for any real numbers a, b, c and d with c ≥ 0, d ≥ 0.

Definition 3.2. Let S = {f_1, f_2, ..., f_k} be a set of convex piecewise-linear functions, and let Ψ_t : R^k → R̄ be
\[
\Psi_t(x) = \begin{cases} 0 & \text{if } \sum_{i=1}^k x_i = t, \\ \infty & \text{otherwise.} \end{cases}
\]
We say I_S(t) = min_{x ∈ R^k} {Ψ_t(x) + Σ_{i=1}^k f_i(x_i)} is an interpolation of S.

Figure 2: Functions f_1 and f_2

Figure 3: Interpolation of f_1 and f_2

Lemma 3.1. Let f_1, f_2 be two convex piecewise-linear functions, and suppose z*_1, z*_2 are vertices of f_1, f_2 with z*_1 = arg min f_1(z) and z*_2 = arg min f_2(z). Let S = {f_1, f_2}. Then I_S(t) (the interpolation of S) is a convex piecewise-linear function, and it can be computed in O(p(f_1) + p(f_2)) operations.

Proof. We prove this lemma by describing a procedure to construct I_S(t). The idea behind the construction is to "stitch" together the linear pieces of f_1 and f_2. Specifically, let g(t) be the function which is defined only at z*_1 + z*_2, with g(z*_1 + z*_2) = f_1(z*_1) + f_2(z*_2), and let L_1 = z*_1, L_2 = z*_2, U_1 = z*_1, U_2 = z*_2. We will construct g iteratively and eventually have g(t) = I_S(t). The construction is described as follows: at each iteration, let X_1 (and X_2) be the linear piece of f_1 (and f_2) at the left side of L_1 (and L_2). Choose the linear piece with the larger slope from {X_1, X_2}, and "stitch" this piece onto the left side of the left endpoint of g. If X_i is chosen, update L_i to the vertex at the left end of X_i. As an example, consider f_1 and f_2 in Figure 2; then z*_1 = 1 and z*_2 = 0 are vertices of f_1 and f_2 such that z*_1 = arg min f_1(z), z*_2 = arg min f_2(z). Note that the linear piece X_1 in the procedure is labeled as P1 on the graph, while X_2 does not exist (since there is no linear piece of f_2 on the left side of z*_2). Hence, we "stitch" P1 to the left side of g, and update L_1 to 0.

Similarly, let Y_1 (Y_2) be the linear piece of f_1 (f_2) at the right side of U_1 (U_2); choose the linear piece with the smaller slope and "stitch" this piece onto the right side of the right endpoint of g. If Y_i is chosen, update U_i to the vertex at the right end of Y_i. Again, we use f_1 and f_2 in Figure 2 as an illustration. The linear piece Y_1 in the procedure is labeled as P2, while Y_2 is labeled as P3. As P2 has a lower slope than P3, we "stitch" P2 to the right side of g and update U_1 to 2.

Repeat this procedure until both L_1 (and L_2) and U_1 (and U_2) are the leftmost (and rightmost) endpoints of f_1 (and f_2), or both endpoints of g are infinity. See Figure 2 and Figure 3 for an illustration of the interpolation of two functions.

Note that the total number of iterations is bounded by O(p(f_1) + p(f_2)), and each iteration takes at most a constant number of operations. By construction, it is clear that g is a convex piecewise-linear function. Also, g(z*_1 + z*_2) = f_1(z*_1) + f_2(z*_2), and by the way we constructed g, we must have g(t) ≤ Ψ_t(x) + f_1(x_1) + f_2(x_2) for any t ∈ R and any x ∈ R^2. Therefore, we have g = I_S, and we are done.

Remark. In the case where either f_1 or f_2 does not have a global minimum, the interpolation can still be obtained in O(p(f_1) + p(f_2)) operations, but it involves a tedious analysis of different cases. As the reader will notice, the implementation of Algorithm 2 does not require this case and hence we do not discuss it here.
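For functions with finite domains, the stitching procedure of Lemma 3.1 amounts to merging the two lists of linear pieces in order of slope. A minimal sketch, with a breakpoint-list encoding of our own choosing (each function is given as a sorted list of (z, f(z)) breakpoints over a finite domain):

```python
def inf_convolve(bp1, bp2):
    """Interpolation I_S(t) = min_{x1+x2=t} f1(x1) + f2(x2) for two convex
    piecewise-linear functions f1, f2 given as sorted breakpoint lists
    [(z, f(z)), ...] with finite domains.  The result's segments are the
    segments of f1 and f2 merged in order of increasing slope (Lemma 3.1)."""
    def segments(bp):
        return [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(bp, bp[1:])]
    segs = sorted(segments(bp1) + segments(bp2), key=lambda s: s[1] / s[0])
    x, y = bp1[0][0] + bp2[0][0], bp1[0][1] + bp2[0][1]   # left endpoints add up
    out = [(x, y)]
    for dx, dy in segs:                                   # stitch pieces left to right
        x, y = x + dx, y + dy
        out.append((x, y))
    return out
```

For instance, inf_convolve([(0, 0), (1, 2)], [(0, 0), (2, 2)]) returns [(0, 0), (2, 2), (3, 4)]: the cheaper unit-slope pieces of f_2 are used first, then the slope-2 piece of f_1.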

Theorem 3.2. Given a set S consisting of convex piecewise-linear functions, I_S(t) is a convex piecewise-linear function. Moreover, suppose |S| = k and P = Σ_{f∈S} p(f). Then I_S(t) can be computed in O(P · log k) operations.

Proof. For the sake of simplicity, assume k is divisible by 2. Let S_1 = {f_1, f_2}, S_2 = {f_3, f_4}, ..., S_{k/2} = {f_{k−1}, f_k}, and S' = {I_{S_1}, I_{S_2}, ..., I_{S_{k/2}}}. Then one can observe that I_S = I_{S'}, by the definition of I_S. By Lemma 3.1, each function in S' is a convex piecewise-linear function, and S' can be computed in O(P) operations. Consider changing S to S' as a procedure for decreasing the number of convex piecewise-linear functions. This procedure reduces the number by a factor of 2 each time, and consumes O(P) operations. Hence, it takes O(log k) such procedures to reduce the set S to a single convex piecewise-linear function, and computing I_S(t) therefore takes O(P · log k) operations.
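The pairing argument in the proof translates directly into code; the sketch below reuses inf_convolve from the sketch after Lemma 3.1 and is only meant to mirror the O(P log k) pairing scheme, not to be an optimized implementation.

```python
def interpolate_set(fns):
    """I_S for a set S of convex piecewise-linear functions (each encoded as a
    breakpoint list), obtained by repeatedly pairing and merging functions as
    in the proof of Theorem 3.2.  Requires inf_convolve defined above."""
    while len(fns) > 1:
        paired = [inf_convolve(fns[i], fns[i + 1]) for i in range(0, len(fns) - 1, 2)]
        if len(fns) % 2:                 # carry an unpaired function to the next round
            paired.append(fns[-1])
        fns = paired                     # each round halves the number of functions
    return fns[0]
```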

Definition 3.3. Let S = {f_1, f_2, ..., f_k} be a set of convex piecewise-linear functions, let a ∈ R^k, and let Ψ_t : R^k → R̄ be
\[
\Psi_t(x) = \begin{cases} 0 & \text{if } \sum_{i=1}^k a_i x_i = t, \\ \infty & \text{otherwise.} \end{cases}
\]
We say I^a_S(t) = min_{x ∈ R^k} {Ψ_t(x) + Σ_{i=1}^k f_i(x_i)} is a scaled interpolation of S.

Corollary 3.1. Given any set of convex piecewise-linear functions S, I^a_S(t) is a convex piecewise-linear function. Suppose |S| = k and P = Σ_{f∈S} p(f); then the corners and slopes of I^a_S(t) can be found in O(P · log k) operations.

Proof. Let S = {f_1(x), f_2(x), ..., f_k(x)}, and let S' = {f'_1(x) = f_1(a_1 x), f'_2(x) = f_2(a_2 x), ..., f'_k(x) = f_k(a_k x)}. Then we have I^a_S(t) = I_{S'}(t), and the result follows immediately from Theorem 3.2.

Corollary 3.2. For any nonnegative integer t and any e ∈ E with e = (v, w) (or e = (w, v)), the message function m^t_{e→v} at the t-th iteration of Algorithm 2 is a convex piecewise-linear function.

Proof. We show this by induction on t. For t = 0, m^t_{e→v} is a convex piecewise-linear function by definition. For t > 0, recall that
\[
m^t_{e \to v}(z) = \phi_e(z) + \min_{\bar z \in \mathbb{R}^{|E_w|},\ \bar z_e = z} \Bigg\{ \psi_w(\bar z) + \sum_{\tilde e \in E_w \setminus e} m^{t-1}_{\tilde e \to w}(\bar z_{\tilde e}) \Bigg\}, \qquad z \in \mathbb{R}.
\]

By the induction hypothesis, every m^{t−1}_{ẽ→w} is a convex piecewise-linear function. Let S = {m^{t−1}_{ẽ→w}(·) | ẽ ∈ E_w \ e}, and let a ∈ R^{E_w \ e} with a_ẽ = Δ(w, ẽ) for every ẽ ∈ E_w \ e. Then
\[
g(z) = \min_{\bar z \in \mathbb{R}^{|E_w|},\ \bar z_e = z} \Bigg\{ \psi_w(\bar z) + \sum_{\tilde e \in E_w \setminus e} m^{t-1}_{\tilde e \to w}(\bar z_{\tilde e}) \Bigg\}
\]
is equal to I^a_S(cz + d) for some real numbers c and d (i.e., a shifted scaled interpolation of S). By Corollary 3.1, g(z) is a convex piecewise-linear function, and φ_e is a convex piecewise-linear function. Therefore, m^t_{e→v} = g + φ_e is a convex piecewise-linear function.


Corollary 3.3. Suppose the cost vector c in MCF is integral. Then at iteration t, for any defined message function m^t_{e→v}(z), let {s_1, s_2, ..., s_k} be the slopes of m^t_{e→v}(z); we have −t·c_max ≤ s_1 ≤ ... ≤ s_k ≤ t·c_max, and s_i is integral for each 1 ≤ i ≤ k.

Proof. We prove the corollary using induction. For t = 0, the statement is clearly true. For t ≥ 1, recall that
\[
m^t_{e \to v}(z) - \phi_e(z) = \min_{\bar z \in \mathbb{R}^{|E_w|},\ \bar z_e = z} \Bigg\{ \psi_w(\bar z) + \sum_{\tilde e \in E_w \setminus e} m^{t-1}_{\tilde e \to w}(\bar z_{\tilde e}) \Bigg\}
\]
is a (possibly shifted) scaled interpolation of message functions of the form m^{t−1}_{ẽ→w}. Since Δ(v, e) = ±1, the absolute values of the slopes of the linear pieces of m^t_{e→v} − φ_e are the same as the absolute values of the slopes of the linear pieces of the message functions m^{t−1}_{ẽ→w} used for the interpolation. Therefore, by the induction hypothesis, the absolute values of the slopes of m^t_{e→v} − φ_e are integral and bounded by (t−1)c_max. This implies that the absolute values of the slopes of m^t_{e→v} are integral and bounded by t·c_max.

Corollary 3.4. Suppose the vectors b and u are integral in MCF. Then at iteration t, for any message function m^t_{e→v}, the vertices of m^t_{e→v} are also integral.

Proof. Again, we use induction. For t = 0, the statement is clearly true. For t ≥ 1, m^t_{e→v} − φ_e is a piecewise-linear function obtained from a shifted scaled interpolation of the functions m^{t−1}_{ẽ→w}. Since Δ(v, e) = ±1 and b is integral, the shift of the interpolation is integral, and as Δ(v, e) = ±1, the scaling factor is ±1. As all vertices of any function m^{t−1}_{ẽ→w} are integral, we deduce that all vertices of m^t_{e→v} − φ_e must also be integral. Since u is integral, all vertices of φ_e are integral, and therefore all vertices of m^t_{e→v} are integral.

Corollary 3.1 and Corollary 3.2 show that at each iteration, every message function can be encoded in terms of a finite vector describing the corners and slopes of its linear pieces, and that this encoding can be computed in a finite number of operations. Using a similar argument, we can see that the message functions in Algorithm 1 are also convex piecewise-linear functions, and thus can also be encoded in terms of a finite vector describing the corners and slopes of their linear pieces. We note that the fact that the message functions m^t_{e→v} are convex piecewise-linear functions can also be shown by sensitivity analysis of LP (see Chapter 5 of [7]).

3.2 BP on Special Min-cost Flow Problems

Although we have shown in Section 3.1 that each message function can be computed as a convex piecewise-linear function, those functions tend to become complicated to compute and store as the number of iterations grows. When an instance of MCF has b, u integral, then m^t_{e→v} (for some e = (u, v), t ∈ N+) is a piecewise-linear function with at most u_e linear pieces. Thus, if u_e is bounded by some constant for all e, while b and u are integral, then the message functions at each iteration are much simpler. Next, we present a subclass of MCF, which we denote MCF′ and which can be solved quickly using BP compared to the more general MCF. Given a directed graph G = (V, E), this subclass of problems is defined as follows:
\[
\begin{aligned}
\min\quad & \sum_{e \in E} c_e x_e \qquad (\text{MCF}') \\
\text{s.t.}\quad & \sum_{e \in E_v} \Delta(v, e)\, x_e = b_v, \quad \forall v \in V \\
& \sum_{e \in \delta(v)} x_e \le \tilde u_v, \quad \forall v \in V, \qquad \delta(v) = \{(u, v) \in E\} \\
& 0 \le x_e \le u_e, \quad \forall e \in E,
\end{aligned}
\]
where c, u, and ũ are all integral, and all of their entries are bounded by a constant K.

To see that MCF′ is indeed an MCF, observe that each v ∈ V can be split into two vertices v_in and v_out, where v_in is incident to all in-arcs of v with b_{v_in} = 0, and v_out is incident to all out-arcs of v with b_{v_out} = b_v; we also create an arc from v_in to v_out, set the capacity of that arc to be ũ_v, and its cost to be 0. Let the new graph be G′; then the MCF on G′ is equivalent to MCF′ (a sketch of this reduction appears at the end of this subsection). Although we can use Algorithm 2 to solve the MCF on the new graph, we would like to define functions ψ_v, φ_e, ∀v ∈ V, ∀e ∈ E, directly on G:
\[
\psi_v(z) = \begin{cases} 0 & \text{if } \sum_{e \in E_v} \Delta(v, e)\, z_e = b_v \text{ and } \sum_{e \in \delta(v)} z_e \le \tilde u_v, \\ \infty & \text{otherwise,} \end{cases}
\qquad
\phi_e(z) = \begin{cases} c_e z & \text{if } 0 \le z \le u_e, \\ \infty & \text{otherwise,} \end{cases}
\]
and apply Algorithm 2 with these ψ and φ. When we update the message functions m^t_{e→v} for all e ∈ E_w, the inequality Σ_{e ∈ δ(w)} x_e ≤ ũ_w allows us to only look at the ũ_w smallest linear pieces of the message functions m^{t−1}_{ẽ→w}, for all but a constant number of e ∈ E_w. This allows us to further trim the running time of BP for MCF′, and we will present this more formally in Section 6.1.

4 Convergence of BP for MCF

Before proving Theorem 3.1, we define computation trees, and then establish the connection between computation trees and BP. Using this connection, we proceed to prove the main result of our paper.

Figure 4: Computation tree of G rooted at e_3 = (1, 3)

4.1 Computation Tree and BP

Let T^N_e, the N-computation tree corresponding to variable x_e, be defined as follows:

1. Let V(T^N_e) be the vertex set of T^N_e and E(T^N_e) be the arc set of T^N_e. Let Γ(·) be the map which maps each vertex v' ∈ V(T^N_e) to a vertex v ∈ V, and Γ preserves the arcs (i.e., for each e' = (v'_α, v'_β) ∈ E(T^N_e), Γ(e') = (Γ(v'_α), Γ(v'_β)) ∈ E).

2. Divide V(T^N_e) into N + 1 levels. On the 0-th level we have a "root" arc r, with Γ(r) = e, and a vertex v' is on the t-th level of T^N_e if v' is t arcs apart from either node of r.

3. For any v' ∈ V(T^N_e), let E_{v'} be the set of all arcs incident to v'. If v' is on the t-th level of T^N_e and t < N, then Γ(E_{v'}) = E_{Γ(v')} (i.e., Γ preserves the neighborhood of v').

4. Every vertex v' that is on the N-th level of T^N_e is incident to exactly one arc in T^N_e.

In other literature, T^N_e is known as the "unwrapped tree" of G rooted at e. Figure 4 gives an example of a computation tree. We note that the definition of computation trees introduced here is slightly different from the definition in other papers [4], [6], [18], although the analysis and insight for computation trees is very similar.
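For illustration, the following sketch constructs T^N_e by breadth-first unrolling; the paper only defines the tree, so the routine below, its name, and its output format are our own choices.

```python
import itertools

def computation_tree(arcs, root, N):
    """Build the N-computation tree T_e^N (Section 4.1) rooted at arc
    root = (v, w) of the directed graph with arc list `arcs`.  Returns the
    tree's arcs as (tail_copy, head_copy, original_arc), where each vertex
    copy is a pair (original_vertex, unique_id)."""
    incident = {}
    for (a, b) in arcs:
        incident.setdefault(a, []).append((a, b))
        incident.setdefault(b, []).append((a, b))
    counter = itertools.count()
    fresh = lambda v: (v, next(counter))

    v, w = root
    rv, rw = fresh(v), fresh(w)
    tree = [(rv, rw, root)]                    # the root arc r, with Gamma(r) = root
    frontier = [(rv, root), (rw, root)]        # level-0 copies and the arc used to reach them
    for _level in range(1, N + 1):
        nxt = []
        for copy, via in frontier:
            orig = copy[0]
            for arc in incident[orig]:
                if arc == via:                 # the arc back towards the root already exists
                    continue
                other = arc[1] if arc[0] == orig else arc[0]
                child = fresh(other)
                # orient the copied arc the same way as its original in G
                tree.append((copy, child, arc) if arc[0] == orig else (child, copy, arc))
                nxt.append((child, arc))
        frontier = nxt                         # level-N copies are leaves: not expanded
    return tree
```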

Let V_o(T^N_e) be the set of all vertices which are not on the N-th level of T^N_e. Consider the problem
\[
\begin{aligned}
\min\quad & \sum_{f \in E(T^N_e)} c_{\Gamma(f)}\, x_f \qquad (\text{MCF}^N_e) \\
\text{s.t.}\quad & \sum_{f \in E_{v'}} \Delta(v', f)\, x_f = b_{\Gamma(v')}, \quad \forall v' \in V_o(T^N_e) \\
& 0 \le x_f \le u_{\Gamma(f)}, \quad \forall f \in E(T^N_e).
\end{aligned}
\]
Loosely speaking, MCF^N_e is almost an MCF on T^N_e. There is a flow constraint for every arc f ∈ E(T^N_e), and a demand/supply constraint for every node v' ∈ V_o(T^N_e). Now, we state the lemma which exhibits the connection between BP and the computation trees.

Lemma 4.1. MCF^N_e has an optimal solution y* satisfying y*_r = x̂^N_e (r is the root arc of T^N_e, Γ(r) = e).

This result is not too surprising, as BP is exact on trees. A thorough analysis can show that the message functions in BP are essentially the implementation of an algorithm using the "bottom up" (aka dynamic programming) approach, starting from the bottom level of T^t_e. A more formal, combinatorial proof of this lemma also exists, which is a rather technical argument using induction. Proofs for similar statements can be found in [4] and [18].

4.2 Proof of the Main Theorem

Proof. [Proof of Theorem 3.1] Suppose there exist e_0 ∈ E and N ≥ (L/(2δ(x*)) + 1)n such that x̂^N_{e_0} ≠ x*_{e_0}. By Lemma 4.1, we can find an optimal solution y* for MCF^N_{e_0} with y*_r = x̂^N_{e_0}. Now, without loss of generality, we assume y*_r > x*_{e_0}. Let r = (v_α, v_β); because y* is feasible for MCF^N_{e_0} and x* is feasible for MCF, we have
\[
b_{\Gamma(v_\alpha)} = \sum_{e \in E_{v_\alpha}} \Delta(v_\alpha, e)\, y^*_e = y^*_r + \sum_{e \in E_{v_\alpha} \setminus r} \Delta(v_\alpha, e)\, y^*_e,
\]
\[
b_{\Gamma(v_\alpha)} = \sum_{e \in E_{v_\alpha}} \Delta(v_\alpha, e)\, x^*_{\Gamma(e)} = x^*_{e_0} + \sum_{e \in E_{v_\alpha} \setminus r} \Delta(v_\alpha, e)\, x^*_{\Gamma(e)}.
\]
Then we can find an arc e'_1 incident to v_α, e'_1 ≠ r, such that y*_{e'_1} > x*_{Γ(e'_1)} if e'_1 has the same orientation as r, and y*_{e'_1} < x*_{Γ(e'_1)} if e'_1 has the opposite orientation as r. Similarly, we can find some arc e'_{−1} incident to v_β satisfying the same condition. Let v_{α_1}, v_{α_{−1}} be the vertices of e'_1, e'_{−1} which are one level below v_α, v_β on T^N_{e_0}; then we can find arcs e'_2, e'_{−2} satisfying the same condition. If we continue this all the way down to the leaf nodes of T^N_{e_0}, we can find a set of arcs {e'_{−N}, e'_{−N+1}, ..., e'_{−1}, e'_1, ..., e'_N} such that

y*_{e'_i} > x*_{Γ(e'_i)} ⟺ e'_i has the same orientation as r,
y*_{e'_i} < x*_{Γ(e'_i)} ⟺ e'_i has the opposite orientation as r.

Let X = {e'_{−N}, e'_{−N+1}, ..., e'_{−1}, e'_0 = r, e'_1, ..., e'_N}. For any e' = (v_p, v_q) ∈ X, let Aug(e') = (v_p, v_q) if y*_{e'} > x*_{Γ(e')}, and Aug(e') = (v_q, v_p) if y*_{e'} < x*_{Γ(e')}. Then each Γ(Aug(e')) is an arc of the residual graph G(x*), and W = (Aug(e'_{−N}), Aug(e'_{−N+1}), ..., Aug(e'_0), ..., Aug(e'_N)) is a directed path on T^N_{e_0}. We call W the augmenting path of y* with respect to x*. Also, Γ(W) is a directed walk on G(x*). Now we can decompose Γ(W) into a simple directed path P(W) and a set of simple directed cycles Cyc(W). Because each simple directed cycle/path on G(x*) can have at most n arcs, and W has 2N + 1 arcs, where N ≥ (L/(2δ(x*)) + 1)n, we have that |Cyc(W)| > L/δ(x*). Then we have
\[
c^*(W) = \sum_{C \in \mathrm{Cyc}(W)} c^*(C) + c^*(P(W)) \;\ge\; \sum_{C \in \mathrm{Cyc}(W)} c^*(C) - L \;>\; \frac{L}{\delta(x^*)} \cdot \delta(x^*) - L = 0.
\]

Let Forw = {e' | e' ∈ X, y*_{e'} > x*_{Γ(e')}} and Back = {e' | e' ∈ X, y*_{e'} < x*_{Γ(e')}}. Since both Forw and Back are finite, we can find λ > 0 such that y*_{e'} − λ ≥ x*_{Γ(e')} for all e' ∈ Forw and y*_{e'} + λ ≤ x*_{Γ(e')} for all e' ∈ Back. Let ỹ ∈ R^{|E(T^N_{e_0})|} be such that
\[
\tilde y_{e'} = \begin{cases} y^*_{e'} - \lambda & \text{if } e' \in Forw, \\ y^*_{e'} + \lambda & \text{if } e' \in Back, \\ y^*_{e'} & \text{otherwise.} \end{cases}
\]
We can think of ỹ as obtained from y* by pushing λ units of flow along W. Since for each e' ∈ Forw, y*_{e'} − λ ≥ x*_{Γ(e')} ≥ 0, and for each e' ∈ Back, y*_{e'} + λ ≤ x*_{Γ(e')} ≤ u_{Γ(e')}, ỹ satisfies all the flow constraints. Also, because Forw = {e' ∈ X | e' has the same orientation as r} and Back = {e' ∈ X | e' has the opposite orientation as r}, we have that for any v ∈ V_o(T^N_{e_0}), Σ_{e∈E_v} Δ(v, e) ỹ_e = Σ_{e∈E_v} Δ(v, e) y*_e = b_{Γ(v)}, which implies that ỹ satisfies all the demand/supply constraints. Therefore, ỹ is a feasible solution for MCF^N_{e_0}. But
\[
\sum_{e \in E(T^N_{e_0})} c_{\Gamma(e)}\, y^*_e \;-\; \sum_{e \in E(T^N_{e_0})} c_{\Gamma(e)}\, \tilde y_e \;=\; \sum_{e \in Forw} c_{\Gamma(e)}\, \lambda \;-\; \sum_{e \in Back} c_{\Gamma(e)}\, \lambda \;=\; c^*(W)\, \lambda \;>\; 0.
\]
This contradicts the optimality of y*, and completes the proof of Theorem 3.1.

Corollary 4.1. Given an MCF with integral data c, b and u, let c_max = max_{e∈E}{c_e}, and let Algorithm 2 run for N = n^2 c_max + n iterations. For each e ∈ E, let z*_e = arg min_z n^N_e(z). Then n^N_e(z*_e − 1) − n·c_max > n^N_e(z*_e) and n^N_e(z*_e + 1) − n·c_max > n^N_e(z*_e) for every e ∈ E if and only if MCF has a unique optimal solution.

Proof. The proof of this corollary uses the same idea of finding an augmenting path on the computation tree as in Theorem 3.1.

(⇒): Suppose MCF has a unique optimal solution x*. Let y be the optimal solution of MCF^N_e when the variable at the root r, y_r, is fixed to z*_e − 1. Then we can find an augmenting path W of y with respect to x* of length at least 2n^2 c_max, which can be decomposed into at least 2n·c_max disjoint cycles and one path. Each cycle has a cost of at most −δ(x*), which is at most −1 as MCF has integral data and a unique optimal solution. Hence, once we push 1 unit of flow of y along W, we decrease the cost of y by more than n·c_max. Hence n^N_e(z*_e − 1) − n·c_max > n^N_e(z*_e). Similarly, we can show that n^N_e(z*_e + 1) − n·c_max > n^N_e(z*_e).

(⇐): Suppose MCF does not have a unique optimal solution. Let x* be an optimal solution of MCF, and let y be the optimal solution of MCF^N_e. Again, we can find an augmenting path W of y with respect to x* of length at least 2n^2 c_max, and we can decompose W into at least 2n·c_max disjoint cycles and one path P. The cost of P is at most (n − 1)c_max, while each cycle has a non-positive cost. Hence, when we push 1 unit of flow of y along W, we increase the cost of y by at most (n − 1)c_max < n·c_max. Therefore, depending on the orientation of r on W, we either have n^N_e(z*_e − 1) − n·c_max ≤ n^N_e(z*_e) or n^N_e(z*_e + 1) − n·c_max ≤ n^N_e(z*_e).

Corollary 4.1 shows that BP can be used to detect the existence of a unique optimal solution for MCF.
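A sketch of how this test could be implemented, given the final beliefs as arrays over integer flow values (the function name and input format are ours); out-of-range flow values are treated as having infinite belief, so the corresponding condition holds vacuously at the boundary:

```python
def unique_optimum_certificate(beliefs, n, c_max):
    """Uniqueness test of Corollary 4.1.  beliefs[e] is the final belief n^N_e
    stored as a list over integer flow values 0..u_e, computed after
    N = n**2 * c_max + n iterations of Algorithm 2.  Returns True iff the
    corollary's condition holds at every arc, certifying a unique optimum."""
    for vals in beliefs.values():
        z = min(range(len(vals)), key=vals.__getitem__)       # z*_e = argmin n^N_e
        left_ok = (z == 0) or (vals[z - 1] - n * c_max > vals[z])
        right_ok = (z == len(vals) - 1) or (vals[z + 1] - n * c_max > vals[z])
        if not (left_ok and right_ok):
            return False
    return True
```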

5 Extension of BP on Convex-Cost Network Flow Problems

In this section, we discuss the extension of Theorem 3.1 to the convex-cost network flow problem (or convex flow problem). The convex flow problem is defined on a graph G = (V, E) as follows:
\[
\begin{aligned}
\min\quad & \sum_{e \in E} c_e(x_e) \qquad (\text{CP}) \\
\text{s.t.}\quad & \sum_{e \in E_v} \Delta(v, e)\, x_e = b_v, \quad \forall v \in V \\
& 0 \le x_e \le u_e, \quad \forall e \in E,
\end{aligned}
\]
where each c_e is a convex piecewise-linear function. Notice that if we define ψ exactly as we did for MCF, and for each e ∈ E define
\[
\phi_e(z) = \begin{cases} c_e(z) & \text{if } 0 \le z \le u_e, \\ \infty & \text{otherwise,} \end{cases}
\]
then we can apply Algorithm 2 on the graph G with functions ψ and φ, and hence obtain BP for the convex flow problem.


Now, we give the definition of a residual graph for the convex-cost flow problem. Suppose x is a feasible solution for (CP); let G(x) be the residual graph for G and x, defined as follows: for every e = (v_α, v_β) ∈ E, if x_e < u_e, then e is also an arc in G(x) with cost
\[
c^x_e = \lim_{t \to 0^+} \frac{c_e(x_e + t) - c_e(x_e)}{t}
\]
(the value of the slope of c_e at the right side of x_e); if x_e > 0, then there is an arc e' = (v_β, v_α) in G(x) with cost
\[
c^x_{e'} = \lim_{t \to 0^+} \frac{c_e(x_e - t) - c_e(x_e)}{t}
\]
(the negative of the value of the slope of c_e at the left side of x_e). Finally, let
\[
\delta(x) = \min_{C \in \mathcal{C}} \Big\{ \sum_{e \in C} c^x_e \Big\},
\]
where C is the set of directed cycles in G(x).

Theorem 5.1. Suppose x* is the unique optimal solution of CP. Let L be the maximum cost of a simple directed path in G(x*); by uniqueness of the optimal solution, δ(x*) > 0. Then, for any N ≥ (L/(2δ(x*)) + 1)m, we have x̂^N = x*. Namely, the BP algorithm converges to the optimal solution in at most (L/(2δ(x*)) + 1)m iterations.

The proof of Theorem 5.1 is almost identical to the proof of Theorem 3.1. Theorem 5.1 is an illustration of the power of BP: not only can it solve MCF, but possibly many other variants of MCF as well.

6 Running Time of BP on MCF

In the next two sections, we will always assume that the vectors c, u and b in problem MCF are integral. This assumption is not restrictive in practice, as the MCF instances solved in practice usually have rational data [1]. We define c_max = max_{e∈E}{c_e}, and provide an upper bound for the running time of BP on MCF in terms of m (the number of arcs in G), n (the number of nodes in G), and c_max.

Lemma 6.1. Suppose MCF has integral data. In Algorithm 2, the number of operations needed to update all the messages at iteration t is O(nm log n · t·c_max).

Proof. At iteration t, fix a valid message function m^t_{e→v}; it is updated as
\[
m^t_{e \to v}(z) = \phi_e(z) + \min_{\bar z \in \mathbb{R}^{|E_w|},\ \bar z_e = z} \Bigg\{ \psi_w(\bar z) + \sum_{\tilde e \in E_w \setminus e} m^{t-1}_{\tilde e \to w}(\bar z_{\tilde e}) \Bigg\}.
\]
As MCF has integral data, by Corollary 3.3 any message of the form m^{t−1}_{ẽ→w} has integral slopes, and the absolute values of the slopes are bounded by (t−1)c_max. This implies that for each m^{t−1}_{ẽ→w}, the number of linear pieces of m^{t−1}_{ẽ→w} is bounded by 2(t−1)c_max.

Figure 5: A directed graph with vertices v_1, v_2, v_3 and arcs e_1, e_2, e_3

As the size of E_w is at most n, by Corollary 3.1, g(z) = min_{z̄ ∈ R^{|E_w|}, z̄_e = z}{ψ_w(z̄) + Σ_{ẽ∈E_w\e} m^{t−1}_{ẽ→w}(z̄_ẽ)} can be calculated in O(log n · n·t·c_max) operations. As m^t_{e→v}(z) = g(z) + φ_e(z), m^t_{e→v} can also be calculated in O(log n · n·t·c_max) operations. Since at each iteration we update 2m message functions, the total number of computations at iteration t can be bounded by O(nm log n · t·c_max).

Theorem 6.1. Given that MCF has a unique optimal solution x* and integral data, the BP algorithm finds the unique optimal solution of MCF in O(n^5 m log n · c_max^3) operations.

Proof. Because c is integral, the value δ(x*) in Theorem 3.1 is also integral, and hence δ(x*) ≥ 1. Recall that L is the maximum cost of a simple directed path in G(x*). Since a simple directed path has at most n − 1 arcs, L is bounded by n·c_max. Thus, by Theorem 3.1, the BP algorithm (Algorithm 2) converges to the optimal solution of MCF after O(n^2 c_max) iterations. Combining this with Lemma 6.1, we have that BP converges to the optimal solution in O(n^2 c_max · nm log n · n^2 c_max · c_max) = O(n^5 m log n · c_max^3) operations.

We would like to point out that for an MCF with a unique optimal solution, Algorithm 2 can take an exponential number of iterations to converge. Consider the MCF on the directed graph G shown in Figure 5. Take a large positive integer D, and set c_{e_1} = c_{e_2} = D, c_{e_3} = 2D − 1, b_{v_1} = 1, b_{v_2} = 0 and b_{v_3} = −1. It can be checked that x̂^N_1 alternates between 1 and −1 when 2N + 1 < 2D/3. This means that the BP algorithm takes at least Ω(D) iterations to converge. Since the input size of a large D is just log(D), we have that Algorithm 2 for MCF does not converge to the unique optimal solution in polynomial time.

6.1 Runtime of BP on MCF′

Here we analyze the running time of BP on MCF′, which was defined in Section 3.2. We show that BP performs much better on this special class of MCF. Specifically, we state the following theorem:

Theorem 6.2. If MCF′ has a unique optimal solution, Algorithm 2 for MCF′ finds the unique optimal solution in O(n^4) operations.


First, we state the convergence result of BP for MCF′, which is reminiscent of Theorem 3.1.

Theorem 6.3. Suppose MCF′ has a unique optimal solution x*. Define L to be the maximum cost of a simple directed path in G(x*). Then Algorithm 2 for MCF′ converges to the optimal solution in at most (L/(2δ(x*)) + 1)n iterations.

The proof of this theorem is exactly the same as the proof of Theorem 3.1, and is hence omitted. Now, we apply this result to compute the running time of Algorithm 2 for MCF′.

Proof. [Proof of Theorem 6.2] For simplicity of the proof, we assume that every linear piece in a message function has unit length. This assumption is not restrictive, as each linear piece in general has integral vertices (from Corollary 3.4), so we can always break a bigger linear piece into many unit-length linear pieces with the same slope. At iteration t ≥ 1, recall that m^t_{e→v}(z) is defined as
\[
\phi_e(z) + \min_{\bar z \in \mathbb{R}^{|E_w|},\ \bar z_e = z} \Bigg\{ \psi_w(\bar z) + \sum_{\tilde e \in E_w \setminus e} m^{t-1}_{\tilde e \to w}(\bar z_{\tilde e}) \Bigg\}.
\]
We claim that each m^t_{e→v} can be constructed using the following information:

(1) the ũ_w smallest linear pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ δ(w) \ e}, where δ(w) denotes all in-arcs of w;

(2) the (ũ_w − b_w) smallest linear pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ γ(w) \ e}, where γ(w) denotes all out-arcs of w;

(3) the value of Σ_{ẽ∈E_w} m^{t−1}_{ẽ→w}(0).

Observe that once all three pieces of information are given, for any integer 0 ≤ z ≤ u_e, one can find m^t_{e→v}(z) in a constant number of operations. As u_e is also bounded by a constant, m^t_{e→v} can be computed in constant time when information (1), (2) and (3) is given.

Now, since |E_w| is bounded by n, obtaining each one of the pieces of information (1), (2) or (3) for a specific message m^t_{e→v} takes O(n) operations. When we update all the messages in the set {m^t_{e→v} | e ∈ E_w}, note that for every e ∈ E_w except for a constant number of them, information (1) for updating m^t_{e→v} is equivalent to the ũ_w smallest pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ δ(w)}, and information (2) for updating m^t_{e→v} is equivalent to the (ũ_w − b_w) smallest linear pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ γ(w)}. In fact, the exceptions are precisely those e such that m^{t−1}_{e→w} contains at least one of the ũ_w smallest linear pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ δ(w)} or one of the (ũ_w − b_w) smallest linear pieces in {m^{t−1}_{ẽ→w} | ẽ ∈ γ(w)}. Moreover, every message in the set {m^t_{e→v} | e ∈ E_w} has information (3) equal to exactly Σ_{ẽ∈E_w} m^{t−1}_{ẽ→w}(0). Hence, to update all messages in the set {m^t_{e→v} | e ∈ E_w}, we can obtain information (1), (2), (3) for all the messages in {m^t_{e→v} | e ∈ E_w} in O(n) operations, and hence can compute all messages in {m^t_{e→v} | e ∈ E_w} in O(n) operations.

Therefore, updating all the message functions at every iteration takes O(n^2) operations. By Theorem 6.3, since L is bounded by nK and δ(x*) ≥ 1, the iteration count (L/(2δ(x*)) + 1)n is bounded by O(n^2). This concludes that Algorithm 2 finds the unique optimal solution in O(n^4) operations.

Note that both the shortest-path problem and the bipartite matching problem belong to the class MCF′, with ũ, b, u all bounded by 2.

7 FPRAS Implementation

In this section, we provide a fully polynomial-time randomized approximation scheme (FPRAS) for MCF using BP as a subroutine. First, we describe the main idea of our approximation scheme.

As Theorem 6.1 indicated, in order to come up with an efficient approximation scheme using BP (Algorithm 2), we need to get around the following difficulties:

1. The convergence of BP requires MCF to have a unique optimal solution.

2. The running time of BP is polynomial in m, n and c_max.

For an instance of MCF, our strategy for finding an efficient (1 + ε) approximation scheme is to find a modified cost vector c̄ (let MCF(c̄) be the problem with the modified cost vector) which satisfies the following properties:

1. MCF(c̄) has a unique optimal solution.

2. c̄_max is polynomial in m, n and 1/ε.

3. The optimal solution of MCF(c̄) is a "near optimal" solution of MCF. The term "near optimal" is rather fuzzy, and we will address this question later in the section.

First, in order to find a modified MCF with a unique optimal solution, we state a result which is a variant of the Isolation Lemma introduced in [15].

7.1 Variation of Isolation Lemma

Theorem 7.1. Let MCF(c̄) be an instance of the min-cost flow problem with underlying graph G = (V, E), demand vector b, and capacity vector u, whose cost vector c̄ is generated as follows: for each e ∈ E, c̄_e is chosen independently and uniformly over N_e, where N_e is a discrete set of 4m positive numbers (m = |E|). Then the probability that MCF(c̄) has a unique optimal solution is at least 1/2.

Proof. Fix an arc e_1 ∈ E, and fix c̄_e for all e ∈ E \ e_1. Suppose that there exists α ≥ 0 such that when c̄_{e_1} = α, MCF(c̄) has optimal solutions x*, x** with x*_{e_1} = 0 and x**_{e_1} > 0. Then, if c̄_{e_1} > α, for any feasible solution x of MCF(c̄) with x_{e_1} > 0, we have
\[
\sum_{e \in E} \bar c_e x^*_e = \sum_{e \in E,\, e \ne e_1} \bar c_e x^*_e \;\le\; \sum_{e \in E,\, e \ne e_1} \bar c_e x_e + x_{e_1} \alpha \;<\; \sum_{e \in E} \bar c_e x_e.
\]
And if c̄_{e_1} < α, for any feasible solution x of MCF(c̄) with x_{e_1} = 0, we have
\[
\sum_{e \in E} \bar c_e x^{**}_e \;<\; \sum_{e \in E,\, e \ne e_1} \bar c_e x^{**}_e + \alpha x^{**}_{e_1} \;\le\; \sum_{e \in E,\, e \ne e_1} \bar c_e x_e + \alpha x_{e_1} \;=\; \sum_{e \in E} \bar c_e x_e.
\]
This implies that there exists at most one value α such that if c̄_{e_1} = α then MCF(c̄) has optimal solutions x*, x** with x*_{e_1} = 0 and x**_{e_1} > 0. Similarly, we can deduce that there exists at most one value β such that if c̄_{e_1} = β, then MCF(c̄) has optimal solutions x*, x** with x*_{e_1} < u_{e_1} and x**_{e_1} = u_{e_1}.

Let OS denote the set of all optimal solutions of MCF(c̄), and let D(e) be the condition that either x_e = 0 for all x ∈ OS, or 0 < x_e < u_e for all x ∈ OS, or x_e = u_e for all x ∈ OS. Since c̄_{e_1} takes one of 4m possible values, each chosen with equal probability, we conclude that the probability that D(e_1) is satisfied is at least (4m − 2)/(4m) = (2m − 1)/(2m). By the union bound, the probability that D(e) is satisfied for all e ∈ E is at least 1 − Σ_{e∈E} 1/(2m) = 1/2. Now, we state the following lemma:

Lemma 7.1. If condition D(e) is satisfied for every e ∈ E, then MCF(c̄) has a unique optimal solution.

Then, by Lemma 7.1, we conclude that the probability that MCF(c̄) has a unique optimal solution is at least 1/2.

Proof. [Proof of Lemma 7.1] Suppose x* and x** are two distinct optimal solutions of MCF(c̄). Let d = x** − x*; then x* + λd is an optimal solution of MCF(c̄) iff 0 ≤ (x* + λd)_e ≤ u_e for all e ∈ E. As c̄_e > 0 for every e ∈ E and c̄^T d = c̄^T x** − c̄^T x* = 0, there exists some e such that d_e < 0. Let λ* = sup{λ | x* + λd is an optimal solution of MCF(c̄)}. Since d_e < 0, λ* is bounded; and since x* + d = x**, λ* > 0. By the maximality of λ*, there must exist some e' such that either (x* + λ*d)_{e'} = 0 or (x* + λ*d)_{e'} = u_{e'}. Since λ* > 0, x*_{e'} ≠ (x* + λ*d)_{e'}, and this contradicts the assumption that D(e') is satisfied. Thus, MCF(c̄) must have a unique optimal solution.

We note that Theorem 7.1 can be easily modified for an LP in standard form.

Corollary 7.1. Let LP be a linear program in standard form with constraints Ax = b, x ≥ 0, where A is an m × n matrix and b ∈ R^m, and let its cost vector c̄ be generated as follows: for each index j, c̄_j is chosen independently and uniformly over N_j, where N_j is a discrete set of 2n elements. Then the probability that this LP has a unique optimal solution is at least 1/2.

7.2 Finding the Correct Modified Cost Vector c̄

Next, we describe how to randomly generate a c̄ with the desired properties stated at the beginning of this section. Let X : E → {1, 2, ..., 4m} be a random function where, for each e ∈ E, X(e) is chosen independently and uniformly over the range. Let t = ε·c_max/(4mn), and generate c̄ as follows: for each e ∈ E, let c̄_e = 4m·c_e/t + X(e). Then, for any c̄ generated at random, c̄_max is polynomial in m, n and 1/ε, and by Theorem 7.1, the probability that MCF(c̄) has a unique optimal solution is at least 1/2.
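A sketch of this random perturbation (used again as step 1 of Algorithm 3 below); the function name and signature are our own, and it assumes c_max > 0:

```python
import random

def perturb_costs(costs, n, eps):
    """Cost perturbation of Section 7.2: with m = |E| and
    t = eps * c_max / (4*m*n), each arc cost c_e is replaced by
    4*m*c_e/t + X(e), where X(e) is uniform on {1, ..., 4m}.
    costs: dict {arc: c_e} with c_max > 0; n: number of vertices."""
    m = len(costs)
    t = eps * max(costs.values()) / (4 * m * n)
    return {e: 4 * m * c / t + random.randint(1, 4 * m) for e, c in costs.items()}
```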

Now, we introduce the algorithm APRXMT(MCF, ε), which works as follows: select a random c̄ and try to solve MCF(c̄) using BP. If BP discovers that MCF(c̄) has no unique optimal solution, then we start over by selecting another c̄ at random; otherwise, we return the unique optimal solution found by BP. Formally, we present APRXMT(MCF, ε) as Algorithm 3.

Corollary 7.2. The expected number of operations for APRXMT(MCF, ε) to terminate is O(n^8 m^7 log n / ε^3).

Proof. By Theorem 7.1, when we call APRXMT(MCF, ε), the expected number of instances MCF(c̄) that BP tries to solve is bounded by 2. For each selection of c̄, we run Algorithm 2 for 2·c̄_max·n^2 iterations. As c̄_max = O(m^2 n/ε), by Lemma 6.1 the expected number of operations for APRXMT(MCF, ε) to terminate is O(c̄_max^3 · n^5 m log n) = O(n^8 m^7 log n / ε^3).

Now let c̄ be a randomly chosen cost vector such that MCF(c̄) has a unique optimal solution x^2. The next thing we want to show is that x^2 is a "near optimal" solution of MCF. To accomplish this, let e' = arg max_{e∈E}{c_e} and consider the problem MCF_{e'} obtained from MCF by fixing the flow on e' to x^2_{e'}; it is displayed after Algorithm 3 below.

Algorithm 3 APRXMT(MCF, ε)

1: Let t = ε·c_max/(4mn). For each e ∈ E, assign c̄_e = 4m·c_e/t + p_e, where p_e is an integer chosen independently and uniformly at random from {1, 2, ..., 4m}.
2: Let MCF(c̄) be the problem with the modified cost vector c̄.
3: Run Algorithm 2 on MCF(c̄) for N = 2·c̄_max·n^2 iterations.
4: Use Corollary 4.1 to determine if MCF(c̄) has a unique optimal solution.
5: if MCF(c̄) does not have a unique optimal solution then
6:   Restart the procedure APRXMT(MCF, ε).
7: else
8:   Terminate and return x^2 = x̂^N, where x̂^N is the "belief" vector found in Algorithm 2.
9: end if

The problem with the flow on e' fixed is
\[
\begin{aligned}
\min\quad & \sum_{e \in E} c_e x_e \qquad (\text{MCF}_{e'}) \\
\text{s.t.}\quad & \sum_{e \in E_v} \Delta(v, e)\, x_e = b_v, \quad \forall v \in V \quad \text{(demand constraints)} \\
& x_{e'} = x^2_{e'} \\
& 0 \le x_e \le u_e, \quad \forall e \in E \quad \text{(flow constraints)}.
\end{aligned}
\]
Suppose x^3 is an optimal solution of MCF_{e'}, and x^1 is an optimal solution of MCF. Then:

Lemma 7.2. c^T x^3 − c^T x^1 ≤ |x^2_{e'} − x^1_{e'}| · n·t.

Proof. Consider d = x^2 − x^1. We call a vector k ∈ {−1, 0, 1}^{|E|} a synchronous cycle vector of d if for any e ∈ E, k_e = 1 only if d_e > 0, k_e = −1 only if d_e < 0, and the set {(i, j) | k_{ij} = 1 or k_{ji} = −1} forms exactly one directed cycle in G. Since d is an integral circulation (i.e., d sends 0 net units of flow into every vertex v ∈ V), we can decompose d so that Σ_{k∈K} k = d for some set K of synchronous cycle vectors.

Now, let K' ⊆ K be the subset of K containing all vectors k ∈ K such that k_{e'} ≠ 0. For any k ∈ K', observe that x^2 − k is a feasible solution for MCF(c̄). Because x^2 is the optimal solution of MCF(c̄), we have c̄^T k ≤ 0. Moreover, since c̄_e = 4m·c_e/t + p_e with 1 ≤ p_e ≤ 4m, we have |c̄_e − 4m·c_e/t| ≤ 4m for every e ∈ E, and therefore
\[
\frac{4m}{t}\, c^T k \;\le\; \frac{4m}{t}\, c^T k - \bar c^T k \;\le\; \sum_{e \in E} \Big| \frac{4m\, c_e}{t} - \bar c_e \Big|\, |k_e| \;\le\; 4mn,
\]
which gives c^T k ≤ n·t.

Clearly, x^1 + Σ_{k∈K'} k satisfies the demand/supply constraints of MCF, and by construction (x^1 + Σ_{k∈K'} k)_{e'} = x^2_{e'}. Since min{x^1_e, x^2_e} ≤ x^1_e + Σ_{k∈K'} k_e ≤ max{x^1_e, x^2_e} for all e ∈ E, x^1 + Σ_{k∈K'} k is a feasible solution for MCF_{e'}. Since x^3 is the optimal solution of MCF_{e'},
\[
c^T x^3 \;\le\; c^T x^1 + \sum_{k \in K'} c^T k \;\le\; c^T x^1 + |K'|\, n\, t.
\]
Since |K'| = |x^2_{e'} − x^1_{e'}|, we have c^T x^3 − c^T x^1 ≤ |x^2_{e'} − x^1_{e'}|·n·t.

Corollary 7.3. c^T x^3 ≤ (1 + ε/(2m))·c^T x^1, for any ε ≤ 2m.

Proof. By Lemma 7.2,
\[
\frac{c^T x^3 - c^T x^1}{c^T x^3} \;\le\; \frac{|x^2_{e'} - x^1_{e'}|\, n\, t}{c^T x^3} \;\le\; \frac{|x^2_{e'} - x^1_{e'}|\, n\, t}{|x^2_{e'} - x^1_{e'}|\, c_{e'}} \;=\; \frac{\epsilon}{4m}, \qquad \text{as } t = \frac{\epsilon\, c_{e'}}{4mn}.
\]
Thus,
\[
\frac{c^T x^3 - c^T x^1}{c^T x^3} \le \frac{\epsilon}{4m}
\;\Longrightarrow\;
c^T x^3 \le \frac{1}{1 - \frac{\epsilon}{4m}}\, c^T x^1
\;\Longrightarrow\;
c^T x^3 \le \Big(1 + \frac{\epsilon}{2m}\Big)\, c^T x^1,
\]
where the last inequality holds because (1 − ε/(4m))(1 + ε/(2m)) = 1 + ε/(4m) − ε²/(8m²) ≥ 1, as ε/(2m) ≤ 1.

7.3 A (1 + ε) Approximation Scheme

Loosely speaking, Corollary 7.3 shows that x^2 at arc e' is "near optimal", since fixing the flow at arc e' to x^2_{e'} still allows a feasible solution of MCF whose cost is close to optimal. This leads us to the approximation algorithm AS(MCF, ε) (see Algorithm 4); at each iteration, it uses APRXMT (Algorithm 3) and iteratively fixes the flow at the arc with the largest cost.

