4.3.2.3. Objective of Hungarian method

Given the above introductory definitions regarding bipartite graphs, the objective of the Hungarian algorithm is to find the maximum weight matching M in a complete bipartite graph, G(X, Y, E).

Two fundamental notions that allow us to significantly reduce the search space of perfect matchings M when looking for the maximum weight matching are the concepts of vertex labeling and equality graphs.

DEFINITION 4.7.– (Feasible vertex labeling): a feasible vertex labeling upon a weighted bipartite graph G(X, Y, E) is a function l that assigns a label l(v) ∈ Z to each vertex v in V = X ∪ Y, such that the sum of the labels of any pair of vertices connected by an edge e_xy is greater than or equal to the edge weight w_xy,

$$ l : V \to \mathbb{Z} \quad \text{such that} \quad l(x) + l(y) \ge w_{xy}, \quad \forall x \in X,\ y \in Y $$

DEFINITION 4.8.– (Equality subgraph): given a feasible vertex labeling l, the equality subgraph of a complete bipartite graph is the graph G_l(X, Y, E_l) defined by the vertices in X and Y and the subset of edges E_l ⊆ E whose weights w_xy are exactly equal to the sum of the vertex labels l(x) + l(y):

$$ E_l = \{ (x, y) \in E \mid l(x) + l(y) = w_{xy} \} $$
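The two definitions can be checked mechanically on a small instance. The following is a minimal sketch, assuming a square weight matrix `w` with `w[i][j]` the weight of edge (x_i, y_j); the matrix, the label values and the function names `is_feasible` and `equality_edges` are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

Weights = List[List[int]]  # w[i][j] = weight of edge (x_i, y_j)


def is_feasible(w: Weights, lx: List[int], ly: List[int]) -> bool:
    """Definition 4.7: l(x) + l(y) >= w(x, y) must hold for every edge."""
    return all(lx[i] + ly[j] >= w[i][j]
               for i in range(len(w)) for j in range(len(w[i])))


def equality_edges(w: Weights, lx: List[int], ly: List[int]) -> Set[Tuple[int, int]]:
    """Definition 4.8: edges whose weight equals the sum of their two labels."""
    return {(i, j)
            for i in range(len(w)) for j in range(len(w[i]))
            if lx[i] + ly[j] == w[i][j]}


# Hypothetical 3 x 3 example.
w = [[1, 6, 0],
     [0, 8, 6],
     [4, 0, 1]]
lx = [6, 8, 4]  # labels of x_1, x_2, x_3
ly = [0, 0, 0]  # labels of y_1, y_2, y_3

print(is_feasible(w, lx, ly))     # True
print(equality_edges(w, lx, ly))  # the edges (0, 1), (1, 1) and (2, 0)
```

With these labels the equality subgraph has only three edges, two of which share y_2, so it does not yet contain a perfect matching.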

Given the vertex labeling and equality subgraph definitions, the basis of the Hungarian algorithm is the Kuhn-Munkres theorem.

Theorem 4.1. (Kuhn-Munkres theorem): if a perfect matching is found in an equality subgraph of G(X, Y, E), this matching is the maximum weight matching.

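Proof. From the definition of feasible labeling, any edge (x, y) ∈ E satisfies

$$ w(x, y) \le l(x) + l(y) \qquad [4.6] $$

For a perfect matching M, each vertex is adjacent to exactly one edge of M. Thus, summing [4.6] over the edges of M counts every vertex label exactly once, and the total weight w(M) satisfies

$$ w(M) = \sum_{(x, y) \in M} w(x, y) \le \sum_{(x, y) \in M} \big( l(x) + l(y) \big) = \sum_{v \in V} l(v) \qquad [4.7] $$

Now, any edge e_xy in an equality graph satisfies

$$ w(x, y) = l(x) + l(y) \qquad [4.8] $$

Hence, for any perfect matching over the equality graph, M_l, it yields

$$ w(M_l) = \sum_{(x, y) \in M_l} w(x, y) = \sum_{v \in V} l(v) \qquad [4.10] $$

Finally, by merging equations [4.7] and [4.10],

$$ w(M) \le \sum_{v \in V} l(v) = w(M_l) $$

so no perfect matching M has a larger weight than M_l.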
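The bound used in the proof can be verified by brute force on a small instance. This is only an illustrative sketch: the weight matrix and the labels `lx`, `ly` are hypothetical, and the enumeration over all permutations is feasible only for tiny graphs.

```python
from itertools import permutations

# Hypothetical 3 x 3 weights and a feasible labeling for them.
w = [[1, 6, 0],
     [0, 8, 6],
     [4, 0, 1]]
lx = [4, 6, 4]
ly = [0, 2, 0]
n = len(w)

feasible = all(lx[i] + ly[j] >= w[i][j] for i in range(n) for j in range(n))
label_sum = sum(lx) + sum(ly)


def weight(perm):
    """Total weight of the perfect matching x_i -> y_perm[i]."""
    return sum(w[i][j] for i, j in enumerate(perm))


# Every perfect matching of the complete graph, with its weight.
weights = {perm: weight(perm) for perm in permutations(range(n))}

# Perfect matchings that use equality edges only.
in_equality_graph = [perm for perm in weights
                     if all(lx[i] + ly[j] == w[i][j] for i, j in enumerate(perm))]

print(label_sum, max(weights.values()), in_equality_graph)
```

Every perfect matching weighs at most the sum of the labels, and the one matching contained in the equality graph attains that sum, as the theorem states.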

Thus, the problem of finding a maximum weight matching is transformed into that of finding a feasible vertex labeling with a perfect matching in the associated equality subgraph. This is essentially achieved by selecting an initial vertex labeling as well as a matching M, of size |M|, in the equality graph, and iteratively growing M until it becomes a perfect matching (|M| = k). In each iteration, the size of M is increased by one edge after an augmented path is found.

DEFINITION 4.9.– (Path): a path over a graph G(V, E) is defined as a sequence of vertices {v¹, v², v³, v⁴, . . . , v^p} such that there exists an edge connecting each pair of consecutive vertices (vⁱ, vⁱ⁺¹). Note that the superscripts 1, 2, . . . , p do not refer to the indices of the vertices in the graph, but to their order in the path sequence.

DEFINITION 4.10.– (Augmented path): given a matching M in the equality graph, an augmented path is a path (1) whose edges alternate between M and M̄, the equality graph edges not in M (alternating path), and (2) whose start and end vertices, v¹ and v^p, are unmatched, i.e.,

$$ \{ (v^1, v^2) \in \bar{M},\ (v^2, v^3) \in M,\ (v^3, v^4) \in \bar{M},\ \ldots,\ (v^{p-1}, v^p) \in \bar{M} \} $$

Obviously, if an augmented path is found, the size of M can be increased by one edge, because the path contains one more edge of M̄ than of M. It suffices to invert the edges in the path from M̄ → M and M → M̄, so that the new path can be expressed as (Figure 4.4)

$$ \{ (v^1, v^2) \in M,\ (v^2, v^3) \in \bar{M},\ \ldots,\ (v^{p-1}, v^p) \in M \} $$
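Inverting the edges along a path amounts to taking the symmetric difference between the matching and the set of path edges. The sketch below assumes vertices are named by strings and edges are stored as unordered pairs; the function name `augment` and the example matching are hypothetical and do not reproduce the exact edges of Figure 4.4.

```python
from typing import FrozenSet, List, Set

Edge = FrozenSet[str]  # an undirected edge {x, y}


def augment(matching: Set[Edge], path: List[str]) -> Set[Edge]:
    """Invert an augmented path: its edges in M leave M, the others join M."""
    path_edges = {frozenset(pair) for pair in zip(path, path[1:])}
    return matching ^ path_edges  # symmetric difference


# Hypothetical example: M has two edges, y1 and x3 are unmatched.
M = {frozenset(("x1", "y2")), frozenset(("x2", "y3"))}
path = ["y1", "x1", "y2", "x2", "y3", "x3"]  # alternates M-bar, M, M-bar, M, M-bar

M_new = augment(M, path)
print(len(M), len(M_new))  # 2 3
```

The path has three edges outside M and two inside it, so the inversion yields a matching with exactly one more edge.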

As mentioned earlier, the Hungarian algorithm starts from an arbitrary feasible vertex labeling. Typically, the labels of the vertices in Y are set to 0, while each vertex x_i ∈ X is labeled with the maximum weight of its incident edges,

$$ l(y_j) = 0 \qquad [4.11] $$

$$ l(x_i) = \max_{y_j \in Y} w(x_i, y_j) \qquad [4.12] $$
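As a minimal sketch of this initialization, assuming the weights are stored in a matrix `w` with one row per vertex of X (the function name `initial_labeling` and the sample matrix are hypothetical):

```python
from typing import List, Tuple


def initial_labeling(w: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Labels as in [4.11] and [4.12]: zero on Y, row maximum on X."""
    lx = [max(row) for row in w]  # l(x_i) = max_j w(x_i, y_j)
    ly = [0] * len(w[0])          # l(y_j) = 0
    return lx, ly


w = [[1, 6, 0],
     [0, 8, 6],
     [4, 0, 1]]
lx, ly = initial_labeling(w)
print(lx, ly)  # [6, 8, 4] [0, 0, 0]

# The labeling is feasible, and every x_i has at least one equality edge
# (the edge that attains its row maximum).
feasible = all(lx[i] + ly[j] >= w[i][j] for i in range(3) for j in range(3))
has_equality_edge = all(any(lx[i] + ly[j] == w[i][j] for j in range(3))
                        for i in range(3))
```

Because each l(x_i) is attained by some edge, the initial equality graph is never empty, which gives the first matching something to start from.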

Figure 4.4. Illustration of augmented paths. (a) Bipartite graph with two disjoint subsets and an arbitrary matching M; (b) alternating path with start at y1 and end at x3. The vertex order is indicated by the arrows; (c) the alternating path allows us to increase the matching M, resulting in a perfect matching

Then, a matching M in the equality graph, E_l, associated with the vertex labeling is selected. If |M| = k, this matching is already perfect and the optimum is found. If the matching is not perfect, the size of M needs to be gradually incremented over a number of iterations. By definition 4.10, |M| can be increased by one edge if an augmented path is found. Thus, each iteration step is directed toward the search for an augmented path.

As M is not yet perfect, there must be some unmatched vertex x ∈ X connected to a matched vertex y. This seems to be obvious, since otherwise the vertex x would already be matched in M. Let N_l(x) denote the subset of vertices in Y which are connected to x in the equality graph (the neighbors of x). Also, let X̄ ⊆ X − {x} denote the subset of vertices matched to any vertex in N_l(x). Thus, x is “competing” with X̄ for an edge in M. The path {x, y, x̄ ∈ X̄} can be thought of as a section of an augmented path. Now, if an unmatched vertex y ∈ Y is found, connected to any vertex in S = X̄ ∪ {x} in a (new) equality graph, two situations may occur:

1. y is connected to x. Then, M can be increased by adding the new edge (x, y). The new matching can be expressed as M ∪ {(x, y)}.

2. y is connected to a vertex in X̄. Let this vertex be denoted as x2, and let y2 be the vertex in Y matched to x2. Then, an augmented path can be found in the form x, y2, x2, y, and M can be incremented by inverting the path edges from M to M̄ and vice versa.
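Both situations are covered by a breadth-first search over alternating paths that starts at the unmatched vertex x. The sketch below is one possible way to organize that search, not necessarily the authors' procedure; it assumes vertices of X and Y are indexed by integers, and the names `find_augmenting_path`, `adj`, `match_y` and `root` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def find_augmenting_path(adj: Dict[int, List[int]],
                         match_y: List[Optional[int]],
                         root: int) -> Optional[List[int]]:
    """Search the equality graph for an augmented path starting at `root`.

    adj[x] lists the neighbours of x in the equality graph and match_y[y]
    is the vertex of X matched to y (None if y is unmatched). The result
    alternates X and Y indices: [x, y, x, y, ...], or None if no path exists.
    """
    parent = {root: None}  # x -> (previous x, y through which x was reached)
    seen_y = set()
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y in seen_y:
                continue
            seen_y.add(y)
            if match_y[y] is None:  # unmatched y: the path is complete
                path = [y, x]
                while parent[x] is not None:
                    prev_x, via_y = parent[x]
                    path += [via_y, prev_x]
                    x = prev_x
                return path[::-1]
            partner = match_y[y]    # matched y: continue from its partner
            parent[partner] = (x, y)
            queue.append(partner)
    return None


# Situation 1: x_0 is adjacent to the unmatched y_0.
direct = find_augmenting_path({0: [0]}, [None], root=0)

# Situation 2: x_0 -> y_0 (matched to x_1) -> x_1 -> unmatched y_1.
longer = find_augmenting_path({0: [0], 1: [0, 1]}, [1, None], root=0)
print(direct, longer)  # [0, 0] [0, 0, 1, 1]
```

A return value of `None` corresponds to the case in which the equality graph has to be expanded before M can grow.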

Now, assuming that a maximum size matching M has been previously selected, the required unmatched vertex y does not yet exist in E_l. Otherwise, this vertex would already be included in M. Therefore, the equality graph E_l must be expanded to find new potential vertices in Y to augment M. Obviously, the expansion of E_l requires a vertex (re)labeling l′. It is formulated as

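$$ \delta_l = \min_{x \in S,\ y \notin N_l(S)} \big( l(x) + l(y) - w(x, y) \big) \qquad [4.13] $$

$$ l'(v) = \begin{cases} l(v) - \delta_l, & v \in S \\ l(v) + \delta_l, & v \in N_l(S) \\ l(v), & \text{otherwise} \end{cases} \qquad [4.14] $$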
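The relabeling step can be sketched as follows, taking the minimum slack l(x) + l(y) − w(x, y) over the edges that leave S toward vertices of Y outside N_l(S). The function name `relabel`, the index sets and the numerical example are hypothetical; the sketch also assumes that at least one vertex of Y lies outside N_l(S).

```python
from typing import List, Set, Tuple


def relabel(w: List[List[int]], lx: List[int], ly: List[int],
            S: Set[int], N: Set[int]) -> Tuple[int, List[int], List[int]]:
    """Return delta and the new labels; S indexes X, N = N_l(S) indexes Y."""
    # Smallest slack among edges from S to Y - N_l(S); assumes Y - N_l(S) is not empty.
    delta = min(lx[i] + ly[j] - w[i][j]
                for i in S for j in range(len(ly)) if j not in N)
    new_lx = [l - delta if i in S else l for i, l in enumerate(lx)]
    new_ly = [l + delta if j in N else l for j, l in enumerate(ly)]
    return delta, new_lx, new_ly


# Hypothetical state: x_0 and x_2 are matched to y_1 and y_0, x_1 is free.
# The only equality neighbour of S = {x_0, x_1} is y_1.
w = [[1, 6, 0],
     [0, 8, 6],
     [4, 0, 1]]
lx, ly = [6, 8, 4], [0, 0, 0]

delta, new_lx, new_ly = relabel(w, lx, ly, S={0, 1}, N={1})
print(delta, new_lx, new_ly)  # 2 [4, 6, 4] [0, 2, 0]
```

After the update, the edge (x_1, y_2) satisfies 6 + 0 = 6 and enters the equality graph, while the edges between S and N_l(S) keep their label sums because the decrease and the increase cancel.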

The relabeling function l′ remains feasible and ensures that a new equality graph, E_l′, is found, such that the edges of E_l between S and N_l(S) are kept and some new edge (x_i ∈ S, y ∉ N_l(S)) exists. In other words, the new set of neighbors of S in E_l′ is N_l′(S) = N_l(S) ∪ {y}.

Consider the new edges (x ∈ S, y ∉ N_l(S)) in E_l′. With reference to the vertex y, two possible cases are to be considered:

– ∄ x_i : (x_i, y) ∈ M (y is not matched). Thus, an augmented path can be found and |M| is increased by one.

– ∃ x_i : (x_i, y) ∈ M (y is matched). The new edge reaches a vertex y ∉ N_l(S) that is already covered by M. Since y is not an unmatched vertex, the path cannot be augmented yet. In this case, the vertex x_i matched to y is attached to S, so that y belongs to N_l′(S). Then, a new vertex relabeling may be required, forcing new edges in the equality graph that connect vertices from S to Y − N_l′(S). Such vertex relabeling is iterated, adding vertices to S and N_l(S), until an unmatched vertex in Y is found.

Each time |M| is incremented, the previous steps are repeated, starting with another free vertex in X. This process is iterated until all free vertices in X are explored, in which case |M| = k and a perfect matching (the maximum weight matching) is achieved.
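The complete procedure can be put together in a compact sketch. It is a plain, unoptimized illustration under stated assumptions, not a reference implementation: the weight matrix is square with integer entries (so that equality tests on labels are exact), the matching starts empty rather than from a preselected matching, and the names `hungarian_max`, `match_x`, `match_y` and `parent` are hypothetical.

```python
from itertools import permutations


def hungarian_max(w):
    """Maximum weight perfect matching; match_x[i] = j pairs x_i with y_j."""
    n = len(w)
    lx = [max(row) for row in w]  # initial feasible labels
    ly = [0] * n
    match_x = [None] * n
    match_y = [None] * n

    for root in range(n):         # grow M from each free vertex of X in turn
        S, T = {root}, set()      # T collects the neighbours of S explored so far
        parent = {}               # y -> vertex of S through which y was reached
        while True:
            # Look for an equality edge from S to a vertex of Y outside T.
            edge = next(((i, j) for i in S for j in range(n)
                         if j not in T and lx[i] + ly[j] == w[i][j]), None)
            if edge is None:
                # No such edge: relabel to create one.
                delta = min(lx[i] + ly[j] - w[i][j]
                            for i in S for j in range(n) if j not in T)
                for i in S:
                    lx[i] -= delta
                for j in T:
                    ly[j] += delta
                continue
            i, j = edge
            parent[j] = i
            if match_y[j] is None:
                # Unmatched y: invert the alternating path back to the root.
                while j is not None:
                    i = parent[j]
                    previous = match_x[i]
                    match_x[i], match_y[j] = j, i
                    j = previous
                break
            T.add(j)              # matched y: extend S with its partner
            S.add(match_y[j])
    return match_x


w = [[1, 6, 0],
     [0, 8, 6],
     [4, 0, 1]]
match = hungarian_max(w)
total = sum(w[i][j] for i, j in enumerate(match))
best = max(sum(w[i][p[i]] for i in range(3)) for p in permutations(range(3)))
print(match, total, best)  # [1, 2, 0] 16 16
```

The brute-force maximum over all permutations is included only as a sanity check on this small example; it is not part of the method.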

In the document Semi-Supervised and Unsupervised (pages 141-146)