A set of indices for polygonization evaluation

Evaluation of Cadastral Map processing

5.3 A set of indices for polygonization evaluation

the branch of optimization or operations research in mathematics. It consists of finding a maximum weight matching in a weighted bipartite graph.

In its proposed form, the problem is as follows:

• Let DGT, DCG be a Ground Truth document and a Computer Generated document, respectively.

• There are |D_CG|number of polygons in DCG and |D_GT|number of polygons in D_GT. Any polygons (P_CG) from D_CG can be assigned to any polygons (P_CG) ofD_CG, incurring some cost that may vary depending on theP_CG-P_GT assignment. It is required to map all polygons by assigning exactly onePCG

to eachP_GT in such a way that the total cost of the assignment is minimized.

This matching cost is directly linked to the cost function that measures the similarity between polygons.

Our combinatorial framework cuts down the algorithmic complexity to anO(n³) upper bound, depending on the number of polygons in the largest drawing. Hence, the matching can be achieved in polynomial time which tackles the computational barrier. We stand apart from the prior approaches by grouping low level primitives into polygons and then considering their matching at this high level point of view.

Once, polygons are mapped, it is interesting to take a closer look to a lower level by checking out to segment layouts within the mapped polygons. This presents some advantages, elements are locally affected to define a local dissimilarity measure which is visually interesting; it makes easier the spotting of miss detected areas.

A complete data flow process for polygonized document evaluation is proposed.

Our contribution in this domain is two-fold. Firstly, we compare a ground truth document and a computer generated document thanks to an optimal framework that proceeds to the object mapping. Finally, another operator provides estimates the relation between the segments within two mapped polygons in terms of edit operations by means of a cycle graph matching. The figure5.1depicts an overview of our methodology.

5.2.3 Organization

The organization of the chapter is as follows: Sect.5.3.1describes theoretically and in terms of algorithm the polygon mapping method, Sect. 5.3.2 explains the cycle graph matching process in order to judge the quality of the polygonal approximation.

Sect.5.3.3put forwards the type of errors that are likely to appear in object retrieval systems. Sect.5.4describes the experimental protocol, this section also explains how to interpret our new set of indices on a application to cadastral map evaluation. A summary is included in Sect.5.5, followed by discussions and concluding remarks.

5.3 A set of indices for polygonization evaluation

In this section, we define the two criteria involved into our proposal for a performance evaluation tool dedicated to polygonization. In the first part, a polygon assignment

Figure 5.1: Overview of the global methodology. A bipartite graph weighed by the symmetric difference, and cycle graph edit distance applied to mapped polygons

5.3. A set of indices for polygonization evaluation 123

method is described. It aims at taking into account shape distortions caused by retrieval systems. Secondly, a matched edit distance is defined. This measure repre-sents the variations introduced when a given system approximates digital curves. It is a synthesis on vectorization precision. Finally, miss or over detection errors due to raster to polygon conversion are introduced in a third part. This decomposition leads to the definition of specific notations and error categorizations.

5.3.1 Polygon mapping using the Hungarian method

Once polygons are located into the vectorized document, it can be seen as a partition of polygons. Comparing two documents (D₁,D₂) is resumed to the matching of each polygon of D1 with each polygon of D2. This assignment is performed using the Hungarian method which is formally described in the next part.

5.3.1.1 Algorithmic of the Hungarian method

Our approach for vectorized document comparison is based on the assignment problem. The assignment problem considers the task of finding an optimal as-signment of the elements of a set D₁ to the elements of a set D₂. Without loss of generality, we assume that |D₁| ≥ |D₂|. The complete bipartite graph Gpm=D1∪D2∪ 4, D₁×(D2∪ 4), where4represents empty dummy polygons, is called the polygon matching of D₁ and D₂. A polygon matching between D₁ andD₂ is defined as a maximal matching inG_pm. We define the matching distance betweenD1 andD2, denoted by P M D(D1, D2), as the cost of the minimum-weight polygon matching between D₁ and D₂ with respect to the cost function K. The cost function is especially dedicated to our problem and is fully explained in section 5.3.1.2. This optimal polygon assignment induces a univalent vertex mapping be-tween D₁ and D₂, such as the function P M D : D₁×(D₂∪ 4) → R⁺₀ minimized the cost of polygon matching. If the numbers of polygons are not equal in both documents, then empty "dummy" polygons are added until equality |D₁| = |D₂| is reached. The cost to match an empty "dummy" polygon is equal to the cost of inserting a whole unmapped polygon (K(∅, P)). A shortcoming of the method is the one-to-one mapping aspect of the algorithm, however, this latter is performed at a high level of perception where data are less likely to be fragmented. Finally, this disadvantage should not discourage the use of the PMD distance considering the im-portant speed-up it provides while being optimal, deterministic and quite accurate.

In addition, unmapped elements are not left behind, they are considered either as

"false alarm" or "false negative" according to the kind of mistakes they induced (see section5.3.3). Formally, the assignment problem can be defined as follows.

Definition 20. (The Assignment Problem) Let us assume there are two sets D1

and D2 together with an n×n cost matrix C of real numbers given. To improve the clarity of the reading |D₁| = |D₂| = n . The matrix elements C_ij correspond to the costs of assigning the i-th element of D1 to the j-th element of D2. The

assignment problem can be stated as finding a permutation p = p1, p2, ...pn of the integers 1,2,...,n that minimizes Pn

i=1C_ip_i

The assignment problem can be reformulated as finding an optimal matching in a complete bipartite graph and is therefore also referred to as bipartite graph matching problem. Solving the assignment problem in a brute force manner by enumerating all permutations and selecting the one that minimizes the objective function leads to an exponential complexity which is unreasonable, of course. However, there exists an algorithm which is known as Munkres’ algorithm [Munkres 1957]¹ that solves the bipartite matching problem in polynomial time. In Algorithm 4 Munkres’ method is described in detail. The assignment cost matrix C given in Definition 20 is the algorithm’s input, and the output corresponds to the optimal permutation, i.e. the assignment pairs resulting in the minimum cost. In the description of Munkres’

method in Algorithm4some lines (rows or columns) of the cost matrix C and some zero elements are distinguished. They are termed covered or uncovered lines and starred or primed zeros, respectively.

Munkres’ algorithm is based on the following theorem.

Theorem 5.3.1. (Equivalent Matrices) Given a cost matrix C as defined in Def-inition 20, a column vector c, c = (c₁, ..., c_n), and a row vector r = (r₁, ..., r_n), the square matrix C’ with the elements C_ij⁰ = Cij −ci−rj has the same optimal assignment solution as the matrix C. C andC⁰ are said to be equivalent.

Proof. ([Bourgeois 1971]) Let p be a permutation of the integers1,2, .., nminimizing Pn

The values of the last two terms are independent of permutation p so that if p minimizesPn

i=1Cipi, it also minimizes Pn i=1C_ip⁰ _i.

Consequently, if we find a new matrix C’ equivalent to the initial cost matrix C, and a permutation p with all C_ip⁰

i, then p also minimizes P_n

i=1C_ip_i. Intuitively, Munkres’ algorithm transforms the original cost matrix C into an equivalent matrix C’ having n independent zero elements. This independent set of zero elements exactly corresponds to the optimal assignment pairs.

The operations executed in lines 1 and 2, and STEP 4 of Algorithm 4 find a matrix equivalent to the initial cost matrix (Theorem 5.3.1). In lines 1 and 2 the column vector c= (c1, ..., cn) is constructed by ci = min{C_ij}_j=1,...,n, and the row

1Munkres’ algorithm is a refinement of an earlier version by Kuhn [Kuhn 1955] and is also referred to as Kuhn-Munkres, or Hungarian algorithm.

5.3. A set of indices for polygonization evaluation 125

Algorithm 4Munkres’ algorithm for the assignment problem Require: A cost matrix C with dimensionalityn

Ensure: The minimum cost polygon assignment

1: For each row r inC, subtract its smallest element from every element inr 2: For each column cinC, subtract its smallest element from every element inc 3: For all zeros zi in C, markzi with a star if there is no starred zero in its row or column

4: STEP 1:

5: forEach column containing a starred zero do 6: cover this column

7: end for

8: if ncolumns are covered then

9: GOTODONE

10: else

11: GOTOSTEP 2

Dans le document L’Université de La Rochelle (Page 143-147)