Multiple Criteria Districting Problems, Models, Algorithms, and Applications: The Public Transportation Paris Region Pricing System

Instituto de Engenharia de Sistemas e Computadores de Coimbra

Institute of Systems Engineering and Computers

INESC - Coimbra

Tavares Pereira, F., Figueira, J., Mousseau, V., Roy, B.

No. 21

2004

ISSN: 1645-2631

Fernando Tavares Pereira ∗‡§, José Figueira †‡§, Vincent Mousseau § and Bernard Roy §

December 19, 2004

∗ Dept. of Mathematics, The University of Beira Interior, Rua Marquês D’Avilla e Bolama, 6201-001, Covilhã, Portugal, Phone/Fax: (+351) 275 31 97 32, E-mail: fpereira@noe.ubi.pt

† Faculty of Economics, The University of Coimbra, Av. Dias da Silva, Coimbra, Portugal, Phone: (+351) 239 790 500, Fax: (+351) 239 790 514, E-mail: figueira@fe.uc.pt

‡ INESC - Coimbra, Rua Antero de Quental, 199, 3000-033 Coimbra, Portugal

§ LAMSADE, Paris-Dauphine University, Place du Maréchal De Lattre de Tassigny, 75775 Paris Cedex 16, France, Phone: (+33-1) 44-05-44-01, Fax: (+33-1) 44-05-40-91, E-mail: {fpereira,figueira,mousseau,roy}@lamsade.dauphine.fr

Abstract

Districting problems are of high importance in a wide range of areas. Multiple criteria districting models are a more realistic representation of the problems encountered in practice. This paper deals with the problem of partitioning a territory into zones. Each zone is composed of a set of elementary territorial units or atoms. A district map is formed by partitioning the set of atoms into connected zones without inclusions. This problem can be modelled using concepts and techniques from graph theory and 0-1 mathematical programming. Each atom is associated with a vertex of the graph, while each pair of contiguous atoms defines an edge. Numerical values are also associated with the edges and/or vertices. The problem of enumerating all the efficient solutions of such a model is known to be NP-hard, which “imposes” abandoning exact methods for large-size instances. In this paper we propose a new method to approximate the efficient frontier, based on an evolutionary algorithm with local search. The algorithm uses a new solution representation and new crossover/mutation operators. It was applied to a real-world problem concerning public transportation in the Paris region.

Keywords: Multiple criteria; Districting Problems; Evolutionary algorithms; Local search; Combinatorial optimization.

1 Introduction

Over the last three decades, many researchers, academics and practitioners from distinct fields have developed models, built algorithms and implemented solutions concerning the so-called partition or districting problem. It can be viewed as a grouping process of elementary units or atoms of a territory into larger pieces or clusters, giving rise to a map or configuration.

There are many practical questions related to this problem:

• To define the electoral districts of a country [5, 15, 20, 24].

• To establish the different working zones for a travelling salesperson team [11, 18, 33, 39].

• To define areas in metropolitan internet networks to install hubs [29].

• To define the areas for manufactured and consumer goods [13].

The same kind of questions also arises in police districting [9]; school districting [12]; districting of salt spreading operations [27]; defining electrical power zones [2]; and many other domains. These are just some of the frequent real-world decision making questions and concerns in territory partition problems.

The large scope of application areas dealing with districting problems shows the crucial importance of this kind of decision making situation to our societies. Most of the applications cited above deal with real-life decision making situations that contribute to the development of our societies in a large variety of fields. By nature, these problems involve multiple criteria, which are frequently incommensurable and conflicting.

On the definition of the problem. More precisely, partitioning a territory into different “homogeneous” zones under multiple criteria consists of grouping elementary units of territory in order to form a set of districts, zones, or clusters. Thus a territory is composed of zones, each zone resulting from a grouping process of elementary units or atoms. The zones should fulfill certain technical, ethical, ecological, social and other constraints. The different configurations, plans or maps of a territory form a set of solutions, each of which is evaluated on the basis of a family of criteria. Thus, the search for an optimal solution makes no sense and the “best” solution is, in general, a compromise where an improvement on a given criterion leads to a degradation of the evaluation on at least one of the remaining criteria. This is the concept of non-dominated solution.

A brief historical view on territory partition problems. Historically, among the different types of territory partition problems, it was the so-called electoral districting problem that impelled the use of scientific methodologies seeking to construct political districts or zones as close as possible in terms of voting power, in order to generate trust in, and ensure the impartiality of, the partition process. The importance of this last aspect is due to the fact that it is possible to design a partition favoring a certain political, social or ethnic group. It is well known in the history of U.S. politics that the Governor of the state of Massachusetts, Elbridge Gerry (1744-1814), in an attempt to guarantee his re-election, manipulated the division of his state so as to concentrate his voters in just sufficient numbers to elect a representative, and to group a large number of his opponents in a few districts [24]. As a result, one of the districts had the shape of a salamander, as was suggested in the Boston Gazette on March 26, 1812. This fact originated the expression “gerrymander”, a contraction of Gerry and salamander.

The danger that this type of practice represents for democratic systems justified the dedication and zeal of some researchers in constructing new methodologies, as independent as possible of human intervention, for drawing electoral districts acceptable to all political groups.

One of the first works [37] on this very topic appeared in 1961. It describes a heuristic process in which one zone is built at each iteration. Each zone results from the successive inclusion of adjacent elementary units until a given number of voters is reached. This number is the ratio between the total population not yet assigned to any zone and the number of zones to be built. Some years later, in 1965, the first mathematical programming model was proposed [19], formulating the problem as a location/allocation model.

Apart from the political districting problem, the model that has received the most attention in the area is the problem of designing zones for salespersons. Since 1971, different works have been published in which the main objective is to balance the workload across the different zones [11, 18, 33, 39].

The frequent criteria and constraints. It is not easy to distinguish between criteria and constraints: criteria in one model are treated as constraints in another. Without going into the details of such a distinction, let us consider that all the partition problems we are concerned with can be modelled through an undirected and connected graph, where the vertices represent the elementary units and the edges the adjacency relation between elementary units. Our problem is a particular graph partition problem where each element of the partition induces a connected subgraph. The following constraints should be considered:

1. Integrity. A vertex cannot belong to several subgraphs at the same time. It belongs to one and only one subgraph of the partition;

2. Contiguity. Any two vertices of a zone can be joined by a path that does not pass through a different zone;

3. Absence of holes. It is not possible to obtain embedded connected subgraphs.

In what follows, we only consider these three issues as constraints.

As for the criteria, many classifications were published in the scientific literature, mainly for political districting problems [16, 25, 38]:

1. Grofman (1985) proposed the following classification: formal (equal population, contiguity, compactness, etc.); racial intent (no intent to dilute minority voting power); political intent (no intent to favor a political party, no intent to favor an incumbent, etc.); racial outcome/anticipated outcome (no retrogression in representation, no dilutions of racial minority voting strength and ‘racial proportionality’); political outcome/anticipated outcome (no “imposed” bias in favor of a party, no incumbent-centered partisan bias, etc.).

2. Morrill (1981) proposed to classify the political criteria into four classes: constitutional (equal population and equal probability of representation); geographic (compactness, contiguity and integrity of political of interest); political-geographic (representation of political units and integrity of political boundaries); political (minimal changes to old plans and no political “gerrymandering”).

3. Williams (1995) proposed the following classification: demographic (equal population and proportional minority representation); geographic (contiguity, compactness and community integrity); political (proportionality, safe vs. swing districts and similarity to the old plan).

This type of classification is specific to the political districting problem. In other districting problems there are similar criteria that have an analogous interpretation. For example, while in the political districting problem we search for zones as close as possible in terms of voting power, in the sales territory alignment problem we look for zones as close as possible in terms of workload. These similarities allowed us to propose a new classification:

1. Homogeneity Criteria. Criteria aiming at distributing/homogenizing some attribute (aspect, characteristic, . . . ) in the partition as a whole or in each zone individually.

2. Geographical Criteria. Criteria aiming at defining districts that respect any geographical attribute, i.e., they account for the spatial position of the territorial units.

3. Flow Criteria. Assuming that there is a flow of any type between zones, we may wish to optimize the flow between zones. This criterion type reflects such an aspect.

4. Conformity Criteria. Criteria aiming at defining districts “compatible” with an existing territorial partition.

Solving districting problems. When solving districting problems there are three main aspects to be considered: the “techniques” applied to solve this kind of problem, the “models” and the “algorithms”. These aspects are related to the following questions: How shall we deal with the partitions of a territory, by agglomerating elementary units or by dividing the whole piece? How many criteria should be considered, a single one or several? Should we be concerned with exact solutions, or will approximate solutions suffice? Let us consider each point individually.

The different “techniques” for districting problems can be divided into two broad families: one based on the concept of division and the other on the notion of agglomeration [8]:

1. In the division techniques the territory is considered as a whole and the districting procedure works by dividing it into pieces [6, 14].

2. In the agglomerative techniques the territory is composed of a set of elementary territorial units and a district is a subset of units forming a connected piece of land [10, 15, 19, 20, 28, 37].

On the other hand we can classify the districting problems in terms of the number of “criteria”:

1. Single criterion models. Models involving only one criterion. Generally, voting potential equality, workload equality, etc., is the only objective function considered. However, several models take into account some criteria as constraints, like compactness [15, 19, 20].

2. Multiple criteria models. For dealing with conflicting criteria some authors adopted models with more than one criterion. A district map with very good values for voting potential equality can perform very badly in terms of compactness and vice-versa. There are several strategies where the criteria are considered according to a fixed hierarchy reflecting the decision maker preferences. In other cases the purpose is to build a mixed objective function combining all the objectives [1, 3, 5, 10].

The different works in this field can also be classified into exact and non-exact algorithms:

1. Exact techniques for single criterion problems are provided in [23], where a decomposition and column generation scheme is applied. Computing exact efficient solutions for districting problems is a very hard task since, in general, the size of real-world problems is unmanageable.

2. A new avenue for dealing with this problem is the use of meta-heuristics to approximate the exact non-dominated frontier. Indeed, this has already been done by several researchers [2, 27].

Most of the works published in the area deal with a specific districting problem without establishing a common “platform” for the different territory partitioning problems. The heuristics applied are also specific to each problem. Although reality is by its very nature multi-dimensional (there is “always” more than one criterion to be optimized), there are still several works applying single criterion models, which amalgamate all the dimensions in the same scale and often lead to meaningless conclusions. We can say that, in general, exact techniques are of interest mainly from a theoretical point of view: they can only be applied to small-size instances, which are unrealistic.

Main features of the proposed approach. Our approach is an evolutionary algorithm combined with local search. The representation adopted for the individuals or solutions is as close as possible to the solutions themselves: each solution is a set of subsets, where each subset represents a zone. This allows us to guide the operators (crossover and mutation) according to the specific criteria of each kind of problem. The main features of the algorithm are the following:

1. It can solve large-size instances in a reasonable CPU time and for different kinds of problems.

2. It deals with multiple criteria.

3. It can take into account constraints that are specific to each type of problem.

We intend to create a new platform that allows our algorithm to be adapted, with little effort, to any districting problem.

The Paris region case study. The pricing of public transportation tickets in the Paris region, the “carte orange”, is calculated on the basis of a partition of the Paris region into concentric zones. The only criterion considered is the distance from the center. The price increases with this distance, but it is the same within each concentric zone. A study undertaken by the Syndicat des Transports Parisiens (STP) showed that this district map no longer corresponds to the needs of the users. With the current tendency of moving services from the center of big cities to the suburbs, many users of public transportation only use it inside suburban zones, without needing to travel to the center of the town. This fact, along with some findings of a socio-economic nature, led researchers at STP and LAMSADE to study a reform of the current ticket pricing system.

The outline of the paper. This paper is organized as follows. Section 2 describes how this problem can be modelled by using graph theory concepts, proposes a taxonomy of the criteria/constraints and presents the main concepts, definitions and notation. The local search evolutionary algorithm is described in Section 3. This is followed by an example for checking its effectiveness in Section 4. Section 5 is devoted to the study of a real-world problem. Finally, several important model implementation issues are discussed in Section 6, where some suggestions for future work are also given.

Figure 1: Contiguity graph (panels (a) and (b); vertices numbered 1 to 7)

2 Problem statement

In this section we will present a model for this problem that uses a connected graph. An original criterion taxonomy is also presented.

2.1 Modelling issues

Given a territory composed of indivisible elementary territorial units or atoms, we define a contiguity graph (see Fig. 1) as a connected graph $G = (V, E)$, where $V = \{1, 2, \ldots, n\}$ denotes the set of vertices representing territorial units and $E = \{e_1, e_2, \ldots, e_k, \ldots, e_m\} \subset V \times V$ denotes the set of edges, where $e_k = \{i, j\}$ represents two adjacent elementary units $i$ and $j$, i.e., units having a common boundary.

Once the contiguity graph $G = (V, E)$ is defined, a district map can be viewed as a partition of $V$ into connected subsets of vertices. Furthermore, all the values associated with the territorial units (population, surface, etc.) can be associated with the corresponding vertices, i.e., for each $i \in V$ we have a vector $c_i = (c_{1i}, c_{2i}, \ldots, c_{ri})$ of $r$ values. Analogously, the values associated with a pair of contiguous territorial units (length of the common frontier, for example) can be associated with the corresponding edge. Thus, for each $e_j \in E$ we can have an $s$-vector $d_j = (d_{1j}, d_{2j}, \ldots, d_{sj})$. We can also have, for each pair of vertices $(i, j)$, one or more values $f_{ij}$ representing a flow movement from $i$ to $j$. It should be noticed that $i$ and $j$ need not represent contiguous territorial units.

In this context one solution $Y$ will be represented as a partition of $V$, as follows,

$$Y = \{y_1, y_2, \ldots, y_K\}$$

where $y_u \cap y_v = \emptyset$ for $u \neq v$, and $\bigcup_{1 \le u \le K} y_u = V$.
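The two conditions on a solution can be checked mechanically. The following is a minimal sketch, assuming the contiguity graph is stored as a Python dictionary mapping each vertex to the set of its adjacent vertices; the function name `is_partition` and the 7-vertex graph are illustrative assumptions and do not come from the paper.

```python
def is_partition(zones, vertices):
    """Return True if the zones are non-empty, pairwise disjoint and cover `vertices`."""
    seen = set()
    for zone in zones:
        zone = set(zone)
        if not zone or seen & zone:   # empty zone, or a vertex already assigned
            return False
        seen |= zone
    return seen == set(vertices)


# Hypothetical contiguity graph on 7 atoms: vertex -> set of adjacent vertices.
graph = {
    1: {2, 3}, 2: {1, 4}, 3: {1, 4, 5}, 4: {2, 3, 6},
    5: {3, 7}, 6: {4, 7}, 7: {5, 6},
}

Y = [[1, 2], [3, 5, 7], [4, 6]]
print(is_partition(Y, graph))                  # True
print(is_partition([[1, 2], [2, 3]], graph))   # False: vertex 2 appears twice
```

This check covers only the set-theoretic conditions; connectivity of each zone is a separate requirement.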

2.2 Districting Criteria and Constraints

Various types of criteria can be expressed to define a territorial partition. In addition, different types of constraints can be also imposed. This section aims at proposing a taxonomy for such criteria and constraints. In [4] and [8] we can find an exhaustive list of political districting problem criteria that can be generalized to other districting problems. We propose to classify districting criteria into 4 categories:

1. Homogeneity Criteria. Criteria aiming at distributing/homogenizing some attributes (aspects, characteristics, effects, etc.) in the partition as a whole or in each zone individually;

2. Geographical Criteria. Criteria aiming at defining districts that respect any geographical attribute, i.e., they account for the spatial position of the territorial units;

3. Flow Criteria. Assuming that there is a movement of flow of any type between zones (population, goods, etc.), we may wish to optimize the flow movement between zones;

4. Conformity Criteria. Criteria aiming at defining districts “compatible” with an existing territorial partition.

2.2.1 Homogeneity Criteria

Homogeneity criteria can be divided into three different sub-families:

Inter-zone criteria. We may want a partition where some attributes are uniformly distributed over all zones. For the political districting problem the number of voters in each zone should be balanced according to the “one-person-one-vote” principle. Hence, the number of voters in each district must be homogenized. If $c_i$ represents the number of voters in the unit associated with vertex $i$, then $p_u = \sum_{i \in z_u} c_i$ represents the total number of voters in zone $z_u$. This can be implemented in different ways:

1. Minimizing the difference between the number of voters of the zone containing the maximum number of voters and the one comprising the minimum number of voters:

$$\min \Bigl( \max_{1 \le u \le K} \{p_u\} - \min_{1 \le u \le K} \{p_u\} \Bigr) \tag{2}$$

2. Another possibility is to minimize the sum of the deviations from the average:

$$\min \sum_{u=1}^{K} |p_u - \bar{p}| \tag{3}$$

where $\bar{p} = \frac{1}{K} \sum_{i=1}^{n} c_i$.
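To make the two inter-zone measures (2) and (3) concrete, here is a small sketch. It assumes zones are given as lists of vertices and the voter counts as a dictionary `c`; the function names and the numbers are invented for illustration only.

```python
def zone_totals(zones, c):
    """p_u: total of attribute c over each zone."""
    return [sum(c[i] for i in zone) for zone in zones]


def range_criterion(zones, c):
    """Criterion (2): largest zone total minus smallest zone total."""
    p = zone_totals(zones, c)
    return max(p) - min(p)


def deviation_criterion(zones, c):
    """Criterion (3): sum of absolute deviations from the average zone total."""
    p = zone_totals(zones, c)
    p_bar = sum(c.values()) / len(zones)
    return sum(abs(p_u - p_bar) for p_u in p)


# Hypothetical voter counts for six atoms.
c = {1: 10, 2: 20, 3: 30, 4: 40, 5: 50, 6: 60}
balanced = [[1, 6], [2, 5], [3, 4]]      # zone totals 70, 70, 70
unbalanced = [[1, 2], [3, 4], [5, 6]]    # zone totals 30, 70, 110

print(range_criterion(balanced, c), deviation_criterion(balanced, c))
print(range_criterion(unbalanced, c), deviation_criterion(unbalanced, c))
```

Both measures vanish for a perfectly balanced map and grow with imbalance; they differ in that (2) only looks at the two extreme zones while (3) accumulates the deviation of every zone.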

Intra-zone criteria. We may want a partition where each zone is as uniform as possible according to a certain attribute, regardless of the “value” of this attribute in the remaining zones when considered individually. For example, when considering a district map for a team of salespersons, one of the criteria frequently used is to partition the territory into zones that are uniform with respect to some attribute of the population (academic qualifications, for instance), which facilitates the specialized tasks of each salesperson or agent. Two possible criteria for modelling this situation are the following:

1. To minimize the sum of the differences between the maximum and the minimum in each zone:

$$\min \sum_{u=1}^{K} (M_u - m_u) \tag{4}$$

where $M_u = \max_{i \in z_u} \{c_i\}$ and $m_u = \min_{i \in z_u} \{c_i\}$.

2. To minimize the “worst” difference between the maximum and the minimum in each zone:

$$\min \max_{1 \le u \le K} (M_u - m_u) \tag{5}$$

We call this type of homogeneity intra-zone homogeneity criteria.
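The intra-zone measures (4) and (5) can be sketched in the same style. The attribute values and function names below are hypothetical and serve only to show how the two aggregations differ.

```python
def spreads(zones, c):
    """M_u - m_u for each zone: spread of attribute c inside the zone."""
    return [max(c[i] for i in zone) - min(c[i] for i in zone) for zone in zones]


def sum_of_spreads(zones, c):
    """Criterion (4): sum over zones of (M_u - m_u)."""
    return sum(spreads(zones, c))


def worst_spread(zones, c):
    """Criterion (5): largest (M_u - m_u) over all zones."""
    return max(spreads(zones, c))


# Hypothetical attribute level (e.g. a qualification index) for six atoms.
c = {1: 3, 2: 5, 3: 9, 4: 10, 5: 4, 6: 12}
uniform = [[1, 2, 5], [3, 4, 6]]   # spreads 2 and 3
mixed = [[1, 3, 6], [2, 4, 5]]     # spreads 9 and 6

print(sum_of_spreads(uniform, c), worst_spread(uniform, c))
print(sum_of_spreads(mixed, c), worst_spread(mixed, c))
```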

Distributing criteria. We can include in this type of criteria the situation where we want balanced zones concerning some attribute. Suppose that each $c_i$ represents a production level of some service and $r_i$ the demand level. We may wish each zone to have the same production level per demand. Therefore, if $C$ is the total production and $R$ the total demand, we can minimize the sum of the deviations of the production level per demand in each zone from the average:

$$\min \sum_{u=1}^{K} \left| \frac{\sum_{i \in z_u} c_i}{\sum_{i \in z_u} r_i} - \frac{C}{R} \right| \tag{6}$$

2.2.2 Geographical Criteria

The geographical criteria aim at obtaining partitions composed of zones that are as similar as possible to a given geometric shape, or zones that include some attributes. This type of criterion is important when, for example, the zones are to be visited by a salesperson. It is then convenient that the zones be as compact as possible (close in shape to a circle or square) to minimize the distances covered. Obviously, it would be counter-productive for a salesperson to visit a long and skinny zone. This type of criterion is called compactness.

Several authors have found measures to evaluate the degree of compactness of a partition. An exhaustive list of compactness indicators can be found in [21].

Any measure that evaluates the compactness of a partition must be based on a measure that evaluates the compactness of a zone. Therefore, the compactness of the partition can be measured by the sum of the values over the zones or by the worst of these values.

To measure the compactness of a zone we can use the ratio between the area of the zone and the area of the circle whose diameter equals the maximum distance between two points in the zone. If the area of $z_u$ is $A_u$ and its greatest distance is $D_u$, then we maximize

$$\sum_{u=1}^{K} \frac{4 A_u}{\pi D_u^2}. \tag{7}$$

This measure takes values between 0 and $K$; it equals $K$ in the case of maximum compactness, when every zone coincides with a circle.
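As a quick numerical illustration of the indicator in (7), the sketch below evaluates the per-zone ratio for three idealized shapes. The helper names and the shapes are assumptions made for the example; in a real application the areas and diameters would come from the geographic data.

```python
import math


def zone_compactness(area, diameter):
    """Ratio between the zone area and the area of the circle of that diameter."""
    return 4.0 * area / (math.pi * diameter ** 2)


def partition_compactness(areas, diameters):
    """Sum of the per-zone ratios, as in (7); ranges from 0 to the number of zones."""
    return sum(zone_compactness(a, d) for a, d in zip(areas, diameters))


# Three hypothetical zones: a unit-radius disc, a unit square, a 4 x 0.25 strip.
disc = (math.pi, 2.0)
square = (1.0, math.sqrt(2.0))
strip = (1.0, math.hypot(4.0, 0.25))

for name, (area, diameter) in [("disc", disc), ("square", square), ("strip", strip)]:
    print(name, round(zone_compactness(area, diameter), 3))
```

The disc reaches the upper bound of 1, the square scores 2/π, and the long strip scores close to 0, which matches the intuition that long and skinny zones are poorly compact.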

When some units have special attributes (central hospitals, central stations, etc.) we may want to place these units in the center of their zones. To evaluate this aspect we need to know the distance between two territorial units. Let $d_{ij}$ be the distance between $i$ and $j$, and $B_{z_u}$ the set of border units of the zone $z_u$. The set $SZ$ denotes the zones with some service and $s_{z_u}$ denotes the unit containing the service. So, we want to maximize

$$\sum_{z_u \in SZ} m(z_u), \quad \text{where } m(z_u) = \min_{i \in B_{z_u}} \{d_{s_{z_u} i}\} \tag{8}$$

that is, we want to maximize the sum of the minimum distances between the service units and the border units of their zones.

2.2.3 Flow Criteria

Let us consider the flow between territorial units. In this situation it would be convenient to optimize the transfers between units of different zones. For example, in the case of a partition into zones for public transport ticket pricing, knowing the number of people commuting between each pair of units, it would be convenient to minimize the number of people commuting between different zones. For one partition $Z$, $E_Z = \{\{i, j\} : i \in z_u,\ j \in z_v,\ u \neq v\}$ represents all pairs of vertices that are in different zones. So we want to minimize

$$\sum_{\{i,j\} \in E_Z} f_{ij} \tag{9}$$

where $f_{ij}$ represents the number of people going from $i$ to $j$. Criteria of this kind are called

2.2.4 Conformity Criteria

Finally, assuming that a partition of the territory is currently in place, we may want the new partition to be as “compatible” as possible with the current one. In this case we must use a measure of compatibility between partitions.

2.2.5 Constraints

For this kind of problem there are two constraints strongly associated with its nature. One is contiguity and the other is the absence of holes or inclusions in each zone. The first one means that each zone is a single portion of land such that every part is reachable from every other part without crossing the frontiers of the zone. From the contiguity graph point of view this means that the sub-graph associated with each zone must be connected. The second one means that no zone can have other zones “inside of it”.

Most of the literature considers contiguity without providing a mathematical formulation in the context of mathematical programming. Enforcing contiguity in a districting map model requires a number of constraints that is exponential in the number of vertices of the graph [24].

The formulation of the constraint associated with the absence of holes seems to be even more difficult. The typical approach to this type of difficulty, when a practical application is sought, is to use algorithmic techniques. From the algorithmic point of view it is not difficult to test the contiguity of a zone or to verify whether there are other zones inside it.
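Both algorithmic tests can be written as graph searches. The sketch below is one possible way to do it and is not the authors' procedure: contiguity is checked by a breadth-first search restricted to the zone, and holes are detected under the additional assumption that we know the set `outer` of atoms touching the external boundary of the territory. If removing a zone cuts some atoms off from every outer atom, those atoms are enclosed by the zone. All names and the 5-atom example are hypothetical.

```python
from collections import deque


def component(graph, start, allowed):
    """Vertices of `allowed` reachable from `start` using only vertices in `allowed`."""
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w in allowed and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def is_contiguous(graph, zone):
    """True if the sub-graph induced by `zone` is connected."""
    zone = set(zone)
    return bool(zone) and component(graph, next(iter(zone)), zone) == zone


def has_hole(graph, zone, outer):
    """True if removing `zone` isolates some atoms from every atom in `outer`."""
    rest = set(graph) - set(zone)
    reachable = set()
    for v in rest & set(outer):
        if v not in reachable:
            reachable |= component(graph, v, rest)
    return reachable != rest


# Hypothetical territory: atom 0 in the middle, surrounded by the ring 1-2-3-4.
graph = {0: {1, 2, 3, 4}, 1: {0, 2, 4}, 2: {0, 1, 3}, 3: {0, 2, 4}, 4: {0, 1, 3}}
outer = {1, 2, 3, 4}

print(is_contiguous(graph, [1, 2, 3, 4]), has_hole(graph, [1, 2, 3, 4], outer))
print(is_contiguous(graph, [1, 3]), has_hole(graph, [1, 2], outer))
```

In the example, the ring zone is contiguous but encloses atom 0, while the zone {1, 3} fails the contiguity test because atoms 1 and 3 are not adjacent.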

Normally the constraint related to the absence of holes can be ignored when we have criteria that homogenize surface and compactness. It is clear that if we have two embedded zones of similar size, the outer zone will have a bad compactness value.

Another kind of constraint results from the imposition of upper and/or lower bounds on one or more criteria.

2.3 Multiple criteria background: concepts, definitions and notation

A multiple objective linear program (MOLP) may be written as follows:

$$\begin{array}{l} \max\{f_1(x) = z_1\} \\ \max\{f_2(x) = z_2\} \\ \quad \vdots \\ \max\{f_k(x) = z_k\} \\ \text{s.t.: } x \in X \end{array}$$

or

$$\text{“max” } Z = \{F(x) = z \in \mathbb{R}^k \mid x \in X\}$$

where:

$x$ is the vector of $n$ decision variables;

$f_i$ is a real function defined in $\mathbb{R}^n$ representing the $i$th objective;

$z_i$ is the criterion value (objective function value) of the $i$th objective;

$X$ is the feasible region in the decision space;

“max” means that the purpose is to maximize all the objectives simultaneously;

$F$ is a vector function composed of the $k$ functions $f_i$, $i = 1, 2, \ldots, k$;

$z$ is the objective function vector;

$Z$ is the feasible region in the criteria space.

Definition 2.1 (Dominance relation) Let $z^1, z^2 \in \mathbb{R}^k$ be two criteria vectors. Then, $z^1$ dominates $z^2$ iff $z^1 \ge z^2$ and $z^1 \neq z^2$, i.e., $z^1_i \ge z^2_i$ for all $i$ and $z^1_i > z^2_i$ for at least one $i$.

Definition 2.2 (Non-dominated solution) A point $\bar{z} \in Z$ is non-dominated iff there does not exist another $z \in Z$ such that $z \ge \bar{z}$ and $z \neq \bar{z}$. Otherwise, $\bar{z}$ is a dominated criterion vector.

Definition 2.3 (Non-dominated set) The set, $Z^{nd} \subseteq Z$, of all non-dominated criteria vectors is called non-dominated set.

Definition 2.4 (Efficient solution) A solution $x \in X$ is efficient (Pareto-optimal) if its criterion vector, $z = F(x)$, is non-dominated.

Definition 2.5 (Efficient set) The set of all efficient solutions is called efficient set and is represented by $X^{eff}$.

In multiple criteria linear integer programming, two types of non-dominated solutions can be distinguished:

Definition 2.6 (Non-dominated supported solution) A point $z \in Z$ is a non-dominated supported solution if it can be obtained by solving the following parametric mathematical programming problem:

$$\max_{z \in Z} \sum_{i=1}^{k} \lambda_i z_i \quad \text{for } \sum_{i=1}^{k} \lambda_i = 1,\ \lambda_i > 0 \text{ for } i = 1, \ldots, k.$$

Definition 2.7 (Non-dominated unsupported solution) A point $z \in Z$ is a non-dominated unsupported solution if it is non-dominated and belongs to the interior of the convex hull of $Z$, $\mathrm{Conv}(Z)$.
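The difference between supported and unsupported solutions can be seen on a tiny example. The sketch below assumes that all criteria are maximized, as in the “max” formulation, and scans a grid of strictly positive weights over a hand-made finite set of criterion vectors; the set `Z` and the function name are invented for the illustration.

```python
def weighted_sum_best(Z, lam):
    """Criterion vector of Z maximizing the weighted sum with weights `lam`."""
    return max(Z, key=lambda z: sum(l * z_i for l, z_i in zip(lam, z)))


# Hypothetical criterion vectors: (4, 4) is non-dominated but lies below the
# segment joining (0, 10) and (10, 0); (3, 3) is dominated by (4, 4).
Z = [(0, 10), (4, 4), (10, 0), (3, 3)]

found = set()
for step in range(1, 100):
    lam = (step / 100, 1 - step / 100)   # strictly positive weights summing to 1
    found.add(weighted_sum_best(Z, lam))

print(sorted(found))   # [(0, 10), (10, 0)]
```

No choice of weights ever returns (4, 4), although no other point of `Z` dominates it. This is why a method relying only on weighted sums cannot generate the whole non-dominated set in integer problems.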

In general, for large-size instances, it is not possible to enumerate the whole set $Z^{nd}$; therefore we must approximate it using non-exact methods. The symbol $\hat{Z}^{nd}$ represents the approximation of $Z^{nd}$. This brings us to the concept of potential non-dominated solution.

Definition 2.8 (Potential non-dominated solution) A point $\hat{z}$ is a potential non-dominated solution with respect to a feasible subset $\hat{Z} \subseteq Z$ iff there does not exist another $z \in \hat{Z}$ such that $z \ge \hat{z}$ and $z \neq \hat{z}$. Otherwise, $\hat{z}$ is a dominated criterion vector.

When the subset $\hat{Z}$ is omitted, it is understood from the context. Very often the subset $\hat{Z}$ represents the solutions determined by an algorithm.

Definition 2.9 (Potential non-dominated set) The set, $\hat{Z}^{nd} \subseteq Z$, of all potential non-dominated criteria vectors with respect to $\hat{Z}$ is called potential non-dominated set.
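In an implementation, the potential non-dominated set is typically kept as a list that is updated each time an algorithm produces a new criterion vector. The following sketch shows one straightforward way to do this under the maximization convention of Definition 2.1; the function names and the sample vectors are assumptions made for the example.

```python
def dominates(a, b):
    """True if a dominates b: a >= b in every criterion and a != b."""
    return all(x >= y for x, y in zip(a, b)) and tuple(a) != tuple(b)


def update_archive(archive, z):
    """Try to insert z into the list of potential non-dominated vectors.

    Returns (new_archive, inserted). z is rejected if an archived vector
    dominates or equals it; otherwise the vectors it dominates are dropped.
    """
    if any(dominates(a, z) or tuple(a) == tuple(z) for a in archive):
        return archive, False
    kept = [a for a in archive if not dominates(z, a)]
    kept.append(z)
    return kept, True


# Hypothetical stream of criterion vectors produced by some algorithm.
archive = []
for z in [(1, 5), (2, 2), (3, 4), (0, 6), (3, 5), (1, 1)]:
    archive, inserted = update_archive(archive, z)

print(archive)   # [(0, 6), (3, 5)]
```

The boolean returned by `update_archive` is convenient when some extra work, such as a local search, should only be triggered for vectors that actually enter the list.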

3 A local search evolutionary algorithm

A local search evolutionary algorithm (LSEA) results from the combination of an evolutionary algorithm with a local search. The expression hybrid evolutionary algorithm is also used in this context. There are no rules about the way these combinations are made; normally, each author relies on his/her own judgment when choosing a particular technique.

Over recent years, combinations of different heuristics have been used, thus opening the research path of hybrid algorithms [30]. They have shown their ability to provide high quality local optima. In general, genetic operators are not suited to locating a better solution close to another one [32]. Therefore, the combination of evolutionary algorithms and local search seems to be very profitable [31]. For multiple criteria optimization problems, evolutionary algorithms also seem particularly adequate because they deal simultaneously with a set of potential solutions, which allows us to approximate several solutions of the efficient set in a single run.

The main feature of our approach is to find, in a short CPU time, a set of potential non-dominated solutions that approximates the exact non-dominated frontier. In each iteration, after assigning a fitness value to each individual, two different solutions are “randomly” selected, and the crossover and mutation operators are applied to them to form a new generation. At the same time a list of potential non-dominated solutions is built. When a new solution is successfully inserted in this list, a local search is then applied to it.

This section comprises the following subsections: representing a solution; defining the initial population by using a merging procedure; assigning a fitness value to each individual; selecting individuals; the crossover operator; the mutation operator; a local search procedure; the outline of the algorithm; and some implementation issues.

Figure 2: Graph (vertices 1 to 8 grouped into zones y1, y2 and y3)

3.1 Representing a solution

A solution, $Y$, is represented as a partition, $Y = \{y_1, y_2, \ldots, y_K\}$, of the set of vertices, $V$. Each subset $y_i$ is the set of vertices of a connected subgraph of $G$. Thus, a partition is implemented by a list of lists where each list represents an element of $Y$. The following partition,

$$Y = \bigl\{\, y_1 = \{1, 3\},\ y_2 = \{2, 5, 4\},\ y_3 = \{6, 7, 8\} \,\bigr\}$$

is represented in Fig. 2. Each element, $y_i$, of $Y$ is called a zone.

3.2 Defining the initial population by using a merging procedure

In many applications, the subsets of vertices belonging to the partition are constrained. For example, there are problems where the vertices have an associated non-negative weight and the total weight of each zone is constrained by lower and/or upper bounds. When we have any type of constraint, we refer to these problems as constrained districting problems. Thus, in the process of generating new solutions we must deal with two main concerns:

1. To reach feasible solutions;

2. To obtain the best possible solution according to the criteria.

The initial population, P0, is composed of a set of individuals or solutions,

P0= {Y1, Y2, . . . , YN}.

To generate an individual we start from the trivial solution where each subset of the partition is composed of a single vertex. The order of the zones is randomly generated for each individual. Then a procedure called “Merging” is applied, which consists in grouping two neighboring zones until the number of zones fixed a priori is reached (see Algorithm 1). This procedure is also used in the crossover and mutation operators (see Fig. 3). If we have constraints concerning the number of vertices in each zone, the zone comprising the lowest number of vertices is merged with one of its neighbors. The strategy for choosing the neighboring zone generally depends on the criteria or the constraints of each problem. Normally, we use a greedy heuristic that depends on a weighted-sum of the two criteria. In order to promote diversity in the set of initial solutions, the order of the edges incident to each vertex is randomly modified each time.

Figure 3: Merging solutions (the “Merging” procedure turns the output of crossover and mutation into new solutions)

Algorithm 1 Merging
Input: A solution Y = {y1, y2, . . . , yK}
Output: A solution W = {w1, w2, . . . , wL} such that L ≤ K
while (K > the fixed number of zones) do
    Let yi, yj be two neighboring zones, chosen according to a heuristic rule defined for each type of criteria;
    Merge yj and yi;
    K ← K − 1;
end while
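The merging loop can be sketched in Python under a few assumptions: the graph `ADJ` is a hypothetical connected 8-vertex graph, and `smallest_zone_rule` is a deliberately simple stand-in for the problem-specific heuristic rule (it merges the smallest zone with a randomly chosen neighboring zone). Neither is taken from the paper.

```python
import random

# Hypothetical connected graph used only for this illustration.
ADJ = {
    1: {2, 3}, 2: {1, 3, 4, 5}, 3: {1, 2, 6}, 4: {2, 5, 7},
    5: {2, 4, 6, 8}, 6: {3, 5, 8}, 7: {4, 8}, 8: {5, 6, 7},
}


def are_neighbours(zone_a, zone_b, adj):
    """True if some edge links a vertex of zone_a to a vertex of zone_b."""
    return any(adj[v] & zone_b for v in zone_a)


def smallest_zone_rule(zones, adj, rng):
    """Illustrative rule: pair the smallest zone with a random neighbouring zone."""
    i = min(range(len(zones)), key=lambda t: len(zones[t]))
    candidates = [j for j in range(len(zones))
                  if j != i and are_neighbours(zones[i], zones[j], adj)]
    return i, rng.choice(candidates)


def merge_until(solution, adj, target, choose_pair):
    """Merge neighbouring zones until only `target` zones remain."""
    zones = [set(zone) for zone in solution]
    while len(zones) > target:
        i, j = choose_pair(zones, adj)
        zones[i] |= zones[j]   # zone j is absorbed by zone i
        del zones[j]
    return [sorted(zone) for zone in zones]


rng = random.Random(0)
start = [[v] for v in ADJ]     # trivial solution: one vertex per zone
rng.shuffle(start)             # random order of the zones
result = merge_until(start, ADJ, 3,
                     lambda zones, adj: smallest_zone_rule(zones, adj, rng))
print(result)
```

Because only neighboring zones are merged, every zone of the result stays connected; a real rule would also rank the neighbors by the criteria and check the size constraints.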

3.3 Assigning a fitness value to each individual

The strategy selected for evaluating the fitness value of each solution makes use of a well-known technique suggested by Srinivas and Deb [34], called the Non-dominated Sorting Genetic Algorithm (NSGA).

This technique is based on Pareto ranking, where the individuals of the entire population are classified into several levels according to the concept of dominance. The potential non-dominated individuals belonging to the population are identified first. These individuals form the first front of the potential non-dominated frontier. Afterwards, we assign to each of them a large dummy fitness value, F. In order to preserve the diversity of the population, these individuals, classified in different levels, then share their dummy fitness values.

The shared value of each individual, $f_i$, is determined by dividing its original fitness value by the quantity

$$m_i = \sum_{j \in ND} sh(d_{ij}), \qquad \text{where} \qquad sh(d_{ij}) = \begin{cases} 1 - \left( \dfrac{d_{ij}}{\sigma_{share}} \right)^{2} & \text{if } d_{ij} < \sigma_{share} \\ 0 & \text{otherwise.} \end{cases}$$

This quantity is proportional to the number of individuals around it. Thus,

$$f_i = \frac{F}{m_i}.$$

The value $d_{ij}$ is the Euclidean distance between two solutions $Y_i$ and $Y_j$, and $\sigma_{share}$ is the maximum distance allowed between any two solutions to become members of a niche (a set of solutions that have common features). Afterwards, this front of potential non-dominated individuals is temporarily ignored in order to process the other members of the population, with the goal of identifying the second front. The new potential non-dominated individuals are then assigned a new dummy fitness value, which is kept smaller than the minimum shared dummy fitness of the previous front. This method continues until the entire population is classified into several fronts and no more fronts can be identified. This technique is presented in Algorithm 2.

Algorithm 2 Evaluation
Input: A population P = {Y1, Y2, . . . , YN}
Output: A fitness value fi for each Yi
Paux ← P
F ← M, where M is a large dummy fitness value
while Paux ≠ ∅ do
    $\hat{Z}_{nd}$ ← all potential non-dominated individuals in Paux
    for all Yi ∈ $\hat{Z}_{nd}$ do
        Calculate mi
        fi ← F / mi
    end for
    Paux ← Paux − $\hat{Z}_{nd}$
    F ← x such that x ≤ min{fi : Yi ∈ $\hat{Z}_{nd}$}
end while

Figure 4: First front evaluation (criteria space with axes z1 and z2).

The idea of this technique is to penalize, that is, to decrease the fitness value of, the solutions that are too close, in the criteria space, to other solutions in the current front. Fig. 4 represents the calculation of $f_i$ for the first front. It should be noticed that the size of each point is inversely proportional to the number of solutions in its niche.
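The evaluation with fitness sharing can be sketched in Python as below. This is a minimal sketch under stated assumptions: both criteria are maximized, distances are measured between points in the criteria space, the niche count of an individual is taken over its own front, and the next dummy value is 90% of the minimum shared fitness of the previous front (the percentage reported in Section 3.9). The function names and the default `big_f` are ours.

```python
import math


def dominates(a, b):
    """True if point a dominates point b when every criterion is maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def shared_fitness(points, sigma_share, big_f=100.0, shrink=0.9):
    """Assign a shared dummy fitness to each point, front by front."""
    remaining = set(range(len(points)))
    fitness = [0.0] * len(points)
    dummy = big_f
    while remaining:
        # Current front: points not dominated by any other remaining point.
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        for i in front:
            m_i = 0.0
            for j in front:  # j == i contributes sh(0) = 1, so m_i >= 1
                d = math.dist(points[i], points[j])
                if d < sigma_share:
                    m_i += 1.0 - (d / sigma_share) ** 2
            fitness[i] = dummy / m_i
        remaining -= set(front)
        dummy = shrink * min(fitness[i] for i in front)
    return fitness


points = [(4, 1), (1, 4), (3.9, 1.1), (2, 2), (1, 1)]
fitness = shared_fitness(points, sigma_share=1.0)
print(fitness)
```

In this toy population the two crowded points (4, 1) and (3.9, 1.1) receive a lower fitness than the isolated points of the same front, and the dominated point (1, 1) receives less than every point of the first front.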

3.4 Selecting individuals

To select each pair of individuals we apply the roulette wheel method. By construction, the fitness value of any individual is smaller than the minimum fitness value of the previous potential non-dominated front. Therefore, the elements in the first fronts have a higher probability of being selected for the next generation than the remaining solutions.
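A minimal Python sketch of roulette wheel selection is given below, assuming strictly positive fitness values and at least two individuals. The function names are ours, and the rule of redrawing until the two indices differ is one simple way of obtaining two different solutions, not necessarily the one used in our implementation.

```python
import random


def roulette_pick(fitness, rng):
    """Return an index drawn with probability proportional to its fitness."""
    threshold = rng.uniform(0.0, sum(fitness))
    running = 0.0
    for index, value in enumerate(fitness):
        running += value
        if running >= threshold:
            return index
    return len(fitness) - 1  # guard against floating-point round-off


def select_pair(fitness, rng):
    """Draw two different individuals with the roulette wheel."""
    first = roulette_pick(fitness, rng)
    second = first
    while second == first:
        second = roulette_pick(fitness, rng)
    return first, second


rng = random.Random(0)
fitness = [100.0, 50.0, 45.0]   # e.g. first-front values, then later fronts
pair = select_pair(fitness, rng)
counts = [0, 0, 0]
for _ in range(10000):
    counts[roulette_pick(fitness, rng)] += 1
print(pair, counts)
```

Over many draws the individual with the largest fitness is picked most often, which is the selection pressure towards the first fronts described above.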

3.5 The crossover operator

The crossover operator takes two parents (solutions) and exchanges sections of their chromosomes. In our case, in order to generate an offspring solution from two parents, $Y^1 = \{y^1_1, y^1_2, \ldots, y^1_{K_1}\}$ and $Y^2 = \{y^2_1, y^2_2, \ldots, y^2_{K_2}\}$, we first choose the k best zones, $y^1_{i_1}, y^1_{i_2}, \ldots, y^1_{i_k}$, from partition $Y^1$, according to a weighted-sum. The number k is chosen randomly within the range $[\frac{1}{4}K_1, K_1 - 1]$. These k zones will belong to the offspring solution. Afterwards, we define an equivalence relation $\sim$ on the set $V \setminus \bigcup_{l=1}^{k} y^1_{i_l}$ as follows: for $v_1, v_2 \in V \setminus \bigcup_{l=1}^{k} y^1_{i_l}$,

$$v_1 \sim v_2 \iff \exists\, j \in \{1, \ldots, K_2\} \text{ such that there is a path between } v_1 \text{ and } v_2 \text{ in } y^2_j \setminus \bigcup_{l=1}^{k} y^1_{i_l}.$$

In other words, $v_1$ is equivalent to $v_2$ if and only if, for some zone $y^2_j$ in $Y^2$, there is a path between $v_1$ and $v_2$ in $y^2_j$ that avoids the vertices of $\bigcup_{l=1}^{k} y^1_{i_l}$. The offspring candidate solution is composed of $y^1_{i_1}, y^1_{i_2}, \ldots, y^1_{i_k}$ plus the equivalence classes of the relation $\sim$ defined on $V \setminus \bigcup_{l=1}^{k} y^1_{i_l}$. Generally, at this point, the number of zones is greater than the fixed one. Consequently, the “Merging” procedure is applied to group zones as described in Algorithm 1 (see Fig. 3). As we can see in Fig. 5, one chooses a zone from solution 1, composed of vertices 2, 4 and 5. In Phase I, the equivalence classes defined by the relation $\sim$ in {1, 3, 6, 7, 8} are {1}, {3, 6} and {7, 8}. Afterwards, the zones {1} and {3, 6} are merged, as we can see in Phase II.

Figure 5: Crossover operator (panels: Solution 1, Solution 2, Phase I, Phase II).
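The construction of the offspring candidate can be sketched in Python as follows. The graph `ADJ` and the second parent are hypothetical, chosen only so that the sketch reproduces the equivalence classes {1}, {3, 6} and {7, 8} of the example; the edges and the second solution of Fig. 5 cannot be recovered from the text. The function names are ours.

```python
# Hypothetical graph; the edges of Fig. 5 are not available in the text.
ADJ = {
    1: {2, 3}, 2: {1, 3, 4, 5}, 3: {1, 2, 6}, 4: {2, 5, 7},
    5: {2, 4, 6, 8}, 6: {3, 5, 8}, 7: {4, 8}, 8: {5, 6, 7},
}


def components(vertices, adj):
    """Split `vertices` into the connected components of the induced subgraph."""
    left, parts = set(vertices), []
    while left:
        stack = [left.pop()]
        part = set(stack)
        while stack:
            v = stack.pop()
            for w in adj[v] & left:   # adj[v] & left is a fresh set, safe to mutate left
                left.remove(w)
                part.add(w)
                stack.append(w)
        parts.append(sorted(part))
    return parts


def offspring_candidate(kept_zones, parent2, adj):
    """Kept zones of parent 1 plus the equivalence classes induced by parent 2."""
    removed = {v for zone in kept_zones for v in zone}
    child = [sorted(zone) for zone in kept_zones]
    for zone in parent2:
        # Each class is a connected piece of a parent-2 zone once the kept
        # vertices have been taken out.
        child.extend(components(set(zone) - removed, adj))
    return child


kept = [[2, 4, 5]]                          # zone chosen from solution 1
parent2 = [[1, 2, 4], [3, 5, 6], [7, 8]]    # hypothetical solution 2
child = offspring_candidate(kept, parent2, ADJ)
print(child)
```

The candidate has four zones here, so the “Merging” procedure would still be applied to bring it back to the fixed number of zones.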

Figure 6: Mutation operator (panels: Initial Solution, Phase I, Phase II).

3.6 The mutation operator

The mutation operator takes one solution and randomly modifies its chromosome. Typically, there is a probability associated with this operator. In our case this probability is 1, since we can control the way the new solution is obtained without incurring a great cost.

The operator starts by breaking up a set of zones (the worst zones according to a weighted-sum) in such a way that each vertex constitutes one zone. After this operation the number of zones is greater than the fixed one. Then the “Merging” procedure is applied as described in Algorithm 1 (see Fig. 3). Fig. 6 shows the different phases of the mutation. Firstly, the zone with nodes 2, 4 and 5 is chosen and is broken up into three zones (Phase I). In Phase II the zone with node 4 is merged with the zone with node 5, and the zone with node 2 is merged with the zone comprising nodes 1 and 3.

3.7 A local search procedure

In the local search procedure we use the concept of neighborhood structure. From a current solution we can determine all the neighbor solutions. Neighbor solutions are determined by moving at most one vertex from a certain zone in the current solution to one of its neighboring zones, as Fig. 7 shows.

The possibility of moving a vertex to a neighboring zone is checked. If the move is possible, it is made, thus yielding a different solution, which is then checked in order to know whether it is a potential non-dominated one. If this is the case, the solution is added to the list.

As Fig. 7 shows, when moving vertex 2 from zone 2 to zone 1, two new zones are obtained. This move is feasible, but others are not. For example, vertex 8 cannot be moved from zone 3.

Each new potential non-dominated solution, resulting either from the crossover and mutation operators or the application of local search, is placed in a queue data structure. Local search will be applied to it, later on.
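A single neighborhood move can be sketched in Python as below. The graph `ADJ` is hypothetical (the edges of Fig. 7 are not available in the text), and we assume here that a move is feasible when the vertex is adjacent to the target zone and the source zone remains non-empty and connected; size constraints are left out. The function names are ours.

```python
# Hypothetical graph; in it, vertices 6 and 7 are linked only through vertex 8.
ADJ = {
    1: {2, 3}, 2: {1, 3, 4, 5}, 3: {1, 2, 6}, 4: {2, 5, 7},
    5: {2, 4, 6, 8}, 6: {3, 5, 8}, 7: {4, 8}, 8: {5, 6, 7},
}


def is_connected(zone, adj):
    """Return True if the vertices in `zone` induce a connected subgraph."""
    zone = set(zone)
    if not zone:
        return False
    stack = [next(iter(zone))]
    seen = set(stack)
    while stack:
        v = stack.pop()
        for w in (adj[v] & zone) - seen:
            seen.add(w)
            stack.append(w)
    return seen == zone


def move_vertex(solution, v, source, target, adj):
    """Return the neighbour solution obtained by moving v, or None if infeasible."""
    src = set(solution[source]) - {v}
    touches_target = bool(adj[v] & set(solution[target]))
    if not touches_target or not is_connected(src, adj):
        return None
    new = [sorted(zone) for zone in solution]
    new[source] = sorted(src)
    new[target] = sorted(set(solution[target]) | {v})
    return new


Y = [[1, 3], [2, 4, 5], [6, 7, 8]]
moved = move_vertex(Y, 2, source=1, target=0, adj=ADJ)   # vertex 2: zone 2 -> zone 1
blocked = move_vertex(Y, 8, source=2, target=1, adj=ADJ)  # would disconnect zone 3
print(moved, blocked)
```

With this graph, moving vertex 2 yields two new zones, whereas vertex 8 cannot leave its zone because the remaining vertices would no longer be connected.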

Figure 7: Neighborhood structure (the zones z1, z2 and z3 before and after moving one vertex).

time algorithm.

3.8 Outline of the algorithm

The pseudo-code procedure for the Local Search Evolutionary Algorithm is outlined in Algorithm 3.

Algorithm 3 Outline of the algorithm
t ← 0
Find an initial population, P0
Compute Fitness in P0
From P0, initialize the set of potential non-dominated solutions, $\hat{Z}_{nd}$
while stop condition is false do
    Children ← Crossover(Pt)
    Mutant ← Mutation(Children)
    Update $\hat{Z}_{nd}$ with Children ∪ Mutant
    Local Search($\hat{Z}_{nd}$)
    Compute Fitness in Pt ∪ Children ∪ Mutant
    Pt+1 ← the best of Pt ∪ Children ∪ Mutant
    t ← t + 1
end while

3.9 Particular implementation issues

During the process of conceptualization (design) and implementation of this algorithm, some issues were treated in a very particular way. We emphasize three main issues:

1. Assigning a fitness value. One important rule in assigning a fitness value is to guarantee that the fitness value of any individual is kept smaller than the minimum fitness value of the previous potential non-dominated front. When we have to choose the next dummy fitness value, it is not enough to keep the minimum fitness value of the previous front because, in some cases, the whole population would end up with the same final fitness value. To avoid this situation, the next dummy fitness value is 90% of the minimum fitness value of the previous front. This percentage was chosen after an empirical study. With this value we get a very good distribution of the fitness values among the whole population.

2. Selecting individuals. The crossover operator is applied to a couple of previously selected solutions. One of them is chosen in the population, while the second one is selected from the set $\hat{Z}_{nd}$ that contains the potential efficient solutions. The main reason is that the new potential efficient solutions resulting from local search are kept in the set $\hat{Z}_{nd}$. They do not go directly to the population. In this way, the genetic operators get an opportunity to be applied to them.

3. Applying local search. Each new potential efficient solution resulting from the genetic operators and local search is stored in a queue structure. Therefore, when the process of applying the genetic operators stops, and while the queue is not empty, a local search is applied to the first solution in the queue, then to the next one, and so on until the last of the remaining solutions. This process stops before the queue is emptied when an upper bound on the number of local search iterations is reached.
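The queue mechanism of the third issue can be sketched in Python as follows. To keep the sketch short, a solution is identified with its point in the criteria space, both criteria are maximized, and `grid_neighbours` is a toy neighborhood on an integer grid; these are assumptions for illustration, as are all the function names.

```python
from collections import deque


def dominates(a, b):
    """True if a dominates b when both criteria are maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def try_insert(archive, candidate):
    """Insert candidate into the list of potential non-dominated points if it qualifies."""
    if candidate in archive or any(dominates(p, candidate) for p in archive):
        return False
    archive[:] = [p for p in archive if not dominates(candidate, p)]
    archive.append(candidate)
    return True


def drain_queue(archive, queue, neighbours, max_iterations):
    """Apply local search to queued solutions until the queue is empty or the cap is hit."""
    iterations = 0
    while queue and iterations < max_iterations:
        current = queue.popleft()
        iterations += 1
        for candidate in neighbours(current):
            if try_insert(archive, candidate):
                queue.append(candidate)   # local search is applied to it later on
    return iterations


def grid_neighbours(point):
    """Toy neighbourhood: one step right or up inside the region x + y <= 4."""
    x, y = point
    return [(a, b) for a, b in ((x + 1, y), (x, y + 1)) if a + b <= 4]


archive, queue = [(0, 0)], deque([(0, 0)])
used = drain_queue(archive, queue, grid_neighbours, max_iterations=100)
print(sorted(archive), used)
```

With a small value of `max_iterations` the loop stops while solutions are still waiting in the queue, which corresponds to the upper bound on local search iterations mentioned above.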

4 Illustrative examples

Our main concern in this section is to test how the algorithm behaves in practice. Frequently, instances with known results are used to evaluate the performance of metaheuristics (Section 6.3.4 in [36]). However, for our problem there are no available instances; consequently, we had to build some examples.

In this section we propose a heuristic for each kind of criterion to be tested (each criterion requires a specific heuristic). The main feature distinguishing each heuristic is the rule for choosing the two neighboring zones that will be merged.

4.1 A small-size example

We start by building a small-size bi-criteria instance with 22 vertices and 49 edges (see Fig. 8). For this instance we can calculate the entire set of efficient solutions by using the ε-constraint technique (see [35]) and the Mehrotra mathematical programming model with two criteria.

4.1.1 Evaluating criteria

Let us associate values $c^1_{ij}$ and $c^2_{ij}$ to each edge {i, j}. We want to maximize the two criteria

$$f_l(Y) = \sum_{\{i,j\} \in E_Y} c^l_{ij}, \qquad l = 1, 2,$$

where

$$E_Y = \{\{i, j\} \in E : i, j \in y_u,\; u = 1, 2, \ldots, K\}$$

represents the subset of edges between vertices belonging to the same zone.

Figure 8: Test Graph (22 vertices; each edge is labelled with its pair of weights)
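As an illustration, a criterion of this kind, namely the sum of the weights of the edges whose two endpoints fall in the same zone, can be computed with the short Python sketch below. The edge weights are hypothetical and are not those of the test graph; the function name is ours.

```python
def edge_sum_criterion(solution, weights):
    """Sum the weights of the edges whose two endpoints lie in the same zone."""
    zone_of = {v: index for index, zone in enumerate(solution) for v in zone}
    return sum(w for (i, j), w in weights.items() if zone_of[i] == zone_of[j])


# Hypothetical weights c1 on a few edges of an 8-vertex graph.
c1 = {(1, 3): 5, (1, 2): 2, (2, 4): 7, (4, 5): 1,
      (2, 5): 3, (5, 6): 4, (6, 8): 6, (7, 8): 9}
Y = [[1, 3], [2, 4, 5], [6, 7, 8]]
value = edge_sum_criterion(Y, c1)
print(value)
```

Edges (1, 2) and (5, 6) link different zones, so they do not contribute to the value.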

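4.1.2 The bi-criteria Mehrotra-like model and the ε-constraint method

Let $x = (x^k_i)$ and $r = (r^k_{ij})$ be the decision variables defined as follows:

$$x^k_i = \begin{cases} 1 & \text{if vertex } i \text{ belongs to zone } k \\ 0 & \text{otherwise,} \end{cases} \qquad r^k_{ij} = \begin{cases} 1 & \text{if edge } \{i, j\} \text{ belongs to zone } k \\ 0 & \text{otherwise.} \end{cases}$$

Assuming that the number of zones to be formed is K and that every zone must have at least L vertices, the Mehrotra model, in the bi-criteria case, with costs $c^1_{ij}$ and $c^2_{ij}$ for each edge {i, j}, can be stated as follows:

$$\begin{aligned}
\max\; z_1 = f_1(x, r) &= \sum_{k=1}^{K} \sum_{\{i,j\} \in E} c^1_{ij}\, r^k_{ij} \\
\max\; z_2 = f_2(x, r) &= \sum_{k=1}^{K} \sum_{\{i,j\} \in E} c^2_{ij}\, r^k_{ij} \\
\text{s.t.} \quad r^k_{ij} \le x^k_i, \quad r^k_{ij} &\le x^k_j, && \{i, j\} \in E && (10) \\
r^k_{ij} &\ge x^k_i + x^k_j - 1, && \{i, j\} \in E && (11) \\
\sum_{k=1}^{K} x^k_i &= 1, && i = 1, 2, \ldots, n && (12) \\
\sum_{i=1}^{n} x^k_i &\ge L, && k = 1, 2, \ldots, K && (13) \\
x^k_i,\; r^k_{ij} &\in \{0, 1\}
\end{aligned}$$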

Constraints (10) and (11) preserve the meaning of the variables: they ensure that an edge is inside a zone if and only if both of its vertices are in that zone. Constraint (12) ensures that each vertex belongs to exactly one zone, and (13) ensures that each zone has at least L vertices.

The ε-constraint problem associated with the previous bi-criteria problem can be stated as follows,

$$\begin{aligned} \max\; & z_1 = f_1(x, r) \\ \text{s.t.}\; & (x, r) \in X \\ & z_2 \ge \varepsilon \end{aligned} \qquad (14)$$

where ε is a scalar.

In the ε-constraint method, ε varies among all the values for which (14) remains feasible. So, in order to identify a set of efficient solutions, a sequence of problems (14) is solved, one for each different value of ε [7]. For the integer bi-criteria linear problem, the entire non-dominated set $Y_{nd}$ can be determined by solving a sequence of problems (14).
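The way ε sweeps the feasible range can be sketched in Python on a finite list of outcome points. This is only a stand-in: in our experiments each problem (14) is a 0-1 program handled by a solver, whereas here the inner maximization is a brute-force scan of a hypothetical list of (z1, z2) pairs. We also assume integer-valued z2, so that ε can be raised by 1 after each solve, and we break ties on z1 by the larger z2.

```python
def epsilon_constraint(points):
    """Return the non-dominated (z1, z2) pairs of a finite list, both criteria maximized."""
    frontier = []
    eps = min(z2 for _, z2 in points)
    while True:
        feasible = [p for p in points if p[1] >= eps]   # constraint z2 >= eps
        if not feasible:
            return frontier
        best = max(feasible)      # maximize z1, break ties by the larger z2
        frontier.append(best)
        eps = best[1] + 1         # assumes integer-valued z2


# Hypothetical outcome points (z1, z2) of feasible district maps.
outcomes = [(10, 1), (10, 3), (8, 5), (7, 4), (6, 6), (6, 2)]
frontier = epsilon_constraint(outcomes)
print(frontier)
```

Each pass tightens the bound on z2 just above the value reached by the previous solution, so every solve returns a new non-dominated point until the problem becomes infeasible.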

Theorem 4.1 (Theorem of equivalence [17]) Consider $\varepsilon \le \max f_2(x, r)$. If the solution $(x^*, r^*)$ solves problem (14) and, when $(x^*, r^*)$ is not unique, it leads to a maximal value for criterion $z_2$, then $(x^*, r^*)$ solves the original bi-criteria problem, that is, $(x^*, r^*)$ is an efficient solution.

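| Instance | Weight ranges | K | Min. vertices per zone | Non-dominated solutions |
|---|---|---|---|---|
| 1 | $c^1_e, c^2_e \in [1, 9]$ | 4 | 4 | 16 |
| 2 | $c^1_e, c^2_e \in [1, 9]$ | 5 | 3 | 19 |
| 3 | $c^1_e, c^2_e \in [1, 20]$ | 4 | 4 | 15 |
| 4 | $c^1_e, c^2_e \in [1, 20]$ | 5 | 3 | 17 |
| 5 | $c^1_e, c^2_e \in [10, 50]$ | 4 | 4 | 20 |
| 6 | $c^1_e, c^2_e \in [10, 50]$ | 5 | 3 | 37 |

Table 1: Test problems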

4.1.3 The entire non-dominated set

For the calculation of the non-dominated set we used the LINGO [22] solver. For the same graph (see Fig. 8) we used three different weight ranges, [1, 9], [1, 20] and [10, 50] (see Appendix A). For each range we imposed partitions with 4 zones, each having at least 4 vertices, and partitions with 5 zones, each having at least 3 vertices. Therefore, we have six different instances for the same graph. Table 1 contains the number of exact non-dominated solutions for each instance.

4.1.4 Heuristic

For this type of criteria the heuristic developed can be summarized as follows:

1. Choose the zone with the lowest number of vertices, $y_m$.

2. For each neighbor, $y_k$, of $y_m$ determine the sum $\sigma_1$ ($\sigma_2$) of the edge weights $c^1_{ij}$ ($c^2_{ij}$) linking $y_k$ and $y_m$. A weighted-sum $\lambda_1 \sigma_1 + \lambda_2 \sigma_2$ is used to rank all the neighbors in decreasing order.

3. Among its neighbors, choose the first zone $y_{N_m}$ for which the constraint associated with the number of vertices per zone is not violated.

4. Merge zones $y_m$ and $y_{N_m}$.

The objective of steps 1 and 3 is to obtain solutions that fulfill the constraint associated with the number of vertices per zone. Step 2 aims at choosing the best contribution to the criteria.

4.1.5 Results

The parameters for the LSEA are the following:

1. population size pop size = 400;

3. maximum generations max gen = 20.

Figure 9: Paris region contiguity graph (dots: elementary territorial units; lines: adjacent elementary territorial units)

For each instance we have run the algorithm ten times.

As for the first five problems, the algorithm was able to find all the exact efficient solutions. Indeed, almost all the solutions were found with the heuristic used to determine the initial population. As for the last problem, the algorithm, after running ten times, found all the exact solutions 7 times and identified 36 exact solutions in the remaining runs.

These results allow us to conclude that the heuristic used to determine the initial population is effective, since almost all the non-dominated solutions were determined by it. However, this fact did not allow us to test the “work” of the genetic operators.

4.2 A large-size example

To develop a more rigorous evaluation of the performance of our algorithm, we decided to use a large-size instance with 1300 vertices and 3719 edges (see Fig. 9), which corresponds indeed to the real-world graph concerning the Paris region transportation problem.


4.2.1 Evaluating criteria and constraints

To have some idea of how the algorithm behaves with this data we used the following strategy, which allows us to know two exact solutions of the efficient set: the optimal solution for each criterion. One instance, with two values $c^1_i$ and $c^2_i$ for each vertex, was built as follows:

1. Consider two homogeneity criteria, $z_1$ and $z_2$, as the ones described in Section 2.2.1, “measure” 4;

2. From a large set of solutions, previously generated, select the two most distant solutions $Y^{*1} = \{y^1_1, y^1_2, \ldots, y^1_k\}$ and $Y^{*2} = \{y^2_1, y^2_2, \ldots, y^2_k\}$ according to a distance in the decision space defined a priori.

3. Use the solution $Y^{*1}$ ($Y^{*2}$) to set the values $c^1_i$ ($c^2_i$). For each zone give the same value to all its vertices, although different values are assigned to each zone.

Therefore, we have the guarantee that $Y^{*1}$ ($Y^{*2}$) is the optimum solution for the first (second) criterion with value 0. That is,

$$f_l(Y^{*l}) = \sum_{u=1}^{k} \left( \max_{i \in y^l_u} \{c^l_i\} - \min_{i \in y^l_u} \{c^l_i\} \right) = 0 \qquad \text{for } l = 1, 2.$$

The constraints are the following:

1. number of zones: 30;

2. number of vertices per zone: between 20 and 70.
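The construction of this instance can be checked on a toy example with the Python sketch below, which evaluates a criterion of the form "sum over zones of the maximum minus the minimum vertex value". The 8-vertex partition and the values are hypothetical; the function name is ours.

```python
def range_criterion(solution, value):
    """Sum over the zones of (max - min) of the vertex values."""
    return sum(max(value[v] for v in zone) - min(value[v] for v in zone)
               for zone in solution)


# Hypothetical reference solution and values: constant inside each zone,
# different from one zone to another.
Y_star = [[1, 3], [2, 4, 5], [6, 7, 8]]
c = {1: 10, 3: 10, 2: 20, 4: 20, 5: 20, 6: 30, 7: 30, 8: 30}

at_reference = range_criterion(Y_star, c)
elsewhere = range_criterion([[1, 2, 3], [4, 5], [6, 7, 8]], c)
print(at_reference, elsewhere)
```

The reference solution reaches the value 0, which is the minimum possible since every term of the sum is non-negative, while a partition that mixes vertices of two reference zones gets a strictly positive value.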

4.2.2 Heuristic

For this type of criteria the heuristic developed can be summarized as follows:

1. Choose the zone with the lowest number of vertices, $z_m$.

2. For each neighbor, $z_k$, determine the difference $\delta_1$ ($\delta_2$) between the maximum and the minimum weight $c^1_i$ ($c^2_i$) after a possible merging with $z_m$. A weighted-sum $\lambda_1 \delta_1 + \lambda_2 \delta_2$ is used to rank all the neighbors in increasing order.

3. Among its neighbors, choose the first zone $z_{N_m}$ for which the constraint associated with the number of vertices per zone is not violated.

4. Merge the zones $z_m$ and $z_{N_m}$.

Figure 10: Initial potential non-dominated solutions (axes z1 and z2; each dot is one potential non-dominated solution; $Y^{*1}$ and $Y^{*2}$ are marked)

4.2.3 Results

The parameters for the LSEA were fixed as follows:

1. population size pop size = 300;

2. crossover probability cp = 0.3;

When the criteria weights $(\lambda_1, \lambda_2)$ are set to (1, 0) and (0, 1), the heuristic applied for generating a solution determines, almost always, the two efficient solutions $Y^{*1}$ and $Y^{*2}$. Fig. 10 shows the 59 potential non-dominated solutions extracted from the initial population. As we can see, the solutions $Y^{*1}$ and $Y^{*2}$ were found.

Fig. 11 shows the potential non-dominated solutions identified after 50 generations. In total, 190 potential efficient solutions were found. Many of them, however, have the same values in the criteria space. The figure illustrates the performance of the genetic operators.

Nevertheless, it is not possible to determine the set $Z_{nd}$ for this instance, which would allow us to evaluate the performance of our algorithm in a rigorous way. Our intuition that we got good results is based on two points:

1. The heuristic that chooses the two neighboring zones to be merged has shown a very good behavior. Therefore, in the generation process of new solutions, and when requested, this heuristic was able to determine, with high frequency, the solutions $Y^{*1}$ and $Y^{*2}$;

Figure 11: Final potential non-dominated solutions (axes z1 and z2; "+" marks the initial and "·" the final potential non-dominated solutions)

2. Since this type of criterion has a “smooth” variation (the weights belong to a small range), it is foreseeable that the curve of the potential non-dominated solutions presents a certain “smoothness”, as shown in Fig. 11.

5 Case study: The public transportation pricing system problem

Observation of social and economic trends (falling population in the inner city of Paris, increased commuter flows between the center and the suburbs, and demand for local ticket prices) led the Syndicat des Transports d’Ile de France (STIF, the Paris transportation authority) to re-examine the current ticket pricing system.

The pricing system is grounded on the definition of geographical zones. The current districting map or zoning is defined by concentric zones, which does not correspond to the use of the transportation network (see Fig. 12). Therefore one of the first objectives of the reform is to modify the zoning map grounding the ticket prices. Such a problem considers approximately 1300 atoms or elementary units (the municipalities in the Paris region) and each zone of the new zoning map is supposed to represent autonomous units as far as public transportation is concerned. This autonomy of zones is modelled by several criteria (see [26]). Hence this problem involves multiple criteria and is of a combinatorial structure. We applied our algorithm to this real-world problem.


Figure 12: The current “Carte Orange” map

5.1 Data set

The descriptors were defined to highlight the acceptability of a zone in a districting map choice and were validated by the “stakeholders” in the ticket pricing reform process. They are grouped according to the type of concern they refer to. Thus, for each territorial unit we have some real data that allow us to build the criteria from the following descriptors:

1. Location of the zone with respect to the network
• Number of stations in rail network ($rs_i$);
• Number of buses on road service;
• Density of the internal offer;
• Density of the external offer on the rail network;
• Density of the bus external offer;
• Location of the stations in rail network.

2. Mobility structure within a zone
• Commuting;
• Presence of public services.

3. Zone corresponding to administrative structures
• Conformity to the current department boundaries;
• Conformity to the current urban community boundaries.

4. Centers of attraction in the zone
• Location of shopping centers and malls;
• Location of healthcare centers.

5. Social nature
• population ($pop_i$);
• active population ($act\_pop_i$);
• homes without cars ($h0c_i$);
• homes with one car ($h1c_i$);
• homes with two or more cars ($h2c_i$).

6. Geographical nature
• surface ($surf_i$).

From the descriptors $h0c_i$, $h1c_i$ and $h2c_i$ on each unit i we constructed two new descriptors:

1. The proportion of homes with two or more cars,

$$ph2c_i = \frac{h2c_i}{h0c_i + h1c_i + h2c_i};$$

2. The proportion of homes with one or more cars,

$$ph1c_i = \frac{h2c_i + h1c_i}{h0c_i + h1c_i + h2c_i}.$$
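For instance, the two derived descriptors can be computed as in the following Python sketch. The counts used in the example are hypothetical and do not come from the STIF data; the function name is ours.

```python
def car_ownership_shares(h0c, h1c, h2c):
    """Return (ph2c, ph1c) for one territorial unit from its three home counts."""
    homes = h0c + h1c + h2c
    ph2c = h2c / homes             # homes with two or more cars
    ph1c = (h2c + h1c) / homes     # homes with one or more cars
    return ph2c, ph1c


# Hypothetical unit: 200 homes without car, 500 with one car, 300 with two or more.
ph2c, ph1c = car_ownership_shares(200, 500, 300)
print(ph2c, ph1c)
```

By construction ph1c is never smaller than ph2c, since every home counted in the first proportion is also counted in the second.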

5.2 Criteria and constraints

The need to create criteria for comparing zones is mainly due to the fact that several districting map choices have to be made by stakeholders. Stakeholders must, therefore, be able to compare their choices of zoning using criteria which have been accepted by consensus as a basis for comparison. The criteria were chosen by all and were deemed suitable for this task. Some of them are presented below.

With the available data we built a set of criteria. Each criterion is formed in two stages:

1. A value for each zone $y_u$ is determined.

2. The set of values is aggregated into a single number representing the value of the criterion for a solution $Y = \{y_1, y_2, \ldots, y_k\}$.

| Data | Evaluation of $y_u$ | Evaluation of $Y$ | Max/Min |
|---|---|---|---|
| $surf_i$ | $S(y_u) = \sum_{i \in y_u} surf_i$ | $f_1(Y) = \max_{y_u \in Y} S(y_u) - \min_{y_u \in Y} S(y_u)$ | Min |
| $pop_i$ | $P(y_u) = \sum_{i \in y_u} pop_i$ | $f_2(Y) = \max_{y_u \in Y} P(y_u) - \min_{y_u \in Y} P(y_u)$ | Min |
| $act\_pop_i$ | $AP(y_u) = \sum_{i \in y_u} act\_pop_i$ | $f_3(Y) = \max_{y_u \in Y} AP(y_u) - \min_{y_u \in Y} AP(y_u)$ | Min |
| $rs_i$ | $RS(y_u) = \sum_{i \in y_u} rs_i$ | $f_4(Y) = \min_{y_u \in Y} RS(y_u)$ | Max |
| $ph2c_i$ | $H2(y_u) = \max_{i \in y_u} ph2c_i - \min_{i \in y_u} ph2c_i$ | $f_5(Y) = \max_{y_u \in Y} H2(y_u)$ | Min |
| $ph1c_i$ | $H1(y_u) = \max_{i \in y_u} ph1c_i - \min_{i \in y_u} ph1c_i$ | $f_6(Y) = \max_{y_u \in Y} H1(y_u)$ | Min |

Table 2: The criteria

Tab. 2 shows the criteria. All of them are homogeneity criteria (see Section 2.2.1). The inter-zone homogeneity criteria are:

• f1, surface homogenization;

• f2, population homogenization;

• f3, active population homogenization;

• f4, rail station homogenization.

The intra-zone homogeneity criteria are:

• f5, homogenization of the proportion of homes with 2 or more cars;

• f6, homogenization of the proportion of homes with 1 or more cars.
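The two-stage construction of the criteria can be sketched in Python as below, with one function for the inter-zone form used by f1, f2 and f3 (largest zone total minus smallest zone total) and one for the intra-zone form used by f5 and f6 (largest within-zone range). The partition and the data are hypothetical, proportions are written in percent to keep the arithmetic exact, and the function names are ours.

```python
def inter_zone_spread(solution, value):
    """Stage 1: total per zone. Stage 2: largest total minus smallest total."""
    totals = [sum(value[i] for i in zone) for zone in solution]
    return max(totals) - min(totals)


def worst_intra_zone_range(solution, value):
    """Stage 1: (max - min) inside each zone. Stage 2: largest of these ranges."""
    return max(max(value[i] for i in zone) - min(value[i] for i in zone)
               for zone in solution)


Y = [[1, 3], [2, 4, 5], [6, 7, 8]]
surf = {1: 2, 3: 3, 2: 1, 4: 1, 5: 1, 6: 4, 7: 2, 8: 2}            # hypothetical surfaces
ph2c = {1: 20, 3: 50, 2: 10, 4: 40, 5: 30, 6: 60, 7: 60, 8: 70}    # hypothetical, in percent

f1 = inter_zone_spread(Y, surf)
f5 = worst_intra_zone_range(Y, ph2c)
print(f1, f5)
```

Both values are to be minimized: f1 is 0 when all zones have the same total surface, and f5 is 0 when every zone contains units with identical proportions.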

To build up bi-criteria problems we have coupled only the relevant pairs. These pairs are the following: (f1, f2), (f1, f3), (f1, f4), (f1, f5) and (f1, f6). Although we are able to apply the algorithm with all the criteria, we opted for couples of criteria. In this way we can visualize the set of potential non-dominated solutions, and the “stakeholders” found it a very interesting starting point.

| Couple | K = 20 | K = 25 | K = 30 | K ∈ [20, 30] |
|---|---|---|---|---|
| (f1, f2) | 199 | 329 | 167 | 250 |
| (f1, f3) | 82 | 307 | 160 | 158 |
| (f1, f4) | 37 | 16 | 51 | 57 |
| (f1, f5) | 72 | 35 | 334 | 97 |
| (f1, f6) | 347 | 567 | 125 | 988 |

Table 3: Number of potential efficient solutions.

The constraints are related to the compactness, the number of zones and the number of units per zone. The degree of compactness C(Y) of a district map, $Y = \{y_1, y_2, \ldots, y_k\}$, is equal to the degree of compactness $c(y_u)$ of its worst zone, i.e.

$$C(Y) = \min_{y_u \in Y} c(y_u).$$

The degree of compactness of a zone is the quotient between its surface, $S(y_u)$, and the surface of the smallest circle that encloses it, $S(y^\circ_u)$, i.e.

$$c(y_u) = \frac{S(y_u)}{S(y^\circ_u)}.$$
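The compactness measure can be illustrated with the Python sketch below. Finding the smallest enclosing circle of a zone is a geometric problem of its own and is not shown; we assume here that its radius is already known for each zone. The function names and the numerical data are hypothetical.

```python
import math


def zone_compactness(surface, enclosing_radius):
    """Surface of the zone divided by the surface of its smallest enclosing circle."""
    return surface / (math.pi * enclosing_radius ** 2)


def map_compactness(zones):
    """Degree of compactness of a district map: that of its worst zone."""
    return min(zone_compactness(surface, radius) for surface, radius in zones)


# Hypothetical zones given as (surface, radius of the smallest enclosing circle):
# a 2 x 2 square (radius sqrt(2)) and a disc of radius 1.
zones = [(4.0, math.sqrt(2.0)), (math.pi, 1.0)]
print(map_compactness(zones))
```

A disc has compactness 1, the largest possible value, while the square scores 2/π, so the map inherits the compactness of the square.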

We decided to choose empirically an acceptable limit for compactness and to make tests with the following characteristics:

1. A fixed number of zones: 20, 25 and 30;

2. A variable number of zones: between 20 and 30;

3. A number of units per zone: between 20 and 110.

5.3 Results

In this case the parameters for the LSEA were the following:

1. population size pop size = 700;

2. crossover probability cp = 0.5;

Table 3 presents the number of potential efficient solutions found for each couple of criteria. In many cases the number of corresponding points in the criteria space is very small. For example, the couple (f1, f6) with K ∈ [20, 30] has 988 potential efficient solutions, but in the criteria space these solutions correspond to only 7 points.

In Figures 13–16 we can see the plots of the initial and final non-dominated solutions for the pair (f1, f3) (the homogenization of surface and active population) when K = 20, K = 25, K = 30 and K ∈ [20, 30]. For all of them the progress made by the LSEA is clear. In some cases the values of f1 and f3 improved by more than 50%. The plots for the remaining couples are presented in Appendix B. Fig. 17 represents the best district map concerning surface homogenization, criterion f1, when K = 25.

Figure 13: Potential non-dominated solutions: K = 20 (axes: f1(100.000) and f3(100.000))

The results reported in this section reveal more or less how the decision making process has evolved since we started the analysis of the case study, in particular the elements that concern the “resolution” of the “problem”. Several aspects should be taken into account for a better understanding of the decision making process:

1. The current model comprises 6 criteria built according to the descriptors presented in Section 5.1.

2. The current algorithm is able to deal with all the criteria simultaneously.

3. However, from a practical point of view, the “stakeholders” at STIF were unable to capture several aspects of the problem and proposed to start the analysis at an elementary level, making it easier to understand the real-world decision making process.

4. Following their suggestion, we decided to analyze the problem taking into account only pairs of criteria and to observe what happens in the criteria space.

5. The generation of the potential non-dominated frontier was well accepted by the “stakeholders”, but they always wanted to focus their study on a particular region of that frontier.

6. After locating that region, some solutions were picked and the associated maps were built.

Figure 14: Potential non-dominated solutions: K = 25 (axes: f1(100.000) and f3(100.000))

Figure 15: Potential non-dominated solutions: K = 30 (axes: f1(100.000) and f3(100.000))

Figure 16: Potential non-dominated solutions: K ∈ [20, 30] (axes: f1(100.000) and f3(100.000))

6 Conclusions and future research

In view of the large scope of application areas dealing with districting problems and the increasing number of published works in this field, it is necessary to develop a new terminology and to build bridges among the different districting problems. This research represents an initial attempt to characterize all districting problems. In this paper we presented a new taxonomy for different kinds of criteria, classifying them into four classes. A new local search evolutionary algorithm for this type of problem was also presented. Our algorithm is a hybridization of recombination operators with local search that allows the use of local heuristics taking into account the nature of each criterion. We used a solution coding that allowed us to guide the genetic operators towards the criteria. The combination with local search makes it possible to check all efficient solutions near the newly found solutions.

Computational experiments and results showed that the new algorithm can effectively generate all the exact efficient solutions for the small-size instances tested, with a single exception in a few runs of one instance. For large-size instances the algorithm generates high quality potential efficient solutions from all regions of the non-dominated set. The tests performed showed how well evolutionary algorithms and local search combine. The possibility of guiding the genetic operators also markedly increased convergence, which allowed good solutions to be found very quickly. On the whole, the research work not only develops the districting problem but also shows the potential power of EAs for combinatorial problems with multiple criteria.

On the one hand, we believe that our work has a great potential for development. Its great flexibility allows it to be adapted to many kinds of problems of a different nature. On the other hand, the LSEA can be improved in some aspects: the search in the neighborhood structure can be improved because there is a high level of intersection between neighborhoods. In the future, a more or less “automatically” interactive procedure should be implemented in order to find the “best” solution according to the “stakeholders” preferences.

Acknowledgements

This work has benefited from the Luso-French grant PESSOA 2004 (GRICES/Ambassade de France au Portugal). The first two authors would like to acknowledge the partial support from the MONET research project (POCTI/GES/37707/2001), and the second author acknowledges the support of the grant SFRH/BDP/6800/2001 (Fundação para a Ciência e Tecnologia, Portugal).

References

[1] P.K. Bergey, C.T. Ragsdale, and M. Hoskote. A decision support system for the electrical power districting problem. Decision Support Systems, 36:1–17, 2003.

[2] P.K. Bergey, C.T. Ragsdale, and M. Hoskote. A simulated annealing genetic algorithm for the electrical power districting problem. Annals of Operations Research, 121:33–55, 2003.

[3] J.M. Bourjolly, G. Laporte, and J.M. Rousseau. Découpage électoral automatisé: Application à l’île de Montréal. INFOR, 19:113–124, 1981.

[4] B. Bozkaya. Political Districting: A Tabu Search Algorithm and Geographical Interfaces. PhD thesis, University of Alberta, 1999.

[5] B. Bozkaya, E. Erkut, and G. Laporte. A tabu search heuristic and adaptive memory procedure for political districting. European Journal of Operational Research, 144:12–26, 2003.

[6] C.W. Chance. Representation and reapportionment. Political studies: Number 2, Dept. of Political Science, Ohio State University, Columbus, 1965.

[7] V. Chankong and Y.Y. Haimes. Multiobjective Decision Making: Theory and Methodology. Elsevier, North-Holland, New York, 1983.

[8] P.G. Cortona, C. Manzi, A. Pennisi, F. Ricca, and B. Simeone. Evaluation and Optimization of Electoral Systems. SIAM Monographs on Discrete Mathematics and Applications, Philadelphia, 1999.

[9] S.J. D’Amico, S.J. Wang, R. Batta, and C.M. Rump. A simulated annealing approach to police district design. Computers & Operations Research, 29:667–684, 2002.

[10] R.F. Deckro. Multiple objective districting: A general heuristic approach using multiple criteria. Operational Research Quarterly, 28:953–961, 1979.

[11] C. Easingwood. A heuristic approach to selecting sales regions and territories. Operational Research Quarterly, 24(4):527–534, 1973.

[12] J.A. Ferland and G. Guénette. Decision support system for the school districting problem. Operations Research, 38:15–21, 1990.

[13] B. Fleischmann and J.N. Paraschis. Solving a large scale districting problem: A case report. Computers & Operations Research, 15(6):521–533, 1988.