

Three contributions to the PROMETHEE II method

Stefan EPPE

Thesis presented in fulfilment of the requirements for the degree of Doctor in Engineering Sciences and Technology

Academic year 2014 – 2015

Computer and Decision Engineering (CoDE), Ecole polytechnique de Bruxelles

Université libre de Bruxelles


Foreword

The present thesis is an aggregation of published contributions related to the PROMETHEE II method. The aim of the introductory chapter is to offer a synthesised, self-contained, and notationally unified presentation of the scientific contributions that constitute the thesis. As this chapter does not give the details of the journal articles and conference papers, their reprints are provided in Chapter 2. In appendix, we also provide reprints of published works related to the general topic of the thesis, giving a complete overview of our research in this topic area (Appendix A).

Although not directly associated with the thesis' main subject area, we have also worked on multi-criteria clustering, integrating the specificities of multi-criteria preference relations into the distance measure. The related journal article and conference paper are provided in Appendix B.

Finally, we would like to mention here that we assume that the reader is familiar with the field of multi-criteria decision aid and its different approaches, including outranking methods.



Contents

Foreword

1 Synthesis
  1.1 Introduction
  1.2 General methodological approach
  1.3 Notations
  1.4 Approximating PROMETHEE's net flow scores
    1.4.1 The piecewise linear approximation model (PLA)
    1.4.2 Main results of numerical simulations
    1.4.3 Conclusions & outlook
  1.5 Eliciting PROMETHEE's weight parameters
    1.5.1 The extended Q-Eval method
    1.5.2 Evolutionary multi-objective optimisation
    1.5.3 Conclusions
  1.6 Determining an exact condition for rank reversal
    1.6.1 Main results
    1.6.2 Conclusions

2 Core thesis publication reprints

3 General conclusions

Bibliography

A Reprints of publications related to the thesis

B Paper reprints relating to the secondary research topic of Multi-criteria Relational Clustering


Chapter 1

Synthesis

1.1 Introduction

PROMETHEE is a family of widely used multi-criteria decision aid methods (Behzadian et al. 2010; Brans and Mareschal 2005) that are based on outranking relations. Amongst the set of available methods, we will focus on the PROMETHEE II method, which provides a complete ranking over a set of actions considered by a decision maker.

A preliminary work (Eppe et al. 2011) has shown that a direct integration of the PROMETHEE II preference model into an evolutionary multi-objective optimisation (EMO) algorithm is not straightforward and presents some difficulties. Indeed, a growing number of authors (Coello 2000; Köksalan 2009) have drawn attention to the need to investigate alternative preference models to the Pareto dominance relation, particularly when addressing many-objective combinatorial problems, i.e., problems that have more than three conflicting objectives. As a matter of fact, besides the difficulty of visualising (an approximation of) the Pareto optimal frontier in more than two dimensions, the exponentially growing fraction of non-dominated solutions with respect to the number of feasible solutions is probably the most impeding factor. For 7 objectives, for instance, an approximation of the Pareto optimal solution set is not of much help to an actual decision maker, because as much as 98% of the feasible solutions may already be efficient (Farina and Amato 2002). Our first attempts aimed at strongly reducing the number of potential solutions presented to the decision maker by integrating the PROMETHEE II preference model into an optimisation algorithm, hence actively guiding the search towards supposedly preferred regions of the Pareto optimal frontier. However, this approach has shed light on:

1. the computational load of calculating PROMETHEE II's net flow scores;

2. the sensitivity regarding the choice of the preference parameters;

3. the occurrence of rank reversal.

Although returning to the initial question of integrating the PROMETHEE II preference model into an EMO algorithm is certainly of interest and could be the subject of future work, we have chosen to focus on three topics that provide elements to solve the above-mentioned difficulties. Below, we first give a high-level description of these contributions and summarise their main results. Each contribution is then described in more detail in the subsequent sections.



Contribution 1: Approximating PROMETHEE II's net flow scores

• Stefan Eppe and Yves De Smet (2014b). "Approximating PROMETHEE II's net flow scores by piecewise linear value functions". In: European Journal of Operational Research 233.3, pp. 651–659

PROMETHEE II is a prominent method for multi-criteria decision aid that builds a complete ranking on a set of potential actions by assigning each of them a so-called net flow score. However, to calculate these scores, each pair of actions has to be compared, causing the computational load to increase quadratically with the number of actions, eventually leading to prohibitive execution times for large decision problems. For some applications, a trade-off between the ranking's accuracy and the required evaluation time may nevertheless be acceptable. Therefore, as a first contribution, we propose a piecewise linear model that approximates PROMETHEE II's net flow scores and reduces the computational complexity (with respect to the number of actions) from quadratic to linear. Simulations on artificial problem instances allow us to quantify this time/quality trade-off and to provide probabilistic bounds on the problem size above which our model approximates PROMETHEE II's rankings in a satisfying way. The simulations show, for instance, that for decision problems of 10,000 actions evaluated on 7 criteria, the Pearson correlation coefficient between the original scores and our approximation is at least 0.97.

When put in balance with computation times that are more than 7,000 times faster than for the exact PROMETHEE II model, the proposed approximation model represents an interesting alternative for large problem instances. Areas where the ability to rapidly handle vast amounts of multi-dimensional information is of paramount importance relate, for instance, to the advent of the big data phenomenon. Other possible application domains are on-line configuration tools and, as already mentioned, the integration of preferences into an EMO algorithm.

Contribution 2: Eliciting PROMETHEE II's preference parameters

• Stefan Eppe, Yves De Smet, and Thomas Stützle (2011). “A bi-objective optimization model to eliciting decision maker’s preferences for the PROMETHEE II method”. In: Algorithmic Decision Theory, Third International Conference, ADT 2011. Ed. by Ronen I. Brafman, Fred S. Roberts, and Alexis Tsoukiàs. Vol. 6992. Lecture Notes in Computer Science. Springer, Heidelberg, Germany, pp. 56–66

• Stefan Eppe and Yves De Smet (2014a). “An adaptive questioning procedure for eliciting PROMETHEE II’s weight parameters”. In: International Journal of Multicriteria Decision Making 4.1, pp. 1–30

Eliciting the preferences of a (group of) decision maker(s) is an important and challenging aspect when applying multi-criteria decision aid methods to real applications. Yet it is still a widely neglected one, particularly for the PROMETHEE methods. As a second contribution, we propose two different approaches to eliciting PROMETHEE II's preference parameters. Both are based on the aggregation/disaggregation approach (Jacquet-Lagrèze and Siskos 1982): given a set of actions, the decision maker is asked to provide holistic information, stating, for instance, his overall preference of one action over another, rather than giving information at the preference parameter level.

The first method is based on a bi-objective optimisation algorithm aiming at minimising the incoherence of the elicited weights (and the ranking they induce) while simultaneously maximising the "stability" of the elicited parameters. Encouraging simulation-based results show that the stability of a set of preference parameters can be greatly improved at only a small cost in terms of incoherences (constraint violations).

The second method that we propose is an extension of an existing procedure named Q-Eval (Iyengar, Lee, and Campbell 2001). It has been developed in the context of multi-attribute utility and refines the estimation of the preference parameters by iteratively querying the decision maker (DM) through pairwise action comparisons. We have adapted the method to PROMETHEE II and implemented two additional types of queries. These queries have led us to further extend the procedure so that the query type is adaptively selected by the algorithm. In this particular contribution, the focus has been put on minimising the number of queries with which a decision maker has to be burdened, and the obtained results are very positive in this respect.

However, although quite successful in eliciting the criteria's relative weights, both approaches have shown the difficulty of determining the indifference and preference thresholds. As will be suggested below, findings concerning the approximation of PROMETHEE II's net flow scores (the first contribution) may provide useful insights that could also facilitate eliciting the thresholds.

Contribution 3: Determining the condition for rank reversal

• Stefan Eppe and Yves De Smet (2015). On the influence of altering the action set on PROMETHEE II’s relative ranks. Tech. rep. TR/SMG/2015-002

Note. This technical report has been submitted to the Omega journal on 20th March, 2015.

As for several multi-criteria decision aid methods, the relative ranks of two actions induced by PROMETHEE II may be inverted when the original set is altered (Keyser and Peeters 1996). This phenomenon is known as rank reversal and is much debated in the multi-criteria decision aid (MCDA) community. As a third contribution, we formalise rank reversal for the PROMETHEE II method and derive the exact conditions for its occurrence when one or more actions are added to or removed from the original set. These conditions eventually lead us to (1) assess whether or not rank reversal between a given pair of actions is at all possible, and (2) characterise the evaluations of the actions that have to be added or removed to induce rank reversal. We also propose two metrics that express the "strength" of and the "sensitivity" towards rank reversal; we show on a toy example how they could be used in a decision making process.

1.2 General methodological approach

The common approach for the validation process of our contributions is the use of numerical simulations. The use of artificially generated datasets rather than "real" ones (used as illustrative applications) is mainly motivated by the repeatability of simulations done on different datasets for which specific characteristics, such as cardinality or distribution, can be freely chosen. We have standardised our approach for generating artificial data sets by using a set of five different random distributions (Figure 1.1). For most simulations, we randomly assign one of these distributions to each criterion (each distribution can be assigned to zero, one, or more criteria) to foster instances with mixed distributions (MX).

However, our approach may still be subject to criticism. In particular, the evaluations' correlation on two or more criteria is not explicitly taken into account, although its effect may, in general, not be negligible.

Figure 1.1: Shape of the probability density functions f(y) used to generate random instances, for the five distributions D1–D5: f(y) is the probability density that a given action evaluates to y.

We assume for all generated datasets used in this work that criteria are independent, but this hypothesis only has a limited impact on our simulations. Indeed, we mainly focus on the unicriterion level (e.g., for the approximation of net flow scores in Section 1.4 or the rank reversal phenomenon in Section 1.6), merely aggregating these results through a weighted sum as a last step.

On the contrary, for the ease of presentation, we also make some explicit assumptions that do not impact the validity of the presented results in any way:

– we consider maximisation problems; the case of criteria that have to be minimised requires only trivial, purely formal changes to the presented results;

– the evaluation range on each criterion is normalised: g_h(a_i) ∈ [0, 1], ∀(i, h) ∈ I × H. A normalisation being performed anyway through the PROMETHEE II preference function, this assumption has no practical consequence.

1.3 Notations

The notations used in the contributing papers are not fully consistent. We therefore introduce here some unified notation that is used to present the contributions in this introductory chapter.

Let A = {a_1, ..., a_n} be a set of n actions. Each action a_i, i ∈ I = {1, ..., n}, is described by means of m evaluations g_h(a_i), h ∈ H = {1, ..., m}. Without loss of generality, we assume that g_h(a_i) ∈ [0, 1], ∀(i, h) ∈ I × H.

To compare any pair of actions (a_i, a_j) ∈ A × A, a so-called preference function P_h(a_i, a_j) is introduced for each criterion index h to express the preference degree, on that criterion, of one action over the other. Although six different types of preference functions are proposed in the literature (Brans and Mareschal 2005), we will limit ourselves to the widely used and versatile piecewise linear ("V-shaped with indifference") preference function P_h : A × A → [0, 1], with indifference and preference thresholds respectively denoted by q_h and p_h (Figure 1.2):

P_h(a_i, a_j) = 0,                                        if Δg_h(a_i, a_j) ≤ q_h
             = (Δg_h(a_i, a_j) − q_h) / (p_h − q_h),      if q_h < Δg_h(a_i, a_j) ≤ p_h
             = 1,                                         if p_h < Δg_h(a_i, a_j)

where Δg_h(a_i, a_j) = g_h(a_i) − g_h(a_j) for a maximisation problem.¹

¹ It would suffice to define Δg_h(a_i, a_j) = g_h(a_j) − g_h(a_i) to adapt to the case where criterion h would have to be minimised instead of maximised.


Figure 1.2: PROMETHEE's "V-shaped" preference function on criterion h, for an action pair (a_i, a_j) ∈ A × A; Δg_h(a_i, a_j) = g_h(a_i) − g_h(a_j) in the case of a maximisation problem. For each criterion, an indifference threshold q_h and a preference threshold p_h have to be provided by the decision maker. Below q_h, actions a_i and a_j are indifferent on criterion h; between q_h and p_h, action a_i is progressively preferred to action a_j; beyond p_h, action a_i is strictly preferred to action a_j on criterion h.

The pairwise action comparisons are aggregated for each action and provide the unicriterion net flow score φ_h(a_i) on criterion h:

φ_h(a_i) = 1/(n − 1) Σ_{a_j ∈ A} ΔP_h(a_i, a_j),

where ΔP_h(a_i, a_j) = P_h(a_i, a_j) − P_h(a_j, a_i) is the difference of preference functions. Finally, the unicriterion scores are aggregated over all criteria through a weighted sum to yield that action's net flow score:

φ(a_i) = Σ_{h ∈ H} w_h φ_h(a_i),    (1.1)

where w = (w_1, ..., w_m) is the vector of the criteria's relative importance, with w_h ≥ 0, ∀h ∈ H, and Σ_{h ∈ H} w_h = 1.

For any pair of actions (a i , a j ) ∈ A × A, we have one of the following relations: (i) the rank of action a i is better than the rank of action a j , iff φ(a i ) > φ(a j ); (ii) the rank of action a j is better than the rank of action a i , iff φ(a i ) < φ(a j ); (iii) action a i has the same rank as action a j , iff φ(a i ) = φ(a j ).
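To make the notation above concrete, the following sketch implements the net flow computation exactly as just defined (V-shaped preference function, unicriterion flows, weighted aggregation). It is our own illustrative code, not the implementation used in the thesis; it assumes q_h < p_h on every criterion.

```python
import numpy as np

def preference(d, q, p):
    """V-shaped preference function with indifference threshold:
    0 below q, linear between q and p, 1 above p (assumes q < p)."""
    if d <= q:
        return 0.0
    if d <= p:
        return (d - q) / (p - q)
    return 1.0

def net_flows(evals, weights, q, p):
    """PROMETHEE II net flow scores phi(a_i) = sum_h w_h * phi_h(a_i).

    evals   : (n, m) array of evaluations g_h(a_i), normalised to [0, 1]
    weights : length-m array, non-negative, summing to 1
    q, p    : length-m arrays of indifference/preference thresholds
    """
    n, m = evals.shape
    phi = np.zeros(n)
    for h in range(m):
        for i in range(n):
            # Unicriterion net flow: average of Delta P_h over the other actions
            s = sum(
                preference(evals[i, h] - evals[j, h], q[h], p[h])
                - preference(evals[j, h] - evals[i, h], q[h], p[h])
                for j in range(n)
            )
            phi[i] += weights[h] * s / (n - 1)
    return phi
```

On a toy instance where one action dominates the others, the dominating action gets the highest net flow, and the scores sum to zero because ΔP_h is antisymmetric. The double loop makes the O(n²) cost discussed in Section 1.4 explicit.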

1.4 Approximating PROMETHEE's net flow scores

Despite Moore's law on the exponentially increasing computational power of computers, the time required to compute PROMETHEE's net flow scores remains relatively high when compared to other MCDA methods, such as MAUT (Dyer 2005). This issue is faced by all outranking methods, due to their very logic of comparing all pairs of actions. It implies that the time required for evaluating the net flow score of each action is of quadratic complexity, O(n²), with respect to the total number n of considered actions.

Decision processes usually take some time, and putting effort into speeding up the computation of PROMETHEE II's net flow scores may therefore seem irrelevant or even artificial. However, there is an increasing number of applications, often borderline with (combinatorial) optimisation and big data, that strongly advocate the need for fast computation.

One concrete application where the effort of computing an action's score should be reduced to its minimum relates to the integration of preference models into heuristic algorithms for solving multi-objective optimisation problems. Indeed, numerical experiments (Eppe et al. 2011) suggest that using the PROMETHEE II preference model to select, at each iteration, the best solutions of a population-based optimisation method (e.g., NSGA-II) has a significant impact on the execution times.

Another example is related to the ever increasing amount of available data, a.k.a. big data. This trend stresses the need to be able to make informed decisions on very large sets of data, which explode both in terms of the number of considered alternatives and the number of criteria that are taken into consideration. Reducing the cost of comparing one such alternative with respect to the others is therefore of high relevance.

As a last example, let us cite the use of PROMETHEE II for tackling combinatorial problems: for instance, on-line configuration interfaces that allow a user to configure the features of a car or a computer, or a navigation system that selects the best route under consideration of several criteria. The combinatorial nature of such applications makes the number of possible configurations explode. Being able to help a customer/user find his way through the huge set of options would also be relevant.

In this section, we propose an approach to approximating PROMETHEE’s net flow scores based on a continuous extension of the basic, discrete formulation. Such an extension has already been proposed as the PROMETHEE IV method (Brans, Vincke, and Mareschal 1986), but has, to the best of our knowledge, neither been developed nor implemented since then, except for one recent application in Civil Engineering (Albuquerque 2015). Finally, in the concluding section, we briefly present an improved but still unpublished model that should greatly outperform the published approach.

1.4.1 The piecewise linear approximation model (PLA)

At a formal level, assuming an infinite number of actions impedes using indices for referencing one specific action. In the sequel, we will therefore identify an action y by its evaluation vector y = (y_1, ..., y_m) and use a statistical approach by expressing the probability for each criterion h to evaluate to a certain value y_h by the probability density function f_h(y_h). Note that if we consider the criteria to be independent, the probability density of an action with a given evaluation vector y would be given by the product of its unicriterion evaluation densities: f(y) = Π_{h=1}^m f_h(y_h).

The unicriterion net flow score on criterion h of any action y can be written as follows:

φ_h(y_h) = ∫_0^1 ΔP_h(y_h, ξ) f_h(ξ) dξ    (1.2)

To simplify (1.2), we make the assumption that, for each criterion, the actions' evaluations are uniformly distributed, i.e., f_h(y_h) = 1, ∀h ∈ H, y_h ∈ [0, 1]. Denoting ψ_h(y_h) the net flow approximation on criterion h for the evaluation y_h and simplifying (1.2) according to our assumption, we get

ψ_h(y_h) = ∫_0^1 ΔP_h(y_h, ξ) dξ    (1.3)

Integrating PROMETHEE's previously described piecewise linear preference function (Figure 1.2) into this equation yields:

ψ_h(y_h) = y_p^− + (y_q^− − y_p^−)² / (2(p_h − q_h)) − (y_p^+ − y_q^+)² / (2(p_h − q_h)) − (1 − y_p^+)    (1.4)


Figure 1.3: Unicriterion approximation with PLA. The four circles indicate the "control points" of the approximation, defining three linear segments.

where the following auxiliary variables have been defined:

y_q^− = max{0; y_h − q_h}
y_p^− = max{0; y_h − p_h}
y_q^+ = min{1; y_h + q_h}
y_p^+ = min{1; y_h + p_h}

As shown by (1.4), the unicriterion net flow score approximation ψ_h(y_h) solely depends on the preference parameters q_h and p_h. Two particular cases are worth mentioning with respect to this approximation. First, if the indifference and preference parameters have the same value, the quadratic terms of (1.4) disappear and we obtain a piecewise linear expression. Second, if we further set both thresholds to zero, we obtain a usual criterion (the slightest difference in evaluations yields strict preference of one action over the other on that criterion). The resulting unicriterion net flow score expression, ψ_h(y_h) = 2y_h − 1, is coherent with this type of criterion.
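The closed form (1.4) can be checked numerically against the defining integral (1.3). The sketch below is our own sanity check (the function names are our own), assuming q_h < p_h: it evaluates (1.3) by a midpoint Riemann sum and compares it to the closed-form expression with its clamped auxiliary variables.

```python
import numpy as np

def psi_closed(y, q, p):
    """Closed-form unicriterion approximation (1.4), uniform evaluations."""
    yq_m, yp_m = max(0.0, y - q), max(0.0, y - p)   # y_q^-, y_p^-
    yq_p, yp_p = min(1.0, y + q), min(1.0, y + p)   # y_q^+, y_p^+
    return (yp_m
            + (yq_m - yp_m) ** 2 / (2.0 * (p - q))
            - (yp_p - yq_p) ** 2 / (2.0 * (p - q))
            - (1.0 - yp_p))

def psi_numeric(y, q, p, steps=20000):
    """Midpoint Riemann sum of (1.3): integral of Delta P_h(y, xi) over [0, 1]."""
    xi = (np.arange(steps) + 0.5) / steps
    p_fwd = np.clip(((y - xi) - q) / (p - q), 0.0, 1.0)   # P_h(y, xi)
    p_bwd = np.clip(((xi - y) - q) / (p - q), 0.0, 1.0)   # P_h(xi, y)
    return float(np.mean(p_fwd - p_bwd))
```

For q_h = 0.1 and p_h = 0.3, both functions agree over the whole range of y up to the discretisation error; at y = 0.4, for instance, both yield −0.2.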

These particular cases suggest the idea of further approximating the expression by the following piecewise linear model:

ψ_h^PLA(y_h) = y_h + λ_h − 1,    if y_h < λ_h
             = 2y_h − 1,         if λ_h ≤ y_h < 1 − λ_h    (1.5)
             = y_h − λ_h,        if y_h ≥ 1 − λ_h

where λ_h = (q_h + p_h)/2. The validity of this additional simplification (as indicated by the four "control points" on Figure 1.3) is supported by numerical experiments that show the little added value of keeping the polynomial form of (1.4).

Similarly to the discrete case, action y's approximated net flow score is finally computed as the weighted sum over all criteria:

ψ(y) = Σ_{h=1}^m w_h ψ_h(y_h).
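Scoring one action with PLA then costs O(m), independently of the number n of actions. A minimal sketch of (1.5) and of the weighted aggregation (our own illustration, assuming evaluations normalised to [0, 1]):

```python
def psi_pla(y, q, p):
    """Piecewise linear approximation psi_h^PLA(y_h) of eq. (1.5)."""
    lam = (q + p) / 2.0          # lambda_h = (q_h + p_h) / 2
    if y < lam:
        return y + lam - 1.0
    if y < 1.0 - lam:
        return 2.0 * y - 1.0
    return y - lam

def pla_score(y_vec, weights, q, p):
    """Approximated net flow: weighted sum of unicriterion PLA scores."""
    return sum(w * psi_pla(y, qh, ph)
               for y, w, qh, ph in zip(y_vec, weights, q, p))
```

With both thresholds set to zero, psi_pla reduces to 2y − 1, the usual-criterion case mentioned above.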

1.4.2 Main results of numerical simulations

In this section, we present the main results of simulations that have been carried out to put the proposed approximation model to the test. Their aim is to answer two natural and legitimate questions:


Table 1.1: Parameters used for the experimental investigation. Values in bold represent the most commonly used combinations and U_R represents a random value drawn uniformly from the range R.

Parameter                               Value(s)
Number of actions n                     5, 10, 100, 1000, 10000
Number of criteria m                    2, 3, 5, 7, 10
Weights w_h                             U_[0,1], with Σ_h w_h = 1
Thresholds p_h                          U_[0,1/2]
           q_h                          U_[0,p_h]
Evaluation distribution                 D1, D2, D3, D4, D5, MX
Ex post approximation models            LiR, P3R
Ex ante approximation models            PLA
Runs per instance config. N_trials      1000

1. What do we get?

In other words: what is the quality of the approximation with respect to the original results? Let us underline that, while PROMETHEE II requires computing a net flow score for each action, its most commonly used outcome is a ranking. This is why we compare the respective rankings obtained with PROMETHEE II and PLA, rather than their net flow scores.

2. What is the cost?

We could also formulate the question as follows: what do we "win" from applying PLA rather than PROMETHEE II? Indeed, since the quality of the approximation is, by definition, lower than for the reference method, there has to be a gain in computation cost. The main potential drawback of PROMETHEE II in our context being the cost of computing the net flow scores, we compare the two competing models in terms of computation times.²

From a methodological point of view, artificial instances of varying sizes (from 5 to 10 000 actions, evaluated on 2 to 10 criteria) and with different evaluation distributions (described in Section 1.2) are generated. Preference parameters (weights and thresholds) are also randomly picked for each new instance. We use Pearson's correlation coefficient ρ to measure the accuracy of the approximated ranking with respect to the reference ranking obtained with the original PROMETHEE II method. For benchmarking purposes, the results provided with PLA are also compared with those of linear (LiR) and third order polynomial (P3R) regression models that have been calculated ex post on the basis of PROMETHEE II's net flow scores. For each set of parameters, 1000 tests have been run, and for each test a newly generated instance has been used. The main parameters for the simulations are given in Table 1.1.
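The comparison protocol can be sketched end to end: generate a random instance, compute the exact PROMETHEE II net flows and their PLA approximation, and measure Pearson's ρ between the two score vectors. This is our own minimal reconstruction of the protocol, not the simulation code used for the published results; instance sizes and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

def exact_flows(evals, weights, q, p):
    """O(n^2) PROMETHEE II net flows with V-shaped preference functions."""
    n, m = evals.shape
    phi = np.zeros(n)
    for h in range(m):
        d = evals[:, h][:, None] - evals[:, h][None, :]      # all pairwise differences
        P = np.clip((d - q[h]) / (p[h] - q[h]), 0.0, 1.0)    # preference degrees
        phi += weights[h] * (P - P.T).sum(axis=1) / (n - 1)
    return phi

def pla_flows(evals, weights, q, p):
    """O(n) piecewise linear approximation (eq. 1.5)."""
    lam = (q + p) / 2.0
    psi = np.where(evals < lam, evals + lam - 1.0,
          np.where(evals < 1.0 - lam, 2.0 * evals - 1.0, evals - lam))
    return psi @ weights

# One random instance: n actions, m criteria, uniform evaluations in [0, 1]
n, m = 200, 3
evals = rng.random((n, m))
weights = rng.random(m)
weights /= weights.sum()
p = rng.uniform(0.05, 0.5, m)
q = rng.uniform(0.0, 1.0, m) * p          # q_h drawn from U[0, p_h]
rho = np.corrcoef(exact_flows(evals, weights, q, p),
                  pla_flows(evals, weights, q, p))[0, 1]
```

For uniformly distributed evaluations, the very assumption under which (1.3) was derived, ρ is typically very close to 1 already at this modest instance size.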

The main results of our simulations can be summarised as follows:

² Paradoxically, the sorting of the actions according to their respective net flow scores (approximated or not), which is the step that passes from these scores to the actual ranking, is not integrated in the reported execution times. This is considered as a constant value, although it may not be negligible in practice (it is usually of complexity O(n log n)).


Figure 1.4: The box plots compare three types of net flow score approximation models: ex post linear regression (LiR); ex post third degree polynomial regression (P3R); and ex ante piecewise linear approximation (PLA). The results are shown for 1000 runs over mixed-distribution (MX) randomly generated action sets of (a) 10 and (b) 1000 actions, evaluated on 7 criteria.

Figure 1.5: The probability P(ρ > x) of reaching a standard Pearson correlation ρ at least as high as x for mixed-distribution (MX) randomly generated instances, as the numbers n of actions and m of criteria increase; panel (a) shows m = 5 criteria and panel (b) m = 10 criteria, each for n = 10, 100, 1000, 10000.

1. While the ex post P3R approximation model significantly outperforms all other models, PLA still gives very satisfying results, yielding Pearson correlation coefficients higher than 0.97 (Figure 1.4).

2. Figure 1.5 shows, for different instance sizes, the complement to 1 of the cumulative distribution function (CDF) of ρ (Pearson's correlation coefficient), for a series of 1000 runs. Concretely, the plots give the approximated probability of reaching at least a given similarity, measured by the standard Pearson correlation ρ. In the following, we will often refer to this value as a measure of quality: the higher ρ, the better the approximation of PROMETHEE II's net flow scores by our piecewise linear model. Several observations can be made on the basis of these plots:

(a) As could be expected, a higher number of actions increases the approximation's quality.

(b) The quality curve converges to an "extreme curve" (approximated by the plots for n = 10000), which indicates that there exists an upper bound on the approximation quality. In other words, whatever the instance size, it will not in general be possible for our PLA model to produce the same scores, and consequently to obtain the same ranking, as PROMETHEE II.


Figure 1.6: For different probabilities P(ρ > ρ_min), the plots show the minimum quality ρ_min that can be achieved with respect to the number of actions n, and depending also on the number m of criteria; panel (a) shows m = 5 criteria and panel (b) m = 10 criteria, for P(ρ > ρ_min) = 0.80, 0.90, 0.95, 0.99.

(c) Taking the opposite point of view, the results also show that a satisfying approximation (depending on a chosen quality level) can be reached even for relatively small instance sizes that are frequently encountered in actual MCDA problems. For example, for instances of n = 100 actions and m = 7 criteria, a correlation of ρ = 0.95 can be reached with a probability of almost 99%.

3. Figure 1.6 shows the same results from another perspective. For a given probability P(ρ > ρ_min), which expresses some required accuracy level of the approximation, the plots represent the minimum quality ρ_min that is reached as a function of the number n of actions.

1.4.3 Conclusions & outlook

The approximation of PROMETHEE II's net flow scores alleviates the computational load of computing the exact scores and offers time complexities that are comparable with utility function evaluations.

While PLA provides a good correlation with PROMETHEE II's original net flow scores, a more demanding metric such as, for instance, the hit rate (the ratio of absolutely well ranked actions with respect to PROMETHEE II's reference ranking) shows its actual limitations: only a small fraction of actions are ranked where they would be with PROMETHEE II (Table 1.2).³ As a qualitatively improved alternative and second (currently unpublished) contribution, we briefly propose hereunder an empirical distribution-based approximation model (EDA). It takes the evaluation distribution on each criterion into consideration and reaches an approximation quality that even outperforms the third order polynomial regression model P3R. For instances of 1000 actions evaluated on 7 criteria, for instance, the empirical distribution-based approximation ranks about 85% of the actions like PROMETHEE II, but 250 times faster. Hence, it could be seen as an intermediate model between PLA and the original method.

³ Let us stress that this metric is very severe. As we obtain good results for both proposed methods, PLA and EDA, on the rank correlation metric, we have used this one to differentiate them. Despite the relatively weak results for PLA on the hit rate when compared with EDA, the former should not be discarded.


Table 1.2: Average hit rate (fraction of absolutely well ranked actions) with respect to the PROMETHEE II reference ranking (1000 trials).

            m = 5          m = 7          m = 10
n           EDA    PLA     EDA    PLA     EDA    PLA
10          0.97   0.61    0.97   0.57    0.97   0.56
100         0.87   0.15    0.87   0.21    0.87   0.15
1000        0.91   0.02    0.91   0.02    0.91   0.02
10000       −      −       −      −       −      −

Table 1.3: Average CPU time (in milliseconds) required for computation (100 trials).

            m = 5                 m = 7                 m = 10
n           EDA   PLA   P2        EDA   PLA   P2        EDA   PLA   P2
10          41    0     1         57    0     1         81    0     1
100         41    0     2         58    0     2         83    0     3
1000        46    0     157       63    0     217       89    0     312
10000       77    3     21279     111   1     29619     166   3     43790

An empirical distribution-based approximation

Rather than making assumptions about the distributions of the evaluations of a considered dataset, we take the empirical distribution of the evaluations into account to improve the approximation's accuracy in comparison to PLA. Since expression (1.2) involves the integration of a product of functions, the main underlying idea of this approach is to integrate it by parts. This simplifies the expression into

ψ_h(y_h) = [ΔP_h(y_h, ξ) F_h(ξ)]_{ξ=0}^{ξ=1} − ∫_0^1 (dΔP_h/dξ)(y_h, ξ) F_h(ξ) dξ    (1.6)

where F_h(ξ) = ∫_0^ξ f_h(x) dx is the cumulative distribution function (CDF) of f_h(ξ). The numerical integration of the second term is done by a Riemann sum.
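To make (1.6) concrete, the following sketch is our own reconstruction of this idea (the paper's actual implementation may differ): it builds the empirical CDF from the sorted evaluations and integrates the second term of (1.6) by a midpoint Riemann sum. For the V-shaped preference function with q_h < p_h, dΔP_h/dξ equals −1/(p_h − q_h) on the two intervals (y − p_h, y − q_h) and (y + q_h, y + p_h), clipped to [0, 1], and the boundary term reduces to ΔP_h(y, 1) = −P_h(1 − y).

```python
import numpy as np

def eda_score(y, sorted_evals, q, p, steps=4000):
    """Empirical distribution-based unicriterion score (eq. 1.6) for evaluation y."""
    n = len(sorted_evals)

    def F(xi):                      # empirical CDF, evaluated on a grid
        return np.searchsorted(sorted_evals, xi, side='right') / n

    def int_F(a, b):                # midpoint Riemann sum of F over [a, b] ∩ [0, 1]
        a, b = max(0.0, a), min(1.0, b)
        if b <= a:
            return 0.0
        xi = a + (b - a) * (np.arange(steps) + 0.5) / steps
        return float(np.mean(F(xi))) * (b - a)

    boundary = -np.clip((1.0 - y - q) / (p - q), 0.0, 1.0)   # Delta P(y, 1) = -P(1 - y)
    return boundary + (int_F(y - p, y - q) + int_F(y + q, y + p)) / (p - q)
```

Sorting the evaluations once costs O(n log n); each score evaluation then only touches the CDF, not the other n − 1 actions. Up to the Riemann discretisation, the result coincides with the direct average (1/n) Σ_j ΔP_h(y, x_j).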

A first set of very encouraging simulation-based results (Table 1.2) shows that the empirical distribution-based approximation (EDA) strongly outperforms PLA in terms of approximation quality (hit rate), yet has a computational complexity of O(n log n) (for sorting the evaluations in increasing order to compute the CDF), which is significantly better than PROMETHEE II's O(n²) (due to the pairwise comparisons of all actions; Table 1.3).
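As a sketch of the numerical scheme behind equation (1.6), the following code evaluates the by-parts integral with a midpoint Riemann sum over the empirical CDF and compares it to the direct pairwise average. All helper names are hypothetical, and a linear preference function with q_h = 0 and p_h = 0.3 is assumed for illustration:

```python
import bisect, random

def pref_linear(d, p=0.3):
    """Linear preference function with indifference threshold q = 0 (assumed)."""
    if d <= 0.0:
        return 0.0
    return min(d / p, 1.0)

def delta_p(y, xi, p=0.3):
    """Signed preference difference Delta P(y, xi) = P(y - xi) - P(xi - y)."""
    return pref_linear(y - xi, p) - pref_linear(xi - y, p)

def d_delta_p_dxi(y, xi, p=0.3):
    """Derivative of Delta P with respect to xi (piecewise constant)."""
    dp = lambda d: (1.0 / p) if 0.0 < d < p else 0.0
    return -(dp(y - xi) + dp(xi - y))

def eda_score(y, sample, p=0.3, steps=2000):
    """Approximate psi(y) via integration by parts (eq. 1.6):
    boundary term minus a midpoint Riemann sum over the empirical CDF."""
    xs = sorted(sample)
    n = len(xs)
    cdf = lambda t: bisect.bisect_right(xs, t) / n   # empirical CDF F(t)
    h = 1.0 / steps
    integral = sum(d_delta_p_dxi(y, (j + 0.5) * h, p) * cdf((j + 0.5) * h)
                   for j in range(steps)) * h
    return delta_p(y, 1.0, p) - integral             # [Delta P * F]_0^1 - integral

def direct_score(y, sample, p=0.3):
    """Reference value: average pairwise preference against the sample."""
    return sum(delta_p(y, x, p) for x in sample) / len(sample)
```

Both functions evaluate one action against the same empirical distribution, so their results should differ only by the Riemann discretisation error.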

A possible concrete application of PLA

As one possible application of the proposed approximation model(s), let us come back to the initial problem that triggered most of the research questions of this thesis: the integration of the PROMETHEE II preference model into a population-based evolutionary multi-objective optimisation algorithm. The PLA model could alleviate two major drawbacks of PROMETHEE II's direct integration in such an algorithm:

1. Although this may not be deduced directly from the results provided (Table 1.3), it can safely be ventured that the time required for score evaluations would drop significantly with PLA in comparison to PROMETHEE II, reaching values comparable to those of MAUT-based approaches.

2. Less obviously, the approximated approach could also bring a second advantage with respect to PROMETHEE II. From one iteration to the next, the population of solutions focuses on a narrower solution domain. The differences between the solutions of the population therefore also get smaller, and it is likely that, at some point, these differences fall below the indifference threshold q_h on one or more criteria, causing many (if not all) current solutions to be considered indifferent to each other. The "optimisation pressure" towards (a sub-region of) the Pareto optimal frontier would hence be reduced and further convergence would be impeded. Using the continuous approximation of PLA would, on the other hand, permit the score:

• not to be sensitive to this reduced set of actions (since it is inherently independent of the other actions involved);

• to be a weighted sum of strictly monotonous functions and therefore to yield different net flow scores for the slightest difference in evaluations. This offers a means of comparing actions even if the evaluations of the current population are very close (which is expected to occur, eventually, when the algorithm converges into a narrow region of the Pareto optimal frontier).

As a further argument for using an approximation model in this specific application context, let us recall that population-based approaches work with a set of best current solutions. Hence, the absolute rank of one particular action is not crucial information during the process, and it can reasonably be conjectured that the best solutions selected from the current population would be (almost) the same as those selected with PROMETHEE II.
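The indifference-threshold stall described in point 2 above can be illustrated with a small sketch. All values here are hypothetical: a linear preference function with q = 0.05 and p = 0.2, and a converged population whose pairwise differences all lie below q:

```python
def pref(d, q=0.05, p=0.2):
    """Linear preference function with indifference threshold q (assumed values)."""
    if d <= q:
        return 0.0
    return min((d - q) / (p - q), 1.0)

def net_flow(evals):
    """PROMETHEE-II-style net flow on a single criterion (pairwise comparisons)."""
    n = len(evals)
    return [sum(pref(x - y) - pref(y - x) for y in evals) / (n - 1) for x in evals]

# A converged population: all pairwise differences are below q = 0.05,
# so every action is indifferent to every other and all net flows vanish.
pop = [0.500, 0.510, 0.520, 0.530]
flows = net_flow(pop)

def smooth_score(x):
    """Stand-in for a strictly monotonous PLA-like criterion score."""
    return x  # any strictly increasing function still discriminates the population
```

The pairwise model returns identical (zero) flows for the whole population, while any strictly increasing score, as a weighted sum of strictly monotonous functions would be, still separates the solutions.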

The application described here has not been implemented yet. This could be the topic for a future research project.

1.5 Eliciting P R O M E T H E E ’s weight parameters

One mandatory step in using multi-criteria decision aid methods is to provide the method at hand with a formalised expression of the decision maker's (DM) preferences (Öztürk, Tsoukiàs, and Vincke 2005), generally referred to as the preference parameters. Due to the difficulty decision makers often have in providing actual values for these parameters, methods for inferring them have been developed over the years (Bous et al. 2010; Dias et al. 2002; Greco et al. 2011; Mousseau 2003). In this section we briefly present two methods that have respectively been published as a conference paper (Eppe, De Smet, and Stützle 2011) and a journal article (Eppe and De Smet 2014a). For both methods, we follow the aggregation/disaggregation approach (Mousseau and Słowiński 1998) for preference elicitation: given a set of actions, the DM is asked to provide holistic information about their preferences. The DM states an overall preference of one action over another rather than giving information at the preference parameter level, since the former seems to be a cognitively easier task.

To the best of our knowledge, only few works on preference elicitation exist for PROMETHEE II. Frikha, Chabchoub, and Martel (2010) propose a method for determining the criteria's relative weights. They consider two sets of partial information provided by the DM: (i) ordinal preference between two actions, and (ii) a ranking of the relative weights. These are formalised as constraints of a first linear program (LP)⁴ that may admit multiple solutions. Then, for each criterion independently, an interval of weights that satisfies the first set of constraints is determined. Finally, a second LP is applied on the set of weight intervals to reduce the number of violations of the weights' partial pre-order constraint. Sun and Han (2010) propose a similar approach that also limits itself to determining the weights of the PROMETHEE preference parameters. These, too, are determined by solving an LP. Finally, Özerol and Karasakal (2008) present three interactive ways of eliciting the parameters of the PROMETHEE preference model for PROMETHEE I and II.

Although most methods for inferring a DM's preferences found in the MCDA literature are based on the resolution of linear programs (Mousseau 2003) or otherwise use linear constraints (Bous et al. 2010), some more recent works also explore the use of meta-heuristics to tackle that problem (Doumpos and Zopounidis 2011). In particular, Fernandez, Navarro, and Bernal (2009) use the NSGA-II evolutionary multi-objective optimisation (EMO) algorithm to elicit ELECTRE III preference parameters in the context of sorting problems.

In this work, we propose two eliciting procedures for PROMETHEE II to (help) find the set of preference parameters that represents a decision maker's own preferences as well as possible. The approaches are respectively based on i) a linear programming iterative process and ii) a multi-objective optimisation algorithm. First, we formalise the aggregation/disaggregation approach for PROMETHEE II, which is common to both contributions. This is followed by a closer description of each method in particular.

When eliciting preferences by an aggregation/disaggregation approach (Jacquet-Lagrèze and Siskos 1982), the DM is asked to provide (partial) information about their preferences. Based on this information, the parameters of the chosen preference model, i.e., the weights of PROMETHEE II in our case, are approximated. Let us denote by Ω₀ the domain of possible parameter values. Eliciting preferences is thus the search for a parameter set ω′ ∈ Ω₀ that represents the decision maker's preferences in the "best possible way". For an MCDA ranking method, this means that the found set of parameters ω′ should lead to the same ranking R′ as the implicit one, denoted R (which we assume the DM has in mind and which serves as the reference for answering the queries). The process iteratively reduces the domain of possible parameter values Ω₀ through the answers to holistic queries that are proposed to the DM (Figure 1.7). Subsequently, the number of compatible rankings |R_k| also shrinks during the process. For each iteration k:

1. A query q_k is generated. It is added to the set of queries generated so far, Q_k = Q_{k−1} ∪ {q_k}, with Q₀ = ∅. We exclusively consider "closed" queries, that is, any query q_k = {α_{k1}, …, α_{kp}} that can be defined as a set of partial preference statements (each denoted α_{kl}) from which the DM has to pick one. For instance, for a pairwise comparison of actions a_i and a_j, we would have q_k = {α_{k1} = "a_i ≻ a_j", α_{k2} = "a_j ≻ a_i"}: the DM can only select one of these two statements as the answer.

2. The decision maker makes a statement. Each answer to a query adds a constraint on the admissible parameter values. The precise form and examples of the constraints are given below. The set of constraints at iteration k is updated: C_k = C_{k−1} ∪ {c_k}, with C₀ = {"w₁ ≥ 0", "w₂ ≥ 0", …, "w_m ≥ 0", "∑_{h=1}^m w_h = 1"}. C₀ contains only the basic constraints on the weights.

⁴ As the threshold values are assumed to be constant and given, net flow score calculations are linear with respect to the weights.


Figure 1.7: From a query to its consequences on the preference parameter space. (Schematic: each query q_k yields a constraint c_k, successively reducing the compatible parameter domains Ω_k and the sets of compatible rankings R_k.)

3. The compatible parameter domain Ω_k is computed. Each statement should reduce the size of the compatible parameter domain and make it converge towards the parameter values that best represent the DM's preferences. If we denote by Ω_k the compatible domain after the DM has answered the k-th query, we have Ω₀ ⊇ Ω₁ ⊇ … ⊇ Ω_k. For this chain of inclusions to hold, however, we need to assume that the decision maker's statements are consistent. While this is guaranteed for the second proposed method (extended Q-Eval), it is not for the first one (the EMO approach).

4. The set of compatible rankings R_k is deduced. The set of rankings R_k that can be generated from all the weights inside the k-th compatible parameter domain Ω_k is finite. Depending on the instance size n, this set can nevertheless be very large (up to n!). During the eliciting process, the goal is to reduce this number as much as possible, striving to have only one ranking left at the end, which hopefully equals the implicit ranking we assume exists in the DM's head. As for the successive compatible weight domains, we have R₀ ⊇ R₁ ⊇ … ⊇ R_k.

Let us note that a constraint can be direct or indirect, depending on the type of information provided. Direct constraints have an explicit effect on the preference model's possible parameter values (e.g., the relative weight of the first criterion is greater than 1/2), while indirect constraints have an impact on the domain Ω₀ (e.g., the first action is better than the fifth one).

We here focus on indirect constraints that result from holistic preference statements. Concretely, we consider three types of queries:

PAC Pairwise action comparison (allowing only strict preference of one action over another) (Siraj 2011). It is one of the most widely used and simplest ways of expressing partial preferences.
Example: Considering two actions a_i, a_j ∈ A, we have q_PAC = {"a_i ≻ a_j", "a_j ≻ a_i"}; the DM can answer either that he prefers a_i to a_j, or a_j to a_i.

POS Preferred action from a subset of A (Binshtok et al. 2009; Domshlak et al. 2006; Viappiani and Boutilier 2010).
Example: We consider a subset of three actions a_i, a_j, a_k ∈ A. q_POS = {"a_i ≻ a_j ∧ a_i ≻ a_k", "a_j ≻ a_i ∧ a_j ≻ a_k", "a_k ≻ a_i ∧ a_k ≻ a_j"}; the DM selects one action, a_i, a_j, or a_k, as the most preferred action from the subset {a_i, a_j, a_k}.

ASR Action sub-rankings on a subset of A. As for POS, the DM is given more than two actions and asked to rank them from best to worst.
Example: Again, we consider three actions a_i, a_j, a_k ∈ A, but here we look for complete rankings: q_ASR = {"a_i ≻ a_j ≻ a_k", "a_i ≻ a_k ≻ a_j", "a_j ≻ a_i ≻ a_k", "a_j ≻ a_k ≻ a_i", "a_k ≻ a_i ≻ a_j", "a_k ≻ a_j ≻ a_i"}; the DM selects one of these rankings.

Note that we assume that the DM is always able to state a preference of one action over another. In other words, we do not consider the possibility of answering that two actions are incomparable or indifferent. Although using indifference could be promising (Stewart 1984) for strongly reducing the domain of compatible weights, we have not considered it in the current state of the work.
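The three closed query types can be encoded compactly: every possible answer is a conjunction of pairwise statements, represented here as (preferred, other) pairs. A sketch with hypothetical helper names:

```python
from itertools import permutations

def pac_answers(a, b):
    """Closed PAC query: two mutually exclusive strict-preference statements."""
    return [((a, b),), ((b, a),)]  # each answer = a tuple of (preferred, other) pairs

def pos3_answers(a, b, c):
    """Closed POS-3 query: picking one action implies two pairwise preferences."""
    items = (a, b, c)
    return [tuple((best, other) for other in items if other != best)
            for best in items]

def asr3_answers(a, b, c):
    """Closed ASR-3 query: each full ranking of 3 actions implies 3 pairwise preferences."""
    return [tuple((r[i], r[j]) for i in range(3) for j in range(i + 1, 3))
            for r in permutations((a, b, c))]
```

This representation makes explicit that POS and ASR answers decompose into sets of PAC statements, which is what allows a uniform constraint treatment later on.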

Let us develop the PAC type of queries. Considering the thresholds q_h and p_h as given for each criterion h⁵, the eliciting of weight parameters by means of pairwise action comparisons yields a set of linear constraints. Without loss of generality, let us assume that the answer to the k-th query q_k = {"a_{i,k} ≻ a_{j,k}", "a_{j,k} ≻ a_{i,k}"} induces the constraint c_k: "a_{i,k} ≻ a_{j,k}", which is expressed as follows in terms of action net flows:

\[
a_{i,k} \succ a_{j,k} \iff \phi(a_{i,k}) > \phi(a_{j,k}) \iff \sum_{h=1}^{m} w_h \left[\phi_h(a_{i,k}) - \phi_h(a_{j,k})\right] > 0
\]

Hence, the (simplified) compatible weight domain Ω_k after k answered PAC queries is defined by:

\[
\Omega_k:\quad
\begin{cases}
w_h \ge 0, & 1 \le h \le m-1,\\
\sum_{h=1}^{m-1} w_h \le 1,\\
\sum_{h=1}^{m-1} w_h \left[\Delta\phi_h(a_{i,l},a_{j,l}) - \Delta\phi_m(a_{i,l},a_{j,l})\right] + \Delta\phi_m(a_{i,l},a_{j,l}) > 0, & 1 \le l \le k.
\end{cases}
\]

Note that, in the preceding expression, the actual constraints resulting from the decision maker's answers integrate the fact that $\sum_{h \in H} w_h = 1$, i.e., $w_m = 1 - \sum_{h \in H \setminus \{m\}} w_h$.

The effects of pairwise comparison constraints on the compatible parameter space are illustrated in Figure 1.8(b). For the other two considered types of queries, POS and ASR, determining the compatible domain is done in a very similar way: it suffices to notice that both can be expressed as a set of PAC constraints.
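Since the constraints are linear in the weights, membership of a weight vector in the compatible domain is a simple check. A minimal sketch, with a hypothetical data layout (phi maps each action to its list of unicriterion net flows; POS/ASR answers are assumed already decomposed into (preferred, other) pairs):

```python
def satisfies(weights, constraints, phi):
    """Check whether a weight vector lies in the compatible domain Omega_k.

    phi: dict mapping action -> list of unicriterion net flows phi_h(a)
    constraints: list of (preferred, other) pairs from answered PAC queries.
    """
    # Basic weight constraints of C0: non-negativity and summing to 1.
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        return False
    # Each answered query requires phi(preferred) > phi(other).
    for ai, aj in constraints:
        score = sum(w * (pi - pj)
                    for w, pi, pj in zip(weights, phi[ai], phi[aj]))
        if score <= 0:
            return False
    return True
```

Such a predicate is all that Monte Carlo hypervolume estimates and weight sampling need later in the procedure.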

For the sake of clarity, we illustrate features of the compatible weight domain Ω in the case of m = 3 criteria. In particular, we show the planar projection of the three-dimensional constrained hyperplane (Figure 1.8). However, unless mentioned otherwise, the results and descriptions do not depend on the number of criteria.

While the two methods presented hereunder share the same aggregation/disaggregation ap- proach (Jacquet-Lagrèze and Siskos 1982; Mousseau 2003), they each have their distinctive features and their particular focus:

• The proposed extended Q-Eval method starts from the complete domain of possible weight combinations.⁶ The aim of this method is to reduce this domain as strongly as possible, using the minimum number of queries. In other words, the goal is to put the minimum burden on the decision maker and to propose queries that maximise the "discriminating power" (formally defined below) of each query.

⁵ In particular, we consider the thresholds to be constant throughout the process, i.e., the DM may not change them during the eliciting procedure. The preference function types are also fixed.

⁶ Note that this is not a mandatory assumption. The domain of possible weights that is used as the starting point of the method could already be reduced, for instance due to other constraints that are known beforehand.

Figure 1.8: Planar projection of a three-criteria compatible weight domain (a) onto a plane (b). Panel (a) shows the domain of possible weights for 3 criteria; panel (b) an example of the constraint satisfaction domain Ω₄ for 4 constraints. The dot on the right plot represents the reference weight ω⋆, i.e., the weight that is randomly generated during the simulations to determine the corresponding ranking which is used to answer the queries proposed by the procedure. The surrounding white polytope represents the domain of weight vectors Ω₄ that satisfy all constraints at iteration 4. The darker an area, the more constraints are violated inside that area.

• For the EMO approach, the focus is not put on the generation of the queries. The starting point is rather an existing set of queries and derived constraints that have already been respectively asked and applied to the preference parameter space. The domain of compatible weights is already reduced, and the method searches for the best possible weight vector inside this very domain, simultaneously taking different objectives into account. In particular, the proposed algorithm searches for sets of weights that: 1) violate the set of given constraints as little as possible; 2) induce the most stable rankings possible.

It follows quite naturally that both methods could be used in conjunction, one after the other, in an actual preference eliciting problem. The intervention of the decision maker would be limited to the first phase (extended Q-Eval); the second, EMO phase would automatically select, within the compatible domain, the "most suitable" set of preference parameters. The two subsequent sections follow this possible application logic to present the methods.

Let us stress that, contrary to the EMO approach, Q-Eval assumes that the decision maker answers the queries in a consistent way (no mistakes, no doubts, always providing an answer, etc.). This important assumption is motivated by the fact that it simplifies 1) the problem, our hope being to highlight more fundamental features of the eliciting process on a reduced problem; and 2) the design of our simulation-based study, because answers can be given without having to integrate further (arguable) uncertainty models.

1.5.1 The extended Q-Eval method

As has been shown, different ways of querying the DM can be considered, such as pairwise comparisons of actions, rankings of subsets of actions and/or weights, etc. This first contribution on eliciting PROMETHEE II's preference parameters is motivated by the observation that not all queries have the same "value" to the eliciting process. Indeed, some queries could be very simple to answer (e.g. pairwise comparison of actions), but could require many such queries to get sufficiently robust values for the parameters. Other questions, on the contrary, could be very demanding for the DM, but have a strong "eliciting power". An extreme example of the latter would be to ask explicitly for the ranking of the DM (the one we assume he "knows" without being able to formally express it). Two conflicting objectives in particular should be taken into account in this context: minimising the cognitive difficulty of questions and/or the quantity of information required from the DM, while maximising the added value of each query in the eliciting process. The information given by the DM is thus characterised by both its nature and its quantity (e.g. the number of pairwise comparisons).

For the k-th query q_k, let us define its discriminating power Δ_k as follows:

\[
\Delta_k \;=\; 1 - \max_{\alpha \in q_k} \frac{\lVert \Omega_{k-1} \cap \Omega_\alpha \rVert}{\lVert \Omega_{k-1} \rVert},
\]

where ‖X‖ represents the hypervolume of the domain X. Practically, we approximate the hypervolumes by means of Monte Carlo sampling. Ω_{k−1} represents the domain of parameters compatible with the constraint sequence C_{k−1}, and Ω_α is the domain compatible with the sole constraint induced by answer α; the numerator is thus the size of the biggest possible compatible weight domain over the set of possible answers to query q_k. In other words, by answering each new query (for instance, selecting one action from a set of two or more actions), the DM implicitly adds a constraint from a set of possible constraints. Each of these constraints should reduce the size of the compatible domain. The numerator represents the least favourable case in terms of compatible domain reduction: it corresponds to the maximum domain size associated with each possible choice at a given stage. As for the information structure, the discriminating power Δ_k of a query q_k depends on the previous queries' history Q_k = {q_1, …, q_k}; it thus cannot practically be computed beforehand for a concrete problem. Quite obviously, the aim of an efficient eliciting procedure is to select, at each step, a query with the highest possible discriminating power. The adaptive query generation procedure described here follows this logic.
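Since the hypervolumes are estimated by Monte Carlo sampling, Δ_k can be sketched as follows. All names are hypothetical; constraints are PAC-style (preferred, other) pairs, and counts of uniform simplex samples stand in for hypervolumes:

```python
import random

def sample_simplex(m, rng):
    """Uniform sample on the (m-1)-simplex (non-negative weights summing to 1)."""
    cuts = sorted(rng.random() for _ in range(m - 1))
    pts = [0.0] + cuts + [1.0]
    return [pts[i + 1] - pts[i] for i in range(m)]

def discriminating_power(answers, prev_constraints, phi, m, n_samples=4000, seed=0):
    """Monte Carlo estimate of Delta_k = 1 - max_alpha |Omega_{k-1} ∩ Omega_alpha| / |Omega_{k-1}|.

    answers: possible answers to the candidate query, each a list of (preferred, other) pairs.
    """
    rng = random.Random(seed)

    def ok(w, cons):  # does w satisfy a conjunction of PAC constraints?
        return all(sum(wh * (phi[ai][h] - phi[aj][h]) for h, wh in enumerate(w)) > 0
                   for ai, aj in cons)

    inside = []  # samples lying in the current compatible domain Omega_{k-1}
    for _ in range(n_samples):
        w = sample_simplex(m, rng)
        if ok(w, prev_constraints):
            inside.append(w)
    if not inside:
        return 0.0
    # Largest sub-domain over all possible answers (the least favourable case).
    biggest = max(sum(1 for w in inside if ok(w, ans)) for ans in answers)
    return 1.0 - biggest / len(inside)
```

A PAC query whose constraint hyperplane splits the compatible domain in half scores close to 0.5, while a query whose every answer is already implied by previous constraints scores 0.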

Because of the burden it represents, the questioning process should be optimized in order to minimize the amount of information required from the decision maker.

We have decided to base our approach on the Q-Eval method (Iyengar, Lee, and Campbell 2001), although it has been proposed in the context of MAUT. In this paragraph, we briefly introduce the extended Q-Eval method as it has been adapted to P R O M E T H E E I I, integrating (limited versions of) the different query types presented in the previous section.

The main idea of Iyengar, Lee, and Campbell's original iterative approach is to select the next query (a pair of actions to be presented to the decision maker, asking which one he prefers) in such a way that it splits the hyper-volume of admissible weights (the weights that are compatible with all preceding queries) as evenly as possible. Practically, this means selecting the query (the pair of actions) that maximizes, at step k, the discriminating power Δ_k. After reducing the dimensionality of the problem to m′ = m − 1 by integrating the fact that $\sum_{h=1}^{m} w_h = 1$ into the relations, we obtain a base set of m constraints that defines the compatible weight domain Ω₀: w_h ≥ 0, ∀h ∈ {1, …, m′}, and $\sum_{h=1}^{m'} w_h \le 1$. At each iteration, we compute the analytical center of that polytope. For the base set of constraints, at the first iteration, the center is trivially given by w = (1/m, …, 1/m). However, as considering all possible pairs of actions would be computationally prohibitive even for mid-sized problem instances, Iyengar, Lee, and Campbell propose a three-stage heuristic to find, at each iteration k, the next query in reasonable time (Figure 1.9):

Figure 1.9: Schematic representation of the three major steps of Q-Eval's iterative process, as proposed in Iyengar, Lee, and Campbell 2001. The plots represent iteration k of the process.

1. the actions with the β best corresponding net flow scores (based on the current analytic center w_k^c of the compatible weight domain Ω_{k−1}) are selected;

2. the γ pairwise comparison constraint hyperplanes (using only the β best actions) that are closest to the analytic center are kept;

3. the action pair whose constraint hyperplane most evenly splits the current polytope into two sub-polytopes is selected as the next query.

The parameters β and γ define how the complexity of finding the best, i.e., most discriminating, next query is reduced. For our experiments, we take β = 30 and γ = 10.
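The three-stage heuristic can be sketched as follows. This is a simplified stand-in, not the published implementation: the centroid of sampled compatible weights replaces the analytic center, sampled weights replace the polytope, the hyperplane distance is left unnormalised, and all names and toy values are hypothetical:

```python
from itertools import combinations

def next_pac_query(phi, center, samples, beta=5, gamma=3):
    """Simplified Q-Eval step: 'center' stands in for the analytic center and
    'samples' for points of the compatible weight domain Omega_{k-1}.

    phi: dict mapping action -> list of unicriterion net flows phi_h(a).
    """
    score = lambda a, w: sum(wh * ph for wh, ph in zip(w, phi[a]))
    # Stage 1: keep the beta actions with the best net flow at the center.
    short = sorted(phi, key=lambda a: -score(a, center))[:beta]
    # Stage 2: keep the gamma pairs whose constraint hyperplane is closest
    # to the center (unnormalised distance, as in the text).
    dist = lambda a, b: abs(score(a, center) - score(b, center))
    pairs = sorted(combinations(short, 2), key=lambda p: dist(*p))[:gamma]
    # Stage 3: choose the pair splitting the sampled compatible weights most evenly.
    def imbalance(a, b):
        left = sum(score(a, w) > score(b, w) for w in samples)
        return abs(2 * left - len(samples))
    return min(pairs, key=lambda p: imbalance(*p))
```

On a toy instance, the heuristic indeed prefers the pair of near-tied actions whose comparison bisects the weight domain over pairs that almost every compatible weight already ranks the same way.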

The decision maker answers the query and by doing so, imposes a new constraint on the parameter space. The analytical center of the new polytope is then computed (as described in Bous et al. 2010) and the whole process is repeated until the required number of queries is reached or there is no further discriminating pair of actions to compare.

The original paper generates exclusively pairwise action comparison ( PAC ) queries. We now extend Q-Eval to POS and ASR query types. However, because of the computational overhead of computing such queries for bigger subsets, we limit ourselves to triples of actions and denote these query types respectively POS -3 and ASR -3. Let us describe the method for POS -3:

1. as before, based on the current analytic center w^c, the actions with the β best corresponding net flows are selected;

2. as opposed to the original procedure, we now consider all subsets of three actions from the set of β selected actions. Let us denote one such triple as (a, b, c). As already mentioned, q_POS = {"a is preferred", "b is preferred", "c is preferred"} is equivalent to the set of PAC query pairs: {"a ≻ b ∧ a ≻ c", "b ≻ a ∧ b ≻ c", "c ≻ a ∧ c ≻ b"}. Let us further denote $d_{H_{ab}} = \big|\sum_{h=1}^{m} w_h^c\,[\phi_h(a) - \phi_h(b)]\big|$ the distance of the PAC query's hyperplane to the analytical center, and $d_{H_{\max}}(a,b,c) = \max(d_{H_{ab}}, d_{H_{ac}}, d_{H_{bc}})$ the maximum distance. We keep the γ action triples with the smallest values of $d_{H_{\max}}$;

3. for each of the γ best triples we compute the standard deviation of the distribution of hyper-volumes for the three possible outcomes of the query: "a is preferred", "b is preferred", or "c is preferred". We keep the triple that minimizes that standard deviation, because this means that, on average, it is the best split of the compatible polytope, whatever the decision maker's choice.

Table 1.4: Parameter values used for the tests. When different values are given for one parameter, one of them is used as the default value.

Parameter                                          Value
Instance sizes
  Number of actions n                              10, 20, 50, 100, 200
  Number of criteria m                             3, 5, 7, 10
Fixed preference thresholds (paired as (q_h, p_h))
  Indifference q_h                                 0.05, 0.00, 0.20
  Preference p_h                                   0.10, 0.50, 1.00
Algorithm parameters
  Number of trials for each run n_T                50
  Sampling size for quality estimation n_S         500
Q-Eval parameters
  Size of actions "short-list" β                   30
  Queries for hyper-volume comparison γ            10

This procedure can be adapted to ASR -3 without difficulty. Here again, the choice is based on the evaluation of all possible outcomes (six of them, for ASR -3). Although the method described here could be further extended to more actions without any formal difficulty, the computation times, on the contrary, would increase strongly.

As will be shown in the results section, the different query types lead to complementary behaviours in the search for preference parameters. As an additional extension, we thus propose an adaptive method that has the ability to select, at each step of the process, the best suited next query from PAC, POS-3, and ASR-3. The selection is based on the discriminating power of the next query: $q_k = \arg\max_{q' \in \mathcal{Q}_k} \Delta_{q'}$, where $\mathcal{Q}_k = \{q_{k,\text{PAC}}, q_{k,\text{POS-3}}, q_{k,\text{ASR-3}}\}$. The main aim of this adaptive behaviour is to further reduce the number of queries with respect to the original Q-Eval method. Hence, we end up with a two-stage query generation procedure that is applied for each new query in the eliciting process:

1. For each query type (PAC, POS-3, or ASR-3), the next query is determined, based on the (adapted) Q-Eval method. The discriminating power of each of the three queries is computed.

2. The query with the highest discriminating power is selected and presented to the decision maker. Note that, while ASR-3 queries will most probably have the highest discriminating power, in the later stages of the process there may not be enough actions left to build such queries. In that case, the other types of queries are used. The same holds, still later in the process, for the POS-3 queries.

Main results

The extended Q-Eval method has been validated by means of repeated simulations performed on artificial data sets. The work-flow of the experimental process is described in Algorithm 1. In particular, the algorithm uses the following functions:

Algorithm 1: Standard experimental process
Input: n, m, k, q, p, N_trials, N_samples
for i = 1 … N_trials do
    randomly generate a set of actions A_i and a reference weight vector w_i⋆
    compute the unicriterion net flow matrix φ_i(A_i, q, p)
    compute the corresponding reference ranking R_i⋆ = calcRanking(φ_i, w_i⋆)
    initialize the set of constraints C_i = ∅
    for j = 1 … k do
        q_ij = nextQuery(φ_i, C_i)
        C_i = C_i ∪ calcAnswer(R_i⋆, q_ij)
    sample weights w⃗ = {w_1, …, w_{N_samples}} = genSamples(C_i, N_samples)
    τ_G,i = 1
    for s = 1 … N_samples do
        compute the ranking R_is = calcRanking(φ_i, w_s)
        τ_G,i = min{τ_G,i, generalizedTau(R_is, R_i⋆)}

calcRanking computes the rank of each action, based on its unicriterion net flows (which are fixed once the threshold values are set) and a weight vector. This can be the reference weight w_i⋆, or any other weight (e.g. a sampled compatible weight from the current iteration).

nextQuery generates the next query according to the (extended) Q-Eval procedure. It uses the unicriterion net flows and the current set of constraints that results from the history of already answered queries.

calcAnswer uses the reference ranking R_i⋆ to answer the current query (which is a finite set of possible answers, the decision maker being asked to select his most preferred answer). The answer is returned as an additional constraint to be added to the set of previous constraints.

genSamples randomly generates N_samples weights that are uniformly distributed on the compatible weight domain defined by the constraint set C_i.
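A minimal sketch of what genSamples could look like; rejection sampling is an assumption here (the published implementation may sample differently), and constraint_check is a hypothetical predicate standing in for the constraint set C_i:

```python
import random

def gen_samples(constraint_check, m, n_samples, rng=None, max_tries=100000):
    """Draw weight vectors uniformly on the (m-1)-simplex and keep only those
    satisfying the current constraints (rejection sampling sketch)."""
    rng = rng or random.Random(0)
    out = []
    for _ in range(max_tries):
        # Stick-breaking: sorted uniform cuts give a uniform point on the simplex.
        cuts = sorted(rng.random() for _ in range(m - 1))
        pts = [0.0] + cuts + [1.0]
        w = [pts[i + 1] - pts[i] for i in range(m)]
        if constraint_check(w):
            out.append(w)
            if len(out) == n_samples:
                break
    return out
```

Rejection sampling stays uniform on the compatible sub-domain, but its acceptance rate drops as constraints accumulate, which is one practical reason to cap the number of tries.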

Note that, for this contribution, only a uniform distribution has been considered for generating the artificial datasets. Applying other and mixed distributions (Section 1.2) to increase the representativeness of our results would be interesting and could be integrated in possible future research.

To assess the quality of a ranking R (induced by a set of parameters ω) with respect to a reference ranking R⋆ (induced by a set of reference parameters ω⋆), we use the generalised Kendall correlation coefficient, denoted τ_G. As has been done in Eppe and De Smet 2012, we take a pessimistic point of view by selecting, among the sampled weights that are compatible with the constraints provided so far, the worst value of the generalised Kendall correlation coefficient, denoted τ_G in the following. In addition to this first metric, we also wish to study how many different rankings could still be generated with the remaining set of constraints. Let us note that this number of rankings, denoted ρ in the following, depends both on the instance size and on the sampling size. This means that the comparison makes the most sense between results of the same instance and sampling sizes. For the sake of comparability, the results presented in detail in the following are therefore limited to mainly one type of instance. Practically, we choose to focus on instances of n = 20 actions.
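The pessimistic metric can be sketched as follows. A plain Kendall tau on complete rankings stands in here for the generalised τ_G, whose full definition is not reproduced in this synthesis; function names and the dict-based ranking representation are hypothetical:

```python
def kendall_tau(rank_a, rank_b):
    """Kendall rank correlation between two complete rankings, given as dicts
    mapping action -> rank position (stand-in for the generalised tau_G)."""
    items = list(rank_a)
    n = len(items)
    net = 0  # concordant pairs count +1, discordant pairs count -1
    for i in range(n):
        for j in range(i + 1, n):
            x, y = items[i], items[j]
            agree = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) > 0
            net += 1 if agree else -1
    return net / (n * (n - 1) / 2)

def pessimistic_tau(ref_ranking, sampled_rankings):
    """Worst-case correlation over the rankings induced by sampled compatible weights."""
    return min(kendall_tau(ref_ranking, r) for r in sampled_rankings)
```

Taking the minimum over the sampled compatible weights gives a guarantee-style quality value: even the least favourable compatible weight vector cannot rank worse than the reported τ.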
