Sensitivity index to measure dependence on parameters for rankings and top-k rankings


HAL Id: hal-01811360

https://hal.archives-ouvertes.fr/hal-01811360

Submitted on 8 Jun 2018


Antoine Rolland, Jairo Cugliari

To cite this version:

Antoine Rolland, Jairo Cugliari. Sensitivity index to measure dependence on parameters for rankings and top-k rankings. Journal of Applied Statistics, Taylor & Francis (Routledge), 2019, 10.1080/02664763.2019.1671963. hal-01811360


Ranking and top-k ranking robustness index to measure dependence to parameters

Antoine Rolland^a and Jairo Cugliari^a

^a ERIC EA 3083, Université de Lyon, Lumière 2, France

ARTICLE HISTORY Compiled May 20, 2018

ABSTRACT

A ranking depends, on the one hand, on the data available on the alternatives and, on the other hand, on an aggregation function used to obtain a global score for each alternative. The choice of the aggregation function (e.g. a weighted sum) and the choice of the parameters of this function (e.g. the weights) may have a great influence on the obtained ranking. We introduce in this paper a ratio index that quantifies the respective parts of the ranking that are due to the parameters and to the data.

KEYWORDS

Ranking, index, robustness, top-k list

1. Ranking issues

Ranking items using composite indices is a very common task that arises in many application fields: ranking universities through the famous Shanghai index [2] or others [14]; ranking countries according to their development levels [7]; ranking anything in newspapers (such as "best place to live, to study, ..."); and also ranking pages when a query is submitted to a web search engine, and so on.

The methodology used to obtain such rankings is often the same. In a first step, an accurate set of evaluation criteria should be identified. Then data are collected in order to obtain a value for each individual on each criterion (which will be called a "variable" in this paper, from a statistical point of view). Last, this information is aggregated into a single score in order to obtain a general ranking which is, mathematically speaking, a total pre-order (i.e. a complete order on the set of individuals, possibly including ties).

In this paper we focus on the last step of the ranking construction, i.e. the aggregation process. We are well aware that the choice and construction of the set of variables for a composite index is a very delicate issue [5, 6, 13]. Moreover, the quality, precision and availability of the data are also crucial issues that can greatly influence the accuracy of the final ranking. However, taking an optimistic point of view, we suppose that all these difficulties have been overcome. A unique synthesized score can be obtained from several variables through the use of an aggregation function. Even though a great number of aggregation functions have been identified [1, 9, 10], we choose in this paper to study only the weighted mean as the aggregation function. However, the index proposed in this paper can easily be generalized to other aggregation functions (except perhaps for computational reasons); the study of the variations of the Rank Robustness Index with respect to the aggregation function is outside the scope of this paper. In a weighted mean, the final score, and therefore the ranking, is of course very dependent on the chosen weights (or, more generally, on the chosen parameters of any aggregation function).

CONTACT Email: [email protected]

On the other hand, an individual a whose values on each variable are better than those of an individual b should necessarily be ranked better than b in the final ranking, whatever the weights are. The final ranking then appears to depend both on the data and on the parameters of the aggregation function. If we consider the choice of the variable weights as a political act by the ranking-maker, i.e. a real choice made by a human being and not a given parameter, then we can wonder whether this choice has a weak or a strong influence on the final ranking. This question is equivalent to studying the rank robustness with respect to the parameters (i.e. the weights) of the aggregation function.

Several previous works have studied the robustness of rankings with respect to the data, but few have explored the robustness of rankings with respect to the weights. Among others, in [12] Saisana, Saltelli and Tarantola present a methodology to test the robustness of a composite indicator, using sensitivity and uncertainty analysis, and point out several sources of uncertainty in obtaining a composite ranking. In [11] Permanyer presents descriptive tools that can be seen as a measure of the robustness of a given ranking with respect to changes in the chosen weighting scheme. In [5] Foster, McGillivray and Seth particularly focus on the relationship between robustness and the statistical association between component variables. In [3], D'Agostino and Dardanoni investigate the problem of measuring social mobility when the social status of individuals is given by their rank. However, in these different works the robustness is hardly ever precisely quantified. We propose in this paper the definition of a global index which should be able to capture the robustness of a ranking with respect to the weights.

In section 3, we present a new Rank Robustness Index (RRI) which aims at answering this question. In section 4 we study the RRI in the case of top-k lists, i.e. when only the first k items of the ranking are of interest. Experiments are presented in section 5.

2. Introducing examples

Let us present a first example with 3 individuals X = {x, y, z} and 3 variables v1, v2, v3, which should all be minimized. The aggregation function used to determine the ranking is a weighted sum, i.e. the individual with the lowest weighted sum of the 3 variables will be ranked first. There exist 6 different possible rankings on X, as shown in table 1.

rank   R1  R2  R3  R4  R5  R6
1      x   x   y   y   z   z
2      y   z   x   z   x   y
3      z   y   z   x   y   x

Table 1. The 6 different possible rankings (in columns) for X = {x, y, z}.


Let us now present three different situations. In situation 1, shown in table 2, x is preferred to y and z on all the variables, and y has better values than z on all the variables. Whatever the weights vector, the global scores of x, y and z would be f(x) = 10, f(y) = 20 and f(z) = 30; the ranking would then be x first, y second and z third. The weights have no influence on the final ranking.

     v1  v2  v3
x    10  10  10
y    20  20  20
z    30  30  30

Table 2. Situation 1: in the presence of an absolute winner, the ranking is fully determined by the data.

In situation 2, as shown in table 3, the 3 individuals have totally symmetric values on the 3 variables. So each of the 6 rankings presented in table 1 is equally possible, and the final ranking is totally dependent on the choice of the weights vector.

     v1  v2  v3
x    10  30  20
y    20  10  30
z    30  20  10

Table 3. Situation 2: the ranking is fully determined by the weights vector.

Situation 3 is shown in table 4. The rank of x is always 1, whereas the rank of y is 2 in half of the cases and 3 in the other half (and likewise for z). The final ranking then depends partially on the chosen weights, and partially on the values taken by the individuals on the variables. This last example shows that a quantitative index of the respective parts played by the choice of parameters and by the values of the variables is needed to make precise the robustness of a ranking with respect to the parameters.

     c1  c2  c3
x    10  10  20
y    20  30  30
z    30  20  30

Table 4. Situation 3: the ranking is determined by both the data and the parameters.
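The three situations can be explored numerically. The following is a minimal sketch, not the authors' code: it scans a grid of integer weights (the grid resolution `steps` and the decision to skip weight vectors that produce ties are assumptions made for illustration) and collects the distinct rankings that a weighted sum can produce for the data of tables 2 to 4.

```python
from itertools import product

# Data of Tables 2-4 (all variables are to be minimized).
SITUATIONS = {
    1: {"x": (10, 10, 10), "y": (20, 20, 20), "z": (30, 30, 30)},
    2: {"x": (10, 30, 20), "y": (20, 10, 30), "z": (30, 20, 10)},
    3: {"x": (10, 10, 20), "y": (20, 30, 30), "z": (30, 20, 30)},
}


def ranking(data, weights):
    """Return the individuals ordered from best (lowest weighted sum) to worst,
    or None when two individuals tie (ties are skipped in this sketch)."""
    scores = {name: sum(w * v for w, v in zip(weights, values))
              for name, values in data.items()}
    if len(set(scores.values())) < len(scores):
        return None
    return tuple(sorted(scores, key=scores.get))


def reachable_rankings(data, steps=10):
    """Collect the distinct rankings obtained on a grid of integer weights
    (w1, w2, w3) with w1 + w2 + w3 = steps (hypothetical grid resolution)."""
    found = set()
    for w1, w2 in product(range(steps + 1), repeat=2):
        w3 = steps - w1 - w2
        if w3 < 0:
            continue
        r = ranking(data, (w1, w2, w3))
        if r is not None:
            found.add(r)
    return found


for label, data in SITUATIONS.items():
    print(label, sorted(reachable_rankings(data)))
```

On this grid, situation 1 yields a single ranking, situation 2 yields all six rankings of table 1, and situation 3 yields the two rankings in which x is first.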

3. Ranking robustness index (RRI)

Let us now formalize the framework. Let $f$ be an aggregation function from $\mathbb{R}^p$ to $\mathbb{R}$, and let $X \subset \mathbb{R}^p$ be the set of the $m$ considered individuals, each described on $p$ variables. Suppose that $f$ depends on a set of parameters $w$. For example, if $f$ is a weighted mean, $w = (w_1, \dots, w_p)$ is the set of weights and $f(x) = \sum_{i=1}^{p} w_i x_i / \sum_{i=1}^{p} w_i$. We denote by $W$ the set of all the possible parameter sets for a given function $f$. We suppose that the values on all the variables can be ordered by preference, and that the function $f$ is non-decreasing with respect to the preference order of any variable. Therefore, we can use the function $f$ to determine a preference ranking of all the considered individuals, with respect to the values taken on the variables, the aggregation function $f$ and its parameters. We denote by $r_{ij}$ the rank that $f$ assigns to the $i$-th individual in the $j$-th ranking, i.e. the ranking obtained with the $j$-th parameter set.

Our question is then: for a given set of individuals $X$ and an aggregation function $f$, what is the influence of the parameters $w$ on the final ranking?

3.1. Definition of RRI

We propose here a global index to measure the robustness of a ranking with respect to a change of the aggregation function parameters. The Rank Robustness Index (RRI) is based on an analysis of variance approach. The ranking onX depends on both:

(1) the values taken by each individual $x$ on each variable $i \in \{1, \dots, p\}$;

(2) the parameter set $w$ of the aggregation function.

       R_1  R_2  R_3  ...  R_n
x_1    1    1    5    ...  2
x_2    2    4    2    ...  5
x_3    3    2    1    ...  4
...    ...  ...  ...  ...  ...
x_m    5    3    3    ...  1

Table 5. Example of n different rankings on a set X of m individuals.

Each parameter set $w$ gives a ranking $R$ on $X$. Then $n$ different parameter sets give $n$ rankings, different or not, of the $m$ individuals of $X$. Table 5 presents an example of such a situation. The influence of the parameters on the rank of an individual $x$ can be measured by the intrinsic variability of the ranks of $x$: the lower the variability, the less influence the parameters have on the ranking. We therefore rely on the classical analysis of variance (ANOVA) theory, where the global variability is expressed as the sum of squared deviations (SSD) of the ranks,

$$\mathrm{SSD}_{\mathrm{total}} = \sum_{i=1}^{m} \sum_{j=1}^{n} \left(r_{ij} - \bar{r}_{\cdot\cdot}\right)^2,$$

where $\bar{r}_{\cdot\cdot} = (nm)^{-1} \sum_{i=1}^{m} \sum_{j=1}^{n} r_{ij}$ is the global mean. Applied to our framework, $\mathrm{SSD}_{\mathrm{total}}$ can be split into two terms: the variability due to the parameters ($\mathrm{SSD}_{\mathrm{parameters}}$) and the variability due only to the values taken by each individual ($\mathrm{SSD}_{\mathrm{data}}$). In mathematical terms,

$$\mathrm{SSD}_{\mathrm{total}} = \mathrm{SSD}_{\mathrm{parameters}} + \mathrm{SSD}_{\mathrm{data}}. \tag{1}$$

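Proposition 3.1. RRI can be expressed as

$$\mathrm{RRI} = 1 - \frac{12 \sum_{i=1}^{m} \mathrm{var}_i}{m(m^2-1)}, \tag{2}$$

where $\mathrm{var}_i$ denotes the variance of the rank of individual $i$ over the parameter sets.

Proof. Each ranking being a permutation of $1, \dots, m$ (no ties), the global SSD can easily be computed and is equal to

$$\mathrm{SSD}_{\mathrm{total}} = \frac{n(m-1)m(m+1)}{12}.$$

The SSD due to the parameters is equal to

$$\mathrm{SSD}_{\mathrm{parameters}} = \sum_{i=1}^{m} \sum_{j=1}^{n} \left(r_{ij} - \bar{r}_{i\cdot}\right)^2 = n \sum_{i=1}^{m} \mathrm{var}_i.$$

The SSD due to the data is equal to

$$\mathrm{SSD}_{\mathrm{data}} = n \sum_{i=1}^{m} \left(\bar{r}_{i\cdot} - \bar{r}_{\cdot\cdot}\right)^2,$$

where $\bar{r}_{i\cdot}$ is the mean rank of individual $i$.

As in usual ANOVA, we can compute the percentage of variance that is explained by the data versus the parameters. Therefore we define RRI as the ratio between the SSD depending on the data and the global SSD:

$$\mathrm{RRI} = \frac{\mathrm{SSD}_{\mathrm{data}}}{\mathrm{SSD}_{\mathrm{total}}} = 1 - \frac{\mathrm{SSD}_{\mathrm{parameters}}}{\mathrm{SSD}_{\mathrm{total}}} = 1 - \frac{12 \sum_{i=1}^{m} \mathrm{var}_i}{m(m^2-1)}.$$

As long as the $\mathrm{var}_i$ are known, RRI does not depend on $n$. We develop below an estimation process for RRI using $n$ parameter sets.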
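The decomposition (1) and the closed form of the total SSD can be checked numerically. The sketch below is an illustration under stated assumptions: the rankings are random permutations without ties (they are not produced by any aggregation function), and the helper name `ssd_decomposition` is hypothetical.

```python
import random


def ssd_decomposition(ranks):
    """ranks[i][j] is the rank of individual i under parameter set j.
    Returns (SSD_total, SSD_parameters, SSD_data)."""
    m, n = len(ranks), len(ranks[0])
    grand_mean = sum(map(sum, ranks)) / (n * m)
    row_means = [sum(row) / n for row in ranks]
    ssd_total = sum((r - grand_mean) ** 2 for row in ranks for r in row)
    ssd_param = sum((r - row_means[i]) ** 2
                    for i, row in enumerate(ranks) for r in row)
    ssd_data = n * sum((rm - grand_mean) ** 2 for rm in row_means)
    return ssd_total, ssd_param, ssd_data


random.seed(0)
m, n = 5, 40
# Each column is one ranking: a random permutation of 1..m (no ties).
columns = [random.sample(range(1, m + 1), m) for _ in range(n)]
ranks = [[col[i] for col in columns] for i in range(m)]

total, param, data = ssd_decomposition(ranks)
rri = data / total
print(total, param + data, n * m * (m * m - 1) / 12, rri)
```

The first three printed values should coincide up to floating-point error, which illustrates both equation (1) and the value of the total SSD used in the proof.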

3.2. Examples

Let us analyse the situations introduced above with the RRI index. In situation 1, shown in table 2, $\mathrm{var}_x = \mathrm{var}_y = \mathrm{var}_z = 0$ and then $\mathrm{RRI} = 1$. This means that all the variation comes from the data: whatever the parameters, the rank of a specific individual is always the same. In situation 2, as shown in table 3, the 3 individuals have totally symmetric values on the 3 variables, so $\mathrm{var}_x = \mathrm{var}_y = \mathrm{var}_z = \mathrm{var}(\{1,2,3\}) = 2/3$. Then

$$\mathrm{RRI}(X) = 1 - \frac{12 \times (3 \times 2/3)}{3 \times (9-1)} = 0.$$

This means that all the variation comes from the parameters, and therefore that an observed ranking is totally dependent on the parameters and not on the data. In situation 3, as shown in table 4, the rank of x is always 1, whereas the rank of y is 2 in half of the cases and 3 in the other half (and likewise for z). Therefore $\mathrm{var}_x = 0$ and $\mathrm{var}_y = \mathrm{var}_z = 0.25$. Then

$$\mathrm{RRI}(X) = 1 - \frac{12 \times (0 + 0.25 + 0.25)}{3 \times (9-1)} = 0.75.$$

The interpretation is that the final ranking depends for 1/4 on the choice of the parameters, and for 3/4 on the values taken by the individuals on the variables.
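As a check of the arithmetic, equation (2) can be evaluated exactly from the rank distributions described for each situation. This is a small sketch using exact fractions; the helper names `rank_variance` and `rri_from_variances` are hypothetical, and the rank distributions are those stated in the text (they are taken as given, not derived from the data).

```python
from fractions import Fraction


def rank_variance(dist):
    """Variance of a rank distribution given as {rank: probability}."""
    mean = sum(r * p for r, p in dist.items())
    return sum(p * (r - mean) ** 2 for r, p in dist.items())


def rri_from_variances(variances):
    """Equation (2): RRI = 1 - 12 * sum(var_i) / (m * (m^2 - 1))."""
    m = len(variances)
    return 1 - Fraction(12) * sum(variances) / (m * (m * m - 1))


third, half = Fraction(1, 3), Fraction(1, 2)
fixed = lambda r: {r: Fraction(1)}          # rank that never changes
uniform = {1: third, 2: third, 3: third}    # ranks 1, 2, 3 equally likely
coin = {2: half, 3: half}                   # rank 2 or 3 with probability 1/2

situation_1 = [rank_variance(fixed(r)) for r in (1, 2, 3)]
situation_2 = [rank_variance(uniform)] * 3
situation_3 = [rank_variance(fixed(1)), rank_variance(coin), rank_variance(coin)]

for variances in (situation_1, situation_2, situation_3):
    print(rri_from_variances(variances))
```

The three values are 1, 0 and 3/4: the share of the rank variability attributed to the data in each situation.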


3.3. Links with Friedman statistic

The Friedman non-parametric test (see e.g. [8]) is an independence test on ranks for repeated measures. The test is based on the Friedman statistic

$$F_r = \frac{12}{n \times m \times (m+1)} \sum_{i=1}^{m} S_i^2 - 3n(m+1),$$

with $n$ treatments/blocks and $m$ samples, where $S_i$ is the sum of ranks for each sample. RRI is linked to $F_r$, as the individuals can be seen as samples and the different parameter sets as different treatments on the samples. It is straightforward to see that $n(m-1)\,\mathrm{RRI} = F_r$; therefore RRI can be interpreted as the mean of the Friedman statistic on the samples. It is well known that under the assumption of independence $F_r$ converges to a $\chi^2$ distribution with $m-1$ degrees of freedom, which means in our framework that, under the assumption of a total independence of the ranking from the parameters, $n(m-1)\,\mathrm{RRI}$ converges to a $\chi^2$ distribution with $m-1$ degrees of freedom. The exact distribution of $F_r$ under the assumption of partial dependence is not known. Therefore the distribution of the observed RRI on a sample under the assumption $\mathrm{RRI} = \alpha$, $\alpha \neq 0$, is not known either, and we are unable to derive a confidence interval for RRI using statistical inference theory. To circumvent this obstacle we turn to an approximate estimate of the distribution.
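The identity $n(m-1)\,\mathrm{RRI} = F_r$ can be verified numerically. The following sketch assumes rankings without ties, drawn here as random permutations purely for illustration; the function names are hypothetical.

```python
import random


def friedman_statistic(ranks):
    """Friedman statistic from ranks[i][j] (individual i, parameter set j)."""
    m, n = len(ranks), len(ranks[0])
    rank_sums = [sum(row) for row in ranks]  # S_i
    return 12 / (n * m * (m + 1)) * sum(s * s for s in rank_sums) - 3 * n * (m + 1)


def rri(ranks):
    """Equation (2), with var_i the (population) variance of the ranks of i."""
    m, n = len(ranks), len(ranks[0])
    variances = []
    for row in ranks:
        mean = sum(row) / n
        variances.append(sum((r - mean) ** 2 for r in row) / n)
    return 1 - 12 * sum(variances) / (m * (m * m - 1))


random.seed(1)
m, n = 6, 25
columns = [random.sample(range(1, m + 1), m) for _ in range(n)]
ranks = [[col[i] for col in columns] for i in range(m)]

lhs = n * (m - 1) * rri(ranks)
rhs = friedman_statistic(ranks)
print(lhs, rhs)
```

Both printed values should agree up to floating-point error.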

3.4. Estimation and computational issues

As weights are generally not discrete, the set of possible weights is supposed to be infinite. It is then almost impossible to compute the exact values of the $\mathrm{var}_i$, and hence the value of RRI, directly, unless the variances can be easily computed as in the examples above. Therefore, we have to estimate RRI. We propose to use a Monte Carlo method as follows.

(1) Generate $n$ different parameter sets for $f$ randomly in the space of parameter vectors. For example, we use a Dirichlet distribution to generate weights for a weighted sum. Then use these $n$ parameter sets to obtain $n$ different score vectors, and so $n$ different rankings of the set of $m$ individuals.

(2) Compute the rank variance $\mathrm{var}_i^*$ of each individual over the $n$ sampled rankings obtained in the previous step.

(3) Compute the RRI estimate as one minus the ratio of the mean of the individual variances to the exact total variance $(m^2-1)/12$:

$$\widehat{\mathrm{RRI}} = 1 - \frac{12 \sum_{i=1}^{m} \mathrm{var}_i^*}{m(m^2-1)}.$$

(4) It is possible to repeat the point estimation in order to also estimate the variance of the estimator, and then to obtain a confidence interval for RRI. If $\hat{\mu}$ is the average value of $N$ RRI estimations and $s^2$ is the estimated variance of these $N$ estimations, then the confidence interval of the RRI estimation is $]\hat{\mu} - z s/\sqrt{N};\ \hat{\mu} + z s/\sqrt{N}[$, with $z$ the normal quantile corresponding to the chosen level (typically $z = 1.96$ for a 95% confidence interval).
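The four steps above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes a flat Dirichlet distribution (all concentration parameters equal to 1, sampled by normalizing gamma draws), a weighted sum to be minimized, and hypothetical function names.

```python
import math
import random
import statistics


def dirichlet(p, rng, alpha=1.0):
    """Draw one weight vector from a symmetric Dirichlet(alpha) distribution
    by normalizing independent gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(p)]
    total = sum(draws)
    return [d / total for d in draws]


def ranks_for(data, weights):
    """Rank individuals by weighted sum (lower score is better, rank 1 is best)."""
    scores = [sum(w * v for w, v in zip(weights, row)) for row in data]
    return [1 + sum(other < s for other in scores) for s in scores]


def estimate_rri(data, n=500, rng=None):
    """Steps (1)-(3): sample n weight vectors, compute var_i*, apply equation (2)."""
    rng = rng or random.Random()
    m, p = len(data), len(data[0])
    sampled = [ranks_for(data, dirichlet(p, rng)) for _ in range(n)]
    variances = [statistics.pvariance([r[i] for r in sampled]) for i in range(m)]
    return 1 - 12 * sum(variances) / (m * (m * m - 1))


def confidence_interval(estimates, z=1.96):
    """Step (4): normal-approximation interval from repeated point estimates."""
    mean = statistics.fmean(estimates)
    half_width = z * statistics.stdev(estimates) / math.sqrt(len(estimates))
    return mean - half_width, mean + half_width


# Data of Tables 2-4 (rows: x, y, z; columns: the three variables).
SITUATION_1 = [(10, 10, 10), (20, 20, 20), (30, 30, 30)]
SITUATION_2 = [(10, 30, 20), (20, 10, 30), (30, 20, 10)]
SITUATION_3 = [(10, 10, 20), (20, 30, 30), (30, 20, 30)]

rng = random.Random(42)
estimates = [estimate_rri(SITUATION_3, n=500, rng=rng) for _ in range(20)]
print(statistics.fmean(estimates), confidence_interval(estimates))
```

Fixing the seed of the generator makes the run reproducible; the number of repetitions (20 here) is an arbitrary choice for the example.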

Experiments show that a good estimation of RRI can be obtained with a small number of parameter sets. Typically, the simulation of only 500 parameter sets leads to good estimations of RRI. As an example, figure 1 and table 6 show the variation of the (estimated) RRI standard deviation for a set of individuals described by 3, 5 or 7 variables, with an average RRI around 0.75. The standard deviation is around 10⁻³.

Figure 1. Estimations of the RRI standard deviation as a function of n.

n      3 var.       5 var.       7 var.
100    1.7 × 10⁻³   1.9 × 10⁻³   1.8 × 10⁻³
500    5.0 × 10⁻⁴   3.4 × 10⁻⁴   3.6 × 10⁻⁴
1000   2.4 × 10⁻³   1.8 × 10⁻³   1.5 × 10⁻³

Table 6. Estimations of the RRI standard deviation as a function of n.

4. Top k-list and ranking robustness index (RRI)

4.1. Top k list definition

Until now we have studied the case of an RRI index computed on the whole ranking. But often only the first elements of a ranking are of interest. For example, a user of a web search engine will be interested in the first 10 results. A newspaper will typically focus on top-3 rankings. Therefore it is interesting to focus on the robustness of the top-k ranking to a variation of the parameters. The RRI index will then be more in accordance with the impression of the ranking user: as a matter of fact, variations in the first (or last) elements of a ranking appear to be more important than variations in the middle of the ranking. Please note that we focus on top-k rankings, as bottom-k rankings can be obtained easily by symmetry.

Let us take for example situation A described in table 7, where the objective is to minimize the value on each variable (less is better). It is obvious that, as the values of a are minimal on each variable, a will always be ranked first. The same holds for b, which will always be second. As c, d and e have totally symmetric values on the three variables, c, d and e will be ranked 3, 4 and 5 with equal probability. So in this case $\mathrm{var}_a = 0$, $\mathrm{var}_b = 0$, $\mathrm{var}_c = \mathrm{var}_d = \mathrm{var}_e = 2/3$, and

$$\mathrm{RRI} = 1 - \frac{12 \times (0 + 0 + 2/3 + 2/3 + 2/3)}{5 \times (25-1)} = 0.8.$$

If we focus only on the first two elements of the ranking, as they are always the same and in the same order, we obtain $\mathrm{RRI} = 1$, which indicates that the top-2 ranking does not depend on the parameters of the aggregation function.

Situation A            Situation B
     c1  c2  c3             c1  c2  c3
a    10  10  10        a    10  20  30
b    20  20  20        b    20  30  10
c    30  40  50        c    30  10  20
d    40  50  30        d    40  40  40
e    50  30  40        e    50  50  50

Table 7. Example: situations A (left) and B (right).

In situation B, described in the same table 7 and still from a minimizing point of view, d and e are always the last two, and a, b and c are ranked 1, 2 and 3 with equal probability. In this case we obviously have the same value $\mathrm{RRI} = 0.8$ as in situation A. But if we focus only on the first two elements of the ranking (i.e. by giving the same rank 3 to all the individuals not ranked in the first two), we obtain the value $\mathrm{RRI}_k = 0.375$ for $k = 2$ (see the computation in section 4.2), which shows that the top-2 ranking depends for about 62.5% on the parameters.

Following Fagin et al. [4], we define a top-k list, i.e. a ranking for which only the top k members of the ordering are known. As proposed in [4], a top-k list $R$ is a bijection from a domain $D$ (intuitively, the members of the top-k list) to $\{1, \dots, k\}$. In our paper, we extend this definition by assuming that $D$ is a subset of a discrete and possibly infinite set $N$ of size $n \in \mathbb{N} \cup \{+\infty\}$. In order to formalize this presentation of a top-k list within a set of $n$ elements, we choose what is called the "optimistic approach" in [4] and suppose that all the $n-k$ elements that are not in the top-k list are ranked ex aequo at position $k+1$. Therefore, a top-k list $R$ is a bijection from $N$ to $\{1, 2, 3, \dots, k, k+1, \dots, k+1\}$.
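Under the optimistic approach, turning a full ranking into a top-k ranking amounts to capping every rank at $k+1$. A minimal sketch (the function name `top_k_ranks` is hypothetical):

```python
def top_k_ranks(ranks, k):
    """Keep ranks 1..k as they are and give rank k + 1 to every other
    individual (the "optimistic approach" of Fagin et al.)."""
    return [r if r <= k else k + 1 for r in ranks]


# One possible full ranking of the individuals a, b, c, d, e of Table 7.
full_ranking = [2, 3, 1, 4, 5]
print(top_k_ranks(full_ranking, 2))
```

With $k = 2$, the individuals ranked 3, 4 and 5 all receive rank 3, while the first two keep their ranks.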

4.2. Computing RRI on top k lists

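RRI for top-k lists has the same definition as in the general case, proposed in equation 2:

$$\mathrm{RRI} = \frac{\mathrm{SSD}_{\mathrm{data}}}{\mathrm{SSD}_{\mathrm{total}}}.$$

The only differences are the values of $\mathrm{SSD}_{\mathrm{data}}$ and $\mathrm{SSD}_{\mathrm{total}}$.

Proposition 4.1. RRI for top-k lists ($\mathrm{RRI}_k$) can be expressed as:

$$\mathrm{RRI}_k = \frac{\mathrm{SSD}_{\mathrm{data}}}{\mathrm{SSD}_{\mathrm{total}}} = 1 - \frac{6 \sum_{i=1}^{m} \mathrm{var}_i}{k(k+1)\left[k + (k+1)\left(1 - \frac{3k}{2m}\right)\right]}. \tag{3}$$

Proof. The mean of $1, 2, \dots, k, k+1, \dots, k+1$ is

$$\frac{(k+1)(m - k/2)}{m}.$$

The variance of $1, 2, \dots, k, k+1, \dots, k+1$ is computed as the mean of squares minus the squared mean. The mean of squares is

$$\frac{1}{m}\left[\frac{k(k+1)(2k+1)}{6} + (k+1)^2(m-k)\right].$$

The squared mean is

$$\frac{(k+1)^2(m - k/2)^2}{m^2}.$$

Then the variance of $1, 2, \dots, k, k+1, \dots, k+1$ is

$$\mathrm{var}_{\mathrm{tot}} = \frac{1}{m}\left[\frac{k(k+1)(2k+1)}{6} + (k+1)^2(m-k)\right] - \frac{(k+1)^2(m - k/2)^2}{m^2} = \frac{k(k+1)}{6m}\left[k + (k+1)\left(1 - \frac{3k}{2m}\right)\right].$$

Therefore, as $\mathrm{SSD}_{\mathrm{total}} = \mathrm{SSD}_{\mathrm{parameters}} + \mathrm{SSD}_{\mathrm{data}}$, we can introduce RRI for top-k lists as:

$$\mathrm{RRI}_k = \frac{\mathrm{SSD}_{\mathrm{data}}}{\mathrm{SSD}_{\mathrm{total}}} = 1 - \frac{\mathrm{SSD}_{\mathrm{parameters}}}{\mathrm{SSD}_{\mathrm{total}}} = 1 - \frac{\sum_{i=1}^{m} \mathrm{var}_i}{m \times \mathrm{var}_{\mathrm{tot}}} = 1 - \frac{6 \sum_{i=1}^{m} \mathrm{var}_i}{k(k+1)\left[k + (k+1)\left(1 - \frac{3k}{2m}\right)\right]}.$$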

Computing $\mathrm{RRI}_k$ requires the variance of the rank of each individual, which can for example be estimated via a Monte Carlo process as in section 3.4.
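Equation (3) can be evaluated exactly for situations A and B of table 7, using the rank variances stated in section 4.1. The sketch below uses exact fractions; the function name `rri_top_k` is hypothetical.

```python
from fractions import Fraction


def rri_top_k(variances, k):
    """Equation (3): RRI_k from the rank variances of the m individuals,
    where ranks beyond k have been set to k + 1."""
    m = len(variances)
    bracket = k + (k + 1) * (1 - Fraction(3 * k, 2 * m))
    return 1 - 6 * sum(variances) / (k * (k + 1) * bracket)


two_thirds = Fraction(2, 3)

# Situation B, top-2: a, b, c take ranks 1, 2, 3 equally often; d, e always get rank 3.
situation_b_top2 = [two_thirds, two_thirds, two_thirds, 0, 0]
# Situation A, top-2: a is always 1st, b always 2nd, the others always get rank 3.
situation_a_top2 = [0, 0, 0, 0, 0]
# Situation B, full ranking (k = m = 5): d and e are always 4th and 5th.
situation_b_full = [two_thirds, two_thirds, two_thirds, 0, 0]

print(rri_top_k(situation_b_top2, 2))   # 3/8
print(rri_top_k(situation_a_top2, 2))   # 1
print(rri_top_k(situation_b_full, 5))   # 4/5
```

The value 3/8 = 0.375 is the top-2 index of situation B, and taking $k = m$ recovers the value 0.8 given by equation (2) for the full ranking.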

5. Experiments

We explore in this section three different rankings as examples of situations where RRI can help to better understand the importance of the weights for a ranking. We are not interested here in criticizing the process used to obtain the rankings. To give just one example, most of these rankings use normalised scores on each criterion where the best score is automatically 100; this process should be avoided as it can lead to rank reversals. We focus on the fact that these rankings are obtained by a weighted average over several criteria, which is exactly the situation where RRI is useful.

All estimations have been done with a sample of 1000 parameter vectors, which gives an estimated value of $\mathrm{RRI}_k$ as proposed in section 3.4. We repeat this estimation 100 times to estimate the variance of $\mathrm{RRI}_k$. We obtain the estimation of $\mathrm{RRI}_k$ as the mean of the 100 $\mathrm{RRI}_k$ estimations. The standard deviation of the estimator is also provided, estimated by the standard deviation of the 100 $\mathrm{RRI}_k$ estimations. This is repeated for the top-k positions with $k \in \{1, 2, 3, 5, 10, m\}$.
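The experimental protocol can be sketched as follows. Since the data sets used in the experiments are not reproduced here, this illustration runs on situation B of table 7; the flat Dirichlet weights, the function names and the reduced sample sizes in the usage example are assumptions made to keep the example fast.

```python
import random
import statistics


def dirichlet(p, rng, alpha=1.0):
    """One weight vector from a symmetric Dirichlet(alpha) distribution."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(p)]
    total = sum(draws)
    return [d / total for d in draws]


def ranks_for(data, weights):
    """Rank individuals by weighted sum (lower score is better)."""
    scores = [sum(w * v for w, v in zip(weights, row)) for row in data]
    return [1 + sum(other < s for other in scores) for s in scores]


def rri_k_estimate(data, k, n, rng):
    """One Monte Carlo estimate of RRI_k from n sampled weight vectors."""
    m, p = len(data), len(data[0])
    sampled = []
    for _ in range(n):
        ranks = ranks_for(data, dirichlet(p, rng))
        sampled.append([min(r, k + 1) for r in ranks])  # top-k, optimistic approach
    sum_var = sum(statistics.pvariance([s[i] for s in sampled]) for i in range(m))
    bracket = k + (k + 1) * (1 - 3 * k / (2 * m))
    return 1 - 6 * sum_var / (k * (k + 1) * bracket)


def protocol(data, ks, n=1000, repetitions=100, seed=0):
    """Mean and standard deviation of repeated RRI_k estimates, for each k."""
    rng = random.Random(seed)
    results = {}
    for k in ks:
        estimates = [rri_k_estimate(data, k, n, rng) for _ in range(repetitions)]
        results[k] = (statistics.fmean(estimates), statistics.stdev(estimates))
    return results


# Situation B of Table 7 (rows: a, b, c, d, e).
SITUATION_B = [(10, 20, 30), (20, 30, 10), (30, 10, 20), (40, 40, 40), (50, 50, 50)]

# The defaults mirror the protocol described above (n=1000, 100 repetitions);
# smaller values are used here only to keep the example quick.
results = protocol(SITUATION_B, ks=(1, 2, 5), n=200, repetitions=20)
for k, (mean, std) in results.items():
    print(k, round(mean, 3), round(std, 4))
```

For this data set the estimates for $k = 2$ and $k = 5$ should be close to the exact values 0.375 and 0.8.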

5.1. Three different situations

We present in the following the three different example rankings. Results and discussion are postponed until section 5.2. Graphs representing the $\mathrm{RRI}_k$ distributions for each situation can be found in the appendix.


5.1.1. OECD’s countries ranking

In May 2011, the Organization for Economic Co-operation and Development (OECD) proposed a new well-being index named "Better Life Index" (BLI)¹. The BLI evaluates the 34 member states of the OECD on 11 criteria, such as housing, income, education, governance, etc. Each criterion is evaluated on a scale ranging between 0 and 10. A global country score is obtained by a weighted mean of the criteria. As emphasized by the OECD, the innovative aspect of the BLI is the possibility offered to anyone to choose her/his own weights (weights are integers between 0 and 5) in order to represent her/his own preferences on well-being indicators: "The OECD is NOT deciding what makes for better lives. YOU decide for yourself." A study of this index has been proposed in [7], and highlighted the fact that the final ranking depends only weakly on the selected parameters, i.e. whatever the parameters, it is always one of the same three countries which is ranked in first place.

We present in table 8 the estimations of the Rank Robustness Index for the OECD's Better Life Index (data obtained in May 2012).

k        1      2      3      5      10     34
RRI_k    0.245  0.345  0.422  0.571  0.770  0.927
σ_RRI    0.010  0.010  0.009  0.008  0.004  0.001

Table 8. OECD's Better Life Index RRI estimations.

5.1.2. “Where-to-study” cities ranking

The French magazine "L'Etudiant" specializes in student life, student jobs, student housing and especially student orientation. Each year it publishes a ranking of 41 French cities to determine "where is the best city to study in France". It does not rank universities, but cities, based on 9 criteria including for example quality of life, housing possibilities, employment possibilities or international environment. Each city is evaluated on these 9 criteria, and the scores are averaged into a global score which is used to rank the cities. We present in table 9 the estimations of the Rank Robustness Index for "Where to study".

k        1      2      3      5      10     20     41
RRI_k    0.237  0.361  0.438  0.527  0.623  0.692  0.720
σ_RRI    0.010  0.011  0.012  0.011  0.009  0.010  0.007

Table 9. "Where to study" cities ranking RRI estimations.

5.1.3. Universities ranking

Since the famous Shanghai ranking appeared in 2003, many other organizations have proposed a world universities ranking, all based more or less on the same scheme, i.e. evaluating the universities by giving a score on several criteria, and then averaging these scores into a global value. The Academic Ranking of World Universities (ARWU - best known as the "Shanghai ranking"²) considers every university that has any Nobel Laureates, Fields Medalists, Highly Cited Researchers, or papers published in Nature or Science. In addition, universities with a significant number of papers indexed by the Science Citation Index-Expanded (SCIE) and the Social Science Citation Index (SSCI) are also included. The ranking is obtained by averaging normalized scores on 6 criteria with specific weights. The QS World University Ranking³ has also been chosen for our study as the data are easily available, and the ranking is obtained by averaging the scores on 6 criteria whose weights are given in the methodology section of the website. We restrict our study to the first 20 universities in the final ranking.

¹ http://www.oecdbetterlifeindex.org/

² http://www.shanghairanking.com/ARWU2017.html

Figure 2. RRI_k for three different example rankings.

We present in table 10 some estimations of the Rank Robustness Index for ARWU (dated 2017) and the QS World University Ranking (dated 2018).

       k        1      2      3      5      10     20
ARWU   RRI_k    0.995  0.858  0.798  0.820  0.884  0.845
       σ_RRI    0.003  0.002  0.003  0.004  0.003  0.004
QS     RRI_k    0.744  0.662  0.590  0.536  0.603  0.673
       σ_RRI    0.018  0.012  0.009  0.005  0.004  0.005

Table 10. Universities ARWU and QS ranking RRI estimations.

5.2. Discussion

The results of the previous section are plotted in figure 2. As we can see, the RRI changes with k. For the OECD index, the RRI increases with k, which means that the first elements of the ranking are less robust to a change of weights than the global ranking, i.e. the very top of the ranking is relatively dependent on the weights. The RRI score of 0.93 for the whole ranking shows that the whole ranking is very robust with respect to the parameters, i.e. that countries ranked at the top of the ranking are always ranked at the top, whatever the weights are, and likewise for countries ranked at the bottom of the ranking. The RRI score of 0.57 for the top-5 shows that, when focusing only on the first five countries, their ranks (inside the top-5) are more dependent on the weights. The RRI score is even as low as 0.25 for the first place of the ranking, which corresponds to the situation where two different countries can be ranked first with about the same probability of 50% each (see [7] for a detailed study). In this situation, 25% of the variance for the first ranked country is explained by the data (mainly, only two countries can be ranked first) and 75% of the variance is due to the weights of the weighted sum (either of these two countries can be ranked first, depending on the weights).

³ http://www.topuniversities.com/qs-world-university-rankings

In the "where to study" ranking, the RRI for the whole ranking is 0.72, which means that this ranking can be significantly modified by changing the weights. Note that the magazine "L'Etudiant" generally focuses on the best 3 cities. The RRI for k = 3 is about 0.44, which means that the top-3 ranking reflects the political choice of the magazine, expressed through the choice of parameters, at least as much as an objective situation.

In the world universities rankings, the RRI is about 0.8 for ARWU and 0.6 for QS. This means that the rankings (and the first position even more so) depend only marginally on the chosen weights. It also means that the QS ranking is more dependent on the choice of weights than the ARWU ranking, which is largely determined by the data. Please note that the ARWU data are also the product of political choices made by the ranking builder, so RRI = 1 does not mean that the ranking is the ground truth, but only that the ranking does not depend on a change of weights.

6. Conclusion

We provide in this paper a Ranking Robustness Index based on an ANOVA approach. This index is able to determine whether a specific ranking obtained by multicriteria aggregation is robust to a change of the parameters of the aggregation operator. The main interest of such an index is to specify whether a given ranking is highly dependent on the choice of the parameters or not. As shown in the examples, the application fields of RRI are various and include any public ranking with available criteria data. Readers and users of these rankings (such as university rankings) can then measure the influence of the "editorial" choices made by the ranking maker. However, using RRI to describe the robustness of a ranking with respect to the parameters is difficult if it is used as an absolute index, as the same value can cover several different situations. We recommend using RRI rather as a relative index for comparing rankings, as between the ARWU and QS rankings. Experiments have been done only on the weighted average. Future work should focus on other aggregation operators such as the Ordered Weighted Average or the Choquet integral.

References

[1] G. Beliakov, A. Pradera, and T. Calvo. Aggregation Functions: A Guide for Practitioners. Springer Publishing Company, Incorporated, 1st edition, 2008.

[2] J.-C. Billaut, D. Bouyssou, and Ph. Vincke. Should you believe in the Shanghai ranking? An MCDM view. Scientometrics, 84:237–263, 2010.

[3] M. D'Agostino and V. Dardanoni. The measurement of rank mobility. Journal of Economic Theory, 144(4):1783–1803, July 2009.

[4] R. Fagin, R. Kumar, and D. Sivakumar. Comparing top k lists. SIAM J. Discrete Mathematics, 17(1):134–160, 2003.

[5] J. E. Foster, M. McGillivray, and S. Seth. Composite indices: Rank robustness, statistical association, and redundancy. Econometric Reviews, 32(1):35–56, 2013.

[6] S. Greco, M. Ehrgott, and J. R. Figueira, editors. Multiple Criteria Decision Analysis: State of the Art Surveys. Springer, 2016.

[7] J. Kasparian and A. Rolland. OECD's 'Better Life Index': can any country be well ranked? Journal of Applied Statistics, 39(10):2223–2230, 2012.

[8] E. L. Lehmann. Nonparametrics: Statistical Methods Based on Ranks. Holden-Day, San Francisco, 1975.

[9] S. Lin. Rank aggregation methods. Wiley Interdisciplinary Reviews: Computational Statistics, 2(5), 2010.

[10] J.-L. Marichal. Aggregation functions for decision making, Decision-Making Process - Concepts and Methods. ISTE/John Wiley, 2009.

[11] I. Permanyer. Uncertainty and robustness in composite indices rankings. Oxford Economic Papers, 64:57–79, 2012.

[12] M. Saisana, A. Saltelli, and S. Tarantola. Uncertainty and sensitivity analysis techniques as tools for the quality assessment of composite indicators. Journal of the Royal Statistical Society, 168(2):307–323, 2005.

[13] M. Saisana and S. Tarantola. State-of-the-art report on current methodologies and practices for composite indicator development. Report EUR 20408 EN. European Commission-Joint Research Centre, Ispra, 2002.

[14] A. Telcs, Z. T. Kosztyan, and A. Török. Unbiased one-dimensional university ranking - application-based preference ordering. Journal of Applied Statistics, to appear:1–17, 2015.

Appendix

The figures below represent in more detail the information summarized in figure 2. For each of them, a boxplot of RRI_k is represented as a function of k. The boxplots show that the variability of the estimates is reasonably low.

Figure 3. RRI_k for the different examples studied: (a) OECD ranking, (b) "L'Etudiant" ranking, (c) QS ranking, (d) ARWU ranking.