Model-based evaluation of scientific impact indicators


Supporting Information

Matúš Medo1,∗ and Giulio Cimini2,3,†

1 Physics Department, University of Fribourg, CH-1700 Fribourg, Switzerland
2 IMT School for Advanced Studies, 55100 Lucca, Italy
3 Istituto dei Sistemi Complessi (ISC)-CNR, 00185 Rome, Italy
(Dated: September 7, 2016)

PAPER CITATION COUNT VERSUS PAPER APPEARANCE TIME

[Figure S1: (a) median citation count versus publication year, APS data; (b) median citation count versus paper age, APS data and model with Ω∞.]

Figure S1. (a) The dependence of the median paper citation count on the publication year in the standard American Physical Society (APS) data, which cover all papers published by the APS journals and the citations among them (the covered period is 1893–2009). The citation count decrease for papers published after 2000 can be attributed to these papers not having had time to reach their long-term impact. The citation count decrease for papers published before 1950 can be attributed to different citation practices (fewer references per paper). Unlike the model without Ω∞, the data feature no median peak for the earliest published papers.

(b) A comparison of the dependence of the median paper citation count in the model data with Ω∞ (the basic model setting; the error bars show three times the standard error of the mean computed from 100 model realizations) and the APS data. Given that the model was not calibrated with respect to this measurement, the two curves agree remarkably well.
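The error bars used throughout the figures (three times the standard error of the mean over independent model realizations) can be computed as in the following minimal sketch. The function name and the sample values are hypothetical and serve only as illustration; the sample standard deviation (with n − 1 in the denominator) is assumed.

```python
import math
import statistics


def three_fold_sem(values):
    """Return (mean, 3 * standard error of the mean) for a list of values.

    Assumes the values come from independent model realizations and uses
    the sample standard deviation (n - 1 in the denominator).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two realizations")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, 3.0 * sem


# Hypothetical precision values from four model realizations.
mean, err = three_fold_sem([0.30, 0.34, 0.32, 0.36])
print(f"{mean:.3f} +/- {err:.3f}")
```

With 100 realizations, as in the figures, the same function would be called on a list of 100 values.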

∗ matus.medo@unifr.ch
† giulio.cimini@imtlucca.it


VARIATIONS OF THE BASIC SETTING

[Figure S2: six rows of panels (more fitness noise; 20% random links; base fitness mean; base fitness random; base ability a0 = 0; 5000 researchers). Each panel shows the precision difference between a given setting and the basic setting from Figure 4 for the metrics total cit C, mean cit E, max cit M, x-index, h-index, contemp h, g-index, o-index, m-quotient, corrected m, total log cit λC, mean log cit λE.]

Figure S2. Precision difference with respect to the basic setting for individual metrics when several variations of this setting are considered (from top to bottom):

(1) the fitness noise amplitude η∗ is increased from 0.2 to 0.5 (Ω∞ = 1.48 · 10^4),

(2) 20% of references made by a new paper target a random existing paper (Ω∞ = 1.28 · 10^4),

(3) the base fitness of a paper is obtained as the average of the authors’ ability values (Ω∞ = 9.20 · 10^3),

(4) the base fitness of a paper is a random sample from the authors’ ability values (Ω∞ = 1.10 · 10^4),

(5) the baseline ability of authors a0 is set to zero (Ω∞ = 1.06 · 10^4),

(6) the number of researchers is increased from 1000 to 5000 (Ω∞ = 6.90 · 10^4).

The different ground truth assumptions are, from left to right: author ability, average fitness of authored papers, author ability times activity, and total fitness of authored papers. The error bars show three times the standard error of the mean. For every setting and ground truth, the best-performing metric (in absolute terms) is marked with a star (?), and all metrics that reach at least 90% of its precision are marked with bullets (•) whose color indicates


AUTHORS AT DIFFERENT CAREER STAGES

Here we study the case where new authors are gradually introduced to the system, and check which indicators allow a fair evaluation of young researchers with respect to their senior colleagues. We thus consider a situation where 25% of researchers are active for the whole time; for the remaining 75%, author i begins their activity in a randomly chosen step τ_i between t = 1 and t = T − 36. Here 36 is subtracted to include only researchers who have spent sufficient time in the system (by doing so, we are essentially considering only researchers who have finished their PhD studies). In this setting, the productivity of author i, initially drawn from G(k), is linearly rescaled according to the author's appearance time in the system by a factor (1 − τ_i/T): young authors are thus at a disadvantage with respect to seniors by having on average fewer publications at the end of the simulation, and also by having less time to accrue citations.
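The entry-time assignment and productivity rescaling described above can be sketched as follows. This is a minimal illustration and not the simulation code itself: the function names are hypothetical, exactly 25% of authors (rounded) are taken as always active, always-active authors are assigned τ_i = 0 so that their productivity is not rescaled, and τ_i is assumed to be drawn uniformly from the integers 1, ..., T − 36. The draw of the initial productivity from G(k) is not reproduced here.

```python
import random


def assign_entry_times(n_authors, T, always_active_fraction=0.25,
                       margin=36, rng=None):
    """Return a list of entry steps tau_i, one per author.

    Assumption: always-active authors get tau_i = 0; the others get a
    uniformly random integer step between 1 and T - margin (inclusive).
    """
    rng = rng or random.Random()
    n_always = round(always_active_fraction * n_authors)
    taus = [0] * n_always
    taus += [rng.randint(1, T - margin) for _ in range(n_authors - n_always)]
    return taus


def rescaled_productivity(base_productivity, tau, T):
    """Linearly rescale productivity by the factor (1 - tau / T)."""
    return base_productivity * (1.0 - tau / T)


# Example with T = 120 monthly steps and 1000 authors.
taus = assign_entry_times(1000, 120, rng=random.Random(0))
print(rescaled_productivity(2.0, 60, 120))  # author entering at mid-time
```

An author who enters halfway through the simulation (τ_i = T/2) thus keeps half of the initially drawn productivity.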

Figure S3 presents the results obtained with this setting. The first observation is that most metrics perform similarly to the basic setting, where the group of active researchers does not change over time. The only metric that considers authors’ career length, the m-quotient, surprisingly performs worse in the new setting with researchers gradually entering the system. The reason lies again in the rescaling problem mentioned in the main text: in the new setting, there are more young authors who publish their papers only in the last years and yet outperform venerable researchers upon the rescaling, thus lowering the resulting precision more than in the basic setting presented in Figure 4 of the main text. Specifically, there are on average 28 ± 8 authors with only one publication in the top 100 positions of the ranking by the m-quotient, and the average activity span of the top 100 researchers is 60 ± 40 months (out of 120 in total). By contrast, for the h-index ranking there are no authors with only one publication in the top 100, and the average activity span of the top researchers is 96 ± 21 months. The bias towards very young researchers is removed by the corrected m, which leaves no researchers with only one paper in the top 100 and yields an average activity span of 86 ± 28 months: the resulting precision is similar to that achieved with the h-index and the contemporary h-index.
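The bias of the m-quotient towards very young authors can be illustrated with a small worked example. The sketch below assumes the standard definition of the m-quotient (the h-index divided by the career length in years); the citation counts and career lengths are hypothetical, and the corrected m is not reproduced here because its definition is given only in the main text.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def m_quotient(citations, career_months):
    """h-index divided by career length in years (assumed definition)."""
    return h_index(citations) / (career_months / 12.0)


# Hypothetical junior author: one paper, one citation, 3 months active.
junior = m_quotient([1], career_months=3)
# Hypothetical senior author: ten papers, 120 months active.
senior = m_quotient([30, 25, 20, 15, 12, 10, 8, 6, 4, 2], career_months=120)
print(junior, senior)
```

Here the junior author has h = 1 and the senior author has h = 7, yet dividing by the career length ranks the junior author far above the senior one, which is the kind of ranking inversion described above.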

Overall, the logarithm-based indices λE and λC are again the best performers against intensive and extensive ground truths, respectively.

[Figure S3: four panels (a)–(d) showing precision P(100) for the metrics total cit C, mean cit E, max cit M, x-index, h-index, contemp h, g-index, o-index, m-quotient, corrected m, total log cit λC, mean log cit λE.]

Figure S3. For the setting in which the number of active researchers grows in time (see the description in the text above), comparison of the mean precision values achieved by impact indicators with respect to different ground truth assumptions: (a) author ability, (b) average fitness of authored papers, (c) author ability times activity, (d) total fitness of authored papers. The horizontal dotted line marks the performance of the best metric (which is set in bold); the error bars show three times the standard error of the mean.
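As a rough illustration of the quantity plotted in these panels, the sketch below computes a precision P(k) under the assumption that it is the fraction of the top-k authors by a given metric who also belong to the top-k authors by the chosen ground truth. This definition is an assumption made for the example; the authoritative definition is the one in the main text. The function name and the toy data are hypothetical.

```python
def precision_at_k(metric_scores, ground_truth, k=100):
    """Overlap fraction between the top-k by metric and top-k by ground truth.

    Both arguments are dicts mapping an author identifier to a score.
    Ties are broken by the (stable) sorting order of the dict keys.
    """
    top_metric = set(sorted(metric_scores, key=metric_scores.get,
                            reverse=True)[:k])
    top_truth = set(sorted(ground_truth, key=ground_truth.get,
                           reverse=True)[:k])
    return len(top_metric & top_truth) / k


# Toy example with four hypothetical authors and k = 2.
metric = {"a": 5, "b": 4, "c": 3, "d": 1}
truth = {"a": 0.9, "b": 0.1, "c": 0.8, "d": 0.5}
print(precision_at_k(metric, truth, k=2))
```

In the toy example the metric selects authors a and b, the ground truth selects a and c, so one of the two top positions agrees.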


INTERMEDIATE GROUND TRUTH ASSUMPTIONS

[Figure S4: six panels (a)–(f) showing precision P(100) for the metrics total cit C, mean cit E, max cit M, x-index, h-index, contemp h, g-index, o-index, m-quotient, corrected m, total log cit λC, mean log cit λE.]

Figure S4. Precision values achieved by individual metrics against intensive (a,b), midway (c,d) and extensive (e,f) ground truth assumptions: (a) author ability, (b) author ability times square root of activity, (c) author ability times activity, (d) average fitness of authored papers, (e) average fitness of authored papers times square root of activity, (f) average fitness of authored papers times activity, i.e., total fitness of authored papers. The first row of panels (a,c,e) refers to a benchmark obtained from the researchers’ potential, whereas the second row (b,d,f) refers to a benchmark related to realized publication outputs. We assume here the basic simulation setting that was used to obtain Figure


[Figure S5: six panels (a)–(f) showing precision P(100) for the same metrics as in Figure S4.]

Figure S5. Same as Figure S4, but for the simulation setting with the number of active researchers increasing with