
Unbiased evaluation of ranking metrics reveals consistent performance in science and technology citation data


Academic year: 2021




Shuqi Xu (a), Manuel Sebastian Mariani (a, b), Linyuan Lü (a, c), Matúš Medo (a, d, e, ∗)

(a) Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, PR China
(b) URPP Social Networks, University of Zurich, 8050 Zurich, Switzerland
(c) Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, 311121 Hangzhou, PR China
(d) Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, 3010 Bern, Switzerland
(e) Department of Physics, University of Fribourg, 1700 Fribourg, Switzerland

Keywords: Citation networks; Network ranking metrics; Node centrality; Metrics evaluation; Milestone scientific papers and patents

Despite the increasing use of citation-based metrics for research evaluation purposes, we do not yet know which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. We assess 17 network-based metrics by their ability to identify milestone papers and patents in three large citation datasets. We find that traditional information-retrieval evaluation metrics are strongly affected by the interplay between the age distribution of the milestone items and the age biases of the evaluated metrics. The outcomes of these evaluation metrics are therefore not representative of the metrics' ranking ability. We argue in favor of a modified evaluation procedure that explicitly penalizes biased metrics and reveals performance patterns that are consistent across the datasets. PageRank and LeaderRank turn out to be the best-performing ranking metrics when their age bias is suppressed by a simple transformation of the scores that they produce, whereas other popular metrics, including citation count, HITS, and Collective Influence, produce significantly worse ranking results.

1. Introduction

Citation-based impact metrics build on the premise that the number of citations received by a scientific paper (or a patent) is a reliable proxy for its scientific (or technological) impact. Such metrics are used not only to assess the impact of individual papers, but also to evaluate the overall research output of research units such as individual researchers (Hirsch, 2005; Medo & Cimini, 2016; Radicchi, Fortunato, Markines, & Vespignani, 2009; Zhou, Lü, & Li, 2012), research institutes (Charlton & Andras, 2007; West, Jensen, Dandrea, Gordon, & Bergstrom, 2013), and journals (González-Pereira, Guerrero-Bote, & Moya-Anegón, 2010; Harzing & Wal, 2009). The relative ease with which new metrics of research impact can be designed has contributed to their proliferation (Mingers & Leydesdorff, 2015; Todeschini & Baccini, 2016; Waltman, 2016), and the uncritical use of such metrics has eventually met with strong opposition (de Rijcke, Wouters, Rushforth, Franssen, & Hammarfelt, 2016; Hicks, Wouters, Waltman, de Rijcke, & Rafols, 2015; Leydesdorff, Bornmann, & Opthof, 2018).

∗ Corresponding author at: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, PR China.

E-mail addresses: linyuan.lv@uestc.edu.cn (L. Lü), matus.medo@unifr.ch (M. Medo).

http://doc.rero.ch

Published in Journal of Informetrics, which should be cited to refer to this work.


In particular, scholars have emphasized the need to understand the theoretical foundations of impact metrics (Waltman, 2016) and to evaluate the merits of the metrics both analytically and empirically (Leydesdorff et al., 2018).

Despite the use of citation-based metrics for research evaluation purposes and the increasingly recognized need to better grasp their merits and pitfalls, we do not yet know which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. This gap is potentially dangerous: if a paper-level metric is assumed to be a good proxy for significance and is used for research evaluation purposes, yet it undervalues papers whose significance is undeniable, its normative use (Leydesdorff et al., 2018) might lead to decisions that penalize truly significant research.

We aim to fill this gap by providing a comprehensive empirical comparison of a broad range of ranking metrics. We assess the metrics' ability to single out scientific papers and patents that have been recognized by field experts as groundbreaking or seminal. The core idea behind this evaluation is that metrics that aim to gauge the significance of a paper or patent should be able to detect papers and patents whose outstanding long-term significance for the involved fields is undeniable. Our goal is to determine whether some metrics perform well across different citation datasets and, if that is not the case, which characteristics of the input data decide which metric is the most suitable.

To this end, we analyze three citation networks: the scholarly citation dataset of papers published by the American Physical Society (APS), the citation data from the High-Energy Physics Literature Database INSPIRE (HEP), and the U.S. Patent Office patent citation data (PAT). We use expert-selected sets of seminal nodes and assess ranking metrics by how well they rank the seminal nodes. In particular, we use milestone papers selected by APS journal editors for the APS data, the community-curated "Chronology of Milestone Events in Particle Physics" for the HEP data, and the list of significant patents by Strumsky and Lobo (2015) for the PAT data (see Section 3 for details). While by no means exhaustive, these lists of seminal publications consist of papers and patents of exceptional importance (many of the papers have, for example, led to a Nobel Prize for one or more of their authors). Our evaluation includes nine network-based ranking metrics from the scientometrics and network science literature together with their time-normalized (Mariani, Medo, & Zhang, 2016) variants. To provide a comprehensive comparison, we have chosen network-based metrics that have been used in bibliometrics (Waltman, 2016) or that have performed well in other networks (such as social and technological networks). Additionally, we provide results for the percentile-based citation count, which is commonly used in bibliometrics (Bornmann, Leydesdorff, & Mutz, 2013; Leydesdorff, Bornmann, Mutz, & Opthof, 2011).

Expert-selected nodes have been used before to evaluate rankings of, for example, authors (Dunaiski, Geldenhuys, & Visser, 2019a; Radicchi et al., 2009), movies (Ren, Mariani, Zhang, & Medo, 2018; Wasserman, Zeng, & Amaral, 2015), scientific papers (Dunaiski, Geldenhuys, & Visser, 2019b; Mariani et al., 2016), and court cases (Fowler & Jeon, 2008) (see Dunaiski, Geldenhuys, & Visser, 2018, for a recent in-depth discussion of this evaluation approach). Here we make an important methodological distinction between two similar, yet fundamentally different, ranking tasks:

Task 1: The usual task for a ranking metric is to rank the given seminal nodes as high as possible. This is motivated by the assumption that if seminal nodes are known to be of high impact, a good metric should place them as high as possible in the ranking of all nodes. To evaluate the metrics' performance in this task, one typically uses traditional information-retrieval metrics such as precision, recall, and accuracy (Dunaiski, Visser, & Geldenhuys, 2016).

Task 2: The need for unbiased evaluation, which is common in scientometrics (Mingers & Leydesdorff, 2015; Mutz & Daniel, 2012; Vaccario, Medo, Wider, & Mariani, 2017), motivates the second task: to rank the expert-selected nodes as high as possible while requiring that the ranking metric be unbiased. Since most structural ranking metrics are biased (see Appendix A for a demonstration), their evaluation must include a penalty for the performance gained thanks to the methods' bias. Citation data commonly feature various biases, such as the field bias (Vaccario et al., 2017). We focus here on the age bias: we say that age bias is present in the data when the mean citation count depends on publication age. Besides being particularly strong, the age bias is also explicit and easy to measure, as it is determined by the paper's publication date. It thus provides a good testbed for dealing with biases in the ranking problem.
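A minimal sketch of the age-bias notion defined above, assuming each node's publication year and citation count are given as two parallel lists. It groups nodes by publication year and reports the mean citation count per year; a systematic trend across years signals age bias. The function name and the toy numbers are hypothetical, and this is not the bias measure used in Appendix A.

```python
from collections import defaultdict


def mean_citations_by_year(pub_year, citations):
    """Mean citation count of the nodes published in each year.

    pub_year[i] is the publication year of node i and citations[i] its
    citation count (indegree). A systematic trend of the returned means
    across years indicates an age bias.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for year, c in zip(pub_year, citations):
        totals[year] += c
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


# Hypothetical toy data: older papers have had more time to collect citations.
years = [2000, 2000, 2005, 2005, 2010, 2010]
cites = [40, 20, 12, 8, 2, 0]
print(mean_citations_by_year(years, cites))  # {2000: 30.0, 2005: 10.0, 2010: 1.0}
```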

As part of our goal to evaluate the ranking performance of the chosen metrics, we aim to elucidate the difference between the two ranking tasks described above. In particular, we show that Task 1 can favor biased metrics to such an extent that a custom-constructed ranking method based only on the bias itself (in our case, ranking by age) can in some cases outperform all standard ranking metrics. The fact that a metric performs well in the common Task 1 is thus no direct indication of its ability to assess the value or impact of the network's nodes. We demonstrate that the normalized identification rate introduced in Mariani et al. (2016) appropriately addresses Task 2 by imposing a performance penalty proportional to the bias magnitude of the evaluated metric. Unlike in Task 1, the results in Task 2 reveal consistent patterns of metric performance across the three studied datasets. As further discussed in Section 5.3, the proposed evaluation can also be interpreted as a ranking problem in which a given set of seminal nodes is viewed as a potentially biased sample from a larger group of high-quality nodes. This further increases the relevance of Task 2 compared to the common and straightforward approach of Task 1.

The paper is organized as follows. In Section 2, we provide a literature review. In Section 3, we describe and analyze the datasets and the corresponding sets of seminal nodes. In Section 4, we present the considered network metrics and describe various performance measures of a metric's ranking ability based on seminal nodes. In Section 5, we first address Task 1, evaluate the metrics by how well they rank the seminal nodes, and finally discuss the drawbacks of this evaluation approach. In Section 6, we address Task 2, where the metrics' bias is taken into account, and explain why the results obtained in this way are more relevant than those obtained within Task 1. Finally, in Section 7, we review the limitations and open research directions and discuss the management implications of our study.

2. Literature review

2.1. Citation-based ranking metrics

Citation impact indicators are a key tool in scientometrics and play a prominent role in the evaluation of scientific and technological publications (Waltman, 2016). The growing demand for evaluation information from researchers, funding bodies, and research institutions, together with the increasing availability of extensive data on scholarly activity, has driven the proliferation of new indicators (Mattedi & Spiess, 2017).

Among citation-based impact indicators for research articles, citation count (referred to as indegree in network science; Newman, 2010; Zeng et al., 2017) is the most basic and established one, having been used to rank scholarly publications since the 1970s (Liao, Mariani, Medo, Zhang, & Zhou, 2017). The basic premise of citation count is simple: the most influential publications are the most cited. The metric's simplicity comes at a cost, as citation count neglects the natural differences between citations (see Bornmann & Daniel, 2008, for an extensive review of citing behavior). In particular, citation count assigns the same weight to a citation from a groundbreaking article published in a leading journal and to a citation from an obscure article. The seminal PageRank algorithm for the World Wide Web (Brin & Page, 1998) assigns higher weight to references from web pages that are highly valued by the algorithm. Chen, Xie, Maslov, and Redner (2007) applied PageRank to a citation network to measure the importance of individual scientific publications, initiating the interest in recursive citation impact indicators (Waltman, 2016). Since then, various PageRank variants (Waltman & Yan, 2014) have been proposed, of which CiteRank (Walker, Xie, Yan, & Maslov, 2007) is the best known. Also of note is the HITS (Hyperlink-Induced Topic Search) algorithm (Kleinberg, 1999), which assigns two scores, a hub score and an authority score, to each node. In the context of citation data, it is natural to regard key reviews as hubs, which cite influential publications with high authority scores, and high-impact articles as authorities, which tend to be cited by, among others, review articles with high hub scores (Nickerson, Chen, Wang, & Hu, 2018). Another notable branch of research impact indicators is the h-index (Hirsch, 2005), which was introduced to evaluate the scientific output of researchers and later extended by Schubert (2008) to assess the impact of individual publications. A large number of h-index variants have been proposed in the literature (Alonso, Cabrerizo, Herrera-Viedma, & Herrera, 2009).

Besides the most widely used citation impact indicators, including citation count, PageRank, CiteRank, h-index, and HITS authority score, we also consider several network-based metrics that perform well in other ranking scenarios: LeaderRank, Collective Influence, and semi-local centrality. These metrics are introduced in Section 4.

2.2. Ranking bias in citation analysis

In the analysis of citation data, citation counts of different publications cannot be directly compared, as various sources of bias can invalidate such a comparison. The most studied biases are those induced by a publication's research field, age, and document type (Waltman, 2016). Field bias manifests itself in the mean citation count of publications of similar age varying greatly between fields because of their different citation practices (Lundberg, 2007). The average citation count differs between, for example, the natural sciences and the humanities by a factor of 10 (Bornmann & Marx, 2015). The age bias of citation count has two distinct components. First, the average citation count of papers of a fixed age gradually increases over time (Martin, Ball, Karrer, & Newman, 2013). Second, citation counts naturally grow with publication age, as publications accumulate more and more citations over time; this prevents citation counts of papers of different ages from being directly comparable. Finally, citation counts of publications of different document types (such as article, letter, review, and commentary) should not be directly compared with each other because, for example, review papers tend to attract more citations than ordinary research articles and letters (Lundberg, 2007).

To overcome these biases, a number of normalized citation-based ranking metrics, such as mean-based and percentile-based metrics, have been developed to allow for fairer comparisons (see Bornmann & Marx, 2015; Waltman, 2016, for reviews). In the case of paper and patent citation networks, which are the focus of this manuscript, the most relevant bias is the age bias, which is strong in both paper and patent citation data.

Although many citation-based ranking metrics have been proposed in the past, a comprehensive comparison of their ranking performance on various datasets is still lacking. Determining which metrics are best suited to rank papers or patents in citation networks is the first research gap that this study aims to fill. The common approach to assessing a metric's ranking performance is based on expert-selected nodes. Previous studies adopting this approach focus solely on the ranking positions of the expert-selected nodes and ignore the confounding effects that can be introduced by the choice of the seminal nodes. For example, if experts tend to identify old works as seminal, a ranking metric that shares this bias gains an advantage, and its apparent superiority may be illusory. We fill this second research gap by exploring the interplay between the bias of the evaluated ranking metrics and the bias of the seminal nodes, and by exploring an evaluation procedure that takes the bias of the seminal nodes into account.


Table 1

Basic characteristics of the networks corresponding to the three analyzed datasets: their time span, the number of nodes N, the number of edges E, and the number of seminal nodes S.

Dataset      Label   Time span   N           E            S
APS papers   APS     1893–2016   595,287     7,051,801    160
HEP papers   HEP     1764–2017   829,708     14,994,123   310
US patents   PAT     1926–2010   6,237,625   45,962,301   112

3. Data

To compare the ranking performance of network-based metrics, we use three citation datasets: the classical American Physical Society citation data, high-energy physics citation data, and the U.S. Patent Office citation data. Each dataset can be represented as a growing directed network in which nodes gradually appear over time; the time resolution is one day for all datasets. Nodes represent papers or patents and directed links represent citations. For each dataset, there is a corresponding set of expert-selected nodes of high impact that we refer to as seminal nodes. Table 1 summarizes the basic characteristics of the analyzed networks and the corresponding sets of seminal nodes.

3.1. American Physical Society citation data (APS)

The American Physical Society (APS) dataset in our possession covers the years 1893–2016 (the dataset is available on demand from https://journals.aps.org/datasets). After removing non-research papers (announcements, book reviews, etc.), the dataset contains 595,287 nodes (papers) published by the APS journals and 7,051,801 directed links (citations) between them. For this dataset, we use multiple selections of milestone papers chosen by editors of APS journals: 87 Physical Review Letters milestones,¹ 23 Physical Review E milestones,² and 78 select papers announced on the 125th anniversary of the Physical Review journals.³ In total, there are 161 unique seminal papers, of which 160 are present in the citation data (the one missing paper is from 2017, hence outside the coverage period of our dataset).

3.2. High-Energy Physics citation network (HEP)

INSPIRE is a project run by leading high-energy physics institutions around the world (CERN, DESY, Fermilab, IHEP, and SLAC). Among other things, it curates a database of high-energy physics papers (and papers relevant to the high-energy physics community), which is also made available on demand for research purposes.⁴ After processing the downloaded XML data dump covering the years 1764–2017, we obtained a citation network containing 829,708 nodes and 14,994,123 directed links. The list of milestone papers was downloaded from the website "Chronology of Milestone Events in Particle Physics",⁵ which lists milestone events and the corresponding papers. The website is a joint effort of the Institute for High Energy Physics (Russia) and the Particle Data Group (USA), with several leading high-energy physicists contributing to the final version of the chronology. The milestone papers obtained in this way were matched with the HEP data, leading to the final set of 310 seminal papers.

3.3. US patent citation network (PAT)

The US patent dataset was collected by Kogan, Papanikolaou, Seru, and Stoffman (2017) and covers the years 1926–2010. In total, there are 6,237,625 nodes (patents) and 45,962,301 links (citations) among them. Strumsky and Lobo (2015) listed 175 patents that "affected society, individuals and the economy in a historically significant manner". Following Mariani, Medo, and Lafond (2018), we remove the patents issued outside the citation dataset's time span as well as the design patents, which are absent from the citation data. As a result, 112 seminal patents are used for further analysis.

Table 2 compares the characteristics of all nodes and the seminal nodes in each dataset. As expected, in all three datasets the median indegree (commonly referred to as the number of citations) of the seminal nodes is significantly higher than the median indegree of all nodes. This confirms that the expert assessment of the seminal nodes does not contradict their impact as reflected by the citation network.

¹ Retrieved from https://journals.aps.org/prl/50years/milestones on June 6, 2017.
² Retrieved from https://journals.aps.org/pre/collections/pre-milestones on June 6, 2017.
³ Retrieved from https://journals.aps.org/125years on January 12, 2018.
⁴ We downloaded the INSPIRE data on October 30, 2017 from https://inspirehep.net/dumps/inspire-dump.html.
⁵ Retrieved from http://web.ihep.su/dbserv/compas/ on April 6, 2018.

Table 2

A comparison between the seminal nodes and all nodes. Here τ3 and τ5 are the mean times needed for the nodes to get their first 3 and 5 citations, respectively (ignoring the nodes that have fewer than 3 and 5 citations, respectively).

Dataset      Set of nodes   Median indegree   τ3           τ5
APS papers   All            5                 3.6 years    4.8 years
             Seminal        239               1.3 years    2.2 years
HEP papers   All            4                 3.1 years    3.9 years
             Seminal        88                7.6 years    12.4 years
US patents   All            3                 11.5 years   13.3 years
             Seminal        30                9.6 years    12.0 years

Table 3

Summary of all metrics; our computation of each metric follows the listed implementation reference. We also included age-normalized variants of the displayed metrics in the analysis (except for YCCP, which already involves age normalization). Except for the rescaled citation count (Newman, 2009) and the rescaled PageRank (Mariani et al., 2016), the rescaled variants of the remaining metrics have not been considered before.

Metric                             Abbreviation   Implementation reference
Citation count                     C              Newman (2010)
PageRank                           P              Chen et al. (2007)
CiteRank                           T              Walker et al. (2007)
LeaderRank                         L              Lü, Zhang, Yeung, and Zhou (2011)
H-index                            H              Hirsch (2005)
Directed collective influence      CI             Bovet and Makse (2019)
Semi-local centrality              SLC            Chen, Lü, Shang, Zhang, and Zhou (2012)
HITS authority                     HITS           Kleinberg (1999)
Yearly citation count percentile   YCCP           Leydesdorff et al. (2011)

Table 2 further lists the times τ3 and τ5 that various groups of nodes need to collect their first three and five citations, respectively. These times indicate the time scales of citation dynamics. We see, for example, that nodes collect their citations significantly more slowly in the PAT data than in the APS and HEP data. In the APS data, both τ3 and τ5 are smaller for the seminal nodes than for all nodes, which is understandable given the much higher indegree of the seminal nodes. Curiously, the relation is reversed for the HEP data, where the seminal nodes need more time to get their first 3 or 5 citations than all papers. While paradoxical at first sight, this is a direct consequence of the HEP seminal nodes being over-represented among the old nodes (see Fig. 4). At the time when these seminal nodes were collecting their first citations, citation dynamics were substantially slower than nowadays, which manifests itself in their high τ3 and τ5. We shall see in Section 5.3 that the strong age bias of the HEP seminal nodes has further important implications.
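The mean times to the first 3 and 5 citations reported in Table 2 (τ3 and τ5) can be computed along the following lines. This is a minimal sketch assuming that, for every node, its publication time and the times of the citations it receives are available in years; the input layout and the function name are hypothetical.

```python
def mean_time_to_k_citations(pub_time, citation_times, k):
    """Mean time that nodes need to collect their first k citations.

    pub_time[i] is the publication time of node i (in years) and
    citation_times[i] lists the times at which node i was cited.
    Nodes with fewer than k citations are ignored, as in Table 2.
    Returns None when no node has at least k citations.
    """
    delays = []
    for t0, times in zip(pub_time, citation_times):
        if len(times) >= k:
            kth_citation = sorted(times)[k - 1]  # time of the k-th citation
            delays.append(kth_citation - t0)
    return sum(delays) / len(delays) if delays else None


# Hypothetical toy data for three nodes; the second node has only 2 citations.
pub = [2000.0, 2001.0, 2002.0]
cit = [[2000.5, 2001.0, 2003.0, 2004.0], [2001.5, 2002.0], [2002.2, 2002.4, 2003.0]]
print(mean_time_to_k_citations(pub, cit, 3))  # 2.0 (mean of 3.0 and 1.0 years)
print(mean_time_to_k_citations(pub, cit, 5))  # None (no node has 5 citations)
```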

4. Node ranking metrics

We use nine distinct network centrality metrics, described below (see Table 3 for a summary), together with their variants in which the metrics' age bias has been removed by the rescaling procedure introduced in Mariani et al. (2016) (see Vaccario et al., 2017, for the simultaneous removal of age and field bias by the rescaling procedure). In a dataset with time-stamped nodes, rescaling can be applied to any node ranking metric; see Section 4.10 for details. The input citation data are represented by the $N \times N$ adjacency matrix $A$, whose element $A_{ij} = 1$ if node $i$ cites node $j$ and $A_{ij} = 0$ otherwise.

4.1. Citation count (C)

Citation count refers to the number of citations received by a given paper or patent. It is equivalent to node indegree in a directed network (Newman, 2010). For node $i$, citation count is defined as $C_i = \sum_{j=1}^{N} A_{ji}$. Based on the assumption that a node is important if it is cited by many other nodes, citation count is the simplest and most widely used indicator of paper or patent impact in citation data. The metric's simplicity translates directly into low computational complexity.
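Because $C_i$ is a column sum of $A$, it can be computed from a list of citation pairs without building the dense $N \times N$ matrix. A minimal sketch, assuming nodes are labeled $0, \dots, N-1$ and each link is given as a `(citing, cited)` pair (an input format chosen for illustration; the function name is hypothetical):

```python
def citation_count(n_nodes, edges):
    """Citation count C_i = sum_j A_ji, i.e., the indegree of node i.

    edges is an iterable of (i, j) pairs meaning "node i cites node j"
    (A_ij = 1); duplicate pairs are assumed absent since A is a 0/1 matrix.
    """
    counts = [0] * n_nodes
    for _citing, cited in edges:
        counts[cited] += 1
    return counts


# Toy network: node 0 is cited by nodes 1, 2, and 3.
edges = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 2)]
print(citation_count(4, edges))  # [3, 1, 1, 0]
```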

4.2. PageRank (P)

PageRank (Brin & Page, 1998), a node centrality metric originally devised to rank pages in the World Wide Web and later applied to citation data to assess the significance of publications (Chen et al., 2007), defines the importance of nodes in a self-consistent manner: a node is important if it is cited by other important nodes. The PageRank score $P_i$ of node $i$ is defined by the set of equations (Berkhin, 2005)

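$$P_i = \alpha \sum_{j:\,k_j^{\mathrm{out}}>0} \frac{A_{ji}}{k_j^{\mathrm{out}}}\, P_j + \alpha \sum_{j:\,k_j^{\mathrm{out}}=0} \frac{P_j}{N} + \frac{1-\alpha}{N} \tag{1}$$

where $i = 1, \dots, N$, $k_j^{\mathrm{out}} = \sum_l A_{jl}$ is the outdegree of node $j$, $\alpha$ is the damping factor, and $(1-\alpha)/N$ is usually referred to as the teleportation term, whose role is to ensure that Eq. (1) has a unique solution. While $\alpha = 0.85$ is used in the ranking of web pages (Berkhin, 2005), $\alpha = 0.5$ is typically used in the analysis of citation data (Chen et al., 2007). Eq. (1) is usually solved by iteration: starting from the uniform initial score $P_i^{(0)} = 1/N$, every node's score is updated iteratively as (Berkhin, 2005)

$$P_i^{(n+1)} = \alpha \sum_{j:\,k_j^{\mathrm{out}}>0} \frac{A_{ji}}{k_j^{\mathrm{out}}}\, P_j^{(n)} + \alpha \sum_{j:\,k_j^{\mathrm{out}}=0} \frac{P_j^{(n)}}{N} + \frac{1-\alpha}{N} \tag{2}$$

where $n$ is the iteration number. We stop the iterations when the average score change is small enough, that is, when $\sum_{i=1}^{N} |P_i^{(n)} - P_i^{(n-1)}|/N < \varepsilon$, where $\varepsilon = 10^{-9}$. The same stopping condition is also used in the other metrics that involve iterations (CiteRank and LeaderRank).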
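The iteration of Eq. (2), together with the stopping rule above, can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation. It assumes links are given as `(citing, cited)` pairs over nodes labeled $0, \dots, N-1$, and the function name `pagerank` is ours. The default $\alpha = 0.5$ follows the value typically used for citation data; for networks with millions of nodes, a sparse-matrix version would be preferable.

```python
def pagerank(n_nodes, edges, alpha=0.5, eps=1e-9, max_iter=10_000):
    """PageRank scores by the iteration of Eq. (2).

    edges: (i, j) pairs meaning "node i cites node j" (A_ij = 1).
    Dangling nodes (k_out = 0) spread their score uniformly over all nodes,
    as in the second term of Eq. (2).
    """
    out_links = [[] for _ in range(n_nodes)]
    for i, j in edges:
        out_links[i].append(j)
    scores = [1.0 / n_nodes] * n_nodes  # uniform start, P_i^(0) = 1/N
    for _ in range(max_iter):
        dangling = sum(scores[j] for j in range(n_nodes) if not out_links[j])
        base = alpha * dangling / n_nodes + (1.0 - alpha) / n_nodes
        new = [base] * n_nodes
        for j, targets in enumerate(out_links):
            if targets:
                share = alpha * scores[j] / len(targets)  # alpha * P_j / k_j^out
                for i in targets:
                    new[i] += share
        change = sum(abs(a - b) for a, b in zip(new, scores)) / n_nodes
        scores = new
        if change < eps:  # average score change below epsilon
            break
    return scores


# Toy network: node 0 is cited by nodes 1 and 2, node 1 by node 2.
print(pagerank(3, [(1, 0), (2, 0), (2, 1)]))
```

On this toy network the scores sum to one and node 0, cited by both other nodes, ranks first.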

4.3. CiteRank (T)

CiteRank (Walker et al., 2007) was introduced to offset PageRank's strong bias towards old nodes [note that in some cases, PageRank can also be biased towards recent nodes (Mariani, Medo, & Zhang, 2015)]. Using the representation of PageRank as a random walk on the citation network, CiteRank modifies the algorithm by initially distributing random walkers preferentially on recent nodes, with old nodes being exponentially suppressed on a time scale $\tau$. Similarly to Eq. (2), the CiteRank score $T_i$ of node $i$ can be defined iteratively as

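$$T_i^{(n+1)} = \alpha \sum_{j:\,k_j^{\mathrm{out}}>0} \frac{A_{ji}}{k_j^{\mathrm{out}}}\, T_j^{(n)} + \alpha \sum_{j:\,k_j^{\mathrm{out}}=0} \frac{T_j^{(n)}}{N} + (1-\alpha)\, \frac{\exp[-(t-t_i)/\tau]}{\sum_{j=1}^{N} \exp[-(t-t_j)/\tau]} \tag{3}$$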

where $t_i$ is the publication date of node $i$ and $t$ is the date when the scores are computed; the other terms and parameters have the same meaning as for PageRank. To estimate suitable values of the parameters $\alpha$ and $\tau$, we followed the procedure described in Walker et al. (2007), where the authors maximize the correlation between the CiteRank scores and the nodes' recent indegree increase. The resulting parameter values are $\alpha = 0.50$, $\tau = 2.6$ years for APS; $\alpha = 0.50$, $\tau = 2.4$ years for HEP; and $\alpha = 0.44$, $\tau = 7.6$ years for PAT. Notably, the parameter values for APS are the same as reported in Walker et al. (2007), despite our APS dataset including 13 additional years and, consequently, 60% more papers.
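A sketch of the CiteRank iteration of Eq. (3) in plain Python, assuming links are given as `(citing, cited)` pairs over nodes $0, \dots, N-1$ and publication times in years. The text does not specify the initial condition, so the uniform starting vector here is an assumption, as is the function name; the defaults $\alpha = 0.50$ and $\tau = 2.6$ years are the APS values quoted above.

```python
import math


def citerank(n_nodes, edges, pub_time, t_now, alpha=0.5, tau=2.6,
             eps=1e-9, max_iter=10_000):
    """CiteRank scores by the iteration of Eq. (3).

    edges: (i, j) pairs meaning "node i cites node j".
    pub_time[i]: publication time t_i of node i in years; t_now: the time t
    at which scores are computed. The teleportation term is proportional to
    exp[-(t - t_i) / tau], so random walkers start preferentially on recent nodes.
    """
    out_links = [[] for _ in range(n_nodes)]
    for i, j in edges:
        out_links[i].append(j)
    weights = [math.exp(-(t_now - t_i) / tau) for t_i in pub_time]
    total_weight = sum(weights)
    teleport = [(1.0 - alpha) * w / total_weight for w in weights]
    scores = [1.0 / n_nodes] * n_nodes  # assumed uniform starting point
    for _ in range(max_iter):
        dangling = sum(scores[j] for j in range(n_nodes) if not out_links[j])
        new = [teleport[i] + alpha * dangling / n_nodes for i in range(n_nodes)]
        for j, targets in enumerate(out_links):
            if targets:
                share = alpha * scores[j] / len(targets)
                for i in targets:
                    new[i] += share
        change = sum(abs(a - b) for a, b in zip(new, scores)) / n_nodes
        scores = new
        if change < eps:
            break
    return scores


# Toy example: two uncited papers, one from 2000 and one from 2010.
print(citerank(2, [], pub_time=[2000.0, 2010.0], t_now=2010.0))
```

With no citations at all, the recent paper still scores higher, purely because of the age-dependent teleportation term.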

4.4. LeaderRank (L)

The need for a teleportation term in PageRank can be eliminated by connecting each node to an artificial "ground" node with bidirectional links. The resulting parameter-free LeaderRank metric was proposed by Lü et al. (2011) to quantify node influence. The iterative equation for the LeaderRank score $L_i$ is

$$L_i^{(n+1)} = \sum_{j=1}^{N+1} \frac{A_{ji}}{k_j^{\mathrm{out}}}\, L_j^{(n)} \tag{4}$$

where both $A_{ji}$ and $k_j^{\mathrm{out}}$ include the ground node and the links between the ground node and all other nodes in the network. After obtaining the equilibrium scores $L_i^{(n_c)}$, the ground node is removed from the system and its score is evenly distributed among all real nodes. The final score of node $i$ is thus defined as

$$L_i = L_i^{(n_c)} + \frac{L_g^{(n_c)}}{N} \tag{5}$$

The redistribution of the ground node's score does not affect the ranking of nodes by LeaderRank, though, since it adds the same constant to every node's score.
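A minimal sketch of Eqs. (4) and (5) in plain Python. It assumes an initial condition of one unit of score on every real node and zero on the ground node, which is how we recall the convention of Lü et al. (2011) rather than something stated in this text, and it reuses the stopping rule of Section 4.2. Links are `(citing, cited)` pairs, and the function name is hypothetical. At least one citation must exist: otherwise the walk simply alternates between the ground node and the real nodes and never converges.

```python
def leaderrank(n_nodes, edges, eps=1e-9, max_iter=100_000):
    """LeaderRank scores following Eqs. (4) and (5).

    edges: (i, j) pairs meaning "node i cites node j". A ground node with
    index n_nodes is linked in both directions to every real node.
    """
    g = n_nodes  # index of the artificial ground node
    out_links = [[] for _ in range(n_nodes + 1)]
    for i, j in edges:
        out_links[i].append(j)
    for i in range(n_nodes):  # bidirectional links to and from the ground node
        out_links[i].append(g)
        out_links[g].append(i)
    # Assumed initial condition: score 1 on every real node, 0 on the ground node.
    scores = [1.0] * n_nodes + [0.0]
    for _ in range(max_iter):
        new = [0.0] * (n_nodes + 1)
        for j, targets in enumerate(out_links):
            share = scores[j] / len(targets)  # k_out >= 1 thanks to the ground node
            for i in targets:
                new[i] += share
        change = sum(abs(a - b) for a, b in zip(new, scores)) / (n_nodes + 1)
        scores = new
        if change < eps:
            break
    # Eq. (5): drop the ground node and share its score evenly among real nodes.
    return [scores[i] + scores[g] / n_nodes for i in range(n_nodes)]


print(leaderrank(3, [(1, 0), (2, 0), (2, 1)]))
```

On the toy network, node 0 ranks first, and the final scores sum to $N = 3$ because the iteration conserves the initial total score.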

4.5. H-index (H)

The h-index (Hirsch, 2005) was originally devised to characterize the academic impact of researchers based on their publications and citations (Hirsch, 2005, 2007). Just as it was later applied to evaluate research journals (Braun, Glänzel, & Schubert, 2006), it can also be adapted to evaluate research papers: the h-index of paper $i$ is defined as the largest number $h_i$ such that paper $i$ is cited by at least $h_i$ papers that each have at least $h_i$ citations (Lü, Zhou, Zhang, & Stanley, 2016; Schubert, 2008).
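The paper-level h-index can be computed directly from this definition. For each paper, sort the citation counts of its citing papers in decreasing order and find the largest rank $h$ at which the count is still at least $h$. A minimal sketch with `(citing, cited)` input pairs (the input format and the function name are hypothetical):

```python
def paper_h_index(n_nodes, edges):
    """Largest h_i such that paper i is cited by at least h_i papers
    that each have at least h_i citations.

    edges: (i, j) pairs meaning "paper i cites paper j".
    """
    citations = [0] * n_nodes
    citers = [[] for _ in range(n_nodes)]
    for citing, cited in edges:
        citations[cited] += 1
        citers[cited].append(citing)
    h = [0] * n_nodes
    for i in range(n_nodes):
        # Citation counts of the papers citing i, in decreasing order.
        counts = sorted((citations[j] for j in citers[i]), reverse=True)
        for rank, c in enumerate(counts, start=1):
            if c >= rank:
                h[i] = rank
            else:
                break
    return h


# Paper 0 is cited by papers 1 and 2, each of which has 2 citations.
print(paper_h_index(5, [(1, 0), (2, 0), (3, 1), (4, 1), (3, 2), (4, 2)]))  # [2, 0, 0, 0, 0]
```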


4.6. Collective Influence (CI)

Collective Influence was introduced in Morone and Makse (2015) to identify the nodes that, when removed, cause the greatest damage to a graph's giant component; the algorithm is based on the classical problem of percolation in complex networks. The CI centrality of node $i$ at level $l$ is defined as

$$\mathrm{CI}_l(i) = (k_i - 1) \sum_{j:\,d_{ij}=l} (k_j - 1) \tag{6}$$

where $k_i$ is the degree of node $i$, $d_{ij}$ is the length of the shortest path between nodes $i$ and $j$, and $l$ is the metric's parameter. In line with Bovet and Makse (2019), we consider only node indegree in the computation of CI, as node indegree is indicative of node impact; using node outdegree or combining in- and out-degree leads to inferior results in our evaluation. Distance $d_{ij}$ is computed so that it respects link directions. Our tests show that $l = 1$ and $l = 2$ produce the best results and that the metric's performance deteriorates as $l$ increases further. We use $l = 2$ for all CI results presented here.
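A minimal sketch of Eq. (6) with indegrees and $l = 2$. The text states only that $d_{ij}$ respects link directions. Here we assume that $j$ is at distance $l$ from $i$ when the shortest chain of citations leading from $j$ to $i$ has length $l$, found by breadth-first search over incoming links; this is our reading, not a confirmed detail of the implementation. Nodes with zero indegree contribute $k - 1 = -1$, exactly as the formula prescribes. The function name and input format are hypothetical.

```python
from collections import deque


def collective_influence(n_nodes, edges, level=2):
    """Collective Influence of Eq. (6) using indegrees.

    edges: (i, j) pairs meaning "node i cites node j". Assumption: node j is
    at distance l from node i if the shortest chain of citations leading
    from j to i has length l (breadth-first search over incoming links).
    """
    cited_by = [[] for _ in range(n_nodes)]
    for citing, cited in edges:
        cited_by[cited].append(citing)
    k = [len(c) for c in cited_by]  # indegree of each node
    ci = [0] * n_nodes
    for i in range(n_nodes):
        dist = {i: 0}
        queue = deque([i])
        frontier = []  # nodes j with d_ij == level
        while queue:
            u = queue.popleft()
            if dist[u] == level:
                frontier.append(u)
                continue  # do not expand beyond the requested level
            for v in cited_by[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        ci[i] = (k[i] - 1) * sum(k[j] - 1 for j in frontier)
    return ci


edges = [(1, 0), (2, 0), (3, 1), (4, 2), (5, 3), (6, 3), (7, 4)]
print(collective_influence(8, edges, level=2))  # [1, 0, 0, 0, 0, 0, 0, 0]
```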

4.7. Semi-local centrality (SLC)

Semi-local centrality was proposed by Chen et al. (2012) as an extension of the purely local node degree (the simplest node centrality metric). It is semi-local in the sense that it considers the node neighborhood up to the fourth order. The semi-local centrality score of node $i$ is defined as

$$\mathrm{SLC}_i = \sum_{j \in \Gamma_i} Q_j, \qquad Q_j = \sum_{k \in \Gamma_j} N_k \tag{7}$$

where $\Gamma_i$ is the set of the nearest neighbors of node $i$ and $N_k$ is the number of the nearest and next-nearest neighbors of node $k$. In this paper, we consider only in-neighbors (the nearest in-neighbors of node $i$ are the nodes that cite node $i$) to leverage the impact of citations. If SLC is computed using out-neighbors or using both in- and out-neighbors, its performance (as measured by the metrics introduced in Sections 5.1 and 6.1) deteriorates.
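A minimal sketch of Eq. (7) restricted to in-neighbors, as described above. We assume that $N_k$ counts the distinct nodes within two citation steps of $k$, excluding $k$ itself; the function name and the `(citing, cited)` input format are hypothetical.

```python
def semi_local_centrality(n_nodes, edges):
    """Semi-local centrality of Eq. (7) using in-neighbors only.

    edges: (i, j) pairs meaning "node i cites node j"; the in-neighbors
    Gamma_i of node i are the nodes that cite i.
    """
    cited_by = [set() for _ in range(n_nodes)]
    for citing, cited in edges:
        cited_by[cited].add(citing)
    # N_k: distinct nodes within two citation steps of k (k itself excluded).
    n_two = []
    for k in range(n_nodes):
        reach = set(cited_by[k])
        for u in cited_by[k]:
            reach |= cited_by[u]
        reach.discard(k)
        n_two.append(len(reach))
    q = [sum(n_two[u] for u in cited_by[j]) for j in range(n_nodes)]  # Q_j
    return [sum(q[j] for j in cited_by[i]) for i in range(n_nodes)]  # SLC_i


edges = [(1, 0), (2, 0), (3, 1), (3, 2), (4, 2), (5, 3), (6, 5)]
print(semi_local_centrality(7, edges))  # [4, 1, 1, 0, 0, 0, 0]
```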

4.8. Hyperlink-Induced Topic Search (HITS)

HITS (Kleinberg, 1999) is a seminal ranking algorithm that considers two roles for each node in the network: authority and hub. A good authority is pointed to by many hubs, and a good hub points to many authorities. The authority score of a node is equal to the sum of the hub scores of all nodes that point to this node, and the hub score of a node is equal to the sum of the authority scores of all nodes that this node points to. Mathematically, the authority score $a_i$ and the hub score $h_i$ of node $i$ fulfill

$$a_i^{(n+1)} = \sum_{j=1}^{N} A_{ji}\, h_j^{(n)}, \qquad h_i^{(n+1)} = \sum_{j=1}^{N} A_{ij}\, a_j^{(n)} \tag{8}$$

Both scores are normalized after each iteration so that the sum over all nodes is one for each score. The iterations stop when the average score change is small enough, that is, when $\sum_{i=1}^{N} \left(|a_i^{(n)} - a_i^{(n-1)}| + |h_i^{(n)} - h_i^{(n-1)}|\right)/N < \varepsilon$, where $\varepsilon = 10^{-9}$. Of the two scores, the authority score is related to node impact in a citation network, as it is derived from incoming links, whereas the hub score is derived from outgoing links (references) to nodes of high authority. We thus use the equilibrium authority values as the nodes' HITS scores.
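A sketch of the HITS iteration of Eq. (8) with sum-to-one normalization and the stopping rule above. The uniform starting vectors and the function name are assumptions of this sketch. It requires at least one citation, since otherwise the normalization divides by zero.

```python
def hits(n_nodes, edges, eps=1e-9, max_iter=10_000):
    """HITS authority and hub scores by the iteration of Eq. (8).

    edges: (i, j) pairs meaning "node i cites node j" (A_ij = 1). Both score
    vectors are normalized to sum to one after each iteration.
    """
    out_links = [[] for _ in range(n_nodes)]
    in_links = [[] for _ in range(n_nodes)]
    for i, j in edges:
        out_links[i].append(j)
        in_links[j].append(i)
    auth = [1.0 / n_nodes] * n_nodes  # assumed uniform starting vectors
    hub = [1.0 / n_nodes] * n_nodes
    for _ in range(max_iter):
        new_auth = [sum(hub[j] for j in in_links[i]) for i in range(n_nodes)]
        new_hub = [sum(auth[j] for j in out_links[i]) for i in range(n_nodes)]
        s_auth, s_hub = sum(new_auth), sum(new_hub)
        new_auth = [x / s_auth for x in new_auth]
        new_hub = [x / s_hub for x in new_hub]
        change = (sum(abs(x - y) for x, y in zip(new_auth, auth))
                  + sum(abs(x - y) for x, y in zip(new_hub, hub))) / n_nodes
        auth, hub = new_auth, new_hub
        if change < eps:
            break
    return auth, hub  # the authority scores are used as the HITS ranking


authority, hub_scores = hits(3, [(1, 0), (2, 0), (2, 1)])
print(authority)
```

On this toy network node 0, cited by both other nodes, obtains the highest authority, and node 2, which is never cited, obtains zero authority.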

4.9. Yearly citation count percentile (YCCP)

Ranking papers by percentiles has the advantage of avoiding direct work with citation counts, which are typically broadly distributed and therefore difficult to aggregate (Leydesdorff et al., 2011), for example by averaging (as when computing the average citation count of the papers authored by an individual researcher). To reduce the age bias in the resulting ranking, we compute the citation count percentile of a node with respect to the citation counts of all nodes that appeared in the same year as the target node. The nodes are then ranked by their respective percentile ranks. Note that by comparing with nodes that appeared in the same year, this ranking metric already addresses the age bias; we thus do not consider a rescaled version of this metric. If the citation count percentile were computed with respect to all nodes regardless of their appearance time, the resulting ranking would be the same as the ranking by citation count, C.
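The text does not fix a specific percentile convention, so the sketch below assumes one: a node's percentile is the share of same-year nodes with a strictly smaller citation count. Other conventions (for example, counting ties as half) would change the scores slightly but not the order within a year. The function name is hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def yearly_citation_percentile(pub_year, citations):
    """Percentile of each node's citation count among same-year nodes.

    Assumed convention: share of nodes from the same year with a strictly
    smaller citation count, giving values in [0, 1).
    """
    by_year = defaultdict(list)
    for year, c in zip(pub_year, citations):
        by_year[year].append(c)
    for counts in by_year.values():
        counts.sort()
    return [bisect_left(by_year[y], c) / len(by_year[y])
            for y, c in zip(pub_year, citations)]


years = [2000, 2000, 2000, 2000, 2010, 2010]
cites = [0, 5, 5, 20, 1, 3]
print(yearly_citation_percentile(years, cites))  # [0.0, 0.25, 0.25, 0.75, 0.0, 0.5]
```

In the toy data, the 2010 paper with 3 citations (percentile 0.5) outranks the 2000 paper with 5 citations (percentile 0.25), showing how comparing within a publication year offsets the head start of older papers.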


4.10. Rescaled metric variants

To suppress the age bias of ranking metrics, we use the rescaling procedure proposed by Mariani et al. (2016); see Dunaiski et al. (2019b) for other approaches to score normalization. The rescaled score $R_i(m)$ for metric $m$ and node $i$ is computed as

$$R_i(m) = \frac{m_i - \mu_i(m)}{\sigma_i(m)} \tag{9}$$

where $m_i$ is the original score of node $i$ as produced by metric $m$, and $\mu_i(m)$ and $\sigma_i(m)$ are the mean and standard deviation of the metric, respectively, computed over the nodes in a window centered at node $i$. Assuming that the nodes are sorted by their age (appearance time), the window around node $i$ includes nodes $j \in [i - W/2, i + W/2]$, where the parameter $W$ represents the window size. For the APS, HEP, and PAT data, we use $W = 1000$, $W = 2000$, and $W = 15{,}000$, respectively, which is roughly proportional to the number of nodes in each dataset.
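A direct, unoptimized sketch of Eq. (9). Three details are our own assumptions: the standard deviation is the population one, the window is truncated at the two ends of the age ordering, and nodes whose window has zero spread receive a rescaled score of 0. The cost is $O(NW)$, so a version for millions of nodes would use running sums instead. The function name is hypothetical.

```python
import statistics


def rescale_scores(scores, pub_time, window):
    """Rescaled scores R_i(m) = (m_i - mu_i(m)) / sigma_i(m), Eq. (9).

    Nodes are sorted by publication time; mu_i and sigma_i are the mean and
    (population) standard deviation of the scores of the nodes at positions
    [pos - window//2, pos + window//2] around node i in that order.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: pub_time[i])  # oldest first
    half = window // 2
    rescaled = [0.0] * n
    for pos, i in enumerate(order):
        # Window truncated at both ends of the ordering (an assumption).
        lo, hi = max(0, pos - half), min(n, pos + half + 1)
        values = [scores[order[p]] for p in range(lo, hi)]
        mu = statistics.fmean(values)
        sigma = statistics.pstdev(values)
        # Zero spread in the window: assign 0 (an assumption of this sketch).
        rescaled[i] = (scores[i] - mu) / sigma if sigma > 0 else 0.0
    return rescaled


# Hypothetical toy scores for six nodes published in consecutive years.
r = rescale_scores([30, 10, 20, 3, 1, 2], pub_time=[0, 1, 2, 3, 4, 5], window=2)
print(r[0], r[5])  # 1.0 1.0
```

The oldest node (score 30) and the youngest node (score 2) both obtain $R = 1$, because each lies one standard deviation above the mean of its own age window.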

As shown in Mariani et al. (2016, 2018) and Ren (2019), rescaling significantly reduces the magnitude of the age bias (and, in the case of Vaccario et al. (2017), of both the age and field bias) of citation count and PageRank. We use this technique here to rescale all ranking metrics introduced above and compare their performance with that of the original, non-rescaled metrics. Rescaled metrics are labeled by adding R at the beginning of their original labels (e.g., RP for rescaled PageRank). The effectiveness of the rescaling procedure in removing the age bias of the respective metrics in the studied datasets is investigated in Appendix A, where we find that rescaling indeed significantly reduces the age bias for almost all dataset-metric pairs (see Table A.5 for a summary).

5. Metric performance in ranking the seminal nodes (Task 1)

We first evaluate the ranking performance of metrics taking into account solely the ranking positions of the seminal nodes, in line with common information-retrieval practices (Dunaiski et al., 2018; Lü & Zhou, 2011; Manning, Raghavan, & Schütze, 2010; Radicchi et al., 2009).

5.1. Identification rate

Our basic evaluation procedure takes the complete network as input. We rank the network nodes by their scores according to a given metric $m$ and compute the fraction of the seminal nodes that are among the top $zN$ nodes, $f_z(m)$. This quantity is commonly referred to as recall in the information filtering literature (Lü et al., 2012). To comply with previous research on rescaling (Mariani et al., 2016), and also to avoid confusion with a related age-dependent version of this metric (see the next paragraph), we use here the previously coined term identification rate (IR) for $f_z(m)$. Note that $z \in (0, 1)$ is an evaluation parameter that, to reflect our goal of evaluating the ranking metrics by whether they rank the seminal nodes "highly", should be small. We use $z = 1\%$ unless stated otherwise and later verify our main results using $z = 0.5\%$ and $z = 2\%$.
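A minimal sketch of $f_z(m)$. Rounding $zN$ to the nearest integer (with at least one node) and keeping tied scores in node-index order are assumptions of this sketch; the text specifies neither. The function name is hypothetical.

```python
def identification_rate(scores, seminal, z=0.01):
    """f_z(m): fraction of seminal nodes among the top z*N nodes by score."""
    n_top = max(1, round(z * len(scores)))  # rounding rule is an assumption
    # sorted() is stable, so tied scores keep their index order (an assumption).
    ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranking[:n_top])
    return sum(1 for s in seminal if s in top) / len(seminal)


# With 100 nodes and z = 1%, only the highest-scoring node (node 99) is "top".
print(identification_rate(list(range(100)), seminal=[99, 50]))  # 0.5
```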

Besides assessing the identification rate on the complete network, we also study the metrics' performance as a function of the age of the seminal nodes (Mariani et al., 2016). To this end, we construct network snapshots at the end of each calendar year (ignoring all nodes and links that appear afterward) and rank the nodes in each snapshot. This allows us to evaluate, individually for each seminal node, whether it was in the top $z$ fraction of the ranking at any given age $t$. By averaging this over all seminal nodes,⁶ we obtain the identification rate $f_z(m, t)$, which is now a function of the age of the seminal nodes. For example, $f_z(m, 1\,\text{year})$ is the fraction of the seminal nodes that are in the top $z$ fraction of the ranking when they are one year old.
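Given the per-snapshot outcome for each seminal node, $f_z(m, t)$ is a simple average over the nodes that are old enough. A minimal sketch, assuming a hypothetical input in which the `a`-th entry of a node's list records whether the node was in the top $z$ fraction of the snapshot ranking when it was `a` years old. Computing these flags requires ranking every yearly snapshot, which is not shown.

```python
def identification_rate_at_age(in_top_by_age, t):
    """f_z(m, t) from per-snapshot results.

    in_top_by_age[s][a - 1] is True if seminal node s was in the top z
    fraction of the ranking in the snapshot taken when s was a years old.
    Nodes with fewer than t snapshots are too young and are excluded.
    Returns None if no seminal node reached age t.
    """
    eligible = [flags for flags in in_top_by_age if len(flags) >= t]
    if not eligible:
        return None
    return sum(flags[t - 1] for flags in eligible) / len(eligible)


# Hypothetical flags for three seminal nodes of different ages.
flags = [[False, True, True], [True, True], [False]]
print(identification_rate_at_age(flags, 1))  # 0.3333333333333333
print(identification_rate_at_age(flags, 4))  # None
```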

Task 1 focuses solely on the ranking positions of the seminal nodes, which are reflected by $f_z(m)$ and $f_z(m, t)$. While the former evaluates the "final" ranking positions of the seminal nodes, the latter allows us to inspect how fast (or slowly) the seminal nodes rise in the ranking produced by the respective metric.

5.2. Metric evaluation using identification rate

Fig. 1 shows the ranking metrics evaluated by their IR in the complete datasets. Overall, the highest identification rates are found in APS, followed by HEP and then PAT. A likely reason for this is provided by Table 2, which shows that in the PAT data the median indegree differs the least between all nodes and the seminal nodes, which makes the seminal nodes in this dataset harder to separate from the other nodes.

The relative standings of the metrics are rather similar between APS and PAT, with PageRank being the best-performing metric in both. Relative differences between the metrics in both datasets are rather small, though: in both datasets there are a few methods with nearly identical performance, and the ratio between the best and the worst metric's IR is around 1.5. The results are very different in HEP, where (1) LeaderRank (L) outperforms the second-best method by a wide margin, (2) the ratio between the best and the worst metric's IR is 3.5, and (3) all rescaled metrics perform significantly worse than their unrescaled counterparts. We focus on metric evaluation in this section; reasons for the differences observed in HEP are discussed in Section 5.3.

⁶ If a seminal node appears only τ years before the end of the complete dataset, it is obviously impossible to know its ranking at any age greater than τ. When computing f_z(m, t), seminal nodes that are younger than t at the end of the dataset are therefore excluded from the averaging.

Fig. 1. Metrics' performance in identifying the seminal nodes as measured by the identification rate (z = 1%) in the complete datasets. Note that the maximal displayed values differ between the panels. Colors of the bars distinguish the original ranking metrics (white) and their age-rescaled counterparts (orange). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. The identification rate of individual metrics as a function of the seminal node age (in years). To facilitate the comparison, the metrics' performance is normalized to the best metric in each age bin. A metric with zero IR thus receives a score of zero, whereas a metric that achieves the best IR for a given seminal node age receives a score of one.

In summary, LeaderRank can be considered the best-performing metric in Task 1, as it is clearly the best in the HEP data and nearly the best in the APS and PAT data. This also holds when different evaluation thresholds, z = 0.5% and z = 2%, are used.

Fig. 2 further shows the metrics' relative performance as a function of the seminal node age. This approach serves to reveal the time evolution of the metrics' ranking performance. To further facilitate the comparison of metrics, we normalize a metric's identification rate at a given age t of the seminal nodes, f_z(m, t), by the best IR achieved at this age, max_n f_z(n, t). Relative performance thus ranges from zero (when a metric's IR is zero at age t) to one (achieved by the best-performing metric at age t). As shown in Fig. 2, the relative performance of metrics changes dramatically with the seminal node age: metrics that work well shortly after publication (mostly rescaled metrics) lose their advantage as the seminal nodes become older. In the displayed node age range, CiteRank and rescaled CiteRank (T and RT) are the two best-performing metrics in APS and PAT. For HEP, there is no single metric that performs well for most age values. Rescaled citation count (RC) is best until age 5, then h-index and collective influence (H and CI) are best until age 12, and finally semi-local centrality (SLC) is the best from then until age 20. LeaderRank (L), which performed best for the complete HEP dataset in Fig. 1, becomes the best metric later on (for comparison, the average age of the seminal nodes in the complete HEP dataset is 61 years).
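The normalization used in Fig. 2 divides each metric's f_z(m, t) by max_n f_z(n, t), age by age. A minimal sketch, assuming the per-age identification rates are already available as equal-length sequences; the metric names and values in the example are made up for illustration, and mapping ages at which every metric scores zero to zero is our own convention.

```python
import numpy as np


def relative_performance(ir_by_age):
    """Divide f_z(m, t) by max_n f_z(n, t) separately at each age t.

    ir_by_age: {metric_name: sequence of f_z(m, t) over the same ages}.
    Ages at which every metric has zero IR are mapped to zero (our convention).
    """
    names = list(ir_by_age)
    table = np.array([ir_by_age[m] for m in names], dtype=float)  # metrics x ages
    best = table.max(axis=0)                   # best IR at each age
    safe = np.where(best > 0, best, 1.0)       # avoid dividing by zero
    rel = np.where(best > 0, table / safe, 0.0)
    return {m: rel[k] for k, m in enumerate(names)}


rel = relative_performance({"P": [0.2, 0.4], "RC": [0.4, 0.1]})  # made-up IR values
print(rel["P"].tolist(), rel["RC"].tolist())  # [0.5, 1.0] [1.0, 0.25]
```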

To gain further insight into the differences between the metrics, we evaluate their pairwise similarity using the Spearman rank correlation of all nodes' rankings. The results are shown in Fig. 3, together with a clustering of the metrics based on the obtained correlation matrices. There are several points to note. First, the clustering of metrics is remarkably stable across the datasets. Second, the clusterings reveal two groups of metrics whose rankings are similar to each other. The larger group includes CI, SLC, P, L, C, and H. The smaller group includes some of their rescaled variants: RP, RL, RC, and RH. Third, RCI, RSLC, and RT do not cluster with the other rescaled metrics, probably because the rescaling procedure does not work perfectly for them (see Figs. A.1–A.3 in Appendix A). Fourth, within each of the two clusters, the pairwise Spearman correlation values are rather high (above 0.73 in all three datasets), which indicates a high degree of similarity among the respective metrics.

Note that we have omitted HITS from the presentation of results above. The reason is that its performance is so much worse than that of the other metrics that displaying HITS in all previous figures would add very little. In particular, the identification rate values of the HITS authority score are 0.143 (APS), 0.116 (HEP), and 0.054 (PAT).⁷

⁷ HITS performance is strongly inferior to the other metrics also in terms of the normalized identification rate introduced in Section 6.1.


Fig. 3. Similarity of the evaluated metrics as measured by the Spearman rank correlation of all node rank positions. The metrics' hierarchical clusterings are obtained by the UPGMA method implemented by the clustermap function in Python's Seaborn library.
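The similarity analysis behind Fig. 3 can be approximated as follows: the Spearman correlation is the Pearson correlation of ranks, and UPGMA corresponds to average linkage in SciPy's hierarchical clustering. This is a sketch, not the authors' code: we assume 1 − ρ as the distance between metrics, whereas Seaborn's clustermap by default clusters the rows of the correlation matrix with Euclidean distances, so the resulting trees may differ in detail. The scores in the example are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata


def metric_similarity(scores):
    """Spearman correlation matrix of metrics and a UPGMA leaf order.

    scores: {metric_name: array of node scores, same node order for every metric}.
    """
    names = list(scores)
    data = np.column_stack([np.asarray(scores[m], dtype=float) for m in names])
    ranks = rankdata(data, axis=0)          # rank nodes separately for each metric
    rho = np.corrcoef(ranks, rowvar=False)  # Pearson on ranks = Spearman
    # Assumed distance between metrics: 1 - rho (clipped against rounding noise).
    dist = squareform(np.clip(1.0 - rho, 0.0, None), checks=False)
    tree = linkage(dist, method="average")  # average linkage = UPGMA
    order = [names[k] for k in leaves_list(tree)]
    return rho, order


rho, order = metric_similarity({
    "C": [1, 2, 3, 4, 5],
    "P": [2, 3, 4, 5, 6.5],
    "H": [1, 2, 3, 5, 4],
    "RC": [5, 4, 3, 2, 1],
})
print(np.round(rho[0], 2), order)
```

In this toy input, C and P produce identical rankings and end up as neighbours in the leaf order, H joins them next, and RC (the reversed ranking) is attached last.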

The poor performance of HITS here contrasts sharply with the praise this algorithm has received in the line of research on court decision citation networks (see Agnoloni & Pagallo, 2015; Fowler & Jeon, 2008, and the references therein). One possible reason for this difference is that in science, few would agree that a citation from a well-referenced but little-cited review paper is more indicative of the target paper's impact than a citation from a high-impact paper with few references (as the HITS authority score would assume). In this sense, court decision citation networks may be intrinsically more favorable to HITS than paper and patent citation networks are. Further research is necessary to understand the structural differences between court decision citation networks and scholarly/patent citation networks. Also, a comprehensive evaluation of several ranking metrics can help us understand whether HITS is indeed the single best-performing metric in court decision networks.
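To make the assumption behind the HITS authority score concrete, the sketch below runs the standard power iteration (Kleinberg, 1999) on a small hypothetical citation graph: node 0 is a review citing nodes 1 to 4, and node 5 is a paper that cites only node 6 and is itself cited by nodes 7 to 9. The dense-matrix implementation and the fixed iteration budget are simplifications for illustration.

```python
import numpy as np


def hits_authority(adj, n_iter=100, tol=1e-12):
    """HITS authority scores by power iteration.

    adj[i, j] = 1 if node i cites node j. A node's authority is the sum of the
    hub scores of the nodes citing it; a node's hub score is the sum of the
    authority scores of the nodes it cites.
    """
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    auth = np.ones(n) / np.sqrt(n)
    for _ in range(n_iter):
        hub = A @ auth
        hub /= np.linalg.norm(hub)
        new_auth = A.T @ hub
        new_auth /= np.linalg.norm(new_auth)
        converged = np.linalg.norm(new_auth - auth) < tol
        auth = new_auth
        if converged:
            break
    return auth


edges = [(0, 1), (0, 2), (0, 3), (0, 4), (5, 6), (7, 5), (8, 5), (9, 5)]
adj = np.zeros((10, 10))
for src, dst in edges:
    adj[src, dst] = 1.0
auth = hits_authority(adj)
print(np.round(auth[[1, 5, 6]], 3))
```

Node 1, cited once by the review with many references, ends up with authority close to 0.5, whereas node 6 (cited by a paper that is itself cited three times) and even node 5 end up with authority close to zero. The scores also concentrate on a few nodes, which is the behaviour that Fig. A.3 blames for the failure of rescaling HITS.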

We have similarly omitted the yearly citation count percentile, YCCP, from the figures. Although the performance of YCCP does not lag behind the top metrics as much as that of HITS, its results are still significantly lower: the identification rate values of YCCP are 0.700 (APS), 0.197 (HEP), and 0.375 (PAT). Importantly, the IR results of YCCP are similar to the results of RC, which is expected because RC, too, is an age-normalized version of citation count, similar to YCCP. Because of this high level of similarity, we report the YCCP results only in the text.
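Both RC and YCCP normalize citation counts by age, which is why their results are so close. A minimal sketch of the two ideas, under stated assumptions: the rescaled score follows the moving-window idea (score mean and standard deviation among nodes of similar age) of Mariani et al. (2016), with the window size `window` as a hypothetical parameter, and the percentile is computed here as the share of same-year nodes with strictly fewer citations, which is only one of several possible percentile definitions.

```python
import numpy as np


def rescale_by_age(scores, order, window=1000):
    """Moving-window z-score of each node among `window` nodes of similar age.

    order: node indices sorted from the oldest to the newest node.
    window: number of nodes in the comparison window (hypothetical default).
    """
    order = np.asarray(order)
    s = np.asarray(scores, dtype=float)[order]  # scores in age order
    n = len(s)
    half = window // 2
    out = np.empty(n)
    for k in range(n):
        # Window centred on k, shifted inwards near the oldest/newest nodes.
        lo = max(0, min(k - half, n - window))
        w = s[lo:lo + window]
        sd = w.std()
        out[k] = (s[k] - w.mean()) / sd if sd > 0 else 0.0
    rescaled = np.empty(n)
    rescaled[order] = out  # back to the original node indexing
    return rescaled


def yearly_percentile(citations, pub_year):
    """Percentile of each node's citation count among nodes from the same year
    (assumed definition: share of same-year nodes with strictly fewer citations)."""
    c = np.asarray(citations, dtype=float)
    y = np.asarray(pub_year)
    out = np.empty(len(c))
    for year in np.unique(y):
        idx = np.flatnonzero(y == year)
        cy = c[idx]
        out[idx] = 100.0 * (cy[:, None] > cy[None, :]).sum(axis=1) / len(idx)
    return out


print(rescale_by_age([1, 2, 3, 4, 5, 6], list(range(6)), window=3).round(3))
print(yearly_percentile([0, 5, 10, 3], [2000, 2000, 2000, 2001]).round(1))
```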

5.3. Caveats of identification rate

While the metrics' performance in Fig. 1 is strikingly uniform in the APS and PAT data, big differences are found in the HEP data. To explain where they come from, Fig. 4 shows the age distributions of the seminal nodes in the data. In terms of real time, the difference between APS/PAT and HEP is already apparent: the average publication year of the seminal nodes is 1976 and 1975 in the first two datasets, respectively, whereas it is 1957 for the HEP seminal nodes. The difference between the three datasets is more evident, though, when each seminal node is assigned to one of 40 equally sized age groups by its publication date (with groups 1 and 40 containing the oldest and the most recent nodes, respectively). The bottom row of Fig. 4 shows that the HEP seminal nodes are distributed extremely unevenly among the age groups, with 74% of them (230 out of 310) in the oldest age group 1, and no seminal nodes in age groups 14–40. The big differences between the top and bottom rows in Fig. 4 are due to the accelerating rates at which new nodes appear in the datasets. The numbers of recent new nodes are so high that they "push" the seminal nodes to the early age groups. In APS, for example, approximately 85% of all nodes appear after 1976, which is the mean publication year of the dataset's seminal nodes.
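The assignment of nodes to 40 equally sized age groups can be sketched as below, assuming nodes are ordered by publication time with ties broken by node index; the toy timestamps are hypothetical. Counting the seminal nodes per group gives the kind of histogram shown in the bottom row of Fig. 4.

```python
import numpy as np


def age_groups(pub_time, G=40):
    """Group label 1..G for each node: 1 = oldest, G = most recent.
    Groups have equal size (up to one node when N is not divisible by G)."""
    pub_time = np.asarray(pub_time)
    n = len(pub_time)
    order = np.argsort(pub_time, kind="stable")  # oldest first
    groups = np.empty(n, dtype=int)
    groups[order] = np.arange(n) * G // n + 1
    return groups


def seminal_per_group(groups, seminal, G=40):
    """Number of seminal nodes in each age group 1..G."""
    return np.bincount(groups[np.asarray(seminal)], minlength=G + 1)[1:]


groups = age_groups(np.arange(80), G=40)  # 80 nodes, 2 per group
print(seminal_per_group(groups, [0, 1, 2, 79], G=40)[:3])
```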

The strong temporal non-uniformity of the seminal nodes has profound consequences. Firstly, it is not favorable to age-rescaled metrics, which, by design, strive for a uniform representation of all age groups among the top-ranked nodes. For the HEP data, however, nodes from age groups 2–40 can contribute only marginally to the identification rate because there are only a few seminal nodes among them. By contrast, the original non-rescaled metrics are typically biased towards old nodes (see Figs. A.1–A.3), and this gives them an advantage when a given set of seminal nodes shares the same bias towards old nodes. In particular, Fig. A.2 shows that the bias of LeaderRank towards old nodes is the strongest of all metrics in HEP, which directly contributes to the metric's superior performance in Fig. 1.

Secondly, the age bias of the seminal nodes in HEP is so strong that it allows the simple ranking of nodes by their age (we refer to this metric as AgeR; old nodes are at the top) to outperform all other metrics. Its identification rate on the complete HEP data is 0.70, which is indeed better than the values shown in Fig. 1 for the other metrics. This is further illustrated by the left panel of Fig. 5, which shows the identification rate for a few selected metrics as a function of the seminal node age. Here, AgeR yields a zero identification rate when the seminal nodes are young (younger than 30 years) because it simply puts old nodes at the top of the ranking. However, the metric's results quickly improve when the seminal nodes are older than that, and AgeR becomes the best metric starting from age 40, approximately. This demonstrates that evaluating ranking metrics solely by the ranks that they assign to the seminal nodes is of limited relevance, as AgeR, a metric that entirely ignores the actual impact of the nodes, is eventually able to outperform all other metrics.

Fig. 4. The distributions of the seminal nodes' publication dates in the datasets: real time (top row), and 40 equally sized age groups (bottom row).

Fig. 5. Performance of selected metrics in identifying the seminal nodes in the HEP data: a comparison between the identification rate (left) and the normalized identification rate (right; see Section 6.1 for the definition), measured as functions of the seminal node age.

In summary, the identification rates observed within Task 1 confound two factors: a given metric's ability to rank the seminal nodes well, and the level of agreement between the metric's biases and the biases implicitly present in the chosen set of seminal nodes. Note that until now, we have discussed specifically the age bias because it is both manifestly present and easy to define and measure. Other potentially relevant biases, such as the field bias, can in principle be studied and treated in a similar way as we do here for the age bias.

6. Metric performance in ranking the seminal nodes whilst penalizing biased metrics (Task 2)

Having demonstrated the caveats of evaluating the ranking performance of metrics using the identification rate, we now proceed to Task 2, which additionally penalizes biased metrics. To this end, we employ the normalized identification rate introduced in Mariani et al. (2016), which imposes a penalty on metrics that are biased.


Fig. 6. An illustration of the alternative interpretation of Task 2: the age distribution of the seminal expert-selected nodes and the top z fraction of nodes from each age group (we use four age groups here as an example).

6.1. Normalized identification rate

The normalized identification rate (NIR) introduced in Mariani et al. (2016) considers the age distribution of the top-ranked nodes and applies a penalty factor to the identified seminal nodes that come from age groups that are over-represented among the top-ranked nodes. To compute NIR, we divide all N network nodes by age into G groups of equal size and compute N_z(g), the number of nodes from group g (g = 1, ..., G) that are in the top z fraction of the ranking. An age-unbiased metric would result in N_U := zN/G top nodes, on average, in each age group. For any age group g that is "over-represented" (that is, N_z(g) > N_U), the seminal nodes that are in the top z fraction of the ranking do not contribute to the NIR fully but only in proportion to N_U/N_z(g). If, for example, a seminal node is from an age group that is twice as frequent in the top of the ranking as it should be, this seminal node contributes only one half to the NIR. By contrast, seminal nodes from under-represented age groups (N_z(g) < N_U) contribute to the NIR in the same way as they contribute to the IR. In other words, NIR imposes a penalty on seminal nodes from over-represented age groups but gives no bonus to seminal nodes from under-represented age groups. To summarize, the factor N_U/N_z(g) introduced by the normalized identification rate can be viewed as a penalty for the performance gained through the age bias of a metric.
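Written out, the definition above reads

$$\mathrm{NIR}_z(m) = \frac{1}{|S|}\sum_{i \in S \cap T_z(m)} \min\!\left(1, \frac{N_U}{N_z(g_i)}\right), \qquad N_U = \frac{zN}{G},$$

where S is the set of seminal nodes, T_z(m) the top zN nodes by metric m, and g_i the age group of node i (the symbols S, T_z and g_i are our notation). Below is a minimal sketch that returns both IR and NIR for one ranking; rounding zN and using the rounded value in N_U are assumptions, and the two toy rankings are hypothetical: an AgeR-like ranking that puts the oldest nodes first, and a ranking whose top 1% contains exactly one node per age group.

```python
import numpy as np


def ir_and_nir(scores, pub_time, seminal, z=0.01, G=40):
    """Identification rate and normalized identification rate of one ranking."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    # G equally sized age groups, 0 = oldest.
    order = np.argsort(np.asarray(pub_time), kind="stable")
    group = np.empty(n, dtype=int)
    group[order] = np.arange(n) * G // n
    # Top z fraction of the ranking.
    n_top = max(1, int(round(z * n)))
    top = np.argsort(-scores, kind="stable")[:n_top]
    in_top = np.zeros(n, dtype=bool)
    in_top[top] = True
    n_z = np.bincount(group[top], minlength=G)  # N_z(g)
    n_u = n_top / G                             # N_U = zN / G
    seminal = np.asarray(seminal)
    found = seminal[in_top[seminal]]
    ir = len(found) / len(seminal)
    # Over-represented groups (N_z > N_U) are penalized by N_U / N_z; no bonus otherwise.
    weights = np.minimum(1.0, n_u / n_z[group[found]])
    nir = float(weights.sum()) / len(seminal)
    return ir, nir


n = 4000
pub_time = np.arange(n)                 # node 0 is the oldest
age_r = -pub_time.astype(float)         # AgeR: the oldest nodes on top
print(ir_and_nir(age_r, pub_time, [0, 1, 2, 3]))

balanced = np.zeros(n)
balanced[::100] = 1.0                   # one top node in each of the 40 age groups
print(ir_and_nir(balanced, pub_time, [0, 100, 5, 6]))
```

The AgeR-like ranking identifies every seminal node (IR = 1) but its whole top 1% comes from the oldest age group, 40 times the unbiased share, so its NIR drops to 0.025; the balanced ranking keeps IR = NIR = 0.5.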

The choice of the number of age groups G used in the computation of NIR is a compromise between improving the temporal resolution (shortening the time span of each group) by increasing G and limiting the natural statistical variability of N_z(g) by keeping G low. We use G = 40, as adopted in previous literature (Mariani et al., 2016; Vaccario et al., 2017); other choices lead to qualitatively similar results. Note that due to the penalizing factor, NIR cannot be higher than IR for a given ranking. The highest possible NIR of one is achieved by a ranking that places all seminal nodes in the chosen top fraction of the ranking (we use the top 1% here, unless stated otherwise) and is not biased by node age (or, at least, in which the age bins containing the seminal nodes are not over-represented in the top of the ranking).

The right panel of Fig. 5 shows the normalized identification rate as a function of the seminal node age for a small number of selected metrics. It shows that using NIR indeed solves the problem encountered when metric performance is measured using the ordinary identification rate (left panel in Fig. 5). In particular, AgeR becomes the worst method regardless of the seminal node age, as is appropriate for a ranking method that ignores node impact in the network. LeaderRank, another metric that is strongly biased towards old nodes, is also strongly affected: its performance starts to decrease at the seminal node age of 10 years instead of growing monotonically as it does when the identification rate is used. This illustrates that the use of the normalized identification rate weakens the mutually reinforcing link between age-biased metrics and age-biased sets of seminal nodes.

There is also an alternative view that leads to Task 2 and the normalized identification rate as the appropriate evaluation methods. This view is based on realizing that the seminal nodes are necessarily a subset of high-quality nodes, the proverbial "tip of an iceberg". The short text introducing the Physical Review Letters milestones explicitly acknowledges that "It is inevitable that some very important work will not be featured (in the milestones collection)". We define the best z fraction of nodes in each age group as the high-quality nodes, but the problem is that we do not know which those "best" nodes are. That is why we still need the seminal nodes, but we do not view them as a definite and only target for the evaluated ranking metrics (which would correspond to Task 1). Instead, we recognize that the seminal nodes are a particular sample from all high-quality nodes in the dataset. This is illustrated by Fig. 6, where the number of seminal nodes varies greatly among the age groups but the number of the top z fraction of nodes is naturally constant. If a ranking metric over-represents a certain age group in its top z fraction of the ranking by a factor X > 1, then at most a fraction 1/X of this age group's nodes in the top of the ranking can belong to the top z fraction of nodes within this age group. That is why the factor 1/X needs to be applied to any identified seminal nodes from this age group, which is precisely what the normalized identification rate does. In summary, Task 2 and the normalized identification rate also correspond to the task of ranking highly the best nodes from each age group, from which the given seminal nodes are a potentially biased sample.

Fig. 7. Metrics' performance in identifying the seminal nodes as measured by NIR evaluated on the complete datasets.

Table 4
A summary of the metrics evaluation by normalized identification rate. The average score of metric m is obtained by computing its score relative to the best-performing metric in each dataset, NIR(m)/max_n NIR(n), and averaging this score over the three analyzed datasets. A metric that performs best in all datasets would therefore achieve an average score of one. The rank columns show the ranking of metrics by their NIR in each dataset. The last two columns show the average score based on the NIR values for two different top ranking fractions, z = 0.5% and z = 2%.

Metric  Avg score  Rank APS  Rank HEP  Rank PAT  Avg score (z = 0.5%)  Avg score (z = 2%)
RP      0.98       1         4         1         0.96                  0.93
RL      0.98       2         1         3         0.96                  0.94
RC      0.89       5         5         2         0.78                  0.89
RH      0.81       4         6         4         0.73                  0.81
RCI     0.80       7         3         5         0.72                  0.82
RSLC    0.75       9         2         9         0.68                  0.71
T       0.74       3         11        8         0.65                  0.71
RT      0.70       6         12        11        0.63                  0.72
C       0.69       8         9         10        0.60                  0.66
H       0.63       10        7         12        0.65                  0.62
CI      0.58       12        10        13        0.53                  0.59
P       0.57       11        15        6         0.55                  0.56
L       0.54       13        16        7         0.49                  0.52
SLC     0.53       14        8         14        0.46                  0.52
HITS    0.27       15        14        15        0.20                  0.28
RHITS   0.25       16        13        16        0.21                  0.24
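The average score in Table 4 can be computed from per-dataset NIR values as sketched below; the per-dataset rank is simply the position of a metric when sorted by decreasing NIR. The NIR numbers in the example are invented for illustration and are not the values behind Table 4.

```python
import numpy as np


def average_scores(nir):
    """nir: {dataset: {metric: NIR}}. Mean over datasets of NIR(m) / max_n NIR(n)."""
    metrics = list(next(iter(nir.values())))
    rel = {m: [] for m in metrics}
    for values in nir.values():
        best = max(values.values())
        for m in metrics:
            rel[m].append(values[m] / best)
    return {m: float(np.mean(r)) for m, r in rel.items()}


def ranks(values):
    """Rank 1 = highest NIR within one dataset."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {m: k + 1 for k, m in enumerate(ordered)}


nir = {"A": {"RP": 0.5, "C": 0.25}, "B": {"RP": 0.4, "C": 0.1}}  # hypothetical
print(average_scores(nir))  # {'RP': 1.0, 'C': 0.375}
print(ranks(nir["B"]))      # {'RP': 1, 'C': 2}
```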

6.2. Metric evaluation using normalized identification rate

NIR values on the complete datasets are shown in Fig. 7. The first thing to note is that the rescaled metrics generally perform better than their original counterparts here. The only exception is T, which itself contains a mechanism to prevent the metric from being overly biased towards old nodes, so an additional rescaling procedure is in some sense superfluous (even though Mariani et al. (2016) and Figs. A.1–A.3 show that CiteRank still displays a strong age bias in some age groups). The second observation is that the NIR values for the HEP dataset are much lower than the previously reported IR values. This is a direct consequence of two effects: NIR heavily penalizes biased ranking metrics, while unbiased ranking metrics are unable to identify the strongly age-biased seminal nodes well. In line with the identification rate results in Task 1, YCCP performs similarly to RC: its NIR values are 0.678 (APS), 0.175 (HEP), and 0.348 (PAT), and its average score, computed in the same way as for the metrics in Table 4, is 0.89.

The most important finding from Fig. 7 is that, once the normalized identification rate is adopted for the evaluation, metrics that perform well in all three datasets emerge. This is clearly visible in Table 4, where the relative performance of the ranking metrics in all three datasets is summarized. We see that rescaled PageRank, RP, and rescaled LeaderRank, RL, perform best or nearly best in all three datasets (recall that LeaderRank is a modification of PageRank obtained by changing the teleportation term). This result is robust with respect to changing the ranking fraction z that we use to evaluate the normalized identification rate: even when the relative order of some metrics changes, RP and RL remain the two best metrics by some margin. It is interesting to note here that while the two top metrics are global in the sense of taking the whole network structure into account, they are followed directly by rescaled citation count, RC, which is a local metric that is based only on the immediate node neighborhood. Semi-local metrics, such as the semi-local centrality SLC and the collective influence CI, regardless of whether they are rescaled or not, combine the worst of both worlds: they are computationally more demanding than local metrics, and they rank nodes worse than local metrics.

Further insights can be gained by plotting NIR as a function of the seminal node age in Fig. A.4, similarly to what we did for the identification rate, IR, in Fig. 2. We see there, for example, that unlike in the APS data, rescaled PageRank does not outperform rescaled citation count in the HEP data in the first 18 years of seminal node age.

7. Discussion

The previously introduced normalized identification rate (NIR) takes into account both the ranking positions of the expert-selected nodes and the metric's bias that manifests itself in the ranking. We use NIR to uncover the consistent performance of impact ranking metrics across different citation datasets. Our results indicate that ranking based on the network structure is more successful than simple degree-based metrics in singling out the significant nodes.

7.1. Limitations and open directions

There are various questions that remain open for further research. First, how to extend our analysis to more than one kind of bias (such as the age and field bias common in scholarly citation data; Vaccario et al., 2017) and to generalize it to cases where the kind of bias is not explicitly known. Second, the identification rate (referred to as recall in information filtering and statistical learning) is just one of various performance metrics (see Dunaiski et al., 2018 for an overview of possibilities); we thus need to study how to account for bias in these other metrics. Third, besides metric evaluation on real data, evaluation on synthetic model data can be used to gain further theoretical insights. This approach has the advantage that various model components can be arbitrarily turned on and off, which makes it possible to identify which of them are crucial for the observed metric performance. For example, which assumptions about papers written by multiple authors must be fulfilled for a specific researcher-impact metric to reflect the individual authors' contributions well? Such models need to be informed by analyses of real empirical datasets and, conversely, the assumptions and effects identified as crucial in model data can in turn be validated in real datasets. Finally, there are other citation datasets that can be used for a similar analysis once corresponding sets of seminal nodes have been identified for them.

To draw a parallel, in a machine learning problem, if the training set has some intrinsic bias, the system learns this bias and in turn produces biased outcomes (Lloyd, 2018; Raghavendra, Cerutti, & Preece, 2018). This is similar to our situation, where a potential bias in the used set of seminal nodes, if left unchecked, can lead to the wrong ranking metrics being believed to perform best, or even to the design of new ranking metrics that perform well only thanks to the bias. Biased outcomes of those metrics can in turn misguide our future evaluations and decisions. Further research on various aspects of bias in data mining and complex systems research, in particular on how to avoid it, is therefore vital.

7.2. Management implications

Ranking and prioritizing is an essential task in many managerial applications. Our results show that, to rank papers, age-rescaled PageRank is a well-performing metric that by construction produces rankings with little residual age bias. Evaluation of researchers and institutes typically uses metrics derived from citation counts (such as the h-index, for example). Our analysis, in particular the performance gap between age-rescaled PageRank and the tested unbiased versions of citation count, suggests that applying structural network-based metrics such as PageRank might also be advantageous when the objective is to rank researchers or institutions.

8. Conclusion

Well-designed, robust evaluation protocols are crucial for understanding which ranking metrics, of which we have an abundance (Liao et al., 2017; Waltman, 2016), perform well in which contexts. This study shows that the evaluation of a ranking metric based solely on the positions of expert-selected nodes in the resulting ranking is difficult to interpret because it confounds two aspects: the metric's ranking performance and the degree to which the biases of the expert-selected nodes overlap with the metric's biases. The normalized identification rate weakens the link between the ranking bias and the evaluation results, and yields results that are consistent across different datasets. In our case of ranking seminal nodes in citation networks, we find that age-rescaled PageRank and age-rescaled LeaderRank (note that LeaderRank is a close variant of PageRank) are the two best-performing metrics by a wide margin.

Our work deepens the understanding of impact metrics, especially in relation to the interplay between their biases and the biases of the considered test set. Comprehensive comparisons among various metrics are crucial to cope with the ever-growing number of new metrics and help us understand the advantages and limitations of each of them. The proposed evaluation framework, which penalizes biased metrics, has general applicability beyond the ranking of articles in citation data; by highlighting the various roles of bias, it provides practical lessons for ranking practice in other datasets with bias, such as technological networks, social networks, and other systems.

Authors' contribution

Shuqi Xu: Contributed data or analysis tools; Performed the analysis; Wrote the paper.
Manuel Sebastian Mariani: Conceived and designed the analysis; Wrote the paper.
Linyuan Lü: Conceived and designed the analysis.
Matúš Medo: Conceived and designed the analysis; Collected the data; Contributed data or analysis tools; Wrote the paper.


Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 11622538, 61673150, 11850410444), the Science Strength Promotion Program of the UESTC, and the Zhejiang Provincial Natural Science Foundation of China (Grant no. LR16A050001). MSM acknowledges financial support from the University of Zurich through the URPP Social Networks, the Swiss National Science Foundation (Grant No. 200021-182659), and the UESTC professor research start-up (Grant No. ZYGX2018KYQD21).

Appendix A. Evaluation of the age bias removal

Figs. A.1–A.3 review the age bias of individual ranking metrics in the three analyzed datasets. Using the usual division of all nodes into 40 equally sized groups by age, the figures show the number of nodes from each age group, N_{1%}(g), in the top 1% of the ranking by each respective ranking metric. An age-unbiased metric would thus display a flat histogram where deviations from the perfectly uniform value N_U = 0.01N/40 in each age bin would be only of statistical nature. As in Mariani et al. (2016), we measure the level of age bias in each histogram using the observed standard deviation

$$\sigma = \sqrt{\frac{1}{40}\sum_{g=1}^{40}\left[N_{1\%}(g) - N_U\right]^2}$$

and compare it with the average standard deviation σ_0 that results from distributing 0.01N nodes among the 40 age bins in a random (and therefore unbiased) way. When the bias strength is measured by σ/σ_0, a value of around one (or less) indicates that the observed level of bias can be explained by statistical fluctuations alone. The higher the value, the stronger the bias. The values of σ/σ_0 corresponding to the histograms in Figs. A.1–A.3 are summarized in Table A.5.
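The bias ratio σ/σ_0 can be estimated as below. The sketch assumes that σ_0 is obtained by Monte Carlo: the top 0.01N nodes are drawn uniformly at random without replacement many times and the resulting σ values are averaged. The number of draws, the random seed, and the toy rankings are arbitrary choices for illustration.

```python
import numpy as np


def age_bias_ratio(scores, pub_time, z=0.01, G=40, n_random=1000, seed=0):
    """sigma / sigma_0 for the top z fraction of a ranking (Appendix A)."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    order = np.argsort(np.asarray(pub_time), kind="stable")
    group = np.empty(n, dtype=int)
    group[order] = np.arange(n) * G // n     # G equally sized age groups
    n_top = max(1, int(round(z * n)))
    n_u = n_top / G                          # N_U = zN / G

    def sigma(top_nodes):
        counts = np.bincount(group[top_nodes], minlength=G)   # N_z(g)
        return np.sqrt(np.mean((counts - n_u) ** 2))

    observed = sigma(np.argsort(-scores, kind="stable")[:n_top])
    rng = np.random.default_rng(seed)
    # sigma_0: mean sigma over uniformly random selections of n_top nodes.
    sigma0 = np.mean([sigma(rng.choice(n, size=n_top, replace=False))
                      for _ in range(n_random)])
    return float(observed / sigma0)


n = 4000
pub_time = np.arange(n)
print(round(age_bias_ratio(-pub_time.astype(float), pub_time), 1))  # oldest first: strongly biased
balanced = np.zeros(n)
balanced[::100] = 1.0
print(age_bias_ratio(balanced, pub_time))                           # flat histogram: 0.0
```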

Fig. A.1. Visualization of the age bias of the original and rescaled metrics for the APS data.


Fig. A.2. Visualization of the age bias of the original and rescaled metrics for the HEP data.

Table A.5
Quantification of the age bias magnitude of the respective ranking metrics in Figs. A.1–A.3 using σ/σ_0 (the higher the value, the stronger the age bias; a value of one indicates zero bias).

         APS                 HEP                 PAT
Metric   Original  Rescaled  Original  Rescaled  Original  Rescaled
P        15.7      1.4       22.1      1.9       36.1      6.2
C        7.5       1.4       9.7       1.9       46.6      7.4
T        14.2      4.1       10.5      8.6       42.0      40.7
H        8.4       1.1       11.0      2.3       48.8      11.9
L        23.7      1.4       29.5      1.8       41.3      6.6
CI       9.9       2.8       12.5      3.3       49.6      17.9
SLC      11.5      2.1       13.8      3.1       50.6      18.6
HITS     6.0       5.0       10.4      8.9       28.2      108.3


Fig. A.3. Visualization of the age bias of the original and rescaled metrics for the PAT data. The failure of rescaling to weaken the age bias of the HITS authority score is due to the authority scores "concentrating" on a small fraction of nodes, with the remaining nodes having asymptotically zero score, which poses obvious problems for the rescaling procedure based on computing the score mean and standard deviation in a finite moving time window.

Fig. A.4. The normalized identification rate of individual metrics as a function of the seminal node age (in years). To facilitate the comparison, the metrics' performance is normalized to the best metric in each age bin in the same way as in Fig. 2. A metric with zero NIR thus receives a score of zero, whereas a metric that achieves the best NIR for a given seminal node age receives a score of one.

References

Agnoloni, Tommaso, & Pagallo, Ugo. (2015). The case law of the Italian constitutional court, its power laws, and the web of scholarly opinions. Proceedings of the 15th international conference on artificial intelligence and law, 151–155.

Alonso, Sergio, Cabrerizo, Francisco Javier, Herrera-Viedma, Enrique, & Herrera, Francisco. (2009). h-index: A review focused in its variants, computation and standardization for different scientific fields. Journal of Informetrics, 3(4), 273–289.

Berkhin, Pavel. (2005). A survey on PageRank computing. Internet Mathematics, 2(1), 73–120.

Bornmann, Lutz, & Daniel, Hans-Dieter. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45–80.

Bornmann, Lutz, & Marx, Werner. (2015). Methods for the generation of normalized citation impact scores in bibliometrics: Which method best reflects the judgements of experts? Journal of Informetrics, 9(2), 408–418.

Bornmann, Lutz, Leydesdorff, Loet, & Mutz, Rüdiger. (2013). The use of percentiles and percentile rank classes in the analysis of bibliometric data: Opportunities and limits. Journal of Informetrics, 7(1), 158–165.

Bovet, Alexandre, & Makse, Hernán A. (2019). Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications, 10(1), 7.

Braun, Tibor, Glänzel, Wolfgang, & Schubert, András. (2006). A Hirsch-type index for journals. Scientometrics, 69(1), 169–173.


Brin, Sergey, & Page, Lawrence. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7), 107–117.

Charlton, Bruce G., & Andras, Peter. (2007). Evaluating universities using simple scientometric research-output metrics: Total citation counts per university for a retrospective seven-year rolling sample. Science and Public Policy, 34(8), 555–563.

Chen, Duanbing, Lü, Linyuan, Shang, Ming-Sheng, Zhang, Yi-Cheng, & Zhou, Tao. (2012). Identifying influential nodes in complex networks. Physica A: Statistical Mechanics and Its Applications, 391(4), 1777–1787.

Chen, Peng, Xie, Huafeng, Maslov, Sergei, & Redner, Sidney. (2007). Finding scientific gems with Google's PageRank algorithm. Journal of Informetrics, 1(1), 8–15.

Dunaiski, Marcel, Visser, Willem, & Geldenhuys, Jaco. (2016). Evaluating paper and author ranking algorithms using impact and contribution awards. Journal of Informetrics, 10(2), 392–407.

Dunaiski, Marcel, Geldenhuys, Jaco, & Visser, Willem. (2018). How to evaluate rankings of academic entities using test data. Journal of Informetrics, 12(3), 631–655.

Dunaiski, Marcel, Geldenhuys, Jaco, & Visser, Willem. (2019a). Globalised vs averaged: Bias and ranking performance on the author level. Journal of Informetrics, 13(1), 299–313.

Dunaiski, Marcel, Geldenhuys, Jaco, & Visser, Willem. (2019b). On the interplay between normalisation, bias, and performance of paper impact metrics. Journal of Informetrics, 13(1), 270–290.

Fowler, James H., & Jeon, Sangick. (2008). The authority of supreme court precedent. Social Networks, 30(1), 16–30.

González-Pereira, Borja, Guerrero-Bote, Vicente P., & Moya-Anegón, Félix. (2010). A new approach to the metric of journals' scientific prestige: The SJR indicator. Journal of Informetrics, 4(3), 379–391.

Harzing, Anne-Wil, & Wal, Ron Van Der. (2009). A Google Scholar h-index for journals: An alternative metric to measure journal impact in economics and business. Journal of the American Society for Information Science and Technology, 60(1), 41–46.

Hicks, Diana, Wouters, Paul, Waltman, Ludo, de Rijcke, Sarah, & Rafols, Ismael. (2015). Bibliometrics: The Leiden Manifesto for research metrics. Nature, 520, 429–431.

Hirsch, Jorge E. (2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569–16572.

Hirsch, Jorge E. (2007). Does the h index have predictive power? Proceedings of the National Academy of Sciences of the United States of America, 104(49), 19193–19198.

Kleinberg, Jon M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.

Kogan, Leonid, Papanikolaou, Dimitris, Seru, Amit, & Stoffman, Noah. (2017). Technological innovation, resource allocation, and growth. The Quarterly Journal of Economics, 132(2), 665–712.

Leydesdorff, Loet, Bornmann, Lutz, Mutz, Rüdiger, & Opthof, Tobias. (2011). Turning the tables on citation analysis one more time: Principles for comparing sets of documents. Journal of the American Society for Information Science and Technology, 62(7), 1370–1381.

Leydesdorff, Loet, Bornmann, Lutz, & Opthof, Tobias. (2018). hα: The scientist as chimpanzee or bonobo. Scientometrics, 1–4.

Liao, Hao, Mariani, Manuel Sebastian, Medo, Matúš, Zhang, Yi-Cheng, & Zhou, Ming-Yang. (2017). Ranking in evolving complex networks. Physics Reports, 689, 1–54.

Lloyd, Kirsten. (2018). Bias amplification in artificial intelligence systems. arXiv:1809.07842

Lü, Linyuan, & Zhou, Tao. (2011). Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and Its Applications, 390(6), 1150–1170.

Lü, Linyuan, Zhang, Yi-Cheng, Yeung, Chi Ho, & Zhou, Tao. (2011). Leaders in social networks, the Delicious case. PLoS ONE, 6(6), e21202.

Lü, Linyuan, Medo, Matúš, Yeung, Chi Ho, Zhang, Yi-Cheng, Zhang, Zi-Ke, & Zhou, Tao. (2012). Recommender systems. Physics Reports, 519(1), 1–49.

Lü, Linyuan, Zhou, Tao, Zhang, Qian-Ming, & Stanley, H. Eugene. (2016). The h-index of a network node and its relation to degree and coreness. Nature Communications, 7, 10168.

Lundberg, Jonas. (2007). Lifting the crown – citation z-score. Journal of Informetrics, 1(2), 145–154.

Manning, Christopher, Raghavan, Prabhakar, & Schütze, Hinrich. (2010). Introduction to information retrieval. Natural Language Engineering, 16(1), 100–103.

Mariani, Manuel Sebastian, Medo, Matúš, & Zhang, Yi-Cheng. (2015). Ranking nodes in growing networks: When PageRank fails. Scientific Reports, 5, 16181.

Mariani, Manuel Sebastian, Medo, Matúš, & Zhang, Yi-Cheng. (2016). Identification of milestone papers through time-balanced network centrality. Journal of Informetrics, 10(4), 1207–1223.

Mariani, Manuel Sebastian, Medo, Matúš, & Lafond, François. (2018). Early identification of important patents: Design and validation of citation network metrics. Technological Forecasting and Social Change, http://dx.doi.org/10.1016/j.techfore.2018.01.036

Martin, Travis, Ball, Brian, Karrer, Brian, & Newman, M. E. J. (2013). Coauthorship and citation patterns in the Physical Review. Physical Review E, 88(1), 012814.

Mattedi, Marcos Antônio, & Spiess, Maiko Rafael. (2017). The evaluation of scientific productivity. História, Ciências, Saúde-Manguinhos, 24(3), 623–643.

Medo, Matúš, & Cimini, Giulio. (2016). Model-based evaluation of scientific impact indicators. Physical Review E, 94(3), 032312.

Mingers, John, & Leydesdorff, Loet. (2015). A review of theory and practice in scientometrics. European Journal of Operational Research, 246(1), 1–19.

Morone, Flaviano, & Makse, Hernán A. (2015). Influence maximization in complex networks through optimal percolation. Nature, 527(7579), 544.

Mutz, Rüdiger, & Daniel, Hans-Dieter. (2012). The generalized propensity score methodology for estimating unbiased journal impact factors. Scientometrics, 92(2), 377–390.

Newman,Mark.(2010).Networks:Anintroduction.OxfordUniversityPress.

Newman,MarkE.J.(2009).Thefirst-moveradvantageinscientificpublication.EPL(EurophysicsLetters),86(6),68001.

Nickerson,KyleL.,Chen,Yuanzhu,Wang,Feng,&Hu,Ting.(2018).Measuringevolvabilityandaccessibilityusingthehyperlink-inducedtopicsearch algorithm.Proceedingsofthegeneticandevolutionarycomputationconference,1175–1182.

Radicchi,Filippo,Fortunato,Santo,Markines,Benjamin,&Vespignani,Alessandro.(2009).Diffusionofscientificcreditsandtherankingofscientists. PhysicalReviewE,80(5),056103.

Raghavendra,Ramya,Cerutti,Federico,&Preece,Alu.(2018).Whendatalie:Fairnessandrobustnessincontestedenvironments.InNext-generation analystVI(p.106530U).InternationalSocietyforOpticsandPhotonics.

Ren, Zhuo-Ming. (2019). Age preference of metrics for identifying significant nodes in growing citation networks. Physica A: Statistical Mechanics and its Applications, 513, 325–332.

Ren, Zhuo-Ming, Mariani, Manuel Sebastian, Zhang, Yi-Cheng, & Medo, Matúš. (2018). Randomizing growing networks with a time-respecting null model. Physical Review E, 97(5), 052311.

de Rijcke, Sarah, Wouters, Paul F., Rushforth, Alex D., Franssen, Thomas P., & Hammarfelt, Björn. (2016). Evaluation practices and effects of indicator use – A literature review. Research Evaluation, 25(2), 161–169.

Schubert, András. (2008). Using the h-index for assessing single publications. Scientometrics, 78(3), 559–565.

Strumsky, Deborah, & Lobo, José. (2015). Identifying the sources of technological novelty in the process of invention. Research Policy, 44(8), 1445–1461.

Todeschini, Roberto, & Baccini, Alberto. (2016). Handbook of bibliometric indicators: Quantitative tools for studying and evaluating research. John Wiley & Sons.

Vaccario, Giacomo, Medo, Matúš, Wider, Nicolas, & Mariani, Manuel Sebastian. (2017). Quantifying and suppressing ranking bias in a large citation network. Journal of Informetrics, 11(3), 766–782.

Walker, Dylan, Xie, Huafeng, Yan, Koon-Kiu, & Maslov, Sergei. (2007). Ranking scientific publications using a model of network traffic. Journal of Statistical Mechanics: Theory and Experiment, (06), P06010, 2007.

Waltman, Ludo. (2016). A review of the literature on citation impact indicators. Journal of Informetrics, 10(2), 365–391.

Waltman, Ludo, & Yan, Erjia. (2014). PageRank-related methods for analyzing citation networks. Measuring scholarly impact, 83–100.

Wasserman, Max, Zeng, Xiao Han T., & Amaral, Luís A. Nunes. (2015). Cross-evaluation of metrics to estimate the significance of creative works. Proceedings of the National Academy of Sciences of the United States of America, 112(5), 1281–1286.

West, Jevin D., Jensen, Michael C., Dandrea, Ralph J., Gordon, Gregory J., & Bergstrom, Carl T. (2013). Author-level eigenfactor metrics: Evaluating the influence of authors, institutions, and countries within the social science research network community. Journal of the American Society for Information Science and Technology, 64(4), 787–801.

Zeng, An, Shen, Zhesi, Zhou, Jianlin, Wu, Jinshan, Fan, Ying, Wang, Yougui, et al. (2017). The science of science: From the perspective of complex systems. Physics Reports.

Zhou, Yan-Bo, Lü, Linyuan, & Li, Menghui. (2012). Quantifying the influence of scientists and their publications: Distinguishing between prestige and popularity. New Journal of Physics, 14(3), 033033.

Figure

Fig. 1. Metrics’ performance in identifying the seminal nodes, as measured by the identification rate (z = 1%) in the complete datasets.
Fig. 3. Similarity of the evaluated metrics, as measured by the Spearman rank correlation of all node rank positions.
Fig. 4. The distributions of the seminal nodes’ publication dates in the datasets: real time (top row) and 40 equally sized age groups (bottom row).
Fig. 6. An illustration of the alternative interpretation of Task 2: the age distribution of the seminal expert-selected nodes and of the top z fraction of nodes from each age group (four age groups are used here as an example).
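Fig. 1 and Fig. 6 rest on a simple counting rule. The Python sketch below assumes that the identification rate is the fraction of seminal nodes that a metric places in its top z fraction of all nodes, and that the age-group variant of Fig. 6 applies the same cut-off separately within consecutive, nearly equally sized groups of nodes ordered by publication date. The function names, the rounding of z·N, the tie-breaking rule, and the toy scores are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List, Set


def top_fraction(nodes: List[str], scores: Dict[str, float], z: float) -> Set[str]:
    """Return the nodes in the top z fraction of `nodes`, ranked by descending score.

    Assumption: the cut-off holds max(1, round(z * len(nodes))) nodes, and ties
    are broken by input order (Python's sort is stable).
    """
    k = max(1, round(z * len(nodes)))
    ranked = sorted(nodes, key=lambda n: -scores[n])
    return set(ranked[:k])


def identification_rate(scores: Dict[str, float], seminal: Set[str], z: float = 0.01) -> float:
    """Fraction of seminal nodes that the metric ranks in its overall top z fraction."""
    top = top_fraction(list(scores), scores, z)
    return len(top & seminal) / len(seminal)


def age_grouped_identification_rate(
    scores: Dict[str, float],
    pub_order: List[str],
    seminal: Set[str],
    z: float = 0.01,
    n_groups: int = 40,
) -> float:
    """Fraction of seminal nodes found in the top z fraction of each age group.

    `pub_order` lists node ids from oldest to newest; it is cut into n_groups
    consecutive chunks of (nearly) equal size, and the top z fraction is taken
    within every chunk separately (hypothetical reading of Fig. 6).
    """
    size = math.ceil(len(pub_order) / n_groups)
    selected: Set[str] = set()
    for start in range(0, len(pub_order), size):
        selected |= top_fraction(pub_order[start:start + size], scores, z)
    return len(selected & seminal) / len(seminal)


# Toy example: older nodes score higher overall, mimicking an age-biased metric.
scores = {"a": 10, "b": 9, "c": 8, "d": 7, "e": 6, "f": 4, "g": 1, "h": 5, "i": 2, "j": 0}
pub_order = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]  # oldest first
seminal = {"a", "h"}
print(identification_rate(scores, seminal, z=0.2))                                     # 0.5
print(age_grouped_identification_rate(scores, pub_order, seminal, z=0.2, n_groups=2))  # 1.0
```

In the toy data the recent seminal node h is missed by the overall cut-off because older nodes dominate the top of the ranking, whereas the age-grouped selection takes the same share of nodes from every period and therefore picks it up.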