Unbiased evaluation of ranking metrics reveals consistent performance in science and technology citation data

Shuqi Xu (a), Manuel Sebastian Mariani (a,b), Linyuan Lü (a,c), Matúš Medo (a,d,e,∗)

(a) Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, PR China

(b) URPP Social Networks, University of Zurich, 8050 Zurich, Switzerland

(c) Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, 311121 Hangzhou, PR China

(d) Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, 3010 Bern, Switzerland

(e) Department of Physics, University of Fribourg, 1700 Fribourg, Switzerland

Keywords: Citation networks; Network ranking metrics; Node centrality; Metrics evaluation; Milestone scientific papers and patents

Despite the increasing use of citation-based metrics for research evaluation purposes, we do not know yet which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. We assess 17 network-based metrics by their ability to identify milestone papers and patents in three large citation datasets. We find that traditional information-retrieval evaluation metrics are strongly affected by the interplay between the age distribution of the milestone items and age biases of the evaluated metrics. Outcomes of these metrics are therefore not representative of the metrics' ranking ability. We argue in favor of a modified evaluation procedure that explicitly penalizes biased metrics and allows us to reveal metrics' performance patterns that are consistent across the datasets. PageRank and LeaderRank turn out to be the best-performing ranking metrics when their age bias is suppressed by a simple transformation of the scores that they produce, whereas other popular metrics, including citation count, HITS and Collective Influence, produce significantly worse ranking results.

1. Introduction

Citation-based metrics for impact build on the premise that the number of citations received by a scientific paper (or a patent) is a reliable proxy for its scientific (or technological) impact. Such metrics are used not only to assess the impact of individual papers, but also to evaluate the overall research output of research units such as individual researchers (Hirsch, 2005; Medo & Cimini, 2016; Radicchi, Fortunato, Markines, & Vespignani, 2009; Zhou, Lü, & Li, 2012), research institutes (Charlton & Andras, 2007; West, Jensen, Dandrea, Gordon, & Bergstrom, 2013), and journals (González-Pereira, Guerrero-Bote, & Moya-Anegón, 2010; Harzing & Wal, 2009), for example. The relative ease with which new metrics of research impact can be designed has contributed to their proliferation (Mingers & Leydesdorff, 2015; Todeschini & Baccini, 2016; Waltman, 2016), and uncritical use of such metrics has eventually met strong opposition (de Rijcke, Wouters, Rushforth, Franssen, & Hammarfelt, 2016; Hicks, Wouters, Waltman, de Rijcke, & Rafols, 2015; Leydesdorff, Bornmann, & Opthof, 2018).

∗ Corresponding author at: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, PR China.

E-mail addresses: [email protected] (L. Lü), [email protected] (M. Medo).

http://doc.rero.ch

Published in Journal of Informetrics, which should be cited to refer to this work.


In particular, scholars have emphasized the need for understanding the theoretical foundations of impact metrics (Waltman, 2016), and evaluating analytically and empirically the merits of the metrics (Leydesdorff et al., 2018).

Despite the use of citation-based metrics for research evaluation purposes and the increasingly recognized need to better grasp their merits and pitfalls, we do not know yet which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. This gap is potentially dangerous: If a paper-level metric is assumed to be a good proxy for significance and it is used for research evaluation purposes, yet it undervalues papers whose significance is undeniable, its normative use (Leydesdorff et al., 2018) might lead to decisions that penalize truly significant research.

We aim to fill this gap by providing a comprehensive empirical comparison of a broad range of ranking metrics. We assess the metrics' ability to single out scientific papers and patents that have been recognized by field experts as groundbreaking or seminal. The core idea behind this evaluation is that metrics that aim to gauge the significance of a paper/patent should be able to detect papers/patents whose outstanding long-term significance for the involved fields is undeniable. Our goal is to answer whether some metrics perform well across different citation datasets. If that is not the case, which characteristics of the input data decide which metric is the most suitable?

To this end, we analyze three citation networks: the scholarly citation dataset that includes papers published by the American Physical Society (APS), the citation data from the High-Energy Physics Literature Database INSPIRE (HEP), and the U.S. Patent Office patent citation data (PAT). We use expert-selected sets of seminal nodes and assess ranking metrics by how well they rank the seminal nodes. In particular, we use milestone papers selected by APS journal editors for the APS data, the community-curated "Chronology of Milestone Events in Particle Physics" for the HEP data, and the list of significant patents by Strumsky and Lobo (2015) for the PAT data (see the Data section for details). While by no means exhaustive, these lists of seminal publications consist of papers and patents of exceptional importance (many papers have, for example, led to a Nobel Prize for one or more of their authors). Our evaluation includes nine network-based ranking metrics from the scientometrics and network science literature together with their time-normalized (Mariani, Medo, & Zhang, 2016) variants. To provide a comprehensive comparison of metrics, we have chosen network-based metrics that have been used in bibliometrics (Waltman, 2016) or that have performed well in other networks (such as social and technological networks). Additionally, we provide results for the percentile-based citation count, which is commonly used in bibliometrics (Bornmann, Leydesdorff, & Mutz, 2013; Leydesdorff, Bornmann, Mutz, & Opthof, 2011).

Expert-selected nodes have been used before to evaluate rankings of authors (Dunaiski, Geldenhuys, & Visser, 2019a; Radicchi et al., 2009), rankings of movies (Ren, Mariani, Zhang, & Medo, 2018; Wasserman, Zeng, & Amaral, 2015), rankings of scientific papers (Dunaiski, Geldenhuys, & Visser, 2019b; Mariani et al., 2016), and rankings of court cases (Fowler & Jeon, 2008), for example (see Dunaiski, Geldenhuys, & Visser, 2018, for a recent in-depth discussion of this evaluation approach). We make here an important methodological distinction between two similar, yet fundamentally different ranking tasks:

Task 1. The usual task for a ranking metric is to rank the given seminal nodes as high as possible. This is motivated by the assumption that if seminal nodes are known to be of high impact, a good metric should place them in the ranking of all nodes as high as possible. To evaluate the metrics' performance in this task, one typically uses traditional information-retrieval metrics such as precision, recall, and accuracy (Dunaiski, Visser, & Geldenhuys, 2016).

Task 2. The need for unbiased evaluation, which is common in scientometrics (Mingers & Leydesdorff, 2015; Mutz & Daniel, 2012; Vaccario, Medo, Wider, & Mariani, 2017), motivates the second task: to rank the expert-selected nodes as high as possible whilst requiring that the ranking metric is unbiased. Since most structural ranking metrics are biased (see Appendix A for a demonstration), their evaluation must include a penalty for the performance gained thanks to the methods' bias. Citation data commonly feature various biases such as the field bias (Vaccario et al., 2017), for example. We focus here on the age bias; we say that age bias is present in the data when there is a dependence between the mean citation count and the publication age. Besides being particularly strong, the age bias is also explicit and easy to measure as it is determined by the paper's publication date. It thus provides a good test bed for dealing with biases in the ranking problem.

As part of our goal to evaluate the ranking performance of the chosen ranking metrics, we aim to elucidate the difference between the two ranking tasks described above. In particular, we show that Task 1 can favor biased metrics to such an extent that a custom-constructed ranking method based only on the bias itself (in our case, ranking by age) can in some cases outperform all standard ranking metrics. The fact that a metric performs well in the common Task 1 is thus no direct indication of its ability to assess the value or impact of the network's nodes. We demonstrate that the normalized identification rate introduced in Mariani et al. (2016) appropriately addresses Task 2 by imposing a performance penalty proportional to the bias magnitude of the evaluated metric. Unlike Task 1, the results in Task 2 reveal consistent patterns of metrics' performance across the three studied datasets. As further discussed in Section 5.3, the proposed evaluation can also be interpreted as a ranking problem where a given set of seminal nodes is interpreted as a potentially biased sample from a larger group of high-quality nodes. This further increases the relevance of Task 2 as compared to the common and straightforward approach presented in Task 1.

The paper is organized as follows. In Section 2, we provide a literature review. In Section 3, we describe and analyze the datasets and the corresponding sets of seminal nodes. In Section 4, we present the considered network metrics and describe various performance measures of a metric's ranking ability based on seminal nodes. In Section 5, we first address Task 1, evaluate the metrics by how well they rank the seminal nodes, and eventually discuss drawbacks of this evaluation approach. In Section 6, we address Task 2, where the metrics' bias is taken into account, and explain why the thus-obtained results are more relevant than the results obtained within Task 1. Finally, in Section 7, we review the limitations and open research directions, and discuss the management implications of our study.

2. Literature review

2.1. Citation-based ranking metrics

Citation impact indicators are a key tool in scientometrics and play a prominent role in the evaluation of scientific and technological publications (Waltman, 2016). The growing demand for evaluation information from researchers, funding bodies, and research institutions and the increasing availability of extensive data on scholarly activity have driven the proliferation of new indicators (Mattedi & Spiess, 2017).

Among citation-based impact indicators for research articles, citation count (referred to as indegree in the field of network science; Newman, 2010; Zeng et al., 2017) is the most basic and established one, as it has been used for ranking scholarly publications since the 1970s (Liao, Mariani, Medo, Zhang, & Zhou, 2017). The basic premise of citation count is simple: the most influential publications are the most cited. The metric's simplicity comes at a cost, as the natural differences between citations are neglected by citation count (see Bornmann & Daniel, 2008, for an extensive review of citing behavior). In particular, citation count assigns the same weight to a citation from a ground-breaking article published in a leading journal and a citation from an obscure article. The seminal PageRank algorithm for the World Wide Web (Brin & Page, 1998) assigns higher weight to references from web pages that are highly valued by the algorithm. Chen, Xie, Maslov, and Redner (2007) applied PageRank to a citation network to measure the importance of individual scientific publications, initiating the interest in recursive citation impact indicators (Waltman, 2016). Since then, various PageRank variants (Waltman & Yan, 2014) have been proposed, of which CiteRank (Walker, Xie, Yan, & Maslov, 2007) is the best known. Of note is the HITS (Hyperlink-Induced Topic Search) algorithm (Kleinberg, 1999), which assigns two scores, a hub score and an authority score, to each node. In the context of citation data, it is natural to consider vital reviews as hubs, which cite other influential publications that have high authority scores, and high-impact articles as authorities, which tend to be cited by, among others, review articles with high hub scores (Nickerson, Chen, Wang, & Hu, 2018). Another notable branch of research impact indicators is the h-index (Hirsch, 2005), which was introduced to evaluate the scientific output of researchers and later extended by Schubert (2008) to assess the impact of individual publications. A large number of variants of the h-index have been proposed in the literature (Alonso, Cabrerizo, Herrera-Viedma, & Herrera, 2009).

Besides the most used citation impact indicators, including citation count, PageRank, CiteRank, H-index and HITS authority score, we also consider several network-based metrics that perform well in other ranking scenarios: LeaderRank, Collective Influence, and Semi-local centrality. These metrics are introduced in Section 4.

2.2. Ranking bias in citation analysis

In the analysis of citation data, citation counts of different publications cannot be directly compared, as there are various sources of bias that can invalidate such a comparison. The most-studied biases are induced by the publication's research field, age, and document type (Waltman, 2016). Field bias manifests itself in the mean citation count of publications of a similar age varying greatly between fields because of their different citation practices (Lundberg, 2007). The average citation count differs between, for example, natural sciences and humanities by a factor of 10 (Bornmann & Marx, 2015). The age bias of citation count has two distinct components. First, the average citation count of papers of a fixed age gradually increases over time (Martin, Ball, Karrer, & Newman, 2013). Second, citation counts naturally grow with the age of publications as they accumulate more and more citations with time, which prevents citation counts of papers of different age from being directly comparable. Finally, citation counts of publications of different document types (such as article, letter, review, and commentary) should not be directly compared with each other because, for example, review papers tend to attract more citations than ordinary research articles and letters (Lundberg, 2007).

To overcome these biases, a number of normalized citation-based ranking metrics have been developed to allow for fairer comparisons, such as mean-based metrics and percentile-based metrics (see Bornmann & Marx, 2015; Waltman, 2016 for reviews). In the case of paper and patent citation networks, which are our focus in this manuscript, the most relevant bias is the age bias, which is strong in both paper and patent citation data.

Despite many citation-based ranking metrics having been proposed in the past, a comprehensive comparison of their ranking performance on various datasets is still lacking. Determining which metrics are best suited to rank papers or patents in citation networks is the first research gap that we aim to fill with this study. The common approach to assessing a metric's ranking performance is based on expert-selected nodes. Previous studies adopting this approach focus solely on the ranking positions of the expert-selected nodes and ignore the confounding effects that can be introduced by the choice of the seminal nodes. For example, if experts tend to identify old works as seminal, a ranking metric that shares this bias gains an advantage and its apparent superiority may be illusory. We fill this second research gap by exploring the interplay between the bias of the evaluated ranking metrics and the bias of the seminal nodes, and exploring an evaluation procedure that takes the bias of the seminal nodes into account.

Table 1

Basic characteristics of the networks corresponding to the three analyzed datasets: their time span, the number of nodes N, the number of edges E, and the number of seminal nodes S.

Dataset      Label   Time span    N           E            S
APS papers   APS     1893–2016    595,287     7,051,801    160
HEP papers   HEP     1764–2017    829,708     14,994,123   310
US patents   PAT     1926–2010    6,237,625   45,962,301   112

3. Data

To compare the ranking performance of network-based metrics, we use three citation datasets: the classical American Physical Society citation data, high-energy physics citation data, and the U.S. Patent Office citation data. Each dataset can be represented as a growing directed network where nodes gradually appear with time. Time resolution is one day for all datasets. Nodes represent papers or patents and directed links represent citations. For each dataset, there is a set of corresponding expert-selected nodes of high impact that we refer to as seminal nodes. Table 1 summarizes basic characteristics of the analyzed networks and the corresponding sets of seminal nodes.

3.1. American Physical Society citation data (APS)

The American Physical Society (APS) dataset in our possession covers years 1893–2016 (the dataset is available on demand from https://journals.aps.org/datasets). After removing non-research papers (announcements, book reviews, etc.), the dataset contains 595,287 nodes (papers) published by the APS journals and 7,051,801 directed links (citations) between them. For this dataset, we use multiple selections of milestone papers chosen by editors of APS journals: 87 Physical Review Letters milestones (footnote 1), 23 Physical Review E milestones (footnote 2), and 78 select papers announced on the 125th anniversary of the Physical Review journals (footnote 3). In total, there are 161 unique seminal papers, of which 160 are present in the citation data (the one missing paper is from 2017, hence outside the coverage period of our dataset).

3.2. High-Energy Physics citation network (HEP)

INSPIRE is a project run by leading high-energy physics institutions around the world (CERN, DESY, Fermilab, IHEP, and SLAC). Among other things, it curates a database of high-energy physics papers (and papers relevant to the high-energy physics community), which is also made available on demand for research purposes (footnote 4). After processing the downloaded XML data dump covering years 1764–2017, we obtained a citation network containing 829,708 nodes and 14,994,123 directed links. The list of milestone papers has been downloaded from the website "Chronology of Milestone Events in Particle Physics" (footnote 5), which lists milestone events and the corresponding papers. The website is a joint effort of the Institute for High Energy Physics (Russia) and the Particle Data Group (USA), with several leading high-energy researchers contributing to the final version of the chronology. The thus-obtained milestone papers have been matched with the HEP data, leading to the final set of 310 seminal papers.

3.3. US patent citation network (PAT)

The US patent dataset was collected by Kogan, Papanikolaou, Seru, and Stoffman (2017) and covers years 1926–2010. In total, there are 6,237,625 nodes (patents) and 45,962,301 links (citations) among them. In Strumsky and Lobo (2015), the authors listed 175 patents which "affected society, individuals and the economy in a historically significant manner". In agreement with Mariani, Medo, and Lafond (2018), we remove the patents issued outside the citation dataset's time span as well as the design patents that are absent in the citation data. As a result, 112 seminal patents are used for further analysis.

Table 2 compares the characteristics of all nodes and the seminal nodes in each dataset. As expected, the seminal nodes have indegree (commonly referred to as the number of citations) significantly higher than the overall median indegree in all three datasets. This confirms that the expert assessment of the seminal nodes is not in contradiction with their impact as reflected by the citation network.

Table 2 further lists the mean times needed by various groups of nodes to collect their first three and five citations, $\tau_3$ and $\tau_5$, respectively. These times indicate the time scales of citation dynamics. We see, for example, that the nodes collect their citations in the PAT data significantly more slowly than in the APS and the HEP data. In the APS data, both $\tau_3$ and $\tau_5$ are smaller for the seminal nodes than they are for all nodes, which is understandable given the much higher indegree of the seminal nodes. Curiously, the relation is reversed for the HEP data, where the seminal nodes need more time to get their first 3 or 5 citations than all papers. While paradoxical at first sight, this is a direct consequence of the HEP seminal nodes being over-represented among the old nodes (see Fig. 4). At the time when these seminal nodes were collecting their first citations, the citation dynamics was substantially slower than nowadays, and this manifests itself in their high $\tau_3$ and $\tau_5$. We shall see in Section 5.3 that the strong age bias of the HEP seminal nodes has further important implications.

1 Retrieved from https://journals.aps.org/prl/50years/milestones on June 6, 2017.

2 Retrieved from https://journals.aps.org/pre/collections/pre-milestones on June 6, 2017.

3 Retrieved from https://journals.aps.org/125years on January 12, 2018.

4 We downloaded the INSPIRE data on October 30, 2017 from https://inspirehep.net/dumps/inspire-dump.html.

5 Retrieved from http://web.ihep.su/dbserv/compas/ on April 6, 2018.

Table 2

A comparison between the seminal nodes and all nodes. Here $\tau_3$ and $\tau_5$ are the mean times needed for the nodes to get their first 3 and 5 citations, respectively (ignoring the nodes that have fewer than 3 and 5 citations, respectively).

Dataset      Set of nodes   Median indegree   $\tau_3$      $\tau_5$
APS papers   All            5                 3.6 years     4.8 years
             Seminal        239               1.3 years     2.2 years
HEP papers   All            4                 3.1 years     3.9 years
US patents   All            3                 11.5 years    13.3 years

Table 3

The summary table of all metrics; our computation of each metric is based on the provided implementation reference. We also included age-normalized variants of the displayed metrics in the analysis (except for YCCP, which already involves age normalization). Except for the rescaled citation count (Newman, 2009) and the rescaled PageRank (Mariani et al., 2016), the rescaled variants of the remaining metrics have not been considered before.

Metric                             Abbreviation   Implementation reference
Citation count                     C              Newman (2010)
PageRank                           P              Chen et al. (2007)
CiteRank                           T              Walker et al. (2007)
LeaderRank                         L              Lü, Zhang, Yeung, and Zhou (2011)
H-index                            H              Hirsch (2005)
Directed collective influence      CI             Bovet and Makse (2019)
Semi-local centrality              SLC            Chen, Lü, Shang, Zhang, and Zhou (2012)
HITS authority                     HITS           Kleinberg (1999)
Yearly citation count percentile   YCCP           Leydesdorff et al. (2011)

4. Node ranking metrics

We use nine distinct network centrality metrics that are described below (see Table 3 for a summary), and their variants where the age bias of metrics has been removed by the rescaling procedure introduced in Mariani et al. (2016) (see Vaccario et al., 2017 for simultaneous removal of age and field bias by the rescaling procedure). In a dataset with time-stamped nodes, rescaling can be applied to any node ranking metric; see Section 4.10 for details. The input citation data are represented by the $N \times N$ adjacency matrix $\mathsf{A}$ whose element $A_{ij} = 1$ if node $i$ cites node $j$ and $A_{ij} = 0$ otherwise.

4.1. Citation count (C)

Citation count refers to the number of citations received by a given paper or patent. It is equivalent to node indegree in a directed network (Newman, 2010). For node $i$, citation count is defined as $C_i = \sum_{j=1}^{N} A_{ji}$. Based on the assumption that a node is important if it is cited by many other nodes, citation count is the simplest and the most widely used indicator of paper or patent impact in citation data. The metric's simplicity directly translates into its low computational complexity.
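The definition of citation count can be illustrated with a minimal sketch. The edge-list representation, the function name `citation_counts`, and the toy network are assumptions made for illustration only; they are not taken from the analyzed datasets.

```python
def citation_counts(n_nodes, citations):
    """Return C_i = sum_j A_ji, the indegree of every node.

    `citations` is a list of (citing, cited) pairs, i.e. A[citing][cited] = 1.
    """
    counts = [0] * n_nodes
    for _citing, cited in citations:
        counts[cited] += 1
    return counts


# Hypothetical toy network: papers 1, 2 and 3 cite paper 0; paper 3 also cites paper 1.
toy_citations = [(1, 0), (2, 0), (3, 0), (3, 1)]
toy_counts = citation_counts(4, toy_citations)
```

A single pass over the citation list suffices, which reflects the low computational complexity mentioned above.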

4.2. PageRank (P)

PageRank (Brin & Page, 1998), a node centrality metric originally devised to rank pages in the World Wide Web and later applied to citation data to assess the significance of publications (Chen et al., 2007), defines the importance of different nodes in a self-consistent manner: a node is important if it is cited by other important nodes. The PageRank score $P_i$ of node $i$ is defined by the set of equations (Berkhin, 2005)

$$P_i = \alpha \sum_{j:\,k_j^{\mathrm{out}}>0} \frac{A_{ji}}{k_j^{\mathrm{out}}}\, P_j + \alpha \sum_{j:\,k_j^{\mathrm{out}}=0} \frac{P_j}{N} + \frac{1-\alpha}{N} \qquad (1)$$

where $i = 1, \dots, N$, $k_j^{\mathrm{out}} = \sum_l A_{jl}$ is the outdegree of node $j$, $\alpha$ is the damping factor, and $(1-\alpha)/N$ is usually referred to as the teleportation term whose role is to ensure that Eq. (1) has a unique solution. While $\alpha = 0.85$ is used in the ranking of web pages (Berkhin, 2005), $\alpha = 0.5$ is typically used in the analysis of citation data (Chen et al., 2007). Eq. (1) is usually solved by iterations: starting from the uniform initial score $P_i^{(0)} = 1/N$, every node's score is updated iteratively as (Berkhin, 2005)

$$P_i^{(n+1)} = \alpha \sum_{j:\,k_j^{\mathrm{out}}>0} \frac{A_{ji}}{k_j^{\mathrm{out}}}\, P_j^{(n)} + \alpha \sum_{j:\,k_j^{\mathrm{out}}=0} \frac{P_j^{(n)}}{N} + \frac{1-\alpha}{N} \qquad (2)$$

where $n$ is the iteration number. We stop the iterations when the average score change is small enough, that is, $\sum_{i=1}^{N} |P_i^{(n)} - P_i^{(n-1)}| / N < \varepsilon$ where $\varepsilon = 10^{-9}$. The same stopping condition is also used in other metrics that involve iterations (CiteRank and LeaderRank).
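The iteration of Eq. (2) together with the stopping condition can be sketched as follows. This is an illustrative sketch only: the edge-list input, the function name `pagerank`, the `max_iter` safeguard, and the toy network are assumptions, and the citation list is assumed to contain no duplicate pairs.

```python
def pagerank(n_nodes, citations, alpha=0.5, eps=1e-9, max_iter=1000):
    """Iterate Eq. (2); `citations` holds (citing, cited) pairs, A[citing][cited] = 1."""
    out_deg = [0] * n_nodes
    for citing, _cited in citations:
        out_deg[citing] += 1

    scores = [1.0 / n_nodes] * n_nodes  # uniform start, P_i^(0) = 1/N
    for _ in range(max_iter):  # max_iter is a hypothetical safeguard
        # Score held by nodes without references is spread evenly over all nodes.
        dangling = sum(s for s, k in zip(scores, out_deg) if k == 0)
        base = (alpha * dangling + (1.0 - alpha)) / n_nodes
        new_scores = [base] * n_nodes
        for citing, cited in citations:
            new_scores[cited] += alpha * scores[citing] / out_deg[citing]
        # Average absolute score change, compared against eps.
        change = sum(abs(a - b) for a, b in zip(new_scores, scores)) / n_nodes
        scores = new_scores
        if change < eps:
            break
    return scores


# Hypothetical toy network: papers 1, 2 and 3 cite paper 0; paper 3 also cites paper 1.
toy_citations = [(1, 0), (2, 0), (3, 0), (3, 1)]
toy_scores = pagerank(4, toy_citations)
```

In this sketch the scores sum to one after every iteration, because the score of nodes without outgoing links is redistributed rather than lost.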

4.3. CiteRank (T)

CiteRank (Walker et al., 2007) has been introduced to offset PageRank's strong bias towards old nodes [note that in some cases, PageRank can also be biased towards recent nodes (Mariani, Medo, & Zhang, 2015)]. Using the representation of PageRank as a random walk on the citation network, CiteRank modifies the algorithm by initially distributing random walkers preferentially on recent nodes, with the old nodes being exponentially suppressed at a time scale $\tau$. Similarly to Eq. (2), the CiteRank score $T_i$ for node $i$ can be defined in an iterative way as

$$T_i^{(n+1)} = \alpha \sum_{j:\,k_j^{\mathrm{out}}>0} \frac{A_{ji}}{k_j^{\mathrm{out}}}\, T_j^{(n)} + \alpha \sum_{j:\,k_j^{\mathrm{out}}=0} \frac{T_j^{(n)}}{N} + (1-\alpha)\, \frac{\exp[-(t-t_i)/\tau]}{\sum_{j=1}^{N} \exp[-(t-t_j)/\tau]} \qquad (3)$$

where $t_i$ is the publication date of node $i$ and $t$ is the date when the scores are computed; other terms and parameters have the same meaning as for PageRank. To estimate suitable values for parameters $\alpha$ and $\tau$, we followed the procedure described in Walker et al. (2007), where the authors maximize the correlation between the CiteRank scores and the nodes' recent indegree increase. The resulting parameter values are $\alpha = 0.50$, $\tau = 2.6$ years for APS; $\alpha = 0.50$, $\tau = 2.4$ years for HEP; and $\alpha = 0.44$, $\tau = 7.6$ years for PAT. Notably, the parameter values for APS are the same as reported in Walker et al. (2007) despite our APS dataset including 13 additional years and, consequently, 60% more papers.
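Eq. (3) differs from the PageRank iteration only in the teleportation term, which the following sketch makes explicit. The function name `citerank`, the uniform initial scores, the `max_iter` safeguard, and the toy publication dates are assumptions made for illustration; times are expressed in years.

```python
import math


def citerank(pub_times, citations, now, alpha=0.5, tau=2.6, eps=1e-9, max_iter=1000):
    """Iterate Eq. (3); `citations` holds (citing, cited) pairs, times are in years."""
    n_nodes = len(pub_times)
    out_deg = [0] * n_nodes
    for citing, _cited in citations:
        out_deg[citing] += 1

    # Age-dependent teleportation: recent nodes receive exponentially more weight.
    weights = [math.exp(-(now - t) / tau) for t in pub_times]
    total = sum(weights)
    start = [w / total for w in weights]

    scores = [1.0 / n_nodes] * n_nodes  # assumed uniform initial scores
    for _ in range(max_iter):  # max_iter is a hypothetical safeguard
        dangling = sum(s for s, k in zip(scores, out_deg) if k == 0)
        new_scores = [alpha * dangling / n_nodes + (1.0 - alpha) * s for s in start]
        for citing, cited in citations:
            new_scores[cited] += alpha * scores[citing] / out_deg[citing]
        change = sum(abs(a - b) for a, b in zip(new_scores, scores)) / n_nodes
        scores = new_scores
        if change < eps:
            break
    return scores


# Hypothetical toy data: four papers published in 2000, 2005, 2010 and 2015.
toy_times = [2000.0, 2005.0, 2010.0, 2015.0]
toy_citations = [(1, 0), (2, 0), (3, 0), (3, 1)]
toy_scores = citerank(toy_times, toy_citations, now=2016.0)
```

In the toy network, papers 2 and 3 are both uncited, yet the more recent paper 3 obtains the higher score, which is the intended offset of the bias towards old nodes.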

4.4. LeaderRank (L)

The need for a teleportation term in PageRank can be eliminated by connecting each node to an artificial "ground" node with bidirectional links. The resulting parameter-free LeaderRank metric has been proposed in Lü et al. (2011) to quantify node influence. The iterative equation for the LeaderRank score $L$ is

$$L_i^{(n+1)} = \sum_{j=1}^{N+1} \frac{A_{ji}}{k_j^{\mathrm{out}}}\, L_j^{(n)} \qquad (4)$$

where both $A_{ji}$ and $k_j^{\mathrm{out}}$ include the ground node and the links between the ground node and all other nodes in the network. After obtaining the equilibrium scores $L_i^{(n_c)}$, the ground node is removed from the system and its score is evenly distributed among all real nodes. The final score of node $i$ is thus defined as

$$L_i = L_i^{(n_c)} + \frac{L_g^{(n_c)}}{N} \qquad (5)$$

The redistribution of the ground node's score does not affect the ranking of nodes by LeaderRank, though.
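A minimal sketch of Eqs. (4) and (5) follows. The initial scores (one unit for every real node and zero for the ground node) are an assumption that is not stated in the text above; the function name `leaderrank`, the `max_iter` safeguard, and the toy network are likewise illustrative.

```python
def leaderrank(n_nodes, citations, eps=1e-9, max_iter=10000):
    """Iterate Eq. (4) with an added ground node, then apply Eq. (5)."""
    ground = n_nodes  # index of the artificial ground node
    edges = list(citations)
    for i in range(n_nodes):
        edges.append((i, ground))  # every real node links to the ground node
        edges.append((ground, i))  # and the ground node links back
    total = n_nodes + 1
    out_deg = [0] * total
    for src, _dst in edges:
        out_deg[src] += 1

    # Assumed initialization: unit score on real nodes, zero on the ground node.
    scores = [1.0] * n_nodes + [0.0]
    for _ in range(max_iter):  # max_iter is a hypothetical safeguard
        new_scores = [0.0] * total
        for src, dst in edges:
            new_scores[dst] += scores[src] / out_deg[src]
        change = sum(abs(a - b) for a, b in zip(new_scores, scores)) / total
        scores = new_scores
        if change < eps:
            break

    # Eq. (5): remove the ground node and share its score evenly.
    share = scores[ground] / n_nodes
    return [s + share for s in scores[:n_nodes]]


# Hypothetical toy network: papers 1, 2 and 3 cite paper 0; paper 3 also cites paper 1.
toy_citations = [(1, 0), (2, 0), (3, 0), (3, 1)]
toy_scores = leaderrank(4, toy_citations)
```

Because every node has at least one outgoing link (to the ground node), no separate treatment of nodes without references is needed, and the total score stays equal to the number of real nodes under the assumed initialization.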

4.5. H-index (H)

The H-index (Hirsch, 2005) was originally devised to characterize the academic impact of researchers based on their publications and citations (Hirsch, 2005, 2007). Just as it was later applied to evaluate research journals (Braun, Glänzel, & Schubert, 2006), it can also be adapted to evaluate research papers: the h-index of paper $i$ is defined as the largest number $h_i$ such that paper $i$ is cited by at least $h_i$ papers that each have at least $h_i$ citations (Lü, Zhou, Zhang, & Stanley, 2016; Schubert, 2008).
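The paper-level h-index defined above can be computed by sorting the citation counts of the citing papers. The sketch below is illustrative; the function name `paper_h_index` and the toy network are assumptions.

```python
def paper_h_index(n_nodes, citations):
    """h_i = largest h such that node i is cited by at least h nodes with >= h citations each."""
    counts = [0] * n_nodes
    citers = [[] for _ in range(n_nodes)]
    for citing, cited in citations:
        counts[cited] += 1
        citers[cited].append(citing)

    h_values = []
    for i in range(n_nodes):
        # Citation counts of the papers citing i, from the most cited downwards.
        citer_counts = sorted((counts[j] for j in citers[i]), reverse=True)
        h = 0
        for rank, count in enumerate(citer_counts, start=1):
            if count >= rank:
                h = rank
            else:
                break
        h_values.append(h)
    return h_values


# Hypothetical toy network of five papers given as (citing, cited) pairs.
toy_citations = [(1, 0), (2, 0), (3, 0), (2, 1), (3, 1), (3, 2), (4, 2), (4, 3)]
toy_h = paper_h_index(5, toy_citations)
```

Paper 0 in the toy network is cited by three papers with 2, 2 and 1 citations, so two of its citing papers have at least two citations each and its h-index is 2.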

4.6. Collective Influence (CI)

Collective Influence was introduced in Morone and Makse (2015) to identify those nodes that, when removed, cause the biggest damage to a graph's giant component; the algorithm is based on the classical problem of percolation in complex networks. The CI centrality of node $i$ at level $l$ is defined as

$$\mathrm{CI}_i^{\,l} = (k_i - 1) \sum_{j:\,d_{ij}=l} (k_j - 1) \qquad (6)$$

where $k_i$ is the degree of node $i$, $d_{ij}$ is the length of the shortest path between nodes $i$ and $j$, and $l$ is the metric's parameter. In line with Bovet and Makse (2019), we consider only node indegree in the computation of CI, as node indegree is indicative of node impact. Using node outdegree or combining in- and out-degree leads to inferior results in our evaluation. Distance $d_{ij}$ is computed so that it respects link directions. Our tests show that $l = 1$ and $l = 2$ produce the best results; the metric's performance deteriorates as $l$ increases further. We use $l = 2$ for all CI results presented here.
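Eq. (6) can be sketched with a breadth-first search. The text states only that distances respect link directions; the sketch below assumes, as one possible reading, that the nodes at distance $l$ from node $i$ are reached by following citations backwards (citers, citers of citers, and so on). That choice, the function name `collective_influence`, and the toy network are assumptions rather than a description of the implementation used in the study.

```python
from collections import deque


def collective_influence(n_nodes, citations, level=2):
    """Eq. (6) with indegree as k; distances follow citations backwards (assumption)."""
    indeg = [0] * n_nodes
    citers = [[] for _ in range(n_nodes)]
    for citing, cited in citations:
        indeg[cited] += 1
        citers[cited].append(citing)

    ci = []
    for i in range(n_nodes):
        dist = {i: 0}
        queue = deque([i])
        frontier_sum = 0
        while queue:
            u = queue.popleft()
            if dist[u] == level:
                # Node at shortest distance exactly `level`: add (k_j - 1).
                frontier_sum += indeg[u] - 1
                continue
            for v in citers[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        ci.append((indeg[i] - 1) * frontier_sum)
    return ci


# Hypothetical toy network of seven papers given as (citing, cited) pairs.
toy_citations = [(1, 0), (6, 0), (2, 1), (3, 2), (4, 2), (5, 2)]
toy_ci = collective_influence(7, toy_citations)
```

In the toy network, the only node two steps upstream of paper 0 is paper 2 with indegree 3, so the score of paper 0 is (2 − 1) × (3 − 1) = 2.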

4.7. Semi-local centrality (SLC)

Semi-local centrality was proposed (Chen et al., 2012) as an extension of the purely local node degree (which is the simplest node centrality metric). It is semi-local in the sense of considering the node neighborhood up to the fourth order. The semi-local centrality score of node $i$ is defined as

$$\mathrm{SLC}_i = \sum_{j \in \Gamma_i} Q_j, \qquad Q_j = \sum_{k \in \Gamma_j} N_k \qquad (7)$$

where $\Gamma_i$ is the set of the nearest neighbors of node $i$ and $N_k$ is the number of the nearest and the next-nearest neighbors of node $k$. In this paper, we consider only the in-neighbors ($i$'s nearest in-neighbors are the nodes that cite node $i$) to leverage the impact of citations. If SLC is computed using out-neighbors or using both in- and out-neighbors, its performance (as measured by the metrics introduced in Sections 5.1 and 6.1) deteriorates.
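The three nested sums of Eq. (7) can be sketched directly, restricted to in-neighbors as described above. The function name `semi_local_centrality` and the toy network are assumptions; $N_k$ is taken here as the number of distinct nodes, other than $k$ itself, that cite $k$ or cite one of its citers.

```python
def semi_local_centrality(n_nodes, citations):
    """Eq. (7) restricted to in-neighbors (the nodes that cite a given node)."""
    citers = [set() for _ in range(n_nodes)]
    for citing, cited in citations:
        citers[cited].add(citing)

    # N_k: number of distinct nearest and next-nearest in-neighbors of k.
    n_two_step = []
    for k in range(n_nodes):
        reached = set(citers[k])
        for j in citers[k]:
            reached |= citers[j]
        reached.discard(k)
        n_two_step.append(len(reached))

    # Q_j sums N_k over the citers of j; SLC_i sums Q_j over the citers of i.
    q = [sum(n_two_step[k] for k in citers[j]) for j in range(n_nodes)]
    return [sum(q[j] for j in citers[i]) for i in range(n_nodes)]


# Hypothetical toy network: a citation chain 3 -> 2 -> 1 -> 0 with papers 4 and 5 citing paper 3.
toy_citations = [(1, 0), (2, 1), (3, 2), (4, 3), (5, 3)]
toy_slc = semi_local_centrality(6, toy_citations)
```

Paper 0 has a single citation in the toy network, yet it obtains the highest score because its score aggregates information up to four steps upstream.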

4.8. Hyperlink-Induced Topic Search (HITS)

HITS (Kleinberg, 1999) is a seminal ranking algorithm that considers two roles for each node in the network, authority and hub. A good authority is pointed to by many hubs, and a good hub points to many authorities. The authority score of a node is equal to the sum of the hub scores of all nodes that point to this node, and the hub score of a node is equal to the sum of the authority scores of all nodes that this node points to. Mathematically, the authority score $a_i$ and the hub score $h_i$ of node $i$ fulfill

$$a_i^{(n+1)} = \sum_{j=1}^{N} A_{ji}\, h_j^{(n)}, \qquad h_i^{(n+1)} = \sum_{j=1}^{N} A_{ij}\, a_j^{(n)} \qquad (8)$$

Both scores are normalized after each iteration so that the sum over all nodes is one for each score. The iterations stop when the average score change is small enough, that is, $\sum_{i=1}^{N} \big(|a_i^{(n)} - a_i^{(n-1)}| + |h_i^{(n)} - h_i^{(n-1)}|\big)/N < \varepsilon$ where $\varepsilon = 10^{-9}$. Of the two scores, authority is related to node impact in a citation network, as it is derived from incoming links, as opposed to hub, which is derived from outgoing links (references) to nodes of high authority. We thus consider the equilibrium authority values as the nodes' HITS scores here.
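The coupled updates of Eq. (8), the normalization, and the stopping condition can be sketched as follows. The uniform initial scores, the function name `hits_authority`, the `max_iter` safeguard, and the toy network are assumptions; the sketch also assumes that the network contains at least one citation, otherwise the normalization would divide by zero.

```python
def hits_authority(n_nodes, citations, eps=1e-9, max_iter=1000):
    """Iterate Eq. (8) and return the authority scores, normalized to sum to one."""
    auth = [1.0 / n_nodes] * n_nodes  # assumed uniform initial scores
    hub = [1.0 / n_nodes] * n_nodes
    for _ in range(max_iter):  # max_iter is a hypothetical safeguard
        new_auth = [0.0] * n_nodes
        new_hub = [0.0] * n_nodes
        for citing, cited in citations:
            new_auth[cited] += hub[citing]  # a_i sums hub scores of the citers of i
            new_hub[citing] += auth[cited]  # h_i sums authority scores of the references of i
        auth_total = sum(new_auth)
        hub_total = sum(new_hub)
        new_auth = [a / auth_total for a in new_auth]
        new_hub = [h / hub_total for h in new_hub]
        change = (
            sum(abs(x - y) for x, y in zip(new_auth, auth))
            + sum(abs(x - y) for x, y in zip(new_hub, hub))
        ) / n_nodes
        auth, hub = new_auth, new_hub
        if change < eps:
            break
    return auth


# Hypothetical toy network: papers 1, 2 and 3 cite paper 0; paper 3 also cites paper 1.
toy_citations = [(1, 0), (2, 0), (3, 0), (3, 1)]
toy_authority = hits_authority(4, toy_citations)
```

Uncited nodes receive zero authority in this sketch, which illustrates why HITS authority cannot discriminate among the many uncited or rarely cited nodes of a citation network.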

4.9. Yearly citation count percentile (YCCP)

The use of percentiles in the ranking of papers has the advantage of avoiding working directly with citation counts, which are typically broadly distributed, making it difficult to aggregate them (Leydesdorff et al., 2011) by, for example, averaging (such as computing the average citation count of the papers authored by an individual researcher). To reduce the age bias in the resulting ranking, we compute the citation count percentile of a node with respect to the citation counts of all nodes that have appeared in the same year as the target node. The nodes are finally ranked by their respective percentile ranks. Note that by comparing with nodes that appeared in the same year, this ranking metric already addresses the age bias; we thus do not consider a rescaled version of this metric. If the citation count percentile is computed with respect to all nodes regardless of their appearance time, the ranking is the same as the ranking by citation count, C.
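The within-year percentile can be sketched as below. Several percentile conventions exist; this sketch assumes that a node's percentile is the share of same-year nodes with strictly fewer citations, which is an illustrative choice and not necessarily the convention used in the study. The function name and the toy data are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def yearly_citation_percentiles(pub_years, counts):
    """Percentile of each node's citation count among nodes from the same year."""
    by_year = defaultdict(list)
    for year, count in zip(pub_years, counts):
        by_year[year].append(count)
    for cohort in by_year.values():
        cohort.sort()

    percentiles = []
    for year, count in zip(pub_years, counts):
        cohort = by_year[year]
        # Assumed convention: share of same-year nodes with strictly fewer citations.
        percentiles.append(100.0 * bisect_left(cohort, count) / len(cohort))
    return percentiles


# Hypothetical toy data: four papers from 2000 and two papers from 2010.
toy_years = [2000, 2000, 2000, 2000, 2010, 2010]
toy_counts = [40, 10, 10, 0, 3, 1]
toy_yccp = yearly_citation_percentiles(toy_years, toy_counts)
```

In the toy data, the 2010 paper with 3 citations obtains a higher percentile than the 2000 papers with 10 citations, which shows how the within-year comparison offsets the advantage of older papers.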

4.10. Rescaled metric variants

To suppress the age bias of ranking metrics, we use the rescaling procedure proposed by Mariani et al. (2016); see Dunaiski et al. (2019b) for other approaches to score normalization. The rescaled score $R_i(m)$ for metric $m$ and node $i$ is computed as

$$R_i(m) = \frac{m_i - \mu_i(m)}{\sigma_i(m)} \qquad (9)$$

where $m_i$ is the original score of node $i$ as produced by metric $m$, and $\mu_i(m)$ and $\sigma_i(m)$ are the metric mean and standard deviation, respectively, computed over nodes in a window centered at node $i$. Assuming that the nodes are sorted by their age/appearance time, the window around node $i$ includes nodes $j \in [i - W/2, i + W/2]$ where the parameter $W$ represents the window size. For the APS, HEP, and PAT data, we use $W = 1000$, $W = 2000$, and $W = 15{,}000$, respectively, which is roughly proportional to the number of nodes in each dataset.
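Eq. (9) amounts to a z-score computed within a moving window of nodes of similar age. The sketch below illustrates this; truncating the window at the two ends of the sorted list and returning zero when the standard deviation vanishes are assumptions made for illustration, as are the function name and the toy scores.

```python
import statistics


def rescale_scores(scores_by_age, window):
    """Eq. (9): z-score of each node within a window of nodes of similar age.

    `scores_by_age` must be sorted by node appearance time.
    """
    n_nodes = len(scores_by_age)
    half = window // 2
    rescaled = []
    for i, score in enumerate(scores_by_age):
        # Window j in [i - W/2, i + W/2], truncated at the boundaries (assumption).
        low = max(0, i - half)
        high = min(n_nodes, i + half + 1)
        peers = scores_by_age[low:high]
        mu = statistics.fmean(peers)
        sigma = statistics.pstdev(peers)
        # A zero standard deviation is mapped to a zero rescaled score (assumption).
        rescaled.append((score - mu) / sigma if sigma > 0 else 0.0)
    return rescaled


# Hypothetical toy scores, oldest node first: old nodes have much larger raw scores.
toy_scores = [100.0, 90.0, 80.0, 10.0, 9.0, 8.0]
toy_rescaled = rescale_scores(toy_scores, window=2)
```

This direct version recomputes the mean and standard deviation for every node; for networks with millions of nodes, running sums over the window would be a natural optimization.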

As shown in Mariani et al. (2016, 2018) and Ren (2019), rescaling significantly reduces the magnitude of the age bias (and, in the case of Vaccario et al. (2017), of the age and field bias) of citation count and PageRank. We use this technique here to rescale all ranking metrics introduced above, and in turn compare their performance with the original non-rescaled metrics. Rescaled metrics are marked by adding R at the beginning of their original labels (e.g., RP for rescaled PageRank). The effectiveness of the rescaling procedure in removing the age bias of the respective metrics in the studied datasets is investigated in Appendix A, where we find that rescaling indeed significantly reduces the age bias for almost all dataset-metric pairs (see Table A.5 for a summary).

5. Metric performance in ranking the seminal nodes (Task 1)

We first evaluate the ranking performance of metrics taking into account solely the ranking positions of the seminal nodes, in line with common information-retrieval practices (Dunaiski et al., 2018; Lü & Zhou, 2011; Manning, Raghavan, & Schütze, 2010; Radicchi et al., 2009).

5.1. Identification rate

Our basic evaluation procedure is based on a complete given network which is used as an input. We rank the network nodes by their score according to a given metric $m$ and compute the fraction of the seminal nodes that are among the top $zN$ nodes, $f_z(m)$. This quantity is commonly referred to as recall in the information filtering literature (Lü et al., 2012). To comply with previous research on rescaling (Mariani et al., 2016), and also to avoid confusion with a related age-dependent version of this metric (see the next paragraph), we use here the previously coined term identification rate (IR) for $f_z(m)$. Note that $z \in (0, 1)$ is an evaluation parameter that, to reflect our goal of evaluating the ranking metrics by whether they rank the seminal nodes "highly", should be a small number. We use $z = 1\%$ unless stated otherwise and later verify our main results using $z = 0.5\%$ and $z = 2\%$, respectively.

Besides assessing the identification rate on a complete network, we also study the metrics' performance as a function of the age of the seminal nodes (Mariani et al., 2016). To this end, we construct network snapshots at the end of each calendar year (ignoring all nodes and links that appear afterward), and rank the nodes in each network snapshot. This allows us to evaluate, individually for each seminal node, whether it was in the top $z$ fraction of the ranking at any given age $t$. By averaging this over all seminal nodes (footnote 6), we obtain the identification rate $f_z(m, t)$, which is now a function of the age of seminal nodes. For example, $f_z(m, 1\,\text{year})$ is the fraction of the seminal nodes that are in the top $z$ fraction of the ranking when they are one year old.

Task 1 focuses solely on the ranking positions of the seminal nodes, and these are reflected by $f_z(m)$ and $f_z(m, t)$. While the former evaluates the "final" ranking positions of the seminal nodes, the latter allows us to inspect how fast (or slowly) the seminal nodes rise in the rankings by the respective metric.
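The identification rate on a complete network can be sketched in a few lines. The rounding of $zN$ (rounded down, but at least one node) and the handling of tied scores (broken by node index) are assumptions, as are the function name and the toy data.

```python
def identification_rate(scores, seminal, z=0.01):
    """f_z(m): fraction of seminal nodes ranked among the top z*N nodes by score."""
    n_nodes = len(scores)
    # Size of the top group: z*N rounded down, at least one node (assumption).
    top_size = max(1, int(z * n_nodes))
    # Ties are broken by node index because the sort is stable (assumption).
    ranking = sorted(range(n_nodes), key=lambda i: scores[i], reverse=True)
    top = set(ranking[:top_size])
    return sum(1 for node in seminal if node in top) / len(seminal)


# Hypothetical toy data: eight nodes, three of them seminal, top 50% evaluated.
toy_scores = [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0]
toy_seminal = [0, 2, 6]
toy_rate = identification_rate(toy_scores, toy_seminal, z=0.5)
```

The age-dependent rate $f_z(m, t)$ could be obtained by applying the same function to the scores computed on each yearly snapshot and recording, for every seminal node, the snapshot in which it has age $t$.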

5.2. Metric evaluation using identification rate

Fig. 1 shows the ranking metrics evaluated by their IR in complete datasets. Overall, the highest identification rates are found in APS, followed by HEP, and then by PAT. A likely reason for this is provided by Table 2, which shows that in the PAT data, median indegree differs the least between all nodes and the seminal nodes, thus making the seminal nodes in this dataset difficult to separate from the other nodes.

The relative standings of metrics are rather similar between APS and PAT with PageRank being the best-performing metric in both. Relative differences between the metrics in both datasets are rather small, though: in both datasets there are a few methods with nearly identical performance, and the ratio between the best and the worst metric’s IR is around 1.5. The results are very different in HEP where: (1) LeaderRank (L) outperforms the second-best method by a wide margin, (2) the ratio between the best and the worst metric’s IR is 3.5, (3) all rescaled metrics perform significantly worse than their unrescaled counterparts. We focus on metric evaluation in this section; reasons for the differences observed in HEP are discussed in Section 5.3.

[6] If a seminal node appears τ years before the end of the complete dataset, it is obviously impossible to know its ranking at age t > τ. Seminal nodes that are younger than t are therefore excluded from the averaging.

Fig. 1. Metrics’ performance in identifying the seminal nodes as measured by the identification rate (z = 1%) in complete datasets. Note that the maximal displayed values differ between the panels. Colors of the bars are used to distinguish the original ranking metrics (white) and their age-rescaled counterparts (orange). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 2. The identification rate of individual metrics as a function of the seminal node age (in years). To facilitate the comparison, the metrics’ performance is normalized to the best metric in each age bin. A metric with zero IR thus receives zero score, while a metric that achieves the best IR for a given seminal node age receives the score of one.

In summary, LeaderRank can be considered the best-performing metric in Task 1 as it is clearly best in the HEP data and nearly best in the APS and PAT data. This holds also when different evaluation thresholds, z = 0.5% and z = 2%, are used.

Fig. 2 then shows the metrics’ relative performance as a function of the seminal node age. This approach serves to reveal the time evolution of the metrics’ ranking performance. To further facilitate the comparison of metrics, we normalize the metrics’ identification rate at a given age t of seminal nodes, f_z(m, t), by the best-achieved IR at this age, max_n f_z(n, t). Relative performance thus ranges from zero (when a metric’s IR is zero at age t) to one (achieved by the best-performing metric at age t). As shown in Fig. 2, the relative performance of metrics changes dramatically with the seminal node age: metrics that work well shortly after publication (mostly rescaled metrics) lose their advantage as the seminal nodes become older. In the displayed node age range, CiteRank and rescaled CiteRank (T and RT) are the two best-performing metrics in APS and PAT. For HEP, there is no single metric that performs well for most age values. Rescaled citation count (RC) is best until age 5, then h-index and collective influence (H and CI) are best until age 12, and finally semi-local centrality (SLC) is the best from then until age 20. LeaderRank (L), which performed best for the complete HEP dataset in Fig. 1, becomes the best metric later on (for comparison, the average age of the seminal nodes in the complete HEP dataset is 61 years).
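The normalization used for Fig. 2 amounts to dividing each metric’s IR at a given age by the best IR achieved by any metric at that age. A minimal sketch follows; the function name and the IR values are made up for illustration, and assigning zero when no metric identifies any seminal node at some age is an assumption of the sketch.

```python
def relative_performance(ir_by_metric):
    """Divide each metric's IR at every age by the best IR achieved at that age.

    ir_by_metric: dict metric -> list of IR values, one per seminal-node age
    """
    n_ages = len(next(iter(ir_by_metric.values())))
    # Best IR over all metrics, separately for each age.
    best = [max(vals[t] for vals in ir_by_metric.values()) for t in range(n_ages)]
    return {
        metric: [v / b if b > 0 else 0.0 for v, b in zip(vals, best)]
        for metric, vals in ir_by_metric.items()
    }


# Made-up IR values at three consecutive ages (not the paper's results).
toy_ir = {"RC": [0.2, 0.3, 0.3], "L": [0.0, 0.1, 0.6]}
rel = relative_performance(toy_ir)
```

In the toy data, the rescaled metric leads at the first two ages and the age-biased metric overtakes it at the third, which mirrors the qualitative pattern described above.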

To gain further insights into differences between the metrics, we evaluate their pairwise similarity using the Spearman rank correlation of all nodes’ rankings. The results are shown in Fig. 3 together with metric clustering based on the obtained correlation matrices. There are several points to note. First, the clustering of metrics is remarkably stable across the datasets. Second, the clusterings reveal two groups of metrics whose rankings are similar to each other. The larger group includes CI, SLC, P, L, C, and H. The smaller group includes some of their rescaled variants: RP, RL, RC, and RH. Third, RCI, RSLC, and RT do not cluster with other rescaled metrics, probably as a result of the rescaling procedure not working perfectly for them (see Figs. A.1–A.3 in Appendix A). Fourth, within each of the two mentioned clusters, the pairwise Spearman correlation values are rather high (above 0.73 in all three datasets), which indicates a high degree of similarity among the respective metrics.

Note that we have omitted HITS from the presentation of results above. The reason for doing so is that its performance is so much worse than that of the other metrics that the added value of displaying HITS in all previous figures would be very limited. In particular, the identification rate values of the HITS authority score are 0.143 (APS), 0.116 (HEP), and 0.054 (PAT).[7]

[7] HITS performance is strongly inferior to other metrics also in terms of the normalized identification rate introduced in Section 6.1.


Fig. 3. Similarity of the evaluated metrics as measured by the Spearman rank correlation of all node rank positions. The metrics’ hierarchical clusterings are obtained by the UPGMA method implemented by the clustermap function in Python’s Seaborn library.
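The pairwise similarity underlying Fig. 3 can be illustrated with a dependency-free sketch of the Spearman correlation, computed as the Pearson correlation of rank vectors with tied values sharing their average rank. The helper names and the toy scores are hypothetical. In practice one would more likely rely on library routines such as `scipy.stats.spearmanr` and pass the resulting matrix to Seaborn’s `clustermap` with average linkage (UPGMA); the exact call used for Fig. 3 is not given in the text, so that part is an assumption.

```python
def average_ranks(values):
    """Rank values (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up scores of five nodes under three metrics (labels are illustrative).
toy_metric_scores = {
    "C": [10, 7, 3, 1, 0],
    "P": [0.30, 0.25, 0.20, 0.15, 0.10],
    "RC": [0.1, 2.0, 1.5, -0.3, 0.9],
}
names = list(toy_metric_scores)
corr = {
    (a, b): spearman(toy_metric_scores[a], toy_metric_scores[b])
    for a in names
    for b in names
}
```

The first two toy metrics order the nodes identically and therefore have correlation one, while the third orders them quite differently and correlates only weakly with the others.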

The poor performance of HITS here is very different from this algorithm being praised in the line of research on court decision citation networks (see Agnoloni & Pagallo, 2015; Fowler & Jeon, 2008 and the references therein). One possible reason for this difference is that in science, few would agree that a citation from a well-referenced but little-cited review paper is more indicative of the target paper’s impact than a citation from a high-impact paper with few references (as the HITS authority score would assume). In this sense, court decision citation networks may be intrinsically more favorable to HITS than the paper and patent citation networks are. Further research is necessary to understand structural differences between court decision citation networks and scholarly/patent citation networks. Also, a comprehensive evaluation of several ranking metrics can help us understand whether HITS is indeed the single best-performing metric in court decision networks.

We have similarly omitted yearly citation count percentile, YCCP, from the figures. Although the performance of YCCP does not lag behind the top metrics as much as the performance of HITS, the results are still significantly lower: the identification rate values of YCCP are 0.700 (APS), 0.197 (HEP), and 0.375 (PAT). Importantly, the IR results of YCCP are similar to the results of RC, which is expected as RC, too, is an age-normalized version of citation count, similarly to YCCP. Because of this high level of similarity, we report the YCCP results only in the text.

5.3. Caveats of identification rate

While metrics’ performance in Fig. 1 is strikingly uniform in the APS and PAT data, big differences are found in the HEP data. To explain where they come from, Fig. 4 shows the age distributions of the seminal nodes in the data. In terms of real time, the difference between APS/PAT and HEP is already apparent as the first two datasets have the average publication year of the seminal nodes 1976 and 1975, respectively, whereas it is 1957 for the HEP seminal nodes. The difference between the three datasets is more evident, though, when each seminal node is assigned to one of the 40 equally-sized age groups by its publication date (with groups 1 and 40 containing the oldest and the most recent nodes, respectively). The bottom row of Fig. 4 shows that the HEP seminal nodes are distributed extremely unevenly among the age groups with 74% of them (230 out of 310) in the oldest age group 1, and no seminal nodes in age groups 14–40. The big differences between the top and bottom row in Fig. 4 are due to the accelerating rates at which new nodes appear in the datasets. The numbers of recent new nodes are so high that they “push” the seminal nodes to the early age groups. In APS, for example, approximately 85% of all nodes appear after 1976 which is the mean publication year of the dataset’s seminal nodes.

The strong temporal non-uniformity of the seminal nodes has profound consequences. Firstly, it is not favorable to age-rescaled metrics which, by design, strive for a uniform representation of all age groups among the top-ranked nodes. For the HEP data, however, nodes from age groups 2–40 can contribute only marginally to the identification rate because there are only a few seminal nodes among them. By contrast, original non-rescaled metrics are typically biased towards old nodes (see Figs. A.1–A.3) and this gives them an advantage when a given set of seminal nodes shares the same bias towards old nodes. In particular, Fig. A.2 shows that the bias of LeaderRank towards old nodes is the strongest of all metrics in HEP, which directly contributes to the metric’s superior performance in Fig. 1.

Secondly, the age bias of the seminal nodes in HEP is so strong that it allows the simple ranking of nodes by their age (we refer to this metric as AgeR; old nodes are at the top) to outperform all other metrics. Its identification rate on the complete HEP data is 0.70 which is indeed better than the values shown in Fig. 1 for the other metrics. This is further illustrated by the left panel of Fig. 5 which shows the identification rate for a few selected metrics as a function of the seminal node age. Here AgeR yields zero identification rate when the seminal nodes are young (younger than 30 years) because it simply puts old nodes at the top of the ranking. However, the metric’s results quickly improve when the seminal nodes are older than that and AgeR becomes the best metric starting from age 40, approximately. This demonstrates that evaluating ranking metrics solely by the ranks that they assign to the seminal nodes is of limited relevance as AgeR, a metric that entirely ignores the actual impact of the nodes, is eventually able to outperform all other metrics.

Fig. 4. The distributions of the seminal nodes’ publication dates in the datasets: real time (top row), and 40 equally-sized age groups (bottom row).

Fig. 5. Performance of selected metrics in identifying the seminal nodes in the HEP data: a comparison between the identification rate (left) and the normalized identification rate (right; see Section 6.1 for the definition) measured as functions of the seminal node age.

In summary, the identification rates observed within Task 1 are a confounded outcome of a given metric’s ability to rank the seminal nodes well and the level of agreement between the metric’s biases and the biases implicitly present in the chosen set of seminal nodes. Note that until now, we discussed specifically the age bias because it is both manifestly present and easy to define and measure. Other potentially relevant biases (such as the field bias, for example) can in principle be studied and treated in a similar way as we do here for the age bias.

6. Metric performance in ranking the seminal nodes whilst penalizing biased metrics (Task 2)

Having demonstrated the caveats of evaluating the ranking performance of metrics using identification rate, we now proceed to Task 2 that additionally penalizes biased metrics. To this end, we employ the normalized identification rate introduced in Mariani et al. (2016) which imposes a penalty on metrics that are biased.


Fig. 6. An illustration for the alternative interpretation of Task 2: the age distribution of the seminal expert-selected nodes and the top z fraction of nodes from each age group (we use here four age groups as an example).

6.1. Normalized identification rate

Normalized identification rate (NIR) introduced in Mariani et al. (2016) considers the age distribution of the top-ranked nodes and applies a penalty factor to the identified seminal nodes that come from age groups that are over-represented among the top-ranked nodes. To compute NIR, we divide all N network nodes by age into G groups of equal size, and compute N_z(g) which is the number of nodes from each group g (g = 1, ..., G) that are in the top z fraction of the ranking. An age-unbiased metric would result in N_U := zN/G top nodes, on average, in each age group. For any age group g that is “over-represented” (that is, N_z(g) > N_U), the seminal nodes that are in the top z fraction of the ranking do not contribute to the NIR fully but only proportionally to N_U/N_z(g). If, for example, a seminal node is from an age group that is twice as frequent in the top of the ranking as it should be, this seminal node contributes only half to the NIR. By contrast, seminal nodes from under-represented age groups (N_z(g) < N_U) contribute to the NIR in the same way as they contribute to the IR. In other words, NIR assumes a penalty for seminal nodes from over-represented age groups but no bonus for seminal nodes from under-represented age groups. To summarize, the factor N_U/N_z(g) introduced by the normalized identification rate can be viewed as a penalty for the performance gained by age bias of a metric.
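The verbal definition can be condensed into a single expression. The following is a reconstruction from the description in this section, not a formula quoted from Mariani et al. (2016); the symbols S (the set of seminal nodes), r_i(m) (the rank of node i by metric m), and g_i (the age group of node i) are notation introduced here for illustration.

```latex
\mathrm{NIR}_z(m) \;=\; \frac{1}{|S|} \sum_{i \in S}
  \mathbb{1}\!\left[\, r_i(m) \le zN \,\right]\,
  \min\!\left( 1,\; \frac{N_U}{N_z(g_i)} \right),
\qquad N_U = \frac{zN}{G}.
```

Replacing the minimum by one recovers the identification rate f_z(m), which makes explicit that the two quantities differ only through the penalty applied to over-represented age groups.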

The choice of the number of age groups G used in the computation of NIR is a compromise between improving the temporal resolution (lowering the time duration of each group) by increasing G and limiting the natural statistical variability of N_z(g) by keeping G low. We use G = 40 adopted in previous literature (Mariani et al., 2016; Vaccario et al., 2017); other choices lead to qualitatively similar results. Note that due to the introduction of a penalizing factor, NIR cannot be higher than IR for a given ranking. The highest possible NIR of one is achieved by a ranking that places all seminal nodes in the chosen top fraction of the ranking (we use top 1% here, unless stated otherwise) and is not biased by node age (or, at least, the age bins containing the seminal nodes are not over-represented in the top of the ranking).
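A minimal Python sketch of the NIR computation is given below, following the description in this section. It is not the authors’ implementation: the function name, the input layout, the rounding of zN, and the toy network are assumptions. The example contrasts an age-only ranking (akin to AgeR) with a ranking whose top list is spread evenly over the age groups.

```python
def normalized_identification_rate(scores, age_rank, seminal, z=0.01, n_groups=40):
    """NIR sketch: seminal hits from over-represented age groups are down-weighted.

    scores:   dict node -> metric score (higher is better)
    age_rank: dict node -> position by age, 0 = oldest, N-1 = most recent
    """
    n = len(scores)
    n_top = max(1, round(z * n))
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    top = set(ranked[:n_top])

    def group(v):                       # equal-size age groups 0..n_groups-1
        return age_rank[v] * n_groups // n

    top_per_group = [0] * n_groups      # N_z(g)
    for v in top:
        top_per_group[group(v)] += 1
    n_unbiased = n_top / n_groups       # N_U = zN/G

    seminal = list(seminal)
    total = 0.0
    for v in seminal:
        if v in top:
            # Full credit unless the node's age group is over-represented.
            total += min(1.0, n_unbiased / top_per_group[group(v)])
    return total / len(seminal)


# Toy network: 400 nodes, node id = age position, 4 age groups, z = 5%.
nodes = range(400)
toy_age_rank = {v: v for v in nodes}
toy_seminal = [0, 5, 150, 390]
age_only = {v: -float(v) for v in nodes}       # oldest nodes score highest
balanced = {v: -float(v % 100) for v in nodes}  # top list spread over all groups

nir_age_only = normalized_identification_rate(
    age_only, toy_age_rank, toy_seminal, z=0.05, n_groups=4)
nir_balanced = normalized_identification_rate(
    balanced, toy_age_rank, toy_seminal, z=0.05, n_groups=4)
```

In the toy data the age-only ranking places two seminal nodes in the top 5% (an IR of 0.5), but all 20 top nodes come from the oldest group where only 5 are expected, so each hit counts as one quarter and the NIR drops to 0.125. The balanced ranking finds one seminal node and receives no penalty, giving an NIR of 0.25 that equals its IR.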

The right panel of Fig. 5 shows the normalized identification rate as a function of the seminal node age for a small number of selected metrics. It shows that using NIR indeed solves the problem encountered when metric performance is measured using the ordinary identification rate (left panel in Fig. 5). In particular, AgeR becomes the worst method regardless of the seminal node age, as appropriate for a ranking method that actually ignores node impact in the network. LeaderRank, another metric that is strongly biased towards old nodes, is also strongly affected and its performance starts to decrease at the seminal node age of 10 years instead of growing monotonically as it does when the identification rate is used. This illustrates that the use of the normalized identification rate weakens the mutually reinforcing link between age-biased metrics and age-biased sets of seminal nodes.

There is also an alternative view that leads to Task 2 and the normalized identification rate as the appropriate evaluation method. This view is based on realizing that the seminal nodes are necessarily a subset of high-quality nodes, the proverbial “tip of an iceberg”. The short text introducing the Physical Review Letters milestones explicitly acknowledges that “It is inevitable that some very important work will not be featured (in the milestones collection)”. We define the best z fraction of nodes in each age group as the high-quality nodes, but the problem is that we do not know which are those “best” nodes. That is why we still need the seminal nodes but we do not view them as a definite and only target for the evaluated ranking metrics (which would correspond to Task 1). Instead, we recognize that the seminal nodes are a particular sample from all high-quality nodes in the dataset. This is illustrated by Fig. 6 where the number of seminal nodes varies greatly among the age groups but the number of the top z fraction of nodes is naturally constant. If a ranking metric over-represents a certain age group in its top z fraction of the ranking by factor X > 1, only the fraction 1/X of the top nodes from this age group are among the top z fraction of nodes from this age group. That is why the factor 1/X needs to be applied to any identified seminal nodes from this age group, which is precisely what the normalized identification rate does. In summary, Task 2 and the normalized identification rate correspond also to the task of ranking highly the best nodes from each age group, from which the given seminal nodes are a potentially biased sample.

Fig. 7. Metrics’ performance in identifying the seminal nodes as measured by NIR evaluated on the complete datasets.

Table 4

A summary of the metrics evaluation by normalized identification rate. The average score of metric m is obtained by computing its score relative to the best-performing metric in each dataset, NIR(m)/max_n NIR(n), and averaging this score over the three analyzed datasets. A metric that would perform best in all datasets would therefore achieve the average score of one. The subsequent rows show the ranking of metrics by their NIR for each dataset. The bottom part of the table shows the average score based on the NIR values for two different top ranking fractions z, 0.5% and 2%.

| Metric | RP | RL | RC | RH | RCI | RSLC | T | RT | C | H | CI | P | L | SLC | HITS | RHITS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Avg score | 0.98 | 0.98 | 0.89 | 0.81 | 0.80 | 0.75 | 0.74 | 0.70 | 0.69 | 0.63 | 0.58 | 0.57 | 0.54 | 0.53 | 0.27 | 0.25 |
| Rank APS | 1 | 2 | 5 | 4 | 7 | 9 | 3 | 6 | 8 | 10 | 12 | 11 | 13 | 14 | 15 | 16 |
| Rank HEP | 4 | 1 | 5 | 6 | 3 | 2 | 11 | 12 | 9 | 7 | 10 | 15 | 16 | 8 | 14 | 13 |
| Rank PAT | 1 | 3 | 2 | 4 | 5 | 9 | 8 | 11 | 10 | 12 | 13 | 6 | 7 | 14 | 15 | 16 |
| Avg score, z = 0.005 | 0.96 | 0.96 | 0.78 | 0.73 | 0.72 | 0.68 | 0.65 | 0.63 | 0.60 | 0.65 | 0.53 | 0.55 | 0.49 | 0.46 | 0.20 | 0.21 |
| Avg score, z = 0.02 | 0.93 | 0.94 | 0.89 | 0.81 | 0.82 | 0.71 | 0.71 | 0.72 | 0.66 | 0.62 | 0.59 | 0.56 | 0.52 | 0.52 | 0.28 | 0.24 |
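The average score reported in Table 4 is defined in the table caption as NIR(m)/max_n NIR(n) averaged over the datasets. A short sketch of that computation follows; the function name is hypothetical and the NIR values are invented for illustration, not taken from the study.

```python
def average_relative_score(nir):
    """nir: dict dataset -> {metric: NIR}; returns metric -> average relative score."""
    metrics = next(iter(nir.values())).keys()
    return {
        # Score relative to the best metric in each dataset, averaged over datasets.
        m: sum(vals[m] / max(vals.values()) for vals in nir.values()) / len(nir)
        for m in metrics
    }


# Made-up NIR values for two hypothetical datasets "A" and "B".
toy_nir = {
    "A": {"RP": 0.50, "RC": 0.40, "C": 0.25},
    "B": {"RP": 0.20, "RC": 0.25, "C": 0.10},
}
avg = average_relative_score(toy_nir)
```

Because each dataset is normalized by its own best metric, a dataset with uniformly low NIR values (such as HEP in the study) carries the same weight in the average as a dataset with high values.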

6.2. Metric evaluation using normalized identification rate

NIR values on the complete datasets are shown in Fig. 7. The first thing to note is that the rescaled metrics generally perform better than their original counterparts here. The only exception is T which itself contains a mechanism to prevent the metric from being overly biased towards old nodes, so an additional rescaling procedure is in some sense superfluous (even though Mariani et al. (2016) and Figs. A.1–A.3 show that CiteRank still displays strong age bias in some age groups). The second observation is that the NIR values for the HEP dataset are much lower than the previously reported IR values. This is a direct consequence of two effects: NIR heavily penalizes biased ranking metrics and, at the same time, unbiased ranking metrics are unable to identify the biased seminal nodes well. In line with the identification rate results in Task 1, YCCP performs similarly to RC: its NIR values are 0.678 (APS), 0.175 (HEP), and 0.348 (PAT); its average score, computed in the same way as for the other metrics in Table 4, is 0.89.

The most important finding emerging from Fig. 7 is that upon adopting the normalized identification rate for the evaluation, metrics that perform well in all three datasets emerge. This is clearly visible in Table 4 where the relative performance of the ranking metrics in all three datasets is summarized. We see that rescaled PageRank, RP, and rescaled LeaderRank, RL, perform best or nearly best in all three datasets (recall that LeaderRank is a modification of PageRank obtained by changing the teleportation term). This result is robust with respect to changing the ranking fraction z that we use to evaluate the normalized identification rate: even when the relative order of some metrics changes, RP and RL remain the two best metrics by some margin. It is interesting to note here that while the two top metrics are global in the sense of taking the whole network structure into account, they are followed directly by rescaled citation count, RC, which is a local metric that is based only on the immediate node neighborhood. Semi-local metrics, such as the semi-local centrality SLC and the collective influence CI, regardless of whether they are rescaled or not, combine the worst of both worlds: they are computationally more demanding than local metrics, and they rank nodes worse than local metrics.

Further insights can be gained by plotting NIR as a function of the seminal node age in Fig. A.4, similarly to what we did for the identification rate, IR, in Fig. 2. We see there, for example, that unlike in the APS data, rescaled PageRank does not outperform rescaled citation count in the HEP data in the first 18 years of seminal node age.

7. Discussion

The previously introduced normalized identification rate (NIR) takes into account both the ranking positions of expert-selected nodes and the metric’s bias that manifests itself in the ranking. We use NIR to uncover the consistent performance of impact ranking metrics across different citation datasets. Our results indicate that ranking based on the network structure is more successful than simple degree-based metrics in singling out the significant nodes.

7.1. Limitations and open directions

There are various questions that remain open for further research. First, our analysis should be extended to more than one kind of bias (such as the age and field bias common in scholarly citation data; Vaccario et al., 2017) and generalized to cases where the kind of bias is not explicitly known. Second, identification rate (referred to as recall in information filtering and statistical learning) is just one of various performance metrics (see Dunaiski et al., 2018 for an overview of possibilities); we thus need to study how to account for bias in these other metrics. Third, besides metric evaluation on real data, evaluation on synthetic model data can be used to gain further theoretical insights. This approach has the advantage of allowing one to arbitrarily turn on and off various model components and thus identify which of them are crucial for the observed metric performance. For example, which assumptions about papers written by multiple authors must be fulfilled in order for a specific researcher-impact metric to reflect well the individual authors’ contributions? Such models need to be informed by analyses of real empirical datasets and, conversely, the assumptions and effects identified as crucial in model data can be in turn validated in real datasets. Finally, there are other citation datasets that can be used for a similar analysis upon identifying corresponding sets of seminal nodes for them.

To draw a parallel, in a machine learning problem, if the training set has some intrinsic bias, the system learns this bias and in turn produces biased outcomes (Lloyd, 2018; Raghavendra, Cerutti, & Preece, 2018). This is similar to our situation where a potential bias in the used set of seminal nodes, if left unchecked, can lead to wrong ranking metrics being believed to perform best, or even to designing new ranking metrics that perform well only thanks to the bias. Biased outcomes of those metrics can in turn misguide our future evaluations and decisions. Further research on various aspects of bias in data mining and complex systems research, in particular how to avoid it, is therefore vital.

7.2. Management implications

Ranking and prioritizing is an essential task in many managerial applications. Our results show that to rank papers, age-rescaled PageRank is a well-performing metric that by construction produces rankings with little residual age bias. Evaluation of researchers and institutes typically uses metrics derived from citation counts (such as the h-index, for example). Our analysis, in particular the performance gap between age-rescaled PageRank and the tested unbiased versions of citation count, suggests that applying structural network-based metrics such as PageRank might be of advantage also when the objective is to rank the researchers or institutions.

8. Conclusion

Well-designed robust evaluation protocols are crucial for understanding which ranking metrics, of which we have an abundance (Liao et al., 2017; Waltman, 2016), perform well in which contexts. This study shows that the evaluation of a ranking metric based solely on the positions of expert-selected nodes in the resulting ranking is difficult to interpret because it confounds two aspects: the metric’s ranking performance and the degree to which the biases of the expert-selected nodes overlap with the metric’s biases. Normalized identification rate weakens the link between the ranking bias and the evaluation results, and yields results that are consistent across different datasets. In our case of ranking seminal nodes in citation networks, we find that age-rescaled PageRank and age-rescaled LeaderRank (note that LeaderRank is a close variant of PageRank) are the two best-performing metrics by a wide margin.

Our work deepens the understanding of impact metrics, especially in relation to the interplay between their biases and the biases of the considered test set. Comprehensive comparisons among various metrics are crucial to cope with the ever-growing number of new metrics and beneficial for understanding the advantages and limitations of each of them. The proposed evaluation framework which penalizes biased metrics has general applicability beyond the ranking of articles in citation data; by highlighting the various roles of bias, it provides practical lessons for the ranking practice in other datasets with bias, such as technological networks, social networks, and other systems.

Authors’ contribution

Shuqi Xu: Contributed data or analysis tools; Performed the analysis; Wrote the paper.

Manuel Sebastian Mariani: Conceived and designed the analysis; Wrote the paper.

Linyuan Lü: Conceived and designed the analysis.

Matúš Medo: Conceived and designed the analysis; Collected the data; Contributed data or analysis tools; Wrote the paper.


Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 11622538, 61673150, 11850410444), the Science Strength Promotion Program of the UESTC, and the Zhejiang Provincial Natural Science Foundation of China (Grant no. LR16A050001). MSM acknowledges financial support from the University of Zurich through the URPP Social Networks, the Swiss National Science Foundation (Grant No. 200021-182659), and the UESTC professor research start-up (Grant No. ZYGX2018KYQD21).

Appendix A. Evaluation of the age bias removal

Figs. A.1–A.3 review the age bias of individual ranking metrics in the three analyzed datasets. Using the usual division of all nodes into 40 equally-sized groups by age, the figures show the number of nodes from each age group, N_{1%}(g), in the top 1% of the ranking by each respective ranking metric. An age-unbiased metric would thus display a flat histogram where deviations from the perfectly uniform value N_U = 0.01N/40 in each age bin would be only of statistical nature. As in Mariani et al. (2016), we measure the level of age bias in each histogram by comparing the observed standard deviation

$$\sigma = \sqrt{\frac{1}{40}\sum_{g=1}^{40}\left[N_{1\%}(g) - N_U\right]^2}$$

with the average standard deviation σ_0 that results from distributing 0.01N nodes among the 40 age bins in a random (and therefore unbiased) way. When the bias strength is measured by σ/σ_0, a value of around one (or less) indicates that the observed level of bias can be explained by statistical fluctuations only. The higher the value, the stronger the bias. The values of σ/σ_0 corresponding to the histograms in Figs. A.1–A.3 are summarized in Table A.5.
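The bias measure σ/σ_0 can be approximated with a short simulation. The sketch below estimates σ_0 by assigning each of the top nodes independently and uniformly to an age bin; this particular null model, the number of trials, and the function names are assumptions, since the appendix does not spell out how the random distribution is generated.

```python
import random


def std_from_uniform(counts):
    """Standard deviation of per-group counts around the uniform value N_U."""
    n_u = sum(counts) / len(counts)
    return (sum((c - n_u) ** 2 for c in counts) / len(counts)) ** 0.5


def bias_strength(counts, n_trials=200, seed=0):
    """sigma / sigma_0, with sigma_0 estimated by random assignment to age bins."""
    rng = random.Random(seed)
    n_groups, n_top = len(counts), sum(counts)
    sigma0 = 0.0
    for _ in range(n_trials):
        sim = [0] * n_groups
        for _ in range(n_top):              # place every top node in a random bin
            sim[rng.randrange(n_groups)] += 1
        sigma0 += std_from_uniform(sim)
    sigma0 /= n_trials
    return std_from_uniform(counts) / sigma0


# Made-up histograms of 1000 top nodes over 40 age groups.
flat = [25] * 40                  # perfectly uniform
skewed = [610] + [10] * 39        # most top nodes in the oldest group
flat_bias = bias_strength(flat)
skewed_bias = bias_strength(skewed)
```

A perfectly flat histogram yields a ratio of zero, a histogram produced by chance alone would yield a ratio close to one, and the strongly skewed histogram yields a ratio far above one, comparable in spirit to the large values reported for the original metrics.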

Fig. A.1. Visualization of the age bias of original and rescaled metrics for the APS data.


Fig. A.2. Visualization of the age bias of original and rescaled metrics for the HEP data.

Table A.5

Quantification of the age bias magnitude of respective ranking metrics in Figs. A.1–A.3 using σ/σ_0 (the higher the value, the stronger the age bias; the value of one indicates zero bias).

| Metric | APS original | APS rescaled | HEP original | HEP rescaled | PAT original | PAT rescaled |
|---|---|---|---|---|---|---|
| P | 15.7 | 1.4 | 22.1 | 1.9 | 36.1 | 6.2 |
| C | 7.5 | 1.4 | 9.7 | 1.9 | 46.6 | 7.4 |
| T | 14.2 | 4.1 | 10.5 | 8.6 | 42.0 | 40.7 |
| H | 8.4 | 1.1 | 11.0 | 2.3 | 48.8 | 11.9 |
| L | 23.7 | 1.4 | 29.5 | 1.8 | 41.3 | 6.6 |
| CI | 9.9 | 2.8 | 12.5 | 3.3 | 49.6 | 17.9 |
| SLC | 11.5 | 2.1 | 13.8 | 3.1 | 50.6 | 18.6 |
| HITS | 6.0 | 5.0 | 10.4 | 8.9 | 28.2 | 108.3 |

http://doc.rero.ch


Fig. A.3. Visualization of the age bias of original and rescaled metrics for the PAT data. The failure of weakening the age bias of the HITS authority score by rescaling is due to the authority scores “concentrating” on a small fraction of nodes with the remaining nodes having asymptotically zero score, which poses obvious problems to the rescaling procedure based on computing score mean and standard deviation in a finite moving time window.

Fig. A.4. The normalized identification rate of individual metrics as a function of the seminal node age (in years). To facilitate the comparison, the metrics’ performance is normalized to the best metric in each age bin in the same way as in Fig. 2. A metric with zero NIR thus receives zero score, while a metric that achieves the best NIR for a given seminal node age receives the score of one.

References

Agnoloni, Tommaso, & Pagallo, Ugo. (2015). The case law of the Italian constitutional court, its power laws, and the web of scholarly opinions. Proceedings of the 15th international conference on artificial intelligence and law, 151–155.

Alonso, Sergio, Cabrerizo, Francisco Javier, Herrera-Viedma, Enrique, & Herrera, Francisco. (2009). h-index: A review focused in its variants, computation and standardization for different scientific fields. Journal of Informetrics, 3(4), 273–289.

Berkhin, Pavel. (2005). A survey on PageRank computing. Internet Mathematics, 2(1), 73–120.

Bornmann, Lutz, & Daniel, Hans-Dieter. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45–80.

Bornmann, Lutz, & Marx, Werner. (2015). Methods for the generation of normalized citation impact scores in bibliometrics: Which method best reflects the judgements of experts? Journal of Informetrics, 9(2), 408–418.

Bornmann, Lutz, Leydesdorff, Loet, & Mutz, Rüdiger. (2013). The use of percentiles and percentile rank classes in the analysis of bibliometric data: Opportunities and limits. Journal of Informetrics, 7(1), 158–165.

Bovet, Alexandre, & Makse, Hernán A. (2019). Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications, 10(1), 7.

Braun, Tibor, Glänzel, Wolfgang, & Schubert, András. (2006). A Hirsch-type index for journals. Scientometrics, 69(1), 169–173.


Brin,Sergey,&Page,Lawrence.(1998).Theanatomyofalarge-scalehypertextualwebsearchengine.ComputerNetworksandISDNSystems,30(1-7), 107–117.

Charlton,BruceG.,&Andras,Peter.(2007).Evaluatinguniversitiesusingsimplescientometricresearch-outputmetrics:Totalcitationcountsper universityforaretrospectiveseven-yearrollingsample.ScienceandPublicPolicy,34(8),555–563.

Chen,Duanbing,Lü,Linyuan,Shang,Ming-Sheng,Zhang,Yi-Cheng,&Zhou,Tao.(2012).Identifyinginﬂuentialnodesincomplexnetworks.PhysicaA: StatisticalMechanicsandItsApplications,391(4),1777–1787.

Chen,Peng,Xie,Huafeng,Maslov,Sergei,&Redner,Sidney.(2007).FindingscientiﬁcgemswithGoogle’sPageRankalgorithm.JournalofInformetrics,1(1), 8–15.

Dunaiski,Marcel,Visser,Willem,&Geldenhuys,Jaco.(2016).Evaluatingpaperandauthorrankingalgorithmsusingimpactandcontributionawards. JournalofInformetrics,10(2),392–407.

Dunaiski,Marcel,Geldenhuys,Jaco,&Visser,Willem.(2018).Howtoevaluaterankingsofacademicentitiesusingtestdata.JournalofInformetrics,12(3), 631–655.

Dunaiski,Marcel,Geldenhuys,Jaco,&Visser,Willem.(2019a]).Globalisedvsaveraged:Biasandrankingperformanceontheauthorlevel.Journalof Informetrics,13(1),299–313.

Dunaiski,Marcel,Geldenhuys,Jaco,&Visser,Willem.(2019b]).Ontheinterplaybetweennormalisation,bias,andperformanceofpaperimpactmetrics. JournalofInformetrics,13(1),270–290.

Fowler,JamesH.,&Jeon,Sangick.(2008).Theauthorityofsupremecourtprecedent.SocialNetworks,30(1),16–30.

González-Pereira,Borja,Guerrero-Bote,VicenteP.,&Moya-Anegón,Félix.(2010).Anewapproachtothemetricofjournals’scientiﬁcprestige:TheSJR indicator.JournalofInformetrics,4(3),379–391.

Harzing,Anne-Wil,&Wal,RonVanDer.(2009).AGoogleScholarh-indexforjournals:Analternativemetrictomeasurejournalimpactineconomicsand business.JournaloftheAmericanSocietyforInformationScienceandtechnology,60(1),41–46.

Hicks,Diana,Wouters,Paul,Waltman,Ludo,deRijcke,Sarah,&Rafols,Ismael.(2015).Bibliometrics:TheLeidenManifestoforresearchmetrics.Nature, 520,429–431.

Hirsch,JorgeE.(2005).Anindextoquantifyanindividual’sscientiﬁcresearchoutput.ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesof America,102(46),16569–16572.

Hirsch,JorgeE.(2007).Doesthehindexhavepredictivepower?ProceedingsoftheNationalAcademyofSciencesoftheUnitedStatesofAmerica,104(49), 19193–19198.

Kleinberg,JonM.(1999).Authoritativesourcesinahyperlinkedenvironment.JournaloftheACM,46(5),604–632.

Kogan,Leonid,Papanikolaou,Dimitris,Seru,Amit,&Stoffman,Noah.(2017).Technologicalinnovation,resourceallocation,andgrowth.TheQuarterly JournalofEconomics,132(2),665–712.

Leydesdorff,Loet,Bornmann,Lutz,Mutz,Rüdiger,&Opthof,Tobias.(2011).Turningthetablesoncitationanalysisonemoretime:Principlesfor comparingsetsofdocuments.JournaloftheAmericanSocietyforInformationScienceandTechnology,62(7),1370–1381.

Leydesdorff,Loet,Bornmann,Lutz,&Opthof,Tobias.(2018).h˛:Thescientistaschimpanzeeorbonobo.Scientometrics,1–4.

Liao,Hao,Mariani,ManuelSebastian,Medo,Matúˇs,Zhang,Yi-Cheng,&Zhou,Ming-Yang.(2017).Rankinginevolvingcomplexnetworks.PhysicsReports, 689,1–54.

Lloyd,Kirsten.(2018).Biasampliﬁcationinartiﬁcialintelligencesystems.arXiv:1809.07842

Lü,Linyuan,&Zhou,Tao.(2011).Linkpredictionincomplexnetworks:Asurvey.PhysicaA:StatisticalMechanicsandItsApplications,390(6),1150–1170. Lü,Linyuan,Zhang,Yi-Cheng,Yeung,ChiHo,&Zhou,Tao.(2011).Leadersinsocialnetworks,theDeliciouscase.PLoSONE,6(6),e21202.

Lü,Linyuan,Medo,Matúˇs,Yeung,ChiHo,Zhang,Yi-Cheng,Zhang,Zi-Ke,&Zhou,Tao.(2012).Recommendersystems.PhysicsReports,519(1),1–49. Lü,Linyuan,Zhou,Tao,Zhang,Qian-Ming,&Stanley,H.Eugene.(2016).Theh-indexofanetworknodeanditsrelationtodegreeandcoreness.Nature

Communications,7,10168.

Lundberg,Jonas.(2007).Liftingthecrown–citationz-score.JournalofInformetrics,1(2),145–154.

Manning,Christopher,Raghavan,Prabhakar,&Schütze,Hinrich.(2010).Introductiontoinformationretrieval.NaturalLanguageEngineering,16(1), 100–103.

Mariani,ManuelSebastian,Medo,Matúˇs,&Zhang,Yi-Cheng.(2015).Rankingnodesingrowingnetworks:WhenPageRankfails.ScientiﬁcReports,5, 16181.

Mariani,ManuelSebastian,Medo,Matúˇs,&Zhang,Yi-Cheng.(2016).Identiﬁcationofmilestonepapersthroughtime-balancednetworkcentrality.Journal ofInformetrics,10(4),1207–1223.

Mariani,ManuelSebastian,Medo,Matúˇs,&Lafond,Franc¸ois.(2018).Earlyidentiﬁcationofimportantpatents:Designandvalidationofcitationnetwork metrics.TechnologicalForecastingandSocialChange,http://dx.doi.org/10.1016/j.techfore.2018.01.036

Martin,Travis,Ball,Brian,Karrer,Brian,&Newman,M.E.J.(2013).CoauthorshipandcitationpatternsinthePhysicalReview.PhysicalReviewE,88(1), 012814.

Mattedi,MarcosAntônio,&Spiess,MaikoRafael.(2017).Theevaluationofscientiﬁcproductivity.História,Ciências,Saúde-Manguinhos,24(3),623–643. Medo,Matúˇs,&Cimini,Giulio.(2016).Model-basedevaluationofscientiﬁcimpactindicators.PhysicalReviewE,94(3),032312.

Mingers,John,&Leydesdorff,Loet.(2015).Areviewoftheoryandpracticeinscientometrics.EuropeanJournalofOperationalResearch,246(1),1–19. Morone,Flaviano,&Makse,HernánA.(2015).Inﬂuencemaximizationincomplexnetworksthroughoptimalpercolation.Nature,527(7579),544. Mutz,Rüdiger,&Daniel,Hans-Dieter.(2012).Thegeneralizedpropensityscoremethodologyforestimatingunbiasedjournalimpactfactors.

Scientometrics,92(2),377–390.

Newman,Mark.(2010).Networks:Anintroduction.OxfordUniversityPress.

Newman,MarkE.J.(2009).Theﬁrst-moveradvantageinscientiﬁcpublication.EPL(EurophysicsLetters),86(6),68001.

Nickerson,KyleL.,Chen,Yuanzhu,Wang,Feng,&Hu,Ting.(2018).Measuringevolvabilityandaccessibilityusingthehyperlink-inducedtopicsearch algorithm.Proceedingsofthegeneticandevolutionarycomputationconference,1175–1182.

Radicchi,Filippo,Fortunato,Santo,Markines,Benjamin,&Vespignani,Alessandro.(2009).Diffusionofscientiﬁccreditsandtherankingofscientists. PhysicalReviewE,80(5),056103.

Raghavendra,Ramya,Cerutti,Federico,&Preece,Alu.(2018).Whendatalie:Fairnessandrobustnessincontestedenvironments.InNext-generation analystVI(p.106530U).InternationalSocietyforOpticsandPhotonics.

Ren, Zhuo-Ming. (2019). Age preference of metrics for identifying significant nodes in growing citation networks. Physica A: Statistical Mechanics and its Applications, 513, 325–332.

Ren, Zhuo-Ming, Mariani, Manuel Sebastian, Zhang, Yi-Cheng, & Medo, Matúš. (2018). Randomizing growing networks with a time-respecting null model. Physical Review E, 97(5), 052311.

de Rijcke, Sarah, Wouters, Paul F., Rushforth, Alex D., Franssen, Thomas P., & Hammarfelt, Björn. (2016). Evaluation practices and effects of indicator use – A literature review. Research Evaluation, 25(2), 161–169.

Schubert, András. (2008). Using the h-index for assessing single publications. Scientometrics, 78(3), 559–565.

Strumsky, Deborah, & Lobo, José. (2015). Identifying the sources of technological novelty in the process of invention. Research Policy, 44(8), 1445–1461.

Todeschini, Roberto, & Baccini, Alberto. (2016). Handbook of bibliometric indicators: Quantitative tools for studying and evaluating research. John Wiley & Sons.

Vaccario, Giacomo, Medo, Matúš, Wider, Nicolas, & Mariani, Manuel Sebastian. (2017). Quantifying and suppressing ranking bias in a large citation network. Journal of Informetrics, 11(3), 766–782.

Walker, Dylan, Xie, Huafeng, Yan, Koon-Kiu, & Maslov, Sergei. (2007). Ranking scientific publications using a model of network traffic. Journal of Statistical Mechanics: Theory and Experiment, 2007(06), P06010.

Waltman, Ludo. (2016). A review of the literature on citation impact indicators. Journal of Informetrics, 10(2), 365–391.

Waltman, Ludo, & Yan, Erjia. (2014). PageRank-related methods for analyzing citation networks. Measuring scholarly impact, 83–100.

Wasserman, Max, Zeng, Xiao Han T., & Amaral, Luís A. Nunes. (2015). Cross-evaluation of metrics to estimate the significance of creative works. Proceedings of the National Academy of Sciences of the United States of America, 112(5), 1281–1286.

West, Jevin D., Jensen, Michael C., Dandrea, Ralph J., Gordon, Gregory J., & Bergstrom, Carl T. (2013). Author-level eigenfactor metrics: Evaluating the influence of authors, institutions, and countries within the social science research network community. Journal of the American Society for Information Science and Technology, 64(4), 787–801.

Zeng, An, Shen, Zhesi, Zhou, Jianlin, Wu, Jinshan, Fan, Ying, Wang, Yougui, et al. (2017). The science of science: From the perspective of complex systems. Physics Reports.

Zhou, Yan-Bo, Lü, Linyuan, & Li, Menghui. (2012). Quantifying the influence of scientists and their publications: Distinguishing between prestige and popularity. New Journal of Physics, 14(3), 033033.