Unbiased evaluation of ranking metrics reveals consistent
performance in science and technology citation data
Shuqi Xu (a), Manuel Sebastian Mariani (a,b), Linyuan Lü (a,c), Matúš Medo (a,d,e,∗)
a Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, PR China
b URPP Social Networks, University of Zurich, 8050 Zurich, Switzerland
c Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, 311121 Hangzhou, PR China
d Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, 3010 Bern, Switzerland
e Department of Physics, University of Fribourg, 1700 Fribourg, Switzerland
Keywords: Citation networks; Network ranking metrics; Node centrality; Metrics evaluation; Milestone scientific papers and patents
Despite the increasing use of citation-based metrics for research evaluation purposes, we do not yet know which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. We assess 17 network-based metrics by their ability to identify milestone papers and patents in three large citation datasets. We find that traditional information-retrieval evaluation metrics are strongly affected by the interplay between the age distribution of the milestone items and age biases of the evaluated metrics. Outcomes of these metrics are therefore not representative of the metrics’ ranking ability. We argue in favor of a modified evaluation procedure that explicitly penalizes biased metrics and allows us to reveal metrics’ performance patterns that are consistent across the datasets. PageRank and LeaderRank turn out to be the best-performing ranking metrics when their age bias is suppressed by a simple transformation of the scores that they produce, whereas other popular metrics, including citation count, HITS and Collective Influence, produce significantly worse ranking results.
1. Introduction
Citation-based metrics for impact build on the premise that the number of citations received by a scientific paper (or a patent) is a reliable proxy for its scientific (or technological) impact. Such metrics are used not only to assess the impact of individual papers, but also to evaluate the overall research output of research units such as individual researchers (Hirsch, 2005; Medo & Cimini, 2016; Radicchi, Fortunato, Markines, & Vespignani, 2009; Zhou, Lü, & Li, 2012), research institutes (Charlton & Andras, 2007; West, Jensen, Dandrea, Gordon, & Bergstrom, 2013), and journals (González-Pereira, Guerrero-Bote, & Moya-Anegón, 2010; Harzing & Wal, 2009). The relative ease with which new metrics of research impact can be designed has contributed to their proliferation (Mingers & Leydesdorff, 2015; Todeschini & Baccini, 2016; Waltman, 2016), and uncritical use of such metrics has eventually met strong opposition (de Rijcke, Wouters, Rushforth, Franssen, & Hammarfelt, 2016; Hicks, Wouters, Waltman, de Rijcke, & Rafols, 2015; Leydesdorff, Bornmann, & Opthof, 2018).
∗ Corresponding author at: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, PR China.
E-mail addresses: linyuan.lv@uestc.edu.cn (L. Lü), matus.medo@unifr.ch (M. Medo).
http://doc.rero.ch
Published in Journal of Informetrics, which should be cited to refer to this work.
In particular, scholars have emphasized the need for understanding the theoretical foundations of impact metrics (Waltman, 2016), and evaluating analytically and empirically the merits of the metrics (Leydesdorff et al., 2018).
Despite the use of citation-based metrics for research evaluation purposes and the increasingly recognized need to better grasp their merits and pitfalls, we do not yet know which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. This gap is potentially dangerous: If a paper-level metric is assumed to be a good proxy for significance and it is used for research evaluation purposes, yet it undervalues papers whose significance is undeniable, its normative use (Leydesdorff et al., 2018) might lead to decisions that penalize truly significant research.
We aim to fill this gap by providing a comprehensive empirical comparison of a broad range of ranking metrics. We assess the metrics’ ability to single out scientific papers and patents that have been recognized by field experts as groundbreaking or seminal. The core idea behind this evaluation is that metrics that aim to gauge the significance of a paper/patent should be able to detect papers/patents whose outstanding long-term significance for the involved fields is undeniable. Our goal is to determine whether some metrics perform well across different citation datasets. If that is not the case, which characteristics of the input data decide which metric is the most suitable?
To this end, we analyze three citation networks: the scholarly citation dataset that includes papers published by the American Physical Society (APS), the citation data from the High-Energy Physics Literature Database INSPIRE (HEP), and the U.S. Patent Office patent citation data (PAT). We use expert-selected sets of seminal nodes and assess ranking metrics by how well they rank the seminal nodes. In particular, we use milestone papers selected by APS journal editors for the APS data, the community-curated “Chronology of Milestone Events in Particle Physics” for the HEP data, and the list of significant patents by Strumsky and Lobo (2015) for the PAT data (see the Data section for details). While by no means exhaustive, these lists of seminal publications consist of papers and patents of exceptional importance (many papers have, for example, led to a Nobel Prize for one or more of their authors). Our evaluation includes nine network-based ranking metrics from the scientometrics and network science literature together with their time-normalized (Mariani, Medo, & Zhang, 2016) variants. To provide a comprehensive comparison of metrics, we have chosen network-based metrics that have been used in bibliometrics (Waltman, 2016) or that have performed well in other networks (such as social and technological networks). Additionally, we provide results for the percentile-based citation count which is commonly used in bibliometrics (Bornmann, Leydesdorff, & Mutz, 2013; Leydesdorff, Bornmann, Mutz, & Opthof, 2011).
Expert-selected nodes have been used before to evaluate rankings of authors (Dunaiski, Geldenhuys, & Visser, 2019a; Radicchi et al., 2009), rankings of movies (Ren, Mariani, Zhang, & Medo, 2018; Wasserman, Zeng, & Amaral, 2015), rankings of scientific papers (Dunaiski, Geldenhuys, & Visser, 2019b; Mariani et al., 2016), and rankings of court cases (Fowler & Jeon, 2008), for example (see Dunaiski, Geldenhuys, & Visser, 2018, for a recent in-depth discussion of this evaluation approach). We make here an important methodological distinction between two similar, yet fundamentally different ranking tasks:
Task 1. The usual task for a ranking metric is to rank the given seminal nodes as high as possible. This is motivated by the assumption that if seminal nodes are known to be of high impact, a good metric should place them in the ranking of all nodes as high as possible. To evaluate the metrics’ performance in this task, one typically uses traditional information-retrieval metrics such as precision, recall, and accuracy (Dunaiski, Visser, & Geldenhuys, 2016).
Task 2. The need for unbiased evaluation, which is common in scientometrics (Mingers & Leydesdorff, 2015; Mutz & Daniel, 2012; Vaccario, Medo, Wider, & Mariani, 2017), motivates the second task: To rank the expert-selected nodes as high as possible whilst requiring that the ranking metric is unbiased. Since most structural ranking metrics are biased (see Appendix A for a demonstration), their evaluation must include a penalty for the performance gained thanks to the methods’ bias. Citation data commonly feature various biases such as the field bias (Vaccario et al., 2017), for example. We focus here on the age bias; we say that age bias is present in the data when there is a dependence between the mean citation count and the publication age. Besides being particularly strong, the age bias is also explicit and easy to measure as it is determined by the paper’s publication date. It thus provides a good test bed for dealing with biases in the ranking problem.
As part of our goal to evaluate the ranking performance of the chosen ranking metrics, we aim to elucidate the difference between the two ranking tasks described above. In particular, we show that Task 1 can favor biased metrics to such an extent that a custom-constructed ranking method based only on the bias itself (in our case, ranking by age) can in some cases outperform all standard ranking metrics. The fact that a metric performs well in the common Task 1 is thus no direct indication of its ability to assess the value or impact of the network’s nodes. We demonstrate that the normalized identification rate introduced in Mariani et al. (2016) appropriately addresses Task 2 by imposing a performance penalty proportional to the bias magnitude of the evaluated metric. Unlike Task 1, the results in Task 2 reveal consistent patterns of metrics’ performance across the three studied datasets. As further discussed in Section 5.3, the proposed evaluation can also be interpreted as a ranking problem where a given set of seminal nodes is interpreted as a potentially biased sample from a larger group of high-quality nodes. This further increases the relevance of Task 2 as compared to the common and straightforward approach presented in Task 1.
The paper is organized as follows. In Section 2, we provide a literature review. In Section 3, we describe and analyze the datasets and the corresponding sets of seminal nodes. In Section 4, we present the considered network metrics and describe various performance measures of a metric’s ranking ability based on seminal nodes. In Section 5, we first address Task 1, evaluate the metrics by how well they rank the seminal nodes, and eventually discuss drawbacks of this evaluation approach. In Section 6, we address Task 2 where the metrics’ bias is taken into account, and explain why the thus-obtained results are more relevant than the results obtained within Task 1. Finally, in Section 7, we review the limitations and open research directions, and discuss the management implications of our study.
2. Literature review
2.1. Citation-based ranking metrics
Citation impact indicators are a key tool in scientometrics and play a prominent role in the evaluation of scientific and technological publications (Waltman, 2016). The growing demand for evaluation information from researchers, funding bodies, and research institutions and the increasing availability of extensive data on scholarly activity have driven the proliferation of new indicators (Mattedi & Spiess, 2017).
Among citation-based impact indicators for research articles, citation count (referred to as indegree in the field of network science; Newman, 2010; Zeng et al., 2017) is the most basic and established one as it has been used for ranking of scholarly publications since the 1970s (Liao, Mariani, Medo, Zhang, & Zhou, 2017). The basic premise of citation count is simple: the most influential publications are the most cited. The metric’s simplicity comes at a cost as the natural differences between citations are neglected by citation count (see Bornmann & Daniel, 2008, for an extensive review of citing behavior). In particular, citation count assigns the same weight to a citation from a ground-breaking article published in a leading journal and a citation from an obscure article. The seminal PageRank algorithm for the World Wide Web (Brin & Page, 1998) assigns higher weight to references from web pages that are highly valued by the algorithm. Chen, Xie, Maslov, and Redner (2007) applied PageRank to a citation network to measure the importance of individual scientific publications, initiating the interest in recursive citation impact indicators (Waltman, 2016). Since then, various PageRank variants (Waltman & Yan, 2014) have been proposed, of which CiteRank (Walker, Xie, Yan, & Maslov, 2007) is the best known. Of note is the HITS (Hyperlink-Induced Topic Search) algorithm (Kleinberg, 1999) which assigns two scores, hub score and authority score, to each node. In the context of citation data, it is natural to consider vital reviews as hubs which cite other influential publications that have high authority score, and high-impact articles as authorities which tend to be cited by, among others, review articles with high hub scores (Nickerson, Chen, Wang, & Hu, 2018). Another notable branch of research impact indicators is the h-index (Hirsch, 2005) which was introduced to evaluate the scientific output of researchers and later extended by Schubert (2008) to assess the impact of individual publications. A large number of variants of the h-index have been proposed in the literature (Alonso, Cabrerizo, Herrera-Viedma, & Herrera, 2009).
Besides the most used citation impact indicators, including citation count, PageRank, CiteRank, H-index and HITS authority score, we also consider several network-based metrics that perform well in other ranking scenarios: LeaderRank, Collective Influence, and Semi-local centrality. These metrics are introduced in Section 4.
2.2. Ranking bias in citation analysis
In the analysis of citation data, citation counts of different publications cannot be directly compared as there are various sources of bias that can invalidate such a comparison. The most-studied biases are induced by the publication research field, age and document type (Waltman, 2016). Field bias manifests itself by the mean citation count of publications of a similar age varying greatly between fields because of their different citation practices (Lundberg, 2007). The average citation count differs between, for example, natural sciences and humanities by a factor of 10 (Bornmann & Marx, 2015). The age bias of citation count has two distinct components. First, the average citation count of papers of a fixed age gradually increases over time (Martin, Ball, Karrer, & Newman, 2013). Second, the citation counts naturally grow with the age of publications as they accumulate more and more citations with time, which prevents citation counts of papers of different age from being directly comparable. Finally, citation counts of publications of different document types (such as article, letter, review, and commentary) should not be directly compared with each other because, for example, review papers tend to attract more citations than ordinary research articles and letters (Lundberg, 2007).
To overcome these biases, a number of normalized citation-based ranking metrics have been developed to allow for fairer comparisons, such as mean-based metrics and percentile-based metrics (see Bornmann & Marx, 2015; Waltman, 2016 for reviews). In the case of paper and patent citation networks, which are our focus in this manuscript, the most relevant bias is the age bias, which is strong in both paper and patent citation data.
Despite many citation-based ranking metrics being proposed in the past, a comprehensive comparison of their ranking performance on various datasets is still lacking. To determine which metrics are best suited to rank papers or patents in citation networks is the first research gap that we aim to fill by this study. The common approach to assess a metric’s ranking performance is based on expert-selected nodes. The previous studies adopting this approach focus solely on the ranking positions of the expert-selected nodes and ignore the confounding effects that can be introduced by the choice of the seminal nodes. For example, if experts tend to identify old works as seminal, a ranking metric that shares this bias gains an advantage and its potential superiority may be illusory. We fill this second research gap by exploring the interplay between the bias of the evaluated ranking metrics and the bias of the seminal nodes, and exploring an evaluation procedure that takes the bias of the seminal nodes into account.
Table 1
Basic characteristics of the networks corresponding to the three analyzed datasets: their time span, the number of nodes N, the number of edges E, and the number of seminal nodes S.
Dataset       Label   Time span    N           E            S
APS papers    APS     1893–2016    595,287     7,051,801    160
HEP papers    HEP     1764–2017    829,708     14,994,123   310
US patents    PAT     1926–2010    6,237,625   45,962,301   112
3. Data
To compare the ranking performance of network-based metrics, we use three citation datasets: the classical American Physical Society citation data, high-energy physics citation data, and the U.S. Patent Office citation data. Each dataset can be represented as a growing directed network where nodes gradually appear with time. Time resolution is one day for all datasets. Nodes represent papers or patents and directed links represent citations. For each dataset, there is a set of corresponding expert-selected nodes of high impact that we refer to as seminal nodes. Table 1 summarizes basic characteristics of the analyzed networks and the corresponding sets of seminal nodes.
3.1. American Physical Society citation data (APS)
The American Physical Society (APS) dataset in our possession covers years 1893–2016 (the dataset is available on demand from https://journals.aps.org/datasets). After removing non-research papers (announcements, book reviews, etc.), the dataset contains 595,287 nodes (papers) published by the APS journals and 7,051,801 directed links (citations) between them. For this dataset, we use multiple selections of milestone papers chosen by editors of APS journals: 87 Physical Review Letters milestones [1], 23 Physical Review E milestones [2], and 78 select papers announced on the 125th anniversary of the Physical Review journals [3]. In total, there are 161 unique seminal papers, of which 160 are present in the citation data (the one missing paper is from 2017, hence outside the coverage period of our dataset).
3.2. High-Energy Physics citation network (HEP)
INSPIRE is a project run by leading high-energy physics institutions around the world (CERN, DESY, Fermilab, IHEP, and SLAC). Among other things, it curates a database of high-energy physics papers (and papers relevant to the high-energy physics community), which is also made available on demand for research purposes [4]. After processing the downloaded XML data dump covering years 1764–2017, we obtained a citation network containing 829,708 nodes and 14,994,123 directed links. The list of milestone papers has been downloaded from the website “Chronology of Milestone Events in Particle Physics” [5] that lists milestone events and the corresponding papers. The website is a joint effort of the Institute for High Energy Physics (Russia) and the Particle Data Group (USA) with several leading high-energy researchers contributing to the final version of the chronology. The thus-obtained milestone papers have been matched with the HEP data, leading to the final set of 310 seminal papers.
3.3. US patent citation network (PAT)
The US patent dataset was collected by Kogan, Papanikolaou, Seru, and Stoffman (2017) and covers years 1926–2010. In total, there are 6,237,625 nodes (patents) and 45,962,301 links (citations) among them. In Strumsky and Lobo (2015), the authors listed 175 patents which “affected society, individuals and the economy in a historically significant manner”. In agreement with Mariani, Medo, and Lafond (2018), we remove the patents issued outside the citation dataset’s time span as well as the design patents that are absent in the citation data. As a result, 112 seminal patents are used for further analysis.
Table 2 compares the characteristics of all nodes and the seminal nodes in each dataset. As expected, the seminal nodes have indegree (commonly referred to as the number of citations) significantly higher than the overall median indegree in all three datasets. This confirms that the expert assessment of the seminal nodes is not in contradiction with their impact as reflected by the citation network.
Table 2 further lists the times needed by various groups of nodes to collect their first three and five citations, τ3 and τ5, respectively. These times indicate the time scales of citation dynamics. We see, for example, that the nodes collect their citations in the PAT data significantly slower than in the APS and the HEP data. In the APS data, both τ3 and τ5 are smaller for the seminal nodes than they are for all nodes, which is understandable given the much higher indegree of the seminal nodes. Curiously, the relation is reversed for the HEP data where the seminal nodes need more time to get their first 3 or 5 citations than all papers. While paradoxical at first sight, this is a direct consequence of the HEP seminal nodes being over-represented among the old nodes (see Fig. 4). At the time when these seminal nodes were collecting their first citations, the citation dynamics was substantially slower than nowadays, and this then manifests itself in their high τ3 and τ5. We shall see in Section 5.3 that the strong age bias of the HEP seminal nodes has further important implications.

Table 2
A comparison between the seminal nodes and all nodes. Here τ3 and τ5 are the mean times needed for the nodes to get their first 3 and 5 citations, respectively (ignoring the nodes that have fewer than 3 and 5 citations, respectively).
Dataset       Set of nodes   Median indegree   τ3           τ5
APS papers    All            5                 3.6 years    4.8 years
              Seminal        239               1.3 years    2.2 years
HEP papers    All            4                 3.1 years    3.9 years
              Seminal        88                7.6 years    12.4 years
US patents    All            3                 11.5 years   13.3 years
              Seminal        30                9.6 years    12.0 years

Table 3
The summary table of all metrics; our computation of each metric is based on the provided implementation reference. We also included age-normalized variants of the displayed metrics in the analysis (except for YCCP that already involves age-normalization). Except for the rescaled citation count (Newman, 2009) and the rescaled PageRank (Mariani et al., 2016), the rescaled variants of the remaining metrics have not been considered before.
Metric                              Abbreviation   Implementation reference
Citation count                      C              Newman (2010)
PageRank                            P              Chen et al. (2007)
CiteRank                            T              Walker et al. (2007)
LeaderRank                          L              Lü, Zhang, Yeung, and Zhou (2011)
H-index                             H              Hirsch (2005)
Directed collective influence       CI             Bovet and Makse (2019)
Semi-local centrality               SLC            Chen, Lü, Shang, Zhang, and Zhou (2012)
HITS authority                      HITS           Kleinberg (1999)
Yearly citation count percentile    YCCP           Leydesdorff et al. (2011)

[1] Retrieved from https://journals.aps.org/prl/50years/milestones on June 6, 2017.
[2] Retrieved from https://journals.aps.org/pre/collections/pre-milestones on June 6, 2017.
[3] Retrieved from https://journals.aps.org/125years on January 12, 2018.
[4] We downloaded the INSPIRE data on October 30, 2017 from https://inspirehep.net/dumps/inspire-dump.html.
[5] Retrieved from http://web.ihep.su/dbserv/compas/ on April 6, 2018.
4. Node ranking metrics
We use nine distinct network centrality metrics that are described below (see Table 3 for a summary), and their variants where the age bias of metrics has been removed by the rescaling procedure introduced in Mariani et al. (2016) (see Vaccario et al., 2017 for simultaneous removal of age and field bias by the rescaling procedure). In a dataset with time-stamped nodes, rescaling can be applied to any node ranking metric; see Section 4.10 for details. The input citation data are represented by the N × N adjacency matrix A whose element $A_{ij} = 1$ if node i cites node j and $A_{ij} = 0$ otherwise.
4.1. Citation count (C)
Citation count refers to the number of citations received by a given paper or patent. It is equivalent to node indegree in a directed network (Newman, 2010). For node i, citation count is defined as $C_i = \sum_{j=1}^{N} A_{ji}$. Based on the assumption that a node is important if it is cited by many other nodes, citation count is the simplest and the most widely used indicator of paper or patent impact in citation data. The metric’s simplicity directly translates into its low computational complexity.
4.2. PageRank (P)
PageRank (Brin & Page, 1998), which is a node centrality metric originally devised to rank pages in the World Wide Web and later applied to citation data to assess the significance of publications (Chen et al., 2007), introduces the importance of different nodes in a self-consistent manner: A node is important if it is cited by other important nodes. PageRank score $P_i$ of node i is defined by the set of equations (Berkhin, 2005)
$$P_i = \alpha \sum_{j:\,k_j^{\mathrm{out}}>0} \frac{A_{ji}}{k_j^{\mathrm{out}}}\, P_j + \alpha \sum_{j:\,k_j^{\mathrm{out}}=0} \frac{P_j}{N} + \frac{1-\alpha}{N} \qquad (1)$$
where $i = 1, \dots, N$, $k_j^{\mathrm{out}} = \sum_l A_{jl}$ is the outdegree of node j, $\alpha$ is the damping factor, and $(1-\alpha)/N$ is usually referred to as the teleportation term whose role is to ensure that Eq. (1) has a unique solution. While $\alpha = 0.85$ is used in the ranking of web pages (Berkhin, 2005), $\alpha = 0.5$ is typically used in the analysis of citation data (Chen et al., 2007). Eq. (1) is usually solved by iterations: starting from the uniform initial score $P_i^{(0)} = 1/N$, every node’s score is updated iteratively as (Berkhin, 2005)
$$P_i^{(n+1)} = \alpha \sum_{j:\,k_j^{\mathrm{out}}>0} \frac{A_{ji}}{k_j^{\mathrm{out}}}\, P_j^{(n)} + \alpha \sum_{j:\,k_j^{\mathrm{out}}=0} \frac{P_j^{(n)}}{N} + \frac{1-\alpha}{N} \qquad (2)$$
where n is the iteration number. We stop the iterations when the average score change is small enough, that is $\sum_{i=1}^{N} |P_i^{(n)} - P_i^{(n-1)}| / N < \varepsilon$ where $\varepsilon = 10^{-9}$. The same stopping condition is also used in other metrics that involve iterations (CiteRank and LeaderRank).
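The iteration in Eq. (2) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used for the reported results: the function name `pagerank`, the edge-list input format (pairs of 0-based indices `(citing, cited)` without duplicates), and the `max_iter` safeguard are assumptions made for the example.

```python
def pagerank(n_nodes, citations, alpha=0.5, eps=1e-9, max_iter=1000):
    """Iterate Eq. (2) on a citation network.

    citations: (citing, cited) pairs with 0-based indices (assumed format,
    each pair listed once so that A_ij is binary).
    """
    edges = list(citations)
    out_deg = [0] * n_nodes
    for citing, _ in edges:
        out_deg[citing] += 1

    scores = [1.0 / n_nodes] * n_nodes  # uniform initial score P_i(0) = 1/N
    for _ in range(max_iter):
        # Score held by nodes without references is spread evenly over all nodes.
        dangling = sum(s for s, k in zip(scores, out_deg) if k == 0)
        base = (alpha * dangling + 1.0 - alpha) / n_nodes
        new_scores = [base] * n_nodes
        for citing, cited in edges:
            new_scores[cited] += alpha * scores[citing] / out_deg[citing]
        # Stopping condition: average absolute score change below eps.
        change = sum(abs(a - b) for a, b in zip(new_scores, scores)) / n_nodes
        scores = new_scores
        if change < eps:
            break
    return scores
```

For a toy network where papers 0 and 1 both cite paper 2, `pagerank(3, [(0, 2), (1, 2)])` gives the cited paper the highest score while the scores still sum to one.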
4.3. CiteRank (T)
CiteRank (Walker et al., 2007) has been introduced to offset PageRank’s strong bias towards old nodes [note that in some cases, PageRank can also be biased towards recent nodes (Mariani, Medo, & Zhang, 2015)]. Using the representation of PageRank as a random walk on the citation network, CiteRank modifies the algorithm by initially distributing random walkers preferentially on recent nodes, with the old nodes being exponentially suppressed at a time scale $\tau$. Similarly to Eq. (2), CiteRank score $T_i$ for node i can be defined in an iterative way as
$$T_i^{(n+1)} = \alpha \sum_{j:\,k_j^{\mathrm{out}}>0} \frac{A_{ji}}{k_j^{\mathrm{out}}}\, T_j^{(n)} + \alpha \sum_{j:\,k_j^{\mathrm{out}}=0} \frac{T_j^{(n)}}{N} + (1-\alpha)\, \frac{\exp[-(t-t_i)/\tau]}{\sum_{j=1}^{N} \exp[-(t-t_j)/\tau]} \qquad (3)$$
where $t_i$ is the publication date of node i, t is the date when the scores are computed; other terms and parameters have the same meaning as for PageRank. To estimate suitable values for parameters $\alpha$ and $\tau$, we followed the procedure described in Walker et al. (2007) where the authors maximize the correlation between the CiteRank scores and the nodes’ recent indegree increase. The resulting parameter values are $\alpha = 0.50$, $\tau = 2.6$ years for APS; $\alpha = 0.50$, $\tau = 2.4$ years for HEP; and $\alpha = 0.44$, $\tau = 7.6$ years for PAT. Notably, the parameter values for APS are the same as reported in Walker et al. (2007) despite our APS dataset including 13 additional years and, consequently, 60% more papers.
4.4. LeaderRank (L)
The need for a teleportation term in PageRank can be eliminated by connecting each node to an artificial “ground” node with bidirectional links. The resulting parameter-free LeaderRank metric was proposed by Lü et al. (2011) to quantify node influence. The iterative equation for the LeaderRank score L is
$$L_i^{(n+1)} = \sum_{j=1}^{N+1} \frac{A_{ji}}{k_j^{\mathrm{out}}}\, L_j^{(n)} \qquad (4)$$
where both $A_{ji}$ and $k_j^{\mathrm{out}}$ include the ground node and the links between the ground node and all other nodes in the network. After obtaining the equilibrium scores $L_i^{(n_c)}$, the ground node is removed from the system and its score is evenly distributed among all real nodes. The final score of node i is thus defined as
$$L_i = L_i^{(n_c)} + \frac{L_g^{(n_c)}}{N} \qquad (5)$$
The redistribution of the ground node’s score does not affect the ranking of nodes by LeaderRank, though.
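Eqs. (4) and (5) can be illustrated with the following pure-Python sketch. It is an assumption-laden example, not the reference implementation: the function name `leaderrank`, the edge-list input, the `max_iter` cap, and the initial condition (unit score on every real node and zero on the ground node, as commonly used for LeaderRank) are choices made here for illustration.

```python
def leaderrank(n_nodes, citations, eps=1e-9, max_iter=1000):
    """Iterate Eq. (4) with an added ground node, then apply Eq. (5).

    citations: (citing, cited) pairs with 0-based indices (assumed format).
    """
    ground = n_nodes  # index of the artificial ground node
    edges = list(citations)
    for i in range(n_nodes):
        edges.append((i, ground))   # every real node points to the ground node
        edges.append((ground, i))   # and the ground node points back

    out_deg = [0] * (n_nodes + 1)
    for src, _ in edges:
        out_deg[src] += 1

    # Assumed initial condition: unit score on real nodes, zero on the ground node.
    scores = [1.0] * n_nodes + [0.0]
    for _ in range(max_iter):
        new_scores = [0.0] * (n_nodes + 1)
        for src, dst in edges:
            new_scores[dst] += scores[src] / out_deg[src]
        change = sum(abs(a - b) for a, b in zip(new_scores, scores)) / n_nodes
        scores = new_scores
        if change < eps:
            break

    # Eq. (5): distribute the ground node's score evenly among the real nodes.
    share = scores[ground] / n_nodes
    return [s + share for s in scores[:n_nodes]]
```

Because the ground node links to and from every node, no teleportation parameter is needed; the total score stays equal to the number of real nodes throughout the iterations.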
4.5. H-index (H)
H-index (Hirsch, 2005) was originally devised to characterize the academic impact of researchers based on their publications and citations (Hirsch, 2005, 2007). Just as it was later applied to evaluate research journals (Braun, Glänzel, & Schubert, 2006), it can also be adapted to evaluate research papers: The h-index of paper i is defined as the largest number $h_i$ such that paper i is cited by at least $h_i$ papers that each have at least $h_i$ citations (Lü, Zhou, Zhang, & Stanley, 2016; Schubert, 2008).
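The paper-level definition above can be made concrete with a short sketch. The function names `h_from_counts` and `paper_h_indices` and the edge-list input format are hypothetical and serve only to illustrate the definition.

```python
def h_from_counts(counts):
    """Largest h such that at least h of the given counts are >= h."""
    h = 0
    for rank, c in enumerate(sorted(counts, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def paper_h_indices(n_nodes, citations):
    """H-index of every paper, based on the citation counts of its citing papers.

    citations: (citing, cited) pairs with 0-based indices (assumed format).
    """
    indegree = [0] * n_nodes
    citers = [[] for _ in range(n_nodes)]
    for citing, cited in citations:
        indegree[cited] += 1
        citers[cited].append(citing)
    return [h_from_counts([indegree[c] for c in citers[i]]) for i in range(n_nodes)]
```

In a toy network where paper 0 is cited by papers 3 and 4, and each of those is in turn cited by papers 5 and 6, paper 0 has an h-index of 2, while papers 3 and 4 (cited only by uncited papers) have an h-index of 0.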
4.6. Collective Influence (CI)
Collective Influence was introduced in Morone and Makse (2015) to identify those nodes that, when removed, cause the biggest damage to a graph’s giant component; the algorithm is based on the classical problem of percolation in complex networks. The CI centrality of node i at level l is defined as
$$\mathrm{CI}_i^{l} = (k_i - 1) \sum_{j:\,d_{ij}=l} (k_j - 1) \qquad (6)$$
where $k_i$ is the degree of node i, $d_{ij}$ is the length of the shortest path between nodes i and j, and l is the metric’s parameter. In line with Bovet and Makse (2019), we consider only node indegree in the computation of CI as node indegree is indicative of node impact. Using node outdegree or combining in- and out-degree leads to inferior results in our evaluation. Distance $d_{ij}$ is computed so that it respects link directions. Our tests show that l = 1 and l = 2 produce the best results; the metric’s performance deteriorates as l increases further. We use l = 2 for all CI results presented here.
4.7. Semi-local centrality (SLC)
Semi-local centrality was proposed (Chen et al., 2012) as an extension of the purely local node degree (which is the simplest node centrality metric). It is semi-local in the sense of considering the node neighborhood up to the fourth order. The semi-local centrality score of node i is defined as
$$\mathrm{SLC}_i = \sum_{j \in \Gamma_i} Q_j, \qquad Q_j = \sum_{k \in \Gamma_j} N_k \qquad (7)$$
where $\Gamma_i$ is the set of the nearest neighbors of node i and $N_k$ is the number of the nearest and the next-nearest neighbors of node k. In this paper, we consider only the in-neighbors (i’s nearest in-neighbors are the nodes that cite node i) to leverage the impact of citations. If SLC is computed using out-neighbors or using both in- and out-neighbors, its performance (as measured by the metrics introduced in Sections 5.1 and 6.1) deteriorates.
4.8. Hyperlink-Induced Topic Search (HITS)
HITS (Kleinberg, 1999) is a seminal ranking algorithm that considers two roles for each node in the network, authority and hub. A good authority is pointed to by many hubs, and a good hub points to many authorities. The authority score of a node is equal to the sum of the hub scores of all nodes that point to this node, and the hub score of a node is equal to the sum of the authority scores of all nodes that this node points to. Mathematically, the authority score $a_i$ and the hub score $h_i$ of node i fulfill
$$a_i^{(n+1)} = \sum_{j=1}^{N} A_{ji}\, h_j^{(n)}, \qquad h_i^{(n+1)} = \sum_{j=1}^{N} A_{ij}\, a_j^{(n)} \qquad (8)$$
Both scores are normalized after each iteration so that the sum over all nodes is one for each score. The iterations stop when the average score change is small enough, that is, $\sum_{i=1}^{N} \left(|a_i^{(n)} - a_i^{(n-1)}| + |h_i^{(n)} - h_i^{(n-1)}|\right) / N < \varepsilon$ where $\varepsilon = 10^{-9}$. Of the two scores, authority is related to node impact in a citation network as it is derived from incoming links, as opposed to hub which is derived from outgoing links (references) to nodes of high authority. We thus consider the equilibrium authority values as the nodes’ HITS scores here.
4.9. Yearly citation count percentile (YCCP)
The use of percentiles in the ranking of papers has the advantage of avoiding working directly with citation counts that are typically broadly distributed, which makes it difficult to aggregate them (Leydesdorff et al., 2011) by, for example, averaging (such as computing the average citation count of the papers authored by an individual researcher). To reduce the age bias in the resulting ranking, we compute the citation count percentile of a node with respect to the citation counts of all nodes that have appeared in the same year as the target node. The nodes are finally ranked by their respective percentile ranks. Note that by comparing with nodes that appeared in the same year, this ranking metric already addresses the age bias; we thus do not consider a rescaled version of this metric. If the citation count percentile is computed with respect to all nodes regardless of their appearance time, the ranking is the same as the ranking by citation count, C.
4.10. Rescaled metric variants
To suppress the age bias of ranking metrics, we use the rescaling procedure proposed by Mariani et al. (2016); see Dunaiski et al. (2019b) for other approaches to score normalization. The rescaled score $R_i(m)$ for metric m and node i is computed as
$$R_i(m) = \frac{m_i - \mu_i(m)}{\sigma_i(m)} \qquad (9)$$
where $m_i$ is the original score of node i as produced by metric m, and $\mu_i(m)$ and $\sigma_i(m)$ are the metric mean and standard deviation, respectively, computed over nodes in a window centered at node i. Assuming that the nodes are sorted by their age/appearance time, the window around node i includes nodes $j \in [i - W/2, i + W/2]$ where the parameter W represents the window size. For the APS, HEP, and PAT data, we use W = 1000, W = 2000, and W = 15,000, respectively, which is roughly proportional to the number of nodes in each dataset.
As shown in Mariani et al. (2016, 2018) and Ren (2019), rescaling significantly reduces the magnitude of the age bias (and, in the case of Vaccario et al. (2017), of the age and field bias) of citation count and PageRank. We use this technique here to rescale all ranking metrics introduced above, and in turn compare their performance with original non-rescaled metrics. Rescaled metrics are marked by adding R at the beginning of their original labels (e.g., RP for rescaled PageRank). The effectiveness of the rescaling procedure in removing the age bias of respective metrics in the studied datasets is investigated in Appendix A where we find that rescaling indeed significantly reduces the age bias for almost all dataset-metric pairs (see Table A.5 for a summary).
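The rescaling of Eq. (9) can be sketched as follows. Several details are assumptions made for this illustration and are not specified in the text above: the function name `rescale`, truncating the window at the two ends of the age-ordered list, using the population standard deviation, and assigning a rescaled score of zero when all scores in a window are identical.

```python
from statistics import mean, pstdev


def rescale(scores, window):
    """Rescaled scores R_i(m) of Eq. (9).

    scores: metric values m_i, ordered by node appearance time.
    window: window size W; node i is compared with nodes i - W/2 .. i + W/2.
    """
    n = len(scores)
    half = window // 2
    rescaled = []
    for i, m_i in enumerate(scores):
        # Assumption: the window is simply truncated at the dataset boundaries.
        lo, hi = max(0, i - half), min(n, i + half + 1)
        peers = scores[lo:hi]
        mu = mean(peers)
        sigma = pstdev(peers)  # assumption: population standard deviation
        # Assumption: a zero rescaled score when the window has no variation.
        rescaled.append((m_i - mu) / sigma if sigma > 0 else 0.0)
    return rescaled
```

The slicing above costs O(N·W) operations, which is acceptable for a sketch; for networks with millions of nodes, running sums over the sliding window would be a natural optimization.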
5. Metric performance in ranking the seminal nodes (Task 1)
We first evaluate the ranking performance of metrics taking into account solely the ranking positions of the seminal nodes, in line with common information-retrieval practices (Dunaiski et al., 2018; Lü & Zhou, 2011; Manning, Raghavan, & Schütze, 2010; Radicchi et al., 2009).
5.1. Identification rate
Our basic evaluation procedure is based on a complete given network which is used as an input. We rank the network nodes by their score according to a given metric m and compute the fraction of the seminal nodes that are among the top zN nodes, $f_z(m)$. This quantity is commonly referred to as recall in the information filtering literature (Lü et al., 2012). To comply with previous research on rescaling (Mariani et al., 2016), and also to avoid confusion with a related age-dependent version of this metric (see the next paragraph), we use here the previously-coined term identification rate (IR) for $f_z(m)$. Note that $z \in (0, 1)$ is an evaluation parameter that, to reflect our goal of evaluating the ranking metrics by whether they rank the seminal nodes “highly”, should be a small number. We use z = 1% unless stated otherwise and later verify our main results using z = 0.5% and z = 2%.
Besides assessing the identification rate on a complete network, we also study the metrics’ performance as a function of the age of the seminal nodes (Mariani et al., 2016). To this end, we construct network snapshots at the end of each calendar year (ignoring all nodes and links that appear afterward), and rank the nodes in each network snapshot. This allows us to evaluate, individually for each seminal node, whether it was in the top z fraction of the ranking at any given age t. By averaging this over all seminal nodes [6], we obtain the identification rate $f_z(m, t)$ which is now a function of the age of seminal nodes. $f_z(m, 1\,\text{year})$, for example, is the fraction of the seminal nodes that are in the top z fraction of the ranking when they are one year old.
Task 1 focuses solely on the ranking positions of the seminal nodes, and these are reflected by $f_z(m)$ and $f_z(m, t)$. While the former evaluates the “final” ranking positions of the seminal nodes, the latter allows us to inspect how fast (or slowly) the seminal nodes rise in the rankings by the respective metric.
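The identification rate on a complete network can be computed as in the sketch below. The function name `identification_rate`, the rounding of zN to the nearest integer (with a minimum of one node), and the handling of ties (broken by node index through the stable sort) are assumptions introduced for the example.

```python
def identification_rate(scores, seminal, z=0.01):
    """Fraction of seminal nodes among the top z*N nodes ranked by score.

    scores: list of metric scores, one per node (index = node id).
    seminal: iterable of node ids of the seminal nodes.
    """
    n = len(scores)
    top_k = max(1, round(z * n))  # assumption: round zN, keep at least one node
    # Stable sort: nodes with equal scores keep their index order (assumption).
    ranking = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top = set(ranking[:top_k])
    seminal = list(seminal)
    return sum(1 for s in seminal if s in top) / len(seminal)
```

The age-dependent variant $f_z(m, t)$ follows the same logic: the function would be applied to the scores computed on each yearly network snapshot, restricted to the seminal nodes that have reached age t in that snapshot.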
5.2. Metric evaluation using identification rate
Fig. 1 shows the ranking metrics evaluated by their IR in complete datasets. Overall, the highest identification rates are found in APS, followed by HEP, and then by PAT. A likely reason for this is provided by Table 2 which shows that in the PAT data, median indegree differs the least between all nodes and the seminal nodes, thus making the seminal nodes in this dataset difficult to separate from the other nodes.
The relative standings of metrics are rather similar between APS and PAT with PageRank being the best-performing metric in both. Relative differences between the metrics in both datasets are rather small, though: In both datasets there are a few methods with nearly-identical performance, and the ratio between the best and the worst metric’s IR is around 1.5. The results are very different in HEP where: (1) LeaderRank (L) outperforms the second-best method by a wide margin, (2) the ratio between the best and the worst metric’s IR is 3.5, and (3) all rescaled metrics perform significantly worse than their unrescaled counterparts. We focus on metric evaluation in this section; reasons for the differences observed in HEP are discussed in Section 5.3.

Fig. 1. Metrics’ performance in identifying the seminal nodes as measured by the identification rate (z = 1%) in complete datasets. Note that the maximal displayed values differ between the panels. Colors of the bars are used to distinguish the original ranking metrics (white) and their age-rescaled counterparts (orange). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 2. The identification rate of individual metrics as a function of the seminal node age (in years). To facilitate the comparison, the metrics’ performance is normalized to the best metric in each age bin. A metric with zero IR thus receives zero score, while a metric that achieves the best IR for a given seminal node age receives the score of one.

[6] If a seminal node appears t years before the end of the complete dataset, its ranking at any age greater than t cannot be known. Seminal nodes that are younger than the considered age are therefore excluded from the averaging.
In summary, LeaderRank can be considered the best-performing metric in Task 1 as it is clearly best in the HEP data and nearly best in the APS and PAT data. This holds also when different evaluation thresholds, z = 0.5% and z = 2%, are used.
Fig. 2 shows the metrics’ relative performance as a function of the seminal node age. This approach serves to reveal the time evolution of the metrics’ ranking performance. To further facilitate the comparison of metrics, we normalize the metrics’ identification rate at a given age t of seminal nodes, f_z(m, t), by the best-achieved IR at this age, max_n f_z(n, t). Relative performance thus ranges from zero (when a metric’s IR is zero at age t) to one (achieved by the best-performing metric at age t). As shown in Fig. 2, the relative performance of metrics changes dramatically with the seminal node age: metrics that work well shortly after publication (mostly rescaled metrics) lose their advantage as the seminal nodes become older. In the displayed node age range, CiteRank and rescaled CiteRank (T and RT) are the two best-performing metrics in APS and PAT. For HEP, there is no single metric that performs well for most age values. Rescaled citation count (RC) is best until age 5, then h-index and collective influence (H and CI) are best until age 12, and finally semi-local centrality (SLC) is the best from then until age 20. LeaderRank (L), which performed best for the complete HEP dataset in Fig. 1, becomes the best metric later on (for comparison, the average age of the seminal nodes in the complete HEP dataset is 61 years).
To gain further insights into differences between the metrics, we evaluate their pairwise similarity using the Spearman rank correlation of all nodes’ rankings. The results are shown in Fig. 3 together with metric clustering based on the obtained correlation matrices. There are several points to note. First, the clustering of metrics is remarkably stable across the datasets. Second, the clusterings reveal two groups of metrics whose rankings are similar to each other. The larger group includes CI, SLC, P, L, C, and H. The smaller group includes some of their rescaled variants: RP, RL, RC, and RH. Third, RCI, RSLC, and RT do not cluster with other rescaled metrics, probably as a result of the rescaling procedure not working perfectly for them (see Figs. A.1–A.3 in Appendix A). Fourth, within each of the two mentioned clusters, the pairwise Spearman correlation values are rather high (above 0.73 in all three datasets), which indicates a high degree of similarity among the respective metrics.
Note that we have omitted HITS from the presentation of results above. The reason for doing so is that its performance is so much worse than that of the other metrics that the added value of displaying HITS in all previous figures would be very limited. In particular, the identification rate values of the HITS authority score are 0.143 (APS), 0.116 (HEP), and 0.054 (PAT) (see footnote 7). The poor performance of HITS here is very different from this algorithm being praised in the line of research on court decision citation networks (see Agnoloni & Pagallo, 2015; Fowler & Jeon, 2008 and the references therein). One possible reason for this difference is that in science, few would agree that a citation from a well-referenced but little cited review paper is more indicative of the target paper’s impact than a citation from a high-impact paper with few references (as HITS authority score would assume). In this sense, court decision citation networks may be intrinsically more favorable to HITS than the paper and patent citation networks are. Further research is necessary to understand structural differences between court decision citation networks and scholarly/patent citation networks. Also, a comprehensive evaluation of several ranking metrics can help us understand whether HITS is indeed the singular best-performing metric in court decision networks.
Footnote 7: HITS performance is strongly inferior to other metrics also in terms of the normalized identification rate introduced in Section 6.1.
Fig. 3. Similarity of the evaluated metrics as measured by the Spearman rank correlation of all node rank positions. The metrics’ hierarchical clusterings are obtained by the UPGMA method implemented by the clustermap function in Python’s Seaborn library.
We have similarly omitted yearly citation count percentile, YCCP, from the figures. Although the performance of YCCP does not lag behind the top metrics as much as the performance of HITS, the results are still significantly lower: the identification rate values of YCCP are 0.700 (APS), 0.197 (HEP), and 0.375 (PAT). Importantly, the IR results of YCCP are similar to the results of RC, which is expected as RC, like YCCP, is an age-normalized version of citation count. Because of this high level of similarity, we report the YCCP results only in text.
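The pairwise similarity analysis summarized in Fig. 3 rests on the Spearman rank correlation between the rankings produced by two metrics. A minimal pure-Python sketch of that correlation is given below; the function names and the toy scores are hypothetical, tied scores are assumed to share their average rank, and the hierarchical clustering step (UPGMA via Seaborn's clustermap, as stated in the Fig. 3 caption) is not reproduced here.

```python
def average_ranks(values):
    """Ranks (1 = smallest value); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_matrix(metric_scores):
    """Nested dict of Spearman correlations between all pairs of metrics."""
    names = sorted(metric_scores)
    return {a: {b: spearman(metric_scores[a], metric_scores[b]) for b in names}
            for a in names}


# Hypothetical scores of four nodes under three metrics (same node order in each list).
toy_metric_scores = {
    "C": [5, 3, 9, 1],
    "P": [0.5, 0.2, 0.9, 0.1],
    "AgeR": [4, 3, 1, 2],
}
toy_matrix = correlation_matrix(toy_metric_scores)
```

In this toy example the hypothetical metrics C and P order the nodes identically, so their correlation equals one even though the raw scores differ in scale.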
5.3. Caveats of identification rate
While metrics’ performance in Fig. 1 is strikingly uniform in the APS and PAT data, big differences are found in the HEP data. To explain where they come from, Fig. 4 shows the age distributions of the seminal nodes in the data. In terms of real time, the difference between APS/PAT and HEP is already apparent as the first two datasets have the average publication year of the seminal nodes 1976 and 1975, respectively, whereas it is 1957 for the HEP seminal nodes. The difference between the three datasets is more evident, though, when each seminal node is assigned to one of the 40 equally-sized age groups by its publication date (with groups 1 and 40 containing the oldest and the most recent nodes, respectively). The bottom row of Fig. 4 shows that the HEP seminal nodes are distributed extremely unevenly among the age groups with 74% of them (230 out of 310) in the oldest age group 1, and no seminal nodes in age groups 14–40. The big differences between the top and bottom row in Fig. 4 are due to the accelerating rates at which new nodes appear in the datasets. The numbers of recent new nodes are so high that they “push” the seminal nodes to the early age groups. In APS, for example, approximately 85% of all nodes appear after 1976 which is the mean publication year of the dataset’s seminal nodes.
The strong temporal non-uniformity of the seminal nodes has profound consequences. Firstly, it is not favorable to age-rescaled metrics which, by design, strive for a uniform representation of all age groups among the top-ranked nodes. For the HEP data, however, nodes from age groups 2–40 can contribute only marginally to the identification rate because there are only a few seminal nodes among them. By contrast, original non-rescaled metrics are typically biased towards old nodes (see Figs. A.1–A.3) and this gives them an advantage when a given set of seminal nodes shares the same bias towards old nodes. In particular, Fig. A.2 shows that the bias of LeaderRank towards old nodes is the strongest of all metrics in HEP, which directly contributes to the metric’s superior performance in Fig. 1.
Secondly, the age bias of the seminal nodes in HEP is so strong that it allows the simple ranking of nodes by their age (we refer to this metric as AgeR; old nodes are at the top) to outperform all other metrics. Its identification rate on the complete HEP data is 0.70 which is indeed better than the values shown in Fig. 1 for the other metrics. This is further illustrated by the left panel of Fig. 5 which shows the identification rate for a few selected metrics as a function of the seminal node age. Here AgeR yields zero identification rate when the seminal nodes are young (younger than 30 years) because it simply puts old nodes at the top of the ranking. However, the metric’s results quickly improve when the seminal nodes are older than that and AgeR becomes the best metric starting from age 40, approximately. This demonstrates that evaluating ranking metrics solely by the ranks that they assign to the seminal nodes is of limited relevance: AgeR, a metric that entirely ignores the actual impact of the nodes, is eventually able to outperform all other metrics.
Fig. 4. The distributions of the seminal nodes’ publication dates in the datasets: real time (top row), and 40 equally-sized age groups (bottom row).
Fig. 5. Performance of selected metrics in identifying the seminal nodes in the HEP data: A comparison between the identification rate (left) and the normalized identification rate (right; see Section 6.1 for the definition) measured as functions of the seminal node age.
In summary, the identification rates observed within Task 1 are a confounded outcome of a given metric’s ability to rank the seminal nodes well and the level of agreement between the metric’s biases and the biases implicitly present in the chosen set of seminal nodes. Note that until now, we discussed specifically the age bias because it is both manifestly present and easy to define and measure. Other potentially relevant biases (such as the field bias, for example) can in principle be studied and treated in a similar way as we do here for the age bias.
6. Metric performance in ranking the seminal nodes whilst penalizing biased metrics (Task 2)
Having demonstrated the caveats of evaluating the ranking performance of metrics using identification rate, we now proceed to Task 2 that additionally penalizes biased metrics. To this end, we employ the normalized identification rate introduced in Mariani et al. (2016) which imposes a penalty on metrics that are biased.
Fig. 6. An illustration for the alternative interpretation of Task 2: The age distribution of the seminal expert-selected nodes and the top z fraction of nodes from each age group (we use here four age groups as an example).
6.1. Normalized identification rate
Normalized identification rate (NIR) introduced in Mariani et al. (2016) considers the age distribution of the top-ranked nodes and applies a penalty factor to the identified seminal nodes that come from age groups that are over-represented among the top-ranked nodes. To compute NIR, we divide all N network nodes by age into G groups of equal size, and compute N_z(g) which is the number of nodes from each group g (g = 1, ..., G) that are in the top z fraction of the ranking. An age-unbiased metric would result in N_U := zN/G top nodes, on average, in each age group. For any age group g that is “over-represented” (that is, N_z(g) > N_U), the seminal nodes that are in the top z fraction of the ranking do not contribute to the NIR fully but only proportionally to N_U/N_z(g). If, for example, a seminal node is from an age group that is twice as frequent in the top of the ranking as it should be, this seminal node contributes only half to the NIR. By contrast, seminal nodes from under-represented age groups (N_z(g) < N_U) contribute to the NIR in the same way as they contribute to the IR. In other words, NIR assumes a penalty for seminal nodes from over-represented age groups but no bonus for seminal nodes from under-represented age groups. To summarize, the factor N_U/N_z(g) introduced by the normalized identification rate can be viewed as a penalty for the performance gained by age bias of a metric.
The choice of the number of age groups G used in the computation of NIR is a compromise between improving the temporal resolution (lowering the time duration of each group) by increasing G and limiting the natural statistical variability of N_z(g) by keeping G low. We use G = 40 adopted in previous literature (Mariani et al., 2016; Vaccario et al., 2017); other choices lead to qualitatively similar results. Note that due to the introduction of a penalizing factor, NIR cannot be higher than IR for a given ranking. The highest possible NIR of one is achieved by a ranking that places all seminal nodes in the chosen top fraction of the ranking (we use top 1% here, unless stated otherwise) and the ranking is not biased by node age (or, at least, the age bins containing the seminal nodes are not over-represented in the top of the ranking).
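The computation of NIR described above can be sketched in a few lines. This is a minimal illustration, not the code behind the reported results; the function name, the rounding of zN, the tie-breaking by node id, and the toy rankings are assumptions.

```python
def normalized_identification_rate(scores, age_order, seminal, z=0.01, n_groups=40):
    """
    scores:    dict mapping node -> score assigned by the evaluated metric
    age_order: list of all nodes sorted from the oldest to the most recent
    seminal:   collection of seminal (expert-selected) nodes
    """
    n = len(age_order)
    # Divide the N nodes by age into G groups of (nearly) equal size.
    group = {node: i * n_groups // n for i, node in enumerate(age_order)}
    n_top = max(1, round(z * n))
    ranking = sorted(scores, key=lambda node: (-scores[node], node))
    top = set(ranking[:n_top])
    # N_z(g): number of top-ranked nodes that come from age group g.
    n_z = [0] * n_groups
    for node in top:
        n_z[group[node]] += 1
    n_u = z * n / n_groups  # N_U: expected count per group for an unbiased metric
    total = 0.0
    for s in seminal:
        if s in top:
            # Penalty N_U / N_z(g) for over-represented groups, no bonus otherwise.
            total += min(1.0, n_u / n_z[group[s]])
    return total / len(seminal)


# Hypothetical toy setting: 400 nodes (id 0 is the oldest), four age groups, z = 5%.
toy_nodes = list(range(400))
toy_seminal = {0, 1, 2, 3}
age_like = {i: -i for i in toy_nodes}             # the oldest nodes receive the top scores
unbiased = {i: -(i % 100) for i in toy_nodes}     # five top nodes in every age group
nir_age_like = normalized_identification_rate(age_like, toy_nodes, toy_seminal, 0.05, 4)
nir_unbiased = normalized_identification_rate(unbiased, toy_nodes, toy_seminal, 0.05, 4)
```

Both toy rankings place all four hypothetical seminal nodes in the top 5%, so their IR is one in both cases. The age-like ranking, however, fills the top list with 20 nodes from the oldest group where N_U = 5 are expected, and its NIR drops to 0.25, whereas the unbiased ranking keeps an NIR of one.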
The right panel of Fig. 5 shows the normalized identification rate as a function of the seminal node age for a small number of selected metrics. It shows that using NIR indeed solves the problem encountered when metric performance is measured using the ordinary identification rate (left panel in Fig. 5). In particular, AgeR becomes the worst method regardless of the seminal node age, as appropriate for a ranking method that actually ignores node impact in the network. LeaderRank, another metric that is strongly biased towards old nodes, is also strongly affected: its performance starts to decrease at the seminal node age of 10 years instead of growing monotonically as it does when identification rate is used. This illustrates that the use of the normalized identification rate weakens the mutually reinforcing link between age-biased metrics and age-biased sets of seminal nodes.
There is also an alternative view that leads to Task 2 and the normalized identification rate as the appropriate evaluation methods. This view is based on realizing that the seminal nodes are necessarily a subset of high-quality nodes, the proverbial “tip of an iceberg”. The short text introducing the Physical Review Letters milestones explicitly acknowledges that “It is inevitable that some very important work will not be featured (in the milestones collection)”. We define the best z fraction of nodes in each age group as the high-quality nodes, but the problem is that we do not know which are those “best” nodes. That is why we still need the seminal nodes but we do not view them as a definite and only target for the evaluated ranking metrics (which would correspond to Task 1). Instead, we recognize that the seminal nodes are a particular sample from all high-quality nodes in the dataset. This is illustrated by Fig. 6 where the number of seminal nodes varies greatly among the age groups but the number of the top z fraction of nodes is naturally constant. If a ranking metric over-represents a certain age group in its top z fraction of the ranking by factor X > 1, only the fraction 1/X of the top nodes from this age group are among the top z fraction of nodes from this age group. That is why the factor 1/X needs to be applied to any identified seminal nodes from this age group, which is precisely what the normalized identification rate does. In summary, Task 2 and the normalized identification rate correspond also to the task of ranking highly the best nodes from each age group, from which the given seminal nodes are a potentially biased sample.
Fig. 7. Metrics’ performance in identifying the seminal nodes as measured by NIR evaluated on the complete datasets.
Table 4
A summary of the metrics evaluation by normalized identification rate. The average score of metric m is obtained by computing its score relative to the best-performing metric in each dataset, NIR(m)/max_n NIR(n), and averaging this score over the three analyzed datasets. A metric that would perform best in all datasets would therefore achieve the average score of one. The subsequent rows show the ranking of metrics by their NIR for each dataset. The bottom part of the table shows the average score based on the NIR values for two different top ranking fractions z, 0.5% and 2%.
Metric                 RP    RL    RC    RH    RCI   RSLC  T     RT    C     H     CI    P     L     SLC   HITS  RHITS
Avg score              0.98  0.98  0.89  0.81  0.80  0.75  0.74  0.70  0.69  0.63  0.58  0.57  0.54  0.53  0.27  0.25
Rank APS               1     2     5     4     7     9     3     6     8     10    12    11    13    14    15    16
Rank HEP               4     1     5     6     3     2     11    12    9     7     10    15    16    8     14    13
Rank PAT               1     3     2     4     5     9     8     11    10    12    13    6     7     14    15    16
Avg score (z = 0.005)  0.96  0.96  0.78  0.73  0.72  0.68  0.65  0.63  0.60  0.65  0.53  0.55  0.49  0.46  0.20  0.21
Avg score (z = 0.02)   0.93  0.94  0.89  0.81  0.82  0.71  0.71  0.72  0.66  0.62  0.59  0.56  0.52  0.52  0.28  0.24
6.2. Metric evaluation using normalized identification rate
NIR values on the complete datasets are shown in Fig. 7. The first thing to note is that the rescaled metrics generally perform better than their original counterparts here. The only exception is T which itself contains a mechanism to prevent the metric from being overly biased towards old nodes, so an additional rescaling procedure is in some sense superfluous (even though Mariani et al. (2016) and Figs. A.1–A.3 show that CiteRank still displays strong age bias in some age groups). The second observation is that the NIR values for the HEP dataset are much lower than the previously reported IR values. This is a direct consequence of two effects: NIR heavily penalizes biased ranking metrics and, at the same time, unbiased ranking metrics are unable to identify the biased seminal nodes well. In line with the identification rate results in Task 1, YCCP performs similarly to RC: its NIR values are 0.678 (APS), 0.175 (HEP), and 0.348 (PAT); its average score, computed in the same way as for the other metrics in Table 4, is 0.89.
The most important finding emerging from Fig. 7 is that upon adopting the normalized identification rate for the evaluation, metrics that perform well in all three datasets emerge. This is clearly visible in Table 4 where the relative performance of the ranking metrics in all three datasets is summarized. We see that rescaled PageRank, RP, and rescaled LeaderRank, RL, perform best or nearly best in all three datasets (recall that LeaderRank is a modification of PageRank obtained by changing the teleportation term). This result is robust with respect to changing the ranking fraction z that we use to evaluate the normalized identification rate: even when the relative order of some metrics changes, RP and RL remain the two best metrics by some margin. It is interesting to note here that while the two top metrics are global in the sense of taking the whole network structure into account, they are followed directly by rescaled citation count, RC, which is a local metric that is based only on the immediate node neighborhood. Semi-local metrics, such as the semi-local centrality SLC and the collective influence CI, regardless of whether they are rescaled or not, combine the worst of both worlds: they are computationally more demanding than local metrics, and they rank nodes worse than local metrics.
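The average score used to summarize the metrics in Table 4, NIR(m)/max_n NIR(n) averaged over the datasets, can be written down directly. The sketch below uses made-up NIR values for two hypothetical metrics X and Y in two hypothetical datasets A and B; none of the numbers come from the paper.

```python
def average_relative_score(nir):
    """
    nir: dict mapping dataset -> dict mapping metric -> NIR value.
    Returns a dict mapping metric -> score relative to the best metric,
    averaged over all datasets.
    """
    metrics = list(next(iter(nir.values())).keys())
    averages = {}
    for m in metrics:
        # Score of m relative to the best-performing metric in each dataset.
        relative = [nir[d][m] / max(nir[d].values()) for d in nir]
        averages[m] = sum(relative) / len(relative)
    return averages


# Made-up NIR values, for illustration only.
toy_nir = {
    "A": {"X": 0.5, "Y": 0.25},
    "B": {"X": 0.1, "Y": 0.4},
}
toy_average = average_relative_score(toy_nir)
```

Because each dataset is normalized by its own best metric, a dataset with uniformly low NIR values (as HEP here) carries the same weight in the average as a dataset with high values.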
Further insights can be gained by plotting NIR as a function of the seminal node age in Fig. A.4, similarly as we did for the identification rate, IR, in Fig. 2. We see there, for example, that unlike in the APS data, rescaled PageRank does not outperform rescaled citation count in the HEP data in the first 18 years of seminal node age.
7. Discussion
The previously introduced normalized identification rate (NIR) takes into account both the ranking positions of expert-selected nodes and the metric’s bias that manifests itself in the ranking. We use NIR to uncover the consistent performance of impact ranking metrics across different citation datasets. Our results indicate that ranking based on the network structure is more successful than simple degree-based metrics in singling out the significant nodes.
7.1. Limitations and open directions
There are various questions that remain open for further research. First, our analysis can be extended to more than one kind of bias (such as the age and field bias common in scholarly citation data; Vaccario et al., 2017) and generalized to cases where the kind of bias is not explicitly known. Second, identification rate (referred to as recall in information filtering and statistical learning) is just one of various performance metrics (see Dunaiski et al., 2018 for an overview of possibilities); we thus need to study how to account for bias in these other metrics. Third, besides metric evaluation on real data, evaluation on synthetic model data can be used to gain further theoretical insights. This approach has the advantage that various model components can be arbitrarily turned on and off, which makes it possible to identify which of them are crucial for the observed metric performance. For example, which assumptions about papers written by multiple authors must be fulfilled in order for a specific researcher-impact metric to reflect well the individual authors’ contributions? Such models need to be informed by analyses of real empirical datasets and, conversely, the assumptions and effects identified as crucial in model data can be in turn validated in real datasets. Finally, there are other citation datasets that can be used for a similar analysis upon identifying corresponding sets of seminal nodes for them.
To draw a parallel, in a machine learning problem, if the training set has some intrinsic bias, the system learns this bias and in turn produces biased outcomes (Lloyd, 2018; Raghavendra, Cerutti, & Preece, 2018). This is similar to our situation where a potential bias in the used set of seminal nodes, if left unchecked, can lead to wrong ranking metrics being believed to perform best, or even to the design of new ranking metrics that perform well only thanks to the bias. Biased outcomes of those metrics can in turn misguide our future evaluations and decisions. Further research into various aspects of bias in data mining and complex systems research, in particular how to avoid it, is therefore vital.
7.2. Management implications
Ranking and prioritizing is an essential task in many managerial applications. Our results show that to rank papers, age-rescaled PageRank is a well-performing metric that by construction produces rankings with little residual age bias. Evaluation of researchers and institutes typically uses metrics derived from citation counts (such as the h-index, for example). Our analysis, in particular the performance gap between age-rescaled PageRank and the tested unbiased versions of citation count, suggests that applying structural network-based metrics such as PageRank might be of advantage also when the objective is to rank researchers or institutions.
8. Conclusion
Well-designed robust evaluation protocols are crucial for understanding which ranking metrics, of which we have an abundance (Liao et al., 2017; Waltman, 2016), perform well in which contexts. This study shows that the evaluation of a ranking metric based solely on the positions of expert-selected nodes in the resulting ranking is difficult to interpret because it confounds two aspects: the metric’s ranking performance and the degree to which the biases of the expert-selected nodes overlap with the metric’s biases. Normalized identification rate weakens the link between the ranking bias and the evaluation results, and yields results that are consistent across different datasets. In our case of ranking seminal nodes in citation networks, we find that age-rescaled PageRank and age-rescaled LeaderRank (note that LeaderRank is a close variant of PageRank) are the two best-performing metrics by a wide margin.
Our work deepens the understanding of impact metrics, especially in relation to the interplay between their biases and the biases of the considered test set. Comprehensive comparisons among various metrics are crucial to cope with the ever-growing number of new metrics and beneficial for understanding the advantages and limitations of each of them. The proposed evaluation framework which penalizes biased metrics has general applicability beyond the ranking of articles in citation data; by highlighting the various roles of bias, it provides practical lessons for the ranking practice in other datasets with bias, such as technological networks, social networks, and other systems.
Authors’ contribution
Shuqi Xu: Contributed data or analysis tools; Performed the analysis; Wrote the paper.
Manuel Sebastian Mariani: Conceived and designed the analysis; Wrote the paper.
Linyuan Lü: Conceived and designed the analysis.
Matúš Medo: Conceived and designed the analysis; Collected the data; Contributed data or analysis tools; Wrote the paper.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 11622538, 61673150, 11850410444), the Science Strength Promotion Program of the UESTC, and the Zhejiang Provincial Natural Science Foundation of China (Grant no. LR16A050001). MSM acknowledges financial support from the University of Zurich through the URPP Social Networks, the Swiss National Science Foundation (Grant No. 200021-182659), and the UESTC professor research start-up (Grant No. ZYGX2018KYQD21).
Appendix A. Evaluation of the age bias removal
Figs. A.1–A.3 review the age bias of individual ranking metrics in the three analyzed datasets. Using the usual division of all nodes in 40 equally-sized groups by age, the figures show the number of nodes from each age group, $N_{1\%}(g)$, in the top 1% of the ranking by each respective ranking metric. An age-unbiased metric would thus display a flat histogram where deviations from the perfectly uniform value $N_U = 0.01N/40$ in each age bin would be only of statistical nature. As in Mariani et al. (2016), we measure the level of age bias in each histogram by comparing the observed standard deviation

$$\sigma = \sqrt{\frac{1}{40}\sum_{g=1}^{40}\left[N_{1\%}(g) - N_U\right]^2}$$

with the average standard deviation $\sigma_0$ that results from distributing $0.01N$ nodes among the 40 age bins in a random (and therefore unbiased) way. When the bias strength is measured by $\sigma/\sigma_0$, a value of around one (or less) indicates that the observed level of bias can be explained by statistical fluctuations only. The higher the value, the stronger the bias. The values of $\sigma/\sigma_0$ corresponding to the histograms in Figs. A.1–A.3 are summarized in Table A.5.
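The bias strength defined in this appendix, the ratio of the observed standard deviation of the age histogram to its average value under random placement, can be estimated by simulation. The sketch below is an illustration under stated assumptions: the random baseline places each top node into an age bin uniformly at random, the number of simulated histograms and the seed are arbitrary, and the function name and example histograms are hypothetical.

```python
import random


def bias_ratio(counts, n_sim=200, seed=0):
    """
    counts: list with the number of top-ranked nodes in each age group.
    Returns sigma / sigma_0, where sigma_0 is estimated from n_sim random histograms.
    """
    g = len(counts)
    total = sum(counts)
    n_u = total / g  # perfectly uniform value per age bin

    def sigma(c):
        # Standard deviation of the histogram around the uniform value.
        return (sum((x - n_u) ** 2 for x in c) / g) ** 0.5

    rng = random.Random(seed)
    simulated = []
    for _ in range(n_sim):
        random_counts = [0] * g
        for _ in range(total):
            # Unbiased placement: every age bin is equally likely (assumption).
            random_counts[rng.randrange(g)] += 1
        simulated.append(sigma(random_counts))
    sigma_0 = sum(simulated) / n_sim
    return sigma(counts) / sigma_0


# Hypothetical histograms of 400 top nodes over 40 age groups.
flat_ratio = bias_ratio([10] * 40)                 # perfectly flat histogram
concentrated_ratio = bias_ratio([400] + [0] * 39)  # all top nodes in the oldest group
```

A perfectly flat histogram gives a ratio of zero, while placing every top node in a single age group yields a ratio far above one, in line with the interpretation that values around one or less are compatible with statistical fluctuations.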
Fig. A.1. Visualization of the age bias of original and rescaled metrics for the APS data.
Fig. A.2. Visualization of the age bias of original and rescaled metrics for the HEP data.
Table A.5
Quantification of the age bias magnitude of respective ranking metrics in Figs. A.1–A.3 using σ/σ_0 (the higher the value, the stronger the age bias; the value of one indicates zero bias).
Metric  APS                 HEP                 PAT
        Original  Rescaled  Original  Rescaled  Original  Rescaled
P       15.7      1.4       22.1      1.9       36.1      6.2
C       7.5       1.4       9.7       1.9       46.6      7.4
T       14.2      4.1       10.5      8.6       42.0      40.7
H       8.4       1.1       11.0      2.3       48.8      11.9
L       23.7      1.4       29.5      1.8       41.3      6.6
CI      9.9       2.8       12.5      3.3       49.6      17.9
SLC     11.5      2.1       13.8      3.1       50.6      18.6
HITS    6.0       5.0       10.4      8.9       28.2      108.3
Fig. A.3. Visualization of the age bias of original and rescaled metrics for the PAT data. The failure of weakening the age bias of the HITS authority score by rescaling is due to the authority scores “concentrating” on a small fraction of nodes with the remaining nodes having asymptotically zero score, which poses obvious problems to the rescaling procedure based on computing score mean and standard deviation in a finite moving time window.
Fig. A.4. The normalized identification rate of individual metrics as a function of the seminal node age (in years). To facilitate the comparison, the metrics’ performance is normalized to the best metric in each age bin in the same way as in Fig. 2. A metric with zero NIR thus receives zero score, while a metric that achieves the best NIR for a given seminal node age receives the score of one.
References
Agnoloni, Tommaso, & Pagallo, Ugo. (2015). The case law of the Italian constitutional court, its power laws, and the web of scholarly opinions. Proceedings of the 15th international conference on artificial intelligence and law, 151–155.
Alonso, Sergio, Cabrerizo, Francisco Javier, Herrera-Viedma, Enrique, & Herrera, Francisco. (2009). h-index: A review focused in its variants, computation and standardization for different scientific fields. Journal of Informetrics, 3(4), 273–289.
Berkhin, Pavel. (2005). A survey on PageRank computing. Internet Mathematics, 2(1), 73–120.
Bornmann, Lutz, & Daniel, Hans-Dieter. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45–80.
Bornmann, Lutz, & Marx, Werner. (2015). Methods for the generation of normalized citation impact scores in bibliometrics: Which method best reflects the judgements of experts? Journal of Informetrics, 9(2), 408–418.
Bornmann, Lutz, Leydesdorff, Loet, & Mutz, Rüdiger. (2013). The use of percentiles and percentile rank classes in the analysis of bibliometric data: Opportunities and limits. Journal of Informetrics, 7(1), 158–165.
Bovet, Alexandre, & Makse, Hernán A. (2019). Influence of fake news in Twitter during the 2016 US presidential election. Nature Communications, 10(1), 7.
Braun, Tibor, Glänzel, Wolfgang, & Schubert, András. (2006). A Hirsch-type index for journals. Scientometrics, 69(1), 169–173.
Brin, Sergey, & Page, Lawrence. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7), 107–117.
Charlton, Bruce G., & Andras, Peter. (2007). Evaluating universities using simple scientometric research-output metrics: Total citation counts per university for a retrospective seven-year rolling sample. Science and Public Policy, 34(8), 555–563.
Chen, Duanbing, Lü, Linyuan, Shang, Ming-Sheng, Zhang, Yi-Cheng, & Zhou, Tao. (2012). Identifying influential nodes in complex networks. Physica A: Statistical Mechanics and Its Applications, 391(4), 1777–1787.
Chen, Peng, Xie, Huafeng, Maslov, Sergei, & Redner, Sidney. (2007). Finding scientific gems with Google’s PageRank algorithm. Journal of Informetrics, 1(1), 8–15.
Dunaiski, Marcel, Visser, Willem, & Geldenhuys, Jaco. (2016). Evaluating paper and author ranking algorithms using impact and contribution awards. Journal of Informetrics, 10(2), 392–407.
Dunaiski, Marcel, Geldenhuys, Jaco, & Visser, Willem. (2018). How to evaluate rankings of academic entities using test data. Journal of Informetrics, 12(3), 631–655.
Dunaiski, Marcel, Geldenhuys, Jaco, & Visser, Willem. (2019a). Globalised vs averaged: Bias and ranking performance on the author level. Journal of Informetrics, 13(1), 299–313.
Dunaiski, Marcel, Geldenhuys, Jaco, & Visser, Willem. (2019b). On the interplay between normalisation, bias, and performance of paper impact metrics. Journal of Informetrics, 13(1), 270–290.
Fowler, James H., & Jeon, Sangick. (2008). The authority of supreme court precedent. Social Networks, 30(1), 16–30.
González-Pereira, Borja, Guerrero-Bote, Vicente P., & Moya-Anegón, Félix. (2010). A new approach to the metric of journals’ scientific prestige: The SJR indicator. Journal of Informetrics, 4(3), 379–391.
Harzing, Anne-Wil, & Wal, Ron Van Der. (2009). A Google Scholar h-index for journals: An alternative metric to measure journal impact in economics and business. Journal of the American Society for Information Science and Technology, 60(1), 41–46.
Hicks, Diana, Wouters, Paul, Waltman, Ludo, de Rijcke, Sarah, & Rafols, Ismael. (2015). Bibliometrics: The Leiden Manifesto for research metrics. Nature, 520, 429–431.
Hirsch, Jorge E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569–16572.
Hirsch, Jorge E. (2007). Does the h index have predictive power? Proceedings of the National Academy of Sciences of the United States of America, 104(49), 19193–19198.
Kleinberg, Jon M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.
Kogan, Leonid, Papanikolaou, Dimitris, Seru, Amit, & Stoffman, Noah. (2017). Technological innovation, resource allocation, and growth. The Quarterly Journal of Economics, 132(2), 665–712.
Leydesdorff, Loet, Bornmann, Lutz, Mutz, Rüdiger, & Opthof, Tobias. (2011). Turning the tables on citation analysis one more time: Principles for comparing sets of documents. Journal of the American Society for Information Science and Technology, 62(7), 1370–1381.
Leydesdorff, Loet, Bornmann, Lutz, & Opthof, Tobias. (2018). hα: The scientist as chimpanzee or bonobo. Scientometrics, 1–4.
Liao, Hao, Mariani, Manuel Sebastian, Medo, Matúš, Zhang, Yi-Cheng, & Zhou, Ming-Yang. (2017). Ranking in evolving complex networks. Physics Reports, 689, 1–54.
Lloyd, Kirsten. (2018). Bias amplification in artificial intelligence systems. arXiv:1809.07842
Lü, Linyuan, & Zhou, Tao. (2011). Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and Its Applications, 390(6), 1150–1170.
Lü, Linyuan, Zhang, Yi-Cheng, Yeung, Chi Ho, & Zhou, Tao. (2011). Leaders in social networks, the Delicious case. PLoS ONE, 6(6), e21202.
Lü, Linyuan, Medo, Matúš, Yeung, Chi Ho, Zhang, Yi-Cheng, Zhang, Zi-Ke, & Zhou, Tao. (2012). Recommender systems. Physics Reports, 519(1), 1–49.
Lü, Linyuan, Zhou, Tao, Zhang, Qian-Ming, & Stanley, H. Eugene. (2016). The h-index of a network node and its relation to degree and coreness. Nature Communications, 7, 10168.
Lundberg, Jonas. (2007). Lifting the crown – citation z-score. Journal of Informetrics, 1(2), 145–154.
Manning, Christopher, Raghavan, Prabhakar, & Schütze, Hinrich. (2010). Introduction to information retrieval. Natural Language Engineering, 16(1), 100–103.
Mariani, Manuel Sebastian, Medo, Matúš, & Zhang, Yi-Cheng. (2015). Ranking nodes in growing networks: When PageRank fails. Scientific Reports, 5, 16181.
Mariani, Manuel Sebastian, Medo, Matúš, & Zhang, Yi-Cheng. (2016). Identification of milestone papers through time-balanced network centrality. Journal of Informetrics, 10(4), 1207–1223.
Mariani, Manuel Sebastian, Medo, Matúš, & Lafond, François. (2018). Early identification of important patents: Design and validation of citation network metrics. Technological Forecasting and Social Change, http://dx.doi.org/10.1016/j.techfore.2018.01.036
Martin, Travis, Ball, Brian, Karrer, Brian, & Newman, M. E. J. (2013). Coauthorship and citation patterns in the Physical Review. Physical Review E, 88(1), 012814.
Mattedi, Marcos Antônio, & Spiess, Maiko Rafael. (2017). The evaluation of scientific productivity. História, Ciências, Saúde-Manguinhos, 24(3), 623–643.
Medo, Matúš, & Cimini, Giulio. (2016). Model-based evaluation of scientific impact indicators. Physical Review E, 94(3), 032312.
Mingers, John, & Leydesdorff, Loet. (2015). A review of theory and practice in scientometrics. European Journal of Operational Research, 246(1), 1–19.
Morone, Flaviano, & Makse, Hernán A. (2015). Influence maximization in complex networks through optimal percolation. Nature, 527(7579), 544.
Mutz, Rüdiger, & Daniel, Hans-Dieter. (2012). The generalized propensity score methodology for estimating unbiased journal impact factors. Scientometrics, 92(2), 377–390.
Newman, Mark. (2010). Networks: An introduction. Oxford University Press.
Newman, Mark E. J. (2009). The first-mover advantage in scientific publication. EPL (Europhysics Letters), 86(6), 68001.
Nickerson, Kyle L., Chen, Yuanzhu, Wang, Feng, & Hu, Ting. (2018). Measuring evolvability and accessibility using the hyperlink-induced topic search algorithm. Proceedings of the genetic and evolutionary computation conference, 1175–1182.
Radicchi, Filippo, Fortunato, Santo, Markines, Benjamin, & Vespignani, Alessandro. (2009). Diffusion of scientific credits and the ranking of scientists. Physical Review E, 80(5), 056103.
Raghavendra, Ramya, Cerutti, Federico, & Preece, Alu. (2018). When data lie: Fairness and robustness in contested environments. In Next-generation analyst VI (p. 106530U). International Society for Optics and Photonics.
Ren, Zhuo-Ming. (2019). Age preference of metrics for identifying significant nodes in growing citation networks. Physica A: Statistical Mechanics and its Applications, 513, 325–332.
Ren, Zhuo-Ming, Mariani, Manuel Sebastian, Zhang, Yi-Cheng, & Medo, Matúš. (2018). Randomizing growing networks with a time-respecting null model. Physical Review E, 97(5), 052311.
de Rijcke, Sarah, Wouters, Paul F., Rushforth, Alex D., Franssen, Thomas P., & Hammarfelt, Björn. (2016). Evaluation practices and effects of indicator use – A literature review. Research Evaluation, 25(2), 161–169.
Schubert, András. (2008). Using the h-index for assessing single publications. Scientometrics, 78(3), 559–565.
Strumsky, Deborah, & Lobo, José. (2015). Identifying the sources of technological novelty in the process of invention. Research Policy, 44(8), 1445–1461.
Todeschini, Roberto, & Baccini, Alberto. (2016). Handbook of bibliometric indicators: Quantitative tools for studying and evaluating research. John Wiley & Sons.
Vaccario, Giacomo, Medo, Matúš, Wider, Nicolas, & Mariani, Manuel Sebastian. (2017). Quantifying and suppressing ranking bias in a large citation network. Journal of Informetrics, 11(3), 766–782.
Walker, Dylan, Xie, Huafeng, Yan, Koon-Kiu, & Maslov, Sergei. (2007). Ranking scientific publications using a model of network traffic. Journal of Statistical Mechanics: Theory and Experiment, 2007(06), P06010.
Waltman, Ludo. (2016). A review of the literature on citation impact indicators. Journal of Informetrics, 10(2), 365–391.
Waltman, Ludo, & Yan, Erjia. (2014). Pagerank-related methods for analyzing citation networks. Measuring scholarly impact, 83–100.
Wasserman, Max, Zeng, Xiao Han T., & Amaral, Luís A. Nunes. (2015). Cross-evaluation of metrics to estimate the significance of creative works. Proceedings of the National Academy of Sciences of the United States of America, 112(5), 1281–1286.
West, Jevin D., Jensen, Michael C., Dandrea, Ralph J., Gordon, Gregory J., & Bergstrom, Carl T. (2013). Author-level eigenfactor metrics: Evaluating the influence of authors, institutions, and countries within the social science research network community. Journal of the American Society for Information Science and Technology, 64(4), 787–801.
Zeng, An, Shen, Zhesi, Zhou, Jianlin, Wu, Jinshan, Fan, Ying, Wang, Yougui, et al. (2017). The science of science: From the perspective of complex systems. Physics Reports.
Zhou, Yan-Bo, Lü, Linyuan, & Li, Menghui. (2012). Quantifying the influence of scientists and their publications: Distinguishing between prestige and popularity. New Journal of Physics, 14(3), 033033.