Identifying prize-winning scientists by a competition-aware ranking


Yuhao Zhou (a), Ruijie Wang (a), An Zeng (b,∗), Yi-Cheng Zhang (a)

(a) Department of Physics, University of Fribourg, Fribourg 1700, Switzerland
(b) School of Systems Science, Beijing Normal University, Beijing, 100875, PR China

Evaluating scholars' achievements is an important problem in the science of science, with applications in the evaluation of grant proposals and promotion applications. Since the number of scholars and the number of scholarly outputs grow exponentially with time, well-designed ranking metrics that have the potential to assist in these tasks are of prime importance. To rank scholars, it is important to put their achievements in perspective by comparing them with the achievements of other scholars active in the same period. We propose here a particular way of doing so: by computing the evaluated scholar's share of each year's citations, which quantifies how the scholar fares in competition with the others. We assess the resulting ranking method using the American Physical Society citation data and four prestigious physics awards. Our results show that the new method significantly outperforms other ranking methods in identifying the prize laureates.

1. Introduction

Measuring the academic impact of scientists is an important research direction in the science of science (Zeng et al., 2017). With the rapid development of science and technology, more and more scholars have devoted themselves to scientific research. Not only does a large number of new academic papers appear every year, but more junior scholars are also emerging. How to measure the academic influence and achievements of scientists in a principled way is therefore becoming more and more important. A fair measurement of academic influence can help scientific institutions select suitable scholars as faculty, promote outstanding academic staff, provide valuable references for academic awards, and evaluate grant proposals.

According to a recent review (Zeng et al., 2017), there are three main types of methods for evaluating a scholar's academic achievements. The first type is traditional static indicators. Methods of this kind are the most widely used and studied, such as citation count (Garfield, 1970; Redner, 1998), H-index (Bornmann & Daniel, 2005; Light, Polley, & Börner, 2014), G-index (Egghe, 2006), and so on (Hirsch, 2005; Jin, Liang, Rousseau, & Egghe, 2007). These indicators are generally based on simple statistics, so they require less information and are more convenient to calculate. However, they also have obvious shortcomings. One example is the famous H-index, which combines the number of citations and the number of papers to measure the impact of a scholar. Although this metric is a combined measure of the quantity and quality of a scientist's publications, it is strongly dependent on the number of papers. A scholar who publishes only a limited number of papers cannot have a high H-index, even if these papers have high impact. This means that although these traditional methods

∗ Corresponding author.

E-mail address: anzeng@bnu.edu.cn (A. Zeng).

http://doc.rero.ch

Published in Journal of Informetrics, which should be cited to refer to this work.


are simple, they have limitations. The second type is network-aware methods (Bar-Ilan, 2008; Yan & Ding, 2009). In the science of science, the citation relationship can be seen as a complex network, which is called the scientific network (Price, 1965). Through node ranking algorithms, many scientific influence measurements for citation networks have been derived, such as PageRank (Chen, Xie, Maslov, & Redner, 2007), author rank (Liu, Bollen, Nelson, & Van de Sompel, 2005), personalized PageRank (Ding, Yan, Frazho, & Caverlee, 2009), time-aware PageRank (Fiala, 2012) and the diffusion-based science author rank algorithm (Radicchi, Fortunato, Markines, & Vespignani, 2009). The third type consists of specially designed measurements for evaluating scholars, such as the dynamic method (Sinatra, Wang, Deville, Song, & Barabási, 2016), the credit allocation based method (Van Hooydonk, 1997) and machine learning based methods (Abrishami & Aliakbary, 2019; Shen, Chen, Yang, & Wu, 2019).

All these methods share the same limitation, namely the time bias problem. Many research works have tried to obtain a fair ranking of scholars. Some focus on solving time bias problems on a small scale. For example, the time-aware PageRank proposed by Fiala (2012), mentioned above, solves a series of small-scale time bias problems in the citation relationship between scholars. The results show that solving these time biases can help to improve the accuracy of prediction. Other studies focus on large-scale time biases. As mentioned above, with the rapid development of science and technology, the academic environment is continuously changing, but the mainstream evaluation methods for scholars are almost static, so it is unfair to compare scholars from different ages. For example, there is a classic time bias problem caused by the well-known preferential attachment mechanism (Bianconi & Barabási, 2001; Papadopoulos, Kitsak, Serrano, Boguná, & Krioukov, 2012). In the science of science, this problem exists at both the scholar level and the paper level (Price, 1965, 1976). It can be described as follows: compared to junior scholars, senior scholars have an advantage in time because they started their careers earlier, so they have had longer to accumulate attention. Due to their reputation, their work will be thought more valuable, so eventually they become famous all over the world. In contrast, it is hard for junior scholars to get attention. Many scholars have pointed out this problem (Barabási & Albert, 1999; Perc, 2014).

However, according to our analysis, besides preferential attachment there is another time bias issue in evaluating the long-term or lifetime achievements of scholars. Put simply, when computing the academic achievements of a scholar, not only his/her own papers and citations should be considered, but the peers active during the same period should also be taken into account. The more scholars are active in a period, the more academic papers are published in that period, so it is easier to accumulate citations but more difficult to compete for an academic achievement award. Therefore, it is important to consider the academic environment when evaluating the achievements of scholars. We name this phenomenon competition. To explain and validate our theory, we use the well-known American Physical Society (APS) dataset (Sinatra et al., 2016). To verify that our method can determine a scholar's academic achievement more reasonably, in this paper we use the Nobel Prize in Physics, the Wolf Prize, the Enrico Fermi Prize and the Max Planck Medal as benchmark datasets.

This paper is organized as follows. In the second section, we briefly describe the APS dataset we used, define some mathematical symbols and introduce four metrics for measuring experimental results. In the third section, we first perform an analysis on the APS dataset to introduce the competition problem in evaluating scholars. Then we explain in detail the existence and importance of the competition. Finally, we propose a new method based on the above analysis. In Section 4, we illustrate the effectiveness of our method by analysing the results of experiments. In Section 5, we discuss the role of the parameter in our method. Section 6 is the conclusion of this paper.

2. Data, symbols and metrics

In this paper, we analyze the publication data from all journals of APS. The data contains 482,566 papers, ranging from year 1893 to year 2010. For the sake of author name disambiguation, we use the author name dataset provided by Sinatra et al., which is obtained with a comprehensive disambiguation process in the APS dataset (Sinatra et al., 2016). Eventually, a total of 236,884 distinct authors are matched. We use the Nobel Prize in Physics as the main validation dataset for this paper. The Nobel Prize dataset includes all Nobel Prize laureates from year 1900 to year 2015, and we ensure that these laureates have published at least one paper in the APS dataset. Eventually, a total of 158 distinct Nobel Prize laureates are matched. We also provide the results for the Wolf Prize, the Enrico Fermi Prize, and the Max Planck Medal. It should be noted that in the following analysis we mainly show the results on the Nobel Prize dataset (the results for the other three awards are shown in a table at the end). There are three reasons:

• The Nobel Prize in Physics has the highest authority in the field of physics, which means that a laureate has a great influence on the progress of science and technology.

• The Nobel Prize in Physics has a long history, and the laureates are more widely distributed in time. This is more suitable for our research question, that is, how to compare the achievements of scholars fairly over a long time span.

• The performances of the Nobel Prize laureates are uneven. Some laureates have over 10,000 citations, while others have fewer than 1,000, which also makes it difficult to determine the laureates of the Nobel Prize (Gingras & Wallace, 2009). Therefore, using the Nobel Prize in Physics as our benchmark set for evaluating academic achievements has a certain degree of novelty.


Table 1
A simple and concrete example for calculating metrics.

Name        Original score   Original ranking   New score   New ranking
a           10               1                  2           6
b           9                2                  6           2
c           8                3                  4           4.5
d           7                4                  7           1
e           6                5                  5           3
f           5                6                  4           4.5
g           4                7                  1           7
Precision                    0.33                           0.67
AR                           4                              2.5
RIR                                                         0.67

We first give some definitions of important symbols here. All lowercase letters indicate the attributes of a paper, and all uppercase letters indicate the attributes of an author. The time at which paper $i$ was published is $t_i$. The debut time of author $i$ is $T_i$, which means the time when scholar $i$ published the first paper in the APS dataset. $N_i$ represents author $i$'s collection of all papers in the APS dataset, and we use $|N_i|$ to indicate the number of all his papers. We use $c_i$ to indicate the total number of citations of paper $i$, and $C_i$ to represent the total number of citations of author $i$ (it should be noted that our citation relationship is limited to the APS dataset we used), so we have $C_i = \sum_{j \in N_i} c_j$. In this paper we use the second subscript to indicate the component of the indicator in a certain year. For example, $N_{it}$ represents the number of papers published by author $i$ in year $t$, so we have $|N_i| = \sum_t N_{it}$. Similarly, $C_{it}$ represents the number of citations author $i$ received in year $t$, so we also have $C_i = \sum_t C_{it}$. In addition, we use $R_t$ to indicate the total number of references for year $t$, that is, the sum of the number of references of the papers published in year $t$.

In this paper, we aim to identify, among all scholars, those who are eligible for the Nobel Prize and compare the results with the real Nobel Prize dataset. In order to quantitatively compare the identification performance of different methods, we use the following four metrics:

• Precision. $\mathrm{Precision} = \frac{|q \cap p|}{|q|}$. The precision is widely used in identification tasks (Lü & Zhou, 2011). Here $q$ is the set of real laureates and $p$ is the result predicted by the algorithm, where $|q| = |p|$. We call $q \cap p$ the identified laureates and the complementary set $q \setminus p$ the unidentified laureates. This index is the most straightforward index to measure the identification result. However, this index cannot accurately measure algorithms when the size of the test dataset is small, such as the Nobel Prize dataset we used in this paper, so we provide the second measurement.

• Average ranking (AR). In order to solve the low-resolution problem, we use the average ranking. This metric is also used in Radicchi et al. (2009) to measure the quality of their identification method. Given an evaluation index, after we have the rankings of all scholars according to their scores calculated by the index, we average the rankings of all real laureates. It should be noted that when some scholars get the same score, the rankings of these scholars are all replaced by the mean value of their rankings. Unlike precision, AR considers the ranking of all laureates, so it is not affected by the size of the test dataset. However, when we use it, we find that this metric also has a problem. When the average ranking result of an algorithm is particularly high, there may be two reasons. It may be because the algorithm has a better overall performance, or it may be because the rankings of a few data points increase sharply, resulting in an increase in the average ranking. In order to avoid this kind of misjudgment, we introduce another metric, RIR.

• Ranking increase rate (RIR). $\mathrm{RIR} = \frac{1}{N} \sum_i \delta(i)$, where the sum runs over the $N$ real laureates. When comparing two ranking methods, besides the average ranking, we can use RIR to measure how many laureates (Nobel laureates in our case) are improved in ranking. $\delta$ is a comparing function, which we directly abbreviate to $\delta(i)$, representing whether the new method improves the ranking of scholar $i$. If the ranking is higher than under the original method, $\delta(i)$ is equal to 1, otherwise it is 0. It can be seen that the RIR does not care about the degree of optimization of the ranking, but focuses on the scope of optimization.

• Top-n%. Finally, in order to enrich the results, we use the Top-n% metric to further analyze the results. This metric was also mentioned and used in Radicchi et al. (2009). Here n represents the ranking threshold. In this paper, we use four levels of n: 0∼0.1%, 0.1%∼1%, 1%∼10%, 10%∼100%. Then in each interval we count how many real laureates are included. The more real laureates fall into the higher-ranking intervals, the better the identification result.

A concrete example of calculating precision, AR and RIR is given in Table 1. Here we have seven scholars, named 'a' to 'g'. Their original scores and original rankings are calculated by the original method, and their new scores and new rankings are calculated by the new method. We assume that 'b', 'd' and 'f' are real laureates. Using the original method for prediction, the top three scholars obtained are 'a', 'b' and 'c', so the precision is 0.33, while the prediction results of the new method are 'd', 'b' and 'e', and the precision is 0.67. Based on the original method, the average ranking (AR) is 4. With the new method, the AR is improved to 2.5. Finally, both 'd' and 'f' are improved in their rankings, so the RIR is equal to 0.67.
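The Table 1 example can be reproduced with a short sketch. The function names below are illustrative and not taken from the paper; the tie handling (tied scholars share the mean of their positions) follows the description of AR, and precision is computed on the top-|q| scholars.

```python
from typing import Dict, Set


def average_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank by descending score; tied scholars share the mean of their positions."""
    ordered = sorted(scores, key=lambda name: -scores[name])
    ranks: Dict[str, float] = {}
    start = 0
    while start < len(ordered):
        end = start
        # Extend the block while the next scholar has the same score.
        while end + 1 < len(ordered) and scores[ordered[end + 1]] == scores[ordered[start]]:
            end += 1
        mean_rank = (start + 1 + end + 1) / 2  # positions are 1-based
        for name in ordered[start:end + 1]:
            ranks[name] = mean_rank
        start = end + 1
    return ranks


def precision(ranks: Dict[str, float], laureates: Set[str]) -> float:
    """Share of real laureates among the top-|q| ranked scholars."""
    top = sorted(ranks, key=lambda name: ranks[name])[:len(laureates)]
    return len(laureates & set(top)) / len(laureates)


def average_ranking(ranks: Dict[str, float], laureates: Set[str]) -> float:
    """Mean rank of the real laureates (AR)."""
    return sum(ranks[name] for name in laureates) / len(laureates)


def ranking_increase_rate(old: Dict[str, float], new: Dict[str, float],
                          laureates: Set[str]) -> float:
    """Share of laureates whose rank is strictly better under the new method (RIR)."""
    improved = sum(1 for name in laureates if new[name] < old[name])
    return improved / len(laureates)


# Scores from Table 1; 'b', 'd' and 'f' are the assumed real laureates.
original_scores = {"a": 10, "b": 9, "c": 8, "d": 7, "e": 6, "f": 5, "g": 4}
new_scores = {"a": 2, "b": 6, "c": 4, "d": 7, "e": 5, "f": 4, "g": 1}
laureates = {"b", "d", "f"}

old_ranks = average_ranks(original_scores)
new_ranks = average_ranks(new_scores)

print(round(precision(old_ranks, laureates), 2), round(precision(new_ranks, laureates), 2))
print(average_ranking(old_ranks, laureates), average_ranking(new_ranks, laureates))
print(round(ranking_increase_rate(old_ranks, new_ranks, laureates), 2))
```

Scholar 'b' keeps rank 2 under both methods, so it does not count as improved; this is why RIR is 2/3 rather than 1.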


Fig. 1. Statistical results on the APS dataset. (a) Scatter plot of the career start year of researchers versus the total citations to them. (b) Scatter plot of the career start year of researchers versus their rescaled citation count. (c) Histogram of top-0.1% scholars ranked by rescaled citation count and citation count. Here we assign all top-0.1% scholars into several different decade periods instead of every single year. (d) The number of new authors (or new papers) in each year for the APS data. The y-axis is the logarithm of the quantity. We can see that the number of scholars and publications increases exponentially over time.

3. Model

In this section, we propose our new method for identifying the prize-winning scientists by considering competition in the ranking of scientists. We start with the basic method, citation count (CC), which can be computed by directly counting the total citations of all scholars in the APS dataset. The result is shown in Fig. 1(a), where each dot represents a scholar. We can see that the distribution of CC can be divided into a rising phase and a decreasing phase (except for the obvious gap caused by World War II around 1945). These two phases are caused by competition and preferential attachment respectively.

As shown in Fig. 1(d), the academic population increases over time, which directly leads to higher citations of scholars, as shown in Fig. 1(a). This makes the distribution of top scholars seriously uneven over time. In the blue histogram in Fig. 1(c), we can see that up to 1980 the number of top-0.1% scholars is positively correlated with time. After the peak around 1980, scholars' citation counts and the number of top scholars decrease sharply year by year. This trend is mainly caused by preferential attachment. Many methods have been proposed for correcting the bias of the preferential attachment problem, such as rescaled citation count (Newman, 2009), rescaled PageRank (Mariani, Medo, & Zhang, 2016), citation rank (Walker, Xie, Yan, & Maslov, 2007), etc. Taking rescaled PageRank (Mariani et al., 2016) as an example, this effect can be eliminated by only comparing a scholar with his peers, and the result can be seen in Fig. 1(b) and the orange histogram in Fig. 1(c). After using the rescaled method to eliminate the effect of preferential attachment, the downward trend is gone, leaving only a clear upward trend. This means that, although we have solved the preferential attachment problem, the effect of competition still exists. Scholars from different time periods still cannot be compared on the same level. Moreover, the distributions of three other popular indicators have the same shape as that of CC, as can be seen in Fig. 3. This means that those three indicators do not consider competition either.

Competition can be illustrated with a simple example. The number of scholars who debuted in year 1970 is 2,127, the average number of citations of scholars in that year is 111.1, and the standard deviation of citations is 295. According to the 3σ principle, there are 48 outstanding scholars (that is, their citations are greater than the mean value plus 3 times the standard deviation). In 1910, only 27 scholars debuted, and only one person can be considered an outstanding scholar, with a total of 1,003 citations, which is even less than the minimum of the 48 scholars from 1970. Therefore, if the Nobel Prize used citations as its criterion, all scholars from 1910 would have no chance to be nominated. In reality, for the outstanding scholars in 1970 it was 48 times more difficult to win the prize than for those in 1910, because of the competition between a large number of peers from 1970. This phenomenon was also observed by Gingras and Wallace (2009).
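As a rough illustration of the 3σ criterion used in this example, the sketch below computes the threshold from the quoted 1970 statistics and counts outstanding scholars in a cohort. The function names and the toy cohort are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev
from typing import List


def outstanding_threshold(avg: float, std: float, k: float = 3.0) -> float:
    """Citation level above which a scholar counts as outstanding (mean + k * std)."""
    return avg + k * std


def count_outstanding(citations: List[float], k: float = 3.0) -> int:
    """Number of scholars in one debut cohort whose citations exceed mean + k * std."""
    # Assumption: population standard deviation; the paper does not specify this.
    threshold = outstanding_threshold(mean(citations), pstdev(citations), k)
    return sum(1 for c in citations if c > threshold)


# Threshold implied by the 1970 cohort statistics quoted in the text.
threshold_1970 = outstanding_threshold(111.1, 295.0)
print(round(threshold_1970, 1))

# Hypothetical toy cohort: 99 scholars with 10 citations and one with 1,000.
toy_cohort = [10.0] * 99 + [1000.0]
print(count_outstanding(toy_cohort))
```

With the quoted mean and standard deviation, the threshold for the 1970 cohort is about 996 citations.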

In summary, with the academic environment continuously changing, traditional static academic indicators are ineffective in identifying scholars' achievements (Fire & Guestrin, 2019), because they usually ignore the competition effect. Therefore, we need to design a new method for evaluating the overall achievement of scholars, which can tackle the competition problem.

We use Fig. 2 to explain our idea. As shown in the figure, we assume that both scholar i and scholar j debuted in year 1940, and that each of them has 100 citations in total until 2010. The 100 citations of scholar i came from year 1960, and the citations


Fig. 2. Illustration of the competition effect when evaluating the achievement of two scholars. We assume there are two scholars i and j. Until 2010, they both have 100 citations, but i received these citations in 1960 and j received these citations in 2000. If we evaluate the two scholars from the perspective of competition, scholar i is more outstanding than scholar j, as it was more difficult to obtain 100 citations in 1960.

of scholar j came from year 2000. According to the APS dataset, the total number of references from all papers published in year 1960 was 10,727, while the total number of references in year 2000 was 170,139. So getting 100 citations in 2000 is much easier than in 1960. In order to account for this difference, we propose the following formula.

$$C_i^{*} = \sum_t \frac{C_{it}}{(R_t)^{\alpha}} \qquad (1)$$

where $C_i^{*}$ is our new index. In order to calculate $C_i^{*}$, we consider the author's citations in each single year, $C_{it}$. According to the above analysis, the achievements of a scholar in year $t$ do not depend on the absolute number of citations received in this year, but on a relative value. In order to increase the generality of the method, we add an adjustable parameter α. A larger α indicates that our method gives more weight to competition. Naturally, when α = 0, Eqn (1) degenerates into the citation count. Therefore, we can regard CC as a specific case of our method. We can also see that if we use traditional methods, such as CC, H-index (assuming that both scholars have only one paper, published at the same time) or the rescaled method we mentioned above, it is impossible to distinguish scholar i from scholar j, because they all ignore the competition effect.
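Eqn (1) can be sketched in a few lines, applied here to the two scholars of Fig. 2. The function name and data layout are illustrative assumptions; the reference totals for 1960 and 2000 are the ones quoted in the text, and α = 0.64 is the value the paper later reports as optimal on the Nobel Prize dataset.

```python
from typing import Dict


def competition_aware_score(yearly_counts: Dict[int, float],
                            yearly_totals: Dict[int, float],
                            alpha: float) -> float:
    """Sum over years t of count_t / (total_t ** alpha), as in Eqn (1)."""
    return sum(count / yearly_totals[year] ** alpha
               for year, count in yearly_counts.items())


# Total references per year quoted in the text for the APS dataset.
references = {1960: 10_727, 2000: 170_139}

# Scholars i and j from Fig. 2: 100 citations each, received in different years.
scholar_i = {1960: 100}
scholar_j = {2000: 100}

# alpha = 0 recovers the plain citation count, so the two scholars tie.
cc_i = competition_aware_score(scholar_i, references, alpha=0.0)
cc_j = competition_aware_score(scholar_j, references, alpha=0.0)

# alpha > 0 rewards citations earned in years with fewer total references.
ccc_i = competition_aware_score(scholar_i, references, alpha=0.64)
ccc_j = competition_aware_score(scholar_j, references, alpha=0.64)

print(cc_i, cc_j)
print(ccc_i > ccc_j)
```

Passing yearly paper counts and yearly paper totals instead of citations and references gives the paper-count variant of the same idea.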

4. Results

In this section, we mainly present the experimental results on the Nobel Prize, because of the three advantages mentioned above. To enrich the results, we also briefly show the results for three other prizes at the end of this section.

4.1. Methods

First, we introduce all the methods we used for comparison. As mentioned above, a plethora of quantitative metrics exist and could be compared in principle, but our goal is to show that our methods can tackle the competition effect effectively. Therefore, in this paper our focus is narrowed to traditional statistical methods. We first briefly introduce four well-known traditional metrics:

• Citation count (CC). CC is a basic academic statistical indicator, which straightforwardly reflects the attention received by scholars' scientific work.

• Paper count (PC). That is, how many publications the scholar has up to a certain year. This indicator is also called academic productivity. It measures scholars' academic activity and productivity, but ignores the quality of their publications.

• H-index (Hirsch, 2005). Currently, H-index is a mainstream evaluation method for scholars, which combines academic productivity and academic quality to evaluate a scholar comprehensively. H-index treats all papers of a scholar equally, which ignores the importance of high-quality papers, so it is not fair for evaluating the scientific performance of scholars with a few highly cited papers. To overcome this, G-index was proposed.

• G-index. Egghe (2006) proposed the G-index, which is defined as the largest number $g$ of individual publications that together have at least $g^2$ citations. G-index therefore gives more weight to papers with high citations than H-index does.

Then we introduce our two methods:


Fig. 3. Scatter plot of four basic indicators versus the career start year of researchers (a: citation count, b: paper count, c: H-index, d: G-index). In each subgraph, each point represents a scholar. The red dots and the green dots represent the Nobel Prize laureates. The red dots represent the scholars that are correctly identified as Nobel Prize laureates, while the green dots represent the failed cases. All black dots are scholars who have not won the Nobel Prize.

• Competition-aware citation count (CCC), see Eqn (1). As a variant of CC, the results of CCC are mainly compared with CC. In fact, Eqn (1) can be generalized into a framework: we can replace CC by any academic indicator, split it by year, and divide each yearly component by the corresponding competition factor (for CC, the total number of references in that year). Therefore, in order to further illustrate the effectiveness of our method, we also apply it to the number of papers (the academic productivity of scholars) as an additional example.

• Competition-aware paper count (CPC). It is natural to take the total number of papers published in a year as the competition factor for the academic productivity of scholars. We thus propose a competition-aware paper count method, mathematically expressed as Eqn (2). We can see that when α = 0, $P_i^{*}$ degenerates into the pure academic productivity of a scholar. Therefore, Eqn (2) is an improved version of the scholar's academic productivity. If CPC works well, it not only supports the existence of the competition effect and the effectiveness of our method, but also exhibits the robustness of our method: even after discarding a lot of useful information (CPC uses no information on the citations of papers), our method can still reach high accuracy, which means that any academic statistical indicator can be improved by introducing the competition concept. As a variant of PC, the CPC results are mainly compared with PC below.

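$$P_i^{*} = \sum_t \frac{P_{it}}{(N_t)^{\alpha}} \qquad (2)$$

Here $P_{it}$ denotes the number of papers published by author $i$ in year $t$, and $N_t$ the total number of papers published in year $t$.

4.2. Analysis results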

In Fig. 3, we present the results of four classical methods (CC, PC, H-index and G-index). First, Fig. 3 gives us an intuitive feeling for the distribution of all scholars and the ranking positions of all real laureates. It can be clearly seen that although the Nobel Prize laureates are generally recognized as the scholars with the largest contributions, this does not mean that their indicators are always the best. In fact, many Nobel Prize laureates are at very low positions. Of course, this may be caused by the limited dataset and incomplete information, but more importantly, it means that determining Nobel Prize laureates is a very difficult task, and the precision will not be very high. Then, as we mentioned above, the distribution trend of the four indicators is basically the same, and all of them show serious biases in time. This means that these static indicators can no longer evaluate academic achievements fairly, due to the fact that the academic environment changes continuously (Fire & Guestrin, 2019). The detailed results of the four methods are shown in Table 2. From the perspective of precision and average ranking, we can see that G-index performs slightly better than H-index, and the CC method is the best among them. Unsurprisingly, the paper count is the worst index, because PC only considers productivity but ignores quality.

The results of the two new methods, CCC and CPC, are shown in Fig. 4. Since our new methods contain a tunable parameter α, in our experiments we vary α from 0 to 1 in steps of 0.01. The result of CCC is shown in the first row of Fig. 4. First, from the perspective of precision, compared with the original CC method (α = 0), most of the α values improve the identification precision (except for a small range around 0.3), and the optimal precision


Table 2
Quantitative results for all awards. For the sake of fairness when comparing all awards, we fix α = 0.64 for the CCC method. (Experimental results show that CCC has the best prediction precision on the Nobel Prize dataset when α = 0.64. In order to compare performance without loss of generality, we keep this parameter fixed on the three other datasets.)

                     CCC                         CC                  H-index             G-index             PC
Prize                Precision  AR      RIR      Precision  AR       Precision  AR       Precision  AR       Precision  AR
Nobel Prize          0.222      21,370  0.717    0.139      30,617   0.095      33,761   0.114      32,440   0.013      35,921
Enrico Fermi Prize   0.120      28,145  0.820    0.000      44,736   0.000      47,582   0.020      45,762   0.020      46,160
Max Planck Medal     0.069      20,768  0.724    0.034      37,737   0.017      48,652   0.017      46,287   0.017      53,448
Wolf Prize           0.100      15,917  0.560    0.060      18,108   0.060      23,062   0.060      24,505   0.020      29,953

Fig. 4. Dependence of the results on the parameter α. Subgraphs a, b and c are the results of our CCC method, and subgraphs d, e and f are the results of the CPC method. The left, middle, and right columns represent precision, average rank, and ranking increase rate respectively. The best precision of CCC is 0.222, and the corresponding α is 0.64. The best precision of CPC is 0.146, with α equal to 0.72. As α increases, the average ranking of the two methods increases monotonically, while both ranking increase rates show monotonically decreasing trends.

0.222 is reached when α is set to 0.64. As we can see, the average ranking shows a monotonically increasing pattern as α increases. However, this does not mean that the greater α is, the better the result will be. As we can see from the RIR curve, it shows a monotonically decreasing trend. This means that when we increase α, the rise in average ranking is due to a small number of Nobel laureates moving up significantly, not to an overall optimization. This phenomenon also indicates that AR on its own is not an effective metric for identifying prize-winning scientists. To sum up, our CCC method shows a good performance, which indicates the advantage of introducing the competition mechanism into the author ranking task.
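The parameter scan described above (α from 0 to 1 in steps of 0.01, recording the precision at each step) can be sketched as follows. All scholar names and citation counts are hypothetical toy data chosen only to show the mechanics; only the two yearly reference totals come from the text, and the function names are illustrative.

```python
from typing import Dict, List, Set, Tuple


def ccc(citations: Dict[int, float], references: Dict[int, float], alpha: float) -> float:
    """Competition-aware citation count of one scholar, following Eqn (1)."""
    return sum(c / references[t] ** alpha for t, c in citations.items())


def precision_at_alpha(scholars: Dict[str, Dict[int, float]],
                       references: Dict[int, float],
                       laureates: Set[str], alpha: float) -> float:
    """Precision of the top-|q| scholars when ranked by CCC at a given alpha."""
    scores = {name: ccc(cits, references, alpha) for name, cits in scholars.items()}
    top = sorted(scores, key=scores.get, reverse=True)[:len(laureates)]
    return len(laureates & set(top)) / len(laureates)


def sweep_alpha(scholars: Dict[str, Dict[int, float]],
                references: Dict[int, float],
                laureates: Set[str], steps: int = 100) -> List[Tuple[float, float]]:
    """Evaluate precision for alpha = 0, 1/steps, ..., 1."""
    return [(k / steps, precision_at_alpha(scholars, references, laureates, k / steps))
            for k in range(steps + 1)]


# Hypothetical toy data: one early laureate competing against later, larger cohorts.
references = {1960: 10_727, 2000: 170_139}
scholars = {
    "early_laureate": {1960: 100},
    "late_laureate": {2000: 900},
    "late_peer_1": {2000: 400},
    "late_peer_2": {2000: 300},
}
laureates = {"early_laureate", "late_laureate"}

curve = sweep_alpha(scholars, references, laureates)
best_alpha, best_precision = max(curve, key=lambda pair: pair[1])
print(curve[0], curve[-1], best_precision)
```

In this toy setting the early laureate is missed at α = 0 and recovered once α is large enough; the real curves in Fig. 4 are of course determined by the APS data, not by this example.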

CPC, in turn, demonstrates the robustness of our method. It is well known that there is a positive correlation between CC and PC (Bayer & Folger, 1966). Compared to CC, PC only accounts for the quantity of a scholar's papers and carries no quality information. If our method still performs well on PC, it indicates that our method is robust against low-quality data. The results of CPC are shown in the second row of Fig. 4. Similar to the results of CCC, the precision of CPC first increases and then decreases. We can see that the RIR value stays above 70% all the time, which means that our method can effectively improve the rank of most Nobel Prize laureates. It should be noted that there is a gap at the left side of the two AR curves in Fig. 4(e). This is because ranking scholars by PC makes many scholars indistinguishable. For example, there are 892 scholars with 20 papers in our dataset. After using our new indicator, however, these scholars' scores become completely different and the rankings of laureates become better. It is worth noting that there is a great improvement of precision in Fig. 4(d), in which the optimal value of the precision is 0.146 (when α = 0.72). This precision is 11 times higher than that obtained by using productivity to identify Nobel laureates (when α = 0, precision = 0.0127), and it is even better than the precision of CC (precision = 0.139). This result shows that, once the competition effect is considered, using productivity alone can achieve an effect similar to that of the citation count.

In order to compare CC and CCC from a more intuitive perspective, we use the ranking results of the two indicators to make a scatter plot, as shown in Fig. 5(a). If a scholar is above the black diagonal line, it means that his/her CCC ranking is better than the CC ranking, and the farther the dot deviates from the line, the more his/her ranking is improved by CCC. As can be seen from Fig. 5, the majority of Nobel Prize laureates (red dots) are above the diagonal line, indicating that CCC has improved the ranking of most Nobel Prize laureates, which is the result of an overall improvement rather than a local improvement. Although some of the Nobel laureates (red dots) are below the black solid line, the deviations are much smaller than those of the dots above it. That is because those scholars whose rankings are not promoted do not drop significantly, and they


Fig. 5. Comparison of the ranking results of different methods. It should be noted that we take the logarithm of all rankings. (Left) The x-axis is the ranking of a scholar calculated by CCC (α = 0.64), and the y-axis is the ranking of a scholar by CC. (Right) The x-axis is the ranking of a scholar calculated by CPC (α = 0.72), and the y-axis is the ranking of a scholar by PC. In these two figures, each dot represents a scholar. The red markers and blue dots indicate the Nobel Prize laureates and other scholars respectively. Black solid lines are y = x, indicating that a dot on the line has the same ranking in the two methods. The black dashed lines indicate the boundaries of identification, that is, x = ln 158 and y = ln 158, so the left side of the vertical dashed line indicates the identification result of our new indicator, and the lower side of the horizontal dashed line indicates the identification result of the traditional method.

can still retain their original rankings. Take two scholars marked in the figure as examples, 'Arthur H. Compton' and 'Carl E. Wieman'. Among all the Nobel Prize laureates identified by using CCC, 'Arthur H. Compton' has the lowest CC ranking; his CCC ranking and CC ranking are 72 and 10,141 respectively. This is a significant improvement in the ranking, which lifts 'Arthur H. Compton' into the top-1% group. Meanwhile, among all the Nobel Prize laureates identified by using CC, 'Carl E. Wieman' has the lowest CCC ranking; his CCC ranking and CC ranking are 453 and 145 respectively. Although there is a drop in his ranking, he still remains in the top-1% group.

In addition, the figure is divided into four regions by the two dashed lines. The lower left area is the common identification area of the two methods, that is, the scholars identified by both methods. The lower right area and the upper left area represent the additional identification areas of CC and CCC respectively. We can see that CCC retains most of the laureates identified by CC (only five red dots are located in the lower right area). As shown in the figure, taking 'Steven Weinberg' and 'Philip W. Anderson' as two examples, no matter which method is used, these two scholars are always ranked very high. At the same time, in the upper left corner, we can see that CCC successfully identifies 18 other Nobel Prize laureates. This shows that CCC is a successful extension of CC. The results of CPC and PC are shown in the right panel of Fig. 5. As in the left panel, most of the Nobel Prize laureates (red dots) are located above the black solid line, and their deviations are much larger than those of the red dots below it (take the four scholars marked in the figure as examples). There are no Nobel Prize laureates in the common identification area of PC and CPC, mainly because of the poor performance of the original PC method. In other words, the new method does not inherit any useful results from PC.

Finally, we use the Top-n% metric to compare the results of various methods. In order to enrich the results, we also use three other famous physics awards, the Wolf Prize (50 laureates), the Enrico Fermi Prize (50 laureates) and the Max Planck Medal (58 laureates). Compared with the Nobel Prize, these three prizes have a smaller number of laureates, and the distribution of the winning time is not as wide as for the Nobel Prize. Here we do not plot these three prizes separately, but put all the awards together (specific results can be found in Table 2). In total we have 272 laureates from four awards, some of whom have won multiple awards. In the top-n% metric, we expect that the better the algorithm performs, the more awarded authors will be found in the top rank percentages. As we can see from Fig. 6, our competition-aware method always performs better. The two categories 'below 0.1%' and '0.1%-1%' always contain more laureates compared to other methods, while '1%-10%' and 'above 10%' always contain fewer laureates. This means that our method identifies most of the award-winning authors by placing them at relatively high ranks. We also use a table to show more details of the performance on the four awards, see Table 2, where we set α to 0.64. From the results, we can see that our method performs well on all four datasets, and the parameter α seems to be universal (although 0.64 may not be the best choice for a specific award, it still works better than the other methods).
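The Top-n% bookkeeping can be sketched as below. The bucket boundaries are the four levels listed in Section 2; treating each interval as open on the left and closed on the right is an assumption, as is the function name. The example ranks are the CCC and CC rankings of Arthur H. Compton and Carl E. Wieman quoted above, out of the 236,884 matched authors.

```python
from typing import Dict, List, Tuple

# (label, lower bound, upper bound) as fractions of the full ranking.
BUCKETS: List[Tuple[str, float, float]] = [
    ("0-0.1%", 0.0, 0.001),
    ("0.1%-1%", 0.001, 0.01),
    ("1%-10%", 0.01, 0.1),
    ("10%-100%", 0.1, 1.0),
]


def top_n_shares(laureate_ranks: List[float], n_scholars: int) -> Dict[str, float]:
    """Share of laureates whose rank percentile falls into each bucket."""
    counts = {label: 0 for label, _, _ in BUCKETS}
    for rank in laureate_ranks:
        percentile = rank / n_scholars
        for label, low, high in BUCKETS:
            # Assumption: intervals are (low, high], so rank 1 lands in the first bucket.
            if low < percentile <= high:
                counts[label] += 1
                break
    return {label: count / len(laureate_ranks) for label, count in counts.items()}


N_AUTHORS = 236_884  # distinct authors matched in the APS dataset

# Rankings of Compton and Wieman under CCC and under CC, as quoted in the text.
ccc_shares = top_n_shares([72, 453], N_AUTHORS)
cc_shares = top_n_shares([10_141, 145], N_AUTHORS)
print(ccc_shares)
print(cc_shares)
```

Under CCC both scholars stay within the top 1%, whereas under CC Compton falls into the 1%-10% bucket.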

4.3. The role of α

When we introduced Eqn (1) in Section 3, we regarded α as the parameter that adjusts how much attention is paid to competition. A larger α indicates that a stronger effect of competition is considered, which is the most intuitive interpretation of α. At the same time, according to the results in Section 4.2, we find that although most α can


Fig. 6. The top-n% measurement on 4 awards in physics (the Nobel Prize, the Max Planck Medal, the Wolf Prize, and the Enrico Fermi Prize). We fix α = 0.64 for CCC. The performance of a scholar depends on the percentile of the scholar's ranking. The lower this percentage is, the better the performance of the considered scientist is. The height of the bar indicates the proportion of laureates. The left figure only considers the Nobel Prize, while the right figure takes into account all four prizes. Whether we consider the Nobel Prize alone or all prizes, our competition-aware method performs better than the others.

Fig. 7. We use the Kolmogorov-Smirnov test (see main text for explanation) to explain the role of the parameter α. The figure on the left side is the curve of KS scores versus the parameter α, and the small inset graph is the curve of p-values. The three subgraphs on the right side compare the distribution of the identification results under three different parameter values (α = 0, 0.64, 1) with the distribution of the real Nobel Prize laureates. Black bars represent the distribution of the Nobel Prize laureates and red bars represent the distribution of our results.

improve the identification effect, there is always a best choice of α. Why is there always a best result after adjusting α to a specific value, and how does α affect the identification results? In this section, we aim to illustrate the effect of α.

α actually helps our method find the output distribution that is closest to the distribution of the real Nobel Prize laureates. To better illustrate this point, we use the Kolmogorov-Smirnov test (KS test) (Hodges, 1958). This is a nonparametric test of the equality of continuous (or discontinuous), one-dimensional probability distributions that can be used to compare two samples. The larger the KS score, the larger the distance between the two distributions; the smaller the score, the more similar the two distributions are. The KS test also gives a corresponding p-value for assessing the significance of the result. The experimental procedure is the same as in Section 4.2: α ranges from 0 to 1, and the distance between the distribution of the identification set and the distribution of the real Nobel Prize set is calculated at every step of 0.01.
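As a rough sketch of this procedure, the two-sample KS score can be computed as the largest gap between two empirical distribution functions and evaluated on a grid of α values. The helper names, the callable `identify_years`, and the toy data below are hypothetical stand-ins; the real experiment would plug in the ranking from Eqn (1). The sketch does not compute p-values; a library routine such as `scipy.stats.ks_2samp` returns both the statistic and the p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS score: the maximum gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Both empirical CDFs are step functions, so the largest gap is
    # attained at one of the observed values.
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def sweep_alpha(identify_years, laureate_years, steps=100):
    """Return (alpha, KS score) pairs for alpha = 0, 0.01, ..., 1."""
    results = []
    for k in range(steps + 1):
        alpha = k / steps
        results.append((alpha, ks_statistic(identify_years(alpha), laureate_years)))
    return results


# Synthetic stand-in for the identification step (NOT real data): the
# identified set simply drifts toward earlier years as alpha grows.
def toy_identify_years(alpha):
    center = 1970 - 50 * alpha
    return [center - 10, center, center + 10]


toy_laureate_years = [1930, 1940, 1950]
scores = sweep_alpha(toy_identify_years, toy_laureate_years)
best_alpha, best_score = min(scores, key=lambda pair: pair[1])
```

The α with the smallest score is the one whose identified set is distributed most like the laureates, which is how the minimum of the curve in Fig. 7 is read.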

The results are shown in Fig. 7. We can clearly see that as α varies, the KS score first decreases and then increases sharply. The minimum distance appears around α = 0.6, and a smaller score means that the distribution of our identification matches the distribution of the real Nobel Prize laureates better. In the three histograms on the right side of Fig. 7, we show the distributions of our results under three special cases of α together with the distribution of the real Nobel Prize laureates. In the first histogram, we set α = 0, which means that the red bars show the distribution of the CC result. The identification tends to concentrate around 1970. This is easy to understand, since we showed in Fig. 3 in Section 4 that the distributions of the four traditional indicators are basically the same, with peaks around 1970. Therefore, the top scholars identified by traditional indicators always cluster near 1970. When α = 0.64, the identification precision reaches its optimal value (0.222). As can be seen from the second histogram, the red bars match the black bars significantly better than in the first case, and the distribution of the red bars becomes wider. It is particularly noteworthy that CC places almost no top scholars between 1900 and 1940, and this situation improves markedly after increasing α to 0.64. In the third histogram we show the results for α = 1.0. The algorithm now pays too much attention to competition, and the distribution of the predictions concentrates mainly between 1900 and 1940. The KS score curve also shows that when competition plays too large a role in our method while the citations are ignored, the KS score increases rapidly to 0.67.

Table 3
Logistic regression results

| Strategy | Metric | CV-1 | CV-2 | CV-3 | CV-4 | Average |
|---|---|---|---|---|---|---|
| A | AR | 8,242 | 9,829 | 6,405 | 6,105 | 7,646 |
| A | F1-score | 0.016 | 0.015 | 0.016 | 0.016 | 0.016 |
| A | Precision | 0.225 | 0.200 | 0.100 | 0.051 | 0.144 |
| B | AR | 7,012 | 7,651 | 5,840 | 5,061 | 6,391 |
| B | RIR | 0.525 | 0.650 | 0.625 | 0.550 | 0.588 |
| B | F1-score | 0.026 | 0.022 | 0.026 | 0.021 | 0.024 |
| B | Precision | 0.250 | 0.250 | 0.150 | 0.077 | 0.182 |
| C | AR | 5,840 | 5,356 | 4,305 | 4,580 | 5,020 |
| C | RIR | 0.725 | 0.825 | 0.725 | 0.667 | 0.735 |
| C | F1-score | 0.030 | 0.026 | 0.031 | 0.030 | 0.029 |
| C | Precision | 0.275 | 0.300 | 0.175 | 0.103 | 0.213 |
| CCC | AR | 6,542 | 6,528 | 5,013 | 3,232 | 5,329 |
| CCC | RIR | 0.625 | 0.825 | 0.700 | 0.692 | 0.711 |
| CCC | Precision | 0.250 | 0.300 | 0.200 | 0.053 | 0.201 |

In general, based on the above analysis, we believe that the parameter α effectively adjusts the distribution of the identification result. An optimal α ensures that there is no obvious bias in time when we make the identification, and that the output distribution is closer to the distribution of the real laureates.

4.4. Further exploration with machine learning

At the end of the paper, we want to illustrate that our competition-aware method can also be combined with machine learning methods, which deserves more exploration. We take logistic regression (Owen, 2007) as an example. We consider the task of predicting Nobel Prize laureates as a special binary classification task. We use label 1 to indicate a Nobel Prize winner and label 0 to indicate a scholar who has not won the Nobel Prize. This prediction task is very special because its training dataset is extremely imbalanced: it contains only 158 positive samples but about 230,000 negative samples. During the training phase, we therefore use weighted methods that give greater weight to the positive samples to address this problem.
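The text does not say which weighting scheme is used. One common choice, assumed here purely for illustration, is to weight each class inversely to its frequency, w_c = n / (2 n_c), which is the rule behind the "balanced" class-weight heuristic in scikit-learn. The sketch below computes these weights for the class sizes quoted above and shows the weighted logistic loss they would enter; the function names are hypothetical.

```python
import math


def balanced_class_weights(n_positive, n_negative):
    """Inverse-frequency weights: n / (2 * n_class) for each class."""
    n = n_positive + n_negative
    return {1: n / (2 * n_positive), 0: n / (2 * n_negative)}


def weighted_log_loss(labels, probabilities, weights):
    """Weighted average of the per-sample logistic (cross-entropy) loss."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probabilities):
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum


# Class sizes quoted in the text: 158 positives, about 230,000 negatives.
weights = balanced_class_weights(158, 230_000)
```

With these weights the positive and negative classes contribute equal total weight to the loss, so the classifier cannot reach a low loss by predicting label 0 for everyone.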

To illustrate the importance of the competition mechanism, we use three training strategies in the machine learning experiments.

• (A) Citation count. Take the citation count as the only feature of each sample.

• (B) Take the citations in each single year as a separate feature, so that each sample is represented by a vector.

• (C) Based on case B, use the concept of competition to normalize all feature vectors. See Eqn (1) for details.
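The three strategies differ only in how a scholar's yearly citation record is turned into features, which can be sketched as follows. Eqn (1) is not reproduced in this section, so the normalization in `strategy_c` is an assumed stand-in: each year's citations are divided by that year's total citations raised to the power α, a form chosen only because it reduces to the raw counts at α = 0. All function and variable names are hypothetical.

```python
def strategy_a(yearly_citations):
    """(A) A single feature: the total citation count."""
    return [sum(yearly_citations)]


def strategy_b(yearly_citations):
    """(B) One feature per year: the raw yearly citation counts."""
    return list(yearly_citations)


def strategy_c(yearly_citations, yearly_totals, alpha):
    """(C) Yearly counts rescaled by that year's total citations.

    ASSUMPTION: dividing by total**alpha stands in for Eqn (1), whose
    exact form is not given in this section.
    """
    return [
        c / (total ** alpha) if total > 0 else 0.0
        for c, total in zip(yearly_citations, yearly_totals)
    ]


# Toy record: citations received by one scholar in three years, and the
# total citations received by all scholars in those same years.
scholar = [2, 0, 10]
totals = [10, 50, 1000]

features_a = strategy_a(scholar)
features_b = strategy_b(scholar)
features_c = strategy_c(scholar, totals, alpha=1.0)
```

In this toy record the 2 citations from the small first year end up weighing more than the 10 citations from the crowded third year, which is the intended effect of accounting for competition.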

In addition, to compare the prediction results more comprehensively, we use four-fold cross-validation and four metrics (AR, RIR, Precision, and F1-score) to measure the performance of the algorithms.
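For orientation, three of these metrics can be sketched as below. The definitions are inferred from how the metrics are used in the text and may differ in detail from the formal definitions given earlier in the paper: AR is taken as the average rank of the real laureates, RIR as the fraction of laureates whose rank improves relative to a baseline, and precision as the share of laureates among the top k ranked scholars. The cutoff k and all names are assumptions.

```python
def average_ranking(laureate_ranks):
    """AR: mean rank of the real laureates (smaller is better)."""
    return sum(laureate_ranks) / len(laureate_ranks)


def ranking_improvement_ratio(baseline_ranks, new_ranks):
    """RIR: fraction of laureates ranked strictly better than the baseline.

    Both lists must hold the ranks of the same laureates in the same order.
    """
    improved = sum(1 for old, new in zip(baseline_ranks, new_ranks) if new < old)
    return improved / len(baseline_ranks)


def precision_at_k(ranked_scholars, laureates, k):
    """Precision: share of real laureates among the top k ranked scholars."""
    return sum(1 for s in ranked_scholars[:k] if s in laureates) / k


# Toy example with four hypothetical laureates.
baseline = [10, 20, 30, 40]
improved = [5, 25, 15, 40]
ar = average_ranking(improved)
rir = ranking_improvement_ratio(baseline, improved)
prec = precision_at_k(["a", "b", "c", "d"], {"a", "d"}, k=2)
```

Under these definitions RIR needs a reference ranking to compare against, which may be why Table 3 reports no RIR for strategy A.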

It can be seen from Table 3 that strategy C achieves the best performance and strategy A the worst. Compared with strategy A, the training set in strategy B is upgraded such that each sample carries extra information about time. This extra information is very helpful for improving the prediction precision. Compared with strategy B, the prediction precision of the machine learning algorithm in strategy C is further improved after the competition mechanism is applied. In fact, in strategy C the competition mechanism acts as a data normalization process. This procedure is conducive to improving the performance of machine learning methods, which indicates that considering the competition effect is also helpful for designing training sets for machine learning algorithms.

At the end of the table, we also show the prediction results of the simple competition-aware citation count (CCC). The results show that CCC performs significantly better than strategy B, which means that our competition mechanism is useful in itself. Strategy C has the highest accuracy, which also means that machine learning algorithms are superior to traditional methods in ranking or prediction tasks. In summary, all the results show that competition is a crucial factor for identifying prize-winning scientists.


5. Conclusion

In this paper, we mainly discuss how to measure the academic achievements of scholars more reasonably. We introduce the competition mechanism through an analysis of the American Physical Society (APS) dataset. With the rapid development of modern science and technology, the numbers of new scholars and new academic papers are increasing exponentially. This means that scientists face more and more competitors in academia. Therefore, junior scholars may find it increasingly difficult to obtain lifetime achievement awards. Meanwhile, this also means that traditional academic indicators may have lost their meaning when evaluating scholars from different periods (Fire & Guestrin, 2019). Compared with senior scholars, junior scholars work in a larger and more active academic environment, so it is easier for them to obtain more citations and attention, but this does not mean that their performance is better. Therefore, if we do not take the competition effect into account, it is difficult to evaluate scholars' achievements fairly.

Then we use a series of experiments to support our idea. From the results, we learn that after taking the competition factor into account, the predictive power of our new method improves greatly. Taking citation count as an example, the prediction precision increases from 0.139 to 0.222. The average ranking of all real laureates improves from 30,617 to 21,370. Moreover, according to the indicator RIR, 71.7% of the laureates' rankings are improved by using our new method. Our new method is the best among the traditional indicators we have listed for comparison. At the same time, we also took a further step by using the paper count to show that our method is robust and general. The results show that when the number of papers is the only information available for identifying the Nobel Prize laureates, our new method still yields a significant improvement. A precision of 0.146 is reached when competition is taken into account. The average ranking of all real laureates improves from 35,921 to 18,462. Moreover, according to the indicator RIR, 76.1% of the real laureates' rankings are improved. In summary, the aim of our method is not to replace other ranking techniques, which have been optimized and almost perfected over the course of many years. We want to illustrate the existence of the competition factor and account for it in a simple way. Our research has important practical value, such as providing valuable references for academic awards, selecting suitable scholars as faculty, and promoting outstanding academic staff.

There are also numerous meaningful extensions for future research. First, one can try to combine our idea with other methods, such as the diffusion-based models discussed in the Introduction. Methods of this kind can use more structural information from the citation network, which will benefit the results. Similar to the competition-aware citation count, we could adjust the impact of each paper according to the competition mechanism. Taking PageRank as an example, we can give each out-edge of a paper a weight based on its publication year, where this weight is determined by the total number of references or the number of papers in that year. Second, it is well known that physics today has many branches. The impact of competition differs from branch to branch. Therefore, when comparing scholars from different academic domains, it will be fairer to perform an analysis that distinguishes the domains. For example, as mentioned by Zeng et al. (2019), one can use community detection (Fortunato & Hric, 2016) to detect different research branches in the citation network. The size of those communities can represent the degree of competition. So besides the competition between different periods, we can also calculate the competition inside different academic directions. These two factors together decide the weight of edges in the citation network. In addition to improving the prediction, one can also analyze the strength of competition between different fields according to the fitting results.

CRediT authorship contribution statement

Yuhao Zhou: Software, Validation, Writing - original draft, Writing - review & editing, Formal analysis. Ruijie Wang: Writing - original draft, Writing - review & editing. An Zeng: Conceptualization, Methodology, Writing - review & editing, Formal analysis, Data curation. Yi-Cheng Zhang: Conceptualization, Supervision.

Acknowledgment

The authors would like to thank Matúš Medo for helpful discussion. This work was supported by the Swiss National Science Foundation (Grant No. 200020-156188).

References

Abrishami, A., & Aliakbary, S. (2019). Predicting citation counts based on deep neural network learning techniques. Journal of Informetrics, 13, 485–499.

Bar-Ilan, J. (2008). Informetrics at the beginning of the 21st century - A review. Journal of Informetrics, 2, 1–52.

Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.

Bayer, A. E., & Folger, J. (1966). Some correlates of a citation measure of productivity in science. Sociology of Education, 381–390.

Bianconi, G., & Barabási, A. L. (2001). Bose-Einstein condensation in complex networks. Physical Review Letters, 86, 5632.

Bornmann, L., & Daniel, H. D. (2005). Does the h-index for ranking of scientists really work? Scientometrics, 65, 391–392.

Chen, P., Xie, H., Maslov, S., & Redner, S. (2007). Finding scientific gems with Google's PageRank algorithm.

Ding, Y., Yan, E., Frazho, A., & Caverlee, J. (2009). PageRank for ranking authors in co-citation networks. Journal of the American Society for Information Science and Technology, 60, 2229–2243.

Egghe, L. (2006). Theory and practise of the g-index. Scientometrics, 69, 131–152.

Fiala, D. (2012). Time-aware PageRank for bibliographic networks. Journal of Informetrics, 6, 370–388.

Fire, M., & Guestrin, C. (2019). Over-optimization of academic publishing metrics: Observing Goodhart's law in action. GigaScience, 8, giz053.

Fortunato, S., & Hric, D. (2016). Community detection in networks: A user guide. Physics Reports, 659, 1–44.


Garfield, E., et al. (1970). Citation indexing for studying science. Nature, 227, 669–671.

Gingras, Y., & Wallace, M. (2009). Why it has become more difficult to predict Nobel Prize winners: A bibliometric analysis of nominees and winners of the chemistry and physics prizes (1901-2007). Scientometrics, 82, 401–412.

Hirsch, J. E. (2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences, 102, 16569–16572.

Hodges, J. (1958). The significance probability of the Smirnov two-sample test. Arkiv för Matematik, 3, 469–486.

Jin, B., Liang, L., Rousseau, R., & Egghe, L. (2007). The R- and AR-indices: Complementing the h-index. Chinese Science Bulletin, 52, 855–863.

Light, R. P., Polley, D. E., & Börner, K. (2014). Open data and open code for big science of science studies. Scientometrics, 101, 1535–1551.

Liu, X., Bollen, J., Nelson, M. L., & Van de Sompel, H. (2005). Co-authorship networks in the digital library research community. Information Processing & Management, 41, 1462–1480.

Lü, L., & Zhou, T. (2011). Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390, 1150–1170.

Mariani, M. S., Medo, M., & Zhang, Y. C. (2016). Identification of milestone papers through time-balanced network centrality. Journal of Informetrics, 10, 1207–1223.

Newman, M. E. (2009). The first-mover advantage in scientific publication. EPL (Europhysics Letters), 86, 68001.

Owen, A. B. (2007). Infinitely imbalanced logistic regression. Journal of Machine Learning Research, 8, 761–773.

Papadopoulos, F., Kitsak, M., Serrano, M. Á., Boguná, M., & Krioukov, D. (2012). Popularity versus similarity in growing networks. Nature, 489, 537.

Perc, M. (2014). The Matthew effect in empirical data. Journal of The Royal Society Interface, 11, 20140378.

Price, D. d. S. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27, 292–306.

Price, D. J. D. S. (1965). Networks of scientific papers. Science, 510–515.

Radicchi, F., Fortunato, S., Markines, B., & Vespignani, A. (2009). Diffusion of scientific credits and the ranking of scientists. Physical Review E, 80, 056103.

Redner, S. (1998). How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B - Condensed Matter and Complex Systems, 4, 131–134.

Shen, Z., Chen, F., Yang, L., & Wu, J. (2019). Node2vec representation for clustering journals and as a possible measure of diversity. Journal of Data and Information Science, 4, 79–92.

Sinatra, R., Wang, D., Deville, P., Song, C., & Barabási, A. L. (2016). Quantifying the evolution of individual scientific impact. Science, 354, aaf5239.

Van Hooydonk, G. (1997). Fractional counting of multiauthored publications: Consequences for the impact of authors. Journal of the American Society for Information Science, 48, 944–945.

Walker, D., Xie, H., Yan, K. K., & Maslov, S. (2007). Ranking scientific publications using a model of network traffic. Journal of Statistical Mechanics: Theory and Experiment, 2007, P06010.

Yan, E., & Ding, Y. (2009). Applying centrality measures to impact analysis: A coauthorship network analysis. Journal of the American Society for Information Science and Technology, 60, 2107–2118.

Zeng, A., Shen, Z., Zhou, J., Fan, Y., Di, Z., Wang, Y., Stanley, H. E., & Havlin, S. (2019). Increasing trend of scientists to switch between topics. Nature Communications, 10, 3439.

Zeng, A., Shen, Z., Zhou, J., Wu, J., Fan, Y., Wang, Y., & Stanley, H. E. (2017). The science of science: From the perspective of complex systems. Physics Reports, 714, 1–73.
