Identifying prize-winning scientists by a competition-aware
ranking
Yuhao Zhou a, Ruijie Wang a, An Zeng b,∗, Yi-Cheng Zhang a
a Department of Physics, University of Fribourg, Fribourg 1700, Switzerland
b School of Systems Science, Beijing Normal University, Beijing, 100875, PR China
Evaluating scholars’ achievements is an important problem in the science of science with
applications in the evaluation of grant proposals and promotion applications. Since the
number of scholars and the number of scholarly outputs grow exponentially with time,
well-designed ranking metrics that have the potential to assist in these tasks are of prime
importance. To rank scholars, it is important to put their achievements in perspective by
comparing them with the achievements of other scholars active in the same period. We
propose here a particular way of doing so: by computing the evaluated scholar’s share of
each year’s citations, which quantifies how the scholar fares in competition with the others.
We assess the resulting ranking method using the American Physical Society citation data
and four prestigious physics awards. Our results show that the new method significantly
outperforms other ranking methods in identifying the prize laureates.
1. Introduction
Measuring the academic impact of scientists is an important research direction in the science of science (Zeng et al., 2017). With the rapid development of science and technology, more and more scholars have devoted themselves to scientific research. Not only does a large number of new academic papers appear every year, but more junior scholars are also emerging. Measuring the academic influence and achievements of scientists in a principled way is therefore becoming more and more important. A fair measurement of academic influence can help scientific institutions select suitable scholars as faculty members, promote outstanding academic staff, provide valuable references for academic awards, and evaluate grant proposals.
According to a recent review (Zeng et al., 2017), there are three main types of methods for evaluating a scholar’s academic achievements. The first type is traditional static indicators. Methods of this kind are the most widely used and studied, such as citation count (Garfield, 1970; Redner, 1998), H-index (Bornmann & Daniel, 2005; Light, Polley, & Börner, 2014), G-index (Egghe, 2006), and so on (Hirsch, 2005; Jin, Liang, Rousseau, & Egghe, 2007). These indicators are generally based on simple statistics, so they require less information and are more convenient to calculate. However, they also have obvious shortcomings. One example is the famous H-index, which combines the number of citations and the number of papers to measure the impact of a scholar. Although this metric is a combined measure of the quantity and quality of a scientist’s publications, it depends strongly on the number of papers. A scholar who publishes a limited number of papers cannot have a high H-index, even if these papers have high impact. This means that although these traditional methods
∗ Corresponding author.
E-mail address:anzeng@bnu.edu.cn(A. Zeng).
http://doc.rero.ch
Published in Journal of Informetrics, which should be cited to refer to this work.
are simple, there are limitations in them. The second type is network-aware methods (Bar-Ilan, 2008; Yan & Ding, 2009). In the science of science, the citation relationship can be seen as a complex network, which is called the scientific network (Price, 1965). Through node ranking algorithms, many scientific influence measurements for citation networks have been derived, such as PageRank (Chen, Xie, Maslov, & Redner, 2007), author rank (Liu, Bollen, Nelson, & Van de Sompel, 2005), personalized PageRank (Ding, Yan, Frazho, & Caverlee, 2009), time-aware PageRank (Fiala, 2012) and the diffusion-based science author rank algorithm (Radicchi, Fortunato, Markines, & Vespignani, 2009). The third type consists of specially designed measurements for evaluating scholars, such as the dynamic method (Sinatra, Wang, Deville, Song, & Barabási, 2016), the credit allocation based method (Van Hooydonk, 1997) and machine learning based methods (Abrishami & Aliakbary, 2019; Shen, Chen, Yang, & Wu, 2019).
All these methods share the same limitation, namely the time bias problem. Much research has been done in order to obtain a fair ranking of scholars. Some scholars focus on solving time bias problems on a small scale. For example, the time-aware PageRank proposed by Fiala (2012), mentioned above, solves a series of small-scale time bias problems in the citation relationships between scholars. The results show that correcting these time biases helps to improve the accuracy of prediction. Other studies focus on large-scale time biases. As mentioned above, with the rapid development of science and technology the academic environment is continuously changing, but the mainstream evaluation methods for scholars are almost static, so comparing scholars from different ages is unfair. For example, there is a classic time bias problem caused by the well-known preferential attachment mechanism (Bianconi & Barabási, 2001; Papadopoulos, Kitsak, Serrano, Boguná, & Krioukov, 2012). In the science of science, this problem exists at both the scholar level and the paper level (Price, 1965, 1976). It can be described as follows: compared to junior scholars, senior scholars have an advantage in time because they started their careers earlier, so they have had longer to accumulate attention. Due to their reputation, their work will be thought more valuable, so eventually they become famous all over the world. In contrast, it is hard for junior scholars to attract attention. Many scholars have pointed out this problem (Barabási & Albert, 1999; Perc, 2014). However, according to our analysis, besides preferential attachment there is another time bias issue in evaluating the long-term or lifetime achievements of scholars. Simply put, when computing the academic achievements of a scholar, not only his/her own papers and citations should be considered, but the peers active during the same period should also be taken into account. The more scholars are active in a period, the more academic papers are published in that period, so it is easier to accumulate citations but more difficult to compete for an academic achievement award. Therefore, it is important to consider the academic environment when evaluating the achievements of scholars.
We name this phenomenon competition. To explain and validate our theory, we use the well-known American Physical Society (APS) dataset (Sinatra et al., 2016). To verify that our method can determine a scholar’s academic achievement more reasonably, in this paper we use the Nobel Prize in Physics, the Wolf Prize, the Enrico Fermi Prize and the Max Planck Medal as benchmark datasets.
This paper is composed of the following parts. In the second section, we briefly describe the APS dataset we used, define some mathematical symbols and introduce four metrics for measuring experimental results. In the third section, we first perform an analysis on the APS dataset to introduce the competition problem in evaluating scholars. Then we explain in detail the existence and importance of the competition. Finally, we propose a new method based on the above analysis. In Section 4, we illustrate the effectiveness of our method by analysing the results of experiments. In Section 5, we discuss the role of the parameter in our method. Section 6 is the conclusion of this paper.
2. Data, symbols and metrics
In this paper, we analyze the publication data from all journals of APS. The data contains 482,566 papers, ranging from year 1893 to year 2010. For the sake of author name disambiguation, we use the author name dataset provided by Sinatra et al., which is obtained with a comprehensive disambiguation process in the APS dataset (Sinatra et al., 2016). Eventually, a total number of 236,884 distinct authors are matched. We use the Nobel Prize in Physics as the main validation dataset for this paper. The Nobel Prize dataset includes all Nobel Prize laureates from year 1900 to year 2015, and at the same time we ensure that these Nobel Prize laureates have published at least one paper in the APS dataset. Eventually, a total number of 158 distinct Nobel Prize laureates are matched. Meanwhile, we also provide the results of the Wolf Prize, the Enrico Fermi Prize, and the Max Planck Medal. It should be noted that in the following analysis, we mainly show the results on the Nobel Prize dataset (the results for the other three awards are shown in a table at the end). There are three reasons:
• The Nobel Prize in Physics has the highest authority in the field of physics, which means that a laureate has a great influence on the progress of science and technology.
• The Nobel Prize in Physics has a long history, and the laureates are more widely distributed in time. This is more suitable for our research, that is, how to compare the achievement of scholars fairly over a long time span.
• The performances of the Nobel Prize laureates are uneven. Some laureates may have over 10,000 citations, while some of them have fewer than 1,000, which also makes it difficult to determine the laureates of the Nobel Prize (Gingras & Wallace, 2009). Therefore, using the Nobel Prize in Physics as our benchmark set for evaluating academic achievements has a certain degree of novelty.
Table 1
A simple and concrete example for calculating metrics

Name       Original score  Original ranking  New score  New ranking
a          10              1                 2          6
b          9               2                 6          2
c          8               3                 4          4.5
d          7               4                 7          1
e          6               5                 5          3
f          5               6                 4          4.5
g          4               7                 1          7
Precision                  0.33                         0.67
AR                         4                            2.5
RIR                                                     0.67
We first give some definitions of important symbols here. All lowercase letters indicate the attributes of a paper, and all uppercase letters indicate the attributes of an author. The time at which paper i was published is $t_i$. The debut time of author i is $T_i$, which means the time when scholar i published the first paper in the APS dataset. $N_i$ represents author i’s collection of all papers in the APS dataset, and we use $|N_i|$ to indicate the number of all his/her papers. We use $c_i$ to indicate the total number of citations of paper i, and $C_i$ to represent the total number of citations of author i (it should be noted that our citation relationship is limited to the APS dataset we used), so we have $C_i = \sum_{j \in N_i} c_j$. In this paper we use the second subscript to indicate the component of the indicator in a certain year. For example, $N_{it}$ represents the number of papers published by author i in year t, so we have $|N_i| = \sum_t N_{it}$. Similarly, $C_{it}$ represents the number of citations author i received in year t, so we also have $C_i = \sum_t C_{it}$. In addition, we use $R_t$ to indicate the total number of references for year t, that is, the sum of the numbers of references of the papers published in year t.
In this paper, we aim to identify, among all scholars, those who are eligible for the Nobel Prize and compare the results with the real Nobel Prize dataset. In order to quantitatively compare the identification performance of different methods, we use the following four metrics:
• Precision. $\mathrm{Precision} = \frac{|q \cap p|}{|q|}$. Precision is widely used in identification tasks (Lü & Zhou, 2011). Here q is the set of real laureates and p is the set predicted by the algorithm, where |q| = |p|. We call q ∩ p the identified laureates and the complementary set q \ p the unidentified laureates. This index is the most straightforward measure of the identification result. However, it cannot accurately evaluate algorithms when the test dataset is small, as is the case for the Nobel Prize dataset used in this paper, so we provide a second measure.
• Average ranking (AR). In order to solve the low-resolution problem, we use the average ranking. This metric is also used in Radicchi et al. (2009) to measure the quality of their identification method. Given an evaluation index, after we have the rankings of all scholars according to their scores calculated by the index, we average the rankings of all real laureates. It should be noted that when some scholars get the same score, the rankings of these scholars are all replaced by the mean value of their rankings. Unlike precision, AR considers the ranking of all laureates, so it is not affected by the size of the test dataset. However, when we use it, we find that this metric also has a problem. When the average ranking result of an algorithm is particularly high, there may be two reasons. It may be because the algorithm has a better overall performance, or it may be because the ranking of a few data points increases sharply, resulting in an increase in the average ranking. In order to avoid this kind of misjudgment, we introduce another metric, RIR.
• Ranking increase rate (RIR). $\mathrm{RIR} = \frac{1}{N} \sum_i \delta(i)$, where N is the number of laureates. When comparing two ranking methods, besides the average ranking we can use RIR to measure how many Nobel laureates (in our case) are improved in ranking. Here δ is a comparison function, abbreviated to δ(i), that represents whether the new method improves the ranking of scholar i. If the ranking is higher than under the original method, δ(i) is equal to 1, otherwise it is 0. It can be seen that the RIR does not measure the degree of improvement of the ranking, but focuses on the scope of the improvement.
• Top-n%. Finally, in order to enrich the results, we use the Top-n% metric to further analyze the results. This metric was also mentioned and used in Radicchi et al. (2009). n represents the ranking threshold. In this paper, we use four levels of n: 0∼0.1%, 0.1%∼1%, 1%∼10%, 10%∼100%. Then in each interval we count how many real laureates are included. The more real laureates are assigned to the higher ranking intervals, the better the identification result.
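The Top-n% counting described in the last item can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `top_n_percent_counts`, the toy ranks and the population size of 10,000 are all assumptions made for the example.

```python
def top_n_percent_counts(laureate_ranks, n_scholars, edges=(0.001, 0.01, 0.1, 1.0)):
    """Count laureates per ranking interval: 0-0.1%, 0.1%-1%, 1%-10%, 10%-100%."""
    counts = [0] * len(edges)
    for rank in laureate_ranks:
        percentile = rank / n_scholars  # rank 1 is the best-ranked scholar
        for k, upper in enumerate(edges):
            if percentile <= upper:
                counts[k] += 1
                break
    return counts


# Hypothetical toy input: six laureates ranked among 10,000 scholars.
toy_counts = top_n_percent_counts([3, 8, 50, 400, 2500, 9000], n_scholars=10_000)
print(toy_counts)
```

A method that moves laureates toward the first entries of the returned list is the better one under this metric.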
An example of calculating concrete values for both AR and RIR is given below. Here we have seven scholars, named ‘a’ to ‘g’. Their original scores and original rankings are calculated by the original method, and their new scores and new rankings are calculated by the new method. Detailed values are given in Table 1. We assume that ‘b’, ‘d’ and ‘f’ are real laureates. Using the original method for prediction, the top three scholars obtained are ‘a’, ‘b’ and ‘c’, so the precision is 0.33, while the prediction results of the new method are ‘d’, ‘b’ and ‘e’, and the precision is 0.67. Based on the original method, the average ranking (AR) is 4. After using the new method, the AR is improved to 2.5. Finally, both ‘d’ and ‘f’ are improved in their rankings, so the RIR is equal to 0.67.
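The worked example can be reproduced with a few lines of Python. This is a sketch under the definitions given in the text (tied scholars share the mean of their ranks); the helper names `mean_ranks`, `precision`, `average_ranking` and `ranking_increase_rate` are ours and do not come from the paper.

```python
def mean_ranks(scores):
    """Rank by descending score; tied scholars all receive the mean of their ranks."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks, pos = {}, 0
    while pos < len(ordered):
        end = pos
        while end + 1 < len(ordered) and scores[ordered[end + 1]] == scores[ordered[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of the 1-based positions pos+1 .. end+1
        for name in ordered[pos:end + 1]:
            ranks[name] = shared
        pos = end + 1
    return ranks


def precision(scores, laureates):
    """Fraction of real laureates among the top-|q| scholars."""
    top = sorted(scores, key=scores.get, reverse=True)[:len(laureates)]
    return len(set(top) & laureates) / len(laureates)


def average_ranking(scores, laureates):
    ranks = mean_ranks(scores)
    return sum(ranks[name] for name in laureates) / len(laureates)


def ranking_increase_rate(old_scores, new_scores, laureates):
    """Share of laureates whose rank number is strictly smaller under the new method."""
    old, new = mean_ranks(old_scores), mean_ranks(new_scores)
    return sum(1 for name in laureates if new[name] < old[name]) / len(laureates)


# Values taken from Table 1.
original = {"a": 10, "b": 9, "c": 8, "d": 7, "e": 6, "f": 5, "g": 4}
new = {"a": 2, "b": 6, "c": 4, "d": 7, "e": 5, "f": 4, "g": 1}
laureates = {"b", "d", "f"}

print(round(precision(original, laureates), 2), round(precision(new, laureates), 2))
print(average_ranking(original, laureates), average_ranking(new, laureates))
print(round(ranking_increase_rate(original, new, laureates), 2))
```

Note that scholar ‘b’ keeps rank 2 under both methods and therefore does not count as improved, which is why the RIR is 2/3 rather than 1.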
Fig. 1. Statistical results on the APS dataset. Subgraph (a) is the scatter plot of the career start year of researchers versus their total citations. (b) is the scatter plot of the career start year of researchers versus their rescaled citation count. (c) Histogram of top-0.1% scholars ranked by rescaled citation count and by citation count. Here we assign all top-0.1% scholars to decade periods instead of single years. (d) The number of new authors (or new papers) in each year for the APS data. The y-axis is the logarithm of the quantity. We can see that the number of scholars and publications increases exponentially over time.
3. Model
In this section, we propose our new method for identifying prize-winning scientists by considering competition in the ranking of scientists. We start with the basic method, citation count (CC), which can be computed by directly counting the total citations of all scholars in the APS dataset. The result is shown in Fig. 1(a), where each dot represents a scholar. We can see that the distribution of CC can be divided into a rising phase and a decreasing phase (except for the obvious gap caused by World War II around 1945). These two phases are caused by competition and preferential attachment respectively.
As shown in Fig. 1(d), the academic population increases over time, which directly leads to higher citations of scholars, as shown in Fig. 1(a). This makes the distribution of top scholars seriously uneven over time. As the blue histogram in Fig. 1(c) shows, up to 1980 the number of top-0.1% scholars is positively correlated with time. After the peak around 1980, scholars’ citation counts and the number of top scholars decrease sharply year by year. This trend is mainly caused by preferential attachment. Many methods have been proposed for correcting the bias of the preferential attachment problem, such as rescaled citation count (Newman, 2009), rescaled PageRank (Mariani, Medo, & Zhang, 2016), citation rank (Walker, Xie, Yan, & Maslov, 2007), etc. Taking rescaled PageRank (Mariani et al., 2016) as an example, this effect can be eliminated by comparing a scholar only with his/her peers, and the result can be seen in Fig. 1(b) and in the orange histogram in Fig. 1(c). After using the rescaled method to eliminate the effect of preferential attachment, the downward trend is gone, leaving only a clear upward trend. This means that, although we have solved the preferential attachment problem, the effect of competition still exists. Scholars from different time periods still cannot be compared on the same footing. Moreover, the distributions of three other popular indicators have the same shape as that of CC. The results can be seen in Fig. 3. This means that those three indicators do not consider competition either.
Competition can be illustrated with a simple example. The number of scholars who debuted in year 1970 is 2,127, the average number of citations of scholars from that year is 111.1, and the standard deviation of citations is 295. According to the 3σ principle, there are 48 outstanding scholars (that is, their citations are greater than the mean value plus 3 times the standard deviation). In 1910, only 27 scholars debuted, and only one person can be considered an outstanding scholar, with a total of 1,003 citations, even fewer than the minimum of the 48 scholars from 1970. Therefore, if the Nobel Prize used citations as its criterion, no scholar from 1910 would have a chance to be nominated. In reality, for the outstanding scholars from 1970 it was 48 times more difficult to win the prize than for those from 1910, because of the competition among the large number of peers from 1970. This phenomenon was also observed by Gingras and Wallace (2009).
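The 3σ criterion used in this example can be written down explicitly. The sketch below is illustrative only: the function names are ours, the use of the population standard deviation (`statistics.pstdev`) is an assumption since the text does not say which estimator was used, and `toy_cohort` is made-up data rather than APS data.

```python
from statistics import mean, pstdev


def three_sigma_threshold(mu, sigma):
    """Citation level above which a scholar counts as outstanding."""
    return mu + 3 * sigma


def count_outstanding(citations):
    """Number of scholars in one debut cohort whose citations exceed mean + 3 * std."""
    threshold = three_sigma_threshold(mean(citations), pstdev(citations))
    return sum(1 for c in citations if c > threshold)


# Threshold for the 1970 cohort, from the mean and standard deviation quoted in the text.
threshold_1970 = three_sigma_threshold(111.1, 295.0)
print(round(threshold_1970, 1))  # 996.1 citations

# Hypothetical cohort: twenty scholars with 10 citations and one with 1,000.
toy_cohort = [10] * 20 + [1000]
print(count_outstanding(toy_cohort))
```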
In summary, with the academic environment continuously changing, traditional static academic indicators are ineffective in identifying scholars’ achievements (Fire & Guestrin, 2019), because they usually ignore the competition effect. Therefore, we need to design a new method for evaluating the overall achievement of scholars that can tackle the competition problem.
Fig. 2. Illustration of the competition effect when evaluating the achievement of two scholars. We assume there are two scholars i and j. Until 2010, they both have 100 citations, but i received these citations in 1960 and j received these citations in 2000. If we evaluate the two scholars from the perspective of competition, scholar i is more outstanding than scholar j, as it was more difficult to obtain 100 citations in 1960.

We use Fig. 2 to explain our idea. As shown in the figure, we assume that both scholar i and scholar j debuted in year 1940, and that each of them has accumulated 100 citations by 2010. The 100 citations of scholar i came from year 1960, and the citations of scholar j came from year 2000. According to the APS dataset, the total number of references from all papers published in year 1960 was 10,727, while the total number of references in year 2000 was 170,139. So getting 100 citations in 2000 is much easier than in 1960. To account for this difference, we propose the following formula.
$$C_i^* = \sum_t \frac{C_{it}}{(R_t)^{\alpha}} \qquad (1)$$

where $C_i^*$ is our new index. In order to calculate $C_i^*$, we consider the author’s citations in each single year, $C_{it}$. According to the above analysis, the achievements of a scholar in year t do not depend on the absolute number of citations received in this year, but on a relative value. In order to increase the generality of the method, we add an adjustable parameter α. A larger α indicates that our method gives more weight to competition. Naturally, when α = 0, Eqn (1) degenerates into the citation count. Therefore, we can regard CC as a specific case of our method. We can also see that if we use traditional methods, such as CC, H-index (assuming that both scholars have only one paper, published at the same time) or the rescaled method mentioned above, it is impossible to distinguish scholar i and scholar j, because these methods all ignore the competition effect.
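As a concrete illustration of Eqn (1), the two scholars of Fig. 2 can be scored with a short Python sketch. The function name `ccc` and the dictionary layout are our own choices; the reference totals for 1960 and 2000 are the ones quoted in the text.

```python
def ccc(citations_by_year, references_by_year, alpha):
    """Eqn (1): sum over years of C_it / (R_t ** alpha)."""
    return sum(
        c_it / references_by_year[year] ** alpha
        for year, c_it in citations_by_year.items()
    )


# Total number of references in the APS data for the two years used in Fig. 2.
R = {1960: 10_727, 2000: 170_139}

scholar_i = {1960: 100}  # all 100 citations received in 1960
scholar_j = {2000: 100}  # all 100 citations received in 2000

for alpha in (0.0, 0.64, 1.0):
    print(alpha, ccc(scholar_i, R, alpha), ccc(scholar_j, R, alpha))
```

With α = 0 both scholars score exactly 100, which is the plain citation count; for any α > 0 scholar i scores higher than scholar j, because fewer references were available to compete for in 1960.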
4. Results
In this section, we mainly present the experimental results on the Nobel Prize, because of the three advantages mentioned above. Meanwhile, to enrich the results, we also briefly show the results for three other prizes at the end of this section.
4.1. Methods
First, we introduce all the methods used for comparison. As mentioned above, a plethora of quantitative metrics exist and could be compared in principle, but our goal is to show that our methods can solve the competition effect problem effectively. Therefore, in this paper our focus is narrowed to traditional statistical methods. We first briefly introduce four well-known traditional metrics:
• Citation count (CC). CC is a basic academic statistical indicator, which straightforwardly reflects the attention paid to scholars’ scientific work.
• Paper count (PC). That is, how many publications the scholar has up to a certain year. This indicator is also called academic productivity; it measures scholars’ academic activity and productivity, but ignores the quality of their publications.
• H-index (Hirsch, 2005). Currently, the H-index is a mainstream evaluation method for scholars, which combines academic productivity and academic quality to evaluate a scholar comprehensively. The H-index treats all papers of a scholar equally, which ignores the importance of high-quality papers, so it is not fair for evaluating the scientific performance of scholars with a few highly cited papers. To overcome this, the G-index was proposed.
• G-index. Egghe (2006) proposed the G-index, which is defined as the largest number g of individual publications that together have at least $g^2$ citations. So the G-index gives more weight to highly cited papers than the H-index does.
Then we introduce our two methods:
Fig. 3. Scatter plot of four basic indicators versus the career start year of researchers (a: citation count, b: paper count, c: H-index, d: G-index). In each subgraph, each point represents a scholar. The red dots and the green dots represent the Nobel Prize laureates. The red dots represent the scholars that are correctly identified as Nobel Prize laureates, while the green dots represent the failed cases. All black dots are scholars who have not won the Nobel Prize.
• Competition-aware citation count (CCC), see formula (1). As a variant of CC, the results of CCC are mainly compared with CC. Actually, Eqn (1) can be generalized into a framework: we can use any academic indicator to replace CC, split it by year, and divide by the corresponding competition factor, namely the total references in that year. Therefore, in order to further illustrate the effectiveness of our method, we also apply it to the number of papers (which is called the academic productivity of scholars) as an additional example.
• Competition-aware paper count (CPC). It is natural to consider that the competition factor for the academic productivity of scholars is the total number of papers published in the same year. We thus propose a competition-aware paper count method, mathematically expressed as Eqn (2). We can see that when α = 0, $P_i^*$ degenerates into the pure academic productivity of a scholar. Therefore, Eqn (2) is an improved version of the scholar’s academic productivity. If CPC works well, it not only supports the existence of the competition effect and the effectiveness of our method, but also demonstrates the robustness of our method. After discarding a lot of useful information (without information of the citations of papers compared with CC and PC), our method can still have high accuracy, which means any academic statistical indicator can be improved by introducing the competition concept. As a variant of PC, the CPC results are mainly compared with PC below.
$$P_i^* = \sum_t \frac{P_{it}}{(N_t)^{\alpha}} \qquad (2)$$

4.2. Analysis results
As shown in Fig. 3, we present the results of four classical methods (CC, PC, H-index and G-index). First, Fig. 3 gives us an intuitive feeling about the distribution of all scholars and the ranking positions of all real laureates. It can be clearly seen that although the Nobel Prize laureates are generally recognized as the scholars with the largest contributions, this does not mean that their indicators are always the best. Actually, many Nobel Prize laureates are at very low positions. Of course, this may be caused by the limited dataset and incomplete information, but more importantly, this means that determining Nobel Prize laureates is a very difficult task, and the precision will not be very high. Then, as we mentioned above, the distribution trend of the four indicators is basically the same, and all of them show serious biases in time. This means that these static indicators can no longer evaluate academic achievements fairly, due to the fact that the academic environment changes continuously (Fire & Guestrin, 2019). The detailed simulation results of the four methods are shown in Table 2. From the perspective of precision and average ranking, we can see that the G-index performs slightly better than the H-index, and the CC method is the best among them. Unsurprisingly, the paper count is the worst index, because PC only considers productivity but ignores quality.
The results of the two new methods, CCC and CPC, are shown in Fig. 4. It should be noted that since our new methods contain a tunable parameter α, in our experiments we vary α from 0 to 1 in steps of 0.01. The result of CCC is shown in the first row of Fig. 4. First, from the perspective of precision, compared with the original CC method (α = 0), most of the α values improve the identification precision (except for a small range around 0.3), and the optimal precision
Table 2
Quantitative results for all awards. For the sake of fairness when comparing all awards, we set α = 0.64 for the CCC method (experimental results show that when α = 0.64, CCC has the best prediction precision on the Nobel Prize dataset; in order to compare performance without loss of generality, we fix this parameter on the three other datasets).

Prize               CCC                        CC                 H-index            G-index            PC
                    Precision  AR      RIR     Precision  AR      Precision  AR      Precision  AR      Precision  AR
Nobel Prize         0.222      21,370  0.717   0.139      30,617  0.095      33,761  0.114      32,440  0.013      35,921
Enrico Fermi Prize  0.120      28,145  0.820   0.000      44,736  0.000      47,582  0.020      45,762  0.020      46,160
Max Planck Medal    0.069      20,768  0.724   0.034      37,737  0.017      48,652  0.017      46,287  0.017      53,448
Wolf Prize          0.100      15,917  0.560   0.060      18,108  0.060      23,062  0.060      24,505  0.020      29,953
Fig. 4. Dependence of the results on the parameter α. The subgraphs a, b and c are the results of our CCC method, and the subgraphs d, e and f are the results of the CPC method. The left, middle, and right sides represent precision, average rank, and ranking increase rate respectively. The best precision of CCC is 0.222, and the corresponding α is 0.64. The best precision for CPC is 0.146, with α equal to 0.72. As α increases, the average rankings of the two methods are monotonically increasing, while both ranking increase rates show monotonically decreasing trends.
0.222 is reached when α is set to 0.64. As we can see, the average ranking shows a monotonically increasing pattern as α increases. However, this does not mean that the greater α is, the better the result will be. As we can see from the RIR curve, it shows a monotonically decreasing trend. This means that when we increase α, the rise in average rankings is due to a small number of Nobel laureates moving up significantly, not to an overall improvement. This phenomenon also indicates that AR is not an effective ranking method for identifying prize-winning scientists. To sum up, our CCC method shows good performance, which indicates the advantage of introducing the competition mechanism into the author ranking task.
As for CPC, it can also demonstrate the robustness of our method. It is well known that there is a positive correlation between CC and PC (Bayer & Folger, 1966). Compared to CC, PC only accounts for the quantity of a scholar’s papers and carries no quality information. If our method still performs well on PC, it indicates that our method is robust against low-quality data. The results of CPC are shown in the second row of Fig. 4. Similar to the results of CCC, the precision of CPC increases first and then decreases. We can see that the RIR value stays above 70% all the time, which means that our method can effectively improve the rank of most Nobel Prize laureates. It should be noted that there is a gap at the left side of the two AR curves in Fig. 4(e). This is because ranking scholars by PC may make many scholars indistinguishable. For example, there are 892 scholars with 20 papers in our dataset. However, after using our new indicator, these scholars’ scores become completely different and the rankings of laureates become better. It is worth noting that there is a great improvement of precision in Fig. 4(d), in which the optimal value of the precision is 0.146 (when α = 0.72). This precision is 11 times higher than when using productivity to identify Nobel laureates (when α = 0, precision = 0.0127), and it is even better than the precision of CC (precision = 0.139). This result shows that after considering the competition effect, using only productivity for identification can achieve an effect similar to that of the citation count.
In order to compare CC and CCC from a more intuitive perspective, we use the ranking results of the two indicators to make a scatter plot, as shown in Fig. 5(a). If a scholar is above the black diagonal line, it means that his/her CCC ranking is better than the CC ranking, and the farther the dot deviates from the line, the more his/her ranking is improved by CCC. As can be seen from Fig. 5, the majority of Nobel Prize laureates (red dots) are above the diagonal line, indicating that CCC has improved the ranking of most Nobel Prize laureates, which is the result of overall improvement rather than local improvement. Although some of the Nobel laureates (red dots) are below the black solid line, the deviations are much smaller than those for the dots above it. That is because those scholars whose rankings are not promoted do not drop significantly, and they
Fig. 5. The comparison of ranking results of different methods. It should be noted that we take the logarithm of all rankings. (Left) The x-axis is the ranking of a scholar calculated by CCC (α = 0.64), and the y-axis is the ranking of a scholar by using CC. (Right) The x-axis is the ranking of a scholar calculated by CPC (α = 0.72), and the y-axis is the ranking of a scholar by using PC. In these two figures, each dot represents a scholar. The red markers and blue dots indicate the Nobel Prize laureates and other scholars respectively. Black solid lines are y = x, indicating that a dot on them has the same ranking in the two methods. The black dashed lines indicate the boundaries of identification, that is x = ln 158 and y = ln 158, so the left side of the vertical dashed line indicates the identification result of our new indicator, and the lower side of the horizontal dashed line indicates the identification result of the traditional method.
can still roughly retain their original rankings. Take two scholars marked in the figure as examples: ‘Arthur H. Compton’ and ‘Carl E. Wieman’. Among all the Nobel Prize laureates identified by using CCC, ‘Arthur H. Compton’ has the lowest CC ranking; his CCC ranking and CC ranking are 72 and 10,141 respectively. This is a significant improvement in the ranking, which lifts ‘Arthur H. Compton’ into the top-1% group. Meanwhile, among all the Nobel Prize laureates identified by using CC, ‘Carl E. Wieman’ has the lowest CCC ranking; his CCC ranking and CC ranking are 453 and 145 respectively. Although there is a drop in his ranking, he still remains in the top-1% group.
In addition, we can see that the figure is divided into four regions by two dashed lines. The lower left area is the common identification area of the two methods, that is, the scholars identified by both methods at the same time. The lower right area and the upper left area represent the additional identification areas of CC and CCC respectively. We can see that CCC retains most of the laureates identified by CC (only five red dots are located in the lower right area). As shown in the figure, taking ‘Steven Weinberg’ and ‘Philip W. Anderson’ as two examples, no matter which method is used, these two scholars are always ranked very high. At the same time, in the upper left corner, we can see that CCC successfully identifies another 18 Nobel Prize laureates. This shows that CCC is a successful extension of CC. The results of CPC and PC are shown in Fig. 5 (right). As in the left panel, most of the Nobel Prize laureates (red dots) are located above the black solid line, and their deviations are much larger than those of the red dots below it (take the four scholars marked in the figure as examples). There are no Nobel Prize laureates in the common identification area of PC and CPC, mainly because of the poor performance of the original PC method. In other words, the new method does not inherit any useful results from PC.
Finally, we use the Top-n% metric to compare the results of various methods. In order to enrich the results, we also use three other famous physics awards, the Wolf Prize (50 laureates), the Enrico Fermi Prize (50 laureates) and the Max Planck Medal (58 laureates). Compared with the Nobel Prize, these three prizes have a smaller number of laureates, and the distribution of the winning time is not as wide as that of the Nobel Prize. Here we do not plot these three prizes separately, but put all the awards together (specific results can be found in Table 2). In the end we have 272 laureates from four awards, some of whom may have won multiple awards. In the top-n% metric, we expect that the better the algorithm performs, the more awarded authors will be found in the top rank percentage. As we can see from Fig. 6, our competition-aware method always performs better. The two categories ‘below 0.1%’ and ‘0.1%-1%’ always contain more laureates compared to other methods. Meanwhile ‘1%-10%’ and ‘above 10%’ always have fewer laureates. This means that our method identifies most of the award-winning authors by placing them at relatively high ranks. Here we also use a table to show more details of the performance on the four awards, see Table 2. We set α to 0.64. From the results, we can see that our method performs well on all four datasets, and the parameter α seems to be universal (although for a specific award 0.64 may not be the best choice, it still works better than the other methods).
4.3. The role of α
When we introduced Eqn (1) in Section 3, we regarded α as the parameter that adjusts the degree of attention paid to the competition. A larger α indicates that a stronger effect of competition is considered, which is the most intuitive interpretation of α. At the same time, according to the results in Section 4.2, we can find that although most α can
Fig. 6. The top-n% measurement on 4 awards in physics (the Nobel Prize, the Max Planck Medal, the Wolf Prize, and the Enrico Fermi Prize). We fix α = 0.64 for CCC. The performance of a scholar depends on the percentile of the scholar’s ranking. The lower this percentage is, the better the performance of the considered scientist is. The height of the bar indicates the proportion of laureates. The left figure only considers the Nobel Prize, while the right figure takes into account all four prizes. We find that, whether for the Nobel Prize alone or for all the prizes considered, our competition-aware method performs better than the others.
Fig. 7. We use the Kolmogorov-Smirnov test (see main text for explanation) to explain the role of the parameter α. The figure on the left side is the curve of KS scores versus the parameter α, and the small inset graph is the curve of p-values. The three subgraphs on the right side are the comparisons between the distribution of the identification results under three different parameter values (α = 0, 0.64, 1) and the distribution of the real Nobel Prize laureates. The black bars represent the distribution of the Nobel Prize laureates and the red bars represent the distribution of our results.
improve the identification effect, there is always a best choice of α. Why is there always a best result after adjusting α to a specific value, and how does α affect the identification results? In this section, we aim to illustrate the effect of α.
α actually helps our method to find the best output distribution, which is the one closest to the distribution of real Nobel Prize laureates. To better illustrate this point, we use the Kolmogorov-Smirnov test (KS test) (Hodges, 1958). This test is a nonparametric test of the equality of continuous (or discontinuous), one-dimensional probability distributions that can be used to compare two samples. The larger the KS test score, the larger the distance between the two distributions; the smaller the score, the more similar the two distributions are. Meanwhile, the KS test also gives the corresponding p-value for analyzing the significance of the result. The procedure of the experiment is the same as in Section 4.2: α ranges from 0 to 1, and the distance between the distribution of the identification set and the distribution of the real Nobel Prize set is calculated at every step of 0.01.
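The KS score used here is the largest gap between the empirical cumulative distributions of the two samples. The sketch below computes that statistic in plain Python; it is only an illustration, it assumes that each scholar is represented by a year (for instance the career start year, which the text does not state explicitly), and the two year lists are made up. In practice a library routine such as `scipy.stats.ks_2samp`, which also returns the p-value, would normally be used.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum distance between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Hypothetical years for the identified set and for the real laureates.
identified_years = [1920, 1950, 1970, 1990]
laureate_years = [1960, 1970, 1970, 1980]

print(ks_statistic(identified_years, laureate_years))
```

Repeating this computation for each α between 0 and 1 in steps of 0.01 gives a curve of the kind described in the text, with the minimum marking the α whose output distribution best matches that of the real laureates.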
The results are shown in Fig. 7. As α increases, the KS score first decreases and then increases sharply. The minimum distance appears around α = 0.6, and a smaller score means that the distribution of our identification matches the distribution of the real Nobel Prize laureates better. In the three histograms on the right side of Fig. 7, we show the distributions of our results for three special cases of α together with the distribution of the real Nobel Prize laureates. In the first histogram we set α = 0, so the red bars show the distribution of the CC result. The identification tends to concentrate around 1970. This is easy to understand, since we have shown in Fig. 3 in section 4 that the distributions of the four traditional indicators are basically the same, with peaks around 1970. Therefore, the top scholars identified by traditional indicators always appear near 1970. When α = 0.64, the identification precision reaches its optimal value (0.222). As can be seen from the second histogram, the red bars match the black bars significantly better than in the first case, and the distribution of the red bars becomes wider. It is particularly noteworthy that CC places almost no top scholars between 1900 and 1940, and this is largely corrected after increasing α to 0.64. In the third histogram we show the results for α = 1.0. Here the algorithm pays too much attention to competition, and the predicted distribution is concentrated mainly between 1900 and 1940. The KS curve also shows that when competition plays too large a role in our method while the citations are ignored, the KS score increases rapidly to 0.67.
Table 3
Logistic regression results.
Strategy  Metric     CV-1   CV-2   CV-3   CV-4   Average
A         AR         8,242  9,829  6,405  6,105  7,646
A         F1-score   0.016  0.015  0.016  0.016  0.016
A         Precision  0.225  0.200  0.100  0.051  0.144
B         AR         7,012  7,651  5,840  5,061  6,391
B         RIR        0.525  0.650  0.625  0.550  0.588
B         F1-score   0.026  0.022  0.026  0.021  0.024
B         Precision  0.250  0.250  0.150  0.077  0.182
C         AR         5,840  5,356  4,305  4,580  5,020
C         RIR        0.725  0.825  0.725  0.667  0.735
C         F1-score   0.030  0.026  0.031  0.030  0.029
C         Precision  0.275  0.300  0.175  0.103  0.213
CCC       AR         6,542  6,528  5,013  3,232  5,329
CCC       RIR        0.625  0.825  0.700  0.692  0.711
CCC       Precision  0.250  0.300  0.200  0.053  0.201
In general, the above analysis suggests that the parameter α effectively adjusts the distribution of the identification result. An optimal α ensures that the identification has no obvious bias in time and that the output distribution is closer to the distribution of real laureates.
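The shift in time described above can be illustrated with a small sketch. Eqn (1) is defined in section 3 and is not reproduced here; the code assumes a form in which each year's citations are divided by that year's total citations raised to the power α, so that α = 0 gives the plain citation count and α = 1 gives the scholar's yearly share. The function name `competition_aware_count` and all numbers are hypothetical.

```python
def competition_aware_count(yearly_citations, yearly_totals, alpha):
    """Sum yearly citations, each divided by (that year's total) ** alpha.

    Assumed form only: alpha = 0 reduces to the plain citation count,
    alpha = 1 to the sum of the scholar's yearly citation shares.
    """
    score = 0.0
    for year, cites in yearly_citations.items():
        score += cites / (yearly_totals[year] ** alpha)
    return score


# Hypothetical totals: far fewer citations were handed out in 1930 than in 1970.
yearly_totals = {1930: 100, 1970: 10_000}
early_scholar = {1930: 20}   # 20% of a small yearly total
late_scholar = {1970: 200}   # 2% of a large yearly total

for alpha in (0.0, 0.64, 1.0):
    early = competition_aware_count(early_scholar, yearly_totals, alpha)
    late = competition_aware_count(late_scholar, yearly_totals, alpha)
    print(alpha, early, late)
```

Under this assumed form the late scholar ranks first at α = 0 and the early scholar ranks first at α = 1, which mirrors the movement of the identified set from around 1970 towards 1900 to 1940 as α grows.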
4.4. Further exploration with machine learning
At the end of the paper, we illustrate that our competition-aware method can also be combined with machine learning methods, which deserves more exploration. We take logistic regression (Owen, 2007) as an example. We treat the prediction of Nobel Prize laureates as a special binary classification task, using label 1 for a Nobel Prize winner and label 0 for a scholar who has not won the Nobel Prize. This prediction task is special because its training dataset is extremely imbalanced: there are only 158 positive samples but about 230,000 negative samples. During the training phase we therefore use weighted methods that give greater weight to the positive samples.
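The text does not specify which weighting scheme is used. As one common possibility, the sketch below computes inverse-frequency ("balanced") class weights for the sample sizes quoted above and plugs them into a weighted log loss. The function names are hypothetical, and the scheme itself is an assumption rather than the procedure of this study.

```python
import math


def balanced_class_weights(n_pos, n_neg):
    """Inverse-frequency weights: both classes get the same total weight."""
    total = n_pos + n_neg
    return {1: total / (2 * n_pos), 0: total / (2 * n_neg)}


def weighted_log_loss(labels, probs, weights):
    """Logistic loss in which each sample is scaled by its class weight."""
    loss, weight_sum = 0.0, 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        w = weights[y]
        loss -= w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return loss / weight_sum


# Sample sizes quoted in the text: 158 laureates, about 230,000 other scholars.
weights = balanced_class_weights(158, 230_000)
print(weights)
```

With these weights a misclassified laureate costs far more than a misclassified non-laureate, so the classifier cannot minimise the loss by simply predicting label 0 for everyone.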
To illustrate the importance of the competition mechanism, we consider three training strategies for the machine learning experiments.
• (A) Citation count. Take the citation count as the only feature of a sample to make predictions.
• (B) Take the citations in each single year as a separate feature, so that each sample is represented by a vector.
• (C) Based on case B, use the concept of competition to normalize all feature vectors. See Eqn (1) for details.
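The three strategies differ only in how a scholar's citation record is turned into a feature vector, which can be sketched as follows. The function names and the toy data are hypothetical. For strategy C, Eqn (1) is not reproduced in this section, so the normalization is assumed to divide each year's citations by that year's total raised to the power α.

```python
def features_a(yearly_citations, years):
    """Strategy A: a single feature, the total citation count."""
    return [sum(yearly_citations.get(y, 0) for y in years)]


def features_b(yearly_citations, years):
    """Strategy B: one feature per year, the citations received in that year."""
    return [yearly_citations.get(y, 0) for y in years]


def features_c(yearly_citations, years, yearly_totals, alpha=1.0):
    """Strategy C: the yearly features of B, normalized by yearly competition.

    Assumed normalization: citations / (year total) ** alpha.
    """
    vector = []
    for y in years:
        total = yearly_totals.get(y, 0)
        cites = yearly_citations.get(y, 0)
        vector.append(cites / (total ** alpha) if total > 0 else 0.0)
    return vector


# Hypothetical scholar and yearly totals.
years = [1930, 1970]
yearly_totals = {1930: 100, 1970: 10_000}
scholar = {1930: 20, 1970: 200}

print(features_a(scholar, years))
print(features_b(scholar, years))
print(features_c(scholar, years, yearly_totals))
```

Each vector would then be passed to the same logistic regression, so any difference in performance comes from the representation alone.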
In addition, to compare the prediction results more comprehensively, we use four-fold cross validation and four metrics (AR, RIR, Precision, and F1-score) to measure the performance of the algorithm.
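Two of these metrics can be sketched directly from how they are used in this paper: AR is read here as the average rank of the real laureates, and RIR as the fraction of laureates whose rank improves relative to a baseline ranking. The exact definitions are given in an earlier section, so the code below is a sketch under these assumptions, with hypothetical names and scores.

```python
def ranks_from_scores(scores):
    """Rank scholars by descending score; rank 1 is the best.

    Ties are broken by insertion order, which is an arbitrary choice.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def average_ranking(ranks, laureates):
    """AR: mean rank of the real laureates (lower is better)."""
    return sum(ranks[s] for s in laureates) / len(laureates)


def ranking_improvement_ratio(baseline_ranks, new_ranks, laureates):
    """RIR: share of laureates ranked strictly better than in the baseline."""
    improved = sum(1 for s in laureates if new_ranks[s] < baseline_ranks[s])
    return improved / len(laureates)


# Hypothetical scores for four scholars, two of whom are laureates.
baseline = ranks_from_scores({"a": 10, "b": 8, "c": 5, "d": 1})
proposed = ranks_from_scores({"a": 3, "b": 9, "c": 7, "d": 1})
laureates = ["b", "c"]

print(average_ranking(baseline, laureates), average_ranking(proposed, laureates))
print(ranking_improvement_ratio(baseline, proposed, laureates))
```

Because RIR needs a baseline, it is natural that Table 3 reports no RIR for strategy A if A serves as that baseline.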
It can be seen from Table 3 that strategy C achieves the best performance and strategy A the worst. Compared with strategy A, the training set in strategy B is upgraded such that each sample carries extra information about time. This extra information is very helpful for improving the prediction precision. Compared with strategy B, the prediction precision of the machine learning algorithm is further improved in strategy C after the competition mechanism is applied. In fact, in strategy C the competition mechanism acts as a data normalization process. This procedure is conducive to improving the performance of machine learning methods, which indicates that considering the competition effect is also helpful for designing training sets for machine learning algorithms.
At the end of the table, we also show the prediction results of the simple competition-aware citation count (CCC). The performance of CCC is better than that of strategy B (average AR 5,329 versus 6,391; average precision 0.201 versus 0.182), which means that our competition mechanism is useful in itself. Strategy C has the highest accuracy, which also suggests that machine learning algorithms are superior to traditional methods in ranking or prediction tasks. In summary, all the results show that competition is a crucial factor for identifying prize-winning scientists.
5. Conclusion
In this paper, we mainly discuss how to measure the academic achievements of scholars more reasonably. We introduce the competition mechanism through an analysis of the American Physical Society (APS) dataset. With the rapid development of modern science and technology, the numbers of new scholars and new academic papers are increasing exponentially. This means that scientists face more and more competitors in academia. Therefore, junior scholars may find it increasingly difficult to obtain lifetime achievement awards. Meanwhile, this also means that traditional academic indicators may lose their meaning when evaluating scholars from different periods (Fire & Guestrin, 2019). Compared with senior scholars, junior scholars work in a larger and more active academic environment, so it is easier for them to obtain more citations and attention, but this does not mean that their performance is better. Therefore, if we do not take the competition effect into account, it is difficult to evaluate scholars' achievements fairly.
We then use a series of experiments to support our idea. The results show that after taking the competition factor into account, the predictive power of our new method improves greatly. Taking citation count as an example, the prediction precision increases from 0.139 to 0.222, and the average ranking of all real laureates improves from 30,617 to 21,370. Moreover, according to the indicator RIR, 71.7% of the laureates' rankings are improved by using our new method. Our new method performs best among the traditional indicators we have listed for comparison. We also went a step further and used the paper count to show that our method is robust and general. The results show that when the number of papers is the only information available to identify the Nobel Prize laureates, our new method still yields a significant improvement: the precision reaches 0.146 when competition is taken into account, and the average ranking of all real laureates improves from 35,921 to 18,462. Moreover, according to the indicator RIR, 76.1% of the real laureates' rankings are improved. In summary, the aim of our method is not to replace other ranking techniques, which have been optimized and almost perfected over the course of many years. Rather, we want to demonstrate the existence of the competition factor and account for it in a simple way. Our research has practical value, such as providing valuable references for academic awards, selecting suitable scholars as faculty, and promoting outstanding academic staff.
There are also numerous meaningful extensions for future research. First, one can try to combine our idea with other methods, such as the diffusion-based models discussed in the introduction. Such methods can use more structural information from the citation network, which will benefit the results. Similar to the competition-aware citation count, we could adjust the impact of each paper according to the competition mechanism. Taking PageRank as an example, we can give each out-edge of a paper a weight based on its publication year, with this weight determined by the total number of references or the number of papers in that year. Second, it is well known that there are many branches in physics today, and the impact of competition differs from branch to branch. Therefore, when comparing scholars from different academic domains, it would be fairer to perform the analysis separately for different domains. For example, as mentioned by Zeng et al. (2019), one can use community detection (Fortunato & Hric, 2016) to detect different research branches in the citation network. The size of those communities can represent the degree of competition. So besides the competition between different periods, we can also calculate the competition inside different academic directions. These two factors together determine the weight of edges in the citation network. In addition to improving the prediction, one can also analyze the strength of competition between different fields according to the fitting results.
CRediT authorship contribution statement
Yuhao Zhou: Software, Validation, Writing - original draft, Writing - review & editing, Formal analysis. Ruijie Wang: Writing - original draft, Writing - review & editing. An Zeng: Conceptualization, Methodology, Writing - review & editing, Formal analysis, Data curation. Yi-Cheng Zhang: Conceptualization, Supervision.
Acknowledgment
The authors would like to thank Matúš Medo for helpful discussion. This work was supported by the Swiss National Science Foundation (Grant No. 200020-156188).
References
Abrishami, A., & Aliakbary, S. (2019). Predicting citation counts based on deep neural network learning techniques. Journal of Informetrics, 13, 485–499.
Bar-Ilan, J. (2008). Informetrics at the beginning of the 21st century - a review. Journal of Informetrics, 2, 1–52.
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.
Bayer, A. E., & Folger, J. (1966). Some correlates of a citation measure of productivity in science. Sociology of Education, 381–390.
Bianconi, G., & Barabási, A. L. (2001). Bose-Einstein condensation in complex networks. Physical Review Letters, 86, 5632.
Bornmann, L., & Daniel, H. D. (2005). Does the h-index for ranking of scientists really work? Scientometrics, 65, 391–392.
Chen, P., Xie, H., Maslov, S., & Redner, S. (2007). Finding scientific gems with Google's PageRank algorithm.
Ding, Y., Yan, E., Frazho, A., & Caverlee, J. (2009). PageRank for ranking authors in co-citation networks. Journal of the American Society for Information Science and Technology, 60, 2229–2243.
Egghe, L. (2006). Theory and practise of the g-index. Scientometrics, 69, 131–152.
Fiala, D. (2012). Time-aware PageRank for bibliographic networks. Journal of Informetrics, 6, 370–388.
Fire, M., & Guestrin, C. (2019). Over-optimization of academic publishing metrics: Observing Goodhart's law in action. GigaScience, 8, giz053.
Fortunato, S., & Hric, D. (2016). Community detection in networks: A user guide. Physics Reports, 659, 1–44.
Garfield, E., et al. (1970). Citation indexing for studying science. Nature, 227, 669–671.
Gingras, Y., & Wallace, M. (2009). Why it has become more difficult to predict Nobel Prize winners: A bibliometric analysis of nominees and winners of the chemistry and physics prizes (1901-2007). Scientometrics, 82, 401–412.
Hirsch, J. E. (2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences, 102, 16569–16572.
Hodges, J. (1958). The significance probability of the Smirnov two-sample test. Arkiv för Matematik, 3, 469–486.
Jin, B., Liang, L., Rousseau, R., & Egghe, L. (2007). The R- and AR-indices: Complementing the h-index. Chinese Science Bulletin, 52, 855–863.
Light, R. P., Polley, D. E., & Börner, K. (2014). Open data and open code for big science of science studies. Scientometrics, 101, 1535–1551.
Liu, X., Bollen, J., Nelson, M. L., & Van de Sompel, H. (2005). Co-authorship networks in the digital library research community. Information Processing & Management, 41, 1462–1480.
Lü, L., & Zhou, T. (2011). Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390, 1150–1170.
Mariani, M. S., Medo, M., & Zhang, Y. C. (2016). Identification of milestone papers through time-balanced network centrality. Journal of Informetrics, 10, 1207–1223.
Newman, M. E. (2009). The first-mover advantage in scientific publication. EPL (Europhysics Letters), 86, 68001.
Owen, A. B. (2007). Infinitely imbalanced logistic regression. Journal of Machine Learning Research, 8, 761–773.
Papadopoulos, F., Kitsak, M., Serrano, M. Á., Boguñá, M., & Krioukov, D. (2012). Popularity versus similarity in growing networks. Nature, 489, 537.
Perc, M. (2014). The Matthew effect in empirical data. Journal of The Royal Society Interface, 11, 20140378.
Price, D. d. S. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27, 292–306.
Price, D. J. D. S. (1965). Networks of scientific papers. Science, 510–515.
Radicchi, F., Fortunato, S., Markines, B., & Vespignani, A. (2009). Diffusion of scientific credits and the ranking of scientists. Physical Review E, 80, 056103.
Redner, S. (1998). How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B - Condensed Matter and Complex Systems, 4, 131–134.
Shen, Z., Chen, F., Yang, L., & Wu, J. (2019). Node2vec representation for clustering journals and as a possible measure of diversity. Journal of Data and Information Science, 4, 79–92.
Sinatra, R., Wang, D., Deville, P., Song, C., & Barabási, A. L. (2016). Quantifying the evolution of individual scientific impact. Science, 354, aaf5239.
Van Hooydonk, G. (1997). Fractional counting of multiauthored publications: Consequences for the impact of authors. Journal of the American Society for Information Science, 48, 944–945.
Walker, D., Xie, H., Yan, K. K., & Maslov, S. (2007). Ranking scientific publications using a model of network traffic. Journal of Statistical Mechanics: Theory and Experiment, 2007, P06010.
Yan, E., & Ding, Y. (2009). Applying centrality measures to impact analysis: A coauthorship network analysis. Journal of the American Society for Information Science and Technology, 60, 2107–2118.
Zeng, A., Shen, Z., Zhou, J., Fan, Y., Di, Z., Wang, Y., Stanley, H. E., & Havlin, S. (2019). Increasing trend of scientists to switch between topics. Nature Communications, 10, 3439.
Zeng, A., Shen, Z., Zhou, J., Wu, J., Fan, Y., Wang, Y., & Stanley, H. E. (2017). The science of science: From the perspective of complex systems. Physics Reports, 714, 1–73.