Identifying prize-winning scientists by a competition-aware ranking

Yuhao Zhou a, Ruijie Wang a, An Zeng b,∗, Yi-Cheng Zhang a

a Department of Physics, University of Fribourg, Fribourg 1700, Switzerland
b School of Systems Science, Beijing Normal University, Beijing, 100875, PR China

Evaluating scholars’ achievements is an important problem in the science of science with applications in the evaluation of grant proposals and promotion applications. Since the number of scholars and the number of scholarly outputs grow exponentially with time, well-designed ranking metrics that have the potential to assist in these tasks are of prime importance. To rank scholars, it is important to put their achievements in perspective by comparing them with the achievements of other scholars active in the same period. We propose here a particular way of doing so: by computing the evaluated scholar’s share on each year’s citations which quantifies how the scholar fares in competition with the others. We assess the resulting ranking method using the American Physical Society citation data and four prestigious physics awards. Our results show that the new method significantly outperforms other ranking methods in identifying the prize laureates.

1. Introduction

Measuring the academic impact of scientists is an important research direction in the science of science (Zeng et al., 2017). With the rapid development of science and technology, more and more scholars have devoted themselves to scientific research. Not only does a large number of new academic papers appear every year, but more junior scholars are also emerging. How to measure the academic influence and achievements of scientists in a principled way is therefore becoming more and more important. A fair measurement of academic influence can help scientific institutions select suitable scholars as faculty, promote outstanding academic staff, provide valuable references for academic awards, and evaluate grant proposals.

According to a recent review (Zeng et al., 2017), there are three main types of methods for evaluating a scholar’s academic achievements. The first type is traditional static indicators. Methods of this kind are the most widely used and studied, such as citation count (Garfield, 1970; Redner, 1998), H-index (Bornmann & Daniel, 2005; Light, Polley, & Börner, 2014), G-index (Egghe, 2006), and so on (Hirsch, 2005; Jin, Liang, Rousseau, & Egghe, 2007). These indicators are generally based on simple statistics, so they require less information and are more convenient to calculate. Meanwhile, they also have obvious shortcomings. One example is the famous H-index, which combines the number of citations and the number of papers to measure the impact of a scholar. Although this metric is a combined measure of the quantity and quality of a scientist’s publications, it is strongly dependent on the number of papers. For a scholar who publishes a limited number of papers, his/her H-index cannot be high even if these papers have high impact. This means that although these traditional methods are simple, they have limitations. The second type is network-aware methods (Bar-Ilan, 2008; Yan & Ding, 2009). In the science of science, the citation relationship can be seen as a complex network, which is called the scientific network (Price, 1965). Through node ranking algorithms, many scientific influence measurements for citation networks have been derived, such as PageRank (Chen, Xie, Maslov, & Redner, 2007), author rank (Liu, Bollen, Nelson, & Van de Sompel, 2005), personalized PageRank (Ding, Yan, Frazho, & Caverlee, 2009), time-aware PageRank (Fiala, 2012) and the diffusion-based science author rank algorithm (Radicchi, Fortunato, Markines, & Vespignani, 2009). The third type consists of specially designed measurements for evaluating scholars, such as the dynamic method (Sinatra, Wang, Deville, Song, & Barabási, 2016), the credit allocation based method (Van Hooydonk, 1997) and machine learning based methods (Abrishami & Aliakbary, 2019; Shen, Chen, Yang, & Wu, 2019).

∗ Corresponding author.
E-mail address: anzeng@bnu.edu.cn (A. Zeng).
http://doc.rero.ch
Published in Journal of Informetrics, which should be cited to refer to this work.

All these methods have the same limitation, namely the time bias problem. Much research has been done in order to obtain a fair ranking of scholars. Some scholars focus on solving time bias problems on a small scale. For example, the time-aware PageRank proposed by Fiala (2012), mentioned above, solves a series of small-scale time bias problems in the citation relationships between scholars. Their results show that solving these time biases can help to improve the accuracy of prediction. Other studies focus on large-scale time biases. As mentioned above, with the rapid development of science and technology, the academic environment is continuously changing, but the mainstream evaluation methods for scholars are almost static, so comparing scholars from different eras is unfair. For example, there is a classic time bias problem which is caused by the well-known preferential attachment mechanism (Bianconi & Barabási, 2001; Papadopoulos, Kitsak, Serrano, Boguná, & Krioukov, 2012). In the science of science, this problem exists at both the scholar level and the paper level (Price, 1965, 1976). The problem can be described as follows: compared to junior scholars, senior scholars have an advantage in time because they started their careers earlier, so they have had longer to accumulate attention. Due to their reputation, their work will be thought more valuable, so eventually they become famous all over the world. On the contrary, it is hard for junior scholars to get attention. Many scholars have pointed out this problem (Barabási & Albert, 1999; Perc, 2014). However, according to our analysis, besides preferential attachment, there is another time bias issue in evaluating the long-term or lifetime achievements of scholars. It can be simply understood as follows: when computing the academic achievements of a scholar, not only his/her own papers and citations should be considered, but also the peers active during the same period should be taken into account. The more scholars there are in a period, the more academic papers are published in that period, so it is easier to accumulate citations but more difficult to compete for an academic achievement award. Therefore, it is important to consider the academic environment when evaluating the achievements of scholars. We name this phenomenon competition. To explain and validate our theory, we use the well-known American Physical Society (APS) dataset (Sinatra et al., 2016). To verify that our method can determine a scholar’s academic achievement more reasonably, in this paper we use the Nobel Prize in Physics, the Wolf Prize, the Enrico Fermi Prize and the Max Planck Medal as benchmark datasets.

This paper is composed of the following parts. In Section 2, we briefly describe the APS dataset we used, define some mathematical symbols and introduce four metrics for measuring experimental results. In Section 3, we first perform an analysis on the APS dataset to introduce the competition problem in evaluating scholars. Then we explain in detail the existence and importance of the competition. Finally, we propose a new method based on the above analysis. In Section 4, we illustrate the effectiveness of our method by analysing the results of experiments. In Section 5, we discuss the role of the parameter in our method. Section 6 is the conclusion of this paper.

2. Data, symbols and metrics

In this paper, we analyze the publication data from all journals of APS. The data contains 482,566 papers, ranging from year 1893 to year 2010. For the sake of author name disambiguation, we use the author name dataset provided by Sinatra et al., which is obtained with a comprehensive disambiguation process in the APS dataset (Sinatra et al., 2016). Eventually, a total number of 236,884 distinct authors are matched. We use the Nobel Prize in Physics as the main validation dataset for this paper. The Nobel Prize dataset includes all Nobel Prize laureates from year 1900 to year 2015, and at the same time we ensure that these Nobel Prize laureates have published at least one paper in the APS dataset. Eventually, a total number of 158 distinct Nobel Prize laureates are matched. Meanwhile, we also provide the results of the Wolf Prize, the Enrico Fermi Prize, and the Max Planck Medal. It should be noted that in the following analysis, we mainly show the results on the Nobel Prize dataset (the results for the other three awards are shown in a table at the end). There are three reasons:

• The Nobel Prize in Physics has the highest authority in the field of physics, which means that a laureate has a great influence on the progress of science and technology.

• The Nobel Prize in Physics has a long history, and the laureates are more widely distributed in time. This is more suitable for our research, that is, how to compare the achievement of scholars fairly over a long time span.

• The performances of the Nobel Prize laureates are uneven. Some laureates may have over 10,000 citations, while some of them have fewer than 1,000, which also makes it difficult to determine the laureates of the Nobel Prize (Gingras & Wallace, 2009). Therefore, using the Nobel Prize in Physics as our benchmark set for evaluating academic achievements has a certain degree of novelty.

Table 1
A simple and concrete example for calculating metrics

Name       Original score  Original ranking  New score  New ranking
a          10              1                 2          6
b          9               2                 6          2
c          8               3                 4          4.5
d          7               4                 7          1
e          6               5                 5          3
f          5               6                 4          4.5
g          4               7                 1          7
Precision                  0.33                         0.67
AR                         4                            2.5
RIR                                                     0.67

We first give some definitions of important symbols here. All lowercase letters indicate the attributes of a paper, and all uppercase letters indicate the attributes of an author. The time at which paper $i$ was published is $t_i$. The debut time of author $i$ is $T_i$, which means the time when scholar $i$ published his/her first paper in the APS dataset. $N_i$ represents author $i$'s collection of all papers in the APS dataset, and we use $|N_i|$ to indicate the number of all his/her papers. We use $c_i$ to indicate the total number of citations for paper $i$, and use $C_i$ to represent the total number of citations of author $i$ (it should be noted that our citation relationship is limited to the APS dataset we used), so we have $C_i = \sum_{j \in N_i} c_j$. In this paper we use the second subscript to indicate the component of the indicator in a certain year. For example, $N_{it}$ represents the number of papers published by author $i$ in year $t$, so we have $|N_i| = \sum_t N_{it}$. Similarly, $C_{it}$ represents the number of citations author $i$ received in year $t$, so we also have $C_i = \sum_t C_{it}$. In addition, we use $R_t$ to indicate the total number of references for the year $t$, that is, the sum of the number of references of the papers published in year $t$.
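To make the notation concrete, the following minimal Python sketch computes $C_{it}$, $C_i$ and $R_t$ from a toy citation list. The record layout and the names `PAPERS` and `yearly_counts` are illustrative assumptions, not the APS data format, and the sketch assumes that a citation is attributed to the year in which the citing paper was published.

```python
from collections import defaultdict

# Hypothetical toy records: paper id -> (publication year, authors, cited paper ids).
PAPERS = {
    "p1": (1960, ["alice"], []),
    "p2": (1961, ["bob"], ["p1"]),
    "p3": (1961, ["alice", "bob"], ["p1", "p2"]),
    "p4": (1962, ["carol"], ["p1", "p3"]),
}


def yearly_counts(papers):
    """Return (C_it, R_t): citations per author per year, references per year."""
    c_it = defaultdict(lambda: defaultdict(int))  # author -> year -> citations
    r_t = defaultdict(int)                        # year -> total references
    for year, _authors, refs in papers.values():
        r_t[year] += len(refs)
        for cited in refs:
            # Every author of the cited paper receives one citation in `year`.
            for author in papers[cited][1]:
                c_it[author][year] += 1
    return c_it, r_t


c_it, r_t = yearly_counts(PAPERS)
# C_i is the sum of C_it over all years t.
c_i = {author: sum(per_year.values()) for author, per_year in c_it.items()}
```

Each co-author receives the full citation count of a paper here, which matches the definition $C_i = \sum_{j \in N_i} c_j$ given above.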

In this paper, we aim to identify, among all scholars, those who are eligible for the Nobel Prize and compare the results with the real Nobel Prize dataset. In order to quantitatively compare the identification performance of different methods, we use the following four metrics:

• Precision. $\mathrm{Precision} = \frac{|q \cap p|}{|q|}$. Precision is widely used in identification tasks (Lü & Zhou, 2011); $q$ is the set of real laureates and $p$ is the set predicted by the algorithm, where $|q| = |p|$. We call $q \cap p$ the identified laureates and the complementary set $q \setminus p$ the unidentified laureates. This index is the most straightforward way to measure the identification result. However, it cannot accurately measure algorithms when the size of the test dataset is small, as is the case for the Nobel Prize dataset we used in this paper, so we provide a second measurement.

• Average ranking (AR). In order to solve the low-resolution problem, we use the average ranking. This metric is also used in Radicchi et al. (2009) to measure the quality of their identification method. Given an evaluation index, after we have the rankings for all scholars according to their scores calculated by the index, we average the rankings of all real laureates. It should be noted that when some scholars get the same score, the rankings of these scholars are all replaced by the mean value of their rankings. Unlike precision, AR considers the ranking of all laureates, so it is not affected by the size of the test dataset. However, when we use it, we find that this metric also has a problem. When the average ranking result of an algorithm is particularly high, there may be two reasons. It may be because the algorithm has a better overall performance, or it may be because the ranking of a few data points increases sharply, resulting in an increase in the average ranking. In order to avoid this kind of misjudgment, we introduce another metric, RIR.

• Ranking increase rate (RIR). $\mathrm{RIR} = \frac{1}{N} \sum_i \delta(i)$, where $N$ is the number of real laureates. When comparing the rankings of two ranking methods, beside the average ranking, we can use RIR to measure how many laureates (Nobel laureates in our case) are improved in ranking. $\delta$ is a comparing function, abbreviated here to $\delta(i)$, representing whether the new method improves the ranking of scholar $i$. If the ranking is higher than under the original method, $\delta(i)$ is equal to 1, otherwise it is 0. It can be seen that RIR does not care about the degree of optimization of the ranking, but focuses on the scope of optimization.

• Top-n%. Finally, in order to enrich the results, we use the Top-n% metric to further analyze the results. This metric was also mentioned and used in Radicchi et al. (2009). n represents the ranking threshold. In this paper, we use four levels of n: 0∼0.1%, 0.1%∼1%, 1%∼10%, 10%∼100%. Then in each interval we count how many real laureates are included. The more real laureates are assigned to the higher-ranking intervals, the better the identification result.
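The Top-n% metric can be sketched as a simple binning of laureate ranks by percentile. The function name `top_n_percent_bins` is hypothetical, the choice to treat each upper bound as inclusive is an assumption, and the ranks in the usage line are illustrative values rather than results.

```python
def top_n_percent_bins(laureate_ranks, total_scholars):
    """Count laureates per ranking-percentile interval (rank 1 is the best)."""
    edges = [0.001, 0.01, 0.1, 1.0]  # upper bounds of the four intervals
    labels = ["0-0.1%", "0.1%-1%", "1%-10%", "10%-100%"]
    counts = dict.fromkeys(labels, 0)
    for rank in laureate_ranks:
        fraction = rank / total_scholars
        # Assign the laureate to the first interval whose upper bound covers it.
        for edge, label in zip(edges, labels):
            if fraction <= edge:
                counts[label] += 1
                break
    return counts


# Illustrative ranks among the 236,884 authors of the dataset.
bins = top_n_percent_bins([72, 145, 453, 10141, 100000], total_scholars=236884)
```

A better ranking method moves laureates from the later intervals into the earlier ones.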

A concrete example of calculating precision, AR and RIR is given in Table 1. Here we have seven scholars, named ‘a’ to ‘g’. Their original scores and original rankings are calculated by the original method, and their new scores and new rankings are calculated by the new method. We assume that ‘b’, ‘d’ and ‘f’ are real laureates. Using the original method for prediction, the top three scholars obtained are ‘a’, ‘b’ and ‘c’, so the precision is 0.33, while the prediction results of the new method are ‘d’, ‘b’ and ‘e’, and the precision is 0.67. Based on the original method, the average ranking (AR) is 4. With the new method, the AR is improved to 2.5. Finally, both ‘d’ and ‘f’ are improved in their rankings, so the RIR is equal to 0.67.
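The Table 1 example can be reproduced with a short Python sketch. The function names are hypothetical, and the way ties at the precision cutoff are broken (by sort order) is an assumption that does not matter for this example.

```python
def average_ranks(scores):
    """Rank scholars by descending score; tied scholars share the mean rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    pos = 0
    while pos < len(ordered):
        end = pos
        # Extend the block while the next scholar has the same score.
        while end + 1 < len(ordered) and scores[ordered[end + 1]] == scores[ordered[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # mean of positions pos+1 .. end+1
        for name in ordered[pos:end + 1]:
            ranks[name] = mean_rank
        pos = end + 1
    return ranks


def precision(scores, laureates):
    """Share of real laureates among the top-|laureates| scholars."""
    top = sorted(scores, key=scores.get, reverse=True)[:len(laureates)]
    return len(set(top) & laureates) / len(laureates)


def average_ranking(ranks, laureates):
    """Mean rank of the real laureates (smaller is better)."""
    return sum(ranks[name] for name in laureates) / len(laureates)


def ranking_increase_rate(old_ranks, new_ranks, laureates):
    """Fraction of laureates whose rank number became strictly smaller."""
    return sum(new_ranks[n] < old_ranks[n] for n in laureates) / len(laureates)


original = {"a": 10, "b": 9, "c": 8, "d": 7, "e": 6, "f": 5, "g": 4}
new = {"a": 2, "b": 6, "c": 4, "d": 7, "e": 5, "f": 4, "g": 1}
laureates = {"b", "d", "f"}

old_ranks, new_ranks = average_ranks(original), average_ranks(new)
```

With these inputs, ‘c’ and ‘f’ share the new rank 4.5, as in Table 1.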


Fig. 1. Statistical results on the APS dataset. (a) Scatter plot of the career start year of researchers versus their total citations. (b) Scatter plot of the career start year of researchers versus their rescaled citation count. (c) Histogram of top-0.1% scholars ranked by rescaled citation count and citation count. Here we assign all top-0.1% scholars to decade periods instead of single years. (d) The number of new authors (or new papers) in each year for the APS data. The y-axis is the logarithm of the quantity. We can see that the number of scholars and publications increases exponentially over time.

3. Model

In this section, we propose our new method for identifying prize-winning scientists by considering competition in scientist ranking. We start with the basic method, citation count (CC), which can be computed by directly counting the total citations of all scholars in the APS dataset, as shown in Fig. 1(a), where each dot represents a scholar. We can see that the resulting distribution of CC can be divided into a rising phase and a decreasing phase (except for the obvious gap caused by World War II around 1945). These two phases are caused by competition and preferential attachment respectively.

As shown in Fig. 1(d), the academic population increases over time, which directly leads to higher citations of scholars, as shown in Fig. 1(a). This makes the distribution of top scholars seriously uneven over time. As the blue histogram in Fig. 1(c) shows, up to 1980 the number of top-0.1% scholars is positively correlated with time. After the peak around 1980, scholars’ citation counts and the number of top scholars decrease sharply year by year. This trend is mainly caused by preferential attachment. Many methods have been proposed for correcting the bias of the preferential attachment problem, such as rescaled citation count (Newman, 2009), rescaled PageRank (Mariani, Medo, & Zhang, 2016), citation rank (Walker, Xie, Yan, & Maslov, 2007), etc. Taking rescaled PageRank (Mariani et al., 2016) as an example, this effect can be eliminated by only comparing a scholar with his/her peers, and the result can be seen in Fig. 1(b) and the orange histogram in Fig. 1(c). After using the rescaled method to eliminate the effect of preferential attachment, the downward trend is gone, leaving only a clear upward trend. This means that, although we have solved the preferential attachment problem, the effect of competition still exists. Scholars from different time periods still cannot be compared on an equal footing. Moreover, the distributions of three other popular indicators have the same shape as that of CC; the results can be seen in Fig. 3. This means that those three indicators do not consider competition either.

Competition can be illustrated with a simple example. The number of scholars who debuted in year 1970 is 2,127, the average number of citations of scholars in that year is 111.1, and the standard deviation of citations is 295. According to the 3σ principle, there are 48 outstanding scholars (that is, their citations are greater than the mean value plus 3 times the standard deviation). In 1910, only 27 scholars debuted, and only one person can be considered an outstanding scholar, with a total of 1,003 citations, even fewer than the minimum of the 48 scholars from 1970. Therefore, if the Nobel Prize used citations as its criterion, all scholars from 1910 would have no chance to be nominated. In reality, for the outstanding scholars in 1970 it was 48 times more difficult to win the prize than for those in 1910, because of the competition between a large number of peers from 1970. This phenomenon was also observed by Gingras and Wallace (2009).
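The 3σ criterion used in this example can be sketched as follows. The function name `outstanding` and the toy cohort are hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption; only the mean and standard deviation of the 1970 cohort are taken from the text.

```python
from statistics import mean, pstdev


def outstanding(citations, k=3.0):
    """Return the citation counts exceeding mean + k * standard deviation."""
    threshold = mean(citations) + k * pstdev(citations)
    return [c for c in citations if c > threshold]


# Threshold implied by the 1970 cohort statistics quoted in the text:
# mean 111.1 and standard deviation 295, so about 996.1 citations.
threshold_1970 = 111.1 + 3 * 295

# Hypothetical toy cohort: thirty modestly cited scholars and one outlier.
toy_cohort = [10] * 30 + [1000]
toy_outstanding = outstanding(toy_cohort)
```

Under these statistics, a 1970 debutant needs roughly 996 citations to count as outstanding within his or her own cohort.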

In summary, with the academic environment continuously changing, traditional static academic indicators are ineffective in identifying scholars’ achievements (Fire & Guestrin, 2019), because they usually ignore the competition effect. Therefore, we need to design a new method for evaluating the overall achievement of scholars, which can tackle the competition problem.

Fig. 2. Illustration of the competition effect when evaluating the achievement of two scholars. We assume there are two scholars i and j. Until 2010, they both have 100 citations, but i received these citations in 1960 and j received these citations in 2000. If we evaluate the two scholars from the perspective of competition, scholar i is more outstanding than scholar j, as it is more difficult to obtain 100 citations in 1960.

We use Fig. 2 to explain our idea. As shown in the figure, we assume that both scholar i and scholar j debuted in year 1940, and each has a total of 100 citations until 2010. The 100 citations of scholar i came from year 1960, and the citations of scholar j came from year 2000. According to the APS dataset, the total number of references from all papers published in year 1960 was 10,727, while the total number of references in year 2000 was 170,139. So getting 100 citations in 2000 is much easier than in year 1960. In order to account for this difference, we propose the following formula.

$$C_i^* = \sum_t \frac{C_{it}}{(R_t)^{\alpha}} \qquad (1)$$

where $C_i^*$ is our new index. In order to calculate $C_i^*$, we consider the author’s citations in each single year, $C_{it}$. According to the above analysis, the achievements of a scholar in year $t$ do not depend on the absolute citations received in this year, but on a relative value. In order to increase the generality of the method, we add an adjustable parameter α. A larger α indicates that our method gives more weight to competition. Naturally, when α = 0, Eqn (1) degenerates into citation count. Therefore, we can regard CC as a specific case of our method. We can also see that if we use traditional methods, such as CC, H-index (assuming that both scholars have only one paper at the same time) or the rescaled method mentioned above, it is impossible to distinguish scholar i and scholar j, because they all ignore the competition effect.
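As a minimal sketch of Eqn (1), the following Python snippet evaluates the two scholars of the Fig. 2 example for several values of α. The function name `competition_aware_score` and the dictionary layout are illustrative assumptions; the reference totals for 1960 and 2000 are the ones quoted above.

```python
def competition_aware_score(yearly_values, yearly_totals, alpha):
    """Eqn (1): sum over years t of C_it / (R_t ** alpha)."""
    return sum(value / yearly_totals[year] ** alpha
               for year, value in yearly_values.items())


R_t = {1960: 10_727, 2000: 170_139}  # reference totals quoted in the text
scholar_i = {1960: 100}              # all 100 citations received in 1960
scholar_j = {2000: 100}              # all 100 citations received in 2000

for alpha in (0.0, 0.64, 1.0):
    print(alpha,
          competition_aware_score(scholar_i, R_t, alpha),
          competition_aware_score(scholar_j, R_t, alpha))
```

At α = 0 both scholars score exactly 100, which is the plain citation count; for any α > 0 scholar i scores higher than scholar j, because the 1960 citations are divided by a much smaller reference total.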

4. Results

In this section, we mainly present the experiment results on the Nobel Prize, because of the three advantages mentioned above. Meanwhile, to enrich the results, we also briefly show the results of three other prizes at the end of this section.

4.1. Methods

First, we introduce all the methods we used for comparison. As mentioned above, a plethora of quantitative metrics exist and could be compared in principle, but our goal is to show that our methods can solve the competition effect problem effectively. Therefore, in this paper our focus is narrowed to traditional statistical methods. We first briefly introduce four well-known traditional metrics:

• Citation count (CC). CC is a basic academic statistical indicator, which straightforwardly reflects the attention received by scholars’ scientific work.

• Paper count (PC). That is, up to a certain year, how many publications the scholar has. This indicator is also called academic productivity; it can measure scholars’ academic activity and productivity, but ignores the quality of their publications.

• H-index (Hirsch, 2005). Currently, the H-index is a mainstream evaluation method for scholars, which combines academic productivity and academic quality to evaluate a scholar comprehensively. The H-index treats all papers of a scholar equally, which ignores the importance of high-quality papers, so it is not fair for evaluating the scientific performance of scholars with a few highly cited papers. To overcome this, the G-index was proposed.

• G-index. Egghe (2006) proposed the G-index, which is defined as the largest number $g$ such that the $g$ most cited publications together have at least $g^2$ citations. So the G-index can give more weight to papers with high citations than the H-index.
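The two index definitions above can be sketched in a few lines of Python. The function names are hypothetical, the citation list is a toy example, and capping the G-index at the number of papers is an assumption (some variants allow larger values).

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g**2 citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for position, cites in enumerate(ranked, start=1):
        total += cites
        if total >= position ** 2:
            g = position
    return g


# Toy scholar with one highly cited paper and several weakly cited ones.
toy_citations = [30, 4, 3, 2, 1, 0]
toy_h, toy_g = h_index(toy_citations), g_index(toy_citations)
```

For this toy scholar the H-index is 3 while the G-index is 6, which illustrates how the G-index rewards a single highly cited paper.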

Then we introduce our two methods:


Fig. 3. Scatter plot of four basic indicators versus the career start year of researchers (a: citation count, b: paper count, c: H-index, d: G-index). In each subgraph, each point represents a scholar. The red dots and the green dots represent the Nobel Prize laureates. The red dots represent the scholars that are correctly identified as Nobel Prize laureates, while the green dots represent the failed cases. All black dots are scholars who have not won the Nobel Prize.

• Competition-aware citation count (CCC), see formula (1). As a variant of CC, the results of CCC are mainly compared with CC. Actually, Eqn (1) can be generalized into a framework, that is, we can use any academic indicator to replace CC, then split it by year, and divide by the corresponding competition factor, namely the total references in one year. Therefore, in order to further illustrate the effectiveness of our method, we also apply it to the number of papers (which is called the academic productivity of scholars) as an additional example.

• Competition-aware paper count (CPC). It is natural to consider that the competition factor for the academic productivity of scholars is the total number of papers published in a given year. We thus propose a competition-aware paper count method, mathematically expressed as Eqn (2). We can see that when α = 0, $P_i^*$ degenerates into the pure academic productivity of a scholar. Therefore, Eqn (2) is an improved version of the scholar’s academic productivity. If CPC works well, it not only supports the existence of the competition effect and the effectiveness of our method, but also exhibits the robustness of our method. After removing a lot of useful information (without information on the citations of papers, compared with CC and PC), our method can still have high accuracy, which means any academic statistical indicator can be improved by introducing the competition concept. As a variant of PC, the CPC results are mainly compared with PC below.

$$P_i^* = \sum_t \frac{P_{it}}{(N_t)^{\alpha}} \qquad (2)$$

4.2. Analysis results

As shown in Fig. 3, we present the results of four classical methods (CC, PC, H-index and G-index). First, Fig. 3 gives us an intuitive feeling about the distribution of all scholars and the ranking positions of all real laureates. It can be clearly seen that although the Nobel Prize laureates are generally recognized as the scholars with the largest contributions, this does not mean that their indicators are always the best. Actually, many Nobel Prize laureates are at very low positions. Of course, this may be caused by the limited dataset and incomplete information, but more importantly, this means that determining Nobel Prize laureates is a very difficult task, and the precision will not be very high. Then, as we mentioned above, the distribution trends of the four indicators are basically the same, and all of them show serious biases in time. This means that these static indicators can no longer evaluate academic achievements fairly, due to the fact that the academic environment changes continuously (Fire & Guestrin, 2019). The detailed simulation results of the four methods are shown in Table 2. From the perspective of precision and average ranking, we can see that the G-index performs slightly better than the H-index, and the CC method is the best among them. Unsurprisingly, the paper count is the worst index, because PC only considers productivity but ignores quality.

Table 2
Quantitative results for all awards. For the sake of fairness when comparing all awards, we set α = 0.64 for the CCC method (experimental results show that when α = 0.64, CCC has the best prediction precision on the Nobel Prize dataset; in order to compare performance without loss of generality, we fix this parameter on the three other datasets).

Prize               CCC                       CC                 H-index            G-index            PC
                    Precision  AR      RIR    Precision  AR      Precision  AR      Precision  AR      Precision  AR
Nobel Prize         0.222      21,370  0.717  0.139      30,617  0.095      33,761  0.114      32,440  0.013      35,921
Enrico Fermi Prize  0.120      28,145  0.820  0.000      44,736  0.000      47,582  0.020      45,762  0.020      46,160
Max Planck Medal    0.069      20,768  0.724  0.034      37,737  0.017      48,652  0.017      46,287  0.017      53,448
Wolf Prize          0.100      15,917  0.560  0.060      18,108  0.060      23,062  0.060      24,505  0.020      29,953

Fig. 4. Dependence of the results on the parameter α. Subgraphs a, b and c are the results of our CCC method, and subgraphs d, e and f are the results of the CPC method. The left, middle, and right columns show precision, average ranking, and ranking increase rate respectively. The best precision of CCC is 0.222, and the corresponding α is 0.64. The best precision for CPC is 0.146, with α equal to 0.72. As α increases, the average rankings of the two methods increase monotonically, while both ranking increase rates show monotonically decreasing trends.

The results of the two new methods, CCC and CPC, are shown in Fig. 4. It should be noted that since our new methods contain a tunable parameter α, in our experiments we vary α from 0 to 1 in steps of 0.01. The result of CCC is shown in the first row of Fig. 4. First, from the perspective of precision, compared with the original CC method (α = 0), most of the α values improve the identification precision (except for a small range around 0.3), and the optimal precision 0.222 is reached when α is set to 0.64. As we can see, the average ranking shows a monotonically increasing pattern as α increases. However, this does not mean that the greater α is, the better the result will be. As we can see from the RIR curve, it shows a monotonically decreasing trend. This means that when we increase α, the rise in average rankings is due to a small number of Nobel laureates moving up significantly, not an overall optimization. This phenomenon also indicates that AR by itself is not a reliable metric for identifying prize-winning scientists. To sum up, our CCC method shows a good performance, which indicates the advantage of introducing the competition mechanism into the author ranking task.
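The parameter scan described above (α from 0 to 1 in steps of 0.01) can be sketched as follows. All names and the three toy scholars are hypothetical; only the two reference totals come from the text, and the sketch uses precision as the selection criterion.

```python
def ccc(yearly_citations, refs_per_year, alpha):
    """Competition-aware citation count of one scholar, Eqn (1)."""
    return sum(c / refs_per_year[t] ** alpha for t, c in yearly_citations.items())


def precision_at_alpha(scholars, refs_per_year, laureates, alpha):
    """Precision of the top-|laureates| scholars ranked by CCC at a given alpha."""
    scores = {name: ccc(cit, refs_per_year, alpha) for name, cit in scholars.items()}
    top = sorted(scores, key=scores.get, reverse=True)[:len(laureates)]
    return len(set(top) & laureates) / len(laureates)


def sweep_alpha(scholars, refs_per_year, laureates, steps=100):
    """Evaluate alpha = 0, 0.01, ..., 1 and return (best_alpha, best_precision)."""
    results = [(k / steps, precision_at_alpha(scholars, refs_per_year, laureates, k / steps))
               for k in range(steps + 1)]
    return max(results, key=lambda pair: pair[1])  # first maximum wins on ties


# Hypothetical toy data: one early laureate and two later, more cited scholars.
refs = {1960: 10_727, 2000: 170_139}
toy_scholars = {"early": {1960: 100}, "late_a": {2000: 300}, "late_b": {2000: 200}}
best_alpha, best_precision = sweep_alpha(toy_scholars, refs, {"early"})
```

In this toy setting plain citation count (α = 0) misses the early laureate, and the scan returns the smallest α on the grid at which the laureate reaches the top position.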

As for CPC, it can also demonstrate the robustness of our method. It is well known that there is a positive correlation between CC and PC (Bayer & Folger, 1966). Compared to CC, PC only accounts for the quantity of a scholar’s papers and carries no quality information. If our method still performs well on PC, it indicates that our method is robust against low-quality data. The results of CPC are shown in the second row of Fig. 4. Similar to the results of CCC, the precision of CPC increases first and then decreases. We can see that the RIR value stays above 70% all the time, which means that our method can effectively improve the rank of most Nobel Prize laureates. It should be noted that there is a gap at the left side of the two AR curves in Fig. 4(e). This is because ranking scholars by PC may make many scholars indistinguishable. For example, there are 892 scholars with exactly 20 papers in our dataset. However, after using our new indicator, these scholars’ scores become completely different and the rankings of laureates become better. It is worth noting that there is a great improvement of precision in Fig. 4(d), in which the optimal value of the precision is 0.146 (when α = 0.72). This precision is about 11 times higher than that obtained by using productivity to identify Nobel laureates (when α = 0, precision = 0.0127), and it is even better than the precision of CC (precision = 0.139). This result shows that after considering the competition effect, productivity alone can achieve an identification performance similar to that of citation count.

Fig. 5. Comparison of the ranking results of different methods. It should be noted that we take the logarithm of all rankings. (Left) The x-axis is the ranking of a scholar calculated by CCC (α = 0.64), and the y-axis is the ranking of a scholar by CC. (Right) The x-axis is the ranking of a scholar calculated by CPC (α = 0.72), and the y-axis is the ranking of a scholar by PC. In these two figures, each dot represents a scholar. The red markers and blue dots indicate the Nobel Prize laureates and other scholars respectively. Black solid lines are y = x, indicating that a dot on the line has the same ranking under the two methods. The black dashed lines indicate the boundaries of identification, that is x = ln 158 and y = ln 158, so the left side of the vertical dashed line indicates the identification result of our new indicator, and the lower side of the horizontal dashed line indicates the identification result of the traditional method.

In order to compare CC and CCC from a more intuitive perspective, we use the ranking results of the two indicators to make a scatter plot, as shown in Fig. 5(a). If a scholar is above the black diagonal line, it means that his/her CCC ranking is better than his/her CC ranking, and the farther the dot deviates from the line, the more his/her ranking is improved by CCC. As can be seen from Fig. 5, the majority of Nobel Prize laureates (red dots) are above the diagonal line, indicating that CCC has improved the ranking of most Nobel Prize laureates, which is the result of overall improvement rather than local improvement. Although some of the Nobel laureates (red dots) are below the black solid line, their deviations are much smaller than those of the dots above it. That is because the scholars whose rankings are not promoted do not drop significantly, and they largely retain their original rankings. Take two scholars marked in the figure as examples: ‘Arthur H. Compton’ and ‘Carl E. Wieman’. Among all the Nobel Prize laureates identified by using CCC, ‘Arthur H. Compton’ has the lowest CC ranking; his CCC ranking and CC ranking are 72 and 10,141 respectively. This is a significant improvement in the ranking, which lifts ‘Arthur H. Compton’ into the top-1% group. Meanwhile, among all the Nobel Prize laureates identified by using CC, ‘Carl E. Wieman’ has the lowest CCC ranking; his CCC ranking and CC ranking are 453 and 145 respectively. Although there is a drop in his ranking, he still remains in the top-1% group.

In addition, the figure is divided into four regions by the two dashed lines. The lower left area is the common identification area of the two methods, that is, the scholars identified by both methods. The lower right area and the upper left area represent the additional identification areas of CC and CCC respectively. We can see that CCC retains most of the laureates identified by CC (only five red dots are located in the lower right area). As shown in the figure, taking ‘Steven Weinberg’ and ‘Philip W. Anderson’ as two examples, no matter which method is used, these two scholars are always ranked very high. At the same time, in the upper left corner, we can see that CCC successfully identifies another 18 Nobel Prize laureates. This shows that CCC is a successful extension of CC. The results of CPC and PC are shown in the right panel of Fig. 5. As in the left panel, most of the Nobel Prize laureates (red dots) are located above the black solid line, and their deviations are much larger than those of the red dots below it (take the four scholars marked in the figure as examples). There are no Nobel Prize laureates in the common identification area of PC and CPC, mainly because of the poor performance of the original PC method. In other words, the new method does not inherit any useful results from PC.
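The four regions of Fig. 5 amount to checking, for each laureate, whether each method ranks him or her within the identification cutoff (158 for the Nobel Prize). A minimal sketch, with a hypothetical function name and using only the two pairs of ranks quoted above:

```python
def identification_regions(rank_new, rank_old, laureates, cutoff):
    """Split laureates by whether each method ranks them within the top `cutoff`."""
    regions = {"both": [], "new_only": [], "old_only": [], "neither": []}
    for name in sorted(laureates):
        by_new = rank_new[name] <= cutoff
        by_old = rank_old[name] <= cutoff
        if by_new and by_old:
            regions["both"].append(name)      # lower left area
        elif by_new:
            regions["new_only"].append(name)  # upper left area
        elif by_old:
            regions["old_only"].append(name)  # lower right area
        else:
            regions["neither"].append(name)   # upper right area
    return regions


# Ranks quoted in the text for the two marked laureates.
ccc_rank = {"Arthur H. Compton": 72, "Carl E. Wieman": 453}
cc_rank = {"Arthur H. Compton": 10141, "Carl E. Wieman": 145}
regions = identification_regions(ccc_rank, cc_rank, set(ccc_rank), cutoff=158)
```

Here Compton falls in the area identified only by CCC, while Wieman falls in the area identified only by CC.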

Finally, we use the Top-n% metric to compare the results of various methods. In order to enrich the results, we also use three other famous physics awards, the Wolf Prize (50 laureates), the Enrico Fermi Prize (50 laureates) and the Max Planck Medal (58 laureates). Compared with the Nobel Prize, these three prizes have a smaller number of laureates, and the distribution of the winning time is not as wide as that of the Nobel Prize. Here we do not plot these three prizes separately, but put all the awards together (specific results can be found in Table 2). In total we have 272 laureates from four awards, some of whom may have won multiple awards. In the top-n% metric, we expect that the better the algorithm performs, the more awarded authors will be found in the top rank percentages. As we can see from Fig. 6, our competition-aware method always performs better. The two categories ‘below 0.1%’ and ‘0.1%-1%’ always contain more laureates compared to other methods. Meanwhile, ‘1%-10%’ and ‘above 10%’ always have fewer laureates. This means that our method identifies most of the award-winning authors by placing them at relatively high ranks. We also use a table to show more details of performance on the four awards, see Table 2, where α is set to 0.64. From the results, we can see that our method performs well on all four datasets, and the parameter α seems to be universal (although 0.64 may not be the best choice for a specific award, it still works better than the other methods).

4.3. The role of α

Fig. 6. The top-n% measurement on 4 awards in physics (the Nobel Prize, the Max Planck Medal, the Wolf Prize, and the Enrico Fermi Prize). We fix α = 0.64 for CCC. The performance of a scholar depends on the percentile of the scholar’s ranking. The lower this percentage is, the better the performance of the considered scientist is. The height of the bar indicates the proportion of laureates. The left figure only considers the Nobel Prize, while the right figure takes into account all four prizes. Whether we consider the Nobel Prize alone or all prizes, our competition-aware method performs better than the others.

Fig. 7. We use the Kolmogorov-Smirnov test (see main text for explanation) to explain the role of parameter α. The figure on the left side is the curve of KS scores as a function of the parameter α, and the small inset graph is the curve of p-values. The three subgraphs on the right side are the comparisons between the distribution of the identification results under three different parameter values (α = 0, 0.64, 1) and the distribution of the real Nobel Prize laureates. Black bars represent the distribution of the Nobel Prize laureates and red bars represent the distribution of our results.

When we introduced Eqn (1) in Section 3, we regarded α as the parameter that adjusts the degree of attention paid to competition. A larger α indicates that a stronger effect of competition is considered, which is the most intuitive interpretation of α. At the same time, according to the results in Section 4.2, we find that although most values of α improve the identification performance, there is always a best choice of α. Why is there always a best result after adjusting α to a specific value, and how does α affect the identification results? In this section, we aim to illustrate the effect of α.

α actually helps our method find the output distribution that is closest to the distribution of real Nobel Prize laureates. To illustrate this point, we use the Kolmogorov-Smirnov test (KS test) (Hodges, 1958). This is a nonparametric test of the equality of continuous (or discontinuous), one-dimensional probability distributions that can be used to compare two samples. The larger the KS score, the larger the distance between the two distributions; the smaller the score, the more similar the two distributions are. The KS test also yields a p-value for assessing the significance of the result. The experimental procedure is the same as in Section 4.2: α is varied from 0 to 1 in steps of 0.01, and for each value we compute the distance between the distribution of the identification set and the distribution of the real Nobel Prize set.
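The KS score used here is the largest gap between the empirical cumulative distribution functions of two samples. The sketch below computes only that statistic, not the p-value. The year values are hypothetical toy data, and the variable names are chosen here for illustration. In practice a library routine such as `scipy.stats.ks_2samp` returns both the statistic and the p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions that only jump at observed
    # values, so it is enough to evaluate the gap at every observed value.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical years: real laureates versus the scholars identified
# by a ranking at two different values of alpha.
laureate_years = [1920, 1935, 1950, 1965, 1980]
identified_alpha_0 = [1968, 1970, 1971, 1972, 1975]     # clustered near 1970
identified_alpha_064 = [1925, 1940, 1955, 1970, 1978]   # spread more widely

print(ks_statistic(laureate_years, identified_alpha_0))
print(ks_statistic(laureate_years, identified_alpha_064))
```

In this toy case the clustered sample is much farther from the laureate sample than the spread-out one, which is the kind of difference the α sweep is meant to expose.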

The results are shown in Fig. 7. As α changes, the KS score first decreases and then increases sharply. The minimum distance appears around α = 0.6, and a smaller score means that the distribution of our identification matches the distribution of the real Nobel Prize laureates better. The three histograms on the right side of Fig. 7 show the distributions of our results under three special cases of α together with the distribution of the real Nobel Prize laureates. In the first histogram we set α = 0, so the red bars show the distribution of the CC result. The identification tends to concentrate around 1970. This is easy to understand: as analysed in Fig. 3 in Section 4, the distributions of the four traditional indicators are basically the same, with peaks around 1970. Therefore, the top scholars selected by traditional indicators always cluster near 1970. When α = 0.64, the identification precision reaches its optimal value (0.222). As the second histogram shows, the red bars match the black bars significantly better than in the first case, and the distribution of the red bars becomes wider. It is particularly noteworthy that CC places almost no top scholars between 1900 and 1940, and this situation improves markedly after increasing α to 0.64. The third histogram shows the results for α = 1.0. Here the algorithm pays too much attention to competition, and the predicted distribution concentrates mainly between 1900 and 1940. The KS score curve also shows that when competition plays too large a role in our method while the citations are ignored, the KS score increases rapidly to 0.67.

Table 3
Logistic regression results

Strategy  Metric     CV-1   CV-2   CV-3   CV-4   Average
A         AR         8,242  9,829  6,405  6,105  7,646
A         F1-score   0.016  0.015  0.016  0.016  0.016
A         Precision  0.225  0.200  0.100  0.051  0.144
B         AR         7,012  7,651  5,840  5,061  6,391
B         RIR        0.525  0.650  0.625  0.550  0.588
B         F1-score   0.026  0.022  0.026  0.021  0.024
B         Precision  0.250  0.250  0.150  0.077  0.182
C         AR         5,840  5,356  4,305  4,580  5,020
C         RIR        0.725  0.825  0.725  0.667  0.735
C         F1-score   0.030  0.026  0.031  0.030  0.029
C         Precision  0.275  0.300  0.175  0.103  0.213
CCC       AR         6,542  6,528  5,013  3,232  5,329
CCC       RIR        0.625  0.825  0.700  0.692  0.711
CCC       Precision  0.250  0.300  0.200  0.053  0.201

In general, the above analysis suggests that the parameter α adjusts the distribution of the identification result. An optimal α ensures that the identification has no obvious bias over time and that the output distribution is closer to the distribution of real laureates.

4.4. Further exploration with machine learning

At the end of the paper, we want to illustrate that our competition-aware method can also be combined with machine learning methods, which deserves more exploration. We take logistic regression (Owen, 2007) as an example. We treat the task of predicting Nobel Prize laureates as a special binary classification task, using label 1 for a Nobel Prize winner and label 0 for a scholar who has not won the Nobel Prize. This prediction task is special because its training dataset is extremely imbalanced: the training set contains only 158 positive samples but about 230,000 negative samples. To address this, during the training phase we use a weighted method that gives greater weight to the positive samples.
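One common way to give greater weight to the rare positive class is to scale each sample's contribution to the log-loss by the inverse frequency of its class. The sketch below assumes this "balanced" heuristic, n / (2 * n_class). The text does not specify the exact weighting scheme, and the function names, learning rate, and toy data are illustrative only.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logreg(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on a class-weighted log-loss.

    Assumed weighting: each sample gets n / (2 * n_class), so the positive
    and negative classes carry the same total weight.
    """
    n, d = len(X), len(X[0])
    n_pos = sum(y)
    n_neg = n - n_pos
    class_w = {1: n / (2.0 * n_pos), 0: n / (2.0 * n_neg)}
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = class_w[yi] * (p - yi)  # weighted gradient of the log-loss
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy imbalanced data: nine negatives and a single positive (hypothetical values).
X = [[0.1], [0.2], [0.15], [0.3], [0.25], [0.05], [0.2], [0.1], [0.35], [2.0]]
y = [0] * 9 + [1]
w, b = fit_weighted_logreg(X, y)
print(predict_proba(w, b, [2.0]), predict_proba(w, b, [0.1]))
```

Without the class weights, a model trained on 158 positives and about 230,000 negatives can reach a low loss by predicting the negative class almost everywhere, which is why some form of reweighting is needed.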

To illustrate the importance of the competition mechanism, we consider three training strategies for the machine learning experiments.

• (A) Citation count. Take the citation count as the only feature of each sample to make predictions.

• (B) Take the citations in each single year as a separate feature, so that each sample is represented by a vector.

• (C) Based on case B, use the concept of competition to normalize all feature vectors. See Eqn (1) for details.
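The three feature constructions can be sketched as follows. Eqn (1) is defined in Section 3 and is not reproduced here, so the normalization in strategy C is an assumption: a scholar's citations in year t are divided by the total citations of that year raised to the power α. This form is consistent with α = 0 recovering the plain citation count. All function names and toy numbers are hypothetical.

```python
def features_a(yearly):
    """Strategy A: the total citation count as the single feature."""
    return [sum(yearly.values())]


def features_b(yearly, years):
    """Strategy B: one feature per year (citations received in that year)."""
    return [yearly.get(t, 0) for t in years]


def features_c(yearly, years, totals, alpha=0.64):
    """Strategy C: strategy B normalized by competition.

    Assumed form: citations in year t divided by (total citations in t) ** alpha.
    """
    return [yearly.get(t, 0) / totals[t] ** alpha for t in years]


# Toy data: citations received by one scholar and by everyone, per year.
years = [2000, 2001, 2002]
totals = {2000: 100, 2001: 200, 2002: 400}
yearly = {2000: 10, 2002: 40}

print(features_a(yearly))
print(features_b(yearly, years))
print(features_c(yearly, years, totals, alpha=1))
```

With α = 1 each feature becomes the scholar's share of that year's citations, so the 10 citations out of 100 in 2000 count as much as the 40 out of 400 in 2002.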

In addition, to compare the prediction results more comprehensively, we use four-fold cross validation and four metrics (AR, RIR, Precision, and F1-score) to measure the performance of the algorithms.
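For orientation, three of these metrics can be sketched as below. The definitions are assumptions based on how the metrics are used in this paper: AR is taken as the average rank of the laureates, RIR as the fraction of laureates whose rank is better than under a baseline method, and Precision as the fraction of the top-k scholars who are laureates. The function names and the toy rankings are hypothetical.

```python
def average_rank(ranks, laureates):
    """AR (assumed definition): mean rank of the laureates; lower is better."""
    return sum(ranks[s] for s in laureates) / len(laureates)


def rank_improvement_ratio(baseline, method, laureates):
    """RIR (assumed definition): share of laureates ranked better than in the baseline."""
    improved = sum(1 for s in laureates if method[s] < baseline[s])
    return improved / len(laureates)


def precision_at_k(ranks, laureates, k):
    """Precision (assumed definition): share of the top-k scholars who are laureates."""
    top_k = {s for s, r in ranks.items() if r <= k}
    return len(top_k & set(laureates)) / k


# Toy rankings of five scholars under a baseline and a new method.
baseline = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
method = {"a": 2, "b": 1, "c": 5, "d": 3, "e": 4}
laureates = ["b", "d"]

print(average_rank(baseline, laureates), average_rank(method, laureates))
print(rank_improvement_ratio(baseline, method, laureates))
print(precision_at_k(method, laureates, k=2))
```

Because RIR compares a method against a baseline, it is not defined for the baseline itself.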

Table 3 shows that strategy C achieves the best performance and strategy A the worst. Compared with strategy A, the training set in strategy B is upgraded so that each sample carries extra information about time. This extra information is very helpful for improving the prediction precision. Compared with strategy B, the prediction precision of the machine learning algorithm is further improved in strategy C after the competition mechanism is applied. In fact, in strategy C the competition mechanism serves as a data normalization process. This procedure is conducive to improving the performance of machine learning methods, which indicates that considering the competition effect is also helpful for designing training sets for machine learning algorithms.

At the end of the table, we also show the prediction results of the simple competition-aware citation count (CCC). CCC performs better than strategy B (average AR 5,329 versus 6,391; average precision 0.201 versus 0.182), which means that our competition mechanism is useful in itself. Strategy C has the highest accuracy, which suggests that machine learning algorithms can outperform traditional methods in ranking or prediction tasks. In summary, all the results show that competition is a crucial factor for identifying prize-winning scientists.


5. Conclusion

In this paper, we mainly discuss how to measure the academic achievements of scholars more reasonably. We introduce the competition mechanism through an analysis of the American Physical Society (APS) dataset. With the rapid development of modern science and technology, the numbers of new scholars and new academic papers are increasing exponentially. This means that scientists face more and more competitors in academia. Therefore, junior scholars may find it increasingly difficult to obtain lifetime achievement awards. It also means that traditional academic indicators may have lost their meaning when evaluating scholars from different periods (Fire & Guestrin, 2019). Compared with senior scholars, junior scholars work in a larger and more active academic environment, so it is easier for them to obtain more citations and attention, but this does not mean that their performance is better. Therefore, if we do not take the competition effect into account, it is difficult to evaluate scholars' achievements fairly.

We then use a series of experiments to support our idea. The results show that after taking the competition factor into account, the predictive power of our new method improves greatly. Taking citation count as an example, the prediction precision increases from 0.139 to 0.222, and the average ranking of all real laureates improves from 30,617 to 21,370. Moreover, according to the RIR indicator, 71.7% of the laureates' rankings are improved by our new method. Our new method performs best among the traditional indicators we have listed for comparison. We also went a step further and used the paper count to show that our method is robust and general. The results show that when the number of papers is the only information available to identify the Nobel Prize laureates, our new method still yields a significant improvement: a precision of 0.146 is reached when competition is taken into account, and the average ranking of all real laureates improves from 35,921 to 18,462. Moreover, according to the RIR indicator, 76.1% of the real laureates' rankings are improved. In summary, the aim of our method is not to replace other ranking techniques, which have been optimized and almost perfected over the course of many years. We want to illustrate the existence of the competition factor and address it in a simple way. Our research has important practical value, such as providing valuable references for academic awards, selecting suitable scholars as faculty, and promoting outstanding academic staff.

There are also numerous meaningful extensions for future research. First, one can try to combine our idea with other methods, such as the diffusion-based models discussed in the introduction. Such methods can use more structural information from the citation network, which will benefit the results. Similarly to the competition-aware citation count, we could adjust the impact of each paper according to the competition mechanism. Taking PageRank as an example, we can give each out-edge of a paper a weight based on its publication year, where the weight is determined by the total number of references or the number of papers in that year. Second, it is well known that physics today has many branches, and the impact of competition differs among them. Therefore, when comparing scholars from different academic domains, it is fairer to distinguish the domains in the analysis. For example, as mentioned by Zeng et al. (2019), one can use community detection (Fortunato & Hric, 2016) to detect different research branches in the citation network. The sizes of those communities can represent the degree of competition. So besides the competition between different periods, we can also calculate the competition inside different academic directions. These two factors together decide the weight of the edges in the citation network. In addition to improving the prediction, one can also analyze the strength of competition in different fields according to the fitting results.

CRediT authorship contribution statement

Yuhao Zhou: Software, Validation, Writing - original draft, Writing - review & editing, Formal analysis. Ruijie Wang: Writing - original draft, Writing - review & editing. An Zeng: Conceptualization, Methodology, Writing - review & editing, Formal analysis, Data curation. Yi-Cheng Zhang: Conceptualization, Supervision.

Acknowledgment

The authors would like to thank Matúš Medo for helpful discussion. This work was supported by the Swiss National Science Foundation (Grant No. 200020-156188).

References

Abrishami, A., & Aliakbary, S. (2019). Predicting citation counts based on deep neural network learning techniques. Journal of Informetrics, 13, 485–499.

Bar-Ilan, J. (2008). Informetrics at the beginning of the 21st century - A review. Journal of Informetrics, 2, 1–52.

Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.

Bayer, A. E., & Folger, J. (1966). Some correlates of a citation measure of productivity in science. Sociology of Education, 381–390.

Bianconi, G., & Barabási, A. L. (2001). Bose-Einstein condensation in complex networks. Physical Review Letters, 86, 5632.

Bornmann, L., & Daniel, H. D. (2005). Does the h-index for ranking of scientists really work? Scientometrics, 65, 391–392.

Chen, P., Xie, H., Maslov, S., & Redner, S. (2007). Finding scientific gems with Google's PageRank algorithm.

Ding, Y., Yan, E., Frazho, A., & Caverlee, J. (2009). PageRank for ranking authors in co-citation networks. Journal of the American Society for Information Science and Technology, 60, 2229–2243.

Egghe, L. (2006). Theory and practise of the g-index. Scientometrics, 69, 131–152.

Fiala, D. (2012). Time-aware PageRank for bibliographic networks. Journal of Informetrics, 6, 370–388.

Fire, M., & Guestrin, C. (2019). Over-optimization of academic publishing metrics: Observing Goodhart's law in action. GigaScience, 8, giz053.

Fortunato, S., & Hric, D. (2016). Community detection in networks: A user guide. Physics Reports, 659, 1–44.


Garfield, E., et al. (1970). Citation indexing for studying science. Nature, 227, 669–671.

Gingras, Y., & Wallace, M. (2009). Why it has become more difficult to predict Nobel Prize winners: A bibliometric analysis of nominees and winners of the chemistry and physics prizes (1901-2007). Scientometrics, 82, 401–412.

Hirsch, J. E. (2005). An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences, 102, 16569–16572.

Hodges, J. (1958). The significance probability of the Smirnov two-sample test. Arkiv för Matematik, 3, 469–486.

Jin, B., Liang, L., Rousseau, R., & Egghe, L. (2007). The R- and AR-indices: Complementing the h-index. Chinese Science Bulletin, 52, 855–863.

Light, R. P., Polley, D. E., & Börner, K. (2014). Open data and open code for big science of science studies. Scientometrics, 101, 1535–1551.

Liu, X., Bollen, J., Nelson, M. L., & Van de Sompel, H. (2005). Co-authorship networks in the digital library research community. Information Processing & Management, 41, 1462–1480.

Lü, L., & Zhou, T. (2011). Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications, 390, 1150–1170.

Mariani, M. S., Medo, M., & Zhang, Y. C. (2016). Identification of milestone papers through time-balanced network centrality. Journal of Informetrics, 10, 1207–1223.

Newman, M. E. (2009). The first-mover advantage in scientific publication. EPL (Europhysics Letters), 86, 68001.

Owen, A. B. (2007). Infinitely imbalanced logistic regression. Journal of Machine Learning Research, 8, 761–773.

Papadopoulos, F., Kitsak, M., Serrano, M. Á., Boguñá, M., & Krioukov, D. (2012). Popularity versus similarity in growing networks. Nature, 489, 537.

Perc, M. (2014). The Matthew effect in empirical data. Journal of the Royal Society Interface, 11, 20140378.

Price, D. d. S. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27, 292–306.

Price, D. J. D. S. (1965). Networks of scientific papers. Science, 510–515.

Radicchi, F., Fortunato, S., Markines, B., & Vespignani, A. (2009). Diffusion of scientific credits and the ranking of scientists. Physical Review E, 80, 056103.

Redner, S. (1998). How popular is your paper? An empirical study of the citation distribution. The European Physical Journal B - Condensed Matter and Complex Systems, 4, 131–134.

Shen, Z., Chen, F., Yang, L., & Wu, J. (2019). Node2vec representation for clustering journals and as a possible measure of diversity. Journal of Data and Information Science, 4, 79–92.

Sinatra, R., Wang, D., Deville, P., Song, C., & Barabási, A. L. (2016). Quantifying the evolution of individual scientific impact. Science, 354, aaf5239.

Van Hooydonk, G. (1997). Fractional counting of multiauthored publications: Consequences for the impact of authors. Journal of the American Society for Information Science, 48, 944–945.

Walker, D., Xie, H., Yan, K. K., & Maslov, S. (2007). Ranking scientific publications using a model of network traffic. Journal of Statistical Mechanics: Theory and Experiment, 2007, P06010.

Yan, E., & Ding, Y. (2009). Applying centrality measures to impact analysis: A coauthorship network analysis. Journal of the American Society for Information Science and Technology, 60, 2107–2118.

Zeng, A., Shen, Z., Zhou, J., Fan, Y., Di, Z., Wang, Y., Stanley, H. E., & Havlin, S. (2019). Increasing trend of scientists to switch between topics. Nature Communications, 10, 3439.

Zeng, A., Shen, Z., Zhou, J., Wu, J., Fan, Y., Wang, Y., & Stanley, H. E. (2017). The science of science: From the perspective of complex systems. Physics Reports, 714, 1–73.

