The long-term impact of ranking algorithms in growing networks

Shilun Zhang (a), Matúš Medo (a,b,c), Linyuan Lü (a,d), Manuel Sebastian Mariani (a,e,∗)

a Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610051, PR China

b Department of Radiation Oncology, Inselspital, Bern University Hospital and University of Bern, Bern 3010, Switzerland

c Department of Physics, University of Fribourg, Fribourg 1700, Switzerland

d Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou 311121, PR China

e URPP Social Networks, Universität Zürich, Zürich 8050, Switzerland

Keywords: Complex networks; Ranking; Popularity and quality; Popularity inequality; Algorithmic bias

When users search online for content, they are constantly exposed to rankings. For example, web search results are presented as a ranking of relevant websites, and online bookstores often show us lists of best-selling books. While popularity-based ranking algorithms (like Google’s PageRank) have been extensively studied in previous works, we still lack a clear understanding of their potential systemic consequences. In this work, we fill this gap by introducing a new model of network growth that allows us to compare the properties of networks generated under the influence of different ranking algorithms. We show that by correcting for the omnipresent age bias of popularity-based ranking algorithms, the resulting networks exhibit a significantly larger agreement between the nodes’ inherent quality and their long-term popularity, and a less concentrated popularity distribution. To further promote popularity diversity, we introduce and validate a perturbation of the original rankings where a small number of randomly-selected nodes are promoted to the top of the ranking. Our findings take the first steps toward a model-based understanding of the long-term impact of popularity-based ranking algorithms, and our novel framework could be used to design improved information filtering tools.

1. Introduction

Ranking algorithms allow us to efficiently sort massive amounts of online information, and quickly provide us with the relevant items that fit our needs. Due to the ubiquity of rankings, the implications of ranking-based information filtering tools such as search engines [3] and recommendation systems [21] for our society are widely debated [2,5,8,15]. Importantly, in social and information systems, the rankings that we are exposed to are influenced by social processes and, at the same time, influence social processes themselves [40]. To provide a few examples, rankings can heavily impact on the eventual popularity of movies and songs in cultural markets [39], affect the attention received by products in online e-commerce platforms [16,41,49], increase the sales of top-ranked dishes in restaurants [4], and even influence the choices of undecided electors and, as a result, the outcome of political elections [9]. Therefore, understanding the potential systemic impact of ranking algorithms and correcting their potential flaws becomes a critical issue in diverse contexts.

∗ Corresponding author at: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, 610051 Chengdu, PR China; URPP Social Networks, Universität Zürich, 8050 Zürich, Switzerland.

E-mail addresses: [email protected] (L. Lü), [email protected] (M.S. Mariani).

http://doc.rero.ch

Published in "Information Sciences 488(): 257–271, 2019", which should be cited to refer to this work.

So far, most studies on ranking in complex networks validated the algorithms in terms of their ability to find “important” nodes in the network. These studies assumed that there exists a set of target nodes to retrieve, and they assessed the performance of the ranking algorithms in terms of their accuracy in retrieving the target nodes. Relevant studies on the topic include papers on the identification of structural and dynamical influential nodes [20,29]; on the identification of expert-selected relevant nodes [19,25,26]; on the prediction of successful nodes [19,20]. Other studies [1] validated ranking algorithms with respect to their mathematical properties. We refer to the Related Works section (Section 2.1) for more details.

Yet, our understanding of the potential consequences of ranking algorithms for a given system remains elusive. An algorithm with high accuracy in identifying target nodes might lead, in the long term, to a highly-uneven system where only few nodes are able to gain visibility, making it harder for people to discover recent content [11,49]. Besides, both experimental [9] and theoretical [34] results suggest that opinion formation processes can be manipulated by means of information-filtering tools. Therefore, in our increasingly digitalized society, understanding the feedback mechanisms between individual behavior and ranking algorithms is of utmost importance.

The main goal of this article is to investigate the long-term implications of ranking algorithms from a theoretical perspective. To this end, we introduce a growing directed-network model where, at each step, a new node enters the system and links to a limited number of preexisting nodes. The links are chosen probabilistically based on two factors: (1) the preexisting nodes’ ranking position as determined by a predefined, “adopted” ranking algorithm; (2) the nodes’ quality. The relative importance of the two effects is determined by a homogeneous parameter that can be interpreted as the nodes’ sensitivity to quality. In the model, quality is a node-level parameter [24] which can be interpreted as the success that the node would achieve in the absence of social influence mechanisms [39]. Crucially, different adopted ranking algorithms lead to different properties of the final network.

In this article, we aim to determine whether the adoption of a given ranking algorithm by a given system allows high-quality nodes to experience larger success than low-quality nodes. If this is not the case, we may conclude that the adopted ranking algorithm has a negative impact on the system, as it may prevent high-quality nodes from becoming popular and increase the popularity of low-quality nodes. Therefore, we use the model to address the following questions: Will a given algorithm facilitate or impede the success of high-quality nodes in the system? Is a given algorithm useful to discover high-quality nodes? Does it lead to uneven, highly-concentrated popularity distributions?

We postulate that a good ranking algorithm should lead to a network where: (1) the nodes’ long-term popularity strongly correlates with their quality (quality promotion); (2) the nodes’ score strongly correlates with their quality (quality detection); (3) popularity is not concentrated in a few nodes (popularity diversity). The quality promotion and detection properties favor the algorithms that help the system to improve the popularity-quality correlation (quality promotion) [6] and those that help the nodes to find high-quality nodes (quality detection) [24]. The popularity diversity property favors the algorithms that distribute popularity more evenly across the nodes, making it easier for the nodes to find high-quality nodes that are not among the most popular ones [50].

We find that in networks that adopt the ranking by cumulative popularity (as measured by the number of incoming links – node indegree [30]) as the ranking algorithm, the correlation between node popularity and quality strongly depends not only on the nodes’ sensitivity to quality, but also on their willingness to select low-ranked nodes (“exploration cost” in [6]). Besides, the ranking by a popularity metric that is not biased by node age [25] (called rescaled indegree in [25]) leads to networks where both the nodes’ final popularity and the nodes’ score are significantly better correlated with node quality, and the final popularity distribution is significantly more diverse. Interestingly, when the exploration cost is large, networks that adopted a random ranking of the nodes exhibit even higher indegree-quality correlation than networks that adopted a popularity-based ranking: while popularity-based ranking algorithms are always useful for the nodes to discover high-quality content, they may accelerate the dissemination of low-quality content when individuals rely too heavily on them.

To further promote popularity diversity, we introduce and validate a ranking algorithm – the ranking by rescaled indegree with random promotion – where the original ranking by rescaled indegree is “perturbed” by promoting a small number of randomly-selected nodes to the top-10 or the top-20 of the ranking. Such a perturbation has a deterministic component (the number of nodes that are promoted is fixed) and a noisy one (the promoted nodes are chosen at random). It allows us to study the impact of a small amount of noise on systemic properties, in a similar spirit as previous studies that investigated the impact of noise on democratic consensus promotion in animal groups [7], dynamical influence detection [17], and the performance of human groups in coordination problems [42]. We find that with respect to the ranking by rescaled indegree, the ranking by rescaled indegree with random promotion generates networks with more even popularity distributions. Intriguingly, we find that the random promotion can have a marginal or a negative impact on the agreement between nodes’ final popularity and quality, depending on the model parameters.

The main contribution of this article is to provide the first systematic comparison of network-based ranking algorithms with respect to their long-term systemic effects. Compared to existing works, our work shifts the focus from the performance of ranking algorithms in retrieving relevant nodes to the properties of the networks that emerge when the ranking algorithm influences the nodes’ behavior. In addition, our work contributes to the literature on the relation between popularity and quality (see Section 2.2 for related works) by revealing that suppressing the bias by node age of popularity-based metrics is beneficial to quality promotion, detection, and popularity diversity.

The manuscript is organized as follows. Section 2 summarizes related works on the validation of ranking algorithms in time-evolving networks (Section 2.1) and the relation between popularity and quality in social and information systems (Section 2.2). Section 3 introduces a model of network growth together with the ranking algorithms considered here, and the ranking evaluation criteria. Section 4 presents the results of our numerical simulations both for the basic model (Sections 4.1–4.4) and for a variant of the model with node removal (Section 4.5), together with an application of our model to a real information network of scientific papers (Section 4.6). Section 5 is dedicated to a discussion of our results and their implications for algorithmic evaluation and the quality-popularity relation in information systems. Appendices A-B conclude the main text. The Supplementary Material (SM) file is available online – figures whose labels contain an “S” (e.g., Fig. S1) can be found in the SM file.

2. Related works

2.1. Validation of ranking algorithms in complex networks

The synthetic networks generated with our ranking-based growth model can be interpreted as benchmark graphs for ranking algorithms. Our focus on quality promotion, quality detection, and diversity promotion makes our validation framework for ranking algorithms fundamentally different from existing validation methods. Indeed, network-based ranking algorithms are typically evaluated according to their ability to identify structural vital nodes [29], find those nodes that maximize the reach of a spreading process [20], single out expert-selected important nodes [25,26], or respect various sets of axioms [1,33]. In the following, we provide the basic ideas behind these studies.

Structural vital nodes can be defined as the nodes whose removal causes the largest damage to the network’s connectedness [20,29]. The identification of such key nodes is referred to as the structural influence maximization problem [20]. Network-based ranking algorithms have been therefore compared with respect to their ability to identify these nodes. High accuracy can be obtained by means of optimal percolation theory [29].

Dynamic vital nodes can be defined as the nodes that maximize a given function that depends on both the topology of the network and a specific dynamical process on the network [20]. The identification of such influential nodes is referred to as the dynamic influence maximization problem [20]. The influential spreaders are a widely-studied example: They maximize the asymptotic reach of a given spreading process which unfolds on a networked population [20]. In line with this definition, a large number of studies have compared the accuracy of the rankings by different algorithms in identifying the optimal spreaders – see [20] for a review. In this context, a relevant (and still debated) question is whether ranking algorithms that only use local information [23] are able to outperform algorithms that take into account the whole network topology [22].

Other scholars have been interested in the ability of the ranking algorithms to identify important nodes whose value has been recognized by experts. For example, one can evaluate the algorithms by their ability to identify seminal papers selected by the editors of prestigious journals [25], scholars who have been awarded a prestigious prize [35], or patents that have been labeled as seminal by experts in the involved technological domains [26]. We refer to the review [19] for more examples.

Finally, scholars have characterized and validated network-based ranking algorithms based on their mathematical properties. In this approach, one postulates a set of axioms, and posits that a good ranking algorithm should fulfill all of them (or at least, most of them). For example, Boldi and Vigna [1] introduced three axioms (size axiom, density axiom, and score-monotonicity axiom) and used them to characterize the behavior of various centrality metrics. They found that the harmonic centrality (inspired by the popular closeness centrality) is the only metric that satisfies all of them. We refer to Boldi and Vigna [1] for more details and a summary of previous attempts to formalize the properties of network-based ranking algorithms through axioms.

2.2. The relation between popularity and quality

Our work contributes to the literature on the relation between the popularity of an agent in a social system and its inherent quality. In this context, quality (also referred to as fitness [28], or talent [18]) refers to the popularity that the agent would experience in the absence of social influence mechanisms [39]. A robust finding in previous experiments of diverse nature [4,5,11,39] is that the current ranking position of an item (or its current popularity) heavily influences its eventual popularity or success. As a consequence, one of the key challenges is to assess whether in a given system, the final popularity of an item is a reliable proxy for its quality.

Cho [5] pointed out that ranking algorithms and search engines that favor already popular items create a strong “popularity bias” [11] – also dubbed as “search-engine bias” [5], and “googlearchy” [15] – such that only already-popular nodes can receive substantial attention in the future, whereas recent high-quality nodes will remain essentially unnoticed [5]. This bias can amplify initial differences between the items’ popularity, leading to disproportionately high popularity of some items regardless of their quality. O’Madadhain et al. [32] emphasized that static ranking algorithms (like Google’s PageRank [3]) are based on time-aggregate network representations and, for this reason, they do not respect the sequence of events that led to the formation of the network. To solve this issue, they introduced [32] and validated [33] a network-based ranking algorithm, called EventRank, that takes into account the detailed sequence of events, in a similar spirit as the recent literature on temporal networks [19].

Table 1
The network growth model and ranking algorithms: Notation used in this article.

Variable        Description
Model properties
A               Adopted ranking algorithm
r_j^(A)(t)      Ranking position of node j at time t by algorithm A
q_i             Node quality
N               Final number of nodes
α               Exploration cost parameter
β               Sensitivity to ranking parameter
Ranking algorithms
k               Indegree
R(k)            Rescaled indegree
R(k)+RP         Rescaled indegree with random promotion

Salganik et al. [39] found that in artificial cultural markets, showing the items’ ranking by popularity to consumers significantly affects the items’ final popularity. Recent studies [47] emphasized the individuals’ limited attention as a determinant for the viral popularity of low-quality items. Other recent works focused on modeling the interplay between quality/fitness/talent and popularity for various types of information items or agents, including websites [18], scientific papers [28,46], researchers [27,43], and bestseller books [48]. Both model-based [18,28] and experimental results [39] indicate that in the presence of social influence, the relation between popularity and quality is highly non-linear, meaning that small variations of quality lead to large variations in popularity.

3. Model and ranking algorithms

In this Section, we introduce the model of network growth (Section 3.1), the ranking algorithms considered in this paper (Section 3.2), and the metrics used to evaluate the long-term impact of ranking algorithms (Section 3.3). We refer to Table 1 for a summary of the notation used in this article.

3.1. The model of network growth

We focus here on monopartite directed networks. Our model is meant to represent a social or information network where the number of nodes grows with time. In the model, each node i is endowed with a quality parameter q_i that quantifies its attractiveness to new incoming connections in the absence of ranking influence. Before generating a network, we choose the ranking algorithm A that influences the growth: the nodes choose the targets of their links¹ based on the ranking of the nodes by A. For the sake of brevity, we say that the network is A-generated or that the system “has adopted” algorithm A. We introduce a model that features three essential elements:

1. Growth. At each time step, one new node enters the system. Nodes can thus be labeled directly by the time step in which they appeared. Each new node creates m directed links to m different preexisting nodes.²

2. Ranking-driven attachment. With probability β, node t chooses j as the target of a link with the probability

\[ P^{(A)}(j,t) = \frac{\left(r_j^{(A)}(t)\right)^{-\alpha}}{\sum_{s=1}^{t-1} \left(r_s^{(A)}(t)\right)^{-\alpha}}, \tag{1} \]

where $r_j^{(A)}(t)$ is the ranking position of node j at time t according to A; the real number α ≥ 0 is a tunable model parameter referred to as exploration cost by Ciampaglia et al. [6]: Large values of α imply that the nodes are only willing to connect to the top-nodes by the adopted ranking algorithm, whereas low values of α allow the nodes to also connect to low-ranked nodes.

3. Quality-driven attachment. With probability 1 − β, node t chooses j as the target of a link with the probability

\[ P^{(q)}(j,t) = \frac{q_j}{\sum_{l=1}^{t-1} q_l}. \tag{2} \]
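The three elements above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' simulation code: the function names `choose_target` and `choose_targets` are hypothetical, the ranking positions are assumed to be precomputed by the adopted algorithm, all qualities are assumed to be positive, the choice between the two mechanisms is assumed to be made independently for each link, and duplicate targets are assumed to be avoided by simple resampling.

```python
import random


def choose_target(rank, quality, alpha, beta, rng):
    """Sample one target among the preexisting nodes 0..n-1.

    rank[j]    -- ranking position of node j (1 = top), assumed precomputed
    quality[j] -- quality q_j of node j (assumed positive)
    """
    if rng.random() < beta:
        # Ranking-driven attachment, Eq. (1): weight proportional to rank^(-alpha)
        weights = [r ** (-alpha) for r in rank]
    else:
        # Quality-driven attachment, Eq. (2): weight proportional to quality
        weights = list(quality)
    # random.choices normalizes the weights internally
    return rng.choices(range(len(rank)), weights=weights, k=1)[0]


def choose_targets(rank, quality, m, alpha, beta, rng):
    """Return m distinct targets for the entering node (no multiple links).

    Assumption: duplicates are rejected and resampled.
    """
    wanted = min(m, len(rank))
    targets = set()
    while len(targets) < wanted:
        targets.add(choose_target(rank, quality, alpha, beta, rng))
    return sorted(targets)
```

Setting `beta=1.0` in this sketch switches off the quality-driven branch, while `alpha=0.0` makes the ranking-driven branch uniform over the preexisting nodes.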

Our model reduces to the model by Fortunato et al. [10] in the special case β = 1. In other words, node quality and quality-driven attachment are novel elements with respect to Fortunato et al.’s model [10]. Differently from the recent popularity dynamics model by Ciampaglia et al. [6] that considers a cultural market composed of a fixed number N of items, our model represents a network that grows with time. Compared to static markets, growing networks better represent real information systems where new products, information items, songs, etc. continually appear with time. Differently from previous works [6,11,28], we aim to use our growing network model to compare the long-term properties of the networks generated by different ranking algorithms.

1 The network’s directed links might be interpreted as friendship or follower relationships in online social networks, or as citations between documents in information networks.

2 Self-loops and multiple links between a given pair of nodes are prohibited.

3.2. Ranking algorithms

Our main goal is to uncover the long-term implications of the temporal bias of static centrality metrics and the benefits from suppressing such bias. For this reason, we focus here on indegree (due to its simplicity and wide use [30]) and rescaled indegree (as it is a popularity metric that is not biased by temporal effects [25]). In addition, we introduce a random promotion mechanism which “perturbs” the ranking by rescaled indegree by promoting randomly-selected nodes to the top of the ranking, and we consider a random ranking of the nodes as a baseline. We provide below the details of these four ranking algorithms.

1. Ranking by indegree, k. The indegree³ of a node is defined as the number of incoming connections received by that node [30]. We simply rank the nodes in order of decreasing indegree k, which is arguably the simplest way to rank the nodes in a directed network [30]. In growing networks, node indegree is strongly biased by node age [19,25], as confirmed by numerical simulations and analytic computations with our model (see Appendix B).

2. Ranking by (age-)rescaled indegree, R(k). We rank the nodes in order of decreasing age-rescaled indegree [25] R(k). The rescaled indegree is built on indegree by requiring that node score is not biased by node age. More specifically, for each node i, we consider a reference set $\mathcal{R}_i := \{i-\Delta/2, \dots, i+\Delta/2\}$ of Δ + 1 nodes of similar age as node i – we set Δ = 0.01N. We compute the mean μ_i(k) and the standard deviation σ_i(k) of node indegree within this reference set as

\[ \mu_i(k) = \frac{1}{\Delta+1} \sum_{j \in \mathcal{R}_i} k_j, \qquad \sigma_i(k) = \sqrt{\frac{1}{\Delta+1} \sum_{j \in \mathcal{R}_i} \left(k_j - \mu_i(k)\right)^2}. \]

The rescaled indegree R_i(k) of node i is given by the z-score [25]

\[ R_i(k) = \frac{k_i - \mu_i(k)}{\sigma_i(k)}. \tag{3} \]

The rescaled indegree of a given node thus quantifies the difference between the node’s indegree and the mean indegree of nodes of similar age, in units of standard deviations. Using the z-score to normalize static metrics of node importance is customary in scientometrics [44], where scholars aim to gauge the impact of a given scientific paper independently of its field and publication date [45]. Besides, in citation networks, the rescaled indegree allows us to early-identify seminal papers [25], movies [36] and patents [26] better than citation count.

3. Ranking by age-rescaled indegree with Random Promotion, (R(k)+RP). While the age-rescaled indegree suppresses the cumulative advantage of older nodes [25], it may still amplify the advantage of the nodes that, perhaps by chance, quickly received many more connections than nodes of similar age did. To reduce this potential problem and further increase diversity, we introduce the ranking by rescaled indegree with random promotion (R(k)+RP): We rank the nodes by age-rescaled indegree, R(k), and then we “promote” P randomly-selected nodes to the top-T of the ranking.⁴ Therefore, the fraction η := P/T of nodes in the top-T by the ranking is chosen at random. By placing some previously overlooked nodes at the top of the ranking, this mechanism gives these nodes enhanced visibility and, therefore, an additional opportunity to attract some links. In the following, we show results for T = 10 and η = 0.5; results for other values of T and η are shown in the Supplementary Material (Supplementary Figures S11–S13 and S19–S24).

4. Random ranking. The nodes are ranked at random. The resulting ranking is used as a baseline to understand for which parameter values the final indegree-quality correlation benefits from the rankings by indegree and rescaled indegree.

For the ranking algorithms considered above, if two or more nodes happen to have the same score, their relative order is determined at random.
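The age-rescaling of Eq. (3) and the random tie-breaking rule can be illustrated with the following sketch. It is not taken from the article: the function names are hypothetical, the window is assumed to be simply truncated for the youngest and oldest nodes, and a node whose reference set has zero spread is assumed to receive a score of zero.

```python
import math
import random


def rescaled_indegree(k, delta):
    """z-score of each node's indegree among nodes of similar age.

    k     -- indegrees ordered by node age (index = time of entry)
    delta -- window parameter; the reference set spans i-delta/2 .. i+delta/2
    """
    n = len(k)
    half = delta // 2
    scores = []
    for i in range(n):
        # Assumption: near the boundaries the reference set is truncated
        window = k[max(0, i - half):min(n, i + half + 1)]
        mu = sum(window) / len(window)
        sigma = math.sqrt(sum((x - mu) ** 2 for x in window) / len(window))
        # Assumption: zero spread in the window gives a score of zero
        scores.append((k[i] - mu) / sigma if sigma > 0 else 0.0)
    return scores


def ranking_positions(scores, rng):
    """Ranking positions (1 = top) by decreasing score, ties broken at random."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], rng.random()))
    positions = [0] * len(scores)
    for pos, node in enumerate(order, start=1):
        positions[node] = pos
    return positions
```

Passing the raw indegrees instead of the rescaled scores to `ranking_positions` gives the ranking by indegree, and passing a list of identical scores gives a random ranking.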

Choosing the quality distribution in the model deserves some attention. A simple mean-field approximation shows that for β = 0, the expected final node popularity is proportional to node quality; simulation results show that for β > 0, node final popularity is a power-law function of node quality (see Appendix B for details). Motivated by this property, to mimic the broad popularity distributions typically observed in real data [30], we choose a Pareto distribution of the quality values q (see Appendix A for details). This choice leads indeed to broad indegree distributions as shown in Fig. S1. We refer to Appendix A for further simulation details.

3 In the following, we will use interchangeably “indegree”, “popularity”, and “cumulative popularity”. This is because, in our simple setting, the incoming links received by a node are the only available information on its “popularity”. The situation might be different in a real online system where, for example, the popularity of a video can be quantified by various means: the number of downloads, the number of views, the number of shares, etc.

4 When a node whose original ranking position was r_0 > T is promoted to the ranking position r∗ ≤ T in the top-T of the ranking, the ranking positions of the nodes that occupied the ranking positions {r∗, r∗ + 1, . . . , r_0 − 1, r_0} increase by one (i.e., these nodes lose one ranking position).
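The promotion step and the resulting shift of ranking positions can be sketched as follows. The sketch makes several assumptions that the text does not specify: the function name `promote` is hypothetical, the promoted nodes are drawn uniformly among the nodes ranked below the top-T, each promoted node is placed on a distinct, uniformly chosen slot of the top-T, and P ≤ T.

```python
import random


def promote(order, P, T, rng):
    """Return a new ranking (node ids, best first) after random promotion.

    order -- node ids sorted by decreasing rescaled indegree R(k)
    P     -- number of promoted nodes (assumed P <= T)
    T     -- size of the top list
    """
    # Assumption: candidates are the nodes originally ranked below the top-T
    below_top = order[T:]
    promoted = rng.sample(below_top, min(P, len(below_top)))
    # Assumption: promoted nodes occupy distinct random slots of the top-T
    slots = sorted(rng.sample(range(T), len(promoted)))
    chosen = set(promoted)
    new_order = [v for v in order if v not in chosen]
    # Inserting in increasing slot order pushes the displaced nodes down by one
    for slot, node in zip(slots, promoted):
        new_order.insert(slot, node)
    return new_order
```

With T = 10 and P = 5 this corresponds to η = 0.5: half of the top-10 is occupied by randomly promoted nodes, and the remaining nodes keep their relative order.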

Table 2
Evaluation metrics used in this article.

Property            Symbol           Description
Quality promotion   r^A(k, q)        Pearson’s linear correlation between node indegree and node quality q in networks grown with algorithm A
                    P^A_100(k, q)    Precision of indegree in identifying the top-100 nodes by quality, in networks grown with algorithm A
Quality detection   r^A(s, q)        Pearson’s linear correlation between the score produced by algorithm A and node quality q
                    P^A_100(s, q)    Precision of algorithm A in identifying the top-100 nodes by quality, in networks grown with algorithm A
Diversity           N_eff            The effective number of nodes, defined as the inverse of the Herfindahl index, H(k), associated with node indegree

We emphasize that the rescaling procedure described above is only one among the possible ways to design a time-aware ranking algorithm [19]. Already in 2005, O’Madadhain and Smyth [32] recognized that static centrality metrics are based on time-aggregate network representations and, for this reason, they do not respect the sequence of events that led to the formation of the network. To solve this issue, O’Madadhain and Smyth introduced [32] and validated [33] a ranking algorithm based on the detailed contact time-series, acting as precursors of the recent stream of literature on centrality in temporal networks based on time-preserving paths [19]. We refer to Liao et al. [19] for a review of time-dependent ranking algorithms in complex networks. Nevertheless, we focus on the age-rescaled indegree here because of its simplicity and effectiveness in suppressing the indegree’s bias towards old nodes [19,25]. Testing alternative time-dependent ranking algorithms within our model-generated benchmark graphs is an interesting possibility for future research.

3.3. Evaluating the algorithms’ long-term impact

To assess the long-term impact of different algorithms, we grow random networks based on the model described above for each ranking algorithm A. The algorithm determines, at any time, the ranking of the nodes that, in turn, determines the probability that a node receives a new connection, according to Eq. (1). We refer to the networks generated with algorithm A as A-generated networks. Ideally, we would expect a good ranking algorithm A to exhibit the three main properties introduced above: (i) Quality promotion: The algorithm generates networks where the final popularity of the nodes strongly correlates with their quality; (ii) Quality detection: The algorithm is effective in identifying high-quality nodes; (iii) Popularity diversity: The algorithm generates networks where the popularity is not strongly concentrated in a few nodes. In the following, we introduce three groups of observables to quantify these three properties – see Table 2 for a summary.

3.3.1. Quality promotion

For a given ranking algorithm A, we evaluate how well the final popularity k of the nodes reproduces the inherent quality values q for A-generated networks. We do so by calculating the Pearson’s linear correlation $r^A(k,q)$ between node popularity k and node quality q. While this metric takes into account all nodes, we are also interested in the algorithm’s ability to promote the top-quality nodes. To this end, we measure the precision $P^A_{100}(k,q)$, defined as the fraction of nodes that are placed in the top-100 of both the ranking by k and the ranking by q.

One caveat applies to the quality-promotion evaluation metrics, $r^A(k,q)$ and $P^A_{100}(k,q)$. Due to the always-present possibility in the model of choosing the target by node quality, even the random ranking yields positive correlation and precision. This correlation increases as the nodes’ sensitivity to quality increases (i.e., as β decreases) – see Fig. 2 and the related discussion below. As the random ranking does not contain any useful information, it is clear that it cannot provide any positive contribution to the quality promotion. To quantify this effect, we measure the algorithms’ quality detection metrics (see next paragraph).

3.3.2. Quality detection

For a given ranking algorithm A, we evaluate how well the node scores assigned by A reproduce the inherent node quality values q for A-generated networks. To this end, we measure the Pearson’s linear correlation $r^A(s,q)$ between the node-level scores s produced by the algorithm A (measured at the end of the network growth) and node quality q for A-generated networks. In parallel, we also measure the precision [21] $P^A_{100}(s,q)$ of the algorithm, defined as the fraction of nodes that are placed in the top-100 of both the ranking by s and the ranking by q.

3.3.3. Popularity diversity

To quantify the ability of the algorithms to evenly spread the popularity across the network’s nodes, and thus avoid its disproportionate concentration in a small number of nodes, we measure the indegree’s Herfindahl index [14] H(k),

\[ H(k) = \sum_{i=1}^{N} \left(\frac{k_i}{L}\right)^2, \tag{4} \]

where L is the total number of links. For fixed N and L, the index increases linearly with the variance of the network’s indegree distribution: the smaller H(k), the less concentrated the indegree in a small group of nodes. More specifically, the index ranges between H_min = 1/N (egalitarian network where all the nodes have indegree equal to L/N) and H_max = 1 (network where one node has indegree L, and all the other nodes have indegree zero). Consequently, N_eff(k) = 1/H(k) can be interpreted as the “effective number of nodes” that received incoming links: N_eff is equal to 1 if one single node received all the incoming links, and it is equal to N if all the nodes received the same number of links. We posit that a good ranking algorithm should not produce too concentrated networks, and its generated networks should therefore exhibit relatively large values of N_eff.

Fig. 1. Quality promotion as measured by r(k, q) (the Pearson’s linear correlation between node indegree k and node quality q – top panels), and P_100(k, q) (the precision of node indegree k in identifying the top-100 nodes by quality q – bottom panels): a comparison between indegree-generated and R(k)-generated networks. (A-B): r(k, q) for indegree-generated (r^k(k, q), panel A) and R(k)-generated (r^R(k, q), panel B) networks, as a function of the model parameters α (exploration cost) and β (reliance on ranking). (C): Ratio r^R(k, q)/r^k(k, q) as a function of the model parameters. (D-E): P_100(k, q) for indegree-generated (P^k_100(k, q), panel D) and R(k)-generated (P^R_100(k, q), panel E) networks, as a function of the model parameters. (F): Ratio P^R_100(k, q)/P^k_100(k, q) as a function of the model parameters. Results are averaged over 500 realizations.
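The Herfindahl index of Eq. (4) and the effective number of nodes can be computed directly from the list of indegrees. The short sketch below uses hypothetical function names and assumes that the network has at least one link.

```python
def herfindahl(k):
    """Herfindahl index H(k): sum of squared indegree shares, Eq. (4)."""
    total_links = sum(k)  # L, assumed to be positive
    return sum((k_i / total_links) ** 2 for k_i in k)


def effective_nodes(k):
    """Effective number of nodes N_eff = 1 / H(k)."""
    return 1.0 / herfindahl(k)
```

For instance, four nodes with indegrees `[2, 2, 2, 2]` each hold a share of 1/4, so H = 4 × (1/4)² = 1/4 and N_eff = 4, whereas the fully concentrated case `[8, 0, 0, 0]` gives H = 1 and N_eff = 1.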

4. Results

We grow networks of N = 10,000 nodes according to the model described above; we refer to Appendix A for all the simulation details. Here, we show the results for node outdegree m = 6; the results for m = 3 are commented below and shown in the SM (Figs. S2–S13). Our goal is to compare the properties of networks generated by the four ranking algorithms defined in Section 3, according to the observables described above.

4.1. Quality promotion: impact of the age bias suppression

Fig. 1A shows the indegree-quality Pearson’s linear correlation r(k, q) in indegree-generated networks, as a function of the model parameters α and β. The correlation between final popularity and quality is sensitive to both α and β. As α grows, it becomes harder for low-ranked high-quality nodes to acquire new incoming connections, which results in lower indegree-quality correlation. The (approximately) monotonic dependence of $r^k(k,q)$ on α was not found for the model of a static market⁵ by Ciampaglia et al. [6], which indicates that it is a consequence of the network’s growth. As β grows, the nodes become less sensitive to quality and, as a direct consequence, the indegree-quality correlation deteriorates.

5 By static market, we mean a collection of a fixed number of nodes. This is different from our growing network model where, at each time step, a new node enters the system and connects to the preexisting nodes.

Fig. 2. Quality promotion as measured by r(k, q) (the Pearson’s linear correlation between node indegree k and node quality q – top panels), and P_100(k, q) (the precision of node indegree k in identifying the top-100 nodes by quality q – bottom panels): a comparison between indegree-generated (red circles), R(k)-generated (blue squares), R(k)+RP-generated (green rhombuses), and random-generated networks (triangles). The three columns correspond, from left to right, to β = 0.7, 0.8, 0.9, respectively. Results are averaged over 100 realizations; the error bars represent the standard error of the mean.

Fig. 1B shows the correlation $r^R(k,q)$ between node indegree k and node quality in R(k)-generated networks. The figure shows that by adopting the age-rescaled metric R(k), the indegree-quality correlation stays large for a broader parameter region. For example, we observe values of $r^R(k,q)$ as high as 0.35 when (α, β) = (1, 0.7), which corresponds to a scenario where the nodes are driven by quality only three times out of ten.⁶

To visually appreciate the parameter regions where the indegree-quality correlations significantly differ between the two classes of networks, we represent the heatmap of the ratio r^R(k, q)/r^k(k, q) (Fig. 1C). When β = 0, nodes are only sensitive to quality and the plotted ratio is one by definition (on average) because node ranking has no influence. When β > 0, the nodes become driven both by quality and by ranking, which makes it possible to observe differences between the networks grown with different algorithms. We find that the rescaled indegree produces networks with a higher indegree-quality correlation over the whole parameter space; we observe the largest advantage of the rescaled indegree in terms of quality promotion in the region where both α and β are relatively large, i.e., in the region where the nodes are unwilling to choose low-ranked nodes and, at the same time, are highly sensitive to ranking. Analogous heatmaps for the precision metrics (Fig. 1D–F) show that the indegree's precision in R(k)-generated networks is systematically larger than that in indegree-generated networks. Differently from Fig. 1C, Fig. 1F shows that the largest gaps between the precision in R(k)- and indegree-generated networks occur in the small-α, large-β region. We discuss the reasons behind the different trends for correlation and precision in Section 4.2.

4.2. Quality promotion: comparing the four ranking algorithms

Fig. 1 indicates that adopting the rescaled indegree promotes node quality better than indegree for a broad range of model parameters. At the same time, it is important to compare the performance achieved by adopting rescaled indegree with that achieved by adopting rescaled indegree with random promotion and random ranking, respectively. As pointed out above, for β < 1, the nodes have a non-zero probability to choose their targets based on quality, which results in a positive indegree-quality correlation even for networks generated by adopting a random ranking. We focus on three values of β (β = 0.7, 0.8, 0.9) which correspond to populations of nodes that are mostly driven by ranking when selecting their targets; analogous results for smaller values of β are shown in Supplementary Figs. S16–S18 for m = 6 and S8–S10 for m = 3.

We find (Fig. 2, top panels) that the indegree-quality correlation observed in R(k)-generated networks is not always larger than the indegree-quality correlation observed in random-generated networks: as the exploration cost α grows, the indegree-quality correlation in R(k)-generated networks dwindles; when α is larger than one, R(k)-generated networks exhibit a smaller indegree-quality correlation than random-generated networks. While a large exploration cost is harmful for the overall indegree-quality correlation in both indegree-generated and R(k)-generated networks, for α ≥ 1, indegree's precision in promoting the top-quality nodes (Fig. 2, bottom panels) tends to grow with the exploration cost. Remarkably, R(k)-generated networks exhibit the largest precision values for all values of α.

6 As opposed to r^k(k, q) = 0.12 observed for indegree-generated networks for the same pair of (α, β) values.

The behavior of the (R(k)+RP)-generated networks (i.e., the networks generated with the ranking by rescaled indegree with random promotion, R(k)+RP) is non-trivial. For small values of α, the indegree-quality correlation and indegree's precision of these networks are comparable with those of R(k)-generated networks, which indicates that randomly promoting a few nodes to the top of the ranking is not harmful to quality promotion. On the other hand, for α > 1, the indegree-quality correlation (or indegree's precision) of R(k)-generated networks becomes larger than that of (R(k)+RP)-generated networks. We conclude that for large values of β (β = 0.7, 0.8, 0.9), the random promotion mechanism has a marginal impact on quality promotion for sparse networks when α < 1, whereas it is harmful to quality promotion when the exploration cost is large, i.e., for α > 1. For smaller values of β (β = 0.1, 0.3, 0.5), the (R(k)+RP)-generated networks and the R(k)-generated networks exhibit comparable values of indegree-quality correlation and precision (Figs. S8–S16).

The qualitative difference between the top and the bottom panels of Fig. 2 is explained by the different indegree distributions of the networks generated with different values of α. When α is large, the incoming links concentrate on few top items (as the effective number of nodes shows, see Fig. 4 below and the related discussion) and, at the same time, the low-quality items remain unnoticed. The small number of incoming links received by low-quality items does not allow the metrics to discriminate their quality, which results in small indegree-quality correlation values. By contrast, high-quality nodes receive a large number of incoming links, and it is possible for the metrics to rank them at the top, which results in relatively large precision values.

4.3. Quality detection

In indegree-generated and R(k)-generated networks, node indegree and age-rescaled indegree, respectively, are the scores that are used for the nodes' ranking. Our ability to detect quality in such networks is determined by the strength of the relation between node scores and q (as measured by both the Pearson's linear correlation r^s(s, q) and the precision P^s_100(s, q)).

Remarkably, the correlation r^R(R(k), q) is larger than the correlation r^k(k, q) for all the parameter values (Fig. 3, top panels, and Fig. S14). The precision of age-rescaled indegree in identifying the top-quality nodes is also larger than indegree's precision for all the parameter values (Fig. 3, bottom panels). In qualitative agreement with the results obtained for quality promotion (Fig. 1), for m = 6 and β = 0.7, 0.8, 0.9, the random promotion mechanism turns out to have a marginal impact for α < 1, whereas it is harmful for α > 1. For m = 3, 6 and β = 0.1, 0.3, 0.5, the (R(k)+RP)-generated and R(k)-generated networks exhibit similar levels of score-quality correlation (see Figs. S9 and S17). By being completely insensitive to node popularity, the random score achieves on average zero precision in identifying the top-quality nodes. As expected, while the random score can still generate networks with non-zero indegree-quality correlation (Fig. 2), the rankings it produces have no practical utility.

4.4. Diversity

For both indegree- and R(k)-generated networks, the effective number of nodes N_eff depends on both α and β (Fig. 4). Unsurprisingly, the random ranking produces the most egalitarian networks (Fig. 4), with N_eff values above 3000 = 0.3N. In qualitative agreement with previous findings [39], for all the studied metrics, a larger sensitivity to ranking position (i.e., a larger α value) leads to a more unequal popularity distribution: this manifests itself in the decrease of N_eff as α increases. Importantly, the popularity distribution is more even for R(k)-generated networks than for indegree-generated networks (Figs. 4 and S15), and the (R(k)+RP) algorithm further enhances popularity diversity. In summary, the age normalization procedure not only improves the indegree-quality and the score-quality correlation, but it also decreases the popularity inequality in the system; as expected, the random promotion mechanism further decreases the popularity inequality. At the same time, both the R(k)-generated and the (R(k)+RP)-generated networks exhibit N_eff values significantly smaller than the N_eff achieved by the random ranking. It remains open to design ranking algorithms that lead to more egalitarian networks than those generated with rescaled indegree and (R(k)+RP), while maintaining a similar level of quality promotion and quality detection.
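As a rough illustration of how a diversity measure of this kind behaves, the sketch below assumes that N_eff is the inverse of the Herfindahl concentration index of the indegree shares, which is what the citation of Herfindahl [14] suggests; the definition used in the paper is given in an earlier section and may differ in detail. The function name `effective_number` is hypothetical.

```python
import numpy as np

def effective_number(k):
    """Inverse Herfindahl index of the indegree shares (assumed form of N_eff).

    Equals N when all nodes have the same indegree and 1 when a single node
    holds all the incoming links."""
    k = np.asarray(k, dtype=float)
    total = k.sum()
    if total <= 0:
        raise ValueError("the network needs at least one link")
    shares = k / total
    return float(1.0 / np.sum(shares ** 2))
```

Under this assumption, a value of 0.3N means that the links are spread as evenly as they would be over about 30% of the nodes holding equal shares.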

Fig. 3. Quality detection as measured by r(s, q) (the Pearson's linear correlation between node score s and node quality q; top panels) and P_100(s, q) (the precision of node score s in identifying the top-100 nodes by quality q; bottom panels): a comparison between indegree-generated (s = k, red circles), R(k)-generated (s = R(k), blue squares), (R(k)+RP)-generated (s = R(k), green diamonds), and random-generated networks (s is given by a random score, black triangles). The three columns correspond, from left to right, to β = 0.7, 0.8, 0.9, respectively. The dots represent averages over 100 realizations; the error bars represent the standard error of the mean.

4.5. Network growth model with node removal

In real social and information systems, not only can new nodes enter the system, but existing nodes can also disappear or lose relevance. The members of an online community, for example, may lose interest in the platform and deactivate their accounts, which may eventually lead to the "death" of the platform [12]. In the WWW, many webpages are deleted every day, which makes it essential to incorporate node deletion into network growth models [18]. To take this situation into account, we introduce a variant of the model introduced in Section 3.1 by assuming that the nodes are in one of two possible states: "active" or "removed". At every time step t, each active node becomes removed with probability μ, and one new active node enters the system and chooses its targets among the existing active nodes according to the rules described in Section 3.1. In the continuum approximation, the number of active nodes, A(t), follows the equation dA(t)/dt = 1 − μ A(t), whose solution (with the initial condition A(1) = 1) is

$$A(t) = \mu^{-1} + \left(1 - \mu^{-1}\right) \exp\left(-\mu\,(t - 1)\right). \qquad (5)$$

After an initial linear growth (A(t) ≈ t for t ≪ μ^−1), the number of active nodes eventually equilibrates at A∗ = 1/μ. The validity of Eq. (5) is confirmed by our numerical simulations (see Fig. 5F). Note that when μ = 0 (i.e., without node removal), A(t) = t and N = t∗, where t∗ denotes the total number of performed simulation steps. In the following, we set t∗ = 10^4; due to the node removal mechanism, the number A(10^4) of active nodes at the end of the simulation is substantially smaller than 10^4, as expected based on Eq. (5) (see Fig. 5F).
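The comparison between Eq. (5) and a direct simulation of the removal process can be sketched as follows. This is a minimal illustration that tracks only the number of active nodes, not the links; the function names are hypothetical, and the order of operations within a time step (removal of existing nodes first, then arrival of the new node) is an assumption.

```python
import math
import numpy as np

def active_nodes_analytic(t, mu):
    """Eq. (5): A(t) = 1/mu + (1 - 1/mu) * exp(-mu * (t - 1)), with A(1) = 1."""
    if mu == 0:
        return float(t)  # no removal: one node is added per step
    return 1.0 / mu + (1.0 - 1.0 / mu) * math.exp(-mu * (t - 1))

def active_nodes_simulated(t_max, mu, seed=0):
    """Simulate the number of active nodes up to time t_max."""
    rng = np.random.default_rng(seed)
    active = 1  # one active node at t = 1
    for _ in range(2, t_max + 1):
        # each active node is removed independently with probability mu
        removed = int(rng.binomial(active, mu))
        # assumed order: removals first, then one new active node enters
        active = active - removed + 1
    return active
```

For μ = 10^−4 and t = 10^4, Eq. (5) gives roughly 6,300 active nodes, well below both 10^4 and the asymptotic value A∗ = 10^4, because the system has not yet equilibrated.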

We investigate whether the node removal mechanism substantially alters the results described above. More specifically, we perform numerical simulations with values of μ ranging from 10^−4 to 5·10^−4, which corresponds to expected values of the asymptotic number of active nodes, A∗ = 1/μ, ranging from 2,000 to 10,000. We find that throughout the whole range of μ values, the results are qualitatively similar to those obtained for the networks without node removal (see Fig. 5A–E for the results for α = 0.5, β = 0.7, and Fig. S25 for α = 1, β = 0.7). All the considered observables display remarkable stability with respect to μ, which indicates that node removal has a marginal impact on the algorithms' quality promotion, quality detection, and popularity diversity. It remains open to determine the largest removal probability μ tolerated by the system before the results change qualitatively.

4.6. A case study: growing a network of scientific papers based on different metrics

So far, we have considered model-generated networks. How are our model and our results on ranking algorithms relevant to real systems? If our model provides a plausible description of the growth of a given system of interest, and we are able to infer the (α, β) parameters from the available data, we should be able to quantify the potential impact of various ranking algorithms on the future properties of the system. Stimulated by this observation, in this Section, we analyze real information networks to fit our model's parameters to the empirical data, and use the resulting parameters to grow the network again based on different ranking algorithms.


Fig. 4. Diversity as measured by N_eff (the larger the value, the more egalitarian the indegree distribution): a comparison between indegree-generated (red circles), R(k)-generated (blue squares), (R(k)+RP)-generated (green diamonds), and random-generated (black triangles) networks. The three columns correspond, from left to right, to β = 0.7, 0.8, 0.9, respectively. The dots represent averages over 100 realizations; the error bars represent the standard error of the mean.

Fig. 5. The impact of node removal on quality promotion, quality detection, and popularity diversity. We grow networks generated with the model described in Section 4.5 with α = 0.5, β = 0.7, T = 10^−4. We show five ranking evaluation metrics as a function of the inverse of the removal probability, A∗ = μ^−1: (A) the Pearson's correlation between node indegree and node quality; (B) the Pearson's correlation between node score and node quality (see Fig. 3's caption for the definition of node score); (C) the effective number of nodes, N_eff; (D) the indegree's precision in identifying the top-100 nodes by quality; (E) the score's precision in identifying the top-100 nodes by quality. As in the previous figures, different lines correspond to the networks generated with different algorithms. The symbols represent averages over 50 realizations; the error bars represent the standard error of the mean. Panel F shows the number of active nodes at time t = 10^4, i.e., at the time when we halt the simulations, as a function of μ^−1; the values computed analytically through Eq. (5) match the values observed in the simulations well.


The information networks analyzed here are subsets of the American Physical Society (APS) citation network of scientific papers.⁷ The citation dataset provided by the APS contains all 539974 papers published by the APS from 1893 to 2013 together with their 5992897 references to other papers published by APS journals and their publication dates. For our analysis, we extract two subsets of papers that include the papers with the PACS number⁸ 89.75.Hc ("Networks and genealogical trees"; the subset includes N = 1615 papers and L = 8222 citations) and 03.67.Lx ("Quantum computation architectures and implementations"; the subset includes N = 3876 papers and L = 24213 citations), respectively.

To quantify the potential impact of different ranking algorithms, the first step is to infer the optimal pair (α̂, β̂) of model parameters and the optimal algorithm A that together lead to the best agreement between the A-generated model networks and the observed real network. The real data determine the final number of nodes in the network, their order of appearance, and their outdegree values; in an A-generated model network, the nodes choose their references based on the rules of the model described in Section 3.1, where the ranking position of the nodes is determined by the chosen algorithm A.

Quality is not accessible in real data, as opposed to synthetic networks where it is a well-defined node-level variable. To overcome this limitation, as rescaled indegree is the best-performing metric in quantifying node quality in synthetic networks (Fig. 3), we use the papers' final rescaled indegree score R(k) as a proxy for their quality if R(k) ≥ 0, whereas we set q = 0 for papers whose R(k) is negative.

The agreement between the A-generated networks and the original network is quantified by the model's mean error per node, e(A, α, β). For each A-generated network⁹ G(A, α, β), we first define the network's mean error per node, e(G(A, α, β)), as

$$e(G(A, \alpha, \beta)) = \frac{1}{N} \sum_{i=1}^{N} \left| k_i(G(A, \alpha, \beta)) - k_i^{*} \right|, \qquad (6)$$

where k_i(G) denotes paper i's indegree in network G, whereas k∗_i denotes the observed indegree of paper i. The model's mean error per node, e(A, α, β), is defined as the average of e(G) over a sample of 100 A-generated networks. Essentially, the model's mean error per node quantifies the average difference between node indegree in real and model networks. The lower e(A, α, β), the better the agreement between the A-generated model networks and the real network.
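The error measure described above can be sketched as follows. This is an illustration under the assumption that Eq. (6) averages the absolute difference between model and observed indegree; the function names `network_error` and `model_error` are hypothetical, and the model realizations are assumed to be supplied as indegree vectors aligned with the observed one.

```python
import numpy as np

def network_error(k_model, k_observed):
    """Mean error per node of a single model network (assumed absolute difference)."""
    k_model = np.asarray(k_model, dtype=float)
    k_observed = np.asarray(k_observed, dtype=float)
    if k_model.shape != k_observed.shape:
        raise ValueError("indegree vectors must refer to the same set of nodes")
    return float(np.mean(np.abs(k_model - k_observed)))

def model_error(realizations, k_observed):
    """Average of the per-network error over a sample of model realizations."""
    return float(np.mean([network_error(k, k_observed) for k in realizations]))
```

The best-fitting (α, β) pair for a given algorithm would then be the one that minimizes `model_error` over the explored parameter grid.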

Finally, we compare the properties of the networks generated by different ranking algorithms for (α, β) = (α̂, β̂). The rationale behind such a comparison is that the parameter values (α̂, β̂) that yield the best agreement represent our best estimates of the nodes' exploration cost α and sensitivity to popularity β. Based on this assumption, we estimate the long-term implications of various algorithms by simply changing the algorithm that is used to grow the network, whilst keeping α = α̂ and β = β̂ fixed.

For the subset that corresponds to the PACS code 89.75.Hc, we find that indegree-generated networks with α̂ = 0.5 and β̂ = 0.5 exhibit the best agreement with the real network (mean error per node e(k, 0.5, 0.5) = 3.1, see Fig. 6A). Therefore, we fix α = 0.5 and β = 0.5, and we compare the networks generated by three different algorithms: indegree, age-rescaled indegree, and the ranking by rescaled indegree with random promotion. In qualitative agreement with our previous results, we find that R(k)-generated networks and (R(k)+RP)-generated networks exhibit substantially larger popularity diversity than k-generated networks and the original network (see Fig. 6B). Qualitatively similar results are obtained for the subset that corresponds to the PACS code 03.67.Lx (see Fig. 6C–D). These results confirm that in a real information system, suppressing the age bias of popularity metrics can increase popularity diversity. Beyond numerical simulations, we envision that future studies might further validate this conclusion by means of field experiments where subjects in different groups can select various information items based on the items' ranking position by different ranking algorithms.

7 The dataset has been already analyzed in [25,26,28], and it can be downloaded here: https://journals.aps.org/datasets.

8 The PACS (Physics and Astronomy Classification Scheme) codes refer to a hierarchical classification of research areas in physics, astronomy, and related sciences. We refer to https://journals.aps.org/PACS for further information.

9 An A-generated network is a realization of the stochastic process that generates A-generated networks.

Fig. 6. A comparison of two real networks (subsets of the APS citation network) with the respective synthetic networks generated with the model described in Section 3.1, using different ranking algorithms. Panels (A,C) show the scatter plots between the nodes' original indegree k∗ and the nodes' average indegree k in indegree-generated networks (α = α̂, β = β̂) generated as described in the main text, for the APS network's subsets that correspond to the PACS codes 89.75.Hc and 03.67.Lx, respectively. Panels (B,D) show the averages of N_eff over 100 realizations of the model networks generated by different ranking algorithms. The (R(k)+RP)-generated (η = 0.5, T = 10) and R(k)-generated networks exhibit significantly larger values of N_eff than the original network and the indegree-generated network. This indicates that in a real information system, suppressing the age bias of popularity-based metrics can substantially improve popularity diversity.

5. Discussion

To summarize, we find that age-rescaled indegree allows us not only to fairly compare old and recent nodes [25], but also to generate networks that exhibit a larger indegree-quality correlation and a more even popularity distribution than the networks generated with indegree. Examples of widely-used cumulative popularity metrics include the number of views or downloads for online content and the number of received citations for scientific papers, among others. Our results indicate that despite the widespread use of cumulative popularity metrics, age-rescaled metrics may better help both users to find high-quality content, and high-quality content to experience larger success than low-quality content. The random promotion mechanism introduced here consistently improves popularity diversity, whereas its impact on quality promotion and detection ranges from marginal to substantial depending on the exploration cost parameter α.

The main message of our work is that network-based growth models can help us to understand the impact of network growth mechanisms on the rankings by a given algorithm [19,24,27], as well as to estimate the impact of the adoption of different ranking algorithms by a given system. In other words, we can investigate not only how the past evolution of the system influenced the current rankings, but also how the adopted rankings may influence the future evolution of the system. As a result, our ranking-driven and quality-driven growing network model can be interpreted as a generative model for benchmark graphs to evaluate the performance of ranking algorithms in terms of quality promotion, quality detection, and popularity diversity.

We stress that the ranking algorithms considered here are simple in the sense of being based on a single criterion that is furthermore readily quantified (e.g., the ranking by node indegree). In a general scenario where multiple criteria are to be considered, especially when they conflict with each other, such as node popularity and novelty, the existing vast body of literature on multiple-criteria decision analysis [13], with techniques such as the analytic hierarchy process [38] and outranking [37], becomes relevant. This goes beyond directly combining node scores by various metrics [31], yet it can become relevant when designing an information filtering system for real users with their heterogeneous, and often contradictory, needs and preferences.

Our work sheds light on the long-studied relation between quality/talent and success [39,48]: Do the high-quality nodes experience larger success than the low-quality nodes? Why do nodes of similar worthiness experience widely different success? In real systems, addressing these questions is challenging as defining "node quality" in an unbiased and objective way is often not possible. Our model-based approach bypasses this obstacle by defining node quality as an intrinsic node property, and by building multiple independent realizations of an artificial system where the nodes choose their connections based on both the other nodes' ranking and their quality. At the same time, while the model studied here is arguably one of the simplest models which feature all the elements of interest in our analysis (network growth, and the joint influence of ranking and quality on network growth), it can only provide a stylized description of the growth of real networks.

Finally, the application of our growing network model to the American Physical Society citation network of scientific papers constitutes an attempt to estimate the potential consequences of various ranking algorithms on a real information system. We envision that more sophisticated models together with suitable field experiments will improve the reliability of model-based predictions of the effects of ranking algorithms, providing us with a robust basis for more informed choices of ranking algorithms for real-world applications. To draw a parallel, in a similar way as high-resolution models of epidemic spreading have led to accurate predictions of the properties of disease outbreaks [30], detailed models of network evolution may lead to the accurate quantification of the consequences of the adoption of a given metric in a given system.


Author contributions statement

M.S.M. and M.M. conceived the idea, M.S.M. and L.L. designed research, S.Z. performed the numerical simulations, S.Z. and M.S.M. performed the analytic computations, all authors analyzed and discussed the results. S.Z. and M.S.M. wrote the manuscript. All authors reviewed the manuscript.

Acknowledgments

We thank Yi-Cheng Zhang for many enlightening discussions on the topic. This work has been supported by the National Natural Science Foundation of China (Grants Nos. 11622538, 61673150), the Science Strength Promotion Program of the University of Electronic Science and Technology of China (Grant No. Y030190261010020), and the Natural Science Foundation of Zhejiang Province of China (Grant No. LR16A050001). MSM acknowledges the University of Zürich for support through the URPP Social Networks.

Appendix A. Details on the numerical simulations

We focus on networks composed of 10^4 nodes, and study the following model parameters: α from 0.25 to 2.0 with step 0.25 and β from 0 to 1 with step 0.1. Node quality values are drawn from the Pareto distribution P(q) ∼ q^−3 where q ∈ [1, ∞). The network is initialized with a network of m nodes, each of them with one outgoing and one incoming link. At each time step t > m + 1, a new node t is added to the system and m nodes (only results for m = 6 are shown in the main text, whereas results for both m = 3 and m = 6 are shown in the Supplementary Material) are chosen as targets to establish m new links. With probability β, the probability that a given node is chosen is given by Eq. (1); with probability 1 − β, it is given by Eq. (2). To save computational time, for times t ≤ 10^2, we re-compute and update the rankings at each time step, whereas for times t > 10^2, the newly introduced nodes are placed at the bottom of the node ranking, and we re-compute and update the rankings every 10 time steps. To make our results insensitive to random fluctuations, for each parameter pair (α, β), all the results shown here represent averages over a sufficiently large number of realizations of the network growth process.
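The simulation procedure described above can be sketched for the indegree-based ranking as follows. This is a simplified illustration, not the authors' implementation. Eqs. (1) and (2) are defined earlier in the paper; here they are assumed to be a ranking-driven choice with probability proportional to r^−α (r being the node's ranking position) and a quality-driven choice with probability proportional to q, respectively. Further assumptions: the choice between the two rules is made independently for each link, the m targets of a node are distinct, the ranking is recomputed at every step rather than every 10 steps, and the seed network is summarized only by the indegree of its nodes. All function names are hypothetical.

```python
import numpy as np

def sample_quality(n, rng):
    """Draw n quality values from P(q) ~ q^-3 on [1, inf) by inverse-transform sampling."""
    # The CDF is F(q) = 1 - q^-2, hence q = (1 - u)^(-1/2) for u uniform on [0, 1).
    u = rng.random(n)
    return (1.0 - u) ** -0.5

def grow_network(n, m, alpha, beta, seed=0):
    """Grow an indegree-ranked network; returns (indegree, quality) arrays."""
    rng = np.random.default_rng(seed)
    q = sample_quality(n, rng)
    k = np.zeros(n, dtype=int)
    k[:m] = 1  # seed nodes: one incoming link each
    for t in range(m, n):  # node t arrives; nodes 0..t-1 already exist
        # ranking position by indegree (1 = most popular, ties broken by age)
        order = np.argsort(-k[:t], kind="stable")
        rank = np.empty(t, dtype=float)
        rank[order] = np.arange(1, t + 1)
        p_rank = rank ** -alpha          # assumed form of Eq. (1)
        p_rank /= p_rank.sum()
        p_qual = q[:t] / q[:t].sum()     # assumed form of Eq. (2)
        targets = set()
        while len(targets) < m:
            # each link is ranking-driven with probability beta
            p = p_rank if rng.random() < beta else p_qual
            targets.add(int(rng.choice(t, p=p)))
        k[list(targets)] += 1
    return k, q
```

A call such as `grow_network(200, 3, 1.0, 0.7)` returns the final indegree and quality vectors of a small network, which can then be fed to correlation, precision, or diversity measures.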

Appendix B. The relation between popularity and quality in indegree-generated networks

We start by considering networks where the nodes cannot perceive the other nodes' ranking and are completely driven by quality (β = 0). In this scenario, the average indegree of node i at time N is given by

$$k_i(N) = \sum_{t=i+1}^{N} \frac{m\, q_i}{\sum_{j=1}^{t-1} q_j}. \qquad (B.1)$$

In the thermodynamic limit N ≫ i, by using a similar mean-field approximation as in [10], we obtain

$$k_i(N) \simeq m\, \frac{q_i}{\langle q \rangle}\, \log\left(\frac{N-1}{i-1}\right). \qquad (B.2)$$
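The step from Eq. (B.1) to Eq. (B.2) can be sketched as follows. This is a reconstruction under two stated assumptions (the sum of the qualities is replaced by its mean-field value, and the sum over t is replaced by an integral), not necessarily the route followed by the authors.

```latex
\begin{align}
\sum_{j=1}^{t-1} q_j &\simeq (t-1)\,\langle q \rangle
  && \text{(mean-field replacement of the quality sum)} \\
k_i(N) &\simeq \frac{m\,q_i}{\langle q \rangle} \sum_{t=i+1}^{N} \frac{1}{t-1}
  && \text{(insert into Eq. (B.1))} \\
&\simeq \frac{m\,q_i}{\langle q \rangle} \int_{i}^{N} \frac{\mathrm{d}t}{t-1}
  = m\,\frac{q_i}{\langle q \rangle}\,\log\!\left(\frac{N-1}{i-1}\right)
  && \text{(continuum limit)}
\end{align}
```

For the normalized quality distribution P(q) = 2q^{-3} on [1, ∞), the mean quality is ⟨q⟩ = 2, so the prefactor reduces to m q_i / 2.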

There is a good agreement between Eq. (B.2) and the results of numerical simulations (see Supplementary Fig. S26). Such a linear relation between indegree and quality does not hold for β > 0, where the analytic calculation is made difficult by the fact that the ranking position of a given node at a given time is influenced by both its quality and the previous dynamics of the system. Nevertheless, we find that the relation k_i(N) = C q_i^δ fits the simulation results reasonably well, and the dependence of the fitted exponent δ on node age is relatively weak (see Fig. S27 and Table S1 for details).

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.ins.2019.03.021.

References

[1] P. Boldi, S. Vigna, Axioms for centrality, Internet Math. 10 (3–4) (2014) 222–262.

[2] E. Bozdag, Bias in algorithmic filtering and personalization, Ethics Inf. Technol. 15 (3) (2013) 209–227.

[3] S. Brin, L. Page, The anatomy of a large-scale hypertextual web search engine, Comput. Netw. ISDN Syst. 30 (1) (1998) 107–117.

[4] H. Cai, Y. Chen, H. Fang, Observational learning: evidence from a randomized natural field experiment, Am. Econ. Rev. 99 (3) (2009) 864–882.

[5] J. Cho, S. Roy, Impact of search engines on page popularity, in: Proceedings of the 13th International Conference on World Wide Web, ACM, 2004, pp. 20–29.

[6] G.L. Ciampaglia, A. Nematzadeh, F. Menczer, A. Flammini, How algorithmic popularity bias hinders or promotes quality, Sci. Rep. 8 (2018).

[7] I.D. Couzin, C.C. Ioannou, G. Demirel, T. Gross, C.J. Torney, A. Hartnett, L. Conradt, S.A. Levin, N.E. Leonard, Uninformed individuals promote democratic consensus in animal groups, Science 334 (6062) (2011) 1578–1580.

[8] A. Diaz, Through the Google Goggles: Sociopolitical Bias in Search Engine Design, Web Search, 2008, pp. 11–34.

[9] R. Epstein, R.E. Robertson, The search engine manipulation effect (seme) and its possible impact on the outcomes of elections, Proc. Natl. Acad. Sci. 112 (2015) E4512–E4521.

[10] S. Fortunato, A. Flammini, F. Menczer, Scale-free network growth by ranking, Phys. Rev. Lett. 96 (21) (2006) 218701.

[11] S. Fortunato, A. Flammini, F. Menczer, A. Vespignani, Topical interests and the mitigation of search engine bias, Proc. Natl. Acad. Sci. 103 (2006) 12684–12689.

[12] D. Garcia, P. Mavrodiev, F. Schweitzer, Social resilience in online communities: the autopsy of friendster, in: Proceedings of the First ACM Conference on Online Social Networks, ACM, 2013, pp. 39–50.

[13] S. Greco, J. Figueira, M. Ehrgott, Multiple Criteria Decision Analysis, Springer, 2016.

[14] O.C. Herfindahl, Copper Costs and Prices: 1870–1957, Published for Resources for the Future by Johns Hopkins Press, Baltimore, 1959.

[15] M. Hindman, K. Tsioutsiouliklis, J.A. Johnson, Googlearchy: how a few heavily-linked sites dominate politics on the web, Annual Meeting of the Midwest Political Science Association, 2003.

[16] D. Jannach, K. Hegelich, A case study on the effectiveness of recommendations in the mobile internet, in: Proceedings of the Third ACM Conference on Recommender Systems, ACM, 2009, pp. 205–208.

[17] J.-J. Jiang, Z.-G. Huang, L. Huang, H. Liu, Y.-C. Lai, Directed dynamical influence is more detectable with noise, Sci. Rep. 6 (2016) 24088.

[18] J.S. Kong, N. Sarshar, V.P. Roychowdhury, Experience versus talent shapes the structure of the web, Proc. Natl. Acad. Sci. 105 (2008) 13724–13729.

[19] H. Liao, M.S. Mariani, M. Medo, Y.-C. Zhang, M.-Y. Zhou, Ranking in evolving complex networks, Phys. Rep. 689 (2017) 1–54.

[20] L. Lü, D. Chen, X.-L. Ren, Q.-M. Zhang, Y.-C. Zhang, T. Zhou, Vital nodes identification in complex networks, Phys. Rep. 650 (2016) 1–63.

[21] L. Lü, M. Medo, C.H. Yeung, Y.-C. Zhang, Z.-K. Zhang, T. Zhou, Recommender systems, Phys. Rep. 519 (1) (2012) 1–49.

[22] L. Lü, Y.C. Zhang, C.H. Yeung, T. Zhou, Leaders in social networks, the delicious case, PLoS ONE 6 (6) (2011) e21202.

[23] L. Lü, T. Zhou, Q.M. Zhang, H.E. Stanley, The h-index of a network node and its relation to degree and coreness, Nat. Commun. 7 (2016) 10168.

[24] M.S. Mariani, M. Medo, Y.-C. Zhang, Ranking nodes in growing networks: when PageRank fails, Sci. Rep. 5 (2015).

[25] M.S. Mariani, M. Medo, Y.-C. Zhang, Identification of milestone papers through time-balanced network centrality, J. Informetr. 10 (4) (2016) 1207–1223.

[26] M.S. Mariani, M. Medo, F. Lafond, Early identification of important patents: design and validation of citation network metrics, Technol. Forecast. Soc. Change (2018).

[27] M. Medo, G. Cimini, Model-based evaluation of scientific impact indicators, Phys. Rev. E 94 (3) (2016) 032312.

[28] M. Medo, G. Cimini, S. Gualdi, Temporal effects in the growth of networks, Phys. Rev. Lett. 107 (23) (2011) 238701.

[29] F. Morone, H.A. Makse, Influence maximization in complex networks through optimal percolation, Nature 524 (7563) (2015) 65.

[30] M. Newman, Networks: An Introduction, Oxford University Press, 2010.

[31] K. Okamoto, W. Chen, X.-Y. Li, Ranking of closeness centrality for large-scale social networks, in: International Workshop on Frontiers in Algorithmics, Springer, 2008, pp. 186–195.

[32] J. O'Madadhain, P. Smyth, Eventrank: a framework for ranking time-varying networks, in: Proceedings of the 3rd International Workshop on Link Discovery, ACM, 2005, pp. 9–16.

[33] J. O'Madadhain, J. Hutchins, P. Smyth, Prediction and ranking algorithms for event-based network data, ACM SIGKDD Explor. Newslett. 7 (2) (2005) 23–30.

[34] N. Perra, L.E.C. Rocha, Modelling opinion dynamics in the age of algorithmic personalisation, 2018, arXiv:1811.03341.

[35] F. Radicchi, S. Fortunato, B. Markines, A. Vespignani, Diffusion of scientific credits and the ranking of scientists, Phys. Rev. E 80 (5) (2009) 056103.

[36] Z.-M. Ren, M.S. Mariani, Y.-C. Zhang, M. Medo, Randomizing growing networks with a time-respecting null model, Phys. Rev. E 97 (5) (2018) 052311.

[37] B. Roy, The outranking approach and the foundations of electre methods, in: Readings in Multiple Criteria Decision Aid, Springer, 1990, pp. 155–183.

[38] T.L. Saaty, Analytic hierarchy process, in: Encyclopedia of Operations Research and Management Science, Springer, 2013, pp. 52–64.

[39] M.J. Salganik, P.S. Dodds, D.J. Watts, Experimental study of inequality and unpredictability in an artificial cultural market, Science 311 (5762) (2006) 854–856.

[40] I. Scholtes, R. Pfitzner, F. Schweitzer, The social dimension of information ranking: a discussion of research challenges and approaches, in: Socioinformatics - The Social Impact of Interactions between Humans and IT, Springer, 2014, pp. 45–61.

[41] A. Sharma, J.M. Hofman, D.J. Watts, Estimating the causal impact of recommendation systems from observational data, in: Proceedings of the Sixteenth ACM Conference on Economics and Computation, ACM, 2015, pp. 453–470.

[42] H. Shirado, N.A. Christakis, Locally noisy autonomous agents improve global human coordination in network experiments, Nature 545 (7654) (2017) 370.

[43] R. Sinatra, D. Wang, P. Deville, C. Song, A.-L. Barabási, Quantifying the evolution of individual scientific impact, Science 354 (6312) (2016) aaf5239.

[44] G. Vaccario, M. Medo, N. Wider, M.S. Mariani, Quantifying and suppressing ranking bias in a large citation network, J. Informetr. 11 (3) (2017) 766–782.

[45] L. Waltman, A review of the literature on citation impact indicators, J. Informetr. 10 (2) (2016) 365–391.

[46] D. Wang, C. Song, A.-L. Barabási, Quantifying long-term scientific impact, Science 342 (6154) (2013) 127–132.

[47] L. Weng, A. Flammini, A. Vespignani, F. Menczer, Competition among memes in a world with limited attention, Sci. Rep. 2 (2012) 335.

[48] B. Yucesoy, X. Wang, J. Huang, A.-L. Barabási, Success in books: a big data approach to bestsellers, EPJ Data Sci. 7 (1) (2018) 7.

[49] A. Zeng, C.H. Yeung, M. Medo, Y.-C. Zhang, Modeling mutual feedback between users and recommender systems, J. Stat. Mech. (7) (2015) P07020.

[50] T. Zhou, Z. Kuscsik, J.-G. Liu, M. Medo, J.R. Wakeling, Y.-C. Zhang, Solving the apparent diversity-accuracy dilemma of recommender systems, Proc. Natl. Acad. Sci. 107 (2010) 4511–4515.


http://doc.rero.ch

Published in "Information Sciences 488(): 257–271, 2019"

which should be cited to refer to this work.
