Identifying missing and spurious connections via the bi-directional diffusion on bipartite networks

Peng Zhang a, An Zeng b,c,∗, Ying Fan c,∗

a School of Science, Beijing University of Posts and Telecommunications, Beijing, 100876, PR China
b Department of Physics, University of Fribourg, Chemin du Musee 3, CH-1700 Fribourg, Switzerland
c School of Systems Science, Beijing Normal University, Beijing, 100875, PR China

Link prediction and spurious link detection in complex networks have attracted increasing attention from both the physics and computer science communities, due to their wide applications in many real systems. Previous related works mainly focus on monopartite networks, while these problems have not yet been systematically addressed in bipartite networks. Containing two different kinds of nodes, bipartite networks are essentially different from monopartite networks, especially in node similarity calculation: the similarity between nodes of different kinds (called inter-similarity) is not well defined. In this letter, we employ local diffusion processes to measure the inter-similarity in bipartite networks. We find that the inter-similarity is asymmetric if the diffusion is applied in different directions. Accordingly, we propose a bi-directional hybrid diffusion method, which is shown to achieve higher accuracy than the existing diffusion methods in identifying missing and spurious links in bipartite networks.

Link prediction and spurious link detection are long-standing challenges in network study [1]. They have applications in many fields such as chemistry, biology, sociology and computer science. In many biological networks, such as food webs, protein–protein interaction networks and metabolic networks, whether a link between two nodes exists must be demonstrated by field or laboratory experiments, which is usually very costly [2]. In addition, the data used to construct biological and social networks may contain inaccurate information, resulting in spurious links [3]. Correcting the network connections can be very expensive if it is done by laboratory experiments. To solve this problem, many network-based methods have been developed, and they have been shown to have high accuracy in identifying missing and spurious links [4].

So far, most related works focus on monopartite networks, in which only one type of node exists [5]. However, many systems coupled by two different building blocks should be modeled by bipartite networks. For example, e-commerce systems consisting of online users and items [6], the scientific collaboration system consisting of authors and papers [7], and the family name inheritance system consisting of babies and names [8] are naturally described by such networks. The link prediction and spurious link detection problems are not yet well addressed in bipartite networks. One closely related problem is the so-called “network-based recommendation”, in which network structure information is used to generate recommendation lists for online users [9–11]. This field aims at predicting the future links for each individual. However, which links are most likely to emerge at the system level still remains unknown.

∗ Corresponding authors. E-mail addresses: [email protected] (A. Zeng), [email protected] (Y. Fan).

Compared to link prediction, spurious link detection is more difficult. Though many link prediction methods are claimed to be applicable to spurious link detection, they may lead to serious distortion of networks' structural and dynamical properties, as shown in Ref. [12]. Therefore, there is not yet a well-accepted method for this task. Spurious link detection in bipartite networks is even newer and less explored, to the best of our knowledge. One closely related topic is the design of reputation systems in online user–object rating networks, through which malicious spammers can be identified [13].

Most link prediction and spurious link detection methods are based on calculating the similarity between nodes [14]. Most similarity measures in monopartite networks can be easily applied to calculate the intra-similarity (i.e. the similarity between nodes of the same kind) in bipartite networks. However, the inter-similarity (i.e. the similarity between nodes of different kinds) in bipartite networks is not yet well defined [15,16]. In this letter, we employ a well-known hybrid diffusion process from recommender systems to calculate inter-similarities in bipartite networks. The hybrid diffusion process was originally designed to be initialized from only one type of node [17]. Interestingly, we find that the obtained inter-similarity will be entirely different if the direction of the diffusion process is reversed. A similar phenomenon has been found when coarse graining bipartite networks [18]. We therefore design a bi-directional hybrid diffusion method and find that it can achieve a much higher accuracy than the existing diffusion methods. We investigate the bi-directional hybrid diffusion under many different conditions, and it is shown that the diffusion information from both directions is equally important in all the considered conditions.

Published in Physics Letters A, which should be cited to refer to this work.

1. Method

A bipartite network consists of two different types of nodes. Without loss of generality, here we use an online commercial system as an example. In such systems, the two kinds of nodes are users and items, respectively. The bipartite network with $N$ users and $M$ items (also referred to as objects) can be represented by an adjacency matrix $A$, where the element $a_{i\alpha}$ equals 1 if user $i$ has collected item $\alpha$, and 0 otherwise. (In this letter, item nodes are labeled by Greek letters and user nodes by Latin letters.)

We first briefly introduce the hybrid diffusion method of Ref. [17]. The hybrid diffusion method is a combination of two independent algorithms called Mass diffusion (MD for short) [19] and Heat conduction (HC for short) [20]. For a user node $i$, the MD algorithm starts by assigning one unit of resource to each item collected by $i$, and redistributes the resource through the user–item network. We denote by the vector $f^i$ the initial resources on items, where the $\alpha$-th component $f^i_\alpha$ is the resource possessed by object $\alpha$. The three-step diffusion aims at estimating the inter-similarity between $i$ and $\alpha$, namely the likelihood that a link exists between these two nodes. In the first step, the resource reaches the objects selected by the target user. In the second step, the resource reaches the users who selected the same objects as the target user; these users are usually referred to as relevant users. In the final step, the resource reaches the objects selected by the relevant users. These objects are more likely to be selected by the target user in the future. In real networks, many objects can be reached by the diffusion, but they are not equally likely to be selected by the target user; the amount of resource on an object is the estimate of this likelihood. The inter-similarity between user $i$ and the items is obtained by setting the elements of $f^i$ to $f^i_\alpha = a_{i\alpha}$ and redistributing them via $\tilde{f}^i = W f^i$, where

$$W_{\alpha\beta} = \frac{1}{k_\beta} \sum_{j=1}^{N} \frac{a_{j\alpha} a_{j\beta}}{k_j} \qquad (1)$$

is the diffusion matrix, with $k_\beta = \sum_{l=1}^{N} a_{l\beta}$ and $k_j = \sum_{\gamma=1}^{M} a_{j\gamma}$ denoting the degree of object $\beta$ and user $j$, respectively [19]. Physically, the diffusion is equivalent to a three-step random walk starting with $k_i$ units of resource on the target user $i$.
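As a minimal sketch of the three-step mass diffusion of Eq. (1), the following pure-Python function spreads the resource from items to users and back to items for one target user. The function name `mass_diffusion`, the list-of-lists representation of $A$ and the toy network are illustrative assumptions, not taken from Ref. [19]; the sketch also assumes that every user and item has at least one link.

```python
def mass_diffusion(A, i):
    """Scores of all items for target user i under Eq. (1).

    A is a list of rows (users) of 0/1 entries (items); this
    representation is an assumption made for the sketch.
    """
    n_users, n_items = len(A), len(A[0])
    k_user = [sum(row) for row in A]                              # k_j
    k_item = [sum(row[b] for row in A) for b in range(n_items)]   # k_beta

    # Step 1: one unit of resource on each item collected by user i.
    f = [float(A[i][b]) for b in range(n_items)]

    # Step 2: every item splits its resource equally among its users.
    on_users = [
        sum(A[j][b] * f[b] / k_item[b] for b in range(n_items))
        for j in range(n_users)
    ]

    # Step 3: every user splits its resource equally among its items.
    return [
        sum(A[j][a] * on_users[j] / k_user[j] for j in range(n_users))
        for a in range(n_items)
    ]


# Toy network (assumed for illustration): 2 users, 3 items.
A = [[1, 1, 0],
     [0, 1, 1]]
scores = mass_diffusion(A, 0)
print(scores)
```

Since each step only splits resource among neighbors, the scores add up to the degree $k_i$ of the target user, in line with the random-walk picture.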

The HC algorithm works similarly to the MD algorithm, but instead follows a conductive process represented by

$$W_{\alpha\beta} = \frac{1}{k_\alpha} \sum_{j=1}^{N} \frac{a_{j\alpha} a_{j\beta}}{k_j}. \qquad (2)$$

Physically, the recommendation score can be interpreted as the temperature of an item, which is the average temperature of its nearest neighborhood, i.e. its connected users. The higher the temperature of an item, the higher its similarity to the target user node [20]. The MD and HC methods are illustrated in Fig. 1(a).
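The averaging picture behind Eq. (2) can be sketched in the same way: users take the mean temperature of their items, then items take the mean temperature of their users. The name `heat_conduction` and the toy network are assumptions for illustration, and all degrees are assumed to be non-zero.

```python
def heat_conduction(A, i):
    """Temperatures of all items for target user i under Eq. (2)."""
    n_users, n_items = len(A), len(A[0])
    k_user = [sum(row) for row in A]
    k_item = [sum(row[b] for row in A) for b in range(n_items)]

    # Items collected by user i start at temperature 1, all others at 0.
    f = [float(A[i][b]) for b in range(n_items)]

    # Each user takes the average temperature of the items it collected.
    user_temp = [
        sum(A[j][b] * f[b] for b in range(n_items)) / k_user[j]
        for j in range(n_users)
    ]

    # Each item takes the average temperature of its connected users.
    return [
        sum(A[j][a] * user_temp[j] for j in range(n_users)) / k_item[a]
        for a in range(n_items)
    ]


# Toy network (assumed for illustration): 2 users, 3 items.
A = [[1, 1, 0],
     [0, 1, 1]]
print(heat_conduction(A, 0))
```

Unlike mass diffusion, the total is not conserved here; every score is an average and therefore stays between 0 and 1.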

Fig. 1. (Color online.) Illustration of the (a) user-based and (b) item-based diffusion processes. The blue number next to a node is the resource the node received in the mass diffusion process. The red number is the resource the node received in the heat conduction process. The final resource on nodes can be used to estimate the inter-similarity between them and the target node.

The hybrid algorithm of MD and HC was proposed in [17], with the diffusion matrix given by

$$W_{\alpha\beta} = \frac{1}{k_\alpha^{1-\lambda} k_\beta^{\lambda}} \sum_{j=1}^{N} \frac{a_{j\alpha} a_{j\beta}}{k_j}, \qquad (3)$$

where the parameter $\lambda$ adjusts the relative weight between the two algorithms. When $\lambda$ increases from 0 to 1, the hybrid algorithm changes gradually from HC to MD. By tuning the parameter $\lambda$, the inter-similarity between user nodes and item nodes can be measured more accurately.
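To make the role of $\lambda$ concrete, the hybrid diffusion matrix of Eq. (3) can be built explicitly for a small network. This is a sketch only: `hybrid_matrix` and the toy adjacency matrix are hypothetical, and the code assumes no user or item has zero degree.

```python
def hybrid_matrix(A, lam):
    """Item-item diffusion matrix W of Eq. (3), returned as a list of rows."""
    n_users, n_items = len(A), len(A[0])
    k_user = [sum(row) for row in A]
    k_item = [sum(row[b] for row in A) for b in range(n_items)]
    W = []
    for a in range(n_items):
        row = []
        for b in range(n_items):
            # Sum over users j that collected both items a and b.
            overlap = sum(A[j][a] * A[j][b] / k_user[j] for j in range(n_users))
            row.append(overlap / (k_item[a] ** (1 - lam) * k_item[b] ** lam))
        W.append(row)
    return W


# Toy network (assumed for illustration): 2 users, 3 items.
A = [[1, 1, 0],
     [0, 1, 1]]
W_hc = hybrid_matrix(A, 0.0)   # lambda = 0: heat conduction, Eq. (2)
W_md = hybrid_matrix(A, 1.0)   # lambda = 1: mass diffusion, Eq. (1)
print(W_hc)
print(W_md)
```

At $\lambda = 0$ every row of $W$ sums to one (an averaging process), while at $\lambda = 1$ every column sums to one (a resource-conserving process); intermediate values interpolate between the two normalizations.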

The above diffusion methods are defined to be initialized only from the user side, while the diffusion from the item side is always ignored. Here, we consider the hybrid diffusion from the item side. We first set the initial resource on the user side as $f^\alpha_i = a_{i\alpha}$ and redistribute it in the bipartite network via

$$W_{ij} = \frac{1}{k_i^{1-\lambda} k_j^{\lambda}} \sum_{\beta=1}^{M} \frac{a_{i\beta} a_{j\beta}}{k_\beta}. \qquad (4)$$

Again, the hybrid algorithm is HC when $\lambda = 0$ and MD when $\lambda = 1$. This process is illustrated in Fig. 1(b).
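Equation (4) has the same form as Eq. (3) with the roles of users and items exchanged, so one way to sketch the item-based diffusion is to run a single routine on the transposed adjacency matrix. The helper `hybrid_scores` and the toy network are illustrative assumptions; the code assumes no node has zero degree.

```python
def hybrid_scores(A, lam):
    """S[r][c]: resource reaching column-node c when the hybrid diffusion
    is initialized from row-node r (Eq. (3) applied to every row of A)."""
    n_rows, n_cols = len(A), len(A[0])
    k_row = [sum(row) for row in A]
    k_col = [sum(row[c] for row in A) for c in range(n_cols)]
    S = []
    for r in range(n_rows):
        # Resource arriving at each row-node j from the initial columns.
        mid = [
            sum(A[j][b] * A[r][b] / k_col[b] ** lam for b in range(n_cols))
            / k_row[j]
            for j in range(n_rows)
        ]
        # Resource arriving back at each column-node c.
        S.append([
            sum(A[j][c] * mid[j] for j in range(n_rows)) / k_col[c] ** (1 - lam)
            for c in range(n_cols)
        ])
    return S


# Toy network (assumed for illustration): 2 users, 3 items.
A = [[1, 1, 0],
     [0, 1, 1]]
A_T = [list(col) for col in zip(*A)]   # items as rows, users as columns

f_uhd = hybrid_scores(A, lam=0.0)      # f_uhd[i][alpha], user-based
f_ihd = hybrid_scores(A_T, lam=0.0)    # f_ihd[alpha][i], item-based
print(f_uhd[0][0], f_ihd[0][0])
```

In this toy example with $\lambda = 0$, the score of the pair (user 0, item 0) depends on the side from which the diffusion starts.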

In this letter, we denote the diffusion initialized from the user side as user-based hybrid diffusion and the obtained similarity matrix as $f^{UHD}$. The diffusion initialized from the item side is denoted as the item-based hybrid diffusion, and the similarity matrix is written as $f^{IHD}$. One can immediately see from Fig. 1 that $f^{UHD}_{i\alpha} \neq f^{IHD}_{\alpha i}$. To make use of the information of diffusion from both directions, we design a bi-directional hybrid diffusion (BHD for short) method in which the final inter-similarity matrix is calculated as

$$f^{BHD}_{i\alpha} = \theta \frac{f^{UHD}_{i\alpha}}{\sum_{j\beta} f^{UHD}_{j\beta}} + (1-\theta) \frac{f^{IHD}_{\alpha i}}{\sum_{j\beta} f^{IHD}_{\beta j}}, \qquad (5)$$

where $\theta$ is a tunable parameter which controls the weight between the user-based diffusion and the item-based diffusion.
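The combination in Eq. (5) amounts to normalizing each similarity matrix by its total and mixing the two with weight $\theta$. A minimal sketch follows; the function name `combine_bhd` and the small input matrices are made up for illustration and are not data from the paper.

```python
def combine_bhd(f_uhd, f_ihd, theta):
    """Eq. (5): f_uhd[i][alpha] is user-based, f_ihd[alpha][i] is item-based."""
    total_u = sum(sum(row) for row in f_uhd)
    total_i = sum(sum(row) for row in f_ihd)
    n_users, n_items = len(f_uhd), len(f_uhd[0])
    return [
        [
            theta * f_uhd[i][a] / total_u
            + (1 - theta) * f_ihd[a][i] / total_i
            for a in range(n_items)
        ]
        for i in range(n_users)
    ]


# Hypothetical similarity values for 1 user and 2 items.
f_uhd = [[1.0, 3.0]]
f_ihd = [[2.0], [2.0]]
print(combine_bhd(f_uhd, f_ihd, theta=0.5))
```

Because both terms are normalized before mixing, the entries of $f^{BHD}$ sum to one for any $\theta$; $\theta = 1$ recovers the user-based scores and $\theta = 0$ the item-based ones, up to normalization.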

2. Data and metric

2.1. Data

To test the performance of the BHD method, we make use of two benchmark data sets: RYM and Econophys. The RYM data was obtained by downloading publicly available data from the music ratings website RateYourMusic.com. In RYM, users can rate music from 1 to 10 (i.e., from worst to best). We consider only ratings higher than 5 as links here. We use a random subset of this data with 986 users, 1197 items and 26 681 links [17]. The Econophys data consists of a set of papers in the field of econophysics that were published between April 1995 and September 2010 in 78 scientific journals and an e-print server (i.e. arXiv.org) [21]. The network has 1608 distinct authors, 1204 papers and 19 061 links. In these two networks, link prediction is to predict future ratings or citations. Both networks may also have spurious links, since people may give ratings or cite papers carelessly, or some malicious behavior may exist. Therefore, it is natural to test our method on these bipartite networks.

In the link prediction problem, each data set is randomly divided into two parts: the training set ($E^T$) contains 90% of the links, and the remaining 10% of the links constitute the probe set ($E^P$). The algorithm runs on $E^T$, while $E^P$ is used to estimate the prediction accuracy. In the spurious link detection problem, the original data is called the true network. 10% spurious links ($E^S$) are randomly added to the true network, which generates the observed network ($E^O$). Algorithms are run on $E^O$, and $E^S$ is used to evaluate the detection accuracy.
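The two experimental setups can be sketched with the standard `random` module. The function names are hypothetical, and two details are assumptions rather than statements from the text: "10% spurious links" is read as 10% of the number of true links, and spurious links are drawn uniformly from the user–item pairs absent from the true network.

```python
import random


def split_links(links, probe_fraction=0.1, seed=0):
    """Randomly split links into a training set E^T and a probe set E^P."""
    rng = random.Random(seed)
    shuffled = list(links)
    rng.shuffle(shuffled)
    n_probe = round(probe_fraction * len(shuffled))
    return shuffled[n_probe:], shuffled[:n_probe]


def add_spurious_links(links, n_users, n_items, fraction=0.1, seed=0):
    """Return (observed links E^O, spurious links E^S)."""
    rng = random.Random(seed)
    existing = set(links)
    # Candidate spurious links: pairs that are absent from the true network.
    absent = [(i, a) for i in range(n_users) for a in range(n_items)
              if (i, a) not in existing]
    n_spurious = round(fraction * len(links))
    spurious = rng.sample(absent, n_spurious)
    return list(links) + spurious, spurious


# Toy true network (assumed): 4 users, 10 items, 20 links.
true_links = [(i, a) for i in range(4) for a in range(5)]
train, probe = split_links(true_links)
observed, spurious = add_spurious_links(true_links, n_users=4, n_items=10)
print(len(train), len(probe), len(observed), len(spurious))
```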

2.2. Metric

In order to evaluate the accuracy of different algorithms, we adopt the ranking score (RS) index [22]. In the link prediction problem, RS measures whether the links in the probe set are ranked highest among all nonexisting links. For each link $i\alpha$ in the probe set, its inter-similarity rank among all $L$ nonexisting links is denoted as $R_{i\alpha}$. Then RS can be calculated as

$$RS = \frac{1}{|E^P|} \sum_{i\alpha \in E^P} \frac{R_{i\alpha}}{L}. \qquad (6)$$

According to the definition, a well-performing link prediction algorithm should have a small RS.
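A small sketch of Eq. (6) may help fix the conventions. Here `scores` is a hypothetical dictionary mapping each candidate (user, item) pair to its inter-similarity, rank 1 is the most similar pair, and ties are broken by sort order; the tie-breaking rule is an assumption, since the text does not specify one.

```python
def ranking_score(scores, candidates, targets):
    """Average relative rank R/L of the target links among the candidates.

    For link prediction, candidates are all links absent from the
    training set (probe links included) and targets are the probe links.
    """
    ordered = sorted(candidates, key=lambda link: scores[link], reverse=True)
    rank = {link: pos + 1 for pos, link in enumerate(ordered)}
    L = len(ordered)
    return sum(rank[link] / L for link in targets) / len(targets)


# Hypothetical similarity scores for four nonexisting links.
scores = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.3}
print(ranking_score(scores, list(scores), [(0, 0)]))
```

A probe link with the top score among four candidates contributes $R/L = 1/4$, the smallest value possible for that $L$.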

Likewise, we use the rank of spurious links to evaluate the accuracy of algorithms in the spurious link detection problem. For each link $i\alpha$ in the spurious link set, its rank based on inter-similarity ($f$) among all $L$ observed links is denoted as $R_{i\alpha}$. A good algorithm will rank the spurious links behind the true links, thus these spurious links are supposed to have large ranks. In order to be consistent with the ranking score in the link prediction problem, we calculate RS as

$$RS = 1 - \frac{1}{|E^S|} \sum_{i\alpha \in E^S} \frac{R_{i\alpha}}{L}. \qquad (7)$$

In this way, a smaller RS is still better.
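Eq. (7) can be sketched along the same lines: observed links are ranked by decreasing inter-similarity, so a well-detected spurious link sits near the bottom and yields an RS close to zero. The function name, the dictionary of scores and the tie-breaking by sort order are assumptions of this sketch.

```python
def spurious_ranking_score(scores, observed, spurious):
    """Eq. (7): one minus the average relative rank of the spurious links
    among all observed links (rank 1 = highest inter-similarity)."""
    ordered = sorted(observed, key=lambda link: scores[link], reverse=True)
    rank = {link: pos + 1 for pos, link in enumerate(ordered)}
    L = len(ordered)
    mean_relative_rank = sum(rank[link] / L for link in spurious) / len(spurious)
    return 1 - mean_relative_rank


# Hypothetical similarity scores for four observed links.
scores = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.3}
print(spurious_ranking_score(scores, list(scores), [(0, 1)]))
```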

3. Results

We first investigate the performance of the BHD method under different parameters $\lambda$ and $\theta$. The heatmaps of RS in link prediction and spurious link detection are shown in Fig. 2. In each panel, we use the dashed line to mark the region where the BHD method outperforms the original hybrid method (with optimal $\lambda$ when $\theta = 1$) in RS. One can see that in both data sets, the region is large. Moreover, the region is mainly located where $0 < \theta < 1$, which indicates that a well-performing algorithm should combine the diffusion information initialized from both the user and item sides. When applied to link prediction (see Fig. 2(a) and (b)), the BHD method outperforms the hybrid method by 8.2% in RYM and 15.0% in Econophys. In the case of spurious link detection, the BHD method improves on the RS of the hybrid method by 22.1% in RYM and by 14.4% in Econophys. The results imply that the BHD has a bigger advantage in spurious link detection.

Fig. 2. (Color online.) Ranking score RS of the BHD method in the parameter space ($\theta$, $\lambda$) in link prediction for (a) RYM and (b) Econophys. RS of the BHD method in the parameter space ($\theta$, $\lambda$) in spurious link detection for (c) RYM and (d) Econophys. In each panel, the dashed line marks the region where RS is better than the best RS value achievable given $\theta = 1$.

Fig. 3. (Color online.) (a) RS vs $\lambda$ and (b) RS vs $\theta$ in link prediction. (c) RS vs $\lambda$ and (d) RS vs $\theta$ in spurious link detection. The network in this figure is RYM. Note that the y-axis in this figure is in log scale.

In order to better understand the results in Fig. 2, we pick several values of $\theta$ and plot RS as a function of $\lambda$ in Fig. 3(a), (c). One can see that there is an optimal $RS^*$ when tuning $\lambda$, which is consistent with the results in [17]. We can also see that the optimal $RS^*$ when $\theta < 1$ can be smaller than $RS^*$ when $\theta = 1$, which confirms that the BHD can outperform the original hybrid method (where only user-based diffusion is considered). For completeness, we also show RS as a function of $\theta$ for several values of $\lambda$ in Fig. 3(b), (d). Clearly, there is also an optimal $RS^*$ when tuning $\theta$. The optimal parameter $\theta$ is around 0.5, which indicates that the information from user-based diffusion and item-based diffusion are equally important. The optimal parameters for link prediction are $\theta^* = 0.5$ and $\lambda^* = 0.25$. For spurious link detection, the optimal parameters are $\theta^* = 0.5$ and $\lambda^* = 0.3$. The results in Fig. 3 are based on RYM data; we remark here that the results based on Econophys data are similar.
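Locating the optimal pair ($\theta^*$, $\lambda^*$) is a two-parameter search. The sketch below scans a grid and keeps the smallest RS; the grid spacing and the function `toy_rs` are placeholders invented for illustration (in practice `rs` would run the diffusion and evaluate the ranking score), not results or settings from the paper.

```python
def grid_search(rs, thetas, lams):
    """Return (best_rs, best_theta, best_lam) minimizing rs(theta, lam)."""
    return min((rs(theta, lam), theta, lam) for theta in thetas for lam in lams)


def toy_rs(theta, lam):
    # Stand-in objective with a single minimum; NOT data from the paper.
    return 0.1 + (theta - 0.5) ** 2 + (lam - 0.25) ** 2


grid = [0.0, 0.25, 0.5, 0.75, 1.0]      # assumed grid spacing
best_rs, best_theta, best_lam = grid_search(toy_rs, grid, grid)
print(best_rs, best_theta, best_lam)
```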

In link prediction, it is generally difficult to predict the missing links of nodes with small degree. This is known as the “cold-start” problem [23]. In [17], it has already been shown that the item cold-start problem can be well addressed in the hybrid method by tuning the parameter $\lambda$. More specifically, the prediction accuracy for small degree items can be largely improved when $\lambda < 1$. However, the user cold-start problem (i.e., predicting links for small degree users) cannot be solved by tuning $\lambda$. Actually, addressing the user cold-start problem is also of great importance from the commercial point of view. In real online systems, web sites are competing for users. In order to attract more users, web sites should provide new/inactive users with more accurate recommendations, which means that they have to more accurately predict the future links for small degree users. The cold-start problems have their counterparts in spurious link detection. Different from the link prediction case, the spurious links connecting to large degree nodes are usually more difficult to detect. Therefore, it is important to improve the detection accuracy of spurious links for nodes with large degree.

Fig. 4. (Color online.) (a) $RS^{user}_{k \le 10}$ vs $\theta$ and (b) $RS^{item}_{k \le 10}$ vs $\theta$ in link prediction. (c) $RS^{user}_{k \ge 100}$ vs $\theta$ and (d) $RS^{item}_{k \ge 100}$ vs $\theta$ in spurious link detection. The network in this figure is RYM.

We argue here that the user cold-start problem can be solved by the BHD method. In order to show this, we pick all the users with degree smaller than 10 and calculate the average ranking score of all their links in the probe set as $RS^{user}_{k \le 10}$. For completeness, we also calculate the average ranking score of the probe set links connecting to the items with degree smaller than 10 as $RS^{item}_{k \le 10}$. In this way, we can study $RS^{user}_{k \le 10}$ and $RS^{item}_{k \le 10}$ as a function of $\theta$ for different $\lambda$ in Fig. 4(a), (b). One can see that when $\theta = 1$ (i.e. the original hybrid algorithm), $RS^{item}_{k \le 10}$ can indeed be reduced by tuning $\lambda$. However, $RS^{user}_{k \le 10}$ will be significantly enlarged, which indicates that the user cold-start problem becomes even more serious. On the other hand, this problem cannot be well addressed when $\theta = 0$ either ($RS^{user}_{k \le 10}$ is small but $RS^{item}_{k \le 10}$ is too large). In the BHD method where $\theta$ is around 0.5, one can see that both $RS^{user}_{k \le 10}$ and $RS^{item}_{k \le 10}$ can be improved by tuning $\lambda$. In spurious link detection, we mainly consider the spurious links connecting to the nodes with degree larger than 100 in Fig. 4(c) and (d) (i.e., we focus on $RS^{user}_{k \ge 100}$ and $RS^{item}_{k \ge 100}$). Similarly to the link prediction case, both $RS^{user}_{k \ge 100}$ and $RS^{item}_{k \ge 100}$ can be improved when $\theta$ is around 0.5.
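The degree-restricted scores only differ from the overall RS in which links enter the average. A possible way to select those links is sketched below; the function name and arguments are hypothetical, and counting degrees in a separate reference network (for instance the training or observed network) is an assumption, since the text does not say where degrees are measured.

```python
from collections import Counter


def links_by_degree(links, reference_links, side, k_max=None, k_min=None):
    """Select links whose user (side=0) or item (side=1) has a degree,
    counted in reference_links, with k <= k_max and/or k >= k_min."""
    degree = Counter(link[side] for link in reference_links)
    selected = []
    for link in links:
        k = degree[link[side]]
        if (k_max is None or k <= k_max) and (k_min is None or k >= k_min):
            selected.append(link)
    return selected


# Toy example (assumed): user 0 has degree 3, user 1 has degree 1.
reference = [(0, 0), (0, 1), (0, 2), (1, 0)]
probe = [(0, 3), (1, 1)]
print(links_by_degree(probe, reference, side=0, k_max=1))
```

Averaging $R_{i\alpha}/L$ over the selected links only then gives quantities of the kind plotted in Fig. 4.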

We further investigate the effects of data sparsity on the performance of the BHD method. In link prediction, we select a fraction $1 - p$ ($p$ ranging from 0.1 to 0.7 with interval 0.1) of the links from the whole data set as the training set; the remaining fraction $p$ of the links forms the probe set. Clearly, a higher $p$ indicates a sparser known network (i.e., less available information). We report the minimum $RS^*$ of both the BHD method and the hybrid method in Fig. 5(a) and (b). Note that $RS^*$ of BHD is obtained by a double search in the parameter space ($\theta$, $\lambda$). The BHD method has a lower $RS^*$ than the hybrid method in both data sets. The advantage of the BHD method becomes smaller when $p$ is large. This is because the available information in this case is very limited; all algorithms will have low accuracy and their performance becomes almost the same. In spurious link detection, we add a fraction $p$ of spurious links to the real network to create the observed network. The higher $p$ is, the more difficult it is for the algorithms to detect spurious links. $RS^*$ against $p$ is shown in Fig. 5(c) and (d). In RYM, $RS^*$ increases with $p$. However, the BHD method can still have a lower $RS^*$ than the hybrid method under different $p$ in both data sets. In the Econophys data, $RS^*$ decreases with $p$. This is because the Econophys network is very sparse; a small number of spurious links will increase the connectivity and improve the accuracy of the spurious link detection. However, note that $RS^*$ becomes worse when $p$ is too large (e.g. $RS^* = 0.1365$ when $p = 5$). This phenomenon is very similar to a recent finding where the link prediction is improved by adding some virtual links [23].

Fig. 5. (Color online.) The optimal ranking score $RS^*$ of BHD and hybrid methods vs $p$ in link prediction for (a) RYM and (b) Econophys. The optimal ranking score $RS^*$ of BHD and hybrid methods vs $p$ in spurious link detection for (c) RYM and (d) Econophys. The insets are the optimal $\theta$ and $\lambda$ in the BHD method as a function of $p$. The error bars are obtained based on 10 independent realizations. Some error bars are invisible since they are smaller than the size of the markers.

The insets in Fig. 5 show the optimal $\theta$ in the BHD method. One can see that $\theta^*$ is rather stable around 0.5 for different $p$, which indicates that both user-based diffusion and item-based diffusion are useful under different conditions. In the insets of Fig. 5, the optimal $\lambda$ in the BHD method is reported as well. One can see from Fig. 5(a), (b) that $\lambda^*$ approaches 1 as the training set gets sparse, which is consistent with a recent finding [24]. This phenomenon is due to the fact that mass diffusion ($\lambda = 1$) works better than heat conduction ($\lambda = 0$) in sparse data. In Fig. 5(c), (d), $\lambda^*$ is rather stable under different $p$, indicating that the optimal parameters are not sensitive to the fraction of spurious links in the networks.

4. Conclusion

Link prediction and spurious link detection in complex networks have been intensively studied recently. Most related works focus on monopartite networks. In this letter, we investigate the problem of missing and spurious link identification in bipartite networks. We propose a bi-directional hybrid diffusion method in which the inter-similarity between nodes is estimated by the diffusion resource, and the diffusion initialized from both types of nodes is considered. We find that this method can achieve higher accuracy than the existing methods in identifying missing and spurious links in bipartite networks. Moreover, our method is very effective in solving the cold-start problems in information filtering.

In the literature, there are other important examples of spurious link detection approaches for bipartite networks (e.g. the stochastic block model [25]), which, however, are designed for networks with ratings. Moreover, the method is based on global algorithms that can be prohibitive to use for large-scale systems. Instead, our method would be easily applicable to large networks since it is based on local diffusion. We also remark that developing spurious link detection approaches has many applications. For instance, a related method has been applied to predict future conflicts between team members in social networks [26].

Many real networks such as online social networks are modeled by directed networks [27]. So far, link prediction in directed networks is still a challenge [28–30]. It has already been pointed out that directed networks are similar to bipartite networks, and in some cases the problems in directed networks can be mapped to bipartite networks [31]. Therefore, we believe that the bi-directional hybrid diffusion method in this letter can be easily extended to directed networks, and we expect our method to have high accuracy in directed networks as well. Moreover, by considering link senders as user-like nodes and link receivers as object-like ones, some link prediction methods for directed networks could be extended to bipartite networks [28–30]. However, such an adaptation may lead to the prediction of some self-loops. A systematic study of adapting methods between directed and bipartite networks will be an important subject for future investigation.

Acknowledgements

This work is supported by the Swiss National Science Foundation (Grant No. 200020-143272). Y. Fan acknowledges the National Natural Science Foundation of China under Grant No. 61174150, NCET-09-0228, the Doctoral Fund of the Ministry of Education (20110003110027) and the major project of the National Social Science Fund (12&ZD217).

References

[1] L. Lu, T. Zhou, Physica A 390 (2011) 1150.
[2] H. Yu, et al., Science 322 (2008) 104.
[3] C. von Mering, R. Krause, B. Snel, M. Cornell, S.G. Oliver, S. Field, P. Bork, Nature 417 (2002) 399.
[4] R. Guimera, M. Sales-Pardo, Proc. Natl. Acad. Sci. USA 106 (2009) 22073.
[5] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, D.-U. Hwang, Phys. Rep. 424 (2006) 175.
[6] M.-S. Shang, L. Lü, W. Zeng, T. Zhou, Y.-C. Zhang, Europhys. Lett. 88 (2009) 68006.
[7] F. Radicchi, S. Fortunato, B. Markines, A. Vespignani, Phys. Rev. E 80 (2009) 056103.
[8] T.S. Evans, A.D.K. Plato, Phys. Rev. E 75 (2007) 056101.
[9] L.-Y. Lü, M. Medo, C.H. Yeung, Y.C. Zhang, Z.K. Zhang, T. Zhou, Phys. Rep. 519 (2012) 1.
[10] A. Zeng, C.H. Yeung, M.S. Shang, Y.C. Zhang, Europhys. Lett. 97 (2012) 18005.
[11] L.-G. Liu, K. Shi, Q. Guo, Phys. Rev. E 85 (2012) 016118.
[12] A. Zeng, G. Cimini, Phys. Rev. E 85 (2012) 036101.
[13] Y.-K. Yu, Y.-C. Zhang, P. Laureti, L. Moret, Physica A 371 (2006) 732.
[14] T. Zhou, L. Lu, Y.-C. Zhang, Eur. Phys. J. B 71 (2009) 623.
[15] C.-J. Zhang, A. Zeng, Physica A 391 (2012) 1822.
[16] P. Zhang, J. Wang, X. Li, M. Li, Z. Di, Y. Fan, Physica A 387 (2008) 6869.
[17] T. Zhou, Z. Kuscsik, J.G. Liu, M. Medo, J.R. Wakeling, Y.C. Zhang, Proc. Natl. Acad. Sci. USA 107 (2010) 4511.
[18] Y. Wang, A. Zeng, Z. Di, Y. Fan, Chaos 23 (2013) 013104.
[19] T. Zhou, J. Ren, M. Medo, Y.C. Zhang, Phys. Rev. E 76 (2007) 046115.
[20] Y.-C. Zhang, M. Blattner, Y.K. Yu, Phys. Rev. Lett. 99 (2007) 154301.
[21] Y.-B. Zhou, L. Lu, M. Li, New J. Phys. 14 (2012) 033033.
[22] G. Adomavicius, A. Tuzhilin, IEEE Trans. Knowl. Data Eng. 17 (2005) 734.
[23] F. Zhang, A. Zeng, Europhys. Lett. 100 (2012) 58005.
[24] A. Zeng, A. Vidmer, M. Medo, Y.-C. Zhang, Europhys. Lett. 105 (2014) 58002.
[25] R. Guimera, A. Llorente, E. Moro, M. Sales-Pardo, PLoS ONE 7 (2012) e44620.
[26] N. Rovira-Asenjo, T. Gumi, M. Sales-Pardo, R. Guimera, Sci. Rep. 3 (2013) 1999.
[27] B. Ball, M.E.J. Newman, Netw. Sci. 1 (2013) 16.
[28] Q.-M. Zhang, L. Lu, W.Q. Wang, Y.X. Zhu, T. Zhou, PLoS ONE 8 (2012) e55437.
[29] M. Kim, J. Leskovec, in: Proceedings of 11th SIAM International Conference on Data Mining, 2011, pp. 47–58.
[30] J. Sanz, E. Cozzo, Y. Moreno, J. Stat. Mech. (2013) P12008.
[31] W. Zhan, Z. Zhang, J. Guan, S. Zhou, Phys. Rev. E 83 (2011) 066120.
