Information filtering in evolving online networks

(1)

Information

ﬁltering

in

evolving

online

networks

Bo-Lun Chen

a

,

b

,

Fen-Fen Li

a

,

∗

,

Yong-Jun Zhang

a

,

Jia-Lin Ma

a

a_College_of_Computer_Engineering,_Huaiyin_Institute_of_Technology,_Huaian_223300,_China b_Department_of_Physics,_University_of_Fribourg,_Chemin_du_Musée_3,_CH-1700_Fribourg,_Switzerland

Recommender systems use the records of users’ activities and profiles of both users and products to predict users’ preferences in the future. Considerable works towards recommendation algorithms have been published to solve the problems such as accuracy, diversity, congestion, cold-start, novelty, coverage and so on. However, most of these research did not consider the temporal effects of the information included in the users’ historical data. For example, the segmentation of the training set and test set was completely random, which was entirely different from the real scenario in recommender systems. More seriously, all the objects are treated as the same, regardless of the new, the popular or obsoleted products, so do the users. These data processing methods always lose useful information and mislead the understanding of the system’s state. In this paper, we detailed analyzed the difference of the network structure between the traditional random division method and the temporal division method on two benchmark data sets, Netflix and MovieLens. Then three classical recommendation algorithms, Global Ranking method, Collaborative Filtering and Mass Diffusion method, were employed. The results show that all these algorithms became worse in all four key indicators, ranking score, precision, popularity and diversity, in the temporal scenario. Finally, we design a new recommendation algorithm based on both users’ and objects’ first appearance time in the system. Experimental results showed that the new algorithm can greatly improve the accuracy and other metrics.

1. Introduction

Manynaturalandsocialsystemscanbedescribedbynetworks andgraphs[1,2].Among thesenetworks, manyofthememergea naturalbipartite structure, i.e.a network consistingoftwo sepa-ratednodesetsandlinksbetweenthetwonodesets.Examplesof thiskindofnetworksarescientiﬁccollaborationnetwork[3,4],P2P Internet formedby computer terminalsanddata [5];cooperative network formed by actors andtheir ﬁlms [6,7]; Shares Network formedbytheinvestorsandcompanies[8,9];theactivitynetwork formedby the membersofthe clubandtheir activities [10]; the audiencewiththesongsnetwork[11];disease-genenetworks[12], andsoon.

Ine-commercialsystems,thebipartitenetworkisanimportant tooltocharacterizetherelationsbetweenusersandobjects.Based on the user-object bipartite network, many recommendation al-gorithmsaredeveloped. Theexisting recommendationalgorithms can be divided into four main categories: collaborative ﬁltering

*

Correspondingauthor.

E-mailaddresses:chenbolun@hyit.edu.cn(B.-L. Chen),lifenfen@hyit.edu.cn

(F.-F. Li).

algorithm [13,14], associationrule-based algorithm [15], content-basedalgorithm[16,17]andresourceallocationalgorithm[18–20]. The main idea of the collaborativefiltering is todo a recom-mendation based on the similarity of nodes, which can be fur-ther classified asuser-based similarityor object-basedsimilarity. They can help customers quickly to find what they want. Lu et al. [21] carried out a research going through the latest applica-tion of the recommender system by classifying the applications intoeighttypes,includinge-groupactivities,e-resourceservices, e-tourism, e-learning, e-library, e-commerce/e-shopping, e-business ande-government.Besides,asummaryisalsomadeaboutrelevant recommendationtechniquesofeach type.This surveycan greatly help the researchers to promote their understandings about the applicationdevelopmentofrecommendersystem.Inaddition,Yera etal.[22] presenteda reviewabouttheapplicationoffuzzytools in this research. According to his research, application of fuzzy toolsis mainlyusedto detectmoreexclusively studiedtopicand also researchgaps andhencetomake propersuggestions forthe subsequentresearchestopromotethefurtherdevelopmentofthe fuzzy-basedrecommendersystem.Particularly,ananalysisismade inhisresearchabouthowtheapplicationareas,employeddataset, evaluationstrategiesandkeyfeaturesoftheThomsonReutersWeb ofSciencehavebeenappliedtorealizesuchanaim.

http://doc.rero.ch

Published in "Physics Letters A 382 (5): 265–271, 2018"

which should be cited to refer to this work.

(2)

Despite its success, collaborative filtering suffers from data sparsity problem [23,24]. The association rule-based recommen-dation algorithm can find the correlations of different products in the sales process and has been successfully applied in many websites.Thismethoddoesnotrequiredomainknowledgeto dis-cover new points of interest. However, the critical bottleneck of the algorithm is that it’s difficult to extract rules effectively be-causethetimecomplexity oftheassociationrule-basedalgorithm is veryhigh, whereas the diversityindexof therecommendation islow.Content-basedrecommendationusesthehistorical informa-tion(suchasratingscores,sharing activities,anddocuments col-lecting records) toconstructuser preferencesfile, thencalculates thesimilaritybetweenrecommendedobjectsanduserpreferences, andfinallyrecommendsthemostsimilarobjectstothetargetuser. The shortcoming of this method is that it is difficult to acquire usefulinformationfromnon-textualdata format,such asimages, music,videosetc.Therefore,thecontent-basedmethodmayfailto find theinterest of the userswithout deviation. Resource alloca-tionmethods aredesignedbyphysicists. Thesemethodscompute theusers’or/andobjects’similaritythroughdiffusionprocesseson user-objectbipartitenetworkandrecommendobjectstothetarget userbasedonthefinal resourcetheobjectsreceived.These phys-ical based algorithms are highly accurate andefficient andcould beusedtoresolvetheapparentaccuracy–diversitydilemmawhen combinedin an eleganthybrid withan mass(resource)diffusion [18] and heat conduction algorithm [20], to cope with the cold startproblemsinrecommendation[25],toavoidcongestioninthe stocksislimited[26],etc.

2. Related work

In recent years, some researchers also tried to integrate the temporalinformationintotherecommendationalgorithms[27,28]. Thesetemporaleffectsincludeengagementofnewusers,seasonal effects, user preference drifts and shifts and other phenomena alike. These phenomena can have a continuous impact over the underlying relations between the items andusers, which is also whatrecommendationalgorithmsattemptstocapture.

Revamping two dominating collaborative ﬁltering algorithms, Koren et al. [29] presented a model tracking the time changing behaviorsinordertodistillthelonger-termtrendsfromthenoisy patterns. Xiong etal. [30] proposed a Bayesian probabilistic ten-sorfactorization(TF)modelbasedonthe continuoustime. Liuet al. [31] studied the effect of temporal information on the heat-conductionalgorithmby graduallyexpandingtimewindow.Liuet al.[32]found thatboth temporalattenuationanddiversiondelay playkey rolesinrecommendersystemwhenlinkweightistaken intoconsideration.Thentheycombinedthesetwotemporalfactors withusers’lifespanstoconstructatime-weightednetwork(TWN) modelandresourceallocationprocesswasapplied.

There are several other examples of algorithms that use time ascontextavailableintheliteratureoncontext-aware recommen-dation. Forinstance,Maioetaletal.[33] promotedthat the time-awarecollaborative ﬁlteringcanbe adoptedto evaluatewhatthe users mighthold interest onwhen they browse through Twitter. Text analysisservice isadopted inthis approach toannotate the contentof Tweetssemanticallytotrackthenotions basedon the frequenciesofbeingpostedandforwardedalongthetime.Saraet al.[34] madea studytogetthecontextualrecommendation cus-tomizedthroughatime-awarenesssystem.

Even though many recommendationalgorithms hadbeen de-signed and in some of them the temporal information were ap-pliedtoimprovetherecommendationaccuracy,oneimportant is-sue is still being serious overlooked: some of temporal methods are done based on randomly divided training set and probe set [18,19].Themaintaskofrecommendersystemistopredictfuture

linksofeachtargetuserbasedonhistoricalratingrecords[31,32, 35–37],inwhichthewholedatasetisdividedintoa trainingset andatest set andonlythe linksintraining set isknown before recommendation.However,inmostoftheprevious studies,asfar asweknow, thetest dataset issampledrandomly without con-siderthetemporalorderofthelinks,leadingtoalogicaldisorder in the recommendation process, i.e., predicting the past links in testsetonthe basisofthefuture linksintraining set.Therefore, therealperformance of theexisting recommendationalgorithms, suchasthefamousGlobalRankingMethod(GRM)[18],User-based CollaborativeFiltering(UCF), andMass Diffusionmethod(MD), is actuallynotyetfullyunderstood.Inthispaper,wereexaminesome representativerecommendation algorithms withthe temporal di-vision method,i.e. the observed linksand unknown future links are divided strictly based on their temporal order. We ﬁnd that mostrecommendationalgorithmshavemuchloweraccuracywhen theyareappliedtotemporaldivisiontrainingsetandtestset,than randomdivisionmethod.Aftercarefulanalysis,weﬁndthisis be-causethere are more novel objects, with a little historical links intraining set,needtobe recommendedintemporaldivisionset, comparingwiththerandomdivisionset.

Additionally, to solve thisproblem, we presenta new recom-mendationalgorithmtoimprovetherecommendationaccuracyby consideringthe ﬁrstappearance time ofthe objectsandusers in system.

3. Problem formulation and evaluation methods

A bipartite network model gives us a clear vision when we dealwiththe probleminvolving two differenttypesof nodes.In networkbased recommendationalgorithm,data sets usually rep-resentedby anundirectedbipartitenetwork G

(

U

,

O

,

E

)

,whereU isthesetofusers, O isthesetofobjectsandE isthesetoflinks between them. The links between uses and objects can also be representedbyan adjacentmatrix A|U|×|O|

= {

aiα

}

,whereaiα

=

1 ifuserui selectedobjectoα andaiα

=

0 otherwise.

The purpose of a recommendation algorithm is to assign a score, Score

(

i

,

α

)

, to each unconnected pair of user and object

(

i

,

α

)

.Thisscorevaluemaybeunderstoodasakindofproximity, whichassociatedwiththeirconnection probability.Thatistosay, fora pairof unconnected nodes

(

i

,

α

)

,the larger the Score

(

i

,

α

)

is,thehighertheprobability therewillexist alinkbetweenuser i and object

α

. In most of the previous studies, the links in E arerandomly divided intotwo subsets: training set ET_, _which_is treated as known information, and probe set EP, which is un-knownbefore recommendation, to test therecommendation per-formanceofanalgorithm,e.g.,accuracyanddiversity.It’sclearthat ET

_∪

_EP

₌

_{E and}_ET

_∩

_EP

_{= ∅}

_.

In principle, a recommendation algorithm will provides each targetuserwithandescendingorderedlistofobjectsthatthe tar-getuserdidnot chosenbefore.Foreachrecommendedobjectsin thelist,ifthereexistalinkbetweentheobjectandtargetuserin probeset,wecallitahitting.Notethat,onlytheuserwhohaveat leastoneselectedobjectsintrainingsetandatleastoneconnected objectinprobesetwillbeconsideredwhencomputingalgorithms’ performance.Inthiswork,fourpopularusedmetricsareemployed totesttheeffectivenessofthealgorithms,RankingScore,P recision, P opularit y andDiversit y.

RankingScore (R S)isoneofthemostimportantaccuracy met-rics which measures the ability of recommendation algorithm to assign more score to users’ preferable objects than irrelevant ones.Consideringatargetuseri anditsrecommendationlist,the ranking score of each selected object

α

in probe set is R Si

α

=

rankα

/(|

O

|

−

kT

i

)

whererankα isthe rankofthe object

α

inthe descendingordered recommendationlist ofthetarget user i and

(3)

Fig. 1. An illustration about the calculation of the four metrics.

kT_i is the degree of target user i in training set. Thus, the inte-grated ranking scoreof the wholesystem is theaverage ranking scoreoverallusersandobjectspairsinprobeset,reads

R S

=

1

|

EP

_|

(i,α)∈EP

R Si

α

(1)

Notethat,theranksoftheobjectswithsamescoresarethesame andequaltotheirmedian. Itcanclearthatpredictionresultwith higheraccuracywillgethigherrankingscore.

Giventherankingofthenon-observedlinks,thePrecision (P )is deﬁnedastheratioofrelevantobjectsselectedtothetotalnumber of objects selected. That is to say, if we take the T op

−

L links asthe predicted ones, amongwhich hi links are right, then the Precision canbeexpressedas:

P

=

1 n n

i=1 hi L (2)

Clearly,higherprecisionmeanshigherpredictionaccuracy. The metric Popularity (I) measures the average degree ofthe objects in the recommendation list. It is hard for the users to ﬁndtherelevantbutunpopular objects.Therefore,a good recom-mendersystemshouldprefertorecommendsmalldegreeobjects. ThemetricPopularity (I)canbeexpressedas:

Ii

(

L

) =

1 L

α∈Oi kα (3)

whereOi _represents_the_{recommendation}_list_for_user_i,_kα repre-sentsthedegreeofthe object

α

.Alow meanpopularity I

(

L

)

for thewhole system indicates a highnovel andunexpected recom-mendationofobjects.

TheDiversity (D)mainlyconsidershowusers’recommendation lists are different from each other. Here, we measure it by the Hammingdistance.We denote Ci j

(

L

)

as the numberof common objectsinthetop-L placeoftherecommendationlistofuseri and

j,theirhammingdistancecanbecalculatedas: Di j

(

L

) =

1

−

Ci j

(

L

)

L

.

(4)

Di j

(

L

)

is between0 and 1, whichare respectively corresponding tothecaseswhereuseri and j havethesameoranentirely dif-ferent recommendationlist.By averaging Di j

(

L

)

over all pairs of users,weobtainthemeanhammingdistance D

(

L

)

.Themorethe recommendationlistdiffersfromeach other,thehigherthe D

(

L

)

is.

Fig. 1 gives an example ofhow to calculate the four metrics. Fig. 1(a)showsacompletenetworkstructurewhichincludesfour users and ﬁve objects. In the whole graph, we can see eleven existent links andnine nonexistent links. Totest thealgorithm’s accuracy,weneedtochoosesomeexistentlinksasprobeset.

For instance, we select

(

U1

,

O3

)

,

(

U2

,

O5

)

,

(

U3

,

O3

)

and

(

U4

,

O4

)

as probe links, which are presented by dotted lines

in Fig. 1(b). Then, the remaining seven links constitute the training set. An algorithm can only make use of the informa-tion contained in the training set (presented by solid lines in Fig. 1(b)).Ifan algorithmassigns scoresofall non-observedlinks as Score

(

U1

,

O2

)

=

0

.

4,Score

(

U1

,

O3

)

=

0

.

8, Score

(

U1

,

O4

)

=

0

.

6,

Score

(

U2

,

O1

)

=

0

.

4, Score

(

U2

,

O3

)

=

0

.

8, Score

(

U2

,

O5

)

=

0

.

7,

Score

(

U3

,

O2

)

=

0

.

4, Score

(

U3

,

O3

)

=

0

.

8, Score

(

U3

,

O4

)

=

0

.

6,

Score

(

U4

,

O1

)

=

0

.

4, Score

(

U4

,

O2

)

=

0

.

5, Score

(

U4

,

O4

)

=

0

.

6,

Score

(

U4

,

O5

)

=

0

.

7.Then, weshould sortthescores in

descend-ingorderforeachtargetuser.

TocalculateR S,theR S valueofU1equals1

/(

5

−

2

)

=

1

/

3,the

R S valueofU2equals2

/(

5

−

2

)

=

2

/

3,theR S valueofU3 equals

1

/(

5

−

2

)

=

1

/

3,theR S valueofU4equals2

/(

5

−

1

)

=

1

/

2.Hence,

the R S valueofthisalgorithmequals

(

1

/

3

+

2

/

3

+

1

/

3

+

1

/

2

)/

4

=

11

/

24.

To calculate P , here, we set L as 2.Then, the P valueof U1

equals1/2,the P valueofU2 equals1/2,theP valueofU3 equals

1/2,the P valueofU4 equals 0.Hence,the P valueofthis

algo-rithmequals

(

1

/

2

+

1

/

2

+

1

/

2

+

1

/

2

)/

4

=

1

/

2.

To calculate D, here, we set L as 3. Then, the D value of

(

U1

,

U2

)

equals 1

−

1

/

3

=

2

/

3, the D value of

(

U1

,

U3

)

equals

1

−

3

/

3

=

0, the D value of

(

U1

,

U4

)

equals 1

−

2

/

3

=

1

/

3, the

D valueof

(

U2

,

U3

)

equals1

−

1

/

3

=

2

/

3,theD valueof

(

U2

,

U4

)

equals1

−

1

/

3

=

2

/

3,theD valueof

(

U3

,

U4

)

equals1

−

2

/

3

=

1

/

3.

Hence,theD valueofthisalgorithmequals

(

2

/

3

+

0

+

1

/

3

+

2

/

3

+

1

+

1

/

3

)/

6

=

1

/

2. 4. Data and methods

4.1. Dataandthecorrelationofobjects’degree

Inthispaper,we usetwostandarddatasets whichhavebeen widely used to examine the performance of recommendation al-gorithms [27,28]. The first one is the Netflix with 1891 movies (objects)and2294users(http://www.Netflixprize.com/).Usersrate movies from1 (worst)to 5(best). Consistentwiththeliterature, we consider the ratings higher than 2 as a link. Finally, 59464 linksremaininthenetwork.ThesecondoneistheMovielensdata which is a random sample ofthe whole recordsof users ratings during the seven-month period from 19 September 1997 to 22 April 1998 inMovielens.com (http://www.grouplens.org/). It con-sists of 943 users, 1682 movies, and 100000 links. Like Netflix, Movielensisalsobasedona5-starratingsystems.Withthesame ratingfilteringprocessasNetflix,weobtain82520linksin Movie-lensdata.

Becausethelinksoftheobjectsmaybothappearinthetraining setandinthetestset,soweusethenumberoftheobjectslinks occurredinthetrainingsetastheabscissaandthenumberofthe objects linksoccurredinthe test setasthe ordinate.We set the ratiooftrainingsetis90%anddrawthecorrelationoftheobjects’ degree inthe dataset bothdivided by randomandby time. The resultofNetﬂixisshowninFig. 1,theresultofMovielensisshown inSI.

FormFig. 2,wecanseethatN P and N T almostshowalinear relationship.Itisbecausethatthelinksareselectedastrainingset andprobesetbyequalprobability.Butintheconditionofdivision data sets by temporal information,the distribution of points are more discrete.There exist that some objects’degreesare smaller inthe training setbutbiggerin theprobeset,andsome objects’

(4)

Fig. 2. Theanalysisoftheobjects’andusers’degreeintrainingsetandprobeset,

theNetﬂixcase.Thescatterplotsinpanels (a)and(b)aretheobjects’sdegreein

trainingsetversusprobesetwhenthedatasetisdividedrandomlyand

accord-ingtemporalinformation,respectively.Theratiooftrainingsetis0.9.Thehighlight

pointsin(b)arethecold-startobjectsandobsoleteobjects,whichareindispensable

inrealrecommendersystem.Panel(c)isthecorrelationofobjects’degreein

train-ingsetandprobesetversusthechangesoftheratiooftrainingset.Thegreenline

andredlinecorrespondtothesetsthatisdividedrandomlyanddividedaccording

thetemporalsequence,respectively.Panel(d)isthesameas(c),butforthe

corre-lationofusers’degree.(Forinterpretationofthereferencestocolorinthisﬁgure

legend,thereaderisreferredtothewebversionofthisarticle.)

degreesarebiggerinthetrainingsetbutsmallerintheprobeset. Evenmore,someobjects’degreesarezerointhetrainingset,but non-zero in the probe set that is the cold start problem. Those objectsarehighlightedintheFig. 2.WealsocanseethatC

−

Item andC

−

U ser almostnear1inthedatasetsdividedbyrandomas thechangingofthetrainingset’sratio.

4.2. Theageoftheobject

Wefurthermovetostudytheattributivecharacteroftheobject andgivethefollowingdeﬁnesﬁrstly.

Deﬁne 1. The age of the object(TA O):we supposed that the objectreceivestheﬁrstlinkisthebirthday,wesetitTB O.Weset the endingtime of thedataset is TE,so the ageof theobject is TA O

=

TE

−

TB O.

Deﬁne 2. The ageoftheobject(TAU):wesupposedthattheuser receives the ﬁrst link is the birthday, we set it TBU. We set the endingtime ofthedatasetis TE,sothe ageoftheuseris TAU

=

TE

−

TBU.

IntheNetﬂix,wesettheageintervalofobjects

T

=

100 and in this way the whole objects could be divided into 22 groups. Analogously,weassumethatageinterval

T =100andinthisway thewhole objectscould be dividedinto 22groupsinthe Movie-lens.Wecountthenumberofobjectsineveryage’srangebothin the whole datasets but inthe probe sets.In the probe sets,we consideredtherandomdivisionofdatasetsanddatasetsdivided by the temporalinformation inboth cases. The resultsofNetﬂix andMovielensareshowninFig. 2andSI,respectively.

Form Fig. 3, we can see that in the Netﬂix, the trend ofthe number in the probe sets divided by random is similar to the numberinthewholedatasetsandthenumberofyoung objects’ agebasedondivisiondatasetsbytemporalinformationaremore thanthenumberofyoungobjects’agebasedondivisiondatasets by random. Incontrast,the numberof old objects’age basedon division data sets by temporal information are smaller than the numberofoldobjects’agebasedondivisiondatasetsbyrandom. It indicatesthat we canget differentprobesets accordingto dif-ferentdivisionways.TheresultofMovielensisshowninSI.

Fig. 3. Theevolvingofobjects’quantityandtheaveragedegreewiththeincreasing

ofobjects’ageintheNetﬂixdataset.Panels (a)and(b)aretheobjects’the

quan-tityandtheaveragedegreeevolving,respectively,inthewholedatasetofNetﬂix.

Panels (c)and(d)aretheevolvinganalysisintheprobesetofNetﬂix.Thesame

analysisofMovielensdatasetisshowninSI.

4.3.Theanalysisofdatasetsdividedbyrandomandtime

The main task of recommendersystem is to predict a future linkofauserbasedonahistoricalratinglink,butwe usedto di-videthedata setintotwoparts byrandom thatcausedthelinks occurredat thelatest time as the training data, andwe use the linksoccurredat thelatest time to predict the linksoccurredat thepasttime.Inordertorevealtheeffectofusers’online behav-iorpatternson recommendersystem,we divided thedataset by time.The recordsthat occurred atthelatest time are settesting data,andothersastrainingdata.Theratiooftrainingsetincreases from10% to 90%.We use three differentrecommendation meth-ods such as user based collaborative ﬁltering(UCF), global rank method(GRM)andmassdiffusionalgorithm(MD)on thedifferent sizes oftraining data set divided by randomand temporal infor-mation.The fourmaterials valueson Netﬂixare shownin Fig. 3. TheresultoffourmaterialsvaluesonMovielensareshowninSI.

From Fig. 4,we can seethat differentrecommendation meth-odspresentthesametrendwiththeincreaseoftheRatio whether division data set by random or by temporal information. The Ranking Score of all methods are getting worse when the data setdividedbytemporalinformation,itindicates thatthenumber of young links is more than that of old links, and those young linksare notcorrect recommendation. Similarly, the P recision of allmethods are worse,because some newlinksare usedto pre-dicttheoldlinksintheconditionofdivisiondatasetbytemporal information.Finally,we average every material value indifferent ratiooftrainingdatasets,theresultisshowninTable 1.

From Table 1, we can seethat the all metrics almost become worse both in Netﬂix and Movielens except diversity when the datasetdividedbytemporalinformation.Speciﬁcally,comparedto therandom datadivision case, therankingscore R S isincreased by74

.

09%, 50

.

70%, 49

.

27% inNetﬂixand 35

.

43%, 30

.

67%, 31

.

98% inMovielensaccording to threedifferent methods. The precision P ,popularity I andhammingdistance D ofdifferentmethodsare alsoshowninTable 1.The resultsofprecision P in Table 1 con-ﬁrm our ﬁnding that the temporal data division could result in a lower recommendationaccuracy. Compared withthe precision in the random data division case, P in the temporal data divi-sionisdecreasedby27

.

10%,35

.

82%,45

.

25% inNetﬂixand32

.

65%, 38

.

64%,46

.

76% inMovielens. Theseresultsindicatethattemporal data division mechanism can indeed reduce the general recom-mendationaccuracy.

(5)

Table 1

Theaveragevalueofallmetricsindifferentratiooftrainingdataset.

GRM UCF MD

Random Time Random Time Random Time

Netﬂix R S 0.108 0.186 0.121 0.183 0.116 0.173 P 0.165 0.107 0.170 0.118 0.175 0.119 I 290.0 290.9 250.7 271.0 242.6 265.2 D 0.203 0.355 0.590 0.542 0.665 0.616 Movielens R S 0.158 0.201 0.146 0.198 0.135 0.196 P 0.282 0.190 0.340 0.209 0.372 0.198 I 204.2 205.7 187.3 189.4 180.1 183.1 D 0.300 0.565 0.625 0.691 0.7154 0.739

HerewesettherecommendationlistL

=

10.

Fig. 4. Theperformanceofthreeclassicrecommendationalgorithms,globalranking

method(GRM),user-basedcollaborativeﬁltering(UCF)andmassdiffusionmethod

(MD),bothonthedatasetdividedrandomlyandbytimesequence.Fourkey

cri-terionsofrecommendersystems,rankingscore,precision,popularityanddiversity,

areemployedinthisﬁgure.Thehorizontalaxisistheratiooftrainingset,andthe

lengthoftherecommendationlistisL=10.

Throughtheseriesofanalysis,wecanseethatrecommendation algorithmcanachievebetterrecommendationresultsinthecaseof divisiondatasetsbyrandom,butwhenweconsiderthetemporal information,theperformanceofthealgorithmhasdropped,sowe need to presenta newalgorithm that can achieve better results whenthedatasetsaredividedbytemporalinformation.

4.4.AlgorithmofADmodel

The basicidea ofouralgorithm is: ﬁrstly, we setunique

θ

to every target useraccording totheir ages.The olderusers in real systemsshow asmall

θ

whilethe youngerusersare withhigher

θ

.Second,weranktheobjectaccordingtotheiragesanddegrees. Weput theobjectswhich ageissmaller anddegree isbigger on theforefrontofthequeueL1.Inthenext,webuildanotherqueue L2 bymassdiffusionalgorithm.So,

θ

istheparametertoadjustthe twolists.Actually,theyoungerusers don’thavemuchexperience in exploring new objects, they are more likely to conservatively chooseobjectsinthegroupwhichthey arealreadyfamiliarwith. Onthecontrary,the olderuserinrealsystemsinclinesto search andtryunpopularobjects.So,ifthetargetuserisolder,theweight ofL1 is morethan L2, itmeansthat weprefer torecommendto theiryoungerandunpopularobjects.

TheframeworkofourADalgorithmforrecommendersystemis asfollows.

Algorithm AD_RS (ADforrecommendersystem) Input: A: Theadjacencymatrixofnetwork; Output: Score: Thescorematrix;

Begin

Step 1: We setunique

θ

to everytarget useraccordingto for-mula(5):

θ =

max

(

U ser Age

) −

T ragetU ser Age

max

(

U ser Age

) −

min

(

U ser Age

)

(5)

Step 2: We set Age O bjecti is the age of the Oi and Degree O bjecti isthedegree of Oi.Wecalculatethe valueof ev-eryobjectby formula(6)andsorttheirvaluefromsmalltolarge. WesetthelistasL1.Next,weusetherankingreplacetothe rec-ommendedvalueintheL1.

value

=

Age O bjecti Degree O bjecti

(6)

Step 3: We getarecommendedlistbyMassDiffusionalgorithm. WesetthelistasL2 andsorttheirvaluefromlargetosmall.Next, we usethe ranking replace to the recommendation value in the L2.

Step 4: We generateanewrecommendedlisttotargetuserby formula(7)andusethesimilarit y asthescorematrix.

similarit y

= (

1

− θ) ∗

log2L1

+ θ ∗

log2L2 (7)

Step 5: We sortthesimilarityfromsmalltolarge,S isthe rec-ommendedlistforalluser.

End 5. Results

In this section, we empirically demonstrate the effectiveness of AD algorithm on two real world networks. We also compare itsperformanceagainstthetraditionalrecommendationalgorithms suchastheglobalrankmethod(GRM),userbasedcollaborative ﬁl-tering (UCF), massdiffusion(MD), tensorfactorizationmodel(TF), time-weighted network model (TWN). We focus on the ranking score, precision,popularityand diversityofthe algorithm.In our experiments,wesettheratiooftrainingsetincreasesfrom10%to 90%.Theresultofdifferentrecommendationmethodsondifferent sizesofthetrainingdatasetareshowninFig. 5.Wecalculatethe averagevalueofallmetrics,theresultisshowninTable 2.

Accuracy is always the first consideration inevaluating a rec-ommendationalgorithm’sperformance.Comparingtheresultfrom the six methods, we can see that the AD outperformsthe other five algorithms on ranking score andthe average value of preci-sion of the AD outperforms other three on the datasets Netflix andMovielens. For instance,compared withthat ofTF,the aver-ageranking score R S canbereducedby 12.12%forNetflix,2.11% for Movielens. Whenthis comparisonis againstGRM, the reduc-tions arefurtherenlarged,to22.61%fortheNetflix,7.70%forthe Movielens.Besides,comparedwiththatofTF,theaveragevalueof precision P oftheADcanbeincreasedby4.20%forNetflix,3.61% for Movielens. Whenit compared withthat of GRM,the average

(6)

Table 2

Theaveragevalueofallmetricsindifferentratiooftrainingdataset.

GRM UCF MD TF TWN AD Netﬂix R S 0.188 0.183 0.174 0.165 0.156 0.145 P 0.107 0.118 0.118 0.119 0.122 0.124 I 290.9 271.0 265.2 262.5 259.6 256.2 D 0.355 0.542 0.616 0.606 0.618 0.626 Movielens R S 0.201 0.198 0.196 0.190 0.188 0.186 P 0.190 0.201 0.198 0.194 0.196 0.201 I 201.7 189.4 183.1 182.7 179.9 176.1 D 0.565 0.691 0.756 0.638 0.649 0.667

HerewesettherecommendationlistL=10.

Fig. 5. Comparisonsofaverage RankingScore,Precision, PopularityandDiversity

betweenthedifferentalgorithmsindifferentsizesoftrainingsetontheNetﬂix.

HerewesettherecommendationlistL=10.

value ofprecision oftheAD can beincreasedby 16.39%for Net-ﬂix, 5.78% for Movielens. The same effect is compared with the restofthetwoalgorithms.Generally,theADcanprovidethebest orverynearlythebestaccuracy,asmeasuredbytherankingscore andprecision.

Besidesaccuracy,popularityI anddiversityD aretwoother im-portantmetrics.Fromtheresult,wecanseethattheaveragevalue of popularityof the AD outperforms other five on different data sets, specifically, comparedwith that ofTWN, the average popu-larity I can be reducedby 1.31%forNetflix, 2.11%forMovielens. WhenitcomparedwiththatofGRM,theaveragepopularityI can be reducedby 11.91% for Netflix, 12.71% for Movielens. Besides, theaveragevalueofdiversityoftheADoutperformsotherfiveon theNetflix.

6. Conclusion

Withthe large amountof networkdata become readily avail-able in electric form today, recommender system has become a popular subarea in data mining. The aim of the traditional rec-ommendationmethods is toimprove theaccuracy ofthe recom-mendation,they all ignorethe division of thedata sets whichis the basicproblem ofthe recommendation. In thispaper, we an-alyzedthe distinctiveness betweenthe randomdata division and temporal data division. We ﬁnd that the distribution of the ob-jects’degree correlationisnon-linear,that meansthat thereexist that some objects’degreeare smallerinthe trainingset but big-gerinthe probeset,andsome are bigger inthe training setbut smallerintheprobeset.Evenmore,someobjects’degreearezero

inthe training set, butnon-zero in the probe set. Then we ﬁnd theeffectofmanymethods gettingworse whenthedatasets are dividedbytemporalinformation.Sowepresentanew recommen-dationmodel inwhich we setweight to each userbasedon the temporalinformationandconsideredthedegreeandtemporal in-formationoftheobjects.Ourexperimentalresultsshowthatitcan obtainhigherqualityresultsonthebipartitenetworks.

Acknowledgements

This research was supported in part by the Chinese National NaturalScienceFoundationundergrantNos.61602202,61379066, Natural Science Foundation of Jiangsu Province under contracts BK20160428,BK20161302,533talentprojectinHuaian.Xiao-Long Ren,MatusMedo,AnZengalsosupportedthiswork.

Supplementary material

Supplementary content related to this article has been pub-lishedonlineathttps://doi.org/10.1016/j.physleta.2017.11.027. References

[1]R.N.Lichtenwalter,J.T.Lussier,N.V.Chawla,Newperspectivesandmethodsin linkprediction,in:Proceedingsofthe16thACMSIGKDDInternational Confer-enceonKnowledgeDiscoveryandDataMining,ACM,2010,pp. 243–252.

[2]L.Lü,T.Zhou,Linkpredictionincomplexnetworks:asurvey,Phys. A,Stat. Mech.Appl.390 (6)(2011)1150–1170.

[3]J.Liu,Y.Dang,Z.Wang,et al.,Relationshipbetweenthein-degreeand out-degreeofWWW,Phys.A,Stat.Mech.Appl.371 (2)(2006)861–869.

[4]R. Guimera, M. Sales-Pardo, Missingand spuriousinteractions and the re-construction of complex networks, Proc. Natl. Acad. Sci. 106 (52) (2009) 22073–22078.

[5]A.Papadimitriou,P.Symeonidis,Y.Manolopoulos,Fastandaccuratelink pre-dictioninsocialnetworkingsystems,J.Syst.Softw.85 (9)(2012)2119–2132.

[6]T.Hossmann,G. Nomikos, T.Spyropoulos, et al., Collectionand analysis of multi-dimensionalnetworkdataforopportunisticnetworkingresearch, Com-put.Commun.35 (13)(2012)1613–1625.

[7]K.Jahanbakhsh,V.King,G.C.Shoja,Predictingmissingcontactsinmobilesocial networks,PervasiveMob.Comput.8 (5)(2012)698–716.

[8]Y. Sun, R. Barber, M. Gupta, et al., Co-author relationship prediction in heterogeneousbibliographicnetworks,in: 2011International Conferenceon Advances in Social Networks Analysis and Mining, ASONAM, IEEE, 2011, pp. 121–128.

[9]X.Li,H.Chen,Recommendationaslinkpredictioninbipartitegraphs:agraph kernel-basedmachinelearning approach,Decis.SupportSyst.54 (2)(2013) 880–890.

[10]Z.Huang,D.K.J.Lin,Thetime-serieslinkpredictionproblemwithapplications incommunicationsurveillance,INFORMSJ.Comput.21 (2)(2009)286–303.

[11]H.Liu,Uncoveringthenetworkevolutionmechanismbylinkprediction,Sci. Sin.Phys.Mech.Astron.41(2011)816.

[12]D. Lin, An information-theoretic deﬁnition of similarity, ICML 98 (1998) 296–304.

[13]D.Goldberg,D.Nichols,B.M.Oki,etal.,Usingcollaborativeﬁlteringtoweave aninformationtapestry,Commun.ACM35 (12)(1992)61–70.

[14]J.B. Schafer, D. Frankowski, J. Herlocker, et al., Collaborative Filtering Rec-ommender Systems. The Adaptive Web, Springer, Berlin–Heidelberg, 2007, pp. 291–324.

[15]J.Li,Y.Xu,Y.Wang,etal.,Strongestassociationrulesminingforeﬃcient ap-plications,in:2007InternationalConferenceonServiceSystemsandService Management,IEEE,2007,pp. 1–6.

(7)

[16]F.H.Wang,S.Y.Jian,Aneffectivecontent-basedrecommendationmethodfor Webbrowsingbasedonkeywordcontextmatching,J.Inform.Electron.1 (2) (2006)49–59.

[17]C.Wartena,W.Slakhorst,M.Wibbels,Selectingkeywordsfor contentbased recommendation,in:Proceedingsofthe19thACMInternationalConferenceon InformationandKnowledgeManagement,ACM,2010,pp. 1533–1536.

[18]T.Zhou,J.Ren,M.Medo,etal.,Bipartitenetworkprojectionandpersonal rec-ommendation,Phys.Rev.E76 (4)(2007)046115.

[19]Y.C.Zhang,M.Blattner,Y.K.Yu,Heatconductionprocessoncommunity net-worksasarecommendationmodel,Phys.Rev.Lett.99 (15)(2007)154301.

[20]T. Zhou,Z.Kuscsik, J.G. Liu,et al., Solvingthe apparent diversity–accuracy dilemma of recommender systems, Proc. Natl. Acad. Sci. 107 (10) (2010) 4511–4515.

[21]J.Lu,D.Wu,M.Mao,etal.,Recommendersystemapplicationdevelopments:a survey,Decis.SupportSyst.74(2015)12–32.

[22]R.Yera,MartinezL.Fuzzy,Toolsinrecommendersystems:asurvey,Int.J. Com-put.Intell.Syst.10 (1)(2017)776–803.

[23]J. Bobadilla, F. Ortega, A. Hernando, et al., Recommender systems survey, Knowl.-BasedSyst.46(2013)109–132.

[24]H.Ma,T.C.Zhou,M.R.Lyu,etal.,Improvingrecommendersystemsby incorpo-ratingsocialcontextualinformation,ACMTrans.Inf.Syst.29 (2)(2011)9.

[25]Z.K.Zhang,C.Liu,Y.C.Zhang,etal.,Solvingthecold-startproblemin recom-mendersystemswithsocialtags,Europhys.Lett.92 (2)(2010)28002.

[26]X.Ren,L.Lü,R-R.Liu,et al.,Avoidingcongestioninrecommendersystems, NewJ.Phys.16 (6)(2014)63057.

[27]P.Campos,G.Diezf,I.Cantador,Time-aware recommendersystems: a com-prehensivesurveyandanalysisofexistingevaluationprotocols,UserModel. User-Adapt.Interact.24 (1–2)(2014)67–119.

[28]J.Vinagre,A.M.Jorge, J.Gama,An overviewonthe exploitationoftimein collaborativeﬁltering,WileyInterdiscip.Rev.Data Min.Knowl. Discov.5 (5) (2015)195–215.

[29]Y.Koren,Collaborativeﬁlteringwithtemporaldynamics,Commun.ACM53 (4) (2010)89–97.

[30]L. Xiong, X.Chen, T.K. Huang, et al., Temporal collaborative ﬁltering with bayesianprobabilistictensorfactorization,SDM10(2010)211–222.

[31]Q.Guo, W.J.Song, L.Hou, et al., Effect ofthe time windowon the heat-conductioninformationﬁlteringmodel,Phys.A,Stat.Mech.Appl.401(2014) 15–21.

[32]F.Xie, Z.Chen,J.Shang,etal.,A Linkpredictionapproachforitem recom-mendationwithcomplexnumber,in:2014IEEE/WIC/ACMInternationalJoint ConferencesonWebIntelligence(WI)andIntelligentAgentTechnologies(IAT), vol. 1,IEEE,2014,pp. 205–212.

[33]C.DeMaio,G.Fenza,M.Gallo,etal.,Socialmediamarketingthrough time-awarecollaborativeﬁltering,in:Concurrencyand Computation:Practiceand Experience,2017.

[34]A.Sara,Y.E.B. ElIdrissi, R.Ajhoun, Timeaware recommendation,in: Infor-mationandCommunicationTechnologyforTheMuslimWorld,ICT4M,2016, pp. 244–247.

[35]H.Liao,A.Zeng,R.Xiao,etal.,Rankingreputationandqualityinonlinerating systems,PLoSONE9 (5)(2014)e97146.

[36]Q.M.Zhang,A.Zeng,M.S.Shang,Extractingtheinformationbackboneinonline system,PLoSONE8 (5)(2013)e62624.

[37]R.Rafeh,A.Bahrehmand,Anadaptiveapproachtodealingwithunstable be-haviourofusersincollaborativeﬁlteringsystems, J. Inf.Sci.38 (3) (2012) 205–221.