• Aucun résultat trouvé

Information filtering in evolving online networks

N/A
N/A
Protected

Academic year: 2021

Partager "Information filtering in evolving online networks"

Copied!
7
0
0

Texte intégral

(1)

Information

filtering

in

evolving

online

networks

Bo-Lun Chen

a

,

b

,

Fen-Fen Li

a

,

,

Yong-Jun Zhang

a

,

Jia-Lin Ma

a

aCollegeofComputerEngineering,HuaiyinInstituteofTechnology,Huaian223300,China bDepartmentofPhysics,UniversityofFribourg,CheminduMusée3,CH-1700Fribourg,Switzerland

Recommender systems use the records of users’ activities and profiles of both users and products to predict users’ preferences in the future. Considerable works towards recommendation algorithms have been published to solve the problems such as accuracy, diversity, congestion, cold-start, novelty, coverage and so on. However, most of these research did not consider the temporal effects of the information included in the users’ historical data. For example, the segmentation of the training set and test set was completely random, which was entirely different from the real scenario in recommender systems. More seriously, all the objects are treated as the same, regardless of the new, the popular or obsoleted products, so do the users. These data processing methods always lose useful information and mislead the understanding of the system’s state. In this paper, we detailed analyzed the difference of the network structure between the traditional random division method and the temporal division method on two benchmark data sets, Netflix and MovieLens. Then three classical recommendation algorithms, Global Ranking method, Collaborative Filtering and Mass Diffusion method, were employed. The results show that all these algorithms became worse in all four key indicators, ranking score, precision, popularity and diversity, in the temporal scenario. Finally, we design a new recommendation algorithm based on both users’ and objects’ first appearance time in the system. Experimental results showed that the new algorithm can greatly improve the accuracy and other metrics.

1. Introduction

Manynaturalandsocialsystemscanbedescribedbynetworks andgraphs[1,2].Among thesenetworks, manyofthememergea naturalbipartite structure, i.e.a network consistingoftwo sepa-ratednodesetsandlinksbetweenthetwonodesets.Examplesof thiskindofnetworksarescientificcollaborationnetwork[3,4],P2P Internet formedby computer terminalsanddata [5];cooperative network formed by actors andtheir films [6,7]; Shares Network formedbytheinvestorsandcompanies[8,9];theactivitynetwork formedby the membersofthe clubandtheir activities [10]; the audiencewiththesongsnetwork[11];disease-genenetworks[12], andsoon.

Ine-commercialsystems,thebipartitenetworkisanimportant tooltocharacterizetherelationsbetweenusersandobjects.Based on the user-object bipartite network, many recommendation al-gorithmsaredeveloped. Theexisting recommendationalgorithms can be divided into four main categories: collaborative filtering

*

Correspondingauthor.

E-mailaddresses:chenbolun@hyit.edu.cn(B.-L. Chen),lifenfen@hyit.edu.cn

(F.-F. Li).

algorithm [13,14], associationrule-based algorithm [15], content-basedalgorithm[16,17]andresourceallocationalgorithm[18–20]. The main idea of the collaborativefiltering is todo a recom-mendation based on the similarity of nodes, which can be fur-ther classified asuser-based similarityor object-basedsimilarity. They can help customers quickly to find what they want. Lu et al. [21] carried out a research going through the latest applica-tion of the recommender system by classifying the applications intoeighttypes,includinge-groupactivities,e-resourceservices, e-tourism, e-learning, e-library, e-commerce/e-shopping, e-business ande-government.Besides,asummaryisalsomadeaboutrelevant recommendationtechniquesofeach type.This surveycan greatly help the researchers to promote their understandings about the applicationdevelopmentofrecommendersystem.Inaddition,Yera etal.[22] presenteda reviewabouttheapplicationoffuzzytools in this research. According to his research, application of fuzzy toolsis mainlyusedto detectmoreexclusively studiedtopicand also researchgaps andhencetomake propersuggestions forthe subsequentresearchestopromotethefurtherdevelopmentofthe fuzzy-basedrecommendersystem.Particularly,ananalysisismade inhisresearchabouthowtheapplicationareas,employeddataset, evaluationstrategiesandkeyfeaturesoftheThomsonReutersWeb ofSciencehavebeenappliedtorealizesuchanaim.

http://doc.rero.ch

Published in "Physics Letters A 382 (5): 265–271, 2018"

which should be cited to refer to this work.

(2)

Despite its success, collaborative filtering suffers from data sparsity problem [23,24]. The association rule-based recommen-dation algorithm can find the correlations of different products in the sales process and has been successfully applied in many websites.Thismethoddoesnotrequiredomainknowledgeto dis-cover new points of interest. However, the critical bottleneck of the algorithm is that it’s difficult to extract rules effectively be-causethetimecomplexity oftheassociationrule-basedalgorithm is veryhigh, whereas the diversityindexof therecommendation islow.Content-basedrecommendationusesthehistorical informa-tion(suchasratingscores,sharing activities,anddocuments col-lecting records) toconstructuser preferencesfile, thencalculates thesimilaritybetweenrecommendedobjectsanduserpreferences, andfinallyrecommendsthemostsimilarobjectstothetargetuser. The shortcoming of this method is that it is difficult to acquire usefulinformationfromnon-textualdata format,such asimages, music,videosetc.Therefore,thecontent-basedmethodmayfailto find theinterest of the userswithout deviation. Resource alloca-tionmethods aredesignedbyphysicists. Thesemethodscompute theusers’or/andobjects’similaritythroughdiffusionprocesseson user-objectbipartitenetworkandrecommendobjectstothetarget userbasedonthefinal resourcetheobjectsreceived.These phys-ical based algorithms are highly accurate andefficient andcould beusedtoresolvetheapparentaccuracy–diversitydilemmawhen combinedin an eleganthybrid withan mass(resource)diffusion [18] and heat conduction algorithm [20], to cope with the cold startproblemsinrecommendation[25],toavoidcongestioninthe stocksislimited[26],etc.

2. Related work

In recent years, some researchers also tried to integrate the temporalinformationintotherecommendationalgorithms[27,28]. Thesetemporaleffectsincludeengagementofnewusers,seasonal effects, user preference drifts and shifts and other phenomena alike. These phenomena can have a continuous impact over the underlying relations between the items andusers, which is also whatrecommendationalgorithmsattemptstocapture.

Revamping two dominating collaborative filtering algorithms, Koren et al. [29] presented a model tracking the time changing behaviorsinordertodistillthelonger-termtrendsfromthenoisy patterns. Xiong etal. [30] proposed a Bayesian probabilistic ten-sorfactorization(TF)modelbasedonthe continuoustime. Liuet al. [31] studied the effect of temporal information on the heat-conductionalgorithmby graduallyexpandingtimewindow.Liuet al.[32]found thatboth temporalattenuationanddiversiondelay playkey rolesinrecommendersystemwhenlinkweightistaken intoconsideration.Thentheycombinedthesetwotemporalfactors withusers’lifespanstoconstructatime-weightednetwork(TWN) modelandresourceallocationprocesswasapplied.

There are several other examples of algorithms that use time ascontextavailableintheliteratureoncontext-aware recommen-dation. Forinstance,Maioetaletal.[33] promotedthat the time-awarecollaborative filteringcanbe adoptedto evaluatewhatthe users mighthold interest onwhen they browse through Twitter. Text analysisservice isadopted inthis approach toannotate the contentof Tweetssemanticallytotrackthenotions basedon the frequenciesofbeingpostedandforwardedalongthetime.Saraet al.[34] madea studytogetthecontextualrecommendation cus-tomizedthroughatime-awarenesssystem.

Even though many recommendationalgorithms hadbeen de-signed and in some of them the temporal information were ap-pliedtoimprovetherecommendationaccuracy,oneimportant is-sue is still being serious overlooked: some of temporal methods are done based on randomly divided training set and probe set [18,19].Themaintaskofrecommendersystemistopredictfuture

linksofeachtargetuserbasedonhistoricalratingrecords[31,32, 35–37],inwhichthewholedatasetisdividedintoa trainingset andatest set andonlythe linksintraining set isknown before recommendation.However,inmostoftheprevious studies,asfar asweknow, thetest dataset issampledrandomly without con-siderthetemporalorderofthelinks,leadingtoalogicaldisorder in the recommendation process, i.e., predicting the past links in testsetonthe basisofthefuture linksintraining set.Therefore, therealperformance of theexisting recommendationalgorithms, suchasthefamousGlobalRankingMethod(GRM)[18],User-based CollaborativeFiltering(UCF), andMass Diffusionmethod(MD), is actuallynotyetfullyunderstood.Inthispaper,wereexaminesome representativerecommendation algorithms withthe temporal di-vision method,i.e. the observed linksand unknown future links are divided strictly based on their temporal order. We find that mostrecommendationalgorithmshavemuchloweraccuracywhen theyareappliedtotemporaldivisiontrainingsetandtestset,than randomdivisionmethod.Aftercarefulanalysis,wefindthisis be-causethere are more novel objects, with a little historical links intraining set,needtobe recommendedintemporaldivisionset, comparingwiththerandomdivisionset.

Additionally, to solve thisproblem, we presenta new recom-mendationalgorithmtoimprovetherecommendationaccuracyby consideringthe firstappearance time ofthe objectsandusers in system.

3. Problem formulation and evaluation methods

A bipartite network model gives us a clear vision when we dealwiththe probleminvolving two differenttypesof nodes.In networkbased recommendationalgorithm,data sets usually rep-resentedby anundirectedbipartitenetwork G

(

U

,

O

,

E

)

,whereU isthesetofusers, O isthesetofobjectsandE isthesetoflinks between them. The links between uses and objects can also be representedbyan adjacentmatrix A|U|×|O|

= {

aiα

}

,whereaiα

=

1 ifuserui selectedobjectoα andaiα

=

0 otherwise.

The purpose of a recommendation algorithm is to assign a score, Score

(

i

,

α

)

, to each unconnected pair of user and object

(

i

,

α

)

.Thisscorevaluemaybeunderstoodasakindofproximity, whichassociatedwiththeirconnection probability.Thatistosay, fora pairof unconnected nodes

(

i

,

α

)

,the larger the Score

(

i

,

α

)

is,thehighertheprobability therewillexist alinkbetweenuser i and object

α

. In most of the previous studies, the links in E arerandomly divided intotwo subsets: training set ET, whichis treated as known information, and probe set EP, which is un-knownbefore recommendation, to test therecommendation per-formanceofanalgorithm,e.g.,accuracyanddiversity.It’sclearthat ET

EP

=

E andET

EP

= ∅

.

In principle, a recommendation algorithm will provides each targetuserwithandescendingorderedlistofobjectsthatthe tar-getuserdidnot chosenbefore.Foreachrecommendedobjectsin thelist,ifthereexistalinkbetweentheobjectandtargetuserin probeset,wecallitahitting.Notethat,onlytheuserwhohaveat leastoneselectedobjectsintrainingsetandatleastoneconnected objectinprobesetwillbeconsideredwhencomputingalgorithms’ performance.Inthiswork,fourpopularusedmetricsareemployed totesttheeffectivenessofthealgorithms,RankingScore,P recision, P opularit y andDiversit y.

RankingScore (R S)isoneofthemostimportantaccuracy met-rics which measures the ability of recommendation algorithm to assign more score to users’ preferable objects than irrelevant ones.Consideringatargetuseri anditsrecommendationlist,the ranking score of each selected object

α

in probe set is R Si

α

=

rankα

/(|

O

|

kT

i

)

whererankα isthe rankofthe object

α

inthe descendingordered recommendationlist ofthetarget user i and

(3)

Fig. 1. An illustration about the calculation of the four metrics.

kTi is the degree of target user i in training set. Thus, the inte-grated ranking scoreof the wholesystem is theaverage ranking scoreoverallusersandobjectspairsinprobeset,reads

R S

=

1

|

EP

|



(i,α)∈EP

R Si

α

(1)

Notethat,theranksoftheobjectswithsamescoresarethesame andequaltotheirmedian. Itcanclearthatpredictionresultwith higheraccuracywillgethigherrankingscore.

Giventherankingofthenon-observedlinks,thePrecision (P )is definedastheratioofrelevantobjectsselectedtothetotalnumber of objects selected. That is to say, if we take the T op

L links asthe predicted ones, amongwhich hi links are right, then the Precision canbeexpressedas:

P

=

1 n n



i=1 hi L (2)

Clearly,higherprecisionmeanshigherpredictionaccuracy. The metric Popularity (I) measures the average degree ofthe objects in the recommendation list. It is hard for the users to findtherelevantbutunpopular objects.Therefore,a good recom-mendersystemshouldprefertorecommendsmalldegreeobjects. ThemetricPopularity (I)canbeexpressedas:

Ii

(

L

) =

1 L



αOi (3)

whereOi representstherecommendationlistforuseri, repre-sentsthedegreeofthe object

α

.Alow meanpopularity I

(

L

)

for thewhole system indicates a highnovel andunexpected recom-mendationofobjects.

TheDiversity (D)mainlyconsidershowusers’recommendation lists are different from each other. Here, we measure it by the Hammingdistance.We denote Ci j

(

L

)

as the numberof common objectsinthetop-L placeoftherecommendationlistofuseri and

j,theirhammingdistancecanbecalculatedas: Di j

(

L

) =

1

Ci j

(

L

)

L

.

(4)

Di j

(

L

)

is between0 and 1, whichare respectively corresponding tothecaseswhereuseri and j havethesameoranentirely dif-ferent recommendationlist.By averaging Di j

(

L

)

over all pairs of users,weobtainthemeanhammingdistance D

(

L

)

.Themorethe recommendationlistdiffersfromeach other,thehigherthe D

(

L

)

is.

Fig. 1 gives an example ofhow to calculate the four metrics. Fig. 1(a)showsacompletenetworkstructurewhichincludesfour users and five objects. In the whole graph, we can see eleven existent links andnine nonexistent links. Totest thealgorithm’s accuracy,weneedtochoosesomeexistentlinksasprobeset.

For instance, we select

(

U1

,

O3

)

,

(

U2

,

O5

)

,

(

U3

,

O3

)

and

(

U4

,

O4

)

as probe links, which are presented by dotted lines

in Fig. 1(b). Then, the remaining seven links constitute the training set. An algorithm can only make use of the informa-tion contained in the training set (presented by solid lines in Fig. 1(b)).Ifan algorithmassigns scoresofall non-observedlinks as Score

(

U1

,

O2

)

=

0

.

4,Score

(

U1

,

O3

)

=

0

.

8, Score

(

U1

,

O4

)

=

0

.

6,

Score

(

U2

,

O1

)

=

0

.

4, Score

(

U2

,

O3

)

=

0

.

8, Score

(

U2

,

O5

)

=

0

.

7,

Score

(

U3

,

O2

)

=

0

.

4, Score

(

U3

,

O3

)

=

0

.

8, Score

(

U3

,

O4

)

=

0

.

6,

Score

(

U4

,

O1

)

=

0

.

4, Score

(

U4

,

O2

)

=

0

.

5, Score

(

U4

,

O4

)

=

0

.

6,

Score

(

U4

,

O5

)

=

0

.

7.Then, weshould sortthescores in

descend-ingorderforeachtargetuser.

TocalculateR S,theR S valueofU1equals1

/(

5

2

)

=

1

/

3,the

R S valueofU2equals2

/(

5

2

)

=

2

/

3,theR S valueofU3 equals

1

/(

5

2

)

=

1

/

3,theR S valueofU4equals2

/(

5

1

)

=

1

/

2.Hence,

the R S valueofthisalgorithmequals

(

1

/

3

+

2

/

3

+

1

/

3

+

1

/

2

)/

4

=

11

/

24.

To calculate P , here, we set L as 2.Then, the P valueof U1

equals1/2,the P valueofU2 equals1/2,theP valueofU3 equals

1/2,the P valueofU4 equals 0.Hence,the P valueofthis

algo-rithmequals

(

1

/

2

+

1

/

2

+

1

/

2

+

1

/

2

)/

4

=

1

/

2.

To calculate D, here, we set L as 3. Then, the D value of

(

U1

,

U2

)

equals 1

1

/

3

=

2

/

3, the D value of

(

U1

,

U3

)

equals

1

3

/

3

=

0, the D value of

(

U1

,

U4

)

equals 1

2

/

3

=

1

/

3, the

D valueof

(

U2

,

U3

)

equals1

1

/

3

=

2

/

3,theD valueof

(

U2

,

U4

)

equals1

1

/

3

=

2

/

3,theD valueof

(

U3

,

U4

)

equals1

2

/

3

=

1

/

3.

Hence,theD valueofthisalgorithmequals

(

2

/

3

+

0

+

1

/

3

+

2

/

3

+

1

+

1

/

3

)/

6

=

1

/

2. 4. Data and methods

4.1. Dataandthecorrelationofobjects’degree

Inthispaper,we usetwostandarddatasets whichhavebeen widely used to examine the performance of recommendation al-gorithms [27,28]. The first one is the Netflix with 1891 movies (objects)and2294users(http://www.Netflixprize.com/).Usersrate movies from1 (worst)to 5(best). Consistentwiththeliterature, we consider the ratings higher than 2 as a link. Finally, 59464 linksremaininthenetwork.ThesecondoneistheMovielensdata which is a random sample ofthe whole recordsof users ratings during the seven-month period from 19 September 1997 to 22 April 1998 inMovielens.com (http://www.grouplens.org/). It con-sists of 943 users, 1682 movies, and 100000 links. Like Netflix, Movielensisalsobasedona5-starratingsystems.Withthesame ratingfilteringprocessasNetflix,weobtain82520linksin Movie-lensdata.

Becausethelinksoftheobjectsmaybothappearinthetraining setandinthetestset,soweusethenumberoftheobjectslinks occurredinthetrainingsetastheabscissaandthenumberofthe objects linksoccurredinthe test setasthe ordinate.We set the ratiooftrainingsetis90%anddrawthecorrelationoftheobjects’ degree inthe dataset bothdivided by randomandby time. The resultofNetflixisshowninFig. 1,theresultofMovielensisshown inSI.

FormFig. 2,wecanseethatN P and N T almostshowalinear relationship.Itisbecausethatthelinksareselectedastrainingset andprobesetbyequalprobability.Butintheconditionofdivision data sets by temporal information,the distribution of points are more discrete.There exist that some objects’degreesare smaller inthe training setbutbiggerin theprobeset,andsome objects’

(4)

Fig. 2. Theanalysisoftheobjects’andusers’degreeintrainingsetandprobeset,

theNetflixcase.Thescatterplotsinpanels (a)and(b)aretheobjects’sdegreein

trainingsetversusprobesetwhenthedatasetisdividedrandomlyand

accord-ingtemporalinformation,respectively.Theratiooftrainingsetis0.9.Thehighlight

pointsin(b)arethecold-startobjectsandobsoleteobjects,whichareindispensable

inrealrecommendersystem.Panel(c)isthecorrelationofobjects’degreein

train-ingsetandprobesetversusthechangesoftheratiooftrainingset.Thegreenline

andredlinecorrespondtothesetsthatisdividedrandomlyanddividedaccording

thetemporalsequence,respectively.Panel(d)isthesameas(c),butforthe

corre-lationofusers’degree.(Forinterpretationofthereferencestocolorinthisfigure

legend,thereaderisreferredtothewebversionofthisarticle.)

degreesarebiggerinthetrainingsetbutsmallerintheprobeset. Evenmore,someobjects’degreesarezerointhetrainingset,but non-zero in the probe set that is the cold start problem. Those objectsarehighlightedintheFig. 2.WealsocanseethatC

Item andC

U ser almostnear1inthedatasetsdividedbyrandomas thechangingofthetrainingset’sratio.

4.2. Theageoftheobject

Wefurthermovetostudytheattributivecharacteroftheobject andgivethefollowingdefinesfirstly.

Define 1. The age of the object(TA O):we supposed that the objectreceivesthefirstlinkisthebirthday,wesetitTB O.Weset the endingtime of thedataset is TE,so the ageof theobject is TA O

=

TE

TB O.

Define 2. The ageoftheobject(TAU):wesupposedthattheuser receives the first link is the birthday, we set it TBU. We set the endingtime ofthedatasetis TE,sothe ageoftheuseris TAU

=

TE

TBU.

IntheNetflix,wesettheageintervalofobjects



T

=

100 and in this way the whole objects could be divided into 22 groups. Analogously,weassumethatageinterval



T =100andinthisway thewhole objectscould be dividedinto 22groupsinthe Movie-lens.Wecountthenumberofobjectsineveryage’srangebothin the whole datasets but inthe probe sets.In the probe sets,we consideredtherandomdivisionofdatasetsanddatasetsdivided by the temporalinformation inboth cases. The resultsofNetflix andMovielensareshowninFig. 2andSI,respectively.

Form Fig. 3, we can see that in the Netflix, the trend ofthe number in the probe sets divided by random is similar to the numberinthewholedatasetsandthenumberofyoung objects’ agebasedondivisiondatasetsbytemporalinformationaremore thanthenumberofyoungobjects’agebasedondivisiondatasets by random. Incontrast,the numberof old objects’age basedon division data sets by temporal information are smaller than the numberofoldobjects’agebasedondivisiondatasetsbyrandom. It indicatesthat we canget differentprobesets accordingto dif-ferentdivisionways.TheresultofMovielensisshowninSI.

Fig. 3. Theevolvingofobjects’quantityandtheaveragedegreewiththeincreasing

ofobjects’ageintheNetflixdataset.Panels (a)and(b)aretheobjects’the

quan-tityandtheaveragedegreeevolving,respectively,inthewholedatasetofNetflix.

Panels (c)and(d)aretheevolvinganalysisintheprobesetofNetflix.Thesame

analysisofMovielensdatasetisshowninSI.

4.3.Theanalysisofdatasetsdividedbyrandomandtime

The main task of recommendersystem is to predict a future linkofauserbasedonahistoricalratinglink,butwe usedto di-videthedata setintotwoparts byrandom thatcausedthelinks occurredat thelatest time as the training data, andwe use the linksoccurredat thelatest time to predict the linksoccurredat thepasttime.Inordertorevealtheeffectofusers’online behav-iorpatternson recommendersystem,we divided thedataset by time.The recordsthat occurred atthelatest time are settesting data,andothersastrainingdata.Theratiooftrainingsetincreases from10% to 90%.We use three differentrecommendation meth-ods such as user based collaborative filtering(UCF), global rank method(GRM)andmassdiffusionalgorithm(MD)on thedifferent sizes oftraining data set divided by randomand temporal infor-mation.The fourmaterials valueson Netflixare shownin Fig. 3. TheresultoffourmaterialsvaluesonMovielensareshowninSI.

From Fig. 4,we can seethat differentrecommendation meth-odspresentthesametrendwiththeincreaseoftheRatio whether division data set by random or by temporal information. The Ranking Score of all methods are getting worse when the data setdividedbytemporalinformation,itindicates thatthenumber of young links is more than that of old links, and those young linksare notcorrect recommendation. Similarly, the P recision of allmethods are worse,because some newlinksare usedto pre-dicttheoldlinksintheconditionofdivisiondatasetbytemporal information.Finally,we average every material value indifferent ratiooftrainingdatasets,theresultisshowninTable 1.

From Table 1, we can seethat the all metrics almost become worse both in Netflix and Movielens except diversity when the datasetdividedbytemporalinformation.Specifically,comparedto therandom datadivision case, therankingscore R S isincreased by74

.

09%, 50

.

70%, 49

.

27% inNetflixand 35

.

43%, 30

.

67%, 31

.

98% inMovielensaccording to threedifferent methods. The precision P ,popularity I andhammingdistance D ofdifferentmethodsare alsoshowninTable 1.The resultsofprecision P in Table 1 con-firm our finding that the temporal data division could result in a lower recommendationaccuracy. Compared withthe precision in the random data division case, P in the temporal data divi-sionisdecreasedby27

.

10%,35

.

82%,45

.

25% inNetflixand32

.

65%, 38

.

64%,46

.

76% inMovielens. Theseresultsindicatethattemporal data division mechanism can indeed reduce the general recom-mendationaccuracy.

(5)

Table 1

Theaveragevalueofallmetricsindifferentratiooftrainingdataset.

GRM UCF MD

Random Time Random Time Random Time

Netflix R S 0.108 0.186 0.121 0.183 0.116 0.173 P 0.165 0.107 0.170 0.118 0.175 0.119 I 290.0 290.9 250.7 271.0 242.6 265.2 D 0.203 0.355 0.590 0.542 0.665 0.616 Movielens R S 0.158 0.201 0.146 0.198 0.135 0.196 P 0.282 0.190 0.340 0.209 0.372 0.198 I 204.2 205.7 187.3 189.4 180.1 183.1 D 0.300 0.565 0.625 0.691 0.7154 0.739

HerewesettherecommendationlistL

=

10.

Fig. 4. Theperformanceofthreeclassicrecommendationalgorithms,globalranking

method(GRM),user-basedcollaborativefiltering(UCF)andmassdiffusionmethod

(MD),bothonthedatasetdividedrandomlyandbytimesequence.Fourkey

cri-terionsofrecommendersystems,rankingscore,precision,popularityanddiversity,

areemployedinthisfigure.Thehorizontalaxisistheratiooftrainingset,andthe

lengthoftherecommendationlistisL=10.

Throughtheseriesofanalysis,wecanseethatrecommendation algorithmcanachievebetterrecommendationresultsinthecaseof divisiondatasetsbyrandom,butwhenweconsiderthetemporal information,theperformanceofthealgorithmhasdropped,sowe need to presenta newalgorithm that can achieve better results whenthedatasetsaredividedbytemporalinformation.

4.4.AlgorithmofADmodel

The basicidea ofouralgorithm is: firstly, we setunique

θ

to every target useraccording totheir ages.The olderusers in real systemsshow asmall

θ

whilethe youngerusersare withhigher

θ

.Second,weranktheobjectaccordingtotheiragesanddegrees. Weput theobjectswhich ageissmaller anddegree isbigger on theforefrontofthequeueL1.Inthenext,webuildanotherqueue L2 bymassdiffusionalgorithm.So,

θ

istheparametertoadjustthe twolists.Actually,theyoungerusers don’thavemuchexperience in exploring new objects, they are more likely to conservatively chooseobjectsinthegroupwhichthey arealreadyfamiliarwith. Onthecontrary,the olderuserinrealsystemsinclinesto search andtryunpopularobjects.So,ifthetargetuserisolder,theweight ofL1 is morethan L2, itmeansthat weprefer torecommendto theiryoungerandunpopularobjects.

TheframeworkofourADalgorithmforrecommendersystemis asfollows.

Algorithm AD_RS (ADforrecommendersystem) Input: A: Theadjacencymatrixofnetwork; Output: Score: Thescorematrix;

Begin

Step 1: We setunique

θ

to everytarget useraccordingto for-mula(5):

θ =

max

(

U ser Age

) −

T ragetU ser Age

max

(

U ser Age

) −

min

(

U ser Age

)

(5)

Step 2: We set Age O bjecti is the age of the Oi and Degree O bjecti isthedegree of Oi.Wecalculatethe valueof ev-eryobjectby formula(6)andsorttheirvaluefromsmalltolarge. WesetthelistasL1.Next,weusetherankingreplacetothe rec-ommendedvalueintheL1.

value

=

Age O bjecti Degree O bjecti

(6)

Step 3: We getarecommendedlistbyMassDiffusionalgorithm. WesetthelistasL2 andsorttheirvaluefromlargetosmall.Next, we usethe ranking replace to the recommendation value in the L2.

Step 4: We generateanewrecommendedlisttotargetuserby formula(7)andusethesimilarit y asthescorematrix.

similarit y

= (

1

− θ) ∗

log2L1

+ θ ∗

log2L2 (7)

Step 5: We sortthesimilarityfromsmalltolarge,S isthe rec-ommendedlistforalluser.

End 5. Results

In this section, we empirically demonstrate the effectiveness of AD algorithm on two real world networks. We also compare itsperformanceagainstthetraditionalrecommendationalgorithms suchastheglobalrankmethod(GRM),userbasedcollaborative fil-tering (UCF), massdiffusion(MD), tensorfactorizationmodel(TF), time-weighted network model (TWN). We focus on the ranking score, precision,popularityand diversityofthe algorithm.In our experiments,wesettheratiooftrainingsetincreasesfrom10%to 90%.Theresultofdifferentrecommendationmethodsondifferent sizesofthetrainingdatasetareshowninFig. 5.Wecalculatethe averagevalueofallmetrics,theresultisshowninTable 2.

Accuracy is always the first consideration inevaluating a rec-ommendationalgorithm’sperformance.Comparingtheresultfrom the six methods, we can see that the AD outperformsthe other five algorithms on ranking score andthe average value of preci-sion of the AD outperforms other three on the datasets Netflix andMovielens. For instance,compared withthat ofTF,the aver-ageranking score R S canbereducedby 12.12%forNetflix,2.11% for Movielens. Whenthis comparisonis againstGRM, the reduc-tions arefurtherenlarged,to22.61%fortheNetflix,7.70%forthe Movielens.Besides,comparedwiththatofTF,theaveragevalueof precision P oftheADcanbeincreasedby4.20%forNetflix,3.61% for Movielens. Whenit compared withthat of GRM,the average

(6)

Table 2

Theaveragevalueofallmetricsindifferentratiooftrainingdataset.

GRM UCF MD TF TWN AD Netflix R S 0.188 0.183 0.174 0.165 0.156 0.145 P 0.107 0.118 0.118 0.119 0.122 0.124 I 290.9 271.0 265.2 262.5 259.6 256.2 D 0.355 0.542 0.616 0.606 0.618 0.626 Movielens R S 0.201 0.198 0.196 0.190 0.188 0.186 P 0.190 0.201 0.198 0.194 0.196 0.201 I 201.7 189.4 183.1 182.7 179.9 176.1 D 0.565 0.691 0.756 0.638 0.649 0.667

HerewesettherecommendationlistL=10.

Fig. 5. Comparisonsofaverage RankingScore,Precision, PopularityandDiversity

betweenthedifferentalgorithmsindifferentsizesoftrainingsetontheNetflix.

HerewesettherecommendationlistL=10.

value ofprecision oftheAD can beincreasedby 16.39%for Net-flix, 5.78% for Movielens. The same effect is compared with the restofthetwoalgorithms.Generally,theADcanprovidethebest orverynearlythebestaccuracy,asmeasuredbytherankingscore andprecision.

Besidesaccuracy,popularityI anddiversityD aretwoother im-portantmetrics.Fromtheresult,wecanseethattheaveragevalue of popularityof the AD outperforms other five on different data sets, specifically, comparedwith that ofTWN, the average popu-larity I can be reducedby 1.31%forNetflix, 2.11%forMovielens. WhenitcomparedwiththatofGRM,theaveragepopularityI can be reducedby 11.91% for Netflix, 12.71% for Movielens. Besides, theaveragevalueofdiversityoftheADoutperformsotherfiveon theNetflix.

6. Conclusion

Withthe large amountof networkdata become readily avail-able in electric form today, recommender system has become a popular subarea in data mining. The aim of the traditional rec-ommendationmethods is toimprove theaccuracy ofthe recom-mendation,they all ignorethe division of thedata sets whichis the basicproblem ofthe recommendation. In thispaper, we an-alyzedthe distinctiveness betweenthe randomdata division and temporal data division. We find that the distribution of the ob-jects’degree correlationisnon-linear,that meansthat thereexist that some objects’degreeare smallerinthe trainingset but big-gerinthe probeset,andsome are bigger inthe training setbut smallerintheprobeset.Evenmore,someobjects’degreearezero

inthe training set, butnon-zero in the probe set. Then we find theeffectofmanymethods gettingworse whenthedatasets are dividedbytemporalinformation.Sowepresentanew recommen-dationmodel inwhich we setweight to each userbasedon the temporalinformationandconsideredthedegreeandtemporal in-formationoftheobjects.Ourexperimentalresultsshowthatitcan obtainhigherqualityresultsonthebipartitenetworks.

Acknowledgements

This research was supported in part by the Chinese National NaturalScienceFoundationundergrantNos.61602202,61379066, Natural Science Foundation of Jiangsu Province under contracts BK20160428,BK20161302,533talentprojectinHuaian.Xiao-Long Ren,MatusMedo,AnZengalsosupportedthiswork.

Supplementary material

Supplementary content related to this article has been pub-lishedonlineathttps://doi.org/10.1016/j.physleta.2017.11.027. References

[1]R.N.Lichtenwalter,J.T.Lussier,N.V.Chawla,Newperspectivesandmethodsin linkprediction,in:Proceedingsofthe16thACMSIGKDDInternational Confer-enceonKnowledgeDiscoveryandDataMining,ACM,2010,pp. 243–252.

[2]L.Lü,T.Zhou,Linkpredictionincomplexnetworks:asurvey,Phys. A,Stat. Mech.Appl.390 (6)(2011)1150–1170.

[3]J.Liu,Y.Dang,Z.Wang,et al.,Relationshipbetweenthein-degreeand out-degreeofWWW,Phys.A,Stat.Mech.Appl.371 (2)(2006)861–869.

[4]R. Guimera, M. Sales-Pardo, Missingand spuriousinteractions and the re-construction of complex networks, Proc. Natl. Acad. Sci. 106 (52) (2009) 22073–22078.

[5]A.Papadimitriou,P.Symeonidis,Y.Manolopoulos,Fastandaccuratelink pre-dictioninsocialnetworkingsystems,J.Syst.Softw.85 (9)(2012)2119–2132.

[6]T.Hossmann,G. Nomikos, T.Spyropoulos, et al., Collectionand analysis of multi-dimensionalnetworkdataforopportunisticnetworkingresearch, Com-put.Commun.35 (13)(2012)1613–1625.

[7]K.Jahanbakhsh,V.King,G.C.Shoja,Predictingmissingcontactsinmobilesocial networks,PervasiveMob.Comput.8 (5)(2012)698–716.

[8]Y. Sun, R. Barber, M. Gupta, et al., Co-author relationship prediction in heterogeneousbibliographicnetworks,in: 2011International Conferenceon Advances in Social Networks Analysis and Mining, ASONAM, IEEE, 2011, pp. 121–128.

[9]X.Li,H.Chen,Recommendationaslinkpredictioninbipartitegraphs:agraph kernel-basedmachinelearning approach,Decis.SupportSyst.54 (2)(2013) 880–890.

[10]Z.Huang,D.K.J.Lin,Thetime-serieslinkpredictionproblemwithapplications incommunicationsurveillance,INFORMSJ.Comput.21 (2)(2009)286–303.

[11]H.Liu,Uncoveringthenetworkevolutionmechanismbylinkprediction,Sci. Sin.Phys.Mech.Astron.41(2011)816.

[12]D. Lin, An information-theoretic definition of similarity, ICML 98 (1998) 296–304.

[13]D.Goldberg,D.Nichols,B.M.Oki,etal.,Usingcollaborativefilteringtoweave aninformationtapestry,Commun.ACM35 (12)(1992)61–70.

[14]J.B. Schafer, D. Frankowski, J. Herlocker, et al., Collaborative Filtering Rec-ommender Systems. The Adaptive Web, Springer, Berlin–Heidelberg, 2007, pp. 291–324.

[15]J.Li,Y.Xu,Y.Wang,etal.,Strongestassociationrulesminingforefficient ap-plications,in:2007InternationalConferenceonServiceSystemsandService Management,IEEE,2007,pp. 1–6.

(7)

[16]F.H.Wang,S.Y.Jian,Aneffectivecontent-basedrecommendationmethodfor Webbrowsingbasedonkeywordcontextmatching,J.Inform.Electron.1 (2) (2006)49–59.

[17]C.Wartena,W.Slakhorst,M.Wibbels,Selectingkeywordsfor contentbased recommendation,in:Proceedingsofthe19thACMInternationalConferenceon InformationandKnowledgeManagement,ACM,2010,pp. 1533–1536.

[18]T.Zhou,J.Ren,M.Medo,etal.,Bipartitenetworkprojectionandpersonal rec-ommendation,Phys.Rev.E76 (4)(2007)046115.

[19]Y.C.Zhang,M.Blattner,Y.K.Yu,Heatconductionprocessoncommunity net-worksasarecommendationmodel,Phys.Rev.Lett.99 (15)(2007)154301.

[20]T. Zhou,Z.Kuscsik, J.G. Liu,et al., Solvingthe apparent diversity–accuracy dilemma of recommender systems, Proc. Natl. Acad. Sci. 107 (10) (2010) 4511–4515.

[21]J.Lu,D.Wu,M.Mao,etal.,Recommendersystemapplicationdevelopments:a survey,Decis.SupportSyst.74(2015)12–32.

[22]R.Yera,MartinezL.Fuzzy,Toolsinrecommendersystems:asurvey,Int.J. Com-put.Intell.Syst.10 (1)(2017)776–803.

[23]J. Bobadilla, F. Ortega, A. Hernando, et al., Recommender systems survey, Knowl.-BasedSyst.46(2013)109–132.

[24]H.Ma,T.C.Zhou,M.R.Lyu,etal.,Improvingrecommendersystemsby incorpo-ratingsocialcontextualinformation,ACMTrans.Inf.Syst.29 (2)(2011)9.

[25]Z.K.Zhang,C.Liu,Y.C.Zhang,etal.,Solvingthecold-startproblemin recom-mendersystemswithsocialtags,Europhys.Lett.92 (2)(2010)28002.

[26]X.Ren,L.Lü,R-R.Liu,et al.,Avoidingcongestioninrecommendersystems, NewJ.Phys.16 (6)(2014)63057.

[27]P.Campos,G.Diezf,I.Cantador,Time-aware recommendersystems: a com-prehensivesurveyandanalysisofexistingevaluationprotocols,UserModel. User-Adapt.Interact.24 (1–2)(2014)67–119.

[28]J.Vinagre,A.M.Jorge, J.Gama,An overviewonthe exploitationoftimein collaborativefiltering,WileyInterdiscip.Rev.Data Min.Knowl. Discov.5 (5) (2015)195–215.

[29]Y.Koren,Collaborativefilteringwithtemporaldynamics,Commun.ACM53 (4) (2010)89–97.

[30]L. Xiong, X.Chen, T.K. Huang, et al., Temporal collaborative filtering with bayesianprobabilistictensorfactorization,SDM10(2010)211–222.

[31]Q.Guo, W.J.Song, L.Hou, et al., Effect ofthe time windowon the heat-conductioninformationfilteringmodel,Phys.A,Stat.Mech.Appl.401(2014) 15–21.

[32]F.Xie, Z.Chen,J.Shang,etal.,A Linkpredictionapproachforitem recom-mendationwithcomplexnumber,in:2014IEEE/WIC/ACMInternationalJoint ConferencesonWebIntelligence(WI)andIntelligentAgentTechnologies(IAT), vol. 1,IEEE,2014,pp. 205–212.

[33]C.DeMaio,G.Fenza,M.Gallo,etal.,Socialmediamarketingthrough time-awarecollaborativefiltering,in:Concurrencyand Computation:Practiceand Experience,2017.

[34]A.Sara,Y.E.B. ElIdrissi, R.Ajhoun, Timeaware recommendation,in: Infor-mationandCommunicationTechnologyforTheMuslimWorld,ICT4M,2016, pp. 244–247.

[35]H.Liao,A.Zeng,R.Xiao,etal.,Rankingreputationandqualityinonlinerating systems,PLoSONE9 (5)(2014)e97146.

[36]Q.M.Zhang,A.Zeng,M.S.Shang,Extractingtheinformationbackboneinonline system,PLoSONE8 (5)(2013)e62624.

[37]R.Rafeh,A.Bahrehmand,Anadaptiveapproachtodealingwithunstable be-haviourofusersincollaborativefilteringsystems, J. Inf.Sci.38 (3) (2012) 205–221.

Figure

Fig. 1. An illustration about the calculation of the four metrics.
Fig. 2. The analysis of the objects’ and users’ degree in training set and probe set, the Netflix case
Fig. 4. The performance of three classic recommendation algorithms, global ranking method (GRM), user-based collaborative filtering (UCF) and mass diffusion method (MD), both on the data set divided randomly and by time sequence
Fig. 5. Comparisons of average Ranking Score, Precision, Popularity and Diversity between the different algorithms in different sizes of training set on the Netflix.

Références

Documents relatifs

In order to simplify the process of converting numbers from the modular representation to the positional representation of numbers, we will consider an approximate method that allows

The cunning feature of the equation is that instead of calculating the number of ways in which a sample including a given taxon could be drawn from the population, which is

Throughout this paper, we shall use the following notations: N denotes the set of the positive integers, π( x ) denotes the number of the prime numbers not exceeding x, and p i

In a Angle Cover by a convex pentagon, if a point out side the pentagon lies in a Great Angle (angle formed by three adjacent vertices of the pentagon, the center one being the

Instrumentation 2x Alto Saxophone 2x Tenor Saxophone Baritone Saxophone 4x Trumpet / Flugelhorn 3x Tenor Trombone Bass Trombone Electric Guitar Bass Guitar

Monomethoxy poly(ethylene oxide) (Mn = 2000 g/mol) was first end-capped by the RAFT agent, by esterification of the hydroxyl end-group with the carboxylic acid function of

the one developed in [2, 3, 1] uses R -filtrations to interpret the arithmetic volume function as the integral of certain level function on the geometric Okounkov body of the

EXPECTATION AND VARIANCE OF THE VOLUME COVERED BY A LARGE NUMBER OF INDEPENDENT RANDOM