
The effect of heterogeneous dynamics of online users on information filtering

Bo-Lun Chen a,b,c, An Zeng d,*, Ling Chen a,b

a Department of Computer Science, Yangzhou University of China, Yangzhou 225127, China
b Department of Computer Science, Nanjing University of Aeronautics and Astronautics of China, Nanjing 210016, China
c Department of Physics, University of Fribourg, Chemin du Musee 3, CH-1700 Fribourg, Switzerland
d School of Systems Science, Beijing Normal University, Beijing 100875, China

The rapid expansion of the Internet requires effective information filtering techniques to extract the most essential and relevant information for online users. Many recommendation algorithms have been proposed to predict the future items that a given user might be interested in. However, one important issue has been largely ignored in related works, namely the heterogeneous dynamics of online users. The interests of active users change more often than those of less active users, which calls for different update frequencies of their recommendation lists. In this paper, we develop a framework to study the effect of the heterogeneous dynamics of users on the recommendation performance. We find that the personalized application of recommendation algorithms results in a remarkable improvement in recommendation accuracy and diversity. Our findings may help online retailers make better use of the existing recommendation methods.

1. Introduction

With the fast development of the World Wide Web, our daily lives depend more and more on the Internet. However, finding the information we need is not a simple problem [1]. The huge number of online items such as movies, books and bookmarks makes it impossible for anyone to go over every item and find their favorites. Many approaches such as collaborative filtering [2,3], matrix factorization [4–6] and resource diffusion [7–9] have been intensively investigated recently. This so-called information filtering problem attracts researchers from computer science [10,11], physics [12–14], psychology [15,16], management [17] and so on. The research issues range from recommendation accuracy [7] and diversity [9] to the sustainability of the whole system in evolution [18]. In this context, many recommendation algorithms have been proposed to help online users filter out irrelevant information and narrow down the search space [19,20].

In the literature, studies on recommender systems overwhelmingly focus on the recommendation techniques, while the effect of online users' features on the recommendation process has received far less attention. Research on human dynamics has shown clearly that the behavior of online users is heterogeneous [21]. At the individual level, the inter-event time of item selections exhibits the burst property, i.e. in some periods users select items frequently while in other periods the time between two selections of a user can be very long [22]. At the system level, the distribution of user activity is very broad, indicating that some users are very active while many other users are much less active [23,24]. Moreover, it has been revealed that online users are driven by different network growth mechanisms when they establish new links in the network [25]. In this paper, we focus on how the heterogeneous dynamics of online users affects the information filtering process.

* Corresponding author.
E-mail address: anzeng@bnu.edu.cn (A. Zeng).

To evaluate the accuracy and diversity of recommendation, the real data (i.e. links) are usually divided randomly into two sets: the training set represents the known historical information that can be used by the recommendation algorithm; the probe set represents the unknown future information that is used to check the quality of the recommendation [1]. This implies that the number of links that the recommendation algorithm tries to predict for each user is proportional to the cumulative degree of the user. However, this assumption could be problematic from a practical point of view. Since the behavior of online users is very heterogeneous, some users could be very active and require the recommendation list to be updated very often. Some less active users, on the other hand, may require less frequent updates of the recommendation list [26]. Therefore, it is necessary to take into account the heterogeneous dynamics of online users in data division and examine the performance of recommendation algorithms when the amounts of links for prediction are unequal for different users.

Published in Physics Letters A, which should be cited to refer to this work.

In this paper, we propose a heterogeneous data division model for the recommendation process. Instead of randomly dividing the links into the training set and probe set, the number of links for each user in the probe set is determined by the user's degree. The bias towards different degrees is controlled by a tunable parameter. By implementing some representative recommendation algorithms, we find that different data division indeed significantly influences the evaluation results of the existing recommendation algorithms. Interestingly, if the number of probe links for large-degree users is smaller than that from the random division (i.e. the recommendation list is updated more frequently for large-degree users), the overall recommendation accuracy and diversity are improved.

2. Related work

In the literature, many researchers have considered the heterogeneity of online users when designing recommendation algorithms. Motivated by the observed significant difference between users' structural properties in the network, Zhang et al. proposed to remove redundant links for each user to extract the so-called information backbone [19]. Guan et al. observed that large-degree users tend to select niche objects while small-degree users tend to select popular objects. They thus proposed a personalized hybrid algorithm in which each user is assigned a parameter to adjust the popularity of the objects shown in his/her personal recommendation list [27]. Zeng et al. argued that, due to users' heterogeneity, users carry different amounts of information for the recommendation algorithms. Accordingly, Zeng et al. identified some core users in the network and achieved 90% of the accuracy by taking only 20% of the core users' data into account [20]. Similar ideas have been extended to the study of online search engines. Sugiyama et al. proposed a personalized web search engine that adapts to each user's need for relevant information without any user effort [28]. User heterogeneity results in different information needs behind each user's query. Therefore, the search results should be adapted to users with different information needs.

Compared with recommendation algorithm design, user heterogeneity has had less impact on research into data division. Most recommendation algorithms were validated on the random data division, which obviously neglects user heterogeneity in selecting items. A recent review [1] mentions that recommendation should be done with the data divided into the training set and probe set based on the time stamps on links. Focusing on the over-fitting problems of recommendation algorithms, Zeng et al. proposed a triple data division model in which the real data is divided into a training set, a learning set and a probe set [13]. The basic idea is to estimate users' parameters with the learning set and then apply the learned parameters to actually predict users' future objects in the probe set. Since purchase behaviors that happened a long time ago may not truly reflect the current interests of the target user, Guo et al. investigated the impact of the time window of the training set on recommender algorithms [29]. In order to improve the diversity and accuracy of the recommender system, Song et al. presented an improved hybrid information filtering method that adopts only partial recent information, based on the fact that recent behaviors capture users' potential interests more effectively; they also generated a series of training sets, each of which is treated as known information to predict the future links in the probe set [23].

Taken together, the research on users' heterogeneity mainly focuses on new algorithm design and the modification of the training set. One crucial direction has not been investigated so far, namely the effect of users' heterogeneity on the probe set. In this paper, we take into account the heterogeneous dynamics of online users and associate it with the length of the probe set (i.e. the number of items they will connect to in the future). We argue that active users and inactive users should have different amounts of links in the probe set. This assumption not only helps us to better understand the recommendation process in real systems but also provides us with a better implementation of the recommendation algorithms (i.e. updating users' recommendation lists with different frequencies).

3. Data and model

In this paper, we use two standard data sets which have been widely used to examine the performance of recommendation algorithms [29–31]. The first one is the Movielens data with 1682 movies (items) and 943 users (http://www.grouplens.org/). Users rate movies from 1 (worst) to 5 (best). Consistent with the literature, we consider each rating higher than 2 as a link. Finally, 82520 links remain in the network. The second one is the Netflix data, which is a random sample of the whole records of user ratings on Netflix.com (http://www.netflixprize.com/). It consists of 2294 users, 1891 movies, and 71074 links. Like Movielens, Netflix is also based on a 5-star rating system. With the same rating filtering process, we obtain 59464 links in the Netflix data. Throughout this paper, we mainly present the results on the Netflix data in figures, while the results for both data sets are reported in tables.

In order to model the prediction process of recommender systems, the above data (i.e. links) are divided into two parts: the training set $E^T$ represents the known information while the probe set $E^P$ represents the unknown information for prediction. Considering the heterogeneous dynamics of users, the division of links into these two sets is not completely random. As active and inactive users spend different amounts of time online, their recommendations need to be updated with different frequencies. Some users are very active, so their recommendation lists should be updated often. For the less active users, the generated recommendation lists could be used for a relatively longer time. Accordingly, we propose a data division model in which the amount of data in the probe set for each user is tunable. In each step, we randomly pick a user $i$ with probability

$$p_i = \frac{k_i^{\theta}}{\sum_j k_j^{\theta}},$$

where $k_i$ is the degree of user $i$. Then one of his/her links is randomly moved to the probe set. The process is terminated when the total amount of links in the probe set reaches 10% of the links in the original network. Here, $\theta$ is a tunable parameter. When $\theta = 1$, the data division process reduces to the traditional random data division. When $\theta < 1$, the links connecting to small-degree users are more likely to be moved to the probe set, and vice versa.
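The division procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `heterogeneous_split`, its arguments and the choice to keep $k_i$ fixed at the user's degree in the original network are assumptions made here.

```python
import random
from collections import defaultdict


def heterogeneous_split(links, theta, probe_fraction=0.1, seed=0):
    """Split (user, item) links into a training list and a probe list.

    In each step a user i is drawn with probability proportional to
    k_i ** theta and one of that user's remaining links is moved to the probe set.
    """
    rng = random.Random(seed)
    remaining = defaultdict(list)
    for user, item in links:
        remaining[user].append(item)
    # Assumption: k_i is the degree in the original network and is not updated.
    weight = {user: len(items) ** theta for user, items in remaining.items()}
    target = int(probe_fraction * len(links))
    probe = []
    while len(probe) < target:
        # Only users that still have at least one link can be picked.
        candidates = [user for user, items in remaining.items() if items]
        user = rng.choices(candidates, weights=[weight[u] for u in candidates])[0]
        items = remaining[user]
        item = items.pop(rng.randrange(len(items)))
        probe.append((user, item))
    train = [(user, item) for user, items in remaining.items() for item in items]
    return train, probe
```

Calling `heterogeneous_split(links, theta=0.7)` would, under these assumptions, place relatively more links of small-degree users in the probe set than `theta=1.0` does.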

4. Recommendation algorithms

In this paper, we consider the hybrid recommendation algorithm which combines the Mass diffusion and Heat conduction methods [9]. The user-item bipartite network is characterized by an adjacency matrix $A$ where the element $a_{i\alpha}$ equals 1 if user $i$ has collected object $\alpha$, and 0 otherwise. The numbers of users and items are denoted as $N$ and $M$, respectively. Consistent with the literature, we use Latin and Greek letters, respectively, for user- and item-related indices. To generate the recommendation list for a specific user $i$, the Hybrid method starts by assigning each item selected by user $i$ one unit of resource. This resource assignment can be represented by a vector $\vec{f}_i$. The resources of these selected items then diffuse in the bipartite network for two steps with the transition matrix $W$. Each component of this matrix can be computed as

$$w_{\alpha\beta} = \frac{1}{k_\alpha^{1-\lambda} k_\beta^{\lambda}} \sum_{i=1}^{N} \frac{a_{i\alpha} a_{i\beta}}{k_i}, \tag{1}$$

where $k_\alpha$ is the degree of item $\alpha$ and $k_i$ is the degree of user $i$. The Hybrid method reduces to the standard Heat conduction method when $\lambda = 0$, and to the standard Mass diffusion method when $\lambda = 1$. The final recommendation score of each item can be computed as $\vec{f}'_i = W \vec{f}_i$. The recommendation list for user $i$ is obtained by sorting $\vec{f}'_i$ in descending order.
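As a rough illustration of Eq. (1), the following NumPy sketch builds the transition matrix and the scores for all users at once. The function name `hybrid_scores`, the `mask_collected` flag and the handling of zero-degree nodes are assumptions introduced here; the sketch also assumes the final scores are obtained by applying $W$ to each user's initial resource vector.

```python
import numpy as np


def hybrid_scores(A, lam, mask_collected=True):
    """Score every item for every user with the MD/HC hybrid.

    A is an N x M binary user-item matrix; lam = 1 gives Mass diffusion,
    lam = 0 gives Heat conduction.
    """
    A = np.asarray(A, dtype=float)
    k_user = A.sum(axis=1)
    k_item = A.sum(axis=0)
    # Assumption: nodes with zero degree simply contribute nothing.
    inv_user = np.divide(1.0, k_user, out=np.zeros_like(k_user), where=k_user > 0)
    # S[alpha, beta] = sum_i a_{i alpha} a_{i beta} / k_i
    S = A.T @ (A * inv_user[:, None])
    safe_item = np.where(k_item > 0, k_item, 1.0)
    # W[alpha, beta] = S[alpha, beta] / (k_alpha^(1 - lam) * k_beta^lam)
    W = S / (safe_item[:, None] ** (1.0 - lam) * safe_item[None, :] ** lam)
    # Row i of the result is W applied to user i's initial resource vector A[i].
    scores = A @ W.T
    if mask_collected:
        # Already collected items are excluded from the recommendation list.
        scores[A > 0] = -np.inf
    return scores
```

Sorting each row of the returned array in descending order then gives that user's recommendation list over uncollected items.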

5. Evaluation

An effective recommendation should be able to accurately find the items that users like. In order to measure the recommendation accuracy, we make use of the ranking score (RS). Specifically, RS measures whether the ordering of the items in the recommendation lists matches users' real preferences. As discussed above, the recommender system provides each user with a ranking list which contains all of his/her uncollected items. For a target user $i$, we calculate the position of each of his/her links in the probe set. If one of the uncollected items $\alpha$ is ranked at the 3rd place and the total number of uncollected items is 100, the ranking score $RS_{i\alpha}$ will be 0.03. In a good recommendation, the items in the probe set should be ranked higher, so that RS will be smaller. Therefore, the mean value of RS over all the user-item relations in the probe set can be used to evaluate the recommendation accuracy as

$$RS = \frac{1}{|E^P|} \sum_{i\alpha \in E^P} RS_{i\alpha}. \tag{2}$$

The smaller the value of RS, the higher the recommendation accuracy.
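A minimal sketch of the ranking score of Eq. (2) is given below. The function name `ranking_score`, the array-based inputs and the tie rule (tied items share the best position) are assumptions of this sketch rather than details stated in the text.

```python
import numpy as np


def ranking_score(scores, train, probe):
    """Mean ranking score over probe links.

    scores: N x M array of recommendation scores,
    train:  N x M binary matrix of training links,
    probe:  iterable of (user, item) index pairs.
    """
    values = []
    for user, item in probe:
        # Only items the user has not collected in the training set are ranked.
        uncollected = np.flatnonzero(train[user] == 0)
        candidate_scores = scores[user, uncollected]
        # Position 1 is the top of the list; assumption: ties take the best position.
        position = 1 + np.sum(candidate_scores > scores[user, item])
        values.append(position / len(uncollected))
    return float(np.mean(values))
```

For the example in the text, an item at the 3rd place among 100 uncollected items contributes 3/100 = 0.03.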

Usually, online web sites only present the top part of the recommendation list to users. Therefore, a more practical recommendation accuracy measurement called precision ($P$) is considered. For each user $i$, the precision of recommendation is calculated as

$$P_i(L) = \frac{d_i(L)}{L}, \tag{3}$$

where $d_i(L)$ represents the number of user $i$'s deleted links (i.e. links moved to the probe set) contained in the top-$L$ places of the recommendation list. For the whole system, the precision $P(L)$ can be obtained by averaging the individual precisions over all users with at least one link in the probe set. The higher the value of $P(L)$, the better the recommendations.
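The precision of Eq. (3) can be sketched as follows. The function name `precision_at_L` and the input layout are hypothetical, and the sketch assumes every user has at least $L$ uncollected items.

```python
import numpy as np


def precision_at_L(scores, train, probe, L=10):
    """Average P_i(L) over users with at least one probe link."""
    probe_by_user = {}
    for user, item in probe:
        probe_by_user.setdefault(user, set()).add(item)
    per_user = []
    for user, items in probe_by_user.items():
        # Items already collected in the training set are pushed to the bottom.
        s = np.where(train[user] > 0, -np.inf, scores[user].astype(float))
        top = np.argsort(-s, kind="stable")[:L]
        hits = sum(1 for a in top if int(a) in items)
        per_user.append(hits / L)
    return float(np.mean(per_user))
```

Users without probe links are skipped, matching the averaging rule stated above.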

Predicting what a user likes from the list of the most popular items is generally easy in recommendation, while uncovering users' very personalized preferences (i.e. uncovering the unpopular items in the probe set) is much more difficult and important. Therefore, diversity should be considered as another significant aspect of recommender systems besides accuracy. In this paper, we employ two kinds of diversity measurement: personalization ($D$) and novelty ($I$).

Personalization mainly considers how different users' recommendation lists are from each other. It is usually measured by the Hamming distance. Denoting by $C_{ij}(L)$ the number of common items in the top-$L$ places of the recommendation lists of users $i$ and $j$, their Hamming distance can be calculated as

$$D_{ij}(L) = 1 - \frac{C_{ij}(L)}{L}. \tag{4}$$

$D_{ij}(L)$ lies between 0 and 1, which correspond respectively to the cases where $i$ and $j$ have the same or an entirely different recommendation list. By averaging $D_{ij}(L)$ over all pairs of users, we obtain the mean Hamming distance $D(L)$. The more the recommendation lists differ from each other, the higher $D(L)$ is.
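The mean Hamming distance of Eq. (4) can be computed directly from the users' top-$L$ lists. The sketch below assumes the lists are supplied as a dictionary; the function name `mean_hamming_distance` is hypothetical.

```python
from itertools import combinations


def mean_hamming_distance(top_lists, L):
    """Average D_ij(L) = 1 - C_ij(L) / L over all pairs of users.

    top_lists maps each user to a ranked list of recommended items.
    """
    pairs = list(combinations(top_lists, 2))
    total = 0.0
    for i, j in pairs:
        # C_ij(L): number of items shared by the two top-L lists.
        common = len(set(top_lists[i][:L]) & set(top_lists[j][:L]))
        total += 1.0 - common / L
    return total / len(pairs)
```

Because the number of pairs grows quadratically with the number of users, a large system might average over a random sample of pairs instead.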

Fig. 1. (Color online.) (a) ranking score, (b) precision, (c) novelty and (d) personalization in the ($\theta$, $\lambda$) plane when the heterogeneous data division is used in the Netflix data. The recommendation algorithm is the Hybrid method. The results are averaged over 10 independent realizations.

Novelty measures the average degree of the items in the recommendation list. Users may already know popular items from other channels, whereas it is hard for them to find relevant but unpopular items. Therefore, a good recommender system should prefer to recommend small-degree items. The novelty metric can be expressed as

$$I_i(L) = \frac{1}{L} \sum_{\alpha \in O_i} k_\alpha, \tag{5}$$

where $O_i$ represents the recommendation list for user $i$ and $k_\alpha$ represents the degree of item $\alpha$. A low mean popularity $I(L)$ for the whole system indicates highly novel and unexpected recommendations.
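Eq. (5) amounts to averaging item degrees over each top-$L$ list. In the sketch below the function name `mean_novelty` is hypothetical, and taking the system-wide $I(L)$ as the plain average of $I_i(L)$ over users is an assumption.

```python
def mean_novelty(top_lists, item_degree, L):
    """Average I_i(L) over users, where I_i(L) is the mean degree of the top-L items.

    top_lists maps each user to a ranked list of items;
    item_degree maps each item to its degree k_alpha.
    """
    per_user = [
        sum(item_degree[a] for a in items[:L]) / L
        for items in top_lists.values()
    ]
    # Assumption: the system-level value is the unweighted mean over users.
    return sum(per_user) / len(per_user)
```

Lower values indicate that less popular items are being recommended.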

6. Results

We start by investigating the dependence of the recommendation performance on the parameters $\lambda$ and $\theta$ in Netflix. The results are shown as heat maps in Fig. 1. In Ref. [9], it has been shown that the hybrid of the MD and HC algorithms can achieve a better RS than the pure MD and pure HC methods when the real data is randomly divided into training set and probe set (i.e. $\theta = 1$). In Fig. 1(a), we find that a minimum RS still exists when $\theta \neq 1$ and that the optimal RS over all possible values of $\lambda$ and $\theta$ occurs in the region where $\theta < 1$. Specifically, the optimal parameters are $\theta = 0.7$ and $\lambda = 0.4$ in Netflix and $\theta = 0.1$ and $\lambda = 0.5$ in Movielens. Compared with the random data division case, RS is improved by 2.66% in Netflix and 13.64% in Movielens. These results indicate that introducing the heterogeneous data division mechanism can indeed enhance the general recommendation accuracy. As mentioned above, the heterogeneous data division corresponds to a personalized update of the recommendation list. In this sense, our results suggest that if personalized update of the recommendation list is implemented in practice, the recommendation accuracy could be improved.

We then report the precision $P(L)$, novelty $I(L)$ and Hamming distance $D(L)$ of the hybrid method in Fig. 1(b), (c), (d). All these metrics depend on the recommendation list length $L$. In this paper, we set $L = 10$ according to the literature [9]. The results for $P(L)$ in Fig. 1(b) confirm our finding that the heterogeneous data division can result in a higher recommendation accuracy. Compared with the precision in the random data division case, $P(L)$ in the heterogeneous data division is improved by 7.21% in Netflix and 9.23% in Movielens. The effect of $\theta$ on recommendation diversity, however, is minor, as shown in Fig. 1(c), (d).

Fig. 2. (Color online.) The dependence of the (a) ranking score, (b) precision, (c) novelty and (d) personalization on $\lambda$ when different values of $\theta$ are set. The data in this figure is Netflix and the recommendation algorithm is the Hybrid method. The results are averaged over 10 independent realizations.

Fig. 3. (Color online.) The dependence of $\lambda^*$ (with respect to ranking score and precision) on $\theta$ in (a) Netflix and (b) Movielens. The results are averaged over 10 independent realizations.

In order to show the detailed effect of $\theta$ on the recommendation results, we select several $\theta$ values and plot the dependence of different recommendation metrics on $\lambda$ in Fig. 2. We find that there is indeed an optimal $\lambda^*$ for ranking score and precision under each $\theta$. The value of $\lambda^*$ changes with $\theta$. The curves for different $\theta$ in novelty and personalization largely overlap, which confirms that $\theta$ has only a slight effect on recommendation diversity.

The results in Fig. 2 indicate that $\lambda^*$ depends on $\theta$. We then quantitatively investigate this phenomenon in Fig. 3. We find that $\lambda^*$ decreases with $\theta$ in both the Netflix and Movielens data sets. This is because the number of links in the probe set for large-degree users increases when $\theta$ is large. As large-degree users tend to select small-degree items [21], more links connecting to small-degree items will be placed in the probe set when $\theta$ is large. Therefore, the hybrid method needs to give more weight to the heat conduction method in order to predict these links more accurately. When the ranking score and precision metrics are used to determine $\lambda^*$, its value can be a bit different. However, the decreasing trend of $\lambda^*$ is unchanged.

We further study the dependence of the optimal recommendation accuracy (when $\lambda^*$ is used) on $\theta$ in Fig. 4. We find that $RS^*$ reaches its best value when $\theta$ is smaller than 1 (note that $\theta = 1$ corresponds to the random data division). Similarly, $P^*$ also reaches an optimal value when $\theta < 1$. These results suggest that when more links of small-degree users are put in the probe set, the recommendation accuracy can be better. This phenomenon is natural. Table 1 shows the recommendation metrics obtained with the optimal $\lambda^*$ for several $\theta$ values. One can see that the results in Movielens are consistent with those in Netflix.

Fig. 4. (Color online.) The dependence of the optimal recommendation accuracy (when $\lambda^*$ is used) on $\theta$. The data in this figure is Netflix. The results are averaged over 10 independent realizations.

Table 1
Comparison of $RS^*$, $P^*$, $D^*$ and $I^*$ (when $\lambda^*$ is used in the hybrid method) in the random data division ($\theta = 1$) and the heterogeneous data division ($\theta \neq 1$).

Data set    Metric   θ = 0.5   θ = 1.0   θ = 1.5   θ = θ_opt
Netflix     RS*      0.0665    0.0676    0.0729    0.0658
Netflix     P*       0.0635    0.0596    0.0542    0.0639
Netflix     I*       374.7     325.2     285.9     279.2
Netflix     D*       0.763     0.834     0.874     0.881
Movielens   RS*      0.0728    0.0792    0.0851    0.0684
Movielens   P*       0.1711    0.1592    0.1465    0.1739
Movielens   I*       207.9     198.7     191.4     187.4
Movielens   D*       0.922     0.930     0.934     0.937
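As a consistency check, the accuracy improvements quoted in the text can be recovered from the $\theta = 1.0$ and $\theta = \theta_{opt}$ columns of Table 1, assuming the improvement is measured as the relative change with respect to the random division (a decrease for RS, an increase for P):

$$
\begin{aligned}
\text{RS, Netflix:} \quad & \frac{0.0676 - 0.0658}{0.0676} \approx 2.66\%, \\
\text{RS, Movielens:} \quad & \frac{0.0792 - 0.0684}{0.0792} \approx 13.64\%, \\
\text{P, Netflix:} \quad & \frac{0.0639 - 0.0596}{0.0596} \approx 7.21\%, \\
\text{P, Movielens:} \quad & \frac{0.1739 - 0.1592}{0.1592} \approx 9.23\%.
\end{aligned}
$$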

In the literature, it has been pointed out that large-degree items are more likely to be collected by small-degree users [21]. When $\theta < 1$, the number of links in the probe set for small-degree users increases, and small-degree users tend to select large-degree items. Therefore, the number of links connecting to large-degree items increases in the probe set. As the links connecting to large-degree items are easier to predict, the recommendation accuracy is improved. From the practical point of view, this result suggests updating large-degree users' recommendation lists more frequently. In this way, their links for prediction in each round of recommendation will be reduced and the overall recommendation accuracy will be improved.

Finally, we study the effect of $\theta$ on the prediction accuracy of items with different degrees. The dependence of the ranking score on item degree under different $\theta$ is shown in Fig. 5. It can be seen that the ranking score decreases with item degree. This indicates that on average large-degree items can be more accurately recommended than small-degree items. The insets also show the relation between ranking score and item degree, but with a logarithmic x-axis. There is a tendency for the ranking score of small-degree items to be influenced more by $\theta$. This indicates that the improvement observed above is mainly due to the fact that the recommendation accuracy of the small-degree items is improved.

In fact, the heterogeneous data division corresponds to a personalized update frequency of the recommendation list. Therefore, it can be regarded as a new way to implement the recommendation algorithm. We denote the hybrid method with the heterogeneous data division as the HH method. We then compare its performance with traditional recommendation algorithms including the Global Rank method (GR) [1], User-based Collaborative Filtering (UCF) [32] and Mass Diffusion (MD) [9]. In addition, two more recent algorithms are compared. The first one is the Most Popular Removal algorithm (MPR), which recommends items based on the information backbone of the user-item network [19]. The popularity of a link $ia$ is defined as $k_{u_i} k_{o_a}$, where $k_{u_i}$ ($k_{o_a}$) is the degree of user $u_i$ (item $o_a$). The method iteratively removes the most popular links to obtain the information backbone. In MPR, the recommendation is finally done with the MD algorithm after the network structure reduction. The second one is the Biased Heat Conduction method (BHC), which greatly improves the accuracy of the standard Heat Conduction algorithm by considering the degree effects in the last step of the local heat conduction process [33]. The results of the recommendation metrics of these methods are shown in Table 2.

Fig. 5. (Color online.) The dependence of the ranking score on the item degree in (a) Netflix and (b) Movielens when different values of $\theta$ are set. Insets show ranking scores of items whose degree is no more than 100 for different $\theta$.

Table 2
Comparison of the average $RS^*$, $P^*$, $D^*$ and $I^*$ of the different algorithms. The best value in each row is attained by HH.

Data set    Metric   GR       UCF      MD       MPR      BHC      HH
Netflix     RS*      0.0875   0.0783   0.0718   0.0721   0.0714   0.0658
Netflix     P*       0.0415   0.0490   0.0530   0.0527   0.0581   0.0639
Netflix     I*       508.78   484.51   468.34   471.71   326.18   279.24
Netflix     D*       0.3508   0.5193   0.5830   0.5663   0.8286   0.8813
Movielens   RS*      0.1320   0.1024   0.1025   0.1027   0.0822   0.0684
Movielens   P*       0.0852   0.1457   0.1375   0.1314   0.1628   0.1739
Movielens   I*       355.40   312.98   314.32   286.54   199.63   187.40
Movielens   D*       0.5125   0.7642   0.7408   0.7688   0.9263   0.9371

Accuracy is always the first consideration in evaluating a recommendation algorithm's performance. Comparing the results of the six methods, we can see that HH outperforms the other five algorithms in ranking score and precision. For example, compared with BHC, RS is reduced by 7.84% for Netflix and 16.79% for Movielens. Compared with UCF, RS is reduced by 15.96% for Netflix and 33.20% for Movielens. The improvement is even larger when compared with GR, i.e. 24.80% for Netflix and 48.18% for Movielens. In this sense, HH provides the best accuracy.

Besides accuracy, novelty and diversity are two other important metrics. As shown in Table 2, HH outperforms GR in $D(L)$ by 151.23% for Netflix and 82.85% for Movielens. When comparing HH with UCF and BHC, $D(L)$ is improved by 69.71% and 6.36% respectively for Netflix, and by 22.62% and 1.17% respectively for Movielens. Taken together, HH achieves the best recommendation performance among all the considered methods.
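The percentages quoted in this comparison follow from the entries of Table 2 if each gain is taken as the relative change with respect to the baseline method. The short sketch below reproduces them; the dictionary layout and the helper name `relative_gain` are illustrative choices, not part of the original work.

```python
# RS* and D* values copied from Table 2 (GR, UCF, BHC and HH columns only).
TABLE2 = {
    "Netflix": {
        "RS": {"GR": 0.0875, "UCF": 0.0783, "BHC": 0.0714, "HH": 0.0658},
        "D": {"GR": 0.3508, "UCF": 0.5193, "BHC": 0.8286, "HH": 0.8813},
    },
    "Movielens": {
        "RS": {"GR": 0.1320, "UCF": 0.1024, "BHC": 0.0822, "HH": 0.0684},
        "D": {"GR": 0.5125, "UCF": 0.7642, "BHC": 0.9263, "HH": 0.9371},
    },
}


def relative_gain(dataset, metric, baseline, lower_is_better):
    """Percentage improvement of HH over a baseline, relative to the baseline."""
    base = TABLE2[dataset][metric][baseline]
    hh = TABLE2[dataset][metric]["HH"]
    change = (base - hh) if lower_is_better else (hh - base)
    return 100.0 * change / base


if __name__ == "__main__":
    for dataset in TABLE2:
        for baseline in ("BHC", "UCF", "GR"):
            rs = relative_gain(dataset, "RS", baseline, lower_is_better=True)
            d = relative_gain(dataset, "D", baseline, lower_is_better=False)
            print(f"{dataset:9s} vs {baseline:3s}: RS -{rs:.2f}%  D +{d:.2f}%")
```

RS is treated as lower-is-better and $D$ as higher-is-better, in line with the definitions in Section 5.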

7. Conclusion

In this paper, we propose a new framework to evaluate the performance of recommendation algorithms. Compared with the traditional method, the way in which the real links are divided into the training set and the probe set is different. The number of links of each user to be placed in the probe set is based on his/her degree, with a tunable parameter which controls whether small-degree users or large-degree users tend to have more links in the "future". We consider several representative recommendation algorithms and find that the new evaluation framework significantly influences the estimated accuracy and diversity of the recommendation methods. Moreover, if the number of probe set links for large-degree users is smaller than that from the random division, the overall recommendation accuracy and diversity are improved. Our new framework can be interpreted in terms of the update frequency of the recommendation lists in practice. Our results show that the recommendation lists of the large-degree users need to be updated more frequently to achieve higher user satisfaction. In this sense, our findings are meaningful in application, as they help online retailers make better use of the existing recommendation methods.

Our results indicate that the performance of the existing recommendation methods has only been tested in a very special case in which each link of the large-degree users and small-degree users is equally likely to be put in the probe set. Therefore, reexamining the performance of all the existing methods under the new framework of this paper would be an interesting extension. Moreover, a more general data division framework could make the probability of a link being put in the probe set depend on both user degree and item degree. Such a framework could also be used to investigate how often items of different popularity should be included in the recommendation list. This problem calls for future research.

Acknowledgements

This research was supported in part by the National Natural Science Foundation of China under grant Nos. 61379066, 61379064, 61472344, 61402395, the Natural Science Foundation of Jiangsu Province under contracts BK20130452, BK20140492, and the China Scholarship Council. A.Z. acknowledges the support from the Youth Scholars Program of Beijing Normal University (grant No. 2014NT38).

References

[1] L. Lu, M. Medo, C.H. Yeung, Y.-C. Zhang, Z.-K. Zhang, T. Zhou, Phys. Rep. 519 (2012) 1.

[2] J.L. Herlocker, J.A. Konstan, K. Terveen, J.T. Riedl, ACM Trans. Inf. Syst. Secur. 22 (2004) 5.

[3] J.B. Schafer, D. Frankowski, J. Herlocker, S. Sen, The adaptive web, in: Collaborative Filtering Recommender Systems, Springer, 2007, p. 291.

[4] Y. Koren, R. Bell, C. Volinsky, Computer 8 (2009) 30.

[5] H. Ma, H.-X. Yang, U.L, R. Michael, I. King, in: Proc. ACM Conference on Information and Knowledge Management, ACM, 2008, p. 931.

[6] M. Jamali, M. Ester, in: Proc. ACM Conference on Information and Knowledge Management, ACM, 2010, p. 135.

[7] T. Zhou, J. Ren, M. Medo, Y.-C. Zhang, Phys. Rev. E 76 (2007) 046115.

[8] Y.-C. Zhang, M. Blattner, Y.-K. Yu, Phys. Rev. Lett. 99 (2007) 154301.

[9] T. Zhou, Z. Kuscsik, J.-G. Liu, M. Medo, J.R. Wakeling, Y.-C. Zhang, Proc. Natl. Acad. Sci. 107 (2010) 4511.

[10] B. Sarwar, G. Karypis, J. Konstan, J. Riedl, in: Proc. ACM Conference on Information and Knowledge Management, ACM, 2001, p. 285.

[11] F. Ricci, L. Rokach, B. Shapira, in: Recommender Systems Handbook, Springer, 2011, p. 1.

[12] J. Ren, T. Zhou, Y.-C. Zhang, Europhys. Lett. 82 (5) (2008) 58007.

[13] A. Zeng, A. Vidmer, M. Medo, Y.-C. Zhang, Europhys. Lett. 105 (5) (2014) 58002.

[14] W. Zeng, A. Zeng, M.-S. Shang, Y.-C. Zhang, PLoS ONE 8 (11) (2013) e79354.

[15] D. Reitter, C. Lebiere, in: Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, AAAI, 2012, p. 242.

[16] J. Zhou, J. Yin, T. Chen, X.-W. Ding, Z.-F. Gao, M.-W. Shen, PLoS ONE 6 (9) (2011) e23873.

[17] J. Yang, J. Kim, W. Kim, Y.H. Kim, PLoS ONE 7 (11) (2012) e49126.

[18] A. Zeng, C.H. Yeung, M.-S. Shang, Y.-C. Zhang, Europhys. Lett. 97 (2012) 18005.

[19] Q.-M. Zhang, A. Zeng, M.-S. Shang, PLoS ONE 8 (5) (2013) e62624.

[20] W. Zeng, A. Zeng, H. Liu, M.-S. Shang, T. Zhou, Sci. Rep. 4 (2014) 6140.

[21] M.-S. Shang, L. Lu, Y.-C. Zhang, T. Zhou, Europhys. Lett. 90 (4) (2010) 48006.

[22] P.G. Campos, F. Díez, A. Bellogín, in: Proc. ACM Conference on Information and Knowledge Management, ACM, 2011, p. 29.

[23] W.-J. Song, Q. Guo, J.-G. Liu, Physica A 416 (2014) 192.

[24] Y.-B. Zhou, A. Zeng, W.-H. Wang, PLoS ONE 10 (3) (2015) e0120735.

[25] L. Xiang, Q. Yuan, S.-W. Zhao, L. Chen, X.-T. Zhang, Q. Yang, J. Sun, in: Proc. ACM Conference on Information and Knowledge Management, ACM, 2010, p. 723.

[26] G. Bianconi, P. Laureti, Y.-K. Yu, Y.-C. Zhang, Physica A 332 (2004) 519.

[27] Y. Guan, D. Zhao, A. Zeng, Physica A 392 (2013) 3417.

[28] K. Sugiyama, K. Yoshikawa, M. Yoshikawa, in: Proc. 13th International Conference on World Wide Web, ACM Press, 2004, p. 675.

[29] Q. Guo, W.-J. Song, L. Hou, Y.-L. Zhang, J.-G. Liu, Physica A 401 (2014) 15.

[30] X.-L. Ren, L.-Y. Lu, R.-R. Liu, New J. Phys. 16 (6) (2014) 063057.

[31] D.-C. Nie, Y.-H. An, Q. Dong, Yan Fu, Tao Zhou, Physica A 421 (2015) 44.

[32] A. Bellogin, P. Castells, I. Cantador, ACM Trans. Web 8 (2) (2014) 12.

[33] J.-G. Liu, T. Zhou, Q. Guo, Phys. Rev. E 84 (3) (2011) 037101.
