• Aucun résultat trouvé

Solving the apparent diversity-accuracy dilemma of recommender systems

N/A
N/A
Protected

Academic year: 2022

Partager "Solving the apparent diversity-accuracy dilemma of recommender systems"

Copied!
3
0
0

Texte intégral

(1)

Supporting Information

Zhou et al. 10.1073/pnas.1000488107

0 0.5 1

λ 0.75

0.80 0.85 0.90 0.95

h(20)

Eq. 6 W W

0 0.5 1

λ 125

150 175 200

eP(20)

Eq. 6 W W

0 0.5 1

λ 6

8 10

I(20)

Eq. 6 W W

0 0.5 1

λ 0.07

0.08

r

Eq. 6 W W

Fig. S1. Elegant hybrids of the HeatS and ProbS algorithms can be created in several ways besides that given in Eq.6 of the paper. For example W0αβ¼ ð1−λkαþkλβÞ∑uj¼1aαjaβj∕ki, orW00αβ¼ð1−λÞk1αþλkβuj¼1aαjaβj∕kj. WhileW0αβperforms well only with respect toIð20Þ, Eq.6andW00αβboth have their advantages.

However, Eq.6is somewhat easier to tune to different requirements since it varies more slowly and smoothly withλ. The results shown here are for the RateYourMusic dataset.

0 0.5 1

λ 0

0.005 0.010 0.015

P(L)

L = 10 L = 20 L = 50

0 0.02 0.04 0.06

R(L)

Delicious

0 0.5 1

λ 0

0.05 0.10

P(L)

L = 10 L = 20 L = 50

0 0.1 0.2 0.3 0.4 0.5

R(L)

Netflix

Fig. S2. PrecisionPðLÞand recallRðLÞprovide complementary but contrasting measures of accuracy: the former considers what proportion of selected objects (in our case, objects in the topLplaces of the recommendation list) are relevant, the latter measures what proportion of relevant objects (deleted links) are selected. Consequently, recall (red) grows withL, whereas precision (blue) decreases. Here we compare precision and recall for the HeatSþProbS hybrid algorithm on the Delicious and Netflix datasets. While quantitatively different, the qualitative performance is very similar for both measures.

1

(2)

0 0.5 1 λ

0 200 400 600

eP(L)

L = 10 L = 20 L = 50

0 200 400 600

eR(L) Delicious

0 0.5 1

λ 0

50 100

eP(L)

L = 10 L = 20 L = 50

0 50 100

eR(L) Netflix

Fig. S3. A more elegant comparison can be obtained by considering precision and recallenhancement, that is, their values relative to that of randomly sorted recommendations:ePðLÞ ¼PðLÞ·ou∕DandeRðLÞ ¼RðLÞ·o∕L(Eqs.7a,bin the paper). Again, qualitative performance is close, and both of these measures decrease with increasingL, reflecting the inherent difficulty of improving either measure given a long recommendation list.

0 0.5 1

λ 0.75

0.80 0.85 0.90 0.95

h(20)

probe users only all users

0 0.5 1

λ 6

8 10

I(20)

probe users only all users

Fig. S4. Comparison of the diversity-related metricshð20ÞandIð20Þwhen two different averaging procedures are used: averaging only over users with at least one deleted link (as displayed in the paper) and averaging over all users. The different procedures do not alter the results qualitatively and make little quantitative difference. The results shown are for the RateYourMusic dataset.

0 50 100

100 200 300

0 100 200 300

0.5 0.7 0.9

0.7 0.8 0.9

0.8 0.9 1.0

0 0.5 1

λ 0

5 10

0 0.5 1

λ 6

8 10

0 0.5 1

λ 6

8 10 12

Netflix RYM Delicious

eP(L)h(L)I(L)

Fig. S5. Comparison of performance metrics for different lengthsL of recommendation lists:L¼10(red),L¼20(green), and L¼50(blue). Strong quantitative differences are observed for precision enhancementePðLÞand personalizationhðLÞ, but their qualitative behavior remains unchanged. Much smaller differences are observed for surprisalIðLÞ.

2

(3)

0 0.5 1 λ

10-2 100 102

eP(20)

original probe degree < 100 degree < 200

0 0.5 1

λ 0.5

0.6 0.7 0.8 0.9 1

h(20)

original probe degree < 100 degree < 200

0 0.5 1

λ 0

5 10 15

I(20)

original probe degree < 100 degree < 200

0 0.5 1

λ 0

0.1 0.2 0.3

r

original probe degree < 100 degree < 200

Fig. S6. Our accuracy-based metrics all measure in one way or another the recovery of links deleted from the dataset. Purely random deletion will inevitably favor high-degree (popular) objects, with their greater proportion of links, and consequently methods that favor popular items will appear to provide higher accuracy. To study this effect, we created two special probe sets consisting of links only to objects whose degree was less than some threshold (either 100 or 200): links to these objects were deleted with probability 0.5, while links to higher-degree objects were left untouched. The result is a general decrease in accuracy for all algorithms—unsurprisingly, since rarer links are inherently harder to recover—but also a reversal of performance, with the low-degree-favoring HeatS now providing much higher accuracy than the high-degree-oriented ProbS, USim, and GRank. The results shown here are for the Netflix dataset.

0 0.5 1

λ 50

100 150 200

eP(20)

ProbS HeatS + ProbS HeatS + USim HeatS + GRank ProbS + GRank

0 0.5 1

λ 0.4

0.6 0.8 1.0

h(20)

ProbS HeatS + ProbS HeatS + USim HeatS + GRank ProbS + GRank

0 0.5 1

λ 4

6 8 10

I(20)

ProbS HeatS + ProbS HeatS + USim HeatS + GRank ProbS + GRank

0 0.5 1

λ 0.04

0.06 0.08 0.10 0.12

r

ProbS HeatS + ProbS HeatS + USim HeatS + GRank ProbS + GRank

Fig. S7. In addition to HeatSþProbS, various other hybrids were created and tested using the method of Eq.5in the paper, where for hybrid XþY,λ¼0 corresponds to pure X andλ¼1pure Y. The results shown here are for the Netflix dataset. The HeatSþUSim hybrid offers similar but weaker performance compared to HeatSþProbS; combinations of GRank with other methods produce significant improvements inr, the recovery of deleted links, but show little or no improvement of precision enhancementePðLÞand poor results in diversity-related metrics. We can conclude that the proposed HeatSþProbS hybrid is not only computationally convenient but also performs better than combinations of the other methods studied.

3

Références

Documents relatifs

2014 The hyperfine coupling in metallic cobalt and in cobalt alloys has been determined both by nuclear orientation and by low temperature specific heats.. The results are

Despite the limitations, it may be advantageous to use of a number of simple predictions rules for the theoretical heats of combustion. For practical reasons, the evaluation of the

We identified four main areas where further research should be conducted in order to build true a↵ective recommender systems: (i) using emotions as context in the entry stage,

We describe our hybrid recommender system, called AdaRec, that uses domain attributes to un- derstand the domain drifts and trends, and user feedback in order to change it’s

Changes in coefficient of determination, efficiency, optimum number of functional groups and Jaccard index in simulated datasets, versus relative error of ecosystem function and

Since explicit ratings are not always available (26), the algorithms studied in this paper are selected to work with very simple input data: u users, o objects, and a set of

A l’aide du logiciel de calcul formel, justifier rigoureusement le comportement de la suite pour certaines valeurs de u 0 et en déduire un minorant de λ.. On reprend le cas

[r]