(1)

Recommender systems & matrix factorization

Stéphane Canu


APPC, 29 November 2021

S. Canu (ITI, INSA Rouen) RecSys APPC, 29 November 2021 1 / 52

(2)

Road map

1 Recommender systems: definitions
2 The Netflix prize
3 Factorize to recommend
4 Evaluation metrics
5 Socially enabled preference learning
    Tuenti's problem
    Modeling social preferences
    Alternating least squares minimization
    Experimental results on real data

(3)

Recommendations

(4)

Recommend items, movies, lines of code...

(5)

Rec Sys definition

Definition: recommender system

- given a user, data on the user, a list of items, data on these items, and some past relations between users and items
    - given a request of the user on a list of items
- recommend one or more items (suggest relevant items to users)
    - predict the "rating" (or "preference") this user would give to the items of this list
    - based on that, select the items to recommend

credit V. Guigue, LIP6

(7)

Data-based recommender system

Available data

- Past behavior
    - what you buy
    - how much you like it
- Relations to other users
- Item similarity
- Context
    - time
    - sequence
    - item description
    - user categorization: age, socio-professional category, ...

(8)

Rec Sys examples

Examples:

- the Amazon item-to-item recommendation: if you buy a remote control, think about the battery
- what is the best smartphone (or car) for me?
- Google?

(9)

Seller's/User's perspective

Seller

- Increase the number of items sold
- Sell more diverse items
- Increase the user satisfaction
- Increase user fidelity
- Better understand what the user wants

User

- Find some good items, quickly and/or in a huge catalog [precision issue]
- Find all good items [recall issue]
    - Layers Information Retrieval tasks
- Being recommended a sequence / a bundle
- Just browsing
- Help others (forum profiles)

(10)

The value of recommendations

- Amazon: 35% of sales come from recommendations
- Netflix: 2/3 of the movies watched are recommended
- Google News: recommendations generate 38% more clickthrough
- Choicestream: 28% of the people would buy more music if they found what they liked

(11)

Rec Sys history

- 1998: Amazon item-to-item recommendation
- 2004-now: special sessions on recommender systems in several important conferences and journals: AI Communications; IEEE Intelligent Systems; International Journal of Electronic Commerce; International Journal of Computer Science and Applications; ACM Transactions on Computer-Human Interaction; ACM Transactions on Information Systems
- 2007: first ACM RecSys conference
- 2008: Netflix online services (& innovative HMI)
- 2006-09: Netflix RS prize
- 2010-now: RS become essential: YouTube, Netflix, Tripadvisor, Last.fm, IMDb, etc.
- 2014: recommenders allow access to the long tail of choices: discovery

(12)

Rec Sys approaches

- Collaborative filtering
- Item-based approach
- Hybrid recommendation: combining collaborative filtering, content-based filtering, and other approaches


(14)

For 1 million $

Netflix Challenge (2006-2009)

- Task: movie recommendation
- CineMatch: Netflix's recommendation engine
- Goal: improve performance by 10%
- Criterion: squared error

Data

- 480 000 users
- 17 000 movies
- 100 million ratings (from 1 to 5)
- sparsity rate: 1.3 %
- test on (almost) 3 million ratings (the most recent ones)
- 48 000 downloads

(15)

Netflix rules

$$\mathrm{RMSE} = \sqrt{\frac{1}{n_t} \sum_{i=1}^{n_t} (y_i - \hat{y}_i)^2}$$

- random: RMSE = 2
- average rating: RMSE = 1.054
- Cinematch: RMSE = 0.9514 (10% improvement)
- prize: RMSE = 0.8572 (10% improvement)

"It is very simple to produce 'reasonable' recommendations; it is extremely difficult to improve them to become 'great'."

X. Amatriain, RecSys 2014 Tutorial - The Recommender Problem Revisited
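As an illustration of the criterion, here is a minimal sketch of the RMSE computation in plain Python. The ratings and predictions are made-up values, not Netflix data, and the function name `rmse` is only a label chosen for this example.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error over the n_t test ratings."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings and predictions (illustration only).
ratings = [4, 3, 5, 1]
predictions = [3.5, 3.0, 4.0, 2.0]
score = rmse(ratings, predictions)
print(score)
```

Because the errors are squared before averaging, the two predictions that are off by a full star dominate the score here.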

(16)

Netflix prize history

- Oct 2, 2006: challenge opened (RMSE: 0.9514)
- Oct 2007: team KorBell, RMSE: 0.8712 (8.43% improvement)
    - Top 2 algorithms:
        1. Singular Value Decomposition (SVD)
        2. Restricted Boltzmann Machines (RBM, as a neural network)
- Oct 2008: team "BellKor in BigChaos", RMSE: 0.8616 (9.44% improvement)
- June 26, 2009: RMSE: 0.8558 (above 10% improvement)
- July 26, 2009: STOP
    - "The Ensemble": 10.10% improvement on the qualifying set
    - "BellKor's Pragmatic Chaos": 10.09% improvement
- Sept 18, 2009: the winner is "BellKor's Pragmatic Chaos", prize RMSE: 0.8567 (10.06% improvement) on the hidden test set

(17)

Netflix prize data

Averages (global, user, movie) and time effects

(18)

Netflix prize leaderboard timeline

(19)

Simon Funk's SVD using SGD

- One of the interesting findings of the Netflix Prize came out of a blog post
- An incremental, iterative, and approximate way to compute the SVD using gradient descent

$$\min_{U \in \mathbb{R}^{n \times k},\, V \in \mathbb{R}^{p \times k}} \sum_{i=1}^{n} \sum_{j=1}^{p} w_{ij} (Y_{ij} - u_i v_j^\top)^2$$
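The objective above can be minimized one observed rating at a time. The following is a minimal sketch of that idea in plain Python, not a reproduction of Funk's code: the learning rate `lr`, the small L2 penalty `reg`, the initialization range and the toy ratings are all assumptions made for this example, and $w_{ij} = 1$ on observed entries is handled by simply looping over the observed triples.

```python
import random

def squared_error(ratings, U, V):
    """Sum of squared errors over the observed ratings only (w_ij = 1)."""
    return sum((y - sum(a * b for a, b in zip(U[i], V[j]))) ** 2
               for i, j, y in ratings)

def funk_sgd(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02,
             epochs=1000, seed=0):
    """SGD on the observed entries; lr and reg are illustrative assumptions."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    initial = squared_error(ratings, U, V)
    order = list(ratings)  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(order)
        for i, j, y in order:
            err = y - sum(a * b for a, b in zip(U[i], V[j]))
            for f in range(k):
                u_if, v_jf = U[i][f], V[j][f]
                # Gradient step on (y - u.v)^2 + reg * (|u|^2 + |v|^2)
                U[i][f] += lr * (err * v_jf - reg * u_if)
                V[j][f] += lr * (err * u_if - reg * v_jf)
    return U, V, initial

# Hypothetical (user, item, rating) triples.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
U, V, initial_error = funk_sgd(ratings, n_users=3, n_items=3)
final_error = squared_error(ratings, U, V)
print(initial_error, final_error)
```

Only observed ratings are ever touched, which is what makes the approach usable on a matrix with about 1% of its entries known.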

(20)

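baseline: CineMatch

factorization: SVD with $d$ factors, $U_i, V_j \in \mathbb{R}^d$

$$\min_{U,V} \sum_{i,j} (Y_{ij} - U_i^\top V_j)^2 + \lambda (\|U_i\|^2 + \|V_j\|^2)$$

- normalization: global mean $\mu$, bias $b_i$ for user $i$, bias $p_j$ for item $j$

$$\min_{U,V} \sum_{i,j} (Y_{ij} - U_i^\top V_j - \mu - b_i - p_j)^2 + \lambda (\|U_i\|^2 + \|V_j\|^2)$$

- weights: implicit feedback, $c_{ij}$ confidence level

$$\min_{U,V} \sum_{i,j} c_{ij} (Y_{ij} - U_i^\top V_j - \mu - b_i - p_j)^2 + \lambda (\|U_i\|^2 + \|V_j\|^2)$$

- time

$$\min_{U,V} \sum_{i,j} (Y_{ij} - U_i(t)^\top V_j - \mu - b_i(t) - p_j(t))^2 + \lambda (\|U_i\|^2 + \|V_j\|^2)$$

mixture of models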
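To make the normalization terms concrete, here is a small sketch of the baseline part $\mu + b_i + p_j$ alone, without the factor term. The biases are estimated by plain averaging, which is an assumption made for simplicity: the slide's formulation fits them jointly with $U$ and $V$. The function name and the toy ratings are hypothetical.

```python
from collections import defaultdict

def fit_baseline(ratings):
    """Estimate mu, user biases b_i and item biases p_j by simple averaging."""
    mu = sum(y for _, _, y in ratings) / len(ratings)

    by_user = defaultdict(list)
    for i, _, y in ratings:
        by_user[i].append(y - mu)
    b = {i: sum(devs) / len(devs) for i, devs in by_user.items()}

    by_item = defaultdict(list)
    for i, j, y in ratings:
        by_item[j].append(y - mu - b[i])
    p = {j: sum(devs) / len(devs) for j, devs in by_item.items()}
    return mu, b, p

# Hypothetical (user, item, rating) triples.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 1, 2)]
mu, b, p = fit_baseline(ratings)
prediction = mu + b[0] + p[0]   # baseline score of user 0 for item 0
print(mu, b, p, prediction)
```

The factorization then only has to explain the residual $Y_{ij} - \mu - b_i - p_j$, that is, what remains once "generous users" and "popular items" are accounted for.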

(21)

Netflix results

- initial error: 0.9514
- factorization: - 4 %
- number of factors: - 0.5 %
- improvements: - 2.5 %
    - normalization
    - implicit feedback
    - time
- bagging (mixing 100 methods): - 3 %, giving 0.8563

Koren, Y., Bell, R.M., Volinsky, C.: Matrix factorization techniques for recommender systems. IEEE Computer 42(8), 30–37 (2009)

(22)

Lessons from Netflix

- efforts too specific to be generalized
- factorization is the solution
- implicit feedback helps
- take into account the specificities (time effect)
- stochastic gradient (that scales)
    - flexible
    - scalable

Currently in use as part of Netflix's rating prediction, but:

- designed for 100M ratings, not for xxxB ratings
- not adaptable as users add ratings
- performance issues


(24)

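Recommendation as matrix completion

[Figure: the sparse user × item matrix Y, with known entries (1) and unknown entries (?), approximated by F = <U ; V>, where U and V each have d factors]

$\langle U_i, V_j \rangle$ estimates the desire of user $i$ for product $j$

- U(:,1) and V(1,:) are about western
- U(:,2) and V(2,:) are about romance
- ...
- U(:,d) and V(d,:) are about SF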

(25)

Problem

Model

$$Y = F + R$$

observation = information + noise, with $F$ a regular matrix and $R$ a noise matrix

Problem: build $\hat{F}$ to estimate $F$

$$\min_{\hat{F}} \; \underbrace{L(Y, \hat{F})}_{\text{data loss}} + \lambda \underbrace{\Omega(\hat{F})}_{\text{penalization}}$$

for $\lambda$ well chosen

$\hat{F} = U V^\top$: low rank factorization

(27)

How to estimate U and V?

Low rank approximation of the matrix Y

$$\min_{\hat{F}} L(Y, \hat{F}) := \|Y - \hat{F}\|_F^2 = \sum_{i=1}^{n} \sum_{j=1}^{p} (Y_{ij} - \hat{F}_{ij})^2 \quad \text{with } \operatorname{rank}(\hat{F}) = d$$

Eckart & Young 1936

Solution: principal component analysis (PCA), $\hat{F} = U V^\top$

    Step                    Large scale adaptation
    Yn = normalise(Y)
    C = Yn' * Yn            OUT OF MEMORY
    d = 3                   we need more than 200
    V, Σ = eig(C, d)        full → sparse SVD
    F̂ = U V'
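The Eckart & Young result can be checked numerically on a small dense matrix. The sketch below assumes NumPy is available and uses random data as a stand-in for the rating matrix; the choice to fold the singular values into `U` is one convention among several.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 5))   # small dense stand-in for the rating matrix
d = 2                         # target rank

# Thin SVD: Y = U_full @ diag(s) @ Vt, singular values sorted in decreasing order.
U_full, s, Vt = np.linalg.svd(Y, full_matrices=False)

U = U_full[:, :d] * s[:d]     # n x d, singular values folded into U
V = Vt[:d, :].T               # p x d
F_hat = U @ V.T               # best rank-d approximation in Frobenius norm

error = np.linalg.norm(Y - F_hat, "fro")
expected = np.sqrt(np.sum(s[d:] ** 2))   # norm of the discarded singular values
print(error, expected)
```

The approximation error equals the norm of the discarded singular values, which is why the number of factors kept directly controls the fit.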

(29)

PCA, eigenvalues and SVD

With the singular value decomposition $X = U S V^\top$:

$$X^\top X = V S^2 V^\top$$

$$X X^\top = U S^2 U^\top$$
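These two identities are easy to verify numerically. A short check, assuming NumPy and a random matrix used purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# eigh returns eigenvalues in ascending order; reverse to match the SVD ordering.
eigvals = np.linalg.eigh(X.T @ X)[0][::-1]

gram_ok = np.allclose(X.T @ X, Vt.T @ np.diag(s ** 2) @ Vt)
outer_ok = np.allclose(X @ X.T, U @ np.diag(s ** 2) @ U.T)
eig_ok = np.allclose(eigvals, s ** 2)
print(gram_ok, outer_ok, eig_ok)
```

The eigenvalues of $X^\top X$ are the squared singular values of $X$, so PCA can be computed either through an eigendecomposition of the covariance or directly through an SVD of the data.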

(30)

How to implement PCA (to retrieve U and V)?

Singular value decomposition (SVD): $\hat{F} = U V^\top$ ($= \tilde{U} \Sigma V^\top$)

Exact solution on sparse matrices (??? = 0)

- svd is too slow
- use an adapted Lanczos process
    - bidiagonalization Y = W B Z
    - diagonalize B (using Givens rotations, GVL 8.6.1)
- adapted software: svds, PROPACK (http://soi.stanford.edu/~rmunk/PROPACK/)

    >> size(Y)
    ans =
          480189       17770
    >> d = 200;
    >> tic
    >> [U,D,V] = lansvd(Y,d,'L');
    >> toc
    Elapsed time is 1030.8 seconds.

217 sec at https://dgleich.wordpress.com/2013/10/19/svd-on-the-netflix-matrix/

(31)

How to compute U and V?

$$\min_{U,V} L(Y, UV) \quad \text{with } \operatorname{rank}(U) = \operatorname{rank}(V) = k$$

SGD (Stochastic Gradient Descent)

- compute the gradients
- loop
    - $U_i^{NEW} = U_i^{OLD} - \rho \nabla_{U_i} L(Y, U, V)$
    - $V_j^{NEW} = V_j^{OLD} - \rho \nabla_{V_j} L(Y, U, V)$

Existing implementations

- S. Funk's process (http://sifter.org/~simon/journal/20061211.html), PLSA, MMMF (Srebro et al., 2004), ...
- code: RSVD (http://code.google.com/p/pyrsvd/), CofiRank (https://github.com/markusweimer/cofirank), ...

(32)

Weighted Singular Value Decomposition algorithm

$$\min_{U \in \mathbb{R}^{n \times k},\, V \in \mathbb{R}^{p \times k}} J_w(U, V) \quad \text{with} \quad J_w(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{p} w_{ij} (Y_{ij} - u_i v_j^\top)^2$$

where $w_{ij} = 0$ if $Y_{ij}$ is unknown (and $w_{ij} = 1$ otherwise)

Marlin's (or Srebro and Jaakkola's) Expectation Maximization (EM) algorithm: missing values are filled in with values that do not change the result

    Z = Y + ???
    Z = W ∘ Y + (1 − W) ∘ Z
    U, V = svd(Z)
    Z = U V'

Z is no longer sparse
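The EM steps above can be sketched in a few lines. This is an illustrative version under stated assumptions, not the authors' implementation: it assumes NumPy, a dense matrix, missing entries initialized to 0, a fixed number of iterations instead of a convergence test, and synthetic rank-1 data.

```python
import numpy as np

def em_weighted_svd(Y, W, k, n_iter=50):
    """EM-style imputation: alternate a rank-k SVD and refilling the missing entries."""
    Z = W * Y                       # assumption: missing entries start at 0
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U[:, :k] * s[:k]) @ Vt[:k, :]
        # Keep the observed ratings, fill the rest with the current reconstruction.
        Z = W * Y + (1 - W) * low_rank
    return low_rank, Z

rng = np.random.default_rng(0)
Y_true = np.outer(rng.uniform(1, 2, size=6), rng.uniform(1, 2, size=5))
W = (rng.uniform(size=Y_true.shape) < 0.7).astype(float)   # 1 = observed
Y = W * Y_true

def weighted_error(F):
    """J_w restricted to the observed entries."""
    return float(np.sum(W * (Y - F) ** 2))

F1, _ = em_weighted_svd(Y, W, k=1, n_iter=1)
F50, Z50 = em_weighted_svd(Y, W, k=1, n_iter=50)
print(weighted_error(F1), weighted_error(F50))
```

Each iteration works on the completed matrix Z, which is dense; this is the practical cost noted on the slide.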

(33)

Alternating optimization

General framework

$$\min_{U,V} L(U, V, Y) + \Omega(U, V)$$

- data-related loss: $L(U, V, Y)$
- penalization term: $\Omega(U, V)$

Scalable algorithm: compute partial gradients. Example: for $L(U, V, Y) = \|UV - Y\|^2$ and $\Omega(U, V) = 0$ we have

$$U_i = (Y_i V^\top)(V V^\top)^{-1}$$
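The closed-form row update above, and its counterpart for $V$, give an alternating least squares loop. The sketch below assumes NumPy, a fully observed dense $Y$, no penalization, and $V$ stored as a $k \times p$ matrix so that the model is $UV$; the function name and the synthetic data are illustrative.

```python
import numpy as np

def als(Y, k, n_iter=20, seed=0):
    """Alternating least squares for min ||U V - Y||^2 with V of shape (k, p)."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    V = rng.normal(size=(k, p))
    for _ in range(n_iter):
        # U = Y V^T (V V^T)^{-1}: all rows U_i at once, via a linear solve.
        U = np.linalg.solve(V @ V.T, V @ Y.T).T
        # V = (U^T U)^{-1} U^T Y: the symmetric update for the item factors.
        V = np.linalg.solve(U.T @ U, U.T @ Y)
    return U, V

rng = np.random.default_rng(42)
Y = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))   # exactly rank 2
U, V = als(Y, k=2)
residual = np.linalg.norm(U @ V - Y)
print(residual)
```

With one factor fixed, the problem in the other factor is an ordinary least squares problem that decouples across rows, which is what makes the scheme easy to parallelize.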

(34)

Alternating optimization


(36)

How good is the RMSE?

RMSE (Root Mean Squared Error) is typically used to evaluate regression problems

It tends to disproportionately penalize large errors as the residual (error term) is squared. This means RMSE is more prone to being affected by outliers or bad predictions.

Fails at taking the Long Tail into account

(37)

Mean Average Precision (from the IR domain)

Precision at k (p@k) is the proportion of recommended items in the top-k set that are relevant.

For a user u with 4 liked items to discover:

prediction = (i12, i8, i42, i1), GT = {i1, i42, i8, i9}

$$p@4 = \frac{3}{4}$$

Average Precision

$$\frac{1}{K} \sum_{k=1}^{K} p@k = \frac{1}{4}(p@1 + p@2 + p@3 + p@4) = \frac{1}{4}\left(0 + \frac{1}{2} + \frac{2}{3} + \frac{3}{4}\right) = \frac{23}{48} \approx 0.48$$

Mean Average Precision: mean over the population
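The worked example can be reproduced with exact fractions. This sketch follows the definition used on this slide (the mean of p@k for k = 1..K), which differs from the variant that averages precision only at the ranks of relevant items; function names are illustrative.

```python
from fractions import Fraction

def precision_at_k(ranked, relevant, k):
    """Share of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return Fraction(hits, k)

def average_precision(ranked, relevant):
    """Mean of p@k for k = 1..K, as defined on the slide."""
    K = len(ranked)
    return sum(precision_at_k(ranked, relevant, k) for k in range(1, K + 1)) / K

prediction = ["i12", "i8", "i42", "i1"]
ground_truth = {"i1", "i42", "i8", "i9"}

p_at_4 = precision_at_k(prediction, ground_truth, 4)
ap = average_precision(prediction, ground_truth)
print(p_at_4, ap)
```

Averaging `average_precision` over all users gives the Mean Average Precision.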

(38)

Mean Average Recall (from the IR domain)

Recall at k (r@k) is the number of recommended items in the top-k set that are relevant, divided by the number of relevant items.

For a user u with 4 liked items to discover:

prediction = (i12, i8, i42, i1), GT = {i1, i42, i8, i9}

$$r@3 = \frac{2}{4}$$

Average Recall

$$\frac{1}{K} \sum_{k=1}^{K} r@k = \frac{1}{4}(r@1 + r@2 + r@3 + r@4) = \frac{1}{4}\left(0 + \frac{1}{4} + \frac{2}{4} + \frac{3}{4}\right) = \frac{3}{8}$$

Mean Average Recall: mean over the population
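The same example, computed for recall. As above this is a small sketch using the slide's definition (mean of r@k for k = 1..K), with illustrative function names.

```python
from fractions import Fraction

def recall_at_k(ranked, relevant, k):
    """Share of the relevant items that appear in the top-k recommendations."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return Fraction(hits, len(relevant))

def average_recall(ranked, relevant):
    """Mean of r@k for k = 1..K, as defined on the slide."""
    K = len(ranked)
    return sum(recall_at_k(ranked, relevant, k) for k in range(1, K + 1)) / K

prediction = ["i12", "i8", "i42", "i1"]
ground_truth = {"i1", "i42", "i8", "i9"}

r_at_3 = recall_at_k(prediction, ground_truth, 3)
ar = average_recall(prediction, ground_truth)
print(r_at_3, ar)
```

Recall can never reach 1 here because i9 is not in the recommended list at all.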

(39)

Mean Reciprocal Rank

At which rank is the first relevant item?

For a user u with 4 liked items to discover:

prediction = (i12, i8, i42, i1), GT = {i1, i42, i8, i9}

$$RR = \frac{1}{\mathrm{rank}_i} = \frac{1}{2}$$

Mean Reciprocal Rank: average over the whole population
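A short sketch of the reciprocal rank and its mean over users. The first user is the slide's example; the second user is hypothetical and only there to show the averaging step.

```python
from fractions import Fraction

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is recommended."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return Fraction(1, rank)
    return Fraction(0)

# Slide example: the first relevant item (i8) is at rank 2.
rr_u = reciprocal_rank(["i12", "i8", "i42", "i1"], {"i1", "i42", "i8", "i9"})

# Hypothetical second user whose first recommendation is relevant.
rr_v = reciprocal_rank(["i3", "i7"], {"i3"})

mrr = (rr_u + rr_v) / 2
print(rr_u, rr_v, mrr)
```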

(40)

nDCG: Normalized Discounted Cumulative Gain

Assume a relevance score is available for each item:

    position:    1    2    3    4
    prediction:  i12  i8   i42  i1
    relevance:   0    2    3    3

The Discounted Cumulative Gain at rank $k$ is:

$$\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{\mathrm{relev}_i}{\log_2(i+1)}, \qquad \mathrm{DCG@}4 = 0 + \frac{2}{\log_2(3)} + \frac{3}{\log_2(4)} + \frac{3}{\log_2(5)} = 4.05$$

$$\mathrm{nDCG@}4 = \frac{\mathrm{DCG@}4}{\mathrm{IdealDCG@}4} = \frac{4.05}{5.89} = 0.69$$

with the ideal relevance ordering used for IdealDCG@4 being (3, 3, 2, 0).

nDCG is a measure of ranking quality.
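The figures above can be reproduced with the standard library. A minimal sketch, with illustrative function names, in which the ideal ordering is obtained by sorting the same relevance scores in decreasing order:

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: sum of relev_i / log2(i + 1) for i = 1..k."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

relevance = [0, 2, 3, 3]          # relevance of i12, i8, i42, i1 in ranked order
dcg = dcg_at_k(relevance, 4)
ideal_dcg = dcg_at_k(sorted(relevance, reverse=True), 4)
ndcg = ndcg_at_k(relevance, 4)
print(round(dcg, 2), round(ideal_dcg, 2), round(ndcg, 2))
```

Unlike precision and recall, nDCG uses graded relevance and penalizes relevant items that are ranked low.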

(41)

Learn to rank

e.g. CofiRank, RankSVM, ...

(42)

Rec Sys in Python

(43)

A/B testing & production launch

Divide your online users into A/B-test groups

The easiest measures are the Click-Through Rate (CTR) and the Conversion Rate (CR) of the recommendations.


(45)

Tuenti's problem

Size

- 13M people
- 100K places
- 200M social connections

Really sparse

- 4 places per user
- 20 friends per user

[Figure: sparse binary matrix with one row per user (user 1, user 2, user 3, user 4, ...)]

(46)

Objective

[Figure: the sparse matrix Y, with known entries (1) and unknown entries (?), approximated by F = <U ; V> with d factors; the second build of the slide adds the matrix A]

(48)

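Existing approaches

Cost functions

$$\min_{U,V} L_1(Y, UV) + \mu L_2(A, UU^\top) + \lambda \Omega(U, V) \qquad \text{Yang et al. (2011)}$$

$$\min_{U,V} L(Y, UV) + \mu \sum_{i=1}^{n} \Big\| U_i - \frac{1}{|F_i|} \sum_{k \in F_i} U_k \Big\|_F^2 + \lambda \Omega(U, V) \qquad \text{Ma et al. (2011)}$$

with $F_i$ the set of $i$'s friends and $A_{ii'} = 1$ if $i' \in F_i$

Decision function

$$\min_{U,V} \sum_{(i,j) \in Y} c_{ij} \Big( \underbrace{U_i V_j + \sum_{k \in F_i} \frac{A_{ik} U_k V_j}{|F_i|}}_{\hat{F}_{ij}} - Y_{ij} \Big)^2 + \Omega_{U,V,A} \qquad \text{Ma et al. (2009)}$$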

(49)

Decision function

Decision function

$$\hat{F}_{ij} = U_i V_j + \sum_{k \in F_i} \frac{A_{ik} U_k V_j}{|F_i|} \qquad \text{Ma et al. (2009)} \qquad (1)$$

Loss function

$$\min_{U,V,A} \sum_{(i,j) \in Y} c_{ij} \Big( U_i V_j + \sum_{k \in F_i} \frac{A_{ik} U_k V_j}{|F_i|} - Y_{ij} \Big)^2 + \Omega_{U,V,A} \qquad (2)$$

$c_{ij}$ penalizing 1 more than 0.

$$\Omega_{U,V,A} = \lambda_1 \|U\|_{Fro}^2 + \lambda_2 \|V\|_{Fro}^2 + \lambda_3 \|A\|_{Fro}^2$$

Contribution

Learn friendship influence
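Decision function (1) adds to a user's own score the average, weighted by the influence $A_{ik}$, of the scores of the user's friends. The sketch below evaluates it on hand-picked toy values; the data layout (lists for `U` and `V`, dictionaries for `A` and `friends`) and the function names are assumptions made for this example.

```python
def dot(u, v):
    """Inner product of two factor vectors."""
    return sum(a * b for a, b in zip(u, v))

def social_score(i, j, U, V, A, friends):
    """F_hat[i][j] = U_i.V_j + sum over friends k of A_ik * U_k.V_j / |F_i|."""
    own = dot(U[i], V[j])
    if not friends[i]:
        return own                      # no friends: plain factorization score
    social = sum(A[i][k] * dot(U[k], V[j]) for k in friends[i])
    return own + social / len(friends[i])

# Hypothetical factors: 3 users, 1 item, 2 latent factors.
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[2.0, 1.0]]
friends = {0: [1, 2], 1: [0], 2: []}
A = {0: {1: 0.5, 2: 1.0}, 1: {0: 0.2}, 2: {}}   # influence of friend k on user i

score_user0 = social_score(0, 0, U, V, A, friends)
score_user2 = social_score(2, 0, U, V, A, friends)
print(score_user0, score_user2)
```

A friend with a larger $A_{ik}$ pulls the prediction more strongly toward that friend's own preference, which is the quantity the model learns.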

(50)

Algorithm

Alternating least squares minimization: SE CoFi

    Input: Y and F
    initialize U, V and A
    Repeat
        For each user i ∈ U
            update U_i
        For each item j ∈ V
            update V_j
        For each user i ∈ U
            For each user k ∈ F_i
                update A_ik
    Until convergence

(51)

Tuenti and Epinions datasets

Tuenti: 13 M users and 100 k places (20 friends)

Epinions: 50 k users and 100 k items (5 friends)

(52)

Tuenti and Epinions

[Figure: MAP and RANK as functions of the number of factors d, for iMF, SE CoFi, LLA, RSR, Trust and the average baseline]

Tuenti: 13 M users and 100 k places (20 friends)

Epinions: 50 k users and 100 k items (5 friends)

using 24 cores with 100 GB of RAM

(53)

Back on friends' influence

[Figure: histograms of the distribution of the values of alpha, for Tuenti and for Epinions]

Affinity principle (homophily effect), no negative influence

(54)

Conclusion

Summary

- matrix AND graph
- large scale datasets
- recommendation improvement
- influence measure

Perspectives

- take advantage of the framework
- take into account more contextual data
    - GPS, place types, ...
- towards friend recommendation (learn α)

(55)

Questions?

[Figure: the sparse matrix Y approximated by F = <U ; V>]

Matrix factorization, recommender systems, personalized with preferences
