Formulation of the Recommendation Problem

In general, the recommendation problem is defined as the problem of estimating ratings for the items that have not been seen by a user. This estimation is based on:

• ratings given by the user to other items,

• ratings given to an item by other users,

• and other user and item information (e.g. item characteristics, user demographics).

The recommendation problem can be formulated [1] as follows:

Let U be the set of all users, U = {u1, u2, ..., um}, and let I be the set of all possible items, I = {i1, i2, ..., in}, that can be recommended, such as music files, images, or movies. The space I of possible items can be very large.

Let f be a utility function that measures the usefulness of item i to user u,

f : U × I → R, (1.1)

where R is a totally ordered set (e.g. the set of nonnegative integers or real numbers within a certain range). Then, for each user u ∈ U, we want to choose the item i_u ∈ I that maximizes the user's utility, i.e.

∀u ∈ U, i_u = arg max_{i ∈ I} f(u, i). (1.2)
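As a minimal sketch of Eq. 1.2, assume for the moment that the utility f is fully known and stored in a small lookup table. The user names, item names and utility values below are hypothetical and chosen only for illustration.

```python
# Hypothetical utility values f(u, i); names and numbers are illustrative only.
utility = {
    ("u1", "i1"): 3, ("u1", "i2"): 5, ("u1", "i3"): 1,
    ("u2", "i1"): 4, ("u2", "i2"): 2, ("u2", "i3"): 4,
}
users = ["u1", "u2"]
items = ["i1", "i2", "i3"]


def best_item(user):
    """Return arg max over i in I of f(user, i), as in Eq. 1.2."""
    # max() returns the first maximal element, so ties go to the earlier item.
    return max(items, key=lambda item: utility[(user, item)])


best = {u: best_item(u) for u in users}
```

Eq. 1.2 does not say how ties are broken; the sketch simply keeps the first maximal item in the order of `items`.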

In RS, the utility of an item is usually represented by a rating, which indicates how much a particular user liked a particular item; e.g., user u1 gave item i1 the rating R(1,1) = 3, where R(u,i) ∈ {1,2,3,4,5}.

Each user uk, where k = 1, 2, ..., m, has a list of items Iu_k about which the user has expressed preferences. Note that Iu_k ⊆ I and that Iu_k may even be the empty set; that is, users are not required to express preferences for all existing items, or for any at all.

Each element of the user space U can be defined with a profile that includes various user characteristics, such as age, gender, income, marital status, etc. In the simplest case, the profile can contain only a single (unique) element, such as User ID.

The ratings can be arranged in an m × n user-item matrix R, whose entry R(u,i) is the rating given by user u to item i. Recommendation algorithms operate

• either on rows of the matrix R, which correspond to the ratings of a single user for different items,

• or on columns of the matrix R, which correspond to different users' ratings for a single item.
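The row and column views of the matrix R can be sketched as follows. This is only an illustration under stated assumptions: the matrix is a hypothetical 3-user by 4-item example, indices are 0-based, and None is used here to mark an item the user has not rated.

```python
# Hypothetical 3-user x 4-item rating matrix R; None marks an unrated item.
R = [
    [3, None, 5, None],        # ratings of the first user
    [None, 2, 4, 1],           # ratings of the second user
    [None, None, None, None],  # the third user has rated nothing yet
]


def user_row(R, u):
    """Row u: the ratings of a single user for the different items."""
    return R[u]


def item_column(R, i):
    """Column i: the ratings given to a single item by the different users."""
    return [row[i] for row in R]


def rated_items(R, u):
    """Indices of the items rated by user u (the set I_u, possibly empty)."""
    return {i for i, rating in enumerate(R[u]) if rating is not None}
```

For example, `item_column(R, 2)` collects every user's rating of the third item, and `rated_items(R, 2)` is the empty set because the third user has expressed no preferences.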

However, in general, the utility function can be an arbitrary function, including a profit function. Depending on the application, a utility f can either be specified by the user, as is often done for the user-defined ratings, or computed by the application, as can be the case for a profit-based utility function.

As with user profiles, each element of the item space I is defined via a set of characteristics.

The central problem of RS is that the utility function f is usually not defined on the entire U × I space, but only on some subset of it. This means that f needs to be generalized to the entire space U × I. In RS, utility is typically represented by ratings and is initially defined only on the items previously rated by the users.

Generalizations from known to unknown ratings are usually done by:

• specifying heuristics that define the utility function and empirically validating its performance, or

• estimating the utility function that optimizes a certain performance criterion, such as Mean Absolute Error (MAE).
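To make the MAE criterion concrete, the sketch below scores a very simple heuristic estimator against held-out ratings. The heuristic (predicting a user's mean known rating for every unseen item) and all data values are assumptions made for illustration, not a method prescribed by the text.

```python
def mean_absolute_error(predicted, actual):
    """MAE: the average of |predicted - actual| over the held-out ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical data: ratings one user is known to have given,
# and ratings of the same user that were held out for evaluation.
known = [4, 5, 3]
held_out = [4, 2]

# Heuristic estimator: predict the user's mean known rating for unseen items.
user_mean = sum(known) / len(known)
predictions = [user_mean] * len(held_out)

mae = mean_absolute_error(predictions, held_out)
```

A lower MAE means the estimated ratings are closer, on average, to the ratings the user actually gave.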

Once the unknown ratings are estimated, an item is recommended to a user by selecting the item with the highest estimated rating for that user, according to Eq. 1.2. Alternatively, we can recommend the N best items to a user, or a set of users to an item.

1.2.1 The Input to a Recommender System

The input to a RS depends on the type of the filtering algorithm employed. The input belongs to one of the following categories:

1. Ratings (also called votes), which express the opinion of users on items. Ratings are normally provided by the user and follow a specified numerical scale (example: 1-bad to 5-excellent). A common rating scheme is the binary rating scheme, which allows only ratings of either 0 or 1. Ratings can also be gathered implicitly from the user's purchase history, web logs, hyperlink visits, browsing habits or other types of information access patterns.

2. Demographic data, which refer to information such as the age, the gender and the education of the users. This kind of data is usually difficult to obtain. It is normally collected explicitly from the user.

3. Content data, which are based on content analysis of items rated by the user. The features extracted via this analysis are used as input to the filtering algorithm in order to infer a user profile.
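Two of these input types can be sketched in a few lines. The first function derives binary ratings implicitly from a purchase history; treating "not purchased" as a rating of 0 is an assumption of this sketch, since in practice a missing purchase may simply mean the item is unknown to the user. The second function infers a user profile from content data as a rating-weighted average of item feature vectors, which is one possible choice among many. All names, features and values are hypothetical.

```python
def binary_ratings(purchases, all_items):
    """Map each user's purchase history to 0/1 ratings over all items."""
    return {
        user: {item: int(item in bought) for item in all_items}
        for user, bought in purchases.items()
    }


def content_profile(user_ratings, item_features):
    """Rating-weighted average of the feature vectors of the rated items."""
    total = sum(user_ratings.values())
    if total == 0:
        return None  # no usable ratings, so no profile can be inferred
    n_features = len(next(iter(item_features.values())))
    profile = [0.0] * n_features
    for item, rating in user_ratings.items():
        for k, value in enumerate(item_features[item]):
            profile[k] += rating * value
    return [p / total for p in profile]


# Hypothetical purchase histories and item feature vectors.
purchases = {"u1": {"i1", "i3"}, "u2": set()}
all_items = ["i1", "i2", "i3"]
implicit = binary_ratings(purchases, all_items)

item_features = {"i1": [1, 0], "i2": [0, 1], "i3": [1, 1]}
profile = content_profile({"i1": 4, "i2": 1}, item_features)
```

Here the profile leans towards the first feature because the user rated the item carrying that feature more highly.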

1.2.2 The Output of a Recommender System

The output of a RS can be either a prediction or a recommendation.

• A prediction is expressed as a numerical value, Ra,j = R(ua, ij), which represents the anticipated opinion of the active user ua on item ij. This predicted value must lie within the same numerical scale (example: 1-bad to 5-excellent) as the ratings originally provided by the active user ua. This form of RS output is also known as Individual Scoring.

• A recommendation is expressed as a list of N items, where N ≤ n, which the active user is expected to like the most. Usually, this list includes only items that the active user has not already purchased, viewed or rated. This form of RS output is also known as Top-N Recommendation or Ranked Scoring.
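The two output forms can be sketched as follows, assuming that estimated ratings for the active user are already available. The function names, the 1 to 5 scale defaults, the tie-breaking rule and the sample estimates are all assumptions made for illustration.

```python
def predict(estimate, low=1, high=5):
    """Individual Scoring: clamp a raw estimate to the input rating scale."""
    return max(low, min(high, estimate))


def top_n(estimates, already_rated, n):
    """Top-N Recommendation: the n unseen items with the highest estimates."""
    candidates = [
        (item, rating)
        for item, rating in estimates.items()
        if item not in already_rated
    ]
    # Sort by rating (descending), then by item name for a deterministic order.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in candidates[:n]]


# Hypothetical estimated ratings for one active user.
estimates = {"i1": 4.6, "i2": 5.3, "i3": 2.1, "i4": 4.6}
recommended = top_n(estimates, already_rated={"i2"}, n=2)
```

Item i2 has the highest estimate but is excluded because the active user has already rated it, so the list is filled from the remaining items.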

1.3 Methods of Collecting Knowledge About

From the document Machine Learning Paradigms (pages 16-19)